ENH: PyTables Enhancements for future #2391

Closed
jreback opened this Issue Nov 29, 2012 · 12 comments

jreback commented Nov 29, 2012

open (not in any particular order)

  1. add support for other dtypes in table columns (datetime,date,unicode)
  2. Implement variable length strings in a parallel VLArray (and synchronize): PyTables/PyTables#198
  3. revisit Term syntax - can we do better / more readability?
    3a. implement or in Terms (maybe use pyparsing like syntax)
  4. implement WORMTable
  5. one big area is to test whether data columns really are slower; it thus may make sense to make data columns = True the default (but not necessarily index them). see https://groups.google.com/forum/m/?fromgroups#!topic/pydata/cmw1F3OFJSc - see the end of this post for some perf tests, so this is prob not a good idea after all
  6. add export function, to export to different PyTables formats (an easy to read table for R (partially done), and output a GenericTable)
  7. provide better access to columns that are data_columns (as we can directly select them) - see read_column, expand this to the entire table (if possible), allows one to avoid selecting all columns in a table (and then reindexing), this works if columns argument is provided to select or inferred from the where.
  8. add out-of-core computation support (see my comment about 1/2 down in #622), this is partially supported now that we have an iterator (#3078)
  9. add a method to create a table structure (create_table)?, w/o actually appending, so don't have to add params in each call to append.
  10. Support a better mechanism for table splitting (a Splitter object?) so that a user can specify how to split (rather than via a dict); then store this object, so the resulting table can be recreated automatically (enable for both Storer and Table objects)
  11. Optimize table appending, I think we can do better! (GH #3537) makes some improvements
  12. allow itemsize='truncate' so that subsequent appends proceed with string truncation (on specific columns)
  13. allow where in select_column, return a properly indexed Series, add option to include the index (use_index=True?)
  14. Deal better with a very long list as input to a Term, by running multiple sub-queries
  15. Add support for column oriented tables, dep is carray, http://carray.pytables.org/docs/manual/
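
For item 14, one possible approach is to split the long list into fixed-size chunks, run one sub-query per chunk, and concatenate the results. Below is a rough pure-Python sketch of that idea only; `select_in_chunks`, `run_subquery`, `chunk_size` and the toy `TABLE` are hypothetical names for illustration, not part of HDFStore.

```python
from typing import Iterator, List, Sequence, Tuple


def chunked(values: Sequence, chunk_size: int) -> Iterator[Sequence]:
    """Yield successive slices of `values`, each at most `chunk_size` long."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def select_in_chunks(values, run_subquery, chunk_size=32):
    """Run one sub-query per chunk and concatenate the rows in chunk order."""
    rows = []
    for chunk in chunked(values, chunk_size):
        rows.extend(run_subquery(chunk))
    return rows


# Toy stand-in for a table: (index, value) rows. Hypothetical data.
TABLE: List[Tuple[int, int]] = [(i, i * i) for i in range(100)]


def run_subquery(wanted):
    """Hypothetical sub-query: return the rows whose index is in `wanted`."""
    wanted = set(wanted)
    return [row for row in TABLE if row[0] in wanted]


# A "very long list" of index values, queried 10 at a time.
result = select_in_chunks(list(range(0, 100, 3)), run_subquery, chunk_size=10)
```

The chunk size would presumably need tuning against whatever limit the query engine puts on the length of a single condition.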

done

  1. DONE (GH #2401): access store paths via path notation / dot notation (GH #2755)
  2. DONE (GH #2497): add to docs (GH #2397) - issues about reading/writing concurrently in threads/processes
    http://sourceforge.net/mailarchive/message.php?msg_id=30190886
  3. DONE (GH #2497): support panelnd (GH #2242)
  4. DONE (GH #2561): Should DataFrames be automagically indexed on 'index' (prob yes), but then should have a flag in append/put, and enable passing of the indexing options
  5. DONE (GH #2497): Check if create_table_index changes the current index if different options are passed
  6. DONE (GH #2561): for writing add chunk keyword to select to provide generator like behavior - each call to return the next chunk of data
  7. DONE (GH #2561): support multi indexes on tables
    5a. DONE real dtype integration is coming on PR #2708 (eg even though 0.10.1 will actually read/write float32 columns u can't really do much with them w/o having them upcasted) - in any event I think HDFStore will accommodate this already. but more testing needed
  8. DONE iterator support in select, http://stackoverflow.com/questions/14614512/merging-two-tables-with-millions-of-rows-in-python (GH #3078)
  9. DONE (GH #3531) support timezones in datelike columns (index should be ok already) (scott?), (GH #2852)
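
As noted in open item 8, an iterator over chunks is what makes out-of-core computation feasible. Here is a minimal sketch of that pattern, assuming a generator that stands in for an iterating select; `read_in_chunks` and `chunked_mean` are made-up names and the data is invented.

```python
from typing import Iterable, Iterator, List


def read_in_chunks(rows: List[float], chunksize: int) -> Iterator[List[float]]:
    """Yield consecutive chunks of `rows`; a stand-in for an iterating select."""
    for start in range(0, len(rows), chunksize):
        yield rows[start:start + chunksize]


def chunked_mean(chunks: Iterable[List[float]]) -> float:
    """Compute the mean while looking at only one chunk at a time."""
    total = 0.0
    count = 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no rows to average")
    return total / count


data = [float(i) for i in range(1, 11)]  # hypothetical column values 1..10
mean = chunked_mean(read_in_chunks(data, chunksize=3))
```

Only reductions that can be accumulated chunk by chunk (sum, count, min, max and so on) fit this shape directly.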

gerigk commented Nov 29, 2012

what about allowing creation/access of groups by using "/" in the key.

i.e.,

store.put('some/path/to/df', df)

would create/access the groups some, path, to and finally df.

Right now I can only save the data on one level within an hdf5 file
although HDF5/PyTables supports access by file system like paths.
It would not break anything since the occurrence of a '/' raises an
exception right now.
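
To illustrate the proposed semantics (not how HDFStore or PyTables implement groups), here is a small sketch that uses nested dicts as a stand-in for HDF5 groups. The `put` and `get` helpers are hypothetical.

```python
def put(root: dict, key: str, value) -> None:
    """Store `value` under a '/'-separated key, creating groups as needed."""
    parts = [p for p in key.split("/") if p]
    if not parts:
        raise ValueError("key must contain at least one name")
    node = root
    for name in parts[:-1]:
        # Create the intermediate group if it does not exist yet.
        node = node.setdefault(name, {})
        if not isinstance(node, dict):
            raise KeyError(f"{name!r} is a leaf, not a group")
    node[parts[-1]] = value


def get(root: dict, key: str):
    """Walk the '/'-separated key down from the root and return the leaf."""
    node = root
    for name in key.split("/"):
        if name:  # ignore empty parts, e.g. from a leading '/'
            node = node[name]
    return node


store = {}
put(store, "some/path/to/df", [1, 2, 3])  # the list stands in for a DataFrame
```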

On Thu, Nov 29, 2012 at 6:20 PM, jreback notifications@github.com wrote:

  1. add support for other dtypes in table columns
    (datetime64,datetime,date,unicode)

  2. support min_itemsize for table columns (currently supported only in
    indexers) also might be a better way of doing this (e.g. have the info
    attached to a dataframe, or support a global pandas option to provide a
    minimum)

  3. revisit Term syntax - can we do better / more readability?

  4. implement WORMTable



jreback commented Nov 29, 2012

good idea...shouldn't be too hard to implement

scottkidder commented Jan 30, 2013

Here are things that are most interesting/beneficial to my current workload:

Full Float32 support & full pandas dtype support
WORMTable (unsure of implementation or performance gains)
data_columns is very useful and I can do more testing to determine how fast/slow they are.
read_column would also be very useful in many instances.

I like the way Terms work. Is there support for ORing Terms or other logical operations in the Selection?

I can pick up work on any of these issues, but I would absolutely like to discuss some of the details first.
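
As a way to frame the OR question, here is a rough sketch of terms modeled as plain predicates that can be combined with OR and AND. This is not the existing Term class; `term`, `any_of`, `all_of` and the sample rows are hypothetical.

```python
import operator
from typing import Any, Callable, Dict

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]

_OPS = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def term(field: str, op: str, value: Any) -> Predicate:
    """Build a predicate that compares row[field] against value."""
    compare = _OPS[op]
    return lambda row: compare(row[field], value)


def any_of(*preds: Predicate) -> Predicate:
    """Logical OR of the given predicates."""
    return lambda row: any(p(row) for p in preds)


def all_of(*preds: Predicate) -> Predicate:
    """Logical AND of the given predicates."""
    return lambda row: all(p(row) for p in preds)


rows = [
    {"index": 0, "A": -1.0},
    {"index": 1, "A": 0.5},
    {"index": 2, "A": 2.0},
    {"index": 3, "A": 0.1},
]

# A < 0  OR  (A > 1 AND index >= 2)
where = any_of(term("A", "<", 0), all_of(term("A", ">", 1), term("index", ">=", 2)))
selected = [row["index"] for row in rows if where(row)]
```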

jreback commented Jan 30, 2013

Scott, send me an email and we can correspond offline
jeff@reback.net

meteore commented Feb 7, 2013

Term language: perhaps it makes sense to piggyback on existing syntax. SQL comes to mind, but also XESAM (the whole of http://xesam.org is down at the moment, but one can get the gist of it here: http://banshee.fm/support/guide/searching/).

meteore commented Feb 7, 2013

It would be nice if attribute access (e.g. store.df) could be enabled for all the leaves that have suitable names. This might require a big API overhaul, though (store.df.append ...).
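
A minimal sketch of what such attribute access could look like, assuming a toy store class (`DottedStore` is hypothetical and unrelated to the real HDFStore internals). It falls back to the stored leaves only when normal attribute lookup fails.

```python
class DottedStore:
    """Toy key/value store that also exposes its leaves as attributes."""

    def __init__(self):
        self._leaves = {}

    def put(self, key, value):
        self._leaves[key] = value

    def __getitem__(self, key):
        return self._leaves[key]

    def __getattr__(self, name):
        # Called only when normal lookup fails, so real methods win.
        leaves = self.__dict__.get("_leaves", {})
        try:
            return leaves[name]
        except KeyError:
            raise AttributeError(name) from None

    def __dir__(self):
        # Advertise leaves with identifier-like names, e.g. for tab completion.
        names = [k for k in self._leaves if k.isidentifier()]
        return sorted(set(super().__dir__()) | set(names))


store = DottedStore()
store.put("df", [1, 2, 3])  # the list stands in for a DataFrame
```

With this shape, `store.df` and `store['df']` return the same object, while a leaf named like an existing method (say `put`) would stay reachable only through `store['put']`.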

jreback commented Feb 7, 2013

see #2485, this is actually somewhat easy in HDFStore, the problem is that pandas in general doesn't propagate these attributes; you can easily store/retrieve attributes if you want on the nodes themselves

something like:

s = store.get_storer('df')
s.attrs['my_attribute'] = 1

jreback commented Feb 7, 2013

sorry...misunderstood your comment....(thought you meant saving attributes)

attribute access on the store is not a big deal, will add to the list

meteore commented Feb 7, 2013

Thank you for considering this, dotted access will save my pinky a lot of strain [''] (dead keys b/c need accents...).

Regarding attributes on DFs, this would actually preempt a number of cases for specialization of DataFrame (see recent MetaDataFrame PR #2695) and in particular perhaps support the addition of metadata that would facilitate automated merges (foreign keys...).

EDIT: there was a discussion about this topic in the mailing list

jreback commented Feb 7, 2013

see #2755, was pretty easy to add dotted access, so i did!

jreback commented Mar 12, 2013

@scottkidder did you get a chance to look at issue 13, #2852?

jreback commented Jul 25, 2016

dated

@jreback jreback closed this Jul 25, 2016

@jorisvandenbossche jorisvandenbossche modified the milestones: No action, Someday Jul 26, 2016
