PyTables enhancements for selection #1996
jreback added a commit to jreback/pandas that referenced this issue on Nov 15, 2012 (3b690d0).
This was referenced on Nov 15, 2012.
jreback added a commit to jreback/pandas that referenced this issue on Nov 24, 2012 (94bd4e7).
jreback referenced this issue on Nov 24, 2012 from #2346, "Pytables: bug fixes, code cleanup, and much updated docs" (closed).
jreback closed this on Nov 24, 2012.

I wonder if it would be useful to extend this notation to regular DataFrames... Has it been discussed before? (I think it may have been.) Someone was trying to roll their own DSL for this on SO...

Absolutely, hopefully going to implement this in #3202, #3393.
The theory is to accept a numpy-like DSL (but with frames/series/constants) whose operands potentially need alignment, and then pass the numpified arrays to numexpr. This is also similar to the expressions in … It is a bit non-trivial, as you have to take the string expression, compile/parse it, walk the AST to find the sections that need aligning, then repackage everything for numexpr. @hayd, up for it?

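A minimal sketch of that pipeline (parse the string, walk the AST, align, numpify, evaluate), under stated assumptions: `referenced_names` and `eval_aligned` are hypothetical helpers, not pandas API; only Series operands are aligned (DataFrames would also need their columns aligned); alignment uses the union of the indexes, as ordinary pandas arithmetic does; and plain `eval` is used as a fallback only when numexpr is not installed.

```python
import ast
from functools import reduce

import pandas as pd

try:
    import numexpr  # optional fast evaluator for array expressions
except ImportError:
    numexpr = None


def referenced_names(expr):
    """Parse a string expression and walk its AST to collect the names it uses."""
    tree = ast.parse(expr, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def eval_aligned(expr, env):
    """Hypothetical helper: align the Series referenced by expr on the union
    of their indexes, then evaluate expr on the plain numpy arrays."""
    names = referenced_names(expr)
    series = {k: v for k, v in env.items() if k in names and isinstance(v, pd.Series)}
    if not series:
        raise ValueError("expression references no Series")
    index = reduce(lambda a, b: a.union(b), (s.index for s in series.values()))
    # Numpify: constants pass through unchanged, aligned Series become arrays.
    arrays = {k: env[k] for k in names if k in env}
    arrays.update({k: s.reindex(index).to_numpy() for k, s in series.items()})
    if numexpr is not None:
        result = numexpr.evaluate(expr, local_dict=arrays)
    else:
        # Fallback: evaluate with numpy semantics and no builtins available.
        result = eval(compile(expr, "<expr>", "eval"), {"__builtins__": {}}, arrays)
    return pd.Series(result, index=index)


a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["y", "z"])
print(eval_aligned("a + b * 2", {"a": a, "b": b}))
```

Because alignment happens before evaluation, the evaluator only ever sees equal-length arrays and scalars; labels missing from one operand become NaN, matching `a + b * 2` computed directly in pandas.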
Do you think going via Terms is a good solution:
and then the eval'd string would be parsed into that. That way, we could first get select working with Terms (which shouldn't be too bad) and then write the parser for the DSL (we have to come to a consensus on the grammar...)?

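The two-stage plan above (get select working on Terms first, add the string parser later) could look roughly like this sketch. `Term`, `select` and `parse_term` are hypothetical names written for illustration, not the pandas.io.pytables API; selection runs on an in-memory DataFrame rather than a PyTables table, and all terms are AND-ed together.

```python
import operator
import re
from dataclasses import dataclass

import pandas as pd

# Comparison operators a Term may carry; "in" is handled separately via isin.
OPS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">": operator.gt, "<": operator.lt,
}


@dataclass(frozen=True)
class Term:
    """Hypothetical selection term: a field, an operator and a value."""
    field: str
    op: str
    value: object

    def mask(self, df: pd.DataFrame) -> pd.Series:
        """Boolean mask of the rows of df that satisfy this term."""
        if self.op == "in":
            return df[self.field].isin(self.value)
        return OPS[self.op](df[self.field], self.value)


def select(df: pd.DataFrame, terms: list) -> pd.DataFrame:
    """Stage 1: selection driven only by Term objects, AND-ed together."""
    mask = pd.Series(True, index=df.index)
    for term in terms:
        mask &= term.mask(df)
    return df[mask]


# Two-character operators come first in the alternation so ">=" is not read as ">".
_TERM_RE = re.compile(r"^\s*(\w+)\s*(>=|<=|==|!=|>|<)\s*(.+?)\s*$")


def parse_term(text: str) -> Term:
    """Stage 2: turn a 'field op value' string into a Term."""
    match = _TERM_RE.match(text)
    if match is None:
        raise ValueError(f"cannot parse term: {text!r}")
    field, op, raw = match.groups()
    try:
        value = int(raw)  # numeric literals, e.g. dates written as 20120103
    except ValueError:
        value = raw.strip("'\"")  # otherwise a (possibly quoted) string
    return Term(field, op, value)


df = pd.DataFrame({
    "major": [20120102, 20120103, 20120301, 20120402],
    "minor": ["A", "B", "D", "C"],
    "value": [1, 2, 3, 4],
})
terms = [parse_term("major>=20120103"), parse_term("major<=20120401"),
         Term("minor", "in", ["A", "B", "C"])]
print(select(df, terms))
```

Since `select` only ever sees Term objects, the string grammar can be settled and changed later without touching the selection code.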
That is definitely a good start on it. The thoughts I had were:
Roughly equivalent to:
which is an easy way to compose (not that user-friendly, though). This then could replace the syntax in … Want to give it a try?

Happy to give this a try. I'll thrash this out for expressions later in the week and ping back on the other thread. :)

Great! FYI the …

jreback commented Sep 30, 2012

Now
Changes to pandas.io.pytables to support more natural selection (from tables):
store.select('mypanel', where=['major>=20120103', 'major<=20120401', dict(minor=['A', 'B', 'C'])])
rather than the existing syntax.
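For comparison with in-memory frames, later pandas releases provide `DataFrame.query`, which accepts the same kind of conditions as a single string (it is not part of this proposal). A minimal sketch, assuming the panel's major/minor axes are stored as ordinary columns of a hypothetical flat DataFrame, with dates kept as integers to match the strings above:

```python
import pandas as pd

# Hypothetical flat stand-in for 'mypanel': one row per (major, minor) pair.
df = pd.DataFrame({
    "major": [20120102, 20120103, 20120301, 20120402],
    "minor": ["A", "B", "D", "C"],
    "value": [1, 2, 3, 4],
})

# The same three conditions as the where list: two range bounds on 'major'
# plus a membership test on 'minor'.
subset = df.query("major >= 20120103 and major <= 20120401 and minor in ['A', 'B', 'C']")
print(subset)
```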
Future
Not sure that pandas should get really fancy with operations just yet (e.g. 'or' operations and actual value selection),
but they will probably be necessary once pandas supports chunking-type operations on PyTables.
We would need to build a full-fledged selection parser that translates to numexpr-type operations (maybe with a patsy backend?).
But this may actually be useful for supporting generic operations of this kind on in-memory panels/frames.
Not sure of the use cases here, though: I usually just read in roughly the data I need and sub-select from there.
Unless you have hundreds of millions of rows (in which case it is!), I don't know if it's necessary to optimize further.