Implement coalesce/greatest/least and some e2e testing
Author: Wes McKinney <wes@cloudera.com> Closes #176 from wesm/coalesce-like-functions and squashes the following commits: d02c192 [Wes McKinney] Add root_tables to fix e2e tests for coalesce/greatest/least b779125 [Wes McKinney] Add e2e test for greatest/least 65a8c8b [Wes McKinney] Coalesce / greatest / least expr formatting 342304b [Wes McKinney] API stubs and IR tests for greatest/least too ee23c3e [Wes McKinney] Couple coalesce expression API tests 871e64b [Wes McKinney] Some builtin e2e scaffolding, placeholders for unimplemented conditional built-ins 49816d3 [Wes McKinney] Add a couple BIF stubs
SQL GROUP_CONCAT implementation
Author: Wes McKinney <wes@cloudera.com> Closes #181 from wesm/group-concat-impl and squashes the following commits: 913522b [Wes McKinney] Complete e2e test for some aggregates and group_concat c3a1d3b [Wes McKinney] Implement group concat in IR form
Revamp expression analysis logic having to do with projections
Author: Wes McKinney <wes@cloudera.com> Closes #183 from wesm/projection-roots-rework and squashes the following commits: 41189db [Wes McKinney] Refactor projection fusion and twiddle logic just enough to get tests passing again. Future cleanup will be needed 1a722b1 [Wes McKinney] More rigorous/correct 'predicate pushdown' validation, still some lingering CTE factoring issues due to questionable topk pushdown a4d8f5f [Wes McKinney] Fix up sql translator tests with now properly-factored CTEs, tweak some expr analysis logic 91f648b [Wes McKinney] Failing unit test and break all the things with root modification
Resolve subquery factoring inside correlated subquery bugs
Closes #173 Author: Wes McKinney <wes@cloudera.com> Closes #184 from wesm/correlated-subquery-factoring and squashes the following commits: ba750c0 [Wes McKinney] Checking for foreign exprs should examine parent contexts as well (for example, table that has been extracted in top level subquery). Tests passing again c030d86 [Wes McKinney] Visit filter expression trees looking for subqueries. Introduces new correlated subquery bug when subquery has been factored out in parent context 16a5425 [Wes McKinney] Failing SQL translation and e2e test for correlated subquery + CTE factor failure
ENH: self joins are possible if same table is passed and tuple of column names. Closes #165
Grouped table column selection convenience
Closes #135 Author: Wes McKinney <wes@cloudera.com> Closes #185 from wesm/groupby-column-select and squashes the following commits: be3b3dc [Wes McKinney] ENH: add grouped array wrapper API and tests. Decide to quote all identifiers to avoid illegal naming issues f1ac79d [Wes McKinney] REF: move groupby API stuff into its own module
ENH: ibis.desc can create 'deferred' sort keys, take a string instead of a column expression. Closes #151
Parse requirements.txt in setup.py and match those requirements during install.
A small change to parse the requirements file and use those reqs during install. Author: Juliet Hougland <juliet@cloudera.com> Closes #192 from jhlch/setup-reqs and squashes the following commits: 74e4b7c [Juliet Hougland] Remove complex debugging logic. 28fdaec [Juliet Hougland] Parse requirements.txt in setup.py and match those requirements during install.
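A minimal sketch of the idea behind this change, assuming a plain requirements.txt with one requirement per line. The helper name `parse_requirements` and the sample contents are hypothetical and are not taken from the repository's setup.py.

```python
def parse_requirements(text):
    """Return the requirement specifiers found in requirements.txt content.

    Hypothetical helper: blank lines, comment lines, and pip option
    lines (such as "-r other.txt") are skipped.
    """
    requirements = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or line.startswith('-'):
            continue
        requirements.append(line)
    return requirements


# Illustrative file contents only, not the project's real requirements.
sample = "numpy>=1.7\n# a comment\n\npandas>=0.12\n-r other.txt\n"
install_requires = parse_requirements(sample)
```

In a setup.py, a list built this way would typically be passed to `setup()` as the `install_requires` argument.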
ENH: use operator argument names (if provided) to distinguish subexpressions in developer repr. Close #32
Preliminary tools for interacting with external datasets in HDFS
Per #35, #136, #139. Also adds list_databases and list_tables APIs (#206). First cut of a "public API" (#182). Author: Wes McKinney <wes@cloudera.com> Closes #209 from wesm/external-table-tools and squashes the following commits: 4d67423 [Wes McKinney] Use HDFS wrapper class b4fd74e [Wes McKinney] Add CSV file query test. explicit delimiter required 2a3a1a0 [Wes McKinney] Start implementing HDFS wrapper for webhdfs 3rd party lib, add CSV test data loading script 9dddc07 [Wes McKinney] Persisting parquet file. Implement list_databases bd0d63e [Wes McKinney] Add a list_tables API per #206 bd9bad1 [Wes McKinney] Fix STORED AS PARQUET issue and complete other parquet test cases 81f5650 [Wes McKinney] Add option to pass webhdfs parameters. Parquet test not quite there yet 309a508 [Wes McKinney] Format schemas in delimited file DDL. More stubs in connection API 3ff54c1 [Wes McKinney] Can create parquet-based tables with schemas 5ea7909 [Wes McKinney] Schema creation API, add more create-from-parquet options 6a8af99 [Wes McKinney] Add webhdfs library requirement d88f271 [Wes McKinney] Scaffolding for initial external table workflows. Hitting rough edges with directories-of-parquet files 4286806 [Wes McKinney] Generate DDL for a broader set of create table statements
ENH: implement modulus binary operator and add type promotion rules for binary infix ops for decimals. Closes #194
BUG: handle table name and database passed separately, table names overlapping with impala identifiers. Close #198
ENH: implement avro_table API and test assuming you have fully-formed avro schema as Python object. Close #214
Fix bugs having to do with using id() as an expr hash key
Closes #102 Author: Wes McKinney <wes@cloudera.com> Closes #219 from wesm/nix-object-id-keys and squashes the following commits: f999d6f [Wes McKinney] Another CTE factoring id() test case db8cc80 [Wes McKinney] id use in expression direct substitution can fail 6117e23 [Wes McKinney] Use the special set class instead of id + set be1c154 [Wes McKinney] Fix more object-id related bugs in a self-join subquery extraction faa4b5d [Wes McKinney] Fix object-id bugs in expr formatting. No other bugs uncovered yet
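To illustrate the general class of bug, here is a hedged sketch in plain Python rather than ibis code. `id()` identifies an object, not its structure, so two structurally equal expressions produce different keys. The `Expr` and `EqualitySet` classes below are hypothetical stand-ins and do not reproduce the project's actual set class.

```python
class Expr:
    """Hypothetical stand-in for an expression node (not the ibis class)."""

    def __init__(self, op, *args):
        self.op = op
        self.args = args

    def equals(self, other):
        # Structural comparison: same operation and pairwise-equal arguments.
        return (isinstance(other, Expr)
                and self.op == other.op
                and len(self.args) == len(other.args)
                and all(_same(a, b) for a, b in zip(self.args, other.args)))


def _same(a, b):
    # Recurse into nested expressions; compare plain values with ==.
    if isinstance(a, Expr):
        return a.equals(b)
    return a == b


class EqualitySet:
    """Hypothetical set that deduplicates by structural equality, not id()."""

    def __init__(self):
        self._items = []

    def add(self, expr):
        if expr not in self:
            self._items.append(expr)

    def __contains__(self, expr):
        return any(expr.equals(item) for item in self._items)

    def __len__(self):
        return len(self._items)


# Two separately built but structurally identical expressions.
a = Expr('add', Expr('column', 'x'), 1)
b = Expr('add', Expr('column', 'x'), 1)

# Keying on id() misses the match because a and b are distinct objects.
by_id = {id(a)}
id_lookup_hits = id(b) in by_id

# Structural lookup treats them as the same expression.
seen = EqualitySet()
seen.add(a)
seen.add(b)
```

The linear scan keeps the sketch short; it trades lookup speed for not depending on object identity.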
ENH: add .contains method to strings, raise exception on python_string in string_expr. Close #217
Closes #143. Modestly inspired by `pandas.tseries.offsets`. Author: Wes McKinney <wes@cloudera.com> Closes #225 from wesm/timedelta-api and squashes the following commits: 09c01de [Wes McKinney] Better document ibis.timedelta b8074f7 [Wes McKinney] Pretty repr for timedeltas 063e267 [Wes McKinney] Implement offset multiplication and test some upconversion 5375913 [Wes McKinney] Translation rules and e2e tests for week offset 771fdcf [Wes McKinney] Micro/nanosecond e2e tests 4224852 [Wes McKinney] BUG: fix root_tables issues with Where f51b632 [Wes McKinney] Initial e2e tests for time offsets 3dfaa86 [Wes McKinney] sub/rsub, months impls and basic sql translation tests c5c69bf [Wes McKinney] Handle timestamp + delta right-side dispatch with NotImplemented 2d5ebcd [Wes McKinney] Basic timedelta + timestamp. radd not working yet 9c819bc [Wes McKinney] Implement timedelta API and combine/simple unit promotion 3be25a0 [Wes McKinney] Scaffolding for initial timedelta API
Client and HDFS interface refactor
Per #212, #213 Author: Wes McKinney <wes@cloudera.com> Closes #233 from wesm/ibis-client-refactor and squashes the following commits: afed256 [Wes McKinney] Add dev flag for development sdist and more robust ibis.test function cb94a19 [Wes McKinney] Ignore special files in find_any_file per #212 f41c84e [Wes McKinney] Fix find_any_file use 482716b [Wes McKinney] Fix imports 981ca19 [Wes McKinney] Ping is a method on cursor 74e7b18 [Wes McKinney] Refactor client interface into client-impala connection-hdfs client
Implement bucket and histogram numeric expr transforms, category synthetic type
Per #33 and #34 Author: Wes McKinney <wes@cloudera.com> Closes #242 from wesm/bucket-transform and squashes the following commits: ee037d6 [Wes McKinney] Slightly better label docs and move to analytics module abb8797 [Wes McKinney] Add error checking for number of buckets 2bf3b51 [Wes McKinney] Implement label method for CategoryValue c3609b3 [Wes McKinney] Casting bucket category to int32 is a noop 9b3b971 [Wes McKinney] Handle bucket edge cases and no-bucket under/over case 3abdb9b [Wes McKinney] Fix list repr interactive mode bug and tweak histogram base to avoid some FP error issues 8dac292 [Wes McKinney] Initial histogram implementation, but interactive mode repr problems dda0475 [Wes McKinney] Fix category type repr f0404e3 [Wes McKinney] More exhaustive bucket test cases, and move dimension creation to translate_expr code path cb90310 [Wes McKinney] Preliminary bucket implementation 013a5b9 [Wes McKinney] Implement basic category type and bucket and histogram APIs
Assorted timestamp usability and API improvements
per #246, #239, and #164 Author: Wes McKinney <wes@cloudera.com> Closes #248 from wesm/timestamp-improvements and squashes the following commits: 7990078 [Wes McKinney] api to ibis 2543a03 [Wes McKinney] Implement a UNIX integer-to-timestamp conversion API 5122602 [Wes McKinney] Handle some formatting of timestamp literals c0b455b [Wes McKinney] Add a timestamp API and add handling for datetime.datetime objects and coercible Python strings d350860 [Wes McKinney] Auto-promote literal strings to timestamp scalars in timestamp/string comparisons using pandas per #164
Add a summary API utility function
Per #243. Also adds an "expression list" type to handle the multiple metrics coming from summary(...) in a principled / composable way. Author: Wes McKinney <wes@cloudera.com> Closes #251 from wesm/summary-api and squashes the following commits: a967afe [Wes McKinney] Better docs d65ffa8 [Wes McKinney] Test a non-numeric summary, too 6f462d4 [Wes McKinney] Summary to SQL evaluation, prefixes 0f099ee [Wes McKinney] Scaffold for summary, and an expression list API
Improve WebHDFS API coverage / support / testing
per #210, #223 Author: Wes McKinney <wes@cloudera.com> Closes #254 from wesm/hdfs-writing and squashes the following commits: f8b7456 [Wes McKinney] Add an hdfs_connect helper top level API bfcebaa [Wes McKinney] Write whole directories to HDFS and test rmdir too ec1d4f1 [Wes McKinney] Start on more comprehensive webhdfs api impl and testing