Database object usability layer
Addresses #266 and #292. Use `ImpalaClient.database` to obtain an object with tab completion for tables. Impala tables now have a `drop` method and fit into the overall database entity class hierarchy. Author: Wes McKinney <wes@cloudera.com> Closes #454 from wesm/orm-ish-layer and squashes the following commits: 214b647 [Wes McKinney] Docstrings 62d811b [Wes McKinney] Testing / cleanup for database object usability layer. Wrap Impala tables/views in a TableExpr subclass with a drop method c3ebe29 [Wes McKinney] Start of an ORM-like database object interaction layer
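The following minimal sketch shows how attribute-style table access with tab completion can work in Python. It assumes a hypothetical `Database` class holding a name-to-table mapping, and it is not the actual Ibis class. `__getattr__` resolves unknown attributes as table names, and `__dir__` lists those names so IPython-style completion can offer them.

```python
class Database:
    """Hypothetical sketch: expose a database's tables as attributes."""

    def __init__(self, name, tables):
        self.name = name
        self._tables = dict(tables)  # table name -> table object

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails. Private names are
        # rejected up front so a half-initialized object cannot recurse here.
        if attr.startswith("_"):
            raise AttributeError(attr)
        try:
            return self._tables[attr]
        except KeyError:
            raise AttributeError(attr) from None

    def __getitem__(self, name):
        # Item access also works for names that are not valid identifiers.
        return self._tables[name]

    def __dir__(self):
        # Tab completion (IPython, rlcompleter) consults dir(), so add table names.
        return sorted(set(super().__dir__()) | set(self._tables))


db = Database("tpch", {"lineitem": "<lineitem table>", "orders": "<orders table>"})
print(db.orders)              # <orders table>
print("lineitem" in dir(db))  # True
```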
Adapt merge-pr script for upstream main branch and no authorization
Author: Wes McKinney <wes@cloudera.com> Closes #455 from wesm/merge-pr-upstream and squashes the following commits: 9419724 [Wes McKinney] Leave authorization code in merge-pr in case ever needed ce1167a [Wes McKinney] DEV: Reference upstream instead of origin in merge-pr 86ee41e [Wes McKinney] DEV: Do not require authorization in merge-py.py
BUG: filtered tables should be memoized in the console repr. Close #440
Add type checking e2e test scaffold
Also fixes some output type bugs in numeric built-ins that this scaffold turned up. Author: Wes McKinney <wes@cloudera.com> Closes #459 from wesm/e2e-type-checking and squashes the following commits: 5a8551d [Wes McKinney] Add type checking e2e scaffold and fix several output type bugs turned up through this addition. Close #294
Support for secure / kerberized clusters
This patch makes our test suite pass against a kerberized cluster. cc @caseyching for review. Related: #366, #406 Author: Uri Laserson <laserson@cloudera.com> Closes #451 from laserson/kerberos and squashes the following commits: 0dac8eb [Uri Laserson] Added another note on kerb to docs 46b75a1 [Uri Laserson] Added params to docstring 6e0075c [Uri Laserson] ENH: Ibis now *actually* support Kerberos
Add any and all reductions and their cumulative counterparts
Close #433 Author: Wes McKinney <wes@cloudera.com> Closes #478 from wesm/feature/any-all-cumany-cumall and squashes the following commits: e132c13 [Wes McKinney] Add notall to API and test 22afbf7 [Wes McKinney] Add notany and notall d3311d1 [Wes McKinney] Cumulative any and all, plus e2e tests ad28d66 [Wes McKinney] Any/all cleanup and make sure operations can be conditionally reductions
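The sketch below shows the intended semantics of `notany`/`notall` and the cumulative `cumany`/`cumall` reductions on plain Python lists. These functions borrow the names from the commits but are not Ibis expressions, and they ignore SQL NULL handling.

```python
from itertools import accumulate


def notany(values):
    return not any(values)


def notall(values):
    return not all(values)


def cumany(values):
    # Running logical OR: once a True is seen, every later result is True.
    return list(accumulate((bool(v) for v in values), lambda acc, v: acc or v))


def cumall(values):
    # Running logical AND: once a False is seen, every later result is False.
    return list(accumulate((bool(v) for v in values), lambda acc, v: acc and v))


print(cumany([False, True, False]))       # [False, True, True]
print(cumall([True, True, False, True]))  # [True, True, False, False]
```

In SQL, one way to express a cumulative any is a running `MAX` window over the boolean cast to an integer. Likewise, a cumulative all can be a running `MIN`.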
ENH: don't require kerberos libraries by default
Ibis no longer requires installation of Kerberos libraries (through the original `hdfs[kerberos]` dep which required `requests_kerberos`). Fixes #453. Author: Uri Laserson <laserson@cloudera.com> Closes #480 from laserson/IBIS-453 and squashes the following commits: 9c46098 [Uri Laserson] Fix 1db1c72 [Uri Laserson] Show git status output c89af70 [Uri Laserson] ENH: don't require kerberos libraries by default
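Making a dependency optional usually means importing it lazily. The following sketch shows that common Python pattern using a hypothetical `hdfs_auth` helper, which is not Ibis's actual code. Only `requests_kerberos.HTTPKerberosAuth` is a real name here.

```python
def hdfs_auth(use_kerberos=False):
    """Hypothetical helper: return an auth object only when Kerberos is requested."""
    if not use_kerberos:
        # Non-secure cluster: nothing Kerberos-related is imported at all.
        return None
    try:
        # Imported lazily so that installing requests_kerberos stays optional.
        from requests_kerberos import HTTPKerberosAuth
    except ImportError as exc:
        raise ImportError(
            "Kerberos authentication requires the optional 'requests_kerberos' "
            "package; install it to connect to a secure cluster"
        ) from exc
    return HTTPKerberosAuth()
```

Users on non-secure clusters never trigger the import. Users on kerberized clusters get a clear error message instead of a bare `ModuleNotFoundError`.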
Create empty partitioned tables and API for getting partition schema
Framework for Impala C++ UDFs/UDAFs in Ibis
Addresses #262 and #195. Replaces previous PR Author: Meghana Vuyyuru <megvuyyuru@gmail.com> Closes #448 from megvuyyuru/UDF and squashes the following commits: 7cbd776 [Meghana Vuyyuru] TST: Adding a test for multi-arg UDF using the decimal datatype a52ed99 [Meghana Vuyyuru] DOC: Adds UDF functions to ibis namespace with docstrings 26b79ce [Meghana Vuyyuru] Can now wrap and use decimal-type UDFs edddad9 [Meghana Vuyyuru] TST: Implicit typecasting testing, drop_udf testing. BUG: first pass at decimal types 0e5e390 [Meghana Vuyyuru] Correcting must_exist logic, better sha1 hashes in udf name_creation bb445d5 [Meghana Vuyyuru] Adding additional test cases e8c9b26 [Meghana Vuyyuru] Separating UDF testing for code cleanliness 2bf8e7c [Meghana Vuyyuru] Modifies some tests to use table columns rather than literals fa98543 [Meghana Vuyyuru] Switching user-facing UDF stuff from Impala types to Ibis types 3ab9192 [Meghana Vuyyuru] Modifies various udf functions semantics for user workflow consistency 67de062 [Meghana Vuyyuru] Switching db to database for code readability 684e362 [Meghana Vuyyuru] ENH, TST: Better behaviour for listing and deleting UDFs. More robust UDF type testing and relevant fixes a0b299e [Meghana Vuyyuru] TST: Adding more C++ UDFs for testing purposes f5d1237 [Meghana Vuyyuru] TST: Adds and uploads .so files to HDFS for testing. Fixes some other test bugs dc86218 [Meghana Vuyyuru] ENH: Fully-tested DDL for creating, deleting, and listing UDFs and UDAFS per #262 Adds functionality for using UDFs in Ibis directly per #195
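One commit above mentions using SHA-1 hashes when creating UDF names. The exact inputs and name format Ibis used are not described here. The following sketch therefore assumes a hypothetical `udf_name` scheme that hashes the C++ symbol together with its argument and return types, giving a deterministic name for each signature.

```python
import hashlib


def udf_name(symbol, input_types, output_type):
    """Hypothetical scheme: derive a stable name from a UDF signature."""
    signature = "{}({})->{}".format(symbol, ",".join(input_types), output_type)
    digest = hashlib.sha1(signature.encode("utf-8")).hexdigest()
    # A truncated hex digest keeps the name short; 16 hex chars is an arbitrary choice.
    return "udf_" + digest[:16]


print(udf_name("Identity", ["decimal(12,2)"], "decimal(12,2)"))
```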
Add chown and chmod to HDFS clients
from @ihodes, Close #446. Author: Wes McKinney <wes@cloudera.com> Author: Isaac Hodes <isaachodes@gmail.com> Closes #498 from wesm/feature/chown-chmod and squashes the following commits: 1b468ff [Wes McKinney] Fix chown test and skip udf tests if .so not present 18381ce [Isaac Hodes] Add chmod and chown to HDFS clients
ENH: Add conda recipes for Ibis and deps
Addresses #382. Tested on linux-64 and osx-64 arches. Conda packages are available at https://anaconda.org/koverholt for testing using:

```
conda install -c koverholt ibis-framework
```

Author: Kristopher Overholt <koverholt@gmail.com> Closes #449 from koverholt/conda-recipes and squashes the following commits: b3e7a3c [Kristopher Overholt] Remove recipes/deps related to avro and hdfs[kerberos] b86d6dc [Kristopher Overholt] Update recipes for ibis-framework and dependencies c610e69 [Kristopher Overholt] Remove recipes for snakebite and dependencies 1f6b5e2 [Kristopher Overholt] ENH: Add conda recipes for Ibis and deps
ENH: ibis.cross_join can accept more than 2 tables. Close #492
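A cross join of N tables is the Cartesian product of all of them. The sketch below illustrates only those semantics, using tables represented as lists of row tuples (an assumption). `ibis.cross_join` itself builds an expression that is compiled to SQL, and this is not its implementation.

```python
from functools import reduce
from itertools import product


def cross_join(*tables):
    """Cartesian product of two or more tables given as lists of row tuples."""
    if len(tables) < 2:
        raise ValueError("cross_join needs at least two tables")
    # Fold pairwise, ((t1 x t2) x t3) x ..., concatenating row tuples at each step.
    return reduce(
        lambda left, right: [lrow + rrow for lrow, rrow in product(left, right)],
        tables,
    )


a = [(1,), (2,)]
b = [("x",)]
c = [(True,), (False,)]
print(cross_join(a, b, c))
# [(1, 'x', True), (1, 'x', False), (2, 'x', True), (2, 'x', False)]
```

The result has `len(a) * len(b) * len(c)` rows, which is why many-way cross joins grow quickly.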
ENH: add ImpalaClient.raw_sql API for running query strings [unsafely]
BUG: fix materialized join handling in SQL translator
This inserts a slight hack, but fully materialized joins (the equivalent of a `SELECT *` on top of a series of joins) were handled incorrectly and failed to generate valid SQL when more than 2 tables were in the join. Author: Wes McKinney <wes@cloudera.com> Closes #506 from wesm/bug/sql-materialized-join and squashes the following commits: 24ef2a8 [Wes McKinney] BUG: fix materialized join handling in SQL translator
BUG: Quote field names to avoid identifier / syntax conflicts. Close #…
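Impala quotes identifiers with backticks. The following sketch shows one way a hypothetical `quote_identifier` helper might decide when to quote a field name. The reserved-word set is a small illustrative subset, and names that themselves contain a backtick are not handled.

```python
import re

_PLAIN_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
# Small illustrative subset of SQL reserved words; a real list is much longer.
_RESERVED = {"select", "from", "where", "group", "order", "by", "table", "join", "limit"}


def quote_identifier(name, force=False):
    """Wrap a field name in backticks when it would otherwise break the SQL."""
    if force or not _PLAIN_IDENTIFIER.match(name) or name.lower() in _RESERVED:
        return "`{}`".format(name)
    return name


print(quote_identifier("total"))   # total
print(quote_identifier("group"))   # `group`
print(quote_identifier("my col"))  # `my col`
```

Quoting every identifier unconditionally (`force=True`) is simpler and equally valid. Quoting only when needed keeps generated SQL easier to read.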
Enable late binding / composition without local variables using functions
Close #460 Author: Wes McKinney <wes@cloudera.com> Closes #510 from wesm/feature/late-binding and squashes the following commits: acbab36 [Wes McKinney] Test add/set_column d49024e [Wes McKinney] Make mutate work with functions 500892d [Wes McKinney] Test functions in projections 63505d8 [Wes McKinney] Can call desc on a function 72766c2 [Wes McKinney] Functions in sort_by 220c9a4 [Wes McKinney] Test having 43765fe [Wes McKinney] Test list of filters 614f2f0 [Wes McKinney] Function binding in filter/__getitem__ 95aff59 [Wes McKinney] Bind functions in group_by 6bf55e1 [Wes McKinney] Late binding via functions in aggregate metrics
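Late binding means a method such as `filter` can accept a function that receives the table it is being applied to. This lets a chain of operations refer to intermediate results without naming them. The following sketch illustrates the idea with a hypothetical minimal `Table` of dict rows, not the Ibis `TableExpr`.

```python
class Table:
    """Hypothetical minimal table: a list of dict rows."""

    def __init__(self, rows):
        self.rows = list(rows)

    def column(self, name):
        return [row[name] for row in self.rows]

    def _bind(self, expr):
        # Late binding: a callable is resolved against *this* table at call time.
        return expr(self) if callable(expr) else expr

    def filter(self, predicate):
        mask = self._bind(predicate)
        return Table(row for row, keep in zip(self.rows, mask) if keep)


t = Table([{"a": 1}, {"a": 5}, {"a": 7}])
# The second lambda receives the already-filtered table, so no local variable is needed.
result = (t.filter(lambda tbl: [v > 3 for v in tbl.column("a")])
           .filter(lambda tbl: [v < 7 for v in tbl.column("a")]))
print(result.rows)  # [{'a': 5}]
```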
Generalize top-k expressions and make them executable
This also adds the notion of a more general `AnalyticExpr`. Previously, `TopK` yielded a `BooleanArray`, which made executing it as an aggregation a little awkward. I'm hoping this opens the door to further generalization of contextual analytic expressions. Close #392 and #91. Author: Wes McKinney <wes@cloudera.com> Closes #514 from wesm/feature/generalize-topk and squashes the following commits: e33d4b5 [Wes McKinney] TopK expressions auto-convert to aggregations during SQL translation. More natural metric names in topk aggregations bdfe320 [Wes McKinney] Model TopK as an analytic expr, and add a summary filter helper operation for parsimony 5f0632d [Wes McKinney] ArrayNode decruft
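A top-k can be read either as an aggregation (the k most frequent values with their counts) or as a filter (keep only the rows whose value is among those k). The following plain-Python sketch shows both readings. `topk` and `topk_filter` are hypothetical names, not the Ibis API.

```python
from collections import Counter


def topk(values, k):
    # Same shape as: SELECT v, count(*) AS n FROM t GROUP BY v ORDER BY n DESC LIMIT k
    return Counter(values).most_common(k)


def topk_filter(rows, key, k):
    # Top-k used as a filter: keep only rows whose key is among the k most frequent.
    top = {value for value, _ in topk([row[key] for row in rows], k)}
    return [row for row in rows if row[key] in top]


cities = ["SF", "NYC", "SF", "LA", "SF", "NYC"]
print(topk(cities, 2))  # [('SF', 3), ('NYC', 2)]
```

Ties are a caveat. `Counter.most_common` orders equal counts by first occurrence, whereas SQL `ORDER BY ... LIMIT` makes no such guarantee.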
Allows UDFs to be built without preinstalling Impala's UDF SDK. Also fixes a few bugs that were causing Jenkins builds to crash. Author: Meghana Vuyyuru <megvuyyuru@gmail.com> Closes #513 from megvuyyuru/master and squashes the following commits: fe539f7 [Meghana Vuyyuru] BUG: Fixing incorrect variable naming in create_uda 9a5e765 [Meghana Vuyyuru] ENH: Adds -u/--udf flag for building + uploading only UDF files. Adds -d/--data_dir flag for specifying local test-data directory a71a035 [Meghana Vuyyuru] Switches tests to use .ll rather than .so 2cdff15 [Meghana Vuyyuru] BUG: Fixing dictionary for Python2.6 compatibility 7e8d36d [Meghana Vuyyuru] BUG: Fixes bug with creating test UDFs 3a2ce3d [Meghana Vuyyuru] TST: Modifies UDF C++ code to work without pre-installing UDF SDK
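The two new options named above can be expressed with `argparse` roughly as follows. This is a hypothetical reconstruction. The script's name, description text, and default for `--data_dir` are assumptions, and only the flag names and purposes come from the commit message.

```python
import argparse


def make_parser():
    # Hypothetical reconstruction of the two options named in the commit message.
    parser = argparse.ArgumentParser(description="Build and upload Ibis test data")
    parser.add_argument("-u", "--udf", action="store_true",
                        help="only build and upload the UDF files")
    parser.add_argument("-d", "--data_dir", default=None,
                        help="local test-data directory")
    return parser


args = make_parser().parse_args(["-u", "-d", "/tmp/ibis-testing-data"])
print(args.udf, args.data_dir)  # True /tmp/ibis-testing-data
```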
Fix conda recipe for Windows build
Sync Windows build script options with Linux/OS X build script options. Packages can be tested by running the following on Windows:

```
conda install -c koverholt ibis-framework
```

Author: Kristopher Overholt <koverholt@gmail.com> Closes #509 from koverholt/fix/conda-recipes-win and squashes the following commits: 2631674 [Kristopher Overholt] Fix conda recipe for windows build
Fix silent failures if user creates a table with an existing name
Enrich data types and schemas, add schema validation to Client.insert
This makes all data types, including primitives, proper objects, e.g. `Int16` or `String`. This will make it easier to create more complex schemas and deal with implicit casts and other items on the roadmap. Close #235. Author: Wes McKinney <wes@cloudera.com> Closes #523 from wesm/refactor/data-types and squashes the following commits: 78dda3a [Wes McKinney] Add validate option to check that a table's schema is safely insertable (avoiding HS2Error). #235 05b6186 [Wes McKinney] Standardize on type objects, some refactoring d956496 [Wes McKinney] Hack out more data types f15f82b [Wes McKinney] Move value type code into its own module. Fix udf test failures that popped up
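Validating a schema before an insert means checking that every source column can be stored in the target column without loss, so the error surfaces client-side instead of as a server error. The following sketch assumes schemas are plain dicts mapping column names to type-name strings. It also assumes hypothetical widening rules, and it requires the same column names in the same order. These are not Ibis's `Int16`/`String` type objects or its actual rules.

```python
# Hypothetical widening chains: a value of an earlier type fits losslessly
# in any later type of the same chain.
_WIDENING = [("int8", "int16", "int32", "int64"), ("float", "double")]


def can_implicit_cast(source, target):
    for chain in _WIDENING:
        if source in chain and target in chain:
            return chain.index(source) <= chain.index(target)
    return source == target


def validate_insert(source_schema, target_schema):
    """Raise before issuing the INSERT instead of failing on the server."""
    if list(source_schema) != list(target_schema):
        raise ValueError("column names/order differ: {} vs {}".format(
            list(source_schema), list(target_schema)))
    unsafe = [name for name, typ in source_schema.items()
              if not can_implicit_cast(typ, target_schema[name])]
    if unsafe:
        raise TypeError("cannot safely insert columns: {}".format(unsafe))


validate_insert({"a": "int8", "b": "string"}, {"a": "int32", "b": "string"})  # passes
```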
ImpalaClient.exists_table does not respect schema
Fixup for exists_table when database is not None Author: Marius van Niekerk <marius.v.niekerk@gmail.com> Author: mariusvniekerk <marius.v.niekerk@gmail.com> Closes #525 from mariusvniekerk/exists_table and squashes the following commits: e8c7be1 [mariusvniekerk] Added test case 5de823f [Marius van Niekerk] Update client.py
Reorganize Impala-specific code and test cases
Also some assorted cleanup and dead code elimination. Adds `ibis.impala.api` as a place to collect the user-facing API (e.g. `ibis.impala_connect` -> `ibis.impala.connect`). Author: Wes McKinney <wes@cloudera.com> Closes #534 from wesm/refactor/impala-cleaning and squashes the following commits: 2da7bbd [Wes McKinney] Empty spark submodule 0c8fa6a [Wes McKinney] Rename some test cases 07782d6 [Wes McKinney] Add Database/ImpalaTable to api 664fa9d [Wes McKinney] Some consolidation of exprs.py 4aef004 [Wes McKinney] Doc fixes b5a672e [Wes McKinney] Major reorg of Impala SQL translation toolchain and e2e tests 6ce5b3e [Wes McKinney] Start reorganizing impala code into ibis.impala 328957d [Wes McKinney] Move impyla imports to ibis.impala.compat 90b9256 [Wes McKinney] Remove unneeded testing code
Documentation updates for 0.4 release
Author: Wes McKinney <wes@cloudera.com> Closes #531 from wesm/docs/0.4-updates and squashes the following commits: e29ce10 [Wes McKinney] Initial documentation for scalar UDF wrapping e44b4de [Wes McKinney] Make docstring more sphinx-friendly 7c5925e [Wes McKinney] Front page notes 12f0778 [Wes McKinney] Fix duplicate label 656d8e3 [Wes McKinney] Some release notes and API docs for 0.4
Add ImpalaClient.close method that closes Impyla sessions and drops temp tables
Close #533 Author: Wes McKinney <wes@cloudera.com> Closes #536 from wesm/feature/client-close-cleanup and squashes the following commits: 6ffdf61 [Wes McKinney] Add close to API docs 651f7d3 [Wes McKinney] Add WeakValueDictionary to track impyla connections and temporary tables. Add close method to ImpalaClient per #533
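The commit describes tracking temporary tables with a `WeakValueDictionary` so that `close` can clean them up. The following sketch shows that pattern with the standard-library `weakref.WeakValueDictionary`. `TempTable` and `Client` are hypothetical stand-ins, and `drop` only sets a flag where a real client would issue `DROP TABLE`.

```python
import weakref


class TempTable:
    """Hypothetical stand-in for a temporary table handle."""

    def __init__(self, name):
        self.name = name
        self.dropped = False

    def drop(self):
        # A real client would issue DROP TABLE here.
        self.dropped = True


class Client:
    def __init__(self):
        # Weak references: the registry never keeps a temp object alive by itself.
        self._temp_objects = weakref.WeakValueDictionary()

    def make_temp_table(self, name):
        table = TempTable(name)
        self._temp_objects[name] = table
        return table

    def close(self):
        # Clean up every temp object that user code still references.
        for obj in list(self._temp_objects.values()):
            obj.drop()
        self._temp_objects.clear()


client = Client()
tmp = client.make_temp_table("tmp_1")
client.close()
print(tmp.dropped)  # True
```

The weak references mean the client never prolongs the lifetime of a temp object. The trade-off is that an object garbage-collected before `close` silently leaves the registry, so it would need its own cleanup (for example, a finalizer).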