.. currentmodule:: ibis
.. _impala-udf:

*************************
Using Impala UDFs in Ibis
*************************

Impala currently supports user-defined scalar functions (known henceforth as
*UDFs*) and user-defined aggregate functions (*UDAs*) via a C++ extension API.

Initial support for using C++ UDFs in Ibis arrived in version 0.4.0.

Using scalar functions (UDFs)
-----------------------------

Let's work through an example of making a C++ UDF available to Ibis. Here is a
function that computes approximate equality between floating point values:

.. code-block:: c++

   #include "impala_udf/udf.h"

   #include <cmath>

   BooleanVal FuzzyEquals(FunctionContext* ctx, const DoubleVal& x, const DoubleVal& y) {
     const double EPSILON = 0.000001;
     if (x.is_null || y.is_null) return BooleanVal::null();
     double delta = fabs(x.val - y.val);
     return BooleanVal(delta < EPSILON);
   }

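To make the semantics concrete, here is a plain-Python sketch of what
``FuzzyEquals`` computes, with ``None`` standing in for Impala's NULL. This is
an illustration of the function's behavior, not part of the UDF itself:

```python
def fuzzy_equals(x, y, epsilon=0.000001):
    """Reference semantics for the FuzzyEquals UDF above.

    NULL (None) inputs propagate to a NULL (None) result, mirroring
    BooleanVal::null() in the C++ version.
    """
    if x is None or y is None:
        return None
    return abs(x - y) < epsilon
```
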
You can compile this to either a shared library (a ``.so`` file) or to LLVM
bitcode with clang (a ``.ll`` file). We skip the compilation step here; more
detailed build instructions will be added later.

To make this function callable, we first create a UDF wrapper with
``ibis.impala.wrap_udf``:

.. code-block:: python

   library = '/ibis/udfs/udftest.ll'
   inputs = ['double', 'double']
   output = 'boolean'
   symbol = 'FuzzyEquals'
   udf_db = 'ibis_testing'
   udf_name = 'fuzzy_equals'

   wrapper = ibis.impala.wrap_udf(library, inputs, output, symbol, name=udf_name)

In typical workflows, you will set up a UDF in Impala once and then use it
from then on. So the *first time* you do this, you need to create the UDF in
Impala:

.. code-block:: python

   client.create_udf(wrapper, name=udf_name, database=udf_db)

Now, we must register this function as a new Impala operation in Ibis. This
must take place each time you load your Ibis session.

.. code-block:: python

   operation_class = wrapper.to_operation()
   ibis.impala.add_operation(operation_class, udf_name, udf_db)

Lastly, we define a *user API* to make ``fuzzy_equals`` callable on Ibis
expressions:

.. code-block:: python

   def fuzzy_equals(left, right):
       """
       Approximate equals UDF

       Parameters
       ----------
       left : numeric
       right : numeric

       Returns
       -------
       is_approx_equal : boolean
       """
       op = operation_class(left, right)
       return op.to_expr()

Now, we have a callable Python function that works with Ibis expressions:

.. code-block:: python

   In [35]: db = c.database('ibis_testing')

   In [36]: t = db.functional_alltypes

   In [37]: expr = fuzzy_equals(t.float_col, t.double_col / 10)

   In [38]: expr.execute()[:10]
   Out[38]:
   0     True
   1    False
   2    False
   3    False
   4    False
   5    False
   6    False
   7    False
   8    False
   9    False
   Name: tmp, dtype: bool

Note that the call to ``ibis.impala.add_operation`` must happen each time you
use Ibis. If you have a lot of UDFs, I suggest you create a module containing
all of your wrapper declarations and user APIs, and import it with your Ibis
session to plug in all your own functions.

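Such a plug-in module can be organized as a small registry that is replayed at
session start. The sketch below is illustrative only: ``UDF_SPECS`` and
``register_all`` are hypothetical names, and the ``wrap`` and ``add_op``
callables stand in for ``ibis.impala.wrap_udf`` and
``ibis.impala.add_operation`` so the structure can be shown without a live
Impala connection:

```python
# Hypothetical plug-in module bundling UDF declarations so registration
# can be replayed at the start of each Ibis session.

# Each entry: (library, input types, output type, C++ symbol, exposed name)
UDF_SPECS = [
    ('/ibis/udfs/udftest.ll', ['double', 'double'], 'boolean',
     'FuzzyEquals', 'fuzzy_equals'),
]


def register_all(wrap, add_op, database):
    """Wrap and register every declared UDF.

    `wrap` and `add_op` stand in for ibis.impala.wrap_udf and
    ibis.impala.add_operation; returns the operation classes keyed by name.
    """
    ops = {}
    for library, inputs, output, symbol, name in UDF_SPECS:
        wrapper = wrap(library, inputs, output, symbol, name=name)
        op = wrapper.to_operation()
        add_op(op, name, database)
        ops[name] = op
    return ops
```
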
Using aggregate functions (UDAs)
--------------------------------

Coming soon.

Adding UDF functions to Ibis types
----------------------------------

Coming soon.

Installing the Impala UDF SDK on OS X and Linux
-----------------------------------------------

Coming soon.

Impala types to Ibis types
--------------------------

Coming soon. See ``ibis.schema`` for now.

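In the meantime, the name-for-name mapping that ``ibis.impala.udf`` uses
internally in this changeset can be sketched as a plain dictionary (shown here
for illustration; DECIMAL additionally carries precision and scale
parameters):

```python
# Impala type name -> Ibis type name, mirroring the _impala_to_ibis_type
# table in ibis.impala.udf.
IMPALA_TO_IBIS_TYPE = {
    'boolean': 'boolean',
    'tinyint': 'int8',
    'smallint': 'int16',
    'int': 'int32',
    'bigint': 'int64',
    'float': 'float',
    'double': 'double',
    'string': 'string',
    'timestamp': 'timestamp',
    'decimal': 'decimal',
}
```
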
.. currentmodule:: ibis
.. _impala:

*********************
Ibis for Impala users
*********************

Another goal of Ibis is to provide an integrated Python API for an Impala
cluster without requiring you to switch back and forth between Python code and
the Impala shell (where one would be using a mix of DDL and SQL statements).

Table metadata
--------------

Computing table statistics
~~~~~~~~~~~~~~~~~~~~~~~~~~

Impala-backed physical tables have a method ``compute_stats`` that computes
table, column, and partition-level statistics to assist with query planning
and optimization. It is good practice to invoke this after creating a table or
loading new data:

.. code-block:: python

   table.compute_stats()

Table partition management
--------------------------

Coming soon.

.. currentmodule:: ibis
.. _sql:

***********************
Ibis for SQL Developers
***********************

.. _tutorial:

********
Tutorial
********

# Copyright 2014 Cloudera Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import ibis.expr.types as ir
import ibis.expr.operations as ops
import ibis

from ibis.compat import unittest
from ibis.expr.tests.mocks import BasicTestCase
from ibis.tests.util import assert_equal


class TestCaseExpressions(BasicTestCase, unittest.TestCase):

    def test_ifelse(self):
        bools = self.table.g.isnull()
        result = bools.ifelse("foo", "bar")
        assert isinstance(result, ir.StringArray)

    def test_ifelse_literal(self):
        pass

    def test_simple_case_expr(self):
        case1, result1 = "foo", self.table.a
        case2, result2 = "bar", self.table.c
        default_result = self.table.b

        expr1 = self.table.g.lower().cases(
            [(case1, result1),
             (case2, result2)],
            default=default_result
        )

        expr2 = (self.table.g.lower().case()
                 .when(case1, result1)
                 .when(case2, result2)
                 .else_(default_result)
                 .end())

        assert_equal(expr1, expr2)
        assert isinstance(expr1, ir.Int32Array)

    def test_multiple_case_expr(self):
        case1 = self.table.a == 5
        case2 = self.table.b == 128
        case3 = self.table.c == 1000

        result1 = self.table.f
        result2 = self.table.b * 2
        result3 = self.table.e

        default = self.table.d

        expr = (ibis.case()
                .when(case1, result1)
                .when(case2, result2)
                .when(case3, result3)
                .else_(default)
                .end())

        op = expr.op()
        assert isinstance(expr, ir.DoubleArray)
        assert isinstance(op, ops.SearchedCase)
        assert op.default is default

    def test_simple_case_no_default(self):
        # TODO: this conflicts with the null else cases below. Make a decision
        # about what to do, what to make the default behavior based on what
        # the user provides. SQL behavior is to use NULL when nothing else is
        # provided. The .replace convenience API could use the field values as
        # the default, getting us around this issue.
        pass

    def test_simple_case_null_else(self):
        expr = self.table.g.case().when("foo", "bar").end()
        op = expr.op()

        assert isinstance(expr, ir.StringArray)
        assert isinstance(op.default, ir.ValueExpr)
        assert isinstance(op.default.op(), ir.NullLiteral)

    def test_multiple_case_null_else(self):
        expr = ibis.case().when(self.table.g == "foo", "bar").end()
        op = expr.op()

        assert isinstance(expr, ir.StringArray)
        assert isinstance(op.default, ir.ValueExpr)
        assert isinstance(op.default.op(), ir.NullLiteral)

    def test_case_type_precedence(self):
        pass

    def test_no_implicit_cast_possible(self):
        pass

# Copyright 2015 Cloudera Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from ibis.impala.client import (ImpalaConnection, ImpalaClient,  # noqa
                                Database, ImpalaTable)
from ibis.impala.udf import add_operation, wrap_udf, wrap_uda  # noqa


def connect(host='localhost', port=21050, protocol='hiveserver2',
            database='default', timeout=45, use_ssl=False, ca_cert=None,
            use_ldap=False, ldap_user=None, ldap_password=None,
            use_kerberos=False, kerberos_service_name='impala',
            pool_size=8):
    """
    Create an Impala client for use with Ibis.

    Parameters
    ----------
    host : string, default 'localhost'
    port : int, default 21050 (HiveServer 2)
    protocol : {'hiveserver2', 'beeswax'}
    database : string, default 'default'
    timeout : int, default 45
    use_ssl : boolean, default False
    ca_cert : string (optional)
    use_ldap : boolean, default False
    ldap_user : string (optional)
    ldap_password : string (optional)
    use_kerberos : boolean, default False
    kerberos_service_name : string, default 'impala'
    pool_size : int, default 8

    Returns
    -------
    con : ImpalaConnection
    """
    params = {
        'host': host,
        'port': port,
        'protocol': protocol,
        'database': database,
        'timeout': timeout,
        'use_ssl': use_ssl,
        'ca_cert': ca_cert,
        'use_ldap': use_ldap,
        'ldap_user': ldap_user,
        'ldap_password': ldap_password,
        'use_kerberos': use_kerberos,
        'kerberos_service_name': kerberos_service_name
    }

    return ImpalaConnection(pool_size=pool_size, **params)

# Copyright 2015 Cloudera Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from impala.error import Error as ImpylaError  # noqa
from impala.error import HiveServer2Error as HS2Error  # noqa
import impala.dbapi as impyla  # noqa

# Copyright 2015 Cloudera Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from ibis.tests.conftest import *  # noqa

# Copyright 2014 Cloudera Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import pandas as pd

from ibis.compat import unittest
from ibis.tests.util import IbisTestEnv, ImpalaE2E, assert_equal, connect_test

import ibis.common as com
import ibis.config as config
import ibis.expr.types as ir
import ibis.util as util


def approx_equal(a, b, eps):
    assert abs(a - b) < eps


ENV = IbisTestEnv()


class TestImpalaClient(ImpalaE2E, unittest.TestCase):

    def test_raise_ibis_error_no_hdfs(self):
        # #299
        client = connect_test(ENV, with_hdfs=False)
        self.assertRaises(com.IbisError, getattr, client, 'hdfs')

    def test_get_table_ref(self):
        table = self.db.functional_alltypes
        assert isinstance(table, ir.TableExpr)

        table = self.db['functional_alltypes']
        assert isinstance(table, ir.TableExpr)

    def test_run_sql(self):
        query = """SELECT li.*
FROM {0}.tpch_lineitem li
""".format(self.test_data_db)
        table = self.con.sql(query)

        li = self.con.table('tpch_lineitem')
        assert isinstance(table, ir.TableExpr)
        assert_equal(table.schema(), li.schema())

        expr = table.limit(10)
        result = expr.execute()
        assert len(result) == 10

    def test_sql_with_limit(self):
        query = """\
SELECT *
FROM functional_alltypes
LIMIT 10"""
        table = self.con.sql(query)
        ex_schema = self.con.get_schema('functional_alltypes')
        assert_equal(table.schema(), ex_schema)

    def test_raw_sql(self):
        query = 'SELECT * from functional_alltypes limit 10'
        cur = self.con.raw_sql(query, results=True)
        rows = cur.fetchall()
        cur.release()
        assert len(rows) == 10

    def test_explain(self):
        t = self.con.table('functional_alltypes')
        expr = t.group_by('string_col').size()
        result = self.con.explain(expr)
        assert isinstance(result, str)

    def test_get_schema(self):
        t = self.con.table('tpch_lineitem')
        schema = self.con.get_schema('tpch_lineitem',
                                     database=self.test_data_db)
        assert_equal(t.schema(), schema)

    def test_result_as_dataframe(self):
        expr = self.alltypes.limit(10)

        ex_names = expr.schema().names
        result = self.con.execute(expr)

        assert isinstance(result, pd.DataFrame)
        assert list(result.columns) == ex_names
        assert len(result) == 10

    def test_adapt_scalar_array_results(self):
        table = self.alltypes

        expr = table.double_col.sum()
        result = self.con.execute(expr)
        assert isinstance(result, float)

        with config.option_context('interactive', True):
            result2 = expr.execute()
            assert isinstance(result2, float)

        expr = (table.group_by('string_col')
                .aggregate([table.count().name('count')])
                .string_col)

        result = self.con.execute(expr)
        assert isinstance(result, pd.Series)

    def test_array_default_limit(self):
        t = self.alltypes

        result = self.con.execute(t.float_col, limit=100)
        assert len(result) == 100

    def test_limit_overrides_expr(self):
        # #418
        t = self.alltypes
        result = self.con.execute(t.limit(10), limit=5)
        assert len(result) == 5

    def test_verbose_log_queries(self):
        queries = []

        def logger(x):
            queries.append(x)

        with config.option_context('verbose', True):
            with config.option_context('verbose_log', logger):
                self.con.table('tpch_orders', database=self.test_data_db)

        assert len(queries) == 1
        expected = 'SELECT * FROM {0}.`tpch_orders` LIMIT 0'.format(
            self.test_data_db)
        assert queries[0] == expected

    def test_sql_query_limits(self):
        table = self.con.table('tpch_nation', database=self.test_data_db)
        with config.option_context('sql.default_limit', 100000):
            # table has 25 rows
            assert len(table.execute()) == 25
            # comply with limit arg for TableExpr
            assert len(table.execute(limit=10)) == 10
            # state hasn't changed
            assert len(table.execute()) == 25
            # non-TableExpr ignores default_limit
            assert table.count().execute() == 25
            # non-TableExpr doesn't observe limit arg
            assert table.count().execute(limit=10) == 25
        with config.option_context('sql.default_limit', 20):
            # TableExpr observes default limit setting
            assert len(table.execute()) == 20
            # explicit limit= overrides default
            assert len(table.execute(limit=15)) == 15
            assert len(table.execute(limit=23)) == 23
            # non-TableExpr ignores default_limit
            assert table.count().execute() == 25
            # non-TableExpr doesn't observe limit arg
            assert table.count().execute(limit=10) == 25
        # eliminating default_limit doesn't break anything
        with config.option_context('sql.default_limit', None):
            assert len(table.execute()) == 25
            assert len(table.execute(limit=15)) == 15
            assert len(table.execute(limit=10000)) == 25
            assert table.count().execute() == 25
            assert table.count().execute(limit=10) == 25

    def test_database_repr(self):
        assert self.test_data_db in repr(self.db)

    def test_database_drop(self):
        tmp_name = '__ibis_test_{0}'.format(util.guid())
        self.con.create_database(tmp_name)

        db = self.con.database(tmp_name)
        self.temp_databases.append(tmp_name)
        db.drop()
        assert not self.con.exists_database(tmp_name)

    def test_namespace(self):
        ns = self.db.namespace('tpch_')

        assert 'tpch_' in repr(ns)

        table = ns.lineitem
        expected = self.db.tpch_lineitem
        attrs = dir(ns)
        assert 'lineitem' in attrs
        assert 'functional_alltypes' not in attrs

        assert_equal(table, expected)

    def test_close_drops_temp_tables(self):
        from posixpath import join as pjoin

        hdfs_path = pjoin(self.test_data_dir, 'parquet/tpch_region')

        client = connect_test(ENV)
        table = client.parquet_file(hdfs_path)

        name = table.op().name
        assert self.con.exists_table(name) is True
        client.close()

        assert not self.con.exists_table(name)

# Copyright 2014 Cloudera Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import ibis

from ibis.compat import unittest
from ibis.tests.util import ImpalaE2E, assert_equal

import ibis.util as util


class TestPartitioning(ImpalaE2E, unittest.TestCase):

    def test_create_table_with_partition_column(self):
        schema = ibis.schema([('year', 'int32'),
                              ('month', 'int8'),
                              ('day', 'int8'),
                              ('value', 'double')])

        name = util.guid()
        self.con.create_table(name, schema=schema, partition=['year', 'month'])
        self.temp_tables.append(name)

        # the partition columns get put at the end of the table
        ex_schema = ibis.schema([('day', 'int8'),
                                 ('value', 'double'),
                                 ('year', 'int32'),
                                 ('month', 'int8')])
        table_schema = self.con.get_schema(name)
        assert_equal(table_schema, ex_schema)

        partition_schema = self.con.get_partition_schema(name)
        expected = ibis.schema([('year', 'int32'),
                                ('month', 'int8')])
        assert_equal(partition_schema, expected)

# Copyright 2015 Cloudera Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from hashlib import sha1

from ibis.common import IbisTypeError

from ibis.expr.datatypes import validate_type
import ibis.expr.datatypes as _dt
import ibis.expr.operations as _ops
import ibis.expr.rules as rules
import ibis.expr.types as ir
import ibis.sql.exprs as _expr
import ibis.util as util


class UDFInfo(object):

    def __init__(self, input_type, output_type, name):
        self.inputs = input_type
        self.output = output_type
        self.name = name

    def __repr__(self):
        return '{0}({1}) returns {2}'.format(
            self.name,
            ', '.join(repr(x) for x in self.inputs),
            self.output)


class UDFCreatorParent(UDFInfo):

    def __init__(self, hdfs_file, input_type, output_type, name=None):
        file_suffix = hdfs_file[-3:]
        if file_suffix not in ('.so', '.ll'):
            raise ValueError('Invalid file type. Must be .so or .ll')
        self.hdfs_file = hdfs_file
        inputs = [validate_type(x) for x in input_type]
        output = validate_type(output_type)
        new_name = name
        if not name:
            # derive a deterministic default name from the C++ symbol and
            # the input signature
            string = self.so_symbol
            for in_type in inputs:
                string += in_type.name()
            new_name = sha1(string).hexdigest()

        UDFInfo.__init__(self, inputs, output, new_name)

    def to_operation(self, name=None):
        """
        Creates and returns an operator class that can be passed to
        add_operation()

        Parameters
        ----------
        name : string (optional). Used internally to track function

        Returns
        -------
        op : an operator class to use in constructing function
        """
        (in_values, out_value) = _operation_type_conversion(self.inputs,
                                                            self.output)
        class_name = name
        if self.name and not name:
            class_name = self.name
        elif not (name or self.name):
            class_name = 'UDF_{0}'.format(util.guid())
        func_dict = {
            'input_type': in_values,
            'output_type': out_value,
        }
        UdfOp = type(class_name, (_ops.ValueOp,), func_dict)
        return UdfOp

    def get_name(self):
        return self.name


class UDFCreator(UDFCreatorParent):

    def __init__(self, hdfs_file, input_type, output_type,
                 so_symbol, name=None):
        self.so_symbol = so_symbol
        UDFCreatorParent.__init__(self, hdfs_file, input_type,
                                  output_type, name=name)


class UDACreator(UDFCreatorParent):

    def __init__(self, hdfs_file, input_type, output_type, init_fn,
                 update_fn, merge_fn, finalize_fn, name=None):
        self.init_fn = init_fn
        self.update_fn = update_fn
        self.merge_fn = merge_fn
        self.finalize_fn = finalize_fn
        UDFCreatorParent.__init__(self, hdfs_file, input_type,
                                  output_type, name=name)


def _validate_impala_type(t):
    if t in _impala_to_ibis_type:
        return t
    elif _dt._DECIMAL_RE.match(t):
        return t
    raise IbisTypeError("Not a valid Impala type for UDFs")


def _operation_type_conversion(inputs, output):
    in_type = [validate_type(x) for x in inputs]
    in_values = [rules.value_typed_as(_convert_types(x)) for x in in_type]
    out_type = validate_type(output)
    out_value = rules.shape_like_flatargs(out_type)
    return (in_values, out_value)


def wrap_uda(hdfs_file, inputs, output, init_fn, update_fn,
             merge_fn, finalize_fn, name=None):
    """
    Creates and returns a container object that can be used to issue a
    create_uda() statement and register the UDA within Ibis

    Parameters
    ----------
    hdfs_file : .so file that contains the relevant UDA
    inputs : list of strings denoting Ibis datatypes
    output : string denoting an Ibis datatype
    init_fn : string, C++ function name for the initialization function
    update_fn : string, C++ function name for the update function
    merge_fn : string, C++ function name for the merge function
    finalize_fn : string, C++ function name for the finalize function
    name : string (optional). Used internally to track function

    Returns
    -------
    container : UDA object
    """
    return UDACreator(hdfs_file, inputs, output, init_fn,
                      update_fn, merge_fn, finalize_fn,
                      name=name)


def wrap_udf(hdfs_file, inputs, output, so_symbol, name=None):
    """
    Creates and returns a container object that can be used to issue a
    create_udf() statement and register the UDF within Ibis

    Parameters
    ----------
    hdfs_file : .so file that contains the relevant UDF
    inputs : list of strings denoting Ibis datatypes
    output : string denoting an Ibis datatype
    so_symbol : string, C++ function name for the relevant UDF
    name : string (optional). Used internally to track function

    Returns
    -------
    container : UDF object
    """
    return UDFCreator(hdfs_file, inputs, output, so_symbol, name=name)


def scalar_function(inputs, output, name=None):
    """
    Creates and returns an operator class that can be passed to
    add_operation()

    Parameters
    ----------
    inputs : list of strings denoting Ibis datatypes
    output : string denoting an Ibis datatype
    name : string (optional). Used internally to track function

    Returns
    -------
    op : operator class to use in constructing function
    """
    (in_values, out_value) = _operation_type_conversion(inputs, output)
    class_name = name
    if not name:
        class_name = 'UDF_{0}'.format(util.guid())

    func_dict = {
        'input_type': in_values,
        'output_type': out_value,
    }
    UdfOp = type(class_name, (_ops.ValueOp,), func_dict)
    return UdfOp


def add_operation(op, func_name, db):
    """
    Registers the given operation within the Ibis SQL translation toolchain

    Parameters
    ----------
    op : operator class
    func_name : string, name used in issuing statements to the SQL engine
    db : string, database in which the relevant function is registered
    """
    full_name = '{0}.{1}'.format(db, func_name)
    arity = len(op.input_type.types)
    _expr._operation_registry[op] = _expr._fixed_arity_call(full_name, arity)


def _impala_type_to_ibis(tval):
    if tval in _impala_to_ibis_type:
        return _impala_to_ibis_type[tval]
    result = _dt._parse_decimal(tval)
    if result:
        return repr(result)
    raise IbisTypeError('Not a valid Impala type')


def _ibis_string_to_impala(tval):
    if tval in _expr._sql_type_names:
        return _expr._sql_type_names[tval]
    result = _dt._parse_decimal(tval)
    if result:
        return repr(result)


def _convert_types(t):
    name = t.name()
    return _conversion_types[name]


_conversion_types = {
    'boolean': (ir.BooleanValue,),
    'int8': (ir.Int8Value,),
    'int16': (ir.Int8Value, ir.Int16Value),
    'int32': (ir.Int8Value, ir.Int16Value, ir.Int32Value),
    'int64': (ir.Int8Value, ir.Int16Value, ir.Int32Value, ir.Int64Value),
    'float': (ir.FloatValue, ir.DoubleValue),
    'double': (ir.FloatValue, ir.DoubleValue),
    'string': (ir.StringValue,),
    'timestamp': (ir.TimestampValue,),
    'decimal': (ir.DecimalValue, ir.FloatValue, ir.DoubleValue)
}


_impala_to_ibis_type = {
    'boolean': 'boolean',
    'tinyint': 'int8',
    'smallint': 'int16',
    'int': 'int32',
    'bigint': 'int64',
    'float': 'float',
    'double': 'double',
    'string': 'string',
    'timestamp': 'timestamp',
    'decimal': 'decimal'
}