Adding new backend for MapD #1419
xmnlab
added some commits
Apr 5, 2018
cpcloud
requested changes
Apr 14, 2018
Thanks @xmnlab! This is solid progress. Let's do a few more review cycles before we merge this in and try to clean up a bit of the duplication. Overall, though, this is pretty close.
| @@ -388,6 +389,8 @@ def row_number(): | ||
| e = ops.E().to_expr() | ||
| +pi = ops.Pi().to_expr().name('pi') |
cpcloud
Apr 14, 2018
Member
I would leave this unnamed for now. No reason to make this choice for users.
| +acos = _unary_op('acos', ops.Acos) | ||
| +asin = _unary_op('asin', ops.Asin) | ||
| +atan = _unary_op('atan', ops.Atan) | ||
| +atan2 = _generic_op('atan2', ops.Atan2) |
cpcloud
Apr 14, 2018
Member
This is a binary operation, right? There should be something like a _binary_op function lying around here.
| @@ -516,6 +521,53 @@ class Log10(Logarithm): | ||
| """Logarithm base 10""" | ||
| +# TRIGONOMETRIC OPERATIONS | ||
| + | ||
| +class TrigonometryUnary(UnaryOp): |
cpcloud
Apr 14, 2018
Member
Can you change the name from TrigonometryUnary to TrigonometricUnary, and do the same for TrigonometryBinary?
| @@ -2183,6 +2235,14 @@ def output_type(self): | ||
| return partial(ir.FloatingScalar, dtype=dt.float64) | ||
| +class Pi(Constant): | ||
| + """ |
| + elif GPUDataFrame is not None and isinstance( | ||
| + self.cursor, GPUDataFrame | ||
| + ): | ||
| + result = self.cursor |
cpcloud
Apr 14, 2018
Member
Since you're executing the same code (assigning self.cursor to result) in the case that the cursor is a pandas DataFrame or a GPUDataFrame, can you remove the last two elifs? Is there a case where self.cursor is not None and it's not either a pandas DataFrame or a GPUDataFrame?
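One way to read the suggestion above, as a minimal sketch only. The names `resolve_result`, `FRAME_TYPES`, and `fetch` are hypothetical and do not come from the PR, and plain `dict` and `list` stand in for the pandas DataFrame and GPUDataFrame types so the snippet runs without either library.

```python
# Stand-ins for pandas.DataFrame and GPUDataFrame (assumption: the real code
# would list whichever frame types are importable).
FRAME_TYPES = (dict, list)


def resolve_result(cursor, fetch):
    """Return the query result for ``cursor``.

    ``fetch`` is a hypothetical callable that converts a raw database
    cursor into a frame; it is only used when ``cursor`` is not a frame.
    """
    if cursor is None:
        return None
    # A single isinstance check covers every frame type, replacing the
    # separate elif branches that each assigned the cursor to the result.
    if isinstance(cursor, FRAME_TYPES):
        return cursor
    return fetch(cursor)
```

Whether the final fallback branch is needed at all depends on the answer to the question in the comment: if the cursor can only ever be None or a frame, it can be dropped.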
| + # compile the argument | ||
| + compiled_arg = translator.translate(arg) | ||
| + | ||
| + return 'CHAR_LENGTH(%s)' % compiled_arg |
| +import ibis.expr.datatypes as dt | ||
| + | ||
| + | ||
| +def test_timestamp_accepts_date_literals(alltypes): |
| @@ -0,0 +1,5 @@ | ||
| +""" | ||
| +User Defined Function |
| + assert result == expected | ||
| + | ||
| +''' | ||
| +def test_simple_aggregate_execute(alltypes, df): |
cpcloud
Apr 14, 2018
Member
Many of these tests are BigQuery specific. Can you remove them from here?
The preferred alternative is to add appropriate tests in ibis/tests/all/test_*.py. To do that, you'll also need to add a MapD class in ibis/tests/all/backends.py. There are many examples in that file to get you started.
| @@ -0,0 +1,798 @@ | ||
| +from six import StringIO |
cpcloud
Apr 14, 2018
Member
It looks like a lot of the functions and objects in here are duplicated from either the impala or bigquery backends. Can you see if you can reuse some of their functions so we have only what's needed for MapD?
xmnlab
added some commits
Apr 14, 2018
xmnlab
Apr 17, 2018
Contributor
When I try to use a mean after a group by, the backend returns this compiled string:
table.group_by('origin_name').dest_lat.mean().compile()
SELECT "origin_name", avg("dest_lat") AS mean(dest_lat)
FROM mapd.flights_2008_10k
GROUP BY origin_name
If I try to give the expression a name, the backend raises an error:
>>> table.group_by('origin_name').dest_lat.mean().name('new_name').compile()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-38-25574603f8fa> in <module>()
1 # TODO: resolve aggregation name
----> 2 table.group_by('origin_name').dest_lat.mean().name('tst').compile()
/media/xmn/29bd0df7-9ccb-4dd1-93a0-e47e2b4f2fc2/dev/quansight/ibis/ibis/expr/types.py in __getattr__(self, key)
441
442 if key not in schema:
--> 443 raise AttributeError(key)
444
445 try:
AttributeError: name
How can I fix that? The default name breaks on my backend.
@cpcloud do you have any idea?
xmnlab
Apr 17, 2018
Contributor
We have a special case of a data type that uses the ENCODING keyword. Example:
CREATE TABLE IF NOT EXISTS tweets (
tweet_id BIGINT NOT NULL,
tweet_time TIMESTAMP NOT NULL ENCODING FIXED(32),
lat REAL,
lon REAL,
sender_id BIGINT NOT NULL,
sender_name TEXT NOT NULL ENCODING DICT,
location TEXT ENCODING DICT,
source TEXT ENCODING DICT,
reply_to_user_id BIGINT,
reply_to_tweet_id BIGINT,
lang TEXT ENCODING DICT,
followers INT,
followees INT,
tweet_count INT,
join_time TIMESTAMP ENCODING FIXED(32),
tweet_text TEXT,
state TEXT ENCODING DICT,
county TEXT ENCODING DICT,
place_name TEXT,
state_abbr TEXT ENCODING DICT,
county_state TEXT ENCODING DICT,
origin TEXT ENCODING DICT);
What is the best approach to handle this?
Refs:
https://www.mapd.com/docs/latest/mapd-core-guide/fixed-encoding/
https://www.mapd.com/docs/latest/mapd-core-guide/tables/
xmnlab
Apr 17, 2018
Contributor
What is the best way to rewrite ibis.year()?
For our backend, the returned value should be quoted:
SELECT NOW() - INTERVAL '1' YEAR AS tmp
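The quoting requirement described above can be illustrated with a small, self-contained sketch. `format_interval` and the `UNIT_NAMES` mapping are hypothetical names invented for this example; they are not ibis or MapD APIs, and the unit list is only illustrative.

```python
# Hypothetical mapping from short unit codes to SQL unit keywords
# (assumption: only a few units are shown, for illustration).
UNIT_NAMES = {"Y": "YEAR", "M": "MONTH", "D": "DAY"}


def format_interval(value, unit):
    """Render an interval literal with the numeric value in single quotes."""
    if unit not in UNIT_NAMES:
        raise ValueError("unsupported interval unit: {!r}".format(unit))
    # int() rejects non-numeric input before it reaches the SQL string.
    return "INTERVAL '{}' {}".format(int(value), UNIT_NAMES[unit])


# Reproduces the statement quoted in the comment.
query = "SELECT NOW() - {} AS tmp".format(format_interval(1, "Y"))
```

How such a formatter would be wired into the backend's interval translation is left open here, since the thread does not show that code.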
xmnlab
added some commits
Apr 18, 2018
xmnlab
added some commits
Jun 5, 2018
Ok I'm reviewing this now.
| - pymysql | ||
| + - pytables |
xmnlab
Jun 6, 2018
Contributor
I just copied from 3.6.yml to have the same environment ... should I remove that?
| @@ -29,6 +31,7 @@ dependencies: | ||
| - ruamel.yaml | ||
| - six | ||
| - sqlalchemy>=1.0.0,<1.1.15 | ||
| + - thrift |
xmnlab
Jun 6, 2018
Contributor
I can try to remove that .. but I think pymapd was breaking without thrift
| @@ -22,6 +22,7 @@ dependencies: | ||
| - plumbum | ||
| - psycopg2 | ||
| - pyarrow>=0.6.0 | ||
| + - pymapd |
| @@ -71,6 +72,12 @@ | ||
| # pip install ibis-framework[bigquery] | ||
| import ibis.bigquery.api as bigquery | ||
| +with suppress(ImportError): | ||
| + # pip install ibis-framework[mapd] | ||
| + if sys.version_info[0] < 3: |
| +with suppress(ImportError): | ||
| + # pip install ibis-framework[mapd] | ||
| + if sys.version_info[0] < 3: | ||
| + raise ImportError('ibis.mapd is not allowed it for Python 2.') |
cpcloud
Jun 6, 2018
Member
Typo here; it should read "ibis.mapd is not allowed for Python 2" or "The mapd backend is not supported under Python 2".
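For reference, a minimal sketch of the guarded-import pattern from the hunk with the second suggested wording applied. `check_mapd_supported` is a hypothetical helper introduced only so the version check can be exercised on its own, and `contextlib.suppress` is used here on the assumption that the snippet runs on Python 3.

```python
import sys
from contextlib import suppress


def check_mapd_supported(version_info=sys.version_info):
    """Raise ImportError when running under Python 2."""
    if version_info[0] < 3:
        raise ImportError('The mapd backend is not supported under Python 2')


with suppress(ImportError):
    # pip install ibis-framework[mapd]
    check_mapd_supported()
    # The backend import itself would follow here; it is omitted so the
    # sketch runs without ibis installed.
```

Because the ImportError is raised inside the `suppress` block, the backend is simply left unavailable on Python 2 rather than breaking the import of the package.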
xmnlab
added some commits
Jun 7, 2018
| + dtype = self.left.type().largest | ||
| + else: | ||
| + dtype = dt.float64 | ||
| + return dtype.scalar_type() |
xmnlab
Jun 7, 2018
Contributor
Oh I see. Sorry, my mistake. I was a little bit confused. I will change that.
| + how = Arg(rlz.isin({'sample', 'pop'}), default=None) | ||
| + where = Arg(rlz.boolean, default=None) | ||
| + | ||
| + def output_type(self): |
| + | ||
| +class Distance(ValueOp): | ||
| + """ | ||
| + Calculates distance in meters between two WGS-84 positions. |
xmnlab
Jun 7, 2018
Contributor
Yes, we can do that. We just cannot test it because mapd doesn't have a Euclidean operation.
I am just not sure whether a common Euclidean distance function works with lat/lon parameters, so maybe it would be better to remove this function and create an issue to follow up on this discussion.
xmnlab
added some commits
Jun 7, 2018
xmnlab
Jun 7, 2018
Contributor
@cpcloud thanks a lot for reviewing this PR.
Here is a summary of the main fixes:
- ci/requirements-dev-3.5.yml: I just rolled it back and it works fine.
- Degrees and Radians: I changed the input to numeric and the output to float64. I also added tests for them to the mapd tests.
- Correlation and Covariance: Sorry, I misunderstood that, thanks for the patience :) ... I changed the output to dt.float64.scalar_type(). I also added tests for these operations to the mapd tests.
- Distance: I am not sure whether common Euclidean distance functions work with lat/lon, so I removed it from this PR and I will create an issue now to follow up on this.
If these points are OK with you, I think it is ready for a new review. I just changed the left and right parameters of correlation to numeric columns.
Again, thank you so much for your attention.
There is a conflict now .. I will rebase now.
xmnlab
added some commits
Jun 13, 2018
cpcloud
requested changes
Jun 18, 2018
@xmnlab After you address this round of comments I will approve and merge! Thanks for the effort!!
| + self, name, password=None, is_super=None, insert_access=None | ||
| + ): | ||
| + """ | ||
| + Create a new MapD database |
| + statement = ddl.DropDatabase(name) | ||
| + self._execute(statement) | ||
| + | ||
| + def create_user(self, name, password, is_super=False): |
cpcloud
Jun 18, 2018
Member
This looks like a new API. @xmnlab can you create a follow up issue to add this API to the clients that have support for such functionality?
| + ) | ||
| + self._execute(statement) | ||
| + | ||
| + def drop_user(self, name): |
| @@ -41,9 +41,11 @@ def test_timestamp_extract(backend, alltypes, df, attr): | ||
| @pytest.mark.parametrize('unit', [ | ||
| 'Y', 'M', 'D', | ||
| param('W', marks=pytest.mark.xfail), | ||
| - 'h', 'm', 's', 'ms', 'us', 'ns' | ||
| + 'h', 'm', 's', 'ms', 'us', | ||
| + param('ns', marks=pytest.mark.xfail) |
xmnlab
Jun 18, 2018
Contributor
Yes, for some reason when it tries to test the ns unit, it raises an error that breaks pytest ...
I tried to use a pytest mark for ns and it didn't work, so I needed to skip that for mapd.
For the other tests here, I could remove skipif_backend and just put xfail on some units, and that works fine.
cpcloud
Jun 18, 2018
Member
Is it necessary to have both xfail on 'ns' and the skipif_backend('MapD') decorator? Shouldn't it be enough to just skip this on MapD altogether?
| + pytest.param(Impala, marks=pytest.mark.impala) | ||
| +] | ||
| + | ||
| +if sys.version_info.major == 3: |
| + ), | ||
| + param( | ||
| + lambda t: t.double_col.cov(t.float_col), | ||
| + 91.67005567565313, |
cpcloud
Jun 18, 2018
Member
Ultimately we will want to change these to use a pandas call or numpy call, so that we don't have to depend on hard coded values.
This is fine for now though.
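The idea of deriving the expected value instead of hard-coding it can be sketched without any dependencies. `sample_covariance` below is a hypothetical pure-Python stand-in for the pandas or numpy call mentioned in the comment, and it assumes the sample (n - 1) normalisation; the data are made up for illustration and are unrelated to the test tables.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Illustrative data only: the expected value is computed, not hard-coded.
double_col = [1.0, 2.0, 3.0, 4.0]
float_col = [2.0, 4.0, 6.0, 8.0]
expected = sample_covariance(double_col, float_col)
```

In a backend test, `expected` would then be compared with the value returned by the backend within a floating-point tolerance.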
| @@ -0,0 +1,422 @@ | ||
| +from ibis.sql.compiler import DDL, DML |
cpcloud
Jun 18, 2018
Member
Do you have tests for the classes in this file? If not, please add them in a follow-up PR.
xmnlab
Jun 18, 2018
Contributor
No ... I had some problems adding DDL tests because they were breaking the mapd database container. I will create an issue for that now.
| @@ -1752,6 +1814,34 @@ def _string_like(self, patterns): | ||
| ) | ||
| +def _string_ilike(self, patterns): |
xmnlab
added some commits
Jun 18, 2018
xmnlab
added some commits
Jun 18, 2018
Merging on green!
@cpcloud thank you so much for your attention and support!
nice! bombs away!
xmnlab commented Apr 13, 2018
Also resolves #1418 and resolves #893