Adding new backend for MapD #1419

Closed
wants to merge 161 commits into
base: master
from

4 participants
@xmnlab
Contributor

xmnlab commented Apr 13, 2018

Also resolves #1418 and resolves #893

xmnlab added some commits Apr 5, 2018

Merge pull request #1 from xmnlab/master
Added mapd backend initial files.
Merge pull request #3 from xmnlab/master
Improved mapd client and compiler; Added initial documentation.
Merge pull request #8 from xmnlab/master
README updated; Initial changes to use execute method.
Merge pull request #9 from xmnlab/master
Improving ibis.mapd client and compiler
Merge pull request #11 from xmnlab/master
Added Math, trigonometric and geometric operations
Merge pull request #15 from xmnlab/master
Resolves ibis #1418 and resolves #893
@cpcloud

Thanks @xmnlab! This is solid progress. Let's do a few more review cycles before we merge this in and try to clean up a bit of the duplication. Overall, though, this is pretty close.

@@ -388,6 +389,8 @@ def row_number():
e = ops.E().to_expr()
pi = ops.Pi().to_expr().name('pi')

@cpcloud

cpcloud Apr 14, 2018

Member

I would leave this unnamed for now. No reason to make this choice for users.

acos = _unary_op('acos', ops.Acos)
asin = _unary_op('asin', ops.Asin)
atan = _unary_op('atan', ops.Atan)
atan2 = _generic_op('atan2', ops.Atan2)

@cpcloud

cpcloud Apr 14, 2018

Member

This is a binary operation right? There should be something like _binary_op function lying around here.

@xmnlab

xmnlab Apr 16, 2018

Contributor

you're right. I think this is the right function: _binop_expr

@@ -516,6 +521,53 @@ class Log10(Logarithm):
"""Logarithm base 10"""
# TRIGONOMETRIC OPERATIONS
class TrigonometryUnary(UnaryOp):

@cpcloud

cpcloud Apr 14, 2018

Member

Can you change the name from TrigonometryUnary to TrigonometricUnary, and do the same for TrigonometryBinary.

@@ -2183,6 +2235,14 @@ def output_type(self):
return partial(ir.FloatingScalar, dtype=dt.float64)
class Pi(Constant):
"""

@cpcloud

cpcloud Apr 14, 2018

Member

I guess you could say something here like "The constant pi".

elif GPUDataFrame is not None and isinstance(
    self.cursor, GPUDataFrame
):
    result = self.cursor

@cpcloud

cpcloud Apr 14, 2018

Member

Since you're executing the same code (assigning self.cursor to result) in the case that the cursor is a pandas DataFrame or a GPUDataFrame, can you remove the last two elifs? Is there a case where self.cursor is not None and it's not either a pandas DataFrame or a GPUDataFrame?

@xmnlab

xmnlab Apr 16, 2018

Contributor

you're right! thanks!
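
For illustration, a minimal sketch of the simplification suggested above. The class name `CursorWrapper`, the method name `to_result`, and the handling of the `None` case are assumptions made for this example and are not taken from the PR; only the idea of merging the two pass-through branches comes from the review comment.

```python
class CursorWrapper:
    """Hypothetical wrapper; only the branching logic matters here."""

    def __init__(self, cursor):
        self.cursor = cursor

    def to_result(self):
        if self.cursor is None:
            # Assumed placeholder for whatever the real code does
            # when there is no cursor.
            result = None
        else:
            # A single branch covers both the pandas DataFrame and the
            # GPUDataFrame case, since both just pass the cursor through.
            result = self.cursor
        return result
```

With a single `else`, no type check on `self.cursor` is needed as long as those two types are the only non-`None` possibilities, which is the question raised in the review.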

# compile the argument
compiled_arg = translator.translate(arg)
return 'CHAR_LENGTH(%s)' % compiled_arg

@cpcloud

cpcloud Apr 14, 2018

Member

Format strings.
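
A minimal sketch of what this suggestion presumably means: build the SQL fragment with `str.format` instead of `%` interpolation. The function name `string_length_sql` is hypothetical; in the PR the argument comes from `translator.translate(arg)`.

```python
def string_length_sql(compiled_arg):
    """Render CHAR_LENGTH for an already-compiled SQL argument."""
    # str.format replaces the '%s' interpolation used in the diff
    return 'CHAR_LENGTH({})'.format(compiled_arg)


print(string_length_sql('string_col'))
```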

import ibis.expr.datatypes as dt
def test_timestamp_accepts_date_literals(alltypes):

@cpcloud

cpcloud Apr 14, 2018

Member

This is a BigQuery specific test. Do you really need it?

@@ -0,0 +1,5 @@
"""
User Defined Function

@cpcloud

cpcloud Apr 14, 2018

Member

I wouldn't add this module unless there's support for this in MapD.

assert result == expected
'''
def test_simple_aggregate_execute(alltypes, df):

@cpcloud

cpcloud Apr 14, 2018

Member

Many of these tests are BigQuery specific. Can you remove them from here?

The preferred alternative is to add appropriate tests in ibis/tests/all/test_*.py. To do that, you'll also need to add a MapD class in ibis/tests/all/backends.py. There are many examples in that file to get you started.

@@ -0,0 +1,798 @@
from six import StringIO

@cpcloud

cpcloud Apr 14, 2018

Member

It looks like a lot of the functions and objects in here are duplicated from either the impala or bigquery backends. Can you see if you can reuse some of their functions so we have only what's needed for MapD?

@xmnlab

xmnlab Apr 17, 2018

Contributor

Ok. I will take a look :) thanks!

@@ -22,6 +22,7 @@ dependencies:
- plumbum
- psycopg2
- pyarrow>=0.6.0
- pymapd

@cpcloud

cpcloud Jun 6, 2018

Member

does this require a version constraint?

@xmnlab

xmnlab Jun 6, 2018

Contributor

you're right! it is better to pin the version here. thanks!

@@ -71,6 +72,12 @@
    # pip install ibis-framework[bigquery]
    import ibis.bigquery.api as bigquery
with suppress(ImportError):
    # pip install ibis-framework[mapd]
    if sys.version_info[0] < 3:

@cpcloud

cpcloud Jun 6, 2018

Member

Use sys.version_info.major here

@xmnlab

xmnlab Jun 6, 2018

Contributor

ok! thanks

with suppress(ImportError):
    # pip install ibis-framework[mapd]
    if sys.version_info[0] < 3:
        raise ImportError('ibis.mapd is not allowed it for Python 2.')

@cpcloud

cpcloud Jun 6, 2018

Member

typo here, should read "ibis.mapd is not allowed for Python 2" or "The mapd backend is not supported under Python 2"

@xmnlab

xmnlab Jun 6, 2018

Contributor

thanks!
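
Putting the two review comments on this guard together, a sketch of how it might look with `sys.version_info.major` and the second suggested message. The flag `mapd_available` is a hypothetical stand-in for the real backend import, which is left out so the snippet runs without ibis installed.

```python
import sys
from contextlib import suppress

mapd_available = False  # hypothetical flag standing in for the backend import

with suppress(ImportError):
    # pip install ibis-framework[mapd]
    if sys.version_info.major < 3:
        raise ImportError('The mapd backend is not supported under Python 2')
    # The real module would import the mapd API at this point.
    mapd_available = True
```

Because the `ImportError` is raised inside `suppress(ImportError)`, the backend is skipped silently when the version check fails.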

xmnlab added some commits Jun 7, 2018

Merge pull request #60 from xmnlab/master
Corrections from the revision
@xmnlab

Thanks @cpcloud !

I've pushed new changes.

    dtype = self.left.type().largest
else:
    dtype = dt.float64
return dtype.scalar_type()

@xmnlab

xmnlab Jun 7, 2018

Contributor

Oh I see. Sorry, my mistake. I was a little bit confused. I will change that.

how = Arg(rlz.isin({'sample', 'pop'}), default=None)
where = Arg(rlz.boolean, default=None)
def output_type(self):

@xmnlab

xmnlab Jun 7, 2018

Contributor

Ok! thanks!

class Distance(ValueOp):
"""
Calculates distance in meters between two WGS-84 positions.

@xmnlab

xmnlab Jun 7, 2018

Contributor

Yes, we can do that. We just cannot test it because mapd doesn't have a Euclidean operation.
I am just not sure whether a common Euclidean distance function works with lat/lon parameters, so maybe it would be better to remove this function and create an issue to follow up on this discussion.
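
To illustrate the concern about lat/lon inputs, here is a small sketch comparing a plain Euclidean distance on degree values with a great-circle (haversine) distance in meters. This is not the PR's `Distance` implementation; the function names and the spherical Earth radius are assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def euclidean(x1, y1, x2, y2):
    """Plain Euclidean distance; the result is in the units of the inputs."""
    return math.hypot(x2 - x1, y2 - y1)


# One degree of longitude at the equator and at 60 degrees north
at_equator = haversine_m(0.0, 0.0, 0.0, 1.0)
at_60_north = haversine_m(60.0, 0.0, 60.0, 1.0)
naive = euclidean(0.0, 0.0, 1.0, 0.0)  # 1.0 "degree" in both cases
```

The Euclidean result is the same at every latitude and is expressed in degrees, while the great-circle distance for the same one-degree step is roughly 111 km at the equator and about half that at 60 degrees north.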

xmnlab added some commits Jun 7, 2018

Merge pull request #61 from xmnlab/master
Changed left and right params to column numeric type
@xmnlab

Contributor

xmnlab commented Jun 7, 2018

@cpcloud thanks a lot for reviewing this PR.

Here is a summary of the main fixes:

  1. ci/requirements-dev-3.5.yml: I rolled back my change and it works fine.
  2. Degrees and Radians: I changed the input to numeric and the output to float64. I also added tests for that to the mapd tests.
  3. Correlation and Covariance: Sorry, I misunderstood that, thanks for your patience :) I changed the output to dt.float64.scalar_type() and also added tests for these operations to the mapd tests.
  4. Distance: I am not sure whether common Euclidean distance functions work with lat/lon, so I removed it from this PR and I will create an issue now to follow up on this.

If these points are OK with you, I think it is ready for another review. I also changed the left and right parameters of correlation to the numeric column type.

Again, thank you so much for your attention.

@xmnlab

Contributor

xmnlab commented Jun 13, 2018

hi @cpcloud @kszucs

any update about this PR?

thanks a lot!

@xmnlab

Contributor

xmnlab commented Jun 13, 2018

there is a conflict now .. I will rebase here now.

xmnlab added some commits Jun 13, 2018

Merge pull request #62 from xmnlab/master
Merged from master
@cpcloud

@xmnlab After you address this round of comments I will approve and merge! Thanks for the effort!!

self, name, password=None, is_super=None, insert_access=None
):
"""
Create a new MapD database

@cpcloud

cpcloud Jun 18, 2018

Member

This docstring looks wrong.

@xmnlab

xmnlab Jun 18, 2018

Contributor

you're right. thanks!

statement = ddl.DropDatabase(name)
self._execute(statement)
def create_user(self, name, password, is_super=False):

@cpcloud

cpcloud Jun 18, 2018

Member

This looks like a new API. @xmnlab can you create a follow up issue to add this API to the clients that have support for such functionality?

@xmnlab

xmnlab Jun 18, 2018

Contributor

Ok I will do it.

)
self._execute(statement)
def drop_user(self, name):

@cpcloud

cpcloud Jun 18, 2018

Member

Same as above.

@xmnlab

xmnlab Jun 18, 2018

Contributor

ok I will do it! thanks!

@@ -41,9 +41,11 @@ def test_timestamp_extract(backend, alltypes, df, attr):
 @pytest.mark.parametrize('unit', [
     'Y', 'M', 'D',
     param('W', marks=pytest.mark.xfail),
-    'h', 'm', 's', 'ms', 'us', 'ns'
+    'h', 'm', 's', 'ms', 'us',
+    param('ns', marks=pytest.mark.xfail)

@cpcloud

cpcloud Jun 18, 2018

Member

Does this pass now that you added the skipif_backend?

@xmnlab

xmnlab Jun 18, 2018

Contributor

Yes. For some reason, testing the ns unit raises an error that breaks pytest.
I tried to use a pytest mark for ns and it didn't work, so I needed to skip it for mapd.
For the other tests here, I could remove skipif_backend and just add xfail for some units, and that works fine.

@cpcloud

cpcloud Jun 18, 2018

Member

Is it necessary to have both xfail on 'ns' and the skipif_backend('MapD') decorator? Shouldn't it be enough to just skip this on MapD altogether?

@xmnlab

xmnlab Jun 18, 2018

Contributor

you're right! sorry, I will remove this now.
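
A sketch of the parametrization once the redundant `xfail` on `'ns'` is dropped, assuming the whole test is skipped on MapD by the `skipif_backend` decorator mentioned above (an ibis test helper that is not reproduced here). The list name `UNITS` and the placeholder test body are hypothetical.

```python
import pytest
from pytest import param

# 'W' keeps its xfail mark; 'ns' is a plain value again because the
# MapD-specific failure is handled by skipping the test on that backend.
UNITS = [
    'Y', 'M', 'D',
    param('W', marks=pytest.mark.xfail),
    'h', 'm', 's', 'ms', 'us', 'ns',
]


@pytest.mark.parametrize('unit', UNITS)
def test_unit_is_string(unit):
    # Placeholder body; the real test truncates timestamps per unit.
    assert isinstance(unit, str)
```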

pytest.param(Impala, marks=pytest.mark.impala)
]
if sys.version_info.major == 3:

@cpcloud

cpcloud Jun 18, 2018

Member

should be > 2

@xmnlab

xmnlab Jun 18, 2018

Contributor

ok thanks!

),
param(
lambda t: t.double_col.cov(t.float_col),
91.67005567565313,

@cpcloud

cpcloud Jun 18, 2018

Member

Ultimately we will want to change these to use a pandas call or numpy call, so that we don't have to depend on hard coded values.

This is fine for now though.

@xmnlab

xmnlab Jun 18, 2018

Contributor

Ok thanks!
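
One possible shape for the follow-up described above, computing the expected covariance with NumPy instead of hard-coding it. The helper name `expected_cov` and the sample data are made up for this sketch; `ddof=1` gives the sample covariance, and whether the backend returns the sample or population value is an assumption that would need checking.

```python
import numpy as np


def expected_cov(left, right, ddof=1):
    """Covariance of two equal-length sequences (ddof=1: sample, ddof=0: population)."""
    # np.cov returns the 2x2 covariance matrix; the off-diagonal entry
    # is the covariance between the two inputs.
    return np.cov(left, right, ddof=ddof)[0, 1]


double_col = [1.0, 2.0, 3.0, 4.0]
float_col = [2.0, 4.0, 6.0, 8.0]
expected = expected_cov(double_col, float_col)
```

In a test, `expected` would then replace a literal such as `91.67005567565313`, computed from the same data the backend sees.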

@@ -0,0 +1,422 @@
from ibis.sql.compiler import DDL, DML

@cpcloud

cpcloud Jun 18, 2018

Member

Do you have tests for the classes in this file? If not, please add them in a follow-up PR.

@xmnlab

xmnlab Jun 18, 2018

Contributor

No ... I had some problems adding DDL tests because it was breaking the mapd database container. I will create an issue for that now.

@@ -1752,6 +1814,34 @@ def _string_like(self, patterns):
)
def _string_ilike(self, patterns):

@cpcloud

cpcloud Jun 18, 2018

Member

Does this API have a test in this PR?

@xmnlab

changes done. I will commit my changes.

xmnlab added some commits Jun 18, 2018

Merge pull request #63 from xmnlab/master
Changes from the review
Merge pull request #64 from xmnlab/master
Removed xfail mark for ns unit
@cpcloud

Member

cpcloud commented Jun 18, 2018

Merging on green!

@xmnlab

Contributor

xmnlab commented Jun 18, 2018

@cpcloud thank you so much for your attention and support!

@cpcloud

Member

cpcloud commented Jun 18, 2018

nice! bombs away!

@cpcloud cpcloud closed this in 037db67 Jun 18, 2018

@xmnlab xmnlab changed the title from [WiP] Adding new backend for MapD to Adding new backend for MapD Jun 19, 2018

This was referenced Jun 22, 2018
