[MapD] Add geo spatial datatype support #1666

Closed
wants to merge 8 commits into from

xmnlab
Contributor

@xmnlab xmnlab commented Oct 25, 2018

Partially resolves #1665

  • Added support for geospatial data types

Support for MapD geospatial functions should be addressed in another PR.

@cpcloud cpcloud added this to the 1.0.0 milestone Oct 28, 2018
@cpcloud cpcloud added the mapd label Oct 28, 2018
@xmnlab
Contributor Author

xmnlab commented Oct 29, 2018

@cpcloud @kszucs

I am moving the geospatial types to their own file. Now I have a question.
Inside geospatial.py I will import datatypes to create the geospatial data types, so where should I import/add the spatial types? My first guess is to add them inside datatypes.py, but that would create a circular import, so I think it is not desired.

What would be the workflow for this implementation?

@cpcloud cpcloud self-assigned this Oct 30, 2018
@xmnlab
Contributor Author

xmnlab commented Nov 5, 2018

@cpcloud @kszucs

I moved the geospatial types to their own file. Now I have a question.
Inside geospatial.py I will import datatypes to create the geospatial data types, so where should I import/add the spatial types? My first guess is to add them inside datatypes.py, but that would create a circular import, so I think it is not desired.

What would be the workflow for this implementation?

@xmnlab
Contributor Author

xmnlab commented Nov 8, 2018

@cpcloud @kszucs we decided to remove the dependency on shapely for now, so I moved the geospatial data types back from geospatial.py to datatypes.py and removed geospatial.py.

From just testing the new data types, they seem to be working, but it seems I am missing something related to inference ... could you provide any help?

Thanks!

@kszucs
Member

kszucs commented Nov 8, 2018

@xmnlab Try removing the infer functions above.

@xmnlab
Contributor Author

xmnlab commented Nov 8, 2018

@kszucs thanks for the feedback!

Without the infer functions, it raises an error:

---------------------------------------------------------------------------
InputTypeError                            Traceback (most recent call last)
<ipython-input-3-26e2613b0aa2> in <module>
      1 point = (0, 0)
----> 2 l_point = ibis.literal(point, type='point')
      3 
      4 print(l_point.compile())
      5 print(type(l_point))

/mnt/sda1/dev/quansight/ibis/ibis/expr/types.py in literal(value, type)
    894         dtype = dt.null
    895     else:
--> 896         dtype = dt.infer(value)
    897 
    898     if type is not None:

/mnt/sda1/storage/miniconda/envs/ibis/lib/python3.6/site-packages/multipledispatch/dispatcher.py in __call__(self, *args, **kwargs)
    276             self._cache[types] = func
    277         try:
--> 278             return func(*args, **kwargs)
    279 
    280         except MDNotImplementedError:

/mnt/sda1/dev/quansight/ibis/ibis/expr/datatypes.py in infer_dtype_default(value)
   1189 @infer.register(object)
   1190 def infer_dtype_default(value):
-> 1191     raise com.InputTypeError(value)
   1192 
   1193 

@xmnlab
Contributor Author

xmnlab commented Nov 8, 2018

@kszucs

I was able to resolve the problem here:

  • added an infer function for tuple -> Array(Primitive())
  • added casts from array to point, line, polygon and multipolygon

Let me know if this is reasonable or if you prefer another approach.
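
The two changes listed above can be illustrated with a minimal sketch. It is not the ibis implementation: it uses the standard library's `functools.singledispatch` in place of `multipledispatch`, represents types as plain strings, and `literal_type` and `InputTypeError` are hypothetical stand-ins for `ibis.literal` and `com.InputTypeError`.

```python
from functools import singledispatch


class InputTypeError(TypeError):
    """Stand-in for com.InputTypeError from the traceback above."""


@singledispatch
def infer(value):
    # Default rule: no type can be inferred for an arbitrary object.
    raise InputTypeError(value)


@infer.register(int)
def _infer_int(value):
    return "int64"


@infer.register(float)
def _infer_float(value):
    return "float64"


@infer.register(tuple)
@infer.register(list)
def _infer_sequence(value):
    # Infer an array type from the first element; empty input is rejected.
    if not value:
        raise InputTypeError(value)
    return "array<{}>".format(infer(value[0]))


def literal_type(value, type=None):
    # Hypothetical helper: infer first, then cast to the requested type.
    inferred = infer(value)
    if type is None:
        return inferred
    numeric_arrays = ("array<int64>", "array<float64>")
    if type == "point" and inferred in numeric_arrays and len(value) == 2:
        return "point"
    raise InputTypeError("cannot cast {} to {}".format(inferred, type))
```

With a tuple rule registered, `literal_type((0, 0), type='point')` no longer falls through to the default rule that raised in the traceback; the value is first inferred as an array and then cast.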

@xmnlab xmnlab changed the title [WIP] [MapD] Add geo spatial data support [MapD] Add geo spatial data support Nov 8, 2018
@kszucs
Member

kszucs commented Nov 8, 2018

That sounds good!

@xmnlab
Contributor Author

xmnlab commented Nov 8, 2018

@kszucs it seems all tests passed except for azure (mysql installation issue) and python27 (sqlite issues)

Do you have any idea how to fix that?

@kszucs
Member

kszucs commented Nov 10, 2018

MySQL testing has been disabled on the master. I'm rebasing it.

@codecov

codecov bot commented Nov 10, 2018

Codecov Report

Merging #1666 into master will decrease coverage by 2.56%.
The diff coverage is 93.9%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #1666      +/-   ##
=========================================
- Coverage   89.97%   87.4%   -2.57%     
=========================================
  Files         186     186              
  Lines       27300   27486     +186     
  Branches     2311    2344      +33     
=========================================
- Hits        24563   24024     -539     
- Misses       2335    3050     +715     
- Partials      402     412      +10
| Impacted Files | Coverage Δ | |
|---|---|---|
| ibis/mapd/client.py | 50.85% <ø> (ø) | ⬆️ |
| ibis/expr/tests/test_decimal.py | 100% <ø> (ø) | ⬆️ |
| ibis/expr/tests/test_value_exprs.py | 99.5% <100%> (+0.01%) | ⬆️ |
| ibis/expr/types.py | 91.77% <100%> (+0.5%) | ⬆️ |
| ibis/expr/tests/test_datatypes.py | 100% <100%> (ø) | ⬆️ |
| ibis/mapd/tests/test_operations.py | 98.11% <100%> (+0.61%) | ⬆️ |
| ibis/expr/api.py | 93.06% <100%> (-0.38%) | ⬇️ |
| ibis/mapd/operations.py | 72.59% <83.33%> (+1.17%) | ⬆️ |
| ibis/expr/datatypes.py | 94.87% <93.47%> (-0.2%) | ⬇️ |
| ibis/bigquery/tests/test_client.py | 25.87% <0%> (-73.55%) | ⬇️ |
| ... and 18 more | | |

kszucs
kszucs previously approved these changes Nov 11, 2018
@xmnlab
Contributor Author

xmnlab commented Nov 11, 2018

Thanks @kszucs !! It seems awesome!

@kszucs kszucs dismissed their stale review November 11, 2018 12:42

Found issues with implicit casting.

@kszucs
Member

kszucs commented Nov 11, 2018

@xmnlab @cpcloud Overall the new datatypes are working; however, we'll need a couple of follow-up PRs.

The current implementation doesn't reflect the hierarchy between the spatial types:

Point = Array[Numeric, 2]
Line = Array[Point]
Polygon = Array[Line]
MultiPolygon = Array[Polygon]

The implicit casting rules need to be more restrictive as well.
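
The hierarchy above can be illustrated with a small validation sketch on plain Python sequences. The helper names (`is_point`, `is_line`, `classify`, and so on) are hypothetical and are not part of ibis; the point is only that each level is an array of the level below, so a value matches at most one spatial type.

```python
def _is_number(value):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _is_sequence_of(value, predicate):
    # Non-empty list/tuple whose items all satisfy the predicate.
    return (
        isinstance(value, (list, tuple))
        and len(value) > 0
        and all(predicate(item) for item in value)
    )


def is_point(value):
    # Point = Array[Numeric, 2]
    return _is_sequence_of(value, _is_number) and len(value) == 2


def is_line(value):
    # Line = Array[Point]
    return _is_sequence_of(value, is_point)


def is_polygon(value):
    # Polygon = Array[Line]
    return _is_sequence_of(value, is_line)


def is_multipolygon(value):
    # MultiPolygon = Array[Polygon]
    return _is_sequence_of(value, is_polygon)


def classify(value):
    # Return the name of the single spatial type the value matches.
    checks = (
        ("point", is_point),
        ("line", is_line),
        ("polygon", is_polygon),
        ("multipolygon", is_multipolygon),
    )
    for name, check in checks:
        if check(value):
            return name
    raise TypeError("not a valid spatial value: {!r}".format(value))
```

A restrictive cast in this sense rejects, for example, a three-element tuple as a point instead of accepting any numeric array.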

__slots__ = ()


class MultiPolygon(GeoSpatial):
Member

@cpcloud cpcloud Nov 12, 2018

Can you document these types? I think people may wonder how these types map to PostgreSQL's versions of these, since they are pretty similar. Also, these types should be flexible enough to support Postgres's versions. I think the exercise of going through the comparison will be very helpful in determining if this is the right set of types to support databases that support them.

Contributor Author

That sounds good. I will also change Line to Linestring and add srid type information.

]
)
def test_literal_cases(value, expected_type):
@pytest.mark.parametrize(['value', 'expected_type'], [
Member

Can you leave this formatting as is?

Contributor Author

OK!

Contributor Author

For some unknown reason I lost this change ... I am fixing it.

multipolygon1 = [polygon1, polygon2]


@pytest.mark.parametrize(['value', 'expected_type'], [
Member

Please leave this formatting.

Contributor Author

OK .. I am doing that. thanks!

@@ -330,9 +330,43 @@ def _cross_join(translator, expr):
    return translator.translate(left.join(right, ibis.literal(True)))


def _format_point_value(value):
    return ' '.join([str(v) for v in value])
Member

You don't need to construct a list when calling str.join.

Contributor Author

thanks!

Member

@xmnlab please remove the list comprehension and use a generator instead like Phillip mentioned.

Contributor Author

Sounds good. Sorry, I lost that for an unknown reason. I am fixing it again. Thanks again!

@@ -602,6 +603,38 @@ def __str__(self) -> str:
)


class GeoSpatial(DataType):
Member

Is there some shared functionality among all child classes of this class? If not, then we should just make the geospatial types subclass from DataType and not tie them together unnecessarily.

Contributor Author

I thought of creating the GeoSpatial data type just to generalize function parameters, for example:

ST_Distance(poly1, ST_GeomFromText('POINT(0 0)'))

where ST_Distance returns the shortest planar distance between geometries.

Maybe it would be better to rename GeoSpatial to Geometry.

What do you think?

Contributor Author

Actually, I will think more about this: PostGIS uses both geometry and geography types ...
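
The argument for a shared base class in this thread can be sketched as follows. The classes here are empty placeholders rather than the ibis definitions, and `check_st_distance_args` is a hypothetical helper: it only shows how one common parent lets a function accept any spatial type without listing each one.

```python
class DataType:
    """Placeholder for the ibis DataType base class."""


class GeoSpatial(DataType):
    """Common parent for all spatial types."""


class Point(GeoSpatial):
    pass


class Polygon(GeoSpatial):
    pass


def check_st_distance_args(left, right):
    # Hypothetical argument check: both operands must be spatial types.
    for arg in (left, right):
        if not isinstance(arg, GeoSpatial):
            raise TypeError(
                "expected a geospatial type, got {}".format(type(arg).__name__)
            )
    # The distance between two geometries is a floating point number.
    return "float64"
```

Without the shared parent, the same check would need an explicit tuple of every spatial class, which has to be updated whenever a new type is added.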

    elif isinstance(expr, ir.PolygonScalar):
        return "POLYGON({0!s})".format(_format_polygon_value(value))
    elif isinstance(expr, ir.MultiPolygonScalar):
        return "MULTIPOLYGON({0!s})".format(_format_multipolygon_value(value))
Member

You don't need the 0!s in the format spec, just write it as {}.

Contributor Author

@xmnlab xmnlab left a comment

Thanks @cpcloud I will work on that.
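
A minimal sketch of WKT literal formatting with plain `{}` placeholders, as suggested in the review. The helper names follow the diff above, but their bodies are not shown in this PR page, so the implementations below (and the `to_wkt` wrapper) are assumptions for illustration only.

```python
def _format_point_value(value):
    # "x y"
    return ' '.join(str(v) for v in value)


def _format_linestring_value(value):
    # "x1 y1, x2 y2, ..."
    return ', '.join(_format_point_value(point) for point in value)


def _format_polygon_value(value):
    # Each ring is wrapped in its own parentheses.
    return ', '.join(
        '({})'.format(_format_linestring_value(ring)) for ring in value
    )


def _format_multipolygon_value(value):
    # Each polygon is wrapped in its own parentheses.
    return ', '.join(
        '({})'.format(_format_polygon_value(polygon)) for polygon in value
    )


_FORMATTERS = {
    'point': _format_point_value,
    'linestring': _format_linestring_value,
    'polygon': _format_polygon_value,
    'multipolygon': _format_multipolygon_value,
}


def to_wkt(kind, value):
    # Hypothetical wrapper: "{}" is enough because format() already
    # converts its arguments with str() by default.
    return '{}({})'.format(kind.upper(), _FORMATTERS[kind](value))
```

For example, `to_wkt('point', (0, 0))` gives `POINT(0 0)`, and a polygon with a single ring gets the doubled parentheses that WKT requires.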

@xmnlab
Contributor Author

xmnlab commented Dec 4, 2018

@cpcloud @kszucs

I was thinking of changing the geospatial literals to accept two new pieces of information, maybe something like (option 1):

literal((1, 2), type='point;4326:geography')

where geotype (geography) and srid (4326) are optional.

Is this pattern OK for geo types? I am trying to stay close to the PostGIS literal, for example:

'SRID=4326;POINT(1 2)'::geography

Or should I use something like the other ibis types, using () or <>? Example (option 2):

literal((1, 2), type='point<geography>(4326)')
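
Option 1 can be made concrete with a small parsing sketch. This is not the grammar that was added to ibis/expr/datatypes.py; `GeoTypeSpec` and `parse_geo_type` are hypothetical names, and the exact rules for which parts may be omitted are an assumption based on the description above.

```python
import re
from typing import NamedTuple, Optional


class GeoTypeSpec(NamedTuple):
    name: str
    srid: Optional[int]
    geotype: Optional[str]


# "<name>[;[<srid>][:<geotype>]]", e.g. "point;4326:geography"
_GEO_SPEC = re.compile(
    r"^(?P<name>point|linestring|polygon|multipolygon)"
    r"(?:;(?P<srid>\d+)?(?::(?P<geotype>geometry|geography))?)?$"
)


def parse_geo_type(spec):
    # Parse a type string into (name, srid, geotype); srid and geotype
    # are None when they are left out.
    match = _GEO_SPEC.match(spec.strip().lower())
    if match is None:
        raise ValueError("invalid geospatial type: {!r}".format(spec))
    srid = match.group("srid")
    return GeoTypeSpec(
        name=match.group("name"),
        srid=int(srid) if srid else None,
        geotype=match.group("geotype"),
    )
```

Under these assumptions, `'point'`, `'point;4326'` and `'point;4326:geography'` all parse, which mirrors the optional parts of the PostGIS literal `'SRID=4326;POINT(1 2)'::geography`.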

@xmnlab
Contributor Author

xmnlab commented Dec 10, 2018

hey @kszucs @cpcloud

any thought about the last comment (#1666 (comment)) ?

@xmnlab xmnlab mentioned this pull request Dec 18, 2018
20 tasks
@kszucs
Member

kszucs commented Jan 31, 2019

@xmnlab please rebase

@xmnlab
Contributor Author

xmnlab commented Jan 31, 2019

@kszucs @cpcloud I rebased it from master. could this PR be merged?

@xmnlab
Contributor Author

xmnlab commented Feb 5, 2019

@cpcloud @kszucs any feedback about this PR?

@@ -946,6 +1043,28 @@ def type(self) -> DataType:
struct : "struct" "<" field ":" type ("," field ":" type)* ">"

field : [a-zA-Z_][a-zA-Z_0-9]*

Member

@kszucs kszucs Feb 21, 2019

@xmnlab please add test cases for the newly introduced parsing parts, like the semicolon

Member

@kszucs kszucs Feb 21, 2019

So for example add an example parametrization of test_dtype with linestring;<srid>:<geotype>

Member

Codecov doesn't redirect properly, see the lines from 1191.

Contributor Author

Tests added

Contributor Author

thanks @kszucs! let me know if it is missing anything else 👍

@kszucs
Member

kszucs commented Feb 21, 2019

@xmnlab please add an unreleased version to https://github.com/ibis-project/ibis/blob/a16772aea43a936cadc39046ab96d3f54526ecdf/docs/source/release.rst

With an "Experimental GeoSpatial datatype support" (or something like that) entry and a reference to the appropriate issue.

@kszucs kszucs changed the title [MapD] Add geo spatial data support [MapD] Add geo spatial datatype support Feb 21, 2019
xmnlab and others added 6 commits February 21, 2019 17:37
Moving geospatial data to its own module.

Added new changes

Added geo data type for mapd

Fixed flake8 issue

Fixed linestring

Added infer for tuple

Passed an connected ibis as a fixture

Changed ibis_connected fixture

Changed fixture scope

Refactoring geo spatial data type tests

Added initial structure for spatial data support

Moving geospatial data to its own module.

Added new changes

Added geo data type for mapd

Fixed flake8 issue

Fixed linestring

Added infer for tuple

Passed an connected ibis as a fixture

Changed ibis_connected fixture

Changed fixture scope

Refactoring geo spatial data type tests
Added srid and geotypes

fixed tests and pep8

Fixed pep8 issues

Fixed linestring test for datatypes

Removed the inference tuple function.

Fixing geospatial literal for mapd

Fixed line -> linestring

Added map for geospatial datatypes

Fixed pep8 issues

Fixed pep8 issues

changed line -> linestring

Fixed small issues from PR feedback

Fixed description for geo type usage.
@@ -55,8 +53,7 @@

with suppress(ImportError):
    # pip install ibis-framework[mapd]
    if sys.version_info.major >= 3:
Member

Nice catch!

@kszucs
Member

kszucs commented Feb 22, 2019

The coverage is good now too. Thanks @xmnlab!

@kszucs kszucs closed this in 1eb3cb5 Feb 22, 2019
@xmnlab xmnlab deleted the add_geospatial_support branch February 22, 2019 15:10
kszucs added a commit that referenced this pull request Mar 6, 2019
This PR solves #1665 and solves #1707

Add Geo Spatial functions on the main structure and define these functions inside MapD backend.

References:
- Quansight/omnisci#21
- https://www.omnisci.com/docs/latest/5_geospatial_functions.html

Depends on #1666 (PR 1666 was used as base for the current PR)

# Geospatial functions

- Geometry/Geography Constructors
  - [x] ST_GeomFromText(WKT) - using literals
  - [x] ST_GeogFromText(WKT) - using literals
- Geometry Editors
  - ~ST_Transform (Returns a new geometry with its coordinates transformed to a different spatial reference system.)~
  - ~ST_SetSRID (Sets the SRID on a geometry to a particular integer value.)~
- Geometry Accessors
  - [x] ST_X (Return the X coordinate of the point, or NULL if not available. Input must be a point.)
  - [x] ST_Y (Return the Y coordinate of the point, or NULL if not available. Input must be a point.)
  - [x] ST_XMin (Returns X minima of a bounding box 2d or 3d or a geometry.)
  - [x] ST_XMax (Returns X maxima of a bounding box 2d or 3d or a geometry.)
  - [x] ST_YMin (Returns Y minima of a bounding box 2d or 3d or a geometry.)
  - [x] ST_YMax (Returns Y maxima of a bounding box 2d or 3d or a geometry.)
  - [x] ST_StartPoint (Returns the first point of a LINESTRING geometry as a POINT or NULL if the input parameter is not a LINESTRING.)
  - [x] ST_EndPoint (Returns the last point of a LINESTRING geometry as a POINT or NULL if the input parameter is not a LINESTRING.)
  - [x] ST_PointN (Return the Nth point in a single linestring in the geometry. Negative values are counted backwards from the end of the LineString, so that -1 is the last point. Returns NULL if there is no linestring in the geometry.)
  - [x] ST_NPoints (Return the number of points in a geometry. Works for all geometries.)
  - [x] ST_NRings (If the geometry is a polygon or multi-polygon returns the number of rings. It counts the outer rings as well.)
  - [x] ST_SRID (Returns the spatial reference identifier for the ST_Geometry)
- Spatial Relationships and Measurements
  - [x] ST_Distance
  - [x] ST_Contains
  - [x] ST_Area
  - [x] ST_Perimeter
  - [x] ST_Length
  - [x] ST_MaxDistance
- Extra
  - ~CastToGeography~ TODO: will be added in a new PR.

Author: Krisztián Szűcs <szucs.krisztian@gmail.com>
Author: Ivan Ogasawara <ivan.ogasawara@gmail.com>

Closes #1678 from xmnlab/geospatial_functions and squashes the following commits:

4f94baf [Krisztián Szűcs] update docs conda dependencies
44dfaa6 [Krisztián Szűcs] remove IBIS_TEST_DOWNLOAD_DIRECTORY from the docs container as well
a96c0fd [Ivan Ogasawara] Added pymapd dependence from conda
a1e177d [Krisztián Szűcs] use pkg_resources to get pymapd's version
2cc2285 [Krisztián Szűcs] fox download path in docker-compose
e4d4410 [Krisztián Szűcs] more robost testing data download script; updated requirements
bb4c8f4 [Krisztián Szűcs] use the zip github endpoint to download the repository
4c3969c [Ivan Ogasawara] Added more tests
@ian-r-rose ian-r-rose mentioned this pull request May 15, 2019
Labels
expressions Issues or PRs related to the expression API
3 participants