[MapD] Add geo spatial datatype support #1666

Closed
wants to merge 8 commits into from

Conversation

@xmnlab
Collaborator

commented Oct 25, 2018

Partially resolves #1665

  • added support for geospatial data types

Support for MapD geospatial functions should be addressed in another PR.

Show resolved Hide resolved ibis/expr/datatypes.py Outdated

@cpcloud cpcloud added this to the 1.0.0 milestone Oct 28, 2018

@cpcloud cpcloud added this to To do in New Operations and Types via automation Oct 28, 2018

@cpcloud cpcloud added this to In progress in MapD via automation Oct 28, 2018

@cpcloud cpcloud added the mapd label Oct 28, 2018

@xmnlab

Collaborator Author

commented Oct 29, 2018

@cpcloud @kszucs

I am moving the geospatial types to their own file, and now I have a question.
Inside geospatial.py I will import datatypes to create the geospatial data types, so where should I import/add the spatial types? My first guess is to add them inside datatypes.py, but that would create a circular import, so I think it is not desired.

What would be the workflow for this implementation?

Show resolved Hide resolved ibis/expr/geospatial.py Outdated

MapD automation moved this from In progress to Needs review Oct 30, 2018

@cpcloud cpcloud self-assigned this Oct 30, 2018

@xmnlab

Collaborator Author

commented Nov 5, 2018

@cpcloud @kszucs

I moved the geospatial types to their own file, and now I have a question.
Inside geospatial.py I will import datatypes to create the geospatial data types, so where should I import/add the spatial types? My first guess is to add them inside datatypes.py, but that would create a circular import, so I think it is not desired.

What would be the workflow for this implementation?

@xmnlab

Collaborator Author

commented Nov 8, 2018

@cpcloud @kszucs we decided to remove the dependency on shapely for now, so I moved the geospatial data types back from geospatial.py to datatypes.py and removed geospatial.py.

From testing the new data types, they seem to be working, but it seems I am missing something related to inference. Could you provide any help?

Thanks!

Show resolved Hide resolved ibis/expr/datatypes.py Outdated
@kszucs

Member

commented Nov 8, 2018

@xmnlab Try removing the infer functions above.

@xmnlab

Collaborator Author

commented Nov 8, 2018

@kszucs thanks for the feedback!

Without the infer function, it raises an error:

---------------------------------------------------------------------------
InputTypeError                            Traceback (most recent call last)
<ipython-input-3-26e2613b0aa2> in <module>
      1 point = (0, 0)
----> 2 l_point = ibis.literal(point, type='point')
      3 
      4 print(l_point.compile())
      5 print(type(l_point))

/mnt/sda1/dev/quansight/ibis/ibis/expr/types.py in literal(value, type)
    894         dtype = dt.null
    895     else:
--> 896         dtype = dt.infer(value)
    897 
    898     if type is not None:

/mnt/sda1/storage/miniconda/envs/ibis/lib/python3.6/site-packages/multipledispatch/dispatcher.py in __call__(self, *args, **kwargs)
    276             self._cache[types] = func
    277         try:
--> 278             return func(*args, **kwargs)
    279 
    280         except MDNotImplementedError:

/mnt/sda1/dev/quansight/ibis/ibis/expr/datatypes.py in infer_dtype_default(value)
   1189 @infer.register(object)
   1190 def infer_dtype_default(value):
-> 1191     raise com.InputTypeError(value)
   1192 
   1193 
@xmnlab

Collaborator Author

commented Nov 8, 2018

@kszucs

I was able to resolve the problem:

  • added an infer function for tuple -> Array(Primitive())
  • added casts from array to point, line, polygon and multipolygon

Let me know if this is reasonable or if you prefer another approach.
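
The fix described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: it swaps in the standard library's `functools.singledispatch` for the `multipledispatch` dispatcher seen in the traceback, and the `InputTypeError` class, the string type names, and the `infer_tuple` rule are hypothetical stand-ins.

```python
from functools import singledispatch


class InputTypeError(TypeError):
    """Hypothetical stand-in for the error raised in the traceback above."""


@singledispatch
def infer(value):
    # Fallback for any type without a registered rule.
    raise InputTypeError(value)


@infer.register(int)
def infer_int(value):
    return "int64"


@infer.register(float)
def infer_float(value):
    return "double"


@infer.register(tuple)
def infer_tuple(value):
    # Assumption: a tuple maps to an array of its first element's type.
    if not value:
        return "array<null>"
    return "array<{}>".format(infer(value[0]))
```

With a rule registered for `tuple`, a value such as `(0, 0)` no longer falls through to the default handler that raises.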

@xmnlab xmnlab changed the title [WIP] [MapD] Add geo spatial data support [MapD] Add geo spatial data support Nov 8, 2018

@kszucs

Member

commented Nov 8, 2018

That sounds good!

@xmnlab

Collaborator Author

commented Nov 8, 2018

@kszucs it seems all tests passed except for azure (MySQL installation issue) and python27 (SQLite issues).

Do you have any idea how to fix that?

@kszucs

Member

commented Nov 10, 2018

MySQL testing has been disabled on the master. I'm rebasing it.

@kszucs kszucs force-pushed the Quansight:add_geospatial_support branch from 5564df6 to b6c991c Nov 10, 2018

@codecov


commented Nov 10, 2018

Codecov Report

Merging #1666 into master will decrease coverage by 2.56%.
The diff coverage is 93.9%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #1666      +/-   ##
=========================================
- Coverage   89.97%   87.4%   -2.57%     
=========================================
  Files         186     186              
  Lines       27300   27486     +186     
  Branches     2311    2344      +33     
=========================================
- Hits        24563   24024     -539     
- Misses       2335    3050     +715     
- Partials      402     412      +10

Impacted Files                        Coverage Δ
ibis/mapd/client.py                   50.85% <ø> (ø) ⬆️
ibis/expr/tests/test_decimal.py       100% <ø> (ø) ⬆️
ibis/expr/tests/test_value_exprs.py   99.5% <100%> (+0.01%) ⬆️
ibis/expr/types.py                    91.77% <100%> (+0.5%) ⬆️
ibis/expr/tests/test_datatypes.py     100% <100%> (ø) ⬆️
ibis/mapd/tests/test_operations.py    98.11% <100%> (+0.61%) ⬆️
ibis/expr/api.py                      93.06% <100%> (-0.38%) ⬇️
ibis/mapd/operations.py               72.59% <83.33%> (+1.17%) ⬆️
ibis/expr/datatypes.py                94.87% <93.47%> (-0.2%) ⬇️
ibis/bigquery/tests/test_client.py    25.87% <0%> (-73.55%) ⬇️
... and 18 more

@xmnlab

Collaborator Author

commented Nov 11, 2018

Thanks @kszucs !! It seems awesome!

@kszucs kszucs dismissed their stale review Nov 11, 2018

Found issues with implicit casting.

@kszucs

Member

commented Nov 11, 2018

@xmnlab @cpcloud Overall the new datatypes are working; however, we'll need a couple of follow-up PRs.

The current implementation doesn't reflect the hierarchy between the spatial types:

Point = Array[Numeric, 2]
Line = Array[Point]
Polygon = Array[Line]
MultiPolygon = Array[Polygon]

The implicit casting rules also need to be more restrictive.
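
One way to picture the hierarchy above is as a set of structural checks, each built on the previous one. The sketch below is purely illustrative and is not how ibis implements casting; every function name in it is hypothetical, and it assumes plain Python lists and tuples as input.

```python
from numbers import Real


def is_point(value):
    # Point = Array[Numeric, 2]
    return (
        isinstance(value, (list, tuple))
        and len(value) == 2
        and all(isinstance(c, Real) for c in value)
    )


def _is_nonempty_seq_of(value, check):
    # Helper: a non-empty list/tuple whose items all satisfy `check`.
    return (
        isinstance(value, (list, tuple))
        and len(value) > 0
        and all(check(item) for item in value)
    )


def is_line(value):
    # Line = Array[Point]
    return _is_nonempty_seq_of(value, is_point)


def is_polygon(value):
    # Polygon = Array[Line]
    return _is_nonempty_seq_of(value, is_line)


def is_multipolygon(value):
    # MultiPolygon = Array[Polygon]
    return _is_nonempty_seq_of(value, is_polygon)


def classify(value):
    # Return the single level of the hierarchy that matches, or fail.
    checks = [
        ("point", is_point),
        ("line", is_line),
        ("polygon", is_polygon),
        ("multipolygon", is_multipolygon),
    ]
    for name, check in checks:
        if check(value):
            return name
    raise TypeError("not a geospatial value: {!r}".format(value))
```

Because each level only accepts values whose nesting depth matches, a polygon-shaped value cannot be silently treated as a line, which is the kind of restriction the comment asks for.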

__slots__ = ()


class MultiPolygon(GeoSpatial):

@cpcloud

cpcloud Nov 12, 2018

Member

Can you document these types? I think people may wonder how these types map to PostgreSQL's versions of these, since they are pretty similar. Also, these types should be flexible enough to support Postgres's versions. I think the exercise of going through the comparison will be very helpful in determining if this is the right set of types to support databases that support them.

@xmnlab

xmnlab Nov 19, 2018

Author Collaborator

That sounds good. I will also rename Line to Linestring and add SRID type information.

]
)
def test_literal_cases(value, expected_type):
@pytest.mark.parametrize(['value', 'expected_type'], [

@cpcloud

cpcloud Nov 12, 2018

Member

Can you leave this formatting as is?

@xmnlab

xmnlab Nov 19, 2018

Author Collaborator

OK!

@xmnlab

xmnlab Feb 21, 2019

Author Collaborator

For some unknown reason I lost this change. I am redoing it.

multipolygon1 = [polygon1, polygon2]


@pytest.mark.parametrize(['value', 'expected_type'], [

@cpcloud

cpcloud Nov 12, 2018

Member

Please leave this formatting.

@xmnlab

xmnlab Feb 21, 2019

Author Collaborator

OK .. I am doing that. thanks!

Show resolved Hide resolved ibis/expr/types.py
@@ -330,9 +330,43 @@ def _cross_join(translator, expr):
return translator.translate(left.join(right, ibis.literal(True)))


def _format_point_value(value):
    return ' '.join([str(v) for v in value])

@cpcloud

cpcloud Nov 12, 2018

Member

You don't need to construct a list when calling str.join.

@xmnlab

xmnlab Nov 19, 2018

Author Collaborator

thanks!

@kszucs

kszucs Feb 21, 2019

Member

@xmnlab please remove the list comprehension and use a generator instead like Phillip mentioned.

@xmnlab

xmnlab Feb 21, 2019

Author Collaborator

Sounds good. Sorry, I lost that change for an unknown reason. I am fixing it again. Thanks again!
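
The suggestion in this thread amounts to passing a generator expression straight to `str.join`. A minimal sketch, with `format_point_value` as an illustrative name that mirrors the helper under review:

```python
def format_point_value(value):
    # str.join accepts any iterable of strings, so no intermediate list is needed.
    return " ".join(str(v) for v in value)
```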

@@ -602,6 +603,38 @@ def __str__(self) -> str:
)


class GeoSpatial(DataType):

@cpcloud

cpcloud Nov 12, 2018

Member

Is there some shared functionality among all child classes of this class? If not, then we should just make the geospatial types subclass from DataType and not tie them together unnecessarily.

@xmnlab

xmnlab Nov 19, 2018

Author Collaborator

I thought of creating the GeoSpatial data type just to generalize function parameters, for example:

ST_Distance(poly1, ST_GeomFromText('POINT(0 0)'))

where ST_Distance returns the shortest planar distance between geometries.

Maybe it would be better to rename GeoSpatial to Geometry.

What do you think?

@xmnlab

xmnlab Nov 20, 2018

Author Collaborator

Actually, I will think more about this: PostGIS uses both geometry and geography types.

Show resolved Hide resolved ibis/expr/datatypes.py
    elif isinstance(expr, ir.PolygonScalar):
        return "POLYGON({0!s})".format(_format_polygon_value(value))
    elif isinstance(expr, ir.MultiPolygonScalar):
        return "MULTIPOLYGON({0!s})".format(_format_multipolygon_value(value))

@cpcloud

cpcloud Nov 12, 2018

Member

You don't need the 0!s in the format spec, just write it as {}.
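
For string arguments, `{}` and `{0!s}` produce the same output, so the shorter form is enough. The sketch below shows the simplified form applied to WKT-style literals; the helper names are hypothetical and only loosely follow the functions in the diff, and the input is assumed to be nested lists of `(x, y)` pairs.

```python
def format_ring(points):
    # "x y" pairs separated by commas, e.g. "0 0, 1 0, 1 1, 0 0".
    return ", ".join("{} {}".format(x, y) for x, y in points)


def format_polygon(rings):
    # Each ring is wrapped in its own parentheses.
    return ", ".join("({})".format(format_ring(ring)) for ring in rings)


def polygon_literal(rings):
    return "POLYGON({})".format(format_polygon(rings))


def multipolygon_literal(polygons):
    body = ", ".join("({})".format(format_polygon(p)) for p in polygons)
    return "MULTIPOLYGON({})".format(body)
```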

@xmnlab
Collaborator Author

left a comment

Thanks @cpcloud I will work on that.

@kszucs

Member

commented Jan 31, 2019

@xmnlab please rebase

@xmnlab xmnlab force-pushed the Quansight:add_geospatial_support branch from 3a8ebd3 to 7742daf Jan 31, 2019

@xmnlab

Collaborator Author

commented Jan 31, 2019

@kszucs @cpcloud I rebased it onto master. Could this PR be merged?

@xmnlab

Collaborator Author

commented Feb 5, 2019

@cpcloud @kszucs any feedback about this PR?

@@ -946,6 +1043,28 @@ def type(self) -> DataType:
struct : "struct" "<" field ":" type ("," field ":" type)* ">"
field : [a-zA-Z_][a-zA-Z_0-9]*

@kszucs

kszucs Feb 21, 2019

Member

@xmnlab please add test cases for the newly introduced parsing parts, like the semicolon.

@kszucs

kszucs Feb 21, 2019

Member

So for example add an example parametrization of test_dtype with linestring;<srid>:<geotype>
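
To make the `linestring;<srid>:<geotype>` shape concrete, here is a small stand-alone parser for that string form. It is only a sketch and does not use the grammar in the PR: `parse_geo_dtype` is a hypothetical name, and it assumes that the geotype is either `geometry` or `geography` and that only the bare name or the full `name;srid:geotype` form is valid.

```python
import re

# Assumed shape: "<name>" or "<name>;<srid>:<geotype>".
_GEO_PATTERN = re.compile(
    r"^(?P<name>[a-z]+)"
    r"(?:;(?P<srid>\d+):(?P<geotype>geometry|geography))?$"
)


def parse_geo_dtype(text):
    match = _GEO_PATTERN.match(text.strip().lower())
    if match is None:
        raise ValueError("cannot parse geospatial type: {!r}".format(text))
    srid = match.group("srid")
    return (
        match.group("name"),
        int(srid) if srid is not None else None,
        match.group("geotype"),
    )
```

Test cases of the kind requested would then cover both the bare type name and the variant with the semicolon, plus at least one malformed string.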

@kszucs

kszucs Feb 21, 2019

Member

Codecov doesn't redirect properly, see the lines from 1191.

@xmnlab

xmnlab Feb 22, 2019

Author Collaborator

Tests added

@xmnlab

xmnlab Feb 22, 2019

Author Collaborator

thanks @kszucs! let me know if it is missing anything else 👍

Show resolved Hide resolved ibis/expr/datatypes.py Outdated
Show resolved Hide resolved ibis/mapd/operations.py Outdated
Show resolved Hide resolved ibis/mapd/operations.py Outdated
@kszucs

Member

commented Feb 21, 2019

@xmnlab please add an unreleased version to https://github.com/ibis-project/ibis/blob/a16772aea43a936cadc39046ab96d3f54526ecdf/docs/source/release.rst

Include an "Experimental GeoSpatial datatype support" (or something like that) entry and a reference to the appropriate issue.

@kszucs kszucs changed the title [MapD] Add geo spatial data support [MapD] Add geo spatial datatype support Feb 21, 2019

@xmnlab xmnlab force-pushed the Quansight:add_geospatial_support branch from ba2035d to 8dfc607 Feb 21, 2019

xmnlab and others added some commits Oct 25, 2018

Added initial structure for spatial data support
Moving geospatial data to its own module.

Added new changes

Added geo data type for mapd

Fixed flake8 issue

Fixed linestring

Added infer for tuple

Passed an connected ibis as a fixture

Changed ibis_connected fixture

Changed fixture scope

Refactoring geo spatial data type tests

@xmnlab xmnlab force-pushed the Quansight:add_geospatial_support branch from 8dfc607 to 6228f08 Feb 21, 2019

Fixed small issues
Added srid and geotypes

fixed tests and pep8

Fixed pep8 issues

Fixed linestring test for datatypes

Removed the inference tuple function.

Fixing geospatial literal for mapd

Fixed line -> linestring

Added map for geospatial datatypes

Fixed pep8 issues

Fixed pep8 issues

changed line -> linestring

Fixed small issues from PR feedback

Fixed description for geo type usage.

@xmnlab xmnlab force-pushed the Quansight:add_geospatial_support branch from 6228f08 to 479cb54 Feb 21, 2019

@@ -55,8 +53,7 @@

with suppress(ImportError):
# pip install ibis-framework[mapd]
if sys.version_info.major >= 3:

@kszucs

kszucs Feb 22, 2019

Member

Nice catch!

@kszucs

Member

commented Feb 22, 2019

The coverage is good now too. Thanks @xmnlab!

@kszucs

kszucs approved these changes Feb 22, 2019

@kszucs kszucs closed this in 1eb3cb5 Feb 22, 2019

New Operations and Types automation moved this from To do to Done Feb 22, 2019

MapD automation moved this from Needs review to Done Feb 22, 2019

@xmnlab xmnlab deleted the Quansight:add_geospatial_support branch Feb 22, 2019

kszucs added a commit that referenced this pull request Mar 6, 2019

[MapD] Added Geospatial functions
This PR solves #1665 and solves #1707

Add Geo Spatial functions on the main structure and define these functions inside MapD backend.

References:
- Quansight/mapd#21
- https://www.omnisci.com/docs/latest/5_geospatial_functions.html

Depends on #1666 (PR 1666 was used as base for the current PR)

# Geospatial functions

- Geometry/Geography Constructors
  - [x] ST_GeomFromText(WKT) - using literals
  - [x] ST_GeogFromText(WKT) - using literals
- Geometry Editors
  - ~ST_Transform (Returns a new geometry with its coordinates transformed to a different spatial reference system.)~
  - ~ST_SetSRID (Sets the SRID on a geometry to a particular integer value.)~
- Geometry Accessors
  - [x] ST_X (Return the X coordinate of the point, or NULL if not available. Input must be a point.)
  - [x] ST_Y (Return the Y coordinate of the point, or NULL if not available. Input must be a point.)
  - [x] ST_XMin (Returns X minima of a bounding box 2d or 3d or a geometry.)
  - [x] ST_XMax (Returns X maxima of a bounding box 2d or 3d or a geometry.)
  - [x] ST_YMin (Returns Y minima of a bounding box 2d or 3d or a geometry.)
  - [x] ST_YMax (Returns Y maxima of a bounding box 2d or 3d or a geometry.)
  - [x] ST_StartPoint (Returns the first point of a LINESTRING geometry as a POINT or NULL if the input parameter is not a LINESTRING.)
  - [x] ST_EndPoint (Returns the last point of a LINESTRING geometry as a POINT or NULL if the input parameter is not a LINESTRING.)
  - [x] ST_PointN (Return the Nth point in a single linestring in the geometry. Negative values are counted backwards from the end of the LineString, so that -1 is the last point. Returns NULL if there is no linestring in the geometry.)
  - [x] ST_NPoints (Return the number of points in a geometry. Works for all geometries.)
  - [x] ST_NRings (If the geometry is a polygon or multi-polygon returns the number of rings. It counts the outer rings as well.)
  - [x] ST_SRID (Returns the spatial reference identifier for the ST_Geometry)
- Spatial Relationships and Measurements
  - [x] ST_Distance
  - [x] ST_Contains
  - [x] ST_Area
  - [x] ST_Perimeter
  - [x] ST_Length
  - [x] ST_MaxDistance
- Extra
  - ~CastToGeography~ TODO: will be added in a new PR.

Author: Krisztián Szűcs <szucs.krisztian@gmail.com>
Author: Ivan Ogasawara <ivan.ogasawara@gmail.com>

Closes #1678 from xmnlab/geospatial_functions and squashes the following commits:

4f94baf [Krisztián Szűcs] update docs conda dependencies
44dfaa6 [Krisztián Szűcs] remove IBIS_TEST_DOWNLOAD_DIRECTORY from the docs container as well
a96c0fd [Ivan Ogasawara] Added pymapd dependence from conda
a1e177d [Krisztián Szűcs] use pkg_resources to get pymapd's version
2cc2285 [Krisztián Szűcs] fox download path in docker-compose
e4d4410 [Krisztián Szűcs] more robost testing data download script; updated requirements
bb4c8f4 [Krisztián Szűcs] use the zip github endpoint to download the repository
4c3969c [Ivan Ogasawara] Added more tests