44 changes: 31 additions & 13 deletions docs/source/release.rst
@@ -7,43 +7,61 @@ Release Notes
These release notes are for versions of ibis **1.0 and later**. Release
notes for pre-1.0 versions of ibis can be found at :doc:`/release-pre-1.0`

* :release: `1.1.0 <2019-06-09>`
* :bug:`1819` Fix group_concat test and implementations
* :release:`1.2.0 <2019-06-24>`
* :feature:`1836` Add new geospatial functions to OmniSciDB backend
* :support:`1847` Skip SQLAlchemy backend tests in connect method in backends.py
* :bug:`1855 major` Fix call to psql causing failing CI
* :bug:`1851 major` Fix nested array literal repr
* :support:`1848` Validate order_by when using rows_with_max_lookback window
* :bug:`1850 major` Fix repr of empty schema
* :support:`1845` Generate release notes from commits
* :support:`1844` Raise exception on backends where rows_with_max_lookback can't be implemented
* :bug:`1843 major` Add max_lookback to window replace and combine functions
* :bug:`1837 major` Partially revert #1758
* :support:`1840` Tighter version spec for pytest
* :feature:`1838` allow pandas timedelta in rows_with_max_lookback
* :feature:`1825` Accept rows-with-max-lookback as preceding parameter
* :feature:`1787` PostGIS support
* :support:`1826` Allow passing a branch to ci/feedstock.py
* :support:`-` Bugs go into feature releases
* :support:`-` No space after :release:
* :release:`1.1.0 <2019-06-09>`
* :bug:`1819 major` Fix group_concat test and implementations
* :support:`1820` Remove decorator hacks and add custom markers
* :bug:`1818` Fix failing strftime tests on Python 3.7
* :bug:`1757` Remove unnecessary (and erroneous in some cases) frame clauses
* :bug:`1818 major` Fix failing strftime tests on Python 3.7
* :bug:`1757 major` Remove unnecessary (and erroneous in some cases) frame clauses
* :support:`1814` Add development deps to setup.py
* :feature:`1809` Consolidate trailing window functions
* :bug:`1799` Chained mutate operations are buggy
* :bug:`1799 major` Chained mutate operations are buggy
* :support:`1805` Fix design and developer docs
* :support:`1810` Pin sphinx version to 2.0.1
* :feature:`1766` Call to_interval when casting integers to intervals
* :bug:`1783` Allow projections from joins to attempt fusion
* :bug:`1783 major` Allow projections from joins to attempt fusion
* :feature:`1796` Add session feature to mapd client API
* :bug:`1798` Fix Python 3.5 dependency versions
* :bug:`1798 major` Fix Python 3.5 dependency versions
* :feature:`1792` Add min periods parameter to Window
* :support:`1793` Add pep8speaks integration
* :support:`1821` Fix typo in UDF signature specification
* :feature:`1785` Allow strings for types in pandas UDFs
* :feature:`1790` Add missing date operations and struct field operation for the pandas backend
* :bug:`1789` Fix compatibility and bugs associated with pandas toposort reimplementation
* :bug:`1772` Fix outer_join generating LEFT join instead of FULL OUTER
* :bug:`1789 major` Fix compatibility and bugs associated with pandas toposort reimplementation
* :bug:`1772 major` Fix outer_join generating LEFT join instead of FULL OUTER
* :feature:`1771` Add window operations to the OmniSci backend
* :feature:`1758` Reimplement the pandas backend using topological sort
* :support:`1779` Clean up most xpassing tests
* :bug:`1782` NullIf should enforce that its arguments are castable to a common type
* :bug:`1782 major` NullIf should enforce that its arguments are castable to a common type
* :support:`1781` Update omnisci container version
* :feature:`1778` Add marker for xfailing specific backends
* :feature:`1777` Enable window function tests where possible
* :bug:`1775` Fix conda create command in documentation
* :bug:`1775 major` Fix conda create command in documentation
* :support:`1776` Constrain PyMapD version to get passing builds
* :bug:`1765` Fix preceding and following with ``None``
* :bug:`1765 major` Fix preceding and following with ``None``
* :support:`1763` Remove warnings and clean up some docstrings
* :support:`1638` Add StringToTimestamp as unsupported
* :feature:`1743` is_computable_arg dispatcher
* :support:`1759` Add isort pre-commit hooks
* :feature:`1753` Added float32 and geospatial types for create table from schema
* :bug:`1661` PostgreSQL interval type not recognized
* :bug:`1661 major` PostgreSQL interval type not recognized
* :support:`1750` Add Python 3.5 testing back to CI
* :support:`1700` Re-enable CI for building step
* :support:`1749` Update README reference to MapD to say OmniSci
497 changes: 490 additions & 7 deletions ibis/expr/api.py

Large diffs are not rendered by default.

90 changes: 86 additions & 4 deletions ibis/expr/datatypes.py
@@ -25,6 +25,7 @@

import ibis.common as com
import ibis.expr.types as ir
from ibis import util


class DataType:
@@ -112,6 +113,10 @@ def scalar_type(self):
def column_type(self):
return functools.partial(self.column, dtype=self)

def _literal_value_hash_key(self, value):
"""Return a hashable key for ``value``: a tuple of the dtype and value."""
return self, value


class Any(DataType):
__slots__ = ()
@@ -516,6 +521,22 @@ def __str__(self) -> str:
', '.join(itertools.starmap('{}: {}'.format, self.pairs.items())),
)

def _literal_value_hash_key(self, value):
return self, _tuplize(value.items())


def _tuplize(values):
"""Recursively convert `values` to a tuple of tuples."""
def tuplize_iter(values):
yield from (
tuple(tuplize_iter(value))
if util.is_iterable(value)
else value
for value in values
)

return tuple(tuplize_iter(values))
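The `_tuplize` helper above turns arbitrarily nested iterables into nested tuples so the values become hashable. A self-contained sketch of the same idea, with a stand-in for `util.is_iterable` (which, as in ibis, treats strings as scalars):

```python
from collections.abc import Iterable


def is_iterable(value):
    # Stand-in for ibis.util.is_iterable: strings/bytes count as scalars.
    return not isinstance(value, (str, bytes)) and isinstance(value, Iterable)


def tuplize(values):
    """Recursively convert `values` to a tuple of tuples."""
    def tuplize_iter(values):
        yield from (
            tuple(tuplize_iter(value)) if is_iterable(value) else value
            for value in values
        )

    return tuple(tuplize_iter(values))


print(tuplize([[1, 2], [3, [4]]]))  # ((1, 2), (3, (4,)))
print(hash(tuplize([[1], [2, 3]])) == hash(((1,), (2, 3))))  # True
```

Lists of lists (which `hash()` rejects) round-trip into tuples of tuples that hash consistently.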


class Array(Variadic):
scalar = ir.ArrayScalar
@@ -532,6 +553,9 @@ def __init__(
def __str__(self) -> str:
return '{}<{}>'.format(self.name.lower(), self.value_type)

def _literal_value_hash_key(self, value):
return self, _tuplize(value)


class Set(Variadic):
scalar = ir.SetScalar
@@ -581,10 +605,16 @@ def __str__(self) -> str:
self.name.lower(), self.key_type, self.value_type
)

def _literal_value_hash_key(self, value):
return self, _tuplize(value.items())


class GeoSpatial(DataType):
__slots__ = 'geotype', 'srid'

column = ir.GeoSpatialColumn
scalar = ir.GeoSpatialScalar

def __init__(
self, geotype: str = None, srid: int = None, nullable: bool = True
):
@@ -610,6 +640,46 @@ def __init__(
self.geotype = geotype
self.srid = srid

def __str__(self) -> str:
geo_op = self.name.lower()
if self.geotype is not None:
geo_op += ':' + self.geotype
if self.srid is not None:
geo_op += ';' + str(self.srid)
return geo_op
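The new `GeoSpatial.__str__` renders the lowercase type name plus optional `:geotype` and `;srid` suffixes. A minimal standalone sketch of that formatting logic (the function name and free-standing form are illustrative, not ibis API):

```python
def geo_type_str(name, geotype=None, srid=None):
    # Mirrors GeoSpatial.__str__: optional ':geotype' and ';srid' suffixes.
    s = name.lower()
    if geotype is not None:
        s += ':' + geotype
    if srid is not None:
        s += ';' + str(srid)
    return s


print(geo_type_str('Point'))                    # point
print(geo_type_str('Point', 'geometry', 4326))  # point:geometry;4326
```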


class Geometry(GeoSpatial):
"""Geometry is used to cast from geography types."""

column = ir.GeoSpatialColumn
scalar = ir.GeoSpatialScalar

__slots__ = ()

def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.geotype = 'geometry'

def __str__(self) -> str:
return self.name.lower()


class Geography(GeoSpatial):
"""Geography is used to cast from geometry types."""

column = ir.GeoSpatialColumn
scalar = ir.GeoSpatialScalar

__slots__ = ()

def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.geotype = 'geography'

def __str__(self) -> str:
return self.name.lower()


class Point(GeoSpatial):
"""A point described by two coordinates."""
@@ -678,6 +748,8 @@ class MultiPolygon(GeoSpatial):
interval = Interval()
category = Category()
# geo spatial data type
geometry = GeoSpatial()
geography = GeoSpatial()
point = Point()
linestring = LineString()
polygon = Polygon()
@@ -1036,6 +1108,10 @@ def type(self) -> DataType:
field : [a-zA-Z_][a-zA-Z_0-9]*
geography: "geography"
geometry: "geometry"
point : "point"
| "point" ";" srid
| "point" ":" geotype
@@ -1171,6 +1247,12 @@ def type(self) -> DataType:
return Struct(names, types)

# geo spatial data type
elif self._accept(Tokens.GEOMETRY):
return Geometry()

elif self._accept(Tokens.GEOGRAPHY):
return Geography()

elif self._accept(Tokens.POINT):
geotype = None
srid = None
@@ -1511,10 +1593,10 @@ def can_cast_variadic(


# geo spatial data type
@castable.register(Array, Point)
@castable.register(Array, LineString)
@castable.register(Array, Polygon)
@castable.register(Array, MultiPolygon)
# cast between same type, used to cast from/to geometry and geography
@castable.register(Array, (Point, LineString, Polygon, MultiPolygon))
@castable.register((Point, LineString, Polygon, MultiPolygon), Geometry)
@castable.register((Point, LineString, Polygon, MultiPolygon), Geography)
def can_cast_geospatial(source, target, **kwargs):
return True
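The tuple-based `castable.register` calls replace four stacked decorators with one cross-product registration. The resulting rule, reduced to a plain truth table (type names here are illustrative strings, not ibis type objects):

```python
GEO_PRIMITIVES = frozenset({'point', 'linestring', 'polygon', 'multipolygon'})


def can_cast_geospatial(source, target):
    # Equivalent truth table to the tuple-based castable.register calls:
    # every geo primitive may be reinterpreted as geometry or geography.
    return source in GEO_PRIMITIVES and target in {'geometry', 'geography'}


print(can_cast_geospatial('point', 'geography'))   # True
print(can_cast_geospatial('string', 'geometry'))   # False
```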

11 changes: 4 additions & 7 deletions ibis/expr/format.py
@@ -166,17 +166,14 @@ def _indent(self, text, indents=1):
def _format_table(self, expr):
table = expr.op()
# format the schema
rows = ['name: {0!s}\nschema:'.format(table.name)]
rows = ['name: {}\nschema:'.format(table.name)]
rows.extend(
[
' %s : %s' % tup
for tup in zip(table.schema.names, table.schema.types)
]
map(' {} : {}'.format, table.schema.names, table.schema.types)
)
opname = type(table).__name__
type_display = self._get_type_display(expr)
opline = '%s[%s]' % (opname, type_display)
return '{0}\n{1}'.format(opline, self._indent('\n'.join(rows)))
opline = '{}[{}]'.format(opname, type_display)
return '{}\n{}'.format(opline, self._indent('\n'.join(rows)))
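The rewritten `_format_table` relies on `map()` accepting multiple iterables when its callable takes multiple arguments, which makes the old explicit `zip` plus `%`-formatting unnecessary. In isolation:

```python
names = ['a', 'b']
types = ['int64', 'string']

# map() with a 2-argument callable consumes its iterables in lockstep,
# so no explicit zip() is needed.
rows = list(map('  {} : {}'.format, names, types))
print(rows)  # ['  a : int64', '  b : string']
```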

def _format_column(self, expr):
# HACK: if column is pulled from a Filter of another table, this parent
213 changes: 212 additions & 1 deletion ibis/expr/operations.py
@@ -1025,6 +1025,15 @@ def __init__(self, expr, window):
if table is not None:
window = window.bind(table)

if window.max_lookback is not None:
error_msg = ("'max lookback' windows must be ordered "
"by a timestamp column")
if len(window._order_by) != 1:
raise com.IbisInputError(error_msg)
order_var = window._order_by[0].op().args[0]
if not isinstance(order_var.type(), dt.Timestamp):
raise com.IbisInputError(error_msg)

expr = propagate_down_window(expr, window)
super().__init__(expr, window)

@@ -2892,6 +2901,15 @@ def output_type(self):
def root_tables(self):
return []

def __hash__(self) -> int:
"""Return the hash of a literal value.
We override this method to make sure that we can handle things that
aren't eminently hashable like an ``array<array<int64>>``.
"""
return hash(self.dtype._literal_value_hash_key(self.value))
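The `__hash__` override works because the dtype supplies a hashable key even for values that are not themselves hashable, such as nested lists. A simplified, self-contained sketch of that key-function idea (not the actual ibis implementation):

```python
def literal_hash_key(dtype_name, value):
    # Turn unhashable nested lists into nested tuples before hashing;
    # the same idea as DataType._literal_value_hash_key.
    def tuplize(v):
        return tuple(tuplize(x) for x in v) if isinstance(v, list) else v

    return (dtype_name, tuplize(value))


key = literal_hash_key('array<array<int8>>', [[1], [2, 3]])
print(key)                          # ('array<array<int8>>', ((1,), (2, 3)))
print(isinstance(hash(key), int))   # True
```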


class NullLiteral(Literal):
"""Typeless NULL literal"""
@@ -2994,6 +3012,68 @@ class GeoContains(GeoSpatialBinOp):
output_type = rlz.shape_like('args', dt.boolean)


class GeoContainsProperly(GeoSpatialBinOp):
"""Check if the first geo spatial data contains the second one,
and no boundary points are shared."""

output_type = rlz.shape_like('args', dt.boolean)


class GeoCovers(GeoSpatialBinOp):
"""Returns True if no point in Geometry B is outside Geometry A"""

output_type = rlz.shape_like('args', dt.boolean)


class GeoCoveredBy(GeoSpatialBinOp):
"""Returns True if no point in Geometry/Geography A is
outside Geometry/Geography B"""

output_type = rlz.shape_like('args', dt.boolean)


class GeoCrosses(GeoSpatialBinOp):
"""Returns True if the supplied geometries have some, but not all,
interior points in common."""

output_type = rlz.shape_like('args', dt.boolean)


class GeoDisjoint(GeoSpatialBinOp):
"""Returns True if the Geometries do not “spatially intersect” -
if they do not share any space together."""

output_type = rlz.shape_like('args', dt.boolean)


class GeoEquals(GeoSpatialBinOp):
"""Returns True if the given geometries represent the same geometry."""

output_type = rlz.shape_like('args', dt.boolean)


class GeoIntersects(GeoSpatialBinOp):
"""Returns True if the Geometries/Geography “spatially intersect in 2D”
- (share any portion of space) and False if they don’t (they are Disjoint).
"""

output_type = rlz.shape_like('args', dt.boolean)


class GeoOverlaps(GeoSpatialBinOp):
"""Returns True if the Geometries share space, are of the same dimension,
but are not completely contained by each other."""

output_type = rlz.shape_like('args', dt.boolean)


class GeoTouches(GeoSpatialBinOp):
"""Returns True if the geometries have at least one point in common,
but their interiors do not intersect."""

output_type = rlz.shape_like('args', dt.boolean)


class GeoArea(GeoSpatialUnOp):
"""Area of the geo spatial data"""

@@ -3105,6 +3185,137 @@ class GeoNRings(GeoSpatialUnOp):


class GeoSRID(GeoSpatialUnOp):
"""Returns the spatial reference identifier for the ST_Geometry"""
"""Returns the spatial reference identifier for the ST_Geometry."""

output_type = rlz.shape_like('args', dt.int64)


class GeoSetSRID(GeoSpatialUnOp):
"""Set the spatial reference identifier for the ST_Geometry."""
srid = Arg(rlz.integer)
output_type = rlz.shape_like('args', dt.geometry)


class GeoBuffer(GeoSpatialUnOp):
"""Returns a geometry that represents all points whose distance from this
Geometry is less than or equal to distance. Calculations are in the
Spatial Reference System of this Geometry.
"""

radius = Arg(rlz.floating)

output_type = rlz.shape_like('args', dt.geometry)


class GeoCentroid(GeoSpatialUnOp):
"""Returns the geometric center of a geometry."""

output_type = rlz.shape_like('arg', dt.point)


class GeoDFullyWithin(GeoSpatialBinOp):
"""Returns True if the geometries are fully within the specified distance
of one another.
"""
distance = Arg(rlz.floating)

output_type = rlz.shape_like('args', dt.boolean)


class GeoDWithin(GeoSpatialBinOp):
"""Returns True if the geometries are within the specified distance
of one another.
"""
distance = Arg(rlz.floating)

output_type = rlz.shape_like('args', dt.boolean)


class GeoEnvelope(GeoSpatialUnOp):
"""Returns a geometry representing the boundingbox of the supplied geometry.
"""

output_type = rlz.shape_like('arg', dt.polygon)


class GeoAzimuth(GeoSpatialBinOp):
"""Returns the angle in radians from the horizontal of the vector defined
by pointA and pointB. Angle is computed clockwise from down-to-up:
on the clock: 12=0; 3=PI/2; 6=PI; 9=3PI/2.
"""

left = Arg(rlz.point)
right = Arg(rlz.point)

output_type = rlz.shape_like('args', dt.float64)


class GeoWithin(GeoSpatialBinOp):
"""Returns True if the geometry A is completely inside geometry B"""

output_type = rlz.shape_like('args', dt.boolean)


class GeoIntersection(GeoSpatialBinOp):
"""Returns a geometry that represents the point set intersection
of the Geometries.
"""

output_type = rlz.shape_like('args', dt.geometry)


class GeoDifference(GeoSpatialBinOp):
"""Returns a geometry that represents that part of geometry A
that does not intersect with geometry B.
"""

output_type = rlz.shape_like('args', dt.geometry)


class GeoSimplify(GeoSpatialUnOp):
"""Returns a simplified version of the given geometry."""

tolerance = Arg(rlz.floating)
preserve_collapsed = Arg(rlz.boolean)

output_type = rlz.shape_like('arg', dt.geometry)


class GeoTransform(GeoSpatialUnOp):
"""Returns a transformed version of the given geometry into a new SRID."""

srid = Arg(rlz.integer)

output_type = rlz.shape_like('arg', dt.geometry)


class GeoAsBinary(GeoSpatialUnOp):
"""Return the Well-Known Binary (WKB) representation of the
geometry/geography without SRID meta data.
"""

output_type = rlz.shape_like('arg', dt.binary)


class GeoAsEWKB(GeoSpatialUnOp):
"""Return the Well-Known Binary (WKB) representation of the
geometry/geography with SRID meta data.
"""

output_type = rlz.shape_like('arg', dt.binary)


class GeoAsEWKT(GeoSpatialUnOp):
"""Return the Well-Known Text (WKT) representation of the
geometry/geography with SRID meta data.
"""

output_type = rlz.shape_like('arg', dt.string)


class GeoAsText(GeoSpatialUnOp):
"""Return the Well-Known Text (WKT) representation of the
geometry/geography without SRID metadata.
"""

output_type = rlz.shape_like('arg', dt.string)
4 changes: 4 additions & 0 deletions ibis/expr/rules.py
@@ -268,6 +268,10 @@ def array_of(inner, arg):
mapping = value(dt.Map(dt.any, dt.any))

geospatial = value(dt.GeoSpatial)
point = value(dt.Point)
linestring = value(dt.LineString)
polygon = value(dt.Polygon)
multipolygon = value(dt.MultiPolygon)


@validator
2 changes: 1 addition & 1 deletion ibis/expr/schema.py
@@ -36,7 +36,7 @@ def __init__(self, names, types):
raise com.IntegrityError('Duplicate column names')

def __repr__(self):
space = 2 + max(map(len, self.names))
space = 2 + max(map(len, self.names), default=0)
return "ibis.Schema {{{}\n}}".format(
util.indent(
''.join(
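The one-line `schema.py` fix uses `max()`'s `default=` keyword so computing the repr of an empty schema no longer raises `ValueError: max() arg is an empty sequence`:

```python
def header_width(names):
    # default= keeps max() from raising ValueError when names is empty.
    return 2 + max(map(len, names), default=0)


print(header_width([]))            # 2
print(header_width(['a', 'col']))  # 5
```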
1 change: 1 addition & 0 deletions ibis/expr/tests/test_geospatial.py
@@ -45,6 +45,7 @@ def test_geo_ops_smoke(geo_table):

# test ops
point.srid()
point.set_srid(4326)
point.x()
point.y()

9 changes: 9 additions & 0 deletions ibis/expr/tests/test_schema.py
@@ -140,3 +140,12 @@ def test_schema_subset():

assert s1 >= s2
assert s2 <= s1


def test_empty_schema():
schema = ibis.schema([])
result = repr(schema)
expected = """\
ibis.Schema {
}"""
assert result == expected
31 changes: 31 additions & 0 deletions ibis/expr/tests/test_value_exprs.py
@@ -1540,3 +1540,34 @@ def test_chained_select_on_join():
expr1 = join["a", "b"]
expr2 = join.select(["a", "b"])
assert expr1.equals(expr2)


def test_repr_list_of_lists():
lit = ibis.literal([[1]])
result = repr(lit)
expected = """\
Literal[array<array<int8>>]
[[1]]"""
assert result == expected


def test_repr_list_of_lists_in_table():
t = ibis.table([('a', 'int64')], name='t')
lit = ibis.literal([[1]])
expr = t[t, lit.name('array_of_array')]
result = repr(expr)
expected = """\
ref_0
UnboundTable[table]
name: t
schema:
a : int64
Selection[table]
table:
Table: ref_0
selections:
Table: ref_0
array_of_array = Literal[array<array<int8>>]
[[1]]"""
assert result == expected
151 changes: 147 additions & 4 deletions ibis/expr/tests/test_window_functions.py
@@ -13,10 +13,12 @@
# limitations under the License.

import numpy as np
import pandas as pd
import pytest

import ibis
from ibis.expr.window import _determine_how
import ibis.common as com
from ibis.expr.window import _determine_how, rows_with_max_lookback
from ibis.tests.util import assert_equal


@@ -62,6 +64,55 @@ def test_combine_windows(alltypes):
with pytest.raises(ibis.common.IbisInputError):
w1.combine(w6)

w7 = ibis.trailing_window(
rows_with_max_lookback(3, ibis.interval(days=5))
)
w8 = ibis.trailing_window(
rows_with_max_lookback(5, ibis.interval(days=7))
)
w9 = w7.combine(w8)
expected = ibis.trailing_window(
rows_with_max_lookback(3, ibis.interval(days=5))
)
assert_equal(w9, expected)


def test_replace_window(alltypes):
t = alltypes
w1 = ibis.window(
preceding=5,
following=1,
group_by=t.a,
order_by=t.b
)
w2 = w1.group_by(t.c)
expected = ibis.window(
preceding=5,
following=1,
group_by=[t.a, t.c],
order_by=t.b
)
assert_equal(w2, expected)

w3 = w1.order_by(t.d)
expected = ibis.window(
preceding=5,
following=1,
group_by=t.a,
order_by=[t.b, t.d]
)
assert_equal(w3, expected)

w4 = ibis.trailing_window(
rows_with_max_lookback(3, ibis.interval(months=3))
)
w5 = w4.group_by(t.a)
expected = ibis.trailing_window(
rows_with_max_lookback(3, ibis.interval(months=3)),
group_by=t.a
)
assert_equal(w5, expected)


def test_over_auto_bind(alltypes):
# GH #542
@@ -173,9 +224,83 @@ def test_preceding_following_validate(alltypes):
case()


@pytest.mark.xfail(raises=AssertionError, reason='NYT')
def test_max_rows_with_lookback_validate(alltypes):
t = alltypes
mlb = rows_with_max_lookback(3, ibis.interval(days=5))
window = ibis.trailing_window(mlb, order_by=t.i)
t.f.lag().over(window)

window = ibis.trailing_window(mlb)
with pytest.raises(com.IbisInputError):
t.f.lag().over(window)

window = ibis.trailing_window(mlb, order_by=t.a)
with pytest.raises(com.IbisInputError):
t.f.lag().over(window)

window = ibis.trailing_window(mlb, order_by=[t.i, t.a])
with pytest.raises(com.IbisInputError):
t.f.lag().over(window)


def test_window_equals(alltypes):
assert False
t = alltypes
w1 = ibis.window(
preceding=1,
following=2,
group_by=t.a,
order_by=t.b
)
w2 = ibis.window(
preceding=1,
following=2,
group_by=t.a,
order_by=t.b
)
assert w1.equals(w2)

w3 = ibis.window(
preceding=1,
following=2,
group_by=t.a,
order_by=t.c
)
assert not w1.equals(w3)

w4 = ibis.range_window(
preceding=ibis.interval(hours=3),
group_by=t.d
)
w5 = ibis.range_window(
preceding=ibis.interval(hours=3),
group_by=t.d
)
assert w4.equals(w5)

w6 = ibis.range_window(
preceding=ibis.interval(hours=1),
group_by=t.d
)
assert not w4.equals(w6)

w7 = ibis.trailing_window(
rows_with_max_lookback(3, ibis.interval(days=5)),
group_by=t.a,
order_by=t.b
)
w8 = ibis.trailing_window(
rows_with_max_lookback(3, ibis.interval(days=5)),
group_by=t.a,
order_by=t.b
)
assert w7.equals(w8)

w9 = ibis.trailing_window(
rows_with_max_lookback(3, ibis.interval(months=5)),
group_by=t.a,
order_by=t.b
)
assert not w7.equals(w9)


def test_determine_how():
@@ -197,14 +322,32 @@ def test_determine_how():
how = _determine_how(ibis.interval(months=5) + ibis.interval(days=10))
assert how == 'range'

how = _determine_how(rows_with_max_lookback(3, ibis.interval(months=3)))
assert how == 'rows'

how = _determine_how(rows_with_max_lookback(3, pd.Timedelta(days=3)))
assert how == 'rows'

how = _determine_how(
rows_with_max_lookback(np.int64(7), ibis.interval(months=3))
)
assert how == 'rows'

with pytest.raises(TypeError):
_determine_how(8.9)

with pytest.raises(TypeError):
_determine_how('invalid preceding')

with pytest.raises(TypeError):
_determine_how({'start': 1, 'end': 2})
_determine_how({'rows': 1, 'max_lookback': 2})

with pytest.raises(TypeError):
_determine_how(
rows_with_max_lookback(
ibis.interval(days=3), ibis.interval(months=1)
)
)

with pytest.raises(TypeError):
_determine_how([3, 5])
86 changes: 74 additions & 12 deletions ibis/expr/window.py
@@ -1,6 +1,10 @@
"""Encapsulation of SQL window clauses."""

import functools
from typing import NamedTuple, Union

import numpy as np
import pandas as pd

import ibis.common as com
import ibis.expr.operations as ops
@@ -12,16 +16,14 @@ def _sequence_to_tuple(x):
return tuple(x) if util.is_iterable(x) else x


def _determine_how(preceding):
if isinstance(preceding, tuple):
start, end = preceding
if start is None:
offset_type = type(end)
else:
offset_type = type(start)
else:
offset_type = type(preceding)
RowsWithMaxLookback = NamedTuple('RowsWithMaxLookback',
[('rows', Union[int, np.integer]),
('max_lookback', ir.IntervalValue)]
)


def _determine_how(preceding):
offset_type = type(_get_preceding_value(preceding))
if issubclass(offset_type, (int, np.integer)):
how = 'rows'
elif issubclass(offset_type, ir.IntervalScalar):
@@ -31,10 +33,43 @@ def _determine_how(preceding):
'Type {} is not supported for row- or range- based trailing '
'window operations'.format(offset_type)
)

return how


@functools.singledispatch
def _get_preceding_value(preceding):
raise TypeError(
"Type {} is not a valid type for 'preceding' "
"parameter".format(type(preceding))
)


@_get_preceding_value.register(tuple)
def _get_preceding_value_tuple(preceding):
start, end = preceding
if start is None:
preceding_value = end
else:
preceding_value = start
return preceding_value


@_get_preceding_value.register(int)
@_get_preceding_value.register(np.integer)
@_get_preceding_value.register(ir.IntervalScalar)
def _get_preceding_value_simple(preceding):
return preceding


@_get_preceding_value.register(RowsWithMaxLookback)
def _get_preceding_value_mlb(preceding):
preceding_value = preceding.rows
if not isinstance(preceding_value, (int, np.integer)):
raise TypeError("'Rows with max look-back' only supports integer "
"row-based indexing.")
return preceding_value
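`_get_preceding_value` uses `functools.singledispatch` to pick an implementation from the runtime type of `preceding`. A runnable miniature of the same dispatch structure (simplified: the interval and `RowsWithMaxLookback` cases are omitted):

```python
import functools


@functools.singledispatch
def get_preceding_value(preceding):
    # Fallback for unregistered types, as in the real dispatcher.
    raise TypeError(
        "Type {} is not a valid type for 'preceding'".format(type(preceding))
    )


@get_preceding_value.register(tuple)
def _get_preceding_value_tuple(preceding):
    start, end = preceding
    return end if start is None else start


@get_preceding_value.register(int)
def _get_preceding_value_int(preceding):
    return preceding


print(get_preceding_value((None, 5)))  # 5
print(get_preceding_value((2, 5)))     # 2
print(get_preceding_value(3))          # 3
```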


class Window:
"""Class to encapsulate the details of a window frame.
@@ -53,6 +88,7 @@ def __init__(
order_by=None,
preceding=None,
following=None,
max_lookback=None,
how='rows',
):
if group_by is None:
@@ -71,7 +107,13 @@ def __init__(
x = ops.SortKey(x).to_expr()
self._order_by.append(x)

self.preceding = _sequence_to_tuple(preceding)
if isinstance(preceding, RowsWithMaxLookback):
self.preceding = preceding.rows
self.max_lookback = preceding.max_lookback
else:
self.preceding = _sequence_to_tuple(preceding)
self.max_lookback = max_lookback

self.following = _sequence_to_tuple(following)
self.how = how
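`RowsWithMaxLookback` is a typed `NamedTuple`, so `Window.__init__` can unpack it by field name while it still behaves as a plain tuple. A sketch using the same functional syntax as the diff (the `max_lookback` field is typed loosely here; in ibis it is an `ir.IntervalValue`):

```python
from typing import NamedTuple, Union

# Functional NamedTuple syntax, mirroring the diff; max_lookback is typed
# loosely because ibis' IntervalValue is not available in this sketch.
RowsWithMaxLookback = NamedTuple(
    'RowsWithMaxLookback',
    [('rows', Union[int, float]), ('max_lookback', object)],
)

mlb = RowsWithMaxLookback(rows=3, max_lookback='5 days')

# Fields are readable by name, but the object is still a plain tuple.
print(mlb.rows)                # 3
print(tuple(mlb))              # (3, '5 days')
print(isinstance(mlb, tuple))  # True
```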

@@ -163,6 +205,14 @@ def _validate_frame(self):
"'how' must be 'rows' or 'range', got {}".format(self.how)
)

if self.max_lookback is not None:
if not isinstance(
self.max_lookback, (ir.IntervalValue, pd.Timedelta)):
raise com.IbisInputError(
"'max_lookback' must be specified as an interval "
"or pandas.Timedelta object"
)

def bind(self, table):
# Internal API, ensure that any unresolved expr references (as strings,
# say) are bound to the table being windowed
@@ -178,9 +228,11 @@ def combine(self, window):
"Expecting '{}' Window, got '{}'"
).format(self.how.upper(), window.how.upper())
)
mlb = self.max_lookback
kwds = dict(
preceding=self.preceding or window.preceding,
following=self.following or window.following,
max_lookback=mlb if mlb is not None else window.max_lookback,
group_by=self._group_by + window._group_by,
order_by=self._order_by + window._order_by,
)
@@ -196,6 +248,7 @@ def _replace(self, **kwds):
order_by=kwds.get('order_by', self._order_by),
preceding=kwds.get('preceding', self.preceding),
following=kwds.get('following', self.following),
max_lookback=kwds.get('max_lookback', self.max_lookback),
how=kwds.get('how', self.how),
)
return Window(**new_kwds)
@@ -235,11 +288,20 @@ def equals(self, other, cache=None):

equal = ops.all_equal(
self.preceding, other.preceding, cache=cache
) and ops.all_equal(self.following, other.following, cache=cache)
) and ops.all_equal(
self.following, other.following, cache=cache
) and ops.all_equal(
self.max_lookback, other.max_lookback, cache=cache
)
cache[self, other] = equal
return equal


def rows_with_max_lookback(rows, max_lookback):
"""Create a bound preceding value for use with trailing window functions"""
return RowsWithMaxLookback(rows, max_lookback)


def window(preceding=None, following=None, group_by=None, order_by=None):
"""Create a window clause for use with window functions.
14 changes: 3 additions & 11 deletions ibis/file/client.py
@@ -2,8 +2,7 @@

import ibis
import ibis.expr.types as ir
from ibis.pandas.core import execute
from ibis.pandas.dispatch import execute_last
from ibis.pandas.core import execute_and_reset


class FileClient(ibis.client.Client):
@@ -35,15 +34,8 @@ def database(self, name=None, path=None):
return FileDatabase(name, self, path=path)

def execute(self, expr, params=None, **kwargs): # noqa
assert isinstance(expr, ir.Expr), "Expected ir.Expr, got {}".format(
type(expr)
)
return execute_last(
expr.op(),
execute(expr, params=params, **kwargs),
params=params,
**kwargs,
)
assert isinstance(expr, ir.Expr)
return execute_and_reset(expr, params=params, **kwargs)

def list_tables(self, path=None):
raise NotImplementedError
3 changes: 3 additions & 0 deletions ibis/file/tests/test_csv.py
@@ -74,6 +74,9 @@ def test_read(csv, data):
expected['time'] = expected['time'].astype(str)
tm.assert_frame_equal(result, expected)

result = closes.execute()
tm.assert_frame_equal(result, expected)


def test_read_with_projection(csv2, data):

3 changes: 3 additions & 0 deletions ibis/file/tests/test_hdf5.py
@@ -90,6 +90,9 @@ def test_read(hdf, data):
expected = data['close']
tm.assert_frame_equal(result, expected)

result = closes.execute()
tm.assert_frame_equal(result, expected)


def test_insert(transformed, tmpdir):

4 changes: 3 additions & 1 deletion ibis/file/tests/test_parquet.py
@@ -4,7 +4,6 @@
from pandas.util import testing as tm

import ibis

from ibis.file.client import FileDatabase

pa = pytest.importorskip('pyarrow') # isort:skip
@@ -87,6 +86,9 @@ def test_read(parquet, data):
expected = data['close']
tm.assert_frame_equal(result, expected)

result = closes.execute()
tm.assert_frame_equal(result, expected)


def test_write(transformed, tmpdir):
t = transformed
4 changes: 4 additions & 0 deletions ibis/impala/compiler.py
@@ -334,6 +334,10 @@ def _window(translator, expr):
def _format_window(translator, op, window):
components = []

if window.max_lookback is not None:
raise NotImplementedError('Rows with max lookback is not implemented '
'for Impala-based backends.')

if len(window._group_by) > 0:
partition_args = [translator.translate(x) for x in window._group_by]
components.append('PARTITION BY {}'.format(', '.join(partition_args)))
12 changes: 11 additions & 1 deletion ibis/impala/tests/test_window.py
@@ -3,6 +3,7 @@
import ibis
import ibis.common as com
from ibis import window
from ibis.expr.window import rows_with_max_lookback
from ibis.impala.compiler import to_sql # noqa: E402
from ibis.tests.util import assert_equal

@@ -106,7 +107,7 @@ def test_add_default_order_by(alltypes):
(
ibis.trailing_window(10),
'rows between 10 preceding and current row',
),
)
],
)
def test_window_frame_specs(con, window, frame):
@@ -122,6 +123,15 @@ def test_window_frame_specs(con, window, frame):
assert_sql_equal(expr, expected)


def test_window_rows_with_max_lookback(con):
t = con.table('alltypes')
mlb = rows_with_max_lookback(3, ibis.interval(days=3))
w = ibis.trailing_window(mlb, order_by=t.i)
expr = t.a.sum().over(w)
with pytest.raises(NotImplementedError):
to_sql(expr)


@pytest.mark.parametrize(
('cumulative', 'static'),
[
14 changes: 13 additions & 1 deletion ibis/mapd/operations.py
@@ -54,8 +54,18 @@ def _cast(translator, expr):
op = expr.op()
arg, target = op.args
arg_ = translator.translate(arg)
type_ = str(MapDDataType.from_ibis(target, nullable=False))

if isinstance(arg, ir.GeoSpatialValue):
# NOTE: CastToGeography expects geometry with SRID=4326
type_ = target.geotype.upper()

if type_ == 'GEOMETRY':
raise com.UnsupportedOperationError(
'OmniSciDB/MapD does not yet support casting '
'from GEOGRAPHY to GEOMETRY.'
)
else:
type_ = str(MapDDataType.from_ibis(target, nullable=False))
return 'CAST({0!s} AS {1!s})'.format(arg_, type_)
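The `_cast` change branches on whether the cast argument is geospatial: in that case the target's geotype name becomes the SQL type, and casts to `GEOMETRY` are rejected. A standalone sketch of that branch (the function name and error type are illustrative; the real code raises `com.UnsupportedOperationError`):

```python
def translate_cast(arg_sql, geotype=None, sql_type=None):
    # geotype is set when the cast argument is a geospatial value; then the
    # target's geotype name is used. Otherwise sql_type carries the ordinary
    # MapD type name.
    if geotype is not None:
        type_ = geotype.upper()
        if type_ == 'GEOMETRY':
            # Stand-in for com.UnsupportedOperationError in the real backend.
            raise NotImplementedError(
                'GEOGRAPHY to GEOMETRY casts are not supported'
            )
    else:
        type_ = sql_type
    return 'CAST({} AS {})'.format(arg_sql, type_)


print(translate_cast('"geo"', geotype='geography'))  # CAST("geo" AS GEOGRAPHY)
print(translate_cast('x', sql_type='BIGINT'))        # CAST(x AS BIGINT)
```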


@@ -812,6 +822,8 @@ def formatter(translator, expr):
ops.GeoNPoints: unary('ST_NPOINTS'),
ops.GeoNRings: unary('ST_NRINGS'),
ops.GeoSRID: unary('ST_SRID'),
ops.GeoTransform: fixed_arity('ST_TRANSFORM', 2),
ops.GeoSetSRID: fixed_arity('ST_SETSRID', 2)
}

# STRING
7 changes: 2 additions & 5 deletions ibis/pandas/client.py
@@ -20,8 +20,7 @@
import ibis.expr.schema as sch
import ibis.expr.types as ir
from ibis.compat import CategoricalDtype, DatetimeTZDtype
from ibis.pandas.core import execute
from ibis.pandas.dispatch import execute_last
from ibis.pandas.core import execute_and_reset

try:
infer_pandas_dtype = pd.api.types.infer_dtype
@@ -370,9 +369,7 @@ def execute(self, query, params=None, limit='default', **kwargs):
type(query).__name__
)
)
result = execute(query, params=params, **kwargs)
query_op = query.op()
return execute_last(query_op, result, params=params, **kwargs)
return execute_and_reset(query, params=params, **kwargs)

def compile(self, expr, *args, **kwargs):
"""Compile `expr`.
431 changes: 242 additions & 189 deletions ibis/pandas/core.py

Large diffs are not rendered by default.

48 changes: 1 addition & 47 deletions ibis/pandas/dispatch.py
@@ -2,7 +2,6 @@

from functools import partial

import pandas as pd
import toolz
from multipledispatch import Dispatcher

@@ -34,8 +33,7 @@ def execute_node_without_scope(node, **kwargs):
pre_execute = Dispatcher(
'pre_execute',
doc="""\
Given a node and zero or more clients, compute a partial scope prior to
execution.
Given a node, compute a (possibly partial) scope prior to standard execution.
Notes
-----
@@ -61,25 +59,6 @@ def pre_execute_multiple_clients(node, *clients, scope=None, **kwargs):
)


execute_first = Dispatcher(
"execute_first", doc="Execute code before any nodes have been evaluated."
)


@execute_first.register(ops.Node)
@execute_first.register(ops.Node, ibis.client.Client)
def execute_first_default(node, *clients, **kwargs):
return {}


@execute_first.register(ops.Node, [ibis.client.Client])
def execute_first_multiple_clients(node, *clients, scope=None, **kwargs):
return toolz.merge(
scope,
*map(partial(execute_first, node, scope=scope, **kwargs), clients),
)


execute_literal = Dispatcher(
'execute_literal',
doc="""\
@@ -119,29 +98,4 @@ def post_execute_default(op, data, **kwargs):
return data


execute_last = Dispatcher(
"execute_last", doc="Execute code after all nodes have been evaluated."
)


@execute_last.register(ops.Node, object)
def execute_last_default(_, result, **kwargs):
"""Return the input result."""
return result


@execute_last.register(ops.Node, pd.DataFrame)
def execute_last_dataframe(op, result, **kwargs):
"""Reset the `result` :class:`~pandas.DataFrame`."""
schema = op.to_expr().schema()
df = result.reset_index()
return df.loc[:, schema.names]


@execute_last.register(ops.Node, pd.Series)
def execute_last_series(_, result, **kwargs):
"""Reset the `result` :class:`~pandas.Series`."""
return result.reset_index(drop=True)


execute = Dispatcher("execute")
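The execute_first and execute_last hooks removed above are folded into execute_and_reset in ibis.pandas.core. A simplified, hypothetical sketch of the index-reset step it absorbs (names here are illustrative, not the actual implementation):

```python
import pandas as pd

def reset_result(result, schema_names=None):
    # Hypothetical sketch mirroring the removed execute_last behavior:
    # DataFrames get their index folded back into columns (then reordered
    # to the schema), Series are re-indexed from zero, scalars pass through.
    if isinstance(result, pd.DataFrame):
        df = result.reset_index()
        return df.loc[:, schema_names] if schema_names else df
    if isinstance(result, pd.Series):
        return result.reset_index(drop=True)
    return result

series = pd.Series([1, 2], index=[10, 20])
reset = reset_result(series)
```

Collapsing the hooks into a single entry point means backends no longer have to dispatch on result type themselves.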
29 changes: 28 additions & 1 deletion ibis/pandas/execution/tests/test_window.py
@@ -6,7 +6,9 @@
from pandas.util import testing as tm

import ibis
import ibis.common as com
import ibis.expr.operations as ops
from ibis.expr.window import rows_with_max_lookback
from ibis.pandas.dispatch import pre_execute

execute = ibis.pandas.execute
@@ -451,6 +453,31 @@ def test_window_with_preceding_expr():
tm.assert_series_equal(result, expected)


def test_window_with_mlb():
index = pd.date_range('20170501', '20170507')
data = np.random.randn(len(index), 3)
df = (pd.DataFrame(data, columns=list('abc'), index=index)
.rename_axis('time').reset_index(drop=False))
client = ibis.pandas.connect({'df': df})
t = client.table('df')
rows_with_mlb = rows_with_max_lookback(5, ibis.interval(days=10))
expr = t.mutate(
sum=lambda df: df.a.sum().over(
ibis.trailing_window(rows_with_mlb, order_by='time')
)
)
with pytest.raises(NotImplementedError):
expr.execute()

rows_with_mlb = rows_with_max_lookback(5, 10)
with pytest.raises(com.IbisInputError):
t.mutate(
sum=lambda df: df.a.sum().over(
ibis.trailing_window(rows_with_mlb, order_by='time')
)
)


def test_window_has_pre_execute_scope():
signature = ops.Lag, ibis.pandas.PandasClient
called = [0]
@@ -472,4 +499,4 @@ def test_pre_execute(op, client, **kwargs):
# once in window op at the top to pickup any scope changes before computing
# twice in window op when calling execute on the ops.Lag node at the
# beginning of execute and once before the actual computation
assert called[0] == 2
assert called[0] == 3
16 changes: 4 additions & 12 deletions ibis/pandas/execution/window.py
@@ -90,6 +90,10 @@ def execute_window_op(
**kwargs,
)

if window.max_lookback is not None:
raise NotImplementedError('Rows with max lookback is not implemented '
                          'for the pandas backend.')

following = window.following
order_by = window._order_by

@@ -147,18 +151,6 @@ def execute_window_op(
factory=OrderedDict,
)

# operand inputs are coming in computed, but we need to recompute them in
# the case of a group by
if group_by:
operand_inputs = {
arg.op() for arg in operand.op().inputs if hasattr(arg, "op")
}
new_scope = OrderedDict(
(node, value)
for node, value in new_scope.items()
if node not in operand_inputs
)

# figure out what the dtype of the operand is
operand_type = operand.type()
operand_dtype = operand_type.to_pandas()
112 changes: 110 additions & 2 deletions ibis/sql/alchemy.py
@@ -1,6 +1,7 @@
import contextlib
import functools
import operator
import sys

import pandas as pd
import sqlalchemy as sa
@@ -25,6 +26,19 @@
from ibis.client import Database, Query, SQLClient
from ibis.sql.compiler import Dialect, Select, TableSetFormatter, Union

# Don't support geospatial operations on Python 3.5
geospatial_supported = False
if sys.version_info >= (3, 6):
try:
import geoalchemy2 as ga
import geoalchemy2.shape as shape
import geopandas

geospatial_supported = True
except ImportError:
pass


# TODO(cleanup)
_ibis_type_to_sqla = {
dt.Null: sa.types.NullType,
@@ -64,6 +78,15 @@ def _to_sqla_type(itype, type_map=None):
)
)
return sa.ARRAY(_to_sqla_type(ibis_type, type_map=type_map))
elif geospatial_supported and isinstance(itype, dt.GeoSpatial):
if itype.geotype == 'geometry':
return ga.Geometry
elif itype.geotype == 'geography':
return ga.Geography
else:
raise TypeError(
'Unexpected geospatial geotype {}'.format(itype.geotype)
)
else:
return type_map[type(itype)]

@@ -114,6 +137,23 @@ def sa_double(_, satype, nullable=True):
return dt.Double(nullable=nullable)


if geospatial_supported:

@dt.dtype.register(SQLAlchemyDialect, ga.Geometry)
def ga_geometry(_, gatype, nullable=True):
t = gatype.geometry_type
if t == 'POINT':
return dt.Point(nullable=nullable)
if t == 'LINESTRING':
return dt.LineString(nullable=nullable)
if t == 'POLYGON':
return dt.Polygon(nullable=nullable)
if t == 'MULTIPOLYGON':
return dt.MultiPolygon(nullable=nullable)
else:
raise ValueError("Unrecognized geometry type: {}".format(t))


POSTGRES_FIELD_TO_IBIS_UNIT = {
"YEAR": "Y",
"MONTH": "M",
@@ -524,6 +564,10 @@ def _window(t, expr):
arg = _cumulative_to_window(t, arg, window)
return t.translate(arg)

if window.max_lookback is not None:
raise NotImplementedError('Rows with max lookback is not implemented '
'for SQLAlchemy-based backends.')

# Some analytic functions need to have the expression of interest in
# the ORDER BY part of the window clause
if isinstance(window_op, _require_order_by) and not window._order_by:
@@ -694,6 +738,56 @@ def _ntile(t, expr):
ops.CumulativeMean: unary(sa.func.avg),
}

if geospatial_supported:
_geospatial_functions = {
ops.GeoArea: unary(sa.func.ST_Area),
ops.GeoAsBinary: unary(sa.func.ST_AsBinary),
ops.GeoAsEWKB: unary(sa.func.ST_AsEWKB),
ops.GeoAsEWKT: unary(sa.func.ST_AsEWKT),
ops.GeoAsText: unary(sa.func.ST_AsText),
ops.GeoAzimuth: fixed_arity(sa.func.ST_Azimuth, 2),
ops.GeoBuffer: fixed_arity(sa.func.ST_Buffer, 2),
ops.GeoCentroid: unary(sa.func.ST_Centroid),
ops.GeoContains: fixed_arity(sa.func.ST_Contains, 2),
ops.GeoContainsProperly: fixed_arity(sa.func.ST_Contains, 2),
ops.GeoCovers: fixed_arity(sa.func.ST_Covers, 2),
ops.GeoCoveredBy: fixed_arity(sa.func.ST_CoveredBy, 2),
ops.GeoCrosses: fixed_arity(sa.func.ST_Crosses, 2),
ops.GeoDFullyWithin: fixed_arity(sa.func.ST_DFullyWithin, 3),
ops.GeoDifference: fixed_arity(sa.func.ST_Difference, 2),
ops.GeoDisjoint: fixed_arity(sa.func.ST_Disjoint, 2),
ops.GeoDistance: fixed_arity(sa.func.ST_Distance, 2),
ops.GeoDWithin: fixed_arity(sa.func.ST_DWithin, 3),
ops.GeoEnvelope: unary(sa.func.ST_Envelope),
ops.GeoEquals: fixed_arity(sa.func.ST_Equals, 2),
ops.GeoIntersection: fixed_arity(sa.func.ST_Intersection, 2),
ops.GeoIntersects: fixed_arity(sa.func.ST_Intersects, 2),
ops.GeoLength: unary(sa.func.ST_Length),
ops.GeoNPoints: unary(sa.func.ST_NPoints),
ops.GeoOverlaps: fixed_arity(sa.func.ST_Overlaps, 2),
ops.GeoPerimeter: unary(sa.func.ST_Perimeter),
ops.GeoSimplify: fixed_arity(sa.func.ST_Simplify, 3),
ops.GeoSRID: unary(sa.func.ST_SRID),
ops.GeoTouches: fixed_arity(sa.func.ST_Touches, 2),
ops.GeoTransform: fixed_arity(sa.func.ST_Transform, 2),
ops.GeoWithin: fixed_arity(sa.func.ST_Within, 2),
ops.GeoX: unary(sa.func.ST_X),
ops.GeoY: unary(sa.func.ST_Y),
# Missing casts:
# ST_As_GML
# ST_As_GeoJSON
# ST_As_KML
# ST_As_Raster
# ST_As_SVG
# ST_As_TWKB
# Missing Geometric ops:
# ST_Distance_Sphere
# ST_Dump
# ST_DumpPoints
}

_operation_registry.update(_geospatial_functions)
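Each entry above maps an ibis geospatial op to a SQLAlchemy function with a fixed operand count. A simplified, hypothetical illustration of the fixed_arity pattern (the real ibis helper also translates expressions into SQLAlchemy clauses):

```python
def fixed_arity(func, arity):
    """Hypothetical sketch: wrap `func`, enforcing an exact operand count."""
    def wrapper(*args):
        if len(args) != arity:
            raise TypeError(
                'expected {} arguments, got {}'.format(arity, len(args))
            )
        return func(*args)
    return wrapper

# a stand-in for a two-argument spatial function such as ST_Distance
distance = fixed_arity(lambda a, b: abs(a - b), 2)
```

Enforcing arity at registration time surfaces mismatched operand counts as clear errors instead of malformed SQL.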


for _k, _v in _binary_ops.items():
_operation_registry[_k] = fixed_arity(_v, 2)
@@ -893,7 +987,20 @@ def _fetch(self, cursor):
columns=cursor.proxy.keys(),
coerce_float=True,
)
return self.schema().apply_to(df)
df = self.schema().apply_to(df)
# If the dataframe has contents and we support geospatial operations,
# convert the dataframe into a GeoDataFrame with shapely geometries.
if len(df) and geospatial_supported:
geom_col = None
for name, dtype in self.schema().items():
if isinstance(dtype, dt.GeoSpatial):
geom_col = geom_col or name
df[name] = df.apply(
lambda x: shape.to_shape(x[name]), axis=1
)
if geom_col:
df = geopandas.GeoDataFrame(df, geometry=geom_col)
return df


class AlchemyDialect(Dialect):
@@ -1021,7 +1128,8 @@ def list_tables(self, like=None, database=None, schema=None):
tables : list of strings
"""
inspector = self.inspector
names = inspector.get_table_names(schema=schema)
# the inspector returns a mutable list of names, so copy it before extending.
names = inspector.get_table_names(schema=schema).copy()
names.extend(inspector.get_view_names(schema=schema))
if like is not None:
names = [x for x in names if like in x]
10 changes: 10 additions & 0 deletions ibis/sql/postgres/tests/conftest.py
@@ -57,11 +57,21 @@ def alltypes(db):
return db.functional_alltypes


@pytest.fixture(scope='module')
def geotable(con):
return con.table('geo')


@pytest.fixture(scope='module')
def df(alltypes):
return alltypes.execute()


@pytest.fixture(scope='module')
def gdf(geotable):
return geotable.execute()


@pytest.fixture(scope='module')
def at(alltypes):
return alltypes.op().sqla_table
12 changes: 12 additions & 0 deletions ibis/sql/postgres/tests/test_functions.py
@@ -15,6 +15,7 @@
import ibis.expr.datatypes as dt
import ibis.expr.types as ir
from ibis import literal as L
from ibis.expr.window import rows_with_max_lookback

sa = pytest.importorskip('sqlalchemy')
pytest.importorskip('psycopg2')
@@ -886,6 +887,17 @@ def test_rolling_window(alltypes, func, df):
tm.assert_series_equal(result, expected)


def test_rolling_window_with_mlb(alltypes):
t = alltypes
window = ibis.trailing_window(
preceding=rows_with_max_lookback(3, ibis.interval(days=5)),
order_by=t.timestamp_col
)
expr = t['double_col'].sum().over(window)
with pytest.raises(NotImplementedError):
expr.execute()


@pytest.mark.parametrize('func', ['mean', 'sum', 'min', 'max'])
def test_partitioned_window(alltypes, func, df):
t = alltypes
229 changes: 229 additions & 0 deletions ibis/sql/postgres/tests/test_postgis.py
@@ -0,0 +1,229 @@
import pandas.util.testing as tm
import pytest

gp = pytest.importorskip('geopandas')
sa = pytest.importorskip('sqlalchemy')
pytest.importorskip('psycopg2')

pytestmark = pytest.mark.postgis


def test_load_geodata(con):
t = con.table('geo')
result = t.execute()
assert isinstance(result, gp.GeoDataFrame)


def test_empty_select(geotable):
expr = geotable[geotable.geo_point.equals(geotable.geo_linestring)]
result = expr.execute()
assert len(result) == 0


def test_select_point_geodata(geotable):
expr = geotable['geo_point']
sqla_expr = expr.compile()
compiled = str(sqla_expr.compile(compile_kwargs=dict(literal_binds=True)))
expected = "SELECT ST_AsEWKB(t0.geo_point) AS geo_point \nFROM geo AS t0"
assert compiled == expected
data = expr.execute()
assert data.geom_type.iloc[0] == 'Point'


def test_select_linestring_geodata(geotable):
expr = geotable['geo_linestring']
sqla_expr = expr.compile()
compiled = str(sqla_expr.compile(compile_kwargs=dict(literal_binds=True)))
expected = (
"SELECT ST_AsEWKB(t0.geo_linestring) AS geo_linestring \n"
"FROM geo AS t0"
)
assert compiled == expected
data = expr.execute()
assert data.geom_type.iloc[0] == 'LineString'


def test_select_polygon_geodata(geotable):
expr = geotable['geo_polygon']
sqla_expr = expr.compile()
compiled = str(sqla_expr.compile(compile_kwargs=dict(literal_binds=True)))
expected = (
"SELECT ST_AsEWKB(t0.geo_polygon) AS geo_polygon \n"
"FROM geo AS t0"
)
assert compiled == expected
data = expr.execute()
assert data.geom_type.iloc[0] == 'Polygon'


def test_select_multipolygon_geodata(geotable):
expr = geotable['geo_multipolygon']
sqla_expr = expr.compile()
compiled = str(sqla_expr.compile(compile_kwargs=dict(literal_binds=True)))
expected = (
"SELECT ST_AsEWKB(t0.geo_multipolygon) AS geo_multipolygon \n"
"FROM geo AS t0"
)
assert compiled == expected
data = expr.execute()
assert data.geom_type.iloc[0] == 'MultiPolygon'


def test_geo_area(geotable, gdf):
expr = geotable.geo_multipolygon.area()
result = expr.execute()
expected = gp.GeoSeries(gdf.geo_multipolygon).area
tm.assert_series_equal(result, expected, check_names=False)


def test_geo_buffer(geotable, gdf):
expr = geotable.geo_linestring.buffer(1.0)
result = expr.execute()
expected = gp.GeoSeries(gdf.geo_linestring).buffer(1.0)
tm.assert_series_equal(
result.area, expected.area, check_names=False, check_less_precise=2
)


def test_geo_contains(geotable):
expr = geotable.geo_point.buffer(1.0).contains(geotable.geo_point)
assert expr.execute().all()


def test_geo_contains_properly(geotable):
expr = geotable.geo_point.buffer(1.0).contains_properly(geotable.geo_point)
assert expr.execute().all()


def test_geo_covers(geotable):
expr = geotable.geo_point.buffer(1.0).covers(geotable.geo_point)
assert expr.execute().all()


def test_geo_covered_by(geotable):
expr = geotable.geo_point.covered_by(geotable.geo_point.buffer(1.0))
assert expr.execute().all()


def test_geo_d_fully_within(geotable):
expr = geotable.geo_point.d_fully_within(
geotable.geo_point.buffer(1.0), 2.0
)
assert expr.execute().all()


def test_geo_d_within(geotable):
expr = geotable.geo_point.d_within(geotable.geo_point.buffer(1.0), 1.0)
assert expr.execute().all()


def test_geo_envelope(geotable, gdf):
expr = geotable.geo_linestring.buffer(1.0).envelope()
result = expr.execute()
expected = gp.GeoSeries(gdf.geo_linestring).buffer(1.0).envelope
tm.assert_series_equal(result.area, expected.area, check_names=False)


def test_geo_within(geotable):
expr = geotable.geo_point.within(geotable.geo_point.buffer(1.0))
assert expr.execute().all()


def test_geo_disjoint(geotable):
expr = geotable.geo_point.disjoint(geotable.geo_point)
assert not expr.execute().any()


def test_geo_equals(geotable):
expr = geotable.geo_point.equals(geotable.geo_point)
assert expr.execute().all()


def test_geo_intersects(geotable):
expr = geotable.geo_point.intersects(geotable.geo_point.buffer(1.0))
assert expr.execute().all()


def test_geo_overlaps(geotable):
expr = geotable.geo_point.overlaps(geotable.geo_point.buffer(1.0))
assert not expr.execute().any()


def test_geo_touches(geotable):
expr = geotable.geo_point.touches(geotable.geo_linestring)
assert expr.execute().all()


def test_geo_distance(geotable, gdf):
expr = geotable.geo_point.distance(geotable.geo_multipolygon.centroid())
result = expr.execute()
expected = gdf.geo_point.distance(
gp.GeoSeries(gdf.geo_multipolygon).centroid
)
tm.assert_series_equal(result, expected, check_names=False)


def test_geo_length(geotable, gdf):
expr = geotable.geo_linestring.length()
result = expr.execute()
expected = gp.GeoSeries(gdf.geo_linestring).length
tm.assert_series_equal(result, expected, check_names=False)


def test_geo_n_points(geotable):
expr = geotable.geo_linestring.n_points()
result = expr.execute()
assert (result == 2).all()


def test_geo_perimeter(geotable):
expr = geotable.geo_multipolygon.perimeter()
result = expr.execute()
# Geopandas doesn't implement perimeter, so we do a simpler check.
assert (result > 0.0).all()


def test_geo_srid(geotable):
expr = geotable.geo_linestring.srid()
result = expr.execute()
assert (result == 4326).all()


def test_geo_difference(geotable, gdf):
expr = geotable.geo_linestring.buffer(1.0).difference(
geotable.geo_point.buffer(0.5)
).area()
result = expr.execute()
expected = gp.GeoSeries(gdf.geo_linestring).buffer(1.0).difference(
gp.GeoSeries(gdf.geo_point).buffer(0.5)
).area
tm.assert_series_equal(
result, expected, check_names=False, check_less_precise=2
)


def test_geo_intersection(geotable, gdf):
expr = geotable.geo_linestring.buffer(1.0).intersection(
geotable.geo_point.buffer(0.5)
).area()
result = expr.execute()
expected = gp.GeoSeries(gdf.geo_linestring).buffer(1.0).intersection(
gp.GeoSeries(gdf.geo_point).buffer(0.5)
).area
tm.assert_series_equal(
result, expected, check_names=False, check_less_precise=2
)


def test_geo_x(geotable, gdf):
expr = geotable.geo_point.x()
result = expr.execute()
expected = gp.GeoSeries(gdf.geo_point).x
tm.assert_series_equal(result, expected, check_names=False)


def test_geo_y(geotable, gdf):
expr = geotable.geo_point.y()
result = expr.execute()
expected = gp.GeoSeries(gdf.geo_point).y
tm.assert_series_equal(result, expected, check_names=False)
8 changes: 4 additions & 4 deletions ibis/tests/all/conftest.py
@@ -135,7 +135,7 @@ def con(backend):

@pytest.fixture(scope='session')
def alltypes(backend):
return backend.functional_alltypes()
return backend.functional_alltypes


@pytest.fixture(scope='session')
@@ -145,17 +145,17 @@ def sorted_alltypes(alltypes):

@pytest.fixture(scope='session')
def batting(backend):
return backend.batting()
return backend.batting


@pytest.fixture(scope='session')
def awards_players(backend):
return backend.awards_players()
return backend.awards_players


@pytest.fixture(scope='session')
def geo(backend):
return backend.geo()
return backend.geo


@pytest.fixture
4 changes: 1 addition & 3 deletions ibis/tests/all/test_client.py
@@ -68,9 +68,7 @@ def test_query_schema(backend, con, alltypes, expr_fn, expected):
@pytest.mark.xfail_unsupported
def test_sql(backend, con, alltypes, field):

if not hasattr(backend, 'sql') or not hasattr(
backend, '_get_schema_using_query'
):
if not hasattr(con, 'sql') or not hasattr(con, '_get_schema_using_query'):
pytest.skip('Backend {} does not support sql method'.format(backend))

result = con.sql(alltypes.compile())
128 changes: 117 additions & 11 deletions ibis/tests/all/test_geospatial.py
@@ -1,6 +1,7 @@
"""Tests for geospatial data types."""
from inspect import isfunction

import numpy as np
import pytest
from numpy import testing

@@ -11,6 +12,8 @@
point_0 = ibis.literal((0, 0), type='point:geometry').name('p')
point_1 = ibis.literal((1, 1), type='point:geometry').name('p')
point_2 = ibis.literal((2, 2), type='point;4326:geometry').name('p')
point_3 = ibis.literal((1, 1), type='point:geography').name('p')
point_4 = ibis.literal((2, 2), type='point;4326:geography').name('p')
polygon_0 = ibis.literal(
(
((1, 0), (0, 1), (-1, 0), (0, -1), (1, 0)),
@@ -19,6 +22,9 @@
type='polygon',
)

# add here backends that support geospatial types
all_db_geo_supported = [MapD]


@pytest.mark.parametrize(
('expr_fn', 'expected'),
@@ -36,7 +42,7 @@
(lambda t: t['geo_point'].srid(), [0] * 5),
],
)
@pytest.mark.only_on_backends([MapD])
@pytest.mark.only_on_backends(all_db_geo_supported)
def test_geo_spatial_unops(backend, geo, expr_fn, expected):
"""Testing for geo spatial unary operations."""
expr = expr_fn(geo)
@@ -67,7 +73,7 @@ def test_geo_spatial_unops(backend, geo, expr_fn, expected):
),
],
)
@pytest.mark.only_on_backends([MapD])
@pytest.mark.only_on_backends(all_db_geo_supported)
def test_geo_spatial_binops(backend, geo, fn, arg_left, arg_right, expected):
"""Testing for geo spatial binary operations."""
left = arg_left(geo) if isfunction(arg_left) else arg_left
@@ -85,7 +91,7 @@ def test_geo_spatial_binops(backend, geo, fn, arg_left, arg_right, expected):
(lambda t: t['geo_linestring'].start_point(), [False] * 5),
],
)
@pytest.mark.only_on_backends([MapD])
@pytest.mark.only_on_backends(all_db_geo_supported)
def test_get_point(backend, geo, expr_fn, expected):
"""Testing for geo spatial get point operations."""
    # a geospatial value does not contain its boundary
@@ -96,17 +102,117 @@ def test_get_point(backend, geo, expr_fn, expected):


@pytest.mark.parametrize(('arg', 'expected'), [(polygon_0, [1.98] * 5)])
@pytest.mark.only_on_backends([MapD])
@pytest.mark.only_on_backends(all_db_geo_supported)
def test_area(backend, geo, arg, expected):
"""Testing for geo spatial area operation."""
expr = geo[geo, arg.area().name('tmp')]
expr = geo[geo.id, arg.area().name('tmp')]
result = expr.execute()['tmp']
testing.assert_almost_equal(result, expected, decimal=2)


@pytest.mark.parametrize(('arg', 'expected'), [(point_2.srid() == 4326, True)])
@pytest.mark.only_on_backends([MapD])
def test_srid_literals(backend, geo, arg, expected):
"""Testing for geo spatial srid operation (from literal)."""
result = geo[geo, arg.name('tmp')].execute()['tmp'][[0]]
testing.assert_almost_equal(result, [expected], decimal=2)
@pytest.mark.parametrize(
('condition', 'expected'),
[
(lambda t: point_2.srid(), 4326),
(lambda t: point_0.srid(), 0),
(lambda t: t.geo_point.srid(), 0),
(lambda t: t.geo_linestring.srid(), 0),
(lambda t: t.geo_polygon.srid(), 0),
(lambda t: t.geo_multipolygon.srid(), 0),
]
)
@pytest.mark.only_on_backends(all_db_geo_supported)
def test_srid(backend, geo, condition, expected):
"""Testing for geo spatial srid operation."""
expr = geo[geo.id, condition(geo).name('tmp')]
result = expr.execute()['tmp'][[0]]
assert np.all(result == expected)


@pytest.mark.parametrize(
('condition', 'expected'),
[
(lambda t: point_0.set_srid(4326).srid(), 4326),
(lambda t: point_0.set_srid(4326).set_srid(0).srid(), 0),
(lambda t: t.geo_point.set_srid(4326).srid(), 4326),
(lambda t: t.geo_linestring.set_srid(4326).srid(), 4326),
(lambda t: t.geo_polygon.set_srid(4326).srid(), 4326),
(lambda t: t.geo_multipolygon.set_srid(4326).srid(), 4326),
]
)
@pytest.mark.only_on_backends(all_db_geo_supported)
def test_set_srid(backend, geo, condition, expected):
"""Testing for geo spatial set_srid operation."""
expr = geo[geo.id, condition(geo).name('tmp')]
result = expr.execute()['tmp'][[0]]
assert np.all(result == expected)


@pytest.mark.parametrize(
('condition', 'expected'),
[
(lambda t: point_0.set_srid(4326).transform(900913).srid(),
900913),
(lambda t: point_2.transform(900913).srid(),
900913),
(lambda t: t.geo_point.set_srid(4326).transform(900913).srid(),
900913),
(lambda t: t.geo_linestring.set_srid(4326).transform(900913).srid(),
900913),
(lambda t: t.geo_polygon.set_srid(4326).transform(900913).srid(),
900913),
(lambda t: t.geo_multipolygon.set_srid(4326).transform(900913).srid(),
900913),
]
)
@pytest.mark.only_on_backends(all_db_geo_supported)
@pytest.mark.xfail_unsupported
def test_transform(backend, geo, condition, expected):
"""Testing for geo spatial transform operation."""
expr = geo[geo.id, condition(geo).name('tmp')]
result = expr.execute()['tmp'][[0]]
assert np.all(result == expected)


@pytest.mark.parametrize(
'expr_fn',
[
lambda t: t.geo_point.set_srid(4326),
lambda t: point_0.set_srid(4326),
lambda t: point_1.set_srid(4326),
lambda t: point_2,
lambda t: point_3.set_srid(4326),
lambda t: point_4,
]
)
@pytest.mark.only_on_backends(all_db_geo_supported)
@pytest.mark.xfail_unsupported
def test_cast_geography(backend, geo, expr_fn):
    """Testing for geo spatial cast to geography."""
p = expr_fn(geo).cast('geography')
expr = geo[geo.id, p.distance(p).name('tmp')]
result = expr.execute()['tmp'][[0]]
    # distance from a point to the same point should be 0
assert np.all(result == 0)


@pytest.mark.parametrize(
'expr_fn',
[
lambda t: t.geo_point.set_srid(4326),
lambda t: point_0.set_srid(4326),
lambda t: point_1.set_srid(4326),
lambda t: point_2,
lambda t: point_3.set_srid(4326),
lambda t: point_4,
]
)
@pytest.mark.only_on_backends(all_db_geo_supported)
@pytest.mark.xfail_unsupported
def test_cast_geometry(backend, geo, expr_fn):
    """Testing for geo spatial cast to geometry."""
p = expr_fn(geo).cast('geometry')
expr = geo[geo.id, p.distance(p).name('tmp')]
result = expr.execute()['tmp'][[0]]
    # distance from a point to the same point should be 0
assert np.all(result == 0)
304 changes: 215 additions & 89 deletions ibis/tests/backends.py

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions setup.cfg
@@ -37,13 +37,15 @@ markers =
clickhouse
csv
hdfs
hdf5
impala
kudu
mapd
mysql
only_on_backends
pandas
parquet
postgis
postgresql
skip_backends
skip_missing_feature
7 changes: 6 additions & 1 deletion setup.py
@@ -40,6 +40,8 @@
else:
parquet_requires = ['pyarrow>=0.12.0']

geospatial_requires = ['geoalchemy2', 'geopandas']

all_requires = (
impala_requires
+ postgres_requires
@@ -51,6 +53,7 @@
+ bigquery_requires
+ hdf5_requires
+ parquet_requires
+ geospatial_requires
)

develop_requires = all_requires + [
@@ -59,7 +62,8 @@
'isort',
'mypy',
'pre-commit',
'pytest>=3',
'pygit2',
'pytest>=4.5',
]

install_requires = [
Expand Down Expand Up @@ -93,6 +97,7 @@
'bigquery': bigquery_requires,
'hdf5': hdf5_requires,
'parquet': parquet_requires,
'geospatial': geospatial_requires,
},
description="Productivity-centric Python Big Data Framework",
long_description=LONG_DESCRIPTION,