ENH: (optionally) use pygeos for vectorized GeometryArray operations #1154

Merged
merged 33 commits into from
Mar 24, 2020
e065828
POC: use pygeos for vectorized GeometryArray operations
jorisvandenbossche Oct 12, 2019
348cf67
more methods from pygeos
jorisvandenbossche Oct 22, 2019
f2a704e
Merge remote-tracking branch 'upstream/master' into pygeos
jorisvandenbossche Oct 31, 2019
2724fce
try adding pygeos to CI
jorisvandenbossche Oct 31, 2019
d98a39a
fix geocoding test
jorisvandenbossche Oct 31, 2019
f5ce717
add reference to shapely issue
jorisvandenbossche Oct 31, 2019
ec2b137
Merge remote-tracking branch 'upstream/master' into pygeos
jorisvandenbossche Nov 18, 2019
3f5aae0
correct is_simple test
jorisvandenbossche Nov 18, 2019
3ec46bf
use pygeos for bounds
jorisvandenbossche Nov 19, 2019
2556d59
Merge remote-tracking branch 'upstream/master' into pygeos
jorisvandenbossche Feb 18, 2020
02d303c
REF: split vectorized compat code into separate file
jorisvandenbossche Feb 18, 2020
87cab75
remove redundant from_shapely calls
jorisvandenbossche Feb 18, 2020
86d41c4
Test both with and without PyGEOS on Travis CI
jorisvandenbossche Feb 18, 2020
1b02211
avoid pygeos import when not available
jorisvandenbossche Feb 18, 2020
bcbf806
fix travis.yml syntax
jorisvandenbossche Feb 18, 2020
9ca8cc9
fix testing of is_simple
jorisvandenbossche Feb 18, 2020
7270845
fix test of exterior
jorisvandenbossche Feb 18, 2020
8d53086
fix unary geo ops tests
jorisvandenbossche Feb 18, 2020
1013ee8
Use vectorized transform for to_crs
jorisvandenbossche Feb 19, 2020
9055557
fixup usage of .array -> .values (for old pandas)
jorisvandenbossche Feb 19, 2020
c8e2080
make USE_PYGEOS constant switchable + add option
jorisvandenbossche Mar 8, 2020
6a044ff
check minimum pygeos version
jorisvandenbossche Mar 8, 2020
fd6bbb6
add docs on how to enable/disable the speedups
jorisvandenbossche Mar 13, 2020
611e50b
add pygeos to asv env
jorisvandenbossche Mar 21, 2020
ddbd020
Merge remote-tracking branch 'upstream/master' into pygeos
jorisvandenbossche Mar 21, 2020
01231be
update for feedback
jorisvandenbossche Mar 21, 2020
4aebcbb
optimize pygeos <-> shapely conversion
jorisvandenbossche Mar 21, 2020
0cfdb53
fix config tests
jorisvandenbossche Mar 21, 2020
02ea1ad
fix issue with pygeos and prepared geom in sjoin
jorisvandenbossche Mar 21, 2020
a48c5e0
fix performance degradation of sjoin when using pygeos
jorisvandenbossche Mar 21, 2020
c07efd2
fix travis.yml
jorisvandenbossche Mar 21, 2020
6f5a776
optimize geom_type
jorisvandenbossche Mar 23, 2020
12b8634
array -> values for old pandas
jorisvandenbossche Mar 23, 2020
19 changes: 11 additions & 8 deletions .travis.yml
@@ -8,18 +8,18 @@ matrix:
- env: ENV_FILE="ci/travis/35-minimal.yaml"

# Python 3.6 test all supported Pandas versions
- env: ENV_FILE="ci/travis/36-pd023.yaml"
- env: ENV_FILE="ci/travis/36-pd024.yaml"
- env: ENV_FILE="ci/travis/36-pd023.yaml" PYGEOS=true
- env: ENV_FILE="ci/travis/36-pd024.yaml" PYGEOS=true

- env: ENV_FILE="ci/travis/37-latest-defaults.yaml" STYLE=true
- env: ENV_FILE="ci/travis/37-latest-conda-forge.yaml"
- env: ENV_FILE="ci/travis/37-latest-defaults.yaml" STYLE=true PYGEOS=true
- env: ENV_FILE="ci/travis/37-latest-conda-forge.yaml" PYGEOS=true

- env: ENV_FILE="ci/travis/38-latest-conda-forge.yaml"
- env: ENV_FILE="ci/travis/38-latest-conda-forge.yaml" PYGEOS=true

- env: ENV_FILE="ci/travis/37-dev.yaml" DEV=true
- env: ENV_FILE="ci/travis/37-dev.yaml" DEV=true PYGEOS=true

allow_failures:
- env: ENV_FILE="ci/travis/37-dev.yaml" DEV=true
- env: ENV_FILE="ci/travis/37-dev.yaml" DEV=true PYGEOS=true

install:
# Install conda
@@ -46,7 +46,10 @@ install:
- python -c "import geopandas; geopandas.show_versions();"

script:
- py.test geopandas --cov geopandas -v --cov-report term-missing
- echo "Testing without PyGEOS"
- USE_PYGEOS=0 pytest geopandas --cov geopandas -v --cov-report term-missing
- if [ "$PYGEOS" ]; then echo "Testing with PyGEOS"; fi
- if [ "$PYGEOS" ]; then USE_PYGEOS=1 pytest geopandas --cov geopandas -v --cov-report term-missing; fi
- if [ "$STYLE" ]; then black --check geopandas; fi
- if [ "$STYLE" ]; then flake8 geopandas; fi

2 changes: 1 addition & 1 deletion asv.conf.json
@@ -55,7 +55,7 @@
"matrix": {
"pandas": [],
"shapely": [],
"cython": [],
"pygeos": [],
"fiona": [],
"pyproj": [],
"rtree": [],
2 changes: 2 additions & 0 deletions ci/travis/36-pd023.yaml
@@ -11,6 +11,7 @@ dependencies:
- gdal=2.3
- fiona
#- pyproj
- geos
# testing
- pytest
- pytest-cov
@@ -28,3 +29,4 @@ dependencies:
- pyproj==2.3.1
- geopy
- codecov
- git+https://github.com/pygeos/pygeos.git
2 changes: 2 additions & 0 deletions ci/travis/36-pd024.yaml
@@ -8,6 +8,7 @@ dependencies:
- shapely
- fiona=1.7
#- pyproj
- geos
# testing
- pytest
- pytest-cov
@@ -25,3 +26,4 @@ dependencies:
- codecov
- geopy
- mapclassify
- git+https://github.com/pygeos/pygeos.git
2 changes: 2 additions & 0 deletions ci/travis/37-dev.yaml
@@ -9,6 +9,7 @@ dependencies:
- shapely
- fiona
- pyproj
- geos
# testing
- pytest
- pytest-cov
@@ -25,3 +26,4 @@ dependencies:
- codecov
- geopy
- mapclassify
- git+https://github.com/pygeos/pygeos.git
1 change: 1 addition & 0 deletions ci/travis/37-latest-conda-forge.yaml
@@ -8,6 +8,7 @@ dependencies:
- shapely
- fiona
- pyproj
- pygeos
# testing
- pytest
- pytest-cov
2 changes: 2 additions & 0 deletions ci/travis/37-latest-defaults.yaml
@@ -8,6 +8,7 @@ dependencies:
- shapely
- fiona
- pyproj
- geos
# testing
- pytest
- pytest-cov
@@ -24,3 +25,4 @@ dependencies:
- codecov
- geopy
- mapclassify
- git+https://github.com/pygeos/pygeos.git
1 change: 1 addition & 0 deletions ci/travis/38-latest-conda-forge.yaml
@@ -8,6 +8,7 @@ dependencies:
- shapely
- fiona
- pyproj
- pygeos
# testing
- pytest
- pytest-cov
42 changes: 42 additions & 0 deletions doc/source/install.rst
@@ -159,6 +159,46 @@ For plotting, these additional packages may be used:
- `mapclassify`_


Using the optional PyGEOS dependency
------------------------------------

Work is ongoing to improve the performance of GeoPandas. Currently, the
fast implementations of basic spatial operations live in the `PyGEOS`_
package (work is under way to contribute those improvements to Shapely).
Starting with GeoPandas 0.8, you can optionally use these experimental
speedups by installing PyGEOS. This can be done with conda (using the
conda-forge channel) or pip::

    # conda
    conda install pygeos --channel conda-forge
    # pip
    pip install pygeos

More specifically, whether the speedups are used is determined as follows:

- If PyGEOS is installed, it is used by default. Installing GeoPandas does
  not yet install PyGEOS automatically as a dependency; you need to install
  it manually.

- You can still toggle the use of PyGEOS when it is available, by:

  - Setting an environment variable (``USE_PYGEOS=0/1``). Note that this
    variable is only checked at the first import of GeoPandas.
  - Setting an option: ``geopandas.options.use_pygeos = True/False``. Note
    that, although this option can be set during an interactive session, it
    only takes effect for GeoDataFrames created (e.g. by reading a file
    with ``read_file``) after changing this value.

.. warning::

   The use of PyGEOS is experimental! Although it passes all tests,
   there might still be issues, and not all GeoPandas functions
   benefit from the speedups yet. Trying it out is very welcome, though!
   Any issues you encounter (and reports of successful usage, which are
   also interesting) can be reported at https://gitter.im/geopandas/geopandas
   or https://github.com/geopandas/geopandas/issues
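
The precedence described in this section can be sketched as a small pure-Python function. This is an illustration only, not GeoPandas code: `resolve_use_pygeos` and its parameters are hypothetical names chosen for the example, and the real logic lives in `geopandas/_compat.py` in this PR.

```python
def resolve_use_pygeos(option_value=None, env_value=None, pygeos_installed=False):
    """Hypothetical helper mirroring the documented precedence.

    option_value     -- value assigned to the option, or None if never set
    env_value        -- content of the USE_PYGEOS variable ("0"/"1"), or None
    pygeos_installed -- whether ``import pygeos`` would succeed
    """
    # 1. An explicitly set option always wins.
    if option_value is not None:
        return bool(option_value)
    # 2. Otherwise the environment variable overrides the default.
    if env_value is not None:
        return bool(int(env_value))
    # 3. Otherwise PyGEOS is used only if it is installed.
    return pygeos_installed


if __name__ == "__main__":
    # Installed, nothing else set: speedups are on by default.
    print(resolve_use_pygeos(pygeos_installed=True))
    # Installed, but disabled through the environment.
    print(resolve_use_pygeos(env_value="0", pygeos_installed=True))
    # Option set explicitly after import: overrides the environment.
    print(resolve_use_pygeos(option_value=True, env_value="0", pygeos_installed=True))
```

The sketch deliberately leaves out the validation step: in the PR, enabling PyGEOS when it cannot be imported raises an `ImportError` rather than silently falling back.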


.. _PyPI: https://pypi.python.org/pypi/geopandas

.. _GitHub: https://github.com/geopandas/geopandas
@@ -204,3 +244,5 @@ For plotting, these additional packages may be used:
.. _GEOS: https://geos.osgeo.org

.. _PROJ: https://proj.org/

.. _PyGEOS: https://github.com/pygeos/pygeos/
3 changes: 2 additions & 1 deletion geopandas/__init__.py
@@ -1,3 +1,5 @@
from geopandas._config import options # noqa

from geopandas.geoseries import GeoSeries # noqa
from geopandas.geodataframe import GeoDataFrame # noqa
from geopandas.array import points_from_xy # noqa
@@ -12,7 +14,6 @@

import geopandas.datasets # noqa

from geopandas._config import options # noqa

# make the interactive namespace easier to use
# for `from geopandas import *` demos.
80 changes: 80 additions & 0 deletions geopandas/_compat.py
@@ -1,4 +1,5 @@
from distutils.version import LooseVersion
import os
import warnings

import pandas as pd

@@ -9,3 +10,82 @@
PANDAS_GE_024 = str(pd.__version__) >= LooseVersion("0.24.0")
PANDAS_GE_025 = str(pd.__version__) >= LooseVersion("0.25.0")
PANDAS_GE_10 = str(pd.__version__) >= LooseVersion("0.26.0.dev")


# -----------------------------------------------------------------------------
# Shapely / PyGEOS compat
# -----------------------------------------------------------------------------

USE_PYGEOS = None
PYGEOS_SHAPELY_COMPAT = None


def set_use_pygeos(val=None):
    """
    Set the global configuration on whether to use PyGEOS or not.

    The default is to use PyGEOS if it is installed. This can be overridden
    with an environment variable USE_PYGEOS (this is only checked at
    first import and cannot be changed during an interactive session).

    Alternatively, pass a value here to force a True/False value.
    """
    global USE_PYGEOS
    global PYGEOS_SHAPELY_COMPAT

    if val is not None:
        USE_PYGEOS = bool(val)
    else:
        if USE_PYGEOS is None:
            try:
                import pygeos  # noqa

                USE_PYGEOS = True
            except ImportError:
                USE_PYGEOS = False

            env_use_pygeos = os.getenv("USE_PYGEOS", None)
            if env_use_pygeos is not None:
                USE_PYGEOS = bool(int(env_use_pygeos))

    # validate the pygeos version
    if USE_PYGEOS:
        try:
            import pygeos  # noqa

            # validate the pygeos version
            if not str(pygeos.__version__) >= LooseVersion("0.6"):
                raise ImportError(
                    "PyGEOS >= 0.6 is required, version {0} is installed".format(
                        pygeos.__version__
                    )
                )

            # Check whether Shapely and PyGEOS use the same GEOS version.
            # Based on PyGEOS from_shapely implementation.

            from shapely.geos import geos_version_string as shapely_geos_version
            from pygeos import geos_capi_version_string

            # shapely has something like: "3.6.2-CAPI-1.10.2 4d2925d6"
            # pygeos has something like: "3.6.2-CAPI-1.10.2"
            if not shapely_geos_version.startswith(geos_capi_version_string):
                warnings.warn(
                    "The Shapely GEOS version ({}) is incompatible with the GEOS "
                    "version PyGEOS was compiled with ({}). Conversions between both "
                    "will be slow.".format(
                        shapely_geos_version, geos_capi_version_string
                    )
                )
                PYGEOS_SHAPELY_COMPAT = False
            else:
                PYGEOS_SHAPELY_COMPAT = True

        except ImportError:
            raise ImportError(
                "To use the PyGEOS speed-ups within GeoPandas, you need to install "
                "PyGEOS: 'conda install pygeos' or 'pip install pygeos'"
            )


set_use_pygeos()
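
The GEOS compatibility check in `set_use_pygeos` is a plain string-prefix comparison, which can be tried in isolation without Shapely or PyGEOS installed. The sketch below is illustrative only: `check_geos_compat` is a hypothetical name, the first pair of version strings is the one quoted in the code comments above, and the mismatching string is made up for the example.

```python
import warnings


def check_geos_compat(shapely_geos_version, pygeos_capi_version):
    """Return True if Shapely's GEOS version string starts with PyGEOS's.

    Hypothetical stand-alone version of the prefix check; it warns and
    returns False on a mismatch, as the PR code does before setting
    PYGEOS_SHAPELY_COMPAT.
    """
    if shapely_geos_version.startswith(pygeos_capi_version):
        return True
    warnings.warn(
        "The Shapely GEOS version ({}) is incompatible with the GEOS "
        "version PyGEOS was compiled with ({}). Conversions between both "
        "will be slow.".format(shapely_geos_version, pygeos_capi_version)
    )
    return False


if __name__ == "__main__":
    # Shapely appends a build hash, so equality would fail; a prefix match works.
    print(check_geos_compat("3.6.2-CAPI-1.10.2 4d2925d6", "3.6.2-CAPI-1.10.2"))

    # A made-up mismatching pair triggers the warning.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        print(check_geos_compat("3.8.0-CAPI-1.13.1 abcdef", "3.6.2-CAPI-1.10.2"))
        print(len(caught))
```

A mismatch is reported as a warning rather than an error because, per the message, conversions between the two libraries still work but are slower.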
38 changes: 36 additions & 2 deletions geopandas/_config.py
@@ -31,6 +33,8 @@ def __setattr__(self, key, value):
if option.validator:
option.validator(value)
self._config[key] = value
if option.callback:
option.callback(key, value)
else:
msg = "You can only set the value of existing options"
raise AttributeError(msg)
@@ -58,7 +60,7 @@ def __repr__(self):
else:
doc_text = u"No description available."
doc_text = indent(doc_text, prefix=" ")
description += doc_text
description += doc_text + "\n"
space = "\n "
description = description.replace("\n", space)
return "{}({}{})".format(cls, space, description)
@@ -100,4 +102,36 @@ def _validate_display_precision(value):
callback=None,
)

options = Options({"display_precision": display_precision})

def _validate_bool(value):
    if not isinstance(value, bool):
        raise TypeError("Expected bool value, got {0}".format(type(value)))


def _default_use_pygeos():
    import geopandas._compat as compat

    return compat.USE_PYGEOS


def _callback_use_pygeos(key, value):
    assert key == "use_pygeos"
    import geopandas._compat as compat

    compat.set_use_pygeos(value)


use_pygeos = Option(
    key="use_pygeos",
    default_value=_default_use_pygeos(),
    doc=(
        "Whether to use PyGEOS to speed up spatial operations. The default is True "
        "if PyGEOS is installed, and follows the USE_PYGEOS environment variable "
        "if set."
    ),
    validator=_validate_bool,
    callback=_callback_use_pygeos,
)


options = Options({"display_precision": display_precision, "use_pygeos": use_pygeos})
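
The validator-then-callback flow added to `Options.__setattr__` can be illustrated with a stripped-down stand-in. The classes below are simplified assumptions written for this example (they are not the GeoPandas `Option`/`Options` classes, which also carry docs and a `__repr__`), and the `state` dict stands in for the module-level `USE_PYGEOS` global.

```python
class Option:
    """Simplified stand-in: a default plus optional validator and callback."""

    def __init__(self, key, default_value, validator=None, callback=None):
        self.key = key
        self.default_value = default_value
        self.validator = validator
        self.callback = callback


class Options:
    """Simplified stand-in for an options registry with attribute access."""

    def __init__(self, options):
        # Use object.__setattr__ so our own __setattr__ is not triggered.
        super().__setattr__("_options", options)
        super().__setattr__(
            "_config", {key: opt.default_value for key, opt in options.items()}
        )

    def __getattr__(self, key):
        try:
            return self._config[key]
        except KeyError:
            raise AttributeError(key)

    def __setattr__(self, key, value):
        if key not in self._config:
            raise AttributeError("You can only set the value of existing options")
        option = self._options[key]
        if option.validator:
            option.validator(value)  # raises before anything is stored
        self._config[key] = value
        if option.callback:
            option.callback(key, value)  # propagate the change elsewhere


def validate_bool(value):
    if not isinstance(value, bool):
        raise TypeError("Expected bool value, got {0}".format(type(value)))


# Stand-in for the global that the real callback updates.
state = {"use_pygeos": False}


def on_change(key, value):
    state[key] = value


options = Options(
    {"use_pygeos": Option("use_pygeos", False, validate_bool, on_change)}
)
options.use_pygeos = True
print(options.use_pygeos, state["use_pygeos"])
```

Running the validator before storing the value means a rejected assignment (for example `options.use_pygeos = 1`) leaves both the stored option and the external state untouched.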