New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Follow-up - Refactor cythonize geometry series operations #473
Comments
To test this, you need to do the following:
If there are questions or things not clear, just ask. |
You should only need to mess with
Presumably brew has something similar for OS-X. |
We might consider xfailing the currently failing tests just so that we can start trusting the CI signals in github. We can remove xfail markers as we progress. |
Yes, I have GEOS installed with conda (not using the system one), and therefore have to specify those flags (for now, somebody is very welcome to improve the makefile script / setup.py to do this automatically). |
If people are looking for spatial joins then you should use this branch: #475 |
@eriknw do you have thoughts on how to automatically configure the build process to find the geos_c library regardless of if it is installed with the system package manager or with conda? From @jorisvandenbossche header comment it seems like this is a common speedbump for installation. |
I'm on it. |
@mrocklin I notice that the |
@mrocklin Regarding the failing Line 140 in bf8f8ea
is_ring only has meaning for LineStrings or LinearRings, according to the shapely docs)However, If I try to pass the exterior to the vectorized function, it does not return the correct result:
|
It just didn't match the API of the other functions and so needed special casing. It was low on my priority list and so hasn't yet happened. I'll take a look at it though. |
While you are at it, I think |
Yeah, interestingly on the C side they look fairly different extern int GEOS_DLL GEOSDistance_r(GEOSContextHandle_t handle,
const GEOSGeometry* g1,
const GEOSGeometry* g2, double *dist); extern double GEOS_DLL GEOSProject_r(GEOSContextHandle_t handle,
const GEOSGeometry *g,
const GEOSGeometry *p); But both are fairly doable. I have a flight coming up. Should be a good opportunity to knock them out. (also I'll be away from internet for the next few hours). |
This passes for me def test_is_ring():
p = Polygon([(0, 0), (1, 0), (1, 1)])
s = GeoSeries([p, p])
assert list(s.exterior.is_ring) == [True, True] |
Yes, that passes for me as well. So when I update the |
I'm getting the same thing. I agree that that is weird. |
BTW, unary_union is maybe a nice one to try to cythonize as well? |
Do people actually use |
@mrocklin Yes, I use |
I'm curious, what is your intention behind unary union? What do you want
to achieve with it?
…On Fri, Aug 18, 2017 at 2:09 PM, James McBride ***@***.***> wrote:
@mrocklin <https://github.com/mrocklin> Yes, I use unary_union pretty
regularly. I haven't noticed it being a major performance bottleneck in the
situations I've used it in though.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#473 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AASszJmBRGc15j5Ws6mbgssQBNV6ybiiks5sZdNCgaJpZM4OuTfB>
.
|
Couple examples, if it helps to provide real world examples:
|
We also have a |
@mrocklin Perhaps nothing that couldn't be addressed in alternative ways, but a few that spring immediately to mind:
|
@jorisvandenbossche if you have a moment can you take a look at pickling tests
|
Also, how is the pandas work going? Are you still confident that the GeometryBlock solution will work? |
Hi, when trying to import geopandas followinf the instructions:
I got:
any clue on what I'm missing? I'm on . ubuntu 16.04, python3.6 latest geopandas maser. |
@epifanio can you show the output of the install commands? (paste here or put in a gist or so) |
@jorisvandenbossche thanks for checking this. I've re-installed both pandas and geopandas. I removed my pandas installation and geopandas by issuing several times:
then a fresh clone of both repository and installing ..
gives me: https://gist.github.com/76e8a0691b9894217f06b6857522a769 https://gist.github.com/aeecaf45d64173de1cb06780562e80af testing I got the following:
|
@epifanio so from the output of the logs, it seems the That said, we of course need to fix the non-inplace installation |
I tried to repeat all the steps (manually removed all the traces of geopandas), this the complete log: https://gist.github.com/955fc72676e70cb4525fae78b4412d48 I was wondering if I should try on a clean virtualenv, instead of my system python (which uses sudo to install). I'll try and report here. |
What version of cython do you have? And can you try this patch? diff --git a/setup.py b/setup.py
index fb1708b..e35626d 100644
--- a/setup.py
+++ b/setup.py
@@ -85,7 +85,7 @@ for modname in ['vectorized']:
ext_modules.append(
Extension(
'geopandas.' + modname,
- ['geopandas/' + modname + suffix],
+ ['geopandas/' + modname + suffix, 'geopandas/algos.c'],
include_dirs=[numpy.get_include(), _geos_headers],
libraries=['geos_c'],
library_dirs=[_geos_lib], |
Cython version:
Patching |
So there is a |
FYI to all: binaries (I will update regularly, but they are not nightly) of the geopandas-cython development branch are available on conda-forge for all platforms:
|
Hi @jorisvandenbossche and everyone. This should be obvious, but if you have a non-conda forge I ran Before running this update, I received this error After install, I saw the bumpConfirming the obvious, but once I installed, I saw bumps in speeds while working through the Scipy 2018 geopandas tutorial. In a work case a few months ago, I couldn't use geopandas for a predicate operation on a geospatial data set with hundreds of millions of observations; now this bump makes it possible. |
@jorisvandenbossche let me know if you don't want us posting issues we come across in here. But, I found a possible bug when trying to use the Make any geodataframe and try to use >>>df = pd.DataFrame(
{'City': ['Buenos Aires', 'Brasilia', 'Santiago', 'Bogota', 'Caracas'],
'Country': ['Argentina', 'Brazil', 'Chile', 'Colombia', 'Venezuela'],
'Latitude': [-34.58, -15.78, -33.45, 4.60, 10.48],
'Longitude': [-58.66, -47.91, -70.66, -74.08, -66.86]})
>>>df['Coordinates'] = list(zip(df.Longitude, df.Latitude))
>>>df['Coordinates'] = df['Coordinates'].apply(Point)
>>>df.shift()
[Out]:---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-44-f2de45e67fad> in <module>()
----> 1 df.shift()
~/anaconda3/envs/geospatial/lib/python3.6/site-packages/pandas/core/frame.py in shift(self, periods, freq, axis)
3801 def shift(self, periods=1, freq=None, axis=0):
3802 return super(DataFrame, self).shift(periods=periods, freq=freq,
-> 3803 axis=axis)
3804
3805 def set_index(self, keys, drop=True, append=False, inplace=False,
~/anaconda3/envs/geospatial/lib/python3.6/site-packages/pandas/core/generic.py in shift(self, periods, freq, axis)
7826 block_axis = self._get_block_manager_axis(axis)
7827 if freq is None:
-> 7828 new_data = self._data.shift(periods=periods, axis=block_axis)
7829 else:
7830 return self.tshift(periods, freq)
~/anaconda3/envs/geospatial/lib/python3.6/site-packages/pandas/core/internals.py in shift(self, **kwargs)
3703
3704 def shift(self, **kwargs):
-> 3705 return self.apply('shift', **kwargs)
3706
3707 def fillna(self, **kwargs):
~/anaconda3/envs/geospatial/lib/python3.6/site-packages/pandas/core/internals.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
3579
3580 kwargs['mgr'] = self
-> 3581 applied = getattr(b, f)(**kwargs)
3582 result_blocks = _extend_blocks(applied, result_blocks)
3583
~/anaconda3/envs/geospatial/lib/python3.6/site-packages/pandas/core/internals.py in shift(self, periods, axis, mgr)
1286
1287 # make sure array sent to np.roll is c_contiguous
-> 1288 f_ordered = new_values.flags.f_contiguous
1289 if f_ordered:
1290 new_values = new_values.T
AttributeError: 'GeometryArray' object has no attribute 'flags' |
@linwoodc3 no, no, posting such reports is very welcome! Are you running on pandas master or pandas 0.23 ? |
Hi @jorisvandenbossche . Yes, I am running |
Trying to get this dev cython gpd branch to work on Anaconda/Windows. Am I missing a step by doing the following?
I also ran (just in case): Either way I get the following:
Not sure if the way I am approaching it is possible... I won't deny there were clobber messages whilst installing...! Any pointers would much appreciated! |
The latest pandas release (0.24.0) moved some internal classes, so we will need to update the imports here. If you manually change the import (in
to
then I think it should still work (although there might be other things that changed as well). @webturtles happy to further assist if you want to try out this branch! (I need to update the branch with the latest changes in both pandas and geopandas) |
Thanks for quick response. Tweaking _block.py worked a treat! If you want me to test on a fresh environment at some point, happy to. I assume the steps I took were right (poss. excluding the gdal update - was throwing the kitchen sink at it before) |
My working code has decided to take a turn for the worse after using the dev branch. After running
Returns:
For info, I also had some issues with numpy and the like to get everything working. Involved a couple of environment rebuilds, though got there in the end! I think using conda-forge installs for other libraries helped. |
When calling
Workaround:
use:
|
@waylonflinn thanks for trying it out! But, in the meantime, the cython effort has been shifted to the pygeos package, and a PR has recently landed in master with optional support for that. See #1154 and https://geopandas.readthedocs.io/en/latest/install.html#using-the-optional-pygeos-dependency for some docs. |
(and I need to update the top-post comment of this issue to reflect that) |
Good to know! Thanks! |
UPDATE: the cython effort has been shifted to the PyGEOS package (to be integrated into Shapely), and a PR has recently landed in master with optional support for that (will be released as GeoPandas 0.8). See #1154 and https://geopandas.readthedocs.io/en/latest/install.html#using-the-optional-pygeos-dependency for some docs.
I merged #467 in #472
Status: an initial implementation of the refactor (#467) has been merged in the
geopandas-cython
branch, leavingmaster
currently as the 'stable' branch.Further improvements can be done by PR, but targeting this geopandas-cython branch (when you open a PR, you can choose the base branch).
A bit more background on the new implementation we are trying out: we made a vectorized geometry object
GeometryArray
(array-like with vectorized operations) in cython in geopandas. This vectorized geometry object only holds the integer pointers as its data, and only boxes it to shapely objects when the user accesses eg a single element, or iterates over it, ... This makes it fast and cheaper to construct.To integrate this in the GeoDataFrame and GeoSeries, we implemented a new GeometryBlock ('blocks' are the internal building block of pandas for the different columns). The reason we need a custom GeometryBlock, is because we need to have a way to let pandas know the data are not just normal integers we store in the dataframe (it are pointers to geometry objects), and cannot be manipulated as it were integers.
Some known to do items:
shapely._buildcfg.get_os_config
#489)GeometryArray.unique
method (then GeoSeries.unique will work automatically)cc @mrocklin @sgillies @kjordahl @jdmcbr @kuanb @eriknw
The text was updated successfully, but these errors were encountered: