Follow-up - Refactor cythonize geometry series operations #473

Closed
2 of 8 tasks
jorisvandenbossche opened this issue Aug 5, 2017 · 68 comments

Comments

@jorisvandenbossche
Member

jorisvandenbossche commented Aug 5, 2017

UPDATE: the cython effort has been shifted to the PyGEOS package (to be integrated into Shapely), and a PR has recently landed in master with optional support for that (will be released as GeoPandas 0.8). See #1154 and https://geopandas.readthedocs.io/en/latest/install.html#using-the-optional-pygeos-dependency for some docs.


I merged #467 in #472

Status: an initial implementation of the refactor (#467) has been merged in the geopandas-cython branch, leaving master currently as the 'stable' branch.
Further improvements can be done by PR, but targeting this geopandas-cython branch (when you open a PR, you can choose the base branch).


A bit more background on the new implementation we are trying out: we made a vectorized geometry object GeometryArray (array-like with vectorized operations) in cython in geopandas. This vectorized geometry object only holds the integer pointers as its data, and only boxes them to shapely objects when the user accesses e.g. a single element, or iterates over it, ... This makes it faster and cheaper to construct.

To integrate this in the GeoDataFrame and GeoSeries, we implemented a new GeometryBlock ('blocks' are the internal building blocks of pandas for the different columns). The reason we need a custom GeometryBlock is that we need a way to let pandas know the data are not just normal integers stored in the dataframe (they are pointers to geometry objects) and cannot be manipulated as if they were integers.
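The "store integers, box on access" idea described above can be illustrated with a small toy. This is only a sketch under loud assumptions: `HandleArray` and the `registry` dict are hypothetical stand-ins (the real GeometryArray holds pointers to GEOS objects, not dictionary keys), and the strings stand in for shapely geometries.

```python
from array import array


class HandleArray:
    """Toy array that stores integer handles and boxes to objects lazily."""

    def __init__(self, handles, registry):
        self._handles = array("q", handles)  # compact signed 64-bit integers
        self._registry = registry            # hypothetical: handle -> full object

    def __len__(self):
        return len(self._handles)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Slicing stays in "handle space": no objects are created.
            return HandleArray(self._handles[key], self._registry)
        # Scalar access boxes a single handle into its object.
        return self._registry[self._handles[key]]

    def __iter__(self):
        # Iteration boxes one element at a time.
        return (self._registry[h] for h in self._handles)


# Strings stand in for geometry objects in this toy.
registry = {10: "POINT (0 0)", 11: "POINT (1 1)", 12: "POINT (2 2)"}
arr = HandleArray([10, 11, 12], registry)
tail = arr[1:]
```

Only `arr[0]` or iterating produces full objects; `arr[1:]` just copies integers, which is why bulk operations on such a container can stay cheap.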


Some known to do items:

  • fix remaining failing tests
  • make installation / building easier (eg automatically finding geos location -> Get include and library paths for GEOS from shapely._buildcfg.get_os_config #489)
  • some changes will be needed to pandas (eg to support concat)
  • implement cythonized/vectorized io functionality (shapefiles, geojson, x/y from csv/df)
  • create an asv benchmark suite to track progress / improvement over master (this should first be merged in master) -> Start adding an asv benchmark suite #497
  • update conda recipe (maybe we can use conda-forge to provide 'beta' builds)
  • get appveyor working to test on windows
  • add a GeometryArray.unique method (then GeoSeries.unique will work automatically)

cc @mrocklin @sgillies @kjordahl @jdmcbr @kuanb @eriknw

@jorisvandenbossche
Member Author

jorisvandenbossche commented Aug 5, 2017

To test this, you need to do the following:

  • Make sure you have an environment with the latest pandas master version (currently this is only needed to make the repr work)
  • Make sure all dependencies of geopandas are installed
  • Fetch the upstream geopandas-cython branch and check this out
  • Build the extensions (this is a new step in the installation since we now have compiled parts):
    • There is a makefile which provides make inplace (and make clean)

    • However, for me this doesn't work yet as it doesn't find the location of my GEOS library (see to do above). I currently do the following to achieve the same: (UPDATE: this should now work out of the box)

      python setup.py build_ext --inplace --with-cython -l geos_c -L /home/joris/miniconda3/lib -I /home/joris/miniconda3/include
      

      It is the last two args (-L and -I) that you need to update. You can normally find them by running geos-config --clibs and geos-config --cflags in a terminal.

  • Test!
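For the -L and -I arguments mentioned in the build step, a small helper could split the output of geos-config into the pieces a build script needs. This is a sketch, not what the branch's setup.py does: `parse_build_flags` and `geos_config` are hypothetical names, the sample paths are made up, and only the attached form (`-I/path`, not `-I /path`) is handled.

```python
import shlex
import subprocess


def parse_build_flags(text):
    """Split compiler/linker flags into include dirs, library dirs and libraries."""
    include_dirs, library_dirs, libraries = [], [], []
    for token in shlex.split(text):
        if token.startswith("-I"):
            include_dirs.append(token[2:])
        elif token.startswith("-L"):
            library_dirs.append(token[2:])
        elif token.startswith("-l"):
            libraries.append(token[2:])
    return include_dirs, library_dirs, libraries


def geos_config(option):
    """Run `geos-config <option>` and return its output (needs GEOS tools on PATH)."""
    return subprocess.check_output(["geos-config", option], text=True)


# Hard-coded sample so the sketch runs without GEOS installed; the paths are
# illustrative only.
sample = "-I/opt/geos/include -L/opt/geos/lib -lgeos_c"
includes, libdirs, libs = parse_build_flags(sample)
```

With GEOS installed, `parse_build_flags(geos_config("--cflags") + " " + geos_config("--clibs"))` would be the intended usage.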

If there are questions or things not clear, just ask.
And feedback can be given here as well!

cc @ozak @gboeing @nickeubank @wmay

@mrocklin
Member

mrocklin commented Aug 5, 2017

You should only need to mess with -L flags specifying libraries if you don't have the geos_c library on your system's load library path. This is typically the case if you use your system installer to install the GEOS library. On debian/ubuntu I do this with the following:

sudo apt-get install libgeos-dev

Presumably brew has something similar for OS-X.

@mrocklin
Member

mrocklin commented Aug 5, 2017

We might consider xfailing the currently failing tests just so that we can start trusting the CI signals in github. We can remove xfail markers as we progress.
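As an illustration of the idea, the standard library has an analogue of pytest's xfail marker: `unittest.expectedFailure`. The sketch below uses that stdlib decorator rather than pytest so it runs anywhere; the test names and bodies are hypothetical placeholders, not tests from the branch.

```python
import unittest


class ExampleTests(unittest.TestCase):
    @unittest.expectedFailure
    def test_known_broken(self):
        # Placeholder for a test that currently fails on the branch.
        self.assertEqual(1 + 1, 3)

    def test_working(self):
        self.assertEqual(1 + 1, 2)


def run_suite():
    """Run the tests and return the result object for inspection."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_suite()
```

The run counts as successful even though one test fails, because the failure is expected. Once the underlying bug is fixed, the marked test shows up as an unexpected success, which is the cue to remove the marker.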

@jorisvandenbossche
Member Author

Yes, I have GEOS installed with conda (not using the system one), and therefore have to specify those flags (for now, somebody is very welcome to improve the makefile script / setup.py to do this automatically).

@mrocklin
Member

mrocklin commented Aug 5, 2017

If people are looking for spatial joins then you should use this branch: #475

@mrocklin
Member

@eriknw do you have thoughts on how to automatically configure the build process to find the geos_c library regardless of if it is installed with the system package manager or with conda? From @jorisvandenbossche header comment it seems like this is a common speedbump for installation.

@eriknw

eriknw commented Aug 14, 2017

@eriknw do you have thoughts on how to automatically configure the build process to find the geos_c library regardless of if it is installed with the system package manager or with conda?

I'm on it.

@jorisvandenbossche
Member Author

@mrocklin I notice that the distance method still returns a NotImplementedError. Was there a specific difficulty for adding this to the vectorized module?

@jorisvandenbossche
Member Author

@mrocklin Regarding the failing test_is_ring test: this is because previously, the is_ring attribute was called on the exterior of each geometry:

return Series([geom.exterior.is_ring for geom in self.geometry],

while now it is called on the actual geometries themselves (is_ring only has meaning for LineStrings or LinearRings, according to the shapely docs).
However, if I try to pass the exterior to the vectorized function, it does not return the correct result:

import geopandas
from shapely.geometry import Polygon
p = Polygon([(0, 0), (1, 0), (1, 1)])
s = geopandas.GeoSeries([p, p])

In [35]: s.is_ring  # on master this returns all True, hence the failing test
Out[35]: 
0    False
1    False
dtype: bool

In [36]: from geopandas.vectorized import unary_predicate

In [37]: unary_predicate('is_ring', s._geometry_array.data)
Out[37]: array([False, False], dtype=bool)
In [39]: s.exterior
Out[39]: I am densified (2 elements)

0    LINEARRING (0 0, 1 0, 1 1, 0 0)
1    LINEARRING (0 0, 1 0, 1 1, 0 0)
dtype: object

In [40]: s.exterior.apply(lambda x: x.is_ring) # emulation of the behaviour of master
I am densified (external_values, 2 elements)
Out[40]: 
0    True
1    True
dtype: bool

In [41]: unary_predicate('is_ring', s.exterior._geometry_array.data)
Out[41]: array([False, False], dtype=bool)

@mrocklin
Member

@mrocklin I notice that the distance method still returns a NotImplementedError. Was there a specific difficulty for adding this to the vectorized module?

It just didn't match the API of the other functions and so needed special casing. It was low on my priority list and so hasn't yet happened. I'll take a look at it though.

@jorisvandenbossche
Member Author

While you are at it, I think project follows a similar pattern (it also does (geom, geom) -> float)

@mrocklin
Member

Yeah, interestingly on the C side they look fairly different

extern int GEOS_DLL GEOSDistance_r(GEOSContextHandle_t handle,
                                   const GEOSGeometry* g1,
                                   const GEOSGeometry* g2, double *dist);
extern double GEOS_DLL GEOSProject_r(GEOSContextHandle_t handle,
                                     const GEOSGeometry *g,
                                     const GEOSGeometry *p);

But both are fairly doable. I have a flight coming up. Should be a good opportunity to knock them out. (also I'll be away from internet for the next few hours).
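The difference between the two signatures (a status return plus an output pointer versus a direct double return) can be hidden behind one adapter so both fit a (geom, geom) -> float loop. The sketch below is pure Python with toy stand-ins: none of these functions call GEOS, the names are hypothetical, and treating a return value of 1 as success is an assumption made for the illustration.

```python
import math


def distance_out_param(a, b, out):
    """Status-style call: writes the result into out[0] and returns a status.

    Assumption for this toy: 1 means success.
    """
    out[0] = math.hypot(a[0] - b[0], a[1] - b[1])
    return 1


def project_direct(origin, point):
    """Direct-style call: returns the value itself (toy: x offset from origin)."""
    return point[0] - origin[0]


def wrap_out_param(func):
    """Adapt a status-style function to a plain (a, b) -> float function."""
    def wrapper(a, b):
        out = [0.0]
        if func(a, b, out) != 1:
            raise RuntimeError("operation failed")
        return out[0]
    return wrapper


def vectorize(func, left, right):
    """Apply a binary (a, b) -> float function elementwise."""
    return [func(a, b) for a, b in zip(left, right)]


distance = wrap_out_param(distance_out_param)
left = [(0.0, 0.0), (1.0, 1.0)]
right = [(3.0, 4.0), (1.0, 3.0)]
distances = vectorize(distance, left, right)
offsets = vectorize(project_direct, left, right)
```

After wrapping, the elementwise loop does not need to know which convention the underlying call uses.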

@mrocklin
Member

This passes for me

def test_is_ring():
    p = Polygon([(0, 0), (1, 0), (1, 1)])
    s = GeoSeries([p, p])
    assert list(s.exterior.is_ring) == [True, True]

@jorisvandenbossche
Member Author

Yes, that passes for me as well. So when I update the test_is_ring test to test on s.exterior instead of s, it passes. But the strange thing is that if I instead add the .exterior in the actual is_ring method (so self.exterior._geometry_array instead of self._geometry_array) and leave the test alone, then the test fails.

@mrocklin
Member

I'm getting the same thing. I agree that that is weird.

@jorisvandenbossche
Member Author

BTW, unary_union is maybe a nice one to try to cythonize as well?

@mrocklin
Member

Do people actually use unary_union in practice? Or is it only used internally?

@jdmcbr
Member

jdmcbr commented Aug 18, 2017

@mrocklin Yes, I use unary_union pretty regularly. I haven't noticed it being a major performance bottleneck in the situations I've used it in though.

@mrocklin
Member

mrocklin commented Aug 18, 2017 via email

@kuanb

kuanb commented Aug 18, 2017

Couple examples, if it helps to provide real world examples:

  1. Working with a bunch of census blocks and I need to combine them into a single geometry after subsetting from the total based off of a number of constraints.

  2. I have a cloud of GPS trace points that I want to extract a spine from. I buffer them, union them, and then have a single geometry I can manipulate to get a centerline out of.

  3. I have a dataset of fragmented geometries, e.g. a bunch of weirdly shaped parks, some of which overlap. I need to flatten them down so that overlapping shapes are combined into single geometry. I unary union and then break out the resulting MultiPoly into new unique single Polygons.

@jorisvandenbossche
Member Author

We also have a dissolve method, which is basically a groupby followed by applying unary_union
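The "groupby followed by a union" shape of dissolve can be shown with a toy in which each "geometry" is a set of grid cells and union is plain set union. This is only an analogy: `toy_dissolve`, the `cells` key, and the sample data are hypothetical, and real geometries would be merged with unary_union rather than set operations.

```python
from collections import defaultdict


def toy_dissolve(rows, by):
    """Group rows by a key and union the cell sets within each group."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[by]] |= row["cells"]
    return dict(groups)


# Two overlapping "parks" in the north district, one in the south.
parks = [
    {"district": "north", "cells": {(0, 0), (0, 1)}},
    {"district": "north", "cells": {(0, 1), (1, 1)}},
    {"district": "south", "cells": {(5, 5)}},
]
merged = toy_dissolve(parks, by="district")
```

The overlapping cell `(0, 1)` appears once in the merged north shape, which mirrors how a union flattens overlapping polygons.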

@jdmcbr
Member

jdmcbr commented Aug 18, 2017

@mrocklin Perhaps nothing that couldn't be addressed in alternative ways, but a few that spring immediately to mind:

  • in preparation for rasterizing; as I recall, I ran into some minor edge effect issues in operations like using a shape to mask a raster when there were polygons that bordered each other, which was fixed by taking the unary union first
  • some operations on aggregates of polygons, such as calculating areas of intersection between two sets of polygons
  • prior to operations for simplifying geometries, e.g., convex hull

@mrocklin
Member

@jorisvandenbossche if you have a moment can you take a look at pickling tests

py.test geopandas/tests/test_geodataframe.py::TestDataFrame::test_pickle

==================================================== FAILURES =====================================================
____________________________________________ TestDataFrame.test_pickle ____________________________________________

self = <geopandas.tests.test_geodataframe.TestDataFrame testMethod=test_pickle>

    def test_pickle(self):
        import pickle
>       df2 = pickle.loads(pickle.dumps(self.df))

geopandas/tests/test_geodataframe.py:496: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../pandas/pandas/core/internals.py:3063: in __setstate__
    for b in state['blocks'])
../pandas/pandas/core/internals.py:3063: in <genexpr>
    for b in state['blocks'])
../pandas/pandas/core/internals.py:3056: in unpickle_block
    return make_block(values, placement=mgr_locs)
../pandas/pandas/core/internals.py:2828: in make_block
    return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
../pandas/pandas/core/internals.py:1979: in __init__
    placement=placement, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = ObjectBlock: 5 dtype: object, values = <geopandas.vectorized.GeometryArray object at 0x7fd46c51c860>
placement = slice(4, 5, 1), ndim = 1, fastpath = False

    def __init__(self, values, placement, ndim=None, fastpath=False):
        if ndim is None:
            ndim = values.ndim
        elif values.ndim != ndim:
            raise ValueError('Wrong number of dimensions')
        self.ndim = ndim
    
        self.mgr_locs = placement
        self.values = values
    
        if ndim and len(self.mgr_locs) != len(self.values):
            raise ValueError('Wrong number of items passed %d, placement '
                             'implies %d' % (len(self.values),
>                                            len(self.mgr_locs)))
E           ValueError: Wrong number of items passed 5, placement implies 1

@mrocklin
Member

Also, how is the pandas work going? Are you still confident that the GeometryBlock solution will work?

@epifanio

epifanio commented Feb 13, 2018

Hi, when trying to import geopandas following the instructions:

python3.6 setup.py build_ext --inplace --with-cython -l geos_c
sudo python3.6 setup.py install

I got:

>>> import geopandas as gpd
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/geopandas-1.0.0.dev0+117.g455e396.dirty-py3.6-linux-x86_64.egg/geopandas/__init__.py", line 3, in <module>
    from geopandas.geoseries import GeoSeries
  File "/usr/local/lib/python3.6/dist-packages/geopandas-1.0.0.dev0+117.g455e396.dirty-py3.6-linux-x86_64.egg/geopandas/geoseries.py", line 17, in <module>
    from .base import GeoPandasBase, _series_unary_op, _CoordinateIndexer
  File "/usr/local/lib/python3.6/dist-packages/geopandas-1.0.0.dev0+117.g455e396.dirty-py3.6-linux-x86_64.egg/geopandas/base.py", line 15, in <module>
    from . import vectorized
ImportError: /usr/local/lib/python3.6/dist-packages/geopandas-1.0.0.dev0+117.g455e396.dirty-py3.6-linux-x86_64.egg/geopandas/vectorized.cpython-36m-x86_64-linux-gnu.so: undefined symbol: sjoin
>>> 

any clue on what I'm missing?

I'm on Ubuntu 16.04, python3.6, latest geopandas master.

@jorisvandenbossche
Member Author

@epifanio can you show the output of the install commands? (paste here or put in a gist or so)

@epifanio

epifanio commented Feb 13, 2018

@jorisvandenbossche thanks for checking this. I've re-installed both pandas and geopandas.

I removed my pandas installation and geopandas by issuing several times:

python3.6 -m pip uninstall pandas and
python3.6 -m pip uninstall geopandas

then a fresh clone of both repositories and installing ..

python3.6 setup.py build_ext --inplace --with-cython -l geos_c

gives me:

https://gist.github.com/76e8a0691b9894217f06b6857522a769
While sudo python3.6 setup.py install gave me the following:

https://gist.github.com/aeecaf45d64173de1cb06780562e80af

testing I got the following:

>>> import geopandas as gpd
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/geopandas-1.0.0.dev0+117.g455e396-py3.6-linux-x86_64.egg/geopandas/__init__.py", line 3, in <module>
    from geopandas.geoseries import GeoSeries
  File "/usr/local/lib/python3.6/dist-packages/geopandas-1.0.0.dev0+117.g455e396-py3.6-linux-x86_64.egg/geopandas/geoseries.py", line 17, in <module>
    from .base import GeoPandasBase, _series_unary_op, _CoordinateIndexer
  File "/usr/local/lib/python3.6/dist-packages/geopandas-1.0.0.dev0+117.g455e396-py3.6-linux-x86_64.egg/geopandas/base.py", line 15, in <module>
    from . import vectorized
ImportError: /usr/local/lib/python3.6/dist-packages/geopandas-1.0.0.dev0+117.g455e396-py3.6-linux-x86_64.egg/geopandas/vectorized.cpython-36m-x86_64-linux-gnu.so: undefined symbol: sjoin

@jorisvandenbossche
Member Author

@epifanio so from the output of the logs, it seems the setup.py build_ext --inplace is going fine, but in the setup.py install case, the algos.c is for some reason not built.
Can you try for now to use the inplace build? Doing pip install -e . (or sudo python3.6 setup.py develop) after you did the build_ext inplace should be enough.

That said, we of course need to fix the non-inplace installation

@epifanio

epifanio commented Feb 14, 2018

I tried to repeat all the steps (manually removed all the traces of geopandas), this the complete log:

https://gist.github.com/955fc72676e70cb4525fae78b4412d48

I was wondering if I should try on a clean virtualenv, instead of my system python (which uses sudo to install). I'll try and report here.

@jorisvandenbossche
Member Author

What version of cython do you have?

And can you try this patch?

diff --git a/setup.py b/setup.py
index fb1708b..e35626d 100644
--- a/setup.py
+++ b/setup.py
@@ -85,7 +85,7 @@ for modname in ['vectorized']:
     ext_modules.append(
         Extension(
             'geopandas.' + modname,
-            ['geopandas/' + modname + suffix],
+            ['geopandas/' + modname + suffix, 'geopandas/algos.c'],
             include_dirs=[numpy.get_include(), _geos_headers],
             libraries=['geos_c'],
             library_dirs=[_geos_lib],

@epifanio

Cython version:

Python 3.6.3 (default, Oct  6 2017, 08:44:35) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import Cython
>>> Cython.__version__
'0.27.3'

Patching setup.py I'm now able to import geopandas 👏

@jorisvandenbossche
Member Author

Patching setup.py I'm now able to import geopandas 👏

So there is a # distutils: sources = geopandas/algos.c inside vectorized.pyx, but apparently that is not working for you when doing pip install -e . ..

@jorisvandenbossche
Member Author

jorisvandenbossche commented Feb 22, 2018

FYI to all: binaries (I will update regularly, but they are not nightly) of the geopandas-cython development branch are available on conda-forge for all platforms:

conda install --channel conda-forge/label/dev geopandas

@linwoodc3

linwoodc3 commented Nov 25, 2018

FYI to all: binaries (I will update regularly, but they are not nightly) of the geopandas-cython development branch are available on conda-forge for all platforms:

conda install --channel conda-forge/label/dev geopandas

Hi @jorisvandenbossche and everyone. This should be obvious, but if you have a non-conda forge gdal/ libgdal, you'll need to update your conda environment to use the conda-forge gdal with the geopan.

I ran conda update gdal -c conda-forge

Before running this update, I received this error

[image: geopandas-cython]

After install, I saw the bump

Confirming the obvious, but once I installed, I saw bumps in speeds while working through the Scipy 2018 geopandas tutorial. In a work case a few months ago, I couldn't use geopandas for a predicate operation on a geospatial data set with hundreds of millions of observations; now this bump makes it possible.

[image: geopandas-cython_bench]

@linwoodc3

@jorisvandenbossche let me know if you don't want us posting issues we come across in here. But, I found a possible bug when trying to use the .shift() method for this build of geopandas.

Make any geodataframe and try to use .shift() , and you'll get this error:

>>> import pandas as pd
>>> from shapely.geometry import Point
>>> df = pd.DataFrame(
...     {'City': ['Buenos Aires', 'Brasilia', 'Santiago', 'Bogota', 'Caracas'],
...      'Country': ['Argentina', 'Brazil', 'Chile', 'Colombia', 'Venezuela'],
...      'Latitude': [-34.58, -15.78, -33.45, 4.60, 10.48],
...      'Longitude': [-58.66, -47.91, -70.66, -74.08, -66.86]})
>>> df['Coordinates'] = list(zip(df.Longitude, df.Latitude))
>>> df['Coordinates'] = df['Coordinates'].apply(Point)
>>> df.shift()
[Out]:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-44-f2de45e67fad> in <module>()
----> 1 df.shift()

~/anaconda3/envs/geospatial/lib/python3.6/site-packages/pandas/core/frame.py in shift(self, periods, freq, axis)
   3801     def shift(self, periods=1, freq=None, axis=0):
   3802         return super(DataFrame, self).shift(periods=periods, freq=freq,
-> 3803                                             axis=axis)
   3804 
   3805     def set_index(self, keys, drop=True, append=False, inplace=False,

~/anaconda3/envs/geospatial/lib/python3.6/site-packages/pandas/core/generic.py in shift(self, periods, freq, axis)
   7826         block_axis = self._get_block_manager_axis(axis)
   7827         if freq is None:
-> 7828             new_data = self._data.shift(periods=periods, axis=block_axis)
   7829         else:
   7830             return self.tshift(periods, freq)

~/anaconda3/envs/geospatial/lib/python3.6/site-packages/pandas/core/internals.py in shift(self, **kwargs)
   3703 
   3704     def shift(self, **kwargs):
-> 3705         return self.apply('shift', **kwargs)
   3706 
   3707     def fillna(self, **kwargs):

~/anaconda3/envs/geospatial/lib/python3.6/site-packages/pandas/core/internals.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
   3579 
   3580             kwargs['mgr'] = self
-> 3581             applied = getattr(b, f)(**kwargs)
   3582             result_blocks = _extend_blocks(applied, result_blocks)
   3583 

~/anaconda3/envs/geospatial/lib/python3.6/site-packages/pandas/core/internals.py in shift(self, periods, axis, mgr)
   1286 
   1287         # make sure array sent to np.roll is c_contiguous
-> 1288         f_ordered = new_values.flags.f_contiguous
   1289         if f_ordered:
   1290             new_values = new_values.T

AttributeError: 'GeometryArray' object has no attribute 'flags'

@jorisvandenbossche
Member Author

let me know if you don't want us posting issues we come across in here

@linwoodc3 no, no, posting such reports is very welcome!

Are you running on pandas master or pandas 0.23 ?
I think in the meantime, on pandas master there was an ExtensionArray.shift added to enable the DataFrame/Series shift method (pandas-dev/pandas#22387), which might solve this.
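What a container-level shift has to do can be sketched in plain Python: slice the values and pad with a fill value, with no need for array flags or np.roll. This is a minimal illustration, not the pandas implementation; `shift_values` is a hypothetical name and `None` stands in for a missing geometry.

```python
def shift_values(values, periods=1, fill_value=None):
    """Shift a sequence by `periods` positions, padding the gap with fill_value."""
    n = len(values)
    if periods == 0 or n == 0:
        return list(values)
    if abs(periods) >= n:
        # Everything is shifted out of range.
        return [fill_value] * n
    pad = [fill_value] * abs(periods)
    if periods > 0:
        # Shift forward: pad at the start, drop the tail.
        return pad + list(values[:-periods])
    # Shift backward: drop the head, pad at the end.
    return list(values[-periods:]) + pad


geoms = ["POINT (0 0)", "POINT (1 1)", "POINT (2 2)"]
forward = shift_values(geoms)
backward = shift_values(geoms, periods=-1)
```

Because only slicing and concatenation are involved, any array-like that supports those two operations could offer the same behaviour.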

@linwoodc3

let me know if you don't want us posting issues we come across in here

@linwoodc3 no, no, posting such reports is very welcome!

Are you running on pandas master or pandas 0.23 ?
I think in the meantime, on pandas master there was an ExtensionArray.shift added to enable the DataFrame/Series shift method (pandas-dev/pandas#22387), which might solve this.

Hi @jorisvandenbossche . Yes, I am running '0.23.4'. I just opted for the shift on the single column vice the entire geodataframe. Works.

@webturtles

Trying to get this dev cython gpd branch to work on Anaconda/Windows. Am I missing a step by doing the following?
Set up new python 3.6 env in Anaconda then:

conda install -c anaconda cython
conda install --channel conda-forge/label/dev geopandas
conda update gdal -c conda-forge

I also ran (just in case):
conda update cython -c conda-forge

Either way I get the following:

Python 3.6.8 |Anaconda, Inc.| (default, Feb 11 2019, 15:03:47) [MSC v.1915 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import geopandas
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\cjackson\AppData\Local\conda\conda\envs\pandascython36\lib\site-packages\geopandas\__init__.py", line 3, in <module>
    from geopandas.geoseries import GeoSeries
  File "C:\Users\cjackson\AppData\Local\conda\conda\envs\pandascython36\lib\site-packages\geopandas\geoseries.py", line 19, in <module>
    from ._block import GeometryBlock
  File "C:\Users\cjackson\AppData\Local\conda\conda\envs\pandascython36\lib\site-packages\geopandas\_block.py", line 6, in <module>
    from pandas.core.internals import Block, NonConsolidatableMixIn, BlockManager
ImportError: cannot import name 'NonConsolidatableMixIn'

Not sure if the way I am approaching it is possible... I won't deny there were clobber messages whilst installing...! Any pointers would be much appreciated!
Cheers

@jorisvandenbossche
Member Author

jorisvandenbossche commented Feb 19, 2019

The latest pandas release (0.24.0) moved some internal classes, so we will need to update the imports here.

If you manually change the import (in geopandas/_block.py) of

from pandas.core.internals import Block, NonConsolidatableMixIn, BlockManager

to

from pandas.core.internals import Block, BlockManager
from pandas.core.internals.blocks import NonConsolidatableMixIn

then I think it should still work (although there might be other things that changed as well).
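Rather than editing the import by hand for each pandas version, one option is a small fallback helper that tries several module paths in order. This is a sketch: `import_first` is a hypothetical helper, and it is demonstrated with stdlib modules so it runs without pandas. For the case in this thread the candidate paths would be "pandas.core.internals.blocks" and "pandas.core.internals".

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first module path that provides it."""
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            continue  # module does not exist in this environment
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"cannot import {name!r} from any of {module_paths}")


# Demonstration with stdlib modules only; the first path deliberately
# does not exist, so the helper falls through to the second.
OrderedDict = import_first("OrderedDict", ["no_such_module_xyz", "collections"])
```

The same effect can be had with a plain try/except ImportError around the two import statements; the helper just keeps the list of candidate locations in one place.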

@webturtles happy to further assist if you want to try out this branch! (I need to update the branch with the latest changes in both pandas and geopandas)

@webturtles

Thanks for quick response. Tweaking _block.py worked a treat! If you want me to test on a fresh environment at some point, happy to. I assume the steps I took were right (poss. excluding the gdal update - was throwing the kitchen sink at it before)

@webturtles

webturtles commented Feb 19, 2019

My working code has decided to take a turn for the worse after using the dev branch.
I am sure this issue is just related to how Geopandas has changed under the hood!

After running
gdfintersects = gpd.sjoin(gdflines, gdfland, how='inner', op='intersects')
The code

for row in gdfintersects.itertuples():
    #grab id
    transitid = row.Transit_ID
    geom = row.geometry

Returns:

    for row in gdfintersects.itertuples():
  File "C:\Users\xxxx\AppData\Local\conda\conda\envs\cython36\lib\site-packages\pandas\core\frame.py", line 933, in itertuples
    return zip(*arrays)
  File "C:\Users\xxxx\AppData\Local\conda\conda\envs\cython36\lib\site-packages\pandas\core\base.py", line 1131, in __iter__
    return map(self._values.item, range(self._values.size))
AttributeError: 'GeometryArray' object has no attribute 'item'

For info, I also had some issues with numpy and the like to get everything working. Involved a couple of environment rebuilds, though got there in the end! I think using conda-forge installs for other libraries helped.

@waylonflinn

waylonflinn commented Apr 6, 2020

When calling iterrows() on a geopandas dataframe I get the following error
(built from geopandas-cython running pandas version 0.24.2)

  File "/home/waylonflinn/Documents/task_grid_detect/src/create_states_osm.py", line 65, in <module>
    for idx,state in states.iterrows():
  File "/home/waylonflinn/Documents/Blocks/Map/venv/lib/python3.6/site-packages/geopandas-1.0.0.dev0+141.ge925363.dirty-py3.6-linux-x86_64.egg/geopandas/geodataframe.py", line 475, in iterrows
    for (index, series), geom in zip(rows, self.geometry._geometry_array):
  File "/home/waylonflinn/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 845, in iterrows
    for k, v in zip(self.index, self.values):
  File "/home/waylonflinn/.local/lib/python3.6/site-packages/pandas/core/generic.py", line 5067, in __getattr__
    return object.__getattribute__(self, name)
  File "/home/waylonflinn/.local/lib/python3.6/site-packages/pandas/core/generic.py", line 5325, in values
    return self._data.as_array(transpose=self._AXIS_REVERSED)
  File "/home/waylonflinn/.local/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 768, in as_array
    arr = mgr._interleave()
  File "/home/waylonflinn/.local/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 793, in _interleave
    result[rl.indexer] = blk.get_values(dtype)
  File "/home/waylonflinn/.local/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 181, in get_values
    return self.values.astype(object)
AttributeError: 'GeometryArray' object has no attribute 'astype'

Workaround:
instead of using

for i, row in frame.iterrows():
    ....

use:

for i in range(len(frame)):
    row = frame.iloc[i]
    ...

@jorisvandenbossche
Member Author

@waylonflinn thanks for trying it out! But, in the meantime, the cython effort has been shifted to the pygeos package, and a PR has recently landed in master with optional support for that. See #1154 and https://geopandas.readthedocs.io/en/latest/install.html#using-the-optional-pygeos-dependency for some docs.
So it's best to try that out to test the latest work on better performance.

@jorisvandenbossche
Member Author

(and I need to update the top-post comment of this issue to reflect that)

@waylonflinn

Good to know! Thanks!


10 participants