Adding geopandas? #1569

Closed
deeplook opened this issue May 3, 2021 · 21 comments · Fixed by #3213

Comments

@deeplook

deeplook commented May 3, 2021

Would it be feasible to include geopandas? Or does it depend too heavily on C/C++ packages?

@rth
Member

rth commented May 3, 2021

We can try to include it. It does have a number of dependencies which would also need to be packaged, and as far as I remember GIS-related tools tend to be complex to package.

More generally, we need to switch to a build setup that does not rebuild every package on each commit, as that takes a long time (even with ccache) and is not really necessary.

@yxdragon

geopandas depends on GDAL, which is very heavy. A lighter solution is geopandas + shapely + pyproj.
Could shapely and pyproj be compiled to WebAssembly?

@deeplook
Author

Yes, shapely would be a must. Pyproj and fiona seem like hard dependencies, too.

@rth
Member

rth commented May 28, 2021

For shapely, adding ctypes support (#728) is a blocker.

For pyproj and fiona someone would need to try to create the corresponding pyodide package and see if there are any issues.

@yxdragon

I think the biggest trouble is GDAL. It is hard to install even for native Python, and GDAL is not Pythonic.
pyproj is a Cython project, so I think it could be compiled to WebAssembly.

geopandas is a dataframe with a geometry column and a CRS.
georaster is an image with a CRS and an affine matrix.

So if it is difficult to support GDAL and fiona, I think we can give up on them. I can try to build a light repo that only needs numpy, pandas, shapely, and pyproj, plus some light or pure-Python libraries for I/O, such as h5py and netCDF for raster data, and pyshp and simplekml for vector data.

shapely is important, and not only in geography.
I think there are 3 basic structures in scientific computing:

  1. array (or series, image, table...) > numpy, scipy, pandas, optional: [skimage, opencv...]
  2. vector (or point clouds, polygon, mesh) > shapely, pymesh [open3d, pcl ...]
  3. graph (or tree, network) > scipy.sparse, networkx

@mattficke

For shapely, adding ctypes support (#728) is a blocker.

Looks like this was added in #1656. Do you know if there are other blockers for shapely here?

@hoodmane
Member

The next issue for shapely is to port libgeos:
https://trac.osgeo.org/geos/

@hoodmane
Member

I made an attempt at building libgeos. So far I ran into the warning:

ADD_LIBRARY called with SHARED option but the target platform does not
support dynamic linking.  Building a STATIC library instead.  This may lead
to problems.

and some linker errors:

wasm-ld: error: duplicate symbol: vtable for geos::noding::BasicSegmentString
>>> defined in ../../lib/libgeos.a(inlines.cpp.o)
>>> defined in ../../lib/libgeos.a(BasicSegmentString.cpp.o)

wasm-ld: error: duplicate symbol: typeinfo for geos::noding::BasicSegmentString
>>> defined in ../../lib/libgeos.a(inlines.cpp.o)
>>> defined in ../../lib/libgeos.a(BasicSegmentString.cpp.o)

@hoodmane
Member

hoodmane commented Oct 19, 2021

Looks like -DDISABLE_GEOS_INLINE=ON helps with the linker errors:
https://trac.osgeo.org/geos/ticket/1090?cversion=0&cnum_hist=4

@hoodmane
Member

Okay so I got it building and installing mostly correctly, but I am not sure how to convince it to build a shared library (currently it's just building .a files). I tried adding set_property(GLOBAL PROPERTY TARGET_SUPPORTS_SHARED_LIBS TRUE) to CMakeLists.txt but that doesn't seem to fix the problem...

@hoodmane
Member

Asked about this on the Emscripten issue tracker: emscripten-core/emscripten#15276

@leouieda

Perhaps this can help shed some light on GDAL and other hard-to-package libraries: https://github.com/bugra9/gdal3.js

@ryanking13
Member

Thanks for the information @leouieda. I'll try that when I have some bandwidth.

@raybellwaves

Created a separate issue for adding fiona, which is a geopandas dependency: #3091

@rth
Member

rth commented Sep 10, 2022

Recently @jorisvandenbossche mentioned that it might be possible to build geopandas without some of the hard to build dependencies (at the cost of reduced functionality). Currently, we have shapely and pyproj available but not fiona & GDAL. Any suggestions on this, Joris? Unless I misunderstood you :)

Is there any way to load GeoJSON without fiona? If not, it feels more useful to try to build pyarrow (#2933) and use the parquet backend in geopandas rather than spending that time building GDAL.

@jorisvandenbossche

jorisvandenbossche commented Sep 10, 2022

That's correct, you can use geopandas with only shapely and pyproj installed. We currently still list fiona (and thus GDAL) as an install requirement in setup.py, but if you force an install without fiona, it will work fine. On conda-forge we already have a geopandas-base package that does not depend on fiona, and long term we will probably drop fiona as a dependency of the Python package as well.
The only things you can't use are the read_file/to_file functions. But you can load GeoJSON using GeoDataFrame.from_features if you pass it the dict from json.loads.
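
A minimal sketch of that fiona-free GeoJSON path, assuming geopandas, shapely and pyproj are installed. The sample data and the helper names `parse_feature_collection` and `to_geodataframe` are hypothetical. The parsing step uses only the standard library, and the geopandas call is kept in a separate function so it is only needed when a GeoDataFrame is actually built:

```python
import json

# Hypothetical sample data; any GeoJSON FeatureCollection string would do.
GEOJSON_TEXT = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"name": "A"},
     "geometry": {"type": "Point", "coordinates": [13.4, 52.5]}},
    {"type": "Feature",
     "properties": {"name": "B"},
     "geometry": {"type": "Point", "coordinates": [2.35, 48.85]}}
  ]
}
"""


def parse_feature_collection(text):
    """Parse GeoJSON text and return the FeatureCollection as a plain dict."""
    collection = json.loads(text)
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return collection


def to_geodataframe(collection, crs="EPSG:4326"):
    """Build a GeoDataFrame from the parsed features, without fiona."""
    import geopandas  # imported lazily so the parsing step works on its own

    return geopandas.GeoDataFrame.from_features(collection["features"], crs=crs)


collection = parse_feature_collection(GEOJSON_TEXT)
print(len(collection["features"]), "features parsed")
# With geopandas, shapely and pyproj installed:
# gdf = to_geodataframe(collection)
```

The CRS is passed explicitly here because `from_features` does not infer one from the dict; EPSG:4326 is an assumption that matches the usual GeoJSON convention.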

@jorisvandenbossche

So if shapely and pyproj are already available (and pandas), and since geopandas is a pure python package (that can be installed without specific effort?), it should actually already work, I think?

@rth
Member

rth commented Sep 10, 2022

Hah, yes, indeed it works in the REPL!

>>> import micropip
>>> await micropip.install(['pandas', 'shapely', 'pyproj'])
>>> await micropip.install('geopandas', deps=False)
>>> import geopandas
>>> geopandas.__version__
'0.11.1'

though I haven't tried using it further.
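
One quick follow-up check in the same kind of session is to see which of the packages mentioned in this thread are importable. This is a minimal sketch using only the standard library; the helper name `available_packages` is hypothetical, and the result depends entirely on the environment it runs in:

```python
import importlib.util


def available_packages(names):
    """Map each top-level package name to True if it can be imported here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Package names taken from the discussion above.
report = available_packages(["pandas", "shapely", "pyproj", "geopandas", "fiona"])
for name, found in report.items():
    print(f"{name:10s} {'available' if found else 'missing'}")
```

If fiona shows up as missing while the others are available, that matches the setup described above, where read_file/to_file are the only functions expected to be unusable.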

@rth
Member

rth commented Sep 10, 2022

Opened a follow-up usability issue for GeoJSON in this case: geopandas/geopandas#2548

@rth
Member

rth commented Sep 10, 2022

Installing from PyPI works, but the package is a bit large (1 MB compressed). Uncompressed, the breakdown is:

du -sh geopandas/*
4,0K    geopandas/__init__.py
8,0K    geopandas/_compat.py
4,0K    geopandas/_config.py
4,0K    geopandas/_decorator.py
28K     geopandas/_vectorized.py
4,0K    geopandas/_version.py
48K     geopandas/array.py
116K    geopandas/base.py
4,0K    geopandas/conftest.py
976K    geopandas/datasets
36K     geopandas/explore.py
92K     geopandas/geodataframe.py
48K     geopandas/geoseries.py
64K     geopandas/io
36K     geopandas/plotting.py
32K     geopandas/sindex.py
12K     geopandas/testing.py
468K    geopandas/tests
128K    geopandas/tools

So packaging it in Pyodide to unvendor the tests, and possibly later the datasets (#3092), would still be useful.
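
A similar breakdown can be produced from Python, which may be handy where `du` is not available. This is a rough sketch using only the standard library; the helper names `size_by_entry` and `largest_entries` are hypothetical. It sums file sizes in bytes rather than disk blocks, so the numbers will not match `du` exactly:

```python
import os


def size_by_entry(package_dir):
    """Return {entry name: total bytes} for each top-level entry in package_dir."""
    sizes = {}
    for entry in os.scandir(package_dir):
        if entry.is_dir(follow_symlinks=False):
            # Sum every file below this subdirectory.
            total = 0
            for root, _dirs, files in os.walk(entry.path):
                for fname in files:
                    total += os.path.getsize(os.path.join(root, fname))
        else:
            total = entry.stat(follow_symlinks=False).st_size
        sizes[entry.name] = total
    return sizes


def largest_entries(package_dir, top=5):
    """Return the `top` largest entries as (name, bytes) pairs, largest first."""
    sizes = size_by_entry(package_dir)
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:top]


# Example, assuming geopandas is installed:
# import geopandas
# print(largest_entries(os.path.dirname(geopandas.__file__)))
```

For the listing above, such a check would be expected to put `datasets` and `tests` at the top, which is why unvendoring them is the obvious size win.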

@deeplook
Author

deeplook commented Nov 7, 2022

Awesome, thanks so much!! ❤️


9 participants