Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new spatialite helper methods #385

Merged
merged 12 commits into from Feb 4, 2022
Merged

Conversation

eyeseast
Copy link
Contributor

@eyeseast eyeseast commented Jan 14, 2022

Refs #79

This PR adds three new Spatialite-related methods to Database and Table:

  • Database.init_spatialite loads the Spatialite extension and initializes it
  • Table.add_geometry_column adds a geometry column
  • Table.create_spatial_index creates a spatial index

Has tests and documentation. Feedback very welcome.

@eyeseast eyeseast marked this pull request as ready for review January 15, 2022 15:14
@eyeseast
Copy link
Contributor Author

eyeseast commented Feb 3, 2022

@simonw Not sure if you've seen this, but any chance you can run the tests?

@simonw
Copy link
Owner

simonw commented Feb 3, 2022

Sorry had missed this - tests should run now.

@codecov
Copy link

codecov bot commented Feb 3, 2022

Codecov Report

Merging #385 (af86b17) into main (74586d3) will decrease coverage by 0.61%.
The diff coverage is 28.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #385      +/-   ##
==========================================
- Coverage   96.52%   95.91%   -0.62%     
==========================================
  Files           6        6              
  Lines        2389     2421      +32     
==========================================
+ Hits         2306     2322      +16     
- Misses         83       99      +16     
Impacted Files Coverage Δ
sqlite_utils/cli.py 95.69% <ø> (+0.15%) ⬆️
sqlite_utils/db.py 96.29% <15.00%> (-1.40%) ⬇️
sqlite_utils/utils.py 94.59% <80.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 74586d3...af86b17. Read the comment docs.

@eyeseast
Copy link
Contributor Author

eyeseast commented Feb 3, 2022

Fixed my spelling. That's a useful thing.

@simonw
Copy link
Owner

simonw commented Feb 3, 2022

OK, this change makes a bunch of sense to me - and also raises some interesting questions about future additions to sqlite-utils with regards to SpatiaLite. Would sqlite-utils add-geometry-column ... be a good CLI enhancement. for example?

I see you've already talked about that in #79 - moving this conversation there!

@simonw
Copy link
Owner

simonw commented Feb 3, 2022

from sqlite_utils.utils import find_spatialite is part of the documented API already:

https://sqlite-utils.datasette.io/en/3.22.1/python-api.html#finding-spatialite

To avoid needing to bump the major version number to 4 to indicate a backwards incompatible change, we should keep a from .gis import find_spatialite line at the top of utils.py such that any existing code with that documented import continues to work.

@simonw
Copy link
Owner

simonw commented Feb 3, 2022

What do you think about adding these as methods on the Database class instead? Then you could do:

# This is with an optional argument, which if omitted runs find_spatialite() for you:
db.init_spatialite()

# Instead of:
init_spatialite(db, find_spatialite())

Likewise, the add_geometry_column and create_spatial_index methods could live on Table:

# Instead of this:
add_geometry_column(db["locations"], "POINT", "geometry")
create_spatial_index(db["locations"], "geometry")

# Could have this:
db["locations"].add_geometry_column("POINT")
db["locations"].create_spatial_index("geometry")

On the one hand, this is much more consistent with the existing sqlite-utils Python API.

But on the other hand... this is mixing SpatiaLite functionality directly into the core classes. Is that a good idea, seeing as SpatiaLite is both an optional extension (which can be tricky to install) AND something that has a very different release cadence and quality-of-documentation from SQLite itself?

There's a third option: the SpatiaLite could exist on subclasses of Database and Table - so the above examples would look something like this:

from sqlite_utils.gis import SpatiaLiteDatabase

db = SpatiaLiteDatabase("geo.db")
db.init_spatialite()
db["locations"].add_geometry_column("POINT")
db["locations"].create_spatial_index("geometry")

On the one hand, this would keep the SpatiaLite-specific stuff out of the core Database/Table classes. But it feels a bit untidy to me, especially since it raises the spectre of someone who was already subclassing Database for some reason now needing to instead subclass SpatiaLiteDatabase (not too keen on that capitalization) - or even (horror) trying to dabble with multiple inheritance, which can only lead to pain.

So I don't have a strong opinion formed on this question yet!

@simonw
Copy link
Owner

simonw commented Feb 3, 2022

I'm not sure I like name="geometry" as the default argument to add_geometry_column - mainly because of this example here:

add_geometry_column(db["locations"], "POINT")
create_spatial_index(db["locations"], "geometry")

I had to go and look at the code to figure out if "POINT" was the name of the column - and I don't like how inconsistent it looks next to the following create_spatial_index() call where you DO need to pass the column name.


def add_geometry_column(
table: Table,
geometry_type: str,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to figure out the list of valid geometry type values, which is surprisingly hard!

I eventually found a list on http://www.gaia-gis.it/gaia-sins/spatialite-sql-5.0.1.html by searching for AddGeometryColumn - that documentation says that valid values are:

  • POINT, POINTZ, POINTM, POINTZM
  • LINESTRING, LINESTRINGZ, LINESTRINGM, LINESTRINGZM
  • POLYGON, POLYGONZ, POLYGONM, POLYGONZM
  • MULTIPOINT, MULTIPOINTZ, MULTIPOINTM, MULTIPOINTZM
  • MULTILINESTRING, MULTILINESTRINGZ, MULTILINESTRINGM, MULTILINESTRINGZM
  • MULTIPOLYGON, MULTIPOLYGONZ, MULTIPOLYGONM, MULTIPOLYGONZM
  • GEOMETRYCOLLECTION, GEOMETRYCOLLECTIONZ, GEOMETRYCOLLECTIONZM, GEOMETRYCOLLECTIONZM
  • GEOMETRY, GEOMETRYZ, GEOMETRYM, GEOMETRYZM

I was going to suggest maybe providing constants for these and requiring that the argument was one of those constants... but with that many options I'm not sure that would be better than sticking with strings, and letting SpatiaLite itself raise an exception on an invalid value!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if GEOMETRY would be a reasonable default value for add_geometry_column(geometry_type=)? I believe that's a column which can hold any type of geometry value - then we could allow users who want to be more restrictive to use geometry_type="POINT" or similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had to dig for that, too. And those types are case-insensitive, as far as I can tell. I'm happy to provide constants, but if you pass in "point" as your geometry type, Spatialite won't complain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine to take out the default column name. Using geometry was my effort to standardize on GeoJSON, but I've seen plenty of examples that don't. Then you could have a similar signature, something like this:

add_geometry_column(db["locations"], "geometry", "POINT")
create_spatial_index(db["locations"], "geometry")

@eyeseast
Copy link
Contributor Author

eyeseast commented Feb 3, 2022

I thought about adding these as methods on Database and Table, and I'm back and forth on it for the same reasons you are. It's certainly cleaner, and it's clearer what you're operating on. I could go either way.

I do sort of like having all the Spatialite stuff in its own module, just because it's built around an extension you might not have or want, but I don't know if that's a good reason to have a different API.

You could have init_spatialite add methods to Database and Table, so they're only there if you have Spatialite set up. Is that too clever? It feels too clever.

@eyeseast
Copy link
Contributor Author

eyeseast commented Feb 3, 2022

from sqlite_utils.utils import find_spatialite is part of the documented API already:

https://sqlite-utils.datasette.io/en/3.22.1/python-api.html#finding-spatialite

To avoid needing to bump the major version number to 4 to indicate a backwards incompatible change, we should keep a from .gis import find_spatialite line at the top of utils.py such that any existing code with that documented import continues to work.

This is fixed now. I had to take out the type annotations for Database and Table to avoid a circular import, but that's fine and may be moot if these become class methods.

@simonw
Copy link
Owner

simonw commented Feb 3, 2022

I thought about adding these as methods on Database and Table, and I'm back and forth on it for the same reasons you are. It's certainly cleaner, and it's clearer what you're operating on. I could go either way.

I do sort of like having all the Spatialite stuff in its own module, just because it's built around an extension you might not have or want, but I don't know if that's a good reason to have a different API.

You could have init_spatialite add methods to Database and Table, so they're only there if you have Spatialite set up. Is that too clever? It feels too clever.

Yeah that's too clever. You know what? I'm pretty confident we are both massively over-thinking this. We should put the methods on Database and Table! API simplicity and consistency matters more than vague concerns about purity.

@eyeseast
Copy link
Contributor Author

eyeseast commented Feb 3, 2022

Works for me. I was just looking at how the FTS extensions work and they're just methods, too. So this can be consistent with that.

@eyeseast
Copy link
Contributor Author

eyeseast commented Feb 3, 2022

OK, I moved all the GIS helpers into db.py as methods on Database and Table, and I put find_spatialite back in utils.py. I deleted gis.py, since there's nothing left it. Docs and tests are updated and passing.

I think this is better.

@eyeseast eyeseast changed the title Add new gis.py module with spatialite helper methods Add new spatialite helper methods Feb 3, 2022
sqlite_utils/db.py Outdated Show resolved Hide resolved
sqlite_utils/db.py Outdated Show resolved Hide resolved
sqlite_utils/db.py Outdated Show resolved Hide resolved
sqlite_utils/db.py Outdated Show resolved Hide resolved
tests/test_gis.py Outdated Show resolved Hide resolved
@simonw
Copy link
Owner

simonw commented Feb 4, 2022

This looks fantastic, thanks for all of the work you put into this!

@simonw simonw merged commit ee11274 into simonw:main Feb 4, 2022
simonw added a commit that referenced this pull request Feb 4, 2022
@simonw
Copy link
Owner

simonw commented Feb 4, 2022

Shipped this as sqlite-utils 3.23: https://sqlite-utils.datasette.io/en/stable/changelog.html#v3-23

@eyeseast
Copy link
Contributor Author

eyeseast commented Feb 4, 2022

Awesome. Thanks for your help getting it in. Will now look at adding CLI versions of this. It's going to be super helpful on a bunch of my projects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants