WIP: Add PostGIS Geometry adapter based on Shapely#80

Closed
jacopofar wants to merge 35 commits into psycopg:master from jacopofar:add-shapely-adapters

Conversation

@jacopofar
Contributor

@jacopofar jacopofar commented Sep 10, 2021

As mentioned in rustprooflabs/pgosm-flex#165 I tried to write an adapter for PostGIS geometries based on Shapely.

The usage looks like this:

Example usage
import psycopg


from psycopg.types.geometry import register_shapely_adapters

CONN_STR = "postgres://postgres:testpassword@localhost:15432/osm_data"

# get the Reichstag building geometry from a pgosm-flex import of Berlin
# it's a multipolygon with holes in it
# https://www.openstreetmap.org/relation/2201742
READ_QUERY = """
    select osm_id, name, st_area(geom), geom
    from osm.building_polygon
    where osm_id = -2201742"""

with psycopg.connect(CONN_STR) as conn:
    register_shapely_adapters(conn)

    # text protocol
    with conn.cursor(binary=False) as cur:
        cur.execute(READ_QUERY)
        row = cur.fetchone()
        print('Text protocol:')
        print(row)
        print('area from shape:', row[-1].area)
        print('area from DB:', row[-2])

    # binary protocol
    with conn.cursor(binary=True) as cur:
        cur.execute(READ_QUERY)
        row = cur.fetchone()
        print('Binary protocol:')
        print(row)
        print('area from shape:', row[-1].area)
        print('area from DB:', row[-2])
        # insert a copy of the geometry rotated by 30 degrees
        from shapely import affinity
        cur.execute(
            "insert into osm.building_polygon(osm_id, osm_type, address, geom)"
            " VALUES(999999, 'fake', 'fake', %s)",
            (affinity.rotate(row[-1], 30),),
        )
        conn.commit()


with psycopg.connect(CONN_STR) as conn:
    # this connection has no adapters
    with conn.cursor(binary=False) as cur:
        cur.execute(READ_QUERY)
        row = cur.fetchone()
        print(str(row)[:400] + '...(truncated)')
It works in both binary and text mode, and I can see the rotated shape in QGIS.

[Screenshot of the polygons in QGIS: a map showing the polygon and the rotated one inserted using the adapter]
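The example above compares the area computed from the shape with the area computed by the database, then inserts a rotated copy. As a rough illustration of why both areas should agree and why the rotated copy keeps the same area, here is a standard-library sketch. It is not Shapely code: `shoelace_area` and `rotate_ring` are hypothetical helpers, and the rotation is assumed to happen around the bounding-box center.

```python
import math


def shoelace_area(ring):
    """Planar area of a simple polygon given as a list of (x, y) vertices."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def rotate_ring(ring, degrees):
    """Rotate vertices counter-clockwise around the bounding-box center."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    cx = (min(xs) + max(xs)) / 2.0
    cy = (min(ys) + max(ys)) / 2.0
    cos_a = math.cos(math.radians(degrees))
    sin_a = math.sin(math.radians(degrees))
    return [
        (
            cx + (x - cx) * cos_a - (y - cy) * sin_a,
            cy + (x - cx) * sin_a + (y - cy) * cos_a,
        )
        for x, y in ring
    ]


triangle = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0)]
rotated = rotate_ring(triangle, 30)
print(shoelace_area(triangle), shoelace_area(rotated))
```

The two printed areas should match up to floating-point rounding, since a rotation does not change planar area.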

There are a few things missing:

  • the geometry is a type from PostGIS, so I think it can only be retrieved at runtime
  • how to write a test for this? I ran it against a test PostGIS DB
  • I'm not sure about the import for shapely: right now it's simply attempted and there's no error handling on it
  • documentation, once the usage is defined

Member

@dvarrazzo dvarrazzo left a comment

Testing and documentation are needed.

For testing:

  • do we have to add the PostGIS package to postgres?
  • one important criterion is that psycopg imports OK without shapely installed, so the basic test suite should skip the shapely tests
    • however, I'd like to test it too (I have the same problem with the dns module, which depends on a module I don't want to depend on, so the tests are skipped)
    • I think we should add a new tox target which would install shapely in its environment and run the shapely tests only (we can add a pytest tag for that)

@dvarrazzo
Member

Please take a look at ef9cb2b as an example about isolating tests requiring an optional dependency.

@dvarrazzo
Member

Please call the module psycopg.types.shapely, not geometry. There might come out other ways to adapt geometry types to different Python objects.

@jacopofar
Contributor Author

I applied the suggested changes and wrote a doc page (I have never used RST and was unable to serve the docs locally, so I'm not totally sure about the syntax). I will look at the tests later.

@dvarrazzo
Member

> I applied the suggested changes, and wrote a doc (I never used RST and was unable to serve the docs locally, am not totally sure about the syntax), will look at the tests later.

I can help you with the docs, no worries.

If you would like to give it a shot you can run pip install -e ./psycopg[docs] to install the dependencies in the current virtualenv and make -C docs serve to run a web server serving the docs locally.

Thank you very much :)

    format = Format.BINARY

    def dump(self, obj: "BaseGeometry") -> bytes:
        return wkb.dumps(obj).encode()  # type: ignore
Member

I don't think you want encode() here. AFAICS, wkb.dumps() already returns bytes.

>>> wkb.dumps(point)
b'\x01\x01\x00\x00\x00333333\xf3?333333\x0b@'
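To see why no further encoding is needed, it may help to rebuild the same payload by hand. The sketch below uses only the standard library and covers just the little-endian 2D point case; `point_to_wkb` and `wkb_to_point` are hypothetical helper names, not Shapely functions.

```python
import struct


def point_to_wkb(x, y):
    # "<": little-endian, no padding
    # B:   byte-order flag (1 = little-endian)
    # I:   geometry type (1 = Point)
    # dd:  the two coordinates as 8-byte doubles
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(data):
    byte_order, geom_type, x, y = struct.unpack("<BIdd", data)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("this sketch only handles little-endian 2D points")
    return (x, y)


payload = point_to_wkb(1.2, 3.4)
print(payload)
print(wkb_to_point(payload))
```

The result is already a bytes object, and bytes has no encode() method in Python 3, so calling it would fail.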

Contributor Author

Right, wkb means literally Well Known Binary :-)
Checking this I found out that Shapely already returns str or bytes depending on the hex parameter

Member

It's too bad, anyway, that the hex string is not accepted as bytes: it would be more efficient to parse. I also see a lot of ctypes: in my experience that's a great loss of performance.
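The extra step being discussed can be sketched with the standard library alone. This assumes the payload arrives either as raw WKB or as hex digits (as str or as bytes); `to_raw_wkb` is a hypothetical helper, not part of Shapely or psycopg.

```python
def to_raw_wkb(data, is_hex):
    """Return raw WKB bytes from either a raw or a hex-encoded payload."""
    if not is_hex:
        return bytes(data)
    # bytes.fromhex() takes a str, so hex digits that arrive as bytes
    # must be decoded to text first: this is the extra conversion
    if isinstance(data, (bytes, bytearray, memoryview)):
        data = bytes(data).decode("ascii")
    return bytes.fromhex(data)


raw = b"\x01\x01\x00\x00\x00333333\xf3?333333\x0b@"
hex_text = raw.hex()               # str of hex digits
hex_bytes = hex_text.encode()      # the same digits, as bytes

print(to_raw_wkb(hex_text, is_hex=True) == raw)
print(to_raw_wkb(hex_bytes, is_hex=True) == raw)
```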

@dvarrazzo
Member

I have pushed a changeset to this branch to fix the documentation, both fixing the reST syntax and adding shapely as a doc dependency so that docs introspection works.

@jacopofar
Contributor Author

> I can help you with the docs, no worries.
>
> If you would like to give it a shot you can run pip install -e ./psycopg[docs] to install the dependencies in the current virtualenv and make -C docs serve to run a web server serving the docs locally.
>
> Thank you very much :)

Great, I'm very curious about trying RST and will give it a second try when I have time

@dvarrazzo
Member

It looks like you have cherry-picked the changes from master, so now it seems that this branch has changed 51 files. Please rebase on master, thank you :)

Previous commits were messy due to some rebase gone wrong, moved them to a single commit

In case in the future further adapters for geometry types are added, this naming makes more sense

The latest registered dumper is the one used by default if the %s placeholder is used in a query. The binary one is preferred.

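The rule in the last note (the most recently registered dumper wins for the %s placeholder) can be illustrated with a toy model. This is only a sketch of the idea described above, not psycopg's adapter registry: `ToyDumperRegistry` and its methods are hypothetical.

```python
class ToyDumperRegistry:
    """Toy model only: not psycopg's adapter map."""

    def __init__(self):
        self._by_format = {}  # (python type, "t" or "b") -> dumper
        self._default = {}    # python type -> most recently registered dumper

    def register(self, cls, fmt, dumper):
        self._by_format[(cls, fmt)] = dumper
        # The latest registration becomes the default for "%s"
        self._default[cls] = dumper

    def lookup(self, cls, placeholder):
        # "%s" lets the registry choose; "%t" and "%b" force a format
        if placeholder == "s":
            return self._default[cls]
        return self._by_format[(cls, placeholder)]


registry = ToyDumperRegistry()
registry.register(tuple, "t", "text dumper")
registry.register(tuple, "b", "binary dumper")  # registered last

print(registry.lookup(tuple, "s"))
```

Registering the binary dumper after the text one is what makes it the preferred choice when the query uses %s.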
@jacopofar
Contributor Author

Thanks for the explanation. I was able to serve the documentation locally and it seems OK. I also wrote some tests based on the example above. The tests store and retrieve a few Shapely objects, and also generate geometries in the DB using GeoJSON. I'm not used to tox or GitHub Actions, so there are likely some issues.

I still have an issue with the MultiPolygon object (the big GeoJSON at the beginning of the test): when parsing the WKB, Shapely says it's invalid, but PostGIS and a few other tools are fine with it. I'm still investigating.

@jacopofar
Contributor Author

jacopofar commented Sep 15, 2021

Hmm, it's failing on matching the shapely dependency with the implementation.

If I set "postgis" as an implementation in .github/workflows/tests.yml it doesn't work, because it has to match an implementation defined in psycopg/pq/__init__.py (EDIT: I have now noticed the readme file explaining it; that approach was definitely wrong). If I set it to "python" and use a separate parameter to differentiate the service (and run postgis instead of postgres), then shapely is not installed in the environment, even though I'm passing the "postgis" environment using the -e flag and tox.ini has it.

Locally I run it with the following command and it works:

PSYCOPG_TEST_DSN=postgres://postgres:testpassword@localhost:15432/osm_data tox -c psycopg -e postgis -- tests/test_shapely.py --color yes

Never mind, I missed the fact that the skip happens programmatically in the test itself; it's not part of the tox configuration.

Now it works, I'm not sure about using pytest.importorskip rather than doing as in here:

pytestmark.append(pytest.mark.skip(reason=str(ex)))

I'm checking the process on macOS and don't know about Windows
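The programmatic skip mentioned above can be sketched with the standard library, assuming the goal is simply "skip these tests when the optional module is absent". The thread uses pytest; this sketch uses unittest so that it runs without extra packages, and `skip_reason` and `ShapelyRoundTripTests` are hypothetical names.

```python
import importlib.util
import unittest


def skip_reason(module_name):
    """Return a reason string if module_name cannot be found, else None."""
    if importlib.util.find_spec(module_name) is None:
        return f"{module_name} is not installed"
    return None


SHAPELY_MISSING = skip_reason("shapely")


@unittest.skipIf(SHAPELY_MISSING is not None, str(SHAPELY_MISSING))
class ShapelyRoundTripTests(unittest.TestCase):
    def test_point_has_no_area(self):
        # Imported inside the test so that this module still imports
        # when shapely is absent
        from shapely.geometry import Point

        self.assertEqual(Point(1.2, 3.4).area, 0.0)


print(SHAPELY_MISSING or "shapely found, tests would run")
```

With this shape, the base test run stays green without shapely, while an environment that installs it exercises the tests.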

@jacopofar
Contributor Author

I also just noticed that the doc step fails because it cannot find the Shapely library when importing the adapter code (to generate its docs, I assume). I tried setting autodoc_mock_imports in docs/conf.py to mock shapely, without results.

I would rather avoid installing Shapely only to generate the documentation, but I could not find a way around it.

@dvarrazzo
Member

> I also just noticed that the doc step fails because it cannot find the Shapely library when importing the adapter code

That's strange: didn't I add shapely to the docs extra? It seems to run OK in CI (the lint step checks for errors there). Looking at the changeset, you might have lost those changes in a rebase. I've pushed it again on our repo's side: see the changes in setup.py in 49341f6. You can integrate it back.

@dvarrazzo
Member

> Now it works, I'm not sure about using pytest.importorskip rather than doing as in here:

I am pretty OK with what you did here; we can do without importorskip :)

SAMPLE_POINT = Point(1.2, 3.4)
SAMPLE_POLYGON = Polygon([(0, 0), (1, 1), (1, 0)])

fmt_placeholder = "%b" if fmt_out == Format.BINARY else "%t"
Member

For better coverage of the param format combination you can use a fmt_in passing adapt.Format as parameter. See for instance:

@pytest.mark.parametrize("fmt_in", PyFormat)
def test_uuid_dump(conn, fmt_in):
    val = "12345678123456781234567812345679"
    cur = conn.cursor()
    cur.execute(f"select %{fmt_in} = %s::uuid", (UUID(val), val))
    assert cur.fetchone()[0] is True

Explanation: PyFormat is an enum whose values (s, b, t) are the placeholder letters. This is unlike pq.Format, which is only "text" and "binary".

Using both fmt_in and fmt_out you test all the possible 6 combinations, although maybe it's more appropriate to test the dumpers with the fmt_in range and the loaders with the fmt_out one. The uuid test is a good sample.

Note that I've just simplified the formats iteration in many tests: take a look at the current state of master now.
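The six combinations mentioned above can be enumerated with a small standard-library sketch. `ToyPyFormat` and `ToyFormat` are stand-ins written for this illustration so that it runs without psycopg; they mirror the description in the comment and are not the library's own classes.

```python
from enum import Enum, IntEnum
from itertools import product


class ToyPyFormat(str, Enum):
    """Stand-in: the values are the placeholder letters."""
    AUTO = "s"
    TEXT = "t"
    BINARY = "b"


class ToyFormat(IntEnum):
    """Stand-in for the result format: only text and binary."""
    TEXT = 0
    BINARY = 1


# Every parameter format paired with every result format
combos = list(product(ToyPyFormat, ToyFormat))
placeholders = [f"%{fmt_in.value}" for fmt_in in ToyPyFormat]

for fmt_in, fmt_out in combos:
    print(f"%{fmt_in.value} in, {fmt_out.name.lower()} out")
```

In a test suite, the two enums would typically feed two separate parametrize decorators, one for dumpers and one for loaders, as suggested in the comment.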

Contributor Author

Nice, done!

- name: Start PostgreSQL service for test
  run: brew services start postgresql

- name: Enable PostGIS extension
Member

In other tests, setting prerequisites is done by the unit test, using a fixture. See hstore for an example.

Not sure it's better in absolute terms. If it fails, tests are skipped instead of failing, which is not really optimal. However it keeps the pipeline simpler.

I'll take a look myself at whether it can be improved.

Contributor Author

I think now that macOS postgis test is gone it's not needed anymore


  - name: Enable PostGIS extension
-   run: psql -c "CREATE EXTENSION postgis;" postgres
+   run: psql -h 127.0.0.1 -U runner -c "CREATE EXTENSION postgis;" postgres
Member

If you are finding problems with macOS, but postgis is known to just work on that platform, maybe we can avoid running these tests there?

Also note that creating the extension from pytest rather than from the runner would probably need no configuration tweaking.

Contributor Author

Yes, and installing it seems quite heavy, maybe the best solution is to test it only on Linux (either using the postgis image or setting it up somehow in the unit test itself)

Member

If postgis image and Linux work, I'm happy with that :)

Contributor Author

Removed, now it's only on linux (where it's already configured in the postgis image so no CREATE EXTENSION step is needed)

@dvarrazzo
Member

dvarrazzo commented Sep 19, 2021

> I still have an issue with the Multipolygon object (the big geoJSON at the beginning of the test), when parsing the wkb Shapely says it's invalid but PostGIS and a few other tools are fine with it, I'm still investigating

Hi @jacopofar

Have you resolved this issue? I guess it was related to the argument passing style. If so, you can hand this branch over to me: I will rebase it, align the tests with the rest of the suite, and merge.

I mostly care that you pass on your experience with the postgis objects, so that the test suite has representative tests; we can take care of the pytest/tox/GitHub integration.

Thank you very much for your work so far!

@jacopofar
Contributor Author

Yes, the issue is solved now. I found there were a few extra digits in the coordinates list; now it loads, and there's a test to check that it generates a multipolygon object.

Thanks for the review, it was quite instructive for me :)

@dvarrazzo dvarrazzo added this to the 3.0 milestone Sep 20, 2021
@dvarrazzo
Member

I have cleaned up the history of this branch a bit and merged it to master. Thank you very much for this contribution 🙂

@dvarrazzo dvarrazzo closed this Sep 22, 2021