Skip to content

Indexer: Set Grid.srid#57

Merged
jameshiebert merged 23 commits intomasterfrom
issue53-srid
Mar 1, 2018
Merged

Indexer: Set Grid.srid#57
jameshiebert merged 23 commits intomasterfrom
issue53-srid

Conversation

@rod-glover
Copy link
Copy Markdown
Collaborator

@rod-glover rod-glover commented Jan 10, 2018

Resolves #53

Now sets Grid.srid, which previously was left null, by finding or inserting an entry in the table spatial_ref_sys (ORM: SpatialRefSys) that describes the coordinate reference system defined in the metadata of the indexed file for the variable for which a Grid is being inserted. If no such metadata exists, a default entry corresponding to a PROJ.4 string of +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs is used.

Notes:

  • This PR updates the ORM to model the table spatial_ref_sys. This table is created in schema public by PostGIS, and the other (modelmeta) tables are created in the username schema (e.g., pcic_meta). The ORM does not define the schema explicitly because it is different in each modelmeta database. Correctly setting search_path in each database ensures that tables are findable. In the test framework, all tables are in schema public.

  • This PR adds a dependency on the PyCRS package, which has a fixed bug that affects us. The only way to get the fix is to install the package directly from the github repo, which seems possibly fragile.

@jameshiebert jameshiebert self-requested a review February 21, 2018 22:05
@jameshiebert jameshiebert self-assigned this Feb 21, 2018
Copy link
Copy Markdown
Contributor

@jameshiebert jameshiebert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks great. Just a few minor tweaks, and I think we should be good to go.

Comment thread requirements.txt Outdated
python-dateutil
sqlparse
testing.postgresql No newline at end of file
git+git://github.com/karimbahgat/PyCRS.git
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, definitely non-ideal to install out of a git repo, but I guess we'll roll with it. Looks like the repo author doesn't have any tags to pin to, but can we at least pin to a specific commit? That way we can ensure that the software doesn't change out from under us (and at least mitigate some of the risk of installing from GH).

Comment thread mm_cataloguer/index_netcdf.py Outdated
.filter(SpatialRefSys.srtext ==
wkt(cf.proj4_string(var_name, default=default_proj4)))
)
return q.first()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if there are multiple matches, we have bigger problems? :)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, you're right. I should change that to q.one()

Comment thread mm_cataloguer/index_netcdf.py Outdated
:param var_name: (str) name of variable for which to insert spatial ref sys
:return: (SpatialRefSys) inserted record
"""
# FIXME: This can cause problems if multiple indexers are being run
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really?! Isn't that what ACID databases are supposed to do? Can you roll the search and insert into a transaction to avoid this?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading this I now know why SQLAlchemy transaction management has had me so lost.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have no idea how good it is to read that. This took an inordinate amount of time to find a good (I hope) solution for. Yeesh.

spatial ref sys
:return: existing or new ``SpatialRefSys`` record
"""
spatial_ref_sys = find_spatial_ref_sys(sesh, cf, var_name)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you'd actually want the transaction here.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed by new insert_spatial_ref_sys (transaction is via CTE).

Comment thread modelmeta/v2.py Outdated
# responsible for creating it.


# TODO: Move this out to conftest. How did this even get here?
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Comment thread tests/conftest.py
return CFDataset(resource_filename('modelmeta', filename))


@pytest.fixture(params=[False, True])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call.

@rod-glover
Copy link
Copy Markdown
Collaborator Author

@jameshiebert ready for next review

Copy link
Copy Markdown
Contributor

@jameshiebert jameshiebert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple really minor things, but for the most part this looks great! Thanks for taking the time to get it right :)

# Documentation seems to indicate that ``Session.refresh()`` should do this
# for us (and more elegantly), but experimentation shows it doesn't.
sesh.flush()
# The newly inserted SRS is by definition the one with the highest id.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that works.

Comment thread mm_cataloguer/index_netcdf.py Outdated
# Miscellaneous constants

filepath_converter = 'realpath'
default_proj4_string = '+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be unused and a duplicate of the default_proj4 variable.

:return: (SpatialRefSys) inserted record
"""

# Use a common table expression (CTE, a.k.a. WITH statement) to compute,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Comment thread requirements.txt Outdated
numpy
netCDF4
nchelpers>=5.1.2
nchelpers>=5.3.0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend pinning the version to 5.3.0 here. Or at the very least '==5'. Presumably, we don't want to auto-upgrade major versions which, by definition, have backwards-incompatible API changes

@rod-glover
Copy link
Copy Markdown
Collaborator Author

@jameshiebert LGTY?

@jameshiebert
Copy link
Copy Markdown
Contributor

Yep. Looks great.

@jameshiebert jameshiebert merged commit 15bcdc0 into master Mar 1, 2018
@jameshiebert jameshiebert deleted the issue53-srid branch March 1, 2018 22:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

New(ish) indexer does not set Grid.srid

2 participants