Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Georss parsing #9

Closed
wants to merge 7 commits into from
Closed

Georss parsing #9

wants to merge 7 commits into from

Conversation

sgillies
Copy link
Contributor

This makes feedparser useful to people who want to parse GeoRSS feeds, map earthquake locations, share spatial metadata, etc. It contains tests and is only 2 namespaces and a small amount of code. I've written a blog post to demonstrate what builtin GeoRSS parsing means to feedparser users here: http://sgillies.net/blog/1154/feed-parsing-and-mapping.

Gives the entry object a GeoJSON-style "where" item. See
https://code.google.com/p/feedparser/issues/detail?id=62 for a history
of this work.
Changing geometry type values to unicode strings passed the tests.
@tomkralidis
Copy link

+1 @sgillies, thanks for this valuable patch!

@kurtmckee
Copy link
Owner

I'm having technical issues with my ssh agent failing to ask for my password, so I'm trying to resolve that so I can fetch these changes and review them!

@kurtmckee
Copy link
Owner

I'm looking at the spec and at your code and I wanted to provide some initial feedback!

First, I see several generic try/except statements that simply catch Exception. One of us will have to modify this to catch specific exceptions. Additionally, you've made reference to _debug, which doesn't exist in the codebase anymore. I haven't run the code through coverage but I'll bet that the unit tests don't trigger any exceptions?

Second, I noticed that the georss spec seems to consistently list coordinates in lat/lon order, but after parsing they're returned in lon/lat order. I looked around the internet for additional examples of coordinates and it appears that latitude/longitude is the typical order. What is the significance for switching them in the tuple? I ask because it will be difficult to change the order at a later date after the code is released.

Without additional feedback I'll expand the unit tests and make the try/except changes I mentioned above, but I will also change the lat/lon order so that its order isn't changed during parsing.

@sgillies
Copy link
Contributor Author

sgillies commented Dec 1, 2012

I'm excited that you are in favor! I'll add an exception-raising test or two right away and you can count on me for documentation of this feature as well.

Lat, lon is perhaps the most common in the vernacular, but is not the standard for every data format. KML probably makes up most of the structured geographic information on the web, and it has lon/lat order: https://developers.google.com/kml/documentation/kmlreference#point. Any projected geographic data from a national mapping agency will have coordinates in easting/northing order (lon/lat in other words). The GeoJSON format, very widely used in web mapping, has lon/lat order: http://leafletjs.com/examples/geojson.html.

The dict I parse GeoRSS into (the "where" item) satisfies this GeoJSON-like protocol: https://gist.github.com/2217756 and thereby brings feedparser into a family of mature Python GIS software like PySAL, Shapely, and even Esri's ArcGIS. Switching the coordinate order to lat/long would go against accepted practice in the Python GIS community. It would also be inconsistent with the functions in Matplotlib (my favorite Python visualization package) which expect coordinates in X/Y (i.e. longitude/latitude for maps with North at the top) order. Furthermore, applications I and others have in production (we've been using this GeoRSS patch for years while feedparser was not actively maintained) would broken by swapped coordinates.

@sgillies
Copy link
Contributor Author

sgillies commented Dec 1, 2012

I've begun eliminating all the references to _debug. Update to this pull request coming soon.

@ghost ghost assigned kurtmckee Dec 2, 2012
@kurtmckee
Copy link
Owner

Well said. The lon/lat reasoning makes sense, and I'm glad it will work well with existing software.

I'm looking forward to the next pull request!

@sgillies
Copy link
Contributor Author

sgillies commented Dec 4, 2012

Most of the time, locations in GeoRSS will be in the default coordinate reference system (latitude, longitude using WGS84 model, aka "EPSG:4326"), but if they are not, feedparser should now do the right thing with respect to coordinate order.

@kurtmckee
Copy link
Owner

I've committed these patches, with some minor modifications: I removed the merge commit (9f1fe0d) and replaced the syntax of one of the lines so it would run under Python 2.4. Thanks for your patience, Sean, I'm really grateful that you provided these patches! :)

@kurtmckee kurtmckee closed this Jan 13, 2013
@sgillies
Copy link
Contributor Author

Fantastic! I'm still a Python 2.4 user from time to time, so I appreciate that change. Thanks for keeping feedparser going!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants