Add plugin to read spatial data directly from csv #902

artemp opened this Issue Oct 11, 2011 · 3 comments



artemp commented Oct 11, 2011

A simple text based format is needed for displaying points easily (or WKT encoded geometries).

Three main use cases:

  1. Novice user wants to create data from scratch or geocode some tabular data. It's small and already in a spreadsheet, so why not just allow them to read it and render it directly, avoiding a conversion step? This user could then push their data into Google Docs or even version it on GitHub, so multiple users could collaborate on the simple file and the rendering could be updated live. Mapnik could gracefully skip invalid rows (verbosely), and errors could then be corrected at the source rather than just in the conversion step (to some database).

  2. A lot of APIs dump data as CSV. These same APIs should support JSON, but until Mapnik adds a fast, native GeoJSON plugin, a fast native CSV plugin can suffice for optimized rendering of small data chunks (by caching in memory at first load).

  3. Massive government data already in csv with lon/lat. User wants to be able to look at it before trying to make sense of it more, and it is so big that normal spreadsheet or conversion tools fall down. We can efficiently render the bits of it that seem valid to enable better data exploration.
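To make the "skip invalid rows (verbosely)" idea from use case 1 concrete, here is a minimal Python sketch. It is not the plugin's implementation; the function name `read_points` and the default column names `lon`/`lat` are assumptions made for illustration only.

```python
import csv
import io
import sys


def read_points(text, lon_field="lon", lat_field="lat"):
    """Parse CSV text into (lon, lat, attributes) tuples.

    Rows with missing, non-numeric, or out-of-range coordinates are skipped
    and reported, so they can be fixed in the source file.
    """
    points, skipped = [], []
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        line_no = reader.line_num  # physical line where this row ended
        try:
            lon = float(row[lon_field])
            lat = float(row[lat_field])
            if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
                raise ValueError("coordinate out of range")
        except (KeyError, TypeError, ValueError) as exc:
            # Be verbose: record and report the row instead of aborting.
            skipped.append((line_no, str(exc)))
            print(f"skipping line {line_no}: {exc}", file=sys.stderr)
            continue
        attrs = {k: v for k, v in row.items() if k not in (lon_field, lat_field)}
        points.append((lon, lat, attrs))
    return points, skipped


# Hypothetical sample data: lines 3 and 4 are invalid on purpose.
sample = "name,lon,lat\na,10.5,20\nb,abc,5\nc,200,0\nd,-1,-2\n"
points, skipped = read_points(sample)
```

Reporting the line number with each skipped row is what lets the user correct the spreadsheet itself rather than a converted copy.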

And specifically re GeoJSON. It is great, but:

  • is currently only supported in mapnik through ogr driver
  • gets slow quickly due to lack of fast parsing and indexing in ogr driver
  • can be passed as string, but triggers unneeded overhead in ogr to detect it as json
  • json is not as easy to edit by hand as plain text (main issue)

So, a native CSV (i.e. tabular) plugin:

  • would have no external dependencies
  • could be more viable than using ogr csv plugin (no need for vrt, actual type detection, faster rendering)
  • would be useful for writing tests (could remove json usage)
  • aggressive up-front parsing and feature caching could enable faster rendering speeds than any other mapnik datasource approach (for reasonable size datasets).
  • could support any geometry type using wkt column or just simple points by auto-detecting long/lat columns
  • should be simple to write and maintain
  • could easily support being used inline in stylesheets
  • csv is easily edited by novice users
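The auto-detection of a WKT column or long/lat columns mentioned above could work off the header row. The following Python sketch shows one way to do it; the candidate header names and the function `detect_geometry_columns` are illustrative assumptions, not a list the plugin is known to use.

```python
# Assumed candidate header names (compared case-insensitively).
WKT_NAMES = {"wkt", "geom", "geometry"}
LON_NAMES = {"lon", "lng", "long", "longitude", "x"}
LAT_NAMES = {"lat", "latitude", "y"}


def detect_geometry_columns(headers):
    """Return ("wkt", name), ("point", lon_name, lat_name), or None.

    A WKT column takes priority because it can carry any geometry type.
    """
    by_lower = {h.strip().lower(): h for h in headers}
    for name in WKT_NAMES:
        if name in by_lower:
            return ("wkt", by_lower[name])
    lon = next((by_lower[n] for n in by_lower if n in LON_NAMES), None)
    lat = next((by_lower[n] for n in by_lower if n in LAT_NAMES), None)
    if lon is not None and lat is not None:
        return ("point", lon, lat)
    return None


print(detect_geometry_columns(["Name", "Longitude", "Latitude"]))
print(detect_geometry_columns(["id", "WKT"]))
```

Returning the original header spelling (rather than the lowercased one) keeps the result usable as a key into each parsed row.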

Cons are:

  • CSV is not standard and can vary in format/newlines (but boost::escaped_list_separator/boost::spirit can be leveraged)
  • Type detection is expensive and tricky to do perfectly (just look at sqlite)
  • Supporting CSV well is going to lessen some users' desire to move to better formats
  • Slippery slope: can I join this data to the csv?
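On the type-detection point, a rough Python sketch shows both why it costs a full pass over the data and where it gets tricky. The function `infer_column_type`, the three type labels, and the regular expressions are assumptions for illustration; a real plugin might choose different rules.

```python
import re

INT_RE = re.compile(r"[+-]?[0-9]+")
FLOAT_RE = re.compile(r"[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?")


def infer_column_type(values):
    """Infer "int", "float", or "string" for one column of raw CSV strings.

    Starts with the narrowest type and widens whenever a value does not fit.
    Empty cells are treated as nulls and ignored.
    """
    kind = "int"
    seen_value = False
    for raw in values:
        value = raw.strip()
        if not value:
            continue
        seen_value = True
        if kind == "int" and not INT_RE.fullmatch(value):
            kind = "float"
        if kind == "float" and not FLOAT_RE.fullmatch(value):
            return "string"  # widest type, no need to look further
    return kind if seen_value else "string"


print(infer_column_type(["1", "2", ""]))
print(infer_column_type(["1", "2.5"]))
print(infer_column_type(["1", "n/a"]))
```

Every value has to be examined before a numeric type can be trusted, which is where the expense comes from. The tricky part shows up with values like postal codes: a column containing `"007"` is inferred as `int` here, which would drop the leading zeros if the values were converted.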

Now available as a branch here, and a pull request is queued up: #912, to be merged after the 2.0.1 release.

springmeyer was assigned Oct 12, 2011

merged into master in c97c4c9, closing!
