Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Geotable plot #844

Merged
merged 52 commits into from Nov 19, 2016
Merged

Geotable plot #844

merged 52 commits into from Nov 19, 2016

Conversation

darribas
Copy link
Member

This is a PR to bring into pysal/dev work to implement easy plotting for geotables. Not to be merged yet but rather to create a place to centralized the discussion around design details and so on. My idea is that, once we're happy with it we merge and, ideally, we have it in by the time @ljwolf's GSoC project gets merged as well so we can have a big splash on simplifying workflows with spatial data.

So far, this implements basic machinery to plot geotables with both a matplotlib and a bokeh backend and contains a color.py module with palette support that depends on bewer2mpl. There's a notebook (geotable_plot) explaining the new functionality.

@coveralls
Copy link

coveralls commented Jul 18, 2016

Coverage Status

Coverage increased (+0.0005%) to 85.28% when pulling dc0ae2a on darribas:geotable_plot into 336cfd0 on pysal:dev.

@darribas
Copy link
Member Author

brewer2mpl mentions that it's superseded by paletteable, should we switch?

@sjsrey @ljwolf

@ljwolf
Copy link
Member

ljwolf commented Jul 18, 2016

Ahh, good call!

I think it's smart to switch to palettable since it involves much more than just brewer, but am not sure of how much would need to be refactored. Probably just colors.py? And, maybe changes in the ipywidgets colorpicker?

I could probably author the necessary changes over the weekend, but I can review any stuff this week.

@darribas
Copy link
Member Author

yes, I think all of the refactoring should happen in color.py. I really like the modular approach you took in there. At the end of the day, the API should be: pass in an indexed series of classes and a palette, receive an equally indexed series of colors. I also really like the option to select the style of the colors (hex, rgb, etc.). bokeh only seems to take hex, so the default currently set is also great.

@ljwolf
Copy link
Member

ljwolf commented Jul 18, 2016

I like the colors.py stuff too, but that's 💯 @sjsrey

I agree on the API. Not sure how the style of colors matters in the backend, other than what each package expects naturally. Maybe that'll be more apparent as I dig in.

@darribas
Copy link
Member Author

re. the color style (or unit), I think matplotlib takes essentially anything, but bokeh only hex. I like leaving it open because we might end up using more backends and it seems to be pretty straightforward.

@darribas
Copy link
Member Author

Right, but right now pandas and matplotlib will fail at import.

@coveralls
Copy link

coveralls commented Sep 29, 2016

Coverage Status

Coverage increased (+0.6%) to 83.073% when pulling 85b4db7 on darribas:geotable_plot into b18c2cf on pysal:dev.

@ljwolf
Copy link
Member

ljwolf commented Sep 30, 2016

Cool, :shipit:

@ljwolf
Copy link
Member

ljwolf commented Nov 17, 2016

Looking into the issue with the brexit map under bokeh that @sjsrey noted, I think the bokeh backend will be limited for complex geometries.

To reprise, the issue was that a call to plot the brexit data in bokeh finishes computing, but crashes before display due to something somewhere in the Javascript rendering.

#this will crash your notebook
maps.geoplot(db, 'Pct_Leave', k=7, alpha=0.5, classi='fisher_jenks', backend='bk')

This is surprising, since there are only ~380 constituencies. In contrast, geotable_plot can show NAT in bokeh just fine ☹️

NAT = ps.pdio.read_files(ps.examples.get_path('NAT.shp'))
maps.geoplot(NAT, 'HR90', alpha=.5, k=7, classi='fisher_jenks', backend='bk')

This led me to look at the vertex counts.

brexit_verts = db.geometry.apply(lambda x: len(x.vertices))
brexit_verts.unique()

The british constituency shapefile has some pretty amazingly detailed polygons, with the Highlands constituency using ~360k points alone. The worst in NAT is under 200 vertices. Thanks mostly to the intense detail with which the Scottish part of the dataset is stored, it takes around 3.5 million points to plot all constituencies, whereas NAT is only 80k points to plot. Extrapolating a bit, it looks like BokehJS or the notebook Javascript chokes at around 2 million points in the bokeh plot.

@ljwolf
Copy link
Member

ljwolf commented Nov 17, 2016

To help the user, we could warn if the plot "size" is too large and the rendering likely to fail, but I'm not sure if the point limit you can find in the brexit data will hold for other datasets.

@darribas
Copy link
Member Author

Thanks very much for this @ljwolf. I think this is not necessarily a bug in our side but just a characteristic or rendering on the browser these days at least. I wonder also if this is bokeh only or any renderer that directly attempts to put that many points on a browser will choke.

I'd be inclined not to set any message because: a) it's probably hard to come up with a number as it might depend on browser, machine specs, etc. and b) it's likely to change over time as bokeh and web technologies improve. What do you think?

@ljwolf
Copy link
Member

ljwolf commented Nov 17, 2016

I agree, looks like a JS issue. I'm not sure when matplotlib will fail, but I'm sure if you try to plot some massive number of points, it'll start to seize up in the image display.

You're also right that it's not worth it to find that limit & display when we think it might fail. I'm game to consider the concerns we discussed from the dev call answered and this to get merged!

@darribas
Copy link
Member Author

Yes, I think at some point any visualization approach is bound to fail if it naively tries to plot all of the data. At that point, approaches like datashader are probably more useful.

+1 on merging as far as I'm concerned, provided executing the guide notebook returns no error.

@darribas
Copy link
Member Author

Just for future reference, this notebook could be an example on how to plot lines with a lot of points and not get bokeh to choke:

https://anaconda.org/jbednar/trajectory/notebook

@sjsrey
Copy link
Member

sjsrey commented Nov 17, 2016

Thanks @ljwolf for digging into this and finding the reason why that particular shapefile was raising the problem. I agree this is not a bug on our side and I'm +1 on merging once the issues you mentioned get resolved. This will be really nice to have in the library!

@coveralls
Copy link

Coverage Status

Coverage increased (+0.6%) to 83.074% when pulling da44055 on darribas:geotable_plot into b18c2cf on pysal:dev.

4 similar comments
@coveralls
Copy link

Coverage Status

Coverage increased (+0.6%) to 83.074% when pulling da44055 on darribas:geotable_plot into b18c2cf on pysal:dev.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.6%) to 83.074% when pulling da44055 on darribas:geotable_plot into b18c2cf on pysal:dev.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.6%) to 83.074% when pulling da44055 on darribas:geotable_plot into b18c2cf on pysal:dev.

@coveralls
Copy link

coveralls commented Nov 17, 2016

Coverage Status

Coverage increased (+0.6%) to 83.074% when pulling da44055 on darribas:geotable_plot into b18c2cf on pysal:dev.

@darribas
Copy link
Member Author

I think the only thing left here is decide whether we want to bring in @ljwolf's work to make qualitative palettes automatically set k if they have a fixed number of colors. There was a PR for that but can't find it now...

@ljwolf
Copy link
Member

ljwolf commented Nov 17, 2016

That PR is here.

I'll take a look at bringing that up to speed.

@ljwolf
Copy link
Member

ljwolf commented Nov 17, 2016

Revised PR on that is here.

add check for k exceeding the number of available palettable colors
@darribas
Copy link
Member Author

OK, that's merged! If tests pass (which they should), are we good to merge? I think so.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.6%) to 83.074% when pulling 1d19ea3 on darribas:geotable_plot into b18c2cf on pysal:dev.

4 similar comments
@coveralls
Copy link

Coverage Status

Coverage increased (+0.6%) to 83.074% when pulling 1d19ea3 on darribas:geotable_plot into b18c2cf on pysal:dev.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.6%) to 83.074% when pulling 1d19ea3 on darribas:geotable_plot into b18c2cf on pysal:dev.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.6%) to 83.074% when pulling 1d19ea3 on darribas:geotable_plot into b18c2cf on pysal:dev.

@coveralls
Copy link

coveralls commented Nov 19, 2016

Coverage Status

Coverage increased (+0.6%) to 83.074% when pulling 1d19ea3 on darribas:geotable_plot into b18c2cf on pysal:dev.

@sjsrey sjsrey merged commit a16d75f into pysal:dev Nov 19, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants