Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release topojson v1.0 #44

Closed
3 tasks done
mattijn opened this issue Aug 11, 2019 · 15 comments
Closed
3 tasks done

Release topojson v1.0 #44

mattijn opened this issue Aug 11, 2019 · 15 comments

Comments

@mattijn
Copy link
Owner

mattijn commented Aug 11, 2019

Current version on PyPi is v1.0rc4.
Things left to do for release v1.0:

  • framework of code including options is there
  • more docs needed
  • travis doesn't work yet
@mattijn
Copy link
Owner Author

mattijn commented Aug 11, 2019

Btw: this was the end of my summer of code. Tomorrow back to work..

@martinfleis
Copy link
Contributor

Hi @mattijn, I just came across this repo. Really good work!
In GeoPandas we had some discussion and attempts to have a direct io to Topojson (geopandas/geopandas#645, geopandas/geopandas#610). While it is simple to read through Fiona, we don't have a way of saving to Topojson now.

Instead of the PR mentioned above, I would like to see it done using your package, so I am checking with you to see whether it is a safe (stable) option yet. https://github.com/calvinmetcalf/topojson.py seem to be unmaintained and with a fraction of options.

If you need a hand pushing the last bit, let me know, I might find a moment.

@mattijn
Copy link
Owner Author

mattijn commented Apr 20, 2020

Thanks @martinfleis, for your comment. There are still a few open issues to solve before releasing a 1.0 version. A bit of pushing is definitely helpful. Potential integration with geopandas might be that push! :)

Besides the defined issues, there are two things to be aware of to find out if this repo is useful for integration:
1. Speed.
Computing a topology can be cost-intensive. This whole repo is more or less build around the shared_paths function. But there is also the current bottleneck (here in code). It is because the shared_paths function cannot do broadcasting. I tried using pygeos (its documented in this issue), but the bottleneck seems to be on the GEOS side.

I never compared this repo to the unmaintained https://github.com/calvinmetcalf/topojson.py repo in terms of speed and correctness, but the start principles differs. This repo builds on shapely and numpy, where the other is solely using python.

2. Toposimplify
To be honest: I don't really understand the units of the toposimplify parameter. The required value is much different for geographic data in degrees or in meters. There is no issue yet, but in the (slowly appearing) documentation page, I mentioned this:

I don’t really know what the current input value means, but I do know that there is currently NO option to use a %-value (like in mapshaper.org).

It would be a great contribution if you can make the toposimplifiy setting work using percentage as input!

This would also make it possible to have an improved interactive experience, since the exact toposimplify value is almost never equal.

So to answer your question, its close to stable, but not yet ready.

@martinfleis
Copy link
Contributor

Thank you!

I don't really understand the units of the toposimplify parameter.

For Douglas-Pecker algorithm, epsilon is in the same units as geometry, that is why there is so big difference between degrees and meters. I like this animation from Wikipedia, which is clear on what is epsilon.

image

Epsilon in this case is the half of the width of the blue buffer around the line connecting furthest points. If the point happens to be within the buffer (i.e. its distance from the line connecting furthest points is smaller than epsilon), it gets deleted.

For Visvalingam-Whyatt algorithm, epsilon is different, aerial value, so it will be larger. VW constructs triangles between the points along the line and removes those points, which are associated (are on the top of) with the triangle which area is smaller than epsilon.

I hope it is clear.

In both cases, values are depending on the actual units. It is then complicated to guess default value. That is likely the reason why shapely's simplify does not have default and user has to specify it. That might be good way for topojson as well.

Computing a topology can be cost-intensive

I'll need to explore your code to say anything meaningful on speed during topology creation.

One speedup could be possible in here, if we used vectorized pygeos version of simplify instead of loop through shapely.

topojson/topojson/ops.py

Lines 529 to 541 in 238fb29

if package == "shapely":
if input_as == "array":
list_arcs = []
for ls in linestrings:
coords_to_simp = ls[~np.isnan(ls)[:, 0]]
simple_ls = geometry.LineString(coords_to_simp)
simple_ls = simple_ls.simplify(epsilon, preserve_topology=False)
list_arcs.append(np.array(simple_ls).tolist())
elif input_as == "linestring":
for idx, ls in enumerate(linestrings):
linestrings[idx] = ls.simplify(epsilon, preserve_topology=False)
list_arcs = linestrings

GeoPandas is close to 0.8 release, so I would like include toposimplify (geopandas/geopandas#1387) in 0.9. There's some time to release topojson 1.0 in the meantime.

@martinfleis
Copy link
Contributor

there is currently NO option to use a %-value (like in mapshaper.org)

The % is mapshaper is the number of retained points. I don't think there's any option how to do that using GEOS and simplification. You would have to implement the algorithm by yourself and have percentage as a stop point during its iteration.

@mattijn
Copy link
Owner Author

mattijn commented Apr 22, 2020

Thanks for the detailed feedback. Very much appreciated. As you have noticed I started to push the last bits of open issues.

In coming week I'll distill the above mentioned points in separate issues, and create a new release candidate version (v1.0rc9), based on the current code base.

In the period after will continue to improve the documentation of the manual and inline docstrings. No new features will be introduced and if no apparent bugs appear the v1.0 release will be the same as the v1.0rc9.

@martinfleis
Copy link
Contributor

Just a small followup. After your question I decided to write up the explanation of algorithms here http://martinfleischmann.net/line-simplification-algorithms/

@mattijn
Copy link
Owner Author

mattijn commented Apr 28, 2020

That is great @martinfleis! Really nice explanation of differences between the two epsilons. Love it.

I created #73, which provides an example of the time-expensive bottleneck I mentioned above.

@mattijn
Copy link
Owner Author

mattijn commented Jun 15, 2020

@martinfleis, as you might have noticed, I've released version 1.0rc10. I said before that there will be no other releases after rc9, but after all your feedback + much appreciated suggestions I have made quite some changes that do not yet deserve a full 1.0 release. Any feedback still much welcome!

@martinfleis
Copy link
Contributor

@mattijn I've seen, the performance gain is great. What are the remaining bottlenecks now? I did not have a chance to actually see what has changed.

@mattijn
Copy link
Owner Author

mattijn commented Jun 16, 2020

Compared to rc9 I've worked my way through Extract, Join, Cut and Dedup to fix bottlenecks in each of them. In Hashmap I've made sure that it works OK with all changes made, but I did not fix or change bottlenecks there. With some timings for larger files (it varies a bit by file) most of the time is now spend in the _backward_arcs function within the Hashmap class.

If no big issues appear I'll release 1.0 beginning of July and then probably wait till shapely 2.0 has arrived and see what can be done better with the new functionalities in there.

@martinfleis
Copy link
Contributor

I'll try to find a moment to play with the master in the following days.

wait till shapely 2.0 has arrived and see what can be done better with the new functionalities in there.

It will be essentially what is in pygeos now, so you can already play with that (I know you already did). If you have pygoes in your environment, you can use geopandas master which uses pygeos geometry under the hood (https://geopandas.readthedocs.io/en/latest/install.html#using-the-optional-pygeos-dependency), so you can work directly with the arrays.

@mattijn
Copy link
Owner Author

mattijn commented Jun 17, 2020

I've checked with geopandas master including pygeos and all still works:

import geopandas as gpd
import topojson as tp

print(f'geopandas: {gpd.__version__}\ntopojson:  {tp.__version__}')
gpd.options.use_pygeos = True

gdf = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
topo = tp.Topology(gdf.query('continent == "Africa"')).toposimplify(4)

topo.to_svg()
topo.to_gdf().head(1)
geopandas: 0.7.0+79.g9f4c995
topojson:  1.0rc10

africa

geometry id pop_est continent name iso_a3 gdp_md_est
0 POLYGON ((33.90369 -0.95000, 39.20219 -4.67675... None 53950935 Africa Tanzania TZA 150600.0

Regarding pygeos related things to investigate:

  • Probably its necessary to change a few things related to the STRtree
  • It will be interesting to investigate if it is possible to use GEOSRelate (DE-9IM, boundary-boundary) once its landed in pygeos (this could be test with current shapely already).
  • Since geometries in pygeos/shapely 2.0 are immutable, maybe can use these and use numpy.asarray() representations for a view.

@martinfleis
Copy link
Contributor

Indeed, everything should work, there's a compatibility layer in GeoPandas.

Probably its necessary to change a few things related to the STRtree

Look at new query and especially query_bulk options. Both are backwards compatible, so they work even without pygeos and are super fast (with pygeos).

@mattijn
Copy link
Owner Author

mattijn commented Nov 25, 2020

Released v1.0 on PyPi and GitHub tag v1.0 reflects this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants