
Slowdown in buffer of a MultiPolygon in GEOS 3.8 #293

Closed
jorisvandenbossche opened this issue Mar 10, 2020 · 13 comments

@jorisvandenbossche (Contributor) commented Mar 10, 2020

Original report from Shapely at shapely/shapely#834 (but further discussed in shapely/shapely#847 (comment) and conda-forge/geos-feedstock#43).

A reproducer using Python and pygeos (a reproducer with Shapely is at shapely/shapely#834 (comment)):

import numpy as np
import pygeos

t = np.arange(1, 10000)
arr = pygeos.points(100000 * np.sin(t), 100000 * np.sin(4 * t))
arr2 = pygeos.buffer(arr, radius=200, quadsegs=32)
union = pygeos.union_all(arr2)
# union is a large MultiPolygon now

%timeit pygeos.buffer(union, radius=1)  # %timeit is an IPython magic

The above gives a timing of around 60 ms with GEOS master or with 3.7, but around 3 seconds with GEOS 3.8.0.
So it seems this is already fixed on master, which makes it a potential candidate to backport to 3.8? (I checked that the latest 3.8 branch (to become 3.8.1, I assume) does not yet include the fix, as it gives a similar timing to 3.8.0.)

I haven't yet tried to find the commit that fixes this.
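For reference, the `%timeit` measurement above can be reproduced outside IPython with the standard-library `timeit` module. This is a sketch: the heavy call, `pygeos.buffer(union, radius=1)`, is replaced here by a stand-in function so the harness runs without GEOS installed; `best_of` is a hypothetical helper name.

```python
import timeit

def best_of(work, repeat=5, number=1):
    """Best wall-clock time per call in seconds, similar to what %timeit reports."""
    return min(timeit.repeat(work, repeat=repeat, number=number)) / number

def stand_in():
    # Placeholder for the measured call, e.g. pygeos.buffer(union, radius=1);
    # any deterministic chunk of work exercises the harness.
    return sum(i * i for i in range(100_000))

print(f"best of 5: {best_of(stand_in) * 1e3:.2f} ms per call")
```

Taking the minimum over several repeats is the usual convention for micro-benchmarks, since the fastest run is the least polluted by other system activity.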

@Algunenano (Contributor) commented Mar 10, 2020

Some measurements with the packages I have lying around, using the Python script from shapely/shapely#834 (comment).

Each package was built from master at a different date:

  • geos-git-3.8.0dev.4621.1df959cc.master-1-x86_64.pkg.tar:
    real 0m7.659s

  • geos-git-3.8.0rc3.4699.3a9ce542.master-1-x86_64.pkg.tar:
    real 0m7.326s

  • geos-git-3.9.0dev.4717.6d7d2153.master-1-x86_64.pkg.tar:
    real 0m7.370s

  • geos-git-3.9.0dev.4743.9e46b935.master-1-x86_64.pkg.tar:
    real 0m5.835s

  • geos-git-3.9.0dev.4751.7ed0c824.master-1-x86_64.pkg.tar:
    real 0m5.955s

  • geos-git-3.9.0dev.4799.7f2d9695.master-1-x86_64.pkg.tar:
    real 0m2.974s

  • geos-git-3.9.0dev.4840.e37d04f8.master-1-x86_64.pkg.tar (HEAD from a few days ago):
    real 0m2.835s

So it seems master received at least two changes that improved timings: one between 9e46b93 and 7ed0c82, and one (the most impactful) between 7ed0c82 and 7f2d969.

I hope this helps; if necessary, I could try to create more intermediate packages to bisect the improvements.

@jorisvandenbossche (Contributor, Author) commented Mar 10, 2020

> and one (the most impactful) between 7ed0c82 and 7f2d969.

This second one I could pin down to 1bf16cd (for my pygeos example, the time goes down from around 1.8 s to around 60 ms).

@dbaston (Member) commented Mar 10, 2020

Many thanks @Algunenano for the testing. I'm not surprised that 1bf16cd provides a performance improvement, but I don't see any explanation for a huge performance regression from 3.7.x to 3.8.x, which is what I think you're reporting, @jorisvandenbossche. I'm not sure it's wise to backport 1bf16cd to a stable branch.

@jorisvandenbossche (Contributor, Author):

Yes, I wanted to ask the same thing: whether 1bf16cd is a backportable commit. I certainly understand if it is not.

I can try to investigate a bit more later to see where the performance regression came from.

@Algunenano (Contributor):

Some more tests:

  • geos-git-3.7.2.r28.g45b28eb4-1-x86_64.pkg.tar (3.7/HEAD)
    real 0m2.162s

  • geos-git-3.8.1dev.4723.17110ec8.3.8-1-x86_64.pkg.tar (3.8/HEAD)
    real 0m7.995s

Now going from more recent to older in the 3.8 branch:

  • geos-git-3.8.0dev.4340.6edf1772.3.8-1-x86_64.pkg.tar:
    real 0m8.167s

  • geos-git-3.8.0dev.4235.901192ad.3.8-1-x86_64.pkg.tar:
    real 0m8.268s

  • geos-git-3.8.0dev.4196.02ddad98.3.8-1-x86_64.pkg.tar:
    real 0m8.638s

  • geos-git-3.8.0dev.4175.6ca3f4c2.3.8-1-x86_64.pkg.tar:
    real 0m2.173s

I'll try to bisect between 6ca3f4c and 02ddad9 (~20 commits).
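The bisection between the fast 6ca3f4c2 build and the slow 02ddad98 build can be sketched with `git bisect`. This is a sketch only: the repository URL, the build commands, and the `test.py` reproducer script are assumptions, and each step requires rebuilding GEOS before timing the reproducer.

```shell
# Sketch: bisect the ~20 commits between the fast and slow builds.
git clone https://github.com/libgeos/geos && cd geos
git bisect start 02ddad98 6ca3f4c2   # <bad (slow)> <good (fast)>
# At each commit git checks out, build and time the reproducer:
#   cmake . && make -j"$(nproc)"
#   time python test.py
# then mark the result with `git bisect good` or `git bisect bad`.
# When finished, inspect the culprit and clean up:
git bisect log
git bisect reset
```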

@Algunenano (Contributor):

  • geos-git-3.8.0dev.4186.16a39b0b.3.8-1-x86_64.pkg.tar:
    real 0m8.419s

  • 3.8.0dev.4185.fb560254.3.8: Does not build

  • geos-git-3.8.0dev.4163.db5723b7.3.8-1:
    real 0m8.171s

  • geos-git-3.8.0dev.4162.9aeb4013.3.8-1-x86_64.pkg.tar:
    real 0m8.241s

  • geos-git-3.8.0dev.4157.b0504db5.3.8-1-x86_64.pkg.tar:
    real 0m2.125s

So there are two options:

  • This merge commit: 893b84b
  • This commit that fixed the build after the merge: 9aeb401

There have been multiple changes in 3.8 after that, so I can't simply revert those on the 3.8 branch, but I'm fairly certain that 893b84b is the cause of the slowdown.

@pramsey (Member) commented Mar 10, 2020

My money is on the first one, 893b84b; changes in the polygon builder seem like they'd probably affect buffering.

@dbaston (Member) commented Mar 10, 2020

Agreed. I suspect cherry-picking 8b0f0cd into 3.8 (definitely safe) may resolve this.

@Algunenano (Contributor):

> Agreed. I suspect cherry-picking 8b0f0cd into 3.8 (definitely safe) may resolve this.

Using 3.8/HEAD + 8b0f0cd:
real 0m2.949s

So yeah, it more or less solves the issue, although things aren't as fast as they were before that commit.

@pramsey (Member) commented Mar 10, 2020

I'll settle for much improved for now.

@dbaston (Member) commented Mar 10, 2020

Applied to 3.8 as 6b12bcc.

@jorisvandenbossche (Contributor, Author):

I can confirm that 3.8/HEAD + 8b0f0cd gives almost the same speed as master (around 70 ms for my pygeos example, vs around 3 s for 3.8.0).

Thanks all!

@dbaston (Member) commented Mar 10, 2020

Thanks for your persistence 😄
