Low CPU usage for Tippecanoe on EC2 #218

Open
damg22 opened this issue Mar 16, 2024 · 6 comments

Comments

@damg22

damg22 commented Mar 16, 2024

I am trying to run Tippecanoe on EC2 with a very large GeoJSON file (~110GB), and the job takes far too long: it had not finished after more than 48 hours. After a lot of searching for root causes, I ran 'top' and noticed that Tippecanoe was using only 1 of the 16 available cores. When I ran 'top' locally on macOS, Tippecanoe was using 7 cores, which would explain why it was so much faster locally.
After experimenting with Tippecanoe and reading your docs, I found the TIPPECANOE_MAX_THREADS environment variable and set it to 16, one thread per core. This briefly raised CPU usage to all 16 cores, but once reading reaches 99.9%, usage drops back to a single core, so the job takes days to complete.
Do you have any recommendations for debugging this issue?

@DeepakSharda

DeepakSharda commented Mar 16, 2024 via email

@mtravis

mtravis commented Mar 16, 2024

Try converting the GeoJSON to FlatGeobuf using ogr2ogr and then running Tippecanoe on the result.

FGB files are smaller, stream faster, and are processed in parallel by default.

Hope that helps

Matt
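
For reference, here is a minimal sketch of the conversion suggested above, written as a small Python wrapper around the two command-line tools. It assumes `ogr2ogr` (GDAL) and `tippecanoe` are installed and on the PATH. The function names and file names are hypothetical placeholders, and the Tippecanoe call is deliberately bare, so zoom and layer options would still need to be added for a real job.

```python
import os
import subprocess


def ogr2ogr_command(src_geojson, dst_fgb):
    """Build the ogr2ogr call that converts GeoJSON to FlatGeobuf."""
    # ogr2ogr takes the destination dataset before the source dataset.
    return ["ogr2ogr", "-f", "FlatGeobuf", dst_fgb, src_geojson]


def tippecanoe_command(src_fgb, dst_mbtiles):
    """Build a bare-bones tippecanoe call; add zoom/layer options as needed."""
    return ["tippecanoe", "-o", dst_mbtiles, src_fgb]


def run_pipeline(src_geojson, dst_fgb, dst_mbtiles, max_threads=None):
    """Convert, then tile. Raises CalledProcessError if either tool fails."""
    env = dict(os.environ)
    if max_threads is not None:
        # Optional: the environment variable discussed in this thread.
        env["TIPPECANOE_MAX_THREADS"] = str(max_threads)
    subprocess.run(ogr2ogr_command(src_geojson, dst_fgb), check=True)
    subprocess.run(tippecanoe_command(dst_fgb, dst_mbtiles), check=True, env=env)
```

A call such as `run_pipeline("input.geojson", "input.fgb", "output.mbtiles")` would run both steps in sequence. Whether this removes the single-core phase reported here is not established in this thread.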

@DeepakSharda

DeepakSharda commented Mar 16, 2024 via email

@mtravis

mtravis commented Mar 16, 2024

No problem. tile-join only works on mbtiles so I don't think you'd see any improvement there.

@damg22
Author

damg22 commented Mar 18, 2024

@mtravis Thanks for the suggestion. My concern is that, at the end of the day, I'd still have to run Tippecanoe on the EC2 instance with a single core anyway. I have done a good amount of testing on the instance, and this seems to be related to Tippecanoe's implementation. I am currently looking through the source code for a possible bug.

Does @e-n-f have any insights on this? It seems that setting TIPPECANOE_MAX_THREADS simply isn't forcing more CPU usage. I can confirm the Docker container has 16 cores available.
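
One way to double-check the core count from inside the container, assuming Python is available there, is to compare the machine's CPU count with the set of CPUs the current process may actually be scheduled on. This is only a sketch: `visible_cpus` is a hypothetical helper name, the affinity call exists only on Linux, and a CPU time quota (for example one set with `docker run --cpus`) does not show up in either number. It also says nothing about how Tippecanoe itself counts CPUs.

```python
import os


def visible_cpus():
    """Return (cpu_count, usable), where usable is the affinity-limited count."""
    total = os.cpu_count()  # may be None if it cannot be determined
    if hasattr(os, "sched_getaffinity"):
        # Linux only: CPUs this process is allowed to run on.
        usable = len(os.sched_getaffinity(0))
    else:
        # No affinity API on this platform; fall back to the plain count.
        usable = total if total is not None else 1
    return total, usable


total, usable = visible_cpus()
print(f"os.cpu_count() = {total}, schedulable CPUs = {usable}")
```

If the second number is lower than 16 inside the container, the process is pinned to fewer cores than the instance provides.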

@e-n-f
Collaborator

e-n-f commented Mar 20, 2024

Tippecanoe will generally use as many CPUs as are available, even if TIPPECANOE_MAX_THREADS is not set, but a few parts of tippecanoe are inherently single-threaded: feature reordering after ingestion and before tiling is limited by I/O speed, and most of the processing for the z0 tile runs on a single thread, since there is only one tile in that zoom level.
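
To get a rough feel for how much single-threaded phases can limit the benefit of extra cores, Amdahl's law is a useful approximation. The sketch below is a generic calculation, not a measurement of Tippecanoe; the parallel fraction of 0.5 is a made-up value for illustration only.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical example: half of the runtime is in single-threaded phases.
print(round(amdahl_speedup(0.5, 16), 2))  # roughly 1.88x on 16 cores
```

Under that assumed split, 16 cores would give less than a 2x speedup overall, which is why identifying the stuck phase matters more than raising the thread count.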

Are there any log messages visible at the point where it is stuck? "Reordering geometry?" "Merging vertices?"

Can you share a copy of the GeoJSON file so I can try to reproduce the problem?
