
Planet generation: The more time passes, the more the estimated generation time rises #654

Closed
qlerebours opened this issue Aug 10, 2019 · 12 comments


@qlerebours

qlerebours commented Aug 10, 2019

I'm currently trying to generate tiles for the planet file at z12 and z13.
I know that this requires a powerful machine and that it should take at least a week or two to generate (no need to warn me that this is the optimistic generation time).
My problem is that it took 3 days to generate 25%. The estimated time was 12 days during the first few days, then it started to slow down, announcing 30 days to generate, then 70, then 100, then 130, and now the estimated time is 170 days. It's been 10 days since the generation started and "only" 28% has been generated.

Can someone help me understand why it keeps slowing down every day?

My configuration: 16 GB RAM, 4 threads, 1 TB SSD.
Screenshots of htop on the machine:
https://ibb.co/mRBFh5H
https://ibb.co/MNqQFLc

Thanks

@MartinMikita
Collaborator

This is the expected (optimistic) generation time for the whole planet.

We offer generation of tiles for the whole planet on our cluster as a service (with our hardware setup it is done in four days): https://openmaptiles.com/cluster-rendering/
The complete latest planet is also available as part of the Production package: https://openmaptiles.com/production-package/ - delivered with weekly updates (every week you can download a new MBTiles file with the latest OpenStreetMap data).

This ticket is a duplicate of #242 and #299.
Closing.

@qlerebours
Author

I really like the work you've done with this library, but honestly, I'm done with your copy-pasted answers...
I know you're offering generation as a service, but honestly I can't afford that, and I'm sure many people can't. I discussed this with your sales support:

  • Either I buy the pregenerated MBTiles data with a Multi Licence ($2,000 + I don't know how much every year),
  • or I use your generation service. As your support said: rendering to zoom 16 would cost 20,000 USD + 40 hours of our time (7,000 USD), with delivery in 2 months. Rendering to zoom 18 would cost much more than that.

I totally understand that you made tile generation a business because you spent a lot of time working on it. But when people need help with the open source part, please answer in a way that helps, or don't answer at all. Your business is valuable to people who don't want to wait for tile generation and can afford your services.

Concerning my issue:
I just realized that the estimated generation time is computed live by Mapbox's tilelive. When the job starts, it is generating the top left of the map, where there is nothing to render (the sea), so those tiles are generated very quickly. When it reaches land and big cities, with my configuration the rate drops from 100 tiles/s to 2 tiles/s, because there is much more information to compute for a single tile.
To be able to parallelize the generation (or at least survive any shutdown that could break it), I forked the generate-vectortiles project and added the tilelive option that allows splitting the generation into jobs:
https://github.com/qlerebours/generate-vectortiles/blob/master/export-local.sh

This way, I can set the number of jobs to execute in the environment variable $JOBS and the job to execute in $JOB_NUM.
I also created a script that runs a job, copies the .mbtiles file when it's done, and runs the next one; a rough sketch is below.
If someone needs the script (it's a very simple one), I can send it (I don't have the time now, sorry). Don't hesitate to ask.
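
For reference, a minimal sketch of such a wrapper, assuming the forked export-local.sh honours $JOBS/$JOB_NUM and writes its output to data/tiles.mbtiles (the output path and the 1-based job numbering are assumptions, adjust to your checkout):

#!/usr/bin/env bash
# Hypothetical wrapper: run each tilelive job in sequence and keep its output.
set -euo pipefail

export JOBS=50                                  # total number of jobs
mkdir -p parts

for JOB_NUM in $(seq 1 "$JOBS"); do
  export JOB_NUM
  if [ -f "parts/part-${JOB_NUM}.mbtiles" ]; then
    continue                                    # already done; keeps the loop restartable
  fi
  ./export-local.sh                             # generate the tiles for this job only
  cp data/tiles.mbtiles "parts/part-${JOB_NUM}.mbtiles"
done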

@nnhubbard
Contributor

@qlerebours Are you getting much faster renders with your jobs script?

@qlerebours
Author

No, they are not much faster, but this way I can use multiple computers.
I also plan to use EC2 Spot instances to generate them, and there I will take advantage of running small jobs.
The jobs can be stopped at any moment and it won't be a problem: they will restart when an EC2 instance becomes available again.

For the moment, I have split z12 into 50 jobs; each job takes between 1 hour and 12 hours depending on the quantity of data, on a small computer: 16 GB RAM, 4 vCPUs, 1 TB SSD.

@carlos-mg89

How did it end up going @qlerebours ?

I'm dealing with the generation of the planet in a different way, I think. I'm using the openmaptiles quickstart.sh script to generate the MBTiles for each country, and then I merge them all into the same file with tippecanoe (using the tile-join tool), like this:
tile-join -o planet.mbtiles country1.mbtiles country2.mbtiles

So far it's working quite nicely. However, it still takes time to generate all the countries and merge them. But at least I don't need a 1 TB SSD, and I can use the file progressively as it grows. In my case I don't need the whole world at the moment, only part of Europe. But I guess you could set up a bunch of cloud servers, along with your own computer, to generate all the MBTiles more cheaply than buying a computer with LOTS of RAM and SSD.
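
A sketch of that per-country loop, assuming quickstart.sh accepts an area name and leaves its result in data/tiles.mbtiles (the area list and the output path are assumptions, adjust to your checkout):

#!/usr/bin/env bash
# Hypothetical per-country loop around the OpenMapTiles quickstart.
set -euo pipefail

mkdir -p countries
for area in france spain portugal; do           # illustrative list of areas
  ./quickstart.sh "$area"                       # generate the MBTiles for one extract
  cp data/tiles.mbtiles "countries/${area}.mbtiles"
done

# Merge everything into a single file with tippecanoe's tile-join.
tile-join -o merged.mbtiles countries/*.mbtiles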

Cheers!

@qlerebours
Author

@carlos-mg89 I thought about doing it the way you are, but I think there may be a problem:
when all the countries are generated, you still need to generate the oceans and everything else that is not in the country OSM extracts. I didn't really know how to do that and thought it could be error-prone. This is why I chose the "multiple jobs" approach.

Even with an EC2 c5.xlarge it takes a lot of time to generate tiles. I ran the first 21 jobs out of 100 on a first instance in less than 2 weeks, but the first 10 were very fast because they are just ocean, not cities.
On a second instance, I ran only the 51st, 52nd and 53rd jobs (out of 100) in more than 1 week. It's very slow because that area is the center of Europe, so it will be expensive.
I think it's better to buy your own hardware than to pay for EC2, but I got credits from AWS.

@skylarmt
Contributor

* Rendering to zoom 16 would cost 20,000 USD + 40 hours of our time (7,000 USD). The delivery is in 2 months. Rendering to zoom 18 would cost much more than that.

Why would you need to render past zoom level 14? It's vector data and can be overzoomed. Past z14 you're just splitting tiles into smaller chunks, not adding any extra detail.
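
For anyone unfamiliar with overzooming: in a Mapbox GL / MapLibre style you declare the source's real maxzoom, and the client keeps using the z14 tiles at higher zoom levels by scaling and clipping them client-side. A minimal sketch of such a source definition (the tile URL is a placeholder):

{
  "sources": {
    "openmaptiles": {
      "type": "vector",
      "tiles": ["https://example.com/tiles/{z}/{x}/{y}.pbf"],
      "maxzoom": 14
    }
  }
}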

@qlerebours
Author

Yeah, someone told me that a few days after I posted. Thanks for sharing it here.

@indus

indus commented Nov 10, 2020

I finally managed to process a full planet (with quickstart.sh :-)), so I can give a rough figure of what to expect.
It took me ~150 days from start to finish for z0-z14 with the default OpenMapTiles layers (8x Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz; 64 GiB memory).

I processed the data in 4 parts:
z0-z12: 12 GB
z13: 17 GB
z14-west: 22 GB
z14-east: 33 GB
...and merged them with tippecanoe.
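Presumably with something like the following tile-join invocation (the filenames are illustrative; --no-tile-size-limit stops tile-join from skipping the occasional tile above 500 KB, like the ~585 KB outlier in the test below):

tile-join --no-tile-size-limit -o planet.mbtiles \
  z0-z12.mbtiles z13.mbtiles z14-west.mbtiles z14-east.mbtiles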

For better comparison:

$ docker-compose run openmaptiles-tools test-perf openmaptiles.yaml --no-color
Connecting to PostgreSQL at postgres:5432, db=openmaptiles, user=openmaptiles...
* version()                       = PostgreSQL 9.6.18 on x86_64-pc-linux-gnu (Debian 9.6.18-1.pgdg90+1), compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit
* postgis_full_version()          = POSTGIS="3.0.1 ec2a9aa" [EXTENSION] PGSQL="96" GEOS="3.7.1-CAPI-1.11.1 27a5e771" PROJ="Rel. 4.9.3, 15 August 2016" LIBXML="2.9.4" LIBJSON="0.12.1" LIBPROTOBUF="1.2.1" WAGYU="0.4.3 (Internal)"
* jit                             = unrecognized configuration parameter "jit"
* shared_buffers                  = 128MB
* work_mem                        = 4MB
* maintenance_work_mem            = 64MB
* effective_cache_size            = 4GB
* effective_io_concurrency        = 1
* max_connections                 = 100
* max_worker_processes            = 8
* max_parallel_workers            = unrecognized configuration parameter "max_parallel_workers"
* max_parallel_workers_per_gather = 0
* wal_buffers                     = 4MB
* min_wal_size                    = 80MB
* max_wal_size                    = 1GB
* random_page_cost                = 4
* default_statistics_target       = 100
* checkpoint_completion_target    = 0.5

Validating SQL fields in all layers of the tileset

Running all layers test 'us-across' at zoom 14 (2,361 tiles) - A line from Pacific ocean across US via New York and some Atlantic ocean...

Tile sizes for 2,361 tiles (~236/line) done in 0:08:40.3 (4.5 tiles/s)
#######################################################################################################################################
                                                                         52.0 avg size, 0B (14/2759/6158) — 233B (14/2839/6158)
█                                                                       832.1 avg size, 236B (14/2774/6158) — 1,117B (14/3595/6158)
█                                                                        1.2K avg size, 1,117B (14/4526/6158) — 1,323B (14/3752/6158)
██                                                                       1.4K avg size, 1,323B (14/4188/6158) — 1,624B (14/4411/6158)
██                                                                       1.8K avg size, 1,632B (14/3994/6158) — 2,110B (14/2620/6158)
███                                                                      2.3K avg size, 2,110B (14/3013/6158) — 2,560B (14/4079/6158)
████                                                                     2.8K avg size, 2,561B (14/3406/6158) — 3,294B (14/3877/6158)
█████                                                                    3.8K avg size, 3,296B (14/3349/6158) — 4,549B (14/4489/6158)
████████                                                                 5.5K avg size, 4,555B (14/4600/6158) — 7,404B (14/3964/6158)
███████████████████████████████████████████████████████████████████████ 49.0K avg size, 7,418B (14/4122/6158) — 585,849B (14/4824/6158)



================ SUMMARY ================
Generated 2,361 tiles in 0:08:40.3, 4.5 tiles/s, 7,029.6 bytes/tile

4.5 tiles/s seems low to me - I was seeing 40-100 tiles/s most of the time. But 4.5 tiles/s is what this test says ;-)

@ache051

ache051 commented Dec 15, 2020

Not really a solution, but some experience to share.
We do cuts of the entire planet using AWS spot instances this way:

  1. Use quickstart.sh to load all the data into Postgres on one spot instance (r4.16xlarge), then stop it. (~2 days)
  2. Create an image from the EBS volume of said instance.
  3. Split the planet into 8 "geographical areas" depending on the concentration of data: East Europe, West Europe, North America, South America, Oceania, East Asia, West Asia, and the Middle East and Indian subcontinent.
  4. Start 8 spot instances (r4.16xlarge) from the image created in 2).
  5. Run generate-vectortiles with 64 processes each for the areas devised in 3), from z0 to z14, on the 8 instances (~5-7 days in total depending on the availability of spot instances); a sketch of one such per-area run follows the list.
  6. Combine the eight MBTiles files into one to get world coverage (1 day).

Altogether it takes about 10 days to do a cut of the world. Each complete run costs about US $1k.
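
A rough sketch of what step 5 can look like on one instance, assuming the area is passed as a bounding box through the BBOX environment variable the way the OpenMapTiles tooling does (the service name, variable names, and bbox value are illustrative, not the exact splits used above):

#!/usr/bin/env bash
# Hypothetical per-instance run for one geographical area (step 5).
set -euo pipefail

# Illustrative bounding box (lonMin,latMin,lonMax,latMax), roughly Oceania.
# Generate tiles for this area only; adjust the service name to your compose file.
docker-compose run --rm \
  -e BBOX="110.0,-48.0,180.0,0.0" \
  -e MIN_ZOOM=0 \
  -e MAX_ZOOM=14 \
  generate-vectortiles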

@indus

indus commented Dec 15, 2020 via email

@ache051

ache051 commented Dec 15, 2020

@indus We use a script written by one of our colleagues. It should take an hour at most to combine everything. Try it and tell me how it goes.
