pmtiles extract feedback thread #68

Closed
bdon opened this issue Sep 4, 2023 · 15 comments

Comments

@bdon
Member

bdon commented Sep 4, 2023

Comment in this thread with your reports on version 1.9's extract command.

HOW TO USE

pmtiles extract https://r2-public.protomaps.com/protomaps-sample-datasets/protomaps-basemap-opensource-20230408.pmtiles out.pmtiles --region myregion.json
  • works on public HTTP pmtiles, local files, S3-compatible buckets, Azure blobs, Google Cloud Storage buckets
  • Input region can be any GeoJSON - Polygon, MultiPolygon, Feature, FeatureCollection
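For anyone without a region file at hand, here is a minimal sketch that builds a GeoJSON Polygon from a bounding box. The function names and the coordinates (a rough box around Berlin) are illustrative assumptions, not part of pmtiles.

```python
import json


def bbox_to_region(min_lon, min_lat, max_lon, max_lat):
    """Return a GeoJSON Polygon that covers the given bounding box."""
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],  # GeoJSON rings repeat the first point to close
    ]
    return {"type": "Polygon", "coordinates": [ring]}


def write_region(path, region):
    """Write the region to disk so it can be passed to --region."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(region, f)


# Illustrative coordinates only: an approximate box around Berlin.
region = bbox_to_region(13.09, 52.34, 13.76, 52.68)
# write_region("myregion.json", region)
```

The resulting file can then be passed as `--region myregion.json`.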

Info

  • Your source extract size
  • Your query region and maxzoom
  • Your source bucket relative to your location (EC2 to S3 in same region, Cloudflare R2 to desktop across ocean, etc)
  • Information printed in the run of pmtiles extract

example run, extracting z0-z15 US + Mexico from a planet OSM basemap hosted on Source Cooperative, medium EC2 instance in same AWS region

./pmtiles extract protomaps/openstreetmap/tiles/v2/protomaps-basemap-opensource-20230408.pmtiles us_and_mexico.pmtiles --bucket=s3://us-west-2.opendata.source.coop --region=us_and_mexico.json
WARNING: extract is an experimental feature and results may not be suitable for production use.
fetching 303 dirs, 303 chunks, 15 requests
Region tiles 26196418, result tile entries 15282945
fetching 15282945 tiles, 4261 chunks, 339 requests
fetching chunks 100% |█████████████████████████████████████████████████████████████████████████████████████| (17/17 GB, 33 MB/s)
Completed in 9m24.19785483s seconds with 1 download thread.
Extract required 357 total requests.
Extract transferred 18 GB (overfetch 0.1) for an archive size of 17 GB
Verify your extract is usable at https://protomaps.github.io/PMTiles/
Feedback wanted! report your success or failure to https://github.com/protomaps/go-pmtiles/issues

That's roughly 27,000 tiles a second (15,282,945 tile entries in about 564 seconds).
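The throughput can be rechecked from the numbers in the log. This is a small sketch; the duration parser is a hypothetical helper that only understands the `h`, `m` and `s` units seen in these logs.

```python
import re


def parse_go_duration(text):
    """Convert a Go-style duration such as '9m24.19785483s' to seconds.

    Only hours, minutes and seconds are handled (no 'ms' or smaller units).
    """
    units = {"h": 3600.0, "m": 60.0, "s": 1.0}
    total = 0.0
    for value, unit in re.findall(r"(\d+(?:\.\d+)?)(h|m(?!s)|s)", text):
        total += float(value) * units[unit]
    return total


# Values copied from the example run above.
seconds = parse_go_duration("9m24.19785483s")
tiles_per_second = 15282945 / seconds  # roughly 27,000 tiles per second

# Same formula applied to the v1.9.1 Berlin log further down the thread,
# which itself reports about 240.7 tiles/s.
berlin_rate = 3993 / parse_go_duration("16.588543686s")
```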

Right now the download is unoptimized and single-threaded. The priority is to ensure that the result is correct; we'll make a multithreaded extractor after that.

Things to try

  • pass the --overfetch option, for example --overfetch=0.1. The downloader batches nearby requests by downloading slightly more data than strictly needed; the default is 10%. --overfetch=1.0 lets it download up to twice the final size of the data.
  • --dry-run prints out stats without actually downloading tile data
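To make the overfetch idea concrete, here is a sketch of one way to batch nearby byte ranges under an overfetch budget: bridge the smallest gaps first until the extra bytes would exceed `overfetch` times the wanted bytes. This is an illustration under that assumption, not the go-pmtiles implementation, and `merge_ranges` is a hypothetical name.

```python
def merge_ranges(ranges, overfetch):
    """Merge (offset, length) byte ranges, spending at most
    overfetch * (total wanted bytes) on the gaps between them.

    Assumes the input ranges do not overlap.
    """
    ranges = sorted(ranges)
    if not ranges:
        return []
    wanted = sum(length for _, length in ranges)
    budget = overfetch * wanted

    # Gap i is the number of unwanted bytes between range i and range i + 1.
    gaps = []
    for i in range(len(ranges) - 1):
        end = ranges[i][0] + ranges[i][1]
        gaps.append((ranges[i + 1][0] - end, i))

    # Bridge the cheapest gaps first while the budget allows.
    bridged = set()
    for gap, i in sorted(gaps):
        if gap > budget:
            break
        budget -= gap
        bridged.add(i)

    # Rebuild the merged request list.
    merged = []
    start, length = ranges[0]
    for i in range(len(ranges) - 1):
        next_start, next_length = ranges[i + 1]
        if i in bridged:
            length = next_start + next_length - start
        else:
            merged.append((start, length))
            start, length = next_start, next_length
    merged.append((start, length))
    return merged
```

With three 100-byte ranges at offsets 0, 110 and 1000, an overfetch of 0.1 gives a 30-byte budget, so only the 10-byte gap is bridged and two requests remain instead of three.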
@wipfli
Sponsor Contributor

wipfli commented Sep 5, 2023

I failed to download from a cloudflare R2 bucket:

./pmtiles extract swiss-map/swissmap.pmtiles out.pmtiles --region shape.json --bucket="s3://swiss-map?endpoint=https://5521f1c60beed398e82b05eabc341142.r2.cloudflarestorage.com&region=auto" --overfetch=2
WARNING: extract is an experimental feature and results may not be suitable for production use.
2023/09/05 20:04:31 main.go:125: Failed to extract, Failed to create range reader for swiss-map/swissmap.pmtiles, blob (key "swiss-map/swissmap.pmtiles") (code=Unknown): NoCredentialProviders: no valid providers in chain. Deprecated.

Maybe I did not construct the urls properly? I can write to this bucket with rclone.

@wipfli
Sponsor Contributor

wipfli commented Sep 5, 2023

I was able to extract all tiles of Switzerland from a planet-scale pmtiles file with the new command in like a second or even less!

./pmtiles extract ~/swiss-map/planetiler/data/swissmap.pmtiles out.pmtiles --region shape.json

with swissmap.pmtiles being something like 42 GB and out.pmtiles being 180 MB or so. Really cool to see that it works so well for local files!

@bdon
Member Author

bdon commented Sep 6, 2023

@wipfli wrote:

Maybe I did not construct the urls properly? I can write to this bucket with rclone.

This looks correct. Did you set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables?
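A quick way to check this before running the extract, assuming the credentials are expected in exactly those two environment variables as the question implies. `missing_credentials` is a hypothetical helper, not part of pmtiles.

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Not set:", ", ".join(missing))
```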

@bdon
Member Author

bdon commented Sep 11, 2023

adding multithreading in #68

TODOs:

  • Fix maxzoom-only extracts (feature: efficient extract of zoom level subset #64)
  • Allow the following GeoJSONs: Polygon, MultiPolygon, Feature of Polygon/MultiPolygon, FeatureCollection of one or more Polygon/MultiPolygon Features. Disallow invalid GeoJSON (array of Polygons, array of MultiPolygons, etc.)
  • Allow public anonymous HTTP endpoints
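The accepted region shapes in the TODO list can be expressed as a small check. This sketch only inspects the GeoJSON `type` tags, not the coordinates, and `is_supported_region` is a hypothetical name rather than anything in go-pmtiles.

```python
POLYGONAL = {"Polygon", "MultiPolygon"}


def _is_polygonal_feature(obj):
    """True for a Feature whose geometry is a Polygon or MultiPolygon."""
    if not isinstance(obj, dict) or obj.get("type") != "Feature":
        return False
    geometry = obj.get("geometry")
    return isinstance(geometry, dict) and geometry.get("type") in POLYGONAL


def is_supported_region(obj):
    """Check a parsed GeoJSON value against the shapes listed above."""
    if not isinstance(obj, dict):
        return False  # e.g. a bare array of Polygons is not valid GeoJSON
    kind = obj.get("type")
    if kind in POLYGONAL:
        return True
    if kind == "Feature":
        return _is_polygonal_feature(obj)
    if kind == "FeatureCollection":
        features = obj.get("features")
        return (
            isinstance(features, list)
            and len(features) > 0
            and all(_is_polygonal_feature(f) for f in features)
        )
    return False
```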

bdon added a commit that referenced this issue Sep 11, 2023
* change tasks of byte ranges from slice to linked list

* default download threads to 4; fix tests [#68]

* return errors from download threads
bdon added a commit that referenced this issue Sep 11, 2023
* fix normalization of local paths via filepath.Abs
* force the parameter awssdk=v2 for s3 buckets to use newer SDK
bdon added a commit that referenced this issue Sep 11, 2023
bdon added a commit that referenced this issue Sep 11, 2023
* All operations support public HTTP endpoints [#68]

* fix normalization of local paths via filepath.Abs
* force the parameter awssdk=v2 for s3 buckets to use newer SDK

* handle maxzoom-only extracts [#68, #64]

* fix default maxzoom; support raw geojson geometries [#68, #64]
@bdon
Member Author

bdon commented Sep 11, 2023

@wipfli v1.9.1 out with public URL support and multithreading: https://github.com/protomaps/go-pmtiles/releases/tag/v1.9.1

@missinglink

missinglink commented Sep 14, 2023

Wow cool, works super fast! It took 16 seconds to create a 71MB extract of Berlin 🤯

time ./pmtiles \
  extract \
  https://r2-public.protomaps.com/protomaps-sample-datasets/protomaps-basemap-opensource-20230408.pmtiles berlin.pmtiles \
  --region berlin.geojson

WARNING: extract is an experimental feature and results may not be suitable for production use.
fetching 10 dirs, 10 chunks, 8 requests
Region tiles 3993, result tile entries 3993
fetching 3993 tiles, 107 chunks, 53 requests
fetching chunks 100% |█████████████████████████| (74/74 MB, 7.7 MB/s)
Completed in 16.588543686s with 4 download threads (240.70828784435986 tiles/s).
Extract required 64 total requests.
Extract transferred 77 MB (overfetch 0.05) for an archive size of 74 MB
Verify your extract is usable at https://protomaps.github.io/PMTiles/
Feedback wanted! report your success or failure to https://github.com/protomaps/go-pmtiles/issues
real	0m16.612s
user	0m0.702s
sys	0m0.333s
-rw-r--r-- 1 root root 71M Sep 14 10:44 berlin.pmtiles
[Image: Screenshot 2023-09-14 at 10 56 11]

@hfu

hfu commented Sep 14, 2023

Many thanks for a fantastic update!

Another success case here. I extracted one city (Tsukuba-shi / つくば市) from GSI topographic map PMTiles hosted on IPFS. Took 25 sec with go-pmtiles v1.9.1.

[Image: tsukuba]

I'm curious whether it's possible to extract only the geometries that fall exactly within the extent geometry. I suspect this is outside the scope of go-pmtiles, because it handles multiple content types, but cutting a vector tileset exactly along the extent geometry would be nice, as in https://optgeo.github.io/14321/#12.84/35.37698/139.38385

@bdon
Member Author

bdon commented Sep 14, 2023

@hfu the tiles that are included in the extract are "exactly" the ones covering the extent - for example, if you provide a circular extent, the highest zoom tiles that do not touch the circle, at the corners of the bounding box, will not be included.
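The circular-extent example can be illustrated with a simplified sketch. It assumes a circle drawn in the normalized Web Mercator square [0, 1] x [0, 1] rather than a geodesic circle, and a tile counts as touching when the point of the tile nearest the circle center lies within the radius. The function name is hypothetical and this is not how go-pmtiles computes coverage.

```python
import math


def tiles_for_circle(cx, cy, r, zoom):
    """Return (bbox_tiles, touching_tiles) for a circle at the given zoom.

    Coordinates are in the normalized world square, where tile (x, y)
    covers [x/n, (x+1)/n] by [y/n, (y+1)/n] with n = 2**zoom.
    """
    n = 2 ** zoom
    x0 = max(0, math.floor((cx - r) * n))
    x1 = min(n - 1, math.floor((cx + r) * n))
    y0 = max(0, math.floor((cy - r) * n))
    y1 = min(n - 1, math.floor((cy + r) * n))

    bbox_tiles = [(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]
    touching = []
    for x, y in bbox_tiles:
        # Clamp the circle center into the tile to find the nearest point.
        nearest_x = min(max(cx, x / n), (x + 1) / n)
        nearest_y = min(max(cy, y / n), (y + 1) / n)
        if (nearest_x - cx) ** 2 + (nearest_y - cy) ** 2 <= r * r:
            touching.append((x, y))
    return bbox_tiles, touching


bbox_tiles, touching = tiles_for_circle(0.5, 0.5, 0.1, 6)
# The corner tile (25, 25) is inside the bounding box but does not touch
# the circle, so it would be left out of the extract.
```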

However, the extract process currently does not modify the tile contents. Ideally, the extract process should detect you are working with MVT and then clip every geometry exactly to the extent, so that the lowest zoom does not include information outside of the extent.

This is possible and I would like to implement it; the major outstanding issue is the lack of a pure-Go library for robust geometry operations. The conservative choice would be to use GEOS (C++) via its C API, but that will make packaging and distributing pmtiles as a single binary more complex. I am still evaluating if there are emerging alternatives that can perform clipping, while keeping the code simple to maintain.

In the meantime, one remedy is to provide a "mask" polygon that is the inverse of your extent and display it as a layer in MapLibre with a solid color, placed between the shape and label layers. This mostly accomplishes correct display, with some drawbacks, such as labels still appearing outside the extent.
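A sketch of building such a mask: a world-sized rectangle with the extent cut out as a hole. It assumes the extent is a single closed ring that does not cross the antimeridian; the helper names and the latitude limit of about 85.0511 degrees (the usual Web Mercator cutoff) are choices made for this illustration.

```python
def signed_area(ring):
    """Shoelace area, positive for counterclockwise rings (planar lon/lat)."""
    return sum(
        x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    ) / 2.0


def make_mask(extent_ring):
    """Return a GeoJSON Feature covering the world except the extent."""
    world = [
        [-180.0, -85.0511],
        [180.0, -85.0511],
        [180.0, 85.0511],
        [-180.0, 85.0511],
        [-180.0, -85.0511],
    ]  # counterclockwise exterior ring
    hole = [list(point) for point in extent_ring]
    if signed_area(hole) > 0:
        hole.reverse()  # RFC 7946 asks for clockwise interior rings
    return {
        "type": "Feature",
        "properties": {},
        "geometry": {"type": "Polygon", "coordinates": [world, hole]},
    }
```

The returned Feature can be added to the map as a GeoJSON source and drawn with a solid fill layer, as described above.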

@hfu

hfu commented Sep 14, 2023

@bdon Many thanks for sharing your thoughts! Yes, I was also considering masking as one remedy, and I expected that labels would be cut off by the mask. But since we are working with vector tiles, we can place the labels above the mask so that they still appear outside the extent.

I am excited to hear that you intend to implement clipping inside go-pmtiles, and thank you for sharing the implementation considerations.

Since I suppose my use case is fairly uncommon, I would also consider a slower approach using scripting. If I have some time, I may try the following steps.

  1. create a sequence of mapbox-vector-tiles from PMTiles, probably using pmtiles serve internally |
  2. decode mapbox-vector-tiles into GeoJSONSeq |
  3. clip that GeoJSONSeq by extent polygon |
  4. compile back to PMTiles using felt/tippecanoe.

This might be slow, ugly, and tend to lose some precision, but it would work as a proof of concept.

I would like to update you about my scripting project, probably by mentioning you from my repository.

Again, I appreciate and respect your work expanding the horizon of free and open source software!

@zstadler
Contributor

@bdon wrote:

However, the extract process currently does not modify the tile contents. Ideally, the extract process should detect you are working with MVT and then clip every geometry exactly to the extent, so that the lowest zoom does not include information outside of the extent.

When you implement "clipping inside tiles", please make that optional.

I am thinking about making multiple per-region extracts available for offline viewing, and allowing multiple such extracts to be used on a "first available tile" basis. In this scenario, keeping the original tiles intact seems essential for creating a seamless map that covers neighboring regions.
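The "first available tile" lookup could look roughly like this sketch. Plain dictionaries keyed by (z, x, y) stand in for the per-region archives; a real viewer would read from PMTiles files through a reader library, which is not shown here.

```python
def first_available_tile(archives, z, x, y):
    """Return the tile from the first archive that contains (z, x, y).

    `archives` is an ordered list of mappings from (z, x, y) to tile bytes.
    Returns None when no archive has the tile.
    """
    for archive in archives:
        tile = archive.get((z, x, y))
        if tile is not None:
            return tile
    return None


# Two neighboring regions that share an unclipped border tile.
region_a = {(10, 5, 7): b"shared tile", (10, 4, 7): b"only in A"}
region_b = {(10, 5, 7): b"shared tile", (10, 6, 7): b"only in B"}
archives = [region_a, region_b]
```

Because the shared border tile is byte-identical in both extracts when tiles are not clipped, it does not matter which archive answers first.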

@bdon
Member Author

bdon commented Sep 29, 2023

@zstadler yes, it would be optional.

@zstadler
Contributor

Response to Ctrl-C was slow. I've also tried Ctrl-\ and Ctrl-@ and then decided to just wait.

$ pmtiles extract --bbox=0,20,40,60 0-0-90-90.pmtiles 0-20-40-60.pmtiles
WARNING: extract is an experimental feature and results may not be suitable for production use.
fetching 994 dirs, 994 chunks, 24 requests
Region tiles 6087310, result tile entries 3789456
fetching 3789456 tiles, 2628 chunks, 88 requests
fetching chunks   0% |                                                                                                         | (42 MB/21 GB, 815 kB/s) [1m56s:7h21m26s]^C^C
^C^\^\^@
^C

@bdon
Member Author

bdon commented Sep 29, 2023

@zstadler is your issue the ctrl-C hanging, or the slow download speed?

Can you tell us your operating system?

@zstadler
Contributor

It's the response to Ctrl-C in an early stage of the program. I wanted to abort and run again with a different flag.
I'm running in a WSL Ubuntu VM on Windows 11.

@bdon
Member Author

bdon commented Nov 12, 2023

I wasn't able to reproduce any problem with Ctrl-C in WSL in Windows 10; it would exit immediately. Let me know if you can find a consistent reproduction and open a new issue.

@bdon bdon closed this as completed Nov 12, 2023