Add pmtiles extract command #31

Closed
wipfli opened this issue Nov 12, 2022 · 19 comments

Comments

@wipfli
Sponsor Contributor

wipfli commented Nov 12, 2022

It would be useful to be able to extract tiles from a pmtiles file for a certain area and put the extracted tiles into a new pmtiles file.

It could look something like this:

pmtiles extract input.pmtiles output.pmtiles --bounds "47.5,7.5,48.0,8.5" 

A use case could be to host a global tileset and let users download an extract for a city or a similar area.
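To make the `--bounds` idea concrete, here is a minimal sketch of turning a geographic bounding box into an inclusive tile x/y range at one zoom level, using the standard Web Mercator (slippy map) tile formulas. The names `TileRange` and `BoundsToTileRange` are hypothetical, and the argument order (min lat, min lon, max lat, max lon) is only an assumption about how the example string above would be read.

```go
package main

import (
	"fmt"
	"math"
)

// TileRange is a hypothetical helper type: an inclusive x/y tile range at zoom Z.
type TileRange struct {
	Z                      uint8
	XMin, XMax, YMin, YMax uint32
}

// maxMercatorLat is the latitude limit of the Web Mercator projection.
const maxMercatorLat = 85.05112878

// lonLatToTile converts WGS84 degrees to slippy-map tile coordinates at zoom z.
func lonLatToTile(lon, lat float64, z uint8) (uint32, uint32) {
	n := math.Exp2(float64(z)) // number of tiles per axis
	lat = math.Max(-maxMercatorLat, math.Min(maxMercatorLat, lat))
	latRad := lat * math.Pi / 180.0
	x := math.Floor((lon + 180.0) / 360.0 * n)
	y := math.Floor((1.0 - math.Log(math.Tan(latRad)+1.0/math.Cos(latRad))/math.Pi) / 2.0 * n)
	// Clamp so that lon = 180 or lat at the projection limit stays in range.
	x = math.Max(0, math.Min(n-1, x))
	y = math.Max(0, math.Min(n-1, y))
	return uint32(x), uint32(y)
}

// BoundsToTileRange assumes the order minLat, minLon, maxLat, maxLon.
// Tile y grows southward, so the northern edge gives YMin.
func BoundsToTileRange(minLat, minLon, maxLat, maxLon float64, z uint8) TileRange {
	xMin, yMax := lonLatToTile(minLon, minLat, z)
	xMax, yMin := lonLatToTile(maxLon, maxLat, z)
	return TileRange{Z: z, XMin: xMin, XMax: xMax, YMin: yMin, YMax: yMax}
}

func main() {
	r := BoundsToTileRange(47.5, 7.5, 48.0, 8.5, 14)
	fmt.Printf("z%d x %d..%d y %d..%d\n", r.Z, r.XMin, r.XMax, r.YMin, r.YMax)
}
```

An extract would repeat this for every zoom level it wants to keep and then select the tiles in each range.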

@wipfli
Sponsor Contributor Author

wipfli commented Nov 12, 2022

Maybe there is already a trivial way to do something like this with mbtiles tools like tile-join, but it would be super nice to be able to do this directly with pmtiles.

@bdon
Member

bdon commented Nov 12, 2022

Yeah, this is not implemented yet for v3. The new design should allow for really efficient spatial extracts from remote archives, but the implementation will be a bit more complex.

@wipfli
Sponsor Contributor Author

wipfli commented Nov 15, 2022

I guess the Hilbert ordering of tiles helps with the efficient extracts, right? Where do you think the main complexity will be in implementing area downloads?

@wipfli
Sponsor Contributor Author

wipfli commented Nov 15, 2022

Can I help implement an extract command?

@bdon
Member

bdon commented Nov 16, 2022

Here is a list of all the tasks I can think of right now:

  • extract should work with rectangular BBOX
  • extract should work with given polygon
  • extract should work with multipolygon
  • extract should find tileID coverings efficiently using boundary Hilbert algorithm
  • extract should batch together consecutive IO in clustered mode
  • extract should work with remote archives
  • when doing MVT extract should create masking features on tiles outside the query area
  • when doing PNG/WebP extract should create transparent mask on tiles outside query area
  • when doing JPEG extract should create black or chosen color mask on tiles outside query area
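As an illustration of the "batch together consecutive IO" item, here is a minimal sketch that coalesces byte ranges which touch, overlap, or sit within a small gap of each other, so that one request can cover many tiles. The `ByteRange` type, the `MergeRanges` function, and the `maxGap` parameter are all hypothetical names and are not taken from go-pmtiles.

```go
package main

import (
	"fmt"
	"sort"
)

// ByteRange is a hypothetical type: a read of Length bytes starting at Offset.
type ByteRange struct {
	Offset uint64
	Length uint64
}

// MergeRanges sorts a copy of ranges by offset and coalesces neighbours that
// overlap, touch, or are separated by at most maxGap bytes.
func MergeRanges(ranges []ByteRange, maxGap uint64) []ByteRange {
	if len(ranges) == 0 {
		return nil
	}
	sorted := append([]ByteRange(nil), ranges...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Offset < sorted[j].Offset })

	merged := []ByteRange{sorted[0]}
	for _, r := range sorted[1:] {
		last := &merged[len(merged)-1]
		end := last.Offset + last.Length
		if r.Offset <= end+maxGap {
			// Extend the current batch if r reaches past its end.
			if rEnd := r.Offset + r.Length; rEnd > end {
				last.Length = rEnd - last.Offset
			}
		} else {
			merged = append(merged, r)
		}
	}
	return merged
}

func main() {
	reads := []ByteRange{{Offset: 100, Length: 1}, {Offset: 0, Length: 10}, {Offset: 10, Length: 5}}
	fmt.Println(MergeRanges(reads, 0))
}
```

A non-zero `maxGap` trades a few wasted bytes for fewer round trips, which tends to matter more for remote archives than for local files.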

@wipfli
Sponsor Contributor Author

wipfli commented Nov 16, 2022

Cool, thanks for the list. What could also be nice are Geofabrik-like area names, e.g. --area switzerland

@lseelenbinder

What also could be nice is geofabrik-like area names, e.g. --area switzerland

Supporting the .poly format would allow for that. (They have the downloadable polygons in that format on Geofabrik.)

@wipfli
Sponsor Contributor Author

wipfli commented Dec 4, 2022

Haha, my first Go program!

	case "extract":
		// Hard-coded test rectangle at zoom 14; both max bounds are inclusive.
		var z uint8 = 14
		var xMin, xMax uint32 = 0, 10000
		var yMin, yMax uint32 = 0, 10000

		// Collect the Hilbert tile ID of every tile inside the rectangle.
		var tileIDs []uint64
		for x := xMin; x <= xMax; x++ {
			for y := yMin; y <= yMax; y++ {
				tileIDs = append(tileIDs, pmtiles.ZxyToId(z, x, y))
			}
		}

		// Sort so that consecutive IDs end up next to each other.
		sort.Slice(tileIDs, func(i, j int) bool { return tileIDs[i] < tileIDs[j] })

		// Merge runs of consecutive IDs into inclusive [first, last] ranges.
		tileIDRanges := [][2]uint64{{tileIDs[0], tileIDs[0]}}
		for _, id := range tileIDs[1:] {
			last := &tileIDRanges[len(tileIDRanges)-1]
			if last[1]+1 == id {
				last[1] = id
			} else {
				tileIDRanges = append(tileIDRanges, [2]uint64{id, id})
			}
		}

		fmt.Println(len(tileIDs))      // number of tiles in the rectangle
		fmt.Println(len(tileIDRanges)) // number of contiguous tile ID ranges
		

It computes the ranges of tile IDs, in Hilbert ordering, that lie inside a rectangle defined by an x/y bounding box. Running it at z14 for something that is almost as large as the planet takes about 20 seconds on my laptop...
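For readers unfamiliar with the ID scheme, here is a minimal sketch of how a z/x/y triple can be mapped to a Hilbert-ordered tile ID: the count of all tiles at lower zooms, plus the tile's position along the Hilbert curve at its own zoom. The function `zxyToID` is a hypothetical stand-in written from the textbook Hilbert curve construction. It is not copied from go-pmtiles and may differ from `pmtiles.ZxyToId` in detail.

```go
package main

import "fmt"

// rotate flips and transposes a quadrant so that consecutive curve segments connect.
func rotate(n uint64, x, y *uint64, rx, ry uint64) {
	if ry == 0 {
		if rx == 1 {
			*x = n - 1 - *x
			*y = n - 1 - *y
		}
		*x, *y = *y, *x
	}
}

// zxyToID is a hypothetical sketch, not the library function.
func zxyToID(z uint8, x, y uint32) uint64 {
	// Tiles at all lower zoom levels come first: 4^0 + 4^1 + ... + 4^(z-1).
	var base uint64
	for i := uint8(0); i < z; i++ {
		base += uint64(1) << (2 * i)
	}

	// Position along the Hilbert curve on a 2^z by 2^z grid.
	n := uint64(1) << z
	tx, ty := uint64(x), uint64(y)
	var d uint64
	for s := n / 2; s > 0; s /= 2 {
		var rx, ry uint64
		if tx&s > 0 {
			rx = 1
		}
		if ty&s > 0 {
			ry = 1
		}
		d += s * s * ((3 * rx) ^ ry)
		rotate(n, &tx, &ty, rx, ry)
	}
	return base + d
}

func main() {
	for _, t := range [][2]uint32{{0, 0}, {0, 1}, {1, 1}, {1, 0}} {
		fmt.Println("z1", t[0], t[1], "->", zxyToID(1, t[0], t[1]))
	}
}
```

Because neighbouring tiles tend to get nearby IDs, a rectangle collapses into far fewer ID ranges than it has tiles, which is what the two printed counts above compare.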

@wipfli
Sponsor Contributor Author

wipfli commented Dec 4, 2022

wipfli@16bea22

@wipfli
Sponsor Contributor Author

wipfli commented Dec 4, 2022

I found a few snippets on how to write a .pmtiles file, but I am not sure how to read one. Can I use the code in loop.go for reading directory entries and tile data?

@bdon
Member

bdon commented Dec 5, 2022

You should reuse the code in https://github.com/protomaps/go-pmtiles/blob/main/pmtiles/directory.go#L135 etc

loop.go isn't the best comparison, because it requires random access over an archive. For extracting, you should only need to iterate over the entire directory once, so you can keep just one leaf in memory at a time instead of an LRU cache.
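The single-pass idea can be sketched roughly as follows: walk the root directory in order, and whenever an entry points at a leaf directory, load that leaf, visit its entries, and let it go before moving on. The `Entry` type, `WalkDirectory`, and the `loadLeaf` callback are hypothetical simplifications for illustration. The real code would deserialize leaves from the archive rather than receive them from a callback.

```go
package main

import "fmt"

// Entry is a simplified, hypothetical directory entry.
// RunLength == 0 marks a pointer to a leaf directory; RunLength > 0 marks tile data.
type Entry struct {
	TileID    uint64
	Offset    uint64
	Length    uint32
	RunLength uint32
}

// WalkDirectory visits every tile entry in tile ID order. Only the leaf that is
// currently being traversed is held in memory, so no cache is needed.
func WalkDirectory(dir []Entry, loadLeaf func(Entry) []Entry, visit func(Entry)) {
	for _, e := range dir {
		if e.RunLength == 0 {
			leaf := loadLeaf(e) // read and decode exactly one leaf
			WalkDirectory(leaf, loadLeaf, visit)
			continue // leaf becomes garbage once we move on
		}
		visit(e)
	}
}

func main() {
	// In-memory stand-in for an archive: leaf directories keyed by their offset.
	leaves := map[uint64][]Entry{
		0: {{TileID: 1, RunLength: 1}, {TileID: 2, RunLength: 3}},
	}
	root := []Entry{{TileID: 0, RunLength: 1}, {TileID: 1, Offset: 0, RunLength: 0}}

	WalkDirectory(root,
		func(e Entry) []Entry { return leaves[e.Offset] },
		func(e Entry) { fmt.Println("tile entry", e.TileID, "run", e.RunLength) })
}
```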

@wipfli
Sponsor Contributor Author

wipfli commented Dec 5, 2022

Thanks for the hint. It seems like show.go is a good starting point for reading .pmtiles, as it also uses the deserialize function. At least I can find the logic there for going from a tile_id and a .pmtiles file to the entry.

By the way, the z/x/y = 0/0/0 tile in my example archive has offset = 0. Does that mean that the first byte of the pmtiles file is where the 0/0/0 tile starts?

@bdon
Member

bdon commented Dec 5, 2022

For tile entries (runlength > 0), the offset is relative to header.tile_data_offset.
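In other words, an entry offset of 0 points at the first byte of the tile data section, not at the first byte of the file. A minimal sketch of the arithmetic, using hypothetical simplified `Header` and `Entry` types; the leaf-directory branch reflects my reading of the v3 spec and is an assumption beyond what is stated above:

```go
package main

import "fmt"

// Header is a hypothetical, trimmed-down view of the archive header.
type Header struct {
	LeafDirectoryOffset uint64 // start of the leaf directories section
	TileDataOffset      uint64 // start of the tile data section
}

// Entry is a hypothetical, simplified directory entry.
type Entry struct {
	TileID    uint64
	Offset    uint64
	Length    uint32
	RunLength uint32
}

// AbsoluteOffset returns the position in the file where the entry's bytes start.
func AbsoluteOffset(h Header, e Entry) uint64 {
	if e.RunLength > 0 {
		// Tile entry: relative to the tile data section.
		return h.TileDataOffset + e.Offset
	}
	// Leaf directory entry (assumption): relative to the leaf directories section.
	return h.LeafDirectoryOffset + e.Offset
}

func main() {
	h := Header{LeafDirectoryOffset: 4096, TileDataOffset: 65536}
	root := Entry{TileID: 0, Offset: 0, Length: 512, RunLength: 1}
	fmt.Println("0/0/0 starts at byte", AbsoluteOffset(h, root))
}
```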

@wipfli
Sponsor Contributor Author

wipfli commented Dec 7, 2022

I don't quite understand yet how pmtiles works. Maybe you can help me with an example... Let's say I would like to extract a single tile tile_id from input.pmtiles and write it to output.pmtiles. What would be the main steps to do this?

@bdon
Member

bdon commented Dec 7, 2022

The logic to extract a single tile is these lines in show.go: https://github.com/protomaps/go-pmtiles/blob/main/pmtiles/show.go#L126-L164

https://github.com/protomaps/go-pmtiles/blob/main/pmtiles/show.go#L144 is the point at which you know you have a matching tile in the archive and grab the bytes using TileDataOffset; you can then write it out on L154.

@wipfli
Sponsor Contributor Author

wipfli commented Dec 7, 2022

Thanks, exactly, this seems like the right approach for reading data.

What I am unsure about is how I get from an EntryV3 with the bytes in a buffer I control to an output.pmtiles file which contains only this tile. I looked at the convert tools but could not figure out how to get started writing a new pmtiles file...

@bdon
Member

bdon commented Dec 11, 2022

#20 is relevant so we can refactor the write logic out of convert.go

@spatialillusions

Could I suggest adding a --maxzoom option as well, similar to what is implemented in the Python version, so that it is possible to extract tiles only down to a specific zoom level?
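One way such a zoom limit could be expressed, assuming tile IDs are numbered zoom by zoom starting at z0 (as in the v3 ID scheme), is as an upper bound on tile IDs: every tile at or below the maximum zoom has an ID smaller than the total tile count up to that zoom. The helper name `TileIDLimit` is hypothetical, and this is only a sketch of the arithmetic, not a description of how any existing tool implements the option.

```go
package main

import "fmt"

// TileIDLimit returns the number of tiles at zoom levels 0..maxZoom, which is
// also the first tile ID belonging to zoom maxZoom+1 under zoom-major numbering.
func TileIDLimit(maxZoom uint8) uint64 {
	var total uint64
	for z := 0; z <= int(maxZoom); z++ {
		total += uint64(1) << (2 * uint(z)) // 4^z tiles at zoom z
	}
	return total
}

func main() {
	limit := TileIDLimit(14)
	fmt.Println("keep entries with tile ID <", limit)
}
```

The closed form is (4^(maxZoom+1) - 1) / 3, so a filter of this kind costs a single comparison per directory entry.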

bdon added a commit that referenced this issue Sep 2, 2023
bdon added a commit that referenced this issue Sep 4, 2023
* Experimental cli support for extracting a region from a larger archive, given a maxzoom and GeoJSON multipolygon region.
* Limited to credentialed buckets or local files now, public HTTP to come later
* Limited to a single download thread
* Change directory optimization to be faster and match Java implementation, affects root/leaf sizes
bdon added a commit that referenced this issue Sep 4, 2023
* include the DstOffset so we can multithread downloads later
* set header statistics
* implement --dry-run
* add logging messages for user feedback
bdon added a commit that referenced this issue Sep 4, 2023
* implement pmtiles extract [#31, #52]

* Finish initial extract [#31]

@bdon
Member

bdon commented Sep 4, 2023

This has been implemented in https://github.com/protomaps/go-pmtiles/releases/tag/v1.9.0

The only big missing feature is faster download speed. Let's exercise the code as-is to make sure the results are correct.

I'm starting a separate feedback thread; please report successes and failures there.
