Support batch-rasterization in rs rasterize #25

Closed
amandasaurus opened this issue Jun 12, 2018 · 6 comments

Comments

@amandasaurus

When rs rasterize runs, the memory usage of the process keeps growing until, on my machine, it is eventually killed by the Linux OOM killer. This is the command I use, which tries to generate 28k mask tiles:

./rs rasterize --dataset ./data/config.toml --zoom 12 ./data/ie-buildings.geojson ./data/bld-cover.csv ./data
@daniel-j-h
Collaborator

The rasterize tool associates and caches all features per tile, which can probably get large if you rasterize everything at once. Try splitting the input up, rasterizing smaller batches, and then combining their output.

We probably should look into making this more efficient - if possible.

@daniel-j-h
Collaborator

Adding here: if you want to run this in batches, you need to make sure your batched features are from different areas. Otherwise, wherever multiple features fall into the same tile, you will get multiple images for that tile, each containing only part of the features.

@amandasaurus
Author

amandasaurus commented Jun 12, 2018 via email

@amandasaurus
Author

amandasaurus commented Jun 12, 2018 via email

@daniel-j-h
Collaborator

The rs rasterize tool reads in the GeoJSON features and calculates all tile ids covering these features. It then stores a lookup table of the features overlapping each tile. We then loop over all features, rasterizing each one into the tiles it covers.
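To make the lookup table concrete, here is a minimal sketch of a tile-to-features index. It is not the actual rs rasterize code: the function names are hypothetical, tiles are plain (x, y, z) tuples computed with the standard slippy-map formula, and the covering tiles are approximated by the feature's bounding box. It does show why memory grows with the input: every feature is kept in memory, referenced once per tile it touches.

```python
import math
from collections import defaultdict


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y, z) slippy-map tile containing a WGS84 point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1), zoom


def iter_positions(coords):
    """Yield every [lon, lat] position from nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from iter_positions(part)


def tiles_covering(feature, zoom):
    """Approximate the covering tiles by the feature's bounding box (assumption)."""
    positions = list(iter_positions(feature["geometry"]["coordinates"]))
    lons = [p[0] for p in positions]
    lats = [p[1] for p in positions]
    # Tile y grows southwards, so the minimum latitude gives the maximum y.
    x_min, y_max, _ = lonlat_to_tile(min(lons), min(lats), zoom)
    x_max, y_min, _ = lonlat_to_tile(max(lons), max(lats), zoom)
    for x in range(x_min, x_max + 1):
        for y in range(y_min, y_max + 1):
            yield (x, y, zoom)


def build_tile_index(features, zoom):
    """Map each tile id to the list of features overlapping it."""
    index = defaultdict(list)
    for feature in features:
        for tile in tiles_covering(feature, zoom):
            index[tile].append(feature)
    return index
```

With tens of thousands of tiles and a country-sized GeoJSON file, an index like this holds the whole dataset at once, which matches the growth described above.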

I would be careful with using tools like ImageMagick. The masks have to be single-channel PNG files. If you want to do this you could e.g. use the Pillow library, do np.array(PIL.Image.open(path).convert('P')), merge the arrays, and save the single-channel PNG out again.

Probably easier to do: split your GeoJSON file into multiple files and make sure that features in different files do not fall into the same tile. You can do this e.g. by having one file per city, or area, or boundary. Features falling into the same tile have to be rasterized at the same time.
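If splitting by city or boundary is not practical, one possible way to automate the split is to group features that share a tile. The sketch below is only an illustration under assumptions: `feature_tiles` is a hypothetical mapping from a feature id to the set of tile ids it covers (computed however you like), and the grouping uses a small union-find. Features that end up in different groups never touch the same tile, so the groups can be written to separate GeoJSON files and rasterized independently.

```python
def group_features_by_shared_tiles(feature_tiles):
    """Group feature ids so that features sharing any tile end up together.

    feature_tiles: dict mapping a feature id to an iterable of tile ids
    (hypothetical input, not something rs rasterize produces for you).
    """
    parent = {fid: fid for fid in feature_tiles}

    def find(fid):
        # Walk up to the group representative, shortening the path as we go.
        while parent[fid] != fid:
            parent[fid] = parent[parent[fid]]
            fid = parent[fid]
        return fid

    first_seen = {}  # tile id -> first feature id seen on that tile
    for fid, tiles in feature_tiles.items():
        for tile in tiles:
            if tile in first_seen:
                # Same tile as an earlier feature: merge the two groups.
                parent[find(fid)] = find(first_seen[tile])
            else:
                first_seen[tile] = fid

    groups = {}
    for fid in feature_tiles:
        groups.setdefault(find(fid), []).append(fid)
    return list(groups.values())
```

Note that with dense data (e.g. buildings in a city) a single group can still be large, since features are chained together through every tile they share.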

@daniel-j-h changed the title from "Memory leak in rs rasterize" to "Support batch-rasterization in rs rasterize" on Oct 23, 2018
@daniel-j-h
Collaborator

I looked into this again. A common use-case is to batch-extract GeoJSON features from OSM and then batch-rasterize them into mask tiles. In cases where features are in the same tile but in multiple GeoJSON files, we currently rasterize a single tile.

We should extend rs rasterize to

  • check if the tile we rasterize already exists
  • if so load the tile
  • compute the element-wise maximum between the two tiles
  • write out the new tile again

This will allow users to batch-rasterize masks without having to load all GeoJSON features into memory.
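The four steps above could look roughly like the following sketch. It is not the change that went into rs rasterize: `write_mask_tile` is a hypothetical helper, it assumes NumPy and Pillow are installed, and it assumes masks are 2D uint8 arrays stored as single-channel PNGs. For brevity it writes plain grayscale (mode "L") and does not carry over a color palette an existing tile may have.

```python
import os

import numpy as np
from PIL import Image


def write_mask_tile(path, mask):
    """Write a mask tile, merging with an existing tile at the same path.

    mask: 2D array of uint8 class values (assumption about the mask layout).
    Returns the array that was actually written.
    """
    mask = np.asarray(mask, dtype=np.uint8)

    # Steps 1 and 2: check whether the tile already exists and, if so, load it.
    if os.path.exists(path):
        with Image.open(path) as existing:
            if existing.mode not in ("L", "P"):
                raise ValueError(f"expected a single-channel mask, got mode {existing.mode}")
            previous = np.array(existing)
        if previous.shape != mask.shape:
            raise ValueError("existing tile and new mask differ in size")
        # Step 3: element-wise maximum keeps every pixel set by either batch.
        mask = np.maximum(previous, mask)

    # Step 4: write the merged single-channel tile out again.
    Image.fromarray(mask).save(path, format="PNG")
    return mask
```

Taking the maximum works for binary masks (background 0, foreground 1) because a pixel stays set once any batch has set it; for multi-class masks the class with the larger value would win, which may or may not be what you want.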

48648

48700

merged


2 participants