
Use Separate Pools for Raster IO + Cache #5219

Merged · 8 commits into develop from test/threadpools · Oct 14, 2019

Conversation

notthatbreezy (Contributor)

Overview

In #5200 we removed `parTraverse` because we had evidence that tiles with many IO operations were exhausting the default execution context's threads and blocking on requests to S3. Previously, we had experimented with different execution contexts for the whole application, but not with dedicated contexts for specific workloads. This PR reintroduces `parTraverse` into our rendering and runs it on a cached thread pool dedicated to that context.

Additionally, the memcached client had an internal pool sized to the number of cores present. This PR adjusts it to also use a cached thread pool, on the understanding that since this work is IO-bound we should not limit ourselves to the machine's core count.
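
For concreteness, here is a minimal sketch of the shape this takes, assuming cats-effect 2 and cats' `parTraverse`; the names `RasterIOPools`, `fetchAll`, and `read` are invented for illustration and are not the PR's actual code:

```scala
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext
import cats.effect.{ContextShift, IO}
import cats.syntax.parallel._

object RasterIOPools {
  // Unbounded cached pool reserved for blocking raster / S3 reads
  val rasterIO: ExecutionContext =
    ExecutionContext.fromExecutor(Executors.newCachedThreadPool())

  // In the application this ContextShift would come from IOApp
  implicit val cs: ContextShift[IO] =
    IO.contextShift(ExecutionContext.global)

  // Run each read on the raster-io pool in parallel, shifting back afterwards
  def fetchAll(keys: List[String])(read: String => IO[Array[Byte]]): IO[List[Array[Byte]]] =
    keys.parTraverse(key => cs.evalOn(rasterIO)(read(key)))
}
```

The same `Executors.newCachedThreadPool()` construction is what the memcached client is assumed to be handed in place of its core-count-sized pool.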

Checklist

  • Description of PR is in an appropriate section of the changelog and grouped with similar changes if possible

Notes

Load test from develop:
[screenshot: load test results]

Load test from this PR:
[screenshot: load test results]

It doesn't move the needle much, if at all, but the story makes more sense conceptually and adds little complexity, so I think it's still worth it.

I did test out using fibers, but there wasn't a noticeable bump, and on the global execution context we were still running into issues parallelizing the fetches to S3.

Testing Instructions

  • Rebuild Jars
  • Start server and browse around

Closes #5199
Closes #5191

Lknechtli (Contributor) commented Oct 10, 2019

Loading the NYC project at a low zoom level seems to cripple the tile server with raster-io threads.
The heap usage signals to me that something memory-bandwidth-limited is happening, rapidly creating and destroying objects that then need to be cleaned up every so often (not so sure about this after the later testing).
[screenshot: heap usage]

There are 200-300 raster-io threads active.
[screenshot: active raster-io threads]

It doesn't seem right to me that the raster-io threads would be active for this entire time. The resource usage on my computer shows it's not spending much time spinning the CPU or fetching things on the network, so this tells me we're contending on a resource shared across all these threads and probably hitting some serious context-switching overhead.

Lknechtli (Contributor)

Going by this, it looks like the actual read is the problem, so I might be running into GDAL working under extreme memory pressure.
[screenshot]

I think we need to make it so that paintedRender can be interrupted and release any resources it's using (GDAL, etc.), because when we have severe pressure like this, later requests just keep piling up IO threads and putting more pressure on the blocking resource. It took about 10-15 minutes for the tile server to finish the actual reads on a single page of tiles, which is pretty extreme.
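
As a rough sketch of that idea, assuming cats-effect 2: acquiring the dataset as a `Resource` means cancellation or a timeout always runs the release step, so a blocked read can't keep holding GDAL state while requests pile up behind it. `RasterHandle` and `readTile` are hypothetical names, not the project's actual `paintedRender` code.

```scala
import scala.concurrent.duration._
import cats.effect.{ContextShift, IO, Resource, Timer}

// Hypothetical handle around a GDAL-backed dataset
trait RasterHandle {
  def readBytes(): Array[Byte]
  def close(): Unit
}

// Open the handle as a Resource so cancellation or the timeout always closes it,
// then cap the read so stuck requests don't accumulate indefinitely
def readTile(open: IO[RasterHandle])(
    implicit timer: Timer[IO], cs: ContextShift[IO]): IO[Array[Byte]] =
  Resource
    .make(open)(handle => IO(handle.close()))
    .use(handle => IO(handle.readBytes()))
    .timeout(30.seconds)
```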

Lknechtli (Contributor) commented Oct 10, 2019

Another test case, with 50 raster-io threads and first reads on each tile:

First read on a tile containing a single image:
[screenshot]

First read on a tile containing >50 images (same project, zoomed out):
[screenshot]

Interestingly, the reads that happen after the initial burst (delayed due to the 50 thread limit) go much faster, ranging from a high of 60 seconds to as low as 15 seconds.

The number of threads currently fetching has a drastic impact on how fast the reads happen: with 17 threads going, reads max out at about a minute; with 50 threads, they exceed 100 seconds.

This is not IO- or compute-bound: my network activity is pretty consistently under 100 kb/s with a few spikes, which I assume are image fetches, and the CPU doesn't max out any cores.

When I limit things to 10 threads, each individual read takes about 25-30 seconds, but the overall time to complete is about 160 seconds, as opposed to nearly 10 minutes with an unlimited number of threads.

Limited to 4 threads, reads take 6-10 seconds each (1.5-2.5 sec/img bandwidth), for a total of 112 seconds.
2 threads takes about 3 seconds per image (1.5 sec/img bandwidth), totalling 93 seconds.
1 thread takes 1.3-1.6 seconds per image (1.5 sec/img bandwidth), totalling 94 seconds.
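
One way to express that kind of cap in code, assuming cats-effect 2, is to gate the parallel traversal behind a `Semaphore` so only n reads run at once; `boundedParTraverse` is a name made up for this sketch, not something in the PR:

```scala
import cats.effect.{ContextShift, IO}
import cats.effect.concurrent.Semaphore
import cats.syntax.parallel._

// Fan out with parTraverse, but let at most `n` reads hit GDAL / S3 concurrently
def boundedParTraverse[A, B](n: Long, items: List[A])(f: A => IO[B])(
    implicit cs: ContextShift[IO]): IO[List[B]] =
  Semaphore[IO](n).flatMap { sem =>
    items.parTraverse(a => sem.withPermit(f(a)))
  }
```

The 10-thread case above would then look something like `boundedParTraverse(10L, tileUris)(fetchTile)`, with `tileUris` and `fetchTile` standing in for whatever the real read path is.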

Lknechtli (Contributor) left a comment

Good once we switch to a smaller thread pool to avoid crushing GDAL, until we get GeoTIFF raster sources working.
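
A minimal sketch of what that smaller pool could look like, assuming the same `ExecutionContext` wiring as the PR description; the size of 4 only mirrors the best total time in the measurements above and would need tuning:

```scala
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

// Small fixed-size pool for GDAL reads; sized from the thread-count experiments above
val gdalReadPool: ExecutionContext =
  ExecutionContext.fromExecutor(Executors.newFixedThreadPool(4))
```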

@notthatbreezy notthatbreezy merged commit a134247 into develop Oct 14, 2019
@notthatbreezy notthatbreezy deleted the test/threadpools branch October 14, 2019 13:49
Successfully merging this pull request may close these issues:

  • Use fibers for processing mosaics concurrently
  • Prototype Using Cached Threadpool for Raster IO