GeoSAFE crash when dealing with very large vector layers. #538

Open
lucernae opened this issue Mar 27, 2019 · 1 comment

Comments

@lucernae
Collaborator

lucernae commented Mar 27, 2019

Problem

This issue might be related to other issues because of its generic symptoms.

Saravana tried to run an analysis with a building layer of modest size (a 60 MB shapefile).
Even though the extent is big, the analysis completed in around 20 minutes or so after my recent patch to GeoSAFE.
However, the problem happens when he tries to run a second analysis: the server crashes. This bug is reproducible consistently.

Here's an excerpt from the email conversation (containing my conclusion):

Hi Bala, sorry, just got a chance to work on it now.

So, if I understand correctly, the behaviour can be reproduced consistently.
Your logs and screenshots are very helpful.
I have come to these conclusions:

1. uWSGI crashed because it was unable to keep up with the requests. Your droplet has 4 CPUs, so this should not be a problem; the cause lies elsewhere.
2. To generate map tiles on the fly for newly created/uploaded layers, we use the QGIS Server backend. By default we use 4 containers to offload the job of generating these tiles. I can see that all layers have thumbnails, so that is not the problem: QGIS is able to generate the tiles.
3. However, I noticed the impact layer is around 100 MB, and QGIS failed to render all the tiles for a given extent. We have actually had bigger layers in the past (around gigabytes of data), so I'm not sure why it can't handle this one. But looking at the logs (Rancher logs) and the memory spike, I'm confident this is the cause: QGIS tried to render a tile, failed, and leaked memory.
4. Now, the CPU spike is caused by the swap process (because eventually the leaked memory exceeds the available memory). But swap also has limits. The stagnant RAM usage you saw is swap trying to move memory away. But since it's a leak, it will keep growing until the container is deleted or the server eventually crashes.


Based on this information, I copied all your data (database and media files) from your droplet to replicate this on my machine.
From there, we will decide how to handle this crash. I'm not sure at the moment, because the cause possibly comes from QGIS Server, which is a complete package on its own. So we will probably try to find a workaround to avoid the memory spike.

That concludes my report as of now. :)
I will post again in the Slack channel after trying this out.

Regards,


-- 
Rizky Maulana Nugraha
Senior Software Engineer
Kartoza
rizky@kartoza.com





On 25 Mar 2019, at 06.41, Da CodeKid <damacusr@gmail.com> wrote:

Hi Rizky,

Unfortunately I've destroyed that server (the one that spiked to 100%). Just to confirm, I created another server over the weekend and ended up with the same result.

I just created a new droplet (with the same configuration - 8GB / 4 CPU) and am running the first analysis. The spike occurs only when I run the analysis for the second time.

Server info is attached below in case you'd like to collect the logs (I'll destroy the server 24 hours from now). I've enabled password login for the server and added your GitHub account to the Rancher login as well (use the same password for the /admin portal too).

The eventual conclusion is that the QGIS Server backend was the cause and is not optimized to handle the memory leak.

A proposed solution will follow after further investigation.

@lucernae lucernae added the bug label Mar 27, 2019
@lucernae lucernae self-assigned this Mar 27, 2019
@lucernae
Collaborator Author

Update on the investigation:

I replicated the behaviour on my own machine with the following specs:

CPU: Skylake 4 core, 4.0 GHz
RAM: 16 GB
Disk: plenty/not a problem
OS: Ubuntu, but GeoSAFE runs on Rancher, the same way as the prod environment.

The crash did happen after the second analysis. The second analysis itself finished with no problem, but the whole computer crashed when I tried to view the layer.
Thus, I conclude the problem lies with QGIS Server rendering.

It turns out the reason for the memory leak is that each container runs several Apache worker threads (pretty normal, actually). However, for this kind of file, a 100 MB GeoJSON layer, a thread takes time to open the file and render it. While that thread is still working, another thread tries to access the layer (to render a different tile location), somehow gets a permission-denied error, and dies without cleanup. This happens again and again, accumulating dead memory until the container itself is destroyed.
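
For anyone who wants to watch this happen, a rough monitoring sketch (an assumption on my side: psutil is available inside the QGIS Server container, and the process names are guesses) that samples the resident memory of the Apache/QGIS processes so the accumulating dead memory is visible before the container gets killed:

```python
# Rough sketch, not part of GeoSAFE: sample total RSS of the rendering
# processes every 10 seconds. Process names below are assumptions.
import time
import psutil

WATCHED = ("apache2", "httpd", "qgis_mapserv.fcgi")  # assumed process names

while True:
    total = 0
    for proc in psutil.process_iter(attrs=["name", "memory_info"]):
        info = proc.info
        # memory_info can be None when access is denied
        if info["name"] in WATCHED and info["memory_info"] is not None:
            total += info["memory_info"].rss
    print("watched RSS: %.1f MB" % (total / 1024.0 / 1024.0))
    time.sleep(10)
```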

To check whether this comes from Apache or from QGIS's own code, I tried to load the layer in QGIS Desktop, and the memory it uses is a whopping 8 GB for a mere 100 MB GeoJSON layer.

Wow....

It only uses around 500 MB if the GeoJSON is converted to a shapefile. So I guess the problem comes from the data format. Maybe string types are not handled properly.
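
To poke at the "string types are not handled properly" hypothesis, a small diagnostic sketch with the GDAL/OGR Python bindings that compares the field definitions OGR reports for the GeoJSON layer and the converted shapefile (paths are placeholders, not the actual data):

```python
# Diagnostic sketch: print feature count and field type/width per layer.
# String fields with width 0 are unbounded in some drivers, which is the
# kind of difference I suspect between GeoJSON and shapefile.
from osgeo import ogr

def describe(path):
    ds = ogr.Open(path)
    layer = ds.GetLayer(0)
    defn = layer.GetLayerDefn()
    print(path, "features:", layer.GetFeatureCount())
    for i in range(defn.GetFieldCount()):
        fd = defn.GetFieldDefn(i)
        print("  %-24s type=%-10s width=%d"
              % (fd.GetName(), fd.GetTypeName(), fd.GetWidth()))

describe("buildings.geojson")  # placeholder path
describe("buildings.shp")      # placeholder path
```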

Possible Solution

Because the problem lies with the data format and how QGIS handles it, we can use the following alternatives as short-term solutions:

  1. Do not upload big vector layers in GeoJSON format. A shapefile will do to optimize the size. (I know, I also hate it, but the memory consumption doesn't lie.)
  2. Even if we use shapefiles as input layers, the analysis output will be in GeoJSON format... which is a problem for now. I don't have any workaround other than limiting memory consumption through the Apache config (see the sketch after this list). The plan is to limit it to 1 worker/thread per container and disable KeepAlive (so the thread dies and is created anew, with clean memory). The scale settings can be configured in Rancher depending on how many containers the host can handle. This will make rendering a little slower, but at least (I hope) it won't crash, and the site will self-heal (if a container crashes, a new one will be created).
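
For reference, this is roughly the kind of Apache tuning I mean in point 2; the values are illustrative only and assume Apache 2.4 with mpm_prefork inside the QGIS Server container, not the final config:

```apache
# Illustrative only: one worker per container, no keep-alive, and the
# worker is recycled after every connection so leaked memory is reclaimed.
KeepAlive Off

<IfModule mpm_prefork_module>
    StartServers            1
    MinSpareServers         1
    MaxSpareServers         1
    MaxRequestWorkers       1
    MaxConnectionsPerChild  1
</IfModule>
```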

Long term solutions:

  1. Use GeoPackage, of course, and move on (a conversion sketch follows below).
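
A minimal conversion sketch using the GDAL Python bindings (assuming GDAL >= 2.1 with Python bindings available; file names are placeholders), covering both the short-term shapefile workaround and the long-term GeoPackage target:

```python
# Convert a big GeoJSON layer to lighter formats. Note that the shapefile
# driver truncates field names to 10 characters and bounds string widths,
# which is part of why it is only a short-term workaround.
from osgeo import gdal

gdal.UseExceptions()

# GeoJSON -> shapefile (short-term workaround)
gdal.VectorTranslate("buildings.shp", "buildings.geojson",
                     format="ESRI Shapefile")

# GeoJSON -> GeoPackage (long-term target format)
gdal.VectorTranslate("buildings.gpkg", "buildings.geojson", format="GPKG")
```

The same can be done from the shell with the ogr2ogr CLI, e.g. `ogr2ogr -f GPKG buildings.gpkg buildings.geojson`.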

Short term solution no. 2 seems practical, but we need to build/test a new QGIS-Server backend container optimized for Rancher like this.
