
Memory leak on compressed predict requests with oatpp #1316

Closed
1 of 6 tasks
YaYaB opened this issue Jul 4, 2021 · 7 comments · Fixed by #1468

@YaYaB
Contributor

YaYaB commented Jul 4, 2021

Configuration

  • Version of DeepDetect:
    • Locally compiled on:
      • Ubuntu 18.04 LTS
      • Other:
    • Docker CPU
    • Docker GPU
    • Amazon AMI
  • Commit (shown by the server when starting):
    23bd913ac180b56eddbf90c71d1f2e8bc2310c54

Your question / the problem you're facing:

When using the latest versions of DeepDetect (0.18.0 and 0.17.0 at least) I have noticed a memory leak (similar to #1260). I thought it had been fixed, but with the following test it does not seem to be.
FYI, tests are made on a 1080 Ti GPU.

Error message (if any) / steps to reproduce the problem:

First I run a container using the following image

CALL

docker run --name dd-test --gpus device=0 -p 8080:8080 jolibrain/deepdetect_gpu_tensorrt:v0.18.0

LOG

=====================
== NVIDIA TensorRT ==
=====================

NVIDIA Release 21.04 (build 22393618)

NVIDIA TensorRT 7.2.3 (c) 2016-2021, NVIDIA CORPORATION.  All rights reserved.
Container image (c) 2021, NVIDIA CORPORATION.  All rights reserved.

https://developer.nvidia.com/tensorrt

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

To install Python sample dependencies, run /opt/tensorrt/python/python_setup.sh

To install the open-source samples corresponding to this TensorRT release version run /opt/tensorrt/install_opensource.sh.
To build the open source parsers, plugins, and samples for current top-of-tree on master or a different branch, run /opt/tensorrt/install_opensource.sh -b <branch>
See https://github.com/NVIDIA/TensorRT for more information.

DeepDetect v0.18.0-dirty (dev)
GIT REF: heads/v0.18.0:23bd913ac180b56eddbf90c71d1f2e8bc2310c54
COMPILE_FLAGS: USE_CAFFE2=OFF USE_TF=OFF USE_NCNN=OFF USE_TORCH=OFF USE_HDF5=ON USE_CAFFE=OFF USE_TENSORRT=ON USE_TENSORRT_OSS=OFF USE_DLIB=OFF USE_CUDA_CV=OFF USE_SIMSEARCH=OFF USE_ANNOY=OFF USE_FAISS=ON USE_COMMAND_LINE=ON USE_JSON_API=ON USE_HTTP_SERVER=OFF
DEPS_VERSION: OPENCV_VERSION=4.2.0 CUDA_VERSION=11.3 CUDNN_VERSION= TENSORRT_VERSION=21.04
[2021-07-04 21:47:20.374] [api] [info] DeepDetect HTTP server listening on 0.0.0.0:8080

Then I create a service using an nsfw model
CALL

curl -X PUT http://localhost:8080/services/nsfw -d '{
   "description": "nsfw classification service",
   "model": {
    "repository": "/tmp/models/nsfw",
    "create_repository": true,
    "init":"https://deepdetect.com/models/init/desktop/images/classification/nsfw.tar.gz"
   },
   "mllib": "tensorrt",
   "type": "supervised",
   "parameters": {
    "input": {
     "connector": "image"
    }
   }
  }
  '

LOG

DEPS_VERSION: OPENCV_VERSION=4.2.0 CUDA_VERSION=11.3 CUDNN_VERSION= TENSORRT_VERSION=21.04
[2021-07-04 21:47:20.374] [api] [info] DeepDetect HTTP server listening on 0.0.0.0:8080
[2021-07-04 21:48:49.115] [api] [info] Downloading init model https://deepdetect.com/models/init/desktop/images/classification/nsfw.tar.gz
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::BatchTilePlugin_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::BatchedNMS_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::CoordConvAC version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::CropAndResize version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::CropAndResizeDynamic version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::DetectionLayer_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::FlattenConcat_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::GenerateDetection_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::GridAnchor_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::GridAnchorRect_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::InstanceNormalization_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::LReLU_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::MultilevelCropAndResize_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::MultilevelProposeROI_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::NMS_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::NMSDynamic_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::Normalize_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::PriorBox_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::ProposalLayer_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::Proposal version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::ProposalDynamic version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::Region_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::Reorg_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::ResizeNearest_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::RPROI_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::SpecialSlice_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::Split version 1
[2021-07-04 21:49:00.571] [nsfw] [info] trying to determine the input size...
[2021-07-04 21:49:00.585] [nsfw] [info] found 224x224 as input size
[2021-07-04 21:49:00.585] [api] [info] HTTP/1.1 "PUT /services/nsfw" <n/a> 201 11471ms

Then I launch many predictions with a fixed batch size using the script dd_test.py pasted below
CALL

import json
import sys
import random

import requests


# Get random data
def get_random_images(number_images=1000, height=600, width=600):
    images = ["https://picsum.photos/id/{}/{}/{}".format(x, height, width) for x in range(number_images)]

    return images

LISTEN_URL = "http://localhost"
LISTEN_PORT = "8080"

NUMBER_IMAGES = 1000  # Number of images to use

clf_post = {
      "service":"NAME",
      "parameters":{
        "output":{
          "best": 3
        },
        "mllib": {
            "gpu": True
        }
      },
      "data": []
    }


services = {'nsfw': {'bbox': False, 'size': 224}}

url_images = get_random_images(NUMBER_IMAGES)
print(services)

# Launch predictions
nb_run = 10
for j in range(nb_run):
    for i in range(0, NUMBER_IMAGES, 6):
        data = url_images[i:i+6]
        for elem, val in services.items():
            clf_post["data"] = data
            clf_post["service"] = elem
            tmp = requests.post("{}:{}/predict".format(LISTEN_URL, LISTEN_PORT), data=json.dumps(clf_post))

LOG

....
[2021-07-04 21:50:05.144] [nsfw] [info] Layer(Pooling): pool, Tactic: -1, eltwise_stage3_block2[Float(1024,7,7)] -> pool[Float(1024,1,1)]
[2021-07-04 21:50:05.144] [nsfw] [info] Layer(CublasConvolution): fc_nsfw, Tactic: 0, pool[Float(1024,1,1)] -> fc_nsfw[Float(2,1,1)]
[2021-07-04 21:50:05.144] [nsfw] [info] Layer(SoftMax): prob, Tactic: 1001, fc_nsfw[Float(2,1,1)] -> prob[Float(2,1,1)]
[2021-07-04 21:50:05.285] [nsfw] [info] Allocated persistent device memory of size 31235584
[2021-07-04 21:50:05.286] [nsfw] [info] Allocated activation device memory of size 272154624
[2021-07-04 21:50:05.286] [nsfw] [info] Assigning persistent memory blocks for various profiles
[2021-07-04 21:50:05.286] [nsfw] [info] detected output dimensions: [2, 1 1 0]
[2021-07-04 21:50:05.534] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 8386ms
[2021-07-04 21:50:05.716] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 177ms
[2021-07-04 21:50:05.895] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 176ms
[2021-07-04 21:50:06.079] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 181ms
[2021-07-04 21:50:06.302] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 218ms
[2021-07-04 21:50:06.505] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 198ms
[2021-07-04 21:50:06.714] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 205ms
[2021-07-04 21:50:06.894] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 176ms
[2021-07-04 21:50:07.086] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 189ms
[2021-07-04 21:50:07.273] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 183ms

Now if you check the evolution of the RAM used, we observe an increase, from 1644 MB at the beginning to 2095 MB after 5 minutes of predictions.
[Screenshots: first_predictions, after_5minutes_predictions]
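
For reference, here is a minimal sketch of one way to track these memory figures while dd_test.py runs, by polling docker stats for the container. It assumes the container is named dd-test as in the docker run call above; watch_memory is just a hypothetical helper name, not part of the original report.

import subprocess
import time

# Poll `docker stats` for the dd-test container and print its memory usage
# at a fixed interval, so that growth during the prediction loop is visible.
def watch_memory(container="dd-test", interval=10, duration=300):
    for _ in range(duration // interval):
        out = subprocess.run(
            ["docker", "stats", "--no-stream", "--format", "{{.MemUsage}}", container],
            capture_output=True, text=True, check=True,
        )
        print(time.strftime("%H:%M:%S"), out.stdout.strip())
        time.sleep(interval)

if __name__ == "__main__":
    watch_memory()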

@rguilmont

rguilmont commented Jul 5, 2021

More info on that: we managed to reproduce this issue with the above Python script, but not with curl.

So after testing further with @YaYaB, we had a strong intuition that it had something to do with the HTTP serving.
After analysing the HTTP headers, we found that Python requests asks by default for a gzip-encoded answer (Accept-Encoding: gzip, deflate) while curl doesn't.
So we manually set this header in curl, and finally reproduced the issue with curl too.

We also tried sending gzip-compressed queries while asking for uncompressed responses, and no memory leak was noticed. So it really looks like something related to gzip compression of the responses.
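
For illustration, here is a minimal sketch of the comparison on the Python side (the payload below is only a placeholder; any predict call against the nsfw service on localhost:8080 behaves the same):

import json
import requests

PREDICT_URL = "http://localhost:8080/predict"
payload = {
    "service": "nsfw",
    "parameters": {"output": {"best": 3}},
    "data": ["https://picsum.photos/id/1/600/600"],
}

# Default behaviour of `requests`: Accept-Encoding: gzip, deflate, so the
# server compresses the response; this is the case where memory keeps growing.
r_gzip = requests.post(PREDICT_URL, data=json.dumps(payload),
                       headers={"Accept-Encoding": "gzip, deflate"})

# Forcing an uncompressed response skips the compression path; with this
# header we did not observe any memory growth.
r_plain = requests.post(PREDICT_URL, data=json.dumps(payload),
                        headers={"Accept-Encoding": "identity"})

The same Accept-Encoding header can be added to curl with -H, which is how we finally reproduced the issue with curl as well.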

@YaYaB changed the title from "Memory leak TRT predict requests" to "Memory leak predict requests" on Jul 5, 2021
@YaYaB
Contributor Author

YaYaB commented Jul 5, 2021

Actually it is not even related to TensorRT; it also happens with classical Caffe predictions, with or without GPU.

@beniz changed the title from "Memory leak predict requests" to "Memory leak on compressed predict requests with oatpp" on Jul 6, 2021
@beniz
Collaborator

beniz commented Jul 6, 2021

@rguilmont @YaYaB gzip/deflate encoding is handled by https://github.com/oatpp/oatpp-zlib from within https://github.com/oatpp/oatpp. The components are simply added here: https://github.com/jolibrain/deepdetect/blob/master/src/http/app_component.hpp#L114

Running valgrind on dede with gzip queries only shows the possible leak below. This looks like an init from libz directly, from the oatpp send function.

@lganzzzo Hi, the ::send function seems to leak from deflateInit, have you seen this before, or are we doing something wrong? Thanks.

Libz init memory reported by valgrind:

==3020638== 536,192 (11,904 direct, 524,288 indirect) bytes in 2 blocks are definitely lost in loss record 4,799 of 4,801
==3020638==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==3020638==    by 0x5AA3418: deflateInit2_ (in /lib/x86_64-linux-gnu/libz.so.1.2.11)
==3020638==    by 0x5AA3651: deflateInit_ (in /lib/x86_64-linux-gnu/libz.so.1.2.11)
==3020638==    by 0x71C063: oatpp::zlib::DeflateEncoder::DeflateEncoder(long, bool, int) (in /home/beniz/projects/deepdetect/dev/deepdetect/build/main/dede)
==3020638==    by 0x71B2E8: oatpp::zlib::DeflateEncoderProvider::getProcessor() (in /home/beniz/projects/deepdetect/dev/deepdetect/build/main/dede)
==3020638==    by 0x6EA02D: oatpp::web::protocol::http::outgoing::Response::send(oatpp::data::stream::OutputStream*, oatpp::data::stream::BufferOutputStream*, oatpp::web::protocol::http::encoding::EncoderProvider*) (in /home/beniz/projects/deepdetect/dev/deepdetect/build/main/dede)
==3020638==    by 0x6F6DB6: oatpp::web::server::HttpProcessor::processNextRequest(oatpp::web::server::HttpProcessor::ProcessingResources&) (in /home/beniz/projects/deepdetect/dev/deepdetect/build/main/dede)
==3020638==    by 0x6FB28F: oatpp::web::server::HttpProcessor::Task::run() (in /home/beniz/projects/deepdetect/dev/deepdetect/build/main/dede)
==3020638==    by 0x945BDE3: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==3020638==    by 0x936B608: start_thread (pthread_create.c:477)
==3020638==    by 0x9837292: clone (clone.S:95)

@lganzzzo

lganzzzo commented Jul 6, 2021

Hey @beniz,

Your code looks good. Most probably it's on the oatpp side.
I'll take a closer look.

@beniz
Collaborator

beniz commented Jul 18, 2021

Hi @lganzzzo, how are things? Do you have any fresh lead on this by any chance? I've seen issues with libz a long time ago; this could still be outside oatpp.

@lganzzzo

Hey @beniz,

Yes, at this point it looks like a libz issue.
I'm filing an issue in oatpp to investigate possible fixes.

It might take a while.

@rguilmont

Thanks a lot guys.

FYI, we've mitigated this gzip issue by setting up an Envoy proxy in front of DeepDetect, taking care of compression and decompression of the requests.
