
Memory leak on compressed predict requests with oatpp #1316

Closed
1 of 6 tasks
YaYaB opened this issue Jul 4, 2021 · 7 comments · Fixed by #1468

@YaYaB
Contributor

YaYaB commented Jul 4, 2021

Configuration

  • Version of DeepDetect:
    • Locally compiled on:
      • Ubuntu 18.04 LTS
      • Other:
    • Docker CPU
    • Docker GPU
    • Amazon AMI
  • Commit (shown by the server when starting):
    23bd913ac180b56eddbf90c71d1f2e8bc2310c54

Your question / the problem you're facing:

When using the latest versions of DeepDetect (0.18.0 and 0.17.0 at least) I have noticed a memory leak (similar to #1260). I thought it had been fixed, but with the following test it does not seem to be.
FYI, tests are made on a 1080 Ti GPU.

Error message (if any) / steps to reproduce the problem:

First I run a container using the following image

CALL

docker run --name dd-test --gpus device=0 -p 8080:8080 jolibrain/deepdetect_gpu_tensorrt:v0.18.0

LOG

=====================
== NVIDIA TensorRT ==
=====================

NVIDIA Release 21.04 (build 22393618)

NVIDIA TensorRT 7.2.3 (c) 2016-2021, NVIDIA CORPORATION.  All rights reserved.
Container image (c) 2021, NVIDIA CORPORATION.  All rights reserved.

https://developer.nvidia.com/tensorrt

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

To install Python sample dependencies, run /opt/tensorrt/python/python_setup.sh

To install the open-source samples corresponding to this TensorRT release version run /opt/tensorrt/install_opensource.sh.
To build the open source parsers, plugins, and samples for current top-of-tree on master or a different branch, run /opt/tensorrt/install_opensource.sh -b <branch>
See https://github.com/NVIDIA/TensorRT for more information.

DeepDetect v0.18.0-dirty (dev)
GIT REF: heads/v0.18.0:23bd913ac180b56eddbf90c71d1f2e8bc2310c54
COMPILE_FLAGS: USE_CAFFE2=OFF USE_TF=OFF USE_NCNN=OFF USE_TORCH=OFF USE_HDF5=ON USE_CAFFE=OFF USE_TENSORRT=ON USE_TENSORRT_OSS=OFF USE_DLIB=OFF USE_CUDA_CV=OFF USE_SIMSEARCH=OFF USE_ANNOY=OFF USE_FAISS=ON USE_COMMAND_LINE=ON USE_JSON_API=ON USE_HTTP_SERVER=OFF
DEPS_VERSION: OPENCV_VERSION=4.2.0 CUDA_VERSION=11.3 CUDNN_VERSION= TENSORRT_VERSION=21.04
[2021-07-04 21:47:20.374] [api] [info] DeepDetect HTTP server listening on 0.0.0.0:8080

Then I create a service using an nsfw model
CALL

curl -X PUT http://localhost:8080/services/nsfw -d '{
   "description": "nsfw classification service",
   "model": {
    "repository": "/tmp/models/nsfw",
    "create_repository": true,
    "init":"https://deepdetect.com/models/init/desktop/images/classification/nsfw.tar.gz"
   },
   "mllib": "tensorrt",
   "type": "supervised",
   "parameters": {
    "input": {
     "connector": "image"
    }
   }
  }
  '

LOG

DEPS_VERSION: OPENCV_VERSION=4.2.0 CUDA_VERSION=11.3 CUDNN_VERSION= TENSORRT_VERSION=21.04
[2021-07-04 21:47:20.374] [api] [info] DeepDetect HTTP server listening on 0.0.0.0:8080
[2021-07-04 21:48:49.115] [api] [info] Downloading init model https://deepdetect.com/models/init/desktop/images/classification/nsfw.tar.gz
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::BatchTilePlugin_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::BatchedNMS_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::CoordConvAC version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::CropAndResize version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::CropAndResizeDynamic version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::DetectionLayer_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::FlattenConcat_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::GenerateDetection_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::GridAnchor_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::GridAnchorRect_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::InstanceNormalization_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::LReLU_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::MultilevelCropAndResize_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::MultilevelProposeROI_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::NMS_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::NMSDynamic_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::Normalize_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::PriorBox_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::ProposalLayer_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::Proposal version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::ProposalDynamic version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::Region_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::Reorg_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::ResizeNearest_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::RPROI_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::SpecialSlice_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::Split version 1
[2021-07-04 21:49:00.571] [nsfw] [info] trying to determine the input size...
[2021-07-04 21:49:00.585] [nsfw] [info] found 224x224 as input size
[2021-07-04 21:49:00.585] [api] [info] HTTP/1.1 "PUT /services/nsfw" <n/a> 201 11471ms

Then I launch many predictions with a fixed batch size using the script dd_test.py pasted below
CALL

import json
import sys
import random

import requests


# Get random data
def get_random_images(number_images=1000, height=600, width=600):
    images = ["https://picsum.photos/id/{}/{}/{}".format(x, height, width) for x in range(number_images)]

    return images

LISTEN_URL = "http://localhost"
LISTEN_PORT = "8080"

NUMBER_IMAGES = 1000  # Number of images to use

clf_post = {
      "service":"NAME",
      "parameters":{
        "output":{
          "best": 3
        },
        "mllib": {
            "gpu": True
        }
      },
      "data": []
    }


services = {'nsfw': {'bbox': False, 'size': 224}}

url_images = get_random_images(NUMBER_IMAGES)
print(services)

# Launch predictions
nb_run = 10
for j in range(nb_run):
    for i in range(0, NUMBER_IMAGES, 6):
        data = url_images[i:i+6]
        for elem, val in services.items():
            clf_post["data"] = data
            clf_post["service"] = elem
            tmp = requests.post("{}:{}/predict".format(LISTEN_URL, LISTEN_PORT), data=json.dumps(clf_post))

LOG

....
[2021-07-04 21:50:05.144] [nsfw] [info] Layer(Pooling): pool, Tactic: -1, eltwise_stage3_block2[Float(1024,7,7)] -> pool[Float(1024,1,1)]
[2021-07-04 21:50:05.144] [nsfw] [info] Layer(CublasConvolution): fc_nsfw, Tactic: 0, pool[Float(1024,1,1)] -> fc_nsfw[Float(2,1,1)]
[2021-07-04 21:50:05.144] [nsfw] [info] Layer(SoftMax): prob, Tactic: 1001, fc_nsfw[Float(2,1,1)] -> prob[Float(2,1,1)]
[2021-07-04 21:50:05.285] [nsfw] [info] Allocated persistent device memory of size 31235584
[2021-07-04 21:50:05.286] [nsfw] [info] Allocated activation device memory of size 272154624
[2021-07-04 21:50:05.286] [nsfw] [info] Assigning persistent memory blocks for various profiles
[2021-07-04 21:50:05.286] [nsfw] [info] detected output dimensions: [2, 1 1 0]
[2021-07-04 21:50:05.534] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 8386ms
[2021-07-04 21:50:05.716] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 177ms
[2021-07-04 21:50:05.895] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 176ms
[2021-07-04 21:50:06.079] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 181ms
[2021-07-04 21:50:06.302] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 218ms
[2021-07-04 21:50:06.505] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 198ms
[2021-07-04 21:50:06.714] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 205ms
[2021-07-04 21:50:06.894] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 176ms
[2021-07-04 21:50:07.086] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 189ms
[2021-07-04 21:50:07.273] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 183ms

Now if you check the evolution of the RAM used, we observe an increase, from 1644 MB at the beginning to 2095 MB after 5 minutes of predictions.
[Screenshots: first_predictions, after_5minutes_predictions]
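
For reference, here is a minimal sketch of one way to track these memory figures while dd_test.py runs, by polling docker stats for the container. It assumes the container is named dd-test as in the docker run call above; watch_memory is just a hypothetical helper name, not part of the original report.

import subprocess
import time

# Poll `docker stats` for the dd-test container and print its memory usage
# at a fixed interval, so that growth during the prediction loop is visible.
def watch_memory(container="dd-test", interval=10, duration=300):
    for _ in range(duration // interval):
        out = subprocess.run(
            ["docker", "stats", "--no-stream", "--format", "{{.MemUsage}}", container],
            capture_output=True, text=True, check=True,
        )
        print(time.strftime("%H:%M:%S"), out.stdout.strip())
        time.sleep(interval)

if __name__ == "__main__":
    watch_memory()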

@rguilmont

rguilmont commented Jul 5, 2021

More info on that: we managed to reproduce this issue with the above Python script, but not with curl.

So after testing further with @YaYaB, we had a strong intuition that it had something to do with the HTTP serving.
After analysing the HTTP headers, we found that Python requests asks by default for a gzip-encoded answer (Accept-Encoding: gzip, deflate) while curl doesn't.
So we manually set this header in curl, and finally reproduced the issue with curl too.

We also tried sending gzip-compressed queries while asking for uncompressed responses, and no memory leak was noticed. So it really looks like something related to gzip compression of the responses.
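
For illustration, here is a minimal sketch of the comparison on the Python side (the payload below is only a placeholder; any predict call against the nsfw service on localhost:8080 behaves the same):

import json
import requests

PREDICT_URL = "http://localhost:8080/predict"
payload = {
    "service": "nsfw",
    "parameters": {"output": {"best": 3}},
    "data": ["https://picsum.photos/id/1/600/600"],
}

# Default behaviour of `requests`: Accept-Encoding: gzip, deflate, so the
# server compresses the response; this is the case where memory keeps growing.
r_gzip = requests.post(PREDICT_URL, data=json.dumps(payload),
                       headers={"Accept-Encoding": "gzip, deflate"})

# Forcing an uncompressed response skips the compression path; with this
# header we did not observe any memory growth.
r_plain = requests.post(PREDICT_URL, data=json.dumps(payload),
                        headers={"Accept-Encoding": "identity"})

The same Accept-Encoding header can be added to curl with -H, which is how we finally reproduced the issue with curl as well.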

@YaYaB changed the title from "Memory leak TRT predict requests" to "Memory leak predict requests" on Jul 5, 2021
@YaYaB
Contributor Author

YaYaB commented Jul 5, 2021

Actually it is not even related to TensorRT; it also happens with classical Caffe predictions, with or without GPU.

@beniz changed the title from "Memory leak predict requests" to "Memory leak on compressed predict requests with oatpp" on Jul 6, 2021
@beniz
Collaborator

beniz commented Jul 6, 2021

@rguilmont @YaYaB gzip/deflate encoding is handled by https://github.com/oatpp/oatpp-zlib from within https://github.com/oatpp/oatpp. The components are simply added here: https://github.com/jolibrain/deepdetect/blob/master/src/http/app_component.hpp#L114

Running valgrind on dede with gzip queries only shows the possible leak below. This looks like an init from libz directly, from the oatpp send function.

@lganzzzo Hi, the ::send function seems to leak from deflateInit, have you seen this before, or are we doing something wrong? Thanks.

Libz init memory reported by valgrind:

==3020638== 536,192 (11,904 direct, 524,288 indirect) bytes in 2 blocks are definitely lost in loss record 4,799 of 4,801
==3020638==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==3020638==    by 0x5AA3418: deflateInit2_ (in /lib/x86_64-linux-gnu/libz.so.1.2.11)
==3020638==    by 0x5AA3651: deflateInit_ (in /lib/x86_64-linux-gnu/libz.so.1.2.11)
==3020638==    by 0x71C063: oatpp::zlib::DeflateEncoder::DeflateEncoder(long, bool, int) (in /home/beniz/projects/deepdetect/dev/deepdetect/build/main/dede)
==3020638==    by 0x71B2E8: oatpp::zlib::DeflateEncoderProvider::getProcessor() (in /home/beniz/projects/deepdetect/dev/deepdetect/build/main/dede)
==3020638==    by 0x6EA02D: oatpp::web::protocol::http::outgoing::Response::send(oatpp::data::stream::OutputStream*, oatpp::data::stream::BufferOutputStream*, oatpp::web::protocol::http::encoding::EncoderProvider*) (in /home/beniz/projects/deepdetect/dev/deepdetect/build/main/dede)
==3020638==    by 0x6F6DB6: oatpp::web::server::HttpProcessor::processNextRequest(oatpp::web::server::HttpProcessor::ProcessingResources&) (in /home/beniz/projects/deepdetect/dev/deepdetect/build/main/dede)
==3020638==    by 0x6FB28F: oatpp::web::server::HttpProcessor::Task::run() (in /home/beniz/projects/deepdetect/dev/deepdetect/build/main/dede)
==3020638==    by 0x945BDE3: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==3020638==    by 0x936B608: start_thread (pthread_create.c:477)
==3020638==    by 0x9837292: clone (clone.S:95)

@lganzzzo

lganzzzo commented Jul 6, 2021

Hey @beniz,

Your code looks good. Most probably it's on the oatpp side.
I'll take a closer look.

@beniz
Collaborator

beniz commented Jul 18, 2021

Hi @lganzzzo, how are things? Do you have any fresh lead on this by any chance? I've seen issues with libz a long time ago; this could still be outside oatpp.

@lganzzzo

Hey @beniz,

Yes, at this point it looks like a libz issue.
I'm filing an issue in oatpp to investigate possible fixes.

It might take a while.

@rguilmont

Thanks a lot guys.

FYI, we've mitigated this gzip issue by setting up an Envoy proxy in front of DeepDetect, taking care of compression and decompression of the requests.
