Memory leak when reloading model #11863

leonballonigomes · 2022-11-23T20:57:23Z

Hello, we have three trained models served with FastAPI, and we are currently suffering from a memory leak caused by spaCy's vocab growing with every request (more about this in #10015). The solution proposed there was to reload the model once a certain criterion is met, but every time we reload, the models' memory usage increases.

The memory profiler used -> memory-profiler

On every reload we delete all content from our models dict (which serves as a model cache) and collect it with gc. This frees around 30% of the total memory usage.
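To check whether clearing the cache really releases the cached objects, a weak reference can be taken before the cache is emptied. The following is a minimal sketch, not the code from this thread: `FakeModel` and `clear_cache_and_check` are hypothetical names, and `FakeModel` only stands in for a loaded spaCy pipeline.

```python
import gc
import weakref


class FakeModel:
    """Hypothetical stand-in for a loaded spaCy pipeline."""

    def __init__(self, name):
        self.name = name


def clear_cache_and_check(cache):
    """Empty the cache, run the collector, and report which entries were freed."""
    # Weak references do not keep the cached objects alive.
    refs = {name: weakref.ref(cache[name]) for name in cache}
    cache.clear()
    gc.collect()
    # A dead weak reference returns None, meaning the object is gone.
    return {name: ref() is None for name, ref in refs.items()}


cache = {"NER": FakeModel("NER"), "TAGS": FakeModel("TAGS")}
still_held = FakeModel("RELEVANCE")
cache["RELEVANCE"] = still_held  # also referenced from outside the cache

freed = clear_cache_and_check(cache)
print(freed)
```

An entry that reports `False` is still referenced from somewhere else, so emptying the dict alone cannot free it. This only says whether Python objects were released, not whether the process returned the memory to the operating system.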

After reloading, we noticed that each load increases the amount of dict-type data inside spacy.load.

Model graph -> built with objgraph

We added a temporary API to test the reloading and find the issue:

model_manager is an instance of a class we built that manages all loaded spaCy models.

The manager builds all the models and stores them in a dict attribute.

When reloading, the manager checks whether there is anything in the dict and erases it.

API code

# pylint: disable=C0114, C0116, W0703, W0707
import logging
import uvicorn
from fastapi.responses import JSONResponse
from fastapi import FastAPI, HTTPException, Request, status
from model_manager import ModelManager
from app.utils.memory_checker import memory_check
from gc import collect
import objgraph

app = FastAPI()
logger = logging.getLogger("Financial NLP")

model_manager = ModelManager(logger)


@app.middleware('http')
async def exception_handling(request: Request, call_next):
    try:
        return await call_next(request)
    except KeyError:
        return JSONResponse(
            status_code=status.HTTP_400_BAD_REQUEST,
            content={"messages": "Received data is not a valid JSON"}
        )
    except Exception:
        return JSONResponse(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            content={'message': 'Unexpected error occurred...'}
        )


@app.post("/reload")
async def nlp(request: Request):

    print(model_manager.models['NER'])
    model_manager.models['NER'].nlp_model = None
    model_manager.models['TAGS'].nlp_model = None
    model_manager.models['RELEVANCE'].nlp_model = None
    print('after cleaning: ', memory_check())
    print(objgraph.show_growth())
    collect()
    model_manager.build_models()
    print(model_manager.models['NER'])
    print('after reloading: ', memory_check())
    print(objgraph.show_growth())



if __name__ == "__main__":
    model_manager.build_models()
    uvicorn.run(app, host="0.0.0.0", port=8000)

Model build and manager

# pylint: disable=C0116, C0115, C0114, C0103, R0903

import os
import logging
import shutil
import json
from google.oauth2 import service_account
from google.cloud import storage
import utils
from model import Model
from memory_profiler import profile
from gc import collect
from app.utils.memory_checker import memory_check
from spacy import load
import objgraph

class ModelManager:

    BUCKET_NAME = "hml"

    models = {}

    @profile
    def __init__(self, logger: logging.Logger):
        self.log = logger

    @profile
    def __download_spacy_model(self, model: Model):
        # Body omitted in the original post.
        pass

    # @profile
    def build_models(self):
        print('0', memory_check())
        self.__check_models()
        print('second cleaning', objgraph.show_growth())
        models = utils.get_data_model()
        print('get data ', objgraph.show_growth())
        print('1', memory_check())
        for model in models:
            print('models ', objgraph.show_growth())
            self.models[model.model_name] = model
            print('get spacy', objgraph.show_growth())
            self.__get_spacy_model(self.models[model.model_name])
        self.log.info(self.models)
        print('2', memory_check())

    # @profile
    def __get_spacy_model(self, model):
        if not os.path.exists(model.api_path):
            os.makedirs(model.api_path)
            self.log.info("required model download")
            self.__download_spacy_model(model)
            self.log.info("downloading")
            self.log.info("download finished")

        print(model)
        print('before building ', objgraph.show_growth())

        model.nlp_model = load(model.api_path)
        print('after building ', objgraph.show_growth())
        print('3', memory_check())
        collect()

    # @profile
    def __check_models(self) -> None:
        '''Checks if the model is already built or if this is a fresh restart'''
        if self.models:
            # self.models.clear()
            self.models = {}
            self.models['NER'] = None
            self.models['TAGS'] = None
            self.models['RELEVANCE'] = None
        print('Cleaning', memory_check())

Model


from dataclasses import dataclass
import spacy
@dataclass
class Model:

    api_path: str
    model_name: str
    nlp_model: spacy.language.Language
    storage_path: str

Answered by leonballoni

Nov 29, 2022

adrianeboyd · 2022-11-24T09:17:38Z

It's really hard for us to follow what might be going on here. Could you please provide code and text output as markdown code blocks rather than screenshots? See: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks

In general we're happy to try and help debug cases like this because we would definitely want to address memory leaks, but it's easiest for us to get started if there's a single script we can copy and run to reproduce the problem, like a minimal FastAPI example that does the model reloading.

0 replies

leonballonigomes · 2022-11-24T14:36:38Z

Author

Good morning, sure here it goes.
Requirements:
uvicorn==0.17.6
pydantic==1.8.2
fastapi==0.63.0
spacy==3.4.1

Model_manager.py

# pylint: disable=C0116, C0115, C0114, C0103, R0903

import os
import logging

from model import Model
from memory_profiler import profile
from spacy import load

class ModelManager:

    BUCKET_NAME = "nlp_hml"

    models = {}

    @profile
    def __init__(self, logger: logging.Logger):
        self.log = logger

    @profile
    def __download_spacy_model(self, model: Model):

        pass

    # @profile
    def build_models(self):
        self.__check_models()
        # This simulates loading the 3 models using the spaCy package pt_core_news_lg.
        models = [
            Model(api_path='./models/tags/tags_v2', model_name='TAGS',
                  nlp_model=None, storage_path="pt_core_news_lg"),
            Model(api_path='./models/relevance_filter/relevance_v2', model_name='RELEVANCE_FILTER',
                  nlp_model=None, storage_path="pt_core_news_lg"),
            Model(api_path='./models/ner/ner_v3', model_name='NER',
                  nlp_model=None, storage_path="pt_core_news_lg"),
        ]
        for model in models:
            self.models[model.model_name] = model
            self.__get_spacy_model(self.models[model.model_name])
        self.log.info(self.models)

    @profile
    def __get_spacy_model(self, model):
        if not os.path.exists(model.api_path):
            os.makedirs(model.api_path)
            self.log.info("required model download")
            self.__download_spacy_model(model)
            self.log.info("downloading")
            self.log.info("download finished")
        model.nlp_model = load(model.api_path)

    # @profile
    def __check_models(self) -> None:
        '''Checks if the model is already built or if this is a fresh restart'''
        if self.models:
            self.models = {}
            self.models['NER'] = None
            self.models['TAGS'] = None
            self.models['RELEVANCE_FILTER'] = None

Model.py

# pylint: disable=C0115, C0114, C0103

from dataclasses import dataclass
import spacy
@dataclass
class Model:

    api_path: str
    model_name: str
    nlp_model: spacy.language.Language
    storage_path: str

api.py

# pylint: disable=C0114, C0116, W0703, W0707
import logging
import uvicorn
from fastapi.responses import JSONResponse
from fastapi import FastAPI, HTTPException, Request, status
from model_manager import ModelManager
from app.utils.memory_checker import memory_check
from gc import collect

app = FastAPI()
logger = logging.getLogger("Financial NLP")

model_manager = ModelManager(logger)

@app.middleware('http')
async def exception_handling(request: Request, call_next):
    try:
        return await call_next(request)
    except KeyError:
        return JSONResponse(
            status_code=status.HTTP_400_BAD_REQUEST,
            content={"messages": "Received data is not a valid JSON"}
        )
    except Exception:
        return JSONResponse(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            content={'message': 'Unexpected error occurred...'}
        )

@app.post("/reload")
async def nlp(request: Request):

    print(model_manager.models['NER'])
    model_manager.models['NER'].nlp_model = None
    model_manager.models['TAGS'].nlp_model = None
    model_manager.models['RELEVANCE_FILTER'].nlp_model = None
    print('after cleaning: ', memory_check())
    collect()
    model_manager.build_models()
    print(model_manager.models['NER'])
    print('after reloading: ', memory_check())


if __name__ == "__main__":
    model_manager.build_models()
    uvicorn.run(app, host="0.0.0.0", port=8000)

memory_checker.py

from psutil import Process
from os import getpid

def memory_check():
    # Resident set size (RSS) of the current process, in megabytes (10^6 bytes).
    rss_mb = int(Process(getpid()).memory_info().rss / 1000000)
    return rss_mb

2 replies

adrianeboyd Nov 25, 2022

Thanks, the details are helpful!

I think inspecting Process(getpid()).memory_info().rss is misleading because the python process doesn't always give all of the freed memory back to the OS.

Try this script to see: #10015 (reply in thread)

A more detailed explanation in a related thread: #11086 (comment)
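The difference between memory held by Python objects and the RSS figure can be illustrated with the standard library's `tracemalloc`. This is only a sketch and not the script linked above; `traced_mb` and `measure_alloc_and_free` are hypothetical helper names, and the dict is a stand-in for any large allocation.

```python
import gc
import tracemalloc


def traced_mb():
    """Memory currently held by Python allocations, in megabytes."""
    current, _peak = tracemalloc.get_traced_memory()
    return current / 1_000_000


def measure_alloc_and_free(n_items=200_000):
    """Allocate a large dict, drop it, and report Python-level memory at each step."""
    tracemalloc.start()
    baseline = traced_mb()
    data = {i: str(i) for i in range(n_items)}
    held = traced_mb()
    del data
    gc.collect()
    released = traced_mb()
    tracemalloc.stop()
    return baseline, held, released


baseline, held, released = measure_alloc_and_free()
print(f"baseline={baseline:.1f} MB, held={held:.1f} MB, released={released:.1f} MB")
```

If the traced figure drops after the `del` while RSS, as reported by `memory_check()`, stays high, the objects were freed inside Python and the allocator simply has not handed the pages back to the OS.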

leonballonigomes Nov 25, 2022
Author

Thank you, we noticed the same. We thought about two approaches:

Instead of reloading the spaCy model, let the pod restart once it reaches a certain memory threshold. This would avoid API unavailability.
Build a config into the API to restart the workers after a certain time or threshold. However, we couldn't find anything that guarantees this will bring memory back to where it's supposed to be.

Are there any other options you could help us with please?

leonballoni · 2022-11-29T01:08:30Z

@adrianeboyd The following avoided any complications:

We changed our API web server to gunicorn, for two reasons: (1) it supports preload, and (2) we could set a max-requests limit to force each worker to restart. uvicorn wasn't well suited for this.
We had issues with preload, since the workers were not able to respond to our requests. This was a complication caused by thread safety, and it was resolved by forcing set_num_threads before loading spaCy.

With these two modifications we achieved the following:

The memory leak was handled by preloading, which preserves the app as loaded in the master process.
Newly added vocab only changed the memory of the worker processes, which is bounded by the max_requests restart.
Therefore, it is unnecessary to force spaCy to reload the model in the master process.

For those in similar need, here is a code sample for what we did:

gunicorn.conf.py -> basic config file

wsgi_app = "api:app"
bind = "0.0.0.0:8000"
worker_class = "uvicorn.workers.UvicornWorker"  # alternative: "sync"
workers = 3
threads = 5
max_requests = 100
max_requests_jitter = 10
graceful_timeout = 30
preload_app = True
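For readers wondering what `max_requests` and `max_requests_jitter` mean together: per the gunicorn documentation, each worker's restart point is randomized by `randint(0, max_requests_jitter)` added to `max_requests`, so the workers do not all restart at the same moment. The sketch below only reproduces that arithmetic; `worker_request_limit` is a hypothetical helper, not a gunicorn function.

```python
import random


def worker_request_limit(max_requests, max_requests_jitter, rng=random):
    """Number of requests a worker serves before it is restarted (0 = never)."""
    if max_requests <= 0:
        return 0
    return max_requests + rng.randint(0, max_requests_jitter)


# Values from the config above: each worker restarts somewhere in 100..110 requests.
limits = [worker_request_limit(100, 10, random.Random(seed)) for seed in range(3)]
print(limits)
```

With three workers, any memory a worker accumulates (for example from vocab growth) is therefore discarded after at most 110 requests handled by that worker.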

main.py -> runs the API (remember to set up your API routing!)

import subprocess
subprocess.run(["gunicorn"])

spacy.load sample

from spacy import load
from torch import set_num_threads
set_num_threads(1)  # THIS IS VITAL when running the API with preload, where workers are forked from the master process
model.nlp_model = load(model.api_path)
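The ordering in the sample above (limit threads first, then load, and load only once so forked workers inherit the result) can be sketched as a small helper. This is an illustration under assumptions, not the code used in this thread: `preload_models` and `_MODELS` are hypothetical names, and the loader and thread limiter are passed in so the sketch runs without spaCy or torch installed.

```python
from typing import Any, Callable, Dict

_MODELS: Dict[str, Any] = {}  # filled once, at import time of the preloaded app


def preload_models(
    paths: Dict[str, str],
    load: Callable[[str], Any],
    limit_threads: Callable[[int], None],
) -> Dict[str, Any]:
    """Limit threads first, then load each model at most once."""
    limit_threads(1)  # must happen before the first load
    for name, path in paths.items():
        if name not in _MODELS:
            _MODELS[name] = load(path)
    return _MODELS


# Stand-ins that record the call order instead of loading real models.
calls = []
models = preload_models(
    {"NER": "./models/ner/ner_v3"},
    load=lambda path: calls.append(("load", path)) or f"pipeline:{path}",
    limit_threads=lambda n: calls.append(("threads", n)),
)
print(calls)
```

In a real app the two callables would be `spacy.load` and `torch.set_num_threads`, called at module import so that `preload_app = True` runs them in the master process before the workers are forked.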

2 replies

adrianeboyd Nov 29, 2022

Thanks for reporting back with your solution!

lsmith77 Jan 3, 2024

We had issues with preload since workers were not able to return our requests.

What was the effect of this issue?

Also I assume the set_num_threads() is only necessary when using the transformer spacy models, correct?
