This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →


Why GPU Memory is not released after the pipeline is finished? #13114

Closed

Hansyvea opened this issue Nov 8, 2023 · 0 comments


Hansyvea commented Nov 8, 2023

Hi, I am using the trf model in a pipeline. It runs fine on the first batch but fails to load the second batch because the VRAM is not released.
I tried reducing the amount of data fed into the pipeline, but even after the pipeline finishes, the VRAM is still occupied.
How can I release the VRAM after one batch of work is done?

How to reproduce the behaviour

import numpy as np
import pandas as pd
import spacy
from tqdm import tqdm


def spacy_tokenise(df: pd.DataFrame, batch_size: int) -> pd.DataFrame:
    from thinc.api import set_gpu_allocator, require_gpu

    # manage gpu vram
    set_gpu_allocator("pytorch")
    # use GPU 0 (spacy.require_gpu() would also work)
    require_gpu(0)
    # Check if spaCy is using GPU
    print("spaCy is using GPU: ", spacy.prefer_gpu())
    # load model
    model = spacy.load("en_core_web_trf")
    docs = model.pipe(df.TEXT, batch_size=batch_size)
    res = []
    for doc in tqdm(docs, total=len(df.TEXT), desc="spaCy pipeline"):
        for sent in doc.sents:
            lst_token = [word.text for word in sent]
            lst_pos = [word.pos_ for word in sent]
            lst_lemma = [word.lemma_ for word in sent]
            lst_ner_token = [ent.text for ent in sent.ents]
            lst_ner_label = [ent.label_ for ent in sent.ents]
            if len(lst_ner_token) == 0:
                lst_ner_token = np.nan
                lst_ner_label = np.nan
            res.append(
                {
                    "token": lst_token,
                    "pos": lst_pos,
                    "lemma": lst_lemma,
                    "ner_token": lst_ner_token,
                    "ner_label": lst_ner_label,
                }
            )
    res = pd.DataFrame(res)
    return res
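One pattern sometimes tried between batches (a sketch only, assuming the PyTorch backend implied by `set_gpu_allocator("pytorch")`; `release_cached_vram` is a hypothetical helper, not a spaCy API) is to drop references to the pipeline objects and then ask PyTorch's caching allocator to return unused blocks to the driver:

```python
import gc


def release_cached_vram():
    """Run a garbage-collection pass, then ask PyTorch's CUDA caching
    allocator to release unused cached blocks back to the driver.

    Memory held by live objects (e.g. a loaded nlp pipeline or Doc
    objects still in scope) is NOT freed -- delete those references
    first, then call this helper.
    """
    gc.collect()  # drop cycles so tensor refcounts actually reach zero
    try:
        import torch
        if torch.cuda.is_available():
            # empty_cache() releases cached, unreferenced blocks;
            # it cannot free memory still referenced by live tensors.
            torch.cuda.empty_cache()
    except ImportError:
        pass  # torch not installed; nothing GPU-side to release
```

Usage after a batch would be along the lines of `del model, docs` followed by `release_cached_vram()`; whether this actually returns the VRAM in this setup is exactly what the question is about.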

Your Environment

  • Operating System: WSL2 Ubuntu 20
  • Python Version Used: 3.10
  • spaCy Version Used: 3
  • Environment Information: CUDA 12.3
@explosion explosion locked and limited conversation to collaborators Nov 8, 2023
@shadeMe shadeMe converted this issue into discussion #13117 Nov 8, 2023

