@Abhijit-2592 (Contributor) commented Dec 24, 2019

Fixes #193
This pull request addresses the following three problems:

  1. Issue #193 (Train the model on chinese OntoNotes 5.0 but eat up all my 64GB memory) reports that the training script hogs a huge amount of RAM. Looking into the code base, this is mainly because the entire .npy files are loaded into memory. I have therefore added an option to load them lazily, enabled by passing a --lazy flag to learn.py, so that RAM usage does not blow up (see the sketch after this list). Before this change I was unable to train on the dataset (approx. 8.5 GB on disk) generated with spacy's en_core_web_lg model on my laptop (16 GB RAM, 6 GB GPU): it required more than 50 GB of RAM. After this change the memory footprint is ~5 GB for the same dataset, and training runs on my laptop without hiccups.
  2. While creating the dataset, it would be beneficial if the user could pass the required spacy model; this option is added here.
  3. Pull request #230 (fix training error when training on GPU) attempts to fix training on GPU but introduces a new bug that throws an error when training on CPU. That is fixed here: training can now be done on either GPU or CPU as required.
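
For readers curious about the mechanics, here is a minimal sketch of the techniques behind these three fixes. The --lazy flag is the one added by this PR, but LazyNpyDataset, the --model argument, and the file name train.npy are illustrative stand-ins, not neuralcoref's actual API:

```python
import argparse

import numpy as np
import spacy
import torch
from torch.utils.data import Dataset


class LazyNpyDataset(Dataset):
    """Serve rows from a .npy file without loading it all into RAM."""

    def __init__(self, path, lazy=True):
        # mmap_mode="r" memory-maps the file, so pages are fetched from
        # disk on demand instead of materializing the whole array in RAM.
        self.data = np.load(path, mmap_mode="r" if lazy else None)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Copy only the requested row into memory.
        return torch.as_tensor(np.array(self.data[idx]))


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--lazy", action="store_true",
                        help="memory-map .npy files instead of loading them fully")
    parser.add_argument("--model", default="en_core_web_lg",
                        help="spacy model to use when building the dataset")
    args = parser.parse_args()

    nlp = spacy.load(args.model)                       # fix 2: user-chosen model
    dataset = LazyNpyDataset("train.npy", args.lazy)   # fix 1: lazy .npy loading
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # fix 3
```

The key call is np.load(path, mmap_mode="r"): it returns a numpy.memmap backed by the file on disk, so resident memory stays roughly at the size of the rows actually touched rather than the full array.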

@svlandeg (Collaborator) left a comment


Looks great!

@Abhijit-2592 (Contributor, Author) commented

@svlandeg done

@svlandeg merged commit 84f29f4 into huggingface:master on Dec 25, 2019
@SysDevHayes commented

It seems this did not work: I trained on a dataset of approximately 8.5 GB on a machine with 62 GB of RAM and an 8 GB GPU, and still got this error:
[screenshot of the error]

@svlandeg (Collaborator) commented

@EricLe-dev: apologies for the late follow-up, but did you pull the recent master branch of neuralcoref and install it from source? This fix isn't part of any release yet...

