@Abhijit-2592 (Contributor) commented Dec 24, 2019

Fixes #193
This pull request addresses the following three problems:

  1. Issue #193 (Train the model on chinese OntoNotes 5.0 but eat up all my 64GB memory) reports that the training script hogs a huge amount of RAM. Looking into the code base, this is mainly because the entire .npy files are loaded into memory. I have therefore added an option to load them lazily, enabled by passing a --lazy flag to learn.py, so that RAM usage does not blow up (see the sketch after this list). Before this change I was unable to train on the dataset (approx. 8.5 GB on disk) generated with spacy's en_core_web_lg model on my laptop (16 GB RAM, 6 GB GPU): it required more than 50 GB of RAM. After this change the memory footprint is ~5 GB for the same dataset, and training runs on my laptop without hiccups.
  2. While creating the dataset, it would be beneficial if the user could pass the required spacy model; this option is added here.
  3. Pull request #230 (fix training error when training on GPU) attempts to fix training on GPU but introduces a new bug that throws an error when training on CPU. That is fixed here: training can now be done on either GPU or CPU as required.
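
For readers curious about the mechanics, here is a minimal sketch of the techniques behind these three fixes. The --lazy flag is the one added by this PR, but LazyNpyDataset, the --model argument, and the file name train.npy are illustrative stand-ins, not neuralcoref's actual API:

```python
import argparse

import numpy as np
import spacy
import torch
from torch.utils.data import Dataset


class LazyNpyDataset(Dataset):
    """Serve rows from a .npy file without loading it all into RAM."""

    def __init__(self, path, lazy=True):
        # mmap_mode="r" memory-maps the file, so pages are fetched from
        # disk on demand instead of materializing the whole array in RAM.
        self.data = np.load(path, mmap_mode="r" if lazy else None)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Copy only the requested row into memory.
        return torch.as_tensor(np.array(self.data[idx]))


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--lazy", action="store_true",
                        help="memory-map .npy files instead of loading them fully")
    parser.add_argument("--model", default="en_core_web_lg",
                        help="spacy model to use when building the dataset")
    args = parser.parse_args()

    nlp = spacy.load(args.model)                       # fix 2: user-chosen model
    dataset = LazyNpyDataset("train.npy", args.lazy)   # fix 1: lazy .npy loading
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # fix 3
```

The key call is np.load(path, mmap_mode="r"): it returns a numpy.memmap backed by the file on disk, so resident memory stays roughly at the size of the rows actually touched rather than the full array.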

@svlandeg (Collaborator) left a comment


Looks great!

@Abhijit-2592 (Contributor, Author) commented

@svlandeg done

@svlandeg merged commit 84f29f4 into huggingface:master on Dec 25, 2019
@SysDevHayes commented

It seems this did not work: I trained on a dataset of approximately 8.5 GB on a machine with 62 GB of RAM and an 8 GB GPU, and still got this error:
[screenshot of the error]

@svlandeg (Collaborator) commented

@EricLe-dev: apologies for the late follow-up, but did you pull the recent master branch of neuralcoref and install it from source? This fix isn't part of any release yet...

