Preprocess stuck #7
Comments
Hi @nikhilrayaprolu, I ran into the same problem. It happened because running the preprocessing script on the whole cnndm training dataset took more than 32GB of RAM. I would suggest splitting the train set into several parts, then merging the results once preprocessing of those parts has finished.
Thanks for the reply, @geeraay.
@geeraay, can you explain in more detail how the splitting and merging is done? Any accompanying code would be really helpful.
I don't remember the exact steps I took back then, but the idea is this. I did something like [...] Then I renamed the generated files to [...] After that you can run [...] and wait until the preprocessing step is done. Then I manually copied and pasted the generated files into one big [...] Or you can simply use a machine with more RAM to preprocess without splitting the file.
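The split-then-merge idea described above could be sketched roughly as follows. This is not the code used in the thread: it assumes the train set and the preprocessed outputs are line-oriented text files (one example per line), which may not match the actual cnndm files, and the helper names `split_lines` and `merge_lines` are hypothetical. Both helpers stream line by line, so they never hold the whole file in memory.

```python
from pathlib import Path
from typing import List


def split_lines(src: Path, out_dir: Path, n_parts: int) -> List[Path]:
    """Split a line-oriented file into at most n_parts contiguous chunks."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # First pass: count lines without loading the file into memory.
    with src.open(encoding="utf-8") as f:
        total = sum(1 for _ in f)
    per_part = max(1, -(-total // n_parts))  # ceiling division
    parts: List[Path] = []
    out = None
    # Second pass: stream each line into the chunk it belongs to.
    with src.open(encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i % per_part == 0:
                if out is not None:
                    out.close()
                path = out_dir / f"{src.stem}.part{len(parts)}{src.suffix}"
                parts.append(path)
                out = path.open("w", encoding="utf-8")
            out.write(line)
    if out is not None:
        out.close()
    return parts


def merge_lines(parts: List[Path], dst: Path) -> None:
    """Concatenate the given files into dst, preserving their order."""
    with dst.open("w", encoding="utf-8") as out:
        for part in parts:
            with part.open(encoding="utf-8") as f:
                for line in f:
                    out.write(line)
```

With helpers like these, one would split the train file, run the preprocessing script on each part in turn, and then call `merge_lines` on the per-part outputs in the same order, so the merged file lines up with the original examples.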
🐛 Bug
On executing
python scripts/preprocess.py cnndm --mode pipeline
Preprocessing gets stuck at this point:
Some of the oraclewords are also not generated.
Environment
Installed from (pip, source): source