
How to train XNMT with datasets that do not fit in RAM? #585

Open
rrmariani opened this issue Jan 7, 2020 · 1 comment

Comments


rrmariani commented Jan 7, 2020

I have source and target files that are 15 GB each, and the system crashes after allocating all 32 GB of RAM I have plus the 2 GB of swap space.

I am now trying with 15 pairs of 1GB files...

Is there a way to tell XNMT how to train on very large files?

rrmariani (Author) commented

I found an answer to my problem in the docs:

"sample_train_sents – If given, load a random subset of training sentences before each epoch. Useful when training data does not fit in memory."

I guess we would also need to read a large corpus in consecutive chunks to make sure the entire data set gets covered.
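For reference, here is a minimal sketch of where that option would go in an experiment config, assuming the usual `!Experiment` / `!SimpleTrainingRegimen` layout from the XNMT example configs; the file paths, experiment name, and sample size below are placeholders, and the model / exp_global sections are omitted:

```yaml
# Sketch only (assumed layout based on XNMT's example configs).
# sample_train_sents loads a random subset of training sentences before
# each epoch, so the full 15 GB corpus never has to sit in memory at once.
exp-large-corpus: !Experiment
  train: !SimpleTrainingRegimen
    src_file: train.src          # placeholder path to the large source file
    trg_file: train.trg          # placeholder path to the large target file
    sample_train_sents: 2000000  # placeholder: sentences sampled per epoch
    run_for_epochs: 20
```

Note this samples randomly each epoch rather than walking the corpus in order, so coverage of the whole data set is only statistical, which is what the previous comment was getting at.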
