Preprocess stuck #7

nikhilrayaprolu · 2021-05-14T08:08:14Z

🐛 Bug

When executing python scripts/preprocess.py cnndm --mode pipeline, preprocessing gets stuck at this point:

Some of the oraclewords are also not generated.

Environment

fairseq Version (e.g., 1.0 or master): recommended commit
PyTorch Version (e.g., 1.0) : 1.8
OS (e.g., Linux): Linux
How you installed fairseq (pip, source): source
Build command you used (if compiling from source): source
Python version: 3.6.8
CUDA/cuDNN version: 10.2


nikhilrayaprolu · 2021-05-14T08:08:49Z

@jxhe @muggin

geeraay · 2021-05-23T14:19:56Z

Hi @nikhilrayaprolu,

I faced the same problem as you. It happened because running the preprocessing script on the whole cnndm training dataset took more than 32GB of RAM. I would suggest splitting the train set into several parts, then merging them after preprocessing on those parts has finished.

nikhilrayaprolu · 2021-05-24T06:04:45Z

thanks for the reply @geeraay

nikhilrayaprolu · 2021-05-25T12:17:34Z

@geeraay can you explain in more detail how the splitting and merging is done? Any accompanying code would be really helpful.

geeraay · 2021-05-26T06:22:20Z

I don't remember the exact steps I took back then, but the idea is this.

I did something like
split -d -n l/${nsplit} /path-to-file/train.source /path-to-file/train.source.
This creates train.source.00, train.source.01, ... (the -d flag gives numeric suffixes starting at 00, so the last file is numbered ${nsplit} minus 1; without -d, split names the parts aa, ab, ...).

Then I renamed the generated files to
train_1.source, train_2.source, ..., train_${nsplit}.source.
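
The split and rename steps can also be done in one go. Below is a minimal sketch, not the exact procedure used above. It assumes the data is one example per line and that any parallel file (for instance a train.target next to train.source, which is an assumption here) must be cut at the same line numbers. Note that split -n l/N balances parts by size rather than by line count, so running it separately on two parallel files can leave them misaligned; splitting by line count avoids that. The function name split_by_lines is hypothetical.

```python
from itertools import islice
from pathlib import Path


def split_by_lines(path, nsplit):
    """Split `path` into nsplit files by line count.

    For train.source this writes train_1.source ... train_<nsplit>.source
    in the same directory and returns the list of output paths.
    """
    path = Path(path)

    # First pass: count lines without loading the file into memory.
    with path.open("rb") as f:
        total = sum(1 for _ in f)

    # Lines per part, rounded up so that nsplit parts cover every line.
    chunk = max(1, -(-total // nsplit))

    outputs = []
    # Second pass: stream consecutive blocks of `chunk` lines to each part.
    with path.open("rb") as f:
        for i in range(nsplit):
            out = path.with_name(f"{path.stem}_{i + 1}{path.suffix}")
            with out.open("wb") as g:
                g.writelines(islice(f, chunk))
            outputs.append(out)
    return outputs
```

Calling split_by_lines on each parallel file with the same nsplit gives parts with matching line counts. The names for the --split argument below can then be built with ",".join(p.stem for p in parts), which yields train_1,train_2,... for the parts returned here.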

After that you could run
python scripts/preprocess.py cnndm --mode pipeline --split train_1,train_2,...,train_${nsplit}

Wait until the preprocessing step is done. Then I manually copied and pasted the generated files, in order, into one big train.source file.
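
The manual copy and paste can be replaced by a small script. This is a rough sketch under assumptions: the generated files are plain text with one example per line, every part ends with a newline, and they follow the train_1, train_2, ... naming used above. The function name merge_parts and the suffix argument are hypothetical; pass whatever extension the preprocessing script actually produced.

```python
import shutil
from pathlib import Path


def merge_parts(directory, suffix, nsplit, stem="train"):
    """Concatenate <stem>_1<suffix> ... <stem>_<nsplit><suffix> into <stem><suffix>.

    Parts are visited in numeric order, so train_10 comes after train_9
    (a shell glob such as train_* would sort train_10 before train_2).
    An existing <stem><suffix> file in `directory` is overwritten.
    """
    directory = Path(directory)
    merged = directory / f"{stem}{suffix}"
    with merged.open("wb") as out:
        for i in range(1, nsplit + 1):
            part = directory / f"{stem}_{i}{suffix}"
            with part.open("rb") as src:
                # Stream the bytes across so no part is held fully in memory.
                shutil.copyfileobj(src, out)
    return merged
```

Because an existing file with the merged name is overwritten, it is safer to run this in a directory that does not hold the original unsplit file, or to keep a backup of it.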

Or you can simply use a machine with more RAM to preprocess without splitting the file.

nikhilrayaprolu added the bug Something isn't working label May 14, 2021
