Loading datasets #26

Closed
andreamad8 opened this issue Aug 28, 2018 · 3 comments
Labels
bug Something isn't working

Comments

@andreamad8

I ran this command, as suggested in the repo:
nvidia-docker run -it --rm -v `pwd`:/decaNLP/ -u $(id -u):$(id -g) decanlp bash -c "python /decaNLP/train.py --train_tasks squad iwslt.en.de cnn_dailymail multinli.in.out sst srl zre woz.en wikisql schema --train_iterations 1 --gpu 1"
It downloads and processes the following datasets without problems: squad iwslt.en.de cnn_dailymail multinli.in.out sst srl zre.
Once it reaches woz.en, I get the following error:

process_main - zre has 840000 training examples
process_main - Loading woz.en
process_main - Adding woz.en to training datasets
downloading woz_train_en.json
downloading woz_test_de.json
downloading woz_test_en.json
downloading woz_train_de.json
downloading woz_validate_de.json
downloading woz_validate_en.json
Traceback (most recent call last):
  File "/decaNLP/train.py", line 365, in <module>
    main()
  File "/decaNLP/train.py", line 352, in main
    field, train_sets, val_sets = prepare_data(args, field, logger)
  File "/decaNLP/train.py", line 67, in prepare_data
    split = get_splits(args, task, FIELD, **kwargs)[0]
  File "/decaNLP/util.py", line 138, in get_splits
    fields=FIELD, root=args.data, **kwargs)
  File "/decaNLP/text/torchtext/datasets/generic.py", line 976, in splits
    os.path.join(path, f'{train}.jsonl'), fields, **kwargs)
  File "/decaNLP/text/torchtext/datasets/generic.py", line 897, in __init__
    ex = data.Example.fromlist([context, question, answer, CONTEXT_SPECIAL, QUESTION_SPECIAL, context_question, woz_id], fields)
  File "/decaNLP/text/torchtext/data/example.py", line 62, in fromlist
    setattr(ex, name, [sys.intern(x) for x in field.preprocess(val)])
TypeError: 'int' object is not iterable
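
For readers following the traceback, the failing comprehension can be reproduced in isolation. This is only a minimal sketch, not the decaNLP code: `preprocess` and `intern_tokens` are hypothetical stand-ins, and it assumes that one of the values passed to `fromlist` (possibly `woz_id`) was a plain integer that came back from `field.preprocess` unchanged.

```python
import sys


def preprocess(value):
    # Hypothetical stand-in for field.preprocess: returns the value unchanged.
    return value


def intern_tokens(value):
    # Same shape as the comprehension shown for example.py in the traceback.
    return [sys.intern(x) for x in preprocess(value)]


# A list of string tokens is iterable, so this works.
tokens = intern_tokens(["book", "a", "table"])

# An integer (assumed here to be what the failing value held) is not iterable.
try:
    intern_tokens(7)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # 'int' object is not iterable
```

Under that assumption, the error points at a non-sequence value reaching a field that expects a sequence of strings.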

Thanks in advance

Andrea

@bmccann
Contributor

bmccann commented Aug 28, 2018

Looks like I broke this recently trying to save on memory. I’ll have this fixed today.

bmccann self-assigned this Aug 29, 2018
bmccann added the bug (Something isn't working) label Aug 29, 2018
@bmccann
Contributor

bmccann commented Aug 29, 2018

847a9dd should fix this -- let me know if not

@andreamad8
Author

Issue solved. Thanks a lot for the quick response.
