
Something goes wrong after loading data #4

Open · tanyuqian opened this issue Jan 3, 2019 · 3 comments

tanyuqian commented Jan 3, 2019
Hi,

I encountered a problem when using your code. Here is the command I used:

python script_bin/train_model.py \
	--trainer	--train-inputs cnn-dailymail/inputs/train \
			--train-labels cnn-dailymail/labels/train \
			--valid-inputs cnn-dailymail/inputs/valid \
			--valid-labels cnn-dailymail/labels/valid \
			--valid-refs cnn-dailymail/human-abstracts/valid \
			--weighted \
			--gpu 0 \
			--seed 12345678 \
	--emb	--pretrained-embeddings glove/glove.6B.200d.txt \
	--enc cnn \
	--ext s2s --bidirectional

and here is the message printed on the screen:

{'train_inputs': PosixPath('cnn-dailymail/inputs/train'), 'train_labels': PosixPath('cnn-dailymail/labels/train'), 'valid_inputs': PosixPath('cnn-dailymail/inputs/valid'), 'valid_labels': PosixPath('cnn-dailymail/labels/valid'), 'valid_refs': PosixPath('cnn-dailymail/human-abstracts/valid'), 'seed': 12345678, 'epochs': 50, 'batch_size': 32, 'gpu': 0, 'teacher_forcing': 25, 'sentence_limit': 50, 'weighted': True, 'loader_workers': 8, 'raml_samples': 25, 'raml_temp': 0.05, 'summary_length': 100, 'remove_stopwords': False, 'shuffle_sents': False, 'model': None, 'results': None}

{'embedding_size': 200, 'pretrained_embeddings': 'glove/glove.6B.200d.txt', 'top_k': None, 'at_least': 1, 'word_dropout': 0.0, 'embedding_dropout': 0.25, 'update_rule': 'fix-all', 'filter_pretrained': False}

{'dropout': 0.25, 'filter_windows': [1, 2, 3, 4, 5, 6], 'feature_maps': [25, 25, 50, 50, 50, 50], 'OPT': 'cnn'}

{'hidden_size': 300, 'bidirectional': True, 'rnn_dropout': 0.25, 'num_layers': 1, 'cell': 'gru', 'mlp_layers': [100], 'mlp_dropouts': [0.25], 'OPT': 's2s'}
Initializing vocabulary and embeddings.
INFO:root: Reading pretrained embeddings from glove/glove.6B.200d.txt
INFO:root: Read 400002 embeddings of size 200
INFO:root: EmbeddingContext(
  (embeddings): Embedding(400002, 200, padding_idx=0)
)
Loading training data.
Loading validation data.
INFO:root: Computing class weights...
287113/287113
INFO:root: Counts y=0: 8980574, y=1 1119138
INFO:root: Reweighting y=1 by 8.024545677119354

Placing model on device: 0
INFO:root: Model parameter initialization started.
INFO:root: EmbeddingContext initialization started.
INFO:root: Initializing with pretrained embeddings.
INFO:root: EmbeddingContext initialization finished.
INFO:root: CNNSentenceEncoder initialization started.
INFO:root: filters.0.weight (25,1,1,1,200): Xavier normal init.
INFO:root: filters.0.bias (25): constant (0) init.
INFO:root: filters.1.weight (25,1,1,2,200): Xavier normal init.
INFO:root: filters.1.bias (25): constant (0) init.
INFO:root: filters.2.weight (50,1,1,3,200): Xavier normal init.
INFO:root: filters.2.bias (50): constant (0) init.
INFO:root: filters.3.weight (50,1,1,4,200): Xavier normal init.
INFO:root: filters.3.bias (50): constant (0) init.
INFO:root: filters.4.weight (50,1,1,5,200): Xavier normal init.
INFO:root: filters.4.bias (50): constant (0) init.
INFO:root: filters.5.weight (50,1,1,6,200): Xavier normal init.
INFO:root: filters.5.bias (50): constant (0) init.
INFO:root: CNNSentenceEncoder initialization finished.
INFO:root: Seq2SeqSentenceExtractor initialization started.
INFO:root: decoder_start (250): random normal init.
INFO:root: encoder_rnn.weight_ih_l0 (900,250): Xavier normal init.
INFO:root: encoder_rnn.weight_hh_l0 (900,300): Xavier normal init.
INFO:root: encoder_rnn.bias_ih_l0 (900): constant (0) init.
INFO:root: encoder_rnn.bias_hh_l0 (900): constant (0) init.
INFO:root: encoder_rnn.weight_ih_l0_reverse (900,250): Xavier normal init.
INFO:root: encoder_rnn.weight_hh_l0_reverse (900,300): Xavier normal init.
INFO:root: encoder_rnn.bias_ih_l0_reverse (900): constant (0) init.
INFO:root: encoder_rnn.bias_hh_l0_reverse (900): constant (0) init.
INFO:root: decoder_rnn.weight_ih_l0 (900,250): Xavier normal init.
INFO:root: decoder_rnn.weight_hh_l0 (900,300): Xavier normal init.
INFO:root: decoder_rnn.bias_ih_l0 (900): constant (0) init.
INFO:root: decoder_rnn.bias_hh_l0 (900): constant (0) init.
INFO:root: decoder_rnn.weight_ih_l0_reverse (900,250): Xavier normal init.
INFO:root: decoder_rnn.weight_hh_l0_reverse (900,300): Xavier normal init.
INFO:root: decoder_rnn.bias_ih_l0_reverse (900): constant (0) init.
INFO:root: decoder_rnn.bias_hh_l0_reverse (900): constant (0) init.
INFO:root: mlp.0.weight (100,1200): Xavier normal init.
INFO:root: mlp.0.bias (100): constant (0) init.
INFO:root: mlp.3.weight (1,100): Xavier normal init.
INFO:root: mlp.3.bias (1): constant (0) init.
INFO:root: Seq2SeqSentenceExtractor initialization finished.
INFO:root: Model parameter initialization finished.

INFO:ignite.engine.engine.Engine:Engine run starting with max_epochs=50
ERROR:ignite.engine.engine.Engine:Current run is terminating due to exception: 'NoneType' object has no attribute 'data'
ERROR:ignite.engine.engine.Engine:Engine run is terminating due to exception: 'NoneType' object has no attribute 'data'
Traceback (most recent call last):
  File "script_bin/train_model.py", line 79, in <module>
    main()
  File "script_bin/train_model.py", line 76, in main
    results_path=args["trainer"]["results"])
  File "/home/bowen/packages/anaconda3/lib/python3.6/site-packages/nnsum-1.0-py3.6.egg/nnsum/trainer/labels_mle_trainer.py", line 164, in labels_mle_trainer
  File "/home/bowen/packages/anaconda3/lib/python3.6/site-packages/pytorch_ignite-0.1.2-py3.6.egg/ignite/engine/engine.py", line 326, in run
    self._handle_exception(e)
  File "/home/bowen/packages/anaconda3/lib/python3.6/site-packages/pytorch_ignite-0.1.2-py3.6.egg/ignite/engine/engine.py", line 291, in _handle_exception
    raise e
  File "/home/bowen/packages/anaconda3/lib/python3.6/site-packages/pytorch_ignite-0.1.2-py3.6.egg/ignite/engine/engine.py", line 313, in run
    hours, mins, secs = self._run_once_on_dataset()
  File "/home/bowen/packages/anaconda3/lib/python3.6/site-packages/pytorch_ignite-0.1.2-py3.6.egg/ignite/engine/engine.py", line 280, in _run_once_on_dataset
    self._handle_exception(e)
  File "/home/bowen/packages/anaconda3/lib/python3.6/site-packages/pytorch_ignite-0.1.2-py3.6.egg/ignite/engine/engine.py", line 291, in _handle_exception
    raise e
  File "/home/bowen/packages/anaconda3/lib/python3.6/site-packages/pytorch_ignite-0.1.2-py3.6.egg/ignite/engine/engine.py", line 272, in _run_once_on_dataset
    self.state.output = self._process_function(self, batch)
  File "/home/bowen/packages/anaconda3/lib/python3.6/site-packages/nnsum-1.0-py3.6.egg/nnsum/trainer/labels_mle_trainer.py", line 188, in _update
AttributeError: 'NoneType' object has no attribute 'data'

I don't know what the problem is. I hope you can help me. :)

Thanks!

avineshpvs commented Feb 1, 2019

Since a recent version of PyTorch, gradient buffers are initialized only when they are needed, so a parameter that never receives a gradient has param.grad == None.
I found a workaround by adding a check before clipping the gradients in nnsum/trainer/labels_mle_trainer.py:

    for param in model.parameters():
        if param.grad is not None:  # grad buffer may never have been created
            param.grad.data.clamp_(-grad_clip, grad_clip)
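
As a side note, recent PyTorch versions also provide torch.nn.utils.clip_grad_value_, which (as far as I can tell) skips parameters whose grad is None, so the manual loop can be replaced with a one-liner:

    import torch
    # clamps every existing gradient to [-grad_clip, grad_clip] in place
    torch.nn.utils.clip_grad_value_(model.parameters(), grad_clip)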

tlifcen commented Aug 16, 2019

Hello,
I'd like to test on the CNN/DailyMail dataset, but I cannot download it, so I used the Reddit dataset to test the code instead. I ran the following command:

python script_bin/train_model.py \
    --trainer --train-inputs  ../dataset/summarization/nnsum/reddit/reddit/inputs/train\
              --train-labels ../dataset/summarization/nnsum/reddit/reddit/abs_lables/train \
              --valid-inputs ../dataset/summarization/nnsum/reddit/reddit/inputs/valid \
              --valid-labels ../dataset/summarization/nnsum/reddit/reddit/abs_lables/valid \
              --valid-refs ../dataset/summarization/nnsum/reddit/reddit/human-abstracts/valid \
              --weighted \
              --gpu 0 \
              --model ../dataset/summarization/nnsum/reddit/model \
              --results ../dataset/summarization/nnsum/reddit/val_score \
              --seed 12345678 \
    --emb --pretrained-embeddings ../dataset/embedding/eng_word_embedding.word2vec.vec \
    --enc cnn \
    --ext s2s --bidirectional

However, a reference-lookup problem occurred:

{'train_inputs': PosixPath('../dataset/summarization/nnsum/reddit/reddit/inputs/train'), 'train_labels': PosixPath('../dataset/summarization/nnsum/reddit/reddit/abs_lables/train'), 'valid_inputs': PosixPath('../dataset/summarization/nnsum/reddit/reddit/inputs/valid'), 'valid_labels': PosixPath('../dataset/summarization/nnsum/reddit/reddit/abs_lables/valid'), 'valid_refs': PosixPath('../dataset/summarization/nnsum/reddit/reddit/human-abstracts/valid'), 'seed': 12345678, 'epochs': 50, 'batch_size': 32, 'gpu': 0, 'teacher_forcing': 25, 'sentence_limit': 50, 'weighted': True, 'loader_workers': 8, 'raml_samples': 25, 'raml_temp': 0.05, 'summary_length': 100, 'remove_stopwords': False, 'shuffle_sents': False, 'model': PosixPath('../dataset/summarization/nnsum/reddit/model'), 'results': PosixPath('../dataset/summarization/nnsum/reddit/val_score')}

{'embedding_size': 200, 'pretrained_embeddings': '../dataset/embedding/eng_word_embedding.word2vec.vec', 'top_k': None, 'at_least': 1, 'word_dropout': 0.0, 'embedding_dropout': 0.25, 'update_rule': 'fix-all', 'filter_pretrained': False}

{'dropout': 0.25, 'filter_windows': [1, 2, 3, 4, 5, 6], 'feature_maps': [25, 25, 50, 50, 50, 50], 'OPT': 'cnn'}

{'hidden_size': 300, 'bidirectional': True, 'rnn_dropout': 0.25, 'num_layers': 1, 'cell': 'gru', 'mlp_layers': [100], 'mlp_dropouts': [0.25], 'OPT': 's2s'}
Initializing vocabulary and embeddings.
INFO:root: Reading pretrained embeddings from ../dataset/embedding/eng_word_embedding.word2vec.vec
INFO:root: Read 559185 embeddings of size 200
INFO:root: EmbeddingContext(
  (embeddings): Embedding(559185, 200, padding_idx=0)
)
Loading training data.
Loading validation data.
Traceback (most recent call last):
  File "script_bin/train_model.py", line 79, in <module>
    main()
  File "script_bin/train_model.py", line 48, in main
    sentence_limit=args["trainer"]["sentence_limit"])
  File "/home/constant/anaconda3/lib/python3.7/site-packages/nnsum-1.0-py3.7.egg/nnsum/data/summarization_dataset.py", line 30, in __init__
  File "/home/constant/anaconda3/lib/python3.7/site-packages/nnsum-1.0-py3.7.egg/nnsum/data/summarization_dataset.py", line 59, in _collect_references
Exception: No references found for example id: 12he9h.32

I looked at the data files: both 12he9h.32.a.txt and 12he9h.32.d.txt exist under reddit/human-abstracts/valid. I do not know where the problem is.

Thanks for any help! @kedz

alexookah commented
@tlifcen I encountered the same problem. When you run python script_bin/train_model.py, make sure you are using the correct paths: the relative paths in the command are resolved against your current working directory, not against the script's location, so run the command from the directory those paths assume.
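
For what it's worth, here is a minimal standalone sanity check (not part of nnsum; the .json input extension and the directory layout are assumptions based on this thread) that verifies every example id has at least one matching reference file before training:

    # sanity_check_refs.py -- hypothetical helper, not part of nnsum
    import pathlib

    inputs_dir = pathlib.Path("../dataset/summarization/nnsum/reddit/reddit/inputs/valid")
    refs_dir = pathlib.Path("../dataset/summarization/nnsum/reddit/reddit/human-abstracts/valid")

    for inp in sorted(inputs_dir.glob("*.json")):   # assuming one .json file per example
        example_id = inp.stem                       # e.g. "12he9h.32"
        # references appear to be named <example_id>.<letter>.txt
        if not list(refs_dir.glob(example_id + ".*.txt")):
            print("no references found for", example_id)

If this prints nothing, the ids line up, and a remaining "No references found" error most likely means the --valid-refs path itself is wrong relative to your working directory.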
