This repository has been archived by the owner on Jun 22, 2022. It is now read-only.
I am running the latest master branch (offline), and when training starts it crashes while calling `forward()` on the model and evaluating the loss function:
```
neptune: Executing in Offline Mode.
2018-08-15 18-34-40 google-ai-odt >>> training
2018-08-15 18-35-03 google-ai-odt >>> Training on a reduced class subset: ['Person', 'Car', 'Dress', 'Footwear']
2018-08-15 18:35:05 steppy >>> initializing Step label_encoder...
2018-08-15 18:35:05 steppy >>> initializing experiment directories under experiments
2018-08-15 18:35:05 steppy >>> done: initializing experiment directories
2018-08-15 18:35:05 steppy >>> Step label_encoder initialized
2018-08-15 18:35:05 steppy >>> initializing Step loader...
2018-08-15 18:35:05 steppy >>> initializing experiment directories under experiments
2018-08-15 18:35:05 steppy >>> done: initializing experiment directories
2018-08-15 18:35:05 steppy >>> Step loader initialized
neptune: Executing in Offline Mode.
2018-08-15 18:35:07 steppy >>> initializing Step retinanet...
2018-08-15 18:35:07 steppy >>> initializing experiment directories under experiments
2018-08-15 18:35:07 steppy >>> done: initializing experiment directories
2018-08-15 18:35:07 steppy >>> Step retinanet initialized
2018-08-15 18:35:07 steppy >>> cleaning cache...
2018-08-15 18:35:07 steppy >>> cleaning cache done
2018-08-15 18:35:07 steppy >>> Step label_encoder, adapting inputs...
2018-08-15 18:35:07 steppy >>> Step label_encoder, fitting and transforming...
2018-08-15 18:35:10 steppy >>> Step label_encoder, persisting transformer to the experiments/transformers/label_encoder
2018-08-15 18:35:10 steppy >>> Step loader, adapting inputs...
2018-08-15 18:35:10 steppy >>> Step loader, transforming...
2018-08-15 18:35:10 steppy >>> Step retinanet, unpacking inputs...
2018-08-15 18:35:10 steppy >>> Step retinanet, fitting and transforming...
2018-08-15 18:35:13 steppy >>> starting training...
2018-08-15 18:35:13 steppy >>> initial lr: 1e-05
2018-08-15 18:35:13 steppy >>> epoch 0 ...
2018-08-15 18:35:13 steppy >>> epoch 0 batch 0 ...
```
```
Traceback (most recent call last):
  File "main.py", line 78, in <module>
    main()
  File "/home/m09170/anaconda3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/m09170/anaconda3/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/m09170/anaconda3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/m09170/anaconda3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/m09170/anaconda3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "main.py", line 16, in train
    pipeline_manager.train(pipeline_name, dev_mode)
  File "/media/nvme1/kaggle-openimages/src/open-solution-googleai-object-detection/src/pipeline_manager.py", line 21, in train
    train(pipeline_name, dev_mode)
  File "/media/nvme1/kaggle-openimages/src/open-solution-googleai-object-detection/src/pipeline_manager.py", line 85, in train
    pipeline.fit_transform(data)
  File "/media/nvme1/kaggle-openimages/src/open-solution-googleai-object-detection/src/steppy_dev/base.py", line 280, in fit_transform
    step_output_data = self._cached_fit_transform(step_inputs)
  File "/media/nvme1/kaggle-openimages/src/open-solution-googleai-object-detection/src/steppy_dev/base.py", line 390, in _cached_fit_transform
    step_output_data = self.transformer.fit_transform(**step_inputs)
  File "/home/m09170/anaconda3/lib/python3.6/site-packages/steppy/base.py", line 605, in fit_transform
    self.fit(*args, **kwargs)
  File "/media/nvme1/kaggle-openimages/src/open-solution-googleai-object-detection/src/models.py", line 32, in fit
    metrics = self._fit_loop(data)
  File "/media/nvme1/kaggle-openimages/src/open-solution-googleai-object-detection/src/models.py", line 63, in _fit_loop
    batch_loss = loss_function(outputs_batch, target) * weight
  File "/home/m09170/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/nvme1/kaggle-openimages/src/open-solution-googleai-object-detection/src/parallel.py", line 137, in forward
    outputs = _criterion_parallel_apply(replicas, inputs, targets, kwargs)
  File "/media/nvme1/kaggle-openimages/src/open-solution-googleai-object-detection/src/parallel.py", line 192, in _criterion_parallel_apply
    raise output
  File "/media/nvme1/kaggle-openimages/src/open-solution-googleai-object-detection/src/parallel.py", line 167, in _worker
    output = module(*(input + target), **kwargs)
TypeError: can only concatenate tuple (not "dict") to tuple
```
It looks like the `target` variable passed to the loss function is supposed to be a tuple, but it is a dictionary instead. I have to admit I'm not sure what exactly is causing this, but I wanted to see if you have any immediate ideas before I spend time going through the code line by line. The execution command is just `python main.py -- train --pipeline_name retinanet`, and the whole config has been filled out with (supposedly) the correct files.
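For illustration, here is a minimal, framework-free sketch of the failure mode in `_worker` (`output = module(*(input + target), **kwargs)`), plus one possible workaround. The variable names and values below are stand-ins, not the real objects, and the workaround is only an assumption about what the loss function expects:

```python
# `input + target` is tuple concatenation, so it only works when `target`
# is also a tuple. A dict target reproduces the exact TypeError above.
inputs = ("loc_preds", "cls_preds")        # stand-ins for the model outputs
target = {"boxes": [0, 1], "labels": [2]}  # hypothetical dict-shaped target

try:
    args = inputs + target                 # tuple + dict -> TypeError
except TypeError as exc:
    print(exc)  # can only concatenate tuple (not "dict") to tuple

# One possible workaround (an assumption, not the project's actual fix):
# unpack the dict's values into a tuple before concatenating, provided the
# loss function accepts them positionally in that order.
args = inputs + tuple(target.values())
print(args)  # ('loc_preds', 'cls_preds', [0, 1], [2])
```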
Thanks!
In the neptune config file:
You can change `batch_size_inference` to the same number you have set for `batch_size_train` during training, but you will need to change it back to 1 when using the evaluate or prediction pipes.
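For reference, a hypothetical sketch of the relevant entries (the key names follow the comment above; the values and surrounding structure are assumptions, so check your own config for the exact layout):

```yaml
parameters:
  # keep these equal while training...
  batch_size_train: 16
  batch_size_inference: 16
  # ...but set batch_size_inference back to 1 before running
  # the evaluate or prediction pipes
```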