This repository has been archived by the owner on Jun 22, 2022. It is now read-only.

Evaluating loss function: TypeError: can only concatenate tuple (not "dict") to tuple #81

Closed
mxbi opened this issue Aug 15, 2018 · 2 comments

mxbi commented Aug 15, 2018

I am running the latest master branch (offline). When training starts, the code crashes while calling forward() on the model and evaluating the loss function:

neptune: Executing in Offline Mode.
2018-08-15 18-34-40 google-ai-odt >>> training
2018-08-15 18-35-03 google-ai-odt >>> Training on a reduced class subset: ['Person', 'Car', 'Dress', 'Footwear']
2018-08-15 18:35:05 steppy >>> initializing Step label_encoder...
2018-08-15 18:35:05 steppy >>> initializing experiment directories under experiments
2018-08-15 18:35:05 steppy >>> done: initializing experiment directories
2018-08-15 18:35:05 steppy >>> Step label_encoder initialized
2018-08-15 18:35:05 steppy >>> initializing Step loader...
2018-08-15 18:35:05 steppy >>> initializing experiment directories under experiments
2018-08-15 18:35:05 steppy >>> done: initializing experiment directories
2018-08-15 18:35:05 steppy >>> Step loader initialized
neptune: Executing in Offline Mode.
2018-08-15 18:35:07 steppy >>> initializing Step retinanet...
2018-08-15 18:35:07 steppy >>> initializing experiment directories under experiments
2018-08-15 18:35:07 steppy >>> done: initializing experiment directories
2018-08-15 18:35:07 steppy >>> Step retinanet initialized
2018-08-15 18:35:07 steppy >>> cleaning cache...
2018-08-15 18:35:07 steppy >>> cleaning cache done
2018-08-15 18:35:07 steppy >>> Step label_encoder, adapting inputs...
2018-08-15 18:35:07 steppy >>> Step label_encoder, fitting and transforming...
2018-08-15 18:35:10 steppy >>> Step label_encoder, persisting transformer to the experiments/transformers/label_encoder
2018-08-15 18:35:10 steppy >>> Step loader, adapting inputs...
2018-08-15 18:35:10 steppy >>> Step loader, transforming...
2018-08-15 18:35:10 steppy >>> Step retinanet, unpacking inputs...
2018-08-15 18:35:10 steppy >>> Step retinanet, fitting and transforming...
2018-08-15 18:35:13 steppy >>> starting training...
2018-08-15 18:35:13 steppy >>> initial lr: 1e-05
2018-08-15 18:35:13 steppy >>> epoch 0 ...
2018-08-15 18:35:13 steppy >>> epoch 0 batch 0 ...
Traceback (most recent call last):
  File "main.py", line 78, in <module>
    main()
  File "/home/m09170/anaconda3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/m09170/anaconda3/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/m09170/anaconda3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/m09170/anaconda3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/m09170/anaconda3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "main.py", line 16, in train
    pipeline_manager.train(pipeline_name, dev_mode)
  File "/media/nvme1/kaggle-openimages/src/open-solution-googleai-object-detection/src/pipeline_manager.py", line 21, in train
    train(pipeline_name, dev_mode)
  File "/media/nvme1/kaggle-openimages/src/open-solution-googleai-object-detection/src/pipeline_manager.py", line 85, in train
    pipeline.fit_transform(data)
  File "/media/nvme1/kaggle-openimages/src/open-solution-googleai-object-detection/src/steppy_dev/base.py", line 280, in fit_transform
    step_output_data = self._cached_fit_transform(step_inputs)
  File "/media/nvme1/kaggle-openimages/src/open-solution-googleai-object-detection/src/steppy_dev/base.py", line 390, in _cached_fit_transform
    step_output_data = self.transformer.fit_transform(**step_inputs)
  File "/home/m09170/anaconda3/lib/python3.6/site-packages/steppy/base.py", line 605, in fit_transform
    self.fit(*args, **kwargs)
  File "/media/nvme1/kaggle-openimages/src/open-solution-googleai-object-detection/src/models.py", line 32, in fit
    metrics = self._fit_loop(data)
  File "/media/nvme1/kaggle-openimages/src/open-solution-googleai-object-detection/src/models.py", line 63, in _fit_loop
    batch_loss = loss_function(outputs_batch, target) * weight
  File "/home/m09170/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/nvme1/kaggle-openimages/src/open-solution-googleai-object-detection/src/parallel.py", line 137, in forward
    outputs = _criterion_parallel_apply(replicas, inputs, targets, kwargs)
  File "/media/nvme1/kaggle-openimages/src/open-solution-googleai-object-detection/src/parallel.py", line 192, in _criterion_parallel_apply
    raise output
  File "/media/nvme1/kaggle-openimages/src/open-solution-googleai-object-detection/src/parallel.py", line 167, in _worker
    output = module(*(input + target), **kwargs)
TypeError: can only concatenate tuple (not "dict") to tuple

It looks like the "target" variable passed to the loss function is supposed to be a tuple, but it is a dictionary instead. I have to admit I'm not sure what exactly is causing this, but I wanted to see if you have any immediate ideas before I spend time going through the code line by line. The execution command is just: python main.py -- train --pipeline_name retinanet, and the whole config has been filled out with (supposedly) the correct files.
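For reference, a minimal standalone sketch (the values here are illustrative, not taken from the repo) reproduces the same TypeError that the `module(*(input + target), **kwargs)` line in parallel.py hits when `target` is a dict rather than a tuple:

```python
# Illustrative repro of the failure mode: the _worker in parallel.py
# does `input + target`, which assumes both operands are tuples.
inputs = (1, 2)             # stand-in for the per-replica input tuple
target = {'boxes': [3]}     # a dict target triggers the crash

try:
    args = inputs + target  # same operation as `input + target`
except TypeError as err:
    print(err)              # can only concatenate tuple (not "dict") to tuple

# If the target were a tuple, the concatenation would succeed:
args = inputs + tuple(target.values())
print(args)                 # (1, 2, [3])
```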

Thanks!

i008 (Collaborator) commented Aug 15, 2018

In the neptune config file, you can change batch_size_inference to the same value you have set for batch_size_train while training, but you will need to change it back to 1 when using the evaluate or predict pipelines.
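As a sketch of that change (key names taken from the comment above; the actual layout of the neptune config file may differ):

```yaml
# neptune config (sketch -- exact keys/structure may differ in the repo)
parameters:
  batch_size_train: 16      # whatever batch size you train with
  batch_size_inference: 16  # match batch_size_train while training...
  # ...and set batch_size_inference back to 1 before running the
  # evaluate or predict pipelines.
```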

@jakubczakon jakubczakon self-assigned this Aug 15, 2018
jakubczakon (Contributor)

@mxbi I answered on Kaggle, but what @i008 is saying is pretty much it. Whether you are running on a single GPU or multiple GPUs is another question.
