IndexError: Target 83 is out of bounds. #1

Closed
ilikeokoge opened this issue Aug 6, 2020 · 13 comments

@ilikeokoge

Thanks for a great codebase!

I tried to run "python train.py dataset=ntu_swap_axis -m",
but I encountered IndexError: Target 83 is out of bounds.

Do you know what might be causing this?
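
For context, here is a minimal sketch of the general condition behind a message like this: a classification loss looks up one score per class, so every target label has to lie between 0 and num_classes - 1. The helper name find_out_of_range_targets and the sample values are hypothetical and not taken from the sl-dml code; this only illustrates the check and is not a diagnosis of this particular report.

```python
from typing import List, Sequence


def find_out_of_range_targets(targets: Sequence[int], num_classes: int) -> List[int]:
    """Return the distinct labels that cannot index a score vector of length num_classes."""
    return sorted({t for t in targets if not 0 <= t < num_classes})


# Hypothetical batch of labels checked against a 21-way output layer.
bad_labels = find_out_of_range_targets([0, 5, 20, 83], num_classes=21)
print(bad_labels)  # [83]
```

Running a check like this over the training labels before the loss is computed would show whether any label exceeds the size of the classifier output.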

@ilikeokoge
Author

Hi, here is the traceback:
[image: screenshot of the traceback]

@raphaelmemmesheimer
Owner

raphaelmemmesheimer commented Aug 6, 2020

Hey, thanks for trying it. I'm not able to reproduce the error. Could you paste the config output, which is printed right after the script starts, and state your library versions? Mine are:

In [3]: import pytorch_metric_learning
In [4]: pytorch_metric_learning.__version__
Out[4]: '0.9.89'
In [5]: import torch
In [6]: torch.__version__
Out[6]: '1.5.0'

Make sure that the folder structure looks like this:

-$DATASET_FOLDER
-- ntu
---- one_shot
------ samples
-------- ...
------ train
-------- ...
------ test
-------- ...

@ilikeokoge
Author

ilikeokoge commented Aug 6, 2020

Thanks for the quick reply!
My config output is as follows.

(sldml) $ python train.py dataset=ntu_swap_axis -m
INFO:root:pytorch-metric-learning VERSION 0.9.89
INFO:root:record_keeper VERSION 0.9.27
[2020-08-06 20:31:45,388][HYDRA] Sweep output dir : multirun/2020-08-06/20-31-45
[2020-08-06 20:31:45,391][HYDRA] Launching 1 jobs locally
[2020-08-06 20:31:45,391][HYDRA]        #0 : dataset=ntu_swap_axis
dataset:
  data_dir: ntu/ntu_swap_axes_testswapaxes/one_shot
  name: NTU_ONE_SHOT_SWAP_AXIS
  train_classes: 100
embedder:
  class_out_size: 21
  size: 128
embedder_loss:
  margin: 0.2
  name: triplet_margin
loss:
  classifier_loss: 0.5
  metric_loss: 0.5
miner:
  epsilon: 0.1
  name: multi_similarity
mode:
  type: final_train
model:
  model_name: resnet18
  pretrained: true
optimizer:
  lr: 1.0e-06
  momentum: 0
  name: rmsprop
  weight_decay: 0
scheduler:
  gamma: 0.1
  name: step
  step_size: 1000
trainer:
  batch_size: 32
  iterations_per_epoch: 300000
  num_epochs: 100
transform:
  transform_normalize: false
  transform_random_affine: false
  transform_random_horizontal_flip: false
  transform_random_perspective: false
  transform_random_resized_crop: false
  transform_random_rotation: false
  transform_random_shear: false
  transform_resize: 256
  transform_resize_match: false

And my environment is:

>>> import pytorch_metric_learning
>>> pytorch_metric_learning.__version__
'0.9.89'
>>> import torch
>>> torch.__version__
'1.5.0'

And the folder structure is:

-$DATASET_FOLDER
-- ntu
---- ntu_swap_axes_testswapaxes
------ one_shot
-------- samples
---------- ...
-------- train
---------- ...
-------- test
---------- ...

Is the folder structure the problem?
The path looks OK to me, though.
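
A layout like the one above can be checked programmatically. The following is a minimal sketch, assuming the three subfolders (samples, train, test) sit under the data_dir value from the config output; the helper name missing_dataset_dirs is hypothetical and not part of sl-dml.

```python
import tempfile
from pathlib import Path
from typing import List

EXPECTED_SUBDIRS = ("samples", "train", "test")


def missing_dataset_dirs(dataset_folder: Path, data_dir: str) -> List[str]:
    """List expected subfolders that do not exist under dataset_folder/data_dir."""
    base = dataset_folder / data_dir
    return [name for name in EXPECTED_SUBDIRS if not (base / name).is_dir()]


# Demonstration on a throwaway directory that lacks the "test" folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    data_dir = "ntu/ntu_swap_axes_testswapaxes/one_shot"
    for name in ("samples", "train"):
        (root / data_dir / name).mkdir(parents=True)
    print(missing_dataset_dirs(root, data_dir))  # ['test']
```

In a real session the first argument would be the directory that the DATASET_FOLDER environment variable points to; an empty list means all three folders were found.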

@raphaelmemmesheimer
Owner

It all looks fine. Just one quick thing: could you try running
python train.py dataset=ntu_swap_axis (without the -m at the end)?

@ilikeokoge
Author

I tried python train.py dataset=ntu_swap_axis but got the same error...

Is the problem with the dataset?
I downloaded ntu_120_one_shot.zip and extracted it as "one_shot" in $DATASET_FOLDER.
Then I created the new folders "ntu" and "ntu_swap_axes_testswapaxes" and put "one_shot" inside "ntu_swap_axes_testswapaxes".
Is that OK?

@raphaelmemmesheimer
Owner

raphaelmemmesheimer commented Aug 7, 2020

Okay, let's try this from the sl-dml root folder:

export DATASET_FOLDER="$(pwd)/data"
wget https://agas.uni-koblenz.de/datasets/sl-dml/ntu_120_one_shot.zip
mkdir -p data/ntu
unzip ntu_120_one_shot.zip -d $DATASET_FOLDER/ntu/ntu_swap_axes_testswapaxes
python train.py dataset=ntu_swap_axis

When you return in a new session, you have to run export DATASET_FOLDER="$(pwd)/data" before training.
If it works for you, I'll update the README accordingly or add a script. I'll also rename ntu_swap_axes_testswapaxes to something more convenient in the future.

@ilikeokoge
Author

I tried it and encountered the same error...
This may be something specific to my own setup.
I ran it on a Linux server, and my Python version is 3.6.x.
Has anyone other than you tried this setup successfully?

@raphaelmemmesheimer
Owner

raphaelmemmesheimer commented Aug 7, 2020

A colleague of mine just executed the commands from the current version of the README in a new conda environment:

conda create --name sl-dml
conda activate sl-dml
conda install -c anaconda python=3.7

and then:

git clone https://github.com/raphaelmemmesheimer/sl-dml
cd sl-dml
pip install -r requirements.txt
export DATASET_FOLDER="$(pwd)/data"
mkdir -p data/ntu/
wget https://agas.uni-koblenz.de/datasets/sl-dml/ntu_120_one_shot.zip
unzip ntu_120_one_shot.zip -d $DATASET_FOLDER/ntu/ntu_swap_axes_testswapaxes
python train.py dataset=ntu_swap_axis

and training started immediately for him.

@raphaelmemmesheimer
Owner

raphaelmemmesheimer commented Aug 7, 2020

I managed to reproduce the error when training on a CPU. I don't recommend doing that, but I'll try to follow up on it.

@ilikeokoge
Author

Thanks for your kindness.
I have a GPU, so I think I am using it.
Is the default setting for GPU? I did not change any settings.
How do I configure it to use the GPU?

@raphaelmemmesheimer
Owner

The GPU is used if available. The requirements.txt contains dependencies for GPU. I guess your CUDA or driver setup is broken in some way. The following snippet should return True:

>>> import torch
>>> torch.cuda.is_available()
True

before you start training. Maybe this helps:

conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
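
The GPU check above can also be wrapped in a small function that reports what PyTorch sees before a long training run. This is a minimal sketch: the function name describe_device is hypothetical, and it only uses torch.cuda.is_available(), torch.cuda.get_device_name() and torch.version.cuda.

```python
def describe_device() -> str:
    """Summarise whether PyTorch can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version this PyTorch build was compiled against.
        return f"cuda: {torch.cuda.get_device_name(0)} (CUDA {torch.version.cuda})"
    return "cpu only: torch.cuda.is_available() returned False"


print(describe_device())
```

If the result starts with "cpu only" on a machine that has a GPU, the installed PyTorch build and the CUDA driver probably do not match.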

@ilikeokoge
Author

I installed cudatoolkit=10.1 and used your colleague's setup.
As a result, the IndexError no longer appears.
But I ran into another error.
image

The GPU is OK now.
image

Do you have any suggestions?

@ilikeokoge
Author

I can train now!
The error was caused by the PyTorch version (1.6).
I downgraded PyTorch to 1.5 and torchvision to 0.6.0, and it works.
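
Since the working combination here was PyTorch 1.5 with torchvision 0.6.0, a version check at startup could catch a mismatch early. The following is a minimal sketch with hypothetical helper names (release_prefix, matches_pin); it assumes plain numeric version strings, optionally followed by a local suffix such as "+cu101".

```python
from typing import Tuple


def release_prefix(version: str, parts: int) -> Tuple[int, ...]:
    """Return the first `parts` numeric components, ignoring a local suffix like '+cu101'."""
    public = version.split("+", 1)[0]
    return tuple(int(p) for p in public.split(".")[:parts])


def matches_pin(installed: str, pin: str) -> bool:
    """True if `installed` starts with the same numeric components as `pin`."""
    n = len(pin.split("."))
    return release_prefix(installed, n) == release_prefix(pin, n)


print(matches_pin("1.5.0", "1.5"))          # True
print(matches_pin("1.6.0", "1.5"))          # False
print(matches_pin("0.6.0+cu101", "0.6.0"))  # True
```

In practice the first argument would be torch.__version__ or torchvision.__version__.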
