IndexError: invalid index of a 0-dim tensor. Use tensor.item() in Python or tensor.item() in C++ to convert a 0-dim tensor to a number #1010

Open
kiki88104 opened this issue Aug 17, 2023 · 8 comments

@kiki88104

How can I solve this error?

File "c:\Users\User.virtualenvs\testNewTransformer-s5V1e_yt\lib\site-packages\skorch\utils.py", line 264, in _indexing_other
return data[i]

IndexError: invalid index of a 0-dim tensor. Use tensor.item() in Python or tensor.item<T>() in C++ to convert a 0-dim tensor to a number

torch==2.0.0+cu118
torchvision==0.15.0+cu118
skorch==0.14.0
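
For readers unfamiliar with the message: PyTorch raises it whenever a 0-dim (scalar) tensor is indexed. The following is a minimal sketch of that behaviour on its own, not a diagnosis of the code in this issue; the variable names are illustrative only.

```python
import torch

# A 0-dim (scalar) tensor has an empty shape and cannot be indexed.
scalar = torch.tensor(3)

try:
    scalar[0]
    message = None
except IndexError as exc:
    message = str(exc)

print(scalar.dim(), message)

# .item() converts the 0-dim tensor to a plain Python number.
value = scalar.item()

# Indexing a 1-dim tensor with an integer yields a 0-dim tensor,
# so indexing that result a second time would raise the same error.
labels = torch.tensor([0, 1, 2])
one_label = labels[1]
print(one_label.dim())
```

This suggests one thing to look for when debugging: some object reaching skorch's indexing code may already be a scalar tensor rather than a sequence of samples.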

@BenjaminBossan
Collaborator

Could you please give a lot more context? Otherwise we cannot answer the question. Ideally, you would provide a code sample that allows us to reproduce the error. If that is not possible, it would help if you could describe the data you're using (types, dtypes, shapes, etc.).

@kiki88104
Author

The data is a PyTorch TensorDataset.
It contains input_ids_array and label_id_array.
Both are produced with the Hugging Face AutoTokenizer, because the text has to be encoded for a Hugging Face language model.

@kiki88104
Author

kiki88104 commented Aug 17, 2023

import math

import torch
from torch.utils.data import TensorDataset


def generate_data_set(input_examples, label_masks, label_map, do_shuffle=False, balance_label_examples=False):
    '''
    Generate a TensorDataset from the input examples, which are masked if they
    are to be considered NOT labeled.

    Note: `tokenizer` and `max_seq_length` are globals defined elsewhere, and
    `do_shuffle` is accepted but not used.
    '''
    examples = []

    # Compute the proportion of labeled examples among all examples
    num_labeled_examples = 0
    for label_mask in label_masks:
        if label_mask:
            num_labeled_examples += 1
    label_mask_rate = num_labeled_examples / len(input_examples)

    # Apply the balancing if required
    for index, ex in enumerate(input_examples):
        if label_mask_rate == 1 or not balance_label_examples:
            examples.append((ex, label_masks[index]))
        else:
            # Replicate each labeled example to simulate more labeled data
            if label_masks[index]:
                balance = int(1 / label_mask_rate)
                balance = int(math.log(balance, 2))
                if balance < 1:
                    balance = 1
                for b in range(0, int(balance)):
                    examples.append((ex, label_masks[index]))
            else:
                examples.append((ex, label_masks[index]))

    # -----------------------------------------------
    # Generate input examples to the Transformer
    # -----------------------------------------------
    input_ids = []
    input_mask_array = []
    label_mask_array = []
    label_id_array = []

    # Tokenization: each example is a (text, label) pair
    for (text, label_mask) in examples:
        encoded_sent = tokenizer.encode(
            text[0], add_special_tokens=True, max_length=max_seq_length, padding="max_length", truncation=True)
        input_ids.append(encoded_sent)
        label_id_array.append(label_map[text[1]])
        label_mask_array.append(label_mask)

    # Attention mask (to ignore padded input wordpieces); assumes padding id 0
    for sent in input_ids:
        att_mask = [int(token_id > 0) for token_id in sent]
        input_mask_array.append(att_mask)

    # Conversion to tensors
    input_ids = torch.tensor(input_ids)
    input_mask_array = torch.tensor(input_mask_array)
    label_id_array = torch.tensor(label_id_array, dtype=torch.long)
    label_mask_array = torch.tensor(label_mask_array)

    # Build the TensorDataset; note that input_mask_array and label_mask_array
    # are computed above but not included in the returned dataset
    dataset = TensorDataset(input_ids, label_id_array)
    return dataset
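
Two pieces of the function above can be checked in isolation without a tokenizer: the replication factor used by the balancing branch, and the attention mask derived from the token ids. The sketch below restates them as standalone helpers; the names `replication_factor` and `attention_mask` and the sample token ids are hypothetical and do not appear in the original code, and the mask assumes, as the original does, that the padding id is 0.

```python
import math


def replication_factor(label_mask_rate):
    """Number of copies the balancing branch appends per labeled example."""
    balance = int(1 / label_mask_rate)
    balance = int(math.log(balance, 2))
    return max(balance, 1)


def attention_mask(token_ids):
    """1 for real tokens, 0 for padding, assuming the padding id is 0."""
    return [int(token_id > 0) for token_id in token_ids]


# Replication factor for a few labeled-data rates
for rate in (0.5, 0.1, 0.01):
    print(rate, replication_factor(rate))

# Mask for a padded sequence (sample ids are made up for illustration)
print(attention_mask([101, 2023, 102, 0, 0]))
```

With half of the examples labeled the factor is 1, so no replication happens; replication only starts once the labeled share drops far enough for the base-2 logarithm to reach 2.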

@kiki88104
Author

kiki88104 commented Aug 17, 2023

generator_find_param = NeuralNetClassifier(
    module=Generator1,
    criterion=torch.nn.CrossEntropyLoss(ignore_index=-1),
    train_split=None,
    max_epochs=10,
    batch_size=32,
    module__noise_size=100,
    module__output_size=hidden_size,
    module__hidden_sizes=hidden_levels_g,
    module__dropout_rate=out_dropout_rate,
)
kf = KFold(n_splits=5, shuffle=True, random_state=42)
generator_param_grid = {
    'module__noise_size': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
    'module__dropout_rate': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
}
grid_searchCV = GridSearchCV(
    generator_find_param, generator_param_grid, cv=kf, scoring='accuracy')
grid_searchCV.fit(x_train_dataset, y_train_dataset)  # error is raised here
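
As a side note on the size of this search, the grid above can be enumerated in plain Python to see how many fits the cross-validation implies. This is only a rough sketch: the variable names are illustrative, and it counts the cross-validation fits only, not any final refit on the full data.

```python
from itertools import product

# Same grid as in the snippet above
param_grid = {
    "module__noise_size": [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
    "module__dropout_rate": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
}
n_splits = 5  # KFold(n_splits=5)

# One candidate per combination of parameter values
candidates = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]
n_fits = len(candidates) * n_splits

print(len(candidates), n_fits)
```

While the error is being debugged, shrinking the grid to one or two values per parameter keeps each attempt short.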

@kiki88104
Author

Error Message
File c:\Users\User.virtualenvs\testNewTransformer-s5V1e_yt\lib\site-packages\sklearn\base.py:1151, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
1144 estimator._validate_params()
1146 with config_context(
1147 skip_parameter_validation=(
1148 prefer_skip_nested_validation or global_skip_validation
1149 )
1150 ):
-> 1151 return fit_method(estimator, *args, **kwargs)

File c:\Users\User.virtualenvs\testNewTransformer-s5V1e_yt\lib\site-packages\sklearn\model_selection\_search.py:898, in BaseSearchCV.fit(self, X, y, groups, **fit_params)
892 results = self._format_results(
893 all_candidate_params, n_splits, all_out, all_more_results
894 )
896 return results
--> 898 self._run_search(evaluate_candidates)
...
return indexing(data, i)
File "c:\Users\User.virtualenvs\testNewTransformer-s5V1e_yt\lib\site-packages\skorch\utils.py", line 264, in _indexing_other
return data[i]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() in Python or tensor.item<T>() in C++ to convert a 0-dim tensor to a number

@kiki88104
Author

x_train_dataset = generate_data_set(labeled_examples, x_train_label_masks, label_map, do_shuffle=False, balance_label_examples=apply_balance)

@BenjaminBossan
Collaborator

It's still very hard to tell from your code what the exact issue is.

Could you print what the input_ids and label_id_array are, which you put into your TensorDataset?

Furthermore, did you know that we support using Hugging Face tokenizers directly through HuggingFacePretrainedTokenizer? Maybe this would be a better fit for your problem. Here is a complete notebook that showcases how to use it.
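
One way to gather the requested information is to print the dtype, shape, and dimensionality of each tensor stored in the TensorDataset. The sketch below uses small made-up tensors as stand-ins for the poster's data, and the helper `describe` is hypothetical; only `TensorDataset.tensors`, `dtype`, `shape`, and `dim()` are PyTorch's own.

```python
import torch
from torch.utils.data import TensorDataset

# Hypothetical stand-ins for the real input_ids and label_id_array
input_ids = torch.tensor([[101, 2023, 102, 0], [101, 2003, 102, 0]])
label_id_array = torch.tensor([0, 1], dtype=torch.long)
dataset = TensorDataset(input_ids, label_id_array)


def describe(name, tensor):
    """Return a one-line summary of a tensor's dtype, shape, and dimensionality."""
    return f"{name}: dtype={tensor.dtype}, shape={tuple(tensor.shape)}, dim={tensor.dim()}"


# TensorDataset keeps its tensors in the .tensors tuple, in constructor order
for name, tensor in zip(("input_ids", "label_id_array"), dataset.tensors):
    print(describe(name, tensor))

# Indexing the dataset returns one (input, label) tuple; the label is 0-dim
sample_input, sample_label = dataset[0]
print(len(dataset), tuple(sample_input.shape), sample_label.dim())
```

Posting this output for the real tensors, together with the type of whatever is passed as `y` to `fit`, would give the kind of context asked for above.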

@kiki88104
Author

OK, thanks for your help. I will take your advice; this is a good suggestion.
