Error while reproducing the Colab Example #1

Closed
zwang77 opened this issue Jan 17, 2021 · 3 comments

Comments

@zwang77

zwang77 commented Jan 17, 2021

I was trying to reproduce the example notebook
https://colab.research.google.com/github/georgianpartners/Multimodal-Toolkit/blob/master/notebooks/text_w_tabular_classification.ipynb#scrollTo=ABT1hK9cRsuk
and got the following error:


RuntimeError Traceback (most recent call last)
in

~/anaconda3/lib/python3.7/site-packages/transformers/trainer.py in train(self, model_path, trial)
750
751 if self.args.evaluate_during_training and self.global_step % self.args.eval_steps == 0:
--> 752 metrics = self.evaluate()
753 self._report_to_hp_search(trial, epoch, metrics)
754

~/anaconda3/lib/python3.7/site-packages/transformers/trainer.py in evaluate(self, eval_dataset)
1155 eval_dataloader = self.get_eval_dataloader(eval_dataset)
1156
-> 1157 output = self.prediction_loop(eval_dataloader, description="Evaluation")
1158
1159 self.log(output.metrics)

~/anaconda3/lib/python3.7/site-packages/transformers/trainer.py in prediction_loop(self, dataloader, description, prediction_loss_only)
1236 samples_count = 0
1237 for inputs in tqdm(dataloader, desc=description, disable=disable_tqdm):
-> 1238 loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only)
1239 batch_size = inputs[list(inputs.keys())[0]].shape[0]
1240 samples_count += batch_size

~/anaconda3/lib/python3.7/site-packages/transformers/trainer.py in prediction_step(self, model, inputs, prediction_loss_only)
1327
1328 with torch.no_grad():
-> 1329 outputs = model(**inputs)
1330 if has_labels:
1331 loss, logits = outputs[:2]

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py in forward(self, *inputs, **kwargs)
159 return self.module(*inputs[0], **kwargs[0])
160 replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
--> 161 outputs = self.parallel_apply(replicas, inputs, kwargs)
162 return self.gather(outputs, self.output_device)
163

~/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py in parallel_apply(self, replicas, inputs, kwargs)
169
170 def parallel_apply(self, replicas, inputs, kwargs):
--> 171 return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
172
173 def gather(self, outputs, output_device):

~/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py in parallel_apply(modules, inputs, kwargs_tup, devices)
84 output = results[i]
85 if isinstance(output, ExceptionWrapper):
---> 86 output.reraise()
87 outputs.append(output)
88 return outputs

~/anaconda3/lib/python3.7/site-packages/torch/_utils.py in reraise(self)
426 # have message field
427 raise self.exc_type(message=msg)
--> 428 raise self.exc_type(msg)
429
430

RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/multimodal_transformers/model/tabular_transformers.py", line 116, in forward
numerical_feats)
File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/multimodal_transformers/model/tabular_combiner.py", line 426, in forward
cat_feats = self.cat_layer(cat_feats)
File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/multimodal_transformers/model/layer_utils.py", line 52, in forward
layer_inputs.append(layer(input))
File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 93, in forward
return F.linear(input, self.weight, self.bias)
File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1690, in linear
ret = torch.addmm(bias, input, weight.t())
RuntimeError: mat1 dim 1 must match mat2 dim 0

I made the following modifications to the notebook, which I don't think would cause the error:

Change 1: Use load_data to create the train, val, and test datasets

# Get Datasets
train_dataset = load_data(
    df_train,
    data_args.column_info['text_cols'],
    tokenizer,
    label_col=data_args.column_info['label_col'],
    label_list=data_args.column_info['label_list'],
    categorical_cols=data_args.column_info['cat_cols'],
    numerical_cols=data_args.column_info['num_cols'],
    sep_text_token_str=tokenizer.sep_token,
)

val_dataset = load_data(
    df_val,
    data_args.column_info['text_cols'],
    tokenizer,
    label_col=data_args.column_info['label_col'],
    label_list=data_args.column_info['label_list'],
    categorical_cols=data_args.column_info['cat_cols'],
    numerical_cols=data_args.column_info['num_cols'],
    sep_text_token_str=tokenizer.sep_token,
)

test_dataset = load_data(
    df_test,
    data_args.column_info['text_cols'],
    tokenizer,
    label_col=data_args.column_info['label_col'],
    label_list=data_args.column_info['label_list'],
    categorical_cols=data_args.column_info['cat_cols'],
    numerical_cols=data_args.column_info['num_cols'],
    sep_text_token_str=tokenizer.sep_token,
)

Change 2: Didn't use cache_dir, since the Colab example doesn't specify it and the default is None

tokenizer = AutoTokenizer.from_pretrained(
    model_args.model_name_or_path
)

config = AutoConfig.from_pretrained(model_args.model_name_or_path)

@petulla

petulla commented Apr 29, 2021

I have this same issue when trying to use load_data() on separate test and train dataframes. Weirdly, one of the two `tabular_torch_dataset.TorchTextDataset` objects returned from load_data will train; the other will not.

I have to load the data from file and set everything up exactly as in the Colab to make it work.

@codeKgu Seems like you put a ton of work into this repo. Would be great to get this fixed.

@spencernelsonucla

@petulla and @mikiwz - I think I found the reason for this issue. In this line:
https://github.com/georgian-io/Multimodal-Toolkit/blob/master/multimodal_transformers/data/load_data.py#L228

The package concatenates the train, val, and test dfs. Then, if you're processing the categorical features via one-hot encoding, which is the default, it will one-hot encode all of those dfs together.

For example, say your train df has a categorical feature with values ["a", "b"]. On its own, this would get one-hot encoded as 2 separate columns (a and b). Now say your test data has values ["a", "c"]. With the way this is currently packaged, the train and test data are concatenated together first, so one-hot encoding produces 3 columns (a, b, and c). But if you load your test dataset separately, only "a" and "c" are one-hot encoded, resulting in 2 columns instead of 3. This is the issue: the model was trained on 3 columns, but you're giving it 2 columns to predict with.
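
To make the column-count mismatch concrete, here is a minimal pure-Python sketch of the example above. It does not use the package's own encoder; `one_hot_encode` is a hypothetical helper written only for illustration, and it assumes the encoder creates one column per distinct value it sees.

```python
def one_hot_encode(values):
    """One-hot encode a list of category values.

    Columns are derived only from the categories present in `values`,
    which is the behaviour that causes the mismatch described above.
    """
    columns = sorted(set(values))
    return [[1 if v == c else 0 for c in columns] for v in values]


train_cat = ["a", "b"]  # categories present in the train split
test_cat = ["a", "c"]   # categories present in the test split

# Splits concatenated before encoding: categories a, b, c give 3 columns.
joint = one_hot_encode(train_cat + test_cat)
joint_width = len(joint[0])

# Test split encoded on its own: categories a, c give 2 columns.
separate = one_hot_encode(test_cat)
separate_width = len(separate[0])

print(joint_width, separate_width)  # prints "3 2"
```

A layer sized for 3 categorical inputs then receives 2, which is consistent with the `mat1 dim 1 must match mat2 dim 0` error in the traceback.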

The way around this is to either not use categorical data, or use label encoding instead:

test_dataset_2 = load_data(
    test_data,
    data_args.column_info['text_cols'],
    tokenizer,
    label_col=data_args.column_info['label_col'],
    label_list=data_args.column_info['label_list'],
    categorical_cols=data_args.column_info['cat_cols'],
    numerical_cols=data_args.column_info['num_cols'],
    sep_text_token_str=tokenizer.sep_token,
    # label-encode the categorical columns instead of one-hot encoding them
    categorical_encode_type="label",
)
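
Another way to think about the fix, sketched here outside the package: collect the category list once across all splits, then encode every split against that same list so the widths always match. The helpers `fit_categories` and `encode_with` are hypothetical names for illustration only; they are not part of multimodal_transformers.

```python
def fit_categories(*splits):
    """Collect the distinct categories across every split, in sorted order."""
    seen = set()
    for split in splits:
        seen.update(split)
    return sorted(seen)


def encode_with(categories, values):
    """One-hot encode `values` against a fixed category list.

    A value that is missing from `categories` becomes an all-zero row.
    """
    return [[1 if v == c else 0 for c in categories] for v in values]


train_cat = ["a", "b"]
test_cat = ["a", "c"]

# Fit once on all splits, then reuse the same columns everywhere.
categories = fit_categories(train_cat, test_cat)
train_encoded = encode_with(categories, train_cat)
test_encoded = encode_with(categories, test_cat)

# Both splits now have the same number of columns.
print(len(train_encoded[0]), len(test_encoded[0]))  # prints "3 3"
```

This mirrors what happens when the splits are concatenated before encoding, as described above, but keeps the splits as separate objects.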

@akashsaravanan-georgian
Member

Closing as this has been answered.
