Not able to train custom data. #48

sanjaygunda13 · 2022-07-11T16:00:53Z

I am trying to train on a custom dataset of roughly 3,000 images on a Google Colab GPU, and training fails with the GPU error below. When I give it only 50 images, it works fine. I don't believe this is a GPU capacity issue, since training on COCO, which has more than 10,000 images, works. I also checked the format of the data and the images, and both look consistent with the COCO data format. Any leads on this are appreciated.

Downloading: 100% 0.99M/0.99M [00:00<00:00, 7.93MB/s]
Downloading: 100% 446k/446k [00:00<00:00, 5.78MB/s]
Downloading: 100% 665/665 [00:00<00:00, 1.13MB/s]
Data size is 3060
Token indices sequence length is longer than the specified maximum sequence length for this model (1140 > 1024). Running this sequence through the model will result in indexing errors
Downloading: 100% 523M/523M [00:07<00:00, 69.9MB/s]
Train both prefix and GPT
/usr/local/lib/python3.7/dist-packages/transformers/optimization.py:310: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
FutureWarning,

Training epoch 0
coco_prefix: 0% 0/76 [00:00<?, ?it/s]Traceback (most recent call last):
File "/content/gdrive/MyDrive/CLIP_prefix_caption/train.py", line 370, in <module>
main()
File "/content/gdrive/MyDrive/CLIP_prefix_caption/train.py", line 366, in main
train(dataset, model, args, output_dir=args.out_dir, output_prefix=args.prefix)
File "/content/gdrive/MyDrive/CLIP_prefix_caption/train.py", line 314, in train
outputs = model(tokens, prefix, mask)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/gdrive/MyDrive/CLIP_prefix_caption/train.py", line 233, in forward
out = self.gpt(inputs_embeds=embedding_cat, labels=labels, attention_mask=mask)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/transformers/models/gpt2/modeling_gpt2.py", line 1061, in forward
return_dict=return_dict,
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/transformers/models/gpt2/modeling_gpt2.py", line 899, in forward
output_attentions=output_attentions,
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/transformers/models/gpt2/modeling_gpt2.py", line 397, in forward
output_attentions=output_attentions,
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/transformers/models/gpt2/modeling_gpt2.py", line 332, in forward
attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_mask)
File "/usr/local/lib/python3.7/dist-packages/transformers/models/gpt2/modeling_gpt2.py", line 189, in _attn
attn_weights = torch.matmul(query, key.transpose(-1, -2))
RuntimeError: CUDA out of memory. Tried to allocate 430.00 MiB (GPU 0; 14.76 GiB total capacity; 13.54 GiB already allocated; 145.75 MiB free; 13.69 GiB reserved in total by PyTorch)
coco_prefix: 0% 0/76 [00:00<?, ?it/s]
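A rough way to see why the number of images alone does not decide whether training fits in memory: per step, memory scales with the batch size and the padded sequence length, and the attention-score tensor grows quadratically with that length. The sketch below is a back-of-the-envelope estimate under stated assumptions: fp32 activations, 12 attention heads as in GPT-2 small, and a batch size of 40, which is only inferred from the log (3060 samples with 76 steps per epoch is consistent with `3060 // 40 == 76`). The function names are hypothetical and it counts only one score tensor per layer.

```python
def attention_scores_bytes(batch_size: int, seq_len: int,
                           n_heads: int = 12, bytes_per_elem: int = 4) -> int:
    """Bytes for one attention-score tensor of shape (batch, heads, seq, seq)."""
    return batch_size * n_heads * seq_len * seq_len * bytes_per_elem


def mib(n_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return n_bytes / 2**20


if __name__ == "__main__":
    # Assumed settings: batch size 40 (inferred from 3060 // 40 == 76 steps),
    # 12 heads as in GPT-2 small, fp32 (4 bytes per element).
    for seq_len in (32, 128, 512, 1150):
        size = mib(attention_scores_bytes(batch_size=40, seq_len=seq_len))
        print(f"seq_len={seq_len:5d}: {size:9.1f} MiB per layer (scores only)")
```

The failing allocation in the traceback is inside the attention `matmul`, which fits this picture. The absolute numbers are illustrative only: the real footprint also includes softmax outputs, activations kept for the backward pass across every layer, optimizer state, and the model weights. Padding batches to about 1150 tokens instead of about 100 multiplies this term by more than 100.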


rongtongxueya · 2023-07-14T06:27:40Z

I have a question. I tried to use the COCO JSON downloaded from the Internet, but it did not work because its structure is different from the JSON the code expects. I would like to know what that JSON file looks like, and why each image seems to have only one caption. How do you create the JSON file for the dataset you need?
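The thread never shows the expected JSON, so the following is an assumption to check against whatever the repository's preprocessing script reads: a flat list with one record per caption, where each record carries `image_id`, `id`, and `caption`. Under that layout an image with five captions appears in five records, which would explain why each entry seems to have only one caption. The official COCO captions files store captions under the `annotations` key with exactly those three fields, so flattening them is short. The function name below is hypothetical, and the tiny dictionary stands in for a real file such as `captions_train2014.json`.

```python
import json
from collections import Counter


def coco_to_caption_records(coco: dict) -> list:
    """Flatten an official COCO captions file into one record per caption.

    Assumption: the training pipeline expects a JSON list of
    {"image_id": ..., "id": ..., "caption": ...} records, so an image
    with several captions appears in several separate records.
    """
    return [
        {"image_id": ann["image_id"], "id": ann["id"], "caption": ann["caption"]}
        for ann in coco["annotations"]
    ]


if __name__ == "__main__":
    # Tiny stand-in for a real COCO captions file (same keys, made-up values).
    coco = {
        "images": [{"id": 1, "file_name": "example.jpg"}],
        "annotations": [
            {"image_id": 1, "id": 10, "caption": "A dog runs on grass."},
            {"image_id": 1, "id": 11, "caption": "A brown dog playing outside."},
        ],
    }
    records = coco_to_caption_records(coco)
    print(Counter(r["image_id"] for r in records))  # Counter({1: 2})
    print(json.dumps(records, indent=2))
```

For a custom dataset, the same records can be built from your own annotation source and written out with `json.dump`; the filename and location the training code expects are not shown in this thread.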

Junyiliu0 · 2023-10-25T08:42:17Z

Did you solve it?

sanjaygunda13 · 2023-10-25T17:24:43Z

I stopped working on it long ago and was not able to identify the issue. Thanks, Sanjay
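The thread ends without a diagnosis. One lead worth checking, offered only as a hypothesis: the log warns that a caption tokenizes to 1140 tokens, more than GPT-2's 1024 positions, and a 50-image subset may simply not contain that caption. If the dataset code pads every caption to a length derived from caption-length statistics (the rule used below, `min(int(mean + 10 * std), max)`, is an assumption to verify in `train.py`), a single outlier can raise that padded length for every batch, and attention memory grows quadratically with it. The sketch accepts any tokenizer function; `str.split` stands in so it runs without downloads, and in a real run something like `GPT2Tokenizer.from_pretrained("gpt2").encode` would take its place. The function name and report keys are hypothetical.

```python
import statistics
from typing import Callable, List, Sequence


def caption_length_report(captions: Sequence[str],
                          tokenize: Callable[[str], List],
                          max_positions: int = 1024,
                          top_k: int = 5) -> dict:
    """Summarize caption token lengths and flag captions that are too long."""
    lengths = [len(tokenize(c)) for c in captions]
    mean = statistics.mean(lengths)
    # Sample standard deviation (n - 1), like torch.Tensor.std's default.
    std = statistics.stdev(lengths) if len(lengths) > 1 else 0.0
    # Assumed padding rule: mean + 10 * std, capped at the longest caption.
    pad_len = min(int(mean + 10 * std), max(lengths))
    longest = sorted(range(len(lengths)), key=lengths.__getitem__, reverse=True)
    return {
        "mean": mean,
        "std": std,
        "assumed_pad_len": pad_len,
        "too_long": [i for i, n in enumerate(lengths) if n > max_positions],
        "longest": [(i, lengths[i]) for i in longest[:top_k]],
    }


if __name__ == "__main__":
    # 99 short captions plus one pathological 1140-token caption.
    captions = ["a dog on grass"] * 99 + ["word " * 1140]
    report = caption_length_report(captions, str.split)
    print("assumed pad length:", report["assumed_pad_len"])
    print("captions over 1024 tokens:", report["too_long"])
    print("longest (index, tokens):", report["longest"][:3])
```

In this toy input the one outlier pushes the assumed padded length all the way to 1140, even though every other caption has 4 tokens. If a real dataset shows the same pattern, dropping or truncating the flagged captions before preprocessing, or reducing the batch size, would be the first things to try.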

