This repository has been archived by the owner on Feb 16, 2022. It is now read-only.

i got "ConnectionResetError" #16

Closed
jaeyun95 opened this issue Mar 13, 2020 · 2 comments

Comments


jaeyun95 commented Mar 13, 2020

Hi,
I ran this code for the VCR task (training) and got the error below.

(vilbert-mt) ailab@ailab:~/vilbert-multi-task$ python train_tasks.py --bert_model bert-base-uncased --from_pretrained save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --tasks 5-6 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name multi_task_model
Failed to import tensorflow.
03/13/2020 13:52:59 - INFO - __main__ -   device: cuda n_gpu: 2, distributed training: False, 16-bits training: False
03/13/2020 13:53:00 - INFO - pytorch_transformers.tokenization_utils -   loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/ailab/.cache/torch/pytorch_transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
03/13/2020 13:53:00 - INFO - vilbert.task_utils -   Loading VCR_Q-A Dataset with batch size 16
03/13/2020 13:53:11 - INFO - vilbert.task_utils -   Loading VCR_QA-R Dataset with batch size 16
03/13/2020 13:53:25 - INFO - vilbert.utils -   logging file at: save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-multi_task_model/logs
03/13/2020 13:53:25 - INFO - vilbert.utils -   loading weights file save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin
03/13/2020 13:53:29 - INFO - vilbert.utils -   Weights of VILBertForVLTasks not initialized from pretrained model: ['bert.embeddings.task_embeddings.weight', 'vil_prediction.logit_fc.0.weight', 'vil_prediction.logit_fc.0.bias', 'vil_prediction.logit_fc.2.weight', 'vil_prediction.logit_fc.2.bias', 'vil_prediction.logit_fc.3.weight', 'vil_prediction.logit_fc.3.bias', 'vil_prediction_gqa.logit_fc.0.weight', 'vil_prediction_gqa.logit_fc.0.bias', 'vil_prediction_gqa.logit_fc.2.weight', 'vil_prediction_gqa.logit_fc.2.bias', 'vil_prediction_gqa.logit_fc.3.weight', 'vil_prediction_gqa.logit_fc.3.bias', 'vil_binary_prediction.logit_fc.0.weight', 'vil_binary_prediction.logit_fc.0.bias', 'vil_binary_prediction.logit_fc.2.weight', 'vil_binary_prediction.logit_fc.2.bias', 'vil_binary_prediction.logit_fc.3.weight', 'vil_binary_prediction.logit_fc.3.bias', 'vil_tri_prediction.weight', 'vil_tri_prediction.bias']
03/13/2020 13:53:29 - INFO - vilbert.utils -   Weights from pretrained model not used in VILBertForVLTasks: ['vil_prediction.main.0.bias', 'vil_prediction.main.0.weight_g', 'vil_prediction.main.0.weight_v', 'vil_prediction.main.3.bias', 'vil_prediction.main.3.weight_g', 'vil_prediction.main.3.weight_v']
559 559
***** Running training *****
  Num Iters:  {'TASK5': 2039, 'TASK6': 2039}
  Batch size:  {'TASK5': 16, 'TASK6': 16}
  Num steps: 40780
Epoch:   0%|                                             | 0/20 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "train_tasks.py", line 670, in <module>
    main()
  File "train_tasks.py", line 529, in main
    task_losses,
  File "/home/ailab/vilbert-multi-task/vilbert/task_utils.py", line 200, in ForwardModelsTrain
    if task_cfg[task_id]["process"] in ["dialog"]:
KeyError: 'process'
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/utils/data/_utils/pin_memory.py", line 21, in _pin_memory_loop
    r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 276, in rebuild_storage_fd
    fd = df.detach()
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/connection.py", line 493, in Client
    answer_challenge(c, authkey)
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/connection.py", line 737, in answer_challenge
    response = connection.recv_bytes(256)        # reject large message
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer

What should I do? Help, please. T^T
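For reference, the ConnectionResetError at the bottom is most likely just fallout from the DataLoader workers being torn down after the main process crashed on the KeyError: 'process' raised in vilbert/task_utils.py, line 200. Below is a minimal, hypothetical sketch of that failing lookup and one defensive workaround, assuming the task config loaded for TASK5/TASK6 simply has no "process" entry; the entries and the "normal" fallback value are assumptions for illustration, not the repo's actual settings.

# Hypothetical, simplified reconstruction of the lookup that fails in
# ForwardModelsTrain (vilbert/task_utils.py, line 200). task_cfg stands in
# for the dict loaded from the task config YAML; the values below are
# illustrative, not copied from the repo.
task_cfg = {"TASK5": {"name": "VCR_Q-A", "batch_size": 16}}  # note: no "process" key
task_id = "TASK5"

# Original form: task_cfg[task_id]["process"] raises KeyError when the key is missing.
# A guarded lookup avoids the crash; "normal" is an assumed fallback value.
process_mode = task_cfg[task_id].get("process", "normal")
if process_mode in ["dialog"]:
    print("dialog-style processing")
else:
    print("standard processing")

Adding the missing "process" entry to the task config for TASK5/TASK6 would be the configuration-side equivalent of the same guard.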

@jaeyun95
Author

I solved it!


gw16 commented Jul 23, 2021

i solved it!

Hey, I got the same error. How did you solve it?
