unable to train in docker #1

wondergo2017 · 2020-12-12T14:20:59Z

I'm very interested in your work, but I ran into a problem when running your code.

I pulled the Docker image haotang95/itergnn:main and ran this command inside the container:
python train.py lobster-unweighted DecIterGNN30-HomoPathSimConv-Max

but I got a RuntimeError:

root@d21db24d6e17:/workspace# python train.py lobster-unweighted DecIterGNN30-HomoPathSimConv-Max
Saving logs to /workspace/record/ShortestPathLen/lobster_unweighted/4_30/100000/DecIterGNN_30/PathSimConv_homo/Max/1/Graph/logs
2020-12-12 14:13:29,919 root INFO: DatasetParam: {'dataset_name': 'ShortestPathLen', 'size': 100000, 'min_num_node': 4, 'num_num_node': 30, 'sparsity': 0.5, 'k': 8, 'dim': 2, 'lobster_prob': (0.2, 0.2), 'index_generator': 'lobster', 'min_edge_distance': 1.0, 'max_edge_distance': 1.0, 'device': device(type='cpu')}
2020-12-12 14:13:29,919 root INFO: ModelParam: {'embedding_layer_num': 2, 'architecture_name': 'DecIterGNN', 'layer_num': 30, 'module_num': 1, 'layer_name': 'PathSimConv', 'hidden_size': 64, 'input_feat_flag': True, 'homogeneous_flag': 1, 'readout_name': 'Max', 'confidence_layer_num': 1, 'head_layer_num': 1, 'model_type': 'Graph'}
2020-12-12 14:13:29,919 root INFO: GeneralParam: {'learning_rate': 0.001, 'epoch_num': 200, 'batch_size': 32, 'save_freq': 20, 'log_freq': 100, 'resume_flag': False, 'running_metric_name_list': ['relative_loss', 'mse_loss']}
2020-12-12 14:13:29,919 root INFO: cpu
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/tarfile.py", line 2297, in next
tarinfo = self.tarinfo.fromtarfile(self)
File "/opt/conda/lib/python3.6/tarfile.py", line 1093, in fromtarfile
obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
File "/opt/conda/lib/python3.6/tarfile.py", line 1031, in frombuf
raise TruncatedHeaderError("truncated header")
tarfile.TruncatedHeaderError: truncated header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 556, in _load
return legacy_load(f)
File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 467, in legacy_load
with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar,
File "/opt/conda/lib/python3.6/tarfile.py", line 1589, in open
return func(name, filemode, fileobj, **kwargs)
File "/opt/conda/lib/python3.6/tarfile.py", line 1619, in taropen
return cls(name, mode, fileobj, **kwargs)
File "/opt/conda/lib/python3.6/tarfile.py", line 1482, in __init__
self.firstmember = self.next()
File "/opt/conda/lib/python3.6/tarfile.py", line 2315, in next
raise ReadError(str(e))
tarfile.ReadError: truncated header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train.py", line 29, in <module>
main(arg)
File "train.py", line 23, in main
train_model(dataset_param, model_param, general_param)
File "/workspace/main/train.py", line 62, in train_model
train_dataset = param2dataset(dataset_param, train_flag=True)
File "/workspace/main/parameters.py", line 68, in param2dataset
return globals()[param.dataset_name+'Dataset'](train_flag=train_flag, **other_param)
File "/workspace/dataset/base.py", line 29, in __init__
pre_transform=None, pre_filter=pre_filter)
File "/opt/conda/lib/python3.6/site-packages/torch_geometric/data/dataset.py", line 88, in __init__
if osp.exists(path) and torch.load(path) != repr(pre_transform):
File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 387, in load
return _load(f, map_location, pickle_module, **pickle_load_args)
File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 560, in _load
raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
RuntimeError: /workspace/dataset/FAKEDataset/processed/pre_transform.pt is a zip archive (did you mean to use torch.jit.load()?)
root@d21db24d6e17:/workspace# ls

Something must be wrong. Can you help me? Thank you for your attention!

silent567 · 2020-12-12T15:54:51Z

Hi, thanks for your interest in our work! However, I will be quite busy with the application in the next few days and may not have enough time to check the details.

I briefly went through your error messages, and it seems that something is wrong with the datasets, which were quite stable in my experiments. A quick check is to run rm -r dataset/FAKEDataset, then mkdir dataset/FAKEDataset, and finally rerun the corresponding training command. If the error persists, could you please provide a way to reproduce it, along with your system details? I will look into it as soon as possible (maybe after Dec. 16th). Thanks!
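For anyone who prefers to do the reset from Python, here is a minimal sketch of the same steps. The helper names reset_dataset_cache and is_zip_checkpoint are hypothetical and not part of this repository. The zip check rests on an assumption this thread does not confirm: PyTorch 1.6 and later write a zip-based file from torch.save by default, and older versions fail on such files with the "is a zip archive" message, so a leftover pre_transform.pt saved by a newer PyTorch could be the cause.

```python
import shutil
import zipfile
from pathlib import Path


def is_zip_checkpoint(path):
    """Return True if the file at `path` is a zip archive.

    Assumption: a zip-format .pt file was written by a newer PyTorch
    than the one trying to load it.
    """
    return zipfile.is_zipfile(path)


def reset_dataset_cache(cache_dir):
    """Delete `cache_dir` and everything in it, then recreate it empty."""
    cache_dir = Path(cache_dir)
    if cache_dir.exists():
        shutil.rmtree(cache_dir)       # same effect as `rm -r`
    cache_dir.mkdir(parents=True)      # same effect as `mkdir`
    return cache_dir
```

Run from /workspace, this would be called as reset_dataset_cache("dataset/FAKEDataset") before rerunning the training command, so that the dataset files are regenerated by the installed PyTorch version.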

wondergo2017 · 2020-12-13T02:41:18Z

It works, thank you!

wondergo2017 closed this as completed Dec 13, 2020
