unable to train in docker #1

Closed
wondergo2017 opened this issue Dec 12, 2020 · 2 comments

@wondergo2017

I'm very interested in your work, but I ran into a problem when running your code.

I pulled the Docker image haotang95/itergnn:main and ran the following command inside the container:
python train.py lobster-unweighted DecIterGNN30-HomoPathSimConv-Max

It fails with a RuntimeError:

root@d21db24d6e17:/workspace# python train.py lobster-unweighted DecIterGNN30-HomoPathSimConv-Max
Saving logs to /workspace/record/ShortestPathLen/lobster_unweighted/4_30/100000/DecIterGNN_30/PathSimConv_homo/Max/1/Graph/logs
2020-12-12 14:13:29,919 root INFO: DatasetParam: {'dataset_name': 'ShortestPathLen', 'size': 100000, 'min_num_node': 4, 'num_num_node': 30, 'sparsity': 0.5, 'k': 8, 'dim': 2, 'lobster_prob': (0.2, 0.2), 'index_generator': 'lobster', 'min_edge_distance': 1.0, 'max_edge_distance': 1.0, 'device': device(type='cpu')}
2020-12-12 14:13:29,919 root INFO: ModelParam: {'embedding_layer_num': 2, 'architecture_name': 'DecIterGNN', 'layer_num': 30, 'module_num': 1, 'layer_name': 'PathSimConv', 'hidden_size': 64, 'input_feat_flag': True, 'homogeneous_flag': 1, 'readout_name': 'Max', 'confidence_layer_num': 1, 'head_layer_num': 1, 'model_type': 'Graph'}
2020-12-12 14:13:29,919 root INFO: GeneralParam: {'learning_rate': 0.001, 'epoch_num': 200, 'batch_size': 32, 'save_freq': 20, 'log_freq': 100, 'resume_flag': False, 'running_metric_name_list': ['relative_loss', 'mse_loss']}
2020-12-12 14:13:29,919 root INFO: cpu
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/tarfile.py", line 2297, in next
tarinfo = self.tarinfo.fromtarfile(self)
File "/opt/conda/lib/python3.6/tarfile.py", line 1093, in fromtarfile
obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
File "/opt/conda/lib/python3.6/tarfile.py", line 1031, in frombuf
raise TruncatedHeaderError("truncated header")
tarfile.TruncatedHeaderError: truncated header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 556, in _load
return legacy_load(f)
File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 467, in legacy_load
with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar,
File "/opt/conda/lib/python3.6/tarfile.py", line 1589, in open
return func(name, filemode, fileobj, **kwargs)
File "/opt/conda/lib/python3.6/tarfile.py", line 1619, in taropen
return cls(name, mode, fileobj, **kwargs)
File "/opt/conda/lib/python3.6/tarfile.py", line 1482, in __init__
self.firstmember = self.next()
File "/opt/conda/lib/python3.6/tarfile.py", line 2315, in next
raise ReadError(str(e))
tarfile.ReadError: truncated header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train.py", line 29, in <module>
main(arg)
File "train.py", line 23, in main
train_model(dataset_param, model_param, general_param)
File "/workspace/main/train.py", line 62, in train_model
train_dataset = param2dataset(dataset_param, train_flag=True)
File "/workspace/main/parameters.py", line 68, in param2dataset
return globals()[param.dataset_name+'Dataset'](train_flag=train_flag, **other_param)
File "/workspace/dataset/base.py", line 29, in __init__
pre_transform=None, pre_filter=pre_filter)
File "/opt/conda/lib/python3.6/site-packages/torch_geometric/data/dataset.py", line 88, in __init__
if osp.exists(path) and torch.load(path) != repr(pre_transform):
File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 387, in load
return _load(f, map_location, pickle_module, **pickle_load_args)
File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 560, in _load
raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
RuntimeError: /workspace/dataset/FAKEDataset/processed/pre_transform.pt is a zip archive (did you mean to use torch.jit.load()?)
root@d21db24d6e17:/workspace# ls

Something must be wrong here. Could you help me? Thank you for your attention!
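
The final RuntimeError reports that pre_transform.pt is a zip archive. PyTorch 1.6 and later save files in a zip-based format by default, and older versions raise this error when asked to load such a file. A version mismatch between whatever wrote the file and the PyTorch inside the container is therefore one possible cause; this is an assumption, since the thread does not confirm it. A minimal sketch for checking the file's format with the standard library alone (the helper name describe_checkpoint is hypothetical, not part of the repository):

```python
import zipfile
from pathlib import Path


def describe_checkpoint(path):
    """Classify a saved file as 'missing', 'zip', or 'other'.

    Hypothetical helper for illustration; it only inspects the file
    format and never unpickles or loads the contents.
    """
    p = Path(path)
    if not p.exists():
        return "missing"
    # is_zipfile checks for a valid zip end-of-archive record.
    if zipfile.is_zipfile(str(p)):
        return "zip"
    return "other"


# Example call, using the path taken from the error message above:
# describe_checkpoint("/workspace/dataset/FAKEDataset/processed/pre_transform.pt")
```

If the result is "zip", the file was most likely written in the newer zip-based format, and regenerating it inside the container is a reasonable next step.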

@silent567
Collaborator

Hi, thanks for your interest in our work! However, I will be quite busy with the application in the next few days and may not have enough time to check the details.

I have briefly gone through your error messages, and it seems that something is wrong with the dataset files, which were quite stable in my experiments. As a quick check, run rm -r dataset/FAKEDataset, then mkdir dataset/FAKEDataset, and finally rerun the training command. If the error persists, could you please provide steps to reproduce it, along with your system's details? I will look into it as soon as possible (probably after Dec. 16th). Thanks!
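
The reset described above can also be scripted. The following is a minimal sketch of the same two steps in Python, assuming it is run from /workspace so that the relative path resolves as in the shell commands; the function name reset_dataset_dir is hypothetical and not part of the repository. Unlike rm -r, it does not fail when the directory is already absent.

```python
import shutil
from pathlib import Path


def reset_dataset_dir(path="dataset/FAKEDataset"):
    """Delete the dataset directory, if present, and recreate it empty.

    Mirrors `rm -r dataset/FAKEDataset` followed by
    `mkdir dataset/FAKEDataset`. Hypothetical helper for illustration.
    """
    target = Path(path)
    if target.exists():
        shutil.rmtree(str(target))  # removes processed/ and any other contents
    target.mkdir(parents=True)      # parents=True also creates dataset/ if needed
    return target


# Example call from /workspace, before rerunning the training command:
# reset_dataset_dir()
```

After the reset, rerunning the training command should regenerate the dataset files inside the container.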

@wondergo2017
Author

It works, thank you!
