
Can't load dataset file for semisupervised TU #26

Open
Ripper346 opened this issue Jun 4, 2021 · 5 comments

@Ripper346
Hi,
I have the same problem as #4 (comment) and #1 when trying to launch semisupervised TU pre-training. I run python main.py --dataset MUTAG --aug1 random2 --aug2 random2 --lr 0.001 --suffix 0 --exp test and get this error:

[INFO] running single test..
-----
Total 1 experiments in this run:
1/1 - MUTAG - deg+odeg100+ak3+reall - ResGFN
Here we go..
-----
1/1 - MUTAG - deg+odeg100+ak3+reall - ResGFN
None None
Traceback (most recent call last):
  File "main.py", line 338, in <module>     
    run_exp_single_test()
  File "main.py", line 316, in run_exp_single_test
    run_exp_lib([('MUTAG', 'deg+odeg100+ak3+reall', 'ResGFN')])
  File "main.py", line 165, in run_exp_lib
    dataset = get_dataset(
  File "C:\Users\alessandro\Developments\GraphCL\semisupervised_TU\pre-training\datasets.py", line 57, in get_dataset
    dataset = TUDatasetExt(
  File "C:\Users\alessandro\Developments\GraphCL\semisupervised_TU\pre-training\tu_dataset.py", line 49, in __init__
    super(TUDatasetExt, self).__init__(root, name, transform, pre_transform,
  File "C:\_______\envs\torch\lib\site-packages\torch_geometric\datasets\tu_dataset.py", line 66, in __init__
    self.data, self.slices = torch.load(self.processed_paths[0])
  File "C:\_______\envs\torch\lib\site-packages\torch\serialization.py", line 579, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "C:\_______\envs\torch\lib\site-packages\torch\serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "C:\_______\envs\torch\lib\site-packages\torch\serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'data\\MUTAG\\MUTAG\\processed\\data_deg+odeg100+ak3+reall.pt'

I have installed everything and read the other two issues, but I can't work out what I need to do to make it run (if there is anything I can do). Everything is installed in a Python env.
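
For reference, here is a quick check of what the loader is looking for (the path is copied from the traceback above; this snippet is just for debugging, not part of the repo):

    import os
    import os.path as osp

    # Path reported in the FileNotFoundError above
    expected = osp.join('data', 'MUTAG', 'MUTAG', 'processed',
                        'data_deg+odeg100+ak3+reall.pt')
    print('exists:', osp.exists(expected))

    # See what the processing step actually produced, if anything
    processed_dir = osp.dirname(expected)
    if osp.isdir(processed_dir):
        print(os.listdir(processed_dir))
    else:
        print('processed dir missing:', processed_dir)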

@yyou1996
Collaborator

yyou1996 commented Jun 5, 2021

Hi @Ripper346,

Thanks for your interest, and apologies for the frustration. These are the solutions I can come up with:

  1. Would you mind sharing your env information so that I can double-check it? This experiment is built on an old repo (https://github.com/chentingpc/gfn#requirements) with slightly outdated packages, so you may well have installed the required ones, but there could be an oversight.

  2. I notice in the error information: FileNotFoundError: [Errno 2] No such file or directory: 'data\\MUTAG\\MUTAG\\processed\\data_deg+odeg100+ak3+reall.pt'. It looks odd to me that the program concatenates the path as 'data\\MUTAG\\MUTAG\\processed\\data_deg+odeg100+ak3+reall.pt' rather than 'data\MUTAG\MUTAG\processed\data_deg+odeg100+ak3+reall.pt'. Is there any way for you to debug this?

@Ripper346
Author

  1. I have Python 3.8.8 and it works fine with other projects that use torch and pytorch-geometric, but here are the requirements of my env (a bit long):
alembic==1.5.8
ase==3.21.1
astroid==2.5.1
async-generator==1.10
attrs==20.3.0
autopep8==1.5.5
backcall==0.2.0
bleach==3.3.0
certifi==2020.12.5
chardet==3.0.4
cliff==3.7.0
cmaes==0.8.2
cmd2==1.5.0
colorama==0.4.4
colorlog==4.8.0
control==0.8.4
cvxopt==1.2.6
cycler==0.10.0
Cython==0.29.22
decorator==4.4.2
defusedxml==0.7.0
dgl-cu110==0.6.0
entrypoints==0.3
future==0.18.2
googledrivedownloader==0.4
grakel==0.1.8
graphkit-learn==0.2.0.post1
greenlet==1.0.0
h5py==3.2.0
idna==2.10
ipdb==0.13.5
ipykernel==5.5.0
ipython==7.21.0
ipython-genutils==0.2.0
isodate==0.6.0
isort==5.7.0
jedi==0.18.0
Jinja2==2.11.3
joblib==1.0.1
jsonschema==3.2.0
jupyter-client==6.1.11
jupyter-core==4.7.1
jupyterlab-pygments==0.1.2
kiwisolver==1.3.1
lazy-object-proxy==1.5.2
llvmlite==0.35.0
Mako==1.1.4
mariadb==1.0.6
MarkupSafe==1.1.1
matplotlib==3.3.4
mccabe==0.6.1
mistune==0.8.4
Mosek==9.2.38
mysql-connector-python==8.0.23
nbclient==0.5.3
nbconvert==6.0.7
nbformat==5.1.2
nest-asyncio==1.5.1
networkx==2.5
nose==1.3.7
numba==0.52.0
numpy==1.20.1
optuna==2.7.0
packaging==20.9
pandas==1.2.3
pandocfilters==1.4.3
parso==0.8.1
pbr==5.5.1
pickleshare==0.7.5
Pillow==8.1.1
prettytable==2.1.0
prompt-toolkit==3.0.16
protobuf==3.15.4
pycodestyle==2.6.0
Pygments==2.8.0
pylint==2.7.2
pyparsing==2.4.7
pyperclip==1.8.2
pyreadline3==3.3
pyrsistent==0.17.3
python-dateutil==2.8.1
python-editor==1.0.4
python-louvain==0.15
pytz==2021.1
pywin32==300
PyYAML==5.4.1
pyzmq==22.0.3
rdflib==5.0.0
requests==2.25.1
rope==0.18.0
scikit-learn==0.24.1
scipy==1.6.1
seaborn==0.11.1
six==1.15.0
SQLAlchemy==1.4.7
stevedore==3.3.0
tabulate==0.8.9
testpath==0.4.4
threadpoolctl==2.1.0
toml==0.10.2
torch==1.8.0+cu111
torch-cluster==1.5.9
torch-geometric==1.6.3
torch-scatter==2.0.6
torch-sparse==0.6.9
torch-spline-conv==1.2.1
torchaudio==0.8.0
torchvision==0.9.0+cu111
tornado==6.1
tqdm==4.58.0
traitlets==5.0.5
typing-extensions==3.7.4.3
urllib3==1.26.3
wcwidth==0.2.5
webencodings==0.5.1
wrapt==1.12.1
  2. I am on Windows; it is normal that the error shows two \\, it is just Python escaping the backslash in the message, so the path itself is fine.
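
A quick way to see this (plain Python, nothing repo-specific):

    # The doubled backslash only appears in the repr, i.e. in error messages;
    # the actual path on disk has single backslashes
    path = 'data\\MUTAG\\MUTAG\\processed\\data.pt'
    print(repr(path))  # 'data\\MUTAG\\MUTAG\\processed\\data.pt'
    print(path)        # data\MUTAG\MUTAG\processed\data.pt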

@yyou1996
Collaborator

yyou1996 commented Jun 5, 2021

Thank you. I see your env, and it is too new for the semi_TU repo (torch_geometric>=1.6.0 rather than the required 1.4.0); please refer to https://github.com/Shen-Lab/GraphCL/tree/master/semisupervised_TU#option-1 for the correct environment.
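
If it helps, you can double-check the installed versions at runtime (a diagnostic sketch; 1.4.0 is the version the README pins):

    import torch
    import torch_geometric

    # semisupervised_TU expects torch_geometric 1.4.0 per the README
    print('torch:', torch.__version__)
    print('torch_geometric:', torch_geometric.__version__)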

Another option is to try replacing the __init__ function in tu_dataset with:

    url = 'https://ls11-www.cs.tu-dortmund.de/people/morris/' \
          'graphkerneldatasets'

    def __init__(self,
                 root,
                 name,
                 transform=None,
                 pre_transform=None,
                 pre_filter=None,
                 use_node_attr=False,
                 processed_filename='data.pt',
                 aug_ratio=None):
        self.name = name
        self.processed_filename = processed_filename

        self.aug = "none"
        self.aug_ratio = aug_ratio

        super(TUDatasetExt, self).__init__(root, transform, pre_transform,
                                           pre_filter)
        self.data, self.slices = torch.load(self.processed_paths[0])
        if self.data.x is not None and not use_node_attr:
            self.data.x = self.data.x[:, self.num_node_attributes:]

    @property
    def num_node_labels(self):
        if self.data.x is None:
            return 0
        for i in range(self.data.x.size(1)):
            if self.data.x[:, i:].sum().item() == self.data.x.size(0):
                return self.data.x.size(1) - i
        return 0

    @property
    def num_node_attributes(self):
        if self.data.x is None:
            return 0
        return self.data.x.size(1) - self.num_node_labels

    @property
    def raw_file_names(self):
        names = ['A', 'graph_indicator']
        return ['{}_{}.txt'.format(self.name, name) for name in names]

    @property
    def processed_file_names(self):
        return self.processed_filename

    @property
    def num_node_features(self):
        r"""Returns the number of features per node in the dataset."""
        return self[0][0].num_node_features

which might solve the download issue. A new version of this experiment, adapted to torch_geometric>=1.6.0, will also be released in the coming weeks.
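
For context on the error path: processed_file_names returns processed_filename, and torch_geometric joins it under the processed directory to build processed_paths, which is exactly the file the FileNotFoundError complains about. A hypothetical call (argument values taken from the traceback; the exact directory layout depends on the torch_geometric version):

    # Hypothetical instantiation; the feature string comes from the traceback
    dataset = TUDatasetExt(
        root='data/MUTAG',
        name='MUTAG',
        processed_filename='data_deg+odeg100+ak3+reall.pt')
    # With the layout from the traceback this resolves to
    # data/MUTAG/MUTAG/processed/data_deg+odeg100+ak3+reall.pt
    print(dataset.processed_paths[0])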

@Ripper346
Author

Ripper346 commented Jun 5, 2021

Ok, thanks. I will try on Monday and keep you updated in this issue. That behavior still seems strange to me; maybe I should also look at the differences between torch geometric 1.4 and 1.6.

@Ripper346
Author

Hi again. Your code didn't solve the issue on its own, so I tweaked a few more things, ending up with the class below:

# Imports needed at the top of tu_dataset.py:
from itertools import repeat

import numpy as np
import torch
from torch_geometric.datasets import TUDataset
# drop_nodes, weighted_drop_nodes, permute_edges, subgraph and mask_nodes
# come from the repo's augmentation code


class TUDatasetExt(TUDataset):
    def __init__(self,
                 root,
                 name,
                 transform=None,
                 pre_transform=None,
                 pre_filter=None,
                 use_node_attr=False,
                 processed_filename='data.pt',
                 aug="none",
                 aug_ratio=None):
        self.name = name
        self.processed_filename = processed_filename

        self.aug = aug
        self.aug_ratio = aug_ratio

        super(TUDatasetExt, self).__init__(root, self.name, transform, pre_transform,
                                           pre_filter, use_node_attr)
        self.data, self.slices = torch.load(self.processed_paths[0])
        if self.data.x is not None and not use_node_attr:
            self.data.x = self.data.x[:, self.num_node_attributes:]

    @property
    def processed_file_names(self):
        return self.processed_filename

    @property
    def num_node_features(self):
        r"""Returns the number of features per node in the dataset."""
        return self[0][0].num_node_features

    def download(self):
        super().download()

    def get(self, idx):
        data = self.data.__class__()

        if hasattr(self.data, '__num_nodes__'):
            data.num_nodes = self.data.__num_nodes__[idx]

        for key in self.data.keys:
            item, slices = self.data[key], self.slices[key]
            if torch.is_tensor(item):
                s = list(repeat(slice(None), item.dim()))
                s[self.data.__cat_dim__(key,
                                        item)] = slice(slices[idx],
                                                       slices[idx + 1])
            else:
                s = slice(slices[idx], slices[idx + 1])
            data[key] = item[s]

        if self.aug == 'dropN':
            data = drop_nodes(data, self.aug_ratio)
        elif self.aug == 'wdropN':
            data = weighted_drop_nodes(data, self.aug_ratio, self.npower)  # note: self.npower is never set in this class
        elif self.aug == 'permE':
            data = permute_edges(data, self.aug_ratio)
        elif self.aug == 'subgraph':
            data = subgraph(data, self.aug_ratio)
        elif self.aug == 'maskN':
            data = mask_nodes(data, self.aug_ratio)
        elif self.aug == 'none':
            data = data
        elif self.aug == 'random4':
            ri = np.random.randint(4)
            if ri == 0:
                data = drop_nodes(data, self.aug_ratio)
            elif ri == 1:
                data = subgraph(data, self.aug_ratio)
            elif ri == 2:
                data = permute_edges(data, self.aug_ratio)
            elif ri == 3:
                data = mask_nodes(data, self.aug_ratio)
            else:
                print('sample augmentation error')
                assert False

        elif self.aug == 'random3':
            ri = np.random.randint(3)
            if ri == 0:
                data = drop_nodes(data, self.aug_ratio)
            elif ri == 1:
                data = subgraph(data, self.aug_ratio)
            elif ri == 2:
                data = permute_edges(data, self.aug_ratio)
            else:
                print('sample augmentation error')
                assert False

        elif self.aug == 'random2':
            ri = np.random.randint(2)
            if ri == 0:
                data = drop_nodes(data, self.aug_ratio)
            elif ri == 1:
                data = subgraph(data, self.aug_ratio)
            else:
                print('sample augmentation error')
                assert False

        else:
            print('augmentation error')
            assert False

        return data

It can now download the dataset, but then it raises the same error as in this issue again.
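
For completeness, the class above would be exercised roughly like this (values are illustrative; the augmentation helpers come from the repo's code):

    # Illustrative usage; 'random2' picks node dropping or subgraph
    # sampling at random for each returned graph
    dataset = TUDatasetExt(root='data/MUTAG', name='MUTAG',
                           aug='random2', aug_ratio=0.2)
    data = dataset.get(0)
    print(data)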

Then I tried to install the conda environment of semisupervised TU first, but conda can't resolve some dependencies (they look like Linux-only builds, and I'm on Windows):

ResolvePackageNotFound:
  - ld_impl_linux-64=2.33.1
  - libffi=3.3
  - readline=8.0
  - libgcc-ng=9.1.0
  - libstdcxx-ng=9.1.0
  - ncurses=6.2
  - libedit=3.1.20191231

I then tried a Docker devcontainer (Python 3.7 on Debian Buster) with these requirements:

decorator==4.4.2
future==0.18.2
isodate==0.6.0
joblib==0.16.0
networkx==2.4
numpy==1.19.0
pandas==1.0.5
pillow==7.2.0
plyfile==0.7.2
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2020.1
rdflib==5.0.0
scikit-learn==0.23.1
scipy==1.5.0
six==1.15.0
threadpoolctl==2.1.0

and then installed manually

pip3 install torch==1.4.0 torchvision==0.5.0 -f https://download.pytorch.org/whl/torch_stable.html
pip3 install torch-scatter==1.1.0 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip3 install torch-sparse==0.4.4 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip3 install torch-cluster==1.4.5 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip3 install torch-spline-conv==1.1.0 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip3 install torch-geometric==1.1.0

The installation and run of the original code went fine. I just had to make one adjustment in train_eval.py around line 146: add two checks that create the logs and models folders if they don't exist.
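
The checks look roughly like this (a sketch; the actual paths used in train_eval.py may differ):

    import os

    # Create the output folders before anything tries to write into them
    os.makedirs('logs', exist_ok=True)
    os.makedirs('models', exist_ok=True)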

I noticed that the issue starts appearing with torch-geometric 1.4.2; versions before that don't have the problem.
