Unable to process movie_lens dataset with local directory transformers model #5500

Open
PolarisRisingWar opened this issue Sep 22, 2022 · 2 comments

@PolarisRisingWar

🐛 Describe the bug

This is my code:

from torch_geometric.datasets import MovieLens
dataset = MovieLens(root='/data/pyg_data/MovieLens',model_name='/data/pretrained_model/all-MiniLM-L6-v2')

It correctly used the local directory of the sentence_transformers model to process the raw text, but then failed with this error:

Traceback (most recent call last):
  File "try1/try4.py", line 3, in <module>
    dataset = MovieLens(root='/data/pyg_data/MovieLens',model_name='/data/pretrained_model/all-MiniLM-L6-v2')
  File "env_path/lib/python3.8/site-packages/torch_geometric/datasets/movie_lens.py", line 43, in __init__
    super().__init__(root, transform, pre_transform)
  File "env_path/lib/python3.8/site-packages/torch_geometric/data/in_memory_dataset.py", line 50, in __init__
    super().__init__(root, transform, pre_transform, pre_filter)
  File "env_path/lib/python3.8/site-packages/torch_geometric/data/dataset.py", line 87, in __init__
    self._process()
  File "env_path/lib/python3.8/site-packages/torch_geometric/data/dataset.py", line 170, in _process
    self.process()
  File "env_path/lib/python3.8/site-packages/torch_geometric/datasets/movie_lens.py", line 96, in process
    torch.save(self.collate([data]), self.processed_paths[0])
  File "env_path/lib/python3.8/site-packages/torch/serialization.py", line 377, in save
    with _open_file_like(f, 'wb') as opened_file:
  File "env_path/lib/python3.8/site-packages/torch/serialization.py", line 231, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "env_path/lib/python3.8/site-packages/torch/serialization.py", line 212, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/data/pyg_data/MovieLens/processed/data_/data/pretrained_model/all-MiniLM-L6-v2.pt'

This is clearly caused by the / characters in model_name: they end up in the processed file name, so the path points into subdirectories that do not exist. I made these changes in movie_lens.py:
In the parameters of __init__(), add: processed_file_name: Optional[str] = "all-MiniLM-L6-v2"
In the body of __init__(), add: self.processed_file_name = processed_file_name
Change return f'data_{self.model_name}.pt' to return f'data_{self.processed_file_name}.pt'
The original code then becomes:

from torch_geometric.datasets import MovieLens
dataset = MovieLens(root='/data/pyg_data/MovieLens',model_name='/data/pretrained_model/all-MiniLM-L6-v2',processed_file_name='all-MiniLM-L6-v2')

Now it works.
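
For anyone hitting the same error, here is a minimal sketch of why the path breaks and of one possible alternative to an extra argument. It uses only posixpath from the standard library (so it behaves the same on any OS). The helper processed_name is hypothetical and not part of PyG; it just illustrates deriving the file name from the last path component of model_name.

```python
import posixpath

root = '/data/pyg_data/MovieLens'
model_name = '/data/pretrained_model/all-MiniLM-L6-v2'
processed_dir = posixpath.join(root, 'processed')

# The f-string keeps every '/' of model_name, so the "file name"
# actually spans several directories that were never created.
broken_path = posixpath.join(processed_dir, f'data_{model_name}.pt')
print(broken_path)

# The parent directory of the target file is not processed_dir any more.
print(posixpath.dirname(broken_path) == processed_dir)


def processed_name(model_name: str) -> str:
    """Hypothetical helper (not PyG API): keep only the last path component."""
    base = posixpath.basename(model_name.rstrip('/'))
    return f'data_{base}.pt'


# Both a local directory and a plain model name give a flat file name.
fixed_path = posixpath.join(processed_dir, processed_name(model_name))
print(fixed_path)
```

With this approach a local path and the plain name all-MiniLM-L6-v2 map to the same processed file. Whether that is desirable (two different local models with the same directory name would collide) is a design choice I am not sure about.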

Environment

PyG version: 2.1.0.dev20220815
PyTorch version: 1.11.0
OS: Linux
Python version: 3.8.13
CUDA/cuDNN version: cuda10.2 cudnn7.6.5
How you installed PyTorch and PyG (conda, pip, source):
PyTorch: conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=10.2 -c pytorch

PyG:
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.11.0+cu102.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-1.11.0+cu102.html
pip install pyg-nightly

Any other relevant information (e.g., version of torch-scatter):
torch-scatter 2.0.9
torch-sparse 0.6.14

@rusty1s
Member

rusty1s commented Sep 22, 2022

Thanks for reporting. Do you want to send a pull request to fix?

PolarisRisingWar added a commit to PolarisRisingWar/pytorch_geometric that referenced this issue Sep 22, 2022
Explicitly name the processed pt file, in case this situation happens: pyg-team#5500
Maybe it's better to use it only when using a local transformers model? I can't decide quickly.
@PolarisRisingWar
Author

OK. I have sent this pull request: #5503

2 participants