Unable to process movie_lens dataset with local directory transformers model #5500

PolarisRisingWar · 2022-09-22T10:05:02Z

🐛 Describe the bug

This is my code:

from torch_geometric.datasets import MovieLens
dataset = MovieLens(root='/data/pyg_data/MovieLens',model_name='/data/pretrained_model/all-MiniLM-L6-v2')

It correctly used the local directory for sentence_transformers to process the raw text, but then failed with this error:

Traceback (most recent call last):
  File "try1/try4.py", line 3, in <module>
    dataset = MovieLens(root='/data/pyg_data/MovieLens',model_name='/data/pretrained_model/all-MiniLM-L6-v2')
  File "env_path/lib/python3.8/site-packages/torch_geometric/datasets/movie_lens.py", line 43, in __init__
    super().__init__(root, transform, pre_transform)
  File "env_path/lib/python3.8/site-packages/torch_geometric/data/in_memory_dataset.py", line 50, in __init__
    super().__init__(root, transform, pre_transform, pre_filter)
  File "env_path/lib/python3.8/site-packages/torch_geometric/data/dataset.py", line 87, in __init__
    self._process()
  File "env_path/lib/python3.8/site-packages/torch_geometric/data/dataset.py", line 170, in _process
    self.process()
  File "env_path/lib/python3.8/site-packages/torch_geometric/datasets/movie_lens.py", line 96, in process
    torch.save(self.collate([data]), self.processed_paths[0])
  File "env_path/lib/python3.8/site-packages/torch/serialization.py", line 377, in save
    with _open_file_like(f, 'wb') as opened_file:
  File "env_path/lib/python3.8/site-packages/torch/serialization.py", line 231, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "env_path/lib/python3.8/site-packages/torch/serialization.py", line 212, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/data/pyg_data/MovieLens/processed/data_/data/pretrained_model/all-MiniLM-L6-v2.pt'

This is clearly caused by the / characters in model_name, so I changed these lines:
In the input parameters of __init__(), append: processed_file_name: Optional[str] = "all-MiniLM-L6-v2"
In the body of __init__(), append: self.processed_file_name = processed_file_name
Change return f'data_{self.model_name}.pt' to return f'data_{self.processed_file_name}.pt'
And the original code changes to:

from torch_geometric.datasets import MovieLens
dataset = MovieLens(root='/data/pyg_data/MovieLens',model_name='/data/pretrained_model/all-MiniLM-L6-v2',processed_file_name='all-MiniLM-L6-v2')

Now it works.
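The following is a minimal sketch of why the original file name breaks and how the change above avoids it. It does not import torch_geometric: `original_processed_path` and `MovieLensNaming` are hypothetical stand-ins that only reproduce the file-name logic quoted in this issue, and `posixpath` is used so the result matches the Linux paths from the traceback.

```python
import posixpath
from typing import Optional


def original_processed_path(root: str, model_name: str) -> str:
    # Mirrors the f-string quoted in the issue: model_name is pasted into
    # the file name unchanged, so any '/' in it becomes a directory separator.
    return posixpath.join(root, 'processed', f'data_{model_name}.pt')


class MovieLensNaming:
    """Hypothetical stand-in for the naming logic only (not the real class)."""

    def __init__(
        self,
        root: str,
        model_name: Optional[str] = 'all-MiniLM-L6-v2',
        processed_file_name: Optional[str] = 'all-MiniLM-L6-v2',
    ) -> None:
        self.root = root
        self.model_name = model_name
        # The extra argument proposed in the issue.
        self.processed_file_name = processed_file_name

    @property
    def processed_path(self) -> str:
        # The file name no longer depends on model_name.
        return posixpath.join(
            self.root, 'processed', f'data_{self.processed_file_name}.pt')


root = '/data/pyg_data/MovieLens'
local_model = '/data/pretrained_model/all-MiniLM-L6-v2'

before = original_processed_path(root, local_model)
after = MovieLensNaming(root, local_model, 'all-MiniLM-L6-v2').processed_path

print(before)  # same path as in the FileNotFoundError above
print(after)
```

The first path points into a nested directory under processed/ that is never created, which is consistent with the FileNotFoundError raised when the file is opened for writing. The second path is a plain file directly inside processed/.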

Environment

PyG version: 2.1.0.dev20220815
PyTorch version: 1.11.0
OS: Linux
Python version: 3.8.13
CUDA/cuDNN version: cuda10.2 cudnn7.6.5
How you installed PyTorch and PyG (conda, pip, source):
PyTorch: conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=10.2 -c pytorch

PyG:
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.11.0+cu102.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-1.11.0+cu102.html
pip install pyg-nightly

Any other relevant information (e.g., version of torch-scatter):
torch-scatter 2.0.9
torch-sparse 0.6.14


rusty1s · 2022-09-22T11:37:37Z

Thanks for reporting. Do you want to send a pull request to fix?

Explicitly name the processed .pt file, in case this situation happens: pyg-team#5500. Maybe it's better to use it only when a local transformers model is used? I can't decide quickly.
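One way to picture the "only for a local model" option is to derive the name automatically when model_name points at an existing directory. This is only a sketch under that assumption: `resolve_processed_name` is a hypothetical helper, not part of PyG or of the pull request, and the temporary directory merely stands in for a downloaded model folder.

```python
import os
import tempfile


def resolve_processed_name(model_name: str) -> str:
    # Hypothetical helper: if model_name is a local directory, keep only its
    # last path component; otherwise treat it as a model identifier and
    # return it unchanged.
    if os.path.isdir(model_name):
        return os.path.basename(os.path.normpath(model_name))
    return model_name


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a local model directory such as the one in this issue.
    local_model = os.path.join(tmp, 'all-MiniLM-L6-v2')
    os.makedirs(local_model)
    local_name = resolve_processed_name(local_model)

hub_name = resolve_processed_name('all-MiniLM-L6-v2')

print(f'data_{local_name}.pt')
print(f'data_{hub_name}.pt')
```

A limitation of this sketch: a non-local identifier that itself contains a / would still be returned unchanged, so an explicit processed_file_name argument covers more cases.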

PolarisRisingWar · 2022-09-22T12:21:57Z

OK, I have sent this pull request: #5503

PolarisRisingWar added the bug label Sep 22, 2022

PolarisRisingWar mentioned this issue Sep 22, 2022

Explictly name the processed file #5503

Open
