ValueError when running on Pathfinder #33

Closed
andrewliu2001 opened this issue May 22, 2022 · 5 comments

Comments

@andrewliu2001

Hi, I am getting the following error when trying to train S4 on the pathfinder dataset. Any help would be greatly appreciated.

Traceback (most recent call last):
File "/data/al451/state-spaces/train.py", line 553, in main
train(config)
File "/data/al451/state-spaces/train.py", line 498, in train
trainer.fit(model)
File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 768, in fit
self._call_and_handle_interrupt(
File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 721, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1172, in _run
self._call_setup_hook() # allow user to setup lightning_module in accelerator environment
File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1492, in _call_setup_hook
self._call_lightning_module_hook("setup", stage=fn)
File "/home/al451/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1593, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/data/al451/state-spaces/train.py", line 56, in setup
self.dataset.setup()
File "/data/al451/state-spaces/src/dataloaders/datasets.py", line 1234, in setup
dataset = PathFinderDataset(self.data_dir, transform=self.default_transforms())
File "/data/al451/state-spaces/src/dataloaders/datasets.py", line 1130, in __init__
path_list = sorted(
File "/data/al451/state-spaces/src/dataloaders/datasets.py", line 1132, in <lambda>
key=lambda path: int(path.stem),
ValueError: invalid literal for int() with base 10: '._142'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

@albertfgu
Contributor

Are you running it using the command in the README? What is your torch and pytorch-lightning version?

@andrewliu2001
Author

andrewliu2001 commented May 22, 2022

Hi, yes I ran CUDA_VISIBLE_DEVICES=0,5,6,7 python -m train wandb=null experiment=s4-lra-pathx. I am using torch 1.11.0+cu113 and pytorch-lightning 1.6.3.

@albertfgu
Contributor

Could you try pytorch-lightning==1.5.10? We've had issues with 1.6 and later.

@andrewliu2001
Author

Hi, I tried using pytorch-lightning 1.5.10 and I still get the same issue.

@albertfgu
Contributor

It seems like your data might not be set up correctly. If you look at the line where it is throwing an error, it expects the data to look like data/pathfinder/pathfinder32/curv_contour_length_14/metadata/{0,1,2,...}.npy. The error you have suggests it found files of the form ._142.npy. Can you check your data structure and re-download the data if necessary?
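
For anyone hitting the same error, here is a minimal sketch of how one might check a metadata directory for stray files before training. File names beginning with `._` are often sidecar files that macOS creates when data is copied or archived, which would explain a stem like `._142`, but that is a guess about this particular setup. The helper names `list_numeric_files` and `find_stray_files` are hypothetical and are not part of the state-spaces repository; the only assumption taken from the thread is that valid files are named with a plain integer stem such as `0.npy`.

```python
from pathlib import Path


def has_integer_stem(path):
    """True if the file name without its extension is a plain ASCII integer."""
    stem = path.stem
    return stem.isascii() and stem.isdigit()


def list_numeric_files(metadata_dir):
    """Hypothetical helper: files with integer stems, sorted numerically."""
    files = [p for p in Path(metadata_dir).iterdir()
             if p.is_file() and has_integer_stem(p)]
    return sorted(files, key=lambda p: int(p.stem))


def find_stray_files(metadata_dir):
    """Hypothetical helper: files whose stems would make int(path.stem) fail."""
    return sorted(p for p in Path(metadata_dir).iterdir()
                  if p.is_file() and not has_integer_stem(p))


if __name__ == "__main__":
    import sys

    # Usage: python check_metadata.py <path to a metadata directory>
    for stray in find_stray_files(sys.argv[1]):
        print("unexpected file:", stray)
```

If the script reports files such as `._142.npy`, deleting them or re-extracting the dataset on the training machine should let the original `sorted(..., key=lambda path: int(path.stem))` call succeed without changing the loader.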
