New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Transform to numpy arrays #3
Conversation
Pushed some of your changes @NegatioN . I now have two files: "datahelper.py" and "dataset_torch.py" which contain relevant code. The idea is that we can also add a "dataset_tensorflow.py" etc to reduce needed dependencies. In addition to these I think it makes sense to add some evaluation functions, which I will write in pytorch-lightning as a first implementation. So maybe a dataset_lightning.py that can inherit the dataset from dataset_torch maybe. Is this reasonable or too complicated? |
…h default that are equivalent to old setup
(I already wrote this in a comment, but:) I would aim for a numpy generator that people who don't use PyTorch could use for their own dataloaders (ex: TensorFlow?)
Maybe writing the evaluation functions for the framework you're using first is the right call. But I think it's wise to put some effort into separating the eval functions into a self-contained format, so that they can be used in both Lightning/Pytorch/regular Python? I can provide more feedback on those things, or try to help out, if you sketch out how the function itself is supposed to look like 🙂
The datasets look identical for Lightning and Pytorch, right? It's only the eval functions you're worried about? |
I agree on all these points! But its out of scope for this PR I think. It also requires us to be a bit careful so that all functions and dataset splits are equivalent. So need to be worked on slightly separate. But we gone one step with moving data from torch to numpy. so next steps would be to also move code to be only np-dependent. :) numpy generators could a good way to "fix" this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM :)
Pytorch files are too platform specific. Instead, let the data be available as numpy arrays. This has the same amount of compression. In addition, we make a couple of name changes to more intuitive names.
Todo