
NotImplementedError for __len__ function in SafeDataLoader #2

Closed
sannawag opened this issue Oct 15, 2018 · 8 comments

sannawag commented Oct 15, 2018

Hi @msamogh,

Having a __len__ function in SafeDataLoader, identical to the one in torch.utils.data.DataLoader, would be very helpful. I currently get the following error:

dataloader = nc.SafeDataLoader(dataset)
if i == len(dataloader):
File "envs/deep/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 504, in __len__
return len(self.batch_sampler)
File "envs/deep/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 150, in __len__
return (len(self.sampler) + self.batch_size - 1) // self.batch_size
File "envs/deep/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 20, in __len__
raise NotImplementedError
NotImplementedError

Thank you,

msamogh (Owner) commented Oct 15, 2018

Hi @sannawag,
Can you tell me a bit more about what you are trying to do? It would help me understand your situation better so that I can help you.

sannawag (Author) commented Oct 15, 2018

@msamogh, given the large size of my training dataset, I wish to validate more often than once per epoch. For this reason, when enumerating the dataloader, I check whether the index equals len(dataloader) - 1. I do not have direct access to the dataset length because I initialize the dataloaders in a separate function.

msamogh (Owner) commented Oct 15, 2018

So if I understand you correctly, you wish to enumerate through a single DataLoader in a nested fashion?

sannawag (Author) commented Oct 15, 2018

I do wish to enumerate through DataLoaders in a nested fashion, but one is built from a training set, the other from a validation set.

msamogh (Owner) commented Oct 15, 2018

So what's preventing you from enumerating through the validation set in the usual way (using enumerate())? You can reinitialize your validation set DataLoader inside the loop as many times as you want.
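
For example, here is a rough, untested sketch of that pattern. It uses plain torch DataLoaders and toy TensorDatasets only so the snippet stands alone; the same structure applies with nc.SafeDataLoader, and step is an arbitrary validation interval:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy datasets standing in for the real training/validation sets.
train_set = TensorDataset(torch.randn(64, 3), torch.randint(0, 2, (64,)))
val_set = TensorDataset(torch.randn(16, 3), torch.randint(0, 2, (16,)))

train_loader = DataLoader(train_set, batch_size=8)
step = 4  # validate every `step` training batches

for i, (x, y) in enumerate(train_loader):
    # ... training step on (x, y) goes here ...
    if i % step == 0:
        # Re-create the validation loader and enumerate it in full.
        val_loader = DataLoader(val_set, batch_size=8)
        for j, (x_val, y_val) in enumerate(val_loader):
            pass  # compute and accumulate the validation loss here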

sannawag (Author) commented Oct 15, 2018

Here is the basic structure I am trying to obtain:

for i, sample in enumerate(training_dataloader):
    # process the training sample
    if i % step == 0 or i == len(training_dataloader) - 1:
        validate_and_report_loss(validation_dataloader)

The catch is computing len(training_dataloader).

Thanks!

msamogh (Owner) commented Oct 15, 2018

Ah, I see. The bad news is that you can't call len on your DataLoader (unless you're okay with setting eager_eval to True on your dataset). The good news, in this case, is that you can simply move the check for the last iteration to outside the loop, once it has ended.

This is because, without actually checking every element, SafeDataLoader has no way of knowing in advance what the effective number of valid samples in your dataset will be. So one of the things you have to give up with SafeDataLoader is the ability to call len().
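
Concretely, here is a rough sketch of that restructuring, reusing the names from your snippet above (training_dataloader, validation_dataloader, step, and validate_and_report_loss are all your own objects):

for i, sample in enumerate(training_dataloader):
    # process the training sample
    if i % step == 0:
        validate_and_report_loss(validation_dataloader)

# The "last iteration" check moves here, after the loop has ended,
# so len(training_dataloader) is never needed.
validate_and_report_loss(validation_dataloader)

(And if setting eager_eval to True on your dataset is acceptable, as mentioned above, len() becomes available and your original check works as written.)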

Hope that helps!

sannawag (Author) commented Oct 15, 2018

Thank you @msamogh, this makes perfect sense.

msamogh closed this as completed Oct 15, 2018