
Updated to support adding transforms to multiple dataloaders #3268

Merged
merged 2 commits on Mar 27, 2021

Conversation

marii-moe
Collaborator

Fixes the issue here: https://forums.fast.ai/t/performance-degradation-between-fastai-2-2-5-and-2-2-7/86069

We add add_tfms in order to support adding transforms to both dataloaders. Why?

Previously dls.train.after_batch.fs == dls.valid.after_batch.fs, so adding a transform to the training dataloader automatically added it to the validation dataloader as well. Removing that requirement caused validation to break, so this PR adds that functionality back. (I'm moving my analysis from Discord into the forum thread now.)
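For illustration, the intended effect is that a transform added through the new helper shows up in both pipelines. A minimal sketch, assuming a standard ImageDataLoaders setup (the dataset and the Normalize transform are just illustrative, not part of this PR):

from fastai.vision.all import *

# Hypothetical example: add a batch transform through the new helper and
# check that the validation pipeline picked it up too.
path = untar_data(URLs.MNIST_SAMPLE)
dls = ImageDataLoaders.from_folder(path)
dls.add_tfms([Normalize.from_stats(*imagenet_stats)], 'after_batch')
assert any(isinstance(t, Normalize) for t in dls.valid.after_batch.fs)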

marii-moe requested a review from jph00 as a code owner on March 17, 2021

@muellerzr
Contributor

muellerzr commented Mar 17, 2021

So I think we can rewrite and simplify this a bit more:

# Assumes the usual fastai imports (e.g. `from fastai.data.all import *`),
# which provide `patch`, `L`, `apply`, `is_listy` and `listify`.
@patch
def _add_tfm(self:DataLoaders, tfms, event, ds_idx):
    "Add `tfms` to `event` on the dataloader at `ds_idx`"
    dl_tfms = getattr(self[ds_idx], event)
    apply(dl_tfms.add, tfms)
    dl_tfms.fs = L(dl_tfms.fs).sorted(key='order')  # keep transforms sorted by `order`

@patch
def add_tfms(self:DataLoaders, tfms, event, ds_idxs=None):
    "Add `tfms` to `event` on the dataloaders at `ds_idxs` (default: all of them)"
    if ds_idxs is None: ds_idxs = range(len(self.loaders))
    if not is_listy(ds_idxs): ds_idxs = listify(ds_idxs)
    for idx in ds_idxs:
        apply(self._add_tfm, tfms, event, idx)
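Usage under this version might look something like the following sketch (the dataset and the Resize/Flip transforms are placeholders, not part of the PR):

from fastai.vision.all import *

# Hedged usage sketch for the suggested API above:
dls = ImageDataLoaders.from_folder(untar_data(URLs.MNIST_SAMPLE))
dls.add_tfms(Resize(224), 'after_item', ds_idxs=1)  # validation loader only
dls.add_tfms(Flip(), 'after_batch')                 # default: every loader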

ds_idxs is more a preference thing, and I'd leave that up to @jph00: should we use just indexing, similar to what we already do? Or include support for dls.train and dls.valid and just do both? It adds a few lines of code but could be nice.

Also, we should likely check that event is valid. In a perfect world both item_tfms and after_item could work, but I think just checking that event is either after_item or after_batch would be enough.
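Something as simple as the following guard might do (a sketch only; the exact error message is arbitrary):

from fastcore.basics import patch
from fastai.data.core import DataLoaders

@patch
def add_tfms(self:DataLoaders, tfms, event, ds_idxs=None):
    "Sketch: validate `event` before delegating to the per-loader helper"
    if event not in ('after_item', 'after_batch'):
        raise ValueError(f"event must be 'after_item' or 'after_batch', got {event!r}")
    ...  # then proceed as in the suggestion above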

Re: my adjustments here, we need that last little line in _add_tfm to make sure the order gets properly readjusted. The only thing this doesn't handle is duplicates: if I pass in three Resize transforms, for instance, it will add all three. Should we disallow that, since item transforms are all composed? How should that behavior work? (Question for you, Jeremy.)

@marii-moe
Collaborator Author

marii-moe commented Mar 17, 2021

I am reviewing the code with @muellerzr's suggestions. I had forgotten about order, and I agree the code could be better.

@marii-moe
Collaborator Author

marii-moe commented Mar 19, 2021

After this I will take a look at Pipeline.add. I think it needs to be updated to allow adding a list, and we should also look at handling order when adding transforms; that logic should live in the pipeline itself. I wanted to get this change in first, since validation loss currently does not work with cnn_learner, and the pipeline change might take a bit longer.
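For reference, a rough sketch of what a list-friendly, order-aware Pipeline.add could look like (this is only the shape of the idea, not the actual fastcore implementation):

from fastcore.basics import patch
from fastcore.foundation import L
from fastcore.transform import Pipeline

@patch
def add(self:Pipeline, tfms, items=None, train_setup=False):
    "Sketch only: accept one transform or a list, and keep `fs` sorted by `order`"
    for t in L(tfms):
        t.setup(items, train_setup)  # run per-transform setup, as the current add does
        self.fs.append(t)
    self.fs = self.fs.sorted(key='order')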

@marii-moe
Collaborator Author

@jph00 I don't mean to bother you too much, but currently cnn_learner's valid loss and accuracy are incorrect due to this issue.

@jph00
Member

jph00 commented Mar 27, 2021

Apologies @marii-moe, I didn't realize it was ready to merge. Thanks so much for the code and the ping.
