Prevent schedule from crashing close to the end of training #3335

Merged

Lewington-pitsos
Contributor

The check introduced in #2792 is not strong enough.

In fastai/fastbook#263 it was noted that sometimes in fastai/callback/schedule.py the lines:

idx = (pos >= pcts).nonzero().max()
actual_pos = (pos-pcts[idx]) / (pcts[idx+1]-pcts[idx])

can lead to index-out-of-range errors when pos is very close to the final value of pcts (as can occur near the end of training using learner.fit_one_cycle() when the total number of steps is very large).

The following check against this was introduced in #2792:

if int(pos) == 1: return scheds[-1](1.)

But it looks like the error can still sneak through in some cases. Check out the following:

import torch

pos = 0.9999999999999997

print("greater or equal to float 1.:", pos >= 1.) 
print("cast to int and greater or equal to int 1:", int(pos) >= 1) 
print("greater or equal to tensor 1.:", pos >= torch.as_tensor(1.)) # Bad news bears 


pcts = torch.Tensor([0.0, 0.25, 1.0])
idx = (pos >= pcts).nonzero().max()
print("should never be more than 1:", idx)
actual_pos = (pos-pcts[idx]) / (pcts[idx+1]-pcts[idx])

# PRINTS:
# greater or equal to float 1.: False
# cast to int and greater or equal to int 1: False
# greater or equal to tensor 1.: tensor(True) 
# should never be more than 1: tensor(2)
# IndexError: index 3 is out of bounds for dimension 0 with size 3
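
For anyone wondering why the tensor comparison flips: my understanding is that pcts is a float32 tensor by default, so pos gets compared at float32 precision, and at that precision pos rounds up to exactly 1.0. Here is a minimal sketch of that rounding using only the standard library. The to_float32 helper is a hypothetical name introduced for illustration and is not part of torch or fastai.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    # Packing with format "f" stores the value as an IEEE 754 single.
    return struct.unpack("f", struct.pack("f", x))[0]

pos = 0.9999999999999997

# As a float64, pos is strictly below 1.0, so the int(pos) == 1 check misses it.
below_one_as_float64 = pos < 1.0

# As a float32, pos is indistinguishable from 1.0, so pos >= 1.0 holds.
equals_one_as_float32 = to_float32(pos) == 1.0

# The largest float32 strictly below 1.0 is 1 - 2**-24, far below pos.
largest_float32_below_one = 1.0 - 2.0 ** -24

print("below 1 as float64:", below_one_as_float64)
print("equal to 1 as float32:", equals_one_as_float32)
print("largest float32 below 1:", largest_float32_below_one)
```

Any pos between roughly 1 - 2**-25 and 1.0 behaves the same way, which would explain why this only shows up when the total number of steps is very large.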

This is my first contribution so I'm trying to keep this PR minimal. Initially I strengthened the check to

if int(pos) == 1 or pos == torch.as_tensor(1.): return scheds[-1](1.)

but by then it seemed like the checks were getting a little out of hand.
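
One alternative to stacking equality checks is to clamp the segment index so that it can never point past the last schedule. The following is a rough, pure-Python sketch of that idea and not necessarily what this PR ends up doing. The names combine_scheds_clamped and to_float32 are hypothetical, plain lists stand in for the pcts tensor, and pos is assumed to lie in [0, 1].

```python
import struct
from typing import Callable, Sequence

def to_float32(x: float) -> float:
    """Round to float32 to mimic comparing against a float32 tensor."""
    return struct.unpack("f", struct.pack("f", x))[0]

def combine_scheds_clamped(
    pcts: Sequence[float], scheds: Sequence[Callable[[float], float]]
) -> Callable[[float], float]:
    """Combine schedules by segment, clamping the segment index."""
    # Cumulative boundaries, e.g. [0.25, 0.75] -> [0.0, 0.25, 1.0].
    bounds = [0.0]
    for p in pcts:
        bounds.append(bounds[-1] + p)
    last_idx = len(bounds) - 2  # index of the final segment

    def _inner(pos: float) -> float:
        pos32 = to_float32(pos)
        # Highest boundary that pos has reached (assumes pos >= 0).
        idx = max(i for i, b in enumerate(bounds) if pos32 >= b)
        # The guard: never index past the final segment.
        idx = min(idx, last_idx)
        actual_pos = (pos32 - bounds[idx]) / (bounds[idx + 1] - bounds[idx])
        return scheds[idx](actual_pos)

    return _inner

# Two toy schedules: the first returns its input, the second adds 10.
sched = combine_scheds_clamped([0.25, 0.75], [lambda p: p, lambda p: 10 + p])
print(sched(0.125))
print(sched(0.9999999999999997))
print(sched(1.0))
```

With the clamp in place, both the near-1 value and an exact 1.0 fall into the final segment, so no separate special case for the end of training is needed.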

Very happy to hear feedback/make changes.

It's probably a good idea to get this one in soon if possible, since it's been cropping up on the forums a little recently, e.g. "IndexError: index 3 is out of bounds for dimension 0 with size 3".

@muellerzr
Contributor

Can you add your example as a test in the notebook, please (or write a different test if you prefer), so that we can catch this pre-emptively if something changes?

@Lewington-pitsos
Contributor Author

Hi @muellerzr

Sorry to bother you, but I spent ~20 minutes looking for some kind of guide on how to write tests for fastai and couldn't find anything comprehensive.

I worked out that all the tests live in the Jupyter notebooks, but I can't find the testing library (e.g. where is the test_close function defined?) or a guide on how to write good tests.

Can you link me to a guide on good testing practices or a walkthrough on writing tests?

@muellerzr
Contributor

The testing comes from fastcore:

from fastcore.test import *

https://fastcore.fast.ai/test.html

@jph00
Member

jph00 commented Apr 30, 2021

Many thanks!

jph00 merged commit 17f8237 into fastai:master Apr 30, 2021
hamelsmu added the bug label Apr 30, 2021
hamelsmu changed the title from "Prevent schedule from crashing flakily close to the end of training" to "Prevent schedule from crashing close to the end of training" Apr 30, 2021