Feature skip data iteration when caching scoring #557
This is the proposed fix for #552.
As discussed there, my proposal is to cache net.forward_iter instead of net.infer.
It would be very helpful if someone could think of any unintended side effects this change could have. E.g., in theory, some code could rely on iterating over the data even when caching is enabled, but I can hardly imagine this happening in practice.
Also, I cannot test on GPU at the moment. I don't see how this would affect the outcome, but it would still be nice if someone could verify that it works.
Before, net.infer was cached when using a scoring callback with use_caching=True. This saved the time needed for the inference step. However, there was still an iteration step over the data for each scoring callback. If iteration is slow, this could incur a significant overhead. Now net.forward_iter is cached instead. This way, the iteration over the data is skipped and the iteration overhead should be gone.
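To illustrate the difference between the two caching strategies, here is a minimal, self-contained sketch. It is not skorch's actual implementation; the `Net` class, its counters, and the `cache_forward_iter` helper are hypothetical stand-ins that only show why caching the output of `forward_iter` skips both iteration and inference on repeated scoring, whereas caching only `infer` would still iterate over the data each time.

```python
class Net:
    """Toy stand-in for a skorch net (hypothetical, for illustration only)."""

    def __init__(self, data):
        self.data = data
        self.iter_count = 0   # how often we touched the (possibly slow) data loader
        self.infer_count = 0  # how often inference ran

    def _iterate(self):
        # Stands in for the potentially slow iteration over the dataset.
        for batch in self.data:
            self.iter_count += 1
            yield batch

    def infer(self, batch):
        # Stands in for the forward pass on one batch.
        self.infer_count += 1
        return batch * 2

    def forward_iter(self):
        # Old behavior: every scoring callback runs this loop, so the
        # iteration step happens again even if infer() results are cached.
        for batch in self._iterate():
            yield self.infer(batch)


def cache_forward_iter(net):
    """Cache the outputs of forward_iter, so later consumers skip both
    the iteration over the data and the inference step."""
    cached = list(net.forward_iter())  # iterate and infer exactly once

    def forward_iter():
        yield from cached  # replay cached results, no data access

    return forward_iter


net = Net([1, 2, 3])
forward_iter = cache_forward_iter(net)
first = list(forward_iter())   # served from cache
second = list(forward_iter())  # served from cache again
assert first == second == [2, 4, 6]
# The data was iterated and inference was run only once per batch,
# no matter how many scoring callbacks consume the results.
assert net.iter_count == 3
assert net.infer_count == 3
```

With caching at the `infer` level only, the two `list(...)` calls would each walk the data loader again; caching at the `forward_iter` level makes repeated scoring essentially free.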
ottonemo left a comment
In general I think this approach works. We could debate whether this is a problem we need to solve or where PyTorch lacks infrastructure (i.e., caching datasets) but ultimately I think it doesn't hurt to fix this.