Conversation
vasqu left a comment:
Thanks for the fix, looks good to me. Makes me wonder how we can detect these misses more easily 😓
@vasqu we could expand that test. This wouldn't catch every failure (because the NaNs only appear sometimes), but I think (?) it should create a flaky failure for any uninitialized model parameter. Not sure how to make that flaky failure more reliable, though!
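One way to make that kind of failure deterministic rather than flaky (a sketch, not an existing transformers test: `find_uninitialized`, `DemoModel`, and the standalone `init_weights` function here are all hypothetical) is to create the model on the `meta` device, poison the materialized storage with `NaN`, run the init logic, and report any parameter still containing `NaN`:

```python
import torch
from torch import nn


class DemoModel(nn.Module):
    """Toy model: `good` is covered by init_weights below, `missed` is not."""

    def __init__(self):
        super().__init__()
        self.good = nn.Parameter(torch.empty(4))
        self.missed = nn.Parameter(torch.empty(4))


def init_weights(model):
    # Deliberately incomplete init, mimicking the bug: `missed` is skipped.
    with torch.no_grad():
        model.good.zero_()


def find_uninitialized(model_cls, init_fn):
    # Build the model on the meta device, so no real init runs in __init__().
    with torch.device("meta"):
        model = model_cls()
    # Materialize with uninitialized storage, then poison every parameter
    # with NaN so "accidentally fine" memory can't hide a miss.
    model = model.to_empty(device="cpu")
    with torch.no_grad():
        for p in model.parameters():
            p.fill_(float("nan"))
    init_fn(model)
    # Anything still NaN was never touched by the init logic.
    return [name for name, p in model.named_parameters() if torch.isnan(p).any()]
```

Note that the real transformers machinery applies `_init_weights(module)` per submodule rather than taking a plain callback, so this is only the shape of the idea.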
Merging this and will think about a follow-up PR to surface this class of bug more reliably and with less flakiness. |
Some parameters in Tapas are initialized in `__init__()` and not reinitialized in `_init_weights()`, which means that if the model is created on the `meta` device, those parameters do not get a weight initialization. This causes a crash later if the uninitialized memory happens to contain `NaN` values! This caused the `test_all_tensors_are_parameter_or_buffer` test to be flaky.

This PR leaves tensor creation in `__init__()` but moves initialization to `_init_weights()`.

cc @vasqu
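The fix pattern, sketched on a toy module rather than the actual Tapas code (`column_logits_bias` is an illustrative name, and this simplified `_init_weights()` takes no arguments, unlike the per-module hook in transformers): keep tensor *creation* in `__init__()` and move value assignment to `_init_weights()`, which can be re-run after `meta` tensors are materialized.

```python
import torch
from torch import nn


class FixedModule(nn.Module):
    """Toy module following the PR's pattern (names are illustrative)."""

    def __init__(self):
        super().__init__()
        # Only *create* the tensor here; on the meta device this allocates
        # no real storage, so any values assigned here would be lost anyway.
        self.column_logits_bias = nn.Parameter(torch.empty(1))

    def _init_weights(self):
        # Assign actual values here, so re-running init after the meta
        # tensors are materialized leaves no uninitialized memory behind.
        with torch.no_grad():
            self.column_logits_bias.zero_()
```

Creating the module under `torch.device("meta")`, materializing it with `to_empty()`, and then calling `_init_weights()` yields a fully initialized parameter; skipping that last step leaves whatever bytes `to_empty()` handed back, which is exactly where the flaky `NaN` crashes came from.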