I would like to understand why masking is used in the text encoder. This doesn't seem necessary for CLIP since it does not perform an autoregressive task. Maybe my understanding is incomplete. The relevant code is located at line 286 in model.py.
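For context, here is a minimal sketch of the kind of mask the question is about, assuming it is an additive causal mask (0 where attention is allowed, negative infinity where it is blocked). This is an illustration only, not the code from model.py: the function name `build_causal_mask` and the context length of 4 are hypothetical.

```python
import math


def build_causal_mask(context_length):
    """Return an additive causal mask as a list of rows.

    Entry [query][key] is 0.0 when the query position may attend to the
    key position (key <= query) and -inf when the key lies in the future.
    """
    mask = []
    for query in range(context_length):
        row = [0.0 if key <= query else -math.inf for key in range(context_length)]
        mask.append(row)
    return mask


# Hypothetical small context length, chosen only to keep the output readable.
mask = build_causal_mask(4)
for row in mask:
    print(row)
```

With a mask like this added to the attention scores before the softmax, each token can attend only to itself and to earlier tokens, which is the behavior usually associated with autoregressive models.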