
If Alibi is on, we should turn learned_pos_emb to False #489

Merged: 4 commits into main on Jul 26, 2023

Conversation

@bcui19 (Contributor) commented Jul 25, 2023

If we use alibi, we need to set learned_pos_emb to False. Otherwise we still go down the learned_pos_emb codepath.
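
For context, a simplified sketch of the codepath being described (illustrative only, not the actual MPT forward code; wpe is a stand-in name for the learned position table):

    import torch

    def add_position_info(tok_emb: torch.Tensor, wpe: torch.nn.Embedding,
                          pos: torch.Tensor, learned_pos_emb: bool) -> torch.Tensor:
        # With learned_pos_emb=True the learned position table is added to the
        # token embeddings even if ALiBi is also enabled (ALiBi biases the
        # attention scores separately), which is the accidental mixing this
        # PR prevents.
        if learned_pos_emb:
            return tok_emb + wpe(pos)
        # ALiBi (or NoPE): no absolute position embedding is added.
        return tok_emb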

@bcui19 requested a review from vchiley on July 25, 2023 at 20:40
@vchiley (Contributor) commented Jul 25, 2023

Similarly, the argument could be made that if alibi == False we should set learned_pos_emb = True, BUT I figured alibi = False with learned_pos_emb = False is NoPE (no positional embeddings at all), so I left the option open (the combinations are sketched below).
I also wanted to give people the ability to use both alibi and learned_pos_emb.

maybe add a warning or something???
maybe just ignore me... 😅
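
For reference, a sketch of the four alibi/learned_pos_emb combinations under discussion (the NoPE reading follows the comment above):

    # alibi=False, learned_pos_emb=True  -> standard learned positional embeddings
    # alibi=True,  learned_pos_emb=False -> ALiBi only (the usual intent)
    # alibi=False, learned_pos_emb=False -> NoPE: no positional information at all
    # alibi=True,  learned_pos_emb=True  -> both at once; the case this PR guards against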

@bcui19 (Contributor, Author) commented Jul 25, 2023

Hm... Okay, so one thing is: if we train models w/ ALiBi (without learned_pos_emb) and then load that model with learned_pos_emb=True, it could really affect model performance (since now you're adding a random, untrained position embedding...).

I'm also not 100% sure what the right thing is. The "safest" option might be to add a warning that we're changing it, since when a user specifies ALiBi they usually mean the case where you don't add in the positional embedding.

@vchiley (Contributor) commented Jul 25, 2023

if we load a trained model, should the model config be part of the ckpt?

In general, the change you made + a warning sounds good.
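
For what it's worth, the Hugging Face save path does bundle the config: save_pretrained writes a config.json next to the weights, so from_pretrained restores the same learned_pos_emb setting. A generic sketch using the standard transformers API (not llm-foundry code):

    from transformers import AutoConfig, AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained('gpt2')
    model.save_pretrained('ckpt/')   # writes config.json alongside the weights
    config = AutoConfig.from_pretrained('ckpt/')  # the saved config round-trips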

@samhavens (Contributor) commented Jul 25, 2023

Would a good middle ground be changing the default value so that learned_pos_emb: bool = False? Alternatively, we could default learned_pos_emb to None, and then later resolve the None to learned_pos_emb = not alibi. Users could still explicitly set both to be on, but we shouldn't run into accidentally having both.
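
A minimal sketch of that tri-state default, assuming the resolution happens in the config's __init__ once attn_config is known (class and argument names are illustrative; bcui19's snippet below implements the same resolution):

    from typing import Optional

    class ConfigSketch:
        def __init__(self, alibi: bool = False,
                     learned_pos_emb: Optional[bool] = None):
            self.attn_config = {'alibi': alibi}
            if learned_pos_emb is None:
                # The default follows alibi: learned embeddings only when alibi is off.
                learned_pos_emb = not self.attn_config.get('alibi', False)
            self.learned_pos_emb = learned_pos_emb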

@bcui19 (Contributor, Author) commented Jul 25, 2023

@vchiley It might be good to save it independently from Composer, since the Composer checkpoint is unwieldy.

@samhavens I agree, so I did something like this:

        if self.learned_pos_emb is None:
            self.learned_pos_emb = not self.attn_config.get('alibi', False)
            warnings.warn(f'`learned_pos_emb` has not been set, setting it to {self.learned_pos_emb}')

And it leads to this super weird warning... ):

[screenshot: the warning fires twice, first setting learned_pos_emb to False, then to True]

Let me know if I'm doing something obviously wrong (I think there's something weird in transformers where the class gets initialized twice):
https://github.com/huggingface/transformers/blob/main/src/transformers/configuration_utils.py#L816

We also can't set this up in _validate_config, since we need it when creating a model using from_pretrained.

So unfortunately, I think I'm just going to add an additional warning, unless someone else has a way around this.
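
If so, the shape of the change would presumably be an unconditional override plus a warning whenever ALiBi is on; a sketch under that assumption, not the exact merged diff:

    import warnings

    def resolve_pos_emb(attn_config: dict, learned_pos_emb: bool) -> bool:
        # If ALiBi is enabled, force learned_pos_emb off and tell the user,
        # rather than silently mixing two positional schemes.
        if attn_config.get('alibi', False) and learned_pos_emb:
            warnings.warn('alibi is turned on, setting `learned_pos_emb` to `False`.')
            return False
        return learned_pos_emb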

@vchiley (Contributor) commented Jul 25, 2023

@bcui19 it sets it to false, then sets it to true?

@bcui19 (Contributor, Author) commented Jul 25, 2023

I think... when you create an MPTConfig the way we do it now, HF does something weird and it ends up creating two MPTConfigs...

@samhavens (Contributor) commented

If HF is going to make us choose between extremely cryptic warnings like that and simply "set pos_emb to false if alibi is true", then I guess we should revert to what you had before and just let the user know that they specified alibi, so we are setting pos emb to false.

Unless setting the default differently solves this? learned_pos_emb: bool = False?

@bcui19 (Contributor, Author) commented Jul 26, 2023

Running w/ the current set of changes:

[screenshot: warning output with the current set of changes]

Going to merge. This is partly me assuming that very few people would want to mix positional embeddings w/ ALiBi.

@bcui19 merged commit 634597d into main on Jul 26, 2023
10 checks passed
@dakinggg deleted the fix-pos-embed branch on October 11, 2023 at 21:30