How to train Mixtral MoE? #18

Open
tommarques56 opened this issue May 9, 2024 · 3 comments

Comments

@tommarques56

Hi, I just want to know if anybody has successfully trained a Mixtral model like the 8x7B? When I try, the output is random (unreadable).

Thanks!

@sshh12
Owner

sshh12 commented May 18, 2024

My guess is that the architecture is different enough that this code would not work: https://github.com/sshh12/multi_token/blob/main/multi_token/language_models/mistral.py. You could potentially duplicate that file and add a Mixtral variant based on the Hugging Face implementation.
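A minimal sketch of what such a Mixtral variant might look like, assuming it mirrors the pattern in mistral.py. Only the transformers imports (MixtralConfig, MixtralForCausalLM, AutoConfig, AutoModelForCausalLM) are known APIs; the "-lmm" naming and the multimodal mixin mentioned in the comments are assumptions, not the library's actual code.

```python
# Hypothetical sketch: a Mixtral counterpart to multi_token/language_models/mistral.py.
# MixtralConfig / MixtralForCausalLM are real transformers classes (>= 4.36);
# the "mixtral-lmm" naming and the multimodal mixin are assumptions about this repo.
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    MixtralConfig,
    MixtralForCausalLM,
)


class MixtralLMMConfig(MixtralConfig):
    # Assumed model_type naming, following the pattern used for the Mistral variant.
    model_type = "mixtral-lmm"


class MixtralLMMForCausalLM(MixtralForCausalLM):
    # In the real file this class would also inherit the library's multimodal
    # base class (whatever mistral.py mixes in) so the modality projectors and
    # extra-token embedding logic get attached to the Mixtral backbone.
    config_class = MixtralLMMConfig


# Register so the rest of the training code can load the model via Auto* classes.
AutoConfig.register("mixtral-lmm", MixtralLMMConfig)
AutoModelForCausalLM.register(MixtralLMMConfig, MixtralLMMForCausalLM)
```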

@tommarques56
Author

Do you think it is the same or better to fine-tune Mixtral directly, or to take a Mistral 7B fine-tuned for vision, create a MoE with tools like mergoo (please take a look at mergoo, because sshh12 + mergoo could be a life changer), and then fine-tune the resulting MoE?

@sshh12
Owner

sshh12 commented May 28, 2024

Hmm, my guess is that merging after training the modality projector wouldn't work (at least not out of the box with this library, because of all the custom torch modules that get strapped onto the model). However, it should definitely be doable to take an existing merge and add the modality to it by adding that Hugging Face architecture, as I mentioned.
