How to train Mixtral MoE? #18

Open
tommarques56 opened this issue May 9, 2024 · 3 comments

Comments

@tommarques56

Hi, I just want to know if anybody has successfully trained a Mixtral model like the 8x7B? When I try, the output is random (unreadable).

Thanks!

@sshh12
Owner

sshh12 commented May 18, 2024

My guess is that the architecture is different enough that this code would not work: https://github.com/sshh12/multi_token/blob/main/multi_token/language_models/mistral.py. You could potentially duplicate that file and add a Mixtral variant based on the Hugging Face implementation.
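A minimal sketch of what such a Mixtral variant might look like, assuming it mirrors the pattern in mistral.py. Only the transformers imports (MixtralConfig, MixtralForCausalLM, AutoConfig, AutoModelForCausalLM) are known APIs; the "-lmm" naming and the multimodal mixin mentioned in the comments are assumptions, not the library's actual code.

```python
# Hypothetical sketch: a Mixtral counterpart to multi_token/language_models/mistral.py.
# MixtralConfig / MixtralForCausalLM are real transformers classes (>= 4.36);
# the "mixtral-lmm" naming and the multimodal mixin are assumptions about this repo.
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    MixtralConfig,
    MixtralForCausalLM,
)


class MixtralLMMConfig(MixtralConfig):
    # Assumed model_type naming, following the pattern used for the Mistral variant.
    model_type = "mixtral-lmm"


class MixtralLMMForCausalLM(MixtralForCausalLM):
    # In the real file this class would also inherit the library's multimodal
    # base class (whatever mistral.py mixes in) so the modality projectors and
    # extra-token embedding logic get attached to the Mixtral backbone.
    config_class = MixtralLMMConfig


# Register so the rest of the training code can load the model via Auto* classes.
AutoConfig.register("mixtral-lmm", MixtralLMMConfig)
AutoModelForCausalLM.register(MixtralLMMConfig, MixtralLMMForCausalLM)
```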

@tommarques56
Author

Do you think it is the same or better to fine-tune Mixtral directly, or to take a Mistral 7B fine-tuned for vision, create a MoE with tools like mergoo (please take a look at mergoo, because sshh12 + mergoo could be a life changer), and then fine-tune the resulting MoE?

@sshh12
Owner

sshh12 commented May 28, 2024

Hmm, my guess is that merging after training the modality projector wouldn't work (at least not out of the box with this library, because of all the custom torch modules that get strapped onto the model). However, it should definitely be doable to take an existing merge and add the modality to it by adding that Hugging Face architecture, as I mentioned.
