
Call for Conversion from Huggingface to Megads with MoE #381

Open
ControllableGeneration opened this issue Apr 24, 2024 · 0 comments
ControllableGeneration commented Apr 24, 2024

I want to convert a pretrained Hugging Face LLM into a Megatron-DeepSpeed (Megads) MoE version of the same model and save it.

The weights of each newly created expert should be identical to the weights of the dense Linear layer the experts are created from.

My guess was to use hf2megads_weight_converter.py. However, it raises the following error:

ValueError: Unrecognized weight type, with subname=mlp.deepspeed_moe.gate.wg.weight

And the code shows that it does not support MoE weights:

    def refactor(self):
        assert self.is_refactored == False
        new_w = None
        for pname, p in self.model.named_parameters():
            if pname in [
                    f"{self.mega_emb_wnum}.word_embeddings.weight",
                    f"{self.mega_lm_head_wnum}.lm_head.weight"
            ]:
                new_w = self._embedding_refactor(pname, p)
            elif pname == f"{self.mega_norm_wnum}.weight":
                new_w = self._direct_refactor(pname, p)
            else:
                mobj = self.decoder_pat.match(pname)
                layer_num = int(mobj.group(1))
                subname = mobj.group(2)
                hf_layer = layer_num - self.offset_num
                if subname in ["self_attention.query_key_value.weight"]:
                    new_w = self._qkv_refactor(pname, p, hf_layer)
                elif subname in ["mlp.dense_h_to_4h.weight"]:
                    new_w = self._mlphto4h_dense_refactor(pname, p, hf_layer)
                elif subname in [
                        "self_attention.dense.weight",
                        "mlp.dense_4h_to_h.weight"
                ]:
                    new_w = self._attn_dense_refactor(pname, p, hf_layer, subname)
                elif subname in [
                        "mlp.dense_h_to_4h1.weight",
                        "mlp.dense_h_to_4h2.weight"
                ]:
                    new_w = self._mlphto4h1_refactor()
                elif subname in [
                        "input_layernorm.weight",
                        "post_attention_layernorm.weight"
                ]:
                    new_w = self._direct_refactor(pname, p, hf_layer, subname)
                else:
                    raise ValueError(f"Unrecognized weight type, with subname={subname}")

Please consider adding MoE model conversion. I will also try it myself and let you know if I succeed.
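
In case it helps, here is a rough sketch of the kind of branch I have in mind, written against my understanding of the converter rather than its actual API. Only the gate parameter name comes from the error above; the expert-name regex, the HF attribute path (up_proj / down_proj), and the idea that the gate has no HF counterpart and keeps its fresh initialization are assumptions on my part.

import re
import torch

# Hypothetical helper, not the converter's actual API. Every name below except
# "mlp.deepspeed_moe.gate.wg.weight" is an assumption about the Megads/HF layouts.
MOE_EXPERT_PAT = re.compile(
    r"mlp\.deepspeed_moe\.experts\.deepspeed_experts\.(\d+)\.(.+)")

def refactor_moe_weight(subname, megads_param, hf_mlp):
    """Return the tensor a Megads MoE parameter should be loaded with."""
    if subname == "mlp.deepspeed_moe.gate.wg.weight":
        # The dense HF model has no router, so keep the freshly initialized gate.
        return megads_param.data.clone()

    mobj = MOE_EXPERT_PAT.match(subname)
    if mobj is not None:
        expert_subname = mobj.group(2)  # e.g. "dense_h_to_4h.weight"
        # Initialize every expert as a copy of the dense HF MLP weight, so each
        # expert computes the same function as the original Linear layer.
        if expert_subname == "dense_h_to_4h.weight":
            return hf_mlp.up_proj.weight.data.clone()    # assumed HF attribute
        if expert_subname == "dense_4h_to_h.weight":
            return hf_mlp.down_proj.weight.data.clone()  # assumed HF attribute

    raise ValueError(f"Unrecognized MoE weight, with subname={subname}")

The refactor() loop above would then call such a helper from a new elif branch whenever subname starts with "mlp.deepspeed_moe.", passing in the matching HF decoder layer's MLP module.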

ControllableGeneration changed the title from "Support call for MoE model conversion" to "Call for Conversion from Huggingface to Megads with MoE" on Apr 24, 2024