APUS-xDAN-4.0(MoE)

A Trillion-Parameter MoE Architecture Model That Outperforms Grok-1 and Runs on 4090 Graphics Cards

📊 Performance ✨ Resources 📖 Architecture 📂 Weights 🔨 Install 🚀 Inference 🤝 Acknowledgement



English | 简体中文

Important

📢 Welcome to the first high-performance trillion-parameter MoE architecture LLM trained jointly by xDAN and APUS. 📢
🤗 This is a high-performance MoE model with strong Math (GSM8K CoT: 79%) and Reasoning (MMLU: 75%) scores!
🙏 Feel free to use it with the inference code below.

News

  • 🙌 04-03 Model link released! [https://huggingface.co/xDAN-AI/APUS-xDAN-4.0-MOE]
  • 🙌 04-01 APUS-xDAN-4.0(MoE) model files will be released soon. Stay tuned for more details!
  • 🙌 03-31 APUS-xDAN-4.0(MoE) open-source model, quantized with "IQ-Quantized Tech" at 1.5-bit, 2-bit, and 4-bit, optimized to run on consumer-grade 4090 graphics cards.

📊 Performance

Comparison with Other Models

  • Alignment By xDAN-APUS4.0

Scores produced by different evaluation toolkits differ due to prompts, settings, and implementation details.

| Benchmark | Mode | APUS-xDAN-4.0(MoE) | Mixtral-8x7B(MoE) | Llama2-70B | Grok-1(MoE) |
| --- | --- | --- | --- | --- | --- |
| Total Params | GEN | 136B | 48B | 70B | 314B |
| Active Params | GEN | 60B | 12B | 70B | 78.5B |
| MMLU | PPL | 73.1 | 71.3 | 69.7 | 73.0 |
| BIG-Bench-Hard | GEN | 66.4 | 67.1 | 64.9 | 71.7 |
| GSM-8K | GEN | 78.2 | 65.7 | 63.4 | 62.9 |
| MATH | GEN | 29.5 | 22.7 | 12.0 | 23.9 |

✨ Resources

| Model | Quantization | Size | Context | Hardware Requirement |
| --- | --- | --- | --- | --- |
| APUS-xDAN4.0-MoE-0402.Q2_K.gguf | Q2_K | 39G | 32k | 2x24G GPU memory |
| APUS-xDAN4.0-MoE-0402.IQ3_XXS.gguf | IQ3_XXS | 41G | 32k | 2x24G GPU memory |
| APUS-xDAN4.0-MoE-0402.Q3_K_M_Matrix.gguf | Q3_K_M | 51G | 32k | 2x24G GPU memory |
| APUS-xDAN4.0-MoE-0402.Q4_K_M.gguf | Q4_K_M | 64G | 32k | 3x24G GPU memory |

Deployment

📖 Model Architecture

The APUS-xDAN-4.0(MoE) model is mainly composed of 32 identical MoE transformer blocks. The main difference between a MoE transformer block and an ordinary transformer block is that the FFN layer is replaced by a MoE FFN layer. In the MoE FFN layer, the tensor first passes through a gate layer that computes a score for each expert, and the top-k of the 8 experts are selected based on these scores. The outputs of the top-k experts are then aggregated to produce the final output of the MoE FFN layer. Each expert consists of 3 linear layers. It is worth noting that all norm layers of APUS-xDAN4.0 MoE use RMSNorm, the same as LLama. In the attention layer, the Q matrix of APUS-xDAN4.0 MoE has shape (4096, 4096), while the K and V matrices have shape (4096, 1024).
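The routing and aggregation described above can be sketched roughly as a PyTorch module. This is a minimal illustration, not the repository's actual implementation: only the hidden size (4096), the 8 experts, the top-k routing, and the 3 linear layers per expert come from the description; the intermediate size, activation function, and top-k value of 2 are assumptions.

```python
# Minimal sketch of a MoE FFN layer (illustrative; not the repo's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """One expert: 3 linear layers in a SwiGLU-style FFN (layout assumed)."""
    def __init__(self, hidden_size=4096, intermediate_size=14336):
        super().__init__()
        self.w1 = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.w2 = nn.Linear(intermediate_size, hidden_size, bias=False)
        self.w3 = nn.Linear(hidden_size, intermediate_size, bias=False)

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))


class MoEFFN(nn.Module):
    """Gate scores every expert, keeps the top-k, and mixes their outputs."""
    def __init__(self, hidden_size=4096, num_experts=8, top_k=2):
        super().__init__()
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(Expert(hidden_size) for _ in range(num_experts))
        self.top_k = top_k

    def forward(self, x):  # x: (num_tokens, hidden_size)
        scores = F.softmax(self.gate(x), dim=-1)            # per-expert scores
        weights, idx = scores.topk(self.top_k, dim=-1)       # top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e                         # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * self.experts[e](x[mask])
        return out
```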

We plot the architecture as follows:

📂 Model Weights

Hugging Face Format

Raw Format

You can download the checkpoints via magnet link or from Hugging Face.

Download via HF

If you are unable to access Hugging Face, please try hf-mirror

# Download from Hugging Face
git lfs install
git clone https://huggingface.co/xDAN-AI/APUS-xDAN-4.0-MOE
# Merge the split files (HF format only)
cd APUS-xDAN-4.0-MOE/
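If you use the Hugging Face format rather than the GGUF files, the checkpoint can in principle be loaded with the transformers library. The snippet below is only a hedged sketch: it assumes the repository ships a transformers-compatible configuration and chat template, and the dtype and device map are placeholders to adjust for your hardware.

```python
# Hedged sketch: load the HF-format checkpoint with transformers (assumed setup).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xDAN-AI/APUS-xDAN-4.0-MOE"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # assumption: bf16 weights fit your GPUs
    device_map="auto",            # spread layers across available devices
    trust_remote_code=True,       # in case the architecture needs custom code
)

# Assumes the tokenizer defines a chat template.
messages = [{"role": "user", "content": "Solve 12 * 7 and explain briefly."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```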



🚀 Inference

Text Completion
```bash
./main -m APUS-xDAN4.0-MoE-0402.Q2_K.gguf --n-gpu-layers 99 \
  --prompt "You are a helpful assistant named APUS-xDAN4.0 MoE. " --chatml \
  --interactive \
  --temp 0.7 \
  --ctx-size 4096
```
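The same GGUF file can also be driven from Python through the llama-cpp-python bindings. The sketch below mirrors the CLI call above (same model path, GPU layer count, and context size); it is an assumed alternative, not an official example from this repository, and requires llama-cpp-python built with GPU support.

```python
# Hedged sketch: run the Q2_K GGUF with llama-cpp-python instead of the ./main CLI.
from llama_cpp import Llama

llm = Llama(
    model_path="APUS-xDAN4.0-MoE-0402.Q2_K.gguf",
    n_gpu_layers=99,   # offload all layers, as in the CLI example
    n_ctx=4096,
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant named APUS-xDAN4.0 MoE."},
        {"role": "user", "content": "Give me three facts about mixture-of-experts models."},
    ],
    temperature=0.7,
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```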

🖊️ Citation

@misc{2024apusxdan4moe,
    title={xDAN-APUS4.0: High Performance Alignment Model Trainer},
    author={xDAN-APUS4.0 Contributors},
    howpublished = {\url{https://github.com/shootime2021/APUS-xDAN-4.0-moe}},
    year={2024}
}

About

It's an open-source LLM based on an MoE structure.
