🤗 Hugging Face | 🤖 ModelScope
Introducing Ring-2.5-1T: the world's first open-source trillion-parameter thinking model based on hybrid linear attention architecture.
In a major step toward general-purpose AI agents, we're scaling hybrid linear attention across pre-training and RL. Our efficient 1:7 MLA + Lightning Linear Attention boosts reasoning speed and exploration, while expanded RL training enhances deep thinking and long-horizon task execution.
Ring-2.5-1T model achieves gold-medal🏅 level performance at both IMO 2025 and CMO 2025. For detailed solutions of our model, please see examples.
| Model | Context Length | Download |
|---|---|---|
| Ring-2.5-1T | 128K -> 256K (YaRN) | 🤗 HuggingFace 🤖 ModelScope |
Note: If you are interested in previous version, please visit the past model collections in Huggingface or ModelScope.
We will later submit our model to SGLang official release, now we can prepare the environment following steps:
git clone -b ling_2_5 git@github.com:antgroup/sglang.git
cd sglang
# Install the python packages
pip install --upgrade pip
pip install -e "python"Both BF16 and FP8 models are supported by SGLang now. It depends on the dtype of the model in ${MODEL_PATH}.
Here is the example to run Ring-1T with multiple GPU nodes, where the master node IP is ${MASTER_IP} and server port is ${PORT}:
- Start server:
# Node 0:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 0
# Node 1:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 1
# Node 2:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 2
# Node 3:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 3
# This is only an example. Please adjust arguments according to your actual environment.- Client:
curl -s http://${MASTER_IP}:${PORT}/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "auto", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'More usage can be found here
This code repository is licensed under the MIT License.
If you find our work helpful, feel free to give us a cite.
