Add LongCat-Flash #40730

molbap · 2025-09-05T16:24:19Z

What does this PR do?

As per title, adds support for LongCat-Flash, a 560B MoE from Meituan.

Status:

The current modeling_longcat_flash file allows loading the checkpoint without trust_remote_code, using a specific base_model_tp_plan found in the config. `from_pretrained(..., tp_plan='auto')` loads the model properly.
Chat template is as provided by authors.
A no-op hook was added to deepseek_v3 to abstract lora scaling.
Generations and correctness have been tested; all work.
A few modular adjustments remain to derive from DeepSeekv3, estimated at ~300 LOC total.
Quality and last touches, adding a new checkpoint to maximize compatibility with transformers.
Make CI happy
Include parallelism tests
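
To illustrate what a no-op hook means in the status list above, here is a minimal sketch of the general pattern. The class and method names are hypothetical and are not taken from the deepseek_v3 code: the parent class calls a hook that returns its input unchanged, and a derived model overrides only that hook to apply its own scaling.

```python
# Hypothetical illustration of a no-op hook; names are NOT from transformers.
class BaseProjection:
    """Stand-in for a parent-model layer."""

    def __init__(self, weight: float):
        self.weight = weight

    def scale_hook(self, value: float) -> float:
        # No-op by default, so the parent model's behavior is unchanged.
        return value

    def forward(self, x: float) -> float:
        return self.scale_hook(x * self.weight)


class ScaledProjection(BaseProjection):
    """Stand-in for a derived layer that only overrides the hook."""

    def __init__(self, weight: float, scaling: float):
        super().__init__(weight)
        self.scaling = scaling

    def scale_hook(self, value: float) -> float:
        return value * self.scaling


print(BaseProjection(2.0).forward(3.0))         # 6.0
print(ScaledProjection(2.0, 0.5).forward(3.0))  # 3.0
```

The point of the design is that the derived class reuses `forward` as-is, which is what keeps a modular subclass small.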

Launch snippet:

# launch_longcat.py
from transformers import LongcatFlashForCausalLM, AutoTokenizer
import torch

torch.manual_seed(30)
model_id = "meituan-longcat/LongCat-Flash-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_id)

chat = [
    {"role": "user", "content": "Hello! What is the capital of France? What can you tell me about it?"},
]

model = LongcatFlashForCausalLM.from_pretrained(
    model_id,
    tp_plan="auto",
    dtype=torch.bfloat16,
    trust_remote_code=False,  # can be removed.
)

inputs = tokenizer.apply_chat_template(
    chat, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=30)
print(tokenizer.batch_decode(outputs))
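
For a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones, so the decoded text includes the prompt. If only the completion is wanted, the prompt can be sliced off before decoding. The sketch below shows the idea on plain Python lists with made-up token ids; `strip_prompt` is a hypothetical helper, not part of transformers.

```python
from typing import List


def strip_prompt(sequences: List[List[int]], prompt_len: int) -> List[List[int]]:
    """Drop the first `prompt_len` token ids from each generated sequence."""
    return [seq[prompt_len:] for seq in sequences]


# Toy ids standing in for `inputs` and `outputs` in the script above.
# With tensors, the equivalent slice would be outputs[:, inputs.shape[-1]:].
prompt = [101, 7592, 102]
generated = [prompt + [2003, 3000]]

new_tokens = strip_prompt(generated, len(prompt))
print(new_tokens)  # [[2003, 3000]]
```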

Note that you will need at least 2x8 H100 to launch the model with TP. Run the following on each of the two nodes, setting --node_rank to 0 on the first node and 1 on the second:

torchrun --nproc_per_node=8 --nnodes=2 --node_rank=<0_or_1> --rdzv-id <an_id> --rdzv-backend c10d --rdzv-endpoint $NODE_ID:$NODE_PORT --log-dir ./logs_longcat launch_longcat.py
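
The 2x8 H100 requirement can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes 80 GB of memory per H100 and counts only the bf16 weights (2 bytes per parameter). It ignores activations, the KV cache, and framework overhead, so it gives a lower bound and is not a measurement from this PR.

```python
# Rough weight-memory estimate for a 560B-parameter model in bfloat16.
# Assumptions (not from the PR): 80 GB per H100, weights only.
PARAMS = 560e9        # total parameters, from the PR description
BYTES_PER_PARAM = 2   # bfloat16
GPU_MEMORY_GB = 80    # assumed H100 variant


def weights_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def fits(nnodes: int, nproc_per_node: int) -> bool:
    """True if the pooled GPU memory can hold the weights."""
    world_size = nnodes * nproc_per_node  # total ranks started by torchrun
    return world_size * GPU_MEMORY_GB >= weights_gb(PARAMS, BYTES_PER_PARAM)


print(weights_gb(PARAMS, BYTES_PER_PARAM))  # 1120.0
print(fits(1, 8))  # False: 8 GPUs pool 640 GB
print(fits(2, 8))  # True: 16 GPUs pool 1280 GB
```

Under these assumptions a single 8-GPU node cannot hold the weights, while two nodes can, with limited headroom left for everything else.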

The script then prints:

[Round 0] USER:Hello! What is the capital of France? What can you tell me about it? ASSISTANT:Hello! 😊 The capital of France is Paris, one of the most famous and beloved cities in the world. Here’s a quick overview of what makes Paris special:

1. Iconic Landmarks

Eiffel Tower – The global symbol of France, built in 1889 for the World's Fair.
Notre-Dame Cathedral – A masterpiece of Gothic architecture (currently under restoration after the 2019 fire).
Louvre Museum – The world’s largest art museum, home to the Mona Lisa and Venus de Milo.
Sacré-Cœur Basilica – A stunning white church atop Montmartre with panoramic views.
Arc de Triomphe – Honors French military victories, with the Tomb of the Unknown Soldier beneath it.
Champs-Élysées – A glamorous avenue leading to the Arc de Triomphe, lined with shops and cafés.

2. Culture & Arts

Paris is the "City of Light" (La Ville Lumière), a nickname from its early adoption of street lighting and its role as a center of enlightenment.
It’s a global hub for fashion (haute couture, Paris Fashion Week) and art (Impressionism, Picasso, Dali).
Famous literary figures like Hemingway, Fitzgerald, and Sartre lived and wrote here.

3. Food & Cuisine

Croissants, baguettes, macarons, and crème brûlée are just a few of its culinary delights.
Paris has over 100 Michelin-starred restaurants and countless cozy bistros.
The Marché d’Aligre and Rue Mouffetard are great for fresh produce and local flavors.

4. History & Politics

Founded in the 3rd century BC by the Parisii tribe, it became a major European city under the Romans.
The French Revolution (1789–1799) began here, leading to the fall of the monarchy.
Today, it’s the political and economic heart of France, housing the French President’s residence (Élysée Palace) and the National Assembly.


HuggingFaceDocBuilderDev · 2025-09-05T16:33:22Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

molbap · 2025-09-15T08:04:11Z

[run-slow] longcat_flash

ydshieh · 2025-09-15T13:06:51Z

[run-slow] longcat_flash

ydshieh · 2025-09-15T13:08:07Z

[run slow] longcat_flash

ydshieh · 2025-09-15T13:08:38Z

run-slow: longcat_flash

github-actions · 2025-09-15T13:10:06Z

This comment contains run-slow, running the specified jobs:

models: ['models/longcat_flash']
quantizations: [] ...

github-actions · 2025-09-16T08:57:50Z

This comment contains run-slow, running the specified jobs:

models: ['models/longcat_flash']
quantizations: [] ...

github-actions · 2025-09-16T09:09:15Z

This comment contains run-slow, running the specified jobs:

models: ['models/longcat_flash']
quantizations: [] ...

github-actions · 2025-09-16T13:07:44Z

This comment contains run-slow, running the specified jobs:

models: ['models/longcat_flash']
quantizations: [] ...


molbap · 2025-09-16T13:46:33Z

run-slow: longcat_flash

github-actions · 2025-09-16T13:48:07Z

This comment contains run-slow, running the specified jobs:

models: ['models/longcat_flash']
quantizations: [] ...


github-actions · 2025-09-16T17:03:41Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, longcat_flash

molbap · 2025-09-16T17:12:03Z

run-slow: longcat_flash

github-actions · 2025-09-16T17:13:31Z

This comment contains run-slow, running the specified jobs:

models: ['models/longcat_flash']
quantizations: [] ...

* working draft for LongCat
* BC changes to deepseek_v3 for modular
* format
* various modularities
* better tp plan
* better init
* minor changes
* make modular better
* clean up patterns
* Revert a couple of modular commits, because we won't convert in the end
* make things explicit.
* draft test
* toctree, tests and imports
* drop
* woops
* make better things
* update test
* update
* fixes
* style and CI
* convert stuff
* up
* ah, yes, that
* enable gen tests
* fix cache shape in test (sum of 2 things)
* fix tests
* comments
* re-Identitise
* minimize changes
* better defaults
* modular betterment
* fix configuration, add documentation
* fix init
* add integration tests
* add info
* simplify
* update slow tests
* fix
* style
* some additional long tests
* cpu-only long test
* fix last tests?
* urg
* cleaner tests why not
* fix
* improve slow tests, no skip
* style
* don't upcast
* one skip
* finally fix parallelism

molbap added 2 commits September 5, 2025 16:00

working draft for LongCat

21ac639

BC changes to deepseek_v3 for modular

c939eb2

molbap added 22 commits September 8, 2025 07:44

format

2535c28

Merge branch 'main' into new_moe

bac973f

various modularities

cddaba5

better tp plan

67943a4

better init

d765b18

minor changes

eebb41c

make modular better

414ba61

clean up patterns

7586dd7

Revert a couple of modular commits, because we won't convert in the end

b4584ad

make things explicit.

76e4555

draft test

c7c5a3d

toctree, tests and imports

6e58487

drop

8bb172d

woops

726828d

make better things

df11c0e

update test

fa3aacf

update

07af563

fixes

927a55e

style and CI

36c3dbb

convert stuff

d85c3e3

up

8cb4dc2

ah, yes, that

1343b65

molbap marked this pull request as ready for review September 9, 2025 16:58

molbap added 3 commits September 10, 2025 10:58

enable gen tests

275374a

fix cache shape in test (sum of 2 things)

f9d35c5

fix tests

74d2728

molbap requested a review from ArthurZucker September 10, 2025 15:45

molbap added 2 commits September 16, 2025 10:52

improve slow tests, no skip

a9b040e

style

b95af0a

huggingface deleted a comment from github-actions bot Sep 16, 2025

molbap and others added 2 commits September 16, 2025 15:05

don't upcast

f0dfec7

Merge branch 'main' into new_moe

8463c5b

molbap and others added 3 commits September 16, 2025 15:44

one skip

8cd2bb4

Merge branch 'new_moe' of github.com:huggingface/transformers into new_moe

68943ca

Merge branch 'main' into new_moe

f0eb7af

molbap added 2 commits September 16, 2025 19:02

finally fix parallelism

c85b064

Merge branch 'new_moe' of github.com:huggingface/transformers into new_moe

f385373

Merge branch 'main' into new_moe

66b414a

molbap merged commit 6cade29 into main Sep 17, 2025
25 checks passed

molbap deleted the new_moe branch September 17, 2025 12:48

ArthurZucker added the New model label Sep 17, 2025
