Tortoise: Scaling in QKVAttentionLegacy #557

farisalasmary · 2026-02-06T07:29:55Z

farisalasmary
Feb 6, 2026

Hello community,

I came across this line while reading the code and wondered why math.sqrt is applied twice to ch, which makes the scale the reciprocal of the fourth root of ch rather than of the square root.
Is this a typo or is it by design? Can anyone explain, please?

coqui-ai-TTS/TTS/tts/layers/tortoise/arch_utils.py

Line 67 in 36c6ff8

scale = 1 / math.sqrt(math.sqrt(ch))


eginhard · 2026-02-06T09:20:58Z

eginhard
Feb 6, 2026
Maintainer

It's in the original code as well, so likely by design: https://github.com/neonbjb/tortoise-tts/blob/8a2563ecabe93c4fb626f876dd0c52c966edef2f/tortoise/models/arch_util.py#L64

I don't see any particular reason mentioned in the paper, but it probably helps to make training more stable.
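
To illustrate why the fourth root may be intentional, here is a minimal sketch. It assumes that the scale is applied to both q and k before they are multiplied, which is not shown in the line quoted in this thread. Under that assumption the two factors of ch ** -0.25 combine into the usual 1 / sqrt(ch) attention scaling, while the intermediate product stays smaller, which matters in half precision. The vectors, the value 40.0, and the dot helper are made up for the example and are not taken from the Tortoise code.

```python
import math

FP16_MAX = 65504.0  # largest finite value representable in float16


def dot(a, b):
    """Plain dot product of two equal-length lists (hypothetical helper)."""
    return sum(x * y for x, y in zip(a, b))


ch = 64
q = [40.0] * ch  # made-up, fairly large activations
k = [40.0] * ch

# Same expression as the quoted line: ch ** -0.25
scale = 1 / math.sqrt(math.sqrt(ch))

# Variant A (assumed usage): scale q and k first, then multiply
pre_scaled = dot([x * scale for x in q], [x * scale for x in k])

# Variant B: multiply first, then divide by sqrt(ch)
raw = dot(q, k)
post_scaled = raw / math.sqrt(ch)

# Both variants give the same attention logit (up to rounding)
print(math.isclose(pre_scaled, post_scaled))

# The unscaled intermediate in variant B exceeds the float16 range,
# while variant A never produces a value that large
print(raw > FP16_MAX, pre_scaled > FP16_MAX)
```

Both variants produce the same logit, but only variant B passes through an intermediate value that would overflow if the computation ran in float16. This is only one plausible reading of "more stable", not a confirmed explanation from the paper.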
