Tortoise: Scaling in QKVAttentionLegacy #557
Answered by eginhard. farisalasmary asked this question in Q&A.
Hello community, I came across this line when I was reading the code and I was wondering why the
Answered by eginhard on Feb 6, 2026
Replies: 1 comment
It's in the original code as well, so it is likely by design: https://github.com/neonbjb/tortoise-tts/blob/8a2563ecabe93c4fb626f876dd0c52c966edef2f/tortoise/models/arch_util.py#L64
I don't see a reason given in the paper, but it probably helps make training more stable.
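As a rough illustration of the stability point, here is a minimal sketch. It assumes the scaling in question is the pattern where `q` and `k` are each multiplied by `ch ** -0.25` before the attention product, instead of dividing the product by `sqrt(ch)` afterwards. The two are mathematically the same; what differs is the size of the intermediate value, which matters in float16 (largest finite value 65504). The vectors, names, and numbers below are made up for the example, and plain Python floats stand in for tensors.

```python
import math

FP16_MAX = 65504.0  # largest finite float16 value


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical per-head channel count and activations (illustration only).
ch = 64
q = [30.0] * ch
k = [40.0] * ch

# Variant A: take the product first, divide by sqrt(ch) afterwards.
raw = dot(q, k)                  # unscaled intermediate value
after = raw / math.sqrt(ch)

# Variant B: scale q and k by ch ** -0.25 each, then take the product.
scale = 1 / math.sqrt(math.sqrt(ch))
before = dot([x * scale for x in q], [y * scale for y in k])

print("unscaled intermediate:", raw, "fits in float16:", raw <= FP16_MAX)
print("pre-scaled result:", before, "fits in float16:", before <= FP16_MAX)
print("same logit either way:", math.isclose(before, after))
```

Both variants produce the same logit, but only the unscaled intermediate in variant A exceeds the float16 range. That would fit the guess that the scaling is there for stability, though it does not confirm the original author's intent.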
0 replies
Answer selected by eginhard