
Model conversion support for T5 and FLAN-T5 model variants #8055

Merged · 8 commits into ggerganov:master · Jun 24, 2024

Conversation

@fairydreaming (Collaborator) commented Jun 21, 2024:

This PR adds model conversion support for T5 and FLAN-T5 model variants. It is the first in a series of PRs adding support for the T5 and FLAN-T5 model families.

@github-actions bot added the python (python script changes) label on Jun 21, 2024.
@mofosyne added the Review Complexity: Medium (generally requires more time to grok, but manageable by beginner-to-medium expertise) label on Jun 21, 2024.
@felladrin (Contributor) left a comment:


Thank you for this implementation, @fairydreaming!
I've just tested the conversion of t5-small and it worked great!
I hope you can also bring support for flan-t5 later 🙏

@fairydreaming (Collaborator, Author):

> Thank you for this implementation, @fairydreaming! I've just tested the conversion of t5-small and it worked great! I hope you can also bring support for flan-t5 later 🙏

Hmm, since it's the same architecture with small tweaks (gated gelu instead of relu, separate lm_head), it shouldn't be hard.
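The two architectural tweaks mentioned above can be sketched side by side. This is a minimal NumPy illustration (not the PR's actual conversion code); the weight names `wi`, `wi_0`, `wi_1`, `wo` follow the Hugging Face T5 convention, and the GELU uses the common tanh approximation:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, widely used in T5-family implementations
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def t5_ffn(x, wi, wo):
    # original T5: one up-projection followed by ReLU
    return np.maximum(x @ wi, 0.0) @ wo

def flan_t5_ffn(x, wi_0, wi_1, wo):
    # FLAN-T5 (t5-v1.1 style): gated GELU -- two up-projections,
    # elementwise product of the gated branch with the linear branch
    return (gelu(x @ wi_0) * (x @ wi_1)) @ wo
```

Both variants map the hidden size to the feed-forward size and back, so the converter mostly has to pick up the extra `wi_1` tensor and record the activation type.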

@fairydreaming (Collaborator, Author):

> I hope you can also bring support for flan-t5 later 🙏

@felladrin It's now supported.

@fairydreaming fairydreaming changed the title Model conversion support for T5 model variants Model conversion support for T5 and FLAN-T5 model variants Jun 23, 2024
@felladrin (Contributor):

Amazing work!

I have just one more thought:
Would it be possible not to require the spiece.model file when converting?
I'm asking because MBZUAI/LaMini-T5-61M and MBZUAI/LaMini-Flan-T5-77M, for example, don't have this file in their repo; but even with this file missing they can be converted to GGUF by huggingface/candle (and I'd guess the answer is somewhere around candle-transformers/src/models/t5.rs).

…tokens tensors (they are duplicates of shared tensor)
@fairydreaming (Collaborator, Author):

> Amazing work!
>
> I have just one more thought: Would it be possible not to require the spiece.model file when converting? I'm asking because MBZUAI/LaMini-T5-61M and MBZUAI/LaMini-Flan-T5-77M, for example, don't have this file in their repo; but even with this file missing they can be converted to GGUF by huggingface/candle (and I'd guess the answer is somewhere around candle-transformers/src/models/t5.rs).

@felladrin From what I see, all models from the T5 and FLAN-T5 families use the same spiece.model file. If they fine-tuned T5 or FLAN-T5 to create the LaMini-T5 and LaMini-Flan-T5 models without changing the tokens, then you can simply copy spiece.model from T5 or FLAN-T5. I added one more commit that allows converting both of the LaMini models you mentioned. They seem to work just fine on my t5 branch (https://github.com/fairydreaming/llama.cpp/tree/t5):

./llama-cli --temp 0.01 -m models/lamini-flan-t5-77m.gguf -p 'how can I become more healthy?'

...
llama_output_reserve: reallocating output buffer from size 0.12 MiB to 1.00 MiB
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: reallocating buffers automatically
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: reallocating buffers automatically
 You can become more healthy by practicing good nutrition, getting enough sleep, eating a balanced diet, staying hydrated, and getting enough sleep. [end of text]

llama_print_timings:        load time =      16.45 ms
llama_print_timings:      sample time =       3.35 ms /    30 runs   (    0.11 ms per token,  8957.90 tokens per second)
llama_print_timings: prompt eval time =      14.23 ms /     9 tokens (    1.58 ms per token,   632.42 tokens per second)
llama_print_timings:        eval time =     141.37 ms /    29 runs   (    4.87 ms per token,   205.13 tokens per second)
llama_print_timings:       total time =     222.85 ms /    38 tokens
Log end
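The "copy spiece.model from the base model" workaround described above could be scripted roughly like this. It is a sketch, not part of the PR: the helper name `fetch_spiece` and the default `base_repo` are hypothetical, and it only makes sense when the fine-tune did not change the tokenizer. It uses `hf_hub_download` from the `huggingface_hub` package:

```python
from pathlib import Path

def fetch_spiece(model_dir: str, base_repo: str = "google/flan-t5-base") -> Path:
    """Place spiece.model into a local model dir that lacks it (e.g. the
    LaMini fine-tunes), copying it from a base T5/FLAN-T5 repo on the Hub.
    Only valid when the fine-tune kept the original tokenizer."""
    target = Path(model_dir) / "spiece.model"
    if target.exists():
        return target  # nothing to do
    # imported lazily so the function can short-circuit without the package
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub
    src = hf_hub_download(repo_id=base_repo, filename="spiece.model")
    target.write_bytes(Path(src).read_bytes())
    return target
```

After this, convert-hf-to-gguf.py should find the file in the model directory as usual.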

@felladrin (Contributor):

Thank you! Onwards!

@compilade (Collaborator) left a comment:

I only have some very minor comments on this, which is great!

Comment on lines +347 to +348
MODEL_TENSOR.DEC_OUTPUT_NORM: "dec.output_norm",
MODEL_TENSOR.ENC_ATTN_NORM: "enc.blk.{bid}.attn_norm",
Collaborator:

The enc and dec prefixes will (eventually) need to be also handled by the new markdown output mode of gguf-dump.py (#7853).

Can be fixed in a separate PR, I'm mentioning this for future reference.

(@mofosyne, you should be aware of this)

@fairydreaming (Collaborator, Author):

@compilade I tried it on one example model (python3 gguf-py/scripts/gguf-dump.py --markdown /mnt/md0/models/t5-small.gguf) and I'm not sure what should be fixed; can you be more specific?

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
0 dec.blk.0.attn_k.weight Dec Block 0 Attention Key (W) (~262K) 262144 512 x 512 x 1 x 1 F16
1 dec.blk.0.attn_o.weight Dec Block 0 Attn_O (W) (~262K) 262144 512 x 512 x 1 x 1 F16
2 dec.blk.0.attn_q.weight Dec Block 0 Attention Query (W) (~262K) 262144 512 x 512 x 1 x 1 F16
3 dec.blk.0.attn_rel_b.weight Dec Block 0 Attn_Rel_B (W) ( 256) 256 8 x 32 x 1 x 1 F16
4 dec.blk.0.attn_v.weight Dec Block 0 Attention Value (W) (~262K) 262144 512 x 512 x 1 x 1 F16
5 dec.blk.0.attn_norm.weight Dec Block 0 Attention Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
6 dec.blk.0.cross_attn_k.weight Dec Block 0 Cross_Attn_K (W) (~262K) 262144 512 x 512 x 1 x 1 F16
7 dec.blk.0.cross_attn_o.weight Dec Block 0 Cross_Attn_O (W) (~262K) 262144 512 x 512 x 1 x 1 F16
8 dec.blk.0.cross_attn_q.weight Dec Block 0 Cross_Attn_Q (W) (~262K) 262144 512 x 512 x 1 x 1 F16
9 dec.blk.0.cross_attn_rel_b.weight Dec Block 0 Cross_Attn_Rel_B (W) ( 256) 256 8 x 32 x 1 x 1 F16
10 dec.blk.0.cross_attn_v.weight Dec Block 0 Cross_Attn_V (W) (~262K) 262144 512 x 512 x 1 x 1 F16
11 dec.blk.0.cross_attn_norm.weight Dec Block 0 Cross_Attn_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
12 dec.blk.0.ffn_up.weight Dec Block 0 Feed-Forward Network "Up" (W) ( ~1M) 1048576 512 x 2048 x 1 x 1 F16
13 dec.blk.0.ffn_down.weight Dec Block 0 Feed-Forward Network "Down" (W) ( ~1M) 1048576 2048 x 512 x 1 x 1 F16
14 dec.blk.0.ffn_norm.weight Dec Block 0 Feed-Forward Network Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
15 dec.blk.1.attn_k.weight Dec Block 1 Attention Key (W) (~262K) 262144 512 x 512 x 1 x 1 F16
16 dec.blk.1.attn_o.weight Dec Block 1 Attn_O (W) (~262K) 262144 512 x 512 x 1 x 1 F16
17 dec.blk.1.attn_q.weight Dec Block 1 Attention Query (W) (~262K) 262144 512 x 512 x 1 x 1 F16
18 dec.blk.1.attn_v.weight Dec Block 1 Attention Value (W) (~262K) 262144 512 x 512 x 1 x 1 F16
19 dec.blk.1.attn_norm.weight Dec Block 1 Attention Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
20 dec.blk.1.cross_attn_k.weight Dec Block 1 Cross_Attn_K (W) (~262K) 262144 512 x 512 x 1 x 1 F16
21 dec.blk.1.cross_attn_o.weight Dec Block 1 Cross_Attn_O (W) (~262K) 262144 512 x 512 x 1 x 1 F16
22 dec.blk.1.cross_attn_q.weight Dec Block 1 Cross_Attn_Q (W) (~262K) 262144 512 x 512 x 1 x 1 F16
23 dec.blk.1.cross_attn_v.weight Dec Block 1 Cross_Attn_V (W) (~262K) 262144 512 x 512 x 1 x 1 F16
24 dec.blk.1.cross_attn_norm.weight Dec Block 1 Cross_Attn_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
25 dec.blk.1.ffn_up.weight Dec Block 1 Feed-Forward Network "Up" (W) ( ~1M) 1048576 512 x 2048 x 1 x 1 F16
26 dec.blk.1.ffn_down.weight Dec Block 1 Feed-Forward Network "Down" (W) ( ~1M) 1048576 2048 x 512 x 1 x 1 F16
27 dec.blk.1.ffn_norm.weight Dec Block 1 Feed-Forward Network Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
28 dec.blk.2.attn_k.weight Dec Block 2 Attention Key (W) (~262K) 262144 512 x 512 x 1 x 1 F16
29 dec.blk.2.attn_o.weight Dec Block 2 Attn_O (W) (~262K) 262144 512 x 512 x 1 x 1 F16
30 dec.blk.2.attn_q.weight Dec Block 2 Attention Query (W) (~262K) 262144 512 x 512 x 1 x 1 F16
31 dec.blk.2.attn_v.weight Dec Block 2 Attention Value (W) (~262K) 262144 512 x 512 x 1 x 1 F16
32 dec.blk.2.attn_norm.weight Dec Block 2 Attention Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
33 dec.blk.2.cross_attn_k.weight Dec Block 2 Cross_Attn_K (W) (~262K) 262144 512 x 512 x 1 x 1 F16
34 dec.blk.2.cross_attn_o.weight Dec Block 2 Cross_Attn_O (W) (~262K) 262144 512 x 512 x 1 x 1 F16
35 dec.blk.2.cross_attn_q.weight Dec Block 2 Cross_Attn_Q (W) (~262K) 262144 512 x 512 x 1 x 1 F16
36 dec.blk.2.cross_attn_v.weight Dec Block 2 Cross_Attn_V (W) (~262K) 262144 512 x 512 x 1 x 1 F16
37 dec.blk.2.cross_attn_norm.weight Dec Block 2 Cross_Attn_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
38 dec.blk.2.ffn_up.weight Dec Block 2 Feed-Forward Network "Up" (W) ( ~1M) 1048576 512 x 2048 x 1 x 1 F16
39 dec.blk.2.ffn_down.weight Dec Block 2 Feed-Forward Network "Down" (W) ( ~1M) 1048576 2048 x 512 x 1 x 1 F16
40 dec.blk.2.ffn_norm.weight Dec Block 2 Feed-Forward Network Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
41 dec.blk.3.attn_k.weight Dec Block 3 Attention Key (W) (~262K) 262144 512 x 512 x 1 x 1 F16
42 dec.blk.3.attn_o.weight Dec Block 3 Attn_O (W) (~262K) 262144 512 x 512 x 1 x 1 F16
43 dec.blk.3.attn_q.weight Dec Block 3 Attention Query (W) (~262K) 262144 512 x 512 x 1 x 1 F16
44 dec.blk.3.attn_v.weight Dec Block 3 Attention Value (W) (~262K) 262144 512 x 512 x 1 x 1 F16
45 dec.blk.3.attn_norm.weight Dec Block 3 Attention Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
46 dec.blk.3.cross_attn_k.weight Dec Block 3 Cross_Attn_K (W) (~262K) 262144 512 x 512 x 1 x 1 F16
47 dec.blk.3.cross_attn_o.weight Dec Block 3 Cross_Attn_O (W) (~262K) 262144 512 x 512 x 1 x 1 F16
48 dec.blk.3.cross_attn_q.weight Dec Block 3 Cross_Attn_Q (W) (~262K) 262144 512 x 512 x 1 x 1 F16
49 dec.blk.3.cross_attn_v.weight Dec Block 3 Cross_Attn_V (W) (~262K) 262144 512 x 512 x 1 x 1 F16
50 dec.blk.3.cross_attn_norm.weight Dec Block 3 Cross_Attn_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
51 dec.blk.3.ffn_up.weight Dec Block 3 Feed-Forward Network "Up" (W) ( ~1M) 1048576 512 x 2048 x 1 x 1 F16
52 dec.blk.3.ffn_down.weight Dec Block 3 Feed-Forward Network "Down" (W) ( ~1M) 1048576 2048 x 512 x 1 x 1 F16
53 dec.blk.3.ffn_norm.weight Dec Block 3 Feed-Forward Network Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
54 dec.blk.4.attn_k.weight Dec Block 4 Attention Key (W) (~262K) 262144 512 x 512 x 1 x 1 F16
55 dec.blk.4.attn_o.weight Dec Block 4 Attn_O (W) (~262K) 262144 512 x 512 x 1 x 1 F16
56 dec.blk.4.attn_q.weight Dec Block 4 Attention Query (W) (~262K) 262144 512 x 512 x 1 x 1 F16
57 dec.blk.4.attn_v.weight Dec Block 4 Attention Value (W) (~262K) 262144 512 x 512 x 1 x 1 F16
58 dec.blk.4.attn_norm.weight Dec Block 4 Attention Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
59 dec.blk.4.cross_attn_k.weight Dec Block 4 Cross_Attn_K (W) (~262K) 262144 512 x 512 x 1 x 1 F16
60 dec.blk.4.cross_attn_o.weight Dec Block 4 Cross_Attn_O (W) (~262K) 262144 512 x 512 x 1 x 1 F16
61 dec.blk.4.cross_attn_q.weight Dec Block 4 Cross_Attn_Q (W) (~262K) 262144 512 x 512 x 1 x 1 F16
62 dec.blk.4.cross_attn_v.weight Dec Block 4 Cross_Attn_V (W) (~262K) 262144 512 x 512 x 1 x 1 F16
63 dec.blk.4.cross_attn_norm.weight Dec Block 4 Cross_Attn_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
64 dec.blk.4.ffn_up.weight Dec Block 4 Feed-Forward Network "Up" (W) ( ~1M) 1048576 512 x 2048 x 1 x 1 F16
65 dec.blk.4.ffn_down.weight Dec Block 4 Feed-Forward Network "Down" (W) ( ~1M) 1048576 2048 x 512 x 1 x 1 F16
66 dec.blk.4.ffn_norm.weight Dec Block 4 Feed-Forward Network Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
67 dec.blk.5.attn_k.weight Dec Block 5 Attention Key (W) (~262K) 262144 512 x 512 x 1 x 1 F16
68 dec.blk.5.attn_o.weight Dec Block 5 Attn_O (W) (~262K) 262144 512 x 512 x 1 x 1 F16
69 dec.blk.5.attn_q.weight Dec Block 5 Attention Query (W) (~262K) 262144 512 x 512 x 1 x 1 F16
70 dec.blk.5.attn_v.weight Dec Block 5 Attention Value (W) (~262K) 262144 512 x 512 x 1 x 1 F16
71 dec.blk.5.attn_norm.weight Dec Block 5 Attention Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
72 dec.blk.5.cross_attn_k.weight Dec Block 5 Cross_Attn_K (W) (~262K) 262144 512 x 512 x 1 x 1 F16
73 dec.blk.5.cross_attn_o.weight Dec Block 5 Cross_Attn_O (W) (~262K) 262144 512 x 512 x 1 x 1 F16
74 dec.blk.5.cross_attn_q.weight Dec Block 5 Cross_Attn_Q (W) (~262K) 262144 512 x 512 x 1 x 1 F16
75 dec.blk.5.cross_attn_v.weight Dec Block 5 Cross_Attn_V (W) (~262K) 262144 512 x 512 x 1 x 1 F16
76 dec.blk.5.cross_attn_norm.weight Dec Block 5 Cross_Attn_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
77 dec.blk.5.ffn_up.weight Dec Block 5 Feed-Forward Network "Up" (W) ( ~1M) 1048576 512 x 2048 x 1 x 1 F16
78 dec.blk.5.ffn_down.weight Dec Block 5 Feed-Forward Network "Down" (W) ( ~1M) 1048576 2048 x 512 x 1 x 1 F16
79 dec.blk.5.ffn_norm.weight Dec Block 5 Feed-Forward Network Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
80 dec.output_norm.weight Dec Output Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
81 enc.blk.0.attn_k.weight Enc Block 0 Attention Key (W) (~262K) 262144 512 x 512 x 1 x 1 F16
82 enc.blk.0.attn_o.weight Enc Block 0 Attn_O (W) (~262K) 262144 512 x 512 x 1 x 1 F16
83 enc.blk.0.attn_q.weight Enc Block 0 Attention Query (W) (~262K) 262144 512 x 512 x 1 x 1 F16
84 enc.blk.0.attn_rel_b.weight Enc Block 0 Attn_Rel_B (W) ( 256) 256 8 x 32 x 1 x 1 F16
85 enc.blk.0.attn_v.weight Enc Block 0 Attention Value (W) (~262K) 262144 512 x 512 x 1 x 1 F16
86 enc.blk.0.attn_norm.weight Enc Block 0 Attention Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
87 enc.blk.0.ffn_up.weight Enc Block 0 Feed-Forward Network "Up" (W) ( ~1M) 1048576 512 x 2048 x 1 x 1 F16
88 enc.blk.0.ffn_down.weight Enc Block 0 Feed-Forward Network "Down" (W) ( ~1M) 1048576 2048 x 512 x 1 x 1 F16
89 enc.blk.0.ffn_norm.weight Enc Block 0 Feed-Forward Network Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
90 enc.blk.1.attn_k.weight Enc Block 1 Attention Key (W) (~262K) 262144 512 x 512 x 1 x 1 F16
91 enc.blk.1.attn_o.weight Enc Block 1 Attn_O (W) (~262K) 262144 512 x 512 x 1 x 1 F16
92 enc.blk.1.attn_q.weight Enc Block 1 Attention Query (W) (~262K) 262144 512 x 512 x 1 x 1 F16
93 enc.blk.1.attn_v.weight Enc Block 1 Attention Value (W) (~262K) 262144 512 x 512 x 1 x 1 F16
94 enc.blk.1.attn_norm.weight Enc Block 1 Attention Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
95 enc.blk.1.ffn_up.weight Enc Block 1 Feed-Forward Network "Up" (W) ( ~1M) 1048576 512 x 2048 x 1 x 1 F16
96 enc.blk.1.ffn_down.weight Enc Block 1 Feed-Forward Network "Down" (W) ( ~1M) 1048576 2048 x 512 x 1 x 1 F16
97 enc.blk.1.ffn_norm.weight Enc Block 1 Feed-Forward Network Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
98 enc.blk.2.attn_k.weight Enc Block 2 Attention Key (W) (~262K) 262144 512 x 512 x 1 x 1 F16
99 enc.blk.2.attn_o.weight Enc Block 2 Attn_O (W) (~262K) 262144 512 x 512 x 1 x 1 F16
100 enc.blk.2.attn_q.weight Enc Block 2 Attention Query (W) (~262K) 262144 512 x 512 x 1 x 1 F16
101 enc.blk.2.attn_v.weight Enc Block 2 Attention Value (W) (~262K) 262144 512 x 512 x 1 x 1 F16
102 enc.blk.2.attn_norm.weight Enc Block 2 Attention Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
103 enc.blk.2.ffn_up.weight Enc Block 2 Feed-Forward Network "Up" (W) ( ~1M) 1048576 512 x 2048 x 1 x 1 F16
104 enc.blk.2.ffn_down.weight Enc Block 2 Feed-Forward Network "Down" (W) ( ~1M) 1048576 2048 x 512 x 1 x 1 F16
105 enc.blk.2.ffn_norm.weight Enc Block 2 Feed-Forward Network Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
106 enc.blk.3.attn_k.weight Enc Block 3 Attention Key (W) (~262K) 262144 512 x 512 x 1 x 1 F16
107 enc.blk.3.attn_o.weight Enc Block 3 Attn_O (W) (~262K) 262144 512 x 512 x 1 x 1 F16
108 enc.blk.3.attn_q.weight Enc Block 3 Attention Query (W) (~262K) 262144 512 x 512 x 1 x 1 F16
109 enc.blk.3.attn_v.weight Enc Block 3 Attention Value (W) (~262K) 262144 512 x 512 x 1 x 1 F16
110 enc.blk.3.attn_norm.weight Enc Block 3 Attention Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
111 enc.blk.3.ffn_up.weight Enc Block 3 Feed-Forward Network "Up" (W) ( ~1M) 1048576 512 x 2048 x 1 x 1 F16
112 enc.blk.3.ffn_down.weight Enc Block 3 Feed-Forward Network "Down" (W) ( ~1M) 1048576 2048 x 512 x 1 x 1 F16
113 enc.blk.3.ffn_norm.weight Enc Block 3 Feed-Forward Network Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
114 enc.blk.4.attn_k.weight Enc Block 4 Attention Key (W) (~262K) 262144 512 x 512 x 1 x 1 F16
115 enc.blk.4.attn_o.weight Enc Block 4 Attn_O (W) (~262K) 262144 512 x 512 x 1 x 1 F16
116 enc.blk.4.attn_q.weight Enc Block 4 Attention Query (W) (~262K) 262144 512 x 512 x 1 x 1 F16
117 enc.blk.4.attn_v.weight Enc Block 4 Attention Value (W) (~262K) 262144 512 x 512 x 1 x 1 F16
118 enc.blk.4.attn_norm.weight Enc Block 4 Attention Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
119 enc.blk.4.ffn_up.weight Enc Block 4 Feed-Forward Network "Up" (W) ( ~1M) 1048576 512 x 2048 x 1 x 1 F16
120 enc.blk.4.ffn_down.weight Enc Block 4 Feed-Forward Network "Down" (W) ( ~1M) 1048576 2048 x 512 x 1 x 1 F16
121 enc.blk.4.ffn_norm.weight Enc Block 4 Feed-Forward Network Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
122 enc.blk.5.attn_k.weight Enc Block 5 Attention Key (W) (~262K) 262144 512 x 512 x 1 x 1 F16
123 enc.blk.5.attn_o.weight Enc Block 5 Attn_O (W) (~262K) 262144 512 x 512 x 1 x 1 F16
124 enc.blk.5.attn_q.weight Enc Block 5 Attention Query (W) (~262K) 262144 512 x 512 x 1 x 1 F16
125 enc.blk.5.attn_v.weight Enc Block 5 Attention Value (W) (~262K) 262144 512 x 512 x 1 x 1 F16
126 enc.blk.5.attn_norm.weight Enc Block 5 Attention Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
127 enc.blk.5.ffn_up.weight Enc Block 5 Feed-Forward Network "Up" (W) ( ~1M) 1048576 512 x 2048 x 1 x 1 F16
128 enc.blk.5.ffn_down.weight Enc Block 5 Feed-Forward Network "Down" (W) ( ~1M) 1048576 2048 x 512 x 1 x 1 F16
129 enc.blk.5.ffn_norm.weight Enc Block 5 Feed-Forward Network Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
130 enc.output_norm.weight Enc Output Normalization (W) ( 512) 512 512 x 1 x 1 x 1 F32
131 token_embd.weight Token Embedding (W) ( ~16M) 16449536 512 x 32128 x 1 x 1 F16

@compilade (Collaborator) commented Jun 24, 2024:

In the markdown output of gguf-dump.py, there's currently a special case for tensor names which don't start with blk (ref: #7853 (comment), it seemed reasonable at the time), and it puts them all in the same section (so that token_embd.weight is in the same section as output.weight). If you try it on a non-T5 model (e.g. tinyllama or something), you'll notice that there are sections for each layer number.
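The sectioning behaviour described above can be illustrated with a small sketch. This is hypothetical code, not the real gguf-dump.py logic: it groups tensor names by layer number the way the markdown mode does for blk.N names, extended to recognize the enc./dec. prefixes that T5 introduces, and lumps everything else into one "base" section:

```python
import re
from collections import defaultdict

def group_tensors(names):
    """Group GGUF tensor names into per-layer sections.

    Names like "blk.3.attn_k.weight", "enc.blk.0.ffn_up.weight" and
    "dec.blk.5.attn_v.weight" get a (prefix, layer) key; anything that
    doesn't start with an optional enc./dec. prefix plus blk.N (e.g.
    "token_embd.weight", "output.weight") falls into the ("base", -1) bucket.
    """
    sections = defaultdict(list)
    for name in names:
        m = re.match(r"(?:(enc|dec)\.)?blk\.(\d+)\.", name)
        key = (m.group(1) or "", int(m.group(2))) if m else ("base", -1)
        sections[key].append(name)
    return dict(sections)
```

With such a grouping, the T5 dump above would split into per-layer encoder and decoder sections instead of one flat table.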

@fairydreaming (Collaborator, Author):

Fixed in #8090

@@ -49,6 +49,7 @@ class LLM:
EXPERT_WEIGHTS_SCALE = "{arch}.expert_weights_scale"
POOLING_TYPE = "{arch}.pooling_type"
LOGIT_SCALE = "{arch}.logit_scale"
DECODER_START_TOKEN_ID = "{arch}.decoder_start_token_id"
Collaborator:

Is there a specific reason why the decoder_start_token_id isn't with the rest of the tokenizer config (like e.g. tokenizer.ggml.bos_token_id)?

In what way is it different from tokenizer.ggml.bos_token_id? When is it used?

@fairydreaming (Collaborator, Author):

Yes, it's different. It's not related to the tokenizer at all; it's a model parameter. The decoder start token is not a separate dedicated token like BOS, EOS, or PAD. It's used in encoder-decoder models like T5 as the initial token of the autoregressive decoding process. The model creators decided to reuse one of the existing tokens as the decoder start token (PAD in the case of T5), and the id of that token is what this parameter stores.
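The role of decoder_start_token_id can be sketched as a minimal greedy decoding loop. This is illustrative only: `encode` and `decode_step` are hypothetical stand-ins for the real encoder and decoder calls, not llama.cpp APIs:

```python
def greedy_decode(encode, decode_step, decoder_start_token_id, eos_id, max_new=32):
    """Minimal sketch of autoregressive decoding in an encoder-decoder model.

    The first decoder input is decoder_start_token_id (id 0, the PAD token,
    in T5) -- a model parameter, distinct from the tokenizer-level BOS id.
    """
    enc_out = encode()                 # run the encoder once over the prompt
    tokens = [decoder_start_token_id]  # seed the decoder
    while len(tokens) < max_new:
        next_id = decode_step(enc_out, tokens)  # e.g. argmax over the logits
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens[1:]  # the start token is usually not part of the output
```

This also shows why the parameter lives with the model hyperparameters in the GGUF metadata (t5.decoder_start_token_id) rather than under tokenizer.ggml.*.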

@fairydreaming fairydreaming merged commit de0d6a6 into ggerganov:master Jun 24, 2024
19 checks passed
@Sadeghi85:

Hello,

Is Madlad-400 also supported? It's based on T5.

@fairydreaming (Collaborator, Author):

> Is Madlad-400 also supported? It's based on T5.

Currently it converts OK but then crashes with a big boom (Segmentation fault (core dumped)) during inference. But that's good; at least we'll fix more bugs before the merge.

@MoonRide303 commented Jun 24, 2024:

I tried to convert pile-t5-xl (blog post) using 52fc870 - it didn't work:

python D:\repos-git\llama.cpp\convert-hf-to-gguf.py --outtype f16 ..\pile-t5-xl\ --outfile pile-t5-xl-F16.gguf
INFO:hf-to-gguf:Loading model: pile-t5-xl
ERROR:hf-to-gguf:Model UMT5ForConditionalGeneration is not supported

Could it be supported, too? It uses Llama tokenizer.

@fairydreaming (Collaborator, Author):

> Is Madlad-400 also supported? It's based on T5.

@Sadeghi85 I added some fixes that allow running it (tested on madlad400-3b), but they are currently only in my branch: https://github.com/fairydreaming/llama.cpp/tree/t5

@Sadeghi85:

> > Is Madlad-400 also supported? It's based on T5.
>
> @Sadeghi85 I added some fixes allowing to run this (tested on madlad400-3b), but they are currently in my branch: https://github.com/fairydreaming/llama.cpp/tree/t5

I converted the HF model to GGUF and it went OK. Then I compiled the t5 branch and ran llama-server with the converted GGUF; it gave the error below:

GGML_ASSERT: J:\fairydreaming\llama.cpp\examples\server\server.cpp:690: llama_add_eos_token(model) != 1

@fairydreaming (Collaborator, Author) commented Jun 24, 2024:

> Is Madlad-400 also supported? It's based on T5.
>
> @Sadeghi85 I added some fixes allowing to run this (tested on madlad400-3b), but they are currently in my branch: https://github.com/fairydreaming/llama.cpp/tree/t5
>
> I converted hf model to gguf, it went ok. then compiled t5 branch and ran llama-server with the converted gguf, it gave below error:
>
> GGML_ASSERT: J:\fairydreaming\llama.cpp\examples\server\server.cpp:690: llama_add_eos_token(model) != 1

@Sadeghi85 Only llama-cli supports encoder-decoder models at this moment.

Example:

(llama.cpp) phm@epyc:~/projects/llama.cpp-t5$ ./llama-cli --temp 0.01 -m /mnt/md0/models/madlad400-3b.gguf -p '<2de> I love pizza!'
Log start
main: build = 3235 (68b51162)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed  = 1719255737
...
llama_output_reserve: reallocating output buffer from size 0.98 MiB to 6.86 MiB
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: reallocating buffers automatically
▅ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: reallocating buffers automatically
 Ich liebe Pizza! [end of text]

llama_print_timings:        load time =     267.43 ms
llama_print_timings:      sample time =       5.05 ms /     7 runs   (    0.72 ms per token,  1385.04 tokens per second)
llama_print_timings: prompt eval time =     390.53 ms /     8 tokens (   48.82 ms per token,    20.48 tokens per second)
llama_print_timings:        eval time =     731.00 ms /     6 runs   (  121.83 ms per token,     8.21 tokens per second)
llama_print_timings:       total time =    1613.58 ms /    14 tokens
Log end

It looks like there's some weird extra character output with madlad400-3b, but I haven't had time to investigate it yet.

@fairydreaming (Collaborator, Author):

> I tried to convert pile-t5-xl (blog post) using 52fc870 - it didn't work:
>
> python D:\repos-git\llama.cpp\convert-hf-to-gguf.py --outtype f16 ..\pile-t5-xl\ --outfile pile-t5-xl-F16.gguf
> INFO:hf-to-gguf:Loading model: pile-t5-xl
> ERROR:hf-to-gguf:Model UMT5ForConditionalGeneration is not supported
>
> Could it be supported, too? It uses Llama tokenizer.

From the description it looks like it's based on T5X, not T5.

> I tried to convert pile-t5-xl (blog post) using 52fc870 - it didn't work:
>
> python D:\repos-git\llama.cpp\convert-hf-to-gguf.py --outtype f16 ..\pile-t5-xl\ --outfile pile-t5-xl-F16.gguf
> INFO:hf-to-gguf:Loading model: pile-t5-xl
> ERROR:hf-to-gguf:Model UMT5ForConditionalGeneration is not supported
>
> Could it be supported, too? It uses Llama tokenizer.

@MoonRide303 It looks like it would require some extra work, so maybe some day.

@MoonRide303:

@fairydreaming It seems that they've released both T5 and T5X checkpoints. I mentioned them because they show some improvements on benchmarks compared to vanilla T5 and looked roughly compatible; but if it's not trivial to add support, then I guess they'll have to wait for better times.

@Sadeghi85:

> @Sadeghi85 Only llama-cli supports encoder-decoder models at this moment.

I tried with my own finetune of madlad400-7b and it worked correctly (there is an extra character at the start, as you mentioned).

Thanks.

@fairydreaming (Collaborator, Author):

> @fairydreaming It seems that they've released both T5 and T5x checkpoints. I've mentioned those, cause they've got some improvements on benchmarks compared to vanilla T5, and looked roughly compatible - but if it's not trivial to add support for it, then I guess they'll have to wait for better times.

@MoonRide303 I managed to run pile-t5-base, but it looks like all it can do is "to take a string of text that has been partially replaced with mask tokens and predict a sequence of tokens that would replace those mask tokens". Are there any fine-tunes of pile-t5 with more interesting use-cases?
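The masked-span objective described above (T5-style span corruption) can be made concrete with a small sketch. This is illustrative only; the helper `mask_spans` is hypothetical, but the `<extra_id_N>` sentinel tokens are the ones the T5-family tokenizers actually define:

```python
def mask_spans(words, spans):
    """Replace each (start, end) word span with a T5 sentinel token.

    Sketch of the denoising input the base pile-t5 models expect: the model
    is asked to predict the masked spans, each introduced by its sentinel
    (e.g. "<extra_id_0> cute dog <extra_id_1> the").
    """
    out, sentinel_id, i = [], 0, 0
    for start, end in sorted(spans):
        out += words[i:start] + [f"<extra_id_{sentinel_id}>"]
        sentinel_id, i = sentinel_id + 1, end
    out += words[i:]
    return " ".join(out)
```

So prompting a base (non-fine-tuned) checkpoint with ordinary instructions yields little; it only fills in sentinel-marked gaps.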

@fairydreaming (Collaborator, Author):

> > @Sadeghi85 Only llama-cli supports encoder-decoder models at this moment.
>
> I tried with my own finetune of madlad400-7b and it worked correctly. (there is an extra character at the start as you mentioned)
>
> Thanks.

@Sadeghi85 I know what this extra char is: it's the decoder start token (the initial token passed to the decoder to start the autoregressive decoding process). In madlad400 the decoder start token has id 0, token 0 is unk_token, and llama.cpp prints unknown tokens as "▅" (U+2585 Lower Five Eighths Block). So it's not exactly a bug, but I'm not sure whether llama-cli should print the decoder start token or not.
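The stray character and the obvious fix can be shown in a few lines. This is a hypothetical detokenizer sketch, not llama.cpp code; it just mimics the "print unknown tokens as U+2585" behaviour and shows that skipping the decoder start token removes the artifact:

```python
LOWER_FIVE_EIGHTHS_BLOCK = "\u2585"  # what unknown tokens render as

def detok(token_ids, vocab, unk_id, decoder_start_id=0, skip_start=True):
    """Sketch: in madlad400, decoder_start_token_id == unk_token (id 0),
    so the start token renders as a stray U+2585 unless it is skipped."""
    if skip_start and token_ids and token_ids[0] == decoder_start_id:
        token_ids = token_ids[1:]
    return "".join(LOWER_FIVE_EIGHTHS_BLOCK if t == unk_id else vocab[t]
                   for t in token_ids)
```

Whether llama-cli should do this skipping itself is exactly the open question above.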

@MoonRide303:

> > @fairydreaming It seems that they've released both T5 and T5x checkpoints. I've mentioned those, cause they've got some improvements on benchmarks compared to vanilla T5, and looked roughly compatible - but if it's not trivial to add support for it, then I guess they'll have to wait for better times.
>
> @MoonRide303 I managed to run pile-t5-base, but it looks like all it can do is "to take a string of text that has been partially replaced with mask tokens and predict a sequence of tokens that would replace those mask tokens". Are there any fine-tunes of pile-t5 with more interesting use-cases?

I've found finetuned variants (like FLAN) on HF, but haven't tested them yet. I was wondering if the base models could be used as an alternative to vanilla T5 for image generation (in architectures like SD3 or PixArt Sigma); it might require training a new model with Pile-T5 from scratch, though.

@fairydreaming (Collaborator, Author):

> > @MoonRide303 I managed to run pile-t5-base, but it looks like all it can do is "to take a string of text that has been partially replaced with mask tokens and predict a sequence of tokens that would replace those mask tokens". Are there any fine-tunes of pile-t5 with more interesting use-cases?
>
> I've found finetuned variants (like FLAN) on HF, but didn't test those, yet. I was wondering if the base models could be used as an alternative for vanilla T5 for the purpose of image generation (in architectures like SD3 or PixArt Sigma) - it might require training new model with Pile-T5 from the scratch, though.

@MoonRide303 Pile-T5 models should now work in my t5 branch. I checked the pile-t5-xl-flan you mentioned; it seems to generate coherent output.

@MathiasSchindler:

> > @Sadeghi85 Only llama-cli supports encoder-decoder models at this moment.
>
> I tried with my own finetune of madlad400-7b and it worked correctly. (there is an extra character at the start as you mentioned)
>
> Thanks.

Congratulations. Since this is outside the scope of this thread, would you be able to point me to a simple explanation of how to use the MADLAD-400 model with llama.cpp? This would be greatly appreciated.

@Sadeghi85:

> Congratulations. Since this is outside the scope of this thread here, would you be able to point to me to a simple explanation how to use the MADLAD-400 model using llama.cpp? This would be greatly appreciated.

Follow T5 support progression here: #5763

When it's complete, you can use madlad like any other model.

If you want to test it now, you have to compile fairydreaming's t5 branch. Use convert-hf-to-gguf.py to convert the madlad model to GGUF, and use llama-cli for inference.

@vladfaust:

It may be out of the scope of this PR, but I'd like to note that ./llama-quantize ./models/t5-small/ggml-model-f16.gguf ./models/t5-small/ggml-model-Q4_K_M.gguf Q4_K_M fails with the following output:

main: build = 3252 (7d7fff46)
main: built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.2.0
main: quantizing './models/t5-small/ggml-model-f16.gguf' to './models/t5-small/ggml-model-Q4_K_M.gguf' as Q4_K_M
llama_model_loader: loaded meta data with 28 key-value pairs and 132 tensors from ./models/t5-small/ggml-model-f16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = t5
llama_model_loader: - kv   1:                               general.name str              = T5
llama_model_loader: - kv   2:                          t5.context_length u32              = 512
llama_model_loader: - kv   3:                        t5.embedding_length u32              = 512
llama_model_loader: - kv   4:                     t5.feed_forward_length u32              = 2048
llama_model_loader: - kv   5:                             t5.block_count u32              = 6
llama_model_loader: - kv   6:                    t5.attention.head_count u32              = 8
llama_model_loader: - kv   7:                    t5.attention.key_length u32              = 64
llama_model_loader: - kv   8:                  t5.attention.value_length u32              = 64
llama_model_loader: - kv   9:            t5.attention.layer_norm_epsilon f32              = 0.000001
llama_model_loader: - kv  10:        t5.attention.relative_buckets_count u32              = 32
llama_model_loader: - kv  11:        t5.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  12:                  t5.decoder_start_token_id u32              = 0
llama_model_loader: - kv  13:                          general.file_type u32              = 1
llama_model_loader: - kv  14:                       tokenizer.ggml.model str              = t5
llama_model_loader: - kv  15:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,32128]   = ["<pad>", "</s>", "<unk>", "▁", "X"...
llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,32128]   = [0.000000, 0.000000, 0.000000, -2.012...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,32128]   = [3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:            tokenizer.ggml.add_space_prefix bool             = true
llama_model_loader: - kv  20:    tokenizer.ggml.remove_extra_whitespaces bool             = true
llama_model_loader: - kv  21:        tokenizer.ggml.precompiled_charsmap arr[u8,237539]   = [0, 180, 2, 0, 0, 132, 0, 0, 0, 0, 0,...
llama_model_loader: - kv  22:                tokenizer.ggml.eos_token_id u32              = 1
llama_model_loader: - kv  23:            tokenizer.ggml.unknown_token_id u32              = 2
llama_model_loader: - kv  24:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  25:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  26:               tokenizer.ggml.add_eos_token bool             = true
llama_model_loader: - kv  27:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   32 tensors
llama_model_loader: - type  f16:  100 tensors
GGML_ASSERT: src/llama.cpp:17201: (qs.n_attention_wv == 0 || qs.n_attention_wv == (int)model.hparams.n_layer) && "n_attention_wv is unexpected"
[1]    68589 abort      ./llama-quantize ./models/t5-small/ggml-model-f16.gguf  Q4_K_M
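A sketch of why the assertion trips. This is a hypothetical reconstruction, assuming (as the assertion message suggests) that llama-quantize counts tensors whose name contains "attn_v.weight" and expects that count to be 0 or exactly n_layer. In a T5 GGUF, encoder self-attention, decoder self-attention, and decoder cross-attention each contribute one such tensor per block, so the count is roughly three times n_layer:

```python
def check_attn_v(tensor_names, n_layer):
    """Reconstruct the llama-quantize sanity check (assumed behaviour):
    count tensors whose name contains "attn_v.weight" and require the
    count to be 0 or exactly n_layer."""
    n_attention_wv = sum("attn_v.weight" in name for name in tensor_names)
    return n_attention_wv, n_attention_wv in (0, n_layer)
```

For t5-small (6 encoder + 6 decoder blocks) the count would be 18, not 6, which matches the observed abort.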

@fairydreaming (Collaborator, Author):

> It may be out of the scope of this PR, but I'd like to note that ./llama-quantize ./models/t5-small/ggml-model-f16.gguf ./models/t5-small/ggml-model-Q4_K_M.gguf Q4_K_M fails with the following output:

@vladfaust I added fixes for this in PR #8141, thanks for reporting!

@compilade compilade mentioned this pull request Jun 30, 2024