QX_4 quantization #1240

Closed
ikawrakow opened this issue Apr 29, 2023 · 10 comments
Labels
enhancement (New feature or request) · generation quality (Quality of model output) · Less than 4 bits (Efforts related to viable quantized models using <4 bits)

Comments

@ikawrakow
Contributor

ikawrakow commented Apr 29, 2023

Summary

Use 16 x 8 "super-blocks" for quantization, i.e. super-blocks of 16 blocks with 8 weights each, having one fp16 scale for the "super-block" and 16 quantized (int8) scales, one per block of 8 model weights. This is particularly useful for 2- and 3-bit quantization, but it also outperforms the existing 4-bit quantization schemes Q4_0 and Q4_2.

Details

The naming of existing llama.cpp quantizations follows the scheme QX_Y, where X is the number of bits used for the quants, and Y is 0, 1, 2, or 3. When Y is even (0 or 2), model weights x are computed from the quants q as x = d * q. When Y is odd, then x = m + d * q is used. If we look at the integer part of Y/2 ([Y/2]), then the number of weights in a quantization block is 32 (Q4_0, Q4_1, Q5_0) when [Y/2] = 0, and 16 (Q4_2, Q4_3) when [Y/2] = 1. From the latest perplexity results one can see that quantization using blocks of 16 weights performs better than quantization that uses blocks of 32. The logical conclusion from this would be to look into using blocks of 8 weights. Following the existing naming convention, quantization of type x = d * q for blocks with 8 weights would be QX_4, and quantization of type x = m + d * q would be QX_5. The problem with going to blocks with 8 weights using the same strategy as utilized in Q4_2 and Q4_3 is that the bits needed to store the scale d (or scale d and offset m) start becoming comparable to the number of bits used for the quants q. For instance, using fp16 for the scale in a block of 8 weights requires 16 bits, while the quants need 32 bits for 4-bit quantization, so effectively 6 bits per weight (bpw).

So, after this long introduction, here is an idea for how one can use quantization blocks of 8 weights while keeping bpw reasonable: one can use "super-blocks" that combine N quantization blocks. The scale in each block of 8 weights is stored as int8_t, and there is a single fp16 scale that converts the quantized scales to their final value. E.g., for 4-bit quantization

#define QK4_4 128
typedef struct {
    int8_t      scales[QK4_4/8]; // quantized scales, one per 8 weights
    uint8_t     qs[QK4_4/2];     // nibbles / quants of the "super-block"
    ggml_fp16_t d;               // "super-block" scale
} block_q4_4;

In the above, N = 16, i.e., there are 16 blocks of 8 weights, each having its own 8-bit quantized scale. This ends up using 5.125 bpw (4 + 1.125).
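
Spelling out the bit accounting behind that figure: one super-block of QK4_4 = 128 weights stores 128 x 4 = 512 bits of quants, 16 x 8 = 128 bits of int8_t scales, and 16 bits for the fp16 super-block scale, i.e. 656 bits in total, or 656 / 128 = 5.125 bits per weight.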

To further clarify the idea, here is a simple scalar implementation of the de-quantization for Q4_4:

static void dequantize_row_q4_4(const void * restrict vx, float * restrict y, int k) {
    assert(k % QK4_4 == 0);
    const int nb = k / QK4_4;

    const block_q4_4 * restrict x = vx;

    uint32_t u;
    for (int i = 0; i < nb; i++) {
        // one fp16 scale per "super-block" of QK4_4 weights
        const float d_all = GGML_FP16_TO_FP32(x[i].d);

        const uint8_t * q = x[i].qs;

        for (int n = 0; n < QK4_4/8; ++n) {
            // unpack 8 quants from 4 bytes: low nibbles in u1, high nibbles in u2
            memcpy(&u, q, 4);
            const uint32_t u1 = (u >> 0) & 0x0f0f0f0f;
            const uint32_t u2 = (u >> 4) & 0x0f0f0f0f;
            const int8_t * v1 = (const int8_t*)&u1;
            const int8_t * v2 = (const int8_t*)&u2;
            // effective scale of this block of 8 weights
            const float d = d_all * x[i].scales[n];
            y[0] = d * (v1[0] - 8);
            y[1] = d * (v2[0] - 8);
            y[2] = d * (v1[1] - 8);
            y[3] = d * (v2[1] - 8);
            y[4] = d * (v1[2] - 8);
            y[5] = d * (v2[2] - 8);
            y[6] = d * (v1[3] - 8);
            y[7] = d * (v2[3] - 8);
            q += 4;
            y += 8;
        }
    }
}
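
For reference, a matching quantization routine could look roughly like the following. This is only an illustrative sketch (not the code used to produce the results below), and the name quantize_row_q4_4_reference is made up for this example: per block of 8 weights it derives a float scale from the largest-magnitude weight, quantizes the 16 block scales to int8_t against a single fp16 super-block scale, and packs two 4-bit quants per byte in the order the dequantizer above expects.

// Illustrative sketch only -- assumes ggml.h (GGML_FP16_TO_FP32/GGML_FP32_TO_FP16) and <math.h>
static void quantize_row_q4_4_reference(const float * restrict x, block_q4_4 * restrict y, int k) {
    assert(k % QK4_4 == 0);
    const int nb = k / QK4_4;

    for (int i = 0; i < nb; i++) {
        float scales[QK4_4/8];
        float max_scale = 0.0f;

        // per block of 8 weights: derive a float scale from the largest magnitude
        for (int n = 0; n < QK4_4/8; ++n) {
            float amax = 0.0f;
            for (int j = 0; j < 8; ++j) {
                const float v = fabsf(x[i*QK4_4 + 8*n + j]);
                if (v > amax) amax = v;
            }
            scales[n] = amax / 7.0f; // quants are stored with a -8 offset, so use [-7, 7]
            if (scales[n] > max_scale) max_scale = scales[n];
        }

        // one fp16 scale per super-block; block scales become int8_t multiples of it
        const float d  = max_scale / 127.0f;
        const float id = d > 0.0f ? 1.0f/d : 0.0f;
        y[i].d = GGML_FP32_TO_FP16(d);

        for (int n = 0; n < QK4_4/8; ++n) {
            y[i].scales[n] = (int8_t) roundf(scales[n] * id);
            const float bd  = GGML_FP16_TO_FP32(y[i].d) * y[i].scales[n];
            const float bid = bd > 0.0f ? 1.0f/bd : 0.0f;

            // pack two 4-bit quants per byte: even index -> low nibble, odd -> high nibble
            for (int j = 0; j < 4; ++j) {
                int q1 = (int) roundf(x[i*QK4_4 + 8*n + 2*j + 0]*bid) + 8;
                int q2 = (int) roundf(x[i*QK4_4 + 8*n + 2*j + 1]*bid) + 8;
                q1 = q1 < 0 ? 0 : q1 > 15 ? 15 : q1;
                q2 = q2 < 0 ? 0 : q2 > 15 ? 15 : q2;
                y[i].qs[4*n + j] = (uint8_t)(q1 | (q2 << 4));
            }
        }
    }
}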

Perplexity results

I have done some experiments with this idea for 2-, 3- and 4-bit quantization, and the following table summarizes the perplexity results. All calculations are with the output tensor kept as fp16, which adds about 200 MB to the size of the quantized model (compared to the output.weight tensor also being quantized):

Model  Measure      Q2_4     Q3_4     Q4_4
7B     perplexity   8.3618   6.3559   6.1378
7B     file size    2.65G    3.45G    4.2G
13B    perplexity   6.7409   5.5110   5.2981
13B    file size    4.95G    6.45G    8.0G

A few observations from these experiments and the existing 4- and 5-bit results:

  • At 4 and 5 bits, quantization of type x = m + d * q (QX_1, QX_3) performs better than x = d * q (QX_0, QX_2, and the QX_4 proposed here). This trend is reversed for 2- and 3-bit quantization. Especially for 2-bit quantization, Q2_1 and Q2_3 give basically useless results.
  • There has been some work done for 2- and 3-bit quantization on this branch. The Q2_4 quantization proposed here gives much lower perplexity than what is reported there for Q2_2 (my own experiment with Q2_2 gives a 7B perplexity of 10.6271 and a 13B perplexity of 8.3552; the 30B Q2_2 perplexity of 6.9507 reported there is higher than the 13B Q2_4 perplexity found here).
  • At 2-bit quantization, the difference between a quantized and a non-quantized output tensor is significant (e.g., a quantized output tensor results in a 7B perplexity of 9.0087 vs 8.3618 from the above table). At 3-bit quantization the difference is much smaller (e.g., 6.4433 vs 6.3559 for 7B).
  • Q4_4 is better than Q4_0 and Q4_2, but the difference is much smaller than for the 2- and 3-bit quantizations.
  • I have tried N = 8, 16, 32 (so "super-blocks" of 64, 128, 256 weights); see the bit count right after this list. Perplexity results remain effectively the same, while the extra bits per weight (extra as in addition to the X quantization bits) change from 1.25 to 1.125 to 1.0625. Tensor sizes are divisible by 256 for all layers in the 7B and 13B models, so one could use a super-block size of 256 instead of the 128 used here (this saves ~0.1G for the 13B model).
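
To see where those extra bits come from: a super-block of N blocks of 8 weights carries N int8_t block scales plus one fp16 super-block scale, so the overhead is (8N + 16) / (8N) = 1 + 2/N bits per weight, i.e. 1.25, 1.125, and 1.0625 bits for N = 8, 16, 32.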

Here are the perplexity runs reported above:

Q2_4, 7B

main: seed = 1682671488
llama.cpp: loading model from ../models/7B/q24.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 15 (mostly Q2_4)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 59.11 KB
llama_model_load_internal: mem required = 4504.40 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size = 256.00 MB

system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
1.49 seconds per pass - ETA 16 minutes
[1]6.2670,[2]7.2397,[3]7.8484,[4]8.7113,[5]8.5541,[6]8.5304,[7]8.6772,[8]8.7266,[9]9.2108,[10]9.5711,[11]9.9331,[12]10.0286,[13]10.0075,[14]10.2435,[15]10.5844,[16]10.0359,[17]9.8055,[18]9.8147,[19]9.2709,[20]9.2379,[21]9.0826,[22]8.9060,[23]8.8751,[24]8.7826,[25]8.7974,[26]8.5861,[27]8.3354,[28]8.2662,[29]8.1377,[30]7.9415,[31]7.9138,[32]7.9309,[33]7.8544,[34]7.9014,[35]7.9400,[36]8.0232,[37]8.0318,[38]8.0524,[39]8.1139,[40]8.1847,[41]8.2215,[42]8.2709,[43]8.1977,[44]8.2695,[45]8.2579,[46]8.2205,[47]8.2469,[48]8.1920,[49]8.1907,[50]8.1222,[51]8.1247,[52]8.1002,[53]8.1535,[54]8.1312,[55]8.0808,[56]8.1320,[57]8.1640,[58]8.1951,[59]8.2061,[60]8.2674,[61]8.2525,[62]8.3396,[63]8.3781,[64]8.3872,[65]8.4551,[66]8.4626,[67]8.4924,[68]8.5125,[69]8.5512,[70]8.5979,[71]8.6292,[72]8.6747,[73]8.7532,[74]8.7463,[75]8.7566,[76]8.7672,[77]8.7906,[78]8.7662,[79]8.7960,[80]8.7820,[81]8.8039,[82]8.8159,[83]8.7322,[84]8.7181,[85]8.7079,[86]8.6686,[87]8.6050,[88]8.5635,[89]8.5359,[90]8.5171,[91]8.5563,[92]8.5568,[93]8.5615,[94]8.5647,[95]8.6043,[96]8.6056,[97]8.6059,[98]8.5980,[99]8.5716,[100]8.5662,[101]8.5971,[102]8.5883,[103]8.6169,[104]8.6284,[105]8.6296,[106]8.6542,[107]8.6555,[108]8.6706,[109]8.6619,[110]8.6569,[111]8.6789,[112]8.7064,[113]8.7181,[114]8.7197,[115]8.7335,[116]8.7271,[117]8.7349,[118]8.7694,[119]8.7968,[120]8.8443,[121]8.8727,[122]8.9000,[123]8.9478,[124]8.9703,[125]8.9524,[126]9.0055,[127]9.0459,[128]9.0799,[129]9.0528,[130]9.0613,[131]9.0543,[132]9.0445,[133]9.0327,[134]9.0539,[135]9.0482,[136]9.0383,[137]9.0256,[138]9.0123,[139]9.0012,[140]9.0007,[141]8.9808,[142]8.9746,[143]8.9590,[144]8.9421,[145]8.9386,[146]8.9224,[147]8.9316,[148]8.9294,[149]8.9273,[150]8.9260,[151]8.9272,[152]8.9072,[153]8.8795,[154]8.8652,[155]8.8713,[156]8.8626,[157]8.8816,[158]8.8812,[159]8.8950,[160]8.8969,[161]8.9131,[162]8.8704,[163]8.8516,[164]8.8103,[165]8.7617,[166]8.7196,[167]8.6616,[168]8.6151,[169]8.5920,[170]8.5716,[171]8.5289,[172]8.4991,[173]8.4753,[174]8.4339,[175]8.4022,[176]8.3800,[177]8.3518,[178]8.3213,[179]8.2960,[180]8.2789,[181]8.2471,[182]8.2158,[183]8.1923,[184]8.1901,[185]8.1759,[186]8.1755,[187]8.1800,[188]8.1783,[189]8.2022,[190]8.2040,[191]8.2313,[192]8.2490,[193]8.2748,[194]8.2906,[195]8.3192,[196]8.3382,[197]8.3638,[198]8.3830,[199]8.3825,[200]8.3868,[201]8.3827,[202]8.4192,[203]8.4279,[204]8.4391,[205]8.4521,[206]8.4600,[207]8.4536,[208]8.4630,[209]8.4689,[210]8.4726,[211]8.4873,[212]8.4963,[213]8.5092,[214]8.5160,[215]8.5206,[216]8.5372,[217]8.5580,[218]8.5740,[219]8.5719,[220]8.5640,[221]8.5553,[222]8.5496,[223]8.5326,[224]8.5205,[225]8.5155,[226]8.5383,[227]8.5536,[228]8.5615,[229]8.5653,[230]8.5606,[231]8.5816,[232]8.5697,[233]8.5431,[234]8.5208,[235]8.5105,[236]8.4995,[237]8.4836,[238]8.4880,[239]8.4663,[240]8.4513,[241]8.4579,[242]8.4639,[243]8.4602,[244]8.4461,[245]8.4448,[246]8.4290,[247]8.4139,[248]8.4028,[249]8.3996,[250]8.4048,[251]8.3959,[252]8.3912,[253]8.3790,[254]8.3743,[255]8.3575,[256]8.3323,[257]8.3143,[258]8.3039,[259]8.3025,[260]8.2925,[261]8.2885,[262]8.2801,[263]8.2747,[264]8.2566,[265]8.2547,[266]8.2506,[267]8.2400,[268]8.2504,[269]8.2489,[270]8.2463,[271]8.2539,[272]8.2614,[273]8.2570,[274]8.2603,[275]8.2742,[276]8.2809,[277]8.3035,[278]8.3175,[279]8.3288,[280]8.3324,[281]8.3439,[282]8.3494,[283]8.3672,[284]8.3763,[285]8.3869,[286]8.4031,[287]8.4049,[288]8.4160,[289]8.4034,[290]8.3824,[291]8.3640,[292]8.3433,[293]8.3263,[294]8.3285,[295]8.3263,[296]8.3322,[297]8.3320,[298]8.3405,[299]8.3354,[300]8.3234,[301]8.3182,[302]8.3099,[303]8.2992,[304]8.2859,[305]8.284
7,[306]8.2684,[307]8.2696,[308]8.2736,[309]8.2506,[310]8.2432,[311]8.2368,[312]8.2386,[313]8.2294,[314]8.2276,[315]8.2045,[316]8.2053,[317]8.1839,[318]8.1579,[319]8.1772,[320]8.1937,[321]8.2004,[322]8.1927,[323]8.1903,[324]8.1923,[325]8.2089,[326]8.2077,[327]8.2130,[328]8.2180,[329]8.2283,[330]8.2368,[331]8.2554,[332]8.2503,[333]8.2620,[334]8.2542,[335]8.2433,[336]8.2464,[337]8.2399,[338]8.2423,[339]8.2345,[340]8.2289,[341]8.2386,[342]8.2404,[343]8.2479,[344]8.2476,[345]8.2446,[346]8.2377,[347]8.2412,[348]8.2458,[349]8.2465,[350]8.2403,[351]8.2399,[352]8.2413,[353]8.2316,[354]8.2341,[355]8.2429,[356]8.2474,[357]8.2407,[358]8.2533,[359]8.2570,[360]8.2481,[361]8.2455,[362]8.2539,[363]8.2650,[364]8.2735,[365]8.2813,[366]8.2828,[367]8.2931,[368]8.2887,[369]8.2886,[370]8.2892,[371]8.2801,[372]8.2848,[373]8.2920,[374]8.2879,[375]8.2865,[376]8.2957,[377]8.2873,[378]8.2901,[379]8.2977,[380]8.2848,[381]8.2801,[382]8.2740,[383]8.2709,[384]8.2678,[385]8.2664,[386]8.2672,[387]8.2645,[388]8.2579,[389]8.2485,[390]8.2391,[391]8.2277,[392]8.2256,[393]8.2288,[394]8.2330,[395]8.2318,[396]8.2207,[397]8.2295,[398]8.2333,[399]8.2437,[400]8.2453,[401]8.2475,[402]8.2493,[403]8.2497,[404]8.2573,[405]8.2502,[406]8.2458,[407]8.2465,[408]8.2466,[409]8.2624,[410]8.2776,[411]8.2933,[412]8.3162,[413]8.3303,[414]8.3414,[415]8.3478,[416]8.3594,[417]8.3760,[418]8.3834,[419]8.3934,[420]8.4054,[421]8.4215,[422]8.4264,[423]8.4389,[424]8.4543,[425]8.4667,[426]8.4751,[427]8.4789,[428]8.4899,[429]8.4956,[430]8.5069,[431]8.5263,[432]8.5289,[433]8.5255,[434]8.5161,[435]8.5146,[436]8.5163,[437]8.5286,[438]8.5396,[439]8.5341,[440]8.5309,[441]8.5231,[442]8.5200,[443]8.5216,[444]8.5225,[445]8.5190,[446]8.5207,[447]8.5240,[448]8.5283,[449]8.5239,[450]8.5232,[451]8.5163,[452]8.5112,[453]8.5021,[454]8.4972,[455]8.4964,[456]8.5014,[457]8.5044,[458]8.5017,[459]8.5020,[460]8.5125,[461]8.5089,[462]8.5070,[463]8.5138,[464]8.5136,[465]8.5099,[466]8.5021,[467]8.5045,[468]8.5073,[469]8.5106,[470]8.5116,[471]8.5045,[472]8.5100,[473]8.5005,[474]8.5033,[475]8.5011,[476]8.5043,[477]8.4954,[478]8.4967,[479]8.5104,[480]8.5171,[481]8.5200,[482]8.5139,[483]8.5089,[484]8.5131,[485]8.5129,[486]8.5049,[487]8.5066,[488]8.5061,[489]8.4980,[490]8.4952,[491]8.4915,[492]8.4829,[493]8.4796,[494]8.4758,[495]8.4773,[496]8.4722,[497]8.4668,[498]8.4663,[499]8.4568,[500]8.4459,[501]8.4388,[502]8.4402,[503]8.4385,[504]8.4277,[505]8.4303,[506]8.4317,[507]8.4305,[508]8.4251,[509]8.4240,[510]8.4299,[511]8.4356,[512]8.4377,[513]8.4391,[514]8.4473,[515]8.4395,[516]8.4386,[517]8.4401,[518]8.4385,[519]8.4423,[520]8.4454,[521]8.4476,[522]8.4521,[523]8.4524,[524]8.4590,[525]8.4639,[526]8.4657,[527]8.4686,[528]8.4647,[529]8.4674,[530]8.4580,[531]8.4541,[532]8.4608,[533]8.4632,[534]8.4588,[535]8.4638,[536]8.4558,[537]8.4510,[538]8.4575,[539]8.4578,[540]8.4657,[541]8.4691,[542]8.4701,[543]8.4711,[544]8.4726,[545]8.4708,[546]8.4720,[547]8.4648,[548]8.4550,[549]8.4551,[550]8.4508,[551]8.4454,[552]8.4420,[553]8.4362,[554]8.4316,[555]8.4256,[556]8.4259,[557]8.4311,[558]8.4275,[559]8.4277,[560]8.4264,[561]8.4258,[562]8.4236,[563]8.4254,[564]8.4323,[565]8.4359,[566]8.4354,[567]8.4333,[568]8.4318,[569]8.4280,[570]8.4301,[571]8.4303,[572]8.4308,[573]8.4289,[574]8.4256,[575]8.4269,[576]8.4264,[577]8.4243,[578]8.4222,[579]8.4236,[580]8.4137,[581]8.4081,[582]8.4046,[583]8.4039,[584]8.4026,[585]8.3949,[586]8.3877,[587]8.3879,[588]8.3944,[589]8.4027,[590]8.4066,[591]8.4067,[592]8.4038,[593]8.3969,[594]8.3972,[595]8.3932,[596]8.3984,[597]8.3938,[598]8.3913,[599]8.3928,[600]8.3922,[601]8.3893,[
602]8.3954,[603]8.3990,[604]8.4015,[605]8.4037,[606]8.4054,[607]8.4049,[608]8.3980,[609]8.3974,[610]8.4014,[611]8.3989,[612]8.4027,[613]8.3976,[614]8.3919,[615]8.3800,[616]8.3859,[617]8.3765,[618]8.3683,[619]8.3586,[620]8.3366,[621]8.3254,[622]8.3233,[623]8.3250,[624]8.3239,[625]8.3229,[626]8.3212,[627]8.3259,[628]8.3252,[629]8.3237,[630]8.3274,[631]8.3340,[632]8.3404,[633]8.3379,[634]8.3423,[635]8.3431,[636]8.3403,[637]8.3381,[638]8.3427,[639]8.3394,[640]8.3394,[641]8.3390,[642]8.3474,[643]8.3494,[644]8.3498,[645]8.3465,[646]8.3537,[647]8.3518,[648]8.3533,[649]8.3524,[650]8.3579,[651]8.3656,[652]8.3674,[653]8.3719,[654]8.3636,[655]8.3618,

llama_print_timings: load time = 2570.97 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 906622.46 ms / 335360 tokens ( 2.70 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 933921.98 ms

Q3_4, 7B

main: seed = 1682612164
llama.cpp: loading model from junk.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 10 (mostly Q3_4)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 59.11 KB
llama_model_load_internal: mem required = 5390.48 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size = 256.00 MB

system_info: n_threads = 8 / 12 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
13.33 seconds per pass - ETA 2 hours 25 minutes
[1]4.5663,[2]4.9408,[3]5.8361,[4]6.5915,[5]6.6755,[6]6.6236,[7]6.8095,[8]6.9148,[9]7.2707,[10]7.5192,[11]7.7399,[12]7.7851,[13]7.7344,[14]7.8086,[15]8.0569,[16]7.6602,[17]7.5283,[18]7.4705,[19]7.0917,[20]7.0792,[21]6.9849,[22]6.8041,[23]6.7713,[24]6.6866,[25]6.6918,[26]6.5324,[27]6.3450,[28]6.2465,[29]6.1521,[30]5.9987,[31]5.9758,[32]5.9998,[33]5.9411,[34]5.9724,[35]5.9995,[36]6.0390,[37]6.0437,[38]6.0577,[39]6.0941,[40]6.1552,[41]6.1714,[42]6.2174,[43]6.1721,[44]6.2245,[45]6.2294,[46]6.2025,[47]6.2265,[48]6.1973,[49]6.2013,[50]6.1552,[51]6.1511,[52]6.1385,[53]6.1806,[54]6.1639,[55]6.1354,[56]6.1712,[57]6.1937,[58]6.2181,[59]6.2339,[60]6.2785,[61]6.2698,[62]6.3279,[63]6.3618,[64]6.3756,[65]6.4226,[66]6.4309,[67]6.4517,[68]6.4697,[69]6.4930,[70]6.5255,[71]6.5479,[72]6.5798,[73]6.6429,[74]6.6461,[75]6.6590,[76]6.6751,[77]6.6892,[78]6.6736,[79]6.7003,[80]6.6948,[81]6.7049,[82]6.7107,[83]6.6550,[84]6.6395,[85]6.6251,[86]6.6019,[87]6.5435,[88]6.5150,[89]6.4959,[90]6.4806,[91]6.5057,[92]6.4994,[93]6.4989,[94]6.4935,[95]6.5215,[96]6.5199,[97]6.5145,[98]6.5099,[99]6.4929,[100]6.4931,[101]6.5206,[102]6.5163,[103]6.5351,[104]6.5432,[105]6.5422,[106]6.5577,[107]6.5536,[108]6.5665,[109]6.5615,[110]6.5573,[111]6.5786,[112]6.6020,[113]6.6021,[114]6.5969,[115]6.6038,[116]6.5944,[117]6.6001,[118]6.6283,[119]6.6498,[120]6.6840,[121]6.6992,[122]6.7240,[123]6.7622,[124]6.7804,[125]6.7694,[126]6.8078,[127]6.8453,[128]6.8757,[129]6.8576,[130]6.8657,[131]6.8604,[132]6.8510,[133]6.8362,[134]6.8459,[135]6.8407,[136]6.8283,[137]6.8197,[138]6.8047,[139]6.7936,[140]6.7900,[141]6.7636,[142]6.7598,[143]6.7322,[144]6.7116,[145]6.7051,[146]6.6920,[147]6.6975,[148]6.6982,[149]6.6923,[150]6.6885,[151]6.6909,[152]6.6807,[153]6.6650,[154]6.6550,[155]6.6620,[156]6.6565,[157]6.6743,[158]6.6784,[159]6.6818,[160]6.6830,[161]6.6951,[162]6.6634,[163]6.6521,[164]6.6259,[165]6.5926,[166]6.5632,[167]6.5231,[168]6.4907,[169]6.4767,[170]6.4657,[171]6.4369,[172]6.4188,[173]6.4011,[174]6.3693,[175]6.3458,[176]6.3342,[177]6.3129,[178]6.2889,[179]6.2715,[180]6.2615,[181]6.2389,[182]6.2196,[183]6.2037,[184]6.2019,[185]6.1936,[186]6.1946,[187]6.2012,[188]6.1977,[189]6.2168,[190]6.2182,[191]6.2409,[192]6.2573,[193]6.2741,[194]6.2860,[195]6.3075,[196]6.3233,[197]6.3455,[198]6.3612,[199]6.3641,[200]6.3691,[201]6.3647,[202]6.3864,[203]6.3947,[204]6.3980,[205]6.4091,[206]6.4162,[207]6.4121,[208]6.4209,[209]6.4261,[210]6.4310,[211]6.4415,[212]6.4490,[213]6.4591,[214]6.4633,[215]6.4680,[216]6.4832,[217]6.5014,[218]6.5150,[219]6.5162,[220]6.5118,[221]6.5051,[222]6.5023,[223]6.4909,[224]6.4835,[225]6.4790,[226]6.5006,[227]6.5076,[228]6.5134,[229]6.5182,[230]6.5144,[231]6.5315,[232]6.5184,[233]6.5008,[234]6.4849,[235]6.4693,[236]6.4610,[237]6.4500,[238]6.4531,[239]6.4369,[240]6.4258,[241]6.4290,[242]6.4326,[243]6.4305,[244]6.4181,[245]6.4153,[246]6.4041,[247]6.3923,[248]6.3846,[249]6.3822,[250]6.3861,[251]6.3801,[252]6.3766,[253]6.3666,[254]6.3631,[255]6.3511,[256]6.3323,[257]6.3202,[258]6.3114,[259]6.3098,[260]6.3011,[261]6.2972,[262]6.2914,[263]6.2865,[264]6.2683,[265]6.2681,[266]6.2659,[267]6.2593,[268]6.2693,[269]6.2675,[270]6.2675,[271]6.2760,[272]6.2799,[273]6.2786,[274]6.2805,[275]6.2891,[276]6.2950,[277]6.3109,[278]6.3215,[279]6.3307,[280]6.3332,[281]6.3427,[282]6.3475,[283]6.3624,[284]6.3708,[285]6.3798,[286]6.3937,[287]6.3928,[288]6.3992,[289]6.3897,[290]6.3734,[291]6.3572,[292]6.3419,[293]6.3274,[294]6.3303,[295]6.3298,[296]6.3349,[297]6.3338,[298]6.3372,[299]6.3346,[300]6.3234,[301]6.3228,[302]6.3155,[303]6.3066,[304]6.2975,[305]6.2948,[30
6]6.2815,[307]6.2830,[308]6.2856,[309]6.2690,[310]6.2629,[311]6.2567,[312]6.2591,[313]6.2540,[314]6.2524,[315]6.2360,[316]6.2325,[317]6.2155,[318]6.1942,[319]6.2070,[320]6.2197,[321]6.2236,[322]6.2183,[323]6.2123,[324]6.2093,[325]6.2197,[326]6.2195,[327]6.2218,[328]6.2254,[329]6.2315,[330]6.2353,[331]6.2479,[332]6.2454,[333]6.2528,[334]6.2471,[335]6.2406,[336]6.2445,[337]6.2410,[338]6.2406,[339]6.2349,[340]6.2300,[341]6.2388,[342]6.2411,[343]6.2469,[344]6.2470,[345]6.2466,[346]6.2438,[347]6.2479,[348]6.2522,[349]6.2543,[350]6.2511,[351]6.2515,[352]6.2519,[353]6.2461,[354]6.2467,[355]6.2520,[356]6.2548,[357]6.2507,[358]6.2604,[359]6.2628,[360]6.2577,[361]6.2571,[362]6.2634,[363]6.2747,[364]6.2805,[365]6.2861,[366]6.2869,[367]6.2959,[368]6.2936,[369]6.2947,[370]6.2959,[371]6.2899,[372]6.2945,[373]6.3000,[374]6.2986,[375]6.2982,[376]6.3058,[377]6.3004,[378]6.3028,[379]6.3093,[380]6.3012,[381]6.2973,[382]6.2922,[383]6.2908,[384]6.2898,[385]6.2888,[386]6.2885,[387]6.2885,[388]6.2839,[389]6.2783,[390]6.2714,[391]6.2634,[392]6.2598,[393]6.2582,[394]6.2612,[395]6.2601,[396]6.2523,[397]6.2594,[398]6.2624,[399]6.2696,[400]6.2688,[401]6.2714,[402]6.2727,[403]6.2746,[404]6.2816,[405]6.2732,[406]6.2692,[407]6.2692,[408]6.2711,[409]6.2829,[410]6.2948,[411]6.3068,[412]6.3233,[413]6.3350,[414]6.3431,[415]6.3488,[416]6.3577,[417]6.3705,[418]6.3745,[419]6.3817,[420]6.3908,[421]6.4034,[422]6.4078,[423]6.4150,[424]6.4265,[425]6.4360,[426]6.4423,[427]6.4469,[428]6.4554,[429]6.4599,[430]6.4688,[431]6.4829,[432]6.4860,[433]6.4848,[434]6.4798,[435]6.4800,[436]6.4816,[437]6.4912,[438]6.4991,[439]6.4958,[440]6.4950,[441]6.4895,[442]6.4880,[443]6.4888,[444]6.4892,[445]6.4875,[446]6.4892,[447]6.4916,[448]6.4960,[449]6.4937,[450]6.4942,[451]6.4897,[452]6.4792,[453]6.4706,[454]6.4647,[455]6.4659,[456]6.4703,[457]6.4721,[458]6.4701,[459]6.4703,[460]6.4789,[461]6.4761,[462]6.4740,[463]6.4784,[464]6.4771,[465]6.4748,[466]6.4669,[467]6.4671,[468]6.4666,[469]6.4686,[470]6.4690,[471]6.4642,[472]6.4688,[473]6.4631,[474]6.4643,[475]6.4586,[476]6.4605,[477]6.4535,[478]6.4529,[479]6.4602,[480]6.4651,[481]6.4669,[482]6.4626,[483]6.4583,[484]6.4605,[485]6.4590,[486]6.4533,[487]6.4536,[488]6.4514,[489]6.4462,[490]6.4439,[491]6.4405,[492]6.4345,[493]6.4318,[494]6.4300,[495]6.4294,[496]6.4262,[497]6.4207,[498]6.4187,[499]6.4142,[500]6.4045,[501]6.3977,[502]6.3982,[503]6.3975,[504]6.3887,[505]6.3914,[506]6.3922,[507]6.3870,[508]6.3831,[509]6.3822,[510]6.3861,[511]6.3908,[512]6.3944,[513]6.3960,[514]6.4024,[515]6.3970,[516]6.3961,[517]6.3972,[518]6.3967,[519]6.3997,[520]6.4024,[521]6.4039,[522]6.4071,[523]6.4077,[524]6.4132,[525]6.4167,[526]6.4175,[527]6.4192,[528]6.4137,[529]6.4147,[530]6.4096,[531]6.4080,[532]6.4132,[533]6.4158,[534]6.4138,[535]6.4163,[536]6.4104,[537]6.4078,[538]6.4132,[539]6.4142,[540]6.4182,[541]6.4187,[542]6.4199,[543]6.4211,[544]6.4221,[545]6.4199,[546]6.4205,[547]6.4159,[548]6.4104,[549]6.4102,[550]6.4072,[551]6.4035,[552]6.4016,[553]6.3975,[554]6.3949,[555]6.3915,[556]6.3912,[557]6.3940,[558]6.3902,[559]6.3897,[560]6.3893,[561]6.3894,[562]6.3868,[563]6.3866,[564]6.3912,[565]6.3934,[566]6.3934,[567]6.3908,[568]6.3910,[569]6.3895,[570]6.3927,[571]6.3928,[572]6.3939,[573]6.3937,[574]6.3899,[575]6.3895,[576]6.3898,[577]6.3881,[578]6.3859,[579]6.3864,[580]6.3795,[581]6.3757,[582]6.3744,[583]6.3751,[584]6.3752,[585]6.3680,[586]6.3610,[587]6.3616,[588]6.3663,[589]6.3722,[590]6.3754,[591]6.3775,[592]6.3755,[593]6.3719,[594]6.3725,[595]6.3698,[596]6.3737,[597]6.3711,[598]6.3683,[599]6.3702,[600]6.3694,[601]6.3680,[602]6
.3704,[603]6.3732,[604]6.3745,[605]6.3776,[606]6.3799,[607]6.3788,[608]6.3751,[609]6.3755,[610]6.3790,[611]6.3770,[612]6.3797,[613]6.3758,[614]6.3703,[615]6.3627,[616]6.3654,[617]6.3589,[618]6.3537,[619]6.3478,[620]6.3331,[621]6.3260,[622]6.3239,[623]6.3254,[624]6.3260,[625]6.3260,[626]6.3248,[627]6.3273,[628]6.3271,[629]6.3268,[630]6.3300,[631]6.3358,[632]6.3411,[633]6.3395,[634]6.3428,[635]6.3431,[636]6.3407,[637]6.3375,[638]6.3405,[639]6.3373,[640]6.3381,[641]6.3380,[642]6.3446,[643]6.3467,[644]6.3480,[645]6.3463,[646]6.3508,[647]6.3474,[648]6.3484,[649]6.3487,[650]6.3527,[651]6.3582,[652]6.3594,[653]6.3633,[654]6.3566,[655]6.3559,

llama_print_timings: load time = 13794.33 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 4708961.29 ms / 335360 tokens ( 14.04 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 4740083.96 ms

Q4_4, 7B

main: seed = 1682662628
llama.cpp: loading model from ../models/7B/q44.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 14 (mostly Q4_4)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 59.11 KB
llama_model_load_internal: mem required = 6079.65 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size = 256.00 MB

system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
1.66 seconds per pass - ETA 18 minutes
[1]4.4671,[2]4.9332,[3]5.7966,[4]6.3903,[5]6.4889,[6]6.4445,[7]6.6354,[8]6.7448,[9]7.1008,[10]7.3327,[11]7.5469,[12]7.5702,[13]7.4939,[14]7.5431,[15]7.8020,[16]7.4115,[17]7.2997,[18]7.2633,[19]6.8989,[20]6.8895,[21]6.7942,[22]6.6200,[23]6.5946,[24]6.4943,[25]6.4910,[26]6.3306,[27]6.1518,[28]6.0550,[29]5.9628,[30]5.7997,[31]5.7669,[32]5.7880,[33]5.7247,[34]5.7606,[35]5.7797,[36]5.8213,[37]5.8266,[38]5.8396,[39]5.8747,[40]5.9274,[41]5.9369,[42]5.9771,[43]5.9365,[44]5.9939,[45]5.9977,[46]5.9743,[47]5.9951,[48]5.9683,[49]5.9721,[50]5.9325,[51]5.9288,[52]5.9188,[53]5.9631,[54]5.9476,[55]5.9211,[56]5.9503,[57]5.9739,[58]5.9943,[59]6.0098,[60]6.0524,[61]6.0460,[62]6.1021,[63]6.1367,[64]6.1507,[65]6.1955,[66]6.2021,[67]6.2187,[68]6.2364,[69]6.2616,[70]6.2919,[71]6.3106,[72]6.3415,[73]6.4011,[74]6.4065,[75]6.4199,[76]6.4314,[77]6.4421,[78]6.4259,[79]6.4530,[80]6.4448,[81]6.4560,[82]6.4602,[83]6.4082,[84]6.3919,[85]6.3798,[86]6.3574,[87]6.2929,[88]6.2639,[89]6.2441,[90]6.2288,[91]6.2508,[92]6.2444,[93]6.2448,[94]6.2434,[95]6.2717,[96]6.2711,[97]6.2662,[98]6.2604,[99]6.2463,[100]6.2476,[101]6.2717,[102]6.2658,[103]6.2852,[104]6.2931,[105]6.2930,[106]6.3089,[107]6.3065,[108]6.3193,[109]6.3124,[110]6.3074,[111]6.3307,[112]6.3508,[113]6.3523,[114]6.3484,[115]6.3549,[116]6.3466,[117]6.3512,[118]6.3800,[119]6.3996,[120]6.4352,[121]6.4512,[122]6.4768,[123]6.5140,[124]6.5320,[125]6.5220,[126]6.5619,[127]6.5989,[128]6.6290,[129]6.6127,[130]6.6231,[131]6.6199,[132]6.6103,[133]6.5962,[134]6.6053,[135]6.6013,[136]6.5899,[137]6.5826,[138]6.5660,[139]6.5555,[140]6.5498,[141]6.5198,[142]6.5166,[143]6.4876,[144]6.4677,[145]6.4586,[146]6.4460,[147]6.4515,[148]6.4512,[149]6.4450,[150]6.4413,[151]6.4425,[152]6.4321,[153]6.4147,[154]6.4056,[155]6.4131,[156]6.4087,[157]6.4255,[158]6.4298,[159]6.4351,[160]6.4369,[161]6.4483,[162]6.4189,[163]6.4069,[164]6.3823,[165]6.3515,[166]6.3244,[167]6.2874,[168]6.2551,[169]6.2412,[170]6.2300,[171]6.2027,[172]6.1858,[173]6.1688,[174]6.1387,[175]6.1174,[176]6.1069,[177]6.0861,[178]6.0630,[179]6.0461,[180]6.0368,[181]6.0151,[182]5.9970,[183]5.9832,[184]5.9832,[185]5.9755,[186]5.9762,[187]5.9828,[188]5.9785,[189]5.9951,[190]5.9961,[191]6.0184,[192]6.0350,[193]6.0522,[194]6.0635,[195]6.0853,[196]6.1015,[197]6.1229,[198]6.1381,[199]6.1415,[200]6.1468,[201]6.1411,[202]6.1608,[203]6.1688,[204]6.1675,[205]6.1781,[206]6.1847,[207]6.1806,[208]6.1893,[209]6.1939,[210]6.1997,[211]6.2106,[212]6.2183,[213]6.2291,[214]6.2316,[215]6.2350,[216]6.2496,[217]6.2681,[218]6.2813,[219]6.2813,[220]6.2777,[221]6.2732,[222]6.2704,[223]6.2607,[224]6.2532,[225]6.2496,[226]6.2706,[227]6.2796,[228]6.2846,[229]6.2911,[230]6.2883,[231]6.3053,[232]6.2927,[233]6.2763,[234]6.2615,[235]6.2437,[236]6.2364,[237]6.2263,[238]6.2291,[239]6.2136,[240]6.2034,[241]6.2061,[242]6.2098,[243]6.2076,[244]6.1963,[245]6.1933,[246]6.1819,[247]6.1698,[248]6.1622,[249]6.1601,[250]6.1644,[251]6.1573,[252]6.1532,[253]6.1435,[254]6.1388,[255]6.1269,[256]6.1089,[257]6.0971,[258]6.0891,[259]6.0872,[260]6.0795,[261]6.0754,[262]6.0699,[263]6.0643,[264]6.0427,[265]6.0420,[266]6.0407,[267]6.0339,[268]6.0434,[269]6.0410,[270]6.0421,[271]6.0497,[272]6.0532,[273]6.0534,[274]6.0557,[275]6.0640,[276]6.0700,[277]6.0856,[278]6.0958,[279]6.1049,[280]6.1076,[281]6.1168,[282]6.1228,[283]6.1381,[284]6.1461,[285]6.1548,[286]6.1684,[287]6.1687,[288]6.1747,[289]6.1662,[290]6.1505,[291]6.1353,[292]6.1199,[293]6.1070,[294]6.1090,[295]6.1082,[296]6.1124,[297]6.1110,[298]6.1144,[299]6.1116,[300]6.1005,[301]6.1005,[302]6.0925,[303]6.0832,[304]6.0749,[305]6.0724,[30
6]6.0599,[307]6.0621,[308]6.0658,[309]6.0495,[310]6.0435,[311]6.0375,[312]6.0401,[313]6.0346,[314]6.0328,[315]6.0165,[316]6.0113,[317]5.9953,[318]5.9745,[319]5.9864,[320]5.9991,[321]6.0034,[322]5.9992,[323]5.9924,[324]5.9893,[325]5.9993,[326]5.9989,[327]6.0011,[328]6.0052,[329]6.0116,[330]6.0142,[331]6.0267,[332]6.0234,[333]6.0306,[334]6.0251,[335]6.0182,[336]6.0215,[337]6.0188,[338]6.0183,[339]6.0132,[340]6.0088,[341]6.0169,[342]6.0194,[343]6.0240,[344]6.0238,[345]6.0243,[346]6.0216,[347]6.0255,[348]6.0282,[349]6.0300,[350]6.0266,[351]6.0272,[352]6.0275,[353]6.0218,[354]6.0218,[355]6.0268,[356]6.0295,[357]6.0262,[358]6.0353,[359]6.0384,[360]6.0350,[361]6.0349,[362]6.0416,[363]6.0531,[364]6.0599,[365]6.0656,[366]6.0665,[367]6.0749,[368]6.0722,[369]6.0729,[370]6.0744,[371]6.0687,[372]6.0733,[373]6.0784,[374]6.0768,[375]6.0770,[376]6.0837,[377]6.0790,[378]6.0816,[379]6.0876,[380]6.0798,[381]6.0762,[382]6.0711,[383]6.0704,[384]6.0698,[385]6.0690,[386]6.0684,[387]6.0682,[388]6.0644,[389]6.0591,[390]6.0524,[391]6.0446,[392]6.0405,[393]6.0391,[394]6.0420,[395]6.0406,[396]6.0330,[397]6.0401,[398]6.0440,[399]6.0523,[400]6.0522,[401]6.0539,[402]6.0546,[403]6.0566,[404]6.0632,[405]6.0534,[406]6.0498,[407]6.0490,[408]6.0505,[409]6.0626,[410]6.0736,[411]6.0848,[412]6.1006,[413]6.1118,[414]6.1195,[415]6.1248,[416]6.1322,[417]6.1444,[418]6.1483,[419]6.1556,[420]6.1642,[421]6.1757,[422]6.1804,[423]6.1873,[424]6.1986,[425]6.2072,[426]6.2136,[427]6.2181,[428]6.2262,[429]6.2315,[430]6.2398,[431]6.2542,[432]6.2585,[433]6.2580,[434]6.2538,[435]6.2546,[436]6.2568,[437]6.2665,[438]6.2738,[439]6.2713,[440]6.2704,[441]6.2652,[442]6.2637,[443]6.2651,[444]6.2653,[445]6.2633,[446]6.2660,[447]6.2689,[448]6.2734,[449]6.2704,[450]6.2717,[451]6.2676,[452]6.2548,[453]6.2464,[454]6.2409,[455]6.2419,[456]6.2463,[457]6.2483,[458]6.2461,[459]6.2467,[460]6.2551,[461]6.2525,[462]6.2511,[463]6.2558,[464]6.2547,[465]6.2519,[466]6.2441,[467]6.2441,[468]6.2439,[469]6.2458,[470]6.2462,[471]6.2412,[472]6.2457,[473]6.2402,[474]6.2412,[475]6.2353,[476]6.2371,[477]6.2300,[478]6.2289,[479]6.2348,[480]6.2394,[481]6.2412,[482]6.2367,[483]6.2323,[484]6.2343,[485]6.2332,[486]6.2278,[487]6.2278,[488]6.2258,[489]6.2209,[490]6.2185,[491]6.2154,[492]6.2094,[493]6.2065,[494]6.2051,[495]6.2051,[496]6.2017,[497]6.1960,[498]6.1943,[499]6.1898,[500]6.1803,[501]6.1738,[502]6.1741,[503]6.1733,[504]6.1644,[505]6.1673,[506]6.1681,[507]6.1625,[508]6.1586,[509]6.1579,[510]6.1617,[511]6.1661,[512]6.1694,[513]6.1715,[514]6.1779,[515]6.1723,[516]6.1713,[517]6.1722,[518]6.1721,[519]6.1750,[520]6.1777,[521]6.1792,[522]6.1821,[523]6.1827,[524]6.1884,[525]6.1919,[526]6.1931,[527]6.1949,[528]6.1897,[529]6.1904,[530]6.1854,[531]6.1840,[532]6.1885,[533]6.1907,[534]6.1894,[535]6.1918,[536]6.1864,[537]6.1840,[538]6.1889,[539]6.1900,[540]6.1936,[541]6.1938,[542]6.1950,[543]6.1967,[544]6.1976,[545]6.1955,[546]6.1963,[547]6.1920,[548]6.1871,[549]6.1872,[550]6.1841,[551]6.1805,[552]6.1783,[553]6.1745,[554]6.1723,[555]6.1694,[556]6.1691,[557]6.1713,[558]6.1675,[559]6.1669,[560]6.1667,[561]6.1668,[562]6.1641,[563]6.1641,[564]6.1682,[565]6.1701,[566]6.1698,[567]6.1678,[568]6.1683,[569]6.1668,[570]6.1696,[571]6.1702,[572]6.1711,[573]6.1712,[574]6.1677,[575]6.1675,[576]6.1675,[577]6.1662,[578]6.1642,[579]6.1649,[580]6.1582,[581]6.1544,[582]6.1534,[583]6.1543,[584]6.1544,[585]6.1467,[586]6.1399,[587]6.1404,[588]6.1449,[589]6.1505,[590]6.1536,[591]6.1558,[592]6.1545,[593]6.1514,[594]6.1523,[595]6.1500,[596]6.1535,[597]6.1513,[598]6.1484,[599]6.1506,[600]6.1502,[601]6.1486,[602]6
.1500,[603]6.1533,[604]6.1542,[605]6.1574,[606]6.1593,[607]6.1577,[608]6.1546,[609]6.1551,[610]6.1587,[611]6.1569,[612]6.1595,[613]6.1557,[614]6.1506,[615]6.1432,[616]6.1462,[617]6.1402,[618]6.1353,[619]6.1297,[620]6.1158,[621]6.1088,[622]6.1073,[623]6.1087,[624]6.1091,[625]6.1091,[626]6.1078,[627]6.1098,[628]6.1099,[629]6.1096,[630]6.1128,[631]6.1183,[632]6.1237,[633]6.1221,[634]6.1256,[635]6.1265,[636]6.1232,[637]6.1200,[638]6.1227,[639]6.1197,[640]6.1206,[641]6.1210,[642]6.1278,[643]6.1300,[644]6.1312,[645]6.1292,[646]6.1331,[647]6.1294,[648]6.1302,[649]6.1303,[650]6.1343,[651]6.1398,[652]6.1408,[653]6.1448,[654]6.1384,[655]6.1378,

llama_print_timings: load time = 2868.41 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 986629.03 ms / 335360 tokens ( 2.94 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1016443.58 ms

Q2_4, 13B

main: seed = 1682672513
llama.cpp: loading model from ../models/13B/q24.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 15 (mostly Q2_4)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 73.73 KB
llama_model_load_internal: mem required = 7149.75 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size = 400.00 MB

system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
2.46 seconds per pass - ETA 26 minutes
[1]4.7585,[2]5.3449,[3]6.1921,[4]6.9859,[5]7.0915,[6]6.9582,[7]7.1905,[8]7.3194,[9]7.6847,[10]7.9856,[11]8.2037,[12]8.2346,[13]8.2531,[14]8.4211,[15]8.6654,[16]8.1885,[17]8.0491,[18]8.0571,[19]7.6317,[20]7.5630,[21]7.4568,[22]7.2688,[23]7.2103,[24]7.1009,[25]7.0970,[26]6.8966,[27]6.6696,[28]6.5677,[29]6.4672,[30]6.2927,[31]6.2522,[32]6.2650,[33]6.2214,[34]6.2911,[35]6.3188,[36]6.3626,[37]6.3676,[38]6.3648,[39]6.4049,[40]6.4696,[41]6.5031,[42]6.5451,[43]6.4929,[44]6.5349,[45]6.5355,[46]6.4898,[47]6.5203,[48]6.4929,[49]6.5029,[50]6.4658,[51]6.4713,[52]6.4578,[53]6.5058,[54]6.4917,[55]6.4656,[56]6.4997,[57]6.5208,[58]6.5559,[59]6.5796,[60]6.6247,[61]6.6097,[62]6.6783,[63]6.7118,[64]6.7189,[65]6.7631,[66]6.7637,[67]6.7817,[68]6.7975,[69]6.8369,[70]6.8757,[71]6.9031,[72]6.9442,[73]7.0056,[74]7.0067,[75]7.0174,[76]7.0390,[77]7.0566,[78]7.0475,[79]7.0756,[80]7.0690,[81]7.0858,[82]7.0833,[83]7.0255,[84]7.0156,[85]7.0123,[86]6.9888,[87]6.9278,[88]6.8928,[89]6.8682,[90]6.8598,[91]6.8908,[92]6.8824,[93]6.8846,[94]6.8808,[95]6.9110,[96]6.9089,[97]6.9063,[98]6.8987,[99]6.8890,[100]6.8806,[101]6.9075,[102]6.8944,[103]6.9125,[104]6.9135,[105]6.9150,[106]6.9329,[107]6.9309,[108]6.9469,[109]6.9416,[110]6.9362,[111]6.9575,[112]6.9752,[113]6.9789,[114]6.9764,[115]6.9810,[116]6.9712,[117]6.9763,[118]7.0062,[119]7.0290,[120]7.0601,[121]7.0781,[122]7.1003,[123]7.1417,[124]7.1628,[125]7.1536,[126]7.1928,[127]7.2300,[128]7.2598,[129]7.2410,[130]7.2510,[131]7.2454,[132]7.2380,[133]7.2288,[134]7.2421,[135]7.2391,[136]7.2287,[137]7.2241,[138]7.2107,[139]7.2015,[140]7.2012,[141]7.1776,[142]7.1732,[143]7.1527,[144]7.1378,[145]7.1344,[146]7.1172,[147]7.1279,[148]7.1340,[149]7.1299,[150]7.1284,[151]7.1312,[152]7.1194,[153]7.1053,[154]7.0951,[155]7.1006,[156]7.0991,[157]7.1170,[158]7.1222,[159]7.1258,[160]7.1297,[161]7.1436,[162]7.1072,[163]7.0966,[164]7.0683,[165]7.0330,[166]6.9992,[167]6.9570,[168]6.9234,[169]6.9088,[170]6.8939,[171]6.8657,[172]6.8461,[173]6.8297,[174]6.7949,[175]6.7696,[176]6.7531,[177]6.7300,[178]6.7037,[179]6.6849,[180]6.6748,[181]6.6528,[182]6.6307,[183]6.6164,[184]6.6141,[185]6.6069,[186]6.6110,[187]6.6160,[188]6.6144,[189]6.6356,[190]6.6369,[191]6.6560,[192]6.6695,[193]6.6896,[194]6.7040,[195]6.7267,[196]6.7426,[197]6.7660,[198]6.7803,[199]6.7816,[200]6.7834,[201]6.7782,[202]6.8005,[203]6.8093,[204]6.8139,[205]6.8266,[206]6.8314,[207]6.8280,[208]6.8349,[209]6.8376,[210]6.8433,[211]6.8518,[212]6.8573,[213]6.8660,[214]6.8694,[215]6.8724,[216]6.8841,[217]6.9016,[218]6.9167,[219]6.9157,[220]6.9105,[221]6.9024,[222]6.9009,[223]6.8909,[224]6.8810,[225]6.8771,[226]6.8996,[227]6.9145,[228]6.9243,[229]6.9329,[230]6.9299,[231]6.9460,[232]6.9357,[233]6.9161,[234]6.8990,[235]6.8808,[236]6.8716,[237]6.8604,[238]6.8644,[239]6.8478,[240]6.8351,[241]6.8392,[242]6.8429,[243]6.8411,[244]6.8283,[245]6.8257,[246]6.8136,[247]6.8008,[248]6.7913,[249]6.7872,[250]6.7898,[251]6.7811,[252]6.7757,[253]6.7633,[254]6.7598,[255]6.7463,[256]6.7255,[257]6.7133,[258]6.7034,[259]6.7016,[260]6.6927,[261]6.6882,[262]6.6811,[263]6.6731,[264]6.6577,[265]6.6581,[266]6.6546,[267]6.6455,[268]6.6556,[269]6.6559,[270]6.6559,[271]6.6630,[272]6.6677,[273]6.6665,[274]6.6678,[275]6.6755,[276]6.6830,[277]6.7003,[278]6.7100,[279]6.7188,[280]6.7223,[281]6.7342,[282]6.7384,[283]6.7524,[284]6.7615,[285]6.7707,[286]6.7866,[287]6.7841,[288]6.7917,[289]6.7842,[290]6.7667,[291]6.7510,[292]6.7324,[293]6.7158,[294]6.7170,[295]6.7174,[296]6.7228,[297]6.7217,[298]6.7237,[299]6.7202,[300]6.7097,[301]6.7079,[302]6.6990,[303]6.6904,[304]6.6805,[305]6.6760,[30
6]6.6626,[307]6.6652,[308]6.6659,[309]6.6506,[310]6.6450,[311]6.6401,[312]6.6418,[313]6.6342,[314]6.6339,[315]6.6170,[316]6.6160,[317]6.6004,[318]6.5804,[319]6.5945,[320]6.6077,[321]6.6118,[322]6.6055,[323]6.5989,[324]6.5969,[325]6.6085,[326]6.6091,[327]6.6111,[328]6.6139,[329]6.6186,[330]6.6225,[331]6.6357,[332]6.6312,[333]6.6398,[334]6.6321,[335]6.6254,[336]6.6294,[337]6.6266,[338]6.6272,[339]6.6223,[340]6.6194,[341]6.6274,[342]6.6307,[343]6.6368,[344]6.6358,[345]6.6355,[346]6.6318,[347]6.6356,[348]6.6404,[349]6.6428,[350]6.6401,[351]6.6418,[352]6.6438,[353]6.6379,[354]6.6392,[355]6.6448,[356]6.6477,[357]6.6436,[358]6.6529,[359]6.6557,[360]6.6506,[361]6.6486,[362]6.6565,[363]6.6678,[364]6.6738,[365]6.6800,[366]6.6819,[367]6.6929,[368]6.6892,[369]6.6907,[370]6.6922,[371]6.6857,[372]6.6918,[373]6.6976,[374]6.6955,[375]6.6940,[376]6.7022,[377]6.6962,[378]6.6974,[379]6.7035,[380]6.6939,[381]6.6905,[382]6.6856,[383]6.6836,[384]6.6840,[385]6.6822,[386]6.6813,[387]6.6816,[388]6.6758,[389]6.6701,[390]6.6638,[391]6.6557,[392]6.6529,[393]6.6534,[394]6.6558,[395]6.6539,[396]6.6467,[397]6.6557,[398]6.6598,[399]6.6702,[400]6.6694,[401]6.6701,[402]6.6706,[403]6.6732,[404]6.6793,[405]6.6683,[406]6.6646,[407]6.6647,[408]6.6652,[409]6.6781,[410]6.6897,[411]6.7017,[412]6.7187,[413]6.7319,[414]6.7406,[415]6.7476,[416]6.7561,[417]6.7672,[418]6.7697,[419]6.7761,[420]6.7854,[421]6.7970,[422]6.8007,[423]6.8079,[424]6.8207,[425]6.8308,[426]6.8387,[427]6.8423,[428]6.8514,[429]6.8557,[430]6.8635,[431]6.8784,[432]6.8802,[433]6.8788,[434]6.8730,[435]6.8730,[436]6.8755,[437]6.8857,[438]6.8954,[439]6.8912,[440]6.8895,[441]6.8831,[442]6.8802,[443]6.8809,[444]6.8829,[445]6.8802,[446]6.8811,[447]6.8832,[448]6.8871,[449]6.8847,[450]6.8841,[451]6.8796,[452]6.8733,[453]6.8643,[454]6.8579,[455]6.8578,[456]6.8623,[457]6.8644,[458]6.8620,[459]6.8618,[460]6.8698,[461]6.8655,[462]6.8631,[463]6.8662,[464]6.8657,[465]6.8642,[466]6.8564,[467]6.8586,[468]6.8591,[469]6.8619,[470]6.8626,[471]6.8578,[472]6.8629,[473]6.8565,[474]6.8588,[475]6.8554,[476]6.8561,[477]6.8481,[478]6.8469,[479]6.8548,[480]6.8607,[481]6.8621,[482]6.8569,[483]6.8536,[484]6.8568,[485]6.8558,[486]6.8488,[487]6.8492,[488]6.8468,[489]6.8411,[490]6.8390,[491]6.8360,[492]6.8294,[493]6.8257,[494]6.8236,[495]6.8228,[496]6.8191,[497]6.8134,[498]6.8116,[499]6.8061,[500]6.7967,[501]6.7880,[502]6.7892,[503]6.7873,[504]6.7777,[505]6.7790,[506]6.7806,[507]6.7771,[508]6.7734,[509]6.7717,[510]6.7750,[511]6.7811,[512]6.7847,[513]6.7874,[514]6.7947,[515]6.7885,[516]6.7874,[517]6.7884,[518]6.7876,[519]6.7900,[520]6.7920,[521]6.7937,[522]6.7953,[523]6.7954,[524]6.8017,[525]6.8047,[526]6.8057,[527]6.8079,[528]6.8026,[529]6.8051,[530]6.7994,[531]6.7978,[532]6.8046,[533]6.8084,[534]6.8061,[535]6.8101,[536]6.8043,[537]6.8016,[538]6.8073,[539]6.8080,[540]6.8123,[541]6.8142,[542]6.8147,[543]6.8169,[544]6.8180,[545]6.8166,[546]6.8171,[547]6.8123,[548]6.8059,[549]6.8058,[550]6.8032,[551]6.7988,[552]6.7972,[553]6.7924,[554]6.7894,[555]6.7866,[556]6.7858,[557]6.7887,[558]6.7852,[559]6.7856,[560]6.7835,[561]6.7842,[562]6.7818,[563]6.7808,[564]6.7860,[565]6.7882,[566]6.7879,[567]6.7853,[568]6.7853,[569]6.7822,[570]6.7854,[571]6.7860,[572]6.7865,[573]6.7865,[574]6.7829,[575]6.7816,[576]6.7811,[577]6.7778,[578]6.7755,[579]6.7751,[580]6.7677,[581]6.7637,[582]6.7634,[583]6.7635,[584]6.7630,[585]6.7563,[586]6.7496,[587]6.7503,[588]6.7555,[589]6.7622,[590]6.7651,[591]6.7657,[592]6.7647,[593]6.7611,[594]6.7620,[595]6.7595,[596]6.7636,[597]6.7606,[598]6.7572,[599]6.7593,[600]6.7579,[601]6.7563,[602]6
.7597,[603]6.7628,[604]6.7645,[605]6.7670,[606]6.7681,[607]6.7671,[608]6.7631,[609]6.7631,[610]6.7687,[611]6.7669,[612]6.7689,[613]6.7652,[614]6.7594,[615]6.7506,[616]6.7538,[617]6.7459,[618]6.7394,[619]6.7332,[620]6.7181,[621]6.7112,[622]6.7089,[623]6.7105,[624]6.7107,[625]6.7115,[626]6.7104,[627]6.7136,[628]6.7136,[629]6.7137,[630]6.7170,[631]6.7235,[632]6.7290,[633]6.7270,[634]6.7302,[635]6.7294,[636]6.7265,[637]6.7236,[638]6.7265,[639]6.7229,[640]6.7237,[641]6.7239,[642]6.7309,[643]6.7328,[644]6.7346,[645]6.7328,[646]6.7373,[647]6.7336,[648]6.7350,[649]6.7353,[650]6.7393,[651]6.7443,[652]6.7446,[653]6.7485,[654]6.7419,[655]6.7409,

llama_print_timings: load time = 4488.53 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 1516595.65 ms / 335360 tokens ( 4.52 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1547168.54 ms

Q3_4, 13B

main: seed = 1682656187
llama.cpp: loading model from ../models/13B/q34.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 10 (mostly Q3_4)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 73.73 KB
llama_model_load_internal: mem required = 8681.78 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size = 400.00 MB

system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
2.75 seconds per pass - ETA 30 minutes
[1]3.9691,[2]4.3609,[3]5.1508,[4]5.6357,[5]5.8245,[6]5.7682,[7]5.8930,[8]6.0160,[9]6.2782,[10]6.4992,[11]6.7043,[12]6.7592,[13]6.7085,[14]6.8204,[15]7.0162,[16]6.6686,[17]6.5717,[18]6.5474,[19]6.2354,[20]6.1972,[21]6.1250,[22]5.9506,[23]5.9279,[24]5.8369,[25]5.8490,[26]5.6949,[27]5.5136,[28]5.4184,[29]5.3375,[30]5.1977,[31]5.1640,[32]5.1797,[33]5.1311,[34]5.1711,[35]5.1947,[36]5.2175,[37]5.2144,[38]5.2114,[39]5.2423,[40]5.2872,[41]5.3119,[42]5.3487,[43]5.3108,[44]5.3543,[45]5.3571,[46]5.3300,[47]5.3585,[48]5.3381,[49]5.3477,[50]5.3142,[51]5.3187,[52]5.3117,[53]5.3565,[54]5.3451,[55]5.3237,[56]5.3481,[57]5.3650,[58]5.3886,[59]5.4044,[60]5.4383,[61]5.4305,[62]5.4858,[63]5.5112,[64]5.5220,[65]5.5596,[66]5.5598,[67]5.5773,[68]5.5892,[69]5.6185,[70]5.6480,[71]5.6698,[72]5.7054,[73]5.7564,[74]5.7630,[75]5.7734,[76]5.7888,[77]5.8019,[78]5.7873,[79]5.8141,[80]5.8085,[81]5.8180,[82]5.8151,[83]5.7677,[84]5.7567,[85]5.7513,[86]5.7341,[87]5.6712,[88]5.6305,[89]5.6089,[90]5.5978,[91]5.6195,[92]5.6135,[93]5.6146,[94]5.6130,[95]5.6419,[96]5.6394,[97]5.6359,[98]5.6316,[99]5.6235,[100]5.6211,[101]5.6452,[102]5.6405,[103]5.6557,[104]5.6603,[105]5.6634,[106]5.6775,[107]5.6761,[108]5.6907,[109]5.6889,[110]5.6830,[111]5.7020,[112]5.7191,[113]5.7178,[114]5.7158,[115]5.7198,[116]5.7080,[117]5.7090,[118]5.7330,[119]5.7506,[120]5.7808,[121]5.7966,[122]5.8185,[123]5.8563,[124]5.8753,[125]5.8695,[126]5.9059,[127]5.9397,[128]5.9679,[129]5.9551,[130]5.9628,[131]5.9577,[132]5.9536,[133]5.9413,[134]5.9490,[135]5.9496,[136]5.9397,[137]5.9349,[138]5.9208,[139]5.9126,[140]5.9112,[141]5.8840,[142]5.8817,[143]5.8558,[144]5.8403,[145]5.8328,[146]5.8204,[147]5.8257,[148]5.8279,[149]5.8249,[150]5.8232,[151]5.8275,[152]5.8208,[153]5.8108,[154]5.8054,[155]5.8122,[156]5.8106,[157]5.8269,[158]5.8289,[159]5.8305,[160]5.8342,[161]5.8453,[162]5.8188,[163]5.8082,[164]5.7861,[165]5.7595,[166]5.7354,[167]5.7028,[168]5.6739,[169]5.6606,[170]5.6517,[171]5.6296,[172]5.6167,[173]5.6035,[174]5.5753,[175]5.5553,[176]5.5420,[177]5.5254,[178]5.5038,[179]5.4904,[180]5.4827,[181]5.4657,[182]5.4485,[183]5.4360,[184]5.4348,[185]5.4271,[186]5.4282,[187]5.4332,[188]5.4297,[189]5.4473,[190]5.4471,[191]5.4648,[192]5.4785,[193]5.4948,[194]5.5065,[195]5.5265,[196]5.5385,[197]5.5577,[198]5.5713,[199]5.5728,[200]5.5747,[201]5.5688,[202]5.5830,[203]5.5901,[204]5.5849,[205]5.5946,[206]5.5999,[207]5.5954,[208]5.6013,[209]5.6056,[210]5.6114,[211]5.6216,[212]5.6280,[213]5.6369,[214]5.6402,[215]5.6434,[216]5.6548,[217]5.6719,[218]5.6855,[219]5.6862,[220]5.6827,[221]5.6775,[222]5.6773,[223]5.6701,[224]5.6627,[225]5.6593,[226]5.6796,[227]5.6871,[228]5.6945,[229]5.7014,[230]5.6982,[231]5.7139,[232]5.7031,[233]5.6877,[234]5.6733,[235]5.6521,[236]5.6465,[237]5.6370,[238]5.6397,[239]5.6280,[240]5.6186,[241]5.6219,[242]5.6239,[243]5.6226,[244]5.6122,[245]5.6090,[246]5.5987,[247]5.5889,[248]5.5823,[249]5.5786,[250]5.5821,[251]5.5736,[252]5.5693,[253]5.5598,[254]5.5562,[255]5.5464,[256]5.5294,[257]5.5194,[258]5.5122,[259]5.5121,[260]5.5040,[261]5.4993,[262]5.4950,[263]5.4898,[264]5.4684,[265]5.4679,[266]5.4647,[267]5.4583,[268]5.4651,[269]5.4649,[270]5.4662,[271]5.4723,[272]5.4756,[273]5.4771,[274]5.4788,[275]5.4854,[276]5.4918,[277]5.5050,[278]5.5138,[279]5.5224,[280]5.5257,[281]5.5354,[282]5.5409,[283]5.5541,[284]5.5634,[285]5.5703,[286]5.5834,[287]5.5802,[288]5.5859,[289]5.5799,[290]5.5657,[291]5.5528,[292]5.5389,[293]5.5267,[294]5.5278,[295]5.5279,[296]5.5330,[297]5.5319,[298]5.5333,[299]5.5313,[300]5.5220,[301]5.5223,[302]5.5154,[303]5.5067,[304]5.4995,[305]5.4974,[30
6]5.4861,[307]5.4896,[308]5.4904,[309]5.4765,[310]5.4726,[311]5.4684,[312]5.4703,[313]5.4646,[314]5.4629,[315]5.4493,[316]5.4465,[317]5.4335,[318]5.4164,[319]5.4272,[320]5.4385,[321]5.4432,[322]5.4399,[323]5.4338,[324]5.4321,[325]5.4421,[326]5.4438,[327]5.4446,[328]5.4478,[329]5.4528,[330]5.4549,[331]5.4651,[332]5.4612,[333]5.4687,[334]5.4637,[335]5.4582,[336]5.4602,[337]5.4589,[338]5.4584,[339]5.4544,[340]5.4515,[341]5.4582,[342]5.4613,[343]5.4661,[344]5.4667,[345]5.4682,[346]5.4668,[347]5.4704,[348]5.4741,[349]5.4761,[350]5.4743,[351]5.4755,[352]5.4756,[353]5.4707,[354]5.4717,[355]5.4764,[356]5.4792,[357]5.4759,[358]5.4840,[359]5.4864,[360]5.4827,[361]5.4826,[362]5.4892,[363]5.5000,[364]5.5057,[365]5.5100,[366]5.5115,[367]5.5204,[368]5.5175,[369]5.5187,[370]5.5204,[371]5.5161,[372]5.5205,[373]5.5250,[374]5.5229,[375]5.5222,[376]5.5284,[377]5.5247,[378]5.5269,[379]5.5311,[380]5.5240,[381]5.5207,[382]5.5167,[383]5.5148,[384]5.5146,[385]5.5136,[386]5.5127,[387]5.5122,[388]5.5088,[389]5.5049,[390]5.4993,[391]5.4931,[392]5.4895,[393]5.4891,[394]5.4920,[395]5.4913,[396]5.4857,[397]5.4919,[398]5.4961,[399]5.5033,[400]5.5021,[401]5.5026,[402]5.5038,[403]5.5060,[404]5.5116,[405]5.4965,[406]5.4924,[407]5.4922,[408]5.4934,[409]5.5046,[410]5.5136,[411]5.5238,[412]5.5381,[413]5.5484,[414]5.5549,[415]5.5611,[416]5.5685,[417]5.5786,[418]5.5810,[419]5.5860,[420]5.5939,[421]5.6041,[422]5.6074,[423]5.6130,[424]5.6227,[425]5.6304,[426]5.6365,[427]5.6407,[428]5.6480,[429]5.6515,[430]5.6580,[431]5.6710,[432]5.6739,[433]5.6728,[434]5.6689,[435]5.6702,[436]5.6729,[437]5.6814,[438]5.6891,[439]5.6861,[440]5.6852,[441]5.6803,[442]5.6787,[443]5.6799,[444]5.6813,[445]5.6802,[446]5.6823,[447]5.6846,[448]5.6880,[449]5.6865,[450]5.6875,[451]5.6846,[452]5.6697,[453]5.6600,[454]5.6546,[455]5.6552,[456]5.6593,[457]5.6607,[458]5.6587,[459]5.6584,[460]5.6658,[461]5.6619,[462]5.6586,[463]5.6573,[464]5.6571,[465]5.6549,[466]5.6477,[467]5.6467,[468]5.6446,[469]5.6459,[470]5.6450,[471]5.6401,[472]5.6416,[473]5.6368,[474]5.6356,[475]5.6290,[476]5.6277,[477]5.6199,[478]5.6175,[479]5.6193,[480]5.6221,[481]5.6226,[482]5.6179,[483]5.6137,[484]5.6148,[485]5.6092,[486]5.6025,[487]5.6015,[488]5.5988,[489]5.5934,[490]5.5903,[491]5.5869,[492]5.5803,[493]5.5774,[494]5.5757,[495]5.5735,[496]5.5693,[497]5.5634,[498]5.5607,[499]5.5572,[500]5.5488,[501]5.5419,[502]5.5411,[503]5.5401,[504]5.5326,[505]5.5329,[506]5.5339,[507]5.5286,[508]5.5248,[509]5.5250,[510]5.5271,[511]5.5315,[512]5.5355,[513]5.5382,[514]5.5438,[515]5.5399,[516]5.5387,[517]5.5387,[518]5.5384,[519]5.5406,[520]5.5419,[521]5.5431,[522]5.5447,[523]5.5454,[524]5.5508,[525]5.5537,[526]5.5542,[527]5.5559,[528]5.5503,[529]5.5514,[530]5.5475,[531]5.5468,[532]5.5516,[533]5.5544,[534]5.5528,[535]5.5551,[536]5.5504,[537]5.5484,[538]5.5534,[539]5.5542,[540]5.5561,[541]5.5560,[542]5.5574,[543]5.5592,[544]5.5606,[545]5.5593,[546]5.5596,[547]5.5561,[548]5.5520,[549]5.5521,[550]5.5498,[551]5.5469,[552]5.5450,[553]5.5416,[554]5.5393,[555]5.5371,[556]5.5361,[557]5.5376,[558]5.5341,[559]5.5346,[560]5.5337,[561]5.5339,[562]5.5311,[563]5.5311,[564]5.5352,[565]5.5365,[566]5.5368,[567]5.5350,[568]5.5358,[569]5.5341,[570]5.5368,[571]5.5380,[572]5.5386,[573]5.5391,[574]5.5360,[575]5.5347,[576]5.5345,[577]5.5326,[578]5.5307,[579]5.5309,[580]5.5253,[581]5.5223,[582]5.5225,[583]5.5233,[584]5.5234,[585]5.5177,[586]5.5119,[587]5.5122,[588]5.5165,[589]5.5217,[590]5.5246,[591]5.5262,[592]5.5250,[593]5.5212,[594]5.5226,[595]5.5208,[596]5.5247,[597]5.5228,[598]5.5199,[599]5.5227,[600]5.5216,[601]5.5204,[602]5
.5213,[603]5.5241,[604]5.5249,[605]5.5277,[606]5.5292,[607]5.5277,[608]5.5247,[609]5.5255,[610]5.5296,[611]5.5285,[612]5.5303,[613]5.5274,[614]5.5234,[615]5.5171,[616]5.5197,[617]5.5144,[618]5.5095,[619]5.5049,[620]5.4935,[621]5.4880,[622]5.4859,[623]5.4872,[624]5.4875,[625]5.4881,[626]5.4876,[627]5.4904,[628]5.4910,[629]5.4916,[630]5.4944,[631]5.4990,[632]5.5038,[633]5.5026,[634]5.5056,[635]5.5053,[636]5.5018,[637]5.4981,[638]5.5003,[639]5.4969,[640]5.4974,[641]5.4977,[642]5.5030,[643]5.5049,[644]5.5073,[645]5.5057,[646]5.5094,[647]5.5046,[648]5.5059,[649]5.5062,[650]5.5095,[651]5.5135,[652]5.5138,[653]5.5176,[654]5.5119,[655]5.5110,

llama_print_timings: load time = 5957.47 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 1701223.56 ms / 335360 tokens ( 5.07 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1733067.74 ms

Q4_4, 13B

main: seed = 1682790225
llama.cpp: loading model from ../models/13B/q44.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 14 (mostly Q4_4)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 73.73 KB
llama_model_load_internal: mem required = 10213.81 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size = 400.00 MB

system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
2.72 seconds per pass - ETA 29 minutes
[1]3.7371,[2]4.2097,[3]5.0064,[4]5.3764,[5]5.5586,[6]5.4949,[7]5.6305,[8]5.7337,[9]5.9985,[10]6.2164,[11]6.4018,[12]6.4542,[13]6.4223,[14]6.5122,[15]6.7109,[16]6.3997,[17]6.3208,[18]6.2951,[19]6.0050,[20]5.9874,[21]5.9108,[22]5.7379,[23]5.7112,[24]5.6188,[25]5.6312,[26]5.4855,[27]5.3108,[28]5.2153,[29]5.1408,[30]5.0041,[31]4.9644,[32]4.9804,[33]4.9376,[34]4.9779,[35]4.9952,[36]5.0181,[37]5.0117,[38]5.0095,[39]5.0362,[40]5.0770,[41]5.1000,[42]5.1361,[43]5.0994,[44]5.1415,[45]5.1435,[46]5.1173,[47]5.1457,[48]5.1302,[49]5.1325,[50]5.1018,[51]5.1096,[52]5.1021,[53]5.1478,[54]5.1379,[55]5.1189,[56]5.1389,[57]5.1573,[58]5.1792,[59]5.1969,[60]5.2336,[61]5.2273,[62]5.2826,[63]5.3079,[64]5.3183,[65]5.3554,[66]5.3534,[67]5.3714,[68]5.3824,[69]5.4101,[70]5.4406,[71]5.4614,[72]5.4958,[73]5.5438,[74]5.5508,[75]5.5605,[76]5.5748,[77]5.5862,[78]5.5737,[79]5.5997,[80]5.5943,[81]5.6018,[82]5.5978,[83]5.5517,[84]5.5406,[85]5.5338,[86]5.5187,[87]5.4533,[88]5.4109,[89]5.3896,[90]5.3801,[91]5.4009,[92]5.3968,[93]5.3979,[94]5.3971,[95]5.4232,[96]5.4202,[97]5.4171,[98]5.4136,[99]5.4063,[100]5.4035,[101]5.4261,[102]5.4222,[103]5.4380,[104]5.4423,[105]5.4440,[106]5.4578,[107]5.4561,[108]5.4715,[109]5.4708,[110]5.4650,[111]5.4836,[112]5.4995,[113]5.4999,[114]5.4986,[115]5.5033,[116]5.4916,[117]5.4909,[118]5.5141,[119]5.5324,[120]5.5620,[121]5.5776,[122]5.5990,[123]5.6352,[124]5.6525,[125]5.6471,[126]5.6825,[127]5.7151,[128]5.7425,[129]5.7312,[130]5.7397,[131]5.7352,[132]5.7318,[133]5.7202,[134]5.7284,[135]5.7279,[136]5.7190,[137]5.7155,[138]5.7017,[139]5.6938,[140]5.6922,[141]5.6652,[142]5.6613,[143]5.6361,[144]5.6211,[145]5.6124,[146]5.6018,[147]5.6069,[148]5.6100,[149]5.6069,[150]5.6060,[151]5.6105,[152]5.6049,[153]5.5951,[154]5.5893,[155]5.5957,[156]5.5933,[157]5.6083,[158]5.6106,[159]5.6112,[160]5.6147,[161]5.6256,[162]5.6003,[163]5.5907,[164]5.5704,[165]5.5452,[166]5.5225,[167]5.4905,[168]5.4638,[169]5.4505,[170]5.4419,[171]5.4213,[172]5.4093,[173]5.3965,[174]5.3701,[175]5.3500,[176]5.3366,[177]5.3201,[178]5.3003,[179]5.2871,[180]5.2796,[181]5.2638,[182]5.2477,[183]5.2357,[184]5.2347,[185]5.2275,[186]5.2280,[187]5.2337,[188]5.2312,[189]5.2476,[190]5.2479,[191]5.2649,[192]5.2784,[193]5.2932,[194]5.3039,[195]5.3229,[196]5.3343,[197]5.3533,[198]5.3668,[199]5.3688,[200]5.3693,[201]5.3628,[202]5.3753,[203]5.3811,[204]5.3766,[205]5.3852,[206]5.3904,[207]5.3867,[208]5.3925,[209]5.3957,[210]5.4014,[211]5.4117,[212]5.4178,[213]5.4265,[214]5.4290,[215]5.4325,[216]5.4442,[217]5.4606,[218]5.4739,[219]5.4737,[220]5.4708,[221]5.4662,[222]5.4661,[223]5.4595,[224]5.4525,[225]5.4492,[226]5.4689,[227]5.4743,[228]5.4817,[229]5.4885,[230]5.4849,[231]5.5003,[232]5.4900,[233]5.4753,[234]5.4608,[235]5.4390,[236]5.4339,[237]5.4251,[238]5.4284,[239]5.4171,[240]5.4082,[241]5.4115,[242]5.4126,[243]5.4118,[244]5.4020,[245]5.3983,[246]5.3883,[247]5.3786,[248]5.3726,[249]5.3694,[250]5.3727,[251]5.3645,[252]5.3597,[253]5.3509,[254]5.3468,[255]5.3376,[256]5.3213,[257]5.3114,[258]5.3047,[259]5.3035,[260]5.2952,[261]5.2901,[262]5.2863,[263]5.2813,[264]5.2584,[265]5.2582,[266]5.2553,[267]5.2490,[268]5.2555,[269]5.2550,[270]5.2558,[271]5.2620,[272]5.2649,[273]5.2665,[274]5.2676,[275]5.2736,[276]5.2795,[277]5.2917,[278]5.3000,[279]5.3081,[280]5.3118,[281]5.3217,[282]5.3270,[283]5.3396,[284]5.3483,[285]5.3564,[286]5.3687,[287]5.3654,[288]5.3710,[289]5.3649,[290]5.3511,[291]5.3385,[292]5.3253,[293]5.3133,[294]5.3137,[295]5.3138,[296]5.3186,[297]5.3176,[298]5.3198,[299]5.3175,[300]5.3089,[301]5.3093,[302]5.3030,[303]5.2950,[304]5.2876,[305]5.2848,[30
6]5.2740,[307]5.2769,[308]5.2776,[309]5.2648,[310]5.2621,[311]5.2579,[312]5.2592,[313]5.2536,[314]5.2522,[315]5.2396,[316]5.2361,[317]5.2236,[318]5.2077,[319]5.2178,[320]5.2289,[321]5.2333,[322]5.2301,[323]5.2244,[324]5.2226,[325]5.2319,[326]5.2336,[327]5.2343,[328]5.2379,[329]5.2426,[330]5.2446,[331]5.2548,[332]5.2511,[333]5.2589,[334]5.2544,[335]5.2495,[336]5.2518,[337]5.2507,[338]5.2503,[339]5.2460,[340]5.2434,[341]5.2501,[342]5.2534,[343]5.2578,[344]5.2583,[345]5.2597,[346]5.2581,[347]5.2620,[348]5.2656,[349]5.2677,[350]5.2659,[351]5.2674,[352]5.2677,[353]5.2628,[354]5.2634,[355]5.2684,[356]5.2713,[357]5.2683,[358]5.2761,[359]5.2783,[360]5.2749,[361]5.2747,[362]5.2813,[363]5.2920,[364]5.2969,[365]5.3006,[366]5.3024,[367]5.3111,[368]5.3090,[369]5.3105,[370]5.3126,[371]5.3086,[372]5.3133,[373]5.3173,[374]5.3155,[375]5.3151,[376]5.3206,[377]5.3171,[378]5.3197,[379]5.3236,[380]5.3168,[381]5.3140,[382]5.3098,[383]5.3080,[384]5.3080,[385]5.3068,[386]5.3057,[387]5.3054,[388]5.3025,[389]5.2988,[390]5.2936,[391]5.2880,[392]5.2844,[393]5.2841,[394]5.2872,[395]5.2864,[396]5.2812,[397]5.2879,[398]5.2922,[399]5.2994,[400]5.2985,[401]5.2992,[402]5.3002,[403]5.3026,[404]5.3081,[405]5.2931,[406]5.2890,[407]5.2878,[408]5.2889,[409]5.3000,[410]5.3092,[411]5.3186,[412]5.3326,[413]5.3428,[414]5.3491,[415]5.3550,[416]5.3624,[417]5.3719,[418]5.3740,[419]5.3788,[420]5.3865,[421]5.3960,[422]5.3994,[423]5.4049,[424]5.4137,[425]5.4214,[426]5.4276,[427]5.4316,[428]5.4389,[429]5.4427,[430]5.4488,[431]5.4614,[432]5.4646,[433]5.4638,[434]5.4604,[435]5.4616,[436]5.4644,[437]5.4727,[438]5.4801,[439]5.4774,[440]5.4765,[441]5.4720,[442]5.4709,[443]5.4721,[444]5.4739,[445]5.4731,[446]5.4750,[447]5.4774,[448]5.4805,[449]5.4789,[450]5.4800,[451]5.4771,[452]5.4617,[453]5.4525,[454]5.4469,[455]5.4473,[456]5.4512,[457]5.4524,[458]5.4506,[459]5.4501,[460]5.4573,[461]5.4533,[462]5.4495,[463]5.4477,[464]5.4473,[465]5.4452,[466]5.4377,[467]5.4366,[468]5.4347,[469]5.4357,[470]5.4346,[471]5.4296,[472]5.4303,[473]5.4257,[474]5.4247,[475]5.4179,[476]5.4152,[477]5.4071,[478]5.4045,[479]5.4049,[480]5.4075,[481]5.4078,[482]5.4032,[483]5.3992,[484]5.4000,[485]5.3932,[486]5.3868,[487]5.3860,[488]5.3837,[489]5.3786,[490]5.3753,[491]5.3719,[492]5.3651,[493]5.3621,[494]5.3605,[495]5.3584,[496]5.3546,[497]5.3485,[498]5.3458,[499]5.3424,[500]5.3344,[501]5.3273,[502]5.3263,[503]5.3252,[504]5.3175,[505]5.3174,[506]5.3180,[507]5.3127,[508]5.3091,[509]5.3096,[510]5.3118,[511]5.3160,[512]5.3199,[513]5.3222,[514]5.3277,[515]5.3237,[516]5.3227,[517]5.3225,[518]5.3226,[519]5.3247,[520]5.3260,[521]5.3270,[522]5.3283,[523]5.3290,[524]5.3345,[525]5.3372,[526]5.3377,[527]5.3393,[528]5.3339,[529]5.3348,[530]5.3311,[531]5.3306,[532]5.3354,[533]5.3382,[534]5.3363,[535]5.3384,[536]5.3341,[537]5.3323,[538]5.3373,[539]5.3381,[540]5.3398,[541]5.3396,[542]5.3409,[543]5.3431,[544]5.3444,[545]5.3433,[546]5.3435,[547]5.3403,[548]5.3362,[549]5.3363,[550]5.3343,[551]5.3318,[552]5.3299,[553]5.3270,[554]5.3247,[555]5.3228,[556]5.3221,[557]5.3237,[558]5.3205,[559]5.3207,[560]5.3194,[561]5.3195,[562]5.3168,[563]5.3166,[564]5.3209,[565]5.3219,[566]5.3225,[567]5.3206,[568]5.3216,[569]5.3201,[570]5.3228,[571]5.3241,[572]5.3251,[573]5.3254,[574]5.3226,[575]5.3207,[576]5.3201,[577]5.3185,[578]5.3166,[579]5.3164,[580]5.3112,[581]5.3082,[582]5.3083,[583]5.3092,[584]5.3098,[585]5.3040,[586]5.2987,[587]5.2987,[588]5.3031,[589]5.3080,[590]5.3109,[591]5.3125,[592]5.3114,[593]5.3075,[594]5.3089,[595]5.3073,[596]5.3114,[597]5.3095,[598]5.3062,[599]5.3088,[600]5.3079,[601]5.3068,[602]5
.3067,[603]5.3094,[604]5.3099,[605]5.3124,[606]5.3137,[607]5.3123,[608]5.3095,[609]5.3104,[610]5.3144,[611]5.3130,[612]5.3151,[613]5.3122,[614]5.3083,[615]5.3025,[616]5.3051,[617]5.3002,[618]5.2960,[619]5.2916,[620]5.2808,[621]5.2758,[622]5.2741,[623]5.2754,[624]5.2758,[625]5.2766,[626]5.2763,[627]5.2790,[628]5.2798,[629]5.2802,[630]5.2832,[631]5.2876,[632]5.2923,[633]5.2911,[634]5.2940,[635]5.2936,[636]5.2901,[637]5.2864,[638]5.2885,[639]5.2854,[640]5.2859,[641]5.2863,[642]5.2913,[643]5.2930,[644]5.2947,[645]5.2933,[646]5.2967,[647]5.2916,[648]5.2927,[649]5.2929,[650]5.2959,[651]5.3000,[652]5.3004,[653]5.3042,[654]5.2988,[655]5.2981,

llama_print_timings: load time = 6350.49 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 1667989.72 ms / 335360 tokens ( 4.97 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1699934.63 ms

@ikawrakow ikawrakow added enhancement New feature or request generation quality Quality of model output labels Apr 29, 2023
@gotzmann

There's an interesting paper about weight 'outliers' within matrices:

https://huggingface.co/blog/hf-bitsandbytes-integration

So maybe implementing some hacks to scale those outliers better would give better perplexity for lower-bit models.

@Green-Sky
Collaborator

Q2_4

... soo, finally 65B with 32gigs of ram? 😄

@sw
Collaborator

sw commented Apr 30, 2023

Here's another idea, which maybe could be combined with QX_4.

FP32 has 8 bits for the exponent, 23 bits for the mantissa:

s e7 e6 e5 e4 e3 e2 e1 e0 m22 ... m0

bfloat16 maintains 8 bits for the exponent, leaving 7 bits for the mantissa:

s e7 e6 e5 e4 e3 e2 e1 e0 m6 m5 m4 m3 m2 m1 m0

FP16 has 5 bits for the exponent, 10 bits for the mantissa:

s e4 e3 e2 e1 e0 m9 m8 m7 m6 m5 m4 m3 m2 m1 m0

Since the values in the LLaMA models fall roughly in the range [-2,+2], we may want to further reduce the number of bits dedicated to the exponent. I suspect that a significant range of the exponent field currently is unused.

One could define a different 16-bit format:

s e3 e2 e1 e0 m10 m9 m8 m7 m6 m5 m4 m3 m2 m1 m0

Or an 8-bit minifloat:

s e2 e1 e0 m3 m2 m1 m0
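
To make this concrete, here is a minimal sketch (my own illustration, not anything that exists in ggml/llama.cpp) of decoding the 8-bit minifloat layout above. The function name, the exponent bias of 3, and the lack of subnormal/Inf/NaN handling are assumptions made purely for illustration:

#include <math.h>
#include <stdint.h>

// Hypothetical decoder for the "s e2 e1 e0 m3 m2 m1 m0" minifloat sketched above.
// Assumes an exponent bias of 3 and ignores subnormals, Inf and NaN.
static float minifloat8_to_fp32(uint8_t v) {
    const int sign = (v >> 7) & 0x1;   // s
    const int e    = (v >> 4) & 0x7;   // e2 e1 e0
    const int m    =  v       & 0xF;   // m3 m2 m1 m0
    // value = (-1)^s * 2^(e - 3) * (1 + m/16)
    const float x = ldexpf(1.0f + m / 16.0f, e - 3);
    return sign ? -x : x;
}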

@ikawrakow
Contributor Author

@ggerganov I see you have put this into the improve integer quantization project. Do you see this as just "improve integer quantization"? Given the most recent results, >=4-bit quantization does not need much improvement, as it comes very close to the generation quality of the full fp16 model. As far as I can tell, this is the best way forward for <4-bit quantization proposed so far, including compared to GPTQ (based on these GPTQ results). I think that, together with issue #1256 that I just added, we have a viable approach for 2- and 3-bit quantization, so perhaps a dedicated project for <4-bit quantization would be more appropriate?

@ikawrakow
Contributor Author

@sw Can you please elaborate on how the alternative float formats you are proposing could be used within the context of this issue? I can see that, perhaps, using your fp8 minifloat could avoid the "super-block" fp16 scale by storing the scales, which are currently int8_t, directly as fp8. But that would only save 0.125 bits per weight, and would come at the expense of more bit-fiddling that is likely to reduce evaluation performance.
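
For reference, the arithmetic behind that 0.125 bpw figure, using the QK4_4 = 128 super-block from the issue description (16 sub-block scales for 128 weights):

16 x int8 scales + 1 x fp16 super-block scale: (16*8 + 16) / 128 = 1.125 bpw of scale overhead
16 x fp8 scales, no super-block scale:         (16*8)      / 128 = 1.000 bpw of scale overhead
difference:                                                         0.125 bpw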

@ikawrakow ikawrakow added the Less than 4 bits Efforts related to viable quantized models using <4 bits label Apr 30, 2023
@sw
Collaborator

sw commented Apr 30, 2023

I must admit I haven't thought this through completely. As far as performance goes, it could be implemented with a look-up table, just as FP16 conversion currently is (except on Arm), so I don't expect it to be worse. I'm simply saying that a generic float wastes a bit of space because we're not using values |x| > 2. But of course that's not specific to your proposal; it's true of the LLaMA models in general.
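
For what it's worth, a minimal sketch of the look-up-table approach (again just an illustration, reusing the hypothetical minifloat8_to_fp32 decoder from the earlier sketch): all 256 values are precomputed once, so conversion during de-quantization is a single table load.

// Sketch only: 256-entry conversion table for the hypothetical 8-bit minifloat.
static float fp8_table[256];

static void init_fp8_table(void) {
    for (int i = 0; i < 256; ++i) {
        fp8_table[i] = minifloat8_to_fp32((uint8_t) i);
    }
}

// During de-quantization a stored byte q then becomes:
//     const float x = fp8_table[q];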

@ggerganov
Owner

ggerganov commented Apr 30, 2023

A couple of minor thoughts:

  • The QX_4 methods (together with all other Q methods) will need to handle leftovers from the division by 128 or 256. LLaMA layers have rows divisible by 256, but it might not be true for other models. From what I've seen, it is usually guaranteed that the row size is divisible by 32, but I think it would be nice to handle the general case (see the f16 and f32 dot product routines)
  • Would be great to achieve a LLaMA model with reasonable quality that is below 4GB. This would allow loading it in a web page with WASM. Lately, running LLMs in a web page seems to be generating a lot of interest 😄 (edit: and iPhones too)

@ikawrakow ikawrakow mentioned this issue Jun 3, 2023
@ikawrakow
Contributor Author

Closed via #1684

@yiliu30

yiliu30 commented Oct 11, 2023

The QX_4 methods (together with all other Q methods) will need to handle leftovers from the division by 128 or 256. LLaMA layers have rows divisible by 256, but it might not be true for other models. From what I've seen, it is usually guaranteed that the row size is divisible by 32, but I think it would be nice to handle the general case (see the f16 and f32 dot product routines)

Hi @ikawrakow, I have the same concern, could you please share the details?
BTW, as I understand it, this only requires the number of weights to be divisible by 256, not the number of rows in a layer?

@ikawrakow
Contributor Author

The number of elements (weights) in a row must be divisible by 256. If it is not (e.g., Falcon-7B, OpenLLaMA-3B), then one can turn on LLAMA_QKK_64 at compile time. This makes the k_quants blocks be of size 64 (all models supported by llama.cpp have tensor row sizes divisible by at least 64). Obviously it would be useful to have a better solution, but I don't see how this can be (easily) done without breaking backwards compatibility.
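
For illustration, the block size selection described above looks roughly like this at compile time (paraphrased from memory, so treat the exact macro spelling as an assumption; only LLAMA_QKK_64 and the 64/256 block sizes come from the comment above):

#ifdef GGML_QKK_64
#define QK_K 64     // enabled by building with LLAMA_QKK_64
#else
#define QK_K 256    // default k-quants super-block size
#endif

// quantization then requires the row size n to satisfy n % QK_K == 0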
