ROCm Port #1087

Merged
merged 105 commits into from Aug 25, 2023

Conversation

SlyEcho
Sponsor Collaborator

@SlyEcho SlyEcho commented Apr 20, 2023

Currently I can say that for regular users the CLBlast version is much easier to run. If you want the most performance, though, HIP is for you.


Remember to tweak the new settings LLAMA_CUDA_DMMV_X, LLAMA_CUDA_MMV_Y, and LLAMA_CUDA_KQUANTS_ITER.

I get the best results with 128, 8, and 1, for example.
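For reference, here is a sketch of passing those values at build time, assuming the Makefile build described below forwards them as make variables (check that your Makefile version supports them):

make -j4 LLAMA_HIPBLAS=1 LLAMA_CUDA_DMMV_X=128 LLAMA_CUDA_MMV_Y=8 LLAMA_CUDA_KQUANTS_ITER=1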


Note for unsupported GPU users:

You need to use an environment variable to force ROCm to run.
You can check this resource: ROCm supported-gpu-list

export HSA_OVERRIDE_GFX_VERSION=10.3.0

This will make it work in the currently running shell; after that, ./main and the other llama.cpp commands will run.

rocBLAS is only released for a limited number of GPUs: gfx900 gfx906 gfx908 gfx90a gfx1030 (depends on ROCm version, etc).

If you look in /opt/rocm/lib/rocblas/library/ you should see a lot of files, but only for some GPUs. For the others you need to find something that is close enough, like gfx1030 instead of gfx1033, and that then becomes 10.3.0 for the environment variable.
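A quick way to see which architectures your rocBLAS build actually ships kernels for, and to pick the override from that (the grep pattern is only illustrative):

ls /opt/rocm/lib/rocblas/library/ | grep -o 'gfx[0-9a-f]*' | sort -u
# e.g. gfx1033 is not in the list, so use the closest match gfx1030 -> 10.3.0:
export HSA_OVERRIDE_GFX_VERSION=10.3.0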


If you have multiple AMD devices:

If you have both a GPU and an APU, it may try to use the wrong device. There is an environment variable you can set to control the selected device:

export HIP_VISIBLE_DEVICES=0
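To find out which index belongs to which device, something like rocminfo can help (it also lists CPU agents, so count only the GPUs when picking the HIP index):

rocminfo | grep -E 'Agent [0-9]+|Marketing Name'
export HIP_VISIBLE_DEVICES=0   # keep only the first GPU visible to HIP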

ROCm port

I just define all the cudaXxx functions to hipXxx etc. This may seem stupidly simple, but it's exactly the same kind of trick AMD uses to make HIP code compile with nvcc; you can see it in /opt/rocm/include/hip/nvidia_detail/nvidia_hip_runtime_api.h (for some reason I can't find the source for this anywhere online, but it has a free license, so if you want, I can post it).

HIP can also compile the Cuda kernel programs without any major modifications, just some header stuff.

Compiling

To do this, you need the ROCm developer kit and hipBLAS, which may be a separate package.

With CMake I have to invoke:

CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ cmake -DLLAMA_HIPBLAS=ON

It is probably unavoidable to use the LLVM Clang compiler. You can use the ROCm-included one or the system one, but mixing it with GCC objects is just asking for trouble.
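For completeness, a sketch of a full out-of-tree configure and build with those compilers (cmake -B and cmake --build -j need a reasonably recent CMake):

CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ \
    cmake -B build -DLLAMA_HIPBLAS=ON
cmake --build build -j 4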

The Makefile should work, too; pass in LLAMA_HIPBLAS=1. You can use the env variable ROCM_PATH if ROCm is not installed at /opt/rocm:

make -j4 LLAMA_HIPBLAS=1

The Makefile will override the compilers to ROCm's LLVM, so it should compile with a single command. But you should also be able to override the compilers on the make command line.
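Since variables given on the make command line take precedence over assignments inside the Makefile, such an override could look like this (illustrative, e.g. to try the system clang instead of ROCm's):

make -j4 LLAMA_HIPBLAS=1 CC=clang CXX=clang++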

Docker

Probably the best option right now is using Docker with AMD's images:

FROM rocm/dev-ubuntu-22.04:5.5-complete AS build

WORKDIR /app

COPY . ./

RUN make LLAMA_HIPBLAS=1

ENV PATH="/app:$PATH"
CMD [ "main" ]

Save it somewhere as rocm.Dockerfile then in llama.cpp's source do:

docker build -f /path/to/rocm.Dockerfile . -t llama.cpp:rocm

Then run it like this:

docker run --rm -it --init \
    --device /dev/dri --device /dev/kfd \
    -v/my/models:/models llama.cpp:rocm \
    main -m /models/llama-7b-q4_2.bin -p "$(cat prompts/dan.txt)"

You can also add the overrides like this: -e HSA_OVERRIDE_GFX_VERSION=10.3.0 and -e HIP_VISIBLE_DEVICES=0 as needed. There may also be some other security flags needed on some distros, plus whatever permissions your user needs for the devices (usually the video group).
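Putting it together, a run with the overrides and the group permission might look like this (the model path and prompt are just placeholders):

docker run --rm -it --init \
    --device /dev/dri --device /dev/kfd \
    --group-add video \
    -e HSA_OVERRIDE_GFX_VERSION=10.3.0 -e HIP_VISIBLE_DEVICES=0 \
    -v/my/models:/models llama.cpp:rocm \
    main -m /models/llama-7b-q4_2.bin -p "Hello"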

Using nerdctl, I had to add the DRI devices separately (--device /dev/dri/card0 --device /dev/dri/renderD128 rather than the /dev/dri directory as in Docker). It also works, but beware that on some BuildKit setups it will load the whole image via tarballs, and since it's several gigabytes it will take some time to build.

All the commands are in there besides main; you can also run /bin/bash for a dev shell, mount the llama.cpp source somewhere, and use it for development. It is a bit of a thick image, maybe too big for end users; I want to trim it down, but the AMD stuff is bloated.


What's up with the compilers?

Regarding hipcc: it is not really a compiler. I had a lot of problems with it; it couldn't compile and link .cpp and .o files together (like hipcc main.cpp llama.o ggml.o ...). If you open it in a text editor you see it's a Perl script, and all it does is provide some default flags for the Clang compiler. It might work in CMake, since CMake always compiles to objects first.
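If you want to verify that yourself (on the ROCm releases discussed here; newer releases replace the script with a binary), a one-liner is enough:

head -n1 /opt/rocm/bin/hipcc   # prints a perl shebang on these releases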

It shouldn't be a requirement to use AMD's version of Clang; it should be possible to use any normal Clang or LLVM (maybe even Zig?) to compile the device code. In the CMake build I added a warning if the compiler is not Clang, but it won't stop you from experimenting (well, it will probably fail to compile the .cu file).

If you use VS Code, the C/C++ plugin doesn't support HIP correctly: it sees in compile_commands.json (part of CMake's output) that the .cu file uses the language argument -x hip, and since it doesn't know what that is, the whole file is locked to the C language even though it's actually C++, and you'll see some red squiggles. This flag comes from the hip::device package in CMake.

In CMake it is harder than in Make to use different compilers in the same project (it may need a subdirectory), so currently the .cu file is handled as a C++ file and compiled with the rest of the C++ files. This is AMD's vision for HIP -- they should just be normal C++ files.

I also tried adding another language to CMake with enable_language(HIP), but I had some trouble getting CMake to configure consistently in all environments; maybe it needs some package that was missing in the container. In that case it would work more like Cuda: I can declare the .cu file's language to be HIP, whatever compiler is configured for HIP compiles it, and a compiler that can link it correctly links it into an executable. When it was working on Arch, it configured itself automatically as CMAKE_CXX_COMPILER=/usr/bin/g++ and CMAKE_HIP_COMPILER=/usr/bin/clang++ and worked correctly, using the HIP compiler to link in the end. This would be the ideal solution, giving the user the most control over the config -- if I got it to work, that is 😜. If someone more experienced with this knows how to do it, please go ahead.
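For illustration, configuring that enable_language(HIP) setup explicitly might look like the sketch below. The PR as it stands compiles the .cu file as C++ instead, so these cache variables are not what the current build uses, and CMAKE_HIP_COMPILER needs a fairly new CMake:

cmake -B build -DLLAMA_HIPBLAS=ON \
      -DCMAKE_CXX_COMPILER=g++ \
      -DCMAKE_HIP_COMPILER=/usr/bin/clang++
cmake --build build -j 4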

For the Makefile I thought it would be easier to override the compilers, because it is supposed to be more beginner-friendly and you can get a result in one command (that is, if everything is installed properly). But it has some overridable variables as well.

@ggerganov
Owner

What does hipBLAS do?

@SlyEcho
Sponsor Collaborator Author

SlyEcho commented Apr 20, 2023

hipBLAS is basically just a wrapper around rocBLAS or cuBLAS. Well, all of HIP is supposed to be.

@SlyEcho SlyEcho mentioned this pull request Apr 20, 2023
Now HIP Clang is not required; the CMake scripts will configure the
needed compiler, which can be the system clang++. Other code can
still use GCC, but CMake will force clang to link.
@slaren
Collaborator

slaren commented Apr 20, 2023

I have started moving all the CUDA-specific stuff to ggml-cuda.h/cu in #1094; you could also move all the HIP stuff to ggml-cuda.h to keep ggml.c a bit cleaner. If this works well, it could be a nice way to support AMD GPUs. Do you have any performance numbers?

@SlyEcho
Sponsor Collaborator Author

SlyEcho commented Apr 20, 2023

I'll try to rebase on your code. As for perf, it's about 38 ms for 7B; the GPU is a Vega 64.
What's the best way to do a measurement?

@slaren
Collaborator

slaren commented Apr 20, 2023

Either the perplexity time per pass or the prompt eval times with a big prompt seem good enough to measure performance; that's what I have been doing anyway. Use --no-mmap to make sure that there isn't any loading happening in the first eval.
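For example, a run like the ones posted further down in this thread measures exactly that (the paths assume the wikitext file sits in ./models):

./build/bin/perplexity --no-mmap -m ./models/llama-7b-q4_0.bin -f ./models/wiki.test.raw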

@SlyEcho
Sponsor Collaborator Author

SlyEcho commented Apr 20, 2023

7b-q4_0:  15.21 seconds per pass - ETA 2.77 hours
7b-f16:   16.30 seconds per pass - ETA 2.97 hours
13b-q4_0: 19.60 seconds per pass - ETA 3.57 hours
30b-q4_0: 29.70 seconds per pass - ETA 5.40 hours

The GPU is used at about 30%, with about 2 GB of VRAM.

@SlyEcho
Sponsor Collaborator Author

SlyEcho commented Apr 21, 2023

I'm now building it in AMD's official Docker image and it is giving me double the performance... 🤯

7b-q4_0: 5.84 seconds per pass - ETA 1.06 hours
7b-f16: 6.47 seconds per pass - ETA 1.18 hours
13b-q4_0: 9.89 seconds per pass - ETA 1.80 hours
30b-q4_0: 20.40 seconds per pass - ETA 3.71 hours

@SlyEcho
Sponsor Collaborator Author

SlyEcho commented Apr 21, 2023

This is the rocprof trace from the Docker image:
docker prof

And this one from the Arch:
arch prof

It just seems faster because it loads the BLAS libraries faster.

@SlyEcho
Sponsor Collaborator Author

SlyEcho commented Apr 21, 2023

@slaren can you check with Cuda? Currently --memory_f32 is broken for me.

@slaren
Collaborator

slaren commented Apr 21, 2023

--memory_f32 seems to work fine for me with Cuda; I couldn't notice any issues.

@FNsi
Contributor

FNsi commented Apr 22, 2023

Thank you for the great work.

Currently perplexity is not working in this PR.
Running perplexity, it gets stuck after showing the 655 chunks, batch_size=512.
The GPU is still working. Let me try waiting longer for it...

@SlyEcho it works now; sorry, I didn't get it working earlier because I hadn't deleted all the other flags.

Llama 30B Q4_2
F32:
ETA 9h28m
[1] 3.2521, [2] 3.6665, [3] 4.3870, [4] 4.3477, [5] 4.2213
Without the F32 flag:
ETA 8h47m
[1] 3.2520, [2] 3.6665, [3] 4.3869, [4] 4.3476, [5] 4.2213, [6] 4.2205, [7] 4.4011, [8] 4.4856, [9] 4.7332, [10] 4.9523, [11] 5.1126, [12] 5.1601, [13] 5.1378, [14] 5.2206, [15] 5.3794 ... [100] 4.3098, and I decided to abort it 😅
So far, compared with the 30B Q4_1 result in the discussion post, it stays accurate and performs better.
30B Q4_1 result by Jason Titus

And my PC's test run with the ROCm suite 5.4.2 is below:
30B llama Q4_2, running with DAN

Master 50cb666, OpenBLAS:
real 1m29.206
user 14m47.035
sys 6m13.047

Master 50cb666 with your ggml.c and ggml-cuda.cu, hipBLAS:
real 0m57.723
user 7m23.156
sys 0m3.356

Meanwhile, maybe it's better to mention that CXX also needs to be changed to hipcc.

Peak VRAM usage is about 1.4 GB, and while running perplexity it is about 2 GB.

@FNsi
Contributor

FNsi commented Apr 22, 2023

@slaren can you check with Cuda? Currently --memory_f32 is broken for me.

--memory_f32 is working for me with gfx1035 (HSA override gfx1030), i.e. the integrated Radeon 680M GPU.

More detail: I didn't set CXX=clang, but CXX=hipcc. Maybe that's the reason?

@SlyEcho
Sponsor Collaborator Author

SlyEcho commented Apr 23, 2023

I think the issue with --memory_f32 is resolved, for me at least. I will try to do a perplexity run.

@SlyEcho
Sponsor Collaborator Author

SlyEcho commented Apr 24, 2023

Bonus picture: running on a Steam Deck with SteamOS. I have installed containerd so I don't have to install any ROCm stuff.

Steam Deck ROCm llama cpp

To achieve this, the env var HSA_OVERRIDE_GFX_VERSION=10.3.0 was used, and llama.cpp was built with GPU_TARGETS=gfx1030 because the native gfx1033 is not supported by rocBLAS yet. I'm sure a properly tuned rocBLAS build could be faster. Note that I have changed the GPU VRAM split in the BIOS to 4 GB.

hipBLAS eval (plugged in 🔌): 49 ms per token.
CPU eval (🔋🔌): 118 ms per token.
OpenBLAS eval (🔋🔌): 84 ms per token.

@DGdev91

DGdev91 commented Apr 24, 2023

I was trying to make it work on HIP too (here is my fork: https://github.com/DGdev91/llama.cpp), but I wasn't able to; it got stuck after showing the "llama_model_load_internal" rows.
I have the same problem with this code, so I guess the issue wasn't in the code but in my own setup.
Also, this solution is indeed much cleaner than mine, so let's just work on this PR.
My GPU is an RX 5700 XT, and I use HSA_OVERRIDE_GFX_VERSION=10.3.0 too; it's a common workaround for PyTorch-related programs as well, like Stable Diffusion.

Any idea how I can try to figure out what is going on?

@SlyEcho
Sponsor Collaborator Author

SlyEcho commented Apr 24, 2023

@DGdev91 that means it is crashing when trying to initialize HIP or hipBLAS.

What compiler did you use? The hipcc Perl script is probably legacy and the integrated LLVM is the way to go; also, the program should be linked with it and not GCC.

What is the GPU target that you used? It should be --offload-arch=gfx1030 if you want to use that.

The CMake file seems to be just broken.

EDIT: I forgot to mention, but when I managed to compile your code, it was running fine on the GPU 😃

@DGdev91

DGdev91 commented Apr 24, 2023

@DGdev91 that means it is crashing when trying to initialize HIP or hipBLAS.

What compiler did you use? The hipcc Perl script is probably legacy and the integrated LLVM is the way to go; also, the program should be linked with it and not GCC.

What is the GPU target that you used? It should be --offload-arch=gfx1030 if you want to use that.

The CMake file seems to be just broken.

EDIT: I forgot to mention, but when I managed to compile your code, it was running fine on the GPU 😃

You are right, but forget my fork, it was just an experiment. I already said I prefer your solution, and I had the same exact issue even there.
If my code worked for you (after correcting the Makefile), we have another confirmation that it's an issue on my end.
What is really weird is that it works just fine with Stable Diffusion.

@SlyEcho
Sponsor Collaborator Author

SlyEcho commented Apr 24, 2023

I suspect it has something to do with the GPU architecture that is being built. My Makefile changes will detect the GPU of your system but that may not work if you're overriding it on the command line. On the Steam Deck I had to build it for one specific one (gfx1030) because that's the one rocBLAS supports.

This is something that should happen automatically and not be on the user to fix. I need to figure it out.
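If your ROCm LLVM ships the amdgpu-arch tool, you can check what would be detected and pin the target explicitly if it is wrong (illustrative; the exact detection in the Makefile may differ):

/opt/rocm/llvm/bin/amdgpu-arch   # prints e.g. gfx1030
make -j4 LLAMA_HIPBLAS=1 GPU_TARGETS=gfx1030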

@DGdev91

DGdev91 commented Apr 24, 2023

I suspect it has something to do with the GPU architecture that is being built. My Makefile changes will detect the GPU of your system but that may not work if you're overriding it on the command line. On the Steam Deck I had to build it for one specific one (gfx1030) because that's the one rocBLAS supports.

This is something that should happen automatically and not be on the user to fix. I need to figure it out.

I compiled it with make LLAMA_HIPBLAS=1 GPU_TARGETS=gfx1030 and ran export HSA_OVERRIDE_GFX_VERSION=10.3.0 before launching main. There must be something else going on.

@SlyEcho
Sponsor Collaborator Author

SlyEcho commented Apr 25, 2023

Perplexity Testing for hipBLAS version

Code

Commit: 3a004b2a0166e412d8d54052c50bfd093611ad95

Models

I should mention that the Q4_0 models were converted some time ago, so I don't know if they are "fresh" with the latest quantization fixes.
The other ones I made recently from the F16 version.

find models -name 'llama-7b-*.bin' -exec sem -j4 shasum {} ';'
8c5fe788ceaf8077e505f8f43efaa8f8cfd6e3eb  models/llama-7b-q4_0.bin
80a9d0bdf85dcddc83533a3aecf70eb9c542fdfa  models/llama-7b-q4_2.bin
da9ebf470350d8912caa04bf54fc6aced8d9ef19  models/llama-7b-q4_1.bin
1cbe22cfd2600f4e3b2d247ed1b82504cde3be78  models/llama-7b-q4_3.bin
0512fdf961215612db5a47cb1f6539c55936523c  models/llama-7b-f16.bin

Hardware

CPU: Intel Core i7 7700K (4c/8t), 4.7 GHz (OC)
RAM: 32 GB DDR4, 2666 MT/s
GPU: AMD Radeon Vega64 (8GB)

Arch Linux testing with:

OS: Arch Linux 6.2.11-arch1-1
BLAS: OpenBLAS 0.3.23-1
ROCm: 5.4.3

AMD official Docker with this Dockerfile:

rocm.Dockerfile
FROM rocm/dev-ubuntu-22.04
ARG GPU_TARGETS="gfx900"
ARG MAKE_JOBS=4

RUN apt-get update && \
    apt-get --no-install-recommends install -y hipblas-dev

WORKDIR /app

COPY . ./

RUN make \
    LLAMA_HIPBLAS=1 \
    GPU_TARGETS="$GPU_TARGETS" \
    -j $MAKE_JOBS \
    main perplexity

STOPSIGNAL SIGKILL
ENV PATH="/app:$PATH"
CMD [ "main" ]

Compile with:

docker build -f ~/Desktop/rocm.Dockerfile . -t llama.cpp:rocm
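The GPU_TARGETS build arg declared at the top of the Dockerfile (default gfx900) can be overridden per card, for example:

docker build -f ~/Desktop/rocm.Dockerfile . -t llama.cpp:rocm --build-arg GPU_TARGETS=gfx1030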

Results

7B Q4_0, Arch: [655]6.2818
./build/bin/perplexity --no-mmap -m ./models/llama-7b-q4_0.bin -f ./models/wiki.test.raw

main: seed = 1682276609
llama.cpp: loading model from ./models/llama-7b-q4_0.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required  = 5809.32 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 8 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
15.40 seconds per pass - ETA 2 hours 48 minutes
[1]4.3749,[2]4.9540,[3]5.8254,[4]6.4669,[5]6.5409,[6]6.5395,[7]6.7155,[8]6.8046,[9]7.1737,[10]7.4103,[11]7.6549,[12]7.6926,[13]7.6022,[14]7.6783,[15]7.9331,[16]7.5386,[17]7.4157,[18]7.3768,[19]7.0052,[20]6.9921,[21]6.8947,[22]6.7102,[23]6.6723,[24]6.5850,[25]6.5848,[26]6.4125,[27]6.2326,[28]6.1317,[29]6.0477,[30]5.8916,[31]5.8634,[32]5.8812,[33]5.8164,[34]5.8511,[35]5.8769,[36]5.9208,[37]5.9247,[38]5.9419,[39]5.9800,[40]6.0387,[41]6.0458,[42]6.0802,[43]6.0373,[44]6.0921,[45]6.0965,[46]6.0707,[47]6.0944,[48]6.0652,[49]6.0722,[50]6.0328,[51]6.0287,[52]6.0177,[53]6.0619,[54]6.0454,[55]6.0230,[56]6.0572,[57]6.0803,[58]6.1021,[59]6.1159,[60]6.1624,[61]6.1512,[62]6.2143,[63]6.2479,[64]6.2630,[65]6.3095,[66]6.3197,[67]6.3378,[68]6.3518,[69]6.3767,[70]6.4090,[71]6.4305,[72]6.4602,[73]6.5254,[74]6.5308,[75]6.5453,[76]6.5616,[77]6.5749,[78]6.5597,[79]6.5892,[80]6.5817,[81]6.5943,[82]6.5980,[83]6.5443,[84]6.5297,[85]6.5182,[86]6.4971,[87]6.4318,[88]6.4033,[89]6.3827,[90]6.3661,[91]6.3922,[92]6.3884,[93]6.3909,[94]6.3884,[95]6.4171,[96]6.4150,[97]6.4078,[98]6.4007,[99]6.3867,[100]6.3867,[101]6.4126,[102]6.4062,[103]6.4280,[104]6.4347,[105]6.4333,[106]6.4510,[107]6.4497,[108]6.4621,[109]6.4568,[110]6.4523,[111]6.4751,[112]6.4941,[113]6.4955,[114]6.4921,[115]6.5003,[116]6.4930,[117]6.4985,[118]6.5270,[119]6.5479,[120]6.5844,[121]6.6007,[122]6.6254,[123]6.6644,[124]6.6822,[125]6.6735,[126]6.7126,[127]6.7497,[128]6.7772,[129]6.7603,[130]6.7699,[131]6.7646,[132]6.7558,[133]6.7430,[134]6.7542,[135]6.7507,[136]6.7376,[137]6.7296,[138]6.7125,[139]6.7009,[140]6.6979,[141]6.6681,[142]6.6633,[143]6.6354,[144]6.6153,[145]6.6066,[146]6.5931,[147]6.6006,[148]6.6029,[149]6.5969,[150]6.5928,[151]6.5940,[152]6.5845,[153]6.5678,[154]6.5587,[155]6.5655,[156]6.5605,[157]6.5789,[158]6.5824,[159]6.5866,[160]6.5892,[161]6.6017,[162]6.5716,[163]6.5595,[164]6.5334,[165]6.5015,[166]6.4728,[167]6.4354,[168]6.4028,[169]6.3893,[170]6.3768,[171]6.3479,[172]6.3298,[173]6.3113,[174]6.2805,[175]6.2584,[176]6.2482,[177]6.2271,[178]6.2036,[179]6.1864,[180]6.1775,[181]6.1551,[182]6.1359,[183]6.1217,[184]6.1215,[185]6.1142,[186]6.1160,[187]6.1214,[188]6.1178,[189]6.1362,[190]6.1371,[191]6.1575,[192]6.1738,[193]6.1916,[194]6.2032,[195]6.2242,[196]6.2412,[197]6.2633,[198]6.2788,[199]6.2818,[200]6.2863,[201]6.2822,[202]6.3027,[203]6.3093,[204]6.3092,[205]6.3201,[206]6.3279,[207]6.3239,[208]6.3323,[209]6.3375,[210]6.3426,[211]6.3524,[212]6.3598,[213]6.3704,[214]6.3739,[215]6.3780,[216]6.3927,[217]6.4106,[218]6.4241,[219]6.4244,[220]6.4208,[221]6.4145,[222]6.4110,[223]6.4002,[224]6.3935,[225]6.3888,[226]6.4103,[227]6.4191,[228]6.4249,[229]6.4317,[230]6.4273,[231]6.4441,[232]6.4311,[233]6.4140,[234]6.3983,[235]6.3825,[236]6.3747,[237]6.3644,[238]6.3678,[239]6.3516,[240]6.3413,[241]6.3446,[242]6.3483,[243]6.3468,[244]6.3348,[245]6.3322,[246]6.3201,[247]6.3077,[248]6.3010,[249]6.2989,[250]6.3037,[251]6.2960,[252]6.2927,[253]6.2824,[254]6.2784,[255]6.2668,[256]6.2477,[257]6.2366,[258]6.2279,[259]6.2259,[260]6.2178,[261]6.2135,[262]6.2076,[263]6.2030,[264]6.1838,[265]6.1829,[266]6.1814,[267]6.1745,[268]6.1842,[269]6.1822,[270]6.1828,[271]6.1906,[272]6.1952,[273]6.1948,[274]6.1962,[275]6.2052,[276]6.2107,[277]6.2267,[278]6.2375,[279]6.2461,[280]6.2497,[281]6.2596,[282]6.2656,[283]6.2803,[284]6.2881,[285]6.2975,[286]6.3122,[287]6.3116,[288]6.3176,[289]6.3085,[290]6.2934,[291]6.2780,[292]6.2622,[293]6.2484,[294]6.2509,[295]6.2503,[296]6.2547,[297]6.2533,[298]6.2559,[299]6.2531,[300]6.2418,[301]6.2419,[302]6.2339,[303]6.2262,[304]6.2184,[305]6.2159,[30
6]6.2027,[307]6.2051,[308]6.2084,[309]6.1921,[310]6.1860,[311]6.1796,[312]6.1818,[313]6.1762,[314]6.1749,[315]6.1584,[316]6.1541,[317]6.1375,[318]6.1159,[319]6.1278,[320]6.1408,[321]6.1446,[322]6.1401,[323]6.1335,[324]6.1310,[325]6.1410,[326]6.1410,[327]6.1431,[328]6.1473,[329]6.1533,[330]6.1559,[331]6.1682,[332]6.1651,[333]6.1720,[334]6.1662,[335]6.1597,[336]6.1635,[337]6.1605,[338]6.1592,[339]6.1534,[340]6.1491,[341]6.1568,[342]6.1593,[343]6.1648,[344]6.1648,[345]6.1647,[346]6.1619,[347]6.1666,[348]6.1708,[349]6.1726,[350]6.1692,[351]6.1698,[352]6.1698,[353]6.1646,[354]6.1644,[355]6.1699,[356]6.1729,[357]6.1693,[358]6.1783,[359]6.1814,[360]6.1777,[361]6.1772,[362]6.1839,[363]6.1951,[364]6.2016,[365]6.2074,[366]6.2081,[367]6.2169,[368]6.2147,[369]6.2156,[370]6.2166,[371]6.2106,[372]6.2159,[373]6.2215,[374]6.2202,[375]6.2198,[376]6.2282,[377]6.2233,[378]6.2259,[379]6.2319,[380]6.2235,[381]6.2192,[382]6.2135,[383]6.2125,[384]6.2118,[385]6.2105,[386]6.2100,[387]6.2092,[388]6.2047,[389]6.1993,[390]6.1924,[391]6.1843,[392]6.1803,[393]6.1784,[394]6.1810,[395]6.1793,[396]6.1720,[397]6.1795,[398]6.1833,[399]6.1916,[400]6.1912,[401]6.1926,[402]6.1932,[403]6.1950,[404]6.2014,[405]6.1918,[406]6.1884,[407]6.1877,[408]6.1887,[409]6.2011,[410]6.2121,[411]6.2246,[412]6.2408,[413]6.2524,[414]6.2599,[415]6.2652,[416]6.2732,[417]6.2863,[418]6.2897,[419]6.2971,[420]6.3058,[421]6.3179,[422]6.3236,[423]6.3308,[424]6.3428,[425]6.3519,[426]6.3583,[427]6.3628,[428]6.3711,[429]6.3756,[430]6.3846,[431]6.3992,[432]6.4035,[433]6.4022,[434]6.3976,[435]6.3983,[436]6.4008,[437]6.4102,[438]6.4181,[439]6.4145,[440]6.4140,[441]6.4089,[442]6.4080,[443]6.4093,[444]6.4096,[445]6.4076,[446]6.4100,[447]6.4129,[448]6.4172,[449]6.4145,[450]6.4148,[451]6.4106,[452]6.3987,[453]6.3904,[454]6.3843,[455]6.3851,[456]6.3898,[457]6.3915,[458]6.3894,[459]6.3903,[460]6.3990,[461]6.3962,[462]6.3946,[463]6.3997,[464]6.3988,[465]6.3957,[466]6.3877,[467]6.3879,[468]6.3878,[469]6.3900,[470]6.3905,[471]6.3858,[472]6.3904,[473]6.3848,[474]6.3861,[475]6.3802,[476]6.3826,[477]6.3754,[478]6.3745,[479]6.3808,[480]6.3860,[481]6.3880,[482]6.3834,[483]6.3793,[484]6.3815,[485]6.3798,[486]6.3743,[487]6.3743,[488]6.3724,[489]6.3674,[490]6.3647,[491]6.3616,[492]6.3558,[493]6.3528,[494]6.3510,[495]6.3507,[496]6.3473,[497]6.3419,[498]6.3402,[499]6.3351,[500]6.3255,[501]6.3185,[502]6.3184,[503]6.3181,[504]6.3088,[505]6.3113,[506]6.3122,[507]6.3060,[508]6.3018,[509]6.3007,[510]6.3046,[511]6.3092,[512]6.3127,[513]6.3145,[514]6.3212,[515]6.3156,[516]6.3149,[517]6.3159,[518]6.3160,[519]6.3190,[520]6.3218,[521]6.3234,[522]6.3263,[523]6.3273,[524]6.3336,[525]6.3373,[526]6.3385,[527]6.3405,[528]6.3351,[529]6.3355,[530]6.3308,[531]6.3298,[532]6.3347,[533]6.3370,[534]6.3351,[535]6.3374,[536]6.3320,[537]6.3297,[538]6.3345,[539]6.3357,[540]6.3397,[541]6.3405,[542]6.3412,[543]6.3426,[544]6.3438,[545]6.3417,[546]6.3423,[547]6.3378,[548]6.3323,[549]6.3325,[550]6.3298,[551]6.3260,[552]6.3239,[553]6.3197,[554]6.3175,[555]6.3146,[556]6.3143,[557]6.3166,[558]6.3126,[559]6.3122,[560]6.3117,[561]6.3118,[562]6.3100,[563]6.3100,[564]6.3143,[565]6.3160,[566]6.3157,[567]6.3135,[568]6.3140,[569]6.3124,[570]6.3150,[571]6.3156,[572]6.3166,[573]6.3168,[574]6.3132,[575]6.3127,[576]6.3126,[577]6.3115,[578]6.3095,[579]6.3103,[580]6.3037,[581]6.2999,[582]6.2989,[583]6.2997,[584]6.3001,[585]6.2924,[586]6.2856,[587]6.2858,[588]6.2908,[589]6.2966,[590]6.2996,[591]6.3018,[592]6.3003,[593]6.2966,[594]6.2977,[595]6.2954,[596]6.2991,[597]6.2967,[598]6.2930,[599]6.2952,[600]6.2949,[601]6.2935,[602]6
.2952,[603]6.2982,[604]6.2992,[605]6.3025,[606]6.3045,[607]6.3029,[608]6.2993,[609]6.3000,[610]6.3036,[611]6.3018,[612]6.3043,[613]6.3007,[614]6.2956,[615]6.2879,[616]6.2909,[617]6.2846,[618]6.2794,[619]6.2738,[620]6.2595,[621]6.2523,[622]6.2506,[623]6.2521,[624]6.2525,[625]6.2525,[626]6.2510,[627]6.2531,[628]6.2536,[629]6.2533,[630]6.2567,[631]6.2631,[632]6.2685,[633]6.2668,[634]6.2701,[635]6.2707,[636]6.2674,[637]6.2640,[638]6.2666,[639]6.2637,[640]6.2647,[641]6.2650,[642]6.2718,[643]6.2740,[644]6.2752,[645]6.2731,[646]6.2774,[647]6.2735,[648]6.2742,[649]6.2743,[650]6.2782,[651]6.2839,[652]6.2846,[653]6.2889,[654]6.2825,[655]6.2818,

llama_print_timings:        load time = 16830.86 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 3533301.29 ms / 335360 tokens (   10.54 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 3572772.20 ms
7B Q4_0 --memory_f32, Arch: [655]6.2838,
./build/bin/perplexity --no-mmap -m ./models/llama-7b-q4_0.bin --memory_f32 -f ./models/wiki.test.raw

main: seed = 1682280920
llama.cpp: loading model from ./models/llama-7b-q4_0.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required  = 5809.32 MB (+ 2052.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size  =  512.00 MB

system_info: n_threads = 8 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
15.63 seconds per pass - ETA 2 hours 50 minutes
[1]4.3801,[2]4.9556,[3]5.8270,[4]6.4693,[5]6.5437,[6]6.5414,[7]6.7176,[8]6.8070,[9]7.1757,[10]7.4122,[11]7.6567,[12]7.6957,[13]7.6057,[14]7.6821,[15]7.9367,[16]7.5419,[17]7.4189,[18]7.3798,[19]7.0077,[20]6.9948,[21]6.8969,[22]6.7124,[23]6.6743,[24]6.5868,[25]6.5871,[26]6.4149,[27]6.2349,[28]6.1341,[29]6.0498,[30]5.8938,[31]5.8659,[32]5.8839,[33]5.8189,[34]5.8538,[35]5.8795,[36]5.9233,[37]5.9272,[38]5.9444,[39]5.9825,[40]6.0413,[41]6.0483,[42]6.0827,[43]6.0398,[44]6.0944,[45]6.0989,[46]6.0730,[47]6.0968,[48]6.0675,[49]6.0746,[50]6.0352,[51]6.0310,[52]6.0201,[53]6.0642,[54]6.0477,[55]6.0251,[56]6.0595,[57]6.0825,[58]6.1044,[59]6.1183,[60]6.1648,[61]6.1536,[62]6.2166,[63]6.2503,[64]6.2654,[65]6.3119,[66]6.3221,[67]6.3402,[68]6.3541,[69]6.3791,[70]6.4114,[71]6.4328,[72]6.4626,[73]6.5277,[74]6.5331,[75]6.5475,[76]6.5638,[77]6.5771,[78]6.5619,[79]6.5915,[80]6.5839,[81]6.5968,[82]6.6005,[83]6.5468,[84]6.5323,[85]6.5209,[86]6.4997,[87]6.4344,[88]6.4059,[89]6.3854,[90]6.3688,[91]6.3949,[92]6.3910,[93]6.3936,[94]6.3911,[95]6.4198,[96]6.4178,[97]6.4106,[98]6.4036,[99]6.3896,[100]6.3896,[101]6.4155,[102]6.4091,[103]6.4309,[104]6.4376,[105]6.4362,[106]6.4539,[107]6.4526,[108]6.4649,[109]6.4596,[110]6.4551,[111]6.4779,[112]6.4970,[113]6.4984,[114]6.4950,[115]6.5033,[116]6.4959,[117]6.5014,[118]6.5299,[119]6.5508,[120]6.5872,[121]6.6035,[122]6.6283,[123]6.6673,[124]6.6850,[125]6.6763,[126]6.7154,[127]6.7524,[128]6.7799,[129]6.7630,[130]6.7725,[131]6.7673,[132]6.7584,[133]6.7457,[134]6.7568,[135]6.7534,[136]6.7402,[137]6.7322,[138]6.7151,[139]6.7035,[140]6.7005,[141]6.6707,[142]6.6659,[143]6.6379,[144]6.6178,[145]6.6092,[146]6.5957,[147]6.6031,[148]6.6054,[149]6.5994,[150]6.5953,[151]6.5965,[152]6.5870,[153]6.5703,[154]6.5613,[155]6.5680,[156]6.5630,[157]6.5813,[158]6.5849,[159]6.5890,[160]6.5916,[161]6.6041,[162]6.5739,[163]6.5619,[164]6.5357,[165]6.5039,[166]6.4751,[167]6.4377,[168]6.4051,[169]6.3916,[170]6.3791,[171]6.3502,[172]6.3322,[173]6.3136,[174]6.2829,[175]6.2607,[176]6.2505,[177]6.2295,[178]6.2059,[179]6.1887,[180]6.1798,[181]6.1574,[182]6.1382,[183]6.1240,[184]6.1238,[185]6.1165,[186]6.1182,[187]6.1237,[188]6.1200,[189]6.1384,[190]6.1393,[191]6.1597,[192]6.1761,[193]6.1938,[194]6.2054,[195]6.2264,[196]6.2434,[197]6.2655,[198]6.2811,[199]6.2840,[200]6.2886,[201]6.2844,[202]6.3049,[203]6.3115,[204]6.3114,[205]6.3224,[206]6.3302,[207]6.3262,[208]6.3347,[209]6.3398,[210]6.3449,[211]6.3547,[212]6.3621,[213]6.3727,[214]6.3763,[215]6.3803,[216]6.3951,[217]6.4129,[218]6.4264,[219]6.4267,[220]6.4231,[221]6.4168,[222]6.4133,[223]6.4024,[224]6.3958,[225]6.3910,[226]6.4126,[227]6.4212,[228]6.4271,[229]6.4338,[230]6.4294,[231]6.4463,[232]6.4332,[233]6.4160,[234]6.4004,[235]6.3846,[236]6.3768,[237]6.3664,[238]6.3698,[239]6.3536,[240]6.3433,[241]6.3466,[242]6.3504,[243]6.3488,[244]6.3368,[245]6.3342,[246]6.3221,[247]6.3098,[248]6.3030,[249]6.3010,[250]6.3057,[251]6.2981,[252]6.2947,[253]6.2844,[254]6.2804,[255]6.2688,[256]6.2497,[257]6.2386,[258]6.2299,[259]6.2279,[260]6.2197,[261]6.2154,[262]6.2095,[263]6.2050,[264]6.1858,[265]6.1850,[266]6.1835,[267]6.1766,[268]6.1863,[269]6.1843,[270]6.1850,[271]6.1928,[272]6.1974,[273]6.1969,[274]6.1983,[275]6.2073,[276]6.2128,[277]6.2288,[278]6.2397,[279]6.2483,[280]6.2518,[281]6.2617,[282]6.2678,[283]6.2825,[284]6.2902,[285]6.2997,[286]6.3144,[287]6.3138,[288]6.3198,[289]6.3107,[290]6.2956,[291]6.2802,[292]6.2644,[293]6.2505,[294]6.2530,[295]6.2524,[296]6.2567,[297]6.2553,[298]6.2579,[299]6.2551,[300]6.2439,[301]6.2440,[302]6.2359,[303]6.2282,[304]6.2204,[305]6.2180,[30
6]6.2047,[307]6.2072,[308]6.2104,[309]6.1941,[310]6.1880,[311]6.1816,[312]6.1838,[313]6.1782,[314]6.1769,[315]6.1604,[316]6.1562,[317]6.1395,[318]6.1179,[319]6.1298,[320]6.1428,[321]6.1466,[322]6.1422,[323]6.1355,[324]6.1331,[325]6.1431,[326]6.1430,[327]6.1451,[328]6.1494,[329]6.1554,[330]6.1579,[331]6.1703,[332]6.1671,[333]6.1741,[334]6.1682,[335]6.1618,[336]6.1655,[337]6.1625,[338]6.1612,[339]6.1555,[340]6.1511,[341]6.1589,[342]6.1614,[343]6.1669,[344]6.1668,[345]6.1667,[346]6.1638,[347]6.1686,[348]6.1727,[349]6.1746,[350]6.1712,[351]6.1717,[352]6.1717,[353]6.1665,[354]6.1664,[355]6.1718,[356]6.1749,[357]6.1712,[358]6.1802,[359]6.1833,[360]6.1795,[361]6.1791,[362]6.1858,[363]6.1970,[364]6.2035,[365]6.2093,[366]6.2100,[367]6.2188,[368]6.2166,[369]6.2175,[370]6.2185,[371]6.2125,[372]6.2178,[373]6.2234,[374]6.2221,[375]6.2217,[376]6.2301,[377]6.2252,[378]6.2278,[379]6.2338,[380]6.2254,[381]6.2211,[382]6.2154,[383]6.2144,[384]6.2137,[385]6.2124,[386]6.2119,[387]6.2111,[388]6.2066,[389]6.2012,[390]6.1943,[391]6.1862,[392]6.1822,[393]6.1803,[394]6.1828,[395]6.1812,[396]6.1738,[397]6.1814,[398]6.1852,[399]6.1935,[400]6.1931,[401]6.1945,[402]6.1950,[403]6.1969,[404]6.2032,[405]6.1937,[406]6.1903,[407]6.1895,[408]6.1905,[409]6.2029,[410]6.2139,[411]6.2264,[412]6.2427,[413]6.2542,[414]6.2618,[415]6.2670,[416]6.2750,[417]6.2881,[418]6.2916,[419]6.2990,[420]6.3077,[421]6.3197,[422]6.3255,[423]6.3326,[424]6.3446,[425]6.3537,[426]6.3602,[427]6.3647,[428]6.3730,[429]6.3775,[430]6.3865,[431]6.4011,[432]6.4054,[433]6.4041,[434]6.3995,[435]6.4002,[436]6.4027,[437]6.4121,[438]6.4200,[439]6.4164,[440]6.4158,[441]6.4108,[442]6.4099,[443]6.4112,[444]6.4115,[445]6.4095,[446]6.4118,[447]6.4147,[448]6.4191,[449]6.4164,[450]6.4167,[451]6.4124,[452]6.4006,[453]6.3922,[454]6.3862,[455]6.3869,[456]6.3917,[457]6.3934,[458]6.3912,[459]6.3922,[460]6.4009,[461]6.3981,[462]6.3965,[463]6.4016,[464]6.4007,[465]6.3976,[466]6.3895,[467]6.3898,[468]6.3897,[469]6.3919,[470]6.3924,[471]6.3876,[472]6.3923,[473]6.3866,[474]6.3880,[475]6.3821,[476]6.3844,[477]6.3773,[478]6.3764,[479]6.3827,[480]6.3879,[481]6.3899,[482]6.3854,[483]6.3813,[484]6.3835,[485]6.3818,[486]6.3763,[487]6.3763,[488]6.3744,[489]6.3694,[490]6.3668,[491]6.3637,[492]6.3579,[493]6.3549,[494]6.3531,[495]6.3528,[496]6.3493,[497]6.3440,[498]6.3422,[499]6.3372,[500]6.3275,[501]6.3206,[502]6.3204,[503]6.3202,[504]6.3109,[505]6.3134,[506]6.3143,[507]6.3081,[508]6.3038,[509]6.3027,[510]6.3067,[511]6.3113,[512]6.3148,[513]6.3166,[514]6.3233,[515]6.3177,[516]6.3169,[517]6.3180,[518]6.3181,[519]6.3211,[520]6.3238,[521]6.3255,[522]6.3284,[523]6.3294,[524]6.3357,[525]6.3394,[526]6.3406,[527]6.3426,[528]6.3372,[529]6.3376,[530]6.3329,[531]6.3319,[532]6.3368,[533]6.3391,[534]6.3372,[535]6.3395,[536]6.3341,[537]6.3318,[538]6.3366,[539]6.3378,[540]6.3417,[541]6.3426,[542]6.3433,[543]6.3447,[544]6.3459,[545]6.3437,[546]6.3444,[547]6.3398,[548]6.3343,[549]6.3345,[550]6.3318,[551]6.3280,[552]6.3260,[553]6.3217,[554]6.3195,[555]6.3166,[556]6.3163,[557]6.3186,[558]6.3146,[559]6.3142,[560]6.3137,[561]6.3139,[562]6.3120,[563]6.3120,[564]6.3163,[565]6.3180,[566]6.3177,[567]6.3155,[568]6.3160,[569]6.3144,[570]6.3170,[571]6.3176,[572]6.3186,[573]6.3188,[574]6.3151,[575]6.3147,[576]6.3145,[577]6.3135,[578]6.3114,[579]6.3122,[580]6.3056,[581]6.3018,[582]6.3008,[583]6.3016,[584]6.3020,[585]6.2943,[586]6.2875,[587]6.2878,[588]6.2927,[589]6.2985,[590]6.3015,[591]6.3037,[592]6.3022,[593]6.2985,[594]6.2996,[595]6.2973,[596]6.3010,[597]6.2987,[598]6.2949,[599]6.2971,[600]6.2969,[601]6.2954,[602]6
.2971,[603]6.3001,[604]6.3012,[605]6.3044,[606]6.3065,[607]6.3048,[608]6.3013,[609]6.3019,[610]6.3056,[611]6.3037,[612]6.3062,[613]6.3026,[614]6.2975,[615]6.2898,[616]6.2928,[617]6.2865,[618]6.2814,[619]6.2757,[620]6.2615,[621]6.2542,[622]6.2525,[623]6.2540,[624]6.2545,[625]6.2544,[626]6.2529,[627]6.2550,[628]6.2555,[629]6.2552,[630]6.2586,[631]6.2650,[632]6.2704,[633]6.2687,[634]6.2720,[635]6.2726,[636]6.2694,[637]6.2659,[638]6.2686,[639]6.2657,[640]6.2666,[641]6.2669,[642]6.2738,[643]6.2759,[644]6.2772,[645]6.2750,[646]6.2793,[647]6.2755,[648]6.2761,[649]6.2762,[650]6.2801,[651]6.2858,[652]6.2865,[653]6.2908,[654]6.2844,[655]6.2838,

llama_print_timings:        load time = 17052.08 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 4082758.39 ms / 335360 tokens (   12.17 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 4118136.22 ms
7B Q4_0, Docker: [655]6.2819,
docker run -it --rm -v$PWD/models:/models --device /dev/dri --device /dev/kfd llama.cpp:rocm perplexity -m /models/llama-7b-q4_0.bin --no-mmap -f /models/wiki.test.raw

main: seed = 1682287852
llama.cpp: loading model from /models/llama-7b-q4_0.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required  = 5809.32 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 8 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
5.45 seconds per pass - ETA 59 minutes
[1]4.3749,[2]4.9542,[3]5.8256,[4]6.4671,[5]6.5411,[6]6.5396,[7]6.7157,[8]6.8048,[9]7.1739,[10]7.4104,[11]7.6551,[12]7.6928,[13]7.6023,[14]7.6784,[15]7.9332,[16]7.5387,[17]7.4157,[18]7.3769,[19]7.0053,[20]6.9922,[21]6.8948,[22]6.7103,[23]6.6723,[24]6.5850,[25]6.5849,[26]6.4126,[27]6.2327,[28]6.1319,[29]6.0478,[30]5.8917,[31]5.8635,[32]5.8813,[33]5.8165,[34]5.8513,[35]5.8770,[36]5.9209,[37]5.9248,[38]5.9420,[39]5.9801,[40]6.0388,[41]6.0459,[42]6.0803,[43]6.0375,[44]6.0922,[45]6.0966,[46]6.0707,[47]6.0945,[48]6.0653,[49]6.0723,[50]6.0329,[51]6.0287,[52]6.0178,[53]6.0619,[54]6.0455,[55]6.0231,[56]6.0573,[57]6.0803,[58]6.1021,[59]6.1160,[60]6.1625,[61]6.1512,[62]6.2143,[63]6.2479,[64]6.2631,[65]6.3095,[66]6.3198,[67]6.3379,[68]6.3518,[69]6.3767,[70]6.4090,[71]6.4305,[72]6.4603,[73]6.5255,[74]6.5309,[75]6.5453,[76]6.5617,[77]6.5750,[78]6.5597,[79]6.5892,[80]6.5817,[81]6.5943,[82]6.5980,[83]6.5443,[84]6.5297,[85]6.5182,[86]6.4971,[87]6.4318,[88]6.4033,[89]6.3828,[90]6.3662,[91]6.3922,[92]6.3884,[93]6.3909,[94]6.3885,[95]6.4172,[96]6.4151,[97]6.4078,[98]6.4007,[99]6.3868,[100]6.3867,[101]6.4126,[102]6.4063,[103]6.4280,[104]6.4347,[105]6.4333,[106]6.4510,[107]6.4497,[108]6.4621,[109]6.4568,[110]6.4523,[111]6.4752,[112]6.4942,[113]6.4955,[114]6.4921,[115]6.5004,[116]6.4931,[117]6.4986,[118]6.5271,[119]6.5480,[120]6.5844,[121]6.6008,[122]6.6255,[123]6.6645,[124]6.6823,[125]6.6736,[126]6.7127,[127]6.7497,[128]6.7772,[129]6.7604,[130]6.7699,[131]6.7647,[132]6.7558,[133]6.7431,[134]6.7543,[135]6.7508,[136]6.7377,[137]6.7297,[138]6.7126,[139]6.7009,[140]6.6979,[141]6.6682,[142]6.6633,[143]6.6354,[144]6.6154,[145]6.6067,[146]6.5931,[147]6.6007,[148]6.6030,[149]6.5969,[150]6.5928,[151]6.5940,[152]6.5846,[153]6.5678,[154]6.5587,[155]6.5655,[156]6.5605,[157]6.5789,[158]6.5824,[159]6.5866,[160]6.5893,[161]6.6017,[162]6.5716,[163]6.5596,[164]6.5334,[165]6.5016,[166]6.4728,[167]6.4355,[168]6.4028,[169]6.3893,[170]6.3768,[171]6.3479,[172]6.3299,[173]6.3113,[174]6.2806,[175]6.2585,[176]6.2483,[177]6.2272,[178]6.2037,[179]6.1865,[180]6.1776,[181]6.1552,[182]6.1360,[183]6.1217,[184]6.1216,[185]6.1143,[186]6.1160,[187]6.1215,[188]6.1178,[189]6.1363,[190]6.1371,[191]6.1575,[192]6.1739,[193]6.1916,[194]6.2033,[195]6.2243,[196]6.2413,[197]6.2633,[198]6.2789,[199]6.2818,[200]6.2864,[201]6.2822,[202]6.3027,[203]6.3093,[204]6.3092,[205]6.3201,[206]6.3279,[207]6.3239,[208]6.3324,[209]6.3376,[210]6.3426,[211]6.3524,[212]6.3598,[213]6.3704,[214]6.3740,[215]6.3780,[216]6.3928,[217]6.4107,[218]6.4241,[219]6.4244,[220]6.4209,[221]6.4146,[222]6.4110,[223]6.4002,[224]6.3936,[225]6.3888,[226]6.4104,[227]6.4191,[228]6.4250,[229]6.4317,[230]6.4273,[231]6.4442,[232]6.4311,[233]6.4140,[234]6.3984,[235]6.3825,[236]6.3748,[237]6.3644,[238]6.3678,[239]6.3516,[240]6.3413,[241]6.3446,[242]6.3484,[243]6.3468,[244]6.3349,[245]6.3323,[246]6.3201,[247]6.3078,[248]6.3010,[249]6.2990,[250]6.3037,[251]6.2961,[252]6.2927,[253]6.2825,[254]6.2785,[255]6.2669,[256]6.2477,[257]6.2367,[258]6.2280,[259]6.2260,[260]6.2178,[261]6.2135,[262]6.2076,[263]6.2031,[264]6.1838,[265]6.1830,[266]6.1815,[267]6.1745,[268]6.1842,[269]6.1822,[270]6.1829,[271]6.1907,[272]6.1953,[273]6.1948,[274]6.1963,[275]6.2052,[276]6.2107,[277]6.2268,[278]6.2376,[279]6.2462,[280]6.2497,[281]6.2596,[282]6.2657,[283]6.2804,[284]6.2882,[285]6.2976,[286]6.3123,[287]6.3117,[288]6.3177,[289]6.3086,[290]6.2935,[291]6.2781,[292]6.2623,[293]6.2485,[294]6.2509,[295]6.2504,[296]6.2547,[297]6.2533,[298]6.2559,[299]6.2531,[300]6.2419,[301]6.2420,[302]6.2339,[303]6.2262,[304]6.2184,[305]6.2160,[30
6]6.2028,[307]6.2052,[308]6.2084,[309]6.1921,[310]6.1860,[311]6.1796,[312]6.1819,[313]6.1763,[314]6.1750,[315]6.1585,[316]6.1542,[317]6.1375,[318]6.1159,[319]6.1278,[320]6.1409,[321]6.1447,[322]6.1402,[323]6.1335,[324]6.1311,[325]6.1411,[326]6.1410,[327]6.1432,[328]6.1474,[329]6.1534,[330]6.1559,[331]6.1683,[332]6.1651,[333]6.1721,[334]6.1662,[335]6.1598,[336]6.1635,[337]6.1605,[338]6.1592,[339]6.1535,[340]6.1492,[341]6.1569,[342]6.1594,[343]6.1649,[344]6.1648,[345]6.1648,[346]6.1619,[347]6.1666,[348]6.1708,[349]6.1727,[350]6.1693,[351]6.1698,[352]6.1698,[353]6.1646,[354]6.1644,[355]6.1699,[356]6.1730,[357]6.1693,[358]6.1784,[359]6.1815,[360]6.1777,[361]6.1773,[362]6.1839,[363]6.1951,[364]6.2016,[365]6.2075,[366]6.2082,[367]6.2169,[368]6.2147,[369]6.2156,[370]6.2167,[371]6.2107,[372]6.2159,[373]6.2216,[374]6.2202,[375]6.2199,[376]6.2283,[377]6.2234,[378]6.2259,[379]6.2320,[380]6.2235,[381]6.2193,[382]6.2135,[383]6.2125,[384]6.2118,[385]6.2106,[386]6.2100,[387]6.2092,[388]6.2047,[389]6.1993,[390]6.1924,[391]6.1844,[392]6.1803,[393]6.1784,[394]6.1810,[395]6.1794,[396]6.1720,[397]6.1795,[398]6.1834,[399]6.1917,[400]6.1913,[401]6.1927,[402]6.1932,[403]6.1951,[404]6.2014,[405]6.1919,[406]6.1885,[407]6.1877,[408]6.1887,[409]6.2011,[410]6.2121,[411]6.2246,[412]6.2409,[413]6.2524,[414]6.2600,[415]6.2652,[416]6.2732,[417]6.2863,[418]6.2898,[419]6.2972,[420]6.3058,[421]6.3179,[422]6.3237,[423]6.3308,[424]6.3428,[425]6.3519,[426]6.3583,[427]6.3628,[428]6.3711,[429]6.3756,[430]6.3846,[431]6.3992,[432]6.4035,[433]6.4022,[434]6.3976,[435]6.3983,[436]6.4008,[437]6.4102,[438]6.4181,[439]6.4146,[440]6.4140,[441]6.4089,[442]6.4080,[443]6.4094,[444]6.4097,[445]6.4076,[446]6.4100,[447]6.4129,[448]6.4172,[449]6.4145,[450]6.4149,[451]6.4106,[452]6.3987,[453]6.3904,[454]6.3843,[455]6.3851,[456]6.3898,[457]6.3915,[458]6.3894,[459]6.3904,[460]6.3991,[461]6.3962,[462]6.3946,[463]6.3997,[464]6.3988,[465]6.3958,[466]6.3877,[467]6.3880,[468]6.3879,[469]6.3901,[470]6.3906,[471]6.3858,[472]6.3904,[473]6.3848,[474]6.3862,[475]6.3803,[476]6.3826,[477]6.3754,[478]6.3745,[479]6.3808,[480]6.3860,[481]6.3880,[482]6.3835,[483]6.3793,[484]6.3816,[485]6.3798,[486]6.3743,[487]6.3743,[488]6.3724,[489]6.3674,[490]6.3647,[491]6.3617,[492]6.3559,[493]6.3528,[494]6.3510,[495]6.3508,[496]6.3473,[497]6.3419,[498]6.3402,[499]6.3352,[500]6.3255,[501]6.3185,[502]6.3184,[503]6.3182,[504]6.3088,[505]6.3113,[506]6.3122,[507]6.3061,[508]6.3018,[509]6.3007,[510]6.3046,[511]6.3092,[512]6.3127,[513]6.3146,[514]6.3212,[515]6.3157,[516]6.3149,[517]6.3159,[518]6.3160,[519]6.3190,[520]6.3218,[521]6.3234,[522]6.3263,[523]6.3274,[524]6.3336,[525]6.3373,[526]6.3385,[527]6.3405,[528]6.3351,[529]6.3356,[530]6.3308,[531]6.3298,[532]6.3347,[533]6.3370,[534]6.3351,[535]6.3375,[536]6.3321,[537]6.3297,[538]6.3345,[539]6.3357,[540]6.3397,[541]6.3406,[542]6.3413,[543]6.3426,[544]6.3438,[545]6.3417,[546]6.3423,[547]6.3378,[548]6.3323,[549]6.3325,[550]6.3298,[551]6.3260,[552]6.3240,[553]6.3197,[554]6.3175,[555]6.3146,[556]6.3143,[557]6.3166,[558]6.3126,[559]6.3122,[560]6.3117,[561]6.3119,[562]6.3100,[563]6.3100,[564]6.3144,[565]6.3161,[566]6.3158,[567]6.3135,[568]6.3141,[569]6.3124,[570]6.3150,[571]6.3157,[572]6.3167,[573]6.3168,[574]6.3132,[575]6.3127,[576]6.3126,[577]6.3116,[578]6.3095,[579]6.3103,[580]6.3037,[581]6.2999,[582]6.2990,[583]6.2997,[584]6.3001,[585]6.2924,[586]6.2856,[587]6.2858,[588]6.2908,[589]6.2966,[590]6.2996,[591]6.3018,[592]6.3003,[593]6.2966,[594]6.2977,[595]6.2954,[596]6.2991,[597]6.2968,[598]6.2930,[599]6.2952,[600]6.2950,[601]6.2935,[602]6
.2953,[603]6.2982,[604]6.2992,[605]6.3025,[606]6.3046,[607]6.3029,[608]6.2994,[609]6.3000,[610]6.3037,[611]6.3018,[612]6.3043,[613]6.3007,[614]6.2956,[615]6.2879,[616]6.2909,[617]6.2846,[618]6.2795,[619]6.2738,[620]6.2596,[621]6.2524,[622]6.2506,[623]6.2521,[624]6.2526,[625]6.2525,[626]6.2510,[627]6.2531,[628]6.2536,[629]6.2534,[630]6.2568,[631]6.2631,[632]6.2685,[633]6.2668,[634]6.2701,[635]6.2707,[636]6.2675,[637]6.2640,[638]6.2667,[639]6.2638,[640]6.2647,[641]6.2650,[642]6.2719,[643]6.2740,[644]6.2753,[645]6.2732,[646]6.2774,[647]6.2736,[648]6.2743,[649]6.2744,[650]6.2782,[651]6.2839,[652]6.2846,[653]6.2889,[654]6.2825,[655]6.2819,

llama_print_timings:        load time =  6811.20 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 3334178.62 ms / 335360 tokens (    9.94 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 3367197.45 ms
7B Q4_0 --memory_f32, Docker: [655]6.2838,
docker run -it --rm -v$PWD/models:/models --device /dev/dri --device /dev/kfd llama.cpp:rocm perplexity -m /models/llama-7b-q4_0.bin --memory_f32 -f /models/wiki.test.raw

main: seed = 1682331507
llama.cpp: loading model from /models/llama-7b-q4_0.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  59.11 KB
llama_model_load_internal: mem required  = 5809.32 MB (+ 2052.00 MB per state)
llama_init_from_file: kv self size  =  512.00 MB

system_info: n_threads = 8 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
6.72 seconds per pass - ETA 1 hours 13 minutes
[1]4.3801,[2]4.9555,[3]5.8269,[4]6.4692,[5]6.5436,[6]6.5413,[7]6.7175,[8]6.8070,[9]7.1756,[10]7.4121,[11]7.6567,[12]7.6957,[13]7.6057,[14]7.6821,[15]7.9367,[16]7.5419,[17]7.4189,[18]7.3798,[19]7.0077,[20]6.9948,[21]6.8969,[22]6.7124,[23]6.6744,[24]6.5868,[25]6.5871,[26]6.4149,[27]6.2349,[28]6.1341,[29]6.0499,[30]5.8939,[31]5.8660,[32]5.8840,[33]5.8189,[34]5.8538,[35]5.8795,[36]5.9233,[37]5.9272,[38]5.9444,[39]5.9825,[40]6.0413,[41]6.0483,[42]6.0827,[43]6.0398,[44]6.0945,[45]6.0989,[46]6.0730,[47]6.0968,[48]6.0675,[49]6.0746,[50]6.0352,[51]6.0311,[52]6.0201,[53]6.0642,[54]6.0477,[55]6.0251,[56]6.0595,[57]6.0826,[58]6.1044,[59]6.1183,[60]6.1648,[61]6.1537,[62]6.2167,[63]6.2503,[64]6.2654,[65]6.3119,[66]6.3221,[67]6.3402,[68]6.3542,[69]6.3791,[70]6.4114,[71]6.4328,[72]6.4626,[73]6.5278,[74]6.5331,[75]6.5475,[76]6.5638,[77]6.5771,[78]6.5619,[79]6.5915,[80]6.5840,[81]6.5968,[82]6.6005,[83]6.5468,[84]6.5323,[85]6.5209,[86]6.4998,[87]6.4344,[88]6.4060,[89]6.3854,[90]6.3688,[91]6.3949,[92]6.3910,[93]6.3936,[94]6.3911,[95]6.4198,[96]6.4178,[97]6.4106,[98]6.4036,[99]6.3896,[100]6.3896,[101]6.4155,[102]6.4091,[103]6.4309,[104]6.4377,[105]6.4362,[106]6.4539,[107]6.4526,[108]6.4649,[109]6.4596,[110]6.4551,[111]6.4780,[112]6.4970,[113]6.4984,[114]6.4950,[115]6.5033,[116]6.4959,[117]6.5014,[118]6.5299,[119]6.5508,[120]6.5872,[121]6.6035,[122]6.6283,[123]6.6673,[124]6.6850,[125]6.6763,[126]6.7154,[127]6.7524,[128]6.7799,[129]6.7630,[130]6.7725,[131]6.7673,[132]6.7585,[133]6.7457,[134]6.7569,[135]6.7534,[136]6.7402,[137]6.7322,[138]6.7151,[139]6.7035,[140]6.7005,[141]6.6707,[142]6.6659,[143]6.6379,[144]6.6178,[145]6.6092,[146]6.5957,[147]6.6032,[148]6.6054,[149]6.5994,[150]6.5953,[151]6.5965,[152]6.5870,[153]6.5703,[154]6.5613,[155]6.5680,[156]6.5630,[157]6.5814,[158]6.5849,[159]6.5891,[160]6.5916,[161]6.6041,[162]6.5739,[163]6.5619,[164]6.5357,[165]6.5039,[166]6.4751,[167]6.4378,[168]6.4051,[169]6.3916,[170]6.3791,[171]6.3502,[172]6.3322,[173]6.3136,[174]6.2829,[175]6.2608,[176]6.2505,[177]6.2295,[178]6.2059,[179]6.1887,[180]6.1798,[181]6.1574,[182]6.1382,[183]6.1240,[184]6.1238,[185]6.1165,[186]6.1182,[187]6.1237,[188]6.1200,[189]6.1384,[190]6.1393,[191]6.1597,[192]6.1761,[193]6.1938,[194]6.2054,[195]6.2264,[196]6.2434,[197]6.2655,[198]6.2811,[199]6.2840,[200]6.2886,[201]6.2844,[202]6.3049,[203]6.3115,[204]6.3114,[205]6.3224,[206]6.3302,[207]6.3262,[208]6.3347,[209]6.3398,[210]6.3449,[211]6.3547,[212]6.3621,[213]6.3727,[214]6.3763,[215]6.3803,[216]6.3951,[217]6.4129,[218]6.4264,[219]6.4267,[220]6.4231,[221]6.4168,[222]6.4133,[223]6.4024,[224]6.3958,[225]6.3910,[226]6.4126,[227]6.4212,[228]6.4271,[229]6.4338,[230]6.4294,[231]6.4463,[232]6.4332,[233]6.4160,[234]6.4004,[235]6.3846,[236]6.3768,[237]6.3664,[238]6.3698,[239]6.3536,[240]6.3433,[241]6.3466,[242]6.3504,[243]6.3488,[244]6.3368,[245]6.3342,[246]6.3221,[247]6.3098,[248]6.3030,[249]6.3010,[250]6.3057,[251]6.2981,[252]6.2947,[253]6.2844,[254]6.2804,[255]6.2688,[256]6.2497,[257]6.2386,[258]6.2299,[259]6.2279,[260]6.2197,[261]6.2154,[262]6.2095,[263]6.2050,[264]6.1858,[265]6.1850,[266]6.1835,[267]6.1766,[268]6.1863,[269]6.1843,[270]6.1850,[271]6.1928,[272]6.1974,[273]6.1969,[274]6.1983,[275]6.2073,[276]6.2128,[277]6.2288,[278]6.2397,[279]6.2483,[280]6.2518,[281]6.2617,[282]6.2678,[283]6.2825,[284]6.2903,[285]6.2997,[286]6.3144,[287]6.3138,[288]6.3198,[289]6.3107,[290]6.2956,[291]6.2802,[292]6.2644,[293]6.2505,[294]6.2530,[295]6.2524,[296]6.2567,[297]6.2553,[298]6.2579,[299]6.2551,[300]6.2439,[301]6.2440,[302]6.2359,[303]6.2282,[304]6.2204,[305]6.2180,[30
6]6.2047,[307]6.2072,[308]6.2104,[309]6.1941,[310]6.1880,[311]6.1816,[312]6.1838,[313]6.1782,[314]6.1769,[315]6.1604,[316]6.1562,[317]6.1395,[318]6.1179,[319]6.1298,[320]6.1429,[321]6.1466,[322]6.1422,[323]6.1356,[324]6.1331,[325]6.1431,[326]6.1430,[327]6.1451,[328]6.1494,[329]6.1554,[330]6.1579,[331]6.1703,[332]6.1671,[333]6.1741,[334]6.1682,[335]6.1618,[336]6.1655,[337]6.1625,[338]6.1612,[339]6.1555,[340]6.1511,[341]6.1589,[342]6.1614,[343]6.1669,[344]6.1668,[345]6.1667,[346]6.1638,[347]6.1686,[348]6.1727,[349]6.1746,[350]6.1712,[351]6.1717,[352]6.1717,[353]6.1665,[354]6.1664,[355]6.1718,[356]6.1749,[357]6.1712,[358]6.1802,[359]6.1833,[360]6.1795,[361]6.1791,[362]6.1858,[363]6.1970,[364]6.2035,[365]6.2093,[366]6.2100,[367]6.2188,[368]6.2166,[369]6.2175,[370]6.2185,[371]6.2125,[372]6.2178,[373]6.2234,[374]6.2221,[375]6.2217,[376]6.2301,[377]6.2252,[378]6.2278,[379]6.2338,[380]6.2254,[381]6.2211,[382]6.2154,[383]6.2144,[384]6.2137,[385]6.2124,[386]6.2119,[387]6.2111,[388]6.2066,[389]6.2012,[390]6.1943,[391]6.1862,[392]6.1822,[393]6.1803,[394]6.1828,[395]6.1812,[396]6.1738,[397]6.1814,[398]6.1852,[399]6.1935,[400]6.1931,[401]6.1945,[402]6.1950,[403]6.1969,[404]6.2032,[405]6.1937,[406]6.1903,[407]6.1895,[408]6.1905,[409]6.2029,[410]6.2139,[411]6.2264,[412]6.2427,[413]6.2542,[414]6.2618,[415]6.2670,[416]6.2750,[417]6.2881,[418]6.2916,[419]6.2990,[420]6.3077,[421]6.3197,[422]6.3255,[423]6.3326,[424]6.3446,[425]6.3537,[426]6.3602,[427]6.3647,[428]6.3730,[429]6.3775,[430]6.3865,[431]6.4011,[432]6.4054,[433]6.4041,[434]6.3995,[435]6.4002,[436]6.4027,[437]6.4121,[438]6.4200,[439]6.4164,[440]6.4158,[441]6.4108,[442]6.4099,[443]6.4112,[444]6.4115,[445]6.4095,[446]6.4118,[447]6.4147,[448]6.4191,[449]6.4164,[450]6.4167,[451]6.4124,[452]6.4006,[453]6.3922,[454]6.3862,[455]6.3869,[456]6.3917,[457]6.3934,[458]6.3912,[459]6.3922,[460]6.4009,[461]6.3981,[462]6.3965,[463]6.4016,[464]6.4007,[465]6.3976,[466]6.3895,[467]6.3898,[468]6.3897,[469]6.3919,[470]6.3924,[471]6.3876,[472]6.3923,[473]6.3866,[474]6.3880,[475]6.3821,[476]6.3844,[477]6.3773,[478]6.3764,[479]6.3827,[480]6.3879,[481]6.3899,[482]6.3854,[483]6.3813,[484]6.3835,[485]6.3818,[486]6.3763,[487]6.3763,[488]6.3744,[489]6.3694,[490]6.3667,[491]6.3637,[492]6.3579,[493]6.3549,[494]6.3531,[495]6.3528,[496]6.3493,[497]6.3440,[498]6.3422,[499]6.3372,[500]6.3275,[501]6.3206,[502]6.3204,[503]6.3202,[504]6.3109,[505]6.3134,[506]6.3143,[507]6.3081,[508]6.3038,[509]6.3027,[510]6.3067,[511]6.3113,[512]6.3148,[513]6.3166,[514]6.3233,[515]6.3177,[516]6.3169,[517]6.3180,[518]6.3181,[519]6.3211,[520]6.3238,[521]6.3255,[522]6.3283,[523]6.3294,[524]6.3357,[525]6.3394,[526]6.3406,[527]6.3426,[528]6.3372,[529]6.3376,[530]6.3329,[531]6.3319,[532]6.3368,[533]6.3391,[534]6.3372,[535]6.3395,[536]6.3341,[537]6.3318,[538]6.3366,[539]6.3378,[540]6.3417,[541]6.3426,[542]6.3433,[543]6.3447,[544]6.3459,[545]6.3437,[546]6.3444,[547]6.3398,[548]6.3343,[549]6.3345,[550]6.3318,[551]6.3280,[552]6.3260,[553]6.3217,[554]6.3195,[555]6.3166,[556]6.3163,[557]6.3186,[558]6.3146,[559]6.3142,[560]6.3137,[561]6.3139,[562]6.3120,[563]6.3120,[564]6.3163,[565]6.3180,[566]6.3177,[567]6.3155,[568]6.3160,[569]6.3144,[570]6.3170,[571]6.3176,[572]6.3186,[573]6.3188,[574]6.3151,[575]6.3147,[576]6.3145,[577]6.3135,[578]6.3114,[579]6.3122,[580]6.3056,[581]6.3018,[582]6.3008,[583]6.3016,[584]6.3020,[585]6.2943,[586]6.2875,[587]6.2877,[588]6.2927,[589]6.2985,[590]6.3015,[591]6.3037,[592]6.3022,[593]6.2985,[594]6.2996,[595]6.2973,[596]6.3010,[597]6.2987,[598]6.2949,[599]6.2971,[600]6.2969,[601]6.2954,[602]6
.2971,[603]6.3001,[604]6.3011,[605]6.3044,[606]6.3065,[607]6.3048,[608]6.3013,[609]6.3019,[610]6.3056,[611]6.3037,[612]6.3062,[613]6.3026,[614]6.2975,[615]6.2898,[616]6.2928,[617]6.2865,[618]6.2814,[619]6.2757,[620]6.2614,[621]6.2542,[622]6.2525,[623]6.2540,[624]6.2545,[625]6.2544,[626]6.2529,[627]6.2550,[628]6.2555,[629]6.2552,[630]6.2586,[631]6.2650,[632]6.2704,[633]6.2687,[634]6.2720,[635]6.2726,[636]6.2694,[637]6.2659,[638]6.2686,[639]6.2657,[640]6.2666,[641]6.2669,[642]6.2738,[643]6.2759,[644]6.2772,[645]6.2750,[646]6.2793,[647]6.2755,[648]6.2761,[649]6.2762,[650]6.2801,[651]6.2858,[652]6.2865,[653]6.2908,[654]6.2844,[655]6.2838,

llama_print_timings:        load time = 11650.39 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 4610430.68 ms / 335360 tokens (   13.75 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 4646862.42 ms
7B F16, Docker: [655]5.9564,
docker run -it --rm -v$PWD/models:/models --device /dev/dri --device /dev/kfd llama.cpp:rocm perplexity -m /models/llama-7b-f16.bin --no-mmap -f /models/wiki.test.raw

main: seed = 1682338603
llama.cpp: loading model from /models/llama-7b-f16.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 1 (mostly F16)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 13161547.11 KB
llama_model_load_internal: mem required  = 14645.07 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 8 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
5.79 seconds per pass - ETA 1 hours 3 minutes
[1]4.2324,[2]4.7328,[3]5.5850,[4]6.1712,[5]6.2985,[6]6.2643,[7]6.4560,[8]6.5508,[9]6.8799,[10]7.1226,[11]7.3342,[12]7.3542,[13]7.2699,[14]7.3195,[15]7.5602,[16]7.1907,[17]7.0812,[18]7.0277,[19]6.6794,[20]6.6695,[21]6.5787,[22]6.4057,[23]6.3766,[24]6.2860,[25]6.2829,[26]6.1235,[27]5.9531,[28]5.8556,[29]5.7688,[30]5.6166,[31]5.5870,[32]5.6072,[33]5.5522,[34]5.5821,[35]5.6047,[36]5.6413,[37]5.6418,[38]5.6522,[39]5.6845,[40]5.7347,[41]5.7433,[42]5.7807,[43]5.7433,[44]5.7999,[45]5.8026,[46]5.7772,[47]5.7975,[48]5.7731,[49]5.7745,[50]5.7360,[51]5.7323,[52]5.7230,[53]5.7679,[54]5.7525,[55]5.7312,[56]5.7595,[57]5.7791,[58]5.7980,[59]5.8151,[60]5.8558,[61]5.8487,[62]5.9057,[63]5.9365,[64]5.9499,[65]5.9914,[66]5.9998,[67]6.0170,[68]6.0314,[69]6.0549,[70]6.0847,[71]6.1056,[72]6.1367,[73]6.1949,[74]6.1989,[75]6.2124,[76]6.2242,[77]6.2354,[78]6.2210,[79]6.2483,[80]6.2418,[81]6.2527,[82]6.2567,[83]6.2069,[84]6.1891,[85]6.1766,[86]6.1557,[87]6.0916,[88]6.0670,[89]6.0476,[90]6.0336,[91]6.0562,[92]6.0504,[93]6.0511,[94]6.0487,[95]6.0758,[96]6.0754,[97]6.0698,[98]6.0639,[99]6.0511,[100]6.0501,[101]6.0737,[102]6.0689,[103]6.0889,[104]6.0961,[105]6.0961,[106]6.1125,[107]6.1118,[108]6.1251,[109]6.1202,[110]6.1167,[111]6.1388,[112]6.1588,[113]6.1608,[114]6.1570,[115]6.1628,[116]6.1539,[117]6.1588,[118]6.1868,[119]6.2082,[120]6.2423,[121]6.2567,[122]6.2808,[123]6.3170,[124]6.3342,[125]6.3251,[126]6.3631,[127]6.3985,[128]6.4280,[129]6.4134,[130]6.4216,[131]6.4180,[132]6.4108,[133]6.3979,[134]6.4077,[135]6.4038,[136]6.3934,[137]6.3862,[138]6.3688,[139]6.3586,[140]6.3551,[141]6.3264,[142]6.3230,[143]6.2934,[144]6.2733,[145]6.2644,[146]6.2529,[147]6.2563,[148]6.2567,[149]6.2515,[150]6.2474,[151]6.2494,[152]6.2398,[153]6.2242,[154]6.2158,[155]6.2226,[156]6.2179,[157]6.2344,[158]6.2386,[159]6.2430,[160]6.2457,[161]6.2574,[162]6.2300,[163]6.2188,[164]6.1960,[165]6.1661,[166]6.1397,[167]6.1037,[168]6.0740,[169]6.0606,[170]6.0500,[171]6.0241,[172]6.0076,[173]5.9916,[174]5.9623,[175]5.9412,[176]5.9301,[177]5.9106,[178]5.8884,[179]5.8719,[180]5.8626,[181]5.8416,[182]5.8242,[183]5.8109,[184]5.8101,[185]5.8029,[186]5.8040,[187]5.8101,[188]5.8063,[189]5.8232,[190]5.8240,[191]5.8445,[192]5.8602,[193]5.8764,[194]5.8872,[195]5.9080,[196]5.9233,[197]5.9438,[198]5.9585,[199]5.9614,[200]5.9663,[201]5.9611,[202]5.9793,[203]5.9863,[204]5.9848,[205]5.9948,[206]6.0016,[207]5.9979,[208]6.0061,[209]6.0101,[210]6.0151,[211]6.0257,[212]6.0326,[213]6.0428,[214]6.0451,[215]6.0475,[216]6.0614,[217]6.0792,[218]6.0920,[219]6.0918,[220]6.0883,[221]6.0832,[222]6.0811,[223]6.0719,[224]6.0648,[225]6.0611,[226]6.0812,[227]6.0890,[228]6.0942,[229]6.1002,[230]6.0970,[231]6.1133,[232]6.1020,[233]6.0861,[234]6.0718,[235]6.0519,[236]6.0454,[237]6.0361,[238]6.0388,[239]6.0245,[240]6.0147,[241]6.0165,[242]6.0202,[243]6.0185,[244]6.0076,[245]6.0047,[246]5.9939,[247]5.9826,[248]5.9756,[249]5.9732,[250]5.9778,[251]5.9710,[252]5.9679,[253]5.9586,[254]5.9534,[255]5.9426,[256]5.9254,[257]5.9136,[258]5.9058,[259]5.9036,[260]5.8957,[261]5.8916,[262]5.8862,[263]5.8811,[264]5.8589,[265]5.8584,[266]5.8566,[267]5.8502,[268]5.8588,[269]5.8570,[270]5.8580,[271]5.8655,[272]5.8688,[273]5.8691,[274]5.8716,[275]5.8797,[276]5.8855,[277]5.9009,[278]5.9107,[279]5.9200,[280]5.9227,[281]5.9323,[282]5.9380,[283]5.9524,[284]5.9602,[285]5.9686,[286]5.9819,[287]5.9814,[288]5.9871,[289]5.9791,[290]5.9639,[291]5.9494,[292]5.9350,[293]5.9221,[294]5.9243,[295]5.9235,[296]5.9281,[297]5.9269,[298]5.9297,[299]5.9273,[300]5.9169,[301]5.9169,[302]5.9093,[303]5.9010,[304]5.8929,[305]5.8895,[30
6]5.8773,[307]5.8795,[308]5.8825,[309]5.8673,[310]5.8620,[311]5.8558,[312]5.8579,[313]5.8525,[314]5.8509,[315]5.8356,[316]5.8304,[317]5.8147,[318]5.7950,[319]5.8065,[320]5.8185,[321]5.8229,[322]5.8190,[323]5.8124,[324]5.8097,[325]5.8197,[326]5.8199,[327]5.8220,[328]5.8258,[329]5.8316,[330]5.8342,[331]5.8463,[332]5.8435,[333]5.8502,[334]5.8449,[335]5.8390,[336]5.8428,[337]5.8406,[338]5.8399,[339]5.8350,[340]5.8308,[341]5.8387,[342]5.8415,[343]5.8462,[344]5.8463,[345]5.8468,[346]5.8444,[347]5.8484,[348]5.8518,[349]5.8541,[350]5.8509,[351]5.8517,[352]5.8517,[353]5.8461,[354]5.8462,[355]5.8512,[356]5.8542,[357]5.8508,[358]5.8597,[359]5.8622,[360]5.8590,[361]5.8586,[362]5.8654,[363]5.8764,[364]5.8823,[365]5.8874,[366]5.8887,[367]5.8971,[368]5.8948,[369]5.8957,[370]5.8971,[371]5.8919,[372]5.8966,[373]5.9012,[374]5.8997,[375]5.8998,[376]5.9063,[377]5.9020,[378]5.9047,[379]5.9104,[380]5.9027,[381]5.8994,[382]5.8945,[383]5.8938,[384]5.8934,[385]5.8924,[386]5.8919,[387]5.8917,[388]5.8882,[389]5.8832,[390]5.8765,[391]5.8691,[392]5.8652,[393]5.8636,[394]5.8661,[395]5.8649,[396]5.8579,[397]5.8648,[398]5.8685,[399]5.8760,[400]5.8762,[401]5.8776,[402]5.8786,[403]5.8805,[404]5.8869,[405]5.8775,[406]5.8743,[407]5.8739,[408]5.8755,[409]5.8868,[410]5.8975,[411]5.9086,[412]5.9240,[413]5.9348,[414]5.9422,[415]5.9476,[416]5.9552,[417]5.9669,[418]5.9704,[419]5.9770,[420]5.9856,[421]5.9969,[422]6.0009,[423]6.0078,[424]6.0182,[425]6.0267,[426]6.0329,[427]6.0372,[428]6.0453,[429]6.0503,[430]6.0583,[431]6.0720,[432]6.0758,[433]6.0751,[434]6.0711,[435]6.0720,[436]6.0745,[437]6.0839,[438]6.0912,[439]6.0882,[440]6.0873,[441]6.0824,[442]6.0810,[443]6.0823,[444]6.0828,[445]6.0810,[446]6.0833,[447]6.0862,[448]6.0903,[449]6.0879,[450]6.0888,[451]6.0850,[452]6.0715,[453]6.0631,[454]6.0575,[455]6.0585,[456]6.0631,[457]6.0651,[458]6.0629,[459]6.0635,[460]6.0719,[461]6.0692,[462]6.0679,[463]6.0717,[464]6.0706,[465]6.0679,[466]6.0604,[467]6.0605,[468]6.0603,[469]6.0623,[470]6.0627,[471]6.0581,[472]6.0623,[473]6.0572,[474]6.0584,[475]6.0523,[476]6.0539,[477]6.0469,[478]6.0458,[479]6.0513,[480]6.0557,[481]6.0574,[482]6.0531,[483]6.0491,[484]6.0510,[485]6.0489,[486]6.0432,[487]6.0429,[488]6.0407,[489]6.0360,[490]6.0337,[491]6.0308,[492]6.0253,[493]6.0226,[494]6.0209,[495]6.0204,[496]6.0167,[497]6.0112,[498]6.0095,[499]6.0053,[500]5.9962,[501]5.9897,[502]5.9899,[503]5.9894,[504]5.9808,[505]5.9830,[506]5.9837,[507]5.9780,[508]5.9741,[509]5.9735,[510]5.9769,[511]5.9814,[512]5.9849,[513]5.9869,[514]5.9930,[515]5.9877,[516]5.9867,[517]5.9878,[518]5.9874,[519]5.9904,[520]5.9928,[521]5.9940,[522]5.9967,[523]5.9974,[524]6.0030,[525]6.0061,[526]6.0070,[527]6.0087,[528]6.0038,[529]6.0043,[530]5.9994,[531]5.9984,[532]6.0029,[533]6.0052,[534]6.0035,[535]6.0056,[536]6.0004,[537]5.9984,[538]6.0032,[539]6.0043,[540]6.0080,[541]6.0083,[542]6.0094,[543]6.0109,[544]6.0120,[545]6.0102,[546]6.0110,[547]6.0069,[548]6.0023,[549]6.0025,[550]5.9996,[551]5.9963,[552]5.9941,[553]5.9906,[554]5.9886,[555]5.9857,[556]5.9852,[557]5.9875,[558]5.9838,[559]5.9834,[560]5.9833,[561]5.9835,[562]5.9814,[563]5.9810,[564]5.9853,[565]5.9873,[566]5.9872,[567]5.9850,[568]5.9856,[569]5.9844,[570]5.9871,[571]5.9876,[572]5.9886,[573]5.9887,[574]5.9852,[575]5.9846,[576]5.9845,[577]5.9831,[578]5.9812,[579]5.9818,[580]5.9755,[581]5.9719,[582]5.9708,[583]5.9717,[584]5.9719,[585]5.9646,[586]5.9579,[587]5.9585,[588]5.9633,[589]5.9684,[590]5.9714,[591]5.9735,[592]5.9724,[593]5.9692,[594]5.9702,[595]5.9679,[596]5.9711,[597]5.9691,[598]5.9663,[599]5.9684,[600]5.9679,[601]5.9664,[602]5
.9672,[603]5.9700,[604]5.9708,[605]5.9742,[606]5.9761,[607]5.9745,[608]5.9713,[609]5.9721,[610]5.9755,[611]5.9738,[612]5.9764,[613]5.9729,[614]5.9680,[615]5.9610,[616]5.9637,[617]5.9578,[618]5.9532,[619]5.9479,[620]5.9347,[621]5.9282,[622]5.9266,[623]5.9281,[624]5.9286,[625]5.9288,[626]5.9278,[627]5.9300,[628]5.9301,[629]5.9297,[630]5.9328,[631]5.9384,[632]5.9439,[633]5.9425,[634]5.9459,[635]5.9466,[636]5.9432,[637]5.9398,[638]5.9422,[639]5.9392,[640]5.9401,[641]5.9403,[642]5.9468,[643]5.9489,[644]5.9501,[645]5.9483,[646]5.9522,[647]5.9482,[648]5.9491,[649]5.9493,[650]5.9531,[651]5.9583,[652]5.9594,[653]5.9632,[654]5.9571,[655]5.9564,

llama_print_timings:        load time = 11891.56 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 3755163.27 ms / 335360 tokens (   11.20 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 3794021.15 ms
7B Q4_1, Docker: [655]6.1290,
docker run -it --rm -v$PWD/models:/models --device /dev/dri --device /dev/kfd llama.cpp:rocm perplexity -m /models/llama-7b-q4_1.bin --no-mmap -f /models/wiki.test.raw

main: seed = 1682342791
llama.cpp: loading model from /models/llama-7b-q4_1.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 3 (mostly Q4_1)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4936267.11 KB
llama_model_load_internal: mem required  = 6612.57 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 8 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
5.21 seconds per pass - ETA 56 minutes
[1]4.4323,[2]4.8863,[3]5.7761,[4]6.3814,[5]6.4911,[6]6.4638,[7]6.6548,[8]6.7572,[9]7.0838,[10]7.3394,[11]7.5618,[12]7.6045,[13]7.5323,[14]7.5955,[15]7.8405,[16]7.4403,[17]7.3181,[18]7.2617,[19]6.8886,[20]6.8673,[21]6.7698,[22]6.5975,[23]6.5679,[24]6.4790,[25]6.4809,[26]6.3162,[27]6.1360,[28]6.0296,[29]5.9400,[30]5.7779,[31]5.7483,[32]5.7658,[33]5.7091,[34]5.7423,[35]5.7643,[36]5.8049,[37]5.8079,[38]5.8115,[39]5.8458,[40]5.8938,[41]5.9067,[42]5.9474,[43]5.9071,[44]5.9658,[45]5.9726,[46]5.9454,[47]5.9647,[48]5.9383,[49]5.9370,[50]5.8961,[51]5.8903,[52]5.8785,[53]5.9269,[54]5.9100,[55]5.8886,[56]5.9167,[57]5.9356,[58]5.9544,[59]5.9729,[60]6.0160,[61]6.0048,[62]6.0620,[63]6.0919,[64]6.1024,[65]6.1472,[66]6.1572,[67]6.1761,[68]6.1893,[69]6.2130,[70]6.2412,[71]6.2616,[72]6.2930,[73]6.3496,[74]6.3531,[75]6.3683,[76]6.3813,[77]6.3933,[78]6.3796,[79]6.4081,[80]6.4020,[81]6.4178,[82]6.4235,[83]6.3713,[84]6.3554,[85]6.3430,[86]6.3217,[87]6.2604,[88]6.2376,[89]6.2164,[90]6.2009,[91]6.2245,[92]6.2182,[93]6.2174,[94]6.2142,[95]6.2430,[96]6.2421,[97]6.2365,[98]6.2300,[99]6.2155,[100]6.2138,[101]6.2387,[102]6.2327,[103]6.2523,[104]6.2604,[105]6.2595,[106]6.2763,[107]6.2764,[108]6.2882,[109]6.2820,[110]6.2782,[111]6.2997,[112]6.3200,[113]6.3235,[114]6.3198,[115]6.3254,[116]6.3161,[117]6.3214,[118]6.3491,[119]6.3717,[120]6.4076,[121]6.4225,[122]6.4466,[123]6.4839,[124]6.5025,[125]6.4928,[126]6.5324,[127]6.5693,[128]6.6014,[129]6.5853,[130]6.5951,[131]6.5913,[132]6.5826,[133]6.5700,[134]6.5797,[135]6.5762,[136]6.5649,[137]6.5579,[138]6.5414,[139]6.5304,[140]6.5264,[141]6.4978,[142]6.4955,[143]6.4675,[144]6.4464,[145]6.4385,[146]6.4268,[147]6.4309,[148]6.4313,[149]6.4269,[150]6.4230,[151]6.4258,[152]6.4149,[153]6.3993,[154]6.3906,[155]6.3969,[156]6.3919,[157]6.4084,[158]6.4117,[159]6.4171,[160]6.4203,[161]6.4325,[162]6.4049,[163]6.3934,[164]6.3699,[165]6.3385,[166]6.3114,[167]6.2734,[168]6.2433,[169]6.2304,[170]6.2190,[171]6.1931,[172]6.1760,[173]6.1603,[174]6.1305,[175]6.1096,[176]6.0981,[177]6.0784,[178]6.0553,[179]6.0387,[180]6.0287,[181]6.0072,[182]5.9899,[183]5.9766,[184]5.9759,[185]5.9687,[186]5.9694,[187]5.9750,[188]5.9709,[189]5.9890,[190]5.9906,[191]6.0118,[192]6.0274,[193]6.0442,[194]6.0558,[195]6.0776,[196]6.0935,[197]6.1144,[198]6.1302,[199]6.1334,[200]6.1386,[201]6.1341,[202]6.1530,[203]6.1607,[204]6.1598,[205]6.1706,[206]6.1776,[207]6.1744,[208]6.1829,[209]6.1874,[210]6.1918,[211]6.2028,[212]6.2108,[213]6.2210,[214]6.2242,[215]6.2267,[216]6.2408,[217]6.2595,[218]6.2735,[219]6.2740,[220]6.2702,[221]6.2643,[222]6.2621,[223]6.2519,[224]6.2450,[225]6.2412,[226]6.2615,[227]6.2703,[228]6.2761,[229]6.2819,[230]6.2788,[231]6.2952,[232]6.2836,[233]6.2669,[234]6.2516,[235]6.2328,[236]6.2266,[237]6.2167,[238]6.2190,[239]6.2039,[240]6.1932,[241]6.1956,[242]6.1985,[243]6.1965,[244]6.1854,[245]6.1824,[246]6.1716,[247]6.1596,[248]6.1522,[249]6.1487,[250]6.1530,[251]6.1460,[252]6.1421,[253]6.1328,[254]6.1282,[255]6.1171,[256]6.0992,[257]6.0866,[258]6.0783,[259]6.0760,[260]6.0677,[261]6.0633,[262]6.0578,[263]6.0518,[264]6.0313,[265]6.0306,[266]6.0293,[267]6.0225,[268]6.0305,[269]6.0293,[270]6.0292,[271]6.0371,[272]6.0405,[273]6.0408,[274]6.0431,[275]6.0518,[276]6.0575,[277]6.0727,[278]6.0826,[279]6.0913,[280]6.0939,[281]6.1042,[282]6.1099,[283]6.1250,[284]6.1326,[285]6.1406,[286]6.1534,[287]6.1526,[288]6.1586,[289]6.1498,[290]6.1340,[291]6.1184,[292]6.1034,[293]6.0905,[294]6.0925,[295]6.0918,[296]6.0968,[297]6.0962,[298]6.0997,[299]6.0974,[300]6.0864,[301]6.0859,[302]6.0783,[303]6.0693,[304]6.0607,[305]6.0572,[30
6]6.0448,[307]6.0469,[308]6.0498,[309]6.0338,[310]6.0280,[311]6.0217,[312]6.0240,[313]6.0182,[314]6.0166,[315]6.0009,[316]5.9962,[317]5.9799,[318]5.9594,[319]5.9713,[320]5.9835,[321]5.9877,[322]5.9835,[323]5.9766,[324]5.9733,[325]5.9843,[326]5.9843,[327]5.9863,[328]5.9897,[329]5.9953,[330]5.9982,[331]6.0104,[332]6.0076,[333]6.0148,[334]6.0091,[335]6.0027,[336]6.0059,[337]6.0036,[338]6.0026,[339]5.9973,[340]5.9932,[341]6.0011,[342]6.0040,[343]6.0086,[344]6.0088,[345]6.0089,[346]6.0060,[347]6.0100,[348]6.0137,[349]6.0159,[350]6.0131,[351]6.0139,[352]6.0140,[353]6.0077,[354]6.0081,[355]6.0134,[356]6.0164,[357]6.0133,[358]6.0226,[359]6.0250,[360]6.0220,[361]6.0216,[362]6.0284,[363]6.0395,[364]6.0459,[365]6.0509,[366]6.0528,[367]6.0615,[368]6.0588,[369]6.0599,[370]6.0617,[371]6.0565,[372]6.0615,[373]6.0661,[374]6.0647,[375]6.0648,[376]6.0715,[377]6.0670,[378]6.0694,[379]6.0753,[380]6.0675,[381]6.0642,[382]6.0597,[383]6.0588,[384]6.0583,[385]6.0573,[386]6.0570,[387]6.0571,[388]6.0535,[389]6.0483,[390]6.0418,[391]6.0341,[392]6.0298,[393]6.0284,[394]6.0312,[395]6.0298,[396]6.0224,[397]6.0291,[398]6.0330,[399]6.0406,[400]6.0403,[401]6.0417,[402]6.0429,[403]6.0448,[404]6.0512,[405]6.0421,[406]6.0390,[407]6.0387,[408]6.0405,[409]6.0521,[410]6.0632,[411]6.0746,[412]6.0906,[413]6.1016,[414]6.1093,[415]6.1144,[416]6.1222,[417]6.1344,[418]6.1378,[419]6.1451,[420]6.1543,[421]6.1657,[422]6.1697,[423]6.1766,[424]6.1871,[425]6.1958,[426]6.2025,[427]6.2071,[428]6.2153,[429]6.2208,[430]6.2288,[431]6.2426,[432]6.2466,[433]6.2459,[434]6.2413,[435]6.2424,[436]6.2449,[437]6.2548,[438]6.2623,[439]6.2590,[440]6.2580,[441]6.2531,[442]6.2512,[443]6.2522,[444]6.2528,[445]6.2507,[446]6.2529,[447]6.2559,[448]6.2602,[449]6.2578,[450]6.2586,[451]6.2546,[452]6.2425,[453]6.2342,[454]6.2284,[455]6.2291,[456]6.2343,[457]6.2365,[458]6.2345,[459]6.2351,[460]6.2436,[461]6.2409,[462]6.2395,[463]6.2438,[464]6.2425,[465]6.2398,[466]6.2324,[467]6.2331,[468]6.2329,[469]6.2352,[470]6.2358,[471]6.2311,[472]6.2362,[473]6.2308,[474]6.2321,[475]6.2264,[476]6.2283,[477]6.2213,[478]6.2203,[479]6.2260,[480]6.2304,[481]6.2322,[482]6.2276,[483]6.2235,[484]6.2252,[485]6.2232,[486]6.2172,[487]6.2169,[488]6.2149,[489]6.2100,[490]6.2079,[491]6.2052,[492]6.1996,[493]6.1968,[494]6.1950,[495]6.1947,[496]6.1910,[497]6.1854,[498]6.1839,[499]6.1794,[500]6.1700,[501]6.1636,[502]6.1636,[503]6.1631,[504]6.1542,[505]6.1565,[506]6.1573,[507]6.1519,[508]6.1481,[509]6.1475,[510]6.1511,[511]6.1558,[512]6.1596,[513]6.1615,[514]6.1679,[515]6.1625,[516]6.1617,[517]6.1627,[518]6.1623,[519]6.1655,[520]6.1676,[521]6.1690,[522]6.1718,[523]6.1726,[524]6.1784,[525]6.1817,[526]6.1826,[527]6.1841,[528]6.1791,[529]6.1797,[530]6.1745,[531]6.1728,[532]6.1777,[533]6.1800,[534]6.1785,[535]6.1807,[536]6.1755,[537]6.1733,[538]6.1784,[539]6.1792,[540]6.1829,[541]6.1831,[542]6.1838,[543]6.1854,[544]6.1864,[545]6.1844,[546]6.1852,[547]6.1813,[548]6.1765,[549]6.1762,[550]6.1735,[551]6.1698,[552]6.1675,[553]6.1638,[554]6.1616,[555]6.1585,[556]6.1580,[557]6.1602,[558]6.1564,[559]6.1562,[560]6.1561,[561]6.1566,[562]6.1542,[563]6.1539,[564]6.1585,[565]6.1607,[566]6.1607,[567]6.1588,[568]6.1592,[569]6.1577,[570]6.1605,[571]6.1609,[572]6.1614,[573]6.1611,[574]6.1576,[575]6.1571,[576]6.1570,[577]6.1551,[578]6.1529,[579]6.1531,[580]6.1468,[581]6.1431,[582]6.1423,[583]6.1431,[584]6.1434,[585]6.1360,[586]6.1291,[587]6.1297,[588]6.1344,[589]6.1400,[590]6.1429,[591]6.1451,[592]6.1438,[593]6.1405,[594]6.1415,[595]6.1392,[596]6.1426,[597]6.1404,[598]6.1379,[599]6.1401,[600]6.1401,[601]6.1388,[602]6
.1407,[603]6.1432,[604]6.1441,[605]6.1479,[606]6.1499,[607]6.1483,[608]6.1447,[609]6.1452,[610]6.1488,[611]6.1473,[612]6.1499,[613]6.1463,[614]6.1415,[615]6.1340,[616]6.1366,[617]6.1305,[618]6.1256,[619]6.1201,[620]6.1063,[621]6.0995,[622]6.0979,[623]6.0996,[624]6.1001,[625]6.1002,[626]6.0993,[627]6.1019,[628]6.1021,[629]6.1016,[630]6.1047,[631]6.1103,[632]6.1160,[633]6.1145,[634]6.1179,[635]6.1184,[636]6.1149,[637]6.1115,[638]6.1141,[639]6.1109,[640]6.1119,[641]6.1120,[642]6.1185,[643]6.1204,[644]6.1215,[645]6.1198,[646]6.1240,[647]6.1202,[648]6.1213,[649]6.1215,[650]6.1256,[651]6.1310,[652]6.1322,[653]6.1361,[654]6.1297,[655]6.1290,

llama_print_timings:        load time =  8444.43 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 3298326.00 ms / 335360 tokens (    9.84 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 3338225.33 ms
7B Q4_2, Docker: [655]6.2002,
docker run -it --rm -v$PWD/models:/models --device /dev/dri --device /dev/kfd llama.cpp:rocm perplexity -m /models/llama-7b-q4_2.bin --no-mmap -f /models/wiki.test.raw

main: seed = 1682346906
llama.cpp: loading model from /models/llama-7b-q4_2.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 5 (mostly Q4_2)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required  = 5809.32 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 8 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
5.62 seconds per pass - ETA 1 hours 1 minutes
[1]4.4374,[2]4.8772,[3]5.7681,[4]6.3925,[5]6.5142,[6]6.4884,[7]6.6803,[8]6.7903,[9]7.1376,[10]7.3772,[11]7.5927,[12]7.6131,[13]7.5335,[14]7.6098,[15]7.8668,[16]7.4772,[17]7.3521,[18]7.3157,[19]6.9532,[20]6.9431,[21]6.8483,[22]6.6732,[23]6.6360,[24]6.5422,[25]6.5428,[26]6.3758,[27]6.1929,[28]6.0919,[29]6.0021,[30]5.8428,[31]5.8105,[32]5.8319,[33]5.7736,[34]5.8103,[35]5.8352,[36]5.8781,[37]5.8817,[38]5.8969,[39]5.9326,[40]5.9921,[41]6.0049,[42]6.0453,[43]6.0030,[44]6.0581,[45]6.0611,[46]6.0371,[47]6.0585,[48]6.0306,[49]6.0337,[50]5.9922,[51]5.9876,[52]5.9770,[53]6.0202,[54]6.0027,[55]5.9780,[56]6.0049,[57]6.0244,[58]6.0453,[59]6.0629,[60]6.1072,[61]6.0985,[62]6.1567,[63]6.1927,[64]6.2088,[65]6.2552,[66]6.2626,[67]6.2811,[68]6.2987,[69]6.3254,[70]6.3581,[71]6.3803,[72]6.4105,[73]6.4727,[74]6.4784,[75]6.4929,[76]6.5055,[77]6.5178,[78]6.5028,[79]6.5305,[80]6.5214,[81]6.5310,[82]6.5349,[83]6.4802,[84]6.4626,[85]6.4516,[86]6.4288,[87]6.3638,[88]6.3353,[89]6.3155,[90]6.3010,[91]6.3247,[92]6.3185,[93]6.3207,[94]6.3176,[95]6.3453,[96]6.3436,[97]6.3384,[98]6.3311,[99]6.3161,[100]6.3170,[101]6.3427,[102]6.3372,[103]6.3573,[104]6.3649,[105]6.3640,[106]6.3790,[107]6.3764,[108]6.3901,[109]6.3834,[110]6.3793,[111]6.4015,[112]6.4218,[113]6.4244,[114]6.4213,[115]6.4285,[116]6.4199,[117]6.4258,[118]6.4552,[119]6.4756,[120]6.5110,[121]6.5286,[122]6.5540,[123]6.5918,[124]6.6105,[125]6.6001,[126]6.6398,[127]6.6765,[128]6.7069,[129]6.6906,[130]6.7006,[131]6.6967,[132]6.6882,[133]6.6753,[134]6.6859,[135]6.6817,[136]6.6691,[137]6.6608,[138]6.6454,[139]6.6349,[140]6.6299,[141]6.5998,[142]6.5962,[143]6.5663,[144]6.5457,[145]6.5365,[146]6.5232,[147]6.5297,[148]6.5298,[149]6.5236,[150]6.5186,[151]6.5200,[152]6.5081,[153]6.4913,[154]6.4825,[155]6.4894,[156]6.4842,[157]6.5023,[158]6.5054,[159]6.5108,[160]6.5127,[161]6.5255,[162]6.4955,[163]6.4823,[164]6.4575,[165]6.4257,[166]6.3978,[167]6.3596,[168]6.3278,[169]6.3146,[170]6.3033,[171]6.2754,[172]6.2583,[173]6.2412,[174]6.2109,[175]6.1888,[176]6.1788,[177]6.1583,[178]6.1346,[179]6.1178,[180]6.1090,[181]6.0874,[182]6.0696,[183]6.0557,[184]6.0553,[185]6.0480,[186]6.0489,[187]6.0550,[188]6.0504,[189]6.0677,[190]6.0690,[191]6.0907,[192]6.1069,[193]6.1243,[194]6.1357,[195]6.1569,[196]6.1728,[197]6.1938,[198]6.2088,[199]6.2128,[200]6.2175,[201]6.2129,[202]6.2335,[203]6.2417,[204]6.2405,[205]6.2511,[206]6.2578,[207]6.2542,[208]6.2626,[209]6.2669,[210]6.2724,[211]6.2821,[212]6.2895,[213]6.3000,[214]6.3023,[215]6.3061,[216]6.3214,[217]6.3395,[218]6.3527,[219]6.3533,[220]6.3488,[221]6.3441,[222]6.3413,[223]6.3307,[224]6.3234,[225]6.3195,[226]6.3405,[227]6.3493,[228]6.3539,[229]6.3599,[230]6.3564,[231]6.3734,[232]6.3605,[233]6.3436,[234]6.3285,[235]6.3116,[236]6.3042,[237]6.2939,[238]6.2969,[239]6.2813,[240]6.2712,[241]6.2738,[242]6.2775,[243]6.2755,[244]6.2639,[245]6.2611,[246]6.2492,[247]6.2367,[248]6.2292,[249]6.2272,[250]6.2312,[251]6.2243,[252]6.2207,[253]6.2107,[254]6.2066,[255]6.1956,[256]6.1775,[257]6.1655,[258]6.1570,[259]6.1551,[260]6.1478,[261]6.1437,[262]6.1383,[263]6.1331,[264]6.1137,[265]6.1127,[266]6.1113,[267]6.1047,[268]6.1140,[269]6.1122,[270]6.1132,[271]6.1211,[272]6.1241,[273]6.1240,[274]6.1258,[275]6.1337,[276]6.1394,[277]6.1549,[278]6.1651,[279]6.1738,[280]6.1767,[281]6.1859,[282]6.1921,[283]6.2069,[284]6.2145,[285]6.2233,[286]6.2369,[287]6.2367,[288]6.2425,[289]6.2335,[290]6.2178,[291]6.2024,[292]6.1871,[293]6.1732,[294]6.1753,[295]6.1749,[296]6.1789,[297]6.1773,[298]6.1800,[299]6.1770,[300]6.1657,[301]6.1659,[302]6.1582,[303]6.1501,[304]6.1420,[305]6.1395,[30
6]6.1268,[307]6.1291,[308]6.1326,[309]6.1167,[310]6.1106,[311]6.1045,[312]6.1074,[313]6.1017,[314]6.1001,[315]6.0837,[316]6.0788,[317]6.0625,[318]6.0413,[319]6.0535,[320]6.0661,[321]6.0703,[322]6.0661,[323]6.0592,[324]6.0566,[325]6.0669,[326]6.0667,[327]6.0684,[328]6.0720,[329]6.0782,[330]6.0808,[331]6.0932,[332]6.0902,[333]6.0973,[334]6.0916,[335]6.0847,[336]6.0880,[337]6.0852,[338]6.0850,[339]6.0796,[340]6.0752,[341]6.0831,[342]6.0853,[343]6.0901,[344]6.0899,[345]6.0898,[346]6.0868,[347]6.0912,[348]6.0944,[349]6.0964,[350]6.0928,[351]6.0934,[352]6.0935,[353]6.0875,[354]6.0880,[355]6.0933,[356]6.0961,[357]6.0928,[358]6.1020,[359]6.1049,[360]6.1012,[361]6.1008,[362]6.1077,[363]6.1193,[364]6.1256,[365]6.1313,[366]6.1324,[367]6.1415,[368]6.1389,[369]6.1394,[370]6.1407,[371]6.1349,[372]6.1398,[373]6.1451,[374]6.1436,[375]6.1435,[376]6.1506,[377]6.1458,[378]6.1484,[379]6.1541,[380]6.1461,[381]6.1422,[382]6.1368,[383]6.1359,[384]6.1353,[385]6.1347,[386]6.1344,[387]6.1338,[388]6.1298,[389]6.1246,[390]6.1177,[391]6.1100,[392]6.1058,[393]6.1042,[394]6.1067,[395]6.1053,[396]6.0976,[397]6.1055,[398]6.1095,[399]6.1178,[400]6.1175,[401]6.1192,[402]6.1200,[403]6.1221,[404]6.1287,[405]6.1186,[406]6.1151,[407]6.1145,[408]6.1158,[409]6.1277,[410]6.1385,[411]6.1499,[412]6.1658,[413]6.1777,[414]6.1852,[415]6.1903,[416]6.1981,[417]6.2105,[418]6.2143,[419]6.2215,[420]6.2303,[421]6.2420,[422]6.2469,[423]6.2538,[424]6.2655,[425]6.2744,[426]6.2811,[427]6.2856,[428]6.2940,[429]6.2990,[430]6.3074,[431]6.3216,[432]6.3258,[433]6.3247,[434]6.3202,[435]6.3210,[436]6.3232,[437]6.3328,[438]6.3403,[439]6.3371,[440]6.3367,[441]6.3315,[442]6.3301,[443]6.3314,[444]6.3317,[445]6.3299,[446]6.3325,[447]6.3355,[448]6.3401,[449]6.3376,[450]6.3389,[451]6.3346,[452]6.3218,[453]6.3130,[454]6.3073,[455]6.3084,[456]6.3130,[457]6.3151,[458]6.3129,[459]6.3132,[460]6.3217,[461]6.3188,[462]6.3170,[463]6.3219,[464]6.3209,[465]6.3177,[466]6.3098,[467]6.3096,[468]6.3093,[469]6.3113,[470]6.3116,[471]6.3067,[472]6.3116,[473]6.3060,[474]6.3068,[475]6.3005,[476]6.3025,[477]6.2953,[478]6.2940,[479]6.3000,[480]6.3046,[481]6.3063,[482]6.3018,[483]6.2976,[484]6.2999,[485]6.2983,[486]6.2928,[487]6.2928,[488]6.2905,[489]6.2857,[490]6.2834,[491]6.2805,[492]6.2745,[493]6.2715,[494]6.2699,[495]6.2703,[496]6.2667,[497]6.2611,[498]6.2592,[499]6.2545,[500]6.2448,[501]6.2381,[502]6.2383,[503]6.2377,[504]6.2288,[505]6.2314,[506]6.2324,[507]6.2268,[508]6.2228,[509]6.2220,[510]6.2257,[511]6.2306,[512]6.2338,[513]6.2358,[514]6.2422,[515]6.2366,[516]6.2357,[517]6.2366,[518]6.2366,[519]6.2397,[520]6.2422,[521]6.2438,[522]6.2468,[523]6.2477,[524]6.2532,[525]6.2568,[526]6.2580,[527]6.2598,[528]6.2548,[529]6.2550,[530]6.2503,[531]6.2492,[532]6.2542,[533]6.2564,[534]6.2548,[535]6.2572,[536]6.2516,[537]6.2493,[538]6.2539,[539]6.2550,[540]6.2588,[541]6.2590,[542]6.2601,[543]6.2615,[544]6.2627,[545]6.2603,[546]6.2611,[547]6.2567,[548]6.2518,[549]6.2515,[550]6.2485,[551]6.2450,[552]6.2428,[553]6.2389,[554]6.2365,[555]6.2336,[556]6.2331,[557]6.2354,[558]6.2316,[559]6.2310,[560]6.2308,[561]6.2307,[562]6.2286,[563]6.2286,[564]6.2330,[565]6.2351,[566]6.2348,[567]6.2328,[568]6.2333,[569]6.2317,[570]6.2343,[571]6.2348,[572]6.2358,[573]6.2360,[574]6.2327,[575]6.2322,[576]6.2321,[577]6.2308,[578]6.2287,[579]6.2293,[580]6.2224,[581]6.2186,[582]6.2174,[583]6.2183,[584]6.2185,[585]6.2112,[586]6.2044,[587]6.2047,[588]6.2096,[589]6.2150,[590]6.2178,[591]6.2199,[592]6.2185,[593]6.2150,[594]6.2158,[595]6.2136,[596]6.2170,[597]6.2149,[598]6.2118,[599]6.2139,[600]6.2133,[601]6.2118,[602]6
.2134,[603]6.2166,[604]6.2175,[605]6.2209,[606]6.2229,[607]6.2211,[608]6.2179,[609]6.2185,[610]6.2220,[611]6.2202,[612]6.2229,[613]6.2191,[614]6.2138,[615]6.2065,[616]6.2093,[617]6.2031,[618]6.1980,[619]6.1923,[620]6.1781,[621]6.1710,[622]6.1694,[623]6.1710,[624]6.1715,[625]6.1717,[626]6.1704,[627]6.1724,[628]6.1725,[629]6.1719,[630]6.1751,[631]6.1808,[632]6.1864,[633]6.1847,[634]6.1880,[635]6.1888,[636]6.1857,[637]6.1824,[638]6.1851,[639]6.1822,[640]6.1831,[641]6.1834,[642]6.1899,[643]6.1920,[644]6.1931,[645]6.1912,[646]6.1953,[647]6.1914,[648]6.1925,[649]6.1926,[650]6.1967,[651]6.2023,[652]6.2032,[653]6.2071,[654]6.2008,[655]6.2002,

llama_print_timings:        load time =  8085.81 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 3414572.54 ms / 335360 tokens (   10.18 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 3449029.12 ms
7B Q4_3, Docker: [655]6.0619,
docker run -it --rm -v$PWD/models:/models --device /dev/dri --device /dev/kfd llama.cpp:rocm perplexity -m /models/llama-7b-q4_3.bin --no-mmap -f /models/wiki.test.raw

main: seed = 1682356946
llama.cpp: loading model from /models/llama-7b-q4_3.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 6 (mostly Q4_3)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4936267.11 KB
llama_model_load_internal: mem required  = 6612.57 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 8 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
5.95 seconds per pass - ETA 1 hours 4 minutes
[1]4.3494,[2]4.7745,[3]5.6675,[4]6.2874,[5]6.4224,[6]6.3707,[7]6.5479,[8]6.6453,[9]6.9852,[10]7.2514,[11]7.4535,[12]7.4779,[13]7.3964,[14]7.4636,[15]7.7126,[16]7.3279,[17]7.2090,[18]7.1599,[19]6.8048,[20]6.7914,[21]6.6994,[22]6.5295,[23]6.5008,[24]6.4058,[25]6.4156,[26]6.2545,[27]6.0814,[28]5.9817,[29]5.8923,[30]5.7342,[31]5.7033,[32]5.7206,[33]5.6627,[34]5.6956,[35]5.7173,[36]5.7587,[37]5.7623,[38]5.7694,[39]5.8020,[40]5.8530,[41]5.8653,[42]5.9052,[43]5.8678,[44]5.9249,[45]5.9253,[46]5.8998,[47]5.9191,[48]5.8935,[49]5.8922,[50]5.8505,[51]5.8451,[52]5.8341,[53]5.8787,[54]5.8591,[55]5.8357,[56]5.8629,[57]5.8831,[58]5.9029,[59]5.9213,[60]5.9619,[61]5.9534,[62]6.0125,[63]6.0403,[64]6.0536,[65]6.0964,[66]6.1041,[67]6.1232,[68]6.1375,[69]6.1625,[70]6.1932,[71]6.2158,[72]6.2466,[73]6.3047,[74]6.3090,[75]6.3245,[76]6.3370,[77]6.3496,[78]6.3362,[79]6.3624,[80]6.3552,[81]6.3677,[82]6.3722,[83]6.3214,[84]6.3041,[85]6.2914,[86]6.2693,[87]6.2097,[88]6.1854,[89]6.1655,[90]6.1488,[91]6.1725,[92]6.1670,[93]6.1680,[94]6.1655,[95]6.1930,[96]6.1916,[97]6.1878,[98]6.1812,[99]6.1680,[100]6.1683,[101]6.1917,[102]6.1871,[103]6.2064,[104]6.2133,[105]6.2121,[106]6.2295,[107]6.2300,[108]6.2432,[109]6.2365,[110]6.2303,[111]6.2516,[112]6.2716,[113]6.2740,[114]6.2700,[115]6.2758,[116]6.2669,[117]6.2726,[118]6.2999,[119]6.3217,[120]6.3564,[121]6.3711,[122]6.3946,[123]6.4319,[124]6.4494,[125]6.4394,[126]6.4789,[127]6.5147,[128]6.5449,[129]6.5290,[130]6.5368,[131]6.5312,[132]6.5237,[133]6.5113,[134]6.5209,[135]6.5165,[136]6.5048,[137]6.4964,[138]6.4786,[139]6.4684,[140]6.4647,[141]6.4374,[142]6.4326,[143]6.4037,[144]6.3828,[145]6.3746,[146]6.3634,[147]6.3664,[148]6.3664,[149]6.3613,[150]6.3575,[151]6.3596,[152]6.3501,[153]6.3342,[154]6.3260,[155]6.3326,[156]6.3281,[157]6.3447,[158]6.3487,[159]6.3539,[160]6.3562,[161]6.3678,[162]6.3399,[163]6.3276,[164]6.3041,[165]6.2736,[166]6.2468,[167]6.2097,[168]6.1796,[169]6.1652,[170]6.1539,[171]6.1269,[172]6.1090,[173]6.0932,[174]6.0639,[175]6.0424,[176]6.0305,[177]6.0112,[178]5.9887,[179]5.9718,[180]5.9625,[181]5.9414,[182]5.9235,[183]5.9097,[184]5.9083,[185]5.9006,[186]5.9010,[187]5.9075,[188]5.9036,[189]5.9212,[190]5.9223,[191]5.9434,[192]5.9589,[193]5.9754,[194]5.9867,[195]6.0079,[196]6.0237,[197]6.0440,[198]6.0586,[199]6.0620,[200]6.0665,[201]6.0616,[202]6.0801,[203]6.0874,[204]6.0859,[205]6.0967,[206]6.1035,[207]6.0994,[208]6.1080,[209]6.1120,[210]6.1170,[211]6.1269,[212]6.1338,[213]6.1440,[214]6.1465,[215]6.1486,[216]6.1633,[217]6.1805,[218]6.1938,[219]6.1935,[220]6.1896,[221]6.1843,[222]6.1824,[223]6.1736,[224]6.1670,[225]6.1634,[226]6.1834,[227]6.1920,[228]6.1971,[229]6.2033,[230]6.2003,[231]6.2162,[232]6.2048,[233]6.1881,[234]6.1737,[235]6.1548,[236]6.1483,[237]6.1384,[238]6.1405,[239]6.1260,[240]6.1157,[241]6.1175,[242]6.1210,[243]6.1193,[244]6.1085,[245]6.1052,[246]6.0943,[247]6.0828,[248]6.0761,[249]6.0736,[250]6.0782,[251]6.0713,[252]6.0678,[253]6.0581,[254]6.0525,[255]6.0405,[256]6.0226,[257]6.0107,[258]6.0026,[259]6.0003,[260]5.9921,[261]5.9881,[262]5.9824,[263]5.9771,[264]5.9585,[265]5.9581,[266]5.9564,[267]5.9498,[268]5.9584,[269]5.9570,[270]5.9575,[271]5.9655,[272]5.9693,[273]5.9691,[274]5.9714,[275]5.9798,[276]5.9857,[277]6.0012,[278]6.0114,[279]6.0208,[280]6.0234,[281]6.0337,[282]6.0395,[283]6.0545,[284]6.0628,[285]6.0712,[286]6.0841,[287]6.0842,[288]6.0898,[289]6.0816,[290]6.0666,[291]6.0517,[292]6.0368,[293]6.0239,[294]6.0261,[295]6.0251,[296]6.0296,[297]6.0280,[298]6.0312,[299]6.0286,[300]6.0177,[301]6.0176,[302]6.0096,[303]6.0007,[304]5.9919,[305]5.9884,[30
6]5.9764,[307]5.9785,[308]5.9813,[309]5.9657,[310]5.9603,[311]5.9538,[312]5.9560,[313]5.9502,[314]5.9487,[315]5.9330,[316]5.9279,[317]5.9122,[318]5.8926,[319]5.9047,[320]5.9170,[321]5.9211,[322]5.9171,[323]5.9105,[324]5.9077,[325]5.9179,[326]5.9179,[327]5.9202,[328]5.9239,[329]5.9299,[330]5.9332,[331]5.9456,[332]5.9430,[333]5.9500,[334]5.9448,[335]5.9389,[336]5.9427,[337]5.9405,[338]5.9398,[339]5.9350,[340]5.9309,[341]5.9389,[342]5.9418,[343]5.9461,[344]5.9466,[345]5.9471,[346]5.9449,[347]5.9488,[348]5.9522,[349]5.9546,[350]5.9512,[351]5.9519,[352]5.9524,[353]5.9464,[354]5.9477,[355]5.9528,[356]5.9562,[357]5.9527,[358]5.9621,[359]5.9646,[360]5.9614,[361]5.9612,[362]5.9680,[363]5.9789,[364]5.9852,[365]5.9901,[366]5.9914,[367]5.9997,[368]5.9971,[369]5.9980,[370]5.9997,[371]5.9944,[372]5.9992,[373]6.0039,[374]6.0024,[375]6.0024,[376]6.0089,[377]6.0041,[378]6.0069,[379]6.0130,[380]6.0056,[381]6.0024,[382]5.9974,[383]5.9965,[384]5.9962,[385]5.9950,[386]5.9946,[387]5.9944,[388]5.9910,[389]5.9861,[390]5.9792,[391]5.9717,[392]5.9678,[393]5.9661,[394]5.9690,[395]5.9677,[396]5.9601,[397]5.9669,[398]5.9713,[399]5.9791,[400]5.9791,[401]5.9804,[402]5.9813,[403]5.9832,[404]5.9894,[405]5.9803,[406]5.9772,[407]5.9766,[408]5.9783,[409]5.9897,[410]6.0010,[411]6.0123,[412]6.0280,[413]6.0389,[414]6.0467,[415]6.0521,[416]6.0601,[417]6.0720,[418]6.0755,[419]6.0825,[420]6.0912,[421]6.1028,[422]6.1064,[423]6.1134,[424]6.1238,[425]6.1330,[426]6.1393,[427]6.1438,[428]6.1518,[429]6.1570,[430]6.1652,[431]6.1790,[432]6.1826,[433]6.1817,[434]6.1775,[435]6.1784,[436]6.1808,[437]6.1905,[438]6.1980,[439]6.1948,[440]6.1937,[441]6.1888,[442]6.1876,[443]6.1888,[444]6.1896,[445]6.1875,[446]6.1900,[447]6.1929,[448]6.1967,[449]6.1942,[450]6.1950,[451]6.1909,[452]6.1783,[453]6.1702,[454]6.1647,[455]6.1653,[456]6.1701,[457]6.1718,[458]6.1699,[459]6.1706,[460]6.1790,[461]6.1765,[462]6.1752,[463]6.1787,[464]6.1776,[465]6.1750,[466]6.1674,[467]6.1680,[468]6.1677,[469]6.1699,[470]6.1703,[471]6.1656,[472]6.1701,[473]6.1647,[474]6.1659,[475]6.1598,[476]6.1614,[477]6.1545,[478]6.1536,[479]6.1597,[480]6.1641,[481]6.1658,[482]6.1614,[483]6.1573,[484]6.1591,[485]6.1573,[486]6.1517,[487]6.1515,[488]6.1493,[489]6.1445,[490]6.1422,[491]6.1395,[492]6.1340,[493]6.1311,[494]6.1292,[495]6.1289,[496]6.1252,[497]6.1198,[498]6.1182,[499]6.1138,[500]6.1045,[501]6.0981,[502]6.0982,[503]6.0975,[504]6.0887,[505]6.0905,[506]6.0915,[507]6.0862,[508]6.0823,[509]6.0817,[510]6.0850,[511]6.0897,[512]6.0931,[513]6.0953,[514]6.1016,[515]6.0961,[516]6.0952,[517]6.0962,[518]6.0956,[519]6.0986,[520]6.1009,[521]6.1022,[522]6.1050,[523]6.1057,[524]6.1114,[525]6.1145,[526]6.1156,[527]6.1172,[528]6.1122,[529]6.1131,[530]6.1078,[531]6.1064,[532]6.1112,[533]6.1135,[534]6.1118,[535]6.1138,[536]6.1085,[537]6.1063,[538]6.1114,[539]6.1124,[540]6.1160,[541]6.1162,[542]6.1174,[543]6.1189,[544]6.1198,[545]6.1179,[546]6.1189,[547]6.1149,[548]6.1098,[549]6.1100,[550]6.1070,[551]6.1036,[552]6.1014,[553]6.0976,[554]6.0953,[555]6.0922,[556]6.0915,[557]6.0940,[558]6.0903,[559]6.0901,[560]6.0899,[561]6.0902,[562]6.0881,[563]6.0879,[564]6.0922,[565]6.0943,[566]6.0943,[567]6.0921,[568]6.0930,[569]6.0915,[570]6.0942,[571]6.0944,[572]6.0951,[573]6.0949,[574]6.0913,[575]6.0909,[576]6.0908,[577]6.0892,[578]6.0872,[579]6.0877,[580]6.0813,[581]6.0774,[582]6.0766,[583]6.0774,[584]6.0776,[585]6.0700,[586]6.0632,[587]6.0638,[588]6.0684,[589]6.0739,[590]6.0767,[591]6.0790,[592]6.0777,[593]6.0747,[594]6.0756,[595]6.0732,[596]6.0766,[597]6.0745,[598]6.0715,[599]6.0736,[600]6.0729,[601]6.0715,[602]6
.0729,[603]6.0756,[604]6.0764,[605]6.0799,[606]6.0823,[607]6.0807,[608]6.0775,[609]6.0783,[610]6.0818,[611]6.0802,[612]6.0826,[613]6.0789,[614]6.0741,[615]6.0668,[616]6.0694,[617]6.0634,[618]6.0587,[619]6.0531,[620]6.0395,[621]6.0328,[622]6.0311,[623]6.0325,[624]6.0329,[625]6.0328,[626]6.0317,[627]6.0341,[628]6.0343,[629]6.0341,[630]6.0374,[631]6.0430,[632]6.0488,[633]6.0473,[634]6.0507,[635]6.0514,[636]6.0479,[637]6.0444,[638]6.0470,[639]6.0439,[640]6.0448,[641]6.0450,[642]6.0516,[643]6.0538,[644]6.0549,[645]6.0530,[646]6.0572,[647]6.0530,[648]6.0541,[649]6.0544,[650]6.0582,[651]6.0636,[652]6.0648,[653]6.0686,[654]6.0624,[655]6.0619,

llama_print_timings:        load time =  9035.02 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 3789898.46 ms / 335360 tokens (   11.30 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 3830652.16 ms

@grigio

grigio commented Aug 29, 2023

Cool, can you do it with a 70B q4 model?

@JohannesGaessler
Collaborator

No. I don't have a 70B q4 model ready, and there wouldn't be a point anyway, since with 16 GB VRAM I would just be benchmarking the speed of the CPU.

akawrykow pushed a commit to akawrykow/llama.cpp that referenced this pull request Aug 29, 2023
* use hipblas based on cublas
* Update Makefile for the Cuda kernels
* Expand arch list and make it overrideable
* Fix multi GPU on multiple amd architectures with rocblas_initialize() (ggerganov#5)
* add hipBLAS to README
* new build arg LLAMA_CUDA_MMQ_Y
* fix half2 decomposition
* Add intrinsics polyfills for AMD
* AMD assembly optimized __dp4a
* Allow overriding CC_TURING
* use "ROCm" instead of "CUDA"
* ignore all build dirs
* Add Dockerfiles
* fix llama-bench
* fix -nommq help for non CUDA/HIP

---------

Co-authored-by: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Co-authored-by: ardfork <134447697+ardfork@users.noreply.github.com>
Co-authored-by: funnbot <22226942+funnbot@users.noreply.github.com>
Co-authored-by: Engininja2 <139037756+Engininja2@users.noreply.github.com>
Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
Co-authored-by: jammm <2500920+jammm@users.noreply.github.com>
Co-authored-by: jdecourval <7315817+jdecourval@users.noreply.github.com>
@JohannesGaessler
Collaborator

I implemented mul_mat_q tunings for RDNA 2 (using my RX 6800): #2910 . Please check whether they are better/worse on other AMD GPUs.

@YellowRoseCx
Contributor

I implemented mul_mat_q tunings for RDNA 2 (using my RX 6800): #2910 . Please check whether they are better/worse on other AMD GPUs.

What tuning value would you recommend? Do you want us to check via regular use or by a perplexity test?

@JohannesGaessler
Collaborator

The RDNA 2 tunings are currently being applied to all AMD GPUs. Just checking whether the PR is slower or faster than master is enough.
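For reference, a minimal sketch of one way to run that comparison with llama-bench; the branch name and model path are placeholders, and it assumes origin points at the upstream repo:

# fetch the PR branch (GitHub exposes pull requests under refs/pull/<id>/head)
git fetch origin pull/2910/head:mmq-rdna2-tunings

# build and benchmark master
git checkout master
make clean && make -j LLAMA_HIPBLAS=1
./llama-bench -m /models/llama-2-13b.Q4_K_M.gguf -ngl 43

# build and benchmark the PR branch, then compare the two tables
git checkout mmq-rdna2-tunings
make clean && make -j LLAMA_HIPBLAS=1
./llama-bench -m /models/llama-2-13b.Q4_K_M.gguf -ngl 43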

@ardfork
Contributor

ardfork commented Aug 30, 2023

While testing #2910, I ran some newer benchmarks on q4_K_M (on a 6700 XT):

| model | size | params | backend | ngl | mmq | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LLaMA v2 13B mostly Q4_K - Medium | 7.33 GiB | 13.02 B | ROCm #2910 | 43 | 1 | pp 512 | 433.72 ± 0.45 |
| LLaMA v2 13B mostly Q4_K - Medium | 7.33 GiB | 13.02 B | ROCm #2910 | 43 | 1 | tg 128 | 29.94 ± 0.11 |
| LLaMA v2 13B mostly Q4_K - Medium | 7.33 GiB | 13.02 B | ROCm | 43 | 1 | pp 512 | 369.65 ± 0.76 |
| LLaMA v2 13B mostly Q4_K - Medium | 7.33 GiB | 13.02 B | ROCm | 43 | 1 | tg 128 | 29.65 ± 0.03 |
| LLaMA v2 13B mostly Q4_K - Medium | 7.33 GiB | 13.02 B | ROCm | 43 | 0 | pp 512 | 302.21 ± 0.97 |
| LLaMA v2 13B mostly Q4_K - Medium | 7.33 GiB | 13.02 B | ROCm | 43 | 0 | tg 128 | 29.92 ± 0.11 |
| LLaMA v2 13B mostly Q4_K - Medium | 7.33 GiB | 13.02 B | OpenCL | 43 | 0 | pp 512 | 100.31 ± 6.65 |
| LLaMA v2 13B mostly Q4_K - Medium | 7.33 GiB | 13.02 B | OpenCL | 43 | 0 | tg 128 | 23.47 ± 0.02 |

Sadly, I forgot to measure VRAM usage.
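For anyone who wants to capture that next time, a rough sketch of one way to watch VRAM usage on Linux during a benchmark run, assuming the ROCm tools are installed (rocm-smi ships with ROCm):

# poll VRAM usage once per second in a second terminal while llama-bench runs
watch -n 1 rocm-smi --showmeminfo vram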

@harish0201

Would something like https://github.com/microsoft/antares help with easier optimization of hipBLAS builds, or with working around the HSA prefixes for Windows eventually?

I'm sorry if this is the wrong forum!

@jammm
Contributor

jammm commented Aug 31, 2023

@ggerganov @SlyEcho I was able to compile the ROCm version successfully on Windows using the HIP SDK.
It compiles within seconds via Ninja and VS2019, using the VS2019 x64 command prompt running as administrator:
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLAMA_HIPBLAS=ON -Dhipblas_DIR=<hipBLAS cmake dir> -Drocblas_DIR=<rocBLAS cmake dir> ..
cmake --build .
And CC and CXX were set to clang.exe and clang++.exe respectively from the bin folder of the HIP SDK.

Ran it successfully on a 7900 XTX. Not sure of the speed, though. How do I check that?

The command I used:
./main -ngl 32 -m ../../models/vicuna-7b-1.1.ggmlv3.q2_K.gguf --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: Write a story about llamas\n### Response:"

@jammm
Contributor

jammm commented Aug 31, 2023

@SlyEcho will there be hipBLAS builds for Windows uploaded in the packages now?

@ghost

ghost commented Aug 31, 2023

Unfortunately, I still can't get it to work on Windows. Compiling is not the problem; that worked. But it either fails to start or crashes after starting the server, as mentioned above. It seems I'll have to live with that, since my 6650 XT has no official support on Windows yet. I just don't understand why it works under Linux with the gfx1030 override, but not on Windows.

@ghost

ghost commented Aug 31, 2023

While testing #2910, I ran some newer benchmarks on q4_K_M (on a 6700 XT):

| model | size | params | backend | ngl | mmq | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LLaMA v2 13B mostly Q4_K - Medium | 7.33 GiB | 13.02 B | ROCm #2910 | 43 | 1 | pp 512 | 433.72 ± 0.45 |
| LLaMA v2 13B mostly Q4_K - Medium | 7.33 GiB | 13.02 B | ROCm #2910 | 43 | 1 | tg 128 | 29.94 ± 0.11 |
| LLaMA v2 13B mostly Q4_K - Medium | 7.33 GiB | 13.02 B | ROCm | 43 | 1 | pp 512 | 369.65 ± 0.76 |
| LLaMA v2 13B mostly Q4_K - Medium | 7.33 GiB | 13.02 B | ROCm | 43 | 1 | tg 128 | 29.65 ± 0.03 |
| LLaMA v2 13B mostly Q4_K - Medium | 7.33 GiB | 13.02 B | ROCm | 43 | 0 | pp 512 | 302.21 ± 0.97 |
| LLaMA v2 13B mostly Q4_K - Medium | 7.33 GiB | 13.02 B | ROCm | 43 | 0 | tg 128 | 29.92 ± 0.11 |
| LLaMA v2 13B mostly Q4_K - Medium | 7.33 GiB | 13.02 B | OpenCL | 43 | 0 | pp 512 | 100.31 ± 6.65 |
| LLaMA v2 13B mostly Q4_K - Medium | 7.33 GiB | 13.02 B | OpenCL | 43 | 0 | tg 128 | 23.47 ± 0.02 |
Sadly, I forgot to measure VRAM usage.

I would like to test it with my card on Linux. How can I measure it like this? I'm new to this topic.

@JohannesGaessler
Collaborator

Use the llama-bench binary. By default it will output a table that will be correctly formatted on GitHub.
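A minimal sketch of an invocation (the model path and layer count are placeholders; pp 512 and tg 128 are the defaults, and the resulting markdown table can be pasted straight into a comment):

./llama-bench -m /models/llama-2-13b.Q4_K_M.gguf -ngl 43 -p 512 -n 128

# if your build has the -mmq switch, you can compare with and without mul_mat_q in one run
./llama-bench -m /models/llama-2-13b.Q4_K_M.gguf -ngl 43 -mmq 0,1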

@jllllll

jllllll commented Aug 31, 2023

Updated CI example building llama-cpp-python for both Windows and Linux:
https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/blob/main/old_workflows/build-wheel-rocm.yml

The code for building libs should still be relevant if only building llama.cpp.

It is curious that the 6650 XT doesn't work on Windows given that GPU is explicitly listed as supported in the runtime:
https://rocm.docs.amd.com/en/docs-5.5.1/release/windows_support.html
Even though the SDK can't directly build for gfx1032, the copy procedure described before should still work as the gfx1032 ISA is supposed to be identical to gfx1030: https://salsa.debian.org/rocm-team/community/team-project/-/wikis/supported-gpu-list#gfx1032
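For what it's worth, a rough sketch of the kind of copy-and-rename that procedure refers to, written for a Git Bash or MSYS shell on Windows; the install path and the per-architecture file naming vary between HIP SDK versions, so both are assumptions to verify against what is actually in the library folder:

# duplicate every gfx1030 Tensile file under a gfx1032 name so rocBLAS finds a match
cd "/c/Program Files/AMD/ROCm/5.5/bin/rocblas/library"
for f in *gfx1030*; do cp -v "$f" "${f//gfx1030/gfx1032}"; done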

@Engininja2
Contributor

The issue is that rocBLAS on Windows ships precompiled, with Tensile libraries for gfx906, gfx1030, gfx1100, gfx1101, and gfx1102. There's no HSA_OVERRIDE_GFX_VERSION because it runs on top of PAL instead of HSA. PAL might have an equivalent, judging from its repo.
There's a settings list that includes spoofNullGpuIfh for "GPU ID Masquerade", and there's also an OverrideGpuId function. On the other hand, a CMake file for rocclr has set(PAL_BUILD_NULL_DEVICE OFF), which might be related.

So there might be a registry setting that could work, but it may require recompiling part of the HIP SDK anyway.

@SlyEcho
Sponsor Collaborator Author

SlyEcho commented Aug 31, 2023

There may come a time when rocBLAS is not needed; then it would work.

@JohannesGaessler
Collaborator

Speaking of which, one of my next goals is to try and quantize the KV cache to q8_1. It will probably take some time but if that is done (and works) you could compile completely without cuBLAS/rocBLAS.

@jammm
Contributor

jammm commented Sep 1, 2023

Speaking of which, one of my next goals is to try and quantize the KV cache to q8_1. It will probably take some time but if that is done (and works) you could compile completely without cuBLAS/rocBLAS.

Is there a timeline for this? I'd like to know how many users here are using navi22 and navi23. If it's worthwhile to push for hipBLAS to support them in its precompiled form until the hipBLAS dependency is removed, I can at least request it internally, though no guarantees.

So navi22 and navi23 users, feel free to react with the rocket emoji. Also, if you're an APU user on phoenix, use the hooray emoji.

@JohannesGaessler
Collaborator

I can't give a serious ETA because there are too many uncertainties. It will be done when it's done.

@KerfuffleV2
Collaborator

I looked at doing this in the past (for other reasons, like making prompt caches smaller or reducing VRAM usage). It seems it will require making a number of operations that currently only work on 32-bit tensors also support quantized ones. Another nice side benefit may be making it easier to support other models that could benefit from using those ops on quantized tensors.

@jammm
Contributor

jammm commented Sep 1, 2023

The following was run on Windows using the HIP SDK:

Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama-2-7b.ggmlv3.q4_0.bin 7B mostly Q4_0 (guessed) | 3.53 GiB | 6.74 B | ROCm | 99 | pp 512 | 1637.00 ± 28.37 |
| llama-2-7b.ggmlv3.q4_0.bin 7B mostly Q4_0 (guessed) | 3.53 GiB | 6.74 B | ROCm | 99 | tg 128 | 100.04 ± 0.59 |

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama-2-13b.ggmlv3.q4_0.bin 13B mostly Q4_0 (guessed) | 6.82 GiB | 13.02 B | ROCm | 99 | pp 512 | 896.31 ± 5.70 |
| llama-2-13b.ggmlv3.q4_0.bin 13B mostly Q4_0 (guessed) | 6.82 GiB | 13.02 B | ROCm | 99 | tg 128 | 66.26 ± 0.03 |

Sam2much96 pushed a commit to Sam2much96/llama.cpp that referenced this pull request Sep 11, 2023
* use hipblas based on cublas
* Update Makefile for the Cuda kernels
* Expand arch list and make it overrideable
* Fix multi GPU on multiple amd architectures with rocblas_initialize() (ggerganov#5)
* add hipBLAS to README
* new build arg LLAMA_CUDA_MMQ_Y
* fix half2 decomposition
* Add intrinsics polyfills for AMD
* AMD assembly optimized __dp4a
* Allow overriding CC_TURING
* use "ROCm" instead of "CUDA"
* ignore all build dirs
* Add Dockerfiles
* fix llama-bench
* fix -nommq help for non CUDA/HIP

---------

Co-authored-by: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Co-authored-by: ardfork <134447697+ardfork@users.noreply.github.com>
Co-authored-by: funnbot <22226942+funnbot@users.noreply.github.com>
Co-authored-by: Engininja2 <139037756+Engininja2@users.noreply.github.com>
Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
Co-authored-by: jammm <2500920+jammm@users.noreply.github.com>
Co-authored-by: jdecourval <7315817+jdecourval@users.noreply.github.com>
@YellowRoseCx
Contributor

YellowRoseCx commented Apr 14, 2024

Is anyone having issues compiling the hipBLAS backend with the CMakeLists.txt file on Windows after ggml-cuda was broken up into separate files in its own folder?


Successfully merging this pull request may close these issues.

Is there any support for AMD GPU (ROCM)