
New conversion script #545

Merged: 1 commit merged into ggerganov:master on Apr 14, 2023

Conversation

@comex (Contributor) commented Mar 27, 2023

Current status: Working, except for the latest GPTQ-for-LLaMa format that includes g_idx. This turns out to require changes to GGML, so for now it only works if you use the --outtype option to dequantize it back to f16 (which is pointless except for debugging).

I also included some cleanup for the C++ code.

This script is meant to replace all the existing conversion scripts (including the ones that convert from older GGML formats), while also adding support for some new formats. Specifically, I've tested with:

  • LLaMA (original)
  • llama-65b-4bit
  • alpaca-native
  • alpaca-native-4bit
  • LLaMA converted to 'transformers' format using convert_llama_weights_to_hf.py
  • alpaca-native quantized with --true-sequential --act-order --groupsize 128 (dequantized only)
  • same as above plus --save_safetensors
  • GPT4All
  • stock unversioned ggml
  • ggmh
  • alpaca-30b-4bit.pt
  • alpaca-30b-4bit.safetensors
  • alpaca-30b-4bit-128g.safetensors
  • koala-13B-HF
  • koala-13B-4bit-128g.safetensors (dequantized only)
  • koala-13B-4bit-128g.pt

There's enough overlap in the logic needed to handle these different cases that it seemed best to move to a single script.

I haven't tried this with Alpaca-LoRA because I don't know where to find it.

Useful features:

  • Uses multiple threads for a speedup in some cases (though the Python GIL limits the gain, and sometimes it's disk-bound anyway); a rough sketch of the bounded-threading idea follows this list.

  • Combines split models into a single file (both the intra-tensor split of the original and the inter-tensor split of 'transformers' format files). Single files are more convenient to work with and more friendly to future changes to use memory mapping on the C++ side. To accomplish this without increasing memory requirements, it has some custom loading code which avoids loading whole input files into memory at once.

  • Because of the custom loading code, it no longer depends on PyTorch, which might make installing dependencies slightly easier or faster... although it still depends on NumPy and sentencepiece, so I don't know if there's any meaningful difference. In any case, I also added a requirements.txt file to lock the dependency versions in case of any future breaking changes.

  • Type annotations checked with mypy.

  • Some attempts to be extra user-friendly:

    • The script tries to be forgiving with arguments, e.g. you can specify either the model file itself or the directory containing it.

    • The script doesn't depend on config.json / params.json, just in case the user downloaded files individually and doesn't have those handy. But you still need tokenizer.model and, for Alpaca, added_tokens.json.

    • The script tries to give a helpful error message if added_tokens.json is missing.
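
For illustration, here is a minimal sketch of the bounded-threading idea (the script's own bounded_parallel_map shows up in the tracebacks later in this thread; its real signature and details may differ):

    import concurrent.futures
    import itertools
    from typing import Callable, Iterable, Iterator, TypeVar

    In_T = TypeVar("In_T")
    Out_T = TypeVar("Out_T")

    def bounded_parallel_map(func: Callable[[In_T], Out_T],
                             iterable: Iterable[In_T],
                             concurrency: int = 8) -> Iterator[Out_T]:
        # Yield func(x) for each x, keeping at most `concurrency` tasks in flight,
        # so results stream out without materializing every input/output at once.
        it = iter(iterable)
        with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as executor:
            futures = [executor.submit(func, x) for x in itertools.islice(it, concurrency)]
            while futures:
                yield futures.pop(0).result()        # results come back in input order
                for x in itertools.islice(it, 1):    # refill the window one item at a time
                    futures.append(executor.submit(func, x))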

@anzz1 (Contributor) commented Mar 27, 2023

Looks fantastic! 🎉

Agree that the conversion scripts should be merged as one.

Have you checked that the sha256 checksums match for files produced with the old and new scripts? So that no bits or bytes are accidentally dropped roadside on the way.

Minor comment: I think naming it something like convert-model-to-ggml.py would be more descriptive, as the name convert.py doesn't really convey its purpose.

@green-s commented Mar 27, 2023

Could you maybe add safetensors support? People are starting to distribute GPTQ weights in that format instead, since it doesn't allow arbitrary code execution. Usually it's just a matter of using safetensors.torch.load_file in place of torch.load, but since you're not using torch.load it might be a bit trickier.

Edit:

Looks like if you use safetensors.safe_open you can load lazily/partially and in numpy format if you specify framework="numpy".
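
For illustration, a minimal sketch of that lazy-loading approach (the file name is a placeholder):

    from safetensors import safe_open

    # Tensors are read on demand as NumPy arrays; nothing is loaded until get_tensor().
    with safe_open("model.safetensors", framework="numpy") as f:
        for name in f.keys():
            arr = f.get_tensor(name)
            print(name, arr.shape, arr.dtype)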

@anzz1 added the "enhancement" and "script" labels on Mar 27, 2023
@Belluxx commented Mar 27, 2023

I converted a 7B 3.77GB 4-bit GPTQ (no groups) model with the script. The converted file, however, is 5.39GB. Is this expected?

It's also very slow compared to the RTN q4 model, because it now swaps to disk due to its size.

@BadisG commented Mar 28, 2023

You should verify whether your script works with the new techniques proposed by @qwopqwop200:
https://github.com/qwopqwop200/GPTQ-for-LLaMa

I don't think it does, as someone reported an error here: #442 (comment)

@BadisG commented Mar 28, 2023

@luxtiasco It's not finished yet; once qwopqwop200 is able to make "act-order" and "groupsize 128" work together, we'll get a really great quantization 😄👍

@BadisG commented Mar 28, 2023

https://www.reddit.com/r/LocalLLaMA/comments/1248183/i_am_currently_quantizing_llama65b_30b_and_13b/

It looks like "act-order" gives smaller models and better output than "groupesize (32 or 128)", making the latter irrelevant when using it alone

@BadisG commented Mar 28, 2023

qwopqwop200/GPTQ-for-LLaMa@4e141a8

The madman did it! Now it's possible to use both groupsize and act-order!

@ggerganov (Owner)

🦙 !

I'm fully OK with this change - I can't comment on the Python code as I don't have experience with it.
cc @jart - this change makes it possible to generate single-file models from the get-go. It might have some relevance for the mmap work, so bringing it to your attention just in case.

@jart (Contributor) commented Mar 28, 2023

Looks very promising! Single-file models would be nice. The main thing I want is for the tensors to be mmap()'able. In order for that to happen, multi-dimensional tensors need to be laid out in the file in such a way that they don't need to be reshaped in order to be loaded. The memory layout on disk should be the same as what ggml wants in memory at runtime. The format should also, ideally, observe 32-byte alignment. Does this change do that? If not, could it?

@plabadens commented Mar 28, 2023

Unfortunately, the conversion script seems to break when applied to models generated using the latest version of qwopqwop200/GPTQ-for-LLaMa@4c15f16. In this case, the alpaca-native model, quantized to 4bit with --act-order, --true-sequential and --groupsize 128.

Output
Loaded 'transformers' model split into 1 parts.
Writing vocab...
[1/291] Writing tensor tok_embeddings.weight, size 32001 x 4096...
[2/291] Writing tensor norm.weight, size 4096...
[3/291] Writing tensor output.weight, size 32001 x 4096...
Traceback (most recent call last):
  File "/home/pierre/Development/llama/llama.cpp/convert.py", line 673, in 
    main()
  File "/home/pierre/Development/llama/llama.cpp/convert.py", line 671, in main
    OutputFile.write_all(outfile, params, model, vocab)
  File "/home/pierre/Development/llama/llama.cpp/convert.py", line 579, in write_all
    for i, ((name, lazy_tensor), ndarray) in enumerate(zip(model.items(), ndarrays)):
  File "/home/pierre/Development/llama/llama.cpp/convert.py", line 508, in bounded_parallel_map
    result = futures.pop(0).result()
  File "/nix/store/iw1vmh509hcbby8dbpsaanbri4zsq7dj-python3-3.10.10/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/nix/store/iw1vmh509hcbby8dbpsaanbri4zsq7dj-python3-3.10.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/nix/store/iw1vmh509hcbby8dbpsaanbri4zsq7dj-python3-3.10.10/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/pierre/Development/llama/llama.cpp/convert.py", line 577, in 
    ndarrays = bounded_parallel_map(lambda lazy_tensor: lazy_tensor.load().ggml_ndarray(), model.values(),
  File "/home/pierre/Development/llama/llama.cpp/convert.py", line 357, in load
    tensor = lazy_tensor.load()
  File "/home/pierre/Development/llama/llama.cpp/convert.py", line 399, in load
    return QuantizedTensor(model, namebase)
  File "/home/pierre/Development/llama/llama.cpp/convert.py", line 187, in __init__
    scales = load_unquantized(model[f"{namebase}.scales"], np.float32)
  File "/home/pierre/Development/llama/llama.cpp/convert.py", line 178, in load_unquantized
    tensor = lazy_tensor.load()
  File "/home/pierre/Development/llama/llama.cpp/convert.py", line 473, in load
    return UnquantizedTensor(storage.load(storage_offset, elm_count).reshape(size))
ValueError: cannot reshape array of size 1 into shape (1,4096)

@slaren mentioned this pull request on Mar 29, 2023
@comex (Contributor, Author) commented Mar 29, 2023

The memory layout on disk should be the same as what ggml wants in memory at runtime. The format should also, ideally, observe 32-byte alignment. Does this change do that? If not, could it?

The memory layout matches, but there is currently no alignment. I was thinking of adding that, but it will require a format change, whereas this PR happens to be compatible with the existing format (since there was already an option to adjust the per-file split), so I decided to leave it out of this one.

Regarding other feedback, I’ll take a look soon.

I’m also thinking about adding support for reading files that are already in GGML format so that they can be upgraded without needing the original. This is despite the fact that I think it’s probably advisable to make the loader backwards-compatible moving forward rather than requiring upgrades. Even with a change to add mmap support, there should be a fallback path that supports existing non-aligned files. But if you want to actually benefit from mmap, you’ll need alignment and thus a format upgrade.

@slaren (Collaborator) commented Mar 29, 2023

Would it be possible to ensure that the tensor data is aligned by padding the tensor names with zeros? That should allow us to do it without changing the file format.
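
A minimal sketch of that zero-padding trick (the header layout here is simplified for illustration and is not the exact ggml header):

    import io
    import struct

    ALIGNMENT = 32  # desired alignment for the tensor data

    def write_tensor_header(f, name: str, shape, dtype_code: int) -> None:
        # Pad the serialized name with zero bytes so that the tensor data written
        # immediately after the header lands on a 32-byte boundary; the loader just
        # reads the stored name length and ignores the trailing zeros.
        name_bytes = name.encode("utf-8")
        fixed = struct.calcsize("iii") + struct.calcsize("i") * len(shape)
        data_start = f.tell() + fixed + len(name_bytes)
        pad = -data_start % ALIGNMENT
        padded_name = name_bytes + b"\x00" * pad
        f.write(struct.pack("iii", len(shape), len(padded_name), dtype_code))
        for dim in shape:
            f.write(struct.pack("i", dim))
        f.write(padded_name)
        # the caller writes the tensor bytes next, now 32-byte aligned

    buf = io.BytesIO()
    write_tensor_header(buf, "layers.0.attention.wq.weight", (4096, 4096), dtype_code=0)
    assert buf.tell() % ALIGNMENT == 0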

@comex (Contributor, Author) commented Mar 29, 2023

I thought of that, but it seemed like an ugly hack for not much benefit. It’s not hard to change the C++ side; it just seemed convenient to make it a separate change to avoid merge conflicts and such. (Edit: Not that I have a particularly strong objection to doing it that way; it just isn’t what I’d choose.)

@slaren (Collaborator) commented Mar 29, 2023

Something that may also help (as suggested by xloem on Discord) would be making sure that the tensors in the model file are in the same order as they are accessed during inference. This should especially help on systems without enough memory to keep the entire model in memory. I think it is already very close to being that way, but it may be worth double-checking.
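
A toy illustration of that ordering idea, using the converted tensor names seen in the logs above (the real script may already emit this order, and the layer-tensor names below are assumed):

    import re

    def access_order(name: str):
        # token embedding first, then each transformer layer in order,
        # then the final norm and output projection
        if name == "tok_embeddings.weight":
            return (0, 0)
        m = re.match(r"layers\.(\d+)\.", name)
        if m:
            return (1, int(m.group(1)))
        return (2, 0)  # norm.weight, output.weight

    # hypothetical list of converted tensor names, deliberately out of order
    tensor_names = ["norm.weight", "layers.1.attention.wq.weight",
                    "tok_embeddings.weight", "layers.0.attention.wq.weight"]
    print(sorted(tensor_names, key=access_order))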

@Belluxx commented Mar 29, 2023

@slaren Is there a Discord server for the project, or for llama in general?

@jart (Contributor) commented Mar 29, 2023

@Belluxx Kind of yes. @slaren and I have been collaborating on Redbean's Discord server, which has an #AI channel. There's no official chatroom for the llama.cpp project yet, however you're all welcome to join us on the Redbean Discord until that happens! https://discord.gg/AqSvHf4u

@linouxis9

Does the new conversion script work better with generic PyTorch models? (Such as https://huggingface.co/THUDM/chatglm-6b)
Thanks :-)

@BadisG commented Apr 1, 2023

I can confirm it doesn't work with the new implementations of the GPTQ quantization.
I tried it with the gpt4-x-alpaca-13b-native-4bit-128g.pt model, which was converted this way:

CUDA_VISIBLE_DEVICES=0 python llama.py ./models/chavinlo-gpt4-x-alpaca --wbits 4 --true-sequential --act-order --groupsize 128 --save gpt-x-alpaca-13b-native-4bit-128g.pt

I got this error:

D:\Large Language Models\CONVERTISSEURS\gptq to ggml>python GPTQ-to-GGML.py gpt4-x-alpaca-13b-native-4bit-128g.pt --vocab-dir TokenDIR
Loaded 'transformers' model split into 1 parts.
Writing vocab...
[1/363] Writing tensor tok_embeddings.weight, size 32001 x 5120...
[2/363] Writing tensor norm.weight, size 5120...
[3/363] Writing tensor output.weight, size 32001 x 5120...
Traceback (most recent call last):
  File "D:\Large Language Models\CONVERTISSEURS\gptq to ggml\GPTQ-to-GGML.py", line 673, in <module>
    main()
  File "D:\Large Language Models\CONVERTISSEURS\gptq to ggml\GPTQ-to-GGML.py", line 671, in main
    OutputFile.write_all(outfile, params, model, vocab)
  File "D:\Large Language Models\CONVERTISSEURS\gptq to ggml\GPTQ-to-GGML.py", line 579, in write_all
    for i, ((name, lazy_tensor), ndarray) in enumerate(zip(model.items(), ndarrays)):
  File "D:\Large Language Models\CONVERTISSEURS\gptq to ggml\GPTQ-to-GGML.py", line 508, in bounded_parallel_map
    result = futures.pop(0).result()
  File "C:\Users\Utilisateur\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\_base.py", line 451, in result
    return self.__get_result()
  File "C:\Users\Utilisateur\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\_base.py", line 403, in __get_result
    raise self._exception
  File "C:\Users\Utilisateur\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "D:\Large Language Models\CONVERTISSEURS\gptq to ggml\GPTQ-to-GGML.py", line 577, in <lambda>
    ndarrays = bounded_parallel_map(lambda lazy_tensor: lazy_tensor.load().ggml_ndarray(), model.values(),
  File "D:\Large Language Models\CONVERTISSEURS\gptq to ggml\GPTQ-to-GGML.py", line 357, in load
    tensor = lazy_tensor.load()
  File "D:\Large Language Models\CONVERTISSEURS\gptq to ggml\GPTQ-to-GGML.py", line 399, in load
    return QuantizedTensor(model, namebase)
  File "D:\Large Language Models\CONVERTISSEURS\gptq to ggml\GPTQ-to-GGML.py", line 187, in __init__
    scales = load_unquantized(model[f"{namebase}.scales"], np.float32)
  File "D:\Large Language Models\CONVERTISSEURS\gptq to ggml\GPTQ-to-GGML.py", line 181, in load_unquantized
    assert tensor.ndarray.dtype == expected_dtype, (tensor.ndarray.dtype, expected_dtype)
AssertionError: (dtype('float16'), <class 'numpy.float32'>)

@comex force-pushed the convert-script branch 2 times, most recently from 358bb6c to 80ae52a on April 2, 2023 02:47
comex added a commit to comex/llama.cpp that referenced this pull request Apr 2, 2023
@comex comex marked this pull request as ready for review April 2, 2023 03:04
comex added a commit to comex/llama.cpp that referenced this pull request Apr 14, 2023
@comex (Contributor, Author) commented Apr 14, 2023

Updates:

  • Fixed Python 3.8 compatibility. (By the way, installing sentencepiece on Python 3.11 also works fine for me, but maybe it depends on the OS.)
  • Fixed faulthandler incompatibility with Windows.
  • Fixed TypeError: 'staticmethod' object is not callable.
  • Fixed the error when scales is fp16 instead of fp32 (Koala in dequantize mode); see the sketch below.
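
A sketch of the idea behind that last fix (not the actual patch; the array below is placeholder data): accept fp16 scales and upcast to fp32 instead of asserting.

    import numpy as np

    def as_float32(scales: np.ndarray) -> np.ndarray:
        # Tolerate fp16 scale tensors by upcasting instead of asserting fp32.
        return scales.astype(np.float32) if scales.dtype == np.float16 else scales

    scales = as_float32(np.zeros((5120, 1), dtype=np.float16))  # placeholder data
    assert scales.dtype == np.float32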

This weekend hopefully I'll get to fixing compatibility with the latest GPTQ.

@comex (Contributor, Author) commented Apr 14, 2023

Looks like GitHub doesn't give me a merge button even with the approval and checks passing (not sure why), but feel free to merge, @ggerganov. Thanks!

@ggerganov ggerganov merged commit 723dac5 into ggerganov:master Apr 14, 2023
3 checks passed
@ggerganov (Owner)

@comex

Thank you for the hard work and for another very well done contribution!

@DannyDaemonic (Collaborator) commented Apr 14, 2023

Since requirements.txt is going into the root directory, to avoid confusion we should consider renaming it to something like convert-reqs.txt or conversion-requirements.txt, as those requirements are specific to the conversion scripts and are not requirements for llama.cpp.

prusnak added a commit that referenced this pull request Apr 14, 2023
after #545 we do not need torch, tqdm and requests in the dependencies
sw pushed a commit that referenced this pull request Apr 14, 2023
after #545 we do not need torch, tqdm and requests in the dependencies
@vmajor commented Apr 22, 2023

How can convert.py be used to migrate an old ggml model to the new ggml format? Attempting to do so blindly results in this error:

python convert.py --outfile ../alpaca.cpp_65b_ggml/new_ggml-model-q4_0.bin ../alpaca.cpp_65b_ggml/ggml-model-q4_0.bin

raise FileNotFoundError(f"Could not find tokenizer.model in {path} or its parent; if it's in another directory, pass the directory as --vocab-dir")
FileNotFoundError: Could not find tokenizer.model in ../alpaca.cpp_65b_ggml or its parent; if it's in another directory, pass the directory as --vocab-dir

@TheBloke (Contributor)

How can convert.py be used to migrate an old ggml model to the new ggml format? Attempting to do so blindly results in this error:

python convert.py --outfile ../alpaca.cpp_65b_ggml/new_ggml-model-q4_0.bin ../alpaca.cpp_65b_ggml/ggml-model-q4_0.bin

raise FileNotFoundError(f"Could not find tokenizer.model in {path} or its parent; if it's in another directory, pass the directory as --vocab-dir")
FileNotFoundError: Could not find tokenizer.model in ../alpaca.cpp_65b_ggml or its parent; if it's in another directory, pass the directory as --vocab-dir

You can download the missing tokenizer.model from HF, e.g. at this link: https://huggingface.co/TheBloke/alpaca-lora-65B-HF/resolve/main/tokenizer.model
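
With tokenizer.model downloaded into some directory, the command from above then needs --vocab-dir pointing at that directory, along these lines (paths are illustrative):

python convert.py --outfile ../alpaca.cpp_65b_ggml/new_ggml-model-q4_0.bin --vocab-dir ../alpaca.cpp_65b_ggml ../alpaca.cpp_65b_ggml/ggml-model-q4_0.bin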

PS. If you want a newer 65B Alpaca Lora model, using newer and better 4bit quantisation techniques, try the q4_0, q4_2 or q4_3 models from my repo here: https://huggingface.co/TheBloke/alpaca-lora-65B-GGML . q4_2 seems to be the quantisation format that people regard as best at the moment.

@vmajor commented Apr 23, 2023

Thank you for this! I still cannot get the conversion done due to a different error, but I downloaded your model and it is now working much better. Can you tell me what the difference is from the newer q4_3 model? That one is larger.

@Green-Sky (Collaborator)

@vmajor #1121

@big-thousand

Which version of GPTQ-for-LLaMa can produce a model without g_idx?

@Green-Sky (Collaborator)

@big-thousand I believe you are in the wrong repository.

@Interpause

I'd like to check whether there is now support for converting GPTQ 4-bit quantized models to GGML.

@Green-Sky (Collaborator)

@Interpause You will get better quality without the GPTQ step in between.

@TheBloke (Contributor)

I think they're asking because llama.cpp convert.py can convert old GPTQ models to GGML, but only if they don't have the new GPTQ g_idx format.

However, @big-thousand and @Interpause, I do not recommend doing this. I tested using convert.py to convert GPTQ -> GGML and the perplexity (model accuracy) was very poor: much worse than using llama.cpp's own quantize feature.

I think this is partly because you have to use an old version of GPTQ to do the conversion.

I suggest you make new GGMLs using llama.cpp quantize. It will result in the highest quality model, and will be faster than going float16 -> GPTQ -> GGML.

That's what I do now for all my model releases on HF. I do float16 -> GPTQ, and separately I do float16 -> GGML
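
For reference, the float16 -> GGML route described above looks roughly like this (paths are placeholders, and the final numeric argument selecting q4_0 follows the quantize usage documented in the llama.cpp README at the time, so treat the exact invocation as an assumption):

python convert.py models/my-model-hf/ --outfile models/my-model-f16.bin
./quantize models/my-model-f16.bin models/my-model-q4_0.bin 2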

@Interpause

Just wanted to ask: the current GGML 4-bit does some form of error correction, right? The main rationale behind wanting to use GPTQ is to mitigate the increase in perplexity. Is GGML's 4-bit quantization already on par with or superior to GPTQ?

@TheBloke (Contributor) commented May 15, 2023

Just wanted to ask: the current GGML 4-bit does some form of error correction, right? The main rationale behind wanting to use GPTQ is to mitigate the increase in perplexity. Is GGML's 4-bit quantization already on par with or superior to GPTQ?

I am actually testing that right this second. I wrote a perplexity calc for GPTQ that runs 100% the same algorithm as the perplexity tool in llama.cpp, so the results are comparable.
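
(For context, the number both tools report is the standard perplexity: the exponential of the mean negative log-likelihood per token. A tiny sketch of the definition, not of the llama.cpp chunking logic:)

    import numpy as np

    def perplexity(token_logprobs: np.ndarray) -> float:
        # token_logprobs: natural-log probabilities the model assigned to each test token
        return float(np.exp(-np.mean(token_logprobs)))

    print(perplexity(np.log(np.array([0.25, 0.5, 0.125]))))  # toy example, not a benchmark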

Here are some early results from my testing (which I will publish properly soon):

Llama 7B:

  • float16 (13.0GB) : 5.9066
  • llama.cpp q4_0 (4.0GB) : 6.1565
  • llama.cpp q4_1 (4.8GB) : 6.0910
  • llama.cpp q5_0 (4.4GB) : 5.9862
  • llama.cpp q5_1 (4.8GB) : 5.9481
  • llama.cpp q8_0 (7.1GB) : 5.9069
  • AutoGPTQ 4bit 32g no desc_act (4.0GB) : 6.2650
  • AutoGPTQ 4bit 32g desc_act (4.0GB) : 6.0422
  • AutoGPTQ 4bit 128g no desc_act (3.7GB) : 6.3850
  • AutoGPTQ 4bit 128g desc_act (3.7GB) : 6.0653

So you can see that for 4-bit, GPTQ is slightly better; its best results are 6.0422 and 6.0653. However, those require desc_act, and there are currently some performance implications to using it: it slows down inference a fair bit at the moment.

But llama.cpp also offers 5-bit, which outperforms GPTQ 4-bit. And now that llama.cpp has CUDA GPU acceleration, it may be able to compete on performance as well.

So it will be up to the user to decide what is best for them and their use case.

I will publish more results, and benchmarks, soon.

@ggerganov (Owner)

@TheBloke - When doing the comparisons, don't forget to include the file sizes. These are important.

@TheBloke (Contributor)

Yeah fair enough. I've edited that in.

When I publish the full results I'll include a table and spreadsheet with all the details.

@earonesty

This should be its own release on PyPI.

Labels: enhancement (New feature or request), high priority (Very important issue), script (Script related)