<a href="https://www.kaggle.com/code/severiandev/koboldcpp-notebook?scriptVersionId=263392674" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# KoboldCPP on Kaggle

Host models up to 32B with this notebook. A slightly more organized version of [Divine's notebook here](https://www.kaggle.com/code/divinesinner/koboldcpp-guide-in-comment/notebook).

### Relevant links
- [Divine's guide](https://www.kaggle.com/code/divinesinner/koboldcpp-guide-in-comment/comments#3102042)
- [Hibiki's model recommendations on the unofficial Colab fork](https://colab.research.google.com/drive/1l_wRGeD-LnRl3VtZHDc7epW_XW0nJvew#scrollTo=pf4AQOYgTB2d)
- [nyxkrage's VRAM Calculator](https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator)
- [Myscell's local model recommendations](https://rentry.org/anathem)
- [Pepper's local model reviews](https://www.notion.so/playwithpepper/1f392d900248803f86c2c51c73f92a0b?v=1f392d90024880d59e33000cc1b15175)
- [Naen's Kaggle guide, albeit a bit outdated](https://rentry.org/GodsGreatestKaggles)
- ~~[trashpanda-org on HF, just because I can](https://huggingface.co/trashpanda-org)~~

### Getting started

Make sure you've done the following (refer to Divine's guide above if unsure how):

1. Sign up for a **Kaggle** account with a **verified phone number**. You might be asked to do facial verification via Persona.
2. Sign up for an **ngrok** account and acquire an **authtoken** [(signup/login here)](https://ngrok.com/)
3. In the **`Settings`** menu above -> **`Accelerator`**, select **`GPU T4x2`**
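
As a rough sanity check on whether a quant will fit on the two T4s, you can add the GGUF file size to an estimate of the KV cache. The sketch below is not part of the notebook. The layer count, KV head count, and head dimension are illustrative values you would look up for your own model, and the 30 GiB usable VRAM and 2 GiB headroom are assumptions. The VRAM calculator linked above is the better tool for real decisions.

```python
GIB = 1024 ** 3

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context, bytes_per_elem=2):
    """Bytes needed for the K and V caches of one sequence (fp16 by default)."""
    # Factor 2 = one tensor for keys, one for values, per layer.
    return 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_elem

def fits(model_file_gib, kv_gib, vram_gib=30.0, headroom_gib=2.0):
    """True if weights + KV cache + headroom fit in the assumed usable VRAM."""
    return model_file_gib + kv_gib + headroom_gib <= vram_gib

# Illustrative (assumed) architecture numbers at the notebook's default 16384 context
kv_gib = kv_cache_bytes(n_layers=64, n_kv_heads=8, head_dim=128, context=16384) / GIB
print(kv_gib)              # 4.0
print(fits(20.0, kv_gib))  # True
```

If the check fails, a smaller quant or a lower context value are the usual knobs to turn.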

### Selecting a model to run

Specifically, you need the URL of a model quant (a GGUF file) for Kobold to download and load.
A full walkthrough would be too long and disruptive to include here, so see [this model quant selection guide](https://rentry.org/severian) for details, including screenshots.
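
If you paste your own URL, it helps to confirm that it is a direct file link rather than a link to the file's web page. The helper below is a hypothetical sketch, not something the notebook runs. It assumes Hugging Face URLs of the form `/<user>/<repo>/resolve/<branch>/<file>.gguf`, which is the shape used by the model list in this notebook.

```python
from urllib.parse import urlparse

def looks_like_direct_gguf_url(url):
    """Heuristic check for a direct Hugging Face GGUF download link."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    return (
        parsed.scheme == "https"
        and parsed.netloc == "huggingface.co"
        and len(parts) >= 5
        and parts[2] == "resolve"  # "blob" links point at the HTML page instead
        and parts[-1].lower().endswith(".gguf")
    )

good = "https://huggingface.co/bartowski/TheDrummer_Cydonia-24B-v2.1-GGUF/resolve/main/TheDrummer_Cydonia-24B-v2.1-Q6_K_L.gguf"
print(looks_like_direct_gguf_url(good))                               # True
print(looks_like_direct_gguf_url(good.replace("/resolve/", "/blob/")))  # False
```

A trailing `?download=true` does not affect the check, since the query string is not part of the path.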

### Using this notebook

> Oh, to my knowledge the runtime kills itself after 40 minutes; the audio curbs that. I believe it was Divine's idea. I mostly use Kaggle on my phone, though; on PC you might need a script that automatically clicks the screen at a set interval.

(Quote from Sam.) It's recommended to play the audio file below to keep Kaggle from killing the runtime after roughly 40 minutes. Run the cell first, then press play on the audio player; these are two separate steps.

Also, if your frontend lets you set max tokens to 0, the default max tokens value of 2048 set below is used instead.
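
One way to express that fallback is sketched here. `resolve_max_tokens` is a hypothetical helper name used only for illustration; it treats a missing, zero, or negative value from the frontend as "use the default".

```python
def resolve_max_tokens(requested, default=2048):
    """Return the frontend's max_tokens if it is a positive integer, else the default."""
    is_int = isinstance(requested, int) and not isinstance(requested, bool)
    if is_int and requested > 0:
        return requested
    return default

print(resolve_max_tokens(0))     # 2048
print(resolve_max_tokens(None))  # 2048
print(resolve_max_tokens(512))   # 512
```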

In [None]:
%%html
<h5>Press play on the music player to keep the tab alive (Uses only 13MB of data)</h5>
<audio src="https://raw.githubusercontent.com/KoboldAI/KoboldAI-Client/main/colab/silence.m4a" controls></audio>

<h3>Download Kobold</h3>

In [1]:
!curl -fLo koboldcpp https://github.com/LostRuins/koboldcpp/releases/download/v1.98.1/koboldcpp-linux-x64 && chmod +x koboldcpp

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  561M  100  561M    0     0   180M      0  0:00:03  0:00:03 --:--:--  242M


### Select model, context and max tokens

After running this cell, adjust the settings it displays (select a model, check context and max tokens) before moving on to the cells after it.

In [2]:
# Model list, settings such as context, max tokens, instruct preset, and advanced settings
from IPython.display import display
import ipywidgets as widgets

premade_instruct = {
    "alpaca": {
        "system_start": "\n### Input: ",
        "system_end": "",
        "user_start": "\n### Instruction: ",
        "user_end": "",
        "assistant_start": "\n### Response: ",
        "assistant_end": "",
    },
    "vicuna": {
        "system_start": "\nSYSTEM: ",
        "system_end": "",
        "user_start": "\nUSER: ",
        "user_end": "",
        "assistant_start": "\nASSISTANT: ",
        "assistant_end": "",
    },
    "llama-3": {
        "system_start": "<|start_header_id|>system<|end_header_id|>\n\n",
        "system_end": "<|eot_id|>",
        "user_start": "<|start_header_id|>user<|end_header_id|>\n\n",
        "user_end": "<|eot_id|>",
        "assistant_start": "<|start_header_id|>assistant<|end_header_id|>\n\n",
        "assistant_end": "<|eot_id|>",
    },
    "chatml": {
        "system_start": "<|im_start|>system\n",
        "system_end": "<|im_end|>\n",
        "user_start": "<|im_start|>user\n",
        "user_end": "<|im_end|>\n",
        "assistant_start": "<|im_start|>assistant\n",
        "assistant_end": "<|im_end|>\n",
    },
    "command-r": {
        "system_start": "<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>",
        "system_end": "<|END_OF_TURN_TOKEN|>",
        "user_start": "<|START_OF_TURN_TOKEN|><|USER_TOKEN|>",
        "user_end": "<|END_OF_TURN_TOKEN|>",
        "assistant_start": "<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>",
        "assistant_end": "<|END_OF_TURN_TOKEN|>",
    },
    "mistral":  {
      "system_start": "",
      "system_end": "",
      "user_start": "[INST] ",
      "user_end": "",
      "assistant_start": " [/INST]",
      "assistant_end": "</s> "
    },
    "mistral-v7-tekken":  {
      "system_start": "[SYSTEM_PROMPT]",
      "system_end": "[/SYSTEM_PROMPT]",
      "user_start": "[INST]",
      "user_end": "[/INST]",
      "assistant_start": " ",
      "assistant_end": "</s>"
    },
    "gemma2":{
      "system_start": "<start_of_turn>system\n",
      "system_end": "<end_of_turn>\n",
      "user_start": "<start_of_turn>user\n",
      "user_end": "<end_of_turn>\n",
      "assistant_start": "<start_of_turn>model\n",
      "assistant_end": "<end_of_turn>\n"
    },
    "metharme": {
      "system_start": "<|system|>",
      "system_end": "",
      "user_start": "<|user|>",
      "user_end": "",
      "assistant_start": "<|model|>",
      "assistant_end": ""
    },
    "harmony": {
        "system_start": "<|start|>system<|message|>",
        "system_end": "<|end|>",
        "user_start": "<|start|>user<|message|>",
        "user_end": "<|end|>",
        "assistant_start": "<|start|>assistant<|channel|>final<|message|>",
        "assistant_end": "<|end|>"
    },
    "personalityengine-custom": {
        "system_start": "<|system|>",
        "system_end": "<|endoftext|>",
        "user_start": "<|user|>",
        "user_end": "<|endoftext|>",
        "assistant_start": "<|assistant|>",
        "assistant_end": "<|endoftext|>"
    },
}

model_options = [
    ("[CATEGORY] 09/23 additions", "https://huggingface.co/bartowski/trashpanda-org_QwQ-32B-Snowdrop-v0-GGUF/resolve/main/trashpanda-org_QwQ-32B-Snowdrop-v0-Q4_K_L.gguf", "chatml"),
    # additional models from recs
    ("PocketDoc: Dan's PersonalityEngine V1.3.0 24B Q6_K_L", "https://huggingface.co/bartowski/PocketDoc_Dans-PersonalityEngine-V1.3.0-24b-GGUF/resolve/main/PocketDoc_Dans-PersonalityEngine-V1.3.0-24b-Q6_K_L.gguf?download=true", "personalityengine-custom"),
    ("allura-org: Gemma-3-Glitter 27B Q4_K_M", "https://huggingface.co/mradermacher/Gemma-3-Glitter-27B-GGUF/resolve/main/Gemma-3-Glitter-27B.Q4_K_M.gguf?download=true", "gemma2"),
    ("Qwen: Qwen3 30B-A3B abliterated Q4_K_M", "https://huggingface.co/Sowkwndms/Qwen3-30B-A3B-abliterated-Q4_K_M-GGUF/resolve/main/qwen3-30b-a3b-abliterated-q4_k_m.gguf?download=true", "chatml"),
    ("openai: gpt-oss 20B abliterated Q4_K_M", "https://huggingface.co/bartowski/huihui-ai_Huihui-gpt-oss-20b-BF16-abliterated-GGUF/resolve/main/huihui-ai_Huihui-gpt-oss-20b-BF16-abliterated-Q4_K_M.gguf?download=true", "harmony"),
    # notebook originals
    ("Cydonia v2.1 24B Q6_K_L", "https://huggingface.co/bartowski/TheDrummer_Cydonia-24B-v2.1-GGUF/resolve/main/TheDrummer_Cydonia-24B-v2.1-Q6_K_L.gguf", "mistral-v7-tekken"),
    ("trashpanda-org: Snowdrop v0 32B Q4_K_L", "https://huggingface.co/bartowski/trashpanda-org_QwQ-32B-Snowdrop-v0-GGUF/resolve/main/trashpanda-org_QwQ-32B-Snowdrop-v0-Q4_K_L.gguf", "chatml"),

    ("[CATEGORY] OLD COLAB NOTEBOOK MODELS", "https://huggingface.co/bartowski/trashpanda-org_QwQ-32B-Snowdrop-v0-GGUF/resolve/main/trashpanda-org_QwQ-32B-Snowdrop-v0-Q4_K_L.gguf", "chatml"),
    # old hibiki colab models - need to update quant
    ("Kunoichi 7B Q8_0", "https://huggingface.co/Lewdiculous/Kunoichi-DPO-v2-7B-GGUF-Imatrix/resolve/main/Kunoichi-DPO-v2-7B-Q8_0-imatrix.gguf?download=true", "chatml"),
    ("WizardIceLemonTeaRP 32k Q8_0", "https://huggingface.co/mradermacher/WizardIceLemonTeaRP-32k-GGUF/resolve/main/WizardIceLemonTeaRP-32k.Q8_0.gguf?download=true", "chatml"),
    ("WizardLaker 7B Q8_0", "https://huggingface.co/mradermacher/WizardLaker-7B-GGUF/resolve/main/WizardLaker-7B.Q8_0.gguf?download=true", "chatml"),
    ("StunnaMaid 7B v0.2 Q8_0", "https://huggingface.co/Lewdiculous/Nyanade_Stunna-Maid-7B-v0.2-GGUF-IQ-Imatrix/resolve/main/Nyanade_Stunna-Maid-7B-v0.2-Q8_0-imat.gguf?download=true", "chatml"),
    ("LemonKunoichiWizard Q8_0", "https://huggingface.co/mradermacher/LemonKunoichiWizardV3-GGUF/resolve/main/LemonKunoichiWizardV3.Q8_0.gguf?download=true", "chatml"),
    ("Llama-3-Halu-Blackroot 8B Q8_0", "https://huggingface.co/mradermacher/Halu-8B-Llama3-Blackroot-GGUF/resolve/main/Halu-8B-Llama3-Blackroot.Q8_0.gguf?download=true", "chatml"),
    ("Llama-3-Lumimaid 8B Q8_0", "https://huggingface.co/Lewdiculous/Llama-3-Lumimaid-8B-v0.1-OAS-GGUF-IQ-Imatrix/resolve/main/Llama-3-Lumimaid-8B-v0.1-OAS-Q8_0-imat.gguf?download=true", "chatml"),
    ("Llama-3-Daybreak-Lumimaid 8B Q8_0", "https://huggingface.co/mradermacher/llama3-daybreak-lumimaid0.1-8b-hf-GGUF/resolve/main/llama3-daybreak-lumimaid0.1-8b-hf.Q8_0.gguf?download=true", "chatml"),
    ("Llama-3-Stheno 8B Q8_0", "https://huggingface.co/QuantFactory/Llama-3.1-8B-Stheno-v3.4-GGUF/resolve/main/Llama-3.1-8B-Stheno-v3.4.Q8_0.gguf?download=true", "chatml"),
    ("Llama-3-Stheno-ULTRA 8B Q8_0", "https://huggingface.co/DavidAU/L3-8B-Stheno-v3.3-32K-Ultra-NEO-V1-IMATRIX-GGUF/resolve/main/L3-8B-Stheno-v3.3-32K-NEO-V1-D_AU-Q8_0-imat13.gguf?download=true", "chatml"),
    ("Llama-3-SthenoMaidBlackroot 8B Q8_0", "https://huggingface.co/mradermacher/L3-SthenoMaidBlackroot-8B-V1-GGUF/resolve/main/L3-SthenoMaidBlackroot-8B-V1.Q8_0.gguf?download=true", "chatml"),
    ("Llama-3-Umbral-Mind 8B Q8_0", "https://huggingface.co/QuantFactory/L3-Umbral-Mind-RP-v3.0-8B-GGUF/resolve/main/L3-Umbral-Mind-RP-v3.0-8B.Q8_0.gguf?download=true", "chatml"),
    ("Llama-3-Hathor-Stable 8B Q8_0", "https://huggingface.co/mradermacher/Hathor_Stable-v0.2-L3-8B-GGUF/resolve/main/Hathor_Stable-v0.2-L3-8B.Q8_0.gguf?download=true", "chatml"),
    ("Llama-3-Chara-Alpha 8B Q8_0", "https://huggingface.co/mradermacher/L3-8B-Chara-v1-Alpha-GGUF/resolve/main/L3-8B-Chara-v1-Alpha.Q8_0.gguf?download=true", "chatml"),
    ("Llama-3-Hathor-Sofit 8B Q8_0", "https://huggingface.co/mradermacher/Hathor_Sofit-L3-8B-v1-GGUF/resolve/main/Hathor_Sofit-L3-8B-v1.Q8_0.gguf?download=true", "chatml"),
    ("Llama-3-Lunaris 8B Q8_0", "https://huggingface.co/bartowski/L3-8B-Lunaris-v1-GGUF/resolve/main/L3-8B-Lunaris-v1-Q8_0_L.gguf?download=true", "chatml"),
    ("Llama-3.1-Dark-Planet-8-Orbs 8B Q8_0", "https://huggingface.co/DavidAU/L3-Dark-Planet-8B-V2-Eight-Orbs-Of-Power-GGUF/resolve/main/L3-Dark-Planet-8B-V2-EOOP-D_AU-Q8_0.gguf?download=true", "chatml"),
    ("Llama-3.1-DarkIdol 8B Q8_0", "https://huggingface.co/QuantFactory/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF/resolve/main/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored.Q8_0.gguf?download=true", "chatml"),
    ("Yodayo-Nephra 8B Q8_0", "https://huggingface.co/Marcus-Arcadius/nephra_v1.0-Q8_0-GGUF/resolve/main/nephra_v1.0-q8_0.gguf?download=true", "chatml"),
    ("Gemma2-SPPO 9B Q6_K", "https://huggingface.co/mradermacher/Gemma-2-9B-It-SPPO-Iter3-i1-GGUF/resolve/main/Gemma-2-9B-It-SPPO-Iter3.i1-Q6_K.gguf?download=true", "chatml"),
    ("Tarnished", "https://huggingface.co/mradermacher/tarnished-9b-GGUF/resolve/main/tarnished-9b.Q6_K.gguf?download=true", "chatml"),
    ("Gemma2-Daybreak(9b)", "https://huggingface.co/mradermacher/gemma2-9B-daybreak-v0.5-i1-GGUF/resolve/main/gemma2-9B-daybreak-v0.5.i1-Q6_K.gguf?download=true", "chatml"),
    ("Gemma2-Sunfall(9b)", "https://huggingface.co/mradermacher/gemma2-9B-sunfall-v0.5.2-i1-GGUF/resolve/main/gemma2-9B-sunfall-v0.5.2.i1-Q6_K.gguf?download=true", "chatml"),
    ("Gemma2-Ataraxy(9b)", "https://huggingface.co/bartowski/Gemma-2-Ataraxy-9B-GGUF/resolve/main/Gemma-2-Ataraxy-9B-Q5_K_L.gguf?download=true", "chatml"),
    ("Fimbulvetr2(11b)", "https://huggingface.co/Lewdiculous/Fimbulvetr-11B-v2-GGUF-IQ-Imatrix/resolve/main/Fimbulvetr-11B-v2-Q5_K_M-imat.gguf?download=true", "chatml"),
    ("Fimbulvetr-Kuro-Lotus(11b)", "https://huggingface.co/saishf/Fimbulvetr-Kuro-Lotus-10.7B-GGUF/resolve/main/Fimbulvetr-Kuro-Lotus-10.7B-Q6_K.gguf?download=true", "chatml"),
    ("Kaiju(11b)", "https://huggingface.co/Himitsui/Kaiju-11B-GGUF/resolve/main/Kaiju-11B.q5_K_M.gguf?download=true", "chatml"),
    ("Fimbulvetr-Holodeck-Erebus-Westlake(11b)", "https://huggingface.co/PJMixers/Fimbulvetr-Holodeck-Erebus-Westlake-10.7B-GGUF/resolve/main/Fimbulvetr-Holodeck-Erebus-Westlake-10.7B-q4_K_S.gguf", "chatml"),
    ("MoistralV3(11b)", "https://huggingface.co/TheDrummer/Moistral-11B-v3-GGUF/resolve/main/Moistral-11B-v3-Q5_K_M.gguf?download=true", "chatml"),
    ("Lumimaid(12b)", "https://huggingface.co/mradermacher/Lumimaid-v0.2-12B-i1-GGUF/resolve/main/Lumimaid-v0.2-12B.i1-Q6_K.gguf?download=true", "chatml"),
    ("Mini-Magnum(12b)", "https://huggingface.co/InferenceIllusionist/mini-magnum-12b-v1.1-iMat-GGUF/resolve/main/mini-magnum-12b-v1.1-iMat-Q6_K.gguf?download=true", "chatml"),
    ("Nemomix(12b)", "https://huggingface.co/bartowski/NemoMix-Unleashed-12B-GGUF/resolve/main/NemoMix-Unleashed-12B-Q6_K.gguf?download=true", "chatml"),
    ("Celeste(12b)", "https://huggingface.co/QuantFactory/Celeste-12B-V1.6-GGUF/resolve/main/Celeste-12B-V1.6.Q6_K.gguf?download=true", "chatml"),
    ("Lyra(12b)", "https://huggingface.co/Lewdiculous/MN-12B-Lyra-v4-GGUF-IQ-Imatrix/resolve/main/MN-12B-Lyra-v4-Q6_K-imat.gguf?download=true", "chatml"),
    ("Guns-and-roses(12b)", "https://huggingface.co/Reiterate3680/guns-and-roses-r1-GGUF/resolve/main/guns-and-roses-r1-Q6_K_L-imat.gguf?download=true", "chatml"),
    ("Magnum(12b)", "https://huggingface.co/mradermacher/magnum-v4-12b-GGUF/resolve/main/magnum-v4-12b.Q6_K.gguf?download=true", "chatml"),
    ("Starcannon(12b)", "https://huggingface.co/mradermacher/MN-12B-Starcannon-v3-i1-GGUF/resolve/main/MN-12B-Starcannon-v3.i1-Q6_K.gguf?download=true", "chatml"),
    ("Rocinante(12b)", "https://huggingface.co/TheDrummer/UnslopNemo-12B-v3-GGUF/resolve/main/Rocinante-12B-v2g-Q6_K.gguf?download=true", "chatml"),
    ("Chronos Gold(12b)", "https://huggingface.co/mradermacher/Chronos-Gold-12B-1.0-i1-GGUF/resolve/main/Chronos-Gold-12B-1.0.i1-Q6_K.gguf?download=true", "chatml"),
    ("L3.1 OpenCrystal(12b)", "https://huggingface.co/mradermacher/OpenCrystal-12B-L3-i1-GGUF/resolve/main/OpenCrystal-12B-L3.i1-Q6_K.gguf?download=true", "chatml"),
    ("Mag-Mell(12b)", "https://huggingface.co/mradermacher/MN-12B-Mag-Mell-R1-GGUF/resolve/main/MN-12B-Mag-Mell-R1.Q6_K.gguf?download=true", "chatml"),
    ("Violet Twilight(12b)", "https://huggingface.co/Epiculous/Violet_Twilight-v0.2-GGUF/resolve/main/Violet_Twilight-v0.2.Q6_K.gguf?download=true", "chatml"),
    ("Halide(12b)", "https://huggingface.co/mradermacher/MN-Halide-12b-v1.0-i1-GGUF/resolve/main/MN-Halide-12b-v1.0.i1-Q6_K.gguf?download=true", "chatml"),
    ("Stellar Odyssy(12b)", "https://huggingface.co/mradermacher/Stellar-Odyssey-12b-v0.0-i1-GGUF/resolve/main/Stellar-Odyssey-12b-v0.0.i1-Q6_K.gguf?download=true", "chatml"),
    ("MadMix(12b)", "https://huggingface.co/mradermacher/MadMix-Unleashed-12B-i1-GGUF/resolve/main/MadMix-Unleashed-12B.i1-Q6_K.gguf?download=true", "chatml"),
    ("DarkPlanet(12b)", "https://huggingface.co/DavidAU/MN-Dark-Planet-TITAN-12B-GGUF/resolve/main/MN-Dark-Planet-TITAN-12B-D_AU-Q6_k.gguf?download=true", "chatml"),
    ("UnslopNemoV4.1(12b)", "https://huggingface.co/TheDrummer/UnslopNemo-12B-v4.1-GGUF/resolve/main/Rocinante-12B-v2j-Q6_K.gguf?download=true", "chatml"),
    ("Violet Lotus(12b)", "https://huggingface.co/QuantFactory/MN-Violet-Lotus-12B-GGUF/resolve/main/MN-Violet-Lotus-12B.Q6_K.gguf?download=true", "chatml"),
    ("Abomination Science(12b)", "https://huggingface.co/mradermacher/AbominationScience-12B-v4-i1-GGUF/resolve/main/AbominationScience-12B-v4.i1-Q6_K.gguf?download=true", "chatml"),
    ("DarkAtom(12b)", "https://huggingface.co/mradermacher/DarkAtom-12B-v3-i1-GGUF/resolve/main/DarkAtom-12B-v3.i1-Q6_K.gguf?download=true", "chatml"),
    ("CaptainErisViolet(12b)", "https://huggingface.co/QuantFactory/Captain-Eris_Violet-V0.420-12B-GGUF/resolve/main/Captain-Eris_Violet-V0.420-12B.Q5_K_M.gguf?download=true", "chatml"),
    ("Ink(12b)", "https://huggingface.co/allura-org/MN-12b-RP-Ink-GGUF/resolve/main/MN-12b-RP-Ink-Q5_K_M.gguf?download=true", "chatml"),
    ("Wayfarer(12b)", "https://huggingface.co/LatitudeGames/Wayfarer-12B-GGUF/resolve/main/Wayfarer-12B-Q5_K_M.gguf?download=true", "chatml"),
    ("TieFighter(13b)", "https://huggingface.co/KoboldAI/LLaMA2-13B-Tiefighter-GGUF/resolve/main/LLaMA2-13B-Tiefighter.Q4_K_S.gguf?download=true", "chatml"),
    ("Psyfighter(13b)", "https://huggingface.co/TheBloke/Psyfighter-13B-GGUF/resolve/main/psyfighter-13b.Q5_K_M.gguf", "chatml"),
    ("Psyfighter2(13b)", "https://huggingface.co/KoboldAI/LLaMA2-13B-Psyfighter2-GGUF/resolve/main/LLaMA2-13B-Psyfighter2.Q4_K_M.gguf", "chatml"),
    ("PsyMedRP(13b)", "https://huggingface.co/Undi95/PsyMedRP-v1-13B-GGUF/resolve/main/PsyMedRP-v1-13B.q5_k_m.gguf", "chatml"),
    ("EstopianMaid(13b)", "https://huggingface.co/KatyTheCutie/EstopianMaid-13B-GGUF/resolve/main/EstopianMaid-13B-Q4_K_S.gguf", "chatml"),
    ("Noromaid0.1(13b)", "https://huggingface.co/NeverSleep/Noromaid-13b-v0.1.1-GGUF/resolve/main/Noromaid-13b-v0.1.1.q4_k_s.gguf", "chatml"),
    ("EVA-Qwen2.5 (14b)", "https://huggingface.co/mradermacher/EVA-Qwen2.5-14B-v0.2-i1-GGUF/resolve/main/EVA-Qwen2.5-14B-v0.2.i1-Q5_K_M.gguf?download=true", "chatml"),
    ("Eidolon(14b)", "https://huggingface.co/Lambent/Eidolon-v2.1-14B-Q4_K_M-GGUF/resolve/main/eidolon-v2.1-14b-q4_k_m.gguf?download=true", "chatml"),
    ("EVA-Tissint(14b)", "https://huggingface.co/mradermacher/EVA-Tissint-v1.2-14B-i1-GGUF/resolve/main/EVA-Tissint-v1.2-14B.i1-Q5_K_M.gguf?download=true", "chatml"),
    ("Sugarquill(14b)", "https://huggingface.co/Triangle104/TQ2.5-14B-Sugarquill-v1-Q5_K_M-GGUF/resolve/main/tq2.5-14b-sugarquill-v1-q5_k_m.gguf?download=true", "chatml"),
    ("Freya(14b)", "https://huggingface.co/mradermacher/14B-Qwen2.5-Freya-x1-i1-GGUF/resolve/main/14B-Qwen2.5-Freya-x1.i1-Q5_K_M.gguf?download=true", "chatml"),
    ("Kunou(14b)", "https://huggingface.co/mradermacher/14B-Qwen2.5-Kunou-v1-GGUF/resolve/main/14B-Qwen2.5-Kunou-v1.Q5_K_M.gguf?download=true", "chatml"),
    ("Sailor2(14b)", "https://huggingface.co/mradermacher/Sailor2-14B-GGUF/resolve/main/Sailor2-14B.Q4_K_S.gguf?download=true", "chatml"),
    ("Sailor2-Chat(14b)", "https://huggingface.co/mradermacher/Sailor2-14B-Chat-GGUF/resolve/main/Sailor2-14B-Chat.Q4_K_S.gguf?download=true", "chatml"),
    ("Deepseek-Kunou(14b)", "https://huggingface.co/mradermacher/Deepseeker-Kunou-Qwen2.5-14b-i1-GGUF/resolve/main/Deepseeker-Kunou-Qwen2.5-14b.i1-Q5_K_M.gguf?download=true", "chatml"),
    ("L3.1 OpenCrystal(15b)", "https://huggingface.co/mradermacher/OpenCrystal-15B-L3-v2-i1-GGUF/resolve/main/OpenCrystal-15B-L3-v2.i1-Q6_K.gguf?download=true", "chatml"),
    ("Cydonia(22b)", "https://huggingface.co/MarsupialAI/Cydonia-22B-v1_iMat_GGUF/resolve/main/Cydonia-22B-v1_iQ4xs.gguf?download=true", "chatml"),
    ("Magnum(22b)", "https://huggingface.co/mradermacher/magnum-v4-22b-i1-GGUF/resolve/main/magnum-v4-22b.i1-IQ4_XS.gguf?download=true", "chatml"),
    ("Sorcerer(22b)", "https://huggingface.co/Quant-Cartel/SorcererLM-22B-iMat-GGUF/resolve/main/SorcererLM-22B-iMat-IQ4_XS.gguf?download=true", "chatml"),
]

_dropdown_options = [(label, (url, instruct)) for (label, url, instruct) in model_options]

model_dropdown = widgets.Dropdown(
    options=_dropdown_options,
    value=_dropdown_options[0][1],
    description='Model:',
    layout=widgets.Layout(width='100%')
)

custom_url = widgets.Text(
    value='',
    placeholder='Or paste a custom GGUF URL here...',
    description='Custom URL:',
    layout=widgets.Layout(width='100%')
)

use_custom = widgets.Checkbox(value=False, description='Use custom URL')

context_size_input = widgets.IntText(
    value=16384,
    description='Context:',
    layout=widgets.Layout(width='200px')
)

max_tokens_input = widgets.IntText(
    value=2048,
    description='Max tokens:',
    layout=widgets.Layout(width='200px')
)

instruct_preset_dropdown = widgets.Dropdown(
    options=list(premade_instruct.keys()),
    value="chatml",
    description='Instruct Preset:',
    layout=widgets.Layout(width='300px')
)

layers_input = widgets.IntText(value=99, description='Layers:')
kvcache_input = widgets.Dropdown(options=["0","1","2"], value="0", description='KvCache:')
blayer_input = widgets.IntText(value=99, description='blayer:')

def get_selected_url():
    return custom_url.value.strip() if use_custom.value and custom_url.value.strip() else model_dropdown.value[0]

def get_selected_instruct():
    # The preset dropdown is synced to the model's default by _on_model_change,
    # so reading it directly also honors a manual override.
    return instruct_preset_dropdown.value

def _on_model_change(change):
    url, instruct = change['new']
    instruct_preset_dropdown.value = instruct

model_dropdown.observe(_on_model_change, names='value')

# Display UI (models / core settings)
ui_models = widgets.VBox([
    model_dropdown,
    widgets.HBox([use_custom, custom_url]),
    widgets.HBox([context_size_input, max_tokens_input]),
    widgets.HTML(value="<hr>"),
    instruct_preset_dropdown,
    widgets.HTML(value="<hr>"),
    widgets.HTML(value="<span>Advanced settings: Do not tweak unless you know what you are doing.</span>"),
    widgets.HBox([layers_input, kvcache_input, blayer_input])
])

display(ui_models)

_SELECTED_URL_VALUE = get_selected_url()
_CONTEXT_SIZE_VALUE = int(context_size_input.value)
_MAX_TOKENS_VALUE = int(max_tokens_input.value)
_INSTRUCT_PRESET_VALUE = get_selected_instruct()
_LAYERS_VALUE = int(layers_input.value)
_KVCACHE_VALUE = kvcache_input.value
_BLAYER_VALUE = int(blayer_input.value)

def _on_any_change(change):
    global _SELECTED_URL_VALUE, _CONTEXT_SIZE_VALUE, _MAX_TOKENS_VALUE
    global _INSTRUCT_PRESET_VALUE
    global _LAYERS_VALUE, _KVCACHE_VALUE, _BLAYER_VALUE
    _SELECTED_URL_VALUE = get_selected_url()
    _CONTEXT_SIZE_VALUE = int(context_size_input.value)
    _MAX_TOKENS_VALUE = int(max_tokens_input.value)
    _INSTRUCT_PRESET_VALUE = get_selected_instruct()
    _LAYERS_VALUE = int(layers_input.value)
    _KVCACHE_VALUE = kvcache_input.value
    _BLAYER_VALUE = int(blayer_input.value)

for w in [model_dropdown, custom_url, use_custom,
          context_size_input, max_tokens_input,
          instruct_preset_dropdown,
          layers_input, kvcache_input, blayer_input]:
    w.observe(_on_any_change, names='value')


# -------------------------------
# Proxy + Sampler settings UI
# -------------------------------

# Managed internals (hidden from users)
_PROXY_PORT = 5002
_KOBOLD_BASE = 'http://127.0.0.1:5001'

# Defaults aligned with jai-proxy-suite
_instruct_from_notebook = globals().get('_INSTRUCT_PRESET_VALUE', 'chatml')
web_param = {
    "instruct": _instruct_from_notebook,
    "top_p": 0.92,
    "min_p": 0.12,
    "top_k": -1,
    "repetition_penalty": 1.05,
    "frequency_penalty": 0,
    "presence_penalty": 0.26,
    "banned_strings": [],
    "dry_enabled": False,
    "dry_multiplier": 1.75,
    "dry_base": 1.1,
    "dry_allowed_length": 3,
    "dry_range": 1024,
}

# Sampler controls only
_top = widgets.HBox([
    widgets.FloatText(value=0.92, description='top_p:'),
    widgets.FloatText(value=0.12, description='min_p:'),
    widgets.IntText(value=-1, description='top_k:'),
])
_rep = widgets.HBox([
    widgets.FloatText(value=1.05, description='rep_pen:'),
    widgets.FloatText(value=0.0, description='freq_pen:'),
    widgets.FloatText(value=0.26, description='pres_pen:'),
])

banned_strings_input = widgets.Textarea(
    value="",
    description='banned_strings:',
    layout=widgets.Layout(width='100%', height='80px')
)

dry_enabled_chk = widgets.Checkbox(value=False, description='DRY enabled')
dry_mult_input = widgets.FloatText(value=1.75, description='dry_mult:')
dry_base_input = widgets.FloatText(value=1.1, description='dry_base:')
dry_len_input = widgets.IntText(value=3, description='dry_len:')
dry_range_input = widgets.IntText(value=1024, description='dry_range:')

def _sync_from_widgets(_=None):
    # Always refresh instruct from the model section (first part of this cell)
    web_param["instruct"] = globals().get('_INSTRUCT_PRESET_VALUE', 'chatml')

    # Map sampler inputs from UI containers
    web_param["top_p"] = float(_top.children[0].value)
    web_param["min_p"] = float(_top.children[1].value)
    web_param["top_k"] = int(_top.children[2].value)

    web_param["repetition_penalty"] = float(_rep.children[0].value)
    web_param["frequency_penalty"] = float(_rep.children[1].value)
    web_param["presence_penalty"] = float(_rep.children[2].value)

    banned_raw = banned_strings_input.value.strip()
    # Comma-separated list; drop empty entries left by stray or trailing commas
    web_param["banned_strings"] = [s.strip() for s in banned_raw.split(',') if s.strip()]

    web_param["dry_enabled"] = bool(dry_enabled_chk.value)
    web_param["dry_multiplier"] = float(dry_mult_input.value)
    web_param["dry_base"] = float(dry_base_input.value)
    web_param["dry_allowed_length"] = int(dry_len_input.value)
    web_param["dry_range"] = int(dry_range_input.value)

# Attach observers to every relevant widget
for w in [
    _top.children[0], _top.children[1], _top.children[2],
    _rep.children[0], _rep.children[1], _rep.children[2],
    banned_strings_input,
    dry_enabled_chk, dry_mult_input, dry_base_input, dry_len_input, dry_range_input
]:
    w.observe(_sync_from_widgets, names='value')

# Initial sync so web_param matches defaults immediately
_sync_from_widgets()

ui_sampler = widgets.VBox([
    widgets.HTML(value="<b>Sampler params</b>"),
    _top,
    _rep,
    banned_strings_input,
    widgets.HTML(value="<hr><b>DRY Sampling</b>"),
    widgets.HBox([dry_enabled_chk, dry_mult_input, dry_base_input, dry_len_input, dry_range_input])
])

display(ui_sampler)



### Reverse proxy setup

#### How to tell when this cell is done:

`Proxy running on port 5002, forwarding to http://127.0.0.1:5001` should appear in the cell's output.
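
Among other things, the proxy flattens an OpenAI-style `messages` list into a single prompt string using the selected instruct preset. The standalone sketch below shows the idea with a ChatML-style preset. The `CHATML` table and `flatten` function are illustrative names, and the newline placement follows the common ChatML layout, which is an assumption and may differ from the notebook's own preset dictionary.

```python
# (start, end) wrappers per role; assumed ChatML layout
CHATML = {
    "system": ("<|im_start|>system\n", "<|im_end|>\n"),
    "user": ("<|im_start|>user\n", "<|im_end|>\n"),
    "assistant": ("<|im_start|>assistant\n", "<|im_end|>\n"),
}

def flatten(messages, preset=CHATML):
    """Wrap each message in its role's markers, then open an assistant turn."""
    parts = []
    for m in messages:
        wrap = preset.get(m["role"])
        if wrap is None:
            # Unknown roles (e.g. tool output) are passed through unwrapped
            parts.append(m["content"])
        else:
            start, end = wrap
            parts.append(start + m["content"] + end)
    # Leave the prompt ending where the model should start writing
    parts.append(preset["assistant"][0])
    return "".join(parts)

prompt = flatten([
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
])
print(prompt)
```

Picking the wrong preset for a model usually still produces text, but the quality tends to drop, which is why the preset follows the model selection by default.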

In [8]:
# Reverse proxy for Kobold
!pip -q install flask flask_cors requests
from flask import Flask, request, jsonify, Response, stream_with_context
from flask_cors import CORS
import requests as _rq
import json as _json
import time as _time

app = Flask(__name__)
CORS(app)

# defaults if UI not run
try:
    _ = _PROXY_PORT
except NameError:
    _PROXY_PORT = 5002
try:
    _ = _KOBOLD_BASE
except NameError:
    _KOBOLD_BASE = 'http://127.0.0.1:5001'
try:
    _ = web_param
except NameError:
    web_param = {
        "instruct": "chatml",
        "top_p": 0.92,
        "min_p": 0.12,
        "top_k": -1,
        "repetition_penalty": 1.05,
        "frequency_penalty": 0,
        "presence_penalty": 0.26,
        "banned_strings": [],
        "dry_enabled": False,
        "dry_multiplier": 1.75,
        "dry_base": 1.1,
        "dry_allowed_length": 3,
        "dry_range": 1024,
    }

# prompt adapter like jai-proxy-suite
try:
    _ = premade_instruct
except NameError:
    premade_instruct = {
        "chatml": {
            "system_start": "<|im_start|>system\n",
            "system_end": "<|im_end|>\n",
            "user_start": "<|im_start|>user\n",
            "user_end": "<|im_end|>\n",
            "assistant_start": "<|im_start|>assistant\n",
            "assistant_end": "<|im_end|>\n",
        }
    }


def message_instructor(messages_list, preset=None):
    adapter_key = preset or web_param.get('instruct', 'chatml')
    adapter = premade_instruct.get(adapter_key, premade_instruct['chatml'])
    ss = adapter.get('system_start', '')
    se = adapter.get('system_end', '')
    us = adapter.get('user_start', '')
    ue = adapter.get('user_end', '')
    as_ = adapter.get('assistant_start', '')
    ae = adapter.get('assistant_end', '')
    out = []
    for m in messages_list:
        if m['role'] == 'system':
            out.append(ss + m['content'] + se)
        elif m['role'] == 'user':
            out.append(us + m['content'] + ue)
        elif m['role'] == 'assistant':
            out.append(as_ + m['content'] + ae)
        elif m['role'] == 'tool':
            out.append(m['content'])
    out.append(as_)
    return ''.join(out)


def build_kobold_payload(src):
    # Keep instruct preset in sync with the model selection UI
    try:
        web_param['instruct'] = globals().get('_INSTRUCT_PRESET_VALUE', web_param.get('instruct', 'chatml'))
    except Exception:
        pass

    # Take OpenAI Chat Completions body and map to Kobold-compatible params
    body = {
        'model': src.get('model', ''),
        'temperature': src.get('temperature', 0.9),
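        # a missing or zero max_tokens falls back to the notebook's max tokens setting
        'max_tokens': src.get('max_tokens') or globals().get('_MAX_TOKENS_VALUE', 2048),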
        'min_p': web_param['min_p'],
        'top_p': web_param['top_p'],
        'top_k': web_param['top_k'],
        'repetition_penalty': web_param['repetition_penalty'],
        'presence_penalty': web_param['presence_penalty'],
        'frequency_penalty': web_param['frequency_penalty'],
        'banned_strings': web_param['banned_strings'],
        'n': 1,
        'best_of': 1,
        'skip_special_tokens': True,
        'sampler_order': [6,0,1,3,4,2,5]
    }
    # DRY sampling pass-through
    if web_param.get('dry_enabled'):
        body.update({
            'dry_allowed_length': web_param['dry_allowed_length'],
            'dry_base': web_param['dry_base'],
            'dry_multiplier': web_param['dry_multiplier'],
            'dry_penalty_last_n': web_param['dry_range'],
            'dry_sequence_breakers': web_param.get('dry_sequence_breakers', None)
        })
    # Adapt messages to single prompt if backend expects that; Kobold v1 chat supports messages
    if 'messages' in src:
        body['messages'] = src['messages']
    if src.get('messages') and web_param.get('instruct'):
        # also build an instruct prompt for adapters that need it (some kobold variants)
        body['prompt'] = message_instructor(src['messages'])
    # stream flag
    body['stream'] = bool(src.get('stream', False))
    return body


def forward_stream(endpoint_url, config):
    def gen():
        try:
            with _rq.post(endpoint_url, json=config, stream=True) as r:
                r.raise_for_status()
                for line in r.iter_lines():
                    if not line:
                        continue
                    text = line.decode('utf-8')
                    yield f"{text}\n\n"
                    _time.sleep(0.02)
        except Exception as e:
            err = {"error": str(e)}
            yield f"data: {_json.dumps(err)}\n\n"
    return Response(stream_with_context(gen()), content_type='text/event-stream')


def forward_normal(endpoint_url, config):
    try:
        r = _rq.post(endpoint_url, json=config)
        r.raise_for_status()
        data = r.json()
        # Strip trailing whitespace from the reply (no sentence-boundary trimming happens here)
        try:
            txt = data.get('choices', [{}])[0].get('message', {}).get('content')
            if isinstance(txt, str):
                data['choices'][0]['message']['content'] = txt.rstrip()
        except Exception:
            pass
        return jsonify(data)
    except Exception as e:
        return Response(_json.dumps({"error": str(e)}), status=500, content_type='application/json')


@app.route('/v1/chat/completions', methods=['POST'])
def proxy_chat():
    if not request.json:
        return jsonify(error=True), 400
    endpoint = f"{_KOBOLD_BASE}/v1/chat/completions"
    payload = build_kobold_payload(request.json)
    if payload.get('stream'):
        return forward_stream(endpoint, payload)
    else:
        return forward_normal(endpoint, payload)


import threading as _th

def _run_proxy():
    app.run(host='0.0.0.0', port=_PROXY_PORT, use_reloader=False)

try:
    PROXY_THREAD
    # Thread already exists from an earlier run of this cell: leave it alone (restart logic could be added here)
except NameError:
    PROXY_THREAD = _th.Thread(target=_run_proxy, daemon=True)
    PROXY_THREAD.start()
print(f"Proxy running on port {_PROXY_PORT}, forwarding to {_KOBOLD_BASE}")


Proxy running on port 5002, forwarding to http://127.0.0.1:5001
 * Serving Flask app '__main__'
 * Debug mode: off
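
For streamed requests, `forward_stream` relays each non-empty backend line followed by a blank line, and reports failures as a `data: {"error": ...}` event. Below is a minimal sketch of how a client might decode those lines. It assumes the backend emits OpenAI-style chunks (`choices[0].delta.content`) and finishes with `data: [DONE]`. `parse_sse_line` is a hypothetical helper, not part of this notebook.

```python
import json

def parse_sse_line(line):
    """Return the text carried by one SSE line, or None if it carries none.

    Assumes OpenAI-style streaming chunks; raises RuntimeError when the
    proxy sends its {"error": ...} event.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None  # blank separators and other SSE fields
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None  # assumed end-of-stream sentinel
    event = json.loads(payload)
    if "error" in event:
        raise RuntimeError(event["error"])
    choices = event.get("choices") or [{}]
    return choices[0].get("delta", {}).get("content")

# Hand-written sample lines, shaped like what the proxy would relay
chunks = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    '',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    '',
    'data: [DONE]',
]
reply = "".join(t for t in map(parse_sse_line, chunks) if t)
print(reply)
```

In a real client the lines would come from something like `requests.post(..., stream=True).iter_lines()` instead of the sample list.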


### Download model

#### How to tell when this cell is done:

The name of the model you selected should appear below, followed by a download progress bar that has reached 100%.

In [9]:
# Download selected model
import os
!pip install hf_transfer
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

try:
    URL = _SELECTED_URL_VALUE if '_SELECTED_URL_VALUE' in globals() and _SELECTED_URL_VALUE else URL
except NameError:
    URL = "https://huggingface.co/bartowski/trashpanda-org_QwQ-32B-Snowdrop-v0-GGUF/resolve/main/trashpanda-org_QwQ-32B-Snowdrop-v0-Q4_K_L.gguf"

from urllib.parse import urlparse

def extract_parts(url):
    # Expected path shape: /<owner>/<repo>/resolve/<revision>/<file path>
    path_parts = urlparse(url).path.strip('/').split('/')

    model_name = path_parts[0] + "/" + path_parts[1]
    version = path_parts[3]
    # Join the remainder so a file stored in a subfolder keeps its full path
    file_name = "/".join(path_parts[4:])

    return model_name, version, file_name

REPO_ID, REVISION, FILE = extract_parts(URL)

DIR = REPO_ID.replace("/", "_")

if REVISION != "" and REVISION != "main":
    DIR = f"{DIR}_{REVISION}"

print("Model Name: " + DIR)
print("Version: " + REVISION)
print("File: " + FILE)

from huggingface_hub import hf_hub_download

FULLPATH = f"/kaggle/models/{DIR}"

# An empty revision falls back to the repo's default branch
hf_hub_download(repo_id=REPO_ID, filename=FILE, local_dir_use_symlinks=False, revision=REVISION or None, local_dir=FULLPATH)

Model Name: bartowski_TheDrummer_Cydonia-24B-v2.1-GGUF
Version: main
File: TheDrummer_Cydonia-24B-v2.1-Q6_K_L.gguf


For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.


TheDrummer_Cydonia-24B-v2.1-Q6_K_L.gguf:   0%|          | 0.00/19.7G [00:00<?, ?B/s]
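
The download cell depends on the model URL having the shape `/<owner>/<repo>/resolve/<revision>/<file>`. Below is a small sketch for checking a link before you start a multi-gigabyte download. `split_hf_url` is a hypothetical helper, and it assumes the revision is a single path segment (a branch such as `main`, not something like `refs/pr/1`).

```python
from urllib.parse import urlparse

def split_hf_url(url):
    """Split a Hugging Face 'resolve' URL into (repo_id, revision, filename)."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 5 or parts[2] != "resolve":
        raise ValueError(
            f"expected /<owner>/<repo>/resolve/<revision>/<file>, got: {url}"
        )
    repo_id = "/".join(parts[:2])
    revision = parts[3]
    filename = "/".join(parts[4:])  # keeps any subfolder in the path
    return repo_id, revision, filename

url = ("https://huggingface.co/bartowski/trashpanda-org_QwQ-32B-Snowdrop-v0-GGUF"
       "/resolve/main/trashpanda-org_QwQ-32B-Snowdrop-v0-Q4_K_L.gguf")
print(split_hf_url(url))
```

A common mistake is pasting a `/blob/` page link instead of the `/resolve/` download link. The check above rejects it with a readable error, where the cell would otherwise fail later inside the download call.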

### Expose endpoint via ngrok

#### How to tell when this cell is done:

A button labeled `Copy API Link` should appear in the cell output.

In [11]:
# ngrok setup
!pip3 install pyngrok
from kaggle_secrets import UserSecretsClient
from pyngrok import ngrok
secret_label = "ngrok-auth"
secret_value = UserSecretsClient().get_secret(secret_label)
!ngrok config add-authtoken {secret_value}
tunnel = ngrok.connect(5002)
print("Your remote link is: " + tunnel.public_url)
from IPython.display import HTML
html = f'''
    <div>
    <h4>
    <a href="{tunnel.public_url}/v1" id="api">{tunnel.public_url}/v1</a>
    <button onclick="copyToClipboard('api')">Copy API Link</button>
    </h4>
    </div>
    
    <script>
    function copyToClipboard(copy) {{
        var textToCopy = document.getElementById(copy).innerText;
        var tempTextarea = document.createElement("textarea");
        tempTextarea.value = textToCopy;
        document.body.appendChild(tempTextarea);
        tempTextarea.select();
        tempTextarea.setSelectionRange(0, 99999);
        document.execCommand("copy");
        document.body.removeChild(tempTextarea);
        alert("Copied the text: " + textToCopy);
    }}
    </script>
'''

display(HTML(html))

Collecting pyngrok
  Downloading pyngrok-7.3.0-py3-none-any.whl.metadata (8.1 kB)
Downloading pyngrok-7.3.0-py3-none-any.whl (25 kB)
Installing collected packages: pyngrok
Successfully installed pyngrok-7.3.0
Authtoken saved to configuration file: /root/.config/ngrok/ngrok.yml                                
Your remote link is: https://0e57af551ce7.ngrok-free.app
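
The copied link ends in `/v1`, and the proxy answers `POST` requests at `/v1/chat/completions`. Below is a rough sketch of how a client could prepare such a request using only the standard library. `build_chat_request` and the `example.ngrok-free.app` host are placeholders, and the sketch assumes the proxy supplies the sampling settings itself, so only `messages` and `stream` are sent.

```python
import json
import urllib.request

def build_chat_request(api_base, messages, stream=False):
    """Prepare (but do not send) a POST to <api_base>/chat/completions."""
    url = api_base.rstrip("/") + "/chat/completions"
    body = json.dumps({"messages": messages, "stream": stream}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Flask only parses request.json when the body is declared as JSON
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder link: substitute the one printed by the cell above
req = build_chat_request(
    "https://example.ngrok-free.app/v1",
    [{"role": "user", "content": "Hello!"}],
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req).read()
```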


### Launch Kobold

#### How to tell when this cell is done:

`Please connect to custom endpoint at http://localhost:5001` should appear in the cell output.

In [None]:
# Start Kobold
MODEL = FULLPATH + "/" + FILE
print("Model to load: " + MODEL)

import json as _json
_preset = _INSTRUCT_PRESET_VALUE if '_INSTRUCT_PRESET_VALUE' in globals() else 'chatml'
_tpl = premade_instruct.get(_preset, premade_instruct['chatml'])
with open('instruct.json', 'w') as f:
    f.write(_json.dumps(_tpl, separators=(",", ":")))

_layers = _LAYERS_VALUE if '_LAYERS_VALUE' in globals() else 999

!./koboldcpp {MODEL} --contextsize {_CONTEXT_SIZE_VALUE if '_CONTEXT_SIZE_VALUE' in globals() else 24000} --usecublas 0 1 normal mmq rowsplit --blasbatchsize 512 --flashattention --foreground --gpulayers {_layers} --quiet --threads 999 --blasthreads 999 --nommap --tensor_split 1 1 --skiplauncher --defaultgenamt={_MAX_TOKENS_VALUE if '_MAX_TOKENS_VALUE' in globals() else 2048} --chatcompletionsadapter instruct.json
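
The `{...}` expressions in the launch command fall back to defaults (24000 context, 999 layers, 2048 generated tokens) when the widget variables are missing. Below is a sketch of the same launch written as a Python argument list, which avoids shell-quoting surprises if the model path contains spaces. `build_kobold_args` is a hypothetical helper, and every flag is copied unchanged from the cell above.

```python
import shlex

def build_kobold_args(model_path, context_size=24000, gpu_layers=999, max_tokens=2048):
    """Assemble the KoboldCPP launch arguments as a list of strings."""
    return [
        "./koboldcpp", model_path,
        "--contextsize", str(context_size),
        "--usecublas", "0", "1", "normal", "mmq", "rowsplit",
        "--blasbatchsize", "512",
        "--flashattention",
        "--foreground",
        "--gpulayers", str(gpu_layers),
        "--quiet",
        "--threads", "999",
        "--blasthreads", "999",
        "--nommap",
        "--tensor_split", "1", "1",
        "--skiplauncher",
        f"--defaultgenamt={max_tokens}",
        "--chatcompletionsadapter", "instruct.json",
    ]

# Hypothetical path, chosen to contain a space
args = build_kobold_args("/kaggle/models/my model/file.gguf", context_size=16384)
print(shlex.join(args))  # shell-safe rendering for inspection
# To launch from Python: subprocess.run(args, check=True)
```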