# Phi‑3.5 Fine‑Tuning on Kaggle (T4) — JSONL from `/kaggle/input` or `/kaggle/working`

This Kaggle notebook fine‑tunes **microsoft/Phi‑3.5‑mini‑instruct** using **PEFT LoRA + TRL SFTTrainer** with **T4‑safe settings**.

## Dataset format (JSONL)
Each line:
```json
{"conversations":[{"from":"human","value":"..."},{"from":"gpt","value":"..."}]}
```
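A file can be checked against this layout before uploading. The sketch below is a minimal, standard-library check; the name `is_valid_record` is hypothetical, and it only enforces what the format above shows (a `conversations` list containing at least one non-empty `human` turn and one non-empty `gpt` turn).

```python
import json

def is_valid_record(line: str) -> bool:
    """Return True if a JSONL line holds at least one non-empty human and gpt turn."""
    try:
        rec = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(rec, dict):
        return False
    conv = rec.get("conversations")
    if not isinstance(conv, list):
        return False
    # Collect the roles that carry a non-empty string value.
    roles = {
        m["from"]
        for m in conv
        if isinstance(m, dict)
        and isinstance(m.get("from"), str)
        and isinstance(m.get("value"), str)
        and m["value"].strip()
    }
    return {"human", "gpt"} <= roles

good = '{"conversations":[{"from":"human","value":"Q?"},{"from":"gpt","value":"A."}]}'
bad = '{"conversations":[{"from":"human","value":"Q?"}]}'
print(is_valid_record(good), is_valid_record(bad))
```

Running it over every line of a file and counting the failures gives a quick estimate of how many records the later normalization step is likely to drop.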

## Where to put the dataset
- Recommended: add your JSONL as a Kaggle Dataset → it will appear in `/kaggle/input/<dataset_name>/`
- Or upload/copy it into `/kaggle/working/`

## Important
After the install cell, **Restart Session** (Kaggle requirement) so imports load the correct versions.

In [1]:
# ===== (0) GPU sanity check =====
import torch
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    !nvidia-smi
else:
    print("No GPU detected. Enable GPU in Kaggle Settings.")

CUDA available: True
GPU: Tesla T4
Wed Dec 31 00:20:06 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.172.08             Driver Version: 570.172.08     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   69C    P8             16W /   70W |       3MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla 

## 1) Clean install (fixes mixed Transformers / dependency issues)
Run the next cell once, then **Restart Session**, then continue.

In [2]:
# ===== (1) Hard clean + install pinned compatible versions =====
!pip -q uninstall -y transformers tokenizers huggingface-hub safetensors accelerate datasets peft trl sentencepiece
!pip -q uninstall -y bitsandbytes triton unsloth

!pip -q install --no-cache-dir --force-reinstall \
  "transformers==4.44.2" \
  "tokenizers==0.19.1" \
  "huggingface-hub==0.24.6" \
  "safetensors==0.4.4" \
  "accelerate==0.33.0" \
  "datasets==2.20.0" \
  "peft==0.12.0" \
  "trl==0.9.6" \
  "sentencepiece"

print("Install complete. NOW: Kaggle → Restart Session, then continue.")

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m43.7/43.7 kB[0m [31m3.8 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m40.5/40.5 kB[0m [31m240.9 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m57.7/57.7 kB[0m [31m26.8 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m61.0/61.0 kB[0m [31m240.0 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m91.2/91.2 kB[0m [31m61.8 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m75.1/75.1 kB[0m [31m302.9 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m9.5/9.5 MB[0m [31m127.0 MB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.6/3.6 MB[0m [31m124.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━

## 2) Verify versions (run after restart)

In [3]:
import transformers, tokenizers, datasets, accelerate, peft, trl, huggingface_hub, safetensors
print("transformers:", transformers.__version__)
print("tokenizers:", tokenizers.__version__)
print("datasets:", datasets.__version__)
print("accelerate:", accelerate.__version__)
print("peft:", peft.__version__)
print("trl:", trl.__version__)
print("hf hub:", huggingface_hub.__version__)
print("safetensors:", safetensors.__version__)

transformers: 4.44.2
tokenizers: 0.19.1
datasets: 2.20.0
accelerate: 0.33.0
peft: 0.12.0
trl: 0.9.6
hf hub: 0.24.6
safetensors: 0.4.4
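The printed versions can also be compared against the pins automatically. The following is a sketch using only `importlib.metadata`; the helper name `find_mismatches` is an assumption for illustration, and the `PINS` table simply repeats the versions from the install cell.

```python
from importlib import metadata

# Versions pinned in the install cell above.
PINS = {
    "transformers": "4.44.2",
    "tokenizers": "0.19.1",
    "huggingface-hub": "0.24.6",
    "safetensors": "0.4.4",
    "accelerate": "0.33.0",
    "datasets": "2.20.0",
    "peft": "0.12.0",
    "trl": "0.9.6",
}

def find_mismatches(pins):
    """Map package name -> (expected, installed) for every pin that is not satisfied."""
    bad = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            bad[name] = (expected, installed)
    return bad

mismatches = find_mismatches(PINS)
print("All pins satisfied." if not mismatches else f"Mismatched: {mismatches}")
```

If anything is reported as mismatched after a restart, rerunning the install cell and restarting again is usually the first thing to try.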


## 3) Find JSONL dataset files
Place one or more `*.jsonl` files in:
- `/kaggle/input/<dataset_name>/` (preferred), or
- `/kaggle/working/`

Then run:

In [4]:
from glob import glob

jsonl_files = sorted(glob("/kaggle/working/*.jsonl")) + sorted(glob("/kaggle/input/*/*.jsonl"))
assert jsonl_files, "No .jsonl found. Add your JSONL as a Kaggle Dataset or place it in /kaggle/working."

print("Found JSONL files:")
for f in jsonl_files:
    print(" -", f)

Found JSONL files:
 - /kaggle/input/final-hydraulics-water-ft-dataset/FINAL_hydraulics_water_FT_dataset.jsonl
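The glob patterns above look only one folder deep under `/kaggle/input`. If a dataset nests its files in subfolders, a recursive search may be needed. The sketch below uses a hypothetical helper `find_jsonl` and runs against a temporary directory that stands in for the real Kaggle paths.

```python
import os
import tempfile
from glob import glob

def find_jsonl(root: str) -> list[str]:
    """Return every *.jsonl file under root, at any depth, in sorted order."""
    return sorted(glob(os.path.join(root, "**", "*.jsonl"), recursive=True))

# Demo on a throwaway directory tree (stands in for /kaggle/input).
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "my_dataset", "nested"))
for rel in ("my_dataset/a.jsonl", "my_dataset/nested/b.jsonl", "my_dataset/notes.txt"):
    with open(os.path.join(demo_root, rel), "w") as fh:
        fh.write("{}\n")

found = find_jsonl(demo_root)
print([os.path.relpath(p, demo_root) for p in found])
```

On Kaggle, the call would presumably be `find_jsonl("/kaggle/input")`, with `/kaggle/working` searched the same way.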


## 4) Load + merge JSONL, normalize schema, (optional) dedupe

In [5]:
import re
from datasets import load_dataset, concatenate_datasets

parts = [load_dataset("json", data_files=f, split="train") for f in jsonl_files]
raw = concatenate_datasets(parts) if len(parts) > 1 else parts[0]

print(raw)
print("Columns:", raw.column_names)

def normalize_record(ex):
    conv = ex.get("conversations", [])
    if not isinstance(conv, list):
        return {"_valid": False, "conversations": []}

    human = next((m["value"].strip() for m in conv
                  if isinstance(m, dict) and m.get("from")=="human" and isinstance(m.get("value"), str)), None)
    gpt   = next((m["value"].strip() for m in conv
                  if isinstance(m, dict) and m.get("from")=="gpt" and isinstance(m.get("value"), str)), None)

    if not human or not gpt:
        return {"_valid": False, "conversations": []}

    return {"_valid": True, "conversations":[{"from":"human","value":human},{"from":"gpt","value":gpt}]}

ds = raw.map(normalize_record).filter(lambda x: x["_valid"])
ds = ds.remove_columns([c for c in ds.column_names if c not in ["conversations"]])

print("Valid records:", len(ds))
print("Sample prompt preview:\n", ds[0]["conversations"][0]["value"][:200])

# Optional dedupe by normalized prompt text:
DEDUP = True
if DEDUP:
    def prompt_key(ex):
        p = ex["conversations"][0]["value"].lower()
        p = re.sub(r"[^\w\s]", "", p)
        p = re.sub(r"\s+", " ", p).strip()
        return {"_k": p}
    tmp = ds.map(prompt_key)
    seen = set()
    keep_idx = []
    for i, k in enumerate(tmp["_k"]):
        if k in seen:
            continue
        seen.add(k)
        keep_idx.append(i)
    ds = ds.select(keep_idx)
    print("After dedupe:", len(ds))

Dataset({
    features: ['conversations'],
    num_rows: 68
})
Columns: ['conversations']
Valid records: 68
Sample prompt preview:
 How is water usage factored for schools that operate for 8 hours a day?
After dedupe: 68
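To see which prompts the dedupe key treats as identical, the same three normalization steps (lowercase, strip punctuation, collapse whitespace) can be tried on plain strings. This is a standalone sketch; `prompt_key_text` and `dedupe_keep_first` are illustrative names and the sample prompts are made up.

```python
import re

def prompt_key_text(prompt: str) -> str:
    """Normalize a prompt the same way the dedupe cell does."""
    p = prompt.lower()
    p = re.sub(r"[^\w\s]", "", p)       # drop punctuation
    p = re.sub(r"\s+", " ", p).strip()  # collapse runs of whitespace
    return p

def dedupe_keep_first(prompts):
    """Keep the first prompt seen for each normalized key."""
    seen, kept = set(), []
    for p in prompts:
        k = prompt_key_text(p)
        if k not in seen:
            seen.add(k)
            kept.append(p)
    return kept

prompts = [
    "What is the max distance?",
    "what is  the MAX distance",   # same key as the first prompt
    "What is the min distance?",
]
kept = dedupe_keep_first(prompts)
print(kept)
```

Because only the prompt is keyed, two records with the same question but different answers collapse into one, and the first answer wins.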


## 5) Train / eval split (stratify if conclusion labels exist)

In [6]:
import re

def extract_label(ex):
    text = ex["conversations"][1]["value"]
    m = re.search(r"## Conclusion\s*\(Pass / Fail / Cannot verify\)\s*\n\s*(Pass|Fail|Cannot verify)\b", text)
    return {"label": m.group(1) if m else "Unknown"}

labeled = ds.map(extract_label)
labels = set(labeled.unique("label"))
print("Labels found:", labels)

EVAL_SIZE = 0.10
SEED = 42

try:
    usable = labeled.filter(lambda x: x["label"] != "Unknown")
    if labels.issuperset({"Pass","Fail","Cannot verify"}) and len(usable) >= 50:
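        # stratify_by_column only accepts a ClassLabel column, so encode the string labels first
        usable = usable.class_encode_column("label")
        splits = usable.train_test_split(test_size=EVAL_SIZE, seed=SEED, shuffle=True, stratify_by_column="label")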
    else:
        splits = labeled.train_test_split(test_size=EVAL_SIZE, seed=SEED, shuffle=True)
except Exception as e:
    print("Stratified split not available, using normal split. Reason:", e)
    splits = labeled.train_test_split(test_size=EVAL_SIZE, seed=SEED, shuffle=True)

train_ds = splits["train"]
eval_ds  = splits["test"]

print("Train:", len(train_ds), "Eval:", len(eval_ds))

Labels found: {'Cannot verify', 'Unknown'}
Train: 61 Eval: 7
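The output above shows that most answers fall into `Unknown`, which is what the regex returns whenever the exact conclusion heading is absent. The sketch below exercises the same pattern on made-up answers to show how strict it is; `extract_label_text` is a hypothetical standalone version of `extract_label`.

```python
import re

PATTERN = re.compile(
    r"## Conclusion\s*\(Pass / Fail / Cannot verify\)\s*\n\s*(Pass|Fail|Cannot verify)\b"
)

def extract_label_text(answer: str) -> str:
    """Return the conclusion label, or 'Unknown' if the heading is not matched."""
    m = PATTERN.search(answer)
    return m.group(1) if m else "Unknown"

# Invented sample answers, for illustration only.
with_heading = "## Conclusion (Pass / Fail / Cannot verify)\nCannot verify\nNode pressures are missing."
same_line = "## Conclusion (Pass / Fail / Cannot verify): Pass"
plain_answer = "The hydrant must be located no more than 45 metres from the siamese connection."

for text in (with_heading, same_line, plain_answer):
    print(extract_label_text(text))
```

The label must sit on the line after the heading, so a label placed on the heading line itself is also reported as `Unknown`.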


## 6) Load Phi‑3.5 + build chat-template `text`

In [7]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "microsoft/Phi-3.5-mini-instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,
    device_map="auto",
)

def to_text(ex):
    conv = ex["conversations"]
    msgs = [
        {"role":"user", "content": conv[0]["value"]},
        {"role":"assistant", "content": conv[1]["value"]},
    ]
    return {"text": tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=False)}

train_text = train_ds.map(to_text, remove_columns=train_ds.column_names)
eval_text  = eval_ds.map(to_text,  remove_columns=eval_ds.column_names)

print("Text dataset ready.")
print(train_text[0]["text"][:450])

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Text dataset ready.
<|user|>
What is the maximum distance from a siamese connection to a hydrant?<|end|>
<|assistant|>
The hydrant must be located no more than 45 metres unobstructed from the siamese connection.<|end|>
<|endoftext|>
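The preview above shows the layout each training string ends up with. As a rough sanity check, that layout can be reproduced and validated by hand. This sketch is derived only from the printed preview and is not a substitute for `tokenizer.apply_chat_template`; `render_phi_style` and `looks_well_formed` are hypothetical helpers.

```python
def render_phi_style(user: str, assistant: str) -> str:
    """Build one training string in the layout shown in the preview above."""
    return (
        f"<|user|>\n{user}<|end|>\n"
        f"<|assistant|>\n{assistant}<|end|>\n"
        "<|endoftext|>"
    )

def looks_well_formed(text: str) -> bool:
    """Cheap structural check: one user turn, one assistant turn, ends with the EOS marker."""
    return (
        text.startswith("<|user|>\n")
        and text.count("<|assistant|>\n") == 1
        and text.count("<|end|>") == 2
        and text.rstrip().endswith("<|endoftext|>")
    )

sample = render_phi_style("What is X?", "X is Y.")
print(sample)
print(looks_well_formed(sample))
```

Applying `looks_well_formed` to every row of `train_text` would flag records where the template was not applied as expected.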


## 7) Train (T4‑safe LoRA settings)
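If you see out‑of‑memory:
- set `MAX_SEQ_LEN=768`, and/or
- pass `gradient_checkpointing=True` to `TrainingArguments`

Raising `GRAD_ACCUM` (for example to 24) does not lower peak memory, because the per‑device batch size is already 1; it only enlarges the effective batch.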

In [8]:
from trl import SFTTrainer
from transformers import TrainingArguments
from peft import LoraConfig

OUTPUT_DIR = "/kaggle/working/phi35_lora_out"

# ---- T4-safe defaults ----
MAX_SEQ_LEN = 1024
LORA_R = 8
GRAD_ACCUM = 16
EPOCHS = 3

peft_cfg = LoraConfig(
    r=LORA_R,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
)

args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=GRAD_ACCUM,
    learning_rate=2e-4,
    warmup_ratio=0.03,
    num_train_epochs=EPOCHS,
    logging_steps=10,
    eval_strategy="steps",
    eval_steps=100,
    save_strategy="steps",
    save_steps=100,
    save_total_limit=2,
    report_to="none",
    fp16=True,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_text,
    eval_dataset=eval_text,
    dataset_text_field="text",
    max_seq_length=MAX_SEQ_LEN,
    packing=False,
    peft_config=peft_cfg,
    args=args,
)

trainer.train()

2025-12-31 00:23:18.783653: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1767140598.957108     894 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1767140599.009032     894 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
  self.scaler = torch.cuda.amp.GradScaler(**kwargs)
You are not running the flash-attention implementation, expect numerical differences.


Step,Training Loss,Validation Loss


TrainOutput(global_step=9, training_loss=3.5306890275743275, metrics={'train_runtime': 43.1541, 'train_samples_per_second': 4.241, 'train_steps_per_second': 0.209, 'total_flos': 603757371291648.0, 'train_loss': 3.5306890275743275, 'epoch': 2.360655737704918})
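The loss table above is empty because the run finished before any logging or evaluation step was reached. The arithmetic can be sketched as follows, assuming the Trainer floors the number of batches per epoch by the accumulation steps; that assumption matches the reported `global_step=9` and `epoch` of about 2.36 (9 × 16 / 61). The function name is hypothetical.

```python
import math

def planned_optimizer_steps(num_examples, per_device_batch, grad_accum, epochs):
    """Estimate optimizer steps for a run (assumed formula, see lead-in)."""
    batches_per_epoch = math.ceil(num_examples / per_device_batch)
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return math.ceil(epochs * updates_per_epoch)

# Values taken from the cells above: 61 training rows, batch 1, GRAD_ACCUM=16, EPOCHS=3.
total = planned_optimizer_steps(num_examples=61, per_device_batch=1, grad_accum=16, epochs=3)
print("optimizer steps:", total)

for name, every in {"logging_steps": 10, "eval_steps": 100}.items():
    print(f"{name}={every} is reached {total // every} time(s)")
```

With a dataset this small, values such as `logging_steps=1` and an evaluation interval of a few steps would be needed to see any training or validation loss at all.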

## 8) Save adapter + tokenizer

In [9]:
trainer.model.save_pretrained(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)

print("Saved to:", OUTPUT_DIR)
!ls -lah {OUTPUT_DIR}

Saved to: /kaggle/working/phi35_lora_out
total 20M
drwxr-xr-x 3 root root 4.0K Dec 30 23:14 .
drwxr-xr-x 4 root root 4.0K Dec 30 23:13 ..
-rw-r--r-- 1 root root  733 Dec 31 00:24 adapter_config.json
-rw-r--r-- 1 root root  18M Dec 31 00:24 adapter_model.safetensors
-rw-r--r-- 1 root root  293 Dec 31 00:24 added_tokens.json
drwxr-xr-x 2 root root 4.0K Dec 30 23:14 checkpoint-9
-rw-r--r-- 1 root root 5.0K Dec 31 00:24 README.md
-rw-r--r-- 1 root root  569 Dec 31 00:24 special_tokens_map.json
-rw-r--r-- 1 root root 3.3K Dec 31 00:24 tokenizer_config.json
-rw-r--r-- 1 root root 1.8M Dec 31 00:24 tokenizer.json
-rw-r--r-- 1 root root 489K Dec 31 00:24 tokenizer.model


## 9) Quick inference check

In [10]:
from transformers import pipeline

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device_map="auto")

prompt = (
    "Water distribution review only. "
    "The report provides a hydrant test but no node pressures at the proposed connection. "
    "Provide required resubmission items."
)

out = pipe(prompt, max_new_tokens=220, do_sample=True, temperature=0.7, top_p=0.9)
print(out[0]["generated_text"])

RuntimeError: Failed to import transformers.pipelines because of the following error (look up to see its traceback):
operator torchvision::nms does not exist

In [None]:
import torch

model.eval()

prompt = (
    "Water distribution review only. "
    "The report provides a hydrant test but no node pressures at the proposed connection. "
    "Provide required resubmission items."
)

messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

inputs = inputs.to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        inputs,
        max_new_tokens=220,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))


In [None]:
!pip -q install -U transformers==4.44.2 peft==0.12.0 safetensors sentencepiece huggingface_hub

# llama.cpp conversion requirements
!pip -q install -U numpy sentencepiece


In [None]:
!pip -q install --no-cache-dir --force-reinstall "numpy==1.26.4"
print("Done. Restart Session now.")


In [None]:
import numpy as np
print(np.__version__)


In [None]:
# Minimal, conversion-only environment (avoid TRL/accelerate unless you need them)
!pip -q uninstall -y transformers tokenizers huggingface-hub safetensors peft sentencepiece numpy
!pip -q install --no-cache-dir --force-reinstall \
  "numpy==1.26.4" \
  "transformers==4.44.2" \
  "tokenizers==0.19.1" \
  "huggingface-hub==0.24.6" \
  "safetensors==0.4.4" \
  "peft==0.12.0" \
  "sentencepiece"

print("Install complete. Restart Session now.")


In [11]:
import torch, numpy as np
print("CUDA:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
print("NumPy:", np.__version__)


CUDA: True
GPU: Tesla T4
NumPy: 1.26.4



In [13]:
import numpy, transformers, peft
print("numpy:", numpy.__version__)
print("transformers:", transformers.__version__)
print("peft:", peft.__version__)


numpy: 1.26.4
transformers: 4.44.2
peft: 0.12.0


In [14]:
import os
LORA_DIR = "/kaggle/working/phi35_lora_out"
print("adapter exists:", os.path.exists(os.path.join(LORA_DIR, "adapter_model.safetensors")))
print("config exists :", os.path.exists(os.path.join(LORA_DIR, "adapter_config.json")))


adapter exists: True
config exists : True


In [15]:
import os, torch
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

BASE_MODEL = "microsoft/Phi-3.5-mini-instruct"
BASE_DIR   = "/kaggle/working/phi35_base_hf"
MERGED_DIR = "/kaggle/working/phi35_merged_hf"

# download base once
snapshot_download(repo_id=BASE_MODEL, local_dir=BASE_DIR, local_dir_use_symlinks=False)

tokenizer = AutoTokenizer.from_pretrained(BASE_DIR, use_fast=True)

base = AutoModelForCausalLM.from_pretrained(
    BASE_DIR,
    torch_dtype=torch.float32,
    device_map="cpu",
)

model = PeftModel.from_pretrained(base, LORA_DIR)
merged = model.merge_and_unload()

os.makedirs(MERGED_DIR, exist_ok=True)
merged.save_pretrained(MERGED_DIR, safe_serialization=True)
tokenizer.save_pretrained(MERGED_DIR)

print("Merged HF saved to:", MERGED_DIR)
!ls -lah {MERGED_DIR}


For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.


Fetching 20 files:   0%|          | 0/20 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

SafetensorError: Error while serializing: IoError(Os { code: 28, kind: StorageFull, message: "No space left on device" })

In [1]:
import torch, os, shutil
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
from huggingface_hub import snapshot_download

BASE_MODEL = "microsoft/Phi-3.5-mini-instruct"
BASE_DIR   = "/kaggle/working/phi35_base_hf"
LORA_DIR   = "/kaggle/working/phi35_lora_out"
GGUF_OUT   = "/kaggle/working/phi35_merged.gguf"

# download base
snapshot_download(
    repo_id=BASE_MODEL,
    local_dir=BASE_DIR,
    local_dir_use_symlinks=False
)

tokenizer = AutoTokenizer.from_pretrained(BASE_DIR, use_fast=True)

base = AutoModelForCausalLM.from_pretrained(
    BASE_DIR,
    torch_dtype=torch.float32,
    device_map="cpu",
)

model = PeftModel.from_pretrained(base, LORA_DIR)
merged = model.merge_and_unload()

# --- TEMP save just for conversion ---
TMP_DIR = "/kaggle/working/tmp_merge"
os.makedirs(TMP_DIR, exist_ok=True)

merged.save_pretrained(
    TMP_DIR,
    safe_serialization=False,     # <-- IMPORTANT
    max_shard_size="2GB"          # <-- prevents single large file
)
tokenizer.save_pretrained(TMP_DIR)

# convert to GGUF
!git clone -q https://github.com/ggml-org/llama.cpp.git /kaggle/working/llama.cpp
!python /kaggle/working/llama.cpp/convert_hf_to_gguf.py {TMP_DIR} --outfile {GGUF_OUT}

# cleanup immediately
shutil.rmtree(TMP_DIR)
shutil.rmtree(BASE_DIR)

print("GGUF created:", GGUF_OUT)


For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.


Fetching 20 files:   0%|          | 0/20 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

OSError: [Errno 28] No space left on device: '/kaggle/working/tmp_merge'

In [2]:
!df -h
!du -h --max-depth=2 /kaggle/working | sort -hr | head -n 30


Filesystem                                                              Size  Used Avail Use% Mounted on
overlay                                                                 7.9T  6.5T  1.5T  82% /
tmpfs                                                                    64M     0   64M   0% /dev
shm                                                                      14G  4.0K   14G   1% /dev/shm
/dev/sdb1                                                               122G  112G   11G  92% /opt/bin
/dev/loop1                                                               20G   20G     0 100% /kaggle/lib
192.168.3.2:/data/kagglesdsdata/datasets/9161641/14348447/dfbwdah4v0fy   73T   53T   21T  73% /kaggle/input/final-hydraulics-water-ft-dataset
/dev/mapper/snap                                                        7.9T  6.5T  1.5T  82% /etc/hosts
tmpfs                                                                    16G     0   16G   0% /proc/acpi
tmpfs                               
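Both "No space left on device" failures are consistent with a simple size estimate: the base download is about 7.6 GB (the two shards reported by the download progress bars, 4.97G and 2.67G), and a float32 merged copy needs roughly twice that again. The sketch below does this estimate and checks free space with `shutil.disk_usage`; the helper names and the 10% margin are assumptions.

```python
import shutil

def fp32_size_from_16bit(size_16bit_bytes: float) -> float:
    """A float32 copy needs about twice the bytes of a 16-bit checkpoint."""
    return size_16bit_bytes * 2

def has_room(path: str, needed_bytes: float, margin: float = 1.1) -> bool:
    """True if the filesystem holding `path` has needed_bytes (plus a margin) free."""
    return shutil.disk_usage(path).free >= needed_bytes * margin

base_16bit = (4.97 + 2.67) * 1e9   # shard sizes from the download log, in bytes
merged_fp32 = fp32_size_from_16bit(base_16bit)

print(f"base ≈ {base_16bit / 1e9:.1f} GB, merged fp32 ≈ {merged_fp32 / 1e9:.1f} GB")
print(f"both together ≈ {(base_16bit + merged_fp32) / 1e9:.1f} GB")
print("room in current directory:", has_room(".", merged_fp32))
```

Calling `has_room` on the target directory before `save_pretrained` gives an early warning instead of a failure halfway through serialization.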

In [3]:
# Remove the huge folders filling /kaggle/working
!rm -rf /kaggle/working/phi35_merged_hf
!rm -rf /kaggle/working/phi35_base_hf

# Optional: remove training checkpoints (keep adapter only)
!rm -rf /kaggle/working/phi35_lora_out/checkpoint-*

# Verify space is freed
!df -h
!du -h --max-depth=1 /kaggle/working | sort -hr


Filesystem                                                              Size  Used Avail Use% Mounted on
overlay                                                                 7.9T  6.5T  1.5T  82% /
tmpfs                                                                    64M     0   64M   0% /dev
shm                                                                      14G  4.0K   14G   1% /dev/shm
/dev/sdb1                                                               122G  112G   11G  92% /opt/bin
/dev/loop1                                                               20G   20M   20G   1% /kaggle/lib
192.168.3.2:/data/kagglesdsdata/datasets/9161641/14348447/dfbwdah4v0fy   73T   53T   21T  73% /kaggle/input/final-hydraulics-water-ft-dataset
/dev/mapper/snap                                                        7.9T  6.5T  1.5T  82% /etc/hosts
tmpfs                                                                    16G     0   16G   0% /proc/acpi
tmpfs                               

In [1]:
import os, shutil

# Keep caches out of /kaggle/working.
# Set these before importing huggingface_hub / transformers: both read them at import time.
os.environ["HF_HOME"] = "/kaggle/temp/hf"
os.environ["TRANSFORMERS_CACHE"] = "/kaggle/temp/hf/transformers"

import torch
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# ---- Paths ----
LORA_DIR = "/kaggle/working/phi35_lora_out"
BASE_MODEL = "microsoft/Phi-3.5-mini-instruct"

BASE_DIR = "/kaggle/temp/phi35_base_hf"
TMP_DIR  = "/kaggle/temp/tmp_merge"
GGUF_OUT = "/kaggle/working/phi35_merged.gguf"

# Clean temp dirs
shutil.rmtree(BASE_DIR, ignore_errors=True)
shutil.rmtree(TMP_DIR, ignore_errors=True)

# ---- Download base model (to temp) ----
snapshot_download(repo_id=BASE_MODEL, local_dir=BASE_DIR)

tokenizer = AutoTokenizer.from_pretrained(BASE_DIR, use_fast=True)

base = AutoModelForCausalLM.from_pretrained(
    BASE_DIR,
    torch_dtype=torch.float32,
    device_map="cpu",
)

# ---- Merge LoRA ----
model = PeftModel.from_pretrained(base, LORA_DIR)
merged = model.merge_and_unload()

# ---- Temporary HF save (sharded, non-safetensors) ----
os.makedirs(TMP_DIR, exist_ok=True)
merged.save_pretrained(
    TMP_DIR,
    safe_serialization=False,
    max_shard_size="2GB",
)
tokenizer.save_pretrained(TMP_DIR)

# ---- Convert to GGUF ----
LLAMA_DIR = "/kaggle/temp/llama.cpp"
shutil.rmtree(LLAMA_DIR, ignore_errors=True)
!git clone -q https://github.com/ggml-org/llama.cpp.git {LLAMA_DIR}

!python {LLAMA_DIR}/convert_hf_to_gguf.py {TMP_DIR} --outfile {GGUF_OUT}

print("✅ GGUF created:", GGUF_OUT)
!ls -lah /kaggle/working


2025-12-31 13:36:25.424858: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1767188185.577320      55 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1767188185.625307      55 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1767188186.000970      55 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1767188186.001023      55 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1767188186.001026      55 computation_placer.cc:177] computation placer alr

Fetching 20 files:   0%|          | 0/20 [00:00<?, ?it/s]


model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/2.67G [00:00<?, ?B/s]


`torch_dtype` is deprecated! Use `dtype` instead!


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

ValueError: Can't find 'adapter_config.json' at '/kaggle/working/phi35_lora_out'

In [2]:
!ls -lah /kaggle/working
!ls -lah /kaggle/working/phi35_lora_out || true


total 12K
drwxr-xr-x 3 root root 4.0K Dec 31 13:36 .
drwxr-xr-x 6 root root 4.0K Dec 31 13:36 ..
drwxr-xr-x 2 root root 4.0K Dec 31 13:36 .virtual_documents
ls: cannot access '/kaggle/working/phi35_lora_out': No such file or directory


In [7]:
# Group the two -name tests so that -maxdepth and -type apply to both
!find /kaggle/input -maxdepth 3 -type f \( -name "adapter_model.safetensors" -o -name "adapter_config.json" \)


In [11]:
LORA_DIR = "/kaggle/input/<something>/phi35_lora_out"


In [14]:
! find /kaggle -maxdepth 4 -type f -name "adapter_config.json"
! find /kaggle -maxdepth 4 -type f -name "adapter_model.safetensors"
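The same search can be done from Python, which also makes it easy to require that both adapter files sit in the same folder. This is a sketch with a hypothetical helper `find_adapter_dirs`, demonstrated on a temporary directory tree rather than the real `/kaggle` paths.

```python
import os
import tempfile

REQUIRED = {"adapter_config.json", "adapter_model.safetensors"}

def find_adapter_dirs(root: str) -> list[str]:
    """Return every directory under root that contains both adapter files."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if REQUIRED <= set(filenames):
            hits.append(dirpath)
    return sorted(hits)

# Demo tree: one folder with a complete adapter, one empty folder.
demo_root = tempfile.mkdtemp()
adapter_dir = os.path.join(demo_root, "input", "my-adapter")
os.makedirs(adapter_dir)
for name in REQUIRED:
    open(os.path.join(adapter_dir, name), "w").close()
os.makedirs(os.path.join(demo_root, "working"))

print(find_adapter_dirs(demo_root))
```

On Kaggle, `find_adapter_dirs("/kaggle")` would be the equivalent call, and the first hit could be assigned to `LORA_DIR`.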


In [18]:
!ls -lah /kaggle/input/phi35-lora-adapter


In [22]:
# Plain Python assignment (no leading "!", which would send the line to the shell)
LORA_DIR = "/kaggle/input/phi35-lora-adapter"


In [23]:
import os
print(os.listdir(LORA_DIR))


['adapter_model.safetensors', 'adapter_config.json', 'tokenizer.json']


In [25]:
import os, json
print("LORA_DIR =", LORA_DIR)
print("Exists:", os.path.isdir(LORA_DIR))
print("Has adapter_config:", os.path.exists(os.path.join(LORA_DIR, "adapter_config.json")))

with open(os.path.join(LORA_DIR, "adapter_config.json"), "r") as f:
    cfg = json.load(f)
print("Base model in adapter_config:", cfg.get("base_model_name_or_path"))


LORA_DIR = /kaggle/input/phi35-lora-adapter
Exists: True
Has adapter_config: True
Base model in adapter_config: microsoft/Phi-3.5-mini-instruct


In [1]:
# Example:
LORA_DIR = "/kaggle/input/phi35-lora-adapter"

import os
print("LORA_DIR:", LORA_DIR)
print(os.listdir(LORA_DIR))


LORA_DIR: /kaggle/input/phi35-lora-adapter
['adapter_model.safetensors', 'adapter_config.json', 'tokenizer.json']


In [2]:
import os, shutil

# Keep big downloads + caches out of /kaggle/working.
# Set these before importing huggingface_hub / transformers: both read them at import time.
os.environ["HF_HOME"] = "/kaggle/temp/hf"
os.environ["TRANSFORMERS_CACHE"] = "/kaggle/temp/hf/transformers"

import torch
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel, PeftConfig

BASE_MODEL = "microsoft/Phi-3.5-mini-instruct"

BASE_DIR = "/kaggle/temp/phi35_base_hf"
TMP_DIR  = "/kaggle/temp/tmp_merge"
LLAMA_DIR= "/kaggle/temp/llama.cpp"
GGUF_OUT = "/kaggle/working/phi35_merged.gguf"

# Clean temp
shutil.rmtree(BASE_DIR, ignore_errors=True)
shutil.rmtree(TMP_DIR, ignore_errors=True)
shutil.rmtree(LLAMA_DIR, ignore_errors=True)

# Download base to temp
snapshot_download(repo_id=BASE_MODEL, local_dir=BASE_DIR)

tokenizer = AutoTokenizer.from_pretrained(BASE_DIR, use_fast=True)

# Load base (CPU to minimize GPU RAM risk)
base = AutoModelForCausalLM.from_pretrained(
    BASE_DIR,
    torch_dtype=torch.float32,
    device_map="cpu",
)

# Load adapter config explicitly (prevents PEFT treating path as HF repo)
peft_cfg = PeftConfig.from_pretrained(LORA_DIR)

# Merge LoRA
model = PeftModel.from_pretrained(base, LORA_DIR, config=peft_cfg, is_trainable=False)
merged = model.merge_and_unload()

# Save merged temporarily (sharded, no safetensors to reduce failure risk)
os.makedirs(TMP_DIR, exist_ok=True)
merged.save_pretrained(
    TMP_DIR,
    safe_serialization=False,
    max_shard_size="2GB",
)
tokenizer.save_pretrained(TMP_DIR)

# Convert to GGUF
!git clone -q https://github.com/ggml-org/llama.cpp.git {LLAMA_DIR}
!python {LLAMA_DIR}/convert_hf_to_gguf.py {TMP_DIR} --outfile {GGUF_OUT}

print("✅ GGUF created:", GGUF_OUT)
!ls -lah /kaggle/working/*.gguf


2025-12-31 14:34:13.575624: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1767191653.764551      55 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1767191653.822667      55 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1767191654.279338      55 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1767191654.279387      55 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1767191654.279392      55 computation_placer.cc:177] computation placer alr

Fetching 20 files:   0%|          | 0/20 [00:00<?, ?it/s]


model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/2.67G [00:00<?, ?B/s]


`torch_dtype` is deprecated! Use `dtype` instead!


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

INFO:hf-to-gguf:Loading model: tmp_merge
INFO:hf-to-gguf:Model architecture: Phi3ForCausalLM
INFO:hf-to-gguf:gguf: loading model weight map from 'pytorch_model.bin.index.json'
INFO:hf-to-gguf:gguf: indexing model part 'pytorch_model-00001-of-00009.bin'
INFO:hf-to-gguf:gguf: indexing model part 'pytorch_model-00002-of-00009.bin'
INFO:hf-to-gguf:gguf: indexing model part 'pytorch_model-00003-of-00009.bin'
INFO:hf-to-gguf:gguf: indexing model part 'pytorch_model-00004-of-00009.bin'
INFO:hf-to-gguf:gguf: indexing model part 'pytorch_model-00005-of-00009.bin'
INFO:hf-to-gguf:gguf: indexing model part 'pytorch_model-00006-of-00009.bin'
INFO:hf-to-gguf:gguf: indexing model part 'pytorch_model-00007-of-00009.bin'
INFO:hf-to-gguf:gguf: indexing model part 'pytorch_model-00008-of-00009.bin'
INFO:hf-to-gguf:gguf: indexing model part 'pytorch_model-00009-of-00009.bin'
INFO:hf-to-gguf:heuristics unable to detect tensor dtype, defaulting to --outtype f16
INFO:gguf.gguf_writer:gguf: This GGUF file is
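
Once conversion finishes, a quick header check can confirm that the output really is a GGUF file. GGUF files begin with the 4-byte magic `GGUF` followed by a 32-bit version number. The sketch below assumes a little-endian file and uses a hypothetical helper `read_gguf_version`, demonstrated on a small fake file instead of the real output.

```python
import os
import struct
import tempfile

def read_gguf_version(path: str):
    """Return the GGUF version if the file starts with the GGUF magic, else None."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if len(head) < 8 or head[:4] != b"GGUF":
        return None
    (version,) = struct.unpack("<I", head[4:8])  # assumes little-endian
    return version

# Demo with stand-in files; on Kaggle the path would be GGUF_OUT.
tmp_dir = tempfile.mkdtemp()
fake_gguf = os.path.join(tmp_dir, "fake.gguf")
with open(fake_gguf, "wb") as fh:
    fh.write(b"GGUF" + struct.pack("<I", 3))

not_gguf = os.path.join(tmp_dir, "other.bin")
with open(not_gguf, "wb") as fh:
    fh.write(b"PK\x03\x04 some other format")

print(read_gguf_version(fake_gguf), read_gguf_version(not_gguf))
```

A `None` result, or a file far smaller than expected, suggests the conversion was interrupted, for example by running out of disk space.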