# A SONG OF LLMS AND CHATBOTS

### The Winds of Winter (AI-generated fan fiction)

George R.R. Martin claims that his journey of writing 'The Winds of Winter', now more than a decade long, is finally coming to an end. As a fellow Game of Thrones fan, I put together this notebook to show some love: it fine-tunes a small language model on the published books and asks it to continue the story.

We begin by cloning a repository of e-books and installing the libraries we need.

In [1]:
!git clone https://github.com/stepbasin/books.git



In [2]:
!pip install transformers accelerate datasets peft trl bitsandbytes



In [3]:
!pip install ebooklib beautifulsoup4



In [4]:
!pip install ipywidgets
!jupyter nbextension enable --py widgetsnbextension

Enabling notebook extension jupyter-js-widgets/extension...
      - Validating: OK


In [5]:
!pip install huggingface_hub
from huggingface_hub import login

login()





In [6]:
import re, json, glob, random
from ebooklib import epub
from bs4 import BeautifulSoup

In [7]:
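import ebooklib  # provides the ITEM_DOCUMENT constant (== 9)

def extract_text_from_epub(epub_path):
    """Return the plain text of every substantial HTML document in an EPUB."""
    book = epub.read_epub(epub_path)
    chapters = []
    for item in book.get_items():
        print(item)  # debug: list every item (images, HTML pages, ...) in the EPUB
        if item.get_type() == ebooklib.ITEM_DOCUMENT:  # XHTML content pages only
            soup = BeautifulSoup(item.get_body_content(), "html.parser")
            text = soup.get_text(separator=" ")
            text = re.sub(r'\s+', ' ', text).strip()  # collapse all whitespace
            if len(text) > 200:  # skip near-empty pages (e.g. cover or title pages)
                chapters.append(text)
    return chapters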
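The cleanup inside `extract_text_from_epub` can be tried without an EPUB file. The sketch below is stdlib-only: it swaps BeautifulSoup for Python's built-in `html.parser.HTMLParser` (a substitution for illustration, not what the notebook uses) and then applies the same two steps as the function above, collapsing whitespace and skipping pages of 200 characters or fewer. `html_to_text` and `clean_pages` are hypothetical helper names.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes, roughly like BeautifulSoup's get_text(separator=" ")."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = _TextCollector()
    parser.feed(html)
    parser.close()  # flush any buffered text
    text = " ".join(parser.parts)
    # Collapse runs of whitespace (newlines, tabs, repeated spaces) into one space.
    return re.sub(r"\s+", " ", text).strip()


def clean_pages(html_pages, min_chars=200):
    """Return cleaned text for each page, skipping near-empty ones."""
    pages = []
    for html in html_pages:
        text = html_to_text(html)
        if len(text) > min_chars:
            pages.append(text)
    return pages


title_page = "<h1>A Game of Thrones</h1>"
chapter = "<h2>Prologue</h2>\n<p>" + "We should start back. " * 20 + "</p>"
pages = clean_pages([title_page, chapter])
print(len(pages))      # the short title page is dropped
print(pages[0][:30])
```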

In [8]:
# extracting chapters from the book

chapters = extract_text_from_epub('./books/books/George R. R. Martin/A Game Of Thrones.epub')

<EpubImage:added1:00001.jpg>
<EpubImage:added2:00002.jpg>
<EpubImage:cover:cover.jpeg>
<EpubHtml:html1:game-of-thrones-00.html>
<EpubHtml:html2:game-of-thrones-01.html>
<EpubHtml:html3:game-of-thrones-02.html>
<EpubHtml:html4:game-of-thrones-03.html>
<EpubHtml:html5:game-of-thrones-04.html>
<EpubHtml:html6:game-of-thrones-05.html>
<EpubHtml:html7:game-of-thrones-06.html>
<EpubHtml:html8:game-of-thrones-07.html>
<EpubHtml:html9:game-of-thrones-08.html>
<EpubHtml:html10:game-of-thrones-09.html>
<EpubHtml:html11:game-of-thrones-10.html>
<EpubHtml:html12:game-of-thrones-11.html>
<EpubHtml:html13:game-of-thrones-12.html>
<EpubHtml:html14:game-of-thrones-13.html>
<EpubHtml:html15:game-of-thrones-14.html>
<EpubHtml:html16:game-of-thrones-15.html>
<EpubHtml:html17:game-of-thrones-16.html>
<EpubHtml:html18:game-of-thrones-17.html>
<EpubHtml:html19:game-of-thrones-18.html>
<EpubHtml:html20:game-of-thrones-19.html>
<EpubHtml:html21:game-of-thrones-20.html>
<EpubHtml:html22:game-of-thrones-21.html

In [9]:
def chunk_text(text, max_words=700):
    words = text.split()
    return [" ".join(words[i:i+max_words]) for i in range(0, len(words), max_words)]
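To make the splitting rule concrete, here is a self-contained check of `chunk_text` (copied from the cell above) on a synthetic 1,500-word text. It yields two full 700-word chunks and a 100-word remainder; the dataset loop keeps such short tail chunks as separate examples.

```python
def chunk_text(text, max_words=700):
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


# A synthetic "chapter" of 1,500 words: w0 w1 ... w1499
text = " ".join(f"w{i}" for i in range(1500))
chunks = chunk_text(text)

sizes = [len(c.split()) for c in chunks]
print(sizes)  # [700, 700, 100]

# No words are lost or duplicated: re-joining the chunks restores the text.
assert " ".join(chunks) == text
```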


In [10]:
# Test: split the first chapter into chunks of at most max_words (700 by default) words
chunk_text(chapters[0])

['Prologue “We should start back,” Gared urged as the woods began to grow dark around them. “The wildlings are dead.” “Do the dead frighten you?” Ser Waymar Royce asked with just the hint of a smile. Gared did not rise to the bait. He was an old man, past fifty, and he had seen the lordlings come and go. “Dead is dead,” he said. “We have no business with the dead.” “Are they dead?” Royce asked softly. “What proof have we?” “Will saw them,” Gared said. “If he says they are dead, that’s proof enough for me.” Will had known they would drag him into the quarrel sooner or later. He wished it had been later rather than sooner. “My mother told me that dead men sing no songs,” he put in. “My wet nurse said the same thing, Will,” Royce replied. “Never believe anything you hear at a woman’s tit. There are things to be learned even from the dead.” His voice echoed, too loud in the twilit forest. “We have a long ride before us,” Gared pointed

With both helper functions working, we can build the dataset from every EPUB in the George R. R. Martin folder:

In [11]:
dataset = []

for epub_path in glob.glob("./books/books/George R. R. Martin/*.epub"):
    print(epub_path)
    book_name = epub_path.split("/")[-1].replace(".epub", "")
    chapters = extract_text_from_epub(epub_path)
    for chap_id, chap in enumerate(chapters):
        chunks = chunk_text(chap)
        for idx, chunk in enumerate(chunks):
            dataset.append({
                "book": book_name,
                "chapter": chap_id,
                "chunk": idx,
                "text": chunk
            })

# Shuffle for better training
random.shuffle(dataset)

# Write to JSONL
with open("asoiaf_dataset.jsonl", "w", encoding="utf-8") as f:
    for row in dataset:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")

print(f"✅ Dataset created with {len(dataset)} chunks!")

./books/books/George R. R. Martin/A Clash of Kings.epub
<EpubImage:added1:00001.jpg>
<EpubImage:added2:00002.jpg>
<EpubImage:added3:00003.jpg>
<EpubHtml:html1:clash-of-kings-00.html>
<EpubHtml:html2:clash-of-kings-01.html>
<EpubHtml:html3:clash-of-kings-02.html>
<EpubHtml:html4:clash-of-kings-03.html>
<EpubHtml:html5:clash-of-kings-04.html>
<EpubHtml:html6:clash-of-kings-05.html>
<EpubHtml:html7:clash-of-kings-06.html>
<EpubHtml:html8:clash-of-kings-07.html>
<EpubHtml:html9:clash-of-kings-08.html>
<EpubHtml:html10:clash-of-kings-09.html>
<EpubHtml:html11:clash-of-kings-10.html>
<EpubHtml:html12:clash-of-kings-11.html>
<EpubHtml:html13:clash-of-kings-12.html>
<EpubHtml:html14:clash-of-kings-13.html>
<EpubHtml:html15:clash-of-kings-14.html>
<EpubHtml:html16:clash-of-kings-15.html>
<EpubHtml:html17:clash-of-kings-16.html>
<EpubHtml:html18:clash-of-kings-17.html>
<EpubHtml:html19:clash-of-kings-18.html>
<EpubHtml:html20:clash-of-kings-19.html>
<EpubHtml:html21:clash-of-kings-20.html>
<Epub
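The dataset is a list of dicts written one JSON object per line (JSONL). Below is a minimal sketch of that round trip using a few made-up rows, with one change labeled as an assumption: the shuffle uses a fixed seed, so the order (and any train/validation split taken from it later) is reproducible across runs. The notebook itself shuffles without a seed.

```python
import json
import random
import tempfile
from pathlib import Path

# Made-up rows with the same schema as the notebook's dataset entries.
original = [
    {"book": "A Game Of Thrones", "chapter": c, "chunk": k, "text": f"“Chapter {c}, chunk {k}”"}
    for c in range(3)
    for k in range(2)
]

# Assumption: shuffle with a fixed seed so the order is identical on every run.
# random.Random(42) is a private generator, so the global random state is untouched.
rows = list(original)
random.Random(42).shuffle(rows)

path = Path(tempfile.mkdtemp()) / "asoiaf_dataset.jsonl"
with open(path, "w", encoding="utf-8") as f:
    for row in rows:
        # ensure_ascii=False writes the curly quotes as-is instead of \u escapes.
        f.write(json.dumps(row, ensure_ascii=False) + "\n")

with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]

print(len(loaded), loaded == rows)
print(path.read_text(encoding="utf-8").splitlines()[0])
```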

In [12]:
# Peek at the shuffled dataset. Printing the whole list with print(dataset)
# floods the notebook and trips Jupyter's "IOPub data rate exceeded" limit,
# so print only its size and one example.
print(len(dataset))
print(dataset[0])



In [13]:
with open("asoiaf_dataset.jsonl", "r", encoding="utf-8") as f:
    lines = [json.loads(l) for l in f]

sft_dataset = []
for row in lines:
    pov_hint = f" from {row['book']}, chapter {row['chapter']}"  # optional; "chapter" is the index of the extracted document, not the book's own chapter number
    sft_dataset.append({
        "instruction": f"Continue the story{pov_hint}.",
        "output": row["text"]
    })

with open("asoiaf_sft_dataset.jsonl", "w", encoding="utf-8") as f:
    for row in sft_dataset:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
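The training cell further down loads `asoiaf_dataset.jsonl` and trains on its `text` column, so the `asoiaf_sft_dataset.jsonl` file written here is not actually used. If you want to train on these instruction rows with the same setup, one simple option (an assumption, not what the notebook does) is to fold each pair into a single `text` string. `to_text_row` is a hypothetical helper:

```python
import json


def to_text_row(row, separator="\n\n"):
    """Join an instruction/output pair into one "text" string for plain SFT."""
    return {"text": row["instruction"] + separator + row["output"]}


# One example row in the notebook's instruction/output layout.
sft_rows = [
    {
        "instruction": "Continue the story from A Dance With Dragons, chapter 38.",
        "output": "DAENERYS The stench of the camp was so appalling...",
    },
]

text_rows = [to_text_row(r) for r in sft_rows]
for r in text_rows:
    print(json.dumps(r, ensure_ascii=False))
```

With this layout the loss is computed on the instruction tokens as well as on the story text; masking the instruction out would need a prompt/completion dataset format, whose support depends on your TRL version.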

In [14]:
import json

with open("asoiaf_dataset.jsonl", "r", encoding="utf-8") as f:
    for i in range(3):
        print(json.loads(f.readline()))


{'book': 'A Dance With Dragons', 'chapter': 38, 'chunk': 0, 'text': 'DAENERYS The stench of the camp was so appalling it was all that Dany could do not to gag. Ser Barristan wrinkled up his nose, and said, “Your Grace should not be here, breathing these black humors.” “I am the blood of the dragon,” Dany reminded him. “Have you ever seen a dragon with the flux?” Viserys had oft claimed that Targaryens were untroubled by the pestilences that afflicted common men, and so far as she could tell, it was true. She could remember being cold and hungry and afraid, but never sick. “Even so,” the old knight said, “I would feel better if Your Grace would return to the city.” The many-colored brick walls of Meereen were half a mile back. “The bloody flux has been the bane of every army since the Dawn Age. Let us distribute the food, Your Grace.” “On the morrow. I am here now. I want to see.” She put her heels into her silver. The others trotted after her. Jhogo rode bef

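We have the dataset we need. Now for the training: we load Phi-3-mini in 4-bit precision and attach small LoRA adapters to it (the QLoRA recipe), so that less than one percent of the weights are trained and everything fits on a single Tesla T4. Let us connect the dots.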
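Why load the model in 4-bit? A back-of-the-envelope estimate: Phi-3-mini has about 3.82 billion parameters, so its weights alone take 4 bytes each in float32, 2 in float16 and about half a byte in 4-bit NF4. The sketch ignores activations, optimizer state, quantization constants and the layers bitsandbytes leaves unquantized, so treat the numbers as lower bounds. For comparison, the Tesla T4 used here has 16 GB (about 14.9 GiB) of memory.

```python
# Rough weight-memory estimate for Phi-3-mini (about 3.82B parameters).
# Ignores activations, optimizer state, LoRA weights, quantization constants,
# and the layers bitsandbytes leaves unquantized.
N_PARAMS = 3_821_079_552
GIB = 1024 ** 3

bytes_per_param = {"float32": 4.0, "float16": 2.0, "nf4 (4-bit)": 0.5}

estimates = {dtype: N_PARAMS * nbytes / GIB for dtype, nbytes in bytes_per_param.items()}
for dtype, gib in estimates.items():
    print(f"{dtype:>12}: {gib:5.2f} GiB of weights")
```

In float32 the weights alone would nearly fill the T4, while in 4-bit they leave most of the card free for activations and the LoRA optimizer state.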

In [15]:
import gc
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, BitsAndBytesConfig
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer

# Clear GPU memory first
torch.cuda.empty_cache()
gc.collect()

# Memory-efficient quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16
)

MODEL_ID = "microsoft/Phi-3-mini-4k-instruct"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the JSONL dataset (not streamed; the file is small enough to load in full)
try:
    dataset = load_dataset("json", data_files="asoiaf_dataset.jsonl", split="train")
    print(f"Dataset size: {len(dataset)}")
except Exception as e:
    print(f"Error loading dataset: {e}")
    # Fall back to a one-row dummy dataset for testing. load_dataset("json", ...)
    # expects file paths, so build the fallback in memory instead.
    from datasets import Dataset
    dataset = Dataset.from_list([{"text": "This is a test example."}])

# Load model with aggressive memory optimization
print("Loading model...")
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True
)

# Prepare model for k-bit training
model = prepare_model_for_kbit_training(model)

# Conservative LoRA config to save memory
peft_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=8,  # LoRA rank, kept small to save memory
    lora_alpha=16,  # scaling factor; adapter updates are scaled by lora_alpha / r = 2
    lora_dropout=0.1,
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],  # Phi-3's fused attention and MLP projections
    bias="none"
)

# Apply PEFT
model = get_peft_model(model, peft_config)

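# Print trainable parameters.
# Note: bitsandbytes packs 4-bit weights two per byte, so numel() undercounts the
# quantized layers and "All params" comes out well below Phi-3-mini's ~3.8B.
# PEFT's built-in model.print_trainable_parameters() corrects for this.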
def print_trainable_parameters(model):
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(f"Trainable params: {trainable_params:,} || All params: {all_param:,} || Trainable%: {100 * trainable_params / all_param:.2f}%")

print_trainable_parameters(model)

trainables = [n for n, p in model.named_parameters() if p.requires_grad]
print(f"Trainable modules ({len(trainables)}):\n", trainables)

Generating train split: 2931 examples [00:00, 25248.00 examples/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00,  1.82s/it]

Dataset size: 2931
Loading model...
Trainable params: 12,582,912 || All params: 2,021,723,136 || Trainable%: 0.62%
Trainable modules (256):
 ['base_model.model.model.layers.0.self_attn.o_proj.lora_A.default.weight', 'base_model.model.model.layers.0.self_attn.o_proj.lora_B.default.weight', 'base_model.model.model.layers.0.self_attn.qkv_proj.lora_A.default.weight', 'base_model.model.model.layers.0.self_attn.qkv_proj.lora_B.default.weight', 'base_model.model.model.layers.0.mlp.gate_up_proj.lora_A.default.weight', 'base_model.model.model.layers.0.mlp.gate_up_proj.lora_B.default.weight', 'base_model.model.model.layers.0.mlp.down_proj.lora_A.default.weight', 'base_model.model.model.layers.0.mlp.down_proj.lora_B.default.weight', 'base_model.model.model.layers.1.self_attn.o_proj.lora_A.default.weight', 'base_model.model.model.layers.1.self_attn.o_proj.lora_B.default.weight', 'base_model.model.model.layers.1.self_attn.qkv_proj.lora_A.default.weight', 'base_model.model.model.layers.1.self_attn.q
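The parameter counts printed above can be reproduced by hand. This sketch assumes Phi-3-mini's published configuration (hidden size 3072, MLP intermediate size 8192, 32 decoder layers, a 32,064-token vocabulary, untied input and output embeddings) plus the notebook's LoRA settings (r=8 on `qkv_proj`, `o_proj`, `gate_up_proj` and `down_proj`). LoRA adds `r * (d_in + d_out)` parameters to each adapted linear layer. The "All params" figure lands near 2.0B instead of 3.8B because bitsandbytes stores two 4-bit weights per byte, so `numel()` reports half the real count for the quantized linear layers; the sketch assumes the embeddings, `lm_head` and norms stay unquantized.

```python
# Phi-3-mini-4k configuration (assumed from the published model config).
HIDDEN = 3072
INTERMEDIATE = 8192
LAYERS = 32
VOCAB = 32064
R = 8  # LoRA rank used in the notebook

# (d_in, d_out) of each linear layer targeted by LoRA, per decoder layer.
linear_shapes = {
    "qkv_proj": (HIDDEN, 3 * HIDDEN),            # fused query/key/value
    "o_proj": (HIDDEN, HIDDEN),
    "gate_up_proj": (HIDDEN, 2 * INTERMEDIATE),  # fused gate/up
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# LoRA adds A (r x d_in) and B (d_out x r) to every targeted layer.
lora_params = LAYERS * sum(R * (d_in + d_out) for d_in, d_out in linear_shapes.values())

# Full-precision parameter count of the base model.
linear_params = LAYERS * sum(d_in * d_out for d_in, d_out in linear_shapes.values())
embed_and_head = 2 * VOCAB * HIDDEN   # input embeddings + lm_head (untied)
norms = (2 * LAYERS + 1) * HIDDEN     # two RMSNorms per layer + final norm
base_params = linear_params + embed_and_head + norms

# What numel() sees after 4-bit loading: quantized linear weights are packed
# two per byte, so they appear at half their real size.
reported_all = linear_params // 2 + embed_and_head + norms + lora_params

print(f"LoRA params:    {lora_params:,}")
print(f"Base params:    {base_params:,}")
print(f"Reported 'All': {reported_all:,}")
print(f"Trainable%:     {100 * lora_params / reported_all:.2f}%")
```

Under these assumptions the script reproduces both printed numbers (12,582,912 trainable and 2,021,723,136 "All params"), a handy sanity check that LoRA was attached to exactly the intended layers.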

In [16]:
import torch
print("CUDA available:", torch.cuda.is_available())
print("CUDA device count:", torch.cuda.device_count())
if torch.cuda.is_available():
    print("Current device:", torch.cuda.current_device())
    print("Device name:", torch.cuda.get_device_name(torch.cuda.current_device()))

print(next(model.parameters()).device)

CUDA available: True
CUDA device count: 1
Current device: 0
Device name: Tesla T4
cuda:0


In [17]:
print(model.hf_device_map)  # shows where each layer lives

{'': 0}


In [19]:
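# Fine-tune with TRL's SFTTrainer (it trains on the dataset's "text" column by default)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=TrainingArguments(
        output_dir="./trainer_output",  # adapter and per-epoch checkpoints go here
        num_train_epochs=5,
        save_strategy="epoch",
        logging_steps=10,
        bf16=True,  # note: the T4 has no native bf16 support; fp16=True would match bnb_4bit_compute_dtype
        report_to=[],
    ),
)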

print("🔥 Starting training...")
print("=" * 60)

try:
    trainer.train()
    print("=" * 60)
    print("🎉 Training completed successfully!")
except Exception as e:
    print(f"❌ Training error: {e}")
    print("🔄 Attempting to save current progress...")
    try:
        trainer.save_model()
        print("💾 Progress saved!")
    except Exception as save_e:
        print(f"❌ Could not save progress: {save_e}")

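# Save the LoRA adapter and the tokenizer together in the trainer's output directory
print("💾 Saving final model...")
try:
    trainer.save_model()  # writes the adapter to trainer.args.output_dir
    tokenizer.save_pretrained(trainer.args.output_dir)
    print(f"✅ Model saved successfully to {trainer.args.output_dir}/")
except Exception as e:
    print(f"❌ Save error: {e}")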

# Clean up memory
print("🧹 Cleaning up memory...")
try:
    #del trainer
    #del model
    torch.cuda.empty_cache()
    gc.collect()
    print("✅ Memory cleanup completed!")
except Exception as e:
    print(f"⚠️ Cleanup error: {e}")

print("🏁 Process completed!")

# Show GPU memory usage if available
if torch.cuda.is_available():
    memory_allocated = torch.cuda.memory_allocated() / 1024**3  # GB
    memory_reserved = torch.cuda.memory_reserved() / 1024**3   # GB
    print(f"🖥️ GPU Memory - Allocated: {memory_allocated:.2f}GB, Reserved: {memory_reserved:.2f}GB")

Adding EOS to train dataset: 100%|██████████| 2931/2931 [00:00<00:00, 14589.34 examples/s]
Tokenizing train dataset: 100%|██████████| 2931/2931 [00:05<00:00, 543.76 examples/s]
You are not running the flash-attention implementation, expect numerical differences.


🔥 Starting training...


| Step | Training loss |
|-----:|--------------:|
| 10 | 2.7803 |
| 20 | 2.753 |
| 30 | 2.7514 |
| 40 | 2.735 |
| 50 | 2.7504 |
| 60 | 2.725 |
| 70 | 2.724 |
| 80 | 2.7114 |
| 90 | 2.7251 |
| 100 | 2.7489 |
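Two quick calculations help read this log, both under stated assumptions. The cell relies on `TrainingArguments` defaults; assuming a per-device batch size of 8, no gradient accumulation and one GPU, 2,931 examples make 367 optimizer steps per epoch, or 1,835 steps for five epochs, so the log shown stops at step 100, roughly a quarter of the way through the first epoch. The logged loss is the mean per-token cross-entropy in nats, so `exp(loss)` gives the perplexity, which makes it easier to see how little the loss moved.

```python
import math

NUM_EXAMPLES = 2931  # "Dataset size" printed earlier in the notebook
BATCH_SIZE = 8       # assumption: TrainingArguments default per_device_train_batch_size
GRAD_ACCUM = 1       # assumption: default gradient_accumulation_steps
NUM_GPUS = 1         # a single Tesla T4
EPOCHS = 5

# The last, partial batch still counts as a step (drop_last is off by default).
steps_per_epoch = math.ceil(NUM_EXAMPLES / (BATCH_SIZE * GRAD_ACCUM * NUM_GPUS))
total_steps = steps_per_epoch * EPOCHS
print(f"{steps_per_epoch} steps per epoch, {total_steps} steps in total")

# A few (step, loss) pairs copied from the training log above.
log = [(10, 2.7803), (50, 2.7504), (100, 2.7489)]
for step, loss in log:
    print(f"step {step:>3} (epoch {step / steps_per_epoch:.2f}): "
          f"loss {loss:.4f}, perplexity {math.exp(loss):.2f}")

# Relative drop in perplexity between the first and last logged steps.
first_loss, last_loss = log[0][1], log[-1][1]
drop_pct = 100 * (1 - math.exp(last_loss - first_loss))
print(f"Perplexity dropped by {drop_pct:.1f}%")
```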


In [3]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
from peft import PeftModel

MODEL_ID = "microsoft/Phi-3-mini-4k-instruct"
LORA_PATH = "./trainer_output"  # path to your LoRA adapter

# 1. Load the 4-bit base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype="float16"
)

base_model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto"
)

# 2. Load the LoRA adapter on top
model = PeftModel.from_pretrained(base_model, LORA_PATH)

Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00,  1.78s/it]


Finally, we get down to generating the text:

In [7]:
# 3. Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token  # important for generation

# 4. Create generation pipeline
gen = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto"
)

# 5. Generate text (the prompt plus max_new_tokens must fit in the model's 4,096-token context)
prompt = "Write a new Jon Snow POV chapter immediately after A Dance with Dragons."
output = gen(prompt, max_new_tokens=1000, temperature=0.8, do_sample=True)
print(output[0]["generated_text"])

Write a new Jon Snow POV chapter immediately after A Dance with Dragons.

- As the icy winds of winter blew across the Wall, Jon Snow felt a sense of unease. He had been summoned by the Lord Commander's envoy to discuss the troubling reports from the Night' house.


The Night's House was a shadowy place, notorious for its strange occurrences. The Lord Commander, Ser Alliser Thorne, had summoned Jon to confront the new Lord Commander, Lord Commander Mormont.


Jon approached the large stone building, feeling the chill of the winter air. He could feel his muscles ripple with every gust of wind, knowing that the cold had a way of tightening his senses.


As he entered the building, the first thing he noticed was the warmth. The fire in the hearth was bright and inviting, drawing him closer. He hoped that the Lord Commander would be there soon, for he had work to do.


The moment Jon stepped into the hall, he was struck by the familiar faces of the Night's folk. They were gathered around a
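Phi-3-mini-4k has a 4,096-token context window, shared by the prompt and the generated tokens, so any `max_new_tokens` beyond what the prompt leaves free cannot be used. A minimal sketch of a hypothetical helper that caps the request to the remaining room:

```python
CONTEXT_WINDOW = 4096  # Phi-3-mini-4k-instruct


def cap_new_tokens(prompt_tokens: int, requested: int, context: int = CONTEXT_WINDOW) -> int:
    """Hypothetical helper: limit max_new_tokens to the room left after the prompt."""
    room = context - prompt_tokens
    if room <= 0:
        raise ValueError(f"prompt ({prompt_tokens} tokens) already fills the {context}-token window")
    return min(requested, room)


# Illustrative numbers, not measured prompt lengths.
print(cap_new_tokens(20, 10_000))  # capped to the remaining room
print(cap_new_tokens(200, 1_000))  # fits, so unchanged
```

In the notebook the prompt length could come from the tokenizer, for example `max_new_tokens=cap_new_tokens(len(tokenizer(prompt).input_ids), 1000)`.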

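The result above is fluent but garbles the lore: Alliser Thorne is made Lord Commander, Mormont is called the "new" Lord Commander, and the Night's Watch turns into a "Night's House". A one-line prompt leaves the model too much room to wander. Let us try again with a much more detailed prompt: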

In [11]:
# Reuse the tokenizer and `gen` pipeline from the previous cell; only the prompt changes
prompt = "Write a new chapter from A Song of Ice and Fire, immediately following the events of A Dance with Dragons. The point-of-view character is Jon Snow, who has just been stabbed by his sworn brothers at Castle Black. The chapter should explore Jon’s descent into unconsciousness, blurring the line between death and dream. Use vivid, symbolic imagery and haunting memories to reflect his inner turmoil, identity, and connection to Ghost. Incorporate prophetic visions, fragments of dialogue, and the cold presence of the Wall. The tone must be somber, lyrical, and steeped in mystery. Emulate George R. R. Martin’s exact narrative style—his sentence structure, pacing, and use of internal monologue. Avoid fan-service or resolution; this chapter should feel like a bridge between death and transformation."
output = gen(prompt, max_new_tokens=1000, temperature=0.2, do_sample=True)
print(output[0]["generated_text"])


Write a new chapter from A Song of Ice and Fire, immediately following the events of A Dance with Dragons. The point-of-view character is Jon Snow, who has just been stabbed by his sworn brothers at Castle Black. The chapter should explore Jon’s descent into unconsciousness, blurring the line between death and dream. Use vivid, symbolic imagery and haunting memories to reflect his inner turmoil, identity, and connection to Ghost. Incorporate prophetic visions, fragments of dialogue, and the cold presence of the Wall. The tone must be somber, lyrical, and steeped in mystery. Emulate George R. R. Martin’s exact narrative style—his sentence structure, pacing, and use of internal monologue. Avoid fan-service or resolution; this chapter should feel like a bridge between death and transformation.

Chapter 12: The Edge of the Wall

The cold air bit at Jon Snow's flesh, a cruel reminder of the harsh world beyond the safety of Castle Black. The night was a deep, unyielding black, broken o

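This prompt was probably too long. Let us try again with a shorter one. A longer prompt means more tokens for the model to attend to and more constraints to satisfy at once, and a small model such as Phi-3-mini (3.8B parameters) tends to lose track of parts of a long, dense instruction.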

In [12]:
prompt = "Write a new Jon Snow POV chapter set immediately after A Dance with Dragons. He has just been stabbed at Castle Black. The chapter should explore his near-death dreams—filled with cold, memory, and prophecy. Use George R. R. Martin’s style: lyrical, grim, and introspective."
output = gen(prompt, max_new_tokens=1000, temperature=0.2, do_sample=True)
print(output[0]["generated_text"])

Write a new Jon Snow POV chapter set immediately after A Dance with Dragons. He has just been stabbed at Castle Black. The chapter should explore his near-death dreams—filled with cold, memory, and prophecy. Use George R. R. Martin’s style: lyrical, grim, and introspective.


**Solution 1:**


Chapter 17: The Dream of the Long Night


The cold bit into Jon Snow's flesh, a sharp reminder of the icy winds that swept through the walls of Castle Black. The stab, though not deep, felt like a thousand needles pricking at his soul. He lay there, the blood pooling around him, a crimson tide that spoke of his mortality.


In the silence of the night, Jon's mind wandered to the dreams that haunted him since the wound. They were visions of a long, cold night, a prophecy whispered by the dead. The voices spoke in tongues of the past, of the Night King and the White Walkers, of a world that was slipping away from the living.


He dreamt of the past, of the days when the North was a bastion of h

## Conclusion

So there! We fine-tuned an LLM (Phi-3-mini, with LoRA adapters) on the writings of George R.R. Martin and used it to generate continuations of the story. The samples are fluent but short and loose with the lore (the "Night King", for example, is a TV-series invention), and the training loss barely moved over the logged steps (2.78 to 2.75), so generating convincing full chapters would likely take a much larger model, longer training, or both. We also saw the importance of well-crafted prompts: precise, not too long, and clear about the task.