
### 🫎 Increasing Creativity with Controlled Randomness

Explore techniques for controlling randomness in text generation. The goal is to keep the output grammatically correct while increasing its variety.

---

#### Just build the model

A collapsed version of [training.ipynb](training.ipynb) that rebuilds the model. 

After the first run, the model is saved as a `.pth` file in the current directory and automatically reloaded in all subsequent runs.


In [4]:
import tiktoken, torch, os
from gpt2 import GPTModel, create_dataloader_v1, GPT_CONFIG_124M, generate_text_simple
from tcc import train_model_simple, text_to_token_ids, token_ids_to_text

saved_model = "the-verdict.pth"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = GPTModel(GPT_CONFIG_124M)
tokenizer = tiktoken.get_encoding("gpt2")

if not os.path.isfile(saved_model):
    file_path = "the-verdict.txt"  # "The Verdict" by Edith Wharton, a freely available short story (see credits below)
    with open(file_path, "r", encoding="utf-8") as file:
        text_data = file.read()

    train_ratio = 0.90
    split_idx = int(train_ratio * len(text_data))
    train_data = text_data[:split_idx]
    test_data = text_data[split_idx:]

    torch.manual_seed(234)

    train_loader = create_dataloader_v1(
        train_data,
        batch_size=2,
        max_length=GPT_CONFIG_124M["context_length"],
        stride=GPT_CONFIG_124M["context_length"],
        drop_last=True,
        shuffle=True,
        num_workers=0
    )

    test_loader = create_dataloader_v1(
        test_data,
        batch_size=2,
        max_length=GPT_CONFIG_124M["context_length"],
        stride=GPT_CONFIG_124M["context_length"],
        drop_last=False,
        shuffle=False,
        num_workers=0
    )

    torch.manual_seed(123)
    
    model.to(device)

    optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=0.0004,
        weight_decay=0.1,
    )

    num_epochs = 10
    train_losses, test_losses, token_seen = train_model_simple(
        model=model,
        train_loader=train_loader,
        test_loader=test_loader,
        optimizer=optimizer,
        device=device,
        num_epochs=num_epochs,
        eval_freq=5,
        eval_iter=5,
        start_context="Every effort moves you",
        tokenizer=tokenizer,
    )
    torch.save(
        {
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
        },
        saved_model,
    )
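else:
    checkpoint = torch.load(saved_model, map_location=device)
    model.load_state_dict(checkpoint["model_state_dict"])

# Switch off dropout before generating, whether the model was just trained or reloaded.
model.eval()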



### ⚗️ Working with Temperature and Top-K Sampling

The following code shows how different sampling settings affect the model's ability to generate coherent, context-aware, and syntactically correct text.  
Experiment with each parameter to observe changes in style, creativity, and stability.

---

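#### 🔥 Temperature  
Temperature rescales the logits before softmax, which controls how random token selection is:

- **0.0 → 0.7**: conservative; the most likely tokens dominate, and sampling approaches greedy decoding as the temperature nears 0  
- **0.7 → 1.0**: balanced creativity  
- **1.0 → 1.5**: high diversity, with a risk of losing coherence  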
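
To make the ranges above concrete, here is a minimal sketch in plain Python (no PyTorch, so the arithmetic stays visible). The logits are invented values and `softmax_with_temperature` is a hypothetical helper, not something provided by the notebook's `gpt2` or `tcc` modules.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [4.0, 2.0, 1.0, 0.5]  # invented scores for four candidate tokens

for t in (0.3, 1.0, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

As the temperature rises, the top token's share of the probability mass shrinks, so lower-ranked tokens are drawn more often.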

#### 🔢 Top-K Sampling  
Top-K restricts sampling to the **K highest-probability tokens**:

- **K = 3–5** keeps the output close to the context while still allowing variation  
- Higher values increase creativity but may let unlikely tokens derail the meaning
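
The effect of K can be illustrated with a small plain-Python sketch. It assumes invented logits, and `top_k_probs` is a hypothetical helper written for this illustration only. Like the comparison used later in this notebook, it keeps every logit that ties with the K-th largest one.

```python
import math

def top_k_probs(logits, k):
    """Keep the k largest logits, mask the rest to -inf, then apply softmax."""
    threshold = sorted(logits, reverse=True)[k - 1]  # k-th largest logit
    masked = [x if x >= threshold else float("-inf") for x in logits]
    peak = max(masked)
    exps = [math.exp(x - peak) for x in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [4.0, 2.0, 1.0, 0.5, -1.0]  # invented scores for five candidate tokens

for k in (2, 3, 5):
    probs = top_k_probs(toy_logits, k)
    print(f"k={k}: " + ", ".join(f"{p:.3f}" for p in probs))
```

Tokens outside the top K end up with probability 0, and the surviving tokens are renormalized so their probabilities still sum to 1.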

#### 🧵 End-of-Sequence (EOS)  
When an EOS token ID is supplied, generation stops as soon as the model produces that token, rather than always running for the full token budget.

#### 📏 Max New Tokens  
Sets the maximum number of tokens appended to the prompt:

- Lower values → short completions  
- Higher values → long-form generation
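
The two stopping conditions (an EOS token and the token budget) interact as in the following toy loop. This is a sketch under stated assumptions: `toy_generate`, `random_next_token`, and the EOS ID of 0 are all hypothetical, and a seeded random number generator stands in for the model.

```python
import random

def toy_generate(next_token, prompt, max_new_tokens, eos_id=None):
    """Append tokens until max_new_tokens is reached or eos_id is produced."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        token = next_token(tokens)
        if eos_id is not None and token == eos_id:
            break  # stop before appending the EOS token
        tokens.append(token)
    return tokens

rng = random.Random(0)  # seeded so repeated runs give the same sequence
EOS = 0                 # hypothetical EOS token ID for this toy vocabulary

def random_next_token(tokens):
    # Stand-in for a model: pick any ID from 0..9 uniformly at random.
    return rng.randrange(10)

out = toy_generate(random_next_token, prompt=[7, 7], max_new_tokens=25, eos_id=EOS)
print(len(out) - 2, "new tokens:", out[2:])
```

Whichever condition triggers first ends the loop, so `max_new_tokens` acts as an upper bound even when an EOS ID is set.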

#### 🛠️ Implementation Notes

- `torch.topk()` returns the K largest logits; the smallest of them serves as the cutoff  
- Masking with `float('-inf')` removes every token below the cutoff from sampling  
- Softmax maps the masked logits to a probability of exactly **0**  
- `torch.multinomial()` samples from the remaining probabilities instead of always picking the highest value
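
The four steps above can be traced on a single toy tensor. This is a minimal sketch, assuming invented logits and PyTorch installed; it is independent of the trained model.

```python
import torch

torch.manual_seed(0)  # only so repeated runs of this sketch agree

logits = torch.tensor([[4.0, 2.0, 1.0, 0.5, -1.0]])  # invented scores, shape (1, 5)

# 1. torch.topk returns the k largest values (sorted descending) and their indices.
top_logits, top_idx = torch.topk(logits, k=3)

# 2. Everything below the smallest kept logit is replaced with -inf.
min_val = top_logits[:, -1:]
masked = torch.where(logits < min_val, torch.full_like(logits, float("-inf")), logits)

# 3. Softmax turns each -inf into a probability of exactly 0.
probs = torch.softmax(masked, dim=-1)

# 4. torch.multinomial draws token indices in proportion to those probabilities.
samples = torch.multinomial(probs, num_samples=1000, replacement=True)
counts = torch.bincount(samples.flatten(), minlength=logits.shape[-1])

print("kept indices:", top_idx.tolist())
print("probabilities:", probs)
print("draw counts:", counts.tolist())
```

The masked tokens are never drawn, while the three survivors appear roughly in proportion to their probabilities.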


In [17]:
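def generate_t_k(model, idx, max_new_tokens, context_size,
                 temperature=0.0, top_k=None, eos_id=None):
    # Keep the token IDs on the same device as the model weights.
    idx = idx.to(next(model.parameters()).device)
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -context_size:]  # crop to the supported context length
        with torch.no_grad():
            logits = model(idx_cond)
        logits = logits[:, -1, :]  # logits for the last position only

        if top_k is not None:
            # Mask everything below the k-th largest logit in each row.
            top_logits, _ = torch.topk(logits, top_k)
            min_val = top_logits[:, -1:]  # shape (batch, 1) so it broadcasts per row
            logits = torch.where(
                logits < min_val,
                torch.full_like(logits, float("-inf")),
                logits,
            )

        if temperature > 0.0:
            # Scale the logits, then sample from the resulting distribution.
            probs = torch.softmax(logits / temperature, dim=-1)
            idx_next = torch.multinomial(probs, num_samples=1)
        else:
            # A temperature of 0 falls back to greedy decoding.
            idx_next = torch.argmax(logits, dim=-1, keepdim=True)

        # Stop early once every sequence in the batch has produced the EOS token.
        if eos_id is not None and (idx_next == eos_id).all():
            break

        idx = torch.cat((idx, idx_next), dim=-1)

    return idx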

token_ids = generate_t_k(
    model=model,
    idx=text_to_token_ids("Every effort moves you", tokenizer),
    max_new_tokens=25,
    context_size=GPT_CONFIG_124M["context_length"],
    top_k=5,
    temperature=1.0,
    # eos_id=tokenizer.encode(".")[0],  # uncomment to stop at the first "." token
)

print("Output text: \n", token_ids_to_text(token_ids, tokenizer))


Output text: 
 Every effort moves you?"
Yes, and pushed one of the deep arm-chairs forward. I could have given Miss Croft the fullest reass



#### 📚 Inspiration & Citation

This exercise is inspired by the following work. If you use this notebook or its accompanying code, please cite it accordingly:

```yaml
cff-version: 1.2.0
message: "If you use this book or its accompanying code, please cite it as follows."
title: "Build A Large Language Model (From Scratch), Published by Manning, ISBN 978-1633437166"
abstract: "This book provides a comprehensive, step-by-step guide to implementing a ChatGPT-like large language model from scratch in PyTorch."
date-released: 2024-09-12
authors:
  - family-names: "Raschka"
    given-names: "Sebastian"
license: "Apache-2.0"
url: "https://www.manning.com/books/build-a-large-language-model-from-scratch"
repository-code: "https://github.com/rasbt/LLMs-from-scratch"
keywords:
  - large language models
  - natural language processing
  - artificial intelligence
  - PyTorch
  - machine learning
  - deep learning