# 🎓 Harry Potter Sorting Hat - An Educational AI Project

## Welcome to Hogwarts!

This notebook demonstrates how to build an interactive Sorting Hat using **open-source Large Language Models (LLMs)**. Inspired by [Ryan Anderson's IBM Watson project](https://github.com/rustyoldrake/Harry_Potter_Sorting_Hat_Simple), this version uses modern, accessible AI technology that students can learn from and modify.

---

## 🎯 Learning Objectives

By working through this notebook, you'll learn:

1. **How to use open-source LLMs** from Hugging Face
2. **System prompts**: How to guide AI behavior with instructions
3. **Interactive interfaces**: Building user-friendly AI applications
4. **Prompt engineering**: Crafting prompts to get desired outputs

---

## 🏰 The Four Houses

- **🦁 Gryffindor**: Brave, daring, chivalrous
- **🦅 Ravenclaw**: Intelligent, witty, creative
- **🦡 Hufflepuff**: Loyal, patient, hardworking
- **🐍 Slytherin**: Ambitious, cunning, resourceful

---

## 🔧 Technology Stack

- **LLM**: Smaller models from Hugging Face (Flan-T5, Phi-2, or similar)
- **Framework**: Hugging Face Transformers
- **Interface**: Jupyter Notebooks with ipywidgets
- **Python**: Version 3.9+ (required by recent `transformers` releases)


---

## 📦 Step 1: Install Required Libraries

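We'll need:
- `transformers`: To load and use open-source LLMs
- `torch`: Backend for running the models
- `ipywidgets`: For creating interactive UI elements
- `accelerate`: Required for automatic device placement (`device_map="auto"`)
- `sentencepiece`: Tokenization library used by T5-family tokenizers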

In [1]:
# Install required packages (run this cell first!)
# %pip installs into the same environment as the running notebook kernel
%pip install -q transformers torch ipywidgets accelerate sentencepiece
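
To confirm the installation worked, you can run a quick check with the standard library's `importlib.util.find_spec`. This also confirms that the kernel sees the same environment pip installed into. The helper name `check_packages` is our own, and this is a minimal sketch:

```python
import importlib.util

def check_packages(names):
    """Map each top-level import name to True if Python can locate it (the module is not imported)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

for name, found in check_packages(
    ["transformers", "torch", "ipywidgets", "accelerate", "sentencepiece"]
).items():
    print(f"{'✅' if found else '❌'} {name}")
```

If anything shows ❌, restart the kernel after installing and run the check again.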

---

## 🧠 Step 2: Understanding the LLM Choice

### What is an LLM?

A **Large Language Model (LLM)** is an AI trained on vast amounts of text to understand and generate human-like language.

### Our Model: Google Flan-T5-Base

Flan-T5-Base is fine as a first experiment, but it underperformed the original IBM Natural Language Classifier from 7 years ago, so keep experimenting!

We're using **`google/flan-t5-base`** because:
- ✅ **Small size** (~250M parameters, roughly 1 GB of float32 weights): Runs on most laptops
- ✅ **Open source**: Free to use and modify
- ✅ **Good at following instructions**: A sensible starting point for our sorting task
- ✅ **Well-documented**: Great for learning

**Alternative models you could try:**
- `google/flan-t5-small` (even smaller, faster)
- `microsoft/phi-2` (more powerful but larger)
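- `tiiuae/falcon-7b-instruct` (advanced, requires more memory)

Note that Phi-2 and Falcon are decoder-only models. They load with `AutoModelForCausalLM` instead of `AutoModelForSeq2SeqLM`, and the decoding step in `sort_student()` must be adjusted for them.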
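
One way to support both kinds of model is to check the model's configuration before loading the weights. The sketch below assumes the `is_encoder_decoder` flag that Hugging Face model configs expose, which is `True` for T5-family models. The helper `auto_class_name` is a hypothetical name:

```python
from types import SimpleNamespace

def auto_class_name(config):
    """Return the name of the Auto class suited to a Hugging Face config object.

    Encoder-decoder models (T5, Flan-T5) -> AutoModelForSeq2SeqLM
    Decoder-only models (Phi-2, Falcon)  -> AutoModelForCausalLM
    """
    if getattr(config, "is_encoder_decoder", False):
        return "AutoModelForSeq2SeqLM"
    return "AutoModelForCausalLM"

# Offline check with stand-in config objects
print(auto_class_name(SimpleNamespace(is_encoder_decoder=True)))   # AutoModelForSeq2SeqLM
print(auto_class_name(SimpleNamespace(is_encoder_decoder=False)))  # AutoModelForCausalLM

# With a real model (downloads only the small config file):
#   import transformers
#   config = transformers.AutoConfig.from_pretrained("microsoft/phi-2")
#   model_cls = getattr(transformers, auto_class_name(config))
#   model = model_cls.from_pretrained("microsoft/phi-2")
```

The two families also return different things from `generate()`. A decoder-only model returns the prompt followed by the new tokens, so the prompt length has to be sliced off before decoding. A seq2seq model returns only the new tokens.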

In [2]:
# Import necessary libraries
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch
import ipywidgets as widgets
from IPython.display import display, HTML, clear_output
import random

print("✅ Libraries imported successfully!")

✅ Libraries imported successfully!


---

## 🤖 Step 3: Load the LLM

Now we'll download and load the model from Hugging Face. This may take a minute the first time!

In [3]:
# Model selection - you can change this to experiment!
MODEL_NAME = "google/flan-t5-base"

print(f"📥 Loading model: {MODEL_NAME}")
print("This may take a moment...\n")

# Load the tokenizer (converts text to numbers the model understands)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Load the model itself
model = AutoModelForSeq2SeqLM.from_pretrained(
    MODEL_NAME,
    dtype=torch.float32,  # Use float32 for CPU compatibility (older transformers releases call this `torch_dtype`)
    device_map="auto"  # Automatically choose CPU or GPU (requires accelerate)
)

print("✅ Model loaded successfully!")
print(f"📊 Model size: ~{sum(p.numel() for p in model.parameters())/1e6:.1f}M parameters")
print(f"🖥️  Running on: {next(model.parameters()).device}")

📥 Loading model: google/flan-t5-base
This may take a moment...

✅ Model loaded successfully!
📊 Model size: ~247.6M parameters
🖥️  Running on: mps:0
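
The parameter count printed above gives a rough memory footprint. Each parameter takes 4 bytes in `float32` and 2 bytes in `float16`/`bfloat16`. Here is a back-of-the-envelope sketch. It counts the weights only; activations and framework overhead come on top. The helper name is our own:

```python
# Bytes needed to store one parameter in each precision
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

def weight_memory_mb(num_params, dtype="float32"):
    """Approximate memory for the model weights alone, in megabytes (1 MB = 1e6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6

print(f"{weight_memory_mb(247.6e6):.0f} MB in float32")             # 990 MB in float32
print(f"{weight_memory_mb(247.6e6, 'float16'):.0f} MB in float16")  # 495 MB in float16

# With the loaded model: weight_memory_mb(sum(p.numel() for p in model.parameters()))
```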


---

## 📝 Step 4: Crafting the System Prompt

### What is a System Prompt?

A **system prompt** is a set of instructions that tells the AI:
- **WHO** it should act as (the Sorting Hat)
- **WHAT** its task is (sorting students)
- **HOW** it should respond (format, tone)
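
Flan-T5 is not a chat model and has no dedicated system-message slot. The "system prompt" in this notebook is therefore ordinary text placed in front of the student's description. The sketch below assembles that text from the WHO / WHAT / HOW parts, so each part can be edited separately. `make_system_prompt` and its arguments are illustrative names, and the trait lists are shortened:

```python
def make_system_prompt(role, houses, output_rule):
    """Build a system prompt from WHO (role), WHAT (houses and traits) and HOW (output rule)."""
    house_lines = "\n".join(
        f"{i} {name}: {traits};" for i, (name, traits) in enumerate(houses.items(), 1)
    )
    return (
        f"{role}\n\n"
        f"Here are the {len(houses)} Houses and each one's key traits:\n"
        f"{house_lines}\n\n"
        f"{output_rule}"
    )

prompt = make_system_prompt(
    role="You are the Hogwarts Sorting Hat.",                       # WHO
    houses={                                                        # WHAT
        "Gryffindor": "Brave, daring, chivalrous",
        "Ravenclaw": "Intelligent, witty, creative",
        "Hufflepuff": "Loyal, patient, hardworking",
        "Slytherin": "Ambitious, cunning, resourceful",
    },
    output_rule="Respond with ONLY the house name.",                # HOW
)
print(prompt)
```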

### Our Sorting Hat Personality

The prompt below defines the Sorting Hat's behavior. **Read it carefully** - this is how we control the AI's output!

In [4]:
# The System Prompt - This is the "brain" of our Sorting Hat!
SORTING_HAT_SYSTEM_PROMPT = """
You are the Hogwarts Sorting Hat. Your job is to sort students into one of four houses based on their personality traits.

Here are the 4 Houses and each one's key traits:
1 Gryffindor: Brave, daring, chivalrous, courageous, noble;
2 Ravenclaw: Intelligent, witty, creative, wise, bookish; loves learning;
3 Hufflepuff: Loyal, patient, hardworking, fair, kind, neutral;
4 Slytherin: Ambitious, cunning, resourceful, determined, evil, serpents;

Given a student's description, analyze their personality and choose ONE house that fits best.
Respond with ONLY the house name: Gryffindor, Ravenclaw, Hufflepuff, or Slytherin.
Do not add any explanation or extra text.
"""

print("📜 System Prompt Created!")
print("\n" + "="*60)
print(SORTING_HAT_SYSTEM_PROMPT)
print("="*60)

📜 System Prompt Created!


You are the Hogwarts Sorting Hat. Your job is to sort students into one of four houses based on their personality traits.

Here are the 4 Houses and each one's key traits:
1 Gryffindor: Brave, daring, chivalrous, courageous, noble;
2 Ravenclaw: Intelligent, witty, creative, wise, bookish; loves learning;
3 Hufflepuff: Loyal, patient, hardworking, fair, kind, neutral;
4 Slytherin: Ambitious, cunning, resourceful, determined, evil, serpents;

Given a student's description, analyze their personality and choose ONE house that fits best.
Respond with ONLY the house name: Gryffindor, Ravenclaw, Hufflepuff, or Slytherin.
Do not add any explanation or extra text.



---

## 🎩 Step 5: Create the Sorting Function

This function combines:
1. The **system prompt** (Sorting Hat's instructions)
2. The **user's description** (student's personality)
3. The **LLM** (to make the decision)
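
The same three ingredients can be wired together with the model call passed in as a plain function. This makes the prompt-building and answer-parsing logic testable without loading a model. `sort_with` and the stand-in `fake_generate` are hypothetical helpers, not part of the notebook's own code:

```python
HOUSES = ["Gryffindor", "Ravenclaw", "Hufflepuff", "Slytherin"]

def sort_with(generate, system_prompt, description):
    """Combine prompt and description, call generate(prompt) -> str, return a house or None."""
    prompt = f"{system_prompt}\n\nStudent: {description}\n\nHouse:"
    reply = generate(prompt)
    for house in HOUSES:
        if house.lower() in reply.lower():
            return house
    return None  # the reply named no house; the caller chooses a fallback

# Offline stand-in for the LLM so this example runs without a model
def fake_generate(prompt):
    return " ravenclaw" if "books" in prompt else "hmm, difficult"

print(sort_with(fake_generate, "You are the Sorting Hat.", "I love books"))  # Ravenclaw
print(sort_with(fake_generate, "You are the Sorting Hat.", "I like naps"))   # None

# With the model loaded in Step 3, a real `generate` could look like:
# def llm_generate(prompt):
#     inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
#     output_ids = model.generate(**inputs, max_new_tokens=10)
#     return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```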

In [5]:
def sort_student(student_description):
    """
    Sort a student into a Hogwarts house based on their description.
    
    Args:
        student_description (str): The student's personality description
        
    Returns:
        str: The house name (Gryffindor, Ravenclaw, Hufflepuff, or Slytherin)
    """
    # Combine the system prompt with the student's description
    full_prompt = f"{SORTING_HAT_SYSTEM_PROMPT}\n\nStudent: {student_description}\n\nHouse:"
    
    # Tokenize (convert text to numbers)
    inputs = tokenizer(full_prompt, return_tensors="pt", max_length=512, truncation=True)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    
    # Generate the response
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=10,  # Short response (just the house name)
            temperature=0.7,     # Some randomness
            do_sample=True,      # Enable sampling
            top_p=0.9           # Nucleus sampling
        )
    
    # Flan-T5 is an encoder-decoder model: generate() returns only the decoder's
    # output tokens, never the prompt, so decode the whole sequence. (Slicing off
    # the prompt length is only needed for decoder-only models such as Phi-2;
    # here it would discard the entire answer and force the keyword fallback.)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
    
    # Clean up the response and ensure it's a valid house
    houses = ["Gryffindor", "Ravenclaw", "Hufflepuff", "Slytherin"]
    for house in houses:
        if house.lower() in response.lower():
            return house
    
    # Fallback: analyze keywords if model doesn't give a clear answer
    desc_lower = student_description.lower()
    
    scores = {
        "Gryffindor": sum(1 for word in ["brave", "courage", "daring", "bold", "heroic"] if word in desc_lower),
        "Ravenclaw": sum(1 for word in ["smart", "clever", "wise", "creative", "learning", "intelligent"] if word in desc_lower),
        "Hufflepuff": sum(1 for word in ["loyal", "kind", "patient", "fair", "hardworking", "dedicated"] if word in desc_lower),
        "Slytherin": sum(1 for word in ["ambitious", "cunning", "leader", "determined", "resourceful"] if word in desc_lower)
    }
    
    return max(scores, key=scores.get) if max(scores.values()) > 0 else random.choice(houses)

print("✅ Sorting function created!")

✅ Sorting function created!
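
The keyword fallback has one subtlety. On a tie, `max(scores, key=scores.get)` returns the first house in dictionary order, so a description like "brave and loyal" always lands in Gryffindor. The sketch below breaks ties randomly among the top-scoring houses instead. The keyword lists are copied from `sort_student()`; `keyword_scores` and `keyword_fallback` are our own names:

```python
import random

KEYWORDS = {
    "Gryffindor": ["brave", "courage", "daring", "bold", "heroic"],
    "Ravenclaw": ["smart", "clever", "wise", "creative", "learning", "intelligent"],
    "Hufflepuff": ["loyal", "kind", "patient", "fair", "hardworking", "dedicated"],
    "Slytherin": ["ambitious", "cunning", "leader", "determined", "resourceful"],
}

def keyword_scores(description):
    """Count how many of each house's keywords appear (as substrings) in the description."""
    text = description.lower()
    return {house: sum(word in text for word in words) for house, words in KEYWORDS.items()}

def keyword_fallback(description, rng=random):
    """Pick the top-scoring house, breaking ties (including all-zero scores) at random."""
    scores = keyword_scores(description)
    best = max(scores.values())
    candidates = [house for house, score in scores.items() if score == best]
    return rng.choice(candidates)

print(keyword_scores("I am brave and loyal"))
# {'Gryffindor': 1, 'Ravenclaw': 0, 'Hufflepuff': 1, 'Slytherin': 0}
print(keyword_fallback("I am brave and loyal", random.Random(0)))  # Gryffindor or Hufflepuff
```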


---

## 🧪 Step 6: Test the Sorting Hat

Let's test with some example students before building the full interface!

In [6]:
house = sort_student("I am intelligent and love books")
print(f"🎩 Sorted into: {house}\n")
# Observation: the result seems a bit random and is often incorrect

🎩 Sorted into: Ravenclaw
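
`sort_student()` samples (`do_sample=True`), so a single run says little about how stable a sorting is. The sketch below repeats the same description several times and tallies the results. `sort_fn` can be any function with the same signature as `sort_student`. The seeded `noisy_sorter` is a made-up stand-in that only exists so the example runs without the model:

```python
import random
from collections import Counter

def sorting_tally(sort_fn, description, runs=10):
    """Call sort_fn(description) `runs` times and count how often each house comes back."""
    return Counter(sort_fn(description) for _ in range(runs))

# Offline stand-in: answers Ravenclaw about 80% of the time (seeded for repeatability)
rng = random.Random(42)
def noisy_sorter(description):
    return "Ravenclaw" if rng.random() < 0.8 else "Gryffindor"

tally = sorting_tally(noisy_sorter, "I am intelligent and love books", runs=20)
print(tally.most_common())

# With the real model: sorting_tally(sort_student, "I am intelligent and love books")
```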



In [7]:
# Test cases
test_students = [
    "I am brave and noble, love adventure",
    "I enjoy solving puzzles and reading books. I am intelligent",
    "I am loyal",
    "I am ambitious, at times evil, bad, and I like serpents and snakes that slither"
]

print("🧪 Testing the Sorting Hat...\n")

for i, description in enumerate(test_students, 1):
    house = sort_student(description)
    print(f"Student {i}: {description[:50]}...")
    print(f"🎩 Sorted into: {house}\n")

🧪 Testing the Sorting Hat...

Student 1: I am brave and noble, love adventure...
🎩 Sorted into: Gryffindor

Student 2: I enjoy solving puzzles and reading books. I am in...
🎩 Sorted into: Ravenclaw

Student 3: I am loyal...
🎩 Sorted into: Hufflepuff

Student 4: I am ambitious, at times evil, bad, and I like ser...
🎩 Sorted into: Slytherin
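
Each of the four test descriptions has an obvious intended house, so together they form a tiny labelled test set. The sketch below scores any sorter against such a set, which helps when comparing models or prompts. The labels are our reading of the descriptions, and `accuracy` is a hypothetical helper:

```python
def accuracy(sort_fn, labelled):
    """Fraction of (description, expected_house) pairs that sort_fn gets right."""
    correct = sum(sort_fn(desc) == expected for desc, expected in labelled)
    return correct / len(labelled)

labelled_students = [
    ("I am brave and noble, love adventure", "Gryffindor"),
    ("I enjoy solving puzzles and reading books. I am intelligent", "Ravenclaw"),
    ("I am loyal", "Hufflepuff"),
    ("I am ambitious, at times evil, bad, and I like serpents and snakes that slither", "Slytherin"),
]

# Offline stand-in that always answers "Ravenclaw": gets 1 of 4 right
print(accuracy(lambda desc: "Ravenclaw", labelled_students))  # 0.25

# With the real model: accuracy(sort_student, labelled_students)
```

Every description here contains a keyword from the fallback lists. A high score therefore does not, on its own, show that the LLM rather than the keyword fallback made the decision. Descriptions without those keywords make a tougher test.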



---

## 🎨 Step 7: Build the Interactive Interface

Now for the fun part! We'll create an interactive UI using **ipywidgets** where students can:
- Answer personality questions
- See their sorting in real-time
- Learn about their house
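
The interface below tracks progress with the globals `answers` and `current_question`, which the button callbacks update. The sketch below models the same flow as a small state object that is easy to reset and to test without any widgets. `SortingQuiz` is a hypothetical name, not part of ipywidgets:

```python
class SortingQuiz:
    """Question-by-question state for the Sorting Hat quiz, independent of any UI."""

    def __init__(self, questions):
        self.questions = list(questions)
        self.answers = []

    @property
    def finished(self):
        return len(self.answers) >= len(self.questions)

    @property
    def current_question(self):
        return None if self.finished else self.questions[len(self.answers)]

    def submit(self, text):
        """Record a non-empty answer; blank input is ignored, as in on_submit_clicked()."""
        if text.strip() and not self.finished:
            self.answers.append(text.strip())

    def description(self):
        """All answers joined into one description, as perform_sorting() does."""
        return " ".join(self.answers)

    def reset(self):
        self.answers = []

quiz = SortingQuiz(["What do you value most?", "How do you approach challenges?"])
quiz.submit("Courage")
quiz.submit("   ")      # ignored: blank
quiz.submit("Head-on")
print(quiz.finished, quiz.description())  # True Courage Head-on
```

A button callback would then only need to call `quiz.submit(answer_input.value)` and redraw based on `quiz.current_question`.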

In [8]:
# House information
HOUSE_INFO = {
    "Gryffindor": {
        "emoji": "🦁",
        "color": "#740001",
        "traits": "Courage, Bravery, Determination",
        "founder": "Godric Gryffindor",
        "famous": "Harry Potter, Hermione Granger, Albus Dumbledore"
    },
    "Ravenclaw": {
        "emoji": "🦅",
        "color": "#0E1A40",
        "traits": "Intelligence, Wisdom, Creativity",
        "founder": "Rowena Ravenclaw",
        "famous": "Luna Lovegood, Cho Chang, Filius Flitwick"
    },
    "Hufflepuff": {
        "emoji": "🦡",
        "color": "#F0C75E",
        "traits": "Loyalty, Patience, Hard Work",
        "founder": "Helga Hufflepuff",
        "famous": "Cedric Diggory, Newt Scamander, Nymphadora Tonks"
    },
    "Slytherin": {
        "emoji": "🐍",
        "color": "#1A472A",
        "traits": "Ambition, Cunning, Resourcefulness",
        "founder": "Salazar Slytherin",
        "famous": "Severus Snape, Merlin, Draco Malfoy"
    }
}

# Create widgets
output_area = widgets.Output()
question_area = widgets.Output()

# Questions to build the student profile
questions = [
    "What do you value most in yourself?",
    "How do you approach challenges?",
    "What are your greatest strengths?"
]

answers = []
current_question = 0

# Text input widget
answer_input = widgets.Textarea(
    placeholder='Type your answer here...',
    description='',
    layout=widgets.Layout(width='80%', height='100px')
)

# Submit button
submit_button = widgets.Button(
    description='Submit Answer',
    button_style='success',
    tooltip='Click to submit',
    icon='check'
)

def display_question():
    """Display the current question"""
    with question_area:
        clear_output()
        if current_question < len(questions):
            display(HTML(f"""
                <div style='background-color: #f0f0f0; padding: 20px; border-radius: 10px; margin: 10px 0;'>
                    <h3 style='color: #333;'>Question {current_question + 1} of {len(questions)}</h3>
                    <p style='font-size: 18px; color: #555;'>{questions[current_question]}</p>
                </div>
            """))
            display(answer_input)
            display(submit_button)
        else:
            display(HTML("<h3>All questions answered! Sorting in progress...</h3>"))
            perform_sorting()

def on_submit_clicked(b):
    """Handle submit button click"""
    global current_question
    
    if answer_input.value.strip():
        answers.append(answer_input.value.strip())
        answer_input.value = ''  # Clear input
        current_question += 1
        display_question()

def perform_sorting():
    """Sort the student based on their answers"""
    # Combine all answers into one description
    full_description = " ".join(answers)
    
    with question_area:
        clear_output()
        display(HTML("""
            <div style='text-align: center; padding: 20px;'>
                <h2>üé© The Sorting Hat is deliberating...</h2>
                <p style='font-style: italic;'>"Hmm, difficult... very difficult..."</p>
            </div>
        """))
    
    # Perform the sorting
    house = sort_student(full_description)
    info = HOUSE_INFO[house]
    
    with question_area:
        clear_output()
        display(HTML(f"""
            <div style='background-color: {info['color']}; color: white; padding: 30px; border-radius: 15px; text-align: center;'>
                <h1>{info['emoji']} {house.upper()} {info['emoji']}</h1>
                <h2>Welcome to your new house!</h2>
                <hr style='border-color: white;'>
                <p><strong>House Traits:</strong> {info['traits']}</p>
                <p><strong>Founded by:</strong> {info['founder']}</p>
                <p><strong>Famous Members:</strong> {info['famous']}</p>
            </div>
            <br>
            <div style='background-color: #f9f9f9; padding: 20px; border-radius: 10px;'>
                <h3>üìä How the Sorting Worked:</h3>
                <p><strong>Your Answers:</strong></p>
                <ul>
                    {''.join([f'<li>{answer}</li>' for answer in answers])}
                </ul>
                <p><strong>System Prompt:</strong> The LLM was given instructions to act as the Sorting Hat and analyze personality traits.</p>
                <p><strong>Model Used:</strong> {MODEL_NAME}</p>
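                <p><strong>Process:</strong> Your combined answers were sent to the model along with the system prompt, and the house named in its reply was chosen. If the reply named no house, a simple keyword match on your answers (or, failing that, a random pick) decided instead.</p>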
            </div>
        """))
    
    # Create restart button
    restart_button = widgets.Button(
        description='Sort Again',
        button_style='info',
        icon='refresh'
    )
    restart_button.on_click(restart_sorting)
    
    with question_area:
        display(restart_button)

def restart_sorting(b):
    """Restart the sorting process"""
    global current_question, answers
    current_question = 0
    answers = []
    answer_input.value = ''
    display_question()

# Connect the button to the handler
submit_button.on_click(on_submit_clicked)

print("✅ Interactive interface ready!")

✅ Interactive interface ready!
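
The results card in `perform_sorting()` inserts the student's answers straight into HTML, so an answer containing `<` or `&` can break the layout. Python's standard `html.escape` prevents that. Here is a sketch of the answer list built that way; the function name is our own:

```python
import html

def answers_to_html_list(answers):
    """Render answers as <li> items, escaping characters that have meaning in HTML."""
    items = "".join(f"<li>{html.escape(answer)}</li>" for answer in answers)
    return f"<ul>{items}</ul>"

print(answers_to_html_list(["I like <b>bold</b> moves", "Fish & chips"]))
# <ul><li>I like &lt;b&gt;bold&lt;/b&gt; moves</li><li>Fish &amp; chips</li></ul>
```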


---

## 🎬 Step 8: Launch the Sorting Ceremony!

Run this cell to start the interactive Sorting Hat experience!

In [9]:
# Display the welcome message and start
display(HTML("""
    <div style='background: linear-gradient(135deg, #1e3c72 0%, #2a5298 100%); color: white; padding: 30px; border-radius: 15px; text-align: center;'>
        <h1>üéì Welcome to Hogwarts School of Witchcraft and Wizardry! üéì</h1>
        <p style='font-size: 18px;'>The Sorting Hat will now determine your house.</p>
        <p style='font-style: italic;'>"Oh, you may not think I'm pretty, but don't judge on what you see..."</p>
    </div>
"""))

display(question_area)
display_question()


---

## 🔬 Step 9: Experiment and Learn!

### Try These Experiments:

1. **Modify the System Prompt**: Change the personality descriptions in the system prompt to see how it affects sorting

2. **Change the Model**: Try different models from Hugging Face (then re-run Step 3 and the cells after it):
   ```python
   MODEL_NAME = "google/flan-t5-small"  # Smaller, faster
   MODEL_NAME = "google/flan-t5-large"  # Larger, usually more accurate but slower
   ```

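3. **Adjust Temperature**: In the `sort_student()` function, change the `temperature` parameter (it only has an effect while `do_sample=True`):
   - Lower (0.3): More consistent, predictable results
   - Higher (1.0): More creative, varied results
   - For fully repeatable output, set `do_sample=False` (greedy decoding)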

4. **Add More Questions**: Expand the `questions` list to gather more information

5. **Analyze the Prompt**: Print the `full_prompt` string (it is local to `sort_student()`, so the cell below rebuilds it) to see exactly what's sent to the model
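
Temperature rescales the model's next-token scores (logits) before they become probabilities: each logit is divided by the temperature, then softmax is applied. Lower temperatures concentrate probability on the top token, and higher ones spread it out. `top_p=0.9` then trims the low-probability tail before a token is drawn. The sketch below uses made-up logits for four candidate tokens:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing each one by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for four candidate tokens
for t in (0.3, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 2) for p in probs])
```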

In [10]:
# Experiment Cell - Try modifying parameters here!

# Example: See what prompt is actually sent to the model
example_description = "I love learning new things and solving complex problems."
full_prompt = f"{SORTING_HAT_SYSTEM_PROMPT}\n\nStudent: {example_description}\n\nHouse:"

print("📝 Full Prompt Sent to Model:")
print("=" * 60)
print(full_prompt)
print("=" * 60)

📝 Full Prompt Sent to Model:

You are the Hogwarts Sorting Hat. Your job is to sort students into one of four houses based on their personality traits.

Here are the 4 Houses and each one's key traits:
1 Gryffindor: Brave, daring, chivalrous, courageous, noble;
2 Ravenclaw: Intelligent, witty, creative, wise, bookish; loves learning;
3 Hufflepuff: Loyal, patient, hardworking, fair, kind, neutral;
4 Slytherin: Ambitious, cunning, resourceful, determined, evil, serpents;

Given a student's description, analyze their personality and choose ONE house that fits best.
Respond with ONLY the house name: Gryffindor, Ravenclaw, Hufflepuff, or Slytherin.
Do not add any explanation or extra text.


Student: I love learning new things and solving complex problems.

House:


---

## 📚 Key Takeaways

### What You've Learned:

1. **LLMs from Hugging Face**: How to load and use open-source language models
2. **System Prompts**: How instructions control AI behavior
3. **Tokenization**: Converting text to/from numbers for the model
4. **Generation Parameters**: Temperature, sampling, and how they affect outputs
5. **Interactive UIs**: Building user-friendly interfaces with ipywidgets

### The Technology Pipeline:

```
User Input → Combine with System Prompt → Tokenize →
Model Processing → Generate Output → Decode → Display Result
```

### Resources for Further Learning:

- [Hugging Face Documentation](https://huggingface.co/docs)
- [Transformers Library](https://huggingface.co/docs/transformers)
- [Prompt Engineering Guide](https://www.promptingguide.ai/)
- [Original IBM Project](https://github.com/rustyoldrake/Harry_Potter_Sorting_Hat_Simple)

---

## 🎉 Congratulations!

You've built an AI-powered Sorting Hat using modern open-source technology! Feel free to modify, experiment, and make it your own.

**"It is our choices, Harry, that show what we truly are, far more than our abilities."** - Albus Dumbledore