# TECHIN 509: Melody Generator with Functions & Bigrams

**Name:** Rushav  
**Date:** November 7, 2025

---

## Project Overview

This project implements a **bigram-based melody generator** that learns note-to-note relationships from training melodies and generates new musical sequences. Unlike uniformly random note selection, it picks each next note in proportion to how often that transition occurred in the training melodies, which tends to produce more musical-sounding results.

### What is a Bigram Model?

A bigram model stores information about note transitions:
```
Current note → possible next notes and their counts
```

For example:
- `C4 → {D4: 3, E4: 1}` means after C4, D4 appears 3 times and E4 appears 1 time
- When generating, we can choose D4 with 75% probability and E4 with 25% probability
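
The percentages above come from dividing each count by the total for that row. A minimal sketch of that arithmetic follows; `transition_probabilities` is a hypothetical helper name used only for illustration and is not part of this notebook's functions.

```python
def transition_probabilities(next_note_counts):
    """Convert a {next_note: count} row into {next_note: probability}."""
    total = sum(next_note_counts.values())
    return {note: count / total for note, count in next_note_counts.items()}


# The example row from the text: after C4, D4 was seen 3 times and E4 once
example_row = {'D4': 3, 'E4': 1}
probs = transition_probabilities(example_row)
print(probs)  # {'D4': 0.75, 'E4': 0.25}
```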

## Setup and Imports

In [1]:
import random
from collections import defaultdict, Counter

# Set random seed for reproducibility
random.seed(42)

---

## Part 1: Write Functions for Clarity

### Program Structure Plan

1. **Data Loading Functions**
   - Load melodies from file or create sample data
   - Flatten multiple melodies into training data

2. **Bigram Model Functions**
   - Build bigram model from training data
   - Display the model in readable format
   - Find most common transitions

3. **Generation Functions**
   - Generate new melodies using the bigram model
   - Apply constraints (maximum length, optional end token)
   - Weighted random selection

4. **Analysis Functions**
   - Display melodies
   - Analyze melody statistics

### Design Decisions

- **Modular approach**: Each function does one thing well
- **Clear naming**: Function names describe what they do
- **Reusability**: Functions can be used independently
- **Documentation**: Each function has a docstring

### Data Loading Functions

In [2]:
def create_sample_dataset():
    """
    Create a sample dataset of melodies for training.
    
    Returns
    -------
    list
        List of melodies, where each melody is a list of notes
    """
    melodies = [
        ['C4', 'D4', 'E4', 'F4', 'G4', 'A4', 'B4', 'C5'],           # C major scale
        ['E4', 'F#4', 'G4', 'A4', 'B4', 'A4', 'G4', 'F#4', 'E4'],   # From given project README
        ['C4', 'E4', 'G4', 'C5', 'G4', 'E4', 'C4'],                 # C major chord arpeggio
        ['G4', 'F4', 'E4', 'D4', 'C4', 'D4', 'E4', 'F4', 'G4'],
        ['A4', 'A4', 'B4', 'C5', 'D5', 'C5', 'B4', 'A4'],
        ['F4', 'G4', 'A4', 'G4', 'F4', 'E4', 'D4', 'C4'],
        ['C4', 'C4', 'G4', 'G4', 'A4', 'A4', 'G4'],                 # Twinkle Twinkle pattern
        ['E4', 'G4', 'E4', 'C4', 'D4', 'E4', 'C4'],
        ['D4', 'E4', 'F#4', 'G4', 'A4', 'B4', 'C#5', 'D5'],         # D major scale
        ['B4', 'A4', 'G4', 'F#4', 'E4', 'D4', 'C4', 'B3'],
    ]
    return melodies


def flatten_melodies(melodies):
    """
    Flatten a list of melodies into a single list of all notes.
    
    This is useful for analyzing patterns across all melodies.
    
    Parameters
    ----------
    melodies : list
        List of melodies, where each melody is a list of notes

    Returns
    -------
    list
        Single list containing all notes from all melodies
    """
    
    all_notes = []
    for melody in melodies:
        all_notes.extend(melody)
    return all_notes

In [3]:
# Test data loading functions
melodies = create_sample_dataset()
print(f"Loaded {len(melodies)} training melodies\n")

print("Sample melodies:")
for i in range(3):
    print(f"  {i+1}. {' '.join(melodies[i])}")

print(f"\nTotal notes in dataset: {len(flatten_melodies(melodies))}")

Loaded 10 training melodies

Sample melodies:
  1. C4 D4 E4 F4 G4 A4 B4 C5
  2. E4 F#4 G4 A4 B4 A4 G4 F#4 E4
  3. C4 E4 G4 C5 G4 E4 C4

Total notes in dataset: 79
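
The structure plan also mentions loading melodies from a file, while the cells above only use the built-in sample data. A possible sketch is shown here, assuming a plain-text file with one melody per line and notes separated by whitespace. The function name `load_melodies_from_file`, the file name `melodies.txt`, and the file format are all assumptions, not something defined by the project.

```python
import tempfile
from pathlib import Path


def load_melodies_from_file(path):
    """Read melodies from a text file (assumed format: one melody per line,
    notes separated by whitespace). Blank lines are skipped."""
    melodies = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        notes = line.split()
        if notes:
            melodies.append(notes)
    return melodies


# Demonstrate with a temporary file so the example is self-contained
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "melodies.txt"  # hypothetical file name
    sample_path.write_text("C4 D4 E4\n\nG4 F4 E4 D4\n", encoding="utf-8")
    loaded = load_melodies_from_file(sample_path)

print(loaded)  # [['C4', 'D4', 'E4'], ['G4', 'F4', 'E4', 'D4']]
```

The returned value has the same list-of-lists shape as `create_sample_dataset()`, so it could be passed to the later functions unchanged.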



## Part 2: Build the Bigram Model

In [5]:
def build_bigram_model(melodies, add_tokens=False):
    """
    Build a bigram model from a list of melodies.
    
    A bigram model tracks which notes follow each note and how often.
    The model is stored as a nested dictionary of the form
    {current_note: {next_note: count}}.

    Parameters
    ----------
    melodies : list
        List of melodies, where each melody is a list of notes
    add_tokens : bool, optional
        If True, add start (^) and end ($) tokens to each melody
        
    Returns
    -------
    dict
        Bigram model mapping current notes to possible next notes and counts
    
    """
    
    bigram_model = defaultdict(lambda: defaultdict(int))
    
    for melody in melodies:
        # Add start and end tokens if requested
        if add_tokens:
            melody = ['^'] + melody + ['$']
        
        # Build bigram counts
        for i in range(len(melody) - 1):
            current_note = melody[i]
            next_note = melody[i + 1]
            bigram_model[current_note][next_note] += 1
    
    # Convert defaultdicts to regular dicts for cleaner printing
    return {k: dict(v) for k, v in bigram_model.items()}


def print_bigram_model(bigram_model, limit=None):
    """
    Print the bigram model in a readable format.
    
    Parameters
    ----------
    bigram_model : dict
        Bigram model to print
    limit : int, optional
        Maximum number of entries to print. If None, print all.

    """

    print("Bigram Model:")
    print("=" * 60)
    
    count = 0

    for current_note, next_notes in sorted(bigram_model.items()):
        print(f"{current_note} → {next_notes}")
        count += 1

        if limit and count >= limit:
            remaining = len(bigram_model) - limit

            if remaining > 0:
                print(f"... and {remaining} more entries")
            break
        
    print("=" * 60)

In [6]:
# Build and display the bigram model
bigram_model = build_bigram_model(melodies, add_tokens=False)
print_bigram_model(bigram_model, limit=15)

Bigram Model:
A4 → {'B4': 4, 'G4': 4, 'A4': 2}
B4 → {'C5': 2, 'A4': 3, 'C#5': 1}
C#5 → {'D5': 1}
C4 → {'D4': 3, 'E4': 1, 'C4': 1, 'G4': 1, 'B3': 1}
C5 → {'G4': 1, 'D5': 1, 'B4': 1}
D4 → {'E4': 4, 'C4': 3}
D5 → {'C5': 1}
E4 → {'F4': 2, 'F#4': 2, 'G4': 2, 'C4': 3, 'D4': 3}
F#4 → {'G4': 2, 'E4': 2}
F4 → {'G4': 3, 'E4': 2}
G4 → {'A4': 5, 'F#4': 2, 'C5': 1, 'E4': 2, 'F4': 2, 'G4': 1}
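
One sanity check for a model like the one printed above: a melody of length n contributes n - 1 transitions, so all counts together should equal the number of notes minus the number of melodies (79 - 10 = 69 for the sample dataset, which matches the sum of the counts shown). The sketch below illustrates the check on a small toy dataset. `count_bigrams` is a hypothetical compact variant written with `zip`, not the notebook's `build_bigram_model`.

```python
from collections import Counter, defaultdict


def count_bigrams(melodies):
    """Count note-to-note transitions by pairing each note with its successor."""
    model = defaultdict(Counter)
    for melody in melodies:
        for current_note, next_note in zip(melody, melody[1:]):
            model[current_note][next_note] += 1
    return {note: dict(counts) for note, counts in model.items()}


# Toy data (made up for illustration)
toy_melodies = [['C4', 'D4', 'E4'], ['C4', 'D4', 'C4', 'E4']]
toy_model = count_bigrams(toy_melodies)

total_transitions = sum(sum(counts.values()) for counts in toy_model.values())
expected_transitions = sum(len(melody) - 1 for melody in toy_melodies)

print(toy_model)  # {'C4': {'D4': 2, 'E4': 1}, 'D4': {'E4': 1, 'C4': 1}}
print(total_transitions == expected_transitions)  # True
```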



## Part 3: Generate New Melodies

Generate new melodies by:
1. Starting with a random note (or specified starting note)
2. Looking up possible next notes based on current note
3. Choosing next note randomly, weighted by how often it appears
4. Repeating until we reach desired length

In [8]:
def weighted_random_choice(choices_dict):
    """
    Choose a random item from a dictionary based on weighted counts.
    
    Parameters
    ----------
    choices_dict : dict
        Dictionary mapping items to their counts/weights
        
    Returns
    -------
    str
        Randomly chosen item based on weights
    """

    if not choices_dict:
        return None
    
    # Create a list with items repeated according to their counts
    items = []
    for item, count in choices_dict.items():
        items.extend([item] * count)
    
    # Choose randomly from this weighted list
    return random.choice(items)


def generate_melody(bigram_model, start_note=None, max_length=20, use_end_token=False):
    """
    Generate a new melody using the bigram model.
    
    Parameters
    ----------
    bigram_model : dict
        Bigram model for note transitions
    start_note : str, optional
        Starting note. If None, choose randomly from available notes.
        Use '^' if the model has start tokens.
    max_length : int, optional
        Maximum length of the generated melody
    use_end_token : bool, optional
        If True, stop when '$' (end token) is generated
        
    Returns
    -------
    list
        Generated melody as a list of notes
    """

    # Choose starting note
    if start_note is None:
        start_note = random.choice(list(bigram_model.keys()))
    
    melody = []
    
    # The start token seeds generation but is never added to the melody
    if start_note != '^':
        melody.append(start_note)
    current_note = start_note
    
    # Generate notes
    while len(melody) < max_length:
        
        # Get possible next notes
        if current_note not in bigram_model:
            break
        
        next_notes = bigram_model[current_note]
        next_note = weighted_random_choice(next_notes)
        
        if next_note is None:
            break
        
        # Check for end token
        if use_end_token and next_note == '$':
            break
        
        # Don't add special tokens to melody
        if next_note not in ['^', '$']:
            melody.append(next_note)
        
        current_note = next_note
    
    return melody
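
`weighted_random_choice` builds a list in which each note is repeated according to its count. The standard library's `random.choices` accepts weights directly, which avoids building that list. The following is a sketch of an alternative, not the version used to produce the outputs in this notebook; `weighted_choice_with_weights` and its `rng` parameter are hypothetical names. Because it consumes random numbers differently, swapping it in would change the seeded results shown in later cells.

```python
import random
from collections import Counter


def weighted_choice_with_weights(choices_dict, rng=random):
    """Pick one key from {item: count}, with probability proportional to count."""
    if not choices_dict:
        return None
    items = list(choices_dict.keys())
    weights = list(choices_dict.values())
    # random.choices returns a list of k items; take the single element
    return rng.choices(items, weights=weights, k=1)[0]


# Draw many samples with a private generator to see the rough 3:1 split
rng = random.Random(0)
draws = Counter(
    weighted_choice_with_weights({'D4': 3, 'E4': 1}, rng) for _ in range(1000)
)
print(draws)  # D4 should appear roughly three times as often as E4
```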

## Part 4: Show Results

In [9]:
print("Generated Melodies (without end tokens):")
print("=" * 60)

for i in range(5):
    melody = generate_melody(bigram_model, max_length=15)
    print(f"{i+1}. {' '.join(melody)}")

Generated Melodies (without end tokens):
1. C#5 D5 C5 B4 A4 B4 C5 G4 F4 G4 F4 E4 F4 E4 C4
2. C4 D4 E4 F#4 G4 E4 D4 E4 C4 D4 C4 G4 F4 E4 C4
3. F4 E4 D4 E4 F4 G4 F4 E4 G4 A4 B4 C5 D5 C5 G4
4. B4 C5 D5 C5 B4 A4 B4 C#5 D5 C5 D5 C5 B4 A4 A4
5. A4 A4 B4 C#5 D5 C5 B4 C5 D5 C5 G4 A4 G4 A4 G4
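
Every adjacent pair of notes in a generated melody should be a transition that occurred in the training data. A small check along those lines is sketched below; `follows_model` is a hypothetical helper, and the toy model is made up for illustration rather than taken from the notebook's output.

```python
def follows_model(melody, model):
    """Return True if every adjacent note pair is a transition present in model."""
    return all(
        next_note in model.get(current_note, {})
        for current_note, next_note in zip(melody, melody[1:])
    )


toy_model = {'C4': {'D4': 2, 'E4': 1}, 'D4': {'E4': 1, 'C4': 1}}

print(follows_model(['C4', 'D4', 'C4', 'E4'], toy_model))  # True
print(follows_model(['C4', 'G4'], toy_model))              # False: C4 → G4 was never seen
```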


### Analyzing Generated Melodies

In [10]:
def analyze_melody(melody):
    """
    Analyze a melody and print statistics.
    
    Parameters
    ----------
    melody : list
        Melody to analyze
    """
    if not melody:
        print("Empty melody")
        return
    
    print(f"Length: {len(melody)} notes")
    print(f"Starting note: {melody[0]}")
    print(f"Ending note: {melody[-1]}")
    print(f"Unique notes: {len(set(melody))}")
    
    # Count note frequencies
    note_counts = Counter(melody)
    most_common = note_counts.most_common(3)
    print(f"Most common notes: {most_common}")


# Analyze a generated melody
sample_melody = generate_melody(bigram_model, max_length=20)
print(f"Sample melody: {' '.join(sample_melody)}")
print("\nAnalysis:")
analyze_melody(sample_melody)

Sample melody: C#5 D5 C5 D5 C5 G4 F4 G4 F4 G4 E4 D4 E4 C4 G4 A4 B4 A4 G4 A4

Analysis:
Length: 20 notes
Starting note: C#5
Ending note: A4
Unique notes: 10
Most common notes: [('G4', 5), ('A4', 3), ('D5', 2)]
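
In the sample above, D5, C5, F4, and E4 each occur twice, yet only D5 is listed in third place. `Counter.most_common` orders elements with equal counts by first occurrence, so the note encountered earliest wins the tie. A short illustration with made-up notes:

```python
from collections import Counter

# D5 and C5 both occur twice; D5 is encountered first
notes = ['D5', 'C5', 'D5', 'C5', 'G4', 'G4', 'G4']
top_two = Counter(notes).most_common(2)
print(top_two)  # [('G4', 3), ('D5', 2)]
```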



## Part 5: Clean Up - Add Start and End Tokens

In [11]:
# Build bigram model with start and end tokens
bigram_model_with_tokens = build_bigram_model(melodies, add_tokens=True)

print("Bigram Model with Start/End Tokens:")
print_bigram_model(bigram_model_with_tokens, limit=12)

Bigram Model with Start/End Tokens:
Bigram Model:
A4 → {'B4': 4, 'G4': 4, 'A4': 2, '$': 1}
B3 → {'$': 1}
B4 → {'C5': 2, 'A4': 3, 'C#5': 1}
C#5 → {'D5': 1}
C4 → {'D4': 3, 'E4': 1, '$': 3, 'C4': 1, 'G4': 1, 'B3': 1}
C5 → {'$': 1, 'G4': 1, 'D5': 1, 'B4': 1}
D4 → {'E4': 4, 'C4': 3}
D5 → {'C5': 1, '$': 1}
E4 → {'F4': 2, 'F#4': 2, '$': 1, 'G4': 2, 'C4': 3, 'D4': 3}
F#4 → {'G4': 2, 'E4': 2}
F4 → {'G4': 3, 'E4': 2}
G4 → {'A4': 5, 'F#4': 2, 'C5': 1, 'E4': 2, 'F4': 2, '$': 2, 'G4': 1}
... and 1 more entries


### Understanding Start and End Tokens

- **Start token `^`**: Shows which notes commonly begin melodies
- **End token `$`**: Shows which notes commonly end melodies
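
With `limit=12`, the printout above cuts off the `^` entry, which sorts after the note names. One way to read both distributions straight from a token-augmented model is sketched here. `start_and_end_notes` is a hypothetical helper, and the toy model is made up for illustration.

```python
def start_and_end_notes(model):
    """Return ({first_note: count}, {last_note: count}) from a model with ^ and $."""
    # Notes that follow the start token are the notes that begin melodies
    starts = dict(model.get('^', {}))
    # Notes that are followed by the end token are the notes that end melodies
    ends = {note: nexts['$'] for note, nexts in model.items() if '$' in nexts}
    return starts, ends


toy_model = {
    '^': {'C4': 2},
    'C4': {'D4': 2, 'E4': 1},
    'D4': {'E4': 1, 'C4': 1},
    'E4': {'$': 2},
}

starts, ends = start_and_end_notes(toy_model)
print(starts)  # {'C4': 2}
print(ends)    # {'E4': 2}
```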

In [13]:
print("Generated Melodies (with start/end tokens):")
print("=" * 60)

for i in range(5):
    melody = generate_melody(bigram_model_with_tokens, start_note='^', max_length=30, use_end_token=True)
    print(f"{i+1}. {' '.join(melody)}")

Generated Melodies (with start/end tokens):
1. E4 C4 D4 C4 D4 C4
2. D4 C4 E4 F#4 E4 D4 E4 C4 G4 G4 A4 A4 G4 C5
3. C4
4. E4 F#4 G4 A4 A4 B4 C5 B4 C5 G4 A4
5. F4 E4 F#4 E4 C4 B3


### Finding the Most Common Transition

In [14]:
def find_most_common_transition(bigram_model):
    """
    Find the most common transition in the bigram model.
    
    Parameters
    ----------
    bigram_model : dict
        Bigram model containing note transitions
        
    Returns
    -------
    tuple
        (current_note, next_note, count) for the most common transition
    """
    
    max_count = 0
    most_common = None
    
    for current_note, next_notes in bigram_model.items():
        for next_note, count in next_notes.items():
            if count > max_count:
                max_count = count
                most_common = (current_note, next_note, count)
    
    return most_common


most_common = find_most_common_transition(bigram_model_with_tokens)
if most_common:
    print(f"Most common transition: {most_common[0]} → {most_common[1]}")
    print(f"Appears {most_common[2]} times in the training data")

Most common transition: G4 → A4
Appears 5 times in the training data
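
`find_most_common_transition` returns a single transition, keeping the first one found when counts tie. To list several, one option is to flatten the model and sort by count. This is a sketch only; `top_transitions` is a hypothetical helper and the toy model is made up for illustration.

```python
def top_transitions(model, n=3):
    """Return the n most frequent (current_note, next_note, count) tuples."""
    flat = [
        (current_note, next_note, count)
        for current_note, next_notes in model.items()
        for next_note, count in next_notes.items()
    ]
    # sorted() is stable, so tied counts keep their order from the model
    return sorted(flat, key=lambda transition: transition[2], reverse=True)[:n]


toy_model = {'C4': {'D4': 2, 'E4': 1}, 'D4': {'E4': 1, 'C4': 1}}
top_two_transitions = top_transitions(toy_model, n=2)
print(top_two_transitions)  # [('C4', 'D4', 2), ('C4', 'E4', 1)]
```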
