# Taylor Swift Markov Chain Language Model

This notebook implements a simple probabilistic language model: a second-order Markov chain over words, in which each next word depends on the previous two. The model generates text based on Taylor Swift lyrics.

In [None]:
import random
import io

## 1. Words Function

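Extract words from a text file. Punctuation stays attached to each word. Because `str.split()` splits on any run of whitespace, newline characters are discarded rather than preserved.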

In [None]:
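def words(handle):
    """Return the whitespace-separated tokens read from an open text handle."""
    # str.split() with no arguments splits on any run of whitespace,
    # so punctuation stays attached to words and newlines are dropped.
    text = handle.read()
    word_list = text.split()
    return word_list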

### Test words function

In [None]:
handle = io.StringIO("""Can we always be this close forever and ever?
And ah, take me out, and take me home forever and ever.""")

language = words(handle)
print(language)
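
Because `str.split()` drops every newline, line breaks in the lyrics never reach the model. If line structure matters, one option is to treat each newline as its own token. The sketch below shows one possible way to do that; the name `words_with_newlines` is hypothetical and not part of this notebook.

```python
import io
import re


def words_with_newlines(handle):
    """Hypothetical variant of words() that keeps each newline as its own token."""
    text = handle.read()
    # Match either a single newline or a run of non-whitespace characters
    return re.findall(r"\n|\S+", text)


sample = io.StringIO("forever and ever?\nAnd ah, take me home")
tokens = words_with_newlines(sample)
print(tokens)
# ['forever', 'and', 'ever?', '\n', 'And', 'ah,', 'take', 'me', 'home']
```

With this tokenization the chain can learn where lines tend to end, at the cost of newline tokens appearing in the generated output.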

## 2. Transition Matrix Function

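Build a dictionary mapping each pair of consecutive words to the list of words that follow that pair in the text. A word that follows the same pair several times appears in the list several times.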

In [None]:
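def transition_matrix(word_list):
    """Map each pair of consecutive words to the list of words that follow it."""
    matrix = {}

    # Slide a three-word window over the list: the first two words form
    # the key, the third is recorded as a possible successor.
    for i in range(len(word_list) - 2):
        pair = (word_list[i], word_list[i + 1])
        next_word = word_list[i + 2]

        # Duplicates are kept, so a uniform random.choice over the list
        # samples successors in proportion to how often they occur.
        matrix.setdefault(pair, []).append(next_word)

    return matrix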

### Test transition_matrix function

In [None]:
m = transition_matrix(language)

print("m[('take', 'me')]:", m.get(("take", "me")))
print("m[('we', 'always')]:", m.get(("we", "always")))
print("m[('forever', 'and')]:", m.get(("forever", "and")))
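
Each successor list doubles as an empirical distribution, since a repeated successor appears once per occurrence. As a rough illustration, the sketch below converts the lists into explicit probabilities with `collections.Counter`. The helper names `build_pairs` and `successor_probabilities` are hypothetical; `build_pairs` is a compact stand-in for `transition_matrix` so that the cell runs on its own.

```python
from collections import Counter, defaultdict


def build_pairs(word_list):
    """Compact stand-in for transition_matrix (hypothetical helper)."""
    matrix = defaultdict(list)
    for first, second, third in zip(word_list, word_list[1:], word_list[2:]):
        matrix[(first, second)].append(third)
    return dict(matrix)


def successor_probabilities(matrix):
    """Convert each successor list into a {word: probability} dictionary."""
    probabilities = {}
    for pair, successors in matrix.items():
        counts = Counter(successors)
        total = len(successors)
        probabilities[pair] = {word: n / total for word, n in counts.items()}
    return probabilities


sample_words = "take me out and take me home and take me out".split()
probs = successor_probabilities(build_pairs(sample_words))
print(probs[("take", "me")])
```

In this sample, "out" follows the pair `("take", "me")` twice and "home" once, so their probabilities are 2/3 and 1/3.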

## 3. Markov Chain Generator

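Generate a sequence containing a specified number of words using the Markov chain model. When the current word pair has no recorded successor, the generator falls back to a random word from the whole list.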

In [None]:
def markov_chain(word_list, matrix, length):
    """Generate `length` words by walking the word-pair transition matrix."""
    # A starting pair needs at least two requested words and at least two
    # source words; otherwise random.randint below would raise ValueError.
    if length < 2 or len(word_list) < 2:
        return ""

    # Randomly select two consecutive starting words
    # (random.randint includes both endpoints)
    start_idx = random.randint(0, len(word_list) - 2)
    result = [word_list[start_idx], word_list[start_idx + 1]]

    # Generate remaining words
    for _ in range(2, length):
        # Get the last two words as a pair
        pair = (result[-2], result[-1])

        # Check if this pair exists in the transition matrix
        if pair in matrix:
            # Choose randomly from possible next words
            next_word = random.choice(matrix[pair])
        else:
            # If pair doesn't exist, choose random word from entire list
            next_word = random.choice(word_list)

        result.append(next_word)

    # Join words with spaces
    return ' '.join(result)

### Test markov_chain function

In [None]:
print(markov_chain(language, m, 15))
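
The output changes on every run because the generator draws from the global `random` state. For repeatable experiments, one option is to draw from a seeded `random.Random` instance instead. The sketch below assumes a hypothetical `seeded_chain` function with an extra `seed` parameter; it mirrors the logic of `markov_chain` but is not this notebook's implementation.

```python
import random


def seeded_chain(word_list, matrix, length, seed=None):
    """Hypothetical seeded variant of markov_chain."""
    if length < 2 or len(word_list) < 2:
        return ""
    rng = random.Random(seed)  # private generator, leaves global state alone
    start = rng.randrange(len(word_list) - 1)
    result = [word_list[start], word_list[start + 1]]
    while len(result) < length:
        successors = matrix.get((result[-2], result[-1]))
        # Fall back to any word when the pair has no recorded successor
        result.append(rng.choice(successors if successors else word_list))
    return " ".join(result)


demo_words = "take me out and take me home forever and ever".split()
demo_matrix = {}
for a, b, c in zip(demo_words, demo_words[1:], demo_words[2:]):
    demo_matrix.setdefault((a, b), []).append(c)

first = seeded_chain(demo_words, demo_matrix, 8, seed=13)
second = seeded_chain(demo_words, demo_matrix, 8, seed=13)
print(first)
print(first == second)  # same seed, same text
```

Passing `seed=None` keeps the original behavior of a different result on each call.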

## 4. Taylor Swifter Function

Main function to generate sentences based on Taylor Swift lyrics from a file.

In [None]:
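def taylor_swifter(filepath, length):
    """Read lyrics from `filepath` and return `length` generated words."""
    with open(filepath, 'r', encoding='utf-8') as f:
        word_list = words(f)

    # The transition matrix is rebuilt from the file on every call
    matrix = transition_matrix(word_list)
    sentence = markov_chain(word_list, matrix, length)

    return sentence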

## Generate Taylor Swift Style Sentences

Run this cell to generate text based on the Taylor Swift lyrics file. Make sure `taylor_swift.txt` is in the same directory as this notebook.

In [None]:
# Generate a 30-word sentence
print(taylor_swifter("taylor_swift.txt", 30))
print()

# Generate multiple sentences
print("Generated sentences:")
print("-" * 80)
for i in range(5):
    print(f"{i+1}. {taylor_swifter('taylor_swift.txt', 25)}")
    print()
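
Each call to `taylor_swifter` re-reads the file and rebuilds the transition dictionary, which is repeated work when generating several sentences in a loop. One possible refactor is to build the model once and reuse it. The sketch below uses hypothetical `build_model` and `generate` helpers, and an in-memory `io.StringIO` handle stands in for `taylor_swift.txt` so that the cell runs on its own.

```python
import io
import random


def build_model(handle):
    """Read the text once and return (word_list, matrix). Hypothetical helper."""
    word_list = handle.read().split()
    matrix = {}
    for a, b, c in zip(word_list, word_list[1:], word_list[2:]):
        matrix.setdefault((a, b), []).append(c)
    return word_list, matrix


def generate(word_list, matrix, length):
    """Generate `length` words from a prebuilt model. Hypothetical helper."""
    if length < 2 or len(word_list) < 2:
        return ""
    start = random.randrange(len(word_list) - 1)
    result = [word_list[start], word_list[start + 1]]
    while len(result) < length:
        # Use the pair's successors if any, else fall back to the full list
        options = matrix.get((result[-2], result[-1])) or word_list
        result.append(random.choice(options))
    return " ".join(result)


# Stand-in for open("taylor_swift.txt", encoding="utf-8")
lyrics = io.StringIO("take me out and take me home forever and ever")
model_words, model_matrix = build_model(lyrics)  # built once

sentences = [generate(model_words, model_matrix, 6) for _ in range(3)]
for number, sentence in enumerate(sentences, start=1):
    print(f"{number}. {sentence}")
```

For a small lyrics file the difference is negligible, but the separation also makes it easy to inspect the model between runs.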

## Experiment with Different Lengths

Try generating sentences of different lengths to see how the model performs.

In [None]:
# Short sentence
print("Short (10 words):")
print(taylor_swifter("taylor_swift.txt", 10))
print()

# Medium sentence
print("Medium (20 words):")
print(taylor_swifter("taylor_swift.txt", 20))
print()

# Long sentence
print("Long (50 words):")
print(taylor_swifter("taylor_swift.txt", 50))