# Lyrics Generation: Genius Song Lyrics (1%)

**Dataset:** 34'049 Songs · 26'408 Artists · 6 Genres
**Genres:** Rap / Hip-Hop · Rock · Pop · R&B · Country · Miscellaneous

**Purpose:**
Generate new, style-consistent song lyrics using a simple Markov chain model trained on the existing lyrics dataset.

The notebook supports:
- generation from the **full corpus**
- **genre-specific** lyric generation (e.g. only country, only rap)

---

# 1. Imports and Setup
## 1.1 Import Libraries and Load Data

In [None]:
import pandas as pd
import markovify

In [None]:
df = pd.read_csv("data/clean/data.csv")

print(df.shape)
df[["lyrics", "tag"]].head()

## 1.2 Data Preparation

In [None]:
all_lyrics = df["lyrics"].dropna().tolist()

# Join all lyrics into one big text
corpus_text = "\n".join(all_lyrics)

# 2. Markov Chain Model
## 2.1 Build Model

The model is a `markovify.Text` chain built with `state_size=2`, so each next word is chosen based on the two words before it.

To generate individual lyric lines, we use
`model.make_short_sentence(max_chars=90, tries=100)`.

- `make_short_sentence` returns a generated sentence no longer than `max_chars` characters.
- It is better suited to short lyric-style lines than `make_sentence()`, which applies no character limit.
- `tries=100` caps the number of attempts; if none succeeds, the call returns `None`.

The length cap keeps generated lines concise, closer to typical song lyrics.

In [None]:
text_model_all = markovify.Text(corpus_text, state_size=2)
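
To make concrete what `state_size=2` means, here is a minimal pure-Python sketch of an order-2 word chain. It is an illustration only, not markovify's implementation: the helper names (`build_chain`, `walk`), the sentinel tokens and the toy lines are all assumptions made for this example.

```python
import random
from collections import defaultdict

# Sentinel tokens chosen for this sketch (hypothetical, not taken from the notebook)
BEGIN, END = "<BEGIN>", "<END>"

def build_chain(lines, state_size=2):
    """Map each tuple of `state_size` consecutive words to the words seen after it."""
    chain = defaultdict(list)
    for line in lines:
        words = [BEGIN] * state_size + line.split() + [END]
        for i in range(len(words) - state_size):
            state = tuple(words[i:i + state_size])
            chain[state].append(words[i + state_size])
    return chain

def walk(chain, state_size=2, rng=random):
    """Follow the chain from the begin state until the end token is drawn."""
    state = (BEGIN,) * state_size
    out = []
    while True:
        nxt = rng.choice(chain[state])
        if nxt == END:
            return " ".join(out)
        out.append(nxt)
        state = state[1:] + (nxt,)

# Toy corpus standing in for the real lyrics
toy_lines = [
    "I walk the line tonight",
    "I walk alone tonight",
    "you walk the road alone",
]
toy_chain = build_chain(toy_lines)
generated = walk(toy_chain, rng=random.Random(0))
print(generated)
```

Because the state is a pair of words, every three-word window in the output already occurs somewhere in the training lines. A larger `state_size` copies longer stretches of the source, while a smaller one produces looser and less coherent text.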

## 2.2 Generate a Few Lines

In [None]:
print("=== Generated lyrics (full corpus) ===\n")
for _ in range(10):
    line = text_model_all.make_short_sentence(max_chars=90, tries=100)
    if line:
        print(line)
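
The `if line:` check matters because generation can fail and return `None`. The interplay of `max_chars` and `tries` can be pictured as a simple retry loop. The sketch below is conceptual only and does not reproduce markovify's internals; `short_line` and `make_candidate` are hypothetical names used for illustration.

```python
def short_line(make_candidate, max_chars=90, tries=100):
    """Return the first candidate within max_chars, or None after `tries` attempts."""
    for _ in range(tries):
        candidate = make_candidate()
        if candidate and len(candidate) <= max_chars:
            return candidate
    return None

# Stand-in generator: two lines that are too long, then one that fits
candidates = iter([
    "a line that goes on and on far past any sensible limit",
    "another line that is clearly much too long to keep",
    "short and sweet",
])
print(short_line(lambda: next(candidates), max_chars=20, tries=3))  # short and sweet

# Every attempt is too long, so the loop gives up and returns None
print(short_line(lambda: "x" * 30, max_chars=20, tries=5))  # None
```

Raising `tries` lowers the chance of getting `None` at the cost of more work per line; lowering `max_chars` has the opposite effect.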

## 2.3 Generate Genre-specific Lyrics

In [None]:
def generate_markov_lyrics(genre=None, num_lines=10):
    """
    Generate Markov-based lyrics from the full corpus or a specific genre.
    """
    if genre is None:
        subset = df["lyrics"].dropna().tolist()
        label = "full corpus"
    else:
        subset = df[df["tag"] == genre]["lyrics"].dropna().tolist()
        label = f"genre: {genre}"

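    # Fail early with a clear message if the genre tag matches no songs
    if not subset:
        raise ValueError(f"No lyrics found for {label}")

    corpus_text = "\n".join(subset)
    model = markovify.Text(corpus_text, state_size=2)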

    print(f"=== Generated lyrics ({label}) ===\n")
    for _ in range(num_lines):
        line = model.make_short_sentence(max_chars=90, tries=100)
        if line:
            print(line)

In [None]:
generate_markov_lyrics(genre="country", num_lines=10)
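
Note that `generate_markov_lyrics` rebuilds the model on every call. If the same genre is sampled repeatedly, one option is to build each model once and reuse it. The sketch below shows the idea with plain Python; `make_model_cache`, `fake_build` and the toy data are hypothetical stand-ins. In the notebook, the builder would presumably be something like `lambda text: markovify.Text(text, state_size=2)`.

```python
def make_model_cache(lyrics_by_genre, build_model):
    """Return a lookup function that builds each genre's model at most once."""
    models = {}

    def get_model(genre):
        if genre not in lyrics_by_genre:
            raise ValueError(f"unknown genre: {genre!r}")
        if genre not in models:
            models[genre] = build_model("\n".join(lyrics_by_genre[genre]))
        return models[genre]

    return get_model

# Stand-in builder that records how often it runs (not a real Markov model)
build_calls = []

def fake_build(text):
    build_calls.append(text)
    return {"corpus_chars": len(text)}

get_model = make_model_cache(
    {"country": ["dirt road", "old truck"], "rap": ["mic check"]},
    fake_build,
)
first = get_model("country")
second = get_model("country")
print(first is second, len(build_calls))  # True 1
```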

## 2.4 Generate Lyrics with Verse and Chorus

In [None]:
def generate_line(model, max_tries=100):
    line = model.make_short_sentence(max_chars=90, tries=max_tries)
    return line if line else ""

In [None]:
def generate_verse(model, num_lines=8):
    lines = []
    for _ in range(num_lines):
        line = generate_line(model)
        if line:
            lines.append(line)
    return lines

In [None]:
def generate_chorus(model, num_lines=4):
    lines = []
    base_line = generate_line(model)
    if not base_line:
        base_line = "La la la"

    # Even positions repeat the hook line; odd positions get a fresh line
    for i in range(num_lines):
        if i % 2 == 0:
            lines.append(base_line)
        else:
            line = generate_line(model)
            lines.append(line if line else base_line)
    return lines
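
The chorus pattern (hook, new line, hook, new line) is easier to see with predictable input. The sketch below swaps the trained model for a hypothetical `StubModel` that returns numbered lines, and restates the two helpers in compact form so the block runs on its own.

```python
from itertools import count

class StubModel:
    """Hypothetical stand-in exposing only make_short_sentence."""

    def __init__(self):
        self._counter = count(1)

    def make_short_sentence(self, max_chars=90, tries=100):
        return f"line {next(self._counter)}"

def generate_line(model, max_tries=100):
    line = model.make_short_sentence(max_chars=90, tries=max_tries)
    return line if line else ""

def generate_chorus(model, num_lines=4):
    lines = []
    # The hook line; fall back to a fixed phrase if generation fails
    base_line = generate_line(model) or "La la la"
    for i in range(num_lines):
        if i % 2 == 0:
            lines.append(base_line)  # even positions repeat the hook
        else:
            lines.append(generate_line(model) or base_line)  # odd positions vary
    return lines

chorus = generate_chorus(StubModel())
print(chorus)  # ['line 1', 'line 2', 'line 1', 'line 3']
```

With a real model the odd-position lines differ on each run, while the hook stays fixed within a chorus.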

In [None]:
def generate_song(model):
    verse1 = generate_verse(model)
    chorus = generate_chorus(model)
    verse2 = generate_verse(model)

    print("[Verse 1]")
    print("\n".join(verse1))
    print("\n[Chorus]")
    print("\n".join(chorus))
    print("\n[Verse 2]")
    print("\n".join(verse2))

In [None]:
genre = "country"
subset = df[df["tag"] == genre]["lyrics"].dropna().tolist()
text_model_genre = markovify.Text("\n".join(subset), state_size=2)

generate_song(text_model_genre)