# Lyrics Analysis Toolkit for Digital Humanities

This notebook provides tools for downloading and analyzing song lyrics for academic research using distant reading techniques. It is intended for digital humanities projects that study popular culture, literary themes in music, or related topics in cultural studies.

**Features:**
- Download lyrics from Genius API
- Perform text analysis suitable for distant reading
- Save lyrics as organized text files for research
- Download entire albums for comparative analysis

**Requirements:**
- Free Genius API token (get at https://genius.com/api-clients)
- Python libraries: lyricsgenius, requests

In [None]:
# Install required libraries
# Use %pip for better compatibility across different notebook environments
%pip install lyricsgenius requests

In [None]:
# Setup and imports
import lyricsgenius
import time
import requests
import re
import os
from collections import Counter
from datetime import datetime

# Replace with your Genius API token
GENIUS_ACCESS_TOKEN = 'YOUR_TOKEN_HERE'

# Initialize with rate limiting for academic use
genius = lyricsgenius.Genius(GENIUS_ACCESS_TOKEN)
genius.timeout = 20
genius.sleep_time = 2  # 2 seconds between requests
genius.retries = 2
genius.remove_section_headers = True  # Clean up lyrics text

print("✓ Genius API client initialized with rate limiting")
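
If the notebook is shared or committed to version control, it may be safer to keep the token out of the code. The sketch below reads it from an environment variable instead. The helper name `load_genius_token` and the variable name `GENIUS_ACCESS_TOKEN` are assumptions of this example, not something the Genius API or lyricsgenius requires.

```python
import os

def load_genius_token(env_var="GENIUS_ACCESS_TOKEN", placeholder="YOUR_TOKEN_HERE"):
    """Return the token stored in an environment variable.

    Raises RuntimeError if the variable is unset, empty, or still holds
    the placeholder value used in the setup cell.
    """
    token = os.environ.get(env_var, "").strip()
    if not token or token == placeholder:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Genius API token."
        )
    return token

# Possible usage in the setup cell:
# GENIUS_ACCESS_TOKEN = load_genius_token()
```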

In [None]:
# Test API connection and token validity
def test_api_connection(token):
    """Test if API token is valid"""
    headers = {'Authorization': f'Bearer {token}'}
    try:
        # A timeout keeps the notebook from hanging if the API is unreachable
        response = requests.get('https://api.genius.com/account', headers=headers, timeout=10)
        if response.status_code == 200:
            data = response.json()
            print("✓ API Token is VALID!")
            print(f"Account: {data['response']['user']['name']}")
            return True
        elif response.status_code == 401:
            print("✗ Token is INVALID or EXPIRED")
            return False
        else:
            print(f"✗ API returned status code: {response.status_code}")
            return False
    except Exception as e:
        print(f"✗ Error testing token: {e}")
        return False

# Test the connection
test_api_connection(GENIUS_ACCESS_TOKEN)

In [None]:
# Core function: Download single song lyrics
def download_song_lyrics(artist_name, song_title, delay=2):
    """
    Download lyrics for a single song for research purposes
    Returns structured data suitable for analysis
    """
    try:
        print(f"Downloading: {song_title} by {artist_name}")
        time.sleep(delay)  # Rate limiting
        
        song = genius.search_song(song_title, artist_name)
        
        if song and song.lyrics:
            return {
                'title': song.title,
                'artist': song.artist,
                'lyrics': song.lyrics,
                'album': getattr(song, 'album', 'Unknown'),
                'year': getattr(song, 'year', 'Unknown'),
                'url': getattr(song, 'url', '')
            }
        else:
            print(f"No lyrics found for {song_title}")
            return None
            
    except Exception as e:
        print(f"Error downloading {song_title}: {e}")
        return None

# Example usage
song_data = download_song_lyrics("Taylor Swift", "Anti-Hero")
if song_data:
    print(f"✓ Successfully downloaded '{song_data['title']}'")
    print(f"Lyrics length: {len(song_data['lyrics'])} characters")
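
A corpus usually needs more than one song. The following is a minimal sketch of a batch helper that accepts any fetch function with the same call shape as `download_song_lyrics` above. The name `download_many` and its return format are hypothetical choices for illustration.

```python
def download_many(wanted, fetch):
    """Fetch several (artist, title) pairs, skipping duplicates.

    `fetch` is any callable fetch(artist, title) that returns a song
    dictionary or None, such as download_song_lyrics in this notebook.
    Returns (songs, failed) so missing items can be retried or reported.
    """
    songs, failed, seen = [], [], set()
    for artist, title in wanted:
        key = (artist.strip().lower(), title.strip().lower())
        if key in seen:
            continue  # Do not spend a request on a repeated entry
        seen.add(key)
        result = fetch(artist, title)
        if result:
            songs.append(result)
        else:
            failed.append((artist, title))
    return songs, failed

# Possible usage:
# songs, failed = download_many(
#     [("Taylor Swift", "Anti-Hero"), ("Taylor Swift", "Karma")],
#     download_song_lyrics,
# )
```

Passing the fetch function as a parameter keeps the helper independent of the API client, so it can be tried with a stand-in function before any requests are made.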

In [None]:
# Advanced function: Download entire album
def download_album_lyrics(artist_name, album_name, max_songs=20):
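    """
    Download lyrics for an album's songs found among the artist's most popular songs.
    Only the artist's top max_songs songs are searched, so less popular album
    tracks may be missing from the result.
    Returns list of song data for corpus analysis
    """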
    print(f"Downloading album: {album_name} by {artist_name}")
    print(f"Limiting to {max_songs} songs to respect rate limits")
    
    try:
        # Search for artist first
        artist = genius.search_artist(artist_name, max_songs=max_songs, sort="popularity")
        
        if not artist:
            print(f"Artist {artist_name} not found")
            return []
        
        # Filter songs by album
        album_songs = []
        for song in artist.songs:
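            # Check if song is from the specified album. Depending on the
            # lyricsgenius version, the album attribute may be None, a string,
            # or an object with a .name attribute (assumption), so normalize it.
            album_attr = getattr(song, 'album', None)
            song_album = str(getattr(album_attr, 'name', album_attr) or '')
            if album_name.lower() in song_album.lower():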
                song_data = {
                    'title': song.title,
                    'artist': song.artist,
                    'lyrics': song.lyrics,
                    'album': song_album,
                    'year': getattr(song, 'year', 'Unknown'),
                    'url': getattr(song, 'url', '')
                }
                album_songs.append(song_data)
                print(f"✓ Downloaded: {song.title}")
                time.sleep(3)  # Longer delay for album downloads
        
        print(f"\n✓ Album download complete: {len(album_songs)} songs")
        return album_songs
        
    except Exception as e:
        print(f"Error downloading album: {e}")
        return []

# Example: Download an album (uncomment to use)
# album_songs = download_album_lyrics("Taylor Swift", "Midnights", max_songs=15)
# print(f"Downloaded {len(album_songs)} songs from the album")

In [None]:
# Text analysis functions for distant reading
def analyze_lyrics_text(lyrics_text):
    """
    Perform distant reading analysis on lyrics text
    Returns quantitative metrics suitable for research
    """
    if not lyrics_text:
        return None
    
    lyrics = lyrics_text.lower()
    
    # Basic metrics
    total_chars = len(lyrics)
    total_words = len(lyrics.split())
    lines = [line.strip() for line in lyrics.split('\n') if line.strip()]
    total_lines = len(lines)
    
    # Clean text for analysis
    clean_text = re.sub(r'[^\w\s]', '', lyrics)
    words = clean_text.split()
    
    # Word frequency analysis
    word_freq = Counter(words)
    most_common = word_freq.most_common(10)
    
    # Sentiment indicators (keyword counting)
    positive_words = ['love', 'happy', 'good', 'beautiful', 'amazing', 'wonderful', 'joy', 'hope', 'dream']
    negative_words = ['sad', 'hurt', 'pain', 'crying', 'broken', 'lonely', 'dark', 'lost', 'fear']
    
    positive_count = sum(word_freq[word] for word in positive_words if word in word_freq)
    negative_count = sum(word_freq[word] for word in negative_words if word in word_freq)
    
    # Repetition analysis
    unique_words = len(set(words))
    repetition_ratio = len(words) / unique_words if unique_words > 0 else 0
    
    return {
        'total_chars': total_chars,
        'total_words': total_words,
        'total_lines': total_lines,
        'unique_words': unique_words,
        'repetition_ratio': round(repetition_ratio, 2),
        'most_common_words': most_common,
        'positive_sentiment': positive_count,
        'negative_sentiment': negative_count,
        'avg_words_per_line': round(total_words / total_lines if total_lines > 0 else 0, 2)
    }

# Test analysis on downloaded song
if 'song_data' in locals() and song_data:
    analysis = analyze_lyrics_text(song_data['lyrics'])
    if analysis:
        print(f"=== Analysis: {song_data['title']} ===")
        print(f"Words: {analysis['total_words']} | Unique: {analysis['unique_words']}")
        print(f"Repetition ratio: {analysis['repetition_ratio']}")
        print(f"Sentiment - Positive: {analysis['positive_sentiment']} | Negative: {analysis['negative_sentiment']}")
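
The ten most common words in a lyric are often function words such as "the" or "you", which say little about theme. One common remedy is to filter a stopword list before counting. The sketch below assumes a deliberately tiny, illustrative list; `STOPWORDS` and `top_content_words` are hypothetical names, and a real project would likely use a fuller list.

```python
import re
from collections import Counter

# Illustrative stopword list only; it is far from complete.
STOPWORDS = {
    "the", "a", "an", "and", "i", "you", "me", "my", "it", "its",
    "to", "of", "in", "on", "is", "im",
}

def top_content_words(lyrics_text, n=10, stopwords=STOPWORDS):
    """Return the n most frequent words that are not in the stopword set."""
    # Same cleaning steps as analyze_lyrics_text: lowercase, drop punctuation
    clean = re.sub(r"[^\w\s]", "", lyrics_text.lower())
    counts = Counter(word for word in clean.split() if word not in stopwords)
    return counts.most_common(n)

print(top_content_words("I love you, I love the rain. Rain!", n=2))
```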

In [None]:
# Save functions for building research corpus
def save_lyrics_corpus(songs_data, folder_path="lyrics_corpus"):
    """
    Save multiple songs to organized text files
    Perfect for building research corpora
    """
    if not songs_data:
        print("No songs data to save")
        return []
    
    # Create folder if it doesn't exist
    if not os.path.exists(folder_path):
        os.makedirs(folder_path)
        print(f"Created corpus folder: {folder_path}")
    
    saved_files = []
    
    # Handle single song or list of songs
    if isinstance(songs_data, dict):
        songs_data = [songs_data]
    
    for song_data in songs_data:
        if not song_data or not song_data.get('lyrics'):
            continue
            
        # Create safe filename
        safe_title = re.sub(r'[^\w\s-]', '', song_data['title'])
        safe_artist = re.sub(r'[^\w\s-]', '', song_data['artist'])
        filename = f"{safe_artist} - {safe_title}.txt"
        filepath = os.path.join(folder_path, filename)
        
        try:
            with open(filepath, 'w', encoding='utf-8') as f:
                # Write research metadata
                f.write(f"Title: {song_data['title']}\n")
                f.write(f"Artist: {song_data['artist']}\n")
                f.write(f"Album: {song_data.get('album', 'Unknown')}\n")
                f.write(f"Year: {song_data.get('year', 'Unknown')}\n")
                f.write(f"Source: {song_data.get('url', '')}\n")
                f.write(f"Downloaded: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
                f.write("=" * 50 + "\n\n")
                
                # Write lyrics for analysis
                f.write(song_data['lyrics'])
            
            saved_files.append(filepath)
            print(f"✓ Saved: {filename}")
            
        except Exception as e:
            print(f"✗ Error saving {filename}: {e}")
    
    print(f"\n✓ Corpus saved: {len(saved_files)} files in {folder_path}")
    return saved_files

# Save the downloaded song
if 'song_data' in locals() and song_data:
    saved_files = save_lyrics_corpus(song_data)
    print("Research corpus ready for analysis!")
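
Because each saved file begins with a metadata header followed by a line of 50 `=` characters, later analysis needs to separate the header from the lyrics. A minimal sketch of a reader, assuming the files were written by `save_lyrics_corpus` above; the name `load_corpus_file` is hypothetical.

```python
def load_corpus_file(filepath):
    """Split a saved corpus file into (metadata dict, lyrics string)."""
    with open(filepath, encoding="utf-8") as f:
        text = f.read()

    # Separator written by save_lyrics_corpus: 50 "=" characters and a blank line
    separator = "=" * 50 + "\n\n"
    header, found, lyrics = text.partition(separator)
    if not found:
        return {}, text  # No header present: treat the whole file as lyrics

    metadata = {}
    for line in header.splitlines():
        # Split on the first ": " only, so titles containing colons survive
        key, sep, value = line.partition(": ")
        if sep:
            metadata[key] = value
    return metadata, lyrics

# Possible usage:
# metadata, lyrics = load_corpus_file(saved_files[0])
# analysis = analyze_lyrics_text(lyrics)
```

Reading the lyrics back this way keeps the metadata lines out of the word counts.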

## Usage Examples for Digital Humanities Research

### Single Song Analysis
```python
# Download and analyze a specific song
song = download_song_lyrics("Artist Name", "Song Title")
if song:  # download_song_lyrics returns None when no lyrics are found
    analysis = analyze_lyrics_text(song['lyrics'])
    save_lyrics_corpus(song)
```

### Album Corpus Building
```python
# Build a research corpus from an album
album = download_album_lyrics("Artist Name", "Album Name", max_songs=15)
save_lyrics_corpus(album, folder_path="research_corpus")
```

### Comparative Analysis
```python
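# Compare keyword-based sentiment counts across the songs in `album`
# (the list returned by download_album_lyrics in the previous example)
for song in album:
    analysis = analyze_lyrics_text(song['lyrics'])
    if analysis:  # analyze_lyrics_text returns None for empty lyrics
        print(f"{song['title']}: Positive={analysis['positive_sentiment']}, Negative={analysis['negative_sentiment']}")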
```
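
Raw keyword counts favor longer songs, so comparisons across albums or artists are easier to read when the counts are normalized by length. The sketch below converts the output of `analyze_lyrics_text` into counts per 100 words and averages them per collection. The function names and the per-100-words scale are assumptions chosen for illustration.

```python
def sentiment_rates(analysis, per=100):
    """Convert raw keyword counts into counts per `per` words."""
    total = analysis["total_words"]
    if total == 0:
        return {"positive_rate": 0.0, "negative_rate": 0.0}
    return {
        "positive_rate": round(analysis["positive_sentiment"] * per / total, 2),
        "negative_rate": round(analysis["negative_sentiment"] * per / total, 2),
    }

def compare_collections(collections):
    """Average the per-song rates for each labeled collection.

    `collections` maps a label (album or artist name) to a list of
    dictionaries returned by analyze_lyrics_text.
    """
    summary = {}
    for label, analyses in collections.items():
        rates = [sentiment_rates(a) for a in analyses if a]
        if not rates:
            continue  # Skip collections with no usable analyses
        summary[label] = {
            key: round(sum(r[key] for r in rates) / len(rates), 2)
            for key in ("positive_rate", "negative_rate")
        }
    return summary
```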

## Research Applications

This toolkit enables:
- **Cultural Studies**: Analyze themes across different time periods
- **Literary Analysis**: Study poetic devices and linguistic patterns
- **Social History**: Track cultural attitudes through popular music
- **Computational Humanities**: Apply machine learning to cultural texts
- **Comparative Studies**: Analyze differences between artists, genres, or eras
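
As one possible starting point for the comparative studies mentioned above, the sketch below scores how similar two lyrics are by the cosine similarity of their word-count vectors. This is a simple bag-of-words measure chosen for illustration, not a method the toolkit itself provides; `word_counts` and `cosine_similarity` are hypothetical names.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Lowercase the text, drop punctuation, and count the words."""
    clean = re.sub(r"[^\w\s]", "", text.lower())
    return Counter(clean.split())

def cosine_similarity(text_a, text_b):
    """Return a score from 0.0 (no shared words) to 1.0 (same word proportions)."""
    a, b = word_counts(text_a), word_counts(text_b)
    # Only words present in both texts contribute to the dot product
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # At least one text has no words
    return dot / (norm_a * norm_b)
```

Scores from this measure are dominated by very frequent words, so results may be more informative after stopwords are removed.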

## Academic Citation

When using this toolkit for research, cite the Genius API as the data source and include the download timestamps recorded in the saved files for reproducibility.