# Taylor Swift Lyrics Analysis: A Data-Driven Exploration

A computational analysis of Taylor Swift's lyrics, examining temporal references, sentiment patterns, and thematic evolution across her discography from 2006-2022.

## Project Overview

This project analyzes Taylor Swift's lyrics using natural language processing and data analysis techniques. Key areas of investigation include:

- Temporal reference analysis (day/night imagery)
- Sentiment analysis across albums
- Word frequency and thematic patterns
- Chronological evolution of lyrical themes

## Prerequisites

To run this analysis, you'll need Python 3.7+ and four packages: pandas, matplotlib, seaborn, and NLTK. The cell below verifies your environment:

In [None]:
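import sys
print(f"Python version: {sys.version}")

# Required packages: each import raises ImportError if the package is not installed
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import nltk

# Reached only if every import above succeeded
print("\nRequired packages are installed!")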
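The import cell above stops at the first package that is not installed. As a complementary sketch, the check below lists every missing package in one pass without importing anything. The helper name `find_missing` and the `REQUIRED_PACKAGES` list are illustrative and not part of the project's notebooks; the package names are taken from the imports above.

```python
from importlib.util import find_spec

# Top-level import names taken from the verification cell above.
REQUIRED_PACKAGES = ["pandas", "matplotlib", "seaborn", "nltk"]


def find_missing(packages):
    """Return the names in `packages` that cannot be located for import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in packages if find_spec(name) is None]


missing = find_missing(REQUIRED_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Any package reported as missing can then be installed before re-running the verification cell.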

## NLTK Setup

Run the following cell to download the NLTK data the analysis relies on, namely the VADER sentiment lexicon and the stopword lists:

In [None]:
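import nltk  # repeated here so this cell also runs on its own

nltk.download('vader_lexicon')  # lexicon used by NLTK's VADER sentiment analyzer
nltk.download('stopwords')      # common-word lists for filtering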

## Project Structure

```
├── solution.ipynb                          # Main analysis notebook
├── taylor_swift_dataprep.ipynb             # Data preparation notebook
├── README.md                               # Project documentation
├── README.ipynb                            # This notebook
├── album_year_name.csv                     # Album metadata
├── taylor_swift_lyrics_2006-2020_all.csv   # Main dataset
└── lyrics/                                 # Individual album datasets
    ├── 01-taylor_swift.csv
    ├── 02-fearless_taylors_version.csv
    └── ...
```

## Data Verification

Let's verify that the required data files are present:

In [None]:
import os

required_files = [
    'taylor_swift_lyrics_2006-2020_all.csv',
    'album_year_name.csv'
]

for file in required_files:
    if os.path.exists(file):
        print(f"✓ {file} found")
    else:
        print(f"✗ {file} missing")
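Beyond checking that the files exist, it can help to confirm they are readable and to see which columns they contain, since the column names are not listed in this README. The sketch below uses only the standard library `csv` module. The helper `summarize_csv` is hypothetical, and UTF-8 encoding is an assumption about the files.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, data_row_count) for a CSV file, or None if it is missing."""
    path = Path(path)
    if not path.is_file():
        return None
    # Assumption: the files are UTF-8 encoded.
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])           # first record holds the column names
        row_count = sum(1 for _ in reader)  # counts records, so quoted multi-line fields count once
    return header, row_count


for name in ["taylor_swift_lyrics_2006-2020_all.csv", "album_year_name.csv"]:
    summary = summarize_csv(name)
    if summary is None:
        print(f"✗ {name} missing")
    else:
        columns, rows = summary
        print(f"✓ {name}: {rows} rows, columns = {columns}")
```

Printing the columns up front makes it easier to spot a mismatch between the dataset and whatever the analysis notebook expects.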

## Analysis Components

1. **Data Preprocessing**
   - Text cleaning and normalization
   - Temporal reference extraction
   - Album metadata integration

2. **Temporal Analysis**
   - Day/night reference tracking
   - Chronological patterns
   - Album-level temporal distribution

3. **Sentiment Analysis**
   - Overall sentiment trends
   - Temporal reference sentiment
   - Album-by-album sentiment evolution

4. **Word Frequency Analysis**
   - Most common terms
   - Thematic patterns
   - Temporal evolution of vocabulary
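To make the temporal and word-frequency components above more concrete, here is a minimal, dependency-free sketch. It is not the code from `solution.ipynb`: the keyword sets, the tiny stopword set (a stand-in for NLTK's English stopword list), the function names, and the sample sentence (invented for illustration, not an actual lyric) are all assumptions.

```python
import re
from collections import Counter

# Hypothetical keyword sets; the notebook's actual lists may differ.
DAY_WORDS = {"day", "morning", "sun", "sunrise", "noon", "daylight"}
NIGHT_WORDS = {"night", "midnight", "moon", "stars", "dark", "evening"}

# Tiny stand-in for NLTK's English stopword list, to keep the sketch self-contained.
STOPWORDS = {"the", "a", "an", "and", "in", "of", "to", "i", "you", "it", "is", "was", "we"}


def tokenize(text):
    """Lowercase the text and split it into tokens of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def count_temporal_references(text):
    """Count tokens that match the day and night keyword sets."""
    tokens = tokenize(text)
    return {
        "day": sum(token in DAY_WORDS for token in tokens),
        "night": sum(token in NIGHT_WORDS for token in tokens),
    }


def top_words(text, n=3):
    """Return the n most frequent non-stopword tokens with their counts."""
    counts = Counter(token for token in tokenize(text) if token not in STOPWORDS)
    return counts.most_common(n)


# Invented sample sentence, not a real lyric.
sample = "We danced in the dark until the morning sun, and the night was ours"
print(count_temporal_references(sample))  # → {'day': 2, 'night': 2}
print(top_words(sample))
```

Exact token matching keeps the sketch simple, but it misses variants such as "nights" or "sunlit"; a fuller treatment would likely need stemming or a broader keyword list.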

## Getting Started

1. Open `solution.ipynb` to begin the main analysis
2. Run cells sequentially to reproduce the analysis
3. Modify parameters or extend analyses as needed

## Acknowledgments

- Jan Llenzl Dagohoy for the original dataset
- The NLTK team for sentiment analysis tools
- Taylor Swift for the amazing music and lyrics

*Note: This project is for educational and research purposes only. All lyrics are property of their respective owners.*