Song lyrics project

Exploratory Data Analysis and Visualization, Columbia University, Spring 2018.

Project overview

We explored song lyrics from the musiXmatch dataset, part of the Million Song Dataset, to identify trends in lyrics and music across time and geography. We posed questions that probe different facets of the dataset and identified some interesting trends.

Deliverables

The report for this project is available here.

The interactive component, built with D3, lets you explore data points such as sentiment scores, topic scores, and similar artists for the top artists in the Million Song Dataset + musiXmatch dataset. Click here to view the interactive component.

Folder structure

data/ - Data is stored here (not included in the repository)
interactive/ - Source code for interactive component
experiments/ - Notebooks/scripts that we used to explore the data
lib/ - R utility functions used in the project
process/ - Scripts for downloading and processing the data (Python 3)
- process/pkg/ - Python package with utility functions
- process/clean/ - Cleaning the raw data
- process/transform/ - Code for generating various song vector representations
- process/cluster/ - Clustering songs
report/ - Report files

Data

We use the Million Song Dataset, specifically the musiXmatch dataset, which contains lyrics data for 237,662 tracks.
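For orientation, here is a minimal sketch of how the musiXmatch bag-of-words text files can be read. It assumes the published layout: lines starting with `#` are comments, one line starting with `%` lists the vocabulary, and each remaining line has the form `track_id,mxm_track_id,idx:count,...` with 1-based word indices. The function names `parse_mxm_line` and `read_mxm` are hypothetical and are not taken from this repository's code.

```python
def parse_mxm_line(line):
    """Parse one data line: 'track_id,mxm_track_id,idx:count,idx:count,...'."""
    track_id, mxm_id, *pairs = line.strip().split(",")
    counts = {}
    for pair in pairs:
        idx, cnt = pair.split(":")
        counts[int(idx)] = int(cnt)  # idx is a 1-based position in the vocabulary
    return track_id, mxm_id, counts


def read_mxm(lines):
    """Return (vocabulary, songs) from an iterable of lines (hypothetical helper)."""
    vocab = []
    songs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("%"):
            vocab = line[1:].split(",")  # comma-separated list of words
            continue
        songs.append(parse_mxm_line(line))
    return vocab, songs


# Tiny in-memory sample in the assumed format (not real data).
sample = [
    "# comment line",
    "%i,the,you",
    "TRACK0001,12345,1:6,3:2",
]
vocab, songs = read_mxm(sample)
track_id, mxm_id, counts = songs[0]
# Map 1-based indices back to words.
words = {vocab[idx - 1]: cnt for idx, cnt in counts.items()}
```

Because the indices are 1-based, the word lookup uses `vocab[idx - 1]`.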

Quickstart

git clone https://github.com/edublancas/song-lyrics
cd song-lyrics

0. Software requirements

This project requires Python 3 and R.

To install the required Python and R packages:

make requirements

1. Get raw data

The following command fetches all the datasets we used. It creates a new data/ folder in the current working directory; raw data is stored in data/raw.

make get_data

Note: GloVe can fail to download with wget. If it does, download it manually and put the uncompressed data in data/raw.
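To check that a manually downloaded GloVe file is usable, a small loader like the following can help. It is only a sketch: it assumes the standard GloVe text format (one word per line followed by space-separated floats), and the function name `load_glove` and the file name in the comment are hypothetical, since the README does not say which GloVe file the project uses.

```python
def load_glove(lines):
    """Build a {word: vector} dict from lines of 'word v1 v2 ... vn'."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# In-memory sample in the assumed format (not real embeddings).
sample = ["the 0.1 -0.2 0.3\n", "cat 1.0 2.0 3.0\n"]
vectors = load_glove(sample)

# With a real file (hypothetical name, adjust to the file you downloaded):
# with open("data/raw/glove.6B.50d.txt", encoding="utf-8") as f:
#     vectors = load_glove(f)
```

All vectors in a single GloVe file should have the same length, so comparing the length of a few entries is a quick sanity check.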

2. Process data

This script runs all the cleaning and processing we applied to the data and outputs the final datasets used in the report and the interactive component.

make bootstrap

3. Build report

Build the final report.

make report
