## Favorite Albums By the Numbers

After four months of dedicated listening, I've compiled my initial list of 200 favorite albums. Here I present some of the highlights, made easy with Python data libraries.

The data currently lives in an Excel spreadsheet. Let's load it into a DataFrame, the core pandas data structure:

In [24]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# import the spreadsheet and use each album's rank as the index
df = pd.read_excel("/Users/stevendungan/Dropbox/Music Favorites.xlsx")
df.index = df.Rank
df = df.drop(columns='Rank')  # drop the now-redundant Rank column
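
If you don't have the spreadsheet handy, the same setup can be sketched with a small in-memory table. The rows below are a handful of entries from the list, and the column names match the ones used in this post, but the `sample` frame itself is only a stand-in for illustration. Here `set_index` does the same job as assigning the index and then dropping the column:

```python
import pandas as pd

# A few rows copied from the list, standing in for the spreadsheet
sample = pd.DataFrame({
    "Rank": [1, 6, 16, 82],
    "Album": ["Pet Sounds", "After the Gold Rush", "Sunflower",
              "Bridge over Troubled Water"],
    "Artist": ["The Beach Boys", "Neil Young", "The Beach Boys",
               "Simon & Garfunkel"],
    "Year": [1966, 1970, 1970, 1970],
})

# Move Rank out of the columns and into the index in one step
sample = sample.set_index("Rank")
print(sample)
```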

Now that we have the data in a dataframe, let's crunch some numbers.

Let's see how many albums by each artist made the list:

In [7]:
# album count by artist
df['Artist'].value_counts()

The Beach Boys                        8
Portugal. The Man                     6
Beach House                           6
The Rolling Stones                    5
Radiohead                             5
Neil Young                            4
The Beatles                           4
Uncle Tupelo                          3
Neko Case                             3
Whiskeytown                           3
The Clash                             3
Nick Drake                            3
Elliott Smith                         3
Bon Iver                              3
Big Star                              3
The Jayhawks                          3
The Replacements                      3
Gram Parsons                          2
Jellyfish                             2
Old 97's                              2
Sufjan Stevens                        2
Ryan Adams                            2
Brian Eno                             2
David Bowie                           2
Elvis Costello and the Attractions    2


...and how many albums by each artist made the top 25:

In [17]:
# album count by artist for top 25 albums (assumes rows are sorted by rank)
df['Artist'].iloc[:25].value_counts()

The Beach Boys                 4
The Beatles                    2
Big Star                       2
Gram Parsons                   2
The Rolling Stones             2
Beach House                    1
Beachwood Sparks               1
The Clash                      1
Radiohead                      1
Van Morrison                   1
The Byrds                      1
The Flying Burrito Brothers    1
The Replacements               1
Neil Young                     1
Ryan Adams                     1
The Zombies                    1
The Dear Hunter                1
Whiskeytown                    1
Name: Artist, dtype: int64

Hmm, that does look like a lot of Beach Boys albums...

In [30]:
# Beach Boys albums
df.loc[df['Artist'] == "The Beach Boys", ['Album', 'Year']]

| Rank | Album | Year |
|---|---|---|
| 1 | Pet Sounds | 1966 |
| 10 | The Smile Sessions | 2011 |
| 16 | Sunflower | 1970 |
| 21 | The Beach Boys Today! | 1965 |
| 39 | Friends | 1968 |
| 62 | Love You | 1977 |
| 73 | Carl and the Passions: So Tough | 1972 |
| 84 | Wild Honey | 1967 |


...you're damn right. Brian Wilson is awesome.
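
Going back to the artist counts for a moment: since `value_counts()` returns an ordinary Series, it can be filtered like any other. Here is a minimal sketch, with the top of the counts above re-entered by hand; the name `heavy_hitters` and the five-album cutoff are arbitrary choices for illustration:

```python
import pandas as pd

# Top of the value_counts() output, re-entered by hand
counts = pd.Series({
    "The Beach Boys": 8,
    "Portugal. The Man": 6,
    "Beach House": 6,
    "The Rolling Stones": 5,
    "Radiohead": 5,
    "Neil Young": 4,
    "The Beatles": 4,
})

# Keep only the artists with at least five albums on the list
heavy_hitters = counts[counts >= 5]
print(heavy_hitters)

# Fraction of the 200-album list taken up by those artists
share = heavy_hitters.sum() / 200
print(f"{share:.0%}")
```

On the real data, the equivalent would be something like `counts = df['Artist'].value_counts()` followed by the same filter.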

Let's see what I've been listening to recently:

In [10]:
# 20 most recently listened-to albums
last_listened = df.sort_values('Last Listen', ascending=False)
last_listened.loc[:,['Album','Artist','Last Listen']].head(20)

| Rank | Album | Artist | Last Listen |
|---|---|---|---|
| 23 | Revolver | The Beatles | 2017-02-09 |
| 2 | The Gilded Palace of Sin | The Flying Burrito Brothers | 2017-02-08 |
| 145 | Nebraska | Bruce Springsteen | 2017-02-08 |
| 62 | Love You | The Beach Boys | 2017-02-08 |
| 64 | In the Mountain in the Cloud | Portugal. The Man | 2017-02-08 |
| 183 | Some Girls | The Rolling Stones | 2017-02-08 |
| 184 | Moon Safari | Air | 2017-02-08 |
| 52 | No Other | Gene Clark | 2017-02-08 |
| 190 | Barna Howard | Barna Howard | 2017-02-07 |
| 160 | Illinois | Sufjan Stevens | 2017-02-07 |
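
Sorting by `Last Listen` is most reliable when the column holds real dates rather than text, since text dates in a format like 2/9/2017 would sort alphabetically. `read_excel` usually parses date cells on its own, but if the column ever comes through as strings, `pd.to_datetime` can convert it first. A small sketch with three rows from the table (the `recent` frame is a stand-in, and the string dates are an assumption for the sake of the example):

```python
import pandas as pd

# Three rows from the table, with the dates entered as plain strings
recent = pd.DataFrame(
    {
        "Album": ["Illinois", "Revolver", "Nebraska"],
        "Artist": ["Sufjan Stevens", "The Beatles", "Bruce Springsteen"],
        "Last Listen": ["2017-02-07", "2017-02-09", "2017-02-08"],
    },
    index=pd.Index([160, 23, 145], name="Rank"),
)

# Convert the strings to datetimes so the sort is chronological
recent["Last Listen"] = pd.to_datetime(recent["Last Listen"])

# Most recent listens first
print(recent.sort_values("Last Listen", ascending=False))
```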


Now, let's take a look at some release year statistics.

Albums released in my lifetime:

In [11]:
# albums from my lifetime (1990-current)
df[df['Year']>=1990].loc[:,['Album','Artist','Year']].sort_values(['Year','Artist','Album'])


| Rank | Album | Artist | Year |
|---|---|---|---|
| 69 | Bellybutton | Jellyfish | 1990 |
| 65 | No Depression | Uncle Tupelo | 1990 |
| 130 | Bandwagonesque | Teenage Fanclub | 1991 |
| 194 | Gravity | Alejandro Escovedo | 1992 |
| 76 | I Am the Cosmos | Chris Bell | 1992 |
| 96 | Hollywood Town Hall | The Jayhawks | 1992 |
| 135 | March 16-20, 1992 | Uncle Tupelo | 1992 |
| 71 | Five Days in July | Blue Rodeo | 1993 |
| 170 | Hints Allegations and Things Left Unsaid | Collective Soul | 1993 |
| 185 | Amor Amarillo | Gustavo Cerati | 1993 |
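
The same kind of filter works for a closed range of years. One way to sketch it is with `Series.between`, which includes both endpoints by default. The `albums` frame below holds four rows from the table, and the 1990 to 1991 window is just an example:

```python
import pandas as pd

# Four rows from the table, indexed by rank
albums = pd.DataFrame(
    {
        "Album": ["Bellybutton", "No Depression", "Bandwagonesque", "Gravity"],
        "Artist": ["Jellyfish", "Uncle Tupelo", "Teenage Fanclub",
                   "Alejandro Escovedo"],
        "Year": [1990, 1990, 1991, 1992],
    },
    index=pd.Index([69, 65, 130, 194], name="Rank"),
)

# between() keeps rows where 1990 <= Year <= 1991
early_nineties = albums[albums["Year"].between(1990, 1991)]
print(early_nineties)
```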


How about a year-by-year breakdown:

In [14]:
# year-by-year breakdown
df['Year'].value_counts()

1970    8
1995    7
2005    7
1972    7
1968    7
1994    7
1971    7
2010    7
2015    6
1969    6
1997    6
1977    6
1993    6
1975    6
1973    6
2000    5
1999    5
1996    5
2011    5
1967    5
1978    5
1974    4
2008    4
2013    4
2016    4
1992    4
2007    4
2006    4
2012    3
2014    3
2004    3
1979    3
1966    3
2009    3
1983    2
1998    2
1990    2
2001    2
2002    2
2003    2
1985    2
1965    2
1989    2
1991    1
1976    1
1980    1
1982    1
1984    1
1988    1
1962    1
Name: Year, dtype: int64

Here are those albums from 1970:

In [37]:
# albums from 1970
df[df['Year']==1970].loc[:,['Album','Artist']].sort_values('Artist')

| Rank | Album | Artist |
|---|---|---|
| 196 | Tea for the Tillerman | Cat Stevens |
| 121 | Cosmo's Factory | Creedence Clearwater Revival |
| 94 | All Things Must Pass | George Harrison |
| 6 | After the Gold Rush | Neil Young |
| 82 | Bridge over Troubled Water | Simon & Garfunkel |
| 16 | Sunflower | The Beach Boys |
| 44 | American Beauty | The Grateful Dead |
| 127 | Moondance | Van Morrison |
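
Stepping back from single years, the same counting trick can be rolled up by decade. `value_counts()` orders by frequency, so chaining `sort_index()` puts the result in chronological order instead. Here is a rough sketch using only the release years of the eight Beach Boys albums listed earlier; the `years` Series is a stand-in for `df['Year']`:

```python
import pandas as pd

# Release years of the eight Beach Boys albums on the list
years = pd.Series([1966, 2011, 1970, 1965, 1968, 1977, 1972, 1967], name="Year")

# Floor each year to the start of its decade, count, then order chronologically
decades = (years // 10 * 10).value_counts().sort_index()
print(decades)
```

For this small sample, that gives four albums from the 1960s, three from the 1970s, and one from the 2010s.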
