## Introduction ##
Rolling Stone's 500 Greatest Albums of All Time is a comprehensive list of some of the most influential albums, spanning over seven decades. This is the ultimate who's who of music, the stalwarts of the industry. Today, I'm going to dissect this awesome list and see what I can unearth.

## First Look at the Data ##
Let's load up the data and take a look at what we are working with here.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# encoding='latin1' avoids the Unicode decoding error in Python 3
df = pd.read_csv("../input/albumlist.csv", encoding="latin1")
df.loc[0:20]

## A little housekeeping ##
At first sight, the genre field seems to have too much in it: a whole host of commas and slashes. Let's clean that up a bit.
What I'm going to do is restrict each album to a single genre, the first one listed. Now, I know that approach may not seem musically open-minded enough, but I'm going to stick with it anyway.

In [None]:
# Cleaning up genres
# Split every genre entry at the comma and keep only the first genre specified
df["Genre"] = df["Genre"].str.split(",").str[0]
df.loc[0:20]

Alright, much better.
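To make the effect of that cleanup concrete, here is a small sketch on a few made-up genre strings (the sample values are hypothetical, not rows from the dataset). It also shows that splitting only at commas leaves slash-separated entries untouched.

```python
import pandas as pd

# Hypothetical sample values for illustration, not taken from the dataset
sample = pd.Series(["Rock, Pop", "Funk / Soul", "Jazz", "Hip Hop, Funk / Soul"])

# Keep only the text before the first comma, then trim stray whitespace
first_genre = sample.str.split(",").str[0].str.strip()
print(first_genre.tolist())
# ['Rock', 'Funk / Soul', 'Jazz', 'Hip Hop']
```

If slashed entries should be collapsed as well, the same pattern could be repeated with `"/"` as the separator, though that is a further simplification I'm not applying here.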
Also, it would be useful to add a column that holds the decade of the album release. I will be using this later.

In [None]:
# Adding decades column
newyears = []
for year in df["Year"]:
    if year < 1960:
        newyears.append("50s")
    elif year < 1970:
        newyears.append("60s")
    elif year < 1980:
        newyears.append("70s")
    elif year < 1990:
        newyears.append("80s")
    elif year < 2000:
        newyears.append("90s")
    elif year < 2010:
        newyears.append("00s")
    else:
        newyears.append("10s")
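df["Decade"] = newyears

# Make the decades an ordered categorical so they sort chronologically
# instead of alphabetically ("00s" would otherwise come first)
sorter = ["50s", "60s", "70s", "80s", "90s", "00s", "10s"]
df["Decade"] = pd.Categorical(df["Decade"], categories=sorter, ordered=True)
df = df.sort_values("Decade")
df.head()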
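As an aside, the same binning could probably be done without the loop by using `pd.cut`. This is only a sketch of an alternative, shown on a handful of made-up years (the `years` Series is hypothetical), and it assumes the same decade boundaries as the `if`/`elif` chain above.

```python
import numpy as np
import pandas as pd

# Hypothetical years for illustration only
years = pd.Series([1955, 1967, 1979, 1980, 1999, 2003, 2011])

labels = ["50s", "60s", "70s", "80s", "90s", "00s", "10s"]
# Bin edges; right=False makes each bin include its start and exclude its end,
# so 1980 falls into the "80s" bin
bins = [-np.inf, 1960, 1970, 1980, 1990, 2000, 2010, np.inf]

# pd.cut returns an ordered categorical, so sorting stays chronological
decades = pd.cut(years, bins=bins, labels=labels, right=False)
print(decades.tolist())
# ['50s', '60s', '70s', '80s', '90s', '00s', '10s']
```

On the real data this would amount to passing `df["Year"]` in place of `years`. The explicit loop is easier to read at a glance, so which one to use is mostly a matter of taste.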