# Top 5000 Analysis project

### This notebook documents my brief analysis and cleaning of the given data.
#### All the visualisations were made in Tableau from the data saved at the end of this file.

In [23]:
import numpy as np
import pandas as pd
import os

### Let's look at the columns:

In [24]:
cwd = os.getcwd()
album_file = pd.read_csv(cwd + '/rym_top_5000_all_time.csv')
list(album_file.columns)


['Ranking',
 'Album',
 'Artist Name',
 'Release Date',
 'Genres',
 'Descriptors',
 'Average Rating',
 'Number of Ratings',
 'Number of Reviews']

### **'Descriptors'** is an interesting column! Let's see what it contains:

In [25]:
album_file.iloc[:20]

Unnamed: 0,Ranking,Album,Artist Name,Release Date,Genres,Descriptors,Average Rating,Number of Ratings,Number of Reviews
0,1.0,OK Computer,Radiohead,16 June 1997,"Alternative Rock, Art Rock","melancholic, anxious, futuristic, alienation, ...",4.23,70382,1531
1,2.0,Wish You Were Here,Pink Floyd,12 September 1975,"Progressive Rock, Art Rock","melancholic, atmospheric, progressive, male vo...",4.29,48662,983
2,3.0,In the Court of the Crimson King,King Crimson,10 October 1969,"Progressive Rock, Art Rock","fantasy, epic, progressive, philosophical, com...",4.3,44943,870
3,4.0,Kid A,Radiohead,3 October 2000,"Art Rock, Experimental Rock, Electronic","cold, melancholic, futuristic, atmospheric, an...",4.21,58590,734
4,5.0,To Pimp a Butterfly,Kendrick Lamar,15 March 2015,"Conscious Hip Hop, West Coast Hip Hop, Jazz Rap","political, conscious, poetic, protest, concept...",4.27,44206,379
5,6.0,Loveless,My Bloody Valentine,4 November 1991,"Shoegaze, Noise Pop","noisy, ethereal, atmospheric, romantic, dense,...",4.24,49887,1223
6,7.0,The Dark Side of the Moon,Pink Floyd,23 March 1973,"Art Rock, Progressive Rock","philosophical, atmospheric, introspective, exi...",4.2,57622,1549
7,8.0,Abbey Road,The Beatles,26 September 1969,Pop Rock,"melodic, warm, male vocals, bittersweet, summe...",4.25,44544,961
8,9.0,The Velvet Underground & Nico,The Velvet Underground & Nico,12 March 1967,"Art Rock, Experimental Rock","drugs, sexual, raw, urban, noisy, nihilistic, ...",4.23,45570,929
9,10.0,The Rise and Fall of Ziggy Stardust and the Sp...,David Bowie,16 June 1972,"Glam Rock, Pop Rock","science fiction, melodic, anthemic, concept al...",4.26,39501,721


### We will come back to this column later. Note that it contains a string of keywords describing the album, although we will need to clean it first.

### Now I'll convert the release dates to a datetime type so that they are easy to work with:

In [26]:
album_file["Release Date"] = pd.to_datetime(album_file["Release Date"])
album_file["Release Date"].iloc[:5]

0   1997-06-16
1   1975-09-12
2   1969-10-10
3   2000-10-03
4   2015-03-15
Name: Release Date, dtype: datetime64[ns]
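
As a hedged aside, the same conversion can be made stricter by passing an explicit format. This is only a sketch on a made-up `sample` frame, assuming every date follows the "day month-name year" pattern seen in the first rows; the real file may contain other patterns, which would then show up as `NaT` here.

```python
import pandas as pd

# Hypothetical sample mimicking the "Release Date" strings shown above.
sample = pd.DataFrame(
    {"Release Date": ["16 June 1997", "12 September 1975", "not a date"]}
)

# An explicit format avoids guessing the layout; errors="coerce" turns
# strings that do not match into NaT instead of raising an exception.
parsed = pd.to_datetime(sample["Release Date"], format="%d %B %Y", errors="coerce")

print(parsed)
print("Unparsed rows:", parsed.isna().sum())
```

Counting the `NaT` values afterwards is a quick way to check whether any rows use a different date layout.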

### Now I'll get rid of all NaN values. Note that only the **'Descriptors'** column has them, so I'll set them to a blank string.

In [27]:
album_file.isna().sum()

Ranking                0
Album                  0
Artist Name            0
Release Date           0
Genres                 0
Descriptors          114
Average Rating         0
Number of Ratings      0
Number of Reviews      0
dtype: int64

In [28]:
album_file["Descriptors"] = album_file["Descriptors"].fillna("")

In [29]:
album_file.isna().sum()

Ranking              0
Album                0
Artist Name          0
Release Date         0
Genres               0
Descriptors          0
Average Rating       0
Number of Ratings    0
Number of Reviews    0
dtype: int64
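
To see what `fillna("")` does in isolation, here is a small sketch on a made-up `toy` column (not the real data). It also shows a detail that matters for the next part: an empty string still splits into a list with one empty keyword.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value, like "Descriptors".
toy = pd.DataFrame({"Descriptors": ["cold, melancholic", np.nan]})

missing_before = toy["Descriptors"].isna().sum()

# Replace NaN with an empty string so later string operations do not fail.
toy["Descriptors"] = toy["Descriptors"].fillna("")

missing_after = toy["Descriptors"].isna().sum()

print(missing_before, missing_after)
print(toy["Descriptors"].tolist())

# Splitting an empty string gives a list holding one empty string.
print("".split(", "))
```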

In [30]:
album_file.to_csv("albums_edited.csv", index=False)

## Part Two

### Now I am going to split the **'Descriptors'** column into separate columns.

In [31]:
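# First attempt: split each descriptor string into a list of keywords.
# The split result is not stored anywhere yet, so this cell changes nothing.
for descriptors in album_file["Descriptors"]:
    keywords = descriptors.split(", ")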

In [32]:
for i in range(10):
    album_file["Descriptor " + str(i)] = ''

### Let's see what I have so far. The new columns exist, but they are still empty.

In [33]:
album_file.iloc[:5]

Unnamed: 0,Ranking,Album,Artist Name,Release Date,Genres,Descriptors,Average Rating,Number of Ratings,Number of Reviews,Descriptor 0,Descriptor 1,Descriptor 2,Descriptor 3,Descriptor 4,Descriptor 5,Descriptor 6,Descriptor 7,Descriptor 8,Descriptor 9
0,1.0,OK Computer,Radiohead,1997-06-16,"Alternative Rock, Art Rock","melancholic, anxious, futuristic, alienation, ...",4.23,70382,1531,,,,,,,,,,
1,2.0,Wish You Were Here,Pink Floyd,1975-09-12,"Progressive Rock, Art Rock","melancholic, atmospheric, progressive, male vo...",4.29,48662,983,,,,,,,,,,
2,3.0,In the Court of the Crimson King,King Crimson,1969-10-10,"Progressive Rock, Art Rock","fantasy, epic, progressive, philosophical, com...",4.3,44943,870,,,,,,,,,,
3,4.0,Kid A,Radiohead,2000-10-03,"Art Rock, Experimental Rock, Electronic","cold, melancholic, futuristic, atmospheric, an...",4.21,58590,734,,,,,,,,,,
4,5.0,To Pimp a Butterfly,Kendrick Lamar,2015-03-15,"Conscious Hip Hop, West Coast Hip Hop, Jazz Rap","political, conscious, poetic, protest, concept...",4.27,44206,379,,,,,,,,,,


In [34]:
# One list of keywords per album, in the same order as the rows
descr_list = [descriptors.split(", ") for descriptors in album_file["Descriptors"]]

In [35]:
for index in album_file.index:
    for j in range(10):
        try:
            # .at writes straight into the DataFrame (no chained assignment)
            album_file.at[index, "Descriptor " + str(j)] = descr_list[index][j]
        except IndexError:
            # Fewer than 10 descriptors: leave the remaining columns blank
            pass


### Now everything looks good. The result is:

In [36]:
album_file.head()

Unnamed: 0,Ranking,Album,Artist Name,Release Date,Genres,Descriptors,Average Rating,Number of Ratings,Number of Reviews,Descriptor 0,Descriptor 1,Descriptor 2,Descriptor 3,Descriptor 4,Descriptor 5,Descriptor 6,Descriptor 7,Descriptor 8,Descriptor 9
0,1.0,OK Computer,Radiohead,1997-06-16,"Alternative Rock, Art Rock","melancholic, anxious, futuristic, alienation, ...",4.23,70382,1531,melancholic,anxious,futuristic,alienation,existential,male vocals,atmospheric,lonely,cold,introspective
1,2.0,Wish You Were Here,Pink Floyd,1975-09-12,"Progressive Rock, Art Rock","melancholic, atmospheric, progressive, male vo...",4.29,48662,983,melancholic,atmospheric,progressive,male vocals,concept album,introspective,serious,longing,bittersweet,meditative
2,3.0,In the Court of the Crimson King,King Crimson,1969-10-10,"Progressive Rock, Art Rock","fantasy, epic, progressive, philosophical, com...",4.3,44943,870,fantasy,epic,progressive,philosophical,complex,surreal,poetic,male vocals,melancholic,technical
3,4.0,Kid A,Radiohead,2000-10-03,"Art Rock, Experimental Rock, Electronic","cold, melancholic, futuristic, atmospheric, an...",4.21,58590,734,cold,melancholic,futuristic,atmospheric,anxious,cryptic,sombre,abstract,introspective,male vocals
4,5.0,To Pimp a Butterfly,Kendrick Lamar,2015-03-15,"Conscious Hip Hop, West Coast Hip Hop, Jazz Rap","political, conscious, poetic, protest, concept...",4.27,44206,379,political,conscious,poetic,protest,concept album,introspective,urban,male vocals,eclectic,passionate
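
The loop above fills the columns one cell at a time. As an alternative sketch, the same ten columns can be built with the pandas string accessor, without an explicit row loop. This runs on a made-up `toy` frame, and `n_cols` is just a name I chose for the number of descriptor columns.

```python
import pandas as pd

# Hypothetical two-row sample; the second album has no descriptors.
toy = pd.DataFrame({"Descriptors": ["cold, melancholic, futuristic", ""]})

n_cols = 10  # same number of descriptor columns as in the notebook

# Split every string on ", " into a list of keywords (one list per row).
parts = toy["Descriptors"].str.split(", ")

# Take the i-th keyword of every list; rows that have fewer keywords
# give NaN, which is replaced with an empty string.
for i in range(n_cols):
    toy["Descriptor " + str(i)] = parts.str.get(i).fillna("")

print(toy.iloc[:, :5])
```

Rows with fewer than ten keywords simply get blank strings in the remaining columns, which matches what the loop produces.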


### Finally, we no longer need the **'Descriptors'** column itself:

In [37]:
album_file = album_file.drop(columns="Descriptors")

In [38]:
album_file.head()

Unnamed: 0,Ranking,Album,Artist Name,Release Date,Genres,Average Rating,Number of Ratings,Number of Reviews,Descriptor 0,Descriptor 1,Descriptor 2,Descriptor 3,Descriptor 4,Descriptor 5,Descriptor 6,Descriptor 7,Descriptor 8,Descriptor 9
0,1.0,OK Computer,Radiohead,1997-06-16,"Alternative Rock, Art Rock",4.23,70382,1531,melancholic,anxious,futuristic,alienation,existential,male vocals,atmospheric,lonely,cold,introspective
1,2.0,Wish You Were Here,Pink Floyd,1975-09-12,"Progressive Rock, Art Rock",4.29,48662,983,melancholic,atmospheric,progressive,male vocals,concept album,introspective,serious,longing,bittersweet,meditative
2,3.0,In the Court of the Crimson King,King Crimson,1969-10-10,"Progressive Rock, Art Rock",4.3,44943,870,fantasy,epic,progressive,philosophical,complex,surreal,poetic,male vocals,melancholic,technical
3,4.0,Kid A,Radiohead,2000-10-03,"Art Rock, Experimental Rock, Electronic",4.21,58590,734,cold,melancholic,futuristic,atmospheric,anxious,cryptic,sombre,abstract,introspective,male vocals
4,5.0,To Pimp a Butterfly,Kendrick Lamar,2015-03-15,"Conscious Hip Hop, West Coast Hip Hop, Jazz Rap",4.27,44206,379,political,conscious,poetic,protest,concept album,introspective,urban,male vocals,eclectic,passionate


In [39]:
album_file.to_csv("albums_edited_2.csv", index=False)
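
A note on reading such a file back into pandas: dates are written to CSV as plain text, so they come back as strings unless `parse_dates` is given. The sketch below uses a made-up one-row frame and an in-memory buffer instead of a real file, so the names `toy`, `buffer` and `restored` are only illustrative.

```python
import io

import pandas as pd

# Hypothetical one-row frame with a datetime column.
toy = pd.DataFrame(
    {"Album": ["OK Computer"], "Release Date": pd.to_datetime(["1997-06-16"])}
)

# Write to an in-memory buffer; index=False keeps the row index out of the CSV.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates restores the datetime type when loading the CSV again.
restored = pd.read_csv(buffer, parse_dates=["Release Date"])

print(restored.dtypes)
```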