# Fantano Review Notebook 1 - Exploratory Data Analysis

*In this project*, I will analyze and visualize data on albums rated by "The Internet's Busiest Music Nerd," Anthony Fantano. Fantano's YouTube channel "The Needle Drop" has 2.12 million subscribers (as of April 2020), and he has rated over 1,700 albums. After analyzing and visualizing the data, I will build two machine learning models to predict future album ratings. The first will be a Logistic Regression model for binary classification, which will classify albums as either "highly rated" or "not highly rated." The second will be a Neural Network that classifies albums on the same 1-10 scale that Fantano uses for all of his ratings.

*In this first notebook*, I will combine two CSV files (one with Anthony Fantano's ratings from Kaggle and one with Spotify album information created using Spotify's API) to explore data on albums that Anthony Fantano has rated. I will combine the CSVs, clean the data, and organize the final CSV in a way that is conducive to visualization and modeling.

# Step 1 - Pull & aggregate relevant data from Spotify CSV

In [1]:
import pandas as pd

We start by importing the Spotify dataset and inspecting the first few rows. This dataset was made using Spotify's API and a description of each column may be found here: https://developer.spotify.com/documentation/web-api/reference/tracks/get-several-audio-features/

In [61]:
spotify = pd.read_csv('spotify_data.csv')

spotify.head()

Unnamed: 0,spotify_id,album_name,artist_name,track_name,duration_ms,key,mode,tempo,time_signature,loudness,instrumentalness,acousticness,speechiness,liveness,danceability,energy,valence
0,5c7XChrHxYaqykCZLaGM5f,Cosmogramma,Flying Lotus,Clock Catcher,72627,5,0,93.7,4,-5.876,0.148,0.192,0.221,0.596,0.423,0.708,0.149
1,5c7XChrHxYaqykCZLaGM5f,Cosmogramma,Flying Lotus,Pickled!,133693,6,0,156.221,4,-2.545,0.188,0.00135,0.376,0.167,0.425,0.936,0.397
2,5c7XChrHxYaqykCZLaGM5f,Cosmogramma,Flying Lotus,Nose Art,118320,0,1,76.972,4,-3.211,0.00167,0.000613,0.317,0.0991,0.685,0.989,0.522
3,5c7XChrHxYaqykCZLaGM5f,Cosmogramma,Flying Lotus,Intro//A Cosmic Drama,74760,9,0,62.156,5,-7.744,0.0526,0.292,0.0324,0.133,0.218,0.554,0.131
4,5c7XChrHxYaqykCZLaGM5f,Cosmogramma,Flying Lotus,Zodiac Shit,164640,10,0,79.275,4,-4.606,0.0641,0.0229,0.273,0.3,0.346,0.753,0.184


The following columns are confidence scores between 0 and 1 that indicate how confident Spotify is that a track is acoustic, instrumental, or live:
* acousticness - 0 is not acoustic, 1 is acoustic
* instrumentalness - 0 is not instrumental, 1 is instrumental
* liveness - 0 is not a recording of a live performance, 1 is a recording of a live performance

Because these values are meant to represent binary properties, I will transform each of them to either 0 or 1. I will use the following thresholds, based on the column descriptions in Spotify's API documentation:
* acousticness: values > 0.5 are acoustic (1)
* instrumentalness: values > 0.5 are instrumental (1)
* liveness: values > 0.8 are live (1)

In [62]:
spotify['acousticness'] = spotify['acousticness'].apply(lambda x: 1 if x > 0.5 else 0)
spotify['instrumentalness'] = spotify['instrumentalness'].apply(lambda x: 1 if x > 0.5 else 0)
spotify['liveness'] = spotify['liveness'].apply(lambda x: 1 if x > 0.8 else 0)

spotify.head()

Unnamed: 0,spotify_id,album_name,artist_name,track_name,duration_ms,key,mode,tempo,time_signature,loudness,instrumentalness,acousticness,speechiness,liveness,danceability,energy,valence
0,5c7XChrHxYaqykCZLaGM5f,Cosmogramma,Flying Lotus,Clock Catcher,72627,5,0,93.7,4,-5.876,0,0,0.221,0,0.423,0.708,0.149
1,5c7XChrHxYaqykCZLaGM5f,Cosmogramma,Flying Lotus,Pickled!,133693,6,0,156.221,4,-2.545,0,0,0.376,0,0.425,0.936,0.397
2,5c7XChrHxYaqykCZLaGM5f,Cosmogramma,Flying Lotus,Nose Art,118320,0,1,76.972,4,-3.211,0,0,0.317,0,0.685,0.989,0.522
3,5c7XChrHxYaqykCZLaGM5f,Cosmogramma,Flying Lotus,Intro//A Cosmic Drama,74760,9,0,62.156,5,-7.744,0,0,0.0324,0,0.218,0.554,0.131
4,5c7XChrHxYaqykCZLaGM5f,Cosmogramma,Flying Lotus,Zodiac Shit,164640,10,0,79.275,4,-4.606,0,0,0.273,0,0.346,0.753,0.184
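As an aside, the same thresholding can be written as a vectorized comparison instead of `apply` with a lambda. The sketch below is only an illustration: the `toy` frame and its values are made up, and the `thresholds` dictionary is a name I introduce here to keep the cutoffs in one place.

```python
import pandas as pd

# Toy stand-in for the Spotify track data (values are made up for illustration).
toy = pd.DataFrame({
    'acousticness': [0.192, 0.73, 0.5],
    'instrumentalness': [0.148, 0.91, 0.51],
    'liveness': [0.596, 0.85, 0.8],
})

# Hypothetical helper dict: one cutoff per column, matching the thresholds listed above.
thresholds = {'acousticness': 0.5, 'instrumentalness': 0.5, 'liveness': 0.8}

binarized = toy.copy()
for column, cutoff in thresholds.items():
    # Strictly greater than the cutoff becomes 1, everything else becomes 0.
    binarized[column] = (toy[column] > cutoff).astype(int)

print(binarized)
```

Note that a value exactly equal to the cutoff (0.5 or 0.8 here) maps to 0, which matches the `x > cutoff` condition in the lambdas.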


Next, I want to group the data by album, since Fantano reviews albums instead of individual songs. For most columns I will take the mean across all of the songs in the album. This will give an overview of the values for each column *by album*.

The following columns, however, I will aggregate by mode instead of mean, as these columns represent discrete (non-continuous) values:
* key
* mode
* time_signature

In [65]:
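# Select the columns with a list (double brackets); selecting with a bare tuple of names is not supported in recent pandas
means = spotify.groupby(['artist_name', 'album_name', 'spotify_id'])[['tempo','loudness','instrumentalness',
                                                                     'acousticness','speechiness','liveness',
                                                                     'danceability','energy','valence']].mean().reset_index()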
means.head()

Unnamed: 0,artist_name,album_name,spotify_id,tempo,loudness,instrumentalness,acousticness,speechiness,liveness,danceability,energy,valence
0,"""Weird Al"" Yankovic",Mandatory Fun,36jlZKG1sNZQA2HbWdYveV,134.414417,-9.953667,0.0,0.083333,0.100267,0.0,0.587167,0.656083,0.690917
1,$uicideBoy$,Eternal Grey,3jud1ryzH2GV7T1qvWiHZx,117.073571,-5.947786,0.0,0.0,0.140407,0.0,0.718214,0.678,0.284993
2,$uicideBoy$,Radical $uicide,2WI0MZ3Jb7WmcpF3CQiNJx,124.8076,-2.4986,0.0,0.0,0.0941,0.0,0.8358,0.824,0.2898
3,(Sandy) Alex G,Beach Music,0aABjw7BY2iRsK4ZdkwSjF,112.474923,-11.217231,0.153846,0.384615,0.032885,0.0,0.504162,0.436923,0.363815
4,(Sandy) Alex G,Rocket,5Pq92omNLyQgGGrj2u4pur,128.357714,-7.042714,0.285714,0.214286,0.049843,0.0,0.461143,0.588929,0.462714


In [68]:
# value_counts() sorts by frequency, so index[0] is the most common value in each album
modes = spotify.groupby(['artist_name', 'album_name', 'spotify_id'])[['key','mode','time_signature']].agg(lambda x: x.value_counts().index[0]).reset_index()
modes.head()

Unnamed: 0,artist_name,album_name,spotify_id,key,mode,time_signature
0,"""Weird Al"" Yankovic",Mandatory Fun,36jlZKG1sNZQA2HbWdYveV,9,1,4
1,$uicideBoy$,Eternal Grey,3jud1ryzH2GV7T1qvWiHZx,8,1,4
2,$uicideBoy$,Radical $uicide,2WI0MZ3Jb7WmcpF3CQiNJx,1,1,4
3,(Sandy) Alex G,Beach Music,0aABjw7BY2iRsK4ZdkwSjF,2,1,4
4,(Sandy) Alex G,Rocket,5Pq92omNLyQgGGrj2u4pur,9,1,4
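One caveat with taking the most common value: an album can have a tie (for example, two keys that each appear on the same number of tracks). I have not checked how often that happens in this dataset. The sketch below, on made-up data, shows one way to make the tie-break explicit by using `Series.mode()`, which returns all tied values in ascending order. The `smallest_mode` helper is a name I made up for this example.

```python
import pandas as pd

# Made-up track data: album B has a tie between keys 2 and 7.
toy = pd.DataFrame({
    'album': ['A', 'A', 'A', 'B', 'B'],
    'key': [5, 5, 9, 2, 7],
})

def smallest_mode(values):
    # Series.mode() returns every tied value, sorted ascending; keep the first.
    return values.mode().iloc[0]

album_keys = toy.groupby('album')['key'].agg(smallest_mode)
print(album_keys)
```

Picking the smallest tied value is an arbitrary choice; the point is only that the result no longer depends on how ties happen to be ordered.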


Now that I have found the relevant means and modes, I want to combine the tables back into one.

In [71]:
combined_mean_mode = pd.merge(means,
                         modes,
                         on=['artist_name', 'album_name', 'spotify_id'])
combined_mean_mode.head()

Unnamed: 0,artist_name,album_name,spotify_id,tempo,loudness,instrumentalness,acousticness,speechiness,liveness,danceability,energy,valence,key,mode,time_signature
0,"""Weird Al"" Yankovic",Mandatory Fun,36jlZKG1sNZQA2HbWdYveV,134.414417,-9.953667,0.0,0.083333,0.100267,0.0,0.587167,0.656083,0.690917,9,1,4
1,$uicideBoy$,Eternal Grey,3jud1ryzH2GV7T1qvWiHZx,117.073571,-5.947786,0.0,0.0,0.140407,0.0,0.718214,0.678,0.284993,8,1,4
2,$uicideBoy$,Radical $uicide,2WI0MZ3Jb7WmcpF3CQiNJx,124.8076,-2.4986,0.0,0.0,0.0941,0.0,0.8358,0.824,0.2898,1,1,4
3,(Sandy) Alex G,Beach Music,0aABjw7BY2iRsK4ZdkwSjF,112.474923,-11.217231,0.153846,0.384615,0.032885,0.0,0.504162,0.436923,0.363815,2,1,4
4,(Sandy) Alex G,Rocket,5Pq92omNLyQgGGrj2u4pur,128.357714,-7.042714,0.285714,0.214286,0.049843,0.0,0.461143,0.588929,0.462714,9,1,4


I felt that the mean of these song features alone didn't properly represent the variety within each album. As I believe an album's variety of sounds is relevant to its rating, I will include a measure of spread - I've chosen the standard deviation.

Now I want to find the standard deviation for the columns with mean values (all numeric columns besides key, mode, and time_signature).

In [72]:
# Select the columns with a list (double brackets), as with the means
standard_deviations = spotify.groupby(['artist_name', 'album_name', 'spotify_id'])[['tempo','loudness','instrumentalness',
                                                                                   'acousticness','speechiness','liveness',
                                                                                   'danceability','energy','valence']].std().reset_index()
standard_deviations.head()

Unnamed: 0,artist_name,album_name,spotify_id,tempo,loudness,instrumentalness,acousticness,speechiness,liveness,danceability,energy,valence
0,"""Weird Al"" Yankovic",Mandatory Fun,36jlZKG1sNZQA2HbWdYveV,31.235458,2.147683,0.0,0.288675,0.109117,0.0,0.178058,0.179098,0.226669
1,$uicideBoy$,Eternal Grey,3jud1ryzH2GV7T1qvWiHZx,29.439636,1.600359,0.0,0.0,0.047101,0.0,0.108061,0.13029,0.161724
2,$uicideBoy$,Radical $uicide,2WI0MZ3Jb7WmcpF3CQiNJx,12.871218,1.53792,0.0,0.0,0.073217,0.0,0.062102,0.159567,0.107355
3,(Sandy) Alex G,Beach Music,0aABjw7BY2iRsK4ZdkwSjF,26.156683,1.865558,0.375534,0.50637,0.004761,0.0,0.163336,0.128091,0.261598
4,(Sandy) Alex G,Rocket,5Pq92omNLyQgGGrj2u4pur,32.653342,2.149563,0.468807,0.425815,0.072109,0.0,0.152455,0.197175,0.180167
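One thing worth keeping in mind: pandas computes the sample standard deviation by default (`ddof=1`), so a group containing a single track yields `NaN` rather than 0. I have not verified whether this dataset contains one-track releases. The sketch below uses made-up data to show the behavior, with filling by 0 as one possible (assumed) way to handle it.

```python
import pandas as pd

# Made-up data: 'single' has one track, 'pair' has two.
toy = pd.DataFrame({
    'spotify_id': ['single', 'pair', 'pair'],
    'tempo': [120.0, 100.0, 110.0],
})

# Default ddof=1: a one-row group has no sample standard deviation.
sample_std = toy.groupby('spotify_id')['tempo'].std()

# One option (an assumption, not the notebook's method): treat one-track albums as having zero spread.
filled_std = sample_std.fillna(0.0)

print(sample_std)
print(filled_std)
```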


Now I want to combine these standard deviations back into the rest of the dataset.

In [73]:
combined_spotify = pd.merge(combined_mean_mode,
                            standard_deviations,
                            on=['artist_name', 'album_name', 'spotify_id'],
                            suffixes=['', '_standard_dev'])
combined_spotify.columns.values

array(['artist_name', 'album_name', 'spotify_id', 'tempo', 'loudness',
       'instrumentalness', 'acousticness', 'speechiness', 'liveness',
       'danceability', 'energy', 'valence', 'key', 'mode',
       'time_signature', 'tempo_standard_dev', 'loudness_standard_dev',
       'instrumentalness_standard_dev', 'acousticness_standard_dev',
       'speechiness_standard_dev', 'liveness_standard_dev',
       'danceability_standard_dev', 'energy_standard_dev',
       'valence_standard_dev'], dtype=object)
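For comparison, the mean and standard deviation could also be computed in a single `groupby` pass and the resulting two-level column names flattened to the same `_standard_dev` naming scheme. This is only a sketch on made-up data, not the approach used above.

```python
import pandas as pd

# Made-up track data for two albums.
toy = pd.DataFrame({
    'spotify_id': ['a', 'a', 'b', 'b'],
    'tempo': [100.0, 120.0, 90.0, 90.0],
    'energy': [0.5, 0.7, 0.2, 0.4],
})

# One pass: each feature gets a (feature, 'mean') and a (feature, 'std') column.
summary = toy.groupby('spotify_id')[['tempo', 'energy']].agg(['mean', 'std'])

# Flatten the two-level columns: keep the bare name for means, add a suffix for std.
summary.columns = [
    name if stat == 'mean' else f'{name}_standard_dev'
    for name, stat in summary.columns
]
summary = summary.reset_index()

print(summary)
```

The column order differs from the merged table above (each standard deviation sits next to its mean), which does not matter for modeling but is worth knowing when comparing outputs.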

# Step 2 - Introduce Reviews CSV, combine with Spotify CSV, and clean the data

I will import the reviews csv and inspect the first few rows.

In [74]:
reviews = pd.read_csv('fantano_reviews.csv')
reviews.head()

Unnamed: 0,id,title,artist,spotify_id,review_date,review_type,score,word_score,best_tracks,worst_track,link
0,0,Cosmogramma,Flying Lotus,5c7XChrHxYaqykCZLaGM5f,2010-05-05,Album,8.0,,[],,https://www.youtube.com/watch?v=KCuamde9Atc
1,1,Throat,Little Women,1SpWwIEwnXRhc8co4h17hf,2010-05-09,Album,9.0,,[],,https://www.youtube.com/watch?v=cndwH6byJnk
2,2,Latin,Holy Fuck,1BStiBF6YFDPgGuNPh5rhi,2010-05-10,Album,7.0,,[],,https://www.youtube.com/watch?v=ySXryTlo9Ac
3,3,High Violet,The National,59gXPxZ8CwFaeknPxtxXHZ,2010-05-11,Album,6.0,,[],,https://www.youtube.com/watch?v=DuMUDldrG3g
4,4,At Echo Lake,Woods,45YGCEh09Ji6rdHjPgNdw8,2010-05-12,Album,8.0,,[],,https://www.youtube.com/watch?v=ncrpTX6jR5w


Before combining the CSVs, I will drop some information from the reviews dataset. Much of this information is not relevant to our eventual goal of predicting album ratings. I will drop the following columns:
* id
* review_type
* word_score
* best_tracks
* worst_track
* link

Really all I *need* to keep for modeling are the columns 'spotify_id' and 'score.' However, I want to keep 'title' and 'artist' to quickly verify that the two CSVs do in fact merge correctly. I'm keeping 'review_date' for now, because I think I can do some interesting visualizations with that data.

In [83]:
reviews_condensed = reviews.drop(columns=['id','review_type','word_score','best_tracks','worst_track','link'])
reviews_condensed.tail(30)

Unnamed: 0,title,artist,spotify_id,review_date,score
1704,Golden Hour,Kacey Musgraves,7f6xPqyaolTiziKf5R5Z0c,2018-04-07,4.0
1705,Invasion of Privacy,Cardi B,4KdtEKjY3Gi0mKiSdy96ML,2018-04-10,6.0
1706,Vacation in Hell,Flatbush Zombies,622gmiTAUET3WtAZq8aXtS,2018-04-11,6.0
1707,Isolation,Kali Uchis,4EPQtdq6vvwxuYeQTrwDVY,2018-04-13,8.0
1708,Your Queen Is a Reptile,Sons of Kemet,4pxnDGBdoGu88h8ZInX5f5,2018-04-17,8.0
1709,The Tree of Forgiveness,John Prine,13UwfQZqne7ZQIkUZsAPLg,2018-04-18,7.0
1710,CARE FOR ME,Saba,6Te111t5gDZ7W94myHRqUt,2018-04-18,
1711,Sex & Food,Unknown Mortal Orchestra,7c2Xfq7aQKzs0KdSI3K7Rc,2018-04-20,5.0
1712,KOD,J. Cole,4Wv5UAieM1LDEYVq5WmqDd,2018-04-23,5.0
1713,"Bark Your Head Off, Dog",Hop Along,7taAuoHvZ4LJzEaB0OzuP3,2018-04-25,8.0


Next, I want to drop any rows where the score is 'NaN'. This indicates that the reviews dataset doesn't have a numerical score for that album.

In [84]:
# if this works properly, rows with indices 1710 and 1715 should be dropped, but all others in .tail(30) should remain 
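# only check the 'score' column, so rows are dropped for a missing score and nothing else
reviews_clean = reviews_condensed.dropna(subset=['score'])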
reviews_clean.tail(30)


Unnamed: 0,title,artist,spotify_id,review_date,score
1702,Czarface Meets Metal Face,CZARFACE & MF DOOM,2PPvDD3t985MvMphfSwzgr,2018-04-04,7.0
1703,Everything's Fine,Jean Grae x Quelle Chris,22oHrB4SLwayyuLE02t5BD,2018-04-06,8.0
1704,Golden Hour,Kacey Musgraves,7f6xPqyaolTiziKf5R5Z0c,2018-04-07,4.0
1705,Invasion of Privacy,Cardi B,4KdtEKjY3Gi0mKiSdy96ML,2018-04-10,6.0
1706,Vacation in Hell,Flatbush Zombies,622gmiTAUET3WtAZq8aXtS,2018-04-11,6.0
1707,Isolation,Kali Uchis,4EPQtdq6vvwxuYeQTrwDVY,2018-04-13,8.0
1708,Your Queen Is a Reptile,Sons of Kemet,4pxnDGBdoGu88h8ZInX5f5,2018-04-17,8.0
1709,The Tree of Forgiveness,John Prine,13UwfQZqne7ZQIkUZsAPLg,2018-04-18,7.0
1711,Sex & Food,Unknown Mortal Orchestra,7c2Xfq7aQKzs0KdSI3K7Rc,2018-04-20,5.0
1712,KOD,J. Cole,4Wv5UAieM1LDEYVq5WmqDd,2018-04-23,5.0


We're ready to combine the CSVs! I'll be using an inner merge because I only want to keep rows that are in the Spotify dataset AND the reviews dataset. Unfortunately, not all of the albums that Fantano reviewed were available on Spotify. I will be discarding those rows from the combined dataset, since we cannot do useful visualization or modeling without the Spotify data.

In [87]:
spotify_reviews = pd.merge(combined_spotify,
                           reviews_clean,
                           how='inner',
                           on='spotify_id')
spotify_reviews.head()

Unnamed: 0,artist_name,album_name,spotify_id,tempo,loudness,instrumentalness,acousticness,speechiness,liveness,danceability,...,acousticness_standard_dev,speechiness_standard_dev,liveness_standard_dev,danceability_standard_dev,energy_standard_dev,valence_standard_dev,title,artist,review_date,score
0,"""Weird Al"" Yankovic",Mandatory Fun,36jlZKG1sNZQA2HbWdYveV,134.414417,-9.953667,0.0,0.083333,0.100267,0.0,0.587167,...,0.288675,0.109117,0.0,0.178058,0.179098,0.226669,Mandatory Fun,"""Weird Al"" Yankovic",2014-07-23,7.0
1,(Sandy) Alex G,Beach Music,0aABjw7BY2iRsK4ZdkwSjF,112.474923,-11.217231,0.153846,0.384615,0.032885,0.0,0.504162,...,0.50637,0.004761,0.0,0.163336,0.128091,0.261598,Beach Music,Alex G,2015-10-14,6.0
2,(Sandy) Alex G,Rocket,5Pq92omNLyQgGGrj2u4pur,128.357714,-7.042714,0.285714,0.214286,0.049843,0.0,0.461143,...,0.425815,0.072109,0.0,0.152455,0.197175,0.180167,Rocket,(Sandy) Alex G,2017-05-24,5.0
3,2 Chainz,Based On A T.R.U. Story,5fL5QvYTODefwIrQVAHa0f,135.056692,-5.171385,0.0,0.076923,0.153346,0.0,0.764846,...,0.27735,0.083065,0.0,0.114408,0.12951,0.256612,Based on a T.R.U. Story,2 Chainz,2012-08-16,4.0
4,21 Savage,Issa Album,1GeQZeg77rUX8s0pEiz8jS,129.126643,-10.567214,0.142857,0.071429,0.3435,0.0,0.833143,...,0.267261,0.092056,0.0,0.065064,0.110153,0.145041,Issa Album,21 Savage,2017-07-10,3.0
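To see how many rows an inner merge discards, one option is to run an outer merge with `indicator=True` first and count the `_merge` categories. The sketch below uses two tiny made-up frames (`spotify_toy` and `reviews_toy` are hypothetical names), not the real data.

```python
import pandas as pd

# Made-up stand-ins for the two datasets; 'a' lacks a review, 'd' lacks Spotify data.
spotify_toy = pd.DataFrame({'spotify_id': ['a', 'b', 'c'], 'tempo': [100.0, 120.0, 90.0]})
reviews_toy = pd.DataFrame({'spotify_id': ['b', 'c', 'd'], 'score': [7.0, 5.0, 9.0]})

# The outer merge keeps everything; '_merge' records where each row came from.
audit = pd.merge(spotify_toy, reviews_toy, how='outer', on='spotify_id', indicator=True)
counts = audit['_merge'].value_counts()

# Rows marked 'both' are exactly the ones an inner merge would keep.
matched = audit[audit['_merge'] == 'both'].drop(columns='_merge')

print(counts)
print(matched)
```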


It looks like it worked! The names line up apart from small differences in capitalization and artist naming (for example, "(Sandy) Alex G" vs. "Alex G"), so I can now drop the columns 'title' and 'artist.'
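Eyeballing the first five rows only goes so far. A programmatic check could compare the name columns after normalizing case and whitespace and list whatever still differs. The sketch below runs on made-up rows (the "Rockets" mismatch is invented for illustration), and `normalize` is a helper name I introduce here.

```python
import pandas as pd

# Made-up merged rows: row 1 differs only in capitalization, row 2 is a real mismatch.
merged_toy = pd.DataFrame({
    'album_name': ['Beach Music', 'Based On A T.R.U. Story', 'Rocket'],
    'title': ['Beach Music', 'Based on a T.R.U. Story', 'Rockets'],
})

def normalize(names):
    # Ignore surrounding whitespace and letter case when comparing names.
    return names.str.strip().str.casefold()

mismatch = normalize(merged_toy['album_name']) != normalize(merged_toy['title'])
mismatched_rows = merged_toy[mismatch]

print(mismatched_rows)
```

This would not treat alternate artist names such as "(Sandy) Alex G" and "Alex G" as equal, so any rows it reports still need a manual look.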

In [99]:
spotify_reviews_almost_clean = spotify_reviews.drop(columns=['title','artist'])
spotify_reviews_almost_clean.head()

Unnamed: 0,artist_name,album_name,spotify_id,tempo,loudness,instrumentalness,acousticness,speechiness,liveness,danceability,...,loudness_standard_dev,instrumentalness_standard_dev,acousticness_standard_dev,speechiness_standard_dev,liveness_standard_dev,danceability_standard_dev,energy_standard_dev,valence_standard_dev,review_date,score
0,"""Weird Al"" Yankovic",Mandatory Fun,36jlZKG1sNZQA2HbWdYveV,134.414417,-9.953667,0.0,0.083333,0.100267,0.0,0.587167,...,2.147683,0.0,0.288675,0.109117,0.0,0.178058,0.179098,0.226669,2014-07-23,7.0
1,(Sandy) Alex G,Beach Music,0aABjw7BY2iRsK4ZdkwSjF,112.474923,-11.217231,0.153846,0.384615,0.032885,0.0,0.504162,...,1.865558,0.375534,0.50637,0.004761,0.0,0.163336,0.128091,0.261598,2015-10-14,6.0
2,(Sandy) Alex G,Rocket,5Pq92omNLyQgGGrj2u4pur,128.357714,-7.042714,0.285714,0.214286,0.049843,0.0,0.461143,...,2.149563,0.468807,0.425815,0.072109,0.0,0.152455,0.197175,0.180167,2017-05-24,5.0
3,2 Chainz,Based On A T.R.U. Story,5fL5QvYTODefwIrQVAHa0f,135.056692,-5.171385,0.0,0.076923,0.153346,0.0,0.764846,...,1.466928,0.0,0.27735,0.083065,0.0,0.114408,0.12951,0.256612,2012-08-16,4.0
4,21 Savage,Issa Album,1GeQZeg77rUX8s0pEiz8jS,129.126643,-10.567214,0.142857,0.071429,0.3435,0.0,0.833143,...,1.623469,0.363137,0.267261,0.092056,0.0,0.065064,0.110153,0.145041,2017-07-10,3.0


Before we save this CSV, I want to check one last thing: duplicates.

In [100]:
duplicated = spotify_reviews_almost_clean[spotify_reviews_almost_clean.duplicated('spotify_id')]
duplicated

Unnamed: 0,artist_name,album_name,spotify_id,tempo,loudness,instrumentalness,acousticness,speechiness,liveness,danceability,...,loudness_standard_dev,instrumentalness_standard_dev,acousticness_standard_dev,speechiness_standard_dev,liveness_standard_dev,danceability_standard_dev,energy_standard_dev,valence_standard_dev,review_date,score
973,Planningtorock,All Love's Legal,0Cc7BlKV0vZnGPx0fAFqY7,111.1855,-10.295667,0.166667,0.333333,0.064167,0.0,0.507917,...,2.479021,0.380693,0.481543,0.041193,0.0,0.172431,0.163528,0.20895,2014-04-01,0.0


I found a duplicate on the column 'spotify_id'. By default, `duplicated` only flags the second occurrence, so to get a clearer picture of what's going on, I want to see both rows with this spotify_id.

In [101]:
spotify_reviews_almost_clean[spotify_reviews_almost_clean['spotify_id'] == '0Cc7BlKV0vZnGPx0fAFqY7']

Unnamed: 0,artist_name,album_name,spotify_id,tempo,loudness,instrumentalness,acousticness,speechiness,liveness,danceability,...,loudness_standard_dev,instrumentalness_standard_dev,acousticness_standard_dev,speechiness_standard_dev,liveness_standard_dev,danceability_standard_dev,energy_standard_dev,valence_standard_dev,review_date,score
972,Planningtorock,All Love's Legal,0Cc7BlKV0vZnGPx0fAFqY7,111.1855,-10.295667,0.166667,0.333333,0.064167,0.0,0.507917,...,2.479021,0.380693,0.481543,0.041193,0.0,0.172431,0.163528,0.20895,2014-04-01,10.0
973,Planningtorock,All Love's Legal,0Cc7BlKV0vZnGPx0fAFqY7,111.1855,-10.295667,0.166667,0.333333,0.064167,0.0,0.507917,...,2.479021,0.380693,0.481543,0.041193,0.0,0.172431,0.163528,0.20895,2014-04-01,0.0
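Both steps can be combined: passing `keep=False` to `duplicated` flags every copy of a repeated ID, and counting distinct scores per ID shows which duplicates actually disagree. A small sketch on made-up data:

```python
import pandas as pd

# Made-up merged data: ID 'y' appears twice with conflicting scores.
toy = pd.DataFrame({
    'spotify_id': ['x', 'y', 'y', 'z'],
    'score': [7.0, 10.0, 0.0, 4.0],
})

# keep=False marks all occurrences of a duplicated ID, not just the later ones.
all_copies = toy[toy.duplicated('spotify_id', keep=False)]

# IDs whose rows carry more than one distinct score.
scores_per_id = toy.groupby('spotify_id')['score'].nunique()
conflicting_ids = scores_per_id[scores_per_id > 1].index.tolist()

print(all_copies)
print(conflicting_ids)
```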


The same album, with the same review date, appears once with a score of 10.0 and once with a score of 0.0. I can't tell from the data which score is correct, so I'm going to delete both rows as bad data.

In [103]:
spotify_reviews_clean = spotify_reviews_almost_clean.drop([972,973])
spotify_reviews_clean.head(975)

Unnamed: 0,artist_name,album_name,spotify_id,tempo,loudness,instrumentalness,acousticness,speechiness,liveness,danceability,...,loudness_standard_dev,instrumentalness_standard_dev,acousticness_standard_dev,speechiness_standard_dev,liveness_standard_dev,danceability_standard_dev,energy_standard_dev,valence_standard_dev,review_date,score
0,"""Weird Al"" Yankovic",Mandatory Fun,36jlZKG1sNZQA2HbWdYveV,134.414417,-9.953667,0.000000,0.083333,0.100267,0.000000,0.587167,...,2.147683,0.000000,0.288675,0.109117,0.000000,0.178058,0.179098,0.226669,2014-07-23,7.0
1,(Sandy) Alex G,Beach Music,0aABjw7BY2iRsK4ZdkwSjF,112.474923,-11.217231,0.153846,0.384615,0.032885,0.000000,0.504162,...,1.865558,0.375534,0.506370,0.004761,0.000000,0.163336,0.128091,0.261598,2015-10-14,6.0
2,(Sandy) Alex G,Rocket,5Pq92omNLyQgGGrj2u4pur,128.357714,-7.042714,0.285714,0.214286,0.049843,0.000000,0.461143,...,2.149563,0.468807,0.425815,0.072109,0.000000,0.152455,0.197175,0.180167,2017-05-24,5.0
3,2 Chainz,Based On A T.R.U. Story,5fL5QvYTODefwIrQVAHa0f,135.056692,-5.171385,0.000000,0.076923,0.153346,0.000000,0.764846,...,1.466928,0.000000,0.277350,0.083065,0.000000,0.114408,0.129510,0.256612,2012-08-16,4.0
4,21 Savage,Issa Album,1GeQZeg77rUX8s0pEiz8jS,129.126643,-10.567214,0.142857,0.071429,0.343500,0.000000,0.833143,...,1.623469,0.363137,0.267261,0.092056,0.000000,0.065064,0.110153,0.145041,2017-07-10,3.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
970,Pissgrave,Suicide Euphoria,665dNixFI8wcSx4q0BD8e2,131.595000,-4.660444,1.000000,0.000000,0.094667,0.000000,0.160111,...,0.245466,0.000000,0.000000,0.014446,0.000000,0.037880,0.042008,0.002957,2015-08-04,7.0
971,Pixies,Indie Cindy,1DDCTTXogJbszTFZgGhwGj,104.222417,-3.859250,0.083333,0.000000,0.049267,0.000000,0.475583,...,0.864906,0.288675,0.000000,0.017007,0.000000,0.042213,0.076494,0.163274,2014-04-24,4.0
974,Planningtorock,W,5LpF1uV3RpZKPA3ZwgK8hI,125.536167,-8.050167,0.083333,0.416667,0.047833,0.000000,0.614250,...,2.370768,0.288675,0.514929,0.014169,0.000000,0.181507,0.248036,0.171381,2011-06-04,2.0
975,Playboi Carti,Die Lit,7dAm8ShwJLFm9SaJ6Yc58O,135.603000,-6.145421,0.000000,0.000000,0.176179,0.052632,0.759474,...,1.378389,0.000000,0.000000,0.093991,0.229416,0.119263,0.097629,0.151734,2018-05-15,7.0
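Dropping by hard-coded index labels works for this one case, but the labels would shift if the input data changed. A sketch of an alternative that removes every copy of any duplicated ID, again on made-up data:

```python
import pandas as pd

# Made-up merged data: ID 'y' appears twice.
toy = pd.DataFrame({
    'spotify_id': ['x', 'y', 'y', 'z'],
    'score': [7.0, 10.0, 0.0, 4.0],
})

# keep=False discards all rows that share a duplicated spotify_id.
cleaned = toy.drop_duplicates(subset='spotify_id', keep=False)

print(cleaned)
```

This removes both copies regardless of whether their scores conflict, which is the same outcome as above only because the single duplicate found here did conflict.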


# Step 3 - Save the combined & cleaned data to a new CSV

In [106]:
spotify_reviews_clean.to_csv('fantano_spotify.csv')
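By default `to_csv` also writes the row index, which comes back as an 'Unnamed: 0' column when the file is read again. If that column isn't wanted in later notebooks, `index=False` avoids it. The sketch below demonstrates the difference with an in-memory buffer and made-up data rather than a real file.

```python
import io
import pandas as pd

# Made-up cleaned data.
toy = pd.DataFrame({'spotify_id': ['x', 'z'], 'score': [7.0, 4.0]})

# Default: the index is written as an unnamed first column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False: only the data columns are written.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_without_index = pd.read_csv(without_index)

print(reloaded_with_index.columns.tolist())
print(reloaded_without_index.columns.tolist())
```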