# Fantano Review Notebook 1 - Exploratory Data Analysis

*In this project*, I will analyze and visualize data on albums rated by "The Internet's Busiest Music Nerd," Anthony Fantano. Fantano's YouTube channel "The Needle Drop" has 2.12 million subscribers (as of April 2020), and he has rated over 1,700 albums. After analyzing and visualizing the data, I will build a neural network to predict future album ratings.

*In this first notebook*, I will combine two CSV files (one with Anthony Fantano's ratings information from Kaggle and one with Spotify album information created using Spotify's API) to explore data of albums that Anthony Fantano has rated. I will combine the CSVs, clean the data, and organize the final CSV in a way that is conducive to visualization and modeling.

# Step 1 - Pull & aggregate relevant data from Spotify CSV

In [1]:
import pandas as pd

We start by importing the Spotify dataset and inspecting the last few rows. This dataset was made using Spotify's API; a description of each column can be found here: https://developer.spotify.com/documentation/web-api/reference/tracks/get-several-audio-features/

In [2]:
spotify = pd.read_csv('updated_spotify_data.csv')

spotify.tail()

Unnamed: 0,spotify_id,album_name,artist_name,track_name,duration_ms,key,mode,tempo,time_signature,popularity,loudness,instrumentalness,acousticness,speechiness,liveness,danceability,energy,valence
19548,5cOsXBUTKytilc5odrPtpk,Communion,Park Jiha,Communion,412267,0,0,103.678,3,20,-12.379,0.0895,0.97,0.037,0.0873,0.283,0.111,0.0933
19549,5cOsXBUTKytilc5odrPtpk,Communion,Park Jiha,Sounds Heard from the Moon,545547,10,0,128.073,4,20,-12.913,0.109,0.925,0.0304,0.0945,0.319,0.557,0.0997
19550,5cOsXBUTKytilc5odrPtpk,Communion,Park Jiha,The Longing of the Yawning Divide,195107,0,0,78.338,4,20,-14.549,0.585,0.981,0.0408,0.206,0.136,0.126,0.0398
19551,5cOsXBUTKytilc5odrPtpk,Communion,Park Jiha,All Souls' Day,543413,3,1,113.691,4,20,-10.821,0.69,0.886,0.0324,0.107,0.327,0.201,0.0362
19552,5cOsXBUTKytilc5odrPtpk,Communion,Park Jiha,The First Time I Sat Across from You,508893,10,0,127.143,4,20,-9.826,0.238,0.941,0.0357,0.111,0.29,0.346,0.0388


The following columns are confidence scores between 0 and 1 (not statistical confidence intervals) indicating how confident Spotify is that a track is acoustic, instrumental, or live:
* acousticness - 0 is not acoustic, 1 is acoustic
* instrumentalness - 0 is not instrumental, 1 is instrumental
* liveness - 0 is not a recording of a live performance, 1 is a recording of a live performance

Because these values are meant to represent binary properties, I will transform each of them to either 0 or 1. I will use the following thresholds, based on information from Spotify's API description:
* acousticness: values > 0.5 are acoustic (1)
* instrumentalness: values > 0.5 are instrumental (1)
* liveness: values > 0.8 are live (1)

In [3]:
spotify['acousticness'] = spotify['acousticness'].apply(lambda x: 1 if x > 0.5 else 0)
spotify['instrumentalness'] = spotify['instrumentalness'].apply(lambda x: 1 if x > 0.5 else 0)
spotify['liveness'] = spotify['liveness'].apply(lambda x: 1 if x > 0.8 else 0)

spotify.tail()

Unnamed: 0,spotify_id,album_name,artist_name,track_name,duration_ms,key,mode,tempo,time_signature,popularity,loudness,instrumentalness,acousticness,speechiness,liveness,danceability,energy,valence
19548,5cOsXBUTKytilc5odrPtpk,Communion,Park Jiha,Communion,412267,0,0,103.678,3,20,-12.379,0,1,0.037,0,0.283,0.111,0.0933
19549,5cOsXBUTKytilc5odrPtpk,Communion,Park Jiha,Sounds Heard from the Moon,545547,10,0,128.073,4,20,-12.913,0,1,0.0304,0,0.319,0.557,0.0997
19550,5cOsXBUTKytilc5odrPtpk,Communion,Park Jiha,The Longing of the Yawning Divide,195107,0,0,78.338,4,20,-14.549,1,1,0.0408,0,0.136,0.126,0.0398
19551,5cOsXBUTKytilc5odrPtpk,Communion,Park Jiha,All Souls' Day,543413,3,1,113.691,4,20,-10.821,1,1,0.0324,0,0.327,0.201,0.0362
19552,5cOsXBUTKytilc5odrPtpk,Communion,Park Jiha,The First Time I Sat Across from You,508893,10,0,127.143,4,20,-9.826,0,1,0.0357,0,0.29,0.346,0.0388
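
The same thresholding can also be written as a vectorized comparison instead of a per-row `lambda`. The sketch below is only an illustration: the `tracks` frame and its values are made up, and the `thresholds` dictionary is a hypothetical helper that restates the cutoffs listed above.

```python
import pandas as pd

# Toy data standing in for the Spotify track table (values are made up).
tracks = pd.DataFrame({
    "acousticness": [0.97, 0.12, 0.50],
    "instrumentalness": [0.09, 0.69, 0.51],
    "liveness": [0.09, 0.85, 0.80],
})

# Hypothetical helper: the cutoffs from the list above, keyed by column name.
thresholds = {"acousticness": 0.5, "instrumentalness": 0.5, "liveness": 0.8}

binary = tracks.copy()
for column, cutoff in thresholds.items():
    # Strictly greater than the cutoff becomes 1; everything else becomes 0.
    binary[column] = (tracks[column] > cutoff).astype(int)

print(binary)
```

Note that a value exactly equal to the cutoff (0.50 for acousticness, 0.80 for liveness) maps to 0, which matches the `x > cutoff` comparison used in the cell above.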


Next, I want to group the data by album, since Fantano reviews albums rather than individual songs. For most columns, I will take the mean across all of the songs on the album. This gives an overview of the values for each column *by album*.

The following columns, however, I will aggregate by mode instead of mean, as these columns represent discrete (non-continuous) values:
* key
* mode
* time_signature

In [4]:
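# Select the columns with a list (double brackets); recent pandas versions
# raise an error when a groupby is indexed with a bare tuple of column names.
means = spotify.groupby(['artist_name', 'album_name', 'spotify_id'])[['popularity','duration_ms','tempo','loudness','instrumentalness',
                                                                     'acousticness','speechiness','liveness',
                                                                     'danceability','energy','valence']].mean().reset_index()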
means.head()

Unnamed: 0,artist_name,album_name,spotify_id,popularity,duration_ms,tempo,loudness,instrumentalness,acousticness,speechiness,liveness,danceability,energy,valence
0,"""Weird Al"" Yankovic",Mandatory Fun,36jlZKG1sNZQA2HbWdYveV,51.0,227201.083333,134.414417,-9.953667,0.0,0.083333,0.100267,0.0,0.587167,0.656083,0.690917
1,(Sandy) Alex G,Beach Music,0aABjw7BY2iRsK4ZdkwSjF,48.0,171639.076923,112.474923,-11.217231,0.153846,0.384615,0.032885,0.0,0.504162,0.436923,0.363815
2,(Sandy) Alex G,Rocket,5Pq92omNLyQgGGrj2u4pur,50.0,179333.642857,128.357714,-7.042714,0.285714,0.214286,0.049843,0.0,0.461143,0.588929,0.462714
3,2 Chainz,Based On A T.R.U. Story,5fL5QvYTODefwIrQVAHa0f,36.0,250958.923077,135.056692,-5.171385,0.0,0.076923,0.153346,0.0,0.764846,0.683846,0.438069
4,2 Chainz,Pretty Girls Like Trap Music,5vvvo79z68vWj9yimoygfS,70.0,232039.4375,123.552188,-4.7555,0.0,0.0625,0.248125,0.0,0.764188,0.66325,0.31975


In [5]:
# value_counts() sorts by frequency, so index[0] is the most common value in each album
modes = spotify.groupby(['artist_name', 'album_name', 'spotify_id'])[['key','mode','time_signature']].agg(lambda x: x.value_counts().index[0]).reset_index()
modes.head()

Unnamed: 0,artist_name,album_name,spotify_id,key,mode,time_signature
0,"""Weird Al"" Yankovic",Mandatory Fun,36jlZKG1sNZQA2HbWdYveV,9,1,4
1,(Sandy) Alex G,Beach Music,0aABjw7BY2iRsK4ZdkwSjF,2,1,4
2,(Sandy) Alex G,Rocket,5Pq92omNLyQgGGrj2u4pur,9,1,4
3,2 Chainz,Based On A T.R.U. Story,5fL5QvYTODefwIrQVAHa0f,6,1,4
4,2 Chainz,Pretty Girls Like Trap Music,5vvvo79z68vWj9yimoygfS,1,1,4
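
One detail worth keeping in mind with a per-album mode is what happens when two values are tied. The sketch below uses made-up data and a hypothetical helper, `most_common`, built on `Series.mode()`, which returns every tied value in sorted order; taking the first one makes the tie-break explicit (the smallest value wins). This is an alternative to the `value_counts().index[0]` approach above, not a description of how it breaks ties.

```python
import pandas as pd

# Toy track table (values are made up): album "B" has a two-way tie on key.
tracks = pd.DataFrame({
    "album_name": ["A", "A", "A", "B", "B"],
    "key": [9, 9, 2, 5, 1],
})

def most_common(values: pd.Series):
    # Series.mode() returns all tied modes sorted ascending; keep the smallest.
    return values.mode().iloc[0]

album_keys = tracks.groupby("album_name")["key"].agg(most_common)
print(album_keys)
```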


Now that I have found the relevant means and modes, I want to combine the tables back into one.

In [6]:
combined_mean_mode = pd.merge(means,
                         modes,
                         on=['artist_name', 'album_name', 'spotify_id'])
combined_mean_mode.head()

Unnamed: 0,artist_name,album_name,spotify_id,popularity,duration_ms,tempo,loudness,instrumentalness,acousticness,speechiness,liveness,danceability,energy,valence,key,mode,time_signature
0,"""Weird Al"" Yankovic",Mandatory Fun,36jlZKG1sNZQA2HbWdYveV,51.0,227201.083333,134.414417,-9.953667,0.0,0.083333,0.100267,0.0,0.587167,0.656083,0.690917,9,1,4
1,(Sandy) Alex G,Beach Music,0aABjw7BY2iRsK4ZdkwSjF,48.0,171639.076923,112.474923,-11.217231,0.153846,0.384615,0.032885,0.0,0.504162,0.436923,0.363815,2,1,4
2,(Sandy) Alex G,Rocket,5Pq92omNLyQgGGrj2u4pur,50.0,179333.642857,128.357714,-7.042714,0.285714,0.214286,0.049843,0.0,0.461143,0.588929,0.462714,9,1,4
3,2 Chainz,Based On A T.R.U. Story,5fL5QvYTODefwIrQVAHa0f,36.0,250958.923077,135.056692,-5.171385,0.0,0.076923,0.153346,0.0,0.764846,0.683846,0.438069,6,1,4
4,2 Chainz,Pretty Girls Like Trap Music,5vvvo79z68vWj9yimoygfS,70.0,232039.4375,123.552188,-4.7555,0.0,0.0625,0.248125,0.0,0.764188,0.66325,0.31975,1,1,4


The mean of each song feature does not capture how much variety there is within an album. Because I believe an album's variety of sounds may be relevant to its rating, I will also include a measure of spread: the standard deviation.

Now I will find the standard deviation for the columns with mean values (all numeric columns besides popularity, key, mode, and time_signature).

In [7]:
# As above, the column selection must be a list (double brackets) in recent pandas versions
standard_deviations = spotify.groupby(['artist_name', 'album_name', 'spotify_id'])[['duration_ms','tempo','loudness','instrumentalness',
                                                                                    'acousticness','speechiness','liveness',
                                                                                    'danceability','energy','valence']].std().reset_index()
standard_deviations.head()

Unnamed: 0,artist_name,album_name,spotify_id,duration_ms,tempo,loudness,instrumentalness,acousticness,speechiness,liveness,danceability,energy,valence
0,"""Weird Al"" Yankovic",Mandatory Fun,36jlZKG1sNZQA2HbWdYveV,108027.525637,31.235458,2.147683,0.0,0.288675,0.109117,0.0,0.178058,0.179098,0.226669
1,(Sandy) Alex G,Beach Music,0aABjw7BY2iRsK4ZdkwSjF,74976.548237,26.156683,1.865558,0.375534,0.50637,0.004761,0.0,0.163336,0.128091,0.261598
2,(Sandy) Alex G,Rocket,5Pq92omNLyQgGGrj2u4pur,51685.567847,32.653342,2.149563,0.468807,0.425815,0.072109,0.0,0.152455,0.197175,0.180167
3,2 Chainz,Based On A T.R.U. Story,5fL5QvYTODefwIrQVAHa0f,33730.362061,14.662922,1.466928,0.0,0.27735,0.083065,0.0,0.114408,0.12951,0.256612
4,2 Chainz,Pretty Girls Like Trap Music,5vvvo79z68vWj9yimoygfS,39086.343954,28.173881,1.728804,0.0,0.25,0.130747,0.0,0.1123,0.11651,0.14558
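
A caveat about the grouped `.std()` call: pandas computes the sample standard deviation by default (`ddof=1`), which is undefined for a group with a single row. The sketch below uses made-up data to show the difference; whether any single-track albums actually exist in the Spotify dataset is not checked here.

```python
import pandas as pd

# Toy track table (values are made up): album "B" has only one track.
tracks = pd.DataFrame({
    "album_name": ["A", "A", "A", "B"],
    "tempo": [100.0, 110.0, 120.0, 90.0],
})

grouped = tracks.groupby("album_name")["tempo"]

sample_std = grouped.std()            # default ddof=1: NaN for a one-track album
population_std = grouped.std(ddof=0)  # ddof=0: 0.0 for a one-track album

print(sample_std)
print(population_std)
```

If such NaN values did appear, they would have to be filled or dropped before modeling; which choice is appropriate depends on how single-track releases should be treated.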


Now I want to combine these standard deviations back into the rest of the dataset.

In [8]:
combined_spotify = pd.merge(combined_mean_mode,
                            standard_deviations,
                            on=['artist_name', 'album_name', 'spotify_id'],
                            suffixes=['', '_standard_dev'])
combined_spotify.columns.values

array(['artist_name', 'album_name', 'spotify_id', 'popularity',
       'duration_ms', 'tempo', 'loudness', 'instrumentalness',
       'acousticness', 'speechiness', 'liveness', 'danceability',
       'energy', 'valence', 'key', 'mode', 'time_signature',
       'duration_ms_standard_dev', 'tempo_standard_dev',
       'loudness_standard_dev', 'instrumentalness_standard_dev',
       'acousticness_standard_dev', 'speechiness_standard_dev',
       'liveness_standard_dev', 'danceability_standard_dev',
       'energy_standard_dev', 'valence_standard_dev'], dtype=object)
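
Since both tables are expected to hold exactly one row per album, the merge can optionally be asked to verify that. The sketch below uses small made-up frames; `validate="one_to_one"` is a standard `pd.merge` parameter that raises `pandas.errors.MergeError` when a key is repeated on either side.

```python
import pandas as pd

# Toy per-album tables (values are made up).
means = pd.DataFrame({"spotify_id": ["a", "b"], "tempo": [120.0, 95.0]})
stds = pd.DataFrame({"spotify_id": ["a", "b"], "tempo": [10.0, 4.0]})

# The empty left suffix keeps the mean column names unchanged.
combined = pd.merge(means, stds, on="spotify_id",
                    suffixes=("", "_standard_dev"), validate="one_to_one")
print(combined.columns.tolist())

# Repeat one key on the right side to show the check firing.
stds_with_dup = pd.concat([stds, stds.iloc[[0]]], ignore_index=True)
try:
    pd.merge(means, stds_with_dup, on="spotify_id", validate="one_to_one")
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
print(duplicate_detected)
```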

# Step 2 - Introduce Reviews CSV, combine with Spotify CSV, and clean the data

I will import the reviews csv and inspect the last few rows.

In [9]:
reviews = pd.read_csv('fantano_reviews.csv')
reviews.tail()

Unnamed: 0,id,title,artist,spotify_id,review_date,review_type,score,word_score,best_tracks,worst_track,link
1729,1729,Tell Me How You Really Feel,Courtney Barnett,3l7JWewI3ZByxaT5BCgRx2,2018-05-22,Album,6.0,,['NEED A LITTLE TIME ; CITY LOOKS PRETTY ; NAM...,WALKIN ' ON EGGSHELLS,https://www.youtube.com/watch?v=GkeHYp7MASY
1730,1730,Wide Awake!,Parquet Courts,5uTI2HcpAywDP8Vo1DpJta,2018-05-23,Album,9.0,,"['VIOLENCE ', 'BEFORE THE WATER GETS TOO HIGH ...",BACK TO EARTH,https://www.youtube.com/watch?v=4ZZREmYnygU
1731,1731,Mark Kozelek,Mark Kozelek,2srShhDYRXZ6tYk0I61Pkw,2018-05-25,Album,7.0,,"['THIS IS MY TOWN ', 'LIVE IN CHICAGO ', 'THE ...",YOUNG RIDDICK BOWE,https://www.youtube.com/watch?v=HMIUSLOR350
1732,1732,DAYTONA,Pusha T,07bIdDDe3I3hhWpxU6tuBp,2018-05-28,Album,8.0,,"['IF YOU KNOW YOU KNOW ', 'THE GAMES WE PLAY '...",HARD PIANO,https://www.youtube.com/watch?v=z605Rm7lFTM
1733,1733,Communion,Park Jiha,5cOsXBUTKytilc5odrPtpk,2018-05-29,Album,6.0,,"['THROUGHOUT THE NIGHT ', 'COMMUNION ', ""ALL S...",ACCUMULATION OF TIME,https://www.youtube.com/watch?v=icib8b4GYlI


Before combining the CSVs, I will drop some information from the reviews dataset. Much of this information is not relevant to our eventual goal of predicting album ratings. I will drop the following columns:
* id
* review_type
* word_score
* best_tracks
* worst_track
* link
* review_date

Really all I *need* to keep for modeling are the columns 'spotify_id' and 'score.' However, I want to keep 'title' and 'artist' to quickly verify that the two CSVs do in fact merge correctly.

In [10]:
reviews_condensed = reviews.drop(columns=['id','review_type','word_score','best_tracks','worst_track','link','review_date'])
reviews_condensed.tail(30)

Unnamed: 0,title,artist,spotify_id,score
1704,Golden Hour,Kacey Musgraves,7f6xPqyaolTiziKf5R5Z0c,4.0
1705,Invasion of Privacy,Cardi B,4KdtEKjY3Gi0mKiSdy96ML,6.0
1706,Vacation in Hell,Flatbush Zombies,622gmiTAUET3WtAZq8aXtS,6.0
1707,Isolation,Kali Uchis,4EPQtdq6vvwxuYeQTrwDVY,8.0
1708,Your Queen Is a Reptile,Sons of Kemet,4pxnDGBdoGu88h8ZInX5f5,8.0
1709,The Tree of Forgiveness,John Prine,13UwfQZqne7ZQIkUZsAPLg,7.0
1710,CARE FOR ME,Saba,6Te111t5gDZ7W94myHRqUt,
1711,Sex & Food,Unknown Mortal Orchestra,7c2Xfq7aQKzs0KdSI3K7Rc,5.0
1712,KOD,J. Cole,4Wv5UAieM1LDEYVq5WmqDd,5.0
1713,"Bark Your Head Off, Dog",Hop Along,7taAuoHvZ4LJzEaB0OzuP3,8.0


Next, I want to drop any rows where the score is 'NaN'. This indicates that the reviews dataset didn't actually have a numerical review for this album. 

In [11]:
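# if this works properly, rows with indices 1710 and 1715 should be dropped, but all others in .tail(30) should remain
# subset=['score'] restricts the check to the score column, so rows are only dropped for a missing score
reviews_clean = reviews_condensed.dropna(subset=['score'])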
reviews_clean.tail(30)


Unnamed: 0,title,artist,spotify_id,score
1702,Czarface Meets Metal Face,CZARFACE & MF DOOM,2PPvDD3t985MvMphfSwzgr,7.0
1703,Everything's Fine,Jean Grae x Quelle Chris,22oHrB4SLwayyuLE02t5BD,8.0
1704,Golden Hour,Kacey Musgraves,7f6xPqyaolTiziKf5R5Z0c,4.0
1705,Invasion of Privacy,Cardi B,4KdtEKjY3Gi0mKiSdy96ML,6.0
1706,Vacation in Hell,Flatbush Zombies,622gmiTAUET3WtAZq8aXtS,6.0
1707,Isolation,Kali Uchis,4EPQtdq6vvwxuYeQTrwDVY,8.0
1708,Your Queen Is a Reptile,Sons of Kemet,4pxnDGBdoGu88h8ZInX5f5,8.0
1709,The Tree of Forgiveness,John Prine,13UwfQZqne7ZQIkUZsAPLg,7.0
1711,Sex & Food,Unknown Mortal Orchestra,7c2Xfq7aQKzs0KdSI3K7Rc,5.0
1712,KOD,J. Cole,4Wv5UAieM1LDEYVq5WmqDd,5.0


We're ready to combine the CSVs! I'll be using an inner merge because I only want to keep rows that are in the Spotify dataset AND the reviews dataset. Unfortunately, not all of the albums that Fantano reviewed were available on Spotify. I will be discarding those rows from the combined dataset, since we cannot do useful visualization or modeling without the Spotify data.

In [12]:
spotify_reviews = pd.merge(combined_spotify,
                           reviews_clean,
                           how='inner',
                           on='spotify_id')
spotify_reviews.head()

Unnamed: 0,artist_name,album_name,spotify_id,popularity,duration_ms,tempo,loudness,instrumentalness,acousticness,speechiness,...,instrumentalness_standard_dev,acousticness_standard_dev,speechiness_standard_dev,liveness_standard_dev,danceability_standard_dev,energy_standard_dev,valence_standard_dev,title,artist,score
0,"""Weird Al"" Yankovic",Mandatory Fun,36jlZKG1sNZQA2HbWdYveV,51.0,227201.083333,134.414417,-9.953667,0.0,0.083333,0.100267,...,0.0,0.288675,0.109117,0.0,0.178058,0.179098,0.226669,Mandatory Fun,"""Weird Al"" Yankovic",7.0
1,(Sandy) Alex G,Beach Music,0aABjw7BY2iRsK4ZdkwSjF,48.0,171639.076923,112.474923,-11.217231,0.153846,0.384615,0.032885,...,0.375534,0.50637,0.004761,0.0,0.163336,0.128091,0.261598,Beach Music,Alex G,6.0
2,(Sandy) Alex G,Rocket,5Pq92omNLyQgGGrj2u4pur,50.0,179333.642857,128.357714,-7.042714,0.285714,0.214286,0.049843,...,0.468807,0.425815,0.072109,0.0,0.152455,0.197175,0.180167,Rocket,(Sandy) Alex G,5.0
3,2 Chainz,Based On A T.R.U. Story,5fL5QvYTODefwIrQVAHa0f,36.0,250958.923077,135.056692,-5.171385,0.0,0.076923,0.153346,...,0.0,0.27735,0.083065,0.0,0.114408,0.12951,0.256612,Based on a T.R.U. Story,2 Chainz,4.0
4,21 Savage,Issa Album,1GeQZeg77rUX8s0pEiz8jS,39.0,242553.5,129.126643,-10.567214,0.142857,0.071429,0.3435,...,0.363137,0.267261,0.092056,0.0,0.065064,0.110153,0.145041,Issa Album,21 Savage,3.0
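
To see how many albums an inner merge discards, one option is to run an outer merge with `indicator=True` as an audit step. The sketch below uses made-up frames rather than the real datasets, so the counts are illustrative only.

```python
import pandas as pd

# Toy stand-ins for the album features and the reviews (values are made up).
albums = pd.DataFrame({"spotify_id": ["a", "b", "c"], "tempo": [120.0, 95.0, 140.0]})
scores = pd.DataFrame({"spotify_id": ["b", "c", "d"], "score": [7.0, 4.0, 9.0]})

# Inner merge: keeps only ids present in both frames.
inner = pd.merge(albums, scores, how="inner", on="spotify_id")

# Outer merge with an indicator column: labels each row as
# 'both', 'left_only', or 'right_only'.
audit = pd.merge(albums, scores, how="outer", on="spotify_id", indicator=True)
counts = audit["_merge"].value_counts()

print(len(inner))
print(counts)
```

Here `right_only` counts reviews with no matching Spotify data, which corresponds to the albums dropped from the combined dataset.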


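It looks like it worked! The titles and artists line up across both datasets, apart from minor differences in capitalization and artist naming (for example, 'Alex G' versus '(Sandy) Alex G'). Now I can drop the columns 'title' and 'artist'.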

In [13]:
spotify_reviews_almost_clean = spotify_reviews.drop(columns=['title','artist'])
spotify_reviews_almost_clean.head()

Unnamed: 0,artist_name,album_name,spotify_id,popularity,duration_ms,tempo,loudness,instrumentalness,acousticness,speechiness,...,tempo_standard_dev,loudness_standard_dev,instrumentalness_standard_dev,acousticness_standard_dev,speechiness_standard_dev,liveness_standard_dev,danceability_standard_dev,energy_standard_dev,valence_standard_dev,score
0,"""Weird Al"" Yankovic",Mandatory Fun,36jlZKG1sNZQA2HbWdYveV,51.0,227201.083333,134.414417,-9.953667,0.0,0.083333,0.100267,...,31.235458,2.147683,0.0,0.288675,0.109117,0.0,0.178058,0.179098,0.226669,7.0
1,(Sandy) Alex G,Beach Music,0aABjw7BY2iRsK4ZdkwSjF,48.0,171639.076923,112.474923,-11.217231,0.153846,0.384615,0.032885,...,26.156683,1.865558,0.375534,0.50637,0.004761,0.0,0.163336,0.128091,0.261598,6.0
2,(Sandy) Alex G,Rocket,5Pq92omNLyQgGGrj2u4pur,50.0,179333.642857,128.357714,-7.042714,0.285714,0.214286,0.049843,...,32.653342,2.149563,0.468807,0.425815,0.072109,0.0,0.152455,0.197175,0.180167,5.0
3,2 Chainz,Based On A T.R.U. Story,5fL5QvYTODefwIrQVAHa0f,36.0,250958.923077,135.056692,-5.171385,0.0,0.076923,0.153346,...,14.662922,1.466928,0.0,0.27735,0.083065,0.0,0.114408,0.12951,0.256612,4.0
4,21 Savage,Issa Album,1GeQZeg77rUX8s0pEiz8jS,39.0,242553.5,129.126643,-10.567214,0.142857,0.071429,0.3435,...,33.867507,1.623469,0.363137,0.267261,0.092056,0.0,0.065064,0.110153,0.145041,3.0


Before we save this CSV, I want to check one last thing: duplicates.

In [14]:
duplicated = spotify_reviews_almost_clean[spotify_reviews_almost_clean.duplicated('spotify_id')]
duplicated

Unnamed: 0,artist_name,album_name,spotify_id,popularity,duration_ms,tempo,loudness,instrumentalness,acousticness,speechiness,...,tempo_standard_dev,loudness_standard_dev,instrumentalness_standard_dev,acousticness_standard_dev,speechiness_standard_dev,liveness_standard_dev,danceability_standard_dev,energy_standard_dev,valence_standard_dev,score
973,Planningtorock,All Love's Legal,0Cc7BlKV0vZnGPx0fAFqY7,30.0,212528.916667,111.1855,-10.295667,0.166667,0.333333,0.064167,...,17.640547,2.479021,0.380693,0.481543,0.041193,0.0,0.172431,0.163528,0.20895,0.0


I found a duplicate on the column 'spotify_id'. To get a clearer picture of what's going on, I want to see both rows with this spotify id.

In [15]:
spotify_reviews_almost_clean[spotify_reviews_almost_clean['spotify_id'] == '0Cc7BlKV0vZnGPx0fAFqY7']

Unnamed: 0,artist_name,album_name,spotify_id,popularity,duration_ms,tempo,loudness,instrumentalness,acousticness,speechiness,...,tempo_standard_dev,loudness_standard_dev,instrumentalness_standard_dev,acousticness_standard_dev,speechiness_standard_dev,liveness_standard_dev,danceability_standard_dev,energy_standard_dev,valence_standard_dev,score
972,Planningtorock,All Love's Legal,0Cc7BlKV0vZnGPx0fAFqY7,30.0,212528.916667,111.1855,-10.295667,0.166667,0.333333,0.064167,...,17.640547,2.479021,0.380693,0.481543,0.041193,0.0,0.172431,0.163528,0.20895,10.0
973,Planningtorock,All Love's Legal,0Cc7BlKV0vZnGPx0fAFqY7,30.0,212528.916667,111.1855,-10.295667,0.166667,0.333333,0.064167,...,17.640547,2.479021,0.380693,0.481543,0.041193,0.0,0.172431,0.163528,0.20895,0.0
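
As a side note, `duplicated()` flags only the second and later occurrences by default, which is why the first check returned a single row. Passing `keep=False` flags every row that shares a key, and the flagged ids can then be removed by value rather than by row index. The sketch below uses made-up ids and scores.

```python
import pandas as pd

# Toy reviews table (values are made up): id "x2" appears twice.
reviews = pd.DataFrame({
    "spotify_id": ["x1", "x2", "x2", "x3"],
    "score": [7.0, 10.0, 0.0, 4.0],
})

# keep=False marks all copies of a duplicated id, not just the later ones.
all_copies = reviews[reviews.duplicated("spotify_id", keep=False)]
print(all_copies)

# Remove every row whose id was duplicated, without hard-coding row indices.
bad_ids = all_copies["spotify_id"].unique()
cleaned = reviews[~reviews["spotify_id"].isin(bad_ids)]
print(cleaned)
```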


This album has two review entries: one rated 10.0 and the other rated 0.0. Domain knowledge tells me that Fantano did this as an April Fools' joke and that neither is a genuine rating, so I will delete both entries as bad data.

In [16]:
spotify_reviews_clean = spotify_reviews_almost_clean.drop([972,973])
spotify_reviews_clean.head(975)

Unnamed: 0,artist_name,album_name,spotify_id,popularity,duration_ms,tempo,loudness,instrumentalness,acousticness,speechiness,...,tempo_standard_dev,loudness_standard_dev,instrumentalness_standard_dev,acousticness_standard_dev,speechiness_standard_dev,liveness_standard_dev,danceability_standard_dev,energy_standard_dev,valence_standard_dev,score
0,"""Weird Al"" Yankovic",Mandatory Fun,36jlZKG1sNZQA2HbWdYveV,51.0,227201.083333,134.414417,-9.953667,0.000000,0.083333,0.100267,...,31.235458,2.147683,0.000000,0.288675,0.109117,0.000000,0.178058,0.179098,0.226669,7.0
1,(Sandy) Alex G,Beach Music,0aABjw7BY2iRsK4ZdkwSjF,48.0,171639.076923,112.474923,-11.217231,0.153846,0.384615,0.032885,...,26.156683,1.865558,0.375534,0.506370,0.004761,0.000000,0.163336,0.128091,0.261598,6.0
2,(Sandy) Alex G,Rocket,5Pq92omNLyQgGGrj2u4pur,50.0,179333.642857,128.357714,-7.042714,0.285714,0.214286,0.049843,...,32.653342,2.149563,0.468807,0.425815,0.072109,0.000000,0.152455,0.197175,0.180167,5.0
3,2 Chainz,Based On A T.R.U. Story,5fL5QvYTODefwIrQVAHa0f,36.0,250958.923077,135.056692,-5.171385,0.000000,0.076923,0.153346,...,14.662922,1.466928,0.000000,0.277350,0.083065,0.000000,0.114408,0.129510,0.256612,4.0
4,21 Savage,Issa Album,1GeQZeg77rUX8s0pEiz8jS,39.0,242553.500000,129.126643,-10.567214,0.142857,0.071429,0.343500,...,33.867507,1.623469,0.363137,0.267261,0.092056,0.000000,0.065064,0.110153,0.145041,3.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
970,Pissgrave,Suicide Euphoria,665dNixFI8wcSx4q0BD8e2,18.0,211647.444444,131.595000,-4.660444,1.000000,0.000000,0.094667,...,18.858173,0.245466,0.000000,0.000000,0.014446,0.000000,0.037880,0.042008,0.002957,7.0
971,Pixies,Indie Cindy,1DDCTTXogJbszTFZgGhwGj,33.0,229529.000000,104.222417,-3.859250,0.083333,0.000000,0.049267,...,15.948134,0.864906,0.288675,0.000000,0.017007,0.000000,0.042213,0.076494,0.163274,4.0
974,Planningtorock,W,5LpF1uV3RpZKPA3ZwgK8hI,17.0,256116.833333,125.536167,-8.050167,0.083333,0.416667,0.047833,...,26.088822,2.370768,0.288675,0.514929,0.014169,0.000000,0.181507,0.248036,0.171381,2.0
975,Playboi Carti,Die Lit,7dAm8ShwJLFm9SaJ6Yc58O,80.0,182510.789474,135.603000,-6.145421,0.000000,0.000000,0.176179,...,28.172495,1.378389,0.000000,0.000000,0.093991,0.229416,0.119263,0.097629,0.151734,7.0


# Step 3 - Save the combined & cleaned data to a new CSV

In [17]:
spotify_reviews_clean.to_csv('fantano_spotify.csv')
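
One thing to be aware of when reading this file back: `to_csv` writes the row index by default, and `read_csv` then loads it as an extra column named `Unnamed: 0`. The sketch below demonstrates the round trip on a made-up frame, using an in-memory buffer instead of a file on disk.

```python
import io
import pandas as pd

# Toy frame (values are made up).
frame = pd.DataFrame({"spotify_id": ["a", "b"], "score": [7.0, 4.0]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
columns_with_index = pd.read_csv(with_index).columns.tolist()

# index=False leaves the index out of the file.
without_index = io.StringIO()
frame.to_csv(without_index, index=False)
without_index.seek(0)
columns_without_index = pd.read_csv(without_index).columns.tolist()

print(columns_with_index)
print(columns_without_index)
```

Either the file can be written with `index=False`, or the reader can pass `index_col=0` to `read_csv`; which is preferable depends on whether the later notebooks rely on the original row labels.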