# Spotify Data Cleaning and Analysis

### Genres:
- Pop
- Hip-hop
- Jazz
- Rock
- K-pop
- Instrumental
- ASMR

### Feature Definitions:

- Danceability - The higher the value, the easier it is to dance to the song. Danceability describes how suitable a track is for dancing based on a combination of musical elements, including tempo, rhythm stability, beat strength, and overall regularity.
- Energy - The higher the value, the more energetic the song. Energy represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.
- Acousticness - The higher the value, the more acoustic the song.
- Instrumentalness - The higher the value, the greater the likelihood that the track contains no vocal content.
- Valence - The higher the value, the more positive the mood of the song. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
- Speechiness - The higher the value, the more spoken words the song contains.
- Loudness (dB) - The higher the value, the louder the song. Loudness values are averaged across the entire track and are useful for comparing the relative loudness of tracks. Values typically range between -60 and 0 dB.
- Tempo (BPM) - The tempo of the song in beats per minute.
- Popularity - The higher the value, the more popular the song.
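
The definitions above imply value ranges that can be checked before any analysis. The sketch below assumes, following Spotify's audio-features documentation, that danceability, energy, acousticness, instrumentalness, valence, and speechiness lie between 0 and 1, and it reuses the -60 to 0 dB range quoted above for loudness. The helper name `out_of_range_counts` and the sample rows are hypothetical and are not part of the original notebook.

```python
import pandas as pd

# Assumed ranges: 0-1 for the bounded features; loudness "typically" -60 to 0 dB.
EXPECTED_RANGES = {
    "danceability": (0.0, 1.0),
    "energy": (0.0, 1.0),
    "acousticness": (0.0, 1.0),
    "instrumentalness": (0.0, 1.0),
    "valence": (0.0, 1.0),
    "speechiness": (0.0, 1.0),
    "loudness": (-60.0, 0.0),
}


def out_of_range_counts(df: pd.DataFrame) -> pd.Series:
    """Count, per feature, the rows whose value falls outside its expected range.

    Missing values are counted as out of range, because between() is False for NaN.
    """
    counts = {}
    for column, (low, high) in EXPECTED_RANGES.items():
        if column in df.columns:
            counts[column] = int((~df[column].between(low, high)).sum())
    return pd.Series(counts, dtype="int64")


# Hypothetical sample: the second row carries an impossible energy value.
sample = pd.DataFrame({
    "danceability": [0.514, 0.794],
    "energy": [0.73, 1.4],
    "loudness": [-5.934, -4.521],
})
report = out_of_range_counts(sample)
print(report)
```

Because loudness only "typically" stays within its range, the helper reports counts rather than raising an error, which leaves the decision about outliers to the analyst.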


In [1]:
import pandas as pd

import warnings
warnings.simplefilter('ignore', FutureWarning)

## Extract The Necessary Columns 

### Pop

In [2]:
# Read the Spotify CSV data for the pop genre
pop = pd.read_csv("spotify_data_pop_v2.csv")
pop.head()

Unnamed: 0,artist_name,track_name,popularity,track_id,track_uri,danceability,energy,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration,genre
0,The Weeknd,Blinding Lights,100,0VjIjW4GlUZAMYd2vXMi3b,spotify:track:0VjIjW4GlUZAMYd2vXMi3b,0.514,0.73,-5.934,0.0598,0.00146,9.5e-05,0.0897,0.334,171.005,200040,pop
1,Dua Lipa,Don't Start Now,97,6WrI0LAC5M1Rw2MnX2ZvEg,spotify:track:6WrI0LAC5M1Rw2MnX2ZvEg,0.794,0.793,-4.521,0.0842,0.0125,0.0,0.0952,0.677,123.941,183290,pop
2,Doja Cat,Say So,97,3Dv1eDb0MEgF93GpLXlucZ,spotify:track:3Dv1eDb0MEgF93GpLXlucZ,0.787,0.673,-4.577,0.158,0.256,4e-06,0.0904,0.786,110.962,237893,pop
3,Arizona Zervas,ROXANNE,95,696DnlkuDOXcMAnKlTgXXK,spotify:track:696DnlkuDOXcMAnKlTgXXK,0.621,0.601,-5.616,0.148,0.0522,0.0,0.46,0.457,116.735,163636,pop
4,BENEE,Supalonely,95,4nK5YrxbMGZstTLbvj6Gxw,spotify:track:4nK5YrxbMGZstTLbvj6Gxw,0.863,0.631,-4.689,0.0534,0.305,3e-05,0.123,0.817,128.977,223480,pop


In [3]:
# Select the feature columns; dropna() returns a new DataFrame,
# which avoids the SettingWithCopyWarning raised by dropna(inplace=True) on a slice
pop_clean = pop[[
    'genre', 'danceability', 'energy', 'acousticness', 'instrumentalness',
    'valence', 'speechiness', 'loudness', 'tempo'
]].dropna()
pop_clean.head()


Unnamed: 0,genre,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
0,pop,0.514,0.73,0.00146,9.5e-05,0.334,0.0598,-5.934,171.005
1,pop,0.794,0.793,0.0125,0.0,0.677,0.0842,-4.521,123.941
2,pop,0.787,0.673,0.256,4e-06,0.786,0.158,-4.577,110.962
3,pop,0.621,0.601,0.0522,0.0,0.457,0.148,-5.616,116.735
4,pop,0.863,0.631,0.305,3e-05,0.817,0.0534,-4.689,128.977


### Hip-hop

In [4]:
# Read the Spotify CSV data for the hip-hop genre
hiphop = pd.read_csv("spotify_data_hiphop_v2.csv")
hiphop.head()

Unnamed: 0,artist_name,track_name,popularity,track_id,track_uri,danceability,energy,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration,genre
0,Drake,Toosie Slide,99,127QTOFJsJQp5LbJbu3A1y,spotify:track:127QTOFJsJQp5LbJbu3A1y,0.834,0.454,-9.75,0.201,0.321,6e-06,0.114,0.837,81.618,247059,hiphop
1,Jack Harlow,WHATS POPPIN,93,1jaTQ3nqY3oAAYyCTbIvnM,spotify:track:1jaTQ3nqY3oAAYyCTbIvnM,0.923,0.604,-6.671,0.245,0.017,0.0,0.272,0.826,145.062,139741,hiphop
2,Future,Life Is Good (feat. Drake),94,5yY9lUy8nbvjM1Uyo1Uqoc,spotify:track:5yY9lUy8nbvjM1Uyo1Uqoc,0.676,0.609,-5.831,0.481,0.0706,0.0,0.152,0.508,142.037,237735,hiphop
3,DaBaby,ROCKSTAR (feat. Roddy Ricch),91,7ytR5pFWmSjzHJIeQkgog4,spotify:track:7ytR5pFWmSjzHJIeQkgog4,0.746,0.69,-7.956,0.164,0.247,0.0,0.101,0.497,89.977,181733,hiphop
4,NLE Choppa,Walk Em Down (feat. Roddy Ricch),90,4cSSL3YafYjM3yjgFO1vJg,spotify:track:4cSSL3YafYjM3yjgFO1vJg,0.867,0.744,-5.171,0.228,0.268,0.0,0.0713,0.645,84.005,173288,hiphop


In [5]:
# Select the feature columns; dropna() returns a new DataFrame,
# which avoids the SettingWithCopyWarning raised by dropna(inplace=True) on a slice
hiphop_clean = hiphop[[
    'genre', 'danceability', 'energy', 'acousticness', 'instrumentalness',
    'valence', 'speechiness', 'loudness', 'tempo'
]].dropna()
hiphop_clean.head()


Unnamed: 0,genre,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
0,hiphop,0.834,0.454,0.321,6e-06,0.837,0.201,-9.75,81.618
1,hiphop,0.923,0.604,0.017,0.0,0.826,0.245,-6.671,145.062
2,hiphop,0.676,0.609,0.0706,0.0,0.508,0.481,-5.831,142.037
3,hiphop,0.746,0.69,0.247,0.0,0.497,0.164,-7.956,89.977
4,hiphop,0.867,0.744,0.268,0.0,0.645,0.228,-5.171,84.005


### Jazz

In [6]:
# Read the Spotify CSV data for the jazz genre
jazz = pd.read_csv("spotify_data_jazz_v2.csv")
jazz['genre'] = 'jazz'
jazz.head()

Unnamed: 0,artist_name,track_name,popularity,track_id,track_uri,danceability,energy,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration,genre
0,"Earth, Wind & Fire",September,82,7Cuk8jsPPoNYQWXK9XRFvG,spotify:track:7Cuk8jsPPoNYQWXK9XRFvG,0.694,0.831,-7.288,0.0301,0.165,0.000892,0.25,0.98,125.901,215080,jazz
1,Louis Armstrong,What A Wonderful World - Single Version,68,29U7stRjqHU6rMiS8BfaI9,spotify:track:29U7stRjqHU6rMiS8BfaI9,0.271,0.165,-20.652,0.0351,0.729,2e-06,0.118,0.203,77.082,139227,jazz
2,Leslie Odom Jr.,Alexander Hamilton,71,4TTV7EcfroSLWzXRY6gLv6,spotify:track:4TTV7EcfroSLWzXRY6gLv6,0.609,0.435,-7.862,0.284,0.524,0.0,0.118,0.563,131.998,236738,jazz
3,Etta James,At Last,75,4Hhv2vrOTy89HFRcjU3QOx,spotify:track:4Hhv2vrOTy89HFRcjU3QOx,0.273,0.347,-8.631,0.0292,0.546,0.0137,0.334,0.328,87.411,179693,jazz
4,"Grover Washington, Jr.",Just the Two of Us (feat. Bill Withers),71,1ko2lVN0vKGUl9zrU0qSlT,spotify:track:1ko2lVN0vKGUl9zrU0qSlT,0.803,0.488,-9.303,0.0803,0.576,0.0609,0.0763,0.624,95.771,237106,jazz


In [7]:
# Select the feature columns; dropna() returns a new DataFrame,
# which avoids the SettingWithCopyWarning raised by dropna(inplace=True) on a slice
jazz_clean = jazz[[
    'genre', 'danceability', 'energy', 'acousticness', 'instrumentalness',
    'valence', 'speechiness', 'loudness', 'tempo'
]].dropna()
jazz_clean.head()


Unnamed: 0,genre,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
0,jazz,0.694,0.831,0.165,0.000892,0.98,0.0301,-7.288,125.901
1,jazz,0.271,0.165,0.729,2e-06,0.203,0.0351,-20.652,77.082
2,jazz,0.609,0.435,0.524,0.0,0.563,0.284,-7.862,131.998
3,jazz,0.273,0.347,0.546,0.0137,0.328,0.0292,-8.631,87.411
4,jazz,0.803,0.488,0.576,0.0609,0.624,0.0803,-9.303,95.771


### Rock

In [8]:
# Read the Spotify CSV data for the rock genre
rock = pd.read_csv("spotify_data_rock_v2.csv")
rock['genre'] = 'rock'
rock.head()

Unnamed: 0,artist_name,track_name,popularity,track_id,track_uri,danceability,energy,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration,genre
0,Twenty One Pilots,Level of Concern,88,6xZ4Q2k2ompmDppyeESIY8,spotify:track:6xZ4Q2k2ompmDppyeESIY8,0.754,0.583,-7.34,0.0432,0.32,0.00015,0.144,0.77,122.012,220051,rock
1,Wallows,Are You Bored Yet? (feat. Clairo),81,57RA3JGafJm5zRtKJiKPIm,spotify:track:57RA3JGafJm5zRtKJiKPIm,0.682,0.683,-6.444,0.0287,0.156,2.3e-05,0.273,0.64,120.023,178000,rock
2,Grouplove,Tongue Tied,81,0GO8y8jQk1PkHzS31d699N,spotify:track:0GO8y8jQk1PkHzS31d699N,0.56,0.936,-5.835,0.0439,0.00847,0.0,0.161,0.371,112.96,218013,rock
3,Imagine Dragons,Believer,88,0pqnGHJpmpxLKifKRmU6WP,spotify:track:0pqnGHJpmpxLKifKRmU6WP,0.776,0.78,-4.374,0.128,0.0622,0.0,0.081,0.666,124.949,204347,rock
4,The Killers,Mr. Brightside,78,7oK9VyNzrYvRFo7nQEYkWN,spotify:track:7oK9VyNzrYvRFo7nQEYkWN,0.356,0.924,-3.74,0.0808,0.00101,0.0,0.0953,0.232,148.017,222587,rock


In [9]:
# Select the feature columns; dropna() returns a new DataFrame,
# which avoids the SettingWithCopyWarning raised by dropna(inplace=True) on a slice
rock_clean = rock[[
    'genre', 'danceability', 'energy', 'acousticness', 'instrumentalness',
    'valence', 'speechiness', 'loudness', 'tempo'
]].dropna()
rock_clean.head()


Unnamed: 0,genre,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
0,rock,0.754,0.583,0.32,0.00015,0.77,0.0432,-7.34,122.012
1,rock,0.682,0.683,0.156,2.3e-05,0.64,0.0287,-6.444,120.023
2,rock,0.56,0.936,0.00847,0.0,0.371,0.0439,-5.835,112.96
3,rock,0.776,0.78,0.0622,0.0,0.666,0.128,-4.374,124.949
4,rock,0.356,0.924,0.00101,0.0,0.232,0.0808,-3.74,148.017


### K-pop

In [10]:
# Read the Spotify CSV data for the K-pop genre
kpop = pd.read_csv("spotify_data_kpop_v2.csv")
kpop['genre'] = 'kpop'
kpop.head()

Unnamed: 0,artist_name,track_name,popularity,track_id,track_uri,danceability,energy,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration,genre
0,BTS,ON,85,2QyuXBcV1LJ2rq01KhreMF,spotify:track:2QyuXBcV1LJ2rq01KhreMF,0.583,0.817,-5.146,0.0987,0.118,0.0,0.338,0.438,105.936,246381,kpop
1,BTS,Boy With Luv (feat. Halsey),84,5KawlOMHjWeUjQtnuRs22c,spotify:track:5KawlOMHjWeUjQtnuRs22c,0.645,0.862,-4.757,0.0965,0.0923,0.0,0.192,0.798,119.991,229773,kpop
2,BTS,Filter,82,0ono6UCNVZ1XqOm6j78Blu,spotify:track:0ono6UCNVZ1XqOm6j78Blu,0.781,0.762,-5.188,0.0626,0.0222,0.0,0.121,0.86,110.042,180221,kpop
3,BTS,My Time,81,4vTgx6h4seHvkuFh84JXYP,spotify:track:4vTgx6h4seHvkuFh84JXYP,0.674,0.64,-5.139,0.0339,0.151,0.0,0.0925,0.664,99.908,234458,kpop
4,BTS,ON (Feat. Sia),81,3IB5qOeMayvpOdHxYCL5tZ,spotify:track:3IB5qOeMayvpOdHxYCL5tZ,0.591,0.848,-4.397,0.0828,0.137,0.0,0.372,0.386,105.922,246816,kpop


In [11]:
# Select the feature columns; dropna() returns a new DataFrame,
# which avoids the SettingWithCopyWarning raised by dropna(inplace=True) on a slice
kpop_clean = kpop[[
    'genre', 'danceability', 'energy', 'acousticness', 'instrumentalness',
    'valence', 'speechiness', 'loudness', 'tempo'
]].dropna()
kpop_clean.head()


Unnamed: 0,genre,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
0,kpop,0.583,0.817,0.118,0.0,0.438,0.0987,-5.146,105.936
1,kpop,0.645,0.862,0.0923,0.0,0.798,0.0965,-4.757,119.991
2,kpop,0.781,0.762,0.0222,0.0,0.86,0.0626,-5.188,110.042
3,kpop,0.674,0.64,0.151,0.0,0.664,0.0339,-5.139,99.908
4,kpop,0.591,0.848,0.137,0.0,0.386,0.0828,-4.397,105.922


### Instrumental

In [12]:
# Read the Spotify CSV data for the instrumental genre
instrumental = pd.read_csv("spotify_data_instrumental_v2.csv")
instrumental.head()

Unnamed: 0,artist_name,track_name,popularity,track_id,track_uri,danceability,energy,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration,genre
0,Steve Mokwebe,It Ends with Us,68,6RMjZgsE9IcQZqubTzLtDs,spotify:track:6RMjZgsE9IcQZqubTzLtDs,0.326,0.00891,-24.103,0.0518,0.986,0.934,0.111,0.299,69.79,130192,instrumental
1,Rannar Sillard,Dream Voucher,68,2T6wyxLBkQ4Y2ZjTbpuYfr,spotify:track:2T6wyxLBkQ4Y2ZjTbpuYfr,0.302,0.101,-19.705,0.0378,0.943,0.886,0.107,0.213,106.904,155040,instrumental
2,Vala Capon,Presto arriverà il sole,69,2jyJXuG0rIma11mOl4Fz7m,spotify:track:2jyJXuG0rIma11mOl4Fz7m,0.378,0.0437,-26.159,0.0357,0.995,0.92,0.0983,0.423,77.211,152000,instrumental
3,Benette,Lily's Cradle,68,4UlarjdicLUPbdssOxWbYX,spotify:track:4UlarjdicLUPbdssOxWbYX,0.368,0.00892,-29.555,0.05,0.99,0.911,0.102,0.287,63.639,161267,instrumental
4,Ever So Blue,Cessura,70,7uvey8m0ZfknE25sBVWoGY,spotify:track:7uvey8m0ZfknE25sBVWoGY,0.374,0.0989,-28.377,0.0326,0.991,0.805,0.0799,0.342,152.84,185390,instrumental


In [13]:
# Select the feature columns; dropna() returns a new DataFrame,
# which avoids the SettingWithCopyWarning raised by dropna(inplace=True) on a slice
instrumental_clean = instrumental[[
    'genre', 'danceability', 'energy', 'acousticness', 'instrumentalness',
    'valence', 'speechiness', 'loudness', 'tempo'
]].dropna()
instrumental_clean.head()


Unnamed: 0,genre,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
0,instrumental,0.326,0.00891,0.986,0.934,0.299,0.0518,-24.103,69.79
1,instrumental,0.302,0.101,0.943,0.886,0.213,0.0378,-19.705,106.904
2,instrumental,0.378,0.0437,0.995,0.92,0.423,0.0357,-26.159,77.211
3,instrumental,0.368,0.00892,0.99,0.911,0.287,0.05,-29.555,63.639
4,instrumental,0.374,0.0989,0.991,0.805,0.342,0.0326,-28.377,152.84


### ASMR

In [14]:
# Read the Spotify CSV data for the ASMR genre
asmr = pd.read_csv("spotify_data_asmr_v2.csv")
asmr.head()

Unnamed: 0,artist_name,track_name,popularity,track_id,track_uri,danceability,energy,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration,genre
0,Weather Factory,Deep Thunderstorm,74,4xyiM6KWF90l9wQh07GJvN,spotify:track:4xyiM6KWF90l9wQh07GJvN,0.135,0.579,-27.105,0.0932,0.00143,0.949,0.393,0.0327,100.841,152563,asmr
1,Weather Factory,Calm Rain & Thunder,73,5fkUSLLclp3LdXiMw3zTgR,spotify:track:5fkUSLLclp3LdXiMw3zTgR,0.16,0.56,-25.388,0.0772,0.00238,0.714,0.413,0.0271,129.874,200693,asmr
2,Weather Factory,Forest Thunderstorm,70,5lyslBBFXOiu2uB0sEdLSa,spotify:track:5lyslBBFXOiu2uB0sEdLSa,0.142,0.91,-20.77,0.115,0.0469,0.957,0.509,0.0135,65.291,150444,asmr
3,Masters of Binaurality,Gamma Freq Pads,64,1yOlQ6v0E3Jgo9iY0dtMyQ,spotify:track:1yOlQ6v0E3Jgo9iY0dtMyQ,0.205,0.121,-16.988,0.0489,0.713,0.939,0.11,0.0283,71.895,220606,asmr
4,Mindful Behaviour,Alpha 8 Hz,63,5VV9fAmGLlXsK568Uax9A2,spotify:track:5VV9fAmGLlXsK568Uax9A2,0.243,0.0632,-31.781,0.0838,0.996,0.794,0.106,0.0531,137.632,224049,asmr


In [15]:
# Select the feature columns; dropna() returns a new DataFrame,
# which avoids the SettingWithCopyWarning raised by dropna(inplace=True) on a slice
asmr_clean = asmr[[
    'genre', 'danceability', 'energy', 'acousticness', 'instrumentalness',
    'valence', 'speechiness', 'loudness', 'tempo'
]].dropna()
asmr_clean.head()


Unnamed: 0,genre,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
0,asmr,0.135,0.579,0.00143,0.949,0.0327,0.0932,-27.105,100.841
1,asmr,0.16,0.56,0.00238,0.714,0.0271,0.0772,-25.388,129.874
2,asmr,0.142,0.91,0.0469,0.957,0.0135,0.115,-20.77,65.291
3,asmr,0.205,0.121,0.713,0.939,0.0283,0.0489,-16.988,71.895
4,asmr,0.243,0.0632,0.996,0.794,0.0531,0.0838,-31.781,137.632
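
The seven genre sections above repeat the same three steps: read a CSV, optionally set the `genre` label, and keep the feature columns without missing values. A minimal sketch of how those steps could be factored into one helper follows. The names `FEATURE_COLUMNS` and `load_and_clean` are hypothetical, and the in-memory CSV is made-up demo data that stands in for a file such as `spotify_data_jazz_v2.csv`.

```python
import io

import pandas as pd

FEATURE_COLUMNS = [
    "genre", "danceability", "energy", "acousticness", "instrumentalness",
    "valence", "speechiness", "loudness", "tempo",
]


def load_and_clean(source, genre=None):
    """Read one genre's CSV, optionally label it, and keep complete feature rows."""
    df = pd.read_csv(source)
    if genre is not None:
        # Mirrors the explicit label assignment used for jazz, rock, and K-pop.
        df["genre"] = genre
    # Column selection chained with dropna() returns a new DataFrame,
    # so nothing is modified in place on a slice.
    return df[FEATURE_COLUMNS].dropna()


# Demo data: the second row is missing its energy value and should be dropped.
demo_csv = io.StringIO(
    "artist_name,danceability,energy,acousticness,instrumentalness,"
    "valence,speechiness,loudness,tempo\n"
    "A,0.694,0.831,0.165,0.000892,0.98,0.0301,-7.288,125.901\n"
    "B,0.271,,0.729,0.000002,0.203,0.0351,-20.652,77.082\n"
)
demo_clean = load_and_clean(demo_csv, genre="jazz")
print(demo_clean)
```

With such a helper, a call like `load_and_clean("spotify_data_rock_v2.csv", genre="rock")` would replace a pair of cells, assuming the file has the same columns as the previews above.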


## Check Value Types and Min/Max Values for Each Feature

In [16]:
print("Pop Data Types: ")
print(pop_clean.dtypes)
print("---------------------------------------")

print("Hiphop Data Types: ")
print(hiphop_clean.dtypes)
print("---------------------------------------")

print("Jazz Data Types: ")
print(jazz_clean.dtypes)
print("---------------------------------------")

print("Rock Data Types: ")
print(rock_clean.dtypes)
print("---------------------------------------")

print("Kpop Data Types: ")
print(kpop_clean.dtypes)
print("---------------------------------------")

print("Instrumental Data Types: ")
print(instrumental_clean.dtypes)
print("---------------------------------------")

print("ASMR Data Types: ")
print(asmr_clean.dtypes)

Pop Data Types: 
genre                object
danceability        float64
energy              float64
acousticness        float64
instrumentalness    float64
valence             float64
speechiness         float64
loudness            float64
tempo               float64
dtype: object
---------------------------------------
Hiphop Data Types: 
genre                object
danceability        float64
energy              float64
acousticness        float64
instrumentalness    float64
valence             float64
speechiness         float64
loudness            float64
tempo               float64
dtype: object
---------------------------------------
Jazz Data Types: 
genre                object
danceability        float64
energy              float64
acousticness        float64
instrumentalness    float64
valence             float64
speechiness         float64
loudness            float64
tempo               float64
dtype: object
---------------------------------------
Rock Data Types: 
genre    

In [17]:
print("Pop")
pop_clean.describe()

Pop


Unnamed: 0,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,0.679188,0.589186,0.243901,0.003028,0.481089,0.09654,-6.229181,119.265083
std,0.136654,0.1481,0.262278,0.0182,0.231616,0.098924,1.816456,24.239718
min,0.378,0.267,0.00146,0.0,0.0592,0.0259,-10.965,75.025
25%,0.572,0.483,0.0237,0.0,0.284,0.042225,-7.206,102.819
50%,0.676,0.58,0.127,0.0,0.48,0.0575,-6.027,113.039
75%,0.806,0.7,0.449,1.2e-05,0.68,0.0924,-4.858,139.98
max,0.923,0.955,0.946,0.13,0.952,0.481,-3.046,171.649


In [18]:
print("Hiphop")
hiphop_clean.describe()

Hiphop


Unnamed: 0,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,0.793107,0.559354,0.203739,0.001012,0.491629,0.204672,-7.099988,121.264661
std,0.117259,0.110177,0.212072,0.003319,0.200612,0.13012,2.080736,26.550106
min,0.453,0.304,0.000282,0.0,0.1,0.0286,-11.713,75.023
25%,0.734,0.486,0.0622,0.0,0.345,0.0924,-8.465,100.0
50%,0.81,0.559,0.111,0.0,0.46,0.164,-6.972,122.973
75%,0.889,0.624,0.259,2.1e-05,0.62,0.332,-5.723,142.069
max,0.97,0.847,0.874,0.0184,0.966,0.53,-2.708,175.041


In [19]:
print("Jazz")
jazz_clean.describe()

Jazz


Unnamed: 0,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,0.525875,0.281834,0.726441,0.087477,0.423316,0.078755,-14.063774,111.913961
std,0.150688,0.232828,0.261203,0.235424,0.218146,0.112398,5.727891,34.116181
min,0.172,0.00756,0.0682,0.0,0.0541,0.0247,-27.331,49.689
25%,0.42,0.104,0.596,1e-06,0.23,0.0327,-18.749,86.13
50%,0.517,0.199,0.833,0.000699,0.375,0.0386,-13.193,108.229
75%,0.62225,0.398,0.923,0.00915,0.613,0.0538,-9.767,131.75575
max,0.848,0.924,0.993,0.948,0.98,0.648,-4.729,206.247


In [20]:
print("Rock")
rock_clean.describe()

Rock


Unnamed: 0,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,0.577392,0.666217,0.16815,0.02851,0.527536,0.047821,-7.486857,125.547854
std,0.11943,0.182031,0.185707,0.115975,0.203702,0.029647,3.634668,25.60402
min,0.31,0.161,2.5e-05,0.0,0.136,0.0253,-22.32,74.989
25%,0.501,0.532,0.0183,0.0,0.361,0.0311,-9.348,108.736
50%,0.579,0.675,0.0936,0.000105,0.516,0.0374,-7.174,124.053
75%,0.645,0.807,0.217,0.000879,0.681,0.0507,-5.021,141.47
max,0.852,0.952,0.883,0.91,0.965,0.168,-2.729,188.386


In [21]:
print("Kpop")
kpop_clean.describe()

Kpop


Unnamed: 0,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,0.65805,0.781011,0.083426,6.7e-05,0.521027,0.096608,-4.343752,123.932605
std,0.10679,0.092695,0.106968,0.000323,0.176487,0.093025,1.492394,24.864153
min,0.347,0.47,0.00116,0.0,0.141,0.0276,-7.981,77.501
25%,0.583,0.71725,0.0114,0.0,0.389,0.045,-5.165,104.995
50%,0.661,0.798,0.03235,0.0,0.519,0.0665,-4.342,124.998
75%,0.72,0.853,0.118,0.0,0.648,0.106,-3.211,139.987
max,0.874,0.959,0.445,0.00188,0.896,0.483,-1.339,176.084


In [22]:
print("Instrumental")
instrumental_clean.describe()

Instrumental


Unnamed: 0,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,0.330933,0.051802,0.961124,0.900726,0.233352,0.048943,-26.328453,98.280007
std,0.153895,0.080273,0.083071,0.113269,0.169139,0.020089,5.438604,34.836254
min,0.0793,0.00275,0.478,0.00281,0.0347,0.0326,-38.48,60.584
25%,0.215,0.0131,0.973,0.892,0.0911,0.0375,-29.699,69.882
50%,0.332,0.0282,0.993,0.922,0.216,0.0418,-25.859,85.191
75%,0.426,0.0621,0.995,0.94925,0.313,0.0521,-22.649,118.281
max,0.839,0.585,0.996,0.972,0.912,0.153,-7.738,200.615


In [23]:
print("ASMR")
asmr_clean.describe()

ASMR


Unnamed: 0,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,0.272093,0.230373,0.570336,0.655693,0.095609,0.094388,-29.64915,101.393064
std,0.145651,0.318364,0.416652,0.351641,0.120984,0.06402,9.0364,33.766274
min,0.0733,0.00268,2.3e-05,0.0,0.00102,0.0306,-50.618,65.044
25%,0.168,0.0222,0.0254,0.44575,0.0269,0.0527,-37.978,73.012
50%,0.213,0.0522,0.795,0.808,0.035,0.0642,-30.711,87.28
75%,0.346,0.295,0.957,0.941,0.124,0.141,-21.9305,130.548
max,0.654,0.998,0.996,0.997,0.58,0.393,-10.537,176.654
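
The per-genre `describe()` tables above are easier to compare side by side when the frames are stacked and summarized in one pass. The sketch below shows one way to do that with `pd.concat` and `groupby`. The two miniature frames are hypothetical stand-ins for `pop_clean` and `jazz_clean`, and only the minimum, mean, and maximum are computed.

```python
import pandas as pd

# Hypothetical miniature frames standing in for pop_clean, jazz_clean, and so on.
pop_demo = pd.DataFrame({
    "genre": ["pop", "pop"],
    "energy": [0.73, 0.793],
    "tempo": [171.005, 123.941],
})
jazz_demo = pd.DataFrame({
    "genre": ["jazz", "jazz"],
    "energy": [0.831, 0.165],
    "tempo": [125.901, 77.082],
})

# Stack the frames; the genre column keeps track of where each row came from.
combined = pd.concat([pop_demo, jazz_demo], ignore_index=True)

# One row per genre, with min, mean, and max for every numeric feature.
summary = combined.groupby("genre").agg(["min", "mean", "max"])
print(summary)
```

In the notebook itself, the list passed to `pd.concat` would hold all seven `*_clean` frames.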


In [24]:
print(pop_clean.shape)
print(hiphop_clean.shape)
print(jazz_clean.shape)
print(rock_clean.shape)
print(kpop_clean.shape)
print(instrumental_clean.shape)
print(asmr_clean.shape)

(1000, 9)
(1000, 9)
(1000, 9)
(1000, 9)
(1000, 9)
(1000, 9)
(1000, 9)


### Export Clean DataFrames to New CSV Files

In [None]:
# Export the DataFrames to csv files
pop_clean.to_csv("spotify_data_pop_clean.csv", encoding="utf-8", index=False)
hiphop_clean.to_csv("spotify_data_hiphop_clean.csv", encoding="utf-8", index=False)
jazz_clean.to_csv("spotify_data_jazz_clean.csv", encoding="utf-8", index=False)
rock_clean.to_csv("spotify_data_rock_clean.csv", encoding="utf-8", index=False)
kpop_clean.to_csv("spotify_data_kpop_clean.csv", encoding="utf-8", index=False)
instrumental_clean.to_csv("spotify_data_instrumental_clean.csv", encoding="utf-8", index=False)
asmr_clean.to_csv("spotify_data_asmr_clean.csv", encoding="utf-8", index=False)
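
The export step can also be written as a loop with a read-back check, which confirms that each file round-trips with the same shape and columns. This is only a sketch: the `clean_frames` dictionary holds hypothetical stand-ins for the cleaned frames, and a temporary directory is used so the example does not touch the working directory.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-ins for the cleaned frames built earlier in the notebook.
clean_frames = {
    "pop": pd.DataFrame({"genre": ["pop"], "energy": [0.73]}),
    "jazz": pd.DataFrame({"genre": ["jazz"], "energy": [0.831]}),
}

# A temporary directory, used here only for the demonstration.
output_dir = Path(tempfile.mkdtemp())

for genre, frame in clean_frames.items():
    path = output_dir / f"spotify_data_{genre}_clean.csv"
    frame.to_csv(path, encoding="utf-8", index=False)
    # Read the file back to confirm the export kept every row and column.
    restored = pd.read_csv(path)
    assert restored.shape == frame.shape
    assert list(restored.columns) == list(frame.columns)

written = sorted(p.name for p in output_dir.glob("*.csv"))
print(written)
```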