# Spotify Data Analysis

### Genres:
- Pop
- Hip-hop
- Jazz
- Rock
- K-pop
- Instrumental
- ASMR

### Feature Definitions:

- Danceability - How suitable a track is for dancing, based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. The higher the value, the easier the song is to dance to.
- Energy - A perceptual measure of intensity and activity. The higher the value, the more energetic the song is. Energetic tracks typically feel fast, loud, and noisy.
- Acousticness - The higher the value, the more acoustic the song is.
- Instrumentalness - Predicts whether a track contains no vocals. The higher the value, the greater the likelihood that the track contains no vocal content.
- Valence - The higher the value, the more positive the mood of the song. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
- Speechiness - The higher the value, the more spoken words the song contains.
- Loudness (dB) - The higher the value, the louder the song. Loudness values are averaged across the entire track and are useful for comparing the relative loudness of tracks. Values typically range between -60 and 0 dB.
- Tempo (BPM) - The tempo of the song in beats per minute.
- Popularity - The higher the value, the more popular the song is.
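
The definitions above imply expected value ranges, which can be checked before any analysis. The sketch below is illustrative only: the helper name `out_of_range_counts` is hypothetical, the 0.0 to 1.0 scale for the six score-like features is an assumption taken from Spotify's audio-feature descriptions rather than from this notebook, and the loudness bounds come from the "typically between -60 and 0 dB" note above. The sample values are the first three pop rows shown later in this notebook.

```python
import pandas as pd

# Assumption: these features are reported on a 0.0-1.0 scale.
UNIT_INTERVAL_FEATURES = [
    "danceability", "energy", "acousticness",
    "instrumentalness", "valence", "speechiness",
]


def out_of_range_counts(df: pd.DataFrame) -> pd.Series:
    """Count, per feature, the values that fall outside the expected range.

    Missing values are counted as out of range because between() is False for NaN.
    """
    counts = {}
    for col in UNIT_INTERVAL_FEATURES:
        if col in df.columns:
            counts[col] = int((~df[col].between(0.0, 1.0)).sum())
    if "loudness" in df.columns:
        # Typical range from the definition above; not a hard limit.
        counts["loudness"] = int((~df["loudness"].between(-60.0, 0.0)).sum())
    return pd.Series(counts, dtype="int64")


# First three pop tracks from the notebook output.
sample = pd.DataFrame({
    "danceability": [0.514, 0.794, 0.787],
    "energy": [0.73, 0.793, 0.673],
    "loudness": [-5.934, -4.521, -4.577],
})
report = out_of_range_counts(sample)
print(report)
```

A non-zero count does not necessarily mean the data is wrong, since the loudness range is only typical, but it flags rows worth inspecting.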


In [1]:
import pandas as pd

## Extract The Necessary Columns 

### Pop

In [2]:
# Read the Spotify CSV data for the pop genre
pop = pd.read_csv("spotify_data_pop.csv")
pop.head()

Unnamed: 0,artist_name,track_name,popularity,track_id,track_uri,danceability,energy,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration,genre
0,The Weeknd,Blinding Lights,100,0VjIjW4GlUZAMYd2vXMi3b,spotify:track:0VjIjW4GlUZAMYd2vXMi3b,0.514,0.73,-5.934,0.0598,0.00146,9.5e-05,0.0897,0.334,171.005,200040,pop
1,Dua Lipa,Don't Start Now,97,6WrI0LAC5M1Rw2MnX2ZvEg,spotify:track:6WrI0LAC5M1Rw2MnX2ZvEg,0.794,0.793,-4.521,0.0842,0.0125,0.0,0.0952,0.677,123.941,183290,pop
2,Doja Cat,Say So,97,3Dv1eDb0MEgF93GpLXlucZ,spotify:track:3Dv1eDb0MEgF93GpLXlucZ,0.787,0.673,-4.577,0.158,0.256,4e-06,0.0904,0.786,110.962,237893,pop
3,Arizona Zervas,ROXANNE,95,696DnlkuDOXcMAnKlTgXXK,spotify:track:696DnlkuDOXcMAnKlTgXXK,0.621,0.601,-5.616,0.148,0.0522,0.0,0.46,0.457,116.735,163636,pop
4,BENEE,Supalonely,95,4nK5YrxbMGZstTLbvj6Gxw,spotify:track:4nK5YrxbMGZstTLbvj6Gxw,0.863,0.631,-4.689,0.0534,0.305,3e-05,0.123,0.817,128.977,223480,pop


In [3]:
# Keep only the columns needed for the analysis. Calling dropna() without
# inplace=True returns a new DataFrame and avoids SettingWithCopyWarning.
pop_clean = pop[[
    'genre', 'danceability', 'energy', 'acousticness', 'instrumentalness',
    'valence', 'speechiness', 'loudness', 'tempo'
]].dropna()
pop_clean.head()


Unnamed: 0,genre,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
0,pop,0.514,0.73,0.00146,9.5e-05,0.334,0.0598,-5.934,171.005
1,pop,0.794,0.793,0.0125,0.0,0.677,0.0842,-4.521,123.941
2,pop,0.787,0.673,0.256,4e-06,0.786,0.158,-4.577,110.962
3,pop,0.621,0.601,0.0522,0.0,0.457,0.148,-5.616,116.735
4,pop,0.863,0.631,0.305,3e-05,0.817,0.0534,-4.689,128.977
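
The same select-then-drop step is repeated for every genre in this notebook, so it can be factored into a small helper. This is a minimal sketch, not the notebook's own code: `select_features` and `FEATURE_COLUMNS` are hypothetical names, and the third row of the demo frame is made up to show a missing value being dropped. Returning the result of `dropna()` instead of calling it with `inplace=True` on a column subset also avoids pandas' `SettingWithCopyWarning`.

```python
import pandas as pd

# Columns kept for the analysis (same list as in the notebook cells).
FEATURE_COLUMNS = (
    "genre", "danceability", "energy", "acousticness", "instrumentalness",
    "valence", "speechiness", "loudness", "tempo",
)


def select_features(df: pd.DataFrame, columns=FEATURE_COLUMNS) -> pd.DataFrame:
    """Return a new DataFrame with only `columns`, without rows that have missing values."""
    return df[list(columns)].dropna()


# Two real pop rows from the notebook plus one made-up row with a missing tempo.
raw = pd.DataFrame({
    "artist_name": ["The Weeknd", "Dua Lipa", "(hypothetical)"],
    "genre": ["pop", "pop", "pop"],
    "danceability": [0.514, 0.794, 0.5],
    "energy": [0.73, 0.793, 0.5],
    "acousticness": [0.00146, 0.0125, 0.5],
    "instrumentalness": [9.5e-05, 0.0, 0.0],
    "valence": [0.334, 0.677, 0.5],
    "speechiness": [0.0598, 0.0842, 0.05],
    "loudness": [-5.934, -4.521, -6.0],
    "tempo": [171.005, 123.941, None],  # missing value in the made-up row
})
clean = select_features(raw)
print(clean.shape)
```

With such a helper, each genre cell would reduce to a single call such as `select_features(pop)`.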


### Hip-hop

In [4]:
# Read the Spotify CSV data for the hip-hop genre
hiphop = pd.read_csv("spotify_data_hiphop.csv")
hiphop.head()

Unnamed: 0,artist_name,track_name,popularity,track_id,track_uri,danceability,energy,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration,genre
0,Drake,Toosie Slide,99,127QTOFJsJQp5LbJbu3A1y,spotify:track:127QTOFJsJQp5LbJbu3A1y,0.834,0.454,-9.75,0.201,0.321,6e-06,0.114,0.837,81.618,247059,HipHop
1,Jack Harlow,WHATS POPPIN,93,1jaTQ3nqY3oAAYyCTbIvnM,spotify:track:1jaTQ3nqY3oAAYyCTbIvnM,0.923,0.604,-6.671,0.245,0.017,0.0,0.272,0.826,145.062,139741,HipHop
2,Future,Life Is Good (feat. Drake),94,5yY9lUy8nbvjM1Uyo1Uqoc,spotify:track:5yY9lUy8nbvjM1Uyo1Uqoc,0.676,0.609,-5.831,0.481,0.0706,0.0,0.152,0.508,142.037,237735,HipHop
3,DaBaby,ROCKSTAR (feat. Roddy Ricch),91,7ytR5pFWmSjzHJIeQkgog4,spotify:track:7ytR5pFWmSjzHJIeQkgog4,0.746,0.69,-7.956,0.164,0.247,0.0,0.101,0.497,89.977,181733,HipHop
4,NLE Choppa,Walk Em Down (feat. Roddy Ricch),90,4cSSL3YafYjM3yjgFO1vJg,spotify:track:4cSSL3YafYjM3yjgFO1vJg,0.867,0.744,-5.171,0.228,0.268,0.0,0.0713,0.645,84.005,173288,HipHop


In [5]:
# Keep only the columns needed for the analysis. Calling dropna() without
# inplace=True returns a new DataFrame and avoids SettingWithCopyWarning.
hiphop_clean = hiphop[[
    'genre', 'danceability', 'energy', 'acousticness', 'instrumentalness',
    'valence', 'speechiness', 'loudness', 'tempo'
]].dropna()
hiphop_clean.head()


Unnamed: 0,genre,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
0,HipHop,0.834,0.454,0.321,6e-06,0.837,0.201,-9.75,81.618
1,HipHop,0.923,0.604,0.017,0.0,0.826,0.245,-6.671,145.062
2,HipHop,0.676,0.609,0.0706,0.0,0.508,0.481,-5.831,142.037
3,HipHop,0.746,0.69,0.247,0.0,0.497,0.164,-7.956,89.977
4,HipHop,0.867,0.744,0.268,0.0,0.645,0.228,-5.171,84.005


### Jazz

In [6]:
# Read the Spotify CSV data for the jazz genre
jazz = pd.read_csv("spotify_data_jazz.csv")
jazz['genre'] = 'jazz'
jazz.head()

Unnamed: 0,artist_name,track_name,popularity,track_id,track_uri,danceability,energy,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration,genre
0,"Earth, Wind & Fire",September,82,7Cuk8jsPPoNYQWXK9XRFvG,spotify:track:7Cuk8jsPPoNYQWXK9XRFvG,0.694,0.831,-7.288,0.0301,0.165,0.000892,0.25,0.98,125.901,215080,jazz
1,Louis Armstrong,What A Wonderful World - Single Version,68,29U7stRjqHU6rMiS8BfaI9,spotify:track:29U7stRjqHU6rMiS8BfaI9,0.271,0.165,-20.652,0.0351,0.729,2e-06,0.118,0.203,77.082,139227,jazz
2,Leslie Odom Jr.,Alexander Hamilton,71,4TTV7EcfroSLWzXRY6gLv6,spotify:track:4TTV7EcfroSLWzXRY6gLv6,0.609,0.435,-7.862,0.284,0.524,0.0,0.118,0.563,131.998,236738,jazz
3,Etta James,At Last,75,4Hhv2vrOTy89HFRcjU3QOx,spotify:track:4Hhv2vrOTy89HFRcjU3QOx,0.273,0.347,-8.631,0.0292,0.546,0.0137,0.334,0.328,87.411,179693,jazz
4,"Grover Washington, Jr.",Just the Two of Us (feat. Bill Withers),71,1ko2lVN0vKGUl9zrU0qSlT,spotify:track:1ko2lVN0vKGUl9zrU0qSlT,0.803,0.488,-9.303,0.0803,0.576,0.0609,0.0763,0.624,95.771,237106,jazz


In [7]:
# Keep only the columns needed for the analysis. Calling dropna() without
# inplace=True returns a new DataFrame and avoids SettingWithCopyWarning.
jazz_clean = jazz[[
    'genre', 'danceability', 'energy', 'acousticness', 'instrumentalness',
    'valence', 'speechiness', 'loudness', 'tempo'
]].dropna()
jazz_clean.head()


Unnamed: 0,genre,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
0,jazz,0.694,0.831,0.165,0.000892,0.98,0.0301,-7.288,125.901
1,jazz,0.271,0.165,0.729,2e-06,0.203,0.0351,-20.652,77.082
2,jazz,0.609,0.435,0.524,0.0,0.563,0.284,-7.862,131.998
3,jazz,0.273,0.347,0.546,0.0137,0.328,0.0292,-8.631,87.411
4,jazz,0.803,0.488,0.576,0.0609,0.624,0.0803,-9.303,95.771


### Rock

In [8]:
# Read the Spotify CSV data for the rock genre
rock = pd.read_csv("spotify_data_rock.csv")
rock['genre'] = 'rock'
rock.head()

Unnamed: 0,artist_name,track_name,popularity,track_id,track_uri,danceability,energy,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration,genre
0,Twenty One Pilots,Level of Concern,88,6xZ4Q2k2ompmDppyeESIY8,spotify:track:6xZ4Q2k2ompmDppyeESIY8,0.754,0.583,-7.34,0.0432,0.32,0.00015,0.144,0.77,122.012,220051,rock
1,Wallows,Are You Bored Yet? (feat. Clairo),81,57RA3JGafJm5zRtKJiKPIm,spotify:track:57RA3JGafJm5zRtKJiKPIm,0.682,0.683,-6.444,0.0287,0.156,2.3e-05,0.273,0.64,120.023,178000,rock
2,Grouplove,Tongue Tied,81,0GO8y8jQk1PkHzS31d699N,spotify:track:0GO8y8jQk1PkHzS31d699N,0.56,0.936,-5.835,0.0439,0.00847,0.0,0.161,0.371,112.96,218013,rock
3,Imagine Dragons,Believer,88,0pqnGHJpmpxLKifKRmU6WP,spotify:track:0pqnGHJpmpxLKifKRmU6WP,0.776,0.78,-4.374,0.128,0.0622,0.0,0.081,0.666,124.949,204347,rock
4,The Killers,Mr. Brightside,78,7oK9VyNzrYvRFo7nQEYkWN,spotify:track:7oK9VyNzrYvRFo7nQEYkWN,0.356,0.924,-3.74,0.0808,0.00101,0.0,0.0953,0.232,148.017,222587,rock


In [9]:
# Keep only the columns needed for the analysis. Calling dropna() without
# inplace=True returns a new DataFrame and avoids SettingWithCopyWarning.
rock_clean = rock[[
    'genre', 'danceability', 'energy', 'acousticness', 'instrumentalness',
    'valence', 'speechiness', 'loudness', 'tempo'
]].dropna()
rock_clean.head()


Unnamed: 0,genre,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
0,rock,0.754,0.583,0.32,0.00015,0.77,0.0432,-7.34,122.012
1,rock,0.682,0.683,0.156,2.3e-05,0.64,0.0287,-6.444,120.023
2,rock,0.56,0.936,0.00847,0.0,0.371,0.0439,-5.835,112.96
3,rock,0.776,0.78,0.0622,0.0,0.666,0.128,-4.374,124.949
4,rock,0.356,0.924,0.00101,0.0,0.232,0.0808,-3.74,148.017


### K-pop

In [10]:
# Read the Spotify CSV data for the K-pop genre
kpop = pd.read_csv("spotify_data_kpop.csv")
kpop['genre'] = 'kpop'
kpop.head()

Unnamed: 0,artist_name,track_name,popularity,track_id,track_uri,danceability,energy,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration,genre
0,BTS,ON,85,2QyuXBcV1LJ2rq01KhreMF,spotify:track:2QyuXBcV1LJ2rq01KhreMF,0.583,0.817,-5.146,0.0987,0.118,0.0,0.338,0.438,105.936,246381,kpop
1,BTS,Boy With Luv (feat. Halsey),84,5KawlOMHjWeUjQtnuRs22c,spotify:track:5KawlOMHjWeUjQtnuRs22c,0.645,0.862,-4.757,0.0965,0.0923,0.0,0.192,0.798,119.991,229773,kpop
2,BTS,Filter,82,0ono6UCNVZ1XqOm6j78Blu,spotify:track:0ono6UCNVZ1XqOm6j78Blu,0.781,0.762,-5.188,0.0626,0.0222,0.0,0.121,0.86,110.042,180221,kpop
3,BTS,My Time,81,4vTgx6h4seHvkuFh84JXYP,spotify:track:4vTgx6h4seHvkuFh84JXYP,0.674,0.64,-5.139,0.0339,0.151,0.0,0.0925,0.664,99.908,234458,kpop
4,BTS,ON (Feat. Sia),81,3IB5qOeMayvpOdHxYCL5tZ,spotify:track:3IB5qOeMayvpOdHxYCL5tZ,0.591,0.848,-4.397,0.0828,0.137,0.0,0.372,0.386,105.922,246816,kpop


In [11]:
# Keep only the columns needed for the analysis. Calling dropna() without
# inplace=True returns a new DataFrame and avoids SettingWithCopyWarning.
kpop_clean = kpop[[
    'genre', 'danceability', 'energy', 'acousticness', 'instrumentalness',
    'valence', 'speechiness', 'loudness', 'tempo'
]].dropna()
kpop_clean.head()


Unnamed: 0,genre,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
0,kpop,0.583,0.817,0.118,0.0,0.438,0.0987,-5.146,105.936
1,kpop,0.645,0.862,0.0923,0.0,0.798,0.0965,-4.757,119.991
2,kpop,0.781,0.762,0.0222,0.0,0.86,0.0626,-5.188,110.042
3,kpop,0.674,0.64,0.151,0.0,0.664,0.0339,-5.139,99.908
4,kpop,0.591,0.848,0.137,0.0,0.386,0.0828,-4.397,105.922


### Instrumental

In [12]:
# Read the Spotify CSV data for the instrumental genre
instrumental = pd.read_csv("spotify_data_instrumental.csv")
instrumental.head()

Unnamed: 0,artist_name,track_name,popularity,track_id,track_uri,danceability,energy,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration,genre
0,Steve Mokwebe,It Ends with Us,68,6RMjZgsE9IcQZqubTzLtDs,spotify:track:6RMjZgsE9IcQZqubTzLtDs,0.326,0.00891,-24.103,0.0518,0.986,0.934,0.111,0.299,69.79,130192,instrumental
1,Rannar Sillard,Dream Voucher,68,2T6wyxLBkQ4Y2ZjTbpuYfr,spotify:track:2T6wyxLBkQ4Y2ZjTbpuYfr,0.302,0.101,-19.705,0.0378,0.943,0.886,0.107,0.213,106.904,155040,instrumental
2,Vala Capon,Presto arriverà il sole,69,2jyJXuG0rIma11mOl4Fz7m,spotify:track:2jyJXuG0rIma11mOl4Fz7m,0.378,0.0437,-26.159,0.0357,0.995,0.92,0.0983,0.423,77.211,152000,instrumental
3,Benette,Lily's Cradle,68,4UlarjdicLUPbdssOxWbYX,spotify:track:4UlarjdicLUPbdssOxWbYX,0.368,0.00892,-29.555,0.05,0.99,0.911,0.102,0.287,63.639,161267,instrumental
4,Ever So Blue,Cessura,70,7uvey8m0ZfknE25sBVWoGY,spotify:track:7uvey8m0ZfknE25sBVWoGY,0.374,0.0989,-28.377,0.0326,0.991,0.805,0.0799,0.342,152.84,185390,instrumental


In [13]:
# Keep only the columns needed for the analysis. Calling dropna() without
# inplace=True returns a new DataFrame and avoids SettingWithCopyWarning.
instrumental_clean = instrumental[[
    'genre', 'danceability', 'energy', 'acousticness', 'instrumentalness',
    'valence', 'speechiness', 'loudness', 'tempo'
]].dropna()
instrumental_clean.head()


Unnamed: 0,genre,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
0,instrumental,0.326,0.00891,0.986,0.934,0.299,0.0518,-24.103,69.79
1,instrumental,0.302,0.101,0.943,0.886,0.213,0.0378,-19.705,106.904
2,instrumental,0.378,0.0437,0.995,0.92,0.423,0.0357,-26.159,77.211
3,instrumental,0.368,0.00892,0.99,0.911,0.287,0.05,-29.555,63.639
4,instrumental,0.374,0.0989,0.991,0.805,0.342,0.0326,-28.377,152.84


### ASMR

In [14]:
# Read the Spotify CSV data for the ASMR genre
asmr = pd.read_csv("spotify_data_asmr.csv")
asmr.head()

Unnamed: 0,artist_name,track_name,popularity,track_id,track_uri,danceability,energy,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration,genre
0,Weather Factory,Deep Thunderstorm,74,4xyiM6KWF90l9wQh07GJvN,spotify:track:4xyiM6KWF90l9wQh07GJvN,0.135,0.579,-27.105,0.0932,0.00143,0.949,0.393,0.0327,100.841,152563,asmr
1,Weather Factory,Calm Rain & Thunder,73,5fkUSLLclp3LdXiMw3zTgR,spotify:track:5fkUSLLclp3LdXiMw3zTgR,0.16,0.56,-25.388,0.0772,0.00238,0.714,0.413,0.0271,129.874,200693,asmr
2,Weather Factory,Forest Thunderstorm,70,5lyslBBFXOiu2uB0sEdLSa,spotify:track:5lyslBBFXOiu2uB0sEdLSa,0.142,0.91,-20.77,0.115,0.0469,0.957,0.509,0.0135,65.291,150444,asmr
3,Masters of Binaurality,Gamma Freq Pads,64,1yOlQ6v0E3Jgo9iY0dtMyQ,spotify:track:1yOlQ6v0E3Jgo9iY0dtMyQ,0.205,0.121,-16.988,0.0489,0.713,0.939,0.11,0.0283,71.895,220606,asmr
4,Mindful Behaviour,Alpha 8 Hz,63,5VV9fAmGLlXsK568Uax9A2,spotify:track:5VV9fAmGLlXsK568Uax9A2,0.243,0.0632,-31.781,0.0838,0.996,0.794,0.106,0.0531,137.632,224049,asmr


In [15]:
# Keep only the columns needed for the analysis. Calling dropna() without
# inplace=True returns a new DataFrame and avoids SettingWithCopyWarning.
asmr_clean = asmr[[
    'genre', 'danceability', 'energy', 'acousticness', 'instrumentalness',
    'valence', 'speechiness', 'loudness', 'tempo'
]].dropna()
asmr_clean.head()


Unnamed: 0,genre,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
0,asmr,0.135,0.579,0.00143,0.949,0.0327,0.0932,-27.105,100.841
1,asmr,0.16,0.56,0.00238,0.714,0.0271,0.0772,-25.388,129.874
2,asmr,0.142,0.91,0.0469,0.957,0.0135,0.115,-20.77,65.291
3,asmr,0.205,0.121,0.713,0.939,0.0283,0.0489,-16.988,71.895
4,asmr,0.243,0.0632,0.996,0.794,0.0531,0.0838,-31.781,137.632


## Check Value Types and Min/Max Values for Each Feature

In [16]:
print("Pop Data Types: ")
print(pop_clean.dtypes)
print("---------------------------------------")

print("Hiphop Data Types: ")
print(hiphop_clean.dtypes)
print("---------------------------------------")

print("Jazz Data Types: ")
print(jazz_clean.dtypes)
print("---------------------------------------")

print("Rock Data Types: ")
print(rock_clean.dtypes)
print("---------------------------------------")

print("Kpop Data Types: ")
print(kpop_clean.dtypes)
print("---------------------------------------")

print("Instrumental Data Types: ")
print(instrumental_clean.dtypes)
print("---------------------------------------")

print("ASMR Data Types: ")
print(asmr_clean.dtypes)

Pop Data Types: 
genre                object
danceability        float64
energy              float64
acousticness        float64
instrumentalness    float64
valence             float64
speechiness         float64
loudness            float64
tempo               float64
dtype: object
---------------------------------------
Hiphop Data Types: 
genre                object
danceability        float64
energy              float64
acousticness        float64
instrumentalness    float64
valence             float64
speechiness         float64
loudness            float64
tempo               float64
dtype: object
---------------------------------------
Jazz Data Types: 
genre                object
danceability        float64
energy              float64
acousticness        float64
instrumentalness    float64
valence             float64
speechiness         float64
loudness            float64
tempo               float64
dtype: object
---------------------------------------
Rock Data Types: 
genre    
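
The per-genre dtype printouts can also be collected into a single table, which makes mismatches between genres easier to spot. The following is a sketch under assumptions: `dtype_table` is a hypothetical helper, and the two small frames stand in for the real `*_clean` DataFrames, using values from the first pop and hip-hop rows.

```python
import pandas as pd


def dtype_table(frames: dict) -> pd.DataFrame:
    """Build a table with one row per column name and one column per genre.

    Each cell holds the dtype name (as a string) of that column in that genre's frame.
    """
    return pd.DataFrame({name: df.dtypes.astype(str) for name, df in frames.items()})


# Stand-ins for pop_clean and hiphop_clean (first row of each, fewer columns).
frames = {
    "pop": pd.DataFrame({"genre": ["pop"], "danceability": [0.514], "tempo": [171.005]}),
    "hiphop": pd.DataFrame({"genre": ["HipHop"], "danceability": [0.834], "tempo": [81.618]}),
}
table = dtype_table(frames)
print(table)

# True when every genre reports the same dtype for each column.
consistent = bool(table.nunique(axis=1).eq(1).all())
print(consistent)
```

In the notebook itself, `frames` would map each genre name to its cleaned DataFrame, for example `{"pop": pop_clean, "hiphop": hiphop_clean, ...}`.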

In [17]:
print("Pop")
pop_clean.describe()

Pop


Unnamed: 0,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
count,200.0,200.0,200.0,200.0,200.0,200.0,200.0,200.0
mean,0.69688,0.588385,0.248686,0.003525,0.514296,0.106108,-6.18332,117.24204
std,0.135685,0.14369,0.257241,0.020335,0.24252,0.105873,1.803926,23.457004
min,0.378,0.267,0.00146,0.0,0.0592,0.0259,-10.965,75.025
25%,0.6135,0.491,0.0252,0.0,0.324,0.042225,-6.671,102.1195
50%,0.6935,0.585,0.143,0.0,0.513,0.0574,-6.024,113.012
75%,0.809,0.719,0.328,7e-06,0.71875,0.13375,-4.81575,129.989
max,0.923,0.955,0.866,0.13,0.895,0.481,-3.434,171.005


In [18]:
print("Hiphop")
hiphop_clean.describe()

Hiphop


Unnamed: 0,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
count,200.0,200.0,200.0,200.0,200.0,200.0,200.0,200.0
mean,0.81045,0.5746,0.190125,0.000446,0.531005,0.221505,-6.87616,119.415625
std,0.10251,0.116045,0.162403,0.001683,0.214294,0.128255,2.200057,27.208304
min,0.453,0.347,0.000282,0.0,0.101,0.0287,-11.713,75.023
25%,0.769,0.4975,0.0849,0.0,0.389,0.10875,-8.33,97.008
50%,0.831,0.568,0.149,0.0,0.497,0.206,-6.678,122.973
75%,0.883,0.652,0.258,6e-06,0.711,0.341,-5.23175,140.894
max,0.97,0.808,0.847,0.00951,0.966,0.481,-2.708,175.041


In [19]:
print("Jazz")
jazz_clean.describe()

Jazz


Unnamed: 0,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
count,200.0,200.0,200.0,200.0,200.0,200.0,200.0,200.0
mean,0.5281,0.28165,0.737626,0.077187,0.433605,0.079826,-13.868335,107.30401
std,0.150405,0.22747,0.245688,0.220561,0.214075,0.114087,5.666088,32.305247
min,0.172,0.0161,0.0903,0.0,0.103,0.0247,-27.331,49.689
25%,0.421,0.114,0.596,1e-06,0.255,0.0322,-17.042,82.32375
50%,0.507,0.199,0.843,0.000739,0.394,0.0386,-12.776,103.2645
75%,0.613,0.373,0.923,0.0152,0.613,0.0538,-9.43,128.629
max,0.848,0.924,0.993,0.948,0.98,0.648,-5.263,206.247


In [20]:
print("Rock")
rock_clean.describe()

Rock


Unnamed: 0,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
count,200.0,200.0,200.0,200.0,200.0,200.0,200.0,200.0
mean,0.58665,0.65337,0.164657,0.022481,0.525265,0.047473,-7.576145,125.293765
std,0.125078,0.177654,0.18512,0.076625,0.208748,0.03044,3.789669,25.746522
min,0.31,0.161,0.000147,0.0,0.136,0.0253,-22.32,74.989
25%,0.501,0.516,0.0183,0.0,0.341,0.0299,-9.616,108.736
50%,0.5825,0.67,0.0901,6.5e-05,0.533,0.0364,-6.694,124.053
75%,0.67125,0.784,0.217,0.00173,0.66975,0.0507,-5.021,143.883
max,0.852,0.952,0.883,0.447,0.965,0.16,-2.729,188.386


In [21]:
print("Kpop")
kpop_clean.describe()

Kpop


Unnamed: 0,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
count,200.0,200.0,200.0,200.0,200.0,200.0,200.0,200.0
mean,0.656715,0.77823,0.084348,8.3e-05,0.515275,0.093884,-4.46639,123.36524
std,0.107159,0.0958,0.105207,0.000357,0.176521,0.085241,1.489166,25.715502
min,0.347,0.548,0.00116,0.0,0.141,0.0276,-7.981,77.501
25%,0.583,0.712,0.0121,0.0,0.389,0.045,-5.167,99.93
50%,0.661,0.798,0.0367,0.0,0.503,0.0665,-4.519,124.998
75%,0.7235,0.853,0.125,0.0,0.644,0.106,-3.48,141.87
max,0.874,0.959,0.445,0.00188,0.896,0.483,-1.339,174.039


In [22]:
print("Instrumental")
instrumental_clean.describe()

Instrumental


Unnamed: 0,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
count,200.0,200.0,200.0,200.0,200.0,200.0,200.0,200.0
mean,0.319746,0.038292,0.969265,0.910585,0.21818,0.046039,-27.260445,97.243955
std,0.131184,0.038081,0.061716,0.048545,0.144788,0.014192,4.822376,33.432244
min,0.0793,0.00275,0.679,0.776,0.0347,0.0326,-38.48,60.584
25%,0.224,0.0102,0.976,0.89125,0.0911,0.0375,-30.011,69.69375
50%,0.329,0.0281,0.993,0.92,0.213,0.04115,-26.478,85.191
75%,0.407,0.0509,0.995,0.947,0.312,0.0509,-23.11325,117.326
max,0.622,0.163,0.996,0.972,0.561,0.12,-19.448,200.615


In [23]:
print("ASMR")
asmr_clean.describe()

ASMR


Unnamed: 0,danceability,energy,acousticness,instrumentalness,valence,speechiness,loudness,tempo
count,200.0,200.0,200.0,200.0,200.0,200.0,200.0,200.0
mean,0.260492,0.229345,0.583904,0.681268,0.091876,0.090255,-29.182495,104.043615
std,0.138061,0.315418,0.41151,0.33632,0.121067,0.055017,8.861158,34.735805
min,0.0733,0.00268,3e-05,0.0,0.00102,0.0306,-50.618,65.044
25%,0.16275,0.02475,0.0407,0.549,0.0269,0.0515,-35.858,71.895
50%,0.213,0.0522,0.795,0.813,0.035,0.0613,-30.323,90.533
75%,0.328,0.307,0.957,0.941,0.119,0.111,-20.77,136.396
max,0.575,0.998,0.996,0.997,0.58,0.229,-10.537,176.654
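
Comparing minimum and maximum values across seven separate `describe()` outputs takes some scrolling. One option is to stack the frames and aggregate per genre. The sketch below is illustrative: `min_max_by_genre` and the `source` column are hypothetical names (a separate label is used because the `genre` column is spelled differently across files, e.g. `HipHop`), and the demo frames contain only the first three pop and hip-hop rows shown earlier.

```python
import pandas as pd


def min_max_by_genre(frames: dict, features: list) -> pd.DataFrame:
    """Return the min and max of each feature, with one row per entry in `frames`."""
    combined = pd.concat(
        [df.assign(source=name) for name, df in frames.items()],
        ignore_index=True,
    )
    return combined.groupby("source")[features].agg(["min", "max"])


# First three rows of the pop and hip-hop outputs shown earlier.
frames = {
    "pop": pd.DataFrame({
        "loudness": [-5.934, -4.521, -4.577],
        "tempo": [171.005, 123.941, 110.962],
    }),
    "hiphop": pd.DataFrame({
        "loudness": [-9.75, -6.671, -5.831],
        "tempo": [81.618, 145.062, 142.037],
    }),
}
summary = min_max_by_genre(frames, ["loudness", "tempo"])
print(summary)
```

The result has a two-level column index, so a single value is read with a tuple, for example `summary.loc["pop", ("tempo", "max")]`.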


In [24]:
print(pop_clean.shape)
print(hiphop_clean.shape)
print(jazz_clean.shape)
print(rock_clean.shape)
print(kpop_clean.shape)
print(instrumental_clean.shape)
print(asmr_clean.shape)

(200, 9)
(200, 9)
(200, 9)
(200, 9)
(200, 9)
(200, 9)
(200, 9)


### Export Clean DataFrames to New CSV Files

In [25]:
# Export the DataFrames to csv files
pop_clean.to_csv("spotify_data_pop_clean.csv", encoding="utf-8", index=False)
hiphop_clean.to_csv("spotify_data_hiphop_clean.csv", encoding="utf-8", index=False)
jazz_clean.to_csv("spotify_data_jazz_clean.csv", encoding="utf-8", index=False)
rock_clean.to_csv("spotify_data_rock_clean.csv", encoding="utf-8", index=False)
kpop_clean.to_csv("spotify_data_kpop_clean.csv", encoding="utf-8", index=False)
instrumental_clean.to_csv("spotify_data_instrumental_clean.csv", encoding="utf-8", index=False)
asmr_clean.to_csv("spotify_data_asmr_clean.csv", encoding="utf-8", index=False)
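
The seven export calls follow one naming pattern, so they can be written as a loop. This is a minimal sketch: `export_clean` is a hypothetical helper, the output directory is a temporary folder chosen for the demo, and the single-row frame stands in for the real cleaned data. The file-name pattern `spotify_data_<genre>_clean.csv` matches the one used above.

```python
from pathlib import Path
import tempfile

import pandas as pd


def export_clean(frames: dict, out_dir) -> list:
    """Write each frame to <out_dir>/spotify_data_<name>_clean.csv and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, df in frames.items():
        path = out_dir / f"spotify_data_{name}_clean.csv"
        df.to_csv(path, encoding="utf-8", index=False)
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for pop_clean: first pop row, two columns only.
    frames = {"pop": pd.DataFrame({"genre": ["pop"], "tempo": [171.005]})}
    paths = export_clean(frames, tmp)
    # Read the file back to confirm the export round-trips.
    round_trip = pd.read_csv(paths[0])

print(paths[0].name)
print(round_trip.shape)
```

Reading a file back after writing it is a cheap way to confirm that `index=False` kept the row index out of the CSV, which is what produces the extra `Unnamed: 0` column when it is left in.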