In [1]:
import pandas as pd

DISCO = '/kaggle/input/700-classic-disco-tracks-with-spotify-data/ClassicDisco.csv'

df = pd.read_csv(filepath_or_buffer=DISCO)
df.head()

Unnamed: 0,Track,Artist,Album,Year,Duration,Time_Signature,Danceability,Energy,Key,Loudness,Mode,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Popularity
0,Boogie Oogie Oogie,A Taste Of Honey,A Taste Of Honey,1978,245173,4,0.797,0.548,2,-9.228,0,0.0492,0.0401,1e-05,0.0994,0.868,123.686,53
1,Disco Dancin' - 2004 - Remastered,A Taste Of Honey,A Taste Of Honey,1978,210493,4,0.751,0.57,2,-11.645,1,0.0813,0.371,2e-05,0.123,0.77,103.408,11
2,Distant,A Taste Of Honey,A Taste Of Honey,1978,260053,4,0.468,0.512,2,-10.187,1,0.138,0.782,2e-05,0.202,0.714,163.4,13
3,Do It Good,A Taste Of Honey,Another Taste (Expanded Edition),1978,342826,4,0.699,0.74,3,-8.625,0,0.0289,0.0714,2e-06,0.165,0.889,104.168,17
4,If We Loved,A Taste Of Honey,A Taste Of Honey,1978,266200,4,0.463,0.459,6,-11.6,1,0.0296,0.764,3e-06,0.158,0.3,129.267,13


In [2]:
len(df)

724

This is not a very big dataset. How many distinct artists are represented?

In [3]:
df['Artist'].nunique()

101

That means the average artist contributed about 7.17 tracks (724 / 101). But what about the median?

In [4]:
df['Artist'].value_counts().median()

8.0

This is something of a surprise: the disco era wasn't very long and it produced a lot of one-hit wonders, so we would expect the median to be (somewhat? much?) less than the mean. But this isn't an exhaustive catalogue of disco hits; it's a Spotify sample of disco tunes, so it is subject to other factors too.
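The mean/median comparison can be reproduced in one place. This is a minimal sketch on a toy `Artist` column (the labels `A`, `B`, `C` are placeholders, not rows from the dataset); it shows that `len(df) / nunique()` is the same number as the mean of `value_counts()`, and that the median can land above it when a few artists have very few tracks.

```python
import pandas as pd

# Hypothetical toy data: placeholder artist labels, not taken from ClassicDisco.csv.
toy = pd.DataFrame({'Artist': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C']})

# Number of tracks per artist: A -> 4, B -> 3, C -> 1.
tracks_per_artist = toy['Artist'].value_counts()

mean_tracks = len(toy) / toy['Artist'].nunique()  # equals tracks_per_artist.mean()
median_tracks = tracks_per_artist.median()

print(mean_tracks, median_tracks)
```

In the toy data the single one-track artist drags the mean below the median, which is the same direction as the 7.17 versus 8.0 result above.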

In [5]:
from plotly import express

express.histogram(data_frame=df['Artist'].value_counts().to_frame().reset_index(), x='Artist', y='count')

What years does this dataset consider the Disco Era?

In [6]:
express.histogram(data_frame=df['Year'].value_counts().to_frame().reset_index().sort_values(by='Year'), x='Year', y='count', nbins=60)

There are some questionable choices here, or maybe the Year column doesn't tell us exactly what we think it should. But this graph captures things we probably already knew:
* Disco is primarily a 1970s phenomenon
* It peaked in 1978-1979

Disco is almost exclusively a "four on the floor" genre; is that true of our dataset?

In [7]:
df['Time_Signature'].value_counts(normalize=True).to_dict()

{4: 0.9682320441988951, 3: 0.03038674033149171, 5: 0.0013812154696132596}

Yes. About 97% of our tracks have a time signature of 4.
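To see those proportions as track counts, we can multiply them back by the size of the dataset. A small sketch, with the proportions copied from the `In [7]` output and 724 taken from `len(df)` in `In [2]`:

```python
# Proportions copied from the In [7] output; 724 is len(df) from In [2].
proportions = {4: 0.9682320441988951, 3: 0.03038674033149171, 5: 0.0013812154696132596}
total_tracks = 724

# Round to the nearest whole track to undo floating-point noise.
counts = {signature: round(share * total_tracks) for signature, share in proportions.items()}

print(counts)
```

The same numbers come straight out of `df['Time_Signature'].value_counts()` without `normalize=True`.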

In [8]:
df[df['Time_Signature'] == 5]

Unnamed: 0,Track,Artist,Album,Year,Duration,Time_Signature,Danceability,Energy,Key,Loudness,Mode,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Popularity
250,I Like Girls,Fatback Band,Hustle! The Ultimate Fatback,1983,227946,5,0.716,0.928,5,-7.757,0,0.0923,0.0308,0.115,0.168,0.869,104.129,22


How about other musical concepts we don't need to explain, like key and mode?

In [9]:
express.histogram(data_frame=df, x='Key')

We only have 12 values for Key, so our dataset must be lumping major and minor keys together; a few web searches suggest that 0 = C, so our most popular key is C/Cm, and our least popular is D sharp/E flat.
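If the 0 = C guess is right, the rest of the numbering follows in semitone steps. A minimal sketch of that lookup, assuming the Key column uses pitch-class numbering; `PITCH_CLASSES` and `key_name` are names made up for this example, not part of the dataset or of any library.

```python
# Assumption: Key follows pitch-class numbering, where 0 = C and each step is a semitone.
PITCH_CLASSES = ['C', 'C#/Db', 'D', 'D#/Eb', 'E', 'F',
                 'F#/Gb', 'G', 'G#/Ab', 'A', 'A#/Bb', 'B']


def key_name(key: int) -> str:
    """Return the note name for a pitch-class integer in the range 0-11."""
    if not 0 <= key <= 11:
        # Guard against negative values, which Python would otherwise index from the end.
        raise ValueError(f'key must be between 0 and 11, got {key}')
    return PITCH_CLASSES[key]


print(key_name(0), key_name(3))
```

Under that assumption, `df['Key'].map(key_name)` would give a readable column for plotting.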

In [10]:
df['Mode'].value_counts().to_dict()

{1: 416, 0: 308}

We only have two values for Mode: Ionian/major and Aeolian/minor, maybe?

In [11]:
express.histogram(data_frame=df, x='Key', color='Mode')

It looks like 0 is minor and 1 is major. Would you have guessed that songs in B minor would outnumber songs in D major in this dataset?
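Comparisons like B minor versus D major are easier to read off a table of labelled counts than off a stacked histogram. This is a sketch only: it assumes 0 = C for Key and 0 = minor, 1 = major for Mode, and it uses just the Key and Mode values from the five `df.head()` rows, so the counts say nothing about the full dataset. `PITCH_CLASSES`, `MODE_NAMES` and `KeyName` are names invented for the example.

```python
import pandas as pd

# Assumptions: Key uses pitch-class numbering (0 = C); Mode is 0 = minor, 1 = major.
PITCH_CLASSES = ['C', 'C#/Db', 'D', 'D#/Eb', 'E', 'F',
                 'F#/Gb', 'G', 'G#/Ab', 'A', 'A#/Bb', 'B']
MODE_NAMES = {0: 'minor', 1: 'major'}

# Key and Mode values copied from the five rows shown by df.head() in In [1].
sample = pd.DataFrame({'Key': [2, 2, 2, 3, 6], 'Mode': [0, 1, 1, 0, 1]})

# Build a readable label such as "D major" from the two numeric columns.
sample['KeyName'] = (
    sample['Key'].map(lambda k: PITCH_CLASSES[k]) + ' ' + sample['Mode'].map(MODE_NAMES)
)

label_counts = sample['KeyName'].value_counts()
print(label_counts)
```

Run against the full `df`, `label_counts['B minor']` and `label_counts['D major']` would answer the question directly.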

In [12]:
express.histogram(data_frame=df, x='Danceability')

What do you suppose are the most and least danceable tracks in our dataset?

In [13]:
df[df['Danceability'] == df['Danceability'].min()]

Unnamed: 0,Track,Artist,Album,Year,Duration,Time_Signature,Danceability,Energy,Key,Loudness,Mode,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Popularity
409,She's Out of My Life - Single Version,Michael Jackson,Off the Wall,1979,218266,4,0.293,0.0348,4,-23.92,1,0.0414,0.962,0.000212,0.105,0.101,130.959,56


Yeah, She's Out of My Life is a very sad little torch song and essentially undanceable.

In [14]:
df[df['Danceability'] == df['Danceability'].max()]

Unnamed: 0,Track,Artist,Album,Year,Duration,Time_Signature,Danceability,Energy,Key,Loudness,Mode,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Popularity
587,State of Shock,The Jacksons,Victory,1984,271000,4,0.98,0.846,8,-7.696,0,0.139,0.283,0.149,0.052,0.891,121.562,42


That's a surprise. There are lots of danceable songs in this dataset, and some of them even have dances named after them; State of Shock does not immediately spring to mind as the most danceable.

In [15]:
df[df['Danceability'] == df['Danceability'].median()]

Unnamed: 0,Track,Artist,Album,Year,Duration,Time_Signature,Danceability,Energy,Key,Loudness,Mode,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Popularity
18,Voulez-Vous,ABBA,Voulez-Vous,1979,309173,4,0.708,0.774,11,-6.931,0,0.0372,0.149,0.00109,0.169,0.903,126.636,72
283,Love On Top Of Love,Grace Jones,20 #1's: 80's Club Classics,1980,368146,4,0.708,0.951,10,-5.638,0,0.0729,0.00718,0.00113,0.193,0.712,105.157,9
410,Wink,Neal McCoy,No Doubt About It,1994,162466,4,0.708,0.852,4,-6.177,1,0.0302,0.0761,0.0,0.172,0.93,117.644,58
706,Turn the Beat Around,Vicki Sue Robinson,Never Gonna Let You Go (Expanded Edition),1976,333826,4,0.708,0.94,7,-4.03,0,0.0593,0.162,6e-06,0.305,0.869,131.072,43


And that's what median danceability looks like.
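Filtering on exact equality with the median works here because the median happens to be a value that appears in the column. With an even number of rows the median is the average of the two middle values, so the filter can come back empty. A hedged sketch of a more forgiving lookup, using four of the `df.head()` rows; the `1e-9` tolerance is an arbitrary choice for this example.

```python
import pandas as pd

# Track and Danceability copied from the first four rows shown by df.head() in In [1].
sample = pd.DataFrame({
    'Track': ['Boogie Oogie Oogie', "Disco Dancin' - 2004 - Remastered",
              'Distant', 'Do It Good'],
    'Danceability': [0.797, 0.751, 0.468, 0.699],
})

# Even row count: the median is the average of the two middle values (0.699 and 0.751).
median = sample['Danceability'].median()

# Exact equality finds nothing, because no track has that averaged value.
exact = sample[sample['Danceability'] == median]

# Instead, keep the rows whose distance from the median is (within tolerance) smallest.
distance = (sample['Danceability'] - median).abs()
closest = sample[distance <= distance.min() + 1e-9]

print(len(exact), closest['Track'].tolist())
```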

In [16]:
express.histogram(data_frame=df, x='Energy')

We already learned there are some torch songs in our dataset; maybe some ballads too?

In [17]:
df[df['Energy'] < 0.2].sort_values(by='Energy')

Unnamed: 0,Track,Artist,Album,Year,Duration,Time_Signature,Danceability,Energy,Key,Loudness,Mode,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Popularity
409,She's Out of My Life - Single Version,Michael Jackson,Off the Wall,1979,218266,4,0.293,0.0348,4,-23.92,1,0.0414,0.962,0.000212,0.105,0.101,130.959,56
150,Three Times A Lady,Commodores,All The Great Love Songs,1978,217426,3,0.39,0.0803,8,-20.518,1,0.0306,0.874,0.000639,0.127,0.184,150.503,53
450,I'm Going Down,Rose Royce,Car Wash,1976,216773,3,0.623,0.103,0,-15.008,1,0.0393,0.712,0.000177,0.109,0.319,92.257,44
299,Always and Forever,Heatwave,Too Hot to Handle (Expanded Edition),1976,377106,3,0.333,0.156,2,-17.574,1,0.0335,0.286,2.3e-05,0.151,0.195,71.441,52
449,I Wanna Get Next To You,Rose Royce,Car Wash,1976,238200,4,0.62,0.179,0,-19.981,1,0.0314,0.835,0.03,0.0928,0.675,130.864,52
147,Oh No,Commodores,In The Pocket,1981,184306,4,0.584,0.186,8,-17.158,1,0.0258,0.772,6e-06,0.273,0.336,85.731,49
320,Medley: I Just Want To Make Love To You/Rock M...,Isaac Hayes,Presenting Isaac Hayes,1975,541226,4,0.515,0.195,3,-18.009,1,0.0869,0.86,8.2e-05,0.255,0.479,138.339,7


What does the other end of our distribution look like?

In [18]:
df[df['Energy'] > 0.95].sort_values(ascending=False, by='Energy')

Unnamed: 0,Track,Artist,Album,Year,Duration,Time_Signature,Danceability,Energy,Key,Loudness,Mode,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Popularity
578,"Take Me to the Next Phase, Pts. 1 & 2",The Isley Brothers,Showdown,1975,310760,4,0.781,0.985,5,-5.199,0,0.107,0.315,0.0,0.337,0.622,105.029,19
155,Far Out,Crown Heights Affair,Do It Your Way,1979,381560,4,0.535,0.976,6,-6.415,1,0.0653,0.0166,0.346,0.0342,0.825,123.653,17
367,Crash Goes Love (Blaster Mix),Loleatta Holloway,Crash Goes Love,1980,441765,4,0.665,0.976,2,-6.016,1,0.0462,0.00222,0.00552,0.0819,0.796,117.525,13
723,"Tahiti, Tahiti…",Voyage,Fly Away,1979,299510,4,0.672,0.972,2,-4.929,1,0.139,0.00425,9e-06,0.101,0.815,127.591,24
145,Machine Gun,Commodores,Machine Gun,1974,159466,4,0.741,0.97,5,-6.364,1,0.0341,0.00882,0.836,0.295,0.778,113.103,48
252,Spanish Hustle,Fatback Band,Hustle! The Ultimate Fatback,1975,205213,4,0.426,0.97,10,-6.928,1,0.103,0.00357,0.207,0.0817,0.65,123.723,23
722,Souvenirs,Voyage,Fly Away,1979,390567,4,0.633,0.97,5,-4.153,1,0.038,0.000667,0.202,0.396,0.757,127.776,29
167,Move On Up,Curtis Mayfield,People Never Give Up,1970,165789,4,0.538,0.969,11,-5.626,0,0.0907,0.419,0.00144,0.12,0.928,138.654,54
703,Hold Tight,Vicki Sue Robinson,Half and Half (Expanded Edition),1976,326626,4,0.691,0.961,0,-3.914,1,0.0558,0.141,0.000147,0.598,0.833,123.459,6
283,Love On Top Of Love,Grace Jones,20 #1's: 80's Club Classics,1980,368146,4,0.708,0.951,10,-5.638,0,0.0729,0.00718,0.00113,0.193,0.712,105.157,9


In [19]:
express.histogram(data_frame=df, x='Tempo')

Disco has a reputation for being a 120 bpm genre, so this distribution is not entirely surprising; we might expect even more tracks to be clustered around the mean than we see.
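One way to put a number on "clustered" is the share of tracks whose tempo falls inside a window around 120 bpm. A minimal sketch using only the five Tempo values from `df.head()`; the 10 bpm half-width is a hypothetical choice made for illustration, and the result says nothing about the full dataset.

```python
import pandas as pd

# Tempo values copied from the five rows shown by df.head() in In [1].
tempo = pd.Series([123.686, 103.408, 163.4, 104.168, 129.267], name='Tempo')

TARGET_BPM = 120  # the reputed disco tempo
WINDOW_BPM = 10   # hypothetical half-width, chosen only for illustration

# between() is inclusive at both ends by default; the mean of booleans is a share.
in_window = tempo.between(TARGET_BPM - WINDOW_BPM, TARGET_BPM + WINDOW_BPM)
share_in_window = in_window.mean()

print(share_in_window)
```

Swapping `tempo` for `df['Tempo']` gives the figure for the whole dataset.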

In [20]:
df['Tempo'].value_counts(normalize=True, bins=10)

(119.329, 133.386]             0.370166
(105.273, 119.329]             0.320442
(91.216, 105.273]              0.147790
(133.386, 147.443]             0.059392
(77.16, 91.216]                0.037293
(175.556, 189.612]             0.020718
(62.961000000000006, 77.16]    0.013812
(147.443, 161.499]             0.012431
(161.499, 175.556]             0.011050
(189.612, 203.669]             0.006906
Name: proportion, dtype: float64

Ten bins is plenty; about 69% of our data is in the two bins closest to the mean.
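The share covered by those two bins can be added up directly from the output above. A small sketch, with the proportions copied from the `In [20]` output:

```python
# Proportions copied from the In [20] output for the two most populated tempo bins.
bin_shares = {
    '(119.329, 133.386]': 0.370166,
    '(105.273, 119.329]': 0.320442,
}

central_share = sum(bin_shares.values())

print(f'{central_share:.1%}')
```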

In [21]:
df[df['Tempo'] > 190].sort_values(by='Tempo')

Unnamed: 0,Track,Artist,Album,Year,Duration,Time_Signature,Danceability,Energy,Key,Loudness,Mode,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Popularity
611,Can't Get By Without You,The Real Thing,Can You Feel the Force - All Their Biggest Hits!,1976,213559,4,0.485,0.822,1,-5.147,0,0.0779,0.19,0.0,0.129,0.631,191.918,29
661,Sixty Minute Man,The Trammps,The Legendary Zing Album,1976,294533,4,0.468,0.783,6,-11.268,1,0.0629,0.258,0.333,0.0756,0.779,192.019,8
333,Hot Pants (She Got To Use What She Got To Get ...,James Brown,Star Time,1971,187306,4,0.669,0.675,8,-8.958,1,0.386,0.256,0.0,0.0762,0.787,196.088,29
21,It's All Up to You,Andrea True Connection,White Witch,1976,181200,4,0.622,0.696,8,-11.1,0,0.0471,0.326,4e-06,0.318,0.933,202.097,5
583,Enjoy Yourself,The Jacksons,The Jacksons,1976,204413,4,0.601,0.643,9,-8.912,0,0.224,0.171,0.0,0.536,0.822,203.669,44


Is disco loud?

In [22]:
express.histogram(data_frame=df, x='Loudness')

Yes, disco is loud. It predates the Loudness Wars, but by the standards of its time it was loud.

In [23]:
df[df['Loudness'] > -4].sort_values(by='Loudness')

Unnamed: 0,Track,Artist,Album,Year,Duration,Time_Signature,Danceability,Energy,Key,Loudness,Mode,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Popularity
703,Hold Tight,Vicki Sue Robinson,Half and Half (Expanded Edition),1976,326626,4,0.691,0.961,0,-3.914,1,0.0558,0.141,0.000147,0.598,0.833,123.459,6
132,I'll Be There (feat. Nile Rodgers) - Single Ve...,CHIC,I'll Be There (feat. Nile Rodgers),1980,217849,4,0.855,0.826,0,-3.882,1,0.0798,0.13,3e-06,0.0782,0.798,125.013,35
442,Love Gun,Rick James,Fire It Up,1981,346453,4,0.761,0.938,9,-3.856,1,0.0635,0.00652,0.1,0.1,0.748,121.714,32
529,Everything Is Cool,T-Connection,Classic Masters,1979,267573,4,0.732,0.9,0,-3.651,1,0.0873,0.34,0.00376,0.0799,0.791,102.818,9
114,Midnight Madness,Cerrone,Don Ray - Garden of Love,1979,343000,4,0.789,0.932,11,-3.384,0,0.0493,0.00402,0.00476,0.0719,0.886,127.675,8
