# Music Sales across Decades

- <a href="https://raw.githubusercontent.com/sandeepmj/datasets/main/1967_hits.csv">Most sold LPs</a> in 1967.
- <a href="https://raw.githubusercontent.com/sandeepmj/datasets/main/most-streamed-2023.csv">Most streamed albums</a> in 2023.

Which albums stood out (as in, were the biggest) in terms of sales?

In [1]:
## import library
import pandas as pd

In [2]:
## display floats with thousands separators and no decimal places
pd.options.display.float_format = '{:,.0f}'.format
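The format string set above is what pandas applies to every float it displays. The check below runs it on a few sample values, including the top z-score that appears later in the notebook. It shows why the z-score columns print as whole numbers and why small negative scores appear as `-0`:

```python
# The same formatter pandas now uses for floats: thousands separators, 0 decimals
fmt = '{:,.0f}'.format

print(fmt(32000000.0))        # large values get commas: 32,000,000
print(fmt(7.89740421195481))  # z-scores are rounded to the nearest integer: 8
print(fmt(-0.3))              # small negatives keep their sign: -0
```

This option only affects float columns. Integer columns such as `sales` in the first table are displayed without commas.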

In [3]:
## import the z-score function from SciPy
from scipy.stats import zscore
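`zscore` rescales each value as z = (x − mean) / standard deviation. The result is the number of standard deviations a value sits above or below its column's average. By default SciPy uses the population standard deviation (`ddof=0`). Here is a minimal stdlib-only sketch of the same calculation on made-up numbers; the helper name `zscores` is hypothetical:

```python
from statistics import mean, pstdev

def zscores(values):
    """Return (x - mean) / population std for each value (like scipy.stats.zscore, ddof=0)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation: divides by n, not n - 1
    return [(x - mu) / sigma for x in values]

sample = [1, 2, 3, 4, 5]  # hypothetical toy data
print([round(z, 3) for z in zscores(sample)])  # [-1.414, -0.707, 0.0, 0.707, 1.414]
```

pandas' own `Series.std()` defaults to `ddof=1` (sample standard deviation). As a result, a hand-written `(s - s.mean()) / s.std()` gives slightly smaller z-scores than `zscore(s)`.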

In [4]:
## read data for LPs
lp = pd.read_csv("https://raw.githubusercontent.com/sandeepmj/datasets/main/1967_hits.csv")
lp

|  | artist | album | sales | year |
|---|---|---|---|---|
| 0 | THE BEATLES | SGT. PEPPER'S LONELY HEARTS CLUB BAND | 32000000 | 1967 |
| 1 | THE DOORS | THE DOORS | 20000000 | 1967 |
| 2 | PATSY CLINE | GREATEST HITS | 10000000 | 1967 |
| 3 | THE BEATLES | MAGICAL MYSTERY TOUR | 7032199 | 1967 |
| 4 | BOB DYLAN | BOB DYLAN'S GREATEST HITS | 6525000 | 1967 |
| ... | ... | ... | ... | ... |
| 98 | THE VENTURES | POPS IN JAPAN | 20740 | 1967 |
| 99 | FRANÇOISE HARDY | MA JEUNESSE FOUT LE CAMP | 16580 | 1967 |
| 100 | PAUL MAURIAT | WORLD TOP HITS | 5160 | 1967 |
| 101 | LEIKHÓPURINN DÝRIN Í HÁLSASKÓGI | DÝRIN Í HÁLSASKÓGI | 4543 | 1967 |


In [5]:
## add zscore column to lp df
lp["sales_zscore"] = zscore(lp["sales"])
lp

|  | artist | album | sales | year | sales_zscore |
|---|---|---|---|---|---|
| 0 | THE BEATLES | SGT. PEPPER'S LONELY HEARTS CLUB BAND | 32000000 | 1967 | 8 |
| 1 | THE DOORS | THE DOORS | 20000000 | 1967 | 5 |
| 2 | PATSY CLINE | GREATEST HITS | 10000000 | 1967 | 2 |
| 3 | THE BEATLES | MAGICAL MYSTERY TOUR | 7032199 | 1967 | 1 |
| 4 | BOB DYLAN | BOB DYLAN'S GREATEST HITS | 6525000 | 1967 | 1 |
| ... | ... | ... | ... | ... | ... |
| 98 | THE VENTURES | POPS IN JAPAN | 20740 | 1967 | -0 |
| 99 | FRANÇOISE HARDY | MA JEUNESSE FOUT LE CAMP | 16580 | 1967 | -0 |
| 100 | PAUL MAURIAT | WORLD TOP HITS | 5160 | 1967 | -0 |
| 101 | LEIKHÓPURINN DÝRIN Í HÁLSASKÓGI | DÝRIN Í HÁLSASKÓGI | 4543 | 1967 | -0 |


In [6]:
## read data for the most-streamed albums of 2023
alb = pd.read_csv("https://raw.githubusercontent.com/sandeepmj/datasets/main/most-streamed-2023.csv")
alb

|  | album | streams_2023 |
|---|---|---|
| 0 | Manana Sera Bonito Karol G | 5130293275 |
| 1 | One Thing At A Time Morgan Wallen | 4380347931 |
| 2 | Genesis Peso Pluma | 3936960850 |
| 3 | Drive Tiesto | 3876226367 |
| 4 | Meduza Meduza | 3383108898 |
| ... | ... | ... |
| 95 | Cracker Island Gorillaz | 662551227 |
| 96 | Senaryo Adie | 661023457 |
| 97 | Dark Side Justine Skye | 657963327 |
| 98 | Trustfall P!nk | 654976412 |


In [7]:
## add zscore column to alb df
alb["streams_zscore"] = zscore(alb["streams_2023"])
alb

|  | album | streams_2023 | streams_zscore |
|---|---|---|---|
| 0 | Manana Sera Bonito Karol G | 5130293275 | 4 |
| 1 | One Thing At A Time Morgan Wallen | 4380347931 | 3 |
| 2 | Genesis Peso Pluma | 3936960850 | 3 |
| 3 | Drive Tiesto | 3876226367 | 3 |
| 4 | Meduza Meduza | 3383108898 | 2 |
| ... | ... | ... | ... |
| 95 | Cracker Island Gorillaz | 662551227 | -1 |
| 96 | Senaryo Adie | 661023457 | -1 |
| 97 | Dark Side Justine Skye | 657963327 | -1 |
| 98 | Trustfall P!nk | 654976412 | -1 |
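Z-scores are unitless. That is what lets 1967 LP sales (in the millions) and 2023 streams (in the billions) sit on one scale. A small sketch with hypothetical numbers shows that multiplying a list by 1,000 leaves its z-scores unchanged; `zscores` is a hypothetical helper mirroring `scipy.stats.zscore`'s default:

```python
from statistics import mean, pstdev

def zscores(values):
    # population z-scores, matching scipy.stats.zscore's default (ddof=0)
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]

sales = [4, 1, 1, 2]                 # hypothetical, e.g. millions of copies
streams = [x * 1000 for x in sales]  # same shape, a thousand times larger

same = all(abs(a - b) < 1e-9 for a, b in zip(zscores(sales), zscores(streams)))
print(same)  # True
```

The flip side is that a z-score only says how unusual a value is within its own list. It does not say that a 1967 LP sold more copies than a 2023 album was streamed.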


In [8]:
## combine lp and alb into one df in order to compare zscores
df1 = pd.concat([lp, alb], ignore_index=True)
df1

|  | artist | album | sales | year | sales_zscore | streams_2023 | streams_zscore |
|---|---|---|---|---|---|---|---|
| 0 | THE BEATLES | SGT. PEPPER'S LONELY HEARTS CLUB BAND | 32000000 | 1967 | 8 |  |  |
| 1 | THE DOORS | THE DOORS | 20000000 | 1967 | 5 |  |  |
| 2 | PATSY CLINE | GREATEST HITS | 10000000 | 1967 | 2 |  |  |
| 3 | THE BEATLES | MAGICAL MYSTERY TOUR | 7032199 | 1967 | 1 |  |  |
| 4 | BOB DYLAN | BOB DYLAN'S GREATEST HITS | 6525000 | 1967 | 1 |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 198 |  | Cracker Island Gorillaz |  |  |  | 662551227 | -1 |
| 199 |  | Senaryo Adie |  |  |  | 661023457 | -1 |
| 200 |  | Dark Side Justine Skye |  |  |  | 657963327 | -1 |
| 201 |  | Trustfall P!nk |  |  |  | 654976412 | -1 |
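`pd.concat` stacks rows and aligns columns by name. A column that exists in only one input is filled with `NaN` for rows from the other input. That is why the 1967 rows have empty `streams_2023` and `streams_zscore`, and the 2023 rows have empty `artist`, `sales`, `year` and `sales_zscore`. `ignore_index=True` renumbers the rows, so the 2023 albums continue the numbering after the 1967 LPs. A toy version with made-up values:

```python
import pandas as pd

# hypothetical miniature versions of lp and alb
a = pd.DataFrame({"album": ["X", "Y"], "sales": [10, 5]})
b = pd.DataFrame({"album": ["Z"], "streams_2023": [900]})

combined = pd.concat([a, b], ignore_index=True)
print(combined)                   # sales is NaN for Z; streams_2023 is NaN for X and Y
print(combined.columns.tolist())  # ['album', 'sales', 'streams_2023']
```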


In [9]:
## stack both z-score columns into one, dropping the NaNs (each row has only one z-score)
df2 = pd.concat([df1['sales_zscore'], df1['streams_zscore']]).dropna().to_frame(name='combined')
df2

|  | combined |
|---|---|
| 0 | 8 |
| 1 | 5 |
| 2 | 2 |
| 3 | 1 |
| 4 | 1 |
| ... | ... |
| 198 | -1 |
| 199 | -1 |
| 200 | -1 |
| 201 | -1 |


In [10]:
## now join the combined column back onto df1 (rows align on the index)
df3 = pd.concat([df1, df2], axis=1)
df3

|  | artist | album | sales | year | sales_zscore | streams_2023 | streams_zscore | combined |
|---|---|---|---|---|---|---|---|---|
| 0 | THE BEATLES | SGT. PEPPER'S LONELY HEARTS CLUB BAND | 32000000 | 1967 | 8 |  |  | 8 |
| 1 | THE DOORS | THE DOORS | 20000000 | 1967 | 5 |  |  | 5 |
| 2 | PATSY CLINE | GREATEST HITS | 10000000 | 1967 | 2 |  |  | 2 |
| 3 | THE BEATLES | MAGICAL MYSTERY TOUR | 7032199 | 1967 | 1 |  |  | 1 |
| 4 | BOB DYLAN | BOB DYLAN'S GREATEST HITS | 6525000 | 1967 | 1 |  |  | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 198 |  | Cracker Island Gorillaz |  |  |  | 662551227 | -1 | -1 |
| 199 |  | Senaryo Adie |  |  |  | 661023457 | -1 | -1 |
| 200 |  | Dark Side Justine Skye |  |  |  | 657963327 | -1 | -1 |
| 201 |  | Trustfall P!nk |  |  |  | 654976412 | -1 | -1 |
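Every row of `df1` has exactly one non-null z-score. So the stack-then-rejoin steps in cells 9 and 10 could also be done in one line with `Series.fillna`. It takes the sales z-score where one exists and the streams z-score otherwise. Below is a sketch on toy data with invented values and a hypothetical frame `toy`. On the real notebook the equivalent would be `df1["combined"] = df1["sales_zscore"].fillna(df1["streams_zscore"])`:

```python
import numpy as np
import pandas as pd

# hypothetical stand-in for df1: each row has exactly one z-score filled in
toy = pd.DataFrame({
    "sales_zscore":   [8.0, 5.0, np.nan, np.nan],
    "streams_zscore": [np.nan, np.nan, 4.0, -1.0],
})

# sanity check: exactly one of the two columns is non-null in every row
assert (toy[["sales_zscore", "streams_zscore"]].notna().sum(axis=1) == 1).all()

# take the sales z-score where present, otherwise fall back to the streams z-score
toy["combined"] = toy["sales_zscore"].fillna(toy["streams_zscore"])
print(toy["combined"].tolist())  # [8.0, 5.0, 4.0, -1.0]
```

Filling by index avoids building and re-aligning an intermediate frame like `df2`.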


In [11]:
## sort by combined z-score, highest first, and show the top 25
df3.sort_values(by=["combined"], ascending=False).head(25)

|  | artist | album | sales | year | sales_zscore | streams_2023 | streams_zscore | combined |
|---|---|---|---|---|---|---|---|---|
| 0 | THE BEATLES | SGT. PEPPER'S LONELY HEARTS CLUB BAND | 32000000.0 | 1967.0 | 8.0 |  |  | 8 |
| 1 | THE DOORS | THE DOORS | 20000000.0 | 1967.0 | 5.0 |  |  | 5 |
| 103 |  | Manana Sera Bonito Karol G |  |  |  | 5130293275.0 | 4.0 | 4 |
| 104 |  | One Thing At A Time Morgan Wallen |  |  |  | 4380347931.0 | 3.0 | 3 |
| 105 |  | Genesis Peso Pluma |  |  |  | 3936960850.0 | 3.0 | 3 |
| 106 |  | Drive Tiesto |  |  |  | 3876226367.0 | 3.0 | 3 |
| 2 | PATSY CLINE | GREATEST HITS | 10000000.0 | 1967.0 | 2.0 |  |  | 2 |
| 107 |  | Meduza Meduza |  |  |  | 3383108898.0 | 2.0 | 2 |
| 108 |  | Nadie Sabe Lo Que Va A Pasar Manana Bad Bunny |  |  |  | 3341708923.0 | 2.0 | 2 |
| 109 |  | 1989 (Taylor's Version) Taylor Swift |  |  |  | 3264335874.0 | 2.0 | 2 |
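The question at the top asks which albums stood out, not just which one did. One common rule of thumb treats a z-score above 3 as unusual; that cutoff is an assumption here, not something this notebook uses. The sketch below applies it to a toy table with invented titles and scores. It also shows `DataFrame.nlargest` as a shorter way to get the same top-n rows as `sort_values(...).head(n)`:

```python
import pandas as pd

# hypothetical rows shaped like df3 (album titles and scores are invented)
toy = pd.DataFrame({
    "album": ["A", "B", "C", "D", "E"],
    "combined": [7.9, 4.6, 2.1, 3.4, -0.5],
})

THRESHOLD = 3  # assumed cutoff for "stood out"

standouts = toy[toy["combined"] > THRESHOLD].sort_values("combined", ascending=False)
print(standouts["album"].tolist())  # ['A', 'B', 'D']

# top-n rows by combined z-score, without an explicit sort
print(toy.nlargest(2, "combined")["album"].tolist())  # ['A', 'B']
```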


In [12]:
## find the highest combined z-score
top_score = df3["combined"].max()
top_score

7.89740421195481

In [13]:
## then filter df3 down to the row(s) with that top score
df4 = df3[df3["combined"] == top_score]
df4

|  | artist | album | sales | year | sales_zscore | streams_2023 | streams_zscore | combined |
|---|---|---|---|---|---|---|---|---|
| 0 | THE BEATLES | SGT. PEPPER'S LONELY HEARTS CLUB BAND | 32000000 | 1967 | 8 |  |  | 8 |


In [14]:
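## then isolate the top album's title
## (.max() on a text column returns the alphabetically last title, so take the first row instead)
top_album = df4["album"].iloc[0]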
top_album

"SGT. PEPPER'S LONELY HEARTS CLUB BAND"
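Cells 12 through 14 could be collapsed into one lookup with `Series.idxmax`, which returns the index label of the largest value. Below is a sketch on a hypothetical toy table with invented titles. On the real data the equivalent would be `df3.loc[df3["combined"].idxmax(), "album"]`:

```python
import pandas as pd

# hypothetical miniature of df3; labels chosen to show idxmax returns a label, not a position
toy = pd.DataFrame(
    {"album": ["A", "B", "C"], "combined": [2.5, 7.9, -1.0]},
    index=[0, 103, 200],
)

best_label = toy["combined"].idxmax()  # label of the row with the highest z-score
top_album = toy.loc[best_label, "album"]
print(best_label, top_album)  # 103 B
```

If several rows tie for the maximum, `idxmax` returns the first one. Filtering with `== top_score` as above keeps them all.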

In [15]:
## then print the result as a sentence
print(f"The album that stood out most, measured by z-score within its own list (1967 LP sales or 2023 streams), is {top_album}.")

The album that stood out most, measured by z-score within its own list (1967 LP sales or 2023 streams), is SGT. PEPPER'S LONELY HEARTS CLUB BAND.
