## Coachella 2024 Lineup, Exploratory Data Analysis
This dataset was created using information from Spotify's Web API. The code I used to create the dataset can be found [here](https://github.com/joylee13/Data-Projects/blob/main/main.py)!

In [407]:
# import libraries
import pandas as pd
import numpy as np
import plotly.express as px

In [425]:
# read csv
df = pd.read_csv("lineup.csv")
df.head()

Unnamed: 0,name,id,genres,popularity,followers,external_urls,href,images,type,uri
0,Adam Ten,05tmGPn4fFdVpnsMt0YW5S,['israeli techno'],43,"{'href': None, 'total': 16501}",{'spotify': 'https://open.spotify.com/artist/0...,https://api.spotify.com/v1/artists/05tmGPn4fFd...,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",artist,spotify:artist:05tmGPn4fFdVpnsMt0YW5S
1,The Adicts,5f5B8Zd0xuvmcDin3oK76F,"['hardcore punk', 'oi', 'psychobilly', 'punk',...",39,"{'href': None, 'total': 167548}",{'spotify': 'https://open.spotify.com/artist/5...,https://api.spotify.com/v1/artists/5f5B8Zd0xuv...,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",artist,spotify:artist:5f5B8Zd0xuvmcDin3oK76F
2,Adriatique,02DWGcShQivFepRvGJ7xhB,"['deep disco house', 'melodic techno', 'swiss ...",57,"{'href': None, 'total': 246023}",{'spotify': 'https://open.spotify.com/artist/0...,https://api.spotify.com/v1/artists/02DWGcShQiv...,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",artist,spotify:artist:02DWGcShQivFepRvGJ7xhB
3,Marcel Dettmann,1sxHp39RqBEE01pgVqsdyP,"['microhouse', 'minimal techno', 'techno']",31,"{'href': None, 'total': 73747}",{'spotify': 'https://open.spotify.com/artist/1...,https://api.spotify.com/v1/artists/1sxHp39RqBE...,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",artist,spotify:artist:1sxHp39RqBEE01pgVqsdyP
4,ANOTR,4p5WgeiPSPpqPDs7T6OkWf,['dutch tech house'],60,"{'href': None, 'total': 81679}",{'spotify': 'https://open.spotify.com/artist/4...,https://api.spotify.com/v1/artists/4p5WgeiPSPp...,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",artist,spotify:artist:4p5WgeiPSPpqPDs7T6OkWf


### Data Cleaning

In [426]:
df.shape  # dataframe has 146 rows and 10 columns

(146, 10)

In [427]:
df.duplicated().sum() # count duplicate rows

1

In [428]:
df[df.duplicated('name', keep=False) == True]

Unnamed: 0,name,id,genres,popularity,followers,external_urls,href,images,type,uri
49,Peggy Gou,2mLA48B366zkELXYx7hcDN,"['electra', 'float house']",70,"{'href': None, 'total': 855639}",{'spotify': 'https://open.spotify.com/artist/2...,https://api.spotify.com/v1/artists/2mLA48B366z...,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",artist,spotify:artist:2mLA48B366zkELXYx7hcDN
109,Peggy Gou,2mLA48B366zkELXYx7hcDN,"['electra', 'float house']",70,"{'href': None, 'total': 855639}",{'spotify': 'https://open.spotify.com/artist/2...,https://api.spotify.com/v1/artists/2mLA48B366z...,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",artist,spotify:artist:2mLA48B366zkELXYx7hcDN


In [429]:
df.drop_duplicates(inplace=True) # drop duplicates
df = df.reset_index(drop=True)

In [430]:
df.shape # confirm dropping duplicate rows worked

(145, 10)

In [431]:
df.info() # summary information

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145 entries, 0 to 144
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   name           145 non-null    object
 1   id             145 non-null    object
 2   genres         145 non-null    object
 3   popularity     145 non-null    int64 
 4   followers      145 non-null    object
 5   external_urls  145 non-null    object
 6   href           145 non-null    object
 7   images         145 non-null    object
 8   type           145 non-null    object
 9   uri            145 non-null    object
dtypes: int64(1), object(9)
memory usage: 11.5+ KB


In [432]:
df.columns[np.sum(df.isnull()) != 0] # check if any columns have missing values

Index([], dtype='object')

In [433]:
# drop unnecessary columns
cols = ['name', 'genres', 'popularity', 'followers']
df = df[cols]

In [434]:
df.head()

Unnamed: 0,name,genres,popularity,followers
0,Adam Ten,['israeli techno'],43,"{'href': None, 'total': 16501}"
1,The Adicts,"['hardcore punk', 'oi', 'psychobilly', 'punk',...",39,"{'href': None, 'total': 167548}"
2,Adriatique,"['deep disco house', 'melodic techno', 'swiss ...",57,"{'href': None, 'total': 246023}"
3,Marcel Dettmann,"['microhouse', 'minimal techno', 'techno']",31,"{'href': None, 'total': 73747}"
4,ANOTR,['dutch tech house'],60,"{'href': None, 'total': 81679}"


In [None]:
# format followers column
df['followers'] = df['followers'].str.replace('[^0-9]','')
df['followers'] = df['followers'].astype(int)

In [436]:
df.head()

Unnamed: 0,name,genres,popularity,followers
0,Adam Ten,['israeli techno'],43,16501
1,The Adicts,"['hardcore punk', 'oi', 'psychobilly', 'punk',...",39,167548
2,Adriatique,"['deep disco house', 'melodic techno', 'swiss ...",57,246023
3,Marcel Dettmann,"['microhouse', 'minimal techno', 'techno']",31,73747
4,ANOTR,['dutch tech house'],60,81679


In [437]:
# format genres column
df['genres'] = df['genres'].str.strip("[]")
df['genres'] = df['genres'].str.replace("'", "")
df['genres'] = df['genres'].str.split(", ")

In [438]:
df.head()

Unnamed: 0,name,genres,popularity,followers
0,Adam Ten,[israeli techno],43,16501
1,The Adicts,"[hardcore punk, oi, psychobilly, punk, uk82]",39,167548
2,Adriatique,"[deep disco house, melodic techno, swiss house]",57,246023
3,Marcel Dettmann,"[microhouse, minimal techno, techno]",31,73747
4,ANOTR,[dutch tech house],60,81679


### Learning About Genres on the Lineup

Spotify classifies each artist into one or more genres. An empty list ( [ ] ) implies that artist has not yet been classified.

In [439]:
# count unique genres and their frequency
genres = np.hstack(df.genres)
genre, counts = np.unique(genres, return_counts=True)
genre_count = np.asarray((genre, counts)).T

In [440]:
len(genres) # 339 genres (as classified by spotify) are represented on the lineup

339

In [441]:
# convert array to dict
genre_dict = {}
for i in range(len(genre_count)):
    genre_dict[genre_count[i][0]] = genre_count[i][1]

In [442]:
# find the 15 most represented genres
top15 = sorted(genre_dict.items(), key=lambda item: item[1], reverse=True)
top15 = top15[:15]
top15

[('house', '6'),
 ('art pop', '5'),
 ('indie pop', '5'),
 ('pop', '5'),
 ('tech house', '5'),
 ('edm', '4'),
 ('indie soul', '4'),
 ('indietronica', '4'),
 ('pop dance', '4'),
 ('pov: indie', '4'),
 ('rap', '4'),
 ('alternative r&b', '3'),
 ('crank wave', '3'),
 ('dance pop', '3'),
 ('electra', '3')]

In [443]:
# Visualize Genres
genres = []
counts = []

for i in range(len(top15)):
    genres.append(top15[i][0])
    counts.append(int(top15[i][1]))

fig = px.bar(top15, 
             x = genres, 
             y = counts, 
             color = counts,
             title = "Top 15 Genres on the Lineup",
             labels={
                     "x": "Genre",
                     "y": "Count"
                 })

fig.show()

### Artist Popularity

In [444]:
# Bottom 10 artists by popularity
df.sort_values(by = ["popularity"], ascending=True).head(10)

Unnamed: 0,name,genres,popularity,followers
79,Keyspan,[],7,177
108,Patrick Mason,[art song],20,1548
67,JJUUJJUU,[neo-psychedelic],23,7947
137,Upchuck,"[atlanta punk, black punk]",27,8072
134,Two Shell,[experimental club],29,24653
3,Marcel Dettmann,"[microhouse, minimal techno, techno]",31,73747
16,Ben Sterling,"[tech house, uk tech house]",31,11101
95,"Mandy, Indiana","[british experimental, crank wave]",33,15962
97,Mdou Moctar,"[desert blues, musique nigerienne, tuareg guitar]",36,113484
124,Son Rompe Pera,[marimba mexicana],36,29745


In [445]:
# Top 10 artists by popularity
df.sort_values(by = ["popularity"], ascending=False).head(10)

Unnamed: 0,name,genres,popularity,followers
109,Peso Pluma,"[corridos tumbados, sad sierreno]",92,11321815
83,Lana Del Rey,"[art pop, pop]",90,32070018
29,Carin Leon,"[corrido, musica mexicana, sad sierreno, sierr...",86,10017933
64,J Balvin,"[reggaeton, reggaeton colombiano, trap latino,...",85,37165272
88,Lil Uzi Vert,"[hip hop, melodic rap, philly rap, rage rap, rap]",85,16620343
136,"Tyler, The Creator","[hip hop, rap]",84,13987546
19,Bizarrap,"[argentine hip hop, pop venezolano, trap argen...",84,14189802
143,Young Miko,"[trap latino, urbano latino]",83,2256899
117,Sabrina Carpenter,[pop],81,7196827
15,Bebe Rexha,"[dance pop, pop]",80,8850459


In [446]:
# Bottom 10 artists by followers
df.sort_values(by = ["followers"], ascending=True).head(10)

Unnamed: 0,name,genres,popularity,followers
111,Rainer & Co,[color noise],37,16
79,Keyspan,[],7,177
108,Patrick Mason,[art song],20,1548
81,Kimonos,[],45,1593
67,JJUUJJUU,[neo-psychedelic],23,7947
137,Upchuck,"[atlanta punk, black punk]",27,8072
71,JOPLYN,[],36,8999
16,Ben Sterling,"[tech house, uk tech house]",31,11101
95,"Mandy, Indiana","[british experimental, crank wave]",33,15962
0,Adam Ten,[israeli techno],43,16501


In [447]:
# Top 10 artists by followers
df.sort_values(by = ["followers"], ascending=False).head(10)

Unnamed: 0,name,genres,popularity,followers
64,J Balvin,"[reggaeton, reggaeton colombiano, trap latino,...",85,37165272
83,Lana Del Rey,"[art pop, pop]",90,32070018
88,Lil Uzi Vert,"[hip hop, melodic rap, philly rap, rage rap, rap]",85,16620343
120,Sid Sriram,[filmi],75,14487541
19,Bizarrap,"[argentine hip hop, pop venezolano, trap argen...",84,14189802
136,"Tyler, The Creator","[hip hop, rap]",84,13987546
109,Peso Pluma,"[corridos tumbados, sad sierreno]",92,11321815
29,Carin Leon,"[corrido, musica mexicana, sad sierreno, sierr...",86,10017933
89,Lil Yachty,"[atl hip hop, melodic rap, rap, trap]",79,9628915
119,Santa Fe Klan,[mexican hip hop],75,8884089


In [448]:
df.describe()

Unnamed: 0,popularity,followers
count,145.0,145.0
mean,59.517241,2115245.0
std,15.433596,5017578.0
min,7.0,16.0
25%,49.0,99268.0
50%,60.0,371555.0
75%,72.0,1176699.0
max,92.0,37165270.0


This lineup has an **average popularity of 59.52%**.

#### Based on this article by [pudding.cool](https://pudding.cool/2023/10/genre/), the top 10 Spotify genres (ranked by streams) in 2023 were:

1. pop

2. rap

3. rock

4. urbano latino

5. hip hop

6. trap latino

7. reggaeton

8. filmi

9. dance pop

10. latin pop

***How many artists on the lineup fall into these genres?***

In [449]:
top_spotify_genres = ["pop", "rap", "rock", "urbano latino", "hip hop", 
                      "trap latino", "reggaeton", "filmi", "dance pop", "latin pop"]

In [None]:
# find artists on the lineup with genres in the top 10

pop_artists = pd.DataFrame()

for i in range(len(df)): # iterate through dataframe
    
    for j in range(len(df["genres"][i])): # iterate through genres list for each artist
        
        df["genres"][i] = tuple(df["genres"][i])
        
        if df["genres"][i][j] in top_spotify_genres:
            
            pop_artists = pop_artists.append(df.loc[i],ignore_index=True)

In [451]:
pop_artists.drop_duplicates(inplace=True)
pop_artists = pop_artists.reset_index(drop=True)
pop_artists

Unnamed: 0,name,genres,popularity,followers
0,Bebe Rexha,"(dance pop, pop)",80.0,8850459.0
1,Bizarrap,"(argentine hip hop, pop venezolano, trap argen...",84.0,14189802.0
2,Blur,"(alternative rock, britpop, madchester, perman...",69.0,3190926.0
3,Deftones,"(alternative metal, nu metal, rap metal, rock,...",78.0,4946809.0
4,DJ Snake,"(edm, electronic trap, pop, pop dance)",78.0,8624963.0
5,J Balvin,"(reggaeton, reggaeton colombiano, trap latino,...",85.0,37165272.0
6,Lana Del Rey,"(art pop, pop)",90.0,32070018.0
7,Lil Uzi Vert,"(hip hop, melodic rap, philly rap, rage rap, rap)",85.0,16620343.0
8,Lil Yachty,"(atl hip hop, melodic rap, rap, trap)",79.0,9628915.0
9,NAV,"(canadian hip hop, canadian trap, melodic rap,...",73.0,3432047.0


In [452]:
print(f"There are {len(pop_artists)} artists on the lineup who are classified into Spotify's top 10 genres. That is {np.round(len(pop_artists)/len(df)*100,2)}% of the lineup.")

There are 16 artists on the lineup who are classified into Spotify's top 10 genres. That is 11.03% of the lineup.


In [453]:
# Visualize Artists
fig = px.bar(pop_artists, 
             x = "name", 
             y = "followers", 
             title = "Popular Artists by Genre",
             color = "genres",
             width=1200, 
             height=600,
             labels={
                     "genre": "genres"
                 })

fig.show()

It is worth mentioning that genres like "pop" are huge and have several overlapping genres such as "nigerian pop", "indie pop", and "japanese teen pop." Classifying music and artists into specific genres is complicated, so although only ~11% of this lineup falls into Spotify's top 10 genres, **it is likely that this percentage is larger in reality.**