# Final project
## Proportional distribution of genre categories

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
data = pd.read_csv("spotify_songs.csv") #Load data

In [None]:
genre_counts = data['playlist_genre'].value_counts() # Count how many times each genre appears in the dataset

In [None]:
colors = sns.color_palette("Greens") #setting green palette

#Plot
plt.pie(genre_counts.values, colors=colors, autopct='%1.1f%%', wedgeprops={'edgecolor': 'black'}) # Autopct shows the percentages of each genre & Wedgeprops adds black edges for clarity

#Add legend with genres
plt.legend(labels=genre_counts.index, title="Genres", fontsize=10, bbox_to_anchor=(1.2, .7)) #bbox_to_anchor gives the position of the legend box

plt.title('Proportional Distribution of Spotify Genres (2020)', fontsize=14)
plt.show()

The chart shows that **EDM, Rap, and Pop** are the genres with the most tracks in the dataset, but no single genre strongly dominates. The gap between the largest share (EDM) and the smallest (Rock) is less than 4 percentage points. This genre diversity suggests that listeners in 2020 valued musical variety and were open to exploring different styles.
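The percentages shown on the pie chart can also be computed directly, which makes the gap between the largest and smallest genre easy to check. The snippet below is a minimal sketch: `genre_shares` is a helper name introduced here for illustration, and the `toy` DataFrame is made-up data, not the Spotify dataset. In the notebook, the loaded `data` DataFrame would be passed instead.

```python
import pandas as pd


def genre_shares(df: pd.DataFrame, column: str = "playlist_genre") -> pd.Series:
    """Return each genre's share of tracks in percent, largest first."""
    # normalize=True turns raw counts into fractions that sum to 1
    return df[column].value_counts(normalize=True).mul(100).round(1)


# Hypothetical toy data, used only to illustrate the calculation
toy = pd.DataFrame({"playlist_genre": ["edm"] * 5 + ["rap"] * 3 + ["rock"] * 2})

shares = genre_shares(toy)
# Spread between the largest and smallest genre, in percentage points
gap = shares.max() - shares.min()

print(shares)
print(f"Gap between largest and smallest share: {gap:.1f} percentage points")
```

Calling `genre_shares(data)` on the real dataset should reproduce the labels drawn by `autopct`.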

## Genre popularity

In [None]:
#Group by Genre and calculate the average track_popularity score
top_genres = (data.groupby("playlist_genre")["track_popularity"].mean().sort_values(ascending=False).reset_index()) 

In [None]:
# Rename column
top_genres = top_genres.rename(columns={"track_popularity": "average_popularity"}) 

In [None]:
#Plot
bar = sns.barplot(x='playlist_genre', y='average_popularity', data=top_genres, palette='Greens_d')
plt.title('Genres by Average Popularity', fontsize=14)
plt.xlabel('Genre', fontsize=12)
plt.ylabel('Average Popularity', fontsize=12)
bar.bar_label(bar.containers[0], fmt='%.2f') #bar_label shows the popularity score of each genre
plt.show()

The **Pop** genre has the highest average popularity score (47.74), closely followed by **Latin** (47.03). These genres may have resonated most with listeners in 2020. There is a noticeable gap between the top genres and the lowest-scoring genre (**EDM**): the difference of about 13 points suggests a clear distinction in audience preference.
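The gap between the top and bottom genre can be computed rather than read off the chart, and reporting the number of tracks behind each average gives some context for the comparison. This is a sketch under assumptions: `genre_popularity` and the `n_tracks` column are names introduced here, and `toy` is made-up data rather than the Spotify dataset.

```python
import pandas as pd


def genre_popularity(df: pd.DataFrame) -> pd.DataFrame:
    """Average popularity and number of tracks per genre, highest average first."""
    return (
        df.groupby("playlist_genre")["track_popularity"]
        .agg(average_popularity="mean", n_tracks="count")  # named aggregation
        .sort_values("average_popularity", ascending=False)
        .reset_index()
    )


# Hypothetical toy data, used only to illustrate the calculation
toy = pd.DataFrame({
    "playlist_genre": ["pop", "pop", "latin", "latin", "edm", "edm"],
    "track_popularity": [50, 46, 48, 44, 30, 40],
})

summary = genre_popularity(toy)
# Rows are sorted, so the first row is the top genre and the last row the bottom one
gap = summary["average_popularity"].iloc[0] - summary["average_popularity"].iloc[-1]

print(summary)
print(f"Gap between top and bottom genre: {gap:.2f} points")
```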

## Top 5 artists in terms of popularity score

In [None]:
#Group by Artist and calculate the average track_popularity score
top_artists = (data.groupby("track_artist")["track_popularity"].mean().sort_values(ascending=False).reset_index()) 

In [None]:
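top_artists = top_artists.rename(columns={"track_popularity": "average_popularity"}) # Rename column
top_artists = top_artists.head(5) # Keep only the top 5 artists (head() alone does not modify the DataFrame)
top_artists # Display the result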

In [None]:
#plot
sns.barplot(x='track_artist', y='average_popularity', data=top_artists, palette='Greens_d')
plt.title('Top Artists by Average Popularity', fontsize=14)
plt.xlabel('Artist', fontsize=12)
plt.ylabel('Average Popularity', fontsize=12)
plt.ylim(80, 100)  # Change y-axis limits for better visualization
plt.show()

These artists have the highest average popularity scores across their tracks. This indicates that they dominated listener preferences in 2020, maintaining a high level of popularity across multiple tracks in the dataset. **Trevor Daniel** leads with the highest average popularity score of 97, making him the most popular artist in the dataset. **Y2K** and **Don Toliver** follow closely, with average scores of about 91. Finally, **Roddy Ricch** and **DaBaby** complete the list, with average scores of about 88.
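An average computed over a single track is just that track's score, so it can be worth checking how many tracks stand behind each artist's average. The sketch below adds a track count and an optional minimum-track filter. `top_artists_by_popularity`, `n_tracks`, and `min_tracks` are hypothetical names introduced for illustration, and `toy` is made-up data, not the Spotify dataset.

```python
import pandas as pd


def top_artists_by_popularity(df: pd.DataFrame, n: int = 5, min_tracks: int = 1) -> pd.DataFrame:
    """Top-n artists by average popularity, keeping only artists with at least min_tracks tracks."""
    stats = (
        df.groupby("track_artist")["track_popularity"]
        .agg(average_popularity="mean", n_tracks="count")
        .reset_index()
    )
    stats = stats[stats["n_tracks"] >= min_tracks]  # drop artists with too few tracks
    return (
        stats.sort_values("average_popularity", ascending=False)
        .head(n)
        .reset_index(drop=True)
    )


# Hypothetical toy data, used only to illustrate the calculation
toy = pd.DataFrame({
    "track_artist": ["Artist A", "Artist B", "Artist B", "Artist C", "Artist C", "Artist C"],
    "track_popularity": [97, 90, 92, 88, 86, 90],
})

top_all = top_artists_by_popularity(toy, n=2)                 # no filter
top_multi = top_artists_by_popularity(toy, n=2, min_tracks=2)  # at least two tracks

print(top_all)
print(top_multi)
```

In the toy data, "Artist A" ranks first only because of a single high-scoring track and drops out once `min_tracks=2` is applied. Whether a similar effect occurs in the real ranking would need to be checked against the dataset.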