This notebook explores a dataset of the most popular songs of 2023 on Spotify. For each song it records track attributes, popularity metrics, and cross-platform presence, which makes it a useful resource for music lovers, data analysts, and anyone interested in contemporary music culture.

With this dataset, you can examine what chart-topping songs have in common, compare their musical attributes, and look for the trends that define the sound of 2023. Whether you are a music researcher, a data scientist, or simply a curious listener, there is plenty here to explore and analyze.

# 1. Importing Datasets

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
import pandas as pd

# Loading the dataset with 'ISO-8859-1' encoding
songs_data = pd.read_csv('/kaggle/input/top-spotify-songs-2023/spotify-2023.csv', encoding='ISO-8859-1')



# Display the first few rows of the dataset
songs_data.head()

In [None]:
basic_stats = songs_data.describe()
basic_stats

**Cleaning the Data**

In [None]:
# Check data types of all columns
songs_data.dtypes

In [None]:
# Convert 'streams' to numeric; entries that cannot be parsed become NaN
songs_data['streams'] = pd.to_numeric(songs_data['streams'], errors='coerce')

# Replace NaN with 0 and store as 64-bit integers (stream counts can exceed the 32-bit range)
songs_data['streams'] = songs_data['streams'].fillna(0).astype('int64')

# Check data types again
songs_data.dtypes
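Before the unparseable entries are overwritten with 0, it can be useful to know how many there were. The sketch below shows the same conversion on a small, made-up series (the values and the name `raw` are hypothetical, not taken from the dataset) and counts the entries that `errors='coerce'` turned into NaN.

```python
import pandas as pd

# Toy data (hypothetical): one entry is not a valid number
raw = pd.Series(["141381703", "not-a-number", "800840817"], name="streams")

# errors="coerce" turns unparseable entries into NaN instead of raising
parsed = pd.to_numeric(raw, errors="coerce")

# Count how many values could not be parsed before they are overwritten
n_invalid = int(parsed.isna().sum())

# Replace the NaN entries with 0 and store as 64-bit integers
cleaned = parsed.fillna(0).astype("int64")

print(n_invalid)
print(cleaned.tolist())
```

Note that filling with 0 keeps the row but pulls down any average computed over `streams`; dropping those rows instead is an equally reasonable choice.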

# 2. Let's Start the Analysis

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Top 10 artists with most songs in the dataset
top_artists = songs_data['artist(s)_name'].value_counts().head(10)

# Plot
plt.figure(figsize=(10, 5))
sns.barplot(x=top_artists.values, y=top_artists.index, palette='viridis')
plt.xlabel('Number of Songs')
plt.ylabel('Artist(s) Name')
plt.title('Top 10 Artists with Most Songs')
plt.show()

**These are the artists with the most songs in the dataset, ranked by number of songs.**

* Taylor Swift: 34 songs
* The Weeknd: 22 songs
* Bad Bunny: 19 songs
* SZA: 19 songs
* Harry Styles: 17 songs
* Kendrick Lamar: 12 songs
* Morgan Wallen: 11 songs
* Ed Sheeran: 9 songs
* BTS: 8 songs
* Feid: 8 songs
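These counts treat each value of `artist(s)_name` as a single label, so a collaboration counts as its own "artist" and is not credited to the individual performers. A minimal sketch of a per-artist count follows, assuming that collaborators are separated by commas in that column (an assumption that should be checked against the data). The toy rows and artist names are hypothetical.

```python
import pandas as pd

# Toy data (hypothetical) using the notebook's column name
toy = pd.DataFrame({
    "track_name": ["Song A", "Song B", "Song C"],
    "artist(s)_name": ["Artist X", "Artist X, Artist Y", "Artist Y"],
})

# Split each credit string on commas (assumed separator), one row per artist
artists = (
    toy["artist(s)_name"]
    .str.split(",")
    .explode()
    .str.strip()
)

# Count songs per individual artist
per_artist = artists.value_counts()
print(per_artist.to_dict())
```

An artist whose name itself contains a comma would be split incorrectly by this approach.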



In [None]:
# Top 10 songs with most streams on Spotify
top_streams_spotify = songs_data[['track_name', 'artist(s)_name', 'streams']].sort_values(by='streams', ascending=False).head(10)

top_streams_spotify["streams"] = top_streams_spotify["streams"].astype("int")

# Plot
plt.figure(figsize=(10, 5))
sns.barplot(x=top_streams_spotify['streams'], y=top_streams_spotify['track_name'], palette='viridis')
plt.xlabel('Streams (in billions)')
plt.ylabel('Track Name')
plt.title('Top 10 Songs with Most Streams on Spotify')
plt.xticks(rotation=45)
plt.show()

top_streams_spotify

**Findings: The most-streamed song on Spotify is Blinding Lights by The Weeknd.**

<img src="https://www.rollingstone.com/wp-content/uploads/2020/02/TheWeeknd.jpg" alt="drawing" width="400"/>



In [None]:
# Top 10 songs with highest presence in Apple Music playlists
top_apple_playlists = songs_data[['track_name', 'artist(s)_name', 'in_apple_playlists']].sort_values(by='in_apple_playlists', ascending=False).head(10)

# Plot
plt.figure(figsize=(10,5))
sns.barplot(x=top_apple_playlists['in_apple_playlists'], y=top_apple_playlists['track_name'], palette='viridis')
plt.xlabel('Total number of playlists on Apple Music')
plt.ylabel('Track Name')
plt.title('Top 10 most-added songs to Apple Music playlists')
plt.xticks(rotation=45)
plt.show()

top_apple_playlists

In [None]:
# Total Spotify streams per artist, keeping the 10 largest

grouped = songs_data.groupby('artist(s)_name', as_index=False)['streams'].sum()
grouped = grouped.sort_values('streams', ascending=False).head(10)
grouped

In [None]:

# grouped already holds only the top 10 artists
x = grouped['streams']
y = grouped['artist(s)_name']

# Plot the values
plt.figure(figsize=(10,5))
sns.barplot(x=x, y=y, palette='viridis')
plt.xlabel('Streams (in billions)')
plt.ylabel('Artists')
plt.title('Top 10 Artists with Most Streams on Spotify')
plt.xticks(rotation=0)

plt.show()
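The x-axis label says "in billions", but the plotted column holds raw stream counts, so the reader depends on matplotlib's offset notation to read the scale. One option is to plot a rescaled column instead. A small sketch on made-up totals (the column name `streams_billions` and the values are hypothetical):

```python
import pandas as pd

# Toy totals (hypothetical) in raw stream counts
toy = pd.DataFrame({
    "artist(s)_name": ["Artist X", "Artist Y"],
    "streams": [3_500_000_000, 1_250_000_000],
})

# Rescale so the values match an axis labelled "Streams (in billions)"
toy["streams_billions"] = toy["streams"] / 1e9

print(toy[["artist(s)_name", "streams_billions"]])
```

Passing `streams_billions` as `x` to `sns.barplot` would then give tick labels that agree with the axis label.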

**Top 5 years with the highest number of tracks**

In [None]:
# Top 5 release years by number of tracks in the dataframe

(
    songs_data.groupby('released_year')['released_year']
    .count()
    .reset_index(name='count')
    .sort_values('count', ascending=False)
    .head(5)
)
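The same ranking can be obtained more compactly with `value_counts`, which counts and sorts in one step. The sketch below compares both approaches on a small, made-up frame (the years are hypothetical); the result is a Series indexed by year rather than a two-column DataFrame.

```python
import pandas as pd

# Toy data (hypothetical) using the notebook's column name
toy = pd.DataFrame({"released_year": [2022, 2023, 2023, 2021, 2022, 2023]})

# Approach used in the notebook: group, count, then sort
by_groupby = (
    toy.groupby("released_year")["released_year"]
    .count()
    .reset_index(name="count")
    .sort_values("count", ascending=False)
    .head(5)
)

# Shorter alternative: value_counts sorts by frequency, descending
by_value_counts = toy["released_year"].value_counts().head(5)

print(by_groupby)
print(by_value_counts)
```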