# Basic Information

- Dataset Name: Spotify Top 50 Songs in 2021

- Download URL: The dataset is available at [Kaggle](https://www.kaggle.com/datasets/equinxx/spotify-top-50-songs-in-2021/download?datasetVersionNumber=3)

- License: The dataset is free for public use with no restrictions.

- Dataset File Size: 4 kB, with 50 rows (one per top-50 song of 2021)



# Dataset Description

This dataset contains information on the 50 most popular songs on Spotify in 2021. For each song it includes 14 descriptive variables: popularity, danceability, energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, tempo, duration, and time signature. These variables describe each song's audio characteristics and can help explain what the most popular songs have in common.

The dataset can be used for various analyses, such as identifying patterns among popular songs and understanding which features make a song successful. In this project, we present the analysis through visualizations.

In [None]:
```python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from pyecharts import options as opts
from pyecharts.charts import Pie
```

In [None]:
```python
# Path to the CSV downloaded from Kaggle; adjust it to where you saved the file.
file_name = 'spotify_top50_2021.csv'
```

In [None]:
```python
# Use the file's first column as the row index.
df = pd.read_csv(file_name, index_col=0)
df.head()
```
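`index_col=0` tells `pd.read_csv` to use the first column of the file as the row index instead of loading it as an ordinary column. A minimal sketch on a tiny inline CSV follows; the rows are made up for illustration, and we assume (as the cell above implies) that the Kaggle file's first column is an unnamed row id.

```python
import io

import pandas as pd

# Made-up three-row CSV whose first, unnamed column is a row id,
# mimicking a file that was exported together with its pandas index.
csv_text = """,artist_name,popularity
0,Artist A,90
1,Artist B,85
2,Artist A,80
"""

with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
without_index = pd.read_csv(io.StringIO(csv_text))

print(list(with_index.columns))     # ['artist_name', 'popularity']
print(list(without_index.columns))  # ['Unnamed: 0', 'artist_name', 'popularity']
```

Without `index_col=0`, the id column would show up as an extra `Unnamed: 0` column.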

Check the data columns and their types.

In [None]:
```python
df.info()
```
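`df.info()` lists each column's dtype. The correlation analysis further down only works on numeric columns, so splitting the columns by type can be useful. Here is a minimal sketch with `DataFrame.select_dtypes`. It uses a small made-up frame in place of `df`; the column values are placeholders, not real data.

```python
import pandas as pd

# Made-up stand-in for df: one text column and two numeric ones.
sample = pd.DataFrame({
    "artist_name": ["Artist A", "Artist B", "Artist A"],
    "danceability": [0.70, 0.55, 0.81],
    "popularity": [90, 85, 80],
})

# Columns with any numeric dtype (int, float) vs. everything else.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
other_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)  # ['danceability', 'popularity']
print(other_cols)    # ['artist_name']
```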

## 1. Who is the most popular artist in the dataset?

To determine the most popular artist in the Spotify Top 50 dataset, we count how many songs each artist has in the list and draw the counts as a horizontal bar chart with seaborn's `barplot`. Here, "most popular" means the artist with the most songs in the top 50.

In [None]:
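```python
plt.figure(figsize=(14, 8))

# Number of songs per artist, sorted from most to fewest.
artist_counts = df["artist_name"].value_counts()

# Horizontal bars: song counts on the x-axis, artist names on the y-axis.
sns.barplot(x=artist_counts.values, y=artist_counts.index)

plt.title("Spotify Top 50 Songs by Artist")
plt.xlabel("Number of Songs")
plt.ylabel("Artist Name")

plt.show()
```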
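The bar chart answers the question visually, and the same counts can also give the answer directly. The following minimal sketch uses made-up placeholder artist names instead of the real `df`.

```python
import pandas as pd

# Placeholder artist names, not taken from the dataset.
sample = pd.DataFrame({
    "artist_name": ["Artist A", "Artist B", "Artist A",
                    "Artist C", "Artist A", "Artist B"],
})

# value_counts() returns counts sorted in descending order.
artist_counts = sample["artist_name"].value_counts()

top_artist = artist_counts.idxmax()   # label of the largest count
top_count = int(artist_counts.max())

print(top_artist, top_count)  # Artist A 3
```

If several artists tie for the top spot, `idxmax()` returns only the first. `artist_counts[artist_counts == artist_counts.max()]` lists all of them.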

## 2. How does danceability relate to popularity?
We use seaborn's `regplot` to show the relationship between danceability and popularity. It draws a scatter plot of the two variables together with a fitted linear regression line.

In [None]:
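```python
plt.figure(figsize=(14, 8))

# Scatter plot of danceability vs popularity with a fitted regression line.
sns.regplot(data=df, x="danceability", y="popularity")

plt.title("Danceability vs Popularity")
plt.show()
```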
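By default, `sns.regplot` draws the scatter points plus a first-order least-squares line. To put numbers on that picture, we can compute the Pearson correlation and the slope of the fitted line. This minimal sketch uses made-up values in place of `df["danceability"]` and `df["popularity"]`, chosen to lie exactly on a line so the result is easy to check.

```python
import numpy as np
import pandas as pd

# Made-up values standing in for the real columns.
sample = pd.DataFrame({
    "danceability": [0.50, 0.60, 0.70, 0.80],
    "popularity": [70, 75, 80, 85],
})

# Pearson correlation coefficient between the two columns.
r = sample["danceability"].corr(sample["popularity"])

# Degree-1 least-squares fit; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(sample["danceability"], sample["popularity"], deg=1)

print(round(r, 3), round(slope, 1), round(intercept, 1))  # 1.0 50.0 45.0
```

On the real data, a value of `r` near zero would suggest that danceability alone says little about popularity among these 50 songs. In either case, the number describes an association, not an influence.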

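## 3. How are the features correlated with each other?

We use a seaborn heatmap to show the correlation matrix of the dataframe. Each cell holds the Pearson correlation between a pair of columns, ranging from -1 (perfect negative linear relationship) to 1 (perfect positive linear relationship). Correlation is only defined for numeric columns, so text columns such as `artist_name` have to be left out.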

In [None]:
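```python
plt.figure(figsize=(20, 10))

# Correlate numeric columns only; text columns such as artist_name
# make df.corr() raise an error in pandas 2.x.
sns.heatmap(df.corr(numeric_only=True), annot=True, fmt='.2f')

plt.show()
```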
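When there are many features, the annotated heatmap becomes dense. The minimal sketch below ranks feature pairs by absolute correlation and lists each pair only once. It uses a small made-up frame in place of `df`, with `energy`, `loudness` and `valence` values invented for illustration.

```python
import pandas as pd

# Made-up stand-in for df; the text column is ignored by numeric_only=True.
sample = pd.DataFrame({
    "artist_name": ["A", "B", "C", "D"],
    "energy": [0.2, 0.4, 0.6, 0.8],
    "loudness": [-12.0, -9.0, -6.0, -3.0],
    "valence": [0.9, 0.1, 0.5, 0.3],
})

corr = sample.corr(numeric_only=True)

# Keep each unordered pair once (upper triangle, no diagonal).
cols = corr.columns
pairs = pd.Series({
    (a, b): corr.loc[a, b]
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
})

# Strongest relationships first, regardless of sign.
pairs = pairs.sort_values(key=abs, ascending=False)
print(pairs)
```

Sorting on the absolute value keeps strong negative correlations near the top instead of pushing them to the bottom.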
