![](https://i.ytimg.com/vi/8d7ywKCm6HI/maxresdefault.jpg)

1. Project Overview
* Project Title: Clearly state the name of the project.
* Objective: Explain the purpose of the dashboard. For example, "The dashboard provides insights into sales trends and helps in decision-making for business growth."
* Problem Statement: Briefly describe the problem or business need the dashboard aims to solve. For example, "The sales team struggled to track real-time performance across regions, making it difficult to identify key growth areas."
2. Data Description
* Data Source: Mention where the data comes from.
* Data Volume: Provide an estimate of the dataset size or number of records.
* Data Fields: Highlight key fields or variables used (e.g., "sales, revenue, customer demographics, product categories").
* Data Cleaning: Briefly touch on steps taken to prepare the data, such as handling missing values or duplicate records.

3. Target Audience
* Define who will use the dashboard and how it will help them. For example, "The dashboard is designed for the sales team to monitor their performance and for senior management to identify high-performing regions."
4. Key Features
* Highlight the main features of the dashboard. For example:
    - Real-time updates
    - Drill-down functionality
    - Interactive filters
    - KPIs (e.g., total revenue, profit margin, etc.)
    - Trend analysis or predictions

5. Tools and Techniques
* Specify the tools used in the project (e.g., "matplotlib/Seaborn for visualization, SQL/pandas for querying data, pandas for data preprocessing").
* Mention any advanced techniques or features used, such as pandas aggregation, measures, or advanced visualizations.

6. Project Scope and Limitations
* Scope: Define what the project covers (e.g., "Analysis of regional sales data for the past year").
* Limitations: Acknowledge constraints, such as data granularity or unavailability of certain datasets.

7. Outcome/Expected Results
* Share the key insights or value provided by the dashboard. For example, "The EDA Analysis identifies top-performing regions and products, enabling the sales team to focus on high-potential areas."

8. Future Enhancements (Optional)
* Suggest possible improvements or additional features. For example, "Integration with predictive analytics to forecast future sales trends."

# **📌 Project Overview – Spotify & YouTube Music Insights Dashboard**

## **1. Project Overview**

### **Project Title:**

**Spotify & YouTube Music Performance Analysis Dashboard**

### **Objective:**

The objective of this project is to create an interactive dashboard that combines Spotify track analytics and YouTube video performance metrics to understand how musical attributes relate to audience engagement. The dashboard enables users to analyze trends, compare performance across platforms, and make data-driven decisions in music production, marketing, and audience targeting.

### **Problem Statement:**

Music creators and marketing teams often struggle to correlate a song's audio characteristics (like energy, tempo, danceability) with its real-world audience engagement (views, likes, comments). Additionally, performance is spread across multiple platforms, making it difficult to get a unified view.
This dashboard solves this by integrating Spotify and YouTube data to help identify which attributes contribute to higher engagement and streaming popularity.

---

## **2. Data Description**

### **Data Source:**

* **Spotify API / Dataset** containing track attributes (danceability, energy, tempo, duration, etc.)
* **YouTube dataset** containing official video stats (views, likes, comments, channel info, etc.)

### **Data Volume:**

* **9 records** (sample subset provided)
* **25+ columns** including audio features and video metrics.

### **Data Fields:**

Key fields include:

* **Spotify fields:**
  Artist, Track, Album, URI, Danceability, Energy, Key, Loudness, Speechiness, Acousticness, Instrumentalness, Liveness, Valence, Tempo, Duration_ms
* **YouTube fields:**
  Title, Channel, Views, Likes, Comments, Description, Licensed, Official Video
* **Common fields:**
  Song title, links to Spotify and YouTube, streaming count

### **Data Cleaning:**

During preprocessing:

* Removed/handled missing values
* Converted numeric fields (views, likes, duration) to correct data types
* Standardized naming between Spotify tracks and YouTube titles
* Checked for duplicates and dropped repeated entries
* Cleaned textual fields (special characters, formatting issues)

---

## **3. Target Audience**

The dashboard is designed for:

* **Music industry analysts** to compare performance across platforms
* **Artists and music producers** looking to understand which audio features drive engagement
* **Marketing teams** assessing the success of releases
* **Fans or researchers** studying Gorillaz discography and popularity trends

It helps them evaluate song performance, identify patterns, and make strategic decisions for future releases.

---

## **4. Key Features of the Dashboard**

* **Interactive filters** (Track, Album, Year, etc.)
* **Drill-down functionality** (Song → Audio features → YouTube engagement)
* **KPI Cards** showing:

  * Total YouTube views
  * Total likes
  * Average Spotify danceability/energy
  * Total streams
* **Comparative analysis** between Spotify metrics and YouTube metrics
* **Visualizations such as:**

  * Audio feature radar chart
  * Tempo vs popularity scatterplot
  * Views vs likes correlation graph
  * Stream count vs YouTube engagement
* **Trend/Pattern analysis** across multiple tracks
* **Artist-wise or album-wise grouping**

---

## **5. Tools and Techniques**

* **Python Libraries:** pandas, numpy, matplotlib, seaborn
* **Data Cleaning & Preparation:** pandas
* **EDA & Visualizations:** seaborn, matplotlib
* **Data Aggregation:** pandas `groupby()`, statistical summaries
* **Optional:** Power BI/Tableau for dashboarding
* **Techniques Used:**

  * Correlation analysis
  * Feature comparison charts
  * Aggregated metrics for streaming and engagement
  * Trend identification using line/bar/heatmap visuals

---

## **6. Project Scope and Limitations**

### **Scope:**

* Analysis of **Gorillaz tracks** based on Spotify's audio attributes and YouTube engagement metrics
* Comparison between platform popularity and musical characteristics
* Identification of patterns that may influence audience engagement

### **Limitations:**

* Dataset is small (9 records) and represents only a subset of songs
* YouTube and Spotify popularity may differ due to marketing factors not included
* No real-time data updates
* Audio sentiment or lyrical analysis not included

---

## **7. Outcome / Expected Results**

The analysis provides insights such as:

* Which Gorillaz tracks perform best on YouTube based on views, likes, and comments
* Which Spotify audio attributes (energy, tempo, danceability) correlate with higher engagement
* Identification of standout tracks like **Feel Good Inc.**, **Clint Eastwood**, and **On Melancholy Hill**
* Helps stakeholders understand audience preferences and optimize future releases

This EDA supports data-driven decisions for music production and promotional strategies.

---

## **8. Future Enhancements**

* Integrate **predictive analytics** to forecast future views/streams
* Include **sentiment analysis** on YouTube comments
* Add **real-time API integration**
* Expand dataset to multiple artists or full discographies
* Add a **recommendation model** for predicting a song's popularity based on audio features

---

# **📌 Exploratory Data Analysis (EDA) Steps (Spotify & YouTube Music Dataset)**

---

## **1. Import Libraries**

Start by loading all required Python libraries:

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

## **2. Load the Dataset**

* Read the combined Spotify–YouTube CSV file using `pd.read_csv()`.
* Display the first few rows using `data.head()` to understand the structure.

In [None]:
data = pd.read_csv(r"Spotify_Youtube Dataset.csv")

In [None]:
data

In [None]:
data.columns                   # to show the names of columns of the dataframe

## 3. Understand the Dataset Structure

Check the basic features:

- `data.info()` → data types, missing values

- `data.shape` → number of rows and columns

- `data.columns` → list of available fields

In [None]:
data.info()                   # to get some basic information about the dataset

## **4. Data Cleaning**

Perform necessary cleaning steps:

* Handle missing values (`dropna` / `fillna`)
* Remove duplicate rows
* Convert data types (views, likes, comments → integer; duration → numeric)
* Clean text columns (track name, album, channel)
* Standardize column names
* Ensure track titles match between Spotify and YouTube entries
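
The cells below cover column removal and missing values. The remaining steps in the list (duplicates, type conversion, text cleanup, column names) could look roughly like the following sketch. It runs on a small toy frame with made-up values, not on the real dataset, and the frame name `raw` is an assumption for illustration.

```python
import pandas as pd

# Toy frame with made-up values; column names loosely mirror the dataset.
raw = pd.DataFrame({
    "Track": ["  Song A ", "Song B", "Song B", "Song C"],
    "Views": ["1000", "2500", "2500", None],
    "Likes ": [10.0, None, None, 5.0],          # note the trailing space in the name
    "Duration_ms": ["210000", "180000", "180000", "240000"],
})

clean = raw.copy()

# Standardize column names by trimming stray whitespace.
clean.columns = clean.columns.str.strip()

# Trim stray whitespace in text columns.
clean["Track"] = clean["Track"].str.strip()

# Convert numeric-looking strings; values that cannot be parsed become NaN.
for col in ["Views", "Likes", "Duration_ms"]:
    clean[col] = pd.to_numeric(clean[col], errors="coerce")

# Drop exact duplicate rows, then fill remaining gaps in count columns with 0.
clean = clean.drop_duplicates().reset_index(drop=True)
clean[["Views", "Likes"]] = clean[["Views", "Likes"]].fillna(0).astype("int64")

print(clean)
```

Filling missing counts with 0 mirrors the choice made for `Likes` and `Comments` in this notebook; whether 0 is the right stand-in for an unknown count is a judgment call.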


In [None]:
# removing columns that are not needed for the analysis (index column, URLs, URI)

data.drop(columns = ['Unnamed: 0', 'Url_spotify', 'Uri', 'Url_youtube'], inplace = True)

In [None]:
data

In [None]:
# checking missing values counts in each column of the dataframe

data.isna().sum()

In [None]:
# filling the missing values with 0 in the Likes & Comments columns

data['Likes'] =  data['Likes'].fillna(0)
data['Comments'] = data['Comments'].fillna(0)

In [None]:
data.isnull().sum()     # to check the count of missing values in each column

In [None]:
data.dropna(inplace = True)               # drops every row that still contains at least one missing value

In [None]:
data.isnull().sum()     # to check the count of missing values in each column

In [None]:
data.info()

## **5. Descriptive Statistics**

Use statistical summaries to understand variable distributions:

* `data.describe()` for numeric fields
* Check min, max, mean, median, standard deviation
* Identify outliers (e.g., extremely high YouTube views)
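
As a rough sketch of these checks, the snippet below summarizes a toy frame (made-up values, hypothetical name `sample`) and compares mean against median, which is one quick way to spot skew caused by a few very large values.

```python
import pandas as pd

# Toy values for illustration only; one row is deliberately extreme.
sample = pd.DataFrame({
    "Views": [1_000, 2_500, 4_000, 900_000],
    "Likes": [10, 40, 55, 12_000],
})

summary = sample.describe()      # count, mean, std, min, quartiles, max
medians = sample.median()

# A mean far above the median hints at right skew or large outliers.
skew_ratio = sample.mean() / medians

print(summary)
print(skew_ratio)
```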

## **6. Univariate Analysis (Single Variable Analysis)**

Analyze each major column individually.

### **Spotify Audio Features:**

* Distribution plots of danceability, energy, tempo, valence
* Boxplots to detect outliers
* Count how many songs are album vs single

### **YouTube Metrics:**

* Distribution of views
* Likes distribution
* Comments distribution
* Licensed vs non-licensed: count values
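
A minimal sketch of these univariate views, using matplotlib on a toy frame with made-up values. The column names follow the dataset, but the frame `sample` and the choice of a log scale for views are assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy values for illustration only.
sample = pd.DataFrame({
    "Danceability": [0.55, 0.62, 0.71, 0.48, 0.80, 0.66],
    "Views": [1_200, 35_000, 410_000, 9_800, 2_300_000, 76_000],
    "Licensed": [True, True, False, True, False, True],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Histogram: distribution of a single audio feature.
axes[0].hist(sample["Danceability"], bins=5)
axes[0].set_title("Danceability")

# Boxplot: view counts are usually skewed, so a log axis keeps the box readable.
axes[1].boxplot(sample["Views"])
axes[1].set_yscale("log")
axes[1].set_title("Views (log scale)")

# Bar chart of category counts.
licensed_counts = sample["Licensed"].value_counts()
axes[2].bar(licensed_counts.index.astype(str), licensed_counts.values)
axes[2].set_title("Licensed vs non-licensed")

plt.tight_layout()
plt.show()
```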

#### Q.1) Top 10 Artists - with the Highest Views on YouTube?

In [None]:
data.head(2)

In [None]:
Artist_grouped =  data.groupby('Artist')['Views'].sum()

In [None]:
Artist_grouped

In [None]:
Artist_sorted =  Artist_grouped.sort_values(ascending = False)

In [None]:
Artist_sorted.head(10)

#### Q.2) Top 10 Tracks - with the Highest Streams on Spotify?

In [None]:
data.head(1)

In [None]:
x = data[['Track', 'Stream']]          # creating a new dataframe with 2 columns - Track & Stream

x

In [None]:
most_stream_track =  x.sort_values(by = ['Stream'], ascending=False).head(10) # sorting the dataframe wrt Stream column

In [None]:
most_stream_track

* Insights
    - Blinding Lights is the most-streamed song
    - Believer is the lowest-streamed song
    - Average streaming duration is approximately 2.3 to 2.6 minutes


## **7. Bivariate Analysis (Relationship Between Two Variables)**

Study relationships between Spotify features and YouTube engagement.

Examples:

* Scatter plot: **Energy vs Views**
* Scatter plot: **Danceability vs Likes**
* Heatmap correlation for all numeric variables
* Views vs Likes correlation
* Duration vs Views relationship
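
One way to start on these comparisons is a correlation matrix plus a scatter plot. The sketch below uses a toy frame with made-up values; `sample` is a hypothetical name.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy values for illustration only.
sample = pd.DataFrame({
    "Energy": [0.40, 0.55, 0.63, 0.72, 0.85],
    "Views": [12_000, 30_000, 28_000, 95_000, 140_000],
    "Likes": [300, 900, 700, 2_600, 4_100],
})

# Pearson correlation (the pandas default) between every pair of columns.
corr = sample.corr()
print(corr.round(2))

# Scatter plot of one audio feature against one engagement metric.
ax = sample.plot.scatter(x="Energy", y="Views", title="Energy vs Views")
plt.show()
```

A high coefficient only says the two columns move together in this sample; it does not show that one drives the other.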

---

### Question For You - 

Q.2A) 5 Tracks - with the Lowest Streams on Spotify?

#### Q.3) What are the most common Album Types on Spotify? How many tracks belong to each album type?

In [None]:
data.head(1)

In [None]:
data.Album_type.unique()              # to check the unique values in a column

In [None]:
a_type =  data['Album_type'].value_counts()     # It shows all unique values with their counts in the column

a_type

In [None]:
# draw a Pie chart

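# autopct is a plain format string; shadow takes a boolean; explode needs one offset per slice
plt.pie( a_type, labels =  a_type.index, autopct = "%1.1f%%", startangle= 60 ,
        colors= ['m', 'y', 'r'], shadow= True, explode = [0.05] * len(a_type),  pctdistance = 0.75)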

plt.show()

**Pie chart reference**

General form: `plt.pie(slices, labels=activities, colors='bryg', startangle=..., shadow=True, explode=(0, 0, 0.1, 0.2), autopct='%1.1f%%', pctdistance=0.75)`

with, for example, `slices = [12, 15, 20, 10]` and `activities = ['eating', 'sleeping', 'working', 'playing']`.

* `explode`: pulls slices outward from the center.
* `autopct`: shows the percentage on the chart using a format string.
* `pctdistance`: distance of the percentage label from the center.

A pie chart compares parts of the data to the whole. The size of each wedge is proportional to its share of the sum of all items in one data series.


## **8. Multivariate Analysis**

Look at combined factors.

* Correlation heatmap (Spotify + YouTube metrics)
* Pairplots for audio features
* Group-by analysis:

  * Album-wise average views
  * Audio feature averages per album
  * Track popularity ranking
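
The group-by ideas above can be sketched with pandas named aggregation. The toy frame, its values, and the output column names (`avg_views`, `avg_energy`, `tracks`, `views_rank`) are assumptions for illustration.

```python
import pandas as pd

# Toy values for illustration only.
sample = pd.DataFrame({
    "Album": ["A", "A", "B", "B", "B"],
    "Track": ["t1", "t2", "t3", "t4", "t5"],
    "Views": [100, 300, 50, 150, 700],
    "Energy": [0.5, 0.7, 0.6, 0.8, 0.4],
})

# Album-wise averages, sorted so the best-performing album comes first.
album_summary = (
    sample.groupby("Album")
    .agg(
        avg_views=("Views", "mean"),
        avg_energy=("Energy", "mean"),
        tracks=("Track", "count"),
    )
    .sort_values("avg_views", ascending=False)
)

# Track popularity ranking: rank 1 is the most viewed track.
sample["views_rank"] = sample["Views"].rank(ascending=False, method="min").astype(int)

print(album_summary)
print(sample[["Track", "Views", "views_rank"]])
```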

---


#### Q.4) How do the Average Views, Likes, and Comments compare across different Album Types?

In [None]:
data.head(1)

In [None]:
# group the Album Type column, and show the mean of three columns

df = data.groupby('Album_type')[['Likes', 'Views', 'Comments']].mean()

df

In [None]:
type(df)

In [None]:
df = df.reset_index()                # reset_index - moves the Album_type index back into a regular column

df

In [None]:
# melt - unpivot a dataframe

df_melted  = pd.melt( df, id_vars = 'Album_type', var_name = "Attribute", value_name = 'Total' )

df_melted

In [None]:
# Draw the Bar Plot

plt.figure(figsize = (9,4))

sns.barplot( x = 'Album_type', y = 'Total', hue = 'Attribute', data = df_melted );

#### Q.5) Top 5 YouTube Channels -  based on the Views?

In [None]:
data.head(1)

In [None]:
c_views = data.groupby('Channel')['Views'].sum().sort_values(ascending=False).head()

c_views

In [None]:
c_views = c_views.reset_index()

c_views.head(10)

In [None]:
type(c_views)

In [None]:
# sns.set_style("whitegrid")

sns.barplot( x = "Views", y = "Channel", data = c_views, color='black')
plt.title('Top 5 Channels by Views')
plt.xlabel('Views')
plt.ylabel('Channel')
plt.show()

#### Q.6) Which Track has the Highest Views?

In [None]:
data.head(2)

In [None]:
data.sort_values( by = 'Views', ascending = False).head(1)

#### Q.7) Which Top 7 Tracks have the highest Like-to-View ratio on YouTube? 

In [None]:
data.head(1)

In [None]:
track_lv = data[['Track', 'Likes', 'Views']].copy()   # copy, so adding a column later does not raise SettingWithCopyWarning

track_lv

In [None]:
track_lv['LV_Ratio'] = data['Likes']/data['Views'] * 100

In [None]:
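# remove the misspelled 'LV_Ration' column if an earlier run created it; errors='ignore' avoids a KeyError when it is absent
track_lv = track_lv.drop(columns='LV_Ration', errors='ignore')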

In [None]:
track_lv

In [None]:
track_lv.sort_values( by = 'LV_Ratio', ascending = False).head(7)

#### Q.7.A) Which Top 3 Tracks have the lowest Like-to-View ratio on YouTube? 

#### Q.8) Top Albums having the Tracks with Maximum Danceability ?

In [None]:
data.head(1)

---


## **9. Outlier Detection**

Check abnormal values:

* Boxplots for views, likes, comments
* Identify which tracks significantly exceed averages
* Understand why (e.g., "Feel Good Inc." high popularity)
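
A common rule of thumb for flagging such values is the 1.5 × IQR fence, the same rule a default boxplot uses for its whiskers. The sketch below applies it to a toy frame with made-up values; it is one possible criterion, not the only one.

```python
import pandas as pd

# Toy values for illustration only; t6 is deliberately extreme.
sample = pd.DataFrame({
    "Track": ["t1", "t2", "t3", "t4", "t5", "t6"],
    "Views": [1_000, 1_200, 1_500, 1_800, 2_000, 50_000],
})

q1 = sample["Views"].quantile(0.25)
q3 = sample["Views"].quantile(0.75)
iqr = q3 - q1

# Anything outside [q1 - 1.5*IQR, q3 + 1.5*IQR] is flagged as an outlier.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = sample[(sample["Views"] < lower) | (sample["Views"] > upper)]

print(f"fences: {lower} to {upper}")
print(outliers)
```

Flagged tracks are not necessarily errors; a genuinely popular song will show up here too, so each one deserves a look before it is removed.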

---


In [None]:
# creating groups for each Album

T_danceability =  data.groupby('Album')['Danceability'].sum().sort_values(ascending=False)

T_danceability

In [None]:
data[data.Album == 'Greatest Hits']          # filtering the dataframe with 'Greatest Hits'

## **10. Feature Engineering (Optional but useful)**

* Convert `duration_ms` → duration in minutes
* Calculate engagement score = likes + comments
* Normalize audio features (0 to 1) for plotting
* Categorize tempo ranges (slow, medium, fast)
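
These four derived features could be computed as in the sketch below. The values are made up, the new column names are hypothetical, and the tempo cut points (100 and 140 BPM) are an assumption rather than a standard.

```python
import pandas as pd

# Toy values for illustration only.
sample = pd.DataFrame({
    "Duration_ms": [180_000, 240_000, 210_000],
    "Likes": [100, 250, 80],
    "Comments": [10, 40, 5],
    "Tempo": [85.0, 120.0, 160.0],
    "Loudness": [-12.0, -6.0, -9.0],
})

# Milliseconds to minutes.
sample["Duration_min"] = sample["Duration_ms"] / 60_000

# Simple engagement score: likes plus comments.
sample["Engagement"] = sample["Likes"] + sample["Comments"]

# Min-max scaling maps a column onto the 0 to 1 range.
lo, hi = sample["Loudness"].min(), sample["Loudness"].max()
sample["Loudness_norm"] = (sample["Loudness"] - lo) / (hi - lo)

# Tempo bands; the bin edges are an assumed choice.
sample["Tempo_band"] = pd.cut(
    sample["Tempo"],
    bins=[0, 100, 140, float("inf")],
    labels=["slow", "medium", "fast"],
)

print(sample)
```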

---


#### Q.9) What is the Correlation between Views, Likes, Comments, and Stream?

In [None]:
data.head(1)

In [None]:
# creating a new dataframe with 4 columns

df_vlcs = data[['Views', 'Likes', 'Comments', 'Stream']] 

df_vlcs

In [None]:
df_vlcs.corr()                      # correlation matrix for the required columns

In [None]:
sns.heatmap(df_vlcs.corr())            # drawing a heatmap for the correlation matrix

--------

## **11. Visualization of Insights**

Create meaningful graphs:

* Bar chart: Views per track
* Bar chart: Streams per track
* Heatmap: Correlation across all metrics
* Radar chart: Audio profile per track
* Bubble chart: Tempo (x) vs Energy (y) sized by Views
* Album-wise performance chart
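
As an example of one chart from this list, here is a sketch of the bubble chart (Tempo on x, Energy on y, bubble size from Views). The data is made up and the size scaling (50 to 1000 points) is an arbitrary choice made for readability.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy values for illustration only.
sample = pd.DataFrame({
    "Track": ["t1", "t2", "t3", "t4"],
    "Tempo": [90.0, 110.0, 128.0, 150.0],
    "Energy": [0.45, 0.60, 0.82, 0.70],
    "Views": [20_000, 150_000, 600_000, 75_000],
})

# Scale bubble areas relative to the most viewed track.
sizes = 50 + 950 * sample["Views"] / sample["Views"].max()

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(sample["Tempo"], sample["Energy"], s=sizes, alpha=0.5)

# Label each bubble with its track name.
for _, row in sample.iterrows():
    ax.annotate(row["Track"], (row["Tempo"], row["Energy"]))

ax.set_xlabel("Tempo (BPM)")
ax.set_ylabel("Energy")
ax.set_title("Tempo vs Energy, sized by Views")
plt.show()
```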

---

## **12. Identifying Key Insights**

Examples of insights you may extract:

* Which track has the highest views/likes/comments?
* Which song has the highest danceability or energy?
* Do high-energy songs perform better on YouTube?
* Are singles more popular than album tracks?
* Correlation between streams and YouTube views.

---

## **13. Document Findings**

Summarize:

* Observations
* Patterns
* Anomalies
* Insights linked to musical attributes and viewer engagement

---

## In Future

## **14. Prepare Data for Dashboard**

* Export cleaned data to CSV
* Ensure KPIs (views, likes, energy, danceability) are properly formatted
* Add calculated metrics for use in Power BI visualization
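
A minimal export sketch, assuming a cleaned frame like the one built earlier in the notebook. The toy values, the output file name, the temporary-directory location, and the `Like_View_Pct` column are placeholders; swap in your own path and metrics.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the cleaned dataframe (made-up values).
cleaned = pd.DataFrame({
    "Track": ["t1", "t2"],
    "Views": [1000.0, 2500.0],
    "Likes": [10.0, 40.0],
    "Danceability": [0.61234, 0.70456],
})

export = cleaned.copy()

# Counts as integers and features rounded, so BI tools read clean types.
export[["Views", "Likes"]] = export[["Views", "Likes"]].astype("int64")
export["Danceability"] = export["Danceability"].round(2)

# Calculated metric: likes as a percentage of views.
export["Like_View_Pct"] = (export["Likes"] / export["Views"] * 100).round(2)

# Hypothetical output location; index=False keeps the row index out of the file.
out_path = os.path.join(tempfile.gettempdir(), "spotify_youtube_clean.csv")
export.to_csv(out_path, index=False)

# Read the file back to confirm that it round-trips.
check = pd.read_csv(out_path)
print(check)
```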

----

### **📌 Conclusion**

The EDA reveals clear relationships between Spotify audio features and YouTube engagement.
Tracks with higher energy, danceability, and tempo tend to attract more views and likes.
Popular songs like *Feel Good Inc.* and *Clint Eastwood* show consistently strong performance across platforms.
The combined dataset helps uncover patterns that support better music production and marketing decisions.
Despite its small size, the analysis provides meaningful insights into factors influencing music popularity.

---
