* `Acousticness:` A numerical variable, a confidence measure from 0.0 to 1.0 indicating whether the track is acoustic. 1.0 represents high confidence that the track is acoustic.

* `Danceability:` A numerical variable, danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.

* `Duration_ms:` A numerical variable, the duration of the track in milliseconds.

* `Duration_min:` A numerical variable, the duration of the track in minutes.

* `Energy:` A numerical variable, energy is a perceptual measure of intensity and activity, ranging from 0.0 to 1.0. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.

* `Explicit:` A categorical variable, whether the track has explicit lyrics (1 = yes; 0 = no or unknown).

* `Id:` The Spotify ID for the track.

* `Instrumentalness:` A numerical variable, predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly "vocal". The closer the instrumentalness value is to 1.0, the greater the likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.

* `Key:` A numerical variable, the overall estimated key of the track. Integers map to pitches using standard Pitch Class notation. For example, 0 = C, 1 = C#/Db, 2 = D, and so on. If no key is detected, the value is -1.

* `Liveness:` A numerical variable, detects the presence of an audience in the recording. Higher liveness values represent a higher probability that the track was performed live. A value above 0.8 indicates a strong likelihood that the track is live.

* `Loudness:` A numerical variable, the overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing the relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Typical values range between -60 and 0 dB.

* `Mode:` A numerical variable, mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.

* `Popularity:` A numerical variable, the popularity of a track is a value between 0 and 100, with 100 being the most popular. Popularity is calculated by an algorithm and is based, in large part, on the total number of plays the track has had and how recent those plays are.

* `Speechiness:` A numerical variable, speechiness detects the presence of spoken words in a track. The more exclusively spoken the recording (e.g., talk show, audiobook, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including cases such as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.

* `Tempo:` A numerical variable, the overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.

* `Valence:` A numerical variable, a measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g., happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g., sad, depressed, angry).

* `Year:` The year the track was released.
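The `Key` and `Mode` encodings above can be turned into readable labels. The sketch below is a minimal illustration that assumes the standard pitch-class order (0 = C through 11 = B) and the -1 convention for an undetected key; the helper name `key_mode_label` is hypothetical.

```python
# Pitch classes 0-11 in standard Pitch Class notation (enharmonic names for black keys)
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]

def key_mode_label(key: int, mode: int) -> str:
    """Hypothetical helper: build a label such as 'C#/Db minor' from key and mode.

    key:  -1 means no key was detected; otherwise 0 = C, 1 = C#/Db, ..., 11 = B
    mode: 1 = major, 0 = minor
    """
    if key == -1:
        return "unknown key"
    if not 0 <= key <= 11:
        raise ValueError(f"key must be -1 or in 0..11, got {key}")
    if mode not in (0, 1):
        raise ValueError(f"mode must be 0 or 1, got {mode}")
    return f"{PITCH_CLASSES[key]} {'major' if mode == 1 else 'minor'}"

print(key_mode_label(0, 1))   # C major
print(key_mode_label(1, 0))   # C#/Db minor
print(key_mode_label(-1, 1))  # unknown key
```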
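The `Speechiness` thresholds (0.33 and 0.66) and the `Instrumentalness` and `Liveness` cut-offs (0.5 and 0.8) suggest simple labels for a track. This is a hypothetical sketch: the function names and label strings are illustrative, and treating 0.33 and 0.66 as belonging to the middle band is an assumption, since the descriptions only say "above" and "below".

```python
def speechiness_band(speechiness: float) -> str:
    """Hypothetical helper: map a speechiness score to the three documented bands."""
    if speechiness > 0.66:
        return "spoken word"        # probably made entirely of spoken words
    if speechiness >= 0.33:
        return "music and speech"   # e.g. rap, or speech and music in sections or layers
    return "music"                  # most likely music or other non-speech-like tracks

def is_probably_instrumental(instrumentalness: float) -> bool:
    # Values above 0.5 are intended to represent instrumental tracks
    return instrumentalness > 0.5

def is_probably_live(liveness: float) -> bool:
    # Values above 0.8 indicate a strong likelihood that the track is live
    return liveness > 0.8

# Made-up example values, not taken from the dataset
track = {"speechiness": 0.45, "instrumentalness": 0.02, "liveness": 0.91}
print(speechiness_band(track["speechiness"]))               # music and speech
print(is_probably_instrumental(track["instrumentalness"]))  # False
print(is_probably_live(track["liveness"]))                  # True
```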
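Since `Duration_min` is the same quantity as `Duration_ms` in different units, it can be derived as duration_ms / 60,000 (60 seconds × 1,000 milliseconds). The sketch below also checks a record against the value ranges stated in the dictionary. It assumes the lowercase column names used in the code (for example `duration_ms`); the helper names are hypothetical.

```python
# Documented value ranges (inclusive) for the bounded numeric columns
RANGES = {
    "acousticness": (0.0, 1.0),
    "danceability": (0.0, 1.0),
    "energy": (0.0, 1.0),
    "instrumentalness": (0.0, 1.0),
    "liveness": (0.0, 1.0),
    "speechiness": (0.0, 1.0),
    "valence": (0.0, 1.0),
    "popularity": (0, 100),
    "key": (-1, 11),
}

def duration_min(duration_ms: float) -> float:
    """Hypothetical helper: convert milliseconds to minutes (1 min = 60,000 ms)."""
    return duration_ms / 60_000

def out_of_range(track: dict) -> list:
    """Hypothetical helper: list the fields whose values fall outside the documented ranges."""
    problems = [name for name, (low, high) in RANGES.items()
                if name in track and not low <= track[name] <= high]
    # mode and explicit are binary 0/1 flags
    for flag in ("mode", "explicit"):
        if flag in track and track[flag] not in (0, 1):
            problems.append(flag)
    return problems

# Made-up record with an invalid energy value
track = {"danceability": 0.8, "energy": 1.2, "key": 5, "mode": 1, "popularity": 87}
print(duration_min(240_000))  # 4.0
print(out_of_range(track))    # ['energy']
```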

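```python
# Data Loading and Initial Analysis

import pandas as pd
import numpy as np

# Load the dataset
data = pd.read_csv('https://raw.githubusercontent.com/user/repo/branch/data.csv')

# Display the first two rows
print(data.head(2))

# List the unique release years in the dataset
print(data["year"].unique())

# Check the shape of the dataset (rows, columns)
print(data.shape)

# Drop the columns that are not used in the rest of the analysis
data = data.drop(["explicit", "key", "mode"], axis=1)

# Check the shape of the dataset after dropping columns
print(data.shape)

# Count missing values per column
print(data.isna().sum())

# Data Visualization

import plotly.express as px
import plotly.graph_objects as go

# There are many tracks per year, so plot the yearly mean loudness;
# plotting every track would make the line jump between individual songs
loudness_by_year = data.groupby("year", as_index=False)["loudness"].mean()
fig = px.line(loudness_by_year, x="year", y="loudness", title="Average Loudness Over Years")
fig.show()

# The same idea with Plotly Graph Objects: one line trace per audio feature
feature_cols = ["acousticness", "danceability", "energy", "valence"]
features_by_year = data.groupby("year")[feature_cols].mean()
fig = go.Figure()
for col in feature_cols:
    fig.add_trace(go.Scatter(x=features_by_year.index, y=features_by_year[col],
                             mode="lines", name=col))
fig.update_layout(title="Average Audio Features Over Years",
                  xaxis_title="Year", yaxis_title="Mean value (0.0 to 1.0)")
fig.show()

# Applying K-Means Clustering

from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# PCA and K-Means only accept numeric input, so keep the numeric columns
# (this excludes text columns such as the track `id`) and drop incomplete rows
X = data.select_dtypes(include=np.number).dropna()

# Standardize the features (duration_ms, tempo and loudness are on far larger
# scales than the 0.0-1.0 features), reduce to 2 principal components, then cluster
clustering_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('pca', PCA(n_components=2)),
    ('kmeans', KMeans(n_clusters=5, n_init=10, random_state=42))
])

# Fit the pipeline to the numeric features
clustering_pipeline.fit(X)

# Assign each track to one of the 5 clusters and store the label alongside the data
clusters = clustering_pipeline.predict(X)
data.loc[X.index, "cluster"] = clusters

# Spotify API and Spotipy Library
!pip install spotipy
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Set up Spotipy client credentials (replace the placeholders with the
# client ID and secret of your app from the Spotify Developer Dashboard)
client_credentials_manager = SpotifyClientCredentials(client_id='YOUR_CLIENT_ID', client_secret='YOUR_CLIENT_SECRET')
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

# Recommendation Function and Visualization

def recommend_song(song_name):
    # Placeholder: the recommendation logic is not implemented yet.
    # It could look the song up with the Spotify API (via `sp`), find its
    # cluster with `clustering_pipeline`, and return similar tracks.
    pass

# Example usage of the recommendation function (returns None until implemented)
recommend_song('Ed Sheeran - Shape of You')
```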
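Because the dataset contains many tracks for each release year, a per-year trend such as loudness over time is usually drawn from yearly averages rather than from every individual track. The pure-Python sketch below shows that aggregation on a few made-up records; in pandas the equivalent is `data.groupby("year")["loudness"].mean()`. The records and the helper name `mean_by_year` are hypothetical.

```python
from collections import defaultdict

def mean_by_year(tracks, feature):
    """Hypothetical helper: average one feature per year, returned as {year: mean} sorted by year."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for track in tracks:
        totals[track["year"]] += track[feature]
        counts[track["year"]] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}

# Made-up example records, not values from the dataset
tracks = [
    {"year": 1990, "loudness": -12.0},
    {"year": 1990, "loudness": -10.0},
    {"year": 2020, "loudness": -6.0},
    {"year": 2020, "loudness": -4.0},
]
print(mean_by_year(tracks, "loudness"))  # {1990: -11.0, 2020: -5.0}
```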
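K-Means assigns points by Euclidean distance, so a feature measured on a large scale (`duration_ms` in milliseconds, `tempo` in BPM, `loudness` in dB) can dominate the 0.0 to 1.0 features. A common remedy is to standardize each column to zero mean and unit variance before clustering, which is what scikit-learn's `StandardScaler` does. The pure-Python sketch below illustrates the z-score formula on made-up values, using the population standard deviation as `StandardScaler` does; the helper name `standardize` is hypothetical.

```python
import math

def standardize(column):
    """Hypothetical helper: z-scores (x - mean) / std with the population standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    if std == 0:
        # A constant column carries no information; map it to zeros
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]

tempo = [90.0, 120.0, 150.0]   # made-up BPM values
energy = [0.2, 0.5, 0.8]       # made-up 0.0-1.0 values
# After standardizing, both columns span the same range (about -1.22 to 1.22)
print(standardize(tempo))
print(standardize(energy))
```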
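The `KMeans` step in the pipeline alternates two operations until the assignments stop changing: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points (Lloyd's algorithm). The sketch below is a minimal pure-Python version for intuition only. Unlike scikit-learn's `KMeans`, it takes the first k points as initial centroids instead of using k-means++ initialization with several restarts, and the 2-D points are made up (they could stand in for two principal-component coordinates).

```python
import math

def kmeans(points, k, max_iter=100):
    """Cluster points with Lloyd's algorithm; returns (labels, centroids).

    Simplification: the initial centroids are just the first k points.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: label each point with its nearest centroid
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

points = [(0.0, 0.0), (5.0, 5.0), (0.5, 0.2), (0.1, 0.4), (5.2, 4.8), (4.9, 5.3)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```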
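Once `sp` is authenticated, Spotipy's `Spotify.search` and `Spotify.audio_features` methods can look up a track that is missing from the dataset and fetch audio features like those in the dictionary. The sketch below is an assumption about how such a lookup might be written: the helper name `find_song`, the query format, and keeping only the first search hit are illustrative choices. It takes the client as a parameter so it can be exercised offline with a stand-in object (`_FakeSpotify`, also hypothetical). Spotify has restricted the audio-features endpoint for newly registered apps, so the real call may be refused depending on your app's access.

```python
def find_song(sp, name, year=None):
    """Hypothetical helper: search for a track and merge in its audio features.

    `sp` is expected to behave like an authenticated spotipy.Spotify client.
    Returns None when the search finds nothing.
    """
    query = f"track:{name}" if year is None else f"track:{name} year:{year}"
    results = sp.search(q=query, type="track", limit=1)
    items = results["tracks"]["items"]
    if not items:
        return None
    track = items[0]
    # audio_features takes a list of track IDs and returns a list of dicts (or None)
    features = sp.audio_features([track["id"]])[0] or {}
    return {
        "name": track["name"],
        "id": track["id"],
        "explicit": int(track["explicit"]),  # 1 = explicit, 0 = not
        "popularity": track["popularity"],
        **features,
    }

class _FakeSpotify:
    """Hypothetical offline stand-in that mimics the shape of spotipy's responses."""
    def search(self, q, type="track", limit=1):
        return {"tracks": {"items": [{"id": "abc123", "name": "Demo Song",
                                      "explicit": False, "popularity": 50}]}}

    def audio_features(self, tracks):
        return [{"id": tracks[0], "danceability": 0.7, "energy": 0.6}]

demo = find_song(_FakeSpotify(), "Demo Song")
print(demo["name"], demo["danceability"])  # Demo Song 0.7
```

With a real client, the call would look like `find_song(sp, "Shape of You", year=2017)`.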
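One way to fill in `recommend_song` is a nearest-neighbour search: represent each track by a vector of its audio features and return the tracks whose vectors are closest to the input song's. The sketch below is hypothetical, not the notebook's method. It uses Euclidean distance over a fixed list of 0.0 to 1.0 features, made-up track records, and a `name` field that the data dictionary above does not list; the helper name `recommend` is also illustrative. In the notebook, candidates could first be restricted to tracks in the same K-Means cluster, and features such as `tempo` or `loudness` would need standardizing before being included.

```python
import math

# Assumed feature subset; all four are on the same 0.0-1.0 scale
FEATURES = ["acousticness", "danceability", "energy", "valence"]

def recommend(song, catalog, n=2):
    """Hypothetical helper: names of the n catalog tracks closest to `song` in feature space."""
    target = [song[f] for f in FEATURES]
    candidates = [t for t in catalog if t["name"] != song["name"]]  # skip the song itself
    candidates.sort(key=lambda t: math.dist(target, [t[f] for f in FEATURES]))
    return [t["name"] for t in candidates[:n]]

# Made-up catalog, not values from the dataset
catalog = [
    {"name": "Track A", "acousticness": 0.1, "danceability": 0.8, "energy": 0.7, "valence": 0.9},
    {"name": "Track B", "acousticness": 0.9, "danceability": 0.3, "energy": 0.2, "valence": 0.2},
    {"name": "Track C", "acousticness": 0.2, "danceability": 0.7, "energy": 0.8, "valence": 0.8},
    {"name": "Track D", "acousticness": 0.8, "danceability": 0.4, "energy": 0.3, "valence": 0.3},
]
print(recommend(catalog[0], catalog))  # ['Track C', 'Track D']
```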