# FETCHING DATA FROM SPOTIFY API


**In the data-fetching step, we first install and import Spotipy, the library used to communicate with Spotify's Web API. We then import the libraries needed for API calls, data handling, and data analysis, along with specific modules for authentication and data manipulation.**

In [0]:
# Installing the Spotipy library to interact with the Spotify Web API
%pip install spotipy

Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


In [0]:
# Importing necessary libraries
import spotipy
import requests
import pandas as pd
from spotipy.oauth2 import SpotifyClientCredentials
from pyspark.sql.functions import col,expr

**The `client_id` and `client_secret` values are obtained from Spotify's developer dashboard. They are required to access Spotify's API.**

In [0]:
# Setting up Spotify API credentials
client_id = "bf0d9d77e23e4ce3ae4462f94af3c01c"
client_secret = "ce41b3e502ba4b3e90f824c1a3ebaa46"

In [0]:
# Initializing Spotify client with credentials
client_credentials_manager = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
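
Hard-coded credentials are easy to leak when a notebook is shared. The sketch below is one possible alternative, not part of the original notebook: it reads the two values from environment variables and fails early if either is missing. The helper name `load_spotify_credentials` is hypothetical; the variable names `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` are the ones Spotipy itself looks for, but treat their use here as an assumption about your setup.

```python
import os


def load_spotify_credentials(env=None):
    """Return (client_id, client_secret) read from environment-style variables.

    Assumption: the credentials were exported under the names below
    before the notebook started. `env` can be any mapping, which makes
    the helper easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    client_id = env.get("SPOTIPY_CLIENT_ID")
    client_secret = env.get("SPOTIPY_CLIENT_SECRET")
    missing = [
        name
        for name, value in (
            ("SPOTIPY_CLIENT_ID", client_id),
            ("SPOTIPY_CLIENT_SECRET", client_secret),
        )
        if not value
    ]
    if missing:
        # Fail early with a clear message instead of a later HTTP 401
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return client_id, client_secret
```

The returned pair could then be passed to `SpotifyClientCredentials(client_id=..., client_secret=...)` exactly as in the cell above. On Databricks, a secret scope read through `dbutils.secrets.get` is another option.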

**The next cell makes the actual call to Spotify's API. It requests 100 recommended tracks seeded with multiple genres, then collects the name, ID, and available markets of each track's first-listed artist.**


In [0]:
# Fetching recommended artists based on seed genres
limit = 100
seed_genres = ["pop", "indie", "country","k-pop", "r-n-b"]
results = sp.recommendations(seed_genres=seed_genres, limit=limit)

# Extracting artist details (name, ID, and the track's available markets) from the recommended tracks
artists = []
for track in results["tracks"]:
    artist_name = track["artists"][0]["name"]
    artist_id = track["artists"][0]["id"]
    artist_market = track["available_markets"]
    artist = {"name": artist_name, "id": artist_id, "market": artist_market}
    artists.append(artist)
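
Because several recommended tracks can share the same first-listed artist, the `artists` list may contain repeats, and each repeat later triggers another top-tracks request. The sketch below shows one way to keep each artist once, keyed by ID. The function name `unique_primary_artists` and the sample payload are illustrative assumptions; only the dictionary shape is taken from the cell above.

```python
def unique_primary_artists(recommendations):
    """Collect each track's first-listed artist once, keyed by artist ID."""
    seen = {}
    for track in recommendations.get("tracks", []):
        if not track.get("artists"):
            continue  # skip tracks without artist information
        primary = track["artists"][0]
        if primary["id"] not in seen:
            seen[primary["id"]] = {
                "name": primary["name"],
                "id": primary["id"],
                "market": track.get("available_markets", []),
            }
    return list(seen.values())


# Hand-written sample in the same shape as the API response used above
sample = {
    "tracks": [
        {"artists": [{"name": "Kip Moore", "id": "2hJPr4lk7Q8SSvCVBl9fWM"}],
         "available_markets": ["AR", "AU"]},
        {"artists": [{"name": "The Weeknd", "id": "1Xyo4u8uXC1ZmMpatF05PJ"}],
         "available_markets": ["US"]},
        {"artists": [{"name": "Kip Moore", "id": "2hJPr4lk7Q8SSvCVBl9fWM"}],
         "available_markets": ["CA"]},
    ]
}
unique = unique_primary_artists(sample)
print([artist["name"] for artist in unique])
```

The first occurrence wins, so the `market` value belongs to the first track seen for that artist.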

In [0]:
# Displaying the list of artists (name, ID, and markets) collected from Spotify recommendations
artists

[{'name': 'Kip Moore',
  'id': '2hJPr4lk7Q8SSvCVBl9fWM',
  'market': ['AR',
   'AU',
   'AT',
   'BE',
   'BO',
   'BR',
   'BG',
   'CA',
   'CL',
   'CO',
   'CR',
   'CY',
   'CZ',
   'DK',
   'DO',
   'DE',
   'EC',
   'EE',
   'SV',
   'FI',
   'FR',
   'GR',
   'GT',
   'HN',
   'HK',
   'HU',
   'IS',
   'IE',
   'IT',
   'LV',
   'LT',
   'LU',
   'MY',
   'MT',
   'MX',
   'NL',
   'NZ',
   'NI',
   'NO',
   'PA',
   'PY',
   'PE',
   'PH',
   'PL',
   'PT',
   'SG',
   'SK',
   'ES',
   'SE',
   'CH',
   'TW',
   'TR',
   'UY',
   'US',
   'GB',
   'AD',
   'LI',
   'MC',
   'ID',
   'JP',
   'TH',
   'VN',
   'RO',
   'IL',
   'ZA',
   'SA',
   'AE',
   'BH',
   'QA',
   'OM',
   'KW',
   'EG',
   'MA',
   'DZ',
   'TN',
   'LB',
   'JO',
   'PS',
   'IN',
   'BY',
   'KZ',
   'MD',
   'UA',
   'AL',
   'BA',
   'HR',
   'ME',
   'MK',
   'RS',
   'SI',
   'KR',
   'BD',
   'PK',
   'LK',
   'GH',
   'KE',
   'NG',
   'TZ',
   'UG',
   'AG',
   'AM',
   'BS',
   'BB',
   'BZ

**Then, for each artist, we fetch the list of top tracks.**

In [0]:
tracks = []
for artist in artists:
    artist_id = artist["id"]
    # Fetching the top tracks for the given artist ID
    top_tracks = sp.artist_top_tracks(artist_id)
    # Appending the fetched top tracks to the tracks list
    tracks.append(top_tracks)
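
The loop above stops at the first failed request. A more forgiving variant, sketched below under stated assumptions, calls the API once per distinct artist ID and records failures instead of raising. `collect_top_tracks` and `FakeClient` are hypothetical names; `FakeClient` only mimics the `artist_top_tracks(artist_id, country=...)` method so the example runs offline.

```python
def collect_top_tracks(client, artist_ids, country="US"):
    """Call client.artist_top_tracks once per distinct artist ID.

    Returns (payloads, failed), where failed holds (artist_id, message)
    pairs for requests that raised.
    """
    payloads = []
    failed = []
    for artist_id in dict.fromkeys(artist_ids):  # de-duplicate, keep order
        try:
            payloads.append(client.artist_top_tracks(artist_id, country=country))
        except Exception as exc:  # deliberately broad in this sketch
            failed.append((artist_id, str(exc)))
    return payloads, failed


class FakeClient:
    """Offline stand-in for the Spotipy client (illustration only)."""

    def __init__(self):
        self.calls = []

    def artist_top_tracks(self, artist_id, country="US"):
        self.calls.append(artist_id)
        if artist_id == "bad-id":
            raise ValueError("artist not found")
        return {"tracks": [{"id": artist_id + "-t1"}]}


client = FakeClient()
payloads, failed = collect_top_tracks(client, ["a", "b", "a", "bad-id"])
print(len(payloads), failed)
```

With the real client, the call would look like `collect_top_tracks(sp, [a["id"] for a in artists])`.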

In [0]:
# Displaying the collected top tracks for each artist
tracks

[{'tracks': [{'album': {'album_type': 'album',
     'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/2hJPr4lk7Q8SSvCVBl9fWM'},
       'href': 'https://api.spotify.com/v1/artists/2hJPr4lk7Q8SSvCVBl9fWM',
       'id': '2hJPr4lk7Q8SSvCVBl9fWM',
       'name': 'Kip Moore',
       'type': 'artist',
       'uri': 'spotify:artist:2hJPr4lk7Q8SSvCVBl9fWM'}],
     'external_urls': {'spotify': 'https://open.spotify.com/album/191BU6Uvnf7oNTjO4n36Yu'},
     'href': 'https://api.spotify.com/v1/albums/191BU6Uvnf7oNTjO4n36Yu',
     'id': '191BU6Uvnf7oNTjO4n36Yu',
     'images': [{'height': 640,
       'url': 'https://i.scdn.co/image/ab67616d0000b27384550c77304d8c80bede4b20',
       'width': 640},
      {'height': 300,
       'url': 'https://i.scdn.co/image/ab67616d00001e0284550c77304d8c80bede4b20',
       'width': 300},
      {'height': 64,
       'url': 'https://i.scdn.co/image/ab67616d0000485184550c77304d8c80bede4b20',
       'width': 64}],
     'is_playable': True,
     'n

# DATA CLEANING AND TRANSFORMATION

**In this step, we use PySpark and define a schema to create a DataFrame, which turns the data into tabular format. We also clean the data by removing duplicate rows and by disallowing null values in the schema definition.**

In [0]:
# Importing necessary libraries for creating a SparkSession and defining the schema for a DataFrame
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, LongType, IntegerType



In [0]:
# Defining schema for the DataFrame
schema = StructType([
    StructField("Track ID", StringType(), nullable=False),
    StructField("Track Name", StringType(), nullable=False),
    StructField("Duration (ms)", LongType(), nullable=False),
    StructField("Popularity", IntegerType(), nullable=False),
    StructField("Album Name", StringType(), nullable=False),
    StructField("Album Type", StringType(), nullable=False),
    StructField("Release Date", StringType(), nullable=False),
    StructField("Artist", StringType(), nullable=False)
])

In [0]:
rows = []
for track in tracks:
    for track_data in track["tracks"]:
        track_id = track_data["id"]
        track_name = track_data["name"]
        track_duration = track_data["duration_ms"]
        track_popularity = track_data["popularity"]
        album_name = track_data["album"]["name"]
        album_type = track_data["album"]["album_type"]
        album_release_date = track_data["album"]["release_date"]
        # Combining artist names into a single string
        artist_names = ", ".join([artist["name"] for artist in track_data["artists"]])
        rows.append((track_id, track_name, track_duration, track_popularity, album_name, album_type, album_release_date, artist_names))
# Creating a DataFrame from the collected rows with the predefined schema
df = spark.createDataFrame(rows, schema)
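
Since every field in the schema is declared with `nullable=False`, a track with a missing value would typically make `createDataFrame` fail during schema verification. One way to guard against that, sketched here as an assumption rather than the notebook's own method, is to build each tuple with `.get()` and drop incomplete records first. The helper name `track_to_row` and the `incomplete` record are hypothetical.

```python
def track_to_row(track_data):
    """Return the 8-field tuple matching the schema, or None if a field is missing."""
    album = track_data.get("album") or {}
    row = (
        track_data.get("id"),
        track_data.get("name"),
        track_data.get("duration_ms"),
        track_data.get("popularity"),
        album.get("name"),
        album.get("album_type"),
        album.get("release_date"),
        # An empty artist list yields "", which is turned into None here
        ", ".join(a["name"] for a in track_data.get("artists", [])) or None,
    )
    return None if any(value is None for value in row) else row


complete = {
    "id": "7MXVkk9YMctZqd1Srtv4MB", "name": "Starboy",
    "duration_ms": 230453, "popularity": 88,
    "album": {"name": "Starboy", "album_type": "album", "release_date": "2016-11-25"},
    "artists": [{"name": "The Weeknd"}, {"name": "Daft Punk"}],
}
incomplete = {"id": "hypothetical-id", "name": "Untitled", "album": {}, "artists": []}

clean_rows = [row for row in map(track_to_row, [complete, incomplete]) if row is not None]
print(clean_rows)
```

The resulting list can be passed to `spark.createDataFrame(clean_rows, schema)` in place of `rows`.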

In [0]:
# Remove duplicate values from the DataFrame
df = df.dropDuplicates()
df.show()

+--------------------+--------------------+-------------+----------+--------------------+----------+------------+--------------------+
|            Track ID|          Track Name|Duration (ms)|Popularity|          Album Name|Album Type|Release Date|              Artist|
+--------------------+--------------------+-------------+----------+--------------------+----------+------------+--------------------+
|6Ymvlzom4TQeoKqAW...|Somethin' 'Bout A...|       213826|        67|        Up All Night|     album|  2012-01-01|           Kip Moore|
|53Ji6ZvVjsBl4pIui...|Bones (feat. OneR...|       205792|        52|              Church|     album|  2020-02-07|Galantis, OneRepu...|
|4WKOsXGqZiG5ihoL5...|   Heartbreak Anthem|       183725|        59|                  Rx|     album|  2024-05-17|Galantis, David G...|
|61voPX1C71rhwynuL...|          Beer Money|       218146|        55|        Up All Night|     album|  2012-01-01|           Kip Moore|
|46lFttIf5hnUZMGvj...|     Runaway (U & I)|       22707

In [0]:
# Sort the DataFrame by "Popularity" column in descending order
sorted_df = df.orderBy(col("Popularity").desc())

# Select specific columns and display the results
sorted_df.select("Track Name", "Popularity", "Release Date", "Artist", "Duration (ms)").show(truncate=False)

+--------------------------------------------------------------------------------------------------+----------+------------+----------------------------------+-------------+
|Track Name                                                                                        |Popularity|Release Date|Artist                            |Duration (ms)|
+--------------------------------------------------------------------------------------------------+----------+------------+----------------------------------+-------------+
|One Of The Girls (with JENNIE, Lily Rose Depp)                                                    |91        |2023-06-23  |The Weeknd, JENNIE, Lily-Rose Depp|244684       |
|My Love Mine All Mine                                                                             |89        |2023-09-15  |Mitski                            |137773       |
|Starboy                                                                                           |88        |2016-11-25  |The We

# BUILD THE DASHBOARD

**In this step, we build the dashboard from the dataset. We keep only tracks released after 2021-07-07 so that we can look at the productivity and popularity of artists over roughly the last three years.**

**Interpret the dashboard:**
1. Productivity vs. Popularity:

The bar graph shows the number of tracks per artist, which indicates productivity and output. The line graph overlays the average popularity score for each artist, allowing a direct comparison between productivity and the overall reception of the music. This can highlight artists who consistently produce popular music (high bars and high line points) versus those who release many tracks with lower average popularity (high bars and low line points).

2. Outliers and Trends:

The combined graph can quickly identify outliers:
Artists with a high number of tracks but unexpectedly low average popularity (potential for niche appeal or declining quality).
Artists with a low number of tracks but surprisingly high average popularity (potential emerging talent, viral hits, or very famous artists such as Taylor Swift).
It can also reveal trends:
Does a higher number of tracks generally correlate with higher or lower average popularity?
Are there any clusters of artists with similar productivity and popularity levels, suggesting genre or style influences?
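
The two series behind the combined chart, track count and average popularity per artist, can be sketched in plain Python as follows. This is an illustration under stated assumptions: the function name is hypothetical, the rows are a handful of values copied from this notebook's output, and grouping by the first name in the combined `Artist` string is a simplification that breaks for artist names containing ", ".

```python
from collections import defaultdict


def productivity_and_popularity(rows):
    """Map each first-listed artist to (track_count, average_popularity)."""
    scores = defaultdict(list)
    for artist, popularity in rows:
        lead = artist.split(", ")[0]  # assumption: first name is the lead artist
        scores[lead].append(popularity)
    return {name: (len(vals), sum(vals) / len(vals)) for name, vals in scores.items()}


# (Artist, Popularity) pairs taken from the filtered output of this notebook
sample_rows = [
    ("The Weeknd, JENNIE, Lily-Rose Depp", 91),
    ("Mitski", 89),
    ("The Weeknd, Playboi Carti, Madonna", 85),
    ("Metro Boomin, The Weeknd, 21 Savage", 83),
    ("Metro Boomin, Future, Chris Brown", 82),
]
summary = productivity_and_popularity(sample_rows)
for name, (count, average) in summary.items():
    print(name, count, average)
```

In PySpark, a comparable aggregation would be a `groupBy("Artist")` followed by `agg` with `count` and `avg` from `pyspark.sql.functions`, which groups by the full combined string instead.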

In [0]:
# Display tracks released after July 7, 2021, including their name, popularity, release date, artist, and duration
display(sorted_df.filter(col("Release Date") > "2021-07-07").select("Track Name", "Popularity", "Release Date", "Artist", "Duration (ms)"))

Track Name,Popularity,Release Date,Artist,Duration (ms)
"One Of The Girls (with JENNIE, Lily Rose Depp)",91,2023-06-23,"The Weeknd, JENNIE, Lily-Rose Depp",244684
My Love Mine All Mine,89,2023-09-15,Mitski,137773
I Remember Everything (feat. Kacey Musgraves),88,2023-08-25,"Zach Bryan, Kacey Musgraves",227195
Nasty,86,2024-04-12,Tinashe,176027
Popular (with Playboi Carti & Madonna) - From The Idol Vol. 1 (Music from the HBO Original Series),85,2023-06-02,"The Weeknd, Playboi Carti, Madonna",215466
Pour Me A Drink (Feat. Blake Shelton),85,2024-06-21,"Post Malone, Blake Shelton",195122
Dance The Night - From Barbie The Album,84,2023-05-25,Dua Lipa,176579
Creepin' (with The Weeknd & 21 Savage),83,2022-12-02,"Metro Boomin, The Weeknd, 21 Savage",221520
Shivers,82,2021-10-25,Ed Sheeran,207853
Superhero (Heroes & Villains) [with Future & Chris Brown],82,2022-12-02,"Metro Boomin, Future, Chris Brown",182666


Databricks visualization. Run in Databricks to view.
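
The filter above compares `Release Date` as a string. That works for full `YYYY-MM-DD` values, but Spotify album objects also carry a `release_date_precision` field, and a release date may be given as only a year or a year and month. The sketch below shows one possible normalization before comparing; the function names and the choice to pad with the first month and day are assumptions, not something the notebook does.

```python
def normalize_release_date(release_date):
    """Pad 'YYYY' or 'YYYY-MM' to 'YYYY-MM-DD' using the first month/day."""
    parts = release_date.split("-")
    if len(parts) == 1:
        return f"{parts[0]}-01-01"
    if len(parts) == 2:
        return f"{parts[0]}-{parts[1]}-01"
    return release_date


def released_after(release_date, cutoff="2021-07-07"):
    """ISO dates of equal length compare correctly as plain strings."""
    return normalize_release_date(release_date) > cutoff


for value in ["2012", "2021-08", "2021-10-25", "2023-06-23"]:
    print(value, normalize_release_date(value), released_after(value))
```

Padding to the start of the period is a conservative choice: a year-only "2021" is treated as 2021-01-01 and therefore excluded by the 2021-07-07 cutoff.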

# FETCHING DATA FROM SPOTIFY API TO CHECK THE TOP TRACK OF A SPECIFIC ARTIST

In addition, we can extract data for one specific artist and their top tracks to gain further insights.


In [0]:
# Fetching details for the artist with the given ID
sp.artist("1Xyo4u8uXC1ZmMpatF05PJ")

{'external_urls': {'spotify': 'https://open.spotify.com/artist/1Xyo4u8uXC1ZmMpatF05PJ'},
 'followers': {'href': None, 'total': 85718694},
 'genres': ['canadian contemporary r&b', 'canadian pop', 'pop'],
 'href': 'https://api.spotify.com/v1/artists/1Xyo4u8uXC1ZmMpatF05PJ',
 'id': '1Xyo4u8uXC1ZmMpatF05PJ',
 'images': [{'height': 640,
   'url': 'https://i.scdn.co/image/ab6761610000e5eb214f3cf1cbe7139c1e26ffbb',
   'width': 640},
  {'height': 320,
   'url': 'https://i.scdn.co/image/ab67616100005174214f3cf1cbe7139c1e26ffbb',
   'width': 320},
  {'height': 160,
   'url': 'https://i.scdn.co/image/ab6761610000f178214f3cf1cbe7139c1e26ffbb',
   'width': 160}],
 'name': 'The Weeknd',
 'popularity': 93,
 'type': 'artist',
 'uri': 'spotify:artist:1Xyo4u8uXC1ZmMpatF05PJ'}

In [0]:
# Fetching the top tracks for the artist with the given ID (The Weeknd in this case)
tracks = []
artist_id = "1Xyo4u8uXC1ZmMpatF05PJ"
# Fetching the top tracks for the given artist ID
top_tracks = sp.artist_top_tracks(artist_id)
# Appending the fetched top tracks to the tracks list
tracks.append(top_tracks)

In [0]:
# Displaying the collected top tracks for the artist with ID given above
tracks

[{'tracks': [{'album': {'album_type': 'single',
     'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/1Xyo4u8uXC1ZmMpatF05PJ'},
       'href': 'https://api.spotify.com/v1/artists/1Xyo4u8uXC1ZmMpatF05PJ',
       'id': '1Xyo4u8uXC1ZmMpatF05PJ',
       'name': 'The Weeknd',
       'type': 'artist',
       'uri': 'spotify:artist:1Xyo4u8uXC1ZmMpatF05PJ'},
      {'external_urls': {'spotify': 'https://open.spotify.com/artist/250b0Wlc5Vk0CoUsaCY84M'},
       'href': 'https://api.spotify.com/v1/artists/250b0Wlc5Vk0CoUsaCY84M',
       'id': '250b0Wlc5Vk0CoUsaCY84M',
       'name': 'JENNIE',
       'type': 'artist',
       'uri': 'spotify:artist:250b0Wlc5Vk0CoUsaCY84M'},
      {'external_urls': {'spotify': 'https://open.spotify.com/artist/1pBLC0qVRTB5zVMuteQ9jJ'},
       'href': 'https://api.spotify.com/v1/artists/1pBLC0qVRTB5zVMuteQ9jJ',
       'id': '1pBLC0qVRTB5zVMuteQ9jJ',
       'name': 'Lily-Rose Depp',
       'type': 'artist',
       'uri': 'spotify:artist:1pBLC0q

# DATA CLEANING AND TRANSFORMATION


In [0]:
# Importing necessary libraries for creating a SparkSession and defining the schema for a DataFrame
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, LongType, IntegerType



In [0]:
# Defining schema for the DataFrame
schema = StructType([
    StructField("Track ID", StringType(), nullable=False),
    StructField("Track Name", StringType(), nullable=False),
    StructField("Duration (ms)", LongType(), nullable=False),
    StructField("Popularity", IntegerType(), nullable=False),
    StructField("Album Name", StringType(), nullable=False),
    StructField("Album Type", StringType(), nullable=False),
    StructField("Release Date", StringType(), nullable=False),
    StructField("Artist", StringType(), nullable=False)
])

In [0]:
rows = []
for track in tracks:
    for track_data in track["tracks"]:
        track_id = track_data["id"]
        track_name = track_data["name"]
        track_duration = track_data["duration_ms"]
        track_popularity = track_data["popularity"]
        album_name = track_data["album"]["name"]
        album_type = track_data["album"]["album_type"]
        album_release_date = track_data["album"]["release_date"]
        # Combining artist names into a single string
        artist_names = ", ".join([artist["name"] for artist in track_data["artists"]])
        rows.append((track_id, track_name, track_duration, track_popularity, album_name, album_type, album_release_date, artist_names))
# Creating a DataFrame from the collected rows with the predefined schema
df = spark.createDataFrame(rows, schema)

# BUILD THE DASHBOARD

**Interpret the dashboard:**
1. The higher points on the graph represent The Weeknd's most popular tracks over the years: One Of The Girls, Blinding Lights, and Starboy.
2. Evolution of Sound:
A generally rising popularity score across The Weeknd's releases may suggest successful adherence to a particular style, while the fluctuations could indicate experimentation or changes in musical direction.

3. Audience Reception:
A high and consistent popularity score also suggests a strong and loyal fan base.

In [0]:
# Remove duplicate rows from the DataFrame and display the tracks ordered by release date
df = df.dropDuplicates()
display(df.orderBy(col("Release Date").asc()))

Track ID,Track Name,Duration (ms),Popularity,Album Name,Album Type,Release Date,Artist
7fBv7CLKzipRk6EC6TWHOB,The Hills,242253,84,Beauty Behind The Madness,album,2015-08-28,The Weeknd
2LBqCSwhJGcFQeTHMVGwy3,Die For You,260253,84,Starboy,album,2016-11-24,The Weeknd
5gDWsRxpJ2lZAffh5p7K0w,Stargirl Interlude,111626,83,Starboy,album,2016-11-24,"The Weeknd, Lana Del Rey"
7MXVkk9YMctZqd1Srtv4MB,Starboy,230453,88,Starboy,album,2016-11-25,"The Weeknd, Daft Punk"
09mEdoA6zrmBPgTEN5qXmN,Call Out My Name,228373,83,"My Dear Melancholy,",album,2018-03-30,The Weeknd
0VjIjW4GlUZAMYd2vXMi3b,Blinding Lights,200040,88,After Hours,album,2020-03-20,The Weeknd
5QO79kh1waicV47BqGRL3g,Save Your Tears,215626,84,After Hours,album,2020-03-20,The Weeknd
2dHHgzDwk4BJdRwy9uXhTO,Creepin' (with The Weeknd & 21 Savage),221520,83,HEROES & VILLAINS,album,2022-12-02,"Metro Boomin, The Weeknd, 21 Savage"
6WzRpISELf3YglGAh7TXcG,Popular (with Playboi Carti & Madonna) - From The Idol Vol. 1 (Music from the HBO Original Series),215466,85,Popular (Music from the HBO Original Series),single,2023-06-02,"The Weeknd, Playboi Carti, Madonna"
7CyPwkp0oE8Ro9Dd5CUDjW,"One Of The Girls (with JENNIE, Lily Rose Depp)",244684,91,The Idol Episode 4 (Music from the HBO Original Series),single,2023-06-23,"The Weeknd, JENNIE, Lily-Rose Depp"


Databricks visualization. Run in Databricks to view.
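
To check the trend reading against the numbers, the ten rows shown above can be averaged by release year. This is a minimal sketch: the function name is hypothetical and the input is just the (release date, popularity) pairs copied from the output table. Note that the popularity score is the value at fetch time, not a historical one.

```python
from collections import defaultdict

# (Release Date, Popularity) pairs copied from the output table above
weeknd_rows = [
    ("2015-08-28", 84), ("2016-11-24", 84), ("2016-11-24", 83),
    ("2016-11-25", 88), ("2018-03-30", 83), ("2020-03-20", 88),
    ("2020-03-20", 84), ("2022-12-02", 83), ("2023-06-02", 85),
    ("2023-06-23", 91),
]


def average_popularity_by_year(rows):
    """Group rows by the year prefix of the release date and average popularity."""
    by_year = defaultdict(list)
    for release_date, popularity in rows:
        by_year[release_date[:4]].append(popularity)
    return {year: sum(vals) / len(vals) for year, vals in sorted(by_year.items())}


yearly = average_popularity_by_year(weeknd_rows)
for year, average in yearly.items():
    print(year, average)
```

On these ten rows the yearly averages stay within a narrow band (83 to 88), which fits the "high and consistent" reading more directly than a steady increase.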

# FETCHING DATA TO TRACK POPULARITY SCORES OF SELECTED GENRES ACROSS COMBINATIONS OF MARKETS

In [0]:
# Row is needed to build the records below
from pyspark.sql import Row

# Fetching recommendations including popularity score
results = sp.recommendations(seed_genres=["pop"], limit=100)

# Filtering available markets and converting to a Spark DataFrame with popularity score
desired_markets = {"US", "CA", "AU", "BR", "KR", "JP", "FR"}
rows = [
    Row(
        name=track["artists"][0]["name"], 
        id=track["artists"][0]["id"], 
        market=[market for market in track["available_markets"] if market in desired_markets], 
        popularity=track["popularity"]
    ) 
    for track in results["tracks"]
]
df = spark.createDataFrame(rows)

display(df)

name,id,market,popularity
Calvin Harris,7CajNmpbOovFoOoasH2HaY,"List(AU, BR, CA, FR, US, JP)",83
Miley Cyrus,5YGY8feqx7naU7z4HrwZM6,"List(AU, BR, CA, FR, US, JP)",76
Snakehips,2FwJwEswyIUAljqgjNSHgP,"List(AU, BR, CA, FR, US, JP)",51
Bryson Tiller,2EMAnMvWE2eb56ToJVfCWs,"List(AU, BR, CA, FR, US, JP)",80
Selena Gomez,0C8ZW7ezQVs4URX5aX7Kqx,"List(AU, BR, FR, KR)",65
Future,1RyvyyTE3xzB2ZywiAwp0i,"List(AU, BR, CA, FR, US, JP)",61
ScHoolboy Q,5IcR3N7QB1j6KBL8eImZ8m,"List(AU, BR, CA, FR, US, JP, KR)",58
Katy Perry,6jJ0s89eD6GaHleKKya26X,"List(AU, BR, CA, FR, US, JP, KR)",74
XXXTENTACION,15UsOTVnJzReFVN1VCnxy4,"List(AU, BR, CA, FR, US, JP, KR)",82
The Chainsmokers,69GGBxA162lTqCwzJG5jLp,List(),0


Databricks visualization. Run in Databricks to view.
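
The same row-building logic is repeated for every genre in this section. A small helper, sketched below, would let each genre cell share it. The names `market_rows` and `DESIRED_MARKETS`, the sorting of market codes, and the hand-written sample payload are assumptions for illustration; only the field names come from the cell above.

```python
DESIRED_MARKETS = ("US", "CA", "AU", "BR", "KR", "JP", "FR")


def market_rows(recommendations, desired_markets=DESIRED_MARKETS):
    """Build one plain dict per recommended track, keeping only desired markets."""
    wanted = set(desired_markets)
    rows = []
    for track in recommendations["tracks"]:
        lead = track["artists"][0]
        rows.append({
            "name": lead["name"],
            "id": lead["id"],
            # Sorting makes equal market combinations compare equal later on
            "market": sorted(m for m in track["available_markets"] if m in wanted),
            "popularity": track["popularity"],
        })
    return rows


# Hand-written sample in the shape of the recommendations response
sample = {
    "tracks": [
        {"artists": [{"name": "Calvin Harris", "id": "7CajNmpbOovFoOoasH2HaY"}],
         "available_markets": ["AU", "BR", "CA", "FR", "US", "JP", "DE"],
         "popularity": 83},
        {"artists": [{"name": "The Chainsmokers", "id": "69GGBxA162lTqCwzJG5jLp"}],
         "available_markets": [],
         "popularity": 0},
    ]
}
pop_rows = market_rows(sample)
print(pop_rows)
```

Each dict could then be turned into a Spark `Row` with `Row(**row)` before calling `spark.createDataFrame`.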

**Interpret the popularity of POP genre across markets**

Consistent Popularity: Pop music generally enjoys consistent popularity across different market combinations. Most groups of markets show an average popularity score between 60 and 80. This suggests that pop music has a broad international appeal and is well-received in various parts of the world.

Slight Variations: There are some minor fluctuations in popularity depending on the specific market combination. For instance, pop music appears slightly more popular when released in combinations that include the United States, Australia, Brazil, and South Korea, which appear mostly in the groups with high popularity scores. The United States stands out in particular: music released only in the US still scores fairly well, at approximately the median of the data set.
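
The "average popularity per market combination" behind this interpretation can be sketched as a simple group-by. The function name is hypothetical, and the input is only the ten pop rows visible in the output above, not the full 100-track result, so the numbers are illustrative.

```python
from collections import defaultdict

SIX = ["AU", "BR", "CA", "FR", "US", "JP"]
SEVEN = SIX + ["KR"]

# (market list, popularity) pairs copied from the pop output shown earlier
pop_sample = [
    (SIX, 83), (SIX, 76), (SIX, 51), (SIX, 80), (["AU", "BR", "FR", "KR"], 65),
    (SIX, 61), (SEVEN, 58), (SEVEN, 74), (SEVEN, 82), ([], 0),
]


def average_by_market_combination(rows):
    """Average popularity for each distinct (sorted) combination of markets."""
    groups = defaultdict(list)
    for markets, popularity in rows:
        groups[tuple(sorted(markets))].append(popularity)
    return {combo: sum(vals) / len(vals) for combo, vals in groups.items()}


combo_averages = average_by_market_combination(pop_sample)
for combo, average in combo_averages.items():
    print(combo, round(average, 1))
```

Sorting the market codes before grouping ensures that the same set of markets listed in a different order lands in the same group.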



In [0]:

# Fetching recommendations including popularity score
results = sp.recommendations(seed_genres=["indie",], limit=100)

# Filtering available markets and converting to a Spark DataFrame with popularity score
desired_markets = {"US", "CA", "AU", "BR", "KR", "JP", "FR"}
rows = [
    Row(
        name=track["artists"][0]["name"], 
        id=track["artists"][0]["id"], 
        market=[market for market in track["available_markets"] if market in desired_markets], 
        popularity=track["popularity"]
    ) 
    for track in results["tracks"]
]
df = spark.createDataFrame(rows)

display(df)

name,id,market,popularity
Bibio,0qzzGu8qpbXYpzgV52wOFT,List(JP),16
Cold War Kids,6VDdCwrBM4qQaGxoAyxyJC,"List(AU, BR, CA, FR, US, JP, KR)",55
Beat Happening,1qHR9DMfOJQjvWLEfMZQlG,List(),0
The Districts,3HZgaiR960RFqx9d4LPraD,"List(AU, BR, CA, FR, US, JP, KR)",27
Of Monsters and Men,4dwdTW1Lfiq0cM8nBAqIIz,"List(CA, US)",62
Pinegrove,2gbT6GPXMis0OAkZbEQCYB,List(),0
Radiohead,4Z8W4fKeB5YxbusRsdQVPb,List(),0
Another Sunny Day,6EGQKKjGZOxDJ1iy7Pw25M,"List(AU, BR, CA, FR, US, JP, KR)",37
Rilo Kiley,2cevwbv7ISD92VMNLYLHZA,"List(AU, BR, CA, FR, US, JP, KR)",49
The Neighbourhood,77SW9BnxLY8rJ0RciFqkHh,List(),0


Databricks visualization. Run in Databricks to view.

**Interpret the popularity of INDIE genre across markets**

Consistent Popularity: Indie music generally enjoys a consistent level of popularity across different market combinations, mostly ranging between 50 and 70. This suggests a steady global appeal, though not as high as some other genres might experience.

Impact of Specific Markets:

US Market: Including the United States market seems to boost popularity slightly, as combinations with United States tend to be on the higher end of the range.


Limited Markets: Releasing only in Canada, Japan, or Australia is associated with lower average popularity.

Market Combinations: Some specific combinations (e.g., AU, BR, CA, US) seem to perform slightly better than others.

Overall: The chart indicates that Indie music maintains a decent level of popularity globally. However, strategic market selection might play a role in maximizing its reach and potential success.
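
Several rows in the indie output have an empty market list and a popularity of 0, which pulls averages down. The sketch below compares the mean with and without those rows. It is an illustration only: the function name is hypothetical, and the data is the ten rows shown in the output above rather than the full result.

```python
SEVEN = ["AU", "BR", "CA", "FR", "US", "JP", "KR"]

# (market list, popularity) pairs copied from the indie output shown earlier
indie_sample = [
    (["JP"], 16), (SEVEN, 55), ([], 0), (SEVEN, 27), (["CA", "US"], 62),
    ([], 0), ([], 0), (SEVEN, 37), (SEVEN, 49), ([], 0),
]


def mean_popularity(rows, skip_unavailable=False):
    """Mean popularity, optionally ignoring rows with no desired market."""
    scores = [pop for markets, pop in rows if markets or not skip_unavailable]
    return sum(scores) / len(scores) if scores else None


print(mean_popularity(indie_sample))
print(mean_popularity(indie_sample, skip_unavailable=True))
```

Whether to exclude such rows is a judgment call; the point is that the choice changes the averages noticeably on this sample.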



In [0]:
# Fetching recommendations including popularity score
results = sp.recommendations(seed_genres=["hip-hop",], limit=100)

# Filtering available markets and converting to a Spark DataFrame with popularity score
desired_markets = {"US", "CA", "AU", "BR", "KR", "JP", "FR"}
rows = [
    Row(
        name=track["artists"][0]["name"], 
        id=track["artists"][0]["id"], 
        market=[market for market in track["available_markets"] if market in desired_markets], 
        popularity=track["popularity"]
    ) 
    for track in results["tracks"]
]
df = spark.createDataFrame(rows)

display(df)

name,id,market,popularity
JID,6U3ybJ9UHNKEdsH7ktGBZ7,"List(AU, BR, CA, FR, US, JP, KR)",59
Busta Rhymes,1YfEcTuGvBQ8xSD1f53UnK,"List(AU, BR, CA, FR, US, JP, KR)",57
Dr. Dre,6DPYiyq5kWVQS4RGwxzPC7,List(),0
Snoop Dogg,7hJcb9fa4alzcOq3EaNPoG,"List(CA, KR, US)",67
6ix9ine,7gZfnEnfiaHzxARJ2LeXrf,List(),2
Rocko,0T5OJgMVjKIX3b3W3ekqOl,"List(AU, BR, CA, FR, US, JP, KR)",55
Mos Def,0Mz5XE0kb1GBnbLQm2VbcO,"List(AU, BR, CA, FR, US, JP, KR)",68
YG,0A0FS04o6zMoto8OKPsDwY,List(),0
Gucci Mane,13y7CgLHjMVRMDqxdx0Xdo,"List(AU, BR, CA, FR, US, JP, KR)",53
Westside Connection,3zNM2tRfTX6LI1lN2PlrTt,List(US),56


Databricks visualization. Run in Databricks to view.

**Interpret the popularity of HIP-HOP genre across markets**

Consistent Popularity: Hip Hop generally maintains a good level of popularity across most market combinations, often falling within the 40-60 range. This suggests a broad international appeal, although not as universally high as pop or indie music might be.

Impact of Specific Markets:

US Market: The US market seems particularly important for Hip Hop, as its inclusion generally boosts the average popularity significantly. Most combinations featuring "US" rank higher.

Other Markets: Australia (AU) and Canada (CA) also appear to positively influence popularity, though to a lesser extent than the US.

Japan and Korea (JP, KR): Interestingly, including Japan and Korea doesn't seem to affect popularity as much as other markets. However, they still perform well when combined with the larger markets.

Combined Markets: Certain market combinations, like AU, BR, CA, and US, or AU, BR, CA, FR, and US, seem to perform particularly well for Hip Hop. This suggests a synergistic effect when targeting these markets together.

Overall: The chart demonstrates that Hip Hop has a solid global following. However, strategic market selection seems crucial, with the US being a particularly influential factor in maximizing the genre's popularity. Combining certain markets can further enhance success.
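
One way to probe the "US boosts popularity" reading is to compare average popularity for rows that include a market with rows that do not. The sketch below does this on the ten hip-hop rows shown in the output above; the function name is hypothetical and the sample is too small to draw conclusions from.

```python
def average_with_and_without(rows, market):
    """Return (mean popularity with `market`, mean popularity without it)."""
    with_market = [pop for markets, pop in rows if market in markets]
    without_market = [pop for markets, pop in rows if market not in markets]

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(with_market), mean(without_market)


ALL_SEVEN = ["AU", "BR", "CA", "FR", "US", "JP", "KR"]

# (market list, popularity) pairs copied from the hip-hop output shown earlier
hip_hop_sample = [
    (ALL_SEVEN, 59), (ALL_SEVEN, 57), ([], 0), (["CA", "KR", "US"], 67), ([], 2),
    (ALL_SEVEN, 55), (ALL_SEVEN, 68), ([], 0), (ALL_SEVEN, 53), (["US"], 56),
]
with_us, without_us = average_with_and_without(hip_hop_sample, "US")
print(round(with_us, 1), round(without_us, 1))
```

In this small sample, every row without the US also has an empty market list, so the gap may reflect unavailable tracks rather than a US effect. Checking this on the full data would be a sensible next step.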

In [0]:
# Fetching recommendations including popularity score
results = sp.recommendations(seed_genres=["r-n-b",], limit=100)

# Filtering available markets and converting to a Spark DataFrame with popularity score
desired_markets = {"US", "CA", "AU", "BR", "KR", "JP", "FR"}
rows = [
    Row(
        name=track["artists"][0]["name"], 
        id=track["artists"][0]["id"], 
        market=[market for market in track["available_markets"] if market in desired_markets], 
        popularity=track["popularity"]
    ) 
    for track in results["tracks"]
]
df = spark.createDataFrame(rows)

display(df)

name,id,market,popularity
K. Michelle,2retT7MFwHDVTeGKDdybEx,"List(AU, BR, CA, FR, US, JP, KR)",54
Ella Mai,7HkdQ0gt53LP4zmHsL0nap,"List(AU, BR, CA, FR, US, JP, KR)",41
Jagged Edge,7Aq8lpLMSt1Zxu56pe9bmp,"List(AU, BR, CA, FR, US, JP)",56
The Internet,7GN9PivdemQRKjDt4z5Zv8,"List(AU, BR, CA, FR, US, JP)",46
Mabel,1MIVXf74SZHmTIp4V4paH4,List(),0
Zendaya,6sCbFbEjbYepqswM1vWjjs,"List(AU, BR, CA, FR, US, JP, KR)",55
Nicki Minaj,0hCNtLu0JehylgoiP8L4Gh,"List(AU, BR, CA, FR, US, JP, KR)",34
Alina Baraz,6hfwwpXqZPRC9CsKI7qtv1,"List(AU, BR, CA, FR, US, JP, KR)",65
Craig David,2JyWXPbkqI5ZJa3gwqVa0c,List(),0
Charlie Wilson,6CxZzQFUTM6AzgluGwtq5w,"List(AU, BR, CA, FR, US, JP)",45


Databricks visualization. Run in Databricks to view.

**Interpret the popularity of R&B genre across markets**

Importance of the US Market: R&B music's popularity significantly increases when released in the US market. Combinations including the US generally have the highest average popularity scores.

Gradual Increase with Market Expansion: The popularity generally increases as the music is released in more markets, particularly when adding the US to the mix. This suggests that a broader release strategy is beneficial.

Specific Market Combinations: Some combinations seem to perform better than others. Notably, the combination of BR, FR, and US seems to be the most successful.

Impact of Individual Markets: While the US has the most significant impact, adding other markets like Australia, Brazil, and Canada seems to contribute positively to popularity.

Lower Popularity in Limited Markets: Releasing R&B music in only one market (e.g., only Brazil or sometimes France) results in the lowest average popularity.

Overall: The chart indicates that R&B music enjoys greater popularity when released in multiple markets, with a clear emphasis on the US market. Strategic market selection plays a crucial role in maximizing the genre's success.
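
The "gradual increase with market expansion" observation can be checked by averaging popularity per number of markets. The sketch below uses only the ten R&B rows shown in the output above, and the function name is hypothetical, so treat the result as an illustration of the method rather than a finding.

```python
from collections import defaultdict

SIX = ["AU", "BR", "CA", "FR", "US", "JP"]
SEVEN = SIX + ["KR"]

# (market list, popularity) pairs copied from the R&B output shown earlier
rnb_sample = [
    (SEVEN, 54), (SEVEN, 41), (SIX, 56), (SIX, 46), ([], 0),
    (SEVEN, 55), (SEVEN, 34), (SEVEN, 65), ([], 0), (SIX, 45),
]


def average_by_market_count(rows):
    """Average popularity keyed by how many desired markets a track is in."""
    buckets = defaultdict(list)
    for markets, popularity in rows:
        buckets[len(markets)].append(popularity)
    return {count: sum(vals) / len(vals) for count, vals in sorted(buckets.items())}


count_averages = average_by_market_count(rnb_sample)
for count, average in count_averages.items():
    print(count, round(average, 1))
```

On these ten rows, the six-market and seven-market groups average about the same, and the zero-market rows account for most of the apparent gap.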

