**Assignment: A MongoDB JSON Document Database using Spotify API**

#Batch Processing using MongoDB and the Spotify API

Batch data processing involves collecting, storing, and processing data in large groups or "batches" rather than handling data in real-time as it comes in. This approach allows for more efficient handling of data, especially when dealing with large volumes of information that don’t need to be processed immediately. In this assignment, you will work with both MongoDB, a NoSQL database, and the Spotify API to implement batch processing techniques that involve collecting and analyzing music-related data.
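To make the idea of a "batch" concrete, here is a minimal sketch that groups a stream of records into fixed-size batches. The helper name `iter_batches` and the sample records are illustrative assumptions and are not part of the assignment code.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `batch_size` records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


# Hypothetical records standing in for albums collected from an API.
sample_records = [{"album": n} for n in range(7)]
batches = list(iter_batches(sample_records, batch_size=3))
print([len(b) for b in batches])
```

Each batch could then be stored or analyzed in one step instead of handling records one at a time.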

These are the steps you will need to follow:
*   [1 - Install dependencies](#1)
*   [2 - Create an Atlas Client on MongoDB](#2)
*   [3 - Create a Spotify APP](#3)
*   [4 - Connect to your app using Spotify's SDK](#4)
*   [5 - Get new releases data from Spotify API](#5)
*   [6 - Explore your MongoDB collection](#6)
*   [7 - Get all albums from the featured Artists](#7)
*   [8 - Create New MongoDB collection](#8)
*   [9 - Explore your data!](#9)
*   [10 - Create an interactive map using Folium!](#10)

**IMPORTANT!!!!**
## During the course of this assignment, you will encounter the word `None` in several places. Each time you see `None`, replace it with the appropriate variable, method, string, or value for that specific code snippet, unless the `None` is used as a return value to indicate the absence of a result. In that case, `None` is intentionally returned to signify that a result could not be obtained.

<a name='1'></a>
#1 - Install spotify sdk and pymongo in your Google Colab env

In [None]:
!pip install spotipy pymongo --upgrade

Collecting spotipy
  Downloading spotipy-2.24.0-py3-none-any.whl.metadata (4.9 kB)
Collecting pymongo
  Downloading pymongo-4.10.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (22 kB)
Collecting redis>=3.5.3 (from spotipy)
  Downloading redis-5.2.0-py3-none-any.whl.metadata (9.1 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
  Downloading dnspython-2.7.0-py3-none-any.whl.metadata (5.8 kB)
Downloading spotipy-2.24.0-py3-none-any.whl (30 kB)
Downloading pymongo-4.10.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.4 MB)
Downloading dnspython-2.7.0-py3-none-any.whl (313 kB)
Downloading redis-5.2.0-py3-none-any.whl (261 kB)

<a name='2'></a>
#2 - Create an Atlas Client on MongoDB

To sign up for a free MongoDB account, go to https://mongodb.com, then create a new free account. Once your account is set up, you will be taken to the screen to create your cluster. Use the default settings for their free Atlas cluster (M0, as they refer to it) and click Create Cluster to get started. This takes you to the Clusters page while your new cluster is being created, which takes several minutes.

###Create your Database User and whitelist your IP address
Next, in the Atlas tab Security Quickstart, you will need to complete additional steps to get up and running:

*	Add your username and password, then click Create User—This enables you to log into your cluster.
*	Keep My Local Environment—This means adding your network IP addresses to the IP Access List. This can be modified at any time.
*	Click on Add My Current IP Address—This is a security measure that ensures only the IP addresses you verify are allowed to interact with your cluster. To connect to this cluster from multiple locations (school, home, work, etc.), you will need to whitelist each IP address from which you intend to connect.
Finally, click on Finish and Close.

###Connect to your Cluster

Go to Databases. Click Connect to continue. Connecting to a MongoDB Atlas database from Python requires a connection string. To get your connection string, click **Connect Your Application**. In **Select your driver and version**, choose Python 3.6 or later. Your connection string will appear below in **Add your connection string into your application code**. Click COPY to copy the string. Paste this string into the keys.py file as the value of mongo_connection_string. Replace “<PASSWORD>” in the connection string with your password, and replace the database name “myFirstDatabase” with “mySpotifyDatabase”, which will be the database name in this assignment. At the bottom of the Connect to YourClusterName dialog, click Close. You are now ready to interact with your Atlas cluster.
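If your password contains characters such as `@` or `/`, they need to be percent-escaped before they go into the connection string. The sketch below shows one way to assemble the string with `urllib.parse.quote_plus`. The helper name `build_mongo_uri`, the user name, the password, and the host are placeholders chosen for illustration, not values from this assignment.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username: str, password: str, host: str) -> str:
    """Return an SRV connection string with percent-escaped credentials."""
    return (
        f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}/?retryWrites=true&w=majority"
    )


# Placeholder values only; substitute your own user, password and cluster host.
example_uri = build_mongo_uri("myUser", "p@ss/word", "cluster0.example.mongodb.net")
print(example_uri)
```

The resulting string can be assigned to `MONGO_STRING` in the cell below. Keeping real credentials out of shared notebooks is a good habit.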


In [None]:
MONGO_STRING = "mongodb+srv://eantia:Test123@cluster0.fo3ls.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0" #Include your mongo connection string here

In [None]:
from pymongo import MongoClient
#START YOUR CODE HERE
atlas_client = MongoClient(MONGO_STRING)   #Pass your cluster connection string to the client method
#END YOUR CODE HERE

In [None]:
#START YOUR CODE HERE
database = atlas_client["mySpotifyDatabase"]                            #Create a database object and name it for your atlas_client
featured_albums_collection = database["MySpotifyCollections"]      #Select a name for your collection
#END YOUR CODE HERE



In [None]:
# atlas_client.close()

<a name='3'></a>
#3 - Create a Spotify APP

To get access to Spotify's API resources, you need to create a Spotify account if you don't already have one. A trial account will be enough to complete this lab.

1. Go to https://developer.spotify.com/, create an account and log in.
2. Click on the account name in the right-top corner and then click on **Dashboard**.
3. Create a new APP using the following details:
   - App name: Choose any name, but make sure you use only an alphanumeric string without special characters
   - App description: `DBMS test API application`
   - Website: leave empty
   - Redirect URIs: `http://localhost:6000`
   - API to use: select `Web API`
4. Click on the **Save** button. If you get an error message saying that your account is not ready, log out, wait a few minutes, and then repeat steps 2-4.
5. In the App Home page, click on **Settings** and reveal `Client ID` and `Client secret`. Make sure you copy those and save them in a separate file!


Here's the link to [the Spotify API documentation](https://developer.spotify.com/documentation/web-api/tutorials/getting-started) that you can refer to while you're working on this assignment.

<a name='4'></a>
#4 - Create a Spotify ClientCredential object using the SDK

The Spotipy SDK is a Python client for interacting with Spotify’s Web API. It provides a range of functions to access and manage data related to artists, albums, tracks, playlists, and user profiles. Here’s an overview of some key capabilities that you will explore in this assignment:





*   **Accessing Artist Information**: With Spotipy, you can retrieve detailed information about artists, including their name, genres, popularity score, and followers. The SDK also allows access to an artist's top tracks and related artists, which can help students explore music trends and build up artist profiles for batch storage in MongoDB.

*   **Track and Album Metadata**: Spotipy enables access to metadata for tracks and albums, such as track name, album name, release date, and track popularity. Additionally, you can retrieve audio features like tempo, danceability, and energy, which provide in-depth details about the music and are valuable for data analysis.

*   **Searching for Content**: Using Spotipy’s search functionality, you can query the Spotify catalog by keywords for artists, albums, playlists, or tracks. This can be instrumental in batch processing, as users can search for multiple artists or songs and gather relevant data in one go.

*   **User Profile and Playlist Management**: Spotipy also supports accessing Spotify user profiles and playlists, though this is less relevant for the assignment. However, this feature could provide additional context or personalization if students wanted to explore user-based music preferences.


*   **Authorization and Access Control**: Spotipy handles authorization with Spotify’s OAuth, ensuring that only authenticated requests are made. This allows students to securely access data and manage the rate limits associated with the Spotify API.

In [None]:
import spotipy
import pandas as pd
from spotipy.oauth2 import SpotifyClientCredentials

The first step in working with an API is understanding its authentication process. For Spotify, this involves using a Client ID and Client Secret generated by the Spotify app to obtain an access token. The access token is a string containing the credentials and permissions required to access specific resources. For more information, refer to the Spotify [API documentation](https://developer.spotify.com/documentation/web-api/concepts/access-token).

Since each API is designed with unique features, it’s essential to review its documentation thoroughly to access data responsibly. Throughout this lab, you’ll find links to documentation; it’s recommended to review these during and after the session as needed.

Now, let’s create variables to store the client_id and client_secret values.

In [None]:
CLIENT_ID = 'ddccff70f43b4087b622d0c720b39879'     #Include your client ID here
CLIENT_SECRET = 'f768e2b86b60467b868b42432ad45e4b' #Include your client Secret here

In [None]:
credentials = SpotifyClientCredentials(
        client_id=CLIENT_ID,
        client_secret=CLIENT_SECRET
    )

spotify = spotipy.Spotify(client_credentials_manager=credentials, language='en')  #You can change this if you want to get data in a different language

When working with the Spotify API, you'll receive a temporary access token, with its validity period specified in the `expires_in` field (in seconds). Once this token expires, any subsequent requests will fail and return an error with a status code of 401, indicating that the request is unauthorized.

For each API request you send to Spotify, the access token must be included in the request’s authorization header. Spotipy adds this header for you on every call made through the `spotify` client. A helper such as `get_auth_header`, which takes the access token as input and returns a properly formatted authorization header, is only needed if you send raw HTTP requests yourself.
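As an illustration of the authorization header described above, here is a minimal sketch of such a helper. The notebook does not include this function, so the implementation below is an assumption based on the standard `Bearer` token scheme, and the token value is a placeholder.

```python
def get_auth_header(access_token: str) -> dict:
    """Return the HTTP header that carries a bearer access token."""
    return {"Authorization": f"Bearer {access_token}"}


# Placeholder token for illustration; a real one comes from the credentials manager.
header = get_auth_header("example-token")
print(header)
```

A dictionary like this would be passed as the `headers` argument of an HTTP client when calling the Web API directly.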

**If you get a 401 response, please make sure to create your access token again by executing the code below!**

In [None]:
credentials.get_access_token()



{'access_token': 'BQA27cE9d1H2_LTyBr7mKNKR5BunEi_4nCjz5PZUr9RGZNjEaRLEPEYfyGfHwHKbJM8FbluinvIpQgK9Iw3y2uF1VRXlQQbjjLo8Q40L0eYeauWPrRY',
 'token_type': 'Bearer',
 'expires_in': 3600,
 'expires_at': 1732510831}

The response above includes `expires_in`, the lifetime of the token in seconds (3600 here, i.e. one hour), and `expires_at`, the Unix timestamp at which it expires. Once the token expires, you will need to create a new one.
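To check whether a token is still usable, you can compare `expires_at` with the current time. The sketch below assumes a token dictionary shaped like the output above. The function name `token_is_expired` and the 60-second safety margin are illustrative choices.

```python
import time
from typing import Optional


def token_is_expired(token_info: dict, now: Optional[float] = None,
                     margin_seconds: int = 60) -> bool:
    """Return True if the token expires within `margin_seconds` of `now`."""
    now = time.time() if now is None else now
    return token_info["expires_at"] - now < margin_seconds


# Values copied from the token output shown above.
token_info = {"expires_in": 3600, "expires_at": 1732510831}
fresh = token_is_expired(token_info, now=1732510831 - 3600)   # just issued
stale = token_is_expired(token_info, now=1732510831 - 30)     # inside the margin
print(fresh, stale)
```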

<a name='5'></a>
#5 - Get new releases data from Spotify API

Select one of the following country codes to fetch data from Spotify based on the country of your choice:

* AU: Australia
* AT: Austria
* BE: Belgium
* BO: Bolivia
* BR: Brazil
* BG: Bulgaria
* CA: Canada
* CL: Chile
* CO: Colombia
* CR: Costa Rica
* CY: Cyprus
* DO: Dominican Republic
* FI: Finland
* FR: France
* DE: Germany
* GT: Guatemala
* HN: Honduras
* HK: Hong Kong
* IE: Ireland
* IT: Italy
* JP: Japan
* LV: Latvia
* LU: Luxembourg
* MY: Malaysia
* MT: Malta
* MX: Mexico
* MC: Monaco
* NL: Netherlands
* NZ: New Zealand
* NI: Nicaragua
* PY: Paraguay
* PE: Peru
* PH: Philippines
* PL: Poland
* PT: Portugal
* SG: Singapore
* ES: Spain
* SK: Slovakia
* SE: Sweden
* CH: Switzerland
* TW: Taiwan
* TR: Turkey
* GB: United Kingdom
* US: United States
* UY: Uruguay


**Your task**:
*   Select one country from the list above for which you will retrieve data from the Spotify API.
*   Use the limit parameter to specify the number of records you want to retrieve. Default: 20. Minimum: 1. Maximum: 50


In [None]:
#START YOUR CODE HERE
COUNTRY_CODE = 'PE'
LIMIT = 5
#END YOUR CODE HERE

Now, let's use the token to perform a request to access the first resource, which is the [new_releases](https://spotipy.readthedocs.io/en/2.22.1/?highlight=featured_playlists#spotipy.client.Spotify.new_releases).

**Your tasks**:


1.   Look at the link above, make the correct call to the Spotify endpoint to get the new releases, and store the response in the `featured_albums` variable.
2.   Loop through the response and store each record (album) into your MongoDB collection you created above. HINT: You should explore the `featured_albums` response to understand how it is structured, also check the [mongodb doc](https://www.mongodb.com/docs/manual/reference/method/db.collection.insertOne/?msockid=2c010af6d0b963db3ebe1e3ed1496248)



In [None]:
#START YOUR CODE HERE
featured_albums = spotify.new_releases(country=COUNTRY_CODE, limit=LIMIT)  # Use the values selected above

for album in featured_albums['albums']['items']:
    featured_albums_collection.insert_one(album)
#END YOUR CODE HERE
print(f"Inserted {len(featured_albums['albums']['items'])} albums into MongoDB.")

Inserted 5 albums into MongoDB.
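Note that running the cell above a second time inserts the same albums again. One way to avoid duplicates is to skip albums whose Spotify `id` is already stored. The sketch below is a plain-Python illustration. The helper name `select_new_albums` and the sample data are assumptions, and in the notebook the existing ids could be read with `featured_albums_collection.distinct("id")`.

```python
from typing import Iterable, List


def select_new_albums(albums: Iterable[dict], existing_ids: Iterable[str]) -> List[dict]:
    """Return only the albums whose `id` has not been seen before."""
    seen = set(existing_ids)
    fresh = []
    for album in albums:
        if album["id"] not in seen:
            seen.add(album["id"])  # also guards against repeats within `albums`
            fresh.append(album)
    return fresh


# Hypothetical data: one album is already stored, one appears twice in the response.
already_stored = ["a1"]
response_items = [
    {"id": "a1", "name": "First"},
    {"id": "b2", "name": "Second"},
    {"id": "b2", "name": "Second"},
]
to_insert = select_new_albums(response_items, already_stored)
print([album["id"] for album in to_insert])
```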


<a name='6'></a>
#6 - Explore your MongoDB collection

This script will connect to the MongoDB collection, query for specific fields (artist ID, name, and URI), and load the data into a list.

**Your tasks:**


1.   Check the following [link](https://www.mongodb.com/docs/manual/reference/method/db.collection.find/) to explore how to use the find() method to query specific fields from your collection. Ensure that your query retrieves only each artist's `id`, `name`, and `uri` from your collection. Make sure to read the documentation.
2.   Once you get the data from your query, create a pandas DataFrame with the results. You should find a way to combine all records into the `artists_data` list.



In [None]:
artists_data = []
#START YOUR CODE HERE
for album in featured_albums_collection.find({},
    {   "artists.id": 1,
        "artists.name": 1,
        "artists.uri": 1,
        "_id": 0}):
    for artist in album.get("artists", []):
        artist_info = {
            "artist_id": artist["id"],
            "artist_name": artist["name"],
            "artist_uri": artist["uri"]
        }
        artists_data.append(artist_info)
#END YOUR CODE HERE
# Create a pandas DataFrame from the artists_data list
artists_dataframe = pd.DataFrame(artists_data)

# Display the DataFrame
print(artists_dataframe)

                artist_id   artist_name                             artist_uri
0  06HL4z0CvFAxyc27GXpf02  Taylor Swift  spotify:artist:06HL4z0CvFAxyc27GXpf02
1  1w5Kfo2jwwIPruYS2UWh56     Pearl Jam  spotify:artist:1w5Kfo2jwwIPruYS2UWh56
2  5Vuvs6Py2JRU7WiFDVsI7J    Lucky Daye  spotify:artist:5Vuvs6Py2JRU7WiFDVsI7J
3  540vIaP2JwjQb9dm3aArA4      DJ Snake  spotify:artist:540vIaP2JwjQb9dm3aArA4
4  12GqGscKJx3aE4t07u7eVZ    Peso Pluma  spotify:artist:12GqGscKJx3aE4t07u7eVZ
5  75JvBeqW4BJ4xgnbMAq6MN   Anne Wilson  spotify:artist:75JvBeqW4BJ4xgnbMAq6MN


<a name='7'></a>
#7 - Get all albums from the featured Artists

When we used the `new_releases` method, we queried the new releases (based on the parameters you picked) from the Spotify API. This allowed us to save multiple documents (each being a single release) in MongoDB and to collect their artists in an object called `artists_data`. Now your job is to retrieve every album from each artist in that list.

Your tasks:


1.   Loop through the `artists_data` object and get the `artist_uri` for each artist. This value is required to call the `artist_albums` method and get all `albums` for that `artist_uri`. Learn more about [artist_albums](https://spotipy.readthedocs.io/en/2.22.1/?highlight=featured_playlists#spotipy.client.Spotify.artist_albums) and [artist_uri](https://spotipy.readthedocs.io/en/2.22.1/?highlight=featured_playlists#ids-uris-and-urls) by clicking the links.
2.   Create a temporary variable to store the results from the `artist_albums` method. Then store only the `items` key from the results in a variable called `albums`.
3.   We want to build a new list with all the albums that each artist has. To do this, first add a new key named `artist_name` to each album, containing the `artist_name` that you stored in `artists_data`.
4.   Append each album to the `artists_albums` list.
5.   The Spotify API uses "pagination": a response contains only one page of results, and its `next` key holds the URL of the following page (or is empty when there are no more pages). This allows us to make consecutive requests for the same resource. Your job is to use the `next` [method](https://spotipy.readthedocs.io/en/2.22.1/?highlight=featured_playlists#spotipy.client.Spotify.next) to get the remaining albums from a given artist. Do not forget to include the `artist_name` just as you did in step 3.
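The pagination pattern from step 5 can be sketched without calling the API. In the example below, `collect_all_items` and the fake pages are illustrative assumptions. In the notebook, the role of `fetch_next` would be played by `spotify.next`.

```python
from typing import Callable, List, Optional


def collect_all_items(first_page: dict,
                      fetch_next: Callable[[dict], Optional[dict]]) -> List[dict]:
    """Follow the `next` key page by page and gather every item."""
    items: List[dict] = []
    page: Optional[dict] = first_page
    while page is not None:
        items.extend(page["items"])
        # Only request another page when the current one points to it.
        page = fetch_next(page) if page.get("next") else None
    return items


# Fake two-page response standing in for the Spotify API.
fake_pages = {"page-2": {"items": [{"name": "C"}], "next": None}}
first_page = {"items": [{"name": "A"}, {"name": "B"}], "next": "page-2"}


def fake_fetch_next(page: dict) -> Optional[dict]:
    return fake_pages.get(page["next"])


all_items = collect_all_items(first_page, fake_fetch_next)
print([item["name"] for item in all_items])
```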



In [None]:
artists_albums = []

#START YOUR CODE HERE
for artist in artists_data:
    results = spotify.artist_albums(artist['artist_uri'], album_type='album')
    albums = results['items']

    # Add artist's name to each album in the initial results
    for album in albums:
        album['artist_name'] = artist['artist_name']
        artists_albums.append(album)

    # Loop through paginated results, adding artist's name
    while results['next']:
        results = spotify.next(results)
        for album in results['items']:
            album['artist_name'] = artist['artist_name']
            artists_albums.append(album)
#END YOUR CODE HERE
# Create a pandas DataFrame from the artists_albums list
df_artists_albums = pd.DataFrame(artists_albums)

# Display the DataFrame
print(df_artists_albums)

   album_type  total_tracks  \
0       album            31   
1       album            16   
2       album            22   
3       album            21   
4       album            22   
..        ...           ...   
62      album            10   
63      album            10   
64      album            16   
65      album            26   
66      album            15   

                                    available_markets  \
0   [AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...   
1   [AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...   
2   [AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...   
3   [AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...   
4   [AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...   
..                                                ...   
62  [AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...   
63  [AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...   
64  [AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...   
65  [AR, AU, AT, BE, BO, BR, BG, CA, CL, CO, CR, C...   


<a name='8'></a>
#8 - Create New MongoDB collection

Now that you have your new object with all artists' albums, you will need to create a new Collection in your MongoDB cluster. Use the data you created above to store those in a new MongoDB collection.

Remember to look at this [documentation](https://www.mongodb.com/docs/manual/tutorial/insert-documents/#:~:text=Collection.-,insertOne()%20inserts%20a%20single%20document%20into%20a%20collection.,value%20to%20the%20new%20document) to learn more about MongoDB. Also, DO NOT FORGET to go and verify that the data is in your MongoDB cluster.

In [None]:
#START YOUR CODE HERE
database = atlas_client["myAlbumdatabase"]       #Select the name of your database
albums_collection = database["myAlbumCollection"]  #Select the name of your collection

for album in artists_albums:
    albums_collection.insert_one(album)     #Insert the data into MongoDB
#END YOUR CODE HERE
print(f"Inserted {len(artists_albums)} albums into MongoDB.")

Inserted 67 albums into MongoDB.


<a name='9'></a>
#9 - Explore your data!
You have now collected all albums from artists with new releases. Your next task is to explore and analyze this data using Python and MongoDB.

Answer the following questions based on the data in your collection:


1.   How many albums are stored in the collection?
2.   Which artist has the most albums in the collection?
3.   Which artist has the least albums in the collection?
4.   What is the average number of tracks per album? (*Include the Artist Name*)
5.   How many albums are available in each market?
6.   What is the release date of the oldest album? (*Include the Artist Name*)
7.   What are the top 5 albums with the most tracks? (*Include the Artist Name*)
8.   Which albums are available in more than 60 markets? (*Include the Artist Name*)
9.   How many albums does each artist have, and what is the average number of tracks per album for each artist?
10.  Which albums have the word "Deluxe" in their title? (*Include the Artist Name*)

For your reference, here are the MongoDB commands that will be useful for these tasks:

[Aggregate](https://www.mongodb.com/docs/manual/reference/command/aggregate/)

[Find](https://www.mongodb.com/docs/manual/reference/command/find/)
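If the `$group` stage is new to you, it may help to compare it with a plain-Python count. The sketch below uses made-up sample albums. The pipeline is shown only as a data structure that could be passed to `albums_collection.aggregate`, and the `Counter` line computes the same per-artist counts locally.

```python
from collections import Counter

# Hypothetical documents shaped like the albums stored in this assignment.
sample_albums = [
    {"name": "A", "artist_name": "Artist One"},
    {"name": "B", "artist_name": "Artist Two"},
    {"name": "C", "artist_name": "Artist One"},
]

# Aggregation pipeline: one output document per artist, with an album count.
count_per_artist_pipeline = [
    {"$group": {"_id": "$artist_name", "album_count": {"$sum": 1}}},
    {"$sort": {"album_count": -1}},
]

# Plain-Python equivalent of the grouping and sorting above.
album_counts = Counter(album["artist_name"] for album in sample_albums)
print(album_counts.most_common())
```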


In [None]:
# Question 1:
total_albums = albums_collection.count_documents({})
print(f"Total number of albums in MongoDB: {total_albums}")



Total number of albums in MongoDB: 67


In [None]:
# Question 2:
artist_with_most_album = [
    {"$group": {"_id": "$artist_name", "album_count": {"$sum": 1}}},  # artist_name is a plain string, so no $unwind is needed
    {"$sort": {"album_count": -1}},
    {"$limit": 1}
]

result = list(albums_collection.aggregate(artist_with_most_album))
if result:
    print(f"Artist with the most albums: {result[0]['_id']} ({result[0]['album_count']} albums)")


Artist with the most albums: Taylor Swift (29 albums)


In [None]:
# Question 3:
least_album = [
    {"$group": {"_id": "$artist_name", "album_count": {"$sum": 1}}},  # artist_name is a plain string, so no $unwind is needed
    {"$sort": {"album_count": 1}},
    {"$limit": 1}
]

result = list(albums_collection.aggregate(least_album))
if result:
    print(f"Artist with the least albums: {result[0]['_id']} ({result[0]['album_count']} albums)")


Artist with the least albums: DJ Snake (3 albums)


In [None]:
# Question 4:
average_tracks_per_artist = df_artists_albums.groupby('artist_name')['total_tracks'].mean()
print("Average number of tracks per album for each artist:")
print(average_tracks_per_artist)


Average number of tracks per album for each artist:
artist_name
Anne Wilson     16.750000
DJ Snake        21.000000
Lucky Daye      15.166667
Pearl Jam       14.473684
Peso Pluma      13.500000
Taylor Swift    19.896552
Name: total_tracks, dtype: float64


In [None]:
# Question 5
album_per_market = [
    {"$unwind": "$available_markets"},
    {"$group": {"_id": "$available_markets", "album_count": {"$sum": 1}}},
    {"$sort": {"album_count": -1}}
]

results = list(albums_collection.aggregate(album_per_market))
print("Number of albums available in each market:")
for market in results:
    print(f"{market['_id']}: {market['album_count']} albums")


Number of albums available in each market:
CA: 67 albums
US: 67 albums
AU: 59 albums
TO: 59 albums
MZ: 59 albums
HK: 59 albums
MG: 59 albums
KW: 59 albums
PL: 59 albums
AL: 59 albums
GH: 59 albums
DM: 59 albums
AZ: 59 albums
VC: 59 albums
LV: 59 albums
TZ: 59 albums
NR: 59 albums
SM: 59 albums
FI: 59 albums
GD: 59 albums
MR: 59 albums
GR: 59 albums
LI: 59 albums
GM: 59 albums
TD: 59 albums
NO: 59 albums
TJ: 59 albums
LU: 59 albums
LR: 59 albums
KG: 59 albums
CR: 59 albums
MU: 59 albums
BS: 59 albums
ET: 59 albums
NP: 59 albums
FR: 59 albums
ZM: 59 albums
HN: 59 albums
DK: 59 albums
BZ: 59 albums
SE: 59 albums
KE: 59 albums
DE: 59 albums
MC: 59 albums
MV: 59 albums
PY: 59 albums
CG: 59 albums
RW: 59 albums
DO: 59 albums
PG: 59 albums
GW: 59 albums
NE: 59 albums
BA: 59 albums
SG: 59 albums
BG: 59 albums
MD: 59 albums
TR: 59 albums
CY: 59 albums
TH: 59 albums
IE: 59 albums
CM: 59 albums
RO: 59 albums
AR: 59 albums
IN: 59 albums
HT: 59 albums
MN: 59 albums
PT: 59 albums
PK: 59 albums
KR: 5

In [None]:
# Question 6
oldest_album_pipeline = [
    {"$sort": {"release_date": 1}},
    {"$limit": 1},
    {"$project": {"_id": 0, "name": 1, "artist_name": 1, "release_date": 1}}
]

oldest_album_result = list(albums_collection.aggregate(oldest_album_pipeline))
if oldest_album_result:
    oldest_album = oldest_album_result[0]
    print(f"Oldest album: '{oldest_album['name']}' by {oldest_album['artist_name']} released on {oldest_album['release_date']}")


Oldest album: 'Ten Redux' by Pearl Jam released on 1991-08-27


In [None]:
# Question 7
top_tracks = [
    {"$sort": {"total_tracks": -1}},
    {"$limit":5},
    {"$project": {"_id": 0, "name": 1, "artist_name": 1, "total_tracks": 1}}
]

result = list(albums_collection.aggregate(top_tracks))
print("Top 5 albums with the most tracks:")
for album in result:
    print(f"{album['name']} by {album['artist_name']} - {album['total_tracks']} tracks")

Top 5 albums with the most tracks:
reputation Stadium Tour Surprise Song Playlist by Taylor Swift - 46 tracks
folklore: the long pond studio sessions (from the Disney+ special) [deluxe edition] by Taylor Swift - 34 tracks
Carte Blanche (Deluxe) by DJ Snake - 32 tracks
THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY by Taylor Swift - 31 tracks
Red (Taylor's Version) by Taylor Swift - 30 tracks
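
Question 8 (albums available in more than 60 markets) has no answer cell in this notebook. The sketch below shows one possible approach, not a reference solution. The pipeline uses `$size` inside `$expr` and could be passed to `albums_collection.aggregate`. The helper `albums_in_more_than` and the sample documents are assumptions that reproduce the same filter in plain Python so it can be tried without a database.

```python
MIN_MARKETS = 60

# Pipeline sketch; $ifNull guards against documents without available_markets.
albums_over_min_markets = [
    {"$match": {"$expr": {"$gt": [
        {"$size": {"$ifNull": ["$available_markets", []]}}, MIN_MARKETS]}}},
    {"$project": {"_id": 0, "name": 1, "artist_name": 1,
                  "market_count": {"$size": {"$ifNull": ["$available_markets", []]}}}},
]


def albums_in_more_than(albums, min_markets):
    """Plain-Python version of the same filter and projection."""
    selected = []
    for album in albums:
        market_count = len(album.get("available_markets", []))
        if market_count > min_markets:
            selected.append({"name": album["name"],
                             "artist_name": album["artist_name"],
                             "market_count": market_count})
    return selected


# Hypothetical documents, checked against a small threshold for readability.
sample_albums = [
    {"name": "Wide Release", "artist_name": "Artist One", "available_markets": ["CA", "US", "PE"]},
    {"name": "Narrow Release", "artist_name": "Artist Two", "available_markets": ["US"]},
    {"name": "No Market Data", "artist_name": "Artist Three"},
]
wide = albums_in_more_than(sample_albums, min_markets=2)
print(wide)
```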


In [None]:
# Question 9
album_stats = [
    {"$group": {"_id": "$artist_name", "count": {"$sum": 1}, "avg_tracks": {"$avg": "$total_tracks"}}},
    {"$sort": {"count": -1}},
    {"$project": {"_id": 0, "artist_name": "$_id", "count": 1, "avg_tracks": 1}}
]

result = list(albums_collection.aggregate(album_stats))
print("Number of albums and average number of tracks per artist:")
for doc in result:
    print(f"{doc['artist_name']}: {doc['count']} albums, {doc['avg_tracks']:.2f} tracks on average")
    print()


Number of albums and average number of tracks per artist:
Taylor Swift: 29 albums, 19.90 tracks on average

Pearl Jam: 19 albums, 14.47 tracks on average

Peso Pluma: 6 albums, 13.50 tracks on average

Lucky Daye: 6 albums, 15.17 tracks on average

Anne Wilson: 4 albums, 16.75 tracks on average

DJ Snake: 3 albums, 21.00 tracks on average



In [None]:
# Question 10
album_deluxe = [
    {"$match": {"name": {"$regex": "deluxe", "$options": "i"}}},
    {"$project": {"_id": 0, "name": 1, "artist_name": 1}}
]

result = list(albums_collection.aggregate(album_deluxe))
print("Albums with the word 'Deluxe' in their title:")
for album in result:
    print(f"{album['name']} by {album['artist_name']}")

Albums with the word 'Deluxe' in their title:
1989 (Taylor's Version) [Deluxe] by Taylor Swift
evermore (deluxe version) by Taylor Swift
folklore: the long pond studio sessions (from the Disney+ special) [deluxe edition] by Taylor Swift
folklore (deluxe version) by Taylor Swift
1989 (Deluxe Edition) by Taylor Swift
Red (Deluxe Edition) by Taylor Swift
Speak Now (Deluxe Edition) by Taylor Swift
Candydrip (Deluxe) by Lucky Daye
Painted (Deluxe Edition) by Lucky Daye
Carte Blanche (Deluxe) by DJ Snake
My Jesus (Anniversary Deluxe) by Anne Wilson


<a name='10'></a>
#10 - Create an interactive map using Folium!

Folium is a Python library that simplifies the creation of interactive, visually appealing maps. It acts as a wrapper for the Leaflet.js JavaScript library, allowing users to create maps in Python without needing to write JavaScript. [Learn More](https://python-visualization.github.io/folium/latest/)

Here are a few key concepts about this library:

* **Map Initialization**: Folium provides a Map class that lets users set a central location and zoom level, initializing a map on which they can place markers or other geographic elements.
* **Adding Markers**: Folium’s Marker class lets students place icons on the map at specific locations, which can contain popups with details (like artist names and album information). This feature is key for visualizing different locations where an artist’s album is available.
* **Customization and Interactivity**: The library supports customizing marker icons, colors, and popups, making the map both interactive and visually informative. Users can click on markers to view additional information about the album or artist, which makes exploring data on a map engaging.

In this assignment, Folium will allow you to see where each album is available. Each marker represents a market (country) in which the artist’s album has been released, making it easy to see the global spread and reach of the album. By adding artist and album information to the markers, you can visually assess which artists and albums have the widest distribution.

**Geopy** is a Python library that enables geocoding—converting addresses or location names (like country codes) into latitude and longitude coordinates, which can then be used for plotting on a map.

Here’s how we will be using it:

* **Geocoding Services**: Geopy can connect to multiple geocoding providers (like Nominatim, Google Maps, etc.) to look up geographic data. When given a country code, Geopy queries the provider to retrieve the corresponding latitude and longitude.
* **Caching and Rate Limiting**: Geocoding services such as Nominatim limit how often you may send requests, and Geopy provides tools to respect those limits. This matters in this assignment because many albums are available in the same countries: without a cache, we would send one request per country per album.
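
The caching idea can be sketched independently of any geocoding provider. In the example below, `make_cached_lookup`, `fake_lookup`, and the approximate coordinates are illustrative assumptions. In the notebook, the wrapped function would call the geolocator instead.

```python
from typing import Callable, Dict, List, Optional, Tuple

Coords = Tuple[float, float]


def make_cached_lookup(
    lookup: Callable[[str], Optional[Coords]]
) -> Callable[[str], Optional[Coords]]:
    """Wrap `lookup` so each country code is resolved at most once."""
    cache: Dict[str, Optional[Coords]] = {}

    def cached(country_code: str) -> Optional[Coords]:
        if country_code not in cache:
            cache[country_code] = lookup(country_code)
        return cache[country_code]

    return cached


calls: List[str] = []


def fake_lookup(country_code: str) -> Optional[Coords]:
    """Stand-in for a geocoding request; records every call it receives."""
    calls.append(country_code)
    # Approximate coordinates, for illustration only.
    return {"PE": (-9.19, -75.02), "FR": (46.23, 2.21)}.get(country_code)


get_coords = make_cached_lookup(fake_lookup)
markets = ["PE", "FR", "PE", "PE", "FR"]
results = [get_coords(market) for market in markets]
print(len(calls))
```

Five lookups trigger only two underlying requests. Geopy also ships a `RateLimiter` helper in `geopy.extra.rate_limiter` that can wrap `geolocator.geocode` to enforce a minimum delay between calls.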

In [None]:
import folium
import time
from pymongo import MongoClient
from geopy.geocoders import Nominatim
from geopy.exc import GeocoderTimedOut

Your tasks:


1.   Because your collection can have many albums and many markets, you first need to create a query using pymongo that gets the specific records from your collection to be used for your map. Your task is to create a variable that stores the TOP 10 albums ordered by the latest `release_date`.
2.   We will send requests to the Nominatim geocoding provider, so you need to give your geolocator a name.
3.   We have defined a function called `get_coordinates` that takes a `country_code` as input and returns a tuple with the coordinates of that country. We want to avoid repeated API requests for countries whose coordinates we have already retrieved (multiple albums may share the same market). Your task is to check whether the `country_code` passed to `get_coordinates` has already been resolved by checking if it is in our `coordinates_cache` dictionary.
4.   If the country code has not been retrieved before, pass it to the geocode method to get the coordinates.
5.   Create a tuple with the `latitude` and `longitude` values.
6.   In step 1 you got a variable with a list of albums that you need to map using Folium. Loop through it and get the following information for each album: `artist_name`, `name` and `available_markets`.
7.   Because the `available_markets` of each record can contain multiple countries, you will need to loop through each country, call the `get_coordinates` function you just created, and pass it the current market.
8.   Now you are ready to create a Folium Marker. This marker should have the `artist_name`, `album_name` and `market`. You can change the colors and icon if you like. Learn more about this [here](https://python-visualization.github.io/folium/latest/reference.html#folium.map.Marker).



NOTE: The way we have constructed the map allows only a single record to be shown on the map. Please avoid creating a query that returns multiple records, as only the last record will be mapped using the code below.




In [None]:
# Initialize a Folium map centered globally
map_ = folium.Map(location=[20, 0], zoom_start=2)
coordinates_cache = {}

#START YOUR CODE HERE
geolocator = Nominatim(user_agent="spotify_album_map")   #Name your geolocator

latest_albums = albums_collection.find().sort("release_date", -1).limit(1)      #Create your query using pymongo here

def get_coordinates(country_code):
    if country_code in coordinates_cache:
        return coordinates_cache[country_code]

    try:
        location = geolocator.geocode(country_code, timeout=10)
        if location:
            coords = (location.latitude, location.longitude)
            coordinates_cache[country_code] = coords
            return coords
        else:
            return None  # No match found; `coords` is not defined in this branch
    except GeocoderTimedOut:
        return None

# Loop through each album and add markers for available markets
for album in latest_albums:
    artist_name = album["artist_name"]
    album_name = album["name"]
    available_markets = album["available_markets"]

    for market in available_markets:
        coords = get_coordinates(market)  # Look up the current market, not a fixed country
        if coords:
            # Add a marker with a popup showing artist and album information
            folium.Marker(
                location=coords,
                popup=f"Artist: {artist_name}<br>Album: {album_name}<br>Market: {market}",
                icon=folium.Icon(color="blue", icon="music")
            ).add_to(map_)
        time.sleep(1)
#END YOUR CODE HERE
map_

# Video Submission

In your **5-minute** (maximum) video, ensure you:

1. **Explain your understanding of the assignment** and the process for each of the 10 steps:
   - **Clear Overview:** Provide a comprehensive overview covering each of the 10 steps in the assignment, such as setting up MongoDB, accessing the Spotify API, and querying data.
   - **Thoughtful Reflections:** Highlight the challenges you encountered and how you solved them. Discuss any key choices you made (e.g., structuring MongoDB queries), showcasing your thought process and approach.
   - **Depth of Understanding:** Demonstrate a strong understanding of each step, reflecting on both the successes and obstacles you faced.

2. **Provide a detailed explanation of the insights you gained** from querying the data:
   - **Learning Outcomes:** Clearly articulate what you learned from the data queries, connecting these insights to specific questions you aimed to answer (e.g., identifying which artist has the most albums or finding the oldest album).
   - **Contributions to Understanding:** Explain how these insights contribute to a deeper understanding of the data and your overall assignment, emphasizing the value of your analysis.

3. **Explain the interactive map** you created with **Folium**:
   - **Insights from the Map:** Describe the insights the map helped you uncover, including any geographical patterns or trends in the data.
   - **Enhancing Analysis:** Discuss how the interactive map enhanced your analysis and contributed to your overall findings, demonstrating its relevance to your assignment.
