Web application to extract my private data from Spotify using Spotify API and store in database to display on web application
Explore the docs »
Table of Contents
Web Applications built in Python and Flask to implement an ETL pipeline to extract Spotify private data via API and load it on the database. This project will allow Flask to request my private data from Spotify API and structure them using pandas to load in a Postgres database, and a little bit further, retrieve these data in a Front-End application built in HTML5, CSS3, and Javascript, and on top of that, it was necessary to use jinja2 from Flask to allow using templates on front-end application.
In this project, it was necessary to split it into three parts, such as:
- ETL Pipeline: to extract and load these on the database;
- Front-End Application: User’s interface to be able to showcase these private data in a web application;
- Deploy: The last part was deployment from a web application using GitHub and cloud AWS by Render Deployment to create a CI/CI workflow to allow new releases on the web application;
Authorization
Allow Flask to use some Spotify permission to connect the web application to Spotify API.
- Get Access Token: It will be to send a type 'post' request with some params for API and get a response with an access token:
- TYPE: "grant_type"
- AUTHORIZATION CODE: "code"
- APP REDIRECT URI: "redirect_uri"
- SPOTIFY CLIENT ID: 'client_id'
- SPOTIFY CLIENT SECRET: 'client_secret'
NOTE: we can get these params on Spotify documentation
Get Data:
This module can get our data from Spotify API as a JSON response and pass them for data format. In that way, we could use some API endpoint from Spotify to request these data, such as: Get Uer’s Top Items: include Tracks and Artists Get User’s Profile Get Current User’s Playlist Get Recently Played Tracks Search For Items
Data Format
We can use it to structure every data extracted using pandas, in other words, part of the transformation data happens here because we need to structure these data before loading them on the database. It will be returned to a Pandas DataFrame Store these data formatted on the database using SQLAlchemy function “to_sql”.
It comes after extracting and formatting target data, and then, it’ll be possible to come up with a way to ingest structured data into the database. To be able to ingest these data, it was necessary to use the SQLAlchemy library from Python and the function “to_sql” to convert Pandas DataFrame to SQL syntax.
In the first place, data ingestion was used on an SQLite database, to be able to store simple data, but, a little bit further, it was important to create a Postgres Database using docker as a test local database, before shifting to a production database working on cloud.
This part of the application was built using HTML5, CSS3, and Javascript to interact with jinja2 from Flask to data retrieval from database to interface application. On top of that, the application interface was split into three views to explore many endpoint features from API and so on.
Each of these views requests data from the Postgres database using jinja2 templates which interact with front-end stacks to render some pieces of information in the application interface.
After building the first view from the web application, it used Github to create a CI/CD process to allow web service working on the cloud to request new updates from the web application every time the new adust is detected.
It’s important to say, that there exists an option in profile visualization to request new data for Spotify API to be able to update the database with new information, but it’s just can be used by a logged account from Spotify that is requesting these data.
Write here
Here are some important topics about this project and how to replay it.
- virtualenv
python3 -m venv .venv
Before starting this application in your local environment, it'll be necessary to proceed with some tasks to reproduce this project.
- Get API Access https://developer.spotify.com
- Clone the repo
git clone https://github.com/luk3mn/spotify-profile.git
- Install packages
pip freeze -r requirements.txt
-
Create a new app on Spotify API after logging on your spotify account
-
Get both access credentials, "CLIENT_ID" and "CLIENT_SECRET" and indicate your redirect path of the application
OBS: This path it'll be necessary to extract our data from Spotify, we can specify it after building our flask application.
For more explanations, please refer to the Documentation
Useful Queries: Can be useful to personalize these queries to showcase some information in better or different ways.
- Select to allow showcase of these data groups by the number of times that the same track was played:
query = db.session.execute(text("""
SELECT name,
artist,
album,
popularity,
spotify_url,
preview_url,
album_url,
release,
image,
count(track_id) as "played"
from tb_recently_played
GROUP BY name, album, artist, release, image, popularity, spotify_url, preview_url, album_url
ORDER BY played DESC
""")).all()
- Select the most played track among all in the same list:
query = db.session.execute(text("""
SELECT * FROM (
SELECT name,
image,
artist,
album,
release,
popularity,
spotify_url,
count(track_id) as "played"
FROM tb_recently_played
GROUP BY name, image, track_id, artist, album, release, popularity, spotify_url
ORDER BY played DESC
) AS "most_played"
WHERE played = (
SELECT max(played) FROM (
SELECT count(track_id) as "played"
FROM tb_recently_played
GROUP BY track_id
) AS "max_played"
)
""")).all()
- Select the latest played track based on the time that this song was played:
query = db.session.execute(text("""
SELECT * FROM tb_recently_played
WHERE played_at = (
SELECT max(played_at)
FROM tb_recently_played
)
""")).all()
- Pipeline ETL using Spotify private data
- Data Extract: Get data using Spotify API
- Data Transform: Transform these data in structure data using pandas
- Data Load: Store these data on database to retrieve on web application using SQL Alchemy
- Web application
- Profile
- Get User's Profile
- Get User's Top Artists
- Spotify Playlist
- Tracks
- Get User's Top Tracks
- Discover
- Get Recently Played Tracks
- Discover Weekly
- Latest Played Track
- Most Played Track
- Recommendations
- Profile
Distributed under the MIT License. See LICENSE.txt
for more information.
Lucas Renan - lucasnunes2030@gmail.com
Project Link: https://github.com/luk3mn/spotify-profile
I think it would be interesting to sit here some references and other resources that were really useful and helped me to come up with this project.
- Data Engineering Course for Beginners - #1 EXTRACT
- Data Engineering Course For Beginners - #2 TRANSFORM
- Data Engineering Course For Beginners - #3 LOAD
- Scrape Spotify’s API within 20 mins
- Spotify OAuth: Automating Discover Weekly Playlist - Full Tutorial
- Bored of Libraries? Here’s How to Connect to the Spotify API Using Pure Python
- Spotify API OAuth - Automate Getting User Playlists (Complete Tutorial)
- Python: Loop through JSON File
- How to Solve: Insufficient client scope in Python using Spotipy
- How to Use Flask-SQLAlchemy to Interact with Databases in a Flask Application
- Introduction to SQLAlchemy in Pandas Dataframe
- Best-README-Template