A back-end web application built with Python and Flask that exposes a REST API serving data extracted from My Anime List through web scraping.
The project is split into the following parts:
- ETL pipeline: extract data using web scraping, transform it, and load it into a MongoDB database;
- Database configuration: set up the connection between the Flask application and the MongoDB collections;
- REST API: a set of endpoints that serve the stored data from the database;
- Deployment on AWS: make the project available on the internet, using Docker containers to run the application and the database separately, which also makes it easy to apply changes whenever needed.
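The ETL part can be pictured as three functions chained together. The sketch below is a minimal illustration, not the project's actual code: `run_etl` and the convention that `transform` returns `None` for rejected records are assumptions. In the real pipeline, `extract` would wrap the Beautiful Soup scraper and `load` would write to MongoDB (for example with pymongo's `insert_many`); here they are plain callables so the flow can be run without network or database access.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, object]


def run_etl(
    extract: Callable[[], Iterable[Record]],
    transform: Callable[[Record], Optional[Record]],
    load: Callable[[List[Record]], int],
) -> int:
    """Run extract -> transform -> load and return the number of loaded records."""
    transformed: List[Record] = []
    for raw in extract():
        record = transform(raw)
        # Hypothetical convention: transform returns None for records that fail validation.
        if record is not None:
            transformed.append(record)
    return load(transformed)
```

Passing the three steps in as callables keeps the orchestration testable: each step can be replaced by a stub while the others are developed.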
All the data used in this project belongs to My Anime List and was extracted by web scraping with the "Beautiful Soup" library. During this process, the scraper goes through the page content and organizes each item into a dictionary, which makes it easier to validate the data before loading it into the MongoDB database.
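To illustrate the validation step on those dictionaries, here is a minimal sketch. The field names (`name`, `rank`, `score`, `genres`) and the raw text formats (a rank such as `#12`, a non-numeric score placeholder) are assumptions for illustration, not the exact shape produced by the project's scraper.

```python
from typing import Any, Dict, Optional


def parse_rank(text: str) -> Optional[int]:
    """Convert text such as '#12' into 12; return None if it is not a number."""
    cleaned = text.strip().lstrip("#")
    return int(cleaned) if cleaned.isdigit() else None


def parse_score(text: str) -> Optional[float]:
    """Convert text such as '9.10' into 9.1; return None for non-numeric placeholders."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def validate_record(raw: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Normalize one scraped dictionary; return None if required fields are missing."""
    name = str(raw.get("name", "")).strip()
    rank = parse_rank(str(raw.get("rank", "")))
    if not name or rank is None:
        return None  # hypothetical rule: name and rank are required
    return {
        "name": name,
        "rank": rank,
        "score": parse_score(str(raw.get("score", ""))),
        # Drop empty genre strings left over from HTML whitespace.
        "genres": [g.strip() for g in raw.get("genres", []) if g.strip()],
    }
```

Rejecting a record (returning `None`) rather than storing partial data keeps the collection consistent for the query endpoints.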
To connect Flask to MongoDB, the project uses the "pymongo" library, which provides features for opening the database connection and manipulating collections.
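A repository class is a common way to keep pymongo queries out of the Flask views. The following is a hedged sketch of such a layer, not the project's own class: the field names and the idea of excluding `_id` through a projection are assumptions. It uses only `Collection.find(filter, projection)` and `Collection.find_one(filter, projection)` from pymongo, and it accepts any object with those two methods, so it can also be exercised with an in-memory stand-in.

```python
from typing import Any, Dict, List, Optional

# Exclude MongoDB's ObjectId so results can be serialized to JSON directly.
PROJECTION = {"_id": 0}


class AnimeRepository:
    """Thin query layer over a pymongo Collection (field names are illustrative)."""

    def __init__(self, collection: Any) -> None:
        self._collection = collection

    def find_all(self) -> List[Dict[str, Any]]:
        return list(self._collection.find({}, PROJECTION))

    def find_by_name(self, name: str) -> Optional[Dict[str, Any]]:
        return self._collection.find_one({"name": name}, PROJECTION)

    def find_by_genre(self, genre: str) -> List[Dict[str, Any]]:
        # In MongoDB, matching a scalar against an array field matches any element.
        return list(self._collection.find({"genres": genre}, PROJECTION))

    def find_by_rank(self, rank: int) -> Optional[Dict[str, Any]]:
        return self._collection.find_one({"rank": rank}, PROJECTION)
```

Against a real server it might be wired up as `AnimeRepository(MongoClient("localhost", 27017)["mal"]["anime"])`, where the database name `mal` and collection name `anime` are hypothetical.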
The API endpoints were built with the Flask framework. On top of that, a DTO (Data Transfer Object) class was created to limit the amount of information returned by each endpoint.
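The DTO idea can be sketched with a dataclass that whitelists the fields an endpoint is allowed to return. This is an assumed shape for illustration; the project's DTO class may expose different fields.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class AnimeDTO:
    """Public shape of an anime in API responses (field names are illustrative)."""

    name: str
    rank: int
    score: Optional[float]
    genres: List[str]

    @classmethod
    def from_document(cls, doc: Dict[str, Any]) -> "AnimeDTO":
        # Keep only the declared fields; anything else stored in MongoDB
        # (such as _id or long text fields) is dropped. Missing fields raise TypeError.
        allowed = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in doc.items() if k in allowed})

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)
```

A Flask view could then return `jsonify(AnimeDTO.from_document(doc).to_dict())`, so adding new fields to the stored documents never leaks them into the API by accident.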
GET /api/v1/anime/extract
Parameter | Type | Description |
---|---|---|
None | None | Required. Extracts and loads new data |
GET /api/v1/anime
Parameter | Type | Description |
---|---|---|
None | None | Lists all anime |
GET /api/v1/anime/name/${anime_name}
Parameter | Type | Description |
---|---|---|
anime_name | string | Gets an anime by name |
GET /api/v1/anime/genre/${genre_name}
Parameter | Type | Description |
---|---|---|
genre_name | string | Gets anime by genre |
GET /api/v1/anime/rank/${anime_rank}
Parameter | Type | Description |
---|---|---|
anime_rank | integer | Gets an anime by rank |
GET /api/v1/anime/score/${anime_score}
Parameter | Type | Description |
---|---|---|
anime_score | integer | Gets anime by score |
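Path parameters such as `anime_name` often contain spaces and punctuation, so a client has to percent-encode them. Below is a minimal client sketch using only the standard library; the helper names and the `localhost:5000` base URL are assumptions for illustration, not part of the project.

```python
import json
from typing import Any
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the API is running locally on Flask's default port.
BASE_URL = "http://localhost:5000/api/v1/anime"


def build_url(base_url: str, *segments: object) -> str:
    """Join path segments onto base_url, percent-encoding each segment."""
    encoded = [quote(str(s), safe="") for s in segments]
    return "/".join([base_url.rstrip("/"), *encoded])


def get_json(url: str, timeout: float = 10.0) -> Any:
    """Send a GET request and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires the API to be running; prints the decoded JSON response.
    print(get_json(build_url(BASE_URL, "rank", 1)))
```

For example, `build_url(BASE_URL, "name", "Fullmetal Alchemist: Brotherhood")` produces a path ending in `/name/Fullmetal%20Alchemist%3A%20Brotherhood`.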
Here are some important notes about this project and how to reproduce it.
- Create and activate a virtual environment (virtualenv)
python3 -m venv .venv
source .venv/bin/activate
- Environment Variables
To run this project, add the following environment variables to your .env file:
  - HOST
  - PORT
  - DB_NAME
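One way to read these variables at startup is a small settings object that fails fast when something is missing. This is a sketch under assumptions: it treats HOST and PORT as the MongoDB host and port (the source does not say which service they configure), and it expects the .env file to have been loaded into the process environment already, for example with python-dotenv's `load_dotenv()`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Connection settings read from the environment (assumed to target MongoDB)."""

    host: str
    port: int
    db_name: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        # Report every missing variable at once instead of failing on the first.
        missing = [k for k in ("HOST", "PORT", "DB_NAME") if not env.get(k)]
        if missing:
            raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
        return cls(host=env["HOST"], port=int(env["PORT"]), db_name=env["DB_NAME"])
```

Accepting the mapping as a parameter (defaulting to `os.environ`) makes the function easy to test without touching the real environment.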
Before starting the application in your local environment, complete the following steps:
- Clone the repo
git clone https://github.com/luk3mn/mal-api.git
- Install the packages
pip install -r requirements.txt
This project can be deployed on AWS using a single EC2 instance whose security group opens inbound port 5000 to "Anywhere" (any IP address). Once the instance is running, follow the deployment steps below, then send requests to the instance's public IP address on port 5000 from Postman, APIDOG, or any other tool for testing web APIs.
To deploy this project, run the following on the instance:
- Install docker-compose
sudo apt install docker-compose
- Start the application and MongoDB in containers
sudo docker-compose up -d
Processing
- Extract: get data from the source using web scraping
- Transform: validate the data before storing it in the database
- Load: store data in MongoDB database
MongoDB
- Database configuration
- Working on repository class
API Rest
- GET /api/v1/anime/extract
- GET /api/v1/anime
- GET /api/v1/anime/name/{anime_name}
- GET /api/v1/anime/genre/{genre_name}
- GET /api/v1/anime/rank/{anime_rank}
- GET /api/v1/anime/score/{anime_score}
Docker
- Run the Python application with Docker
- Run the MongoDB database with Docker
Deploy
- AWS
This project was an excellent learning experience for me. I was able to dive deep into REST API architecture with Python and Flask, learn how to deploy an application in Docker containers using a Dockerfile and docker-compose, and finally get the application running in the cloud on AWS EC2 instances.
Distributed under the MIT License. See LICENSE.txt for more information.
- username: @luk3mn
If you have any feedback, please reach out to us at lucasnunes2030@gmail.com
Project Link: https://github.com/luk3mn/mal-api
Below are some references and resources that were useful while working on this project. I hope they help you as well!
- Web Scraping With Python – Step-By-Step Guide
- Beautiful Soup: Build a Web Scraper With Python
- StackOverflow
- w3schools: Python MongoDB Find
- How to Use *args and **kwargs in Python
- Design Patterns for REST-APIs
- How to Dockerize a Flask Application
- Python MongoDB tutorial using PyMongo and Docker
- Creating Dockerized Flask + MongoDB Application
- How To Set Up Flask with MongoDB and Docker
- Create an API using Flask, MongoDB and Docker
- MongoDB docker image documentation
- Quick MongoDB Docker Setup