This project is an extension of a data engineering group project completed previously as part of the accreditation for the Data Engineering course at UCL. Whilst the coursework scope stays the same, this project:
- Merges 3 Parquet files scraped from 2 websites, OpenSea and NFT Showroom, and adds 4 more data tables to the RDS.
- Deploys a machine learning pipeline that predicts the total number of sales of an NFT collection from its features. Total sales range from 0 to 10.
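The two bullets above describe a flow of merging scraped data, loading it into a database and training a sales predictor on it. The following is a minimal, standard-library-only sketch of that flow, not the project's code. Everything in it is an assumption: the source names (`file_1` to `file_3`), column names and values are hypothetical, an in-memory SQLite database stands in for RDS, and a k-nearest-neighbour vote stands in for whatever model the project's pipeline actually uses.

```python
import sqlite3
from collections import Counter
from math import dist

# Hypothetical rows standing in for the three scraped Parquet files.
# Source names, column names and values are illustrative assumptions only.
SOURCES = {
    "file_1": [
        {"collection": "alpha", "floor_price": 0.10, "mean_price": 0.20, "total_sales": 1},
        {"collection": "beta", "floor_price": 0.20, "mean_price": 0.30, "total_sales": 1},
    ],
    "file_2": [
        {"collection": "gamma", "floor_price": 1.00, "mean_price": 1.50, "total_sales": 6},
        {"collection": "delta", "floor_price": 1.10, "mean_price": 1.40, "total_sales": 6},
    ],
    "file_3": [
        {"collection": "epsilon", "floor_price": 2.00, "mean_price": 3.00, "total_sales": 9},
        {"collection": "zeta", "floor_price": 0.15, "mean_price": 0.25, "total_sales": 2},
    ],
}


def merge_sources(sources):
    """Stack rows from every source into one list, tagging each row with its origin."""
    return [{**row, "source": name} for name, rows in sources.items() for row in rows]


def load_table(conn, rows):
    """Write merged rows to a table; SQLite stands in for the RDS database here."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS collections ("
        "collection TEXT, source TEXT, floor_price REAL, mean_price REAL, total_sales INTEGER)"
    )
    conn.executemany(
        "INSERT INTO collections VALUES "
        "(:collection, :source, :floor_price, :mean_price, :total_sales)",
        rows,
    )
    conn.commit()


def fetch_training_set(conn):
    """Read back (feature_vector, total_sales) pairs from the table."""
    query = "SELECT floor_price, mean_price, total_sales FROM collections"
    return [((fp, mp), sales) for fp, mp, sales in conn.execute(query)]


def predict_sales(train, features, k=3):
    """Majority vote of the k nearest collections; labels are assumed to lie in 0..10."""
    nearest = sorted(train, key=lambda pair: dist(pair[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_table(conn, merge_sources(SOURCES))
    train = fetch_training_set(conn)
    print(predict_sales(train, (1.05, 1.45)))  # nearest: gamma, delta, beta
```

Both hypothetical features are prices on a similar scale, so they are compared unscaled; features on very different scales would normally be standardised before a distance-based vote.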
The project's objective is to create NFT datasets, which are currently lacking on Kaggle, and to address the problem of weak labelling in the digital art industry.
This is a guide on how to run the project using Docker.
- Type the following in your terminal:
  `git clone https://github.com/marfappv/data_eng_ind.git`
- Make sure the `dockerfile` runs properly from the `python-docker` folder. It will install all the libraries needed to run the machine learning pipeline code.
- Type the following in your terminal:
  `python3 main.py`