
loinguyen3108/dvdrental-etl


DVD Rental

This is an example project for Hadoop, Spark, Hive, and Superset.



🎨 Stack

The project runs locally, using the docker-compose.yml from bigdata-stack.

⚙️ Setup

1. Run bigdata-stack

git clone git@github.com:loinguyen3108/bigdata-stack.git

cd bigdata-stack

docker compose up -d

2. Spark Standalone
Set up a Spark Standalone cluster as described in the Spark documentation.

3. Dataset
The data is downloaded from the PostgreSQL Sample Database (the DVD Rental database).

4. Environment

export JDBC_URL=...
export JDBC_USER=...
export JDBC_PASSWORD=...
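The ingestion job needs these three variables at runtime. Below is a minimal sketch of how a Python entry point could read them and fail fast when one is missing. The helper name `load_jdbc_config` is hypothetical and not taken from this repository. For PostgreSQL, a JDBC URL typically has the form `jdbc:postgresql://<host>:5432/dvdrental`, assuming the default port and that the sample database was restored under the name `dvdrental`.

```python
import os
from typing import Mapping


def load_jdbc_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read the JDBC settings exported in step 4 (hypothetical helper)."""
    required = ("JDBC_URL", "JDBC_USER", "JDBC_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail early with a clear message instead of a JDBC connection error later.
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "url": env["JDBC_URL"],
        "user": env["JDBC_USER"],
        "password": env["JDBC_PASSWORD"],
    }
```

In PySpark, these values correspond to the `url`, `user`, and `password` options of the JDBC data source (`spark.read.format("jdbc")`).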

5. Build dependencies

./build_dependencies.sh

6. Insert local packages

./update_local_packages.sh
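`spark-submit --py-files` accepts `.zip` archives and adds them to the Python path of the driver and executors. The contents of `build_dependencies.sh` and `update_local_packages.sh` are not shown here, so the following is only a hypothetical sketch of producing such an archive in Python. It assumes the local code lives in a single package directory; the function name `build_py_files_zip` is illustrative.

```python
import shutil
from pathlib import Path


def build_py_files_zip(package_dir: str, output_base: str = "packages") -> str:
    """Zip a package directory so it can be passed to spark-submit --py-files.

    The package directory itself is kept at the archive root, so
    `import <package>` works once Spark puts the zip on sys.path.
    """
    pkg = Path(package_dir).resolve()
    if not (pkg / "__init__.py").exists():
        raise ValueError(f"{pkg} is not a Python package (no __init__.py)")
    # root_dir is the parent and base_dir is the package name,
    # so archive entries look like <package>/module.py.
    return shutil.make_archive(output_base, "zip", root_dir=str(pkg.parent), base_dir=pkg.name)
```

Keeping the package directory at the archive root matters: if the modules were zipped without their parent folder, `import <package>` would fail on the executors.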

7. Show argument help

cd manager
python ingestion.py -h
python transform.py -h
cd ..
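The real options are defined in `manager/ingestion.py` and `manager/transform.py`, and `-h` gives the authoritative listing. As a hypothetical sketch only, a parser accepting the flags used in step 8 could look like the following. It assumes `--exec-date` uses the `YYYY:MM:DD` layout shown there. Because the accepted values for `--loading-type` are not documented here, the sketch treats them as free text.

```python
import argparse
from datetime import date, datetime


def parse_exec_date(value: str) -> date:
    """Parse an execution date written as YYYY:MM:DD, e.g. 2023:01:31."""
    try:
        return datetime.strptime(value, "%Y:%m:%d").date()
    except ValueError as exc:
        # argparse turns this into a clean "invalid value" usage error.
        raise argparse.ArgumentTypeError(f"expected YYYY:MM:DD, got {value!r}") from exc


def build_ingestion_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Ingest a PostgreSQL table into the data lake")
    parser.add_argument("--exec-date", type=parse_exec_date, required=True)
    parser.add_argument("--table-name", required=True)
    parser.add_argument("--p-key", help="key column of the source table (assumed primary key)")
    parser.add_argument("--loading-type", help="loading strategy (values not documented here)")
    return parser
```

Passing a `type=` callable makes argparse reject a malformed date such as `2023-01-31` before any Spark work starts.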

8. Run

# ingest data from PostgreSQL into the data lake
spark-submit --py-files packages.zip manager/ingestion.py --exec-date YYYY:MM:DD --table-name <table_name> --p-key <key_name> --loading-type <type>

# transform data from the data lake into Hive
spark-submit --py-files packages.zip manager/transform.py --exec-date YYYY:MM:DD
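Ingestion handles one table per `spark-submit`, so loading the whole sample database means repeating the first command for each table. The hypothetical sketch below builds those argument lists. The table names are the 15 tables of the PostgreSQL DVD Rental sample database. The `<table>_id` key convention and the single loading type applied to every table are assumptions you should adjust. The join tables `film_actor` and `film_category` have composite keys, so the sketch skips them.

```python
import shlex
from datetime import date

# The 15 tables of the PostgreSQL DVD Rental sample database.
DVD_RENTAL_TABLES = (
    "actor", "address", "category", "city", "country", "customer", "film",
    "film_actor", "film_category", "inventory", "language", "payment",
    "rental", "staff", "store",
)
# Join tables with composite primary keys; not covered by the <table>_id assumption.
COMPOSITE_KEY_TABLES = ("film_actor", "film_category")


def ingestion_command(table: str, p_key: str, exec_date: date, loading_type: str) -> list[str]:
    """Build one spark-submit invocation for manager/ingestion.py, as in step 8."""
    return [
        "spark-submit", "--py-files", "packages.zip", "manager/ingestion.py",
        "--exec-date", exec_date.strftime("%Y:%m:%d"),
        "--table-name", table,
        "--p-key", p_key,
        "--loading-type", loading_type,
    ]


def all_ingestion_commands(exec_date: date, loading_type: str) -> list[list[str]]:
    # Assumption: every remaining table has a single-column key named <table>_id.
    return [
        ingestion_command(t, f"{t}_id", exec_date, loading_type)
        for t in DVD_RENTAL_TABLES
        if t not in COMPOSITE_KEY_TABLES
    ]


if __name__ == "__main__":
    # Print shell-ready commands; "<type>" is the placeholder from step 8.
    for cmd in all_ingestion_commands(date.today(), "<type>"):
        print(shlex.join(cmd))
```

Each list can also be passed to `subprocess.run(cmd, check=True)` to launch the jobs one after another, followed by a single `manager/transform.py` run for the same execution date.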

✍️ Example

  • Data Lake

[Figure: Data Lake]

  • Hive

[Figure: Hive]

📜 License

This software is licensed under the Apache License © Loi Nguyen.