azure-databricks-etl-project

ETL motor racing data

#### The data is from the Ergast website.

The data is available through an API, as downloadable CSVs, and as nested or non-nested JSON files. Azure Databricks (built on Apache Spark), Azure notebooks, and Azure Data Lake Storage are the main tools for this ETL project.

In this project, I focused on extracting data from the CSV and JSON files. This can be done on a free Azure trial from Microsoft.

Here is a quick diagram of the high-level plan.

Quick Overview of my ETL Processes

- Purple blocks show columns that were renamed and/or transformed.
- Red blocks show columns that were dropped.
- Green blocks show columns that were added.
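The three kinds of change in the diagram can be sketched in PySpark roughly as follows. This is a minimal sketch, not the notebook code from this repo: the column names (`constructorId`, `constructorRef`, `url`) and the added `ingestion_date` column are assumptions chosen for illustration.

```python
def transform_columns(df):
    """Apply rename, drop, and add steps to a Spark DataFrame.

    All column names here are hypothetical examples, not taken from the repo.
    """
    return (
        df
        # Purple: rename columns to snake_case
        .withColumnRenamed("constructorId", "constructor_id")
        .withColumnRenamed("constructorRef", "constructor_ref")
        # Red: drop a column that is not needed downstream
        .drop("url")
        # Green: add an audit column via a Spark SQL expression
        .selectExpr("*", "current_timestamp() AS ingestion_date")
    )
```

Using `selectExpr` for the added column keeps the sketch free of extra imports; `withColumn` with `pyspark.sql.functions.current_timestamp()` would do the same job.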

Both horizontal and vertical scaling are possible, but a larger budget would be needed to take full advantage of Azure Databricks.

Below are random snapshots. The reproducible Databricks files are available in the racing_etl_files folder.

Creating secure secret keys, connecting to the storage account, and creating and mounting the empty raw folder
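For readers without the screenshot, this step might look something like the sketch below. It assumes service-principal (OAuth) authentication against ADLS Gen2 and a Databricks secret scope; the source does not say which authentication method was used, and the scope name, secret key names, and mount-point layout are hypothetical. `dbutils` is passed in as a parameter so the function can also be exercised outside a notebook.

```python
def mount_raw_container(dbutils, storage_account, container, scope):
    """Mount an ADLS Gen2 container using credentials from a secret scope.

    Assumption: service-principal auth. The secret key names below
    ("client-id", "tenant-id", "client-secret") are hypothetical.
    """
    client_id = dbutils.secrets.get(scope=scope, key="client-id")
    tenant_id = dbutils.secrets.get(scope=scope, key="tenant-id")
    client_secret = dbutils.secrets.get(scope=scope, key="client-secret")

    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

    mount_point = f"/mnt/{storage_account}/{container}"

    # Skip the mount if this path is already mounted
    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        return mount_point

    dbutils.fs.mount(
        source=f"abfss://{container}@{storage_account}.dfs.core.windows.net/",
        mount_point=mount_point,
        extra_configs=configs,
    )
    return mount_point
```

In a notebook, this would be called with the built-in `dbutils` object, for example `mount_raw_container(dbutils, "<storage_account>", "raw", "<scope>")`.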

Uploading raw files to the Data Lake Storage raw folder

Reading the JSON file into a Spark DataFrame
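A minimal sketch of this read step, assuming a drivers file whose `name` field is a nested object. The schema, field names, and path are illustrative assumptions rather than the repo's actual notebook code. The schema is given as a DDL string, which `spark.read.schema` accepts, so no extra imports are needed.

```python
# Hypothetical schema: "name" is assumed to be a nested JSON object
DRIVERS_SCHEMA = (
    "driverId INT, driverRef STRING, "
    "name STRUCT<forename: STRING, surname: STRING>, "
    "dob DATE, nationality STRING"
)


def read_drivers_json(spark, path):
    """Read a JSON file with an explicit schema and flatten the nested name."""
    df = spark.read.schema(DRIVERS_SCHEMA).json(path)
    # Combine the nested forename/surname fields into one flat column
    return df.selectExpr(
        "driverId",
        "driverRef",
        "concat(name.forename, ' ', name.surname) AS name",
        "dob",
        "nationality",
    )
```

Supplying the schema up front avoids a separate inference pass over the file and fails early if the types do not match. For a file where a single JSON record spans several lines, adding `.option("multiLine", True)` before `.json(path)` is the usual adjustment.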

Writing the output to a Parquet file
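The write step could be sketched as below. The overwrite mode and the output path are assumptions; the source only says that the output is written to Parquet.

```python
def write_parquet(df, path):
    """Write a Spark DataFrame as Parquet, replacing any previous output."""
    # "overwrite" makes reruns of the notebook idempotent (assumed choice)
    df.write.mode("overwrite").parquet(path)
```

The result can be checked by reading it back with `spark.read.parquet(path)`.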

randyroac/azure-databricks-etl-project
