joeWatersDev/ibm-data-engineering-capstone-project

Data Engineering Capstone Project

In this project I assume the role of a Junior Data Engineer tasked with architecting and implementing a data analytics platform, demonstrating core data engineering knowledge along the way.

Visual breakdown of the project components

This is the final project of the IBM Data Engineering Professional Certificate program.

Overview

As part of this project I create and query data repositories using relational and NoSQL databases such as MySQL and MongoDB. I also design and populate a data warehouse using PostgreSQL and IBM Db2 and write queries that perform CUBE and ROLLUP operations.
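
The warehouse reporting queries use SQL's GROUP BY ROLLUP to produce totals at several grouping levels in one pass. As a minimal sketch of what ROLLUP computes, here is a pure-Python emulation over a hypothetical mini fact table of `(category, quarter, sales)` rows (the table and its values are illustrative, not from the project data):

```python
# Hypothetical mini fact table: (category, quarter, sales) rows.
rows = [
    ("Electronics", "Q1", 100),
    ("Electronics", "Q2", 150),
    ("Clothing",    "Q1", 80),
    ("Clothing",    "Q2", 120),
]

def rollup(rows):
    """Emulate SQL GROUP BY ROLLUP(category, quarter): per-(category, quarter)
    sums, per-category subtotals (quarter=None), and a grand total (None, None)."""
    totals = {}
    for cat, qtr, amt in rows:
        totals[(cat, qtr)] = totals.get((cat, qtr), 0) + amt     # finest level
        totals[(cat, None)] = totals.get((cat, None), 0) + amt   # category subtotal
        totals[(None, None)] = totals.get((None, None), 0) + amt # grand total
    return totals

totals = rollup(rows)
print(totals[("Electronics", None)])  # 250 (category subtotal)
print(totals[(None, None)])           # 450 (grand total)
```

CUBE works the same way but also produces per-quarter subtotals, i.e. every combination of the grouping columns rather than only the left-to-right prefixes.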

I generate reports from the data in the data warehouse and build a dashboard using Cognos Analytics. I also build Extract, Transform, and Load (ETL) data pipelines that move data between repositories. Finally, I perform big data analytics using Apache Spark, making predictions with the help of a machine learning model.

Learning Objectives

  • Demonstrate proficiency in skills required for an entry-level data engineering role.
  • Design and implement various concepts and components in the data engineering lifecycle such as data repositories.
  • Showcase working knowledge with relational databases, NoSQL data stores, big data engines, data warehouses, and data pipelines.
  • Apply Linux shell scripting, SQL, and Python to data engineering problems.

Components

  1. MySQL OLTP Database I design a data platform that uses MySQL as its OLTP database and store the transactional data there.

  2. MongoDB NoSQL Database I design a data platform that uses MongoDB as its NoSQL database and store the e-commerce catalog data there.

  3. PostgreSQL Data Warehouse I design and implement a data warehouse and then generate reports from the data in the data warehouse.

  4. Cognos Data Analytics I assume the role of a data engineer and design a reporting dashboard that reflects the key metrics of my company's business.

  5. Airflow ETL Data Pipelines I perform various ETL operations that move data from RDBMS to NoSQL, from NoSQL to RDBMS, and from both into the data warehouse. I also write a pipeline that analyzes a web server log file: it extracts the required lines and fields, transforms the data, and loads it.

  6. Spark Big Data Analytics I analyze search terms in web server data using Apache Spark, then load a pretrained sales forecasting model and predict sales for a future year.
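
The extract-and-filter step of the log pipeline in component 5 can be sketched as follows. This is an illustrative stand-alone version (the actual project runs inside an Airflow DAG): the log format, field names, and the "successful GET requests only" filter are assumptions for the example.

```python
import re
from datetime import datetime

# Extract the client IP, timestamp, and requested path from Apache-style
# access-log lines, keeping only successful (status 200) GET requests.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "GET (?P<path>\S+) [^"]+" (?P<status>\d{3})'
)

def extract_transform(lines):
    """Return (date, ip, path) rows ready to load into the next repository."""
    rows = []
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m and m.group("status") == "200":
            ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
            rows.append((ts.date().isoformat(), m.group("ip"), m.group("path")))
    return rows

sample = [
    '198.51.100.7 - - [12/Mar/2023:10:02:01 +0000] "GET /index.html HTTP/1.1" 200',
    '203.0.113.9 - - [12/Mar/2023:10:02:05 +0000] "GET /missing HTTP/1.1" 404',
]
print(extract_transform(sample))  # the 404 line is filtered out
```

In the real pipeline the `rows` would be written out (e.g. to a CSV or a database table) by a separate load task rather than printed.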
