Covid-19 Health Care Data Engineering Project

About Project:

Built a Data Warehouse of COVID-19 data (Cases & Deaths, Hospital Admissions and more) and developed a complete Data Pipeline using Azure Data Factory & Azure Databricks. The data was visualized using Power BI.

Solution Architecture:

Covid19_DataFlow_diagram

Getting Started

  1. Cloned the project repository from GitHub.

  2. Step 1 can be skipped by fetching the data directly from the ECDC API.

  3. Developed a Data Pipeline in Azure Data Factory:

    ◾ Copied the data from GitHub to Azure Blob Storage.

copyActivity

CopySuccess

    ◾ Transformed the data as required using:

       ▪ Dataflows in Data Factory

Hospital_Datafloow

admission_hospitalFlowData

       ▪ PySpark in Azure Databricks, which wrote the data to Azure SQL DB.

  4. Created a Data Lake to store raw and processed data.

  5. Developed a Data Warehouse in Azure SQL DB (DDL Command & Pyspark_code_in_SQL) and masked the sensitive data using PySpark functions (Pyspark code).

  6. Loaded the data from SQL DB into Power BI Desktop to get insights out of it.

Health_care_report
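
The Databricks steps above (masking sensitive columns with PySpark and writing the result to Azure SQL DB) could look roughly like the sketch below. It is not the project's actual notebook code. Hashing with SHA-256 is only one possible masking approach and is an assumption here, as are all function names, the server, database and table names, and the choice of Spark's generic JDBC writer.

```python
import hashlib


def mask_value(value: str) -> str:
    """Mask one value with an unsalted SHA-256 hex digest (illustrative choice)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def jdbc_options(server: str, database: str, table: str, user: str, password: str) -> dict:
    """Build options for Spark's generic JDBC writer against Azure SQL DB."""
    url = (
        f"jdbc:sqlserver://{server}.database.windows.net:1433;"
        f"database={database};encrypt=true;trustServerCertificate=false;loginTimeout=30"
    )
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


def write_masked(df, sensitive_columns, options, mode="append"):
    """Hash the given columns of a Spark DataFrame, then write it over JDBC."""
    # Imported lazily so the helpers above work without a Spark runtime.
    from pyspark.sql import functions as F

    for name in sensitive_columns:
        # sha2(col, 256) produces a hex-encoded SHA-256 digest for each row.
        df = df.withColumn(name, F.sha2(F.col(name).cast("string"), 256))
    df.write.format("jdbc").options(**options).mode(mode).save()
```

In a Databricks notebook this might be called as `write_masked(df, ["patient_id"], jdbc_options("myserver", "covid_dw", "dbo.hospital_admissions", user, password))`, where every argument is a placeholder. `mask_value` is a plain-Python helper for checking individual masked values by hand, for example in a unit test.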

Services Used:

◽ Azure Data Factory (Dataflows, Linked Services, Triggers, Azure Databricks)

◽ Azure Blob Storage

◽ Azure Data Lake Storage Gen 2

◽ Azure SQL DB