Commit

improved version
hassanrhanimi committed Dec 28, 2021
1 parent 3d544ad commit 8a1b47a
Showing 1 changed file with 6 additions and 7 deletions.
13 changes: 6 additions & 7 deletions README.md
@@ -1,5 +1,4 @@
#-------- Data_Engineering_project*-------------------------------------------------------------------------------------------------------

# Data_Engineering_project*

This repository is composed of two mini-projects where we'll assume the role of a data engineer.

@@ -11,17 +10,17 @@ ETL does exactly what the name implies. It is the process of
- Transforming it into one specific format, and
- Loading it into a database or target file.

#-------------------------------------------------First scenario--------------------------------------------------------------------------
## First scenario

Assume working at a start-up developing an AI tool that predicts whether someone is at risk for diabetes from height and body weight.
The goal is to implement an ETL that gets data from multiple sources (CSV, JSON, and XML files), transforms it, and stores it in
a format acceptable to the AI.
The first example assumes working at a start-up developing an AI tool that predicts whether someone is at risk for diabetes from height
and body weight. The goal is to implement an ETL that gets data from multiple sources (CSV, JSON, and XML files), transforms it,
and stores it in a format acceptable to the AI.

A second example, for a car dealer company, is also given.
The dataset contains CSV, JSON, and XML files (three of each format) of used car data with the features car_model, year_of_manufacture, price, and fuel. The goal is to implement a simple ETL that collects, combines, and transforms the data, then stores it
in one file for the end user (a data analyst or scientist for analysis, or a BI engineer for ad hoc reporting).
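
The used car ETL described above could look roughly like the sketch below. It is only an illustration under stated assumptions: the column names come from the text, but the file layouts (a CSV header row, a JSON array of objects, one XML child element per record), the function names, and the rounding of `price` are all hypothetical and not taken from the repository.

```python
import csv
import json
import xml.etree.ElementTree as ET
from pathlib import Path

# Column names come from the README text; everything else is an assumption.
COLUMNS = ["car_model", "year_of_manufacture", "price", "fuel"]


def extract_csv(path):
    """Read a CSV file (assumed to have a header row) into a list of dicts."""
    with open(path, newline="") as f:
        return [{c: row[c] for c in COLUMNS} for row in csv.DictReader(f)]


def extract_json(path):
    """Read a JSON file assumed to hold a single array of record objects."""
    with open(path) as f:
        return [{c: rec[c] for c in COLUMNS} for rec in json.load(f)]


def extract_xml(path):
    """Read an XML file assumed to hold one child element per record."""
    root = ET.parse(path).getroot()
    return [{c: rec.findtext(c) for c in COLUMNS} for rec in root]


EXTRACTORS = {".csv": extract_csv, ".json": extract_json, ".xml": extract_xml}


def extract(paths):
    """Combine the records of every supported file into one list."""
    records = []
    for path in map(Path, paths):
        reader = EXTRACTORS.get(path.suffix.lower())
        if reader is not None:  # silently skip unsupported extensions
            records.extend(reader(path))
    return records


def transform(records):
    """Cast numeric fields; rounding price to 2 decimals is an illustrative choice."""
    return [
        {
            "car_model": r["car_model"],
            "year_of_manufacture": int(r["year_of_manufacture"]),
            "price": round(float(r["price"]), 2),
            "fuel": r["fuel"],
        }
        for r in records
    ]


def load(records, target):
    """Write the combined records to a single CSV file for the end user."""
    with open(target, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(records)
```

Calling `load(transform(extract(paths)), "transformed_data.csv")` with a list of source file paths would run the three steps in order; the output file name here is just a placeholder.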

#-------------------------------------------------Second scenario-------------------------------------------------------------------------
## Second scenario

Assume working for an international financial analysis company. The company tracks stock prices, commodities, forex rates, and inflation rates. The job is to extract financial data from various sources such as websites, APIs, and files provided by various financial
analysis firms. After collecting the data, I extract the data of interest to the company, transform it based on the given requirements, and store it in a CSV file ready to be loaded into a database.
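
As a rough illustration of that extract, transform, and store flow, here is a minimal sketch for forex rates only. The payload shape (`{"base": ..., "rates": {...}}`), the function names, and the rounding to four decimals are assumptions made for the example; the repository may fetch and shape its data differently, and the network call itself is left out.

```python
import csv
import json


def extract_rates(payload):
    """Parse a JSON string assumed to look like {"base": "EUR", "rates": {"USD": 1.13}}."""
    data = json.loads(payload)
    return data["base"], data["rates"]


def transform_rates(base, rates, wanted):
    """Keep only the currencies of interest; rounding to 4 decimals is illustrative."""
    return [
        {"base": base, "currency": code, "rate": round(float(rates[code]), 4)}
        for code in sorted(wanted)
        if code in rates
    ]


def load_csv(rows, target):
    """Write the rows to a CSV file that a database import could pick up later."""
    with open(target, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["base", "currency", "rate"])
        writer.writeheader()
        writer.writerows(rows)
```

Here `payload` stands for the text of an API response or a downloaded file; keeping extraction separate from the transform step means the same transform can be reused for whichever source supplies the rates.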