UserTransactions_DataPipeline

Objective

To solve some common data-pipeline use cases and implement SCD Type 1 (SCD1) in Hive.

Scenario

You are given data stored in multiple CSV files describing user transactions.

  1. Load the data from the CSV files into a SQL table.
  2. Using a Sqoop job (incremental load), move the data from SQL to HDFS.
  3. In Hive, first load the data from HDFS into a managed table, then load it into an external table partitioned year-wise and then month-wise, implementing SCD1 along the way (see the HiveQL sketch after this list).
  4. Finally, load the data back into another SQL table to cross-verify the data from source to destination.
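As a rough sketch of the Hive layer in step 3 (all table names are illustrative and the column types are guessed from the CSV header, not taken from the repo): the consolidated managed table is created as a transactional ORC table so that SCD1 updates can be applied, and the external table is partitioned by year and then month. For step 2, a Sqoop job would typically drive the incremental load with `--incremental append` plus `--check-column`/`--last-value` on a monotonically increasing column.

```sql
-- Managed, transactional table: ACID operations (needed for SCD1 updates)
-- work on managed ORC tables, not on external tables.
CREATE TABLE user_transactions_master (
  custid INT, username STRING, quote_count INT, ip STRING,
  entry_time TIMESTAMP, prp_1 STRING, prp_2 STRING, prp_3 STRING,
  ms BIGINT, http_type STRING, purchase_category STRING, total_count INT,
  purchase_sub_category STRING, http_info STRING, status_code INT
)
CLUSTERED BY (custid) INTO 4 BUCKETS   -- bucketing is required for ACID on Hive 2.x
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- External table, partitioned year-wise and then month-wise,
-- rebuilt from the managed table once SCD1 has been applied.
CREATE EXTERNAL TABLE user_transactions_ext (
  custid INT, username STRING, quote_count INT, ip STRING,
  entry_time TIMESTAMP, prp_1 STRING, prp_2 STRING, prp_3 STRING,
  ms BIGINT, http_type STRING, purchase_category STRING, total_count INT,
  purchase_sub_category STRING, http_info STRING, status_code INT
)
PARTITIONED BY (txn_year INT, txn_month INT)
STORED AS ORC
LOCATION '/user/hive/warehouse/external/user_transactions_ext';
```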

Input

Three CSVs are given, each stored with the following header:

"custid","username","quote_count","ip","entry_time","prp_1","prp_2","prp_3","ms","http_type","purchase_category","total_count","purchase_sub_category","http_info","status_code"

My Approach

  • Using Python, load the data into two SQL tables: one for validation and one for moving the data from SQL to HDFS (via the Sqoop job).
  • Using the Sqoop job, load the data from the SQL table into HDFS.
  • In Hive, create two managed tables:
    one for loading each CSV file, truncated after its data has been moved on;
    another to store the entire dataset with SCD1 implemented (since ACID properties cannot be applied to an external table).
  • Overwrite the external table from the managed table on which SCD1 was implemented (see the HiveQL sketch after this list).
  • Load the data from the managed table that is truncated after every file.
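Below is a hedged sketch of the SCD1 merge and the final overwrite into the external table, reusing the illustrative names from the DDL sketch above. `user_transactions_stg` stands for the per-file managed staging table, and `custid` is assumed to be the business key; Hive's MERGE statement (available from Hive 2.2 with an ACID target) performs the type-1 overwrite.

```sql
-- SCD Type 1: overwrite the attributes of an existing custid with the
-- newest values, and insert rows for custids not seen before.
MERGE INTO user_transactions_master AS t
USING user_transactions_stg AS s
ON t.custid = s.custid
WHEN MATCHED THEN UPDATE SET
  username = s.username, quote_count = s.quote_count, ip = s.ip,
  entry_time = s.entry_time, prp_1 = s.prp_1, prp_2 = s.prp_2,
  prp_3 = s.prp_3, ms = s.ms, http_type = s.http_type,
  purchase_category = s.purchase_category, total_count = s.total_count,
  purchase_sub_category = s.purchase_sub_category,
  http_info = s.http_info, status_code = s.status_code
WHEN NOT MATCHED THEN INSERT VALUES (
  s.custid, s.username, s.quote_count, s.ip, s.entry_time, s.prp_1,
  s.prp_2, s.prp_3, s.ms, s.http_type, s.purchase_category,
  s.total_count, s.purchase_sub_category, s.http_info, s.status_code
);

-- Truncate the staging table once its file has been merged.
TRUNCATE TABLE user_transactions_stg;

-- Rebuild the external table from the consolidated managed table,
-- deriving the year and month partitions from entry_time.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE user_transactions_ext
PARTITION (txn_year, txn_month)
SELECT m.*, year(m.entry_time), month(m.entry_time)
FROM user_transactions_master m;
```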
