
itversity/etl-pyspark


ETL PySpark

  • Clone the GitHub repository.
  • Create a virtual environment specific to this project.
  • Activate the virtual environment.
  • Install the dependencies.
  • Launch spark-sql and create these two tables (replace the LOCATION paths with the path to your own clone):
      CREATE TABLE t (d DATE) LOCATION 'file:/Users/itversity/Projects/Internal/etl-pyspark/t';
      CREATE TABLE ts (t TIMESTAMP) LOCATION 'file:/Users/itversity/Projects/Internal/etl-pyspark/ts';
  • Make sure to create the logs folder.
  • Run the application using spark-submit app.py REPORT_1
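The LOCATION paths in the two CREATE TABLE statements point at the author's machine. The following is a minimal sketch, assuming it is run from the root of your clone on a POSIX system, that prints the same two statements with LOCATION rewritten to your own path and creates the logs folder. The helper names (create_table_statements, ensure_logs_dir) are hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import List

# Assumption: this hypothetical helper script is run from the root of the
# cloned etl-pyspark repository; it is not part of the project itself.
PROJECT_DIR = Path.cwd()


def create_table_statements(project_dir: Path) -> List[str]:
    """Return the two CREATE TABLE statements from the README, with
    LOCATION pointing inside project_dir instead of a hard-coded path."""
    base = project_dir.resolve()
    t_path = (base / "t").as_posix()
    ts_path = (base / "ts").as_posix()
    return [
        f"CREATE TABLE t (d DATE) LOCATION 'file:{t_path}';",
        f"CREATE TABLE ts (t TIMESTAMP) LOCATION 'file:{ts_path}';",
    ]


def ensure_logs_dir(project_dir: Path) -> Path:
    """Create the logs folder if it does not exist yet (safe to rerun)."""
    logs = project_dir / "logs"
    logs.mkdir(parents=True, exist_ok=True)
    return logs


if __name__ == "__main__":
    # Paste the printed statements into the spark-sql shell.
    for stmt in create_table_statements(PROJECT_DIR):
        print(stmt)
    print(f"logs folder ready: {ensure_logs_dir(PROJECT_DIR)}")
    print("next step: spark-submit app.py REPORT_1")
```

Generating the statements from the current directory keeps the table locations inside the clone wherever it lives, and `exist_ok=True` makes the logs step harmless to repeat.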

About

A PySpark-based lightweight ETL application
