Saurav Venkat (sauravvenkat)

Pinned

SparkDataLakes Public

This is an ETL pipeline that reads data from an S3 data lake, transforms it with Spark, and writes the results back to S3 as partitioned Parquet files.

Python 1
CloudDataWarehouse Public

This is an ETL pipeline that reads source data from an open-source music database stored in Amazon S3, transforms it, and loads it into Amazon Redshift.

Python
Instacart-Market-Basket Public

This is an exploratory analysis of the Instacart Market Basket dataset on Kaggle: https://www.kaggle.com/c/instacart-market-basket-analysis

Jupyter Notebook
Capital_Bike_Share Public

This is an exploratory data analysis of the publicly available Capital Bike Share dataset.

Jupyter Notebook
DataVisualization Public

These are data visualizations I've created using Python and D3.js.

HTML