Launch Stack

Please note: This stack is not covered by the AWS Free Tier and will be billed. The EMR cluster is not terminated automatically; delete the stack manually from the CloudFormation console when you are done.

This is a small demo for getting started with AWS EMR. AWS EMR is a managed Hadoop framework with support for Apache Spark, Presto, and other tools.

The included CloudFormation template launches an EMR stack in a new VPC and runs a small step on Apache Spark.

The step takes a CSV file from S3 as input. The file contains census data with male and female population counts.

The PySpark script calculates the sex ratio, adds it as a new column to the CSV, and uploads the result to the output bucket.
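
To make the transformation concrete, here is a minimal plain-Python sketch of the same per-row calculation. It is not the PySpark script from this repo, and it runs locally on a string rather than on S3. The column names (`male_population`, `female_population`, `sex_ratio`) and the definition of sex ratio as females per 1,000 males are assumptions for illustration; the real script may use different names or a different convention.

```python
import csv
import io


def add_sex_ratio(rows, male_col="male_population", female_col="female_population"):
    """Yield each row with an extra 'sex_ratio' field.

    Assumption: sex ratio = females per 1,000 males. Column names are hypothetical.
    """
    for row in rows:
        males = float(row[male_col])
        females = float(row[female_col])
        out = dict(row)
        # Leave the ratio empty when there are no males, to avoid dividing by zero.
        out["sex_ratio"] = round(females * 1000 / males, 2) if males else None
        yield out


def transform_csv(text):
    """Read CSV text, append the sex_ratio column, and return the new CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(add_sex_ratio(reader))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=reader.fieldnames + ["sex_ratio"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    sample = "district,male_population,female_population\nA,1000,950\n"
    print(transform_csv(sample))
```

In the actual Spark step, the same idea would typically be expressed as a derived DataFrame column, with the result written to the output bucket.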

The output bucket must be specified when creating the stack.