Please note: This demo is not covered by the AWS Free Tier and will incur charges. The EMR cluster is not terminated automatically; delete the stack manually from the CloudFormation console when you are done.
This is a small demo for getting started with AWS EMR. AWS EMR is a managed Hadoop framework with support for Apache Spark, Presto, etc.
The included CloudFormation template launches an EMR stack in a new VPC and runs a small step on Apache Spark.
The step takes a CSV file from S3 as input, containing census data with male and female population counts.
The PySpark script calculates the sex ratio, adds it to the CSV as a new column, and uploads the result to the output bucket.
The output bucket must be specified when creating the stack.
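To illustrate the per-row transformation the Spark step performs, here is a minimal local sketch in plain Python (standard `csv` module, not PySpark). It is not the demo's actual script. The column names `male_population` and `female_population` are hypothetical, and it assumes sex ratio is defined as females per 1000 males; the real CSV headers and ratio convention may differ. In PySpark, the same logic would typically be a single `withColumn` expression over the DataFrame.

```python
import csv
import io

# Hypothetical column names; the demo's real CSV may use different headers.
MALE_COL = "male_population"
FEMALE_COL = "female_population"


def add_sex_ratio(csv_text: str, per: int = 1000) -> str:
    """Return csv_text with an extra 'sex_ratio' column appended.

    Assumption: sex ratio = females per `per` males (default 1000).
    Rows with zero males get an empty sex_ratio to avoid division by zero.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + ["sex_ratio"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        males = float(row[MALE_COL])
        females = float(row[FEMALE_COL])
        # Format to two decimals so the output column is stable text.
        row["sex_ratio"] = f"{females / males * per:.2f}" if males else ""
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = "region,male_population,female_population\nA,1000,940\nB,500,510\n"
    print(add_sex_ratio(sample), end="")
```

Running it on the sample prints the input rows with a trailing `sex_ratio` column (`940.00` for region A, `1020.00` for region B).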