In this project I use big data tools such as Hadoop, Spark, and Cassandra to store and analyze data in order to answer everyday questions in aviation operations, such as:
- the best flight on a given day
- the most popular airports
- the most on-time airlines, etc.

The project covers the following steps:
- Data extraction and cleaning: extracting CSV files to HDFS
- Data analysis using Hadoop and PySpark
- MapReduce code in Java
- MapReduce-style operations in PySpark
- Storing the MapReduce outputs from a DataFrame into a Cassandra table, from which the required results are retrieved. I ran Cassandra locally because my datasets and the scope of the project were quite small.
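
To make the map and reduce steps concrete, here is a minimal sketch in plain Python of how a question like "the most on-time airlines" could be answered. It is not the project's actual code: the column names (`airline`, `arr_delay`), the sample rows, and the helper names (`map_row`, `reduce_pairs`, `rank_airlines`) are all assumptions made for illustration. The same shape should carry over to PySpark, where the map step corresponds to `RDD.map` and the aggregation to `RDD.reduceByKey`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; the real dataset's columns and values may differ.
SAMPLE_CSV = """airline,origin,dest,arr_delay
AA,JFK,LAX,5
AA,JFK,ORD,-3
DL,ATL,JFK,20
DL,ATL,LAX,10
UA,ORD,SFO,
"""


def map_row(row):
    """Map step: emit (airline, (delay, 1)), or None if the delay is missing."""
    raw = row["arr_delay"].strip()
    if not raw:
        return None  # cleaning: skip rows with no recorded arrival delay
    return row["airline"], (float(raw), 1)


def reduce_pairs(pairs):
    """Reduce step: sum delays and counts per airline, then average them."""
    totals = defaultdict(lambda: [0.0, 0])
    for airline, (delay, count) in pairs:
        totals[airline][0] += delay
        totals[airline][1] += count
    return {airline: total / n for airline, (total, n) in totals.items()}


def rank_airlines(csv_text):
    """Return (airline, average arrival delay) pairs, most on-time first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    pairs = [p for p in map(map_row, reader) if p is not None]
    averages = reduce_pairs(pairs)
    return sorted(averages.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for airline, avg_delay in rank_airlines(SAMPLE_CSV):
        print(airline, round(avg_delay, 2))
```

Carrying a `(delay, 1)` pair through the map step keeps the reduction a simple element-wise sum, which is what lets it run in parallel across partitions before the final division.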