
Finding solutions to real-world aviation problems in two batch processing systems (Apache Hadoop and Spark), and in a stream processing system (Apache Storm).


madhu1607/DecisionsInAviation


Aviation Analysis ✈️

In this project I use big data tools such as Hadoop, Spark, and Cassandra to analyze and store flight data, in order to answer day-to-day questions in aviation operations, such as:

  • the best flight on a given day
  • the most popular airports
  • the most on-time airlines, etc.

Technologies and implementation

  1. Data extraction and cleaning - extracting CSV files to HDFS
  2. Data analysis using Hadoop and PySpark
  3. MapReduce code in Java
  4. Running the MapReduce operations with PySpark
  5. Storing the MapReduce outputs, held in a DataFrame, in a Cassandra table from which the required results can be retrieved. I ran Cassandra locally because my datasets and the scope of the project were quite small.
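
As a rough illustration of the map and reduce steps behind a question like "the most on-time airlines", here is a minimal plain-Python sketch. It is not the code in this repository: the record layout (airline code, arrival delay in minutes), the sample values, and the function names `map_flight`, `reduce_delays`, and `rank_airlines_by_delay` are all assumptions made for the example.

```python
# Hypothetical sample records: (airline code, arrival delay in minutes).
# The real dataset's columns are not described in this README.
SAMPLE_FLIGHTS = [
    ("AA", 12.0),
    ("AA", -3.0),
    ("DL", 0.0),
    ("DL", 4.0),
    ("UA", 30.0),
]


def map_flight(record):
    """Map step: emit (airline, (delay_sum, flight_count)) for one flight."""
    airline, arr_delay = record
    return airline, (arr_delay, 1)


def reduce_delays(a, b):
    """Reduce step: combine two partial (delay_sum, flight_count) pairs."""
    return a[0] + b[0], a[1] + b[1]


def rank_airlines_by_delay(records):
    """Return (airline, mean arrival delay) pairs, most on-time first."""
    totals = {}
    for airline, value in map(map_flight, records):
        if airline in totals:
            totals[airline] = reduce_delays(totals[airline], value)
        else:
            totals[airline] = value
    averages = {k: total / count for k, (total, count) in totals.items()}
    # Lower mean delay means more on-time, so sort ascending.
    return sorted(averages.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    for airline, mean_delay in rank_airlines_by_delay(SAMPLE_FLIGHTS):
        print(airline, mean_delay)
```

In PySpark the same shape would typically be expressed with `rdd.map(...)` followed by `reduceByKey(...)`, with the ranked result then written to a Cassandra table as described in step 5.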
