Finding N consecutive increases in a dataset
Say we need to measure the performance of an airline based on its flight delays. We want to flag airlines whose delays increase N consecutive times and mark them as bad, so that next time users can avoid them. The goal of this project is to solve this use case using Apache Spark.
The steps involved in solving this:
1) Group the data for a single airline into a single partition and sort it.
2) Iterate within each partition, find N consecutive increases in delay, and raise an alarm flagging those airlines as bad.
The data we used is from the Bureau of Transportation Statistics (https://transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time). We loaded the data into Spark and kept the following fields: UNIQUE_CARRIER, DEST_AIRPORT_ID, ARR_DELAY.
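A rough sketch of this loading step might look like the following; the file name, the headered-CSV format, and the use of the SparkSession API are assumptions on my part (the original could just as well have used a plain RDD of text lines).

```scala
import org.apache.spark.sql.SparkSession

// Minimal loading sketch: read the BTS extract as a headered CSV
// (file name is hypothetical) and keep only the three fields we need.
val spark = SparkSession.builder().appName("consecutive-delay-increases").getOrCreate()

val flights = spark.read
  .option("header", "true")
  .csv("On_Time_On_Time_Performance.csv")   // assumed local file name
  .select("UNIQUE_CARRIER", "DEST_AIRPORT_ID", "ARR_DELAY")
```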
We used a case class to hold these fields and defined an implicit ordering for it that sorts by carrier, then airport, then delay. This means that for a particular carrier and airport the delays come out in ascending order.
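A minimal sketch of such a key type, with the field and class names being assumptions:

```scala
// Composite key holding the three selected columns.
case class FlightDelay(carrier: String, destAirportId: Int, arrDelay: Double)

object FlightDelay {
  // Sort by carrier, then destination airport, then delay, so that within a
  // given (carrier, airport) group the delays appear in ascending order.
  implicit val ordering: Ordering[FlightDelay] =
    Ordering.by(fd => (fd.carrier, fd.destAirportId, fd.arrDelay))
}
```

Because the ordering lives in the companion object, it is picked up automatically wherever an `Ordering[FlightDelay]` is required, including by `repartitionAndSortWithinPartitions` in the next step.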
Next we define a custom partitioner that partitions by carrier, and call repartitionAndSortWithinPartitions to partition and sort the data using the custom partitioner and the case class ordering.
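The sketch below assumes the `flights` DataFrame and the `FlightDelay` case class from the earlier sketches; the partition count and the filter on missing delays are illustrative choices. The partitioner hashes only the carrier, so every row of one airline lands in the same partition, and the sort then uses the implicit `Ordering[FlightDelay]`.

```scala
import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD

// Route all records of one carrier to the same partition.
class CarrierPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int = {
    val carrier = key.asInstanceOf[FlightDelay].carrier
    val rawMod  = carrier.hashCode % numPartitions
    rawMod + (if (rawMod < 0) numPartitions else 0)   // keep the result non-negative
  }
}

// Key every record by the full case class (secondary-sort pattern);
// the value slot is unused, so we carry a Unit placeholder.
val keyed: RDD[(FlightDelay, Unit)] = flights.rdd
  .filter(r => r.getString(2) != null && r.getString(2).nonEmpty)   // drop rows without a delay
  .map(r => (FlightDelay(r.getString(0), r.getString(1).toInt, r.getString(2).toDouble), ()))

val partitionedAndSorted =
  keyed.repartitionAndSortWithinPartitions(new CarrierPartitioner(numPartitions = 8))
```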
Finally, we use mapPartitions to iterate within each partition, look for N consecutive increases in delay across airports, and raise an alarm for the offending carriers.
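A minimal sketch of this final step, assuming `partitionedAndSorted` from the previous sketch; the value of N and the strict-increase rule are illustrative choices, not something fixed by the description above.

```scala
val N = 5   // assumed threshold for "bad" carriers

val badCarriers = partitionedAndSorted
  .mapPartitions { iter =>
    // Each partition holds one or more complete carriers in sorted order.
    // Count runs of strictly increasing delays for the same carrier and
    // emit the carrier once a run reaches N consecutive increases.
    var prev      = Option.empty[FlightDelay]
    var increases = 0
    val alarms    = scala.collection.mutable.LinkedHashSet[String]()

    for ((fd, _) <- iter) {
      prev match {
        case Some(p) if p.carrier == fd.carrier && fd.arrDelay > p.arrDelay =>
          increases += 1
          if (increases >= N) alarms += fd.carrier   // raise the alarm
        case _ =>
          increases = 0   // run broken: new carrier or no increase
      }
      prev = Some(fd)
    }
    alarms.iterator
  }
  .distinct()

badCarriers.collect().foreach(c => println(s"Flagged carrier: $c"))
```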