Analysis of the New York City yellow taxi data set with Hadoop MapReduce.
- What is the average number of passengers per trip in general and per day of the week?
- What is the average trip distance in general and per day of the week?
- What are the most used payment types?
- How does the number of passengers change between weekdays and weekends? A graph (built from the output of a MapReduce job) shows the average number of passengers over the day (per hour).
- How does the distance travelled change over the time of day across all days of the week? A graph shows the average trip distance over the day (per hour).
- An individual Java program answers each of the initial questions set up for the analysis. The naming convention is QuestionX.java, where X denotes the question number; for example, question 1 is solved in Question1.java.
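As an illustration of the kind of computation behind the "average per day of the week" questions, here is a minimal sketch in plain Java. It is not the code from the QuestionX.java files: it leaves out the Hadoop Mapper and Reducer classes and only mirrors their logic (emit a day-of-week key per trip, then average per key). The class name `AveragePassengersSketch`, the column positions, the timestamp format, and the sample rows are all assumptions made for the example.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.EnumMap;
import java.util.Map;

public class AveragePassengersSketch {
    // Assumed column positions and timestamp format; check them against the real CSV.
    static final int PICKUP_COL = 1;
    static final int PASSENGER_COL = 3;
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Returns the average passenger count per day of the week.
    static Map<DayOfWeek, Double> averagePerDay(String[] lines) {
        // sums.get(day)[0] = total passengers, sums.get(day)[1] = number of trips
        Map<DayOfWeek, long[]> sums = new EnumMap<>(DayOfWeek.class);
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length <= PASSENGER_COL) {
                continue; // too few columns
            }
            try {
                // "Map" step: derive the key (day of week) and the value (passengers).
                DayOfWeek day =
                        LocalDateTime.parse(fields[PICKUP_COL].trim(), FMT).getDayOfWeek();
                long passengers = Long.parseLong(fields[PASSENGER_COL].trim());
                // "Reduce" step, done incrementally: accumulate sum and count per key.
                long[] acc = sums.computeIfAbsent(day, d -> new long[2]);
                acc[0] += passengers;
                acc[1]++;
            } catch (DateTimeParseException | NumberFormatException e) {
                // Skip the header row and malformed records.
            }
        }
        Map<DayOfWeek, Double> averages = new EnumMap<>(DayOfWeek.class);
        for (Map.Entry<DayOfWeek, long[]> e : sums.entrySet()) {
            averages.put(e.getKey(), (double) e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }

    public static void main(String[] args) {
        // Made-up sample rows, not taken from the real data set.
        String[] sample = {
            "VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance",
            "1,2016-01-01 00:10:00,2016-01-01 00:20:00,1,2.5",
            "2,2016-01-01 08:00:00,2016-01-01 08:30:00,2,4.0",
            "1,2016-01-02 09:00:00,2016-01-02 09:05:00,4,1.0"
        };
        averagePerDay(sample).forEach((day, avg) -> System.out.println(day + "\t" + avg));
    }
}
```

Replacing the passenger column with the trip distance column (parsed as a `double`) would give the corresponding sketch for average trip distance. In a real MapReduce job, the sum and the count would typically be carried through to the reducer so that the average is computed once per key.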
The graph can be found in the corresponding folder in the repository and depicts how the number of passengers traveling by taxi varies over the course of the day. It was drawn using TIBCO Spotfire from the output of the MapReduce task. Both lines in the graph follow the same trend, that is, their highs and lows follow a similar pattern, but the average number of passengers traveling by taxi at any given time on a weekend is always higher than the corresponding weekday value.
The second graph depicts how the number of passengers traveling by taxi varies over the course of the day. It was drawn using TIBCO Spotfire from the output of the MapReduce task. Both lines in the graph follow the same trend, that is, their highs and lows follow a similar pattern, but the average number of passengers traveling by taxi at any given time on a weekend is always higher than the corresponding weekday value.
- The graphs were plotted using TIBCO Spotfire, with data taken from the output of question 4 and question 5.
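To make the per-hour weekday/weekend breakdown behind the graphs more concrete, here is a minimal sketch in plain Java of how such keys could be built and averaged. It is an illustration under stated assumptions, not the repository's code for question 4: the class name `HourlyPassengersSketch`, the `WEEKDAY-HH` / `WEEKEND-HH` key format, the column positions, and the sample rows are all hypothetical, and the input is assumed to be clean data rows without a header.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.TreeMap;

public class HourlyPassengersSketch {
    // Assumed column positions and timestamp format; check them against the real CSV.
    static final int PICKUP_COL = 1;
    static final int PASSENGER_COL = 3;
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Builds a grouping key such as "WEEKDAY-08" or "WEEKEND-23" from a pickup timestamp.
    static String keyFor(String pickup) {
        LocalDateTime t = LocalDateTime.parse(pickup.trim(), FMT);
        DayOfWeek day = t.getDayOfWeek();
        boolean weekend = day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY;
        int hour = t.getHour();
        return (weekend ? "WEEKEND" : "WEEKDAY") + "-" + (hour < 10 ? "0" : "") + hour;
    }

    // Returns the average passenger count per (weekday/weekend, hour) key.
    static Map<String, Double> averagePerHour(String[] rows) {
        // acc.get(key)[0] = total passengers, acc.get(key)[1] = number of trips
        Map<String, long[]> acc = new TreeMap<>();
        for (String row : rows) {
            String[] fields = row.split(",");
            long[] sumAndCount = acc.computeIfAbsent(keyFor(fields[PICKUP_COL]), k -> new long[2]);
            sumAndCount[0] += Long.parseLong(fields[PASSENGER_COL].trim());
            sumAndCount[1]++;
        }
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, long[]> e : acc.entrySet()) {
            averages.put(e.getKey(), (double) e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }

    public static void main(String[] args) {
        // Made-up sample rows, not taken from the real data set.
        String[] sample = {
            "1,2016-01-01 08:10:00,2016-01-01 08:20:00,1,2.5",
            "2,2016-01-01 08:40:00,2016-01-01 08:55:00,2,4.0",
            "1,2016-01-02 08:00:00,2016-01-02 08:05:00,4,1.0"
        };
        // Tab-separated key/value lines, similar in shape to Hadoop's default text output.
        averagePerHour(sample).forEach((key, avg) -> System.out.println(key + "\t" + avg));
    }
}
```

Tab-separated output of this shape can be split into a day-type column and an hour column before plotting, giving one line for weekdays and one for weekends.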