# Modeling Runners' Times in the Cherry Blossom Race

Daniel Byrne, Joanna Duran

9/19/19

### Abstract
We analyze the Cherry Blossom Ten Mile Run results to assess whether the age distributions of the runners changed over the years. We compare the age distributions of the runners across the years 1999-2012, using box plots and density curves. We address two questions: "How do the distributions change over the years?" and "Was it a gradual change?"

### Introduction
Thousands of people participate in 5Ks and 10Ks every year. For most of these races, the race organizers and racing services collect a large amount of data, which is generally published on the individual race's website or on the racing service's website. These freely accessible data can give considerable insight into runners' performance over time as well as participants' demographics.

We chose to investigate the Cherry Blossom Ten Mile Run, held in Washington, D.C., every April. The race started in 1973, and results dating back to 1973 are available on the race's website, http://www.cherryblossom.org/. On inspection, the 1973 data appear to include only sex, name, time, and pace; age, division, and hometown are not available. For our analysis we focused on data from 1999 to 2012. The data are publicly available; we scraped them from the website and read them into R using RStudio.

Our analysis objective is to compare the age distributions of the runners across all 14 years (1999-2012) of the race. We sought to answer two questions: "How do the age distributions change over the years?" and "Was it a gradual change?"

### Methods

Our analysis follows these steps:

1. Scrape the data from the race website and read it into R.
2. Transform the data and compute basic descriptive statistics.
3. Examine box plots, smoothed scatter plots, and density plots of age by year.
4. Test for statistically significant differences in men's ages across years.
5. Examine the differences between the grand mean and the group (yearly) means.

#### Scrape Web
We began by reading in the results. The first six rows of the data show that the data were read in correctly (Fig. 1).

![](scrape.jpg)

Fig. 1. Initial read-in
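Beyond inspecting the first rows by eye, the combined data can also be checked programmatically. The sketch below is a minimal example under stated assumptions: `checkResults()` is a hypothetical helper, the column names are taken from the `head()` output in the Results section, the race year is assumed to be the first four characters of `Race` (e.g. "1999 10M"), and the small `toy` data frame is made up for illustration only.

```r
# Hypothetical helper: basic sanity checks on the combined results.
# Assumes the column names seen in head(mensResults) and that the first
# four characters of Race hold the race year (e.g. "1999 10M").
checkResults = function(results, years = 1999:2012) {
  expected = c("Race", "Name", "Age", "Time", "Pace", "Division", "Hometown")
  missingCols = setdiff(expected, names(results))
  if (length(missingCols) > 0) {
    stop("Missing columns: ", paste(missingCols, collapse = ", "))
  }
  raceYear = as.integer(substr(as.character(results$Race), 1, 4))
  # Count rows per year, keeping years that produced no rows at all
  rowsPerYear = table(factor(raceYear, levels = years))
  list(rowsPerYear = rowsPerYear,
       emptyYears = years[rowsPerYear == 0],
       # Ages that cannot be parsed as numbers (blanks, text) become NA
       unparsedAges = sum(is.na(suppressWarnings(as.numeric(results$Age)))))
}

# Illustrative use on made-up data (not real race results)
toy = data.frame(Race = c("1999 10M", "1999 10M", "2000 10M"),
                 Name = c("Runner A", "Runner B", "Runner C"),
                 Age = c("28", "", "31"),
                 Time = c("0:50:00", "1:10:00", "1:05:00"),
                 Pace = c("5:00", "7:00", "6:30"),
                 Division = c("M2529", "M3034", "M3034"),
                 Hometown = c("Town A", "Town B", "Town C"),
                 stringsAsFactors = FALSE)
check = checkResults(toy, years = 1999:2001)
check$emptyYears    # 2001
check$unparsedAges  # 1
```

Counting rows per year against the full list of expected years makes a missing or failed download show up as an empty year rather than silently disappearing.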

#### Basic statistics
We computed basic descriptive statistics of runner ages for each year (Fig. 2). These summaries revealed some values that warranted investigation. For example, the minimum age in the 2002 data is 4, and it is hard to believe that a four-year-old ran the ten-mile race. On inspection, the age appears to be a typo: the recorded time of 1:28:56 is a plausible finishing time for an adult runner.

![](descStats.jpg)

Fig. 2. Descriptive statistics
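Per-year age summaries like those in Fig. 2, together with a screen for implausible ages, can be produced with base R. This is a sketch, not the code behind Fig. 2: `summarizeAge()`, `ageSummaryByYear()`, and `flagImplausibleAges()` are hypothetical helpers, the age cutoffs of 10 and 90 are arbitrary assumptions, and the `toy` data are invented except for the age of 4 and time of 1:28:56 quoted above.

```r
# Hypothetical helper: count, missing values, five-number summary and mean.
summarizeAge = function(a) {
  q = quantile(a, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE, names = FALSE)
  c(n = sum(!is.na(a)), missing = sum(is.na(a)),
    min = q[1], q1 = q[2], median = q[3], mean = mean(a, na.rm = TRUE),
    q3 = q[4], max = q[5])
}

# One row per year, one column per statistic
ageSummaryByYear = function(age, year) {
  t(sapply(split(age, year), summarizeAge))
}

# Rows whose age falls outside an assumed plausible range (cutoffs are
# illustrative, not taken from the race rules)
flagImplausibleAges = function(data, minAge = 10, maxAge = 90) {
  outside = !is.na(data$Age) & (data$Age < minAge | data$Age > maxAge)
  data[outside, , drop = FALSE]
}

# Made-up example data; only the age 4 / time 1:28:56 pair comes from the text
toy = data.frame(Year = c(2002, 2002, 2002, 2003, 2003),
                 Age = c(4, 35, NA, 41, 29),
                 Time = c("1:28:56", "1:20:00", "1:15:00", "1:30:00", "1:05:00"))
ageSummaryByYear(toy$Age, toy$Year)
flagImplausibleAges(toy)  # returns the 2002 row with Age 4
```

Flagged rows are returned with all their columns, so the finishing time can be inspected alongside the suspicious age, as was done for the 2002 entry.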

#### Examine Plots
From the box plots (Fig. 3) we observed that the median age decreases gradually from 2000 until 2010.

![](boxplot.jpg)

Fig. 3. Box plots
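A plot like Fig. 3 can be drawn with base R's `boxplot()`. Because the center line of each box is the median, the sketch below also overlays each year's mean so the two can be compared. `plotAgeBoxplots()` and the toy ages are hypothetical; with the real data one would pass `mensResults$Age` and a year column derived from `Race`.

```r
# Hypothetical helper: box plots of age by year with yearly means overlaid.
plotAgeBoxplots = function(age, year) {
  yearF = factor(year)
  bp = boxplot(age ~ yearF, xlab = "Year", ylab = "Age (years)",
               main = "Runner age by year, Cherry Blossom Ten Mile Run")
  # Red diamonds mark the yearly means (NA ages ignored)
  means = tapply(age, yearF, mean, na.rm = TRUE)
  points(seq_along(means), means, pch = 18, col = "red")
  invisible(bp)
}

# Made-up ages for two years, for illustration only
toyAge = c(30, 35, 40, 45, 28, 32, 33, 50)
toyYear = rep(c(2000, 2010), each = 4)
bp = plotAgeBoxplots(toyAge, toyYear)
bp$stats[3, ]  # medians: 37.5 32.5
```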

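The smoothed scatter plot of runner age by year (Fig. 4) gives another view of how the ages are distributed within and across years.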

![](Smoother.jpg)

Fig. 4. Smoothed scatter plot
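A smoothed scatter plot such as Fig. 4 can be drawn with `smoothScatter()` from base R's graphics package, which relies on the KernSmooth package distributed with R. We assume here that the plot shows age against year; the sketch also adds a `lowess()` curve as one simple way to trace the central trend. `plotAgeSmoothScatter()` and the toy data are hypothetical.

```r
# Hypothetical helper: smoothed density scatter of age versus year with a
# lowess trend line.
plotAgeSmoothScatter = function(age, year) {
  ok = !is.na(age) & !is.na(year)
  x = year[ok]
  y = age[ok]
  smoothScatter(x, y, xlab = "Year", ylab = "Age (years)",
                main = "Smoothed scatter plot of runner age by year")
  fit = lowess(x, y)
  lines(fit, col = "red", lwd = 2)
  invisible(fit)
}

# Made-up ages for two years, for illustration only
toyAge = c(30, 35, 40, NA, 28, 32, 33, 50)
toyYear = rep(c(2000, 2010), each = 4)
fit = plotAgeSmoothScatter(toyAge, toyYear)
```

With thousands of runners per year, an ordinary scatter plot saturates into solid columns; shading by estimated point density keeps the within-year spread visible.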

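The density curves of age for each year (Fig. 5) allow the shapes of the yearly age distributions to be compared directly.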

![](density.jpg)

Fig. 5. Density plot
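Overlaid density curves like those in Fig. 5 can be produced by estimating one kernel density per year with `density()`. The sketch uses R's default Gaussian kernel and default bandwidth (`bw.nrd0`); these are assumptions, since the settings used for Fig. 5 are not stated. `plotAgeDensities()` and the toy data are hypothetical.

```r
# Hypothetical helper: one kernel density curve of age per year, overlaid.
# Uses density()'s defaults (Gaussian kernel, bw.nrd0 bandwidth).
plotAgeDensities = function(age, year) {
  ok = !is.na(age) & !is.na(year)
  dens = lapply(split(age[ok], year[ok]), density)
  # Common axis limits so every curve fits on one plot
  xr = range(sapply(dens, function(d) range(d$x)))
  yr = c(0, max(sapply(dens, function(d) max(d$y))))
  cols = hcl.colors(length(dens))
  plot(NA, xlim = xr, ylim = yr, xlab = "Age (years)", ylab = "Density",
       main = "Density of runner age by year")
  for (i in seq_along(dens)) lines(dens[[i]], col = cols[i], lwd = 2)
  legend("topright", legend = names(dens), col = cols, lwd = 2)
  invisible(dens)
}

# Made-up ages for two years, for illustration only
toyAge = c(30, 35, 40, 45, 28, 32, 33, 50)
toyYear = rep(c(2000, 2010), each = 4)
dens = plotAgeDensities(toyAge, toyYear)
names(dens)  # "2000" "2010"
```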

#### Statistical Significance
We used an analysis of variance (ANOVA) to determine whether the ages of male runners differ significantly across years (Fig. 6).

![](anova.jpg)

Fig. 6. ANOVA
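A one-way ANOVA of age on year can be fit with base R's `aov()`, treating year as a categorical factor so that each year has its own mean. This is a sketch of the standard approach, not a reproduction of Fig. 6; `ageAnova()` and the toy data are hypothetical. Under R's default `na.action` setting, rows with a missing age are dropped.

```r
# Hypothetical helper: one-way ANOVA of age with year as a factor.
ageAnova = function(age, year) {
  d = data.frame(Age = age, Year = factor(year))
  aov(Age ~ Year, data = d)
}

# Made-up ages for three years, for illustration only
toyAge = c(30, 35, 40, 45, 29, 33, 38, 44, 28, 32, 33, 50)
toyYear = rep(c(2000, 2005, 2010), each = 4)
fit = ageAnova(toyAge, toyYear)
tab = summary(fit)[[1]]
tab                    # Df, Sum Sq, Mean Sq, F value, Pr(>F)
tab[["F value"]][1]    # F statistic for the Year effect
```

Converting year to a factor matters: leaving it numeric would fit a single linear trend in age rather than comparing the yearly means.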

#### Differences between the grand mean and the group means
We compared each year's (group) mean age with the grand mean and compared the yearly means pairwise (Fig. 7).

![](pairwise.jpg)

Fig. 7. Pairwise comparison
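The comparison of group means with the grand mean, and the pairwise comparisons among years, can be sketched in base R as follows. Using Tukey's honestly significant difference (`TukeyHSD()`) for the pairwise step is our assumption, because Fig. 7 does not state which procedure was used; `pairwise.t.test()` with a p-value adjustment is a common alternative. `compareToGrandMean()`, `pairwiseYears()`, and the toy data are hypothetical.

```r
# Hypothetical helper: each year's mean age and its deviation from the
# grand mean (the mean over all runners with a known age).
compareToGrandMean = function(age, year) {
  ok = !is.na(age) & !is.na(year)
  yearF = factor(year[ok])
  grand = mean(age[ok])
  groupMeans = tapply(age[ok], yearF, mean)
  data.frame(Year = levels(yearF),
             n = as.vector(table(yearF)),
             mean = as.vector(groupMeans),
             diffFromGrand = as.vector(groupMeans) - grand)
}

# Hypothetical helper: all pairwise differences in yearly mean age, with
# Tukey HSD confidence intervals and adjusted p-values.
pairwiseYears = function(age, year) {
  d = data.frame(Age = age, Year = factor(year))
  TukeyHSD(aov(Age ~ Year, data = d), which = "Year")
}

# Made-up ages for three years, for illustration only
toyAge = c(30, 35, 40, 45, 29, 33, 38, 44, 28, 32, 33, 50)
toyYear = rep(c(2000, 2005, 2010), each = 4)
cmp = compareToGrandMean(toyAge, toyYear)
tk = pairwiseYears(toyAge, toyYear)
tk$Year  # columns diff, lwr, upr, p adj; one row per pair of years
```

Because the grand mean weights every runner equally, the deviations weighted by each year's number of runners sum to zero, so years with more runners pull the grand mean toward their own mean.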

### Results

```r
library(rvest)
library(tidyverse)
library(mosaic)
library(gmodels)
library(knitr)
library(pander)

# Read the scraped men's results for each year in the study period
# (1999-2012) from per-year CSV files and stack them into one data frame
years = 1999:2012
mensResults = do.call(rbind, lapply(years, function(year) {
  read.csv(paste0("./MensResults", year, ".csv"), stringsAsFactors = FALSE)
}))

# Age is read as text; entries that are not numbers become NA
# (R warns "NAs introduced by coercion")
mensResults$Age = as.numeric(mensResults$Age)
mensResults$Race = as.factor(mensResults$Race)
head(mensResults)
```

Warning message: NAs introduced by coercion

| X | Race | Name | Age | Time | Pace | PiS.TiS | Division | PiD.TiD | Hometown |
|---|------|------|-----|------|------|---------|----------|---------|----------|
| 1 | 1999 10M | Worku Bikila (M) | 28 | 0:46:59 | 4:42 | 1/3190 | M2529 | 1/394 | Ethiopia |
| 2 | 1999 10M | Lazarus Nyakeraka (M) | 24 | 0:47:01 | 4:42 | 2/3190 | M2024 | 1/85 | Kenya |
| 3 | 1999 10M | James Kariuki (M) | 27 | 0:47:03 | 4:42 | 3/3190 | M2529 | 2/394 | Kenya |
| 4 | 1999 10M | William Kiptum (M) | 28 | 0:47:07 | 4:43 | 4/3190 | M2529 | 3/394 | Kenya |
| 5 | 1999 10M | Joseph Kimani (M) | 26 | 0:47:31 | 4:45 | 5/3190 | M2529 | 4/394 | Kenya |
| 6 | 1999 10M | Josphat Machuka (M) | 25 | 0:47:33 | 4:45 | 6/3190 | M2529 | 5/394 | Kenya |


### Conclusion