## Import Modules

Let's import what we need to parse the Marathon Guide site results.

In [20]:
import requests
from bs4 import BeautifulSoup
import re
import pandas as pd

## Run A Query for the Boston Marathon

Now let's search for "Boston" to see if we can find the Boston Marathon.

In [2]:
url = "http://www.marathonguide.com/races/search.cfm"
searchParams = {
    "MarName":"Boston",
    "state":0,
    "Country":0,
    "BeginDate":"",
    "EndDate":"",
    "submit":"Search"
}
res = requests.post(url,data=searchParams)

## Parse Marathon Search with BeautifulSoup

We can read the response content with BeautifulSoup. Each search result links to its own race page, which is identified by an ID labelled "MIDD" in the link's query string. We can peel off the MIDD for each race in the search results by applying the regular expression `MIDD\s*=\s*([\S\s]+)` to the link, where

* `MIDD` matches the literal text "MIDD"
* `\s*` matches any whitespace, if present
* `=` matches the "=" sign
* `([\S\s]+)` captures every remaining character, whitespace or not, as the ID.
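
To see what this pattern captures, here is a minimal sketch run against a made-up link. The `href` values are assumptions for illustration, not copied from the site. Because `[\S\s]+` is greedy and matches any character, it would also swallow any query parameters that follow the ID, so the sketch includes a tighter `(\d+)` variant that assumes the ID is purely numeric.

```python
import re

# Hypothetical href for illustration; real links on the site may differ.
href = "/races/racedetails.cfm?MIDD=15230417"

# The pattern from the text: captures everything after "MIDD=".
midd = re.search(r"MIDD\s*=\s*([\S\s]+)", href).group(1)

# If another parameter followed the ID, the greedy group would capture it too.
href_extra = href + "&foo=bar"  # "foo=bar" is a made-up extra parameter
greedy = re.search(r"MIDD\s*=\s*([\S\s]+)", href_extra).group(1)

# Tighter variant, assuming the ID consists only of digits.
digits_only = re.search(r"MIDD\s*=\s*(\d+)", href_extra).group(1)

print(midd)         # 15230417
print(greedy)       # 15230417&foo=bar
print(digits_only)  # 15230417
```

The original pattern is fine as long as MIDD is the last parameter in the link; the digit-only variant is simply more defensive.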

In [12]:
# Read in the results of the query with BeautifulSoup
soup = BeautifulSoup(res.content, "html.parser")

# Loop through all <a> tags and find the one whose text is exactly "Boston Marathon"
# TODO: Ignore any <strike>'ed results
for a in soup.find_all("a"):
    if a.text == "Boston Marathon":
        link = a.get_attribute_list('href')
        # Raw string so the regex escapes are not read as string escapes
        midd = re.search(r'MIDD\s*=\s*([\S\s]+)', link[0]).group(1)
        race = a.text

In [14]:
url = 'http://www.marathonguide.com/races/racedetails.cfm'
params = {
    "MIDD": midd
}
res = requests.get(url,params)

b'\r\n\r\n<html><head>\r\n<title>Boston Marathon - Race Details</title> <meta name="description" content="Boston Marathon Information by MarathonGuide.com - the complete marathon resource and community.  Complete directory of marathons, marathon results, athlete and race news, marathon history, training schedules, chat, email, marathoning humor - everything for the marathon runner and marathon fan.">\r\n<meta name="keywords" content=" Boston Marathon, 2024 boston marathon, marathons races, marathoning, running marathons, course maps, elevation charts, training coaches, results races, records marathon guide directory, history of the marathon, runners racing, international marathoning, training schedules, running calculators, fitness health and nutrition calculators, racing, olympics marathons, olympics runners, running quotes, community chat, email runnng, training programs, athletes, marthon, marthons, history, joan benoit, tegla loroupe, john elliott, marthon directory, schedule, info

In [40]:
years = [str(i) for i in range(2000,2024)]
soup = BeautifulSoup(res.content, "html.parser")

# Collect one row per link whose text is a year in the range above
rows = []
for a in soup.find_all("a"):
    if a.text in years:
        link = a.get_attribute_list('href')
        midd = re.search(r'MIDD\s*=\s*([\S\s]+)', link[0]).group(1)
        rows.append({"midd": midd, "year": a.text})

# Build the table in one step, then order it chronologically
df = pd.DataFrame(rows).sort_values("year").reset_index(drop=True)
df.head()

|   | midd     | year |
|---|----------|------|
| 0 | 15000417 | 2000 |
| 1 | 15010416 | 2001 |
| 2 | 15020415 | 2002 |
| 3 | 15030421 | 2003 |
| 4 | 15040419 | 2004 |
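
The link-filtering logic can also be tried offline. The sketch below is a stand-in, not the notebook's method: it uses the standard library's `HTMLParser` instead of BeautifulSoup so that it runs without third-party packages or network access. The sample HTML and the `YearLinkParser` class are hypothetical, and the `(\d+)` pattern assumes numeric IDs.

```python
from html.parser import HTMLParser
import re

# Made-up HTML standing in for the race details page.
sample_html = """
<a href="/results/browse.cfm?MIDD=15010416">2001</a>
<a href="/results/browse.cfm?MIDD=15000417">2000</a>
<a href="/about.cfm">About</a>
"""

class YearLinkParser(HTMLParser):
    """Collect midd/year rows from <a> tags whose text is a year."""

    def __init__(self, years):
        super().__init__()
        self.years = set(years)
        self.href = None  # href of the <a> tag currently open, if any
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")

    def handle_data(self, data):
        text = data.strip()
        if self.href and text in self.years:
            match = re.search(r"MIDD\s*=\s*(\d+)", self.href)
            if match:
                self.rows.append({"midd": match.group(1), "year": text})

    def handle_endtag(self, tag):
        if tag == "a":
            self.href = None

parser = YearLinkParser(str(y) for y in range(2000, 2024))
parser.feed(sample_html)

# Sort chronologically, as the DataFrame version does with sort_values.
rows = sorted(parser.rows, key=lambda r: r["year"])
print(rows)
```

Links whose text is not a year, such as the "About" link in the sample, are skipped.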


## Get the Page for Results of the Most Recent Marathon

Now we get the results page for a particular event: the most recent Boston Marathon in our table. We'll parse some summary figures from the page: the total number of finishers, and the number of finishers registered as male and as female.

In [45]:
year = df["year"].to_list()[-1]
midd = df["midd"].to_list()[-1]

url = "http://www.marathonguide.com/results/browse.cfm"
params = {
    "MIDD": midd
}
res = requests.get(url,params)

In [55]:
# Each count follows a fixed label in the page text
nFinishers = int(res.text.split("Finishers: ")[1].split(",")[0])
nMales = int(res.text.split("Males - ")[1].split(",")[0])
nFemales = int(res.text.split("Females - ")[1].split("\n")[0])
print(nFinishers, nMales, nFemales)

26600 15171 11405
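
The chained `split` calls depend on the exact punctuation around each number. A slightly more tolerant sketch uses one regular expression per field. The sample text is an assumption reconstructed from the split delimiters and the output above, not copied from the live page, and `parse_count` is a hypothetical helper.

```python
import re

# Assumed shape of the summary text, inferred from the delimiters used above.
sample = "Finishers: 26600, Males - 15171, Females - 11405\n"

def parse_count(label, text):
    """Return the integer that follows `label` in `text`, or None if absent."""
    match = re.search(re.escape(label) + r"\s*(\d+)", text)
    return int(match.group(1)) if match else None

n_finishers = parse_count("Finishers:", sample)
n_males = parse_count("Males -", sample)
n_females = parse_count("Females -", sample)
print(n_finishers, n_males, n_females)  # 26600 15171 11405
```

Returning `None` for a missing label makes a layout change on the page easier to spot than an `IndexError` from `split`.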


## Parse Event Results

The results page requires users to submit a form before it displays the data, so let's reproduce that form submission to scrape the results for an event.

TODO: [Need to handle referrer](https://stackoverflow.com/questions/23303120/requests-interacting-strangely-with-redirect)
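
One possible way to approach the referrer, sketched with the standard library so it can be inspected without sending anything: build the POST request with an explicit `Referer` header. The referer URL below is an assumption (that the form lives on the race's browse page), and whether the site actually checks it is untested here. `requests.post` likewise accepts a `headers` dict.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the form is served from the browse page for this race.
referer = "http://www.marathonguide.com/results/browse.cfm?MIDD=15230417"
form = urlencode({"MIDD": 15230417, "SubmitButton": "View"}).encode("ascii")

# Build (but do not send) a POST request carrying a Referer header.
req = Request(
    "http://www.marathonguide.com/results/makelinks.cfm",
    data=form,
    headers={"Referer": referer},
)
print(req.get_method())           # POST
print(req.get_header("Referer"))  # the referer URL set above
```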

In [80]:
formData = {
    "RaceRange": ["B,1,100,26600","",""],
    # requests form-encodes values itself, so use plain spaces rather than "+"
    "RaceRange_Required": "You must make a selection before viewing results.",
    "MIDD": 15230417,
    "SubmitButton": "View"
}
url = "http://www.marathonguide.com/results/makelinks.cfm"
res = requests.post(url,formData)
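
When `requests.post` receives a dict as form data, it form-encodes the values itself. The sketch below uses the standard library's `urllib.parse.urlencode`, which applies the same kind of encoding, to show what happens to spaces, to literal `+` characters, and to list values. It is an illustration of the encoding rules, not a trace of what `requests` sends on the wire.

```python
from urllib.parse import urlencode

# Values as a person would type them (spaces) versus already encoded ("+").
plain = urlencode({"RaceRange_Required": "You must make a selection"})
pre_encoded = urlencode({"RaceRange_Required": "You+must+make+a+selection"})

print(plain)        # RaceRange_Required=You+must+make+a+selection
print(pre_encoded)  # RaceRange_Required=You%2Bmust%2Bmake%2Ba%2Bselection

# A list value is sent as repeated fields when doseq=True.
multi = urlencode({"RaceRange": ["B,1,100,26600", "", ""]}, doseq=True)
print(multi)        # RaceRange=B%2C1%2C100%2C26600&RaceRange=&RaceRange=
```

Values copied from a browser's network inspector are often shown already encoded, so they may need decoding before being placed in the dict.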

In [84]:
# Each query parameter must be separated by "&"
url = "http://www.marathonguide.com/results/browse.cfm?MIDD=15230417&Gen=B&Begin=1&End=100&Max=26600"
res = requests.get(url)
BeautifulSoup(res.content, "html.parser")


<html>
<head><script type="text/javascript">/* <![CDATA[ */_cf_loadingtexthtml="<img alt=' ' src='/CFIDE/scripts/ajax/resources/cf/images/loading.gif'/>";
_cf_contextpath="";
_cf_ajaxscriptsrc="/CFIDE/scripts/ajax";
_cf_jsonprefix='//';
_cf_websocket_port=8577;
_cf_flash_policy_port=1243;
_cf_clientid='0B56D56BB73B0D6813C46356B23E7C13';/* ]]> */</script><script src="/CFIDE/scripts/ajax/messages/cfmessage.js" type="text/javascript"></script>
<script src="/CFIDE/scripts/ajax/package/cfajax.js" type="text/javascript"></script>
<script src="/CFIDE/scripts/ajax/smp/swfobject.js" type="text/javascript"></script>
<script src="/CFIDE/scripts/ajax/jquery/jquery.js" type="text/javascript"></script>
<script src="/CFIDE/scripts/chart/cfchart-lite.js" type="text/javascript"></script>
<script src="/CFIDE/scripts/chart/cfchart-html.js" type="text/javascript"></script>
<script src="/CFIDE/scripts/chart/cfchart.js" type="text/javascript"></script>
<script src="/CFIDE/scripts/chart/license.js" type="te