# Scraping race information from roots and rain for each year of Downhill world cup

This document works through the web scrapping of data from roots and rain for each downhill world cup

In [1]:
#loading libraries
import re
import pandas as pd
from bs4 import BeautifulSoup # this module helps in web scrapping.
import requests  # this module helps us to download a web page
import numpy as np
import traceback
from fuzzywuzzy import process
import fuzzywuzzy as fuzz

## Creating a list of the landing pages for the races from each year

Some of the important metadata about the races is only presnt on these landing pages (or it is the only place I can find it) so we will use these to extract race locaitons and also the url for every race results page

In [2]:
years=list(range(1996,2023))
url_list=['https://www.rootsandrain.com/event-list/filters/1995,dh/organisers21/']
modify_url='https://www.rootsandrain.com/event-list/filters/1995,dh/organisers21/'
for y in years:
    new_url=modify_url.replace(str(y-1),str(y))
    url_list.append(new_url)
    modify_url=new_url
   

## Extracting data from each of these web pages

Now that we have a list of the urls for each year of downhill racing we want to pull data, inluding the url for the individual race pages from each of these pages. To do this we will define a function to scrape this data, then we will iterate through the list of urls

In [3]:
## this function scrapes race info and urls from the landing page for one year of UCI races
def scrape_race_info(url):

    one_year_races=pd.read_html(url)[0]
    one_year_races.index=one_year_races['Event']
    # this should contain the basic information for one yaer of DH races
    one_year_races.head(5)

    page_2  = requests.get(url).text 
    one_year_races['url']=""


    for race_name in one_year_races['Event']:
        # this line is included because of an odly formatted masters world champs in 2021. 
        #I am not iterested in these data so excludign here
        if str(race_name.lower()).find('masters') ==-1:
            if (race_name.lower()).find('world champ') ==-1:
                loc_race=str(page_2).find(race_name)
                string_with_url=str(page_2)[loc_race:loc_race+250]
                begin_index=string_with_url.index('href=')+6
                end_index=string_with_url.index("/\">")+1
                new_url='https://www.rootsandrain.com'+string_with_url[begin_index:end_index]
                one_year_races.loc[race_name,"url"]=new_url
            else:
                loc_race=str(page_2).find(race_name)
                string_with_url=str(page_2)[loc_race-250:loc_race+2]
                begin_index=string_with_url.index('href=')+6
                end_index=string_with_url.index("/\">")+1
                new_url='https://www.rootsandrain.com'+string_with_url[begin_index:end_index]
                one_year_races.loc[race_name,"url"]=new_url
    
    return(one_year_races)

### Interating through the list of urls we generated

In [4]:
all_races=pd.DataFrame()
for url in url_list:
    all_races=pd.concat([all_races,scrape_race_info(url)],axis=0)

how many races do we have?

In [5]:
all_races.shape

(244, 5)

Drop races without a url - these are from masters world champs which I am not interested in

In [6]:
all_races['url'].replace('', np.nan, inplace=True)
all_races.dropna(subset=['url'], inplace=True)
all_races.shape

(220, 5)

### pulling the country and track name from the venue

The data on these pages has information inclduing the race venue, date, and country the race was in that we want to pull out and use later on.

In [7]:
expanded_venue_info=pd.DataFrame(all_races['Venue'].str.split(',', 3, expand=True))

all_races['track_name']=expanded_venue_info[0]

race_country=[]
for i in range(220):
        if expanded_venue_info.iloc[i,2] == None and expanded_venue_info.iloc[i,3] == None:
            race_country.append(expanded_venue_info.iloc[i,1])
        elif expanded_venue_info.iloc[i,2] != None and expanded_venue_info.iloc[i,3] == None:
            race_country.append(expanded_venue_info.iloc[i,2])
        else:
            race_country.append(expanded_venue_info.iloc[i,3])
all_races['race_country']=race_country

In [111]:
#all_races.to_csv('data/race_information.csv')  

What does the dataset look like right now?

In [8]:
all_races.head()

Unnamed: 0_level_0,Date⇩,Event,Venue,Competitors,url,track_name,race_country
Event,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
UCI - DHI World Championships '95,23rd Sep 1995,UCI - DHI World Championships '95,"Kirchzarten, Germany",240,https://www.rootsandrain.com/event1459/1995-se...,Kirchzarten,Germany
1995 Grundig UCI World Cup DH #6,1st Jan 1995,1995 Grundig UCI World Cup DH #6,"Kaprun, Austria",-,https://www.rootsandrain.com/event9541/1995-ja...,Kaprun,Austria
1995 Grundig UCI World Cup DH #5,1st Jan 1995,1995 Grundig UCI World Cup DH #5,"Big Bear Lake, CA, USA",-,https://www.rootsandrain.com/event9540/1995-ja...,Big Bear Lake,USA
1995 Grundig UCI World Cup DH #4,1st Jan 1995,1995 Grundig UCI World Cup DH #4,"Mont-Sainte-Anne, QC, Canada",-,https://www.rootsandrain.com/event9539/1995-ja...,Mont-Sainte-Anne,Canada
1995 Grundig UCI World Cup DH #3,1st Jan 1995,1995 Grundig UCI World Cup DH #3,"Mt Snow, VT, USA",-,https://www.rootsandrain.com/event9538/1995-ja...,Mt Snow,USA


## Now we are going to use the urls for each race to access the results

Frist we need to define a couple of funcitons that were are going to use too pull out the data. First is a function to determine the country a rider is from. This information is on the webpage for the race results as is a part of the html code used to display the litte flags. The second function take the url of a race result, and the track name, race country and race date. Then it accesses the webpage for the race result and scapres the the data that we want. 

In [9]:
## this code is struggling with about 8 percent of the riders where it can't parse their names close enough to find a match
## adding code for when this fails to try fuzzy matching which is more computationally intensive
countries=pd.read_csv('data/nationalities.csv') 

def country_of_origin(rider_name,soup):
    name_parts=rider_name.split()
    if len(name_parts)>=3 and name_parts[2]!="(elt)":
        reverse_name=name_parts[-2]+" "+name_parts[-1]+" "+name_parts[0]
    else:    
        reverse_name=name_parts[1]+" "+name_parts[0]
    if str(soup).find(reverse_name)==-1:
        split_soup=str(soup).split('<td data-sb')
        fuzzy_match_of_name=process.extract(reverse_name,split_soup,limit=1)
        for nationality in countries['nationality']:
            if str(fuzzy_match_of_name).find(nationality)>0:
                country=nationality
    else:
        loc_name=str(soup).find(reverse_name)
        string_with_country=str(soup)[loc_name:loc_name+150]
        string_with_country_no_punc=re.sub(r'[^\w\s]', ' ', string_with_country).split()
        index=string_with_country_no_punc.index("title")+1
        country=string_with_country_no_punc[index]
        if string_with_country_no_punc[index+1]!="click":
            country=country+" "+string_with_country_no_punc[index+1]
    ## The first part looks for exact matches of the riders name in the html code. If this fails 
    ## then we try looking for a fuzzy match of the rider name, and then looking for a country in the text string
    ## the list of countries was generated from earily runs where we didn't get all of the riders. 
    return(country)

def get_race_results(url,track_name,race_country,date):

    data  = requests.get(url).text 

    soup = BeautifulSoup(data,"html.parser")
  
    tables = pd.read_html(url)
    
    ## confirm table 1 is elite men
    table_soup=soup.find_all('table')
    if str(table_soup[0])[0:100].find('c-elitem') != -1:
        elite_men=tables[0]
        elite_men['sex']='Men'
        together=elite_men
    # check if table 2 is elite women
    if len(tables)==2:
        if str(table_soup[1])[0:100].find('c-elitef') != -1:
                elite_women=tables[1]
                elite_women['sex']='Women'
                together=pd.concat([elite_men,elite_women],axis=0)
    # check if table 3 is elite women
    if len(tables)>2:
        if str(table_soup[2])[0:100].find('c-elitef') != -1:
                    elite_women=tables[2]
                    elite_women['sex']='Women'
                    together=pd.concat([elite_men,elite_women],axis=0)
    
    

    
    together['rider_country']=""
    together.index=together['Name']
    together['Date']=date
    together['track_name']=track_name
    together['race_country']=race_country


    i=0
    for rider in together['Name']:
        try:
            together.loc[rider,'rider_country']=country_of_origin(rider,soup)
        except:
           print('could not find match for name: rider')
    
    return(together[['Name','Run 1','sex','rider_country','race_country','Date',"track_name"]])

Now we use these functions that we defined to work through every row of the all_races data, access the url and extract the data that we want. 

In [10]:
all_race_data=pd.DataFrame()
#220

for i in range(220):
    try:
        one_race=get_race_results(all_races['url'][i],all_races['track_name'][i],all_races['race_country'][i], all_races['Date⇩'][i])
        all_race_data=pd.concat([all_race_data,one_race],axis=0)
        print("SUCCESS","Race at", all_races['track_name'][i], "on", all_races['Date⇩'][i])
    except:
        print("PROBLEM: Race at", all_races['track_name'][i], "on", all_races['Date⇩'][i])
        traceback.print_exc()


could not find match for name: rider
SUCCESS Race at Kirchzarten on 23rd Sep 1995
PROBLEM: Race at Kaprun on 1st Jan 1995


Traceback (most recent call last):
  File "C:\Users\nt78066\AppData\Local\Temp\ipykernel_11596\2198857620.py", line 6, in <module>
    one_race=get_race_results(all_races['url'][i],all_races['track_name'][i],all_races['race_country'][i], all_races['Date⇩'][i])
  File "C:\Users\nt78066\AppData\Local\Temp\ipykernel_11596\590882901.py", line 36, in get_race_results
    tables = pd.read_html(url)
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 1113, in read_html
    return _parse(
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 939, in _parse
    raise retained
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 919, in _parse
    tables = p.parse_tables()
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 239, in parse_tables
    tables = self.

PROBLEM: Race at Big Bear Lake on 1st Jan 1995


Traceback (most recent call last):
  File "C:\Users\nt78066\AppData\Local\Temp\ipykernel_11596\2198857620.py", line 6, in <module>
    one_race=get_race_results(all_races['url'][i],all_races['track_name'][i],all_races['race_country'][i], all_races['Date⇩'][i])
  File "C:\Users\nt78066\AppData\Local\Temp\ipykernel_11596\590882901.py", line 36, in get_race_results
    tables = pd.read_html(url)
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 1113, in read_html
    return _parse(
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 939, in _parse
    raise retained
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 919, in _parse
    tables = p.parse_tables()
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 239, in parse_tables
    tables = self.

PROBLEM: Race at Mont-Sainte-Anne on 1st Jan 1995


Traceback (most recent call last):
  File "C:\Users\nt78066\AppData\Local\Temp\ipykernel_11596\2198857620.py", line 6, in <module>
    one_race=get_race_results(all_races['url'][i],all_races['track_name'][i],all_races['race_country'][i], all_races['Date⇩'][i])
  File "C:\Users\nt78066\AppData\Local\Temp\ipykernel_11596\590882901.py", line 36, in get_race_results
    tables = pd.read_html(url)
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 1113, in read_html
    return _parse(
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 939, in _parse
    raise retained
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 919, in _parse
    tables = p.parse_tables()
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 239, in parse_tables
    tables = self.

PROBLEM: Race at Mt Snow on 1st Jan 1995


Traceback (most recent call last):
  File "C:\Users\nt78066\AppData\Local\Temp\ipykernel_11596\2198857620.py", line 6, in <module>
    one_race=get_race_results(all_races['url'][i],all_races['track_name'][i],all_races['race_country'][i], all_races['Date⇩'][i])
  File "C:\Users\nt78066\AppData\Local\Temp\ipykernel_11596\590882901.py", line 36, in get_race_results
    tables = pd.read_html(url)
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 1113, in read_html
    return _parse(
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 939, in _parse
    raise retained
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 919, in _parse
    tables = p.parse_tables()
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 239, in parse_tables
    tables = self.

PROBLEM: Race at Åre on 1st Jan 1995


Traceback (most recent call last):
  File "C:\Users\nt78066\AppData\Local\Temp\ipykernel_11596\2198857620.py", line 6, in <module>
    one_race=get_race_results(all_races['url'][i],all_races['track_name'][i],all_races['race_country'][i], all_races['Date⇩'][i])
  File "C:\Users\nt78066\AppData\Local\Temp\ipykernel_11596\590882901.py", line 36, in get_race_results
    tables = pd.read_html(url)
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 1113, in read_html
    return _parse(
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 939, in _parse
    raise retained
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 919, in _parse
    tables = p.parse_tables()
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 239, in parse_tables
    tables = self.

PROBLEM: Race at Cap d'Ail on 1st Jan 1995


Traceback (most recent call last):
  File "C:\Users\nt78066\AppData\Local\Temp\ipykernel_11596\2198857620.py", line 6, in <module>
    one_race=get_race_results(all_races['url'][i],all_races['track_name'][i],all_races['race_country'][i], all_races['Date⇩'][i])
  File "C:\Users\nt78066\AppData\Local\Temp\ipykernel_11596\590882901.py", line 36, in get_race_results
    tables = pd.read_html(url)
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 1113, in read_html
    return _parse(
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 939, in _parse
    raise retained
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 919, in _parse
    tables = p.parse_tables()
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 239, in parse_tables
    tables = self.

SUCCESS Race at Cairns on 22nd Sep 1996
SUCCESS Race at Hawaii on 7th Sep 1996
SUCCESS Race at Kaprun on 15th Aug 1996
SUCCESS Race at Les Gets on 10th Aug 1996
SUCCESS Race at Mont-Sainte-Anne on 15th Jun 1996
SUCCESS Race at Nevegal on 19th May 1996
SUCCESS Race at Panticosa on 12th May 1996
SUCCESS Race at Château d'Oex on 21st Sep 1997
SUCCESS Race at Kaprun on 15th Aug 1997
SUCCESS Race at Massanutten on 6th Jul 1997
SUCCESS Race at Mont-Sainte-Anne on 29th Jun 1997
could not find match for name: rider
SUCCESS Race at Sierra Nevada on 1st Jun 1997
SUCCESS Race at Nevegal on 25th May 1997
SUCCESS Race at Stellenbosch on 18th May 1997
SUCCESS Race at Mont-Sainte-Anne on 20th Sep 1998
SUCCESS Race at Arai Mountain on 30th Aug 1998
SUCCESS Race at Kaprun on 16th Aug 1998
SUCCESS Race at Sierra Nevada on 9th Aug 1998
SUCCESS Race at Snoqualmie Pass on 28th Jul 1998
could not find match for name: rider
SUCCESS Race at Big Bear Lake on 21st Jun 1998
could not find match for name: rider
S

Traceback (most recent call last):
  File "C:\Users\nt78066\AppData\Local\Temp\ipykernel_11596\2198857620.py", line 6, in <module>
    one_race=get_race_results(all_races['url'][i],all_races['track_name'][i],all_races['race_country'][i], all_races['Date⇩'][i])
  File "C:\Users\nt78066\AppData\Local\Temp\ipykernel_11596\590882901.py", line 36, in get_race_results
    tables = pd.read_html(url)
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 1113, in read_html
    return _parse(
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 939, in _parse
    raise retained
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 919, in _parse
    tables = p.parse_tables()
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 239, in parse_tables
    tables = self.

SUCCESS Race at Mont-Sainte-Anne on 30th Jun 2002
SUCCESS Race at Maribor on 9th Jun 2002
SUCCESS Race at Fort William on 2nd Jun 2002
could not find match for name: rider
SUCCESS Race at Kaprun on 14th Sep 2003
SUCCESS Race at Lugano on 7th Sep 2003
SUCCESS Race at Grouse Mountain on 13th Jul 2003
PROBLEM: Race at Telluride on 5th Jul 2003


Traceback (most recent call last):
  File "C:\Users\nt78066\AppData\Local\Temp\ipykernel_11596\2198857620.py", line 6, in <module>
    one_race=get_race_results(all_races['url'][i],all_races['track_name'][i],all_races['race_country'][i], all_races['Date⇩'][i])
  File "C:\Users\nt78066\AppData\Local\Temp\ipykernel_11596\590882901.py", line 36, in get_race_results
    tables = pd.read_html(url)
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 1113, in read_html
    return _parse(
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 939, in _parse
    raise retained
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 919, in _parse
    tables = p.parse_tables()
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 239, in parse_tables
    tables = self.

SUCCESS Race at Mont-Sainte-Anne on 29th Jun 2003
could not find match for name: rider
could not find match for name: rider
SUCCESS Race at Alpe d'Huez on 8th Jun 2003
could not find match for name: rider
could not find match for name: rider
SUCCESS Race at Fort William on 1st Jun 2003
could not find match for name: rider
SUCCESS Race at Livigno on 19th Sep 2004
SUCCESS Race at Les Gets on 11th Sep 2004
could not find match for name: rider
SUCCESS Race at Calgary on 4th Jul 2004
SUCCESS Race at Mont-Sainte-Anne on 27th Jun 2004
could not find match for name: rider
SUCCESS Race at Schladming on 20th Jun 2004
could not find match for name: rider
SUCCESS Race at Les Deux Alpes on 13th Jun 2004
could not find match for name: rider
could not find match for name: rider
could not find match for name: rider
SUCCESS Race at Fort William on 6th Jun 2004
could not find match for name: rider
SUCCESS Race at Fort William on 11th Sep 2005
SUCCESS Race at Livigno on 3rd Sep 2005
SUCCESS Race at Pila 

Traceback (most recent call last):
  File "C:\Users\nt78066\AppData\Local\Temp\ipykernel_11596\2198857620.py", line 6, in <module>
    one_race=get_race_results(all_races['url'][i],all_races['track_name'][i],all_races['race_country'][i], all_races['Date⇩'][i])
  File "C:\Users\nt78066\AppData\Local\Temp\ipykernel_11596\590882901.py", line 36, in get_race_results
    tables = pd.read_html(url)
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 1113, in read_html
    return _parse(
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 939, in _parse
    raise retained
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 919, in _parse
    tables = p.parse_tables()
  File "C:\Users\nt78066\anaconda3\lib\site-packages\pandas\io\html.py", line 239, in parse_tables
    tables = self.

SUCCESS Race at Leogang on 15th Jun 2014
SUCCESS Race at Fort William on 8th Jun 2014
SUCCESS Race at Cairns on 27th Apr 2014
SUCCESS Race at Pietermaritzburg on 12th Apr 2014
SUCCESS Race at Vallnord on 6th Sep 2015
PROBLEM: Race at Innsbruck on 29th Aug 2015


Traceback (most recent call last):
  File "C:\Users\nt78066\AppData\Local\Temp\ipykernel_11596\2198857620.py", line 6, in <module>
    one_race=get_race_results(all_races['url'][i],all_races['track_name'][i],all_races['race_country'][i], all_races['Date⇩'][i])
  File "C:\Users\nt78066\AppData\Local\Temp\ipykernel_11596\590882901.py", line 60, in get_race_results
    together['rider_country']=""
UnboundLocalError: local variable 'together' referenced before assignment


could not find match for name: rider
SUCCESS Race at Val di Sole on 23rd Aug 2015
could not find match for name: rider
SUCCESS Race at Windham on 8th Aug 2015
could not find match for name: rider
SUCCESS Race at Mont-Sainte-Anne on 1st Aug 2015
could not find match for name: rider
SUCCESS Race at Lenzerheide on 5th Jul 2015
SUCCESS Race at Leogang on 14th Jun 2015
SUCCESS Race at Fort William on 7th Jun 2015
could not find match for name: rider
SUCCESS Race at Lourdes on 12th Apr 2015
SUCCESS Race at Val di Sole on 11th Sep 2016
could not find match for name: rider
could not find match for name: rider
could not find match for name: rider
SUCCESS Race at Vallnord on 4th Sep 2016
could not find match for name: rider
could not find match for name: rider
SUCCESS Race at Mont-Sainte-Anne on 6th Aug 2016
could not find match for name: rider
SUCCESS Race at Lenzerheide on 9th Jul 2016
could not find match for name: rider
SUCCESS Race at Leogang on 12th Jun 2016
could not find match for name: 

Traceback (most recent call last):
  File "C:\Users\nt78066\AppData\Local\Temp\ipykernel_11596\2198857620.py", line 6, in <module>
    one_race=get_race_results(all_races['url'][i],all_races['track_name'][i],all_races['race_country'][i], all_races['Date⇩'][i])
  File "C:\Users\nt78066\AppData\Local\Temp\ipykernel_11596\590882901.py", line 60, in get_race_results
    together['rider_country']=""
UnboundLocalError: local variable 'together' referenced before assignment


could not find match for name: rider
could not find match for name: rider
could not find match for name: rider
SUCCESS Race at Lourdes on 30th Apr 2017
could not find match for name: rider
SUCCESS Race at Lenzerheide on 9th Sep 2018
could not find match for name: rider
SUCCESS Race at La Bresse on 25th Aug 2018
could not find match for name: rider
SUCCESS Race at Mont-Sainte-Anne on 11th Aug 2018
could not find match for name: rider
SUCCESS Race at Vallnord on 15th Jul 2018
could not find match for name: rider
SUCCESS Race at Val di Sole on 7th Jul 2018
could not find match for name: rider
SUCCESS Race at Leogang on 10th Jun 2018
could not find match for name: rider
SUCCESS Race at Fort William on 3rd Jun 2018
could not find match for name: rider
SUCCESS Race at Lošinj on 22nd Apr 2018
could not find match for name: rider
SUCCESS Race at Snowshoe on 7th Sep 2019
SUCCESS Race at Mont-Sainte-Anne on 1st Sep 2019
could not find match for name: rider
SUCCESS Race at Lenzerheide on 10th Aug

How many results did we get?

In [190]:
all_race_data.shape

(32983, 7)

Exporting just the countries - this gets used to help find rider countries in the above script

In [253]:
countries=pd.DataFrame(all_race_data['rider_country'].unique())
countries.to_csv('data/nationalities.csv') 

exporting file

In [11]:
all_race_data.to_csv('data/race_results.csv') 