## Goal

Scrape the PGA Tour's public-facing website to create CSVs. This notebook covers only Driving Distance (stat ID 101) across all years and tournaments. The end goal is to load these CSVs into a relational database that can be queried for basic analysis of golfers on the PGA Tour and for comparison against the historical record.
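
As a rough sketch of that end goal, a scraped CSV could be loaded into SQLite using only the standard library. The function `load_csv_into_sqlite`, the table name `driving_distance`, and the sample rows are assumptions made for illustration; the real columns depend on what the scraper writes.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(conn, table, csv_file):
    """Create `table` from the CSV header row and insert every data row as text."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Quote identifiers because scraped headers contain spaces and dots (e.g. "AVG.")
    cols = ", ".join('"{}"'.format(h.replace('"', '""')) for h in header)
    marks = ", ".join("?" for _ in header)
    conn.execute('CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table, cols))
    # Skip malformed rows whose length does not match the header
    rows = [row for row in reader if len(row) == len(header)]
    conn.executemany('INSERT INTO "{}" VALUES ({})'.format(table, marks), rows)
    conn.commit()
    return len(rows)


# Hypothetical sample standing in for one of the scraped CSV files
sample = io.StringIO(
    "PLAYER NAME,AVG.,YEAR,Tournament\n"
    "Player A,274.3,1980,Pensacola Open\n"
    "Player B,269.8,1980,Pensacola Open\n"
)
conn = sqlite3.connect(":memory:")
inserted = load_csv_into_sqlite(conn, "driving_distance", sample)
longest = conn.execute(
    'SELECT "PLAYER NAME", MAX(CAST("AVG." AS REAL)) FROM driving_distance'
).fetchone()
print(inserted, longest)
```

Values are stored as text here, so numeric comparisons need an explicit `CAST`; declaring column types up front would be the natural next step once the schema is settled.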

In [1]:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd
import re
import os.path

In [2]:
# Statistic to scrape

stat_cat = 'Off the tee'
stat_id = '101'

In [3]:
# Import tournament list
df_tourney = pd.read_csv('../data/stat_id_{stat_id}_tournaments.csv'.format(stat_id=stat_id))
df_tourney.head()

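| Unnamed: 0 | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 | 1986 | 1987 | 1988 | 1989 | ... | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | t044 | t044 | t045 | t044 | t044 | t001 | t057 | t495 | t060 | t060 | ... | t045 | t045 | t060 | t060 | t060 | t060 | t060 | t060 | t060 | t003 |
| 1 | t040 | t040 | t044 | t045 | t045 | t044 | t001 | t057 | t057 | t495 | ... | t493 | t493 | t028 | t028 | t028 | t028 | t028 | t028 | t028 | t004 |
| 2 | t031 | t041 | t042 | t040 | t040 | t045 | t041 | t060 | t001 | t045 | ... | t464 | t464 | t505 | t505 | t505 | t505 | t505 | t505 | t027 | t002 |
| 3 | t041 | t039 | t041 | t041 | t041 | t040 | t045 | t001 | t045 | t041 | ... | t047 | t047 | t027 | t027 | t027 | t027 | t027 | t027 | t013 | t006 |
| 4 | t039 | t042 | t040 | t042 | t042 | t041 | t044 | t045 | t495 | t057 | ... | t060 | t060 | t013 | t013 | t013 | t013 | t013 | t013 | t472 | t016 |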
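
Because seasons differ in length, shorter year columns are presumably padded with missing values, which the scraping loop would then treat as tournament IDs. A minimal sketch of a filter follows, assuming IDs always look like `t` followed by digits as in the values shown above. `valid_tournament_ids` is a hypothetical helper, not part of the original notebook.

```python
import re

# Assumed ID shape, based on the values shown above (e.g. "t044", "t495")
TOURNAMENT_ID = re.compile(r"^t\d+$")


def valid_tournament_ids(values):
    """Return only the entries that look like tournament IDs, preserving order."""
    return [v for v in values if isinstance(v, str) and TOURNAMENT_ID.match(v)]


# Works on any iterable, including a pandas column such as df_tourney[str(year)]
column = ["t044", "t040", float("nan"), None, "t031"]
print(valid_tournament_ids(column))
```

Filtering before the request avoids fetching a URL built from a missing value and relying on the exception handler to discard it.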


## Driving Distance - All years, all tournaments

In [4]:

for year in range(1980, 2021):

    for tournament in df_tourney[str(year)]:
        
        try:
            
            print(tournament)
            print(year)

            ### Fetch the stats page once and reuse the parsed soup below ###
            url = "https://www.pgatour.com/content/pgatour/stats/stat.{stat_id}.y{year}.eon.{tournament}.html".format(stat_id=stat_id, year=year, tournament=tournament)
            html = urlopen(url)
            soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly for reproducible results

            ### Get title of stats page ###
            bread_crumbs = soup.findAll('div', {'class' : 'breadcrumbs'})
            # [87:] skips a fixed-length prefix of the breadcrumb text so only the stat title remains
            title = [crumb.text for crumb in bread_crumbs][0][87:].strip()
            print(title)

            ### Get tournament name ###
            # The third "with-chevron" container holds the tournament <option> elements
            tourney_container = soup.findAll("div", {"class": "with-chevron"})[2]
            tag = tourney_container.findAll("option", {"value" : tournament})[0]
            tourney_name = tag.text
            print(tourney_name)
            print(' ')

            ### Get column headers ###
            # The second <tr> holds the column headers
            column_headers = [th.getText() for th in
                              soup.findAll('tr', limit=2)[1].findAll('th')]

            ### Get data for dataframe ###

            data_rows = soup.findAll('tr')[2:]  # skip the first 2 header rows

            # Build one list of cell texts per table row (one row per player)
            player_data = [[td.getText() for td in row.findAll('td')]
                           for row in data_rows]

            # Convert list of lists to DF
            df = pd.DataFrame(player_data, columns=column_headers)

            # Add features
            df['YEAR'] = year
            df['Tournament'] = tourney_name

            ### Data Cleaning ###

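            # Convert to numerics
            # (DataFrame.convert_objects was removed in pandas 1.0; use pd.to_numeric per column)
            for col in df.columns:
                converted = pd.to_numeric(df[col], errors='coerce')
                if converted.notna().any():  # leave all-text columns (e.g. player names) untouched
                    df[col] = converted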

            # Clean player names
            df['PLAYER NAME'] = [player.replace('\n','') for player in df['PLAYER NAME']]

            # Drop RANK LAST WEEK, then the first remaining column
            df.drop('RANK LAST WEEK', axis=1, inplace=True)
            df.drop(df.columns[0], axis=1, inplace=True)


            ### Export ###
            out_path = '../data/{stat_cat}_{title}.csv'.format(stat_cat=stat_cat, title=title)
            if not os.path.isfile(out_path):
                print('File does not exist --> CREATING')
                df.to_csv(out_path, header=True)  # write the column names once, on creation

            else: 
                print('File exists --> appending data to file')
                df.to_csv(out_path, mode='a', header=False)
        
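        except Exception:
            # Best-effort scrape: skip this tournament if the page or table is missing.
            # (A bare `except:` would also swallow KeyboardInterrupt, making the loop hard to stop.)
            pass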

t044
1980
Driving Distance
Pensacola Open
 


File does not exist --> CREATING
t040
1980
Driving Distance
Southern Open
 
File exists --> appending data to file
t031
1980
Driving Distance
Anheuser-Busch Golf Classic
 
File exists --> appending data to file
t041
1980
Driving Distance
San Antonio Texas Open
 
File exists --> appending data to file
t039
1980
Driving Distance
Hall Of Fame
 
File exists --> appending data to file
t038
1980
Driving Distance
Pleasant Valley Jimmy Fund Classic
 
File exists --> appending data to file
t037
1980
Driving Distance
B.C. Open
 
File exists --> appending data to file
t035
1980
Driving Distance
Buick-Goodwrench Open
 
File exists --> appending data to file
t036
1980
Driving Distance
World Series of Golf
 
File exists --> appending data to file
t027
1980
Driving Distance
Manufacturers Hanover Westchester Classic
 
File exists --> appending data to file
t033
1980
Driving Distance
PGA Championship
 
File exists --> appending data to file
t181
1980
Driving Distance
IVB-Golf Classic
 
File exists -->
...
Driving Distance
Texas Open
 
File exists --> appending data to file
t040
1982
Driving Distance
Southern Open
 
File exists --> appending data to file
t039
1982
Driving Distance
Hall Of Fame
 
File exists --> appending data to file
t038
1982
Driving Distance
Bank of Boston Classic
 
File exists --> appending data to file
t037
1982
Driving Distance
B.C. Open
 
File exists --> appending data to file
t036
1982
Driving Distance
World Series of Golf
 
File exists --> appending data to file
t035
1982
Driving Distance
Buick Open
 
File exists --> appending data to file
t034
1982
Driving Distance
Sammy Davis Jr.-Greater Hartford Open
 
File exists --> appending data to file
t033
1982
Driving Distance
PGA Championship
 
File exists --> appending data to file
t032
1982
Driving Distance
Canadian Open
 
File exists --> appending data to file
t031
1982
Driving Distance
Anheuser-Busch Golf Classic
 
File exists --> appending data to file
t030
1982
Driving Distance
Miller High Life QCO
 
File exists
...
File exists --> appending data to file
t029
1984
Driving Distance
Greater Milwaukee Open
 
File exists --> appending data to file
t038
1984
Driving Distance
Bank of Boston Classic
 
File exists --> appending data to file
t037
1984
Driving Distance
B.C. Open
 
File exists --> appending data to file
t036
1984
Driving Distance
NEC World Series of Golf
 
File exists --> appending data to file
t033
1984
Driving Distance
PGA Championship
 
File exists --> appending data to file
t035
1984
Driving Distance
Buick Open
 
File exists --> appending data to file
t025
1984
Driving Distance
Danny Thomas Memphis Classic
 
File exists --> appending data to file
t034
1984
Driving Distance
Sammy Davis Jr.-Greater Hartford Open
 
File exists --> appending data to file
t030
1984
Driving Distance
Miller High Life QCO
 
File exists --> appending data to file
t031
1984
Driving Distance
Anheuser-Busch Golf Classic
 
File exists --> appending data to file
t028
1984
Driving Distance
Western Open
 
File exists --
...
Federal Express St. Jude Classic
 
File exists --> appending data to file
t036
1986
Driving Distance
NEC World Series of Golf
 
File exists --> appending data to file
t056
1986
Driving Distance
The International
 
t033
1986
Driving Distance
PGA Championship
 
File exists --> appending data to file
t028
1986
Driving Distance
Western Open
 
File exists --> appending data to file
t035
1986
Driving Distance
Buick Open
 
File exists --> appending data to file
t030
1986
Driving Distance
Hardee's Golf Classic
 
File exists --> appending data to file
t031
1986
Driving Distance
Anheuser-Busch Golf Classic
 
File exists --> appending data to file
t034
1986
Driving Distance
Canon Sammy Davis Jr.-Greater Hartford Open
 
File exists --> appending data to file
t032
1986
Driving Distance
Canadian Open
 
File exists --> appending data to file
t022
1986
Driving Distance
Georgia-Pacific Atlanta Golf Classic
 
File exists --> appending data to file
t026
1986
Driving Distance
U.S. Open Championship
 
File
...
Driving Distance
Southern Open
 
File exists --> appending data to file
t037
1988
Driving Distance
B.C. Open
 
File exists --> appending data to file
t038
1988
Driving Distance
Bank of Boston Classic
 
File exists --> appending data to file
t029
1988
Driving Distance
Greater Milwaukee Open
 
File exists --> appending data to file
t032
1988
Driving Distance
Canadian Open
 
File exists --> appending data to file
t036
1988
Driving Distance
NEC World Series of Golf
 
File exists --> appending data to file
t055
1988
Driving Distance
Provident Classic
 
File exists --> appending data to file
t056
1988
Driving Distance
The International
 
t033
1988
Driving Distance
PGA Championship
 
File exists --> appending data to file
t025
1988
Driving Distance
Federal Express St. Jude Classic
 
File exists --> appending data to file
t035
1988
Driving Distance
Buick Open
 
File exists --> appending data to file
t034
1988
Driving Distance
Canon Sammy Davis Jr.-Greater Hartford Open
 
File exists --> append
...
Buick Southern Open
 
File exists --> appending data to file
t037
1990
Driving Distance
B.C. Open
 
File exists --> appending data to file
t032
1990
Driving Distance
Canadian Open
 
File exists --> appending data to file
t030
1990
Driving Distance
Hardee's Golf Classic
 
File exists --> appending data to file
t029
1990
Driving Distance
Greater Milwaukee Open
 
File exists --> appending data to file
t036
1990
Driving Distance
NEC World Series of Golf
 
File exists --> appending data to file
t055
1990
Driving Distance
Chattanooga Classic
 
File exists --> appending data to file
t056
1990
Driving Distance
The International
 
t033
1990
Driving Distance
PGA Championship
 
File exists --> appending data to file
t025
1990
Driving Distance
Federal Express St. Jude Classic
 
File exists --> appending data to file
t035
1990
Driving Distance
Buick Open
 
File exists --> appending data to file
t038
1990
Driving Distance
Bank of Boston Classic
 
File exists --> appending data to file
t031
1990
Driv
...
Driving Distance
B.C. Open
 
File exists --> appending data to file
t030
1992
Driving Distance
Hardee's Golf Classic
 
File exists --> appending data to file
t032
1992
Driving Distance
Canadian Open
 
File exists --> appending data to file
t029
1992
Driving Distance
Greater Milwaukee Open
 
File exists --> appending data to file
t036
1992
Driving Distance
NEC World Series of Golf
 
File exists --> appending data to file
t056
1992
Driving Distance
The International
 
t033
1992
Driving Distance
PGA Championship
 
File exists --> appending data to file
t035
1992
Driving Distance
Buick Open
 
File exists --> appending data to file
t034
1992
Driving Distance
Canon Greater Hartford Open
 
File exists --> appending data to file
t038
1992
Driving Distance
New England Classic
 
File exists --> appending data to file
t055
1992
Driving Distance
Chattanooga Classic
 
File exists --> appending data to file
t031
1992
Driving Distance
Anheuser-Busch Golf Classic
 
File exists --> appending data to fi
...
Driving Distance
B.C. Open
 
File exists --> appending data to file
t032
1994
Driving Distance
Bell Canadian Open
 
File exists --> appending data to file
t029
1994
Driving Distance
Greater Milwaukee Open
 
File exists --> appending data to file
t036
1994
Driving Distance
NEC World Series of Golf
 
File exists --> appending data to file
t056
1994
Driving Distance
Sprint International
 
t033
1994
Driving Distance
PGA Championship
 
File exists --> appending data to file
t035
1994
Driving Distance
Buick Open
 
File exists --> appending data to file
t025
1994
Driving Distance
Federal Express St. Jude Classic
 
File exists --> appending data to file
t038
1994
Driving Distance
New England Classic
 
File exists --> appending data to file
t054
1994
Driving Distance
Deposit Guaranty Golf Classic
 
File exists --> appending data to file
t031
1994
Driving Distance
Anheuser-Busch Golf Classic
 
File exists --> appending data to file
t028
1994
Driving Distance
Motorola Western Open
 
File exists -
...
Driving Distance
Greater Vancouver Open
 
File exists --> appending data to file
t036
1996
Driving Distance
NEC World Series of Golf
 
File exists --> appending data to file
t056
1996
Driving Distance
Sprint International
 
t033
1996
Driving Distance
PGA Championship
 
File exists --> appending data to file
t035
1996
Driving Distance
Buick Open
 
File exists --> appending data to file
t038
1996
Driving Distance
CVS Charity Classic
 
File exists --> appending data to file
t100
1996
Driving Distance
The Open Championship
 
t054
1996
Driving Distance
Deposit Guaranty Golf Classic
 
File exists --> appending data to file
t031
1996
Driving Distance
Michelob Championship at Kingsmill
 
File exists --> appending data to file
t028
1996
Driving Distance
Motorola Western Open
 
File exists --> appending data to file
t034
1996
Driving Distance
Canon Greater Hartford Open
 
File exists --> appending data to file
t025
1996
Driving Distance
FedEx St. Jude Classic
 
File exists --> appending data to
...
File exists --> appending data to file
t056
1998
Driving Distance
Sprint International
 
t033
1998
Driving Distance
PGA Championship
 
File exists --> appending data to file
t035
1998
Driving Distance
Buick Open
 
File exists --> appending data to file
t025
1998
Driving Distance
FedEx St. Jude Classic
 
File exists --> appending data to file
t038
1998
Driving Distance
CVS Charity Classic
 
File exists --> appending data to file
t100
1998
Driving Distance
The Open Championship
 
t054
1998
Driving Distance
Deposit Guaranty Golf Classic
 
File exists --> appending data to file
t030
1998
Driving Distance
Quad City Classic
 
File exists --> appending data to file
t034
1998
Driving Distance
Canon Greater Hartford Open
 
File exists --> appending data to file
t028
1998
Driving Distance
Motorola Western Open
 
File exists --> appending data to file
t026
1998
Driving Distance
U.S. Open Championship
 
File exists --> appending data to file
t027
1998
Driving Distance
Buick Classic
 
File exists -
...
File exists --> appending data to file
t032
2000
Driving Distance
Bell Canadian Open
 
File exists --> appending data to file
t502
2000
Driving Distance
Air Canada Championship
 
File exists --> appending data to file
t476
2000
Driving Distance
World Golf Championships-NEC Invitational
 
File exists --> appending data to file
t472
2000
Driving Distance
Reno-Tahoe Open
 
File exists --> appending data to file
t033
2000
Driving Distance
PGA Championship
 
File exists --> appending data to file
t035
2000
Driving Distance
Buick Open
 
File exists --> appending data to file
t056
2000
Driving Distance
The International Presented by Qwest
 
t030
2000
Driving Distance
John Deere Classic
 
File exists --> appending data to file
t100
2000
Driving Distance
The Open Championship
 
t037
2000
Driving Distance
B.C. Open
 
File exists --> appending data to file
t029
2000
Driving Distance
Greater Milwaukee Open
 
File exists --> appending data to file
t028
2000
Driving Distance
Advil Western Open
 
Fil
...
Tampa Bay Classic presented by Buick
 
File exists --> appending data to file
t473
2002
Driving Distance
World Golf Championships-American Express Championship
 
File exists --> appending data to file
t474
2002
Driving Distance
SEI Pennsylvania Classic
 
File exists --> appending data to file
t032
2002
Driving Distance
Bell Canadian Open
 
File exists --> appending data to file
t502
2002
Driving Distance
Air Canada Championship
 
File exists --> appending data to file
t472
2002
Driving Distance
Reno-Tahoe Open
 
File exists --> appending data to file
t476
2002
Driving Distance
World Golf Championships-NEC Invitational
 
File exists --> appending data to file
t033
2002
Driving Distance
PGA Championship
 
File exists --> appending data to file
t035
2002
Driving Distance
Buick Open
 
File exists --> appending data to file
t056
2002
Driving Distance
The INTERNATIONAL Presented by Qwest
 
t030
2002
Driving Distance
John Deere Classic
 
File exists --> appending data to file
t100
2002
Drivin
...
t060
2004
Driving Distance
THE TOUR Championship presented by Coca-Cola
 
File exists --> appending data to file
t475
2004
Driving Distance
Chrysler Championship
 
File exists --> appending data to file
t045
2004
Driving Distance
FUNAI Classic at the WALT DISNEY WORLD Resort
 
File exists --> appending data to file
t013
2004
Driving Distance
Chrysler Classic of Greensboro
 
File exists --> appending data to file
t047
2004
Driving Distance
Michelin Championship at Las Vegas
 
File exists --> appending data to file
t473
2004
Driving Distance
World Golf Championships-American Express Championship
 
File exists --> appending data to file
t054
2004
Driving Distance
Southern Farm Bureau Classic
 
File exists --> appending data to file
t474
2004
Driving Distance
84 LUMBER Classic
 
File exists --> appending data to file
t041
2004
Driving Distance
Valero Texas Open
 
File exists --> appending data to file
t032
2004
Driving Distance
Bell Canadian Open
 
File exists --> appending data to file
t5
...
File exists --> appending data to file
t023
2007
Driving Distance
the Memorial Tournament presented by Morgan Stanley
 
File exists --> appending data to file
t021
2007
Driving Distance
Crowne Plaza Invitational at Colonial
 
File exists --> appending data to file
t022
2007
Driving Distance
AT&T Classic
 
File exists --> appending data to file
t011
2007
Driving Distance
THE PLAYERS Championship
 
File exists --> appending data to file
t480
2007
Driving Distance
Wachovia Championship
 
File exists --> appending data to file
t019
2007
Driving Distance
EDS Byron Nelson Championship
 
File exists --> appending data to file
t018
2007
Driving Distance
Zurich Classic of New Orleans
 
File exists --> appending data to file
t012
2007
Driving Distance
Verizon Heritage
 
File exists --> appending data to file
t014
2007
Driving Distance
Masters Tournament
 
File exists --> appending data to file
t020
2007
Driving Distance
Shell Houston Open
 
File exists --> appending data to file
t473
2007
Drivin
...
File exists --> appending data to file
t030
2009
Driving Distance
John Deere Classic
 
File exists --> appending data to file
t471
2009
Driving Distance
AT&T National
 
File exists --> appending data to file
t034
2009
Driving Distance
Travelers Championship
 
File exists --> appending data to file
t026
2009
Driving Distance
U.S. Open Championship
 
File exists --> appending data to file
t025
2009
Driving Distance
St. Jude Classic presented by FedEx
 
File exists --> appending data to file
t023
2009
Driving Distance
the Memorial Tournament
 
File exists --> appending data to file
t021
2009
Driving Distance
Crowne Plaza Invitational at Colonial
 
File exists --> appending data to file
t019
2009
Driving Distance
HP Byron Nelson Championship
 
File exists --> appending data to file
t041
2009
Driving Distance
Valero Texas Open
 
File exists --> appending data to file
t011
2009
Driving Distance
THE PLAYERS Championship
 
File exists --> appending data to file
t480
2009
Driving Distance
Quail
...
File exists --> appending data to file
t490
2011
Driving Distance
The Greenbrier Classic
 
File exists --> appending data to file
t032
2011
Driving Distance
RBC Canadian Open
 
File exists --> appending data to file
t100
2011
Driving Distance
The Open Championship
 
File exists --> appending data to file
t054
2011
Driving Distance
Viking Classic
 
File exists --> appending data to file
t030
2011
Driving Distance
John Deere Classic
 
File exists --> appending data to file
t471
2011
Driving Distance
AT&T National
 
File exists --> appending data to file
t034
2011
Driving Distance
Travelers Championship
 
File exists --> appending data to file
t026
2011
Driving Distance
U.S. Open
 
File exists --> appending data to file
t025
2011
Driving Distance
FedEx St. Jude Classic
 
File exists --> appending data to file
t023
2011
Driving Distance
the Memorial Tournament presented by Nationwide Insurance
 
File exists --> appending data to file
t019
2011
Driving Distance
HP Byron Nelson Championship
...
File exists --> appending data to file
t472
2013
Driving Distance
Reno-Tahoe Open
 
File exists --> appending data to file
t032
2013
Driving Distance
RBC Canadian Open
 
File exists --> appending data to file
t100
2013
Driving Distance
The Open Championship
 
File exists --> appending data to file
t054
2013
Driving Distance
Sanderson Farms Championship
 
File exists --> appending data to file
t030
2013
Driving Distance
John Deere Classic
 
File exists --> appending data to file
t490
2013
Driving Distance
The Greenbrier Classic
 
File exists --> appending data to file
t471
2013
Driving Distance
AT&T National
 
File exists --> appending data to file
t034
2013
Driving Distance
Travelers Championship
 
File exists --> appending data to file
t026
2013
Driving Distance
U.S. Open
 
File exists --> appending data to file
t025
2013
Driving Distance
FedEx St. Jude Classic
 
File exists --> appending data to file
t023
2013
Driving Distance
the Memorial Tournament presented by Nationwide Insurance
...
PGA Championship
 
File exists --> appending data to file
t476
2015
Driving Distance
World Golf Championships-Bridgestone Invitational
 
File exists --> appending data to file
t472
2015
Driving Distance
Barracuda Championship
 
File exists --> appending data to file
t471
2015
Driving Distance
Quicken Loans National
 
File exists --> appending data to file
t032
2015
Driving Distance
RBC Canadian Open
 
File exists --> appending data to file
t100
2015
Driving Distance
The Open Championship
 
File exists --> appending data to file
t518
2015
Driving Distance
Barbasol Championship
 
File exists --> appending data to file
t030
2015
Driving Distance
John Deere Classic
 
File exists --> appending data to file
t490
2015
Driving Distance
The Greenbrier Classic
 
File exists --> appending data to file
t034
2015
Driving Distance
Travelers Championship
 
File exists --> appending data to file
t026
2015
Driving Distance
U.S. Open
 
File exists --> appending data to file
t025
2015
Driving Distance
Fe
...
BMW Championship
 
File exists --> appending data to file
t505
2017
Driving Distance
Dell Technologies Championship
 
File exists --> appending data to file
t027
2017
Driving Distance
THE NORTHERN TRUST
 
File exists --> appending data to file
t013
2017
Driving Distance
Wyndham Championship
 
File exists --> appending data to file
t033
2017
Driving Distance
PGA Championship
 
File exists --> appending data to file
t476
2017
Driving Distance
World Golf Championships-Bridgestone Invitational
 
File exists --> appending data to file
t472
2017
Driving Distance
Barracuda Championship
 
File exists --> appending data to file
t032
2017
Driving Distance
RBC Canadian Open
 
File exists --> appending data to file
t100
2017
Driving Distance
The Open Championship
 
File exists --> appending data to file
t518
2017
Driving Distance
Barbasol Championship
 
File exists --> appending data to file
t030
2017
Driving Distance
John Deere Classic
 
File exists --> appending data to file
t490
2017
Driving Di
...
File exists --> appending data to file
t003
2019
Driving Distance
Waste Management Phoenix Open
 
File exists --> appending data to file
t004
2019
Driving Distance
Farmers Insurance Open
 
File exists --> appending data to file
t002
2019
Driving Distance
Desert Classic
 
File exists --> appending data to file
t006
2019
Driving Distance
Sony Open in Hawaii
 
File exists --> appending data to file
t016
2019
Driving Distance
Sentry Tournament of Champions
 
File exists --> appending data to file
t493
2019
Driving Distance
The RSM Classic
 
File exists --> appending data to file
t457
2019
Driving Distance
Mayakoba Golf Classic
 
File exists --> appending data to file
t047
2019
Driving Distance
Shriners Hospitals for Children Open
 
File exists --> appending data to file
t489
2019
Driving Distance
World Golf Championships-HSBC Champions
 
t054
2019
Driving Distance
Sanderson Farms Championship
 
t521
2019
Driving Distance
THE CJ CUP @ NINE BRIDGES
 
t494
2019
Driving Distance
CIMB Classic
 
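
Several entries in the log above print a tournament name but no `File ...` line (for example t056 in 1986 and t100 in 1996), which suggests an exception was raised after the name was printed and then silently discarded. One way to make such cases visible is to collect them instead of ignoring them. The sketch below is an assumption about how that could look, not the notebook's code: `scrape_all` and `scrape_one` are hypothetical names, and `scrape_one` stands in for the body of the loop.

```python
def scrape_all(ids_by_year, scrape_one):
    """Call scrape_one(year, tournament) for every pair and collect failures.

    Returns a list of (year, tournament, error) tuples so that skipped
    pages can be inspected or retried afterwards.
    """
    failures = []
    for year, tournaments in ids_by_year.items():
        for tournament in tournaments:
            try:
                scrape_one(year, tournament)
            except Exception as exc:  # keep going, but remember what failed
                failures.append((year, tournament, repr(exc)))
    return failures


# Stand-in for the real scraping step; the error raised here is arbitrary
# and only demonstrates the bookkeeping.
def fake_scrape(year, tournament):
    if tournament == "t056":
        raise KeyError("RANK LAST WEEK")


failures = scrape_all({1986: ["t036", "t056", "t033"]}, fake_scrape)
print(failures)
```

Writing the collected failures to their own CSV would give a list of year and tournament pairs to re-run once the cause is understood.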