# Web Scraping for Fun and Profit
<img align="right" style="padding-right:10px;" src="figures_8/Hanshintigerslogo.png" width=200><br>
Collecting sports statistics is almost the "Hello World" of web scraping projects. Let's see if we can make it a little more interesting. 

**(Pointless backstory)**

My wife is from Japan and over the years I've learned that the Japanese people love baseball as much as, or possibly even more than, Americans. We've gone to see the local pro team, the Hanshin Tigers (Japanese: 阪神タイガース), a couple of times when visiting family, and I've always wondered how Japanese players compare to American players. In this assignment we will answer part of that question by comparing batting statistics for teams in Japan's Central League to teams in the US National League West.

## Moneyball meet Sabermetrics

The 2011 movie *Moneyball* shows how the Oakland A's general manager, Billy Beane, used statistics to build a low-cost winning team in 2002. The use of statistical analysis to evaluate player and team performance is called **sabermetrics**, and it can trace its lineage to Earnshaw Cook's 1964 book, "Percentage Baseball" (Wikipedia).

We will be using a calculation from sabermetrics called **Base Runs** to evaluate team batting performance. Base runs uses several of the "on base" statistics for players or entire teams to estimate an offensive potential. The base runs calculation will be discussed in more detail below.

**_Your job:_** create a "base runs" column, calculate base runs for each team for each year that statistics are available, and then use comparative analysis and visualization to determine how Japanese and American baseball compare.

***

- Moneyball: https://www.imdb.com/title/tt1210166/, https://en.wikipedia.org/wiki/Moneyball_(film)
- Sabermetrics: https://en.wikipedia.org/wiki/Sabermetrics
- SABR (Society for American Baseball Research) website: https://sabr.org/

## Data Source

We will be using https://www.baseball-reference.com/ to gather our statistics. Their data is very well organized and the HTML is well-labeled and easy to navigate.


### Japan Data

**Japan Central League:** https://www.baseball-reference.com/register/league.cgi?code=JPCL&class=Fgn

The Japan Central League is composed of 6 teams, **Chunichi Dragons, Hanshin Tigers, Hiroshima Carp, Yakult Swallows, Yokohama Bay Stars, Yomiuri Giants** (actually, some years had more teams), with data stretching back to 1950 organized into a table of links for teams for each year.

<img align="left" style="padding-right:10px;" src="figures_8/Japan_Central_League.png" width=800><br>

It should be a relatively easy task to scrape a list of links for each year (hint: think "dictionary")
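One way to read the "dictionary" hint is to map each year to the URL behind its link. The sketch below uses only Python's standard library and a tiny, made-up HTML fragment; the `href` values and the table layout are placeholders, not the real Baseball-Reference markup, so the matching logic would need adjusting against the live page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE = "https://www.baseball-reference.com"

# Hypothetical fragment standing in for the league history table.
SAMPLE_HTML = """
<table id="lg_history">
  <tr><th><a href="/register/league.cgi?id=aaaa1111">2019</a></th></tr>
  <tr><th><a href="/register/league.cgi?id=bbbb2222">2018</a></th></tr>
  <tr><th><a href="/about/">About</a></th></tr>
</table>
"""


class YearLinkParser(HTMLParser):
    """Collect {year: absolute_url} for every link whose text is a 4-digit year."""

    def __init__(self):
        super().__init__()
        self.links = {}
        self._href = None  # href of the <a> tag currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        text = data.strip()
        if self._href and text.isdigit() and len(text) == 4:
            self.links[int(text)] = urljoin(BASE, self._href)

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


parser = YearLinkParser()
parser.feed(SAMPLE_HTML)
year_links = parser.links
print(year_links)
```

Keying the dictionary by year makes it easy to loop over seasons later and to label each scraped table with the year it came from.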

### Batting Stats

Clicking a year link will take you to tables of league statistics for that year. To keep things simple, we will only scrape the second table -- **"League Batting"**.

<img style="padding-right:10px;" src="figures_8/2019_Japan_Central_Batting.png" width=800><br>
<br><br>
Collect each team's statistics from the table and ignore the League Totals at the bottom of the table.
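Dropping the summary row is a simple filter once the rows are in hand. Here is a minimal sketch with made-up numbers; the `Tm` key and the exact "League Totals" label are assumptions that should be checked against the scraped table.

```python
# Rows as they might come out of the scraped table; all numbers are placeholders.
rows = [
    {"Tm": "Hanshin Tigers", "H": 1200, "HR": 100},
    {"Tm": "Yomiuri Giants", "H": 1250, "HR": 150},
    {"Tm": "League Totals", "H": 2450, "HR": 250},
]

# Keep only real teams; the summary row would double-count every statistic.
team_rows = [row for row in rows if row["Tm"] != "League Totals"]

print([row["Tm"] for row in team_rows])
```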

### US Data

The US National League West is composed of five teams: **Arizona Diamondbacks, Colorado Rockies, Los Angeles Dodgers, San Diego Padres, San Francisco Giants.**

1. It is considerably less straightforward to get team stats for US teams. The best place I found to get links for all five teams is at the bottom of the main page in the "Full Site Menu" as seen in the picture below.

<img align="left" style="padding-right:10px;" src="figures_8/MLB_Stats.png" width=200><br>

2. Following one of the team links will take you to a team page. Underneath the main table on that page is a link to "Batting" under "Year-by-year Stats".

<img align="left" style="padding-right:10px;" src="figures_8/Colorado_Rockies_Team_History.png" width=400><br>

Following that link will take you to the page with batting statistics organized by year:

<img align="left" style="padding-right:10px;" src="figures_8/Colorado_Rockies_Team_Yearly_Batting_Stats.png" width=400><br>
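Once the pattern is known, the clicks described above collapse into one URL per team. This is a sketch that assumes Baseball-Reference's three-letter team codes and the `/teams/<code>/batteam.shtml` pattern that appears for the Rockies later in this notebook; the variable names are only illustrative.

```python
BASE = "https://www.baseball-reference.com"

# Team codes as they appear in Baseball-Reference URLs.
NL_WEST = {
    "ARI": "Arizona Diamondbacks",
    "COL": "Colorado Rockies",
    "LAD": "Los Angeles Dodgers",
    "SDP": "San Diego Padres",
    "SFG": "San Francisco Giants",
}

# One year-by-year batting URL per team code.
batting_urls = {code: f"{BASE}/teams/{code}/batteam.shtml" for code in NL_WEST}

for code, url in batting_urls.items():
    print(NL_WEST[code], url)
```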

## Base Runs

After you have collected all the data for all the teams and all the years, it is time to create the **base runs** column. According to http://tangotiger.net/wiki_archive/Base_Runs.html (linked from the SABR site), the formula to calculate base runs is:

`A*B/(B + C) + D`

Where:

```
A = H + BB - HR
B = (1.4*SLG - .6*H - 3*HR + .1*BB)*1.02
C = AB - H
D = HR
```

<img style="padding-right:10px;" src="figures_8/Colorado_Rockies_Detail.png" width=800><br>

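*Note: There is a discrepancy between the column labels at Baseball-Reference and the terms in the Tangotiger formula, so I've adjusted the formula to match Baseball-Reference. In particular, Tangotiger writes the B term with total bases (TB), while the version above uses the SLG column.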

***

Using the information above, for 2019:

A = 1502 + 489 - 224 **= 1,767**<br>
B = (1.4 * .456 - .6 * 1502 - 3 * 224 + .1 * 489) * 1.02 **= -1554.134832**<br>
C = 5660 - 1502 **= 4158**<br>
D = **224**<br>

A*B/(B + C) + D = 1,767 * -1554.134832 / (-1554.134832 + 4158) + 224

≈ -830.646

***

Unfortunately, one of the weaknesses of base runs is that the answer can come out negative -- clearly a team cannot score **negative** runs in a season so you will have to decide how to deal with this problem.
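The formula translates directly into a small function. The sketch below evaluates it exactly as defined in the `Where:` block above, including the `- 3*HR` and `+ .1*BB` terms in B, for the 2019 inputs (H=1502, BB=489, HR=224, SLG=.456, AB=5660). The function name `base_runs` is only illustrative.

```python
def base_runs(h, bb, hr, slg, ab):
    """Base runs as defined in this notebook: A*B/(B + C) + D."""
    a = h + bb - hr
    b = (1.4 * slg - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02
    c = ab - h
    d = hr
    return a * b / (b + c) + d


# 2019 inputs from the worked example
result = base_runs(h=1502, bb=489, hr=224, slg=0.456, ab=5660)
print(round(result, 3))
```

With these inputs the function returns roughly -830.6, which is the kind of negative value the paragraph above asks you to deal with.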

## A Word About HTML and Nested Tags

A couple of decades ago, the US government finished a project that defined in excruciating detail how government documents would be formatted. More accurately, it defined the language used to create the formatting. This language is called **SGML** - Standard Generalized Markup Language. SGML is actually a language that defines rules for creating markup languages and it is the basis for HTML, XML, BPML, CML, and hundreds of others. 

SGML specifies that markup tags can be nested within each other, which has proven problematic for HTML designers and parsers alike. This nesting is a barrier to parsing many of the pages that we will want to scrape, including the Baseball-Reference pages.

<img style="padding-right:10px;" src="figures_8/nested.png" width=1000><br>

***

### Helpful Hints

This assignment consists of 3 major sections:<br>
1. Gather links to pages containing data by scraping tables.
2. Scrape data from HTML tables into Pandas tables.
3. Calculate the new statistic and create visualizations.

In these sports pages, tables are buried under several levels of nested \<div> tags. Let's grab data from a Hockey-Reference team page to demonstrate.

In [1]:
from gazpacho import get, Soup
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests

In [2]:
state = 'COL'     # team code used in the URL
year = '2016'
base = 'https://www.hockey-reference.com'
page = f'/teams/{state}/{year}.html'

In [3]:
url = f'{base}{page}'
html = get(url)
soup = Soup(html)

In [4]:
div = soup.find('div')

This gives us the top-level \<div> and everything inside. Now we can look for the table. NOTE: gazpacho's `find()` method returns a list when more than one element matches, which is why the next cell indexes `div[0]`.

In [5]:
df = pd.read_html(str(div[0].find('table')))[0]

In [6]:
df

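|  | Team | AvAge | GP | W | L | OL | PTS | PTS% | GF | GA | ... | PPOA | PK% | SH | SHA | S | S% | SA | SV% | PDO | SO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Colorado Avalanche | 28.3 | 82 | 39 | 39 | 4 | 82 | 0.5 | 212 | 240 | ... | 258 | 80.23 | 7 | 9 | 2348 | 9.0 | 2649 | 0.909 | 100.5 | 5 |
| 1 | League Average | 28.0 | 82 | 41 | 32 | 9 | 91 | 0.556 | 219 | 219 | ... | 255 | 81.34 | 6 | 6 | 2438 | 9.0 | 2438 | 0.91 |  | 5 |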


In this case, we were scraping the actual data, so I converted it to a DataFrame. When harvesting links, you probably want to work with the raw list that `find()` returns and pull the link out of each element.

### Deliverable

Your job is to present a comparative analysis between Japanese Central League baseball teams and American National League West teams using the **"base run"** statistic calculated from data scraped from the Baseball-Reference website. This analysis should contain tables and visualizations to support an ultimate answer to which country's baseball teams are stronger. You are free to use any combination of requests, beautifulsoup, gazpacho, scrapy, etc. that you feel comfortable with to gather the data and any visualization library you want.  

<div class="alert alert-block alert-warning">
<b>Note:</b> As noted in this evening's Zoom session, the batting tables for the US teams are not easily parsed. Here is a solution for working with these tables.
</div>

In [7]:
# url for Colorado Rockies Year-by-Year Batting
url_mlb="https://www.baseball-reference.com/teams/COL/batteam.shtml"

html = get(url_mlb)
soup = Soup(html)    

In [8]:
# parse the outer level 'div'
div = soup.find('div',{'id':'content'},mode='all')

In [9]:
# Reconstruct the html string to represent what we want it to look like
table = str(div[0].find('table', {'id':'yby_team_bat'})) + str(div[0].find('tbody')) + '</table>'

In [10]:
# read into a pd dataframe
df = pd.read_html(table)[0]

<div class="alert alert-block alert-warning">
<b>Note:</b> Notice the odd column name in the middle of the dataframe header. We might want to clean this up.
</div>

<div class="alert alert-info">
  <strong>Info!</strong> For those having trouble with the Japanese stats, use the functions below. They should also work for the American teams.
</div>

In [11]:
import requests, bs4
import re

In [12]:
## from https://github.com/BenKite/baseball_data/blob/master/baseballReferenceScrape.py
##
## This function simply takes a url and provides the ids
## from the html tables that the code provided here can access.
## Using findTables is great for determining options for the
## pullTable function for the tableID argument.

def findTables(url):
    res = requests.get(url)
    ## The next two lines get around the issue with comments breaking the parsing.
    comm = re.compile("<!--|-->")
    soup = bs4.BeautifulSoup(comm.sub("", res.text), 'lxml')
    divs = soup.findAll('div', id = "content")
    divs = divs[0].findAll("div", id=re.compile("^all"))
    ids = []
    for div in divs:
        searchme = str(div.findAll("table"))
        x = searchme[searchme.find("id=") + 3: searchme.find(">")]
        x = x.replace("\"", "")
        if len(x) > 0:
            ids.append(x)
    return(ids)

## For example:
## findTables("http://www.baseball-reference.com/teams/KCR/2016.shtml")

In [13]:
findTables("http://www.baseball-reference.com/teams/COL/2016.shtml")

['team_batting',
 'team_pitching',
 'appearances',
 'coaches',
 'standard_fielding',
 'players_value_batting',
 'players_value_pitching']
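
The comment workaround in `findTables` is easier to see in isolation. In the sketch below, a made-up page hides its second table inside an HTML comment, so a parser sees only one table until the `<!--` and `-->` markers are removed. It uses the standard library's `HTMLParser` instead of BeautifulSoup so that it runs without extra packages; the table ids are borrowed from the output above, but the markup itself is invented.

```python
import re
from html.parser import HTMLParser

# Hypothetical page: the second table is wrapped in an HTML comment.
PAGE = """
<div id="content">
  <div id="all_team_batting"><table id="team_batting"></table></div>
  <div id="all_team_pitching">
    <!-- <table id="team_pitching"></table> -->
  </div>
</div>
"""


class TableIdParser(HTMLParser):
    """Record the id attribute of every <table> start tag."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.ids.append(dict(attrs).get("id"))


def table_ids(html):
    parser = TableIdParser()
    parser.feed(html)
    return parser.ids


comment_markers = re.compile("<!--|-->")

before = table_ids(PAGE)                           # commented-out table is invisible
after = table_ids(comment_markers.sub("", PAGE))   # markers removed, table exposed

print(before)
print(after)
```

Removing only the markers, rather than the whole comment, is what keeps the hidden table in the document.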

In [14]:
## Pulls a single table from a url provided by the user.
## The desired table should be specified by tableID.
## This function is used in all functions that do more complicated pulls.

def pullTable(url, tableID):
    res = requests.get(url)
    ## Work around comments
    comm = re.compile("<!--|-->")
    soup = bs4.BeautifulSoup(comm.sub("", res.text), 'lxml')
    tables = soup.findAll('table', id = tableID)
    data_rows = tables[0].findAll('tr')
    data_header = tables[0].findAll('thead')
    data_header = data_header[0].findAll("tr")
    data_header = data_header[0].findAll("th")
    game_data = [[td.getText() for td in data_rows[i].findAll(['th','td'])]
        for i in range(len(data_rows))
        ]
    data = pd.DataFrame(game_data)
    header = []
    for i in range(len(data.columns)):
        header.append(data_header[i].getText())
    data.columns = header
    data = data.loc[data[header[0]] != header[0]]
    data = data.reset_index(drop = True)
    return(data)

## For example:
## url = "http://www.baseball-reference.com/teams/KCR/2016.shtml"
## pullTable(url, "team_batting")


In [15]:
NLWestTeams = ['COL','ARI','LAD','SDP','SFG'] 


In [16]:
# testing the creation of the URL and the matching of teams from a list

for team in NLWestTeams:
    url_base = "https://www.baseball-reference.com/teams/"
    addTeam = team
    url_end = "/batteam.shtml" 
    url = url_base + addTeam + url_end
    if url == url_mlb:
        print('MATCH', team, url)
    else:
        print('Non-Match',team)

MATCH COL https://www.baseball-reference.com/teams/COL/batteam.shtml
Non-Match ARI
Non-Match LAD
Non-Match SDP
Non-Match SFG


In [17]:
# Testing grabbing data from the site and adding it to a list, then a DataFrame
team = 'COL'                                                      # Testing with the Colorado Rockies
LoopList = []
for year in range(1999,2020):                                     # looping through the 1999-2019 seasons
    url_base = "https://www.baseball-reference.com/teams/"        # breaking the url down into parts so we can loop through teams later
    addTeam = team
    url_end = '.shtml'                                            # saving the ending
    url = url_base + addTeam + "/" + str(year) + url_end          # concatenating the url back together
    data = pullTable(url, "team_batting")                         # pull team batting table data from the url
    data['Year'] = year                                           # adding year and team to identify rows after everything is combined
    data['Team'] = team
    LoopList.append(data)                                         # appending the data we just collected to a list outside the loop
LoopDF = pd.concat(LoopList, ignore_index = True)                 # combining the list into one DataFrame (DataFrame.append was removed in pandas 2.0)

# Looks like everything Worked. Lets try setting it up to pull Year-by-Year Team Batting from
# https://www.baseball-reference.com/teams/COL/batteam.shtml
# https://www.baseball-reference.com/teams/TEAM/batteam.shtml

In [18]:
teambatting = []                                                # Create an empty list to collect data from the for loop
for team in NLWestTeams:                                        # Loop through each team in the NL West
    url_base = 'https://www.baseball-reference.com/teams/'      # Breaking up the URL so we can change it depending on which team we are pulling
    addTeam = team
    url_end = '/batteam.shtml'
    url = url_base + addTeam + url_end                          # Putting the URL together to pull from that page
    data = pullTable(url, "yby_team_bat")                       # Using pullTable to get the year-by-year table from the URL
    data['team'] = team                                         # adding the team code so rows are easier to identify in the DataFrame
    teambatting.append(data)                                    # Appending everything to teambatting

NLWest = pd.concat(teambatting, ignore_index = True)            # Combining teambatting into a DataFrame (DataFrame.append was removed in pandas 2.0)


## Start to test and pull the Japan team stats

In [19]:
findTables('https://www.baseball-reference.com/register/league.cgi?code=JPCL&class=Fgn')

['lg_history']

In [20]:
from bs4 import BeautifulSoup
import requests
url_jpn = 'https://www.baseball-reference.com/register/league.cgi?code=JPCL&class=Fgn'
resp = requests.get(url_jpn)
soup = BeautifulSoup(resp.text, 'lxml')

len(soup.find_all('a'))
len(soup.find_all('table'))


1

In [21]:
jpn_urls = []    
for h in soup.find_all('td'):
    try:
        jpn_urls.append(h.find('a').attrs['href'])
    except (AttributeError, KeyError):             # cell has no link, or the link has no href
        pass
jpn_urls

['/register/team.cgi?id=440abdcc',
 '/register/team.cgi?id=5b8c0aae',
 '/register/team.cgi?id=b758241e',
 '/register/team.cgi?id=7dcd3bed',
 '/register/team.cgi?id=104f07f5',
 '/register/team.cgi?id=8a52b102',
 '/register/team.cgi?id=6cd77290',
 '/register/team.cgi?id=81a0f195',
 '/register/team.cgi?id=132c15ba',
 '/register/team.cgi?id=b5237ebe',
 '/register/team.cgi?id=232bcf74',
 '/register/team.cgi?id=fe01330f',
 '/register/team.cgi?id=16498b57',
 '/register/team.cgi?id=2a90bed3',
 '/register/team.cgi?id=707590b1',
 '/register/team.cgi?id=5e91a021',
 '/register/team.cgi?id=f995e1f8',
 '/register/team.cgi?id=9c29067e',
 '/register/team.cgi?id=649851aa',
 '/register/team.cgi?id=84a8c055',
 '/register/team.cgi?id=56354bdd',
 '/register/team.cgi?id=5d636e2b',
 '/register/team.cgi?id=f2dc9b63',
 '/register/team.cgi?id=b6c7f7c1',
 '/register/team.cgi?id=25346c0b',
 '/register/team.cgi?id=e4bbc1de',
 '/register/team.cgi?id=af9de917',
 '/register/team.cgi?id=d09a2719',
 '/register/team.cgi

In [22]:
jpn_base = 'https://www.baseball-reference.com'
team_urls = [jpn_base + x for x in jpn_urls]       # Creating a list of all of the URLs i need to grab data from
test_url = team_urls[:2]                           # Creating a subset to use for testing
test_url

['https://www.baseball-reference.com/register/team.cgi?id=440abdcc',
 'https://www.baseball-reference.com/register/team.cgi?id=5b8c0aae']
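
String concatenation works here because every scraped link starts with `/`. If that assumption ever fails, for example when a link is already absolute, `urllib.parse.urljoin` from the standard library handles both cases. A small sketch using the two paths shown above:

```python
from urllib.parse import urljoin

jpn_base = "https://www.baseball-reference.com"

paths = [
    "/register/team.cgi?id=440abdcc",                                    # relative link
    "https://www.baseball-reference.com/register/team.cgi?id=5b8c0aae",  # already absolute
]

# urljoin resolves relative paths against the base and leaves absolute URLs alone.
team_urls = [urljoin(jpn_base, p) for p in paths]
print(team_urls)
```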

In [23]:
# Working on pulling the team name and year from to append to the stats list
nameYear = []
url_jpn = 'https://www.baseball-reference.com/register/team.cgi?id=232bcf74'
resp = requests.get(url_jpn)
soup = BeautifulSoup(resp.text, 'lxml')
for h in soup.find_all('div',{'id':'meta'}):      # Pulling all divs where id = 'meta'
    for x in h.find_all('span'):                  # looping through the spans from the meta div and 
        nameYear.append(x.text)                   # appending the text to NameYear.
    

# print(teamname)
nameYear                                          # There is a blank in the list. need to append [:2] when putting the list together

['2009', 'Chunichi Dragons', '']

In [24]:
# creating list of column names to use in creating a dictionary with zip
jpn_col = ['Rk','Name','Age','G','PA','AB','R','H','2B','3B','HR','RBI','SB','CS','BB','SO','BA','OBP','SLG','OPS','TB','GDP','HBP','SH','SF','IBB','Notes','Year','TeamName']
jpnlist = []
for x in team_urls:                                       # Loop through all of the URLS
    nameYear = []                                         # First get the Name and Year of the team we are currently scraping
    resp = requests.get(x)                                # Request the team url
    soup = BeautifulSoup(resp.text, 'lxml')               # Request text from the URL
    for h in soup.find_all('div',{'id':'meta'}):          # Loop through all div's with id = meta
        for i in h.find_all('span'):                      # Loop through all the spans within the divs that were found
            nameYear.append(i.text)                       # append the text from the spans to nameYear
    pull = pullTable(x,'team_batting')                    # use pullTable to get the team batting table
    last = pull.last_valid_index()                        # pullTable pulls every player's stats, but we want the team totals in the last row, so find the last index
    last = pull.iloc[last]                                # use iloc to select that last row
    last = pd.concat([last, pd.Series(nameYear[:2])])     # append the first two values (year, teamName); Series.append was removed in pandas 2.0
    jpnlist.append(dict(zip(jpn_col,last)))               # zip header(Key) and last(Value) together into a dictionary


The data cleaning process is almost over. All that remains is to get `jpnlist` into a DataFrame, convert each column to the correct type, and make a copy of each DataFrame for safety.

In [25]:
jpndf = pd.DataFrame.from_dict(jpnlist)
jpndf_copy = jpndf.copy()
jpndfTeam = jpndf['TeamName']

In [26]:
jpndf.drop(['Name','Notes','TeamName'], axis=1, inplace=True)

In [27]:
jpndf = jpndf.apply(lambda col:pd.to_numeric(col, errors='coerce'))  

In [28]:
jpndf['TeamName'] = jpndfTeam

In [29]:
NLWestCopy = NLWest.copy()                         # Making a copy of the dataframe in case something goes wrong
NLSave = NLWest[['team','Lg']]                     # Saving team and Lg in a separate dataframe while I convert
                                                   # everything else to float or int; Lg is not needed because every team is in the NL West
NLWest.drop(['team','Lg'], axis=1, inplace=True)   # dropping columns in prep for the next step

In [30]:
NLWest = NLWest.apply(lambda col:pd.to_numeric(col, errors='coerce'))  
# converting data types objects to float / ints 
# https://stackoverflow.com/questions/28277137/how-to-convert-datatypeobject-to-float64-in-python

In [31]:
NLWest['Team'] = NLSave['team']

In [32]:
def BaseRunCal(DF):
    """Return the base runs Series for a DataFrame with H, BB, HR, SLG and AB columns."""
    A = DF['H'] + DF['BB'] - DF['HR']
    B = (1.4*DF['SLG'] - .6*DF['H'] - 3*DF['HR'] + .1*DF['BB']) * 1.02
    C = DF['AB'] - DF['H']
    D = DF['HR']
    return A * B / (B + C) + D

In [82]:
NLWest['BaseRuns'] = BaseRunCal(NLWest)

In [83]:
jpndf['BaseRuns'] = BaseRunCal(jpndf)

In [84]:
NLWBaseRunsMin = 1167.44      # hard-coded shift: approximate magnitude of the most negative NL West BaseRuns value
NLWBaseRunsMax = 215

In [85]:
NLWest['BaseRunsShift'] = NLWest['BaseRuns'] + NLWBaseRunsMin

In [86]:
from sklearn.preprocessing import MinMaxScaler

In [87]:
scaler = MinMaxScaler()

# Work on an explicit copy so the assignment below does not trigger SettingWithCopyWarning
NLMinMax = NLWest[['BaseRuns','BaseRunsShift']].copy()
NLMinMax[['BaseRuns','BaseRunsShift']] = scaler.fit_transform(NLMinMax[['BaseRuns','BaseRunsShift']])


In [123]:
jpndf.min()
jpndf['BaseRunsShift'] = jpndf['BaseRuns'] + NLWBaseRunsMin
jpnBRdf = jpndf[['BaseRuns','BaseRunsShift']].copy()     # explicit copy avoids SettingWithCopyWarning
scaler = MinMaxScaler()                                   # fresh scaler, fit on the Japanese data only
jpnBRdf[['BaseRuns','BaseRunsShift']] = scaler.fit_transform(jpnBRdf[['BaseRuns','BaseRunsShift']])
jpndf['BRScaled'] = jpnBRdf['BaseRuns']


In [96]:
# Reassign instead of renaming in place so no SettingWithCopyWarning is raised
NLMinMax = NLMinMax.rename(columns={"BaseRuns": "BRScaled", "BaseRunsShift": "BRShiftScaled"})


In [98]:
NLWest['BRScaled'] = NLMinMax['BRScaled']
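
Min-max scaling maps the smallest value to 0 and the largest to 1, so adding a constant to every value beforehand does not change the result. The sketch below checks this in plain Python with three `BaseRuns` values that appear in the grouped output below (ARI 1998-2000) and the same shift of 1167.44. The helper `min_max` is an illustrative stand-in for `MinMaxScaler`, not its implementation.

```python
def min_max(values):
    """Rescale values linearly so that min -> 0.0 and max -> 1.0."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


base_runs = [-581.307411, -976.446581, -768.916820]   # ARI 1998-2000
shifted = [v + 1167.44 for v in base_runs]             # same shift as BaseRunsShift

scaled = min_max(base_runs)
scaled_shifted = min_max(shifted)

print(scaled)
print(scaled_shifted)
```

The two lists agree up to floating-point rounding, so `BRScaled` and `BRShiftScaled` carry the same information. Because only three values are used here, the numbers differ from the `BRScaled` column, which is scaled over every team and year.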

In [135]:
NLWest.groupby(['Team', 'Year']).sum()

| Team | Year | W | L | Finish | R/G | G | PA | AB | R | H | 2B | ... | OBP | SLG | OPS | E | DP | Fld% | BatAge | BaseRuns | BaseRunsShift | BRScaled |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ARI | 1998 | 65 | 97 | 5 | 4.10 | 162 | 6116 | 5491 | 665 | 1353 | 235 | ... | 0.314 | 0.393 | 0.707 | 100 | 125 | 0.984 | 27.8 | -581.307411 | 586.132589 | 0.616547 |
| ARI | 1999 | 100 | 62 | 1 | 5.60 | 162 | 6415 | 5658 | 908 | 1566 | 289 | ... | 0.347 | 0.459 | 0.805 | 104 | 132 | 0.983 | 30.0 | -976.446581 | 190.993419 | 0.200902 |
| ARI | 2000 | 85 | 77 | 3 | 4.89 | 162 | 6241 | 5527 | 792 | 1466 | 282 | ... | 0.333 | 0.429 | 0.763 | 107 | 138 | 0.982 | 30.8 | -768.916820 | 398.523180 | 0.419202 |
| ARI | 2001 | 92 | 70 | 1 | 5.05 | 162 | 6349 | 5595 | 818 | 1494 | 284 | ... | 0.341 | 0.442 | 0.783 | 84 | 148 | 0.986 | 31.9 | -861.192251 | 306.247749 | 0.322138 |
| ARI | 2002 | 98 | 64 | 1 | 5.06 | 162 | 6318 | 5508 | 819 | 1471 | 283 | ... | 0.346 | 0.423 | 0.769 | 89 | 116 | 0.985 | 31.7 | -802.235009 | 365.204991 | 0.384154 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| SFG | 2015 | 84 | 78 | 2 | 4.30 | 162 | 6153 | 5565 | 696 | 1486 | 288 | ... | 0.326 | 0.406 | 0.732 | 78 | 145 | 0.987 | 28.9 | -688.846157 | 478.593843 | 0.503428 |
| SFG | 2016 | 87 | 75 | 2 | 4.41 | 162 | 6271 | 5565 | 715 | 1437 | 280 | ... | 0.329 | 0.398 | 0.728 | 72 | 136 | 0.988 | 29.1 | -656.771862 | 510.668138 | 0.537166 |
| SFG | 2017 | 64 | 98 | 5 | 3.94 | 162 | 6137 | 5551 | 639 | 1382 | 290 | ... | 0.309 | 0.380 | 0.689 | 87 | 127 | 0.985 | 29.5 | -558.899642 | 608.540358 | 0.640118 |
| SFG | 2018 | 73 | 89 | 4 | 3.72 | 162 | 6113 | 5541 | 603 | 1324 | 255 | ... | 0.300 | 0.368 | 0.667 | 97 | 160 | 0.984 | 29.8 | -497.129516 | 670.310484 | 0.705094 |


In [158]:
Stats = NLWest.groupby(['Year','Team']).sum()

In [159]:
Stats['BRScaled']

Year  Team
1883  SFG     0.933181
1884  LAD     1.000000
      SFG     0.863982
1885  LAD     0.920623
      SFG     0.829520
                ...   
2019  ARI     0.463084
      COL     0.354269
      LAD     0.268483
      SDP     0.605445
      SFG     0.661788
Name: BRScaled, Length: 373, dtype: float64