# Predicting Baseball Salaries, Part II

I previously built a simple regression model to predict baseball player salaries, using each player's team, position and salary for the 2019 season. It was a simple exercise in using decision trees and different ensemble methods to predict salaries, but I wanted to go back and see whether including more stats, specifically the previous season's hitting statistics, would get me closer to predicting salary.

My first step was to load the necessary libraries and then the salary data I had previously collected from [USA Today](https://www.usatoday.com/sports/mlb/salaries/).

In [1]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
import time

In [2]:
pd.set_option('display.max_columns', None)

In [3]:
salary = pd.read_pickle('salary_info')

In [4]:
salary.head()

Unnamed: 0,name,team,position,salary
0,MaxScherzer,WSH,SP,42142857
1,StephenStrasburg,WSH,SP,36428571
2,MikeTrout,LAA,CF,34083333
3,ZackGreinke,ARI,SP,32421884
4,DavidPrice,BOS,SP,31000000


Now that I have those salaries, I want to get the 2018 hitting stats. I'm going to collect each player's data from [MLB's site](https://www.mlb.com/stats/2018) and then use BeautifulSoup to parse the HTML.

In [5]:
url = 'https://www.mlb.com/stats/2018'
html = requests.get(url)
html

<Response [200]>

In [6]:
soup = BeautifulSoup(html.content, 'html.parser')

In [81]:
name_container = soup.find('th', {'data-row': '0'})
name_container

<th class="pinned-col-3lxtuFnc col-group-start-sa9unvY0 number-aY5arzrB first-col-3aGPCzvr is-table-pinned-1WfPW2jT" data-col="0" data-row="0" id="tb-1913-body-row0" scope="row"><div class="custom-cell-wrapper-34Cjf9P0"><div class="index-3cdMSKi7">1</div><div class="value-wrapper-1W5GYs5E"><div class="top-wrapper-1NLTqKbE"><div><a aria-label="Mike Trout" class="bui-link" href="/player/545361"><span class="full-3fV3c9pF">Mike</span><span class="short-3OJ0bTju">M Trout</span><span class="full-3fV3c9pF">Trout</span></a></div><div class="position-28TbwVOg">CF</div></div></div></div><div class="placeholder-wrapper-bEG1UFFP"><div class="index-3cdMSKi7">1</div><div><span class="bui-skeleton"><span class="skeleton-row-2cL12jX9" style="background-color:#eee;background-image:linear-gradient(90deg, #eee, #F5F5F5, #eee);border-radius:50%;width:42px;height:42px">‌</span></span></div><div class="placeholder-content-2l2UMerJ"><div><span class="bui-skeleton"><span class="skeleton-row-2cL12jX9" style="

In [70]:
names = name_container.find_all('span', class_='full-3fV3c9pF')
names

[<span class="full-3fV3c9pF">Mike</span>,
 <span class="full-3fV3c9pF">Trout</span>]

In [75]:
for name in names:
    print(name.get_text())

Mike
Trout


Get Name

In [82]:
first = names[0].get_text()
last = names[1].get_text()
print(first, last)

Mike Trout
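The class names used above (such as `full-3fV3c9pF`) look auto-generated, so they may not be stable between page builds. As a hedged alternative, the `<a>` tag in the cell printed earlier also carries the full name in its `aria-label` attribute. The sketch below runs against a trimmed copy of that cell rather than the live page; `cell_html`, `link` and `full_name` are names I made up for illustration, and splitting on the first space is an assumption that will not suit every name.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the header cell printed above (not fetched live).
cell_html = (
    '<th data-col="0" data-row="0" scope="row">'
    '<a aria-label="Mike Trout" class="bui-link" href="/player/545361">'
    '<span class="full-3fV3c9pF">Mike</span>'
    '<span class="short-3OJ0bTju">M Trout</span>'
    '<span class="full-3fV3c9pF">Trout</span></a></th>'
)
cell = BeautifulSoup(cell_html, 'html.parser')

# Match any <a> tag that has an aria-label attribute, whatever its value.
link = cell.find('a', attrs={'aria-label': True})
full_name = link['aria-label']

# Assumption: everything before the first space is the first name.
first, last = full_name.split(' ', 1)
print(first, last)
```

This avoids depending on the hashed span classes, at the cost of relying on the link keeping its `aria-label`.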


Get Position

In [93]:
name_container.find('div', class_ = 'position-28TbwVOg').get_text()

'CF'

Get Team

In [105]:
soup.find('td', {'data-col': '1', 'data-row': '0'}).get_text()

'LAA'

Get Games

In [106]:
soup.find('td', {'data-col': '2', 'data-row': '0'}).get_text()

'140'

Get At Bats

In [107]:
soup.find('td', {'data-col': '3', 'data-row': '0'}).get_text()

'471'

Get Runs

In [109]:
soup.find('td', {'data-col': '4', 'data-row': '0'}).get_text()

'101'

Get Hits

In [111]:
soup.find('td', {'data-col': '5', 'data-row': '0'}).get_text()

'147'

Get Doubles

In [113]:
soup.find('td', {'data-col': '6', 'data-row': '0'}).get_text()

'24'

Get Triples

In [114]:
soup.find('td', {'data-col': '7', 'data-row': '0'}).get_text()

'4'

Get Home Runs

In [115]:
soup.find('td', {'data-col': '8', 'data-row': '0'}).get_text()

'39'

Get RBIs

In [116]:
soup.find('td', {'data-col': '9', 'data-row': '0'}).get_text()

'79'

Get Walks

In [117]:
soup.find('td', {'data-col': '10', 'data-row': '0'}).get_text()

'122'

Get Strikeouts

In [118]:
soup.find('td', {'data-col': '11', 'data-row': '0'}).get_text()

'124'

Get Stolen Bases

In [119]:
soup.find('td', {'data-col': '12', 'data-row': '0'}).get_text()

'24'

Get Caught Stealing

In [120]:
soup.find('td', {'data-col': '13', 'data-row': '0'}).get_text()

'2'

Get Batting Average

In [121]:
soup.find('td', {'data-col': '14', 'data-row': '0'}).get_text()

'.312'

Get On Base Percentage

In [122]:
soup.find('td', {'data-col': '15', 'data-row': '0'}).get_text()

'.460'

Get Slugging Percentage

In [123]:
soup.find('td', {'data-col': '16', 'data-row': '0'}).get_text()

'.628'

Get On Base Plus Slugging Percentage

In [124]:
soup.find('td', {'data-col': '17', 'data-row': '0'}).get_text()

'1.088'
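The seventeen lookups above differ only in the `data-col` value, so they can be folded into one loop. Here is a minimal sketch, assuming the column order stays as shown in the cells above. The helper `get_row_stats`, the `STAT_COLUMNS` list and the labels `AVG`, `OBP`, `SLG` and `OPS` are my own hypothetical names, and the sample HTML is a tiny stand-in for the real page that reuses three of the values already printed.

```python
from bs4 import BeautifulSoup

# Column order assumed from the cells above (data-col 1 through 17).
STAT_COLUMNS = [
    'Team', 'Games', 'At_Bats', 'Runs', 'Hits', 'Doubles', 'Triples',
    'Homeruns', 'RBIs', 'Walks', 'Strikeouts', 'StolenBases',
    'CaughtStealing', 'AVG', 'OBP', 'SLG', 'OPS',
]

def get_row_stats(soup, row):
    """Return a dict of stat label -> cell text for one table row."""
    stats = {}
    for col, label in enumerate(STAT_COLUMNS, start=1):
        cell = soup.find('td', {'data-col': str(col), 'data-row': str(row)})
        # Record None rather than crash if a cell is missing.
        stats[label] = cell.get_text() if cell is not None else None
    return stats

# Stand-in for the real page: only the first three stat cells of row 0.
sample_html = """
<table><tr>
  <td data-col="1" data-row="0">LAA</td>
  <td data-col="2" data-row="0">140</td>
  <td data-col="3" data-row="0">471</td>
</tr></table>
"""
sample_soup = BeautifulSoup(sample_html, 'html.parser')
trout = get_row_stats(sample_soup, 0)
print(trout['Team'], trout['Games'], trout['At_Bats'])
```

With the real `soup`, calling `get_row_stats(soup, 0)` should return the same values as the individual cells above, and changing the row number would move on to the next player.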

In [None]:
df = pd.DataFrame(columns=['FirstName', 'LastName', 'FullName', 'Position', 'Team', 'Games', 'At_Bats', 'Runs',
                           'Hits', 'Doubles', 'Triples', 'Homeruns', 'RBIs', 'Walks', 'Strikeouts', 'StolenBases',
                           'CaughtStealing'])
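Every value scraped so far is a string, so the counting stats need converting before they are any use to a model. The sketch below shows one way this might look: collect each player as a dict, build the frame in one go, then convert the numeric columns with `pd.to_numeric`. The names `rows`, `text_cols`, `count_cols` and `stats_df` are hypothetical, the single row reuses the Mike Trout values printed above, and the rate stats are left out because the column list above does not include them.

```python
import pandas as pd

# One scraped row, still as strings; values are the ones printed above.
rows = [{
    'FirstName': 'Mike', 'LastName': 'Trout', 'FullName': 'Mike Trout',
    'Position': 'CF', 'Team': 'LAA', 'Games': '140', 'At_Bats': '471',
    'Runs': '101', 'Hits': '147', 'Doubles': '24', 'Triples': '4',
    'Homeruns': '39', 'RBIs': '79', 'Walks': '122', 'Strikeouts': '124',
    'StolenBases': '24', 'CaughtStealing': '2',
}]

text_cols = ['FirstName', 'LastName', 'FullName', 'Position', 'Team']

# Build the frame from the list of dicts in a single step.
stats_df = pd.DataFrame(rows)

# Everything that is not a text column is treated as a counting stat.
count_cols = [c for c in stats_df.columns if c not in text_cols]

# errors='coerce' turns anything unparseable into NaN instead of raising.
stats_df[count_cols] = stats_df[count_cols].apply(pd.to_numeric, errors='coerce')

print(stats_df.dtypes['Homeruns'], stats_df.loc[0, 'Homeruns'])
```

I build from a list of dicts here rather than appending rows to an empty frame, partly because `DataFrame.append` was removed in pandas 2.0 and partly because building once tends to be faster.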