The idea here is to pull sports stats and use the most recent data to update my predictions for each baseball player's rest-of-season outcomes.

For example, suppose the original prediction was that Kris Bryant would hit 25 home runs this season, and he has already hit 35. What should my prediction be for the number of home runs he hits over the rest of the season (or for the season total)?



In [1]:
from ohmysportsfeedspy import MySportsFeeds
import pandas as pd
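# pandas >= 1.0 exposes json_normalize at the top level;
# on older versions use: from pandas.io.json import json_normalize
from pandas import json_normalize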
import info

In [2]:
# !pip install ohmysportsfeedspy

In [3]:
msf = MySportsFeeds(version="1.2")

In [4]:
msf.authenticate(info.token, info.pw)

In [5]:
output = msf.msf_get_data(league='mlb',season='2019-regular',feed='cumulative_player_stats',format='json',player='mike-trout')
# msf.msf_get_data()
output

{'cumulativeplayerstats': {'lastUpdatedOn': '2019-09-06 7:28:26 PM',
  'playerstatsentry': [{'player': {'ID': '10561',
     'LastName': 'Trout',
     'FirstName': 'Mike',
     'JerseyNumber': '27',
     'Position': 'CF'},
    'team': {'ID': '124',
     'City': 'Los Angeles',
     'Name': 'Angels',
     'Abbreviation': 'LAA'},
    'stats': {'GamesPlayed': {'@abbreviation': 'G', '#text': '132'},
     'AtBats': {'@category': 'Batting', '@abbreviation': 'AB', '#text': '468'},
     'Runs': {'@category': 'Batting', '@abbreviation': 'R', '#text': '110'},
     'Hits': {'@category': 'Batting', '@abbreviation': 'H', '#text': '138'},
     'SecondBaseHits': {'@category': 'Batting',
      '@abbreviation': '2B',
      '#text': '27'},
     'ThirdBaseHits': {'@category': 'Batting',
      '@abbreviation': '3B',
      '#text': '2'},
     'Homeruns': {'@category': 'Batting',
      '@abbreviation': 'HR',
      '#text': '45'},
     'EarnedRuns': {'@category': 'Batting',
      '@abbreviation': 'ER',
      '

In [8]:
# for items in x[0]:
#     print(items)
    
# print(x[0]['player'])
# print(x[0]['team'])
# print('stats')
# print(x[0]['stats'])

In [9]:
# x[0]['stats']

In [10]:
# json_normalize(x[0]['stats'])

In [11]:
# json_normalize(data=x, record_path='stats')# , meta=['player','team'])

The following pages might be helpful for parsing JSON:
- https://www.geeksforgeeks.org/pandas-parsing-json-dataset/
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.io.json.json_normalize.html
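To make the flattening step concrete, here is a minimal sketch of what `json_normalize` does to one nested entry. The record below is a trimmed copy of the cumulative-stats response shown earlier, and `pd.json_normalize` assumes pandas 1.0 or newer; the variable names are only for illustration.

```python
import pandas as pd

# One entry shaped like 'playerstatsentry' in the response above (trimmed).
sample_entries = [
    {
        "player": {"ID": "10561", "LastName": "Trout", "FirstName": "Mike"},
        "team": {"ID": "124", "Abbreviation": "LAA"},
        "stats": {
            "GamesPlayed": {"@abbreviation": "G", "#text": "132"},
            "Homeruns": {
                "@category": "Batting",
                "@abbreviation": "HR",
                "#text": "45",
            },
        },
    }
]

# Nested keys are joined with "." to form flat column names,
# e.g. stats -> Homeruns -> #text becomes "stats.Homeruns.#text".
flat = pd.json_normalize(sample_entries)

for column in flat.columns:
    print(column)

# Values stay as strings until converted explicitly.
print(flat.loc[0, "stats.Homeruns.#text"])
```

Each row corresponds to one entry in the list, so a list of per-game entries should flatten to one row per game.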

# Now that we can get the data, let's figure out how to unpack it and what to do with it


We will hard-code "Mike Trout" as the player of interest for now, but later, when this is implemented as an app, the player of interest will come from user input.

The data that we will need to plug into the PyMC3 model are:
1. Number of home runs for each game.
2. Number of games remaining in the season.

With these two inputs, we will be able to build our model.

A third consideration that we may want to add later is "is injured?" as a boolean. If a player is injured, then we will need to discount the projected number of home runs according to the number of games they are projected to miss.
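As a rough illustration of how these inputs could fit together, the sketch below uses a closed-form Gamma-Poisson update rather than PyMC3. That substitution is an assumption made to keep the example short: home runs per game are treated as Poisson with a Gamma prior on the rate. The function name, the `prior_games` weight, and the `games_missed` argument are all hypothetical.

```python
def project_rest_of_season(observed, games_remaining, prior_hr_per_game,
                           prior_games=30.0, games_missed=0):
    """Expected home runs over the rest of the season.

    observed          -- list of home runs per game played so far
    games_remaining   -- games left on the team's schedule
    prior_hr_per_game -- prior belief about the per-game home run rate
    prior_games       -- hypothetical weight of the prior, in "pseudo-games"
    games_missed      -- remaining games the player is projected to miss
    """
    if not 0 <= games_missed <= games_remaining:
        raise ValueError("games_missed must be between 0 and games_remaining")

    # Gamma(alpha, beta) prior with mean alpha / beta = prior_hr_per_game.
    alpha = prior_hr_per_game * prior_games
    beta = prior_games

    # Conjugate update: add observed home runs and observed games.
    posterior_alpha = alpha + sum(observed)
    posterior_beta = beta + len(observed)
    posterior_rate = posterior_alpha / posterior_beta

    # Only project over the games the player is expected to appear in.
    playable_games = games_remaining - games_missed
    return posterior_rate * playable_games


# Example: a prior of 25 home runs over a 162-game season, five observed games.
example = project_rest_of_season(
    observed=[1, 0, 0, 1, 0],
    games_remaining=20,
    prior_hr_per_game=25 / 162,
)
print(round(example, 2))
```

A larger `prior_games` makes the projection lean harder on the original prediction, while a smaller value lets recent games dominate. A full PyMC3 model would return a posterior distribution rather than a single expected value.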

## Best way to get the data:

The first thing I tried was making one API request for each game in which a player was active. Pulling the number of home runs out of each response was easy, but the approach was awkward: I had no good way of knowing whether a player was active during a given game, or even which dates to use in the API call. What I really wanted was a list of game-by-game stats since the beginning of the season from a single API call.

Fortunately, it looks like this is possible! I think the `'player_gamelogs'` feed provides such a table. Below, I demonstrate how to make the API request. After that, I will unpack the response and work out how to turn it into a proper table.

In [12]:
# Try to get the game log, not just the cumulative stats
output = msf.msf_get_data(league='mlb', season='2019-regular', feed='player_gamelogs', format='json',
                          player='mike-trout', date='since-3-weeks-ago')

In [13]:
output

{'playergamelogs': {'lastUpdatedOn': '2019-09-06 8:57:24 PM',
  'gamelogs': [{'game': {'id': '49439',
     'date': '2019-08-16',
     'time': '10:07PM',
     'awayTeam': {'ID': '119',
      'City': 'Chicago',
      'Name': 'White Sox',
      'Abbreviation': 'CWS'},
     'homeTeam': {'ID': '124',
      'City': 'Los Angeles',
      'Name': 'Angels',
      'Abbreviation': 'LAA'},
     'location': 'Angel Stadium'},
    'player': {'ID': '10561',
     'LastName': 'Trout',
     'FirstName': 'Mike',
     'JerseyNumber': '27',
     'Position': 'CF'},
    'team': {'ID': '124',
     'City': 'Los Angeles',
     'Name': 'Angels',
     'Abbreviation': 'LAA'},
    'stats': {'AtBats': {'@category': 'Batting',
      '@abbreviation': 'AB',
      '#text': '4'},
     'Runs': {'@category': 'Batting', '@abbreviation': 'R', '#text': '1'},
     'Hits': {'@category': 'Batting', '@abbreviation': 'H', '#text': '1'},
     'SecondBaseHits': {'@category': 'Batting',
      '@abbreviation': '2B',
      '#text': '0'},

In [14]:
normies = output['playergamelogs']['gamelogs']

In [15]:
# json_normalize(data=normies[0])# , meta=['player','team'])

In [16]:
print(list(normies[0]))

['game', 'player', 'team', 'stats']


In [17]:
len(normies)

17

In [18]:
df = json_normalize(normies)

In [19]:
for x in df.columns:
    print(x)

game.awayTeam.Abbreviation
game.awayTeam.City
game.awayTeam.ID
game.awayTeam.Name
game.date
game.homeTeam.Abbreviation
game.homeTeam.City
game.homeTeam.ID
game.homeTeam.Name
game.id
game.location
game.time
player.FirstName
player.ID
player.JerseyNumber
player.LastName
player.Position
stats.Assists.#text
stats.Assists.@abbreviation
stats.Assists.@category
stats.AtBats.#text
stats.AtBats.@abbreviation
stats.AtBats.@category
stats.Batter2SeamFastballs.#text
stats.Batter2SeamFastballs.@abbreviation
stats.Batter2SeamFastballs.@category
stats.Batter4SeamFastballs.#text
stats.Batter4SeamFastballs.@abbreviation
stats.Batter4SeamFastballs.@category
stats.BatterChangeups.#text
stats.BatterChangeups.@abbreviation
stats.BatterChangeups.@category
stats.BatterCurveballs.#text
stats.BatterCurveballs.@abbreviation
stats.BatterCurveballs.@category
stats.BatterCutters.#text
stats.BatterCutters.@abbreviation
stats.BatterCutters.@category
stats.BatterDoublePlays.#text
stats.BatterDoublePlays.@abbreviation

In [20]:
df['stats.Homeruns.#text']

0     1
1     0
2     0
3     1
4     0
5     0
6     0
7     0
8     0
9     1
10    0
11    0
12    0
13    1
14    0
15    1
16    0
Name: stats.Homeruns.#text, dtype: object

In [23]:
type(df['stats.Homeruns.#text'][0])

str

In [27]:
observed = []
for x in df['stats.Homeruns.#text']:
    observed.append(int(x))
observed

[1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0]

In [28]:
observed = [int(x) for x in df['stats.Homeruns.#text']]

In [29]:
observed

[1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0]
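The same conversion can also be done in pandas directly, which keeps the result attached to the table. The sketch below uses a small stand-in frame with the same column name and string values as the game-log table above; `df_demo` and the other variable names are illustrative only.

```python
import pandas as pd

# Stand-in for the normalized game-log frame: same column name, string values.
df_demo = pd.DataFrame({"stats.Homeruns.#text": ["1", "0", "0", "1", "0"]})

# Cast the whole column at once instead of looping over its values.
home_runs = df_demo["stats.Homeruns.#text"].astype(int)

observed_demo = home_runs.tolist()
total_hr = int(home_runs.sum())
games_logged = len(home_runs)

print(observed_demo)
print(total_hr, games_logged)
```

If some games come back with missing or non-numeric values, `astype(int)` will raise an error. `pd.to_numeric(..., errors="coerce")` is one way to turn those entries into `NaN` so they can be handled explicitly.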