<a id="top"></a>
## MLB-StatsAPI
2020-08-27
Notes on using [MLB-StatsAPI](https://github.com/toddrob99/MLB-StatsAPI), a Python wrapper for the MLB Stats API. Only the [schedule](https://github.com/toddrob99/MLB-StatsAPI/wiki/Function:-schedule) function is used here.

The following setup is required:
- conda install tabulate
- pip install MLB-StatsAPI
- pip install xiongmao

Examples shown:
- Creating a simple [table](#table) mapping home_id/away_id to home_name/away_name
- Grabbing the [season](#season) for a club
- Finding the longest win [streak](#streak) in a regular season
- A [sample](#sample) response record from the schedule function
- Remembering the [Padres](#135) 10 game streak in 2019

In [2]:
import pandas as pd
from datetime import datetime as dt
import numpy as np 

import statsapi as mlb

import xiongmao

### Helper Function
The MLB API returns a list of dictionaries, one per game. For Pandas, a dictionary of lists (one list per field) is a more convenient starting point. This helper does that conversion and creates a DataFrame.

In [3]:
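def dL2df(dL):
    """
    Convert a list of dictionaries to a dictionary of lists
    and return it as a DataFrame.

    Columns come from the keys of the first record; a key missing
    from a later record is filled with None.
    """
    kL = list(dL[0].keys())  # column order follows the first record
    LD = {k: [] for k in kL}

    for entry in dL:
        for k in kL:
            # dict.get returns None when the key is absent
            LD[k].append(entry.get(k))

    return pd.DataFrame(LD, columns=kL)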
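
To see what the conversion does without calling the API, here is a minimal sketch in plain Python on made-up records (the values are illustrative, not API output). Unlike `dL2df`, the hypothetical `records_to_columns` collects keys from every record rather than only the first; that is an assumption about what might be wanted, not what the helper above does.

```python
def records_to_columns(records):
    """Turn a list of dicts into a dict of equal-length lists."""
    # Collect keys in first-seen order across all records.
    keys = []
    for rec in records:
        for k in rec:
            if k not in keys:
                keys.append(k)
    # Missing keys become None so every column has one entry per record.
    return {k: [rec.get(k) for rec in records] for k in keys}


# Hypothetical toy records, shaped loosely like schedule entries.
toy = [
    {"game_id": 1, "home_name": "Chicago Cubs"},
    {"game_id": 2, "home_name": "San Diego Padres", "status": "Final"},
]
columns = records_to_columns(toy)
print(columns)
```

Passing a dictionary like `columns` to `pd.DataFrame` gives one column per key.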

<a id="table"></a>
### Table Showing club_id -> club_name
MLB assigns an integer id to each club. Searching by id rather than by name is handy because integers are easy to type, and a lookup table helps keep the ids straight. Here we 
- Use the API to grab the month of July 2018
- Find the home_id / home_name correspondence
- Do the same for away_id / away_name
- Concatenate those and show the table

Back to [top](#top)

In [4]:
month = mlb.schedule(start_date='07/01/2018',end_date='07/31/2018') # grabbing month
mR = dL2df(month) # converting to dataframe

# creating correspondence
homeNameR = mR.xTab(*"home_id home_name".split()).stack().todf().reset_index()
awayNameR = mR.xTab(*"away_id away_name".split()).stack().todf().reset_index()
# making names the same
homeNameR.columns = awayNameR.columns = "team_id team_name count".split()
# concatenating
teamNameR = pd.concat((homeNameR.query("count > 0"), awayNameR.query("count > 0")))
# keeping only id and name
teamNameR = teamNameR.set_index("team_id").drop(columns="count") \
    .drop_duplicates().sort_index()
print (teamNameR.to_markdown())

|   team_id | team_name                 |
|----------:|:--------------------------|
|       108 | Los Angeles Angels        |
|       109 | Arizona Diamondbacks      |
|       110 | Baltimore Orioles         |
|       111 | Boston Red Sox            |
|       112 | Chicago Cubs              |
|       113 | Cincinnati Reds           |
|       114 | Cleveland Indians         |
|       115 | Colorado Rockies          |
|       116 | Detroit Tigers            |
|       117 | Houston Astros            |
|       118 | Kansas City Royals        |
|       119 | Los Angeles Dodgers       |
|       120 | Washington Nationals      |
|       121 | New York Mets             |
|       133 | Oakland Athletics         |
|       134 | Pittsburgh Pirates        |
|       135 | San Diego Padres          |
|       136 | Seattle Mariners          |
|       137 | San Francisco Giants      |
|       138 | St. Louis Cardinals       |
|       139 | Tampa Bay Rays            |
|       140 | Texas Rangers             |
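
The same id to name lookup can also be built straight from the list of game dictionaries. The sketch below is a plain-Python alternative, not the method used in the cell above; `team_lookup` is a hypothetical helper and the two games are made-up records that only reuse the `home_id`, `home_name`, `away_id` and `away_name` field names.

```python
def team_lookup(games):
    """Map team id -> team name using both the home and away fields."""
    lookup = {}
    for g in games:
        lookup[g["home_id"]] = g["home_name"]
        lookup[g["away_id"]] = g["away_name"]
    # Sort by id so the result reads like the table.
    return dict(sorted(lookup.items()))


# Hypothetical toy games; ids and names match rows of the table.
toy_games = [
    {"home_id": 135, "home_name": "San Diego Padres",
     "away_id": 112, "away_name": "Chicago Cubs"},
    {"home_id": 112, "home_name": "Chicago Cubs",
     "away_id": 119, "away_name": "Los Angeles Dodgers"},
]
teams = team_lookup(toy_games)
for team_id, name in teams.items():
    print(team_id, name)
```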

<a id="season"></a>
### Grabbing a Season for One Club
We can grab a club's results for one season by
- setting start_date and end_date to limit the query to one season
- passing the team filter, which returns both home and away games

Back to [top](#top)

In [5]:
print (dt.now().strftime("%H:%M:%S")) # printing time query started
cubs2016L = mlb.schedule(start_date='01/01/2016',end_date='12/31/2016', team=112)
print (dt.now().strftime("%H:%M:%S")) # printing time query stopped

08:11:07
08:11:11
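
Printing the clock before and after works, but the elapsed time can also be measured directly. This is a small sketch using `time.perf_counter`; `timed` and `fake_schedule` are hypothetical names, and `fake_schedule` is only a stand-in for `mlb.schedule` so the example runs offline.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_schedule(**params):
    # Hypothetical stand-in for mlb.schedule; returns one made-up record.
    return [{"game_id": 1, **params}]


games, seconds = timed(fake_schedule, start_date="01/01/2016",
                       end_date="12/31/2016", team=112)
print(f"{len(games)} games in {seconds:.3f} s")
```

In the notebook, `timed(mlb.schedule, start_date=..., end_date=..., team=112)` would be the corresponding call.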


In [6]:
cubs2016R = dL2df(cubs2016L)
cubs2016R.shape

(214, 27)

#### Discussion
The result holds 214 games for the 2016 Cubs, far more than the 162 of a regular season, so we look at the game_type field to see what they all are. The encoding can be inferred from the counts. 

| game_type   |   frequency | description |
|:---|------------:|:----|
| D  |           4 | Division Series|
| L  |           6 | League Championship|
| R  |         165 | Regular Season|
| S  |          32 | Spring Training|
| W  |           7 | World Series|

165 is still 3 too many for a 162-game regular season, until the status field shows that 3 games were Postponed. 

In [7]:
print (cubs2016R.game_type.value_counts().sort_index().to_markdown())

|    |   game_type |
|:---|------------:|
| D  |           4 |
| L  |           6 |
| R  |         165 |
| S  |          32 |
| W  |           7 |


In [8]:
cubs2016R.xTab(*"game_type status".split())

| game_type   |   Final |   Final: Tied |   Postponed |
|:------------|--------:|--------------:|------------:|
| D           |       4 |             0 |           0 |
| L           |       6 |             0 |           0 |
| R           |     161 |             1 |           3 |
| S           |      30 |             2 |           0 |
| W           |       7 |             0 |           0 |
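
For readers without the `xTab` helper, a cross-tabulation like this can be sketched with `collections.Counter`. The `cross_tab` function and the four toy games are hypothetical; only the `game_type` and `status` field names come from the schedule records.

```python
from collections import Counter


def cross_tab(games, row_key, col_key):
    """Count games for each (row value, column value) pair."""
    counts = Counter((g[row_key], g[col_key]) for g in games)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    # A Counter returns 0 for pairs that never occurred.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


# Hypothetical toy games.
toy_games = [
    {"game_type": "R", "status": "Final"},
    {"game_type": "R", "status": "Postponed"},
    {"game_type": "S", "status": "Final"},
    {"game_type": "R", "status": "Final"},
]
table = cross_tab(toy_games, "game_type", "status")
for game_type, row in table.items():
    print(game_type, row)
```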


<a id="streak"></a>
### Counting Wins and Streaks of Wins
With the game results, let's 
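- code each game as a Cubs win (1) or not (0)
- count the number of Cubs wins by game type 
- find the longest streak within the R type games

Games with Postponed status are removed before the streak search. The filter keeps only status Final, so the one tied regular season game is dropped as well. 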

Back to [top](#top)

In [9]:
cubs2016R["win"] = np.where(cubs2016R.winning_team == "Chicago Cubs", 1, 0)
print (cubs2016R.xTab(*"game_type win".split()).to_markdown())

| game_type   |   0 |   1 |
|:------------|----:|----:|
| D           |   1 |   3 |
| L           |   2 |   4 |
| R           |  62 | 103 |
| S           |  21 |  11 |
| W           |   3 |   4 |


In [10]:
cubs2016R.status.value_counts()

Final          208
Final: Tied      3
Postponed        3
Name: status, dtype: int64

In [11]:
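wsL = []  # running win streak after each game
# keep completed regular season games only, with a fresh 0..n-1 index
regSeasR = cubs2016R.query("status == 'Final' and game_type == 'R'").copy().reset_index()
for i, game in regSeasR.iterrows():
    if i == 0:
        wsL.append(game.win)
    else:
        # a win extends the previous streak by 1; a loss (win == 0) resets it to 0
        cw = (wsL[i-1] + game.win) * game.win
        wsL.append(cw)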

regSeasR["streak"] = wsL

regSeasR["game_date win streak".split()][:6]

|    | game_date   |   win |   streak |
|---:|:------------|------:|---------:|
|  0 | 2016-04-04  |     1 |        1 |
|  1 | 2016-04-05  |     1 |        2 |
|  2 | 2016-04-07  |     1 |        3 |
|  3 | 2016-04-08  |     0 |        0 |
|  4 | 2016-04-09  |     1 |        1 |
|  5 | 2016-04-10  |     1 |        2 |


In [12]:
regSeasR.streak.max()

11
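
The recurrence used in the loop, `streak = (previous_streak + win) * win`, can be checked on a plain list. This is a minimal sketch; `running_streaks` is a hypothetical name, and the input is the first six regular season outcomes shown in the table above.

```python
def running_streaks(wins):
    """Return the running win streak after each game (wins are 1 or 0)."""
    streaks = []
    current = 0
    for w in wins:
        # A win (1) adds one to the streak; a loss (0) multiplies it away.
        current = (current + w) * w
        streaks.append(current)
    return streaks


wins = [1, 1, 1, 0, 1, 1]
streaks = running_streaks(wins)
print(streaks)       # [1, 2, 3, 0, 1, 2]
print(max(streaks))  # 3
```

Starting `current` at 0 removes the need for the special case on the first game.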

<a id="sample"></a>
### Example Schedule Record

Looking at one raw record is the quickest way to make sense of the data.

Back to [top](#top)

In [13]:
cubs2016R.iloc[0]

game_id                                                             469433
game_datetime                                         2016-03-03T20:05:00Z
game_date                                                       2016-03-03
game_type                                                                S
status                                                               Final
away_name                                                     Chicago Cubs
home_name                                                Milwaukee Brewers
away_id                                                                112
home_id                                                                158
doubleheader                                                             N
game_num                                                                 1
home_probable_pitcher                                       Chase Anderson
away_probable_pitcher                                          Travis Wood
home_pitcher_note        

<a id="135"></a>
### Padres Winning Streaks
My club, the San Diego Padres (id 135), just finished a 7 game winning streak. When was the last time that happened? 

Back to [top](#top)

In [14]:
# 2019?
team=135
year=2019
print (dt.now().strftime("%H:%M:%S")) # printing time query started
seasL = mlb.schedule(start_date='01/01/%d' %(year),end_date='12/31/%d' %(year), team=team)
print (dt.now().strftime("%H:%M:%S")) # printing time query stopped
seasR = dL2df(seasL) 
regR = seasR.query("status == 'Final' and game_type == 'R'").copy().reset_index()
regR["win"] = np.where(regR.winning_team == "San Diego Padres", 1, 0)

wsL = []
for i, game in regR.iterrows():
    if i == 0:
        wsL.append(game.win)
    else:
        cw = (wsL[i-1] + game.win) * game.win
        wsL.append(cw)

regR["streak"] = wsL
regR.streak.max()

08:11:12
08:11:13


10
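
Since the same filter and loop now appear for two clubs, the steps could be folded into one function. The sketch below works on the raw list of game dictionaries instead of a DataFrame; `longest_win_streak` is a hypothetical helper, the toy games are made up, and only the `status`, `game_type` and `winning_team` field names come from the schedule records.

```python
def longest_win_streak(games, team_name):
    """Longest run of wins among completed regular season games."""
    best = current = 0
    for g in games:
        # Skip anything that is not a finished regular season game.
        if g.get("status") != "Final" or g.get("game_type") != "R":
            continue
        if g.get("winning_team") == team_name:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


# Hypothetical toy games, in date order.
toy_games = [
    {"status": "Final", "game_type": "S", "winning_team": "San Diego Padres"},
    {"status": "Final", "game_type": "R", "winning_team": "San Diego Padres"},
    {"status": "Postponed", "game_type": "R"},
    {"status": "Final", "game_type": "R", "winning_team": "San Diego Padres"},
    {"status": "Final", "game_type": "R", "winning_team": "Chicago Cubs"},
    {"status": "Final", "game_type": "R", "winning_team": "San Diego Padres"},
]
best = longest_win_streak(toy_games, "San Diego Padres")
print(best)  # 2
```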

#### 10 game streak in 2019
This streak was 3 series sweeps against NL East teams at home between losses on the road. 

In [15]:
regR.query("streak == 10")["game_date losing_team venue_name".split()]

|    | game_date   | losing_team          | venue_name   |
|---:|:------------|:---------------------|:-------------|
| 65 | 2019-06-09  | Washington Nationals | Petco Park   |


In [16]:
regR[55:67]["game_date losing_team venue_name".split()]

|    | game_date   | losing_team           | venue_name     |
|---:|:------------|:----------------------|:---------------|
| 55 | 2019-05-29  | San Diego Padres      | Yankee Stadium |
| 56 | 2019-05-31  | Miami Marlins         | Petco Park     |
| 57 | 2019-06-01  | Miami Marlins         | Petco Park     |
| 58 | 2019-06-02  | Miami Marlins         | Petco Park     |
| 59 | 2019-06-03  | Philadelphia Phillies | Petco Park     |
| 60 | 2019-06-04  | Philadelphia Phillies | Petco Park     |
| 61 | 2019-06-05  | Philadelphia Phillies | Petco Park     |
| 62 | 2019-06-06  | Washington Nationals  | Petco Park     |
| 63 | 2019-06-07  | Washington Nationals  | Petco Park     |
| 64 | 2019-06-08  | Washington Nationals  | Petco Park     |


#### Other 7 game streaks in 2019
There were actually several streaks of 7 games or longer in 2019. Filtering on streak == 7 finds the game at which each of them reached 7. 

In [18]:
regR.query("streak == 7")["game_date losing_team venue_name".split()]

|     | game_date   | losing_team          | venue_name   |
|----:|:------------|:---------------------|:-------------|
|   6 | 2019-04-03  | Arizona Diamondbacks | Petco Park   |
|  23 | 2019-04-23  | Seattle Mariners     | Petco Park   |
|  49 | 2019-05-22  | Arizona Diamondbacks | Petco Park   |
|  62 | 2019-06-06  | Washington Nationals | Petco Park   |
| 119 | 2019-08-14  | Tampa Bay Rays       | Petco Park   |
| 145 | 2019-09-12  | Chicago Cubs         | Petco Park   |
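
A query on one streak value shows where a streak passed that length, not how long it finally ran. As a rough sketch, each maximal streak and its final length can be listed from the win column; `streak_runs` is a hypothetical helper and the input list is made up.

```python
def streak_runs(wins, min_length):
    """Return (end_index, length) for each maximal win streak >= min_length."""
    runs = []
    current = 0
    for i, w in enumerate(wins):
        if w:
            current += 1
        else:
            # The streak ended on the previous game.
            if current >= min_length:
                runs.append((i - 1, current))
            current = 0
    # A streak may still be open at the end of the list.
    if current >= min_length:
        runs.append((len(wins) - 1, current))
    return runs


wins = [1, 1, 0, 1, 1, 1, 0, 1]  # hypothetical outcomes
runs = streak_runs(wins, 2)
print(runs)  # [(1, 2), (5, 3)]
```

With the notebook's data, `streak_runs(list(regR.win), 7)` would be the corresponding call.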
