# Leveraging SQLAlchemy ORM to Store and Retrieve MLB Stats

## Table of Contents

[Part 1: Exploring the MLB API](#part-1)
- [1a. Install and Import](#part-1a)
- [1b. Get GamePks](#part-1b)
- [1c. The 'Game' Endpoint](#part-1c)

---

### SQLAlchemy's Object Relational Mapper (ORM) generates SQL for us and automates the persistence of Python objects.
We're going to query the MLB API using a Python wrapper created by Todd Roberts and store the results in a SQLite database for future analysis.

---

<a id='part-1'></a>

## Part 1: Exploring the MLB API
Todd Roberts' Python wrapper is available on the Python Package Index (PyPI). You can find more information [here](https://pypi.org/project/MLB-StatsAPI/) or on [GitHub](https://github.com/toddrob99/MLB-StatsAPI).

<a id='part-1a'></a>

#### Install and Import

First, we have to install it and import it.

In [1]:
import sys
# Uncomment the line below to install the package into this kernel's environment
#!{sys.executable} -m pip install MLB-StatsAPI

import statsapi as mlb

Todd was nice enough to give us several convenient functions for accessing the API's endpoints. The most flexible of these is `get()`, which takes an endpoint name and a dictionary of parameters and returns the API's JSON response as a Python dictionary. The `ENDPOINTS` global variable holds a dictionary with each endpoint's configuration, and the `notes()` function returns the notes for a given endpoint.
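To make the idea behind `get()` concrete, here is a minimal sketch of what a wrapper like this does before sending a request: fill an endpoint's path parameters into a URL template and append the query parameters. The template string, the `build_url` helper, and the sample gamePk are hypothetical; the wrapper's real templates live in `mlb.ENDPOINTS`, and its actual request logic may differ.

```python
from urllib.parse import urlencode

# Hypothetical URL template, modeled loosely on the API's live game feed.
# The wrapper's real templates are stored in mlb.ENDPOINTS.
GAME_TEMPLATE = "https://statsapi.mlb.com/api/{ver}/game/{gamePk}/feed/live"


def build_url(template, path_params, query_params=None):
    """Substitute path parameters into the template and append a query string."""
    url = template.format(**path_params)
    if query_params:
        # urlencode percent-encodes reserved characters such as commas
        url += "?" + urlencode(query_params)
    return url


# Hypothetical gamePk, used only to show the shape of the resulting URL
print(build_url(GAME_TEMPLATE, {"ver": "v1.1", "gamePk": 123456}, {"fields": "gamePk,gameData"}))
```

Keeping path parameters and query parameters separate mirrors how the `notes()` output below describes each endpoint.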

In [79]:
list(mlb.ENDPOINTS.keys())[:10]

['attendance',
 'awards',
 'conferences',
 'divisions',
 'draft',
 'game',
 'game_diff',
 'game_timestamps',
 'game_changes',
 'game_contextMetrics']

In [139]:
print(mlb.notes('game'))

Endpoint: game 
All path parameters: ['ver', 'gamePk']. 
Required path parameters (note: ver will be included by default): ['ver', 'gamePk']. 
All query parameters: ['timecode', 'hydrate', 'fields']. 
Required query parameters: None. 
The hydrate function is supported by this endpoint. Call the endpoint with {'hydrate':'hydrations'} in the parameters to return a list of available hydrations. For example, statsapi.get('schedule',{'sportId':1,'hydrate':'hydrations','fields':'hydrations'})
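The notes tell us which parameters must be present before a request can succeed. A wrapper can check this up front; here is a minimal sketch of such a check, using the required path parameters printed above and an assumed default for `ver` (the `missing_required` helper and the `'v1.1'` value are illustrative, and the wrapper supplies its own default).

```python
def missing_required(required, supplied, defaults=None):
    """Return required parameter names that are neither supplied nor defaulted."""
    defaults = defaults or {}
    return [name for name in required if name not in supplied and name not in defaults]


# Required path parameters as reported by mlb.notes('game')
required = ["ver", "gamePk"]
# Assumed default for 'ver', for illustration only
defaults = {"ver": "v1.1"}

print(missing_required(required, {}, defaults))                  # ['gamePk']
print(missing_required(required, {"gamePk": 123456}, defaults))  # []
```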



<a id='part-1b'></a>

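#### Get GamePks

Every game in the MLB Stats API is identified by a unique `gamePk`. The function below looks up each season's start and end dates with the `seasons` endpoint, pulls the schedule for that date range, and writes the season's gamePks to a CSV file, so we only have to query the API once per season. A second function reads those files back into a dictionary keyed by season.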

In [132]:
from datetime import datetime as dt
import os,re,csv
from os import walk

# dates from the 'seasons' endpoint come back as YYYY-MM-DD, but we query
# the 'schedule' endpoint with MM/DD/YYYY; this helper converts between the two
def convert_date(date):
    date = dt.strptime(date, "%Y-%m-%d")
    return date.strftime("%m/%d/%Y")

def get_gamePks(seasons,target_directory=None):
    """
    Takes in a list of seasons as strings representing their year e.g. ['2018','2019']
    Queries the MLB API to find gamePks for each season and writes them to CSV files
    if a target directory for the gamePks is not specified, a directory called 'gamePks'
    will be added to the current directory. 
    """
    if target_directory:
        gamePks_path = target_directory
    else:
        gamePks_path = os.path.join(os.getcwd(), 'gamePks')
    # create the directory to store CSVs if it doesn't already exist
    os.makedirs(gamePks_path, exist_ok=True)
    
    #walk the gamePks directory to see if we've already added any seasons
    f = []
    for (dirpath, dirnames, filenames) in walk(gamePks_path):
        f.extend(filenames)
        break
    # strip the '.csv' extension from each filename to recover the season's year
    already_added = [os.path.splitext(name)[0] for name in f if name.endswith('.csv')]
    seasons = list(set(seasons)-set(already_added))
    
    #query the API to get start dates and end dates for all seasons
    all_seasons = mlb.get('seasons',{'sportId':1,'all':True})['seasons']
    
    #filter out the ones we don't care about right now
    seasons = list(filter(lambda x: x['seasonId'] in seasons,all_seasons))
    
    gamePks = {}
    for season in seasons:  
        year = season['seasonId']
        startDate = convert_date(season['seasonStartDate'])
        endDate = convert_date(season['seasonEndDate'])
        
        #returns a list of dicts for each date in the range
        #each dict has a 'games' key with a list of dicts for each game in that day as values
        dates = mlb.get('schedule',{'sportId':1,'startDate':startDate,'endDate':endDate})['dates']
        
        # for each date, and for each game on that date, get the gamePk
        gamePks[year] = [game['gamePk']
                         for date in dates
                         for game in date['games']]
        #store the gamePks as CSVs
        with open(gamePks_path + f"/{year}.csv", 'w',newline='') as myfile:
            wr = csv.writer(myfile,quoting=csv.QUOTE_ALL)
            wr.writerow(gamePks[year])

# fetch and store the gamePks for the 2008 through 2019 seasons
get_gamePks([str(x) for x in range(2008, 2020)])
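Because the schedule response nests games inside dates, the comprehension in `get_gamePks` reads as two loops: the outer one over dates, the inner one over each date's games. Here is a self-contained sketch of that step, together with the date conversion, run on a hypothetical, trimmed-down payload (the dates and gamePks are made up, and real responses carry many more fields per game).

```python
from datetime import datetime


def convert_date(date):
    """Convert 'YYYY-MM-DD' (seasons endpoint format) to 'MM/DD/YYYY'."""
    return datetime.strptime(date, "%Y-%m-%d").strftime("%m/%d/%Y")


# Hypothetical schedule response: one dict per date, each holding
# a list of games under the 'games' key (an off day has an empty list)
mock_dates = [
    {"date": "2019-03-28", "games": [{"gamePk": 1001}, {"gamePk": 1002}]},
    {"date": "2019-03-29", "games": []},
    {"date": "2019-03-30", "games": [{"gamePk": 1003}]},
]

# Outer loop over dates, inner loop over the games on each date
game_pks = [game["gamePk"] for date in mock_dates for game in date["games"]]

print(convert_date("2019-03-28"))  # 03/28/2019
print(game_pks)                    # [1001, 1002, 1003]
```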

In [133]:
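def read_gamePks(gamePks_path='gamePks'):
    """
    Reads the CSV files written by get_gamePks and returns a dictionary
    mapping each season (e.g. '2019') to its list of gamePks (as strings).
    """
    f = []
    for (dirpath, dirnames, filenames) in walk(gamePks_path):
        f.extend(filenames)
        break
    # skip hidden files such as .DS_Store and anything that isn't a CSV
    csv_files = [x for x in f if x.endswith('.csv') and not x.startswith('.')]

    gamePks = {}
    for filename in csv_files:
        season = os.path.splitext(filename)[0]
        with open(os.path.join(gamePks_path, filename), 'r', newline='') as csv_file:
            seasonPks = list(csv.reader(csv_file))
        # each file holds a single row, so flatten the list of rows
        gamePks[season] = [item for sublist in seasonPks for item in sublist]
    return gamePks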
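One detail worth noticing: `get_gamePks` writes the gamePks as integers, but `csv.reader` hands every field back as a string, so `read_gamePks` returns lists of strings. The API accepts the string form (we pass one straight to `mlb.get` below), but if you plan to use gamePks as integer keys in a database, convert them. A sketch of the round trip, using an in-memory buffer in place of a file and hypothetical gamePks:

```python
import csv
import io

# Hypothetical gamePks for one season
season_pks = [1001, 1002, 1003]

# Write them the way get_gamePks does: a single row, every field quoted
buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_ALL).writerow(season_pks)

# Read them back the way read_gamePks does, flattening the rows
buffer.seek(0)
rows = list(csv.reader(buffer))
as_read = [item for row in rows for item in row]
print(as_read)  # ['1001', '1002', '1003']

# Convert back to integers before using them as numeric keys
as_ints = [int(pk) for pk in as_read]
```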

In [135]:
gamePks=read_gamePks()

<a id='part-1c'></a>

#### Explore the 'Game' Endpoint

In [146]:
temp_pk=gamePks['2019'][300]

game = mlb.get('game',{'gamePk':temp_pk})

In [148]:
game.keys()

dict_keys(['copyright', 'gamePk', 'link', 'metaData', 'gameData', 'liveData'])
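The play-by-play data we will eventually want to store sits under `liveData` → `plays` → `allPlays`, as the next cell's output shows. Here is a minimal sketch of pulling a few fields from each play, run against a trimmed mock built from that output. The `summarize_plays` helper and the choice of fields are ours, not part of the wrapper.

```python
def summarize_plays(game):
    """Return (inning, halfInning, batter, pitcher, event) tuples from a game feed dict."""
    # .get() with defaults returns an empty list if any level is missing
    plays = game.get("liveData", {}).get("plays", {}).get("allPlays", [])
    return [
        (
            play["about"]["inning"],
            play["about"]["halfInning"],
            play["matchup"]["batter"]["fullName"],
            play["matchup"]["pitcher"]["fullName"],
            play["result"]["event"],
        )
        for play in plays
    ]


# Trimmed-down mock of the first play shown in the liveData output
mock_game = {
    "liveData": {
        "plays": {
            "allPlays": [
                {
                    "result": {"event": "Lineout"},
                    "about": {"inning": 1, "halfInning": "top"},
                    "matchup": {
                        "batter": {"id": 624428, "fullName": "Adam Frazier"},
                        "pitcher": {"id": 571578, "fullName": "Patrick Corbin"},
                    },
                }
            ]
        }
    }
}

print(summarize_plays(mock_game))  # [(1, 'top', 'Adam Frazier', 'Patrick Corbin', 'Lineout')]
```

Tuples like these map naturally onto rows in a plays table once we bring SQLAlchemy in.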

In [154]:
game['liveData']

{'plays': {'allPlays': [{'result': {'type': 'atBat',
     'event': 'Lineout',
     'eventType': 'field_out',
     'description': 'Adam Frazier lines out to center fielder Victor Robles.',
     'rbi': 0,
     'awayScore': 0,
     'homeScore': 0},
    'about': {'atBatIndex': 0,
     'halfInning': 'top',
     'isTopInning': True,
     'inning': 1,
     'startTime': '2019-04-12T18:46:25.000Z',
     'endTime': '2019-04-12T23:06:20.000Z',
     'isComplete': True,
     'isScoringPlay': False,
     'hasReview': False,
     'hasOut': True,
     'captivatingIndex': 0},
    'count': {'balls': 1, 'strikes': 0, 'outs': 1},
    'matchup': {'batter': {'id': 624428,
      'fullName': 'Adam Frazier',
      'link': '/api/v1/people/624428'},
     'batSide': {'code': 'L', 'description': 'Left'},
     'pitcher': {'id': 571578,
      'fullName': 'Patrick Corbin',
      'link': '/api/v1/people/571578'},
     'pitchHand': {'code': 'L', 'description': 'Left'},
     'batterHotColdZones': [],
     'pitcherHotCol