![](../resources/images/data_source_logos/api_sports/rapid_api_banner2.png)

Source: Data in this notebook is collected from [API Sports](https://www.api-football.com) via [RapidAPI](https://rapidapi.com/api-sports/api/api-football).

### Imports

In [27]:
import requests
import json
import pandas as pd
import time
import datetime as dt
from datetime import datetime

In [2]:
with open('../secrets.json') as f:
    secrets = json.load(f)

# <span style="color:orangered">API Architecture

<img src='../resources/api_football_info/api_architechture.jpg' width=600>
<!-- ![](../resources/api_football_info/api_architechture.jpg) -->
  
[Image Source: API Football Documentation](https://www.api-football.com/documentation-v3#section/Architecture)

# <span style="color:mediumblue">LEAGUES ENDPOINT - DATA COVERAGE

From a call to the "leagues" endpoint filtered by country, I found a list of leagues in the United States, and within that list - Major League Soccer. I then did another call for MLS specifically and exported the resulting JSON to a file so that I wouldn't need to make either of those particular calls again.

### Requesting all Leagues in the United States to find MLS League ID

In [3]:
url = "https://api-football-v1.p.rapidapi.com/v3/leagues"

querystring = {"country":"USA"}

headers = {
    'x-rapidapi-key': secrets['rapid_api_key'],
    'x-rapidapi-host': "api-football-v1.p.rapidapi.com"
    }

leagues_response = requests.request("GET", url, headers=headers, params=querystring)

In [4]:
print('ID ', '   ', 'League' )
print('---', '   ','---------------')
for league in leagues_response.json()['response']:
    print(league['league']['id'],' - ' , league['league']['name'])

ID      League
---     ---------------
253  -  Major League Soccer
257  -  US Open Cup
255  -  USL Championship
256  -  USL League Two
489  -  USL League One
523  -  NISA
254  -  NWSL Women
641  -  NWSL Women - Challenge Cup


### Requesting League info for Major League Soccer

In [5]:
url = "https://api-football-v1.p.rapidapi.com/v3/leagues"

querystring = {"id":"253"}

headers = {
    'x-rapidapi-key': secrets['rapid_api_key'],
    'x-rapidapi-host': "api-football-v1.p.rapidapi.com"
    }

mls_response = requests.request("GET", url, headers=headers, params=querystring)

In [6]:
mls_json = mls_response.json()['response'][0]

In [7]:
print('League:', mls_json['league']['id'],' - ' ,mls_json['league']['name'])
print('Seasons: ')
for season in mls_json['seasons']:
    print(season['year'])

League: 253  -  Major League Soccer
Seasons: 
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021


Next, I exported the JSON to a file to save what I had pulled.

In [8]:
with open('../data/api_football_data/00_league_endpoint/rapi-league_mls.json', 'w') as outfile:
    json.dump(obj=mls_json, fp=outfile , ensure_ascii=False, indent=1)

The output from the JSON gives a rundown of what kinds of data are available data for each season in MLS. The first year in which statistics are available for each game (fixture/match) are available is 2015. So I will likely be using 2015-2019 (or possibly 2020*) as the range from which I will examine data.

**2020 being a tumultuous year with even more variables contributing to match outcomes, I expect to exclude it from my data.*

###  Requesting info for every MLS Season

The API request for MLS' seasons returned a JSON with data for every MLS season since 2012. It provided an overview of what kind of data was available for each season (coverage). I dug into the json and brought all the information into one dataframe. I put that info in a markdown table below for reference.

**Sample Output:** 

```
[{'year': 2012,
  'start': '2012-03-10',
  'end': '2012-12-01',
  'current': False,
  'coverage': {'fixtures': {'events': True,
                            'lineups': True,
                            'statistics_fixtures': False,
                            'statistics_players': False},
               'standings': False,
               'players': True,
               'top_scorers': True,
               'top_assists': True,
               'top_cards': True,
               'injuries': False,
               'predictions': True,
               'odds': False}}]
   ```

### Seasons Coverage

Next step is to dig into the JSON. To make it more readable, I converted it into several Pandas Dataframes. 
1. The first Dataframe would contain the coverage info for each season
2. The second would be made up of the coverage info for individual fixtures in each season

Then I joined those two tables together so I could see all coverage information at a glance.

**Creating the First DataFrame - Season Coverage**

In [9]:
with open('../data/api_football_data/00_league_endpoint/rapi-league_mls.json') as f:
    league_mls = json.load(f)

In [10]:
coverage_list = []

for year in league_mls['seasons']:
    coverage_list.append(year['coverage'])

In [11]:
stats_seasons_coverage = pd.DataFrame(coverage_list).drop(columns=['fixtures']) 

years = list(range(2012,2022,1))
stats_seasons_coverage['year'] = years

stats_seasons_coverage.set_index('year', inplace=True)

In [12]:
stats_seasons_coverage.rename(axis=1, 
                              mapper={'predictions':'preds'}, 
                              inplace=True)

#### Creating the Second DataFrame - Season Fixtures Coverage

In [13]:
stats_fixture_coverage = pd.DataFrame(list(pd.DataFrame(coverage_list)['fixtures']))

years = list(range(2012,2022,1))
stats_fixture_coverage['year'] = years

stats_fixture_coverage.set_index('year', inplace=True)

In [14]:
stats_fixture_coverage.rename(axis=1, 
                              mapper={'statistics_fixtures':'match_stats', 
                                      'statistics_players':'player_stats'}, 
                              inplace=True)

#### Merging the two DataFrames

So that I would be able to distinguish between the columns from the two merged DataFrames, I added a prefix to every column name.

In [15]:
stats_seasons_coverage = stats_seasons_coverage.add_prefix('tm_')
stats_fixture_coverage = stats_fixture_coverage.add_prefix('fx_')
mls_data_coverage = pd.concat([stats_seasons_coverage, stats_fixture_coverage], axis=1)

In [18]:
# mls_data_coverage

I converted the new DataFrame to markdown and quickly ran a find-replace for all   
of the "False" values - replacing them with red/bolded "False" values for readability.

### MLS Data Coverage

| year | tm_stndgs | tm_plrs | tm_top_g | tm_top_a | tm_top_cards | tm_injuries | tm_preds | tm_odds | fx_events | fx_lineups | fx_match_stats | fx_player_stats |
|------|--------------|------------|----------------|----------------|--------------|-------------|----------|---------|-----------|------------|----------------|-----------------|
| 2012 | <span style='color:red;font-weight:bold'>False</span>        | True       | True           | True           | True         | <span style='color:red;font-weight:bold'>False</span>       | True     | <span style='color:red;font-weight:bold'>False</span>   | True      | True       | <span style='color:red;font-weight:bold'>False</span>          | <span style='color:red;font-weight:bold'>False</span>           |
| 2013 | <span style='color:red;font-weight:bold'>False</span>        | True       | True           | True           | True         | <span style='color:red;font-weight:bold'>False</span>       | True     | <span style='color:red;font-weight:bold'>False</span>   | True      | True       | <span style='color:red;font-weight:bold'>False</span>          | <span style='color:red;font-weight:bold'>False</span>           |
| 2014 | <span style='color:red;font-weight:bold'>False</span>        | True       | True           | True           | True         | <span style='color:red;font-weight:bold'>False</span>       | True     | <span style='color:red;font-weight:bold'>False</span>   | True      | True       | <span style='color:red;font-weight:bold'>False</span>          | <span style='color:red;font-weight:bold'>False</span>           |
| 2015 | <span style='color:red;font-weight:bold'>False</span>        | True       | True           | True           | True         | <span style='color:red;font-weight:bold'>False</span>       | True     | <span style='color:red;font-weight:bold'>False</span>   | True      | True       | True           | True            |
| 2016 | True         | True       | True           | True           | True         | <span style='color:red;font-weight:bold'>False</span>       | True     | <span style='color:red;font-weight:bold'>False</span>   | True      | True       | True           | True            |
| 2017 | True         | True       | True           | True           | True         | <span style='color:red;font-weight:bold'>False</span>       | True     | <span style='color:red;font-weight:bold'>False</span>   | True      | True       | True           | True            |
| 2018 | True         | True       | True           | True           | True         | <span style='color:red;font-weight:bold'>False</span>       | True     | <span style='color:red;font-weight:bold'>False</span>   | True      | True       | True           | True            |
| 2019 | True         | True       | True           | True           | True         | <span style='color:red;font-weight:bold'>False</span>       | True     | <span style='color:red;font-weight:bold'>False</span>   | True      | True       | True           | True            |
| 2020 | True         | True       | True           | True           | True         | <span style='color:red;font-weight:bold'>False</span>       | True     | <span style='color:red;font-weight:bold'>False</span>   | True      | True       | True           | True            |
| 2021 | True         | True       | True           | True           | True         | True        | True     | True    | True      | True       | True           | True            |

Now that I have this for reference, It's pretty clear that I won't be able to use 2012-2014 because the match (fixture) statistics and individual player statistics are unavailable. I could work with 2015 and on, but 2015 doesn't have team standings, which I'm hoping to use as a feature for any given match-up. This means I'll be limited to the data from 2016-2019 if I also exclude 2020 as an outlier year.

# <span style="color:steelblue">FIXTURES ENDPOINT - INFO

<table width=100%>
<tr>
<td style="text-align:left" bgcolor="lightsteelblue">
</td>    
</tr>

### <span style="color:steelblue">Testing - 2019 Fixtures

<table width=100%>
<tr>
<td style="text-align:left" bgcolor="lightsteelblue">
</td>    
</tr>

In [19]:
url = "https://api-football-v1.p.rapidapi.com/v3/fixtures"
querystring = {"league":"253","season":"2019"}
headers = {
    'x-rapidapi-key': secrets['rapid_api_key'],
    'x-rapidapi-host': secrets['rapid_api_host']
    }

mls19_response = requests.request("GET", url, headers=headers, params=querystring)
mls_2019 = mls19_response.json()

In [20]:
with open('../data/api_football_data/01_fixtures_info/rapi-fixtures-mls_2019.json', 'w') as outfile:
    json.dump(obj=mls_2019, fp=outfile , ensure_ascii=False, indent=1)

In [22]:
# mls_2019['response'][-1]


```
{'fixture': {'id': 250199,
             'referee': 'Allen Chapman, USA',
             'timezone': 'UTC',
             'date': '2019-11-10T20:00:00+00:00',
             'timestamp': 1573416000,
             'periods': {'first': 1573416000, 'second': 1573419600},
             'venue': {'id': None, 'name': 'CenturyLink Field', 'city': 'Seattle'},
             'status': {'long': 'Match Finished', 'short': 'FT', 'elapsed': 90}
            },
            
 'league': {'id': 253,
            'name': 'Major League Soccer',
            'country': 'USA',
            'logo': 'https://media.api-sports.io/football/leagues/253.png',
            'flag': 'https://media.api-sports.io/flags/us.svg',
            'season': 2019,
            'round': 'MLS Cup - Final'},
            
 'teams': {'home': {'id': 1595,
                   'name': 'Seattle Sounders',
                   'logo': 'https://media.api-sports.io/football/teams/1595.png',
                   'winner': True
                   },
                   
          'away': {'id': 1601,
                   'name': 'Toronto FC',
                   'logo': 'https://media.api-sports.io/football/teams/1601.png',
                   'winner': False
                   }
           },
           
 'goals': {'home': 3, 'away': 1},
 
 'score': {'halftime': {'home': 0, 'away': 0},
           'fulltime': {'home': 3, 'away': 1},
           'extratime': {'home': None, 'away': None},
           'penalty': {'home': None, 'away': None}
          
           }
  }
```



### <span style="color:steelblue">Requesting and Saving all MLS Fixtures by Season

<table width=100%>
<tr>
<td style="text-align:left" bgcolor="lightsteelblue">
</td>    
</tr>

Requesting Fixture Info for all MLS fixtures by season and going through them to extract a full list of team IDs

In [24]:
url = "https://api-football-v1.p.rapidapi.com/v3/fixtures"

for year in list(range(2012,2022,1)):
   
    # requesting season info
    querystring = {"league":"253","season":year}
    headers = {
        'x-rapidapi-key': secrets['rapid_api_key'],
        'x-rapidapi-host': secrets['rapid_api_host']
        }
    response = requests.request("GET", url, headers=headers, params=querystring)
    season_json = response.json()
    time.sleep(.5)
    
    # saving season info
    with open(f'../data/api_football_data/01_fixtures_info/rapi-fixtures-mls_{year}.json', 'w') as outfile:
        json.dump(obj=season_json, fp=outfile , ensure_ascii=False, indent=1)
    

### <span style="color:steelblue">Extracting list of all team IDs from Fixtures Info

<table width=100%>
<tr>
<td style="text-align:left" bgcolor="lightsteelblue">
</td>    
</tr>

In [25]:
mls_team_ids = {}

for year in list(range(2012,2022,1)):
   
    with open(f'../data/api_football_data/01_fixtures_info/rapi-fixtures-mls_{year}.json') as f:
        loaded_json = json.load(f)
    
    # extracting and saving team IDs
    for fixture in season_json['response']:
        team_id = fixture['teams']['home']['id']
        team_name = fixture['teams']['home']['name']

        mls_team_ids[team_id] = {'name': team_name}

In [26]:
mls_team_ids_df = pd.DataFrame(mls_team_ids).T   
mls_team_ids_df.sort_values(by='name')    

Unnamed: 0,name
1608,Atlanta United FC
16489,Austin
1607,Chicago Fire
1610,Colorado Rapids
1613,Columbus Crew
1615,DC United
2242,FC Cincinnati
1597,FC Dallas
1600,Houston Dynamo
9568,Inter Miami


## <span style="color:steelblue">Extracting and Aggregating Fixture Info

<table width=100%>
<tr>
<td style="text-align:left" bgcolor="lightsteelblue">
</td>    
</tr>

### <span style="color:steelblue">Testing - 2019 Fixtures

<table width=100%>
<tr>
<td style="text-align:left" bgcolor="lightsteelblue">
</td>    
</tr>

In [30]:
with open('../data/api_football_data/01_fixtures_info/rapi-fixtures-mls_2019.json') as f:
    fixtures_2019 = json.load(f)

In order to determine how to process these data, I'm going to work with a single match JSON and make decisions about what columns I'm going to keep, how I'll name them, sort them, etc. Once I have that built, I can scale up and run all fixtures through it to create a fixtures Dataframe for each year and then ultimately for all years (2016-2019).

In [35]:
fixture_test = fixtures_2019['response'][0]
test = pd.json_normalize(fixture_test)
test

Unnamed: 0,fixture.id,fixture.referee,fixture.timezone,fixture.date,fixture.timestamp,fixture.periods.first,fixture.periods.second,fixture.venue.id,fixture.venue.name,fixture.venue.city,...,goals.home,goals.away,score.halftime.home,score.halftime.away,score.fulltime.home,score.fulltime.away,score.extratime.home,score.extratime.away,score.penalty.home,score.penalty.away
0,128168,"Nima Saghafi, USA",UTC,2019-03-02T18:00:00+00:00,1551549600,1551549600,1551553200,,Talen Energy Stadium,Chester,...,1,3,0,1,1,3,,,,


In [33]:
fx_columns = test.columns
fx_columns

Index(['fixture.id', 'fixture.referee', 'fixture.timezone', 'fixture.date',
       'fixture.timestamp', 'fixture.periods.first', 'fixture.periods.second',
       'fixture.venue.id', 'fixture.venue.name', 'fixture.venue.city',
       'fixture.status.long', 'fixture.status.short', 'fixture.status.elapsed',
       'league.id', 'league.name', 'league.country', 'league.logo',
       'league.flag', 'league.season', 'league.round', 'teams.home.id',
       'teams.home.name', 'teams.home.logo', 'teams.home.winner',
       'teams.away.id', 'teams.away.name', 'teams.away.logo',
       'teams.away.winner', 'goals.home', 'goals.away', 'score.halftime.home',
       'score.halftime.away', 'score.fulltime.home', 'score.fulltime.away',
       'score.extratime.home', 'score.extratime.away', 'score.penalty.home',
       'score.penalty.away'],
      dtype='object')

#### Translation table

<table>
     <colgroup>
    <col span="10" >
    <col>
  </colgroup> 
    
<tr> 

</tr>
<tr>
    
<td>


| Name      | Abbreviation |
|:----------|:-------------|
| fixture   | fx           |
| league    | lg           |
| teams     | tm           |
| goals     | gl           |
| score     | sc           |
| periods   | per          |
| venue     | ven          |
| status    | sts          |
    
    
</td>
<td>
    

| Name      | Abbreviation |
|:----------|:-------------|    
| home      | h            |
| away      | a            |
| halftime  | ht           |
| fulltime  | ft           |
| extratime | et           |
| penalty   | pen          |
| id        | id           |
| referee   | ref          |
    
</td>
<td>
    
    
| Name       | Abbreviation |
|:-----------|:-------------|
| timezone   | tz           |
| date       | dt           |
| timestamp  | ts           |
| first      | fst          |
| second     | sec          |
| name       | nm           |
| city       | city         |
| long       | long         |
    
    
</td>
<td>
    
    
| Name       | Abbreviation |
|:-----------|:-------------|    
| short      | shrt         |
| elapsed    | elps         |
| country    | ctry         |
| logo       | logo         |
| flag       | flag         |
| season     | seas         |
| round      | rnd          |
| winner     | win          |
    
    
</td></tr> 

</table>






<!---

<table>
<tr> 
<th>  </th> <th>  </th>
</tr>
<tr>
    
<td>
    
    
|  full                  |     |  tier 1 |     |  tier 2 |     |  end       |
|:-----------------------|:---:|:--------|:---:|:--------|:---:|:-----------|
| fixture.id             | -> | fixture | -> |         | -> | id         |
| fixture.referee        | -> | fixture | -> |         | -> | referee    |
| fixture.timezone       | -> | fixture | -> |         | -> | timezone   |
| fixture.date           | -> | fixture | -> |         | -> | date       |
| fixture.timestamp      | -> | fixture | -> |         | -> | timestamp  |
| fixture.periods.first  | -> | fixture | -> | periods | -> | first      |
| fixture.periods.second | -> | fixture | -> | periods | -> | second     |
| fixture.venue.id       | -> | fixture | -> | venue   | -> | id         |
| fixture.venue.name     | -> | fixture | -> | venue   | -> | name       |
| fixture.venue.city     | -> | fixture | -> | venue   | -> | city       |
| fixture.status.long    | -> | fixture | -> | status  | -> | long       |
| fixture.status.short   | -> | fixture | -> | status  | -> | short      |
| fixture.status.elapsed | -> | fixture | -> | status  | -> | elapsed    |

    
|  full           |     |  tier 1 |     |  tier 2 |     |  end     |
|:----------------|:---:|:--------|:---:|:--------|:---:|:---------|   
| league.id       | -> | league  | -> |         | -> | id       |
| league.name     | -> | league  | -> |         | -> | name     |
| league.country  | -> | league  | -> |         | -> | country  |
| league.logo     | -> | league  | -> |         | -> | logo     |
| league.flag     | -> | league  | -> |         | -> | flag     |
| league.season   | -> | league  | -> |         | -> | season   |
| league.round    | -> | league  | -> |         | -> | round    |
    
    
</td>
    
<td>
    
|  full              |     |  tier 1 |     |  tier 2 |     |  end    |
|:-------------------|:---:|:--------|:---:|:--------|:---:|:--------|
| teams.home.id      | -> | teams   | -> | home    | -> | id      |
| teams.home.name    | -> | teams   | -> | home    | -> | name    |
| teams.home.logo    | -> | teams   | -> | home    | -> | logo    |
| teams.home.winner  | -> | teams   | -> | home    | -> | winner  |
| teams.away.id      | -> | teams   | -> | away    | -> | id      |
| teams.away.name    | -> | teams   | -> | away    | -> | name    |
| teams.away.logo    | -> | teams   | -> | away    | -> | logo    |
| teams.away.winner  | -> | teams   | -> | away    | -> | winner  |
    

|  full       |     |  tier 1 |     |  tier 2 |     |  end  |
|:------------|:---:|:--------|:---:|:--------|:---:|:------|
| goals.home  | -> | goals   | -> |         | -> | home  |
| goals.away  | -> | goals   | -> |         | -> | away  |
    
    
|  full                 |     |  tier 1 |     |  tier 2    |     |  end  |
|:----------------------|:---:|:--------|:---:|:-----------|:---:|:------|
| score.halftime.home   | -> | score   | -> | halftime   | -> | home  |
| score.halftime.away   | -> | score   | -> | halftime   | -> | away  |
| score.fulltime.home   | -> | score   | -> | fulltime   | -> | home  |
| score.fulltime.away   | -> | score   | -> | fulltime   | -> | away  |
| score.extratime.home  | -> | score   | -> | extratime  | -> | home  |
| score.extratime.away  | -> | score   | -> | extratime  | -> | away  |
| score.penalty.home    | -> | score   | -> | penalty    | -> | home  |
| score.penalty.away    | -> | score   | -> | penalty    | -> | away  |

</td></tr> 

</table>

-->

In [37]:
fx_col_names = {
'fixture.id': 'fx_id', 
'fixture.referee': 'fx_ref', 
'fixture.timezone': 'fx_tz',
'fixture.date': 'fx_date',
'fixture.timestamp': 'fx_time',
'fixture.periods.first': 'fx_per_fst', 
'fixture.periods.second': 'fx_per_sec',
'fixture.venue.id': 'fx_ven_id', 
'fixture.venue.name': 'fx_ven_name', 
'fixture.venue.city': 'fx_ven_city',
'fixture.status.long': 'fx_sts_long', 
'fixture.status.short': 'fx_sts_shrt', 
'fixture.status.elapsed': 'fx_sts_elps',

'league.id': 'lg_id', 
'league.name': 'lg_name', 
'league.country': 'lg_ctry', 
'league.logo': 'lg_logo',
'league.flag': 'lg_flag', 
'league.season': 'lg_seas', 
'league.round': 'lg_rnd',

'teams.home.id': 'tm_h_id',
'teams.home.name': 'tm_h_name', 
'teams.home.logo': 'tm_h_logo', 
'teams.home.winner': 'tm_h_win',
'teams.away.id': 'tm_a_id', 
'teams.away.name': 'tm_a_name', 
'teams.away.logo': 'tm_a_logo',
'teams.away.winner': 'tm_a_win', 

'goals.home': 'gl_h',
'goals.away': 'gl_a',

'score.halftime.home': 'sc_ht_h',
'score.halftime.away': 'sc_ht_a', 
'score.fulltime.home': 'sc_ft_h', 
'score.fulltime.away': 'sc_ft_a',
'score.extratime.home': 'sc_et_h', 
'score.extratime.away': 'sc_et_a', 
'score.penalty.home': 'sc_pen_h',
'score.penalty.away': 'sc_pen_a'
}

## <span style="color:steelblue">Code

<table width=100%>
<tr>
<td style="text-align:left" bgcolor="lightsteelblue">
</td>    
</tr>

In [38]:
import json
import pandas as pd

all_fixtures = pd.DataFrame()

for year in list(range(2016,2020,1)):
    
    with open(f'../data/api_football_data/01_fixtures_info/rapi-fixtures-mls_{year}.json') as f:
        season = json.load(f)    
    print(f'Processing fixtures for {year}...')
    
    # create dataframe to which all other fixtures can be appended
    template = pd.json_normalize(season['response'][0]) 
    renamed = template.rename(mapper = fx_col_names, axis = 1)
    fixture_base = pd.DataFrame(columns = renamed.columns)

    # process each fixture, renaming column names and appending to base
    for fix in season['response']:
        new_row = pd.json_normalize(fix)
        renamed_row = new_row.rename(mapper = fx_col_names, axis = 1)
        fixture_base = fixture_base.append(renamed_row, ignore_index=True)

    all_fixtures = all_fixtures.append(fixture_base)

    
all_fixtures.to_csv('../data/api_football_data/02_aggregated_info/rapi_fixtures_2016_2019.csv', index=False)
all_fixtures

Processing fixtures for 2016...
Processing fixtures for 2017...
Processing fixtures for 2018...
Processing fixtures for 2019...


Unnamed: 0,fx_id,fx_ref,fx_tz,fx_date,fx_time,fx_per_fst,fx_per_sec,fx_ven_id,fx_ven_name,fx_ven_city,...,gl_h,gl_a,sc_ht_h,sc_ht_a,sc_ft_h,sc_ft_a,sc_et_h,sc_et_a,sc_pen_h,sc_pen_a
0,148290,"Alan Kelly, Ireland",UTC,2016-12-11T01:00:00+00:00,1481418000,1481418000,1481421600,,BMO Field,Toronto,...,0,0,0,0,0,0,0,0,4,5
1,148291,"Juan Guzman, USA",UTC,2016-11-23T01:00:00+00:00,1479862800,1479862800,1479866400,,Olympic Stadium,Montreal,...,3,2,2,0,3,2,,,,
2,148292,"Jair Marrufo, USA",UTC,2016-12-01T00:00:00+00:00,1480550400,1480550400,1480554000,,BMO Field,Toronto,...,5,2,2,1,3,2,5,2,,
3,148293,"Chris Penso, USA",UTC,2016-11-23T03:00:00+00:00,1479870000,1479870000,1479873600,,CenturyLink Field,Seattle,...,2,1,1,1,2,1,,,,
4,148294,"Ricardo Salazar, USA",UTC,2016-11-27T21:00:00+00:00,1480280400,1480280400,1480284000,,Dick's Sporting Goods Park,Commerce City,...,0,1,0,0,0,1,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
416,250195,"Ismail Elfath, USA",UTC,2019-10-25T00:00:00+00:00,1571961600,1571961600,1571965200,,Mercedes-Benz Stadium,Atlanta,...,2,0,1,0,2,0,,,,
417,250196,"Kevin Stott, USA",UTC,2019-10-25T02:30:00+00:00,1571970600,1571970600,1571974200,,Banc of California Stadium,Los Angeles,...,5,3,2,1,5,3,,,,
418,250197,"Jair Antonio Maruffo, USA",UTC,2019-10-30T02:00:00+00:00,1572400800,1572400800,1572404400,,Banc of California Stadium,Los Angeles,...,1,3,1,2,1,3,,,,
419,250198,"Alan Kelly, Ireland",UTC,2019-10-31T00:00:00+00:00,1572480000,1572480000,1572483600,,Mercedes-Benz Stadium,Atlanta,...,1,2,1,1,1,2,,,,


# <span style="color:steelblue">FIXTURES ENDPOINT - BASIC STATS

<table width=100%>
<tr>
<td style="text-align:left" bgcolor="lightsteelblue">
</td>    
</tr>

After aggregating the all of the basic info for the 2016-019 seasons, I used the resulting CSV to request basic stats for the fixtures.

In [39]:
all_fixtures = pd.read_csv('../data/api_football_data/02_aggregated_info \
                           /rapi_fixtures_2016_2019.csv')
all_fixture_ids = all_fixtures['fx_id']

In [40]:
len(all_fixture_ids)

1576

### <span style="color:steelblue">Requesting Fixture Basic Stats 

<table width=100%>
<tr>
<td style="text-align:left" bgcolor="lightsteelblue">
</td>    
</tr>

In [45]:
fixture_statistics = []
report = []

url = "https://api-football-v1.p.rapidapi.com/v3/fixtures/statistics"

headers = {
    'x-rapidapi-key': secrets['rapid_api_key'],
    'x-rapidapi-host': secrets['rapid_api_host']
    }

for fix in all_fixture_ids:
    
    querystring = {"fixture":fix}
    
    try:
        response = requests.request("GET", url, headers=headers, params=querystring)
        
    except Exception as e:                                              
        cur_datetime = dt.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        report.append({'event': f'request failed',
                       'datetime': f'{cur_datetime}',
                       'fixture id': f'{fix}',
                       'exception': f'{e}'
                       })
        time.sleep(.25)
        continue
    
    fix_json = response.json()
    fixture_statistics.append(fix_json)
        
    cur_datetime = dt.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    report.append({'event': f'request succeeded',
                   'datetime': f'{cur_datetime}',
                   'fixture id': f'{fix}',
                   'exception': ''
                   })    

    df_report = pd.DataFrame(report)
    df_report.to_csv(f'../data/api_football_data/03_fixtures_basic_stats/rapi_fixtures_basic_stats_2016_2019_report.csv', index=False)

    with open(f'../data/api_football_data/03_fixtures_basic_stats/rapi_fixtures_basic_stats_2016_2019.json', 'w') as outfile:
        json.dump(obj=fixture_statistics, fp=outfile , ensure_ascii=False, indent=1)
    
    # subscription allows 300 request per minute (5 per second)
    time.sleep(.25)      


# <span style="color:darkcyan">TEAMS ENDPOINT

### <span style="color:darkcyan">Sounders Statistics by Season 

<table width=100%>
<tr>
<td style="text-align:left" bgcolor="darkcyan">
</td>    
</tr>

In [None]:
team_stats_ssfc = []

for year in list(range(2012,2022,1)):
    url = "https://api-football-v1.p.rapidapi.com/v3/teams/statistics"
    querystring = {"league":"253","season":year,"team":"1595"}
    headers = {
        'x-rapidapi-key': secrets['rapid_api_key'],
        'x-rapidapi-host': "api-football-v1.p.rapidapi.com"
        }
    response = requests.request("GET", url, headers=headers, params=querystring)
    
    team_stats_ssfc.append(response.json()['response'])

In [None]:
with open('../data/api_football_data/rapi-team_stats_ssfc_2012_2021.json', 'w') as outfile:
    json.dump(obj=team_stats_ssfc, fp=outfile , ensure_ascii=False, indent=1)

In [None]:
with open('../data/api_football_data/rapi-team_stats_ssfc_2012_2021.json') as f:
    team_stats_ssfc = json.load(f)

### <span style="color:darkcyan">All Teams Statistics by Season 

<table width=100%>
<tr>
<td style="text-align:left" bgcolor="darkcyan">
</td>    
</tr>

In [None]:
all_team_ids

In [None]:
for team in all_team_ids:
    team_stats = []
    print(f'Starting {team} at', datetime.now().strftime("%H:%M:%S"))
    for year in list(range(2012,2022,1)):
        url = "https://api-football-v1.p.rapidapi.com/v3/teams/statistics"
        querystring = {"league":"253","season":year,"team":f"{team}"}
        headers = {
            'x-rapidapi-key': secrets['rapid_api_key'],
            'x-rapidapi-host': "api-football-v1.p.rapidapi.com"
            }

        response = requests.request("GET", url, headers=headers, params=querystring)
        team_stats.append(response.json())
        print(year)
#         time.sleep(1)
    with open(f'../data/api_football_data/teams-statistics/rapi-team_stats_{team}.json', 'w') as outfile:
        json.dump(obj=team_stats, fp=outfile , ensure_ascii=False, indent=1)
    print(f'Completed {team} at', datetime.now().strftime("%H:%M:%S"))
    print('')
#     time.sleep(1)

# <span style="color:firebrick">STANDINGS ENDPOINT

The standings endpoint has end-of-season standings only. Though, it doesn't have information for standings throughout the season, I can use the standings as weights for start-of-season ratings. 

In [None]:
 import requests

url = "https://api-football-v1.p.rapidapi.com/v3/standings"

querystring = {"season":"2019","team":"1595"}

headers = {
    'x-rapidapi-key': secrets['rapid_api_key'],
    'x-rapidapi-host': "api-football-v1.p.rapidapi.com"
    }

response = requests.request("GET", url, headers=headers, params=querystring)

In [None]:
response.json()

In [None]:
import requests

url = "https://api-football-v1.p.rapidapi.com/v3/standings"

querystring = {"season":"2019","league":"253"}

headers = {
    'x-rapidapi-key': secrets['rapid_api_key'],
    'x-rapidapi-host': "api-football-v1.p.rapidapi.com"
    }

response = requests.request("GET", url, headers=headers, params=querystring)

In [None]:
# response.json()

In [99]:
standings = []

for year in list(range(2012,2022,1)):
    url = "https://api-football-v1.p.rapidapi.com/v3/standings"
    querystring = {"league":"253","season":year,}
    headers = {
        'x-rapidapi-key': secrets['rapid_api_key'],
        'x-rapidapi-host': "api-football-v1.p.rapidapi.com"
        }

    response = requests.request("GET", url, headers=headers, params=querystring)
    time.sleep(1)
    standings.append(response.json())
    print(year)
    
with open(f'../data/api_football_data/00_season_standings/rapi-team_standings.json', 'w') as outfile:
    json.dump(obj=standings, fp=outfile , ensure_ascii=False, indent=1)

2012
2013
2014
2015
2016
2017
2018
2019
2020
2021


In [105]:
with open('../data/api_football_data/00_season_standings/rapi-team_standings.json') as f:
    standings = json.load(f)

In [127]:
#                 i              
table = standings[0]['response'][0]['league']['standings'][0]

In [163]:
table[0]

{'rank': 1,
 'team': {'id': 1609,
  'name': 'New England Revolution',
  'logo': 'https://media.api-sports.io/football/teams/1609.png'},
 'points': 17,
 'goalsDiff': 4,
 'group': 'MLS - Eastern Conference',
 'form': 'WWWDL',
 'status': 'same',
 'description': 'Final Series',
 'all': {'played': 8,
  'win': 5,
  'draw': 2,
  'lose': 1,
  'goals': {'for': 11, 'against': 7}},
 'home': {'played': 4,
  'win': 4,
  'draw': 0,
  'lose': 0,
  'goals': {'for': 7, 'against': 2}},
 'away': {'played': 4,
  'win': 1,
  'draw': 2,
  'lose': 1,
  'goals': {'for': 4, 'against': 5}},
 'update': '2021-05-31T00:00:00+00:00'}

In [100]:
standings_df = pd.read_json('../data/api_football_data/00_season_standings/rapi-team_standings.json')

In [162]:
standings_df[4:-2]

Unnamed: 0,get,parameters,errors,results,paging,response
4,standings,"{'league': '253', 'season': '2016'}",[],1,"{'current': 1, 'total': 1}","[{'league': {'id': 253, 'name': 'Major League ..."
5,standings,"{'league': '253', 'season': '2017'}",[],1,"{'current': 1, 'total': 1}","[{'league': {'id': 253, 'name': 'Major League ..."
6,standings,"{'league': '253', 'season': '2018'}",[],1,"{'current': 1, 'total': 1}","[{'league': {'id': 253, 'name': 'Major League ..."
7,standings,"{'league': '253', 'season': '2019'}",[],1,"{'current': 1, 'total': 1}","[{'league': {'id': 253, 'name': 'Major League ..."


Pulling out the information from this endpoint was taking more time than it was worth to code it all out. So I went to https://www.mlssoccer.com/standings and spent 20 minutes copying/pasting the tables I needed into google sheets and downloading them as CSVs.

# Form Guide

In [47]:
ssfc_stats = pd.read_json('../data/api_football_data/rapi-team_stats_ssfc_2012_2021.json')

In [54]:
ssfc_stats

Unnamed: 0,league,team,form,fixtures,goals,biggest,clean_sheet,failed_to_score,penalty,lineups,cards
0,"{'id': 253, 'name': 'Major League Soccer', 'co...","{'id': 1595, 'name': 'Seattle Sounders', 'logo...",WWLDWWWWWLDLDLDLDDWDWWLWWDWDLDWDWLDWLW,"{'played': {'home': 19, 'away': 19, 'total': 3...","{'for': {'total': {'home': 29, 'away': 25, 'to...","{'streak': {'wins': 5, 'draws': 2, 'loses': 1}...","{'home': 9, 'away': 5, 'total': 14}","{'home': 5, 'away': 5, 'total': 10}","{'scored': {'total': 1, 'percentage': '100.00%...",[],"{'yellow': {'0-15': {'total': None, 'percentag..."
1,"{'id': 253, 'name': 'Major League Soccer', 'co...","{'id': 1595, 'name': 'Seattle Sounders', 'logo...",LDLLDWDWWWLWWLWLLDWWWLWWWWWDDLLLLDWLL,"{'played': {'home': 19, 'away': 18, 'total': 3...","{'for': {'total': {'home': 32, 'away': 15, 'to...","{'streak': {'wins': 5, 'draws': 2, 'loses': 4}...","{'home': 8, 'away': 4, 'total': 12}","{'home': 2, 'away': 7, 'total': 9}","{'scored': {'total': 3, 'percentage': '100.00%...",[],"{'yellow': {'0-15': {'total': None, 'percentag..."
2,"{'id': 253, 'name': 'Major League Soccer', 'co...","{'id': 1595, 'name': 'Seattle Sounders', 'logo...",WLWLDWWWWWLWDWWWLWLLWLDWWWWLLWWLDWDDLW,"{'played': {'home': 19, 'away': 19, 'total': 3...","{'for': {'total': {'home': 33, 'away': 35, 'to...","{'streak': {'wins': 5, 'draws': 2, 'loses': 2}...","{'home': 8, 'away': 2, 'total': 10}","{'home': 3, 'away': 4, 'total': 7}","{'scored': {'total': 6, 'percentage': '100.00%...",[],"{'yellow': {'0-15': {'total': None, 'percentag..."
3,"{'id': 253, 'name': 'Major League Soccer', 'co...","{'id': 1595, 'name': 'Seattle Sounders', 'logo...",WLDWLWWWLWDWWLWLLLWLLLLLWLWWDWDDDWWWL,"{'played': {'home': 19, 'away': 18, 'total': 3...","{'for': {'total': {'home': 31, 'away': 19, 'to...","{'streak': {'wins': 3, 'draws': 3, 'loses': 5}...","{'home': 8, 'away': 3, 'total': 11}","{'home': 4, 'away': 7, 'total': 11}","{'scored': {'total': 2, 'percentage': '100.00%...",[],"{'yellow': {'0-15': {'total': 2, 'percentage':..."
4,"{'id': 253, 'name': 'Major League Soccer', 'co...","{'id': 1595, 'name': 'Seattle Sounders', 'logo...",LLLWDWLWWLLLWLLDLWLLDWWWDLDWWWWDLWWWLWWD,"{'played': {'home': 20, 'away': 20, 'total': 4...","{'for': {'total': {'home': 28, 'away': 24, 'to...","{'streak': {'wins': 4, 'draws': 1, 'loses': 3}...","{'home': 9, 'away': 3, 'total': 12}","{'home': 5, 'away': 4, 'total': 9}","{'scored': {'total': 4, 'percentage': '100.00%...","[{'formation': '4-2-3-1', 'played': 25}, {'for...","{'yellow': {'0-15': {'total': 3, 'percentage':..."
5,"{'id': 253, 'name': 'Major League Soccer', 'co...","{'id': 1595, 'name': 'Seattle Sounders', 'logo...",LDWDDLWDLLLWWLWLDDWWWDWWWDDDDLWLWWDWWWL,"{'played': {'home': 19, 'away': 20, 'total': 3...","{'for': {'total': {'home': 37, 'away': 22, 'to...","{'streak': {'wins': 3, 'draws': 4, 'loses': 3}...","{'home': 11, 'away': 6, 'total': 17}","{'home': 2, 'away': 8, 'total': 10}","{'scored': {'total': 5, 'percentage': '100.00%...","[{'formation': '4-2-3-1', 'played': 27}, {'for...","{'yellow': {'0-15': {'total': 4, 'percentage':..."
6,"{'id': 253, 'name': 'Major League Soccer', 'co...","{'id': 1595, 'name': 'Seattle Sounders', 'logo...",LLLDWLDWLLLWLDLWDDWWWWWWWWWLLWWWWWLW,"{'played': {'home': 18, 'away': 18, 'total': 3...","{'for': {'total': {'home': 36, 'away': 20, 'to...","{'streak': {'wins': 9, 'draws': 2, 'loses': 3}...","{'home': 4, 'away': 3, 'total': 7}","{'home': 5, 'away': 6, 'total': 11}","{'scored': {'total': 6, 'percentage': '100.00%...","[{'formation': '4-2-3-1', 'played': 31}, {'for...","{'yellow': {'0-15': {'total': 2, 'percentage':..."
7,"{'id': 253, 'name': 'Major League Soccer', 'co...","{'id': 1595, 'name': 'Seattle Sounders', 'logo...",WWWDWWLDDDWWDLLLWLWWLWLDLDWWLWDLWWWWWW,"{'played': {'home': 20, 'away': 18, 'total': 3...","{'for': {'total': {'home': 43, 'away': 21, 'to...","{'streak': {'wins': 3, 'draws': 3, 'loses': 3}...","{'home': 7, 'away': 4, 'total': 11}","{'home': 1, 'away': 6, 'total': 7}","{'scored': {'total': 3, 'percentage': '100.00%...","[{'formation': '4-2-3-1', 'played': 34}, {'for...","{'yellow': {'0-15': {'total': 3, 'percentage':..."
8,"{'id': 253, 'name': 'Major League Soccer', 'co...","{'id': 1595, 'name': 'Seattle Sounders', 'logo...",WDDLWLWWDLWWLWWWLDDWLDWWWWL,"{'played': {'home': 16, 'away': 11, 'total': 2...","{'for': {'total': {'home': 38, 'away': 14, 'to...","{'streak': {'wins': 4, 'draws': 2, 'loses': 1}...","{'home': 4, 'away': 3, 'total': 7}","{'home': 1, 'away': 3, 'total': 4}","{'scored': {'total': 5, 'percentage': '100.00%...","[{'formation': '4-2-3-1', 'played': 26}, {'for...","{'yellow': {'0-15': {'total': 1, 'percentage':..."
9,"{'id': 253, 'name': 'Major League Soccer', 'co...","{'id': 1595, 'name': 'Seattle Sounders', 'logo...",WDWWWWDD,"{'played': {'home': 5, 'away': 3, 'total': 8},...","{'for': {'total': {'home': 10, 'away': 4, 'tot...","{'streak': {'wins': 4, 'draws': 1, 'loses': 0}...","{'home': 4, 'away': 1, 'total': 5}","{'home': 1, 'away': 0, 'total': 1}","{'scored': {'total': 1, 'percentage': '100.00%...","[{'formation': '5-3-2', 'played': 4}, {'format...","{'yellow': {'0-15': {'total': 1, 'percentage':..."


In [73]:
form_dicts = []

for row in ssfc_stats.iterrows():
#     print(row[1]['team']['name'])
#     print(row[1]['league']['season'])
#     print(len(row[1]['form']))
#     print(row[1]['form'])
    
    season_form = {'season':row[1]['league']['season']} 
    
    for i in range(40):
        try:
            season_form[i+1] = row[1]['form'][i]
        except:
            season_form[i+1] = ''
        
    form_dicts.append(season_form)

In [164]:
pd.DataFrame(form_dicts).set_index('season')

Unnamed: 0_level_0,1,2,3,4,5,6,7,8,9,10,...,31,32,33,34,35,36,37,38,39,40
season,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2012,W,W,L,D,W,W,W,W,W,L,...,W,D,W,L,D,W,L,W,,
2013,L,D,L,L,D,W,D,W,W,W,...,L,L,L,D,W,L,L,,,
2014,W,L,W,L,D,W,W,W,W,W,...,W,L,D,W,D,D,L,W,,
2015,W,L,D,W,L,W,W,W,L,W,...,D,D,D,W,W,W,L,,,
2016,L,L,L,W,D,W,L,W,W,L,...,W,D,L,W,W,W,L,W,W,D
2017,L,D,W,D,D,L,W,D,L,L,...,W,L,W,W,D,W,W,W,L,
2018,L,L,L,D,W,L,D,W,L,L,...,W,W,W,W,L,W,,,,
2019,W,W,W,D,W,W,L,D,D,D,...,D,L,W,W,W,W,W,W,,
2020,W,D,D,L,W,L,W,W,D,L,...,,,,,,,,,,
2021,W,D,W,W,W,W,D,D,,,...,,,,,,,,,,
