![](../resources/images/data_source_logos/api_sports/rapid_api_banner2.png)

Source: Data in this notebook is collected from [API Sports](https://www.api-football.com) via [RapidAPI](https://rapidapi.com/api-sports/api/api-football).

### Imports

In [3]:
import requests
import json
import pandas as pd
import time
from datetime import datetime

In [4]:
with open('../git_ignore/secrets.json') as f:
    secrets = json.load(f)

#   
___

#  

# <span style="color:orangered">API Architecture

<img src='../resources/api_football_info/api_architechture.jpg' width=600>
  
[Image Source: API Football Documentation](https://www.api-football.com/documentation-v3#section/Architecture)

# <span style="color:mediumblue">LEAGUES ENDPOINT - DATA COVERAGE

From a call to the "leagues" endpoint filtered by country, I got the list of leagues in the United States, which includes Major League Soccer. I then made a second call for MLS specifically and exported the resulting JSON to a file so that I wouldn't need to repeat either call.

### Requesting all Leagues in the United States to find MLS League ID

In [5]:
url = "https://api-football-v1.p.rapidapi.com/v3/leagues"

querystring = {"country":"USA"}

headers = {
    'x-rapidapi-key': secrets['rapid_api_key'],
    'x-rapidapi-host': "api-football-v1.p.rapidapi.com"
    }

leagues_response = requests.request("GET", url, headers=headers, params=querystring)

In [6]:
print('ID ', '   ', 'League' )
print('---', '   ','---------------')
for league in leagues_response.json()['response']:
    print(league['league']['id'],' - ' , league['league']['name'])

ID      League
---     ---------------
253  -  Major League Soccer
257  -  US Open Cup
255  -  USL Championship
256  -  USL League Two
489  -  USL League One
523  -  NISA
254  -  NWSL Women
641  -  NWSL Women - Challenge Cup
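
Every request in this notebook repeats the same base URL and the same two headers. A minimal sketch of a helper that assembles those pieces once is shown below. The function name `build_request` and the constants are my own, hypothetical names; only the URL, host, and header keys come from the cells above.

```python
# Hypothetical helper: names are illustrative, not part of the API or this notebook.
BASE_URL = "https://api-football-v1.p.rapidapi.com/v3"
API_HOST = "api-football-v1.p.rapidapi.com"


def build_request(endpoint, params, api_key):
    """Return the (url, headers, params) triple for one API call."""
    url = f"{BASE_URL}/{endpoint.strip('/')}"
    headers = {
        "x-rapidapi-key": api_key,
        "x-rapidapi-host": API_HOST,
    }
    # Copy params so the caller's dict is never mutated.
    return url, headers, dict(params)


# "demo-key" is a placeholder; the notebook reads the real key from secrets.json.
url, headers, params = build_request("leagues", {"country": "USA"}, api_key="demo-key")
print(url)
```

The returned triple can be passed straight to `requests.get(url, headers=headers, params=params)`, which would replace the repeated boilerplate in each request cell.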


### Requesting League info for Major League Soccer

In [7]:
url = "https://api-football-v1.p.rapidapi.com/v3/leagues"

querystring = {"id":"253"}

headers = {
    'x-rapidapi-key': secrets['rapid_api_key'],
    'x-rapidapi-host': "api-football-v1.p.rapidapi.com"
    }

mls_response = requests.request("GET", url, headers=headers, params=querystring)

In [8]:
mls_json = mls_response.json()['response'][0]

In [9]:
print('League:', mls_json['league']['id'],' - ' ,mls_json['league']['name'])
print('Seasons: ')
for season in mls_json['seasons']:
    print(season['year'])

League: 253  -  Major League Soccer
Seasons: 
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021


Next, I exported the JSON to a file and then read it back in to verify that I had what I needed to proceed, and that I would be able to bring it back in correctly the next time I open the notebook.

In [10]:
with open('../data/api_football_data/rapi-league_mls.json', 'w') as outfile:
    json.dump(obj=mls_json, fp=outfile , ensure_ascii=False, indent=1)

In [11]:
with open('../data/api_football_data/rapi-league_mls.json') as f:
    league_mls = json.load(f)
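
The export-and-reload pattern above could be wrapped in a small cache helper so that the API is only called when the file does not exist yet. This is a sketch under my own assumptions: `load_or_fetch` and its `fetch` callback are hypothetical names, not something the notebook defines.

```python
import json
from pathlib import Path


def load_or_fetch(path, fetch):
    """Return cached JSON from `path`, calling `fetch()` only on a cache miss."""
    path = Path(path)
    if path.exists():
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    # Cache miss: call the (hypothetical) fetch function and persist the result.
    data = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as outfile:
        json.dump(data, outfile, ensure_ascii=False, indent=1)
    return data
```

In this notebook, `fetch` would be a function that performs the MLS request and returns `mls_response.json()['response'][0]`.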

The JSON output summarizes the data available for MLS. The first season with per-match (fixture) statistics is 2015, so I will likely examine data from 2015-2019 (or possibly 2020*).

*\*2020 was a tumultuous year, with even more variables than usual contributing to match outcomes, so I expect to exclude it from my data.*

###  Requesting info for every MLS Season

The API request for MLS seasons returned JSON with data for every MLS season since 2012, including an overview of the kinds of data available for each season (coverage). I dug into the JSON and brought all the information into one DataFrame, then put it in a markdown table below for reference.

**Sample Output:** 

```
[{'year': 2012,
  'start': '2012-03-10',
  'end': '2012-12-01',
  'current': False,
  'coverage': {'fixtures': {'events': True,
                            'lineups': True,
                            'statistics_fixtures': False,
                            'statistics_players': False},
               'standings': False,
               'players': True,
               'top_scorers': True,
               'top_assists': True,
               'top_cards': True,
               'injuries': False,
               'predictions': True,
               'odds': False}}]
```

### Seasons Coverage

The next step is to dig into the JSON. To make it more readable, I converted it into two pandas DataFrames:
1. The first DataFrame contains the season-level coverage info
2. The second contains the coverage info for individual fixtures in each season

Then I joined those two tables together so I could see all coverage information at a glance.

#### Creating the First DataFrame - Season Coverage

In [12]:
coverage_list = []

for year in league_mls['seasons']:
    coverage_list.append(year['coverage'])

In [13]:
stats_seasons_coverage = pd.DataFrame(coverage_list).drop(columns=['fixtures']) 

years = list(range(2012,2022,1))
stats_seasons_coverage['year'] = years

stats_seasons_coverage.set_index('year', inplace=True)

In [14]:
stats_seasons_coverage.rename(axis=1, 
                              mapper={'predictions':'preds'}, 
                              inplace=True)

#### Creating the Second DataFrame - Season Fixtures Coverage

In [15]:
stats_fixture_coverage = pd.DataFrame(list(pd.DataFrame(coverage_list)['fixtures']))

years = list(range(2012,2022,1))
stats_fixture_coverage['year'] = years

stats_fixture_coverage.set_index('year', inplace=True)

In [16]:
stats_fixture_coverage.rename(axis=1, 
                              mapper={'statistics_fixtures':'match_stats', 
                                      'statistics_players':'player_stats'}, 
                              inplace=True)

#### Merging the two DataFrames

So that I would be able to distinguish between the columns from the two merged DataFrames, I added a prefix to every column name.

In [17]:
stats_seasons_coverage = stats_seasons_coverage.add_prefix('tm_')
stats_fixture_coverage = stats_fixture_coverage.add_prefix('fx_')
mls_data_coverage = pd.concat([stats_seasons_coverage, stats_fixture_coverage], axis=1)
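
As an alternative to building two DataFrames and concatenating them, the nested coverage can be flattened in one step with `pd.json_normalize`. This is a sketch of a different technique, not the approach used above. The two records are abridged, illustrative data shaped like the sample output, and the prefix mapping is my assumption about how to reproduce the `tm_` / `fx_` naming. It also takes the year from the data instead of a hard-coded range.

```python
import pandas as pd

# Two abridged, illustrative seasons shaped like the API's "seasons" list.
seasons = [
    {"year": 2012,
     "coverage": {"fixtures": {"events": True, "statistics_fixtures": False},
                  "standings": False, "predictions": True}},
    {"year": 2016,
     "coverage": {"fixtures": {"events": True, "statistics_fixtures": True},
                  "standings": True, "predictions": True}},
]

# Nested keys are joined with "_", e.g. coverage_fixtures_events.
flat = pd.json_normalize(seasons, sep="_").set_index("year")

# Map the nested prefixes onto the fx_/tm_ prefixes used in this notebook.
flat.columns = [
    col.replace("coverage_fixtures_", "fx_").replace("coverage_", "tm_")
    for col in flat.columns
]
print(sorted(flat.columns))
```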

In [20]:
# mls_data_coverage

I converted the new DataFrame to markdown and quickly ran a find-replace for all   
of the "False" values - replacing them with red/bolded "False" values for readability.
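
The manual find-replace could also be done in code. Below is a minimal sketch that renders a boolean DataFrame as a markdown table and wraps each `False` in the same red/bold span used in the table that follows. The function name `coverage_to_markdown` and the demo data are hypothetical. It builds the table by hand, so it does not depend on `DataFrame.to_markdown` or its optional `tabulate` dependency.

```python
import pandas as pd

RED_FALSE = "<span style='color:red;font-weight:bold'>False</span>"


def coverage_to_markdown(df):
    """Render a boolean DataFrame as a markdown table, highlighting False cells."""
    index_name = df.index.name or ""
    lines = [
        "| " + " | ".join([index_name, *map(str, df.columns)]) + " |",
        "|" + "|".join(["---"] * (len(df.columns) + 1)) + "|",
    ]
    for idx, row in df.iterrows():
        cells = ["True" if bool(value) else RED_FALSE for value in row]
        lines.append("| " + " | ".join([str(idx), *cells]) + " |")
    return "\n".join(lines)


# Illustrative two-row example, not the full coverage table.
demo = pd.DataFrame(
    {"tm_standings": [False, True], "fx_match_stats": [True, True]},
    index=pd.Index([2015, 2016], name="year"),
)
print(coverage_to_markdown(demo))
```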

### MLS Data Coverage

| year | tm_stndgs | tm_plrs | tm_top_g | tm_top_a | tm_top_cards | tm_injuries | tm_preds | tm_odds | fx_events | fx_lineups | fx_match_stats | fx_player_stats |
|------|--------------|------------|----------------|----------------|--------------|-------------|----------|---------|-----------|------------|----------------|-----------------|
| 2012 | <span style='color:red;font-weight:bold'>False</span>        | True       | True           | True           | True         | <span style='color:red;font-weight:bold'>False</span>       | True     | <span style='color:red;font-weight:bold'>False</span>   | True      | True       | <span style='color:red;font-weight:bold'>False</span>          | <span style='color:red;font-weight:bold'>False</span>           |
| 2013 | <span style='color:red;font-weight:bold'>False</span>        | True       | True           | True           | True         | <span style='color:red;font-weight:bold'>False</span>       | True     | <span style='color:red;font-weight:bold'>False</span>   | True      | True       | <span style='color:red;font-weight:bold'>False</span>          | <span style='color:red;font-weight:bold'>False</span>           |
| 2014 | <span style='color:red;font-weight:bold'>False</span>        | True       | True           | True           | True         | <span style='color:red;font-weight:bold'>False</span>       | True     | <span style='color:red;font-weight:bold'>False</span>   | True      | True       | <span style='color:red;font-weight:bold'>False</span>          | <span style='color:red;font-weight:bold'>False</span>           |
| 2015 | <span style='color:red;font-weight:bold'>False</span>        | True       | True           | True           | True         | <span style='color:red;font-weight:bold'>False</span>       | True     | <span style='color:red;font-weight:bold'>False</span>   | True      | True       | True           | True            |
| 2016 | True         | True       | True           | True           | True         | <span style='color:red;font-weight:bold'>False</span>       | True     | <span style='color:red;font-weight:bold'>False</span>   | True      | True       | True           | True            |
| 2017 | True         | True       | True           | True           | True         | <span style='color:red;font-weight:bold'>False</span>       | True     | <span style='color:red;font-weight:bold'>False</span>   | True      | True       | True           | True            |
| 2018 | True         | True       | True           | True           | True         | <span style='color:red;font-weight:bold'>False</span>       | True     | <span style='color:red;font-weight:bold'>False</span>   | True      | True       | True           | True            |
| 2019 | True         | True       | True           | True           | True         | <span style='color:red;font-weight:bold'>False</span>       | True     | <span style='color:red;font-weight:bold'>False</span>   | True      | True       | True           | True            |
| 2020 | True         | True       | True           | True           | True         | <span style='color:red;font-weight:bold'>False</span>       | True     | <span style='color:red;font-weight:bold'>False</span>   | True      | True       | True           | True            |
| 2021 | True         | True       | True           | True           | True         | True        | True     | True    | True      | True       | True           | True            |

Now that I have this for reference, it's clear that I can't use 2012-2014, because match (fixture) statistics and individual player statistics are unavailable for those seasons. I could work with 2015 onward, but 2015 doesn't have team standings, which I'm hoping to use as a feature for any given match-up. This means I'll be limited to the data from 2016-2020.
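
The same reasoning can be expressed as a filter: keep only the seasons where every required coverage flag is `True`. This is a sketch on three rows copied from the table above. The column names follow the prefixes and renames from the earlier cells and are assumptions here. The coverage flags do not encode any other reason to drop a season, so any such exclusion would be applied separately.

```python
import pandas as pd

# Three rows taken from the coverage table (2014, 2015, 2016).
coverage = pd.DataFrame(
    {
        "tm_standings": [False, False, True],
        "fx_match_stats": [False, True, True],
        "fx_player_stats": [False, True, True],
    },
    index=pd.Index([2014, 2015, 2016], name="year"),
)

# Features I want for every match-up: standings plus match and player stats.
required = ["tm_standings", "fx_match_stats", "fx_player_stats"]

# A season is usable only if all required flags are True.
usable_seasons = coverage.index[coverage[required].all(axis=1)].tolist()
print(usable_seasons)
```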

# <span style="color:steelblue">FIXTURES ENDPOINT - INFO AND BASIC STATS

<table width=100%>
<tr>
<td style="text-align:left" bgcolor="lightsteelblue">
</td>    
</tr>
</table>

### <span style="color:steelblue">Testing - 2019 Fixtures

<table width=100%>
<tr>
<td style="text-align:left" bgcolor="lightsteelblue">
</td>    
</tr>
</table>

In [24]:
url = "https://api-football-v1.p.rapidapi.com/v3/fixtures"
querystring = {"league":"253","season":"2019"}
headers = {
    'x-rapidapi-key': secrets['rapid_api_key'],
    'x-rapidapi-host': secrets['rapid_api_host']
    }

mls19_response = requests.request("GET", url, headers=headers, params=querystring)
mls_2019 = mls19_response.json()

In [25]:
with open('../data/api_football_data/rapi-fixtures-mls_2019.json', 'w') as outfile:
    json.dump(obj=mls_2019, fp=outfile , ensure_ascii=False, indent=1)

In [None]:
# mls_2019['response'][-1]


```
{'fixture': {'id': 250199,
             'referee': 'Allen Chapman, USA',
             'timezone': 'UTC',
             'date': '2019-11-10T20:00:00+00:00',
             'timestamp': 1573416000,
             'periods': {'first': 1573416000, 'second': 1573419600},
             'venue': {'id': None, 'name': 'CenturyLink Field', 'city': 'Seattle'},
             'status': {'long': 'Match Finished', 'short': 'FT', 'elapsed': 90}
            },
            
 'league': {'id': 253,
            'name': 'Major League Soccer',
            'country': 'USA',
            'logo': 'https://media.api-sports.io/football/leagues/253.png',
            'flag': 'https://media.api-sports.io/flags/us.svg',
            'season': 2019,
            'round': 'MLS Cup - Final'},
            
 'teams': {'home': {'id': 1595,
                   'name': 'Seattle Sounders',
                   'logo': 'https://media.api-sports.io/football/teams/1595.png',
                   'winner': True
                   },
                   
          'away': {'id': 1601,
                   'name': 'Toronto FC',
                   'logo': 'https://media.api-sports.io/football/teams/1601.png',
                   'winner': False
                   }
           },
           
 'goals': {'home': 3, 'away': 1},
 
 'score': {'halftime': {'home': 0, 'away': 0},
           'fulltime': {'home': 3, 'away': 1},
           'extratime': {'home': None, 'away': None},
           'penalty': {'home': None, 'away': None}
          
           }
  }
```
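
For modeling, each nested fixture record needs to become one flat row. Here is a minimal sketch of that step, assuming the record structure shown above. The function name `flatten_fixture` and the output keys are hypothetical, and the sample is abridged from the 2019 MLS Cup Final record.

```python
def flatten_fixture(record):
    """Flatten one fixtures-endpoint record into a single-level dict."""
    fixture, teams, goals = record["fixture"], record["teams"], record["goals"]
    return {
        "fx_id": fixture["id"],
        "date": fixture["date"],
        "round": record["league"]["round"],
        "home_id": teams["home"]["id"],
        "home_name": teams["home"]["name"],
        "away_id": teams["away"]["id"],
        "away_name": teams["away"]["name"],
        "home_goals": goals["home"],
        "away_goals": goals["away"],
    }


# Abridged copy of the sample record shown above.
sample = {
    "fixture": {"id": 250199, "date": "2019-11-10T20:00:00+00:00"},
    "league": {"id": 253, "season": 2019, "round": "MLS Cup - Final"},
    "teams": {
        "home": {"id": 1595, "name": "Seattle Sounders", "winner": True},
        "away": {"id": 1601, "name": "Toronto FC", "winner": False},
    },
    "goals": {"home": 3, "away": 1},
}

row = flatten_fixture(sample)
print(row)
```

A list of such rows can be passed to `pd.DataFrame(rows)` to get one row per fixture.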



### <span style="color:steelblue">Requesting and Saving all MLS Fixtures by Season

<table width=100%>
<tr>
<td style="text-align:left" bgcolor="lightsteelblue">
</td>    
</tr>
</table>

The cell below requests fixture info for every MLS season and saves each season to its own JSON file. Team IDs are extracted from those files in the next section.

In [49]:
url = "https://api-football-v1.p.rapidapi.com/v3/fixtures"

for year in list(range(2012,2022,1)):
   
    # requesting season info
    querystring = {"league":"253","season":year}
    headers = {
        'x-rapidapi-key': secrets['rapid_api_key'],
        'x-rapidapi-host': secrets['rapid_api_host']
        }
    response = requests.request("GET", url, headers=headers, params=querystring)
    season_json = response.json()
    time.sleep(.5)
    
    # saving season info
    with open(f'../data/api_football_data/rapi-fixtures-mls_{year}.json', 'w') as outfile:
        json.dump(obj=season_json, fp=outfile , ensure_ascii=False, indent=1)
    
#     # extracting and saving team IDs
#     for fixture in season_json['response']:
#         team_id = fixture['teams']['home']['id']
#         team_name = fixture['teams']['home']['name']

#         mls_team_ids[team_id] = {'name': team_name}
    
# mls_team_ids_df = pd.DataFrame(mls_team_ids).T
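
The loop above saves whatever the API returns, including error payloads. A minimal sketch of a sanity check that could run before saving is shown below. It relies on the response envelope seen in this notebook's outputs (`errors`, `results`, `response`). That `results` equals the number of records in `response` is my assumption, and `check_payload` is a hypothetical name.

```python
def check_payload(payload):
    """Raise ValueError if a payload reports errors or a record-count mismatch."""
    errors = payload.get("errors")
    if errors:  # an empty list (or dict) means no errors were reported
        raise ValueError(f"API reported errors: {errors}")
    results = payload.get("results")
    found = len(payload.get("response", []))
    # Assumption: `results` counts the records contained in `response`.
    if results is not None and results != found:
        raise ValueError(f"expected {results} records, found {found}")
    return payload


# Illustrative payloads, not real API output.
good = {"get": "fixtures", "errors": [], "results": 2, "response": [{}, {}]}
print(check_payload(good)["results"])
```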

### <span style="color:steelblue">Extracting list of all team IDs from Fixtures Info

<table width=100%>
<tr>
<td style="text-align:left" bgcolor="lightsteelblue">
</td>    
</tr>
</table>

In [None]:
mls_team_ids = {}

for year in list(range(2012,2022,1)):
   
    with open(f'../data/api_football_data/rapi-fixtures-mls_{year}.json') as f:
        loaded_json = json.load(f)
    
    # extracting and saving team IDs
    for fixture in loaded_json['response']:
        team_id = fixture['teams']['home']['id']
        team_name = fixture['teams']['home']['name']

        mls_team_ids[team_id] = {'name': team_name}
    
# mls_team_ids_df = pd.DataFrame(mls_team_ids).T   
# mls_team_ids_df.sort_values(by='name')    
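
The extraction loop above reads only the home side of each fixture. A variant that reads both sides is sketched below, so a team would still be picked up even if it never appeared as a home team in the saved data. The function name `collect_team_ids` is hypothetical and the sample fixture is abridged.

```python
def collect_team_ids(fixtures):
    """Map team ID -> {'name': ...} using both home and away teams."""
    teams = {}
    for fixture in fixtures:
        for side in ("home", "away"):
            team = fixture["teams"][side]
            teams[team["id"]] = {"name": team["name"]}
    return teams


# Abridged, illustrative fixture list.
fixtures = [
    {"teams": {"home": {"id": 1595, "name": "Seattle Sounders"},
               "away": {"id": 1601, "name": "Toronto FC"}}},
]
print(collect_team_ids(fixtures))
```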

## Requesting Fixture Basic Stats 


In [None]:
fixture_statistics = []
report = []

url = "https://api-football-v1.p.rapidapi.com/v3/fixtures/statistics"

headers = {
    'x-rapidapi-key': secrets['rapid_api_key'],
    'x-rapidapi-host': secrets['rapid_api_host']
    }

for fix in all_fixture_ids:  # all_fixture_ids is loaded from the fixtures CSV in a cell below
    
    querystring = {"fixture":fix}
    
    try:
        response = requests.request("GET", url, headers=headers, params=querystring)
        
    except Exception as e:
        # `datetime` is the class imported at the top of the notebook
        cur_datetime = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        report.append({'event': 'request failed',
                       'datetime': cur_datetime,
                       'fixture id': str(fix),
                       'exception': str(e)
                       })
        time.sleep(.25)
        continue
    
    fix_json = response.json()
    fixture_statistics.append(fix_json)
        
    cur_datetime = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    report.append({'event': 'request succeeded',
                   'datetime': cur_datetime,
                   'fixture id': str(fix),
                   'exception': ''
                   })

    df_report = pd.DataFrame(report)
    df_report.to_csv(f'../data/api_football_data/fixtures-statistics/report.csv', index=False)

    with open(f'../data/api_football_data/fixtures-statistics/fix_adv_stats.json', 'w') as outfile:
        json.dump(obj=fixture_statistics, fp=outfile , ensure_ascii=False, indent=1)
    
    # subscription allows 300 requests per minute (5 per second); 0.25 s between calls stays under that
    time.sleep(.25)      
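
The fixed `time.sleep(.25)` could be replaced with a small throttle that derives the pause from the requests-per-minute limit and only sleeps for the time still remaining since the previous call. This is a sketch with a hypothetical `Throttle` class. The 300-per-minute figure comes from the comment in the cell above.

```python
import time


class Throttle:
    """Space out calls so they never exceed `per_minute` calls per minute."""

    def __init__(self, per_minute):
        self.min_interval = 60.0 / per_minute
        self._last_call = None

    def wait(self):
        """Sleep just long enough to respect the minimum interval."""
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()


throttle = Throttle(per_minute=300)
print(throttle.min_interval)
```

Calling `throttle.wait()` at the top of each loop iteration would then pace the requests, and time already spent on the request itself counts toward the interval.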


In [None]:
import requests
import pandas as pd
import datetime as dt
import time

In [None]:
all_fixtures = pd.read_csv('../data/output_aggragated/mls_fixtures_2015_2020.csv')

all_fixture_ids = all_fixtures['fx_id']

test_set = all_fixture_ids[0:11]

In [None]:
len(all_fixture_ids)

In [None]:
test_set

# TEAMS ENDPOINT

### Sounders Statistics by Season

In [None]:
team_stats_ssfc = []

for year in list(range(2012,2022,1)):
    url = "https://api-football-v1.p.rapidapi.com/v3/teams/statistics"
    querystring = {"league":"253","season":year,"team":"1595"}
    headers = {
        'x-rapidapi-key': secrets['rapid_api_key'],
        'x-rapidapi-host': "api-football-v1.p.rapidapi.com"
        }
    response = requests.request("GET", url, headers=headers, params=querystring)
    
    team_stats_ssfc.append(response.json()['response'])

In [None]:
with open('../data/api_football_data/rapi-team_stats_ssfc_2012_2021.json', 'w') as outfile:
    json.dump(obj=team_stats_ssfc, fp=outfile , ensure_ascii=False, indent=1)

In [None]:
with open('../data/api_football_data/rapi-team_stats_ssfc_2012_2021.json') as f:
    team_stats_ssfc = json.load(f)

### All Teams Statistics by Season

In [195]:
all_team_ids

[1608,
 1607,
 13173,
 1610,
 1613,
 1615,
 2242,
 1597,
 1600,
 1616,
 1605,
 1612,
 1614,
 1609,
 1604,
 1602,
 1598,
 1599,
 1617,
 1606,
 1596,
 1595,
 1611,
 1601,
 1603,
 8007]

In [None]:
for team in all_team_ids:
    team_stats = []
    print(f'Starting {team} at', datetime.now().strftime("%H:%M:%S"))
    for year in list(range(2012,2022,1)):
        url = "https://api-football-v1.p.rapidapi.com/v3/teams/statistics"
        querystring = {"league":"253","season":year,"team":f"{team}"}
        headers = {
            'x-rapidapi-key': secrets['rapid_api_key'],
            'x-rapidapi-host': "api-football-v1.p.rapidapi.com"
            }

        response = requests.request("GET", url, headers=headers, params=querystring)
        team_stats.append(response.json())
        print(year)
#         time.sleep(1)
    with open(f'../data/api_football_data/teams-statistics/rapi-team_stats_{team}.json', 'w') as outfile:
        json.dump(obj=team_stats, fp=outfile , ensure_ascii=False, indent=1)
    print(f'Completed {team} at', datetime.now().strftime("%H:%M:%S"))
    print('')
#     time.sleep(1)

In [None]:
import requests

url = "https://api-football-v1.p.rapidapi.com/v3/standings"

querystring = {"season":"2019","team":"1595"}

headers = {
    'x-rapidapi-key': secrets['rapid_api_key'],
    'x-rapidapi-host': "api-football-v1.p.rapidapi.com"
    }

response = requests.request("GET", url, headers=headers, params=querystring)

In [48]:
response.json()

{'get': 'fixtures',
 'parameters': {'league': '253', 'season': '2021'},
 'errors': [],
 'results': 459,
 'paging': {'current': 1, 'total': 1},
 'response': [{'fixture': {'id': 684127,
    'referee': 'A. Villarreal',
    'timezone': 'UTC',
    'date': '2021-04-17T00:00:00+00:00',
    'timestamp': 1618617600,
    'periods': {'first': 1618617600, 'second': 1618621200},
    'venue': {'id': 1614, 'name': 'BBVA Stadium', 'city': 'Houston, Texas'},
    'status': {'long': 'Match Finished', 'short': 'FT', 'elapsed': 90}},
   'league': {'id': 253,
    'name': 'Major League Soccer',
    'country': 'USA',
    'logo': 'https://media.api-sports.io/football/leagues/253.png',
    'flag': 'https://media.api-sports.io/flags/us.svg',
    'season': 2021,
    'round': 'Regular Season - 1'},
   'teams': {'home': {'id': 1600,
     'name': 'Houston Dynamo',
     'logo': 'https://media.api-sports.io/football/teams/1600.png',
     'winner': True},
    'away': {'id': 1596,
     'name': 'San Jose Earthquakes',
  

In [38]:
import requests

url = "https://api-football-v1.p.rapidapi.com/v3/standings"

querystring = {"season":"2020","league":"253"}

headers = {
    'x-rapidapi-key': secrets['rapid_api_key'],
    'x-rapidapi-host': "api-football-v1.p.rapidapi.com"
    }

response = requests.request("GET", url, headers=headers, params=querystring)

response.json() 

{'get': 'standings',
 'parameters': {'league': '253', 'season': '2020'},
 'errors': [],
 'results': 1,
 'paging': {'current': 1, 'total': 1},
 'response': [{'league': {'id': 253,
    'name': 'Major League Soccer',
    'country': 'USA',
    'logo': 'https://media.api-sports.io/football/leagues/253.png',
    'flag': 'https://media.api-sports.io/flags/us.svg',
    'season': 2020,
    'standings': [[{'rank': 1,
       'team': {'id': 1598,
        'name': 'Orlando City SC',
        'logo': 'https://media.api-sports.io/football/teams/1598.png'},
       'points': 7,
       'goalsDiff': 3,
       'group': 'MLS: Group Stage',
       'form': 'DWW',
       'status': 'same',
       'description': 'Next Round',
       'all': {'played': 3,
        'win': 2,
        'draw': 1,
        'lose': 0,
        'goals': {'for': 6, 'against': 3}},
       'home': {'played': 1,
        'win': 1,
        'draw': 0,
        'lose': 0,
        'goals': {'for': 2, 'against': 1}},
       'away': {'played': 2,
      