# Working with StatsBomb competition and match data

### Import libraries and modules

In [1]:
import json

### Load in the competition and match data

In [14]:
# loading in the competition data
with open('SoccermaticsForPython-master/StatsBomb/data/competitions.json', encoding='utf8') as f:
    competitions = json.load(f)

Let's see what data we have for a competition.

In [9]:
competitions[0]

{'competition_id': 16,
 'season_id': 4,
 'country_name': 'Europe',
 'competition_name': 'Champions League',
 'competition_gender': 'male',
 'season_name': '2018/2019',
 'match_updated': '2020-10-25T12:33:27.855343',
 'match_available': '2020-10-25T12:33:27.855343'}

Let's have a look at all the competitions and seasons we have in our competition data.

In [11]:
for competition in competitions:
    name = competition['competition_name']
    season = competition['season_name']
    print(f'{name} {season}')

Champions League 2018/2019
Champions League 2017/2018
Champions League 2016/2017
Champions League 2015/2016
Champions League 2014/2015
Champions League 2013/2014
Champions League 2012/2013
Champions League 2011/2012
Champions League 2010/2011
Champions League 2009/2010
Champions League 2008/2009
Champions League 2006/2007
Champions League 2004/2005
Champions League 2003/2004
Champions League 1999/2000
FA Women's Super League 2019/2020
FA Women's Super League 2018/2019
FIFA World Cup 2018
La Liga 2019/2020
La Liga 2018/2019
La Liga 2017/2018
La Liga 2016/2017
La Liga 2015/2016
La Liga 2014/2015
La Liga 2013/2014
La Liga 2012/2013
La Liga 2011/2012
La Liga 2010/2011
La Liga 2009/2010
La Liga 2008/2009
La Liga 2007/2008
La Liga 2006/2007
La Liga 2005/2006
La Liga 2004/2005
NWSL 2018
Premier League 2003/2004
Women's World Cup 2019
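
The flat listing above can also be grouped so that each competition appears once alongside its seasons. The following is only a minimal sketch: `sample_competitions` is a hypothetical stand-in holding a few entries copied from the output above, and `seasons_by_competition` is a helper name introduced here for illustration. In the notebook you would pass `competitions` instead.

```python
from collections import defaultdict

# Hypothetical stand-in for `competitions`, using a few entries from the listing above.
sample_competitions = [
    {'competition_name': 'Champions League', 'season_name': '2018/2019'},
    {'competition_name': 'Champions League', 'season_name': '2017/2018'},
    {'competition_name': 'FIFA World Cup', 'season_name': '2018'},
    {'competition_name': 'La Liga', 'season_name': '2014/2015'},
]


def seasons_by_competition(competitions):
    """Map each competition name to the list of its seasons, in file order."""
    grouped = defaultdict(list)
    for competition in competitions:
        grouped[competition['competition_name']].append(competition['season_name'])
    return dict(grouped)


for name, seasons in seasons_by_competition(sample_competitions).items():
    print(f'{name}: {", ".join(seasons)}')
```

Grouping this way makes it easier to spot which competitions have many seasons available and which have only one.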


I'd like to investigate the data from the 2014-15 La Liga season, the most recent season in which FC Barcelona won the treble (La Liga, Copa del Rey and the UEFA Champions League). Let's load the La Liga match data for that season.

In [12]:
# finding the competition id and season id of La Liga 2014-15 season to help fetch match data from this season
for competition in competitions:
    if (competition['competition_name'] == 'La Liga') and (competition['season_name'] == '2014/2015'):
        competition_id = competition['competition_id']
        season_id = competition['season_id']
        break
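
Note that the loop above leaves `competition_id` and `season_id` undefined if no entry matches. One possible way to make that failure explicit is sketched below, assuming the same record layout. `find_competition` is a hypothetical helper, and `sample_competitions` is a stand-in whose ids are copied from the outputs shown in this notebook.

```python
def find_competition(competitions, name, season):
    """Return (competition_id, season_id) for the first match, or None if absent."""
    return next(
        ((c['competition_id'], c['season_id'])
         for c in competitions
         if c['competition_name'] == name and c['season_name'] == season),
        None,
    )


# Hypothetical stand-in for `competitions`; ids are taken from this notebook's outputs.
sample_competitions = [
    {'competition_id': 16, 'season_id': 4,
     'competition_name': 'Champions League', 'season_name': '2018/2019'},
    {'competition_id': 11, 'season_id': 26,
     'competition_name': 'La Liga', 'season_name': '2014/2015'},
]

result = find_competition(sample_competitions, 'La Liga', '2014/2015')
if result is None:
    raise ValueError('La Liga 2014/2015 not found in the competition data')
competition_id, season_id = result
```

Raising early keeps a misspelled competition or season name from surfacing later as a confusing `NameError`.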

In [19]:
# loading in La Liga 2014-15 match data
match_data_location = f'SoccermaticsForPython-master/StatsBomb/data/matches/{competition_id}/{season_id}.json'
with open(match_data_location, encoding='utf8') as f:
    matches = json.load(f)
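
As an alternative to assembling the file path from strings, the same location can be built with `pathlib`. This is a sketch rather than the notebook's own method, and the two ids are hard-coded here as an assumption (they are the values that appear in the La Liga 2014/2015 match record printed in this notebook) so that the snippet runs on its own.

```python
from pathlib import Path

# Assumed values: the La Liga 2014/2015 ids that appear in this notebook's match output.
competition_id, season_id = 11, 26

# Join the path segments; Path takes care of the separators.
data_dir = Path('SoccermaticsForPython-master/StatsBomb/data')
match_data_location = data_dir / 'matches' / str(competition_id) / f'{season_id}.json'

print(match_data_location.as_posix())
```

The resulting `Path` object can be passed directly to `open(match_data_location, encoding='utf8')`.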

Let's check out what data we have for a single match.

In [22]:
matches[0]

{'match_id': 266117,
 'match_date': '2014-09-27',
 'kick_off': '18:00:00.000',
 'competition': {'competition_id': 11,
  'country_name': 'Spain',
  'competition_name': 'La Liga'},
 'season': {'season_id': 26, 'season_name': '2014/2015'},
 'home_team': {'home_team_id': 217,
  'home_team_name': 'Barcelona',
  'home_team_gender': 'male',
  'home_team_group': None,
  'country': {'id': 214, 'name': 'Spain'},
  'managers': [{'id': 793,
    'name': 'Luis Enrique Martínez García',
    'nickname': 'Luis Enrique',
    'dob': None,
    'country': {'id': 214, 'name': 'Spain'}}]},
 'away_team': {'away_team_id': 1049,
  'away_team_name': 'Granada',
  'away_team_gender': 'male',
  'away_team_group': None,
  'country': {'id': 214, 'name': 'Spain'},
  'managers': [{'id': 497,
    'name': 'Joaquín de Jesús Caparrós Camino',
    'nickname': 'Joaquín Caparrós',
    'dob': None,
    'country': {'id': 214, 'name': 'Spain'}}]},
 'home_score': 6,
 'away_score': 0,
 'match_status': 'available',
 'last_updated':

You can see that FC Barcelona's name is saved as 'Barcelona' in this data. We will need this information later.
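
Because later filtering depends on the exact spelling of a team name, it can help to list every name that occurs in the data. The sketch below assumes the nested `home_team` / `away_team` layout shown above. `team_names` is a hypothetical helper, and `sample_matches` is a stand-in with two fixtures from this notebook's data, reduced to the fields used here.

```python
def team_names(matches):
    """Collect every team name that appears as the home or away side."""
    names = set()
    for match in matches:
        names.add(match['home_team']['home_team_name'])
        names.add(match['away_team']['away_team_name'])
    return sorted(names)


# Hypothetical stand-in for `matches`, keeping only the team-name fields.
sample_matches = [
    {'home_team': {'home_team_name': 'Barcelona'},
     'away_team': {'away_team_name': 'Granada'}},
    {'home_team': {'home_team_name': 'Levante'},
     'away_team': {'away_team_name': 'Barcelona'}},
]

print(team_names(sample_matches))
```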

### Data analysis

A quick check shows that this JSON file contains match results only for FC Barcelona's La Liga 2014-15 games:

In [32]:
total_matches = 0
for match in matches:
    home_team = match['home_team']['home_team_name']
    away_team = match['away_team']['away_team_name']
    total_matches += 1
    print(f'Home team: {home_team}, Away team: {away_team}')
print('\n')
print(f'Total number of matches listed in the json file: {total_matches}')

Home team: Barcelona, Away team: Granada
Home team: Barcelona, Away team: Rayo Vallecano
Home team: Barcelona, Away team: Real Madrid
Home team: Barcelona, Away team: Deportivo La Coruna
Home team: Levante, Away team: Barcelona
Home team: Barcelona, Away team: Getafe
Home team: Getafe, Away team: Barcelona
Home team: Barcelona, Away team: Athletic Bilbao
Home team: Barcelona, Away team: Celta Vigo
Home team: Barcelona, Away team: Eibar
Home team: Córdoba, Away team: Barcelona
Home team: Barcelona, Away team: Almería
Home team: Sevilla, Away team: Barcelona
Home team: Villarreal, Away team: Barcelona
Home team: Real Madrid, Away team: Barcelona
Home team: Rayo Vallecano, Away team: Barcelona
Home team: Barcelona, Away team: Málaga
Home team: Barcelona, Away team: Real Sociedad
Home team: Barcelona, Away team: Atlético Madrid
Home team: Real Sociedad, Away team: Barcelona
Home team: Barcelona, Away team: Villarreal
Home team: Barcelona, Away team: Sevilla
Home team: Málaga, Away team: Ba

Notice that each of the matches listed above has FC Barcelona as either the home team or the away team. Let's confirm this. If this JSON file contains all of FC Barcelona's results from La Liga 2014-15, and nothing else, we should find 19 matches with FC Barcelona as the home team and 19 with FC Barcelona as the away team (La Liga has 20 teams, so each team plays each of the other 19 twice: once at home and once away).

In [34]:
home_matches, away_matches = 0, 0
for match in matches:
    if match['home_team']['home_team_name'] == 'Barcelona':
        home_matches += 1
    elif match['away_team']['away_team_name'] == 'Barcelona':
        away_matches += 1
print(f'Number of matches in the json file with FC Barcelona as the home team: {home_matches}')
print(f'Number of matches in the json file with FC Barcelona as the away team: {away_matches}')

Number of matches in the json file with FC Barcelona as the home team: 19
Number of matches in the json file with FC Barcelona as the away team: 19
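
The counting loop above does not report matches in which FC Barcelona is absent altogether. A slightly stricter check, sketched here under the same record layout, also counts those and compares the total against the double round-robin expectation of 2 × (20 − 1) = 38 matches per team. `home_away_counts` and `expected_matches_per_team` are hypothetical helper names, and `sample_matches` is a stand-in with three fixtures from the listing above.

```python
from collections import Counter


def home_away_counts(matches, team):
    """Count matches where `team` is home, away, or not involved at all."""
    counts = Counter()
    for match in matches:
        if match['home_team']['home_team_name'] == team:
            counts['home'] += 1
        elif match['away_team']['away_team_name'] == team:
            counts['away'] += 1
        else:
            counts['other'] += 1
    return counts


def expected_matches_per_team(n_teams):
    """Double round-robin: each team meets the other n_teams - 1 sides home and away."""
    return 2 * (n_teams - 1)


# Hypothetical stand-in for `matches`, keeping only the team-name fields.
sample_matches = [
    {'home_team': {'home_team_name': 'Barcelona'},
     'away_team': {'away_team_name': 'Granada'}},
    {'home_team': {'home_team_name': 'Levante'},
     'away_team': {'away_team_name': 'Barcelona'}},
    {'home_team': {'home_team_name': 'Real Madrid'},
     'away_team': {'away_team_name': 'Barcelona'}},
]

counts = home_away_counts(sample_matches, 'Barcelona')
print(counts['home'], counts['away'], counts['other'])
print(expected_matches_per_team(20))
```

On the full `matches` list, an `'other'` count of zero together with `counts['home'] + counts['away'] == expected_matches_per_team(20)` would support the conclusion that the file holds exactly FC Barcelona's league season.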


Now we are sure that the JSON file we loaded for La Liga 2014-15 contains match data only for FC Barcelona. That suits us, since FC Barcelona's matches from this season are exactly what we need.

First, I will create a list of all of FC Barcelona's match results in La Liga 2014-15 and add them to a DataFrame.

In [None]:
match_results = []
for match in matches:
    