The StatsBomb open data repository organizes match data by competition and season, and each match has a unique match ID. To find the matches for a given competition, we first look up its competition and season IDs and then load the corresponding match list:

### 1. **Identify Competition and Season IDs**
First, we need to find the IDs for the competition and season we're interested in. StatsBomb provides a `competitions.json` file that lists every competition/season pair along with its details, including the competition ID and season ID. The excerpt below shows the first of its 70 entries.

```json
{
    "competition_id": 9,
    "season_id": 27,
    "country_name": "Germany",
    "competition_name": "1. Bundesliga",
    "competition_gender": "male",
    "competition_youth": false,
    "competition_international": false,
    "season_name": "2015/2016",
    "match_updated": "2023-12-12T07:43:33.436182",
    "match_updated_360": null,
    "match_available_360": null,
    "match_available": "2023-12-12T07:43:33.436182"
}
```

In [25]:
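import json
import pandas as pd

def list_competitions():
    # Load the competitions index that ships with the StatsBomb open data
    with open('Statsbomb/data/competitions.json', encoding='utf-8') as file:
        competitions = json.load(file)
    # Flatten the list of records into a DataFrame, one row per competition/season
    df_competitions = pd.json_normalize(competitions)
    return df_competitions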

# Call this function to see all available competitions and their details
competitions_df = list_competitions()
competitions_df[['competition_id', 'season_id', 'competition_name', 'season_name']]


| | competition_id | season_id | competition_name | season_name |
|---|---|---|---|---|
| 0 | 9 | 27 | 1. Bundesliga | 2015/2016 |
| 1 | 16 | 4 | Champions League | 2018/2019 |
| 2 | 16 | 1 | Champions League | 2017/2018 |
| 3 | 16 | 2 | Champions League | 2016/2017 |
| 4 | 16 | 27 | Champions League | 2015/2016 |
| ... | ... | ... | ... | ... |
| 65 | 55 | 43 | UEFA Euro | 2020 |
| 66 | 35 | 75 | UEFA Europa League | 1988/1989 |
| 67 | 53 | 106 | UEFA Women's Euro | 2022 |
| 68 | 72 | 107 | Women's World Cup | 2023 |
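
To find the IDs for a single competition without scanning the whole table, the DataFrame can be filtered by name. The following is a minimal sketch: it rebuilds a few rows from the output above as a stand-in for `competitions_df`, and the helper name `find_competition` is hypothetical, not part of the notebook.

```python
import pandas as pd

# Stand-in for competitions_df, built from a few rows of the output above
competitions_df = pd.DataFrame(
    {
        "competition_id": [9, 16, 16, 55],
        "season_id": [27, 4, 1, 43],
        "competition_name": [
            "1. Bundesliga",
            "Champions League",
            "Champions League",
            "UEFA Euro",
        ],
        "season_name": ["2015/2016", "2018/2019", "2017/2018", "2020"],
    }
)


def find_competition(df, name):
    """Return the rows whose competition_name contains `name`, ignoring case."""
    # regex=False treats `name` as plain text, so the '.' in '1. Bundesliga' is literal
    mask = df["competition_name"].str.contains(name, case=False, regex=False)
    columns = ["competition_id", "season_id", "competition_name", "season_name"]
    return df.loc[mask, columns]


champions_league = find_competition(competitions_df, "champions")
print(champions_league)
```

The returned rows carry both `competition_id` and `season_id`, which are the two values needed in the following steps.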


### Creating a competitions dict for finding competitions 
The `create_competitions_dict` function below converts the DataFrame `competitions_df` into a dictionary in which each key is a tuple `(competition_id, season_id)` and each value is a dictionary containing `competition_name` and `season_name`. This makes it possible to look up a competition's name and season directly from its pair of IDs.

In [None]:
def create_competitions_dict(competitions_df):
    # Initialize an empty dictionary to store competition data
    competitions_dict = {}
    
    # Loop through each row in the competitions DataFrame
    for index, row in competitions_df.iterrows():
        # Extract the competition_id and season_id from the current row
        key = (row['competition_id'], row['season_id'])
        
        # Create a dictionary containing competition_name and season_name
        value = {
            'competition_name': row['competition_name'],
            'season_name': row['season_name']
        }
        
        # Check if the combination of competition_id and season_id is not already in the dictionary
        if key not in competitions_dict:
            # If not present, add it to the dictionary with its corresponding value
            competitions_dict[key] = value
            
    # Return the populated competitions dictionary
    return competitions_dict
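
The same mapping can be built more compactly with a dictionary comprehension. This is only a sketch of an alternative, not the notebook's own method: the name `create_competitions_dict_compact` is hypothetical, and the small DataFrame is a stand-in for `competitions_df` with one repeated row added to show that the first occurrence is kept.

```python
import pandas as pd


def create_competitions_dict_compact(competitions_df):
    """Map (competition_id, season_id) to its competition and season names."""
    # Drop repeated ID pairs first so the first occurrence wins, as in the loop version
    unique_rows = competitions_df.drop_duplicates(subset=["competition_id", "season_id"])
    return {
        # Cast to plain int so the keys are ordinary Python integers
        (int(row.competition_id), int(row.season_id)): {
            "competition_name": row.competition_name,
            "season_name": row.season_name,
        }
        for row in unique_rows.itertuples(index=False)
    }


# Stand-in data; the last row repeats the (16, 4) pair on purpose
sample_df = pd.DataFrame(
    {
        "competition_id": [9, 16, 16],
        "season_id": [27, 4, 4],
        "competition_name": ["1. Bundesliga", "Champions League", "Duplicate row"],
        "season_name": ["2015/2016", "2018/2019", "ignored"],
    }
)

sample_dict = create_competitions_dict_compact(sample_df)
print(sample_dict)
```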


In [46]:
# Call the function to create the dictionary
competitions_dict = create_competitions_dict(competitions_df)

competitions_dict

{(9, 27): {'competition_name': '1. Bundesliga', 'season_name': '2015/2016'},
 (16, 4): {'competition_name': 'Champions League', 'season_name': '2018/2019'},
 (16, 1): {'competition_name': 'Champions League', 'season_name': '2017/2018'},
 (16, 2): {'competition_name': 'Champions League', 'season_name': '2016/2017'},
 (16, 27): {'competition_name': 'Champions League',
  'season_name': '2015/2016'},
 (16, 26): {'competition_name': 'Champions League',
  'season_name': '2014/2015'},
 (16, 25): {'competition_name': 'Champions League',
  'season_name': '2013/2014'},
 (16, 24): {'competition_name': 'Champions League',
  'season_name': '2012/2013'},
 (16, 23): {'competition_name': 'Champions League',
  'season_name': '2011/2012'},
 (16, 22): {'competition_name': 'Champions League',
  'season_name': '2010/2011'},
 (16, 21): {'competition_name': 'Champions League',
  'season_name': '2009/2010'},
 (16, 41): {'competition_name': 'Champions League',
  'season_name': '2008/2009'},
 ...

In [68]:
# Print the created dictionary
print("Dictionary for Finding Competition IDs:\n")
prev_competition_id = None
for key, value in competitions_dict.items():
    if key[0] != prev_competition_id:
        print("=== COMPETITION ID: ", key[0], " ===> ",value['competition_name'])
    print("Season ID:", key[1])
    print("Season:", value['season_name'])
    print()
    prev_competition_id = key[0]

Dictionary for Finding Competition IDs:

=== COMPETITION ID:  9  ===>  1. Bundesliga
Season ID: 27
Season: 2015/2016

=== COMPETITION ID:  16  ===>  Champions League
Season ID: 4
Season: 2018/2019

Season ID: 1
Season: 2017/2018

Season ID: 2
Season: 2016/2017

Season ID: 27
Season: 2015/2016

Season ID: 26
Season: 2014/2015

Season ID: 25
Season: 2013/2014

Season ID: 24
Season: 2012/2013

Season ID: 23
Season: 2011/2012

Season ID: 22
Season: 2010/2011

Season ID: 21
Season: 2009/2010

Season ID: 41
Season: 2008/2009

Season ID: 39
Season: 2006/2007

Season ID: 37
Season: 2004/2005

Season ID: 44
Season: 2003/2004

Season ID: 76
Season: 1999/2000

Season ID: 277
Season: 1972/1973

Season ID: 71
Season: 1971/1972

Season ID: 276
Season: 1970/1971

=== COMPETITION ID:  87  ===>  Copa del Rey
Season ID: 84
Season: 1983/1984

Season ID: 268
Season: 1982/1983

Season ID: 279
Season: 1977/1978

=== COMPETITION ID:  37  ===>  FA Women's Super League
Season ID: 90
Season: 2020/2021

...

### List available seasons for competition_id

In [71]:
# Competition to list the seasons for
competition_id = 16

# Collect every season ID stored under the given competition ID
seasons = [
    season_id
    for (comp_id, season_id) in competitions_dict
    if comp_id == competition_id
]

# Print the list of seasons associated with the given competition ID
if seasons:
    print(f"Seasons for Competition ID {competition_id}:")
    for season_id in seasons:
        # Look the season name up directly by its (competition_id, season_id) key
        season_name = competitions_dict[(competition_id, season_id)]['season_name']
        print(f"- Season ID: {season_id}, Season Name: {season_name}")
else:
    print(f"No seasons found for Competition ID {competition_id}.")



Seasons for Competition ID 16:
- Season ID: 4, Season Name: 2018/2019
- Season ID: 1, Season Name: 2017/2018
- Season ID: 2, Season Name: 2016/2017
- Season ID: 27, Season Name: 2015/2016
- Season ID: 26, Season Name: 2014/2015
- Season ID: 25, Season Name: 2013/2014
- Season ID: 24, Season Name: 2012/2013
- Season ID: 23, Season Name: 2011/2012
- Season ID: 22, Season Name: 2010/2011
- Season ID: 21, Season Name: 2009/2010
- Season ID: 41, Season Name: 2008/2009
- Season ID: 39, Season Name: 2006/2007
- Season ID: 37, Season Name: 2004/2005
- Season ID: 44, Season Name: 2003/2004
- Season ID: 76, Season Name: 1999/2000
- Season ID: 277, Season Name: 1972/1973
- Season ID: 71, Season Name: 1971/1972
- Season ID: 276, Season Name: 1970/1971
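
If seasons are looked up by competition often, regrouping the flat dictionary once avoids scanning every key each time. The sketch below assumes a dictionary shaped like `competitions_dict`; the three entries are copied from the output above, and the name `seasons_by_competition` is hypothetical.

```python
from collections import defaultdict

# Stand-in for competitions_dict, using three entries from the output above
competitions_dict = {
    (9, 27): {"competition_name": "1. Bundesliga", "season_name": "2015/2016"},
    (16, 4): {"competition_name": "Champions League", "season_name": "2018/2019"},
    (16, 1): {"competition_name": "Champions League", "season_name": "2017/2018"},
}

# Regroup as competition_id -> {season_id: season_name}
seasons_by_competition = defaultdict(dict)
for (comp_id, season_id), info in competitions_dict.items():
    seasons_by_competition[comp_id][season_id] = info["season_name"]

# All seasons of one competition are now a single lookup away
for season_id, season_name in seasons_by_competition[16].items():
    print(f"- Season ID: {season_id}, Season Name: {season_name}")
```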


### 2. **List Matches for a Competition**
Once you have the competition and season IDs, you can locate the JSON file that lists all matches for that pair. In the local copy used here it sits at `Statsbomb/data/matches/<competition_id>/<season_id>.json`, and each entry in it carries the `match_id` of one match in that competition and season.

In [85]:
# Accessing elements in the dictionary
competition_id = 16 # Champions League
season_id = 4 # 2018/2019

# Construct the key using competition_id and season_id
key = (competition_id, season_id)

# Accessing the value corresponding to the key
if key in competitions_dict:
    competition_info = competitions_dict[key]
    print("Competition Name:", competition_info['competition_name'])
    print("Season Name:", competition_info['season_name'])
else:
    print("Competition ID and Season ID combination not found in the dictionary.")


Competition Name: Champions League
Season Name: 2018/2019


### Loading matches

In [86]:
# The matches file is stored under matches/<competition_id>/<season_id>.json
with open(f'Statsbomb/data/matches/{competition_id}/{season_id}.json', encoding='utf-8') as f:
    matches = json.load(f)
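
When several competitions or seasons are loaded, it can help to wrap the path construction in a small function. The following is a sketch under stated assumptions: the helper name `load_matches` and its `data_dir` parameter are hypothetical, the default directory mirrors the path used in this notebook, and returning an empty list for a missing file is a design choice of this sketch rather than anything StatsBomb prescribes.

```python
import json
from pathlib import Path


def load_matches(competition_id, season_id, data_dir="Statsbomb/data"):
    """Load the match list for one competition and season, or [] if the file is missing."""
    # Build <data_dir>/matches/<competition_id>/<season_id>.json
    path = Path(data_dir) / "matches" / str(competition_id) / f"{season_id}.json"
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return json.load(f)


# Champions League (16), season 2018/2019 (4)
matches = load_matches(16, 4)
print(f"Loaded {len(matches)} match(es)")
```

Raising an exception instead of returning `[]` may be preferable when a missing file should stop the analysis.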

In [97]:
len(matches)  # There is only one match for this competition and season

1

In [92]:
matches[0]

{'match_id': 22912,
 'match_date': '2019-06-01',
 'kick_off': '21:00:00.000',
 'competition': {'competition_id': 16,
  'country_name': 'Europe',
  'competition_name': 'Champions League'},
 'season': {'season_id': 4, 'season_name': '2018/2019'},
 'home_team': {'home_team_id': 38,
  'home_team_name': 'Tottenham Hotspur',
  'home_team_gender': 'male',
  'home_team_group': None,
  'country': {'id': 68, 'name': 'England'},
  'managers': [{'id': 81,
    'name': 'Mauricio Roberto Pochettino Trossero',
    'nickname': 'Mauricio Pochettino',
    'dob': '1972-03-02',
    'country': {'id': 11, 'name': 'Argentina'}}]},
 'away_team': {'away_team_id': 24,
  'away_team_name': 'Liverpool',
  'away_team_gender': 'male',
  'away_team_group': None,
  'country': {'id': 68, 'name': 'England'},
  'managers': [{'id': 94,
    'name': 'Jürgen Klopp',
    'nickname': None,
    'dob': '1967-06-16',
    'country': {'id': 85, 'name': 'Germany'}}]},
 'home_score': 0,
 'away_score': 2,
 'match_status': 'available',


In [98]:
for match in matches:
    # Team names are nested inside the home_team / away_team dictionaries
    home_team_name = match['home_team']['home_team_name']
    away_team_name = match['away_team']['away_team_name']
    home_score = match['home_score']
    away_score = match['away_score']
    print(f'The match between {home_team_name} and {away_team_name} finished {home_score} : {away_score}')

The match between Tottenham Hotspur and Liverpool finished 0 : 2
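
Since the goal of this step is to obtain match IDs, the match list can also be flattened into a table and the IDs collected from it. This is a minimal sketch: the `matches` list here is a trimmed stand-in for the record shown above, and the names `summary` and `match_ids` are hypothetical.

```python
import pandas as pd

# Stand-in for `matches`, trimmed from the record shown above
matches = [
    {
        "match_id": 22912,
        "match_date": "2019-06-01",
        "home_team": {"home_team_id": 38, "home_team_name": "Tottenham Hotspur"},
        "away_team": {"away_team_id": 24, "away_team_name": "Liverpool"},
        "home_score": 0,
        "away_score": 2,
    }
]

# json_normalize flattens nested keys into dotted column names,
# e.g. 'home_team.home_team_name'
matches_df = pd.json_normalize(matches)

# Keep the columns needed for a one-line-per-match summary
summary = matches_df[
    [
        "match_id",
        "match_date",
        "home_team.home_team_name",
        "away_team.away_team_name",
        "home_score",
        "away_score",
    ]
].rename(
    columns={
        "home_team.home_team_name": "home_team",
        "away_team.away_team_name": "away_team",
    }
)

# Plain Python list of match IDs for this competition and season
match_ids = summary["match_id"].tolist()
print(summary)
print(match_ids)
```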
