In [1]:
from mplsoccer import Sbopen
parser = Sbopen()

## Parser

Using the parser, we can access various dataframes starting from the high level (competitions, like WWC, AFCON...) all the way to the individual matches.

One thing to note is how the IDs work. You might expect the row index of a `competition_name` entry to double as its ID. It does not.

For example, looking up the Women's World Cup in `competition_name` shows that it sits in row 70. The row label is not the ID, though: the `competition_id` column gives row 70 an ID of 72.
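To make the difference between row labels and IDs concrete, here is a minimal sketch that looks the ID up with a boolean mask instead of reading it off by eye. `df_sample` is a stand-in for `parser.competition()` holding four rows copied from the outputs in this notebook, and `competition_ids` is a hypothetical helper, not part of mplsoccer.

```python
import pandas as pd

# Stand-in for parser.competition(): four rows copied from the outputs in this
# notebook, keeping their original row labels (0, 1, 69, 70).
df_sample = pd.DataFrame(
    {
        "competition_name": [
            "1. Bundesliga",
            "African Cup of Nations",
            "Women's World Cup",
            "Women's World Cup",
        ],
        "competition_id": [9, 1267, 72, 72],
    },
    index=[0, 1, 69, 70],
)


def competition_ids(df_comp, name):
    """Return the distinct competition_id values for an exact competition_name.

    Hypothetical helper for illustration only.
    """
    mask = df_comp["competition_name"] == name
    return df_comp.loc[mask, "competition_id"].unique().tolist()


# Row labels where the competition appears (one row per season).
wwc_rows = df_sample.index[df_sample["competition_name"] == "Women's World Cup"].tolist()
# The ID shared by those rows, which is what the parser needs.
wwc_ids = competition_ids(df_sample, "Women's World Cup")

print(wwc_rows)
print(wwc_ids)
```

With the real data, the same helper should work on the frame returned by `parser.competition()`, since it only relies on the `competition_name` and `competition_id` columns.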

## parser.competition()

This returns all the competition info. The important keys are `competition_id` and `season_id`. Together they identify one season of one competition, which is what we need to load its matches.

In [17]:
df_comp = parser.competition()
df_comp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 71 entries, 0 to 70
Data columns (total 12 columns):
 #   Column                     Non-Null Count  Dtype 
---  ------                     --------------  ----- 
 0   competition_id             71 non-null     int64 
 1   season_id                  71 non-null     int64 
 2   country_name               71 non-null     object
 3   competition_name           71 non-null     object
 4   competition_gender         71 non-null     object
 5   competition_youth          71 non-null     bool  
 6   competition_international  71 non-null     bool  
 7   season_name                71 non-null     object
 8   match_updated              71 non-null     object
 9   match_updated_360          54 non-null     object
 10  match_available_360        8 non-null      object
 11  match_available            71 non-null     object
dtypes: bool(2), int64(2), object(8)
memory usage: 5.8+ KB


Say we want to look at the Bundesliga. Let's first find which row it is in:

In [22]:
df_comp['competition_name']

0              1. Bundesliga
1     African Cup of Nations
2           Champions League
3           Champions League
4           Champions League
               ...          
66                 UEFA Euro
67        UEFA Europa League
68         UEFA Women's Euro
69         Women's World Cup
70         Women's World Cup
Name: competition_name, Length: 71, dtype: object

It is in the first row (index 0). To get the ID of this competition, we read the value at the same index in `competition_id`.

In [23]:
df_comp['competition_id']

0        9
1     1267
2       16
3       16
4       16
      ... 
66      55
67      35
68      53
69      72
70      72
Name: competition_id, Length: 71, dtype: int64

The ID is `9`. We can do the same with `season_name` and `season_id`. Here I've selected the 2015/16 season, which has an ID of `27`.

Note: for the Bundesliga, the open data only includes season `27` (2015/16). I had to check the data to find this.
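The manual lookup above can also be scripted. The sketch below is one possible way to turn a competition name and season label into the two IDs; `match_kwargs` is a hypothetical helper, and `df_sample` is a one-row stand-in built from the values quoted in this notebook. The season label is written as it appears in the text, so check the exact string in `df_comp['season_name']` before relying on it.

```python
import pandas as pd


def match_kwargs(df_comp, competition_name, season_name):
    """Return {'competition_id': ..., 'season_id': ...} for one competition season.

    Hypothetical helper: raises LookupError unless exactly one
    (competition_id, season_id) pair matches both names.
    """
    mask = (df_comp["competition_name"] == competition_name) & (
        df_comp["season_name"] == season_name
    )
    rows = df_comp.loc[mask, ["competition_id", "season_id"]].drop_duplicates()
    if len(rows) != 1:
        raise LookupError(
            f"expected one match for {competition_name!r} {season_name!r}, found {len(rows)}"
        )
    row = rows.iloc[0]
    # Convert NumPy integers to plain Python ints.
    return {
        "competition_id": int(row["competition_id"]),
        "season_id": int(row["season_id"]),
    }


# One-row stand-in for parser.competition(), using the values quoted in the text.
df_sample = pd.DataFrame(
    {
        "competition_id": [9],
        "season_id": [27],
        "competition_name": ["1. Bundesliga"],
        "season_name": ["2015/16"],  # assumption: label as written in the text
    }
)

kwargs = match_kwargs(df_sample, "1. Bundesliga", "2015/16")
print(kwargs)
```

With the real frame, the result could be passed on as `parser.match(**match_kwargs(df_comp, ...))`. Raising on zero or multiple matches avoids silently picking the wrong season.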

In [38]:
df_match = parser.match(competition_id=9, season_id=27)
# match() takes the numeric IDs; look the names up in df_comp first to find them

df_match.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 306 entries, 0 to 305
Data columns (total 52 columns):
 #   Column                           Non-Null Count  Dtype         
---  ------                           --------------  -----         
 0   match_id                         306 non-null    int64         
 1   match_date                       306 non-null    datetime64[ns]
 2   kick_off                         306 non-null    datetime64[ns]
 3   home_score                       306 non-null    int64         
 4   away_score                       306 non-null    int64         
 5   match_status                     306 non-null    object        
 6   match_status_360                 306 non-null    object        
 7   last_updated                     306 non-null    datetime64[ns]
 8   last_updated_360                 0 non-null      datetime64[ns]
 9   match_week                       306 non-null    int64         
 10  competition_id                   306 non-null    int64        

## Event data

We will mostly be using "Events" data, which covers things like shots, passes, etc. You access these with a `match_id`.

The `event` method returns a tuple of four dataframes: events, related, freeze and tactics.
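Indexing the tuple with `[0]` works, but naming the four frames can make later code easier to read. The sketch below wraps the call in a `NamedTuple`; `EventFrames` and `load_event_frames` are hypothetical names introduced here, and the field order assumes the order listed above (events, related, freeze, tactics).

```python
from typing import NamedTuple

import pandas as pd


class EventFrames(NamedTuple):
    """Hypothetical container naming the four frames returned by parser.event()."""

    events: pd.DataFrame
    related: pd.DataFrame
    freeze: pd.DataFrame
    tactics: pd.DataFrame


def load_event_frames(parser, match_id):
    """Call parser.event(match_id) and label the resulting four-tuple.

    Assumes the tuple order is events, related, freeze, tactics.
    """
    return EventFrames(*parser.event(match_id))


# Usage with the parser from this notebook (requires network access):
#   frames = load_event_frames(parser, 3890511)
#   frames.events.info()
```

Because a `NamedTuple` is still a tuple, `frames[0]` and unpacking into four variables keep working alongside `frames.events`.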

In [43]:
df_event = parser.event(match_id=3890511)[0]
df_event.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3395 entries, 0 to 3394
Data columns (total 67 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              3395 non-null   object 
 1   index                           3395 non-null   int64  
 2   period                          3395 non-null   int64  
 3   timestamp                       3395 non-null   object 
 4   minute                          3395 non-null   int64  
 5   second                          3395 non-null   int64  
 6   possession                      3395 non-null   int64  
 7   duration                        2489 non-null   float64
 8   match_id                        3395 non-null   int64  
 9   type_id                         3395 non-null   int64  
 10  type_name                       3395 non-null   object 
 11  possession_team_id              3395 non-null   int64  
 12  possession_team_name            33

## 360 data

Finally, there is also 360 data, which tracks not only the location of an event but also the locations of the players around it. To open it we need the ID of the match. Later, we will also need the ID of the event.

In [44]:
df_frame, df_visible = parser.frame(3788741)

# exploring the data
df_frame.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45737 entries, 0 to 45736
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   teammate  45737 non-null  bool   
 1   actor     45737 non-null  bool   
 2   keeper    45737 non-null  bool   
 3   match_id  45737 non-null  int64  
 4   id        45737 non-null  object 
 5   x         45737 non-null  float64
 6   y         45737 non-null  float64
dtypes: bool(3), float64(2), int64(1), object(1)
memory usage: 1.5+ MB
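As a rough illustration of why the event ID matters, the sketch below selects the player positions recorded for a single event. It assumes the `id` column of `df_frame` holds the event ID, so each event contributes one row per visible player. `players_for_event` is a hypothetical helper, and `df_sample` contains made-up values that only mirror the column layout shown above.

```python
import pandas as pd


def players_for_event(df_frame, event_id):
    """Return the rows (one per visible player) recorded for a single event.

    Hypothetical helper; assumes df_frame['id'] stores the event ID.
    """
    return df_frame.loc[df_frame["id"] == event_id].reset_index(drop=True)


# Made-up stand-in with the same columns as df_frame. The IDs and coordinates
# are placeholders, not real StatsBomb values.
df_sample = pd.DataFrame(
    {
        "teammate": [True, False, True],
        "actor": [True, False, False],
        "keeper": [False, True, False],
        "match_id": [3788741, 3788741, 3788741],
        "id": ["event-a", "event-a", "event-b"],
        "x": [60.0, 110.0, 30.0],
        "y": [40.0, 40.0, 20.0],
    }
)

players = players_for_event(df_sample, "event-a")
print(players[["teammate", "actor", "keeper", "x", "y"]])
```

With the real data, the `event_id` would come from the `id` column of the events dataframe for the same match.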
