# Wyscout To SPADL Conversion

### What is SPADL?

SPADL (Soccer Player Action Description Language) represents a game as a sequence of on-the-ball actions [a1, a2, ..., am], where m is the total number of actions that happened in the game.

SPADL uses a standardized coordinate system with the origin on the bottom left of the pitch, and a uniform field of 105m x 68m. For direction of play, SPADL uses the “home team attacks to the right” convention, but this can be converted conveniently with the play_left_to_right() function such that the lower x-coordinates represent the own half of the team performing the action.
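
As a rough illustration of the coordinate conventions above, the sketch below scales provider coordinates onto the 105m x 68m SPADL pitch and mirrors a point for the team attacking the other way. It assumes the provider reports positions as 0-100 percentages with the origin in the top-left corner (an assumption about Wyscout, not something stated above), and the helper names `to_spadl_coords` and `mirror` are hypothetical, not part of socceraction.

```python
FIELD_LENGTH = 105.0  # metres, SPADL pitch length
FIELD_WIDTH = 68.0    # metres, SPADL pitch width


def to_spadl_coords(x_pct, y_pct):
    """Scale 0-100 percentage coordinates to metres and flip the y-axis.

    Assumes the provider's origin is the top-left corner of the pitch.
    """
    x = x_pct * FIELD_LENGTH / 100
    y = (100 - y_pct) * FIELD_WIDTH / 100
    return x, y


def mirror(x, y):
    """Rotate a point 180 degrees around the centre spot."""
    return FIELD_LENGTH - x, FIELD_WIDTH - y


print(to_spadl_coords(50, 50))  # centre spot
print(mirror(10.0, 8.0))
```

Mirroring the actions of one team in this way is roughly what a "play left to right" conversion has to do so that low x-coordinates always mean the acting team's own half.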
<br>
<br>
**A note on Atomic SPADL**<br>
In the Atomic-SPADL representation, all actions are atomic in the sense that they are always completed successfully without interruption. Consequently, while SPADL treats a pass as one action consisting of both the initiation and receival of the pass, Atomic-SPADL sees giving and receiving a pass as two separate actions. Because not all passes successfully reach a teammate, Atomic-SPADL introduces an interception action if the ball was intercepted by the other team or an out event if the ball went out of play. Atomic-SPADL similarly divides shots, freekicks, and corners into two separate actions. Practically, the effect is that this representation helps to distinguish the contribution of the player who initiates the action (e.g., gives the pass) and the player who completes the action (e.g., receives the pass).
<br>
The Atomic-SPADL format is produced by converting from the original SPADL format, so once we have SPADL actions we do not need to write separate Wyscout code in order to use it. <br>
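
To make the pass example concrete, here is a minimal sketch of splitting one SPADL-style pass into two atomic-style actions. It is not the socceraction conversion code; the dictionary keys and the `next_player_id` field are hypothetical, and a failed pass is simplified to an interception (the "out" case is ignored).

```python
def split_pass(action):
    """Split one SPADL-style pass into two atomic-style actions."""
    first = {
        "type": "pass",
        "player_id": action["player_id"],
        "x": action["start_x"],
        "y": action["start_y"],
    }
    # A successful pass is completed by a receival; a failed one is
    # simplified here to an interception by the opposing team.
    second = {
        "type": "receival" if action["result"] == "success" else "interception",
        "player_id": action.get("next_player_id"),  # hypothetical field
        "x": action["end_x"],
        "y": action["end_y"],
    }
    return [first, second]


example = {
    "player_id": 1, "next_player_id": 2, "result": "success",
    "start_x": 10.0, "start_y": 20.0, "end_x": 30.0, "end_y": 25.0,
}
for atomic_action in split_pass(example):
    print(atomic_action)
```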
<br>
**Action Type**<br>
The action type attribute can have 22 possible values. These are pass, cross, throw-in, crossed free kick, short free kick, crossed corner, short corner, take-on, foul, tackle, interception, shot, penalty shot, free kick shot, keeper save, keeper claim, keeper punch, keeper pick-up, clearance, bad touch, dribble and goal kick. A detailed definition of each action type is available in the socceraction documentation.

**Result**<br>
The result attribute can either have the value success, to indicate that an action achieved its intended result, or the value fail, if this was not the case. An example of a successful action is a pass which reaches a teammate. An example of an unsuccessful action is a pass which goes over the sideline. Some action types can have special results. These are offside (for passes, corners and free-kicks), own goal (for shots), and yellow card and red card (for fouls).

**Body Part**<br>
The body part attribute can have 4 possible values. These are foot, head, other and none. For Wyscout, which does not distinguish between the head and other body parts, a special body part head/other is used.

### The problem we need to solve: moving to v3

The old Wyscout event format from version 2 of the API looks like this:
```
{
    "tags": [
    {
    "id": 1802,
    "tag": {
    "label": "not accurate"
            }
        }
    ],
    "eventId": 8,
    "eventName": "Pass",
    "eventSec": 1.8496730000000001,
    "id": 663292348,
    "matchId": 2852835,
    "matchPeriod": "1H",
    "playerId": 21123,
    "positions": [
        {
        "x": 52,
        "y": 47
        },
        {
        "x": 60,
        "y": 32
        }
    ],
    "subEventId": 85,
    "subEventName": "Simple pass",
    "teamId": 3185
}

```
The new version three format is somewhat more complex:
```
{
    "id": 601919968,
    "matchId": -168770,
    "matchPeriod": "1H",
    "minute": 8,
    "second": 21,
    "matchTimestamp": "00:08:21.568",
    "videoTimestamp": "507.568215",
    "relatedEventId": 601919969,
    "type": {
    "primary": "pass",
    "secondary": [
        "back_pass"
    ]
    },
    "location": {
    "x": 42, 
    "y": 83 
    },
    "team": {
    "id": 964,
    "name": "Borussia Dortmund",
    "formation": "3-4-3"
    },
    "opponentTeam": {
    "id": 961,
    "name": "Bayern München",
    "formation": "4-2-3-1"
    },
    "player": {
    "id": 156709,
    "name": "T. Hazard",
    "position": "RWF"
    },
    "pass": {
    "accurate": true,
    "length": 9.34,
    "angle": 148,
    "recipient": {
        "id": 419254,
        "name": "A. Hakimi",
        "position": "RWB"
    },
    "endLocation": {
        "x": 34,
        "y": 89
    }
    },
    "shot": null,
    "groundDuel": null,
    "aerialDuel": null,
    "infraction": null,
    "carry": null,
    "possession": {
    "id": 601919966,
    "duration": "3.293842",
    "types": [
        "throw_in"
    ],
    "eventsNumber": 3,
    "eventIndex": 1,
    "startLocation": {
        "x": 29,
        "y": 100
    },
    "endLocation": {
        "x": 43,
        "y": 28
    },
    "team": {
        "id": 964,
        "name": "Borussia Dortmund",
        "formation": "3-4-3"
    },
    "attack": null
    }
}
```

There is no current conversion between the Wyscout v3 format and the SPADL data format, so we are going to build it ourselves.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os
import warnings
import pandas as pd
pd.set_option('display.max_columns', None)
warnings.simplefilter(action='ignore', category=pd.errors.PerformanceWarning)
warnings.filterwarnings(action="ignore", message="credentials were not supplied. open data access only")
import tqdm

In [3]:
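from typing import cast  # used explicitly in a later cell

from pandera.typing import DataFrame  # used explicitly in a later cell
from socceraction.spadl.wyscout import *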
from socceraction.data.wyscout import WyscoutLoader
from socceraction.data.base import _localloadjson
from socceraction.data.wyscout.loader import _convert_events
from socceraction.data.wyscout.schema import WyscoutEventSchema

The Wyscout loader code from the socceraction library also appears to be broken, so I will use some of the methods in that file, together with the hyperlinks in the WyscoutLoader class, to load the data manually in the cell below.

In [4]:
# Events have been downloaded from the link in the PublicWyscoutLoader class - events="https://ndownloader.figshare.com/files/14464685",

# A modified version of the code from the events method of the PublicWyscoutLoader class
obj = _localloadjson("C:\\Users\\LiamMoore\\Documents\\code\\python\\wyscout-spadl-conversion\\Data\\Events\\v2_response.json")

In [5]:
obj

{'events': [{'id': 1646549577,
   'playerId': 372255,
   'teamId': 3161,
   'matchId': 5345057,
   'matchPeriod': '1H',
   'eventSec': 3.187,
   'eventId': 8,
   'eventName': 'Pass',
   'subEventId': 85,
   'subEventName': 'Simple pass',
   'positions': [{'x': 49, 'y': 50}, {'x': 34, 'y': 50}],
   'tags': [{'id': 1801}]},
  {'id': 1646549578,
   'playerId': 20635,
   'teamId': 3161,
   'matchId': 5345057,
   'matchPeriod': '1H',
   'eventSec': 6.645,
   'eventId': 8,
   'eventName': 'Pass',
   'subEventId': 83,
   'subEventName': 'High pass',
   'positions': [{'x': 34, 'y': 50}, {'x': 75, 'y': 96}],
   'tags': [{'id': 1801}]},
  {'id': 1646549581,
   'playerId': 330003,
   'teamId': 3161,
   'matchId': 5345057,
   'matchPeriod': '1H',
   'eventSec': 9.512,
   'eventId': 1,
   'eventName': 'Duel',
   'subEventId': 10,
   'subEventName': 'Air duel',
   'positions': [{'x': 75, 'y': 96}, {'x': 83, 'y': 100}],
   'tags': [{'id': 701}, {'id': 1802}]},
  {'id': 1646549855,
   'playerId': 3441

In [6]:
raw_df = pd.DataFrame(obj['events'])
raw_df

Unnamed: 0,id,playerId,teamId,matchId,matchPeriod,eventSec,eventId,eventName,subEventId,subEventName,positions,tags
0,1646549577,372255,3161,5345057,1H,3.187,8,Pass,85,Simple pass,"[{'x': 49, 'y': 50}, {'x': 34, 'y': 50}]",[{'id': 1801}]
1,1646549578,20635,3161,5345057,1H,6.645,8,Pass,83,High pass,"[{'x': 34, 'y': 50}, {'x': 75, 'y': 96}]",[{'id': 1801}]
2,1646549581,330003,3161,5345057,1H,9.512,1,Duel,10,Air duel,"[{'x': 75, 'y': 96}, {'x': 83, 'y': 100}]","[{'id': 701}, {'id': 1802}]"
3,1646549855,344132,3157,5345057,1H,9.512,1,Duel,10,Air duel,"[{'x': 25, 'y': 4}, {'x': 17, 'y': 0}]","[{'id': 703}, {'id': 1801}]"
4,1646549582,0,3161,5345057,1H,11.790,5,Interruption,50,Ball out of the field,"[{'x': 73, 'y': 100}]",[]
...,...,...,...,...,...,...,...,...,...,...,...,...
1564,1646551402,0,3157,5345057,2H,2848.000,5,Interruption,50,Ball out of the field,"[{'x': 100, 'y': 5}]",[]
1565,1646551403,518231,3157,5345057,2H,2866.000,3,Free Kick,30,Corner,"[{'x': 100, 'y': 0}, {'x': 96, 'y': 45}]","[{'id': 801}, {'id': 1802}]"
1566,1646550872,7905,3161,5345057,2H,2867.000,7,Others on the ball,71,Clearance,"[{'x': 4, 'y': 55}, {'x': 38, 'y': 59}]","[{'id': 1401}, {'id': 1802}]"
1567,1646551404,291591,3157,5345057,2H,2872.000,8,Pass,85,Simple pass,"[{'x': 62, 'y': 41}, {'x': 67, 'y': 3}]",[{'id': 1802}]


Just take a single match from this file for the rest of the investigation 

In [99]:
df_events = _convert_events(pd.DataFrame(raw_df))
events_df = cast(DataFrame[WyscoutEventSchema], df_events)

In [100]:
# Check that there is a single row for each event, i.e. nothing has been exploded
len(events_df), events_df.event_id.nunique()

(1569, 1569)

The root function that does all the nice trickery we care about is the convert_to_actions function. Take a look at the output of this function with v2 and v3 event data from Wyscout and see what it returns.

*Note: I'm just using the first team id as the home team here. It may or may not be the home team, but it saves downloading the Teams.json data.*

In [9]:
spadl_actions = convert_to_actions(events_df, 1609)
spadl_actions

Unnamed: 0,game_id,period_id,time_seconds,team_id,player_id,start_x,start_y,end_x,end_y,original_event_id,bodypart_id,type_id,result_id,action_id
0,5345057,1,307.4,3161,405597,72.45,5.44,68.25,18.36,1646549626,1,0,1,0
1,5345057,1,523.3,3157,344132,22.05,27.2,17.85,22.44,1646550475,0,0,1,1
2,5345057,1,834.0,3161,21095,63.0,53.04,61.95,36.72,1646549689,0,0,1,2
3,5345057,1,1110.0,3157,257028,49.35,59.84,37.8,48.96,1646550634,1,0,0,3
4,5345057,1,1243.0,3157,134496,90.3,38.08,70.35,14.96,1646550682,2,0,1,4
5,5345057,1,1318.0,3157,254493,71.4,50.32,82.95,61.2,1646550718,0,0,1,5
6,5345057,1,1358.0,3157,257028,38.85,61.2,27.3,40.12,1646550733,0,0,1,6
7,5345057,1,1591.0,3157,257028,77.7,57.8,67.2,63.92,1646550797,0,0,1,7
8,5345057,1,1751.0,3157,521375,89.25,34.68,85.05,31.28,1646550862,0,0,1,8
9,5345057,1,2709.0,3157,254493,31.5,26.52,29.4,38.08,1646550997,0,0,0,9


Go through each component of the convert_to_actions function and find what will need to be adjusted for the v3 format.

In [101]:
tags = get_tagsdf(events_df)
tags.columns

Index(['goal', 'own_goal', 'assist', 'key_pass', 'counter_attack', 'left_foot',
       'right_foot', 'head/body', 'direct', 'indirect', 'dangerous_ball_lost',
       'blocked', 'high', 'low', 'interception', 'clearance', 'opportunity',
       'feint', 'missed_ball', 'free_space_right', 'free_space_left',
       'take_on_left', 'take_on_right', 'sliding_tackle', 'anticipated',
       'anticipation', 'red_card', 'yellow_card', 'second_yellow_card',
       'position_goal_low_center', 'position_goal_low_right',
       'position_goal_mid_center', 'position_goal_mid_left',
       'position_goal_low_left', 'position_goal_mid_right',
       'position_goal_high_center', 'position_goal_high_left',
       'position_goal_high_right', 'position_out_low_right',
       'position_out_mid_left', 'position_out_low_left',
       'position_out_mid_right', 'position_out_high_center',
       'position_out_high_left', 'position_out_high_right',
       'position_post_low_right', 'position_post_mid_left',
    

The "tags" from v2 of the Wyscout API look like a combination of three things in v3: the type.secondary column, the per-event attribute objects (shot, pass, etc.) and possession_types.
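
One way to recover v2-style boolean tag columns from v3 is to test membership in each event's type.secondary list. The sketch below works on raw event dictionaries rather than the DataFrame, and the helper name `secondary_flags` is hypothetical; the labels used are the ones that appear in the sample events in this notebook.

```python
def secondary_flags(events, labels):
    """Build one dict of boolean flags per event from type.secondary."""
    rows = []
    for event in events:
        # A missing or null secondary list is treated as "no labels".
        secondary = set((event.get("type") or {}).get("secondary") or [])
        rows.append({label: label in secondary for label in labels})
    return rows


sample_events = [
    {"type": {"primary": "pass", "secondary": ["back_pass"]}},
    {"type": {"primary": "duel", "secondary": ["aerial_duel", "loss"]}},
]
for row in secondary_flags(sample_events, ["back_pass", "aerial_duel", "loss"]):
    print(row)
```

The resulting rows could be turned into a DataFrame and concatenated column-wise with the events, in the same way the v2 tags are attached in the next cell.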

In [123]:
events_tagged = pd.concat([events_df, get_tagsdf(events_df)], axis=1)
events_tagged.head(3)

Unnamed: 0,event_id,game_id,period_id,milliseconds,team_id,player_id,type_id,type_name,subtype_id,subtype_name,tags,goal,own_goal,assist,key_pass,counter_attack,left_foot,right_foot,head/body,direct,indirect,dangerous_ball_lost,blocked,high,low,interception,clearance,opportunity,feint,missed_ball,free_space_right,free_space_left,take_on_left,take_on_right,sliding_tackle,anticipated,anticipation,red_card,yellow_card,second_yellow_card,position_goal_low_center,position_goal_low_right,position_goal_mid_center,position_goal_mid_left,position_goal_low_left,position_goal_mid_right,position_goal_high_center,position_goal_high_left,position_goal_high_right,position_out_low_right,position_out_mid_left,position_out_low_left,position_out_mid_right,position_out_high_center,position_out_high_left,position_out_high_right,position_post_low_right,position_post_mid_left,position_post_low_left,position_post_mid_right,position_post_high_center,position_post_high_left,position_post_high_right,through,fairplay,lost,neutral,won,accurate,not_accurate,start_x,start_y,end_x,end_y,offside
0,1646549577,5345057,1,3187.0,3161,372255,8,Pass,85,Simple pass,[{'id': 1801}],False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,49,50,34,50.0,0
1,1646549578,5345057,1,6645.0,3161,20635,8,Pass,83,High pass,[{'id': 1801}],False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,34,50,75,96.0,0
2,1646549581,5345057,1,9512.0,3161,330003,8,Duel,82,Air duel,"[{'id': 701}, {'id': 1802}]",False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,True,75,96,27,0.0,0


The make_new_positions function just extracts the two x,y coordinates into start and end coordinates, and anything that doesn't have an end location is removed. For v3 the split is already done for us, so we only need to filter the DataFrame to events that have endLocation values.
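
A minimal sketch of how start and end coordinates might be pulled out of a raw v3 event is shown below. It assumes end locations only live under the pass and carry objects (the two places they appear in the sample data), and it falls back to the start location when neither is present, which is one alternative to dropping the event. The helper name `start_end` is hypothetical.

```python
def start_end(event):
    """Return (start_x, start_y, end_x, end_y) for a raw v3 event dict."""
    loc = event.get("location") or {}
    start = (loc.get("x"), loc.get("y"))
    # Look for an explicit end location on the pass or carry object.
    for key in ("pass", "carry"):
        detail = event.get(key) or {}
        end_loc = detail.get("endLocation")
        if end_loc:
            return start + (end_loc["x"], end_loc["y"])
    # No end location available: reuse the start location.
    return start + start


pass_event = {
    "location": {"x": 42, "y": 83},
    "pass": {"endLocation": {"x": 34, "y": 89}},
    "carry": None,
}
duel_event = {"location": {"x": 75, "y": 96}, "pass": None, "carry": None}
print(start_end(pass_event))
print(start_end(duel_event))
```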

In [124]:
converted_positions = make_new_positions(events_tagged)
converted_positions.head(3)

Unnamed: 0,event_id,game_id,period_id,milliseconds,team_id,player_id,type_id,type_name,subtype_id,subtype_name,tags,goal,own_goal,assist,key_pass,counter_attack,left_foot,right_foot,head/body,direct,indirect,dangerous_ball_lost,blocked,high,low,interception,clearance,opportunity,feint,missed_ball,free_space_right,free_space_left,take_on_left,take_on_right,sliding_tackle,anticipated,anticipation,red_card,yellow_card,second_yellow_card,position_goal_low_center,position_goal_low_right,position_goal_mid_center,position_goal_mid_left,position_goal_low_left,position_goal_mid_right,position_goal_high_center,position_goal_high_left,position_goal_high_right,position_out_low_right,position_out_mid_left,position_out_low_left,position_out_mid_right,position_out_high_center,position_out_high_left,position_out_high_right,position_post_low_right,position_post_mid_left,position_post_low_left,position_post_mid_right,position_post_high_center,position_post_high_left,position_post_high_right,through,fairplay,lost,neutral,won,accurate,not_accurate,start_x,start_y,end_x,end_y
0,1646549577,5345057,1,3187.0,3161,372255,8,Pass,85,Simple pass,[{'id': 1801}],False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,49,50,34,50
1,1646549578,5345057,1,6645.0,3161,20635,8,Pass,83,High pass,[{'id': 1801}],False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,34,50,75,96
2,1646549581,5345057,1,9512.0,3161,330003,1,Duel,10,Air duel,"[{'id': 701}, {'id': 1802}]",False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,True,75,96,83,100


The fix_wyscout_events function creates the coordinates that the SPADL format needs for events that do not already have them. <br>
It is made up of multiple functions, listed below.

In [104]:
# The function is made up of these steps:
# events = create_shot_coordinates(events) # this should be able to remain the same for v3
# events = convert_duels(events) # this will need to be rewritten to use the groundDuel and aerialDuel fields in v3, but the logic should remain the same
# events = insert_interception_passes(events) # I don't think this is needed anymore
# events = add_offside_variable(events) # this can remain the same, except that offside is now contained in the type.primary field
# convert_touches(events) # same as above
# convert_simulations(events) # same as above, but the simulation field is the infraction type (it might also be in type.secondary)

# events = fix_wyscout_events(events)
# events.head(3)
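
As an example of how small the v3 change can be for one of these steps, here is a rough sketch of the offside step working on flattened v3 events. It assumes an offside appears as its own event with type_primary equal to "offside" directly after the action it applies to; that ordering and the helper name `add_offside` are assumptions, not confirmed behaviour of the Wyscout feed or of socceraction.

```python
def add_offside(events):
    """Flag the action before each offside event and drop the offside row."""
    out = []
    for event in events:
        if event["type_primary"] == "offside":
            # Mark the previous kept action, if there is one.
            if out:
                out[-1]["offside"] = 1
            continue
        # Copy the event so the input list is left untouched.
        out.append({**event, "offside": 0})
    return out


sample = [
    {"id": 1, "type_primary": "pass"},
    {"id": 2, "type_primary": "pass"},
    {"id": 3, "type_primary": "offside"},
]
for row in add_offside(sample):
    print(row)
```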

The create_df_actions function is the main one we will need to focus on for the conversion to the new types. It is a manually defined process whereby each event is assigned to one of the SciSports actions based on its types (type.primary and type.secondary in v3). The result of the event and the body part are also assigned here.

In [72]:
# determine_bodypart_id(event) # The logic here will remain the same, other than using the body_type field from Wyscout v3
# determine_type_id(event)  # This is probably the bulk of the work: need to manually check the primary/secondary combos and decide which category they fit into
# determine_result_id(event) # This should be simple and remain mostly the same

# actions = create_df_actions(events)
# actions.head(3)

Unnamed: 0,game_id,period_id,time_seconds,team_id,player_id,start_x,start_y,end_x,end_y,original_event_id,bodypart_id,type_id,result_id
0,5345057,1,3.187,3161,372255,49,50,34,50.0,1646549577,0,0,1
1,5345057,1,6.645,3161,20635,34,50,75,96.0,1646549578,0,0,1
2,5345057,1,9.512,3161,330003,75,96,27,0.0,1646549581,1,0,0
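
The shape of the type and result assignment could look something like the sketch below. It is deliberately incomplete: only a handful of primary types are mapped, the "cross" secondary label and the "shot" primary label are assumptions about the v3 vocabulary, and defaulting to success when no accuracy flag exists is also an assumption. The function names are hypothetical and return action names rather than the numeric ids used by SPADL.

```python
def determine_type(primary, secondary):
    """Pick a SPADL action type name from v3 primary/secondary labels."""
    secondary = set(secondary or [])
    if primary == "pass":
        # "cross" as a secondary label is an assumption about v3.
        return "cross" if "cross" in secondary else "pass"
    if primary == "throw_in":
        return "throw_in"
    if primary == "shot":
        return "shot"
    # Anything unmapped (e.g. game_interruption) is not an on-the-ball action.
    return "non_action"


def determine_result(event):
    """Return 'offside', 'success' or 'fail' for a flattened v3 event dict."""
    if event.get("offside"):
        return "offside"
    if event.get("pass_accurate") is False:
        return "fail"
    # Default to success when no accuracy flag is available (an assumption).
    return "success"


print(determine_type("pass", ["back_pass"]))
print(determine_result({"pass_accurate": False, "offside": 0}))
```

Note that in the DataFrame a missing pass_accurate is NaN rather than None, so a pandas version of this check would need to handle that explicitly.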


### Start to convert the v3 data 

In [39]:
v3_json = _localloadjson("C:\\Users\\LiamMoore\\Documents\\code\\python\\wyscout-spadl-conversion\\Data\\Events\\v3_response.json")
v3_events = v3_json['events']

In [40]:
v3_df = pd.json_normalize(v3_events, sep='_')
v3_df.head(3)

Unnamed: 0,id,matchId,matchPeriod,minute,second,matchTimestamp,videoTimestamp,relatedEventId,shot,groundDuel,aerialDuel,infraction,carry,type_primary,type_secondary,location_x,location_y,team_id,team_name,team_formation,opponentTeam_id,opponentTeam_name,opponentTeam_formation,player_id,player_name,player_position,pass_accurate,pass_angle,pass_height,pass_length,pass_recipient_id,pass_recipient_name,pass_recipient_position,pass_endLocation_x,pass_endLocation_y,possession_id,possession_duration,possession_types,possession_eventsNumber,possession_eventIndex,possession_startLocation_x,possession_startLocation_y,possession_endLocation_x,possession_endLocation_y,possession_team_id,possession_team_name,possession_team_formation,possession_attack,pass,aerialDuel_opponent_id,aerialDuel_opponent_name,aerialDuel_opponent_position,aerialDuel_opponent_height,aerialDuel_firstTouch,aerialDuel_height,aerialDuel_relatedDuelId,possession,possession_attack_withShot,possession_attack_withShotOnGoal,possession_attack_withGoal,possession_attack_flank,possession_attack_xg,infraction_yellowCard,infraction_redCard,infraction_type,infraction_opponent_id,infraction_opponent_name,infraction_opponent_position,groundDuel_opponent_id,groundDuel_opponent_name,groundDuel_opponent_position,groundDuel_duelType,groundDuel_keptPossession,groundDuel_progressedWithBall,groundDuel_stoppedProgress,groundDuel_recoveredPossession,groundDuel_takeOn,groundDuel_side,groundDuel_relatedDuelId,carry_progression,carry_endLocation_x,carry_endLocation_y,shot_bodyPart,shot_isGoal,shot_onTarget,shot_goalZone,shot_xg,shot_postShotXg,shot_goalkeeperActionId,shot_goalkeeper,shot_goalkeeper_id,shot_goalkeeper_name,infraction_opponent,location
0,1646549577,5345057,1H,0,3,00:00:03.187,5.187274,1646550000.0,,,,,,pass,"[back_pass, short_or_medium_pass]",49.0,50.0,3161,Internazionale,3-5-2,3157,Milan,4-2-3-1,372255,L. Martínez,SS,True,180.0,,16.0,20635.0,F. Acerbi,CB,34.0,50.0,1646550000.0,6.324848,[],4.0,0.0,49.0,50.0,75.0,96.0,3161.0,Internazionale,3-5-2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,1646549578,5345057,1H,0,6,00:00:06.645,8.645442,1646550000.0,,,,,,pass,"[forward_pass, long_pass, pass_to_final_third,...",34.0,50.0,3161,Internazionale,3-5-2,3157,Milan,4-2-3-1,20635,F. Acerbi,CB,True,36.0,high,53.0,330003.0,D. Dumfries,RWB,75.0,96.0,1646550000.0,6.324848,[],4.0,1.0,49.0,50.0,75.0,96.0,3161.0,Internazionale,3-5-2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,1646549581,5345057,1H,0,9,00:00:09.512,11.512122,1646550000.0,,,,,,duel,"[aerial_duel, loss]",75.0,96.0,3161,Internazionale,3-5-2,3157,Milan,4-2-3-1,330003,D. Dumfries,RWB,,,,,,,,,,1646550000.0,6.324848,[],4.0,2.0,49.0,50.0,75.0,96.0,3161.0,Internazionale,3-5-2,,,344132.0,Theo Hernández,LB,184.0,False,188.0,1646550000.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
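
The column list above contains both bare columns such as shot and expanded ones such as shot_xg. The sketch below is a small stand-in for the flattening step (it is not how pandas implements json_normalize) that shows why: a nested object is expanded into prefixed keys, while a null in the same position stays as a single key. The function name `flatten` is hypothetical.

```python
def flatten(obj, parent="", sep="_"):
    """Flatten nested dicts into a single dict with joined key names."""
    flat = {}
    for key, value in obj.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects, prefixing their keys.
            flat.update(flatten(value, name, sep))
        else:
            # Scalars, lists and nulls are kept under the joined name.
            flat[name] = value
    return flat


event = {
    "id": 601919968,
    "type": {"primary": "pass", "secondary": ["back_pass"]},
    "location": {"x": 42, "y": 83},
    "shot": None,
}
print(flatten(event))
```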


In [41]:
# The v3 frame (1574 rows) is only a few rows longer than the v2 frame (1569 rows)
v3_df.shape

(1574, 94)

In [49]:
wyscout_periods = {"1H": 1, "2H": 2, "E1": 3, "E2": 4, "P": 5}

v3_df["period_id"] = v3_df.matchPeriod.apply(lambda x: wyscout_periods[x])
v3_df["milliseconds"] = (v3_df.second + (v3_df.minute * 60)) * 1000
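
The minute and second fields are whole numbers, so the milliseconds computed above lose the sub-second part that v2 carried in eventSec (3000 here versus 3187.0 in v2 for the first event). If that precision matters, one option is to parse matchTimestamp instead. The sketch below is a hypothetical helper, and the optional period offset reflects an observation from this match only: the same second-half event sits at 2848 seconds in v2 and at 01:32:28 in v3, a difference of 45 minutes.

```python
def timestamp_to_ms(match_timestamp, offset_minutes=0):
    """Convert an 'HH:MM:SS.mmm' timestamp to integer milliseconds.

    offset_minutes is subtracted, e.g. 45 to restart the clock at the
    second half (an assumption based on the sample match).
    """
    hours, minutes, seconds = match_timestamp.split(":")
    whole, _, frac = seconds.partition(".")
    # Pad or truncate the fractional part to exactly three digits.
    millis = int(frac.ljust(3, "0")[:3])
    total_seconds = int(hours) * 3600 + int(minutes) * 60 + int(whole)
    return (total_seconds - offset_minutes * 60) * 1000 + millis


print(timestamp_to_ms("00:00:03.187"))
print(timestamp_to_ms("01:32:28.116", offset_minutes=45))
```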

Test the rewritten make_new_positions function

In [50]:
from socceraction.spadl.wyscout_v3 import make_new_positions as make_new_positions_v3

updated_positions = make_new_positions_v3(v3_df)
updated_positions.head(3)

Unnamed: 0,id,matchId,matchPeriod,minute,second,matchTimestamp,videoTimestamp,relatedEventId,shot,groundDuel,aerialDuel,infraction,carry,type_primary,type_secondary,location_x,location_y,team_id,team_name,team_formation,opponentTeam_id,opponentTeam_name,opponentTeam_formation,player_id,player_name,player_position,pass_accurate,pass_angle,pass_height,pass_length,pass_recipient_id,pass_recipient_name,pass_recipient_position,pass_endLocation_x,pass_endLocation_y,possession_id,possession_duration,possession_types,possession_eventsNumber,possession_eventIndex,possession_startLocation_x,possession_startLocation_y,possession_endLocation_x,possession_endLocation_y,possession_team_id,possession_team_name,possession_team_formation,possession_attack,pass,aerialDuel_opponent_id,aerialDuel_opponent_name,aerialDuel_opponent_position,aerialDuel_opponent_height,aerialDuel_firstTouch,aerialDuel_height,aerialDuel_relatedDuelId,possession,possession_attack_withShot,possession_attack_withShotOnGoal,possession_attack_withGoal,possession_attack_flank,possession_attack_xg,infraction_yellowCard,infraction_redCard,infraction_type,infraction_opponent_id,infraction_opponent_name,infraction_opponent_position,groundDuel_opponent_id,groundDuel_opponent_name,groundDuel_opponent_position,groundDuel_duelType,groundDuel_keptPossession,groundDuel_progressedWithBall,groundDuel_stoppedProgress,groundDuel_recoveredPossession,groundDuel_takeOn,groundDuel_side,groundDuel_relatedDuelId,carry_progression,carry_endLocation_x,carry_endLocation_y,shot_bodyPart,shot_isGoal,shot_onTarget,shot_goalZone,shot_xg,shot_postShotXg,shot_goalkeeperActionId,shot_goalkeeper,shot_goalkeeper_id,shot_goalkeeper_name,infraction_opponent,location,start_x,start_y,end_x,end_y,period_id,milliseconds
0,1646549577,5345057,1H,0,3,00:00:03.187,5.187274,1646550000.0,,,,,,pass,"[back_pass, short_or_medium_pass]",49.0,50.0,3161,Internazionale,3-5-2,3157,Milan,4-2-3-1,372255,L. Martínez,SS,True,180.0,,16.0,20635.0,F. Acerbi,CB,34.0,50.0,1646550000.0,6.324848,[],4.0,0.0,49.0,50.0,75.0,96.0,3161.0,Internazionale,3-5-2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,49.0,50.0,34.0,50.0,1,3000
1,1646549578,5345057,1H,0,6,00:00:06.645,8.645442,1646550000.0,,,,,,pass,"[forward_pass, long_pass, pass_to_final_third,...",34.0,50.0,3161,Internazionale,3-5-2,3157,Milan,4-2-3-1,20635,F. Acerbi,CB,True,36.0,high,53.0,330003.0,D. Dumfries,RWB,75.0,96.0,1646550000.0,6.324848,[],4.0,1.0,49.0,50.0,75.0,96.0,3161.0,Internazionale,3-5-2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,34.0,50.0,75.0,96.0,1,6000
2,1646549581,5345057,1H,0,9,00:00:09.512,11.512122,1646550000.0,,,,,,duel,"[aerial_duel, loss]",75.0,96.0,3161,Internazionale,3-5-2,3157,Milan,4-2-3-1,330003,D. Dumfries,RWB,,,,,,,,,,1646550000.0,6.324848,[],4.0,2.0,49.0,50.0,75.0,96.0,3161.0,Internazionale,3-5-2,,,344132.0,Theo Hernández,LB,184.0,False,188.0,1646550000.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,75.0,96.0,75.0,96.0,1,9000


In [65]:
from socceraction.spadl.wyscout_v3 import create_shot_coordinates as create_shot_coordinates_v3

shot_coords = create_shot_coordinates_v3(updated_positions)

In [73]:
from socceraction.spadl.wyscout import create_shot_coordinates

v2_shot_coords = create_shot_coordinates(events)

In [74]:
cols = ['start_x', 'start_y', 'end_x', 'end_y']
v2_shot_coords.loc[v2_shot_coords.type_name=='Shot'][cols].reset_index() == shot_coords.loc[shot_coords.shot_xg.notna()][cols].reset_index()

Unnamed: 0,index,start_x,start_y,end_x,end_y
0,False,True,True,True,True
1,False,True,True,True,True
2,False,True,True,True,True
3,False,True,True,True,True
4,False,True,True,True,True
5,False,True,True,True,True
6,False,True,True,True,True
7,False,True,True,True,True
8,False,True,True,True,True
9,False,True,True,True,True


In [93]:
from socceraction.spadl.wyscout_v3 import convert_duels as convert_duels_v3

converted_duels = convert_duels_v3(shot_coords)

In [94]:
from socceraction.spadl.wyscout import convert_duels

converted_duels_v2 = convert_duels(v2_shot_coords)

In [78]:
converted_duels_v2.loc[~converted_duels_v2.event_id.isin(converted_duels.id)]

Unnamed: 0,event_id,game_id,period_id,milliseconds,team_id,player_id,type_id,type_name,subtype_id,subtype_name,tags,goal,own_goal,assist,key_pass,counter_attack,left_foot,right_foot,head/body,direct,indirect,dangerous_ball_lost,blocked,high,low,interception,clearance,opportunity,feint,missed_ball,free_space_right,free_space_left,take_on_left,take_on_right,sliding_tackle,anticipated,anticipation,red_card,yellow_card,second_yellow_card,position_goal_low_center,position_goal_low_right,position_goal_mid_center,position_goal_mid_left,position_goal_low_left,position_goal_mid_right,position_goal_high_center,position_goal_high_left,position_goal_high_right,position_out_low_right,position_out_mid_left,position_out_low_left,position_out_mid_right,position_out_high_center,position_out_high_left,position_out_high_right,position_post_low_right,position_post_mid_left,position_post_low_left,position_post_mid_right,position_post_high_center,position_post_high_left,position_post_high_right,through,fairplay,lost,neutral,won,accurate,not_accurate,start_x,start_y,end_x,end_y,offside,time_seconds


In [79]:
len(converted_duels.loc[~converted_duels.id.isin(converted_duels_v2.event_id)])

3

Some of the duel conversion doesn't quite match up because the actions are returned in a different order in v2 and v3. As far as I can see, there is no way to sort this out, as the events have the same matchTimestamp.
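
One possible tie-breaker, sketched below, is to sort on the event id after the period and timestamp. This only gives the same order in both formats if the ids are shared between v2 and v3, which appears to be the case in this match but is an assumption in general, and it does not guarantee that the resulting order is the true order of play. The helper name `stable_order` is hypothetical.

```python
def stable_order(events):
    """Sort events by period, then time, then id as a final tie-breaker."""
    return sorted(
        events,
        key=lambda e: (e["period_id"], e["milliseconds"], e["id"]),
    )


# Two duel events sharing a timestamp, listed in the "wrong" order.
sample = [
    {"id": 1646549855, "period_id": 1, "milliseconds": 9512},
    {"id": 1646549581, "period_id": 1, "milliseconds": 9512},
    {"id": 1646549578, "period_id": 1, "milliseconds": 6645},
]
print([e["id"] for e in stable_order(sample)])
```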

In [106]:
from socceraction.spadl.wyscout_v3 import insert_interception_passes as insert_interception_passes_v3

interception_passes_v3 = insert_interception_passes_v3(converted_duels)

In [85]:
interception_passes_v2 = insert_interception_passes(converted_duels_v2)

In [117]:
len(interception_passes_v3), len(events)

(1226, 1223)

In [137]:
events.loc[events["subtype_id"].isin([10])]

Unnamed: 0,event_id,game_id,period_id,milliseconds,team_id,player_id,type_id,type_name,subtype_id,subtype_name,tags,goal,own_goal,assist,key_pass,counter_attack,left_foot,right_foot,head/body,direct,indirect,dangerous_ball_lost,blocked,high,low,interception,clearance,opportunity,feint,missed_ball,free_space_right,free_space_left,take_on_left,take_on_right,sliding_tackle,anticipated,anticipation,red_card,yellow_card,second_yellow_card,position_goal_low_center,position_goal_low_right,position_goal_mid_center,position_goal_mid_left,position_goal_low_left,position_goal_mid_right,position_goal_high_center,position_goal_high_left,position_goal_high_right,position_out_low_right,position_out_mid_left,position_out_low_left,position_out_mid_right,position_out_high_center,position_out_high_left,position_out_high_right,position_post_low_right,position_post_mid_left,position_post_low_left,position_post_mid_right,position_post_high_center,position_post_high_left,position_post_high_right,through,fairplay,lost,neutral,won,accurate,not_accurate,start_x,start_y,end_x,end_y,offside


In [120]:
from socceraction.spadl.wyscout_v3 import convert_touches as convert_touches_v3

converted_touches = convert_touches_v3(interception_passes_v3)
converted_touches_v2 = convert_touches(events)

The fix_wyscout_events function should now yield identical results for v2 and v3 (aside from the ordering of actions in convert_duels).

In [177]:
from socceraction.spadl.wyscout_v3 import fix_wyscout_events as fix_wyscout_events_v3

fixed_df_v2 = fix_wyscout_events(converted_positions)

In [167]:
fixed_df = fix_wyscout_events_v3(v3_df)
fixed_df

Unnamed: 0,id,matchId,matchPeriod,minute,second,matchTimestamp,videoTimestamp,relatedEventId,shot,groundDuel,aerialDuel,infraction,carry,type_primary,type_secondary,location_x,location_y,team_id,team_name,team_formation,opponentTeam_id,opponentTeam_name,opponentTeam_formation,player_id,player_name,player_position,pass_accurate,pass_angle,pass_height,pass_length,pass_recipient_id,pass_recipient_name,pass_recipient_position,pass_endLocation_x,pass_endLocation_y,possession_id,possession_duration,possession_types,possession_eventsNumber,possession_eventIndex,possession_startLocation_x,possession_startLocation_y,possession_endLocation_x,possession_endLocation_y,possession_team_id,possession_team_name,possession_team_formation,possession_attack,pass,aerialDuel_opponent_id,aerialDuel_opponent_name,aerialDuel_opponent_position,aerialDuel_opponent_height,aerialDuel_firstTouch,aerialDuel_height,aerialDuel_relatedDuelId,possession,possession_attack_withShot,possession_attack_withShotOnGoal,possession_attack_withGoal,possession_attack_flank,possession_attack_xg,infraction_yellowCard,infraction_redCard,infraction_type,infraction_opponent_id,infraction_opponent_name,infraction_opponent_position,groundDuel_opponent_id,groundDuel_opponent_name,groundDuel_opponent_position,groundDuel_duelType,groundDuel_keptPossession,groundDuel_progressedWithBall,groundDuel_stoppedProgress,groundDuel_recoveredPossession,groundDuel_takeOn,groundDuel_side,groundDuel_relatedDuelId,carry_progression,carry_endLocation_x,carry_endLocation_y,shot_bodyPart,shot_isGoal,shot_onTarget,shot_goalZone,shot_xg,shot_postShotXg,shot_goalkeeperActionId,shot_goalkeeper,shot_goalkeeper_id,shot_goalkeeper_name,infraction_opponent,location,start_x,start_y,end_x,end_y,period_id,milliseconds,accurate,not_accurate,offside
0,1646549577,5345057,1H,0,3,00:00:03.187,5.187274,1.646550e+09,,,,,,pass,"[back_pass, short_or_medium_pass]",49.0,50.0,3161,Internazionale,3-5-2,3157,Milan,4-2-3-1,372255,L. Martínez,SS,True,180.0,,16.0,20635.0,F. Acerbi,CB,34.0,50.0,1.646550e+09,6.324848,[],4.0,0.0,49.0,50.0,75.0,96.0,3161.0,Internazionale,3-5-2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,49.0,50.0,34.0,50.0,1,3000,,,0
1,1646549578,5345057,1H,0,6,00:00:06.645,8.645442,1.646550e+09,,,,,,pass,"[forward_pass, long_pass, pass_to_final_third,...",34.0,50.0,3161,Internazionale,3-5-2,3157,Milan,4-2-3-1,20635,F. Acerbi,CB,True,36.0,high,53.0,330003.0,D. Dumfries,RWB,75.0,96.0,1.646550e+09,6.324848,[],4.0,1.0,49.0,50.0,75.0,96.0,3161.0,Internazionale,3-5-2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,34.0,50.0,75.0,96.0,1,6000,,,0
2,1646549581,5345057,1H,0,9,00:00:09.512,11.512122,1.646550e+09,,,,,,pass,head_pass,75.0,96.0,3161,Internazionale,3-5-2,3157,Milan,4-2-3-1,330003,D. Dumfries,RWB,,,,,,,,,,1.646550e+09,6.324848,[],4.0,2.0,49.0,50.0,75.0,96.0,3161.0,Internazionale,3-5-2,,,344132.0,Theo Hernández,LB,184.0,False,188.0,1.646550e+09,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,75.0,96.0,27.0,0.0,1,9000,False,True,0
3,1646549582,5345057,1H,0,11,00:00:11.794,13.794992,,,,,,,game_interruption,[ball_out],73.0,100.0,3161,Internazionale,3-5-2,3157,Milan,4-2-3-1,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,73.0,100.0,73.0,100.0,1,11000,,,0
4,1646549583,5345057,1H,0,28,00:00:28.242,30.242578,1.646550e+09,,,,,,throw_in,[],83.0,100.0,3161,Internazionale,3-5-2,3157,Milan,4-2-3-1,330003,D. Dumfries,RWB,True,-71.0,,27.0,8327.0,E. Džeko,CF,91.0,62.0,1.646550e+09,2.277116,"[attack, throw_in]",5.0,0.0,83.0,100.0,91.0,64.0,3161.0,Internazionale,3-5-2,,,,,,,,,,,False,False,False,right,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,83.0,100.0,91.0,62.0,1,28000,,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1221,1646551402,5345057,2H,92,28,01:32:28.116,5754.116723,,,,,,,game_interruption,[ball_out],100.0,5.0,3157,Milan,4-2-3-1,3161,Internazionale,3-5-2,0,,,,,,,,,,,,1.646551e+09,1.983295,[],2.0,1.0,14.0,54.0,0.0,95.0,3161.0,Internazionale,3-5-2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,100.0,5.0,100.0,5.0,2,5548000,,,0
1222,1646551403,5345057,2H,92,46,01:32:46.019,5772.01947,1.646551e+09,,,,,,corner,[],100.0,0.0,3157,Milan,4-2-3-1,3161,Internazionale,3-5-2,518231,S. Tonali,LDMF,False,97.0,high,31.0,7905.0,R. Lukaku,CF,96.0,45.0,1.646551e+09,7.1935525,"[corner, set_piece_attack]",4.0,0.0,100.0,0.0,67.0,3.0,3157.0,Milan,4-2-3-1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,100.0,0.0,96.0,45.0,2,5566000,,,0
1223,1646550872,5345057,2H,92,47,01:32:47.426,5773.426068,1.646551e+09,,,,,,interception,"[head_pass, loss]",4.0,55.0,3161,Internazionale,3-5-2,3157,Milan,4-2-3-1,7905,R. Lukaku,CF,,,,,,,,,,1.646551e+09,7.1935525,"[corner, set_piece_attack]",4.0,1.0,100.0,0.0,67.0,3.0,3157.0,Milan,4-2-3-1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4.0,55.0,4.0,55.0,2,5567000,,,0
1224,1646551404,5345057,2H,92,52,01:32:52.110,5778.110842,1.646551e+09,,,,,,pass,"[lateral_pass, pass_to_final_third, short_or_m...",62.0,41.0,3157,Milan,4-2-3-1,3161,Internazionale,3-5-2,291591,F. Tomori,LCB,False,-79.0,,26.0,0.0,,,67.0,3.0,1.646551e+09,7.1935525,"[corner, set_piece_attack]",4.0,2.0,100.0,0.0,67.0,3.0,3157.0,Milan,4-2-3-1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,62.0,41.0,67.0,3.0,2,5572000,,,0


In [170]:
from socceraction.spadl.wyscout_v3 import determine_bodypart_id as determine_bodypart_id_v3

fixed_df["time_seconds"] = fixed_df["milliseconds"] / 1000
fixed_df = fixed_df.rename(columns={"matchId": "game_id"})
df_actions = fixed_df[
    [
        "game_id",
        "period_id",
        "time_seconds",
        "team_id",
        "player_id",
        "start_x",
        "start_y",
        "end_x",
        "end_y",
    ]
].copy()
df_actions["original_event_id"] = fixed_df["id"].astype(object)

In [175]:
df_actions["bodypart_id"] = fixed_df.apply(determine_bodypart_id_v3, axis=1)

In [178]:
fixed_df_v2["time_seconds"] = fixed_df_v2["milliseconds"] / 1000
df_actions_v2 = fixed_df_v2[
    [
        "game_id",
        "period_id",
        "time_seconds",
        "team_id",
        "player_id",
        "start_x",
        "start_y",
        "end_x",
        "end_y",
    ]
].copy()
df_actions_v2["original_event_id"] = fixed_df_v2["event_id"].astype(object)
df_actions_v2["bodypart_id"] = fixed_df_v2.apply(determine_bodypart_id, axis=1)

The two versions disagree on the body part for some actions. This is because I added the left-foot and right-foot flags for shots, which did not appear to be available in v2. Some information is also missing from v2, which causes interceptions to come through as foot actions even when they were made with the head, so the new version does a better job.

In [193]:
merged = df_actions.merge(df_actions_v2, on="original_event_id")
len(merged.loc[merged['bodypart_id_x'] != merged['bodypart_id_y']])

59
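To see where the mismatches come from, rather than only how many there are, one option is to cross-tabulate the v2 body part against the v3 body part. The sketch below uses small made-up frames in place of `df_actions` and `df_actions_v2`. The id meanings in the comments (0 = foot, 1 = head, 4 = foot_left, 5 = foot_right) are an assumption about the SPADL body part ordering and should be checked against the installed socceraction version.

```python
import pandas as pd

# Toy stand-ins for df_actions (v3) and df_actions_v2; all values are made up.
# Assumed ids: 0 = foot, 1 = head, 4 = foot_left, 5 = foot_right.
v3 = pd.DataFrame({"original_event_id": [1, 2, 3, 4], "bodypart_id": [5, 1, 0, 4]})
v2 = pd.DataFrame({"original_event_id": [1, 2, 3, 4], "bodypart_id": [0, 0, 0, 0]})

# Explicit suffixes make it obvious which column came from which converter.
merged = v3.merge(v2, on="original_event_id", suffixes=("_v3", "_v2"))

# Keep only the rows where the two converters disagree.
mismatch = merged[merged["bodypart_id_v3"] != merged["bodypart_id_v2"]]

# Rows: body part according to v2. Columns: body part according to v3.
breakdown = pd.crosstab(mismatch["bodypart_id_v2"], mismatch["bodypart_id_v3"])
print(breakdown)
```

If the explanation above holds, most of the real mismatches should sit in the row for foot in v2, spread across the head, left-foot and right-foot columns in v3.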

In [214]:
from socceraction.spadl.wyscout_v3 import determine_type_id as determine_type_id_v3

df_actions["type_id"] = fixed_df.apply(determine_type_id_v3, axis=1)
df_actions_v2["type_id"] = fixed_df_v2.apply(determine_type_id, axis=1)

In [222]:
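# Count actions per type. Naming the axis and the count column explicitly keeps
# the column names stable across pandas versions.
action_counts = df_actions.type_id.value_counts().rename_axis("action_id").reset_index(name="type_id")
action_counts_v2 = df_actions_v2.type_id.value_counts().rename_axis("action_id").reset_index(name="type_id")

# Line up the v3 and v2 counts per action type.
action_counts.merge(action_counts_v2, on="action_id", suffixes=("", "_v2")).sort_values(by="action_id")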

Unnamed: 0,action_id,type_id,type_id_v2
0,0,704,734
7,1,27,27
4,2,34,34
14,3,4,4
5,4,34,34
11,5,6,6
15,6,2,2
6,7,28,29
3,8,41,37
12,9,5,4
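
The table is easier to read with action names and a difference column. The sketch below re-enters the ten rows shown above by hand. The id-to-name mapping is an assumption: it follows the order in which the action types are listed in the Action Type section and should be checked against the action type list in the installed socceraction version.

```python
import pandas as pd

# Counts copied from the comparison table above (v3 in type_id, v2 in type_id_v2).
counts = pd.DataFrame(
    {
        "action_id": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
        "type_id": [704, 27, 34, 4, 34, 6, 2, 28, 41, 5],
        "type_id_v2": [734, 27, 34, 4, 34, 6, 2, 29, 37, 4],
    }
)

# Assumed mapping from action id to name, following the listing order
# in the Action Type section; verify before relying on it.
names = [
    "pass", "cross", "throw_in", "freekick_crossed", "freekick_short",
    "corner_crossed", "corner_short", "take_on", "foul", "tackle",
]
counts["action_name"] = counts["action_id"].map(dict(enumerate(names)))

# Positive values mean v3 produced more actions of that type than v2.
counts["diff"] = counts["type_id"] - counts["type_id_v2"]

# Show only the action types where the two converters disagree.
changed = counts.loc[counts["diff"] != 0, ["action_name", "type_id", "type_id_v2", "diff"]]
print(changed)
```

Among the ten action types shown, only four differ between the two versions, and the largest gap is 30 fewer actions under id 0 in v3.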
