# Wyscout Data for Inter 2017-2018

In this notebook we will learn to work with [Wyscout](https://wyscout.com/) data and to extract information that is useful from a tactical point of view.

For this course we will use the [PlayeRank](https://github.com/mesosbrodleto/playerank) open source project by Luca Pappalardo, Paolo Cintia and co-authors.

## utilities

In [18]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas import json_normalize  # top-level import since pandas 1.0 (formerly pandas.io.json)
import util
path = "C:/Users/Mauro/OneDrive/Documenti/Football/playerank-master"
inter_id = 3161

### events

**GET ALL THE SERIE A MATCH EVENTS** 

In [19]:
events = pd.read_json(os.path.join(path, 'data/events/events_Italy.json'))
len(events)
events.head(3)

Unnamed: 0,eventId,eventName,eventSec,id,matchId,matchPeriod,playerId,positions,subEventId,subEventName,tags,teamId
0,8,Pass,2.530536,180423957,2575959,1H,8327,"[{'y': 52, 'x': 49}, {'y': 44, 'x': 43}]",85,Simple pass,[{'id': 1801}],3158
1,8,Pass,3.768418,180423958,2575959,1H,20438,"[{'y': 44, 'x': 43}, {'y': 17, 'x': 36}]",85,Simple pass,[{'id': 1801}],3158
2,7,Others on the ball,4.868265,180423959,2575959,1H,8306,"[{'y': 17, 'x': 36}, {'y': 56, 'x': 78}]",72,Touch,[],3158


**GET ALL THE INTER MATCH EVENTS**

In [22]:
path = "C:/Users/Mauro/OneDrive/Documenti/Football/Inter_2017_2018"
inter_matches = pd.read_csv(os.path.join(path, "inter_matches.csv"))
inter_matches_events = events[events.matchId.isin(inter_matches.wyId)]
inter_matches_events.head(3)

Unnamed: 0,eventId,eventName,eventSec,id,matchId,matchPeriod,playerId,positions,subEventId,subEventName,tags,teamId
6313,8,Pass,0.924246,180460660,2575963,1H,269152,"[{'y': 50, 'x': 52}, {'y': 47, 'x': 34}]",85,Simple pass,[{'id': 1801}],3176
6314,8,Pass,1.679327,180458825,2575963,1H,26518,"[{'y': 47, 'x': 34}, {'y': 48, 'x': 30}]",85,Simple pass,[{'id': 1801}],3176
6315,8,Pass,2.980452,180458841,2575963,1H,20866,"[{'y': 48, 'x': 30}, {'y': 88, 'x': 76}]",83,High pass,[{'id': 1801}],3176


**NORMALIZE EVENT LOCATIONS**

In [24]:
inter_matches_events = util.get_event_locations(inter_matches_events, "positions")
inter_matches_events.head(3)

Unnamed: 0,eventId,eventName,eventSec,id,matchId,matchPeriod,playerId,subEventId,subEventName,tags,teamId,x_start,y_start,x_end,y_end
0,8,Pass,0.924246,180460660,2575963,1H,269152,85,Simple pass,[{'id': 1801}],3176,52,50,34,47
1,8,Pass,1.679327,180458825,2575963,1H,26518,85,Simple pass,[{'id': 1801}],3176,34,47,30,48
2,8,Pass,2.980452,180458841,2575963,1H,20866,83,High pass,[{'id': 1801}],3176,30,48,76,88
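
`util.get_event_locations` is a local helper whose source is not shown in this notebook. Judging only from the output above, it seems to split the `positions` column into start and end coordinates and to reset the index. The sketch below illustrates that idea on toy data; the name `split_positions` and the handling of events with a single position are assumptions, not the helper's actual implementation.

```python
import pandas as pd

def split_positions(df, column="positions"):
    """Expand a list of {'x', 'y'} dicts into x_start, y_start, x_end, y_end."""
    def coords(positions, index):
        # Assumption: an event with a single position has no end point.
        if len(positions) > index:
            return positions[index]["x"], positions[index]["y"]
        return None, None

    starts = [coords(p, 0) for p in df[column]]
    ends = [coords(p, 1) for p in df[column]]

    # Drop the nested column and renumber the rows from 0.
    out = df.drop(columns=[column]).reset_index(drop=True)
    out["x_start"] = [s[0] for s in starts]
    out["y_start"] = [s[1] for s in starts]
    out["x_end"] = [e[0] for e in ends]
    out["y_end"] = [e[1] for e in ends]
    return out

# Toy rows copied from the first two Inter events shown earlier.
toy = pd.DataFrame(
    {
        "eventName": ["Pass", "Pass"],
        "positions": [
            [{"y": 50, "x": 52}, {"y": 47, "x": 34}],
            [{"y": 47, "x": 34}, {"y": 48, "x": 30}],
        ],
    },
    index=[6313, 6314],
)
toy_locations = split_positions(toy)
```

Flat numeric columns are easier to filter and plot than a column of nested dictionaries, which is presumably why the notebook normalizes them before any analysis.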


**NORMALIZE EVENT TAGS**

In [26]:
inter_matches_events = util.get_event_tags(inter_matches_events, "tags")
inter_matches_events.head(3)

Unnamed: 0,eventId,eventName,eventSec,id,matchId,matchPeriod,playerId,subEventId,subEventName,teamId,x_start,y_start,x_end,y_end,tags
0,8,Pass,0.924246,180460660,2575963,1H,269152,85,Simple pass,3176,52,50,34,47,[1801]
1,8,Pass,1.679327,180458825,2575963,1H,26518,85,Simple pass,3176,34,47,30,48,[1801]
2,8,Pass,2.980452,180458841,2575963,1H,20866,83,High pass,3176,30,48,76,88,[1801]
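
`util.get_event_tags` is also a local helper that is not shown here. From the output, it appears to turn each list of `{'id': ...}` dictionaries into a plain list of tag ids. A minimal sketch of that transformation, with the hypothetical name `flatten_tags`:

```python
import pandas as pd

def flatten_tags(df, column="tags"):
    """Replace [{'id': 1801}, ...] with [1801, ...] in a copy of df."""
    out = df.copy()
    out[column] = [[tag["id"] for tag in tags] for tags in df[column]]
    return out

# Toy rows: one tagged event and one event without tags.
toy = pd.DataFrame(
    {
        "subEventName": ["Simple pass", "Touch"],
        "tags": [[{"id": 1801}], []],
    }
)
toy_tags = flatten_tags(toy)
```

With plain integer lists, a membership test such as `101 in event.tags` works directly, which the goal filter in the next cell relies on.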


**GET THE GOALS FROM THE HOME DERBY**

In [31]:
match_id = inter_matches[inter_matches.label.str.contains(" - Milan")].wyId.iloc[0]
derby_events = inter_matches_events.query("matchId == " + str(match_id))
# Wyscout tag 101 marks a goal: keep Inter shots and penalties that carry it
id_goals = []
for i in range(len(derby_events)):
    event = derby_events.iloc[i]
    if event.teamId == inter_id and 101 in event.tags and event.subEventName in ['Shot', 'Penalty']:
        id_goals.append(i)
derby_goals = derby_events.iloc[id_goals]
inter_players = pd.read_csv(os.path.join(path, "inter_players.csv"))
pd.merge(derby_goals, inter_players, how="inner", left_on="playerId", right_on="wyId")[['playerId', 'shortName']]

Unnamed: 0,playerId,shortName
0,206314,M. Icardi
1,206314,M. Icardi
2,206314,M. Icardi
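
The row-by-row loop in the cell above can also be written as a boolean mask, which tends to be shorter and faster on large event tables. The sketch below applies the same three conditions to toy data; the constants `GOAL_TAG` and `INTER_ID` and the opponent id `9999` are illustrative names and values, not part of the original notebook.

```python
import pandas as pd

GOAL_TAG = 101   # goal tag id, as used in the cell above
INTER_ID = 3161  # Inter team id, as defined in the first cell

# Toy events: 9999 is a made-up opponent id.
toy_events = pd.DataFrame(
    {
        "teamId": [3161, 3161, 9999, 3161],
        "subEventName": ["Shot", "Simple pass", "Shot", "Penalty"],
        "tags": [[101, 1801], [1801], [101, 1801], [101, 1801]],
    }
)

# Combine the three conditions element-wise instead of looping over rows.
is_goal = (
    (toy_events.teamId == INTER_ID)
    & toy_events.subEventName.isin(["Shot", "Penalty"])
    & toy_events.tags.apply(lambda tags: GOAL_TAG in tags)
)
toy_goals = toy_events[is_goal]
```

Here `toy_goals` keeps only the Inter shot and the Inter penalty; the opponent's goal and the plain pass are filtered out.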


**EXPLORE DATA ABOUT INTER MATCH EVENTS**

In [35]:
match_id = inter_matches[inter_matches.label.str.contains(" - Juventus")].wyId.iloc[0]
int_juv_events = inter_matches_events.query("matchId == " + str(match_id))
home_events = int_juv_events.query("teamId == " + str(inter_id))
away_events = int_juv_events.query("teamId != " + str(inter_id))
print("In Inter-Juventus there are", home_events.shape[0], "events associated with Inter")
print("In Inter-Juventus there are", away_events.shape[0], "events associated with Juventus")

In Inter-Juventus there are 790 events associated with Inter
In Inter-Juventus there are 849 events associated with Juventus


In [37]:
print("For every event we have the following information")
inter_matches_events.dtypes

For every event we have the following information


eventId           int64
eventName        object
eventSec        float64
id                int64
matchId           int64
matchPeriod      object
playerId          int64
subEventId       object
subEventName     object
teamId            int64
x_start           int64
y_start           int64
x_end            object
y_end            object
tags             object
dtype: object
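
In the listing above, `x_end` and `y_end` have dtype `object` while `x_start` and `y_start` are `int64`. One plausible explanation, which is an assumption and not verified in this notebook, is that some events have no end position, so the columns mix integers with missing values. If numeric columns are needed, `pd.to_numeric` can coerce them, as in this sketch on toy data:

```python
import pandas as pd

# Toy column mixing an integer, a missing value and a numeric string.
toy = pd.DataFrame(
    {
        "x_start": [52, 34, 30],
        "x_end": pd.Series([34, None, "76"], dtype="object"),
    }
)

# errors="coerce" turns anything that cannot be parsed into NaN.
toy["x_end"] = pd.to_numeric(toy["x_end"], errors="coerce")
```

The result is a float column, because `NaN` cannot be stored in a plain `int64` column.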

**EXPORT THE DATA TO A CSV FILE FOR LATER ANALYSIS**

In [38]:
inter_matches_events.to_csv(os.path.join(path, "inter_matches_events.csv"), index_label=False)
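
One thing to keep in mind when loading this file again: CSV stores every cell as text, so the `tags` lists come back as strings such as `"[1801]"`. The sketch below shows the round trip on toy data with an in-memory buffer and restores the lists with `ast.literal_eval`. It writes with `index=False` for simplicity, which differs from the export cell above.

```python
import ast
import io

import pandas as pd

toy = pd.DataFrame(
    {"eventName": ["Pass", "Shot"], "tags": [[1801], [101, 1801]]}
)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

reloaded = pd.read_csv(buffer)
# After the round trip each cell holds text, not a Python list.
tags_are_text = isinstance(reloaded.tags.iloc[0], str)

# Parse the text back into real lists of integers.
reloaded["tags"] = reloaded.tags.apply(ast.literal_eval)
```

Without this parsing step, a test such as `101 in event.tags` would fail on the reloaded data, because it would compare an integer against a string.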