In [50]:
# Enable inline chart rendering
%matplotlib inline

# First 10 Minutes 

Welcome to your first 10 minutes with Pyrugga. In this tutorial you will learn how to convert a Superscout XML file into a Match object and use it to analyse a game of rugby.

The first step is to import the Pyrugga library. This is as simple as typing:

In [51]:
import pyrugga as pgr

Pyrugga requires a Super Scout file containing play-by-play descriptions of a match. These files are stored in XML, a format that is useful for other things but not great for statistical analysis. We therefore need to convert the XML into something a little friendlier: Pandas DataFrames.
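The sketch below shows roughly what that conversion involves. It flattens an XML document into a DataFrame with one row per action, using only the standard library's `xml.etree.ElementTree` and pandas. The element and attribute names (`match`, `action`, `team`, `event`, `time`) are hypothetical. The real Super Scout schema, and the way Pyrugga parses it, may differ.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical XML layout for illustration only, not the real Super Scout schema.
xml_text = """<match>
  <action id="1" team="Western Province" event="Period" time="0"/>
  <action id="2" team="Natal Sharks" event="Restart" time="0"/>
  <action id="3" team="Western Province" event="Collection" time="4"/>
</match>"""

root = ET.fromstring(xml_text)

# One row per <action> element, one column per XML attribute.
events = pd.DataFrame([action.attrib for action in root.iter("action")])

# XML attributes are always strings, so numeric columns need an explicit conversion.
events["time"] = pd.to_numeric(events["time"])
print(events)
```

Once the data is in a DataFrame, grouping, filtering and summing become one-liners, which is what the rest of this tutorial relies on.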

In [52]:
df = pgr.Match('game_1.xml')

A Match object contains a number of DataFrames and functions to help us analyse a match.

**DataFrames**

* summary -- Summary of the match
* events -- Description of each action
* timeline -- A timeline of the match, with each period ending when possession of the ball changes or there is a stoppage in play
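Here is one way an event log could be cut into such possession periods with pandas, to make the idea concrete. This is a minimal sketch. The toy data and the `stoppage` flag are hypothetical, and Pyrugga's own rules for ending a period are probably more detailed. Here `length` is simply the time between the first and last action of a period.

```python
import pandas as pd

# Toy event log (hypothetical values): who has the ball, when, and whether play stopped after this action.
events = pd.DataFrame({
    "team_name": ["A", "A", "B", "B", "B", "A"],
    "match_time": [0, 4, 5, 7, 12, 20],
    "stoppage": [False, False, False, True, False, False],
})

# A new period starts when possession changes, or on the action right after a stoppage.
possession_changed = events["team_name"].ne(events["team_name"].shift())
after_stoppage = events["stoppage"].shift(fill_value=False)
events["period_id"] = (possession_changed | after_stoppage).cumsum()

# Summarise each period: the team in possession and how long the period lasted.
timeline = events.groupby("period_id").agg(
    team_name=("team_name", "first"),
    start=("match_time", "first"),
    end=("match_time", "last"),
)
timeline["length"] = timeline["end"] - timeline["start"]
print(timeline)
```

The `cumsum` over a boolean "new period starts here" column is a common pandas idiom for numbering consecutive runs.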

**Functions**

* getRef() -- Returns the name of the referee
* Draw() -- Returns whether the match was drawn
* HomeWin() -- Returns whether the home team won

To view the summary line of a match:

In [53]:
df.summary

| | fixture_code | ref_id | ref_name | fixture_date | fx_week | awayteam | hometeam | home_score | away_score |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 719101 | 204 | Peyper | 27/10/2018 | 9 | Natal Sharks | Western Province | 12 | 17 |


To access the first 10 events of a match:

In [54]:
df.events.head(10)

| | action_id | additional | advantage | description | event | event_type | fixture_code | home_team_advantage | match_time | metres | ... | ps_endstamp | ps_timestamp | set_num | shirt_no | team_name | x_coord | x_coord_end | y_coord | y_coord_end | points |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18665513 | | 0 | | Period | Start Period | 719101.0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0.0 | Western Province | 50 | 0 | 34 | 0 | 0 |
| 0 | 18665514 | | 0 | | Restart | 50m Restart Kick | 719101.0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 15.0 | Natal Sharks | 50 | 75 | 34 | 57 | 0 |
| 94 | 18665515 | | 0 | | Collection | Restart Catch | 719101.0 | 0 | 4 | 0 | ... | 4 | 4 | 1 | 11.0 | Western Province | 26 | 0 | 13 | 0 | 0 |
| 65 | 18665516 | | 0 | | Turnover | Dropped Ball Unforced | 719101.0 | 0 | 4 | 0 | ... | 4 | 4 | 1 | 11.0 | Western Province | 26 | 0 | 13 | 0 | 0 |
| 98 | 18665517 | | 0 | | Collection | Defensive Loose Ball | 719101.0 | 0 | 5 | 0 | ... | 5 | 5 | 1 | 6.0 | Natal Sharks | 76 | 0 | 54 | 0 | 0 |
| 218 | 18665518 | Neutral Contact | 0 | Neutral | Carry | Other Carry | 719101.0 | 0 | 7 | 0 | ... | 7 | 7 | 1 | 6.0 | Natal Sharks | 76 | 0 | 54 | 0 | 0 |
| 404 | 18665519 | | 0 | | Tackle | Line Tackle | 719101.0 | 0 | 7 | 0 | ... | 7 | 7 | 1 | 11.0 | Western Province | 24 | 0 | 14 | 0 | 0 |
| 137 | 18665520 | | 0 | | Ruck | | 719101.0 | 0 | 7 | 0 | ... | 7 | 7 | 1 | 0.0 | Natal Sharks | 75 | 0 | 52 | 0 | 0 |
| 680 | 18665521 | | 0 | | Pass | Complete Pass | 719101.0 | 0 | 7 | 0 | ... | 7 | 7 | 1 | 9.0 | Natal Sharks | 75 | 0 | 52 | 0 | 0 |
| 231 | 18665522 | Neutral Contact | 0 | Crossed Gainline | Carry | One Out Drive | 719101.0 | 0 | 12 | 2 | ... | 12 | 12 | 1 | 3.0 | Natal Sharks | 75 | 0 | 52 | 0 | 0 |


To access the timeline of a match, type:

In [55]:
df.timeline

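| | period | set_num | team_name | points | x_coord | x_coord_end | y_coord | y_coord_end | meters_gained | dist_traveled | ... | fly_half_defensive | left_wing_defensive | inside_centre_defensive | outside_centre_defensive | right_wing_defensive | full_back_defensive | Natal Sharks | Western Province | Natal Sharks_points | Western Province_points |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Natal Sharks | 0 | 74 | 75 | 55 | 22 | 1 | 33.015148 | ... | 0 | 2 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 1 | 2 | Natal Sharks | 0 | 24 | 59 | 1 | 54 | 35 | 63.513778 | ... | 1 | 0 | 0 | 0 | 0 | 1 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 1 | 3 | Western Province | 0 | 41 | 69 | 14 | 47 | 28 | 43.278170 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 1 | 4 | Natal Sharks | 0 | 31 | 102 | 21 | 49 | 71 | 76.321688 | ... | 0 | 0 | 1 | 0 | 0 | 1 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 1 | 5 | Natal Sharks | 0 | 40 | 50 | 51 | 46 | 10 | 11.180340 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | 1 | 6 | Western Province | 0 | 76 | 97 | 3 | 36 | 21 | 39.115214 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | 1 | 7 | Western Province | 0 | 73 | 92 | 29 | 14 | 19 | 24.207437 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | 1 | 8 | Natal Sharks | 0 | 4 | 38 | 54 | 67 | 34 | 36.400549 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | 1 | 9 | Natal Sharks | 0 | 21 | 20 | 64 | 45 | -1 | 19.026298 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | 1 | 10 | Natal Sharks | 0 | 49 | 89 | 66 | 55 | 40 | 41.484937 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 |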


And so on ...
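The `meters_gained` and `dist_traveled` columns are consistent with all ten rows shown: the first matches `x_coord_end - x_coord`, and the second matches the straight-line distance between the start and end points. This is inferred from the output, not taken from Pyrugga's source. The sketch below recomputes both from the first two rows.

```python
import numpy as np
import pandas as pd

# Start and end coordinates copied from the first two timeline rows above.
timeline = pd.DataFrame({
    "x_coord": [74, 24], "x_coord_end": [75, 59],
    "y_coord": [55, 1], "y_coord_end": [22, 54],
})

# Inferred definitions (an assumption based on the output, not Pyrugga's documented formula):
# meters_gained is the change along the length of the pitch,
# dist_traveled is the Euclidean distance between start and end points.
dx = timeline["x_coord_end"] - timeline["x_coord"]
dy = timeline["y_coord_end"] - timeline["y_coord"]
timeline["meters_gained"] = dx
timeline["dist_traveled"] = np.hypot(dx, dy)
print(timeline)  # meters_gained 1 and 35; dist_traveled about 33.015 and 63.514
```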

In [56]:
df.Draw()

False

In [57]:
df.HomeWin()

False

In [58]:
df.getRef()

'Peyper'
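All three answers can also be read straight from the summary DataFrame. The sketch below uses hypothetical helpers `is_draw`, `home_win` and `referee`, which are not Pyrugga functions. Pyrugga's own `Draw()`, `HomeWin()` and `getRef()` may be implemented differently. The values are copied from the summary output above.

```python
import pandas as pd

# One-row summary with values copied from df.summary above.
summary = pd.DataFrame({
    "ref_name": ["Peyper"],
    "hometeam": ["Western Province"],
    "awayteam": ["Natal Sharks"],
    "home_score": [12],
    "away_score": [17],
})
row = summary.iloc[0]

# Hypothetical helpers, for illustration only.
def is_draw(row):
    return bool(row["home_score"] == row["away_score"])

def home_win(row):
    return bool(row["home_score"] > row["away_score"])

def referee(row):
    return row["ref_name"]

print(is_draw(row), home_win(row), referee(row))  # False False Peyper
```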

# Some Analysis

On to something a little more useful, such as working out which team had the most possession during a game. Our match is still stored in the variable **df**, and it contains a DataFrame called timeline. To access the timeline:

```python
df.timeline
```

Then we want to group by `team_name` and sum up the time each team had the ball. Grouping is simple enough; just add `.groupby('team_name')`:

```python
df.timeline.groupby('team_name') 
```

Finally, add `.sum()` to total each column and `['length']` to keep only the time in possession, leaving us with:

In [59]:
df.timeline.groupby('team_name').sum()['length'] 

team_name
Natal Sharks        1349
Western Province     707
Name: length, dtype: int64
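The same total can also be written by selecting the `length` column before summing. This way pandas only sums that one column, instead of summing every column (including text columns such as `start_event`) and then discarding the rest. The sketch below uses a small hypothetical timeline.

```python
import pandas as pd

# Toy timeline (hypothetical values) with a text column alongside the numeric one.
timeline = pd.DataFrame({
    "team_name": ["Natal Sharks", "Western Province", "Natal Sharks"],
    "start_event": ["Restart Catch", "Line Tackle", "Pass"],
    "length": [30, 25, 12],
})

# Select the column first, then aggregate: only "length" is summed.
possession = timeline.groupby("team_name")["length"].sum()
print(possession)
```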

We can see the Sharks had 1349 seconds with the ball in play, compared to 707 seconds for Province. To see each team's share of possession, we divide by the column total, which requires adding one more `.sum()`. The result is a proportion; multiply by 100 for a percentage.


In [60]:
possession = df.timeline.groupby('team_name').sum()['length']
possession / possession.sum()

team_name
Natal Sharks        0.656128
Western Province    0.343872
Name: length, dtype: float64
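To turn those proportions into rounded percentages, multiply by 100 and round. The possession seconds below are copied from the earlier output.

```python
import pandas as pd

# Possession seconds copied from the earlier output.
possession = pd.Series({"Natal Sharks": 1349, "Western Province": 707}, name="length")

# Share of total possession as a percentage, rounded to one decimal place.
share_pct = (possession / possession.sum() * 100).round(1)
print(share_pct)
```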

Say you want to see how much time with the ball each team needed per point scored. Province was more efficient, scoring a point on average every 78 seconds with the ball, compared to 96 seconds for Natal.

In [61]:
df.timeline.groupby(['team_name']).sum()['length'] / (df.timeline.groupby('team_name')['points'].sum())

team_name
Natal Sharks        96.357143
Western Province    78.555556
dtype: float64
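The same efficiency can be flipped into points per minute of possession, where higher is better. The point totals below are not printed in the tutorial. They are back-calculated from the outputs above (1349 / 96.357 ≈ 14 and 707 / 78.556 ≈ 9), and they are lower than the final scores of 17 and 12 in `df.summary`. Note that a team with zero points would get an infinite seconds-per-point value.

```python
import pandas as pd

# Possession seconds from the output above; points back-calculated from the seconds-per-point output.
length = pd.Series({"Natal Sharks": 1349, "Western Province": 707})
points = pd.Series({"Natal Sharks": 14, "Western Province": 9})

seconds_per_point = length / points          # lower is more efficient
points_per_minute = points / (length / 60)   # higher is more efficient
print(seconds_per_point.round(2))
print(points_per_minute.round(3))
```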

What about that other TV stat, territory, I hear you say? For that we need to know where on the field each action took place (a zone), so we are better off looking at individual actions in the events DataFrame. Here we count each team's actions at every `x_coord`:

In [65]:
pos = df.events.groupby(['team_name','x_coord']).count()['action_id']

In [68]:
pos.reset_index().head(10)

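| | team_name | x_coord | action_id |
|---|---|---|---|
| 0 | Natal Sharks | -5 | 1 |
| 1 | Natal Sharks | 3 | 5 |
| 2 | Natal Sharks | 4 | 2 |
| 3 | Natal Sharks | 6 | 2 |
| 4 | Natal Sharks | 8 | 5 |
| 5 | Natal Sharks | 9 | 6 |
| 6 | Natal Sharks | 10 | 7 |
| 7 | Natal Sharks | 11 | 1 |
| 8 | Natal Sharks | 12 | 7 |
| 9 | Natal Sharks | 13 | 2 |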
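Counts per individual `x_coord` are very granular. One way to build the zone variable mentioned above is to bin `x_coord` with `pd.cut` and take each team's share of actions per zone. This is a minimal sketch under stated assumptions. The toy data is hypothetical, and the four equal bands assume `x_coord` runs roughly from 0 to 100 along the pitch. The open-ended outer bins catch values like -5 seen above. Whether the coordinates are relative to each team's direction of attack is also an assumption worth checking before reading this as territory.

```python
import pandas as pd

# Toy events (hypothetical values); assumes x_coord runs roughly 0-100 along the pitch.
events = pd.DataFrame({
    "team_name": ["Natal Sharks"] * 4 + ["Western Province"] * 4,
    "x_coord": [10, 30, 60, 80, 20, 70, 80, 95],
})

# Hypothetical zones: four equal bands; the open-ended outer bins also catch values below 0 or above 100.
events["zone"] = pd.cut(
    events["x_coord"],
    bins=[-float("inf"), 25, 50, 75, float("inf")],
    labels=["0-25", "25-50", "50-75", "75-100"],
)

# Share of each team's actions that happened in each zone (each row sums to 1).
territory = pd.crosstab(events["team_name"], events["zone"], normalize="index")
print(territory)
```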


In [16]:
list(df.timeline.columns)

['period',
 'set_num',
 'team_name',
 'points',
 'x_coord',
 'x_coord_end',
 'y_coord',
 'y_coord_end',
 'meters_gained',
 'dist_traveled',
 'start',
 'length',
 'start_event',
 'end_event',
 'phases',
 'carry',
 'collection',
 'other_carry',
 'kick_return',
 'one_out_drive',
 'defender_beaten',
 'defensive_catch',
 'defensive_loose_ball',
 'attacking_loose_ball',
 'support_carry',
 'pick_and_go',
 'stepped',
 'dropped_ball_unforced',
 'initial_break',
 'restart_catch',
 'restart_return',
 'supported_break',
 'kick',
 'goal_kick',
 'box',
 'territorial',
 'bomb',
 'touch_kick',
 'lineout',
 'lineout_take',
 'throw_middle',
 'lineout_win_middle',
 'lineout_win_front',
 'maul',
 'ruck',
 'pass',
 'complete_pass',
 'offload',
 'scrum_half_pass',
 'break_pass',
 'incomplete_pass',
 'penalty_conceded',
 'penalty_won',
 'scrum',
 'defensive_scrum',
 'offensive_scrum',
 'no_8_pick_up',
 'no_8_pass',
 'tackle',
 'missed_tackle',
 'line_tackle',
 'chase_tackle',
 'other_tackle',
 'cover_tackle'