In [1]:
#enables chart rendering
%matplotlib inline

In [2]:
!pip install --upgrade --force-reinstall --no-deps git+https://github.com/jlondal/pyrugga.git

Collecting git+https://github.com/jlondal/pyrugga.git
  Cloning https://github.com/jlondal/pyrugga.git to /tmp/pip-req-build-9jyo3hbr
Building wheels for collected packages: pyrugga
  Building wheel for pyrugga (setup.py) ... done
  Stored in directory: /tmp/pip-ephem-wheel-cache-ivwz5jby/wheels/c4/c6/06/41574b4a3a768b91eeec22fe1a22c5ca0f5a9d0bdc0d36c6fa
Successfully built pyrugga
Installing collected packages: pyrugga
  Found existing installation: pyrugga 1.0.1
    Uninstalling pyrugga-1.0.1:
      Successfully uninstalled pyrugga-1.0.1
Successfully installed pyrugga-1.0.1


# First 10 Minutes 

Welcome to your first 10 minutes with Pyrugga. In this tutorial you will learn how to convert a Superscout XML file into a Match object and analyse a game of rugby.

The first step is to import the Pyrugga library. This is as simple as typing

In [3]:
import pyrugga as pgr
import pandas as pd
import numpy as np

SyntaxError: invalid syntax (match.py, line 508)

Pyrugga requires a Superscout file containing a play-by-play description of a match. These files are stored in XML, which is not great for statistical analysis but useful for other things. We need to convert the XML into something a little friendlier: Pandas Dataframes.

In [None]:
df = pgr.Match('game_1.xml')
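
To give a feel for what such a conversion involves, here is a minimal sketch using only the standard library. The XML layout and the names `match`, `event`, `team`, `action` and `time` are invented for illustration. They are not the real Superscout schema, and this is not how `pgr.Match` is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML snippet; the real Superscout schema differs.
SAMPLE_XML = """
<match>
  <event team="Home" action="Kick" time="12"/>
  <event team="Away" action="Carry" time="19"/>
  <event team="Away" action="Pass" time="23"/>
</match>
"""

def xml_to_rows(xml_text):
    """Flatten each <event> element into a dict, one row per event."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for event in root.iter("event"):
        row = dict(event.attrib)        # copy the attributes as columns
        row["time"] = int(row["time"])  # attribute values arrive as strings
        rows.append(row)
    return rows

rows = xml_to_rows(SAMPLE_XML)
print(rows[0])
```

A list of dicts like this can be handed straight to `pd.DataFrame(rows)` to get a table with one row per event.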

A Match object contains a number of functions and Dataframes to help us analyse a match 

**Dataframes**

* Summary -- Summary of the match 
* Events -- Description of each action 
* Timeline -- A timeline of the match with periods ending when either possession of the ball changes or there is a stoppage in play 

**Functions**


* getRef -- returns the name of the referee
* Draw -- returns whether the match was drawn
* HomeWin -- returns whether the home team won
  

* getTerritoryY -- amount of time each team spent in a zone (breadth of the pitch)
* getTerritoryX -- amount of time each team spent in a zone (length of the pitch)
* getTerritory -- amount of time each team spent in a zone (both breadth & length of the pitch)



To view the summary line of a match

In [None]:
df.summary

To access the first 10 events of a match

In [None]:
df.events.head(10)

To access the Sharks' first three periods of play

In [None]:
df.timeline.query('team_name == "Natal Sharks"').head(3)

And so on ...

In [None]:
df.Draw()

In [None]:
df.HomeWin()

In [None]:
df.getRef()

# Some Analysis

Let's do something a little more useful: work out which team had the most possession during a game.

Remember our match is still stored in the variable **df**, and it contains three Dataframes called

* **events**, a blow by blow account of every action in a match
* **timeline**, the match broken into periods of play, each ending with a stoppage in play or a change of possession
* **summary**, a summary of the match
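
The idea of a period that ends on a stoppage or a change of possession can be sketched in plain Python. The event tuples and the `build_periods` helper below are hypothetical and only illustrate the rule. Pyrugga's own timeline logic may differ.

```python
# Hypothetical events: (time in seconds, team in possession, stoppage flag).
events = [
    (0, "Home", False),
    (8, "Home", False),
    (15, "Away", False),  # possession changes, so a new period starts
    (30, "Away", True),   # a stoppage ends this period
    (40, "Home", False),
    (55, "Home", False),
]

def build_periods(events):
    """Split events into periods that end on a stoppage or a change of team."""
    periods = []
    current = None
    for time, team, stoppage in events:
        if current is None or team != current["team_name"]:
            if current is not None:
                periods.append(current)  # close the previous team's period
            current = {"team_name": team, "start": time, "end": time}
        else:
            current["end"] = time        # extend the running period
        if stoppage:
            periods.append(current)      # a stoppage closes the period
            current = None
    if current is not None:
        periods.append(current)
    for period in periods:
        period["length"] = period["end"] - period["start"]
    return periods

periods = build_periods(events)
for period in periods:
    print(period)
```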

To access the timeline 

```python
df.timeline
```

Then we want to group by team_name and sum up the length of time each team had the ball. Grouping is as simple as adding ".groupby('team_name')" 

```python
df.timeline.groupby('team_name') 
```

To sum, add ".sum()", and to pick out the length column add "['length']", leaving us with 

In [None]:
df.timeline.groupby('team_name').sum()['length'] 

The Sharks had the ball for 1349 seconds compared to Province's 707 seconds.
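
To see the group-and-sum step in isolation, here is a small sketch on a toy timeline. The team names and lengths are made up and are not taken from the match above.

```python
import pandas as pd

# Toy timeline with made-up values; the real timeline has many more columns.
timeline = pd.DataFrame({
    "team_name": ["Home", "Away", "Home", "Away"],
    "length": [30, 12, 25, 18],  # seconds in possession per period
})

# Select the column first, then sum within each team.
possession = timeline.groupby("team_name")["length"].sum()
print(possession)
```

Selecting `['length']` before calling `.sum()` gives the same numbers as summing every column and then picking one, but it only adds up the column we care about.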

If we want to see this as a proportion of total possession, we need to divide by the column total, which requires adding another ".sum()"


In [None]:
df.timeline.groupby('team_name').sum()['length'] / (df.timeline.groupby('team_name').sum()['length'].sum())
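
The same calculation on a toy timeline, with the intermediate result named so the two sums are easier to tell apart. The data is invented for illustration.

```python
import pandas as pd

# Toy timeline with made-up values.
timeline = pd.DataFrame({
    "team_name": ["Home", "Away", "Home", "Away"],
    "length": [35, 15, 25, 25],
})

per_team = timeline.groupby("team_name")["length"].sum()  # first sum: per team
share = per_team / per_team.sum()                         # second sum: whole match
print(share)
```

Multiplying `share` by 100 turns the fractions into percentages.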

Say we want to see points per second with the ball. We can see Province was slightly more effective at scoring points than the Sharks.

In [None]:
df.timeline.groupby('team_name')['points'].sum() / df.timeline.groupby('team_name')['length'].sum()
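
A points-per-second rate is points divided by seconds in possession. The sketch below shows that ratio on invented data. Only the `points` and `length` column names come from the timeline used above.

```python
import pandas as pd

# Toy timeline with made-up values.
timeline = pd.DataFrame({
    "team_name": ["Home", "Home", "Away", "Away"],
    "length": [40, 20, 30, 10],  # seconds with the ball
    "points": [7, 0, 3, 0],      # points scored in that period
})

totals = timeline.groupby("team_name")[["points", "length"]].sum()
points_per_second = totals["points"] / totals["length"]
print(points_per_second)
```

Swapping the numerator and denominator would give seconds per point instead, where a smaller number means a more efficient team.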

But what about the other TV statistics we tend to see, such as territory, I hear you say? There are three functions to help us calculate that

In [None]:
df.getTerritory(perc=True)

In [None]:
# Territory in the length of the pitch
df.getTerritoryX(perc=True)

In [None]:
# Territory in the width of the pitch
df.getTerritoryY(perc=True)

In [None]:
df.getTerritoryMetric()
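
For intuition, one way a territory table along the length of the pitch could be built is to bin each period by position and sum the time per zone. The column `x_coord`, the zone boundaries and the zone labels below are all assumptions made for this sketch. It is not a description of how `getTerritoryX` works internally.

```python
import pandas as pd

# Toy periods with made-up values.
periods = pd.DataFrame({
    "team_name": ["Home", "Home", "Away", "Away"],
    "x_coord": [10, 60, 30, 90],  # hypothetical metres from the team's own try line
    "length": [20, 60, 30, 10],   # seconds
})

# Hypothetical zones along a 100 m pitch.
bins = [0, 22, 50, 78, 100]
labels = ["own 22", "own half", "opp half", "opp 22"]
periods["zone"] = pd.cut(
    periods["x_coord"], bins=bins, labels=labels, include_lowest=True
).astype(str)

# Seconds per team and zone, then each row scaled to sum to 1.
seconds = periods.groupby(["team_name", "zone"])["length"].sum().unstack(fill_value=0)
perc = seconds.div(seconds.sum(axis=1), axis=0)
print(perc)
```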

We can also build plots to see that the Sharks play slightly more on the left side than the right, while the Storms do not have much preference

In [None]:
df.getTerritoryY(perc=True).plot(kind='bar',figsize=(17,8),title='Width')

In [None]:
df.getTerritoryX(perc=True).plot(kind='barh',figsize=(17,8),title='Territory')

In [None]:
df.heat_map()

In [None]:
df.heat_map(cust_metric='Kick')

In [None]:
df.heat_map(cust_metric='Pass')

In [None]:
df.heat_map(cust_metric='Carry')
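
A heat map of this kind is essentially a two-dimensional count of where events happen. The sketch below bins made-up event coordinates into a coarse grid with NumPy. The pitch dimensions, grid size and coordinates are assumptions for illustration, and `heat_map` itself may be implemented differently.

```python
import numpy as np

# Made-up event positions on a hypothetical 100 m x 70 m pitch.
x = np.array([10, 12, 60, 95])  # along the length
y = np.array([5, 8, 35, 65])    # across the width

# Count events in a 4 x 3 grid of equal-sized cells.
grid, x_edges, y_edges = np.histogram2d(
    x, y, bins=[4, 3], range=[[0, 100], [0, 70]]
)
print(grid)
```

The resulting `grid` array can be passed to a plotting call such as matplotlib's `imshow` to draw the map.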

In [None]:
df.events

## Players

Produces a summary of each player, with the option to normalise by minutes, actions or phases.

In [None]:
df.player_summary(norm='mins')

In [None]:
df.player_summary(norm='actions')

In [None]:
df.player_summary(norm='phases')

In [None]:
df.player_summary()
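
To illustrate what normalising by minutes means, the sketch below divides each player's raw counts by minutes played. The players, columns and numbers are invented, and `player_summary` may compute its normalisation differently.

```python
import pandas as pd

# Made-up player counts for illustration only.
players = pd.DataFrame({
    "player": ["A", "B"],
    "mins": [80, 40],
    "carries": [8, 6],
    "tackles": [12, 10],
}).set_index("player")

# Divide every counting column by the player's minutes on the pitch.
per_minute = players[["carries", "tackles"]].div(players["mins"], axis=0)
print(per_minute)
```

Player B has lower raw counts than player A but higher rates once minutes are taken into account, which is the kind of difference normalisation is meant to expose.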
