# Interactive Python using Jupyter notebooks

- Notebooks are made of a sequence of cells
- Cells can contain different content such as Python code, or Markdown
- You can change the cell type in the toolbar
- To execute a cell, press "Shift+Return"
- Use the toolbar to add, delete, copy, or insert cells

(Note: to learn more about Markdown check [Daring Fireball's website](https://daringfireball.net/projects/markdown/syntax))

## Import the Python package for numerical arrays (numpy)

In [1]:
import numpy as np

## Define a function that creates some statistical data

In [2]:
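def load_data():
    """Return random positions and heights for N = 100 players."""
    # Goalkeeper, defender, midfielder, attacker
    possible_positions = ['GK', 'D', 'M', 'A']
    N = 100
    positions = []
    heights = []
    for _ in range(N):
        # Pick a random position; randint(k) returns an integer in [0, k)
        positions.append(possible_positions[np.random.randint(len(possible_positions))])
        # Draw a height from a normal distribution (mean 180, standard deviation 5)
        heights.append(np.random.normal(loc=180.0, scale=5.0))
    return positions, heights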
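
Because `load_data` draws from an unseeded random generator, every run of the notebook produces different numbers. As a minimal sketch, assuming NumPy 1.17 or newer, the same kind of data can be drawn in one call per list from a seeded generator. The name `load_data_vectorized` and its parameters `n` and `seed` are hypothetical and not part of the original notebook.

```python
import numpy as np

def load_data_vectorized(n=100, seed=0):
    """Hypothetical variant of load_data that draws all samples at once."""
    rng = np.random.default_rng(seed)  # seeded generator gives repeatable data
    possible_positions = ['GK', 'D', 'M', 'A']
    # Draw n positions uniformly at random, with replacement
    positions = rng.choice(possible_positions, size=n)
    # Draw n heights from a normal distribution (mean 180, standard deviation 5)
    heights = rng.normal(loc=180.0, scale=5.0, size=n)
    # Convert to plain lists so the result has the same types as load_data
    return positions.tolist(), heights.tolist()

positions_a, heights_a = load_data_vectorized()
positions_b, heights_b = load_data_vectorized()
print(positions_a == positions_b, heights_a == heights_b)
```

With the same seed, both calls return identical lists, which makes results easier to compare between runs.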

## Read the data

In [3]:
positions, heights = load_data()

The objects ```positions``` and ```heights``` are lists, as we can check using the Python function ```type```:

In [4]:
print(type(positions))
print(type(heights))

<type 'list'>
<type 'list'>


Question: *How many items are inside the lists ```positions``` and ```heights```?*

Hint: Use the Python function ```len```. 

In [5]:
print(len(positions))
print(len(heights))

100
100


## Convert to numpy arrays

In [6]:
np_positions = np.array(positions)
np_heights = np.array(heights)
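
One reason for the conversion is that arrays apply comparisons and arithmetic to every element, while lists do not. The sketch below illustrates the difference on a small made-up sample; the names `toy_positions` and `toy_heights` are illustrative only.

```python
import numpy as np

toy_positions = ['GK', 'D', 'GK']      # made-up sample, not the notebook data
toy_heights = [185.0, 178.0, 190.0]

# Comparing a list with a string yields a single bool
list_result = (toy_positions == 'GK')

# Comparing an array with a string is done element by element
array_result = (np.array(toy_positions) == 'GK')

# Arithmetic also applies to every element of an array
heights_in_m = np.array(toy_heights) / 100.0

print(list_result)
print(array_result)
print(heights_in_m)
```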

Question: *what is the data type of ```np_positions``` and ```np_heights```*?<br>
Question: *what is the shape of ```np_positions``` and ```np_heights```*?


Hint: NumPy arrays have attributes called ```dtype``` and ```shape```.

In [44]:
print(np_positions.shape, np_heights.shape)
print(np_positions.dtype)
print(np_heights.dtype)

print(np_positions)
print(np_heights)

(100,) (100,)
|S2
float64
['A' 'M' 'A' 'M' 'M' 'A' 'D' 'M' 'GK' 'GK' 'GK' 'GK' 'M' 'GK' 'M' 'GK' 'M'
 'A' 'M' 'A' 'A' 'D' 'M' 'GK' 'M' 'A' 'A' 'D' 'M' 'D' 'GK' 'A' 'A' 'GK'
 'D' 'D' 'A' 'GK' 'GK' 'GK' 'M' 'GK' 'D' 'D' 'GK' 'GK' 'D' 'A' 'A' 'GK'
 'D' 'M' 'A' 'M' 'D' 'A' 'D' 'A' 'GK' 'D' 'D' 'GK' 'A' 'D' 'GK' 'A' 'GK'
 'D' 'D' 'M' 'A' 'A' 'D' 'M' 'D' 'M' 'D' 'GK' 'GK' 'M' 'A' 'D' 'A' 'D' 'D'
 'A' 'D' 'GK' 'A' 'A' 'D' 'D' 'D' 'A' 'A' 'GK' 'M' 'M' 'GK' 'A']
[176.35233561 183.15238168 180.41692489 170.27576625 183.58511928
 181.60915941 176.67466281 181.45602494 176.09119202 183.41107785
 173.58674974 176.47847043 176.59355049 177.34482542 179.35452489
 178.51839939 178.88599342 182.5218037  177.65881434 179.85583752
 177.49293838 184.87143971 183.92977659 186.89914047 182.91686461
 179.8341647  177.66223785 172.47368253 183.38063304 176.93992421
 175.02901123 174.62292932 173.76003731 182.96545736 173.24420125
 182.36839983 178.49837483 174.07606263 184.72964127 177.73241231
 191.80745547 

## Extract the heights of the goalkeepers

In [47]:
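# Boolean masks select the heights of players in a given position
gk_heights = np_heights[np_positions == 'GK']
attacker_heights = np_heights[np_positions == 'A']
# Note: this selects midfielders ('M') only, not all field players
field_heights = np_heights[np_positions == 'M']
# Note: despite its name, this combines midfielders and attackers
attacker_gk = np.append(field_heights, attacker_heights)

print(gk_heights)
print(field_heights)
print(attacker_heights)
print(attacker_gk)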

[176.09119202 183.41107785 173.58674974 176.47847043 177.34482542
 178.51839939 186.89914047 175.02901123 182.96545736 174.07606263
 184.72964127 177.73241231 185.5457376  182.44962633 173.28094
 173.23333686 178.6902921  179.3848828  175.38901697 179.15807163
 174.6009678  184.12662342 177.60952381 175.56222925 181.88158001]
[183.15238168 170.27576625 183.58511928 181.45602494 176.59355049
 179.35452489 178.88599342 177.65881434 183.92977659 182.91686461
 183.38063304 191.80745547 181.61418506 185.52756634 184.95410741
 178.24039795 177.67444418 185.58925045 185.18011772 182.37686018]
[176.35233561 180.41692489 181.60915941 182.5218037  179.85583752
 177.49293838 179.8341647  177.66223785 174.62292932 173.76003731
 178.49837483 176.85004224 179.14952415 187.56526915 186.91164838
 180.36827506 179.79468914 177.71808964 169.56183267 188.05378686
 179.27145489 181.3636217  181.31707715 182.60482494 182.05814193
 178.38161329 174.98414334 177.60404584]
[183.15238168 170.27576625 183.58511
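
The selection in the cell above relies on boolean masks: a comparison such as `np_positions == 'GK'` yields an array of `True`/`False` values, and indexing with that array keeps only the entries marked `True`. A small sketch on made-up data (the `toy_` names are illustrative only):

```python
import numpy as np

toy_positions = np.array(['GK', 'A', 'M', 'GK'])   # made-up sample
toy_heights = np.array([185.0, 176.0, 181.0, 190.0])

is_gk = (toy_positions == 'GK')      # boolean array, True where the player is a goalkeeper
n_gk = is_gk.sum()                   # True counts as 1, so the sum is the number of goalkeepers
toy_gk_heights = toy_heights[is_gk]  # keeps only the entries where the mask is True

print(is_gk)
print(n_gk)
print(toy_gk_heights)
```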

## Print the median of the goalkeepers' heights

In [49]:
print("Median height of goalkeepers: " + str(np.median(gk_heights)))
print("Median height of attackers: " + str(np.median(attacker_heights)))
print("Median height of field players: " + str(np.median(field_heights)))
print("Median height of attacker + goalkeeper: " + str(np.median(attacker_gk)))

Median height of goalkeepers: 177.73241230951768
Median height of attackers: 179.53307201245022
Median height of field players: 182.64686239539884
Median height of attacker + goalkeeper: 180.11205629069315


Question: *what is the median height of all the field players*?<br>
Question: *what is the median height of all the attackers*?<br>
Question: *what is the median height of goalkeepers and the attackers combined?*
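
One possible way to approach these questions is to combine masks. The sketch below uses made-up data and assumes that "field players" means everyone who is not a goalkeeper; the `toy_` names are illustrative only.

```python
import numpy as np

toy_positions = np.array(['GK', 'D', 'M', 'A', 'GK', 'A'])   # made-up sample
toy_heights = np.array([188.0, 182.0, 176.0, 179.0, 190.0, 174.0])

# Field players: everyone who is not a goalkeeper (assumption stated above)
toy_field = toy_heights[toy_positions != 'GK']

# Goalkeepers and attackers combined: | is the element-wise "or"
toy_gk_or_a = toy_heights[(toy_positions == 'GK') | (toy_positions == 'A')]

print(np.median(toy_field))
print(np.median(toy_gk_or_a))
```

The parentheses around each comparison are required, because `|` binds more tightly than `==` in Python.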

## More statistical tests

Besides the ```median```, NumPy also comes with the functions ```mean```, ```std```, ```min``` and ```max```, which are useful for investigating statistical data.
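
As a brief illustration on a made-up sample (`toy_heights` is not the notebook data), these functions can be called as follows. Note that `np.std` computes the population standard deviation by default; passing `ddof=1` gives the sample standard deviation.

```python
import numpy as np

toy_heights = np.array([170.0, 180.0, 190.0])   # made-up sample

print(np.mean(toy_heights))           # arithmetic mean
print(np.std(toy_heights))            # population standard deviation (ddof=0 by default)
print(np.std(toy_heights, ddof=1))    # sample standard deviation
print(np.min(toy_heights), np.max(toy_heights))
```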

Question: *Who is the shortest player (which position)*?<br>
Question: *Who is the tallest player (which position)*?

In [42]:
print(np_positions[np_heights == np_heights.min()])
print(np_positions[np_heights == np_heights.max()])
print(np_heights.min())
print(np_heights.max())

['A']
['M']
169.56183267408147
191.80745547380957
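
An alternative to comparing every height against the minimum or maximum is to look up the index of the extreme value with `np.argmin` and `np.argmax`. This is a sketch on made-up data, with illustrative `toy_` names:

```python
import numpy as np

toy_positions = np.array(['GK', 'D', 'M', 'A'])   # made-up sample
toy_heights = np.array([188.0, 182.0, 171.0, 179.0])

i_short = np.argmin(toy_heights)   # index of the smallest height
i_tall = np.argmax(toy_heights)    # index of the largest height

print(toy_positions[i_short], toy_heights[i_short])
print(toy_positions[i_tall], toy_heights[i_tall])
```

If several players share the extreme height, `argmin` and `argmax` return only the first such index, whereas the mask-based comparison returns all of them.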


## Data plotting

For plotting, we need the package matplotlib.

In [10]:
import matplotlib.pyplot as plt

There are different display modes for matplotlib plots inside a Jupyter notebook.

In [11]:
# For inline plots use
%matplotlib inline

In [12]:
# For inline plots with interactive capabilities use
%matplotlib notebook

Let's visualize the height distribution of the defenders.

In [13]:
d_heights = np_heights[np_positions == 'D']

In [14]:
plt.figure()
plt.hist(d_heights)
plt.title('Defenders')
plt.xlabel('Heights')
plt.show()

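
A histogram groups the values into bins and counts how many fall into each one. To inspect the numbers behind such a plot, `np.histogram` can be used; like `plt.hist`, it defaults to 10 equal-width bins between the smallest and largest value. The sketch below uses a made-up sample rather than the notebook data.

```python
import numpy as np

# Made-up sample, not the notebook data
toy_heights = np.array([170.0, 172.0, 175.0, 178.0, 180.0, 180.0, 183.0, 190.0])

# 10 equal-width bins between the minimum and the maximum
counts, bin_edges = np.histogram(toy_heights, bins=10)

print(counts)      # number of values in each bin
print(bin_edges)   # 11 edges delimit the 10 bins
```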

To figure out the tallest/shortest player, we can compute the max/min within each position.

In [15]:
p = ['GK', 'D', 'M', 'A']
p_max = [np_heights[np_positions == i].max() for i in p]
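
The same pattern can collect both the minimum and the maximum per position. Here is a minimal sketch on made-up data, storing the results in a dictionary; `summary` and the `toy_` names are illustrative only.

```python
import numpy as np

toy_positions = np.array(['GK', 'D', 'GK', 'A', 'D', 'M'])   # made-up sample
toy_heights = np.array([188.0, 182.0, 191.0, 179.0, 176.0, 181.0])

summary = {}
for pos in ['GK', 'D', 'M', 'A']:
    # Heights of all players in this position
    selected = toy_heights[toy_positions == pos]
    summary[pos] = (selected.min(), selected.max())

for pos, (shortest, tallest) in summary.items():
    print(pos, shortest, tallest)
```

Note that `min` and `max` raise a `ValueError` on an empty selection, so this assumes every position occurs at least once in the data.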

In [16]:
plt.figure()
plt.plot(range(len(p)), p_max)
plt.gca().xaxis.set_ticks(range(len(p)))
plt.gca().xaxis.set_ticklabels(p)
plt.ylabel('Heights')
plt.show()


For inspiration on data plotting and more examples, check out the matplotlib gallery: [https://matplotlib.org/gallery.html](https://matplotlib.org/gallery.html)