# Interactive Python using Jupyter notebooks

- Notebooks are made of a sequence of cells
- Cells can contain different content such as Python code, or Markdown
- You can change the cell type in the toolbar
- To execute a cell, press "Shift+Return"
- Use the toolbar to add, delete, copy, or insert cells

(Note: to learn more about Markdown check [Daring Fireball's website](https://daringfireball.net/projects/markdown/syntax))

## Import the Python package for numerical arrays (numpy)

In [1]:
import numpy as np

## Define a function that creates some statistical data

In [2]:
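def load_data():
    """Create random positions and heights for N players."""
    # Goalkeeper, defender, midfielder, attacker
    possible_positions = ['GK', 'D', 'M', 'A']
    N = 100
    positions = []
    heights = []
    for _ in range(N):
        # Pick a random position; randint(n) returns an integer in [0, n)
        positions.append(possible_positions[np.random.randint(len(possible_positions))])
        # Draw a height from a normal distribution (mean 180, std 5)
        heights.append(np.random.normal(loc=180.0, scale=5.0))
    return positions, heights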
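
The loop above draws one value at a time. As a minimal sketch, the same kind of data can be drawn in one call per array using numpy's `Generator` interface. The function name `load_data_vectorized` and its `n` and `seed` parameters are hypothetical and not part of this notebook. Note that this variant returns numpy arrays rather than lists.

```python
import numpy as np

def load_data_vectorized(n=100, seed=None):
    """Hypothetical vectorized counterpart of load_data (illustration only)."""
    rng = np.random.default_rng(seed)
    # Goalkeeper, defender, midfielder, attacker
    possible_positions = np.array(['GK', 'D', 'M', 'A'])
    # Draw n positions uniformly at random, with replacement
    positions = rng.choice(possible_positions, size=n)
    # Draw n heights from a normal distribution (mean 180, std 5)
    heights = rng.normal(loc=180.0, scale=5.0, size=n)
    return positions, heights

positions_demo, heights_demo = load_data_vectorized(n=100, seed=0)
print(positions_demo[:5])
print(heights_demo[:5])
```

Passing a fixed `seed` makes the random draws repeatable, which can help when comparing results between runs.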

## Read the data

In [3]:
positions, heights = load_data()

The objects ```positions``` and ```heights``` are lists, as we can check using the Python function ```type```:

In [4]:
print(type(positions))
print(type(heights))

<class 'list'>
<class 'list'>


Question: *How many items are inside the lists ```positions``` and ```heights```?*

Hint: Use the Python function ```len```. 

In [15]:
print(len(positions))
print(len(heights))

100
100


## Convert to numpy arrays

In [5]:
np_positions = np.array(positions)
np_heights = np.array(heights)

Question: *What is the data type of ```np_positions``` and ```np_heights```?*<br>
Question: *What is the shape of ```np_positions``` and ```np_heights```?*


Hint: Numpy arrays have attributes called ```dtype``` and ```shape```. They are accessed without parentheses.

In [21]:
print(np_positions.dtype)
print(np_heights.dtype)

print(np_positions.shape)
print(np_heights.shape)

<U2
float64
(100,)
(100,)
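
The output `<U2` can look cryptic: `U` stands for a Unicode string, `2` is the maximum number of characters per element (here set by `'GK'`), and `<` marks little-endian byte order. A small sketch with made-up arrays (the `demo_` names and values are illustrative, not from the notebook data):

```python
import numpy as np

# Made-up example arrays (illustration only)
demo_positions = np.array(['GK', 'D', 'M', 'A'])
demo_heights = np.array([188.0, 181.5, 176.2, 179.9])

# dtype and shape are attributes, so no parentheses are needed
print(demo_positions.dtype.kind)      # 'U' means Unicode string
print(demo_positions.dtype.itemsize)  # bytes per element (4 bytes per character)
print(demo_heights.dtype)             # float64
print(demo_positions.shape)           # (4,)
```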


## Extract the heights of the goalkeepers

In [28]:
gk_heights = np_heights[np_positions == 'GK']
field_heights = np_heights[np_positions != 'GK']
A_heights = np_heights[np_positions == 'A']

GK_and_A_heights = np.hstack((gk_heights, A_heights))

print(GK_and_A_heights)

[183.93945046 181.6147069  177.30240963 187.64011195 175.39163947
 180.07543799 171.91815178 177.56451433 180.21310284 174.88531002
 180.24124149 186.77823578 179.83214034 186.37117311 189.94255967
 178.16318355 171.32804637 183.61984216 189.28506386 178.67712661
 172.60313402 182.05181449 172.99599812 172.98346826 183.6499095
 187.56276369 174.51051802 177.95950518 181.20318998 180.11633286
 182.77455629 180.55400224 177.15379613 181.43939874 182.22448357
 180.07806229 164.44882431 171.24557232 176.5461479  180.08707351
 185.09219238 181.50306381 173.40332312 182.85923368 173.49859102
 173.32056821 174.1616272  178.10457264 181.6349082  190.0928954
 177.12146071]
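
The selection above relies on boolean masks: comparing an array with a value gives an array of `True`/`False` entries, and indexing with that mask keeps only the `True` entries. A minimal sketch with made-up data (the `demo_` names are illustrative); it also shows that two masks can be combined with `|` as an alternative to `np.hstack`:

```python
import numpy as np

# Made-up example arrays (illustration only)
demo_positions = np.array(['GK', 'D', 'A', 'GK', 'M'])
demo_heights = np.array([190.0, 182.0, 178.0, 186.0, 175.0])

# Element-wise comparison produces a boolean array of the same shape
mask = demo_positions == 'GK'
print(mask)
print(demo_heights[mask])     # [190. 186.]

# Combine two masks with | (element-wise "or"); the parentheses are required
gk_or_a = (demo_positions == 'GK') | (demo_positions == 'A')
print(demo_heights[gk_or_a])  # [190. 178. 186.]
```

Unlike `np.hstack`, the combined mask keeps the players in their original order. For the median this makes no difference.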


## Print the median of the goalkeepers' heights

In [23]:
print("Median height of goalkeepers: " + str(np.median(gk_heights)))

Median height of goalkeepers: 180.07543798853868


Question: *What is the median height of all the field players?*<br>
Question: *What is the median height of all the attackers?*<br>
Question: *What is the median height of the goalkeepers and the attackers combined?*

In [29]:
print("Median height of field players: " + str(np.median(field_heights)))
print("Median height of attackers: " + str(np.median(A_heights)))
print("Median height of goalkeepers and attackers: " + str(np.median(GK_and_A_heights)))

Median height of field players: 180.18770343057565
Median height of attackers: 180.08256790049683
Median height of goalkeepers and attackers: 180.0780622943705


## More statistical tests

Besides the ```median```, numpy also comes with the functions ```mean```, ```std```, ```min``` and ```max```, which are useful for investigating statistical data.

Question: *Who is the shortest player (which position)?*<br>
Question: *Who is the tallest player (which position)?*

In [35]:
min_h = np_heights.min()
max_h = np_heights.max()
min_pos = np_positions[np_heights == min_h]
max_pos = np_positions[np_heights == max_h]


print("The smallest player is a: " + min_pos[0])
print("The tallest player is a: " + max_pos[0])

The smallest player is a: A
The tallest player is a: D
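
An alternative sketch for the same question uses `np.argmin` and `np.argmax`, which return the index of the smallest and largest value, so the position can be looked up directly without comparing floats. The `demo_` arrays below are made up for illustration:

```python
import numpy as np

# Made-up example arrays (illustration only)
demo_positions = np.array(['GK', 'D', 'M', 'A'])
demo_heights = np.array([188.0, 181.5, 176.2, 179.9])

# Index of the smallest and largest height
shortest_idx = np.argmin(demo_heights)
tallest_idx = np.argmax(demo_heights)

print("Shortest:", demo_positions[shortest_idx], demo_heights[shortest_idx])
print("Tallest:", demo_positions[tallest_idx], demo_heights[tallest_idx])

# The other summary functions work the same way
print("Mean:", demo_heights.mean(), "Std:", demo_heights.std())
```

If several players share the same extreme height, `argmin` and `argmax` return the first one.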


## Data plotting

For plotting, we need the package matplotlib.

In [8]:
import matplotlib.pyplot as plt

There are different display modes for matplotlib plots inside a Jupyter notebook.

In [9]:
# For inline plots use
%matplotlib inline

In [10]:
# For inline plots with interactive capabilities use
%matplotlib notebook

Let's visualize the height distribution of the defenders.

In [11]:
d_heights = np_heights[np_positions == 'D']

In [12]:
plt.figure()
plt.hist(d_heights)
plt.title('Defenders')
plt.xlabel('Heights')
plt.show()

[Figure: histogram of the defenders' heights, titled 'Defenders', x-axis 'Heights']
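
To see the numbers behind a histogram, the counts and bin edges can be computed with `np.histogram`. This is a sketch on made-up heights (the `demo_heights` array and the seed are assumptions); `bins=10` is chosen to match the default number of bins in `plt.hist`.

```python
import numpy as np

# Made-up heights (illustration only); a fixed seed makes the draw repeatable
rng = np.random.default_rng(0)
demo_heights = rng.normal(loc=180.0, scale=5.0, size=30)

# counts has one entry per bin; bin_edges has one more entry than counts
counts, bin_edges = np.histogram(demo_heights, bins=10)

# Print a simple text histogram, one row per bin
for count, left, right in zip(counts, bin_edges[:-1], bin_edges[1:]):
    print(f"{left:6.1f} to {right:6.1f}: {'#' * int(count)}")
```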

To find the tallest (or shortest) player in each position, we can compute the max (or min) within each position.

In [13]:
p = ['GK', 'D', 'M', 'A']
p_max = [np_heights[np_positions == i].max() for i in p]
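
The same pattern extends to the minimum. As a small sketch on made-up data (the `demo_` arrays and the `summary` dictionary are illustrative names), both values can be collected per position:

```python
import numpy as np

# Made-up example arrays (illustration only)
demo_positions = np.array(['GK', 'D', 'A', 'GK', 'M', 'D'])
demo_heights = np.array([190.0, 182.0, 178.0, 186.0, 175.0, 184.5])

# Map each position to (shortest, tallest) height within that position
summary = {}
for pos in ['GK', 'D', 'M', 'A']:
    heights_at_pos = demo_heights[demo_positions == pos]
    summary[pos] = (heights_at_pos.min(), heights_at_pos.max())

for pos, (shortest, tallest) in summary.items():
    print(f"{pos}: min {shortest}, max {tallest}")
```

Calling `min` or `max` on an empty selection raises a `ValueError`, so this assumes every position occurs at least once in the data.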

In [14]:
plt.figure()
plt.plot(range(len(p)), p_max)
plt.gca().xaxis.set_ticks(range(len(p)))
plt.gca().xaxis.set_ticklabels(p)
plt.ylabel('Heights')
plt.show()

[Figure: line plot of the maximum height per position (GK, D, M, A), y-axis 'Heights']

For inspiration on data plotting and more examples, check out the matplotlib gallery: [https://matplotlib.org/gallery.html](https://matplotlib.org/gallery.html)