# Interactive Python using Jupyter notebooks

- Notebooks are made of a sequence of cells
- Cells can contain different content, such as Python code or Markdown
- You can change the cell type in the toolbar
- To execute a cell, press "Shift+Return"
- Use the toolbar to add, delete, copy, or insert cells

(Note: to learn more about Markdown check [Daring Fireball's website](https://daringfireball.net/projects/markdown/syntax))

## Import the Python package for numerical arrays (numpy)

In [1]:
import numpy as np

## Define a function that creates some statistical data

In [2]:
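def load_data():
    """Create random positions and heights for N players."""
    # Goalkeeper, defender, midfielder, attacker
    possible_positions = ['GK', 'D', 'M', 'A']
    N = 100
    positions = []
    heights = []
    for i in range(N):
        # Pick one of the four positions uniformly at random
        positions.append(possible_positions[np.random.randint(len(possible_positions))])
        # Draw a height from a normal distribution (mean 180, standard deviation 5)
        heights.append(np.random.normal(loc=180.0, scale=5.0))
    return positions, heights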
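
The loop above draws one value at a time. As a minimal sketch, the same kind of data can also be drawn in a single call per array. The function name `load_data_vectorized` and its `n` and `seed` parameters are hypothetical and not part of the original notebook. Note that this variant returns numpy arrays rather than lists.

```python
import numpy as np

def load_data_vectorized(n=100, seed=None):
    """Hypothetical vectorized variant of load_data (not from the original notebook)."""
    rng = np.random.default_rng(seed)
    possible_positions = ['GK', 'D', 'M', 'A']
    # Draw n positions uniformly at random, with replacement
    positions = rng.choice(possible_positions, size=n)
    # Draw n heights from a normal distribution (mean 180, standard deviation 5)
    heights = rng.normal(loc=180.0, scale=5.0, size=n)
    return positions, heights

# Passing a fixed seed makes the random data reproducible between runs
positions_demo, heights_demo = load_data_vectorized(n=100, seed=0)
```

Without a seed, every call produces different data, which is also why the numbers printed in this notebook change each time the cells are re-run.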

## Read the data

In [7]:
positions, heights = load_data()

The objects ```positions``` and ```heights``` are lists as we can check using the Python function ```type```:

In [4]:
print(type(positions))
print(type(heights))

<class 'list'>
<class 'list'>


In [8]:
print(positions)

['GK', 'GK', 'GK', 'M', 'A', 'A', 'GK', 'A', 'D', 'GK', 'M', 'D', 'A', 'D', 'A', 'GK', 'GK', 'GK', 'M', 'GK', 'M', 'M', 'D', 'GK', 'D', 'A', 'GK', 'GK', 'A', 'M', 'D', 'M', 'M', 'GK', 'GK', 'M', 'GK', 'M', 'A', 'D', 'D', 'M', 'GK', 'A', 'D', 'GK', 'GK', 'M', 'M', 'GK', 'A', 'GK', 'GK', 'D', 'M', 'D', 'GK', 'D', 'A', 'M', 'A', 'A', 'M', 'GK', 'M', 'A', 'GK', 'M', 'GK', 'GK', 'GK', 'GK', 'GK', 'D', 'M', 'GK', 'M', 'M', 'A', 'D', 'D', 'M', 'A', 'D', 'GK', 'D', 'GK', 'GK', 'D', 'M', 'M', 'D', 'D', 'D', 'A', 'D', 'M', 'A', 'M', 'M']


Question: *How many items are inside the lists ```positions``` and ```heights```?*

Hint: Use the Python function ```len```. 

In [6]:
len(positions),len(heights)

(100, 100)

## Convert to numpy arrays

In [14]:
np_positions = np.array(positions)
np_heights = np.array(heights)

Question: *what is the data type of ```np_positions``` and ```np_heights```*?<br>
Question: *what is the shape of ```np_positions``` and ```np_heights```*?


Hint: Numpy arrays have attributes called ```dtype``` and ```shape```.

In [24]:
np_positions.dtype, np_heights.dtype

(dtype('<U2'), dtype('float64'))
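
The cell above answers the first question only. The small sketch below reads both attributes; the arrays `demo_positions` and `demo_heights` are made-up stand-ins for the notebook's data. For the full arrays of 100 players, the shape should be `(100,)` in both cases, matching the `len` result above.

```python
import numpy as np

# Made-up stand-ins for np_positions and np_heights
demo_positions = np.array(['GK', 'D', 'M', 'A'])
demo_heights = np.array([188.0, 181.5, 176.2, 179.9])

# dtype and shape are attributes, so they are read without parentheses
print(demo_positions.dtype)  # a Unicode string type holding up to 2 characters
print(demo_heights.dtype)    # float64
print(demo_positions.shape)  # (4,)
print(demo_heights.shape)    # (4,)
```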

## Extract the heights of the goalkeepers

In [25]:
gk_heights = np_heights[np_positions == 'GK']

In [28]:
a_heights = np_heights[np_positions == 'A']
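
The two cells above rely on boolean masking. A minimal sketch with made-up values (the `demo_` arrays are not from the notebook) shows the two steps separately:

```python
import numpy as np

demo_positions = np.array(['GK', 'D', 'GK', 'A'])
demo_heights = np.array([188.0, 181.5, 190.0, 179.9])

# Comparing an array with a scalar yields one boolean per element
mask = demo_positions == 'GK'
print(mask)                # [ True False  True False]

# Indexing with the mask keeps only the elements where it is True
print(demo_heights[mask])  # [188. 190.]
```

This only works because both arrays have the same length and the same player order.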

In [26]:
print(gk_heights)

[169.86575438 179.34654444 185.01830255 183.25810053 173.9430434
 182.56057268 187.56862831 187.04728687 177.34892379 167.63523966
 180.03378354 171.39278774 177.93653305 184.43948141 180.88236343
 187.33527808 182.32981454 176.98351459 182.60750891 181.90288272
 175.82364069 181.74254321 182.38090366 175.8493064  179.10350983
 175.56593294 175.17405407 172.80884945 179.19255233 181.20022633
 172.72373857 181.79103139 175.81263273]


## Print the median of the goalkeepers' heights

In [27]:
print("Median height of goalkeepers: " + str(np.median(gk_heights)))

Median height of goalkeepers: 179.3465444352619


In [29]:
print("Median height of attackers: " + str(np.median(a_heights)))

Median height of attackers: 178.074307367924


In [36]:
a_gk_heights = np.append(gk_heights,a_heights)
print(a_gk_heights)

[169.86575438 179.34654444 185.01830255 183.25810053 173.9430434
 182.56057268 187.56862831 187.04728687 177.34892379 167.63523966
 180.03378354 171.39278774 177.93653305 184.43948141 180.88236343
 187.33527808 182.32981454 176.98351459 182.60750891 181.90288272
 175.82364069 181.74254321 182.38090366 175.8493064  179.10350983
 175.56593294 175.17405407 172.80884945 179.19255233 181.20022633
 172.72373857 181.79103139 175.81263273 177.58936206 175.37605955
 179.73218341 182.37356949 188.6164053  177.94822595 176.25005025
 177.46314651 181.37350773 176.82858007 171.42063223 179.31486756
 171.50715621 178.20038879 178.34543737 187.03096133 189.32241703
 177.61233385]


In [37]:
print("Median height of attackers and goalkeepers: " + str(np.median(a_gk_heights)))

Median height of attackers and goalkeepers: 179.1035098251782


Question: *what is the median height of all the field players*?<br>
Question: *what is the median height of all the attackers*?<br>
Question: *what is the median height of goalkeepers and the attackers combined?*
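
One possible way to approach these questions without appending arrays is to build a single mask per group. The sketch below uses made-up values, and it assumes that "field players" means everyone except the goalkeepers. Selecting with a mask keeps the players in their original order, unlike `np.append`, but the median does not depend on the order.

```python
import numpy as np

demo_positions = np.array(['GK', 'D', 'M', 'A', 'GK', 'A'])
demo_heights = np.array([188.0, 181.0, 176.0, 180.0, 190.0, 178.0])

# Field players: everyone who is not a goalkeeper (assumed definition)
field_heights = demo_heights[demo_positions != 'GK']

# Goalkeepers and attackers combined, selected with a single mask
gk_a_heights = demo_heights[np.isin(demo_positions, ['GK', 'A'])]

print(np.median(field_heights))  # 179.0
print(np.median(gk_a_heights))   # 184.0
```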

## More statistical tests

Besides the ```median```, numpy also comes with the functions ```mean```, ```std```, ```min``` and ```max```, which are useful for investigating statistical data.

Question: *Who is the shortest player (which position)*?<br>
Question: *Who is the tallest player (which position)*?

In [39]:
np.std(heights),np.min(heights)

(5.374649913505452, 163.57430526730235)
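
`np.min` returns the smallest height but not which player it belongs to. A minimal sketch, using made-up values, of how `np.argmin` and `np.argmax` could be used to look up the matching position:

```python
import numpy as np

demo_positions = np.array(['GK', 'D', 'M', 'A'])
demo_heights = np.array([188.0, 181.0, 176.0, 180.0])

# argmin/argmax return the index of the extreme value, not the value itself
shortest = np.argmin(demo_heights)
tallest = np.argmax(demo_heights)

print(demo_positions[shortest], demo_heights[shortest])  # M 176.0
print(demo_positions[tallest], demo_heights[tallest])    # GK 188.0
print(np.mean(demo_heights))                             # 181.25
```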

## Data plotting

For plotting, we need the package matplotlib.

In [40]:
import matplotlib.pyplot as plt

There are different display modes for matplotlib plots inside a Jupyter notebook.

In [41]:
# For inline plots use
%matplotlib inline

In [42]:
# For inline plots with interactive capabilities use
%matplotlib notebook

Let's visualize the height distribution of the defenders:

In [43]:
d_heights = np_heights[np_positions == 'D']

In [44]:
plt.figure()
plt.hist(d_heights)
plt.title('Defenders')
plt.xlabel('Heights')
plt.show()

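
To see the numbers behind a histogram, the binning can be reproduced with `np.histogram`. The sketch below uses made-up heights and 4 bins for readability; `plt.hist` uses 10 bins unless told otherwise.

```python
import numpy as np

demo_heights = np.array([170.0, 172.0, 175.0, 178.0, 180.0, 180.0, 183.0, 190.0])

# Four equal-width bins between the smallest and the largest value
counts, edges = np.histogram(demo_heights, bins=4)

print(counts)  # [2 2 3 1]
print(edges)   # [170. 175. 180. 185. 190.]
```

Each bin includes its left edge and excludes its right edge, except for the last bin, which includes both.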

To figure out the tallest/shortest player, we can compute the max/min within each position:

In [45]:
p = ['GK', 'D', 'M', 'A']
p_max = [np_heights[np_positions == i].max() for i in p]

In [46]:
plt.figure()
plt.plot(range(len(p)), p_max)
plt.gca().xaxis.set_ticks(range(len(p)))
plt.gca().xaxis.set_ticklabels(p)
plt.ylabel('Heights')
plt.show()

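
Because the positions are categories rather than points on a continuous axis, a bar chart may be an easier read than a line plot. This is a sketch with illustrative values in `p_max_demo`, not results from the notebook. The `matplotlib.use("Agg")` line only makes the sketch runnable outside a notebook and can be dropped inside one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt

p = ['GK', 'D', 'M', 'A']
p_max_demo = [190.0, 187.5, 185.2, 189.3]  # illustrative values only

fig, ax = plt.subplots()
# Passing the position labels directly uses them as x-axis tick labels
bars = ax.bar(p, p_max_demo)
ax.set_ylabel('Heights')
ax.set_title('Tallest player per position')
```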

For inspiration on data plotting and more examples, check out the matplotlib gallery: [https://matplotlib.org/gallery.html](https://matplotlib.org/gallery.html)