## Jupyter notebook

Jupyter notebooks offer many interactive features that aren't available in the standard Python interpreter. For example, appending `?` to a name displays its signature and docstring:

In [166]:
print?

Signature: print(*args, sep=' ', end='\n', file=None, flush=False)
Docstring:
Prints the values to a stream, or to sys.stdout by default.

sep
  string inserted between values, default a space.
end
  string appended after the last value, default a newline.
file
  a file-like object (stream); defaults to the current sys.stdout.
flush
  whether to forcibly flush the stream.
Type:      builtin_function_or_method

When exploring new concepts and testing things out, notebooks are very useful as an advanced journal and testing tool. A notebook is an interactive environment that can integrate several programming and scripting languages (just have a look at the options in the lower right corner of this cell).

In [167]:
# DON'T DO THIS, put it in a text-cell
# comments in code should only be about the code!
 
# Adding two integers in python is easy:
3 + 5

8

As indicated, the comments in the above cell don't belong in a code cell. It's important to keep in mind that a Jupyter notebook really is a notebook or lab journal, _not_ a program. You can write code and execute it, but the notebook itself is not a suitable form for delivering software to a user. It's good for presentations, reports and experiments, but not for writing standalone computer programs.

### Strings and lists

In [168]:
"hello" + " " + "world"

'hello world'

In [169]:
x = [3,4,5]
x

[3, 4, 5]

In [170]:
y = [6,7,8]
x + y  # list concatenation (returns a new list, x and y are unchanged)

[3, 4, 5, 6, 7, 8]

The code cells work as a REPL (read-eval-print loop), essentially a Python terminal. If you type `python` in a terminal without a file argument, you end up in a more primitive version of what a Jupyter cell offers.

### Matrices

In [171]:
import numpy as np

In [172]:
x = np.array([3,4,5])
y = np.array([6,7,8])
x + y  # element-wise addition of two 1-d arrays

array([ 9, 11, 13])

In [173]:
A = np.array([[1,1,1],
             [2,1,-1],
             [3,2,1]])
y = np.array([4,1,5]) # numpy considers vectors as 1-d arrays

In [174]:
A.shape, A.ndim

((3, 3), 2)

Since $\mathbf{y} = \mathbf{Ax}$ we can find $\mathbf{x}$ through:
$$
\mathbf{A}^{-1}\mathbf{y} = \mathbf{x}
$$

In [175]:
x = np.linalg.inv(A) @ y  # @ is matrix multiplication; A * y would be element-wise multiplication with broadcasting
x

array([-3.00000000e+00,  7.00000000e+00,  1.22124533e-15])
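The difference between `*` and `@` mentioned in the comment above can be seen in a small, self-contained sketch. It reuses the same `A` and `y`; the names `elementwise` and `matvec` are only illustrative.

```python
import numpy as np

A = np.array([[1, 1, 1],
              [2, 1, -1],
              [3, 2, 1]])
y = np.array([4, 1, 5])

# y is broadcast across the rows of A: every row is multiplied element-wise by y
elementwise = A * y   # shape (3, 3)

# matrix-vector product: each entry is the dot product of a row of A with y
matvec = A @ y        # shape (3,)

print(elementwise)
print(matvec)
```

The two results do not even have the same shape, so mixing them up usually shows itself quickly.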

In [176]:
A @ x

array([4., 1., 5.])
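Note that the third component of `x` above is a tiny non-zero number rather than 0. A minimal sketch of how such results are usually checked, assuming the default tolerances of `np.allclose` are acceptable for the problem at hand:

```python
import numpy as np

A = np.array([[1, 1, 1],
              [2, 1, -1],
              [3, 2, 1]])
y = np.array([4, 1, 5])

x = np.linalg.inv(A) @ y

# Compare with a tolerance instead of ==, since floating-point
# round-off may leave residues on the order of 1e-15.
reproduces_y = np.allclose(A @ x, y)
matches_exact = np.allclose(x, [-3, 7, 0])  # exact solution worked out by hand

print(reproduces_y, matches_exact)
```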

However, this only works when an inverse exists. If the system has a parametric solution (infinitely many solutions), the rank of the matrix is less than its dimension and some non-zero input vectors $\mathbf{x}_i$ are mapped to $\mathbf{0}$. The linear map is then not bijective, which is equivalent to the matrix not having an inverse.

In [177]:
B = np.array([[1,1,1],
             [2,1,-1],
             [3,2,0]])
np.linalg.det(B)

np.float64(-3.330669073875464e-16)

Indeed, the numerically computed determinant is very close to zero for a linearly dependent map (the third row of $\mathbf{B}$ is the sum of the first two). An artefact of numerical methods is immediately apparent here: the result is not _exactly_ zero, although solving the problem algebraically by hand really does give 0.
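Because the determinant is blurred by round-off, rank and condition number are often more telling. The following is a sketch rather than a recipe: `n` is a null vector worked out by hand for this particular matrix, and the variable names are illustrative.

```python
import numpy as np

B = np.array([[1, 1, 1],
              [2, 1, -1],
              [3, 2, 0]])   # third row = first row + second row

# matrix_rank is SVD-based and treats very small singular values as zero
rank = np.linalg.matrix_rank(B)

# ratio of the largest to the smallest singular value; huge for (near-)singular matrices
cond = np.linalg.cond(B)

# a non-zero vector that B maps to the zero vector
n = np.array([2, -3, 1])

print(rank)
print(cond)
print(B @ n)   # integer arithmetic, so this is exactly [0 0 0]
```

A rank below 3 and a very large condition number both signal that inverting `B` is not meaningful.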

In [178]:
x = np.linalg.inv(B) @ y
B @ x

array([4., 0., 4.])

Something has gone quite wrong! `B @ x` should reproduce `y`, which is `[4, 1, 5]`, but it doesn't. The numerical method just fails to produce a meaningful result if the matrix isn't invertible.

However, non-invertible matrices still have a pseudo-inverse that can be computed through <code>pinv</code>. Note, however, that <code>pinv</code> is computationally expensive on large square matrices compared with a regular inverse, because it is based on a singular value decomposition. This isn't much of an issue for ML problems that rely on matrix inversion, since the matrices there are square only by coincidence and a regular inverse usually isn't an alternative. Thus <code>pinv</code> is generally preferred in machine learning. It's _not_ a true inverse, however: when no exact solution exists, it yields a least-squares _approximation_.
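As a minimal sketch of the pseudo-inverse on the singular matrix from above: the cutoff `rcond=1e-10` is an assumption made here so that the round-off-sized singular value is reliably treated as zero; it is not something the text prescribes.

```python
import numpy as np

B = np.array([[1, 1, 1],
              [2, 1, -1],
              [3, 2, 0]])
y = np.array([4, 1, 5])

# Assumed cutoff: singular values below 1e-10 times the largest one count as zero
B_pinv = np.linalg.pinv(B, rcond=1e-10)
x = B_pinv @ y

print(x)
print(B @ x)
```

For this particular `y` the system happens to be consistent (5 = 4 + 1, matching the row dependency), so `B @ x` reproduces `y`, and `x` is the solution with the smallest norm among the infinitely many that exist.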

A simpler way to handle this is to use ready-made solvers:

In [179]:
x = np.linalg.solve(a=A, b=y) # solves A @ x = y directly, without forming the inverse
x, A @ x

(array([-3.0000000e+00,  7.0000000e+00,  4.4408921e-16]), array([4., 1., 5.]))

In [180]:
x = np.linalg.solve(a=B, b=y) # not a pseudo-inverse; B is singular, so this is just one of infinitely many solutions
x, B @ x

(array([-0.33333333,  3.        ,  1.33333333]), array([4., 1., 5.]))
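`np.linalg.solve` raises `LinAlgError` when it detects an exactly singular matrix. One possible way to handle that, sketched below, is to fall back to `np.linalg.lstsq`. The helper `solve_or_lstsq` and the matrix `S` are hypothetical names introduced only for this illustration.

```python
import numpy as np

def solve_or_lstsq(M, b):
    """Hypothetical helper: try an exact solve, fall back to least squares."""
    try:
        return np.linalg.solve(M, b)
    except np.linalg.LinAlgError:
        # lstsq returns (solution, residuals, rank, singular_values)
        solution, _, _, _ = np.linalg.lstsq(M, b, rcond=None)
        return solution

S = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # second row is exactly twice the first
b = np.array([1.0, 2.0])     # consistent right-hand side

x = solve_or_lstsq(S, b)
print(x, S @ x)
```

This only catches matrices that are detected as singular. A matrix like `B` above, which round-off makes only nearly singular, passes through `solve` without an error, so checking the rank or condition number beforehand remains worthwhile.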

### Random sampling from statistical distributions

In [181]:
rng = np.random.default_rng()

X = rng.integers(100, size=100)
X

array([51, 85, 68, 21, 97, 22, 70, 61, 94, 66, 25, 25, 70, 76, 83, 66, 14,
       93, 25,  0, 77, 14, 93, 59, 46, 39, 86, 13, 99, 68, 93, 98, 83, 93,
       11,  9, 67, 49, 13, 14, 69, 26, 27, 99, 79,  8, 66, 70, 82, 93, 30,
        9, 16, 64, 81, 22, 75, 53, 72, 71, 63, 97, 66, 82, 66, 74, 17, 83,
       37,  2, 95, 36, 35, 68, 93, 20, 39, 24, 18, 22, 17, 75, 95, 97, 29,
       94, 33, 42, 39, 30, 98, 58, 90, 50, 38,  3, 58, 49, 53, 68])

In [182]:
X = rng.uniform(size=100)
X

array([0.99652588, 0.60103977, 0.14713938, 0.4594774 , 0.37480873,
       0.14944655, 0.37428749, 0.99713809, 0.71312121, 0.33791318,
       0.02875366, 0.65372925, 0.58508316, 0.71835315, 0.09666672,
       0.50322629, 0.6165656 , 0.23506076, 0.24211497, 0.11556341,
       0.73417242, 0.96318042, 0.02945494, 0.20043512, 0.04876992,
       0.49955402, 0.22685914, 0.26424261, 0.78594302, 0.01975866,
       0.41732909, 0.85574741, 0.57115352, 0.50329984, 0.18019182,
       0.68589234, 0.49049154, 0.42882069, 0.36952678, 0.36791175,
       0.82694849, 0.1897309 , 0.75491219, 0.45414968, 0.07351142,
       0.61061694, 0.19111387, 0.56288129, 0.01016886, 0.3242895 ,
       0.35683095, 0.1226183 , 0.36275466, 0.03820289, 0.59669808,
       0.56187589, 0.31718655, 0.82929326, 0.13876583, 0.37195664,
       0.32002794, 0.4384538 , 0.47184605, 0.93494973, 0.19552937,
       0.05959341, 0.08399579, 0.28216898, 0.23683722, 0.48750468,
       0.45660959, 0.70397453, 0.68175684, 0.29653372, 0.65463

In [183]:
X = rng.normal(size=100)
X

array([ 1.31266109,  0.19763438,  0.7086566 , -0.66062303, -1.21021344,
        1.61053679,  0.53724297,  0.94995867, -0.17550442,  0.7880266 ,
       -0.35825443, -0.06868533, -1.06232089,  2.89482695, -1.44445702,
       -1.77429904, -0.22331102, -0.0241652 ,  0.91815927, -1.77841675,
       -0.91309479,  1.28254776,  1.35206394, -0.03569176,  0.06813841,
       -0.04513073, -0.80315007,  0.60222302, -1.10816417,  0.367592  ,
       -0.33611548,  0.9081928 ,  0.67428641, -0.95433493, -1.0924626 ,
        0.029634  , -0.43691226,  1.30916955, -1.25639942, -0.252134  ,
        0.31114324, -0.54247758, -0.97664864,  0.99261664, -1.09672163,
       -0.3262343 , -1.89122307,  0.74290577, -0.08979133, -0.56053164,
        0.53321805,  0.86407362, -0.69363941,  0.97725607,  0.78596324,
       -1.18986068,  1.44534012, -1.46191995,  0.16978079, -0.4497356 ,
        1.42049296,  0.04280718,  0.16098962, -0.07027192,  0.6938163 ,
        1.49576773, -0.81465714, -0.90295955, -0.085697  , -1.46

In [184]:
X = rng.binomial(n=10,p=.3,size=100)
X

array([2, 5, 2, 3, 5, 5, 3, 1, 2, 3, 5, 2, 3, 2, 3, 2, 6, 2, 3, 3, 3, 2,
       7, 1, 2, 2, 6, 3, 1, 6, 3, 3, 7, 2, 4, 3, 1, 2, 4, 1, 0, 5, 2, 3,
       4, 1, 5, 4, 3, 3, 2, 5, 5, 3, 1, 1, 2, 3, 2, 5, 2, 3, 3, 4, 0, 2,
       3, 2, 4, 3, 2, 2, 3, 3, 4, 3, 4, 0, 2, 1, 1, 2, 0, 2, 5, 3, 1, 4,
       1, 3, 5, 4, 3, 5, 4, 5, 2, 4, 1, 2])

In [185]:
X = rng.negative_binomial(n=10,p=.3,size=100)
X

array([14, 31, 27, 20, 30, 19, 20, 24, 15, 22, 13, 30, 26, 38, 30, 23, 31,
       24, 47,  8, 24, 18, 12, 11, 21, 26, 17, 42, 25, 21, 20, 27, 18, 18,
       21, 13, 38, 27,  7, 42, 21, 14, 31, 10, 29, 15, 34, 31, 26, 20,  8,
       30, 56, 23, 36, 18, 32, 13, 38, 25, 27, 17, 19, 28, 21, 20, 12, 13,
       29, 40, 16, 22, 29, 32, 22, 35, 20, 28, 22, 26, 34, 10, 26, 19, 39,
       23, 35, 32, 33, 24, 35, 25, 34, 14, 15, 17, 17, 19, 30, 16])

In [186]:
X = rng.gamma(shape=1.0, scale=0.9, size=100)
X

array([0.0315426 , 2.7895587 , 1.33531818, 2.21378776, 0.31499183,
       0.38566061, 0.61716726, 0.09752232, 0.00884116, 1.28538191,
       0.32081103, 0.14408186, 0.17386393, 1.6077845 , 0.48833052,
       0.35269651, 0.19632922, 0.12566154, 1.57556836, 0.64680611,
       0.03288419, 2.50426009, 1.09464562, 2.68524891, 0.45545405,
       0.33919295, 0.41678255, 0.65511638, 1.05699719, 1.67925607,
       0.05306566, 1.16196043, 0.80230202, 0.04152956, 0.72482858,
       0.68323154, 1.79479185, 0.04876562, 0.77971187, 0.33982203,
       0.17136181, 3.18594714, 0.8076852 , 0.60065187, 1.17348292,
       1.46047739, 1.82332834, 0.33722983, 1.03356364, 0.48604225,
       0.09210708, 0.00668711, 1.276451  , 0.46661489, 0.24932242,
       0.95097861, 0.46129816, 0.69990118, 6.08633179, 0.33277643,
       2.75893591, 0.15582753, 4.14037089, 0.37848304, 1.72831741,
       1.10918046, 0.05653248, 0.27358047, 0.07094194, 2.82400752,
       0.11823286, 1.45450773, 1.6722889 , 0.4937298 , 0.51554

In [188]:
X = rng.geometric(p=0.31, size=100)
X

array([ 5,  6,  1,  2,  1,  1,  6,  3,  3,  1, 11, 12,  1,  1,  2,  3,  4,
        1,  1,  1,  2,  7,  3,  2,  5,  1,  1,  4,  6,  2, 17,  2,  4,  3,
        7,  7,  3,  9,  7, 13,  5,  4,  9,  1,  8,  1,  5,  9,  2,  2,  1,
        2,  4,  3,  3, 11,  2, 16,  2,  1,  2,  1, 17,  1,  1, 14,  3,  1,
        2,  1,  1,  8,  3,  4, 11,  1,  8,  1,  4,  1,  4,  7,  2,  1,  6,
        4,  4,  4,  2,  6,  1,  6,  1,  4,  1,  2,  1,  2,  3,  3])
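A quick sanity check on such samples is to compare their means with the theoretical means of the distributions. The sketch below assumes a fixed seed (chosen arbitrarily, for reproducibility) and a larger sample size than the cells above; the dictionary `samples` is an illustrative name.

```python
import numpy as np

rng = np.random.default_rng(seed=12345)  # arbitrary seed so that reruns give the same draws
size = 100_000

# name -> (draws, theoretical mean)
samples = {
    "binomial": (rng.binomial(n=10, p=0.3, size=size), 10 * 0.3),
    # NumPy counts failures before the n-th success: mean = n * (1 - p) / p
    "negative_binomial": (rng.negative_binomial(n=10, p=0.3, size=size), 10 * (1 - 0.3) / 0.3),
    "gamma": (rng.gamma(shape=1.0, scale=0.9, size=size), 1.0 * 0.9),
    # NumPy's geometric distribution starts at 1: mean = 1 / p
    "geometric": (rng.geometric(p=0.31, size=size), 1 / 0.31),
}

for name, (draws, expected_mean) in samples.items():
    print(f"{name:>18}: sample mean {draws.mean():.3f}, theoretical mean {expected_mean:.3f}")
```

With only 100 draws, as in the cells above, the sample means fluctuate noticeably more around the theoretical values.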

### Exercise 1 (*)

a) Use matplotlib and/or seaborn to produce _histograms_ of the above distributions. Place them in subplots with reasonable sizes and formatting.

b) Use matplotlib and/or seaborn to produce _ogives_ (cumulative frequency plots) of the above distributions. Place them in subplots with reasonable sizes and formatting.

### Exercise 2 (*/**)
Write a quiz application (i.e. not a notebook) that displays a histogram or ogive of a distribution with random parameters and asks the user to identify the distribution. Keep track of the score. Matplotlib isn't tied to Jupyter and works with a GUI as well. Keep it simple or gamify this important exercise all the way with `pygame` or equivalent.

This will be your own quiz application for the exam: one exam question will give you a histogram and an ogive and require you to identify the distribution.