<h1>01 Numpy</h1>
$\newcommand{\Set}[1]{\{#1\}}$ 
$\newcommand{\Tuple}[1]{\langle#1\rangle}$ 
$\newcommand{\v}[1]{\pmb{#1}}$ 
$\newcommand{\cv}[1]{\begin{bmatrix}#1\end{bmatrix}}$ 
$\newcommand{\rv}[1]{[#1]}$ 
$\DeclareMathOperator{\argmax}{arg\,max}$ 
$\DeclareMathOperator{\argmin}{arg\,min}$ 
$\DeclareMathOperator{\dist}{dist}$
$\DeclareMathOperator{\abs}{abs}$

<h2>Preliminaries</h2>
<p>
    One of my first code cells always looks like this:
</p>

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

<p>
    The first two lines load IPython's <code>autoreload</code> extension and tell it to reload all modules
    before each cell is executed. So if I have my own module and I change it in an editor, then I can run
    the code without worrying about how to reload the changed module: it's done automatically.
</p>
<p>
    The third line says that when we draw graphs, they will appear in the notebook itself, not in a separate
    window.
</p>

<p>
    My next cell usually contains these three imports:
</p>

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

<p>
    My third code cell also contains <code>import</code> statements, ones that are specific to this notebook. 
    Here's an example:
</p>

In [3]:
from math import sqrt

<p>
    So now we can compute square roots:
</p>

In [4]:
sqrt(25)

5.0

<h2>Numpy</h2>
<p>
    Numpy is short for Numerical Python. It offers <code>ndarray</code>, which is a fast and space-efficient
    multidimensional array providing vectorized arithmetic operations, among other things. Pandas is built on
    top of numpy, and is somewhat more high-level; scikit-learn uses numpy ndarrays as its main data structure;
    matplotlib also works with numpy arrays.
</p>
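<p>
    To make "vectorized arithmetic" concrete, here is a minimal sketch. The variable names and the values
    are made up for illustration. It contrasts doubling the values in a plain Python list with doubling
    the values in an <code>ndarray</code>.
</p>

```python
import numpy as np

# A plain Python list needs an explicit loop (here, a comprehension)
prices = [1.5, 2.0, 3.25]          # illustrative values
doubled_list = [2 * p for p in prices]

# An ndarray applies the arithmetic to every element in one expression
prices_arr = np.array(prices)
doubled_arr = 2 * prices_arr

# ndarrays are multidimensional: this one has 2 rows and 3 columns
grid = np.arange(6).reshape(2, 3)
print(doubled_arr, grid.shape, grid.dtype)
```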

<h2>Exercises</h2>
<ol>
    <li>
        Let
        $$\v{u} = \cv{2\\-7\\1}\,\,\,
          \v{v} = \cv{-3\\0\\4}$$
        and
        $$\v{A} =  \begin{bmatrix}
                      1 &  2 & 0 \\
                      3 & -1 & 4
                  \end{bmatrix}\,\,\,
           \v{B} = \begin{bmatrix}
                       2 & -1 \\
                       1 &  0 \\
                      -3 & 4
                \end{bmatrix}$$
        Use numpy to compute:
        <ol>
            <li>$\v{u} + \v{v}$</li>
            <li>$-3\v{u}$</li>
            <li>$\v{u}\v{v}$ (Strictly, we should write $\v{u}^T\v{v}$. Why? But it is common to write it without the transpose.)</li>
            <li>$\v{u}\v{u}$</li>
            <li>$\sqrt{\v{u}\v{u}}$</li>
            <li>$\v{u} * \v{v}$</li>
            <li>$\v{A} + \v{A}$</li>
            <li>$\v{A} + \v{u}$</li>
            <li>$10\v{A}$</li>
            <li>$\v{A}\v{v}$</li>
            <li>$\v{A}\v{B}$</li>
            <li>$\v{A}^T$</li>
            <li>$\v{A}\v{A}^T$</li>
            <li>$\v{A}^T\v{A}$</li>
            <li>the smallest element in $\v{u}$</li>
            <li>the index of the smallest element in $\v{u}$</li>
            <li>the mean of the values in $\v{u}$</li>
        </ol>
    </li>
    <li>Play with the <code>cumsum</code> method on 1-dimensional numpy arrays. Then define a Python function
        that does the same thing for regular Python lists. Then compare how long they take to run on an 
        array/list that contains all the integers from 1 to 1000 inclusive.
    </li>
</ol>

In [5]:
u = np.array([2, -7, 1])
v = np.array([-3, 0, 4])

u + v

array([-1, -7,  5])
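<p>
    Several of the other items in the first exercise follow the same pattern. The cell below is one possible
    sketch, not the only way to do it: it assumes the <code>@</code> operator for inner and matrix products
    (<code>np.dot</code> would also work), and the result names are my own choice.
</p>

```python
import numpy as np

u = np.array([2, -7, 1])
v = np.array([-3, 0, 4])
A = np.array([[1, 2, 0],
              [3, -1, 4]])
B = np.array([[2, -1],
              [1, 0],
              [-3, 4]])

dot_uv = u @ v               # inner product of u and v
norm_u = np.sqrt(u @ u)      # Euclidean length of u
elementwise = u * v          # element-by-element product, not the inner product
A_plus_u = A + u             # broadcasting: u is added to each row of A
Av = A @ v                   # matrix-vector product
AB = A @ B                   # (2 x 3) times (3 x 2) gives a 2 x 2 matrix
AAT = A @ A.T                # A times its transpose
smallest = u.min()           # smallest element in u
smallest_index = u.argmin()  # index of the smallest element in u
mean_u = u.mean()            # mean of the values in u
```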

In [6]:
def cumsum(L):
    """Return the running totals of the numbers in the list L."""
    total = 0
    totals = []
    for x in L:
        total += x             # running sum of everything seen so far
        totals.append(total)
    return totals

In [7]:
x = [1, 2, 3, 4]

cumsum(x)

[1, 3, 6, 10]

In [8]:
%timeit cumsum(x)

430 ns ± 24.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [9]:
x = np.arange(1, 101)

%timeit np.cumsum(x)

2.63 µs ± 100 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
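<p>
    Note that the two timings above are not like-for-like: the first is for a 4-element list and the second
    is for a 100-element array, whereas the exercise asks for the integers from 1 to 1000 inclusive. A sketch
    of a fairer comparison follows. It uses the standard-library <code>timeit</code> module rather than the
    <code>%timeit</code> magic so that it also runs outside a notebook; the name <code>cumsum_list</code> and
    the repetition count are my own assumptions, and the timings will vary from machine to machine.
</p>

```python
import timeit

import numpy as np


def cumsum_list(values):
    """Return the running totals of a plain Python list."""
    total = 0
    totals = []
    for value in values:
        total += value
        totals.append(total)
    return totals


# The same data in both representations: the integers 1 to 1000 inclusive
numbers_list = list(range(1, 1001))
numbers_arr = np.arange(1, 1001)

# Check that the two approaches agree before timing them
assert cumsum_list(numbers_list) == np.cumsum(numbers_arr).tolist()

repeats = 1000  # arbitrary choice; increase for steadier measurements
list_seconds = timeit.timeit(lambda: cumsum_list(numbers_list), number=repeats)
array_seconds = timeit.timeit(lambda: np.cumsum(numbers_arr), number=repeats)

print(f"list:  {list_seconds / repeats * 1e6:.2f} microseconds per call")
print(f"array: {array_seconds / repeats * 1e6:.2f} microseconds per call")
```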
