# Introduction

This introduction to our `sklearn` tutorial presents the basic tools needed in the more machine-learning-oriented sections that follow.

## NumPy

### Basics of `ndarray` type

First, we will present the `numpy` library and its `ndarray` object.

Let us start by importing this library. As it is used very often in our code, it is customary to alias it as `np` when importing:

In [1]:
import numpy as np

Then, we can create a first array and manipulate it:

In [2]:
arr = np.array([[0, 1], [2, 3], [4, 5]])
print(arr)

[[0 1]
 [2 3]
 [4 5]]


In [3]:
print(2.5 * arr)

[[  0.    2.5]
 [  5.    7.5]
 [ 10.   12.5]]


In [4]:
print(arr + arr)

[[ 0  2]
 [ 4  6]
 [ 8 10]]


In [5]:
print(arr.dtype)  # Data type

int64


In [6]:
print(arr.shape)

(3, 2)


In [7]:
print(arr.ndim)

2


In this tutorial, we will always consider vectors (`ndim = 1`) or matrices (`ndim = 2`), but `numpy` can be used to manipulate arrays with any number of dimensions.
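
As a quick illustration of the remark above, here is a minimal sketch comparing a vector, a matrix and a 3-dimensional array. The variable names (`vec`, `mat`, `cube`) are chosen for this example only:

```python
import numpy as np

# A 1-D array (vector), a 2-D array (matrix) and a 3-D array
vec = np.array([0, 1, 2])
mat = np.array([[0, 1], [2, 3], [4, 5]])
cube = np.zeros((2, 3, 4))  # 2 blocks, each with 3 rows and 4 columns

for name, a in [("vec", vec), ("mat", mat), ("cube", cube)]:
    print(name, "ndim =", a.ndim, "shape =", a.shape)
```

The length of `shape` is always equal to `ndim`.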

### Element-wise operations _vs_ matrix products

One important thing to understand with `numpy` is that the usual product `*` between two arrays is the element-wise product, not the matrix/vector product:

In [8]:
A = np.array([[0, 1], [2, 3]])
I = np.array([[1, 0], [0, 1]])
print(A)
print(I)

[[0 1]
 [2 3]]
[[1 0]
 [0 1]]


In [9]:
print(A * I)
print(np.dot(A, I))  # np.dot is the matrix product

[[0 0]
 [0 3]]
[[0 1]
 [2 3]]


Similarly, in `numpy`, `A ** 2` is the element-wise square of matrix `A`:

In [10]:
print(A ** 2)
print(np.dot(A, A))

[[0 1]
 [4 9]]
[[ 2  3]
 [ 6 11]]
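
The matrix product can also be written with the `@` operator, which requires Python 3.5 or later. For 2-D arrays it gives the same result as `np.dot`. The sketch below redefines `A` and `I` so that it runs on its own, and uses `np.linalg.matrix_power` as one possible way to get a true matrix power:

```python
import numpy as np

A = np.array([[0, 1], [2, 3]])
I = np.array([[1, 0], [0, 1]])

# For 2-D arrays, A @ I is the matrix product, like np.dot(A, I)
print(A @ I)

# Matrix power (A @ A), as opposed to the element-wise square A ** 2
A_squared = np.linalg.matrix_power(A, 2)
print(A_squared)
```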


Transposing an array is straightforward:

In [11]:
print(arr.T)

[[0 2 4]
 [1 3 5]]


### Building usual arrays

`numpy` also offers routines to build typical matrices / vectors:

In [12]:
print(np.zeros((2, 3)))

[[ 0.  0.  0.]
 [ 0.  0.  0.]]


In [13]:
print(np.ones((2, 3)))

[[ 1.  1.  1.]
 [ 1.  1.  1.]]


In [14]:
print(np.eye(3))

[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]]


In [15]:
print(np.arange(10))

[0 1 2 3 4 5 6 7 8 9]


In [16]:
print(np.linspace(0, 1, 11))

[ 0.   0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1. ]


In [17]:
np.random.seed(0)             # Initialize the seed of the random generator to get reproducible results
print(np.random.randn(2, 5))  # randn returns samples drawn from N(0,1)
print(np.random.rand(2, 5))   # rand returns samples drawn uniformly in [0, 1)

[[ 1.76405235  0.40015721  0.97873798  2.2408932   1.86755799]
 [-0.97727788  0.95008842 -0.15135721 -0.10321885  0.4105985 ]]
[[ 0.79172504  0.52889492  0.56804456  0.92559664  0.07103606]
 [ 0.0871293   0.0202184   0.83261985  0.77815675  0.87001215]]
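
Recent versions of `numpy` also provide a generator-based interface, `np.random.default_rng`, as an alternative to the global seed. The following is a minimal sketch, assuming a `numpy` version that includes it (1.17 or later). The sampled values differ from those printed above because a different underlying generator is used:

```python
import numpy as np

rng = np.random.default_rng(0)                # Generator seeded for reproducibility
normal_samples = rng.standard_normal((2, 5))  # samples drawn from N(0,1)
uniform_samples = rng.random((2, 5))          # samples drawn uniformly in [0, 1)

print(normal_samples)
print(uniform_samples)
```

Creating a second generator with the same seed reproduces the same sequence of samples.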


### Array slicing and boolean indexing

As with lists, `numpy` arrays can be sliced:

In [18]:
M = np.array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14], [15, 16, 17, 18, 19]])
print(M)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]


In [19]:
print(M[:2, 3:])  # Row indices up to 2 (excluded), column indices starting from 3 (included)

[[3 4]
 [8 9]]


In [20]:
print(M[1:3, :])  # Row indices 1 (included) to 3 (excluded), All columns

[[ 5  6  7  8  9]
 [10 11 12 13 14]]
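
Negative indices and steps work as they do for lists. The sketch below rebuilds the same matrix `M` with `np.arange` and `reshape` so that it runs on its own. The names `last_row`, `every_other_col` and `single` are introduced for this example only:

```python
import numpy as np

M = np.arange(20).reshape((4, 5))  # same content as the matrix M above

last_row = M[-1, :]          # negative index: last row
every_other_col = M[:, ::2]  # step of 2: columns 0, 2 and 4
single = M[2, 3]             # a single element (row 2, column 3)

print(last_row)         # [15 16 17 18 19]
print(every_other_col)
print(single)           # 13
```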


Another way to get a subset of a matrix is to use boolean indexing.

Let us assume, for example, that we want to keep only the strictly positive components of a vector `v`:

In [21]:
v = np.array([10, 5, -1, 4, 0, 3])
print(v)
v2 = v[v > 0]  # Keep only strictly positive components from v
print(v2)

[10  5 -1  4  0  3]
[10  5  4  3]
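
To see what happens under the hood, it can help to look at the boolean mask itself. The sketch below also combines two conditions with `&`. Each condition needs its own parentheses because `&` binds more tightly than comparison operators. The names `mask` and `between` are illustrative:

```python
import numpy as np

v = np.array([10, 5, -1, 4, 0, 3])

mask = v > 0  # boolean array with the same shape as v
print(mask)   # [ True  True False  True False  True]

# Keep components that are strictly positive AND strictly lower than 10
between = v[(v > 0) & (v < 10)]
print(between)  # [5 4 3]
```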


### Operations on arrays

`numpy` offers facilities to compute basic statistics from arrays (sum of their elements, minimum/maximum values, mean, standard deviation, ...). Some of them are presented below:

In [22]:
print(M)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]


In [23]:
print(np.max(M))  # Could also be written M.max()

19


In [24]:
print(np.min(M))  # Could also be written M.min()

0


In [25]:
print(np.mean(M))  # Could also be written M.mean()

9.5


In [26]:
print(np.std(M))  # Could also be written M.std()

5.76628129734


In [27]:
print(np.linalg.norm(M))  # Frobenius norm by default for a matrix (L2-norm for a vector)

49.6990945592


In [28]:
print(np.sum(M))  # Could also be written M.sum()

190


`np.sum` can also be used on boolean vectors, in which case it returns the number of `True` entries in the array:

In [29]:
v = np.array([10, 5, -1, 4, 0, 3])
print(np.sum(v > 0))

4
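
All the statistics above are computed over the whole array. Most of these functions also accept an `axis` argument to compute one value per column (`axis=0`) or per row (`axis=1`). Here is a minimal sketch, with `M` rebuilt so that the snippet runs on its own:

```python
import numpy as np

M = np.arange(20).reshape((4, 5))  # same content as the matrix M above

col_sums = M.sum(axis=0)    # one sum per column
row_sums = M.sum(axis=1)    # one sum per row
col_means = M.mean(axis=0)  # one mean per column

print(col_sums)   # [30 34 38 42 46]
print(row_sums)   # [10 35 60 85]
print(col_means)  # [ 7.5  8.5  9.5 10.5 11.5]
```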


### Element-wise operations

`numpy` also offers many vectorized versions of standard mathematical operations. These functions are applied element-wise:

In [30]:
print(M)
print(np.sqrt(M))

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]
[[ 0.          1.          1.41421356  1.73205081  2.        ]
 [ 2.23606798  2.44948974  2.64575131  2.82842712  3.        ]
 [ 3.16227766  3.31662479  3.46410162  3.60555128  3.74165739]
 [ 3.87298335  4.          4.12310563  4.24264069  4.35889894]]


In [31]:
print(np.exp(v))

[  2.20264658e+04   1.48413159e+02   3.67879441e-01   5.45981500e+01
   1.00000000e+00   2.00855369e+01]


In [32]:
print(np.abs(v))

[10  5  1  4  0  3]
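
"Vectorized" means that the function is applied to every element without an explicit Python loop. As a sanity check, the sketch below compares `np.exp` with an equivalent loop based on the standard `math` module. The names `vectorized` and `looped` are introduced for this example only:

```python
import math
import numpy as np

v = np.array([10, 5, -1, 4, 0, 3])

vectorized = np.exp(v)                       # one call for the whole array
looped = np.array([math.exp(x) for x in v])  # explicit element-by-element loop

print(np.allclose(vectorized, looped))  # True
```

Both versions give the same values, but the vectorized call is usually shorter to write and faster on large arrays.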


### Concatenating and reshaping arrays

We can change the shape of an array, as long as the number of elements is unchanged:

In [33]:
print(v.reshape((3, 2)))

[[10  5]
 [-1  4]
 [ 0  3]]


Note that this does not change the shape of `v` but rather returns a new array object with the required shape (a view on the same data whenever possible).
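
The following sketch checks this behaviour. It also shows that one dimension can be set to `-1`, so that `numpy` infers it from the number of elements, and that an incompatible shape raises a `ValueError`. The name `w` is illustrative:

```python
import numpy as np

v = np.array([10, 5, -1, 4, 0, 3])

w = v.reshape((3, 2))
print(v.shape)  # (6,)   v itself is left untouched
print(w.shape)  # (3, 2)

# -1 lets numpy infer the missing dimension (6 elements / 2 rows = 3 columns)
print(v.reshape((2, -1)).shape)  # (2, 3)

# 6 elements cannot fill a 4 x 2 array
try:
    v.reshape((4, 2))
except ValueError as err:
    print("reshape failed:", err)
```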

One can also concatenate several arrays to create new ones. There are two modes of concatenation:
* horizontal concatenation stacks columns;
* vertical concatenation stacks rows.

Of course, these operations require that corresponding dimensions match.

In [34]:
print(np.hstack((np.zeros((2, 3)), np.ones((2, 5)))))

[[ 0.  0.  0.  1.  1.  1.  1.  1.]
 [ 0.  0.  0.  1.  1.  1.  1.  1.]]


In [35]:
print(np.vstack((np.zeros((2, 5)), np.ones((3, 5)))))

[[ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.]]
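
For 2-D arrays, both operations can also be written with `np.concatenate` and an explicit `axis` argument. The sketch below does so and shows what happens when the dimensions do not match. The names `zeros`, `ones`, `h` and `v_stack` are chosen for this example only:

```python
import numpy as np

zeros = np.zeros((2, 3))
ones = np.ones((2, 5))

# axis=1 stacks columns (same result as np.hstack for 2-D arrays)
h = np.concatenate((zeros, ones), axis=1)
print(h.shape)  # (2, 8)

# axis=0 stacks rows (same result as np.vstack for 2-D arrays)
v_stack = np.concatenate((np.zeros((2, 5)), np.ones((3, 5))), axis=0)
print(v_stack.shape)  # (5, 5)

# Stacking rows requires the same number of columns: 3 vs 5 fails
try:
    np.vstack((zeros, ones))
except ValueError as err:
    print("vstack failed:", err)
```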


## Plotting with `matplotlib`

**TODO**

## Common API for `sklearn` models

**TODO**
