
# Fundamentals of Deep Learning 

*Notebook 2.0: Scientific Computing in Python*

#### Notes are based on Python Data Analysis, O'Reilly and University of Toronto

## This notebook has two parts: Pandas and NumPy

# Part I: Pandas

At the very basic level, Pandas objects can be thought of as enhanced versions of NumPy structured arrays in which the rows and columns are identified with labels rather than simple integer indices.
As we will see during the course of this chapter, Pandas provides a host of useful tools, methods, and functionality on top of the basic data structures, but nearly everything that follows will require an understanding of what these structures are.
Thus, before we go any further, let's introduce these three fundamental Pandas data structures: the ``Series``, ``DataFrame``, and ``Index``.

We will start our code sessions with the standard NumPy and Pandas imports:

In [114]:
import numpy as np
import pandas as pd

## The Pandas Series Object

A Pandas ``Series`` is a one-dimensional array of indexed data.
It can be created from a list or array as follows:

In [115]:
data = pd.Series([0.25, 0.5, 0.75, 1.0])
data

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

As we see in the output, the ``Series`` wraps both a sequence of values and a sequence of indices, which we can access with the ``values`` and ``index`` attributes.
The ``values`` are simply a familiar NumPy array:

In [116]:
data.values

array([0.25, 0.5 , 0.75, 1.  ])

The ``index`` is an array-like object of type ``pd.Index``, which we'll discuss in more detail momentarily.

In [117]:
data.index

RangeIndex(start=0, stop=4, step=1)

As with a NumPy array, data can be accessed by the associated index via the familiar Python square-bracket notation:

In [118]:
data[1]

0.5

In [119]:
data[1:3]

1    0.50
2    0.75
dtype: float64

As we will see, though, the Pandas ``Series`` is much more general and flexible than the one-dimensional NumPy array that it emulates.

### ``Series`` as generalized NumPy array

From what we've seen so far, it may look like the ``Series`` object is basically interchangeable with a one-dimensional NumPy array.
The essential difference is the presence of the index: while the NumPy array has an *implicitly defined* integer index used to access the values, the Pandas ``Series`` has an *explicitly defined* index associated with the values.

This explicit index definition gives the ``Series`` object additional capabilities: the index need not be an integer, but can consist of values of any desired type.
For example, if we wish, we can use strings as an index:

In [120]:
data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=['a', 'b', 'c', 'd'])
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

And the item access works as expected:

In [121]:
data['b']

0.5

We can even use non-contiguous or non-sequential indices:

In [122]:
data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=[2, 5, 3, 7])
data

2    0.25
5    0.50
3    0.75
7    1.00
dtype: float64

In [123]:
data[5]

0.5

### Series as specialized dictionary

In this way, you can think of a Pandas ``Series`` a bit like a specialization of a Python dictionary.
A dictionary is a structure that maps arbitrary keys to a set of arbitrary values, and a ``Series`` is a structure which maps typed keys to a set of typed values.
This typing is important: just as the type-specific compiled code behind a NumPy array makes it more efficient than a Python list for certain operations, the type information of a Pandas ``Series`` makes it much more efficient than Python dictionaries for certain operations.
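
As a small illustration of why the typing matters, the sketch below doubles every value first with a plain dictionary and then with a ``Series``. The names ``prices_dict`` and ``prices`` and the data are made up for this example.

```python
import pandas as pd

# Hypothetical data, used only for illustration
prices_dict = {'apple': 0.5, 'banana': 0.25, 'cherry': 2.0}
prices = pd.Series(prices_dict)

# A dictionary needs an explicit Python-level loop
doubled_dict = {key: 2 * value for key, value in prices_dict.items()}

# A Series stores its values in one typed array, so the same
# operation is a single vectorized expression
doubled = 2 * prices

print(prices.dtype)       # float64
print(doubled['cherry'])  # 4.0
```

Both approaches give the same numbers; the ``Series`` version simply hands the loop to compiled NumPy code.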

The ``Series``-as-dictionary analogy can be made even more clear by constructing a ``Series`` object directly from a Python dictionary:

In [124]:
population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135}
population = pd.Series(population_dict)
population

California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
dtype: int64

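By default, a ``Series`` will be created where the index is drawn from the dictionary keys; recent pandas versions keep them in insertion order, as in the output above, while older versions sorted them.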
From here, typical dictionary-style item access can be performed:

In [125]:
population['California']

38332521

Unlike a dictionary, though, the ``Series`` also supports array-style operations such as slicing:

In [None]:
population['California':'Illinois']

California    38332521
Florida       19552860
Illinois      12882135
dtype: int64

We'll discuss some of the quirks of Pandas indexing and slicing in [Data Indexing and Selection](03.02-Data-Indexing-and-Selection.ipynb).

### Constructing Series objects

We've already seen a few ways of constructing a Pandas ``Series`` from scratch; all of them are some version of the following:

```python
>>> pd.Series(data, index=index)
```

where ``index`` is an optional argument, and ``data`` can be one of many entities.

For example, ``data`` can be a list or NumPy array, in which case ``index`` defaults to an integer sequence:

In [None]:
pd.Series([2, 4, 6])

0    2
1    4
2    6
dtype: int64

``data`` can be a scalar, which is repeated to fill the specified index:

In [None]:
pd.Series(5, index=[100, 200, 300])

100    5
200    5
300    5
dtype: int64

``data`` can be a dictionary, in which case ``index`` defaults to the dictionary keys (sorted in older pandas versions, as in the output below; kept in insertion order in recent ones):

In [None]:
pd.Series({2:'a', 1:'b', 3:'c'})

1    b
2    a
3    c
dtype: object

In each case, the index can be explicitly set if a different result is preferred:

In [None]:
pd.Series({2:'a', 1:'b', 3:'c'}, index=[3, 2])

3    c
2    a
dtype: object

Notice that in this case, the ``Series`` is populated only with the explicitly identified keys.

## The Pandas DataFrame Object

The next fundamental structure in Pandas is the ``DataFrame``.
Like the ``Series`` object discussed in the previous section, the ``DataFrame`` can be thought of either as a generalization of a NumPy array, or as a specialization of a Python dictionary.
We'll now take a look at each of these perspectives.

### DataFrame as a generalized NumPy array
If a ``Series`` is an analog of a one-dimensional array with flexible indices, a ``DataFrame`` is an analog of a two-dimensional array with both flexible row indices and flexible column names.
Just as you might think of a two-dimensional array as an ordered sequence of aligned one-dimensional columns, you can think of a ``DataFrame`` as a sequence of aligned ``Series`` objects.
Here, by "aligned" we mean that they share the same index.

To demonstrate this, let's first construct a new ``Series`` listing the area of each of the five states discussed in the previous section:

In [None]:
area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
             'Florida': 170312, 'Illinois': 149995}
area = pd.Series(area_dict)
area

California    423967
Florida       170312
Illinois      149995
New York      141297
Texas         695662
dtype: int64

Now that we have this along with the ``population`` Series from before, we can use a dictionary to construct a single two-dimensional object containing this information:

In [None]:
states = pd.DataFrame({'population': population,
                       'area': area})
states

              area  population
California  423967    38332521
Florida     170312    19552860
Illinois    149995    12882135
New York    141297    19651127
Texas       695662    26448193


Like the ``Series`` object, the ``DataFrame`` has an ``index`` attribute that gives access to the index labels:

In [None]:
states.index

Index(['California', 'Florida', 'Illinois', 'New York', 'Texas'], dtype='object')

Additionally, the ``DataFrame`` has a ``columns`` attribute, which is an ``Index`` object holding the column labels:

In [None]:
states.columns

Index(['area', 'population'], dtype='object')

Thus the ``DataFrame`` can be thought of as a generalization of a two-dimensional NumPy array, where both the rows and columns have a generalized index for accessing the data.

### DataFrame as specialized dictionary

Similarly, we can also think of a ``DataFrame`` as a specialization of a dictionary.
Where a dictionary maps a key to a value, a ``DataFrame`` maps a column name to a ``Series`` of column data.
For example, asking for the ``'area'`` key returns the ``Series`` object containing the areas we saw earlier:

In [None]:
states['area']

California    423967
Florida       170312
Illinois      149995
New York      141297
Texas         695662
Name: area, dtype: int64

Notice the potential point of confusion here: in a two-dimensional NumPy array, ``data[0]`` will return the first *row*. For a ``DataFrame``, ``data['col0']`` will return the first *column*.
Because of this, it is probably better to think about ``DataFrame``s as generalized dictionaries rather than generalized arrays, though both ways of looking at the situation can be useful.
We'll explore more flexible means of indexing ``DataFrame``s in [Data Indexing and Selection](03.02-Data-Indexing-and-Selection.ipynb).
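
To make the row-versus-column distinction concrete, here is a minimal sketch. The labels ``col0``, ``col1``, ``row0``, and ``row1`` are placeholders chosen for this example.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4]])
df = pd.DataFrame(arr, columns=['col0', 'col1'], index=['row0', 'row1'])

first_row_of_array = arr[0]          # NumPy: first *row*     -> [1 2]
first_column = df['col0']            # DataFrame: first *column* -> 1, 3
first_row_of_frame = df.loc['row0']  # rows need an explicit indexer -> 1, 2
```

Square brackets on a ``DataFrame`` look up a column label, so selecting a row by label goes through ``.loc`` instead.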

### Constructing DataFrame objects

A Pandas ``DataFrame`` can be constructed in a variety of ways.
Here we'll give several examples.

#### From a single Series object

A ``DataFrame`` is a collection of ``Series`` objects, and a single-column ``DataFrame`` can be constructed from a single ``Series``:

In [None]:
pd.DataFrame(population, columns=['population'])

            population
California    38332521
Florida       19552860
Illinois      12882135
New York      19651127
Texas         26448193


#### From a list of dicts

Any list of dictionaries can be made into a ``DataFrame``.
We'll use a simple list comprehension to create some data:

In [None]:
data = [{'a': i, 'b': 2 * i}
        for i in range(3)]
pd.DataFrame(data)

   a  b
0  0  0
1  1  2
2  2  4


Even if some keys in the dictionary are missing, Pandas will fill them in with ``NaN`` (i.e., "not a number") values:

In [None]:
pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])

     a  b    c
0  1.0  2  NaN
1  NaN  3  4.0


#### From a dictionary of Series objects

As we saw before, a ``DataFrame`` can be constructed from a dictionary of ``Series`` objects as well:

In [None]:
pd.DataFrame({'population': population,
              'area': area})

              area  population
California  423967    38332521
Florida     170312    19552860
Illinois    149995    12882135
New York    141297    19651127
Texas       695662    26448193


#### From a two-dimensional NumPy array

Given a two-dimensional array of data, we can create a ``DataFrame`` with any specified column and index names.
If omitted, an integer index will be used for each:

In [None]:
pd.DataFrame(np.random.rand(3, 2),
             columns=['foo', 'bar'],
             index=['a', 'b', 'c'])

        foo       bar
a  0.865257  0.213169
b  0.442759  0.108267
c  0.047110  0.905718


#### From a NumPy structured array

We covered structured arrays in [Structured Data: NumPy's Structured Arrays](02.09-Structured-Data-NumPy.ipynb).
A Pandas ``DataFrame`` operates much like a structured array, and can be created directly from one:

In [None]:
A = np.zeros(3, dtype=[('A', 'i8'), ('B', 'f8')])
A

array([(0, 0.0), (0, 0.0), (0, 0.0)], 
      dtype=[('A', '<i8'), ('B', '<f8')])

In [None]:
pd.DataFrame(A)

   A    B
0  0  0.0
1  0  0.0
2  0  0.0


## The Pandas Index Object

We have seen here that both the ``Series`` and ``DataFrame`` objects contain an explicit *index* that lets you reference and modify data.
This ``Index`` object is an interesting structure in itself, and it can be thought of either as an *immutable array* or as an *ordered set* (technically a multi-set, as ``Index`` objects may contain repeated values).
Those views have some interesting consequences in the operations available on ``Index`` objects.
As a simple example, let's construct an ``Index`` from a list of integers:

In [None]:
ind = pd.Index([2, 3, 5, 7, 11])
ind

Int64Index([2, 3, 5, 7, 11], dtype='int64')

### Index as immutable array

The ``Index`` in many ways operates like an array.
For example, we can use standard Python indexing notation to retrieve values or slices:

In [None]:
ind[1]

3

In [None]:
ind[::2]

Int64Index([2, 5, 11], dtype='int64')

``Index`` objects also have many of the attributes familiar from NumPy arrays:

In [None]:
print(ind.size, ind.shape, ind.ndim, ind.dtype)

5 (5,) 1 int64


One difference between ``Index`` objects and NumPy arrays is that indices are immutable; that is, they cannot be modified via the normal means:

In [None]:
ind[1] = 0

TypeError: Index does not support mutable operations

This immutability makes it safer to share indices between multiple ``DataFrame``s and arrays, without the potential for side effects from inadvertent index modification.

### Index as ordered set

Pandas objects are designed to facilitate operations such as joins across datasets, which depend on many aspects of set arithmetic.
The ``Index`` object follows many of the conventions used by Python's built-in ``set`` data structure, so that unions, intersections, differences, and other combinations can be computed in a familiar way:

In [None]:
indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

In [None]:
indA & indB  # intersection

Int64Index([3, 5, 7], dtype='int64')

In [None]:
indA | indB  # union

Int64Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')

In [None]:
indA ^ indB  # symmetric difference

Int64Index([1, 2, 9, 11], dtype='int64')
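
The operator forms above reflect older pandas releases. In recent releases (pandas 2.0 and later), `&`, `|`, and `^` on an ``Index`` act as element-wise logical operators rather than set operations. The method forms sketched below express the same set operations and should work in both old and new versions:

```python
import pandas as pd

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

common = indA.intersection(indB)            # values present in both
combined = indA.union(indB)                 # values present in either
only_one = indA.symmetric_difference(indB)  # values present in exactly one

print(list(common))  # [3, 5, 7]
```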

# Part II: NumPy

## Python lists are not ideal for numerical arrays

For numerical work, Python's native lists aren't the ideal data type.

Lists can do funny things that you don't expect, if you're not
careful.

  * Lists are just a collection of items, of any type.
  * If you do mathematical operations on a list, you won't get
    what you expect.
  * These are not the ideal data type for scientific computing.
  * **Arrays** are a much better choice, but are not a native
    Python data type.

In [77]:
a = [1, 2, 3, 4]
a

[1, 2, 3, 4]

In [78]:
b = [3, 5, 5, 6]
b

[3, 5, 5, 6]

In [79]:
2 * a

[1, 2, 3, 4, 1, 2, 3, 4]

In [80]:
a + b

[1, 2, 3, 4, 3, 5, 5, 6]

## Lists vs. Arrays

**Lists**: optimized for flexibility

  * Can hold any type
  * Can grow
  * Are one-dimensional
  * Do not have out-of-the-box element-wise operations

**Arrays**: optimized for speed

  * Single type
  * Fixed size
  * Multi-dimensional
  * Have optimized element-wise operations
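
For contrast with the list results shown earlier, the same two operations on NumPy arrays act element by element. This is only a short sketch using the values of `a` and `b` from above:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([3, 5, 5, 6])

doubled = 2 * a   # [2 4 6 8]: every element is doubled
total = a + b     # [ 4  7  8 10]: element-wise sum

print(doubled)
print(total)
```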

## Arrays are what we want to use: NumPy

Almost everything that you want to do starts with NumPy.

  * Provides functions that create arrays of various types and forms: `zeros`, `ones`,
    `linspace`, *etc.*
  * `linspace` takes 2 or 3 arguments; the default number of
    entries is 50.

In [81]:
import numpy

In [82]:
numpy.zeros(5)

array([0., 0., 0., 0., 0.])

In [83]:
numpy.ones(5, dtype = int)

array([1, 1, 1, 1, 1])

In [84]:
numpy.zeros([2,2])

array([[0., 0.],
       [0., 0.]])

In [85]:
numpy.arange(5)

array([0, 1, 2, 3, 4])

In [86]:
numpy.linspace(1,5)

array([1.        , 1.08163265, 1.16326531, 1.24489796, 1.32653061,
       1.40816327, 1.48979592, 1.57142857, 1.65306122, 1.73469388,
       1.81632653, 1.89795918, 1.97959184, 2.06122449, 2.14285714,
       2.2244898 , 2.30612245, 2.3877551 , 2.46938776, 2.55102041,
       2.63265306, 2.71428571, 2.79591837, 2.87755102, 2.95918367,
       3.04081633, 3.12244898, 3.20408163, 3.28571429, 3.36734694,
       3.44897959, 3.53061224, 3.6122449 , 3.69387755, 3.7755102 ,
       3.85714286, 3.93877551, 4.02040816, 4.10204082, 4.18367347,
       4.26530612, 4.34693878, 4.42857143, 4.51020408, 4.59183673,
       4.67346939, 4.75510204, 4.83673469, 4.91836735, 5.        ])

In [87]:
numpy.linspace(1, 5, 6)

array([1. , 1.8, 2.6, 3.4, 4.2, 5. ])

## Specifying data types

In [88]:
x = numpy.float32(7.4e-3)
a = numpy.array([[1,2,3],[4,5,6]],dtype=numpy.float32)
a

array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)

In [89]:
# numpy.ndarray allocates memory without initializing it, so the values shown are arbitrary
b = numpy.ndarray((2,3),dtype=numpy.float16)
b

array([[0.0e+00, 0.0e+00, 0.0e+00],
       [0.0e+00, 7.4e-06, 0.0e+00]], dtype=float16)

  * Integers:    
    `int8 int16 int32` `int64` `uint8 uint16 uint32 uint64`    
    Number indicates number of bits.
  * Floats of half, single and double precision:  `float16 float32` `float64`
  * Complex numbers in single and double precision: `complex64` `complex128`
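
The bit counts in the type names translate directly into storage size and representable range. The following sketch inspects a few types with `np.dtype`, `np.iinfo`, and `np.finfo`:

```python
import numpy as np

# Storage size in bytes follows from the number of bits
print(np.dtype(np.int16).itemsize)    # 2
print(np.dtype(np.float64).itemsize)  # 8

# Representable range of an integer type
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)  # -128 127

# Fewer bits mean coarser floating-point precision
print(np.finfo(np.float16).eps > np.finfo(np.float64).eps)  # True
```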

## Accessing array elements
  
Elements of arrays are accessed using square brackets.

  * Like most languages, the first index is the row, the second is the
    column.

  * Indexing starts at 0.

  * You cannot assign values outside the index range (unlike, e.g., in R); doing so raises an `IndexError`.

*Note: `import numpy as np` imports the `numpy` module under the shorter alias `np`.*

In [90]:
import numpy as np  
np.zeros([2, 3])

array([[0., 0., 0.],
       [0., 0., 0.]])

In [91]:
a = np.zeros([2,3])
a[1,2] = 1
a[0,1] = 2
a

array([[0., 2., 0.],
       [0., 0., 1.]])

## Slicing arrays

You can select a subset of a NumPy array by using an **index range** instead of a single number between square brackets. This is called **slicing**.

  * An index range looks like "a:b",  e.g. "2:4". So `a[2:4]` selects the elements of an array `a` at indices 2 and 3.

  * Read "2:4" as "from the beginning of the element at index 2, to
    the beginning of that at index 4".

  * Or read it as: index 2 is the first you get, index 4 is the first you do not get.
  
  * Negative indexing is supported.
  * If a third index is specified, it refers to the step size
    ("1:10:2", for example).
  * If an index is omitted, "beginning" or "end" is
    assumed, respectively.
      

In [92]:
a = np.array([1,2,3,4,5,6,7])
print(a[2])

3


In [93]:
print(a[2:4])

[3 4]


In [94]:
print(a[::2])

[1 3 5 7]
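
The remaining slicing rules from the list above (negative indices, an explicit step, omitted bounds) can be sketched on the same array:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7])

last = a[-1]            # 7: negative indices count from the end
last_three = a[-3:]     # [5 6 7]: an omitted stop means "to the end"
every_other = a[1:6:2]  # [2 4 6]: start at 1, stop before 6, step 2
reversed_a = a[::-1]    # [7 6 5 4 3 2 1]: a negative step walks backwards

print(last, last_three, every_other, reversed_a)
```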


Elements in an array can also be selected using a boolean array.  Boolean arrays can be created using a conditional expression.

In [95]:
a = np.arange(5)
a

array([0, 1, 2, 3, 4])

In [96]:
a > 2

array([False, False, False,  True,  True])

In [97]:
a[a > 2]

array([3, 4])

In [98]:
a[(a % 2) == 0]

array([0, 2, 4])

Note: The "%" symbol is the modulus operator.

# Copying arrays

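Use caution when copying array variables.  Assigning an array to a new
variable does not copy the data: both names refer to the same array, so a
change made through one is visible through the other.  This 'sharing feature'
is often unexpected.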

In [99]:
a = 10
b = a
a = 20
a, b

(20, 10)

In [100]:
a = np.array([[1,2,3],[2,3,4]])
b = a
a[1,0] = -10
a

array([[  1,   2,   3],
       [-10,   3,   4]])

In [101]:
b

array([[  1,   2,   3],
       [-10,   3,   4]])

To turn off this 'sharing feature', use `copy`:

In [102]:
b = a.copy()
a[1,0] = 16
a

array([[ 1,  2,  3],
       [16,  3,  4]])

In [103]:
b

array([[  1,   2,   3],
       [-10,   3,   4]])
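
One way to check whether two variables share data is sketched below, using `is` for object identity and `np.shares_memory` for the underlying buffer. The variable names `alias` and `independent` are chosen just for this example.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])

alias = a               # a second name for the same array object
independent = a.copy()  # a new array with its own data

a[1, 0] = -10

print(alias is a)                        # True
print(np.shares_memory(a, independent))  # False
print(alias[1, 0], independent[1, 0])    # -10 2
```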

# Matrix arithmetic

## Looping over arrays
  
  * In Python, loops over arrays are performed over the first
    index.

  * To go over all elements of a multidimensional array `a` without using nested loops, use
    `a.ravel()` or `a.flat` (or `a.flatten()` if you need a copy).

In [104]:
a = np.array([1,2,3])
for i in a:
    print("element:", i)

element: 1
element: 2
element: 3


In [105]:
a = np.array([[1,2],[3,4]])
for i in a: 
    print("element:", i)

element: [1 2]
element: [3 4]


In [106]:
for i in a.ravel(): 
    print("element:", i)

element: 1
element: 2
element: 3
element: 4


## Shape and reshape
   
  * NumPy allows you to modify the shape of an array once it
    already exists.

  * Though, of course, you can only change the shape to one which
    contains the same number of elements.

  * Also, note that `reshape` returns a new view of the
    array data whenever possible, and doesn't change the shape of the original array.

In [107]:
a = np.arange(8)
a.shape

(8,)

In [108]:
a.reshape([2,4])

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [109]:
a.reshape([2,4]).shape

(2, 4)
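
Because the reshaped array is a view here, writing through it also changes the original. The sketch below demonstrates this and also uses `-1`, which asks NumPy to infer one dimension from the total size:

```python
import numpy as np

a = np.arange(8)
b = a.reshape(2, 4)   # same data, new shape
c = a.reshape(2, -1)  # -1 lets NumPy infer the remaining dimension

b[0, 0] = 99          # writes through to a, because b is a view

print(a.shape, b.shape, c.shape)  # (8,) (2, 4) (2, 4)
print(a[0])                       # 99
print(np.shares_memory(a, b))     # True
```

If an independent array is needed, `a.reshape(2, 4).copy()` avoids the sharing.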

## Vector-vector & vector-scalar multiplication
   
1-D arrays are often called 'vectors'.

  * When vectors are multiplied you get element-by-element
    multiplication.
  * When vectors are multiplied by a scalar, you
    also get element-wise multiplication.

In [110]:
a = np.arange(4)
a

array([0, 1, 2, 3])

In [66]:
b = np.arange(4.) + 3
b

array([3., 4., 5., 6.])

In [67]:
c = 2
c

2

In [68]:
a * b

array([ 0.,  4., 10., 18.])

In [69]:
a * c

array([0, 2, 4, 6])

In [70]:
b * c

array([ 6.,  8., 10., 12.])

## Peculiar matrix-vector multiplication

A 2-D array is sometimes called a 'matrix'.

  * Matrix-scalar multiplication gives element-wise
    multiplication.
  * Matrix-vector multiplication DOES NOT give the
    standard result!

In [71]:
a = np.array([[1,2,3],[2,3,4]])
b = np.array([1, 2, 3])
a * b

array([[ 1,  4,  9],
       [ 2,  6, 12]])

Normal matrix-vector multiplication:
$$
\begin{bmatrix}
  a_{11} & a_{12} & a_{13}  \\
  a_{21} & a_{22} & a_{23}
\end{bmatrix} \cdot
\begin{bmatrix}
  b_1  \\
  b_2  \\
  b_3 
\end{bmatrix}
=
\begin{bmatrix}
  a_{11} \cdot b_1 + a_{12} \cdot b_2 +  a_{13} \cdot b_3\\
  a_{21} \cdot b_1 + a_{22} \cdot b_2 +  a_{23} \cdot b_3
\end{bmatrix}
$$

Python matrix-vector multiplication:
$$
\begin{bmatrix}
  a_{11} & a_{12} & a_{13}  \\
  a_{21} & a_{22} & a_{23}
\end{bmatrix} \cdot
\begin{bmatrix}
  b_1  \\
  b_2  \\
  b_3 
\end{bmatrix}
=
\begin{bmatrix}
  a_{11} \cdot b_1 & a_{12} \cdot b_2 &  a_{13} \cdot b_3\\
  a_{21} \cdot b_1 & a_{22} \cdot b_2 &  a_{23} \cdot b_3
\end{bmatrix}
$$

## Vector broadcasting

This peculiar multiplication is the result of element-wise operations plus **broadcasting**.

NumPy will perform vector broadcasting if you perform a matrix-vector operation:

  * NumPy will repeatedly apply the vector to the matrix.
  * NumPy will not do this for two vectors of different lengths (unless one of them has length 1).
  * The length of the vector must equal the last dimension of the matrix.
  * By default it will do the application by row; use
    `np.newaxis` to reshape the vector.

In [72]:
a = np.zeros([2,3])
a + np.arange(3)

array([[0., 1., 2.],
       [0., 1., 2.]])

In [76]:
a + np.arange(2)[:,np.newaxis]

array([[0., 0., 0.],
       [1., 1., 1.]])

## Broadcasting rules

Dimensions are stretched if they are "compatible".

Start with the trailing dimensions, and move to the left.

If the dimensions are equal, or one of them is 1, or one is missing, they are compatible.

![](array-broadcasting.png)
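
The rules can be checked directly by looking at the shape of the result. The shapes below are arbitrary examples picked for this sketch: two compatible pairs and one incompatible pair.

```python
import numpy as np

# (2, 3) with (3,): the missing leading dimension is treated as 1
shape_a = (np.zeros((2, 3)) + np.arange(3)).shape      # (2, 3)

# (3, 1) with (1, 4): each size-1 dimension is stretched
shape_b = (np.zeros((3, 1)) + np.zeros((1, 4))).shape  # (3, 4)

# (2, 3) with (2,): trailing sizes 3 and 2 differ and neither is 1
try:
    np.zeros((2, 3)) + np.arange(2)
    compatible = True
except ValueError:
    compatible = False

print(shape_a, shape_b, compatible)  # (2, 3) (3, 4) False
```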


## Matrix-matrix multiplication
   
Not surprisingly, matrix-matrix multiplication doesn't work as
expected either, instead doing an element-wise multiplication like
with vector-vector multiplication.

In [None]:
a = np.array([[1,2,3],[2,3,4]])
b = np.array([[1,2,3],[2,3,4]])
a

In [None]:
a * b

Normal matrix-matrix multiplication:

$$
  \begin{bmatrix} 
      a_{11} & a_{12} \\
      a_{21} & a_{22}
  \end{bmatrix} \cdot 
  \begin{bmatrix} 
    b_{11} & b_{12} \\
    b_{21} & b_{22} 
  \end{bmatrix}
  = 
  \begin{bmatrix} 
      a_{11} \cdot b_{11} + a_{12} \cdot b_{21} &  a_{11} \cdot b_{12} + a_{12} \cdot b_{22}\\
      a_{21} \cdot b_{11} + a_{22} \cdot b_{21} &  a_{21} \cdot b_{12} + a_{22} \cdot b_{22}
    \end{bmatrix}
$$

Python matrix-matrix multiplication:

$$
  \begin{bmatrix} 
      a_{11} & a_{12} \\
      a_{21} & a_{22}
  \end{bmatrix} \cdot 
  \begin{bmatrix} 
    b_{11} & b_{12} \\
    b_{21} & b_{22} 
  \end{bmatrix}
  = 
  \begin{bmatrix} 
      a_{11} \cdot b_{11} &  a_{12} \cdot b_{12}\\
      a_{21} \cdot b_{21} &  a_{22} \cdot b_{22}
    \end{bmatrix}
$$

## How then to perform matrix algebra?
  
Since Python 3.5, the standard way is to use NumPy arrays in conjunction with the `@` matrix product operator.
Alternatively, you can use NumPy's `dot` function (older SciPy versions also exposed it as `sp.dot`, but that alias is deprecated).

In [None]:
import scipy as sp
a = np.array([[1,2,3],[2,3,4]])
b = np.array([[1,2,3],[2,3,4]])
a

In [None]:
a.T  # or a.transpose()

In [None]:
a.T @ b  # or: np.dot(a.T, b)

In [None]:
b @ a.T  # or: np.dot(b, a.T)

In [None]:
c = np.arange(3) + 1
a @ c  # or: np.dot(a, c)
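
Since the cells above are shown without output, here is a self-contained sketch of the same three products with the resulting shapes annotated. The names `left`, `right`, and `mat_vec` are introduced only for this example.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])  # shape (2, 3)
b = np.array([[1, 2, 3], [2, 3, 4]])  # shape (2, 3)
c = np.arange(3) + 1                  # shape (3,)

left = a.T @ b    # (3, 2) @ (2, 3) -> (3, 3)
right = b @ a.T   # (2, 3) @ (3, 2) -> (2, 2)
mat_vec = a @ c   # (2, 3) @ (3,)   -> (2,)

print(left.shape, right.shape, mat_vec.shape)  # (3, 3) (2, 2) (2,)
print(right)    # [[14 20]
                #  [20 29]]
print(mat_vec)  # [14 20]
```

The inner dimensions must match for `@`, which is why `a @ b` itself would fail for these two (2, 3) arrays.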

# Linalg Submodule of SciPy
  
The linalg submodule of SciPy contains useful functions for matrix algebra.

  * Typical matrix functions: inv, det, norm...
  * More advanced functions: eig, svd, cholesky...
  * Both NumPy and SciPy have a linalg module.  Prefer SciPy's:
    it is always built with BLAS/LAPACK support and offers more functions.

In [None]:
import numpy as np
import scipy as sp
import scipy.linalg as linalg 
a = np.array([[1,2,3], [3,4,5], [1,1,2]])
linalg.det(a)

In [None]:
a @ linalg.inv(a)

## Solving systems of equations


The linalg submodule of SciPy comes with an important function: `solve`.

`linalg.solve` is used to solve the system of equations $Ax = b$.

In [None]:
a = np.array([[1,2,3], [3,4,5], [1,1,2]])
a

In [None]:
b = np.array([3, 4, 2])
b

In [None]:
x = linalg.solve(a, b)
x

Here 
$$
  \begin{bmatrix} 
    1 & 2 & 3\\  
    3 & 4 & 5  \\
    1 & 1 & 2\\
\end{bmatrix} \cdot
\begin{bmatrix} 
x[0]  \\
x[1]  \\
x[2]
\end{bmatrix}
=
\begin{bmatrix} 
3\\
4\\
2
\end{bmatrix}
$$

is solved by

$$
  \begin{bmatrix} 
    1 & 2 & 3\\  
    3 & 4 & 5  \\
    1 & 1 & 2\\
\end{bmatrix} \cdot
\begin{bmatrix} 
-0.5  \\
-0.5  \\
1.5
\end{bmatrix}
=
\begin{bmatrix} 
3\\
4\\
2
\end{bmatrix}
$$
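
A quick way to confirm a solution is to substitute it back into the system. The sketch below repeats the setup from the cells above and compares with `np.allclose`, which tolerates floating-point rounding:

```python
import numpy as np
import scipy.linalg as linalg

a = np.array([[1, 2, 3], [3, 4, 5], [1, 1, 2]])
b = np.array([3, 4, 2])

x = linalg.solve(a, b)

# Substituting x back into the system should reproduce b
print(np.allclose(a @ x, b))              # True
print(np.allclose(x, [-0.5, -0.5, 1.5]))  # True
```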

# Statistics

SciPy contains all of the statistical functions that you'll
probably ever need.
  
  * The scipy.stats module is based around the idea of a 'random
    variable' type.
    
  * A whole variety of standard distributions are available:
    
      * Continuous distributions: Normal, Maxwell, Cauchy,
        Chi-squared, Gumbel left-skewed, Gibrat, Nakagami, ...
      * Discrete distributions: Poisson, Binomial,
        Geometric, Bernoulli, ...
    
  * The random variables have all of the statistical
    properties of the distributions built into them already: cdf,
    pdf, mean, variance, moments, ...

## Normal statistics

In [None]:
import numpy as np, scipy as sp
import matplotlib.pyplot as plt
import scipy.stats as stats
x = np.linspace(-5, 5, 100)
plt.plot(x,stats.norm.pdf(x))
plt.plot(x,stats.norm.pdf(x,loc=1))
plt.plot(x,stats.norm.pdf(x,loc=-1,scale=2));

All continuous distributions take `loc` and `scale` as keyword
parameters to adjust the location and scale of the distribution.
In general, the density of the shifted and scaled distribution at `x` is
the standard density evaluated at `(x - loc) / scale`, divided by `scale`.
The default values are `loc = 0` and `scale = 1`.
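
The relationship between `loc`, `scale`, and the standard density can be verified numerically. This is a minimal sketch for the normal distribution, with the grid `x` and the values of `loc` and `scale` chosen arbitrarily:

```python
import numpy as np
import scipy.stats as stats

x = np.linspace(-5, 5, 11)
loc, scale = -1, 2

shifted = stats.norm.pdf(x, loc=loc, scale=scale)

# Same density written via the standard normal: rescale the
# argument, then divide by scale so the total area stays 1
by_hand = stats.norm.pdf((x - loc) / scale) / scale

print(np.allclose(shifted, by_hand))  # True
```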

## More normal statistics

In [None]:
stats.norm.mean(loc = -1, scale = 2)

In [None]:
stats.norm.std(loc = -1, scale = 2)

In [None]:
stats.norm.moment(3, loc = -1, scale = 2)

In [None]:
samples = stats.norm.rvs(size = 1000, loc = -1, scale = 2)
plt.hist(samples, bins=41, density=True)
plt.plot(x, stats.norm.pdf(x, loc = -1, scale = 2), 'c', linewidth = 2);

## Setting the seed
 
Sometimes you need consistency in your randomness:
       
  * Pseudo-random numbers are generated from an initial
    'seed'.
  * The seed initializes the generator's internal state; each number
    drawn updates that state, so the whole sequence is determined by the seed.
  * If you need consistency in your random numbers (for
    debugging, for example), you can set the seed explicitly so
    that you get the same random numbers every time.
  * Be careful using this for production!

In [None]:
stats.norm.rvs()

In [None]:
stats.norm.rvs()

In [None]:
np.random.seed(1)
stats.norm.rvs()

In [None]:
sp.random.seed(1)  # older SciPy only, where scipy.random was an alias of numpy.random
stats.norm.rvs()

In [None]:
import random as rd
rd.seed(1)
stats.norm.rvs()

## Random versus numpy.random
  
You may notice that there are several random packages: `random`,
`numpy.random` and `scipy.random`.  What's the difference?
    
  * In older SciPy versions, scipy.random is simply an alias for numpy.random; recent versions no longer provide it.
  * The `random` package is not connected to the others.
  * The numpy.random package affects numpy and scipy
    routines; the `random` package does not.
  * All use the same algorithm (Mersenne Twister).
  * The `random.seed()` is thread safe, while numpy's and scipy's `random.seed()`
    are not.
  * The `numpy.random` package contains more functionality.
  * Unless you need your code to be thread-safe (rarely in python), use
    `numpy.random`.
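
As a sketch of reproducible sampling, the example below shows two options: seeding NumPy's global generator, which `scipy.stats` uses by default, and passing a seed per call through the `random_state` argument of `rvs`. The sample size of 3 is arbitrary.

```python
import numpy as np
import scipy.stats as stats

# Option 1: seed NumPy's global generator before each draw
np.random.seed(1)
first = stats.norm.rvs(size=3)
np.random.seed(1)
second = stats.norm.rvs(size=3)
print(np.array_equal(first, second))  # True

# Option 2: pass the seed per call, leaving the global state untouched
third = stats.norm.rvs(size=3, random_state=1)
fourth = stats.norm.rvs(size=3, random_state=1)
print(np.array_equal(third, fourth))  # True
```

The per-call form keeps the seeding local, which can make it easier to reason about in larger programs.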

## Statistics, a discrete example: Poisson 

In [None]:
x = np.arange(10)
plt.plot(x, stats.poisson.pmf(x, 4),'o-')
plt.plot(x, stats.poisson.cdf(x, 4))

In [None]:
stats.poisson.mean(4)

In [None]:
stats.poisson.var(4)

Note that discrete distributions have Probability Mass Functions
(PMF) instead of Probability Density Functions (PDF).
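
A few properties of the PMF can be checked numerically. The sketch below uses the same rate of 4 as above; the cutoff of 50 terms is an assumption that makes the neglected tail negligible for this rate.

```python
import numpy as np
import scipy.stats as stats

mu = 4
k = np.arange(50)  # enough terms that the remaining tail is negligible
pmf = stats.poisson.pmf(k, mu)

# Probabilities sum to one, and the CDF is the running sum of the PMF
print(np.isclose(pmf.sum(), 1.0))                             # True
print(np.allclose(np.cumsum(pmf), stats.poisson.cdf(k, mu)))  # True

# For a Poisson distribution, mean and variance both equal the rate
print(np.isclose(stats.poisson.mean(mu), stats.poisson.var(mu)))  # True
```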

## Polynomial fitting

In [None]:
x = np.arange(50.)
y = x + 50.0 * np.random.random(50)

fit = np.polyfit(x, y, 1)
fit

In [None]:
plt.plot(x, y, 'o')  
plt.plot(x, np.polyval(fit, x));

In [None]:
fit = np.polyfit(x, y, 2)

In [None]:
plt.plot(x, y, 'o')  
plt.plot(x, np.polyval(fit, x));
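
To see what `np.polyfit` returns, it helps to fit data whose answer is known. The sketch below uses a noise-free line, an assumption made purely so that the expected coefficients are exact:

```python
import numpy as np

x = np.arange(10.)
y = 2.0 * x + 1.0  # noise-free line, so the fit recovers it exactly

# Coefficients come back with the highest power first
slope, intercept = np.polyfit(x, y, 1)
print(np.allclose([slope, intercept], [2.0, 1.0]))  # True

# polyval evaluates the fitted polynomial at new points
print(np.isclose(np.polyval([slope, intercept], 20.0), 41.0))  # True
```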

## Further numerical functionality in SciPy

There is much more functionality in SciPy and its subpackages, e.g.:

 * optimization
 
 * (even more) linear algebra
 
 * integration
 
 * interpolation
 
 * special functions
 
 * fast Fourier transforms
 
 * signal and image processing
 
 * solvers for ordinary differential equations.
 
 * ...