# Introduction to NumPy

In [None]:
from IPython.display import Image
import os
  
IMG_PATH = "./img" # put your path for the images

In [None]:
Image(filename=os.path.join(IMG_PATH, 'numpy_ecosystem.png'), width=500)



----
## NumPy Installation 

#### Installing via pip (for the *wise*  *nix users)
Like any other major Python library, NumPy can be downloaded from
the [Python Package Index](https://pypi.org/) via Python’s standard pip package manager.


**Note:** you need Python and pip already installed on your system.

```

python -m pip install --user numpy 

```

It is preferable to use the `--user` flag, as it avoids the need for *sudo* privileges.

#### Install system-wide (Linux)
Linux users can also install packages from their distribution's repository.

However, be aware that the packaged versions may be older than those available via pip.

 *Ubuntu / Debian*

--- 

 ```
 sudo apt-get install python3-numpy
 
 ```
 
 *Fedora (22 and later)*
 
 ---
 ```
 sudo dnf install python3-numpy
 ```
 
 #### Install system-wide (Mac)
 
 Macs don’t have a preinstalled package manager, but there are a couple of popular package managers you can install.
 
 *Macports*
 
 ---
 ```
 sudo port install py35-numpy 
 ```

*Homebrew*
 
 ---
```
python -m pip install numpy 
```


#### Install with Anaconda 

> (**Recommended for Windows users**)

Anaconda is a free and open-source distribution of the Python and R programming languages for scientific computing (data science, machine learning applications, large-scale data processing, predictive analytics, etc.) that aims to simplify package management and deployment. Package versions are managed by the package management system conda.

The Anaconda distribution is used by over 12 million users and includes more than 1400 popular data-science packages suitable for Windows, Linux, and macOS.

It can be downloaded from the following link: [Anaconda](https://www.anaconda.com/distribution/).

Please download and install the Python 3.7 version and you are done!  

Anaconda allows you to easily manage and create  different *environments*.

Installing a package with conda is simple. You need to search for the package you want to install at [https://anaconda.org/](https://anaconda.org/) and just copy the command.

For instance:
```
conda install -c anaconda numpy 

```

###  Good Practice
Regardless of the way you decide to install your python distribution,
in order to have a clean python setup, you should get used to the concept of python *environment*.

**What is an Environment?**
An environment is a directory containing all the packages and dependencies required by your project.

You may want to create an environment for every project you work on.
This practice has two main benefits:

 1. It will be easier to share the project among different machines.
 2. You are reducing the risk of messing up your entire python ecosystem.
    
---
*Example*

You have one environment with NumPy 1.7 and its dependencies, and another environment with NumPy 1.6 for legacy testing. 
Changes made in one environment do not affect the others, so you can keep working on different projects as if they
were isolated from one another.

---

### How to create an environment
#### Non-Conda Users
First you need to install the *virtualenv* package via:
```
pip install virtualenv
```

Then you run:

```
virtualenv <name-of-the-environment>
source path/to/the/environment/bin/activate
```
From this moment forward, your work session will use this environment
as its python distribution.

This means that, whenever you call ```pip install something```
it is installed in the current active virtual environment.

In order to deactivate the environment:
```
deactivate
```

#### Conda Users
With conda it is even simpler. 
You create an environment with the following command:

```
conda create -n <name-of-the-environment> python[=version number]
```

Then you activate it with:
```
conda activate <name-of-the-environment> 
```

And install packages with:

```
conda install -n <name-of-the-environment> something
```



## Importing NumPy
 

In [None]:

import numpy as np

'''
Utility function
'''
def describe(a):
  """
  Description of the array a
  
  Parameters
  ----------
    a : numpy array
  """
  print("data:\n{}\nshape:{}\ndtype: {}".format(a, a.shape, a.dtype))

hrule = lambda x : "="*x


## The NumPy ndarray: A Multidimensional Array Object

One of the key features of NumPy is its N-dimensional array object, or ndarray, which
is a fast, flexible container for large data sets in Python.

Arrays enable you to perform
mathematical operations on whole blocks of data using similar syntax to the equivalent
operations between scalar elements

In [None]:
# create a 4x5 matrix with
# 20 integers drawn from a discrete uniform distribution 
# defined on the half-open interval [0, 10)
data = np.random.randint(0,10,20).reshape(4,5)

print(data)

print(hrule(30))
print(data+data)

print(hrule(30))
print(data*10)

**Definition: ndarray**

An ndarray is a generic multidimensional container for *homogeneous* data
(all the elements must have the same type). **They are not Python lists!**

An ndarray is associated with a *shape*, a tuple denoting the size of each dimension.

An ndarray is also associated with a *dtype*,  an object describing the data type of the array.

In [None]:
print("data: shape {}, dtype {}".format(data.shape, data.dtype))

### Creating ndarrays

The easiest way is via the **array** function. 
This function accepts any sequence-like object and returns
a new ndarray containing the data passed as input.

**Question**: What is the dtype of data_arr?

In [None]:
data_list = [6,7.5, 8, 0, 1]
data_arr = np.array(data_list)
print(data_arr)

We can also convert nested sequences, e.g., lists of lists. 
Numpy will return a multidimensional array.


In [None]:
data2 = np.array([[1,0.2,3,4], [1,2,3,4]])
describe(data2)

What if the lists have different sizes?

In [None]:
data2_ = np.array([[1,2,3,4], [1,2,3,4,5]], dtype='object')
data2_

**Note** - Unless explicitly specified, np.array tries to infer the best data type
on the basis of the data passed to the function.



In [None]:
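print(data_arr.dtype)  # float64: the integers are upcast to match 7.5
print(data2_.dtype)    # object: requested explicitly for the ragged lists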

### Functions for initializing arrays

In addition to np.array, there are several other functions.

For instance:

**np.zeros** creates an array filled with zeros. It needs the shape of the array to be created.

In [None]:
print(np.zeros(10)) # create an array of 10 elements
print(hrule(30)) 
print(np.zeros((2,2))) # create a multidimensional array 2x2

**np.empty** allocates an array without initializing its entries, so it contains whatever values happen to be in memory (garbage numbers).

In [None]:
np.empty(15) # DO NOT USE ME!

**np.arange**, it is the counterpart to the built-in range function



In [None]:
# all the numbers within [0, 10)
# with increment 2
np.arange(0,10,2) 

These are just a few examples. The following table summarizes some of the most useful functions.

| Function   | Description |
|---|---|
| asarray | Convert input to ndarray, but do not copy if the input is already an ndarray |
| arange |Like the built-in range but returns an ndarray instead of a list | 
|ones, ones_like |Produce an array of all 1’s with the given shape and dtype. ones_like takes another array and produces a ones array of the same shape and dtype|
|zeros, zeros_like | Like ones but with zeros | 
| eye, identity |  Create a square N x N identity matrix |
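
A few of the functions in the table can be tried out as follows. This is only a minimal sketch: the variable names (`base`, `same`, `ones`, `zeros`, `ident`) are chosen here for illustration and are not used elsewhere in the notebook.

```python
import numpy as np

base = np.arange(6).reshape(2, 3)      # integer array of shape (2, 3)

same = np.asarray(base)                # already an ndarray: no copy is made
ones = np.ones_like(base)              # same shape and dtype as base
zeros = np.zeros((2, 3), dtype=float)  # shape and dtype given explicitly
ident = np.eye(3)                      # 3 x 3 identity matrix

print(same is base)
print(ones.shape, ones.dtype == base.dtype)
print(zeros.dtype)
print(ident)
```

Note that `np.asarray` returns the very same object when its input is already an ndarray, which is why modifying `same` would also modify `base`.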

## Data Types for ndarrays

The data type or dtype is a special object containing the information needed 
by numpy in order to interpret a chunk of memory as a particular type of data.

The numerical dtypes are named the same way: a type name,
like float or int , followed by a number indicating the number of bits per element.

The following table is a full listing of NumPy's supported data types


| Type | Type Code   | Description |
|---|---|---|
| int8, uint8 | i1, u1 | Signed and unsigned 8-bit (1 byte) integer types  |
| int16, uint16 | i2, u2|  Signed and unsigned 16-bit integer types|
| int32, uint32 | i4, u4 | Signed and unsigned 32-bit integer types|
| int64, uint64 | i8, u8|  Signed and unsigned 64-bit integer types|
| float16 | f2 | Half-precision floating point|
| float32 | f4 or f | Standard single-precision floating point. Compatible with C float |
| float64 | f8 or d |  Standard double-precision floating point. Compatible with C double and Python float |
| float128 | f16 or g | Extended-precision floating point | 
| complex64, complex128, complex256 | c8, c16, c32 | Complex numbers represented by two 32-, 64-, or 128-bit floats, respectively |
| bool | ? | Boolean type storing True and False values |
| object | O | Python object type |
| string_  | S|  Fixed-length string type (1 byte per character). |
|unicode_ | U | Fixed-length unicode type (number of bytes platform specific) |
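
The relation between type codes and element sizes can be checked directly with `np.dtype`. The snippet below is a small sketch; the list `codes` and the dictionary `bits` are illustrative names introduced here.

```python
import numpy as np

codes = ["i1", "u2", "i4", "i8", "f2", "f4", "f8", "c16", "?"]
bits = {}
for code in codes:
    dt = np.dtype(code)            # build a dtype object from its type code
    bits[code] = dt.itemsize * 8   # itemsize is expressed in bytes
    print("{:>4} -> {:<10} {:>3} bits".format(code, dt.name, bits[code]))
```

The number in a type code such as `i4` counts bytes, while the number in a type name such as `int32` counts bits.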


**Casting**

We can cast one dtype into another by using the method astype




In [None]:
data_float = data.astype(np.float64)
describe(data_float)

**Note**
When calling astype, NumPy always creates a new array (a copy of the data), even if the new dtype is the same as the old dtype.


In [None]:
data.astype(np.float64)
describe(data)

## Accessing the array

### Indexing & slicing
On the surface, one-dimensional arrays behave like plain Python lists.


In [None]:
arr = np.arange(10)

print(arr)
print(hrule(10))
print(arr[5:8]) # slice operator
print(hrule(10))

arr[0:5] = 10 # be careful
print(arr)


**Note** - numpy allows us to assign a scalar value to a slice of the array.
The value is said to be *broadcast*  over the entire selection.  (More on
*broadcasting* later)
 

---
**Note**. One major difference from Python lists is that the slicing operator
returns an actual *view* on the original array, instead of a copy of the data.

---
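
The difference between a list slice and an array slice can be seen in a short side-by-side sketch (the names `py_list`, `list_slice` and `arr_view` are only illustrative):

```python
import numpy as np

py_list = list(range(5))
list_slice = py_list[0:3]   # slicing a list copies the selected items
list_slice[0] = 99
print(py_list)              # the original list is unchanged

arr = np.arange(5)
arr_view = arr[0:3]         # slicing an ndarray returns a view
arr_view[0] = 99
print(arr)                  # the original array has changed

print(np.shares_memory(arr, arr_view))
```

`np.shares_memory` is a convenient way to check whether two arrays refer to the same underlying data.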

**What if we need a full copy of the original array?**

In [None]:
arr = np.arange(10)
print(arr)
print(hrule(10))

arr1 = arr # be careful this is not a copy
arr1[0] = -10
print(arr)
print(hrule(10))
arr_ = arr[0:5].copy()
arr_[0] = -100
print(arr) # it remains the same as before
print(hrule(10))

describe(arr_)


Indexing in multidimensional arrays is straightforward.
You can specify an index for each available dimension.



In [None]:
print(data)
print(hrule(10))

print(data[0,1])
print(hrule(10))
print(data[0, :])  # `:` means: "give me the entire axis!"
print(hrule(10))
print(data[:, 1:4])


Actually, with multidimensional arrays we do not have to specify an index for every axis: omitted trailing axes are taken in full.

In [None]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]) # 2 x 2 x 3 array
print(arr3d)
print(hrule(10))

describe(arr3d[0])  # take the first element on the first dimension (2x3) array
print(hrule(10))

# take the first element on the second axis with respect to the entire first axis
describe(arr3d[:,0]) 



**Exercise** - Get familiar with array indexing



In [None]:
'''
Give me the first (2x3) array
  
  Result (wrt arr3d)
  ------
    [
      [1 2 3],
      [4, 5, 6]
    ], shape(2,3)
    
'''
describe(
# your solution here
)
print(hrule(20))


'''
Result (wrt arr3d)
------
  [1 2 3], shape(3,)
'''
describe(
# your solution here
)
print(hrule(20))

''' 
Result (wrt arr3d)
------
  [
    [1 2 3 ]
    [7 8 9 ]
  ], shape (2,3)
 
'''
describe(
#your solution here
)
print(hrule(20))

'''
Result (wrt arr3d)
------
  [
    [4 5 6]
    [10 11 12]
  ], shape (2,3)
'''
describe(
# your solution here
)
print(hrule(20))



''' 
Result (wrt arr3d)
------
  [
    [1 4],
    [7 10]
  ], shape (2,2)
'''
describe(
    arr3d[:,:,0]
)
print(hrule(20))


''' 
Result 
------
  [
    [0 2 4],
    [6 8 10]
  ], shape (2,3)
'''
describe(
# your solution here
)
print(hrule(20))


---
**Solutions**

In [None]:
arr3d

In [None]:
'''
Give me the first (2x3) array
  
  Result (wrt arr3d)
  ------
    [
      [1 2 3],
      [4, 5, 6]
    ], shape(2,3)
    
'''
describe(
    arr3d[0]
)
print(hrule(20))


'''
Result (wrt arr3d)
------
  [1 2 3], shape(3,)
'''
describe(
  arr3d[0,0, :]
)
print(hrule(20))

''' 
Result (wrt arr3d)
------
  [
    [1 2 3 ]
    [7 8 9 ]
  ], shape (2,3)
 
'''
describe(
    arr3d[:, 0]
)
print(hrule(20))

'''
Result (wrt arr3d)
------
  [
    [4 5 6]
    [10 11 12]
  ], shape (2,3)
'''
describe(
    arr3d[:,1]
)
print(hrule(20))



''' 
Result (wrt arr3d)
------
  [
    [1 4],
    [7 10]
  ], shape (2,2)
'''
describe(
    arr3d[:,:,0]
)
print(hrule(20))


''' 
Result 
------
  [
    [0 2 4],
    [6 8 10]
  ], shape (2,3)
'''
describe(
    np.arange(0, 12, 2).reshape(2, 3)
)
print(hrule(20))


---

### Boolean Indexing
This is a powerful tool, since it allows us to execute (kind of) queries on the array.


In [None]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.random.randn(7, 4) # normally distributed data

mask = names == "Bob" # it returns an array of bool (useful as a mask)
print(mask)
names[mask]

In [None]:
(names=="Bob") | (names=="Will")

We can perform an element-wise comparison 
between the name "Bob" and the names contained in the array.
In this way we are  exploiting the *vectorization* capability of numpy.

We can also make rather complex logical expressions, e.g.:


```
  mask = (names=="Bob") | (names == "Will")
```
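
A mask built from one array can also be used to select rows of another array with the same length along the first axis. The sketch below assumes a small deterministic array, here called `table` (a hypothetical name), with one row per entry of `names`.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
table = np.arange(7 * 4).reshape(7, 4)   # one row per name

mask = (names == "Bob") | (names == "Will")
selected = table[mask]    # rows belonging to Bob or Will
others = table[~mask]     # ~ negates the boolean mask

print(selected)
print(others.shape)
```

The parentheses around each comparison are required, because `|` binds more tightly than `==` in Python.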

---
**Tip**:

As a general rule of thumb, it is always preferable to avoid the for-loop and let NumPy
do the hard work, since it leads to cleaner and more efficient code.

Besides, numpy is faster than you!

In [None]:
def forSearch(toSearch, sequence):
    for e in sequence:
        if e == toSearch:
            return True
    return False

randomData = np.random.randint(0,500,1000)
x = 250

forTime = %timeit -o forSearch(x,randomData)
numTime = %timeit -o np.any(randomData==x)
print(forTime.best/numTime.best)

---

**Exercise 1**
Create a vector from the  *names* array which contains all the values but "Bob"


In [None]:
names[names != "Bob"]

In [None]:
print(names)
np.argwhere(names=="Bob").ravel()

**Exercise** Set all the negative values to zero (in-place)

In [None]:
data = np.random.randint(-10, 10,25).reshape(5,5)

In [None]:
data[data < 0] = 0
describe(data)

#### Fancy Indexing 
Fancy indexing is a term adopted in NumPy to describe indexing via integer arrays



In [None]:
arr = np.zeros((8,4))
print(arr)
print(hrule(20))
for i in range(8):
  arr[i] = i
print(arr)
print(hrule(20))

# select a subset of rows in the 
# exact same order defined inside the list
subset = arr[[1,2,2,3]] 
describe(subset)

**Exercise** - Do the same as above, but this time select the columns on the transpose matrix of arr.
Hint: You can compute the transpose with arr.T

In [None]:
arrT = arr.T
arrT[:, [1,2,2,3]]  # columns 1, 2, 2, 3 of the transpose

Fancy indexing with two arrays


In [None]:
data = np.arange(4*8).reshape(8,4)
data

In [None]:
data_ = data[[1,5,7,2], [0,3,1,2]]
print(data_)


In [None]:
# data_ contains the values with index
indexes = [(u,v) for u,v in  zip([1,5,7,2], [0,3,1,2])]
print(indexes)
print([data[u,v] for u,v in indexes])
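
Passing two index lists pairs them up element by element. If the goal is instead the rectangular block formed by the selected rows and columns, one possible approach (sketched here, with `rows`, `cols`, `pairs`, `block` and `chained` as illustrative names) is `np.ix_` or two chained indexing steps:

```python
import numpy as np

data = np.arange(4 * 8).reshape(8, 4)
rows, cols = [1, 5, 7, 2], [0, 3, 1, 2]

pairs = data[rows, cols]           # element-wise pairs: (1,0), (5,3), (7,1), (2,2)
block = data[np.ix_(rows, cols)]   # every selected row crossed with every selected column
chained = data[rows][:, cols]      # the same block via two indexing steps

print(pairs)          # [ 4 23 29 10]
print(block.shape)    # (4, 4)
```

The pairwise result is exactly the diagonal of the rectangular block.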

**Exercise**  Create an array from data which satisfies the following 
 constraints:
1. it contains only the values
from the first and the last column,
thus the intermediate shape will be (5,2).
2. it contains only the rows whose index is an even number,
thus the final shape will be (3,2).


In [None]:
data = np.random.randint(-10, 10,25).reshape(5,5)
describe(data)

In [None]:
describe(
    # sel. 0,4 cols  sel. even rows
    data[:, [0, 4]] [np.arange(0,5,2)]
)


---
#### Summary




In [None]:

Image(os.path.join(IMG_PATH, "slicing.jpg"), width=500)


### Transposing Arrays and Swapping Axes

Transposing is a special form of reshaping which similarly returns a view on the underlying
data without copying anything. Arrays have the transpose method and also
the special T attribute
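
That the transpose is a view rather than a copy can be verified with a small sketch (the name `arr_t` is introduced here only for illustration):

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)
arr_t = arr.T                        # shape (3, 2), no data is copied

print(np.shares_memory(arr, arr_t))
arr_t[0, 1] = -1                     # this writes to arr[1, 0]
print(arr)
print(np.array_equal(arr.transpose(), arr_t))
```

Because both names refer to the same memory, use `arr.T.copy()` when an independent transposed array is needed.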

In [None]:
Image(os.path.join(IMG_PATH, "axis.png"), width=500)

In [None]:
arr = np.arange(15).reshape((3, 5))
describe(arr)
print(hrule(10))
describe(arr.T)


For higher dimensional arrays, ```transpose``` will accept a tuple of axis numbers to permute
the axes (for extra mind bending):

In [None]:
arr = np.arange(16).reshape((2, 2, 4))
describe(arr)

print(hrule(20))
arr_ = arr.reshape(4,4)
describe(arr_)

In [None]:
arr

In [None]:
arr.transpose((0,1,2))

In [None]:
print(hrule(20))
arr = arr.transpose((1,0,2))
describe(arr)

Simple transposing with .T is just a special case of swapping axes. ndarray has the
method swapaxes which takes a pair of axis numbers:

In [None]:
arr = np.arange(16).reshape((2, 2, 4))
describe(arr)
print(hrule(20))

arr_ = arr.swapaxes(1,2)
describe(arr_)


### Universal Functions: Fast Element-wise Array Functions

A universal function, or ufunc, is a function that performs a certain operation 
upon every element in an array container.


In [None]:
arr = np.power(np.arange(10), 2) # param1: scalar or array, param2: exp
describe(np.sqrt(arr)) 

Functions such as sqrt, which operate on a single array, are regarded as unary ufuncs. Others, such as add or maximum , take 2 arrays
(thus, binary ufuncs) and return a single array as the result:

In [None]:
x,y = np.random.randint(0, 10, 10), np.random.randint(0, 10, 10)

print(x,y)
print(hrule(20))

In [None]:
maximum = np.maximum(x,y)  # element-wise max 
minimum = np.minimum(x,y)  # element-wise min

describe(maximum)  
print(hrule(20))
describe(minimum)


There are several functions that perform element-wise operations over an array.

---

*Unary Function*

| Function  | Description |
|---|---|
| abs |  Compute the absolute value element-wise for integer, floating point, or complex values. |
| sqrt | Compute the square root of each element. |
|square | Compute the square of each element. |
|exp | Compute the exponential $e^x$ of each element |
|log, log10, log2, log1p | Natural logarithm (base e), log base 10, log base 2, and log(1 + x), respectively |
|sign | Compute the sign of each element: 1 (positive), 0 (zero), or -1 (negative) |
|ceil | Compute the ceiling of each element, i.e. the smallest integer greater than or equal to each element |
|floor | Compute the floor of each element, i.e. the largest integer less than or equal to each element |
|rint | Round elements to the nearest integer, preserving the dtype |
|modf | Return fractional and integral parts of array as separate arrays |
|isnan | Return boolean array indicating whether each value is NaN (Not a Number) |
|isfinite, isinf | Return boolean array indicating whether each element is finite (non- inf , non- NaN ) or infinite, respectively |
|cos, cosh, sin, sinh,tan, tanh | Regular and hyperbolic trigonometric functions |
|arccos, arccosh, arcsin, arcsinh, arctan, arctanh | Inverse trigonometric and inverse hyperbolic functions | 
| logical_not | Compute truth value of not x element-wise. Equivalent to ~arr for boolean arrays |



---

*Binary Function*


| Function  | Description |
|---|---|
| add | Add corresponding elements in arrays |
|subtract | Subtract elements in second array from first array |
|multiply | Multiply array elements |
|divide, floor_divide | Divide or floor divide (truncating the remainder) |
|power | Raise elements in first array to powers indicated in second array |
|maximum, fmax | Element-wise maximum. fmax ignores NaN |
|minimum, fmin | Element-wise minimum. fmin ignores NaN |
|mod  | Element-wise modulus (remainder of division) |
|copysign | Copy sign of values in second argument to values in first argument |
|greater, greater_equal, less, less_equal, equal, not_equal | Perform element-wise comparison, yielding boolean array (>, >=, <, <=, ==, !=) |

---
**TIP**

You should always prefer ufuncs over classic python for-loops,
as they are usually faster.




In [None]:
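def power(l):
    # square each element with an explicit Python loop;
    # the result goes into a new array so that the input is left
    # untouched across the repeated %timeit runs
    out = np.empty_like(l)
    for i, e in enumerate(l):
        out[i] = e**2
    return out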
       
a = np.random.randint(0,5,1000)
forTime = %timeit -o power(a)
numTime = %timeit -o np.power(a,2)

print(forTime.best/numTime.best)



### Expressing Conditional Logic as Array Operations

The numpy.where function is a vectorized version of the ternary expression x if condition else y.
Suppose we had a boolean array and two arrays of values:

In [None]:
x = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
y = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

**Exercise**
Create  a list $L = [l_1, l_2 \dots l_n]$ such that
$$
 l_i = \left\{\begin{array}{lr}
        x_i & \text{if } cond_i \text{ is True} \\
        y_i & \text{otherwise } 
        \end{array}\right.
$$

In [None]:
l = [u if cond[i] else v for i,(u,v) in enumerate(zip(x,y))]
print(l)

Now, let's do the same thing but in the NumPy way!



--- 
**numpy.where** 
is a vectorized version of the ternary expression 

```
 if (cond) then x else y
```

Run the following cell for additional information


In [None]:
?np.where

In [None]:
print(cond)
print(x)
print(y)

**Exercise** Solve the above problem with *numpy*

In [None]:
np.where(cond, x, y)

**Exercise** Set all the values of *arr* to 2 if they are greater than 0 or -2 otherwise

In [None]:
arr = np.random.randint(-5,5, 16).reshape(4,4)
describe(arr)

In [None]:
np.where(arr>0,2,-2)

**Additional notes on numpy.where**


The arguments passed to np.where can be more than just equal-sized arrays or scalars.
With some cleverness you can use where to express more complicated logic; consider
this example where I have two boolean arrays, **cond1** and **cond2** , and wish to assign a
different value for each of the 4 possible pairs of boolean values.

Instead of writing this messy code
```
result = []
for i in range(n):
  if cond1[i] and cond2[i]:
    result.append(0)
  elif cond1[i]:
    result.append(1)
  elif cond2[i]:
    result.append(2)
  else:
    result.append(3)
```

I can write this other (messy) code:

```
np.where(cond1 & cond2, 0,
              np.where(cond1, 1, 
                          np.where(cond2, 2, 3)))
```
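
To check that the two versions agree, they can be run on a pair of small example arrays. The values of `cond1` and `cond2` below are assumptions chosen to cover all four combinations; `looped` and `nested` are illustrative names.

```python
import numpy as np

cond1 = np.array([True, True, False, False])
cond2 = np.array([True, False, True, False])

# explicit loop version
looped = []
for c1, c2 in zip(cond1, cond2):
    if c1 and c2:
        looped.append(0)
    elif c1:
        looped.append(1)
    elif c2:
        looped.append(2)
    else:
        looped.append(3)

# nested np.where version
nested = np.where(cond1 & cond2, 0,
                  np.where(cond1, 1,
                           np.where(cond2, 2, 3)))

print(looped)
print(nested)
```

Both versions assign 0, 1, 2 and 3 to the four combinations in the same order.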

---


### Mathematical and Statistical Methods

A set of mathematical functions which compute statistics about an entire array or about
the data along an axis are accessible as array methods.


| Function  | Description |
|---|---|
| sum |  - |
|mean |  - |
| std, var | Standard deviation and variance |
|min, max |  - | 
|argmin, argmax | - |
|cumsum | Cumulative sum of elements starting from 0  |
|cumprod | Cumulative product of elements starting from 1 |
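
The effect of the axis argument is easiest to see on a small deterministic array. The following sketch uses illustrative variable names and a 3 x 4 array of consecutive integers:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)   # rows: [0..3], [4..7], [8..11]

col_sums = arr.sum(axis=0)       # collapse the rows: one value per column
row_sums = arr.sum(axis=1)       # collapse the columns: one value per row
row_means = arr.mean(axis=1)
row_argmax = arr.argmax(axis=1)  # position of the largest value in each row
row_cumsum = arr.cumsum(axis=1)  # running total along each row

print(col_sums)    # [12 15 18 21]
print(row_sums)    # [ 6 22 38]
print(row_means)   # [1.5 5.5 9.5]
print(row_argmax)
print(row_cumsum)
```

The axis passed to a method is the one that gets collapsed: `axis=0` yields one result per column, `axis=1` one result per row.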

**Exercise** Get familiar with these methods. Compute each one of them along axis 1.


In [None]:
arr = np.random.randn(5, 4) # normally-distributed data
describe(arr)

In [None]:
#your code here
print(arr.sum())
print(arr.sum(0))
print(arr.mean(1))

#### Unique and Other Set Logic
NumPy has some basic set operations for one-dimensional ndarrays. Probably the most
commonly used one is **np.unique** , which returns the sorted unique values in an array:

In [None]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
describe(np.unique(names))

Another useful function, **np.in1d** , tests membership of the values in one array in another,
returning a boolean array (recent NumPy releases recommend the closely related **np.isin** for new code):

In [None]:
values = np.array([6, 0, 0, 3, 2, 5, 6])
describe(np.in1d(values, [2, 3, 6]))


In [None]:
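# logical_or combines only two arrays at a time (a third positional argument
# would be treated as the output buffer), so reduce over the list of masks
np.logical_or.reduce([values==2, values==3, values==6])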

### Linear Algebra 


|Function | Description |
| --- | --- |
|diag | Return the diagonal (or off-diagonal) elements of a square matrix as a 1D array |
|dot | Matrix multiplication  |
|trace | Compute the sum of the diagonal elements |
|det | Compute the matrix determinant |
|eig | Compute the eigenvalues and eigenvectors of a square matrix |
|inv | Compute the inverse of a square matrix |
|pinv | Compute the Moore-Penrose pseudo-inverse of a matrix |
|qr | Compute the QR decomposition |
|svd | Compute the singular value decomposition (SVD) |
|solve | Solve the linear system Ax = b for x, where A is a square matrix |
| lstsq | Compute the least-squares solution to y = Xb |
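
As a small worked sketch, a 2 x 2 linear system can be solved and checked as follows. The matrix `A` and vector `b` are example values chosen here; note that `diag`, `dot` and `trace` live in the top-level `numpy` namespace, while the remaining functions of the table are found in `numpy.linalg`.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)     # solve A x = b
A_inv = np.linalg.inv(A)      # inverse of A
det = np.linalg.det(A)        # determinant: 2*3 - 1*1 = 5

print(x)
print(np.allclose(A @ x, b))               # the solution satisfies the system
print(np.allclose(A @ A_inv, np.eye(2)))   # A times its inverse is the identity
print(np.trace(A), det)
```

Comparisons are made with `np.allclose` rather than `==`, since the results are floating-point numbers subject to rounding.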




---

### Random Number Generation
The numpy.random module supplements the built-in Python random with functions for
efficiently generating entire arrays from random values drawn from a number of 
probability distributions.

You should prefer numpy.random over the built-in random module when generating many samples, since it is 
an order of magnitude faster.

In [None]:
import random

N = 1000
%timeit samples = [random.normalvariate(0, 1) for _ in range(N)]


In [None]:
%timeit np.random.normal(size=N)

## (Homework) - Implement a Random Walker

A random walk is a mathematical object, known as a stochastic or random process, that describes a path that consists of a succession of random steps on some mathematical space such as the integers. An elementary example of a random walk is the random walk on the integer number line, $ \mathbb {Z}$ , which starts at 0 and at each step moves +1 or −1 with equal probability (*uniform probability*).

In this example, we are interested in simulating the behavior of a random walker over $N=1000$ steps.

Once you have computed a trajectory (i.e., a sequence of N steps), you need to calculate:



*   The cumulative sum
*   The minimum and maximum value along the walk trajectory
*  Verify if the number of negative samples is greater than the number of positive ones
*   The first crossing time. i.e., the step at which the walker reaches for the first time a distance from the origin equal to 10 (in absolute value)
*  The longest streak of steps in the same direction (either 1 or -1)




## Advanced Array Manipulation




## Further technical details on ndarray object internals

As you’ve
seen, the data type, or dtype, determines how the data is interpreted as being floating
point, integer, boolean, or any of the other types we’ve been looking at.

Part of what makes ndarray powerful is that every array object is a strided view on a
block of data.

You might wonder, for example, how the array view arr[::2, ::-1] does
not copy any data. 
Simply put, the ndarray is more than just a chunk of memory and
a dtype; it also has striding information which enables the array to move through
memory with varying step sizes.

More precisely, the ndarray internally consists of the
following:
 * A pointer to data, that is a block of system memory
* The data type or dtype
* A tuple indicating the array’s shape
* A tuple of strides, integers indicating the number of bytes to “step” in order to
advance one element along a dimension.

For example, a typical 3 x 4 x 5 array of float64 (8-byte) values has strides (160, 40, 8), i.e. (4x5x8 bytes, 5x8 bytes, 1x8 bytes).

While it is rare that a typical NumPy user would be interested in the array strides,
they are the critical ingredient in constructing copy-less array views. 

Strides can
even be negative which enables an array to move backward through memory, which
would be the case in a slice like obj[::-1] or obj[:, ::-1] .
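
These numbers can be inspected through the `strides` attribute. The sketch below assumes a freshly created array in the default (row-major) memory layout; the names `a` and `v` are illustrative.

```python
import numpy as np

a = np.zeros((3, 4, 5), dtype=np.float64)
print(a.strides)               # (160, 40, 8)

v = a[::2, ::-1]               # every other block, second axis reversed
print(v.shape)                 # (2, 4, 5)
print(v.strides)               # (320, -40, 8)
print(np.shares_memory(a, v))  # the view reuses the same memory block
```

Taking every second element doubles the stride along that axis, and reversing an axis flips the sign of its stride; no data is moved in either case.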




In [None]:
Image(os.path.join(IMG_PATH, 'ndarray.png'), width=500)

In [None]:
x = np.array([[0, 1, 2, 3, 4],
              [5, 6, 7, 8, 9]], dtype=np.int32)
x.strides

### Advanced Array Manipulation
There are many ways to work with arrays beyond fancy indexing, slicing, and boolean
sub-setting. While much of the heavy lifting for data analysis applications is handled by
higher level functions in pandas, you may at some point need to write a data algorithm
that is not found in one of the existing libraries.
#### Reshaping Arrays
Given what we know about NumPy arrays, it should come as little surprise that you
can convert an array from one shape to another without copying any data. To do this,
pass a tuple indicating the new shape to the reshape array instance method. For
example, suppose we had a one-dimensional array of values that we wished to rearrange into
a matrix:

In [None]:
arr = np.arange(8)
arr = arr.reshape((4,2)) 
describe(arr)

One of the input shape dimensions can be -1, in which case the value used for that
dimension will be inferred from the data

In [None]:
arr = arr.reshape(-1,2,2)
describe(arr)

If we want to unroll a multi-dimensional array into a one-dimensional array, we can use
two methods:

*flatten*  and  *ravel* 
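
The two methods differ in whether they copy the data: `flatten` always returns a copy, while `ravel` returns a view whenever the memory layout allows it. A minimal sketch, with `flat` and `rav` as illustrative names:

```python
import numpy as np

arr = np.arange(8).reshape(2, 2, 2)

flat = arr.flatten()   # always a copy
rav = arr.ravel()      # a view here, because arr is contiguous in memory

flat[0] = -1           # does not touch arr
rav[1] = -2            # writes through to arr[0, 0, 1]

print(arr)
print(np.shares_memory(arr, flat), np.shares_memory(arr, rav))
```

When the array is not contiguous (for example after a transpose), `ravel` may have to return a copy as well.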

In [None]:
describe(arr.flatten())
print(hrule(20))

print("original shape= {}".format(arr.shape))

print(hrule(20))
describe(arr.ravel())

#### Concatenating and Splitting Arrays
numpy.concatenate takes a sequence (tuple, list, etc.) of arrays and it joins them together
along the given axis.



In [None]:
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])
data = np.concatenate([arr1, arr2], axis=0)
describe(data)
print(hrule(20))
data = np.concatenate([arr1, arr2], axis=1)
describe(data)


There are some convenience functions, like vstack and hstack , for common kinds of
concatenation. The above operations could have been expressed as:


In [None]:
data = np.vstack([arr1, arr2])
describe(data)
print(hrule(20))
data = np.hstack([arr1, arr2])
describe(data)

**split** slices apart an array into multiple arrays along an axis:

In [None]:
data = np.arange(10).reshape(5,2)
describe(data)
print(hrule(20))

f,s,t = np.split(data, [1, 3])

print("First Split")
describe(f)
print(hrule(20))

print("Second Split")
describe(s)
print(hrule(20))

print("Third Split")
describe(t)

Other useful functions

| Function  | Description |
|---|---|
| concatenate | Most general function, concatenates collection of arrays along one axis   |
| vstack, row_stack  |  Stack arrays row-wise (along axis 0)  |
| hstack | Stack arrays column-wise (along axis 1)  |
| column_stack |Like hstack, but converts 1D arrays to 2D column vectors first |
| dstack |Stack arrays “depth"-wise (along axis 2) |
| split |Split array at passed locations along a particular axis |
| hsplit / vsplit / dsplit |Convenience functions for splitting on axis 1, 0, and 2, respectively |

## Broadcasting
Broadcasting describes how arithmetic works between arrays of different shapes. It is
a very powerful feature, but one that can be easily misunderstood, even by experienced
users. 

The simplest example of broadcasting occurs when combining a scalar value
with an array:

In [None]:
arr = np.arange(5)
arr = arr*4
describe(arr)

The scalar 4 has been broadcast to all of the other elements in the multiplication operation.

Here is another example, where a multi-dimensional array and a one-dimensional
array are involved.

We want to subtract the column means from each column of a 2-dimensional array:

In [None]:
arr = np.arange(12).reshape(4,3)
mean = arr.mean(0)
describe(arr)
print(hrule(20))

In [None]:
describe(mean)
print(hrule(20))

In [None]:
sub = arr - mean
describe(sub)

This is what happened:


In [None]:
Image(os.path.join(IMG_PATH, 'broadcasting.png'), width=500)

---
**The Broadcasting Rule**

Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:

1. **R1**. If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side

2. **R2**. If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape

3. **R3**. If in any dimension the sizes disagree and neither is equal to 1, an error is raised

---
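
The three rules can be written down as a short pure-Python function that works on shapes only. This is a sketch for illustration, not how NumPy implements broadcasting internally; `broadcast_shape` is a hypothetical helper name, and the result is compared against `np.broadcast`.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Apply rules R1-R3 to two shapes and return the broadcast shape."""
    ndim = max(len(shape_a), len(shape_b))
    # R1: pad the shorter shape with ones on its left side
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)   # sizes agree, or R2 stretches the second array
        elif da == 1:
            result.append(db)   # R2 stretches the first array
        else:
            # R3: sizes disagree and neither is 1
            raise ValueError("shapes {} and {} are not compatible".format(shape_a, shape_b))
    return tuple(result)

print(broadcast_shape((2, 3), (3,)))   # (2, 3)
print(broadcast_shape((3, 1), (3,)))   # (3, 3)
print(np.broadcast(np.ones((3, 1)), np.ones(3)).shape)
```

Calling `broadcast_shape((3, 2), (3,))` raises a ValueError, which corresponds to rule R3.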

**Case 1**


In [None]:
M = np.ones((2,3))
a = np.arange(3)

describe(M)
print(hrule(20))

describe(a)


M and a have a different number of dimensions, therefore **R1** applies. The shape
of the one with fewer dimensions is *padded* with ones on its leading (left) side.

The shapes change as follows:

```
  M.shape = (2,3) -> (2, 3)
  a.shape = (3,) -> (1, 3)
```
Then, since the first dimensions disagree, **R2** applies, thus NumPy stretches this dimension of *a* to match.

```
M.shape = (2,3) -> (2,3) -> (2,3)
a.shape = (3,) -> (1,3) -> (2, 3)
```
Now the shape of the two arrays matches and the operation can be performed.

This is what happened to array ```a```




In [None]:
np.repeat(np.arange(3).reshape(1,3),2,0)  # stretch a to M's shape (2, 3)

In [None]:
Image(os.path.join(IMG_PATH, 'broadcasting_.png'), width=500)

In [None]:
describe(M+a)

---
**Case 2**

Let's see an example where both arrays need to be broadcast


In [None]:
a = np.arange(3).reshape(3,1)
b = np.arange(3)

describe(a)
print(hrule(20))

describe(b)


In this case **R1** applies, therefore the shape of *b* is padded.

```
a.shape = (3, 1) -> (3, 1)
b.shape = (3, )  -> (1, 3)
```
Then **R2** applies to both arrays:

```
a.shape = (3, 1) -> (3, 1) -> (3, 3)
b.shape = (3, ) -> (1, 3) -> (3, 3)
```
Because the result matches, these shapes are compatible. 
We can see this here:

In [None]:
describe(a+b)

---
**Case 3**

Now let's take a look at an example in which the two arrays are not compatible:

In [None]:
M = np.ones((3,2))
a = np.arange(3)

describe(M)
print(hrule(20))

describe(a)
print(hrule(20))

This is just a slightly different situation than in the first example: the matrix M is transposed.
Again, **R1** applies on array *a*. Then by **R2** the first dimension of *a* is stretched
to match that of *M*.

```
M.shape = (3, 2) -> (3, 2) -> (3, 2)
a.shape = (3,)   -> (1, 3) -> (3, 3)  
```

Then **R3** is triggered: the arrays are incompatible and a ValueError exception is raised.


In [None]:
M+a

---
Exercise
----
 We want to obtain the following matrix.


> Hint: You can use np.arange(3) and broadcasting


In [None]:
Image(os.path.join(IMG_PATH, 'broadcasting_1.png'), width = 500)

In [None]:
#your code here


---
Exercise
------
A useful operation in many contexts is the mean centering.

Imagine you have an array of 10 observations, each of which consists of 3 values. 

Using the standard convention we'll store this in a 10×3 array.

---
>**Definition**. Mean-centering involves the subtraction of the variable averages from the data. Since multivariate data is typically handled in table format (i.e. matrix) with columns as variables, mean-centering is often referred to as column centering.
What we do with mean-centering is to calculate the average value of each variable and then subtract it from the data. This implies that each column will be transformed in such a way that the resulting variable will have a zero mean.

---


In [None]:
X = np.random.randint(-10, +10, 30).reshape(10, 3)
describe(X)

In [None]:
# your code here


Test the correctness of your solution. 
You are supposed to obtain an array of 0s (up to floating-point rounding error).

In [None]:
# replace the X_ with your variable
X_ = X_.mean(0)
describe(X_)

---
## Conclusions

This is the end of this section.
If you want to keep practicing you can follow this link:
[Numpy 100](http://www.labri.fr/perso/nrougier/teaching/numpy.100/)