NumPy enables fast numerical computation in Python. Its core is implemented in C, which is why array operations are much faster than equivalent pure-Python loops. The key feature of NumPy is the `ndarray` object. An `ndarray` is homogeneous: every element of the array has the same data type.

In [2]:
import numpy as np

vector = np.array([1, 2, 3, 4])
print("Vector: {}".format(vector))
# Every array will have a shape. That is, its dimensions
print("Shape: {}".format(vector.shape))
# Print number of dimensions
print("Dim: {}".format(vector.ndim))
print("Data type: {}".format(vector.dtype))

Vector: [1 2 3 4]
Shape: (4,)
Dim: 1
Data type: int32
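
To make the homogeneity rule concrete, here is a minimal sketch (the variable names `mixed` and `small` are only illustrative). When the inputs mix integers and floats, NumPy picks one common type for the whole array, and you can also request a type explicitly with `dtype`.

```python
import numpy as np

# Mixing ints and a float: every element is stored as float64
mixed = np.array([1, 2, 3.5])
print(mixed, mixed.dtype)

# Requesting a data type explicitly at creation time
small = np.array([1, 2, 3], dtype=np.float32)
print(small, small.dtype)
```

Note that the default integer type (the `int32` printed above) depends on the platform and NumPy version, so you may see `int64` instead.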


For a 3D array, NumPy orders the dimensions in the shape as follows:

`(depth, rows, columns)`

So a 3D array with 3 rows, 2 columns, and a depth of 2 will have the following shape:

`(2, 3, 2)`

This is somewhat counterintuitive, since we might expect the depth "2" to come last. Other libraries use related conventions; OpenCV, for example, also lists rows before columns in an image's shape.

In [3]:
v = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
v.shape = (2, 3, 2)
print(v)

ValueError: cannot reshape array of size 11 into shape (2,3,2)

In [4]:
v = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,12])
v.shape = (2, 3, 2)
print(v)

[[[ 1  2]
  [ 3  4]
  [ 5  6]]

 [[ 7  8]
  [ 9 10]
  [11 12]]]
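
As a small sketch of how the `(depth, rows, columns)` order maps onto indexing, the snippet below rebuilds the same array with `reshape` and picks out one layer and one element. The names `first_block` and `element` are illustrative only.

```python
import numpy as np

# Same values as above, arranged as (depth, rows, columns) = (2, 3, 2)
v = np.arange(1, 13).reshape(2, 3, 2)

first_block = v[0]      # the first 3x2 "layer"
element = v[1, 2, 0]    # depth index 1, row index 2, column index 0

print(first_block.shape)  # (3, 2)
print(element)            # 11
```

Indices are given in the same order as the shape, so the first index always selects along the depth axis.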


In [15]:
v = np.zeros((2, 3, 2))
print(v)

[[[ 0.  0.]
  [ 0.  0.]
  [ 0.  0.]]

 [[ 0.  0.]
  [ 0.  0.]
  [ 0.  0.]]]


## arange
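The `arange` function is similar to Python's `range` function, but it returns an array. The data type, if not specified, is inferred from the arguments, so integer arguments give an integer array. Many other creation functions, such as `zeros`, `ones`, and `empty`, default to `np.float64`.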

In [18]:
a = np.arange(15)
print(a)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
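
Like `range`, `arange` also accepts a start, a stop, and a step. The following sketch (with illustrative variable names) shows those arguments, plus the effect of a float step and of an explicit `dtype`.

```python
import numpy as np

evens = np.arange(2, 11, 2)                # start, stop (exclusive), step
halves = np.arange(0, 2, 0.5)              # a float step gives a float array
as_float = np.arange(5, dtype=np.float64)  # force the data type explicitly

print(evens)           # [ 2  4  6  8 10]
print(halves.dtype)    # float64
print(as_float.dtype)  # float64
```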


## zeros, zeros_like
`zeros(dim)` returns an array of shape `dim` filled with zeros. `dim` can be an int (for a 1D array) or a `tuple` of ints such as `(3, 3)`.

`zeros_like(array)` returns an array of zeros with the same shape and data type as `array`.

#### `ones` and `ones_like` work the same way, except that the array is filled with ones.

#### `empty` and `empty_like` also work the same way, but they allocate the array without initialising it (hence, faster). The values are whatever happened to be in that memory, so treat them as garbage until you overwrite them.


In [33]:
print("Zeros")
a = np.zeros((3, 3))
print("A: {}".format(a))
b = np.zeros_like(a)
print("B: {}".format(b))
print("\nOnes")
a = np.ones((3, 3))
print("A: {}".format(a))
b = np.ones_like(a)
print("B: {}".format(b))
print("\nEmpty")
a = np.empty((3, 3))
print("A: {}".format(a))
b = np.empty_like(a)
print("B: {}".format(b))

Zeros
A: [[ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]]
B: [[ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]]

Ones
A: [[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
B: [[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]

Empty
A: [[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
B: [[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
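
One detail worth seeing in code is that the `*_like` functions copy the data type as well as the shape. This is a minimal sketch; `template`, `z`, and `o` are illustrative names.

```python
import numpy as np

template = np.array([[1, 2, 3], [4, 5, 6]])  # integer array of shape (2, 3)

z = np.zeros_like(template)          # same shape and dtype as template
o = np.ones((2, 3), dtype=np.int64)  # dtype can also be set explicitly

print(z.shape, z.dtype == template.dtype)
print(o.dtype)
```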


## astype
`astype` converts an array from one data type to another. Note that by default `astype` __creates a new copy of the input array (even if the data type is the same)__.

Also, converting from a floating-point type to an integer type loses information: the fractional part is discarded.

In [37]:
a = np.array([1, 2, 3, 4.5, 6.7])
print("A: {}, dtype: {}".format(a, a.dtype))
b = a.astype(np.int64)  # the np.int alias no longer exists in recent NumPy versions
print("B: {}, dtype: {}".format(b, b.dtype))

A: [ 1.   2.   3.   4.5  6.7], dtype: float64
B: [1 2 3 4 6], dtype: int64
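
The two points about `astype` can be checked directly. The sketch below uses `np.shares_memory` to confirm that the result is a separate array, and compares plain truncation with rounding first via `np.round`. Rounding is shown only as one possible alternative, not something the text above prescribes.

```python
import numpy as np

a = np.array([-1.7, 0.2, 2.9])

same = a.astype(np.float64)       # same dtype, but still a new array by default
print(np.shares_memory(a, same))  # False

truncated = a.astype(np.int64)            # fractional part dropped (toward zero)
rounded = np.round(a).astype(np.int64)    # round first if you want the nearest integer

print(truncated)  # [-1  0  2]
print(rounded)    # [-2  0  3]
```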


## Vectorization and vector-scalar operations
Explicit `for` loops over array elements are both error-prone and slow. NumPy lets us replace such loops with operations on whole arrays. This process is called vectorization.

#### Arithmetic operators applied to arrays of the same shape work element-wise.

In [40]:
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[4, 5, 6], [1, 2, 3]])

c = a + b
print(c)

c = a * b
print(c)

c = a - b
print(c)

[[5 7 9]
 [5 7 9]]
[[ 4 10 18]
 [ 4 10 18]]
[[-3 -3 -3]
 [ 3  3  3]]
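
To show what vectorization replaces, here is a sketch that computes the same element-wise sum twice, once with nested loops and once with a single array expression. The names `looped` and `vectorized` are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[4, 5, 6], [1, 2, 3]])

# Loop version: visit every (row, column) position by hand
looped = np.empty_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        looped[i, j] = a[i, j] + b[i, j]

# Vectorized version: one expression, no explicit loop
vectorized = a + b

print(np.array_equal(looped, vectorized))  # True
```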


Combining a scalar with an array applies the operation to every element of the array.

In [3]:
a = 3
b = np.array([[1, 2, 3], [4, 5, 6]])

c = a + b
print(c)

c = a * b
print(c)

c = 1.0 / b
print(c)

[[4 5 6]
 [7 8 9]]
[[ 3  6  9]
 [12 15 18]]
[[ 1.          0.5         0.33333333]
 [ 0.25        0.2         0.16666667]]


## Slicing 

You can slice an array with the following syntax:
```
array[start_index:end_index]
```
For a multi-dimensional array, give one slice per axis, separated by commas:
```
array[start_index:end_index, start_index:end_index]
```

Slicing NumPy arrays is similar to slicing Python lists. One main difference is that a NumPy slice __is not a copy but a view of the original array. Hence, any modification made through the slice is reflected in the original array.__

In [4]:
a = np.arange(20)
print(a)
a[10:15] = 5
print(a)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
[ 0  1  2  3  4  5  6  7  8  9  5  5  5  5  5 15 16 17 18 19]
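
The difference from Python lists is easiest to see side by side. In this sketch (variable names are illustrative), writing to a list slice leaves the list alone, while writing to an array slice changes the array. `np.shares_memory` confirms that the slice and the array use the same data.

```python
import numpy as np

py_list = list(range(5))
list_slice = py_list[1:3]   # a new list
list_slice[0] = 99          # py_list is unchanged

arr = np.arange(5)
arr_slice = arr[1:3]        # a view into arr
arr_slice[0] = 99           # arr[1] becomes 99 as well

print(py_list)                           # [0, 1, 2, 3, 4]
print(arr.tolist())                      # [0, 99, 2, 3, 4]
print(np.shares_memory(arr, arr_slice))  # True
```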


If you want to avoid the above scenario, you can make an explicit copy of the slice with `copy()`:

In [6]:
a = np.arange(20)
print(a)
b = a[10:15].copy()
b[:] = 5  # modifies only the copy (a plain `b = 5` would just rebind the name b)
# value in the original array doesn't change
print(a)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


Slicing with `:` alone takes the entire axis. An integer index drops that axis, while a slice keeps it. So, for a 3x3 `arr2d`:
```
1. arr2d[:, 0]         Will return an array of shape (3,)
2. arr2d[:, :1]        Will return an array of shape (3, 1)
```

In [23]:
arr2d = np.array([[1, 2, 3], 
                  [4, 5, 6], 
                  [7, 8, 9]])
arr2d[:, 0]

array([1, 4, 7])
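
A quick sketch comparing the two forms on the same array (the names `col_1d` and `col_2d` are illustrative):

```python
import numpy as np

arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

col_1d = arr2d[:, 0]    # integer index: the column axis is dropped
col_2d = arr2d[:, :1]   # slice: the column axis is kept with length 1

print(col_1d.shape)  # (3,)
print(col_2d.shape)  # (3, 1)
```

Both hold the same values; only the number of dimensions differs.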

## Boolean indexing
With boolean indexing, you can filter an array or check which entries have a specific value.

In [29]:
a = np.array(["Mayur", "is", "an", "awesome", "coder"])
a == "Mayur" # Returns boolean array


array([ True, False, False, False, False], dtype=bool)

In [35]:
# lists entry where value != "Mayur" 
a[~(a == "Mayur")]

array(['is', 'an', 'awesome', 'coder'],
      dtype='|S7')

__You can use `|` for `or` and `&` for `and`. Python's `and` and `or` keywords do not work with NumPy boolean arrays.__ Wrap each comparison in parentheses, since `|` and `&` bind more tightly than `==`.

In [41]:
(a == "Mayur") | (a == "coder")

array([ True, False, False, False,  True], dtype=bool)
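
The same idea works for numeric data. This sketch (with illustrative names) combines two conditions with `&` to filter values, and uses a boolean mask on the left-hand side of an assignment to overwrite matching entries.

```python
import numpy as np

values = np.array([3, -1, 8, 0, -5, 12])

# Keep entries strictly between 0 and 10; note the parentheses
in_range = (values > 0) & (values < 10)
print(values[in_range])  # [3 8]

# A mask can also select the entries to overwrite
clipped = values.copy()
clipped[clipped < 0] = 0
print(clipped.tolist())  # [3, 0, 8, 0, 0, 12]
```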

## Fancy indexing
With fancy indexing, you pass a list of indices to select rows in any order. For instance, the example below selects the rows at indices 1, 3, and 2 (that is, the 2nd, 4th, and 3rd rows), in that order.


In [61]:
a = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]])
print(a)
a[[1, 3, 2]]

[[1 1 1]
 [2 2 2]
 [3 3 3]
 [4 4 4]]


array([[2, 2, 2],
       [4, 4, 4],
       [3, 3, 3]])
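
Unlike a slice, the result of fancy indexing is a copy of the selected data. The sketch below (illustrative names) modifies the result and checks that the original array is untouched. It also shows that negative indices are accepted.

```python
import numpy as np

a = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]])

picked = a[[1, 3, 2]]
picked[0, 0] = 99        # changes only the copy

print(a[1, 0])                       # 2
print(np.shares_memory(a, picked))   # False

last_then_first = a[[-1, 0]]         # last row, then first row
print(last_then_first.tolist())      # [[4, 4, 4], [1, 1, 1]]
```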

## Transposing

You can obtain the transpose of a matrix using `matrix.T`, where `matrix` is the name of your array.

In [62]:
a = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]])
print(a.T)

[[1 2 3 4]
 [1 2 3 4]
 [1 2 3 4]]
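
Two properties of `.T` are worth a short sketch: it swaps the shape, and it returns a view, so writing through the transpose changes the original array. The names `a` and `t` here are illustrative.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
t = a.T

print(a.shape, t.shape)          # (2, 3) (3, 2)

t[0, 1] = 100                    # t[0, 1] is the same element as a[1, 0]
print(a[1, 0])                   # 100
```

Use `a.T.copy()` if you need an independent transposed array.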


## Universal Functions
NumPy has a variety of functions, called universal functions (ufuncs), that can be applied to scalars as well as arrays, operating element-wise on the latter. Some examples are `sqrt`, `exp`, `log`, `log10`, `sin`, `cos`, and `arcsin`.

In [64]:
a = 20
b = np.random.rand(2, 2)
print(np.exp(a))
print(np.exp(b))

485165195.41
[[ 1.20093606  1.55455582]
 [ 2.45014857  1.87287753]]
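
Because the example above uses random input, its output is hard to check by eye. Here is a small sketch with fixed inputs (illustrative names) where the element-wise results are easy to verify.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])

roots = np.sqrt(x)                              # element-wise square root
logs = np.log10(np.array([1.0, 10.0, 100.0]))   # element-wise base-10 logarithm
round_trip = np.exp(np.log(x))                  # exp undoes log, up to rounding

print(roots)                         # [1. 2. 3.]
print(np.allclose(logs, [0, 1, 2]))  # True
print(np.allclose(round_trip, x))    # True
```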


## meshgrid

One of the most useful functions is `meshgrid`. A common use is visualizing the decision boundaries of a classifier. You train your classifier, create a meshgrid covering every pixel (grid point) of the plot, and then classify each point. Colouring each point according to its predicted class makes the boundaries clearly visible.

Using meshgrid requires three steps:
1. Create xs (1D array)
2. Create ys (1D array)
3. Create the meshgrid (two 2D arrays, `xx` and `yy`), which together give the coordinates of every pixel in the graph.

In [71]:
xs = np.linspace(1, 10, 100)
ys = np.linspace(1, 10, 100)
xx, yy = np.meshgrid(xs, ys)
# xx holds the x coordinate of every grid point, yy the y coordinate; plot with both
xx

array([[  1.        ,   1.09090909,   1.18181818, ...,   9.81818182,
          9.90909091,  10.        ],
       [  1.        ,   1.09090909,   1.18181818, ...,   9.81818182,
          9.90909091,  10.        ],
       [  1.        ,   1.09090909,   1.18181818, ...,   9.81818182,
          9.90909091,  10.        ],
       ..., 
       [  1.        ,   1.09090909,   1.18181818, ...,   9.81818182,
          9.90909091,  10.        ],
       [  1.        ,   1.09090909,   1.18181818, ...,   9.81818182,
          9.90909091,  10.        ],
       [  1.        ,   1.09090909,   1.18181818, ...,   9.81818182,
          9.90909091,  10.        ]])
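
The decision-boundary workflow described above can be sketched as follows. This is only an illustration: `toy_classifier` is a hypothetical stand-in for a trained model (it labels a point 1 when `x + y > 1`), and the grid is kept tiny so the shapes are easy to follow.

```python
import numpy as np

xs = np.linspace(0, 1, 5)
ys = np.linspace(0, 1, 4)
xx, yy = np.meshgrid(xs, ys)   # both have shape (len(ys), len(xs))

def toy_classifier(points):
    """Hypothetical stand-in for a trained model: class 1 if x + y > 1."""
    return (points[:, 0] + points[:, 1] > 1).astype(int)

# One (x, y) row per grid point, in the same order as xx.ravel()
grid_points = np.column_stack([xx.ravel(), yy.ravel()])

# Classify every grid point, then fold the labels back into the grid shape
labels = toy_classifier(grid_points).reshape(xx.shape)

print(xx.shape, labels.shape)  # (4, 5) (4, 5)
```

With a plotting library, `xx`, `yy`, and `labels` would then be the inputs to a filled contour or colour-mesh plot.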

## where
If you have three arrays `x`, `y`, and `condition`, then `np.where(condition, x, y)` is an element-wise replacement for:
```
if condition: 
    use x
else:
    use y
```

In [72]:
a = [0, -1, 2, 3, -4, -5]
b = [9, 3, 4, 11, 2, 3]
c = [True, False, True, True, False, True]
np.where(c, a, b)

array([ 0,  3,  2,  3,  2, -5])
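
In practice the condition is often computed from the data itself, and either branch can be a scalar. The sketch below (illustrative names) replaces negative entries with 0 and compares the result against the equivalent Python loop.

```python
import numpy as np

a = np.array([0, -1, 2, 3, -4, -5])

# Where a is negative use 0, otherwise keep the value from a
non_negative = np.where(a < 0, 0, a)

# The same logic written as an explicit per-element loop
loop_version = np.array([0 if x < 0 else x for x in a])

print(non_negative.tolist())                       # [0, 0, 2, 3, 0, 0]
print(np.array_equal(non_negative, loop_version))  # True
```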

## mean, sum, std
NumPy provides a variety of functions for statistics, such as `mean`, `sum`, and `std`. By default they reduce over the whole array. You can also pass `axis` to reduce along a single axis.

In [78]:
a = np.random.rand(3, 3)
print(a)
print(np.mean(a)) # both are fine
print(a.mean())

print(np.std(a))

[[ 0.47420622  0.43247366  0.93400638]
 [ 0.75673826  0.02606293  0.96672388]
 [ 0.00133784  0.89357209  0.08779815]]
0.508102157536
0.508102157536
0.376361414067
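
The cell above reduces over the whole array. As a sketch of the `axis` argument, the example below uses a small fixed array (illustrative names) so the results can be checked by hand: `axis=0` collapses the rows and gives one value per column, while `axis=1` collapses the columns and gives one value per row.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

total = a.sum()               # over the whole array
col_means = a.mean(axis=0)    # one mean per column
row_sums = a.sum(axis=1)      # one sum per row
row_std = a.std(axis=1)       # one standard deviation per row

print(total)               # 21.0
print(col_means.tolist())  # [2.5, 3.5, 4.5]
print(row_sums.tolist())   # [6.0, 15.0]
print(row_std.shape)       # (2,)
```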
