![rmotr](https://user-images.githubusercontent.com/7065401/52071918-bda15380-2562-11e9-828c-7f95297e4a82.png)
<hr style="margin-bottom: 40px;">

<img src="https://user-images.githubusercontent.com/7065401/39118381-910eb0c2-46e9-11e8-81f1-a5b897401c23.jpeg"
    style="width:300px; float: right; margin: 0 40px 40px 40px;"></img>

# NumPy: Numeric computing library

NumPy (Numerical Python) is one of the core packages for numerical computing in Python. Pandas, Matplotlib, Statsmodels and many other scientific libraries rely on NumPy.

NumPy's major contributions are:

* Efficient numeric computation with C primitives
* Efficient collections with vectorized operations
* An integrated and natural Linear Algebra API
* A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.

Let's expand on efficiency. In Python, **everything is an object**, which means that even simple ints are objects, with all the machinery required to make objects work. We call them "Boxed Ints". In contrast, NumPy uses primitive numeric types (floats, ints), which makes storage and computation efficient.

<img src="https://docs.google.com/drawings/d/e/2PACX-1vTkDtKYMUVdpfVb3TTpr_8rrVtpal2dOknUUEOu85wJ1RitzHHf5nsJqz1O0SnTt8BwgJjxXMYXyIqs/pub?w=726&h=396" />


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Hands on!

In [1]:
import sys
import numpy as np

## Basic Numpy Arrays

In [3]:
np.array([1, 2, 3, 4])

array([1, 2, 3, 4])

In [4]:
a = np.array([1, 2, 3, 4])

In [5]:
b = np.array([0, .5, 1, 1.5, 2])

In [6]:
a[0], a[1]

(1, 2)

In [7]:
a[0:]

array([1, 2, 3, 4])

In [8]:
a[1:3]

array([2, 3])

The example below went against the logic I was used to. A single index such as `a[-1]` is INCLUSIVE: it returns the last element itself.
In a slice, the stop index is always EXCLUSIVE, whether it is positive or negative. So `a[1:-1]` starts at index 1 and returns everything up to, but not including, the last element.

In [None]:
a[1:-1]

array([2, 3])

In [9]:
a[-1]

4
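
A quick way to check this rule, as a minimal sketch: a negative index counts from the end, so `a[1:-1]` should match `a[1:len(a) - 1]`. The variable names (`last`, `middle`, `same_middle`, `tail`) are only for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# A single index returns exactly that element; -1 is the last one.
last = a[-1]

# In a slice the stop index is exclusive; -1 means "stop before the last element".
middle = a[1:-1]
same_middle = a[1:len(a) - 1]

# Leaving the stop index out runs to the end, so the last element is included.
tail = a[1:]

print(last, middle, same_middle, tail)
```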

In [None]:
a[::2]

array([1, 3])

In [None]:
b

array([0. , 0.5, 1. , 1.5, 2. ])

In [None]:
b[0], b[2], b[-1]

(0.0, 1.0, 2.0)

In [None]:
b[[0, 2, -1]]

array([0., 1., 2.])

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Array Types

In [None]:
a.dtype

dtype('int64')

In [11]:
a

array([1, 2, 3, 4])

In [None]:
b

array([0. , 0.5, 1. , 1.5, 2. ])

In [None]:
b.dtype

dtype('float64')

In [None]:
np.array([1, 2, 3, 4], dtype=float)

array([1., 2., 3., 4.])

You can choose a smaller type to save memory. In the example below, each element is stored as int8 (8 bits) instead of the default integer type (int64, 64 bits, as shown by `a.dtype` above).

In [12]:
np.array([1, 2, 3, 4], dtype=np.int8)

array([1, 2, 3, 4], dtype=int8)
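
To see the saving, here is a small sketch that compares the bytes used by the same values stored as int64 and as int8. The names `big`, `small` and `limits` are made up for this example. The trade-off is range: int8 only covers -128 to 127.

```python
import numpy as np

values = [1, 2, 3, 4]

big = np.array(values, dtype=np.int64)    # 8 bytes per element
small = np.array(values, dtype=np.int8)   # 1 byte per element

# itemsize = bytes per element, nbytes = bytes for the whole array
print(big.itemsize, big.nbytes)
print(small.itemsize, small.nbytes)

# The price of the smaller type is a smaller range of valid values.
limits = np.iinfo(np.int8)
print(limits.min, limits.max)
```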

In [13]:
c = np.array(['a', 'b', 'c'])

In [14]:
c.dtype

dtype('<U1')

In [15]:
d = np.array([{'a': 1}, sys])

In [16]:
d.dtype

dtype('O')

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Dimensions and shapes

In [17]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

2 rows, and 3 columns


In [18]:
A.shape

(2, 3)

In [19]:
A.ndim

2

In [None]:
A.size

6

In [20]:
B = np.array([
    [
        [12, 11, 10],
        [9, 8, 7],
    ],
    [
        [6, 5, 4],
        [3, 2, 1]
    ]
])

In [21]:
B

array([[[12, 11, 10],
        [ 9,  8,  7]],

       [[ 6,  5,  4],
        [ 3,  2,  1]]])

In [22]:
B.shape

(2, 2, 3)

In [23]:
B.ndim

3

In [24]:
B.size

12

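If the shape isn't consistent, older NumPy versions would just fall back to regular Python objects. NumPy 1.24 and later raise a `ValueError` instead, unless you pass `dtype=object` explicitly: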

In [25]:
C = np.array([
    [
        [12, 11, 10],
        [9, 8, 7],
    ],
    [
        [6, 5, 4]
    ]
])

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

In [26]:
C.dtype

NameError: name 'C' is not defined

In [27]:
C.shape

NameError: name 'C' is not defined

In [None]:
C.size

2

In [None]:
type(C[0])
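
For comparison, a minimal sketch of the same ragged input with `dtype=object` passed explicitly, which asks NumPy to keep the inner lists as plain Python objects. The name `ragged` is only for illustration.

```python
import numpy as np

ragged = [
    [[12, 11, 10], [9, 8, 7]],
    [[6, 5, 4]],
]

# Without dtype=object, recent NumPy versions raise ValueError for this input.
C = np.array(ragged, dtype=object)

print(C.dtype)     # object
print(C.shape)     # (2,)
print(C.size)      # 2
print(type(C[0]))  # <class 'list'>
```

The result is a 1-dimensional array holding two Python lists, so none of the fast numeric machinery applies to it.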

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Indexing and Slicing of Matrices

In [28]:
# Square matrix
A = np.array([
#.   0. 1. 2
    [1, 2, 3], # 0
    [4, 5, 6], # 1
    [7, 8, 9]  # 2
])

In [29]:
A[1]

array([4, 5, 6])

In [30]:
A[1][0]

4

There is a better way to access elements: pass one index per dimension, separated by commas (below, d = dimension).

In [None]:

# A[d1, d2, d3, d4]

The example below selects, from array A, the first element of the 2nd row.

In [31]:
A[1, 0]

4
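
A small sketch of why the comma form is preferable. Both forms return the same single element, but only the comma form can select a column directly. The variable names are assumptions for this example.

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

chained = A[1][0]   # first builds the row A[1], then indexes into it
direct = A[1, 0]    # a single indexing operation: row 1, column 0

# Chained indexing cannot pick a column: A[:] is the whole array,
# so A[:][0] is just the first row again.
first_column = A[:, 0]
not_a_column = A[:][0]

print(chained, direct)
print(first_column, not_a_column)
```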

This example selects the 1st and 2nd rows of array A. Remember that the stop index 2 is EXCLUSIVE, so the 3rd row is not included.

In [33]:
A[0:2]

array([[1, 2, 3],
       [4, 5, 6]])

This example selects ALL rows (the `:` before the comma). The part after the comma then selects the elements of each row up to, but EXCLUDING, the 3rd one.

In [34]:
A[:, :2]

array([[1, 2],
       [4, 5],
       [7, 8]])

In this example, the first 2 rows are selected, and within those rows the first 2 elements.

In [35]:
A[:2, :2]

array([[1, 2],
       [4, 5]])

In this example, the first 2 rows are selected, and within those rows every element from the 3rd one through the end of the row.

In [None]:
A[:2, 2:]

array([[3],
       [6]])

In [36]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

The code below replaces the second row of the array with this new row of values.

In [37]:
A[1] = np.array([10, 10, 10])

In [39]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [ 7,  8,  9]])

The example below relies on broadcasting: a single value is assigned to a row, and NumPy applies that value to every element of the row.


In [40]:
A[2] = 99

In [41]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [99, 99, 99]])
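
As a rough sketch of what the scalar assignment does, the row written with `A[2] = 99` should equal an explicitly built row of 99s, and the same shortcut works for a column. `row_matches` is a name made up for this example.

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

A[2] = 99   # the scalar is broadcast across the whole row

# Same result as assigning an explicit row of three 99s.
row_matches = np.array_equal(A[2], np.full(3, 99))

A[:, 0] = 0   # broadcasting also works down a column

print(row_matches)
print(A)
```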

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Summary statistics

In [42]:
a = np.array([1, 2, 3, 4])

In [43]:
a.sum()

10

In [44]:
a.mean()

2.5

In [45]:
a.std()

1.118033988749895

In [46]:
a.var()

1.25

In [47]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [None]:
A.sum()

45

In [None]:
A.mean()

5.0

In [None]:
A.std()

2.581988897471611

`axis=0` runs down the ROWS, so `A.sum(axis=0)` adds the rows together and returns one total per COLUMN.

In [None]:
A.sum(axis=0)

array([12, 15, 18])

`axis=1` runs across the COLUMNS, so `A.sum(axis=1)` returns one total per ROW. Arrays with more dimensions have more axes.



In [48]:
A.sum(axis=1)

array([ 6, 15, 24])

In [None]:
A.mean(axis=0)

array([4., 5., 6.])

In [None]:
A.mean(axis=1)

array([2., 5., 8.])

In [None]:
A.std(axis=0)

array([2.44948974, 2.44948974, 2.44948974])

In [None]:
A.std(axis=1)

array([0.81649658, 0.81649658, 0.81649658])
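
The axis argument is easier to see on a non-square array, where the result shapes differ. A minimal sketch (the array `M` and the result names are made up for this example): the axis you pass is the one that gets collapsed.

```python
import numpy as np

# shape (2, 3): 2 rows, 3 columns
M = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

col_sums = M.sum(axis=0)   # collapses the 2 rows -> one value per column
row_sums = M.sum(axis=1)   # collapses the 3 columns -> one value per row

print(col_sums, col_sums.shape)
print(row_sums, row_sums.shape)
```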

And [many more](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.ndarray.html#array-methods)...

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Broadcasting and Vectorized operations

In [49]:
a = np.arange(4)

In [50]:
a

array([0, 1, 2, 3])

Vectorization means the operation is applied to every element at once. Below, 10 is added to each element of the array, and the result is a new array.

In [51]:
a + 10

array([10, 11, 12, 13])

In [52]:
a * 10

array([ 0, 10, 20, 30])

In [53]:
a

array([0, 1, 2, 3])

The example below is BROADCASTING again, this time with the in-place operator `+=`: 100 is added to each element and the result is written back into `a`. Arrays are MUTABLE, so `a` itself has been modified.

In [55]:
a += 100

In [56]:
a

array([100, 101, 102, 103])
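
A short sketch of the difference between `a + 10`, which builds a new array, and `a += 100`, which changes the existing one. The second name `alias` is introduced here only to show that the same object is modified.

```python
import numpy as np

a = np.arange(4)
alias = a      # a second name for the same array object

b = a + 10     # builds a new array; a is untouched
a += 100       # modifies the existing array in place

print(b)
print(a)
print(alias)   # the alias sees the change, because it is the same array
```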

In [58]:
l = [0, 1, 2, 3]

Below is a list comprehension, the plain-Python way to apply an operation to each element of the list `l`.

In [None]:
[i * 10 for i in l]

[0, 10, 20, 30]
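
Side by side, as a minimal sketch: the list comprehension and the vectorized array expression give the same numbers, but `*` applied directly to a plain list does something different. The result names are assumptions for this example.

```python
import numpy as np

l = [0, 1, 2, 3]
a = np.array(l)

from_list = [i * 10 for i in l]   # explicit Python loop, one element at a time
from_array = a * 10               # one vectorized operation over the whole array

# Caution: `*` on a plain list repeats the list instead of multiplying elements.
repeated = l * 2

print(from_list)
print(from_array)
print(repeated)
```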

In [59]:
a = np.arange(4)

In [60]:
a

array([0, 1, 2, 3])

In [61]:
b = np.array([10, 10, 10, 10])

In [62]:
b

array([10, 10, 10, 10])

In [63]:
a + b

array([10, 11, 12, 13])

In [64]:
a * b

array([ 0, 10, 20, 30])

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Boolean arrays
_(Also called masks)_

In [65]:
a = np.arange(4)

In [66]:
a

array([0, 1, 2, 3])

In [67]:
a[0], a[-1]

(0, 3)

In [69]:
a[[0, -1]]

array([0, 3])

In the example below, boolean values select the elements: positions marked True are kept and positions marked False are dropped.


In [70]:
a[[True, False, False, True]]

array([0, 3])

In [71]:
a

array([0, 1, 2, 3])

Comparison operators are vectorized too: they are applied to each element and return an array of True/False values.

In [72]:
a >= 2

array([False, False,  True,  True])

In [None]:
a[a >= 2]

array([2, 3])

In [None]:
a.mean()

1.5

In [None]:
a[a > a.mean()]

array([2, 3])

In [None]:
a[~(a > a.mean())]

array([0, 1])

In [None]:
a[(a == 0) | (a == 1)]

array([0, 1])

In [None]:
a[(a <= 2) & (a % 2 == 0)]

array([0, 2])
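
The pieces above can be put together in a small sketch: store the mask in a variable, use it to select, and count the matches. The names `mask`, `selected`, `count` and `evens_up_to_2` are made up for this example.

```python
import numpy as np

a = np.arange(4)

mask = a >= 2        # boolean array, one True/False per element
selected = a[mask]   # keeps only the elements where mask is True
count = mask.sum()   # True counts as 1, so this counts the matches

# Combine conditions with & (and), | (or), ~ (not), each wrapped in
# parentheses; the keywords `and` / `or` do not work element-wise.
evens_up_to_2 = a[(a <= 2) & (a % 2 == 0)]

print(mask, selected, count, evens_up_to_2)
```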

In [None]:
A = np.random.randint(100, size=(3, 3))

In [None]:
A

array([[71,  6, 42],
       [40, 94, 24],
       [ 2, 85, 36]])

In [None]:
A[np.array([
    [True, False, True],
    [False, True, False],
    [True, False, True]
])]

array([71, 42, 94,  2, 36])

In [None]:
A > 30

array([[ True, False,  True],
       [ True,  True, False],
       [False,  True,  True]])

In [None]:
A[A > 30]

array([71, 42, 40, 94, 85, 36])

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Linear Algebra

In [None]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [None]:
B = np.array([
    [6, 5],
    [4, 3],
    [2, 1]
])

In [None]:
A.dot(B)

array([[20, 14],
       [56, 41],
       [92, 68]])

In [None]:
A @ B

array([[20, 14],
       [56, 41],
       [92, 68]])

In [None]:
B.T

array([[6, 4, 2],
       [5, 3, 1]])

In [None]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [None]:
B.T @ A

array([[36, 48, 60],
       [24, 33, 42]])

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Size of objects in Memory

### Int, floats

In [None]:
# An integer in Python is > 24bytes
sys.getsizeof(1)

28

In [None]:
# Longs are even larger
sys.getsizeof(10**100)

72

In [None]:
# Numpy size is much smaller
np.dtype(int).itemsize

8

In [None]:
# Numpy size is much smaller
np.dtype(np.int8).itemsize

1

In [None]:
np.dtype(float).itemsize

8

### Lists are even larger

In [None]:
# A one-element list
sys.getsizeof([1])

In [None]:
# An array of one element in numpy
np.array([1]).nbytes

### And performance is also important

In [None]:
l = list(range(100000))

In [None]:
a = np.arange(100000)

In [None]:
%time np.sum(a ** 2)

CPU times: user 1.06 ms, sys: 279 µs, total: 1.34 ms
Wall time: 701 µs


333328333350000

In [None]:
%time sum([x ** 2 for x in l])

CPU times: user 36.1 ms, sys: 0 ns, total: 36.1 ms
Wall time: 35.5 ms


333328333350000
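
`%time` only works inside IPython/Jupyter. As a rough sketch, the same comparison can be run in a plain script with the standard-library `timeit` module. The `number=10` repeat count is an arbitrary choice, and the timings will vary by machine, so none are shown here.

```python
import timeit

import numpy as np

l = list(range(100000))
# Explicit 64-bit integers so the sum of squares cannot overflow.
a = np.arange(100000, dtype=np.int64)

numpy_result = int(np.sum(a ** 2))
python_result = sum([x ** 2 for x in l])

# Run each version 10 times and keep the total elapsed time in seconds.
numpy_time = timeit.timeit(lambda: np.sum(a ** 2), number=10)
python_time = timeit.timeit(lambda: sum([x ** 2 for x in l]), number=10)

print(numpy_result == python_result)
print(f"NumPy: {numpy_time:.4f}s, pure Python: {python_time:.4f}s")
```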

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Useful Numpy functions

### `random`

In [None]:
np.random.random(size=2)

In [None]:
np.random.normal(size=2)

In [None]:
np.random.rand(2, 4)

---
### `arange`

In [None]:
np.arange(10)

In [None]:
np.arange(5, 10)

In [None]:
np.arange(0, 1, .1)

---
### `reshape`

In [None]:
np.arange(10).reshape(2, 5)

In [None]:
np.arange(10).reshape(5, 2)

---
### `linspace`

In [None]:
np.linspace(0, 1, 5)

In [None]:
np.linspace(0, 1, 20)

In [None]:
np.linspace(0, 1, 20, endpoint=False)
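
The fourth argument of `np.linspace` is `endpoint`. A minimal sketch with 5 points instead of 20, so the effect is easy to read: with `endpoint=False` the stop value is left out and the spacing changes accordingly. The result names are only for illustration.

```python
import numpy as np

with_end = np.linspace(0, 1, 5)                     # 0.0 and 1.0 both included
without_end = np.linspace(0, 1, 5, endpoint=False)  # 1.0 excluded

print(with_end)      # spacing of 0.25
print(without_end)   # spacing of 0.2
```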

---
### `zeros`, `ones`, `empty`

In [None]:
np.zeros(5)

In [None]:
np.zeros((3, 3))

In [None]:
np.zeros((3, 3), dtype=int)

In [None]:
np.ones(5)

In [None]:
np.ones((3, 3))

In [None]:
np.empty(5)

In [None]:
np.empty((2, 2))

---
### `identity` and `eye`

In [None]:
np.identity(3)

In [None]:
np.eye(3, 3)

In [None]:
np.eye(8, 4)

In [None]:
np.eye(8, 4, k=1)

In [None]:
np.eye(8, 4, k=-3)

In [None]:
"Hello World"[6]

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)