![rmotr](https://user-images.githubusercontent.com/7065401/52071918-bda15380-2562-11e9-828c-7f95297e4a82.png)
<hr style="margin-bottom: 40px;">

<img src="https://user-images.githubusercontent.com/7065401/39118381-910eb0c2-46e9-11e8-81f1-a5b897401c23.jpeg"
    style="width:300px; float: right; margin: 0 40px 40px 40px;"></img>

# NumPy: Numeric computing library

NumPy (Numerical Python) is one of the core packages for numerical computing in Python. Pandas, Matplotlib, Statsmodels and many other scientific libraries rely on NumPy.

NumPy's major contributions are:

* Efficient numeric computation with C primitives
* Efficient collections with vectorized operations
* An integrated and natural Linear Algebra API
* A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.

Let's expand on efficiency. In Python, **everything is an object**, which means that even simple ints are objects, with all the machinery required to make objects work. We call them "Boxed Ints". In contrast, NumPy uses primitive numeric types (floats, ints), which makes storage and computation efficient.

<div style="background-color: white; padding: 10px; display: inline-block;">
    <img src="https://docs.google.com/drawings/d/e/2PACX-1vTkDtKYMUVdpfVb3TTpr_8rrVtpal2dOknUUEOu85wJ1RitzHHf5nsJqz1O0SnTt8BwgJjxXMYXyIqs/pub?w=726&h=396" />
</div>
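As a minimal sketch of that difference (the variable names are illustrative and not part of the original notebook), a list holds references to separate Python objects, while an array stores fixed-width values in a single buffer:

```python
import numpy as np

boxed = [1, 2, 3]                      # a list of references to Python int objects
arr = np.array(boxed, dtype=np.int64)  # one buffer of raw 64-bit integers

print(type(boxed[0]))   # each list element is a full Python object
print(arr.dtype)        # one primitive type shared by every element
print(arr.itemsize)     # bytes used per element
print(arr.nbytes == arr.size * arr.itemsize)  # the buffer is just size * itemsize
```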

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Hands on! 

In [2]:
import numpy as np

## Basic NumPy Arrays

In [3]:
lst = [11, 12, 13]
print(type(lst))

arr = np.array(lst)
print(type(arr))

<class 'list'>
<class 'numpy.ndarray'>


In [4]:
a = np.array([1, 2, 3, 4])
a

array([1, 2, 3, 4])

In [5]:
b = np.array([0, 0.5, 1, 1.5, 2])
b

array([0. , 0.5, 1. , 1.5, 2. ])

In [6]:
a[0], a[1], a[2], a[1], b[1], b[0], b[2]

(1, 2, 3, 2, 0.5, 0.0, 1.0)

In [7]:
a = np.array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

print(a)
print(a[-1:3])
print(a[-9:5])
print(a[1:-1])
print(a[::2])
print(a[::3])
print(a[::-2])

[11 12 13 14 15 16 17 18 19 20]
[]
[12 13 14 15]
[12 13 14 15 16 17 18 19]
[11 13 15 17 19]
[11 14 17 20]
[20 18 16 14 12]
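The slices with negative bounds can be puzzling at first. The sketch below, which rebuilds the same ten values with `np.arange`, shows that a negative bound simply counts from the end, and that a NumPy slice is a view onto the original data rather than a copy:

```python
import numpy as np

a = np.arange(11, 21)  # same values as above: 11 .. 20

# A negative bound counts from the end, so -9 is index 1 in a 10-element array.
same = np.array_equal(a[-9:5], a[1:5])

# Unlike list slices, array slices are views: writing through one changes the original.
view = a[1:4]
view[0] = 0
print(same, a[1])
```

Call `.copy()` on a slice when an independent array is needed.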


![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Array Data Types

In [8]:
a = np.array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
a

array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

In [9]:
a.dtype

dtype('int64')

In [10]:
b = np.array([0, 0.5, 1, 1.5, 2])
b

array([0. , 0.5, 1. , 1.5, 2. ])

In [11]:
b.dtype

dtype('float64')

In [12]:
print(np.array([11, 12, 13, 14]))  # integers by default
print(np.array([11, 12, 13, 14], dtype=int))
print(np.array([11, 12, 13, 14], dtype=float))

[11 12 13 14]
[11 12 13 14]
[11. 12. 13. 14.]


In [13]:
c = np.array(["a", "b", "c"])
print(c.dtype)  # Unicode string

<U1


In [14]:
d = np.array(["a", "b", "c", 11, 12, 13])
print(d.dtype)  # Unicode string: the integers are converted to strings
d

<U21


array(['a', 'b', 'c', '11', '12', '13'], dtype='<U21')

In [15]:
e = np.array([True, False, True])
print(e.dtype)
e

bool


array([ True, False,  True])

To learn more about data types, see the [NumPy dtype documentation](https://numpy.org/devdocs/reference/arrays.dtypes.html).
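Because an array has a single dtype, mixed inputs are promoted to a common type, and `astype` converts between types explicitly. A short sketch with made-up values:

```python
import numpy as np

# Mixing ints and floats promotes everything to float64.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)

# astype returns a converted copy; the original array is left unchanged.
as_int = mixed.astype(np.int64)   # truncates toward zero: 3.5 -> 3
print(as_int)

# Numeric strings can be parsed the same way.
parsed = np.array(["11", "12", "13"]).astype(np.int64)
print(parsed.sum())
```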

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Array Dimensions, shapes, size etc

In [16]:
A = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
A

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [17]:
print("num of rows & cols:", A.shape)
print("dimension:", A.ndim)
print("num of elems:", A.size)

num of rows & cols: (3, 4)
dimension: 2
num of elems: 12


In [18]:
B = np.array(
    [
        [
            [12, 11, 10],
            [9, 8, 7],
        ],
        [[6, 5, 4], [3, 2, 1]],
    ]
)

B

array([[[12, 11, 10],
        [ 9,  8,  7]],

       [[ 6,  5,  4],
        [ 3,  2,  1]]])

In [19]:
print(B.dtype)
print(B.shape)
print(B.ndim)
print(B.size)

int64
(2, 2, 3)
3
12
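The three attributes are related. The sketch below builds an array with the same shape as `B` (the values differ, and `reshape` is covered later in this notebook):

```python
import numpy as np

B = np.arange(12).reshape(2, 2, 3)  # same shape as B above, different values

print(B.ndim == len(B.shape))       # ndim is the number of axes
print(B.size == np.prod(B.shape))   # size is the product of the axis lengths
print(len(B) == B.shape[0])         # len() only reports the first axis
```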


![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Indexing and Slicing of Matrices

### row indexing

In [20]:
# 3x4 matrix
A = np.array(
    [
        # 0.  1.  2.  3
        [10, 11, 12, 13],  # 0
        [14, 15, 16, 17],  # 1
        [18, 19, 20, 21],  # 2
    ]
)

In [21]:
A[2]  # third row, NOT third element

array([18, 19, 20, 21])

In [22]:
print(A[0:-1])
print("")

print(A[1:])
print("")

print(A[:2])
print("")

print(A[1:-1])

[[10 11 12 13]
 [14 15 16 17]]

[[14 15 16 17]
 [18 19 20 21]]

[[10 11 12 13]
 [14 15 16 17]]

[[14 15 16 17]]


### row & col indexing using comma

In [23]:
A[:, :2]

array([[10, 11],
       [14, 15],
       [18, 19]])

In [24]:
A[:2, :2]

array([[10, 11],
       [14, 15]])

In [25]:
A[:2, 2:]

array([[12, 13],
       [16, 17]])

### chained indexing vs comma indexing

In [26]:
print(A[1][0])  # chained indexing
print(A[1, 0])  # comma indexing

# same result here; the difference appears in a later cell

14
14


In [27]:
print(A[1][:])  # chained indexing
print(A[1, :])  # comma indexing

# same result here; the difference appears in the next cell

[14 15 16 17]
[14 15 16 17]


In [28]:
print(A[:][2])  # chained indexing: A[:] is all rows, so this is row 2
print(A[:, 2])  # comma indexing: column 2 of every row

# see the difference?

[18 19 20 21]
[12 16 20]
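The reason for the difference is that chained indexing applies one index at a time, while comma indexing addresses both axes in a single step. A small sketch that rebuilds the same matrix with `np.arange`:

```python
import numpy as np

A = np.arange(10, 22).reshape(3, 4)  # the same 3x4 matrix as above

# A[:] selects every row, so A[:][2] is just A[2]: the third row.
print(np.array_equal(A[:][2], A[2]))

# A[:, 2] indexes both axes at once: every row, third column.
print(A[:, 2])

# Chained indexing can still reach a column, but only by indexing each row in turn.
print(np.array([row[2] for row in A]))
```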


### introducing broadcasting

In [29]:
A

array([[10, 11, 12, 13],
       [14, 15, 16, 17],
       [18, 19, 20, 21]])

In [30]:
A[1] = np.array([10, 10, 10, 10])
A
# this is not broadcasting: we assign an array of the same size as the row

array([[10, 11, 12, 13],
       [10, 10, 10, 10],
       [18, 19, 20, 21]])

In [31]:
A[2] = 99
A
# this IS broadcasting: a single integer is propagated to the entire row

array([[10, 11, 12, 13],
       [10, 10, 10, 10],
       [99, 99, 99, 99]])

In [32]:
# plain Python lists do not broadcast: the whole row is replaced by the integer
lst = [[11, 12, 13], [10, 10, 10], [17, 18, 19]]

lst[2] = 99

lst

[[11, 12, 13], [10, 10, 10], 99]
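Broadcasting is not limited to scalars. As a sketch with made-up arrays, a row of shape `(4,)` or a column of shape `(3, 1)` is stretched to match a `(3, 4)` matrix:

```python
import numpy as np

A = np.zeros((3, 4), dtype=int)

row = np.array([1, 2, 3, 4])         # shape (4,)
col = np.array([[10], [20], [30]])   # shape (3, 1)

print(A + row)             # row is repeated down the 3 rows
print(A + col)             # col is repeated across the 4 columns
print((row + col).shape)   # (4,) and (3, 1) broadcast to (3, 4)
```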

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Summary statistics

### 1D array stat

In [33]:
a = np.array([31, 32, 33, 34])
a

array([31, 32, 33, 34])

In [34]:
a.sum()

130

In [35]:
a.mean()

32.5

In [36]:
a.var()

1.25

In [37]:
np.sqrt(a.var())

1.118033988749895

In [38]:
a.std()

1.118033988749895

### 2D array stat (introducing axis)

In [39]:
A = np.array(
    [
        [31, 32],
        [33, 34],
    ]
)

In [40]:
A.sum()

130

In [41]:
A.sum(axis=0)  # collapse the rows: one sum per column (vertical)

array([64, 66])

In [42]:
A.sum(axis=1)  # collapse the columns: one sum per row (horizontal)

array([63, 67])

In [43]:
A = np.array([[11, 12, 13], [14, 15, 16], [17, 18, 19]])

In [44]:
A.sum(), A.mean(), A.var(), A.std()

(135, 15.0, 6.666666666666667, 2.581988897471611)

In [45]:
A.sum(axis=0), A.sum(axis=1)

(array([42, 45, 48]), array([36, 45, 54]))

In [46]:
A.mean(axis=0), A.mean(axis=1)

(array([14., 15., 16.]), array([12., 15., 18.]))
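One way to remember `axis` is that it names the axis that disappears from the result. A sketch with a non-square matrix (the name `M` is illustrative), where the two directions cannot be confused:

```python
import numpy as np

M = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns

col_sums = M.sum(axis=0)   # axis 0 (length 3) is collapsed -> shape (4,)
row_sums = M.sum(axis=1)   # axis 1 (length 4) is collapsed -> shape (3,)
print(col_sums.shape, row_sums.shape)

# keepdims=True keeps the collapsed axis with length 1, which is handy for broadcasting.
centered = M - M.mean(axis=0, keepdims=True)
print(centered.mean(axis=0))   # every column now averages to zero
```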

And [many more](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.ndarray.html#array-methods)...

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)



## Arange & Linspace

### Arange(lib) vs Range(regular python)

In [47]:
print(list(range(10)))
print(list(range(11, 20)))
print(list(range(11, 20, 3)))
print(list(range(20, 11, -3)))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[11, 12, 13, 14, 15, 16, 17, 18, 19]
[11, 14, 17]
[20, 17, 14]


In [48]:
print(np.arange(10))
print(np.arange(11, 20))
print(np.arange(11, 20, 3))

[0 1 2 3 4 5 6 7 8 9]
[11 12 13 14 15 16 17 18 19]
[11 14 17]


In [49]:
print(np.arange(11, 20, 1.25))  # float steps work here; range() accepts only integers

[11.   12.25 13.5  14.75 16.   17.25 18.5  19.75]


### linspace

In [50]:
np.arange(0, 1, 0.25)

array([0.  , 0.25, 0.5 , 0.75])

In [51]:
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [52]:
np.linspace(0, 1, 20)

array([0.        , 0.05263158, 0.10526316, 0.15789474, 0.21052632,
       0.26315789, 0.31578947, 0.36842105, 0.42105263, 0.47368421,
       0.52631579, 0.57894737, 0.63157895, 0.68421053, 0.73684211,
       0.78947368, 0.84210526, 0.89473684, 0.94736842, 1.        ])

In [53]:
np.linspace(0, 1, 20, endpoint=False)

array([0.  , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 ,
       0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
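To summarize the difference: `arange` is defined by its step, while `linspace` is defined by the number of points. A short sketch:

```python
import numpy as np

# retstep=True also returns the spacing that linspace computed.
points, step = np.linspace(0, 1, 5, retstep=True)
print(step)

# arange excludes the stop value, linspace includes it by default.
print(len(np.arange(0, 1, 0.25)), len(points))

# With endpoint=False the two agree.
print(np.allclose(np.arange(0, 1, 0.25), np.linspace(0, 1, 4, endpoint=False)))
```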

## Vectorized operations (w/ broadcasting review)

In [54]:
np.array([11, 12, 13, 14]) * 100

array([1100, 1200, 1300, 1400])

In [55]:
np.arange(11, 15) - 99

array([-88, -87, -86, -85])

In [56]:
lst = [11, 12, 13, 14] + 100
# TypeError: a list cannot be added to an integer in plain Python

TypeError: can only concatenate list (not "int") to list

In [None]:
lst = [11, 12, 13, 14]
# in plain Python you need a loop or comprehension, which is more verbose than NumPy
[i + 100 for i in lst]

[111, 112, 113, 114]

In [None]:
a = [100, 101, 102, 103]
b = [3, 4, 5, 6]

In [None]:
a + b

[100, 101, 102, 103, 3, 4, 5, 6]

In [None]:
np.array(a) + np.array(b)

array([103, 105, 107, 109])

In [None]:
# a * b  # TypeError: can't multiply sequence by non-int of type 'list'

In [None]:
np.array(a) * np.array(b)

array([300, 404, 510, 618])
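For comparison, here is a sketch of the plain-Python equivalent of the element-wise product, using the same two lists:

```python
import numpy as np

a = [100, 101, 102, 103]
b = [3, 4, 5, 6]

loop_result = [x * y for x, y in zip(a, b)]   # explicit Python loop
vector_result = np.array(a) * np.array(b)     # one element-wise operation

print(loop_result == vector_result.tolist())
```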

## Logical Indexing

In [None]:
a = np.array([33, 34, 35, 36])
a

array([33, 34, 35, 36])

In [None]:
print(a[1])  # single indexing
print(a[[2, 3, 3, 3, -1, 2, 3, -2, 0]])  # multi indexing
print(a[[2, 0, 3, -2, 0]])  # multi indexing
print(a[[True, True, False, True]])  # logical indexing

34
[35 36 36 36 36 35 36 35 33]
[35 33 36 35 33]
[33 34 36]


In [None]:
a >= 35

array([False, False,  True,  True])

In [None]:
a[a >= 35]

array([35, 36])

In [None]:
a[a % 2 == 0]

array([34, 36])

In [None]:
a[a > a.mean()]

array([35, 36])

In [None]:
a[~(a > a.mean())]

array([33, 34])

In [None]:
a[(a == 33) | (a == 36)]  # use | for element-wise OR; Python's "or" does not work on arrays

array([33, 36])

In [None]:
a[(a <= 33) & (a == 36)]  # use & for element-wise AND; Python's "and" does not work on arrays

array([], dtype=int64)

In [None]:
a[(a >= 34) & (a % 2 == 0)]  # use & for element-wise AND; Python's "and" does not work on arrays

array([34, 36])

In [None]:
A = np.random.randint(100, size=(4, 6))  # random integers from 0 to 99
A

array([[51, 27, 65, 59, 79, 96],
       [52, 24, 59, 92, 71, 57],
       [ 5, 85, 72, 92, 49, 74],
       [93, 87,  9, 10, 93, 28]])

In [None]:
A[
    np.array(
        [
            [True, False, True, False, True, False],
            [False, True, False, True, True, False],
            [True, False, True, True, True, False],
            [True, False, True, False, False, True],
        ]
    )
]
# boolean indexing on a 2D array returns a flat 1D array; the original shape is not kept

array([51, 65, 79, 24, 92, 71,  5, 72, 92, 49, 93,  9, 28])

In [None]:
mask = A > 30
mask

array([[ True, False,  True,  True,  True,  True],
       [ True, False,  True,  True,  True,  True],
       [False,  True,  True,  True,  True,  True],
       [ True,  True, False, False,  True, False]])

In [None]:
A[mask]

array([51, 65, 59, 79, 96, 52, 59, 92, 71, 57, 85, 72, 92, 49, 74, 93, 87,
       93])
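When the 2D shape matters, a mask can be used to replace values instead of extracting them. A sketch on a small made-up matrix, using `np.where` and masked assignment:

```python
import numpy as np

A = np.array([[51, 27, 65],
              [ 5, 85, 72]])   # a small made-up matrix
mask = A > 30

# np.where picks from A where the mask is True and 0 elsewhere, keeping the shape.
kept = np.where(mask, A, 0)
print(kept)

# Assigning through a mask modifies the selected elements in place.
B = A.copy()
B[~mask] = -1
print(B)
print(mask.sum())   # True counts as 1, so this is the number of matches
```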

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Linear Algebra

In [None]:
A = [[1, 2], [3, 4]]
B = [[10, 20], [30, 40]]
A = np.array(A)
B = np.array(B)

In [None]:
A + B

array([[11, 22],
       [33, 44]])

In [None]:
A * B

array([[ 10,  40],
       [ 90, 160]])

In [None]:
B * A

array([[ 10,  40],
       [ 90, 160]])

In [None]:
A @ B  # matrix product; in general different from the element-wise A * B

array([[ 70, 100],
       [150, 220]])

In [None]:
B @ A  # usually differs from A @ B; equal here only because B = 10 * A

array([[ 70, 100],
       [150, 220]])

In [None]:
C = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

D = np.array([[6, 5], [4, 3], [2, 1]])

In [None]:
C.dot(D)

array([[20, 14],
       [56, 41],
       [92, 68]])

In [None]:
C @ D  # same as C.dot(D)

array([[20, 14],
       [56, 41],
       [92, 68]])

### transpose

In [None]:
C.T

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

In [None]:
D.T

array([[6, 4, 2],
       [5, 3, 1]])

In [None]:
D @ C  # error: shapes (3, 2) and (3, 3) do not line up

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 3 is different from 2)

In [None]:
D.T @ C

array([[36, 48, 60],
       [24, 33, 42]])
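The error comes from the shape rule for matrix products: `(n, k) @ (k, m)` gives `(n, m)`, so the inner dimensions must match. A sketch with the same `C` and `D`:

```python
import numpy as np

C = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])   # shape (3, 3)
D = np.array([[6, 5], [4, 3], [2, 1]])            # shape (3, 2)

# (n, k) @ (k, m) -> (n, m): the inner dimensions must match.
print((C @ D).shape)      # (3, 3) @ (3, 2)
print((D.T @ C).shape)    # (2, 3) @ (3, 3)

# Transposing a product reverses the order of the factors.
print(np.array_equal((C @ D).T, D.T @ C.T))
```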

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Size of objects in Memory

In [None]:
import sys

### Int, floats

In [None]:
# remember that we imported the sys module
# a Python int takes more than 24 bytes
sys.getsizeof(1)

28

In [None]:
# Large integers take even more memory
sys.getsizeof(10**99)

68

In [None]:
# NumPy's default integer is much smaller
np.dtype(int).itemsize

8

In [None]:
# and an 8-bit integer needs a single byte
np.dtype(np.int8).itemsize

1

Lists are even larger:

In [None]:
# A one-element list and a four-element list
print(sys.getsizeof([1]))
print(sys.getsizeof([1, 2, 3, 4]))

64
88


In [None]:
# An array of one element in numpy
np.array([1]).nbytes

8
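Note that `sys.getsizeof` on a list counts only the list's own table of references, not the integers it points to. The sketch below gives a rough estimate of the full cost; the exact byte counts depend on the Python version and platform, so none are shown:

```python
import sys
import numpy as np

n = 1000
lst = list(range(n))
arr = np.arange(n, dtype=np.int64)

# getsizeof(lst) only counts the list's reference table, so add the int objects too.
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)
array_bytes = arr.nbytes   # data buffer only; the array object adds a small fixed overhead

print(list_bytes, array_bytes)
```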

### And performance is also important

In [None]:
l = list(range(1000000))
a = np.arange(1000000)

In [None]:
%time np.sum(a ** 2)

CPU times: user 1.83 ms, sys: 2.23 ms, total: 4.06 ms
Wall time: 2.78 ms


333332833333500000

In [None]:
%time sum([x ** 2 for x in l])

CPU times: user 305 ms, sys: 16.4 ms, total: 322 ms
Wall time: 321 ms


333332833333500000
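`%time` is an IPython magic and only works inside a notebook. As a sketch, the same comparison can be run in a plain script with the standard `timeit` module. The array size and repeat count here are arbitrary choices, and timings vary by machine:

```python
import timeit
import numpy as np

n = 100_000
l = list(range(n))
a = np.arange(n, dtype=np.int64)   # explicit 64-bit ints to avoid overflow on some platforms

numpy_result = int(np.sum(a ** 2))
python_result = sum([x ** 2 for x in l])
print(numpy_result == python_result)

# Timings depend on the machine, so no expected numbers are shown here.
numpy_time = timeit.timeit(lambda: np.sum(a ** 2), number=20)
python_time = timeit.timeit(lambda: sum([x ** 2 for x in l]), number=20)
print(f"numpy: {numpy_time:.4f}s  python: {python_time:.4f}s")
```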

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Useful NumPy functions

`random` 

In [None]:
ra = np.random.random(size=2)
ra

array([0.71478833, 0.86032725])

In [None]:
ra = np.random.normal(size=2)
ra

array([-0.44324428, -1.02727548])

In [None]:
ra = np.random.rand(2, 4)
ra

array([[0.9895843 , 0.89121794, 0.35941467, 0.20615986],
       [0.84238982, 0.16902483, 0.49874971, 0.25918184]])

In [None]:
# draw samples from the standard normal distribution
ra = np.random.randn(100, 100)
print(ra)
print("avg:", ra.mean().round(2), "  std:", ra.std().round(2))

[[ 0.48612894 -1.04400509 -1.40641645 ...  0.34520384 -0.18899413
   1.73516893]
 [-0.47094705 -0.08451515 -0.57197409 ...  0.76662248 -1.65146258
  -1.04310813]
 [ 0.86841155 -0.58263025  1.79034453 ...  0.34216261 -1.02197446
   0.89326353]
 ...
 [-1.0443312   0.24548079  0.95951075 ...  0.21298886 -1.29126276
  -1.76435457]
 [-0.87233576  1.12208252 -2.55040666 ... -0.51984534 -1.11200374
  -0.56672076]
 [ 1.0625598  -2.0873782   0.55717693 ...  1.0973261  -0.80111329
   1.14261325]]
avg: 0.01   std: 1.0
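The outputs above change on every run. For reproducible results, one option is NumPy's `Generator` interface, created with `np.random.default_rng`. This is an alternative to the functions used above, not what the notebook itself uses, and the seed value is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # the seed value is arbitrary

uniform = rng.random(size=2)                # like np.random.random
normal = rng.standard_normal((100, 100))    # like np.random.randn
ints = rng.integers(100, size=(4, 6))       # like np.random.randint

# The same seed reproduces the same numbers.
again = np.random.default_rng(seed=42).random(size=2)
print(np.array_equal(uniform, again))
```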


`reshape`

In [None]:
np.arange(10).reshape(2, 5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [None]:
np.arange(12).reshape(2, 6)

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])
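`reshape` can also infer one dimension: passing `-1` asks NumPy to compute that axis from the total number of elements. A short sketch:

```python
import numpy as np

flat = np.arange(12)

# -1 asks NumPy to infer that axis from the total size.
print(flat.reshape(3, -1).shape)
print(flat.reshape(-1, 6).shape)

# The element count must match, otherwise reshape raises ValueError.
try:
    flat.reshape(5, 3)
except ValueError as err:
    print("ValueError:", err)
```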

`zeros`, `ones`, `empty`

In [None]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [None]:
np.zeros((3, 3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [None]:
np.zeros((3, 3), dtype=int)

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

In [None]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [None]:
np.ones((3, 3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [None]:
np.ones((3, 3)) * 1234

array([[1234., 1234., 1234.],
       [1234., 1234., 1234.],
       [1234., 1234., 1234.]])

In [None]:
np.empty(5)  # uninitialized: the values are whatever was left in memory

array([1., 1., 1., 1., 1.])

In [None]:
np.empty((2, 2, 2))

array([[[0.9895843 , 0.89121794],
        [0.35941467, 0.20615986]],

       [[0.84238982, 0.16902483],
        [0.49874971, 0.25918184]]])

In [None]:
ra = np.random.rand(2, 4)
np.ones_like(ra)  # same shape and dtype as ra

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [None]:
ra = np.random.rand(2, 4)
np.zeros_like(ra)  # same shape and dtype as ra

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.]])
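Two related points, sketched below: `np.full` builds a constant array directly (an alternative to multiplying `np.ones` by a value), and an array from `np.empty` should be filled before it is read:

```python
import numpy as np

# np.full builds a constant array directly, without the extra multiplication.
filled = np.full((3, 3), 1234.0)
print(np.array_equal(filled, np.ones((3, 3)) * 1234))

# np.empty skips initialization, so assign every element before reading it.
buf = np.empty(5)
buf[:] = np.arange(5)
print(buf)
```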

`identity` & `eye`

In [None]:
np.identity(10)

array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])

In [None]:
np.eye(3, 3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [None]:
np.eye(8, 4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [None]:
np.eye(8, 4, k=1)

array([[0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [None]:
np.eye(8, 4, k=-3)

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.]])
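As a closing sketch, the identity matrix is the neutral element of matrix multiplication, and the `k` argument of `np.eye` shifts the diagonal up (positive) or down (negative):

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)

# Multiplying by the identity matrix leaves A unchanged.
print(np.array_equal(A @ np.eye(3), A))

# np.identity(n) and np.eye(n) build the same square matrix.
print(np.array_equal(np.identity(3), np.eye(3)))

# k shifts the diagonal: positive goes above the main diagonal, negative below.
print(np.eye(3, k=1))
```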