<img src="https://user-images.githubusercontent.com/7065401/39118381-910eb0c2-46e9-11e8-81f1-a5b897401c23.jpeg"
    style="width:300px; float: right; margin: 0 40px 40px 40px;"></img>

# NumPy: Numeric computing library

NumPy (Numerical Python) is one of the core packages for numerical computing in Python. Pandas, Matplotlib, Statsmodels and many other scientific libraries rely on NumPy.

NumPy's major contributions are:

* Efficient numeric computation with C primitives
* Efficient collections with vectorized operations
* An integrated and natural Linear Algebra API
* A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.

Let's expand on efficiency. In Python, **everything is an object**, which means that even simple ints are objects, carrying all the machinery required to make objects work. We call them "Boxed Ints". In contrast, NumPy uses primitive numeric types (floats, ints), which makes storage and computation efficient.

<img src="https://docs.google.com/drawings/d/e/2PACX-1vTkDtKYMUVdpfVb3TTpr_8rrVtpal2dOknUUEOu85wJ1RitzHHf5nsJqz1O0SnTt8BwgJjxXMYXyIqs/pub?w=726&h=396" />
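The difference between boxed ints and primitive values can be made concrete by measuring memory. The following is a minimal sketch, assuming CPython (where `sys.getsizeof` reports the size of the whole int object) and an explicit `int64` dtype chosen here for illustration; exact byte counts vary by Python version.

```python
import sys
import numpy as np

# Size of one boxed Python int (CPython-specific, includes the object header).
boxed_size = sys.getsizeof(1)

# Size of one primitive 64-bit integer as stored inside a NumPy array.
primitive_size = np.dtype(np.int64).itemsize

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores pointers to boxed ints, so count the list plus every int object.
list_total = sys.getsizeof(py_list) + sum(sys.getsizeof(i) for i in py_list)

# The array stores the raw values contiguously.
array_total = np_array.nbytes

print("one boxed int:     ", boxed_size, "bytes")
print("one primitive int: ", primitive_size, "bytes")
print("list of 1000 ints: ", list_total, "bytes")
print("array of 1000 ints:", array_total, "bytes")
```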


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)



# NUMPY

* Arrays (1D, 2D, 3D)
* Creating arrays
* Accessing array elements
* NumPy array functions
* Broadcasting
* Others

In [3]:
import numpy as np
import timeit

# Pure Python
l = list(range(100000000))
%timeit l2 = [x * 2 for x in l]

# NumPy
a = np.arange(100000000)
%timeit a2 = a * 2



4.09 s ± 307 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
110 ms ± 1.62 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
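The `%timeit` magic only works inside IPython or Jupyter. Outside a notebook, a rough equivalent can be sketched with the standard-library `timeit` module. The array size and repeat count below are assumptions chosen so the script finishes quickly; absolute timings depend on the machine.

```python
import timeit
import numpy as np

n = 100_000          # much smaller than the notebook's 100,000,000, to keep this fast
l = list(range(n))
a = np.arange(n)

# Time 10 runs of each approach and report the total.
list_time = timeit.timeit(lambda: [x * 2 for x in l], number=10)
array_time = timeit.timeit(lambda: a * 2, number=10)

print(f"list comprehension: {list_time:.4f} s for 10 runs")
print(f"NumPy vectorized:   {array_time:.4f} s for 10 runs")

# Both approaches compute the same values.
same_result = (a * 2).tolist() == [x * 2 for x in l]
```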


In [4]:
import sys
import numpy as np

## Creating Numpy Arrays from Python Lists

In [None]:
np.array([1, 2, 3, 4])         

array([1, 2, 3, 4])

Unlike Python lists, a NumPy array holds elements of a single type. If the types do not match, NumPy will upcast where possible (here, integers are upcast to floating point).

In [6]:
np.array([3.14, 4, 2, 3])

array([3.14, 4.  , 2.  , 3.  ])

In [7]:
np.array([1, 2, 3, 4], dtype='float32')

array([1., 2., 3., 4.], dtype=float32)
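To see which type NumPy settled on, inspect the array's `dtype`. The sketch below is illustrative only; the default integer width is platform dependent, so it checks the dtype *kind* rather than an exact width.

```python
import numpy as np

ints = np.array([1, 2, 3, 4])
mixed = np.array([3.14, 4, 2, 3])        # ints are upcast to float
with_text = np.array([1, 2.0, "three"])  # everything is upcast to a string type

print(ints.dtype.kind)       # 'i' for signed integer
print(mixed.dtype)           # float64
print(with_text.dtype.kind)  # 'U' for unicode string

# np.result_type reports the dtype two types would be promoted to.
promoted = np.result_type(np.int32, np.float32)
print(promoted)
```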

Unlike Python lists, NumPy arrays can explicitly be **multi-dimensional**

In [8]:
[range(i, i + 3) for i in [2, 4, 6]]

[range(2, 5), range(4, 7), range(6, 9)]

In [9]:
# nested lists result in multi-dimensional arrays
np.array([range(i, i + 3) for i in [2, 4, 6]])

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

## Creating Arrays from Scratch

### `zeros`, `ones`, `full`, `arange`, `linspace`

In [10]:
np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [11]:
# Create a 3x5 floating-point array filled with ones
np.ones((3, 5), dtype=float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [12]:
# Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [13]:
# Create an array filled with a linear sequence
# Starting at 0, ending at 20, stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [14]:
# Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
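A point that often causes confusion: `arange` excludes the stop value, while `linspace` includes it by default. A small sketch of the difference, using the `endpoint` and `retstep` keyword arguments of `np.linspace`:

```python
import numpy as np

# arange: half-open interval [start, stop), defined by a step size
half_open = np.arange(0, 1, 0.25)

# linspace: closed interval [start, stop], defined by a number of points
closed = np.linspace(0, 1, 5)

# endpoint=False makes linspace exclude the stop value as well
no_endpoint = np.linspace(0, 1, 5, endpoint=False)

# retstep=True also returns the spacing between samples
values, step = np.linspace(0, 1, 5, retstep=True)

print(half_open)
print(closed)
print(no_endpoint)
print(step)
```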

### `random` 

In [126]:
np.random.seed(0)  # seed for reproducibility

# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))

array([[0.5488135 , 0.71518937, 0.60276338],
       [0.54488318, 0.4236548 , 0.64589411],
       [0.43758721, 0.891773  , 0.96366276]])

In [127]:
# Create a one-dimensional array of 5000 normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, 5000)

array([ 1.26611853, -0.50587654,  2.54520078, ...,  0.31032925,
       -0.9329943 ,  1.75048978])

In [132]:
np.random.randint(1,5)

3

In [135]:
np.random.randint(10, size=3)

array([0, 8, 0])

In [136]:
import numpy as np

# Generate 10 random numbers between 0.1 and 0.4
arr = np.random.uniform(low=0.1, high=0.4, size=10)
print(arr)


[0.25281385 0.20398962 0.15977137 0.15992604 0.32465788 0.2883858
 0.37961711 0.2263014  0.29097018 0.24056187]


In [18]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

array([[9, 0, 4],
       [7, 3, 2],
       [7, 2, 0]])

In [118]:
# np.random.random takes the shape as a single tuple
np.random.random((3,5))

# np.random.rand takes each dimension as a separate argument
np.random.rand(3,5)

array([[0.39788921, 0.13805613, 0.11650935, 0.73534638, 0.08971777],
       [0.59423826, 0.49987209, 0.22437363, 0.67612186, 0.77998024],
       [0.56132784, 0.31494519, 0.35601754, 0.68650228, 0.32862129]])
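The cells above use NumPy's legacy global random functions. NumPy also provides a `Generator` object, created with `np.random.default_rng`, that offers the same kinds of draws. The sketch below mirrors the examples above under that API; the seed value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator for reproducibility

uniform_grid = rng.random((3, 3))                      # uniform on [0, 1)
normal_values = rng.normal(loc=0, scale=1, size=5000)  # mean 0, std 1
small_ints = rng.integers(0, 10, size=(3, 3))          # integers in [0, 10)
bounded = rng.uniform(low=0.1, high=0.4, size=10)      # uniform on [0.1, 0.4)

# Two generators built from the same seed produce the same stream.
first = np.random.default_rng(42).random(3)
second = np.random.default_rng(42).random(3)

print(uniform_grid.shape, normal_values.shape, small_ints.shape, bounded.shape)
```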

### `eye`, `empty`

In [20]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [21]:
np.eye(3, dtype='int8')

array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]], dtype=int8)

In [22]:
# Create an uninitialized array of three floats (the default dtype is float64)
# The values will be whatever happens to already exist at that memory location
np.empty(3)

array([1., 1., 1.])

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## NumPy Array Attributes

In [23]:
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

In [24]:
x3

array([[[9, 3, 0, 5, 0],
        [1, 2, 4, 2, 0],
        [3, 2, 0, 7, 5],
        [9, 0, 2, 7, 2]],

       [[9, 2, 3, 3, 2],
        [3, 4, 1, 2, 9],
        [1, 4, 6, 8, 2],
        [3, 0, 0, 6, 0]],

       [[6, 3, 3, 8, 8],
        [8, 2, 3, 2, 0],
        [8, 8, 3, 8, 2],
        [8, 4, 3, 0, 4]]])

In [25]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60


In [26]:
print("dtype:", x3.dtype)

dtype: int32


Other useful attributes include:

- `itemsize`, which lists the size (in bytes) of each array element, and
- `nbytes`, which lists the total size (in bytes) of the array

In [27]:
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

itemsize: 4 bytes
nbytes: 240 bytes
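These attributes are related: `nbytes` equals `size * itemsize`. A minimal sketch, assuming an explicit `int32` dtype so the numbers match the output above regardless of platform:

```python
import numpy as np

x3 = np.zeros((3, 4, 5), dtype=np.int32)

print(x3.size)      # number of elements: 3 * 4 * 5
print(x3.itemsize)  # bytes per element for int32
print(x3.nbytes)    # total bytes

# Converting to a wider type doubles the memory footprint.
as_int64 = x3.astype(np.int64)
print(as_int64.nbytes)
```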


![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Array Indexing & Slicing
### One-dimensional subarray

In [28]:
x1 = np.random.randint(20, size = 6) # One-dimensional array

In [29]:
x1

array([ 3, 13, 11, 13, 13, 11])

In [30]:
x1[4], x1[-1]

(13, 11)

### Slicing:
`x[start:stop:step]`

In [31]:
x1[:3]  # first three elements

array([ 3, 13, 11])

In [32]:
x1[4:5]  # one-element sub-array (index 4 only)

array([13])

In [33]:
x1[::2]  # every other element (step of 2)

array([ 3, 11, 13])
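Because `x1` above is random, the `start:stop:step` behaviour is easier to follow on a predictable array. A short sketch using `np.arange(10)` (chosen here for illustration), including a negative step:

```python
import numpy as np

x = np.arange(10)  # array([0, 1, 2, ..., 9])

middle = x[2:8:2]         # start at 2, stop before 8, step 2
reversed_x = x[::-1]      # negative step walks backwards
back_from_5 = x[5::-2]    # start at 5 and step backwards by 2

print(middle)
print(reversed_x)
print(back_from_5)
```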

### Multi-dimensional array

In [34]:
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array

In [35]:
x2

array([[8, 0, 8, 5],
       [9, 0, 9, 6],
       [5, 3, 1, 8]])

In [36]:
x2[2,0]

5

In [37]:
x2[2,0] = 11

In [38]:
x2

array([[ 8,  0,  8,  5],
       [ 9,  0,  9,  6],
       [11,  3,  1,  8]])

In [39]:
x2[:2, :3]  # two rows, three columns

array([[8, 0, 8],
       [9, 0, 9]])

In [40]:
print(x2[:, 0])  # first column of x2

[ 8  9 11]


In [41]:
grid = np.arange(1, 10).reshape((3, 3))
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [42]:
x = np.array([1, 2, 3])

In [43]:
# column vector via reshape
x.reshape((3, 1))

array([[1],
       [2],
       [3]])

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Array Concatenation and Splitting

In [44]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

array([1, 2, 3, 3, 2, 1])

In [45]:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

In [46]:
# concatenate along the first axis
np.concatenate([grid, grid])

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [47]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

In [48]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [49]:
# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])

array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])

### Splitting of arrays

In [50]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

[1 2 3] [99 99] [3 2 1]


In [51]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
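A two-dimensional grid like this one can be split along either axis. The sketch below uses `np.vsplit` and `np.hsplit`, which take split indices in the same way as `np.split`; the variable names are illustrative.

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))

# Split into the first two rows and the remaining rows.
upper, lower = np.vsplit(grid, [2])

# Split into the first two columns and the remaining columns.
left, right = np.hsplit(grid, [2])

print(upper)
print(lower)
print(left)
print(right)
```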

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Summary statistics

In [52]:
a = np.array([1, 2, 3, 4])

In [53]:
a.sum()

10

In [54]:
a.mean()

2.5

In [55]:
a.std()

1.118033988749895

In [56]:
a.var()

1.25

In [152]:
a=np.random.normal(0.15,0.05,5000)

In [154]:
b=np.random.uniform(0,0.8,5000)

In [161]:
data_list = [np.random.randn(1000) for _ in range(6)]

In [164]:
a=[1,2,3,4]
b=["a","b","c","d"]

In [165]:
for x,y in zip(a,b):
    print(x,y)

1 a
2 b
3 c
4 d


In [188]:
a=np.random.randint(2,10,(2,3))
a

array([[9, 2, 4],
       [6, 8, 3]])

In [189]:
a.flatten()

array([9, 2, 4, 6, 8, 3])

In [178]:
a=np.arange(2,10,1)
a

array([2, 3, 4, 5, 6, 7, 8, 9])

In [57]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [58]:
A.sum()

45

In [59]:
A.mean()

5.0

In [60]:
A.std()

2.581988897471611

In [61]:
A.sum(axis=0)

array([12, 15, 18])

In [62]:
A.sum(axis=1)

array([ 6, 15, 24])

In [63]:
A.mean(axis=0)

array([4., 5., 6.])

In [64]:
A.mean(axis=1)

array([2., 5., 8.])

In [65]:
A.std(axis=0)

array([2.44948974, 2.44948974, 2.44948974])

In [66]:
A.std(axis=1)

array([0.81649658, 0.81649658, 0.81649658])
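The `axis` argument names the dimension that gets collapsed: `axis=0` aggregates down the rows (one result per column), and `axis=1` aggregates across the columns (one result per row). The sketch below also shows two optional keyword arguments, `keepdims` and `ddof`. Note that `std` computes the population standard deviation by default (`ddof=0`).

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

col_sums = A.sum(axis=0)                 # collapse rows: one value per column
row_sums = A.sum(axis=1)                 # collapse columns: one value per row
row_sums_2d = A.sum(axis=1, keepdims=True)  # keep the collapsed axis with length 1

a = np.array([1, 2, 3, 4])
population_std = a.std()        # divides by N
sample_std = a.std(ddof=1)      # divides by N - 1

print(col_sums, row_sums, row_sums_2d.shape)
print(population_std, sample_std)
```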

And [many more](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.ndarray.html#array-methods)...

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Broadcasting and Vectorized operations

Broadcasting is simply a set of rules for applying binary ufuncs (e.g., addition, subtraction, multiplication, etc.) on arrays of different sizes.

![image-broadcasting](https://jakevdp.github.io/PythonDataScienceHandbook/figures/02.05-broadcasting.png)
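The three cases in the figure can be reproduced directly. This is a small sketch with illustrative array names; it also shows that shapes which cannot be aligned raise a `ValueError`.

```python
import numpy as np

a = np.arange(3)                 # shape (3,)
b = np.arange(3).reshape(3, 1)   # shape (3, 1)
M = np.ones((2, 3))              # shape (2, 3)

scalar_case = a + 5   # the scalar is stretched to shape (3,)
row_case = M + a      # a is stretched across both rows of M
outer_case = a + b    # both are stretched to shape (3, 3)

print(scalar_case)
print(row_case)
print(outer_case)

# Shapes (3, 2) and (3,) cannot be broadcast: trailing dimensions 2 and 3 differ.
try:
    np.ones((3, 2)) + a
    incompatible_failed = False
except ValueError:
    incompatible_failed = True
```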

In [139]:
a = np.arange(1,10)

In [144]:
b=np.arange(60,69)
b

array([60, 61, 62, 63, 64, 65, 66, 67, 68])

In [146]:
a

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [147]:
c=a*b
c

array([ 60, 122, 186, 252, 320, 390, 462, 536, 612])

In [69]:
a = np.arange(3)  # the outputs below correspond to this three-element array
a + 5 #Broadcasting & Vectorized operations

array([5, 6, 7])

In [70]:
a * 10

array([ 0, 10, 20])

In [71]:
a

array([0, 1, 2])

In [72]:
a += 100

In [73]:
a

array([100, 101, 102])

In [74]:
l = [0, 1, 2, 3]

In [75]:
[i * 10 for i in l]

[0, 10, 20, 30]

In [76]:
a = np.arange(4)

In [77]:
a

array([0, 1, 2, 3])

In [78]:
b = np.array([10, 10, 10, 10])

In [79]:
b

array([10, 10, 10, 10])

In [80]:
a + b

array([10, 11, 12, 13])

In [81]:
a * b

array([ 0, 10, 20, 30])

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Sorting Arrays

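By default, `np.sort` uses `kind='quicksort'`; `'mergesort'`, `'heapsort'` and `'stable'` are also available through the `kind` argument.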


In [82]:
x = np.array([2, 1, 4, 3, 5])
np.sort(x)

array([1, 2, 3, 4, 5])

In [83]:
# A related function is argsort, which instead returns the indices that would sort the array:
x = np.array([2, 1, 4, 3, 5])
i = np.argsort(x)
print(i)

[1 0 3 2 4]
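The indices returned by `argsort` can be used to index the original array, or any other array of the same length, to reorder it. A minimal sketch; the `labels` array is a made-up example.

```python
import numpy as np

x = np.array([2, 1, 4, 3, 5])
order = np.argsort(x)

sorted_x = x[order]            # same result as np.sort(x)
descending = x[order[::-1]]    # reverse the order for a descending sort

# Reorder a second array so it stays aligned with x.
labels = np.array(["b", "a", "d", "c", "e"])
sorted_labels = labels[order]

print(sorted_x)
print(descending)
print(sorted_labels)
```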


### Sorting along rows or columns
A useful feature of NumPy's sorting algorithms is the ability to sort along specific rows or columns of a multidimensional array using the `axis` argument.

In [84]:
rand = np.random.RandomState(42)
X = rand.randint(0, 10, (4, 6))
print(X)

[[6 3 7 4 6 9]
 [2 6 7 4 3 7]
 [7 2 5 4 1 7]
 [5 1 4 0 9 5]]


In [85]:
# sort each column of X
np.sort(X, axis=0)

array([[2, 1, 4, 0, 1, 5],
       [5, 2, 5, 4, 3, 7],
       [6, 3, 7, 4, 6, 7],
       [7, 6, 7, 4, 9, 9]])

In [86]:
# sort each row of X
np.sort(X, axis=1)

array([[3, 4, 6, 6, 7, 9],
       [2, 3, 4, 6, 7, 7],
       [1, 2, 4, 5, 7, 7],
       [0, 1, 4, 5, 5, 9]])

### Partial Sorts: Partitioning

In [87]:
x = np.array([7, 2, 3, 1, 6, 5, 4])
np.partition(x, 3)

array([2, 1, 3, 4, 6, 5, 7])

In [88]:
np.partition(X, 2, axis=1)

array([[3, 4, 6, 7, 6, 9],
       [2, 3, 4, 7, 6, 7],
       [1, 2, 4, 5, 7, 7],
       [0, 1, 4, 5, 9, 5]])
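`np.partition(x, k)` places the value that would be at index `k` in a full sort at that position, with all smaller values before it and all larger values after it. The order within each side is not guaranteed. The sketch below checks that property rather than relying on a particular arrangement, and uses `np.argpartition` to get the indices of the `k` smallest values.

```python
import numpy as np

x = np.array([7, 2, 3, 1, 6, 5, 4])
k = 3

p = np.partition(x, k)
kth_value = p[k]              # the value a full sort would put at index k
smallest = np.sort(p[:k])     # the k smallest values, sorted here for display
largest = np.sort(p[k + 1:])  # everything larger than the k-th value

# argpartition returns indices instead of values.
idx = np.argpartition(x, k)[:k]
smallest_via_idx = np.sort(x[idx])

print(kth_value, smallest, largest)
```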

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Linear Algebra

In [89]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [90]:
B = np.array([
    [6, 5],
    [4, 3],
    [2, 1]
])

In [91]:
A.dot(B)

array([[20, 14],
       [56, 41],
       [92, 68]])

In [92]:
A @ B

array([[20, 14],
       [56, 41],
       [92, 68]])

In [93]:
B.T

array([[6, 4, 2],
       [5, 3, 1]])

In [94]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [95]:
B.T @ A

array([[36, 48, 60],
       [24, 33, 42]])
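Matrix multiplication requires the inner dimensions to match: an `(m, n)` array times an `(n, p)` array gives an `(m, p)` result. The following sketch checks the shapes for the matrices above and shows that `B @ A` is not defined here.

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
B = np.array([
    [6, 5],
    [4, 3],
    [2, 1]
])

product = A @ B                          # (3, 3) @ (3, 2) -> (3, 2)
same_as_dot = np.array_equal(product, A.dot(B))

# Transpose identity: (A @ B).T equals B.T @ A.T
transpose_ok = np.array_equal(product.T, B.T @ A.T)

# (3, 2) @ (3, 3) fails because the inner dimensions 2 and 3 differ.
try:
    B @ A
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(product.shape, (B.T @ A).shape, mismatch_raised)
```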