## Essential Numpy

1. Intro to Numpy
2. Initialization
3. Access
4. Splitting
5. Concatenation
6. Dimension, Shape, and Size
7. Reshaping
8. Utility Functions
9. Broadcasting

### 1. Intro to Numpy

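- N-dimensional arrays for fast numerical computation
- Homogeneous data structure: every element shares the same data type (dtype)
- Core routines are written in C/C++
- Libraries such as scikit-learn are built on top of NumPy
- Python lists can hold objects of any type, which makes them flexible but slower for numerical work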

In [None]:
import numpy as np
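The "homogeneous" point above can be seen directly by looking at an array's `dtype`. The following minimal sketch (variable names are illustrative) shows NumPy forcing mixed inputs into a single type, whereas a Python list keeps each element's own type:

```python
import numpy as np

# Mixing ints and floats: NumPy upcasts everything to one common dtype (float64)
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)

# A Python list happily stores different types side by side
py_list = [1, 2.5, "three"]

# Converting it forces a single dtype; here NumPy falls back to a string type
as_array = np.array(py_list)

# Vectorized arithmetic runs on the whole array without a Python loop
doubled = mixed * 2
print(doubled)
```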

### 2. Initialization

Most of the following initializers take the desired **size** (shape) of the array as an argument.

- array: built from user-specified values
- zeros: filled with zeros
- ones: filled with ones
- empty: uninitialized array (contains whatever values were already in memory)
- random: functions in `np.random` that fill an array with random values
- arange: evenly spaced values from start up to (but not including) stop, with a given step
- linspace: a given number of equally spaced values between start and stop (both included by default)

In [28]:
np.array([1,2,3,4,5,6])

array([1, 2, 3, 4, 5, 6])

In [31]:
np.array([[1,2,3],[2,5,6]])

array([[1, 2, 3],
       [2, 5, 6]])

In [8]:
np.zeros(shape=5)

array([0., 0., 0., 0., 0.])

In [15]:
np.ones(shape=(2,2))

array([[1., 1.],
       [1., 1.]])

In [10]:
np.empty(shape=(2,2,2))

array([[[1.91931677e-076, 5.04621361e+180],
        [8.37170571e-144, 4.50617634e-144]],

       [[4.27255699e+180, 6.12033286e+257],
        [3.83819517e+151, 2.47184823e-056]]])

In [23]:
# each dimension is a separate argument; values are drawn uniformly from [0, 1)
np.random.rand(2,2,2,2)

array([[[[0.11810756, 0.09233453],
         [0.05316151, 0.62446039]],

        [[0.55258034, 0.69619733],
         [0.95695035, 0.30415888]]],


       [[[0.8241276 , 0.67356836],
         [0.97802539, 0.413997  ]],

        [[0.93493049, 0.12489635],
         [0.64610544, 0.19433453]]]])

In [24]:
# random integers from low (inclusive) to high (exclusive), with the given size
np.random.randint(0,10,size=(2,2))

array([[9, 0],
       [6, 0]])
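The cells above use the legacy `np.random` functions, so their output changes on every run. As an alternative not used in the original notebook, NumPy (1.17 and later) also offers a `Generator` created with `np.random.default_rng`; this sketch assumes that API and shows how a fixed seed makes the values reproducible:

```python
import numpy as np

# Generator API; a fixed seed makes the "random" values reproducible
rng = np.random.default_rng(seed=42)
first = rng.random((2, 2))                 # uniform floats in [0, 1), like np.random.rand(2, 2)
ints = rng.integers(0, 10, size=(2, 2))    # integers in [0, 10), like np.random.randint

# A second generator with the same seed produces the same values
rng_again = np.random.default_rng(seed=42)
same = np.array_equal(first, rng_again.random((2, 2)))
print(same)
```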

In [19]:
np.arange(1,10,2)

array([1, 3, 5, 7, 9])

In [18]:
np.linspace(2,4,8)

array([2.        , 2.28571429, 2.57142857, 2.85714286, 3.14285714,
       3.42857143, 3.71428571, 4.        ])
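The two sequence builders differ in how they treat the stop value and how the spacing is chosen. This short sketch reuses the arguments from the cells above to make both rules explicit:

```python
import numpy as np

# arange: stop is excluded and the spacing is the step you pass
a = np.arange(1, 10, 2)          # 1, 3, 5, 7, 9 (10 is never reached)

# linspace: stop is included by default and the spacing is derived from num
l = np.linspace(2, 4, 8)         # 8 points, step = (4 - 2) / (8 - 1)
step = l[1] - l[0]

# endpoint=False drops the stop value, so the step becomes (4 - 2) / 8
l_open = np.linspace(2, 4, 8, endpoint=False)
print(a, step, l_open[-1])
```

As a rule of thumb, `arange` suits integer steps, while `linspace` is safer when you care about the exact number of floating-point samples.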

### 3. Access

In [33]:
d = np.random.rand(10,10)

In [36]:
d[:,:]

array([[0.35972231, 0.15258922, 0.69425874, 0.96109249, 0.90760117,
        0.72961592, 0.67162372, 0.17809728, 0.7009859 , 0.48848933],
       [0.29762481, 0.09921372, 0.17757025, 0.36539436, 0.84495415,
        0.31729975, 0.41584211, 0.79297453, 0.18345275, 0.76039079],
       [0.91062412, 0.20315276, 0.25346568, 0.72854863, 0.51926295,
        0.58131829, 0.88008952, 0.96232768, 0.85181196, 0.57276941],
       [0.89510403, 0.34491896, 0.9073245 , 0.69231032, 0.01930403,
        0.13406048, 0.12630522, 0.81668553, 0.86414306, 0.33310459],
       [0.87936894, 0.17034617, 0.84684245, 0.8490861 , 0.5797942 ,
        0.98518158, 0.33529097, 0.87018835, 0.81055071, 0.0990761 ],
       [0.04457081, 0.11690122, 0.79744802, 0.29318921, 0.11178656,
        0.07528277, 0.97378249, 0.96274012, 0.01156743, 0.18121844],
       [0.52081969, 0.71757139, 0.84804928, 0.02563839, 0.55370346,
        0.89172239, 0.97628902, 0.50971472, 0.77637488, 0.15650307],
       [0.51996588, 0.53074958, 0.4410780

In [38]:
d[:,1]

array([0.15258922, 0.09921372, 0.20315276, 0.34491896, 0.17034617,
       0.11690122, 0.71757139, 0.53074958, 0.10162436, 0.82420017])

In [39]:
d[:,:2]

array([[0.35972231, 0.15258922],
       [0.29762481, 0.09921372],
       [0.91062412, 0.20315276],
       [0.89510403, 0.34491896],
       [0.87936894, 0.17034617],
       [0.04457081, 0.11690122],
       [0.52081969, 0.71757139],
       [0.51996588, 0.53074958],
       [0.74024636, 0.10162436],
       [0.86472646, 0.82420017]])

In [40]:
d[:2,2:4]

array([[0.69425874, 0.96109249],
       [0.17757025, 0.36539436]])
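Because the values above are random, it can be hard to see which elements a slice picks. The sketch below swaps in a predictable `np.arange` grid (an illustrative choice, not the notebook's data) to show that slice ends are exclusive, that a single index drops an axis, and that basic slices are views of the original array:

```python
import numpy as np

d = np.arange(16).reshape(4, 4)   # rows: 0-3, 4-7, 8-11, 12-15

block = d[:2, 2:4]   # rows 0-1, columns 2-3 (slice ends are exclusive)
print(block)         # values [[2, 3], [6, 7]]

col = d[:, 1]        # a single index drops that axis -> shape (4,)
col2d = d[:, 1:2]    # a length-1 slice keeps it -> shape (4, 1)

# Basic slices are views: writing through them changes the original array
block[0, 0] = 99
print(d[0, 2])       # 99

# Use .copy() when an independent array is needed
safe = d[:2, 2:4].copy()
```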

### 4. Splitting

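Splits an array into subarrays.

- axis=0: along the rows (vertical direction)
- axis=1: along the columns (horizontal direction)
- split: splits the array along the given axis, either into N equal parts or at a list of indices
- hsplit: splits horizontally, i.e. column-wise (same as split with axis=1)
- vsplit: splits vertically, i.e. row-wise (same as split with axis=0)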

In [68]:
d = np.random.rand(4,4)
d

array([[0.73734083, 0.09937664, 0.86569589, 0.18586836],
       [0.26556984, 0.3619676 , 0.55532087, 0.81117646],
       [0.78964541, 0.96544566, 0.49593787, 0.84101427],
       [0.49533931, 0.53737861, 0.08058059, 0.35095725]])

In [69]:
np.split(d,4,axis=0)

[array([[0.73734083, 0.09937664, 0.86569589, 0.18586836]]),
 array([[0.26556984, 0.3619676 , 0.55532087, 0.81117646]]),
 array([[0.78964541, 0.96544566, 0.49593787, 0.84101427]]),
 array([[0.49533931, 0.53737861, 0.08058059, 0.35095725]])]

In [74]:
np.vsplit(d,[2])

[array([[0.73734083, 0.09937664, 0.86569589, 0.18586836],
        [0.26556984, 0.3619676 , 0.55532087, 0.81117646]]),
 array([[0.78964541, 0.96544566, 0.49593787, 0.84101427],
        [0.49533931, 0.53737861, 0.08058059, 0.35095725]])]

In [76]:
np.hsplit(d,[1,2,3])

[array([[0.73734083],
        [0.26556984],
        [0.78964541],
        [0.49533931]]), array([[0.09937664],
        [0.3619676 ],
        [0.96544566],
        [0.53737861]]), array([[0.86569589],
        [0.55532087],
        [0.49593787],
        [0.08058059]]), array([[0.18586836],
        [0.81117646],
        [0.84101427],
        [0.35095725]])]
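The second argument of the split functions can be either an integer (number of equal parts) or a list of cut indices. The sketch below uses a deterministic 4 x 4 array to show both forms, the `hsplit`/`split(axis=1)` equivalence, and what happens when an integer does not divide the axis evenly; `np.array_split` is an additional function not covered in the list above:

```python
import numpy as np

d = np.arange(16).reshape(4, 4)

# An integer N splits into N equal parts along the axis
halves = np.split(d, 2, axis=1)          # two (4, 2) pieces

# A list gives cut positions: [1, 3] -> rows [:1], [1:3], [3:]
pieces = np.split(d, [1, 3], axis=0)     # shapes (1, 4), (2, 4), (1, 4)

# hsplit is a shorthand for split(..., axis=1)
same = all(np.array_equal(x, y) for x, y in zip(np.hsplit(d, 2), halves))

# An integer that does not divide the axis evenly raises ValueError
try:
    np.split(d, 3, axis=0)
except ValueError as err:
    print("split failed:", err)

# np.array_split tolerates uneven parts instead
uneven = np.array_split(d, 3, axis=0)    # shapes (2, 4), (1, 4), (1, 4)
```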

### 5. Concatenation

For these operations to work, the arrays must have the same size along every axis except the one being joined.

- axis=0 means vertical
- axis=1 means horizontal
- concatenate: joins arrays along the given axis
- hstack: joins arrays horizontally (column-wise)
- vstack: joins arrays vertically (row-wise)

In [43]:
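a = np.zeros((2,3))   # 2 x 3 array of zeros
b = np.ones((2,2))    # 2 x 2 array of ones
c = np.empty((1,3))   # 1 x 3 uninitialized array, so its values are arbitrary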

In [47]:
np.concatenate([a,b],axis=1)

array([[0., 0., 0., 1., 1.],
       [0., 0., 0., 1., 1.]])

In [51]:
np.hstack([a,b])

array([[0., 0., 0., 1., 1.],
       [0., 0., 0., 1., 1.]])

In [48]:
np.concatenate([a,c],axis=0)

array([[0.0e+000, 0.0e+000, 0.0e+000],
       [0.0e+000, 0.0e+000, 0.0e+000],
       [4.9e-324, 9.9e-324, 1.5e-323]])

In [52]:
np.vstack([a,c])

array([[0.0e+000, 0.0e+000, 0.0e+000],
       [0.0e+000, 0.0e+000, 0.0e+000],
       [4.9e-324, 9.9e-324, 1.5e-323]])
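The size rule above can be checked with the same `a` and `b`: they share the row count, so they join along axis 1 but not along axis 0. This sketch also shows that `vstack` first promotes a 1-D input to a single row:

```python
import numpy as np

a = np.zeros((2, 3))
b = np.ones((2, 2))

# Joining along axis=1: row counts must match (2 == 2), column counts may differ
side_by_side = np.concatenate([a, b], axis=1)   # shape (2, 5)

# Joining along axis=0: column counts must match, and 3 != 2 here
try:
    np.concatenate([a, b], axis=0)
except ValueError as err:
    print("cannot stack vertically:", err)

# vstack treats a 1-D array of length 3 as a (1, 3) row, so it fits under a
stacked = np.vstack([a, np.arange(3)])          # shape (3, 3)
```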

### 6. Dimension, Shape, and Size

In [77]:
d.ndim

2

In [78]:
d.shape

(4, 4)

In [79]:
d.size

16
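Here `d` is the 4 x 4 array from the Splitting section. The three attributes are related: `ndim` is the length of `shape`, and `size` is the product of the entries in `shape`. A quick sketch confirming this, plus the common confusion between `size` and `len()`:

```python
import numpy as np

d = np.random.rand(4, 4)

print(d.ndim == len(d.shape))       # number of axes
print(d.size == np.prod(d.shape))   # total elements = 4 * 4

# len() only reports the length of the first axis, unlike size
x = np.zeros((2, 3, 5))
print(x.ndim, x.shape, x.size, len(x))   # 3 (2, 3, 5) 30 2
```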

### 7. Reshaping

- Learning algorithms expect data in a certain shape and number of dimensions
- `reshape` converts data into the desired shape
- The total number of elements must stay the same
- -1 tells NumPy to infer that dimension from the array's size and the other dimensions

In [81]:
a = np.arange(0,6)

In [82]:
a.reshape(-1,3)

array([[0, 1, 2],
       [3, 4, 5]])

In [84]:
a.reshape(3,-1)

array([[0, 1],
       [2, 3],
       [4, 5]])

In [88]:
b = np.arange(0,6).reshape(3,-1)
b.shape

(3, 2)

In [89]:
b.reshape(-1,3)

array([[0, 1, 2],
       [3, 4, 5]])
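The constraints listed above show up as errors when they are violated. This sketch demonstrates them on the same `np.arange(0, 6)` data, and notes that `reshape` returns a new array object rather than changing the original's shape:

```python
import numpy as np

a = np.arange(6)

# reshape returns a new array object; a itself keeps shape (6,)
m = a.reshape(2, 3)

# The total size must be preserved: 6 elements cannot fill a 4 x 2 grid
try:
    a.reshape(4, 2)
except ValueError as err:
    print("bad reshape:", err)

# Only one dimension may be -1, since it is solved as size / (product of the others)
try:
    a.reshape(-1, -1)
except ValueError as err:
    print("bad reshape:", err)

# Typical use: turn a 1-D feature into a single-column 2-D array,
# the layout many scikit-learn estimators expect for X
col = a.reshape(-1, 1)   # shape (6, 1)
```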

### 8. Utility Functions

Some examples are:

- max
- min
- sum
- mean
- std
- cov
- all
- any
- dot
- multiply
- sqrt

In [90]:
a = np.arange(1,5)

In [93]:
np.sum(a)

10

In [94]:
a.sum()

10

In [97]:
a = a.reshape(-1,2)

In [98]:
a.mean()

2.5

In [99]:
a.mean(axis=1)

array([1.5, 3.5])
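Only `sum` and `mean` are demonstrated above. The sketch below exercises the remaining functions from the list on the same 2 x 2 array; note that `dot` is a matrix product while `multiply` is element-wise, and that `std` and `cov` use different default degrees of freedom:

```python
import numpy as np

a = np.arange(1, 5).reshape(2, 2)   # [[1, 2], [3, 4]]

print(a.max(), a.min())        # 4 1
print(a.max(axis=0))           # column-wise max: values [3, 4]
print(a.std())                 # population std (ddof=0), about 1.118
print(np.all(a > 0), np.any(a > 3))   # True True

print(np.multiply(a, a))       # element-wise: values [[1, 4], [9, 16]]
print(np.dot(a, a))            # matrix product: values [[7, 10], [15, 22]]
print(np.sqrt(a))              # element-wise square root, returns floats

# np.cov treats each row as a variable by default (rowvar=True) and uses ddof=1
print(np.cov(a))               # values [[0.5, 0.5], [0.5, 0.5]]
```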

### 9. Broadcasting

- A 1-D array such as `np.array([1,2,3])` is often called a vector
- Broadcasting lets NumPy apply element-wise operations to arrays of different shapes by stretching size-1 (or missing) dimensions to match
- Reshaping is sometimes needed to enable broadcasting

In [100]:
a = np.array([1,2,3])

In [101]:
a+a

array([2, 4, 6])

In [102]:
a+1

array([2, 3, 4])

In [111]:
a.reshape(3,1)+a

array([[2, 3, 4],
       [3, 4, 5],
       [4, 5, 6]])

In [109]:
b = np.zeros(4).reshape(2,2)

In [110]:
b + np.arange(2)

array([[0., 1.],
       [0., 1.]])
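The examples above follow NumPy's broadcasting rule: shapes are compared from the rightmost dimension, and two sizes are compatible when they are equal or one of them is 1 (missing leading dimensions count as 1). This sketch applies the rule to a compatible case, an incompatible one, and the reshape that fixes it; the arrays are illustrative:

```python
import numpy as np

a = np.array([1, 2, 3])            # shape (3,), treated as (1, 3)
col = a.reshape(3, 1)              # shape (3, 1)
grid = col + a                     # (3, 1) with (1, 3) -> (3, 3)

m = np.zeros((2, 3))
row_ok = m + a                     # (2, 3) with (3,) -> (2, 3)

# (2, 3) with (2,): rightmost sizes 3 and 2 differ and neither is 1 -> error
v = np.array([10, 20])
try:
    m + v
except ValueError as err:
    print("not broadcastable:", err)

# Reshaping v into a (2, 1) column lets it stretch across the 3 columns
col_ok = m + v.reshape(2, 1)       # each row gets its own offset
```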