# PyCon2019: Hello World of Machine Learning using Scikit-learn


## [4.1] - Prerequisite: Introduction to NumPy

<br/><br/>

___NumPy (Numerical Python) is the first step on your journey into Machine Learning with the Python programming language. NumPy is used for creating N-dimensional (N-D) arrays. Here is how to use it.___

<br/><br/>

In [1]:
import numpy as np

In [2]:
# NumPy array

n_arr = np.array([1,2,3,4,5])

<br/><br/><br/>

In [3]:
# Python list (Python's built-in sequence type, shown here for comparison)
p_arr = [1,2,3,4,5]

<br/><br/>

In [4]:
print("Python Array => ",p_arr)
print("NumPy Array => ",n_arr)

Python Array =>  [1, 2, 3, 4, 5]
NumPy Array =>  [1 2 3 4 5]


<br/><br/><br/>

### How is it different from a Python list?


This is the first question that comes to everyone's mind, and here is why _NumPy_ is the obvious choice:

- (1) NumPy N-D arrays take less memory to store the same data
- (2) NumPy makes it easy to perform mathematical operations on every element

<br/>

#### Memory Usage

In [5]:
len(p_arr)

5

In [6]:
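import sys
# Approximate size of the int objects referenced by the list:
# each small Python int takes sys.getsizeof(1) bytes (28 here).
# The NumPy array is measured separately below with size and itemsize.
print(len(p_arr) * sys.getsizeof(1))

140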


<br/><br/>

_Let's check the size of the NumPy array_

In [7]:
len(n_arr)

5

In [8]:
n_arr.size

5

In [9]:
n_arr.itemsize

8

In [10]:
n_arr.size * n_arr.itemsize

40

<br/><br/>

_Further optimization is also possible with NumPy_

In [11]:
# **Paul's Notes**: if you specify a smaller data type, you can save even more memory.
n_arr = np.array([1,2,3,4,5], dtype=np.int8)

In [12]:
n_arr.size * n_arr.itemsize

5
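
The `size * itemsize` product computed above is also available directly as the array's `nbytes` attribute. The sketch below puts the three measurements side by side. It assumes a 64-bit CPython build, and the variable names are illustrative rather than taken from the talk.

```python
import sys
import numpy as np

values = [1, 2, 3, 4, 5]

# Rough cost of the int objects that a Python list points to
# (the list's own pointer storage is not included).
list_element_bytes = sum(sys.getsizeof(v) for v in values)

# nbytes is the size of the array's data buffer: size * itemsize.
arr_int64 = np.array(values, dtype=np.int64)
arr_int8 = np.array(values, dtype=np.int8)

print(list_element_bytes)
print(arr_int64.nbytes)  # 5 elements * 8 bytes each
print(arr_int8.nbytes)   # 5 elements * 1 byte each
```

Keep in mind that `int8` can only hold values from -128 to 127, so a smaller dtype is only safe when the data is known to fit.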

<br/><br/><br/><br/>

### There is NO magic behind the reduced size

Unlike Python lists, NumPy arrays are __homogeneous__, i.e. all elements of the array are of the same type. When the inputs are of mixed types, NumPy converts them all to one common type.

In [13]:
p_arr = [1,  2.0, "Hello", "World", 56000]

In [14]:
p_arr

[1, 2.0, 'Hello', 'World', 56000]

<br/>

In [15]:
n_arr = np.array([1,2.0, "Hello", "World", 56000])

<br/><br/><br/><br/><br/>

In [16]:
n_arr

array(['1', '2.0', 'Hello', 'World', '56000'], dtype='<U32')

In [17]:
n_arr.dtype

dtype('<U32')

In [18]:
n_arr[0] * 10

'1111111111'

<br/><br/><br/>

### Exercise

Create a NumPy array with a combination of integer and floating point values and check the resultant data type of the NumPy array

In [19]:
test_np_array = np.array([1, 2, 23.7, 4, 15.0004, 12, 56000])
print(test_np_array)
print(test_np_array.dtype)

[1.00000e+00 2.00000e+00 2.37000e+01 4.00000e+00 1.50004e+01 1.20000e+01
 5.60000e+04]
float64
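
The exercise shows NumPy promoting integers to `float64` when a float is present. As a small sketch of the same idea, `np.result_type` reports the common dtype without building an array, and `astype` converts explicitly. The variable names are illustrative.

```python
import numpy as np

ints_only = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])

print(ints_only.dtype)  # the platform's default integer type
print(mixed.dtype)      # one float is enough to promote every element

# result_type reports the dtype NumPy would promote the inputs to.
promoted = np.result_type(np.int64, np.float64)
print(promoted)

# astype returns a converted copy; float -> int drops the fractional part.
as_int = mixed.astype(np.int64)
print(as_int)
```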


<br/><br/><br/>

#### Mathematical Operations

In [20]:
p_arr = [1,2,3,4,5]
n_arr = np.array([1,2,3,4,5])

In [21]:
p_arr * 2

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

In [22]:
n_arr * 2

array([ 2,  4,  6,  8, 10])

<br/><br/><br/><br/>

In [23]:
p_arr = [[1,2],
       [3, 4]]

n_arr = np.array([[1,2],
                 [3,4]])

In [24]:
p_arr * 2

[[1, 2], [3, 4], [1, 2], [3, 4]]

In [25]:
n_arr * 2


array([[2, 4],
       [6, 8]])
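
As the cells above show, `*` repeats a list but multiplies each element of an array. The sketch below contrasts the two: a list needs an explicit comprehension to get the element-wise result, while the array supports arithmetic directly.

```python
import numpy as np

p_arr = [1, 2, 3, 4, 5]
n_arr = np.array(p_arr)

# A list needs an explicit loop or comprehension to double each element.
doubled_list = [x * 2 for x in p_arr]

# The array applies the arithmetic to every element at once.
doubled_arr = n_arr * 2
squared_plus_one = n_arr ** 2 + 1
summed = n_arr + n_arr  # element-wise addition, not concatenation

print(doubled_list)
print(doubled_arr)
print(squared_plus_one)
print(summed)
```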

<br/><br/><br/>

### How Data is represented / used in Machine Learning

<br/>

In programming, we're used to seeing arrays as a row of numbers

<br/><br/>

In [26]:
n_arr = np.array([1,2,3,4,5,6,7])

In [27]:
n_arr

array([1, 2, 3, 4, 5, 6, 7])

<br/><br/><br/>

But in Machine Learning, we generally represent the numbers as a column

In [28]:
# **Paul's Notes:** the -1 here means "let NumPy figure out how many are needed",
# so here we're saying: I want exactly one column, and NumPy can work out the number of rows
n_arr.reshape(-1,1)

array([[1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7]])

In [29]:
n_arr.shape

(7,)

In [30]:
n_arr.ndim

1

___Vector  (Row Vectors and Column Vectors)___
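
Note that `n_arr.shape` is still `(7,)` after the `reshape` call above: `reshape` returns a new array object and leaves the original's shape unchanged. A minimal sketch of row and column vectors built from the same data, with illustrative variable names:

```python
import numpy as np

n_arr = np.array([1, 2, 3, 4, 5, 6, 7])

column = n_arr.reshape(-1, 1)  # 7 rows, 1 column
row = n_arr.reshape(1, -1)     # 1 row, 7 columns

print(n_arr.shape)   # the original is still 1-D
print(column.shape)
print(row.shape)
```

To keep the reshaped version, assign the result to a name, as done here with `column` and `row`.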

<br/><br/><br/>

In [31]:
n_arr = np.array([[1,2],
                 [3,4]])

In [32]:
n_arr

array([[1, 2],
       [3, 4]])

<br/>

In [33]:
n_arr.shape, n_arr.ndim

((2, 2), 2)

___Matrix___

<br/><br/><br/>

In [34]:
# **Paul's Notes:** the last parameter below--step size--is implicitly 1 unless changed
n_arr = np.arange(0,16,1)
n_arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [35]:
n_arr = n_arr.reshape(4,2,2)

In [36]:
# **Paul's Notes:** vectors (1 dimension) are called tensors of rank 1,
# matrices (2 dimensions) are called tensors of rank 2, and so on.
n_arr

array([[[ 0,  1],
        [ 2,  3]],

       [[ 4,  5],
        [ 6,  7]],

       [[ 8,  9],
        [10, 11]],

       [[12, 13],
        [14, 15]]])

In [37]:
n_arr.shape

(4, 2, 2)

In [38]:
n_arr.ndim

3

<br/>

___Tensor___
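
One way to read the `(4, 2, 2)` shape is "4 matrices, each 2 x 2". The sketch below indexes into such a tensor one axis at a time. Each additional index removes one dimension. The variable names are illustrative.

```python
import numpy as np

tensor = np.arange(16).reshape(4, 2, 2)

first_block = tensor[0]      # one index  -> a 2x2 matrix
one_row = tensor[1, 0]       # two indices -> a 1-D vector
one_value = tensor[3, 1, 1]  # three indices -> a single number

print(first_block.shape, first_block.ndim)
print(one_row)
print(one_value)
```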

<br/><br/><br/>

### Generating NumPy arrays without predefined static data

So far we've created arrays from predefined data. Let's see how we can use different NumPy functions to create N-D arrays.

__Using numpy.empty(...)__

In [39]:
n_arr = np.empty(5)
n_arr

array([4.9e-324, 9.9e-324, 1.5e-323, 2.0e-323, 2.5e-323])

In [40]:
n_arr = np.empty((5,2))
n_arr

array([[ 2.31584178e+077,  2.31584178e+077],
       [-3.95252517e-323,  0.00000000e+000],
       [ 5.45352918e-312,  1.26480805e-321],
       [ 0.00000000e+000,  0.00000000e+000],
       [ 1.77229088e-310,  3.50977866e+064]])
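
The odd-looking numbers above are not meaningful: `np.empty` allocates the array without initializing it, so it shows whatever happened to be in that memory, and the values will differ from run to run. A minimal sketch of the intended usage, which is to allocate first and then overwrite every element before reading any of them:

```python
import numpy as np

squares = np.empty(5)  # contents are arbitrary at this point

for i in range(squares.size):
    squares[i] = i ** 2  # overwrite every slot before reading it

print(squares)
print(squares.dtype)  # np.empty defaults to float64
```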

<br/><br/><br/>

#### Exercise

- 1) Convert this 2-D array into a 1-D array
- 2) Change the dimension to 2,5

In [41]:
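# Note: reshape(1,-1) gives shape (1, 10), which is still 2-D (one row).
# For a true 1-D array of shape (10,), use n_arr.reshape(-1) instead.
print(n_arr.reshape(1,-1))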
print(n_arr.reshape(2,5))


[[ 2.31584178e+077  2.31584178e+077 -3.95252517e-323  0.00000000e+000
   5.45352918e-312  1.26480805e-321  0.00000000e+000  0.00000000e+000
   1.77229088e-310  3.50977866e+064]]
[[ 2.31584178e+077  2.31584178e+077 -3.95252517e-323  0.00000000e+000
   5.45352918e-312]
 [ 1.26480805e-321  0.00000000e+000  0.00000000e+000  1.77229088e-310
   3.50977866e+064]]
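
For the first part of the exercise, it is worth checking `ndim` on the result, because a shape of `(1, 10)` is still 2-D. The sketch below shows three ways to get a true 1-D array. It uses a small `arange` array (an illustrative stand-in for `n_arr`) so the values are predictable.

```python
import numpy as np

arr_2d = np.arange(10).reshape(5, 2)

flat_a = arr_2d.reshape(-1)       # 1-D, shape (10,)
flat_b = arr_2d.ravel()           # 1-D, avoids copying when it can
flat_c = arr_2d.flatten()         # 1-D, always a copy
still_2d = arr_2d.reshape(1, -1)  # shape (1, 10): one row, but 2-D

print(flat_a.shape, flat_a.ndim)
print(still_2d.shape, still_2d.ndim)
```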


<br/><br/><br/><br/>

### Other utility functions

__Using numpy.zeros(...)__

In [42]:
n_arr = np.zeros(5)
n_arr

array([0., 0., 0., 0., 0.])

In [43]:
arr = np.zeros((5,2))
arr

array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]])

In [44]:
arr = np.zeros((5,2)).reshape(10,1)
arr

array([[0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.]])

<br/><br/>

__Using np.ones(...)__

In [45]:
arr = np.ones(5)
arr

array([1., 1., 1., 1., 1.])

In [46]:
arr = np.ones((5,2))
arr

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

In [47]:
arr = np.ones((5,2)).reshape(1,10)
arr

array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])


<br/>
<br/>

In [48]:
arr.fill(23)

In [49]:
arr

array([[23., 23., 23., 23., 23., 23., 23., 23., 23., 23.]])
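
As an alternative to creating an array with `np.ones` and then calling `fill`, `np.full` builds an array of a given shape with a chosen value in one step. A short sketch comparing the two approaches, with illustrative variable names:

```python
import numpy as np

# Two steps: allocate with ones, then overwrite in place.
arr = np.ones((1, 10))
arr.fill(23)

# One step: create the array already filled with the value.
filled = np.full((1, 10), 23.0)

print(filled)
print(np.array_equal(arr, filled))
```

Note that `fill` modifies the array in place and returns `None`, so its result should not be assigned back to the variable.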