# Numpy Tutorial

Let's start our journey into machine learning by getting to know **numpy**, the core Python package for numerical computing that many machine learning libraries build on. This package is dedicated to **operations on arrays and matrices** and makes it easy to create, initialize and operate on arrays. It is preferred for the efficiency and performance it brings to code, which we will explore in the later part of this tutorial. So, let's start by simply creating arrays and doing some basic operations.

## One-Dimensional Arrays

```python
import numpy as np

# Creating a basic numpy array with integer elements
array_1d = np.array([1, 2, 3, 10, 5, 6], dtype=int)

# Every ndarray carries the following properties:
print("Array is " + str(array_1d))
print("Number of dimensions (ndim) is %d" % array_1d.ndim)
print("Size of array %d" % array_1d.size)
print("Data type of array elements %s" % str(array_1d.dtype))
print("Shape of array %s" % str(array_1d.shape))
```

```
Array is [ 1  2  3 10  5  6]
Number of dimensions (ndim) is 1
Size of array 6
Data type of array elements int64
Shape of array (6,)
```


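Here, we saw four properties of numpy arrays:
1. Number of dimensions - `ndarray.ndim` gives the number of axes of the array (1 for a vector, 2 for a matrix). Older numpy documentation calls this the "rank" of the array, but it is not the linear-algebra rank of a matrix (the number of linearly independent columns), which is computed by `np.linalg.matrix_rank`
2. Size of the array - `ndarray.size` gives the total number of elements in the array
3. Data type - `ndarray.dtype` gives the data type of the array elements; the default integer type is platform dependent, so you may see `int32` instead of `int64`
4. Shape - `ndarray.shape` gives a tuple with the length of each axis, i.e. `(r, c)` (rows, columns) for a 2-D array and `(n,)` for a 1-D array of `n` elements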
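
The sketch below contrasts `ndim` with the linear-algebra rank computed by `np.linalg.matrix_rank`. The 2x2 matrix is an illustrative assumption, not part of the original tutorial: its second row is twice the first, so it has two axes but only one linearly independent row.

```python
import numpy as np

# A 2x2 matrix whose second row is twice the first (rows are linearly dependent)
m = np.array([[1, 2],
              [2, 4]])

print(m.ndim)                    # 2 -> number of axes
print(np.linalg.matrix_rank(m))  # 1 -> linear-algebra rank
print(m.shape)                   # (2, 2) -> rows, columns
print(m.size)                    # 4 -> total number of elements
```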

## Basic Operations on 1-D Arrays

### 1. Sum, Product, Mean, Variance and Standard Deviation of Array Elements

numpy provides built-in methods that calculate the sum, product, mean, variance and standard deviation of array elements: `array.sum()`, `array.prod()`, `array.mean()`, `array.var()` and `array.std()`. By default, `var()` and `std()` compute the population statistics, dividing by the number of elements N. Let's try these out:

```python
print("Sum of elements of array is %d" % array_1d.sum())
print("Product of elements of array is %d" % array_1d.prod())

# Mean, variance and standard deviation are floats, so format them with %.2f
# (%d would silently truncate the mean 4.5 to 4)
print("Mean of elements of array is %.2f" % array_1d.mean())
print("Variance of elements of array is %.2f" % array_1d.var())
print("Standard deviation of elements of array is %.2f" % array_1d.std())
```

```
Sum of elements of array is 27
Product of elements of array is 1800
Mean of elements of array is 4.50
Variance of elements of array is 8.92
Standard deviation of elements of array is 2.99
```
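
`var()` and `std()` accept a `ddof` ("delta degrees of freedom") argument that changes the divisor from N to N - ddof. A small sketch, using the same values as `array_1d`, that checks both variants against the textbook definition; `ddof=1` gives the sample variance often used in statistics:

```python
import numpy as np

a = np.array([1, 2, 3, 10, 5, 6])

pop_var = a.var()            # ddof=0 (default): divides by N = 6
sample_var = a.var(ddof=1)   # ddof=1: divides by N - 1 = 5 (sample variance)

# The same quantities computed from the definition
squared_dev_sum = ((a - a.mean()) ** 2).sum()       # 53.5 for this array
print(pop_var, squared_dev_sum / a.size)            # both approx. 8.9167
print(sample_var, squared_dev_sum / (a.size - 1))   # both 10.7

# std is the square root of var computed with the same ddof
print(np.isclose(a.std(ddof=1), np.sqrt(sample_var)))  # True
```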


### 2. Searching, Sorting and Manipulating Elements

Numpy makes it easy to search for elements matching a condition, sort elements and manipulate the elements of an array (an `ndarray` in numpy terms). Its vectorized functions achieve these goals without the explicit `for` loops of traditional Python code. Let's explore these features one by one:

```python
### Searching for elements matching a condition
print("Searching for elements in array")
sel_criteria = (array_1d > 5)     # boolean mask: True where the element is greater than 5
print(array_1d[sel_criteria])     # boolean indexing keeps only the elements greater than 5

print()

### Sorting array elements
print("Sorting array elements")
print(np.sort(array_1d))          # np.sort returns a sorted copy as an ndarray

print()

### Manipulating the elements of array
# Task: square the odd elements and cube the even elements.
# np.where(condition, x, y) takes x[i] where condition[i] is True, else y[i].
# Both array_1d**2 and array_1d**3 are computed for every element before selecting.
print("Manipulating the elements of array")
print("Original array is    : " + str(array_1d))
print("Manipulated array is : " + str(np.where(array_1d % 2 != 0, array_1d**2, array_1d**3)))
```

```
Searching for elements in array
[10  6]

Sorting array elements
[ 1  2  3  5  6 10]

Manipulating the elements of array
Original array is    : [ 1  2  3 10  5  6]
Manipulated array is : [   1    8    9 1000   25  216]
```
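
Searching and sorting can also return positions rather than values. A minimal sketch, using a separate array `values` with the same contents as `array_1d`: `np.where` with only a condition returns the indices where it holds, `np.argsort` returns the indices that would sort the array, and `ndarray.sort()` sorts in place.

```python
import numpy as np

values = np.array([1, 2, 3, 10, 5, 6])

# With only a condition, np.where returns a tuple of index arrays (one per axis)
positions = np.where(values > 5)[0]
print(positions)          # [3 5]

# argsort returns the indices that would sort the array
order = np.argsort(values)
print(order)              # [0 1 2 4 5 3]
print(values[order])      # [ 1  2  3  5  6 10]

# ndarray.sort() sorts in place and returns None, unlike np.sort, which returns a copy
values.sort()
print(values)             # [ 1  2  3  5  6 10]
```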


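### 3. Basic Operations between Arrays

With numpy you can add, subtract, multiply and divide arrays using the basic arithmetic operators. These operations are element-wise, so both arrays must have the same shape (or shapes compatible under numpy's broadcasting rules). Let's try this out: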

```python
array_1d_another = np.array([3, 4, 5, 6, 80, 400], dtype=int)

print("The two arrays are: " + str(array_1d) + " and " + str(array_1d_another))
print()

print("Sum of two arrays")
print(array_1d + array_1d_another)
print()

print("Difference of two arrays")
print(array_1d - array_1d_another)
print()

print("Product of two arrays")    # element-wise product, not matrix multiplication
print(array_1d * array_1d_another)
print()

print("Division of two arrays")   # `/` is true division, so the result is always a float array
print(array_1d / array_1d_another)
```

```
The two arrays are: [ 1  2  3 10  5  6] and [  3   4   5   6  80 400]

Sum of two arrays
[  4   6   8  16  85 406]

Difference of two arrays
[  -2   -2   -2    4  -75 -394]

Product of two arrays
[   3    8   15   60  400 2400]

Division of two arrays
[ 0.33333333  0.5         0.6         1.66666667  0.0625      0.015     ]
```
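
Element-wise arithmetic also works between an array and a scalar, because numpy broadcasts the scalar to every element. A short sketch using the same two arrays under new names: `//` performs floor division and keeps the integer dtype, and shapes that cannot be broadcast raise a `ValueError`.

```python
import numpy as np

a = np.array([1, 2, 3, 10, 5, 6])
b = np.array([3, 4, 5, 6, 80, 400])

# A scalar is broadcast to every element
doubled = a * 2
print(doubled)        # [ 2  4  6 20 10 12]

# // is floor division and keeps the integer dtype, unlike /
quotients = b // a
print(quotients)      # [ 3  2  1  0 16 66]

# Arrays whose shapes cannot be broadcast together raise a ValueError
try:
    a + np.array([1, 2, 3])
except ValueError as err:
    print("ValueError:", err)
```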


### 4. Advanced Operations on Arrays

Apart from these, you can also perform more advanced operations on numpy arrays. These include, but are not limited to:
1. Cloning an array - make an independent copy of an array
2. Reshaping an array - change the shape of an array while keeping the same number of elements, e.g. convert a 1-D array into a 2-D array or change the (r, c) dimensions of a 2-D array
3. Creating initialized arrays - create arrays filled with zeros, ones or random values


Let's explore how we can do this in numpy:

```python
## Cloning an array: copy() returns a new array with its own data
array_1d_clone = array_1d.copy()
print(str(array_1d_clone) + " is the clone of " + str(array_1d))

print()

## Reshaping an array into a matrix with 2 rows and 3 columns
## (the new shape must hold the same number of elements, here 2 * 3 = 6)
array_1d_reshaped = array_1d.reshape(2, 3)
print("Reshaped array of " + str(array_1d) + " with dimensions (2,3) is")
print(str(array_1d_reshaped))

print()

## Creating an array of all zeros
array_zeroes = np.zeros(10, dtype=int)
print("Array of all zeros with 10 elements is " + str(array_zeroes))

print()

## Creating an array of all ones
array_ones = np.ones(6, dtype=int)
print("Array of all ones with 6 elements is " + str(array_ones))

print()

## Creating an array of random values
## The legacy np.random functions use the Mersenne Twister (MT19937) generator.
## Unless you seed it with np.random.seed(), it is seeded from the operating system,
## so the values differ on every run. np.random.rand() draws uniform samples
## from the half-open interval [0.0, 1.0).
array_random_default_range = np.random.rand(5)   # rand(d0, d1, ...): one int per dimension, e.g. rand(2, 3) for 2-D
print("Random array with 5 elements is " + str(array_random_default_range))

print()

## Generating a random integer: randint(low, high) samples from [low, high),
## i.e. from 10 up to and including 999
random_integer = np.random.randint(10, 1000)
print("A random integer in the range [10, 1000) is " + str(random_integer))
```

```
[ 1  2  3 10  5  6] is the clone of [ 1  2  3 10  5  6]

Reshaped array of [ 1  2  3 10  5  6] with dimensions (2,3) is
[[ 1  2  3]
 [10  5  6]]

Array of all zeros with 10 elements is [0 0 0 0 0 0 0 0 0 0]

Array of all ones with 6 elements is [1 1 1 1 1 1]

Random array with 5 elements is [ 0.03561329  0.0201354   0.65158174  0.72512637  0.87489558]

A random integer in the range [10, 1000) is 724
```
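
Cloning matters because plain assignment does not copy, and `reshape` on a contiguous array usually returns a *view* that shares memory with the original. A minimal sketch on a separate array `demo` (so the tutorial's `array_1d` is left untouched) showing that writing through a reshaped view changes the original while a `copy()` stays independent:

```python
import numpy as np

demo = np.array([1, 2, 3, 10, 5, 6])

alias = demo                   # plain assignment: no copy, just a second name
clone = demo.copy()            # independent copy with its own data
reshaped = demo.reshape(2, 3)  # for a contiguous array this is a view on the same data

reshaped[0, 0] = 99            # writing through the view changes the original
print(demo)                    # [99  2  3 10  5  6]
print(alias is demo)           # True
print(clone)                   # [ 1  2  3 10  5  6]
print(np.shares_memory(reshaped, demo))  # True

# -1 asks numpy to infer that axis length from the total size
print(demo.reshape(3, -1).shape)  # (3, 2)

# A shape with a different number of elements is rejected
try:
    demo.reshape(4, 2)
except ValueError as err:
    print("ValueError:", err)
```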

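The random cell above is unseeded, so its values change on every run. A minimal sketch of reproducible random numbers, assuming numpy 1.17 or later for `np.random.default_rng`; the seed value 42 is arbitrary. `Generator.random` samples from [0.0, 1.0) and `Generator.integers` excludes the upper bound by default, matching `rand` and `randint`.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

uniform_a = rng_a.random(5)               # floats in [0.0, 1.0)
uniform_b = rng_b.random(5)
print(np.array_equal(uniform_a, uniform_b))  # True

ints = rng_a.integers(10, 1000, size=3)   # ints in [10, 1000), upper bound excluded
print(ints)

# The legacy global API can also be made reproducible with np.random.seed
np.random.seed(42)
first = np.random.rand(3)
np.random.seed(42)
second = np.random.rand(3)
print(np.array_equal(first, second))  # True
```
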

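## A Note on Performance

**numpy** optimizes array and matrix calculations to a great extent: an operation such as `np_array1 - np_array2` runs as a single loop in compiled code over typed values stored in a contiguous buffer, instead of executing Python bytecode for every element. To see this, let's compare a __plain Python for loop__ with the __numpy way of subtracting two lists__ and note the time taken in both cases.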

```python
import time

list1 = [i for i in range(1000)]   # avoid naming a variable `list`, which shadows the built-in
list2 = [i for i in range(1000, 2000)]

np_array1 = np.array(list1)
np_array2 = np.array(list2)

# Element-wise subtraction with a plain Python loop
st = time.time()
list3 = []
for i in range(len(list1)):
    list3.append(list1[i] - list2[i])
en = time.time()
print("Time taken by Python loop: %f s" % (en - st))

# The same subtraction as a single vectorized numpy operation
st1 = time.time()
np_array3 = np_array1 - np_array2
en1 = time.time()
print("Time taken by numpy: %f s" % (en1 - st1))

print("Are both outputs equal: " + str(np_array3.tolist() == list3))
```

```
Time taken by Python loop: 0.000412 s
Time taken by numpy: 0.000075 s
Are both outputs equal: True
```


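In this run, the numpy subtraction took about 5.5 times less time than the Python loop (0.000075 s vs 0.000412 s). A single measurement on 1000 elements is noisy and machine dependent, but the advantage typically grows with array size, because numpy's fixed per-call overhead is spread over more elements.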
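
A single `time.time()` measurement is easily disturbed by other activity on the machine. A sketch of a steadier comparison with the standard-library `timeit` module, under assumptions of our own: a larger input of 100,000 elements, and a list comprehension over `zip` for the Python side (usually somewhat faster than the `append` loop above). It reports the best of several repeats, which filters out much of the noise.

```python
import timeit
import numpy as np

n = 100_000  # larger than the 1000 elements above, so fixed overheads matter less
list1 = list(range(n))
list2 = list(range(n, 2 * n))
arr1 = np.array(list1)
arr2 = np.array(list2)

def loop_subtract():
    # Plain Python: one interpreted iteration per element
    return [a - b for a, b in zip(list1, list2)]

def numpy_subtract():
    # Vectorized: a single call, the loop runs in compiled code
    return arr1 - arr2

# Run each function 5 times per repeat, keep the best of 3 repeats, report time per run
loop_time = min(timeit.repeat(loop_subtract, number=5, repeat=3)) / 5
numpy_time = min(timeit.repeat(numpy_subtract, number=5, repeat=3)) / 5

print("Python loop: %.6f s per run" % loop_time)
print("numpy:       %.6f s per run" % numpy_time)
print("Speed-up:    %.1fx" % (loop_time / numpy_time))
print("Same result:", numpy_subtract().tolist() == loop_subtract())
```

The exact speed-up depends on the machine and numpy version, so no figure is given here; run it locally to compare.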