
# Understanding / Cheatsheet for NumPy
---
Hopefully this will be a good guide / cheatsheet for understanding NumPy.

Documentation :-

*  Numpy :- [Docs](https://numpy.org/) [Cheatsheet](http://datacamp-community-prod.s3.amazonaws.com/ba1fe95a-8b70-4d2f-95b0-bc954e9071b0)
*  Routines :- [Cheatsheet](https://numpy.org/doc/stable/reference/routines.html)

## Numpy arrays vs Python Lists
---

At first glance, numpy arrays and python lists look very similar, but they behave quite differently, and the difference in performance can be large. Let's run a piece of code that performs a simple element-wise addition of two vectors and compare the performance of the two.

In [None]:
import time
import numpy as np

size_of_vec = 1000

def pure_python_version():
    t1 = time.time()
    X = range(size_of_vec)
    Y = range(size_of_vec)
    Z = [X[i] + Y[i] for i in range(len(X)) ]
    return time.time() - t1

def numpy_version():
    t1 = time.time()
    X = np.arange(size_of_vec)
    Y = np.arange(size_of_vec)
    Z = X + Y
    return time.time() - t1


t1 = pure_python_version()
t2 = numpy_version()
print("Performance of List:- ", t1)
print("Performance of Numpy:- ", t2)
print("Numpy in this example is :- " + str(t1/t2) + " times faster!")

Performance of List:-  0.0002598762512207031
Performance of Numpy:-  0.00022602081298828125
Numpy in this example is :- 1.149789029535865 times faster!
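
A single `time.time()` measurement on a 1000-element vector is noisy, which is probably part of why the run above shows only a small gap. Here is a hedged sketch of a steadier comparison using `timeit` from the standard library. The size `n` and the repeat counts are arbitrary choices, not values from the original notebook.

```python
import timeit
import numpy as np

n = 100_000  # assumed size; larger vectors make the gap easier to see

# Build the inputs once so that only the addition itself is timed.
list_x, list_y = list(range(n)), list(range(n))
arr_x, arr_y = np.arange(n), np.arange(n)

def add_lists():
    # Element-by-element addition in pure Python.
    return [x + y for x, y in zip(list_x, list_y)]

def add_arrays():
    # Vectorised addition performed inside NumPy.
    return arr_x + arr_y

# Take the best of several repeats to reduce measurement noise.
list_time = min(timeit.repeat(add_lists, number=10, repeat=5))
numpy_time = min(timeit.repeat(add_arrays, number=10, repeat=5))

print("List :", list_time)
print("NumPy:", numpy_time)
print("Ratio:", list_time / numpy_time)
```

The exact ratio depends on the machine, so no expected numbers are given here.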


## Understanding the difference
---

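In this run numpy came out only about 1.15 times faster, most likely because the vector is small (1000 elements) and fixed overheads dominate a single timing. The gap usually grows much wider as the arrays get larger. Let's understand why numpy arrays are faster.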

![Reason.png](https://drive.google.com/uc?export=view&id=12g04Lx5N6coYC9F16QZp-jdILuzvbHCc)

As you can see in the image above, a numpy array stores only the int values themselves. A list, by contrast, stores a lot of additional metadata for every element, so the python interpreter has to follow references to values stored at various memory locations, which increases the total time needed for even a simple operation. This is a very basic explanation of why numpy arrays are faster than lists. If you want a more in-depth explanation, [click here](https://python.plainenglish.io/python-list-vs-numpy-array-whats-the-difference-7308cd4b52f6).
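
The storage difference described above can be observed roughly from Python itself. This is only a sketch, assuming CPython, where `sys.getsizeof` reports the size of the list's pointer table and of each separate int object. The exact byte count for the list varies by Python version and platform, so it is printed rather than stated.

```python
import sys
import numpy as np

values = list(range(1000, 2000))
arr = np.array(values, dtype=np.int64)

# A list stores pointers to separate int objects, each with its own header.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# A numpy array stores the raw 8-byte integers in one contiguous buffer.
array_bytes = arr.nbytes

print("Approximate bytes used by the list :", list_bytes)
print("Bytes used by the array data       :", array_bytes)
```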

## Getting Started
---

Let's get our hands dirty with numpy by initializing a numpy array.

In [None]:
import numpy as np

In [None]:
a = np.array([1,2,3])           # Call numpy's array() function and pass in a list of values.

a

array([1, 2, 3])

In [None]:
one_dimention_array = np.array([1,2,3])                         # Initialising 1D array.
two_dimention_array = np.array([[1,2,3],[4,5,6]])               # Initialising 2D array.

print("1D array:- \n", one_dimention_array)
print("\n2D array:- \n", two_dimention_array)

1D array:- 
 [1 2 3]

2D array:- 
 [[1 2 3]
 [4 5 6]]


## Checking the dimensions of an array
---

In [None]:
dim_1 = one_dimention_array.ndim                            # Using ndim you can check the number of dimensions of a numpy array.
dim_2 = two_dimention_array.ndim

print("Dimensions of one_dimention_array array :- ", dim_1)
print("Dimensions of two_dimention_array array :- ", dim_2)

Dimensions of one_dimention_array array :-  1
Dimensions of two_dimention_array array :-  2


## Checking for shape of an array
---

Shape -> (number of rows, number of columns) for a 2D array.

**Please Note**  
*  *shape* is an attribute, not a function, so it is written without parentheses.
*  A 1 dim array has only one axis, so its shape is a one-element tuple such as (3,). It is neither a row nor a column; the trailing comma is just how Python writes a tuple with a single element.

In [None]:
shape_1 = one_dimention_array.shape                         # Use the shape attribute to get the shape of a numpy array.
shape_2 = two_dimention_array.shape

print("Shape of one_dimention_array :- ", shape_1)
print("Shape of two_dimention_array :- ", shape_2)

Shape of one_dimention_array :-  (3,)
Shape of two_dimention_array :-  (2, 3)
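
To make the difference between a 1D shape and a 2D shape concrete, here is a small sketch comparing a 1D array with explicit row and column versions of the same data. The variable names are illustrative only.

```python
import numpy as np

flat = np.array([1, 2, 3])        # 1D: a single axis of length 3
row = flat.reshape(1, 3)          # 2D: 1 row, 3 columns
column = flat.reshape(3, 1)       # 2D: 3 rows, 1 column

print(flat.shape)    # (3,)
print(row.shape)     # (1, 3)
print(column.shape)  # (3, 1)
```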


## Getting data type and memory values
---

In [None]:
a = np.array([10,20,30])
b = np.array([[10,20,30],[40,50,60]])

In [None]:
a.dtype         # Getting the datatype of numpy array. For integers the default is usually int64 (it can be int32 on some platforms).

dtype('int64')

In [None]:
a = np.array([10,20,30], dtype='int16')     # You can also set the datatype yourself if you know the range of your data.
                                            # You can also try int32, float16, float32 and many more.
a.dtype

dtype('int16')

In [None]:
a = np.array([10,20,30], dtype='int32')                                 # Try changing the datatype to see how much memory is allocated.
memory_used_a = a.itemsize                      
memory_used_b = b.itemsize

print("Memory used to store 1 element in a is :- " + str(memory_used_a) + " Bytes")        # Since a array is of datatype int32 it takes 4 bytes to store one element.    
print("Memory used to store 1 element in b is :- " + str(memory_used_b) + " Bytes")        # Since b array is of datatype int64 it takes 8 bytes to store one element.
print("Number of elements in a :- " + str(a.size))                      # Simply returns total number of elements.
print("Number of elements in b :- " + str(b.size))

Memory used to store 1 element in a is :- 4 Bytes
Memory used to store 1 element in b is :- 8 Bytes
Number of elements in a :- 3
Number of elements in b :- 6


**Please Note**  
Don't get confused between *itemsize* and *size*. *itemsize* returns the **memory allocated for one element** (in bytes), whereas *size* returns the number of elements in the specified array.

In [None]:
# To get total memory allocated 

total_memory_a = a.nbytes
total_memory_b = b.nbytes

print("Total memory used by array a is :- " + str(total_memory_a) + " bytes")       # 3 elements * 4 bytes = 12 bytes.
print("Total memory used by array b is :- " + str(total_memory_b) + " bytes")       # 6 elements * 8 bytes = 48 bytes.

Total memory used by array a is :- 12 bytes
Total memory used by array b is :- 48 bytes
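
The relationship between the three attributes can be checked directly: *nbytes* equals *size* times *itemsize*. A minimal sketch that recreates arrays like `a` and `b` from above, with the dtypes written out explicitly so the result does not depend on the platform default:

```python
import numpy as np

a = np.array([10, 20, 30], dtype='int32')
b = np.array([[10, 20, 30], [40, 50, 60]], dtype='int64')

for name, arr in (("a", a), ("b", b)):
    # Total bytes = number of elements * bytes per element.
    print(name, ":-", arr.size, "*", arr.itemsize, "=", arr.size * arr.itemsize, "bytes")
    assert arr.nbytes == arr.size * arr.itemsize
```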


## Accessing and Changing specific element / row / column
---

[Cheatsheet](https://docs.scipy.org/doc/numpy-1.13.0/user/basics.indexing.html)

In [None]:
a = np.array([[1,2,3,4,5,6,7],[8,9,10,11,12,13,14]])            # Initializing 2D array (2 rows x 7 columns).
print("Array :- \n", a)
print ("Shape of a :- ", a.shape)

Array :- 
 [[ 1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14]]
Shape of a :-  (2, 7)


In [None]:
# Accessing specific element. (Try to access '13').
num_1 = a[1,5]                 # array_name[row,column].
num_2 = a[1,-2]                # You can also use negative indexing to access the same value.

print(num_1, num_2)

13 13


In [None]:
# Accessing specific row.
a[0,:]                         # Use the slice notation (:) to indicate that we need all the columns.

array([1, 2, 3, 4, 5, 6, 7])

In [None]:
# Accessing specific column.
a[:,3]

array([ 4, 11])

In [None]:
# Getting a bit fancy.  
a[0,1:6:2]                     # Similar to lists, you can use [startindex:endindex:step] notation to get specific values.

array([2, 4, 6])

In [None]:
# Changing element.
a[1,5] = 20                    # Just set the element to whatever you want.
a                              # You can do the same for entire row / column. 

array([[ 1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 20, 14]])
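
As the comment above mentions, whole rows and columns can be assigned the same way. A short sketch on a fresh array (named `grid` here, an illustrative name, so that `a` is left untouched):

```python
import numpy as np

grid = np.array([[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14]])

grid[:, 2] = 0                           # Set every value in column 2 to 0.
grid[0, :] = [7, 6, 5, 4, 3, 2, 1]       # Replace row 0; the length must match.

print(grid)
# [[ 7  6  5  4  3  2  1]
#  [ 8  9  0 11 12 13 14]]
```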

In [None]:
# 3D Example.
b = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
b

array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])

In [None]:
# Accessing '4'. (Work your way from outer layer to inner layer).
b[0,1,1]

4

In [None]:
# Replacing is similar to previous one.
b[0,1,1] = 9
b

array([[[1, 2],
        [3, 9]],

       [[5, 6],
        [7, 8]]])

## Initializing different types of array
---

Array Creation routine:- [Cheatsheet](https://numpy.org/doc/stable/reference/routines.array-creation.html)

In [None]:
# Array containing all 0s.
one_d = np.zeros(5)
two_d = np.zeros((2,3))
three_d = np.zeros((2,3,3))

print("1D :- \n" + str(one_d) + "\n\n")
print("2D :- \n" + str(two_d) + "\n\n")
print("3D :- \n" + str(three_d))

1D :- 
[0. 0. 0. 0. 0.]


2D :- 
[[0. 0. 0.]
 [0. 0. 0.]]


3D :- 
[[[0. 0. 0.]
  [0. 0. 0.]
  [0. 0. 0.]]

 [[0. 0. 0.]
  [0. 0. 0.]
  [0. 0. 0.]]]


In [None]:
# Array containing all 1s.
np.ones((2,3,3), dtype='float32')

array([[[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]], dtype=float32)

In [None]:
# Array with specific number.

np.full((3,3), 69)              # To match an existing array, pass its shape instead of (3,3).
                                # Usage :- np.full(a.shape, 69).

array([[69, 69, 69],
       [69, 69, 69],
       [69, 69, 69]])
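
When the goal is to match an existing array, `np.full_like` takes the array itself and copies its shape and dtype. A small sketch; `template` and the fill value 7 are just illustrative choices:

```python
import numpy as np

template = np.array([[1, 2, 3], [4, 5, 6]])

same_shape = np.full(template.shape, 7)   # Pass the shape tuple explicitly.
same_like = np.full_like(template, 7)     # Pass the array; shape and dtype are copied.

print(same_shape)
print(same_like)
```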

In [None]:
# Array with random numbers.
np.random.rand(4,2,3)          # If you want to pass a shape use np.random.random_sample(a.shape).

array([[[0.610639  , 0.70777587, 0.53239181],
        [0.59905262, 0.68361608, 0.14863471]],

       [[0.76555806, 0.04039051, 0.19345001],
        [0.98565717, 0.6436131 , 0.73355561]],

       [[0.60797998, 0.95085854, 0.19662676],
        [0.74343957, 0.63057859, 0.38940108]],

       [[0.99390018, 0.25637464, 0.56332034],
        [0.3177808 , 0.20375979, 0.16547644]]])

In [None]:
# Random integer values.
np.random.randint(0, 7, size=(3,3))         # np.random.randint(start(inclusive),end(exclusive), size='<shape of your matrix>').

array([[4, 2, 1],
       [5, 3, 6],
       [0, 2, 3]])
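
Newer numpy code often uses the `Generator` interface from `np.random.default_rng` instead of the module-level functions. A sketch of the equivalent calls; the seed value 42 is an arbitrary choice to make the run repeatable:

```python
import numpy as np

rng = np.random.default_rng(42)               # Seeded generator for repeatable results.

floats = rng.random((4, 2, 3))                # Uniform floats in [0, 1), shape given as a tuple.
ints = rng.integers(0, 7, size=(3, 3))        # Integers from 0 (inclusive) to 7 (exclusive).

print(floats.shape)
print(ints)
```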

In [None]:
# Identity matrix
np.identity(3)                              # Since an identity matrix is always square, it only requires 1 parameter.

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

## Maths with numpy
---
Mathematical Functions:- [Cheatsheet](https://numpy.org/doc/stable/reference/routines.math.html)

In [None]:
a = np.array([1,2,3,4])
a

array([1, 2, 3, 4])

In [None]:
# Adds 2 to each element
a + 2

array([3, 4, 5, 6])

In [None]:
# Subtracts 2 from each element
a - 2

array([-1,  0,  1,  2])

In [None]:
# Multiplies each element by 2
a * 2

array([2, 4, 6, 8])

In [None]:
# Divides each element by 2
a / 2

array([0.5, 1. , 1.5, 2. ])

In [None]:
# Element-wise addition of two arrays
b = np.array([1,0,1,0])                 
a + b

array([2, 2, 4, 4])

In [None]:
# Each element to the power 2
a ** 2

array([ 1,  4,  9, 16])
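
The scalar operations above are a simple case of broadcasting, where numpy stretches the smaller operand to match the larger one. A sketch with a 2D array and a 1D array (the names are illustrative):

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])
offsets = np.array([10, 20, 30])

# The 1D array is applied to every row of the 2D array.
result = matrix + offsets
print(result)
# [[11 22 33]
#  [14 25 36]]
```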

## Linear Algebra with numpy
---
Linear Algebra:- [Cheatsheet](https://numpy.org/doc/stable/reference/routines.linalg.html)  
There is a lot more you can do. Please check out the cheatsheet for more functions; it is highly recommended.

Things it can do:-  
*  Determinants
*  Trace
*  Singular Value Decomposition
*  Eigen values / Eigen vectors
*  Matrix Norm
*  Inverse Matrix
*  And much more...

In [None]:
# Creating 2 arrays.
a = np.ones((2,3))
print("A :- \n", a)

b = np.full((3,2), 2)
print(" B:- \n", b)

A :- 
 [[1. 1. 1.]
 [1. 1. 1.]]
 B:- 
 [[2 2]
 [2 2]
 [2 2]]


In [None]:
# Matrix Multiplication. (Uses matrix multiplication rules)
np.matmul(a,b)              # multiplying a with b

array([[6., 6.],
       [6., 6.]])

In [None]:
# Finding determinant.
c = np.identity(3)
np.linalg.det(c)

1.0

In [None]:
# Getting the inverse of a matrix.
d = np.array([[2,4],[8,4]])

np.linalg.inv(d)

array([[-0.16666667,  0.16666667],
       [ 0.33333333, -0.08333333]])
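
`np.linalg.inv` returns the inverse of `d`. For eigen values and eigen vectors, `np.linalg.eig` is the usual call. A sketch that checks both results numerically instead of relying on printed values:

```python
import numpy as np

d = np.array([[2, 4], [8, 4]])

eigenvalues, eigenvectors = np.linalg.eig(d)
print("Eigen values :- ", eigenvalues)

# Each column v of eigenvectors satisfies d @ v = eigenvalue * v.
for value, vector in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(d @ vector, value * vector)

# A matrix times its inverse gives the identity (up to rounding).
assert np.allclose(d @ np.linalg.inv(d), np.identity(2))
```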

## Statistics with numpy
---


In [None]:
stats = np.array([[1,2,3],[4,5,6]])
stats

array([[1, 2, 3],
       [4, 5, 6]])

In [None]:
# Finding min and max values.
min_value = np.min(stats)
max_value = np.max(stats)

print("Min Value :- ", min_value)
print("Max Value :- ", max_value)

Min Value :-  1
Max Value :-  6


In [None]:
# Sum of matrix.
total = np.sum(stats)                   # With only the array passed, it calculates the sum of all elements.
                                        # The optional second argument is the axis to sum along.
print("Sum of all the elements :- ", total)

Sum of all the elements :-  21
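
These reductions accept an `axis` argument to work along rows or columns instead of over the whole array. A sketch using the same `stats` values:

```python
import numpy as np

stats = np.array([[1, 2, 3], [4, 5, 6]])

column_sums = np.sum(stats, axis=0)   # Collapse the rows: one sum per column.
row_sums = np.sum(stats, axis=1)      # Collapse the columns: one sum per row.
row_max = np.max(stats, axis=1)       # Largest value in each row.

print(column_sums)  # [5 7 9]
print(row_sums)     # [ 6 15]
print(row_max)      # [3 6]
```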


## Reorganising Arrays
---

Cheatsheet :- [Link](https://numpy.org/doc/stable/reference/routines.array-manipulation.html)

In [None]:
# Reshaping the structure of numpy array.
before = np.array([[1,2,3,4],[5,6,7,8]])
print("Shape Before :- ", before.shape)
print("Array :- \n", str(before))

print("\n")

after = before.reshape((4,2))               # Reshape rearranges the same elements into another shape; the total count must stay the same.


print("Shape After :- ", after.shape)
print("Array :- \n", str(after))

Shape Before :-  (2, 4)
Array :- 
 [[1 2 3 4]
 [5 6 7 8]]


Shape After :-  (4, 2)
Array :- 
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]
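
Reshaping only works when the total number of elements stays the same. A sketch showing the `-1` placeholder, which lets numpy infer one dimension, and the error raised for an incompatible shape:

```python
import numpy as np

before = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

inferred = before.reshape(-1, 2)      # -1 means "work this dimension out": 8 / 2 = 4 rows.
print(inferred.shape)                 # (4, 2)

try:
    before.reshape((3, 3))            # 8 elements cannot fill a 3 x 3 array.
except ValueError as error:
    print("Reshape failed :- ", error)
```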


In [None]:
# Vertically Stacking two arrays.
v1 = np.array([1,2,3,4])
v2 = np.array([5,6,7,8])

np.vstack([v1,v2])

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [None]:
# You can also do this.
np.vstack([v1,v2,v1,v2]) 

array([[1, 2, 3, 4],
       [5, 6, 7, 8],
       [1, 2, 3, 4],
       [5, 6, 7, 8]])

In [None]:
# Horizontal Stacking.
h1 = np.ones((2,4))
h2 = np.zeros((2,2))

np.hstack([h1,h2])

array([[1., 1., 1., 1., 0., 0.],
       [1., 1., 1., 1., 0., 0.]])
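
For 2D inputs, the same results can be obtained from `np.concatenate` with an explicit `axis`, which makes the stacking direction easier to see. A sketch:

```python
import numpy as np

h1 = np.ones((2, 4))
h2 = np.zeros((2, 2))

side_by_side = np.concatenate([h1, h2], axis=1)   # Same result as np.hstack([h1, h2]).
stacked = np.concatenate([h1, h1], axis=0)        # Same result as np.vstack([h1, h1]).

print(side_by_side.shape)  # (2, 6)
print(stacked.shape)       # (4, 4)
```

The arrays must agree on every dimension except the one being joined, which is why `h1` and `h2` can be placed side by side but not on top of each other.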