# <center>**Numpy Introduction**</center>
---

<br>

<img src="./images/python_ecosystem.JPG" />

<img src="./images/python_etl_process.png" />

## **Numpy**
#### NumPy is short for Numerical Python. It is the fundamental package for high-performance scientific computing and data analysis in Python.

Some of what it provides:

- `ndarray`, a fast and space-efficient multidimensional array for large data sets.
- Vectorized arithmetic operations and sophisticated broadcasting capabilities.
- Standard mathematical functions that operate on entire arrays of data without explicit loops.
- Tools for reading and writing array data to disk and for working with memory-mapped files.
- Linear algebra, random number generation, and Fourier transform capabilities.


In [None]:
# To install numpy from a Jupyter notebook.
! pip install numpy

## Importing package

In [2]:
import numpy as np

### The NumPy ndarray : A Multidimensional object

A NumPy array is a grid of values, all of the `same type`, indexed by a tuple of nonnegative integers. 
Every array has 
- a shape: a tuple giving the size of each dimension
- a dtype: the data type of its elements

`np.array()` tries to infer a suitable dtype if none is given explicitly.
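
As a small illustration of the type inference described above (the variable names are chosen only for this sketch), `np.array` picks a dtype that can hold every element unless `dtype` is passed explicitly:

```python
import numpy as np

ints = np.array([1, 2, 3])                       # all integers -> an integer dtype (int32 or int64, platform dependent)
mixed = np.array([1, 2.5, 3])                    # a single float promotes the whole array to float64
forced = np.array([1, 2, 3], dtype=np.float32)   # an explicit dtype overrides the inference

print(ints.dtype, mixed.dtype, forced.dtype)
print(ints.shape, ints.ndim)                     # shape is a tuple, ndim is its length
```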

<img center src='./images/dtype-hierarchy.png' />

<img center src='./images/numpy-dtypespng.png' />

### Creating 1-D array

In [3]:
a1 = np.array([1,2,3,4,5])
print(a1)

[1 2 3 4 5]


In [4]:
a1.dtype

dtype('int32')

### Creating 2-D array

In [5]:
a2 = np.array([[1,2,3],[4,7,9]])
print(a2)

[[1 2 3]
 [4 7 9]]


### Creating a 2-D array filled with zeroes

In [8]:
a3 = np.zeros((10,5))
print(a3)
print(a3.shape)
print(a3.ndim)

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
(10, 5)
2


### Generating numbers in a range with a fixed step

In [1]:
import numpy as np
np.arange(10,47,5)

array([10, 15, 20, 25, 30, 35, 40, 45])

### Generating **n** evenly spaced numbers in a range

In [10]:
np.linspace(0,2,9)

array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])
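
The two generators differ in how they treat the end of the range: `np.arange` takes a step and excludes the stop value, while `np.linspace` takes a count and includes the stop value by default. A minimal sketch of the difference (the values are picked only for illustration):

```python
import numpy as np

a = np.arange(0, 2, 0.25)        # step given, stop value 2 is excluded
b = np.linspace(0, 2, 9)         # count given, stop value 2 is included
# endpoint=False excludes the stop value; retstep=True also returns the spacing
c, step = np.linspace(0, 2, 8, endpoint=False, retstep=True)

print(a)
print(b)
print(c, step)
```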

### Generating a random sample of given dimensions
**Note - Random numbers are drawn uniformly from the half-open interval [0, 1). Scale and shift them to get the required range.**

In [3]:
np.random.random_sample((3, 2)) 

array([[0.38362534, 0.13004178],
       [0.42160274, 0.9735344 ],
       [0.05711392, 0.95676993]])
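
The scaling mentioned in the note can be sketched as follows. The bounds `low` and `high` are hypothetical values chosen for illustration:

```python
import numpy as np

low, high = 5.0, 10.0                   # hypothetical target range
u = np.random.random_sample((3, 2))     # uniform samples in [0, 1)
scaled = low + (high - low) * u         # stretch by the width, then shift by the lower bound

print(scaled)
```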

### Inspecting dimensions of the array

In [12]:
a3.shape

(10, 5)

### Basic array operations
**Note - The operations are vectorised**


In [15]:
x = np.array([2,4,8,16])
y = np.array([1,1,0,1])

In [16]:
print(x + y)
print(x - y)
print(x * y)

[ 3  5  8 17]
[ 1  3  8 15]
[ 2  4  0 16]


In [17]:
z = np.array([[10,20,30,40],[100,200,300,400]])
y = np.array([1,1,0,1])
print(z + y)
print(z - y)
print(z * y)

[[ 11  21  30  41]
 [101 201 300 401]]
[[  9  19  30  39]
 [ 99 199 300 399]]
[[ 10  20   0  40]
 [100 200   0 400]]


### Matrix multiplication

In [19]:
s = np.array([1,2])
t = np.array([[10,20,30,40],[100,200,300,400]])
print(s.shape)
print(t.shape)
print(np.matmul(s,t))

(2,)
(2, 4)
[210 420 630 840]
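
In the cell above, the 1-D array `s` is treated as a row vector for the product, and the extra dimension is dropped from the result. The sketch below, using two small matrices made up for illustration, contrasts the matrix product with the element-wise `*` from the previous section; the `@` operator is equivalent to `np.matmul` for arrays like these:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 20], [30, 40]])

elementwise = A * B      # multiplies matching entries
matrix_prod = A @ B      # rows of A times columns of B, same as np.matmul(A, B)

print(elementwise)
print(matrix_prod)
```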


### Advanced array operations

In [18]:
a4 = np.random.random_sample(10)
print(a4)
print(np.min(a4))
print(np.max(a4))
print(np.sum(a4))
print(np.cumsum(a4))
print(np.sqrt(a4))
print(np.log(a4))

[0.10718708 0.50895302 0.87854341 0.30093105 0.45284027 0.58379396
 0.16543988 0.19068959 0.68889679 0.12977605]
0.10718708158331836
0.8785434137422441
4.007051106185404
[0.10718708 0.61614011 1.49468352 1.79561457 2.24845484 2.8322488
 2.99768868 3.18837827 3.87727505 4.00705111]
[0.32739438 0.71340944 0.93730647 0.54857183 0.67293407 0.76406411
 0.40674301 0.43668019 0.82999806 0.36024443]
[-2.23317955 -0.67539956 -0.12948995 -1.20087411 -0.79221583 -0.53820716
 -1.79914742 -1.65710836 -0.37266382 -2.04194499]


### Logical operations

In [9]:
a=np.array([1,7,9])
b=np.array([[1,2,9],[5,8,7]])

print(a==b)
print(b>4)

[[ True False  True]
 [False False False]]
[[False False  True]
 [ True  True  True]]
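
A boolean array produced by a comparison is commonly used as a mask. A minimal sketch, reusing the array `b` from the cell above:

```python
import numpy as np

b = np.array([[1, 2, 9], [5, 8, 7]])

mask = b > 4                      # boolean array with the same shape as b
selected = b[mask]                # 1-D array of the elements where mask is True
replaced = np.where(mask, b, 0)   # keep selected elements, replace the rest with 0

print(selected)
print(replaced)
print(mask.sum(), mask.any(), mask.all())   # number of True values, any True, all True
```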


### Subset and slicing

In [10]:
# Given the array c, get
# the second row
# the top-right 2 x 2 matrix
# the third column

c=np.array([[4,5,10],[5,10,15],[7,8,3],[4,6,9],[10,15,20]])
#print(c)
print(c[:,:])
print(c.shape)
print(c[0:2,1:3])
print(c[:,2])
#print(c[[0,1,2,3,4],[2,2,2,2,2]])
print(c[:,:])

[[ 4  5 10]
 [ 5 10 15]
 [ 7  8  3]
 [ 4  6  9]
 [10 15 20]]
(5, 3)
[[ 5 10]
 [10 15]]
[10 15  3  9 20]
[[ 4  5 10]
 [ 5 10 15]
 [ 7  8  3]
 [ 4  6  9]
 [10 15 20]]


In [20]:
a = np.array([1, 2, 3, 4, 5])
b = a[1:4]
b[0] = 200
print(a[1])

200
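
The cell above prints 200 because a basic slice is a view on the original data, not a copy. A short sketch of how to get an independent array with `.copy()` (the names `view` and `copy` are only illustrative):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

view = a[1:4]            # basic slice: shares memory with a
copy = a[1:4].copy()     # explicit copy: owns its own data

copy[0] = 200            # does not touch a
print(a[1])
view[0] = 300            # writes through to a
print(a[1])
print(np.shares_memory(a, view), np.shares_memory(a, copy))
```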


### Advantage of Numpy Arrays

In [19]:
import time
size_of_vec = 100000

In [23]:
#Add two big vectors
t1 = time.time()
X = range(size_of_vec)
Y = range(size_of_vec)
Z_list = []
for i in range(len(X)):
    Z_list.append(X[i] + Y[i])
time.time() - t1
#print(Z_list[:50])

0.15194392204284668

In [24]:
t1 = time.time()
X = np.arange(size_of_vec)
Y = np.arange(size_of_vec)
Z_numpy = X + Y
float(time.time() - t1)

0.004049062728881836

In [58]:
Z_list == Z_numpy

array([ True,  True,  True, ...,  True,  True,  True])
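
A single `time.time()` measurement can be noisy. As an alternative sketch, the standard-library `timeit` module repeats each operation several times; the helper names `add_lists` and `add_arrays` are made up for this example, and the timings depend on the machine:

```python
import timeit
import numpy as np

size_of_vec = 100000
X_list, Y_list = list(range(size_of_vec)), list(range(size_of_vec))
X_np, Y_np = np.arange(size_of_vec), np.arange(size_of_vec)

def add_lists():
    # element-by-element addition in pure Python
    return [x + y for x, y in zip(X_list, Y_list)]

def add_arrays():
    # one vectorized addition on whole arrays
    return X_np + Y_np

# best of 3 runs, each run averaging 10 calls
t_list = min(timeit.repeat(add_lists, number=10, repeat=3)) / 10
t_numpy = min(timeit.repeat(add_arrays, number=10, repeat=3)) / 10

print(f"list: {t_list:.6f} s   numpy: {t_numpy:.6f} s")
print(np.array_equal(add_lists(), add_arrays()))
```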

### Broadcasting
Broadcasting is a powerful mechanism that allows NumPy to work with arrays of different shapes 
when performing arithmetic operations. Frequently we have a smaller array and a larger array, 
and we want to use the smaller array multiple times to perform some operation on the larger array.

In [5]:
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1,0,1])
y = x + v  # Add v to each row of x using broadcasting
print(y) 

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]
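
Whether two arrays can be broadcast together depends on their shapes: NumPy compares the shapes from the last dimension backwards, and each pair of sizes must either be equal or contain a 1 (a missing leading dimension counts as 1). A minimal sketch with arrays invented for illustration:

```python
import numpy as np

x = np.ones((4, 3))
v = np.array([1, 0, 1])              # shape (3,), treated as (1, 3)
col = np.arange(4).reshape(4, 1)     # shape (4, 1)

print(np.broadcast(x, v).shape)      # (4, 3)
print(np.broadcast(x, col).shape)    # (4, 3)

try:
    x + np.array([1, 2])             # shape (2,) does not match the trailing size 3
except ValueError as err:
    print("incompatible shapes:", err)
```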


In [1]:
from IPython.display import Image
Image(url='http://scipy-lectures.github.io/_images/numpy_broadcasting.png',
     width=720)

In [5]:
import numpy as np
x = np.array([1, 2, 3, 4, 5, 6], dtype=np.int32)
y = x[::-1]
y

array([6, 5, 4, 3, 2, 1])

In [6]:
y.strides

(-4,)

In [21]:
x = np.zeros((10, 10, 10), dtype=np.float64)
x.strides
#x[::2,::3,::4].strides

(800, 80, 8)

In [8]:
a = np.arange(6, dtype=np.int8).reshape(3, 2)
b = a.T
b.strides

(1, 2)
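
The `strides` tuple gives the number of bytes to step in memory to reach the next element along each axis. A minimal sketch of how an index maps to a byte offset, using the same `int8` array as the cell above (the index `[2, 1]` is an arbitrary choice):

```python
import numpy as np

a = np.arange(6, dtype=np.int8).reshape(3, 2)   # itemsize is 1 byte
print(a.strides)        # (2, 1): the next row is 2 bytes away, the next column 1 byte
print(a.T.strides)      # (1, 2): transposing swaps the strides, no data is copied

# Byte offset of element [i, j] is i * strides[0] + j * strides[1]
i, j = 2, 1
offset = i * a.strides[0] + j * a.strides[1]
flat = a.ravel()        # flat view of the same buffer in C order
print(flat[offset // a.itemsize], a[i, j])
```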

In [None]:
x = np.array([1, 2, 3, 4], dtype=np.int16)
x2 = as_strided(x, strides=(0, 1*2), shape=(3, 4))
x2

In [10]:
from numpy.lib.stride_tricks import as_strided

In [11]:
y = np.array([5, 6, 7], dtype=np.int16)
y2 = as_strided(y, strides=(1*2, 0), shape=(3, 4))
y2

array([[5, 5, 5, 5],
       [6, 6, 6, 6],
       [7, 7, 7, 7]], dtype=int16)

In [14]:
x = np.array([1, 2, 3, 4], dtype=np.int16)
x2 = as_strided(x, strides=(0, 1*2), shape=(3, 4))
x2

array([[1, 2, 3, 4],
       [1, 2, 3, 4],
       [1, 2, 3, 4]], dtype=int16)

In [15]:
y = np.array([5, 6, 7], dtype=np.int16)
y2 = as_strided(y, strides=(1*2, 0), shape=(3, 4))
y2

array([[5, 5, 5, 5],
       [6, 6, 6, 6],
       [7, 7, 7, 7]], dtype=int16)

In [16]:
x2 + y2

array([[ 6,  7,  8,  9],
       [ 7,  8,  9, 10],
       [ 8,  9, 10, 11]], dtype=int16)

In [17]:
x = np.array([1, 2, 3, 4], dtype=np.int16)
y = np.array([5, 6, 7], dtype=np.int16)
x[np.newaxis,:] * y[:,np.newaxis]

array([[ 5, 10, 15, 20],
       [ 6, 12, 18, 24],
       [ 7, 14, 21, 28]], dtype=int16)
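
The zero-stride arrays built by hand with `as_strided` above can also be obtained with `np.broadcast_to`, which checks the shapes and returns a read-only view. This is offered as an alternative sketch, not as what NumPy does internally:

```python
import numpy as np

x = np.array([1, 2, 3, 4], dtype=np.int16)
y = np.array([5, 6, 7], dtype=np.int16)

x2 = np.broadcast_to(x, (3, 4))                  # repeats x along a new first axis
y2 = np.broadcast_to(y[:, np.newaxis], (3, 4))   # repeats y along the second axis

print(x2.strides, y2.strides)    # a stride of 0 marks the repeated axis
print(x2 + y2)
```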

<img center src='./images/axis_0.jpg' />

In [18]:
np.random.seed(1234)
marks = np.random.randint(20,100,size=(4,6))
print(marks)
print('Minimum marks in the subject : ',marks.min(axis=0))
print('Maximum marks in the subject :', marks.max(axis=0))
print('Total marks for the student :', marks.sum(axis=1))

[[67 58 73 96 44 35]
 [69 43 46 50 63 50]
 [46 78 89 93 67 70]
 [96 57 54 58 87 31]]
Minimum marks in the subject :  [46 43 46 50 44 31]
Maximum marks in the subject : [96 78 89 96 87 70]
Total marks for the student : [373 321 443 383]
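
The `axis` argument names the dimension that is collapsed by the reduction. A small sketch on a made-up 2 x 3 array, with students as rows and subjects as columns:

```python
import numpy as np

marks = np.array([[67, 58, 73],
                  [69, 43, 46]])       # 2 students (rows) x 3 subjects (columns)

per_subject = marks.sum(axis=0)        # collapse rows    -> one value per subject
per_student = marks.sum(axis=1)        # collapse columns -> one value per student
kept = marks.sum(axis=1, keepdims=True)  # reduced axis is kept with size 1
best = marks.argmax(axis=1)            # column index of each student's best subject

print(per_subject, per_student, kept.shape, best)
```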


##### <font color='red'> Student Activity Question 6 [5 mins]</font>
* Iris is a popular dataset often used in R and Python for learning data manipulation and visualization.
* It has 150 measurements of iris flowers.
* The measured attributes are sepal length, sepal width, petal length and petal width.
* You can load the dataset from the popular ML library scikit-learn using the code below (copy or type the code into a Python cell).

```python
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data

```
**Questions**
1. Print the type of the data.
2. Print the shape of the array X.
3. Noting that the first 2 columns are sepal data and the last 2 columns are petal data, slice a sub-array with only the petal data and the measurements (rows) from 100 to 140. Call this submatrix Xsub and print it explicitly in the cell.
4. Print the column-wise maximum and minimum of the submatrix.

In [25]:
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
print(X.shape)
Xsub=X[100:141,2:4]
print(Xsub.shape)

print(np.max(X,axis=0))
print(np.min(X,axis=0))

(150, 4)
(41, 2)
[7.9 4.4 6.9 2.5]
[4.3 2.  1.  0.1]
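
The cell above reports the column-wise extremes of the full array `X`. A minimal sketch that walks through all four questions on the submatrix is given below. To stay runnable without scikit-learn it uses a random stand-in array with the same shape as `iris.data`; the stand-in values are an assumption for illustration and are not the iris measurements:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.1, 8.0, size=(150, 4))   # stand-in for iris.data (hypothetical values)

print(type(X))                  # 1. type of the data
print(X.shape)                  # 2. shape of X
Xsub = X[100:141, 2:4]          # 3. rows 100 to 140 inclusive, petal columns only
print(Xsub)
print(Xsub.max(axis=0))         # 4. column-wise maximum of the submatrix
print(Xsub.min(axis=0))         #    column-wise minimum of the submatrix
```

With the real data, replacing the stand-in line by `X = datasets.load_iris().data` leaves the remaining lines unchanged.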
