# Manipulation of Data using NumPy

## About the instructor:
    
**Ts. Dr. Nur Shakirah Md Salleh**

Lead Technical Trainer - Data, Analytic and Machine Learning

airasia academy

nurshakirahmdsalleh@airasiaacademy.com

LinkedIn: [Ts. Dr. Nur Shakirah Md Salleh](https://www.linkedin.com/in/nurshakirahmdsalleh)

©2023 [airasia academy.](https://airasiaacademy.com) All rights reserved.

[NumPy](http://www.numpy.org/) is the fundamental package for scientific computing with Python. Among other things, it contains:

- a powerful N-dimensional array object

- sophisticated (broadcasting) functions

- tools for integrating C/C++ and Fortran code

- useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
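
As a rough illustration of the broadcasting mentioned in the list above, the sketch below combines a 2D array with a scalar and with a 1D array. The array contents and the names `data`, `offsets`, `scaled`, and `shifted` are made up for this example.

```python
import numpy as np

# Hypothetical data: 3 rows of 4 values each
data = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 10, 11, 12]])

# A scalar is broadcast to every element
scaled = data * 10

# A 1D array of length 4 is broadcast across each of the 3 rows
offsets = np.array([100, 200, 300, 400])
shifted = data + offsets

print(scaled)
print(shifted)
```

No explicit loop is needed in either case; NumPy stretches the smaller operand to match the shape of the larger one.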

## Assessment
Quiz on Google Form

# Import the NumPy library

In [None]:
import numpy as np

# Create a 1D array
| index | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 10 | 20 | 30 | 40 | 50 |

## Option 1 - Basic initialization

In [None]:
# Python list equivalent: samplelist = []
samplearr = np.array([])
samplearr

In [None]:
samplearr.shape

In [None]:
# Python list equivalent: samplelist = [10, 20, 30, 40, 50]
arr = np.array([10, 20, 30, 40, 50])
print(arr)
print(type(arr))

In [None]:
arr.shape

In [None]:
# Access the 1st element of array
print(arr[0])

In [None]:
arr[0]

In [None]:
# Access the 3rd element of array
print(arr[2])

In [None]:
# Perform arithmetic operation between the array elements
print(arr[0]+arr[4])

## Option 2 - arange

In [None]:
# Integers from 0 up to, but not including, 100
print(np.arange(100))

In [None]:
# Integers from 50 to 99
print(np.arange(50,100))

In [None]:
# From 20 up to, but not including, 41 in steps of 5: 20, 25, 30, 35, 40
print(np.arange(20,41,5))

# Slicing arrays
Select elements from one index up to, but not including, another index.

Specify the start and end: `[start:end]`

Specify the start and end with a custom step: `[start:end:step]`

In [None]:
arr

In [None]:
# 0   1   2   3   4
#[10, 20, 30, 40, 50]
print(arr[1:5])

In [None]:
print(arr[0:5:2])

In [None]:
print(arr[2:])

In [None]:
print(arr[:2])

# Copy and View

## Copy

In [None]:
arrOri = np.array([10, 20, 30, 40, 50])
arrCopy = arrOri.copy()
print(arrOri)
print(arrCopy)

arrOri[0] = 100
print(arrOri)
print(arrCopy)

## View

In [None]:
arrOri = np.array([10, 20, 30, 40, 50])
arrView = arrOri.view()
print(arrOri)
print(arrView)

arrOri[0] = 100
print(arrOri)
print(arrView)
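
The two cells above show that a copy is independent of the original while a view is not. One possible way to check which one you have is the `base` attribute together with `np.shares_memory`, as in the sketch below. The name `arrSlice` is introduced only for this illustration.

```python
import numpy as np

arrOri = np.array([10, 20, 30, 40, 50])
arrCopy = arrOri.copy()
arrView = arrOri.view()

# A copy owns its own data, so its base is None
print(arrCopy.base is None)                 # True
# A view borrows its data from the original array
print(arrView.base is arrOri)               # True
print(np.shares_memory(arrOri, arrView))    # True
print(np.shares_memory(arrOri, arrCopy))    # False

# Basic slicing also returns a view, so writing through the slice changes arrOri
arrSlice = arrOri[1:3]
arrSlice[0] = 999
print(arrOri)                               # [ 10 999  30  40  50]
```

Because slices are views, call `.copy()` on a slice when you need to modify it without touching the original array.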

# Create a 2D array

| index | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 10 | 20 | 30 | 40 | 50 |
| 1 | 100 | 200 | 300 | 400 | 500 |

## Option 1 - Direct initialization

In [None]:
arr2d = np.array([[10,20,30,40,50],[100,200,300,400,500]])
print(arr2d)
print(type(arr2d))

In [None]:
arr2d.shape

In [None]:
# Index a 2D array with [row, column]
print("[0,0] =", arr2d[0,0])
print("[0,1] =", arr2d[0,1])
print("[1,0] =", arr2d[1,0])
print("[1,3] =", arr2d[1,3])

In [None]:
#     0   1   2   3   4
#0[[ 10  20  30  40  50]
#1 [100 200 300 400 500]]
print(arr2d[1, 1:4])
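
The same `[row, column]` notation accepts a slice in either position. The sketch below, which reuses the values of `arr2d`, picks out a whole row, a whole column, and a block of columns; the names `first_row`, `third_col`, and `sub_block` are made up for this example.

```python
import numpy as np

arr2d = np.array([[10, 20, 30, 40, 50], [100, 200, 300, 400, 500]])

first_row = arr2d[0, :]       # all columns of row 0
third_col = arr2d[:, 2]       # all rows of column 2
sub_block = arr2d[:, 1:3]     # every row, columns 1 and 2

print(first_row)
print(third_col)
print(sub_block)
```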

## Option 2 - Create a 2d array from a list

In [None]:
import numpy as np

my_list = [[10,20,30,40,50],[1,2,3,4,5]]
print(my_list, type(my_list))

array = np.array(my_list)
print(array, type(array))

## Option 3 - Generate all zeros and ones

In [None]:
print(np.zeros((4,3)))
print(np.ones((6,2)))

## Option 4 - Generate random values

In [None]:
np.random.random((3, 3))
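
The cell above returns different values on every run. If repeatable results are needed, one option is NumPy's `Generator` interface with a fixed seed, sketched below. The seed value 42 and the names `rng`, `rand_floats`, `rand_ints`, and `rng_again` are arbitrary choices for this illustration.

```python
import numpy as np

# The seed value 42 is an arbitrary choice for this illustration
rng = np.random.default_rng(42)
rand_floats = rng.random((3, 3))               # floats in [0.0, 1.0)
rand_ints = rng.integers(0, 10, size=(2, 4))   # integers from 0 to 9

# Re-creating the generator with the same seed reproduces the same values
rng_again = np.random.default_rng(42)
print(np.array_equal(rand_floats, rng_again.random((3, 3))))  # True
```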

## Option 5 - Reshape from 1D to 2D

In [None]:
arr1d = np.array([10,20,30,40,50,100,200,300,400,500])
arr2d = arr1d.reshape(2,5)
arr2d

In [None]:
arr2d2 = arr1d.reshape(5,2)
arr2d2
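
Both reshapes above work because 2 x 5 and 5 x 2 each equal the 10 elements in `arr1d`. The sketch below shows two related behaviours: passing `-1` lets NumPy infer one dimension, and a shape that does not match the element count raises a `ValueError`. The names `two_rows` and `two_cols` are made up for this example.

```python
import numpy as np

arr1d = np.array([10, 20, 30, 40, 50, 100, 200, 300, 400, 500])

# -1 asks NumPy to work out that dimension from the array size
two_rows = arr1d.reshape(2, -1)    # shape (2, 5)
two_cols = arr1d.reshape(-1, 2)    # shape (5, 2)
print(two_rows.shape, two_cols.shape)

# A 3 x 3 array needs 9 elements but arr1d has 10, so this raises ValueError
try:
    arr1d.reshape(3, 3)
except ValueError as err:
    print('Reshape failed:', err)
```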

# Flattening Array
Convert a multidimensional array back into a 1D array.

In [None]:
arr1d = np.array([1,2,3,4,5,6,7,8,9,10,11,12])
arr2d = arr1d.reshape(2,6)
arr3d = arr1d.reshape(3,2,2)

print('1D')
print(arr1d)
print('2D')
print(arr2d)
print('3D')
print(arr3d)

In [None]:
rearr2d = arr2d.reshape(-1)
rearr3d = arr3d.reshape(-1)

print('Reshape 2D')
print(rearr2d)
print('Reshape 3D')
print(rearr3d)
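
Besides `reshape(-1)`, NumPy also offers `flatten()` and `ravel()`. The sketch below highlights the main difference between them: `flatten()` always copies, while `ravel()` returns a view when the memory layout allows it. The names `flat_copy` and `flat_view` are introduced only for this illustration.

```python
import numpy as np

arr2d = np.arange(1, 13).reshape(2, 6)

flat_copy = arr2d.flatten()   # always returns a new copy
flat_view = arr2d.ravel()     # returns a view when the memory layout allows it

flat_copy[0] = -1             # does not affect arr2d
print(arr2d[0, 0])            # 1

flat_view[0] = -1             # arr2d is contiguous here, so this writes through
print(arr2d[0, 0])            # -1
```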

# Vectorization (element-wise arithmetic operations)

## Addition

In [None]:
# The + operator does not add ordinary Python lists element-wise; it concatenates them

my_list = [1,2,3,4,5]

print(my_list + my_list)

# Converting the lists to NumPy arrays gives element-wise addition

print(np.array(my_list) + np.array(my_list))

In [None]:
np.array(my_list) + np.array(my_list)

In [None]:
my_list = [1,2,3,4,5]
arr_my_list = np.array(my_list)
arr_my_list + arr_my_list

In [None]:
arr = np.array([10, 20, 30, 40, 50])
print(arr+arr)

In [None]:
arr1 = np.array([1,2,3,4])
arr2 = np.array([10,20,30,40])
arr1 + arr2
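
Addition is only one of the element-wise operations. As a brief sketch, the other arithmetic operators behave the same way on the arrays `arr1` and `arr2` defined above:

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([10, 20, 30, 40])

print(arr2 - arr1)   # [ 9 18 27 36]
print(arr1 * arr2)   # [ 10  40  90 160]
print(arr2 / arr1)   # [10. 10. 10. 10.]
print(arr1 ** 2)     # [ 1  4  9 16]
print(arr1 * 3)      # [ 3  6  9 12]
```

Note that `/` always produces floating-point results, even when both arrays hold integers.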

## Matrix multiplication

In [None]:
# For two 1D inputs, np.dot returns the inner (dot) product:
# 1*1 + 2*2 + 3*3 + 4*4 + 5*5 = 55
np.dot(my_list,my_list)
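
With 2D arrays, the difference between element-wise and matrix multiplication becomes visible. The sketch below uses two small matrices, `A` and `B`, whose values are made up for this example.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print(A * B)          # element-wise product: [[5, 12], [21, 32]]
print(A @ B)          # matrix product: [[19, 22], [43, 50]]
print(np.dot(A, B))   # same result as A @ B for 2D arrays
```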

## Array Concatenation and Splitting

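- `np.concatenate`: joins arrays along an existing axis (for 2D arrays, `axis=1` joins them column-wise)
- `np.split`: splits an array into sub-arrays
- `np.hstack`: stacks arrays horizontally (column-wise)
- `np.vstack`: stacks arrays vertically (row-wise)
- `np.dstack`: stacks arrays depth-wise (along the third axis)
- `np.hsplit`: splits an array horizontally (column-wise)
- `np.vsplit`: splits an array vertically (row-wise)
- `np.dsplit`: splits an array depth-wise (along the third axis)
- `np.floor`: rounds each element down to the nearest integer (an element-wise function rather than a joining or splitting one)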

In [None]:
a = np.arange(5)
b = np.arange(10,15)
print(a)
print(b)

np.hstack((a,b))

In [None]:
np.vstack((a,b))

In [None]:
np.vstack((a,a))
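
The cells above cover `np.hstack` and `np.vstack`. As a minimal sketch of some of the other functions in the list, the example below joins two 2D arrays with `np.concatenate` and then splits the results again. The arrays `m` and `n` and the other variable names are made up for this illustration.

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])
n = np.array([[5, 6], [7, 8]])

rows_joined = np.concatenate((m, n), axis=0)   # shape (4, 2)
cols_joined = np.concatenate((m, n), axis=1)   # shape (2, 4)
print(rows_joined)
print(cols_joined)

left, right = np.hsplit(cols_joined, 2)   # split into 2 equal column blocks
top, bottom = np.vsplit(rows_joined, 2)   # split into 2 equal row blocks

first, second, third = np.split(np.arange(6), 3)   # three equal 1D pieces
print(first, second, third)
```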

## Data aggregation functions

NumPy provides many aggregation functions, but we won't discuss them in detail here.
Most aggregates also have a ``NaN``-safe counterpart that computes the result while ignoring missing values, which are marked by the special IEEE floating-point ``NaN`` value.
The following table lists useful aggregation functions available in NumPy:

|Function Name      |   NaN-safe Version  | Description                                   |
|-------------------|---------------------|-----------------------------------------------|
| ``np.sum``        | ``np.nansum``       | Compute sum of elements                       |
| ``np.prod``       | ``np.nanprod``      | Compute product of elements                   |
| ``np.mean``       | ``np.nanmean``      | Compute mean (average) of elements                      |
| ``np.std``        | ``np.nanstd``       | Compute standard deviation                    |
| ``np.var``        | ``np.nanvar``       | Compute variance                              |
| ``np.min``        | ``np.nanmin``       | Find minimum value                            |
| ``np.max``        | ``np.nanmax``       | Find maximum value                            |
| ``np.argmin``     | ``np.nanargmin``    | Find index of minimum value                   |
| ``np.argmax``     | ``np.nanargmax``    | Find index of maximum value                   |
| ``np.median``     | ``np.nanmedian``    | Compute median of elements                    |
| ``np.percentile`` | ``np.nanpercentile``| Compute rank-based statistics of elements     |
| ``np.any``        | N/A                 | Evaluate whether any elements are true        |
| ``np.all``        | N/A                 | Evaluate whether all elements are true        |

Source: Python Data Science Handbook
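
To illustrate the difference between an ordinary aggregate and its ``NaN``-safe counterpart, the sketch below uses a small array with one missing value. The array `readings` is hypothetical data made up for this example.

```python
import numpy as np

# Hypothetical readings with one missing value marked as NaN
readings = np.array([1.0, 2.0, np.nan, 4.0])

print(np.sum(readings))      # nan, because NaN propagates through the sum
print(np.nansum(readings))   # 7.0, the NaN is ignored
print(np.nanmean(readings))  # about 2.33, the mean of the 3 valid values
```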

In [None]:
arr2d = np.array([[10,20,30,40,50],[100,200,300,400,500]])
arr2d

In [None]:
arr2d.mean()

In [None]:
arr2d[0].mean()

In [None]:
arr2d[1].mean()

In [None]:
np.mean(arr2d)

In [None]:
np.mean(arr2d[0])

In [None]:
np.sum(arr2d[0])

In [None]:
arr2d.max()

In [None]:
np.max(arr2d)

In [None]:
arr2d.min()

In [None]:
np.min(arr2d)

In [None]:
np.argmin(arr2d)

In [None]:
np.argmax(arr2d)
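
The aggregates above treat `arr2d` as one flat collection of values, which is why `np.argmax` returns a single flat index. The sketch below shows two possible refinements: the `axis` argument aggregates per column or per row, and `np.unravel_index` converts a flat index back into a row and column. The names `col_means`, `row_means`, `flat_index`, `row`, and `col` are made up for this example.

```python
import numpy as np

arr2d = np.array([[10, 20, 30, 40, 50], [100, 200, 300, 400, 500]])

col_means = arr2d.mean(axis=0)   # one mean per column: 55, 110, 165, 220, 275
row_means = arr2d.mean(axis=1)   # one mean per row: 30, 300
print(col_means)
print(row_means)

flat_index = np.argmax(arr2d)    # 9, the position of 500 in the flattened array
row, col = np.unravel_index(flat_index, arr2d.shape)
print(row, col)                  # 1 4
```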

# NumPy in statistics
Given an array of ages, find the following descriptive statistics:
* Total data
* Mean
* Median
* Range
* Standard deviation


In [None]:
age = np.array([30, 23, 25, 26, 25, 28, 29, 30, 30, 34])
print('Total data =',age.shape[0])
print('Total age =',age.sum())
print('Mean =',age.mean())
print('Median =',np.median(age))
print('Range =',age.max()-age.min())
# np.ptp (peak to peak) gives the same range; the ndarray.ptp method was removed in NumPy 2.0
print('Range =',np.ptp(age))
print('Standard deviation =',np.std(age))
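
By default, `np.std` computes the population standard deviation, dividing by the number of values `n`. If the ages are treated as a sample from a larger population, the sample standard deviation (dividing by `n - 1`) may be more appropriate. The sketch below shows both; the names `pop_std` and `sample_std` are made up for this example.

```python
import numpy as np

age = np.array([30, 23, 25, 26, 25, 28, 29, 30, 30, 34])

pop_std = np.std(age)             # divides by n (ddof=0, the default)
sample_std = np.std(age, ddof=1)  # divides by n - 1

print('Population standard deviation =', pop_std)
print('Sample standard deviation =', sample_std)
```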