# Data Manipulation with NumPy

**NumPy** is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

In [None]:
# uncomment the line below and run the cell to install numpy
# !pip install numpy

1. Import numpy

In [None]:
# import numpy
import numpy as np
from numpy import array


2. Create a one-dimensional array

In [None]:
# create array
arr = np.array([2, 3, 4, 1, 0, -2, 7, 8, 9, 12])

In [None]:
arr[3:]

array([ 1,  0, -2,  7,  8,  9, 12])

In [None]:
"""for i in arr:
    print(i)"""

'for i in arr:\n    print(i)'

In [None]:
#dir(arr)

check the dimension of the array

In [None]:
# check the dimension of the array
arr.ndim

1

check the size of the array

In [None]:
# size of the array
arr.size

10

In [None]:
len(arr)

10

check the shape of the array

In [None]:
# shape of the array
arr.shape

(10,)

Identify the type of the array

In [None]:
# identify the type of the array
arr.dtype

dtype('int64')

In [None]:
print(arr)

[ 2  3  4  1  0 -2  7  8  9 12]


sum of the array elements

In [None]:
# sum of array elements
sum(arr)

44

min and max elements from the array

In [None]:
# smallest element (use max(arr) for the largest)
min(arr)

-2

sort array elements


In [None]:
# sort array
sorted(arr, reverse=True)

[12, 9, 8, 7, 4, 3, 2, 1, 0, -2]
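
The built-in `sum`, `min`, and `sorted` work on arrays, but they loop over the elements in Python and `sorted` returns a plain list. A minimal sketch of the NumPy counterparts, reusing the same values as `arr` (the variable names here are only illustrative):

```python
import numpy as np

arr = np.array([2, 3, 4, 1, 0, -2, 7, 8, 9, 12])

total = arr.sum()      # same value as the built-in sum(arr)
smallest = arr.min()   # same value as min(arr)
largest = arr.max()    # same value as max(arr)

# np.sort returns a new ndarray in ascending order; arr itself is unchanged
ascending = np.sort(arr)

print(total, smallest, largest)
print(type(ascending).__name__, ascending)
```

On large arrays the array methods are usually the faster choice, since the work happens in compiled code.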

3. Create two dimensional array

In [None]:
# create two dimensional array
arr_2d = np.array([
    [2, 3, 6],
    [3, 4, 5],
    [3, 2, 4]
])


In [None]:
arr_2d.ndim

2

In [None]:
arr_2d.size

9

In [None]:
arr_2d.shape

(3, 3)

In [None]:
arr_2d[2][2]

4
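
`arr_2d[2][2]` first selects row 2 and then element 2 of that row. NumPy also accepts both indices in one pair of brackets, which extends naturally to slicing whole rows and columns. A short sketch using the same 3x3 array:

```python
import numpy as np

arr_2d = np.array([[2, 3, 6],
                   [3, 4, 5],
                   [3, 2, 4]])

element = arr_2d[2, 2]     # same element as arr_2d[2][2]
first_row = arr_2d[0, :]   # every column of row 0
last_col = arr_2d[:, -1]   # every row of the last column

print(element)
print(first_row)
print(last_col)
```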

4. Another way to create an array, using `arange()`

In [None]:
# another way of creating an array
# arange() function
arr_a = np.arange(0, 10, dtype="float64").reshape(2, 5)
arr_a

array([[0., 1., 2., 3., 4.],
       [5., 6., 7., 8., 9.]])

5. Specify the data type of the array while creating it, using `dtype`

In [None]:
ar = np.array([2.0, 3.1,4], dtype='float64')
ar

array([2. , 3.1, 4. ])

# Part II

Adding and removing array elements


In [None]:
# import the numpy
import numpy as np

Let's create a one-dimensional and a two-dimensional array

In [None]:
# create one dimensional array
arr_1d = np.arange(2, 21, 2)

# create two dimensional array
arr_2d = np.arange(1,16).reshape(5, 3)


In [None]:
print(arr_1d)

[ 2  4  6  8 10 12 14 16 18 20]


In [None]:
print(arr_2d)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]
 [13 14 15]]


## `append()` - appends values to the end of an array

In [None]:
# append new values into 1D array

print("1D array before append:", arr_1d, arr_1d.size)
print(np.append(arr_1d, 40))
print(np.append(arr_1d, np.arange(1, 5)))

1D array before append: [ 2  4  6  8 10 12 14 16 18 20] 10
[ 2  4  6  8 10 12 14 16 18 20 40]
[ 2  4  6  8 10 12 14 16 18 20  1  2  3  4]


In [None]:
# append new rows to the 2D array (axis=0)
print("2D before appended:",arr_2d)

print(np.append(arr_2d, [[20, 30, 40]], axis=0))

print(np.append(arr_2d, np.arange(1, 7).reshape(2, 3), axis=0))

2D before appended: [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]
 [13 14 15]]
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]
 [13 14 15]
 [20 30 40]]
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]
 [13 14 15]
 [ 1  2  3]
 [ 4  5  6]]


In [None]:
# append new columns to the 2D array (axis=1)
print("2D before appended:",arr_2d)

print(np.append(arr_2d, [[20], [30], [40], [20],[15]], axis=1))

print(np.append(arr_2d, np.arange(1, 11).reshape(5, 2), axis=1))

2D before appended: [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]
 [13 14 15]]
[[ 1  2  3 20]
 [ 4  5  6 30]
 [ 7  8  9 40]
 [10 11 12 20]
 [13 14 15 15]]
[[ 1  2  3  1  2]
 [ 4  5  6  3  4]
 [ 7  8  9  5  6]
 [10 11 12  7  8]
 [13 14 15  9 10]]
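
Note that `np.append` does not modify its input: it returns a new array each time, which is why the cells above can keep printing the same `arr_1d` and `arr_2d`. A small sketch that makes this visible, and also shows what happens when `axis` is left out for a 2D input:

```python
import numpy as np

arr_1d = np.arange(2, 21, 2)
appended = np.append(arr_1d, 40)

print(arr_1d.size)     # the original still has 10 elements
print(appended.size)   # the returned array has 11

# Without axis, a 2D input is flattened before the values are appended
arr_2d = np.arange(1, 16).reshape(5, 3)
flat = np.append(arr_2d, [20, 30, 40])
print(flat.shape)
```

To keep the result, assign it to a name, for example `arr_1d = np.append(arr_1d, 40)`.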


## `insert()` - insert elements into an array

In [None]:
# insert values into 1D array
print("1D array before insert new element:", arr_1d, arr_1d.shape)

np.insert(arr_1d, 1, np.arange(22, 25))

1D array before insert new element: [ 2  4  6  8 10 12 14 16 18 20] (10,)


array([ 2, 22, 23, 24,  4,  6,  8, 10, 12, 14, 16, 18, 20])

In [None]:
# insert values into 2D at different points
print(arr_2d, arr_2d.shape)

print("after inserted")
print(np.insert(arr_2d, 2, 66, axis=1))
print("at different points")
np.insert(arr_2d, (0, 3), 20, axis=1)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]
 [13 14 15]] (5, 3)
after inserted
[[ 1  2 66  3]
 [ 4  5 66  6]
 [ 7  8 66  9]
 [10 11 66 12]
 [13 14 66 15]]
at different points


array([[20,  1,  2,  3, 20],
       [20,  4,  5,  6, 20],
       [20,  7,  8,  9, 20],
       [20, 10, 11, 12, 20],
       [20, 13, 14, 15, 20]])

In [None]:
# insert a column of values before column index 2
np.insert(arr_2d, [2], [[2], [4], [3], [5], [6]], axis=1)

array([[ 1,  2,  2,  3],
       [ 4,  5,  4,  6],
       [ 7,  8,  3,  9],
       [10, 11,  5, 12],
       [13, 14,  6, 15]])

## `delete()` - delete elements from an array

In [None]:
# delete any element from the array
# deleting from 1D

print(arr_1d)
np.delete(arr_1d, [1, 4, 7])


[ 2  4  6  8 10 12 14 16 18 20]


array([ 2,  6,  8, 12, 14, 18, 20])

In [None]:
# deleting from 2D
print(arr_2d)
print("after deleting")
np.delete(arr_2d, 2, 1)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]
 [13 14 15]]
after deleting


array([[ 1,  2],
       [ 4,  5],
       [ 7,  8],
       [10, 11],
       [13, 14]])
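
The 2D example above removes a column (`axis=1`). As a sketch of the other direction, the same call with `axis=0` removes rows; like `append()` and `insert()`, `delete()` returns a new array and leaves the original untouched:

```python
import numpy as np

arr_2d = np.arange(1, 16).reshape(5, 3)

# remove the first and last rows (row indices 0 and 4)
without_rows = np.delete(arr_2d, [0, 4], axis=0)

print(without_rows)
print(arr_2d.shape)   # the original is still (5, 3)
```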

# Part III



# Statistics I
In this tutorial you'll learn how to:
- calculate the median
- calculate mean
- calculate weighted average

In [None]:
# import numpy
import numpy as np

Create 2D array

In [None]:
# create 2D array
arr_2d = np.arange(1, 50, 2).reshape(5, 5)
arr_2d

array([[ 1,  3,  5,  7,  9],
       [11, 13, 15, 17, 19],
       [21, 23, 25, 27, 29],
       [31, 33, 35, 37, 39],
       [41, 43, 45, 47, 49]])

In [None]:
arr_2d.shape

(5, 5)

In [None]:
arr_2d.size

25

1. Compute the median of array elements along specified axis

**Note**: Given a vector `V` of length `N`, the median of `V` is the middle value of a sorted copy of `V`, `V_sorted`, i.e., `V_sorted[(N-1)/2]` when `N` is odd, and the average of the two middle values of `V_sorted` when `N` is even.

In [None]:
# find the median
np.median(arr_2d)

25.0

In [None]:
arr_2d

array([[ 1,  3,  5,  7,  9],
       [11, 13, 15, 17, 19],
       [21, 23, 25, 27, 29],
       [31, 33, 35, 37, 39],
       [41, 43, 45, 47, 49]])

In [None]:
np.median(arr_2d, axis=1)

array([ 5., 15., 25., 35., 45.])
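
The definition in the note can be checked by hand. The sketch below uses two small made-up vectors, one of odd and one of even length, and compares the manual result with `np.median`:

```python
import numpy as np

odd = np.array([7, 1, 5])        # N = 3
even = np.array([7, 1, 5, 3])    # N = 4

odd_sorted = np.sort(odd)        # [1 5 7]
even_sorted = np.sort(even)      # [1 3 5 7]

# odd N: the single middle value
n = odd_sorted.size
median_odd = odd_sorted[(n - 1) // 2]

# even N: the average of the two middle values
n = even_sorted.size
median_even = (even_sorted[n // 2 - 1] + even_sorted[n // 2]) / 2

print(median_odd, np.median(odd))
print(median_even, np.median(even))
```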

2. Find the mean of array elements along specified axis

In [None]:
# find the mean
np.mean(arr_2d, axis=1)

array([ 5., 15., 25., 35., 45.])

3. Find weighted average of array elements along specified axis

`weights` is an array of weights associated with the values in `arr_2d`. Each value in `arr_2d` contributes to the average according to its associated weight. The weights array can either be 1-D (in which case its length must match the size of `arr_2d` along the given axis) or have the same shape as `arr_2d`. If `weights=None`, all data in `arr_2d` are assumed to have a weight equal to one. The 1-D calculation is:

`average = sum(arr_2d * weights) / sum(weights)`

In [None]:
arr_2d

array([[ 1,  3,  5,  7,  9],
       [11, 13, 15, 17, 19],
       [21, 23, 25, 27, 29],
       [31, 33, 35, 37, 39],
       [41, 43, 45, 47, 49]])

In [None]:
# find the average
np.average(arr_2d, axis=1, weights=[1, 2, 0, 3, 5])

array([ 6.63636364, 16.63636364, 26.63636364, 36.63636364, 46.63636364])
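
To connect the formula to the output, here is a sketch that applies `sum(arr_2d * weights) / sum(weights)` row by row and compares it with `np.average`. For the first row this is `(1*1 + 3*2 + 5*0 + 7*3 + 9*5) / 11 = 73 / 11`.

```python
import numpy as np

arr_2d = np.arange(1, 50, 2).reshape(5, 5)
weights = np.array([1, 2, 0, 3, 5])

# weights broadcast across every row; sum over the columns of each row
manual = (arr_2d * weights).sum(axis=1) / weights.sum()
builtin = np.average(arr_2d, axis=1, weights=weights)

print(manual)
print(np.allclose(manual, builtin))
```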


# Quantiles and Percentiles with NumPy

In statistics and probability, quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, or dividing the observations in a sample in the same way. There is one fewer quantile than the number of groups created.

Let's assume you have a dataset of people's heights in cm, stored in a variable named `heights`.

- Use NumPy's `quantile()` function to find the first, second, and third quartiles.

Store the results in variables named `q1`, `q2`, and `q3`.

In [None]:
heights = [80, 150, 155, 160, 156,
           180, 134, 131, 123, 125,
           165, 146, 130, 150, 140,
           19, 170, 164, 166, 153,
           145, 158, 200, 300]

In [None]:
import numpy as np


In [None]:
q1 = np.quantile(heights, 0.25)
q2 = np.quantile(heights, 0.50)
q3 = np.quantile(heights, 0.75)


In [None]:
q1, q2, q3

(133.25, 151.5, 164.25)
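
The section title also mentions percentiles. As a sketch, `np.percentile` computes the same cut points when given values on a 0 to 100 scale instead of 0 to 1, and both functions accept a list of cut points in one call:

```python
import numpy as np

heights = [80, 150, 155, 160, 156,
           180, 134, 131, 123, 125,
           165, 146, 130, 150, 140,
           19, 170, 164, 166, 153,
           145, 158, 200, 300]

# quantile takes fractions in [0, 1]; percentile takes values in [0, 100]
q1, q2, q3 = np.quantile(heights, [0.25, 0.50, 0.75])
p25, p50, p75 = np.percentile(heights, [25, 50, 75])

print(q1, q2, q3)
print(p25, p50, p75)
```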

What is your height in cm?
Store that value in a variable named `my_height`.

Does your height fall in the `first, second, third, or fourth quarter` of the data?

What is the IQR?

The `interquartile range (IQR)` is the difference between the third and first quartiles, `q3 - q1`. Because it ignores the tails of the dataset, it tells you the range around which the middle half of your data is centered.
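
A possible way to work through the two questions above, as a sketch: the IQR is computed as the third quartile minus the first, and `np.searchsorted` locates a height among the three quartile cut points. The value of `my_height` is a hypothetical placeholder, so replace it with your own.

```python
import numpy as np

heights = [80, 150, 155, 160, 156,
           180, 134, 131, 123, 125,
           165, 146, 130, 150, 140,
           19, 170, 164, 166, 153,
           145, 158, 200, 300]

q1, q2, q3 = np.quantile(heights, [0.25, 0.50, 0.75])

# interquartile range: spread of the middle half of the data
iqr = q3 - q1
print(iqr)

my_height = 170  # hypothetical value, replace with your own height in cm

# number of cut points below my_height, plus one, gives the quarter (1 to 4)
quarter = int(np.searchsorted([q1, q2, q3], my_height)) + 1
print(quarter)
```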


# `copy()` vs. `view()` in Numpy Array


After this tutorial you will understand:
- the differences and
- the similarities between `copy` and `view`
----

- view - shallow copy
- copy - deep copy

In [None]:
import numpy as np

arr_2d = np.arange(10).reshape(2, 5)
arr_2d

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

view the original array

In [None]:
# view of the original array
v = arr_2d.view()
v

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

change value at row index 0 and column index 1 of the view data

In [None]:
# change value of the view data at [0][1]
v[0][1] = 20
v

array([[ 0, 20,  2,  3,  4],
       [ 5,  6,  7,  8,  9]])

In [None]:
arr_2d

array([[ 0, 20,  2,  3,  4],
       [ 5,  6,  7,  8,  9]])

change the value at row index 1 and column index 2 of the original data

In [None]:
# change value of the original data at [1][2]
arr_2d[1][2] = 30
arr_2d

array([[ 0, 20,  2,  3,  4],
       [ 5,  6, 30,  8,  9]])

In [None]:
v

array([[ 0, 20,  2,  3,  4],
       [ 5,  6, 30,  8,  9]])

copy the original array

In [None]:
# reset the original array, then take a copy of it
arr_2d = np.arange(10).reshape(2, 5)
c = arr_2d.copy()
c

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [None]:
c[0][1] = 90
c

array([[ 0, 90,  2,  3,  4],
       [ 5,  6,  7,  8,  9]])

In [None]:
arr_2d[1][2] = 15
arr_2d

array([[ 0,  1,  2,  3,  4],
       [ 5,  6, 15,  8,  9]])

In [None]:
c

array([[ 0, 90,  2,  3,  4],
       [ 5,  6,  7,  8,  9]])

In [None]:
id(c)

140162769876816

In [None]:
id(arr_2d)

140162769875568
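
Different `id()` values only show that `c` and `arr_2d` are different Python objects; a view is also a separate object, so `id()` alone cannot tell a view from a copy. A sketch of a more direct check uses `np.shares_memory`, which reports whether two arrays use the same underlying data:

```python
import numpy as np

arr_2d = np.arange(10).reshape(2, 5)
v = arr_2d.view()
c = arr_2d.copy()

view_shares = np.shares_memory(arr_2d, v)   # a view reuses the same data
copy_shares = np.shares_memory(arr_2d, c)   # a copy owns its own data

# basic slicing also returns a view, not a copy
sliced = arr_2d[:, 1:3]
slice_shares = np.shares_memory(arr_2d, sliced)

print(view_shares, copy_shares, slice_shares)
print(c.base is None)   # a copy has no base array
```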

---

# NumPy matrix multiplication

How to multiply matrices?

>A matrix is a 2-D array. Each element in the array has two indices.

In [None]:
import numpy as np

In [None]:
# let's have matrix A and B
A = np.array([[0, 3], [1, -1], [2, 1], [5, 2]])
B = np.array([[1, 2, 1, 2], [4, 1, -1, -4]])

In [None]:
print(A.shape, B.shape)
# 4*2 and 2*4

(4, 2) (2, 4)


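Three functions that often come up when multiplying arrays
1. `np.dot(A, B)`: matrix product for 2-D arrays
2. `np.multiply(A, B)`: element-wise product, not a matrix product; the shapes must match or broadcast
3. `np.matmul(A, B)`: matrix product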
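
A short sketch comparing these functions on `A` and `B`. For 2-D inputs `np.dot`, `np.matmul`, and the `@` operator give the same matrix product, while `np.multiply` works element by element and therefore rejects the shapes `(4, 2)` and `(2, 4)`:

```python
import numpy as np

A = np.array([[0, 3], [1, -1], [2, 1], [5, 2]])
B = np.array([[1, 2, 1, 2], [4, 1, -1, -4]])

via_dot = np.dot(A, B)
via_matmul = np.matmul(A, B)
via_operator = A @ B          # operator form of matmul

print(np.array_equal(via_dot, via_matmul))
print(np.array_equal(via_dot, via_operator))

# element-wise product works when the shapes match
elementwise = np.multiply(A, A)   # squares every entry of A
print(elementwise)

# (4, 2) and (2, 4) cannot be broadcast together
multiply_failed = False
try:
    np.multiply(A, B)
except ValueError as err:
    multiply_failed = True
    print("np.multiply(A, B) failed:", err)
```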

### Scalar Multiplication

>scalar is a number like `1, 2, or 3`

In [None]:
print(A)

[[ 0  3]
 [ 1 -1]
 [ 2  1]
 [ 5  2]]


In [None]:
# scalar multiplication
# every entry of A is multiplied by the scalar:
# 2*0, 2*3, 2*1, 2*(-1), ...
s = 2
np.dot(s, A)

array([[ 0,  6],
       [ 2, -2],
       [ 4,  2],
       [10,  4]])

In [None]:
np.dot(A, s)

array([[ 0,  6],
       [ 2, -2],
       [ 4,  2],
       [10,  4]])

### Dot Product of two 2-D arrays

In [None]:
print(A)
B

[[ 0  3]
 [ 1 -1]
 [ 2  1]
 [ 5  2]]


array([[ 1,  2,  1,  2],
       [ 4,  1, -1, -4]])

In [None]:
# (4x2) x (2x4) gives a 4x4 result
# (2x4) x (4x2) gives a 2x2 result
# A x B is not the same as B x A
# first row: 0*1 + 3*4 = 12, 0*2 + 3*1 = 3, 0*1 + 3*(-1) = -3, 0*2 + 3*(-4) = -12
np.dot(A, B)

array([[ 12,   3,  -3, -12],
       [ -3,   1,   2,   6],
       [  6,   5,   1,   0],
       [ 13,  12,   3,   2]])

In [None]:
np.dot(B, A)

array([[ 14,   6],
       [-21,   2]])

In [None]:
# Example with multiplication of 2 dimensional
# with another 2 dimensional




In [None]:
# zeros function
a = np.zeros((2, 3))

In [None]:
a

array([[0., 0., 0.],
       [0., 0., 0.]])

In [None]:
a = np.ones((3, 5))

In [None]:
a

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [None]:
# empty() allocates the array without initializing it,
# so the values it shows are arbitrary
e = np.empty((3, 4))

In [None]:
e

array([[1.6442717e-316, 0.0000000e+000, 0.0000000e+000, 0.0000000e+000],
       [0.0000000e+000, 0.0000000e+000, 0.0000000e+000, 0.0000000e+000],
       [0.0000000e+000, 0.0000000e+000, 0.0000000e+000, 0.0000000e+000]])

In [None]:
e.dtype

dtype('float64')

# Mathematical Functions

#### Element-wise vs. matrix operations

1. `add()`
2. `subtract()`
3. `multiply()`
4. `divide()`
5. `sqrt()`
6. `dot()` (matrix product)

In [None]:
import numpy as np

In [None]:
x = np.array([[2, 3, 4], [3, 4, 2]])
y = np.array([[5, 3, 4], [3, 4, 5]], dtype="int64")


In [None]:
# adding elements in the array
# [[7, 6, 8], [6, 8, 7]]
result = x + y
result = np.add(x, y)
print(result)

[[7 6 8]
 [6 8 7]]


In [None]:
# subtract
result = x - y
result = np.subtract(x, y)
# [[-3, 0, 0], [0, 0, -3]]
print(result)

[[-3  0  0]
 [ 0  0 -3]]


In [None]:
# multiply
#[[10, 9, 16], [9, 16, 10]]
result = x * y
result = np.multiply(x, y)
print(result)

[[10  9 16]
 [ 9 16 10]]


In [None]:
# divide (y by x; both lines compute the same result)
result = y / x
result = np.divide(y, x)
print(result)

[[2.5 1.  1. ]
 [1.  1.  2.5]]


In [None]:
# element-wise square root
result = np.sqrt(y)

print(result)

[[2.23606798 1.73205081 2.        ]
 [1.73205081 2.         2.23606798]]


In [None]:
# Matrices wise
# dot()
x = np.array([[3, 5], [2, 1]])
y = np.array([[5, 1], [1, 2]])

In [None]:
# define vectors
v = np.array([2, 3])
w = np.array([1, 4])


In [None]:
result = v.dot(w)
# 2*1 + 3*4 = 14
# not [[2, 12]]
print(result)

14


In [None]:
# what is the output of x.dot(v)?
result = x.dot(v)
# each row of x is multiplied with v and summed:
# [3*2 + 5*3, 2*2 + 1*3] = [21, 7]
print(result)

[21  7]


In [None]:
result = x.dot(y)
# [3*5 + 5*1, 3*1 + 5*2, 2* 5 + 1* 1, 2*1 + 1*2]
print(result)


[[20 13]
 [11  4]]


In [None]:
print(np.dot(x, y))

[[20 13]
 [11  4]]


In [None]:
# what is the output of np.sum(x)?
x = np.array([[2, 3, 4],
              [3, 4, 5]])
y = [2, 3, 4, 4]  # a plain Python list

# three ways to sum array elements:
# all elements, per column (axis=0), per row (axis=1)

print(np.sum(y))
print(np.sum(x))

# the sum of each column
print(np.sum(x, axis=0))
# [5, 7, 9]

13
21
[5 7 9]


In [None]:
print(np.sum(x, axis=1))
# [9, 12]

[ 9 12]


In [None]:
s = 0
for i in y:
    s += i

print(s)


13


In [None]:
x = np.array([
    [2, 3, 5, 5],
    [3, 2, 1, 6],
    [8, 8, 9, 7]])
print(x.ndim)
x.T

2


array([[2, 3, 8],
       [3, 2, 8],
       [5, 1, 9],
       [5, 6, 7]])

In [None]:
# shape of the array: 3 rows and 4 columns
print(x.shape)


(3, 4)


In [None]:
x.size

12

---


# Advanced Numpy Tutorial

## Mastering Sorting Algorithms in Python with NumPy

To sort a NumPy array, you can use the `numpy.sort` function or the sort method of the array itself. NumPy offers various sorting algorithms, including **quicksort**, **mergesort**, and **heapsort**. Here's a basic example of how to sort a NumPy array:

In [None]:
import numpy as np

# Create a NumPy array
array = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5])

# Using numpy.sort function to create a sorted copy of the array
sorted_array = np.sort(array)

print("Sorted array:")
print(sorted_array)

# Using the sort method to sort the array in place
array.sort()
print("Original array after sorting:")
print(array)


## Increasing or Decreasing sort

You can also sort a multi-dimensional array along a specific axis by specifying the axis parameter. Here's an example:

In [None]:
# Create a 2D NumPy array
array_2d = np.array([[3, 1, 4],
                     [1, 5, 9],
                     [2, 6, 5]])

# Sorting along axis=0 sorts the values within each column
sorted_array_2d_rows = np.sort(array_2d, axis=0)

print("2D array sorted along axis=0 (each column sorted):")
print(sorted_array_2d_rows)

# Sorting along axis=1 sorts the values within each row
sorted_array_2d_cols = np.sort(array_2d, axis=1)

print("2D array sorted along axis=1 (each row sorted):")
print(sorted_array_2d_cols)
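
`np.sort` always returns values in increasing order and has no `reverse` argument. For a decreasing sort, one common approach (a sketch, not the only option) is to sort ascending and then reverse the result with a slice:

```python
import numpy as np

array = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5])

ascending = np.sort(array)
descending = ascending[::-1]   # reversed view of the ascending result

print(ascending)
print(descending)

# for a 2D array, reverse along the same axis that was sorted
array_2d = np.array([[3, 1, 4],
                     [1, 5, 9],
                     [2, 6, 5]])
rows_descending = np.sort(array_2d, axis=1)[:, ::-1]
print(rows_descending)
```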


NumPy's sorting functions are highly optimized and efficient for handling large arrays. You can choose different sorting algorithms using the **kind** parameter, which includes options like '**quicksort**', '**mergesort**', or '**heapsort**'. You can also use the order parameter to specify the field to use when sorting structured arrays.
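
As an illustration of the `order` parameter, the sketch below builds a small structured array and sorts it by one field. The records and the field names `name` and `height` are made up for this example:

```python
import numpy as np

# hypothetical records: (name, height in cm)
people = np.array(
    [("Ana", 165), ("Ben", 150), ("Cy", 180)],
    dtype=[("name", "U10"), ("height", "i4")],
)

# sort the records by the "height" field
by_height = np.sort(people, order="height")

print(by_height["name"])
print(by_height["height"])
```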

## Using the '**quicksort**' algorithm:

In [1]:
import numpy as np

# Create an array
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5])

# Sorting with quicksort algorithm
sorted_array_quicksort = np.sort(arr, kind='quicksort')

print("Sorted array using quicksort:")
print(sorted_array_quicksort)


Sorted array using quicksort:
[1 1 2 3 4 5 5 6 9]


## Using the '**mergesort**' algorithm:

In [2]:
# Sorting with mergesort algorithm
sorted_array_mergesort = np.sort(arr, kind='mergesort')

print("Sorted array using mergesort:")
print(sorted_array_mergesort)


Sorted array using mergesort:
[1 1 2 3 4 5 5 6 9]


## Using the '**heapsort**' algorithm:

In [9]:
# Sorting with heapsort algorithm
sorted_array_heapsort = np.sort(arr, kind='heapsort')

print("Sorted array using heapsort:")
print(sorted_array_heapsort)


Sorted array using heapsort:
[1 1 2 3 4 5 5 6 9]


By default, NumPy's sort function uses **quicksort**. However, you can choose a different algorithm based on your requirements. For example, **mergesort** is a stable algorithm and is generally preferred when stability is required. **Heapsort** has a guaranteed worst-case performance and is useful for sorting in the presence of memory constraints.
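
Stability only becomes visible when equal values can be told apart. The sketch below uses `np.argsort`, which returns the indices that would sort the array, so the original positions of equal keys can be inspected. The `keys` array is a made-up example.

```python
import numpy as np

keys = np.array([2, 1, 2, 1, 2])

# a stable sort keeps equal keys in their original relative order
order = np.argsort(keys, kind="mergesort")

print(order)         # positions of the 1s first, then the 2s, each in original order
print(keys[order])   # the sorted keys
```

With a stable algorithm the two `1`s stay in the order of indices 1 then 3, and the three `2`s in the order 0, 2, 4. An unstable algorithm is free to return equal keys in any order.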

Choosing the appropriate sorting algorithm depends on various factors, including the size of the data, the desired stability of the sort, and the memory constraints.