# NumPy

**Numerical Python**

- The ancestor of NumPy (Numerical Python) is **Numeric**
- In `2005`, Travis Oliphant developed NumPy
- Useful library for scientific computing
- NumPy handles linear algebra operations well and can be used as an alternative to `MATLAB`
- It is a very useful library for performing mathematical and statistical operations in Python
- It provides a high-performance multidimensional array object
- NumPy is memory efficient

## What is an array ?

- In Python, an array is a collection of elements of the **same data type** (homogeneous) stored in memory
- Arrays are homogeneous --> same data type (all integers / all strings); you can't mix and match

###  To create an array, we can use `np.array()`

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

### Import certain libraries

In [1]:
import os #talk about later
import numpy as np

### `List Comprehension`

![image.png](attachment:image.png)

In [2]:
my_list = [1,2,3,4,5]

#### Square the numbers

In [3]:
[element*element for element in my_list]

[1, 4, 9, 16, 25]

### Common elements from the two lists

In [4]:
list1 = [1, 2, 3, 4, 5]
list2 = [3, 4, 5, 6, 7]

`3,4,5`

In [5]:
common_elements = [x for x in list1 if x in list2] #shorthand notation

In [6]:
common_elements

[3, 4, 5]

`for x in list1 if x in list2`: For each `x` in **list1**, it checks if `x` is present in **list2**

If `x` is found in **list2**, it is included in the new list: `common_elements`
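The membership test `x in list2` scans the whole list on every iteration. A minimal sketch, assuming the elements are hashable, that builds a `set` once for the lookup (the name `lookup` is introduced here only for illustration):

```python
list1 = [1, 2, 3, 4, 5]
list2 = [3, 4, 5, 6, 7]

# Build a set once so each membership test is a hash lookup
lookup = set(list2)

# Same comprehension shape as before; result order follows list1
common_elements = [x for x in list1 if x in lookup]
print(common_elements)  # [3, 4, 5]
```

For short lists the difference is negligible; it starts to matter when `list2` is large.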

### H/W Prime Numbers 
Create a list of prime numbers less than `n` using list comprehension

### Filtering even numbers

In [7]:
nums=[1,2,3,4,5,6,7,8,9,10]

In [8]:
even_nums = [x for x in nums if x %2==0]

In [9]:
even_nums

[2, 4, 6, 8, 10]

### Nested List Comprehension

In [10]:
nested_list = [[1,2,3], [4,5,6], [7,8,9]]

### Make it a flattened list

In [11]:
flat_list = [num for sublist in nested_list for num in sublist]

In [12]:
flat_list

[1, 2, 3, 4, 5, 6, 7, 8, 9]

- `sublist` is a variable name, not a keyword

### Advantages of list comprehension

- Conciseness: drastically reduces the amount of code you need
- Efficiency: generally faster than the equivalent `for` loop with `append`, because the interpreter builds the list with less per-iteration overhead (no repeated `append` lookup and call)
- Functionality: does the same job as a `for` loop, optionally with a filter condition
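The efficiency claim can be checked on your own machine. A rough sketch, assuming a list of 1000 integers and 200 repetitions (both numbers are arbitrary choices, and the helper names are illustrative); timings vary between runs and machines:

```python
import timeit

nums = list(range(1000))

def with_loop():
    # Explicit loop: looks up and calls out.append on every iteration
    out = []
    for n in nums:
        out.append(n * n)
    return out

def with_comprehension():
    # Same result built by a list comprehension
    return [n * n for n in nums]

loop_time = timeit.timeit(with_loop, number=200)
comp_time = timeit.timeit(with_comprehension, number=200)
print(f"loop: {loop_time:.4f}s  comprehension: {comp_time:.4f}s")
```

Both functions return the same list; only the time taken differs.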

## Function to create an array from a Python object

### `np.array()`

In [14]:
my_list

[1, 2, 3, 4, 5]

In [15]:
print(type(my_list))

<class 'list'>


In [16]:
my_array = np.array(my_list)

In [17]:
print(type(my_array))

<class 'numpy.ndarray'>


`nd` means n-dimensional

In [18]:
print(my_list)

[1, 2, 3, 4, 5]


In [19]:
my_array

array([1, 2, 3, 4, 5])

In [20]:
print(my_array)

[1 2 3 4 5]


In [21]:
my_array2 = np.array([7,2,6.7, "APC"]) #in this array, the list is heterogeneous

In [22]:
my_array2

array(['7', '2', '6.7', 'APC'], dtype='<U32')

In [23]:
print(type(my_array2[0]))

<class 'numpy.str_'>


- `np.array` did **implicit typecasting** to convert the heterogeneous list to a homogeneous array (str)
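A small sketch of how this implicit typecasting plays out for a few input mixes, and how to request a type explicitly with `dtype` (the variable names are illustrative):

```python
import numpy as np

ints_and_float = np.array([1, 2, 3.5])          # int + float -> float64
nums_and_str = np.array([7, 2, 6.7, "APC"])     # any string -> Unicode string dtype
explicit = np.array([1, 2, 3], dtype="float32")  # type chosen by the caller

print(ints_and_float.dtype)     # float64
print(nums_and_str.dtype.kind)  # U
print(explicit.dtype)           # float32
```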

### NumPy arrays are `homogeneous` and can contain object of only `one data type`

In [24]:
apc_list = [2,4,7,8,9,2,0]

In [25]:
apc_list

[2, 4, 7, 8, 9, 2, 0]

In [26]:
elements_to_remove = [2,2]

In [27]:
for item in elements_to_remove:
    if item in apc_list:
        del apc_list[apc_list.index(item)]
print(apc_list)

[4, 7, 8, 9, 0]


### Why do we need Arrays over lists ?

**`Functionality`**

In [28]:
list1 = [1,12,3]
list2 = [2,4,6]

- Multiply the given two lists to generate the output as

`out_list = [2,48,18]` #elementwise multiplication

In [29]:
list1 * list2

TypeError: can't multiply sequence by non-int of type 'list'

**Elementwise multiplication of two lists - vectorized operations not possible in Python using lists**

### Alternate Solution

In [30]:
out_list=[] #empty list

for i in range(len(list1)):
    out_list.append(list1[i]*list2[i])
    
print('Multiplication of two lists:', out_list)

Multiplication of two lists: [2, 48, 18]


### Alternate Approach - Array Way

In [31]:
### Convert the lists into arrays
arr1 = np.array(list1)
arr2 = np.array(list2)

In [32]:
arr_out= arr1 * arr2

In [33]:
arr_out

array([ 2, 48, 18])

In [34]:
out_list = arr_out.tolist() #converts the array to a (nested) Python list

In [35]:
out_list

[2, 48, 18]

In [36]:
out_list2 = (np.array(list1)*np.array(list2)).tolist()

### NumPy Arrays in Python, allow vectorized operations

In [37]:
out_list2

[2, 48, 18]

In [38]:
list(arr_out)  #another way of converting an array to a list (elements stay NumPy scalars)

[2, 48, 18]

In [39]:
import numpy as np

## Array Creation and Initialization

### 2-Dimensional Array

![image.png](attachment:image.png)

In [40]:
arr_1d = np.array([1,2,3,6,8,9,10]) # 1-D

In [41]:
arr_1d.dtype

dtype('int32')

In [42]:
arr_1d_8 = np.array([1,2,3,6,8,9,10], dtype='float32') # 1-D

In [43]:
arr_1d_8.dtype

dtype('float32')

In [44]:
arr_1d_8

array([ 1.,  2.,  3.,  6.,  8.,  9., 10.], dtype=float32)

In [45]:
type(arr_1d)

numpy.ndarray

In [46]:
arr_2d = np.array([[5.2,3.0,4.5],[9.1, 0.1, 0.3]]) #2D Array

In [47]:
type(arr_2d)

numpy.ndarray

## Array inspection functions

- ndim: number of dimensions

- shape: a tuple giving the number of elements along each axis

- size: total number of elements in the array (the function `np.size(arr, axis)` counts the elements **along a given axis** instead)

- dtype: data type of array elements

- itemsize: byte size of **each array element**

- nbytes: total size of the array in bytes; it is equal to `itemsize X size`
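A short sketch of the difference between the `size` attribute and `np.size` with an axis, plus the `nbytes` relation, using a small 2-D array (the name `arr` is illustrative):

```python
import numpy as np

arr = np.array([[5.2, 3.0, 4.5], [9.1, 0.1, 0.3]])

total = arr.size                 # total number of elements
n_rows = np.size(arr, axis=0)    # length along axis 0
n_cols = np.size(arr, axis=1)    # length along axis 1

print(total, n_rows, n_cols)                  # 6 2 3
print(arr.nbytes == arr.itemsize * arr.size)  # True
```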

In [48]:
print('Dimension of the array:', arr_1d.ndim)
print('Shape of the array:', arr_1d.shape)
print('Size of the array:', arr_1d.size)
print('Datatype of the array:', arr_1d.dtype)
print('Itemsize of the array:', arr_1d.itemsize)
print('Total size of the array:', arr_1d.nbytes)

Dimension of the array: 1
Shape of the array: (7,)
Size of the array: 7
Datatype of the array: int32
Itemsize of the array: 4
Total size of the array: 28


![image.png](attachment:image.png)

- .ndim: 2
- .shape: (2, 3)
- .size: 6
- .dtype: float64
- .itemsize: 8 (bytes)
- .nbytes: 48

In [49]:
print('Dimension of the array:', arr_2d.ndim)
print('Shape of the array:', arr_2d.shape)
print('Size of the array:', arr_2d.size)
print('Datatype of the array:', arr_2d.dtype)
print('Itemsize of the array:', arr_2d.itemsize)
print('Total size of the array:', arr_2d.nbytes)

Dimension of the array: 2
Shape of the array: (2, 3)
Size of the array: 6
Datatype of the array: float64
Itemsize of the array: 8
Total size of the array: 48


### 3-Dimensional Array

![image.png](attachment:image.png)

`credit: to the infographics creator`


### APC

**shape(i,j,k)**

`i`: `number of layers`

`j`: `number of rows`

`k`: `number of columns`
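A minimal sketch of the `(i, j, k)` convention, using an array with a different length on each axis so the three numbers cannot be confused (the shape `(2, 3, 4)` is an arbitrary choice):

```python
import numpy as np

# 2 layers, 3 rows, 4 columns
arr = np.arange(24).reshape(2, 3, 4)

layers, rows, cols = arr.shape
print(layers, rows, cols)  # 2 3 4

print(arr[1].shape)     # one layer is a 2-D array: (3, 4)
print(arr[1, 2].shape)  # one row of one layer is 1-D: (4,)
```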

In [52]:
arr_3d = np.array([
    [[10,11,12], [13,14,15], [16,17,18]], #first layer
    [[20,21,22], [23,24,25], [26,27,28]],
    [[30,31,32], [33,34,35], [36,37,38]]
])

In [53]:
arr_3d

array([[[10, 11, 12],
        [13, 14, 15],
        [16, 17, 18]],

       [[20, 21, 22],
        [23, 24, 25],
        [26, 27, 28]],

       [[30, 31, 32],
        [33, 34, 35],
        [36, 37, 38]]])

In [54]:
print('Dimension of the array:', arr_3d.ndim)
print('Shape of the array:', arr_3d.shape)
print('Size of the array:', arr_3d.size)
print('Datatype of the array:', arr_3d.dtype)
print('Itemsize of the array:', arr_3d.itemsize)
print('Total size of the array:', arr_3d.nbytes)

Dimension of the array: 3
Shape of the array: (3, 3, 3)
Size of the array: 27
Datatype of the array: int32
Itemsize of the array: 4
Total size of the array: 108


### H/W Create a simple 4-D array

### Initialize all the elements of array (size - your choice) with the value `0`

### `np.zeros`

In [55]:
arr_zero = np.zeros((3,3,3))  #3 layers, 3 rows, 3 columns

In [56]:
arr_zero

array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]]])

In [57]:
arr_zero = np.zeros((3,3,3), dtype ='int')  #3 layers, 3 rows, 3 columns

In [58]:
arr_zero

array([[[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]])

In [59]:
arr_zero.dtype

dtype('int32')

![image.png](attachment:image.png)

### Initialize all the elements of the array of your choice with `1`

### `np.ones()`

In [60]:
arr_ones = np.ones((3,3,3), dtype='int32')

In [61]:
arr_ones

array([[[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]],

       [[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]],

       [[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]]])

### Initialize all the elements of the array of your choice with any fixed number or an array

### `np.full`

In [62]:
arr_full = np.full((3,3,3),7)

In [63]:
arr_full

array([[[7, 7, 7],
        [7, 7, 7],
        [7, 7, 7]],

       [[7, 7, 7],
        [7, 7, 7],
        [7, 7, 7]],

       [[7, 7, 7],
        [7, 7, 7],
        [7, 7, 7]]])

In [64]:
arr_full = np.full((3,3,3),"APC")

In [65]:
arr_full

array([[['APC', 'APC', 'APC'],
        ['APC', 'APC', 'APC'],
        ['APC', 'APC', 'APC']],

       [['APC', 'APC', 'APC'],
        ['APC', 'APC', 'APC'],
        ['APC', 'APC', 'APC']],

       [['APC', 'APC', 'APC'],
        ['APC', 'APC', 'APC'],
        ['APC', 'APC', 'APC']]], dtype='<U3')

### Q. Create an array of shape: (4,3,2) filled with 0 and 1 only 
--- DO NOT TYPE IN

In [66]:
seq= [0,1]

In [67]:
arr_full = np.full((4,3,2),seq)

In [68]:
arr_full

array([[[0, 1],
        [0, 1],
        [0, 1]],

       [[0, 1],
        [0, 1],
        [0, 1]],

       [[0, 1],
        [0, 1],
        [0, 1]],

       [[0, 1],
        [0, 1],
        [0, 1]]])
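The result above works because the fill value `[0, 1]` is broadcast against the requested shape: its length matches the last axis (2), so it is repeated across the other axes. A small sketch of that behaviour, including a fill value that does not fit (the name `same` is illustrative):

```python
import numpy as np

seq = [0, 1]
arr_full = np.full((4, 3, 2), seq)

# Broadcasting the same sequence to the same shape gives an equal array
same = np.broadcast_to(seq, (4, 3, 2))
print(np.array_equal(arr_full, same))  # True

# A fill value of length 3 cannot be broadcast to a last axis of length 2
try:
    np.full((4, 3, 2), [0, 1, 2])
except ValueError as err:
    print("ValueError:", err)
```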

### Do you think arrays are mutable?

In [69]:
arr_1d

array([ 1,  2,  3,  6,  8,  9, 10])

In [70]:
arr_1d[3] = 100

In [71]:
arr_1d

array([  1,   2,   3, 100,   8,   9,  10])

**Arrays are mutable**
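Elements can be changed, but the `dtype` of the array stays fixed. A brief sketch of what that means when a float is written into an integer array (the names `arr` and `arr_f` are illustrative):

```python
import numpy as np

arr = np.array([1, 2, 3])   # integer array
arr[0] = 7.9                # the value is cast to the array's integer dtype
print(arr)                  # [7 2 3]

# Convert first if the fractional part should be kept
arr_f = arr.astype("float64")
arr_f[0] = 7.9
print(arr_f[0])             # 7.9
```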

### Filling RANDOM numbers

### `np.random.random()`

In [72]:
arr_random = np.random.random((3,3,3))

![image.png](attachment:image.png)

In [73]:
arr_random

array([[[0.9754736 , 0.9431767 , 0.69123649],
        [0.58462134, 0.69396172, 0.65167139],
        [0.17476403, 0.18013879, 0.57829328]],

       [[0.08279325, 0.78514793, 0.29157196],
        [0.22720005, 0.2750132 , 0.96891526],
        [0.43443942, 0.01344419, 0.13621074]],

       [[0.31147145, 0.41801673, 0.49913246],
        [0.30484108, 0.17689021, 0.75653971],
        [0.49316984, 0.1521012 , 0.18255911]]])

In [74]:
arr_random*1000

array([[[975.47360373, 943.17669628, 691.23648611],
        [584.62133556, 693.96171922, 651.67139367],
        [174.76402978, 180.13879498, 578.29327747]],

       [[ 82.79325037, 785.14793253, 291.57195888],
        [227.2000535 , 275.013197  , 968.91525704],
        [434.43941599,  13.44419104, 136.21074168]],

       [[311.47144787, 418.016728  , 499.13245694],
        [304.84108342, 176.89020613, 756.53970533],
        [493.16983607, 152.10119932, 182.55911235]]])
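The values from `np.random.random` lie in the half-open interval [0, 1) and change on every run. A hedged sketch of getting repeatable numbers with NumPy's `default_rng` generator and rescaling them to another range; the seed `42` and the range [5, 10) are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded generator -> repeatable output
arr = rng.random((3, 3, 3))           # floats in [0, 1)

# Rescale to [low, high) with low + arr * (high - low)
low, high = 5, 10
scaled = low + arr * (high - low)

# A second generator with the same seed reproduces the same numbers
rng2 = np.random.default_rng(seed=42)
print(np.array_equal(arr, rng2.random((3, 3, 3))))  # True
```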

## Reading assignment -- NORMAL DISTRIBUTION

**https://statisticsbyjim.com/**

## Report Card

In [75]:
# Function to calculate grades based on exam scores:

def cal_grade(score):
    if score>=90:
        return "A"
    elif score>=80:
        return "B"
    elif score>=70:
        return "C"
    elif score>=60:
        return "D"
    else:
        return "F"
    
# Function to collect student data from the user's input
def collect_student_data():
    name = input("Enter the student's first name:")
    score = float(input("Enter the student's score for {} (0-100):" .format(name)))
    
    #validate the score to be within range [0,100]
    while score <0 or score>100:
        print('Invalid score. Please enter a score between 0 and 100')
        score = float(input("Enter the student's score for {} (0-100):" .format(name)))
    
    return {"name":name, "score": score}

### Function to generate the student report
def generate_student_report(students):
    print('Student Exam Report')
    print('-' * 32)
    print(f"{'Name':<10} | {'Score':<10} | Grade")
    print('-' * 32)
    
    for student in students:
        name = student['name']
        score = student['score']
        grade = cal_grade(score)
        # :<10 left-aligns each value in a 10-character column, matching the header
        print(f"{name:<10} | {score:<10} | {grade}")
        
        
#list to store the student data
students = []

#collect data for multiple students (3-5) --> later condition on 3-5
num_students = int(input("Enter the number of students:"))

for i in range(num_students):
    print(f"\nEnter data for student {i+1}:")
    student_data = collect_student_data()
    students.append(student_data)

print(students)

### Generate and print the report
generate_student_report(students)

Enter the number of students:


ValueError: invalid literal for int() with base 10: ''

In [None]:
grade_criteria={'A':(90,100), 'B':(80,89), 'C':(70,79), 'D':(60,69), 'E':(50,59), 'F':(0,49)}

No_of=int(input('Enter the numbers of student detail you need to enter : '))
Names_of_Students=[]
Marks_of_Students=[]
Grade_of_Students=[]

for i in range(No_of):
    Name= str(input("Enter the Name of the Student : "))
    Names_of_Students.append(Name)
    Marks = int (input('Enter the Marks'))
    Marks_of_Students.append(Marks)

for mark in Marks_of_Students:
    for grades,(min_value,max_value) in grade_criteria.items():
        if min_value <= mark <= max_value:
            Grade_of_Students.append(grades)
            break
            
print(Names_of_Students)
print(Marks_of_Students)
print(Grade_of_Students)


In [None]:
students = [
    {"name": "Rahul", "student_id": "S01", "score": 85},
    {"name": "Rohan", "student_id": "S02", "score": 92},
    {"name": "Sneha", "student_id": "S03", "score": 72},
    {"name": "Mohit", "student_id": "S04", "score": 58},
    {"name": "Shiva", "student_id": "S05", "score": 78},
]
def calc_grade(score):
    if 90 <= score <= 100:
        return "A"
    elif 80 <= score < 90:
        return "B"
    elif 70 <= score < 80:
        return "C"
    elif 60 <= score < 70:
        return "D"
    else:
        return "F"

print("      Student Exam Report    ")
print("-------------------------------------")
print(f"{'Name':<10} | {'Exam Score':<10} | {'Grade'}")
print("------------------------------------")

for student in students:
    name = student["name"]
    score = student["score"]
    grade = calc_grade(score)
    print(f"{name:<10} | {score:<10} | {grade}")

print("-------------------------------------")
print(" ***** End of report ******")


## Normal Distribution

![image.png](attachment:image.png)

![image.png](attachment:image.png)

### `np.random.normal()`

In [None]:
arr_nd = np.random.normal(10, 2, (3,3,3))

![image.png](attachment:image.png)

In [None]:
arr_nd

In [None]:
arr_nd.mean() #mean is close to 10

In [None]:
arr_nd.std()

In [None]:
arr_nd = np.random.normal(-10, 2, (3,3,3))

In [None]:
arr_nd

### Print an identity array

### `np.identity()`

In [None]:
arr_identity = np.identity(4, dtype='int8')

In [None]:
print(arr_identity)

- The number of rows equals the number of columns in an identity array, hence we only pass a single size `n` instead of a shape
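For comparison, a short sketch of `np.identity` next to the related `np.eye`, which also accepts a column count and a diagonal offset `k` (the variable names are illustrative):

```python
import numpy as np

ident = np.identity(3, dtype="int8")
print(np.array_equal(ident, np.eye(3, dtype="int8")))  # True

# np.eye can build non-square arrays and shift the diagonal with k
shifted = np.eye(2, 3, k=1)
print(shifted)

# Multiplying a square matrix by the identity leaves it unchanged
m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(np.array_equal(m @ ident, m))  # True
```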

## Indexing and Slicing in NumPy Array

In [None]:
arr_1d

### Indexing 

In [None]:
arr_1d[0]

#### Print the last element of the array

In [None]:
arr_1d[-1]

### Slicing

![image.png](attachment:image.png)

### 2-D indexing & slicing

![image.png](attachment:image.png)

In [None]:
arr_2d = np.array([[1,2,3], [4,5,6], [7,8,9]])

In [None]:
arr_2d

![image.png](attachment:image.png)

In [None]:
arr_2d[1,1]

### Print everything

In [None]:
arr_2d[:] # : means take everything, rows as well as columns

### Print the first row

In [None]:
arr_2d[0] #by default it is for row

### Print the first column

In [None]:
arr_2d[ : , 0 ] #take every row but only the first column

`i` selects the `row` and `j` selects the `column`

`:` colon means every element in row/column
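A compact sketch of the `[row, column]` pattern with `start:stop` ranges on both axes, using the same 3x3 array (the result names are illustrative):

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

block = arr_2d[0:2, 1:3]    # rows 0-1, columns 1-2
second_row = arr_2d[1, :]   # all columns of row 1
last_col = arr_2d[:, -1]    # all rows of the last column

print(block.tolist())       # [[2, 3], [5, 6]]
print(second_row.tolist())  # [4, 5, 6]
print(last_col.tolist())    # [3, 6, 9]
```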

### Print the alternate rows for the given array

![image.png](attachment:image.png)

In [None]:
arr_2d[0:3: 2 ]

In [None]:
arr_2d[:: 2 ]

### Print the alternate columns for the given array

![image.png](attachment:image.png)

In [None]:
arr_2d[:, ::2 ]

### Reverse the columns

In [None]:
arr_2d[:, ::-1]

### Transpose the array

In [None]:
arr_2d

In [None]:
arr_2d.T #transpose

### Indexing and slicing in 3 dimensions

![image.png](attachment:image.png)

In [None]:
arr_3d

![image.png](attachment:image.png)

In [None]:
arr_3d[:]

#### Select the first layer

In [None]:
arr_3d[0]

#### Select the last layer

In [None]:
arr_3d[-1]

#### Print middle column of the middle layer

#### `arr_3d[ layer   ,  row  ,  column ]`

In [None]:
arr_3d[1, :, 1]

#### Print all the middle columns across the layers

In [None]:
arr_3d[:, :,1]

#### Print the alternate columns in the reverse order across the layers

![image.png](attachment:image.png)

In [None]:
arr_3d[:, :, ::-2]

## H/W Print the diagonal arrays across the layers

![image-2.png](attachment:image-2.png)

## Array Mathematics

![image.png](attachment:image.png)

### Addition

In [None]:
arr1

In [None]:
arr2

### Summing all the elements in an array

In [None]:
arr1.sum()

In [None]:
np.sum(arr1)

In [None]:
arr_3d

In [None]:
np.sum(arr_3d)

In [None]:
arr_3d.sum()

### Elementwise addition of two arrays

In [None]:
arr1 + arr2

In [None]:
np.sum([arr1, arr2], axis=0)

In [None]:
np.sum([arr1, arr2], axis=1)

### 2D Arrays Sum

In [None]:
arr_2d

In [None]:
arr_2d2 = np.array([[99,98,97], [96,95,94],[93,92,91]])

In [None]:
arr_2d2

In [None]:
arr_2d + arr_2d2 #elementwise addition 

In [None]:
np.sum([arr_2d, arr_2d2], axis=0)

In [None]:
np.sum([arr_2d, arr_2d2], axis=1)

### ufuncs -> Universal Functions
`pro tip: interview question`

![image.png](attachment:image.png)

In [None]:
arr1

In [None]:
arr1 + 5 

In [None]:
my_list = [1,12,3]

In [None]:
my_list + 5 

In [None]:
x = arr1

In [None]:
x

In [None]:
print('x+5:', x+5) #addition
print('x-5:', x-5) #subtraction
print('x*2:', x*2) #multiplication
print('x/2:', x/2) #division
print('x//2:', x//2) #floor division - Greatest Integer Function
print('x**2:', x**2) #Power
print('x%2:', x%2) #remainder
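Each arithmetic operator on an array maps to a named NumPy universal function. A minimal sketch checking a few of these pairs on a small array:

```python
import numpy as np

x = np.array([1, 12, 3])

# Operator form and the corresponding ufunc give the same result
same_add = np.array_equal(x + 5, np.add(x, 5))
same_floor = np.array_equal(x // 2, np.floor_divide(x, 2))
same_pow = np.array_equal(x ** 2, np.power(x, 2))
same_mod = np.array_equal(x % 2, np.mod(x, 2))

print(same_add, same_floor, same_pow, same_mod)  # True True True True
print(isinstance(np.add, np.ufunc))              # True
```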

### Mathematical Functions

In [None]:
arr0 =np.array([2,9,16])

In [None]:
arr0

In [None]:
print('Square root of the array:', np.sqrt(arr0))

In [None]:
print('Natural log of the array:', np.log(arr0)) #base e

In [None]:
print('Log of the array:', np.log10(arr0)) #base 10

In [None]:
print('Sine of the array:', np.sin(arr0))

In [None]:
print('Cosine of the array:', np.cos(arr0))

### How to filter the arrays

#### `np.any()`, `np.where`

 - `np.any` returns 'True' if at least one element in the input array evaluates to 'True' and 'False' otherwise
 - used for checking if any element in an array meets a specified condition
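A quick sketch of `np.any` alongside its counterpart `np.all`, which requires every element to satisfy the condition (the result names are illustrative):

```python
import numpy as np

my_arr = np.array([1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1, 8])

any_gt7 = np.any(my_arr > 7)   # at least one element is greater than 7
all_pos = np.all(my_arr > 0)   # every element is greater than 0
all_gt1 = np.all(my_arr > 1)   # fails because the array contains 1

print(any_gt7, all_pos, all_gt1)  # True True False
```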

In [None]:
my_arr = np.array([1,2,3,4,5,6,7,6,5,4,3,2,1,8])

In [None]:
filt_criteria = my_arr>5

In [None]:
my_arr[filt_criteria]

In [76]:
np.any(my_arr>7) #test whether condition is met or not even once


`np.where`

- `np.where` is used to locate the indices of elements in an array that meet a certain condition
- it returns the indices where the given condition is 'True'

In [150]:
np.where(my_arr>6)

(array([ 6, 13], dtype=int64),)

#### `&  : and ` , `| : or` , `~ : not`,  `== : equal`
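A short sketch of combining conditions with these operators. Each comparison needs its own parentheses, because `&` and `|` bind more tightly than `>` or `<` (the mask names are illustrative):

```python
import numpy as np

my_arr = np.array([1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1, 8])

mask_and = (my_arr > 2) & (my_arr < 6)   # both conditions hold
mask_or = (my_arr < 2) | (my_arr > 7)    # either condition holds
mask_not = ~(my_arr == 1)                # condition does not hold

print(my_arr[mask_and])  # [3 4 5 5 4 3]
print(my_arr[mask_or])   # [1 1 8]

# Three-argument np.where picks values instead of returning indices
capped = np.where(my_arr > 5, 5, my_arr)
print(capped.max())      # 5
```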

### Basic Statistics

- For advanced statistical tests, use `SciPy` - stats library

**https://docs.scipy.org/doc/scipy/reference/stats.html**

In [151]:
python_scores = np.array([10,20,15,20,12,15,8,5,13,17]) #10 students python scores out of 20 marks

In [170]:
print('Sum total of Python Scores:', python_scores.sum()) #sum
print('Average Python Scores:', python_scores.mean()) #avg
print('Average Python Scores:', np.average(python_scores))#avg
print('Median Python Scores:', np.median(python_scores)) #median
print('Variance of Python Scores:', python_scores.var())#variance
print('Std. Deviation of Python Scores:', python_scores.std())#std

# print('Mode of Python Scores:', np.mode(python_scores))
print('Mode of Python Scores:', python_scores.mode())

Sum total of Python Scores: 135
Average Python Scores: 13.5
Average Python Scores: 13.5
Median Python Scores: 14.0
Variance of Python Scores: 21.85
Std. Deviation of Python Scores: 4.674398357008098


AttributeError: 'numpy.ndarray' object has no attribute 'mode'

### Sort the below array

In [156]:
python_scores

array([10, 20, 15, 20, 12, 15,  8,  5, 13, 17])

In [162]:
np.sort(python_scores) #ascending 

array([ 5,  8, 10, 12, 13, 15, 15, 17, 20, 20])

In [161]:
np.sort(python_scores)[::-1] #descending 

array([20, 20, 17, 15, 15, 13, 12, 10,  8,  5])

**Note: `np.sort` has no parameter for ascending/descending order; reverse the sorted result with `[::-1]` instead**

However, pandas sorting methods do accept an `ascending` parameter!

#### `np.average` vs `arr.mean()`

##### `np.average`: Computes the weighted average along the specified axis (the plain mean when no weights are given).

##### `python_scores.mean()`: Returns the average of the array elements along the given axis (arithmetic mean).
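A small sketch of where the two differ: once `weights` are passed, `np.average` no longer matches `mean()`. The scores and weights below are made-up values for illustration only:

```python
import numpy as np

scores = np.array([10, 20, 15])
weights = np.array([1, 3, 1])   # hypothetical weights

plain = scores.mean()                           # (10 + 20 + 15) / 3
unweighted = np.average(scores)                 # same as the mean
weighted = np.average(scores, weights=weights)  # (10*1 + 20*3 + 15*1) / 5

print(plain, unweighted, weighted)  # 15.0 15.0 17.0
```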

**`Observation: NumPy doesn't have mode as a function so we are going to write it`**

### H/W Let us write a function to calculate mode

Hint: `np.max()` & `np.unique()`

## Benchmark Arrays against Lists

In [185]:
import timeit

### Case 1: Elementwise multiplication using regular lists:
def list_multiplication():
    list1 = list(range(10000000)) #10 million elements
    list2 = list(range(10000000)) #10 million elements
    result = [x*y for x,y in zip(list1,list2)]

### Case 2: Elementwise multiplication using arrays:
### Note: 100 million elements here, 10x more than in the list case
def array_multiplication():
    arr1 = np.arange(100000000)
    arr2 = np.arange(100000000)
    result=arr1*arr2

### Benchmark the execution time of list multiplication
list_time = timeit.timeit(list_multiplication, number=100)

### Benchmark the execution time of array multiplication
array_time = timeit.timeit(array_multiplication, number=100) 

In [186]:
list_time

140.3912921000001

In [187]:
array_time

39.17159199999969

In [188]:
list_time/array_time

3.5840078212803097
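Note that in the benchmark above the list case works on 10 million elements while the array case works on 100 million, and both functions also time the creation of their inputs. The measured ratio of about 3.6 therefore understates the per-element difference. A rough sketch of a like-for-like comparison, assuming 1 million elements and 5 repetitions (both arbitrary choices to keep the run short); the inputs are built once outside the timed functions:

```python
import timeit
import numpy as np

n = 1_000_000  # same element count for both cases

list1 = list(range(n))
list2 = list(range(n))
arr1 = np.arange(n, dtype=np.int64)  # int64 so the products cannot overflow
arr2 = np.arange(n, dtype=np.int64)

def list_multiplication():
    return [x * y for x, y in zip(list1, list2)]

def array_multiplication():
    return arr1 * arr2

list_time = timeit.timeit(list_multiplication, number=5)
array_time = timeit.timeit(array_multiplication, number=5)
print(f"list: {list_time:.3f}s  array: {array_time:.3f}s")
print(f"speedup: {list_time / array_time:.1f}x")
```

The exact speedup depends on the machine and NumPy build, so no figure is quoted here.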

### Big data tools 

`DASK` , `Parquet Files`

**https://www.dask.org/**

## Create an array: np.linspace & np.arange

 `pro tip: interview question`

![image.png](attachment:image.png)

- When it comes to creating a sequence of values, `linspace` and `arange` are two commonly used NumPy functions

Here is the subtle difference between the two functions:

- `linspace` allows you to specify the **number of values**
- `arange` allows you to specify the **size of the step**
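A minimal side-by-side sketch of the two functions over the interval from 0 to 1 (the variable names are illustrative):

```python
import numpy as np

by_count = np.linspace(0, 1, 5)                     # 5 values, stop included
no_endpoint = np.linspace(0, 1, 5, endpoint=False)  # 5 values, stop excluded
by_step = np.arange(0, 1, 0.25)                     # step 0.25, stop excluded

print(by_count)     # [0.   0.25 0.5  0.75 1.  ]
print(no_endpoint)  # [0.  0.2 0.4 0.6 0.8]
print(by_step)      # [0.   0.25 0.5  0.75]
```

With a non-integer step, `arange` can gain or lose an element through floating-point rounding, so `linspace` is usually the safer choice when the number of points matters.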

### np.linspace()

`np.linspace(start, stop, num, …)`

where:

- start: The starting value of the sequence
- stop: The end value of the sequence
- **num: the number of values to generate**

#### Both start and stop are included (by default)

In [88]:
np.linspace(30,50)

array([30.        , 30.40816327, 30.81632653, 31.2244898 , 31.63265306,
       32.04081633, 32.44897959, 32.85714286, 33.26530612, 33.67346939,
       34.08163265, 34.48979592, 34.89795918, 35.30612245, 35.71428571,
       36.12244898, 36.53061224, 36.93877551, 37.34693878, 37.75510204,
       38.16326531, 38.57142857, 38.97959184, 39.3877551 , 39.79591837,
       40.20408163, 40.6122449 , 41.02040816, 41.42857143, 41.83673469,
       42.24489796, 42.65306122, 43.06122449, 43.46938776, 43.87755102,
       44.28571429, 44.69387755, 45.10204082, 45.51020408, 45.91836735,
       46.32653061, 46.73469388, 47.14285714, 47.55102041, 47.95918367,
       48.36734694, 48.7755102 , 49.18367347, 49.59183673, 50.        ])

In [80]:
30.40816327-30

0.4081632699999993

In [82]:
 30.81632653 - 30.40816327

0.408163260000002

In [85]:
np.linspace(10,50, 5, retstep=True)

(array([10., 20., 30., 40., 50.]), 10.0)

- By default, 50 numbers will be generated
- `np.linspace` returns 1-D array and to get into other dimensions, we need to use array `reshaping`

### np.arange()

`np.arange(start, stop, step, …)`

where:

- start: The starting value of the sequence
- stop: The end value of the sequence
- **step: the spacing between the values**

**Start is included and stop is excluded**

In [90]:
np.arange(10,50, 10) # it doesn't include the stop --> 50 is excluded

array([10, 20, 30, 40])

In [91]:
np.arange(10,50.000001, 10) # stop is now slightly above 50 --> 50 is included

array([10., 20., 30., 40., 50.])

#### Print all the even numbers between `0` and `101` including 0 

In [92]:
np.arange(0,101,2)

array([  0,   2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24,
        26,  28,  30,  32,  34,  36,  38,  40,  42,  44,  46,  48,  50,
        52,  54,  56,  58,  60,  62,  64,  66,  68,  70,  72,  74,  76,
        78,  80,  82,  84,  86,  88,  90,  92,  94,  96,  98, 100])

### Generating time intervals for Time Series Data

- Suppose you are working for the weather department and you need to create time intervals for plotting or any sort of analysis
- Idea is to generate evenly spaced timestamps

In [94]:
import pandas as pd

In [95]:
start_time = pd.Timestamp("2023-11-08 00:00:00")
end_time = pd.Timestamp("2023-11-08 00:23:00")

timestamps = np.linspace(start_time.value, end_time.value, 24)

In [97]:
timestamps.shape

(24,)

#### Convert timestamps back to Pandas as Timestamps

In [98]:
time_series = pd.to_datetime(timestamps)

In [99]:
time_series

DatetimeIndex(['2023-11-08 00:00:00', '2023-11-08 00:01:00',
               '2023-11-08 00:02:00', '2023-11-08 00:03:00',
               '2023-11-08 00:04:00', '2023-11-08 00:05:00',
               '2023-11-08 00:06:00', '2023-11-08 00:07:00',
               '2023-11-08 00:08:00', '2023-11-08 00:09:00',
               '2023-11-08 00:10:00', '2023-11-08 00:11:00',
               '2023-11-08 00:12:00', '2023-11-08 00:13:00',
               '2023-11-08 00:14:00', '2023-11-08 00:15:00',
               '2023-11-08 00:16:00', '2023-11-08 00:17:00',
               '2023-11-08 00:18:00', '2023-11-08 00:19:00',
               '2023-11-08 00:20:00', '2023-11-08 00:21:00',
               '2023-11-08 00:22:00', '2023-11-08 00:23:00'],
              dtype='datetime64[ns]', freq=None)

## Array Manipulation

### `resize()` & `reshape()`

 `pro tip: interview question`

### resize()

- returns a new array with the specified shape
- If the new array is larger than the original array, the new array is going to be filled with the repeated copies of the original array

![image.png](attachment:image.png)

In [100]:
a = np.array([[1,2], [3,4]])
a

array([[1, 2],
       [3, 4]])

In [101]:
a.ndim

2

In [102]:
a.shape

(2, 2)

In [103]:
b = np.resize(a, (3,2))
b

array([[1, 2],
       [3, 4],
       [1, 2]])

In [104]:
np.resize(a, (3,3))

array([[1, 2, 3],
       [4, 1, 2],
       [3, 4, 1]])

In [105]:
np.resize(a, (3,3,3))

array([[[1, 2, 3],
        [4, 1, 2],
        [3, 4, 1]],

       [[2, 3, 4],
        [1, 2, 3],
        [4, 1, 2]],

       [[3, 4, 1],
        [2, 3, 4],
        [1, 2, 3]]])
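`np.resize` also works in the other direction. A short sketch showing both cases on the same 2x2 array: a larger target repeats the data, a smaller target keeps only the first elements (the names `bigger` and `smaller` are illustrative):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

bigger = np.resize(a, (2, 3))   # 6 slots: the 4 values, then repeat from the start
smaller = np.resize(a, (1, 3))  # 3 slots: only the first 3 values are kept

print(bigger.tolist())   # [[1, 2, 3], [4, 1, 2]]
print(smaller.tolist())  # [[1, 2, 3]]
```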

### Combining `linspace` and `resize` to get arrays of your choice

#### Getting a `3D array` of 60 values between 0 and 100 with shape (3,4,5)

In [107]:
np.resize(np.linspace(0,100,60), (3,4,5))

array([[[  0.        ,   1.69491525,   3.38983051,   5.08474576,
           6.77966102],
        [  8.47457627,  10.16949153,  11.86440678,  13.55932203,
          15.25423729],
        [ 16.94915254,  18.6440678 ,  20.33898305,  22.03389831,
          23.72881356],
        [ 25.42372881,  27.11864407,  28.81355932,  30.50847458,
          32.20338983]],

       [[ 33.89830508,  35.59322034,  37.28813559,  38.98305085,
          40.6779661 ],
        [ 42.37288136,  44.06779661,  45.76271186,  47.45762712,
          49.15254237],
        [ 50.84745763,  52.54237288,  54.23728814,  55.93220339,
          57.62711864],
        [ 59.3220339 ,  61.01694915,  62.71186441,  64.40677966,
          66.10169492]],

       [[ 67.79661017,  69.49152542,  71.18644068,  72.88135593,
          74.57627119],
        [ 76.27118644,  77.96610169,  79.66101695,  81.3559322 ,
          83.05084746],
        [ 84.74576271,  86.44067797,  88.13559322,  89.83050847,
          91.52542373],
        [ 93.2203

### reshape()

- is used to give a new shape to an array `without changing its data/elements`
- the new shape must be compatible with the original shape, i.e. it must hold the same total number of elements
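A brief sketch of the compatibility rule on a 27-element array, including the `-1` placeholder that lets NumPy work out one axis length by itself (the variable names are illustrative):

```python
import numpy as np

arr = np.arange(27)

two_d = arr.reshape(3, 9)         # 3 * 9 = 27 elements: allowed
inferred = arr.reshape(3, -1, 3)  # -1 is inferred as 27 / (3 * 3) = 3
print(two_d.shape, inferred.shape)  # (3, 9) (3, 3, 3)

# 3 * 2 * 4 = 24 elements does not match 27
try:
    arr.reshape(3, 2, 4)
except ValueError as err:
    print("ValueError:", err)
```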

In [109]:
arr_3d

array([[[10, 11, 12],
        [13, 14, 15],
        [16, 17, 18]],

       [[20, 21, 22],
        [23, 24, 25],
        [26, 27, 28]],

       [[30, 31, 32],
        [33, 34, 35],
        [36, 37, 38]]])

In [110]:
np.reshape(arr_3d, (9,3))

array([[10, 11, 12],
       [13, 14, 15],
       [16, 17, 18],
       [20, 21, 22],
       [23, 24, 25],
       [26, 27, 28],
       [30, 31, 32],
       [33, 34, 35],
       [36, 37, 38]])

In [111]:
np.reshape(arr_3d, (3,9,1))

array([[[10],
        [11],
        [12],
        [13],
        [14],
        [15],
        [16],
        [17],
        [18]],

       [[20],
        [21],
        [22],
        [23],
        [24],
        [25],
        [26],
        [27],
        [28]],

       [[30],
        [31],
        [32],
        [33],
        [34],
        [35],
        [36],
        [37],
        [38]]])

In [112]:
np.reshape(arr_3d, (3,2,4))

ValueError: cannot reshape array of size 27 into shape (3,2,4)

## STACKING

![image.png](attachment:image.png)

- Stacking in the context of array refers to the process of joining two or more arrays along a `specified axis`
- Arrays coming from different analyses can be joined to form a consolidated array
- In ML, training and test sets can be combined using `STACKING`

#### `np.stack`, `np.concatenate`, `np.vstack`, `np.hstack`

### 1-D STACKING

In [152]:
a = np.array([1,2,3])
b= np.array([12,14,16])

In [125]:
print(a)
print(b)

[1 2 3]
[12 14 16]


In [126]:
a.ndim, b.ndim

(1, 1)

#### Stacking one on the top of the other (axis 0)

In [127]:
ab_stacked_axis_0 = np.stack((a,b), axis=0)

In [128]:
ab_stacked_axis_0

array([[ 1,  2,  3],
       [12, 14, 16]])

In [129]:
ab_stacked_axis_0.ndim

2

In [132]:
ab_stacked_axis_0.shape

(2, 3)

**Observation: `np.stack` always increases the number of dimensions by one**

`all input arrays must have the same shape`

### Stacking side by side (axis 1)

In [130]:
ab_stacked_axis1 = np.stack((a,b), axis = 1)

In [131]:
ab_stacked_axis1

array([[ 1, 12],
       [ 2, 14],
       [ 3, 16]])

In [133]:
ab_stacked_axis1.ndim

2

In [134]:
ab_stacked_axis1.shape

(3, 2)

## 2-D Stacking

In [136]:
arr_2d1 = np.array([[1,5,7], [2,6,8]])
arr_2d2 = np.array([[10,50,70], [20,60,80]])

In [137]:
print(arr_2d1)

[[1 5 7]
 [2 6 8]]


In [145]:
arr_2d1.ndim

2

In [138]:
print(arr_2d2)

[[10 50 70]
 [20 60 80]]


In [141]:
ab_stacked_2d_axis_0 = np.stack((arr_2d1, arr_2d2), axis=0)

In [142]:
ab_stacked_2d_axis_0.ndim

3

In [143]:
ab_stacked_2d_axis_0

array([[[ 1,  5,  7],
        [ 2,  6,  8]],

       [[10, 50, 70],
        [20, 60, 80]]])

In [144]:
ab_stacked_2d_axis_0.shape

(2, 2, 3)

In [146]:
ab_stacked_2d_axis_1 = np.stack((arr_2d1, arr_2d2), axis=1)

In [147]:
ab_stacked_2d_axis_1.ndim

3

In [148]:
ab_stacked_2d_axis_1.shape

(2, 2, 3)

In [149]:
ab_stacked_2d_axis_1

array([[[ 1,  5,  7],
        [10, 50, 70]],

       [[ 2,  6,  8],
        [20, 60, 80]]])

## Concatenation 

In [153]:
a

array([1, 2, 3])

In [154]:
b

array([12, 14, 16])

In [161]:
a.ndim

1

In [155]:
c = np.concatenate((a,b), axis=0)

In [156]:
c

array([ 1,  2,  3, 12, 14, 16])

In [165]:
c.ndim

1

In [158]:
list1 = [1,2,3]
list2 = [4,5,6]
list1+list2 # + concatenation

[1, 2, 3, 4, 5, 6]

In [160]:
d = np.concatenate((a,b), axis=1)

AxisError: axis 1 is out of bounds for array of dimension 1

In [162]:
arr_2d1

array([[1, 5, 7],
       [2, 6, 8]])

In [163]:
arr_2d2

array([[10, 50, 70],
       [20, 60, 80]])

In [164]:
np.concatenate((arr_2d1, arr_2d2), axis=0)

array([[ 1,  5,  7],
       [ 2,  6,  8],
       [10, 50, 70],
       [20, 60, 80]])

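- Concatenation joins arrays along an existing axis, so the result has the same number of dimensions as the inputs
- This is why `axis=1` raised an `AxisError` above: 1-D arrays only have axis 0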
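
For comparison, a short sketch (arrays redefined so it runs on its own) that concatenates the 2-D arrays along `axis=1` and contrasts the number of dimensions with `np.stack`:

```python
import numpy as np

arr_2d1 = np.array([[1, 5, 7], [2, 6, 8]])
arr_2d2 = np.array([[10, 50, 70], [20, 60, 80]])

# axis=1 exists for 2-D arrays, so the columns are joined
joined_cols = np.concatenate((arr_2d1, arr_2d2), axis=1)
print(joined_cols)
print(joined_cols.shape)                 # (2, 6)

# concatenate keeps 2 dimensions, stack adds a third
stacked = np.stack((arr_2d1, arr_2d2), axis=0)
print(joined_cols.ndim, stacked.ndim)    # 2 3
```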

## hstack vs vstack

#### hstack

![image.png](attachment:image.png)

####  NumPy `hstack` function takes a sequence of arrays with the `same number of rows` and joins them horizontally (column-wise)

In [167]:
a1 = np.array([[1,2], [3,4]])

print(a1)

print(a1.shape)

[[1 2]
 [3 4]]
(2, 2)


In [168]:
b1 = np.array([[10,20,30], [40,50,60]])

print(b1)

print(b1.shape)

[[10 20 30]
 [40 50 60]]
(2, 3)


#### Observation: a1 and b1 have the same number of rows

In [169]:
np.hstack((a1,b1)) #2D

array([[ 1,  2, 10, 20, 30],
       [ 3,  4, 40, 50, 60]])
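
As a rough check, for 2-D inputs `np.hstack` gives the same result as `np.concatenate` with `axis=1`. For 1-D inputs it simply joins them end to end. The variable names `h`, `c`, and `flat` below are illustrative only:

```python
import numpy as np

a1 = np.array([[1, 2], [3, 4]])
b1 = np.array([[10, 20, 30], [40, 50, 60]])

# 2-D case: hstack behaves like concatenate along axis 1
h = np.hstack((a1, b1))
c = np.concatenate((a1, b1), axis=1)
print(np.array_equal(h, c))    # True

# 1-D case: the arrays are joined end to end
flat = np.hstack((np.array([1, 2, 3]), np.array([12, 14, 16])))
print(flat)                    # [ 1  2  3 12 14 16]
```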

#### vstack

In [170]:
c1 = np.array([[10,20,30],[40,50,60], [89,99,101]])

In [171]:
c1

array([[ 10,  20,  30],
       [ 40,  50,  60],
       [ 89,  99, 101]])

In [172]:
c1.shape

(3, 3)

In [173]:
np.hstack((a1,c1))

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 2 and the array at index 1 has size 3

![image.png](attachment:image.png)

- NumPy `vstack` function takes a sequence of arrays with the `same number of columns` and joins them vertically (row-wise)

In [174]:
np.vstack((b1,c1))

array([[ 10,  20,  30],
       [ 40,  50,  60],
       [ 10,  20,  30],
       [ 40,  50,  60],
       [ 89,  99, 101]])
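
A small sketch of two related facts, with the arrays redefined so the cell is self-contained: for 2-D inputs `np.vstack` matches `np.concatenate` with `axis=0`, and 1-D inputs are treated as single rows:

```python
import numpy as np

b1 = np.array([[10, 20, 30], [40, 50, 60]])
c1 = np.array([[10, 20, 30], [40, 50, 60], [89, 99, 101]])

# 2-D case: vstack behaves like concatenate along axis 0
same = np.array_equal(np.vstack((b1, c1)), np.concatenate((b1, c1), axis=0))
print(same)         # True

# 1-D case: each array becomes one row of the result
a = np.array([1, 2, 3])
b = np.array([12, 14, 16])
rows = np.vstack((a, b))
print(rows.shape)   # (2, 3)
```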

In [175]:
np.vstack((a1,c1))

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 2 and the array at index 1 has size 3

In [176]:
a1.shape

(2, 2)

In [177]:
c1.shape

(3, 3)

In [178]:
b1.shape

(2, 3)

## Splitting the arrays

### vsplit

![image.png](attachment:image.png)

In [179]:
arr1 = np.reshape(np.linspace(10,100,8),(4,2))

In [180]:
arr1

array([[ 10.        ,  22.85714286],
       [ 35.71428571,  48.57142857],
       [ 61.42857143,  74.28571429],
       [ 87.14285714, 100.        ]])

In [182]:
k,l = np.vsplit(arr1,2)

In [183]:
k

array([[10.        , 22.85714286],
       [35.71428571, 48.57142857]])

In [184]:
l

array([[ 61.42857143,  74.28571429],
       [ 87.14285714, 100.        ]])

In [185]:
np.vsplit(arr1,4)

[array([[10.        , 22.85714286]]),
 array([[35.71428571, 48.57142857]]),
 array([[61.42857143, 74.28571429]]),
 array([[ 87.14285714, 100.        ]])]

In [186]:
x,y,z,v = np.vsplit(arr1,4)

In [187]:
x

array([[10.        , 22.85714286]])

In [188]:
y

array([[35.71428571, 48.57142857]])

In [189]:
z

array([[61.42857143, 74.28571429]])

In [190]:
v

array([[ 87.14285714, 100.        ]])

In [191]:
np.vsplit(arr1,5)

ValueError: array split does not result in an equal division
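
When an equal division is not possible, `np.array_split` can be used instead. It accepts a section count that does not divide the axis evenly and makes the first pieces one row larger. A minimal sketch on the same 4x2 array (the name `parts` is illustrative):

```python
import numpy as np

arr1 = np.reshape(np.linspace(10, 100, 8), (4, 2))

# 4 rows into 3 pieces: the first piece gets the extra row
parts = np.array_split(arr1, 3, axis=0)
for part in parts:
    print(part.shape)
# (2, 2)
# (1, 2)
# (1, 2)
```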

### hsplit

In [192]:
arr2 = np.reshape(np.linspace(10,100,16), (4,4))

In [193]:
arr2

array([[ 10.,  16.,  22.,  28.],
       [ 34.,  40.,  46.,  52.],
       [ 58.,  64.,  70.,  76.],
       [ 82.,  88.,  94., 100.]])

In [194]:
arr2.shape

(4, 4)

In [195]:
np.hsplit(arr2,2)

[array([[10., 16.],
        [34., 40.],
        [58., 64.],
        [82., 88.]]),
 array([[ 22.,  28.],
        [ 46.,  52.],
        [ 70.,  76.],
        [ 94., 100.]])]

In [196]:
k,l = np.hsplit(arr2,2)

In [197]:
k

array([[10., 16.],
       [34., 40.],
       [58., 64.],
       [82., 88.]])

In [198]:
l

array([[ 22.,  28.],
       [ 46.,  52.],
       [ 70.,  76.],
       [ 94., 100.]])
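
Besides a number of equal sections, `np.hsplit` also accepts a list of column indices at which to cut. The sketch below (names `left`, `middle`, and `right` are illustrative) splits the same 4x4 array before columns 1 and 3, then rebuilds it with `np.hstack`:

```python
import numpy as np

arr2 = np.reshape(np.linspace(10, 100, 16), (4, 4))

# Cut before column 1 and before column 3
left, middle, right = np.hsplit(arr2, [1, 3])
print(left.shape, middle.shape, right.shape)   # (4, 1) (4, 2) (4, 1)

# Joining the pieces horizontally gives back the original array
rebuilt = np.hstack((left, middle, right))
print(np.array_equal(rebuilt, arr2))           # True
```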

## Broadcasting

In [199]:
arr1 = np.array([1,2,3])

In [200]:
arr1*2

array([2, 4, 6])

![image.png](attachment:image.png)

- Broadcasting is a NumPy feature that allows arithmetic operations on arrays of different shapes
- It makes element-wise operations possible without explicit loops
- Shapes are compared starting from the last (rightmost) dimension; two dimensions are compatible when they are equal or when one of them is 1
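
A minimal sketch of these rules, using made-up arrays `col` and `row`: a `(3, 1)` array and a `(3,)` array broadcast to `(3, 3)`, while shapes `(3,)` and `(2,)` are incompatible and raise a `ValueError`:

```python
import numpy as np

col = np.array([[1], [2], [3]])   # shape (3, 1)
row = np.array([10, 20, 30])      # shape (3,)

# (3, 1) and (3,) are stretched to a common shape (3, 3)
total = col + row
print(total)
# [[11 21 31]
#  [12 22 32]
#  [13 23 33]]

# Trailing dimensions 3 and 2 differ and neither is 1
try:
    np.array([1, 2, 3]) + np.array([1, 2])
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("ValueError:", err)
```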

![image.png](attachment:image.png)

In [201]:
import os

In [202]:
os.getcwd()

'C:\\Users\\think\\OneDrive - Thinking Mojo\\TSLC\\Intellipaat\\Session Master\\06.Data Science Weekday Batch - 11Oct'

In [204]:
os.listdir()

['.ipynb_checkpoints',
 'Flow_Control_Statements_20_Oct_APC.pdf',
 'heart.csv',
 'Introduction_to_Course_APC.pdf',
 'Intro_to_Data_Manipulation_31Oct_APC.pdf',
 'Intro_to_Data_Manipulation_using_NumPy_01Nov_APC.pdf',
 'Intro_to_Python_11Oct_APC.pdf',
 'M01-Basic Python-Session-3-13Oct-APC.ipynb',
 'M01-Basic Python-Session-4-17Oct-APC.ipynb',
 'M01-Basic Python-Session-4-18Oct-APC.ipynb',
 'M01-Basic Python-Session-5-19Oct-APC.ipynb',
 'M01-Basic Python-Session-6-20Oct-APC.ipynb',
 'M01-Flow Control Statements-Session-10-27Oct-APC.ipynb',
 'M01-Flow Control Statements-Session-6-20Oct-APC.ipynb',
 'M01-Flow Control Statements-Session-7-8-24-25Oct-APC.ipynb',
 'M01-Flow Control Statements-Session-9-26Oct-APC.ipynb',
 'M02-Data_Manipulation_NumPy-Session-12-01Nov-APC.ipynb',
 'M02-Data_Manipulation_NumPy-Session-13-02Nov-APC.ipynb',
 'M02-Data_Manipulation_NumPy-Session-14-03Nov-APC.ipynb',
 'M02-Data_Manipulation_NumPy-Session-15-07Nov-APC.ipynb',
 'M02-Data_Manipulation_NumPy-Session-16