# NumPy
Numerical Python (NumPy) is a popular Python library used for numerical computations and handling arrays or matrices.
It provides support for multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.

#### Why do we use NumPy?

* `Efficient Array Operations`: NumPy allows for fast operations on arrays/matrix computations, making it essential for scientific and numerical calculations.
* `Data Representation`: It provides a convenient and efficient way of representing and manipulating numerical data.
* `Integration with Other Libraries`: It integrates well with other Python libraries and tools used in data analysis, machine learning, and scientific computing.

#### Advantages of NumPy:

* `Performance`: It's highly optimized and written in C, making operations significantly faster than standard Python lists for numerical computations.
* `Broadcasting`: Allows operations on arrays of different shapes, which makes code concise and easier to read.
* `Array-Oriented Computing`: Provides a wide range of mathematical functions for fast operations on entire arrays without the need for writing loops.
* `Memory Efficiency`: NumPy arrays use less memory compared to Python lists for storing data.
* `Broad Usage`: Widely used in fields like data science, machine learning, scientific research, and engineering due to its speed and functionality.
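
As a rough illustration of the broadcasting and array-oriented points above, the sketch below adds a 1-D row and a 2-D column to a small matrix without writing any loops. The variable names (`matrix`, `row`, `col`) are made up for this example.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)      # [[0 1 2], [3 4 5]]
row = np.array([10, 20, 30])             # shape (3,)
col = np.array([[100], [200]])           # shape (2, 1)

# The row is stretched across both rows of the matrix.
broadcast_row = matrix + row             # [[10 21 32], [13 24 35]]

# The column is stretched across all three columns of the matrix.
broadcast_col = matrix + col             # [[100 101 102], [203 204 205]]

print(broadcast_row)
print(broadcast_col)
```

Broadcasting only works when the trailing dimensions are equal or one of them is 1; otherwise NumPy raises a `ValueError`.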


In [84]:
import numpy as np
from nptyping import NDArray, Shape, UInt32
from typing import Any
from typing import Dict
import pandas as pd
import pandera as pa

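Arrays enable you to perform mathematical operations on whole blocks of data without writing loops. Arithmetic such as `data + 5` is applied element-wise to an ndarray, whereas a Python list does not support it: `some_list + 5` raises a `TypeError`, and `some_list * 2` repeats the list instead of doubling each element.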

In [4]:

data : np.ndarray = np.array([[1.5, 0.1, 3],
                            [0, -3, 6.5]])

display(data);
display(data+5);
display(data*2)

array([[ 1.5,  0.1,  3. ],
       [ 0. , -3. ,  6.5]])

array([[ 6.5,  5.1,  8. ],
       [ 5. ,  2. , 11.5]])

array([[ 3. ,  0.2,  6. ],
       [ 0. , -6. , 13. ]])

Every array has a `shape`, a tuple indicating the size of each dimension, along with related attributes such as `dtype`, `ndim`, `size`, and `itemsize`.

In [2]:
import numpy as np

a : np.ndarray = np.array(1000) # a 0-dimensional array holding a single scalar

print(a) # prints the stored value
print(a.shape) # prints the shape tuple; () means the array has no dimensions
print(a.dtype) # prints the element data type (the default integer type depends on the platform)
print(a.ndim) # prints the number of dimensions; a scalar has 0
print(a.size) # prints the total number of elements
print(a.itemsize) # prints the size of one element in bytes


1000
()
int32
0
1
4


A list can be converted into an array by passing it to `np.array`. Nested lists of equal length are converted into a multi-dimensional array.

In [3]:
data1 : list[float] = [6, 7.5, 8, 0, 1];
arr1 : np.ndarray = np.array(data1);
display(arr1)
print(f"Shape of arr1: {arr1.shape}")
print(f"Dimension of arr1: {arr1.ndim}")

data2 : list[list[int]] = [[1, 2, 3],
                            [4, 5, 6]]
arr2 : np.ndarray = np.array(data2)
display(arr2)
print(f"Shape of arr2: {arr2.shape}")
print(f"Dimension of arr2: {arr2.ndim}")

array([6. , 7.5, 8. , 0. , 1. ])

Shape of arr1: (5,)
Dimension of arr1: 1


array([[1, 2, 3],
       [4, 5, 6]])

Shape of arr2: (2, 3)
Dimension of arr2: 2


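There are other functions for creating new arrays:
* `np.zeros`
* `np.zeros_like`
    * `np.zeros` creates an array of zeros with a specified shape and data type, whereas `np.zeros_like` creates an array of zeros with the same shape and data type as the array passed in.
* `np.ones`
* `np.ones_like`
    * These mirror `np.zeros` and `np.zeros_like`, but fill the array with ones.
* `np.empty` (allocates an array without initializing its values)
* `np.arange` (array version of the built-in `range`)
* `np.asarray` (converts the input to an ndarray, without copying if it already is one)
* `np.full` (array of a given shape filled with a given value)
* `np.eye` (identity matrix: ones on the diagonal, zeros elsewhere)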


In [4]:
display(np.zeros(10)) #creates a 1-dimensional array of 10 zeros
display(np.zeros((2,3))) #creates a 2-dimensional array (2 rows, 3 columns)
display(np.zeros_like([[1, 2, 3], [4, 5, 6]])) #zeros with the same shape and dtype as the input

display(np.ones(10)) #creates a 1-dimensional array of 10 ones
display(np.ones((2,3))) #creates a 2-dimensional array (2 rows, 3 columns)

display(np.empty((2, 3, 2))) #allocates without initializing: the contents are arbitrary (garbage) values, not guaranteed zeros


array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

array([[0., 0., 0.],
       [0., 0., 0.]])

array([[0, 0, 0],
       [0, 0, 0]])

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

array([[1., 1., 1.],
       [1., 1., 1.]])

array([[[9.24862259e-312, 3.16202013e-322],
        [0.00000000e+000, 0.00000000e+000],
        [1.16709769e-312, 2.61786604e+180]],

       [[8.45220061e+169, 3.12164635e-033],
        [3.85601652e-057, 1.24663196e-047],
        [9.83874103e-072, 6.52891886e-042]]])
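
The creation functions from the list that the cell above does not exercise can be sketched the same way. This is a minimal illustration with made-up variable names, not an exhaustive tour of their parameters.

```python
import numpy as np

evens = np.arange(0, 10, 2)        # start, stop (exclusive), step -> [0 2 4 6 8]
filled = np.full((2, 3), 7)        # 2x3 array where every element is 7
identity = np.eye(3)               # 3x3 identity matrix (float64)

source = np.array([1, 2, 3])
same = np.asarray(source)          # input is already an ndarray, so no copy is made

print(evens)
print(filled)
print(identity)
print(same is source)
```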

We can explicitly convert (cast) an array from one data type to another using the ndarray's `astype` method. That said, I don't think explicitly defining the type everywhere is good practice.

In [5]:
arr : np.ndarray = np.array([1, 2, 3, 4, 5]);
print(f"Type of arr: {arr.dtype}")
print(f"Memory id of arr: {id(arr)}")
display(arr)

arr = arr.astype(np.float64) #by default, astype creates a new array (a copy of the data)
                            #even if the new data type is the same as the old data type
                            
print(f"Type of arr: {arr.dtype}")
print(f"Memory id of arr: {id(arr)}") #the id differs because arr now refers to a new array object
display(arr)



Type of arr: int32
Memory id of arr: 1872331627888


array([1, 2, 3, 4, 5])

Type of arr: float64
Memory id of arr: 1872331628368


array([1., 2., 3., 4., 5.])
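
Two further `astype` behaviours are worth knowing; the sketch below uses made-up sample values. Casting floats to integers drops the fractional part, and an array of numeric strings can be parsed into floats.

```python
import numpy as np

floats = np.array([3.7, -1.2, 0.5])
ints = floats.astype(np.int32)              # fractional part is truncated: [3, -1, 0]

numeric_strings = np.array(["1.25", "-9.6", "42"])
parsed = numeric_strings.astype(np.float64) # strings parsed as floats

print(ints)
print(parsed)
```

If a string cannot be parsed as a number, `astype` raises a `ValueError`.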

There are multiple ways to index a NumPy array. One-dimensional arrays are the simplest case and behave much like Python lists.

In [6]:
arr : np.ndarray = np.arange(10)
display(arr);
print(f"Element at index 5 (the 6th element) is {arr[5]}")
print(f"First 3 elements of the array {arr[:3]}")
print(f"Memory id of array : {id(arr)}")

arr[5:8] = 12 #the scalar is assigned to every element in the slice
display(arr)
print(f"Memory id of array after values changed : {id(arr)}") #changing values in place doesn't create a new array
                                                            #(same object, same id)


array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Element at index 5 (the 6th element) is 5
First 3 elements of the array [0 1 2]
Memory id of array : 1872331629232


array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

Memory id of array after values changed : 1872331629232
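
Related to the in-place assignment above: a basic slice of an ndarray is a view on the same data rather than a copy, so writing through the slice also changes the original. The sketch below shows this and how `.copy()` avoids it; the names `view` and `independent` are only illustrative.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]          # a view: shares memory with arr
view[:] = 64             # writes through to arr

independent = arr[5:8].copy()   # an explicit copy with its own memory
independent[:] = 0              # arr is not affected

print(arr)                                  # [ 0  1  2  3  4 64 64 64  8  9]
print(np.shares_memory(arr, view))          # True
print(np.shares_memory(arr, independent))   # False
```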


In a two-dimensional array, the elements at each index are no longer scalars but one-dimensional arrays.

In [7]:
arr2d : np.ndarray = np.array([[1, 2, 3],
                            [4, 5, 6]])
print(f"Values at second row: {arr2d[1,:]}") #a 1-D array instead of a scalar
print(f"Values at second column: {arr2d[:,1]}") #index 1 is the second column
print(f"Element at row 2 and column 2: {arr2d[1][1]}") #row and column indices both start from zero
print(f"Memory id of array : {id(arr2d)}")

arr2d[1] = [10, 11, 12]
print(f"Array after value changed {arr2d}")
print(f"Memory id of array after value changed: {id(arr2d)}") #memory address remains the same

arr2d[1][0] = 30
print(f"Array after value changed {arr2d}")
print(f"Memory id of array after value changed: {id(arr2d)}") #memory address remains the same

Values at second row: [4 5 6]
Values at second column: [2 5]
Element at row 2 and column 2: 5
Memory id of array : 1872331629328
Array after value changed [[ 1  2  3]
 [10 11 12]]
Memory id of array after value changed: 1872331629328
Array after value changed [[ 1  2  3]
 [30 11 12]]
Memory id of array after value changed: 1872331629328


![Image description](array-index-slicing.png)

#### Boolean Indexing
Boolean indexing refers to selecting elements from a NumPy array based on Boolean conditions. The Boolean array must be of the same length as the array axis it’s indexing.

In [8]:
names : np.ndarray = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
data : np.ndarray = np.array([[4, 7], [0, 2], [-5, 6], [0, 0], [1, 2],[-12, -4], [3, 4]])

display(names == 'Will') #returns True at every index where `Will` is present
display(data[names == 'Will']) #returns the rows of `data` at the indexes where the name is 'Will',
                        #i.e. the rows where the condition names == 'Will' is True.
                        
display(~(names == "Bob")) #selects everything except Bob: False appears at the indexes where Bob is present

mask = (names == 'Bob') | (names == 'Will') # the keywords `and` and `or` do not work with boolean arrays; use & or |
                                        # returns True where Bob or Will is present
display(mask)

display(data)
print(f"Memory address of the data array {id(data)}")
data[data < 0] = 0 #replace all the negative values with 0

display("Array after value replaced",data)
print(f"Memory address of the data array after value changed {id(data)}")
                                                #memory address remains the same

array([False, False,  True, False,  True, False, False])

array([[-5,  6],
       [ 1,  2]])

array([False,  True,  True, False,  True,  True,  True])

array([ True, False,  True,  True,  True, False, False])

array([[  4,   7],
       [  0,   2],
       [ -5,   6],
       [  0,   0],
       [  1,   2],
       [-12,  -4],
       [  3,   4]])

Memory address of the data array 1872331631248


'Array after value replaced'

array([[4, 7],
       [0, 2],
       [0, 6],
       [0, 0],
       [1, 2],
       [0, 0],
       [3, 4]])

Memory address of the data array after value changed 1872331631248
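
To make the point about `&` and `|` concrete, here is a small sketch that reuses the same `names` and `data` values and combines two conditions. The flag `keyword_failed` is a hypothetical helper used only to record that the `and` keyword raises an error.

```python
import numpy as np

names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
data = np.array([[4, 7], [0, 2], [-5, 6], [0, 0], [1, 2], [-12, -4], [3, 4]])

# Combine conditions element-wise with & (and), | (or), ~ (not).
# Each comparison needs its own parentheses.
mask = (names != "Joe") & (data[:, 0] > 0)
selected = data[mask]
print(selected)   # rows 0 and 4: [[4 7], [1 2]]

# The keyword `and` tries to reduce each array to a single bool,
# which is ambiguous for arrays with more than one element.
try:
    (names == "Bob") and (names == "Will")
    keyword_failed = False
except ValueError:
    keyword_failed = True
print(keyword_failed)
```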


`Fancy indexing` in NumPy refers to indexing an array with lists or arrays of integer indices. Unlike slicing, it always copies the selected data into a new array.

In [13]:
arr : np.ndarray = np.array([1, 2, 3, 4, 5]);
indices : list[int] = [1, 3] #list of indices
result : np.ndarray = arr[indices] 
display(result) #returns elements at index 1 and index 3

array([2, 4])

In [15]:
arr : np.ndarray = np.zeros((8, 4));
for i in range(8):
    arr[i] = i
display(arr);

arr[[4, 3, 0, 6]] #passing a list of row indices selects those rows in the given order
                    #returns the rows at indices 4, 3, 0 and 6

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

Passing multiple index arrays works differently: it selects a one-dimensional array of elements, one for each tuple of indices.

In [17]:
arr : np.ndarray = np.arange(32).reshape((8,4))
display(arr)

arr[[1, 5, 7, 2], [0, 3, 1, 2]] #here the elements (1, 0), (5, 3), (7, 1), and  (2, 2) were selected 
                                #(row, column)

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

array([ 4, 23, 29, 10])
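
If the goal is instead the rectangular block formed by those rows and columns, one option is to index in two steps, and another is `np.ix_`. The sketch below assumes the same 8x4 `arr` as above; `block` and `block_ix` are illustrative names.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

# Step 1 picks the rows, step 2 reorders the columns of that result.
block = arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]

# np.ix_ builds an open mesh so the same block is selected in one step.
block_ix = arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]

print(block)
# [[ 4  7  5  6]
#  [20 23 21 22]
#  [28 31 29 30]
#  [ 8 11  9 10]]
```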

The transpose of an array can be obtained using the `arr.T` attribute.

In [21]:
display("Array: ",arr);
display("Transpose of an array: ",arr.T)

'Array: '

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

'Transpose of an array: '

array([[ 0,  4,  8, 12, 16, 20, 24, 28],
       [ 1,  5,  9, 13, 17, 21, 25, 29],
       [ 2,  6, 10, 14, 18, 22, 26, 30],
       [ 3,  7, 11, 15, 19, 23, 27, 31]])

##### Universal Functions
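A universal function (ufunc) is a function that performs element-wise operations on the data in ndarrays. Commonly used ufuncs include:
* `abs`, `fabs` (compute the absolute value element-wise for integer or floating-point values)
* `sqrt` (square root of each element)
* `square` (square of each element)
* `exp` (exponential e^x of each element)
* `log`, `log10` (natural logarithm and base-10 logarithm)
* `sign` (sign of each element: 1, 0, or -1)
* `ceil` (smallest integer greater than or equal to each element)
* `floor` (largest integer less than or equal to each element)
* `rint` (round each element to the nearest integer)
* `cos`, `cosh`, `sin`, `sinh`, `tan`, `tanh` (trigonometric and hyperbolic functions)
* `logical_not` (element-wise truth value of `not x`)

and more...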

In [26]:
arr : np.ndarray = np.arange(10);
display(arr);
display("Square of array elements: ",np.square(arr)); #squares each element
display("Square root of array elements: ",np.sqrt(arr)); #square root of each element
display("Exponentional of array: ",np.exp(arr)) #np.exp([1, 2, 3]) = (e^1, e^2, e^3), where e is Euler's number


array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

'Square of array elements: '

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

'Square root of array elements: '

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

'Exponentional of array: '

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

In [31]:
print("sin(90): ",np.sin(90)) #np.sin expects radians, so this is the sine of 90 radians, not 90 degrees
print("Sign of -9 ", np.sign(-9)) #-1 indicates negative, 0 indicates zero, 1 indicates positive
print(np.ceil(9.4)) #rounds up to the smallest integer greater than or equal to 9.4
print(np.floor(9.4)) #rounds down to the largest integer less than or equal to 9.4

sin(90):  0.8939966636005579
Sign of -9  -1
10.0
9.0
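
Because NumPy's trigonometric functions take radians, the value printed for `sin(90)` above is the sine of 90 radians. A minimal sketch of converting degrees first, plus `rint` from the list of ufuncs (sample values are made up):

```python
import numpy as np

# Convert degrees to radians before calling np.sin.
right_angle = np.sin(np.deg2rad(90))
print(right_angle)

# rint rounds each element to the nearest integer but keeps the float dtype.
rounded = np.rint(np.array([1.4, 2.6, -0.7]))
print(rounded)   # [ 1.  3. -1.]
```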


`numpy.where()` has two forms. Called with only a condition, `numpy.where(condition)` returns the indices of the elements that satisfy it.
Called as `numpy.where(condition, x, y)`, it returns an array that takes elements from `x` where the condition is true and from `y` elsewhere; `x` and `y` must be given together.

In [35]:
arr : np.ndarray = np.array([1, 6 , 9, 2, 11, 3, 7, 8]);
display(arr);
display("Array with elements greater than 5 ", arr[np.where(arr > 5)])

array([ 1,  6,  9,  2, 11,  3,  7,  8])

'Array with elements greater than 5 '

array([ 6,  9, 11,  7,  8])
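
The two ways of calling `np.where` can be sketched side by side on the same array. The names `indices` and `capped` are illustrative only.

```python
import numpy as np

arr = np.array([1, 6, 9, 2, 11, 3, 7, 8])

# One argument: a tuple of index arrays, one per dimension.
indices = np.where(arr > 5)[0]
print(indices)            # [1 2 4 6 7]

# Three arguments: pick 5 where the condition holds, the original value elsewhere.
capped = np.where(arr > 5, 5, arr)
print(capped)             # [1 5 5 2 5 3 5 5]
```

For simply filtering values, `arr[arr > 5]` gives the same result as `arr[np.where(arr > 5)]` with less code.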

##### Mathematical and Statistical Methods

In [51]:
arr = np.random.standard_normal((5, 4));
display(arr)
display("Sorted array: ",np.sort(arr)) #sorts along the last axis by default, i.e. within each row
display("Sorted array (rows): ",np.sort(arr, axis=1)) #axis=1 sorts within each row, same as the default for a 2-D array
display("Mean of the array: ", arr.mean())
display("Sum of the array: ", arr.sum())
display("Sum of array columns: ", arr.sum(axis=0))
display("Sum of array rows: ", arr.sum(axis=1))


array([[ 0.97856959, -0.1863732 , -0.04830702,  1.54064901],
       [-2.22634874, -0.64483096, -1.16678043,  0.04913518],
       [ 0.20621451, -0.33306974, -1.06064628,  0.3018838 ],
       [ 0.51458078, -0.05559129, -1.30790728, -1.01414844],
       [ 0.63797845,  0.37786721, -0.53643637, -1.24180958]])

'Sorted array: '

array([[-0.1863732 , -0.04830702,  0.97856959,  1.54064901],
       [-2.22634874, -1.16678043, -0.64483096,  0.04913518],
       [-1.06064628, -0.33306974,  0.20621451,  0.3018838 ],
       [-1.30790728, -1.01414844, -0.05559129,  0.51458078],
       [-1.24180958, -0.53643637,  0.37786721,  0.63797845]])

'Sorted array (rows): '

array([[-0.1863732 , -0.04830702,  0.97856959,  1.54064901],
       [-2.22634874, -1.16678043, -0.64483096,  0.04913518],
       [-1.06064628, -0.33306974,  0.20621451,  0.3018838 ],
       [-1.30790728, -1.01414844, -0.05559129,  0.51458078],
       [-1.24180958, -0.53643637,  0.37786721,  0.63797845]])

'Mean of the array: '

-0.26076854058172955

'Sum of the array: '

-5.215370811634591

'Sum of array columns: '

array([ 0.11099458, -0.84199799, -4.12007737, -0.36429004])

'Sum of array rows: '

array([ 2.28453838, -3.98882495, -0.88561771, -1.86306623, -0.76240029])
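
Since the random values above make the effect of `axis` hard to follow by eye, here is the same idea on a small fixed array (values chosen only for illustration). By default `np.sort` sorts along the last axis, which for a 2-D array means within each row; `axis=0` sorts within each column.

```python
import numpy as np

arr = np.array([[3, 1, 2],
                [0, 5, 4]])

by_row = np.sort(arr)            # within each row:    [[1 2 3], [0 4 5]]
by_col = np.sort(arr, axis=0)    # within each column: [[0 1 2], [3 5 4]]

col_sum = arr.sum(axis=0)        # collapses the rows:    [3 6 6]
row_sum = arr.sum(axis=1)        # collapses the columns: [6 9]

print(by_row, by_col, col_sum, row_sum, sep="\n")
```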

In [56]:
names : np.ndarray = np.array(["Bob", "Will", "Joe", "Bob", "Will", "Joe", "Joe"])
display(np.unique(names))

ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
display(np.unique(ints))

array(['Bob', 'Joe', 'Will'], dtype='<U4')

array([1, 2, 3, 4])
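
`np.unique` can also report how often each value occurs through its `return_counts` parameter. A short sketch using the same `names` array:

```python
import numpy as np

names = np.array(["Bob", "Will", "Joe", "Bob", "Will", "Joe", "Joe"])

# values come back sorted; counts[i] is the number of occurrences of values[i]
values, counts = np.unique(names, return_counts=True)
print(values)   # ['Bob' 'Joe' 'Will']
print(counts)   # [2 3 2]
```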

#### Vector Type

In [8]:

a : np.ndarray = np.array([1,2,3,4]) # a 1-dimensional array (vector) holding [1,2,3,4]

display(f" object {a}") # shows the stored values
display(f"object shape {a.shape}") # shows the shape tuple; (4,) means one dimension of length 4
display(f" Object type {a.dtype}") # shows the element data type
display(f"Object type with global function {type(a)}") # shows the Python type of the object itself
display(f"Number of dimension {a.ndim}") # shows the number of dimensions
display(f"Total items in Array : {a.size}") # shows the total number of elements
display(f"{a.itemsize}") # shows the size of one element in bytes

a.size?

' object [1 2 3 4]'

'object shape (4,)'

' Object type int32'

"Object type with global function <class 'numpy.ndarray'>"

'Number of dimension 1'

'Total items in Array : 4'

'4'

Type:        int
String form: 4
Docstring:  
int([x]) -> integer
int(x, base=10) -> integer

Convert a number or string to an integer, or return 0 if no arguments
are given.  If x is a number, return x.__int__().  For floating point
numbers, this truncates towards zero.

If x is not a number or if base is given, then x must be a string,
bytes, or bytearray instance representing an integer literal in the
given base.  The literal can be preceded by '+' or '-' and be surrounded
by whitespace.  The base defaults to 10.  Valid bases are 0 and 2-36.
Base 0 means to interpret the base from the string as an integer literal.
>>> int('0b100', base=0)
4

#### Matrix

In [9]:
data = [[0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11]]

a : np.ndarray = np.array(data) # a 2-dimensional array (matrix) with 3 rows and 4 columns

display(f" object {a}") # shows the stored values
display(f"object shape {a.shape}") # shows the shape tuple; (3, 4) means 3 rows and 4 columns
display(f" Object type {a.dtype}") # shows the element data type
display(f"Object type with global function {type(a)}") # shows the Python type of the object itself
display(f"Number of dimension {a.ndim}") # shows the number of dimensions
display(f"Total items in Array : {a.size}") # shows the total number of elements
display(f"{a.itemsize}") # shows the size of one element in bytes

' object [[ 0  1  2  3]\n [ 4  5  6  7]\n [ 8  9 10 11]]'

'object shape (3, 4)'

' Object type int32'

"Object type with global function <class 'numpy.ndarray'>"

'Number of dimension 2'

'Total items in Array : 12'

'4'

##### NumPy with NDArray typing support

In [58]:
%%time 
from nptyping import NDArray, Shape, UInt32
from typing import Any

data : NDArray[Shape["19"],Any] = np.arange(1,20); # np.arange(1, 20) yields 19 elements (1 to 19)

d1 : list[int] = [1,2,3,4,5,6,7,8,9,10]
#d1 + 5 #raises TypeError: a list cannot be added to an int

print(data);
print(data + 5)
print(data ** 2)

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
[ 6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
[  1   4   9  16  25  36  49  64  81 100 121 144 169 196 225 256 289 324
 361]
CPU times: total: 78.1 ms
Wall time: 1.48 s
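
To spell out the commented-out `d1 + 5` line: with a plain list the addition fails, so the usual workaround is a list comprehension, while the array version stays a single expression. This is a small sketch; `list_add_failed` is a hypothetical flag used only to record the error.

```python
import numpy as np

d1 = list(range(1, 11))

# Adding an int to a list is not defined and raises TypeError.
try:
    d1 + 5
    list_add_failed = False
except TypeError:
    list_add_failed = True

# List approach: an explicit Python-level loop over the elements.
shifted_list = [x + 5 for x in d1]

# NumPy approach: one vectorized expression.
shifted_array = np.array(d1) + 5

print(list_add_failed)
print(shifted_list)
print(shifted_array)
```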


In [59]:
print("List Method\n")
data1 : list[int] = list(range(1,21))
print(data1)
print(data1[5:11])
#data1[5:11] = 1000 #a list slice cannot be assigned a single scalar (raises TypeError)
print(data1)

print("\nNumPy Method \n")
ndata : NDArray[Shape["20"], Any] = np.arange(1,21)
print(ndata)
print(ndata[5:11])
ndata[5:11] = 1000 #the scalar is broadcast to every element of the slice
print(ndata)


List Method

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
[6, 7, 8, 9, 10, 11]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

NumPy Method 

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20]
[ 6  7  8  9 10 11]
[   1    2    3    4    5 1000 1000 1000 1000 1000 1000   12   13   14
   15   16   17   18   19   20]


In [3]:
from typing import Any
from nptyping import NDArray, Shape, Bool


state_bank : NDArray[Shape["4"], Any] = np.array([1, 7, 8, 10]) #declared shape matches the 4 elements
select : NDArray[Shape["4"], Bool] = np.array([True, False, False, True]) #boolean mask of the same length

ubl_bank : NDArray[Shape["20"], Any] = np.random.randint(1, 100, 20); #20 random integers from 1 to 99

display(state_bank)
display(state_bank[select])

display(ubl_bank)
display(ubl_bank[ubl_bank % 2 == 0])

array([ 1,  7,  8, 10])

array([ 1, 10])

array([54, 45, 65, 95, 91, 55, 50, 41,  4, 63, 61, 38,  5, 34, 78, 56, 53,
       85, 57, 73])

array([54, 50,  4, 38, 34, 78, 56])

In [21]:
x : NDArray[Shape["5"], Any] = np.array([1, 3, 4, 5, 7])
y : NDArray[Shape["5"], Any] = np.array([6, 3, 2, 5, 100])

display(x)
np.where(x > y, x, y)


array([1, 3, 4, 5, 7])

array([  6,   3,   4,   5, 100])

In [23]:
a : NDArray[Shape["Size, Size"], Any] = np.array([[1, 2, 3],
                                                [4, 5, 6]])
print(a)

a : NDArray[Shape["Size, Size"], Any] = np.array([[1, 2],
                                                [4, 5]])
print(a)

a : NDArray[Shape["Size, Size"], Any] = np.array([["A"],
                                                ["B"]])


[[1 2 3]
 [4 5 6]]
[[1 2]
 [4 5]]


#### Creating arrays of any dimension

In [15]:
a : NDArray[Shape["Size"],Any] = np.arange(1,5)
print(f"1D Array: {a}")

a : NDArray[Shape["Size, Size"],Any] = np.arange(3*3).reshape(3,3)
print(f"\n2D Array \n{a}")

a : NDArray[Shape["Size, Size, Size"],Any] = np.arange(2*3*3).reshape(2,3,3)
print(f"\n3D Array \n{a}")

1D Array: [1 2 3 4]

2D Array 
[[0 1 2]
 [3 4 5]
 [6 7 8]]

3D Array 
[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]]


In [16]:
np.zeros(10)


array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

## Pandas
Pandas is a popular open-source data manipulation and analysis library for Python. It provides powerful and easy-to-use data structures, such as Series (1-dimensional) and DataFrame (2-dimensional), which are designed to make working with structured data more intuitive.

#### <b>Key Components:</b>
`DataFrame`: <br/>
A 2-dimensional tabular data structure resembling a spreadsheet or SQL table. It consists of rows and columns, each of which can contain different data types (integers, strings, floats, etc.).
<br/>
`Series`: <br/>
A one-dimensional array-like object that can hold data of any type. A DataFrame consists of multiple Series objects.

#### <b>Key Use Cases:</b>
`Data Cleaning and Preparation`:<br/>
Pandas is extensively used for data cleaning tasks, such as handling missing data (NaN), removing duplicates, and transforming data into a more usable format.

`Data Manipulation`: <br/>
It allows for efficient manipulation of data, including selecting, filtering, sorting, aggregating, and merging data sets.

`Data Analysis`: <br/>
Pandas provides functionalities to perform various statistical and mathematical operations on datasets. Users can calculate descriptive statistics, apply functions to data, and generate summary reports easily.

`Time Series Analysis`: <br/>
Pandas offers powerful tools for working with time series data, enabling date and time functionalities, such as date range generation, shifting, lagging, and windowing operations.

`Data Visualization`: <br/>
While Pandas itself doesn't handle visualization, it integrates well with other libraries like Matplotlib and Seaborn to create visual representations of data directly from DataFrames.
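
As a small taste of the cleaning and manipulation tasks listed above, the sketch below removes a duplicate row, fills a missing value, and sorts the result. The column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "label": ["A", "B", "A", "C"],
    "value": [1.0, np.nan, 1.0, 3.0],
})

deduplicated = raw.drop_duplicates()             # drops the repeated ("A", 1.0) row
cleaned = deduplicated.fillna({"value": 0.0})    # replaces NaN in "value" with 0.0
by_value = cleaned.sort_values("value", ascending=False)

print(by_value)
```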



In [None]:
%pip install pandas
%pip install pandera

In [61]:
import pandas as pd
import pandera as pa

### Series
A `Series` is a one-dimensional array-like object containing a sequence of values of the same type and an associated array of data labels, called its `index`.
<br/>
The simplest Series is formed from only an array of data:

In [69]:
ser : pd.Series= pd.Series([1,2,3,4,5]) #left column displays the index, right column the corresponding value
display(ser)
display(type(ser))
display(ser.array) #a pandas extension array, here wrapping a NumPy array
display(ser.index)

0    1
1    2
2    3
3    4
4    5
dtype: int64

pandas.core.series.Series

<NumpyExtensionArray>
[1, 2, 3, 4, 5]
Length: 5, dtype: int64

RangeIndex(start=0, stop=5, step=1)

Sometimes you want to create a Series with an index that identifies each data point by a label.

In [72]:
ser2 : pd.Series = pd.Series([90, 50, 75, 65, 45], index=["A", "D", "B", "C", "F"])
display(ser2)
display("Indexes of ser2: ",ser2.index)

A    90
D    50
B    75
C    65
F    45
dtype: int64

'Indexes of ser2: '

Index(['A', 'D', 'B', 'C', 'F'], dtype='object')

We can use the labels in the index to access single values or sets of values.

In [80]:
print("Value for label A: ", ser2["A"])
print(f"Value for label F: {ser2['F']}") #single quotes inside the f-string keep this valid before Python 3.12
print(f"Value for label C and B: \n{ser2[['C', 'B']]}")
print(f"Values greater than 70 \n{ser2[ser2 > 70]}")

Value for label A:  90
Value for label F: 45
Value for label C and B: 
C    65
B    75
dtype: int64
Values greater than 70 
A    90
B    75
dtype: int64
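
Beyond label lookup, arithmetic on a Series keeps the link between each label and its value, and the `in` operator tests for a label the way it tests for a dictionary key. A brief sketch with the same `ser2` values:

```python
import pandas as pd

ser2 = pd.Series([90, 50, 75, 65, 45], index=["A", "D", "B", "C", "F"])

doubled = ser2 * 2          # element-wise; the index is preserved
has_a = "A" in ser2         # membership is checked against the index labels
has_z = "Z" in ser2

print(doubled)
print(has_a, has_z)         # True False
```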


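There are a number of ways to create a Series: from a tuple, a list, or a dictionary. A set cannot be used, because a set is unordered, and pandas raises a `TypeError`.

#A set cannot be used to create a Series: the line below raises TypeError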
s1 : pd.Series = pd.Series({1,2,3,4,5})
s1

In [83]:

#Tuple can be used for creating series
s1 : pd.Series = pd.Series((1,2,3,4,5))
display("Series created using tuple: ", s1)

#List can be used for creating series
s1 : pd.Series = pd.Series([1,2,3,4,5])
display("Series created using list: ", s1)
#Dictionary can be used for creating series
s1 : pd.Series = pd.Series({"A":10,
                            "B":20,
                            "C":30,
                            "D":40})
display("Series created using dictionary: ", s1)

#converting series back to dictionary
display("Series converted back to dictionary: ", s1.to_dict())

'Series created using tuple: '

0    1
1    2
2    3
3    4
4    5
dtype: int64

'Series created using list: '

0    1
1    2
2    3
3    4
4    5
dtype: int64

'Series created using dictionary: '

A    10
B    20
C    30
D    40
dtype: int64

'Series converted back to dictionary: '

{'A': 10, 'B': 20, 'C': 30, 'D': 40}

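When a dictionary is passed, its keys become the index of the Series. This can be overridden by passing an explicit `index`: labels missing from the dictionary get `NaN`, and dictionary keys not listed in `index` are dropped.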

In [96]:
sdata: Dict[str, int] = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000} #named sdata to avoid shadowing the built-in dict
ser3 : pd.Series = pd.Series(sdata)
display(ser3)

#overriding the index labels

ser3 : pd.Series = pd.Series(sdata, index=["California", "Ohio", "Oregon", "Texas"], name='State Population')
display("Series after indexes overridden: ",ser3)

#finding missing numbers
display("Missing number in the series: ",pd.isna(ser3))

#finding numbers that are not NaN
display("Not NaN in the series: ",pd.notna(ser3))

Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64

'Series after indexes overridden: '

California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
Name: State Population, dtype: float64

'Missing number in the series: '

California     True
Ohio          False
Oregon        False
Texas         False
Name: State Population, dtype: bool

'Not NaN in the series: '

California    False
Ohio           True
Oregon         True
Texas          True
Name: State Population, dtype: bool
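
Missing values matter most when Series are combined, because arithmetic aligns on index labels rather than on position. The sketch below reuses the state figures from the cell above; the names `full`, `partial`, and `total` are illustrative.

```python
import pandas as pd

population = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
full = pd.Series(population)
partial = pd.Series(population, index=["California", "Ohio", "Oregon", "Texas"])

# Labels present in only one Series (California, Utah) produce NaN.
total = full + partial
print(total)

filled = partial.fillna(0)    # replace NaN with 0
dropped = partial.dropna()    # or drop the missing entries entirely
print(filled)
print(dropped)
```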

The index of a Series can be altered in place by assignment.

In [99]:
ser4 : pd.Series = pd.Series([4, 7, -5, 3])
display(ser4)
ser4.index = ["Bob", "Steve", "Jeff", "Ryan"]
display(ser4)

0    4
1    7
2   -5
3    3
dtype: int64

Bob      4
Steve    7
Jeff    -5
Ryan     3
dtype: int64

In [10]:
values : list[int] = [1, 2, 3, 4, 5];
index1 : list[str] = ['a', 'b', 'c', 'd', 'e'];

s1 : pd.Series = pd.Series(values, index=index1)
s1

a    1
b    2
c    3
d    4
e    5
dtype: int64

In [14]:
values : list[int] = [1, 2, 3, 4, 5];
index1 : list[list[str]] = [['a1', 'a1', 'b1', 'b1', 'c1'],
                            ['a', 'b', 'c', 'd', 'e']]; #two lists build a hierarchical (two-level) index: outer group label, inner label

s1 : pd.Series = pd.Series(values, index=index1, name="Student_Data")
s1

a1  a    1
    b    2
b1  c    3
    d    4
c1  e    5
Name: Student_Data, dtype: int64
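
With a two-level index like this, selection and aggregation can work on the outer level. A minimal sketch, rebuilding the same Series so that it runs on its own:

```python
import pandas as pd

s1 = pd.Series(
    [1, 2, 3, 4, 5],
    index=[["a1", "a1", "b1", "b1", "c1"], ["a", "b", "c", "d", "e"]],
    name="Student_Data",
)

group_a1 = s1.loc["a1"]                  # all rows under outer label "a1", indexed by inner label
single = s1.loc[("b1", "c")]             # one value, addressed by (outer, inner)
group_sums = s1.groupby(level=0).sum()   # total per outer label

print(group_a1)
print(single)        # 3
print(group_sums)
```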

### DataFrame

In [16]:
from nptyping import DataFrame, Structure as S
import pandas as pd
import pandera as pa

# data to validate
df = pd.DataFrame({
    "column1": [1, 4, 0, 10, 9],
    "column2": [-1.3, -1.4, -2.9, -10.1, -20.4],
    "column3": ["value_1", "value_2", "value_3", "value_2", "value_1"],
})

# define schema
schema = pa.DataFrameSchema({
    "column1": pa.Column(int, checks=pa.Check.le(10)),
    "column2": pa.Column(float, checks=pa.Check.lt(-1.2)),
    "column3": pa.Column(str, checks=[
        pa.Check.str_startswith("value_"),
        # define custom checks as functions that take a series as input and
        # outputs a boolean or boolean Series
        pa.Check(lambda s: s.str.split("_", expand=True).shape[1] == 2)
    ]),
})

validated_df = schema(df)
print(validated_df)

   column1  column2  column3
0        1     -1.3  value_1
1        4     -1.4  value_2
2        0     -2.9  value_3
3       10    -10.1  value_2
4        9    -20.4  value_1


##### DataFrame

In [22]:
s1 : pd.Series = pd.Series([1,2,3,4,5], name="student id")
s2 : pd.Series = pd.Series([10,20,30,40,50], name="score")
s3 : pd.Series = pd.Series(["Hamza", "Ali", "Junaid", "Rashid", "Konain"], name="Student Name")

df1 : pd.DataFrame = pd.DataFrame({"student id" : s1, "score" : s2, "student name" : s3})
df1 

|   | student id | score | student name |
|---|---|---|---|
| 0 | 1 | 10 | Hamza |
| 1 | 2 | 20 | Ali |
| 2 | 3 | 30 | Junaid |
| 3 | 4 | 40 | Rashid |
| 4 | 5 | 50 | Konain |


In [23]:
df2 : pd.DataFrame = pd.concat([s1, s2, s3], axis=1)
df2

|   | student id | score | Student Name |
|---|---|---|---|
| 0 | 1 | 10 | Hamza |
| 1 | 2 | 20 | Ali |
| 2 | 3 | 30 | Junaid |
| 3 | 4 | 40 | Rashid |
| 4 | 5 | 50 | Konain |
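
Once the DataFrame exists, a column comes back as a Series, and rows can be filtered with the same Boolean-mask idea used for NumPy arrays. The sketch rebuilds the student table so that it runs on its own; the threshold of 25 is arbitrary.

```python
import pandas as pd

df1 = pd.DataFrame({
    "student id": [1, 2, 3, 4, 5],
    "score": [10, 20, 30, 40, 50],
    "student name": ["Hamza", "Ali", "Junaid", "Rashid", "Konain"],
})

scores = df1["score"]               # a single column is a Series
high = df1[df1["score"] > 25]       # keep only rows where the mask is True
mean_score = df1["score"].mean()

print(high)
print(mean_score)                   # 30.0
```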


In [29]:
from nptyping import NDArray, Shape, UInt64
from typing import Any

data : NDArray[Shape["10, 10"], Any] = np.arange(10*10).reshape(10, 10)
data

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

In [31]:
data : NDArray[Shape["10, 10"], Any] = np.arange(10*10).reshape(10, 10);
df : pd.DataFrame = pd.DataFrame(data);
df

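|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 1 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
| 2 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 |
| 3 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 |
| 4 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 |
| 5 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 |
| 6 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 |
| 7 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 |
| 8 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 |
| 9 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 |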


In [37]:
dfl : list[pd.DataFrame] = pd.read_html("https://www.w3schools.com/python/python_operators.asp")
dfl[0]

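|   | Operator | Name | Example | Try it |
|---|---|---|---|---|
| 0 | `+` | Addition | `x + y` | Try it » |
| 1 | `-` | Subtraction | `x - y` | Try it » |
| 2 | `*` | Multiplication | `x * y` | Try it » |
| 3 | `/` | Division | `x / y` | Try it » |
| 4 | `%` | Modulus | `x % y` | Try it » |
| 5 | `**` | Exponentiation | `x ** y` | Try it » |
| 6 | `//` | Floor division | `x // y` | Try it » |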


In [42]:

s1 : pd.Series = pd.Series([1,2,3,4,5])
display(s1)
print("Applying slicing")
display(s1[3]) #looks up the value whose index label is 3
display(s1[2:4]) #slice; the end position is excluded
display(s1.iloc[1:4]) #integer-position slicing, same semantics as NumPy slicing


0    1
1    2
2    3
3    4
4    5
dtype: int64

Applying slicing


4

2    3
3    4
dtype: int64

1    2
2    3
3    4
dtype: int64
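
One detail that the slices above can hide: `.iloc` slices by position and excludes the end, whereas `.loc` slices by label and includes it. A short sketch on the same Series:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])

by_position = s1.iloc[1:4]   # positions 1, 2, 3 -> values 2, 3, 4
by_label = s1.loc[1:4]       # labels 1 through 4 inclusive -> values 2, 3, 4, 5

print(by_position.tolist())  # [2, 3, 4]
print(by_label.tolist())     # [2, 3, 4, 5]
```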

In [46]:
x : str = """
20:03:10 From Dr. Ghulam Shabbir to Everyone:
	AoA Honorable Sir Qasim Sb. Stay blessed always iA.
20:03:41 From M Qasim to Everyone:
	sir aj payare lag rahy hy
20:04:02 From Dr. Ghulam Shabbir to Everyone:
	Fantastic
20:04:06 From jhon wick to Qasim(CGAIO)(Direct Message):
	sir aj zia khan sir ne nahi ana
20:04:37 From jhon wick to Everyone:
	Reacted to "sir aj payare lag ra..." with 😂
20:04:44 From M Qasim to Everyone:
	Reacted to "sir aj payare lag ra..." with 😂
20:04:54 From M Qasim to Everyone:
	Removed a 😂 reaction from "sir aj payare lag ra..."
20:04:56 From M Qasim to Everyone:
	Reacted to "sir aj payare lag ra..." with 😂
20:04:57 From M Qasim to Everyone:
	Removed a 😂 reaction from "sir aj payare lag ra..."
20:06:04 From Abdullah to Everyone:
	sir ap kal ki class me ni ayen gy kia??
20:06:04 From M Qasim to Everyone:
	Sir Faisalabad b a rahy hy kya ap?
20:06:49 From Yasir to Everyone:
	Sir generative ai ki class online b hogi na?????
20:07:17 From Sheikh Hamza to Everyone:
	Sir data analytics ke liye maths statistics and probability ani chaye??
20:07:24 From Yasir to Everyone:
	Replying to "Sir generative ai ki..."
	
	@Ikhlas Bhojani PLease ask from sir?
20:07:57 From Ikhlas Bhojani to Everyone:
	Replying to "Sir generative ai ki..."
	
	ok wait
20:08:52 From Dr. Ghulam Shabbir to Everyone:
	Sir Qasim Sb. You were frequently discussed and were given honored and salute in our today's class at Islamabad. Dr. Ghulam Shabbir
20:09:12 From STONE to Everyone:
	Reacted to "Sir Qasim Sb. You we..." with 👍
20:09:15 From Hatif Humayun to Everyone:
	Reacted to "Sir Qasim Sb. You we..." with 👍
20:09:19 From Dr. Ghulam Shabbir to Everyone:
	Reacted to "Sir Qasim Sb. You we..." with 👍
20:09:27 From STONE to Everyone:
	Removed a 👍 reaction from "Sir Qasim Sb. You we..."
20:09:33 From STONE to Everyone:
	Reacted to "Sir Qasim Sb. You we..." with 👍
20:09:34 From STONE to Everyone:
	Removed a 👍 reaction from "Sir Qasim Sb. You we..."
20:12:02 From Amanat Wattoo to Everyone:
	Pd kia convention h
20:12:20 From Amanat Wattoo to Everyone:
	Ya koi or be naam de skte hn
20:12:33 From farhan to Everyone:
	Convention hey
20:12:50 From KifayatUllah to Everyone:
	screen is not visible to me, may be network issue or something else
20:14:16 From Rehan Alı WAKKAS to Everyone:
	Import pandas as pd and import pandora as pa.
	What is pd and pa ?
20:14:28 From farhan to Everyone:
	Just names
20:15:06 From Ikhlas Bhojani to Everyone:
	koi question ho tw hand raise kriyea jese i moka milega me unmute krdonga
20:15:41 From Talha to Everyone:
	Replying to "Import pandas as pd ..."
	
	Alias, instead of writing long words pandas and pandora, we can write pd and pa as their short form
20:15:45 From Rehan Alı WAKKAS to Everyone:
	@Ikhlas Bhojani  
	Import pandas as pd and import pandora as pa.
	What is pd and pa ?
20:16:25 From Ikhlas Bhojani to Everyone:
	Replying to "@Ikhlas Bhojani  
	Im..."
	
	alias
20:16:43 From Jazil Hashmi to Everyone:
	Import pandas could not be resolved from source. Why is this error coming? @Ikhlas Bhojani
20:16:55 From Ikhlas Bhojani to Everyone:
	Replying to "@Ikhlas Bhojani  
	Im..."
	
	like nick name
20:17:36 From Ikhlas Bhojani to Everyone:
	!pip install pandas
20:17:46 From Rehan Alı WAKKAS to Everyone:
	Replying to "@Ikhlas Bhojani  
	Im..."
	
	Do they refer to Directories or what ?
20:19:58 From Abdullah to Everyone:
	Pandas is usually imported under the pd alias. alias: In Python alias are an alternate name for referring to the same thing. Create an alias with the as keyword while importing: import pandas as pd. Now the Pandas package can be referred to as pd instead of pandas .
20:20:30 From farhan to Everyone:
	These are nicknames used by community
20:25:02 From ahsan rasheed to Everyone:
	Mary kush lectures rah gy hain
20:25:15 From ahsan rasheed to Everyone:
	Kia ya lecture record ho rhy hain
20:25:23 From ABDUL KHALIQ to Everyone:
	g
20:25:26 From ABDUL KHALIQ to Everyone:
	youtube
20:25:31 From ABDUL KHALIQ to Everyone:
	panaverse dao
20:25:48 From ahsan rasheed to Everyone:
	Thanks
20:26:30 From Kamran Ahmed to Everyone:
	tabular form in panada
20:26:46 From Sarwar Faridi to Everyone:
	TABULAR
20:26:54 From Sarwar Faridi to Everyone:
	2D
20:26:57 From Kamran Ahmed to Everyone:
	index value also shown
20:27:06 From Ahmed Siddiqui to Everyone:
	matrix
20:27:07 From Kamal Hassan to Everyone:
	series men indexing h or tabular form mn h
20:29:56 From Kaleem to Everyone:
	sir series ma duplicate values bhi dy kar daikhain!!
20:35:09 From Zeeshan Abbas to Everyone:
	Sir what is INT64
20:37:19 From Ikhlas Bhojani to Everyone:
	koi question he tw hand raise krke rakhen
20:37:51 From Zeeshan Abbas to Everyone:
	mera question hai what is INT64
20:38:04 From Abdullah to Everyone:
	smj e ni aya 
	repeat krwa dyen
20:39:23 From Altaf Hussain to Everyone:
	Just like Chart of accounts in Accounting...
20:39:44 From Ahmed Siddiqui to Everyone:
	Reacted to "Just like Chart of a..." with 👍
20:40:40 From hi to Everyone:
	what is int 64
20:40:45 From hi to Everyone:
	dtype: int 64
20:40:54 From Zeeshan Abbas to Everyone:
	Sir why is showing INT64 in below
20:40:58 From Ikhlas Bhojani to Everyone:
	Replying to "mera question hai wh..."
	
	type hoti he data ki
20:42:24 From Kaleem to Everyone:
	in our structure data types of data we use key, value /key, pair values to define our data, how to translate that data Series?
20:42:27 From Asif Ali Shaikh to Everyone:
	int 64 is memory
20:42:44 From Asif Ali Shaikh to Everyone:
	Web3
20:45:11 From Zeeshan Abbas to Everyone:
	Reacted to "Web3" with 👍
20:46:15 From Galaxy to Everyone:
	Sir g account udaar dy dy
20:46:25 From Galaxy to Everyone:
	Reacted to Sir g account udaar ... with "😂"
20:46:40 From Ayesha Arshad to Everyone:
	Reacted to "Sir g account udaar ..." with 😂
20:46:40 From Ayesha Arshad to Everyone:
	Removed a 😂 reaction from "Sir g account udaar ..."
20:46:41 From jhon wick to Everyone:
	Reacted to "Sir g account udaar ..." with 😂
20:46:43 From Ayesha Arshad to Everyone:
	Reacted to "Sir g account udaar ..." with 😂
20:46:54 From Taif Ullah to Everyone:
	is bing equal to gpt 4
20:49:29 From fahad rasheed to Everyone:
	2 dimension data hy pora select krna parhyga "" mn
20:49:29 From Azfar Suhail to Everyone:
	from nptyping import DataFrame as DF, Structure as S
	
	s2 : DF[S["Abc : str"]] = pd.Series(['a','b','c','d','e'])
	s2
20:49:29 From Azfar Suhail to Everyone:
	this is working
20:51:57 From Saboor Hussain to Everyone:
	Reacted to "Sir g account udaa..." with 😂
20:51:59 From Saboor Hussain to Everyone:
	Removed a 😂 from "Sir g account udaa..."
20:58:40 From jhon wick to Everyone:
	yes
20:59:04 From farhan to Everyone:
	Kindly unmute
20:59:22 From Ikhlas Bhojani to Everyone:
	Replying to "Kindly unmute"
	
	wait
21:00:43 From PIAIC80919 Muhammad Asad to Everyone:
	Assalamu Aliakum
21:01:33 From Faizan Hassan to Everyone:
	s1.name kar k kar saktay hon ge
21:01:43 From PIAIC80919 Muhammad Asad to Everyone:
	I joined late on zoom meet so kindly share links that sir share until now
21:02:26 From Taif Ullah to Everyone:
	Replying to "I joined late on zoo..."
	
	every thing will be on github
21:04:19 From PIAIC80919 Muhammad Asad to Everyone:
	Numpy aur Pandas kay liya sir nay koi book share ki hai kay nahi
21:04:42 From Amanat Wattoo to Everyone:
	Replying to "Numpy aur Pandas kay..."
	
	no
21:05:21 From PIAIC80919 Muhammad Asad to Everyone:
	aur python kay liya koi alag WhatsApp group hai to please uska b link share kardien
21:06:00 From Amanat Wattoo to Everyone:
	Replying to "aur python kay liya ..."
	
	no koi group nhi bnia h
21:07:17 From Abdullah to Everyone:
	upper array wala code knsa ha??
21:07:51 From Hamza to Everyone:
	"Shape", "Shape"
21:08:10 From Hamza to Everyone:
	"Size", "Size"
21:08:26 From SheikhMAqib to Everyone:
	Double coat
21:08:33 From Azfar Suhail to Everyone:
	Shape turtle se import kara hai
21:08:51 From Azfar Suhail to Everyone:
	shape nptyping se import nhi howa
21:09:34 From Muhammad Uzair to Everyone:
	shape galat import ha
21:09:35 From Khadija Zahid to Everyone:
	shape ko kindly ek br explain kr den dbra
21:09:38 From PIAIC80919 Muhammad Asad to Qasim(CGAIO)(Direct Message):
	Assalamu Aliakum Sir Kindly sir mujhe bta dien k Data Science kay liya Math aur Statistic kay kon si books and courses hum karien
21:09:50 From Yasir to Everyone:
	shape import
21:09:53 From Yasir to Everyone:
	missing
21:09:56 From Muhammad Uzair to Everyone:
	from import typing shape
21:10:11 From Saboor Hussain to Everyone:
	sir
21:10:19 From Saboor Hussain to Everyone:
	aap nptyping se shape ko import karen
21:10:21 From Yasir to Everyone:
	import nhi kia shape
21:10:24 From farhan to Everyone:
	Shape import nahi thi
21:10:25 From Azfar Suhail to Everyone:
	turtle se import kara hai
21:14:10 From sadia to Everyone:
	can we get values 0 1 2 3 4 5 6 7 8 in vertical, abhi data horizontal arha hai
21:18:10 From Faiz M to Everyone:
	Sir chezy hard sy hard hoti ja rahe hy. aaj tho sir k oper oper ja raha hy.
21:18:56 From Qasim(CGAIO) to Everyone:
	https://www.w3schools.com/python/pandas/data.js
21:19:20 From Khadija Zahid to Everyone:
	html wala b ek br code dekha k bta de plz
21:19:22 From Abdullah to Everyone:
	kindly class k bd groups me sessions ka link send kr dia kryen
21:21:27 From Abdullah to Everyone:
	Replying to "kindly class k bd gr..."
	
	@Ikhlas Bhojani
21:22:11 From Ahmed Siddiqui to Everyone:
	what if data size in millions, what kind of preprocessing is required before handing over to pandas?
21:22:41 From Ikhlas Bhojani to Everyone:
	Replying to "kindly class k bd gr..."
	
	me ap logo ke group me nhi hn
21:22:59 From Abdullah to Everyone:
	Replying to "kindly class k bd gr..."
	
	bro sir sy kah dyen 
	ya sir Imran sy request kr dyen
21:24:13 From Ikhlas Bhojani to Everyone:
	Replying to "html wala b ek br co..."
	
	pd.read_html("url")
21:26:02 From Kaleem to Everyone:
	how to make identical data ?
21:26:19 From Afifa Dar to Everyone:
	colmn3 k chexk me ==2 se kya horaha ?
21:29:40 From PIAIC80919 Muhammad Asad to Qasim(CGAIO)(Direct Message):
	sir assignment b day dein practice kay liya
21:32:29 From Ali Zar FSD to Everyone:
	sliding
21:32:41 From Ali Zar FSD to Everyone:
	likha gya sir
21:33:00 From fahad rasheed to Everyone:
	sir thora data large kryn plx
21:33:18 From fahad rasheed to Everyone:
	for slicing thora data bardhae
21:39:55 From raheela to Everyone:
	Name is tort ???
21:40:46 From Naveed Delattre to Everyone:
	it’s toad
21:41:45 From Altaf Hussain to Everyone:
	PIAIC-173738
21:41:51 From Hamza to Everyone:
	PIAIC-201785
21:41:52 From jhon wick to Everyone:
	piaic 223880
21:41:54 From Hina Zargham to Everyone:
	PIAIC101499
21:41:54 From Hatif Humayun to Everyone:
	PIAIC-52822
21:41:54 From Ahmed Siddiqui to Everyone:
	PIAIC123456
21:41:56 From Arif Najmi to Everyone:
	125657
21:42:00 From Rehan Baig - PIAIC73919 to Everyone:
	PIAIC73919
21:42:00 From STONE to Everyone:
	ZAM - 786
21:42:01 From M. Waheed Iqbal (PIAIC_126369) to Everyone:
	PIAIC_126369
21:42:03 From . to Everyone:
	PIAIC210905
21:42:06 From ABDUL KHALIQ to Everyone:
	PIAIC-604031
21:42:11 From Arshad Siddiqui to Everyone:
	PIAIC120702
21:42:13 From Ali Zar FSD to Everyone:
	PIaic 223972
21:42:13 From Azfar Suhail to Everyone:
	PIAIC218333
21:42:14 From Kamran Ahmed to Everyone:
	PIAIC139495
21:42:18 From Ahmed to Everyone:
	216511
21:42:20 From Ayesha Arshad to Everyone:
	PIAIC-225620
21:42:25 From Kamal Hassan to Everyone:
	PIAIC58320
21:42:29 From Ahmed to Everyone:
	PIAIC-2165111
21:42:30 From Kaleem to Everyone:
	PIAIC:001100
21:42:34 From Arif Najmi to Everyone:
	PIAIC 125657
21:42:35 From Yasir to Everyone:
	PIAIC63502
21:42:37 From Ahmed to Everyone:
	PIAIC-216511
21:42:41 From Ali to Everyone:
	PIAIC76588
21:42:46 From M Qasim to Everyone:
	PIAIC178397
21:42:55 From Dr. Ghulam Shabbir to Everyone:
	PIAIC208889
21:42:59 From IMRAN to Everyone:
	PIAIC216423
21:43:04 From PIAIC80919 Muhammad Asad to Qasim(CGAIO)(Direct Message):
	PIAIC80919
21:43:04 From Ahmed to Everyone:
	PIAIC-216511
21:43:16 From Kamal Hassan to Everyone:
	PIAIC,58321
21:43:29 From Zeeshan Abbas to Everyone:
	PIAIC221479
21:43:43 From Amanat Wattoo to Everyone:
	PIAIC174651
21:44:12 From Ahsan to Everyone:
	PIAIC185091
21:44:26 From Amanat Wattoo to Everyone:
	@Ikhlas Bhojani bhai chat ko save kaisy save krwia h
21:45:09 From Ikhlas Bhojani to Everyone:
	Replying to "@Ikhlas Bhojani bhai..."
	
	neeche three dot se
21:46:20 From Ikhlas Bhojani to Everyone:
	Replying to "@Ikhlas Bhojani bhai..."
	
	Jahan msg likhte hen uske neeche three dot he
21:46:30 From Amanat Wattoo to Everyone:
	Replying to "@Ikhlas Bhojani bhai..."
	
	ok find thanks
21:47:27 From Amanat Wattoo to Everyone:
	Replying to "@Ikhlas Bhojani bhai..."
	
	found


"""

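import re

# The pattern spans several lines: a leading newline, the "HH:MM:SS From <name> to Everyone:" header,
# then a tab-indented message made of PIAIC, an optional hyphen, an optional space, 5 or 6 digits, and a newline.
patterns : str = r'''
(\d{2}:\d{2}:\d{2}) From (.*) to Everyone:
	(PIAIC-? ?\d{5,6})
'''

# With three capture groups, re.findall returns one (time, name, roll number) tuple per match.
data : list[tuple[str, str, str]] = re.findall(patterns, x)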

data

[('21:41:45', 'Altaf Hussain', 'PIAIC-173738'),
 ('21:41:54', 'Hina Zargham', 'PIAIC101499'),
 ('21:41:54', 'Ahmed Siddiqui', 'PIAIC123456'),
 ('21:42:00', 'Rehan Baig - PIAIC73919', 'PIAIC73919'),
 ('21:42:03', '.', 'PIAIC210905'),
 ('21:42:11', 'Arshad Siddiqui', 'PIAIC120702'),
 ('21:42:13', 'Azfar Suhail', 'PIAIC218333'),
 ('21:42:20', 'Ayesha Arshad', 'PIAIC-225620'),
 ('21:42:34', 'Arif Najmi', 'PIAIC 125657'),
 ('21:42:37', 'Ahmed', 'PIAIC-216511'),
 ('21:42:46', 'M Qasim', 'PIAIC178397'),
 ('21:42:59', 'IMRAN', 'PIAIC216423'),
 ('21:43:04', 'Ahmed', 'PIAIC-216511'),
 ('21:43:29', 'Zeeshan Abbas', 'PIAIC221479'),
 ('21:44:12', 'Ahsan', 'PIAIC185091')]
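
The result above skips some messages. Hamza's `PIAIC-201785`, for example, sits between Altaf Hussain and Hina Zargham in the chat but is missing from the list. The pattern begins and ends with a newline, and `re.findall` never reuses a character, so the newline that ends one match cannot also start the next one; back-to-back roll-number messages are therefore picked up only alternately. Spellings such as `piaic 223880` or `PIAIC_126369` do not fit the pattern either. Below is a minimal sketch of one possible fix, run on a hypothetical four-message sample in the same layout: `re.MULTILINE` anchors (`^`, `$`) replace the literal newlines, and the accepted separators and the 5 to 7 digit range are assumptions about what should count as a roll number.

```python
import re

# Hypothetical miniature of the chat log: a header line followed by a tab-indented message.
sample = (
    "\n"
    "21:41:45 From Altaf Hussain to Everyone:\n"
    "\tPIAIC-173738\n"
    "21:41:51 From Hamza to Everyone:\n"
    "\tPIAIC-201785\n"
    "21:41:52 From jhon wick to Everyone:\n"
    "\tpiaic 223880\n"
    "21:42:01 From M. Waheed Iqbal (PIAIC_126369) to Everyone:\n"
    "\tPIAIC_126369\n"
)

# Original approach: literal newlines on both sides of each match.
old_pattern = "\n" r"(\d{2}:\d{2}:\d{2}) From (.*) to Everyone:" "\n\t" r"(PIAIC-? ?\d{5,6})" "\n"
old_rows = re.findall(old_pattern, sample)

# Sketch of a fix: ^ and $ match at line boundaries without consuming the newline,
# IGNORECASE accepts "piaic", and the separator class is an assumed set of characters.
new_pattern = re.compile(
    r"^(\d{2}:\d{2}:\d{2}) From (.*) to Everyone:\n\t(PIAIC[-_ :,]?\d{5,7})$",
    re.MULTILINE | re.IGNORECASE,
)
new_rows = new_pattern.findall(sample)

print(len(old_rows), len(new_rows))
for row in new_rows:
    print(row)
```

Messages that contain only digits (such as `125657`) or that were sent as direct messages still do not match; whether they should is a judgment call left open here.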

In [47]:
df : pd.DataFrame = pd.DataFrame(data, columns=['Time', 'Name', 'Roll Number'])

df

Unnamed: 0,Time,Name,Roll Number
0,21:41:45,Altaf Hussain,PIAIC-173738
1,21:41:54,Hina Zargham,PIAIC101499
2,21:41:54,Ahmed Siddiqui,PIAIC123456
3,21:42:00,Rehan Baig - PIAIC73919,PIAIC73919
4,21:42:03,.,PIAIC210905
5,21:42:11,Arshad Siddiqui,PIAIC120702
6,21:42:13,Azfar Suhail,PIAIC218333
7,21:42:20,Ayesha Arshad,PIAIC-225620
8,21:42:34,Arif Najmi,PIAIC 125657
9,21:42:37,Ahmed,PIAIC-216511
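
The roll numbers arrive in several spellings (`PIAIC-173738`, `PIAIC 125657`, `PIAIC101499`), and the extracted data lists Ahmed twice with the same number. A possible clean-up step, sketched on a hypothetical four-row frame with the same columns, is to keep only the digits of each roll number and then drop repeated posts. The `PIAIC` prefix format and the choice to keep the first post are assumptions, not something the notebook prescribes.

```python
import pandas as pd

# Hypothetical subset of the extracted rows, including one repeated post.
rows = [
    ("21:41:45", "Altaf Hussain", "PIAIC-173738"),
    ("21:42:34", "Arif Najmi", "PIAIC 125657"),
    ("21:42:37", "Ahmed", "PIAIC-216511"),
    ("21:43:04", "Ahmed", "PIAIC-216511"),
]
df = pd.DataFrame(rows, columns=["Time", "Name", "Roll Number"])

# Keep only the digits so every roll number shares one format, e.g. "PIAIC173738".
df["Roll Number"] = "PIAIC" + df["Roll Number"].str.extract(r"(\d+)", expand=False)

# A student who posted twice now has identical Name and Roll Number values; keep the first post.
clean = df.drop_duplicates(subset=["Name", "Roll Number"], keep="first").reset_index(drop=True)

print(clean)
```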
