#### Import all Libraries - to be executed every time you open the notebook

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats

#### Installing the libraries if not present - one time activity

In [None]:
!pip install numpy
!pip install pandas
!pip install matplotlib
!pip install seaborn
!pip install scipy
!pip install openpyxl

# Dataframe

A DataFrame is a two-dimensional data structure in which data is arranged in tabular form, in rows and columns.

#### DataFrame features:

- Columns can be of different data types
- The size of a DataFrame can be changed
- Axes (rows and columns) are labeled
- Arithmetic operations can be performed on rows and columns
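
The features listed above can be illustrated with a small sketch. The table `df_demo` and its values are hypothetical and are not taken from any dataset used in this notebook.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the listed features
df_demo = pd.DataFrame(
    {"Product": ["Pen", "Book", "Bag"], "Price": [10.0, 250.0, 899.0], "Qty": [5, 2, 1]},
    index=["a", "b", "c"],  # labeled row axis; the column names label the other axis
)

print(df_demo.dtypes)  # each column has its own data type

# Arithmetic on columns; adding the result as a new column changes the size
df_demo["Total"] = df_demo["Price"] * df_demo["Qty"]

# Arithmetic along a row, selected by its label
row_sum = df_demo.loc["a", ["Price", "Total"]].sum()
print(df_demo.shape, row_sum)
```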

### Concatenating and Merging Dataframes

In [None]:
df_jan = pd.DataFrame({"Order ID" : range(101, 111), "Sales" : np.random.randint(10000, 50000, 10)})
df_feb = pd.DataFrame({"Order ID" : range(111, 121), "Sales" : np.random.randint(10000, 50000, 10)})
df_mar = pd.DataFrame({"Order ID" : range(121, 131), "Sales" : np.random.randint(10000, 50000, 10)})

In [None]:
df_jan.head(2)

In [None]:
df_feb.head(2)

In [None]:
df_mar.head(2)

#### Concatenate
pd.concat(`tuple of dfs`, `ignore_index = False`, `axis=0`)

In [None]:
df = pd.concat((df_jan, df_feb, df_mar), ignore_index=True)

In [None]:
df["Sales"].sum() # - total sales

In [None]:
# Write the data to csv file - 
df.to_csv("Sales.csv", index=None)

In [None]:
# Write the data to excel file - 
df.to_excel("Sales.xlsx", sheet_name="Total Sales", index=None)

# Note - when writing data to an Excel file, the existing file must be closed first.

**Example**
- Add a new column "Month" to each dataframe with the value "Jan", "Feb" or "Mar"
- Combine all the three dataframes and write to file

In [None]:
df_jan["Month"] = "Jan"
df_feb["Month"] = "Feb"
df_mar["Month"] = "Mar"

In [None]:
df = pd.concat((df_jan, df_feb, df_mar), ignore_index=True)
df.head(2)

In [None]:
# Write the data to csv file - Always replace the existing data
df.to_csv("Sales.csv", index=None)

In [None]:
# Write the data to csv file - Append data to existing file
# header=False avoids writing the column names again in the middle of the file
df.to_csv("Sales.csv", index=None, mode = "a", header=False)

#### Merging Dataframes

`df1.merge(df2, how="", on = "", left_on="", right_on="")`

- **how** - type of merge (inner, left, right, outer)
- **on** - name of common column, used when both dfs have same name for the common/reference column
- **left_on** or **right_on** - name of left/right column when reference column names are different
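
A minimal sketch of `left_on` / `right_on`, assuming two hypothetical tables (`df_staff`, `df_pay`) whose reference columns have different names. Both key columns are kept in the result, so one of them is usually dropped afterwards.

```python
import pandas as pd

# Hypothetical tables: the reference column is "Role" on the left and "Designation" on the right
df_staff = pd.DataFrame({"Name": ["Jack", "Bill", "Lizie"],
                         "Role": ["HR", "Manager", "Developer"]})
df_pay = pd.DataFrame({"Designation": ["HR", "Developer", "Manager"],
                       "Salary": [40000, 25000, 70000]})

merged = df_staff.merge(df_pay, how="left", left_on="Role", right_on="Designation")
merged = merged.drop(columns="Designation")  # drop the duplicate key column
print(merged)
```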

In [None]:
df_emp = pd.DataFrame({"Name" : ["Jack", "Bill", "Lizie", "Jane", "George"],
            "Designation" : ["HR", "Manager", "Developer", "Intern", "Manager"]})
df_emp

In [None]:
base_salaries = pd.DataFrame({"Designation" : ["HR", "Developer", "Manager", "Senior Manager"],
            "Salary": [40000, 25000, 70000, 1000000]})
base_salaries

**Inner Merge**
 - Gives data only for the common values for reference column in both the dfs

In [None]:
df_emp.merge(base_salaries, on="Designation", how = "inner" )

**Left Merge**
 - Gives data for the left table and corresponding values from right table based on reference column. Gives null for missing values

In [None]:
df_emp.merge(base_salaries, on="Designation", how = "left" )

**Right Merge**
- Gives data for the right table and corresponding values from left table based on reference column. Gives null for missing values

In [None]:
df_emp.merge(base_salaries, on="Designation", how = "right" )

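**Outer Merge**
- Gives data for all values of the reference column from both tables. Gives null for missing values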

In [None]:
df_emp.merge(base_salaries, on="Designation", how = "outer" )
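
To see which table each row of an outer merge came from, `merge` accepts `indicator=True`, which adds a `_merge` column. The sketch below rebuilds the two tables from the cells above so that it runs on its own; `indicator` is not used elsewhere in this notebook.

```python
import pandas as pd

df_emp = pd.DataFrame({"Name": ["Jack", "Bill", "Lizie", "Jane", "George"],
                       "Designation": ["HR", "Manager", "Developer", "Intern", "Manager"]})
base_salaries = pd.DataFrame({"Designation": ["HR", "Developer", "Manager", "Senior Manager"],
                              "Salary": [40000, 25000, 70000, 1000000]})

# _merge is "both", "left_only" or "right_only" for each row
outer = df_emp.merge(base_salaries, on="Designation", how="outer", indicator=True)
counts = outer["_merge"].value_counts()
print(outer)
print(counts)
```

Here "Intern" appears only in the left table and "Senior Manager" only in the right table, so those rows carry nulls in the columns of the other table.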

#### Examples

###### Ex. Calculate total sales across all three months using Excel plug-in

###### Ex. Create a table displaying salary of each employee

### DataFrame toolkit - 

###### Ex. Read data from `BSE Sensex 30 Historical Data.csv`

In [None]:
df = pd.read_csv(r"./Datasets/BSE Sensex 30 Historical Data.csv")
df.head(2)

#### Drop a column or row from dataframe
`df.drop(columns = [], index = [], inplace=False)`
- inplace = False returns a new DataFrame (default), True modifies original df

In [None]:
df.drop(columns=["High", "Low"], index=[0, 10, 4, 8])

#### Working with **null** values

`df.isna()` - Detect missing values. Return a boolean same-sized object indicating if the values are NA.

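`df.fillna(value=None, inplace=False, method = None)` - Fill NA/NaN values with a given value or using the specified method.

method : {'backfill', 'bfill', 'ffill', None}. Note that the `method` argument is deprecated in recent pandas versions (2.1 onwards); use `df.ffill()` or `df.bfill()` instead.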

In [None]:
df.isna().any()  # True means there is at least 1 null value in the column

**In case an entire row/column is null - Drop null rows**

df.dropna(`axis = 0`, `how = "any"`, `inplace = False`)
- axis 0 for row or 1 for column
- how - {any or all}

In [None]:
df.shape

In [None]:
# df.dropna(axis = 0, how = "any") - deletes rows with any 1 null value
df.dropna()

In [None]:
df.dropna(axis= 0, how="all", inplace=True) # - deletes rows with all null values

In [None]:
df.isna().any() # null rows are deleted but the Vol. column still has null values

In [None]:
df.dropna(axis= 1, how="any") # - deletes columns with at least 1 null value

In [None]:
df.dropna(axis= 1, how="all", inplace=True) # - deletes columns where all values are null - in this case no change in the df

In [None]:
df.head(2)

###### Extracting null rows for Vol column

In [None]:
df[df.isna().any(axis = 1)] # for any column

In [None]:
df[df["Vol."].isna()] # for specific column

###### Ex. Replace the null value with default 

In [None]:
df["Vol."].fillna(0, inplace=True)  # older pandas syntax; recent versions raise a FutureWarning for this chained inplace fill and may not update df

In [None]:
df.fillna({"Vol.": 0, "High" : 1, "Low" : df.Low.mean()})  # new syntax - provides single code to modify multiple cols

###### Ex. Replace null with ffill or bfill

In [None]:
df["Vol."] = df["Vol."].ffill()  # forward fill; avoid inplace = True in this case

In [None]:
df["Vol."] = df["Vol."].bfill()  # backward fill; avoid inplace = True in this case

#### Removing Duplicate Data

In [86]:
df.duplicated().any()

np.False_

`df.drop_duplicates(subset = [columns], inplace=False)`
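
A small sketch of how `subset` changes what counts as a duplicate, using a hypothetical table `df_dup`. The `keep` argument is not part of the signature quoted above; it is shown here because it controls which of the duplicate rows survives.

```python
import pandas as pd

# Hypothetical orders: row 1 repeats row 0 exactly, rows 3 and 4 share only the Order ID
df_dup = pd.DataFrame({"Order ID": [101, 101, 102, 103, 103],
                       "Sales": [500, 500, 700, 900, 950]})

full_dupes = df_dup.duplicated().sum()                   # rows identical in every column
by_order = df_dup.drop_duplicates(subset=["Order ID"])   # keeps the first row per Order ID
last_kept = df_dup.drop_duplicates(subset=["Order ID"], keep="last")  # keeps the last row instead
print(full_dupes, len(by_order))
```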

In [88]:
df.drop_duplicates(inplace=True)

#### Replacing values

df.replace({`colname` : {`old_value` : `new_value`}}, `inplace=True`)

In [89]:
df.head()

Unnamed: 0,Date,Price,Open,High,Low,Vol.,Change %
0,16-04-2025,76761.72,76996.78,76996.78,76544.07,4.99M,0.03%
1,15-04-2025,76734.89,76852.06,76857.05,76449.56,12.93M,2.10%
2,11-04-2025,75157.26,74835.49,75467.33,74762.84,14.23M,1.77%
3,09-04-2025,73847.15,74103.83,74103.83,73673.06,9.15M,-0.51%
4,08-04-2025,74227.08,74013.73,74859.39,73424.92,17.06M,1.49%


In [None]:
df.replace({"Change %" : {"0.03%" : "0.05%"}})

#### Clean the dataset

In [91]:
df.dtypes

Date        object
Price       object
Open        object
High        object
Low         object
Vol.        object
Change %    object
dtype: object

In [130]:
df["Price"] = df["Price"].str.replace(",", "").astype(float)
df["Open"] = df["Open"].str.replace(",", "").astype(float)
df["High"] = df["High"].str.replace(",", "").astype(float)
df["Low"] = df["Low"].str.replace(",", "").astype(float)
df["Change %"] = df["Change %"].str.replace("%", "").astype(float)

In [131]:
df["Volume"] = df["Vol."].str[:-1].astype(float)
df["temp"] = df["Vol."].str[-1]
df["Volume"] = df["temp"].map({"M" : 1000000, "K" : 1000, "B" : 1000000000}) * df["Volume"]
df.head(10)

Unnamed: 0,Date,Price,Open,High,Low,Vol.,Change %,Volume,temp
0,2025-04-16,76761.72,76996.78,76996.78,76544.07,4.99M,0.03,4990000.0,M
1,2025-04-15,76734.89,76852.06,76857.05,76449.56,12.93M,2.1,12930000.0,M
2,2025-11-04,75157.26,74835.49,75467.33,74762.84,14.23M,1.77,14230000.0,M
3,2025-09-04,73847.15,74103.83,74103.83,73673.06,9.15M,-0.51,9150000.0,M
4,2025-08-04,74227.08,74013.73,74859.39,73424.92,17.06M,1.49,17060000.0,M
5,2025-07-04,73137.9,71449.94,73403.99,71425.01,29.37M,-2.95,29370000.0,M
6,2025-04-04,75364.69,76160.09,76258.12,75240.55,29.37M,-1.22,29370000.0,M
7,2025-03-04,76295.36,75811.86,76493.74,75807.55,6.92M,-0.42,6920000.0,M
8,2025-02-04,76617.44,76146.28,76680.35,76064.94,10.75M,0.78,10750000.0,M
9,2025-01-04,76024.51,76882.58,77487.05,75912.18,10.59M,-1.8,10590000.0,M


#### Grouping Dataframes

##### `df.groupby(by=None, as_index=True, sort=True, dropna=True)`

- use of `agg()`
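
A minimal sketch of `agg()` with named aggregations, assuming a hypothetical table `df_prices` rather than the Sensex data. Each keyword becomes an output column, defined by a `(column, function)` pair.

```python
import pandas as pd

# Hypothetical prices and volumes, used only to illustrate agg()
df_prices = pd.DataFrame({
    "Year": [2023, 2023, 2024, 2024, 2024],
    "Price": [100.0, 120.0, 150.0, 130.0, 170.0],
    "Volume": [10, 20, 30, 40, 50],
})

summary = df_prices.groupby("Year").agg(
    avg_price=("Price", "mean"),
    max_price=("Price", "max"),
    total_volume=("Volume", "sum"),
)
print(summary)
```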

In [133]:
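# Caution: the source dates are day-first (dd-mm-yyyy). With format="mixed" and the default dayfirst=False,
# ambiguous dates such as 11-04-2025 are parsed month-first (as 4 November); pass dayfirst=True to avoid this.
df["Date"] = pd.to_datetime(df["Date"], format = "mixed")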
df.insert(1, "Year", df["Date"].dt.year)
df.insert(2, "Month", df["Date"].dt.month_name())
df.insert(3, "Month#", df["Date"].dt.month)
df.head()

Unnamed: 0,Date,Year,Month,Month#,Price,Open,High,Low,Vol.,Change %,Volume,temp
0,2025-04-16,2025,April,4,76761.72,76996.78,76996.78,76544.07,4.99M,0.03,4990000.0,M
1,2025-04-15,2025,April,4,76734.89,76852.06,76857.05,76449.56,12.93M,2.1,12930000.0,M
2,2025-11-04,2025,November,11,75157.26,74835.49,75467.33,74762.84,14.23M,1.77,14230000.0,M
3,2025-09-04,2025,September,9,73847.15,74103.83,74103.83,73673.06,9.15M,-0.51,9150000.0,M
4,2025-08-04,2025,August,8,74227.08,74013.73,74859.39,73424.92,17.06M,1.49,17060000.0,M


In [134]:
df.Year.unique()

array([2025, 2024, 2023], dtype=int32)

###### Ex. Yearly average Price

In [136]:
df.groupby("Year")["Price"].mean().round(2)

Year
2023    64567.91
2024    77225.52
2025    76175.51
Name: Price, dtype: float64

In [138]:
df.groupby(["Year", "Month#", "Month"])["Price"].mean().round(2)

Year  Month#  Month    
2023  1       January      65069.51
      2       February     63441.38
      3       March        59872.70
      4       April        61990.71
      5       May          62860.83
      6       June         63941.57
      7       July         66308.81
      8       August       65205.26
      9       September    65859.63
      10      October      64618.01
      11      November     65714.73
      12      December     68690.92
2024  1       January      73811.38
      2       February     74216.63
      3       March        74811.65
      4       April        75312.02
      5       May          75225.31
      6       June         77504.34
      7       July         79309.00
      8       August       79210.81
      9       September    81191.39
      10      October      79498.66
      11      November     78482.14
      12      December     78356.19
2025  1       January      76630.13
      2       February     75727.45
      3       March        76407.02
    

#### Ranking and Sorting Dataframes

###### Ex. Rank the products in descending order of `Sales`

###### Ex. Sort the data in ascending order of `Rank`
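
One possible approach to the two exercises above, sketched on a hypothetical product table (`df_sales`) since no sales-by-product data is loaded in this notebook. `method="min"` is an assumption about how ties should be ranked.

```python
import pandas as pd

# Hypothetical product sales; products A and C are tied
df_sales = pd.DataFrame({"Product": ["A", "B", "C", "D"],
                         "Sales": [300, 500, 300, 100]})

# Rank 1 goes to the highest Sales; tied rows share the lowest rank of their group
df_sales["Rank"] = df_sales["Sales"].rank(ascending=False, method="min").astype(int)

# Sort in ascending order of Rank
df_sorted = df_sales.sort_values("Rank", ascending=True)
print(df_sorted)
```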

#### Setting and Resetting Index

`df.set_index(keys, drop=True, inplace=False)` - Set the DataFrame index (row labels) using one or more existing columns or arrays (of the correct length). The index can replace the existing index or expand on it.

`df.reset_index(level=None, drop=False, inplace=False)` - Reset the index of the DataFrame and use the default one instead. If the DataFrame has a MultiIndex, this method can remove one or more levels.
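
A short sketch of a round trip through `set_index` and `reset_index`, using a hypothetical table `df_orders`.

```python
import pandas as pd

# Hypothetical orders table
df_orders = pd.DataFrame({"Order ID": [101, 102, 103], "Sales": [500, 700, 900]})

indexed = df_orders.set_index("Order ID")   # "Order ID" becomes the row labels
sales_102 = indexed.loc[102, "Sales"]       # label-based lookup by Order ID

restored = indexed.reset_index()            # "Order ID" is a regular column again
print(sales_102)
print(restored)
```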

#### Working with dates

###### Create columns Year and Month - extract data using pd.DatetimeIndex

###### Extract data for 2023

###### Ex. Visualise Trend and Seasonality of the data

###### Extract data for Jan - 2023

###### Extract data for Jan - 2023 and 2024

###### Extract data starting from April - 2024

###### Extract data from Jan-2023 to Apr-2024
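
The date extractions listed above can be sketched as follows. The table `df_ts` and its values are hypothetical stand-ins for the Sensex data, and the dates are parsed with an explicit day-first format.

```python
import pandas as pd

# Hypothetical prices with day-first date strings
df_ts = pd.DataFrame({
    "Date": pd.to_datetime(["15-01-2023", "20-06-2023", "10-01-2024", "05-04-2024", "12-09-2024"],
                           format="%d-%m-%Y"),
    "Price": [60000.0, 63000.0, 72000.0, 74000.0, 81000.0],
})

# Create Year and Month columns using pd.DatetimeIndex
dates = pd.DatetimeIndex(df_ts["Date"])
df_ts["Year"] = dates.year
df_ts["Month"] = dates.month

only_2023 = df_ts[df_ts["Year"] == 2023]
jan_2023 = df_ts[(df_ts["Year"] == 2023) & (df_ts["Month"] == 1)]
jan_both = df_ts[(df_ts["Month"] == 1) & (df_ts["Year"].isin([2023, 2024]))]
from_apr_2024 = df_ts[df_ts["Date"] >= "2024-04-01"]
jan23_to_apr24 = df_ts[df_ts["Date"].between("2023-01-01", "2024-04-30")]
print(len(only_2023), len(jan_2023), len(jan_both), len(from_apr_2024), len(jan23_to_apr24))
```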

### Descriptive Statistics

Descriptive statistics summarizes and describes the main features of a dataset or sample, including measures of central tendency, dispersion, shape, and relationships between variables.

#### Measures of Central Tendency:
    - Mean: The average value of the data points.
    - Median: The middle value of the data when arranged in ascending order.
    - Mode: The most frequently occurring value in the dataset.

#### Measures of Dispersion:
    - Range: The difference between the maximum and minimum values in the dataset.
    - Variance: The average of the squared differences from the mean.
    - Standard Deviation: The square root of the variance, representing the average deviation from the mean.

#### Measures of Shape:
    - Skewness: A measure of the asymmetry of the distribution.
        - Positive skewness indicates a longer right tail and a concentration of data on the left side.
        - Negative skewness indicates a longer left tail and a concentration of data on the right side.
        - Skewness close to zero indicates approximate symmetry around the mean.

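    - Kurtosis: A measure of the "tailedness" of the distribution, often described as its "peakedness" or "flatness". The values below refer to excess kurtosis (kurtosis minus 3), which is what `scipy.stats.kurtosis` returns by default.
        - Positive excess kurtosis indicates heavy tails and a sharp peak (leptokurtic).
        - Negative excess kurtosis indicates light tails and a flat peak (platykurtic).
        - An excess kurtosis of 0 indicates tails similar to those of the normal distribution (mesokurtic).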

#### Frequency Distribution:
    - Frequency table: A table that shows the frequency or count of each value in the dataset.
    - Histogram: A graphical representation of the frequency distribution, showing the distribution of values in bins or intervals.

#### Measures of Association:
    - Correlation: A measure of the strength and direction of the linear relationship between two variables.
    - Covariance: A measure of the joint variability between two variables.
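
The measures above can be computed with NumPy, pandas and SciPy. This is a minimal sketch on small made-up arrays (`data`, `x`, `y`), not on the datasets defined in the cells that follow. Note that `np.var` and `np.std` default to the population formulas (`ddof=0`), while `np.cov` defaults to the sample formula.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # made-up values

# Central tendency
mean = data.mean()
median = np.median(data)
mode = pd.Series(data).mode().iloc[0]

# Dispersion (population variance and standard deviation)
data_range = data.max() - data.min()
variance = data.var()
std = data.std()

# Shape
skewness = stats.skew(data)
excess_kurtosis = stats.kurtosis(data)  # Fisher definition: normal distribution gives 0

# Frequency table
freq = pd.Series(data).value_counts()

# Association between two made-up variables with an exact linear relationship
x = np.array([1, 2, 3, 4, 5])
y = 2 * x + 1
correlation = np.corrcoef(x, y)[0, 1]
covariance = np.cov(x, y)[0, 1]  # sample covariance (ddof=1)

print(mean, median, mode, data_range, variance, std)
```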

In [None]:
# dataset consists of weights of children in the age group of 0 to 10 years
weights = np.array([20.8,15.3,23.2,15.5,17.5,27.3,23.3,20.5,16.4,17.4,22.6,20.8,16.7,29.1,14.2,18.5,17.6,17.1,18.2,26.0,25.6,19.1,17.8,21.6,19.6,28.4,19.4,22.8,25.9,27.1,20.8,21.9,15.4,30.2,23.0,25.0,24.0,17.1,18.8,25.8,23.2,14.9,20.4,15.0,12.3,17.6,10.3,12.1,25.8,21.9,9.9,24.8,17.9,22.0,27.6,21.7,26.3,19.9,16.6,17.8,8.0,21.5,17.9,33.6,20.4,21.9,19.8,15.3,25.6,24.7,10.6,22.1,18.8,9.8,14.9,18.7,27.6,21.2,25.1,21.9,22.1,19.6,10.4,17.2,18.3,21.2,21.1,21.0,19.1,14.8,19.2,18.3,22.1,14.1,19.3,16.2,25.6,14.8,22.3,25.0,20.8,21.2,16.6,15.0,15.6,24.7,26.1,18.1,14.1,25.5,21.4,32.7,13.8,27.4,15.8,18.4,21.3,27.5,15.8,27.2,26.9,16.2,22.6,15.3,22.5,26.6,24.7,28.4,28.0,19.9,9.9,16.5,19.5,11.7,27.9,28.1,21.1,20.0,14.5,25.9,13.3,15.8,21.3,10.7,14.6,19.9,21.6,26.6,26.0,23.7,20.5,11.8,15.6,14.9,24.6,20.9,24.5,26.8,25.3,30.9,26.1,14.9,17.9,18.5,25.7,16.4,8.2,24.1,19.5,13.1,17.6,13.4,24.4,16.4,19.9,12.9,14.3,25.4,15.4,17.7,17.3,18.7,16.1,7.4,18.4,16.2,18.7,19.8,25.6,23.7,22.5,20.5,14.9,25.0,4.7,6.8,21.8,22.2,24.4,13.4,32.1,26.6,27.2,17.8,19.6,16.8,14.4,24.9,12.6,15.0,15.4,10.1,12.9,15.9,22.3,15.0,24.4,21.0,17.2,25.2,15.6,24.6,24.9,13.6,10.3,25.6,18.3,25.1,18.1,20.2,29.1,25.7,14.9,11.4,19.2,21.5,13.9,19.5,19.1,23.0,26.8,14.8,24.4,14.7,24.3,24.5,19.9,12.1,21.2,16.4,19.5,25.8,8.5,19.8,23.4,21.3,14.7,17.9,15.7,14.9,5.2,3.8,31.5,17.4,16.0,20.7,18.4,8.1,22.9,32.8,19.0,33.0,26.0,14.2,18.4,15.1,29.5,13.9,5.1,28.1,18.2,10.5,27.8,19.4,13.1,21.9,18.6,11.0,19.7,20.2,20.1,17.6,21.6,21.2,30.0,26.1,20.5,22.8,20.5,19.6,18.0,19.8,21.8,7.0,21.4,22.5,18.9,15.1,22.0,25.5,11.1,15.4,21.5,13.5,11.2,19.3,25.5,20.2,18.3,15.3,40.3,22.1,23.2,17.1,19.4,15.3,28.6,23.0,19.4,16.7,20.9,21.1,15.9,29.6,17.0,19.0,20.8,20.4,9.8,24.0,15.2,26.2,22.0,21.4,16.2,29.2,21.4,28.9,16.9,21.9,11.2,19.0,21.2,20.8,9.3,12.5,26.9,10.7,18.9,17.5,23.3,12.5,13.4,26.3,23.0,21.8,25.6,20.7,18.0,32.5,25.6,8.4,19.8,24.5,18.0,24.7,21.3,12.9,21.6,29.1,25.8,26.4,27.0,25.6,19.0,20.0,10.4,17.5,19.5,17.5,21.1,19.6,23.1,18.2,27.0,18.5,19.3,25.4,20.8,20
.8,20.6,20.4,23.1,17.6,18.7,16.2,18.9,15.2,22.5,10.0,21.1,29.3,17.8,27.1,16.8,18.0,28.3,16.5,19.8,16.7,23.2,23.6,18.5,29.8,24.2,22.4,29.3,29.3,21.5,15.5,23.0,12.8,20.8,11.5,20.0,15.2,18.7,17.2,22.5,13.5,13.1,17.6,12.1,23.2,18.0,24.2,7.3,17.0,17.0,22.7,22.1,18.0,15.6,13.9,17.7,14.7,26.1,12.2,20.3,17.8,16.5,10.2,18.2,22.2,26.3,26.6,19.2,19.5,14.3,15.6,13.9,20.2,11.6,31.2,6.3,23.4,21.1,22.0,8.5,11.5,19.3,17.7,11.9,14.8,16.9,16.1,13.0,17.9,22.0,14.2,13.9,25.4,21.1,16.3,16.4,19.3,18.3,23.0,27.4,24.2,14.9,12.7,16.7,17.8,19.5,14.9,23.9,15.2,25.4,22.9,25.2,12.7,26.0,26.7,15.8,24.9,24.4,15.5,20.0,7.8,20.6,19.0,29.8,14.1,14.1,17.8,24.9,20.7,19.8,24.2,16.7,21.3,23.7,20.9,23.6,25.3,9.9,21.7,16.7,10.3,18.9,25.2,12.7,27.4,21.4,23.0,11.8,22.1,13.0,20.0,27.2,19.8,16.9,18.8,25.0,9.0,19.0,11.1,19.6,24.0,29.3,20.7,10.7,26.9,18.6,21.5,26.2,21.5,27.9,22.1,25.1,27.9,18.9,26.4,20.0,25.6,27.5,17.7,18.9,27.0,14.9,27.6,19.6,18.1,19.2,20.2,16.4,16.6,14.1,8.9,17.6,17.3,21.0,14.3,18.7,19.9,12.3,24.4,23.3,25.1,27.9,15.1,18.3,23.2,17.8,15.5,22.0,23.0,20.4,15.5,22.8,19.5,22.2,22.3,25.1,15.0,19.9,23.8,18.8,17.0,7.9,24.0,31.4,17.6,27.4,28.1,17.9,18.3,17.3,21.6,17.8,22.4,19.2,22.8,21.4,19.1,22.4,29.4,13.6,15.0,28.8,18.2,25.8,15.1,23.6,12.2,10.1,15.4,27.0,17.2,11.6,20.8,18.8,20.4,18.1,20.9,31.1,19.0,18.5,17.9,23.1,32.0,21.7,23.4,17.1,19.0,18.1,19.3,18.8,25.8,19.8,22.1,15.8,15.9,21.3,18.4,17.8,23.1,22.2,15.3,20.0,20.8,30.2,24.0,12.6,9.2,21.7,19.8,16.6,16.5,18.0,21.1,10.0,23.5,26.9,23.7,16.8,12.7,29.8,17.9,18.6,19.9,23.6,26.4,18.2,18.1,19.2,15.3,19.4,20.2,33.2,26.2,26.9,15.3,18.9,18.3,27.6,29.0,22.5,30.2,22.8,13.8,21.4,27.3,25.1,26.7,7.9,27.3,21.9,15.6,18.1,19.9,23.0,22.9,15.0,16.5,18.9,24.8,4.9,16.7,20.2,7.9,19.3,16.1,22.5,27.4,29.6,18.7,21.9,9.9,24.6,10.1,21.5,20.6,11.4,14.0,18.4,6.5,12.8,25.5,19.5,14.6,20.5,18.9,14.8,21.7,17.3,33.1,23.1,25.1,30.2,17.0,13.2,29.2,14.8,13.5,22.3,9.0,19.2,19.1,20.4,14.2,20.6,19.3,27.6,21.4,15.5,23.4,13.2,12.1,23.2,33.8,17.5,19.1,23.2,12.6,24.6,14.6,19.5,20.1,13.4,12.4,14
.6,16.8,27.5,19.5,11.5,15.4,24.0,21.1,21.8,29.1,18.9,21.3,18.9,20.1,13.7,19.7,17.0,34.0,24.0,16.9,19.3,19.9,19.4,19.3,21.5,20.8,20.1,27.1,23.5,22.9,19.4,14.8,28.6,23.2,8.7,30.1,19.1,15.1,22.3,17.4,23.8,18.1,20.4,23.1,22.4,13.7,25.5,25.9,21.3,21.6,21.4,29.7,18.1,10.3,23.6,21.0,13.1,10.6,20.8,21.7,19.4,21.4,25.1,16.6,11.1,13.8,21.1,22.0,20.8,19.1,24.4,15.2,25.1,19.2,10.5,21.8,17.5,19.6,8.2,22.1,21.8,13.6,15.1,28.4,22.0,28.6,25.3,29.3,19.1,20.8,10.8,23.6,13.5,23.2,16.9,22.9,21.3,21.8,31.3,21.1,17.1,25.5,26.1,19.2,19.9,26.4,34.9,17.4,24.2,17.5,21.7,11.3,4.9,21.3,17.2,15.6,22.4,28.2,22.3,21.3,20.5,18.4,17.5,9.7,21.4,12.8,20.0,20.7,16.1,26.2,20.9,19.2,23.6,22.4,31.7,22.5,20.3,13.8,23.3,17.4,3.0,16.6,17.1,27.8,17.4,13.7,14.1,26.4,23.9,21.2,27.4,9.9,18.7,17.0,21.1,22.9,8.7,19.1,25.1,21.0,22.3,21.5,19.6,17.6,22.7,14.7,11.5,28.6,19.9,13.5,27.3,19.6,18.7,14.0,11.6,15.0,26.5,21.7,22.1,5.7,18.3,23.4,19.5,15.8,18.6,19.3,27.0,21.7,21.9,20.7,13.5,29.7,22.3,24.0,31.1,23.9,15.7,19.6])


In [None]:
# dataset consists of Salaries of employees in an organisation
salaries = np.array([29756,20014,20347,57214,41327,40209,93390,122004,17725,47210,44386,48407,16837,83731,9130,66723,72525,57347,10941,18726,8913,59251,13090,37983,134656,45499,59533,82998,31440,11672,16295,30676,21822,35263,27340,65522,23380,11662,7066,22403,41230,46693,22478,82491,7347,16263,72672,20522,38409,30175,31383,98820,13605,45096,12397,90988,6602,29786,102559,31790,29768,50085,22649,24426,4059,95210,68657,17799,37370,46160,35133,40969,57201,54757,17973,13610,46004,91341,24474,48005,9473,10277,71287,9383,36492,104352,13473,51293,51911,10026,39992,125885,44462,76531,41512,47267,33231,14180,44474,55702,39554,8359,51892,98574,43638,90568,40508,34129,98497,74784,63383,47197,83519,26458,38642,9629,18404,47324,15793,120345,61126,64613,57964,47582,77944,27082,51891,98126,69008,23284,49785,72406,56418,36769,58715,42999,47333,45733,141091,3848,57584,48356,95301,95269,49894,101380,44028,54577,71055,32066,26596,66653,3179,44484,62889,62952,50903,74656,50733,38180,59410,105003,73854,33579,150293,26348,6769,26315,53038,35766,50517,64714,27523,26867,46607,9882,60052,46653,42143,37371,14475,103629,55402,6149,65128,32861,27603,75553,35641,21457,106916,50369,37731,6473,73858,7716,21144,34340,27917,18150,49270,16344,84532,28616,18452,84678,17990,26463,13671,70005,26237,7245,16941,64383,3317,7275,26981,12600,36983,40054,7283,82140,65120,8259,44235,30682,68578,80737,14009,88942,48374,43148,11447,32203,67168,50149,8607,9680,35442,47306,67316,52503,89884,18337,11798,40659,90852,25479,4737,107231,40006,34020,61695,12128,14126,71024,42150,54591,93625,23809,9698,50910,75967,36494,53497,28006,16650,50352,42133,10915,50698,19962,30772,23430,75790,72083,162101,75728,60565,40074,58299,18280,128972,76801,38314,12744,25607,22188,31862,15955,31175,11044,44390,49677,33251,85617,81684,48054,63108,33461,39505,51449,47547,49199,152777,49820,23147,35010,44921,39633,16546,35436,32229,28603,31804,21668,102866,58514,140647,22149,26732,88552,77813,75665,38038,123394,9457,28241,52657,9075,148287,7
0362,27398,18672,19003,17600,114609,4318,19729,23148,32015,87090,5342,56550,38458,5400,50686,46353,14777,19302,16606,21645,37117,22488,5465,28650,57321,34736,43956,37151,9776,37461,17631,98557,18773,15927,62892,35395,23658,27429,22496,60550,36644,38050,79320,7934,30101,71573,14389,4701,31291,11384,39725,123530,44408,58972,95799,10389,46232,3432,40560,35984,4665,169950,111402,18065,21540,70358,51973,26344,101435,5668,28783,6701,64979,30591,53626,89555,54550,47720,72312,32532,81224,32367,12856,45452,23288,68436,11028,48698,59988,25334,12898,76129,76496,66076,28330,66192,34221,24405,81851,52335,38502,25430,29421,7258,23734,12534,60625,23697,17543,35830,5033,17253,27189,48127,91649,58796,46586,42569,40202,70022,3922,41658,66536,67928,13621,71191,63947,89954,7543,20366,73226,55216,63823,20147,28646,62441,10910,21883,40687,5770,12349,59303,82027,45440,12710,126532,87569,69111,27004,13098,37670,125784,37616,46404,36971,20823,44255,53184,53752,9362,16464,13631,24283,57198,27205,60289,35590,21193,59034,71649,40198,22347,37446,30613,39731,23986,65414,6705,23140,42971,9792,23886,16397,17598,42024,32014,78351,31432,3978,34883,19845,10204,56595,25611,58573,31771,60213,24678,85938,22206,27750,43462,24977,22131,65617,70257,71995,75183,106608,54436,44381,61439,41163,81099,34095,36953,14703,23992,105384,20334,34145,48786,72804,71943,32757,77178,6381,77041,85234,31634,62231,7004,66194,23721,18122,82066,43339,13417,28110,26647,11703,160005,55765,78251,35519,22708,66840,6126,37952,31632,55294,13842,57847,43009,57445,41641,13437,41892,8126,55609,71439,65768,3032,12225,16758,12150,110890,58822,80581,12690,69074,49169,118185,9745,24482,35611,21100,13245,25269,26177,60738,119320,13615,120677,36560,14048,16249,73591,11789,42419,8691,44373,5698,38758,39244,36214,7654,26381,42371,42425,5167,38173,28250,11362,41671,38101,22759,29654,16846,42528,32035,51949,34841,65641,94153,55081,42157,53629,5482,6064,33333,53055,38653,54655,25486,28830,18681,38431,89032,38939,44533,44382,7073,93080,39698,6865
3,14900,4180,26923,27360,30629,33018,23166,4915,50098,31775,14625,48831,53413,50677,16354,24128,49869,23038,53312,43846,11263,19507,11322,86895,60729,144564,33429,36964,4437,48013,39779,71605,45697,20501,3059,39338,3228,22719,37974,72431,8486,24363,19558,64046,35799,20259,79873,13544,36404,55886,13904,42955,43750,17743,107390,86058,40137,65042,29084,8999,6357,29914,45867,75705,19543,64725,60567,58452,5015,50256,60877,91907,42209,13678,7797,23545,65227,86909,18614,12483,34314,52497,28754,112096,30756,16519,18075,9958,14076,16114,5200,40241,14275,53117,50561,27253,3998,85851,32716,44901,40698,42272,67106,73621,23828,50619,64147,89432,67240,119266,15347,50315,39374,27347,21786,7037,33320,9277,14225,25474,50546,61235,64796,38341,46464,38388,53785,8315,29782,35079,5943,9616,73662,52409,28236,40773,84419,49739,8678,46548,16583,15864,5920,42891,6635,91882,54534,32013,105413,11681,18153,98213,60754,53642,40221,43931,60076,9481,17046,26098,22609,21386,2797,11266,59378,57464,46271,10182,53724,89160,33549,19557,8022,43213,62795,42025,74820,49326,55701,65268,49257,38526,47121,32407,100592,21980,10691,10664,13298,58489,81011,24481,30354,5334,11554,62781,80241,17457,13682,12911,32340,54094,4987,15562,19126,58105,62497,34333,74015,78119,27715,20098,37580,14200,24208,36266,68885,66174,3965,143792,35892,43824,14009,7294,69932,11540,31644,55554,6756,69754,65940,26128,88712,11048,14382,34369,3908,30339,9290,22745,49669,93604,62655,50036,60244,52406,44821,37915,4894,38413,44612,19168,26668,20326,45231,12448,35082,121782,4863,7291,24332,42551,28462,67887,21226,41026,137990,53668,40922,15485,21118,118903,77715,24519,58873,61054,25674,2960,30624,103189,48284,40536,56053,37084,50773,11615,83270,4311,30367,6372,56358,14518,10602,35857,93798,51500,69148,51610,27676,16157,92788,4395,23687,11944,57418,71058,37037,23290,34201,84364,68400,24135,18615,15050,113480,83720,52761,26031,43187,11278,3710,27465,97386,3393,65371,5707,106125,46278,12099,17823,39132,34422])


#### Handling Outliers - 

#### `Z-Score Method:`

- The z-score method calculates the z-score of each data point, which is the number of standard deviations the point lies away from the mean.
- Data points with z-scores beyond a certain threshold (e.g., |z-score| > 3) are considered outliers and can be removed or treated separately.
- The z-score method is sensitive to the mean and standard deviation of the data, and it assumes that the data is normally distributed.
- This method is useful when the data is approximately normally distributed and when the goal is to identify outliers based on their deviation from the mean.
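
A minimal sketch of the z-score method with `scipy.stats.zscore`, assuming a made-up sample with one planted extreme value. The threshold of 3 follows the description above.

```python
import numpy as np
import scipy.stats as stats

# Made-up sample: 100 values between 18 and 22, plus one planted extreme value
sample = np.append(np.tile([18.0, 19.0, 20.0, 21.0, 22.0], 20), 60.0)

z_scores = stats.zscore(sample)      # (x - mean) / std, using the population std (ddof=0)
is_outlier = np.abs(z_scores) > 3

outliers = sample[is_outlier]
cleaned = sample[~is_outlier]
print(outliers, len(cleaned))
```

Because the extreme value itself inflates the mean and the standard deviation, very small samples can hide outliers from this test.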

#### `IQR Method:`

- The IQR method calculates the interquartile range (IQR), which is the difference between the third quartile (Q3) and the first quartile (Q1) of the data.
- Outliers are defined as data points that fall below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR.
- The IQR method is robust to outliers and does not assume any specific distribution of the data.
- This method is useful when the data is skewed or not normally distributed, as it focuses on the middle 50% of the data and is less influenced by extreme values.

In general, if the data is approximately normally distributed and the goal is to identify outliers based on their deviation from the mean, the z-score method may be more appropriate. On the other hand, if the data is skewed or not normally distributed, or if the goal is to identify outliers based on their relative position within the dataset, the IQR method may be a better choice.
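
A minimal sketch of the IQR method using `np.percentile`, assuming a small made-up array with one extreme value. The quartiles use NumPy's default linear interpolation.

```python
import numpy as np

# Made-up values with one extreme value
values = np.array([12, 15, 17, 18, 19, 20, 21, 22, 24, 26, 95])

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower = q1 - 1.5 * iqr   # lower fence
upper = q3 + 1.5 * iqr   # upper fence

outliers = values[(values < lower) | (values > upper)]
cleaned = values[(values >= lower) & (values <= upper)]
print(q1, q3, iqr, lower, upper, outliers)
```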