# Title Here

## **`Table of Contents`**

1. [**Introduction**](#1)
   1. [**1.1 Project Description**]
   2. [**1.2 Data Description**]
2. [**Acquiring and Loading Data**](#2)
   1. [**2.1 Importing Libraries:**]
   2. [**2.2 Loading Data**]
   3. [**2.3 Basic Data Exploration**]
   4. [**2.4 Areas to Fix**]
3. [**Data Preprocessing**](#3)
   1. [**3.1 Pre-processing Details:**]
   2. [**3.2 Rename Columns**]
   3. [**3.3 Drop Redundant Columns**]
   4. [**3.4 Changing Data Types**]
   5. [**3.5 Dropping Duplicates**]
   6. [**3.6 Handling Missing Values**]
   7. [**3.7 Handling Unreasonable Data Ranges**]
   8. [**3.8 Feature Engineering / Transformation**]
4. [**Data Analysis**](#4)
   1. [**4.1 Exploring `Column Name`**]
5. [**Conclusion**](#5)
   1. [**5.1 Insights**]
   2. [**5.2 Suggestions**]
   3. [**5.3 Possible Next Steps**]
6. [**Epilogue**](#6) 
   1. [**6.1 References**]
   2. [**6.2 Information about the Author:**]
   3. [**6.3 Versioning**]

---

# **`1. Introduction`**
 


Insert Image Here (by dragging it)

##  **1.1 Project Description**

**Goal/Purpose:** 

What is this project about? What is the goal/purpose of this project? Why is it important for someone to read this notebook?

<p>&nbsp;</p>

**Questions to be Answered:**

- Question 1
- Question 2
- Question 3...

<p>&nbsp;</p>

**Assumptions/Methodology/Scope:** 

Briefly describe assumptions, processing steps, and the scope of this project.

<p>&nbsp;</p>

## **1.2 Data Description**

**Content:** 

This dataset is a _(filetype) file of _(how many) data points which contains ___. 

<p>&nbsp;</p>

**Description of Attributes:** 

Here you can describe what each column represents.

| Column  | Description |
| :------ | :---------- |
| column1 | description1 |

<p>&nbsp;</p>

**Acknowledgements:** 

This dataset is provided by ___. The original dataset was scraped by ___ and the original source can be found on [website](https://website.link).

---

# **`2. Acquiring and Loading Data`**
 

## **2.1 Importing Libraries:**

In [None]:
# Install libraries
# !pip install skimpy

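In [4]:
# Data manipulation
import datetime
import numpy as np
import pandas as pd
import pandas.api.types as ptypes  # dtype checks used in section 3.4
# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Optional helpers used in later cells (needs the skimpy install above)
# from skimpy import skim, clean_columns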

## **2.2 Loading Data**

In [3]:
# # Load DataFrame
# file = 'file.csv'
# df = pd.read_csv(file)

## **2.3 Basic Data Exploration**

In [4]:
# # Show rows and columns count
# print(f"Rows count: {df.shape[0]}\nColumns count: {df.shape[1]}")

In [5]:
# df.head()

In [6]:
# df.tail()

### *2.3.1 Check Data Types*

In [7]:
# # Show data types
# df.info()

- `column1`, `column2`, `column3` are **strings**.
- `column4` and `column5` are **floats**.
- `column6` is an **integer**.

`column3` should be a **datetime** type instead.

### *2.3.2 Check Missing Data*

In [8]:
# # Print percentage of missing values
# missing_percent = df.isna().mean().sort_values(ascending=False)
# print('---- Percentage of Missing Values (%) -----')
# if missing_percent.sum():
#     print(missing_percent[missing_percent > 0] * 100)
# else:
#     print('None')

### *2.3.3 Check for Duplicate Rows*

In [9]:
# # Show number of duplicated rows
# print(f"No. of entirely duplicated rows: {df.duplicated().sum()}")

# # Show duplicated rows
# df[df.duplicated()]

### *2.3.4 Check Uniqueness of Data*

In [10]:
# # Print the percentage similarity of values (the lower %, the better)
# num_unique = df.nunique().sort_values()
# print('---- Percentage Similarity of Values (%) -----')
# print(100/num_unique)
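
To make the metric above concrete, here is a small sketch on made-up data. `100 / nunique` is the percentage share each distinct value would hold if the values were spread evenly, so a constant column scores 100% and an ID-like column scores low. The column names and values are hypothetical and only illustrate the calculation.

```python
import pandas as pd

# Hypothetical toy data; replace with your own DataFrame
df_demo = pd.DataFrame({
    "id": [1, 2, 3, 4],              # every value is distinct
    "group": ["a", "a", "b", "b"],   # two distinct values
    "country": ["PK"] * 4,           # constant column
})

# Number of distinct values per column, smallest first
num_unique = df_demo.nunique().sort_values()

# Percentage similarity: 100 divided by the number of distinct values
similarity = 100 / num_unique
print(similarity)
```

A column with a score near 100% carries little information and may be a candidate for section 3.3. Note that a column containing only missing values has zero distinct values, so the division yields `inf` for it.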

### *2.3.5 Check Data Range*

In [11]:
# # Print summary statistics
# df.describe(include='all')
# skim(df)  # richer summary; requires `from skimpy import skim`

### *2.3.6 Checking Value Counts of Categorical Columns*

In [5]:
# df['column_name'].value_counts()

## **2.4 Areas to Fix**
**Data Types**
- [ ] Issue 1

**Missing Data**
- [ ] 

**Duplicate Rows**
- [ ]

**Uniqueness of Data**
- [ ]

**Data Range**
- [ ]

---

# **`3. Data Preprocessing`**

## **3.1 Pre-processing Details:**

Here you can add sections like:

- Renaming Columns
- Dropping Redundant Columns
- Changing Data Types
- Dropping Duplicates
- Handling Missing Values
- Handling Unreasonable Data Ranges
- Feature Engineering / Transformation

Use `assert` where possible to show that preprocessing is done.
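
As a minimal sketch of this idea, the cell below runs two cleaning steps on a small made-up DataFrame and then asserts the result. The column names (`id`, `signup_date`) and the values are hypothetical and only illustrate the pattern.

```python
import pandas as pd

# Hypothetical toy data; replace with your own DataFrame
df_demo = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "signup_date": ["2023-01-05", "2023-02-10", "2023-02-10", "2023-03-15"],
})

# Preprocessing steps: remove duplicated rows, then fix the data type
df_demo = df_demo.drop_duplicates(ignore_index=True)
df_demo["signup_date"] = pd.to_datetime(df_demo["signup_date"])

# Assertions document, and enforce, what each step achieved
assert df_demo.duplicated().sum() == 0
assert pd.api.types.is_datetime64_any_dtype(df_demo["signup_date"])
```

If a later change to the notebook breaks one of these steps, the assertion fails immediately instead of letting the error surface in the analysis.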

## **3.2 Rename Columns**

In [12]:
# # Rename columns to snake_case
# df = clean_columns(df, replace={})

In [13]:
# # Rename columns
# columns_to_rename = {}
# df.rename(columns=columns_to_rename, inplace=True)

In [14]:
# # Verify columns are renamed
# df.columns

## **3.3 Drop Redundant Columns**

In [15]:
# # Check the proportion of the most frequent value in each column
# # (dropna=False counts NaN as a value, so an all-NaN column does not raise IndexError)
# print('---- Frequency of the Mode (%) -----')
# mode_dict = {col: (df[col].value_counts(dropna=False).iat[0] / df[col].size * 100) for col in df.columns}
# mode_series = pd.Series(mode_dict)
# mode_series

In [16]:
# # Show the value counts of each column whose mode frequency exceeds the threshold
# threshold = 80
# for col in mode_series[mode_series > threshold].index:
#     print(df[col].value_counts(dropna=False))
#     print()

In [17]:
# # Drop columns (specify columns to drop)
# cols_to_drop = []
# df.drop(columns=cols_to_drop, inplace=True)

In [18]:
# # Verify columns dropped
# assert all(col not in df.columns for col in cols_to_drop)

In [19]:
# # Drop columns (specify column indices to drop)
# cols_to_drop = list(df.columns[a:b])  # save the names first; positions shift after the drop
# df.drop(columns=cols_to_drop, inplace=True)

In [20]:
# # Verify columns dropped
# assert all(col not in df.columns for col in cols_to_drop)

In [21]:
# # Drop columns (specify columns to keep)
# cols_to_keep = []
# df = df[cols_to_keep]

In [22]:
# # Verify only the specified columns remain
# assert list(df.columns) == cols_to_keep

## **3.4 Changing Data Types**

In [23]:
# # Convert columns to the right data types
# df[col] = df[col].astype('string')
# df[col] = df[col].astype('int')
# df[col] = pd.to_datetime(df[col])  # infer_datetime_format is deprecated since pandas 2.0; pass format='...' if known

# # Convert to categorical datatype
# col_cat = ptypes.CategoricalDtype(categories=['A', 'B', 'C'], ordered=True)
# df['col_cat'] = df['col_cat'].astype(col_cat)

In [24]:
# # Verify conversion
# assert ptypes.is_string_dtype(df[col])
# assert ptypes.is_numeric_dtype(df[col])
# cols_to_check = []
# assert all(ptypes.is_datetime64_any_dtype(df[col]) for col in cols_to_check)

## **3.5 Dropping Duplicates**

In [25]:
# # Drop entirely duplicated rows
# df.drop_duplicates(inplace=True, ignore_index=True)

In [26]:
# # Verify rows dropped
# assert df.duplicated().sum() == 0

## **3.6 Handling Missing Values**
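
This section has no cells in the template. One possible approach, sketched here on made-up data, is to drop rows that are missing a required field and fill the remaining numeric gaps with the column median. The column names (`name`, `age`) and the choice of median imputation are assumptions for illustration, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; replace with your own DataFrame
df_demo = pd.DataFrame({
    "name": ["Ann", None, "Cy", "Dee"],
    "age": [30.0, 25.0, np.nan, 40.0],
})

# Drop rows where a required column is missing
df_demo = df_demo.dropna(subset=["name"]).reset_index(drop=True)

# Fill the remaining numeric gaps with the column median
df_demo["age"] = df_demo["age"].fillna(df_demo["age"].median())

# Verify no missing values remain
assert df_demo.isna().sum().sum() == 0
```

Whether to drop or impute depends on how much data is missing and why, so it is worth recording the reasoning next to the cell.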

## **3.7 Handling Unreasonable Data Ranges**

In [27]:
# # Drop affected rows (drop=True avoids adding the old index as a new column)
# df = df.loc[~((df['A'] == 0) | (df['B'] > 100))].reset_index(drop=True)

In [28]:
# # Verify rows dropped
# assert not ((df['A'] == 0) | (df['B'] > 100)).any()
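
A small worked example of this filter on made-up data, where a value of 0 in `A` or a value above 100 in `B` is treated as unreasonable. The thresholds and column names mirror the placeholder cell above and are assumptions to adapt to the actual dataset.

```python
import pandas as pd

# Hypothetical toy data; replace with your own DataFrame
df_demo = pd.DataFrame({"A": [1, 0, 3, 4], "B": [50, 60, 150, 100]})

# Flag rows that violate either range rule
invalid = (df_demo["A"] == 0) | (df_demo["B"] > 100)

# Keep only the valid rows and renumber the index from zero
df_clean = df_demo.loc[~invalid].reset_index(drop=True)

# Verify that no out-of-range rows remain
assert not ((df_clean["A"] == 0) | (df_clean["B"] > 100)).any()
print(f"Dropped {invalid.sum()} of {len(df_demo)} rows")
```

Keeping the boolean mask in its own variable makes it easy to report how many rows were removed.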

## **3.8 Feature Engineering / Transformation**

In [29]:
# # Get unique values across the columns of interest
# cols = []
# pd.unique(df[cols].values.ravel('K'))  # order 'K' reads the values in the order they are stored in memory

In [30]:
# # Create custom function
# # Google style docstrings
# # https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html
# def custom_function(param1: int, param2: str) -> bool:
#     """Example function with PEP 484 type annotations.

#     Args:
#         param1: The first parameter.
#         param2: The second parameter.

#     Returns:
#         The return value. True for success, False otherwise.

#     """

In [31]:
# # Apply function to multiple columns
# cols = []
# df_updated = df.copy()
# df_updated[cols] = df_updated[cols].map(custom_function)  # pandas >= 2.1; use applymap on older versions

# # Create new aggregated boolean column
# df_updated['bool'] = df_updated[cols].any(axis=1, skipna=False)
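
As a sketch of how the cells above fit together, the example below defines a hypothetical `is_positive` function, applies it element-wise to two made-up columns, and aggregates the result into one boolean column. It uses `DataFrame.map` when available (pandas 2.1 and later) and falls back to `applymap` otherwise. The function, column names, and data are all assumptions for illustration.

```python
import pandas as pd


def is_positive(value: float) -> bool:
    """Check whether a number is greater than zero.

    Args:
        value: The number to test.

    Returns:
        True if value is greater than zero, False otherwise.
    """
    return value > 0


# Hypothetical toy data; replace with your own DataFrame
df_demo = pd.DataFrame({"a": [1, -2, 0], "b": [-1, -5, 3]})
cols = ["a", "b"]

# Apply the function to every element of the selected columns
if hasattr(pd.DataFrame, "map"):
    flags = df_demo[cols].map(is_positive)
else:
    flags = df_demo[cols].applymap(is_positive)

# Create a new aggregated boolean column: True if any selected column is positive
df_updated = df_demo.copy()
df_updated["any_positive"] = flags.any(axis=1)
```

Keeping the element-wise result in a separate `flags` frame leaves the original numeric columns untouched, which is one design choice among several.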

---

# **`4. Data Analysis`**
 

Here is where your analysis begins. You can add different sections based on your project goals.

## **4.1 Exploring `Column Name`**

In [32]:
# Code and visualization

**Observations**
- Ob 1
- Ob 2
- Ob 3

---

# **`5. Conclusion`**
 



## **5.1 Insights**
State the insights/outcomes of your project or notebook.

## **5.2 Suggestions**

Make suggestions based on insights.

## **5.3 Possible Next Steps**
Areas to expand on:
- (if there is any)

---

# **`6. Epilogue`**

## **6.1 References**

This is how we use inline citation[<sup id="fn1-back">[1]</sup>](#fn1).

[<span id="fn1">1.</span>](#fn1-back) _Author (date)._ Title. Available at: https://website.com (Accessed: Date). 

> Use [https://www.citethisforme.com/](https://www.citethisforme.com/) to create citations.

---

## **6.2 Information about the Author:**

[<img src="https://media.licdn.com/dms/image/D4D03AQH8PR9DDb3VxQ/profile-displayphoto-shrink_200_200/0/1713280211622?e=2147483647&v=beta&t=5TpzxNZJRmU3_zjNLoRb-O2V9amv1-1rwM5OczG01ZY" width="20%">](https://www.facebook.com/groups/codanics/permalink/1872283496462303/ "Image")


**Mr. Shaheer Ali**

BS Computer Science\
[YouTube channel](https://www.youtube.com/channel/UCUTphw52izMNv9W6AOIFGJA)\
[Twitter](https://twitter.com/__shaheerali190)\
[LinkedIn](https://www.linkedin.com/in/shaheer-ali-2761aa303/)\
[GitHub](https://github.com/shaheeralics)\
[Kaggle](https://www.kaggle.com/shaheerali197)\
[Portfolio Website](https://shaheer.kesug.com)

## **6.3 Versioning**
Notebook and insights by Mr. Shaheer Ali.
- Version: 1.5.0
- Date: 2023-05-15