<h1 style="color:OrangeRed; font-style:italic; font-family:Georgia;">🚀 Machine Learning Lifecycle</h1>

<h5 style="color:Orange; font-style:italic; font-family:'Courier New'; font-weight:bold;">📊 Problem Definition: Clearly identify the problem to be solved using machine learning.</h5>

<h5 style="color:Orange; font-style:italic; font-family:'Courier New'; font-weight:bold;">📊 Data Collection: Gather relevant data from various sources for analysis.</h5>

<h5 style="color:Orange; font-style:italic; font-family:'Courier New'; font-weight:bold;">📊 Data Cleaning: Handle missing data, remove duplicates, and correct errors.</h5>

<h5 style="color:Orange; font-style:italic; font-family:'Courier New'; font-weight:bold;">📊 Exploratory Data Analysis (EDA): Explore data patterns, trends, and relationships.</h5>

<h5 style="color:Orange; font-style:italic; font-family:'Courier New'; font-weight:bold;">📊 Feature Engineering: Create new features or transform existing ones to boost model performance.</h5>

<h5 style="color:Orange; font-style:italic; font-family:'Courier New'; font-weight:bold;">📊 Data Preprocessing: Scale, encode, and split data for training and testing.</h5>

<h5 style="color:Orange; font-style:italic; font-family:'Courier New'; font-weight:bold;">📊 Model Selection: Choose suitable algorithms based on the problem type (classification, regression, clustering).</h5>

<h5 style="color:Orange; font-style:italic; font-family:'Courier New'; font-weight:bold;">📊 Model Training: Train the selected model using preprocessed data.</h5>

<h5 style="color:Orange; font-style:italic; font-family:'Courier New'; font-weight:bold;">📊 Hyperparameter Tuning: Adjust model hyperparameters to improve accuracy and reduce errors.</h5>

<h5 style="color:Orange; font-style:italic; font-family:'Courier New'; font-weight:bold;">📊 Model Evaluation: Validate model performance using appropriate metrics (e.g., accuracy, RMSE, F1-score).</h5>

<h5 style="color:Orange; font-style:italic; font-family:'Courier New'; font-weight:bold;">📊 Model Saving: Save the trained model for future use.</h5>

<h5 style="color:Orange; font-style:italic; font-family:'Courier New'; font-weight:bold;">📊 Model Deployment: Integrate the model into applications or systems for real-time predictions.</h5>

<h5 style="color:Orange; font-style:italic; font-family:'Courier New'; font-weight:bold;">📊 Model Monitoring & Maintenance: Continuously monitor performance and retrain as needed.</h5>


<hr style="border-top: 3px solid blue;">

<h1 style="font-family:'Poppins', sans-serif; color:rgb(3, 105, 206);  text-align: center;">Importing the Required Libraries</h1>

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pymongo

import os
from dotenv import load_dotenv

<h2 style="color:Lime;font-style:italic; font-family:Georgia">About Features:</h2>

<hr style="border-top: 3px solid blue;">

<h1 style="font-family:'Poppins', sans-serif; color:rgb(3, 105, 206);  text-align: center;">Reading the Dataset from Database</h1>

In [2]:
# Load variables from a local .env file so that MONGO_URL is available via os.getenv
load_dotenv()

# MongoDB Atlas connection URL (the cluster name is Shipment)
connection_url = os.getenv("MONGO_URL")

# Connect to MongoDB Atlas
client = pymongo.MongoClient(connection_url)


# Select the database and collection
db = client["new_DB"]  # Replace with your database name
collection = db["new_Collection"]  # Replace with your collection name

# Query data
data = collection.find()

# Convert data to DataFrame
data = pd.DataFrame(list(data))

<hr style="border-top: 3px solid blue;">

In [3]:
pd.set_option('display.max_rows', None)  
pd.set_option('display.max_columns', None)

In [4]:
data.head()

Unnamed: 0,_id,ID,Gender,Age,Customer Type,Type of Travel,Class,Flight Distance,Departure Delay,Arrival Delay,Departure and Arrival Time Convenience,Ease of Online Booking,Check-in Service,Online Boarding,Gate Location,On-board Service,Seat Comfort,Leg Room Service,Cleanliness,Food and Drink,In-flight Service,In-flight Wifi Service,In-flight Entertainment,Baggage Handling,Satisfaction
0,67c9d07bcb4dbed121bcd9f6,10,Female,38,Returning,Business,Business,2822,13,0.0,2,5,3,5,2,5,4,5,4,2,5,2,5,5,Satisfied
1,67c9d07bcb4dbed121bcda0d,33,Female,33,First-time,Business,Business,173,22,28.0,2,2,5,2,5,3,2,3,2,2,5,2,2,5,Neutral or Dissatisfied
2,67c9d07bcb4dbed121bcda2e,66,Male,36,Returning,Business,Business,173,12,8.0,5,5,2,3,5,3,3,3,3,4,3,1,3,1,Neutral or Dissatisfied
3,67c9d07bcb4dbed121bcda3a,78,Male,58,Returning,Personal,Economy,173,0,4.0,4,4,1,4,3,4,2,3,2,2,5,4,2,4,Satisfied
4,67c9d07bcb4dbed121bcda42,86,Female,27,Returning,Personal,Economy,212,0,13.0,0,3,5,3,4,3,1,4,1,1,5,3,1,5,Neutral or Dissatisfied


In [5]:
# Drop MongoDB's auto-generated document identifier; it carries no predictive information
data.drop(columns="_id", inplace=True)

In [7]:
data.head()

Unnamed: 0,ID,Gender,Age,Customer Type,Type of Travel,Class,Flight Distance,Departure Delay,Arrival Delay,Departure and Arrival Time Convenience,Ease of Online Booking,Check-in Service,Online Boarding,Gate Location,On-board Service,Seat Comfort,Leg Room Service,Cleanliness,Food and Drink,In-flight Service,In-flight Wifi Service,In-flight Entertainment,Baggage Handling,Satisfaction
0,10,Female,38,Returning,Business,Business,2822,13,0.0,2,5,3,5,2,5,4,5,4,2,5,2,5,5,Satisfied
1,33,Female,33,First-time,Business,Business,173,22,28.0,2,2,5,2,5,3,2,3,2,2,5,2,2,5,Neutral or Dissatisfied
2,66,Male,36,Returning,Business,Business,173,12,8.0,5,5,2,3,5,3,3,3,3,4,3,1,3,1,Neutral or Dissatisfied
3,78,Male,58,Returning,Personal,Economy,173,0,4.0,4,4,1,4,3,4,2,3,2,2,5,4,2,4,Satisfied
4,86,Female,27,Returning,Personal,Economy,212,0,13.0,0,3,5,3,4,3,1,4,1,1,5,3,1,5,Neutral or Dissatisfied


<h1 style="font-family:'Poppins', sans-serif; color:rgb(3, 105, 206);  text-align: center;">Data Exploration</h1>

<h2 style="font-family: 'Arial', sans-serif; font-size: 1.5em; font-weight: bold; color:rgb(179, 14, 143);">✨Overview of Top and Bottom Rows of the Dataset</h2>


<h2 style="font-family: 'Arial', sans-serif; font-size: 1.5em; font-weight: bold; color:rgb(179, 14, 143);">✨Checking for Columns/Features Present</h2>


<h2 style="font-family: 'Arial', sans-serif; font-size: 1.5em; font-weight: bold; color:rgb(179, 14, 143);">✨Random Sample of the Data</h2>

<h2 style="font-family: 'Arial', sans-serif; font-size: 1.5em; font-weight: bold; color:rgb(179, 14, 143);">✨Checking Data Types and Information of the Dataset</h2>

<h2 style="font-family: 'Arial', sans-serif; font-size: 1.5em; font-weight: bold; color:rgb(179, 14, 143);">✨Checking for Null Values</h2>
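
One possible way to run the null check is sketched here. The `sample` frame is a hypothetical stand-in for `data`: its column names mirror the dataset, but the values (including the missing ones) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the missing values are invented for illustration only.
sample = pd.DataFrame({
    "Age": [38, 33, 36, 58],
    "Arrival Delay": [0.0, np.nan, 8.0, np.nan],
    "Class": ["Business", "Business", None, "Economy"],
})

# Count missing values per column, then express them as a percentage of rows.
null_counts = sample.isnull().sum()
null_percent = (sample.isnull().mean() * 100).round(2)

null_report = pd.DataFrame({"null_count": null_counts, "null_percent": null_percent})
print(null_report)
```

On the real dataset the same two lines would be applied to `data` instead of `sample`.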

<h2 style="font-family: 'Arial', sans-serif; font-size: 1.5em; font-weight: bold; color:rgb(179, 14, 143);">✨Checking for Duplicate Values</h2>
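
A minimal sketch of a duplicate check, assuming a duplicate means a row that repeats an earlier row in every column. The `sample` frame and its repeated row are hypothetical.

```python
import pandas as pd

# Hypothetical sample; the second and third rows are identical on purpose.
sample = pd.DataFrame({
    "ID": [10, 33, 33, 66],
    "Gender": ["Female", "Female", "Female", "Male"],
    "Age": [38, 33, 33, 36],
})

# duplicated() marks every repeat of an earlier row as True.
duplicate_count = int(sample.duplicated().sum())
print(f"Duplicate rows: {duplicate_count}")

# drop_duplicates() keeps the first occurrence of each row by default.
deduplicated = sample.drop_duplicates().reset_index(drop=True)
print(deduplicated)
```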

<h2 style="font-family: 'Arial', sans-serif; font-size: 1.5em; font-weight: bold; color:rgb(179, 14, 143);">✨Checking for Number of Unique Values for each Column</h2>

<h2 style="font-family: 'Arial', sans-serif; font-size: 1.5em; font-weight: bold; color: rgb(179, 14, 143);">✨ Identifying Columns with Fewer than 30 Unique Values</h2>
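
The sketch below shows one way to find low-cardinality columns with `nunique()`. The threshold of 30 comes from the heading; the `sample` frame is hypothetical and only reuses column names from the dataset.

```python
import pandas as pd

# Hypothetical sample with 40 rows; values are illustrative only.
sample = pd.DataFrame({
    "ID": range(40),
    "Gender": ["Female", "Male"] * 20,
    "Class": ["Business", "Economy"] * 20,
    "Flight Distance": [100 + 7 * i for i in range(40)],
})

# Number of distinct values in each column.
unique_counts = sample.nunique()

# Keep only the columns with fewer than 30 distinct values.
low_cardinality_cols = unique_counts[unique_counts < 30].index.tolist()
print(low_cardinality_cols)
```

Columns that pass this filter are usually good candidates to treat as categorical features.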


<h2 style="font-family: 'Arial', sans-serif; font-size: 1.5em; font-weight: bold; color: rgb(179, 14, 143);">✨Checking for value_counts()</h2>
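
As a small illustration, `value_counts()` can report both absolute counts and shares. The `sample` frame here is a stand-in built from the five rows shown by `data.head()`, so the counts describe only those rows, not the full dataset.

```python
import pandas as pd

# Stand-in frame built from the five rows displayed by data.head().
sample = pd.DataFrame({
    "Class": ["Business", "Business", "Business", "Economy", "Economy"],
    "Satisfaction": [
        "Satisfied",
        "Neutral or Dissatisfied",
        "Neutral or Dissatisfied",
        "Satisfied",
        "Neutral or Dissatisfied",
    ],
})

# Absolute frequency of each category.
satisfaction_counts = sample["Satisfaction"].value_counts()

# Relative frequency; normalize=True returns fractions that sum to 1.
satisfaction_share = sample["Satisfaction"].value_counts(normalize=True)

print(satisfaction_counts)
print(satisfaction_share)
```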


<h2 style="font-family: 'Arial', sans-serif; font-size: 1.5em; font-weight: bold; color:rgb(179, 14, 143);">✨ Removing Unnecessary Features to Enhance Dataset Quality</h2>


<h2 style="font-family: 'Arial', sans-serif; font-size: 1.5em; font-weight: bold; color: rgb(179, 14, 143);">✨ Perform Type Casting of Data (DataType Conversion) if Necessary</h2>
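
A possible type-casting step is sketched below. It assumes `Arrival Delay` is stored as float only because of missing values and that `Class` is better held as a category; both are assumptions about the data, and the `sample` values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the missing delay is invented for illustration.
sample = pd.DataFrame({
    "Arrival Delay": [0.0, 28.0, np.nan, 4.0],
    "Class": ["Business", "Business", "Economy", "Economy"],
})

# "Int64" (capital I) is pandas' nullable integer dtype, so missing values survive the cast.
# "category" stores repeated labels compactly.
converted = sample.astype({"Arrival Delay": "Int64", "Class": "category"})
print(converted.dtypes)
```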


<h2 style="font-family: 'Arial', sans-serif; font-size: 1.5em; font-weight: bold; color: rgb(179, 14, 143);">✨ Summary Statistics for Numerical Data</h2>
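
For the numerical summary, `describe()` is probably the simplest starting point. The `sample` frame uses the `Age` and `Flight Distance` values from the five rows shown by `data.head()`, so the statistics cover only those rows.

```python
import pandas as pd

# Stand-in frame built from the five rows displayed by data.head().
sample = pd.DataFrame({
    "Age": [38, 33, 36, 58, 27],
    "Flight Distance": [2822, 173, 173, 173, 212],
    "Gender": ["Female", "Female", "Male", "Male", "Female"],
})

# describe() summarises numeric columns by default; .T puts one column per row.
numeric_summary = sample.describe().T
print(numeric_summary)
```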


<h2 style="font-family: 'Arial', sans-serif; font-size: 1.5em; font-weight: bold; color: rgb(179, 14, 143);">✨ Summary Statistics for Categorical Data</h2>
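
For categorical columns, `describe()` reports the count, number of unique values, most frequent value, and its frequency. This sketch excludes numeric columns rather than naming a string dtype, which is a design choice made here for portability; the `sample` frame reuses the five rows shown by `data.head()`.

```python
import pandas as pd

# Stand-in frame built from the five rows displayed by data.head().
sample = pd.DataFrame({
    "Age": [38, 33, 36, 58, 27],
    "Gender": ["Female", "Female", "Male", "Male", "Female"],
    "Satisfaction": [
        "Satisfied",
        "Neutral or Dissatisfied",
        "Neutral or Dissatisfied",
        "Satisfied",
        "Neutral or Dissatisfied",
    ],
})

# Excluding numeric columns leaves only the categorical ones in the summary.
categorical_summary = sample.describe(exclude="number").T
print(categorical_summary)
```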


<h2 style="font-family: 'Arial', sans-serif; font-size: 1.5em; font-weight: bold; color: rgb(179, 14, 143);">✨ Segregate Data into Numerical, Categorical, and Datetime Columns</h2>


For example, use `select_dtypes()` to do the segregation.
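
A minimal sketch of that segregation follows. The `sample` frame is hypothetical, and `Booking Date` is an invented column added only so the datetime branch has something to select; the dataset shown above has no datetime column.

```python
import pandas as pd

# Hypothetical sample; "Booking Date" does not exist in the real dataset.
sample = pd.DataFrame({
    "Age": [38, 33, 36],
    "Arrival Delay": [0.0, 28.0, 8.0],
    "Gender": ["Female", "Female", "Male"],
    "Booking Date": pd.to_datetime(["2024-01-05", "2024-02-11", "2024-03-20"]),
})

# "number" matches every integer and float dtype.
numerical_cols = sample.select_dtypes(include="number").columns.tolist()

# "datetime" matches datetime64 columns.
datetime_cols = sample.select_dtypes(include="datetime").columns.tolist()

# Everything that is neither numeric nor datetime is treated as categorical here.
categorical_cols = sample.select_dtypes(exclude=["number", "datetime"]).columns.tolist()

print("Numerical:", numerical_cols)
print("Categorical:", categorical_cols)
print("Datetime:", datetime_cols)
```

Applied to `data`, the same three calls would return the column lists used in later preprocessing steps.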

<hr style="border-top: 3px solid blue;">