
▶**Workflow Overview**

♋**Step 1: Load and Explore Data**

1- Load the dataset

2- Check for missing values

3- Understand the target column

4- Encode categorical features

♋**Step 2: Preprocess the Data**

1- Feature/target separation

2- Encode labels

3- Normalize/scale data (especially for Logistic Regression)

4- Train-test split

♋**Step 3: Train Models**

1- Train a Decision Tree classifier

2- Train an AdaBoost classifier using Decision Trees as the base estimator

3- Train a Logistic Regression model

♋**Step 4: Evaluate Models**

1- Accuracy

2- Confusion Matrix

3- Precision, Recall, F1-Score

♋**Step 5: Compare Results**

Compare all three models on accuracy, precision, recall, and F1-score

In [91]:
# Useful libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
from sklearn.preprocessing import StandardScaler, LabelEncoder
import matplotlib.pyplot as plt
import seaborn as sns

In [92]:
# load data
df = pd.read_csv('/content/cybersecurity_intrusion_data.csv')

In [93]:
# Display basic info
df.info()
print(df.isnull().sum())
print("\nTarget Variable Distribution:")
print(df['attack_detected'].value_counts())



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9537 entries, 0 to 9536
Data columns (total 11 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   session_id           9537 non-null   object 
 1   network_packet_size  9537 non-null   int64  
 2   protocol_type        9537 non-null   object 
 3   login_attempts       9537 non-null   int64  
 4   session_duration     9537 non-null   float64
 5   encryption_used      7571 non-null   object 
 6   ip_reputation_score  9537 non-null   float64
 7   failed_logins        9537 non-null   int64  
 8   browser_type         9537 non-null   object 
 9   unusual_time_access  9537 non-null   int64  
 10  attack_detected      9537 non-null   int64  
dtypes: float64(2), int64(5), object(4)
memory usage: 819.7+ KB
session_id                0
network_packet_size       0
protocol_type             0
login_attempts            0
session_duration          0
encryption_used        1966
ip_reputati

In [94]:
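# Data Preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Drop the session identifier; it is a row label, not a feature
df = df.drop(columns=['session_id'])

# One-hot encode the categorical columns.
# Note: rows with a missing 'encryption_used' value get zeros in every
# encryption dummy column, because get_dummies ignores NaN by default.
df = pd.get_dummies(df, columns=['protocol_type', 'encryption_used', 'browser_type'], drop_first=True)

# Separate features and target, then hold out 20% of the rows for testing
X = df.drop('attack_detected', axis=1)
y = df['attack_detected']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features for logistic regression (fit on the training split only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)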
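
The `df.info()` output above shows 7571 non-null values out of 9537 for `encryption_used`, so 1966 rows are missing. With `drop_first=True`, `pd.get_dummies` encodes both a missing value and the dropped first category as all zeros, which makes the two indistinguishable to the models. The sketch below illustrates this on a toy frame and shows one possible alternative: giving missing values an explicit label. The category names `"AES"` and `"DES"` and the label `"Unknown"` are assumptions chosen for illustration, not values confirmed by the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame; the category values "AES" and "DES" are illustrative assumptions.
toy = pd.DataFrame({"encryption_used": ["AES", "DES", np.nan, "AES"]})

# Default behaviour: NaN rows get zeros in every dummy column, and with
# drop_first=True the dropped first category ("AES") is also all zeros.
implicit = pd.get_dummies(toy, columns=["encryption_used"], drop_first=True)

# Alternative: give missing values their own explicit label before encoding.
filled = toy.fillna({"encryption_used": "Unknown"})
explicit = pd.get_dummies(filled, columns=["encryption_used"], drop_first=True)

print(implicit.columns.tolist())
print(explicit.columns.tolist())
```

Applying the second variant to the real data would change the feature matrix, so the accuracy figures reported in this notebook would no longer be directly comparable.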


In [95]:
# Train a Decision Tree Classifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

# Decision Tree Model
dt_model = DecisionTreeClassifier(random_state=42)
dt_model.fit(X_train, y_train)

# Predictions
y_pred_dt = dt_model.predict(X_test)

# Evaluation
print(" Decision Tree Accuracy:", accuracy_score(y_test, y_pred_dt))
print(classification_report(y_test, y_pred_dt))


 Decision Tree Accuracy: 0.8238993710691824
              precision    recall  f1-score   support

           0       0.84      0.83      0.84      1042
           1       0.80      0.81      0.81       866

    accuracy                           0.82      1908
   macro avg       0.82      0.82      0.82      1908
weighted avg       0.82      0.82      0.82      1908
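
Step 4 of the workflow lists a confusion matrix, but the cell above prints only accuracy and the classification report. A minimal sketch of that step follows, using `sklearn.metrics.confusion_matrix` on toy labels. The toy lists stand in for `y_test` and `y_pred_dt` and are illustrative values only.

```python
from sklearn.metrics import confusion_matrix

# Toy labels standing in for y_test and y_pred_dt (illustrative values only).
y_true_toy = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred_toy = [0, 0, 1, 1, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true_toy, y_pred_toy, labels=[0, 1])

# For a binary problem, ravel() unpacks the matrix in this order.
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```

In the notebook itself, the same call would presumably be `confusion_matrix(y_test, y_pred_dt)`. For intrusion detection, the false-negative count (attacks predicted as normal) is usually the figure worth watching.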



In [96]:
# AdaBoost with decision stumps (depth-1 trees) as the base estimator
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Define the model
ada_model = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    learning_rate=0.1,
    n_estimators=1000,
    random_state=42,
)

# Train the model
ada_model.fit(X_train, y_train)

# Make predictions
y_pred = ada_model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
print(classification_report(y_test, y_pred))





Accuracy: 86.79%
              precision    recall  f1-score   support

           0       0.81      1.00      0.89      1042
           1       1.00      0.71      0.83       866

    accuracy                           0.87      1908
   macro avg       0.90      0.85      0.86      1908
weighted avg       0.89      0.87      0.86      1908



In [97]:
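# 5-fold cross-validation of the AdaBoost model on the full dataset
from sklearn.model_selection import cross_val_score
scores = cross_val_score(ada_model, X, y, cv=5, scoring='accuracy')
print("Cross-Validation Accuracy Scores:", scores)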

Cross-Validation Accuracy Scores: [0.87421384 0.86163522 0.87572103 0.87362349 0.87362349]
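
The five fold scores are easier to compare with the single test-split accuracy when reduced to a mean and a spread. A small sketch follows, using the values printed above; the name `cv_scores` is introduced here for illustration.

```python
import numpy as np

# Fold accuracies copied from the cross-validation output above.
cv_scores = np.array([0.87421384, 0.86163522, 0.87572103, 0.87362349, 0.87362349])

mean_acc = cv_scores.mean()
std_acc = cv_scores.std(ddof=1)  # sample standard deviation across the 5 folds
print(f"CV accuracy: {mean_acc:.4f} +/- {std_acc:.4f}")
```

The mean comes out at roughly 0.872, close to the 86.79% measured on the held-out test split, which suggests the single-split result is not an outlier.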


In [98]:
# Train a Logistic Regression Model
from sklearn.linear_model import LogisticRegression

# Logistic Regression Model
lr_model = LogisticRegression(max_iter=1000, random_state=42)
lr_model.fit(X_train_scaled, y_train)

# Predictions
y_pred_lr = lr_model.predict(X_test_scaled)

# Evaluation
print("Logistic Regression Accuracy:", accuracy_score(y_test, y_pred_lr))
print(classification_report(y_test, y_pred_lr))


Logistic Regression Accuracy: 0.7468553459119497
              precision    recall  f1-score   support

           0       0.75      0.81      0.78      1042
           1       0.75      0.67      0.71       866

    accuracy                           0.75      1908
   macro avg       0.75      0.74      0.74      1908
weighted avg       0.75      0.75      0.75      1908
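
Only the AdaBoost model is cross-validated in this notebook. To do the same for Logistic Regression, the scaler would need to be refit inside each fold so that no fold's held-out rows influence the scaling. One way to sketch that is a scikit-learn pipeline, shown here on synthetic data; the names `lr_pipeline` and `lr_cv_scores` and the synthetic data are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; the notebook's X and y would be used instead.
X_demo, y_demo = make_classification(n_samples=400, n_features=6, random_state=42)

# The pipeline refits StandardScaler on the training part of every fold.
lr_pipeline = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, random_state=42),
)

lr_cv_scores = cross_val_score(lr_pipeline, X_demo, y_demo, cv=5, scoring="accuracy")
print("Fold accuracies:", lr_cv_scores)
print("Mean accuracy:", lr_cv_scores.mean())
```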



In [99]:
# Compare All Three Models

results = {
    "Decision Tree": accuracy_score(y_test, y_pred_dt),
    "AdaBoost": accuracy_score(y_test, y_pred),
    "Logistic Regression": accuracy_score(y_test, y_pred_lr)
}

# Display in sorted order
for model, acc in sorted(results.items(), key=lambda x: x[1], reverse=True):
    print(f"{model}: {acc:.4f}")


AdaBoost: 0.8679
Decision Tree: 0.8239
Logistic Regression: 0.7469
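
The comparison above ranks the models on accuracy alone, while the classification reports show that they differ noticeably in precision and recall for class 1. A sketch of a side-by-side table follows. The helper `summarize` and the toy predictions are hypothetical; in the notebook, the dictionary would presumably map the model names to `y_pred_dt`, `y_pred`, and `y_pred_lr`.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def summarize(y_true, predictions):
    """Build a metrics table (one row per model) sorted by accuracy."""
    rows = {}
    for name, y_hat in predictions.items():
        rows[name] = {
            "accuracy": accuracy_score(y_true, y_hat),
            # Precision, recall, and F1 refer to the positive class (label 1).
            "precision": precision_score(y_true, y_hat),
            "recall": recall_score(y_true, y_hat),
            "f1": f1_score(y_true, y_hat),
        }
    table = pd.DataFrame.from_dict(rows, orient="index")
    return table.sort_values("accuracy", ascending=False)


# Toy labels and predictions (illustrative values only).
y_true_toy = [0, 0, 1, 1, 1, 0]
toy_predictions = {
    "model_a": [0, 0, 1, 1, 0, 0],
    "model_b": [0, 1, 1, 0, 0, 0],
}

summary = summarize(y_true_toy, toy_predictions)
print(summary.round(4))
```

A table like this makes trade-offs visible, such as the AdaBoost result above, where class 1 precision is 1.00 but recall is 0.71.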
