# AI Semester Project: Main Analysis

This notebook is the main analysis file for the AI semester project. It loads the dataset, preprocesses it, trains several models, evaluates them, and visualizes the results.

## Data Loading

In this section, we load the dataset from a CSV file and preview its first rows to check its structure.

In [1]:
import pandas as pd

data_path = '../data/dataset.csv'  # Update with the actual dataset path
print('Loading data...')  # Announce the load before it starts, not after
df = pd.read_csv(data_path)
df.head()

Loading data...
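
Because `data_path` is a placeholder that has to be updated, it can help to check the path before handing it to `pd.read_csv`. The following is a minimal sketch using only the standard library. The helper name `resolve_data_path` is hypothetical and not part of the project.

```python
from pathlib import Path


def resolve_data_path(path_str):
    """Return the path as a Path object, or raise a clear error if no file is there."""
    path = Path(path_str)
    if not path.is_file():
        # Show the absolute path so a wrong working directory is easy to spot.
        raise FileNotFoundError(
            f"Dataset not found at {path.resolve()}; update data_path before loading."
        )
    return path
```

One possible use is `df = pd.read_csv(resolve_data_path(data_path))`, which fails early with the resolved location in the message.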


## Data Preprocessing

This section handles missing values, normalizes the data, and applies any other necessary transformations by calling the project's `preprocess_data` function.

In [2]:
from src.data_preprocessing import preprocess_data

df_cleaned = preprocess_data(df)
df_cleaned.head()
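
The internals of `preprocess_data` are not shown in this notebook. As an illustration of the two steps named above, here is a sketch of mean imputation and min-max normalization on a single column, written in plain Python. The function names `impute_mean` and `min_max_scale` are hypothetical, and the project's own implementation may use different strategies.

```python
import math


def impute_mean(values):
    """Replace None/NaN entries with the mean of the observed values."""
    observed = [v for v in values if v is not None and not math.isnan(v)]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    mean = sum(observed) / len(observed)
    return [mean if v is None or math.isnan(v) else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map it to zeros to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


column = [2.0, None, 4.0, float("nan"), 6.0]
filled = impute_mean(column)    # [2.0, 4.0, 4.0, 4.0, 6.0]
scaled = min_max_scale(filled)  # [0.0, 0.5, 0.5, 0.5, 1.0]
```

Imputing before scaling matters here: the minimum and maximum are only well defined once the missing entries are filled.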

## Model Training

In this section, we train six models: KNN, Decision Tree, Logistic Regression, Neural Network, and Random Forest, which are evaluated as classifiers below, and Linear Regression, which is evaluated as a regressor.

In [3]:
from sklearn.model_selection import train_test_split

from src.knn import KNN
from src.decision_tree import DecisionTree
from src.logistic_regression import LogisticRegression
from src.neural_network import NeuralNetwork
from src.random_forest import RandomForest
from src.linear_regression import LinearRegression

# Split the data into features and target
X = df_cleaned.drop('target_column', axis=1)  # Replace with actual target column
y = df_cleaned['target_column']

# Train-test split: hold out 20% for testing, with a fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize models
knn_model = KNN(k=5)
dt_model = DecisionTree()
logistic_model = LogisticRegression()
nn_model = NeuralNetwork()
rf_model = RandomForest()
lr_model = LinearRegression()

# Fit models
knn_model.fit(X_train, y_train)
dt_model.fit(X_train, y_train)
logistic_model.fit(X_train, y_train)
nn_model.fit(X_train, y_train)
rf_model.fit(X_train, y_train)
lr_model.fit(X_train, y_train)
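
All six models are used through the same `fit`/`predict` interface. To show what such a class might look like, here is a minimal k-nearest-neighbours sketch in plain Python. `SimpleKNN` is a hypothetical stand-in, not the project's `src.knn.KNN`. It assumes Euclidean distance, majority voting, and rows given as plain sequences (a DataFrame would need `.to_numpy()` first).

```python
import math
from collections import Counter


class SimpleKNN:
    """Hypothetical k-nearest-neighbours classifier with a fit/predict interface."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        # KNN is a lazy learner: fitting only stores the training data.
        self.X_train = [list(row) for row in X]
        self.y_train = list(y)
        return self

    def predict(self, X):
        return [self._predict_one(list(row)) for row in X]

    def _predict_one(self, row):
        # Pair each training label with its Euclidean distance to the query row.
        pairs = [
            (math.dist(row, train_row), label)
            for train_row, label in zip(self.X_train, self.y_train)
        ]
        pairs.sort(key=lambda pair: pair[0])
        nearest_labels = [label for _, label in pairs[: self.k]]
        # Majority vote among the k nearest labels.
        return Counter(nearest_labels).most_common(1)[0][0]


X_demo = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_demo = ["a", "a", "a", "b", "b", "b"]
demo_model = SimpleKNN(k=3).fit(X_demo, y_demo)
demo_predictions = demo_model.predict([[0.5, 0.5], [5.5, 5.5]])  # ['a', 'b']
```

Sharing one interface across models is what lets the fitting and prediction code above treat them uniformly.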

## Model Evaluation

In this section, we evaluate the trained models on the test set: accuracy, precision, and recall for the classifiers, and RMSE, MAE, and R² for Linear Regression.

In [4]:
from src.evaluation import accuracy, precision, recall, rmse, mae, r2_score

# Make predictions
y_pred_knn = knn_model.predict(X_test)
y_pred_dt = dt_model.predict(X_test)
y_pred_logistic = logistic_model.predict(X_test)
y_pred_nn = nn_model.predict(X_test)
y_pred_rf = rf_model.predict(X_test)
y_pred_lr = lr_model.predict(X_test)

# Evaluate models: classification metrics for the five classifiers,
# regression metrics for Linear Regression
evaluation_results = {
    'KNN': {
        'accuracy': accuracy(y_test, y_pred_knn),
        'precision': precision(y_test, y_pred_knn),
        'recall': recall(y_test, y_pred_knn)
    },
    'Decision Tree': {
        'accuracy': accuracy(y_test, y_pred_dt),
        'precision': precision(y_test, y_pred_dt),
        'recall': recall(y_test, y_pred_dt)
    },
    'Logistic Regression': {
        'accuracy': accuracy(y_test, y_pred_logistic),
        'precision': precision(y_test, y_pred_logistic),
        'recall': recall(y_test, y_pred_logistic)
    },
    'Neural Network': {
        'accuracy': accuracy(y_test, y_pred_nn),
        'precision': precision(y_test, y_pred_nn),
        'recall': recall(y_test, y_pred_nn)
    },
    'Random Forest': {
        'accuracy': accuracy(y_test, y_pred_rf),
        'precision': precision(y_test, y_pred_rf),
        'recall': recall(y_test, y_pred_rf)
    },
    'Linear Regression': {
        'rmse': rmse(y_test, y_pred_lr),
        'mae': mae(y_test, y_pred_lr),
        'r2': r2_score(y_test, y_pred_lr)
    }
}

evaluation_results
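
For reference, the metrics above can be written in a few lines each. The sketch below uses plain Python and is not the code in `src.evaluation`. The `sketch_` names are hypothetical, precision and recall assume a binary target whose positive class is `1`, and `sketch_r2` assumes the true values are not all identical.

```python
import math


def sketch_accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sketch_precision(y_true, y_pred, positive=1):
    """TP / (TP + FP); returns 0.0 when nothing was predicted positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fp) if tp + fp else 0.0


def sketch_recall(y_true, y_pred, positive=1):
    """TP / (TP + FN); returns 0.0 when there are no true positives to find."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0


def sketch_rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def sketch_mae(y_true, y_pred):
    """Mean of the absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def sketch_r2(y_true, y_pred):
    """1 - SS_res / SS_tot; assumes y_true is not constant."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


labels_true = [1, 0, 1, 1, 0]
labels_pred = [1, 0, 0, 1, 1]
demo_accuracy = sketch_accuracy(labels_true, labels_pred)    # 3 of 5 correct
demo_precision = sketch_precision(labels_true, labels_pred)  # 2 TP, 1 FP
demo_recall = sketch_recall(labels_true, labels_pred)        # 2 TP, 1 FN
```

A negative R² is possible: it means the predictions fit worse than always predicting the mean of the true values.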

## Visualization of Results

In this section, we plot model predictions against the actual values. Points that fall on the dashed diagonal are predicted exactly.

In [5]:
import matplotlib.pyplot as plt

# Visualization for KNN
plt.figure(figsize=(10, 6))
plt.scatter(y_pred_knn, y_test, label='KNN Predictions')
# Diagonal reference line spanning the range of the actual values
lims = [min(y_test), max(y_test)]
plt.plot(lims, lims, color='red', linestyle='--', label='Perfect Prediction')
plt.xlabel('Predicted Values')
plt.ylabel('Actual Values')
plt.title('KNN Predictions vs Actual')
plt.legend()
plt.show()

# Repeat for other models as needed
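
When the target is a class label, a predicted-versus-actual scatter collapses onto a handful of points, so counting each (actual, predicted) pair can be easier to read. The sketch below does this in plain Python and loops over several models, as the comment above suggests. The helper `confusion_counts` and the example predictions are hypothetical and are not results from this project.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(y_true, y_pred))


# Made-up labels and predictions, used only to show the loop over models.
example_true = [1, 0, 1, 1, 0]
example_predictions = {
    "Model A": [1, 0, 0, 1, 1],
    "Model B": [1, 0, 1, 1, 0],
}

summary = {
    name: confusion_counts(example_true, preds)
    for name, preds in example_predictions.items()
}

for name, counts in summary.items():
    print(name)
    for (actual, predicted), n in sorted(counts.items()):
        print(f"  actual={actual} predicted={predicted}: {n}")
```

Pairs where the two labels differ are the misclassifications, so a model with perfect predictions has counts only where `actual == predicted`.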

## Conclusion

In this notebook, we loaded the dataset, preprocessed it, trained six models, evaluated them on a held-out test set, and visualized the KNN predictions. The next steps are to repeat the visualization for the remaining models and to carry out further analysis to improve model performance.