# Exam on Artificial Neural Networks (ANN)

Welcome to the Artificial Neural Networks (ANN) practical exam. In this exam, you will work on a classification task to predict the outcome of incidents involving buses. You are provided with a dataset that records breakdowns and delays in bus operations. Your task is to build, train, and evaluate an ANN model.

---

## Dataset Overview

### **Dataset:**
* Run the command under the `Load Data` section to download and unzip the data, or access it [here](https://www.kaggle.com/datasets/khaledzsa/bus-breakdown-and-delays).

### **Dataset Name:** Bus Breakdown and Delays

### **Description:**  
The dataset contains records of incidents involving buses that were either running late or experienced a breakdown. Your task is to predict whether the bus was delayed or had a breakdown based on the features provided.

### **Features:**
The dataset contains the following columns:

- `School_Year`
- `Busbreakdown_ID`
- `Run_Type`
- `Bus_No`
- `Route_Number`
- `Reason`
- `Schools_Serviced`
- `Occurred_On`
- `Created_On`
- `Boro`
- `Bus_Company_Name`
- `How_Long_Delayed`
- `Number_Of_Students_On_The_Bus`
- `Has_Contractor_Notified_Schools`
- `Has_Contractor_Notified_Parents`
- `Have_You_Alerted_OPT`
- `Informed_On`
- `Incident_Number`
- `Last_Updated_On`
- `Breakdown_or_Running_Late` (Target Column)
- `School_Age_or_PreK`

## Load Data

In [None]:
!kaggle datasets download -d khaledzsa/bus-breakdown-and-delays
!unzip bus-breakdown-and-delays.zip

## Importing Libraries

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder, MinMaxScaler
from imblearn.over_sampling import SMOTE
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

## Exploratory Data Analysis (EDA)
This could include:
* **Inspect the dataset**

* **Dataset structure**

* **Summary statistics**

* **Check for missing values**

* **Distribution of features**

* **Categorical feature analysis**

* **Correlation matrix**

* **Outlier detection**

And add more as needed!

In [None]:
df = pd.read_csv('/content/Bus_Breakdown_and_Delays.csv')
df.head()

In [None]:
df.shape

In [None]:
df.describe()

In [None]:
df.isnull().sum()

In [None]:
df.info()

In [None]:
for col in df.columns :
  print(f'column {col} :')
  print(df[col].unique())
  print('-'*100)

In [None]:
# How_Long_Delayed is stored as free text (object); extract the first number as minutes
df['How_Long_Delayed_Minutes'] = df['How_Long_Delayed'].str.extract(r'(\d+)', expand=False).astype(float)
# Fill missing or unparseable values with the column mean, then convert to int
df['How_Long_Delayed_Minutes'] = df['How_Long_Delayed_Minutes'].fillna(df['How_Long_Delayed_Minutes'].mean())
df['How_Long_Delayed_Minutes'] = df['How_Long_Delayed_Minutes'].astype(int)

In [None]:
df.info()

In [None]:
categorical_cols = ['School_Year', 'Run_Type', 'Route_Number', 'Reason', 'Boro', 'Bus_No', 'Bus_Company_Name',
                     'How_Long_Delayed', 'Number_Of_Students_On_The_Bus', 'Has_Contractor_Notified_Schools',
                     'Has_Contractor_Notified_Parents', 'Have_You_Alerted_OPT',
                     'Incident_Number', 'Last_Updated_On']

In [None]:

# Check whether the number of students on the bus is related to incidents

corr_matrix = df[['Busbreakdown_ID', 'Number_Of_Students_On_The_Bus']].corr()

plt.figure(figsize=(12, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()


In [None]:
def remove_outliers(df, column):
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    return df[(df[column] >= lower_bound) & (df[column] <= upper_bound)]

def remove_outliers_all_columns(df):
    numeric_cols = df.select_dtypes(include=['number']).columns
    for col in numeric_cols:
        df = remove_outliers(df, col)
    return df
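The two helpers above are defined but never called in the notebook. As a minimal sketch of how the IQR rule behaves, the snippet below applies the single-column helper to a small, hypothetical toy frame (the values are illustrative and not taken from the dataset).

```python
import pandas as pd


def remove_outliers(df, column):
    """Keep rows whose value in `column` lies within 1.5 * IQR of the quartiles."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return df[(df[column] >= lower) & (df[column] <= upper)]


# Hypothetical toy data: one obviously extreme delay value.
toy = pd.DataFrame({"delay_minutes": [10, 12, 15, 20, 25, 30, 500]})
filtered = remove_outliers(toy, "delay_minutes")
print(len(toy), "->", len(filtered))
```

Rows with a missing value in the filtered column are dropped as well, because comparisons against `NaN` evaluate to `False`. It is therefore worth imputing before filtering, and checking how many rows survive.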

## Data Preprocessing
This could include:

* **Handle Missing Values**
    * Impute missing values or drop them.

* **Encode Categorical Variables**
    * One-hot encoding
    * Label encoding

* **Scale and Normalize Data**
    * Standardization (Z-score)
    * Min-Max scaling

* **Feature Engineering**
    * Create new features
    * Feature selection

* **Handle Imbalanced Data**
    * Oversampling
    * Undersampling

* **Handle Outliers**
    * Remove outliers
    * Transform outliers

* **Remove Duplicates**
    * Remove redundant or duplicate data


And add more as needed!

Please treat these as suggestions. Feel free to use your judgment for the rest.
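To illustrate the encoding choice listed above, here is a small sketch contrasting label encoding with one-hot encoding. The toy column and its values are hypothetical; only the column name `Boro` comes from the dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy column; the values are illustrative only.
toy = pd.DataFrame({"Boro": ["Bronx", "Queens", "Brooklyn", "Bronx"]})

# Label encoding: one integer per category (implies an arbitrary order).
label_codes = LabelEncoder().fit_transform(toy["Boro"])

# One-hot encoding: one 0/1 column per category (no implied order).
one_hot = pd.get_dummies(toy["Boro"], prefix="Boro", dtype=int)

print(label_codes)
print(one_hot)
```

Label encoding keeps the feature count small, which matters for high-cardinality columns. One-hot encoding avoids suggesting to the network that one category is "larger" than another.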

In [None]:
# Fill missing values in these columns with the most frequent value (mode)
for col in ['Run_Type', 'Route_Number', 'Reason', 'Boro', 'How_Long_Delayed', 'Incident_Number']:
    df[col] = df[col].fillna(df[col].mode()[0])

In [None]:
df.duplicated().sum()

In [None]:
df.dtypes

In [None]:
label_encoder = LabelEncoder()

categorical_columns = df.select_dtypes(include=['object']).columns

for column in categorical_columns:
    df[column] = label_encoder.fit_transform(df[column])

print(df.head())


In [None]:
df.dtypes

## Split the Dataset
Next, split the dataset into training, validation, and testing sets.

In [None]:
X = df.drop('Breakdown_or_Running_Late', axis=1)
y = df['Breakdown_or_Running_Late']

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)
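The instruction above asks for training, validation, and testing sets, while a single `train_test_split` call yields only two. One possible way to obtain three sets is to split twice. The sketch below uses hypothetical stand-in data, and the 60/20/20 ratio is an assumption rather than a requirement of the exam.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 rows, 3 features, imbalanced binary target.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 3))
y_demo = np.array([0] * 80 + [1] * 20)

# First hold out 20% as the test set, then take 25% of the remainder as
# validation (0.25 * 0.80 = 0.20), giving a 60/20/20 split.
# `stratify` keeps the class ratio the same in every subset.
X_tmp, X_test_demo, y_tmp, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42)
X_train_demo, X_val_demo, y_train_demo, y_val_demo = train_test_split(
    X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=42)

print(len(X_train_demo), len(X_val_demo), len(X_test_demo))
```

With a separate validation set, the test set can stay untouched until the final evaluation instead of being passed as `validation_data` during training.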

In [None]:
class_distribution = y.value_counts()
print(class_distribution)


In [None]:
smote = SMOTE(random_state=42)
X_train_resampled, y_train_resampled = smote.fit_resample(X_train, y_train)

In [None]:
class_distribution_resampled = y_train_resampled.value_counts()
print(class_distribution_resampled)

In [None]:
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train_resampled)
X_test_scaled = scaler.transform(X_test)

## Compile the Model
Compile the ANN model by defining the optimizer, loss function, and evaluation metrics.

In [None]:
model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=X_train_scaled.shape[1]))
model.add(Dense(units=32, activation='relu'))
model.add(Dense(units=16, activation='relu'))
model.add(Dense(units=1, activation='sigmoid'))

In [None]:
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import BinaryCrossentropy

# Use the 'accuracy' string so Keras selects binary accuracy for the sigmoid output;
# the Accuracy() metric class checks exact equality with the raw probabilities.
model.compile(optimizer=Adam(learning_rate=0.001), loss=BinaryCrossentropy(), metrics=['accuracy'])

## Training the Model
Train the ANN model using the training data.

In [None]:
history = model.fit(X_train_scaled, y_train_resampled, validation_data=(X_test_scaled, y_test), epochs=10, batch_size=32)


Epoch 1/10
1722/4905 ━━━━━━━━━━━━━━━━━━━━ 10s 3ms/step - accuracy: 0.2237 - loss: 0.0729

## Evaluate the Model
Evaluate the performance of the model on the test set.

In [None]:
_, accuracy = model.evaluate(X_test_scaled, y_test)
print('Accuracy: {}'.format(accuracy))
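Accuracy alone can be misleading when the classes are imbalanced, so a confusion matrix and per-class precision and recall are a useful complement. The sketch below uses hypothetical labels and probabilities as stand-ins for `y_test` and the output of `model.predict(X_test_scaled)`.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true labels and predicted probabilities (illustrative only).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0])
y_prob = np.array([0.1, 0.4, 0.7, 0.2, 0.9, 0.3, 0.8, 0.05])

# Threshold the sigmoid outputs at 0.5, as the notebook does for predictions.
y_pred = (y_prob > 0.5).astype(int)

cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted
print(cm)
print(classification_report(y_true, y_pred, zero_division=0))
```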

## Make Predictions
Use the trained model to make predictions on new or unseen data.

In [None]:

# X_test_scaled has already been transformed by the scaler; do not scale it again
predictions = model.predict(X_test_scaled)


binary_predictions = (predictions > 0.5).astype(int)

print(binary_predictions)


## Model Performance Visualization
Visualize the performance metrics such as accuracy and loss over the epochs.
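A minimal sketch of such a plot follows. The helper name `plot_history` and the demo values are hypothetical. In the notebook, the dictionary to pass would be `history.history`, assuming the metric is logged under the key `accuracy`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt


def plot_history(history_dict):
    """Plot training/validation accuracy and loss from a Keras-style history dict."""
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(12, 4))
    for ax, key in ((ax_acc, "accuracy"), (ax_loss, "loss")):
        ax.plot(history_dict[key], label=f"train {key}")
        if f"val_{key}" in history_dict:
            ax.plot(history_dict[f"val_{key}"], label=f"val {key}")
        ax.set_xlabel("epoch")
        ax.set_ylabel(key)
        ax.legend()
    fig.tight_layout()
    return fig


# Hypothetical values for illustration; in the notebook pass `history.history`.
demo = {"accuracy": [0.70, 0.80, 0.85], "val_accuracy": [0.68, 0.76, 0.79],
        "loss": [0.60, 0.45, 0.38], "val_loss": [0.62, 0.50, 0.47]}
fig = plot_history(demo)
```

A widening gap between the training and validation curves is the usual visual sign of overfitting.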

## Save the Model
Save the trained model for submission.
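One way to do this is sketched below, assuming a TensorFlow/Keras version that supports the native `.keras` format. The tiny stand-in model, the temporary directory, and the file name `bus_ann_model.keras` are assumptions for illustration. In the notebook, the trained `model` would be saved instead.

```python
import os
import tempfile

from tensorflow import keras

# Tiny stand-in model; in the notebook, save the trained `model` instead.
demo_model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(1, activation="sigmoid"),
])

# The file name is an assumption; the `.keras` extension selects the native Keras format.
save_path = os.path.join(tempfile.mkdtemp(), "bus_ann_model.keras")
demo_model.save(save_path)

# Reload to confirm the saved file is usable.
restored = keras.models.load_model(save_path)
print(restored.output_shape)
```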

## Project Questions:

1. **Data Preprocessing**: Explain why you chose your specific data preprocessing techniques (e.g., normalization, encoding). How did these techniques help prepare the data for training the model?
2. **Model Architecture**: Describe the reasoning behind your model’s architecture (e.g., the number of layers, type of layers, number of neurons, and activation functions). Why did you believe this architecture was appropriate for the problem at hand?
3. **Training Process**: Discuss why you chose your batch size, number of epochs, and optimizer. How did these choices affect the training process? Did you experiment with different values, and what were the outcomes?
4. **Loss Function and Metrics**: Why did you choose the specific loss function and evaluation metrics? How do they align with the objective of the task (e.g., regression vs classification)?
5. **Regularization Techniques**: If you used regularization techniques such as dropout or weight decay, explain why you implemented them and how they influenced the model's performance.
6. **Model Evaluation**: Justify your approach to evaluating the model. Why did you choose the specific performance metrics, and how do they reflect the model's success in solving the task?
7. **Model Tuning (If Done)**: Describe any tuning you performed (e.g., hyperparameter tuning) and why you felt it was necessary. How did these adjustments improve model performance?
8. **Overfitting and Underfitting**: Analyze whether the model encountered any overfitting or underfitting during training. What strategies could you implement to mitigate these issues?

**Data Preprocessing**

- **Normalization:** `MinMaxScaler` was used to scale the features to the range 0 to 1, so that all input features are on the same scale, which helps training converge faster.
- **Label Encoding:** Categorical features were label-encoded to convert them into numerical values suitable for input to the neural network.

**Model Architecture**

- **Number of Layers and Neurons:** The architecture includes 3 hidden layers with 64, 32, and 16 neurons, respectively, to capture complex patterns without overcomplicating the model.
- **Activation Functions:** ReLU was chosen for the hidden layers to introduce non-linearity, while sigmoid was used in the output layer for binary classification.

**Training Process**

- **Batch Size:** A batch size of 32 was selected as a balance between memory efficiency and gradient accuracy.
- **Epochs:** 10 epochs were used (`epochs=10` in the training cell) to allow sufficient learning while limiting overfitting.
- **Optimizer:** Adam was chosen for its adaptive learning rate and efficient handling of sparse gradients, which typically leads to faster convergence.

**Loss Function and Metrics**

- **Loss Function:** Binary cross-entropy was selected because it is the standard loss for binary classification and penalizes confident incorrect predictions heavily.
- **Metrics:** Accuracy was used to measure how often the model's predictions are correct, directly reflecting the goal of classifying breakdowns vs. delays.

**Regularization Techniques**

- **Early Stopping:** Early stopping halts training when the validation loss stops improving, which helps prevent overfitting and supports generalization. The training cell above does not pass an early-stopping callback, so one would have to be added to `model.fit` for this to take effect.

**Model Evaluation**

- **Performance Metrics:** Accuracy was the primary metric because the objective was to correctly classify the bus incidents. It directly measures the model's effectiveness in this binary classification task.
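As a sketch of how the early stopping mentioned under Regularization Techniques could be wired into training, the snippet below attaches a Keras `EarlyStopping` callback to `fit`. The random data, the small stand-in model, and the `patience=3` value are assumptions for illustration, not the notebook's actual configuration.

```python
import numpy as np
from tensorflow import keras

# Hypothetical random data standing in for the scaled training/validation sets.
rng = np.random.default_rng(0)
X_tr, y_tr = rng.random((200, 5)), rng.integers(0, 2, 200)
X_val, y_val = rng.random((50, 5)), rng.integers(0, 2, 50)

demo_model = keras.Sequential([
    keras.Input(shape=(5,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
demo_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stop when validation loss has not improved for `patience` epochs
# (3 is an assumed value) and roll back to the best weights seen.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)

demo_history = demo_model.fit(
    X_tr, y_tr, validation_data=(X_val, y_val),
    epochs=20, batch_size=32, callbacks=[early_stop], verbose=0)
print(len(demo_history.history["loss"]), "epochs run")
```

With the callback in place, `epochs` acts as an upper bound rather than a fixed training length.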

### Answer Here: