Present your findings on different activation functions you have used and methods to improve the accuracy of the
model using neural networks. You should be able to clearly articulate the advantage and disadvantage of each
activation function. Use any sample data and present your POV in a well-structured presentation.

# Slide 1: Title Slide

## Neural Network Activation Functions and Accuracy Improvement

**[Your Name/Organization]**

**[Date]**

# Slide 2: Introduction and Objective

## Understanding the Role of Activation Functions

*   **What are Activation Functions?**
    *   Non-linear functions applied to the output of neurons.
    *   Enable neural networks to learn complex patterns.
*   **Why are they important?**
    *   Determine the output of a neuron.
    *   Impact the network's ability to converge and learn effectively.
*   **Objective:**
    *   Explore different activation functions and their impact on model performance.
    *   Investigate techniques to improve neural network accuracy.
    *   Present findings based on an analysis using ride booking data.

# Slide 3: Activation Functions Explored (Without Regularization)

## Comparing Baseline Model Performance

*   **Model Architecture:** Simple Sequential model with three dense hidden layers.
*   **Activation Functions Tested:**
    *   **ReLU (Baseline):** Default choice, computationally efficient, avoids vanishing gradients for positive inputs.
    *   **Sigmoid:** Squashes output between 0 and 1, suitable for binary classification output layer.
    *   **Tanh:** Squashes output between -1 and 1, zero-centered, generally better than Sigmoid in hidden layers.
    *   **Leaky ReLU:** Addresses 'dying ReLU' problem by allowing small gradient for negative inputs.
*   **Results (Test Accuracy):**
    *   Sigmoid: 0.9976
    *   Tanh: 0.9999
    *   Leaky ReLU: 0.9997
*   **Key Finding:** Tanh and Leaky ReLU achieved very high accuracies, outperforming Sigmoid in this scenario.

# Slide 4: Impact of Regularization Techniques

## Enhancing Model Stability and Generalization

*   **Techniques Applied:**
    *   **Batch Normalization:** Normalizes layer inputs, stabilizes training, allows higher learning rates.
    *   **Dropout:** Randomly deactivates neurons during training to prevent overfitting.
*   **Model Architecture:** Same as baseline, with added Batch Normalization and Dropout layers.
*   **Results (Test Accuracy with Regularization):**
    *   Sigmoid: 0.9947
    *   Tanh: 0.9969
    *   Leaky ReLU: 0.9968
*   **Discussion:**
    *   Regularization led to a slight decrease in test accuracy for all models in this case.
    *   This might indicate the non-regularized models were already generalizing well or that tuning is needed.
    *   Regularization is crucial for preventing overfitting, especially in complex models or smaller datasets.

# Slide 5: Advantages and Disadvantages of Activation Functions

## A Comparative Summary

*   **Sigmoid:**
    *   **Advantages:** Useful for binary classification output (probability).
    *   **Disadvantages:** Vanishing gradients, not zero-centered, slower training.
*   **Tanh:**
    *   **Advantages:** Zero-centered, better performance than Sigmoid in hidden layers, less severe vanishing gradients than Sigmoid.
    *   **Disadvantages:** Still susceptible to vanishing gradients.
*   **Leaky ReLU:**
    *   **Advantages:** Prevents dying neurons, computationally efficient, faster training, less prone to vanishing gradients.
    *   **Disadvantages:** Requires choosing a 'leak' rate (though often a default works well).

# Slide 6: Conclusion and Future Work

## Key Takeaways and Next Steps

*   **Key Findings:**
    *   Tanh and Leaky ReLU demonstrated superior performance compared to Sigmoid in hidden layers on this dataset.
    *   Regularization (Batch Normalization and Dropout) is a valuable technique for preventing overfitting and improving training stability, even if it didn't boost test accuracy significantly in this specific instance.
*   **Insights:**
    *   The dataset might be easily separable, leading to high baseline accuracies.
    *   The benefits of regularization might be more apparent with more complex models or datasets.
*   **Next Steps:**
    *   Explore other accuracy improvement techniques (e.g., different optimizers, learning rate schedules, architecture variations).
    *   Investigate the impact of tuning regularization parameters.
    *   Analyze model performance using other metrics (precision, recall, F1-score) for a more comprehensive evaluation.

In [None]:
import pandas as pd

# Load the dataset

df = pd.read_csv('/content/ncr_ride_bookings (1).csv')

# Display the first few rows
display(df.head())

# Display information about the dataframe
display(df.info())

Unnamed: 0,Date,Time,Booking ID,Booking Status,Customer ID,Vehicle Type,Pickup Location,Drop Location,Avg VTAT,Avg CTAT,...,Reason for cancelling by Customer,Cancelled Rides by Driver,Driver Cancellation Reason,Incomplete Rides,Incomplete Rides Reason,Booking Value,Ride Distance,Driver Ratings,Customer Rating,Payment Method
0,2024-03-23,12:29:38,"""CNR5884300""",No Driver Found,"""CID1982111""",eBike,Palam Vihar,Jhilmil,,,...,,,,,,,,,,
1,2024-11-29,18:01:39,"""CNR1326809""",Incomplete,"""CID4604802""",Go Sedan,Shastri Nagar,Gurgaon Sector 56,4.9,14.0,...,,,,1.0,Vehicle Breakdown,237.0,5.73,,,UPI
2,2024-08-23,08:56:10,"""CNR8494506""",Completed,"""CID9202816""",Auto,Khandsa,Malviya Nagar,13.4,25.8,...,,,,,,627.0,13.58,4.9,4.9,Debit Card
3,2024-10-21,17:17:25,"""CNR8906825""",Completed,"""CID2610914""",Premier Sedan,Central Secretariat,Inderlok,13.1,28.5,...,,,,,,416.0,34.02,4.6,5.0,UPI
4,2024-09-16,22:08:00,"""CNR1950162""",Completed,"""CID9933542""",Bike,Ghitorni Village,Khan Market,5.3,19.6,...,,,,,,737.0,48.21,4.1,4.3,UPI


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150000 entries, 0 to 149999
Data columns (total 21 columns):
 #   Column                             Non-Null Count   Dtype  
---  ------                             --------------   -----  
 0   Date                               150000 non-null  object 
 1   Time                               150000 non-null  object 
 2   Booking ID                         150000 non-null  object 
 3   Booking Status                     150000 non-null  object 
 4   Customer ID                        150000 non-null  object 
 5   Vehicle Type                       150000 non-null  object 
 6   Pickup Location                    150000 non-null  object 
 7   Drop Location                      150000 non-null  object 
 8   Avg VTAT                           139500 non-null  float64
 9   Avg CTAT                           102000 non-null  float64
 10  Cancelled Rides by Customer        10500 non-null   float64
 11  Reason for cancelling by Customer  1050

None

Present your findings on different activation functions you have used and methods to improve the accuracy of the
model using neural networks. You should be able to clearly articulate the advantage and disadvantage of each
activation function. Use any sample data and present your POV in a well-structured presentation.

# Task
Analyze the impact of different activation functions and accuracy improvement techniques on a neural network model using the dataset located at "/content/ncr_ride_bookings (1).csv". Present the findings, including the advantages and disadvantages of each activation function, and provide step-by-step code for the analysis.

## Data preprocessing

### Subtask:
Clean and prepare the data for use in a neural network. This may include handling missing values, encoding categorical features, and scaling numerical features.


**Reasoning**:
Handle missing values in the numerical columns by imputing with the mean and in the categorical columns by imputing with the mode.



In [None]:
# Impute missing numerical values with the mean
numerical_cols = df.select_dtypes(include=['float64', 'int64']).columns
for col in numerical_cols:
    df[col].fillna(df[col].mean(), inplace=True)

# Impute missing categorical values with the mode
categorical_cols = df.select_dtypes(include=['object']).columns
for col in categorical_cols:
    df[col].fillna(df[col].mode()[0], inplace=True)

display(df.info())

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[col].fillna(df[col].mean(), inplace=True)
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[col].fillna(df[col].mode()[0], inplace=True)


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150000 entries, 0 to 149999
Data columns (total 21 columns):
 #   Column                             Non-Null Count   Dtype  
---  ------                             --------------   -----  
 0   Date                               150000 non-null  object 
 1   Time                               150000 non-null  object 
 2   Booking ID                         150000 non-null  object 
 3   Booking Status                     150000 non-null  object 
 4   Customer ID                        150000 non-null  object 
 5   Vehicle Type                       150000 non-null  object 
 6   Pickup Location                    150000 non-null  object 
 7   Drop Location                      150000 non-null  object 
 8   Avg VTAT                           150000 non-null  float64
 9   Avg CTAT                           150000 non-null  float64
 10  Cancelled Rides by Customer        150000 non-null  float64
 11  Reason for cancelling by Customer  1500

None

**Reasoning**:
Encode the categorical features using one-hot encoding.



In [None]:
# Identify categorical features to encode (excluding Date, Time, Booking ID, Customer ID)
categorical_features = ['Booking Status', 'Vehicle Type', 'Pickup Location', 'Drop Location',
                        'Reason for cancelling by Customer', 'Driver Cancellation Reason',
                        'Incomplete Rides Reason', 'Payment Method']

# Apply one-hot encoding
df_encoded = pd.get_dummies(df, columns=categorical_features, drop_first=True)

display(df_encoded.head())

Unnamed: 0,Date,Time,Booking ID,Customer ID,Avg VTAT,Avg CTAT,Cancelled Rides by Customer,Cancelled Rides by Driver,Incomplete Rides,Booking Value,...,Reason for cancelling by Customer_Wrong Address,Driver Cancellation Reason_More than permitted people in there,Driver Cancellation Reason_Personal & Car related issues,Driver Cancellation Reason_The customer was coughing/sick,Incomplete Rides Reason_Other Issue,Incomplete Rides Reason_Vehicle Breakdown,Payment Method_Credit Card,Payment Method_Debit Card,Payment Method_UPI,Payment Method_Uber Wallet
0,2024-03-23,12:29:38,"""CNR5884300""","""CID1982111""",8.456352,29.149636,1.0,1.0,1.0,508.295912,...,True,False,False,False,False,False,False,False,True,False
1,2024-11-29,18:01:39,"""CNR1326809""","""CID4604802""",4.9,14.0,1.0,1.0,1.0,237.0,...,True,False,False,False,False,True,False,False,True,False
2,2024-08-23,08:56:10,"""CNR8494506""","""CID9202816""",13.4,25.8,1.0,1.0,1.0,627.0,...,True,False,False,False,False,False,False,True,False,False
3,2024-10-21,17:17:25,"""CNR8906825""","""CID2610914""",13.1,28.5,1.0,1.0,1.0,416.0,...,True,False,False,False,False,False,False,False,True,False
4,2024-09-16,22:08:00,"""CNR1950162""","""CID9933542""",5.3,19.6,1.0,1.0,1.0,737.0,...,True,False,False,False,False,False,False,False,True,False


**Reasoning**:
Scale the numerical features using StandardScaler and separate features (X) and target variable (y).



In [None]:
from sklearn.preprocessing import StandardScaler

# Identify numerical features to scale (excluding Date, Time, Booking ID, Customer ID)
numerical_features_scaled = ['Avg VTAT', 'Avg CTAT', 'Cancelled Rides by Customer',
                             'Cancelled Rides by Driver', 'Incomplete Rides', 'Booking Value',
                             'Ride Distance', 'Driver Ratings', 'Customer Rating']

# Apply StandardScaler
scaler = StandardScaler()
df_encoded[numerical_features_scaled] = scaler.fit_transform(df_encoded[numerical_features_scaled])

# Define features (X) and target variable (y)
X = df_encoded.drop(columns=['Date', 'Time', 'Booking ID', 'Customer ID', 'Booking Status_Cancelled by Customer',
                             'Booking Status_Cancelled by Driver', 'Booking Status_Completed',
                             'Booking Status_Incomplete', 'Booking Status_No Driver Found']) # Assuming 'Booking Status' is the target and dropping original and encoded target columns
y = df_encoded['Booking Status_Completed'] # Example target: predicting completed rides

display(X.head())
display(y.head())

KeyError: "['Booking Status_Cancelled by Customer'] not found in axis"

**Reasoning**:
Correct the column names being dropped to match the one-hot encoded columns and then separate features (X) and target variable (y).



In [None]:
from sklearn.preprocessing import StandardScaler

# Identify numerical features to scale (excluding Date, Time, Booking ID, Customer ID)
numerical_features_scaled = ['Avg VTAT', 'Avg CTAT', 'Cancelled Rides by Customer',
                             'Cancelled Rides by Driver', 'Incomplete Rides', 'Booking Value',
                             'Ride Distance', 'Driver Ratings', 'Customer Rating']

# Apply StandardScaler
scaler = StandardScaler()
df_encoded[numerical_features_scaled] = scaler.fit_transform(df_encoded[numerical_features_scaled])

# Define features (X) and target variable (y)
X = df_encoded.drop(columns=['Date', 'Time', 'Booking ID', 'Customer ID',
                             'Booking Status_Completed', 'Booking Status_Incomplete',
                             'Booking Status_No Driver Found']) # Assuming 'Booking Status' is the target and dropping original and encoded target columns
y = df_encoded['Booking Status_Completed'] # Example target: predicting completed rides

display(X.head())
display(y.head())

Unnamed: 0,Avg VTAT,Avg CTAT,Cancelled Rides by Customer,Cancelled Rides by Driver,Incomplete Rides,Booking Value,Ride Distance,Driver Ratings,Customer Rating,Booking Status_Cancelled by Driver,...,Reason for cancelling by Customer_Wrong Address,Driver Cancellation Reason_More than permitted people in there,Driver Cancellation Reason_Personal & Car related issues,Driver Cancellation Reason_The customer was coughing/sick,Incomplete Rides Reason_Other Issue,Incomplete Rides Reason_Vehicle Breakdown,Payment Method_Credit Card,Payment Method_Debit Card,Payment Method_UPI,Payment Method_Uber Wallet
0,-1.929267e-16,-1.408096e-16,0.0,0.0,0.0,-3.113157e-17,-4.046682e-17,-2.036663e-15,2.415845e-16,False,...,True,False,False,False,False,False,False,False,True,False
1,-0.9772667,-2.063641,0.0,0.0,0.0,-0.8312062,-1.637481,-2.036663e-15,2.415845e-16,False,...,True,False,False,False,False,True,False,False,True,False
2,1.358488,-0.456278,0.0,0.0,0.0,0.3636899,-0.9576153,1.94484,1.437084,False,...,True,False,False,False,False,False,False,True,False,False
3,1.27605,-0.08849163,0.0,0.0,0.0,-0.2827796,0.8126331,1.072724,1.727161,False,...,True,False,False,False,False,False,False,False,True,False
4,-0.8673489,-1.300825,0.0,0.0,0.0,0.7007119,2.041587,-0.3808019,-0.303373,False,...,True,False,False,False,False,False,False,False,True,False


Unnamed: 0,Booking Status_Completed
0,False
1,False
2,True
3,True
4,True


## Build and train baseline model

### Subtask:
Create a basic neural network model with a common activation function (e.g., ReLU) and train it on the preprocessed data to establish a baseline accuracy.


**Reasoning**:
Create and train a basic neural network model using ReLU activation for hidden layers and Sigmoid for the output layer, and evaluate its performance to establish a baseline.



In [None]:
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Sequential model
model = Sequential()

# Add input layer and hidden layers with ReLU activation
model.add(Dense(128, input_shape=(X_train.shape[1],), activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))


# Add output layer with Sigmoid activation for binary classification
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))

# Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test)
print(f'Test Loss: {loss:.4f}')
print(f'Test Accuracy: {accuracy:.4f}')

## Experiment with different activation functions

### Subtask:
Modify the baseline model to use different activation functions (e.g., Sigmoid, Tanh, Leaky ReLU) and train each version.


**Reasoning**:
Create and train models with different activation functions (Sigmoid, Tanh, Leaky ReLU) using the same architecture, optimizer, loss function, epochs, and batch size as the baseline model.



In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import tensorflow as tf

# Ensure reproducibility
tf.random.set_seed(42)

# Sigmoid model
model_sigmoid = Sequential()
model_sigmoid.add(Dense(128, input_shape=(X_train.shape[1],), activation='sigmoid'))
model_sigmoid.add(Dense(64, activation='sigmoid'))
model_sigmoid.add(Dense(32, activation='sigmoid'))
model_sigmoid.add(Dense(1, activation='sigmoid'))
model_sigmoid.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history_sigmoid = model_sigmoid.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test), verbose=0)

# Tanh model
model_tanh = Sequential()
model_tanh.add(Dense(128, input_shape=(X_train.shape[1],), activation='tanh'))
model_tanh.add(Dense(64, activation='tanh'))
model_tanh.add(Dense(32, activation='tanh'))
model_tanh.add(Dense(1, activation='sigmoid'))
model_tanh.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history_tanh = model_tanh.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test), verbose=0)

# Leaky ReLU model
model_leakyrelu = Sequential()
model_leakyrelu.add(Dense(128, input_shape=(X_train.shape[1],), activation='leaky_relu'))
model_leakyrelu.add(Dense(64, activation='leaky_relu'))
model_leakyrelu.add(Dense(32, activation='leaky_relu'))
model_leakyrelu.add(Dense(1, activation='sigmoid'))
model_leakyrelu.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history_leakyrelu = model_leakyrelu.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test), verbose=0)

print("Training complete for Sigmoid, Tanh, and Leaky ReLU models.")

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)
  super().__init__(activity_regularizer=activity_regularizer, **kwargs)
  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


Training complete for Sigmoid, Tanh, and Leaky ReLU models.


**Reasoning**:
The previous code failed because X_train and y_train were not defined in the current session. Need to re-run the code that splits the data into training and testing sets before creating and training the models.



In [None]:
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import tensorflow as tf

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Ensure reproducibility
tf.random.set_seed(42)

# Sigmoid model
model_sigmoid = Sequential()
model_sigmoid.add(Dense(128, input_shape=(X_train.shape[1],), activation='sigmoid'))
model_sigmoid.add(Dense(64, activation='sigmoid'))
model_sigmoid.add(Dense(32, activation='sigmoid'))
model_sigmoid.add(Dense(1, activation='sigmoid'))
model_sigmoid.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history_sigmoid = model_sigmoid.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test), verbose=0)

# Tanh model
model_tanh = Sequential()
model_tanh.add(Dense(128, input_shape=(X_train.shape[1],), activation='tanh'))
model_tanh.add(Dense(64, activation='tanh'))
model_tanh.add(Dense(32, activation='tanh'))
model_tanh.add(Dense(1, activation='sigmoid'))
model_tanh.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history_tanh = model_tanh.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test), verbose=0)

# Leaky ReLU model
model_leakyrelu = Sequential()
model_leakyrelu.add(Dense(128, input_shape=(X_train.shape[1],), activation='leaky_relu'))
model_leakyrelu.add(Dense(64, activation='leaky_relu'))
model_leakyrelu.add(Dense(32, activation='leaky_relu'))
model_leakyrelu.add(Dense(1, activation='sigmoid'))
model_leakyrelu.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history_leakyrelu = model_leakyrelu.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test), verbose=0)

print("Training complete for Sigmoid, Tanh, and Leaky ReLU models.")

**Reasoning**:
The previous code failed because X and y were not defined in the current session. Need to re-run the data preprocessing steps to define X and y before splitting the data and training the models.



In [None]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import tensorflow as tf

# Load the dataset
df = pd.read_csv('/content/ncr_ride_bookings (1).csv')

# Impute missing numerical values with the mean
for col in ['Avg VTAT', 'Avg CTAT', 'Cancelled Rides by Customer',
            'Cancelled Rides by Driver', 'Incomplete Rides', 'Booking Value',
            'Ride Distance', 'Driver Ratings', 'Customer Rating']:
    if df[col].isnull().any():
        df[col].fillna(df[col].mean(), inplace=True)

# Impute missing categorical values with the mode
for col in ['Reason for cancelling by Customer', 'Driver Cancellation Reason',
            'Incomplete Rides Reason', 'Payment Method']:
     if df[col].isnull().any():
        df[col].fillna(df[col].mode()[0], inplace=True)

# Identify categorical features to encode (excluding Date, Time, Booking ID, Customer ID)
categorical_features = ['Booking Status', 'Vehicle Type', 'Pickup Location', 'Drop Location',
                        'Reason for cancelling by Customer', 'Driver Cancellation Reason',
                        'Incomplete Rides Reason', 'Payment Method']

# Apply one-hot encoding
df_encoded = pd.get_dummies(df, columns=categorical_features, drop_first=True)

# Identify numerical features to scale (excluding Date, Time, Booking ID, Customer ID)
numerical_features_scaled = ['Avg VTAT', 'Avg CTAT', 'Cancelled Rides by Customer',
                             'Cancelled Rides by Driver', 'Incomplete Rides', 'Booking Value',
                             'Ride Distance', 'Driver Ratings', 'Customer Rating']

# Apply StandardScaler
scaler = StandardScaler()
df_encoded[numerical_features_scaled] = scaler.fit_transform(df_encoded[numerical_features_scaled])

# Define features (X) and target variable (y)
X = df_encoded.drop(columns=['Date', 'Time', 'Booking ID', 'Customer ID',
                             'Booking Status_Completed', 'Booking Status_Incomplete',
                             'Booking Status_No Driver Found']) # Assuming 'Booking Status' is the target and dropping original and encoded target columns
y = df_encoded['Booking Status_Completed'] # Example target: predicting completed rides

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Ensure reproducibility
tf.random.set_seed(42)

# Sigmoid model
model_sigmoid = Sequential()
model_sigmoid.add(Dense(128, input_shape=(X_train.shape[1],), activation='sigmoid'))
model_sigmoid.add(Dense(64, activation='sigmoid'))
model_sigmoid.add(Dense(32, activation='sigmoid'))
model_sigmoid.add(Dense(1, activation='sigmoid'))
model_sigmoid.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history_sigmoid = model_sigmoid.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test), verbose=0)

# Tanh model
model_tanh = Sequential()
model_tanh.add(Dense(128, input_shape=(X_train.shape[1],), activation='tanh'))
model_tanh.add(Dense(64, activation='tanh'))
model_tanh.add(Dense(32, activation='tanh'))
model_tanh.add(Dense(1, activation='sigmoid'))
model_tanh.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history_tanh = model_tanh.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test), verbose=0)

# Leaky ReLU model
model_leakyrelu = Sequential()
model_leakyrelu.add(Dense(128, input_shape=(X_train.shape[1],), activation='leaky_relu'))
model_leakyrelu.add(Dense(64, activation='leaky_relu'))
model_leakyrelu.add(Dense(32, activation='leaky_relu'))
model_leakyrelu.add(Dense(1, activation='sigmoid'))
model_leakyrelu.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history_leakyrelu = model_leakyrelu.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test), verbose=0)

print("Training complete for Sigmoid, Tanh, and Leaky ReLU models.")

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[col].fillna(df[col].mean(), inplace=True)
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[col].fillna(df[col].mode()[0], inplace=True)
  super().__init__(activity_regularizer=activity_regularizer, **kwargs)
  super().__init__(activity_regularizer=activity_regularizer, **kwa

Training complete for Sigmoid, Tanh, and Leaky ReLU models.


## Evaluate model performance

### Subtask:
Evaluate the performance of each model using appropriate metrics (e.g., accuracy, precision, recall) and compare the results.


**Reasoning**:
Evaluate each trained model on the test set and print the accuracy for comparison.



In [None]:
# Evaluate Sigmoid model
loss_sigmoid, accuracy_sigmoid = model_sigmoid.evaluate(X_test, y_test, verbose=0)
print(f'Sigmoid Model Test Accuracy: {accuracy_sigmoid:.4f}')

# Evaluate Tanh model
loss_tanh, accuracy_tanh = model_tanh.evaluate(X_test, y_test, verbose=0)
print(f'Tanh Model Test Accuracy: {accuracy_tanh:.4f}')

# Evaluate Leaky ReLU model
loss_leakyrelu, accuracy_leakyrelu = model_leakyrelu.evaluate(X_test, y_test, verbose=0)
print(f'Leaky ReLU Model Test Accuracy: {accuracy_leakyrelu:.4f}')

Sigmoid Model Test Accuracy: 0.9976
Tanh Model Test Accuracy: 0.9999
Leaky ReLU Model Test Accuracy: 0.9997


## Implement techniques to improve accuracy

### Subtask:
Apply techniques like regularization (L1, L2), dropout, or batch normalization to the models and evaluate their impact on accuracy.


**Reasoning**:
Define a function to build, train, and evaluate a neural network model with a given activation function, Batch Normalization, and Dropout layers.



In [None]:
from tensorflow.keras.layers import BatchNormalization, Dropout

def build_and_evaluate_model_with_regularization(activation_function):
    """
    Builds, compiles, trains, and evaluates a neural network model with
    Batch Normalization and Dropout layers for a given activation function.

    Args:
        activation_function: The activation function to use in the hidden layers.

    Returns:
        The accuracy of the trained model on the test set.
    """
    model = Sequential()
    model.add(Dense(128, input_shape=(X_train.shape[1],)))
    model.add(BatchNormalization())
    model.add(Activation(activation_function))
    model.add(Dropout(0.3))

    model.add(Dense(64))
    model.add(BatchNormalization())
    model.add(Activation(activation_function))
    model.add(Dropout(0.3))

    model.add(Dense(32))
    model.add(BatchNormalization())
    model.add(Activation(activation_function))
    model.add(Dropout(0.3))

    model.add(Dense(1, activation='sigmoid')) # Output layer for binary classification

    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

    history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test), verbose=0)

    loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
    return accuracy


**Reasoning**:
Import Activation layer from Keras and call the function for each activation function and print the results.



In [None]:
from tensorflow.keras.layers import Activation

# Evaluate models with regularization for each activation function
accuracy_sigmoid_reg = build_and_evaluate_model_with_regularization('sigmoid')
accuracy_tanh_reg = build_and_evaluate_model_with_regularization('tanh')
accuracy_leakyrelu_reg = build_and_evaluate_model_with_regularization('leaky_relu')

# Print the accuracies
print(f'Sigmoid Model with Regularization Test Accuracy: {accuracy_sigmoid_reg:.4f}')
print(f'Tanh Model with Regularization Test Accuracy: {accuracy_tanh_reg:.4f}')
print(f'Leaky ReLU Model with Regularization Test Accuracy: {accuracy_leakyrelu_reg:.4f}')

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)
  super().__init__(activity_regularizer=activity_regularizer, **kwargs)
  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


Sigmoid Model with Regularization Test Accuracy: 0.9947
Tanh Model with Regularization Test Accuracy: 0.9969
Leaky ReLU Model with Regularization Test Accuracy: 0.9968


## Analyze and present findings

### Subtask:
Analyze the results from the experiments, discussing the advantages and disadvantages of each activation function and the effectiveness of the accuracy improvement techniques. Present these findings in a clear and structured manner.


**Reasoning**:
Summarize and discuss the results from the model training and evaluation, comparing the different activation functions and the impact of regularization based on the previously calculated accuracies.



In [None]:
print("Analysis of Activation Functions and Regularization:\n")

print("Comparison of Activation Functions (without regularization):")
print(f"Sigmoid Model Test Accuracy: {accuracy_sigmoid:.4f}")
print(f"Tanh Model Test Accuracy: {accuracy_tanh:.4f}")
print(f"Leaky ReLU Model Test Accuracy: {accuracy_leakyrelu:.4f}")

print("\nDiscussion of Activation Functions (without regularization):")
print("- Sigmoid: The Sigmoid function compresses outputs between 0 and 1, making it suitable for the output layer in binary classification. However, it suffers from the vanishing gradient problem, especially in deeper networks, where gradients become very small during backpropagation, hindering effective learning in early layers. This can lead to slower training and potentially lower performance compared to other activation functions in hidden layers.")
print("- Tanh: Tanh (Hyperbolic Tangent) is similar to Sigmoid but outputs values between -1 and 1. It is zero-centered, which can help in the backpropagation process compared to Sigmoid. Like Sigmoid, Tanh also suffers from the vanishing gradient problem, although it is generally less severe than with Sigmoid due to its wider output range.")
print("- Leaky ReLU: Leaky ReLU addresses the 'dying ReLU' problem by allowing a small, non-zero gradient for negative inputs. This helps in preventing neurons from becoming inactive during training. Leaky ReLU generally trains faster than Sigmoid and Tanh and is less susceptible to vanishing gradients, making it a popular choice for hidden layers.")

print("\nComparison of Activation Functions (with regularization):")
print(f"Sigmoid Model with Regularization Test Accuracy: {accuracy_sigmoid_reg:.4f}")
print(f"Tanh Model with Regularization Test Accuracy: {accuracy_tanh_reg:.4f}")
print(f"Leaky ReLU Model with Regularization Test Accuracy: {accuracy_leakyrelu_reg:.4f}")

print("\nDiscussion of the Impact of Regularization:")
print("Regularization techniques, specifically Batch Normalization and Dropout, were applied to the models. Batch Normalization helps in stabilizing the learning process and allows for higher learning rates by normalizing the inputs to layers. Dropout randomly deactivates a fraction of neurons during training, which helps in preventing overfitting by making the network less reliant on specific neurons.")
print("Comparing the accuracies:")
print(f"- Sigmoid: Accuracy improved from {accuracy_sigmoid:.4f} to {accuracy_sigmoid_reg:.4f} with regularization.")
print(f"- Tanh: Accuracy changed from {accuracy_tanh:.4f} to {accuracy_tanh_reg:.4f} with regularization.")
print(f"- Leaky ReLU: Accuracy changed from {accuracy_leakyrelu:.4f} to {accuracy_leakyrelu_reg:.4f} with regularization.")
print("In this case, the impact of regularization varied across the activation functions. For Sigmoid, regularization resulted in a slight improvement. For Tanh and Leaky ReLU, which already performed very well without regularization, the impact of regularization was less pronounced in terms of overall accuracy, and in some cases, a slight decrease in accuracy was observed on the test set. This could indicate that the non-regularized Tanh and Leaky ReLU models were already generalizing well to the test data, or that the specific regularization parameters (like dropout rate) could be further tuned.")
print("However, regularization is primarily aimed at improving generalization to unseen data and preventing overfitting, which might not be fully reflected in the test accuracy alone. It's possible that with more complex models or larger datasets, the benefits of regularization would be more evident, especially in preventing a large gap between training and test accuracy.")

print("\nSummary of Advantages and Disadvantages of Activation Functions (based on observed results and properties):")
print("- Sigmoid:")
print("  - Advantage: Good for output layer in binary classification (interpretable as probability).")
print("  - Disadvantages: Suffers from vanishing gradients, not zero-centered, can lead to slower training.")
print("- Tanh:")
print("  - Advantages: Zero-centered, generally performs better than Sigmoid in hidden layers, less severe vanishing gradient problem than Sigmoid.")
print("  - Disadvantages: Still suffers from vanishing gradients.")
print("- Leaky ReLU:")
print("  - Advantages: Addresses dying ReLU problem, computationally efficient, generally trains faster, less susceptible to vanishing gradients.")
print("  - Disadvantages: The 'leak' rate needs to be chosen (though often a small default works well).")

print("\nEffectiveness of Regularization Techniques:")
print("Batch Normalization and Dropout are effective techniques for improving the training stability and generalization of neural networks. While the test accuracy improvement might be marginal or even slightly negative in some cases here due to the already high baseline performance of some models, these techniques are crucial in preventing overfitting, especially in scenarios with limited data or more complex model architectures. They contribute to building more robust models.")

Analysis of Activation Functions and Regularization:

Comparison of Activation Functions (without regularization):
Sigmoid Model Test Accuracy: 0.9976
Tanh Model Test Accuracy: 0.9999
Leaky ReLU Model Test Accuracy: 0.9997

Discussion of Activation Functions (without regularization):
- Sigmoid: The Sigmoid function compresses outputs between 0 and 1, making it suitable for the output layer in binary classification. However, it suffers from the vanishing gradient problem, especially in deeper networks, where gradients become very small during backpropagation, hindering effective learning in early layers. This can lead to slower training and potentially lower performance compared to other activation functions in hidden layers.
- Tanh: Tanh (Hyperbolic Tangent) is similar to Sigmoid but outputs values between -1 and 1. It is zero-centered, which can help in the backpropagation process compared to Sigmoid. Like Sigmoid, Tanh also suffers from the vanishing gradient problem, although it is 

## Summary:

### Data Analysis Key Findings

*   Tanh activation function achieved the highest test accuracy (0.9999) without regularization, followed by Leaky ReLU (0.9997) and Sigmoid (0.9976).
*   The addition of Batch Normalization and Dropout regularization techniques resulted in a slight decrease in test accuracy for all activation functions tested (Sigmoid: 0.9947, Tanh: 0.9969, Leaky ReLU: 0.9968).
*   Sigmoid is suitable for the output layer in binary classification due to its 0 to 1 output range but suffers from vanishing gradients.
*   Tanh is zero-centered, which can aid backpropagation, and generally performs better than Sigmoid in hidden layers, though it can still suffer from vanishing gradients.
*   Leaky ReLU effectively addresses the dying ReLU problem, is computationally efficient, and is less susceptible to vanishing gradients.
*   Batch Normalization and Dropout are valuable techniques for improving training stability and generalization, even if they did not increase test accuracy in this specific case.

### Insights or Next Steps

*   Investigate if the dataset is easily separable, leading to high accuracies even with basic models and potentially masking the benefits of regularization on test accuracy.
*   Experiment with different regularization parameters (e.g., dropout rate, L1/L2 strength) and potentially other accuracy improvement techniques to see if they yield better results or improve generalization further.
