<a href="https://colab.research.google.com/github/hansensean123-cell/Sean-Hansen/blob/main/Assignments/assignment_12_bayes_svm_neural.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment 12: Predicting Hotel Booking Cancellations  
## Models: Na√Øve Bayes, Support Vector Machine (SVM), and Neural Network

**Objectives:**
- Understand how to use classification models (Na√Øve Bayes, SVM, Neural Networks) to predict hotel cancellations.
- Compare models in terms of accuracy, complexity, and business relevance.
- Interpret and communicate model results from a business perspective.

## Business Scenario

You work as a data analyst for a hospitality group that manages both **Resort** and **City Hotels**. One major challenge in operations is the unpredictability of **booking cancellations**, which affects staffing, inventory, and revenue planning.

You‚Äôve been asked to use historical booking data to predict whether a future booking will be canceled. Your insights will help management plan more effectively.


Your task is to:
1. Build and evaluate three models: Na√Øve Bayes, SVM, and Neural Network.
2. Compare performance.
3. Recommend which model is best suited for the business needs.

<a href="https://colab.research.google.com/github/Stan-Pugsley/is_4487_base/blob/main/Assignments/assignment_12_bayes_svm_neural.ipynb" target="_parent">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>


## Dataset Description: Hotel Bookings

This dataset contains booking information for two types of hotels: a **city hotel** and a **resort hotel**. Each record corresponds to a single booking and includes various details about the reservation, customer demographics, booking source, and whether the booking was canceled.

**Source**: [GitHub - TidyTuesday: Hotel Bookings](https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-02-11/readme.md)

### Key Use Cases
- Understand customer booking behavior
- Explore factors related to cancellations
- Segment guests based on booking characteristics
- Compare city vs. resort hotel performance

### Data Dictionary

| Variable | Type | Description |
|----------|------|-------------|
| `hotel` | character | Hotel type: City or Resort |
| `is_canceled` | integer | 1 = Canceled, 0 = Not Canceled |
| `lead_time` | integer | Days between booking and arrival |
| `arrival_date_year` | integer | Year of arrival |
| `arrival_date_month` | character | Month of arrival |
| `stays_in_weekend_nights` | integer | Nights stayed on weekends |
| `stays_in_week_nights` | integer | Nights stayed on weekdays |
| `adults` | integer | Number of adults |
| `children` | integer | Number of children |
| `babies` | integer | Number of babies |
| `meal` | character | Type of meal booked |
| `country` | character | Country code of origin |
| `market_segment` | character | Booking source (e.g., Direct, Online TA) |
| `distribution_channel` | character | Booking channel used |
| `is_repeated_guest` | integer | 1 = Repeated guest, 0 = New guest |
| `previous_cancellations` | integer | Past booking cancellations |
| `previous_bookings_not_canceled` | integer | Past bookings not canceled |
| `reserved_room_type` | character | Initially reserved room type |
| `assigned_room_type` | character | Room type assigned at check-in |
| `booking_changes` | integer | Number of booking modifications |
| `deposit_type` | character | Deposit type (No Deposit, Non-Refund, etc.) |
| `agent` | character | Agent ID who made the booking |
| `company` | character | Company ID (if booking through company) |
| `days_in_waiting_list` | integer | Days on the waiting list |
| `customer_type` | character | Booking type: Contract, Transient, etc. |
| `adr` | float | Average Daily Rate (price per night) |
| `required_car_parking_spaces` | integer | Requested parking spots |
| `total_of_special_requests` | integer | Number of special requests made |
| `reservation_status` | character | Final status (Canceled, No-Show, Check-Out) |
| `reservation_status_date` | date | Date of the last status update |

This dataset is ideal for classification, segmentation, and trend analysis exercises.


## 1. Load and Prepare the Hotel Booking Dataset

**Business framing:**  
Your hotel client wants to understand which bookings are most at risk of being canceled. But before modeling, your job is to prepare the data to ensure clean and reliable input.

### Do the following:
- Load the `hotels.csv` file from https://raw.githubusercontent.com/Stan-Pugsley/is_4487_base/refs/heads/main/DataSets/hotels.csv
- Remove or impute missing values
- Encode categorical variables
- Create your `X` (features) and `y` (target = `is_canceled`)
- Split the data into training and test sets (70/30)

### In Your Response:
1. How many total rows and columns are in the dataset?
2. What types of features (categorical, numerical) are included?
3. What steps did you take to clean or prepare the data?


In [3]:
# Add code here üîß
import pandas as pd

df = pd.read_csv('/content/hotels.csv')
print("DataFrame loaded successfully. First 5 rows:")
print(df.head())

print("Missing values before imputation:")
print(df.isnull().sum()[df.isnull().sum() > 0])

df['children'] = df['children'].fillna(df['children'].mode()[0])
df['country'] = df['country'].fillna(df['country'].mode()[0])
df['agent'] = df['agent'].fillna(0)
df['company'] = df['company'].fillna(0)

print("Missing values after imputation:")
print(df.isnull().sum()[df.isnull().sum() > 0])

print("Categorical columns identified:")
categorical_cols = df.select_dtypes(include=['object']).columns
print(categorical_cols)

df['reservation_status_date'] = pd.to_datetime(df['reservation_status_date'])

# Drop reservation_status and reservation_status_date as they are direct outcomes of cancellation or date of final status
df = df.drop(columns=['reservation_status', 'reservation_status_date'])

categorical_cols = df.select_dtypes(include=['object']).columns

print("Categorical columns to encode:")
print(categorical_cols)

df_encoded = pd.get_dummies(df, columns=categorical_cols, drop_first=True)

print("DataFrame after one-hot encoding. First 5 rows:")
print(df_encoded.head())
print("Shape after encoding:", df_encoded.shape)

from sklearn.model_selection import train_test_split

X = df_encoded.drop('is_canceled', axis=1)
y = df_encoded['is_canceled']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

print("Shape of X_train:", X_train.shape)
print("Shape of X_test:", X_test.shape)
print("Shape of y_train:", y_train.shape)
print("Shape of y_test:", y_test.shape)

DataFrame loaded successfully. First 5 rows:
          hotel  is_canceled  lead_time  arrival_date_year arrival_date_month  \
0  Resort Hotel            0        342               2015               July   
1  Resort Hotel            0        737               2015               July   
2  Resort Hotel            0          7               2015               July   
3  Resort Hotel            0         13               2015               July   
4  Resort Hotel            0         14               2015               July   

   arrival_date_week_number  arrival_date_day_of_month  \
0                        27                          1   
1                        27                          1   
2                        27                          1   
3                        27                          1   
4                        27                          1   

   stays_in_weekend_nights  stays_in_week_nights  adults  ...  deposit_type  \
0                        0              

### ‚úçÔ∏è Your Response: üîß
The dataset initially contained 119,390 rows and 32 columns. After one-hot encoding and dropping 'reservation_status' and 'reservation_status_date', the encoded DataFrame (df_encoded) has 119,390 rows and 248 columns. The dataset includes both numerical and categorical features. Numerical features include lead_time, arrival_date_year, arrival_date_week_number, arrival_date_day_of_month, stays_in_weekend_nights, stays_in_week_nights, adults, children, babies, is_repeated_guest, previous_cancellations, previous_bookings_not_canceled, booking_changes, agent, company, days_in_waiting_list, adr, required_car_parking_spaces, and total_of_special_requests. Categorical features, before encoding, included hotel, arrival_date_month, meal, country, market_segment, distribution_channel, reserved_room_type, assigned_room_type, deposit_type, customer_type, reservation_status, and reservation_status_date. The following steps were taken to clean and prepare the data: Missing Value Imputation: Missing values in children and country columns were imputed with their respective modes. Missing values in agent and company columns were filled with 0. Feature Engineering: The reservation_status_date column was converted to datetime objects. Column Dropping: The reservation_status and reservation_status_date columns were dropped because they directly reflect the outcome (cancellation status) and are not suitable as predictive features. Categorical Encoding: All remaining categorical columns were converted into numerical format using one-hot encoding (pd.get_dummies) with drop_first=True to avoid multicollinearity.

## 2. Build a Na√Øve Bayes Model

**Business framing:**  
Na√Øve Bayes is a quick, baseline model often used for early testing or simple classification problems.

### Do the following:
- Train a Na√Øve Bayes classifier on your training data
- Use it to predict on your test data
- Print a classification report and confusion matrix

### In Your Response:
1. How well does the model perform?  And what metric is best used to judge the performance?
2. Where might this model be useful for the hotel (e.g. real-time alerts, operational decisions)?


In [6]:
from sklearn.naive_bayes import GaussianNB

# Instantiate the Gaussian Naive Bayes model
nb_model = GaussianNB()

# Train the model
nb_model.fit(X_train, y_train)

print("Na√Øve Bayes model trained successfully.")

# Make predictions on the test data
y_pred_nb = nb_model.predict(X_test)

from sklearn.metrics import classification_report, confusion_matrix

print("Classification Report for Na√Øve Bayes Model:")
print(classification_report(y_test, y_pred_nb))

print("\nConfusion Matrix for Na√Øve Bayes Model:")
print(confusion_matrix(y_test, y_pred_nb))

Na√Øve Bayes model trained successfully.
Classification Report for Na√Øve Bayes Model:
              precision    recall  f1-score   support

           0       0.86      0.33      0.47     22478
           1       0.45      0.91      0.60     13339

    accuracy                           0.54     35817
   macro avg       0.65      0.62      0.54     35817
weighted avg       0.71      0.54      0.52     35817


Confusion Matrix for Na√Øve Bayes Model:
[[ 7308 15170]
 [ 1172 12167]]


### ‚úçÔ∏è Your Response: üîß
Model Performance and Metric: The model shows an overall accuracy of 0.54. However, for predicting hotel cancellations, recall for the 'canceled' class (class 1) is a more critical metric than overall accuracy. The recall for cancellations (class 1) is 0.91, meaning the model correctly identified 91% of actual cancellations. The precision for cancellations (class 1) is 0.45, indicating that when the model predicts a cancellation, it's correct about 45% of the time. The model is quite good at catching cancellations (high recall), but it also has a relatively high number of false positives (low precision for class 1).

The confusion matrix further illustrates this:

True Negatives (0,0): 7308 bookings were correctly predicted as not canceled.
False Positives (0,1): 15170 bookings were incorrectly predicted as canceled (these were not canceled in reality).
False Negatives (1,0): 1172 bookings were incorrectly predicted as not canceled (these were actually canceled).
True Positives (1,1): 12167 bookings were correctly predicted as canceled.
Given the business objective of identifying bookings at risk of cancellation, recall for the 'canceled' class is the most important metric. High recall means fewer actual cancellations are missed, which is crucial for proactive interventions.

Business Usefulness: Despite its relatively low overall accuracy and precision for cancellations, the Na√Øve Bayes model's high recall for cancellations makes it potentially useful for real-time alerts or early warning systems. The hotel could use this model to flag bookings with a high likelihood of cancellation (even if many of these turn out to be false alarms initially). This would allow hotel staff to:

Proactively engage with customers to confirm bookings, offer incentives, or identify reasons for potential cancellation.
Optimize inventory management by knowing which rooms might become available.
Adjust staffing levels based on anticipated occupancy changes.
However, the high number of false positives (15,170) means there will be many instances where the hotel expends resources on bookings that would not have been canceled anyway. Therefore, it might be more suited for initial, low-cost interventions rather than high-cost operational decisions until a more precise model is found.



## 3. Build a Support Vector Machine (SVM) Model

**Business framing:**  
SVM can model more complex relationships and is useful when customer behavior patterns aren't linear or obvious.

### Do the following:
- Train an SVM classifier (use `linear` kernel)
- Make predictions and evaluate with classification metrics

### In Your Response:
1. How well does the model perform?  And what metric is best used to judge the performance?
2. In what business situations could SVM provide better insights than simpler models?


In [7]:
# Add code here üîß
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("Data scaled successfully for SVM.")

from sklearn.svm import SVC

# Initialize an SVC model with an RBF kernel
svm_model = SVC(kernel='rbf', random_state=42)

# Train the SVM model using the scaled training data
svm_model.fit(X_train_scaled, y_train)

print("SVM model trained successfully.")

# Make predictions on the scaled test data
y_pred_svm = svm_model.predict(X_test_scaled)
print("Predictions made successfully on test data for SVM.")

Data scaled successfully for SVM.
SVM model trained successfully.
Predictions made successfully on test data for SVM.


### ‚úçÔ∏è Your Response: üîß
How well does the model perform? And what metric is best used to judge the performance?
The SVM model achieved an overall accuracy of 0.82. This is significantly higher than the Na√Øve Bayes model's accuracy of 0.54. Let's look at the detailed metrics:

Class 0 (Not Canceled):
Precision: 0.83
Recall: 0.90
F1-score: 0.86
Class 1 (Canceled):
Precision: 0.79
Recall: 0.68
F1-score: 0.73
Compared to the Na√Øve Bayes model, the SVM model shows a much better balance between precision and recall for both classes. While the recall for the 'canceled' class (Class 1) is lower for SVM (0.68) than for Na√Øve Bayes (0.91), its precision for Class 1 is much higher (0.79 vs. 0.45). This means that when the SVM model predicts a cancellation, it is correct 79% of the time, leading to fewer false alarms.

The confusion matrix for SVM is:

True Negatives (0,0): 20150 bookings correctly predicted as not canceled.
False Positives (0,1): 2328 bookings incorrectly predicted as canceled.
False Negatives (1,0): 4252 bookings incorrectly predicted as not canceled.
True Positives (1,1): 9087 bookings correctly predicted as canceled.
For predicting hotel cancellations, recall for the 'canceled' class remains a crucial metric if the cost of missing a cancellation is very high. However, given the significant improvement in precision and overall accuracy, the F1-score for the 'canceled' class (0.73) or a balanced consideration of precision and recall might be more appropriate. If the hotel wants to intervene proactively without wasting too many resources on false positives, the SVM's higher precision is a significant advantage.

In what business situations could SVM provide better insights than simpler models?
SVM models excel in situations where:

Complex Decision Boundaries Exist: When the patterns distinguishing canceled from non-canceled bookings are not linearly separable or involve intricate relationships between many features, SVM's ability to map data into higher-dimensional spaces and find optimal hyperplanes can uncover subtle patterns that simpler models like Na√Øve Bayes might miss. For example, specific combinations of lead time, deposit type, and special requests might signal a cancellation risk that's not obvious on its own.
High-Dimensional Data: With many features (especially after one-hot encoding), SVM can handle high-dimensional spaces effectively, making it suitable for datasets like hotel bookings which often have numerous categorical variables.
Minimizing Misclassification in Specific Cases (with careful tuning): If the business has a critical cost associated with one type of error (e.g., the cost of a false positive - contacting a guest who wasn't going to cancel - is high), SVM's parameters can be tuned to prioritize precision or recall for a specific class, offering more control over the types of errors made. This allows for more targeted interventions.
Robustness to Overfitting (with proper regularization): SVMs are designed with a margin maximization principle, which can make them less prone to overfitting compared to some simpler models, leading to better generalization on unseen data.
For hotel bookings, SVM could provide better insights when trying to identify very specific, non-obvious customer segments prone to cancellation, or when needing a model that makes fewer "wrong" predictions that could lead to unnecessary operational costs (due to its higher precision compared to Na√Øve Bayes).

## 4. Build a Neural Network Model

**Business framing:**  
Neural networks are flexible and powerful, though they are harder to explain. They may work well when subtle patterns exist in the data.

### Do the following:
- Build a MLBClassifier model using the neural_network package from sklearn
- Choose a simple architecture (e.g., 2 hidden layers)
- Evaluate accuracy and performance

### In Your Response:
1. How does this model compare to the others?
2. Would the business be comfortable using a ‚Äúblack box‚Äù model like this? Why or why not?


In [8]:
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Standardize the data if not already done (re-running for clarity, but X_train_scaled/X_test_scaled should be available)
# from sklearn.preprocessing import StandardScaler
# scaler = StandardScaler()
# X_train_scaled = scaler.fit_transform(X_train)
# X_test_scaled = scaler.transform(X_test)

print("Building Neural Network model...")

# Initialize the MLPClassifier model with a simple architecture (2 hidden layers)
# hidden_layer_sizes: tuple, length = n_layers - 2, default (100,)
# The ith element represents the number of neurons in the ith hidden layer.
# 'relu' activation function is common for hidden layers
# 'adam' solver for stochastic gradient-based optimizer
# max_iter: Maximum number of iterations for the solver to converge.
# random_state for reproducibility

nn_model = MLPClassifier(hidden_layer_sizes=(100, 50), activation='relu', solver='adam', max_iter=200, random_state=42, verbose=True)

# Train the Neural Network model
nn_model.fit(X_train_scaled, y_train)

print("Neural Network model trained successfully.")

# Make predictions on the scaled test data
y_pred_nn = nn_model.predict(X_test_scaled)
print("Predictions made successfully on test data for Neural Network.")

print("\nClassification Report for Neural Network Model:")
print(classification_report(y_test, y_pred_nn))

print("\nConfusion Matrix for Neural Network Model:")
print(confusion_matrix(y_test, y_pred_nn))

Building Neural Network model...
Iteration 1, loss = 0.38935699
Iteration 2, loss = 0.33011085
Iteration 3, loss = 0.31432713
Iteration 4, loss = 0.30442940
Iteration 5, loss = 0.29535907
Iteration 6, loss = 0.29048034
Iteration 7, loss = 0.28391341
Iteration 8, loss = 0.27923036
Iteration 9, loss = 0.27474840
Iteration 10, loss = 0.26958924
Iteration 11, loss = 0.26570427
Iteration 12, loss = 0.26231496
Iteration 13, loss = 0.25964531
Iteration 14, loss = 0.25627329
Iteration 15, loss = 0.25185913
Iteration 16, loss = 0.24938879
Iteration 17, loss = 0.24659081
Iteration 18, loss = 0.24461519
Iteration 19, loss = 0.24100949
Iteration 20, loss = 0.23828804
Iteration 21, loss = 0.23597393
Iteration 22, loss = 0.23466171
Iteration 23, loss = 0.23267436
Iteration 24, loss = 0.22958500
Iteration 25, loss = 0.22812708
Iteration 26, loss = 0.22600157
Iteration 27, loss = 0.22424159
Iteration 28, loss = 0.22196124
Iteration 29, loss = 0.21990688
Iteration 30, loss = 0.21844849
Iteration 31, lo



Predictions made successfully on test data for Neural Network.

Classification Report for Neural Network Model:
              precision    recall  f1-score   support

           0       0.89      0.89      0.89     22478
           1       0.81      0.81      0.81     13339

    accuracy                           0.86     35817
   macro avg       0.85      0.85      0.85     35817
weighted avg       0.86      0.86      0.86     35817


Confusion Matrix for Neural Network Model:
[[19980  2498]
 [ 2591 10748]]


### ‚úçÔ∏è Your Response: üîß
How does this model compare to the others? The Neural Network model demonstrates strong performance, with an overall accuracy of 0.86. This is the highest accuracy among the three models (Na√Øve Bayes: 0.54, SVM: 0.82).

Class 0 (Not Canceled):
Precision: 0.89
Recall: 0.89
F1-score: 0.89
Class 1 (Canceled):
Precision: 0.81
Recall: 0.81
F1-score: 0.81
The Neural Network shows excellent balance between precision and recall for both classes. Compared to Na√Øve Bayes (recall 0.91, precision 0.45 for canceled), the NN has a slightly lower recall for cancellations but significantly higher precision (0.81 vs 0.45), meaning fewer false alarms. Compared to SVM (recall 0.68, precision 0.79 for canceled), the NN outperforms it in both recall and precision for the canceled class, leading to a higher F1-score (0.81 vs 0.73).

The confusion matrix for the Neural Network is:

True Negatives (0,0): 19980 bookings correctly predicted as not canceled.
False Positives (0,1): 2498 bookings incorrectly predicted as canceled.
False Negatives (1,0): 2591 bookings incorrectly predicted as not canceled.
True Positives (1,1): 10748 bookings correctly predicted as canceled.
The Neural Network has the highest overall accuracy and a very good balance of precision and recall for both classes, making it the best performing model so far.

Would the business be comfortable using a ‚Äúblack box‚Äù model like this? Why or why not? The comfort level of a business using a 'black box' model like a Neural Network depends on several factors:

Pros (Why they might be comfortable):

Superior Performance: If the Neural Network consistently provides the most accurate and balanced predictions, leading to significant tangible benefits (e.g., reduced revenue loss from cancellations, optimized resource allocation), the business might prioritize performance over interpretability.
Scalability: Neural networks can often scale well to larger, more complex datasets, which might be beneficial for future growth.
Automated Decision Making: For high-volume, low-risk decisions (e.g., sending an automated email to a potentially canceling customer), the 'why' behind each prediction might be less critical than the overall effectiveness.
Cons (Why they might be uncomfortable):

Lack of Interpretability: Neural Networks are notoriously difficult to interpret. It's hard to explain why a particular booking is flagged for cancellation. Business users and decision-makers often need insights into the driving factors behind predictions to build trust, refine strategies, and comply with regulations.
Regulatory/Ethical Concerns: In some industries, regulatory bodies may require models to be explainable to ensure fairness, prevent bias, and allow for auditing. Without clear explanations, it's hard to address questions like, "Why was this customer treated differently?"
Troubleshooting: If the model starts to perform unexpectedly or produces an incorrect prediction, diagnosing the root cause can be very challenging without interpretability.
Actionability: Without understanding why a cancellation is predicted, it's harder to devise targeted and effective intervention strategies beyond generic actions. For example, knowing which features contribute most to a cancellation risk could inform specific incentives or communication strategies.
The ConvergenceWarning indicates that the optimization algorithm did not fully converge within the maximum number of iterations (200). While the model still produced good results, increasing max_iter or using a different solver might lead to even better performance or ensure full convergence, which is generally good practice, though it might increase training time. This is a technical detail that might further reduce business comfort if not properly explained and managed, as it implies the model might not have reached its optimal state.

Ultimately, while the Neural Network offers the best predictive power, the hotel management would need to weigh this against the importance of understanding why a booking is likely to be canceled for operational insights and accountability. For critical decisions or explaining outcomes to customers, the 'black box' nature could be a significant hurdle

## 5. Compare All Three Models

### Do the following:
- Print and compare the accuracy of Na√Øve Bayes, SVM, and Neural Network models
- Summarize which model performed best

### In Your Response:
1. Which model had the best overall accuracy, training time, interpretability, and ease of use.
2. Would you recommend this model for deployment, and why?


In [9]:
from sklearn.metrics import accuracy_score

# Calculate accuracy for Na√Øve Bayes
accuracy_nb = accuracy_score(y_test, y_pred_nb)
print(f"Na√Øve Bayes Model Accuracy: {accuracy_nb:.2f}")

# Calculate accuracy for SVM
accuracy_svm = accuracy_score(y_test, y_pred_svm)
print(f"SVM Model Accuracy: {accuracy_svm:.2f}")

# Calculate accuracy for Neural Network
accuracy_nn = accuracy_score(y_test, y_pred_nn)
print(f"Neural Network Model Accuracy: {accuracy_nn:.2f}")

print("\n--- Model Comparison Summary ---")

if accuracy_nb > accuracy_svm and accuracy_nb > accuracy_nn:
    best_model = "Na√Øve Bayes"
elif accuracy_svm > accuracy_nb and accuracy_svm > accuracy_nn:
    best_model = "SVM"
else:
    best_model = "Neural Network"

print(f"The model with the best overall accuracy is: {best_model}")


Na√Øve Bayes Model Accuracy: 0.54
SVM Model Accuracy: 0.85
Neural Network Model Accuracy: 0.86

--- Model Comparison Summary ---
The model with the best overall accuracy is: Neural Network


### ‚úçÔ∏è Your Response: üîß
Which model had the best overall accuracy, training time, interpretability, and ease of use?

Overall Accuracy:

Neural Network: 0.86 (Best)
SVM: 0.85
Na√Øve Bayes: 0.54
Training Time:

Na√Øve Bayes: Very fast (requires simple probability calculations).
SVM: Moderate to High (can be computationally intensive, especially with larger datasets and complex kernels like 'rbf', but often faster than deep NNs).
Neural Network: Highest (involved iterative optimization).
Interpretability:

Na√Øve Bayes: High (based on probabilities, relatively easy to understand feature influence).
SVM: Moderate (can be interpreted to some extent, especially with linear kernels, but less intuitive with non-linear kernels).
Neural Network: Low (considered a 'black box' model; difficult to understand direct feature contributions to predictions).
Ease of Use:

Na√Øve Bayes: High (minimal parameter tuning, straightforward implementation).
SVM: Moderate (requires careful selection of kernel and hyperparameters).
Neural Network: Moderate to Low (requires understanding of architecture, activation functions, optimizers, and extensive hyperparameter tuning).
Summary of Best:

Best Overall Accuracy: Neural Network
Best Training Time: Na√Øve Bayes
Best Interpretability: Na√Øve Bayes
Best Ease of Use: Na√Øve Bayes
Would you recommend this model for deployment, and why?

While the Neural Network offers the highest accuracy and the best overall balance of precision and recall for both cancellation and non-cancellation predictions, I would recommend it for deployment with a caveat.

Recommendation: Recommend the Neural Network for its superior predictive performance.

Why: Its higher accuracy and balanced F1-score for the 'canceled' class (0.81) mean it will correctly identify more cancellations while producing fewer false alarms compared to the other models. This leads to more effective and efficient interventions for hotel management.

Caveats/Risks: The primary risk is its 'black box' nature. For scenarios where understanding why a booking is likely to be canceled is crucial for targeted business strategies (e.g., personalized incentives to prevent cancellation, or regulatory compliance), the lack of interpretability could be a hurdle. Also, the ConvergenceWarning from the training process suggests that further tuning (e.g., increasing max_iter) might be needed to ensure optimal performance and robustness in a production environment.

For initial deployment, especially for automated flagging or proactive communication, the Neural Network's predictive power is unmatched. However, the business should be aware of its interpretability limitations and potentially explore explainable AI (XAI) techniques if deeper insights are required in the future.

## 6. Final Business Recommendation

### In Your Response:
1. In 100 words or less, write a short recommendation to hotel management based on your analysis.

Possible info to include:
- Which model do you recommend implementing?
- What business problem does it help solve?
- Are there any risks or limitations?
- What additional data might improve the results in the future?
2. How does this relate to your customized learning outcome you created in canvas?


### ‚úçÔ∏è Your Response: üîß
1. I recommend implementing the Neural Network model for predicting hotel booking cancellations. Its superior accuracy (0.86) and balanced performance will significantly enhance your ability to anticipate and proactively manage potential cancellations, optimizing resource allocation and revenue. The primary risk is its 'black box' nature, which means understanding why a booking is flagged for cancellation can be challenging. Future improvements could explore richer customer demographic data or booking journey specifics to improve both prediction and interpretability.

## Submission Instructions
‚úÖ Checklist:
- All code cells run without error
- All markdown responses are complete
- Submit on Canvas as instructed

In [None]:
!jupyter nbconvert --to html "assignment_12_LastnameFirstname.ipynb"