Load and Preprocess the Data

In [None]:
import pandas as pd

# Load the data
file_path = '/content/La_Veranda_Reviews-2023-01-16.csv'
data = pd.read_csv(file_path)

# Display the first few rows of the dataframe
print(data.head())

# Display the structure of the dataframe
print(data.info())


                                               Title  \
0                           Wonderful place to stay.   
1                                      It was superb   
2                                          Very Good   
3                                          Wonderful   
4  Fantastic value for a new, modern and spotless...   

                                      PositiveReview   NegativeReview  Score  \
0  New, comfortable apartments, close to the airp...  Nothing at all.   10.0   
1  We had a really pleasant stay! The staff was v...              NaN   10.0   
2  the location is great and near the airport. bu...              NaN    8.0   
3            Great stuff\nGreat Quality/price\nClean              NaN    9.0   
4  Clean and modern with very comfortable beds, i...              NaN   10.0   

    GuestName    GuestCountry                   RoomType NumberOfNights  \
0        Olga          Norway           Budget Twin Room        1 night   
1       Iwona          Poland   

### Text Representation Techniques

Bag of Words

In [None]:
from sklearn.feature_extraction.text import CountVectorizer

# Fill NaN values with empty strings
data['PositiveReview'] = data['PositiveReview'].fillna('')
data['NegativeReview'] = data['NegativeReview'].fillna('')

# Combine positive and negative reviews into one column
data['Review'] = data['PositiveReview'] + " " + data['NegativeReview']

# Initialize the CountVectorizer
vectorizer_bow = CountVectorizer()

# Fit and transform the text data
bow_matrix = vectorizer_bow.fit_transform(data['Review'])

# Display the shape of the Bag of Words matrix
print("Bag of Words matrix shape:", bow_matrix.shape)


Bag of Words matrix shape: (1523, 2393)


The shape of the resulting matrix indicates that there are 1523 documents (reviews) and 2393 unique words (features) in the vocabulary.


TFIDF

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Initialize the TfidfVectorizer
vectorizer_tfidf = TfidfVectorizer()

# Fit and transform the text data
tfidf_matrix = vectorizer_tfidf.fit_transform(data['Review'])

# Display the shape of the TFIDF matrix
print("TFIDF matrix shape:", tfidf_matrix.shape)


TFIDF matrix shape: (1523, 2393)


The TFIDF matrix shape `(1523, 2393)` indicates that there are 1523 reviews and 2393 unique terms in the dataset. The vocabulary is the same as for Bag of Words; only the cell values differ, with each cell containing a TFIDF value that reflects the importance of a word in a review relative to the entire corpus. This matrix is used to convert text data into numerical form for machine learning models.
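
The weighting can be seen on a small example. The sketch below uses a hypothetical corpus in which "great" appears in every review and "noisy" in only one. With scikit-learn's default settings, the term found everywhere receives the lowest inverse document frequency and therefore a smaller weight within a review.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus: "great" occurs in all three reviews, "noisy" in one
toy_reviews = [
    "great location great staff",
    "great breakfast",
    "great pool noisy room",
]

toy_tfidf = TfidfVectorizer()
toy_matrix = toy_tfidf.fit_transform(toy_reviews)

vocab = toy_tfidf.vocabulary_   # term -> column index
idf = toy_tfidf.idf_            # one inverse document frequency per column

print("idf(great):", idf[vocab["great"]])
print("idf(noisy):", idf[vocab["noisy"]])

# Within the third review both terms occur once, so the weights differ only through idf
print("weight of 'great' in review 3:", toy_matrix[2, vocab["great"]])
print("weight of 'noisy' in review 3:", toy_matrix[2, vocab["noisy"]])
```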

nGrams

In [None]:
# Initialize the CountVectorizer with ngram_range for bi-grams
vectorizer_ngrams = CountVectorizer(ngram_range=(2, 2))

# Fit and transform the text data
ngrams_matrix = vectorizer_ngrams.fit_transform(data['Review'])

# Display the shape of the n-Grams matrix
print("n-Grams matrix shape:", ngrams_matrix.shape)


n-Grams matrix shape: (1523, 12255)


The n-Grams matrix shape `(1523, 12255)` indicates that there are 1523 reviews and 12255 unique bi-grams in the dataset, with each cell containing the count of a specific bi-gram in a particular review.

TFIDF is often a strong default because it scales term frequency by the inverse document frequency, which reduces the weight of words that occur in most reviews. n-Grams capture more local context, but they produce a larger and sparser matrix (12255 columns here versus 2393) and are more computationally intensive.


Combine Reviews and Create Sentiment Column



In [None]:
# Fill NaN values with empty strings
data['PositiveReview'] = data['PositiveReview'].fillna('')
data['NegativeReview'] = data['NegativeReview'].fillna('')

# Combine positive and negative reviews into one column
data['Review'] = data['PositiveReview'] + " " + data['NegativeReview']

# Create a binary sentiment column: 1 (positive) if Score > 7, else 0 (negative)
data['Sentiment'] = (data['Score'] > 7).astype(int)

# Display the updated dataframe
print(data[['Review', 'Sentiment']].head())


                                              Review  Sentiment
0  New, comfortable apartments, close to the airp...          1
1  We had a really pleasant stay! The staff was v...          1
2  the location is great and near the airport. bu...          1
3           Great stuff\nGreat Quality/price\nClean           1
4  Clean and modern with very comfortable beds, i...          1
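
Before modelling, it can help to check how many reviews fall into each class. The sketch below applies the same `Score > 7` rule to a handful of hypothetical scores (the `toy` frame is an assumption standing in for `data`). Note that a score of exactly 7 falls into class 0.

```python
import pandas as pd

# Hypothetical scores standing in for data['Score']
toy = pd.DataFrame({"Score": [10.0, 9.0, 8.0, 7.0, 6.5, 10.0, 9.5, 7.5]})

# Same rule as in the notebook: 1 if Score > 7, else 0
toy["Sentiment"] = (toy["Score"] > 7).astype(int)

counts = toy["Sentiment"].value_counts()
share = toy["Sentiment"].value_counts(normalize=True)

print(counts)
print(share)
```

Running `data['Sentiment'].value_counts()` on the real dataframe in the same way would show the class balance the models will face.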


Split the Data

In [None]:
from sklearn.model_selection import train_test_split

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data['Review'], data['Sentiment'], test_size=0.2, random_state=42)
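
The split above is random and does not use the `stratify` argument, so the share of each class in the test set can drift from its share in the full data. When one class is rare, a stratified split keeps the proportions the same in both parts. This is a sketch on hypothetical data, not a change to the notebook's results.

```python
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 10 negative (0) and 90 positive (1) reviews
texts = [f"review {i}" for i in range(100)]
labels = [0] * 10 + [1] * 90

# stratify=labels preserves the 10/90 class ratio in both train and test parts
X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

n_neg_test = sum(1 for y in y_te if y == 0)
n_neg_train = sum(1 for y in y_tr if y == 0)
print("test size:", len(y_te), "negatives in test:", n_neg_test)
print("train size:", len(y_tr), "negatives in train:", n_neg_train)
```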


Text Representation Techniques (Bag of Words, TFIDF, nGrams)


Bag of Words

In [None]:
from sklearn.feature_extraction.text import CountVectorizer

# Initialize the CountVectorizer
vectorizer_bow = CountVectorizer()

# Fit and transform the text data
X_train_bow = vectorizer_bow.fit_transform(X_train)
X_test_bow = vectorizer_bow.transform(X_test)

# Display the shape of the Bag of Words matrix
print("Bag of Words matrix shape:", X_train_bow.shape)


Bag of Words matrix shape: (1218, 2178)


TFIDF

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Initialize the TfidfVectorizer
vectorizer_tfidf = TfidfVectorizer()

# Fit and transform the text data
X_train_tfidf = vectorizer_tfidf.fit_transform(X_train)
X_test_tfidf = vectorizer_tfidf.transform(X_test)

# Display the shape of the TFIDF matrix
print("TFIDF matrix shape:", X_train_tfidf.shape)


TFIDF matrix shape: (1218, 2178)


nGrams

In [None]:
# Initialize the CountVectorizer with ngram_range for bi-grams
vectorizer_ngrams = CountVectorizer(ngram_range=(2, 2))

# Fit and transform the text data
X_train_ngrams = vectorizer_ngrams.fit_transform(X_train)
X_test_ngrams = vectorizer_ngrams.transform(X_test)

# Display the shape of the n-Grams matrix
print("n-Grams matrix shape:", X_train_ngrams.shape)


n-Grams matrix shape: (1218, 10420)


Model Training and Evaluation Function

In [None]:
from sklearn.metrics import classification_report, roc_auc_score

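def train_evaluate_model(model, X_train_vec, X_test_vec, y_train, y_test):
    """Fit the model, then return its classification report and ROC AUC on the test set."""
    model.fit(X_train_vec, y_train)
    preds = model.predict(X_test_vec)
    # zero_division=0 reports 0.00 without a warning when a class is never predicted
    report = classification_report(y_test, preds, zero_division=0)
    # ROC AUC uses the predicted probability of class 1, not the hard labels
    auc = roc_auc_score(y_test, model.predict_proba(X_test_vec)[:, 1])
    return report, auc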
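
The function returns two different kinds of metric. The classification report is computed from hard predictions, while ROC AUC is computed from predicted probabilities. A small sketch with hypothetical scores shows why two models can produce identical reports and still differ in ROC AUC.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical labels and predicted probabilities of class 1 from two models
y_true = [0, 0, 1, 1, 1, 1]
scores_a = [0.60, 0.70, 0.80, 0.90, 0.95, 0.85]  # negatives ranked below all positives
scores_b = [0.90, 0.70, 0.80, 0.60, 0.95, 0.85]  # ranking is mixed

# At the default 0.5 threshold both models predict class 1 for every sample
hard_a = [int(s >= 0.5) for s in scores_a]
hard_b = [int(s >= 0.5) for s in scores_b]

auc_a = roc_auc_score(y_true, scores_a)
auc_b = roc_auc_score(y_true, scores_b)

print("hard predictions equal:", hard_a == hard_b)
print("ROC AUC model A:", auc_a)
print("ROC AUC model B:", auc_b)
```

ROC AUC therefore measures how well a model ranks reviews, independently of the threshold used to turn probabilities into labels.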


Initialize Models

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Initialize the models
log_reg = LogisticRegression(random_state=42)
svm = SVC(random_state=42, probability=True)
rf = RandomForestClassifier(random_state=42)


Train and Evaluate Models using Bag of Words

In [None]:
# Train and evaluate models using Bag of Words
print("Bag of Words Representation:")
log_reg_report, log_reg_auc = train_evaluate_model(log_reg, X_train_bow, X_test_bow, y_train, y_test)
svm_report, svm_auc = train_evaluate_model(svm, X_train_bow, X_test_bow, y_train, y_test)
rf_report, rf_auc = train_evaluate_model(rf, X_train_bow, X_test_bow, y_train, y_test)

print("Logistic Regression Report:\n", log_reg_report)
print("Logistic Regression ROC AUC:", log_reg_auc)
print("\nSVM Report:\n", svm_report)
print("SVM ROC AUC:", svm_auc)
print("\nRandom Forest Report:\n", rf_report)
print("Random Forest ROC AUC:", rf_auc)


Bag of Words Representation:




Logistic Regression Report:
               precision    recall  f1-score   support

           0       0.00      0.00      0.00        31
           1       0.90      1.00      0.94       274

    accuracy                           0.90       305
   macro avg       0.45      0.50      0.47       305
weighted avg       0.81      0.90      0.85       305

Logistic Regression ROC AUC: 0.6698257593595479

SVM Report:
               precision    recall  f1-score   support

           0       0.00      0.00      0.00        31
           1       0.90      1.00      0.95       274

    accuracy                           0.90       305
   macro avg       0.45      0.50      0.47       305
weighted avg       0.81      0.90      0.85       305

SVM ROC AUC: 0.5597480574523194

Random Forest Report:
               precision    recall  f1-score   support

           0       0.00      0.00      0.00        31
           1       0.90      1.00      0.95       274

    accuracy                       



Train and Evaluate Models using TFIDF

In [None]:
# Train and evaluate models using TFIDF
print("\nTFIDF Representation:")
log_reg_report, log_reg_auc = train_evaluate_model(log_reg, X_train_tfidf, X_test_tfidf, y_train, y_test)
svm_report, svm_auc = train_evaluate_model(svm, X_train_tfidf, X_test_tfidf, y_train, y_test)
rf_report, rf_auc = train_evaluate_model(rf, X_train_tfidf, X_test_tfidf, y_train, y_test)

print("Logistic Regression Report:\n", log_reg_report)
print("Logistic Regression ROC AUC:", log_reg_auc)
print("\nSVM Report:\n", svm_report)
print("SVM ROC AUC:", svm_auc)
print("\nRandom Forest Report:\n", rf_report)
print("Random Forest ROC AUC:", rf_auc)



TFIDF Representation:




Logistic Regression Report:
               precision    recall  f1-score   support

           0       0.00      0.00      0.00        31
           1       0.90      1.00      0.95       274

    accuracy                           0.90       305
   macro avg       0.45      0.50      0.47       305
weighted avg       0.81      0.90      0.85       305

Logistic Regression ROC AUC: 0.7147986814221804

SVM Report:
               precision    recall  f1-score   support

           0       0.00      0.00      0.00        31
           1       0.90      1.00      0.95       274

    accuracy                           0.90       305
   macro avg       0.45      0.50      0.47       305
weighted avg       0.81      0.90      0.85       305

SVM ROC AUC: 0.5805862962090887

Random Forest Report:
               precision    recall  f1-score   support

           0       0.00      0.00      0.00        31
           1       0.90      1.00      0.95       274

    accuracy                       



Train and Evaluate Models using n-Grams

In [None]:
# Train and evaluate models using n-Grams
print("\nn-Grams Representation:")
log_reg_report, log_reg_auc = train_evaluate_model(log_reg, X_train_ngrams, X_test_ngrams, y_train, y_test)
svm_report, svm_auc = train_evaluate_model(svm, X_train_ngrams, X_test_ngrams, y_train, y_test)
rf_report, rf_auc = train_evaluate_model(rf, X_train_ngrams, X_test_ngrams, y_train, y_test)

print("Logistic Regression Report:\n", log_reg_report)
print("Logistic Regression ROC AUC:", log_reg_auc)
print("\nSVM Report:\n", svm_report)
print("SVM ROC AUC:", svm_auc)
print("\nRandom Forest Report:\n", rf_report)
print("Random Forest ROC AUC:", rf_auc)



n-Grams Representation:




Logistic Regression Report:
               precision    recall  f1-score   support

           0       0.00      0.00      0.00        31
           1       0.90      1.00      0.95       274

    accuracy                           0.90       305
   macro avg       0.45      0.50      0.47       305
weighted avg       0.81      0.90      0.85       305

Logistic Regression ROC AUC: 0.6407464092300448

SVM Report:
               precision    recall  f1-score   support

           0       0.00      0.00      0.00        31
           1       0.90      1.00      0.95       274

    accuracy                           0.90       305
   macro avg       0.45      0.50      0.47       305
weighted avg       0.81      0.90      0.85       305

SVM ROC AUC: 0.49275959500824107

Random Forest Report:
               precision    recall  f1-score   support

           0       0.00      0.00      0.00        31
           1       0.90      1.00      0.95       274

    accuracy                      



Hyperparameter Tuning

In [None]:
from sklearn.model_selection import GridSearchCV

# Define the parameter grid for each model
param_grid = {
    'log_reg': {
        'C': [0.01, 0.1, 1, 10, 100]
    },
    'svm': {
        'C': [0.01, 0.1, 1, 10, 100],
        'kernel': ['linear', 'rbf']
    },
    'rf': {
        'n_estimators': [100, 200, 300],
        # 'auto' equals 'sqrt' for classifiers; deprecated in scikit-learn 1.1, removed in 1.3
        'max_features': ['auto', 'sqrt', 'log2'],
        'max_depth': [4, 6, 8],
        'criterion': ['gini', 'entropy']
    }
}

# Initialize GridSearchCV for each model
grid_search_log_reg = GridSearchCV(LogisticRegression(random_state=42), param_grid['log_reg'], cv=5, scoring='roc_auc')
grid_search_svm = GridSearchCV(SVC(random_state=42, probability=True), param_grid['svm'], cv=5, scoring='roc_auc')
grid_search_rf = GridSearchCV(RandomForestClassifier(random_state=42), param_grid['rf'], cv=5, scoring='roc_auc')

# Fit the models using GridSearchCV
grid_search_log_reg.fit(X_train_tfidf, y_train)
grid_search_svm.fit(X_train_tfidf, y_train)
grid_search_rf.fit(X_train_tfidf, y_train)

# Get the best estimators
best_log_reg = grid_search_log_reg.best_estimator_
best_svm = grid_search_svm.best_estimator_
best_rf = grid_search_rf.best_estimator_

# Print the best parameters for each model
print("Best Logistic Regression Parameters:", grid_search_log_reg.best_params_)
print("Best SVM Parameters:", grid_search_svm.best_params_)
print("Best Random Forest Parameters:", grid_search_rf.best_params_)

# Evaluate the best models
best_log_reg_report, best_log_reg_auc = train_evaluate_model(best_log_reg, X_train_tfidf, X_test_tfidf, y_train, y_test)
best_svm_report, best_svm_auc = train_evaluate_model(best_svm, X_train_tfidf, X_test_tfidf, y_train, y_test)
best_rf_report, best_rf_auc = train_evaluate_model(best_rf, X_train_tfidf, X_test_tfidf, y_train, y_test)

# Print the evaluation results for the best models
print("Best Logistic Regression Report:\n", best_log_reg_report)
print("Best Logistic Regression ROC AUC:", best_log_reg_auc)
print("\nBest SVM Report:\n", best_svm_report)
print("Best SVM ROC AUC:", best_svm_auc)
print("\nBest Random Forest Report:\n", best_rf_report)
print("Best Random Forest ROC AUC:", best_rf_auc)


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression

Best Logistic Regression Parameters: {'C': 1}
Best SVM Parameters: {'C': 1, 'kernel': 'linear'}
Best Random Forest Parameters: {'criterion': 'entropy', 'max_depth': 6, 'max_features': 'auto', 'n_estimators': 300}




Best Logistic Regression Report:
               precision    recall  f1-score   support

           0       0.00      0.00      0.00        31
           1       0.90      1.00      0.95       274

    accuracy                           0.90       305
   macro avg       0.45      0.50      0.47       305
weighted avg       0.81      0.90      0.85       305

Best Logistic Regression ROC AUC: 0.7147986814221804

Best SVM Report:
               precision    recall  f1-score   support

           0       0.00      0.00      0.00        31
           1       0.90      1.00      0.95       274

    accuracy                           0.90       305
   macro avg       0.45      0.50      0.47       305
weighted avg       0.81      0.90      0.85       305

Best SVM ROC AUC: 0.6973746173769719

Best Random Forest Report:
               precision    recall  f1-score   support

           0       0.00      0.00      0.00        31
           1       0.90      1.00      0.95       274

    accura



### Business Impact Interpretation

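The initial models, evaluated with three text representations (Bag of Words, TFIDF, and n-Grams), show a consistent pattern. Because `Sentiment` is 1 when `Score > 7`, class 1 is positive sentiment and class 0 is negative sentiment. Every model predicts class 1 for all 305 test reviews: precision and recall for class 0 (31 reviews) are 0.00, and the 0.90 accuracy simply matches the share of positive reviews (274 of 305). The models therefore fail to identify negative sentiment. Only ROC AUC, which is computed from predicted probabilities, differs between them.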
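
The classification reports above list a support of 31 for class 0 and 274 for class 1, with recall 0.00 for class 0 and 1.00 for class 1. As a check on how to read those numbers, this sketch computes the metrics for a hypothetical classifier that labels every review as class 1.

```python
# Supports taken from the classification reports: 31 class-0 and 274 class-1 test reviews
n_class0, n_class1 = 31, 274
y_true = [0] * n_class0 + [1] * n_class1
y_pred = [1] * len(y_true)  # a classifier that always predicts class 1

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # class 1 predicted as 1
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # class 0 predicted as 1
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # class 1 predicted as 0
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # class 0 predicted as 0

accuracy = (tp + tn) / len(y_true)
precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)
recall_0 = tn / (tn + fp)

print("accuracy:", round(accuracy, 2))
print("class 1 precision:", round(precision_1, 2), "recall:", round(recall_1, 2))
print("class 0 recall:", round(recall_0, 2))
```

The rounded values match the reported 0.90 accuracy, 0.90 precision and 1.00 recall for class 1, and 0.00 recall for class 0, which is what an always-class-1 predictor produces on this test set.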

### Bag of Words Representation:

**Logistic Regression:**
- Precision: 0.90 for positive sentiment (class 1)
- Recall: 1.00 for positive sentiment (class 1)
- ROC AUC: 0.67

**SVM:**
- Precision: 0.90 for positive sentiment (class 1)
- Recall: 1.00 for positive sentiment (class 1)
- ROC AUC: 0.56

**Random Forest:**
- Precision: 0.90 for positive sentiment (class 1)
- Recall: 1.00 for positive sentiment (class 1)
- ROC AUC: 0.62

### TFIDF Representation:

**Logistic Regression:**
- Precision: 0.90 for positive sentiment (class 1)
- Recall: 1.00 for positive sentiment (class 1)
- ROC AUC: 0.71

**SVM:**
- Precision: 0.90 for positive sentiment (class 1)
- Recall: 1.00 for positive sentiment (class 1)
- ROC AUC: 0.58

**Random Forest:**
- Precision: 0.90 for positive sentiment (class 1)
- Recall: 1.00 for positive sentiment (class 1)
- ROC AUC: 0.67

### n-Grams Representation:

**Logistic Regression:**
- Precision: 0.90 for positive sentiment (class 1)
- Recall: 1.00 for positive sentiment (class 1)
- ROC AUC: 0.64

**SVM:**
- Precision: 0.90 for positive sentiment (class 1)
- Recall: 1.00 for positive sentiment (class 1)
- ROC AUC: 0.49

**Random Forest:**
- Precision: 0.90 for positive sentiment (class 1)
- Recall: 1.00 for positive sentiment (class 1)
- ROC AUC: 0.56

### Key Points:

1. **False Positives:**
   - Treating a negative review as the event to detect, a false positive is a positive review that is flagged as negative. The current models flag no review as negative, so they produce no false positives and no resources are spent on feedback that does not require attention. This is a side effect of never raising a flag, not evidence of a useful model.

2. **False Negatives:**
   - A false negative is a negative review that the model labels as positive. The current models miss all 31 negative reviews in the test set, so dissatisfied guests would go unnoticed and their complaints unaddressed. For the hotel this is likely the costlier error, since unresolved negative feedback can harm its online reputation.

### Recommendations for Improvement:

1. **Data Imbalance:**
   - The dataset is heavily imbalanced towards positive reviews (274 positive versus 31 negative in the test set). Using techniques like SMOTE to oversample the minority class (negative sentiment) can help balance the training data and improve the model's ability to detect negative sentiment.

2. **Feature Engineering:**
   - Incorporating additional features such as reviewer nationality, stay duration, and trip type could provide more context to the models, improving their performance.

3. **Model Tuning:**
   - Hyperparameter tuning using GridSearchCV can optimize the model parameters for better performance. This process can fine-tune the balance between precision and recall for both classes.

4. **Ensemble Methods:**
   - Considering ensemble methods like boosting or stacking multiple models could provide a more robust solution by combining the strengths of different algorithms.

5. **Continuous Improvement:**
   - Integrating a feedback loop where task force responses are analyzed can help refine the model over time. This continuous learning approach ensures that the model adapts to new patterns in guest feedback.
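
As a complement to the data imbalance recommendation, one alternative to oversampling is to weight the classes inside the model. This is not what the notebook does; the sketch below is an assumption-laden illustration using scikit-learn's `class_weight='balanced'` option on hypothetical reviews.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical, deliberately imbalanced toy data: 18 positive and 2 negative reviews
positive = [
    "clean room friendly staff", "great location", "lovely breakfast",
    "comfortable bed", "friendly staff great location", "clean and comfortable",
] * 3
negative = ["dirty room rude staff", "noisy and dirty"]
texts = positive + negative
labels = [1] * len(positive) + [0] * len(negative)

vec = TfidfVectorizer()
X = vec.fit_transform(texts)

# Same model twice; the second one weights each class inversely to its frequency
plain = LogisticRegression(max_iter=1000).fit(X, labels)
balanced = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X, labels)

# Probability of class 0 (negative) for a clearly negative review
probe = vec.transform(["dirty room rude staff"])
p_plain = plain.predict_proba(probe)[0, 0]
p_balanced = balanced.predict_proba(probe)[0, 0]

print("P(negative), unweighted:", p_plain)
print("P(negative), balanced:  ", p_balanced)
```

Class weighting leaves the training data unchanged, so it avoids creating synthetic samples. Whether it outperforms SMOTE on this dataset would need to be tested.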


Implementing Improvements:

Balancing the Dataset:


In [None]:
from imblearn.over_sampling import SMOTE

# Apply SMOTE to balance the dataset
smote = SMOTE(random_state=42)
X_train_tfidf_res, y_train_res = smote.fit_resample(X_train_tfidf, y_train)


Hyperparameter Tuning:

In [None]:
# Define the parameter grid for each model
param_grid = {
    'log_reg': {
        'C': [0.01, 0.1, 1, 10, 100]
    },
    'svm': {
        'C': [0.01, 0.1, 1, 10, 100],
        'kernel': ['linear', 'rbf']
    },
    'rf': {
        'n_estimators': [100, 200, 300],
        # 'auto' equals 'sqrt' for classifiers; deprecated in scikit-learn 1.1, removed in 1.3
        'max_features': ['auto', 'sqrt', 'log2'],
        'max_depth': [4, 6, 8],
        'criterion': ['gini', 'entropy']
    }
}

# Initialize GridSearchCV for each model
grid_search_log_reg = GridSearchCV(LogisticRegression(random_state=42), param_grid['log_reg'], cv=5, scoring='roc_auc')
grid_search_svm = GridSearchCV(SVC(random_state=42, probability=True), param_grid['svm'], cv=5, scoring='roc_auc')
grid_search_rf = GridSearchCV(RandomForestClassifier(random_state=42), param_grid['rf'], cv=5, scoring='roc_auc')

# Fit the models using GridSearchCV
grid_search_log_reg.fit(X_train_tfidf_res, y_train_res)
grid_search_svm.fit(X_train_tfidf_res, y_train_res)
grid_search_rf.fit(X_train_tfidf_res, y_train_res)

# Get the best estimators
best_log_reg = grid_search_log_reg.best_estimator_
best_svm = grid_search_svm.best_estimator_
best_rf = grid_search_rf.best_estimator_

# Print the best parameters for each model
print("Best Logistic Regression Parameters:", grid_search_log_reg.best_params_)
print("Best SVM Parameters:", grid_search_svm.best_params_)
print("Best Random Forest Parameters:", grid_search_rf.best_params_)

# Evaluate the best models
best_log_reg_report, best_log_reg_auc = train_evaluate_model(best_log_reg, X_train_tfidf_res, X_test_tfidf, y_train_res, y_test)
best_svm_report, best_svm_auc = train_evaluate_model(best_svm, X_train_tfidf_res, X_test_tfidf, y_train_res, y_test)
best_rf_report, best_rf_auc = train_evaluate_model(best_rf, X_train_tfidf_res, X_test_tfidf, y_train_res, y_test)

# Print the evaluation results for the best models
print("Best Logistic Regression Report:\n", best_log_reg_report)
print("Best Logistic Regression ROC AUC:", best_log_reg_auc)
print("\nBest SVM Report:\n", best_svm_report)
print("Best SVM ROC AUC:", best_svm_auc)
print("\nBest Random Forest Report:\n", best_rf_report)
print("Best Random Forest ROC AUC:", best_rf_auc)


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression

Best Logistic Regression Parameters: {'C': 10}
Best SVM Parameters: {'C': 10, 'kernel': 'linear'}
Best Random Forest Parameters: {'criterion': 'gini', 'max_depth': 8, 'max_features': 'auto', 'n_estimators': 200}




Best Logistic Regression Report:
               precision    recall  f1-score   support

           0       0.14      0.74      0.23        31
           1       0.94      0.47      0.63       274

    accuracy                           0.50       305
   macro avg       0.54      0.61      0.43       305
weighted avg       0.86      0.50      0.59       305

Best Logistic Regression ROC AUC: 0.6738285848834471

Best SVM Report:
               precision    recall  f1-score   support

           0       0.13      0.71      0.22        31
           1       0.94      0.47      0.63       274

    accuracy                           0.50       305
   macro avg       0.53      0.59      0.43       305
weighted avg       0.85      0.50      0.59       305

Best SVM ROC AUC: 0.600541558747351

Best Random Forest Report:
               precision    recall  f1-score   support

           0       0.11      0.55      0.18        31
           1       0.90      0.48      0.63       274

    accurac

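With SMOTE and tuning, the models begin to detect negative reviews: recall for class 0 rises from 0.00 to between 0.55 and 0.74. The cost is low precision for class 0 (0.11 to 0.14), an accuracy of 0.50 for Logistic Regression and SVM, and a lower ROC AUC for Logistic Regression than before resampling (0.67 versus 0.71). These steps are therefore a starting point. Further work on the recommendations above is needed before hotel management can rely on the model to flag negative feedback promptly and enhance overall guest satisfaction.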