# Evaluating our Prediction Models with Loss Functions
The Titanic dataset describes the survival status of individual passengers on the Titanic.

Source the data from the Kaggle repository here: https://www.kaggle.com/c/titanic/data

Execute the tasks listed below:

* Build 2 prediction models of your choice to predict the survival of passengers
* Evaluate the quality of your prediction models using relevant loss functions (use at least 2 loss functions)
* Explain your thoughts on the results obtained
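
For reference, the two probability-based losses computed in this notebook can be written as follows. The notation is introduced here only for illustration: $y_i \in \{0, 1\}$ is the true survival label of passenger $i$, $\hat{p}_i$ is the predicted survival probability, and $N$ is the number of validation passengers. Mean squared error applied to predicted probabilities is commonly called the Brier score.

$$
\text{LogLoss} = -\frac{1}{N}\sum_{i=1}^{N}\left[\, y_i \log \hat{p}_i + (1 - y_i)\log(1 - \hat{p}_i) \,\right]
$$

$$
\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{p}_i - y_i\right)^2
$$

Each squared-error term is bounded by 1, whereas a log loss term grows without bound as a confident prediction turns out to be wrong.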

In [6]:
# 1. Import libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, log_loss, mean_squared_error
from sklearn.preprocessing import LabelEncoder

# 2. Loading Titanic data
url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
df = pd.read_csv(url)

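# 3. Preprocessing
# Encode categorical variables as integers.
# Note: encoding runs before dropna(), so in recent scikit-learn versions a
# missing 'Embarked' value is encoded as its own category rather than dropped.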
df['Sex'] = LabelEncoder().fit_transform(df['Sex'])
df['Embarked'] = LabelEncoder().fit_transform(df['Embarked'])

features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']
df = df[features + ['Survived']].dropna()

X = df[features]
y = df['Survived']

# 4. Train/validation split
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# 5. Model training
# Default max_iter=100 on unscaled features; this triggers the convergence warning in the output
log_model = LogisticRegression()
log_model.fit(X_train, y_train)

# No random_state is set, so the forest's scores vary slightly between runs
rf_model = RandomForestClassifier()
rf_model.fit(X_train, y_train)

# 6. Predictions
log_preds = log_model.predict(X_val)
log_probas = log_model.predict_proba(X_val)[:, 1]

rf_preds = rf_model.predict(X_val)
rf_probas = rf_model.predict_proba(X_val)[:, 1]

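# 7. Evaluation
# Accuracy uses hard class predictions; log loss and MSE use predicted probabilities
# (MSE on probabilities is also known as the Brier score)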
print("=== Logistic Regression ===")
print(f"Accuracy: {accuracy_score(y_val, log_preds):.4f}")
print(f"Log Loss: {log_loss(y_val, log_model.predict_proba(X_val)):.4f}")
print(f"MSE: {mean_squared_error(y_val, log_probas):.4f}")

print("\n=== Random Forest ===")
print(f"Accuracy: {accuracy_score(y_val, rf_preds):.4f}")
print(f"Log Loss: {log_loss(y_val, rf_model.predict_proba(X_val)):.4f}")
print(f"MSE: {mean_squared_error(y_val, rf_probas):.4f}")


STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
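
The warning above suggests two remedies: scaling the data or raising `max_iter`. A minimal sketch of both is shown below, assuming a scikit-learn pipeline is acceptable here. It runs on synthetic data so that it is self-contained; the names `X_demo`, `y_demo`, and `scaled_log_model` are hypothetical and not part of the notebook, and the printed value says nothing about the Titanic results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the seven Titanic features (hypothetical data).
X_demo, y_demo = make_classification(n_samples=700, n_features=7, random_state=42)
X_demo[:, 0] *= 100.0  # give one column a much larger scale than the others

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Standardize the features, then fit with a higher iteration cap.
scaled_log_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_log_model.fit(X_tr, y_tr)

demo_log_loss = log_loss(y_te, scaled_log_model.predict_proba(X_te))
print(f"Log loss on the synthetic validation split: {demo_log_loss:.4f}")
```

Putting the scaler inside the pipeline means it is fitted on the training split only, so no information from the validation split leaks into the scaling step.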


=== Logistic Regression ===
Accuracy: 0.7413
Log Loss: 0.5009
MSE: 0.1628

=== Random Forest ===
Accuracy: 0.7692
Log Loss: 1.2033
MSE: 0.1668
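
The random forest has the higher accuracy and an almost identical MSE, yet its log loss (1.2033) is more than twice that of logistic regression (0.5009). One plausible reading, not verified on this data, is that the forest outputs some probabilities very close to 0 or 1 and is occasionally wrong on them. The sketch below illustrates that effect with invented labels and probabilities; the helper functions `binary_log_loss` and `brier_score` are hypothetical and written out by hand so the arithmetic is visible.

```python
import math

def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so that log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def brier_score(y_true, p_pred):
    """Mean squared error between binary labels and predicted probabilities."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

# Invented example: both sets of probabilities get the same two cases wrong,
# so their accuracy at a 0.5 threshold is identical.
y_true = [1, 0, 1, 0]
p_moderate = [0.8, 0.2, 0.4, 0.6]     # mildly wrong on the last two
p_extreme = [0.99, 0.01, 0.01, 0.99]  # confidently wrong on the last two

for name, probs in [("moderate", p_moderate), ("extreme", p_extreme)]:
    print(f"{name}: log loss = {binary_log_loss(y_true, probs):.4f}, "
          f"MSE = {brier_score(y_true, probs):.4f}")
```

With equal accuracy, the confident mistakes raise the log loss much more sharply than the MSE, because each squared error is capped at 1 while the log penalty is not.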
