# 🛠️ Titanic Survival Prediction - Baseline Model

This notebook is part of the Titanic Survival Prediction project. In this phase, we connect to an MLflow tracking server hosted on DagsHub, train a baseline logistic regression model, and log its parameters, metrics, and artifacts.

[MLflow Server](https://dagshub.com/pxxthik/Titanic-Survival-Prediction.mlflow)

In [1]:
!pip install mlflow==2.15.0

Collecting mlflow==2.15.0
  Downloading mlflow-2.15.0-py3-none-any.whl.metadata (29 kB)
Collecting mlflow-skinny==2.15.0 (from mlflow==2.15.0)
  Downloading mlflow_skinny-2.15.0-py3-none-any.whl.metadata (30 kB)
Collecting graphene<4 (from mlflow==2.15.0)
  Downloading graphene-3.4.3-py2.py3-none-any.whl.metadata (6.9 kB)
Collecting pyarrow<16,>=4.0.0 (from mlflow==2.15.0)
  Downloading pyarrow-15.0.2-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (3.0 kB)
Collecting querystring-parser<2 (from mlflow==2.15.0)
  Downloading querystring_parser-1.2.4-py2.py3-none-any.whl.metadata (559 bytes)
Collecting gunicorn<23 (from mlflow==2.15.0)
  Downloading gunicorn-22.0.0-py3-none-any.whl.metadata (4.4 kB)
Collecting databricks-sdk<1,>=0.20.0 (from mlflow-skinny==2.15.0->mlflow==2.15.0)
  Downloading databricks_sdk-0.57.0-py3-none-any.whl.metadata (39 kB)
Collecting importlib-metadata!=4.7.0,<8,>=3.7.0 (from mlflow-skinny==2.15.0->mlflow==2.15.0)
  Downloading importlib_metadata-7

In [2]:
!pip install dagshub==0.3.34

Collecting dagshub==0.3.34
  Downloading dagshub-0.3.34-py3-none-any.whl.metadata (11 kB)
Collecting fusepy>=3 (from dagshub==0.3.34)
  Downloading fusepy-3.0.1.tar.gz (11 kB)
  Preparing metadata (setup.py) ... done
Collecting appdirs>=1.4.4 (from dagshub==0.3.34)
  Downloading appdirs-1.4.4-py2.py3-none-any.whl.metadata (9.0 kB)
Collecting httpx~=0.23.0 (from dagshub==0.3.34)
  Downloading httpx-0.23.3-py3-none-any.whl.metadata (7.1 kB)
Collecting rich~=13.1.0 (from dagshub==0.3.34)
  Downloading rich-13.1.0-py3-none-any.whl.metadata (18 kB)
Collecting dacite~=1.6.0 (from dagshub==0.3.34)
  Downloading dacite-1.6.0-py3-none-any.whl.metadata (14 kB)
Collecting tenacity~=8.2.2 (from dagshub==0.3.34)
  Downloading tenacity-8.2.3-py3-none-any.whl.metadata (1.0 kB)
Collecting gql[requests] (from dagshub==0.3.34)
  Downloading gql-3.5.3-py2.py3-none-any.whl.metadata (9.4 kB)
Collecting treelib~=1.6.4 (from dagshub==0.3.34)
  Downloading treelib-1.6.4-py3-none-

In [3]:
import mlflow
import mlflow.sklearn
import dagshub
import os

In [4]:
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
dagshub_token = user_secrets.get_secret("DAGSHUB_PAT")

In [5]:
if not dagshub_token:
    # The token comes from Kaggle secrets (cell 4), not from an environment variable
    raise EnvironmentError("DAGSHUB_PAT secret is not set or is empty")

os.environ["MLFLOW_TRACKING_USERNAME"] = dagshub_token
os.environ["MLFLOW_TRACKING_PASSWORD"] = dagshub_token

In [6]:
dagshub_url = "https://dagshub.com"
repo_owner = "pxxthik"
repo_name = "Titanic-Survival-Prediction"

# Set up MLflow tracking URI
mlflow.set_tracking_uri(f'{dagshub_url}/{repo_owner}/{repo_name}.mlflow')
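
The credential and URI setup in the two cells above can be collected into one small helper. The following is only a sketch: the function name `configure_dagshub_tracking` and the placeholder token are hypothetical, and it assumes the same convention as above, where the token serves as both username and password.

```python
import os


def configure_dagshub_tracking(token, repo_owner, repo_name,
                               base_url="https://dagshub.com"):
    """Export MLflow credentials and return the DagsHub tracking URI.

    Hypothetical helper, not part of mlflow or dagshub.
    """
    if not token:
        raise EnvironmentError("DagsHub token is empty or missing")
    # Same convention as the cells above: the token is used for both fields.
    os.environ["MLFLOW_TRACKING_USERNAME"] = token
    os.environ["MLFLOW_TRACKING_PASSWORD"] = token
    return f"{base_url}/{repo_owner}/{repo_name}.mlflow"


# "dummy-token" is a placeholder; in the notebook the value comes from Kaggle secrets.
tracking_uri = configure_dagshub_tracking(
    "dummy-token", "pxxthik", "Titanic-Survival-Prediction"
)
print(tracking_uri)
```

The returned string is what would be passed to `mlflow.set_tracking_uri`.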

In [7]:
with mlflow.start_run():
    mlflow.log_param("Parameter", "value")
    mlflow.log_metric("Metric", 1)

2025/06/18 16:52:09 INFO mlflow.tracking._tracking_service.client: 🏃 View run burly-kite-413 at: https://dagshub.com/pxxthik/Titanic-Survival-Prediction.mlflow/#/experiments/0/runs/163ed6d380394320ab2563ff58456d0a.
2025/06/18 16:52:09 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://dagshub.com/pxxthik/Titanic-Survival-Prediction.mlflow/#/experiments/0.


In [8]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.metrics import roc_auc_score

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

In [9]:
df = pd.read_csv("/kaggle/input/titanic-features/titanic_features.csv")
X = df.drop(columns=["Survived"])
y = df["Survived"]

In [10]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                   random_state=42)

In [11]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((712, 10), (179, 10), (712,), (179,))
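
The split above is random but not stratified, so the share of survivors can differ slightly between the train and test sets. A possible variant passes `stratify` to keep the class ratio equal in both parts. The sketch below uses synthetic data as a stand-in for the real features; the 38% positive rate and the `*_demo` names are assumptions made only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 891 rows and 10 features, with an assumed ~38% positive rate.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(891, 10))
y_demo = (rng.random(891) < 0.38).astype(int)

# stratify=y_demo keeps the class proportions the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
print(f"positive rate: train={y_tr.mean():.3f}, test={y_te.mean():.3f}")
```

In the notebook itself, the equivalent change would be adding `stratify=y` to the existing `train_test_split` call.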

In [12]:
model = LogisticRegression()
model.fit(X_train, y_train)

STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
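
The warning above means the solver stopped before converging, and it suggests two remedies: scaling the features or raising `max_iter`. A minimal sketch of both follows, combined with the `cross_val_score` helper imported earlier. It runs on synthetic data from `make_classification`, not on the Titanic features, so the printed score says nothing about this project's model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with one badly scaled column, standing in for the real features.
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=42)
X_demo[:, 0] *= 1000.0

# Scaling inside the pipeline means each CV fold is standardized on its own training part.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

scores = cross_val_score(scaled_model, X_demo, y_demo, cv=5, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Keeping the scaler in the pipeline, instead of scaling the whole dataset up front, avoids leaking test-fold statistics into training.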


In [13]:
y_pred = model.predict(X_test)

In [14]:
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
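# Note: this scores hard class labels; passing model.predict_proba(X_test)[:, 1]
# would measure how well the predicted probabilities rank the passengers.
roc_auc = roc_auc_score(y_test, y_pred)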

print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"ROC AUC: {roc_auc}")

Accuracy: 0.7932960893854749
Precision: 0.7761194029850746
Recall: 0.7027027027027027
ROC AUC: 0.7799227799227799
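
The ROC AUC above is computed from hard 0/1 predictions. In that case the ROC curve has a single interior point and the score reduces to balanced accuracy, the mean of recall and specificity. ROC AUC is more commonly computed from predicted probabilities. The sketch below contrasts the two on synthetic data; the `*_demo` names and the data are illustrative assumptions, not the notebook's features.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the Titanic features.
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
labels = clf.predict(X_te)
scores = clf.predict_proba(X_te)[:, 1]  # probability of the positive class

auc_from_labels = roc_auc_score(y_te, labels)   # one threshold only
auc_from_scores = roc_auc_score(y_te, scores)   # all thresholds
balanced_acc = balanced_accuracy_score(y_te, labels)

print(f"AUC from labels:        {auc_from_labels:.4f}")
print(f"Balanced accuracy:      {balanced_acc:.4f}")
print(f"AUC from probabilities: {auc_from_scores:.4f}")
```

The first two printed values coincide, which is why the label-based figure is better read as balanced accuracy than as a ranking metric.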


In [15]:
import json

metrics = {
    "accuracy": accuracy,
    "precision": precision,
    "recall": recall,
    "roc_auc": roc_auc,
}

with open("metrics.json", 'w') as file:
    json.dump(metrics, file, indent=4)

In [16]:
mlflow.set_experiment("baseline model")
with mlflow.start_run():
    
    # Log all estimator hyperparameters and all metrics with one batch call each
    mlflow.log_params(model.get_params())
    mlflow.log_metrics(metrics)

    mlflow.log_artifact('/kaggle/working/metrics.json')
    mlflow.sklearn.log_model(model, "baseline_model")

2025/06/18 16:52:41 INFO mlflow.tracking._tracking_service.client: 🏃 View run popular-rat-751 at: https://dagshub.com/pxxthik/Titanic-Survival-Prediction.mlflow/#/experiments/1/runs/8a6bb03d5d8140c5b4c4d89b53707728.
2025/06/18 16:52:41 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://dagshub.com/pxxthik/Titanic-Survival-Prediction.mlflow/#/experiments/1.


In [17]:
mlflow.__version__

'2.15.0'

In [18]:
dagshub.__version__

'0.3.34'

In [19]:
!pip uninstall dagshub -y

Found existing installation: dagshub 0.3.34
Uninstalling dagshub-0.3.34:
  Successfully uninstalled dagshub-0.3.34


In [20]:
import dagshub