### 1. Set Up the Environment

In [None]:
pip install pandas scikit-learn mlflow seaborn

### 2. Load the Titanic Dataset

In [None]:
import pandas as pd
import seaborn as sns

# Load the Titanic dataset
titanic = sns.load_dataset('titanic')


In [None]:
# Preview the dataset
print(titanic.head())

### 3. Preprocessing the Data

In [None]:
# Drop columns that are redundant or not used for modeling
# ('class' duplicates 'pclass'; 'sex' is kept because it is encoded below)
titanic = titanic.drop(columns=['embarked', 'who', 'deck', 'embark_town', 'alive', 'class'])

# Drop rows with missing values
titanic = titanic.dropna()

# Encode categorical variables
titanic['sex'] = titanic['sex'].map({'male': 0, 'female': 1})

# Features and target variable
X = titanic.drop(columns=['survived'])
y = titanic['survived']
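
Before training, it can help to confirm that the preprocessing leaves no missing values and no non-numeric columns. The sketch below runs the same kind of steps on a few hand-written rows. The `sample` frame and the `prepare` helper are hypothetical names used only for illustration; they are not part of the seaborn dataset or of this notebook.

```python
import pandas as pd

# Hypothetical rows that mimic a few columns of the seaborn Titanic dataset
sample = pd.DataFrame({
    "survived": [0, 1, 1, 0],
    "pclass": [3, 1, 3, 1],
    "sex": ["male", "female", "female", "male"],
    "age": [22.0, 38.0, None, 35.0],
    "fare": [7.25, 71.28, 7.92, 53.10],
    "deck": [None, "C", None, "C"],
})


def prepare(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Drop a sparse column, drop rows with missing values, and encode 'sex'."""
    df = df.drop(columns=["deck"])
    df = df.dropna().copy()
    df["sex"] = df["sex"].map({"male": 0, "female": 1})
    return df.drop(columns=["survived"]), df["survived"]


X, y = prepare(sample)

# Columns that a scikit-learn estimator could not consume directly
non_numeric = X.select_dtypes(exclude="number").columns.tolist()
has_missing = bool(X.isna().any().any())
print(X.shape, non_numeric, has_missing)
```

If `non_numeric` is not empty or `has_missing` is true on the real data, the model's `fit` call is likely to fail, so this is a cheap check to run first.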

### 4. Build and Train a Model

Let’s use a RandomForestClassifier, but you can experiment with other models as well. We’ll also use MLflow to log parameters and metrics.

In [None]:
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Set up MLflow experiment
mlflow.set_experiment('Titanic_Survival_Prediction')

# Log parameters, metrics, and model using MLflow
with mlflow.start_run():
    # Define model parameters
    n_estimators = 100
    max_depth = 5  # Maximum depth of each tree

    # Train the model
    model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, random_state=42)
    model.fit(X_train, y_train)

    # Predict on the test set
    y_pred = model.predict(X_test)

    # Calculate accuracy and F1 score
    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)

    # Log parameters and metrics to MLflow
    mlflow.log_param('n_estimators', n_estimators)
    mlflow.log_param('max_depth', max_depth)
    mlflow.log_metric('accuracy', accuracy)
    mlflow.log_metric('f1_score', f1)

    # Log the model
    mlflow.sklearn.log_model(model, 'model')

    print(f"Accuracy: {accuracy}")
    print(f"F1 Score: {f1}")

### 5. Compare the Runs in MLflow UI

Start the MLflow UI: To view and compare your experiments, start the MLflow tracking UI. In a terminal, run:

In [None]:
mlflow ui

This starts a web server; open http://127.0.0.1:5000/ in your browser to view the UI.

Multiple Runs: To compare runs, train the model several times with different hyperparameters (e.g., different values of n_estimators or max_depth). Each run logs its own parameters and metrics to MLflow, so you can view and compare them side by side in the UI.

### 6. Document the Setup Guidelines

In your project, create a `README.md` file with the following content:

# Titanic Survival Prediction with MLflow

This project demonstrates the use of MLflow to track machine learning experiments. We use a RandomForestClassifier to predict Titanic survival based on various features.

## Setup Instructions

1. **Create a virtual environment** (recommended):
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows, use venv\Scripts\activate
   ```

2. **Install dependencies**:
   ```bash
   pip install pandas scikit-learn mlflow seaborn
   ```

3. **Run the experiment**: Clone this repository and run the Python script to train the model and log metrics. Then start the MLflow UI:
   ```bash
   mlflow ui
   ```
   Visit http://127.0.0.1:5000 to view and compare experiments.

4. **Log parameters and metrics**: Parameters (e.g., n_estimators) and metrics (e.g., accuracy, f1_score) are logged to MLflow each time the training script runs.

## How to Compare Runs

1. After training different models, visit the MLflow UI at http://127.0.0.1:5000.
2. Navigate to the "Experiments" tab to see all your runs.
3. Compare the logged parameters and metrics to select the best model.


### 7. Prepare the GitHub Repository

1. **Create a GitHub Repository**:
   - Create a repository on GitHub (e.g., `HW-MLflow`).
   
2. **Add Your Code**:
   - Push the code for your project to GitHub, including:
     - The Python script(s) for training models.
     - The `README.md` file with the setup instructions.
   
3. **Push to GitHub**:
   - Commit and push your changes to the repository:
     ```bash
     git init
     git add .
     git commit -m "Initial commit"
     git remote add origin <your_repo_url>
     git push -u origin master
     ```
