# Exercise 3: Track Experiments

## Prepare data once again

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')

start_df = pd.read_csv('./clean-loan-data.csv', low_memory=False)

pd.set_option('display.max_columns', None)

df = start_df.copy(deep=True)
df['TARGET'] = [0 if i=='Default' else 1 for i in df['loan_status']]
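# Keep only the two features and the target; a positional axis argument is rejected by recent pandas versions
df.drop(columns=df.columns.difference(['loan_amnt', 'annual_inc', 'TARGET']), inplace=True)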
df = df.dropna()
df = pd.get_dummies(df)

In [2]:
from sklearn.model_selection import train_test_split

# Stratified 85/15 split so both sets keep the original class ratio
X = df.drop('TARGET', axis=1)
y = df['TARGET']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=101, stratify=y)

In [3]:
from imblearn.over_sampling import SMOTE

sm = SMOTE(random_state=12)
x_train_r, y_train_r = sm.fit_resample(X_train, y_train)

## Exercise 1: Track Experiment

Check the documentation at https://www.mlflow.org/docs/latest/tracking.html#logging-data-to-runs and track the experiment using these functions.

1. set_experiment
2. start_run
3. log_param
4. set_tag

Check the MLflow UI: http://localhost:5500

*Questions*
- Experiment with these values. How would you use them?
- Can you reproduce how the pipeline looks in the MLflow UI?

In [4]:
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.pipeline import make_pipeline

import mlflow


mlflow.set_tracking_uri("http://mlflow:5500")

# Select the experiment by name (created if it does not exist yet)
experiment = mlflow.set_experiment("eligibility")

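with mlflow.start_run():
    # Tag the run so it can be filtered by estimator type in the UI
    mlflow.set_tag("estimator", "LogReg")

    # Scale the features, then fit a logistic regression on the resampled training data
    eligibility_pipeline = make_pipeline(StandardScaler(), LogisticRegression())
    eligibility_pipeline.fit(x_train_r, y_train_r)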

## Exercise 2: Autolog

Instead of tracking everything manually, you can let MLflow do the logging: it supports autologging for several libraries, including sklearn.

Reuse your code from above, but remove all calls to `set_tag` and `log_param`. Instead, call `mlflow.autolog()` before starting the run.

Check the MLflow UI: http://localhost:5500

*Questions*
- What changed in the MLflow UI compared to the manually tracked run?

In [5]:
import mlflow

mlflow.set_tracking_uri("http://mlflow:5500")
