## Training on Azure Machine Learning

The main task on this dataset is to predict, from a patient's attributes, whether that person has heart disease. A secondary, exploratory task is to analyze the dataset for insights that help in understanding the problem.

#### Libraries and modules to use

In [1]:
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
import mlflow
from training import fetch_data, create_splits, run_experiment

#### Data reading

In [12]:
df = fetch_data("https://mlrawdata123.blob.core.windows.net/rawdata/raw_data.csv")

#### Data preprocessing

Train-test split

In [76]:
X_train, X_test, y_train, y_test = create_splits(df)
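
The `training` module is not shown in this notebook. As a rough sketch of what `fetch_data` and `create_splits` might do, the block below reads a CSV into a DataFrame and makes a stratified hold-out split. The label column name `target`, the 30% test fraction, and the seed are assumptions for illustration, not the module's actual settings.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def fetch_data(source):
    """Hypothetical stand-in: load a CSV from a URL, path, or file-like object."""
    return pd.read_csv(source)


def create_splits(df, target_col="target", test_size=0.3, random_state=1234):
    """Hypothetical stand-in: split features and label into train/test sets."""
    X = df.drop(columns=[target_col])  # assumed label column name
    y = df[target_col]
    # Stratify so both splits keep the same class balance as the full dataset.
    return train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
```

Fixing `random_state` keeps the split reproducible, so accuracies from different models are measured on the same test rows.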

### Modeling

#### Logistic regression

In [4]:
run_experiment(
    X_train, X_test, y_train, y_test,
    LogisticRegression,
    {"random_state": 1234, "penalty": "l2"},
    experiment_name="heart_disease_exp",
    run_name="log_reg_baseline",
)

#### Decision Tree

In [8]:
run_experiment(
    X_train, X_test, y_train, y_test,
    DecisionTreeClassifier,
    {"random_state": 1234, "max_depth": 10},
    experiment_name="heart_disease_exp",
    run_name="decision_tree",
)

#### Random Forest

In [11]:
run_experiment(
    X_train, X_test, y_train, y_test,
    RandomForestClassifier,
    {"random_state": 1234, "n_estimators": 50},
    experiment_name="heart_disease_exp",
    run_name="random_forest",
)

### Model Registry

Set up the MLflow client

In [80]:
from mlflow.tracking import MlflowClient

ML_FLOW_TRACKING_URI = mlflow.get_tracking_uri()
client = MlflowClient(tracking_uri=ML_FLOW_TRACKING_URI)
client.search_experiments()

In [29]:
experiment_id = mlflow.get_experiment_by_name("heart_disease_exp").experiment_id

Search for runs with test accuracy above 0.79

In [81]:
from mlflow.entities import ViewType

runs = client.search_runs(
    experiment_ids=[experiment_id],  # documented as a list of experiment IDs
    filter_string='metrics.test_accuracy > 0.79',
    run_view_type=ViewType.ACTIVE_ONLY,
    max_results=3
)

In [66]:
for run in runs:
    artifact = client.list_artifacts(run_id=run.info.run_id)[0].path
    print(f"run id: {run.info.run_id}, test_accuracy: {run.data.metrics['test_accuracy']:.4f},  artifact: {artifact}")

run id: 8bb212b8-6697-4960-b9bc-0dff1bbb0d2d, test_accuracy: 0.7912,  artifact: heart_disease_DecisionTreeClassifier
run id: c181a6c4-3f0f-42e6-8e8f-6c6c6167194f, test_accuracy: 0.8242,  artifact: heart_disease_RandomForestClassifier
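
The run to register can also be chosen programmatically rather than by copying an ID from the printout. The sketch below picks the run with the highest `test_accuracy`; it uses `SimpleNamespace` stand-ins that mimic the two runs listed above, so it runs without a tracking server. The helper name `best_run` is hypothetical.

```python
from types import SimpleNamespace


def best_run(runs, metric="test_accuracy"):
    """Return the run whose logged `metric` is highest."""
    return max(runs, key=lambda run: run.data.metrics[metric])


# Stand-ins with the same attribute layout as MLflow Run objects,
# filled with the values printed above.
example_runs = [
    SimpleNamespace(
        info=SimpleNamespace(run_id="8bb212b8-6697-4960-b9bc-0dff1bbb0d2d"),
        data=SimpleNamespace(metrics={"test_accuracy": 0.7912}),
    ),
    SimpleNamespace(
        info=SimpleNamespace(run_id="c181a6c4-3f0f-42e6-8e8f-6c6c6167194f"),
        data=SimpleNamespace(metrics={"test_accuracy": 0.8242}),
    ),
]

best = best_run(example_runs)
print(best.info.run_id)  # c181a6c4-3f0f-42e6-8e8f-6c6c6167194f
```

With real results, `client.search_runs` also accepts `order_by=["metrics.test_accuracy DESC"]`, which returns the best run first.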


Register the Random Forest classifier

In [82]:
run_id = "c181a6c4-3f0f-42e6-8e8f-6c6c6167194f"
artifact = "heart_disease_RandomForestClassifier"
model_uri = f'runs:/{run_id}/{artifact}'
mlflow.register_model(model_uri=model_uri, name = 'heart_disease_model')

Transition the model version to the Production stage

In [83]:
model_name = 'heart_disease_model'
model_version = 1
new_stage = 'Production'
client.transition_model_version_stage(
    name=model_name,
    version=model_version,
    stage=new_stage,
    archive_existing_versions=False
)