![image.png](attachment:image.png)

**`Experiments Overview`**

- In a machine learning (ML) project, multiple experiments are required to fine-tune and optimize the model's performance. These experiments help us evaluate different configurations and approaches, allowing us to choose the best solution. By running multiple experiments, we can compare various hyperparameters, algorithms, and data processing techniques.

    1. `Data Preprocessing Experiments`: Testing different methods of handling missing data, scaling, or encoding categorical features.
    2. `Feature Engineering`: Experimenting with different sets of features or transformations to improve the model.
    3. `Model Selection Experiments`: Trying various machine learning algorithms to find the best one for the task.
    4. `Hyperparameter Tuning`: Testing different hyperparameters to optimize the model’s performance.

- Along with running multiple experiments, we need to log the params and data used, their results, etc., so that we can compare them later and select the best technique.
- `MLFlow` is one such tool we can use to track all the experiments.
- `DVC` was developed for data versioning; it also offers experiment tracking, but MLFlow was developed specifically for experiment tracking.
- DVC works with Git, whereas MLFlow works independently of it.
- MLFlow provides a UI for viewing experiments.
- MLFlow also offers other functionality alongside experiment tracking, such as a model registry.

**`Practice`**

- start the MLFlow UI (`mlflow ui`)
- create a sample experiment in file1.py and log metrics, params, etc.
- visualize the experiment in the UI

![image.png](attachment:image.png)
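
The practice steps above might look roughly like the following sketch. It assumes MLflow is installed (`pip install mlflow`) and that the UI was started with `mlflow ui` from the same directory. The function names, parameter names, and toy labels are hypothetical and are not the contents of the original `file1.py`.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def log_sample_run(params, y_true, y_pred):
    """Log hypothetical params and an accuracy metric as one MLflow run."""
    import mlflow  # third-party; imported here so accuracy() works without it

    with mlflow.start_run() as run:
        mlflow.log_params(params)  # a dict such as {"max_depth": 5}
        mlflow.log_metric("accuracy", accuracy(y_true, y_pred))
    return run.info.run_id


# Example call (toy labels, purely illustrative):
# log_sample_run({"max_depth": 5}, [1, 0, 1, 1], [1, 0, 0, 1])
```

After the call finishes, refreshing the UI should show one new run with its params and metric.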

- perform another experiment and visualize it

![image-2.png](attachment:image-2.png)

- **What can we track using MLFlow?**
    1. metrics: accuracy, loss, precision, AUC (area under the curve), custom metrics
    2. parameters: model hyperparameters, data preprocessing parameters, feature engineering
    3. artifacts
    4. models: pkl/joblib files
    5. tags
    6. source code
    7. logs

- write code to track artifacts and other details in MLFlow

![image-3.png](attachment:image-3.png)
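
A minimal sketch of logging artifacts and tags, assuming MLflow is installed. The report file, the function names, and the tag keys are made up for illustration; only `log_metrics`, `set_tags`, and `log_artifact` are MLflow calls.

```python
import json
import os
import tempfile


def write_report(metrics, directory):
    """Save metrics as a JSON file and return its path."""
    path = os.path.join(directory, "report.json")
    with open(path, "w") as f:
        json.dump(metrics, f, indent=2)
    return path


def log_run_with_artifacts(metrics, tags):
    """Log metrics, tags, and a report file as part of one run."""
    import mlflow  # third-party; imported lazily

    with tempfile.TemporaryDirectory() as tmp, mlflow.start_run():
        mlflow.log_metrics(metrics)   # dict of name -> float
        mlflow.set_tags(tags)         # dict of free-form labels
        mlflow.log_artifact(write_report(metrics, tmp))
        # log_artifact accepts any local file, so a script can also
        # store its own source: mlflow.log_artifact(__file__)


# Example call (hypothetical values):
# log_run_with_artifacts({"accuracy": 0.75}, {"author": "me", "model": "rf"})
```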

- All these experiments are also saved locally, and the UI reads from those local files

![image-4.png](attachment:image-4.png)

- **Difference between experiments and runs**
    1. An experiment groups multiple runs, typically all the runs for one model
    2. Each execution of that model with a different set of params is a run
    3. For example: Linear_Regression_Experiment: run1, run2, run3, etc.; Random_Forest_Experiment: run1, run2, run3, etc.
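
The grouping described above can be sketched as follows, assuming MLflow is installed. The experiment names mirror the example in the notes, while the params in `PLAN` and the helper names are hypothetical.

```python
# Hypothetical plan: experiment name -> one params dict per run.
PLAN = {
    "Linear_Regression_Experiment": [{"fit_intercept": True}, {"fit_intercept": False}],
    "Random_Forest_Experiment": [{"n_estimators": 10}, {"n_estimators": 50}],
}


def run_names(plan):
    """Return (experiment, run_name) pairs in the order they would be logged."""
    return [
        (experiment, f"run{i}")
        for experiment, param_sets in plan.items()
        for i, _ in enumerate(param_sets, start=1)
    ]


def log_plan(plan):
    """Create one MLflow experiment per key and one run per params dict."""
    import mlflow  # third-party; imported lazily

    for experiment, param_sets in plan.items():
        mlflow.set_experiment(experiment)  # created if it does not exist yet
        for i, params in enumerate(param_sets, start=1):
            with mlflow.start_run(run_name=f"run{i}"):
                mlflow.log_params(params)


# log_plan(PLAN)  # would create 2 experiments with 2 runs each
```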

- **MLFlow server architecture**

![image-5.png](attachment:image-5.png)

- Currently we are doing everything locally: hosting the UI, saving artifacts, and saving metadata
- In a real-life project, we should save artifacts and metadata in cloud storage or databases and host the UI in the cloud so that team members can see it
- To do this, we can use AWS (tricky) or DagsHub (easy)
- create a new file to implement all experiments using DagsHub
- The MLFlow UI on DagsHub is hosted by DagsHub and has a shareable link that everybody can use.

![image-6.png](attachment:image-6.png)
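
One way to point MLflow at a remote server is to set the tracking URI before starting any run. The sketch below assumes DagsHub's `https://dagshub.com/<owner>/<repo>.mlflow` URI format and MLflow's basic-auth environment variables; the owner, repo, username, and token values are placeholders, and this is not necessarily how the original DagsHub file was written.

```python
import os


def dagshub_tracking_uri(owner, repo):
    """Build the MLflow tracking URI for a DagsHub repository."""
    return f"https://dagshub.com/{owner}/{repo}.mlflow"


def use_remote_tracking(owner, repo, username, token):
    """Send all subsequent MLflow runs to the remote server."""
    import mlflow  # third-party; imported lazily

    # MLflow reads these variables for HTTP basic authentication.
    os.environ["MLFLOW_TRACKING_USERNAME"] = username
    os.environ["MLFLOW_TRACKING_PASSWORD"] = token
    mlflow.set_tracking_uri(dagshub_tracking_uri(owner, repo))


# Example call (placeholder values, replace with real ones):
# use_remote_tracking("my-user", "my-repo", "my-user", "<access token>")
```

The rest of the logging code stays the same; only the destination of the runs changes.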

- We can also **auto log** params, the model, etc. without explicitly stating what to track
- create a new file `autolog.py`
- Autolog records a lot of things automatically
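
A minimal autologging sketch, assuming both MLflow and scikit-learn are installed. The dataset, model, and default values are illustrative choices and not the contents of the original `autolog.py`.

```python
def train_with_autolog(n_estimators=20, max_depth=3):
    """Train a small classifier with MLflow autologging switched on."""
    import mlflow  # third-party; imported lazily
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    mlflow.autolog()  # enable before training so fit() is captured

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    with mlflow.start_run():
        model = RandomForestClassifier(
            n_estimators=n_estimators, max_depth=max_depth, random_state=42
        )
        model.fit(X_train, y_train)  # no explicit log_param/log_metric calls
        return model.score(X_test, y_test)


# train_with_autolog()  # the run should then appear in the UI
```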


- **`How to do Hyper-Parameter Tuning using MLFlow`**
- create a new file hyper_param.py to perform hyperparameter tuning manually first and then with MLFlow
- now we can save all the experiments instead of just the best one

![image-7.png](attachment:image-7.png)

- We can compare all the nested runs in MLFlow to see how accuracy changes as the parameter values change

![image-8.png](attachment:image-8.png)
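
Saving every trial rather than only the best one can be sketched with a parent run and one nested child run per parameter combination. This assumes MLflow is installed; `expand_grid`, `tune`, and the caller-supplied `evaluate` function are hypothetical names, and the original `hyper_param.py` may well use a different search method.

```python
from itertools import product


def expand_grid(grid):
    """Turn {"a": [1, 2], "b": [3]} into a list of params dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def tune(grid, evaluate):
    """Log a parent run with one nested child run per combination.

    `evaluate` is supplied by the caller: params dict -> accuracy (float).
    """
    import mlflow  # third-party; imported lazily

    best_params, best_score = None, float("-inf")
    with mlflow.start_run(run_name="grid_search"):
        for params in expand_grid(grid):
            with mlflow.start_run(nested=True):  # child of grid_search
                score = evaluate(params)
                mlflow.log_params(params)
                mlflow.log_metric("accuracy", score)
            if score > best_score:
                best_params, best_score = params, score
        # Summarise the winning combination on the parent run.
        mlflow.log_params(best_params)
        mlflow.log_metric("best_accuracy", best_score)
    return best_params, best_score
```

In the UI, the child runs appear under the parent, which is what makes the side-by-side comparison possible.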

- `Model Registry`
- MLflow's Model Registry is a central repository for managing machine learning models, providing versioning, tracking, and stage management. It allows you to track different versions of your models, manage their lifecycle (from development to production), and easily transition between different stages (such as Staging, Production, and Archived).
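
Registering a model from a finished run might look like the sketch below. It assumes MLflow is installed, that the run logged a model under the artifact path `model`, and that the tracking backend supports the registry. The helper names are hypothetical, and stage transitions are deprecated in recent MLflow releases in favour of aliases, so treat that call as version-dependent.

```python
def runs_model_uri(run_id, artifact_path="model"):
    """Build the URI MLflow uses to locate a model logged in a run."""
    return f"runs:/{run_id}/{artifact_path}"


def register_and_stage(run_id, name, stage="Staging"):
    """Register a run's model under `name` and move it to `stage`."""
    import mlflow  # third-party; imported lazily
    from mlflow.tracking import MlflowClient

    # Each call creates a new version under the same registered name.
    model_version = mlflow.register_model(runs_model_uri(run_id), name)

    # Deprecated in newer MLflow versions (aliases are the replacement).
    MlflowClient().transition_model_version_stage(
        name=name, version=model_version.version, stage=stage
    )
    return model_version.version


# Example call (placeholder run id and name):
# register_and_stage("<run id from the UI>", "my_model")
```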