* The project directory is `End-to-End-Machine-Learning-Pipeline`; all paths below are relative to this root directory.

* `src/mlProject/constants/__init__.py` defines

	```python
	CONFIG_FILE_PATH
	PARAMS_FILE_PATH
	SCHEMA_FILE_PATH
	```

* `src/mlProject/constants/__init__.py` must be completed first, because step 5 (Update the configuration manager) depends on it.
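A minimal sketch of what `src/mlProject/constants/__init__.py` might contain, assuming the three constants are `pathlib.Path` objects relative to the project root. This matches the `PosixPath(...)` values noted in the `ConfigurationManager` comments in step 5, but the actual file may differ.

```python
from pathlib import Path

# Assumed contents of src/mlProject/constants/__init__.py.
# The paths are relative, so they resolve against the current working
# directory, which must be the project root (End-to-End-Machine-Learning-Pipeline).
CONFIG_FILE_PATH = Path("config/config.yaml")
PARAMS_FILE_PATH = Path("params.yaml")
SCHEMA_FILE_PATH = Path("schema.yaml")

print(CONFIG_FILE_PATH, PARAMS_FILE_PATH, SCHEMA_FILE_PATH)
```

Because the paths are relative, they only resolve correctly when the working directory is the project root, which is why the lecture demo starts with `os.chdir("../")`.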

# My Own Implementation

In [None]:
!git clone https://github.com/henrykohl/MLOps-Foundation.git

In [None]:
!pip install -q condacolab
import condacolab
condacolab.install() # expect a kernel restart

In [None]:
!conda create -n mlproj python=3.8 -y

In [None]:
%cd MLOps-Foundation/End-to-End-Machine-Learning-Pipeline/

In [None]:
%pwd

In [None]:
!source activate mlproj; pip install -r requirements.txt

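* When working in Colab, do the following in order, so that the artifacts this stage reads (e.g. `artifacts/data_transformation/train.csv` and `test.csv`) exist:
  - Copy the self-implemented execution cells from `01_data_ingestion.ipynb` here and run them once.
  - Copy the self-implemented execution cells from `02_data_validation.ipynb` here and run them once.
  - Copy the self-implemented execution cells from `03_data_transformation.ipynb` here and run them once.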

# Lecture Demo

In [1]:
import os
os.chdir("../")
%pwd

'd:\\Bappy\\Live Sessions\\Euron\\MLOPs Masters Batch\\End-to-End-Machine-Learning-Pipeline'

## 4. Update the entity
* Corresponds to `src/mlProject/entity/config_entity.py` in the project

In [None]:
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ModelTrainerConfig:
    root_dir: Path        ## defined in config.yaml (model_trainer)
    train_data_path: Path ## defined in config.yaml (model_trainer)
    test_data_path: Path  ## defined in config.yaml (model_trainer)
    model_name: str       ## defined in config.yaml (model_trainer)
    alpha: float          ## defined in params.yaml
    l1_ratio: float       ## defined in params.yaml
    target_column: str    ## defined in schema.yaml
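`frozen=True` makes instances immutable, so a config object cannot be changed by accident after the configuration manager has built it. A small self-contained check, re-declaring the same dataclass and filling it with the values noted in the step 5 comments:

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from pathlib import Path


@dataclass(frozen=True)
class ModelTrainerConfig:
    root_dir: Path
    train_data_path: Path
    test_data_path: Path
    model_name: str
    alpha: float
    l1_ratio: float
    target_column: str


cfg = ModelTrainerConfig(
    root_dir=Path("artifacts/model_trainer"),
    train_data_path=Path("artifacts/data_transformation/train.csv"),
    test_data_path=Path("artifacts/data_transformation/test.csv"),
    model_name="model.joblib",
    alpha=0.2,
    l1_ratio=0.1,
    target_column="quality",
)

frozen = False
try:
    cfg.alpha = 0.5  # assigning to any field of a frozen dataclass raises
except FrozenInstanceError:
    frozen = True

# dataclasses.replace returns a modified copy and leaves cfg untouched
tuned = replace(cfg, alpha=0.5)
print("frozen:", frozen, "| original alpha:", cfg.alpha, "| copy alpha:", tuned.alpha)
```

Note that dataclasses do not enforce type annotations: in step 5 the values come straight from the YAML files, so `root_dir` is actually a `str` even though it is annotated as `Path`.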

## 5. Update the configuration manager 
* Corresponds to `src/mlProject/config/configuration.py` in the project, which uses
	- '4. entity': `src/mlProject/entity/config_entity.py` (the manager outputs a `ModelTrainerConfig`)

In [None]:
from mlProject.constants import *
from mlProject.utils.common import read_yaml, create_directories

In [None]:
class ConfigurationManager:
    def __init__(
        self,
        config_filepath=CONFIG_FILE_PATH,   ## value: PosixPath("config/config.yaml")
        params_filepath=PARAMS_FILE_PATH,   ## value: PosixPath("params.yaml")
        schema_filepath=SCHEMA_FILE_PATH):  ## value: PosixPath("schema.yaml")

        self.config = read_yaml(config_filepath) ## returns ConfigBox({...}); config.artifacts_root is a str
        self.params = read_yaml(params_filepath) ## returns ConfigBox({...})
        self.schema = read_yaml(schema_filepath) ## returns ConfigBox({...})

        create_directories([self.config.artifacts_root]) ## creates the artifacts directory

    def get_model_trainer_config(self) -> ModelTrainerConfig:
        config = self.config.model_trainer  ## ConfigBox({...}): the model_trainer section
        params = self.params.ElasticNet     ## ConfigBox({...}): alpha and l1_ratio
        schema = self.schema.TARGET_COLUMN  ## ConfigBox({...}): its name is the target column

        create_directories([config.root_dir]) ## creates artifacts/model_trainer

        model_trainer_config = ModelTrainerConfig(
            root_dir=config.root_dir,                ## artifacts/model_trainer
            train_data_path=config.train_data_path,  ## artifacts/data_transformation/train.csv
            test_data_path=config.test_data_path,    ## artifacts/data_transformation/test.csv
            model_name=config.model_name,            ## model.joblib
            alpha=params.alpha,                      ## 0.2
            l1_ratio=params.l1_ratio,                ## 0.1
            target_column=schema.name,               ## quality
        )

        return model_trainer_config
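The manager simply routes one YAML section into each group of entity fields. The sketch below reproduces that routing without touching the disk. It uses `types.SimpleNamespace` as a stand-in for `ConfigBox` (both allow attribute access) and an assumed YAML layout reconstructed from the keys and values in the comments above (`model_trainer` in config.yaml, `ElasticNet` in params.yaml, `TARGET_COLUMN.name` in schema.yaml); the real files may hold additional keys, and `build_model_trainer_config` is a hypothetical helper, not part of the project.

```python
from dataclasses import dataclass
from pathlib import Path
from types import SimpleNamespace as NS


@dataclass(frozen=True)
class ModelTrainerConfig:
    root_dir: Path
    train_data_path: Path
    test_data_path: Path
    model_name: str
    alpha: float
    l1_ratio: float
    target_column: str


# Assumed in-memory equivalents of read_yaml(...) for the three files
config = NS(
    artifacts_root="artifacts",
    model_trainer=NS(
        root_dir="artifacts/model_trainer",
        train_data_path="artifacts/data_transformation/train.csv",
        test_data_path="artifacts/data_transformation/test.csv",
        model_name="model.joblib",
    ),
)
params = NS(ElasticNet=NS(alpha=0.2, l1_ratio=0.1))
schema = NS(TARGET_COLUMN=NS(name="quality"))


def build_model_trainer_config(config, params, schema) -> ModelTrainerConfig:
    """Hypothetical helper: same field routing as get_model_trainer_config."""
    mt = config.model_trainer      # config.yaml -> paths and model file name
    en = params.ElasticNet         # params.yaml -> hyperparameters
    target = schema.TARGET_COLUMN  # schema.yaml -> target column
    return ModelTrainerConfig(
        root_dir=Path(mt.root_dir),
        train_data_path=Path(mt.train_data_path),
        test_data_path=Path(mt.test_data_path),
        model_name=mt.model_name,
        alpha=en.alpha,
        l1_ratio=en.l1_ratio,
        target_column=target.name,
    )


mtc = build_model_trainer_config(config, params, schema)
print(mtc)
```

Wrapping the paths in `Path(...)` is a choice made in this sketch so the values match their annotations; the original passes the YAML strings through unchanged.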

## 6. Update the components
* Corresponds to `src/mlProject/components/model_trainer.py` in the project, which uses
	- '4. entity': `src/mlProject/entity/config_entity.py` (the component takes a `ModelTrainerConfig` as input)

In [5]:
import pandas as pd
import os
from mlProject import logger
from sklearn.linear_model import ElasticNet
import joblib

In [None]:
class ModelTrainer:
    def __init__(self, config: ModelTrainerConfig):
        self.config = config

    def train(self):
        train_data = pd.read_csv(self.config.train_data_path) ## reads artifacts/data_transformation/train.csv
        test_data = pd.read_csv(self.config.test_data_path)   ## reads artifacts/data_transformation/test.csv

        train_x = train_data.drop([self.config.target_column], axis=1) ## train features (quality column dropped)
        test_x = test_data.drop([self.config.target_column], axis=1)   ## test features (unused here)
        train_y = train_data[[self.config.target_column]]              ## train target (quality column)
        test_y = test_data[[self.config.target_column]]                ## test target (unused here)

        lr = ElasticNet(alpha=self.config.alpha, l1_ratio=self.config.l1_ratio, random_state=42)
        lr.fit(train_x, train_y)
        ## save lr as artifacts/model_trainer/model.joblib
        joblib.dump(lr, os.path.join(self.config.root_dir, self.config.model_name))
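To see the train, save, and reload round trip in isolation, here is a self-contained sketch on a small synthetic DataFrame (made-up feature columns `x1` and `x2` plus the `quality` target), using the same `ElasticNet` hyperparameters as params.yaml and a temporary directory standing in for `artifacts/model_trainer`. It assumes `numpy`, `pandas`, `scikit-learn`, and `joblib` are installed, which the imports above already require.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet

TARGET = "quality"  # target column name, as in schema.yaml

# Made-up training data standing in for artifacts/data_transformation/train.csv
rng = np.random.default_rng(42)
n = 60
train_data = pd.DataFrame({"x1": rng.normal(10.0, 1.0, n), "x2": rng.normal(3.0, 0.5, n)})
train_data[TARGET] = (0.5 * train_data["x1"] + rng.normal(0.0, 0.2, n)).round()

# Same feature/target split as ModelTrainer.train
train_x = train_data.drop([TARGET], axis=1)
train_y = train_data[[TARGET]]

model = ElasticNet(alpha=0.2, l1_ratio=0.1, random_state=42)
model.fit(train_x, train_y)

# A temporary directory stands in for artifacts/model_trainer
with tempfile.TemporaryDirectory() as root_dir:
    model_path = os.path.join(root_dir, "model.joblib")
    joblib.dump(model, model_path)
    reloaded = joblib.load(model_path)  # loaded into memory before the directory is removed

same_predictions = np.allclose(model.predict(train_x), reloaded.predict(train_x))
print("reloaded model predicts identically:", same_predictions)
```

The reloaded estimator keeps its hyperparameters and fitted coefficients, which is what any later stage that loads `model.joblib` relies on.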

## 7. Update the pipeline 
* Corresponds to `src/mlProject/pipeline/stage_04_model_trainer.py` in the project, which needs to call
	- '5. configuration': `src/mlProject/config/configuration.py` (outputs a `ModelTrainerConfig`)
	- '6. components': `src/mlProject/components/model_trainer.py` (takes a `ModelTrainerConfig` as input)

In [None]:
try:
    config = ConfigurationManager()                                  ## instantiate the configuration manager; creates the main directory (artifacts)
    model_trainer_config = config.get_model_trainer_config()         ## build the config; creates the subdirectory and instantiates the entity
    model_trainer_config = ModelTrainer(config=model_trainer_config) ## instantiate the component
    model_trainer_config.train()                                     ## run the component: train the model and save it as a joblib file
except Exception as e:
    raise e

[2025-01-12 13:04:48,406: INFO: common: yaml file: config\config.yaml loaded successfully]
[2025-01-12 13:04:48,407: INFO: common: yaml file: params.yaml loaded successfully]
[2025-01-12 13:04:48,409: INFO: common: yaml file: schema.yaml loaded successfully]
[2025-01-12 13:04:48,409: INFO: common: created directory at: artifacts]
[2025-01-12 13:04:48,410: INFO: common: created directory at: artifacts/model_trainer]


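Rewriting it as follows should be clearer, since the component gets its own name instead of overwriting `model_trainer_config`: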
```python
try:
    ...  # unchanged lines from the cell above
    model_trainer = ModelTrainer(config=model_trainer_config)
    model_trainer.train()
except Exception as e:
    raise e
```
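As written, `except Exception as e: raise e` only re-raises the same exception, so the `try` block adds nothing by itself. A common variation logs the failure first and then re-raises with a bare `raise`, which keeps the original traceback. A stdlib-only sketch, where `run_stage` and `failing_train` are hypothetical stand-ins for the pipeline cell and the project's own `logger` is replaced by `logging.getLogger` with a format mirroring the log output above:

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s: %(levelname)s: %(module)s: %(message)s]",
)
logger = logging.getLogger("mlProject")  # stand-in for `from mlProject import logger`


def run_stage(stage_name, step):
    """Hypothetical wrapper: log start/end of a stage, log and re-raise failures."""
    logger.info(">>> stage %s started", stage_name)
    try:
        step()
    except Exception:
        logger.exception("stage %s failed", stage_name)  # logs the traceback at ERROR level
        raise  # bare raise re-raises the original exception unchanged
    logger.info(">>> stage %s completed", stage_name)


def failing_train():
    # Hypothetical failure: the data transformation stage has not been run yet
    raise FileNotFoundError("artifacts/data_transformation/train.csv")


try:
    run_stage("Model Trainer", failing_train)
except FileNotFoundError as err:
    caught = err
```

The caller still sees the original `FileNotFoundError`, but the log now records which stage failed.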