<div class="bar_title"></div>

*Enterprise AI*

# Assignment 2 - Machine Learning Pipeline with ZenML

Gunther Gust / Viet Nguyen<br>
Chair of Enterprise AI

Summer Semester 25

<img src="https://github.com/GuntherGust/tds2_data/blob/main/images/d3.png?raw=true" style="width:20%; float:left;" />

In this assignment, your goal is to modularize each part of the machine learning process using ZenML `step` and `pipeline`:
- Load, split and preprocess the data
- Train a model
- Evaluate a model

Please DO NOT remove or modify the cells with `assert` functions. They are meant to let you know that your functions are working correctly, and you are on the right track. In addition, you PASS the assignment ONLY IF **your code logic is correct** AND **you pass all the `assert` functions**. Good luck!

In [1]:
# run this cell to initialize a fresh zenml project
!rm -rf .zen
!zenml init

  import pkg_resources
[?25l[32m⠋[0m Initializing ZenML repository at /workspaces/assignment-2-solution.
[2K[1A[2K[32m⠙[0m Initializing ZenML repository at /workspaces/assignment-2-solution.
[2K[1A[2K[32m⠹[0m Initializing ZenML repository at /workspaces/assignment-2-solution.
[2K[1A[2K[32m⠸[0m Initializing ZenML repository at /workspaces/assignment-2-solution.
[2K[1A[2K[32m⠼[0m Initializing ZenML repository at /workspaces/assignment-2-solution.
[2K[1A[2K[32m⠴[0m Initializing ZenML repository at /workspaces/assignment-2-solution.
[1;35mSetting the repo active project to 'default'.[0m
[33mSetting the repo active stack to default.[0m
[2K[1A[2K[2;36mZenML repository initialized at [0m[2;35m/workspaces/[0m[2;95massignment-2-solution.[0m
[2;32m⠦[0m[2;36m Initializing ZenML repository at /workspaces/assignment-2-solution.[0m
[2K[1A[2K[32m⠦[0m Initializing ZenML repository at /workspaces/assignment-2-solution.

[1A[2K[1A[2K[2;36mThe local 

## 1. Data Inspection

The dataset contains daily weather observations of Perth, Australia. Each row represents the weather conditions for a given day. Our task is to predict whether it will rain tomorrow based on today's weather conditions (features).

In [2]:
# run this cell
import pandas as pd

(a) Load the dataset "weather.csv" in the `data` folder. Display the first 7 rows of the dataset:

In [3]:
data = pd.read_csv("data/weather.csv")
data.head(7)

Unnamed: 0,Date,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustDir,WindGustSpeed,WindDir9am,WindDir3pm,...,WindSpeed3pm,Humidity9am,Humidity3pm,Pressure9am,Pressure3pm,Cloud9am,Cloud3pm,Temp9am,Temp3pm,RainTomorrow
0,2008-07-01,2.7,18.8,0.0,0.8,9.1,ENE,20.0,,E,...,7.0,97.0,53.0,1027.6,1024.5,2.0,3.0,8.5,18.1,No
1,2008-07-02,6.4,20.7,0.0,1.8,7.0,NE,22.0,ESE,ENE,...,9.0,80.0,39.0,1024.1,1019.0,0.0,6.0,11.1,19.7,No
2,2008-07-03,6.5,19.9,0.4,2.2,7.3,NE,31.0,,WNW,...,4.0,84.0,71.0,1016.8,1015.6,1.0,3.0,12.1,17.7,Yes
3,2008-07-04,9.5,19.2,1.8,1.2,4.7,W,26.0,NNE,NNW,...,6.0,93.0,73.0,1019.3,1018.4,6.0,6.0,13.2,17.7,Yes
4,2008-07-05,9.5,16.4,1.8,1.4,4.9,WSW,44.0,W,SW,...,17.0,69.0,57.0,1020.4,1022.1,7.0,5.0,15.9,16.0,Yes
5,2008-07-06,0.7,15.9,6.8,2.4,9.3,NNE,24.0,ENE,NE,...,7.0,86.0,41.0,1032.0,1029.6,0.0,1.0,6.9,15.5,No
6,2008-07-07,0.7,18.3,0.0,0.8,9.3,N,37.0,NE,NNE,...,13.0,72.0,36.0,1028.9,1024.2,1.0,5.0,8.7,17.9,No


In [4]:
# run this cell
assert data.shape == (3193, 21)

(b) Summarize the missing values of individual columns:

In [5]:
data.isna().sum()

Date               0
MinTemp            0
MaxTemp            1
Rainfall           0
Evaporation        1
Sunshine           5
WindGustDir        5
WindGustSpeed      5
WindDir9am       134
WindDir3pm         7
WindSpeed9am       0
WindSpeed3pm       1
Humidity9am        9
Humidity3pm        8
Pressure9am        1
Pressure3pm        1
Cloud9am           2
Cloud3pm           4
Temp9am            0
Temp3pm            1
RainTomorrow       0
dtype: int64

(c) Which column has the most missing values? Answer in one line:

WindDir9am

## 2. ML Pipeline with ZenML

Now that we have a good understanding of our data, we can begin building our pipeline using ZenML. A ZenML pipeline consists of a series of modular **steps**, each representing a distinct stage in the machine learning workflow such as data loading, feature engineering, or model tuning. These steps are defined as functions that comply with the ZenML framework's specifications. Once all the individual steps are implemented, they can be assembled into a single Python function that defines the complete **pipeline**.

In [6]:
# run this cell
from zenml import pipeline, step
from typing_extensions import Annotated
from typing import Tuple
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder, MinMaxScaler


#### 2.1. Feature Engineering Steps

(a) Load the data from the csv file. Please use `pd.read_csv()` with parameter `index_col` as the `Date` column of the dataset. 

The `:param name:` is a description of the expected behavior of the parameter:

In [7]:
@step(enable_cache=False)
def loading_data(filename: str) -> Annotated[pd.DataFrame, "input_data"]:
    """ 
    Loads a CV File and transforms it to a Pandas DataFrame
    :param filename: the file name (including the path) of the dataset
    
    return pandas DataFrame of the dataset
    """
    data = pd.read_csv(filename, index_col="Date")
    return data

(b) Split the data set into train/test ratio of 7/3. Use `random_state=0`:

In [None]:
@step
def split_data(dataset:pd.DataFrame, label: str) -> Tuple[
    Annotated[pd.DataFrame, "X_train"],
    Annotated[pd.DataFrame, "X_test"],
    Annotated[pd.Series, "y_train"],
    Annotated[pd.Series, "y_test"]]:
    """
    Splits a dataset into training and testing sets.
    :param dataset: the pandas DataFrame loaded from the csv file
    :param label: the column of target variable. Example usage: y = dataset[label] will get the values of target variable

    return X_train, X_test, y_train, y_test of the dataset
    """
    X = dataset.drop(label, axis=1)
    Y = dataset[label]
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, shuffle=False)
    return X_train, X_test, y_train, y_test

(c) As observed earlier, our dataset contains missing values. To address this, we will define a pipeline step that handles imputation. Specifically, numerical features will be imputed using the `median strategy`, while categorical features will use the `most frequent value`. These imputation strategies should be applied consistently to both the training and test datasets. The step function will return the transformed train and test sets:

In [9]:
@step
def impute_missing_values(X_train:pd.DataFrame, X_test:pd.DataFrame) -> Tuple[Annotated[pd.DataFrame, "X_train_imputed"],Annotated[pd.DataFrame, "X_test_imputed"]]:
    """
    Imputes missing values in training and testing datasets.
    :param X_train: feature columns of the train set
    :param X_test: feature columns of the test set

    return train and test sets that have been properly imputed
    """
    categorical_imputer = SimpleImputer(strategy="most_frequent")
    numerical_imputer = SimpleImputer(strategy="median")
    categorical_columns = X_train.select_dtypes(include="object").columns
    numerical_columns = X_train.select_dtypes(exclude="object").columns

    X_train[numerical_columns] = pd.DataFrame(
    numerical_imputer.fit_transform(X_train[numerical_columns]),index=X_train.index, columns=numerical_columns
    )

    X_test[numerical_columns] = pd.DataFrame(
    numerical_imputer.transform(X_test[numerical_columns]),index=X_test.index, columns=numerical_columns
    )

    X_train[categorical_columns] = pd.DataFrame(
    categorical_imputer.fit_transform(X_train[categorical_columns]),index=X_train.index, columns=categorical_columns
    )

    X_test[categorical_columns] = pd.DataFrame(
    categorical_imputer.transform(X_test[categorical_columns]),index=X_test.index, columns=categorical_columns
    )
    return X_train, X_test 

(d) The next step in our feature engineering process is encoding the categorical variables. We'll define a pipeline step that applies `one-hot encoding` to all `categorical features` in both the training and test datasets. This transformation ensures that categorical values are converted into a numerical format suitable for machine learning models. The function should replace the original categorical columns with their corresponding one-hot encoded values and return the updated train and test sets:

In [10]:
@step
def encode_categorical_values(X_train:pd.DataFrame, X_test:pd.DataFrame) -> Tuple[Annotated[pd.DataFrame, "X_train_encoded"],Annotated[pd.DataFrame, "X_test_encoded"]]:
    """
    Encodes categorical columns in the training and testing datasets using one-hot encoding.
    :param X_train: feature columns of the train set
    :param X_test: feature columns of the test set

    return train and test sets that have been properly encoded
    """
    one_hot_encoder = OneHotEncoder(sparse_output=False)
    categorical_columns = X_train.select_dtypes(include="object").columns
    encoded_values_train = pd.DataFrame(one_hot_encoder.fit_transform(X_train[categorical_columns]),index=X_train.index,columns=one_hot_encoder.get_feature_names_out())

    encoded_values_test = pd.DataFrame(one_hot_encoder.transform(X_test[categorical_columns]),index=X_test.index,columns=one_hot_encoder.get_feature_names_out())
    print(encoded_values_test)
    X_train.drop(categorical_columns, axis=1, inplace=True)
    X_train = pd.concat([X_train, encoded_values_train], axis=1)

    X_test.drop(categorical_columns, axis=1, inplace=True)
    X_test = pd.concat([X_test, encoded_values_test], axis=1)
    return X_train, X_test

(e) The next feature engineering step in our pipeline is label encoding. Since our target variable, RainTomorrow, is represented as text values ('No' and 'Yes'), we will use `label encoding` to convert these into numerical format. Specifically, the encoder will map 'No' to 0 and 'Yes' to 1, making the target variable suitable for model training.


In [11]:
@step
def label_encoding(y_train:pd.Series, y_test:pd.Series) -> Tuple[Annotated[pd.Series, "y_train_encoded"], Annotated[pd.Series, "y_test_encoded"]]:
    """
    Applies label encoding to the target variable for both training and testing datasets.
    :param y_train: target column of the train set
    :param y_test: target column of the test set

    return train and test target columns that have been properly encoded
    """
    encoder = LabelEncoder()
    y_train = pd.Series(encoder.fit_transform(y_train))
    y_test = pd.Series(encoder.transform(y_test))
    return y_train, y_test

(f) The final feature engineering step in our pipeline is feature scaling. We will use a `MinMaxScaler` to normalize the feature values, scaling them to a range between 0 and 1. This ensures that all features contribute equally to the model's learning process. The scaling will be applied to both the training and test datasets, and the transformed results will be returned as DataFrames:

In [12]:
@step 
def scale_values(X_train:pd.DataFrame,X_test:pd.DataFrame) -> Tuple[Annotated[pd.DataFrame, "X_train_scaled"], Annotated[pd.DataFrame, "X_test_scaled"]]:
    """
    Scales numerical features to a range between 0 and 1 using MinMax scaling.
    :param X_train: feature columns of the train set
    :param X_test: feature columns of the test set

    return train and test sets that have been properly scaled 
    """
    scaler = MinMaxScaler()
    X_train = pd.DataFrame(scaler.fit_transform(X_train),index=X_train.index, columns=X_train.columns)
    X_test = pd.DataFrame(scaler.transform(X_test),index=X_test.index, columns=X_test.columns)

    return X_train, X_test

### 2.2. Modeling and Evaluation

Now that we've completed all necessary preprocessing steps to create a clean and usable dataset, we're ready to develop and evaluate our machine learning model. Before assembling and executing the full pipeline, we need to define two final steps. Let's try to fit this dataset with a `Logistic Regression` model:

In [13]:
# run this cell
from sklearn.linear_model import LogisticRegression
from sklearn.base import ClassifierMixin

(a) The first step is the model_trainer step. This step takes `X_train` (feature set) and `y_train` (corresponding labels) as input. Within the step, a machine learning model is instantiated and trained on the input data. Once training is complete, the fitted model is returned as an artifact, ready for evaluation and inference. Please use the the built-in function `.score()` of the `model` to compute the accuracy

In [14]:
@step
def model_trainer(X_train: pd.DataFrame, y_train: pd.Series)-> Tuple[Annotated[ClassifierMixin, "model"], Annotated[float, "in_sample_accuracy"]]:
    """
    Trains a logistic regression model using the provided training data and computes the in-sample accuracy.
    :param X_train: feature columns of the train set
    :param y_train: target column of the train set

    return a logistic regression model and in-sample accuracy (train accuracy)
    """
    model = LogisticRegression()
    model.fit(X_train, y_train)
    in_sample_score = model.score(X_train, y_train)
    return model, in_sample_score

(b) The final step in our pipeline is to evaluate the performance of the trained model. For this, we define the evaluate_model step. It takes three input arguments: the trained model returned by the `model_trainer` step, the preprocessed test features (`X_test`), and the corresponding test labels (`y_test`). Within this step, we calculate the model's accuracy on the test dataset. The resulting accuracy score is then returned as a performance metric. Please use the the built-in function `.score()` of the `model` to compute the accuracy:

In [15]:
@step
def evaluate_model(model:ClassifierMixin, X_test:pd.DataFrame, y_test:pd.DataFrame) -> Annotated[float, "accuracy"]:
    """
    Evaluates the accuracy of a trained model using the testing dataset.
    :param model: a trained model
    :param X_test: feature columns of the test set
    :param y_test: target column of the test set

    return out-of-sample accuracy (test accuracy)
    """
    score = model.score(X_test, y_test)
    return score

### 2.3. Pipeline

With all the necessary steps defined, we are now ready to assemble our pipeline. In ZenML, this is done by stacking the individual steps into a single function that represents the pipeline. To create this, we define a new function such as `training_pipeline()`, and annotate it with the `@pipeline` decorator. Please create a pipeline following this procedure:
1. Load data
2. Split data
3. Impute, encode and scale data
4. Encode targets
5. Train and evaluate the models


In [16]:
@pipeline
def training_pipeline():
    """
    Executes a full training pipeline on weather data to predict rain tomorrow.
    """
    dataset = loading_data("data/weather.csv")
    X_train, X_test,y_train,y_test = split_data(dataset, "RainTomorrow")
    X_train, X_test = impute_missing_values(X_train, X_test)
    X_train, X_test = encode_categorical_values(X_train, X_test)
    X_train, X_test = scale_values(X_train, X_test)
    y_train, y_test = label_encoding(y_train, y_test)
    model, in_sample_score = model_trainer(X_train, y_train)
    score = evaluate_model(model, X_test, y_test)

Let us execute our pipeline by calling our training_pipeline() function.

In [17]:
# run this cell
training_pipeline()

[1;35mInitiating a new run for the pipeline: [0m[1;36mtraining_pipeline[1;35m.[0m


  import pkg_resources


[33mIn a future release, the default Python package installer used by ZenML to build container images for your containerized pipelines will change from 'pip' to 'uv'. To maintain current behavior, you can explicitly set [0m[1;36mpython_package_installer=PythonPackageInstaller.PIP[33m in your DockerSettings.[0m
[1;35mUsing user: [0m[1;36mdefault[1;35m[0m
[1;35mUsing stack: [0m[1;36mdefault[1;35m[0m
[1;35m  orchestrator: [0m[1;36mdefault[1;35m[0m
[1;35m  artifact_store: [0m[1;36mdefault[1;35m[0m
[1;35mYou can visualize your pipeline runs in the [0m[1;36mZenML Dashboard[1;35m. In order to try it locally, please run [0m[1;36mzenml login --local[1;35m.[0m
[1;35mStep [0m[1;36mloading_data[1;35m has started.[0m
[loading_data] [33mBy default, the [0m[1;36mPandasMaterializer[33m stores data as a [0m[1;36m.csv[33m file. If you want to store data more efficiently, you can install [0m[1;36mpyarrow[33m by running '[0m[1;36mpip install pyarrow[33m'.

  df = pd.read_csv(f, index_col=0, parse_dates=True)


[1;35mStep [0m[1;36mmodel_trainer[1;35m has finished in [0m[1;36m0.503s[1;35m.[0m
[1;35mStep [0m[1;36mevaluate_model[1;35m has started.[0m
[evaluate_model] [33mBy default, the [0m[1;36mPandasMaterializer[33m stores data as a [0m[1;36m.csv[33m file. If you want to store data more efficiently, you can install [0m[1;36mpyarrow[33m by running '[0m[1;36mpip install pyarrow[33m'. This will allow [0m[1;36mPandasMaterializer[33m to automatically store the data as a [0m[1;36m.parquet[33m file instead.[0m
[evaluate_model] [33mBy default, the [0m[1;36mPandasMaterializer[33m stores data as a [0m[1;36m.csv[33m file. If you want to store data more efficiently, you can install [0m[1;36mpyarrow[33m by running '[0m[1;36mpip install pyarrow[33m'. This will allow [0m[1;36mPandasMaterializer[33m to automatically store the data as a [0m[1;36m.parquet[33m file instead.[0m


  df = pd.read_csv(f, index_col=0, parse_dates=True)


[1;35mStep [0m[1;36mevaluate_model[1;35m has finished in [0m[1;36m0.278s[1;35m.[0m
[1;35mPipeline run has finished in [0m[1;36m5.557s[1;35m.[0m


PipelineRunResponse(body=PipelineRunResponseBody(created=datetime.datetime(2025, 5, 29, 22, 47, 47, 700136), updated=datetime.datetime(2025, 5, 29, 22, 47, 53, 285850), user_id=UUID('e4b90bb5-714c-4daf-aed3-77b52adbf0dd'), project_id=UUID('36a9770f-837c-49ea-91e4-fea5baf0f818'), status=<ExecutionStatus.COMPLETED: 'completed'>, stack=StackResponse(body=StackResponseBody(created=datetime.datetime(2025, 5, 29, 19, 45, 26, 1397), updated=datetime.datetime(2025, 5, 29, 19, 45, 26, 1414), user_id=None), metadata=None, resources=None, id=UUID('7fed847a-0444-45d1-bade-678d2d045f31'), permission_denied=False, name='default'), pipeline=PipelineResponse(body=PipelineResponseBody(created=datetime.datetime(2025, 5, 29, 20, 12, 52, 587050), updated=datetime.datetime(2025, 5, 29, 20, 12, 52, 587067), user_id=UUID('e4b90bb5-714c-4daf-aed3-77b52adbf0dd'), project_id=UUID('36a9770f-837c-49ea-91e4-fea5baf0f818')), metadata=None, resources=None, id=UUID('c6f6c1f4-38a8-48d7-af59-98c365cc0d06'), permission_

### 2.4. Evaluating performance by loading artifacts

Let's now retrieve the artifacts from the pipeline for different purposes. We first create a client instance to interact with the ZenML backend:

In [18]:
# run this cell
from zenml.client import Client
client = Client()

(a) Retrieve the test accuracy artifact:

In [19]:
acc_artifact = client.get_artifact_version("accuracy")
# no need to modify this
test_acc_re = acc_artifact.load()

In [20]:
# run this cell
# if your pipeline is correct, you should obtain accuracy >= 90%
assert test_acc_re >= 0.9

(b) You can also retrieve the trained model by loading the artifact from the `model_trainer` step:

In [21]:
model_artifact = client.get_artifact_version("model")
# no need to modify this
model_re = model_artifact.load()

(c) Please retrieve:
- `X_test` artifact from `scale_values` step
- `y_test` artifact from `label_encoding` step

In [22]:
X_test_artifact = client.get_artifact_version("X_test_scaled")
y_test_artifact = client.get_artifact_version("y_test_encoded")
# no need to modify these, please ignore the warnings
X_test_re = X_test_artifact.load()
y_test_re = y_test_artifact.load()

[33mBy default, the [0m[1;36mPandasMaterializer[33m stores data as a [0m[1;36m.csv[33m file. If you want to store data more efficiently, you can install [0m[1;36mpyarrow[33m by running '[0m[1;36mpip install pyarrow[33m'. This will allow [0m[1;36mPandasMaterializer[33m to automatically store the data as a [0m[1;36m.parquet[33m file instead.[0m
[33mBy default, the [0m[1;36mPandasMaterializer[33m stores data as a [0m[1;36m.csv[33m file. If you want to store data more efficiently, you can install [0m[1;36mpyarrow[33m by running '[0m[1;36mpip install pyarrow[33m'. This will allow [0m[1;36mPandasMaterializer[33m to automatically store the data as a [0m[1;36m.parquet[33m file instead.[0m


  df = pd.read_csv(f, index_col=0, parse_dates=True)


(d) Use the retrieved model (`model_re`) to compute the accuracy using the retrieved data (`X_test_re`, `y_test_re`):

In [23]:
test_acc = model_re.score(X_test_re, y_test_re)

In [24]:
# run this cell, the retrieved accuracy and the newly computed one should be equal
# the reason is, we are using the same trained model to compute the accuracy
assert test_acc_re == test_acc

## 3. Evaluating Model Robustness via Artifact Resuse in ZenML

In this exercise, you will build upon the pipeline you created in Exercise 2 by adding a new step that **perturbs the test data with Gaussian (white) noise** before evaluating the omdel's performance. Instead of testing on the original test set (`X_test`), you will:

- Apply random noise to the numerical features of the test data, simulating real-world data imperfections or sensor noise.
- Evaluate the trained model on this noisy test set to observe how its performance change

### Why perturbing the test data?

Machine learning models often perform well on clean, well-prepared data but can be sensitive to small perturbations or noise in inputs. This sensitivity reflects a model’s robustness or generalization ability when facing slightly altered or imperfect data. For example, Gaussian noise (random variation with zero mean) can simulate measurement errors or environmental variations that frequently occur in real-world scenarios. 

More importantly, this sensitivity is related to a broader challenge known as adversarial robustness: small, carefully crafted perturbations, called [adversarial attacks](https://medium.com/@yashgaherwar2002/adversarial-machine-learning-attacks-preventions-640c5ffc2404), which can drastically change a model’s predictions despite being imperceptible to humans. While Gaussian  noise is random and unstructured, adversarial perturbations are deliberate and exploit model vulnerabilities. Studying how models perform under random noise is a first step toward understanding and improving their robustness against such adversarial manipulations and other real-world data shifts.

To briefly illustrate the concept of adversarial examples, consider the following image:


![panda](https://miro.medium.com/v2/resize:fit:720/format:webp/0*iGzOt4oCR74nMNgu)

In this example, an image of a panda is correctly classified by a machine learning model. However, by adding subtle perturbations (undetected by human eyes), the model misclassifies the image as a "gibbon" with high confidence. This phenomenon demonstrates the vulnerability of machine learning models to small, intentional changes in input data, which highlights the importance of evaluating model robustness. If you are interested, here is a good [resource](https://arxiv.org/pdf/1412.6572) to start with.

(a) Create a ZenML step to perturb the input data `X` with Gaussian noise using [numpy.random.normal](https://numpy.org/doc/2.1/reference/random/generated/numpy.random.normal.html).
- Hint 1: extract numeric columns using `X.select_dtypes()` to get numerical columns only. Please refer back to assignment 1 and tutorial 2.
- Hint 2: create `noise` variable with the following parameters:
    - `loc=0.0`
    - `scale=0.5`
    - `size=X[extracted_numerical_columns].shape`
- Hint 3: Add this noise into X using `X.loc[:, extracted_numerical_columns] += noise`

In [25]:
import numpy as np

@step
def perturb_data(X: pd.DataFrame) -> Annotated[pd.DataFrame, "perturbed_x"] :
    """Applies Gaussian noise to numerical features of X."""
    # Select only numeric columns
    numeric_cols = X.select_dtypes(include=[np.number]).columns

    # Add noise only to numeric columns
    noise = np.random.normal(0, 0.5, size=X[numeric_cols].shape)
    X.loc[:, numeric_cols] += noise
    return X

(b) Create a new pipeline with the following steps:
1. Load data
2. Split data
3. Impute, encode and scale data
4. Encode targets
5. Train the model using `X_train`, `y_train`
6. Perturb `X_test`
7. Evaluate the model using `perturbed_X_test`, `y_test`

Hint: re-use the steps from exercise 2 and combine it with 3(a)

In [26]:
@pipeline
def robustness_evaluation_pipeline():
    dataset = loading_data("data/weather.csv")
    X_train, X_test, y_train, y_test = split_data(dataset, "RainTomorrow")
    X_train, X_test = impute_missing_values(X_train, X_test)
    X_train, X_test = encode_categorical_values(X_train, X_test)
    X_train, X_test = scale_values(X_train, X_test)
    y_train, y_test = label_encoding(y_train, y_test)
    perturbed_X_test = perturb_data(X_test)
    model, in_sample_score = model_trainer(X_train, y_train)
    score = evaluate_model(model, perturbed_X_test, y_test)

In [27]:
robustness_evaluation_pipeline()

[1;35mInitiating a new run for the pipeline: [0m[1;36mrobustness_evaluation_pipeline[1;35m.[0m
[1;35mUsing user: [0m[1;36mdefault[1;35m[0m
[1;35mUsing stack: [0m[1;36mdefault[1;35m[0m
[1;35m  orchestrator: [0m[1;36mdefault[1;35m[0m
[1;35m  artifact_store: [0m[1;36mdefault[1;35m[0m
[1;35mYou can visualize your pipeline runs in the [0m[1;36mZenML Dashboard[1;35m. In order to try it locally, please run [0m[1;36mzenml login --local[1;35m.[0m
[1;35mStep [0m[1;36mloading_data[1;35m has started.[0m
[loading_data] [33mBy default, the [0m[1;36mPandasMaterializer[33m stores data as a [0m[1;36m.csv[33m file. If you want to store data more efficiently, you can install [0m[1;36mpyarrow[33m by running '[0m[1;36mpip install pyarrow[33m'. This will allow [0m[1;36mPandasMaterializer[33m to automatically store the data as a [0m[1;36m.parquet[33m file instead.[0m
[1;35mStep [0m[1;36mloading_data[1;35m has finished in [0m[1;36m0.154s[1;35m.

  df = pd.read_csv(f, index_col=0, parse_dates=True)


[1;35mStep [0m[1;36mmodel_trainer[1;35m has finished in [0m[1;36m0.555s[1;35m.[0m
[1;35mStep [0m[1;36mperturb_data[1;35m has started.[0m
[perturb_data] [33mBy default, the [0m[1;36mPandasMaterializer[33m stores data as a [0m[1;36m.csv[33m file. If you want to store data more efficiently, you can install [0m[1;36mpyarrow[33m by running '[0m[1;36mpip install pyarrow[33m'. This will allow [0m[1;36mPandasMaterializer[33m to automatically store the data as a [0m[1;36m.parquet[33m file instead.[0m
[perturb_data] [33mBy default, the [0m[1;36mPandasMaterializer[33m stores data as a [0m[1;36m.csv[33m file. If you want to store data more efficiently, you can install [0m[1;36mpyarrow[33m by running '[0m[1;36mpip install pyarrow[33m'. This will allow [0m[1;36mPandasMaterializer[33m to automatically store the data as a [0m[1;36m.parquet[33m file instead.[0m
[1;35mStep [0m[1;36mperturb_data[1;35m has finished in [0m[1;36m0.226s[1;35m.[0m


  df = pd.read_csv(f, index_col=0, parse_dates=True)


[1;35mStep [0m[1;36mevaluate_model[1;35m has finished in [0m[1;36m0.241s[1;35m.[0m
[1;35mPipeline run has finished in [0m[1;36m6.164s[1;35m.[0m


PipelineRunResponse(body=PipelineRunResponseBody(created=datetime.datetime(2025, 5, 29, 22, 47, 54, 788994), updated=datetime.datetime(2025, 5, 29, 22, 48, 0, 959403), user_id=UUID('e4b90bb5-714c-4daf-aed3-77b52adbf0dd'), project_id=UUID('36a9770f-837c-49ea-91e4-fea5baf0f818'), status=<ExecutionStatus.COMPLETED: 'completed'>, stack=StackResponse(body=StackResponseBody(created=datetime.datetime(2025, 5, 29, 19, 45, 26, 1397), updated=datetime.datetime(2025, 5, 29, 19, 45, 26, 1414), user_id=None), metadata=None, resources=None, id=UUID('7fed847a-0444-45d1-bade-678d2d045f31'), permission_denied=False, name='default'), pipeline=PipelineResponse(body=PipelineResponseBody(created=datetime.datetime(2025, 5, 29, 22, 21, 54, 301866), updated=datetime.datetime(2025, 5, 29, 22, 21, 54, 301883), user_id=UUID('e4b90bb5-714c-4daf-aed3-77b52adbf0dd'), project_id=UUID('36a9770f-837c-49ea-91e4-fea5baf0f818')), metadata=None, resources=None, id=UUID('3444efd0-6507-4f7b-aa1a-a887086de239'), permission_d

(c) Let's use the client from `exercise 2.4` to load the `accuracy` artifact again:

In [28]:
perturbed_acc_artifact = client.get_artifact_version("accuracy")
# no need to modify this
perturbed_test_acc = perturbed_acc_artifact.load()
perturbed_test_acc

0.7390396659707724

You would notice that the performance is now worse than that of the original test set. If you increase the `scale` parameter of `numpy.random.normal`, you will notice that the performance drops further.

In [29]:
# run this cell
assert perturbed_test_acc < test_acc_re