Continuous integration and continuous deployment (CI/CD) have long been essential in traditional software development, and they are becoming equally important for Artificial Intelligence (AI) systems. Adapting CI/CD to AI applications makes it possible to build automated pipelines that regularly update, retrain, and deploy models, helping to maintain performance and accuracy over time. This approach also fosters collaboration between engineering and data science teams by streamlining workflows that span data collection, model training, evaluation, and deployment.

### Understanding CI/CD for AI
CI/CD is a DevOps practice that automates building, testing, and deploying software updates so that code moves quickly from development to production. In AI systems, the pipeline starts with data preparation, model training, and testing, and ends with continuous integration and deployment of the model to production environments.

The traditional CI/CD flow in software development starts with requirements gathering and design, moves through coding, building, and testing, and finally transitions into continuous deployment where updates are rolled out, monitored, and iterated upon. Similarly, in AI, the process involves preparing data, training models, testing their performance, and deploying them in real-world scenarios. This process is further enhanced by a feedback loop that continuously monitors AI models, ensuring they are up-to-date and effective in making accurate predictions.

### CI/CD for AI Systems
CI/CD for AI, a core practice within what is commonly called **MLOps**, streamlines the creation, deployment, and management of AI models. It automates much of the AI lifecycle, reducing the need for constant manual intervention by data scientists and engineers. Once a model is integrated into a CI/CD pipeline, it can be retrained and redeployed automatically as new data becomes available, which reduces human error and helps sustain model performance over time.

This continuous feedback loop allows the AI system to evolve with changing data, ensuring that the model remains relevant and accurate. Regular monitoring and retraining are key components of this process, ensuring that AI models are consistently optimized based on real-world performance metrics.
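
As a rough illustration of such a feedback loop, the sketch below tracks accuracy over a sliding window of labelled production predictions and flags the model for retraining when accuracy falls below a floor. It is not part of the original notebook: the class name, the window size, and the accuracy floor are all assumptions chosen for the example.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Track accuracy over the most recent labelled predictions."""

    def __init__(self, window_size: int = 100, min_accuracy: float = 0.8):
        self.outcomes = deque(maxlen=window_size)  # True = correct prediction
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual) -> None:
        self.outcomes.append(predicted == actual)

    def accuracy(self) -> float:
        if not self.outcomes:
            return 1.0  # No evidence yet, so do not raise an alarm
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self) -> bool:
        # Only decide once the window is full, to avoid reacting to noise
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.min_accuracy


# Toy stream of (predicted, actual) churn labels
monitor = RollingAccuracyMonitor(window_size=4, min_accuracy=0.75)
for predicted, actual in [("Yes", "Yes"), ("No", "Yes"), ("No", "No"), ("No", "Yes")]:
    monitor.record(predicted, actual)

print(monitor.accuracy(), monitor.needs_retraining())
```

In a real pipeline the retraining flag would typically trigger a training job rather than a print statement, and ground-truth labels often arrive with a delay, so the window reflects past rather than current performance.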

### Why CI/CD is Important for AI
The value of CI/CD in AI lies in its scalability. For small-scale AI projects, manual updates and deployments may suffice. However, for organizations building large-scale AI systems, where hundreds or even thousands of model versions may need to be tested and deployed, a more robust and automated approach is essential. CI/CD provides the infrastructure needed to support these operations, enabling teams to manage the complexity of AI model development at scale while minimizing technical debt.

By integrating CI/CD into the AI workflow, the entire pipeline—from data ingestion to model deployment—becomes more streamlined and scalable. CI/CD enables AI models to be automatically trained, tested, and deployed, allowing for faster experimentation and iteration. With proper infrastructure support, including cloud-based or on-premise compute resources, AI systems can continuously evolve with minimal manual oversight.

### Building a CI/CD Pipeline for AI
An AI pipeline using CI/CD starts with data collection and validation, ensuring the quality and integrity of the data used for model training. Next, various algorithms are applied to train the model, followed by rigorous testing to evaluate the model’s performance. Once the model passes these tests, it is deployed in production environments, where it begins making predictions or performing other tasks.
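
The stage ordering described above can be sketched as a list of functions that run in sequence and stop at the first failure, so that deployment is reached only when validation and evaluation pass. This is a toy illustration under stated assumptions: the stage names, the `min_accuracy` key, and the majority-label "model" are hypothetical placeholders, not the notebook's pipeline.

```python
from typing import Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Dict], bool]]


def run_pipeline(stages: List[Stage], context: Dict) -> List[str]:
    """Run stages in order; stop at the first one that reports failure."""
    completed = []
    for name, stage in stages:
        if not stage(context):
            break
        completed.append(name)
    return completed


def validate_data(ctx: Dict) -> bool:
    # Reject the batch if any row is missing its label
    return all(row.get("label") is not None for row in ctx["rows"])


def train_model(ctx: Dict) -> bool:
    # Placeholder "model": always predict the majority label
    labels = [row["label"] for row in ctx["rows"]]
    ctx["model"] = max(set(labels), key=labels.count)
    return True


def evaluate_model(ctx: Dict) -> bool:
    # Scored on the training rows only to keep the sketch short;
    # a real pipeline would use held-out data
    labels = [row["label"] for row in ctx["rows"]]
    ctx["accuracy"] = labels.count(ctx["model"]) / len(labels)
    return ctx["accuracy"] >= ctx["min_accuracy"]


def deploy_model(ctx: Dict) -> bool:
    ctx["deployed"] = True
    return True


stages = [
    ("validate", validate_data),
    ("train", train_model),
    ("evaluate", evaluate_model),
    ("deploy", deploy_model),
]
rows = [{"label": "No"}, {"label": "No"}, {"label": "Yes"}, {"label": "No"}]

passing = run_pipeline(stages, {"rows": rows, "min_accuracy": 0.7})
blocked = run_pipeline(stages, {"rows": rows, "min_accuracy": 0.9})
print(passing)  # all four stages complete
print(blocked)  # evaluation fails, so nothing is deployed
```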

A key aspect of CI/CD in AI is the continuous monitoring of model performance in production. Since AI models degrade over time as new data patterns emerge (a phenomenon known as **model drift** or **model decay**), the pipeline must automatically detect when retraining is necessary. This feedback loop is critical to ensuring that the AI model stays relevant and continues delivering accurate results.
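
One common way to detect a shift in input data is the population stability index (PSI), which compares how a feature is distributed at training time with how it is distributed in production. The sketch below is an illustration and not part of the original notebook: the function names, the ten-bin default, and the 0.25 alert threshold (a widely quoted rule of thumb, not a universal standard) are assumptions.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def quantile_edges(reference: Sequence[float], n_bins: int) -> List[float]:
    """Interior bin edges taken at evenly spaced ranks of the sorted reference data."""
    ordered = sorted(reference)
    return [ordered[(len(ordered) * i) // n_bins] for i in range(1, n_bins)]


def bin_fractions(values: Sequence[float], edges: List[float], eps: float = 1e-6) -> List[float]:
    """Fraction of values falling in each bin, floored at eps."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    # eps avoids log(0) and division by zero for empty bins
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(reference, current, n_bins: int = 10) -> float:
    """PSI = sum over bins of (current - reference) * ln(current / reference)."""
    edges = quantile_edges(reference, n_bins)
    ref = bin_fractions(reference, edges)
    cur = bin_fractions(current, edges)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


reference = [float(v) for v in range(100)]  # a feature as seen at training time
shifted = [v + 50.0 for v in reference]     # the same feature in production, shifted upward

psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
needs_retraining = psi_shifted > 0.25  # Assumed alert threshold
print(psi_same, round(psi_shifted, 2), needs_retraining)
```

Comparing the reference data with itself gives a PSI of zero, while the shifted sample scores well above the assumed threshold. PSI only measures change in the inputs, so it complements rather than replaces monitoring of prediction quality.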

### Automating the AI Pipeline with CI/CD
In AI, CI/CD pipelines must incorporate statistical testing, anomaly detection, and continuous monitoring to ensure data integrity and model accuracy. By continuously optimizing AI pipelines, organizations gain a competitive edge, ensuring that models in production are always performing at their best.
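
As a minimal sketch of anomaly detection on incoming data, the example below learns acceptable bounds for a numeric feature from reference data and rejects a batch when too many of its values fall outside them. The z-score rule, the function names, and the 5% tolerance are assumptions made for illustration; production pipelines often use more robust statistics.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def fit_bounds(reference: Sequence[float], z: float = 3.0) -> Tuple[float, float]:
    """Return (low, high) bounds at z standard deviations around the reference mean."""
    mu, sigma = mean(reference), stdev(reference)
    return mu - z * sigma, mu + z * sigma


def find_anomalies(batch: Sequence[float], bounds: Tuple[float, float]) -> List[int]:
    """Indices of values outside the bounds (NaN values are flagged as well)."""
    low, high = bounds
    return [i for i, v in enumerate(batch) if not (low <= v <= high)]


def batch_is_acceptable(batch, bounds, max_anomaly_rate: float = 0.05) -> bool:
    """Accept the batch only if the share of anomalous values is small enough."""
    return len(find_anomalies(batch, bounds)) / len(batch) <= max_anomaly_rate


# Reference values for a numeric feature such as a monthly charge
reference = [20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0, 55.0, 60.0, 65.0]
bounds = fit_bounds(reference)

batch = [30.0, 42.0, 500.0, 58.0]  # 500.0 is far outside the reference range
print(find_anomalies(batch, bounds), batch_is_acceptable(batch, bounds))
```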

Building a fully automated CI/CD pipeline for AI also reduces the need for frequent manual intervention from data scientists and engineers. This allows for the establishment of production-ready AI systems that can automatically adapt to new data and changing conditions. By automating model retraining and deployment, AI pipelines can scale effectively without sacrificing reliability or accuracy.

### Conclusion
CI/CD for AI systems is crucial for scaling AI models and ensuring their continued effectiveness in production. By automating the entire AI lifecycle, from data collection to model deployment and retraining, CI/CD enables AI systems to evolve continuously with minimal manual oversight. With proper infrastructure and monitoring tools, AI pipelines can operate efficiently, helping models remain accurate, compliant with regulations, and ready for real-world applications. As AI systems grow in complexity, the role of CI/CD will only become more important in maintaining the performance and reliability of these models.



In [None]:
# Install pipenv if not already installed
!pip install pipenv

# Initialize pipenv and install dependencies
!pipenv install pandas numpy scikit-learn giskard


Collecting pipenv
  Downloading pipenv-2024.0.2-py3-none-any.whl.metadata (19 kB)
Collecting virtualenv>=20.24.2 (from pipenv)
  Downloading virtualenv-20.26.5-py3-none-any.whl.metadata (4.5 kB)
Collecting distlib<1,>=0.3.7 (from virtualenv>=20.24.2->pipenv)
  Downloading distlib-0.3.8-py2.py3-none-any.whl.metadata (5.1 kB)
Downloading pipenv-2024.0.2-py3-none-any.whl (3.3 MB)
Downloading virtualenv-20.26.5-py3-none-any.whl (6.0 MB)
Downloading distlib-0.3.8-py2.py3-none-any.whl (468 kB)
Installing collected packages: distlib, virtualenv, pipenv
Successfully installed distlib-0.3.8 pipenv-2024.0.2 virtualenv-20.26.5

In [None]:
# Install or upgrade Giskard in the environment used by the notebook kernel
%pip install "giskard>=2.0.0b" -U -q



In [None]:
import numpy as np
import pandas as pd
from giskard import Dataset
from sklearn.model_selection import train_test_split

# Load the dataset
DATASET_URL = 'https://raw.githubusercontent.com/Giskard-AI/examples/main/datasets/WA_Fn-UseC_-Telco-Customer-Churn.csv'
churn_df = pd.read_csv(DATASET_URL)

# Pre-process the dataset
CATEGORICAL_COLUMNS = ['gender', 'SeniorCitizen', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod']
NUMERIC_COLUMNS = ['tenure', 'MonthlyCharges', 'TotalCharges']

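def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Work on a copy so the caller's DataFrame is not modified in place
    df = df.copy()
    # Non-numeric TotalCharges entries become NaN and are dropped on the next line
    df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
    df = df.dropna()
    df = df.drop('customerID', axis=1)
    # Shorten payment method labels, e.g. "Bank transfer (automatic)" -> "Bank transfer"
    df['PaymentMethod'] = df['PaymentMethod'].str.replace(' (automatic)', '', regex=False)
    df[CATEGORICAL_COLUMNS + ['Churn']] = df[CATEGORICAL_COLUMNS + ['Churn']].astype('object')
    return df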

churn_df = preprocess(churn_df)

# Train-validation-test split (70% / 15% / 15%)
X_train, X_valid, y_train, y_valid = train_test_split(churn_df.drop('Churn', axis=1), churn_df.Churn, test_size=0.3, random_state=42)
X_valid, X_test, y_valid, y_test = train_test_split(X_valid, y_valid, test_size=0.5, random_state=42)

# Wrap the dataset with Giskard
raw_data = pd.concat([X_valid, y_valid], axis=1)
wrapped_data = Dataset(
    df = raw_data,  # A pandas.DataFrame that contains the raw data (before all the pre-processing steps) and the actual ground truth variable
    target = 'Churn',  # Ground truth variable
    name = "Churn classification dataset",  # Optional
    cat_columns = CATEGORICAL_COLUMNS  # List of categorical columns. Optional but strongly recommended; inferred automatically if omitted.
)

INFO:giskard.datasets.base:Your 'pandas.DataFrame' is successfully wrapped by Giskard's 'Dataset' wrapper class.


In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Pre-process steps for the pipeline
preprocessor = ColumnTransformer(transformers=[
    ('num', StandardScaler(), NUMERIC_COLUMNS),
    ('cat', OneHotEncoder(handle_unknown='ignore', drop='first'), CATEGORICAL_COLUMNS)
])

# Define the pipeline
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression())
])

# Train the model
pipeline.fit(X_train, y_train)

In [None]:
from giskard import Model, scan

# Define the prediction function
def prediction_function(df: pd.DataFrame) -> np.ndarray:
    # The pre-processor can be a pipeline of one-hot encoding, imputer, scaler, etc. OR
    # Perform the pre-processing steps manually here
    return pipeline.predict_proba(df)

# Wrap the model with Giskard
wrapped_model = Model(
    model = prediction_function,                # A prediction function that encapsulates all the data pre-processing steps and that could be executed with the dataset used by the scan.
    model_type = "classification",              # Either regression, classification or text_generation.
    classification_labels = pipeline.classes_,  # Their order MUST be identical to the prediction_function's output order
    name = "Churn classification",              # Name of the wrapped model [Optional]
    feature_names = X_valid.columns.to_list(),  # Default: all columns of your dataset [Optional]
    classification_threshold = 0.5,             # Default: 0.5 [Optional]
)

INFO:giskard.models.automodel:Your 'prediction_function' is successfully wrapped by Giskard's 'PredictionFunctionModel' wrapper class.


In [None]:

from giskard import scan

# Scan the model
scan_results = scan(wrapped_model, wrapped_data)

# Display the results in the notebook
display(scan_results)

INFO:giskard.datasets.base:Casting dataframe columns from {'gender': 'object', 'SeniorCitizen': 'object', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'} to {'gender': 'object', 'SeniorCitizen': 'object', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCha

Your model is successfully validated.
🔎 Running scan…
Running detector SpuriousCorrelationDetector…


INFO:giskard.scanner.logger:SpuriousCorrelationDetector: `tenure` < 5.500 AND `tenure` >= 2.500	Association = 0.030
INFO:giskard.scanner.logger:SpuriousCorrelationDetector: `tenure` < 1.500	Association = 0.040
INFO:giskard.scanner.logger:SpuriousCorrelationDetector: `tenure` >= 53.500	Association = 0.159
INFO:giskard.scanner.logger:SpuriousCorrelationDetector: `TotalCharges` < 49.975	Association = 0.002
INFO:giskard.scanner.logger:SpuriousCorrelationDetector: `TotalCharges` >= 4883.700	Association = 0.085
INFO:giskard.scanner.logger:SpuriousCorrelationDetector: `TotalCharges` < 180.475 AND `TotalCharges` >= 80.900	Association = 0.005
INFO:giskard.scanner.logger:SpuriousCorrelationDetector: `TotalCharges` >= 3281.275 AND `TotalCharges` < 4883.700	Association = 0.015
INFO:giskard.scanner.logger:SpuriousCorrelationDetector: `TotalCharges` < 825.750 AND `TotalCharges` >= 603.800	Association = 0.003
INFO:giskard.scanner.logger:SpuriousCorrelationDetector: `MonthlyCharges` < 24.325	Associati

SpuriousCorrelationDetector: 0 issue detected. (Took 0:00:01.525765)
Running detector PerformanceBiasDetector…


INFO:giskard.scanner.logger:PerformanceBiasDetector: Loss calculated (took 0:00:02.026287)
INFO:giskard.scanner.logger:PerformanceBiasDetector: Finding data slices
INFO:giskard.scanner.logger:PerformanceBiasDetector: 51 slices found (took 0:00:01.937303)
INFO:giskard.scanner.logger:PerformanceBiasDetector: Analyzing issues

PerformanceBiasDetector: 15 issues detected. (Took 0:00:11.694657)
Running detector DataLeakageDetector…



DataLeakageDetector: 0 issue detected. (Took 0:00:06.207372)
Running detector OverconfidenceDetector…


INFO:giskard.scanner.logger:OverconfidenceDetector: 27 slices found (took 0:00:01.835561)
INFO:giskard.scanner.logger:OverconfidenceDetector: Analyzing issues
INFO:giskard.datasets.base:Your 'pandas.DataFrame' is successfully wrapped by Giskard's 'Dataset' wrapper class.
INFO:giskard.scanner.logger:OverconfidenceDetector: Using overconfidence threshold = 0.5

OverconfidenceDetector: 6 issues detected. (Took 0:00:03.146652)
Running detector UnderconfidenceDetector…


INFO:giskard.scanner.logger:UnderconfidenceDetector: 56 slices found (took 0:00:04.439201)
INFO:giskard.scanner.logger:UnderconfidenceDetector: Analyzing issues
INFO:giskard.datasets.base:Your 'pandas.DataFrame' is successfully wrapped by Giskard's 'Dataset' wrapper class.
INFO:giskard.scanner.logger:UnderconfidenceDetector: Testing slice `Dependents` == "No"	Underconfidence rate (slice) = 0.028 (global 0.023) Δm = 0.226

UnderconfidenceDetector: 4 issues detected. (Took 0:00:05.259773)
Running detector StochasticityDetector…
StochasticityDetector: 0 issue detected. (Took 0:00:00.051666)
Running detector EthicalBiasDetector…
EthicalBiasDetector: 0 issue detected. (Took 0:00:00.003023)
Running detector TextPerturbationDetector…
TextPerturbationDetector: 0 issue detected. (Took 0:00:00.003296)
Scan completed: 25 issues found. (Took 0:00:27.918871)


In [None]:

# Save the results to an HTML file
scan_results.to_html("scan_results.html")

# Save the results to a dataframe
results_df = scan_results.to_dataframe()

In [None]:
# Create a test suite from the scan results
test_suite = scan_results.generate_test_suite(name="My first test suite")

# You can run the test suite locally to verify that it reproduces the issues
test_suite.run()


Executed 'Recall on data slice “`Contract` == "One year"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec9005210>, 'threshold': 0.5163763066202091}: 
               Test failed
               Metric: 0.0
               
               
Executed 'Recall on data slice “`tenure` >= 44.500 AND `tenure` < 70.500”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec8540760>, 'threshold': 0.5163763066202091}: 
               Test failed
               Metric: 0.06
               
               
Executed 'Recall on data slice “`PaymentMethod` == "Bank transfer"”' with arguments {'model': <


Executed 'Recall on data slice “`InternetService` == "DSL"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec8ff2fb0>, 'threshold': 0.5163763066202091}: 
               Test failed
               Metric: 0.24
               
               
Executed 'Recall on data slice “`TechSupport` == "Yes"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec9004940>, 'threshold': 0.5163763066202091}: 
               Test failed
               Metric: 0.26
               
               
Executed 'Recall on data slice “`PaymentMethod` == "Credit card"”' with arguments {'model': <giskard.models.fu


Executed 'Recall on data slice “`PaperlessBilling` == "No"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec90054b0>, 'threshold': 0.5163763066202091}: 
               Test failed
               Metric: 0.35
               
               
Executed 'Recall on data slice “`PaymentMethod` == "Mailed check"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec9005990>, 'threshold': 0.5163763066202091}: 
               Test failed
               Metric: 0.35
               
               


INFO:giskard.utils.logging_utils:Predicted dataset with shape (371, 20) executed in 0:00:00.213856

Executed 'Overconfidence on data slice “`OnlineBackup` == "Yes"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec9132860>, 'threshold': 0.2880952380952381, 'p_threshold': 0.5}: 
               Test failed
               Metric: 0.36
               
               
Executed 'Overconfidence on data slice “`PaperlessBilling` == "No"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec852b430>, 'threshold': 0.2880952380952381, 'p_threshold': 0.5}: 
               Test failed
               Metric: 0.35
               
               
Executed 'Overconfidence on data slice “`MultipleLine


Executed 'Overconfidence on data slice “`InternetService` == "DSL"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec9132530>, 'threshold': 0.2880952380952381, 'p_threshold': 0.5}: 
               Test failed
               Metric: 0.3
               
               
Executed 'Overconfidence on data slice “`gender` == "Female"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec8ec14e0>, 'threshold': 0.2880952380952381, 'p_threshold': 0.5}: 
               Test failed
               Metric: 0.29
               
               
Executed 'Underconfidence on data slice “`Contract` == "M


Executed 'Underconfidence on data slice “`TechSupport` == "No"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec908cc70>, 'threshold': 0.025023696682464455, 'p_threshold': 0.95}: 
               Test failed
               Metric: 0.04
               
               
Executed 'Underconfidence on data slice “`Dependents` == "No"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec9060bb0>, 'threshold': 0.025023696682464455, 'p_threshold': 0.95}: 
               Test failed
               Metric: 0.03
               
               


In [None]:
from giskard import testing

# Add a test to the test suite
test_suite = test_suite.add_test(testing.test_accuracy(wrapped_model, wrapped_data, threshold=0.80))

# Run the test suite
test_suite_results = test_suite.run()


Executed 'Recall on data slice “`Contract` == "One year"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec9005210>, 'threshold': 0.5163763066202091}: 
               Test failed
               Metric: 0.0
               
               
Executed 'Recall on data slice “`tenure` >= 44.500 AND `tenure` < 70.500”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec8540760>, 'threshold': 0.5163763066202091}: 
               Test failed
               Metric: 0.06
               
               
Executed 'Recall on data slice “`PaymentMethod` == "Bank transfer"”' with arguments {'model': <

Executed 'Recall on data slice “`TotalCharges` >= 3485.375”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec8fbeb30>, 'threshold': 0.5163763066202091}: 
               Test failed
               Metric: 0.17
               
               
Executed 'Recall on data slice “`OnlineSecurity` == "Yes"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec8ff3970>, 'threshold': 0.5163763066202091}: 
               Test failed
               Metric: 0.2
               
               
Executed 'Recall on data slice “`InternetService` == "DSL"”' with arguments {'model': <giskard.models.functi

Executed 'Recall on data slice “`Dependents` == "Yes"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec8ff2230>, 'threshold': 0.5163763066202091}: 
               Test failed
               Metric: 0.33
               
               
Executed 'Recall on data slice “`PaperlessBilling` == "No"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec90054b0>, 'threshold': 0.5163763066202091}: 
               Test failed
               Metric: 0.35
               
               
Executed 'Recall on data slice “`PaymentMethod` == "Mailed check"”' with arguments {'model': <giskard.models.fu

Executed 'Overconfidence on data slice “`MultipleLines` == "Yes"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec9130040>, 'threshold': 0.2880952380952381, 'p_threshold': 0.5}: 
               Test failed
               Metric: 0.31
               
               
Executed 'Overconfidence on data slice “`Partner` == "Yes"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec84ba740>, 'threshold': 0.2880952380952381, 'p_threshold': 0.5}: 
               Test failed
               Metric: 0.3
               
               
Executed 'Overconfidence on data slice “`InternetService` == 

Executed 'Underconfidence on data slice “`OnlineSecurity` == "No"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec90626b0>, 'threshold': 0.025023696682464455, 'p_threshold': 0.95}: 
               Test failed
               Metric: 0.04
               
               
Executed 'Underconfidence on data slice “`TechSupport` == "No"”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec908cc70>, 'threshold': 0.025023696682464455, 'p_threshold': 0.95}: 
               Test failed
               Metric: 0.04
               
               
Executed 'Underconfidence on data slice “`Depende

INFO:giskard.core.suite:Recall on data slice “`Dependents` == "Yes"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec8ff2230>, 'threshold': 0.5163763066202091}): {failed, metric=0.3269230769230769}
INFO:giskard.core.suite:Recall on data slice “`PaperlessBilling` == "No"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec90054b0>, 'threshold': 0.5163763066202091}): {failed, metric=0.34782608695652173}
INFO:giskard.core.suite:Recall on data slice “`PaymentMethod` == "Mailed check"” ({'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset obje
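
The log lines above report a recall metric per data slice, compared against a threshold. As a rough illustration of what such a check computes (this is not Giskard's implementation), here is a minimal sketch in plain Python. The rows, predictions, and the helper name `recall_on_slice` are hypothetical; only the threshold value is copied from the log, and the pass rule (metric >= threshold) is an assumption.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def recall_on_slice(
    rows: List[Row],
    predictions: List[str],
    slice_fn: Callable[[Row], bool],
    target: str = "Churn",
    positive: str = "Yes",
) -> float:
    """Recall = TP / (TP + FN), computed only on rows selected by slice_fn."""
    true_pos = false_neg = 0
    for row, pred in zip(rows, predictions):
        if not slice_fn(row) or row[target] != positive:
            continue  # outside the slice, or not a positive example
        if pred == positive:
            true_pos += 1
        else:
            false_neg += 1
    total = true_pos + false_neg
    return true_pos / total if total else float("nan")


# Hypothetical toy data, for illustration only
rows = [
    {"Contract": "One year", "Churn": "Yes"},
    {"Contract": "One year", "Churn": "Yes"},
    {"Contract": "One year", "Churn": "No"},
    {"Contract": "Month-to-month", "Churn": "Yes"},
]
predictions = ["Yes", "No", "No", "Yes"]

threshold = 0.5163763066202091  # value taken from the log above
metric = recall_on_slice(rows, predictions, lambda r: r["Contract"] == "One year")
passed = metric >= threshold  # assumption: the test passes at or above the threshold
print(f"metric={metric:.2f}, passed={passed}")
```

Slicing before scoring is what lets a model with acceptable overall recall still fail on a subgroup such as one-year contracts.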

In [None]:
# Extract the values of the test suite results using the `results` attribute
# The results are a list of SuiteResult objects (test_name, result, params, test_partial)
test_suite_results.results

[SuiteResult(test_name='Recall on data slice “`Contract` == "One year"”', result=
                Test failed
                Metric: 0.0
                
                , params={'model': <giskard.models.function.PredictionFunctionModel object at 0x78cec9a60a90>, 'dataset': <giskard.datasets.base.Dataset object at 0x78cec9b4ea70>, 'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec9005210>, 'threshold': 0.5163763066202091}, test_partial=TestPartial(giskard_test=To execute the test call "execute()" method
 Named inputs: {'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec9005210>, 'threshold': 0.5163763066202091}, provided_inputs={'slicing_function': <giskard.slicing.slice.QueryBasedSliceFunction object at 0x78cec9005210>, 'threshold': 0.5163763066202091}, test_id='Recall on data slice “`Contract` == "One year"”', display_name='Recall on data slice “`Contract` == "One year"”', suite_test_id=None)),
 SuiteResult(test_nam
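
The list above can be reduced to a single pass/fail decision, which is what a CI job ultimately needs. The sketch below assumes each entry exposes `test_name` and `result.passed`, as the `SuiteResult` repr suggests. The stand-in objects, their metric values, and the helper names `failed_tests` and `exit_code` are hypothetical.

```python
from types import SimpleNamespace


def failed_tests(results):
    """Return the names of the tests whose result did not pass."""
    return [r.test_name for r in results if not r.result.passed]


def exit_code(results):
    """Return 0 when every test passed and 1 otherwise (the usual CI convention)."""
    return 1 if failed_tests(results) else 0


# Hypothetical stand-ins for the entries of test_suite_results.results
demo_results = [
    SimpleNamespace(test_name="Recall on slice A",
                    result=SimpleNamespace(passed=False, metric=0.0)),
    SimpleNamespace(test_name="Accuracy",
                    result=SimpleNamespace(passed=True, metric=0.81)),
]

print("Failed:", failed_tests(demo_results))
print("Exit code:", exit_code(demo_results))
# A CI script could end with: sys.exit(exit_code(test_suite_results.results))
```

Returning a non-zero exit code is one way to make the workflow step fail when the suite fails; whether failing scan-generated tests should block a merge is a project decision.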

In [None]:

# Save the model
import pickle

with open("/content/model.pkl", "wb") as f:
    pickle.dump(pipeline, f)

import os

# Create directory if it doesn't exist
os.makedirs('/content/data', exist_ok=True)

# Save the dataset
raw_data.to_csv('/content/data/validation_data.csv', index=False)



In [None]:
%%writefile giskard_validation.py
import os
import re
import pickle
import numpy as np
import pandas as pd
import warnings
import logging

from sklearn.preprocessing import StandardScaler, OneHotEncoder
from giskard import Dataset, Model, scan, testing

warnings.filterwarnings("ignore")

# Set up logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s"
)

CATEGORICAL_COLUMNS = ['gender', 'SeniorCitizen', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod']
NUMERIC_COLUMNS = ['tenure', 'MonthlyCharges', 'TotalCharges']

# Load the validation dataset
logging.info("Loading the validation dataset")
validation_df = pd.read_csv('data/validation_data.csv')
validation_df[CATEGORICAL_COLUMNS] = validation_df[CATEGORICAL_COLUMNS].astype('object')
FEATURES = [col for col in validation_df.columns if col != 'Churn']

# Load the model
logging.info("Loading the model")
with open("model/model.pkl", "rb") as f:
    pipeline = pickle.load(f)

# Wrap the dataset with Giskard
logging.info("Wrapping the dataset with Giskard")
wrapped_data = Dataset(
    df = validation_df,  # A pandas.DataFrame that contains the raw data (before all the pre-processing steps) and the actual ground truth variable
    target = 'Churn',  # Ground truth variable
    name = "Churn classification dataset",  # Optional
    cat_columns = CATEGORICAL_COLUMNS  # List of categorical columns. Optional but strongly recommended; inferred automatically if omitted.
)

# Define the prediction function
logging.info("Defining the prediction function")
def prediction_function(df: pd.DataFrame) -> np.ndarray:
    return pipeline.predict_proba(df)

# Wrap the model with Giskard
logging.info("Wrapping the model with Giskard")
wrapped_model = Model(
    model = prediction_function,                # A prediction function that encapsulates all the data pre-processing steps and that could be executed with the dataset used by the scan.
    model_type = "classification",              # Either regression, classification or text_generation.
    classification_labels = pipeline.classes_,  # Their order MUST be identical to the prediction_function's output order
    name = "Churn classification",              # Name of the wrapped model [Optional]
    feature_names = FEATURES,                   # Default: all columns of your dataset [Optional]
    classification_threshold = 0.5              # Default: 0.5 [Optional]
)

# Scan the model
logging.info("Scanning the model")
scan_results = scan(wrapped_model, wrapped_data)

# Create a test suite from the scan results and add custom tests
logging.info("Creating a test suite from the scan results and adding custom tests")
test_suite = scan_results.generate_test_suite("My first test suite")
test_suite = test_suite.add_test(testing.test_accuracy(wrapped_model, wrapped_data, threshold=0.75))
test_suite_results = test_suite.run()

if scan_results.has_issues():
    print("Your model has vulnerabilities")
else:
    print("Your model is safe")

# Extract the values of the test suite results using the `results` attribute
logging.info("Extracting the values of the test suite results using the `results` attribute")
output = dict()
for test_result in test_suite_results.results:
    # Strip straight quotes, backticks, and curly quotes from the test name
    test_name = re.sub(r'["`“”]', "", test_result.test_name)
    output[test_name] = {
        "Status": test_result.result.passed,
        "Threshold": test_result.params["threshold"],
        "Score": test_result.result.metric,
    }

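# To log the results to a pull request comment,
# save the results as a GitHub environment variable
logging.info("Saving the results as a GitHub environment variable")
import json
github_env = os.getenv("GITHUB_ENV")
if github_env:
    with open(github_env, 'a') as fh:
        # Each entry in the GITHUB_ENV file must end with a newline
        fh.write(f'TEST_RESULT={json.dumps(output)}\n')
else:
    # Not running inside GitHub Actions: print the results instead
    print(json.dumps(output, indent=2))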

Writing giskard_validation.py
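
The last step of the script hands results to later workflow steps through the file named by `GITHUB_ENV`. Below is a minimal sketch of that hand-off in isolation, so it can be tried outside GitHub Actions. The helper name `export_to_github_env` and the sample payload are hypothetical, and a temporary file stands in for the real `GITHUB_ENV` file.

```python
import json
import os
import tempfile


def export_to_github_env(name, value, env_file=None):
    """Append NAME=<json> to the GITHUB_ENV file; return False if no file is available."""
    env_file = env_file or os.getenv("GITHUB_ENV")
    payload = json.dumps(value)  # single line; non-ASCII characters are escaped as \uXXXX
    if not env_file:
        print(f"{name}={payload}")  # local run: just show what would be exported
        return False
    with open(env_file, "a", encoding="utf-8") as fh:
        fh.write(f"{name}={payload}\n")  # one NAME=value entry per line
    return True


# Try it against a temporary file standing in for $GITHUB_ENV
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as tmp:
    demo_env_path = tmp.name

sample = {"Accuracy": {"Status": True, "Threshold": 0.75, "Score": 0.8}}  # hypothetical values
written = export_to_github_env("TEST_RESULT", sample, env_file=demo_env_path)

with open(demo_env_path, encoding="utf-8") as fh:
    line = fh.read()
os.remove(demo_env_path)

key, _, raw_value = line.rstrip("\n").partition("=")
print(key, json.loads(raw_value))
```

Because `json.dumps` emits a single line by default, the simple `NAME=value` form is enough here; multi-line values would need the delimiter syntax described in the GitHub Actions documentation.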


In [None]:
!mkdir -p .github/workflows
!touch .github/workflows/ci-cd.yml


In [None]:
!ls -la

total 288
drwxr-xr-x 1 root root   4096 Sep 20 03:57 .
drwxr-xr-x 1 root root   4096 Sep 20 03:53 ..
drwxr-xr-x 4 root root   4096 Sep 18 13:24 .config
drwxr-xr-x 2 root root   4096 Sep 20 03:57 data
-rw-r--r-- 1 root root   3899 Sep 20 03:57 giskard_validation.py
drwxr-xr-x 3 root root   4096 Sep 20 03:57 .github
-rw-r--r-- 1 root root   4105 Sep 20 03:57 model.pkl
-rw-r--r-- 1 root root    197 Sep 20 03:55 Pipfile
-rw-r--r-- 1 root root  82101 Sep 20 03:55 Pipfile.lock
drwxr-xr-x 1 root root   4096 Sep 18 13:25 sample_data
-rw-r--r-- 1 root root 164634 Sep 20 03:57 scan_results.html


In [None]:
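# Define the content of the ci-cd.yml file.
# A raw string (r''') keeps the backslash sequences (\n, \\u201c, \`) exactly as written,
# so the sed patterns and the JavaScript template literal reach the YAML file intact.
ci_cd_content = r'''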
name: Giskard-CI-CD

on:
  push:
    paths:
      - 'data/**'
      - 'model/**'
      - 'giskard_validation.py'
      - 'README.md'
    branches:
      - main
      - feature

permissions:
  contents: read

jobs:
  run-giskard-test-suite:
    name: Giskard-Test-Suite
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write

    steps:
      - name: Checkout Code
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'

      - name: Install pipenv
        run: |
          pip install --upgrade pip
          pip install pipenv

      - name: Install dependencies
        working-directory: .
        run: pipenv install --system --deploy

      - name: Execute Giskard Test Suite
        run: python giskard_validation.py

      - name: Sanitize TEST_RESULT
        run: |
          # Safely replace Unicode characters in TEST_RESULT
          TEST_RESULT_ESCAPED=$(echo "${TEST_RESULT}" | sed 's/\\u201c/"/g' | sed 's/\\u201d/"/g')
          echo "Sanitized TEST_RESULT: $TEST_RESULT_ESCAPED"
          # Export the sanitized value for further steps
          echo "TEST_RESULT_ESCAPED=$TEST_RESULT_ESCAPED" >> $GITHUB_ENV

      - name: PR comment
        uses: actions/github-script@v6
        with:
          script: |
            try {
              const issue_number = context.payload.pull_request ? context.payload.pull_request.number : null;
              if (issue_number) {
                if (!process.env.TEST_RESULT_ESCAPED) {
                  console.log("No TEST_RESULT_ESCAPED found.");
                  return;
                }
                let testResults;
                try {
                  testResults = JSON.stringify(JSON.parse(process.env.TEST_RESULT_ESCAPED), null, 2);
                } catch (error) {
                  console.log("Error parsing TEST_RESULT_ESCAPED:", error.message);
                  return;
                }
                github.rest.issues.createComment({
                  issue_number: issue_number,
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  body: `Test Suites Results:\n\n\`\`\`json\n${testResults}\n\`\`\``
                });
              } else {
                console.log("No issue number found. This may not be a pull request event.");
              }
            } catch (error) {
              console.log("Error in PR comment action:", error.message);
            }
'''

# Write the content to the ci-cd.yml file
with open('.github/workflows/ci-cd.yml', 'w') as file:
    file.write(ci_cd_content)
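
The workflow template contains backslash sequences (`\n` inside the JavaScript template literal, `\\u201c` inside the `sed` patterns). How Python treats them depends on whether the triple-quoted string is raw. The short sketch below, using a hypothetical two-line fragment rather than the full workflow, shows the difference.

```python
# The same fragment, once as a regular string and once as a raw string
plain = '''
body: `first\nsecond`
run: sed 's/\\u201c/"/g'
'''
raw = r'''
body: `first\nsecond`
run: sed 's/\\u201c/"/g'
'''

# In the regular string, \n becomes a real line break and \\ collapses to one backslash
print(len(plain.splitlines()), len(raw.splitlines()))
print("\\\\u201c" in plain, "\\\\u201c" in raw)
```

A real line break inside the `script: |` block would end up at column 0 of the generated file, so checking the written file (or using a raw string) before committing is worthwhile.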


In [None]:
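import os
import shutil

# Replace 'old_file_path' and 'new_folder_path' with the actual paths
old_file_path = "/content/model.pkl"
new_folder_path = "/content/model"

# Make sure the destination is a directory; otherwise shutil.move would rename the file to "/content/model"
os.makedirs(new_folder_path, exist_ok=True)
shutil.move(old_file_path, new_folder_path)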

'/content/model/model.pkl'

In [None]:
# Step 1: Initialize the git repository
!git init

# Step 2: Add the remote repository
!git remote add origin https://github.com/toniramchandani1/CICDTestingAIApps.git

# Step 3: Configure Git
!git config --global user.email "ramchandani.toni@example.com"
!git config --global user.name "Toni Ramchandani"

# Step 4: Add the files to the staging area
!git add .

# Step 5: Commit the files
!git commit -m "Add CI/CD pipeline"

# Step 6: Create a new branch and switch to it
!git checkout -b feature

# Step 7: Push the files to GitHub
import getpass
token = getpass.getpass('Enter your GitHub Personal Access Token: ')
!git push https://{token}@github.com/toniramchandani1/CICDTestingAIApps.git feature


Reinitialized existing Git repository in /content/.git/
error: remote origin already exists.
On branch feature
nothing to commit, working tree clean
fatal: A branch named 'feature' already exists.
Enter your GitHub Personal Access Token: ··········
Enumerating objects: 39, done.
Counting objects: 100% (39/39), done.
Delta compression using up to 2 threads
Compressing objects: 100% (28/28), done.
Writing objects: 100% (39/39), 8.50 MiB | 1.75 MiB/s, done.
Total 39 (delta 5), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (5/5), done.
To https://github.com/toniramchandani1/CICDTestingAIApps.git
 * [new branch]      feature -> feature
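
The push command above embeds the personal access token in the remote URL. As a small sketch of handling that URL more carefully, the helpers below build it in Python and produce a redacted form that is safe to print or log. The helper names `authenticated_url` and `redact` and the token value are hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def authenticated_url(repo_url: str, token: str) -> str:
    """Insert the token as the user part of an HTTPS remote URL."""
    parts = urlsplit(repo_url)
    netloc = f"{quote(token, safe='')}@{parts.netloc}"  # escape characters that are unsafe in a URL
    return urlunsplit(parts._replace(netloc=netloc))


def redact(text: str, token: str) -> str:
    """Replace the token with a placeholder before printing or logging."""
    return text.replace(token, "***")


demo_token = "abc123"  # placeholder, not a real token
url = authenticated_url("https://github.com/toniramchandani1/CICDTestingAIApps.git", demo_token)
print(redact(url, demo_token))
```

The full URL would still be passed to `git push`; only the redacted form should appear in notebook output that gets saved or shared.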


In [None]:
!git add Pipfile Pipfile.lock
!git commit -m "Add Pipfile and Pipfile.lock for dependency management"


On branch feature
nothing to commit, working tree clean


In [None]:
!git fetch origin
!git checkout feature

Already on 'feature'


In [None]:
!git merge origin/feature


merge: origin/feature2 - not something we can merge


In [None]:
!git add .
!git commit -m "Piplock file"


On branch feature
nothing to commit, working tree clean


In [None]:
import getpass
token = getpass.getpass('Enter your GitHub Personal Access Token: ')
!git push https://{token}@github.com/toniramchandani1/CICDAITesting.git feature


Enter your GitHub Personal Access Token: ··········
To https://github.com/toniramchandani1/CICDAITesting.git
 ! [rejected]        feature -> feature (fetch first)
error: failed to push some refs to 'https://github.com/toniramchandani1/CICDAITesting.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.


In [None]:
import getpass

# Prompt for your GitHub Personal Access Token (never hard-code it in the notebook)
token = getpass.getpass("Enter your GitHub Personal Access Token: ")

# Fetch remote changes before pushing
!git fetch origin

# Integrate remote changes (choose either merge or rebase; both can produce conflicts)
# Option 1: Merge (preserves history and adds a merge commit)
#!git checkout feature
#!git merge origin/feature

# Option 2: Rebase (linear history, but rewrites your local commits)
!git checkout feature
!git rebase origin/feature

# Push your changes after integrating remote updates
!git push https://{token}@github.com/toniramchandani1/CICDTestingAIApps.git feature


Enter your GitHub Personal Access Token: ··········
Already on 'feature'
Current branch feature is up to date.
Everything up-to-date
