# Example usage

This notebook shows example usage of H-LLM through the `H_LLM` class.

**Very important.** H-LLM is *not* a product or an algorithm intended for use outside the viability setups. As stated in the paper, all the studies are *viability* studies intended to show that self-healing machine learning is feasible. We do not recommend using H-LLM as software in any real-life system.

This H-LLM class was developed for the sole purpose of showing that the ideas presented in the paper are viable. It is not intended for any other applications. 

Most importantly, we hope that this work spurs new self-healing research within the field.

In [2]:
import os
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

from openai_config import get_openai_config  # Ensure this module contains OpenAI API configuration
from openai import AzureOpenAI
from HLLM import H_LLM  # Import the H_LLM class
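
The `openai_config` module imported above is not shown in this notebook. As a minimal sketch, it could look like the following, assuming only the three keys read in the setup cell (`api_version`, `api_base`, `api_key`). The environment variable names are hypothetical, and the `H_LLM` class may read further keys from `config` that this sketch does not cover.

```python
import os


def get_openai_config():
    """Hypothetical helper: build the config dict read by the setup cell.

    The environment variable names below are assumptions, not part of H-LLM.
    """
    required = {
        "api_version": "AZURE_OPENAI_API_VERSION",
        "api_base": "AZURE_OPENAI_ENDPOINT",
        "api_key": "AZURE_OPENAI_API_KEY",
    }
    # Fail early with a clear message instead of passing None to the client
    missing = [env for env in required.values() if not os.environ.get(env)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {key: os.environ[env] for key, env in required.items()}
```

Reading credentials from the environment keeps the API key out of the notebook itself.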


In [None]:

# ===============================
# Setup Configuration
# ===============================

# Load configuration for Azure OpenAI
config = get_openai_config()
llm = AzureOpenAI(
    api_version=config["api_version"],
    azure_endpoint=config["api_base"],
    api_key=config["api_key"]
)

# Initialize the H_LLM instance with a context
context = "The goal is to hypothesize concrete reasons for why the model has underperformed."
H = H_LLM(config=config, llm=llm, context=context)

# ===============================
# Data Generation Function
# ===============================

def generate_diabetes_data(n_samples, seed):
    """
    Generates synthetic diabetes data for binary classification.
    """
    np.random.seed(seed)
    HbA1c = np.random.normal(5.7, 0.5, n_samples)
    FastingGlucose = np.random.normal(100, 15, n_samples)
    Age = np.random.normal(50, 12, n_samples)
    BMI = np.random.normal(25, 4, n_samples)
    BloodPressure = np.random.normal(120, 15, n_samples)
    Cholesterol = np.random.normal(200, 40, n_samples)
    Insulin = np.random.normal(85, 45, n_samples)
    PhysicalActivity = np.random.normal(3, 1, n_samples)
    
    # Combine features into a DataFrame
    X = np.vstack((HbA1c, FastingGlucose, Age, BMI, BloodPressure, Cholesterol, Insulin, PhysicalActivity)).T
    columns = ['HbA1c', 'FastingGlucose', 'Age', 'BMI', 'BloodPressure', 'Cholesterol', 'Insulin', 'PhysicalActivity']
    data = pd.DataFrame(X, columns=columns)
    
    # Create synthetic binary outcomes
    coefficients = np.array([0.3, 0.01, -0.02, 0.04, 0.05, -0.03, -0.01, -0.1])
    noise = np.random.normal(0, 0.2, n_samples)
    linear_combination = np.dot(X, coefficients) + noise
    probabilities = 1 / (1 + np.exp(-linear_combination))
    outcomes = (probabilities > 0.5).astype(int)
    
    return data, outcomes
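
A note on the labeling rule in `generate_diabetes_data`: thresholding the sigmoid at 0.5 is the same as checking the sign of `linear_combination`, so the labels are an almost deterministic linear function of the features (up to the added noise). The sketch below checks this equivalence on illustrative values; the mean and standard deviation used here are assumptions for the demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative stand-in for np.dot(X, coefficients) + noise
linear_combination = rng.normal(1.5, 1.5, size=1000)

probabilities = 1 / (1 + np.exp(-linear_combination))
labels_from_sigmoid = (probabilities > 0.5).astype(int)
labels_from_sign = (linear_combination > 0).astype(int)

# Both rules assign the same label to every sample
print(np.array_equal(labels_from_sigmoid, labels_from_sign))
```

This is one plausible reason a logistic regression fits the unshifted data so well in the examples that follow.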


# Example 1: Basic Hypothesis Generation

In [None]:
# ===============================
# Example 1: Basic Hypothesis Generation
# ===============================

def example_hypothesis_generation():
    print("=== Example 1: Basic Hypothesis Generation ===")
    
    # Generate initial training data and shifted test data
    X_before, y_before = generate_diabetes_data(1000, seed=0)
    X_after, y_after = generate_diabetes_data(1000, seed=1)
    
    # Introduce a shift in the 'Age' column to simulate data drift
    X_after['Age'] *= 1.5

    # Initialize a logistic regression model and fit it on the original data
    model = LogisticRegression()
    model.fit(X_before, y_before)
    
    # Check model accuracy on the original and the shifted data
    preds_before = model.predict(X_before)
    preds_after = model.predict(X_after)
    acc_before = accuracy_score(y_before, preds_before)
    acc_after = accuracy_score(y_after, preds_after)
    
    print(f"Accuracy before shift: {acc_before:.2f}")
    print(f"Accuracy after shift: {acc_after:.2f}")
    
    # Use H_LLM to hypothesize issues with model performance
    hypotheses = H.hypothesize_issues_with_performance(X_before, X_after, y_before, y_after, model, context)
    print("Hypotheses generated by H_LLM:")
    print(hypotheses)
    
example_hypothesis_generation()

=== Example 1: Basic Hypothesis Generation ===
Accuracy before shift: 0.98
Accuracy after shift: 0.89


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


Hypotheses generated by H_LLM:
Covariate: Age; Hypothesis: The distribution of Age has shifted significantly to the right, which might have affected the model's performance; Evidence: The mean of Age has increased from 49.385257 to 74.600966, and the minimum value has increased from 12.597721 to 19.845456; Strength of belief: Extremely confident.

Covariate: BMI; Hypothesis: The distribution of BMI has shifted slightly to the right, which might have affected the model's performance; Evidence: The mean of BMI has increased from 24.924230 to 25.014395, and the minimum value has increased from 10.039597 to 13.511799; Strength of belief: Confident.

Covariate: BloodPressure; Hypothesis: The distribution of BloodPressure has shifted slightly to the right, which might have affected the model's performance; Evidence: The mean of BloodPressure has increased from 120.422710 to 121.108888, and the minimum value has decreased from 74.888442 to 71.204486; Strength of belief: Confident.

Covariate:
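
The evidence quoted in the hypotheses above relies on per-covariate summary statistics such as the mean and the minimum. The sketch below shows one way to compute a comparable summary directly with pandas. It is an independent illustration, not the procedure `H_LLM` uses internally; `summarize_shift` is a hypothetical helper and the two-column data is synthetic.

```python
import numpy as np
import pandas as pd


def summarize_shift(X_before, X_after):
    """Compare per-column mean and minimum before and after a shift."""
    summary = pd.DataFrame({
        "mean_before": X_before.mean(),
        "mean_after": X_after.mean(),
        "min_before": X_before.min(),
        "min_after": X_after.min(),
    })
    # Express the change in mean in units of the pre-shift standard deviation
    summary["mean_shift_in_std"] = (
        (summary["mean_after"] - summary["mean_before"]) / X_before.std()
    )
    # Largest absolute shift first
    return summary.sort_values("mean_shift_in_std", key=np.abs, ascending=False)


rng = np.random.default_rng(0)
X_before = pd.DataFrame({"Age": rng.normal(50, 12, 1000), "BMI": rng.normal(25, 4, 1000)})
X_after = pd.DataFrame({"Age": rng.normal(50, 12, 1000) * 1.5, "BMI": rng.normal(25, 4, 1000)})
print(summarize_shift(X_before, X_after))
```

Normalizing by the pre-shift standard deviation separates a deliberate shift (such as `Age`) from sampling noise between two seeds (such as `BMI`), which raw mean differences alone do not make obvious.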

# Example 2: Covariate Combination Hypotheses


In [None]:
# ===============================
# Example 2: Covariate Combination Hypotheses
# ===============================

def example_covariate_combinations():
    print("\n=== Example 2: Covariate Combination Hypotheses ===")
    
    # Generate initial training data and shifted test data
    X_before, y_before = generate_diabetes_data(1000, seed=42)
    X_after, y_after = generate_diabetes_data(1000, seed=24)

    # Initialize a logistic regression model and fit it on the original data
    model = LogisticRegression()
    model.fit(X_before, y_before)
    
    # Introduce shifts in multiple covariates
    X_after['FastingGlucose'] *= 1.2
    X_after['BMI'] += 5
    
    # Use H_LLM to generate hypotheses for covariate combinations
    hypotheses, queries = H.hypothesize_issues_with_performance_covariate_combinations(X_before, X_after, y_before, y_after, model, context)
    
    print("Covariate combination hypotheses:")
    print(hypotheses)
    print("\nSuggested queries for data filtering:")
    print(queries)

example_covariate_combinations()



=== Example 2: Covariate Combination Hypotheses ===


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


Covariate combination hypotheses:
[[5. The mean of BMI has increased significantly after the shift, which might have affected the model's performance.],
[1. The mean of FastingGlucose has increased significantly after the shift, which might have affected the model's performance.],
[4. The maximum value of FastingGlucose has increased significantly, which might have affected the model's performance.],
[8. The maximum value of BMI has increased significantly, which might have affected the model's performance.],
[7. The minimum value of BMI has increased significantly, which might have affected the model's performance.],
[3. The minimum value of FastingGlucose has increased, which might have affected the model's performance.],
[2. The standard deviation of FastingGlucose has also increased, indicating a wider spread of data which might have affected the model's performance.],
[6. The standard deviation of BMI has decreased, indicating a narrower spread of data which might have affected th
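
The solver warning in the outputs above suggests either raising `max_iter` or scaling the data. A minimal sketch of both, using scikit-learn's `StandardScaler` inside a pipeline, is shown below on synthetic two-feature data. Whether the `H_LLM` methods accept a pipeline in place of a bare `LogisticRegression` is not confirmed here, so treat this as a standalone illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two features on very different scales, loosely mimicking HbA1c and Cholesterol
X = np.column_stack([rng.normal(5.7, 0.5, 1000), rng.normal(200, 40, 1000)])
# Illustrative labels: a noisy linear rule with a roughly balanced threshold
y = (0.3 * X[:, 0] - 0.03 * X[:, 1] + rng.normal(0, 0.2, 1000) > -4.3).astype(int)

# Standardize features, then fit with a higher iteration budget
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)
print(f"Training accuracy: {model.score(X, y):.2f}")
```

Because the scaler is fitted on the training data only, a later multiplicative shift in a covariate still reaches the classifier as a shift.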

# Example 3: Adaptive Model Retraining


In [10]:
# ===============================
# Example 3: Adaptive Model Retraining
# ===============================

def example_adaptive_retraining():
    print("\n=== Example 3: Adaptive Model Retraining ===")
    
    # Generate initial training data and shifted test data
    X_before, y_before = generate_diabetes_data(1000, seed=123)
    X_after, y_after = generate_diabetes_data(1000, seed=321)
    
    # Simulate a shift in 'BloodPressure'
    X_after['BloodPressure'] *= 1.3
    
    # Hold out 20% of the shifted data as a backtesting set
    X_after, X_backtest, y_after, y_backtest = train_test_split(X_after, y_after, test_size=0.2, random_state=42)
    
    # Initialize a logistic regression model
    model = LogisticRegression()
    model.fit(X_before, y_before)
    
    # Use H_LLM to fit the model adaptively based on covariate shifts
    adapted_model = H.fit_model(model, X_before, X_after, y_after, X_backtest, y_backtest)
    
    # Evaluate the adapted model
    preds = adapted_model.predict(X_backtest)
    acc = accuracy_score(y_backtest, preds)
    print(f"Accuracy of the adapted model: {acc:.2f}")


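# Run examples
# Note: this cell calls Example 1 again, so the recorded output below is from Example 1.
# Call example_adaptive_retraining() to run Example 3.
example_hypothesis_generation()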

=== Example 1: Basic Hypothesis Generation ===
Accuracy before shift: 0.98
Accuracy after shift: 0.89


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


Hypotheses generated by H_LLM:
Covariate: Age; Hypothesis: The significant shift in the Age covariate has likely caused the model to underperform; Evidence: The mean of Age has significantly increased after the shift. Also, the model's accuracy has decreased in most of the ranges of Age after the shift; Strength of belief: Extremely confident.

Covariate: HbA1c; Hypothesis: The model's performance has degraded due to the shift in the HbA1c covariate; Evidence: The mean of HbA1c has slightly increased after the shift. Also, the model's accuracy has decreased in most of the ranges of HbA1c after the shift; Strength of belief: Confident.

Covariate: FastingGlucose; Hypothesis: The shift in the FastingGlucose covariate might have contributed to the model's underperformance; Evidence: The mean of FastingGlucose has slightly increased after the shift. However, the model's accuracy has not significantly changed in most of the ranges of FastingGlucose after the shift; Strength of belief: Somew
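
As a point of reference for Example 3, the sketch below compares two simple baselines on a held-out backtesting set: the model fitted before the shift, and a model refitted on post-shift data. This is not what `H.fit_model` does; it is a plain scikit-learn illustration on synthetic data, and `make_data` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_data(n, seed):
    """Two synthetic features (blood pressure, BMI) with a noisy linear label."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([rng.normal(120, 15, n), rng.normal(25, 4, n)])
    y = (0.05 * X[:, 0] + 0.04 * X[:, 1] + rng.normal(0, 0.2, n) > 7.0).astype(int)
    return X, y


X_before, y_before = make_data(1000, seed=123)
X_after, y_after = make_data(1000, seed=321)
X_after[:, 0] *= 1.3  # simulate the shift after the labels are drawn

# Hold out 20% of the shifted data for backtesting
X_recent, X_backtest, y_recent, y_backtest = train_test_split(
    X_after, y_after, test_size=0.2, random_state=42
)

baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_before, y_before)
retrained = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_recent, y_recent)

acc_baseline = accuracy_score(y_backtest, baseline.predict(X_backtest))
acc_retrained = accuracy_score(y_backtest, retrained.predict(X_backtest))
print(f"Pre-shift model on backtest set: {acc_baseline:.2f}")
print(f"Refitted model on backtest set:  {acc_retrained:.2f}")
```

Any adaptive strategy can be judged against these two numbers on the same backtesting split.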

# Example 4: Suggesting Data Removal Queries

In [13]:
def generate_synthetic_data(n_samples, seed):
    """
    Generates synthetic dataset with numerical features.
    """
    np.random.seed(seed)
    HbA1c = np.random.normal(5.7, 0.5, n_samples)
    FastingGlucose = np.random.normal(100, 15, n_samples)
    Age = np.random.normal(50, 12, n_samples)
    BMI = np.random.normal(25, 4, n_samples)
    BloodPressure = np.random.normal(120, 15, n_samples)
    Cholesterol = np.random.normal(200, 40, n_samples)
    Insulin = np.random.normal(85, 45, n_samples)
    PhysicalActivity = np.random.normal(3, 1, n_samples)
    
    # Combine features into a DataFrame
    columns = ['HbA1c', 'FastingGlucose', 'Age', 'BMI', 'BloodPressure', 'Cholesterol', 'Insulin', 'PhysicalActivity']
    data = pd.DataFrame(np.vstack((HbA1c, FastingGlucose, Age, BMI, BloodPressure, Cholesterol, Insulin, PhysicalActivity)).T, columns=columns)
    
    return data

# ===============================
# Example 4: Suggesting Data Removal Queries
# ===============================

def example_suggest_data_removal_queries():
    print("\n=== Example 4: Suggesting Data Removal Queries ===")

    # Generate initial dataset and shifted dataset
    X_before = generate_synthetic_data(1000, seed=42)
    X_after = generate_synthetic_data(1000, seed=24)

    # Introduce shifts in multiple covariates
    X_after['Age'] *= 1.5
    X_after['BloodPressure'] += 20

    # Use H_LLM to hypothesize issues and suggest data removal queries
    issues = H.hypothesize_issues(X_before, X_after, context)
    removal_queries = H.suggest_solutions_remove_data(issues, X_before, X_after)

    print("Suggested queries to remove problematic data:")
    print(removal_queries)

    # Convert the queries into a list format for use with pandas filtering
    formatted_queries = H.convert_to_list_of_queries(removal_queries, X_before)
    print("\nFormatted queries for pandas filtering:")
    print(formatted_queries)

example_suggest_data_removal_queries()


=== Example 4: Suggesting Data Removal Queries ===
Suggested queries to remove problematic data:
1. Subgroup: Age > 90; Reason: The maximum age has increased significantly, which could be skewing the data.
2. Subgroup: Blood Pressure > 160; Reason: The mean blood pressure has increased significantly, removing higher values could normalize the data.
3. Subgroup: Fasting Glucose < 60; Reason: The mean fasting glucose has decreased, removing lower values could normalize the data.
4. Subgroup: Insulin < 0; Reason: The minimum value for insulin is negative, which is not possible, suggesting errors in the data.
5. Subgroup: Physical Activity < 1; Reason: The mean physical activity has decreased and the minimum value is negative, suggesting errors in the data.
6. Subgroup: BMI > 35; Reason: The mean BMI has increased slightly, removing higher values could normalize the data.
7. Subgroup: Cholesterol > 300; Reason: The mean cholesterol has increased slightly, removing higher values could norm
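
Once subgroup conditions like those above are available as pandas expressions, they can be applied to a DataFrame to drop the matching rows. The sketch below assumes each query is a string that `DataFrame.eval` can evaluate against the column names (for example `BloodPressure`, not `Blood Pressure`); the exact format returned by `H.convert_to_list_of_queries` is not shown in this notebook, so the list here is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [45.0, 95.0, 60.0, 72.0],
    "BloodPressure": [118.0, 150.0, 171.0, 140.0],
})

# Assumed format: one pandas expression per subgroup to drop
removal_queries = ["Age > 90", "BloodPressure > 160"]

filtered = df
for query in removal_queries:
    # Keep only the rows that do NOT match the subgroup
    filtered = filtered[~filtered.eval(query)]

print(filtered)
```

Applying the queries one at a time makes it easy to log how many rows each subgroup removes before retraining on the remainder.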