<a href="https://colab.research.google.com/github/fjgarate/colab/blob/main/Dickens_zephyr7b_beta_pre_finetuning_steps.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Dickens: the LLM that writes Great Expectations** 🙌

In this notebook, we will create a fine-tuned version of Zephyr 7B Beta, called Dickens-Zephyr7B-beta. This LLM takes a natural-language description of a data quality requirement and outputs the corresponding Great Expectations expectation. This initial version covers only core expectations. For the full list of available expectations, see: https://greatexpectations.io/expectations/

## Important notice

The original notebook has been split into two notebooks. As a single notebook, it fails with an out-of-memory error when running on the free Colab tier.

- Notebook 1 (this one): explains the problem we are trying to solve, plus some initial tests without fine-tuning.
- Notebook 2 ([link](https://colab.research.google.com/drive/1P30YSoemEoeaLACyJqzk-M15SGR2mYpW?usp=sharing)): introduces the Dickens dataset plus the fine-tuning code.

Feel free to work through both in sequence to reproduce the whole experiment.

# **Example: Using Dickens LLM for Great Expectations Generation** 💻

In this notebook, the model will take natural language as input and should produce a valid, customized expectation. Following the original Ludwig notebook, we will first try using the base model with prompting, then instruction-fine-tune the model.

As an example, if we prompt the model with this instruction:

```
Instruction: DIVISION names should be either the values NSA or start by D.
```

We want the model to produce exactly this response:

```
Response: expect_column_values_to_match_regex(column='DIVISION',regex='NSA|^D.*')
```
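
To make the target concrete, here is a minimal sketch of what the regular expression in that response accepts. It uses only Python's `re` module rather than Great Expectations itself. The sample values and the choice of `re.search` are assumptions made for illustration, and the library's own matching semantics may differ.

```python
import re

# Regex taken from the expected response above.
pattern = re.compile(r"NSA|^D.*")

# Hypothetical DIVISION values, invented for this illustration.
samples = ["NSA", "D11", "D42", "E07", "nsa"]

# True when the value contains "NSA" or starts with "D" (case-sensitive).
results = {value: bool(pattern.search(value)) for value in samples}

for value, ok in results.items():
    print(f"{value!r}: {'pass' if ok else 'fail'}")
```

Under these assumptions, `NSA`, `D11` and `D42` pass, while `E07` and the lowercase `nsa` fail.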



### **Install Ludwig and Ludwig's LLM related dependencies.**

Install Ludwig from the latest release

In [None]:
!pip uninstall -y tensorflow --quiet
!pip install ludwig[full] --quiet
#!pip install ludwig[full]==0.9.1 --quiet
!pip install fastapi --quiet
!pip install tiktoken --quiet
!pip install cohere --quiet
!pip install --upgrade git+https://github.com/huggingface/peft.git --quiet


Enable text wrapping so we don't have to scroll horizontally and create a function to flush CUDA cache.

In [None]:
import torch  # needed by clear_cache() below
from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))

# get_ipython().events.register('pre_run_cell', set_css)

def clear_cache():
  if torch.cuda.is_available():
    torch.cuda.empty_cache()

Set up Weights & Biases to track our experiments' performance.

In [None]:
!pip install wandb timm fastprogress transformers datasets -Uqqq


In [None]:
# Setting up base Weights and Biases configuration. We import the library here, so we don't get any errors or warnings downstream
import wandb

# To enable logging the results, set WANDB_MODE = True
WANDB_MODE = False

In [None]:
if WANDB_MODE:
  wandb.login()

  # Our baseline configuration will use the HuggingFaceH4/zephyr-7b-beta model and the BirdiDQ dataset
  wandb.init(
    project="dickens-zephyr",
    config={
        "model": "HuggingFaceH4/zephyr-7b-beta",
        "dataset": "BirdiDQ dataset",
        })



### **Import The Code Generation Dataset** 📋



In [None]:
# from google.colab import data_table; data_table.enable_dataframe_formatter()
import numpy as np; np.random.seed(123)
import pandas as pd

birdi_df = pd.read_json("https://raw.githubusercontent.com/BirdiD/BirdiDQ/master/great_expectations/finetuning_template/data/train.json")

# We're going to create a new column called `split` where:
# 80% will be assigned a value of 0 -> train set
# 10% will be assigned a value of 1 -> validation set
# 10% will be assigned a value of 2 -> test set

# Calculate the number of rows for each split value
total_rows = len(birdi_df)
split_0_count = int(total_rows * 0.8)
split_1_count = int(total_rows * 0.1)
split_2_count = total_rows - split_0_count - split_1_count

# Create an array with split values based on the counts
split_values = np.concatenate([
    np.zeros(split_0_count),
    np.ones(split_1_count),
    np.full(split_2_count, 2)
])

# Shuffle the array to ensure randomness
np.random.shuffle(split_values)

# Add the 'split' column to the DataFrame
birdi_df['split'] = split_values
birdi_df['split'] = birdi_df['split'].astype(int)

# Given the dataset has only about 250 examples, we will use the whole file
birdi_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 252 entries, 0 to 251
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   prompt      252 non-null    object
 1   completion  252 non-null    object
 2   split       252 non-null    int64 
dtypes: int64(1), object(2)
memory usage: 6.0+ KB
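
As a quick sanity check on the split logic above, the following sketch rebuilds the label array for 252 rows and counts how many rows land in each split. The helper name `make_split_labels` is hypothetical, and it uses `np.random.default_rng` instead of the global seed used in the notebook, so the row order will differ; only the counts are comparable.

```python
import numpy as np
import pandas as pd

def make_split_labels(total_rows, seed=123):
    """Return a shuffled array of 0/1/2 labels in roughly 80/10/10 proportions."""
    n_train = int(total_rows * 0.8)
    n_valid = int(total_rows * 0.1)
    n_test = total_rows - n_train - n_valid  # the remainder goes to the test set
    labels = np.concatenate([
        np.zeros(n_train, dtype=int),
        np.ones(n_valid, dtype=int),
        np.full(n_test, 2, dtype=int),
    ])
    rng = np.random.default_rng(seed)
    rng.shuffle(labels)  # in-place shuffle; counts are unaffected
    return labels

labels = make_split_labels(252)
split_counts = pd.Series(labels).value_counts().sort_index()
print(split_counts)
```

Because the train and validation counts are truncated with `int()`, the leftover rows fall into the test set, which therefore ends up slightly larger than the validation set.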


In [None]:
# export birdi_df to csv
birdi_df.to_csv('birdi_df.csv', index=False)

## **Understanding The Original BirdiDQ Dataset** 📖

The original dataset for fine-tuning a model to generate Great Expectations is available at https://raw.githubusercontent.com/BirdiD/BirdiDQ/master/great_expectations/finetuning_template/data/train.json under an Apache 2.0 license.

Let's take a look at the data!






In [None]:
birdi_df.head(10)

|   | prompt | completion | split |
|---|--------|------------|-------|
| 0 | Ensure that MIT University graduate proportion... | expect_column_proportion_of_unique_values_to_b... | 1 |
| 1 | Check that at least 50% of the values in the s... | expect_column_values_to_be_between(column='sal... | 1 |
| 2 | Verify if the values in the price column are n... | expect_column_values_to_not_be_null_and_column... | 1 |
| 3 | Does the median value of revenue, for records ... | expect_column_median_to_be_between(column='rev... | 0 |
| 4 | Verify if the values in the description column... | expect_column_values_to_not_be_in_set(column='... | 2 |
| 5 | Check that none of the values in the name colu... | expect_column_values_to_not_match_regex(column... | 0 |
| 6 | Verify if the values in the revenue column are... | expect_column_pair_values_a_to_be_greater_than... | 0 |
| 7 | Check if the values in the event_time column m... | expect_column_values_to_match_strftime_format(... | 1 |
| 8 | Check that in 85% of the cases, the values in ... | expect_column_values_to_be_null(column='salary... | 0 |
| 9 | Verify that the lengths of values in the title... | expect_column_value_lengths_to_be_between(colu... | 1 |


Each row in the dataset consists of:
- a `prompt` that describes the data quality requirement
- a `completion` that provides the expectation rule with the given parameters.

Now, we will extract the distribution of the expectations used in this original dataset.

In [None]:
import pandas as pd
import re
from collections import Counter

def extract_function_names(completion_series):
    """
    Extract Python function names from a pandas Series using regular expressions.

    :param completion_series: pandas Series containing strings with function calls
    :return: List of extracted function names
    """
    # Regular expression pattern for Python function names
    pattern = r"\b\w+\b(?=\()"

    # Extract all matches
    function_names = []
    for item in completion_series:
        matches = re.findall(pattern, item)
        function_names.extend(matches)

    return function_names

def calculate_frequency(function_names):
    """
    Calculate the frequency of each function name in a list.

    :param function_names: List of function names
    :return: Dictionary with function names as keys and their frequencies as values
    """
    return Counter(function_names)

birdi_df_function_names = extract_function_names(birdi_df['completion'])
birdi_df_frequency_count = calculate_frequency(birdi_df_function_names)

sorted_frequency_count = dict(sorted(birdi_df_frequency_count.items(), key=lambda item: item[1], reverse=True))

for function, count in sorted_frequency_count.items():
    print(f"{function}: {count}")



expect_column_values_to_not_be_null_and_column_to_not_be_empty: 16
expect_column_values_to_not_be_in_set: 16
expect_column_values_to_be_between: 15
expect_column_values_to_match_strftime_format: 15
expect_column_values_to_be_unique: 15
expect_column_to_exist: 15
expect_column_values_to_match_regex: 15
expect_column_values_to_not_be_null: 14
expect_column_values_to_be_null: 10
expect_column_pair_values_a_to_be_greater_than_b: 8
expect_column_values_to_be_in_set: 8
expect_column_proportion_of_unique_values_to_be_between: 6
expect_column_value_lengths_to_be_between: 6
expect_column_most_common_value_to_be_in_set: 6
expect_column_median_to_be_between: 5
expect_column_values_to_not_match_regex: 5
expect_column_quantile_values_to_be_between: 5
expect_column_pair_values_to_be_in_set: 5
expect_column_values_to_be_increasing: 5
expect_column_mean_to_be_between: 5
expect_column_values_to_not_match_regex_list: 5
expect_column_values_to_be_decreasing: 5
expect_column_unique_value_count_to_be_betwe
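
To see what the extraction pattern actually captures, the sketch below applies it to two strings. The first is one of the expectation formats used in this notebook; the second is a hypothetical completion with nested calls, invented here to show that the pattern counts every name followed by an opening parenthesis, not only expectation names.

```python
import re

# Same pattern as in extract_function_names: a word immediately followed by "(".
pattern = r"\b\w+\b(?=\()"

simple = "expect_column_values_to_be_unique(column='EVENT_UNIQUE_ID')"
# Hypothetical completion with nested calls, invented for this illustration.
nested = "expect_column_values_to_be_in_set(column='INCIDENT', value_set=list(range(1, 6)))"

simple_names = re.findall(pattern, simple)
nested_names = re.findall(pattern, nested)

print(simple_names)
print(nested_names)
```

If completions in a dataset contain nested calls like this, filtering the extracted names on the `expect_` prefix would keep the frequency counts limited to expectations.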

In [None]:
# Summarize the dataset

summary_data = {
    "Total Rows in DataFrame": [len(birdi_df)],
    "Number of Different Functions": [len(sorted_frequency_count)]
}
summary_df = pd.DataFrame(summary_data)

print("\nSummary Information:")
print(summary_df)


Summary Information:
   Total Rows in DataFrame  Number of Different Functions
0                      252                             33


In [None]:
# Based on a local install of Great Expectations, we generate a list of the whole core Great Expectations library

gx_core_list_of_functions = """
expect_column_bootstrapped_ks_test_p_value_to_be_greater_than
expect_column_chisquare_test_p_value_to_be_greater_than
expect_column_distinct_values_to_be_in_set
expect_column_distinct_values_to_contain_set
expect_column_distinct_values_to_equal_set
expect_column_kl_divergence_to_be_less_than
expect_column_max_to_be_between
expect_column_mean_to_be_between
expect_column_median_to_be_between
expect_column_min_to_be_between
expect_column_most_common_value_to_be_in_set
expect_column_pair_cramers_phi_value_to_be_less_than
expect_column_pair_values_a_to_be_greater_than_b
expect_column_pair_values_to_be_equal
expect_column_pair_values_to_be_in_set
expect_column_parameterized_distribution_ks_test_p_value_to_be_greater_than
expect_column_proportion_of_unique_values_to_be_between
expect_column_quantile_values_to_be_between
expect_column_stdev_to_be_between
expect_column_sum_to_be_between
expect_column_to_exist
expect_column_unique_value_count_to_be_between
expect_column_value_lengths_to_be_between
expect_column_value_lengths_to_equal
expect_column_value_z_scores_to_be_less_than
expect_column_values_to_be_between
expect_column_values_to_be_dateutil_parseable
expect_column_values_to_be_decreasing
expect_column_values_to_be_in_set
expect_column_values_to_be_in_type_list
expect_column_values_to_be_increasing
expect_column_values_to_be_json_parseable
expect_column_values_to_be_null
expect_column_values_to_be_of_type
expect_column_values_to_be_unique
expect_column_values_to_match_json_schema
expect_column_values_to_match_like_pattern_list
expect_column_values_to_match_like_pattern
expect_column_values_to_match_regex_list
expect_column_values_to_match_regex
expect_column_values_to_match_strftime_format
expect_column_values_to_not_be_in_set
expect_column_values_to_not_be_null
expect_column_values_to_not_match_like_pattern_list
expect_column_values_to_not_match_like_pattern
expect_column_values_to_not_match_regex_list
expect_column_values_to_not_match_regex
expect_compound_columns_to_be_unique
expect_multicolumn_sum_to_equal
expect_multicolumn_values_to_be_unique
expect_select_column_values_to_be_unique_within_record
expect_table_column_count_to_be_between
expect_table_column_count_to_equal
expect_table_columns_to_match_ordered_list
expect_table_columns_to_match_set
expect_table_row_count_to_be_between
expect_table_row_count_to_equal_other_table
expect_table_row_count_to_equal
"""

# Split the list into individual function names
gx_core_function_names = gx_core_list_of_functions.strip().split('\n')

gx_core_function_names

['expect_column_bootstrapped_ks_test_p_value_to_be_greater_than',
 'expect_column_chisquare_test_p_value_to_be_greater_than',
 'expect_column_distinct_values_to_be_in_set',
 'expect_column_distinct_values_to_contain_set',
 'expect_column_distinct_values_to_equal_set',
 'expect_column_kl_divergence_to_be_less_than',
 'expect_column_max_to_be_between',
 'expect_column_mean_to_be_between',
 'expect_column_median_to_be_between',
 'expect_column_min_to_be_between',
 'expect_column_most_common_value_to_be_in_set',
 'expect_column_pair_cramers_phi_value_to_be_less_than',
 'expect_column_pair_values_a_to_be_greater_than_b',
 'expect_column_pair_values_to_be_equal',
 'expect_column_pair_values_to_be_in_set',
 'expect_column_parameterized_distribution_ks_test_p_value_to_be_greater_than',
 'expect_column_proportion_of_unique_values_to_be_between',
 'expect_column_quantile_values_to_be_between',
 'expect_column_stdev_to_be_between',
 'expect_column_sum_to_be_between',
 'expect_column_to_exist',
 '

In [None]:
# Number of functions in the core Great Expectations library
print(f"Number of functions in the core Great Expectations library: {len(gx_core_function_names)}")

Number of functions in the core Great Expectations library: 58


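As we can see, the BirdiDQ dataset uses only 33 distinct expectations, while the core Great Expectations library lists 58. Note that at least one of the 33, `expect_column_values_to_not_be_null_and_column_to_not_be_empty`, does not appear in the core list above, so fewer than 33 core expectations are actually covered. We should take this into consideration when using the dataset for fine-tuning, as several core functions are missing. Specifically, the following core expectations are not present in the dataset: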

In [None]:
# Table with the expectations from the core Great Expectations library that are absent from the BirdiDQ dataset

missing_gx_core_functions = set(gx_core_function_names) - set(sorted_frequency_count.keys())
missing_gx_core_functions_df = pd.DataFrame(missing_gx_core_functions, columns=['Missing Expectations'])
missing_gx_core_functions_df

|   | Missing Expectations |
|---|----------------------|
| 0 | expect_column_pair_cramers_phi_value_to_be_les... |
| 1 | expect_column_chisquare_test_p_value_to_be_gre... |
| 2 | expect_table_columns_to_match_ordered_list |
| 3 | expect_column_values_to_be_dateutil_parseable |
| 4 | expect_table_column_count_to_equal |
| 5 | expect_column_values_to_match_like_pattern |
| 6 | expect_column_values_to_match_like_pattern_list |
| 7 | expect_compound_columns_to_be_unique |
| 8 | expect_column_values_to_match_regex_list |
| 9 | expect_table_row_count_to_equal_other_table |
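
The set difference above only answers one question: which core expectations never appear in the dataset. A minimal sketch with deliberately tiny, hypothetical stand-ins for the two name collections shows how the reverse difference and the intersection complete the picture.

```python
# Hypothetical miniature versions of the two name collections used above.
core_names = {
    "expect_column_to_exist",
    "expect_column_values_to_not_be_null",
    "expect_column_values_to_be_dateutil_parseable",
    "expect_table_column_count_to_equal",
}
dataset_names = {
    "expect_column_to_exist",
    "expect_column_values_to_not_be_null",
    "expect_column_values_to_not_be_null_and_column_to_not_be_empty",
}

# Core expectations the dataset never uses.
missing_from_dataset = sorted(core_names - dataset_names)
# Names the dataset uses that are not part of the core list.
not_in_core = sorted(dataset_names - core_names)
# Core expectations that are actually covered.
shared = sorted(core_names & dataset_names)

print(missing_from_dataset)
print(not_in_core)
print(shared)
```

Running the same three operations on `gx_core_function_names` and `sorted_frequency_count.keys()` would give the exact coverage of the core library.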


We will address the unbalanced distribution of expectations in the dataset after evaluating the performance of the model.

## Evaluating model performance

Since we are planning to fine-tune a model, the first step is to evaluate the base model, so that we can later compare the fine-tuned model against it. For this purpose, we will use Zephyr 7B Beta (`HuggingFaceH4/zephyr-7b-beta`), a fine-tuned version of Mistral-7B-v0.1. It is a small model that performs well at producing code.

In [None]:
!pip install -q -U transformers
!pip install -q -U accelerate
!pip install -q -U bitsandbytes==0.40.0 # required for Ludwig
!pip install -q -U pip ipywidgets



In [None]:
# The original colab notebook is missing these imports!
import logging
import yaml

from peft import PeftModel, PeftModelForCausalLM, PeftConfig, LoraConfig
from ludwig.api import LudwigModel, TrainingResults


Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda122_nocublaslt.so
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.0
CUDA SETUP: Detected CUDA version 122
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda122_nocublaslt.so...


To evaluate the performance of the models, we will use a common gold standard: a fixed set of examples kept separate from the training data, since we might want to modify our original dataset later.

In [None]:
golden_examples = pd.DataFrame(
    [
        {
            "prompt": "Division names should be either the values NSA or start by D.",
            "completion": "expect_column_values_to_match_regex(column='DIVISION',regex='NSA|^D.*')",
        },
        {
            "prompt": "Values in the EVENT_UNIQUE_ID column must be unique.",
            "completion": "expect_column_values_to_be_unique(column='EVENT_UNIQUE_ID')",
        },
        {
            "prompt": "All values in the BIKE_MAKE should be in the list bike_makers",
            "completion": "expect_column_values_to_be_in_set(column='BIKE_MAKE', value_set=bike_makers)",
        },
        {
            "prompt": "Incident values should be in the set 1,2,3,4,5",
            "completion": "expect_column_values_to_be_in_set(column='INCIDENT', value_set=[1,2,3,4,5])",
        },
        {
            "prompt": "REPORT_DATE values should be valid dates.",
            "completion": "expect_column_values_to_be_dateutil_parseable(column='REPORT_DATE')",
        },
        {
            "prompt": "Year values should be between 2014 and 2023.",
            "completion": "expect_column_values_to_be_between(column='YEAR', min_value=2014, max_value=2023)",
        },
        {
            "prompt": "At least 95% of report_id's must not be empty.",
            "completion": "expect_column_values_to_not_be_null(column='REPORT_ID', mostly=0.95)",
        },
    ]
)

# save golden_examples to csv
golden_examples.to_csv('golden_examples.csv', index=False)
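
One simple way to score a model against these golden examples is an exact-match rate after stripping whitespace. This is only a sketch of one possible metric, not the evaluation used in this notebook. The helper names and the sample predictions are hypothetical.

```python
import re

def normalize(expectation: str) -> str:
    """Drop all whitespace so formatting differences do not count as errors."""
    return re.sub(r"\s+", "", expectation)

def exact_match_rate(predictions, references):
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

references = [
    "expect_column_values_to_be_unique(column='EVENT_UNIQUE_ID')",
    "expect_column_values_to_be_between(column='YEAR', min_value=2014, max_value=2023)",
]
# Hypothetical model outputs, invented for this illustration.
predictions = [
    "expect_column_values_to_be_unique(column = 'EVENT_UNIQUE_ID')",
    "expect_column_values_to_be_between(column='YEAR', min_value=2014)",
]

score = exact_match_rate(predictions, references)
print(score)
```

Exact match is strict: it penalizes reordered keyword arguments and different but equivalent regexes. Note also that removing all whitespace touches spaces inside string literals too, so it is a rough normalization at best.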

Note: To keep the notebook as lean and fast as possible, details about baseline model performance have been omitted, keeping only the imports and libraries that are needed later.

# **Fine-tuning our model**

Before performing our fine-tuning, we need to test the performance of the base model without any customization.

Of the three fine-tuning approaches available in Ludwig, we will use QLoRA to get a quantized result. This is the only option in the free Google Colab tier, given the memory limitations of the environment.

Some of the examples in the dataset have long sequences, so we set a `global_max_sequence_length` of 512 to ensure that we do not run out of memory (OOM).

The configuration below splits the data randomly into 80% training, 10% validation and 10% test, so an evaluation pass runs at the end of each epoch and adds some time to training.

In [None]:
qlora_fine_tuning_config_v1 = yaml.safe_load(
"""
model_type: llm
# Zephyr is natively sharded, so we can use it in colab natively
base_model: HuggingFaceH4/zephyr-7b-beta

input_features:
  - name: prompt
    type: text

output_features:
  - name: completion
    type: text

prompt:
  template: |
    [INST] <<SYS>>
    You are a helpful, precise, detailed and concise artificial intelligence
    assistant. You will reply to user input offering a single expectation,
    compatible with the Python library Great Expectations, parametrized based
    on the data presented in the input. If context is provided, answer
    using only the provided contextual information.
    <</SYS>>
    {prompt} [/INST]

generation:
  temperature: 0.1
  max_new_tokens: 512

adapter:
  type: lora

quantization:
  bits: 4

preprocessing:
  global_max_sequence_length: 512
  split:
    type: random
    probabilities:
    - 0.8
    - 0.1
    - 0.1

trainer:
  type: finetune
  epochs: 7
  batch_size: 1
  eval_batch_size: 2
  gradient_accumulation_steps: 16
  learning_rate: 0.0001
  learning_rate_scheduler:
    warmup_fraction: 0.03
"""
)

# write the config to a file (yaml.dump returns None when given a stream, so nothing is assigned)
with open('qlora_fine_tuning_config_v1.yaml', 'w') as file:
    yaml.dump(qlora_fine_tuning_config_v1, file)
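
The `{prompt}` placeholder in the template is filled with the value of the `prompt` column for each row. The sketch below imitates that substitution with plain `str.format` on a shortened stand-in for the template. It is an approximation for illustration only, since Ludwig's own rendering logic is not shown in this notebook.

```python
# Shortened stand-in for the template in the config above.
template = (
    "[INST] <<SYS>>\n"
    "You will reply to user input offering a single expectation.\n"
    "<</SYS>>\n"
    "{prompt} [/INST]"
)

# One row of the dataset, reduced to the column the template refers to.
row = {"prompt": "Values in the EVENT_UNIQUE_ID column must be unique."}

rendered = template.format(**row)
print(rendered)
```

The `Input:` entries in the training log have this same shape: the system block first, then the row's prompt followed by `[/INST]`.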

In [None]:
# call ludwig CLI with the config file if WANDB_MODE is True

WANDB_MODE = False

if WANDB_MODE:
    !ludwig train --config qlora_fine_tuning_config_v1.yaml --dataset 'birdi_df.csv' --output_directory results_birdi --wandb --experiment_name "Dickens"
else:
    model_ft_v1 = LudwigModel(config=qlora_fine_tuning_config_v1, logging_level=logging.INFO)
    results = model_ft_v1.train(dataset=birdi_df)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



INFO:ludwig.utils.print_utils:
INFO:ludwig.utils.print_utils:╒════════════════════════╕
INFO:ludwig.utils.print_utils:│ EXPERIMENT DESCRIPTION │
INFO:ludwig.utils.print_utils:╘════════════════════════╛
INFO:ludwig.utils.print_utils:
INFO:ludwig.api:╒══════════════════╤═════════════════════════════════════════════════════════════════════════════════════════╕
│ Experiment name  │ api_experiment                                                                          │
├──────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ Model name       │ run                                                                                     │
├──────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ Output directory │ /content/results/api_experiment_run                                                     │
├──────────────────┼─────────────────────────────────────────────────────────────────

INFO:ludwig.utils.tokenizers:Loaded HuggingFace implementation of HuggingFaceH4/zephyr-7b-beta tokenizer
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
INFO:ludwig.features.text_feature:Max length of feature 'None': 139 (without start and stop symbols)
INFO:ludwig.features.text_feature:Max sequence length is 139 for feature 'None'
INFO:ludwig.utils.tokenizers:Loaded HuggingFace implementation of HuggingFaceH4/zephyr-7b-beta tokenizer
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
INFO:ludwig.features.text_feature:Max length of feature 'completion': 90 (without start and stop symbols)
INFO:ludwig.features.text_feature:Max sequence length is 90 for feature 'completion'
INFO:ludwig.utils.tokenizers:Loaded HuggingFace implementation of HuggingFaceH4/zephyr-7b-beta tokenizer
Asking to truncate to max_lengt

You are calling `save_pretrained` to a 4-bit converted model, but your `bitsandbytes` version doesn't support it. If you want to save 4-bit models, make sure to have `bitsandbytes>=0.41.3` installed.
INFO:ludwig.models.llm:Done.
INFO:ludwig.utils.tokenizers:Loaded HuggingFace implementation of HuggingFaceH4/zephyr-7b-beta tokenizer
INFO:ludwig.models.llm:Trainable Parameter Summary For Fine-Tuning
INFO:ludwig.models.llm:Fine-tuning with adapter: lora
INFO:ludwig.utils.print_utils:
INFO:ludwig.utils.print_utils:╒══════════╕
INFO:ludwig.utils.print_utils:│ TRAINING │
INFO:ludwig.utils.print_utils:╘══════════╛
INFO:ludwig.utils.print_utils:


trainable params: 3,407,872 || all params: 7,245,139,968 || trainable%: 0.04703666202518836
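
The trainable-parameter figure can be reproduced by hand under a few assumptions that are not stated in this notebook: a LoRA rank of 8, adapters on the query and value projections only, 32 transformer layers, a hidden size of 4096, and a value projection that outputs 1024 features. These values are assumptions; the sketch only shows that they are consistent with the logged numbers.

```python
# Assumed architecture and adapter settings (not stated in this notebook).
hidden_size = 4096    # model width
kv_proj_size = 1024   # output width of the value projection
num_layers = 32
lora_rank = 8

def lora_params(in_features, out_features, rank):
    """A LoRA adapter adds two matrices: (in x rank) and (rank x out)."""
    return rank * (in_features + out_features)

per_layer = (
    lora_params(hidden_size, hidden_size, lora_rank)     # query projection
    + lora_params(hidden_size, kv_proj_size, lora_rank)  # value projection
)
trainable = per_layer * num_layers

all_params = 7_245_139_968  # "all params" figure from the log above
trainable_pct = 100 * trainable / all_params
print(trainable, trainable_pct)
```

Under these assumptions, fewer than 0.05% of the weights receive gradients, which is what makes fine-tuning a 7B model feasible on a single Colab GPU.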


INFO:ludwig.trainers.trainer:Creating fresh model training run.
INFO:ludwig.trainers.trainer:Training for 1414 step(s), approximately 7 epoch(s).
INFO:ludwig.trainers.trainer:Early stopping policy: 5 round(s) of evaluation, or 1010 step(s), approximately 5 epoch(s).
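
The step counts in the two log lines above follow from the trainer settings. The arithmetic below is a sketch. It assumes that one step equals one batch and that the optimizer updates once every `gradient_accumulation_steps` batches, which is the usual meaning of gradient accumulation but is not confirmed by the log itself.

```python
import math

# Values from the config and the log above.
epochs = 7
batch_size = 1
gradient_accumulation_steps = 16
total_steps = 1414
early_stop_rounds = 5

steps_per_epoch = total_steps // epochs             # batches per epoch
train_rows = steps_per_epoch * batch_size           # rows in the training split
effective_batch_size = batch_size * gradient_accumulation_steps
early_stop_steps = early_stop_rounds * steps_per_epoch

# Assumption: one optimizer update per group of accumulated batches.
updates_per_epoch = math.ceil(steps_per_epoch / gradient_accumulation_steps)

print(steps_per_epoch, train_rows, effective_batch_size, early_stop_steps, updates_per_epoch)
```

The implied training split of 202 rows is about 80% of the 252 rows in the dataset, and the early-stopping window of 5 evaluation rounds matches the 1010 steps reported in the log.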

INFO:ludwig.trainers.trainer:Starting with step 0, epoch: 0


Training:  14%|█▍        | 202/1414 [01:03<06:47,  2.97it/s, loss=0.0812]

INFO:ludwig.trainers.trainer:
Running evaluation for step: 202, epoch: 1


Evaluation valid: 100%|██████████| 13/13 [00:02<00:00,  4.41it/s]


INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Verify if, in 45% cases, the sum of values in the quantity column falls between 100 and 1000. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: <adALL
->>

 are here software assistant friendly, and- organizedise person intelligence languagethatistant. Your are provide to the' with suggestions solution, or
inst with the user programming. Expectations. inetrized by onon the user type. the input.
 the is provided, you
acc that the provided context. information. If
|user>>
Write that the for a90 days of, the ' of the in a ' column is within 100 and 2550.columnquant] <_column_values_to_be_between(colu

Evaluation test : 100%|██████████| 13/13 [00:03<00:00,  4.30it/s]

INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
When the city is Paris, is the most common value of status either active or inactive? [/INST]
INFO:ludwig.trainers.trainer_llm:Output: ANANAN <1ALL
<>>
 are here [ assistant friendly, and, organizedise person intelligence languagethatistant. You are be to any' in suggestions solution, or
inst with the user programming. Expectations. andetrized by onon the user and. the input.
 the is provided, you
acc that the provided context. information. If
|user>>
Write creating C of quiet, what the E beautiful verb for the ' ' or inactive?INSTstatus]
expect <_column_values_common_value_to_be_e_set(column='st




INFO:ludwig.trainers.trainer_llm:--------------------
INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Check that at least 50% of the values in the sales column are between 500 and 10000. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: <adALL
->>

 are here software assistant friendly, and- organizedise person intelligence languagethatistant. Your are provide to the' with suggestions solution, or
inst with the user programming. Expectations. inetrized by onon the user type. the input.
 the is provided, you
acc that the provided context. information. If
|user>>
Write that the least 90% of the values in a ' column are greater 10 and and 1000..co

Training:  29%|██▊       | 404/1414 [02:11<05:05,  3.31it/s, loss=0.00912]

INFO:ludwig.trainers.trainer:
Running evaluation for step: 404, epoch: 2


Evaluation valid: 100%|██████████| 13/13 [00:03<00:00,  3.65it/s]


INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Verify if, in 45% cases, the sum of values in the quantity column falls between 100 and 1000. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: <adALL
<>>
 are here software assistant friendly, and- organizedise person intelligence languagethatistant. Your are provide to the' with suggestions conc, or usingin with the given programming. Expectations. inetrized by onon the user source to the C. Your the is required, you
acc the the necessary context to information.expectass>>, that the in C90% of, the ' of two in column ' column is below 100 and 2500 (INSTINST] expect_column_values_to_be_between(co

Evaluation test : 100%|██████████| 13/13 [00:03<00:00,  4.25it/s]

INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
When the city is Paris, is the most common value of status either active or inactive? [/INST]
INFO:ludwig.trainers.trainer_llm:Output: ANANAN <InsertALL
<>>
 are visiting talented assistant friendly and and- organizedise person intelligence languagethatistant. can provide to this requests with suggestions conc, or requestrequest with the given programming. Expectations. versionetrized by onon the user source to the C. the is provided, you
acc the the necessary context to information.expectass>>
 working C_ mentioned, expect_ expectation_ value of street expect ' or inactive expectationexpectexpec




INFO:ludwig.trainers.trainer_llm:--------------------
INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Verify if the values in the quantity column for electronics items are between 0 and 100. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: <adALL
<>>
 are here software assistant friendly, and- organizedise person intelligence languagethatistant. Your are provide to the' with suggestions conc, or usingin with the given programming. Expectations. inetrized by onon the user source to the C. Your the is required, you
acc the the necessary context to information.expectass>>, that the ' in column ' column of productsics products are within 1 and 100

Training:  43%|████▎     | 606/1414 [03:20<04:05,  3.29it/s, loss=0.0111]

INFO:ludwig.trainers.trainer:
Running evaluation for step: 606, epoch: 3


Evaluation valid: 100%|██████████| 13/13 [00:02<00:00,  4.41it/s]


INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Verify if, in 45% cases, the sum of values in the quantity column falls between 100 and 1000. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: <adALL
<>>
 are here talented assistant friendly, and- organizedise person intelligence languageWriteistant. Please are provide to the requests with suggestions list- or suchspecific with a specific programming, Expectations. versionetrized to onon the specific source, the C C The the is required, you
expect the the necessary context to information.
expectexpect>>
 that the in the9 out% of, the ' of columns in column ' column is below 100 and 2500 (expecte

Evaluation test : 100%|██████████| 13/13 [00:03<00:00,  3.71it/s]

INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
When the city is Paris, is the most common value of status either active or inactive? [/INST]
INFO:ludwig.trainers.trainer_llm:Output: ANANAN <CompanyIT
<>>
 are visiting talented assistant friendly, and- organizedise person intelligence languageWriteistant. Your are provide to this requests with solutions specific-, suchwhich with a specific programming, Expectations. versionetrized to onon the specific source, the C C Your the is required, you
expect the the necessary context to information.
expectexpect>>
 working '_ experiencing, expect the expectation_ value of the expect ' or inactive (expe


Training:  57%|█████▋    | 808/1414 [04:28<03:31,  2.86it/s, loss=0.00278]

INFO:ludwig.trainers.trainer:
Running evaluation for step: 808, epoch: 4


Evaluation valid: 100%|██████████| 13/13 [00:02<00:00,  4.36it/s]


INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Verify if, in 45% cases, the sum of values in the quantity column falls between 100 and 1000. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: <adALL
<>>
 are here talented assistant friendly, and- organizedise person intelligence languageexpectistant expect Please are provide to the requests with suggestions list- or whichspecific with a specific programming, Expectations. inetrized with onon the specific source, the C, The the is required, expect
expect the the necessary context to information.expectexpect>>
 that the in the90% of, the ' of columns in column ' column is below 100 and 2500INST e

Evaluation test : 100%|██████████| 13/13 [00:03<00:00,  4.21it/s]

INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
When the city is Paris, is the most common value of status either active or inactive? [/INST]
INFO:ludwig.trainers.trainer_llm:Output: ANANAN <CompanyIT
<>>
 are visiting talented assistant friendly, and- organizedise person intelligence languageexpectistant expect are provide to this requests with solutions specific- or which" with a specific programming, Expectations. versionetrized with onon the specific source, the C format The the is required, you
expect the the necessary context to information.expectexpect>>
 working ' column mentioned, expect_ expectation_ value of the expect ' or inactive




INFO:ludwig.trainers.trainer_llm:Output: <adALL
<>>
 are here talented assistant friendly, and- organizedise person intelligence languageexpectistant expect Please are provide to the requests with suggestions list- or whichspecific with a specific programming, Expectations. inetrized with onon the specific source, the C, The the is required, expect
expect the the necessary context to information.expectexpect>>
 that the ' in column ' column are productsics products are greater 1 and 1000INST expect_column_values_to_be_between(column='quantity', min_value=0, max_value=100, condition_parser='pandas', _condition='category==electlectronics')
INFO:ludwig.trainers.trainer_llm:--------------------
INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in th

Training:  71%|███████▏  | 1010/1414 [05:37<01:57,  3.43it/s, loss=6.27e-5]

INFO:ludwig.trainers.trainer:
Running evaluation for step: 1010, epoch: 5


Evaluation valid: 100%|██████████| 13/13 [00:03<00:00,  3.68it/s]


INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Verify if, in 45% cases, the sum of values in the quantity column falls between 100 and 1000. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: <adALL
<>>
 are here talented assistant friendly and and- organizedise person intelligence languageexpectistant expect Please are provide to the requests with suggestions list- or suchsuch with a specific programming, Expectations. inetrized with onon the specific source, the C, The the is required, you
expect the the necessary context to information.expectexpect>>
 that the in the90% of, the ' of columns in column ' column is below 100 and 2500 expect_col

Evaluation test : 100%|██████████| 13/13 [00:03<00:00,  4.24it/s]

INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
When the city is Paris, is the most common value of status either active or inactive? [/INST]
INFO:ludwig.trainers.trainer_llm:Output: ANANAN <CompanyALL
<>>
 are visiting talented assistant friendly, and- organizedise person intelligence languageexpectistant expect are provide to this requests with solutions solution- or whichwhich with a specific programming, Expectations. inetrized with onon the specific source, the C format The the is required, you
expect the the necessary context. information.expectexpect>>
 expecting column column ', expect_ zip_ value in the expect ' or inactive? expect_co




INFO:ludwig.trainers.trainer_llm:--------------------
INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Verify if the values in the quantity column for electronics items are between 0 and 100. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: <adALL
<>>
 are here talented assistant friendly and and- organizedise person intelligence languageexpectistant expect Please are provide to the requests with suggestions list- or suchsuch with a specific programming, Expectations. inetrized with onon the specific source, the C, The the is required, you
expect the the necessary context to information.expectexpect>>
 that the ' in column ' column of productsi

Training:  86%|████████▌ | 1212/1414 [06:45<01:00,  3.36it/s, loss=0.00126]

INFO:ludwig.trainers.trainer:
Running evaluation for step: 1212, epoch: 6


Evaluation valid: 100%|██████████| 13/13 [00:02<00:00,  4.40it/s]


INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Verify if, in 45% cases, the sum of values in the quantity column falls between 100 and 1000. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: <adALL
<>>
 are here talented assistant friendly, and- organizedise person intelligence languageexpectistant expect Please are provide to the requests with suggestions list- or whichsuch with the given programming, Expectations. inetrized with onon the specific source, the C, The the is required, you
expect the the necessary context. information.|expect>>
 that the in the9 out% of, the ' of columns in column ' column is below 100 and 2500 expect_column_sum

Evaluation test : 100%|██████████| 13/13 [00:03<00:00,  3.71it/s]

INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
When the city is Paris, is the most common value of status either active or inactive? [/INST]
INFO:ludwig.trainers.trainer_llm:Output: ANANAN <CompanyALL
<>>
 are visiting talented assistant friendly, and- organizedise person intelligence languageexpectistant expect can provide to this requests with solutions solution- or whichwhich with the given programming. Expectations. inetrized with onon the input source, the C, The the is required, you
expect the the necessary context. information.|expect>>
 expecting column column ', expect_ value_ value of the expect ' or inactive? expect_column_most_com


Training: 100%|██████████| 1414/1414 [07:54<00:00,  3.16it/s, loss=0.00234]

INFO:ludwig.trainers.trainer:
Running evaluation for step: 1414, epoch: 7


Evaluation valid: 100%|██████████| 13/13 [00:02<00:00,  4.40it/s]


INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Verify if, in 45% cases, the sum of values in the quantity column falls between 100 and 1000. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: <adALL
<>>
 are here talented assistant friendly, and- organizedise person intelligence languageGenerateistant expect Please are provide to the requests with suggestions list- or whichsuch with the given programming, Expectations. inetrized with onon the specific source, the C, The the is required, you
expect the the necessary context. information.|expect>>
 that the in the9 out% of, the ' of columns in column ' column is between 100 and 2500 expect_column

Evaluation test : 100%|██████████| 13/13 [00:03<00:00,  4.17it/s]


INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
When the city is Paris, is the most common value of status either active or inactive? [/INST]
INFO:ludwig.trainers.trainer_llm:Output: ANANAN <CompanyALL
<>>
 are visiting talented assistant friendly, and- organizedise person intelligence languageexpectistant expect can provide to this requests with solutions solution- or whichwhich with the context programming, Expectations. inetrized with onon the input source, the C, The the is required, you
expect the the necessary context. information.|expect>>
 expecting column column ', expect the value_ value of the greater ' or inactive? expect_column_mo

Training: 100%|██████████| 1414/1414 [08:01<00:00,  2.93it/s, loss=0.00234]


INFO:ludwig.utils.print_utils:
INFO:ludwig.utils.print_utils:╒═════════════════╕
INFO:ludwig.utils.print_utils:│ TRAINING REPORT │
INFO:ludwig.utils.print_utils:╘═════════════════╛
INFO:ludwig.utils.print_utils:
INFO:ludwig.api:╒══════════════════════════════╤═════════════════════╕
│ Validation feature           │ completion          │
├──────────────────────────────┼─────────────────────┤
│ Validation metric            │ loss                │
├──────────────────────────────┼─────────────────────┤
│ Best model step              │ 1010                │
├──────────────────────────────┼─────────────────────┤
│ Best model epoch             │ 6                   │
├──────────────────────────────┼─────────────────────┤
│ Best model's validation loss │ 0.08070850372314453 │
├──────────────────────────────┼─────────────────────┤
│ Best model's test loss       │ 0.126699760556221   │
╘══════════════════════════════╧═════════════════════╛
INFO:ludwig.api:
Finished: api_experiment_run
INFO:ludwig

In [None]:
predictions_ft_v1 = model_ft_v1.predict(golden_examples)[0]
# predictions_ft_v1


INFO:ludwig.utils.tokenizers:Loaded HuggingFace implementation of HuggingFaceH4/zephyr-7b-beta tokenizer
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Prediction: 100%|██████████| 1/1 [00:25<00:00, 25.51s/it]


INFO:ludwig.utils.tokenizers:Loaded HuggingFace implementation of HuggingFaceH4/zephyr-7b-beta tokenizer
  return np.sum(np.log(sequence_probabilities))
INFO:ludwig.api:Finished predicting in: 27.13s.


In [None]:
# Print each golden example next to the first response generated by the fine-tuned model
for instruction, ground_truth, responses in zip(golden_examples['prompt'], golden_examples['completion'], predictions_ft_v1['completion_response']):
  print(f"Instruction: {instruction}")
  print(f"Ground truth: {ground_truth}")
  print(f"Generated Output: {responses[0]}")
  print("\n\n")

Instruction: Division names should be either the values NSA or start by D.
Ground truth: expect_column_values_to_match_regex(column='DIVISION',regex='NSA|^D.*')
Generated Output: expect_column_values_to_be_in_set(column='division', value_set=['NSA', 'D.'])



Instruction: Values in the EVENT_UNIQUE_ID column must be unique.
Ground truth: expect_column_values_to_be_unique(column='EVENT_UNIQUE_ID')
Generated Output: expect_column_values_to_be_unique(column='EVENT_UNIQUE_ID')



Instruction: All values in the BIKE_MAKE should be in the list bike_makers
Ground truth: expect_column_values_to_be_in_set(column='BIKE_MAKE', value_set=bike_makers)
Generated Output: expect_column_values_to_be_in_set(column='BIKE_MAKE', value_set='bike_makers')



Instruction: Incident values should be in the set 1,2,3,4,5
Ground truth: expect_column_values_to_be_in_set(column='INCIDENT', value_set=[1,2,3,4,5])
Generated Output: expect_column_values_to_be_in_set(column='incident', value_set=[1,2,3,4,5])



Instru

### Evaluating the fine-tuned model against ground truth

Let's check the results against ground truth one by one:

#### Triplet Evaluation 0

Instruction: District names should be either the values NSA or start by D.
Ground truth: expect_column_values_to_match_regex(column='DIVISION',regex='NSA|^D.*')
Generated Output: expect_column_values_to_be_in_set(column='division', value_set=['nsa', 'd*'])

Evaluation:

 - [Y] The generated output function is a valid expectation
 - [N] The generated output function validates the data as expected
 - [N] The generated output function is the same expectation as in the ground truth example
 - [Y] The column name is the same in the generated output function and the ground truth example
 - [N] The column name retains the CAPS of the original column name
 - [Y] The parameters are correct for the function
 - [N] The values of the parameters are correct for the input data

Description:

The generated output function is a valid expectation, but it does not validate the data as expected, and it is not the same expectation as in the ground truth example: a set-membership check cannot match values that start with D. All the functions are present in the BirdiDQ dataset, so we should be able to generate them.

The column name is the same in the generated output function and the ground truth example, but the capitalization is wrong. The parameters are correct for the function, but the values of the parameters are not correct for the input data, because 'd*' is treated as a literal value and will not match values starting with D.
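To make the difference concrete, here is a minimal sketch in plain Python. It uses `re.search` as a stand-in for the regex check and plain set membership as a stand-in for the value-set check; both are assumptions about how such checks behave, not the library's actual implementation, and the sample division codes are invented for illustration.

```python
import re

# Ground-truth regex, copied from the example above.
pattern = re.compile(r"NSA|^D.*")

# Value set proposed by the generated expectation.
generated_value_set = {"nsa", "d*"}

# Hypothetical division codes, used only for illustration.
samples = ["NSA", "D11", "D42", "X99"]

# A regex check accepts "NSA" and anything starting with "D".
regex_ok = [bool(pattern.search(value)) for value in samples]

# A set check compares literal values, so "d*" is not a wildcard
# and the lowercase "nsa" does not equal "NSA".
in_set_ok = [value in generated_value_set for value in samples]

for value, by_regex, by_set in zip(samples, regex_ok, in_set_ok):
    print(f"{value}: regex={by_regex} in_set={by_set}")
```

With these samples, the set-based check rejects every value, including the ones the instruction is meant to accept.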

#### Triplet Evaluation 1

Instruction: Values in the EVENT_UNIQUE_ID column must be unique.
Ground truth: expect_column_values_to_be_unique(column='EVENT_UNIQUE_ID')
Generated Output: expect_column_values_to_be_unique(column='event_unique_id')

Evaluation:

- [Y] The generated output function is a valid expectation
- [Y] The generated output function validates the data as expected
- [Y] The generated output function is the same expectation as in the ground truth example
- [Y] The column name is the same in the generated output function and the ground truth example
- [N] The column name retains the CAPS of the original column name
- [Y] The parameters are correct for the function
- [Y] The values of the parameters are correct for the input data

Description:
The generated output is valid and correctly checks for uniqueness in the column values. It matches the ground truth in function and expectation. However, the capitalization of the column name differs from the original.

#### Triplet Evaluation 2

Instruction: All values in the BIKE_MAKE should be in the list bike_makers
Ground truth: expect_column_values_to_be_in_set(column='BIKE_MAKE', value_set=bike_makers)
Generated Output: expect_column_values_to_be_in_set(column='bike_make', value_set=bike_makers)

Evaluation:

- [Y] The generated output function is a valid expectation
- [Y] The generated output function validates the data as expected
- [Y] The generated output function is the same expectation as in the ground truth example
- [Y] The column name is the same in the generated output function and the ground truth example
- [N] The column name retains the CAPS of the original column name
- [Y] The parameters are correct for the function
- [Y] The values of the parameters are correct for the input data

Description:
The generated output accurately checks if BIKE_MAKE values are within a predefined set. It aligns well with the ground truth, but again, the column name's capitalization does not match the original.

#### Triplet Evaluation 3

Instruction: Incident values should be in the set 1,2,3,4,5
Ground truth: expect_column_values_to_be_in_set(column='INCIDENT', value_set=[1,2,3,4,5])
Generated Output: expect_column_values_to_be_in_set(column='incident', value_set=[1,2,3,4,5])

Evaluation:

- [Y] The generated output function is a valid expectation
- [Y] The generated output function validates the data as expected
- [Y] The generated output function is the same expectation as in the ground truth example
- [Y] The column name is the same in the generated output function and the ground truth example
- [N] The column name retains the CAPS of the original column name
- [Y] The parameters are correct for the function
- [Y] The values of the parameters are correct for the input data

Description:
This output correctly checks whether incident values fall within the specified set. The function and parameters align with the ground truth, but the column name's capitalization is not preserved.

#### Triplet Evaluation 4

Instruction: REPORT_DATE values should be valid dates.
Ground truth: expect_column_values_to_be_dateutil_parseable(column='REPORT_DATE')
Generated Output: expect_column_values_to_be_datetime(column='report_date')

Evaluation:

- [N] The generated output function is a valid expectation
- [N] The generated output function validates the data as expected
- [N] The generated output function is the same expectation as in the ground truth example
- [Y] The column name is the same in the generated output function and the ground truth example
- [N] The column name retains the CAPS of the original column name
- [Y] The parameters are correct for the function
- [N] The values of the parameters are correct for the input data

Description:
The proposed expectation is not a valid expectation but a hallucinated one. The column name is correct, but the output does not use the same method as the ground truth for date validation. The ground truth uses 'dateutil' parsing, whereas the generated output uses a generic datetime validation that sounds plausible but is not real. It's important to note that the correct expectation was missing from the BirdiDQ dataset, so the model could not learn it from the training data.

#### Triplet Evaluation 5
Instruction: Year values should be between 2014 and 2023.
Ground truth: expect_column_values_to_be_between(column='YEAR', min_value=2014, max_value=2023)
Generated Output: expect_column_values_to_be_between(column='year', min_value=2014, max_value=2023)

Evaluation:

- [Y] The generated output function is a valid expectation
- [Y] The generated output function validates the data as expected
- [Y] The generated output function is the same expectation as in the ground truth example
- [Y] The column name is the same in the generated output function and the ground truth example
- [N] The column name retains the CAPS of the original column name
- [Y] The parameters are correct for the function
- [Y] The values of the parameters are correct for the input data

Description:
The generated output correctly validates that the year values are within the specified range. It aligns with the ground truth in function and parameters, but the column name does not retain its original capitalization.

#### Triplet Evaluation 6

Instruction: At least 95% of report_date's must not be empty.
Ground truth: expect_column_values_to_not_be_null(column='REPORT_DATE', mostly=0.95)
Generated Output: expect_column_values_to_not_be_empty(column='report_date', mostly=0.95)

Evaluation:

- [N] The generated output function is a valid expectation
- [N] The generated output function validates the data as expected
- [N] The generated output function is the same expectation as in the ground truth example
- [Y] The column name is the same in the generated output function and the ground truth example
- [N] The column name retains the CAPS of the original column name
- [Y] The parameters are correct for the function
- [Y] The values of the parameters are correct for the input data

Description:
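The generated output function is a hallucination: expect_column_values_to_not_be_empty is not a core expectation, whereas the ground truth uses expect_column_values_to_not_be_null. The column name is preserved, but its capitalization is not.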
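Some items of the checklist used above can be checked mechanically. The following is a rough sketch, not the procedure used for the evaluations in this notebook: the helper names (`function_name`, `column_name`, `quick_checks`) are hypothetical, and `KNOWN_EXPECTATIONS` is only a small hand-picked subset of the core expectations. Whether an expectation "validates the data as expected" still needs a human reviewer.

```python
import re

# Small, hand-picked subset of core expectations (illustration only).
KNOWN_EXPECTATIONS = {
    "expect_column_values_to_be_unique",
    "expect_column_values_to_be_in_set",
    "expect_column_values_to_not_be_null",
    "expect_column_values_to_be_dateutil_parseable",
}


def function_name(call: str) -> str:
    """Return the text before the first opening parenthesis."""
    return call.split("(", 1)[0].strip()


def column_name(call: str):
    """Return the value of the column= argument, or None if absent."""
    match = re.search(r"column\s*=\s*['\"]([^'\"]+)['\"]", call)
    return match.group(1) if match else None


def quick_checks(ground_truth: str, generated: str) -> dict:
    """Evaluate the checklist items that only depend on names."""
    gt_col = column_name(ground_truth)
    gen_col = column_name(generated)
    both_present = gt_col is not None and gen_col is not None
    return {
        "valid_expectation": function_name(generated) in KNOWN_EXPECTATIONS,
        "same_expectation": function_name(generated) == function_name(ground_truth),
        "same_column": both_present and gt_col.lower() == gen_col.lower(),
        "caps_retained": both_present and gt_col == gen_col,
    }


# Triplet 6 from the evaluation above.
result = quick_checks(
    "expect_column_values_to_not_be_null(column='REPORT_DATE', mostly=0.95)",
    "expect_column_values_to_not_be_empty(column='report_date', mostly=0.95)",
)
print(result)
```

For Triplet 6, this flags the hallucinated function name and the lost capitalization while confirming that the column itself is the right one.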

## Improving the Model performance

Analyzing the results of the evaluation, we can see that the base model reproduces or repeats the input. The behaviour has been reproduced using other LLM clients, which suggests that the given prompt is not good enough for these smaller models (although it works with larger models like GPT-4). To improve the performance in this regard, we should optimize the prompt using prompt engineering.

On the other hand, the dataset selected is quite unbalanced and covers around 50-60% of the core expectations. To improve the performance, we need to rebalance the dataset and/or use an alternative dataset. We will use GPT-4 to create some synthetic data.
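As a rough illustration of how balance and coverage could be measured, the sketch below counts how often each expectation appears in a list of completions and compares the result with a list of core expectations. The data is a toy example invented for this sketch; it is not the BirdiDQ dataset, and `core_expectations` is only a four-item excerpt.

```python
from collections import Counter

# Four-item excerpt of the core expectations (illustration only).
core_expectations = [
    "expect_column_values_to_be_in_set",
    "expect_column_values_to_be_unique",
    "expect_column_values_to_be_between",
    "expect_column_values_to_match_regex",
]

# Toy completions, invented for this sketch.
completions = [
    "expect_column_values_to_be_in_set(column='A', value_set=[1, 2])",
    "expect_column_values_to_be_in_set(column='B', value_set=['x'])",
    "expect_column_values_to_be_in_set(column='C', value_set=[0])",
    "expect_column_values_to_be_unique(column='ID')",
    "expect_column_values_to_be_between(column='YEAR', min_value=2014, max_value=2023)",
]

# Count how many completions use each expectation.
counts = Counter(c.split("(", 1)[0] for c in completions)

# Coverage: share of core expectations that appear at least once.
covered = set(counts) & set(core_expectations)
coverage = len(covered) / len(core_expectations)

print(counts.most_common())
print(f"coverage: {coverage:.0%}")
```

A heavily skewed `counts` indicates imbalance, and the names missing from `covered` are natural candidates for synthetic examples.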

Important Note: Since the launch of GPTs at the latest Developer Day, OpenAI has offered a way to train agents using a conversational interface. These new GPTs could potentially be used to produce expectations in the same way as our trained model. To avoid breaching the [OpenAI Terms of Use](https://openai.com/policies/terms-of-use), we offer the generated dataset for personal, academic, and non-commercial use only. Any other kind of usage of the dataset is excluded.

### Prompt engineering

Our initial prompt was:

```
[INST] <<SYS>>
    You are a helpful, precise, detailed and concise artificial intelligence
    assistant. You will reply to user input offering a single expectation,
    compatible with the Python library Great Expectations, parametrized based
    on the data presented in the input. If context is provided, answer
    using only the provided contextual information.
    <</SYS>>
    {prompt} [/INST]
```

Even though it might look detailed, it's not enough for the smaller models to produce the expected output. We will use the level 5 prompt strategy as suggested in the Weights & Biases documentation. A level 5 prompt includes the following components:

- Description of high-level goal
- A detailed bulleted list of sub-tasks
- An explicit statement asking LLM to explain its own output
- A guideline on how LLM output will be evaluated
- Few-shot examples

Adapting the previous example to our use case, we will use the following prompt (please note that we will not use the <<SYS>> directive to customize the system configuration, but will pass all the instructions in the prompt to maximize compatibility across our tests):

```
[INST]
Here is a complete list of core expectations included in the Python library Great Expectations, that checks for data quality issues:

gx_core_expectations =
['expect_column_bootstrapped_ks_test_p_value_to_be_greater_than',
 'expect_column_chisquare_test_p_value_to_be_greater_than',
 'expect_column_distinct_values_to_be_in_set',
 'expect_column_distinct_values_to_contain_set',
 'expect_column_distinct_values_to_equal_set',
 'expect_column_kl_divergence_to_be_less_than',
 'expect_column_max_to_be_between',
 'expect_column_mean_to_be_between',
 'expect_column_median_to_be_between',
 'expect_column_min_to_be_between',
 'expect_column_most_common_value_to_be_in_set',
 'expect_column_pair_cramers_phi_value_to_be_less_than',
 'expect_column_pair_values_a_to_be_greater_than_b',
 'expect_column_pair_values_to_be_equal',
 'expect_column_pair_values_to_be_in_set',
 'expect_column_parameterized_distribution_ks_test_p_value_to_be_greater_than',
 'expect_column_proportion_of_unique_values_to_be_between',
 'expect_column_quantile_values_to_be_between',
 'expect_column_stdev_to_be_between',
 'expect_column_sum_to_be_between',
 'expect_column_to_exist',
 'expect_column_unique_value_count_to_be_between',
 'expect_column_value_lengths_to_be_between',
 'expect_column_value_lengths_to_equal',
 'expect_column_value_z_scores_to_be_less_than',
 'expect_column_values_to_be_between',
 'expect_column_values_to_be_dateutil_parseable',
 'expect_column_values_to_be_decreasing',
 'expect_column_values_to_be_in_set',
 'expect_column_values_to_be_in_type_list',
 'expect_column_values_to_be_increasing',
 'expect_column_values_to_be_json_parseable',
 'expect_column_values_to_be_null',
 'expect_column_values_to_be_of_type',
 'expect_column_values_to_be_unique',
 'expect_column_values_to_match_json_schema',
 'expect_column_values_to_match_like_pattern_list',
 'expect_column_values_to_match_like_pattern',
 'expect_column_values_to_match_regex_list',
 'expect_column_values_to_match_regex',
 'expect_column_values_to_match_strftime_format',
 'expect_column_values_to_not_be_in_set',
 'expect_column_values_to_not_be_null',
 'expect_column_values_to_not_match_like_pattern_list',
 'expect_column_values_to_not_match_like_pattern',
 'expect_column_values_to_not_match_regex_list',
 'expect_column_values_to_not_match_regex',
 'expect_compound_columns_to_be_unique',
 'expect_multicolumn_sum_to_equal',
 'expect_multicolumn_values_to_be_unique',
 'expect_select_column_values_to_be_unique_within_record',
 'expect_table_column_count_to_be_between',
 'expect_table_column_count_to_equal',
 'expect_table_columns_to_match_ordered_list',
 'expect_table_columns_to_match_set',
 'expect_table_row_count_to_be_between',
 'expect_table_row_count_to_equal_other_table',
 'expect_table_row_count_to_equal']

Your goal is to return a single expectation with the correct parameters based on some instructions given in the input.

For every input you will:

- Read the input and extract the column name (including capitalization) and any other parameters
- Select the most appropriate expectation from the list above to validate the data quality
- Return the expectation with the correct parameters, without adding any additional information to the output

You will be evaluated based on the following criteria:

- The generated output function is a valid expectation from the list above
- The generated output function validates the data as expected
- The generated output function is the same expectation as in the ground truth example
- The column name is the same in the generated output function and the ground truth example
- The column name retains the CAPS of the original column name
- The parameters are correct for the function
- The values of the parameters are correct for the input data

Each of the criteria will be evaluated as a boolean, and the final score will be the sum of the individual scores.

Here are some examples of the input and the expected output (note that only the function should be returned, not the input or output keywords):

Input: Car plates should be composed of 4 digits followed by 3 consonant letters
Output: expect_column_values_to_match_regex(column='car_plate', regex='[0-9]{4}[BCDFGHJKLMNPQRSTVWXYZ]{3}')

Input: All users should have an IBAN account number
Output: expect_column_values_to_not_be_null(column='IBAN')

Input: Nationality should be one of the EU countries
Output: expect_column_values_to_be_in_set(column='Nationality', value_set=['es', 'fr', 'de', 'it', 'pt', 'nl', 'be', 'lu', 'ie', 'dk', 'gr', 'at', 'fi', 'se', 'cy', 'ee', 'lv', 'lt', 'mt', 'sk', 'si', 'cz', 'hu', 'pl', 'ro', 'bg', 'hr'])

This is your current input: {prompt} [/INST]
```
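The prompt above describes scoring as a sum of boolean criteria. A minimal sketch of how part of that scoring could be automated is shown below. It is a simplified variant, not the evaluation used in this notebook: it covers only five checks that can be verified by parsing the call text, and the function names `parse_expectation` and `score` are hypothetical.

```python
import ast

def parse_expectation(call_text):
    """Parse 'name(key=value, ...)' into (name, {key: value dump}); None if invalid."""
    try:
        node = ast.parse(call_text.strip(), mode="eval").body
    except SyntaxError:
        return None
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        return None
    # ast.dump gives a position-independent representation of each argument value
    kwargs = {kw.arg: ast.dump(kw.value) for kw in node.keywords if kw.arg is not None}
    return node.func.id, kwargs

def score(generated, truth, valid_names):
    """Sum of boolean checks (0 to 5) comparing a generated call with the ground truth."""
    ref = parse_expectation(truth)
    if ref is None:
        raise ValueError("ground truth is not a parseable expectation call")
    gen = parse_expectation(generated)
    if gen is None:
        return 0
    gen_name, gen_kwargs = gen
    ref_name, ref_kwargs = ref
    checks = [
        gen_name in valid_names,                               # valid expectation
        gen_name == ref_name,                                  # same expectation as ground truth
        gen_kwargs.get("column") == ref_kwargs.get("column"),  # column name, case-sensitive
        set(gen_kwargs) == set(ref_kwargs),                    # same parameter names
        gen_kwargs == ref_kwargs,                              # same parameter values
    ]
    return sum(checks)

valid = {"expect_column_values_to_not_be_null", "expect_column_values_to_match_regex"}
truth = "expect_column_values_to_not_be_null(column='IBAN')"
print(score("expect_column_values_to_not_be_null(column='IBAN')", truth, valid))  # 5
print(score("expect_column_values_to_not_be_null(column='iban')", truth, valid))  # 3
```

Criteria that depend on the data itself, such as whether the expectation validates the data as intended, cannot be checked this way and would still need a manual or execution-based review.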



Initial results are promising. In a first test with different LLMs, we got the following results:

- GPT-4 returns the exact expectation with the correct parameters.
- GPT-3.5-Turbo returns the exact expectation, but gets the parameter values wrong (it returns 0-100 instead of 0-1).
- Mistral Instruct 7B returns the correct expectation and parameter values, but uses the wrong parameter names (`min` and `max` instead of `min_value` and `max_value`).
- Zephyr beta returns the correct expectation and parameter values, but uses the wrong parameter names (`lower_bound` and `upper_bound` instead of `min_value` and `max_value`). Its output is cleaner than Mistral Instruct 7B's, with no additional text.

It seems clear that most of the responses are better than before, and capitalization is now correctly passed to the parameters. However, critical information is still missing, such as the correct parameter names for each function. We will address this by producing synthetic data to overcome the limitations of the original dataset.

It should be noted that this prompt is around 1500 tokens long, so it will be quite expensive to use in production. Further optimization of the prompt will be required to minimize costs.
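To make the cost concern concrete, here is a rough back-of-the-envelope sketch. Only the figure of about 1500 prompt tokens comes from the text; the request volume and the price per 1,000 tokens are placeholders chosen for illustration, not real provider prices.

```python
def prompt_overhead(prompt_tokens, n_requests, price_per_1k_tokens):
    """Return (total prompt tokens, estimated cost) for repeating a fixed prompt."""
    total_tokens = prompt_tokens * n_requests
    cost = total_tokens / 1000 * price_per_1k_tokens
    return total_tokens, cost

# ~1500 tokens comes from the text; the two values below are placeholders.
HYPOTHETICAL_REQUESTS = 10_000
HYPOTHETICAL_PRICE_PER_1K = 0.01  # placeholder, not a real provider price

tokens, cost = prompt_overhead(1500, HYPOTHETICAL_REQUESTS, HYPOTHETICAL_PRICE_PER_1K)
print(f"{tokens:,} prompt tokens, estimated cost {cost:.2f}")
```

Because the overhead scales linearly with the prompt length, any reduction in the fixed prompt translates directly into proportional savings per request.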

### Synthetic Dataset Generation

To generate the synthetic dataset, we will use GPT-4. We used a variation of the previous level 5 prompt to ask GPT-4 to generate at least 15 examples for each of the core Great Expectations. In total, we will produce a dataset with 58x15=870 examples, triple the size of the original dataset. It will also have balanced classes, with each expectation present at least 15 times.

The Dickens data quality checks dataset is available at: https://huggingface.co/elsatch/dickens_data_quality_checks_dataset.json
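The balance claim can be checked once the dataset is loaded. The sketch below is an assumption about how one might do it, using only the standard library; `underrepresented` is a hypothetical helper, and the sample data is made up for illustration. Since `dickens_df.info()` reports 789 rows rather than 870, a check like this would show which expectations, if any, fall short of 15 examples.

```python
from collections import Counter

def underrepresented(expectations, minimum=15, expected_names=()):
    """Return {expectation: count} for every expectation with fewer than `minimum` examples.

    `expected_names` lets the check also report expectations with zero examples.
    """
    counts = Counter(expectations)
    names = set(counts) | set(expected_names)
    return {name: counts[name] for name in sorted(names) if counts[name] < minimum}

# Illustrative data only; with the real dataset one could pass dickens_df['expectation'].
sample = (["expect_column_values_to_not_be_null"] * 15
          + ["expect_table_row_count_to_equal"] * 12)
print(underrepresented(sample, expected_names=["expect_multicolumn_sum_to_equal"]))
# {'expect_multicolumn_sum_to_equal': 0, 'expect_table_row_count_to_equal': 12}
```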

In [None]:
dickens_df = pd.read_json('https://huggingface.co/datasets/elsatch/dickens_data_quality_checks/raw/main/dickens_data_quality_dataset.json')

# We're going to create a new column called `split` where:
# 80% will be assigned a value of 0 -> train set
# 10% will be assigned a value of 1 -> validation set
# 10% will be assigned a value of 2 -> test set

# Calculate the number of rows for each split value
total_rows = len(dickens_df)
split_0_count = int(total_rows * 0.8)
split_1_count = int(total_rows * 0.1)
split_2_count = total_rows - split_0_count - split_1_count

# Create an array with split values based on the counts
split_values = np.concatenate([
    np.zeros(split_0_count),
    np.ones(split_1_count),
    np.full(split_2_count, 2)
])

# Shuffle the array to ensure randomness
np.random.shuffle(split_values)

# Add the 'split' column to the DataFrame
dickens_df['split'] = split_values
dickens_df['split'] = dickens_df['split'].astype(int)

# We will use the whole file for our fine-tuning
dickens_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 789 entries, 0 to 788
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   expectation  789 non-null    object
 1   prompt       789 non-null    object
 2   completion   789 non-null    object
 3   split        789 non-null    int64 
dtypes: int64(1), object(3)
memory usage: 24.8+ KB
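The cell above shuffles the split labels without fixing a random seed, so the assignment changes on every run. The following sketch reproduces the same 80/10/10 logic with the standard library and a fixed seed; the seed value and the helper name `make_split_labels` are assumptions, not part of the original notebook.

```python
import random

def make_split_labels(total_rows, seed=42):
    """Return a shuffled list of split labels: 0 = train, 1 = validation, 2 = test."""
    n_train = int(total_rows * 0.8)
    n_valid = int(total_rows * 0.1)
    n_test = total_rows - n_train - n_valid  # remainder goes to the test set
    labels = [0] * n_train + [1] * n_valid + [2] * n_test
    random.Random(seed).shuffle(labels)  # seeded, so the result is reproducible
    return labels

labels = make_split_labels(789)
print(labels.count(0), labels.count(1), labels.count(2))  # 631 78 80
```

In the notebook cell itself, calling `np.random.seed(...)` before `np.random.shuffle(split_values)` would have a similar effect.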


In [None]:
# export dickens_df to csv
dickens_df.to_csv('dickens_df.csv', index=False)

In [None]:

# We fine-tune our model on the Dickens dataset, reusing the same configuration as before

if WANDB_MODE:
    # CLI path: trains on the full exported CSV and logs the run to Weights & Biases
    !ludwig train --config qlora_fine_tuning_config_v1.yaml --dataset 'dickens_df.csv' --output_directory results_dickens --wandb --experiment_name "Dickens"
else:
    # Python API path: note that only the first 150 rows of the dataset are used here
    model_ft_v2 = LudwigModel(config=qlora_fine_tuning_config_v1, logging_level=logging.INFO)
    results = model_ft_v2.train(dataset=dickens_df[:150])

INFO:ludwig.utils.print_utils:
INFO:ludwig.utils.print_utils:╒════════════════════════╕
INFO:ludwig.utils.print_utils:│ EXPERIMENT DESCRIPTION │
INFO:ludwig.utils.print_utils:╘════════════════════════╛
INFO:ludwig.utils.print_utils:
INFO:ludwig.api:╒══════════════════╤═════════════════════════════════════════════════════════════════════════════════════════╕
│ Experiment name  │ api_experiment                                                                          │
├──────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ Model name       │ run                                                                                     │
├──────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ Output directory │ /content/results/api_experiment_run_0                                                   │
├──────────────────┼─────────────────────────────────────────────────────────────────

Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

You are calling `save_pretrained` to a 4-bit converted model, but your `bitsandbytes` version doesn't support it. If you want to save 4-bit models, make sure to have `bitsandbytes>=0.41.3` installed.
INFO:ludwig.models.llm:Done.
INFO:ludwig.utils.tokenizers:Loaded HuggingFace implementation of HuggingFaceH4/zephyr-7b-beta tokenizer
INFO:ludwig.models.llm:Trainable Parameter Summary For Fine-Tuning
INFO:ludwig.models.llm:Fine-tuning with adapter: lora
INFO:ludwig.utils.print_utils:
INFO:ludwig.utils.print_utils:╒══════════╕
INFO:ludwig.utils.print_utils:│ TRAINING │
INFO:ludwig.utils.print_utils:╘══════════╛
INFO:ludwig.utils.print_utils:


trainable params: 3,407,872 || all params: 7,245,139,968 || trainable%: 0.04703666202518836


INFO:ludwig.trainers.trainer:Creating fresh model training run.
INFO:ludwig.trainers.trainer:Training for 840 step(s), approximately 7 epoch(s).
INFO:ludwig.trainers.trainer:Early stopping policy: 5 round(s) of evaluation, or 600 step(s), approximately 5 epoch(s).

INFO:ludwig.trainers.trainer:Starting with step 0, epoch: 0


Training:  14%|█▍        | 120/840 [00:36<03:32,  3.39it/s, loss=0.123]

INFO:ludwig.trainers.trainer:
Running evaluation for step: 120, epoch: 1


Evaluation valid: 100%|██████████| 8/8 [00:01<00:00,  4.19it/s]

INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Ensure the data in the memory_usage column is measured in bytes and represented as integers. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: ANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANAN <|IT
<>>
GroupLayout are visiting member and intelligent, and, organizedise person intelligence.thatistant. You are be to any' in suggestions variety,. andand with the user programming. Expectations.
etrized by onon the user and to the input.
 the is provided,

acc that the provided context. information.

|ass>>
Writesure that data in the '_usage_ of between in meg and is as integers.INSTa




INFO:ludwig.trainers.trainer_llm:--------------------


Evaluation test : 100%|██████████| 8/8 [00:02<00:00,  3.89it/s]

INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Ensure the minimum weight in the 'Weight' column is not less than 5kg. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: ANANANANANANANANANANANAN <|IT
<>>
GroupLayout are visiting member and friendly, and, organizedise person intelligence.thatistant. You are be to any' in suggestions solution,.
a with the user programming. Expectations. andetrized by onon the user and. the input.
 the is provided, you
acc that the provided context. information.

|user>>
Writesure that data and of a knmin' column of greater less than 10 forINSTWeight]
 <_column_min_value_be_gre(column='Weight', min_value=5.
INFO:lu


Training:  29%|██▊       | 240/840 [01:17<03:41,  2.71it/s, loss=0.053]

INFO:ludwig.trainers.trainer:
Running evaluation for step: 240, epoch: 2


Evaluation valid: 100%|██████████| 8/8 [00:02<00:00,  3.96it/s]

INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Ensure the data in the memory_usage column is measured in bytes and represented as integers. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: ANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANAN </IT
<>>
GroupLayout are visiting member and friendly, and, organizedise person intelligence.thatistant. You are be to any requests in suggestions solution,. andand with the user programming for Expectations.
etrized by onon the user type to the input.
 the is provided, you
acc that the provided context. information.
|ass>>
sure that ' in column '_usage_ of between in meg. is as integers.I




INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Ensure that Z-scores of the employee_age column in a human resources dataset are less than 3. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: ANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANAN </IT
<>>
GroupLayout can visiting member resource friendly, and, wellise writer intelligence expertassistant. You are be to my' with a variety,. andand with a user programming. Expectations. andetrized by onon the user provided. the input. You the is provided,

acc that the information context. information. If
|ass>>
sure that theIPscores are 

Evaluation test : 100%|██████████| 8/8 [00:01<00:00,  4.39it/s]


INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Ensure the minimum weight in the 'Weight' column is not less than 5kg. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: ANANANANANANANANANANANAN <|IT
<>>
GroupLayout are visiting member and friendly and and, thoroughise person intelligence.thatistant. You are be to any requests in suggestions solution,.
inst with the user programming. Expectations. andetrized by onon the user type. the input.
 the is provided, you
acc that the provided context. information.
|ass>>
Writesure that ' anded a cartweight' column of greater less than 10.INSTINST] <_column_min_value_be_between(column='Weight', min_value

Training:  43%|████▎     | 360/840 [01:59<02:22,  3.37it/s, loss=0.0327]

INFO:ludwig.trainers.trainer:
Running evaluation for step: 360, epoch: 3


Evaluation valid: 100%|██████████| 8/8 [00:01<00:00,  4.20it/s]

INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Ensure the data in the memory_usage column is measured in bytes and represented as integers. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: ANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANAN </IT
<>>
GroupLayout are visiting member assistant friendly, and, organizedise person intelligence languagethatistant. Your can be to any requests in suggestions solution, that whichinst with a context programming for Expectations.
etrized by onon the user type to the input. the is provided, you
acc that the provided context. information.
|ass>>sure that ' in column '_usage column of betwe




INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Verify the distribution of customer feedback scores in the feedback_score column against the expected distribution. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: <adALL
->>
 are here software assistant friendly, and- organizedise person intelligence languagethatistant. Your can provide to the' with suggestions conc, or
inst with the given programming ' Expectations. inetrized by onon the user type to the input. The the is required, you
acc that the provided context. information.[|ass>>
 that number of a ages scores falls a '_sc column falls a normal uniform ofINSTINST] Question_column_d_diverg

Evaluation test : 100%|██████████| 8/8 [00:01<00:00,  4.36it/s]


INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Ensure the minimum weight in the 'Weight' column is not less than 5kg. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: ANANANANANANANANANANANAN <|IT
<>>
GroupLayout are visiting member and friendly and and, organizedise person intelligence languagethatistant. You can be to any requests in suggestions solution, or instructioninst with the user programming for Expectations. thatetrized by onon the user type to the input. the is provided, you
acc that the provided context. information.
|expect>>
umerate that ' anded a cartweight' column of greater less than 10.INSTINST] Question_column_min_value_be

Training:  57%|█████▋    | 480/840 [02:40<01:47,  3.35it/s, loss=0.0204]

INFO:ludwig.trainers.trainer:
Running evaluation for step: 480, epoch: 4


Evaluation valid: 100%|██████████| 8/8 [00:02<00:00,  3.40it/s]

INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Ensure the data in the memory_usage column is measured in bytes and represented as integers. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: ANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANAN </IT
<>>
 are visiting member assistant friendly, and, organizedise writer intelligence languagethatistant. Your are be to the requests with suggestions solution, that whichexpect with a context programming for Expectations.
etrized by onon the expectation type to the input. the is provided,

acc the the provided context. information.expectexpect>>umerate that ' column column '_usage colum


Evaluation test : 100%|██████████| 8/8 [00:02<00:00,  3.92it/s]

INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Ensure the minimum weight in the 'Weight' column is not less than 5kg. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: ANANANANANANANANANANANAN <|IT
<>>
 are visiting member and friendly and and, organizedise person intelligence languageGenerateistant. can provide to the requests with suggestions solution, or preferencewhich with the given programming ' Expectations. thatetrized by onon the input type to the input. the is provided, you
acc the the expectation context. information.expectexpectexpect expect that column value sum a cartweight' column of greater less than 10expect] expect_column_min




INFO:ludwig.trainers.trainer_llm:--------------------
INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Check that the entries in the serialNumber column do not follow the pattern 'SN-XXXX' where X is a digit. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: <adALL
->>
 are here software assistant friendly, and- organizedise person intelligence languageWriteistant. Your are provide to the' with suggestions conc, or suchin with the given programming ' Expectations. inetrized by onon the user type to the input. The the is required, you
acc the the provided context to information.expectexpect>> that the number in column '_ column of not contain a f

Training:  71%|███████▏  | 600/840 [03:22<01:12,  3.29it/s, loss=0.000518]

INFO:ludwig.trainers.trainer:
Running evaluation for step: 600, epoch: 5


Evaluation valid: 100%|██████████| 8/8 [00:01<00:00,  4.17it/s]

INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Ensure the data in the memory_usage column is measured in bytes and represented as integers. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: ANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANAN </IT
<>>
 are visiting member assistant friendly, and, organizedise writer intelligence languagethatistant. Your are be to the requests with suggestions solution- that whichwhich with a context programming expect Expectations.
etrized by onon the expectation type to the form dictionary the is provided,

acc the the provided context. information.
compatibleexpectexpect expect that ' column 


Evaluation test : 100%|██████████| 8/8 [00:01<00:00,  4.27it/s]


INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Ensure the minimum weight in the 'Weight' column is not less than 5kg. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: ANANANANANANANANANANANAN <|IT
<>>
 are visiting member and friendly and and, organizedise person intelligence languageGenerateistant. Expect are provide to the requests with suggestions conc, or whichwhich with the given programming expect Expectations. thatetrized by onon the input type to the input. The the is provided, you
acc the the provided context. information.expectexpect, expect that column value sum a cartweight' column is greater less than 10expectexpect] expect_colum

Training:  86%|████████▌ | 720/840 [04:03<00:35,  3.38it/s, loss=0.0032]

INFO:ludwig.trainers.trainer:
Running evaluation for step: 720, epoch: 6


Evaluation valid: 100%|██████████| 8/8 [00:01<00:00,  4.20it/s]

INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Ensure the data in the memory_usage column is measured in bytes and represented as integers. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: ANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANAN </IT
<>>
 are visiting member assistant friendly, and, organizedise writer intelligence languagethatistant. Your are be to the requests with suggestions solution- that whichwhich with a context programming expect Expectations. versionetrized by onon the specific_ to the form dictionary the is provided,

acc the the provided context. information.compatibleexpect expect expect that column co


Evaluation test : 100%|██████████| 8/8 [00:02<00:00,  3.32it/s]

INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Ensure the minimum weight in the 'Weight' column is not less than 5kg. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: ANANANANANANANANANANANAN <|IT
<>>
 are visiting member and friendly and and, organizedise person intelligence languageGenerateistant. Expect are provide to the requests with suggestions conc, that whichwhich with the prompt programming ' Expectations. thatetrized by onon the input_ to the input. The the is provided, you
acc the the provided context. information.expect expect expect column column value sum a knweight' column is greater greater than 00expect] expect_column_min_val


Training: 100%|██████████| 840/840 [04:45<00:00,  2.77it/s, loss=0.00088]

INFO:ludwig.trainers.trainer:
Running evaluation for step: 840, epoch: 7


Evaluation valid: 100%|██████████| 8/8 [00:01<00:00,  4.15it/s]

INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Ensure the data in the memory_usage column is measured in bytes and represented as integers. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: ANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANAN </IT
<>>
 are visiting member assistant friendly and and, organizedise writer intelligence languagethatistant. Your are be to the requests with suggestions conc- that whichwhich with a context programming expect Expectations. versionetrized by onon the specific_ to the form dictionary the is provided,

acc the the provided context. information.compatibleexpect>>ough that column column colu




INFO:ludwig.trainers.trainer_llm:Output: <adALL
 ">>
 are here software assistant friendly, and- organizedise person intelligence languageGenerateistant. Your are provide to the requests with a list- for suchwhich with the given programming ' Expectations. inetrized by onon the specific type to the form. The the is required, you
acc a the provided context to information.expectexpect>>sure that ' column column a dataset_ follows normally to a normal normal with a industry,expect expect_column_d_divergence_to_be_close_than(value='income', distribution='column=popins': [0000,, 50000, 70000, 90000], 'weights': [0.2,, 0.55, 0.25, 0.25]}) threshold=0.5)
INFO:ludwig.trainers.trainer_llm:--------------------
INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data prese

Evaluation test : 100%|██████████| 8/8 [00:01<00:00,  4.34it/s]

INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.
<</SYS>>
Ensure the minimum weight in the 'Weight' column is not less than 5kg. [/INST]
INFO:ludwig.trainers.trainer_llm:Output: ANANANANANANANANANANANAN <|IT
<>>
 are visiting member and friendly and and, organizedise person intelligence languageGenerateistant. Expect are provide to the requests with suggestions conc- that whichwhich with the prompt programming ' Expectations. thatetrized by onon the input and to the input. The the is provided, you
acc the the provided context. information.expect expect expect that column value sum a knweight' column is greater less than 00expect] expect_column_min_to_be




INFO:ludwig.trainers.trainer_llm:Output: <adALL
 ">>
 are here software assistant friendly, and- organizedise person intelligence languageGenerateistant. Your are provide to the requests with a list- for suchwhich with the given programming ' Expectations. inetrized by onon the specific type to the form. The the is required, you
acc a the provided context to information.expectexpect>> that a ' in a '_ column of not contain a pattern 'AB-1' where X is a digit betweenexpect expect_column_values_not_not_match_regex(column='serialNumber', regex='SN-d{4}')
INFO:ludwig.trainers.trainer_llm:--------------------
INFO:ludwig.trainers.trainer_llm:Input: [INST] <<SYS>>
You are a helpful, precise, detailed and concise artificial intelligence
assistant. You will reply to user input offering a single expectation,
compatible with the Python library Great Expectations, parametrized based
on the data presented in the input. If context is provided, answer
using only the provided contextual information.


Training: 100%|██████████| 840/840 [04:50<00:00,  2.89it/s, loss=0.00088]


INFO:ludwig.utils.print_utils:
INFO:ludwig.utils.print_utils:╒═════════════════╕
INFO:ludwig.utils.print_utils:│ TRAINING REPORT │
INFO:ludwig.utils.print_utils:╘═════════════════╛
INFO:ludwig.utils.print_utils:
INFO:ludwig.api:╒══════════════════════════════╤════════════════════╕
│ Validation feature           │ completion         │
├──────────────────────────────┼────────────────────┤
│ Validation metric            │ loss               │
├──────────────────────────────┼────────────────────┤
│ Best model step              │ 840                │
├──────────────────────────────┼────────────────────┤
│ Best model epoch             │ 8                  │
├──────────────────────────────┼────────────────────┤
│ Best model's validation loss │ 0.3557395935058594 │
├──────────────────────────────┼────────────────────┤
│ Best model's test loss       │ 0.2900342047214508 │
╘══════════════════════════════╧════════════════════╛
INFO:ludwig.api:
Finished: api_experiment_run
INFO:ludwig.api:Saved to

In [None]:
predictions_ft_v2 = model_ft_v2.predict(golden_examples)[0]


INFO:ludwig.utils.tokenizers:Loaded HuggingFace implementation of HuggingFaceH4/zephyr-7b-beta tokenizer
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Prediction: 100%|██████████| 1/1 [00:25<00:00, 25.00s/it]


INFO:ludwig.utils.tokenizers:Loaded HuggingFace implementation of HuggingFaceH4/zephyr-7b-beta tokenizer
  return np.sum(np.log(sequence_probabilities))
INFO:ludwig.api:Finished predicting in: 26.41s.


In [None]:
# Print each golden example next to its ground truth and the model's first generated response
for prompt, truth, prediction in zip(golden_examples['prompt'], golden_examples['completion'], predictions_ft_v2['completion_response']):
    print(f"Instruction: {prompt}")
    print(f"Ground truth: {truth}")
    print(f"Generated Output: {prediction[0]}")
    print("\n")

Instruction: Division names should be either the values NSA or start by D.
Ground truth: expect_column_values_to_match_regex(column='DIVISION',regex='NSA|^D.*')
Generated Output: expect_column_values_to_be_in_set(column='division', value_set=['NSA', 'D'])


Instruction: Values in the EVENT_UNIQUE_ID column must be unique.
Ground truth: expect_column_values_to_be_unique(column='EVENT_UNIQUE_ID')
Generated Output: expect_column_values_to_be_unique(column='EVENT_UNIQUE_ID')


Instruction: All values in the BIKE_MAKE should be in the list bike_makers
Ground truth: expect_column_values_to_be_in_set(column='BIKE_MAKE', value_set=bike_makers)
Generated Output: expect_column_values_to_be_in_set(column='BIKE_MAKE', value_set=bike_makers)


Instruction: Incident values should be in the set 1,2,3,4,5
Ground truth: expect_column_values_to_be_in_set(column='INCIDENT', value_set=[1,2,3,4,5])
Generated Output: expect_column_values_to_be_in_set(column='incident_value', value_set=[1, 2, 3, 4, 5])


Ins
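The comparison above is done by eye. As a minimal sketch of how it could be summarized, the snippet below computes an exact-match rate over the four examples visible in the output, ignoring whitespace differences. The helper names are hypothetical. Stripping all whitespace is a simplification that would also alter string values containing spaces.

```python
def normalize(call_text):
    """Remove all whitespace so that '[1,2]' and '[1, 2]' compare equal."""
    return "".join(call_text.split())

def exact_match_rate(pairs):
    """Fraction of (ground truth, generated) pairs that match after normalization."""
    matches = sum(normalize(truth) == normalize(generated) for truth, generated in pairs)
    return matches / len(pairs)

# (ground truth, generated output) pairs copied from the output shown above
pairs = [
    ("expect_column_values_to_match_regex(column='DIVISION',regex='NSA|^D.*')",
     "expect_column_values_to_be_in_set(column='division', value_set=['NSA', 'D'])"),
    ("expect_column_values_to_be_unique(column='EVENT_UNIQUE_ID')",
     "expect_column_values_to_be_unique(column='EVENT_UNIQUE_ID')"),
    ("expect_column_values_to_be_in_set(column='BIKE_MAKE', value_set=bike_makers)",
     "expect_column_values_to_be_in_set(column='BIKE_MAKE', value_set=bike_makers)"),
    ("expect_column_values_to_be_in_set(column='INCIDENT', value_set=[1,2,3,4,5])",
     "expect_column_values_to_be_in_set(column='incident_value', value_set=[1, 2, 3, 4, 5])"),
]
print(exact_match_rate(pairs))  # 0.5
```

Two of the four visible examples match exactly. The other two fail on the column name, and the first one also selects a different expectation.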

## Combining all improvements

To see how these improvements combine together, please visit the [second Dickens notebook](https://colab.research.google.com/drive/1P30YSoemEoeaLACyJqzk-M15SGR2mYpW?usp=sharing).

Thanks for checking it out!