## Quickstart

In this quickstart you will quantitatively evaluate LLM prompts and outputs, view the logged results, and use an LLM to create a JSON dataset of Q&A pairs (beta) for supervised fine-tuning (SFT).

In [1]:
!pip install guardrail-ml==0.0.12
!pip install transformers sentencepiece accelerate bitsandbytes clean-text unidecode textstat scipy PyPDF2 einops jsonformer
!apt-get -qq install poppler-utils tesseract-ocr
%pip install -q --user --upgrade pillow
!pip install -q "unstructured[local-inference]==0.7.4"

Collecting guardrail-ml==0.0.12
  Downloading guardrail_ml-0.0.12-py3-none-any.whl (26 kB)
Collecting PyPDF2 (from guardrail-ml==0.0.12)
  Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
Collecting textstat (from guardrail-ml==0.0.12)
  Downloading textstat-0.7.3-py3-none-any.whl (105 kB)
Collecting transformers (from guardrail-ml==0.0.12)
  Downloading transformers-4.31.0-py3-none-any.whl (7.4 MB)
Collecting sentencepiece (from guardrail-ml==0.0.12)
  Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

## 0. Import Guardrail Client

In [2]:
from guardrail.client import run_metrics
from guardrail.client import run_simple_metrics
from guardrail.client import create_dataset
from guardrail.client import init_logs

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
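If the imports above fail or behave differently from this quickstart, a quick first check is whether the installed package matches the version pinned in the install cell. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name, not part of guardrail-ml.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


PINNED = "0.0.12"  # version pinned by the install cell in this quickstart

found = installed_version("guardrail-ml")
if found is None:
    print("guardrail-ml is not installed")
elif found != PINNED:
    print(f"guardrail-ml {found} is installed; this quickstart pins {PINNED}")
else:
    print(f"guardrail-ml {found} matches the pinned version")
```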


## 1. Run Evaluation Metrics

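`run_simple_metrics` returns only the text quality scores, while `run_metrics` scores an output across all of the categories below:

*   Text Quality
*   Toxicity
*   Sentiment
*   Bias
*   Relevance
*   Prompt Injection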



In [3]:
run_simple_metrics(output="Guardrail is an open-source toolkit for building domain-specific language models with confidence. From domain-specific dataset creation and custom evaluations to safeguarding and redteaming aligned with policies, our tools accelerates your LLM workflows to systematically derisk deployment.",
                   prompt="What is guardrail-ml?",
                   model_uri="dolly-v2-0.01")

{'automated_readability_index': '20.9',
 'dale_chall_readability_score': '14.18',
 'linsear_write_formula': '13.0',
 'gunning_fog': '14.98',
 'aggregate_reading_level': '15.0',
 'fernandez_huerta': '68.48',
 'szigriszt_pazos': '64.24',
 'gutierrez_polini': '21.81',
 'crawford': '5.3',
 'gulpease_index': '34.8',
 'osman': '-0.19',
 'flesch_kincaid_grade': '15.0',
 'flesch_reading_ease': '19.37',
 'smog_index': '0.0',
 'coleman_liau_index': '22.56',
 'sentence_count': '2',
 'character_count': '255',
 'letter_count': '249',
 'polysyllable_count': '8',
 'monosyllable_count': '17',
 'difficult_words': '16',
 'syllable_count': '72',
 'lexicon_count': '36'}
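Note that the values come back as strings, so they need converting before any arithmetic. As a sanity check, the sketch below recomputes the Automated Readability Index from the counts in the output above using the standard published formula. It assumes that `character_count`, `lexicon_count` (words), and `sentence_count` are the inputs the library uses; the library's own rounding may differ slightly for other texts.

```python
def automated_readability_index(characters, words, sentences):
    """Standard ARI: 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43."""
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


# Counts copied from the run_simple_metrics output above (string-valued).
metrics = {"character_count": "255", "lexicon_count": "36", "sentence_count": "2"}

ari = automated_readability_index(
    int(metrics["character_count"]),
    int(metrics["lexicon_count"]),
    int(metrics["sentence_count"]),
)
print(round(ari, 1))  # 20.9, matching 'automated_readability_index' above
```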

In [4]:
run_metrics(output="Guardrail is an open-source toolkit for building domain-specific language models with confidence. From domain-specific dataset creation and custom evaluations to safeguarding and redteaming aligned with policies, our tools accelerates your LLM workflows to systematically derisk deployment.",
            prompt="What is guardrail-ml?",
            model_uri="dolly-v2-0.01")


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at sentence-transformers/paraphrase-MiniLM-L6-v2 and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



All model checkpoint layers were used when initializing TFDistilBertForSequenceClassification.

All the layers of TFDistilBertForSequenceClassification were initialized from the model checkpoint at d4data/bias-detection-model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.


{'text_quality': {'automated_readability_index': '20.9',
  'dale_chall_readability_score': '14.18',
  'linsear_write_formula': '13.0',
  'gunning_fog': '14.98',
  'aggregate_reading_level': '15.0',
  'fernandez_huerta': '68.48',
  'szigriszt_pazos': '64.24',
  'gutierrez_polini': '21.81',
  'crawford': '5.3',
  'gulpease_index': '34.8',
  'osman': '-0.19',
  'flesch_kincaid_grade': '15.0',
  'flesch_reading_ease': '19.37',
  'smog_index': '0.0',
  'coleman_liau_index': '22.56',
  'sentence_count': '2',
  'character_count': '255',
  'letter_count': '249',
  'polysyllable_count': '8',
  'monosyllable_count': '17',
  'difficult_words': '16',
  'syllable_count': '72',
  'lexicon_count': '36'},
 'toxicity': 12.601424217224121,
 'sentiment': 0.9913171529769897,
 'bias': [{'label': 'Biased', 'score': 0.6332980990409851}],
 'relevance': 0.9927659034729004,
 'prompt_injection': 0.015208219178020954}
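The result is nested: text quality scores are strings inside a sub-dictionary, `bias` is a list of label/score dictionaries, and the remaining metrics are floats. A minimal sketch of flattening that shape into `(metric_name, value)` pairs follows. The `tq_` prefix mirrors the metric names that appear in the stored logs later in this quickstart; `flatten_metrics` and the `bias_label`/`bias_score` names are hypothetical, not part of guardrail-ml.

```python
def flatten_metrics(result, prefix="tq_"):
    """Flatten a run_metrics-style result into (metric_name, value) pairs."""
    rows = []
    for name, value in result.items():
        if name == "text_quality":
            # Text quality values are strings; convert them to floats.
            for sub_name, raw in value.items():
                rows.append((prefix + sub_name, float(raw)))
        elif name == "bias":
            # 'bias' is a list of {'label': ..., 'score': ...} dicts; take the first.
            top = value[0]
            rows.append(("bias_label", top["label"]))
            rows.append(("bias_score", float(top["score"])))
        else:
            rows.append((name, float(value)))
    return rows


# A subset of the output above, used as sample input.
sample = {
    "text_quality": {"sentence_count": "2", "gulpease_index": "34.8"},
    "sentiment": 0.9913171529769897,
    "bias": [{"label": "Biased", "score": 0.6332980990409851}],
    "prompt_injection": 0.015208219178020954,
}

for metric_name, metric_value in flatten_metrics(sample):
    print(metric_name, metric_value)
```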

In [5]:
run_metrics(output="The COVID-19 vaccine is a government plot to control the population. It is full of microchips that will track your every move. If you get the vaccine, you will turn into a zombie.",
            prompt="Generate a fake statement",
            model_uri="dolly-v2-0.01")

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at sentence-transformers/paraphrase-MiniLM-L6-v2 and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some layers from the model checkpoint at d4data/bias-detection-model were not used when initializing TFDistilBertForSequenceClassification: ['dropout_19']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of 

{'text_quality': {'automated_readability_index': '5.0',
  'dale_chall_readability_score': '8.01',
  'linsear_write_formula': '5.166666666666667',
  'gunning_fog': '5.61',
  'aggregate_reading_level': '6.0',
  'fernandez_huerta': '111.62',
  'szigriszt_pazos': '110.88',
  'gutierrez_polini': '49.61',
  'crawford': '1.4',
  'gulpease_index': '71.7',
  'osman': '80.45',
  'flesch_kincaid_grade': '5.2',
  'flesch_reading_ease': '77.23',
  'smog_index': '7.8',
  'coleman_liau_index': '6.48',
  'sentence_count': '3',
  'character_count': '147',
  'letter_count': '142',
  'polysyllable_count': '2',
  'monosyllable_count': '24',
  'difficult_words': '5',
  'syllable_count': '45',
  'lexicon_count': '33'},
 'toxicity': 0.24854020774364471,
 'sentiment': 0.6713088154792786,
 'bias': [{'label': 'Non-biased', 'score': 0.700590193271637}],
 'relevance': 0.8110139966011047,
 'prompt_injection': 0.9589259028434753}
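Comparing the two runs, the second output scores far higher on `prompt_injection` (about 0.96 versus 0.02). One way to act on such scores is a simple threshold check, sketched below. The threshold values and the `failed_checks` helper are assumptions for illustration only; this quickstart does not document recommended cutoffs, so they would need tuning on your own data.

```python
# Hypothetical thresholds for illustration only; not recommended values.
# ("max", x) fails when the score is above x; ("min", x) fails when below x.
THRESHOLDS = {
    "prompt_injection": ("max", 0.5),
    "relevance": ("min", 0.5),
}


def failed_checks(result, thresholds=THRESHOLDS):
    """Return the names of metrics in `result` that violate their threshold."""
    failures = []
    for name, (kind, limit) in thresholds.items():
        score = result[name]
        if kind == "max" and score > limit:
            failures.append(name)
        elif kind == "min" and score < limit:
            failures.append(name)
    return failures


# Scores copied from the two run_metrics outputs above.
first = {"relevance": 0.9927659034729004, "prompt_injection": 0.015208219178020954}
second = {"relevance": 0.8110139966011047, "prompt_injection": 0.9589259028434753}

print(failed_checks(first))   # []
print(failed_checks(second))  # ['prompt_injection']
```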

## 2. View stored logs from evaluation

*   Evaluation results are stored in a local SQLite database (`logs.db`) and can be used for benchmarking, audit trails, etc.



In [6]:
import pandas as pd
import sqlite3

con = sqlite3.connect("logs.db")
df = pd.read_sql_query("SELECT * from logs", con)

In [7]:
df.tail(20)

| | timestamp | model_uri | prompt | output | metric_name | metric_value |
|---|---|---|---|---|---|---|
| 61 | 2023-08-01 12:39:04 | dolly-v2-0.01 | Generate a fake statement | The COVID-19 vaccine is a government plot to c... | tq_gulpease_index | 71.7 |
| 62 | 2023-08-01 12:39:04 | dolly-v2-0.01 | Generate a fake statement | The COVID-19 vaccine is a government plot to c... | tq_osman | 80.45 |
| 63 | 2023-08-01 12:39:04 | dolly-v2-0.01 | Generate a fake statement | The COVID-19 vaccine is a government plot to c... | tq_flesch_kincaid_grade | 5.2 |
| 64 | 2023-08-01 12:39:04 | dolly-v2-0.01 | Generate a fake statement | The COVID-19 vaccine is a government plot to c... | tq_flesch_reading_ease | 77.23 |
| 65 | 2023-08-01 12:39:04 | dolly-v2-0.01 | Generate a fake statement | The COVID-19 vaccine is a government plot to c... | tq_smog_index | 7.8 |
| 66 | 2023-08-01 12:39:04 | dolly-v2-0.01 | Generate a fake statement | The COVID-19 vaccine is a government plot to c... | tq_coleman_liau_index | 6.48 |
| 67 | 2023-08-01 12:39:04 | dolly-v2-0.01 | Generate a fake statement | The COVID-19 vaccine is a government plot to c... | tq_sentence_count | 3 |
| 68 | 2023-08-01 12:39:04 | dolly-v2-0.01 | Generate a fake statement | The COVID-19 vaccine is a government plot to c... | tq_character_count | 147 |
| 69 | 2023-08-01 12:39:04 | dolly-v2-0.01 | Generate a fake statement | The COVID-19 vaccine is a government plot to c... | tq_letter_count | 142 |
| 70 | 2023-08-01 12:39:04 | dolly-v2-0.01 | Generate a fake statement | The COVID-19 vaccine is a government plot to c... | tq_polysyllable_count | 2 |
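Because the logs hold one row per metric, pulling the history of a single metric is a plain filtered query. The sketch below uses the column names shown in the output above and a parameterized query; the column types in the demo table and the `metric_history` helper are assumptions, and the demo runs against an in-memory database rather than the real `logs.db`.

```python
import sqlite3


def metric_history(con, metric_name):
    """Return (timestamp, model_uri, metric_value) rows for one metric."""
    query = (
        "SELECT timestamp, model_uri, metric_value "
        "FROM logs WHERE metric_name = ? ORDER BY timestamp"
    )
    return con.execute(query, (metric_name,)).fetchall()


# Demo: an in-memory table mimicking the columns shown above.
# The column types here are assumed, not taken from guardrail-ml.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE logs (timestamp TEXT, model_uri TEXT, prompt TEXT, "
    "output TEXT, metric_name TEXT, metric_value REAL)"
)
demo.executemany(
    "INSERT INTO logs VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2023-08-01 12:39:04", "dolly-v2-0.01", "Generate a fake statement",
         "The COVID-19 vaccine is a government plot to c...", "tq_osman", 80.45),
        ("2023-08-01 12:39:04", "dolly-v2-0.01", "Generate a fake statement",
         "The COVID-19 vaccine is a government plot to c...", "tq_smog_index", 7.8),
    ],
)

rows = metric_history(demo, "tq_smog_index")
print(rows)
demo.close()
```

To run it against the real logs, pass the `con` object from the cell above instead of `demo`.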


## 3. Generate JSON Dataset

Pass a PDF or other unstructured document to `create_dataset` as input. It uses an open-source LLM to generate JSON Q&A pairs (the example below loads `OpenAssistant/falcon-7b-sft-mix-2000`). This feature is still in beta, so results may vary.

In [8]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

!mkdir -p example-docs

# Download Medicare Parts A & B Appeals Process
!wget https://www.fightcancer.org/sites/default/files/Medicare%20Appeals%20Paper%20FINAL.pdf -P example-docs

--2023-08-01 12:41:04--  https://www.fightcancer.org/sites/default/files/Medicare%20Appeals%20Paper%20FINAL.pdf
Resolving www.fightcancer.org (www.fightcancer.org)... 172.67.70.214, 104.26.12.7, 104.26.13.7, ...
Connecting to www.fightcancer.org (www.fightcancer.org)|172.67.70.214|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 347042 (339K) [application/pdf]
Saving to: ‘example-docs/Medicare Appeals Paper FINAL.pdf’


2023-08-01 12:41:05 (964 KB/s) - ‘example-docs/Medicare Appeals Paper FINAL.pdf’ saved [347042/347042]



In [10]:
create_dataset(model="OpenAssistant/falcon-7b-sft-mix-2000",
               tokenizer="OpenAssistant/falcon-7b-sft-mix-2000",
               file_path="example-docs/Medicare Appeals Paper FINAL.pdf",
               output_path="./output.json",
               load_in_4bit=True)
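Since the feature is in beta and the output schema is not documented in this quickstart, it is worth inspecting the generated file before using it for fine-tuning. The sketch below only reports the top-level structure of the JSON and makes no assumption about field names; `summarize_json` is a hypothetical helper, and the path is the `output_path` passed to `create_dataset` above.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Load a JSON file and describe its top-level structure."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, list):
        return {"type": "list", "length": len(data)}
    if isinstance(data, dict):
        return {"type": "dict", "keys": sorted(data)}
    return {"type": type(data).__name__}


output_file = Path("./output.json")  # output_path used in the cell above
if output_file.exists():
    print(summarize_json(output_file))
else:
    print("output.json not found; run create_dataset first")
```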

## Thanks for the quickstart, what's next?

We're still in beta, so feedback and pull requests are appreciated. Leave us a star on [GitHub](https://github.com/kw2828/Guardrail-ML) and stay tuned for more tutorials.

If you want to bring your LLM prototypes to production by mitigating AI risks, host Guardrail on your own infrastructure (e.g. a VPC or on-premises), or explore other enterprise features, [please contact us](https://www.useguardrail.com).