<a href="https://colab.research.google.com/github/jbottum/t5pat/blob/main/score2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install einops
!pip install --upgrade transformers
!pip install accelerate
!pip install torch
!pip install langchain
!pip install bert-score

Collecting einops
  Downloading einops-0.6.1-py3-none-any.whl (42 kB)
Installing collected packages: einops
Successfully installed einops-0.6.1
Collecting transformers
  Downloading transformers-4.31.0-py3-none-any.whl (7.4 MB)
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers)
  Downloading huggingface_hub-0.16.4-py3-none-any.whl (268 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)

In [2]:
import time
import matplotlib.pyplot as plt
from langchain.llms import HuggingFacePipeline
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForSeq2SeqLM, pipeline
import transformers
import torch

import bert_score
import logging
logging.getLogger("transformers.modeling_utils").setLevel(logging.ERROR)


import os
# Disable parallelism and avoid the warning message
os.environ["TOKENIZERS_PARALLELISM"] = "false"

In [3]:
# Create sample model data using Python variables and lists
user_id = 'jbottum'
project_id = 'project1'
model_id = 'google/flan-t5-large'

# Define prompts and question types
prompts = [
    'What is the capital of Germany?',
    'What is the capital of Spain?',
    'What is the capital of Canada?',
    'What is the next number in the sequence: 2, 4, 6, 8, ...? If all cats have tails, and Fluffy is a cat, does Fluffy have a tail?',
    'If you eat too much junk food, what will happen to your health? How does smoking affect the risk of lung cancer?',
    'In the same way that pen is related to paper, what is fork related to? If tree is related to forest, what is brick related to?',
    'Every time John eats peanuts, he gets a rash. Does John have a peanut allergy? Every time Sarah studies for a test, she gets an A. Will Sarah get an A on the next test if she studies?',
    'All dogs have fur. Max is a dog. Does Max have fur? If it is raining outside, and Mary does not like to get wet, will Mary take an umbrella?',
    'If I had studied harder, would I have passed the exam? What would have happened if Thomas Edison had not invented the light bulb?',
    'The center of Tropical Storm Arlene, at 02/1800 UTC, is near 26.7N 86.2W. This position is about 425 km/230 nm to the west of Fort Myers in Florida, and it is about 550 km/297 nm to the NNW of the western tip of Cuba. The tropical storm is moving southward, or 175 degrees, 4 knots. The estimated minimum central pressure is 1002 mb. The maximum sustained wind speeds are 35 knots with gusts to 45 knots. The sea heights that are close to the tropical storm are ranging from 6 feet to a maximum of 10 feet.  Precipitation: scattered to numerous moderate is within 180 nm of the center in the NE quadrant. Isolated moderate is from 25N to 27N between 80W and 84W, including parts of south Florida.  Broad surface low pressure extends from the area of the tropical storm, through the Yucatan Channel, into the NW part of the Caribbean Sea.   Where and when will the storm make landfall?',
]

types = [
    'Knowledge Retrieval',
    'Knowledge Retrieval',
    'Knowledge Retrieval',
    'Logical Reasoning',
    'Cause and Effect',
    'Analogical Reasoning',
    'Inductive Reasoning',
    'Deductive Reasoning',
    'Counterfactual Reasoning',
    'In Context'
]

# Example reference answers
reference_list = [
    'The capital of Germany is Berlin',
    'The capital of Spain is Madrid',
    'The capital of Canada is Ottawa',
    'The next number in the sequence is 10.  Yes, Fluffy is a cat and therefore has a tail.',
    'Eating junk food can result in health problems like weight gain and high cholesterol. Smoking can cause lung issues including cancer.',
    'Fork is related to a plate.  A brick is related to a building.',
    'Maybe, to determine if Johns rash is caused by peanuts, he should take an allergy test for peanuts.   Maybe, Sarah will likely do well if she studies and she may be able to get an A.',
    'Yes, Max is a dog and has fur.   Yes, Mary will take an umbrella.',
    'Yes, if you studied harder, you would have passed the test.  If Thomas Edison did not invent the light blub, another inventor would have created the light bulb.',
    'If Arlene continues in the same direciton and speed, storm will make landfall in the Forida Keys in 18 hours from this report.'
]



In [4]:
# Load the sample model data into a Python dictionary.
# Each key appears once; the values come from the variables defined above.
data = {
    "model_id": model_id,
    "user_id": user_id,
    "project_id": project_id,
    "prompts": prompts,
    "types": types,
    "reference_list": reference_list
}


In [5]:
import json

# Save data as JSON
with open("data.json", "w") as json_file:
    json.dump(data, json_file, indent=4)


In [6]:
import csv

# Transpose the data to convert it to CSV format
transposed_data = list(zip(prompts, types, reference_list))

# Save data as CSV
with open("data.csv", "w", newline="") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["Prompts", "Types", "Reference List"])
    writer.writerows(transposed_data)


JSON Schema for the Data:

{
  "user_id": "string",
  "project_id": "string",
  "model_id": "string",
  "prompts": [
    "string",
    ...
  ],
  "types": [
    "string",
    ...
  ],
  "reference_list": [
    "string",
    ...
  ]
}


Explanation:

user_id (string): The unique identifier of the user who owns the project. It associates the data with a specific user account.

project_id (string): The unique identifier of the project the data belongs to. It lets users tell their projects, and each project's data, apart.

model_id (string): The identifier of the model used for the project, for example "google/flan-t5-large". It records which language model processed the prompts.

prompts (list of strings): The questions or scenarios the model is expected to answer. Users should list the prompts in the order they want them processed.

types (list of strings): The category of each prompt, such as "Knowledge Retrieval" or "Logical Reasoning". Each entry describes the prompt at the same position in the prompts list.

reference_list (list of strings): The reference answer for each prompt, that is, the response the model is expected to give. Each entry belongs to the prompt at the same position in the prompts list.

Note:

The ... in the schema means that the prompts, types, and reference_list lists can hold any number of elements. Because entries are matched by position, all three lists must have the same length.
Users building JSON files should use exactly the structure and field names shown in the schema so that the app can read the data.
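
The rules above can be checked before a file is used. The following is a minimal sketch, not part of the original notebook: `validate_dataset` is a hypothetical helper name, and it assumes the data has already been loaded into a Python dictionary.

```python
def validate_dataset(data):
    """Return a list of problems found in a dataset dict; an empty list means none were found."""
    problems = []

    # The identifier fields must be present and must be strings.
    for key in ("user_id", "project_id", "model_id"):
        if not isinstance(data.get(key), str):
            problems.append(f"{key} must be a string")

    # The list fields must be present and contain only strings.
    list_keys = ("prompts", "types", "reference_list")
    for key in list_keys:
        value = data.get(key)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{key} must be a list of strings")

    # The three lists are matched by position, so their lengths must agree.
    lengths = {key: len(data[key]) for key in list_keys if isinstance(data.get(key), list)}
    if len(set(lengths.values())) > 1:
        problems.append(f"list lengths differ: {lengths}")

    return problems


sample = {
    "user_id": "jbottum",
    "project_id": "project1",
    "model_id": "google/flan-t5-large",
    "prompts": ["What is the capital of Germany?"],
    "types": ["Knowledge Retrieval"],
    "reference_list": ["The capital of Germany is Berlin"],
}
print(validate_dataset(sample))  # no problems are expected for this sample
```

Returning a list of messages rather than raising on the first error lets a user fix every problem in one pass.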

CSV Schema for the Data:

Prompts,Types,Reference List
string,string,string
...


Explanation:

Prompts (string): The questions or scenarios for the model to answer.
Types (string): The category of each prompt, such as "Knowledge Retrieval" or "Logical Reasoning".
Reference List (string): The reference answer for each prompt, that is, the response the model is expected to give.

Note:

The ... in the schema means that the CSV file can contain any number of rows.
Users building CSV files should keep the columns in the order shown in the schema so that values map to the right fields. Each row holds one prompt, its type, and its reference answer.
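
A file in this layout can be read back into the three parallel lists used elsewhere in the notebook. This is a sketch under assumptions: `load_prompt_csv` is a hypothetical helper, and it looks columns up by header name with `csv.DictReader`, so it relies on the header row matching the schema exactly.

```python
import csv
import io


def load_prompt_csv(csv_file):
    """Read rows with the Prompts/Types/Reference List header into three parallel lists."""
    reader = csv.DictReader(csv_file)
    prompts, types, reference_list = [], [], []
    for row in reader:
        prompts.append(row["Prompts"])
        types.append(row["Types"])
        reference_list.append(row["Reference List"])
    return prompts, types, reference_list


# In-memory stand-in for data.csv; quoted fields may contain commas.
example_csv = io.StringIO(
    "Prompts,Types,Reference List\n"
    '"All dogs have fur. Max is a dog. Does Max have fur?",Deductive Reasoning,'
    '"Yes, Max is a dog and has fur."\n'
)
prompts_in, types_in, refs_in = load_prompt_csv(example_csv)
print(types_in)
```

With a real file, the same function could be called as `load_prompt_csv(open("data.csv", newline=""))`, ideally inside a `with` block.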

In [7]:
import json

# Load data from the JSON file into a dictionary
with open("data.json", "r") as json_file:
    data = json.load(json_file)


In [8]:
user_id = data["user_id"]
project_id = data["project_id"]
model_id = data["model_id"]
prompts = data["prompts"]
types = data["types"]
reference_list = data["reference_list"]


In [9]:
import json

# Load data from the JSON file into a dictionary
with open("data.json", "r") as json_file:
    data = json.load(json_file)

# Print the dictionary to check the loaded data
print(data)


{'model_id': 'google/flan-t5-large', 'user_id': 'jbottum', 'project_id': 'project1', 'prompts': ['What is the capital of Germany?', 'What is the capital of Spain?', 'What is the capital of Canada?', 'What is the next number in the sequence: 2, 4, 6, 8, ...? If all cats have tails, and Fluffy is a cat, does Fluffy have a tail?', 'If you eat too much junk food, what will happen to your health? How does smoking affect the risk of lung cancer?', 'In the same way that pen is related to paper, what is fork related to? If tree is related to forest, what is brick related to?', 'Every time John eats peanuts, he gets a rash. Does John have a peanut allergy? Every time Sarah studies for a test, she gets an A. Will Sarah get an A on the next test if she studies?', 'All dogs have fur. Max is a dog. Does Max have fur? If it is raining outside, and Mary does not like to get wet, will Mary take an umbrella?', 'If I had studied harder, would I have passed the exam? What would have happened if Thomas Ed

In [10]:
import time
import torch
# AutoTokenizer, AutoModelForSeq2SeqLM, pipeline, and HuggingFacePipeline were imported in In [2]

# Load tokenizer
tokenizer_start_time = time.time()
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer_end_time = time.time()

# Load model
model_start_time = time.time()
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
model_end_time = time.time()

# Load pipeline
pipe_start_time = time.time()
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer, max_length=512)
local_llm = HuggingFacePipeline(pipeline=pipe)
pipe_end_time = time.time()

# Store loading times
model_load_time = model_end_time - model_start_time
tokenizer_load_time = tokenizer_end_time - tokenizer_start_time
pipeline_load_time = pipe_end_time - pipe_start_time
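
The start/end pattern above repeats for each loading step. One possible way to shorten it is a small timing helper. This is only a sketch: `timed` is a hypothetical name, and it uses `time.perf_counter`, which is intended for measuring durations, instead of the `time.time` calls used in the cell.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload; in the notebook this could be, for example,
# timed(AutoTokenizer.from_pretrained, model_id).
total, elapsed = timed(sum, range(1000))
print(total, round(elapsed, 5))
```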


In [11]:
import time
from transformers import pipeline
import bert_score

# Re-create the text generation pipeline with a shorter output limit (max_length=50)
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer, max_length=50)

# Initialize the dictionary to store results
epoch_record = {}

# Loop through the prompts and generate text
for i, prompt in enumerate(data['prompts']):
    start_time = time.time()
    generated = pipe(prompt, num_return_sequences=1)[0]['generated_text']
    end_time = time.time()
    generation_time = end_time - start_time

    # Store the results in a dictionary record
    record = {
        'timestamp': start_time,
        'project_name': data['project_id'],
        'user_id': data['user_id'],
        'model_id': data['model_id'],
        'prompt': prompt,
        'type': data['types'][i],
        'reference_answer': data['reference_list'][i],
        'generated_answer': generated,
        'generation_time': generation_time,
        'model_load_time': model_load_time,
        'tokenizer_load_time': tokenizer_load_time,
        'pipeline_load_time': pipeline_load_time,
    }

    # Calculate BERTScore precision, recall, and F1 for the generated answer against its reference
    P, R, F1 = bert_score.score([generated], [record['reference_answer']], lang="en", verbose=False)
    record['precision'] = P[0].item()
    record['recall'] = R[0].item()
    record['f1'] = F1[0].item()

    # Store the record in epoch_record with a unique key
    epoch_record[f"Example {i + 1}"] = record
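
The loop above calls `bert_score.score` once per prompt, which may repeat scorer setup work on every call. Since `bert_score.score` accepts lists of candidates and references, the scores could instead be computed in one batched call after generation. The sketch below shows that idea. `add_scores` and `fake_scorer` are hypothetical names, and the `scorer` parameter exists only so the example can run without downloading a model.

```python
def add_scores(records, scorer=None):
    """Attach precision/recall/f1 to each record using a single batched scoring call."""
    if scorer is None:
        # Imported lazily so the function can be exercised without bert_score installed.
        import bert_score

        def scorer(cands, refs):
            P, R, F1 = bert_score.score(cands, refs, lang="en", verbose=False)
            return P.tolist(), R.tolist(), F1.tolist()

    cands = [r["generated_answer"] for r in records]
    refs = [r["reference_answer"] for r in records]
    precision, recall, f1 = scorer(cands, refs)
    for r, p, rc, f in zip(records, precision, recall, f1):
        r["precision"], r["recall"], r["f1"] = p, rc, f
    return records


# Hypothetical stand-in scorer that returns fixed values instead of real BERTScores.
def fake_scorer(cands, refs):
    n = len(cands)
    return [0.5] * n, [0.25] * n, [0.125] * n


demo = add_scores(
    [{"generated_answer": "berlin", "reference_answer": "The capital of Germany is Berlin"}],
    scorer=fake_scorer,
)
print(demo[0]["f1"])
```

In the notebook, the call would presumably be `add_scores(list(epoch_record.values()))` after the generation loop, with the per-prompt scoring lines removed.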




In [17]:
# Print a run summary, then the per-example results
import datetime

# The run-level fields are read from the last record produced by the generation loop
last_record = list(epoch_record.values())[-1]

print()
print(f"Model: {data['model_id']}")
print(f"Project: {data['project_id']}")
print(f"User: {data['user_id']}")
print("Run Date:", datetime.datetime.fromtimestamp(last_record['timestamp']).strftime('%Y-%m-%d %H:%M:%S'))
print("Model load time:", round(last_record['model_load_time'], 5))
print("Tokenizer load time:", round(last_record['tokenizer_load_time'], 5))
print("Pipeline load time:", round(last_record['pipeline_load_time'], 5))
print("=" * 30)

for example_number, record in epoch_record.items():
    print(f"{example_number}:")
    print("Prompt:", record['prompt'])
    print("Generated Text:", record['generated_answer'])
    print("Reference Answer:", record['reference_answer'])
    print("Generation Time:", round(record['generation_time'], 5))

    print("Type:", record['type'])
    print("Precision:", round(record['precision'], 5))
    print("Recall:", round(record['recall'], 5))
    print("F1 Score:", round(record['f1'], 5))
    print("=" * 10)


Model: google/flan-t5-large
Project: project1
User: jbottum
Run Date: 2023-07-21 03:52:46
Model load time: 64.29525
Tokenizer load time: 2.0048
Pipeline load time: 0.1414
Example 1:
Prompt: What is the capital of Germany?
Generated Text: berlin
Reference Answer: The capital of Germany is Berlin
Generation Time: 1.23543
Type: Knowledge Retrieval
Precision: 0.80484
Recall: 0.82159
F1 Score: 0.81313
Example 2:
Prompt: What is the capital of Spain?
Generated Text: turin
Reference Answer: The capital of Spain is Madrid
Generation Time: 1.69031
Type: Knowledge Retrieval
Precision: 0.76089
Recall: 0.79643
F1 Score: 0.77825
Example 3:
Prompt: What is the capital of Canada?
Generated Text: toronto
Reference Answer: The capital of Canada is Ottawa
Generation Time: 1.09452
Type: Knowledge Retrieval
Precision: 0.77874
Recall: 0.78587
F1 Score: 0.78229
Example 4:
Prompt: What is the next number in the sequence: 2, 4, 6, 8, ...? If all cats have tails, and Fluffy is a cat, does Fluffy have a tail?
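
The per-example scores printed above can also be summarized per prompt type. The following is a minimal sketch, assuming records shaped like those in `epoch_record`. `mean_f1_by_type` is a hypothetical helper, and the values in `demo_record` are illustrative, not taken from the run.

```python
from collections import defaultdict


def mean_f1_by_type(epoch_record):
    """Average the 'f1' value of the records for each prompt type."""
    grouped = defaultdict(list)
    for record in epoch_record.values():
        grouped[record["type"]].append(record["f1"])
    return {t: sum(v) / len(v) for t, v in grouped.items()}


# Small stand-in with the same shape as epoch_record (illustrative values only).
demo_record = {
    "Example 1": {"type": "Knowledge Retrieval", "f1": 0.8},
    "Example 2": {"type": "Knowledge Retrieval", "f1": 0.6},
    "Example 4": {"type": "Logical Reasoning", "f1": 0.5},
}
summary = mean_f1_by_type(demo_record)
print(summary)
```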


In [13]:
import csv
import json

# Save epoch_record as a CSV file
csv_file = "epoch_record.csv"
with open(csv_file, "w", newline="") as csvfile:
    fieldnames = [
        'timestamp', 'project_name', 'user_id', 'model_id', 'prompt', 'type',
        'reference_answer', 'generated_answer', 'generation_time',
        'model_load_time','tokenizer_load_time','pipeline_load_time',
        'precision', 'recall', 'f1'
    ]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for example_number, record in epoch_record.items():
        writer.writerow(record)

# Save epoch_record as a JSON file
json_file = "epoch_record.json"
with open(json_file, "w") as jsonfile:
    json.dump(epoch_record, jsonfile, indent=4)


In [14]:
import pandas as pd

# Read the CSV file
csv_file = "epoch_record.csv"
df = pd.read_csv(csv_file)

# Display the DataFrame with headers
df


Unnamed: 0,timestamp,project_name,user_id,model_id,prompt,type,reference_answer,generated_answer,generation_time,model_load_time,tokenizer_load_time,pipeline_load_time,precision,recall,f1
0,1689911000.0,project1,jbottum,google/flan-t5-large,What is the capital of Germany?,Knowledge Retrieval,The capital of Germany is Berlin,berlin,1.235425,64.295249,2.004805,0.141404,0.804843,0.821588,0.813129
1,1689912000.0,project1,jbottum,google/flan-t5-large,What is the capital of Spain?,Knowledge Retrieval,The capital of Spain is Madrid,turin,1.690311,64.295249,2.004805,0.141404,0.760889,0.796428,0.778253
2,1689912000.0,project1,jbottum,google/flan-t5-large,What is the capital of Canada?,Knowledge Retrieval,The capital of Canada is Ottawa,toronto,1.094523,64.295249,2.004805,0.141404,0.778736,0.785868,0.782286
3,1689912000.0,project1,jbottum,google/flan-t5-large,"What is the next number in the sequence: 2, 4,...",Logical Reasoning,"The next number in the sequence is 10. Yes, F...",yes,1.633795,64.295249,2.004805,0.141404,0.800387,0.808023,0.804187
4,1689912000.0,project1,jbottum,google/flan-t5-large,"If you eat too much junk food, what will happe...",Cause and Effect,Eating junk food can result in health problems...,no,0.927203,64.295249,2.004805,0.141404,0.817102,0.795092,0.805947
5,1689912000.0,project1,jbottum,google/flan-t5-large,"In the same way that pen is related to paper, ...",Analogical Reasoning,Fork is related to a plate. A brick is relate...,brick is related to brick,1.765297,64.295249,2.004805,0.141404,0.928557,0.897447,0.912737
6,1689912000.0,project1,jbottum,google/flan-t5-large,"Every time John eats peanuts, he gets a rash. ...",Inductive Reasoning,"Maybe, to determine if Johns rash is caused by...",yes,1.273458,64.295249,2.004805,0.141404,0.822684,0.790701,0.806376
7,1689912000.0,project1,jbottum,google/flan-t5-large,All dogs have fur. Max is a dog. Does Max have...,Deductive Reasoning,"Yes, Max is a dog and has fur. Yes, Mary wil...",yes,1.13079,64.295249,2.004805,0.141404,0.805786,0.808462,0.807122
8,1689912000.0,project1,jbottum,google/flan-t5-large,"If I had studied harder, would I have passed t...",Counterfactual Reasoning,"Yes, if you studied harder, you would have pas...",no one would have invented the light bulb,4.717661,64.295249,2.004805,0.141404,0.902076,0.866546,0.883954
9,1689912000.0,project1,jbottum,google/flan-t5-large,"The center of Tropical Storm Arlene, at 02/180...",In Context,If Arlene continues in the same direciton and ...,about 425 km/230 nm to the west of Fort Myers ...,13.231891,64.295249,2.004805,0.141404,0.79358,0.816447,0.804851


In [15]:
import json
from pprint import pprint

# Read the JSON file
json_file = "epoch_record.json"
with open(json_file, "r") as jsonfile:
    json_data = json.load(jsonfile)

# Display the JSON data with formatting
pprint(json_data)


{'Example 1': {'f1': 0.8131290674209595,
               'generated_answer': 'berlin',
               'generation_time': 1.2354254722595215,
               'model_id': 'google/flan-t5-large',
               'model_load_time': 64.29524898529053,
               'pipeline_load_time': 0.1414036750793457,
               'precision': 0.8048427104949951,
               'project_name': 'project1',
               'prompt': 'What is the capital of Germany?',
               'recall': 0.821587860584259,
               'reference_answer': 'The capital of Germany is Berlin',
               'timestamp': 1689911482.2889998,
               'tokenizer_load_time': 2.0048046112060547,
               'type': 'Knowledge Retrieval',
               'user_id': 'jbottum'},
 'Example 10': {'f1': 0.8048513531684875,
                'generated_answer': 'about 425 km/230 nm to the west of Fort '
                                    'Myers in Florida, and it is about 550 '
                                    'km/297 n