# Rule-Based Evals

Use simple rules to evaluate the LLM's response.

---

Pros:
- Quick to implement
- Cheap to run
- Fast to run

Cons:
- Can't express complex criteria; keyword and pattern checks only go so far
- Naive: passing a rule does not mean the response is actually good

**Suitable for CI pipelines to quickly check for any regression without incurring too much cost.**
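A rule in this sense can be as small as a deterministic predicate over the response text. The sketch below is illustrative only: `mentions_any`, `within_length`, and the canned response are hypothetical names that do not appear elsewhere in this notebook, and the `test_` function assumes a runner such as pytest collects it in CI.

```python
from typing import Iterable


def mentions_any(response: str, keywords: Iterable[str]) -> bool:
    """Rule: the response mentions at least one keyword (case-insensitive)."""
    text = response.lower()
    return any(keyword.lower() in text for keyword in keywords)


def within_length(response: str, max_chars: int) -> bool:
    """Rule: the response stays within a character budget."""
    return len(response) <= max_chars


# Hypothetical canned response; in a real pipeline this would come from the LLM.
canned_response = "Question 1: Who painted the Mona Lisa?"


def test_canned_response_passes_rules():
    # Each rule is a plain assert, so a failure fails the CI job.
    assert mentions_any(canned_response, ["mona lisa", "louvre"])
    assert within_length(canned_response, max_chars=2000)
```

Because the rules are pure functions, they cost nothing to run; only the call that produces the response hits the model.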

## Setup

In [3]:
import openai
import os
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())
openai.api_type = os.environ.get("OPENAI_API_TYPE")
openai.api_base = os.environ.get("OPENAI_API_BASE")
openai.api_key = os.environ.get("OPENAI_API_KEY")
openai.api_version = os.environ.get("OPENAI_API_VERSION")

## LLM

In [None]:
from langchain.chat_models import AzureChatOpenAI

llm = AzureChatOpenAI(
    deployment_name="gpt40125",
    temperature=0,
)

## Sample app: AI-powered quiz generator

We are going to build an AI-powered quiz generator.

### Create dataset

In [7]:
quiz_bank = """
1. Subject: Leonardo DaVinci
   Categories: Art, Science
   Facts:
    - Painted the Mona Lisa
    - Studied zoology, anatomy, geology, optics
    - Designed a flying machine
  
2. Subject: Paris
   Categories: Art, Geography
   Facts:
    - Location of the Louvre, the museum where the Mona Lisa is displayed
    - Capital of France
    - Most populous city in France
    - Where Radium and Polonium were discovered by scientists Marie and Pierre Curie

3. Subject: Telescopes
   Category: Science
   Facts:
    - Device to observe different objects
    - The first refracting telescopes were invented in the Netherlands in the 17th Century
    - The James Webb space telescope is the largest telescope in space. It uses a gold-beryllium mirror

4. Subject: Starry Night
   Category: Art
   Facts:
    - Painted by Vincent van Gogh in 1889
    - Captures the east-facing view of van Gogh's room in Saint-Rémy-de-Provence

5. Subject: Physics
   Category: Science
   Facts:
    - The sun doesn't change color during sunset.
    - Water slows the speed of light
    - The Eiffel Tower in Paris is taller in the summer than the winter due to expansion of the metal.
"""

In [57]:
delimiter = "####"

system_message = f"""
Follow these steps to generate a customized quiz for the user.
The question will be delimited with four hashtags, i.e. {delimiter}

The user will provide a category that they want to create a quiz for. Any questions included in the quiz
should only refer to the category.

Step 1:{delimiter} First identify the category the user is asking about from the following list:
* Geography
* Science
* Art

Step 2:{delimiter} Determine the subjects to generate questions about. The list of subjects is below:

{quiz_bank}

Pick up to two subjects that fit the user's category. 

Step 3:{delimiter} Generate a quiz for the user. Based on the selected subjects generate 3 questions for the user using the facts about the subject.

Use the following format for the quiz:
Question 1:{delimiter} <question 1>

Question 2:{delimiter} <question 2>

Question 3:{delimiter} <question 3>

"""

### Create chain

In [9]:
from langchain.schema.output_parser import StrOutputParser
from langchain.prompts import ChatPromptTemplate


def assistant_chain(
    system_message,
    user_message,
    llm=llm,
    output_parser=StrOutputParser()
):
    # Build a two-message prompt, then pipe it through the model and the parser.
    chat_prompt = ChatPromptTemplate.from_messages([
        ("system", system_message),
        ("human", user_message),
    ])
    return chat_prompt | llm | output_parser
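One thing to watch with this chain: LangChain prompt templates use f-string-style placeholders by default, so any literal `{` or `}` in `system_message` would be read as an input variable, and `invoke({})` would then fail on the missing value. The quiz bank here contains no braces, so the chain works as written. If a dataset did contain them (JSON facts, for example), a small helper along these lines could escape them first. `escape_braces` is a hypothetical name, not part of LangChain.

```python
def escape_braces(text: str) -> str:
    """Double literal braces so an f-string-style template treats them as text."""
    return text.replace("{", "{{").replace("}", "}}")


# Plain str.format shows the effect: doubled braces render as single braces.
raw_fact = 'Facts: {"painter": "van Gogh"}'
escaped = escape_braces(raw_fact)
print(escaped.format())
```

The escaped string would be passed to `assistant_chain` in place of the raw one.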

In [13]:
assistant = assistant_chain(
    system_message=system_message,
    user_message="Generate a quiz about science."
)
assistant.invoke({})
    

'#### Science\n\n#### Leonardo DaVinci, Telescopes, Physics\n\n#### Question 1:#### What did Leonardo DaVinci design that was centuries ahead of its time?\nA) A submarine\nB) A flying machine\nC) A digital computer\nD) A solar-powered car\n\n#### Question 2:#### Which telescope is known as the largest telescope in space and uses a gold-beryllium mirror?\nA) Hubble Space Telescope\nB) Spitzer Space Telescope\nC) James Webb Space Telescope\nD) Kepler Space Telescope\n\n#### Question 3:#### Which of the following statements about the Eiffel Tower is true?\nA) It moves closer to the sun in the summer\nB) It is taller in the summer than in the winter\nC) It changes color depending on the season\nD) It was originally designed as a giant telescope'

### Evaluations

#### Test if the generated questions contain an expected word

In [17]:
def eval_expected_words(
    system_message,
    user_message,
    expected_words,
    llm=llm,
    output_parser=StrOutputParser()
):
    
    assistant = assistant_chain(
        system_message=system_message,
        user_message=user_message,
        llm=llm,
        output_parser=output_parser
    )

    answer = assistant.invoke({})
    print(answer)

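    # Compare case-insensitively; one matching word is enough to pass.
    answer_lower = answer.lower()
    assert any(word.lower() in answer_lower for word in expected_words), (
        f"Expected the assistant's questions to include at least one of {expected_words}, but they did not."
    )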

In [46]:
user_message = "Generate a quiz about science."
expected_words = ["davinci", "telescope", "physics", "curie"]

In [19]:
eval_expected_words(
    system_message=system_message,
    user_message=user_message,
    expected_words=expected_words
)

#### Science

#### Telescopes, Physics

#### Question 1:#### What material is used in the mirror of the James Webb space telescope?
- A) Silver
- B) Gold-Beryllium
- C) Aluminum
- D) Titanium

#### Question 2:#### Which of the following statements about light is true?
- A) The sun changes color during sunset.
- B) Water increases the speed of light.
- C) The Eiffel Tower is shorter in the summer due to the contraction of metal.
- D) Water slows the speed of light.

#### Question 3:#### When were the first refracting telescopes invented?
- A) 16th Century
- B) 17th Century
- C) 18th Century
- D) 19th Century
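The same approach extends to format rules. The system message asks for three questions, each written as `Question N:` followed by the delimiter, so a regular expression can count them. This is a minimal sketch; `count_questions` and `check_question_count` are hypothetical helpers, and the sample string is abbreviated from the output above.

```python
import re

DELIMITER = "####"  # same delimiter the system message uses


def count_questions(answer: str, delimiter: str = DELIMITER) -> int:
    """Count 'Question N:<delimiter>' markers in the answer."""
    pattern = rf"Question \d+:{re.escape(delimiter)}"
    return len(re.findall(pattern, answer))


def check_question_count(answer: str, expected: int = 3) -> None:
    """Fail if the answer does not contain the expected number of questions."""
    found = count_questions(answer)
    assert found == expected, f"Expected {expected} questions, found {found}."


sample = (
    "#### Question 1:#### What material is used in the mirror?\n\n"
    "#### Question 2:#### Which statement about light is true?\n\n"
    "#### Question 3:#### When were refracting telescopes invented?"
)
check_question_count(sample)
```

A check like this catches regressions where a prompt change makes the model drop a question or abandon the delimiter.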


#### Test if the app declines to generate questions when the category is not in the dataset

In [20]:
def evaluate_refusal(
    system_message,
    user_message,
    decline_response,
    llm=llm,
    output_parser=StrOutputParser()
):
    
    assistant = assistant_chain(
        system_message=system_message,
        user_message=user_message,
        llm=llm,
        output_parser=output_parser
    )
  
    answer = assistant.invoke({})
    print(answer)
  
    assert decline_response.lower() in answer.lower(), (
        f"Expected the bot to decline with '{decline_response}' got {answer}"
    )

In [52]:
user_message = "Generate a quiz about Biology."
decline_response = "I am sorry"

In [53]:
evaluate_refusal(
    system_message=system_message,
    user_message=user_message,
    decline_response=decline_response
)

Since Biology isn't directly listed in the provided categories or subjects, I'll adapt the closest relevant subject from the Science category to fit a Biology-themed quiz. The subject of Leonardo DaVinci includes studies in zoology and anatomy, which are relevant to Biology. 

#### Generate a quiz for the user. Based on the selected subject generate 3 questions for the user using the facts about the subject.

Question 1:#### What areas of biology did Leonardo DaVinci study?
a) Botany and microbiology
b) Zoology and anatomy
c) Genetics and evolution

Question 2:#### Leonardo DaVinci is known for his contributions to art and science. Which of the following did he design that shows his understanding of biology and physics?
a) A flying machine
b) A submarine
c) A telescope

Question 3:#### In addition to his biological studies, Leonardo DaVinci painted a famous artwork. What is the name of this painting?
a) The Starry Night
b) The Mona Lisa
c) The Last Supper


AssertionError: Expected the bot to decline with 'I am sorry' got Since Biology isn't directly listed in the provided categories or subjects, I'll adapt the closest relevant subject from the Science category to fit a Biology-themed quiz. The subject of Leonardo DaVinci includes studies in zoology and anatomy, which are relevant to Biology. 

#### Generate a quiz for the user. Based on the selected subject generate 3 questions for the user using the facts about the subject.

Question 1:#### What areas of biology did Leonardo DaVinci study?
a) Botany and microbiology
b) Zoology and anatomy
c) Genetics and evolution

Question 2:#### Leonardo DaVinci is known for his contributions to art and science. Which of the following did he design that shows his understanding of biology and physics?
a) A flying machine
b) A submarine
c) A telescope

Question 3:#### In addition to his biological studies, Leonardo DaVinci painted a famous artwork. What is the name of this painting?
a) The Starry Night
b) The Mona Lisa
c) The Last Supper

#### Update system message to accommodate the failing test case

In [58]:
system_message += """
Do not generate a quiz if the category is not in the dataset. For such scenarios, only output \
`I am sorry, this category isn't available in the dataset`.
"""

In [59]:
evaluate_refusal(
    system_message=system_message,
    user_message=user_message,
    decline_response=decline_response
)

I am sorry, this category isn't available in the dataset.
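Note that the substring check in `evaluate_refusal` would also pass if the model apologised and then produced a quiz anyway. A slightly stricter rule could require the decline phrase and the absence of any question marker. The sketch below assumes the `Question 1:` format from the system message; `is_clean_refusal` is a hypothetical helper.

```python
def is_clean_refusal(answer: str, decline_response: str = "I am sorry") -> bool:
    """True only if the answer declines and contains no quiz question."""
    text = answer.lower()
    declined = decline_response.lower() in text
    has_question = "question 1:" in text
    return declined and not has_question


refusal = "I am sorry, this category isn't available in the dataset."
apology_then_quiz = "I am sorry, but here is a quiz.\nQuestion 1:#### What is a cell?"

print(is_clean_refusal(refusal))
print(is_clean_refusal(apology_then_quiz))
```

The first call returns `True` and the second returns `False`, since the second answer still contains a question.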
