# Judge Judy 
Judge Judy relies on OpenAI to provide evaluations. This notebook only works under **https** mode because it must call out to the OpenAI servers.

CAUTION: Your **OPENAI_API_KEY** will be displayed in this notebook.

Please copy this example and customize it for your own purposes!

In [9]:
OPENAI_API_KEY='your-key-here'
QUEPID_API_QEY='your-api-here'
TEAM_ID=1
BOOK_ID=25
JUDGE_EMAIL='judge-judy-5@quepid.com'
LIMIT_TO_BE_JUDGED=10
judgement_counter = 0

In [2]:
import json

from IPython.display import HTML, JSON, Markdown
from js import Object, fetch
from pyodide.ffi import to_js

In [3]:
# Generic GET call to a JSON endpoint 
async def get_json(url):
    resp = await fetch(url)
    resp_text = await resp.text()
    return json.loads(resp_text)

# Generic POST call that sends a JSON payload, authenticated with the Quepid API key
async def post_json(url, payload):
    resp = await fetch(url,
      method= "POST",
      body= json.dumps(payload),
      credentials= "same-origin",
      headers= Object.fromEntries(to_js({  "Content-Type":"application/json","Authorization": "Bearer " + QUEPID_API_QEY })),
    )
    resp_text = await resp.text()
    return json.loads(resp_text)
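
The two helpers above parse whatever text comes back, so an expired key or a server error surfaces only as a confusing JSON or `KeyError` failure later on. One possible hardening, offered as a sketch rather than as part of the original notebook, is to check the HTTP status before parsing. `ApiError` and `parse_json_response` are hypothetical names introduced here for illustration.

```python
import json


class ApiError(Exception):
    """Hypothetical error type for non-2xx or non-JSON responses."""


def parse_json_response(status, text):
    # Treat anything outside 200-299 as a failure and include part of the body for debugging.
    if not 200 <= status < 300:
        raise ApiError(f"HTTP {status}: {text[:200]}")
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # A 2xx response that is not JSON (for example an HTML login page) is also an error.
        raise ApiError(f"Response was not valid JSON: {text[:200]}") from exc
```

Inside `get_json` or `post_json`, this could replace the final `json.loads` call, for example `return parse_json_response(resp.status, await resp.text())`, since the browser's fetch `Response` object exposes a `status` property.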


## Step 2: Extract and Prepare Data for Judging

Make sure the judge user identified by `JUDGE_EMAIL` exists. If no matching user is found, invite one to the team.

In [4]:
users = await get_json(f'/api/users?prefix={JUDGE_EMAIL}')
if not users['users']:
    print(f'CREATING NEW JUDGE {JUDGE_EMAIL}')
    user = await post_json(f'/api/teams/{TEAM_ID}/members/invite', {'id': JUDGE_EMAIL})
else:
    user = users['users'][0]

judge_id = user['id']
    
Markdown(f"We will be generating judgements for {JUDGE_EMAIL}, judge_id: {judge_id}")
    

We will be generating judgements for judge-judy-5@quepid.com, judge_id: 23

Find out how many query/doc pairs exist for the book. The notebook does not yet report how many of them Judy has already judged.

In [5]:
query_doc_pairs = await get_json(f'/api/books/{BOOK_ID}/query_doc_pairs')
Markdown(f"There are {len(query_doc_pairs['query_doc_pairs'])} query/doc pairs, I wish I could tell you how many have already been judged by {JUDGE_EMAIL}")


There are 2420 query/doc pairs, I wish I could tell you how many have already been judged by judge-judy-5@quepid.com

In [6]:
model = {
    'model': 'gpt-4-turbo-preview',
    'max_tokens': 2048,
    'top_p': 0.1,
    'seed': 1,
    'frequency_penalty': 0,
    'presence_penalty': 0,
    'response_format': {
        'type': 'json_object'
    }
}
system_message = {
    'role': 'system',
    'content': 'You are a helpful AI assistant.'
}
    

## Step 3: Judge Just Like a Human, Using the Same Sort of Randomized Selection Process

In [7]:
async def run_judgement(judge_id):    
    query_doc_pair = await get_json(f'/api/books/{BOOK_ID}/query_doc_pairs/to_be_judged/{judge_id}')
    query_text = query_doc_pair['query_text']
    document_fields = json.loads(query_doc_pair['document_fields'])
    document = f"{document_fields['name']} {document_fields['title']}"

    judge_prompt = f"""
    Act as a judge determining to what extent a document matches the search query that it is paired with. All of the documents are related to business and finance. Your job is to understand the intent of the search query and the relevance of the document.
    The user provides:
    - Query: This is the actual search that was sent to the search engine
    - Document: Fields from the retrieved document
    Consider each attribute and how it does or does not pertain to the question. If you do not understand a term or how it is used do not try to guess. You will judge the relevance according to the following rules:
    0: The document is irrelevant or relevance cannot be determined
    1: The document is somewhat relevant and may contribute to answering the query
    2: The document is relevant to the query
    The date is 11-JAN-2024. This may affect how relevant documents are to time-based queries.
    Please reply in JSON with the following structure:
    - explanation: Why the document is relevant to the query
    - judgement: The judgement you would apply to the text from 0 to 2
    When explaining the judgement, only discuss why the document is relevant and not extraneous features of the document. Consider your answer carefully and explain your reasoning. Be strict in your assessment.
    Query: {query_text}
    Document: {document}
    """

    resp = await fetch('https://api.openai.com/v1/chat/completions',
      method= "POST",
      body= json.dumps({**model, 'messages': [system_message, {'role': 'user', 'content': judge_prompt}]}),
      credentials= "same-origin",
      headers= Object.fromEntries(to_js({  "Content-Type":"application/json","Authorization": "Bearer " + OPENAI_API_KEY })),
    )
    res = await resp.text()
    response = json.loads(res)
    #JSON(response)
    content_json = response['choices'][0]['message']['content']
    judgement = json.loads(content_json)
    print(f"Judged a {str(judgement['judgement'])} because {judgement['explanation']}")
    #s = "Judged a <b>" + str(judgement['judgement']) + "</b> because <i>" + judgement['explanation'] + '</i>'

    #display(HTML(s))


    judgement = await post_json(f"/api/books/{BOOK_ID}/judgements/", {'judgement': {'query_doc_pair_id':query_doc_pair['id'],'rating':judgement['judgement'], 'user_id': judge_id, 'explanation':judgement['explanation']}})
    #print(judgement) 
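
`run_judgement` trusts the model's reply and posts whatever rating it contains. A minimal sketch of a validation step is shown below, assuming the reply follows the structure requested in the prompt (`judgement` from 0 to 2 plus an `explanation`). `parse_judgement` and `VALID_RATINGS` are hypothetical names, not part of the notebook or of any API.

```python
import json

VALID_RATINGS = {0, 1, 2}  # the scale defined in the judge prompt


def parse_judgement(content_json):
    """Return (rating, explanation), or raise ValueError if the reply is unusable."""
    data = json.loads(content_json)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    rating = data.get("judgement")
    # In case the model returns the number as a string such as "2", accept that form too.
    if isinstance(rating, str) and rating.strip().isdigit():
        rating = int(rating.strip())
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(rating, bool) or not isinstance(rating, int) or rating not in VALID_RATINGS:
        raise ValueError(f"Unexpected judgement value: {data.get('judgement')!r}")
    explanation = str(data.get("explanation", "")).strip()
    return rating, explanation
```

Calling `rating, explanation = parse_judgement(content_json)` before the `post_json` call would keep out-of-range ratings from being stored in the book.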


In [8]:
while (judgement_counter < LIMIT_TO_BE_JUDGED):
    judgement_counter = judgement_counter + 1
    print(f'Making Judgment {judgement_counter}')
    await run_judgement(judge_id)
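
As written, the loop stops at the first exception, so one malformed reply ends the whole batch. The sketch below shows one way to keep going and count failures instead. `run_batch` and `judge_once` are hypothetical names, and the stub under `__main__` only stands in for `run_judgement`.

```python
import asyncio


async def run_batch(judge_once, limit):
    """Call judge_once() up to `limit` times and return (succeeded, failed) counts."""
    succeeded = failed = 0
    for attempt in range(1, limit + 1):
        try:
            await judge_once()
            succeeded += 1
        except Exception as exc:  # keep going so one bad reply does not end the batch
            failed += 1
            print(f"Judgement {attempt} failed: {exc}")
    return succeeded, failed


if __name__ == "__main__":
    # Stub coroutine standing in for run_judgement(judge_id).
    async def always_ok():
        return None

    print(asyncio.run(run_batch(always_ok, 3)))
```

In the notebook, where an event loop is already running, the call would be awaited directly, for example `await run_batch(lambda: run_judgement(judge_id), LIMIT_TO_BE_JUDGED)`.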

Making Judgment 1
Judged a 1 because The document describes a product (a mobile phone case) specifically designed for the iPhone XR, not the iPhone X. While both are models of iPhones, the query specifically searches for the iPhone X, making the document only somewhat relevant due to the focus on a different, though related, iPhone model.
Making Judgment 2
Judged a 1 because The document describes a specific smartphone model, the Motorola Moto G 8, providing details about its screen size, memory capacity, SIM capability, connectivity, color, operating system, and battery life. These details directly address the search query for 'smartphone' by presenting a relevant product in the smartphone category. However, the document does not cover a range of smartphones or provide comparative information, which might be expected if the user's intent was to explore various options or learn about smartphones in general. The document's focus on a single product makes it highly relevant to someone sp

  _This notebook was last updated 11-MAR-2024_