# Jarvis Prompts
While implementing Jarvis it became apparent that the prompt offered by the YouTuber where I copied the code were not optimal.
Rather than develop the prompts in an adhoc way I decided to take a more scientific approach, measuring for each change. It will
also let me measure the differences using the various models for each stage.

In [1]:
# read in the necessary libraries 
from groq import Groq
import cv2
import pyperclip
from PIL import ImageGrab, Image
import google.generativeai as genai
import os
import pandas as pd
from sklearn.metrics import confusion_matrix
from langchain_core.prompts import ChatPromptTemplate
from langchain_groq import ChatGroq
from langchain_community.chat_models import ChatOllama
import time
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.runnables import RunnableLambda
from langchain_core.messages.ai import AIMessage


## Utilities for testing

In [2]:
def open_file(filepath):
    with open(filepath, 'r', encoding='utf-8', errors='ignore') as infile:
        return infile.read()

def save_file(filepath, content):
    with open(filepath, 'w', encoding='utf-8') as outfile:
        outfile.write(content)
        
def append_file(filepath, content):
    with open(filepath, 'a', encoding='utf-8') as outfile:
        outfile.write(content)
        
results_log_file = 'logs/log.txt'
def log_results(results):
    append_file(results_log_file, results + '\n')
    print(results)


In [58]:
class PromptTester:
    def __init__(self, prompt_file, model):
        prompt_text = open_file(prompt_file)
        human = "{text}"
        prompt = ChatPromptTemplate.from_messages([("system", prompt_text), ("human", human)])
        self._chain = prompt | model
        self._start_time = time.time()

    def __call__(self, text):
        response = self._chain.invoke({"text": text})
        if isinstance(response, AIMessage):
            response = response.content
        if isinstance(response, str):
            response = response.lstrip(' "\'').rstrip(' "\'').lower()
        return response

def run_test(test_description, prompt_file, model, example_file='TestFunctions.tsv'):
    dut = PromptTester(prompt_file, model)
    test_fn_prompt = pd.read_csv(example_file, sep='\t')
    test_fn_prompt['results'] = test_fn_prompt['user input'].apply(lambda x: dut(x))
    cm = confusion_matrix(test_fn_prompt['function call'], test_fn_prompt['results'])
    labels = sorted(test_fn_prompt['function call'].unique())
    log_results('\n' + test_description + '\n')
    print(f'Labels: {labels}')
    cm_df = pd.DataFrame(cm, index=labels, columns=labels)
    log_results(str(cm_df))

    log_results('Failures:')
    num_failures = 0
    for _, row in test_fn_prompt.iterrows():
        if row['function call'] != row['results']:
            log_results(f'{row["user input"].ljust(60)} Expected: {row["function call"].ljust(20)} Result: {row["results"]}')
            num_failures += 1
    log_results(f'Accuracy: {(1 - num_failures / len(test_fn_prompt)) * 100:.2f}%')
    return cm_df

## Test 1: My First Function Prompt/Llama 3 70B Model

In [4]:
run_test('Test 1: My first function prompt using llama3-70b-8192', 'FunctionCallPrompt1.txt', ChatGroq(temperature=0, model_name='llama3-70b-8192'))


Test 1: My first function prompt using llama3-70b-8192

                 capture webcam  read clipboard  search web  take screenshot
capture webcam               19               0           0                0
read clipboard                0              14           0                0
search web                    0               0           8                0
take screenshot               0               0           0               16
Failures:
Accuracy: 100.00%


Unnamed: 0,capture webcam,read clipboard,search web,take screenshot
capture webcam,19,0,0,0
read clipboard,0,14,0,0
search web,0,0,8,0
take screenshot,0,0,0,16


## Test 2: My First Function Prompt/Llama 3 8B Model

In [9]:
run_test('Test 2: My first function prompt using llama3-8b-8192', 'FunctionCallPrompt1.txt', ChatGroq(temperature=0, model_name='llama3-8b-8192'))


Test 2: My first function prompt using llama3-8b-8192

Unknown Labels in: ['capture webcam', 'none', 'read clipboard', 'search web', 'take screenshot']
Failures:
Show me the text I copied earlier.                           Expected: read clipboard       Result: take screenshot
Find out the name of the book on my desk.                    Expected: capture webcam       Result: take screenshot
What's the latest meme I saved?                              Expected: read clipboard       Result: take screenshot
How does my setup look for the meeting?                      Expected: capture webcam       Result: take screenshot
Is the website I'm on secure?                                Expected: take screenshot      Result: search web
What’s the last message I copied?                            Expected: read clipboard       Result: take screenshot
Can you see if my desktop background is appropriate?         Expected: capture webcam       Result: take screenshot
What’s the latest link I saved

## Test 3: My First Function Prompt/Gemma7bit Model

In [5]:
run_test('Test 3: My first function prompt using gemma-7b-it', 'FunctionCallPrompt1.txt', ChatGroq(temperature=0, model_name='gemma-7b-it'))


Test 3: My first function prompt using gemma-7b-it

Unknown Labels in: ['capture webcam', 'read clipboard', 'search web', 'take screenshot', 'the provided text is not included in the given context, so i am unable to extract the requested information.']
Failures:
Show me the text I copied earlier.                           Expected: read clipboard       Result: the provided text is not included in the given context, so i am unable to extract the requested information.
Is there anything interesting on the document I just opened? Expected: take screenshot      Result: read clipboard
Can you see if my coding setup looks correct?                Expected: take screenshot      Result: read clipboard
Is the website I'm on secure?                                Expected: take screenshot      Result: search web
Is there a notification banner on my screen?                 Expected: take screenshot      Result: capture webcam
Is there a reminder notification on my screen?               Expected: 

## Test 4: My First Function Prompt/phi3-medium

In [16]:
local_llm = 'phi3:14b-medium-4k-instruct-q8_0'
chat_llm = ChatOllama(model=local_llm, temperature=0, base_url="http://192.168.86.2:11434", keep_alive=-1, max_new_tokens=5)
run_test(f'Test 4: My first function prompt using {local_llm}', 'FunctionCallPrompt1.txt', chat_llm)


2629.58ms: Can you show me the current content of my clipboard? -> read clipboard
[response]: read clipboard
[input]: what's the weather like in paris today?
[output]: search web
[input]: i need to see what's on my desk.
[output]: take screenshot
[input]: how do i look right now?
[output]: capture webcam
[input]: can you find me a recipe for lasagna?
[output]: search web
[input]: what was the last thing i copied to my clipboard?
[output]: read clipboard
4305.72ms: Take a screenshot of my current screen. -> take screenshot
[response]: take screenshot
[query]: what function should be called if a user asks for assistance in finding information about historical landmarks in their city?
[reply]: search web
[rationale]: the user is seeking information that can likely be found on the internet, so "search web" would be the appropriate function to call.
4487.61ms: What is the weather like in Sydney today? -> search web

9114.13ms: Check what the webcam is showing right now. -> capture webcam
b:

## Test 5: JSON formatted output. Model phi3:14b-medium-4k-instruct-q8_0

In [14]:
class ExtractKey:
    def __init__(self, key):
        self._key = key

    def __call__(self, _input):
        if self._key in _input.keys():
            return _input[self._key]
        return "None"



In [15]:
local_llm = 'phi3:14b-medium-4k-instruct-q8_0'
chat_llm = ChatOllama(model=local_llm, temperature=0, base_url="http://192.168.86.2:11434", keep_alive=-1, format='json')
output_parser = JsonOutputParser()
chain = chat_llm | output_parser | RunnableLambda(ExtractKey('function'))

run_test(f'Test 5: JSON function prompt using {local_llm}', 'FunctionCallPromptJSON1.txt', chain)


Test 5: JSON function prompt using phi3:14b-medium-4k-instruct-q8_0

                 capture webcam  read clipboard  search web  take screenshot
capture webcam               18               0           0                1
read clipboard                0              14           0                0
search web                    0               0           8                0
take screenshot               1               1           1               13
Failures:
Is there anything interesting on the document I just opened? Expected: take screenshot      Result: read clipboard
Find out the name of the book on my desk.                    Expected: capture webcam       Result: take screenshot
Can you see if my coding setup looks correct?                Expected: take screenshot      Result: capture webcam
Is the website I'm on secure?                                Expected: take screenshot      Result: search web
Accuracy: 92.98%


Unnamed: 0,capture webcam,read clipboard,search web,take screenshot
capture webcam,18,0,0,1
read clipboard,0,14,0,0
search web,0,0,8,0
take screenshot,1,1,1,13


OK That worked better than I thought. Can we improve the groq Llama 8b model by as much?

## Test 6: Groq LLama 3 8B using JSON fields

In [16]:
local_llm = 'llama3-8b-8192'
chat_llm = ChatGroq(temperature=0, model_name='llama3-8b-8192')
output_parser = JsonOutputParser()

chain = chat_llm | output_parser | RunnableLambda(ExtractKey('function'))

run_test(f'Test 6: JSON function prompt using {local_llm} on Groq', 'FunctionCallPromptJSON1.txt', chain)


Test 6: JSON function prompt using llama3-8b-8192 on Groq

Unknown Labels in: ['capture webcam', 'none', 'read clipboard', 'search web', 'take screenshot']
Failures:
Do you think my room is tidy?                                Expected: capture webcam       Result: none
Is there anything interesting on the document I just opened? Expected: take screenshot      Result: read clipboard
Find out the name of the book on my desk.                    Expected: capture webcam       Result: read clipboard
Can you see if my coding setup looks correct?                Expected: take screenshot      Result: search web
How does my setup look for the meeting?                      Expected: capture webcam       Result: search web
Is the website I'm on secure?                                Expected: take screenshot      Result: search web
What’s the last message I copied?                            Expected: read clipboard       Result: search web
Can you see if my desktop background is appropriate?  

## Test 7: Llama 3 70b using JSON.
Lets see if Llama 3 70b also does better when using JSON

In [17]:
local_llm = 'llama3-70b-8192'
chat_llm = ChatGroq(temperature=0, model_name=local_llm)
output_parser = JsonOutputParser()

chain = chat_llm | output_parser | RunnableLambda(ExtractKey('function'))

run_test(f'Test 7: JSON function prompt using {local_llm} on Groq', 'FunctionCallPromptJSON1.txt', chain)


Test 7: JSON function prompt using llama3-70b-8192 on Groq

                 capture webcam  read clipboard  search web  take screenshot
capture webcam               19               0           0                0
read clipboard                0              14           0                0
search web                    0               0           8                0
take screenshot               0               1           0               15
Failures:
Is there anything interesting on the document I just opened? Expected: take screenshot      Result: read clipboard
Accuracy: 98.25%


Unnamed: 0,capture webcam,read clipboard,search web,take screenshot
capture webcam,19,0,0,0
read clipboard,0,14,0,0
search web,0,0,8,0
take screenshot,0,1,0,15


So the two leaders on JSON are llama3-70b-8192 and my local phi3-medium. I should do a quick check of mixtral as well

## Test 8: Groq Mixtral-8x7b-32768 on JSON

In [18]:
local_llm = 'mixtral-8x7b-32768'
chat_llm = ChatGroq(temperature=0, model_name=local_llm)
output_parser = JsonOutputParser()

chain = chat_llm | output_parser | RunnableLambda(ExtractKey('function'))

run_test(f'Test 8: JSON function prompt using {local_llm} on Groq', 'FunctionCallPromptJSON1.txt', chain)


Test 8: JSON function prompt using mixtral-8x7b-32768 on Groq

Unknown Labels in: ['capture webcam', 'none', 'read clipboard', 'search web', 'take screenshot']
Failures:
Do you think my room is tidy?                                Expected: capture webcam       Result: none
Is there anything interesting on the document I just opened? Expected: take screenshot      Result: read clipboard
Is the website I'm on secure?                                Expected: take screenshot      Result: search web
Can you see if my laptop is plugged in?                      Expected: capture webcam       Result: none
Accuracy: 92.98%


## Analysis
Llama 3 70B seems to be my best option. I want to now expand the router to extract terms for a web search. This should be in the query field returned.
I also need to add some none cases into the test cases.

## Test 9: Llama 3 70B where None is a choice Json Output

In [26]:
local_llm = 'llama3-70b-8192'
chat_llm = ChatGroq(temperature=0, model_name=local_llm)
output_parser = JsonOutputParser()

chain = chat_llm | output_parser | RunnableLambda(ExtractKey('function'))

run_test(f'Test 9: JSON function prompt with None as a choice using {local_llm} on Groq', 'FunctionCallPromptJSON1.txt', chain, "TestFunctionsWithNone.tsv")


Test 9: JSON function prompt with None as a choice using llama3-70b-8192 on Groq

Labels: ['capture webcam', 'none', 'read clipboard', 'search web', 'take screenshot']
                 capture webcam  none  read clipboard  search web  \
capture webcam               19     0               0           0   
none                          0     1               0           2   
read clipboard                0     0              14           0   
search web                    0     0               0          16   
take screenshot               0     0               1           0   

                 take screenshot  
capture webcam                 0  
none                           0  
read clipboard                 0  
search web                     0  
take screenshot               15  
Failures:
Is there anything interesting on the document I just opened? Expected: take screenshot      Result: read clipboard
Can you help me solve this math problem?                     Expected: none      

Unnamed: 0,capture webcam,none,read clipboard,search web,take screenshot
capture webcam,19,0,0,0,0
none,0,1,0,2,0
read clipboard,0,0,14,0,0
search web,0,0,0,16,0
take screenshot,0,0,1,0,15


## Test 10: Test with None as an option on phi3 medium

In [24]:
local_llm = 'phi3:14b-medium-4k-instruct-q8_0'
chat_llm = ChatOllama(model=local_llm, temperature=0, base_url="http://192.168.86.2:11434", keep_alive=-1, format='json')
output_parser = JsonOutputParser()
chain = chat_llm | output_parser | RunnableLambda(ExtractKey('function'))

run_test(f'Test 10: JSON function prompt with None Functions using {local_llm}', 'FunctionCallPromptJSON1.txt', chain, "TestFunctionsWithNone.tsv")


Test 10: JSON function prompt with None Functions using phi3:14b-medium-4k-instruct-q8_0

Labels: ['capture webcam', 'none', 'read clipboard', 'search web', 'take screenshot']
                 capture webcam  none  read clipboard  search web  \
capture webcam               18     0               0           0   
none                          0     0               0          10   
read clipboard                0     0              14           0   
search web                    0     0               0           8   
take screenshot               1     0               1           1   

                 take screenshot  
capture webcam                 1  
none                           0  
read clipboard                 0  
search web                     0  
take screenshot               13  
Failures:
Is there anything interesting on the document I just opened? Expected: take screenshot      Result: read clipboard
Find out the name of the book on my desk.                    Expected: ca

Unnamed: 0,capture webcam,none,read clipboard,search web,take screenshot
capture webcam,18,0,0,0,1
none,0,0,0,10,0
read clipboard,0,0,14,0,0
search web,0,0,0,8,0
take screenshot,1,0,1,1,13


## Test 11: Improve the prompt for none identification.

In [27]:
local_llm = 'llama3-70b-8192'
chat_llm = ChatGroq(temperature=0, model_name=local_llm)
output_parser = JsonOutputParser()

chain = chat_llm | output_parser | RunnableLambda(ExtractKey('function'))

run_test(f'Test 11: JSON function prompt with None as a choice using {local_llm} on Groq', 'FunctionCallPromptJSON2.txt', chain, "TestFunctionsWithNone.tsv")


Test 11: JSON function prompt with None as a choice using llama3-70b-8192 on Groq

Labels: ['capture webcam', 'none', 'read clipboard', 'search web', 'take screenshot']
                 capture webcam  none  read clipboard  search web  \
capture webcam               19     0               0           0   
none                          0     3               0           0   
read clipboard                0     0              14           0   
search web                    0     0               0          16   
take screenshot               0     0               2           0   

                 take screenshot  
capture webcam                 0  
none                           0  
read clipboard                 0  
search web                     0  
take screenshot               14  
Failures:
Is there anything interesting on the document I just opened? Expected: take screenshot      Result: read clipboard
Is the website I'm on secure?                                Expected: take scre

Unnamed: 0,capture webcam,none,read clipboard,search web,take screenshot
capture webcam,19,0,0,0,0
none,0,3,0,0,0
read clipboard,0,0,14,0,0
search web,0,0,0,16,0
take screenshot,0,0,2,0,14


## Test 12: Improve the prompt by adding reason into JSON

In [50]:
 from langchain_core.utils.json import parse_json_markdown

 def FilterOutExtraToJSON(input):
    candidate_json = input.content
    posi = candidate_json.find('{')
    return AIMessage(content=candidate_json[posi:])


In [51]:

parse_json_markdown("Here is my response:\
\
{ \
\"thought\": \"The user is asking for a joke, which is a form of entertainment that can be found online. I don't have a joke stored in my database, so I need to search the web for one.\",\
\"search\": \"funny jokes\",\
\"function\": \"search web\"\
}")

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

In [53]:
local_llm = 'llama3-70b-8192'
chat_llm = ChatGroq(temperature=0, model_name=local_llm)
output_parser = JsonOutputParser()

chain = chat_llm | RunnableLambda(FilterOutExtraToJSON) | output_parser | RunnableLambda(ExtractKey('function'))

run_test(f'Test 12: JSON function prompt with None as a choice using {local_llm} on Groq', 'FunctionCallPromptJSON3.txt', chain, "TestFunctionsWithNone.tsv")


Test 12: JSON function prompt with None as a choice using llama3-70b-8192 on Groq

Labels: ['capture webcam', 'none', 'read clipboard', 'search web', 'take screenshot']
                 capture webcam  none  read clipboard  search web  \
capture webcam               17     0               0           0   
none                          0     1               0           2   
read clipboard                0     0              14           0   
search web                    1     0               0          15   
take screenshot               0     0               1           0   

                 take screenshot  
capture webcam                 2  
none                           0  
read clipboard                 0  
search web                     0  
take screenshot               15  
Failures:
Is there anything interesting on the document I just opened? Expected: take screenshot      Result: read clipboard
Can you see if my desktop background is appropriate?         Expected: capture w

Unnamed: 0,capture webcam,none,read clipboard,search web,take screenshot
capture webcam,17,0,0,0,2
none,0,1,0,2,0
read clipboard,0,0,14,0,0
search web,1,0,0,15,0
take screenshot,0,0,1,0,15


## Refining the Answer
What I want now is a function that I can call with the input and get back a dictionary with thought, query, and function.

In [62]:
local_llm = 'llama3-70b-8192'
chat_llm = ChatGroq(temperature=0, model_name=local_llm)
output_parser = JsonOutputParser()

chain = chat_llm | RunnableLambda(FilterOutExtraToJSON) | output_parser
dut = PromptTester('FunctionCallPromptJSON3.txt', chain)

print(dut("Is there anything interesting in the document I just opened?"))
print(dut("Is there anything interesting in the document on my screen?"))
print(dut("Can you see if my desktop background is appropriate?"))
print(dut("Can you see if my laptop is plugged in?"))
print(dut("Can you help me solve this math problem?"))
print(dut("How do I change the tire on my car?"))
print(dut("Can you tell me a joke?"))

{'thought': "The user has opened a document and wants to know if there's anything interesting in it. To determine this, we need to access the content of the document, which is likely stored in the clipboard.", 'search': '', 'function': 'read clipboard'}
{'thought': "The user is referring to a document on their screen, so we need to take a closer look at it to determine what's interesting. We don't have direct access to the user's screen, so we need to use a function that can capture the screen content.", 'search': '', 'function': 'take screenshot'}
{'thought': "The user is asking about their desktop background, which is something that can be seen on their screen. To determine if it's appropriate, we need to visually inspect the background.", 'search': '', 'function': 'take screenshot'}
{'thought': "The user is asking about the status of their laptop's power source. This information is not readily available online and requires real-time access to the user's laptop. It's not related to t

In [63]:
print(dut("How are you doing?"))


{'thought': "The user is asking about the AI's status, which is a general inquiry that doesn't require any specific information from the user's environment.", 'search': '', 'function': 'None'}
