# OpenAI Assistants API - Google Reviews Inspection Tool
### Agentic RAG with the Function Calling, Retrieval, and Code Interpreter tools. The tool can call an Outscraper function to extract and store the reviews of any business, then answer questions about those reviews using the Retrieval and Code Interpreter tools.


## Dependencies

We'll start, as we usually do, with some dependencies and our API key!

In [2]:
# -q stands for "quiet": pip produces less console output.
# -U stands for "upgrade": pip upgrades all specified packages to the newest available version.
! pip install -qU openai outscraper pandas



# OpenAI initialization

In [3]:
from getpass import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass("OpenAI API Key:")

# Outscraper initialization
### Outscraper is used to extract Google reviews. The Google reviews API returns only the 5 newest reviews, and the Code Interpreter analysis in this project needs all of them. Outscraper can also extract reviews for any business, not just my own. It is possible to get all reviews from the Google Business API, but that requires approval. For these reasons I used the Outscraper API; as usual, its free tier is enough for my use case.

In [7]:
from getpass import getpass
import os
from outscraper import ApiClient
import json

# Prompt the user to enter the API key
os.environ["OUTSCRAPER_API_KEY"] = getpass("Outscraper API Key:")

# Use the API key from the environment variable
outscraper_client = ApiClient(api_key=os.environ["OUTSCRAPER_API_KEY"])

# Search for the business and return reviews. I am capping it at 50 reviews for this exercise. Ideally reviews_limit will be set to unlimited.
### Search and return reviews using the Outscraper API, then store the result as JSON (the return structure is JSON anyway). The file name is the search query with "_reviews" appended.
### Ideally, a function in the Assistants API can make this call and store the file, so that two or more businesses can be cross-examined.

In [42]:
business_search = 'Poppy Kids pediatric dentistry Novato CA usa'

In [43]:
reviews = outscraper_client.google_maps_reviews(business_search, reviews_limit=50, limit=1, language='en')

In [44]:

# Remove any non-alphanumeric characters (except spaces), and convert to lowercase
normalized_string = ''.join(char.lower() for char in business_search if char.isalnum() or char.isspace())

# Replace spaces with underscores
file_name = normalized_string.replace(' ', '_') + '_reviews'

# Print or use the file name
#print(file_name)


In [45]:
# Define the full path where the JSON file will be saved.
file_path = "/Users/acrobat/Documents/GitHub/AI-Engineering-Cohort-2/Week 2/Day 2/data_reviews/" 


In [46]:
# Write the JSON data to a file
with open(file_path + file_name + '.json', 'w') as f:
    json.dump(reviews, f)
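
The naming and saving steps above could be folded into one helper. This is only a minimal sketch, not the notebook's actual code: `reviews_file_stem` and `save_reviews` are hypothetical names, the directory is passed in rather than hard-coded, and repeated spaces are collapsed into a single underscore (the cells above would emit one underscore per space).

```python
import json
import tempfile
from pathlib import Path


def reviews_file_stem(business_search: str) -> str:
    """Lowercase the query, drop punctuation, and join words with underscores."""
    kept = "".join(ch.lower() for ch in business_search if ch.isalnum() or ch.isspace())
    # split() collapses repeated whitespace, so no double underscores appear
    return "_".join(kept.split()) + "_reviews"


def save_reviews(reviews, business_search: str, directory) -> Path:
    """Write the reviews as JSON into `directory` and return the file path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{reviews_file_stem(business_search)}.json"
    with path.open("w", encoding="utf-8") as f:
        json.dump(reviews, f)
    return path


# Demo with made-up data written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    saved = save_reviews([{"name": "Example"}], "Poppy Kids pediatric dentistry Novato CA usa", tmp)
    print(saved.name)
```

Returning the `Path` means the later upload step could reuse it instead of retyping the full file location.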

In [14]:
# Chatbot Instructions for Poppy Google Reviews Bot.
name = "Poppy" # @param {type: "string"}
instructions = """
Purpose: Poppy is designed to assist business owners and customers by extracting and providing detailed insights from Google reviews. Poppy helps users understand customer sentiments, identify common themes, and respond to frequently asked questions derived from the reviews.

Capabilities:

Review Summarization: Poppy can quickly summarize the key points of reviews, highlighting the most mentioned positives and negatives.
Trend Identification: Poppy identifies trends in customer feedback over different periods, helping users see changes in customer satisfaction.
Question Extraction: Poppy extracts and answers common questions that customers have about a business, based on the reviews.
Sentiment Analysis: Poppy analyzes the sentiment of reviews to gauge overall customer satisfaction and alert users to any significant shifts in public perception.
Personality: Poppy is friendly, helpful, and always eager to provide insights in a straightforward and accessible manner. Poppy is designed to be approachable, making complex data easy to understand for everyone.

Interaction Style:

Poppy actively listens to user queries and responds with information extracted from Google reviews.
Poppy uses clear, concise language to ensure that all users can easily understand the insights derived from the reviews.
Poppy is proactive in offering suggestions on how to improve customer satisfaction based on review trends and common feedback.
Example Interaction:

User: "Poppy, what do customers like most about our service last month?"
Poppy: "From last month’s reviews, customers really appreciated the quick service and friendly staff. There was a notable positive sentiment about the cleanliness of the premises too."
Limitations:

Poppy relies on the available data from Google reviews and does not generate opinions or advice beyond the extracted information.
Poppy can only provide information as accurately as the reviews allow; it does not interpret ambiguous feedback without clear context.
"""  # @param {type: "string"}
model = "gpt-4-turbo" # @param ["gpt-3.5-turbo", "gpt-4-turbo-preview", "gpt-4"]

### OpenAI Client

In [16]:
from openai import OpenAI

client = OpenAI()

In [23]:
print(file_path + file_name)

/Users/acrobat/Documents/GitHub/AI-Engineering-Cohort-2/Week 2/Day 2/data_reviews/poppy_kids_pediatric_dentistry_novato_ca_usa_reviews


In [29]:
# Need to update this part for multiple documents. For now it just uploads the single file. I have to move on from this madness.
# The with block closes the local file once the upload has finished.
with open('/Users/acrobat/Documents/GitHub/AI-Engineering-Cohort-2/Week 2/Day 2/data_reviews/poppy_kids_pediatric_dentistry_novato_ca_usa_reviews.json', "rb") as reviews_file:
    file_reference = client.files.create(
        file=reviews_file,
        purpose='assistants'
    )

In [30]:
# For now, one file for retrieval, with the retrieval and code interpreter tools.
assistant = client.beta.assistants.create(
    
    name=name + " + Google Reviews New",
    instructions=instructions,
    tools=[
        {"type": "code_interpreter"},
        {"type": "retrieval"},
       # {"type": "function", "function" : ddg_function}
    ],
    model=model,
    file_ids=[file_reference.id],
)

In [33]:
import json
import time

def wait_for_run_completion(thread_id, run_id):
    """
    Waits for the completion of a run identified by the given thread ID and run ID.

    Args:
        thread_id (str): The ID of the thread containing the run.
        run_id (str): The ID of the run to wait for.

    Returns:
        dict: The details of the completed run.

    Raises:
        None

    """
    while True:
        time.sleep(1)
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        print(f"Current run status: {run.status}")
        # Stop on every state where the run no longer progresses on its own
        if run.status in ['completed', 'failed', 'cancelled', 'expired', 'requires_action']:
            return run

def submit_tool_outputs(thread_id, run_id, tools_to_call):
    """
    Submits the tool outputs to the specified thread and run.

    Args:
        thread_id (str): The ID of the thread to submit the tool outputs to.
        run_id (str): The ID of the run to submit the tool outputs to.
        tools_to_call (list): A list of tools to call and retrieve outputs from.

    Returns:
        dict: A dictionary containing the submitted tool outputs.

    """
    tool_output_array = []
    for tool in tools_to_call:
        output = None
        tool_call_id = tool.id
        function_name = tool.function.name
        function_args = tool.function.arguments

        if function_name == "duckduckgo_search":
            print("Consulting Duck Duck Go...")
            output = duckduckgo_search(query=json.loads(function_args)["query"])

        if output:
            tool_output_array.append({"tool_call_id": tool_call_id, "output": output})

    print(tool_output_array)

    return client.beta.threads.runs.submit_tool_outputs(
        thread_id=thread_id,
        run_id=run_id,
        tool_outputs=tool_output_array
    )

def print_messages_from_thread(thread_id):
    """
    Prints the messages from a given thread.

    Args:
        thread_id (str): The ID of the thread.

    Returns:
        None
    """
    messages = client.beta.threads.messages.list(thread_id=thread_id)
    for msg in messages:
        print(f"{msg.role}: {msg.content[0].text.value}")

def use_assistant(query, assistant_id, thread_id=None):
    """
    Uses the OpenAI Assistant to interact with the AI model.

    Args:
        query (str): The user's query or message.
        assistant_id (str): The ID of the OpenAI Assistant.
        thread_id (str, optional): The ID of the thread to use. If not provided, a new thread will be created.

    Returns:
        str: The ID of the thread used for the conversation.
    """
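    # Reuse the given thread if one was passed in; otherwise start a new one
    if thread_id:
        thread = client.beta.threads.retrieve(thread_id)
    else:
        thread = client.beta.threads.create()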

    message = client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content=query,
    )

    print("Creating Assistant ")

    run = client.beta.threads.runs.create(
        thread_id=thread.id,
        assistant_id=assistant_id,
    )

    print("Querying OpenAI Assistant Thread.")

    run = wait_for_run_completion(thread.id, run.id)

    if run.status == 'requires_action':
        run = submit_tool_outputs(thread.id, run.id, run.required_action.submit_tool_outputs.tool_calls)
        run = wait_for_run_completion(thread.id, run.id)

    print_messages_from_thread(thread.id)

    return thread.id
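
The polling loop in `wait_for_run_completion` has no upper bound on how long it waits. A minimal sketch of the same idea with a timeout is shown here. `poll_until_settled` and `SETTLED_STATUSES` are hypothetical names, and the status source is a plain callable so the example runs without any API call.

```python
import time


# States in which the run either finished or is waiting on us
SETTLED_STATUSES = {"completed", "failed", "cancelled", "expired", "requires_action"}


def poll_until_settled(get_status, timeout_s=120.0, interval_s=1.0, sleep=time.sleep):
    """Call `get_status()` until it returns a settled status or the timeout passes.

    `get_status` is any zero-argument callable returning a status string.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in SETTLED_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run still '{status}' after {timeout_s} seconds")
        sleep(interval_s)


# Simulated status sequence instead of a real API call
statuses = iter(["queued", "in_progress", "in_progress", "completed"])
print(poll_until_settled(lambda: next(statuses), interval_s=0.0))
```

With the real client, `get_status` could be something like `lambda: client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id).status`.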


In [34]:
use_assistant("How many reviews are there for Poppy Kids Pediatric Dentistry?", assistant.id)

Creating Assistant 
Querying OpenAI Assistant Thread.
Current run status: in_progress
Current run status: in_progress
Current run status: completed
assistant: Poppy Kids Pediatric Dentistry has a total of 90 reviews.
user: How many reviews are there for Poppy Kids Pediatric Dentistry?


'thread_sM5oOSv0ScKT3OAjG7taq4YA'

In [35]:
# This was interesting. Since the overall review score exists as a line item in the JSON file, the LLM extracted it and used it as the response to the question.
# Although only 50 reviews were extracted, the LLM used the readily available value instead of calculating the average from the individual review scores.
use_assistant("What is the average review score?", assistant.id)

Creating Assistant 
Querying OpenAI Assistant Thread.
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: completed
assistant: The average review score for Poppy Kids Pediatric Dentistry is 5/5, based on a total of 90 reviews.
user: What is the average review score?


'thread_zsNunuvI3fag2Z6ZooVaTSwW'
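
The difference noted in the comments above can be checked locally by comparing the stored overall rating with the mean of the extracted reviews. This is a rough sketch on made-up data. The key names `rating`, `reviews_data`, and `review_rating` are assumptions about the layout of the saved JSON and should be checked against the real file.

```python
from statistics import mean


def compare_ratings(place: dict) -> dict:
    """Compare the stored overall rating with the mean of the extracted reviews.

    Assumed (hypothetical) keys: "rating", "reviews_data", "review_rating".
    """
    scores = [
        r["review_rating"]
        for r in place.get("reviews_data", [])
        if r.get("review_rating") is not None
    ]
    return {
        "listed_rating": place.get("rating"),
        "extracted_count": len(scores),
        "extracted_mean": round(mean(scores), 2) if scores else None,
    }


# Made-up sample, not real review data
sample = {
    "rating": 4.9,
    "reviews_data": [
        {"review_rating": 5},
        {"review_rating": 4},
        {"review_rating": 5},
    ],
}
print(compare_ratings(sample))
```

If the two numbers differ, asking the assistant explicitly to "calculate the average from the individual review scores" may push it toward the Code Interpreter instead of the stored summary value.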

In [37]:
# The word "events" is never used in the exported JSON file. Still, the LLM correctly retrieved the information.
use_assistant("Are there any posts about events?", assistant.id)

Creating Assistant 
Querying OpenAI Assistant Thread.
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: completed
assistant: Yes, there are several posts about events hosted by Poppy Kids Pediatric Dentistry. Here are some examples:

1. **Poppy Planting Party** - A family-friendly gardening event where children can plant poppies and decorate pots to take home.
2. **Mom's Morning Out with Poppy Kids & The Mom Walk Collective** - A morning event held on January 27th, focused on providing a delightful experience for moms and children.
3. **Valentine's Day Card Making Event** - Held on February 11th, this event invites children to craft love-filled Valentine's cards with various decorative materials.
4. **Holiday Cookie Decorating** - An event on December 2nd featuring cookie decorating for festive fun.

These events focus on engaging children and families in cr

'thread_lxm5QH5FdaGFB23rXW6vW30U'

In [38]:
# The review_timestamp is a Unix (epoch) timestamp. The Code Interpreter tool correctly parsed the timestamp and converted it to a human-readable format.
use_assistant("Count the number of reviews by month and year.", assistant.id)

Creating Assistant 
Querying OpenAI Assistant Thread.
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: completed
assistant: Here is the count of reviews by month and

'thread_XBnI0fiOsGYYz84N7vn3QBmI'
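
For reference, the conversion the Code Interpreter had to perform looks roughly like this. It is a minimal sketch on made-up timestamps. It assumes the values are in seconds and groups them in UTC; `reviews_per_month` is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime, timezone


def reviews_per_month(timestamps):
    """Count reviews per year-month from Unix timestamps given in seconds."""
    counts = Counter()
    for ts in timestamps:
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
        counts[dt.strftime("%Y-%m")] += 1
    # Sort by the "YYYY-MM" key so the result reads chronologically
    return dict(sorted(counts.items()))


# Made-up timestamps for illustration
sample_timestamps = [1704067200, 1706745600, 1706832000]
print(reviews_per_month(sample_timestamps))  # {'2024-01': 1, '2024-02': 2}
```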

In [39]:
# Two questions in one query: the answer to one exists in the file and the answer to the other does not.
use_assistant("What is the address of the business? and when are they open?", assistant.id)

Creating Assistant 
Querying OpenAI Assistant Thread.
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: completed
assistant: The business, Poppy Kids Pediatric Dentistry, is located at **450 Ignacio Blvd, Novato, CA 94949**.

Unfortunately, the hours of operation are not provided in the document. For this information, it may be beneficial to visit their official site or directly contact them via the provided phone number.
user: What is the address of the business? and when are they open?


'thread_FYHFzGsymnRuObjKvFqKrPGZ'

# Multiple review files

# Let's try it with two data files: reviews from two pediatric dentistries. See if the Code Interpreter can answer questions about each business first and then compare the businesses to each other.
### Use the same assistant but add another file to the mix.

In [41]:
# Run a new business search, create a new file_name, then pull the reviews and write them to a JSON file.
business_search = 'Novato Childrens Dentistry Novato CA usa'
reviews = outscraper_client.google_maps_reviews(business_search, reviews_limit=50, limit=1, language='en')

# Remove any non-alphanumeric characters (except spaces), and convert to lowercase
normalized_string = ''.join(char.lower() for char in business_search if char.isalnum() or char.isspace())

# Replace spaces with underscores
file_name = normalized_string.replace(' ', '_') + '_reviews'

with open(file_path + file_name + '.json', 'w') as f:
    json.dump(reviews, f)

# Attention:
# Although I asked for Novato Childrens Dentistry, Outscraper returned Novato Pediatric Dentistry. So Novato Pediatric Dentistry supersedes Novato Children's.
# This also impacted the file name since I am using the business search string to create the file name. These are the limitations of the current implementation and must be addressed.
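
One way to address the limitation described above is to name the file after the business that was actually returned and to flag a mismatch with the search string. This is a sketch under assumptions: the `name` key is a guess at the response layout, and both helper names are hypothetical.

```python
import re


def file_stem_for_result(result: dict, business_search: str) -> str:
    """Prefer the business name in the API result over the search string.

    The "name" key is an assumption about the response layout.
    """
    label = result.get("name") or business_search
    slug = re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")
    return f"{slug}_reviews"


def name_matches_query(result: dict, business_search: str) -> bool:
    """Rough check: does every word of the returned name appear in the query?"""
    query_words = set(re.findall(r"[a-z0-9]+", business_search.lower()))
    name_words = set(re.findall(r"[a-z0-9]+", result.get("name", "").lower()))
    return bool(name_words) and name_words <= query_words


# Made-up result mirroring the mismatch described above
result = {"name": "Novato Pediatric Dentistry"}
query = "Novato Childrens Dentistry Novato CA usa"
print(file_stem_for_result(result, query))
print(name_matches_query(result, query))
```

A `False` from `name_matches_query` would be a reasonable point to warn the user before the file is uploaded.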

# I want to update the existing assistant instead of creating a new assistant

In [48]:
print(assistant.id) # print assistant id
existing_file_ids = [file_reference.id]
print(existing_file_ids)

asst_c7xCiqsJ4pXFYbBtFyak6woh
['file-p5ECEuYUaUb1hYfChUTuqOBL']


In [50]:
# Upload the second reviews file. The with block closes the local file once the upload has finished.
with open('/Users/acrobat/Documents/GitHub/AI-Engineering-Cohort-2/Week 2/Day 2/data_reviews/novato_childrens_dentistry_novato_ca_usa_reviews.json', "rb") as reviews_file:
    new_file_reference = client.files.create(
        file=reviews_file,
        purpose='assistants'
    )

In [51]:
# Append the new file ID
existing_file_ids.append(new_file_reference.id)

# Update the assistant
updated_assistant = client.beta.assistants.update(
    assistant_id=assistant.id,
    file_ids=existing_file_ids  # Updated list of file IDs
)

# Hooray, it works. I can see it in the Assistants playground.

In [52]:
# Let's ask some questions. The first is to see if the LLM can differentiate between the two files and retrieve the different businesses.
use_assistant("How many different businesses are there in the files and what are their names?", assistant.id)

Creating Assistant 
Querying OpenAI Assistant Thread.
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: completed
assistant: There are two different businesses mentioned in the files:

1. Poppy Kids Pediatric Dentistry【9†source】
2. Novato Children's Dentistry【13†source】
user: How many different businesses are there in the files and what are their names?


'thread_LpR7ZwNhZFcfQPat8eKrL34y'

In [53]:
use_assistant("How many reviews are there for each business and what are the overall review score for business?", assistant.id)

Creating Assistant 
Querying OpenAI Assistant Thread.
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: completed
assistant: Novato Children's Dentistry in Novato, CA has **200 reviews** with an **overall review score of 4.9 stars** out of 5.

To recap, here's the information for both businesses:
1. Poppy Kids Pediatric Dentistry: 90 reviews, 5.0 stars.
2. Novato Children's Dentistry: 200 reviews, 4.9 stars.
assistant: Poppy Kids Pediatric Dentistry in Novato, CA has **90 reviews** with an **overall review score of 5.0 stars** out of 5.

Now, I will check the information for the other business.
user: How many reviews are there for each business and what are the overall review score for business?


'thread_VmUxrHXJWtodBNApemnxP9Da'

# Question to AI Makerspace: why is the returned information in a weird order? How am I supposed to read the output above?
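
A likely explanation: the message list endpoint returns messages newest first by default, so a two-part assistant answer prints in reverse, followed by the user's question. A minimal sketch of printing in chronological order is shown here on stand-in objects. `format_messages_oldest_first` and `_fake` are hypothetical helpers, not part of the SDK.

```python
from types import SimpleNamespace


def format_messages_oldest_first(messages_newest_first):
    """Return 'role: text' lines in chronological order.

    Expects message-like objects with `.role` and `.content`, where each content
    part may carry `.text.value` (non-text parts such as images are skipped).
    """
    lines = []
    for msg in reversed(list(messages_newest_first)):
        for part in msg.content:
            text = getattr(part, "text", None)
            if text is not None:
                lines.append(f"{msg.role}: {text.value}")
    return lines


def _fake(role, value):
    # Stand-in for an API message object, for local illustration only
    return SimpleNamespace(role=role, content=[SimpleNamespace(text=SimpleNamespace(value=value))])


newest_first = [_fake("assistant", "Second answer"), _fake("assistant", "First answer"), _fake("user", "Question")]
for line in format_messages_oldest_first(newest_first):
    print(line)
```

With the real client, passing `order="asc"` to `client.beta.threads.messages.list(...)` should give the same ordering without reversing the list locally.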

Now turn the Google reviews Outscraper function into an Assistants API function call. This function will search for a new business, save the returned results into a variable, and create a new file reference from that variable to be added to the assistant's existing file references.
It must be triggered within the assistant by a prompt such as: Get me the business reviews of "business_search".


In [59]:
# First let's see the list of files associated with the assistant.

# Get the assistant ID
assistant_id = assistant.id

# Get a list of all files associated with the assistant
files = client.beta.assistants.files.list(assistant_id=assistant_id)

# Get the IDs of all files
file_ids = [file.id for file in files]

print(file_ids)

['file-KJZBaIIwbZ5FJNQANPQDBXBL', 'file-p5ECEuYUaUb1hYfChUTuqOBL']


In [60]:
# wrap the code into a function that takes the business search string and the assistant ID as parameters, fetches the reviews, saves them to a JSON file, and updates the assistant with the new file.

def fetch_reviews_and_update_assistant(business_search, assistant_id):
    # Fetch the reviews
    reviews = outscraper_client.google_maps_reviews(business_search, reviews_limit=50, limit=1, language='en')

    # Normalize the business search string to create the file name
    normalized_string = ''.join(char.lower() for char in business_search if char.isalnum() or char.isspace())
    file_name = normalized_string.replace(' ', '_') + '_reviews.json'

    # Save the reviews to a JSON file
    with open(file_name, 'w') as f:
        json.dump(reviews, f)

    # Create a new file reference, closing the local file once the upload finishes
    with open(file_name, "rb") as reviews_file:
        new_file_reference = client.files.create(
            file=reviews_file,
            purpose='assistants'
        )

    # Get the existing file IDs
    existing_file_ids = [file.id for file in client.beta.assistants.files.list(assistant_id=assistant_id)]

    # Append the new file ID
    existing_file_ids.append(new_file_reference.id)

    # Update the assistant
    updated_assistant = client.beta.assistants.update(
        assistant_id=assistant_id,
        file_ids=existing_file_ids  # Updated list of file IDs
    )

    return updated_assistant

In [61]:
updated_assistant = fetch_reviews_and_update_assistant('Mt Tam Pediatric Dentistry Corte Madera CA usa', assistant.id)

In [62]:
# Worked!!! Let's see the list of files now. We had 2 and we expect 3.
# Get the assistant ID
assistant_id = assistant.id

# Get a list of all files associated with the assistant
files = client.beta.assistants.files.list(assistant_id=assistant_id)

# Get the IDs of all files
file_ids = [file.id for file in files]

print(file_ids) # Success!!! 

['file-PpsEB5JBEkzZ3JaJ0rLG1PwH', 'file-KJZBaIIwbZ5FJNQANPQDBXBL', 'file-p5ECEuYUaUb1hYfChUTuqOBL']


### Now I will test contents in Assistants playground to see if it answers questions using the new document - All Works

In [63]:
# Test one here: #success!!!
use_assistant("How many reviews are there for each business and what are the overall review score for business?", assistant.id)

Creating Assistant 
Querying OpenAI Assistant Thread.
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: completed
assistant: For the business "Mt Tam Pediatric Dentistry" in Corte Madera, CA, there are **30 reviews** with an **overall rating of 5 stars**.

Here is a summary of the review analysis for each business:
1. **Poppy Kids Pediatric Dentistry**: 90 reviews, 5 stars.
2. **Novato Children's Dentistry**: 200 reviews, 4.9 stars.
3. **Mt Tam Pedia

'thread_DvP3jzU4R45mk2VPN3NOzhFg'

# Jay: Figure out how to run this function as part of function calling. The business search string needs to come from the prompt.

In [None]:
# I already have the function that I wish to be called.
# fetch_reviews_and_update_assistant() - it fetches the new set of reviews and adds them to the file reference store.

In [None]:
# Now we need to express how our function works in a way that is compatible with the OpenAI Function Calling API.
# We'll want to provide a `JSON` object that includes what parameters we have, how to call them, and a short natural language description.

fetch_reviews_and_update_assistant_function = {
    "name" : "fetch_reviews_and_update_assistant",
    "description" : "Fetch Google Maps reviews for a business and update an assistant with the reviews.",
    "parameters" : {
        "type" : "object",
        "properties" : {
            "business_search" : {
                "type" : "string",
                "description" : "The business to search for on Google Maps. For example: 'Novato Childrens Dentistry Novato CA usa'"
            },
            "assistant_id" : {
                "type" : "string",
                "description" : "The ID of the assistant to update."
            }
        },
        "required" : ["business_search", "assistant_id"]
    }
}
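
To actually run the described function when the assistant asks for it, the tool-output step needs a mapping from function names to local Python callables. The following is a sketch of one possible dispatcher, not the notebook's implementation. `build_tool_outputs` and `fake_fetch_reviews` are hypothetical, the tool call is a stand-in object, and binding `assistant_id` locally assumes it would be dropped from the schema's required parameters.

```python
import json
from functools import partial
from types import SimpleNamespace


def build_tool_outputs(tool_calls, handlers):
    """Run the local handler for each tool call and collect outputs.

    Every tool call gets an entry, even when the handler is missing or fails,
    so the run is not left waiting on an unanswered call.
    """
    outputs = []
    for call in tool_calls:
        handler = handlers.get(call.function.name)
        if handler is None:
            result = f"Unknown function: {call.function.name}"
        else:
            try:
                kwargs = json.loads(call.function.arguments or "{}")
                result = handler(**kwargs)
            except Exception as exc:  # report the failure back to the model
                result = f"Error: {exc}"
        outputs.append({"tool_call_id": call.id, "output": str(result)})
    return outputs


def fake_fetch_reviews(business_search, assistant_id):
    # Placeholder standing in for fetch_reviews_and_update_assistant
    return f"Added reviews for '{business_search}' to {assistant_id}"


# Bind assistant_id locally so the model only has to supply business_search
handlers = {
    "fetch_reviews_and_update_assistant": partial(fake_fetch_reviews, assistant_id="asst_example"),
}

call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(
        name="fetch_reviews_and_update_assistant",
        arguments=json.dumps({"business_search": "Example Dental Novato CA"}),
    ),
)
print(build_tool_outputs([call], handlers))
```

Since the real function returns the updated assistant object, wrapping it so that it returns a short confirmation string is probably a better output to hand back to the model.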

# Question: Do I need the assistant update if I am running this in the same thread in the assistants API?

### If you're running this in the same thread in the Assistants API and you're creating the assistant and adding the files in the same session, you don't necessarily need to update the assistant. When you create an assistant and add files, those files are associated with the assistant immediately. The update operation is typically used when you want to modify an existing assistant's configuration, such as adding new files or changing the model. However, if you're adding new files after the assistant has been created and you want these new files to be associated with the assistant, you would need to update the assistant with the new file IDs. So, whether you need to update the assistant or not depends on when and how you're adding the files.

### YES, I do. I am adding files after the assistant is created. This is the forever use case. 

# Update the existing assistant rather than creating a new one. I want to see if this is possible, and I also don't want to lose my docs. I would also like to add DuckDuckGo to aid with the reviews analysis.

In [None]:
# Get the assistant ID
assistant_id = assistant.id

# Update the assistant in place. file_ids is left out on purpose: passing
# [file_reference.id] here would replace the attached files with that single file.
updated_assistant = client.beta.assistants.update(
    assistant_id=assistant_id,
    name=name + " + All Tools",
    instructions=instructions,
    tools=[
        {"type": "code_interpreter"},
        {"type": "retrieval"},
        # Pass the JSON function description, not the Python function itself
        {"type": "function", "function" : fetch_reviews_and_update_assistant_function}
    ],
    model=model,
)

In [None]:
######################### Here!!!

### Creating an Assistant with a Function Calling Tool

Let's finally create an Assistant that utilizes the Function Calling API.

We'll start by creating a function that we wish to be called.

We'll utilize DuckDuckGo search to allow our Assistant to have the most up to date information!

In [59]:
!pip install -qU duckduckgo_search



In [60]:
from duckduckgo_search import DDGS

def duckduckgo_search(query):
  with DDGS() as ddgs:
    results = [r for r in ddgs.text(query, max_results=5)]
    return "\n".join(result["body"] for result in results)

Let's test our function to make sure it behaves as we expect it to.

In [63]:
duckduckgo_search("How many 5 star Google reviews does a pediatric dentistry need to place at top of Google search results?")

'A study by Harvard Business School found that a 1-star increase in Yelp ratings leads to a 5-9% increase in revenue. And while a Google review may not have as much of an impact, they can still lead to more sales and revenue for your business. So, if you want to improve your local SEO, build trust and credibility, and generate more sales, ensure ...\nHow to use the calculator correctly. The 5-star rating calculator is a handy little tool that can help you figure out how many more reviews you need to get your overall or average rating to 5 stars. Here are a few examples of how to use it correctly: 1. Enter the current number of reviews you have. 2. Enter the current average rating you have. 3.\nSimilar to other types of reviews, recipe cards in search results show the average review rating and the total number of reviews. Screenshot from search for [best vegan winter recipes], Google ...\nA 5-star rating can be what you need to stand out in search results. Trust and SEO: Positive Google

Now we need to express how our function works in a way that is compatible with the OpenAI Function Calling API.

We'll want to provide a `JSON` object that includes what parameters we have, how to call them, and a short natural language description.

In [65]:
ddg_function = {
    "name" : "duckduckgo_search",
    "description" : "Answer non-technical questions. ",
    "parameters" : {
        "type" : "object",
        "properties" : {
            "query" : {
                "type" : "string",
                "description" : "The search query to use. For example: 'What is the importance of Google reviews in local SEO?'"
            }
        },
        "required" : ["query"]
    }
}
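
Hand-written function descriptions are easy to get subtly wrong, for example by misspelling a key. A few basic checks before creating the assistant can catch that. This is a rough sketch and not a full JSON Schema validator. `find_schema_problems`, `ALLOWED_SCHEMA_KEYS`, and the example spec are all hypothetical.

```python
ALLOWED_SCHEMA_KEYS = {"type", "description", "enum", "items", "properties", "required"}


def find_schema_problems(function_spec: dict) -> list:
    """Return a list of human-readable problems found in a function spec.

    Only a few basic checks; this is not a full JSON Schema validator.
    """
    problems = []
    for key in ("name", "description", "parameters"):
        if key not in function_spec:
            problems.append(f"missing top-level key '{key}'")
    params = function_spec.get("parameters", {})
    properties = params.get("properties", {})
    for prop_name, prop in properties.items():
        if "type" not in prop:
            problems.append(f"property '{prop_name}' has no 'type'")
        for key in prop:
            if key not in ALLOWED_SCHEMA_KEYS:
                problems.append(f"property '{prop_name}' has unexpected key '{key}'")
    for required in params.get("required", []):
        if required not in properties:
            problems.append(f"required parameter '{required}' is not defined")
    return problems


# Hypothetical spec with a stray colon inside a key name
example_spec = {
    "name": "example_search",
    "description": "Search the web for a query.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type:": "string", "description": "The search query."}},
        "required": ["query"],
    },
}
print(find_schema_problems(example_spec))
```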

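#### ❓ Question

Why does the description key-value pair matter?

#### ANSWER:
The model only sees the function's name, description, and parameter schema, so the description is what it uses to decide when to call the function and what to pass in. A clear description clarifies the function's purpose and usage.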

Now when we create our Assistant - we'll want to include the function description as a tool using the following format.

Jay: Can I assign multiple functions simultaneously?

In [66]:
assistant = client.beta.assistants.create(
    name=name + " + Function Calling API",
    instructions=instructions,
    tools=[
        {"type": "function",
         "function" : ddg_function
        }
    ],
    model=model
)

We need to make a few modifications to our Assistant to include the ability to make calls to our local function and pass the results back to our Assistant for further generation.

In [68]:
import json
import time

def wait_for_run_completion(thread_id, run_id):
    """
    Waits for the completion of a run identified by the given thread ID and run ID.

    Args:
        thread_id (str): The ID of the thread containing the run.
        run_id (str): The ID of the run to wait for.

    Returns:
        dict: The details of the completed run.

    Raises:
        None

    """
    while True:
        time.sleep(1)
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        print(f"Current run status: {run.status}")
        # Stop on every state where the run no longer progresses on its own
        if run.status in ['completed', 'failed', 'cancelled', 'expired', 'requires_action']:
            return run

def submit_tool_outputs(thread_id, run_id, tools_to_call):
    """
    Submits the tool outputs to the specified thread and run.

    Args:
        thread_id (str): The ID of the thread to submit the tool outputs to.
        run_id (str): The ID of the run to submit the tool outputs to.
        tools_to_call (list): A list of tools to call and retrieve outputs from.

    Returns:
        dict: A dictionary containing the submitted tool outputs.

    """
    tool_output_array = []
    for tool in tools_to_call:
        output = None
        tool_call_id = tool.id
        function_name = tool.function.name
        function_args = tool.function.arguments

        if function_name == "duckduckgo_search":
            print("Consulting Duck Duck Go...")
            output = duckduckgo_search(query=json.loads(function_args)["query"])

        if output:
            tool_output_array.append({"tool_call_id": tool_call_id, "output": output})

    print(tool_output_array)

    return client.beta.threads.runs.submit_tool_outputs(
        thread_id=thread_id,
        run_id=run_id,
        tool_outputs=tool_output_array
    )

def print_messages_from_thread(thread_id):
    """
    Prints the messages from a given thread.

    Args:
        thread_id (str): The ID of the thread.

    Returns:
        None
    """
    messages = client.beta.threads.messages.list(thread_id=thread_id)
    for msg in messages:
        print(f"{msg.role}: {msg.content[0].text.value}")

def use_assistant(query, assistant_id, thread_id=None):
    """
    Uses the OpenAI Assistant to interact with the AI model.

    Args:
        query (str): The user's query or message.
        assistant_id (str): The ID of the OpenAI Assistant.
        thread_id (str, optional): The ID of the thread to use. If not provided, a new thread will be created.

    Returns:
        str: The ID of the thread used for the conversation.
    """
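    # Reuse the given thread if one was passed in; otherwise start a new one
    if thread_id:
        thread = client.beta.threads.retrieve(thread_id)
    else:
        thread = client.beta.threads.create()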

    message = client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content=query,
    )

    print("Creating Assistant ")

    run = client.beta.threads.runs.create(
        thread_id=thread.id,
        assistant_id=assistant_id,
    )

    print("Querying OpenAI Assistant Thread.")

    run = wait_for_run_completion(thread.id, run.id)

    if run.status == 'requires_action':
        run = submit_tool_outputs(thread.id, run.id, run.required_action.submit_tool_outputs.tool_calls)
        run = wait_for_run_completion(thread.id, run.id)

    print_messages_from_thread(thread.id)

    return thread.id


#### ❓ Question

Outline, in simple terms, what the `use_assistant` helper function is doing.

#### ANSWER
The `use_assistant` function is the main entry point for talking to the assistant. It starts a thread, adds the user's query to it as a message, and starts a run with the assistant. It then waits for the run to finish; if the run requires action, it submits the tool outputs and waits again for the run to complete. Finally, it prints all the messages from the thread and returns the thread ID.
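
The helper above answers a single `requires_action` round. A run may ask for tool outputs more than once, so a looped variant could look like the sketch below. This is only an illustration: `drive_run`, `poll`, `submit`, and `max_rounds` are hypothetical names, with `poll` and `submit` standing in for `wait_for_run_completion` and `submit_tool_outputs`, and the demo uses fake run objects so that it runs without an API key.

```python
from types import SimpleNamespace


def drive_run(run, poll, submit, max_rounds=5):
    """Poll a run and answer tool requests until it leaves 'requires_action'.

    poll(run) -> run   : hypothetical stand-in for wait_for_run_completion
    submit(run) -> run : hypothetical stand-in for submit_tool_outputs
    max_rounds         : assumed safety cap on the number of tool rounds
    """
    run = poll(run)
    rounds = 0
    while run.status == "requires_action" and rounds < max_rounds:
        run = submit(run)
        run = poll(run)
        rounds += 1
    return run


# Offline demo: a fake run that asks for tools twice, then completes.
_statuses = iter(["requires_action", "requires_action", "completed"])


def fake_poll(run):
    # Each poll reveals the next scripted status.
    return SimpleNamespace(status=next(_statuses))


def fake_submit(run):
    # A real implementation would execute the tools and submit their outputs here.
    return run


final_run = drive_run(SimpleNamespace(status="queued"), fake_poll, fake_submit)
print(final_run.status)
```

The cap on rounds is a design choice to keep a misbehaving run from looping forever; the right value depends on how many tool calls a query realistically needs.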

In [69]:
use_assistant("Do I need Google reviews for local SEO?", assistant.id)

Creating Assistant 
Querying OpenAI Assistant Thread.
Current run status: in_progress
Current run status: requires_action
Consulting Duck Duck Go...
[{'tool_call_id': 'call_nDMnWA1z7x8MZvoihT5zGnVs', 'output': "1) The Owner Sets The Review Stage. 65% of review writers have left negative reviews after experiencing bad or rude customer service, and an additional 28% say their negative reviews result from businesses failing to resolve their complaints at the time of service. One of the best paths to building a sterling local business reputation is to run ...\nPlus, skillful review management can also impact a business's success. Hence, you may have already got a gist of why are Google Reviews important. Now, let's dive into the 7 staggering benefits of Google Reviews in detail! 1. Increases Trust and Credibility. Google dominates the search engine market with a 92% share.\nHow Reviews & Ratings Impact Local SEO for Google Maps. For listings on Google Maps, our local rank trackers found th

'thread_esMzUcQOK8dh25E6HvpTdxzV'

## Wrapping it All Together (Super good)

Now we can create an Assistant with all of the available tools and see how it responds to various queries!

In [70]:
assistant = client.beta.assistants.create(
    name=name + " + All Tools",
    instructions=instructions,
    tools=[
        {"type": "code_interpreter"},
        {"type": "retrieval"},
        #{"type": "function", "function" : ddg_function}
    ],
    model=model,
    file_ids=[file_reference.id],
)

In [71]:
use_assistant("Why should I choose a pediatric dentist over a general dentist?", assistant.id)

Creating Assistant 
Querying OpenAI Assistant Thread.
Current run status: in_progress
Current run status: in_progress
Current run status: completed
assistant: Choosing a pediatric dentist over a general dentist is recommended because pediatric dentists receive specialized training in caring for children’s teeth, gums, and mouth through all stages of childhood. They are also trained to handle the unique behavioral needs of children, making dental visits more positive and effective for young patients.
user: Why should I choose a pediatric dentist over a general dentist?


'thread_eEcsOtjN1MJb6xMlj07cpvLe'

In [72]:
use_assistant("How many questions are there in my document?", assistant.id)

Creating Assistant 
Querying OpenAI Assistant Thread.
Current run status: in_progress
Current run status: in_progress
Current run status: completed
assistant: The document contains a total of 11 questions.
user: How many questions are there in my document?


'thread_W1oO6LbADHBSQ02WNMF1j8fI'

In [73]:
use_assistant("Extract all questions in my document as bullet points. I only want the questions.", assistant.id)

Creating Assistant 
Querying OpenAI Assistant Thread.
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: completed
assistant: - When is a good time for a child's first dental visit and what should parents expect?
- What age should children start getting dental x-rays and why are they important?
- What's the best age to introduce an electric toothbrush to children and which type is recommended?
- Why do children's teeth sometimes appear yellow and how can it be addressed?
- How can parents determine the right time for their child to visit an orthodontist?
- What age do children typically lose their first tooth and what should parents do when their child's tooth is wiggly?
- Until what age does your pediatric dental practice provide ca

'thread_OrnTmB0SnJ042iKZPisRsWaH'

#### ❓ Question

Notice that a response can take multiple paths depending on the query.

Given that, what is "deciding" which tool to use?

#### ANSWER:
I think the answer is a combination of sequential and contextual logic. This makes the most sense to me, and I think it is the reason we add the "answer non-technical questions" part to the instructions. The model checks the knowledge base first (retrieval). If it does not find the answer there, it moves on to the web search. This is controlled by the "non-technical" question part. If any math or coding is involved, it defaults to the code interpreter tool.

One interesting experiment would be to add the answer to a coding-style question to the knowledge base. In my example, I have 13 questions with answers in my document. What if one of those questions asked "How many questions are there in my document?" and the answer given was 20 instead of the correct 13? Would the assistant return 20 because it finds that answer in my knowledge base?

### Adding JSON Mode for More Agentic Behaviour

Finally, we have the ability to select tools. All we need to do now is set up a loop that decides whether or not the response is complete.

We'll leverage the OpenAI chat completions endpoint with JSON mode to tell us when we've adequately answered our user's question!

In [74]:
completed_template = \
"""
Does this response adequately answer the user's query?

Please return your response in JSON format - with key: "completed" and either true (if completed) or false (if not completed)

User Query:
{query}

Assistant Response:
{response}
"""

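def is_complete(query, response):
    """
    Asks the model, in JSON mode, whether a response adequately answers a query.

    Args:
        query (str): The user's original query.
        response (str): The assistant's response to evaluate.

    Returns:
        ChatCompletion: The raw completion; its message content is a JSON string expected to carry a "completed" key.
    """
    completed_response = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": completed_template.format(query=query, response=response),
            }
        ],
        model=model,
        response_format={"type": "json_object"},
    )

    return completed_response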

In [93]:
query = "How many bytes is the provided file?"

thread_id_for_response = use_assistant(query, assistant.id)

Creating Assistant 
Querying OpenAI Assistant Thread.
Current run status: in_progress
Current run status: in_progress
Current run status: in_progress
Current run status: completed
assistant: The provided file is 11,043 bytes in size.
user: How many bytes is the provided file?


Now we can observe JSON mode in action!

In [76]:
messages = client.beta.threads.messages.list(thread_id=thread_id_for_response)
response = messages.data[0].content[0].text.value
completed_flag = json.loads(is_complete(query, response).choices[0].message.content)

In [84]:
messages

SyncCursorPage[Message](data=[Message(id='msg_t1QJ4ozNngEAkA7YxWcFdddj', assistant_id='asst_dYx26lafARyFiozBz5MfFNZg', completed_at=None, content=[TextContentBlock(text=Text(annotations=[], value='The provided file is 11,043 bytes in size.'), type='text')], created_at=1712951728, file_ids=[], incomplete_at=None, incomplete_details=None, metadata={}, object='thread.message', role='assistant', run_id='run_O60vqxyOrMBAkfUWrR0SPbjD', status=None, thread_id='thread_uDRImtznWXjo2SjeKSuwSCuP'), Message(id='msg_YgKDf549KUJb9AjbDGLSfL9X', assistant_id=None, completed_at=None, content=[TextContentBlock(text=Text(annotations=[], value='How many bytes is the provided file?'), type='text')], created_at=1712951723, file_ids=[], incomplete_at=None, incomplete_details=None, metadata={}, object='thread.message', role='user', run_id=None, status=None, thread_id='thread_uDRImtznWXjo2SjeKSuwSCuP')], object='list', first_id='msg_t1QJ4ozNngEAkA7YxWcFdddj', last_id='msg_YgKDf549KUJb9AjbDGLSfL9X', has_more=Fa

In [85]:
response

'The provided file is 11,043 bytes in size.'

In [78]:
completed_flag

{'completed': True}
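
As far as I understand, JSON mode only ensures that the reply is syntactically valid JSON; it does not enforce a schema, so the `"completed"` value could come back as a string or the key could be missing. The sketch below shows one defensive way to read the flag. `parse_completed` is a hypothetical helper that is not part of the notebook, and treating anything unclear as "not completed" is an assumption.

```python
import json


def parse_completed(raw_content):
    """Return True only if the JSON payload clearly marks the answer as complete."""
    try:
        payload = json.loads(raw_content)
    except (json.JSONDecodeError, TypeError):
        # Not valid JSON (or not a string at all): treat as not completed.
        return False
    if not isinstance(payload, dict):
        return False
    value = payload.get("completed", False)
    if isinstance(value, str):
        # Tolerate "True" / "true" returned as a string.
        return value.strip().lower() == "true"
    return value is True


print(parse_completed('{"completed": true}'))
print(parse_completed('{"completed": "False"}'))
print(parse_completed('not json'))
```

In the notebook this would be called as `parse_completed(is_complete(query, response).choices[0].message.content)`.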


## 🚧 BONUS CHALLENGE 🚧:

Use the components we've constructed so far to build a loop that lets us continue to query the Assistant if the response is not completed!

In [94]:
### YOUR CODE HERE: I'm not sure how to test this other than like this. Sometimes it runs for 5 cycles, sometimes just twice.
## After talking to Chris, I created a new thread_id before running the code below, and it did run a few times with completed_flag = False.


query = "What is in my pocket right now?"
completed_flag = False

while not completed_flag:
    thread_id_for_response = use_assistant(query, assistant.id)
    messages = client.beta.threads.messages.list(thread_id=thread_id_for_response)
    response = messages.data[0].content[0].text.value
    completed_flag = json.loads(is_complete(query, response).choices[0].message.content)['completed']


Creating Assistant 
Querying OpenAI Assistant Thread.
Current run status: in_progress
Current run status: completed
assistant: I don't have the capability to know or predict what is in your pocket right now. If you have specific information or questions related to pediatric dentistry, feel free to ask!
user: What is in my pocket right now?
Creating Assistant 
Querying OpenAI Assistant Thread.
Current run status: in_progress
Current run status: completed
assistant: I can't help with that question. If you have any queries or need assistance with the documents you uploaded, please let me know!
user: What is in my pocket right now?
Creating Assistant 
Querying OpenAI Assistant Thread.
Current run status: in_progress
Current run status: completed
assistant: I am unable to determine the contents of your pocket without direct observation or additional information.
user: What is in my pocket right now?


In [95]:
completed_flag

True
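
The loop above keeps re-asking until the judge returns a completed flag, which could run (and bill) indefinitely for a question the assistant can never answer. A bounded variant might look like the sketch below. The names `ask_until_complete`, `ask`, `judge`, and `max_attempts` are hypothetical; `ask` and `judge` stand in for wrappers around `use_assistant` and `is_complete`, and the demo uses stub callables so that no API calls are made.

```python
def ask_until_complete(query, ask, judge, max_attempts=3):
    """Re-ask `query` until `judge` accepts the response or attempts run out.

    ask(query) -> str              : hypothetical wrapper around use_assistant
    judge(query, response) -> bool : hypothetical wrapper around is_complete
    Returns (last_response, attempts_used, completed).
    """
    response = None
    for attempt in range(1, max_attempts + 1):
        response = ask(query)
        if judge(query, response):
            return response, attempt, True
    return response, max_attempts, False


# Offline demo with stub callables.
canned = iter(["I can't know that.", "I am unable to tell.", "Still unknown."])
result = ask_until_complete(
    "What is in my pocket right now?",
    ask=lambda q: next(canned),
    judge=lambda q, r: False,  # a judge that is never satisfied
    max_attempts=3,
)
print(result)
```

Returning the attempt count and the final flag alongside the response lets the caller distinguish "answered on the second try" from "gave up after the cap".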

# Make Sure You Delete Resources

Make sure you delete all the resources you created!

The call below will help you do so by removing the uploaded file from the assistant!

In [None]:
file_deletion_status = client.beta.assistants.files.delete(
  assistant_id=assistant.id,
  file_id=file_reference.id
)
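
As far as I understand, the call above only removes the file from the assistant; the assistant itself and the uploaded file still exist afterwards. A fuller cleanup could be sketched as below. `cleanup` is a hypothetical helper, the IDs in the demo are placeholders, and the demo runs against a fake client so that nothing is actually deleted; with the real client the same function would call `client.beta.assistants.delete` and `client.files.delete`.

```python
from types import SimpleNamespace


def cleanup(client, assistant_id, file_ids):
    """Delete the assistant, then each uploaded file; return what was deleted."""
    deleted = {"assistant": None, "files": []}
    client.beta.assistants.delete(assistant_id)
    deleted["assistant"] = assistant_id
    for file_id in file_ids:
        client.files.delete(file_id)
        deleted["files"].append(file_id)
    return deleted


# Offline demo: a fake client that only records which deletions were requested.
calls = []
fake_client = SimpleNamespace(
    beta=SimpleNamespace(
        assistants=SimpleNamespace(
            delete=lambda assistant_id: calls.append(("assistant", assistant_id))
        )
    ),
    files=SimpleNamespace(delete=lambda file_id: calls.append(("file", file_id))),
)

summary = cleanup(fake_client, "asst_demo", ["file_a", "file_b"])
print(summary)
```

In the notebook this would be invoked as `cleanup(client, assistant.id, [file_reference.id])`.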

###############################################################

# LATER Experiments for Outscraper - These can be used for added functionality if I want to extend the reviews intelligence tool.

In [None]:
# Another experiment: load all files under data_reviews as I create the assistant. Alternatively, I can update the assistant with all files.
# So in essence I have two options: update the existing assistant with the new file, OR
# delete the existing file(s) within the assistant and reload the assistant with all files under data_reviews.

import os

# Directory containing the files
directory = '/Users/acrobat/Documents/GitHub/AI-Engineering-Cohort-2/Week 2/Day 2/data_reviews/'

# Get a list of all regular files in the directory (skipping subdirectories and hidden files such as .DS_Store)
files = [
    file for file in sorted(os.listdir(directory))
    if os.path.isfile(os.path.join(directory, file)) and not file.startswith(".")
]

# Create a file reference for each file, closing each file handle once it has been uploaded
file_references = []
for file in files:
    with open(os.path.join(directory, file), "rb") as fh:
        file_references.append(client.files.create(file=fh, purpose='assistants'))

# Get the IDs of the file references
file_ids = [file_reference.id for file_reference in file_references]

# Create the assistant
assistant = client.beta.assistants.create(
    name=name + " + Google Reviews multi file",
    instructions=instructions,
    tools=[
        {"type": "code_interpreter"},
        {"type": "retrieval"},
       # {"type": "function", "function" : ddg_function}
    ],
    model=model,
    file_ids=file_ids,
)