# How to SAO

In this notebook, I'm going to be referencing many of the topics that I covered in the Living Assets Lab post on the [Agentic Web](https://www.livingassets.co/blog/agentic-web). If you haven't read up on this topic, go ahead and take five minutes to read that article. The code in this notebook will make a lot more sense once you have that context.

Enjoy!

## Pre-requisites
- Some Python coding knowledge
- Background understanding of the Agentic Web
- Docker, Docker-Compose

## Running this notebook
- In the repository, you will find a `docker-compose.yml` configuration that you can use to spin up this experiment on your own.
- Use the following command to run it the way we do:
```
docker-compose up -d --build la-jupyter 
```
- Check your container's logs for the notebook server URL: `docker logs -f CONTAINER_ID`
- Or visit `http://127.0.0.1:8888/tree` directly
- Select the notebook. This one is called `how_to_sao.ipynb`
- When it comes time to run the code, select the local Jupyter server and allow connections without a token. Since this setup is intended to be run on a local machine, that should be okay. Depending on your IDE, the instructions for connecting to the Jupyter server may vary.

### Setup
First, we install the required packages. We are using [bert_score](https://github.com/Tiiiger/bert_score).

In [None]:
! pip install bert-score

Test our installation...

In [11]:
from bert_score import score
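
The import above only confirms that the module loads. As an optional extra check, here is a small sketch that uses the standard library's `importlib` to report which version of the package is installed. The helper name `describe_install` is my own and is not part of bert_score.

```python
from importlib import metadata, util


def describe_install(package="bert-score", module="bert_score"):
    """Return a short message describing whether a package is usable."""
    # find_spec returns None when the top-level module cannot be imported.
    if util.find_spec(module) is None:
        return f"{module} is not importable; re-run the pip install cell"
    try:
        return f"{package} {metadata.version(package)} is installed"
    except metadata.PackageNotFoundError:
        # The module loads, but pip metadata was not found under this name.
        return f"{module} is importable, but no distribution metadata was found for {package}"


print(describe_install())
```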

## Demo

In this demo, we're going to look at the following things to explain how to SAO. 

### Objectives
- Identify a contextual alignment measurement, i.e. a method for measuring how closely a generated text response from an LLM aligns with the original intent(s) of a user query.
- In other words, we're looking for a way to demonstrate how good a generated response is by evaluating whether the AI (an LLM in this case) actually answered the question.
- We will then apply this measurement to two responses to the same query, one from Perplexity.ai and one from a Living Assets Agent.
- Finally, we'll discuss the difference in the context of How to SAO!
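
To make the measurement less of a black box, here is a minimal sketch of the greedy matching idea behind BERTScore: each token embedding on one side is matched to its most similar token on the other side by cosine similarity, and the matches are averaged into precision, recall, and F1. The two-dimensional vectors are toy values I made up. The real library uses contextual embeddings from a transformer model and has options (such as IDF weighting) that this sketch leaves out.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_match_scores(candidate_vecs, reference_vecs):
    """Greedy-matching precision, recall, and F1 over token embeddings."""
    # Precision: every candidate token looks for its best match in the reference.
    precision = sum(
        max(cosine(c, r) for r in reference_vecs) for c in candidate_vecs
    ) / len(candidate_vecs)
    # Recall: every reference token looks for its best match in the candidate.
    recall = sum(
        max(cosine(r, c) for c in candidate_vecs) for r in reference_vecs
    ) / len(reference_vecs)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy embeddings (made up): a two-token candidate and a one-token reference.
toy_candidate = [[1.0, 0.0], [0.0, 1.0]]
toy_reference = [[1.0, 0.0]]
print(greedy_match_scores(toy_candidate, toy_reference))
```

Because precision averages over the candidate's tokens and recall over the reference's tokens, a long answer scored against a short query will tend to show lower precision than recall.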

### The User Query
> "What is search agent optimization?"

- In this example, our query is a very simple question: what is search agent optimization? I'm a user who wants to know about this emerging topic, so how can I find out about it?


### Null & Alternative Hypothesis
- For this experiment, our null hypothesis states that there will be no difference between the BertScore of our two agentic responses, one from the Living Assets Agent and one from the Perplexity agent. According to this hypothesis, we won't be able to measure any difference between the BertScores of the two models.

- Our alternative hypothesis is that there is a detectable difference between the BertScore of the two models. We will only consider this alternative hypothesis if we have evidence to reject the null hypothesis.

- Regardless of the outcome of this experiment, we will discuss the evidence at the end from both a statistical perspective and from the perspective of any potential real-world implications.
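
For readers who want to test a hypothesis like this more formally, here is a sketch of an exact paired sign-flip (permutation) test. It assumes you have scored both agents on several queries and computed one F1 difference per query. This notebook scores a single query, so the numbers in the example are made-up placeholders, not measurements, and the function name `sign_flip_p_value` is my own.

```python
from itertools import product


def sign_flip_p_value(differences):
    """Two-sided exact p-value for paired differences under the null.

    Under the null hypothesis, each per-query difference is equally likely
    to be positive or negative, so we enumerate every sign assignment and
    count how often the mean is at least as extreme as the observed one.
    """
    n = len(differences)
    if n == 0:
        raise ValueError("need at least one paired difference")
    observed = abs(sum(differences)) / n
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        flipped = abs(sum(s * d for s, d in zip(signs, differences))) / n
        # Small tolerance so floating-point noise does not drop exact ties.
        if flipped >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Placeholder per-query F1 differences (made up, NOT measured results).
example_differences = [0.01, 0.02, -0.005, 0.015, 0.01]
print(sign_flip_p_value(example_differences))
```

With only one query there is a single paired difference, so a test like this has nothing to permute. It becomes informative once several queries are scored.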

### What the Agents Know
- You may be wondering what context each Agent has access to. The Living Assets Agent has access to proprietary documents that are internal to Living Assets and not published online. Think of these as the kind of documents internal to any company, since it is common practice not to publish every. single. thing. to the public internet.
- The Perplexity Agent has access to whatever information Perplexity AI has been able to scrape from the public web, or from bypassing paywalls on private company websites.

In [3]:
# json import
import json

In [26]:
# Preparing our Agents' response text
def load_json_from_file(file_path):
    """Load and return the parsed JSON content of a file."""
    with open(file_path, 'r', encoding='utf-8') as file:
        return json.load(file)

def print_capped_output(text, max_length=500):
    """Print text, truncated to max_length characters followed by '...'."""
    if len(text) > max_length:
        print(text[:max_length] + '...')
    else:
        print(text)
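
The cells that follow index each loaded file as `response['1']['raw_output']`. The sketch below shows the JSON shape that implies and wraps the lookup with a clearer error message. The sample content and the helper name `extract_raw_output` are hypothetical; only the `'1'` and `'raw_output'` keys come from this notebook.

```python
import json
import tempfile
from pathlib import Path


def extract_raw_output(response, entry="1"):
    """Return response[entry]['raw_output'], with a readable error if missing."""
    try:
        return response[entry]["raw_output"]
    except (KeyError, TypeError) as exc:
        raise ValueError(
            f"expected response[{entry!r}]['raw_output'] in the JSON file"
        ) from exc


# Hypothetical sample with the same shape the notebook reads.
sample = {"1": {"raw_output": "Example agent answer."}}

# Write the sample to a temporary file and read it back, as the notebook does.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample_response.json"
    sample_path.write_text(json.dumps(sample), encoding="utf-8")
    with open(sample_path, "r", encoding="utf-8") as file:
        loaded = json.load(file)

print(extract_raw_output(loaded))
```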

Now let's look at each of our agents' responses and see for ourselves what some of the initial differences might be.

Starting with the Living Assets Agent.

In [36]:
# Load JSON data from a file
la_agents_file_path = './static/docs/la_agent_response.json'
la_agents_response = load_json_from_file(la_agents_file_path)

# Get the raw_output from the agent's response
la_agents_raw_output = la_agents_response['1']['raw_output']

# Print the capped raw_output
print_capped_output(la_agents_raw_output)

Search Agent Optimization (SAO) is a strategy designed for the evolving AI-centric digital landscape, particularly to enhance the discoverability and usability of content by AI agents. Unlike traditional SEO, which targets human users' search engine results, SAO focuses on structuring content in a way that AI systems can easily understand and utilize. Here are some key components of SAO:

1. **Structuring Content**: Clear hierarchies, semantic markup, and machine-readable formats are essential. ...


In [37]:
# Load JSON data from a file
perplexity_file_path = './static/docs/perplexity_response.json'
perplexity_response = load_json_from_file(perplexity_file_path)

# Get the raw_output from the agent's response
perplexity_raw_output = perplexity_response['1']['raw_output']

# Print the capped raw_output
print_capped_output(perplexity_raw_output)

Search engine optimization (SEO) is the process of improving a website's visibility and ranking in search engine results pages (SERPs) to increase organic traffic. Here's an overview of what SEO agents do:

## Key Responsibilities of SEO Agents

SEO agents, often working for SEO companies, perform several crucial tasks to optimize websites for search engines:

### Keyword Research and Analysis

SEO agents conduct thorough keyword research to identify high-performing keywords that can attract tar...


### Obvious Differences
- Right away, there are some obvious differences between the two responses. The most apparent one is that our user's query was **what is search agent optimization?** This is a question about a specific topic, and Perplexity (which relies on web searches, RAG, web scraping, and the BM25 algorithm) answered a question about search engine optimization (SEO) instead. It was not able to properly figure out the subject.
- It also seems that Perplexity may have missed some of the context of the question, and thus it did not address adjacent topics, which might include queries like "What are new ways to optimize for the agentic web?" or "What changes are happening on the internet with respect to search and query behavior?"
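
To illustrate how keyword-based ranking can drift toward a neighbouring topic, here is a minimal Okapi BM25 sketch. It is not Perplexity's actual pipeline, which I can't see. The three documents are made-up one-liners, and `k1=1.5` and `b=0.75` are common default values rather than anything tuned.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        total = 0.0
        for term in tokenize(query):
            tf = term_freq[term]
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            length_norm = 1 - b + b * len(tokens) / avg_len
            total += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(total)
    return scores


# Made-up documents for illustration only.
documents = [
    "Search engine optimization improves a website's ranking in search engine results.",
    "Travel agents book flights and hotels for customers.",
    "A recipe for sourdough bread.",
]
scores = bm25_scores("What is search agent optimization?", documents)
print(scores)
```

The SEO document is the only one that scores above zero, because it shares the words "search" and "optimization" with the query, even though the query asks about something else.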

### Living Assets Agent Response
- The Living Assets Agent is an ensemble of traditional code functions, statistical models, and LLMs. This network of functionalities works together to understand the context of the question and access specific documents. The Agent is also able to access subsets of documents, and it uses metadata to understand contextual information.
- All of these methods combine "traditional" AI (statistics), generative AI, and customized data structures to ensure that questions are addressed directly, efficiently, and precisely.
- Lastly, the Living Assets response apparently includes a Call to Action (CTA), a common sales device that gives the user a way to act on the response they receive from the Living Assets Agent.

## Contextual Understanding: Analysis of the Two Responses

In [25]:
query = "What is search agent optimization?"
generated_result = la_agents_raw_output

LA_P, LA_R, LA_F1 = score([generated_result], [query], lang="en", verbose=True)

print(f"BERTScore: {LA_F1.item():.4f}")
print(f"BERTPrecision: {LA_P.item():.4f}")
print(f"BERTRecall: {LA_R.item():.4f}")

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


  0%|          | 0/1 [00:00<?, ?it/s]

computing greedy matching.


  0%|          | 0/1 [00:00<?, ?it/s]

done in 5.68 seconds, 0.18 sentences/sec
BERTScore: 0.8008
BERTPrecision: 0.7565
BERTRecall: 0.8507


In [24]:
query = "What is search agent optimization?"
generated_result = perplexity_raw_output

Perp_P, Perp_R, Perp_F1 = score([generated_result], [query], lang="en", verbose=True)

print(f"BERTScore: {Perp_F1.item():.4f}")
print(f"BERTPrecision: {Perp_P.item():.4f}")
print(f"BERTRecall: {Perp_R.item():.4f}")


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


  0%|          | 0/1 [00:00<?, ?it/s]

computing greedy matching.


  0%|          | 0/1 [00:00<?, ?it/s]

done in 5.73 seconds, 0.17 sentences/sec
BERTScore: 0.7903
BERTPrecision: 0.7454
BERTRecall: 0.8408
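
The two cells above make separate `score` calls. As a sketch of an alternative, both responses can be passed in one call, since `score` takes parallel lists of candidates and references. I also pass `rescale_with_baseline=True`, an option in bert_score that rescales the raw values against a precomputed baseline so they spread over a wider range. The numbers it returns are not comparable to the raw scores printed above. The function name `compare_responses` is my own.

```python
def compare_responses(query, responses, rescale=True):
    """Score several candidate responses against one query in a single call.

    `responses` maps a label (for example "living_assets") to response text.
    Returns {label: {"precision": ..., "recall": ..., "f1": ...}}.
    """
    # Imported here so the function can be defined before bert-score is installed.
    from bert_score import score

    labels = list(responses)
    candidates = [responses[label] for label in labels]
    references = [query] * len(candidates)  # the same query is the reference each time

    precision, recall, f1 = score(
        candidates, references, lang="en", rescale_with_baseline=rescale
    )
    return {
        label: {"precision": p, "recall": r, "f1": f}
        for label, p, r, f in zip(
            labels, precision.tolist(), recall.tolist(), f1.tolist()
        )
    }
```

In this notebook it could be called as `compare_responses(query, {"living_assets": la_agents_raw_output, "perplexity": perplexity_raw_output})`.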


## Results Discussion

We observed an F1 score of `0.79` for the Perplexity agent's response and `0.80` for the Living Assets agent's response. Across our other metrics, Precision and Recall, there is a similar gap of about `0.01` between the two agents' responses to the User Query.

This difference seems small, no?

The difference is only about `0.01`, or one point on a 0 to 1 scale, but it is an observable difference in a quantitative assessment of the two models' responses to our User Query.

We reject the null hypothesis and state that there is a difference between the two models' responses to our original query in this experiment.

## What the Results Tell Us
The results show that both Agents answer the user query effectively by this measure, scoring well above the proposed minimum threshold of `0.65`, but one agent scored about `0.01` higher, capturing more of the contextual nuance that is often an unspoken (or rather, untyped) portion of user queries to LLM agents.

In our earlier discussion, we saw how that 1% can make a huge difference. In the `README.md` in this repository, I will include the full responses from both agents. It is obvious that the Perplexity Agent, while able to produce an answer, gave context that diverged quite a bit from the original user query. This would likely result in a very unsatisfied and frustrated user who got the overconfident response that is typical of LLMs and RAG systems, without actually having their question addressed.

The Living Assets Agent was also able to effectively answer the question in terms of BertScore. However, the Agent's understanding of context, access to offline content, hierarchical organization, and use of accessible tagging in the document, among other things, all combined to generate an overall more targeted and more effective response.

## The Cost of Misleading Users
In conclusion, while both models achieved high BertScores, the model with the slightly higher score demonstrated a critical ability to capture nuanced details that enhance user interactions. This subtle yet impactful difference means the better-performing model provides responses that are not only accurate but also more contextually appropriate, fostering deeper user engagement. In contrast, the overconfidence of the lesser model, despite appearing correct, can mislead users, potentially reducing conversions and discouraging further exploration. This highlights the importance of not just accuracy, but also the quality and relevance of responses in driving successful user interactions.

To elevate your search optimization strategy, consider our new approach that combines machine learning with non-technical solutions, greater specificity, and understanding through Agentic ensembles. Take the next step towards enhancing user engagement and conversion. Visit [LivingAssets.co](https://livingassets.co)