# LLM for query optimization

The problem: the right Reddit search term differs from one company/product pair to the next:
- Singularity 6 / Palia: We should search Reddit for "Palia"
- Akili Interactive / EndeavorRx: We should search Reddit for "EndeavorRx"
- Rad.ai Omni: We should search Reddit for "Rad.ai Omni"



In [99]:
from core import Seed, init

init()

In [121]:
seed = Seed.init("Rad.ai", "Omni")

company_description = "Rad AI is a rapidly growing AI start-up focused on revolutionizing radiology reporting by utilizing advanced machine learning to enhance efficiency, reduce burnout, and improve patient care in healthcare settings."
product_description = "Omni by Rad AI is an intelligent radiology reporting software that automatically generates customized report impressions based on radiologists' findings and preferences, significantly streamlining the reporting process and improving productivity."

queries = [
    f'site:reddit.com "{seed.company}" "{seed.product}"',
    f'site:reddit.com "{seed.product}"',
    # f'"{seed.company}"',
    # f'"{seed.company}" news',
]

queries

['site:reddit.com "Rad.ai" "Omni"', 'site:reddit.com "Omni"']

In [122]:
from google_search import search

queries_results = [list(search(query, num=10)) for query in queries]
queries_results

2024-09-19 09:00:12.904 | DEBUG    | google_search:search:58 - Google search results: {'kind': 'customsearch#search', 'url': {'type': 'application/json', 'template': 'https://www.googleapis.com/customsearch/v1?q={searchTerms}&num={count?}&start={startIndex?}&lr={language?}&safe={safe?}&cx={cx?}&sort={sort?}&filter={filter?}&gl={gl?}&cr={cr?}&googlehost={googleHost?}&c2coff={disableCnTwTranslation?}&hq={hq?}&hl={hl?}&siteSearch={siteSearch?}&siteSearchFilter={siteSearchFilter?}&exactTerms={exactTerms?}&excludeTerms={excludeTerms?}&linkSite={linkSite?}&orTerms={orTerms?}&dateRestrict={dateRestrict?}&lowRange={lowRange?}&highRange={highRange?}&searchType={searchType}&fileType={fileType?}&rights={rights?}&imgSize={imgSize?}&imgType={imgType?}&imgColorType={imgColorType?}&imgDominantColor={imgDominantColor?}&alt=json'}, 'queries': {'request': [{'title': 'Google Custom Search - site:reddit.com "Rad.ai" "Omni"', 'totalResults': '2', 's

[[SearchResult(title='Powerscribe One -- are you for real? : r/Radiology', link='https://www.reddit.com/r/Radiology/comments/18bymqt/powerscribe_one_are_you_for_real/', snippet='Dec 6, 2023 ... Rad AI is mostly known for automatically generating Impressions and this September launched Omni Reporting which was designed by\xa0...', formattedUrl='https://www.reddit.com/r/Radiology/.../powerscribe_one_are_you_for_real/'),
  SearchResult(title='Is the Grey Matter DNA Asmuths? : r/Ben10', link='https://www.reddit.com/r/Ben10/comments/18oytib/is_the_grey_matter_dna_asmuths/', snippet="Dec 23, 2023 ... Graymatter/Ben understood what Rad's AI said immediately after transforming ... For all those people who say he'd beat omni man just by becoming a\xa0...", formattedUrl='https://www.reddit.com/r/Ben10/.../is_the_grey_matter_dna_asmuths/')],
 [SearchResult(title='Omni source code now available : r/kubernetes', link='https://www.reddit.com/r/kubernetes/comments/1be30qa/omni_source_code_now_availab

In [123]:
from typing import Dict, List
from google_search import SearchResult

def index_search_results(search_results: List[SearchResult]) -> Dict[int, SearchResult]:
    indexed_results = {}

    # TODO: Deduplicate search results

    for i, search_result in enumerate(search_results):
        indexed_results[i+1] = search_result

    return indexed_results

def format_search_results(indexed_search_results: Dict[int, SearchResult]) -> str:
    result = ""

    for i, search_result in sorted(indexed_search_results.items()):
        domain = search_result.link.split('/')[2]
        result += f"{i}. {search_result.title} (from {domain})\n"
        result += f"{search_result.snippet}\n\n"

    return result

print(format_search_results(index_search_results(queries_results[0][:5])))

1. Powerscribe One -- are you for real? : r/Radiology (from www.reddit.com)
Dec 6, 2023 ... Rad AI is mostly known for automatically generating Impressions and this September launched Omni Reporting which was designed by ...

2. Is the Grey Matter DNA Asmuths? : r/Ben10 (from www.reddit.com)
Dec 23, 2023 ... Graymatter/Ben understood what Rad's AI said immediately after transforming ... For all those people who say he'd beat omni man just by becoming a ...
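
The `TODO: Deduplicate search results` in `index_search_results` matters once the results of several queries are concatenated, because the same Reddit thread can come back for more than one query. The following is a minimal sketch of one way to do it, assuming that two results are duplicates when their links match after dropping a trailing slash. `Result`, `dedupe_by_link` and `index_results` are hypothetical names; `Result` only stands in for `google_search.SearchResult` so the example runs on its own, and the sample links are made up.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in for google_search.SearchResult (only the fields used here)."""
    title: str
    link: str
    snippet: str


def dedupe_by_link(results: Iterable[Result]) -> List[Result]:
    """Keep the first occurrence of each link, preserving the original order."""
    seen = set()
    unique = []
    for result in results:
        # Assumption: a trailing slash is the only URL variation worth normalizing.
        key = result.link.rstrip("/")
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique


def index_results(results: Iterable[Result]) -> Dict[int, Result]:
    """Number the deduplicated results starting from 1."""
    return {i: r for i, r in enumerate(dedupe_by_link(results), start=1)}


# Made-up sample data: the third entry repeats the first link without the slash.
sample = [
    Result("A", "https://www.reddit.com/r/x/comments/1/", "first"),
    Result("B", "https://www.reddit.com/r/y/comments/2/", "second"),
    Result("A again", "https://www.reddit.com/r/x/comments/1", "duplicate of the first"),
]
indexed = index_results(sample)
print({i: r.title for i, r in indexed.items()})
```

Keeping the first occurrence means a result stays attached to the earliest query that returned it, which is usually the more specific one when queries are listed from narrow to broad.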




In [128]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

from pydantic import BaseModel, Field
from typing import List

indexed_search_results = index_search_results(queries_results[0] + queries_results[1])
formatted_search_results = format_search_results(indexed_search_results)

class RatedResult(BaseModel):
    result_number: int = Field(description="Number of the search result", ge=min(indexed_search_results.keys()), le=max(indexed_search_results.keys()))

    customer_relevance: int = Field(description="Relevance of the search result for customers, such as product feedback and critique (1 is least relevant, 5 is most relevant)", ge=1, le=5)
    company_relevance: int = Field(description="Relevance of the search result to the company itself, such as critique of the company (1 is least relevant, 5 is most relevant)", ge=1, le=5)

class RatedResults(BaseModel):
    rated_queries: List[RatedResult] = Field(description="List of relevance-rated search results")

_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """
Review a list of search results and rate the relevance of each one on a 1-5 scale to A) the company {company} and B) the customer experience with {product} by {company}.

Additional information:
{company}: {company_description}
{product}: {product_description}
            """,
        ),
        (
            "human",
            """
Search results:
{search_results}
            """,
        ),
    ]
)


llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
runnable = _prompt | llm.with_structured_output(RatedResults)
llm_result = runnable.with_config({"run_name": "POC: Rerank search results"}).invoke(
    {
        "company": seed.company,
        "company_description": company_description,
        "product": seed.product,
        "product_description": product_description,
        "search_results": formatted_search_results,
    }
)

llm_result

RatedResults(rated_queries=[RatedResult(result_number=1, customer_relevance=4, company_relevance=5), RatedResult(result_number=2, customer_relevance=1, company_relevance=1), RatedResult(result_number=3, customer_relevance=1, company_relevance=1), RatedResult(result_number=4, customer_relevance=1, company_relevance=1), RatedResult(result_number=5, customer_relevance=1, company_relevance=1), RatedResult(result_number=6, customer_relevance=1, company_relevance=1), RatedResult(result_number=7, customer_relevance=1, company_relevance=1), RatedResult(result_number=8, customer_relevance=1, company_relevance=1), RatedResult(result_number=9, customer_relevance=2, company_relevance=1), RatedResult(result_number=10, customer_relevance=1, company_relevance=1), RatedResult(result_number=11, customer_relevance=5, company_relevance=4), RatedResult(result_number=12, customer_relevance=1, company_relevance=1)])
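
The schema bounds `result_number` to the valid range, but it does not force the model to rate every result exactly once. In the output above all 12 numbers appear once, so nothing is wrong here; the sketch below is only a guard for runs where that does not hold. `check_coverage` is a hypothetical helper, not part of `core` or `google_search`.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def check_coverage(
    expected_numbers: Iterable[int], rated_numbers: Iterable[int]
) -> Tuple[List[int], List[int], List[int]]:
    """Return (missing, duplicated, unexpected) result numbers."""
    expected = set(expected_numbers)
    counts = Counter(rated_numbers)
    rated = set(counts)
    missing = sorted(expected - rated)          # never rated
    duplicated = sorted(n for n, c in counts.items() if c > 1)  # rated twice or more
    unexpected = sorted(rated - expected)       # rated but not in the input
    return missing, duplicated, unexpected


# Made-up example: result 3 was skipped, result 2 was rated twice.
missing, duplicated, unexpected = check_coverage(range(1, 5), [1, 2, 2, 4])
print(missing, duplicated, unexpected)
```

In this notebook the call would be `check_coverage(indexed_search_results.keys(), [r.result_number for r in llm_result.rated_queries])`.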

In [129]:
import numpy as np

print(f"""
Number of search results: {len(llm_result.rated_queries)}
Average customer relevance: {np.mean([rated_result.customer_relevance for rated_result in llm_result.rated_queries]):.1f}
Average company relevance: {np.mean([rated_result.company_relevance for rated_result in llm_result.rated_queries]):.1f}
""")


Number of search results: 12
Average customer relevance: 1.7
Average company relevance: 1.6



In [130]:
link2query = {}

for query, query_results in zip(queries, queries_results):
    for query_result in query_results:
        link2query[query_result.link] = query

len(link2query)

12
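
With `link2query` in place, the ratings can be traced back to the query that produced each result, which is what comparing queries requires. The sketch below shows one possible aggregation: the mean customer and company relevance per query. `Rating` and `score_queries` are hypothetical names, `Rating` stands in for `RatedResult`, and the sample numbers are made up for illustration rather than taken from the run above.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, NamedTuple


class Rating(NamedTuple):
    """Hypothetical stand-in for RatedResult."""
    result_number: int
    customer_relevance: int
    company_relevance: int


def score_queries(
    ratings: Iterable[Rating],
    number2link: Dict[int, str],
    link2query: Dict[str, str],
) -> Dict[str, Dict[str, float]]:
    """Group ratings by originating query and average both relevance scores."""
    per_query = defaultdict(list)
    for rating in ratings:
        link = number2link[rating.result_number]
        per_query[link2query[link]].append(rating)
    return {
        query: {
            "results": len(rated),
            "customer": mean([r.customer_relevance for r in rated]),
            "company": mean([r.company_relevance for r in rated]),
        }
        for query, rated in per_query.items()
    }


# Made-up sample data.
ratings = [Rating(1, 4, 5), Rating(2, 1, 1), Rating(3, 1, 1), Rating(4, 5, 4)]
number2link = {n: f"https://example.com/{n}" for n in range(1, 5)}
sample_link2query = {
    "https://example.com/1": "narrow query",
    "https://example.com/2": "narrow query",
    "https://example.com/3": "broad query",
    "https://example.com/4": "broad query",
}
scores = score_queries(ratings, number2link, sample_link2query)
for query, score in scores.items():
    print(query, score)
```

In the notebook, `number2link` would be `{n: r.link for n, r in indexed_search_results.items()}`. One caveat: `link2query` keeps only the last query for a link, so a result returned by several queries is credited to just one of them. Here `len(link2query)` equals the 12 results, so no link was shared between the two queries.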

In [131]:
def print_top_results(heading, key, limit=10):
    """Print the `limit` highest-rated results for the given relevance attribute."""
    print(f"# {heading}\n\n")

    for rated_result in sorted(llm_result.rated_queries, key=lambda x: getattr(x, key), reverse=True)[:limit]:
        search_result = indexed_search_results[rated_result.result_number]
        print(f"""
## [{search_result.title}]({search_result.link})
{search_result.snippet}

Customer relevance: {rated_result.customer_relevance}
Company relevance: {rated_result.company_relevance}

""")

print_top_results("Customer-relevant search results", "customer_relevance")
print_top_results("Company-relevant search results", "company_relevance")


# Customer-relevant search results



## [The Downgrade to Omni : r/ChatGPTPro](https://www.reddit.com/r/ChatGPTPro/comments/1cxyxce/the_downgrade_to_omni/)
May 23, 2024 ... 100 votes, 100 comments. I've been remarkably disappointed by Omni since it's drop. While I appreciate the new features, and how fast it is, ...

Customer relevance: 5
Company relevance: 4



## [Powerscribe One -- are you for real? : r/Radiology](https://www.reddit.com/r/Radiology/comments/18bymqt/powerscribe_one_are_you_for_real/)
Dec 6, 2023 ... Rad AI is mostly known for automatically generating Impressions and this September launched Omni Reporting which was designed by ...

Customer relevance: 4
Company relevance: 5



## [Omni hotels system outage : r/hotels](https://www.reddit.com/r/hotels/comments/1brhte6/omni_hotels_system_outage/)
Mar 30, 2024 ... Everyone has to be escorted to their room by an employee and the phones and Wi-Fi are down. Anyone else staying at an Omni that can confirm this ...

Customer 