# Proof of concept search module

The idea behind this module is to fetch 100+ Google search results, then use an LLM to organize and re-rank them.

## Ideas to consider

- Multiple searches: the company alone, company + product, product reviews, and all-time vs. recent-only results
- Provide a markdown template for the LLM to fill out, including an "other" category, then refine the listings that land in "other"
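
Both ideas above could be prototyped along the following lines. This is only a rough sketch, not part of the module: `Result` is a hypothetical stand-in for the `SearchResult` type, and `build_queries`, `dedupe_by_link`, and `extract_section` are illustrative names. The all-time vs. recent variant is left out because it depends on what date filters the `search` helper exposes.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in for SearchResult (only the fields used here)."""
    title: str
    link: str
    snippet: str = ""


def build_queries(company: str, product: str) -> List[str]:
    """Return query variants: company alone, company + product, product reviews."""
    queries = [company]
    if product and product != company:
        queries.append(f"{company} {product}")
    queries.append(f"{product or company} reviews")
    return queries


def dedupe_by_link(batches: Iterable[Iterable[Result]]) -> List[Result]:
    """Merge result batches, keeping the first occurrence of each link."""
    seen = set()
    merged = []
    for batch in batches:
        for result in batch:
            key = result.link.rstrip("/")  # treat trailing-slash variants as one page
            if key not in seen:
                seen.add(key)
                merged.append(result)
    return merged


def extract_section(markdown: str, header: str) -> str:
    """Return the body under a top-level '# header', up to the next '# ' line."""
    body = []
    in_section = False
    for line in markdown.splitlines():
        if line.startswith("# "):
            in_section = line[2:].strip() == header
            continue
        if in_section:
            body.append(line)
    return "\n".join(body).strip()
```

Assuming `search(query, num=100)` behaves as it does later in this notebook, the merged input would be `dedupe_by_link(search(q, num=100) for q in build_queries(target.company, target.product))`, and `extract_section(summary.content, "Other")` would pull out the listings to send through a second, more specific prompt.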

In [12]:
from core import CompanyProduct, init_requests_cache, init_langchain_cache, make_experiment_dir

init_requests_cache()
init_langchain_cache()

'/home/keith/company-detective/.cache/langchain.sqlite'

In [13]:
target = CompanyProduct.same("98point6")
experiment_dir = make_experiment_dir(target)

In [15]:
from search import search
from pprint import pprint

search_results = list(search(target.company, num=100))
pprint(search_results)

[SearchResult(title='98point6 Virtual Care Platform for async and real-time telehealth', link='https://www.98point6.com/', snippet='98point6 empowers health systems to decrease the administrative burden on clinicians, promote quality, and increase patient satisfaction.', formattedUrl='https://www.98point6.com/'),
 SearchResult(title='98point6 hit by new layoffs in latest change at health tech startup ...', link='https://www.geekwire.com/2024/98point6-hit-by-new-layoffs-in-latest-change-at-health-tech-startup/', snippet="Apr 23, 2024 ... 98point6 hit by new layoffs in latest change at health tech startup ... GeekWire's in-depth startup coverage tells the stories of the Pacific\xa0...", formattedUrl='https://www.geekwire.com/.../98point6-hit-by-new-layoffs-in-latest-change...'),
 SearchResult(title='Careers | 98point6 Technologies - Seattle', link='https://www.98point6.com/about-us/careers/', snippet="98point6 Technologies is on a mission to provide equitable access to exceptional care. 

In [16]:
from typing import List
from search import SearchResult

def result_to_markdown(search_result: SearchResult) -> str:
    return f"[{search_result.title}]({search_result.link})\n{search_result.snippet}"

def results_to_markdown(search_results: List[SearchResult]) -> str:
    return "\n\n".join(result_to_markdown(result) for result in search_results)

print(results_to_markdown(search_results))

[98point6 Virtual Care Platform for async and real-time telehealth](https://www.98point6.com/)
98point6 empowers health systems to decrease the administrative burden on clinicians, promote quality, and increase patient satisfaction.

[98point6 hit by new layoffs in latest change at health tech startup ...](https://www.geekwire.com/2024/98point6-hit-by-new-layoffs-in-latest-change-at-health-tech-startup/)
Apr 23, 2024 ... 98point6 hit by new layoffs in latest change at health tech startup ... GeekWire's in-depth startup coverage tells the stories of the Pacific ...

[Careers | 98point6 Technologies - Seattle](https://www.98point6.com/about-us/careers/)
98point6 Technologies is on a mission to provide equitable access to exceptional care. We're collaborators, innovators and passionate problem-solvers ...

[98point6 Technologies Inc. | LinkedIn](https://www.linkedin.com/company/98point6-tech-inc)
98point6 Technologies Inc. | 10151 followers on LinkedIn. Expand your virtual clinic with tec

In [24]:
from typing import List
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.messages.ai import AIMessage
from langchain_openai import ChatOpenAI

from core import CompanyProduct
from dotenv import load_dotenv

load_dotenv()


prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """
You're an expert at organizing search results.
Given search results for a company or product, organize them into the following headers:

# Official social media
# Job boards
# App stores
# Product reviews
# News articles (most recent first)
# Key employees (with subheaders by employee)
# Other pages on the company website
# Other

Include the publication date after the link, if available.

Unless otherwise specified, order the results in each section from most to least relevant.
Format the output as a markdown document, preserving any links in the source.
Organize ALL search results into these headers; do not omit any results.
            """,
        ),
        (
            "human",
            """
            Company: {company_name}
            Product: {product_name}
            
            Search results: 
            {text}
            """,
        ),
    ]
)


def summarize(
    target: CompanyProduct, search_results: List[SearchResult], debug=True
) -> AIMessage:
    unified_markdown = results_to_markdown(search_results)

    if debug:
        print(f"{len(unified_markdown):,} characters in unified context")

    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

    runnable = prompt | llm
    result = runnable.invoke({"text": unified_markdown, "company_name": target.company, "product_name": target.product})
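    # str.strip(chars) removes any of the given characters, not a literal prefix or suffix,
    # so remove the optional markdown code fence explicitly (requires Python 3.9+).
    content = result.content.strip()
    result.content = content.removeprefix("```markdown").removesuffix("```").strip()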
    return result

summary = summarize(target, search_results)
print(summary.content)

with open(f"{experiment_dir}/search_results.md", "w", encoding="utf-8") as f:
    f.write(summary.content)

    f.write("\n# Sources\n")
    for result in search_results:
        f.write(result_to_markdown(result) + "\n\n")

27,429 characters in unified context

# Official social media
- [98point6 Technologies Inc. | LinkedIn](https://www.linkedin.com/company/98point6-tech-inc)
- [98point6 Technologies Inc. (@98point6Inc) / X](https://twitter.com/98point6inc?lang=en)
- [98point6 Technologies Inc. - YouTube](https://m.youtube.com/@98point6Tech)
- [98point6 | Seattle WA](https://www.facebook.com/98point6inc/)

# Job boards
- [Careers | 98point6 Technologies - Seattle](https://www.98point6.com/about-us/careers/)
- [Jobs at 98point6 - Otta](https://app.otta.com/companies/98point6)
- [98point6 Careers | Wellfound (formerly AngelList Talent)](https://wellfound.com/company/98point6)
- [98point6 Careers | Levels.fyi](https://www.levels.fyi/companies/98point6)
- [25+ 98point6 Jobs, Employment August 1, 2024| Indeed.com](https://www.indeed.com/q-98point6-jobs.html)

# App stores
- [98point6 on the App Store](https://apps.apple.com/us/app/98point6/id1157653928)
- [98point6 - Apps on Google Play](https://play.google.c