# Content Evaluation with LLMs
* **Using crawling**
* **In bulk (ask multiple questions about each page)**
* **Running concurrently**
* **Produces a standard summary report**

```python
import json
import os
from concurrent import futures

import pandas as pd
from openai import OpenAI

# The API key is read from the OPENAI_API_KEY environment variable
client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
```

## Crawl a website, optionally setting a selector for the article text

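If no selector is set, the default `BODY_TEXT` selector is used, and its output is stored in the `body_text` column of the crawl output. The crawl below sets none, so the evaluation uses `body_text`.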

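```python
import advertools as adv

# Crawl the speeches & remarks section, following only links that stay inside it.
# CLOSESPIDER_PAGECOUNT stops the crawl after about 100 pages (requests already
# in flight may still complete). Results are written as JSON lines.
adv.crawl(
    'https://www.whitehouse.gov/briefing-room/speeches-remarks/',
    'wh_briefing_room.jl',
    follow_links=True,
    include_url_regex='/briefing-room/speeches-remarks/',
    custom_settings={
        'CLOSESPIDER_PAGECOUNT': 100
    })
```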

```python
# One row per crawled page
df = pd.read_json('wh_briefing_room.jl', lines=True)
```
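When a custom selector is passed to `adv.crawl` (for example through its `css_selectors` parameter, a dict whose keys become column names in the output), the crawl data gets an extra text column next to `body_text`. The hypothetical helper below picks that column when it exists and falls back to `body_text` otherwise; the column name `article_text` is an assumption. It also replaces the `@@` separator that advertools uses to join multiple matches, as the prompt in `make_completion` does.

```python
import pandas as pd

# Hypothetical column name: the key you would use in adv.crawl(..., css_selectors={...})
CUSTOM_TEXT_COLUMN = 'article_text'


def pick_article_text(crawl_df, custom_column=CUSTOM_TEXT_COLUMN):
    """Return one text per page: the custom selector's text if available, else body_text."""
    text = crawl_df['body_text']
    if custom_column in crawl_df.columns:
        # Pages where the custom selector matched nothing fall back to body_text
        text = crawl_df[custom_column].fillna(text)
    # advertools joins multiple matches with '@@'; turn them into spaces
    return text.fillna('').str.replace('@@', ' ', regex=False)


crawl_df = pd.DataFrame({
    'url': ['https://example.com/a', 'https://example.com/b'],
    'body_text': ['Menu@@Intro', 'Menu@@Only body'],
    'article_text': ['Intro@@More', None],
})
print(pick_article_text(crawl_df).tolist())  # ['Intro More', 'Menu Only body']
```

Without the custom column, the same call simply returns the cleaned `body_text`.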

## Use [Google's helpful content guidelines](https://developers.google.com/search/docs/fundamentals/creating-helpful-content)

```python
# Questions to ask about every page, one per row in a `question` column
content_guidelines = pd.read_csv('quality_guidelines.csv')
```
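The notebook reads the questions from a local `quality_guidelines.csv` and later uses only its `question` column. The file itself is not shown, so the layout here is an assumption: a header row and one question per line. This stand-in uses three questions from the report at the end of the notebook, under new names so it does not overwrite `content_guidelines`:

```python
import io

import pandas as pd

# Assumed layout of quality_guidelines.csv; questions containing commas are quoted
csv_text = '''question
"Does the content provide original information, reporting, research, or analysis?"
Does the main heading or page title avoid exaggerating or being shocking in nature?
Does the content have any spelling or stylistic issues?
'''
sample_guidelines = pd.read_csv(io.StringIO(csv_text))

# make_completion sends the first 21 questions; head(21) returns all rows if there are fewer
sample_questions = sample_guidelines.head(21)['question'].tolist()
print(len(sample_questions))  # 3
```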


## Define a single function to get a completion

* The title and body text go to the LLM.
* The URL is not sent to the LLM; it is used here to link each response to the page it analyzed.
* This matters when running concurrent code: requests do not finish in the order they were submitted, so each response needs a way to be mapped back to its page. `make_completion` therefore returns the URL together with the completion as a pair.
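To see why the URL has to travel with the result, the sketch below swaps the API call for a hypothetical `fake_completion` that sleeps for a random time and returns a `(url, answer)` tuple. `as_completed` yields futures in whatever order they finish, yet each answer stays attached to its page.

```python
import random
import time
from concurrent import futures


def fake_completion(url):
    """Hypothetical stand-in for make_completion: random delay, then (url, answer)."""
    time.sleep(random.uniform(0, 0.05))
    return (url, f'answer for {url}')


urls = [f'https://example.com/page-{i}' for i in range(8)]
results = []
with futures.ThreadPoolExecutor(max_workers=4) as executor:
    to_do = [executor.submit(fake_completion, url) for url in urls]
    for future in futures.as_completed(to_do):
        results.append(future.result())

# Arrival order may differ from `urls`, but each answer still carries its URL
answers_by_url = dict(results)
for url in urls:
    assert answers_by_url[url] == f'answer for {url}'
```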

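```python
def make_completion(url, title, body_text):
    """Ask the LLM the guideline questions about one page.

    Returns a (url, completion) tuple so every answer can be matched to its page,
    since concurrent requests finish in no particular order. On failure, prints
    the error and returns (url, None).
    """
    try:
        completion = client.chat.completions.create(
            model="gpt-4o",
            temperature=0,
            seed=123,  # best-effort reproducibility across runs
            # JSON mode: the reply is a JSON object (the prompt must mention JSON)
            response_format={"type": "json_object"},
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": f"""
Please answer the following questions about this article and its title.
Ignore the parts that look like footer links, header links, or any navigation items.
Ignore text that looks like advertising.
Respond in JSON where questions are keys and answers are values.
Answers should be boolean only.

article_title: {title}

article_text: {body_text.replace('@@', ' ')}


questions: {content_guidelines.head(21)['question'].tolist()}
"""}
            ])
        return (url, completion)
    except Exception as e:
        print(f'Exception occurred: {url} {e}')
        return (url, None)
```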
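The prompt asks for boolean answers keyed by the question text, but nothing forces the model to copy every question verbatim. A hypothetical `check_answers` helper, sketched below, flags missing keys, unexpected keys, and non-boolean values before the answers reach the report; it assumes the reply has already been parsed into a plain dict.

```python
def check_answers(answers, questions):
    """Return a list of problems found in one parsed reply (empty list means OK)."""
    problems = []
    missing = [q for q in questions if q not in answers]
    extra = [k for k in answers if k not in questions]
    non_bool = [k for k, v in answers.items() if not isinstance(v, bool)]
    if missing:
        problems.append(f'missing: {missing}')
    if extra:
        problems.append(f'unexpected: {extra}')
    if non_bool:
        problems.append(f'non-boolean: {non_bool}')
    return problems


qs = ['Does the content provide original information, reporting, research, or analysis?',
      'Does the content have any spelling or stylistic issues?']
print(check_answers({qs[0]: True, qs[1]: False}, qs))  # []
print(check_answers({qs[0]: 'yes'}, qs))  # reports the missing key and the non-boolean value
```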

## Make concurrent requests

```python
responses = []

# Send up to 4 requests at a time; results arrive in completion order, not submission order
with futures.ThreadPoolExecutor(max_workers=4) as executor:
    to_do = [
        executor.submit(make_completion, url, title, body_text)
        for url, title, body_text in df[['url', 'title', 'body_text']].values
    ]
    for future in futures.as_completed(to_do):
        # Each result carries its own URL, so arrival order does not matter
        responses.append(future.result())
```
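`as_completed` hands back each result as soon as its request finishes. If you would rather get results in the same order as the rows of `df`, `Executor.map` does that, at the cost of holding back later results until earlier, slower calls are done. Because `make_completion` already returns the URL, either approach works. A sketch with a hypothetical stand-in worker that has the same three arguments:

```python
import time
from concurrent import futures


def slow_stub(url, title, body_text):
    """Hypothetical stand-in for make_completion with the same arguments."""
    time.sleep(0.01 * len(title))  # the first row is deliberately slower
    return (url, len(body_text))


rows = [
    ('https://example.com/a', 'A long title here', 'short'),
    ('https://example.com/b', 'B', 'a bit longer text'),
]
urls, titles, texts = zip(*rows)

with futures.ThreadPoolExecutor(max_workers=4) as executor:
    # map() yields results in input order, regardless of which call finishes first
    ordered = list(executor.map(slow_stub, urls, titles, texts))

print([url for url, _ in ordered])  # ['https://example.com/a', 'https://example.com/b']
```

In the notebook, the equivalent call would be `executor.map(make_completion, df['url'], df['title'], df['body_text'])`.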

## Each response has two parts

The first element is the URL of the analyzed page:

```python
responses[0][0]
```

```
'https://www.whitehouse.gov/briefing-room/speeches-remarks/2024/06/17/remarks-by-national-economic-council-deputy-director-daniel-hornung-on-the-generational-opportunity-to-reorient-the-tax-system-in-2025/'
```

The second element is the `ChatCompletion` object returned by the API:

```python
responses[0][1].dict().keys()
```

```
dict_keys(['id', 'choices', 'created', 'model', 'object', 'system_fingerprint', 'usage'])
```
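The answers themselves live in `choices[0].message.content` as a string. Models sometimes wrap JSON in Markdown code fences, which is why the report code strips them before calling `json.loads`. A minimal standalone version of that step, tried on a made-up reply string (`parse_json_reply` is a hypothetical name):

```python
import json


def parse_json_reply(content):
    """Strip optional ```json fences from a model reply and parse it as JSON."""
    cleaned = content.strip()
    if cleaned.startswith('```'):
        # Drop the opening fence line (``` or ```json) and everything after the closing fence
        cleaned = cleaned.split('\n', 1)[1] if '\n' in cleaned else ''
        cleaned = cleaned.rsplit('```', 1)[0]
    return json.loads(cleaned)


reply = '```json\n{"Does the content have any spelling or stylistic issues?": false}\n```'
print(parse_json_reply(reply))  # {'Does the content have any spelling or stylistic issues?': False}
```

Unfenced replies, such as those produced in JSON mode, pass straight through to `json.loads`.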

## Create a summary report per question per URL

The report DataFrame has one column per URL. To keep the report uncluttered, the columns are labeled with integers instead of full URLs; each integer is the index of the corresponding item in `responses`, so `responses[i][0]` gives the URL of column `i`.
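Concretely, each parsed reply is a dict mapping question to boolean. A DataFrame built from a list of such dicts has one row per response (labeled 0, 1, 2, ... in list order) and one column per question; transposing turns questions into rows and response indexes into columns, and the row mean is the share of articles answered `True`. A toy example with three made-up replies:

```python
import pandas as pd

q1 = 'Does the content provide original information, reporting, research, or analysis?'
q2 = 'Does the content have any spelling or stylistic issues?'
# Made-up parsed replies for three pages
toy_parsed = [{q1: True, q2: False}, {q1: False, q2: False}, {q1: True, q2: True}]

toy_eval = pd.DataFrame(toy_parsed).T.astype(int)
toy_eval['avg'] = toy_eval.mean(axis=1)  # mean is taken before the avg column exists
print(toy_eval)
```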

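```python
def parse_answers(completion):
    # Drop Markdown code fences, if any, then parse the JSON answer
    content = completion.choices[0].message.content
    return json.loads(content.replace('```json', '').replace('```', '').strip())

# Parse successful responses only, keyed by their index in `responses`
parsed = {i: parse_answers(res[1]) for i, res in enumerate(responses)
          if isinstance(res, (list, tuple)) and res[1] is not None}
# Questions become rows and response indexes become columns
eval_df = pd.DataFrame(list(parsed.values()), index=list(parsed)).T
(eval_df
 .astype(int)
 .assign(avg=lambda df: df.mean(axis=1))  # share of all articles answered True
 .iloc[:, list(range(20)) + [-1]]         # first 20 articles plus the avg column
 .style
 .background_gradient(cmap='cividis')
 .format({'avg': '{:.1%}'})
 .set_caption("<h3>Content evaluated based on Google's criteria</h3><br>111 articles, first 20 shown to save space"))
```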



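**Content evaluated based on Google's criteria.** 111 articles; only the first 20 are shown to save space. The `avg` column is computed over all 111 articles.

| Question | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Does the content provide original information, reporting, research, or analysis? | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 81.1% |
| Does the content provide a substantial, complete, or comprehensive description of the topic? | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 80.2% |
| Does the content provide insightful analysis or interesting information that is beyond the obvious? | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 70.3% |
| If the content draws on other sources, does it avoid simply copying or rewriting those sources, and instead provide substantial additional value and originality? | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 81.1% |
| Does the main heading or page title provide a descriptive, helpful summary of the content? | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 99.1% |
| Does the main heading or page title avoid exaggerating or being shocking in nature? | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 100.0% |
| Is this the sort of page you'd want to bookmark, share with a friend, or recommend? | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 70.3% |
| Would you expect to see this content in or referenced by a printed magazine, encyclopedia, or book? | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 80.2% |
| Does the content provide substantial value when compared to other pages in search results? | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 79.3% |
| Does the content have any spelling or stylistic issues? | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |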
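One caveat when reading `avg`: for every question shown except the last, `1` means the page meets the guideline, while for "Does the content have any spelling or stylistic issues?" a `1` is a negative signal, so its 0.0% is the best possible score. If you want every row to read "higher is better", one option is to flip negatively phrased questions before averaging. The sketch below does that on a toy frame; `align_polarity` is a hypothetical helper, and which questions count as negative is an assumption you maintain by hand.

```python
import pandas as pd

# Hand-maintained list of questions where "yes" is the undesirable answer (assumption)
NEGATIVE_QUESTIONS = ['Does the content have any spelling or stylistic issues?']


def align_polarity(answers, negative_questions=NEGATIVE_QUESTIONS):
    """Return a 0/1 frame (questions x pages) where 1 always means 'meets the guideline'."""
    aligned = answers.copy()
    rows = aligned.index.intersection(negative_questions)
    aligned.loc[rows] = 1 - aligned.loc[rows]
    return aligned


toy = pd.DataFrame(
    [[1, 0, 1], [0, 0, 1]],
    index=['Does the content provide original information, reporting, research, or analysis?',
           'Does the content have any spelling or stylistic issues?'],
)
aligned = align_polarity(toy)
print(aligned.mean(axis=1))
```

In the report cell, this would slot in as `align_polarity(eval_df.astype(int))` before `.assign(avg=...)`.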
