
## Grading with chatberry


In [1]:
import asyncio
import typing

import pydantic
import rich
import rich.console
import rich.progress

import chz
import sonic_training.experiments.index_switcher.v2.grader.utils as grader_utils
from maraschino_rater.base import Document, Query
from oai_serialization import oai_json
from search_service.api.user_metadata import UserMetadata
from sonic_training.experiments.tools.lean_browser_sampling import load_completer, make_convo

"""
oaipkg run maraschino_data_pipeline.grader_experiment.grade
"""

TModelType = typing.Literal["tb2_with_browse", "cb"]
TRatingMode = typing.Literal["page", "snippet_w_context", "snippet_wo_context"]


class DocumentWithContext(pydantic.BaseModel):
    document: Document
    context: str | None = None


DEFAULT_INSTRUCTION = """
You are a professional grader that judges the quality of search results.
You need to follow the rating instruction below to rate a given QUERY - DOCUMENT pair with respect to each dimension.
The QUERY contains the QUERY TEXT, DATE, and LOCATION of the searcher. The DOCUMENT contains TITLE, URL, PUBLICATION DATE, and CONTENT.
Optionally, the DOCUMENT may also be provided with an extra CONTEXT, which contains the text of the page that the CONTENT is extracted from.
When it is provided, you can refer to it to make your decision. But the grading should focus on CONTENT. If the key information is missing from CONTENT but in CONTEXT, you rate it as low.
Write your decision for each aspect in markdown format. It should contain a title for the aspect, the rating, and a detailed justification of your decision.
If anything is confusing or worth commenting on, you can also record it under "notes".

The response of each aspect (dimension) is:
## <Aspect>
* rating: <rating>
* justification: <justification>
* notes: <notes>

The rubrics are:

"""

DEFAULT_RUBRICS = r"""
To evaluate search results effectively, think like a professional rater using a structured approach. Your goal is to assess each result’s quality based on its ability to meet user needs, focusing on the following key factors: Relevance, Accuracy, Authority, Coverage, Timeliness, and User Experience. Follow these instructions:

1. Step into the user’s shoes: Consider the user’s intent behind the query. What specific information or action might the user need? Tailor your evaluation accordingly.
2. Use the guidelines as a framework: Evaluate based on nuanced distinctions. Avoid broad generalizations—consider the context, specificity, and depth of the result.
3. Identify strengths and weaknesses: For each evaluation, pinpoint what the result does well and where it falls short, using examples from the content.
4. Assign a rating on a 1-7 scale for each category, with 1 representing the lowest and 7 the highest level of relevance. Justify your rating with concrete reasoning that aligns with the established evaluation criteria.
5. Evaluation Levels: Unless specified otherwise, use the following scale to classify evaluation levels: Low (1-2): Indicates minimal relevance or poor alignment with the query. Medium (3-5): Reflects partial relevance, with some alignment to the query but lacking depth or focus. High (6-7): Signifies strong relevance and clear alignment with the query intent, providing comprehensive and precise information. Please interpolate within the designated range as appropriate to reflect nuances in the evaluation.
6. Contextualize the importance of each category: For example, accuracy and authority are more critical for medical or scientific queries, while timeliness is key for current event searches.

### Categories and Expanded Prompts:

#### 1\. Relevance Evaluation

Prompt:

"Evaluate the relevance of the search result to the user’s query by carefully analyzing the query's intent, the specificity of the content, and the contextual appropriateness of the result. Approach the evaluation as if you are a search quality rater, focusing on the user's needs and how well the result fulfills them.

Evaluation Criteria:

* Does the result match the query intent?
  * Identify the primary intent behind the query (informational).
  * Assess whether the result directly fulfills this intent. Consider whether the result provides information, a resource, or a service the user is seeking.
  * Examples:
    * High relevance: For the query *"buy iPhone 15 online"*, a result linking directly to Apple’s iPhone 15 purchase page or a reputable retailer.
    * Low relevance: A result discussing the history of iPhone models or one linking to outdated iPhone models for sale.
* How specific is the content?
  * Determine whether the result provides focused, detailed, and relevant information that directly addresses the query.
  * Avoid results that provide generic, tangential, or incomplete information.
  * Examples:
    * High specificity: A result listing *“restaurants in San Francisco that serve vegan dishes”* with detailed descriptions, reviews, and locations.
    * Low specificity: A result listing restaurants worldwide or discussing general food options.
* Does the result align with the query's context?
  * Consider contextual factors such as location, device type, language, and user-specific needs.
  * Evaluate whether the result adapts to the user’s circumstances (e.g., is the content mobile-friendly for users searching on smartphones, or does it offer regionally relevant information for location-specific queries).
  * Examples:
    * High context alignment: For the query *"weather in New York City"* on a mobile device, a result displaying today’s local weather with a mobile-friendly interface.
    * Low context alignment: A result showing a weather forecast for a different city or outdated information.

Evaluation Level Definition:

* High Relevance: The result directly addresses the query’s intent, is highly specific, and aligns with the query's context. The information provided is complete, useful, and easy to understand.
* Medium Relevance: The result partially addresses the query, providing some relevant information but lacking depth, specificity, or contextual alignment.
* Low Relevance: The result does not address the query or provides irrelevant, generic, or misleading content.

Instruction: "Be specific in your assessment: Highlight both the relevance strengths (e.g., specificity) and weaknesses (e.g., misaligned intent). Think about whether the result would fully satisfy the user."

Evaluation Levels: Unless specified otherwise, use the following scale to classify evaluation levels: Low (1-2): Indicates minimal relevance or poor alignment with the query. Medium (3-5): Reflects partial relevance, with some alignment to the query but lacking depth or focus. High (6-7): Signifies strong relevance and clear alignment with the query intent, providing comprehensive and precise information. Please interpolate within the designated range as appropriate to reflect nuances in the evaluation.

"""


class Grader:
    def __init__(self, instruction: str = DEFAULT_INSTRUCTION, debug: bool = False):
        self.instruction = instruction
        self.debug = debug

    async def __call__(
        self, query: Query, document: Document | DocumentWithContext, rubrics: str = ""
    ):
        date_spec = ""
        if query.create_timestamp:
            date_spec = f"{query.create_timestamp.date()}"
        geo_spec = ""
        if (
            query.user_metadata
            and query.user_metadata.user_country
            and query.user_metadata.user_region
            and query.user_metadata.ip_city
        ):
            geo_spec = f"country - {query.user_metadata.user_country}, region - {query.user_metadata.user_region}, city - {query.user_metadata.ip_city}"

        context = None
        doc = None
        if isinstance(document, DocumentWithContext):
            context = document.context
            doc = document.document
        else:
            doc = document

        # Named grader_input so the built-in input() is not shadowed.
        grader_input = f"""
# Input Query and Document

QUERY:
QUERY TEXT: {query.query}
DATE: {date_spec or "N/A"}
LOCATION: {geo_spec or "N/A"}

DOCUMENT:
TITLE: {doc.title or "N/A"}
URL: {doc.url}
PUBLICATION DATE: {doc.pub_date or "N/A"}
CONTEXT: {context or "N/A"}
CONTENT: {doc.content or "N/A"}
    """
        try:
            return await self.rate_text(grader_input, rubrics)
        except Exception as e:
            return f"Error: {e}"

    async def rate_text(self, text: str, rubrics: str):
        raise NotImplementedError()


class ToolberryWithBrowseGrader(Grader):
    async def rate_text(self, text: str, rubrics: str):
        convo = make_convo(
            "\n".join(
                [
                    self.instruction,
                    rubrics,
                    text,
                    "You can use browser.tool_call to help you with grading.",
                ]
            )
        )
        if self.debug:
            print(f"DEBUG: {convo=}")

        tc = await load_completer()
        result = None
        try:
            async for result in tc.async_completion_stream(convo, include_system_messages=False):
                pass
            assert result is not None
            if self.debug:
                for message in result.input_conversation.messages:
                    print(f"input: {message=}")
                for message in result.output_messages:
                    print(f"output: {message=}")
            return result.output_messages[-1].content.model_dump_json()
        except Exception as e:
            return f"Error: {e}"


class ChatberryGrader(Grader):
    def __init__(self, juice=128, instruction=DEFAULT_INSTRUCTION, debug=False):
        super().__init__(instruction, debug)
        self.juice = juice

    async def rate_text(self, text: str, rubrics: str):
        if self.debug:
            print(f"DEBUG: {text=}, {rubrics=}")
        r, _cot = await grader_utils.query_chatberry_parsed(
            prompt="\n".join([self.instruction, rubrics, text]), reward_multipler=self.juice
        )
        if self.debug:
            print(f"DEBUG: {r=}, {_cot=}")
        return r


def get_grader(model: TModelType, debug: bool = False) -> Grader:
    if model == "tb2_with_browse":
        return ToolberryWithBrowseGrader(debug=debug)
    if model == "cb":
        return ChatberryGrader(debug=debug)
    raise ValueError(f"Invalid model: {model}")


def build_grader_input(
    record: typing.Any,
    rating_mode: TRatingMode,
) -> typing.Generator[tuple[Query, DocumentWithContext], typing.Any, typing.Any]:
    query = record["query"]
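    # Expected format is "city, region, country"; pad so a shorter value
    # does not raise IndexError below.
    location = [s.strip() for s in (query.get("location") or ",,").split(",")]
    location += [""] * (3 - len(location))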
    q = Query(
        query=query["query"],
        create_timestamp=query.get("timestamp"),
        user_metadata=UserMetadata(
            user_country=location[2],
            user_region=location[1],
            ip_city=location[0],
        ),
    )

    search_result_groups = record["search_result_groups"]
    for group in search_result_groups:
        for page in group["pages"]:
            if rating_mode == "page":
                d = DocumentWithContext(
                    document=Document(
                        url=page.get("url"),
                        title=page.get("title"),
                        pub_date=page.get("pub_date"),
                        content=page.get("content"),
                    ),
                )
                yield q, d
            else:
                for snippet in page.get("snippets", []):
                    d = DocumentWithContext(
                        document=Document(
                            url=page.get("url"),
                            title=page.get("title"),
                            pub_date=page.get("pub_date"),
                            content=snippet,
                        ),
                        context=(
                            page.get("content", "") if rating_mode == "snippet_w_context" else ""
                        ),
                    )
                    yield q, d


def dump_jsonl(file_path: str, data: typing.Any):
    oai_json.jsonl_dump(data, file_path)
    print(f"Dumped data to {file_path}, use http://go/azv/{file_path} to view")


def _progress(*, console: rich.console.Console) -> rich.progress.Progress:
    return rich.progress.Progress(
        *rich.progress.Progress.get_default_columns(),
        rich.progress.MofNCompleteColumn(),
        console=console,
    )


async def run(
    input_path: str,
    output_path: str,
    model: TModelType,
    rating_mode: TRatingMode,
    limit: int | None = None,
    debug: bool = False,
) -> None:
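    # Chatberry is hard-coded here, so the `model` argument is currently ignored.
    # Use get_grader(model, debug=debug) instead to honor it.
    grader = ChatberryGrader(debug=debug)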
    res = []
    count = 0
    stream = oai_json.jsonl_load_stream(input_path)

    with _progress(console=rich.get_console()) as pbar:
        task = pbar.add_task("Grading ...", total=limit)
        for record in stream:
            tasks = []

            qs = []
            ds = []
            for query, document in build_grader_input(record, rating_mode):
                qs.append(query)
                ds.append(document)
                # Pass the rubrics explicitly; the default is an empty string.
                tasks.append(grader(query, document, rubrics=DEFAULT_RUBRICS))

            results = await asyncio.gather(*tasks)
            for query, document, result in zip(qs, ds, results):
                res.append(
                    {
                        "query": query.model_dump_json(),
                        "document": document.model_dump_json(),
                        "grading": result,
                    }
                )
                pbar.update(task, advance=1)
                count += 1
                if limit is not None and count >= limit:
                    dump_jsonl(output_path, res)
                    return
    dump_jsonl(output_path, res)

'''
if __name__ == "__main__":
    chz.entrypoint(main, allow_hyphens=True)
'''


# import nest_asyncio
# import asyncio

# nest_asyncio.apply()


async def main_async(
    input_path: str = "/tmp/dump_feather_results/content_enriched_search_result_groups_dump_top20.jsonl",
    output_path: str = "/tmp/dump_feather_results/graded_search_result_groups_dump_top20.jsonl",
    model: TModelType = "cb",
    rating_mode: TRatingMode = "snippet_wo_context",
    limit: int | None = 1,
    debug: bool = False,
) -> None:
    await run(input_path, output_path, model, rating_mode, limit, debug)

# Execute the asynchronous main function in a Jupyter Notebook cell
await main_async()
# asyncio.run(main_async())





Output()
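
The way `rating_mode` changes what gets graded is easy to miss inside `build_grader_input`. The sketch below restates that expansion with plain dicts so it runs without the internal `Query` and `Document` classes. `expand_record` and the sample record are hypothetical stand-ins for illustration, not part of the pipeline.

```python
from typing import Iterator, Optional, Tuple


def expand_record(record: dict, rating_mode: str) -> Iterator[Tuple[Optional[str], Optional[str], Optional[str]]]:
    """Yield (url, content, context) triples, mirroring build_grader_input."""
    for group in record["search_result_groups"]:
        for page in group["pages"]:
            if rating_mode == "page":
                # One item per page; the full page text is the content.
                yield page.get("url"), page.get("content"), None
            else:
                # One item per snippet; the page text is attached as context
                # only in "snippet_w_context" mode.
                for snippet in page.get("snippets", []):
                    context = page.get("content", "") if rating_mode == "snippet_w_context" else ""
                    yield page.get("url"), snippet, context


# Hypothetical record shaped like the fields the notebook reads.
record = {
    "query": {"query": "google translate", "location": "Seattle, Washington, US"},
    "search_result_groups": [
        {
            "pages": [
                {
                    "url": "https://example.com/a",
                    "content": "full page text",
                    "snippets": ["s1", "s2"],
                }
            ]
        }
    ],
}

for mode in ("page", "snippet_w_context", "snippet_wo_context"):
    print(mode, list(expand_record(record, mode)))
```

In the snippet modes the number of grader calls per record equals the total number of snippets, which is worth keeping in mind when choosing `limit`.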

## Converting the graded data into a dataframe

In [25]:
import pandas as pd
import json
import blobfile as bf

# Path to the input JSONL in Azure blob storage (read through blobfile)
file_path = "az://oairic1/oaibwen/data/source_utility/experiments/ds=20250123/identity=685f9d0e-1b03-442e-b09d-b3339e2a4340/content_enriched_search_result_groups_dump.jsonl"

# Initialize an empty list to collect all the records
records = []

# Read the JSONL file line by line and parse each line as JSON
try:
    with bf.BlobFile(file_path, 'r') as f:
        for i, line in enumerate(f):
            try:
                # Load each line as a JSON object
                record = json.loads(line)
                records.append(record)
            except json.JSONDecodeError as e:
                print(f"Error decoding JSON on line {i + 1}: {e}")
except FileNotFoundError:
    print(f"File not found: {file_path}")
except Exception as e:
    print(f"An error occurred while reading the file: {e}")

# Print the number of records loaded
print(f"Total records loaded: {len(records)}")

# Convert the list of records to a DataFrame
df = pd.DataFrame(records)

# Display the first few rows of the DataFrame to inspect
print("Initial DataFrame structure:")
print(df.head())

# Function to parse nested JSON strings
def parse_nested_json(json_obj):
    if isinstance(json_obj, str):
        try:
            return json.loads(json_obj)
        except (json.JSONDecodeError, TypeError):
            return json_obj
    return json_obj

# Apply parsing to the 'document' and 'query' columns if they are in string format
if 'document' in df.columns:
    df['document'] = df['document'].apply(parse_nested_json)
if 'query' in df.columns:
    df['query'] = df['query'].apply(parse_nested_json)

# Further flatten the nested dictionaries if needed
document_df = pd.json_normalize(df['document']) if 'document' in df.columns else pd.DataFrame()
query_df = pd.json_normalize(df['query']) if 'query' in df.columns else pd.DataFrame()

# Combine the DataFrame with the parsed 'document' and 'query' columns
if not document_df.empty:
    df = pd.concat([df.drop(columns=['document']), document_df], axis=1)
if not query_df.empty:
    df = pd.concat([df.drop(columns=['query']), query_df], axis=1)

# Display the first few rows of the consolidated DataFrame
print("Consolidated DataFrame structure:")
print(df.head())


Total records loaded: 30
Initial DataFrame structure:

In [32]:
pd.set_option('display.max_colwidth', None)
df.grading[2]


'## Relevance\n* rating: 6\n* justification: The user\'s query is "google translate," which likely indicates an intent to access or find information about Google Translate. The provided document is a page from the Apple App Store for the Google Translate app, including details about its features and supported languages. This result directly addresses the query, offering specific and detailed information about the Google Translate app, including how to download it. It aligns with the user\'s context, particularly if they are using an Apple device, such as an iPhone or iPad. Therefore, the result is highly relevant, though it might be slightly less so if the user is on a non-Apple device.\n* notes: The relevance could be even higher if the user\'s device type were specified (e.g., iOS device user), confirming the applicability of the App Store link.\n\n## Timeliness\n* rating: 4\n* justification: The document provides information about the Google Translate app, including its features and

In [59]:
import re

import pandas as pd


# Function to parse a single section
def parse_section(section):
    lines = section.strip().split('\n')
    category = lines[0].replace('## ', '').strip()
    rating_line = next((line for line in lines if line.startswith('* rating:')), None)
    justification_line = next((line for line in lines if line.startswith('* justification:')), None)
    notes_line = next((line for line in lines if line.startswith('* notes:')), None)

    rating = rating_line.replace('* rating:', '').strip() if rating_line else None
    justification = justification_line.replace('* justification:', '').strip() if justification_line else None
    notes = notes_line.replace('* notes:', '').strip() if notes_line else None

    return {
        'category': category,
        'rating': rating,
        'justification': justification,
        'notes': notes
    }

# Function to process the 'grading' column
def process_grading_column(grading_text):
    # Split the grading text into sections
    sections = re.split(r'\n## ', grading_text)
    parsed_data = [parse_section(section) for section in sections if section.strip()]

    # Convert parsed data to DataFrame
    grading_df = pd.DataFrame(parsed_data)
    return grading_df

# Apply the function to each row in the 'grading' column and concatenate the results

flattened_dfs = df['grading'].apply(process_grading_column)

# Concatenate the list of DataFrames into a single DataFrame
flattened_df = pd.concat(flattened_dfs.tolist(), ignore_index=True)

# Display the flattened DataFrame
flattened_df.head(15)

Unnamed: 0,category,rating,justification,notes
0,Relevance Evaluation,7,"The query ""google translate"" suggests that the user is seeking information about the Google Translate service or application. The document is the App Store page for Google Translate, providing direct access to download the app, view its features, updates, and user reviews. The content is highly specific to the query intent, directly matching the user's likely need to find or download the Google Translate app. The information is current and contextually appropriate for a user located in the US seeking the iOS version of the app.",None.
1,Timeliness Evaluation,7,"The document reflects the most recent update to the Google Translate app as of December 11, 2024, which is less than a month prior to the query date of January 7, 2025. This indicates that the application is being actively maintained with recent bug fixes and improvements. Although the user reviews showcased are from early 2022, it's reasonable to assume that the App Store page also includes more recent reviews; the presented selection may be a subset. Overall, the content is current and up-to-date, aligning with user expectations for timeliness.",The outdated user reviews included in the content may not reflect the most current user experiences with the app.
2,Navigational or Transactional,7,"The query ""google translate"" suggests a navigational intent to reach the official Google Translate application, or a transactional intent to download or access the service. The search result directly leads to the App Store page for the Google Translate app, enabling the user to download the application immediately. This fulfills both navigational and transactional intents at the highest level by providing direct access to the official app's page without additional steps.",None.
3,Geographic & Cultural Relevance,7,"The document is the US App Store page for the Google Translate app, perfectly aligning with the user's location in Seattle, Washington, USA. This ensures that the content is culturally appropriate, and any region-specific information, such as language support and functionalities, is relevant to a US audience. The App Store is a trusted, authoritative source for app distribution in this region, enhancing its local authority.",None.
4,Accuracy Evaluation,7,"The document is sourced directly from the official Apple App Store, providing verifiable and accurate information about the Google Translate app, including version updates and user reviews. The details such as the ""What's New"" section accurately reflect the app's latest updates, and the user reviews are presented authentically, reflecting genuine user experiences. There is no evidence of misinformation or omission of critical information.",None.
5,High Stakes,Not High Stake,"The document is an App Store page for the Google Translate application, primarily intended for users to download and obtain information about the app. While the translation tool could be used in contexts impacting health, legal matters, or safety, this webpage itself does not provide direct information or services in those High Stake categories. Therefore, the page does not qualify as a High Stake page in terms of content that significantly impacts a person's well-being.",None.
6,Harmful,Not Harmful,"The document is an official App Store page for the Google Translate application, providing legitimate information about the app, its updates, and user reviews. There is no indication of harmful misinformation, promotion of self-harm, violence, illegal activities, or deceptive practices. The content is standard for an app distribution platform and poses no apparent risk to individuals or society.",None.
7,Explicit Content,Not Porn,"The content presented is an official App Store page for the Google Translate application and does not contain any sexually explicit or suggestive materials. It focuses on app information, updates, and user reviews, all of which are appropriate for general audiences.",None.
8,Spam,Not Spam,"The document is an official page on Apple's App Store, a legitimate and trusted platform for app distribution. There is no evidence of spam practices such as cloaking, keyword stuffing, link spam, or deceptive practices. The content is transparent, user-oriented, and adheres to standard web quality guidelines.",None.
9,Source Quality,High,,None.
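
The flattened table no longer records which graded row each section came from, because the per-row DataFrames are concatenated with `ignore_index=True`. A minimal sketch of one way to keep that link follows, assuming the grading text follows the `## <Aspect>` / `* rating:` layout requested in the prompt. `parse_sections`, the `source_row` key, and the sample strings are hypothetical.

```python
import re


def parse_sections(grading_text: str) -> list:
    """Split one grading string into per-aspect dicts."""
    parsed = []
    # Each aspect starts with a markdown "## " heading at the start of a line.
    for section in re.split(r"^## ", grading_text, flags=re.M):
        if not section.strip():
            continue
        lines = section.strip().split("\n")
        fields = {"category": lines[0].strip()}
        for key in ("rating", "justification", "notes"):
            prefix = f"* {key}:"
            match = next((ln for ln in lines if ln.startswith(prefix)), None)
            fields[key] = match[len(prefix):].strip() if match else None
        parsed.append(fields)
    return parsed


# Hypothetical grading strings in the format the grader is asked to produce.
gradings = [
    "## Relevance\n* rating: 6\n* justification: Matches the query.\n* notes: None.\n\n"
    "## Timeliness\n* rating: 4\n* justification: Somewhat dated.\n* notes: None.",
    "## Relevance\n* rating: 2\n* justification: Off topic.\n* notes: None.",
]

# Tag every parsed section with the index of the row it came from.
rows = [
    {"source_row": idx, **section}
    for idx, text in enumerate(gradings)
    for section in parse_sections(text)
]
for row in rows:
    print(row["source_row"], row["category"], row["rating"])
```

Passing `rows` to `pd.DataFrame` gives a long-format table that can be joined back to the query and document columns on `source_row`.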


In [113]:
import pandas as pd
import json
import blobfile as bf

# Use the local file path
file_path = "/tmp/dump_feather_results/graded_search_result_groups_dump_Rel_SamplewC.jsonl"

# Initialize an empty list to collect all the records
records = []

# Read the JSONL file line by line and parse each line as JSON
try:
    with open(file_path, 'r') as f:
        for i, line in enumerate(f):
            try:
                # Load each line as a JSON object
                record = json.loads(line)
                records.append(record)
            except json.JSONDecodeError as e:
                print(f"Error decoding JSON on line {i + 1}: {e}")
except FileNotFoundError:
    print(f"File not found: {file_path}")
except Exception as e:
    print(f"An error occurred while reading the file: {e}")

# Print the number of records loaded
print(f"Total records loaded: {len(records)}")

# Convert the list of records to a DataFrame
df = pd.DataFrame(records)

# Function to parse nested JSON strings
def parse_nested_json(json_obj):
    if isinstance(json_obj, str):
        try:
            return json.loads(json_obj)
        except (json.JSONDecodeError, TypeError):
            return json_obj
    return json_obj

# Apply parsing to the 'document' and 'query' columns if they are in string format
if 'document' in df.columns:
    df['document'] = df['document'].apply(parse_nested_json)
if 'query' in df.columns:
    df['query'] = df['query'].apply(parse_nested_json)

# Further flatten the nested dictionaries if needed
document_df = pd.json_normalize(df['document']) if 'document' in df.columns else pd.DataFrame()
query_df = pd.json_normalize(df['query']) if 'query' in df.columns else pd.DataFrame()

# Combine the DataFrame with the parsed 'document' and 'query' columns
if not document_df.empty:
    df = pd.concat([df.drop(columns=['document']), document_df], axis=1)
if not query_df.empty:
    df = pd.concat([df.drop(columns=['query']), query_df], axis=1)

df_4o_mini_woC = df
import re

def parse_grading_info(grading_text):
    # Extract category
    category_match = re.search(r'## (.+?)\n', grading_text)
    category = category_match.group(1).strip() if category_match else None

    # Extract rating
    rating_match = re.search(r'\* rating: (.+?)\n', grading_text)
    rating = rating_match.group(1).strip() if rating_match else None

    # Extract justification
    justification_match = re.search(r'\* justification: (.+?)(\n\*|\Z)', grading_text, re.S)
    justification = justification_match.group(1).strip() if justification_match else None

    return pd.Series([category, rating, justification])

# Apply the parsing function to the 'grading' column
df_4o_mini_woC[['category', 'rating', 'justification']] = df_4o_mini_woC['grading'].apply(parse_grading_info)
df_4o_mini_woC['4o_mini_woc_rating'] = df_4o_mini_woC['rating']
df_4o_mini_woC['4o_mini_woc_justification'] = df_4o_mini_woC['justification']

df_4o_mini_woC['4o_mini_woc_rating'] = pd.to_numeric(df_4o_mini_woC['4o_mini_woc_rating'], errors='coerce')
df_4o_mini_woC_mean = df_4o_mini_woC.groupby('query')['4o_mini_woc_rating'].agg(mean_4o_mini_woc_rating='mean', std_dev_4o_mini_woc='std').reset_index()
df_4o_mini_woC_mean

Total records loaded: 423


Unnamed: 0,query,mean_4o_mini_woc_rating,std_dev_4o_mini_woc
0,Evan Gershkovich prisoner exchange,3.304348,2.382413
1,How did Yusuf Dikec win a silver medal in shooting at the 2024 Olympics?,5.058824,1.951621
2,I am going to Big Sky this weekend. What should I look into doing that is really popular?,4.681818,1.886957
3,Is Amazon stock a good buy after recent sell-off?,4.222222,1.926764
4,Summer movies,5.322581,1.469401
5,Things to do with mom in boston this weekend,2.681818,0.716231
6,What would be a reasonable cost estimate for API access to GPT 4o and Claude 3.5 Sonnet to create a few hundred outputs of 1 page length per month,4.714286,1.270545
7,Why are people criticizing Jonathan Owens?,3.888889,2.348689
8,amzn,3.448276,1.638514
9,android docs,3.307692,1.086986


In [114]:
## Importing gpt-4o-mini with context

import pandas as pd
import json
import blobfile as bf

# Use the local file path
file_path = "/tmp/dump_feather_results/graded_search_result_groups_dump_Rel_SamplewC_wC.jsonl"

# Initialize an empty list to collect all the records
records = []

# Read the JSONL file line by line and parse each line as JSON
try:
    with open(file_path, 'r') as f:
        for i, line in enumerate(f):
            try:
                # Load each line as a JSON object
                record = json.loads(line)
                records.append(record)
            except json.JSONDecodeError as e:
                print(f"Error decoding JSON on line {i + 1}: {e}")
except FileNotFoundError:
    print(f"File not found: {file_path}")
except Exception as e:
    print(f"An error occurred while reading the file: {e}")

# Print the number of records loaded
print(f"Total records loaded: {len(records)}")

# Convert the list of records to a DataFrame
df = pd.DataFrame(records)


# Function to parse nested JSON strings
def parse_nested_json(json_obj):
    if isinstance(json_obj, str):
        try:
            return json.loads(json_obj)
        except (json.JSONDecodeError, TypeError):
            return json_obj
    return json_obj

# Apply parsing to the 'document' and 'query' columns if they are in string format
if 'document' in df.columns:
    df['document'] = df['document'].apply(parse_nested_json)
if 'query' in df.columns:
    df['query'] = df['query'].apply(parse_nested_json)

# Further flatten the nested dictionaries if needed
document_df = pd.json_normalize(df['document']) if 'document' in df.columns else pd.DataFrame()
query_df = pd.json_normalize(df['query']) if 'query' in df.columns else pd.DataFrame()

# Combine the DataFrame with the parsed 'document' and 'query' columns
if not document_df.empty:
    df = pd.concat([df.drop(columns=['document']), document_df], axis=1)
if not query_df.empty:
    df = pd.concat([df.drop(columns=['query']), query_df], axis=1)

# Keep the with-context gpt-4o-mini results under their own name
print("Consolidated DataFrame structure:")
df_4o_mini_wC = df

import re

def parse_grading_info(grading_text):
    # Extract category
    category_match = re.search(r'## (.+?)\n', grading_text)
    category = category_match.group(1).strip() if category_match else None

    # Extract rating
    rating_match = re.search(r'\* rating: (.+?)\n', grading_text)
    rating = rating_match.group(1).strip() if rating_match else None

    # Extract justification
    justification_match = re.search(r'\* justification: (.+?)(\n\*|\Z)', grading_text, re.S)
    justification = justification_match.group(1).strip() if justification_match else None

    return pd.Series([category, rating, justification])

# Apply the parsing function to the 'grading' column
df_4o_mini_wC[['category', 'rating', 'justification']] = df_4o_mini_wC['grading'].apply(parse_grading_info)

df_4o_mini_wC['4o_mini_wc_rating'] = df_4o_mini_wC['rating']
df_4o_mini_wC['4o_mini_wc_justification'] = df_4o_mini_wC['justification']

df_4o_mini_wC['4o_mini_wc_rating'] = pd.to_numeric(df_4o_mini_wC['4o_mini_wc_rating'], errors='coerce')
df_4o_mini_mean = df_4o_mini_wC.groupby('query')['4o_mini_wc_rating'].agg(mean_4o_mini_wc_rating='mean', std_dev_4o_mini_wc='std').reset_index()
df_4o_mini_mean

Total records loaded: 423
Consolidated DataFrame structure:


Unnamed: 0,query,mean_4o_mini_wc_rating,std_dev_4o_mini_wc
0,Evan Gershkovich prisoner exchange,6.173913,0.834058
1,How did Yusuf Dikec win a silver medal in shooting at the 2024 Olympics?,5.588235,0.939336
2,I am going to Big Sky this weekend. What should I look into doing that is really popular?,6.136364,0.940894
3,Is Amazon stock a good buy after recent sell-off?,5.111111,1.778595
4,Summer movies,5.612903,1.054433
5,Things to do with mom in boston this weekend,3.272727,0.827032
6,What would be a reasonable cost estimate for API access to GPT 4o and Claude 3.5 Sonnet to create a few hundred outputs of 1 page length per month,4.904762,1.445848
7,Why are people criticizing Jonathan Owens?,5.833333,0.514496
8,amzn,5.137931,1.641518
9,android docs,4.269231,1.185165
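
As a quick sanity check on the regexes in `parse_grading_info`, the sketch below runs the same three patterns on a made-up grader response written in the `## <Aspect>` / `* rating:` / `* justification:` layout that the grading instruction asks for. The sample strings are hypothetical and are not taken from the graded dump.

```python
import re

import pandas as pd


def parse_grading_info(grading_text: str) -> pd.Series:
    """Same patterns as the cell above: aspect heading, rating line, justification line."""
    category_match = re.search(r'## (.+?)\n', grading_text)
    rating_match = re.search(r'\* rating: (.+?)\n', grading_text)
    justification_match = re.search(r'\* justification: (.+?)(\n\*|\Z)', grading_text, re.S)
    return pd.Series({
        'category': category_match.group(1).strip() if category_match else None,
        'rating': rating_match.group(1).strip() if rating_match else None,
        'justification': justification_match.group(1).strip() if justification_match else None,
    })


# Hypothetical grader output, invented for illustration only.
sample = "## Relevance\n* rating: 5\n* justification: Covers the query topic directly."
parsed = parse_grading_info(sample)
print(parsed['category'], parsed['rating'])  # Relevance 5

# A response that ignores the requested layout yields None for every field.
# After pd.to_numeric(..., errors='coerce') such a rating becomes NaN and is
# skipped by the per-query mean and std.
malformed = parse_grading_info("I could not rate this document.")
print(malformed.isna().all())  # True
```

Note that the rating pattern requires a newline after the rating value, so a response that ends right after `* rating: 5` would also parse as missing.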


In [115]:
## Importing the o3-mini grades (stored below as df_o3_mini_woC)

import pandas as pd
import json
import blobfile as bf

# Use the local file path
file_path = "/tmp/dump_feather_results/graded_search_result_groups_dump_Rel_SamplewC_cb_wC.jsonl"

# Initialize an empty list to collect all the records
records = []

# Read the JSONL file line by line and parse each line as JSON
try:
    with open(file_path, 'r') as f:
        for i, line in enumerate(f):
            try:
                # Load each line as a JSON object
                record = json.loads(line)
                records.append(record)
            except json.JSONDecodeError as e:
                print(f"Error decoding JSON on line {i + 1}: {e}")
except FileNotFoundError:
    print(f"File not found: {file_path}")
except Exception as e:
    print(f"An error occurred while reading the file: {e}")

# Print the number of records loaded
print(f"Total records loaded: {len(records)}")

# Convert the list of records to a DataFrame
df = pd.DataFrame(records)


# Function to parse nested JSON strings
def parse_nested_json(json_obj):
    if isinstance(json_obj, str):
        try:
            return json.loads(json_obj)
        except (json.JSONDecodeError, TypeError):
            return json_obj
    return json_obj

# Apply parsing to the 'document' and 'query' columns if they are in string format
if 'document' in df.columns:
    df['document'] = df['document'].apply(parse_nested_json)
if 'query' in df.columns:
    df['query'] = df['query'].apply(parse_nested_json)

# Further flatten the nested dictionaries if needed
document_df = pd.json_normalize(df['document']) if 'document' in df.columns else pd.DataFrame()
query_df = pd.json_normalize(df['query']) if 'query' in df.columns else pd.DataFrame()

# Combine the DataFrame with the parsed 'document' and 'query' columns
if not document_df.empty:
    df = pd.concat([df.drop(columns=['document']), document_df], axis=1)
if not query_df.empty:
    df = pd.concat([df.drop(columns=['query']), query_df], axis=1)

# Print a header and keep the consolidated DataFrame under a model-specific name
print("Consolidated DataFrame structure:")
df_o3_mini_woC = df

import re
import pandas as pd

def parse_grading_info(grading_text):
    # Extract category
    category_match = re.search(r'## (.+?)\n', grading_text)
    category = category_match.group(1).strip() if category_match else None

    # Extract rating
    rating_match = re.search(r'\* rating: (.+?)\n', grading_text)
    rating = rating_match.group(1).strip() if rating_match else None

    # Extract justification
    justification_match = re.search(r'\* justification: (.+?)(\n\*|\Z)', grading_text, re.S)
    justification = justification_match.group(1).strip() if justification_match else None

    return pd.Series([category, rating, justification])

# Apply the parsing function to the 'grading' column
df_o3_mini_woC[['category', 'rating', 'justification']] = df_o3_mini_woC['grading'].apply(parse_grading_info)

df_o3_mini_woC['o3_mini_woC_rating'] = df_o3_mini_woC['rating']
df_o3_mini_woC['o3_mini_woC_justification'] = df_o3_mini_woC['justification']

df_o3_mini_woC['o3_mini_woC_rating'] = pd.to_numeric(df_o3_mini_woC['o3_mini_woC_rating'], errors='coerce')
df_o3_mini_woC_mean = df_o3_mini_woC.groupby('query')['o3_mini_woC_rating'].agg(mean_o3_mini_woC_rating='mean', std_dev_o3_mini_woC='std').reset_index()
df_o3_mini_woC_mean

Total records loaded: 423
Consolidated DataFrame structure:


Unnamed: 0,query,mean_o3_mini_woC_rating,std_dev_o3_mini_woC
0,Evan Gershkovich prisoner exchange,6.227273,1.109776
1,How did Yusuf Dikec win a silver medal in shooting at the 2024 Olympics?,5.5,1.825742
2,I am going to Big Sky this weekend. What should I look into doing that is really popular?,6.181818,0.732664
3,Is Amazon stock a good buy after recent sell-off?,3.5,1.465285
4,Summer movies,6.290323,1.006431
5,Things to do with mom in boston this weekend,3.047619,1.07127
6,What would be a reasonable cost estimate for API access to GPT 4o and Claude 3.5 Sonnet to create a few hundred outputs of 1 page length per month,2.857143,1.236354
7,Why are people criticizing Jonathan Owens?,4.277778,2.492472
8,amzn,6.75,0.518188
9,android docs,3.166667,0.56466
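
The load-and-flatten steps are repeated verbatim for each grader dump. One way to avoid the copy-paste is a small helper like the sketch below. The name `load_graded_jsonl`, the demo file, and the demo record are assumptions made for illustration; the record only mimics the nesting implied by the flattened column names (`document.url`, `context`, `query`).

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def parse_nested_json(value):
    """Decode a JSON string if possible; otherwise return the value unchanged."""
    if isinstance(value, str):
        try:
            return json.loads(value)
        except (json.JSONDecodeError, TypeError):
            return value
    return value


def load_graded_jsonl(path) -> pd.DataFrame:
    """Read a JSONL dump and flatten the nested 'document' and 'query' columns."""
    records = []
    with open(path, "r") as f:
        for line_no, line in enumerate(f, start=1):
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as e:
                print(f"Error decoding JSON on line {line_no}: {e}")
    df = pd.DataFrame(records)
    for col in ("document", "query"):
        if col in df.columns:
            flat = pd.json_normalize(df[col].apply(parse_nested_json).tolist())
            if not flat.empty:
                df = pd.concat([df.drop(columns=[col]), flat], axis=1)
    return df


# Demo on a throwaway file: one hypothetical record plus one corrupt line.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo.jsonl"
    row = {
        "grading": "## Relevance\n* rating: 5\n* justification: ok",
        "document": {"document": {"url": "https://example.com"}, "context": None},
        "query": {"query": "amzn"},
    }
    demo_path.write_text(json.dumps(row) + "\nnot valid json\n")
    demo_df = load_graded_jsonl(demo_path)

print(sorted(demo_df.columns))
```

With such a helper, each model's cell would reduce to one call plus the model-specific column renames.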


# **Importing the human eval data set**

In [116]:
# Import necessary libraries
import nest_asyncio
import asyncio
import pandas as pd
import numpy as np
import csv
import uuid
import blobfile as bf

from pydantic import BaseModel, Field
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential

from feather_api_client.client import FeatherAPIError, FeatherClient, FeatherTier
from feather_api_client.search_types import (
    FilterValue,
    Operator,
    SearchType,
    StringListFilter,
    TaskFilter,
)
from feather_api_client.types import (
    FeatherRegiment,
    FeatherRegimentSourceInfo,
    FeatherRegimentType,
    FeatherRegimentVersion,
    FeatherTask,
    FeatherTaskBatchQuality,
    FeatherTaskStatus,
    FeatherUser,
    SearchFeatherTaskBatchesParams,
    SearchFeatherTasksV2Request,
    SearchRegimentsParams,
    SearchRegimentsResponse,
)
from harmony_components.directload.condor_query import CondorQuery
from harmony_components.directload.snapshots import SnapshotNameType
from harmony_components.directload.types import DatasetSplit
from oaicommon import oai_itertools

# Apply nest_asyncio to allow nested asyncio calls in Jupyter Notebook
nest_asyncio.apply()

# Define the asynchronous function to process tasks
async def process_tasks():
    with bf.BlobFile('az://oaidatasets2/chatgpt/sonic/x/npancha/filt.csv', 'r') as f:
        r = list(csv.reader(f))
    qts = {k.strip().lower(): float(v) for _, k, v in r[1:]}
    c = FeatherClient("regiment_client", instance=FeatherTier("prod"))
    try:
        tb = await c.get_task_batch(uuid.UUID('685f9d0e-1b03-442e-b09d-b3339e2a4340'))
    except Exception as e:
        print(f"Exception occurred while fetching task batch: {e}")
        return
    if tb is None:
        print("Error: Task batch not found or is empty.")
        return
    if not tb.task_ids:
        print("Error: No task IDs found in the batch.")
        return

    async def debug_get_task(task_id):
        task = await c.get_task(task_id)
        return task

    tasks = await asyncio.gather(*map(debug_get_task, tb.task_ids))
    rtgs = [
        {
            'a': v['scorecard-a'],
            'b': v['scorecard-b'],
            's': qts[t.metadata['query'].strip().lower()],
            'query_text': t.metadata['query']  # Add query text here
        }
        for t in tasks
        if len(v := next(iter(t.form_content.values()))['search-completion-rating']) == 2
    ]

    df_human_eval = pd.DataFrame.from_records(rtgs)
    return df_human_eval

    '''
    print('Found', len(rtgs), 'ratings')
    print('Found', sum(t.reviews is not None for t in tasks), 'reviews')

    print('Below is the analysis for threshold 0.457')
    df = pd.DataFrame.from_records(rtgs).query('s >= 0.457')
    print('Rating distribution')
    print(pd.DataFrame(df['a'].value_counts()).join(pd.DataFrame(df['b'].value_counts()), lsuffix='_a', rsuffix='_b'))
    print('Rating a - rating b distribution')
    print((df.a - df.b).apply(lambda x: np.clip(x, -2, 2)).value_counts().sort_index())
    print(((df.a - df.b) < 0).mean())
    print(((df.a - df.b) > 0).mean())
    print(df[(df['a']-df['b'])==-2]['query_text'])

    print('Below is the analysis for threshold 0.47')
    df = pd.DataFrame.from_records(rtgs).query('s >= 0.47')
    print('Rating distribution')
    print(pd.DataFrame(df['a'].value_counts()).join(pd.DataFrame(df['b'].value_counts()), lsuffix='_a', rsuffix='_b'))
    print('Rating a - rating b distribution')
    print((df.a - df.b).apply(lambda x: np.clip(x, -2, 2)).value_counts().sort_index())
    print(((df.a - df.b) < 0).mean())
    print(((df.a - df.b) > 0).mean())
    print(df[(df['a']-df['b'])==-2]['query_text'])

    return df

'''


# Run the async function in Jupyter Notebook
df_human_eval = await process_tasks()


In [117]:
df_human_eval['he_oai_index'] = df_human_eval['a']
df_human_eval['he_serp'] = df_human_eval['b']
df_human_eval['query'] = df_human_eval['query_text']
def rescale_1_to_5_to_1_to_7(rating):
    return ((rating - 1) / (5 - 1)) * (7 - 1) + 1

# Apply the rescaling function to both human-eval rating columns
df_human_eval['he_oai_index'] = df_human_eval['he_oai_index'].apply(rescale_1_to_5_to_1_to_7)
df_human_eval['he_serp'] = df_human_eval['he_serp'].apply(rescale_1_to_5_to_1_to_7)
df_human_eval.dtypes

a                 int64
b                 int64
s               float64
query_text       object
he_oai_index    float64
he_serp         float64
query            object
dtype: object
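
For reference, the linear rescaling maps the five human rating levels onto the 1-to-7 scale used by the auto-grader as follows. This is just the function above evaluated at each integer level.

```python
def rescale_1_to_5_to_1_to_7(rating: float) -> float:
    # Shift to a 0-based scale, stretch from a width of 4 to a width of 6, shift back.
    return ((rating - 1) / (5 - 1)) * (7 - 1) + 1


mapping = {r: rescale_1_to_5_to_1_to_7(r) for r in range(1, 6)}
print(mapping)  # {1: 1.0, 2: 2.5, 3: 4.0, 4: 5.5, 5: 7.0}
```

So a rescaled human score can only take the values 1.0, 2.5, 4.0, 5.5, and 7.0, while the per-query auto-grader means are continuous.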

In [119]:
## Join all the dataframes

df_final = df_human_eval[['query','he_oai_index','he_serp']].merge(df_4o_mini_woC_mean,on='query').merge(df_4o_mini_mean,on='query').merge(df_o3_mini_woC_mean,on='query')
df_final.sort_values('he_oai_index')

Unnamed: 0,query,he_oai_index,he_serp,mean_4o_mini_woc_rating,std_dev_4o_mini_woc,mean_4o_mini_wc_rating,std_dev_4o_mini_wc,mean_o3_mini_woC_rating,std_dev_o3_mini_woC
18,closest electric car charging stations,4.0,5.5,1.9375,0.997914,2.1875,1.470544,2.625,1.454877
16,boutique pet stores,4.0,5.5,1.65,0.587143,1.5,0.512989,2.210526,0.630604
2,Is Amazon stock a good buy after recent sell-off?,4.0,4.0,4.222222,1.926764,5.111111,1.778595,3.5,1.465285
15,upcoming Planet of the Apes movie,4.0,5.5,4.714286,2.163636,5.714286,0.82542,5.5,0.797724
6,"what is the latest update about US election, is trump quit?",4.0,5.5,3.73913,0.864312,4.173913,0.886883,1.772727,0.528413
11,who were some notable angel investors in wish,4.0,4.0,3.0,1.383128,3.5,1.215838,1.416667,0.717282
14,I am going to Big Sky this weekend. What should I look into doing that is really popular?,5.5,7.0,4.681818,1.886957,6.136364,0.940894,6.181818,0.732664
12,What would be a reasonable cost estimate for API access to GPT 4o and Claude 3.5 Sonnet to create a few hundred outputs of 1 page length per month,5.5,4.0,4.714286,1.270545,4.904762,1.445848,2.857143,1.236354
17,How did Yusuf Dikec win a silver medal in shooting at the 2024 Olympics?,5.5,5.5,5.058824,1.951621,5.588235,0.939336,5.5,1.825742
0,Things to do with mom in boston this weekend,5.5,5.5,2.681818,0.716231,3.272727,0.827032,3.047619,1.07127
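
Because `merge` defaults to an inner join on exact string equality, any query whose text differs between the human-eval frame and a grader frame (casing, whitespace) is dropped silently. A minimal sketch of how one might check for that, using toy frames with hypothetical values rather than the real data:

```python
import pandas as pd

# Toy stand-ins for df_human_eval and one per-model summary frame.
human = pd.DataFrame({
    'query': ['amzn', 'android docs', 'summer movies'],
    'he_oai_index': [4.0, 5.5, 7.0],
})
auto = pd.DataFrame({
    'query': ['amzn', 'android docs', 'Summer movies'],
    'mean_rating': [5.1, 4.3, 5.6],
})

# A left join with indicator=True keeps every human-eval row and records
# in the '_merge' column whether a match was found on the right side.
checked = human.merge(auto, on='query', how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'query'].tolist()
print(unmatched)  # ['summer movies']
```

If the unmatched list is non-empty, normalizing both key columns (for example with `.str.strip().str.lower()`) before merging is one option.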


In [127]:
## Within-query variance relative to the mean (variance-to-mean ratio, averaged over queries) for each model type

print('4o_mini_woc=', np.mean(df_final['std_dev_4o_mini_woc']**2/df_final['mean_4o_mini_woc_rating']),
                              '4o_mini_wc=', np.mean(df_final['std_dev_4o_mini_wc']**2/df_final['mean_4o_mini_wc_rating']),
                              'o3_mini_woC=',np.mean(df_final['std_dev_o3_mini_woC']**2/df_final['mean_o3_mini_woC_rating']))


4o_mini_woc= 0.7393381353513392 4o_mini_wc= 0.36299853881638827 o3_mini_woC= 0.38639004200087135


In [129]:
## Between-query variance relative to the mean, i.e. variance of the per-query means divided by their mean

print('4o_mini_woc=', np.std(df_final['mean_4o_mini_woc_rating'])**2/np.mean(df_final['mean_4o_mini_woc_rating']),
                              '4o_mini_wc=', np.std(df_final['mean_4o_mini_wc_rating'])**2/np.mean(df_final['mean_4o_mini_wc_rating']),
                              'o3_mini_woC=',np.std(df_final['mean_o3_mini_woC_rating'])**2/np.mean(df_final['mean_o3_mini_woC_rating']))

4o_mini_woc= 0.281034183853949 4o_mini_wc= 0.3577398861613026 o3_mini_woC= 0.6791445185123794
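
The two ratios can be reproduced on a toy example to make the definitions concrete. The ratings below are invented. One detail worth keeping in mind: pandas `.std()` uses the sample form (`ddof=1`) by default, whereas `np.std` uses the population form (`ddof=0`) by default, so the within and between figures above appear to be computed with different conventions.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings: three graded documents for each of two queries.
ratings = pd.DataFrame({
    'query': ['q1'] * 3 + ['q2'] * 3,
    'rating': [2.0, 4.0, 6.0, 5.0, 5.0, 8.0],
})
per_query = ratings.groupby('query')['rating'].agg(mean='mean', std='std').reset_index()

means = per_query['mean'].to_numpy()
stds = per_query['std'].to_numpy()

# Within: per-query variance-to-mean ratio, averaged over queries.
within = np.mean(stds ** 2 / means)                         # approx. 0.75

# Between: variance of the per-query means over their mean, in both conventions.
between_population = np.std(means) ** 2 / np.mean(means)         # ddof=0, approx. 0.2
between_sample = np.std(means, ddof=1) ** 2 / np.mean(means)     # ddof=1, approx. 0.4

print(within, between_population, between_sample)
```

With only a handful of queries the choice of `ddof` changes the between-query figure noticeably, so it may be worth fixing one convention for both ratios.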


In [139]:
query_string = 'what is the latest update about US election, is trump quit?'

filtered_df = df_o3_mini_woC[['query', 'create_timestamp','o3_mini_woC_rating', 'o3_mini_woC_justification', 'document.url', 'document.title', 'document.pub_date']].query('query == @query_string')

filtered_df


Unnamed: 0,query,create_timestamp,o3_mini_woC_rating,o3_mini_woC_justification,document.url,document.title,document.pub_date
66,"what is the latest update about US election, is trump quit?",2025-01-07T18:33:54.254126-05:00,2.0,"The article discusses Biden dropping out in 2024, not the latest 2025 US election update or Trump quitting specifically.",https://www.fox5atlanta.com/news/donald-trump-responds-biden-dropping-out-race,"Donald Trump, GOP respond to Biden dropping out of race | FOX 5 Atlanta",
67,"what is the latest update about US election, is trump quit?",2025-01-07T18:33:54.254126-05:00,1.0,"The document discusses GOP reaction to Biden dropping out, not latest US election update or Trump quitting, so it's not relevant.",https://www.cbsnews.com/news/republican-lawmakers-react-biden-dropping-out-presidential-race/,"Trump, JD Vance, Republican lawmakers react to Biden's decision to drop out of presidential race - CBS News",
68,"what is the latest update about US election, is trump quit?",2025-01-07T18:33:54.254126-05:00,2.0,"The article discusses Biden dropping out and Trump's reaction, not the latest US election update or Trump quitting the race.",https://www.cbsnews.com/news/republican-lawmakers-react-biden-dropping-out-presidential-race/,"Trump, JD Vance, Republican lawmakers react to Biden's decision to drop out of presidential race - CBS News",
69,"what is the latest update about US election, is trump quit?",2025-01-07T18:33:54.254126-05:00,2.0,"The article doesn’t address the user’s query about Trump quitting and latest election update, focusing instead on Biden dropping out.",https://www.cbsnews.com/news/republican-lawmakers-react-biden-dropping-out-presidential-race/,"Trump, JD Vance, Republican lawmakers react to Biden's decision to drop out of presidential race - CBS News",
70,"what is the latest update about US election, is trump quit?",2025-01-07T18:33:54.254126-05:00,2.0,"The document discusses Biden dropping out and GOP reactions, not the latest US election update about Trump quitting, so it's largely off‑topic.",https://www.cbsnews.com/news/republican-lawmakers-react-biden-dropping-out-presidential-race/,"Trump, JD Vance, Republican lawmakers react to Biden's decision to drop out of presidential race - CBS News",
71,"what is the latest update about US election, is trump quit?",2025-01-07T18:33:54.254126-05:00,2.0,"The document discusses Biden dropping out and GOP reactions, not latest US election update or whether Trump quit, so it's off topic.",https://www.cbsnews.com/news/republican-lawmakers-react-biden-dropping-out-presidential-race/,"Trump, JD Vance, Republican lawmakers react to Biden's decision to drop out of presidential race - CBS News",
72,"what is the latest update about US election, is trump quit?",2025-01-07T18:33:54.254126-05:00,2.0,"The document discusses Biden dropping out and GOP reactions, not the latest US election update or whether Trump quit, so it's largely irrelevant.",https://www.cbsnews.com/news/republican-lawmakers-react-biden-dropping-out-presidential-race/?intcid=CNI-00-10aaa3a,"Trump, JD Vance, Republican lawmakers react to Biden's decision to drop out of presidential race - CBS News",
73,"what is the latest update about US election, is trump quit?",2025-01-07T18:33:54.254126-05:00,,,https://www.cbsnews.com/news/republican-lawmakers-react-biden-dropping-out-presidential-race/?intcid=CNI-00-10aaa3a,"Trump, JD Vance, Republican lawmakers react to Biden's decision to drop out of presidential race - CBS News",
74,"what is the latest update about US election, is trump quit?",2025-01-07T18:33:54.254126-05:00,2.0,"The article discusses reactions to Biden dropping out, not the latest US election update about Trump quitting, so it's largely off topic.",https://www.cbsnews.com/news/republican-lawmakers-react-biden-dropping-out-presidential-race/?intcid=CNI-00-10aaa3a,"Trump, JD Vance, Republican lawmakers react to Biden's decision to drop out of presidential race - CBS News",
75,"what is the latest update about US election, is trump quit?",2025-01-07T18:33:54.254126-05:00,2.0,"The document discusses Biden dropping out and GOP reactions, not the latest US election update about Trump quitting, so it's off-topic.",https://www.cbsnews.com/news/republican-lawmakers-react-biden-dropping-out-presidential-race/?ftag=MSF0951a18,"Trump, JD Vance, Republican lawmakers react to Biden's decision to drop out of presidential race - CBS News",


In [136]:
df_o3_mini_woC.columns

Index(['grading', 'context', 'document.url', 'document.title',
       'document.content', 'document.pub_date', 'query', 'create_timestamp',
       'orig_query', 'user_metadata.time_zone', 'user_metadata.user_locale',
       'user_metadata.cf_connecting_ip', 'user_metadata.user_country',
       'user_metadata.user_region', 'user_metadata.user_region_code',
       'user_metadata.ip_city', 'user_metadata.latitude',
       'user_metadata.longitude', 'user_metadata.locationAccuracy',
       'user_metadata.is_precise_location', 'user_metadata.plan_type',
       'category', 'rating', 'justification', 'o3_mini_woC_rating',
       'o3_mini_woC_justification'],
      dtype='object')

In [143]:
query_string = 'what is the latest update about US election, is trump quit?'

filtered_df = df_4o_mini_woC[['query', 'create_timestamp','4o_mini_woc_rating', '4o_mini_woc_justification', 'document.url', 'document.title', 'document.pub_date']].query('query == @query_string')

filtered_df

Unnamed: 0,query,create_timestamp,4o_mini_woc_rating,4o_mini_woc_justification,document.url,document.title,document.pub_date
66,"what is the latest update about US election, is trump quit?",2025-01-07T18:33:54.254126-05:00,4,The document discusses Biden's withdrawal from the race but does not address Trump's status directly.,https://www.fox5atlanta.com/news/donald-trump-responds-biden-dropping-out-race,"Donald Trump, GOP respond to Biden dropping out of race | FOX 5 Atlanta",
67,"what is the latest update about US election, is trump quit?",2025-01-07T18:33:54.254126-05:00,3,The document discusses Biden's withdrawal but does not address Trump's status or the user's query directly.,https://www.cbsnews.com/news/republican-lawmakers-react-biden-dropping-out-presidential-race/,"Trump, JD Vance, Republican lawmakers react to Biden's decision to drop out of presidential race - CBS News",
68,"what is the latest update about US election, is trump quit?",2025-01-07T18:33:54.254126-05:00,5,"The document discusses Trump's reaction to Biden dropping out, partially addressing the query about updates on the US election and Trump's status.",https://www.cbsnews.com/news/republican-lawmakers-react-biden-dropping-out-presidential-race/,"Trump, JD Vance, Republican lawmakers react to Biden's decision to drop out of presidential race - CBS News",
69,"what is the latest update about US election, is trump quit?",2025-01-07T18:33:54.254126-05:00,4,The document discusses Biden's withdrawal but lacks direct information on Trump's status or the latest election updates.,https://www.cbsnews.com/news/republican-lawmakers-react-biden-dropping-out-presidential-race/,"Trump, JD Vance, Republican lawmakers react to Biden's decision to drop out of presidential race - CBS News",
70,"what is the latest update about US election, is trump quit?",2025-01-07T18:33:54.254126-05:00,3,The document discusses Biden's withdrawal but does not address Trump's status or the user's query directly.,https://www.cbsnews.com/news/republican-lawmakers-react-biden-dropping-out-presidential-race/,"Trump, JD Vance, Republican lawmakers react to Biden's decision to drop out of presidential race - CBS News",
71,"what is the latest update about US election, is trump quit?",2025-01-07T18:33:54.254126-05:00,3,The document discusses Biden's withdrawal but does not address Trump's status or the user's query directly.,https://www.cbsnews.com/news/republican-lawmakers-react-biden-dropping-out-presidential-race/,"Trump, JD Vance, Republican lawmakers react to Biden's decision to drop out of presidential race - CBS News",
72,"what is the latest update about US election, is trump quit?",2025-01-07T18:33:54.254126-05:00,4,The document discusses Biden's withdrawal but does not address Trump's status or the latest election updates directly.,https://www.cbsnews.com/news/republican-lawmakers-react-biden-dropping-out-presidential-race/?intcid=CNI-00-10aaa3a,"Trump, JD Vance, Republican lawmakers react to Biden's decision to drop out of presidential race - CBS News",
73,"what is the latest update about US election, is trump quit?",2025-01-07T18:33:54.254126-05:00,3,The document discusses Biden's withdrawal but does not address Trump's status or the latest election updates.,https://www.cbsnews.com/news/republican-lawmakers-react-biden-dropping-out-presidential-race/?intcid=CNI-00-10aaa3a,"Trump, JD Vance, Republican lawmakers react to Biden's decision to drop out of presidential race - CBS News",
74,"what is the latest update about US election, is trump quit?",2025-01-07T18:33:54.254126-05:00,4,The document discusses Biden's withdrawal but lacks direct information on Trump's status or latest updates.,https://www.cbsnews.com/news/republican-lawmakers-react-biden-dropping-out-presidential-race/?intcid=CNI-00-10aaa3a,"Trump, JD Vance, Republican lawmakers react to Biden's decision to drop out of presidential race - CBS News",
75,"what is the latest update about US election, is trump quit?",2025-01-07T18:33:54.254126-05:00,4,The document discusses Biden's withdrawal but lacks direct information on Trump's status or latest updates.,https://www.cbsnews.com/news/republican-lawmakers-react-biden-dropping-out-presidential-race/?ftag=MSF0951a18,"Trump, JD Vance, Republican lawmakers react to Biden's decision to drop out of presidential race - CBS News",
