# Gen AI Capstone: X Post Claim Analyzer

**Author(s):** Michael Chaves
**Date:** April 2025

## Introduction: The Challenge of Information on X

The rapid flow of information on platforms like X (formerly Twitter) presents a challenge: how can users quickly assess the potential reliability of factual claims within posts? While definitive "truth detection" by AI is complex and arguably impossible, Generative AI offers tools to *assist* users in this process.

This Capstone project, developed for the **5-Day Gen AI Intensive Course with Google**, demonstrates a prototype **Gen AI Assistant** designed to:
1.  **Extract** potentially verifiable factual claims from sample X posts using Structured Output and Few-Shot Prompting.
2.  **Attempt Verification** by querying a simulated knowledge base using Function Calling.
3.  **Summarize** the findings concisely, ensuring the summary is Grounded in the verification results.

**Disclaimer:** This tool is an educational prototype using simulated data and knowledge sources. It is **NOT** a definitive fact-checker or truth detector but rather demonstrates specific Gen AI capabilities.

**Gen AI Capabilities Showcased:** This project demonstrates the application of the following key capabilities:
* **Structured Output:** Extracting information in a predefined JSON format.
* **Few-Shot Prompting:** Guiding model behavior with examples for improved accuracy.
* **Function Calling:** Enabling the LLM to interact with simulated external tools/knowledge.
* **Document Understanding & Grounding:** Processing input text and ensuring generated summaries are based strictly on provided evidence.

## 1. Setup and Configuration

This section handles the initial setup, including importing necessary libraries, configuring the connection to the Generative AI model, and initializing the model instance.

### 1.1 Libraries

We import standard libraries like `os` and `json`, the `google-generativeai` library for interacting with the Gemini API, and `kaggle_secrets` for securely accessing the API key.

### 1.2 API Key Configuration

Access to the Gemini API requires an API key. **Important:** Store your API key securely using Kaggle Secrets (Add-ons -> Secrets). Add a secret named `GOOGLE_API_KEY` (or update the code if you used a different name) containing your key value.

The following code retrieves the key from Kaggle Secrets and configures the `google-generativeai` library. It includes error handling in case the secret is not found. **Never expose your API key directly in the notebook code.**
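
The lookup described above can be sketched as a small helper that can also run outside Kaggle. The environment-variable fallback and the helper name `load_api_key` are assumptions added for illustration; the notebook itself only reads from Kaggle Secrets.

```python
import os

def load_api_key(secret_name: str = "GOOGLE_API_KEY"):
    """Return the key from Kaggle Secrets, else from the environment, else None."""
    try:
        # kaggle_secrets is only importable inside a Kaggle notebook session.
        from kaggle_secrets import UserSecretsClient
        return UserSecretsClient().get_secret(secret_name)
    except Exception:
        # Assumption: outside Kaggle (or if the secret is missing), fall back
        # to an environment variable with the same name.
        return os.environ.get(secret_name)

api_key = load_api_key()
# Report only whether a key was found; never print the key itself.
print("API key found." if api_key else "API key missing; model calls will be skipped.")
```

Returning `None` instead of raising lets later cells check for a missing key and skip model calls, which mirrors how the setup cell handles the failure case.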

### 1.3 Model Initialization

We initialize the Gemini model used for all three tasks (claim extraction, function calling decisions, summarization). We start with `gemini-1.5-flash-latest` for its balance of speed and capability; the resulting `model` instance is reused in subsequent steps.

In [40]:
# --- 1. Setup ---
print("--- 1. Setup ---")

# Install necessary libraries (only needed once per session)
# !pip install -q google-generativeai

import google.generativeai as genai
import os
import json # For handling JSON data
# import pandas as pd # Optional: if you prefer DataFrames for handling data

# --- Configure API Key ---
try:
    from kaggle_secrets import UserSecretsClient
    user_secrets = UserSecretsClient()
    GOOGLE_API_KEY = user_secrets.get_secret("GOOGLE_API_KEY")
    genai.configure(api_key=GOOGLE_API_KEY)
    print("Gemini API Key configured successfully.")
except Exception as e:
    print(f"Error configuring Gemini API Key: {e}")
    print("Please ensure 'GOOGLE_API_KEY' is set in Kaggle Secrets.")
    # Handle the error appropriately - maybe stop execution or use a dummy mode
    GOOGLE_API_KEY = None # Set to None if key retrieval fails

# --- Initialize the Generative Model ---
# Let's start with gemini-1.5-flash, which is fast and capable
# If you have access and need JSON mode enforcement, gemini-1.5-pro might be better
model_name = "gemini-1.5-flash-latest"
# If using 1.5 Pro for guaranteed JSON:
# model_name = "gemini-1.5-pro-latest"
# generation_config_json = genai.types.GenerationConfig(response_mime_type="application/json")

if GOOGLE_API_KEY:
    model = genai.GenerativeModel(model_name)
    # If using 1.5 Pro for guaranteed JSON:
    # model = genai.GenerativeModel(model_name, generation_config=generation_config_json)
    print(f"Initialized model: {model_name}")
else:
    model = None
    print("Model not initialized due to missing API key.")

--- 1. Setup ---
Gemini API Key configured successfully.
Initialized model: gemini-1.5-flash-latest


## 2. Sample Data

Due to the offline requirements for Kaggle submissions and the complexities of live API access, this notebook operates on a small, predefined list of sample "X Posts". This static data allows us to demonstrate the workflow consistently.

The sample data includes a mix of posts containing:
* Clear factual claims (some potentially verifiable by our mini-KB).
* Opinions or subjective statements.
* Questions.

The code below defines and displays these sample posts.

In [41]:
# --- 2. Sample Data ---
print("\n--- 2. Sample Data ---")

# Using a list of strings for simplicity. You could use dicts or a DataFrame.
sample_tweets = [
    # 1. Factual, likely verifiable by simple KB
    "Just read that Python 3.10 was released on October 4, 2021. Time flies! #Python",
    # 2. Factual, less likely verifiable by simple KB
    "Amazing turnout at the local park cleanup event today! Over 50 volunteers showed up. #Community",
    # 3. Opinion
    "This new AI model is the best thing since sliced bread! So impressed. #AI #Tech",
    # 4. Question
    "Does anyone know if Kaggle was founded in 2010 or 2011? #Kaggle",
    # 5. Factual, but incorrect info (relative to a potential KB)
    "Paris is the capital of Spain, right? Planning my trip! #Travel",
    # 6. Factual, potentially verifiable
    "France's national day, Bastille Day, is celebrated on 14 July. #France #History" ,
    # 7. No clear factual claim
    "Feeling excited about the upcoming weekend! #TGIF"
]

print(f"Loaded {len(sample_tweets)} sample tweets.")
# Displaying first few tweets
for i, tweet in enumerate(sample_tweets[:3]):
    print(f"Tweet {i+1}: {tweet}")


--- 2. Sample Data ---
Loaded 7 sample tweets.
Tweet 1: Just read that Python 3.10 was released on October 4, 2021. Time flies! #Python
Tweet 2: Amazing turnout at the local park cleanup event today! Over 50 volunteers showed up. #Community
Tweet 3: This new AI model is the best thing since sliced bread! So impressed. #AI #Tech


## 3. Extracting Factual Claims

The first core step is to identify potentially verifiable factual claims within each post, separating them from opinions, questions, or general commentary. We instruct the LLM to return these claims in a structured format for easier processing in later steps.

### 3.1 Defining the Output Structure

For reliable processing, we require the LLM to output extracted claims in a specific JSON format. The desired structure includes a list of claims, where each claim has the `claim_text` (the verbatim claim) and suggested `keywords` (useful for potential searching).

```json
{
  "claims": [
    {
      "claim_text": "Specific verifiable fact stated in the post.",
      "keywords": ["entity1", "entity2", "event"]
    }
  ]
}
```
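
As a minimal sketch of how a response in this shape might be parsed and checked, the helper below strips an optional markdown fence, loads the JSON, and validates the two required keys. The name `parse_claims` and the decision to raise `ValueError` are illustrative assumptions, not the notebook's own implementation.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built programmatically to keep the example easy to embed

def parse_claims(raw_text: str) -> dict:
    """Strip an optional markdown fence, parse the JSON, and validate its shape."""
    text = raw_text.strip()
    # Remove a leading fence (with or without the "json" tag) and a trailing fence.
    text = re.sub(r"^" + FENCE + r"(?:json)?\s*", "", text)
    text = re.sub(r"\s*" + FENCE + r"$", "", text)
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad JSON

    if not isinstance(data, dict) or not isinstance(data.get("claims"), list):
        raise ValueError("expected an object with a 'claims' list")
    for item in data["claims"]:
        if not isinstance(item, dict) or not isinstance(item.get("claim_text"), str):
            raise ValueError("each claim needs a string 'claim_text'")
        keywords = item.get("keywords")
        if not isinstance(keywords, list) or not all(isinstance(k, str) for k in keywords):
            raise ValueError("each claim needs a 'keywords' list of strings")
    return data

sample = (
    FENCE + "json\n"
    '{"claims": [{"claim_text": "Python 3.10 was released on October 4, 2021", '
    '"keywords": ["Python 3.10", "release date"]}]}\n' + FENCE
)
parsed = parse_claims(sample)
print(parsed["claims"][0]["keywords"])
```

Anchored patterns remove only a fence at the very start or end of the text, so JSON content that happens to contain backticks is left alone.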

In [42]:
# --- 3. Extract Factual Claims ---
# Note: Ensure Step 1 (Setup) has been run successfully and 'model' is initialized.
# Note: Ensure Step 2 (Sample Data) has been run and 'sample_tweets' exists.
# Note: Ensure 'json' library is imported from Step 1.
print("\n--- 3. Extract Factual Claims ---")

# --- Claim Extraction Prompt (with Few-Shot Examples) ---
def create_extraction_prompt(tweet_text):
    # Define 2-3 clear examples
    example_1_input = "Just read that Python 3.10 was released on October 4, 2021. Time flies! #Python"
    example_1_output = """
{
  "claims": [
    {
      "claim_text": "Python 3.10 was released on October 4, 2021",
      "keywords": ["Python 3.10", "release date", "October 4 2021"]
    }
  ]
}
"""

    example_2_input = "This new AI model is the best thing since sliced bread! So impressed. #AI #Tech"
    example_2_output = """
{
  "claims": []
}
"""

    example_3_input = "France's national day, Bastille Day, is celebrated on 14 July. #France"
    example_3_output = """
{
  "claims": [
    {
      "claim_text": "France's national day, Bastille Day, is celebrated on 14 July",
      "keywords": ["France", "Bastille Day", "14 July"]
    }
  ]
}
"""

    # Construct the prompt with clear instructions and examples
    prompt = f"""
Analyze the following post carefully. Your task is to identify sentences or phrases that state objective, verifiable factual claims. Ignore opinions, questions, subjective statements, predictions, and general commentary.

Format your output strictly as a JSON object containing a single key "claims". The value associated with "claims" must be a list of JSON objects. Each object in the list represents one factual claim found and must have two keys:
1.  "claim_text": The exact verbatim factual claim extracted from the post.
2.  "keywords": A list of 1-3 key entities, concepts, or terms from the claim text that would be useful for searching or verification.

If no verifiable factual claims are found in the post, return a JSON object with an empty list: {{"claims": []}}

Ensure your entire output is only the valid JSON object, starting with {{ and ending with }}.

Here are some examples:

Post:
\"\"\"
{example_1_input}
\"\"\"
JSON Output:
```json
{example_1_output.strip()}
```

Post:
\"\"\"
{example_2_input}
\"\"\"
JSON Output:
```json
{example_2_output.strip()}
```

Post:
\"\"\"
{example_3_input}
\"\"\"
JSON Output:
```json
{example_3_output.strip()}
```

Now, analyze the following post:

Post:
\"\"\"
{tweet_text}
\"\"\"

JSON Output:
"""
    # Consider removing the final ```json ... ``` markers if using guaranteed JSON mode
    # (e.g., with Gemini 1.5 Pro) as it might expect pure JSON. Test what works best.
    return prompt

# --- Claim Extraction Function ---
# (Includes error handling and validation)
def extract_claims(tweet_text, model):
    if not model:
        print("Model not available.")
        return {"claims": [], "error": "Model not initialized"} # Return error structure

    prompt = create_extraction_prompt(tweet_text)
    try:
        # Generate content using the model
        # Adjust generation config if needed (e.g., temperature, JSON mode)
        response = model.generate_content(prompt)

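        # Attempt to parse the response as JSON
        # Remove an optional markdown fence first. str.lstrip/rstrip strip *character sets*,
        # not prefixes, so removeprefix/removesuffix (Python 3.9+) are used instead.
        cleaned_text = response.text.strip()
        cleaned_text = cleaned_text.removeprefix('```json').removeprefix('```')
        cleaned_text = cleaned_text.removesuffix('```').strip()
        extracted_data = json.loads(cleaned_text)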

        # Basic validation of the parsed structure
        if "claims" not in extracted_data or not isinstance(extracted_data["claims"], list):
             print(f"Warning: Unexpected JSON structure from LLM for post: {tweet_text[:50]}...")
             return {"claims": [], "error": "Invalid JSON structure received"}

        # Further validate structure of items within 'claims' list if needed
        for item in extracted_data["claims"]:
             if not isinstance(item, dict) or "claim_text" not in item or "keywords" not in item:
                  print(f"Warning: Invalid item structure within claims for post: {tweet_text[:50]}...")
                  return {"claims": [], "error": "Invalid item structure in claims list"}
             if not isinstance(item["keywords"], list):
                   print(f"Warning: Keywords not a list for post: {tweet_text[:50]}...")
                   return {"claims": [], "error": "Keywords field is not a list"}

        return extracted_data

    except json.JSONDecodeError as e:
        print(f"Error decoding JSON from LLM response for post: {tweet_text[:50]}...")
        print(f"LLM Raw Text: '{response.text}'") # Print raw text for debugging
        return {"claims": [], "error": f"JSON Decode Error: {e}"}
    except Exception as e:
        # Catch other potential errors during API call or processing
        print(f"An error occurred during claim extraction for post: {tweet_text[:50]}...")
        print(f"Error: {e}")
        # Attempt to access response parts or feedback for more details if available
        try:
             # Check candidate feedback first if available
             if hasattr(response, 'prompt_feedback') and response.prompt_feedback:
                 print(f"LLM Response Feedback: {response.prompt_feedback}")
             # Check parts if feedback isn't informative or error happened differently
             elif hasattr(response, 'parts'):
                 print(f"LLM Response Parts: {response.parts}")
        except Exception:
             pass # Ignore errors during error reporting
        return {"claims": [], "error": f"General Error: {e}"}


# --- Process Sample Tweets ---
# Make sure 'sample_tweets' list exists from Step 2
if 'sample_tweets' not in globals():
     print("Error: sample_tweets list not found. Please run Step 2 first.")
elif 'model' not in globals() or model is None:
     print("Error: 'model' is not initialized. Please run Step 1 successfully.")
else:
    # Initialize/reset the results list before processing
    extracted_claims_results = []
    print("\nProcessing posts to extract claims...")
    for tweet in sample_tweets:
        result = extract_claims(tweet, model) # Assumes 'model' is initialized
        extracted_claims_results.append({"original_tweet": tweet, "extraction_result": result})

    print("Finished extracting claims.")

    # --- Display Extraction Results ---
    print("\n--- Extracted Claims Results ---")
    if 'extracted_claims_results' not in globals() or not extracted_claims_results:
         print("Claim extraction did not run or produced no results variable.")
    else:
        for item in extracted_claims_results:
            print(f"Original Post: {item['original_tweet']}")
            result = item['extraction_result']
            # Check for error key *and* ensure it has a value
            if "error" in result and result['error']:
                print(f"  Error: {result['error']}")
            # Check for claims key *and* ensure it's not empty
            elif not result.get('claims'): # Use .get for safer access, catches missing key or empty list/None
                print("  No factual claims identified.")
            else:
                claims_list = result.get('claims', []) # Get claims list safely
                if not claims_list: # Double check if list is empty after .get
                     print("  No factual claims identified.")
                else:
                    for i, claim_info in enumerate(claims_list):
                        # Use .get for claim_text and keywords for robustness
                        print(f"  Claim {i+1}: {claim_info.get('claim_text', 'N/A')}")
                        print(f"    Keywords: {claim_info.get('keywords', 'N/A')}")
            print("-" * 20)


--- 3. Extract Factual Claims ---

Processing posts to extract claims...
Finished extracting claims.

--- Extracted Claims Results ---
Original Post: Just read that Python 3.10 was released on October 4, 2021. Time flies! #Python
  Claim 1: Python 3.10 was released on October 4, 2021
    Keywords: ['Python 3.10', 'release date', 'October 4, 2021']
--------------------
Original Post: Amazing turnout at the local park cleanup event today! Over 50 volunteers showed up. #Community
  Claim 1: Over 50 volunteers showed up
    Keywords: ['volunteers', 'park cleanup', 'number']
--------------------
Original Post: This new AI model is the best thing since sliced bread! So impressed. #AI #Tech
  No factual claims identified.
--------------------
Original Post: Does anyone know if Kaggle was founded in 2010 or 2011? #Kaggle
  No factual claims identified.
--------------------
Original Post: Paris is the capital of Spain, right? Planning my trip! #Travel
  No factual claims identified.
----------

## 4. Simulating Evidence Search via Function Calling

Once factual claims are extracted, we attempt to verify them. In a real-world scenario, this might involve searching the web or specific databases. For this offline Capstone project, we **simulate** this process using **Function Calling**.

We define a simple knowledge base and a Python function to search it. We then make the LLM aware of this function (as a "tool") and instruct it to call the function *only* when appropriate to verify a claim based on its keywords.

### 4.1 Simulated Knowledge Base (KB)

A small Python dictionary (`mini_kb`) acts as our "trusted source". It contains a few predefined facts relevant to our sample posts. This allows us to demonstrate the *mechanism* of function calling and verification, even if the knowledge is limited.

In [43]:
# --- 4. Simulate Evidence Search (Function Calling) ---
print("\n--- 4. Simulate Evidence Search (Function Calling) ---")

# --- 4.1. Define Simulated Knowledge Base ---
# A simple dictionary acting as our 'reliable source' for demo purposes.
# Keys/values should relate to potential claims in your sample_tweets.
mini_kb = {
    "Python 3.10 release date": "October 4, 2021",
    "Kaggle founding year": 2010,
    "Capital of France": "Paris",
    "Bastille Day date": "14 July",
    # Add 1-2 more simple facts if relevant to your sample tweets
}
print("Mini Knowledge Base (KB) defined:")
print(mini_kb)


--- 4. Simulate Evidence Search (Function Calling) ---
Mini Knowledge Base (KB) defined:
{'Python 3.10 release date': 'October 4, 2021', 'Kaggle founding year': 2010, 'Capital of France': 'Paris', 'Bastille Day date': '14 July'}


### 4.2 KB Search Function (`search_knowledge_base`)

This standard Python function simulates querying the KB. It accepts `query_keywords` (provided by the LLM) and performs simple substring matching against the keys and values of `mini_kb`. It returns a string reporting one of three outcomes: one or more possible matches were found, a contradiction was detected (via simple hardcoded rules for this demo, checked only when nothing matched), or no relevant information was available in the KB.
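
To make the matching rule concrete, here is a self-contained sketch that applies the same two checks (combined query string, then any single keyword against the key) to a toy dictionary. `toy_kb` and `match_entries` are hypothetical names used only for this illustration.

```python
# Toy copy of a few KB entries so the example runs on its own.
toy_kb = {
    "Python 3.10 release date": "October 4, 2021",
    "Kaggle founding year": 2010,
    "Bastille Day date": "14 July",
}

def match_entries(query_keywords, kb):
    """Return KB keys hit by the combined query or by any single keyword."""
    query = " ".join(query_keywords).lower()
    hits = []
    for key, value in kb.items():
        key_l, value_l = key.lower(), str(value).lower()
        combined_hit = query in key_l or query in value_l
        keyword_hit = any(k.lower() in key_l for k in query_keywords)
        if combined_hit or keyword_hit:
            hits.append(key)
    return hits

print(match_entries(["Python 3.10", "release date"], toy_kb))  # ['Python 3.10 release date']
print(match_entries(["Kaggle", "2011"], toy_kb))               # ['Kaggle founding year']
print(match_entries(["volunteers", "park cleanup"], toy_kb))   # []
```

Note the second call: a hit only means the topic exists in the KB, not that the claimed value agrees with the stored one. A keyword such as "Kaggle" matches the key whatever year the claim states, so comparing the claimed value against the stored value would be a natural refinement.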

In [44]:
# --- 4.2. Define Python Function to Search KB ---
# This function will be callable by the LLM via Function Calling.
def search_knowledge_base(query_keywords: list) -> str:
    """
    Searches the mini_kb for information related to the query_keywords.
    Returns a string indicating what was found, contradicted, or not found.
    """
    print(f"  -> KB Search Function called with keywords: {query_keywords}")
    found_info = []
    if not query_keywords:
        return "No keywords provided for search."

    query_str = " ".join(query_keywords).lower() # Combine keywords for simple matching

    for key, value in mini_kb.items():
        key_lower = key.lower()
        value_str_lower = str(value).lower() # Convert value to string for searching

        # Simple check if combined keywords appear in key or value
        # More sophisticated matching could be done here (e.g., check individual keywords)
        if query_str in key_lower or query_str in value_str_lower:
            found_info.append(f"Possible match found in KB: '{key}: {value}'")
        # Check if a keyword matches a key directly
        elif any(keyword.lower() in key_lower for keyword in query_keywords):
             found_info.append(f"Possible match found in KB: '{key}: {value}'")

    if found_info:
        return " | ".join(found_info) # Return all potential matches
    else:
        # Check for potential contradictions (simple example)
        if "paris" in query_str and "spain" in query_str:
            return f"KB Contradiction: KB lists Capital of France as '{mini_kb.get('Capital of France', 'N/A')}'."
        elif "kaggle" in query_str and "2011" in query_str:
             return f"KB Contradiction: KB lists Kaggle founding year as {mini_kb.get('Kaggle founding year', 'N/A')}."
        else:
             return "No relevant information found in Mini KB for the given keywords."

### 4.3 Defining the Tool for the LLM

To enable **Function Calling**, we must describe our Python function (`search_knowledge_base`) to the LLM using a specific schema (OpenAPI-like). This `FunctionDeclaration` tells the LLM the tool's name, purpose, and the expected parameters (`query_keywords`).
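
Because the model fills in the tool arguments, it can help to check them against the declared schema before running the local function. The sketch below does this for the small schema subset used here (a required array of strings); `check_tool_args` is a hypothetical helper, not part of the `google-generativeai` library.

```python
# Plain-dict schema mirroring the parameter description used for the tool.
tool_schema = {
    "type": "object",
    "properties": {
        "query_keywords": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["query_keywords"],
}

def check_tool_args(args: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the args fit the schema subset handled here."""
    problems = []
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required parameter: {name}")
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unexpected parameter: {name}")
        elif spec["type"] == "array":
            if not isinstance(value, (list, tuple)):
                problems.append(f"{name} should be an array")
            elif spec.get("items", {}).get("type") == "string" and not all(
                isinstance(v, str) for v in value
            ):
                problems.append(f"{name} should contain only strings")
    return problems

print(check_tool_args({"query_keywords": ["Bastille Day", "14 July"]}, tool_schema))  # []
print(check_tool_args({}, tool_schema))
```

In practice the SDK may hand back list-like container objects rather than plain lists, so converting values with `list(...)` before checking is a reasonable precaution.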

In [45]:
# --- 4.3. Define the Function Declaration for the LLM Tool ---
# Describe the function so the LLM knows how and when to use it.
# Dictionary-based schema (OpenAPI style)

try:
    search_tool = genai.types.FunctionDeclaration(
        name="search_knowledge_base",
        description="Searches a small knowledge base for facts related to given keywords (like dates, places, events, entities) to help verify a claim. Use keywords extracted from the claim.",
        # Define parameters using a dictionary matching OpenAPI schema
        parameters={
            'type': 'object', # Use lowercase string "object" for the top level
            'properties': {
                'query_keywords': { # Define the parameter name
                    'type': 'array',  # Use lowercase string "array"
                    'description': "List of 1-3 key entities, concepts, or terms from the claim to search for.",
                    'items': {        # Describe the items within the array
                        'type': 'string' # Use lowercase string "string"
                    }
                }
            },
            'required': ["query_keywords"] # List required parameters
        }
    )
    print("Function Declaration 'search_tool' created successfully.")

except Exception as e:
    print(f"Error creating Function Declaration: {e}")
    # Handle error appropriately, maybe set search_tool to None
    search_tool = None

Function Declaration 'search_tool' created successfully.


### 4.4 Implementation: Calling the Function

The core logic involves:
1.  Re-initializing the Gemini model with the `tools` parameter set to our `search_tool` declaration.
2.  Iterating through each `claim` extracted earlier.
3.  Creating a new prompt asking the LLM to *decide* whether using the `search_knowledge_base` tool is appropriate for verifying the current claim, based on its text and keywords.
4.  Calling the LLM with this prompt and the tool definition.
5.  Checking the LLM's response:
    * If the response contains a `function_call` part for `search_knowledge_base`: We parse the arguments (keywords) provided by the LLM and execute our *local* Python `search_knowledge_base` function with them.
    * If the response contains only text: The LLM decided not to call the function (e.g., deemed it irrelevant), and we record its textual response.
6.  Storing the result of this verification attempt (either the output from our Python function or the LLM's textual refusal).
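
The dispatch in step 5 can be sketched offline with stand-in objects. Here `SimpleNamespace` instances imitate a response part with `function_call` and `text` attributes, and `fake_search` is a hypothetical placeholder for the local KB search; none of this calls the real API.

```python
from types import SimpleNamespace

def fake_search(query_keywords):
    """Hypothetical stand-in for the local KB search function."""
    return f"searched for {list(query_keywords)}"

# Registry mapping tool names the model may request to local Python callables.
LOCAL_TOOLS = {"search_knowledge_base": fake_search}

def handle_part(part):
    """Return (tool_was_called, result_text) for one response part."""
    call = getattr(part, "function_call", None)
    name = getattr(call, "name", "") if call else ""
    if name in LOCAL_TOOLS:
        # The model only *requests* the call; the function runs locally.
        return True, LOCAL_TOOLS[name](**dict(call.args))
    text = getattr(part, "text", "") or ""
    if text:
        return False, f"LLM Response (No Tool Call): {text}"
    return False, "Verification not attempted by LLM."

tool_part = SimpleNamespace(
    function_call=SimpleNamespace(name="search_knowledge_base",
                                  args={"query_keywords": ["Python 3.10"]}),
    text="",
)
text_part = SimpleNamespace(
    function_call=SimpleNamespace(name="", args={}),
    text="Cannot verify with available tools.",
)
print(handle_part(tool_part))
print(handle_part(text_part))
```

A name-to-callable registry keeps the dispatch logic unchanged if more tools are declared later.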

### 4.5 Verification Attempt Results

The code cell below performs this verification loop and displays the outcome for each claim.

In [46]:
# --- 4.4. Process Claims and Attempt Verification ---
verification_results = []
print("\nAttempting verification for extracted claims using Function Calling...")

if not model or search_tool is None:
    print("Skipping verification as the model or search tool is not initialized.")
else:
    # Re-initialize model with the tool (important!)
    # Note: If using flash, it might sometimes struggle with complex function calling. Pro might be more reliable.
    model_with_tool = genai.GenerativeModel(model_name, tools=[search_tool])
    print(f"Re-initialized model '{model_name}' with search tool.")

    for item in extracted_claims_results:
        original_tweet = item["original_tweet"]
        claims = item["extraction_result"].get("claims", [])
        tweet_verifications = []

        if not claims or "error" in item["extraction_result"]:
            # Skip if no claims were extracted or if there was an extraction error
            verification_results.append({
                "original_tweet": original_tweet,
                "verifications": tweet_verifications # Empty list
            })
            continue

        print(f"\nVerifying claims for tweet: {original_tweet[:60]}...")
        for claim_info in claims:
            claim_text = claim_info.get("claim_text", "N/A")
            keywords = claim_info.get("keywords", [])

            if claim_text == "N/A" or not keywords:
                 tweet_verifications.append({
                    "claim": claim_text,
                    "keywords": keywords,
                    "verification_attempted": False,
                    "raw_llm_response_part": None, # Same key as the success path below
                    "kb_search_result": "Skipped (missing claim or keywords)."
                 })
                 continue

            # Prompt for the LLM asking it to use the tool if appropriate
            verification_prompt = f"""
            Consider the following factual claim extracted from a social media post:
            Claim: "{claim_text}"
            Keywords: {keywords}

            Use the available 'search_knowledge_base' tool ONLY IF it seems likely to help verify this specific claim based on the keywords. If the tool is not relevant for verifying this claim, respond directly stating that verification wasn't possible with available tools. Do not make up information.
            """

            try:
                # Send prompt and tool definition to the model
                response = model_with_tool.generate_content(verification_prompt)
                response_part = response.candidates[0].content.parts[0]
                kb_search_result = "Verification not attempted by LLM." # Default

                # Check if the model decided to call the function
                if response_part.function_call.name == "search_knowledge_base":
                    function_call = response_part.function_call
                    args = function_call.args

                    # Execute the *local* Python function
                    function_response_text = search_knowledge_base(
                        query_keywords=args.get("query_keywords", [])
                    )
                    kb_search_result = function_response_text # Store the result

                elif response_part.text:
                    # If the LLM responded directly without calling the function
                     kb_search_result = f"LLM Response (No Tool Call): {response_part.text}"


                tweet_verifications.append({
                    "claim": claim_text,
                    "keywords": keywords,
                    "verification_attempted": response_part.function_call.name == "search_knowledge_base",
                    "raw_llm_response_part": str(response_part), # Store for debugging
                    "kb_search_result": kb_search_result
                 })

            except Exception as e:
                 print(f"  Error during verification for claim '{claim_text[:50]}...': {e}")
                 # Log response parts if possible
                 try:
                     print(f"  LLM Response Parts on Error: {response.candidates[0].content.parts}")
                 except Exception:
                     pass # Ignore errors during error reporting
                 tweet_verifications.append({
                    "claim": claim_text,
                    "keywords": keywords,
                    "verification_attempted": False,
                    "raw_llm_response_part": None, # Same key as the success path above
                    "kb_search_result": f"Error during verification: {e}"
                 })

        verification_results.append({
            "original_tweet": original_tweet,
            "verifications": tweet_verifications
        })

print("\nFinished attempting verification.")

# --- Display Verification Results ---
print("\n--- Verification Attempt Results ---")
for item in verification_results:
    print(f"Original Tweet: {item['original_tweet']}")
    if not item['verifications']:
        print("  No claims were processed for verification.")
    else:
        for i, verification in enumerate(item['verifications']):
            print(f"  Claim {i+1}: {verification['claim']}")
            print(f"    Keywords: {verification['keywords']}")
            print(f"    Verification Attempted: {verification['verification_attempted']}")
            print(f"    KB Search Result/LLM Response: {verification['kb_search_result']}")
            # print(f"    Raw LLM Part: {verification['raw_llm_response_part']}") # Optional: for debugging
    print("-" * 20)


Attempting verification for extracted claims using Function Calling...
Re-initialized model 'gemini-1.5-flash-latest' with search tool.

Verifying claims for tweet: Just read that Python 3.10 was released on October 4, 2021. ...
  -> KB Search Function called with keywords: ['Python 3.10', 'release date', 'October 4, 2021']

Verifying claims for tweet: Amazing turnout at the local park cleanup event today! Over ...

Verifying claims for tweet: France's national day, Bastille Day, is celebrated on 14 Jul...
  -> KB Search Function called with keywords: ['France', 'Bastille Day', 'July 14']

Finished attempting verification.

--- Verification Attempt Results ---
Original Tweet: Just read that Python 3.10 was released on October 4, 2021. Time flies! #Python
  Claim 1: Python 3.10 was released on October 4, 2021
    Keywords: ['Python 3.10', 'release date', 'October 4, 2021']
    Verification Attempted: True
    KB Search Result/LLM Response: Possible match found in KB: 'Python 3.10 relea

## 5. Summarizing Verification Findings

To present the results clearly, we use the LLM one last time to generate a concise summary sentence based on the outcome of the simulated verification attempt.

This step uses the LLM's **Document Understanding** to process the claim and the verification result, and relies on **Grounding**: the prompt restricts the summary to *only* the information returned by the simulated KB search, which reduces the risk of hallucination.

### 5.1 Prompt Design for Summarization

The prompt instructs the LLM to:
* Analyze the original claim and the text result from the KB search step (`kb_search_result`).
* Generate a single sentence stating whether the claim appears "confirmed", "contradicted", or "unverified" *by the available knowledge base*.
* Crucially, base this summary *strictly* on the provided `kb_search_result`.
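
Since the KB search returns strings with fixed prefixes, the expected label can also be derived without the LLM and used as a sanity check on the generated sentence. This is a hedged sketch: `label_from_kb_result` and `summary_agrees` are hypothetical helpers, and the prefixes are the ones produced by the KB search function in this notebook.

```python
def label_from_kb_result(kb_search_result: str) -> str:
    """Map a KB search result string to confirmed / contradicted / unverified."""
    text = kb_search_result.lower()
    if "kb contradiction" in text:
        return "contradicted"
    if "possible match found in kb" in text:
        return "confirmed"
    # Covers 'No relevant information found', 'Verification not attempted',
    # and plain LLM text responses.
    return "unverified"

def summary_agrees(summary: str, kb_search_result: str) -> bool:
    """Check that the generated summary mentions the label implied by the KB result."""
    return label_from_kb_result(kb_search_result) in summary.lower()

kb_result = "Possible match found in KB: 'Bastille Day date: 14 July'"
print(label_from_kb_result(kb_result))
print(summary_agrees("The claim is confirmed by the knowledge base.", kb_result))
```

A mismatch between the rule-based label and the LLM's sentence would flag a summary worth inspecting by hand. As in the prompt, "confirmed" here means only that a possible match was found.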

### 5.2 Implementation: `summarize_findings` Function

The `summarize_findings` Python function takes the claim text, the KB search result string, and the model, constructs the summary prompt, calls the LLM, and returns the generated summary sentence.
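
The same build-prompt, call-model, return-text flow can be exercised offline with a stub in place of the Gemini model. `StubModel` and the simplified `summarize` below are hypothetical stand-ins for illustration; the stub only mimics the `generate_content(prompt)` call returning an object with a `.text` attribute.

```python
from types import SimpleNamespace

class StubModel:
    """Offline stand-in exposing generate_content(prompt) -> object with .text."""
    def __init__(self, canned_text):
        self.canned_text = canned_text
        self.prompts = []  # Recorded so tests can inspect what was sent

    def generate_content(self, prompt):
        self.prompts.append(prompt)
        return SimpleNamespace(text=self.canned_text)

def summarize(claim_text, kb_search_result, model):
    """Simplified flow: build the prompt, call the model, return the stripped text."""
    if model is None:
        return "Summary skipped (model not available)."
    prompt = (
        "Based only on the KB Search Result, state in one sentence whether the claim "
        "is confirmed, contradicted, or unverified.\n"
        f'Claim: "{claim_text}"\n'
        f'KB Search Result: "{kb_search_result}"\n'
        "Concise Summary:"
    )
    return model.generate_content(prompt).text.strip()

stub = StubModel("  The claim is confirmed by the knowledge base.\n")
print(summarize("Bastille Day is celebrated on 14 July",
                "Possible match found in KB: 'Bastille Day date: 14 July'", stub))
```

Recording the prompts on the stub makes it easy to assert that both the claim and the KB result reached the model.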

In [47]:
# --- 5. Summarize Findings ---
print("\n--- 5. Summarize Findings ---")

def create_summary_prompt(claim_text, kb_search_result):
    prompt = f"""
    Based *only* on the provided 'KB Search Result' below, write a concise, single-sentence summary stating whether the claim appears confirmed, contradicted, or remains unverified by the available knowledge base.

    - If the KB result indicates confirmation or provides matching info, state it's 'confirmed by the knowledge base'.
    - If the KB result indicates a contradiction, state it's 'contradicted by the knowledge base'.
    - If the KB result indicates 'No relevant information found', 'Verification not attempted', or an LLM text response instead of a KB result, state it's 'unverified by the available knowledge base'.
    - Do not add any information not present in the KB Search Result.

    Claim: "{claim_text}"
    KB Search Result: "{kb_search_result}"

    Concise Summary:
    """
    return prompt

def summarize_findings(claim_text, kb_search_result, model):
    if not model:
        return "Summary skipped (model not available)."
    if "Error during verification" in kb_search_result: # Handle previous errors
         return "Summary skipped due to verification error."

    prompt = create_summary_prompt(claim_text, kb_search_result)
    try:
        response = model.generate_content(prompt)
        # Add basic check for safety ratings if needed
        # if response.prompt_feedback.block_reason:
        #    return f"Summary generation blocked: {response.prompt_feedback.block_reason}"
        return response.text.strip()
    except Exception as e:
        print(f"  Error during summary generation for claim '{claim_text[:50]}...': {e}")
        return f"Error generating summary: {e}"

print("Summarization function defined.")


--- 5. Summarize Findings ---
Summarization function defined.
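
Since `summarize_findings` only calls `model.generate_content(prompt)` and reads `.text` from the response, its control flow can be exercised without an API call. The sketch below assumes a hypothetical `StubModel` class (not part of the notebook or of the `google-generativeai` library) that returns a canned response and records the prompts it receives.

```python
from types import SimpleNamespace

class StubModel:
    """Hypothetical stand-in for the real model; makes no network call."""

    def __init__(self, canned_text):
        self.canned_text = canned_text
        self.prompts = []  # Prompts received, kept for inspection

    def generate_content(self, prompt):
        self.prompts.append(prompt)
        # Mimic only the one attribute that summarize_findings reads.
        return SimpleNamespace(text=self.canned_text)

stub = StubModel("  Confirmed by the knowledge base.\n")

# Runs only where the notebook's summarize_findings is already defined.
if "summarize_findings" in globals():
    print(summarize_findings(
        "Python 3.10 was released on October 4, 2021",
        "Possible match found in KB: 'Python 3.10 release date: October 4, 2021'",
        stub,
    ))
```

Inspecting `stub.prompts` afterwards shows the exact prompt text that would have been sent to the model.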


### 5.3 Final Results Display

The final code cell integrates all previous steps. It iterates through the verification results, calls the `summarize_findings` function for each claim, and presents a consolidated view: Original Post, Extracted Claim, KB Search Result, and Final Summary Sentence.

In [48]:
# --- Display Final Results (Combining Extraction, Verification, and Summary) ---
print("\n--- Final Processed Results ---")

final_results_display = [] # Store structured results

# Requires 'verification_results' from the end of step 4
if 'verification_results' not in globals():
    print("Error: verification_results not found. Please ensure step 4 ran correctly.")
else:
    for item in verification_results:
        original_tweet = item['original_tweet']
        verifications = item['verifications']
        print(f"Original Post: {original_tweet}")

        processed_verifications = []
        if not verifications:
            print("  No claims processed for verification.")
        else:
            for i, verification in enumerate(verifications):
                claim_text = verification['claim']
                kb_result = verification['kb_search_result']

                # Summarize this claim's verification result
                summary_sentence = summarize_findings(claim_text, kb_result, model)  # Using the base model is fine

                print(f"  Claim {i+1}: {claim_text}")
                # print(f"    Keywords: {verification['keywords']}") # Optional to keep
                # print(f"    Verification Attempted: {verification['verification_attempted']}") # Optional
                print(f"    KB Search Result: {kb_result}")
                print(f"    Summary: {summary_sentence}")

                # Store for potential later use
                processed_verifications.append({
                    "claim": claim_text,
                    "kb_search_result": kb_result,
                    "summary": summary_sentence
                })
        final_results_display.append({
            "original_post": original_tweet,
            "processed_claims": processed_verifications
        })
        print("-" * 30)  # Separator between posts


--- Final Processed Results ---
Original Post: Just read that Python 3.10 was released on October 4, 2021. Time flies! #Python
  Claim 1: Python 3.10 was released on October 4, 2021
    KB Search Result: Possible match found in KB: 'Python 3.10 release date: October 4, 2021'
    Summary: Confirmed by the knowledge base.
------------------------------
Original Post: Amazing turnout at the local park cleanup event today! Over 50 volunteers showed up. #Community
  Claim 1: Over 50 volunteers showed up
    KB Search Result: LLM Response (No Tool Call): The available tool `search_knowledge_base` is unlikely to be helpful in verifying the claim "Over 50 volunteers showed up" because the keywords provided do not contain a specific number or quantifiable information related to the number of volunteers.  The keyword 'number' is too generic.  Verification wasn't possible with available tools.

    Summary: Unverified by the available knowledge base.
------------------------------
Original Post:
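
The `final_results_display` list is stored "for potential later use". One possible use is saving it as JSON. The sketch below assumes a hypothetical helper `results_to_json` and a one-item sample with the same shape as `final_results_display`, filled with values from the first result printed above.

```python
import json

# Sample mirroring the structure of final_results_display (values taken from the output above).
sample_results = [{
    "original_post": "Just read that Python 3.10 was released on October 4, 2021. Time flies! #Python",
    "processed_claims": [{
        "claim": "Python 3.10 was released on October 4, 2021",
        "kb_search_result": "Possible match found in KB: 'Python 3.10 release date: October 4, 2021'",
        "summary": "Confirmed by the knowledge base.",
    }],
}]

def results_to_json(results):
    """Hypothetical helper: serialize the results list to a readable JSON string."""
    return json.dumps(results, indent=2, ensure_ascii=False)

serialized = results_to_json(sample_results)
restored = json.loads(serialized)  # Round-trips back to the same structure
```

In the notebook, `results_to_json(final_results_display)` would produce the same layout for all processed posts.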

## 6. Conclusion & Project Summary

### 6.1 Project Recap

This notebook demonstrated a prototype Gen AI assistant designed to analyze factual claims within sample X posts (formerly Tweets). The workflow involved:
1.  Extracting potential factual claims using an LLM guided by **Few-Shot Prompting** and **Structured Output** (JSON).
2.  Attempting to verify these claims against a *simulated* knowledge base using **Function Calling**.
3.  Generating a concise summary of the verification findings, emphasizing **Grounding** the summary based *only* on the simulated evidence.

This project serves as a proof-of-concept for how Gen AI tools can be combined to create assistants for navigating and analyzing information encountered on social media platforms.

### 6.2 Gen AI Capabilities Demonstrated

This Capstone Project applied **four (4)** key capabilities covered in the Gen AI Intensive Course:

1.  **Structured Output:** The LLM was instructed to return extracted claims in a specific JSON format, ensuring the output could be reliably parsed and used programmatically in subsequent steps.
2.  **Few-Shot Prompting:** Examples of desired input-output behavior were included directly in the prompt for claim extraction. This helped guide the LLM to better distinguish factual claims from opinions/questions and adhere to the required JSON structure.
3.  **Function Calling:** A Python function simulating a knowledge base search was defined as a "tool". The LLM was empowered to decide when to call this function with appropriate arguments (keywords) based on the claim text, demonstrating interaction with external (simulated) systems.
4.  **Document Understanding & Grounding:** The LLM processed the original post, extracted claims, and later generated summaries based on verification results. The summarization step specifically required the model to *ground* its output solely on the provided evidence string, which reduces the risk of hallucination or of introducing outside information.


### 6.3 Limitations

It is crucial to acknowledge the limitations of this prototype:

* **Static & Limited Data:** The analysis was performed on a small, predefined list of sample posts, not a live or comprehensive feed.
* **Simulated Verification:** The "knowledge base" was extremely limited and predefined. The evidence search was entirely simulated and did not query external databases or the internet.
* **Not a Truth Detector:** This system **does not determine truth**. It only checks claims against its tiny, simulated knowledge base. The results ("confirmed", "contradicted", "unverified") are strictly relative to this KB.
* **LLM Reliability:** The entire workflow relies on the LLM's ability to accurately follow complex instructions (parsing text, identifying facts, formatting JSON, deciding when to call functions, grounding summaries). LLMs can make errors, misunderstand context, or exhibit biases.
* **Basic Logic:** Claim extraction, keyword generation, and KB search logic are simplified for this demonstration.

### 6.4 Future Work

This prototype could be extended in several ways:

* **RAG Implementation:** Replace the simple KB dictionary with a proper **Retrieval-Augmented Generation** system, using **Embeddings** and **Vector Search** against a larger corpus of trusted documents.
* **Real APIs (with caution):** Carefully integrate calls to external APIs (e.g., web search, specific knowledge bases) via Function Calling, managing API keys and usage limits.
* **Improved Claim/Keyword Logic:** Enhance the sophistication of claim extraction and keyword generation.
* **Gen AI Evaluation:** Implement automated checks (**Gen AI Evaluation**) to assess the quality of extracted claims or summaries against predefined criteria or human labels.
* **User Interface:** Build a simple UI (e.g., Gradio/Streamlit) for easier interaction.
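
To make the retrieval idea behind the RAG item concrete, here is a toy sketch of similarity search. It is an illustration under loud assumptions: the `embed` function is a stand-in that uses word counts rather than a learned embedding model, the helper names are hypothetical, and the second document is filler added only for contrast.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding' (word counts); a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, documents, top_k=1):
    """Return the top_k (score, document) pairs, most similar first."""
    query_vec = embed(query)
    scored = sorted(((cosine(query_vec, embed(doc)), doc) for doc in documents),
                    reverse=True)
    return scored[:top_k]

documents = [
    "Python 3.10 release date: October 4, 2021",  # KB entry seen in the notebook output
    "The Eiffel Tower is located in Paris",       # Hypothetical filler entry
]
best_score, best_doc = retrieve("Python 3.10 was released on October 4, 2021", documents)[0]
```

In a fuller system the retrieved passage, rather than a dictionary lookup result, would be passed to the summarization prompt as the evidence string.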