# News Retrieval Pipeline for Forecasting

This notebook demonstrates a news retrieval pipeline that combines LLM agents with a Google News search to:
1. **Generate search keywords** optimized for Google News
2. **Retrieve relevant news** articles using the generated keywords
3. **Rate the relevance** of each news article (1-5 scale)
4. **Synthesize all relevant news** into a single cohesive summary

**Important**: Only news articles that meet the minimum relevance threshold are passed to the summarization agent. The summary synthesizes all relevant articles into a factual analysis of key factors affecting the forecasting question, without listing individual articles or adding editorial opinions.
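The four stages and the threshold filter can be sketched as plain control flow. This is a minimal sketch, not the forecast_kag implementation, whose internals are not shown in this notebook. The stage functions (generate_keywords, search_news, rate_article, summarize) are hypothetical stand-ins passed in as callables. The result keys mirror the ones this notebook reads from results, and the parameter names mirror pipeline_config.

```python
from typing import Callable

Article = dict  # hypothetical: one news article as a plain dict

def run_pipeline(
    question: str,
    generate_keywords: Callable[[str, int], list[str]],
    search_news: Callable[[str, int], list[Article]],
    rate_article: Callable[[str, Article], int],
    summarize: Callable[[str, list[Article]], str],
    num_questions: int = 5,
    news_per_keyword: int = 2,
    min_news_rating: int = 3,
) -> dict:
    # Stage 1: ask the (hypothetical) LLM stage for search keywords.
    keywords = generate_keywords(question, num_questions)

    # Stage 2: keep at most news_per_keyword articles per keyword.
    articles = []
    for kw in keywords:
        for art in search_news(kw, news_per_keyword)[:news_per_keyword]:
            articles.append({**art, "search_query": kw})

    # Stage 3: rate every article (1-5) and sort highest rating first.
    rated = [{**a, "relevance_rating": rate_article(question, a)} for a in articles]
    rated.sort(key=lambda a: a["relevance_rating"], reverse=True)

    # Only articles at or above the threshold reach the summarizer.
    relevant = [a for a in rated if a["relevance_rating"] >= min_news_rating]

    # Stage 4: synthesize the relevant articles into one summary.
    summary = summarize(question, relevant) if relevant else ""
    return {
        "search_keywords": keywords,
        "all_rated_news": rated,
        "relevant_news": relevant,
        "summary": summary,
    }

# Usage with stub stages instead of real LLM and news calls.
result = run_pipeline(
    "Will X happen?",
    generate_keywords=lambda q, n: [f"kw{i}" for i in range(n)],
    search_news=lambda kw, n: [{"title": f"{kw}-a"}, {"title": f"{kw}-b"}],
    rate_article=lambda q, a: 4 if a["title"].endswith("-a") else 2,
    summarize=lambda q, arts: f"{len(arts)} articles summarized",
)
print(result["summary"])
```

Injecting each stage as a function keeps the filtering logic testable without network access; a real pipeline would replace the lambdas with LLM and search calls.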

## Setup and Imports

In [1]:
import sys
import pandas as pd
from datasets import load_dataset
import warnings
warnings.filterwarnings('ignore')

from forecast_kag.news_retrieval import NewsRetrievalPipeline

## Select a Question

In [2]:
question = "Will Apple release a new iPhone model before June 2024?"
background = "Apple typically releases new iPhone models in September each year."
question_date = "2024-03-01"

## Configure Pipeline Parameters

Set parameters for the news retrieval pipeline.

In [3]:
# Pipeline configuration
pipeline_config = {
    'num_questions': 5,           # Number of search keywords to generate
    'news_per_keyword': 2,        # News articles to retrieve per keyword
    'min_news_rating': 3,         # Minimum relevance rating (1-5) to keep articles
    'news_period_days': 90,       # How many days back to search for news
    'question_gen_temp': 0.7,     # Temperature for keyword generation
    'news_rating_temp': 0.3,      # Temperature for news rating
    'summarization_temp': 0.5,    # Temperature for summarization
    'max_tokens': 1000            # Max tokens for LLM responses
}
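The news_period_days setting defines a look-back window that ends at the question date. The run log below reports a search from 2023-12-02 to 2024-03-01 for a 90-day window. A minimal sketch of that arithmetic, assuming the window is computed by simple date subtraction (the helper name search_window is hypothetical):

```python
from datetime import date, timedelta

def search_window(question_date: str, news_period_days: int) -> tuple[date, date]:
    """Return (start, end) for a search window that ends on the question date."""
    end = date.fromisoformat(question_date)
    start = end - timedelta(days=news_period_days)
    return start, end

start, end = search_window("2024-03-01", 90)
print(start, end)  # 2023-12-02 2024-03-01
```

Using datetime arithmetic rather than counting days by hand handles month lengths and the 2024 leap day (February has 29 days) automatically.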

## Initialize and Run Pipeline

Create the pipeline and run it on the selected question.

In [4]:
pipeline = NewsRetrievalPipeline(
    model_shortname="llama70",
    **pipeline_config
)

results = pipeline.run(
    question=question,
    background=background,
    question_date=question_date
)

11/16/2025 11:44:16 AM - NEWS RETRIEVAL PIPELINE
11/16/2025 11:44:16 AM - Question: Will Apple release a new iPhone model before June 2024?
11/16/2025 11:44:16 AM - Date Range: 90 days
11/16/2025 11:44:16 AM - News search cutoff date: 2024-03-01
11/16/2025 11:44:16 AM - Searching news from 2023-12-02 to 2024-03-01
11/16/2025 11:44:16 AM - [Agent 1: Keyword Generation] Generating 5 search keywords...
11/16/2025 11:44:16 AM - Calling LLM for Keyword Generation...
11/16/2025 11:44:18 AM - Keyword Generation completed
11/16/2025 11:44:18 AM - Generated 5 search keywords
11/16/2025 11:44:18 AM -   1. Apple iPhone early release rumors 2024
11/16/2025 11:44:18 AM -   2. iPhone 16 production schedule leak
11/16/2025 11:44:18 AM -   3. Apple spring product launch event 2024
11/16/2025 11:44:18 AM -   4. iPhone new model prototype unveiling
11/16/2025 11:44:18 AM -   5. Apple deviates from September release tradition
11/16/2025 11:44:18 AM - 
[Agent 2: News Retrieval] Preparing search queries...

## Display Results

### Pipeline Statistics

In [5]:
for key, value in results['stats'].items():
    print(f"{key}: {value}")

num_search_keywords: 5
total_articles_retrieved: 9
total_articles_rated: 9
relevant_articles: 5
min_rating_threshold: 3
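With 5 keywords and 2 articles per keyword, the run can return at most 10 articles; it returned 9, and the log does not say whether a query came back short or a duplicate was dropped. The sketch below recomputes headline figures from a list of ratings. The function name and the max_articles_possible and share_relevant fields are hypothetical additions, not keys of results['stats'], and the ratings in the usage line are illustrative rather than taken from this run.

```python
def summarize_stats(num_keywords: int, news_per_keyword: int,
                    ratings: list[int], min_rating: int) -> dict:
    """Recompute headline statistics from per-article relevance ratings."""
    relevant = sum(1 for r in ratings if r >= min_rating)
    return {
        # Upper bound: every keyword returns its full quota of articles.
        "max_articles_possible": num_keywords * news_per_keyword,
        "total_articles_rated": len(ratings),
        "relevant_articles": relevant,
        "share_relevant": relevant / len(ratings) if ratings else 0.0,
    }

# Illustrative ratings only, not the ones produced by the run above.
stats = summarize_stats(5, 2, [4, 4, 3, 2, 1], min_rating=3)
print(stats)
```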


### Generated Search Keywords

In [6]:
print("\nGENERATED SEARCH KEYWORDS")
for i, kw in enumerate(results['search_keywords'], 1):
    print(f"{i}. {kw}")


GENERATED SEARCH KEYWORDS
1. Apple iPhone early release rumors 2024
2. iPhone 16 production schedule leak
3. Apple spring product launch event 2024
4. iPhone new model prototype unveiling
5. Apple deviates from September release tradition


### All Retrieved News Articles (with Ratings)

Every article the pipeline retrieved and rated, sorted by relevance rating (highest first).

In [7]:
print("\n" + "="*80)
print(f"ALL RETRIEVED NEWS ARTICLES ({len(results['all_rated_news'])} total)")
print("="*80)

if results['all_rated_news']:
    for i, article in enumerate(results['all_rated_news'], 1):
        # 'publisher' is expected to be a dict with a 'title' key
        publisher = article.get('publisher')
        publisher_name = publisher.get('title', 'N/A') if isinstance(publisher, dict) else 'N/A'
        # Fields may be missing or None, so fall back to 'N/A' before slicing
        description = article.get('description') or 'N/A'
        query = article.get('search_query') or 'N/A'
        print(f"\n{i}. [RATING: {article['relevance_rating']}/5]")
        print(f"   Title: {article.get('title', 'N/A')}")
        print(f"   Publisher: {publisher_name}")
        print(f"   Date: {article.get('published date', 'N/A')}")
        print(f"   URL: {article.get('url', 'N/A')}")
        print(f"   Description: {description[:200]}...")
        print(f"   Search Query: {query[:60]}...")
else:
    print("No articles retrieved.")
print("\n" + "="*80)


ALL RETRIEVED NEWS ARTICLES (9 total)

1. [RATING: 4/5]
   Title: iPhone 16 in 2024 — here's everything Apple could announce this year - Tom's Guide
   Publisher: Tom's Guide
   Date: Sun, 31 Dec 2023 08:00:00 GMT
   URL: https://news.google.com/rss/articles/CBMiWEFVX3lxTE5kbVNNWko2eDhXdkh3aWlieXhLX29BZUxiaHhsUFF3ZF9OSFNkVzVsVVNpdFhlZHF4UGVnWGMxWUhXdDItWDgwdkhfVUtGRE5UOHFmTTRuRXY?oc=5&hl=en-US&gl=US&ceid=US:en
   Description: iPhone 16 in 2024 — here's everything Apple could announce this year  Tom's Guide...
   Search Query: Apple iPhone early release rumors 2024...

2. [RATING: 4/5]
   Title: Leaked iPhone 16 design is giving me major 2017 vibes - Creative Bloq
   Publisher: Creative Bloq
   Date: Wed, 07 Feb 2024 08:00:00 GMT
   URL: https://news.google.com/rss/articles/CBMiX0FVX3lxTE9aUURaUElVWGtyWFZoaE14MEpYdXphN2w4QTlZSmVMdE13REY0dVNaQjJaaGtjTnR1dUxJUW1zU185Q21iQm5iMjgwSmZud0h2V2t3N0FXdThLeEZvZkRr?oc=5&hl=en-US&gl=US&ceid=US:en
   Description: Leaked iPhone 16 design is giving m
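Each rating above is shown as a score out of 5. If the rating agent answers in free text, the score has to be pulled out before sorting. A minimal sketch, assuming the reply contains a standalone digit from 1 to 5; parse_rating and rate_and_sort are hypothetical helpers, and dropping articles whose reply has no parseable rating is one design choice among several (a default of 1 would also work).

```python
import re
from typing import Optional

def parse_rating(llm_text: str) -> Optional[int]:
    """Return the first standalone digit 1-5 in an LLM reply, or None."""
    # Word boundaries keep numbers such as '2024' or '10' from matching.
    match = re.search(r"\b([1-5])\b", llm_text)
    return int(match.group(1)) if match else None

def rate_and_sort(articles: list[dict], replies: list[str]) -> list[dict]:
    """Attach parsed ratings, drop unparseable ones, and sort highest first."""
    rated = []
    for article, reply in zip(articles, replies):
        rating = parse_rating(reply)
        if rating is not None:
            rated.append({**article, "relevance_rating": rating})
    return sorted(rated, key=lambda a: a["relevance_rating"], reverse=True)

articles = [{"title": "A"}, {"title": "B"}, {"title": "C"}]
replies = ["Rating: 2/5", "I would rate this 4 out of 5.", "Not sure."]
print([(a["title"], a["relevance_rating"]) for a in rate_and_sort(articles, replies)])
# [('B', 4), ('A', 2)]
```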

### News Summary (Synthesized Analysis)

This summary synthesizes every article rated at or above the minimum threshold (min_news_rating, set to 3 above) into a single analysis. It combines insights across the relevant articles, removes redundant information, and focuses on verifiable facts that bear on the forecasting question, without editorial opinions.
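One way to enforce "relevant articles only" is to apply the threshold while assembling the summarizer's input. The sketch below is a hypothetical prompt builder; the prompt wording is an assumption, not the pipeline's actual summarization prompt. It reads the same article fields this notebook prints ('title', 'published date', 'description', 'relevance_rating').

```python
def build_summary_prompt(question: str, articles: list[dict], min_rating: int) -> str:
    """Assemble a summarization prompt from articles at or above min_rating."""
    relevant = [a for a in articles if a.get("relevance_rating", 0) >= min_rating]
    lines = [f"Question: {question}", "", "Relevant articles:"]
    for i, a in enumerate(relevant, 1):
        lines.append(f"{i}. {a.get('title') or 'N/A'} ({a.get('published date') or 'N/A'})")
        lines.append(f"   {a.get('description') or ''}")
    lines.append("")
    # Hypothetical instructions mirroring the summary's stated goals.
    lines.append("Write one factual synthesis of the key factors affecting the question. "
                 "Do not list articles individually or add opinions.")
    return "\n".join(lines)

prompt = build_summary_prompt(
    "Will X happen?",
    [{"title": "Kept", "relevance_rating": 4, "description": "d1"},
     {"title": "Dropped", "relevance_rating": 2}],
    min_rating=3,
)
print(prompt)
```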

In [8]:
print("\n" + "="*80)
print(f"NEWS SUMMARY (Only articles with rating >= {pipeline_config['min_news_rating']})")
print("="*80)
print(f"\nRelevant articles: {len(results['relevant_news'])}/{len(results['all_rated_news'])}\n")
print(results['summary'])
print("\n" + "="*80)


NEWS SUMMARY (Only articles with rating >= 3)

Relevant articles: 5/9

Apple's iPhone plans for 2024 are underway, with the company accounting for 52% of its $383 billion in sales in its 2023 fiscal year. The iPhone 15 models, released in the fall of 2023, featured significant upgrades, including the adoption of the Dynamic Island feature and a 48MP main camera on the standard model. The iPhone 15 Pro models had a lighter titanium frame and a powerful A17 Pro chipset, with the iPhone 15 Pro Max offering 256GB of storage and a redesigned telephoto lens.

Apple has a history of releasing new iPhone models in September of each year. In 2024, the company is rumored to be working on new models, with leaked schematics indicating a possible return to a vertical camera layout, similar to the iPhone 12. The design may also feature a slimmer cutout and rounded edges, reminiscent of the iPhone X. Additionally, the Action Button, currently limited to the iPhone 15 Pro, may be introduced to the en