# News Retrieval Pipeline for Forecasting

This notebook demonstrates the news retrieval pipeline that uses LLMs to:
1. **Generate search keywords** optimized for Google News
2. **Retrieve relevant news** articles using the generated keywords
3. **Rate the relevance** of each news article (1-5 scale)
4. **Synthesize all relevant news** into a single cohesive summary
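
The four stages above can be sketched as one function that chains injected callables. This only illustrates the data flow and is not the `NewsRetrievalPipeline` implementation: `run_pipeline_sketch` and the stub stages are hypothetical names, and real stages would call an LLM and a news search backend. The result keys mirror the ones read later in this notebook, and the original question is searched alongside the keywords because the run log reports "1 original + 4 keywords".

```python
from typing import Callable


def run_pipeline_sketch(
    question: str,
    generate_keywords: Callable[[str], list[str]],
    search_news: Callable[[str], list[dict]],
    rate_article: Callable[[str, dict], int],
    summarize: Callable[[str, list[dict]], str],
    min_rating: int = 3,
) -> dict:
    """Chain the four stages; every stage is injected, so nothing here is a real API."""
    keywords = generate_keywords(question)                       # stage 1
    queries = [question] + keywords                              # original question + keywords
    articles = [a for q in queries for a in search_news(q)]      # stage 2
    rated = [
        {**a, "relevance_rating": rate_article(question, a)}     # stage 3
        for a in articles
    ]
    relevant = [a for a in rated if a["relevance_rating"] >= min_rating]
    return {
        "search_keywords": keywords,
        "all_rated_news": rated,
        "relevant_news": relevant,
        "summary": summarize(question, relevant),                # stage 4
    }


# Stub stages standing in for the LLM and search calls.
demo = run_pipeline_sketch(
    question="Will Apple release a new iPhone model before June 2024?",
    generate_keywords=lambda q: ["iPhone 16 leak", "Apple spring event"],
    search_news=lambda query: [{"title": f"Article about {query}"}],
    rate_article=lambda q, a: 4 if "iPhone" in a["title"] else 2,
    summarize=lambda q, arts: f"{len(arts)} relevant article(s)",
)
print(demo["summary"])  # 2 relevant article(s)
```

Injecting the stages as callables keeps the sketch testable without network or model access.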

## Select a Question

In [5]:
question = "Will Apple release a new iPhone model before June 2024?"
background = "Apple typically releases new iPhone models in September each year."
question_date = "2024-03-01"

## Configure the Pipeline

Set parameters for the news retrieval pipeline.

In [6]:
# Pipeline configuration
pipeline_config = {
    'num_keywords': 4,            # Number of search keywords to generate
    'news_per_keyword': 2,        # News articles to retrieve per keyword
    'min_news_rating': 3,         # Minimum relevance rating (1-5) to keep articles
    'news_period_days': 90,       # How many days back to search for news
    'question_gen_temp': 0.7,     # Temperature for keyword generation
    'news_rating_temp': 0.3,      # Temperature for news rating
    'summarization_temp': 0.5,    # Temperature for summarization
    'max_tokens': 1000            # Max tokens for LLM responses
}
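
As a rough check on what these settings imply, the arithmetic below derives the number of search queries, an upper bound on retrieved articles, and the search window. It assumes that one extra query is issued for the original question, that `news_per_keyword` also caps that query, and that the window ends on the question date; the run log in this notebook is consistent with all three, but the pipeline's internals are not shown here.

```python
from datetime import date, timedelta

# Values copied from the question and configuration cells.
num_keywords = 4
news_per_keyword = 2
news_period_days = 90
question_date = date.fromisoformat("2024-03-01")

# Assumption: one query for the original question plus one per keyword.
total_queries = 1 + num_keywords
# Assumption: every query returns at most news_per_keyword articles.
max_articles = total_queries * news_per_keyword

# Assumption: the window ends on the question date and spans news_period_days.
window_start = question_date - timedelta(days=news_period_days)

print(total_queries, max_articles)                                # 5 10
print(window_start.isoformat(), "to", question_date.isoformat())  # 2023-12-02 to 2024-03-01
```

The bound of 10 is only a ceiling; a search that returns fewer results lowers the actual total.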

## Initialize and Run Pipeline

In [7]:
from forecast_kag.news_retrieval import NewsRetrievalPipeline

pipeline = NewsRetrievalPipeline(
    keyword_model="qwen80",         # For keyword generation
    rating_model="qwen80",          # For rating articles
    summarization_model="oss120",   # For summarization
    **pipeline_config
)

# Run pipeline
print("="*80)
results = pipeline.run(
    question=question,
    background=background,
    question_date=question_date
)
print("="*80)

11/16/2025 12:31:08 PM - NEWS RETRIEVAL PIPELINE
11/16/2025 12:31:08 PM - Question: Will Apple release a new iPhone model before June 2024?
11/16/2025 12:31:08 PM - Date Range: 90 days
11/16/2025 12:31:08 PM - News search cutoff date: 2024-03-01
11/16/2025 12:31:08 PM - Searching news from 2023-12-02 to 2024-03-01
11/16/2025 12:31:08 PM - [Agent 1: Keyword Generation] Generating 4 search keywords...
11/16/2025 12:31:08 PM - Calling LLM for Keyword Generation...
11/16/2025 12:31:10 PM - Keyword Generation completed
11/16/2025 12:31:10 PM - Generated 4 search keywords
11/16/2025 12:31:10 PM -   1. Apple iPhone early release rumors 2024
11/16/2025 12:31:10 PM -   2. iPhone 16 production schedule leak
11/16/2025 12:31:10 PM -   3. Apple spring product launch event 2024
11/16/2025 12:31:10 PM -   4. iPhone 2024 prototype testing reports
11/16/2025 12:31:10 PM - 
[Agent 2: News Retrieval] Preparing search queries...
11/16/2025 12:31:10 PM - Original question: 'Will Apple release a new iPhone model before June 2024?...'
11/16/2025 12:31:10 PM - Total search queries: 5 (1 original + 4 keywords)
11/16/2025 12:31:10 PM - Executing 5 news searches...
11/16/2025 12:31:10 PM - Query 1/5 [ORIGINAL]: 'Will Apple release a new iPhone model before June 2024?...'
11/16/2025 12:31:15 PM - Found 2 articles
11/16/2025 12:31:15 PM - Query 2/5 [KEYWORD 1]: 'Apple iPhone early release rumors 2024...'
11/16/2025 12:31:18 PM - Found 2 articles
11/16/2025 12:31:18 PM -



### Pipeline Statistics

In [8]:
print("\n" + "="*80)
print("PIPELINE STATISTICS")
print("="*80)
for key, value in results['stats'].items():
    print(f"{key}: {value}")
print("="*80)


PIPELINE STATISTICS
num_search_keywords: 4
total_articles_retrieved: 9
total_articles_rated: 9
relevant_articles: 4
min_rating_threshold: 3


### Generated Search Keywords

In [9]:
print("\n" + "="*80)
print("GENERATED SEARCH KEYWORDS")
print("="*80)
for i, kw in enumerate(results['search_keywords'], 1):
    print(f"{i}. {kw}")
print("="*80)


GENERATED SEARCH KEYWORDS
1. Apple iPhone early release rumors 2024
2. iPhone 16 production schedule leak
3. Apple spring product launch event 2024
4. iPhone 2024 prototype testing reports


### All Retrieved News Articles (with Ratings)

All articles retrieved and rated by the pipeline, sorted by relevance rating.
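
A minimal sketch of the ordering described above, assuming articles are sorted by descending `relevance_rating` and that ties keep their retrieval order. The sample records are made up; only the `relevance_rating` and `title` keys come from this notebook.

```python
# Made-up records that reuse the notebook's article keys.
rated = [
    {"title": "A", "relevance_rating": 2},
    {"title": "B", "relevance_rating": 4},
    {"title": "C", "relevance_rating": 4},
    {"title": "D", "relevance_rating": 1},
]

# sorted() is stable, so B stays ahead of C even with reverse=True.
sorted_news = sorted(rated, key=lambda a: a["relevance_rating"], reverse=True)

print([a["title"] for a in sorted_news])  # ['B', 'C', 'A', 'D']
```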

In [10]:
print("\n" + "="*80)
print(f"ALL RETRIEVED NEWS ARTICLES ({len(results['all_rated_news'])} total)")
print("="*80)

if results['all_rated_news']:
    for i, article in enumerate(results['all_rated_news'], 1):
        # The publisher field may be a dict with a 'title' key, or missing
        publisher = article.get('publisher')
        publisher_name = publisher.get('title', 'N/A') if isinstance(publisher, dict) else 'N/A'
        print(f"\n{i}. [RATING: {article['relevance_rating']}/5]")
        print(f"   Title: {article.get('title', 'N/A')}")
        print(f"   Publisher: {publisher_name}")
        print(f"   Date: {article.get('published date', 'N/A')}")
        print(f"   URL: {article.get('url', 'N/A')}")
        # `or 'N/A'` also covers fields that are present but set to None
        print(f"   Description: {(article.get('description') or 'N/A')[:200]}...")
        print(f"   Search Query: {(article.get('search_query') or 'N/A')[:60]}...")
else:
    print("No articles retrieved.")
print("\n" + "="*80)


ALL RETRIEVED NEWS ARTICLES (9 total)

1. [RATING: 4/5]
   Title: iPhone 16 in 2024 — here's everything Apple could announce this year - Tom's Guide
   Publisher: Tom's Guide
   Date: Sun, 31 Dec 2023 08:00:00 GMT
   URL: https://news.google.com/rss/articles/CBMiWEFVX3lxTE5kbVNNWko2eDhXdkh3aWlieXhLX29BZUxiaHhsUFF3ZF9OSFNkVzVsVVNpdFhlZHF4UGVnWGMxWUhXdDItWDgwdkhfVUtGRE5UOHFmTTRuRXY?oc=5&hl=en-US&gl=US&ceid=US:en
   Description: iPhone 16 in 2024 — here's everything Apple could announce this year  Tom's Guide...
   Search Query: Apple iPhone early release rumors 2024...

2. [RATING: 4/5]
   Title: iPhone 16 leak reveals Apple may have ditched bold design change for more of the same - iMore
   Publisher: iMore
   Date: Wed, 13 Dec 2023 08:00:00 GMT
   URL: https://news.google.com/rss/articles/CBMiswFBVV95cUxQWGxhRW5MeWlZNXFLRkEtWkt4Ym1TVGtVUU5uWFJCZjVQdG1SdDR3X3QySmNRTlBoQmRpUmFfQkp5bEw4d3llSUJFczdWanRnREZHMl9kb1NRTWZxOEltNDlpS0xWclVLZU9lY2RfaEZ1VjVQNnhWVmVPbzN2eGFoZExLSXJ1Wk9WdlNobWRlNk8

### News Summary (Synthesized Analysis)

The summary synthesizes every article rated at or above `min_news_rating` (3 in this run) into a single analysis. It identifies the key factors affecting the forecasting question by combining insights across the relevant articles, removing redundant information, and keeping to verifiable facts rather than editorial opinion.
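
The selection step that feeds the summary can be sketched as a threshold filter followed by a plain-text context builder. Both helpers, `select_relevant` and `build_summary_context`, are hypothetical and are not part of `forecast_kag`; how the real pipeline formats its summarization prompt is not shown in this notebook.

```python
def select_relevant(rated_news: list[dict], min_rating: int) -> list[dict]:
    """Keep articles whose rating meets the threshold (hypothetical helper)."""
    return [a for a in rated_news if a.get("relevance_rating", 0) >= min_rating]


def build_summary_context(articles: list[dict], max_chars: int = 200) -> str:
    """Join titles and truncated descriptions into one numbered text block."""
    lines = []
    for i, a in enumerate(articles, 1):
        desc = (a.get("description") or "")[:max_chars]  # tolerate None descriptions
        lines.append(f"[{i}] {a.get('title', 'N/A')}: {desc}")
    return "\n".join(lines)


# Made-up records that reuse the notebook's article keys.
rated = [
    {"title": "iPhone 16 rumors", "description": "Design leaks.", "relevance_rating": 4},
    {"title": "Unrelated story", "description": None, "relevance_rating": 1},
    {"title": "Spring event preview", "description": "Possible launches.", "relevance_rating": 3},
]

relevant = select_relevant(rated, min_rating=3)
context = build_summary_context(relevant)
print(f"Relevant articles: {len(relevant)}/{len(rated)}")
print(context)
```

Because the comparison is `>=`, an article rated exactly at the threshold is kept, which matches the "rating >= 3" wording in the output below.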

In [11]:
print("\n" + "="*80)
print(f"NEWS SUMMARY (Only articles with rating >= {pipeline_config['min_news_rating']})")
print("="*80)
print(f"\nRelevant articles: {len(results['relevant_news'])}/{len(results['all_rated_news'])}\n")
print(results['summary'])
print("\n" + "="*80)


NEWS SUMMARY (Only articles with rating >= 3)

Relevant articles: 4/9

Apple’s iPhone releases have historically been tied to a September‑time special event, and the articles note that the next iPhone launch is anticipated for September 2024. No Apple‑announced event before June 2024 is dedicated to a new iPhone; the confirmed early‑year events are a February 2024 launch of the Apple Vision Pro headset and the June 2024 Worldwide Developers Conference (WWDC), which focus on software and other product categories.  

The coverage details multiple design leaks for the upcoming iPhone 16, including changes to the camera layout, the addition of an Action button and a Capture button, and various button configurations. These leaks describe prototype renderings and speculation about the device’s appearance but do not provide an official release date or confirmation from Apple.  

Because Apple’s official schedule lists only a Vision Pro launch in February and WWDC in June, and because the nex