# News Retrieval Pipeline for Forecasting

This notebook demonstrates the news retrieval pipeline that uses LLMs to:
1. **Generate search keywords** optimized for Google News
2. **Retrieve relevant news** articles using the generated keywords
3. **Rate the relevance** of each news article (1-5 scale)
4. **Synthesize all relevant news** into a single cohesive summary
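
The four stages can be sketched as a single function. This is a minimal illustration, not the `NewsRetrievalPipeline` implementation: `run_news_pipeline` and the four callables passed to it are hypothetical names, and the lambdas at the bottom stand in for real LLM and news-search calls. Only the result keys (`search_keywords`, `all_rated_news`, `relevant_news`, `summary`) are taken from this notebook.

```python
from typing import Callable

def run_news_pipeline(
    question: str,
    generate_keywords: Callable[[str], list[str]],   # stage 1 (hypothetical LLM call)
    search_news: Callable[[str], list[dict]],        # stage 2 (hypothetical search call)
    rate_article: Callable[[str, dict], int],        # stage 3 (hypothetical LLM call, 1-5)
    summarize: Callable[[str, list[dict]], str],     # stage 4 (hypothetical LLM call)
    min_rating: int = 3,
) -> dict:
    # Stage 1: generate search keywords for the question.
    keywords = generate_keywords(question)

    # Stage 2: retrieve articles for every keyword, remembering the query used.
    articles = []
    for query in keywords:
        for article in search_news(query):
            articles.append({**article, "search_query": query})

    # Stage 3: attach a 1-5 relevance rating to each article.
    rated = [{**a, "relevance_rating": rate_article(question, a)} for a in articles]

    # Stage 4: summarize only the articles at or above the threshold.
    relevant = [a for a in rated if a["relevance_rating"] >= min_rating]
    summary = summarize(question, relevant) if relevant else ""
    return {"search_keywords": keywords, "all_rated_news": rated,
            "relevant_news": relevant, "summary": summary}


# Stubs standing in for real LLM / search calls, for illustration only.
demo = run_news_pipeline(
    "Will Apple release a new iPhone model before June 2024?",
    generate_keywords=lambda q: ["iPhone 16 launch date"],
    search_news=lambda query: [{"title": f"Result for {query}"}],
    rate_article=lambda q, a: 4,
    summarize=lambda q, arts: f"{len(arts)} relevant article(s)",
)
print(demo["summary"])
```

Passing the stages in as callables keeps the control flow testable without network or model access.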

## Select a Question

In [1]:
question = "Will Apple release a new iPhone model before June 2024?"
background = "Apple typically releases new iPhone models in September each year."
question_date = "2024-03-01"

## Initialize and Run Pipeline

Set parameters for the news retrieval pipeline.

In [2]:
# Pipeline configuration
pipeline_config = {
    'num_keywords': 4,            # Number of search keywords to generate
    'news_per_keyword': 2,        # News articles to retrieve per keyword
    'min_news_rating': 3,         # Minimum relevance rating (1-5) to keep articles
    'news_period_days': 90,       # How many days back to search for news
    'question_gen_temp': 0.7,     # Temperature for keyword generation
    'news_rating_temp': 0.3,      # Temperature for news rating
    'summarization_temp': 0.5,    # Temperature for summarization
    'max_tokens': 1000            # Max tokens for LLM responses
}
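
To make `news_period_days` concrete, the search window can be derived from the question date as shown here. This is a sketch that assumes the window ends on the question date and counts back in calendar days; `search_window` is a hypothetical helper, not part of `forecast_kag`.

```python
from datetime import date, timedelta

def search_window(question_date: str, period_days: int) -> tuple[date, date]:
    """Return (start, end) of the news search window, ending on the question date."""
    end = date.fromisoformat(question_date)
    start = end - timedelta(days=period_days)
    return start, end

# Values used in this notebook: question_date = "2024-03-01", news_period_days = 90
start, end = search_window("2024-03-01", 90)
print(f"Searching news from {start} to {end}")
```

With the values used in this notebook, the window runs from 2023-12-02 to 2024-03-01.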

In [None]:
from forecast_kag.news_retrieval import NewsRetrievalPipeline
# Initialize pipeline
print("Initializing pipeline...\n")
pipeline = NewsRetrievalPipeline(
    model_shortname="qwen80",
    **pipeline_config
)

# Run pipeline
print("="*80)
results = pipeline.run(
    question=question,
    background=background,
    question_date=question_date
)
print("="*80)

11/16/2025 12:19:22 PM - NEWS RETRIEVAL PIPELINE
11/16/2025 12:19:22 PM - Question: Will Apple release a new iPhone model before June 2024?
11/16/2025 12:19:22 PM - Date Range: 90 days
11/16/2025 12:19:22 PM - News search cutoff date: 2024-03-01
11/16/2025 12:19:22 PM - Searching news from 2023-12-02 to 2024-03-01
11/16/2025 12:19:22 PM - [Agent 1: Keyword Generation] Generating 4 search keywords...
11/16/2025 12:19:22 PM - Calling LLM for Keyword Generation...


Initializing pipeline...



11/16/2025 12:19:23 PM - Keyword Generation completed
11/16/2025 12:19:23 PM - Generated 4 search keywords
11/16/2025 12:19:23 PM -   1. Apple iPhone 16 launch date 2024
11/16/2025 12:19:23 PM -   2. Apple product roadmap leak 2024
11/16/2025 12:19:23 PM -   3. iPhone 16 production start date
11/16/2025 12:19:23 PM -   4. Apple supply chain rumors June 2024
11/16/2025 12:19:23 PM - 
[Agent 2: News Retrieval] Preparing search queries...
11/16/2025 12:19:23 PM - Original question: 'Will Apple release a new iPhone model before June 2024?...'
11/16/2025 12:19:23 PM - Total search queries: 5 (1 original + 4 keywords)
11/16/2025 12:19:23 PM - Executing 5 news searches...
11/16/2025 12:19:23 PM - Query 1/5 [ORIGINAL]: 'Will Apple release a new iPhone model before June 2024?...'
11/16/2025 12:19:33 PM - Found 2 articles
11/16/2025 12:19:33 PM - Query 2/5 [KEYWORD 1]: 'Apple iPhone 16 launch date 2024...'
11/16/2025 12:19:36 PM - Found 2 articles
11/16/2025 12:19:36 PM - Query 3/5 [KEYWORD 2]: 



## Display Results

### Pipeline Statistics

In [4]:
print("\n" + "="*80)
print("PIPELINE STATISTICS")
print("="*80)
for key, value in results['stats'].items():
    print(f"{key}: {value}")
print("="*80)


PIPELINE STATISTICS
num_search_keywords: 4
total_articles_retrieved: 8
total_articles_rated: 8
relevant_articles: 4
min_rating_threshold: 3


### Generated Search Keywords

In [5]:
print("\n" + "="*80)
print("GENERATED SEARCH KEYWORDS")
print("="*80)
for i, kw in enumerate(results['search_keywords'], 1):
    print(f"{i}. {kw}")
print("="*80)


GENERATED SEARCH KEYWORDS
1. Apple iPhone 16 launch date 2024
2. Apple product roadmap leak 2024
3. iPhone 16 production start date
4. Apple supply chain rumors June 2024


### All Retrieved News Articles (with Ratings)

All articles retrieved and rated by the pipeline, sorted by relevance rating.
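
Ordering by rating amounts to a descending sort on `relevance_rating`. The sketch below uses made-up sample articles and assumes higher ratings come first; how the pipeline itself orders ties is not shown in this notebook. Python's sort is stable, so articles with equal ratings keep their original order here.

```python
# Hypothetical sample data with the same 'relevance_rating' key used in this notebook.
sample_articles = [
    {"title": "A", "relevance_rating": 2},
    {"title": "B", "relevance_rating": 4},
    {"title": "C", "relevance_rating": 4},
    {"title": "D", "relevance_rating": 3},
]

# Highest rating first; ties keep their original order (stable sort).
ranked = sorted(sample_articles, key=lambda a: a["relevance_rating"], reverse=True)
print([a["title"] for a in ranked])
```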

In [6]:
print("\n" + "="*80)
print(f"ALL RETRIEVED NEWS ARTICLES ({len(results['all_rated_news'])} total)")
print("="*80)

if results['all_rated_news']:
    for i, article in enumerate(results['all_rated_news'], 1):
        # 'publisher' may be missing or not a dict; fall back to 'N/A' in either case
        publisher = article.get('publisher')
        publisher_title = publisher.get('title', 'N/A') if isinstance(publisher, dict) else 'N/A'
        print(f"\n{i}. [RATING: {article['relevance_rating']}/5]")
        print(f"   Title: {article.get('title', 'N/A')}")
        print(f"   Publisher: {publisher_title}")
        print(f"   Date: {article.get('published date', 'N/A')}")
        print(f"   URL: {article.get('url', 'N/A')}")
        # Use `or` so a None or empty value also falls back to 'N/A' before slicing
        print(f"   Description: {(article.get('description') or 'N/A')[:200]}...")
        print(f"   Search Query: {(article.get('search_query') or 'N/A')[:60]}...")
else:
    print("No articles retrieved.")
print("\n" + "="*80)


ALL RETRIEVED NEWS ARTICLES (8 total)

1. [RATING: 4/5]
   Title: iPhone 16 in 2024 — here's everything Apple could announce this year - Tom's Guide
   Publisher: Tom's Guide
   Date: Sun, 31 Dec 2023 08:00:00 GMT
   URL: https://news.google.com/rss/articles/CBMiWEFVX3lxTE5kbVNNWko2eDhXdkh3aWlieXhLX29BZUxiaHhsUFF3ZF9OSFNkVzVsVVNpdFhlZHF4UGVnWGMxWUhXdDItWDgwdkhfVUtGRE5UOHFmTTRuRXY?oc=5&hl=en-US&gl=US&ceid=US:en
   Description: iPhone 16 in 2024 — here's everything Apple could announce this year  Tom's Guide...
   Search Query: Apple iPhone 16 launch date 2024...

2. [RATING: 4/5]
   Title: iPhone 16 Ultra: Release date rumors, news, and more - iMore
   Publisher: iMore
   Date: Mon, 05 Feb 2024 08:00:00 GMT
   URL: https://news.google.com/rss/articles/CBMihAFBVV95cUxQS0FxU1ZZQjlaUTdsakIzbVpCVkk2eHlqZ0tacFlQMTBDanVodzdJN1pKQzlGdEdQM2hIdUwzTHEyNk9mdzZlNWVSTU1lVHFHYmlyVUc4ek5jVXJiV0lZbklnLW8zaGdibDFqOGtCRm1NU1dyMzI1aEtFOEdhNDhoVGFIalM?oc=5&hl=en-US&gl=US&ceid=US:en
   Description: iPhone 

### News Summary (Synthesized Analysis)

The summary synthesizes every article rated at or above `min_news_rating` (3 in this run) into a single analysis. It identifies the key factors affecting the forecasting question by combining insights across the relevant articles, removing redundant information, and keeping to verifiable facts rather than editorial opinion.

In [7]:
print("\n" + "="*80)
print(f"NEWS SUMMARY (Only articles with rating >= {pipeline_config['min_news_rating']})")
print("="*80)
print(f"\nRelevant articles: {len(results['relevant_news'])}/{len(results['all_rated_news'])}\n")
print(results['summary'])
print("\n" + "="*80)


NEWS SUMMARY (Only articles with rating >= 3)

Relevant articles: 4/8

Apple has historically released new iPhone models in September each year. Information indicates that the next iteration, referred to as the iPhone 16, is anticipated to follow this pattern with a launch in September 2024. Rumors and leaks describe planned modifications to the iPhone 16, including changes to the button layout, a vertical camera configuration, the inclusion of an Action button, and the introduction of a new Capture button. These design adjustments are based on pre-production renders and industry reports, but no official release date has been confirmed.

Production planning for the iPhone 16 is underway, with Apple directing suppliers to shift battery manufacturing from China to India as part of a broader supply chain diversification strategy. This transition involves companies such as Desay and Simplo Technology and aligns with Apple’s prior movement of iPhone 15 production to India. The company’s ef