<a href="https://colab.research.google.com/github/ianellisjones/usn/blob/main/Geopolitics_News_Aggregator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🌍 GEOPOLITICS & DEFENSE NEWS AGGREGATOR

**An AI-Powered News Intelligence Platform**

This notebook automatically aggregates, categorizes, and publishes geopolitics and defense news headlines. Think of it as a modern, automated Drudge Report focused on global security.

### Features:
- **Multi-Source Aggregation**: Pulls from 30+ premium news sources via RSS and web scraping
- **AI Categorization**: Uses Claude AI to intelligently categorize and prioritize headlines
- **Geographic Bucketing**: Organizes news by region (Americas, Europe, Asia-Pacific, Middle East, Africa)
- **Priority Ranking**: Identifies breaking news and high-impact stories
- **Modern Design**: Generates a clean, corporate-quality responsive website
- **Auto-Deploy**: Publishes directly to GitHub Pages

---

## 📦 Step 1: Install Dependencies

In [None]:
%%shell
# Install required packages
pip install feedparser anthropic requests beautifulsoup4 newspaper3k lxml_html_clean python-dateutil pytz --quiet

# For GitHub deployment
pip install PyGithub --quiet

## 🔑 Step 2: Configuration

Set your API keys and preferences here. You can store these in Colab secrets for security.
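
Outside Colab, one option is to read the same secret names from environment variables. The sketch below assumes a hypothetical helper `load_secret` (not part of this notebook) and that the keys are exported under the same names used in Colab secrets:

```python
import os
from typing import Optional


def load_secret(name: str, default: Optional[str] = None) -> Optional[str]:
    """Return a secret from Colab's secret store if available, else the environment.

    Hypothetical helper for illustration; the notebook itself reads the
    values inline in the configuration cell.
    """
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
        value = userdata.get(name)
        if value:
            return value
    except Exception:
        # Not running in Colab, or the secret is missing / access was not granted.
        pass
    return os.environ.get(name, default)


# Falls back to the default when the name is defined nowhere.
print(load_secret("SOME_UNSET_SECRET_NAME", default="not-set"))
```

With a helper like this, the same cell can run in Colab and in a local Jupyter session without editing the key into the notebook.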

In [None]:
import os
from datetime import datetime
import pytz

# =============================================================================
# CONFIGURATION - Edit these values
# =============================================================================

# Option 1: Use Colab secrets (recommended)
try:
    from google.colab import userdata
    ANTHROPIC_API_KEY = userdata.get('ANTHROPIC_API_KEY')
    GITHUB_TOKEN = userdata.get('GITHUB_TOKEN')  # Optional: for auto-deploy
except Exception:
    # Option 2: Set directly (not recommended for production)
    ANTHROPIC_API_KEY = "your-anthropic-api-key-here"
    GITHUB_TOKEN = None  # Optional

# Site Configuration
SITE_TITLE = "GLOBAL SECURITY BRIEFING"
SITE_SUBTITLE = "Real-Time Geopolitics & Defense Intelligence"
TIMEZONE = "US/Eastern"

# GitHub Pages Configuration (optional)
GITHUB_REPO = "ianellisjones/usn"  # Format: username/repo
GITHUB_BRANCH = "gh-pages"
OUTPUT_FILENAME = "index.html"

# AI Settings
MAX_HEADLINES_PER_REGION = 15
PRIORITY_KEYWORDS = [
    "breaking", "urgent", "war", "attack", "invasion", "strike",
    "nuclear", "missile", "troops", "military", "nato", "china",
    "russia", "iran", "north korea", "taiwan", "ukraine"
]

print("✅ Configuration loaded")
print(f"   Site: {SITE_TITLE}")
print(f"   Timezone: {TIMEZONE}")
print(f"   API Key: {'Set' if ANTHROPIC_API_KEY and ANTHROPIC_API_KEY != 'your-anthropic-api-key-here' else '⚠️ NOT SET'}")

## 📰 Step 3: News Sources Database

Curated list of premium geopolitics and defense news sources.
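
Each source entry is a dict with `name`, `url`, and `type` keys, grouped by category. A small sanity check along these lines can catch typos before any network calls are made. `validate_sources` and the sample data are illustrative assumptions, not part of the notebook:

```python
from typing import Dict, List
from urllib.parse import urlparse

REQUIRED_KEYS = {"name", "url", "type"}


def validate_sources(sources: Dict[str, List[dict]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the config looks sane."""
    problems = []
    for category, entries in sources.items():
        for entry in entries:
            missing = REQUIRED_KEYS - set(entry)
            if missing:
                problems.append(f"{category}: entry {entry!r} is missing {sorted(missing)}")
                continue
            parsed = urlparse(entry["url"])
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                problems.append(f"{category}/{entry['name']}: URL does not look valid")
    return problems


# Made-up sample data in the same shape as NEWS_SOURCES.
sample = {
    "defense": [{"name": "Example Feed", "url": "https://example.com/feed", "type": "rss"}],
    "wire": [{"name": "Broken Feed", "url": "example.com/rss"}],
}
print(validate_sources(sample))
```

Running the same check against `NEWS_SOURCES` after editing the list is a cheap way to confirm that every entry still has the fields the fetcher reads.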

In [None]:
# =============================================================================
# NEWS SOURCES - Organized by category
# =============================================================================

NEWS_SOURCES = {
    # === DEFENSE & MILITARY ===
    "defense": [
        {"name": "Defense News", "url": "https://www.defensenews.com/arc/outboundfeeds/rss/?outputType=xml", "type": "rss"},
        {"name": "Breaking Defense", "url": "https://breakingdefense.com/feed/", "type": "rss"},
        {"name": "Defense One", "url": "https://www.defenseone.com/rss/all/", "type": "rss"},
        {"name": "Military Times", "url": "https://www.militarytimes.com/arc/outboundfeeds/rss/?outputType=xml", "type": "rss"},
        {"name": "USNI News", "url": "https://news.usni.org/feed", "type": "rss"},
        {"name": "War on the Rocks", "url": "https://warontherocks.com/feed/", "type": "rss"},
        {"name": "The War Zone", "url": "https://www.thedrive.com/the-war-zone/feed", "type": "rss"},
        {"name": "Naval News", "url": "https://www.navalnews.com/feed/", "type": "rss"},
        {"name": "Air & Space Forces", "url": "https://www.airandspaceforces.com/feed/", "type": "rss"},
        {"name": "Stars and Stripes", "url": "https://www.stripes.com/rss", "type": "rss"},
        {"name": "Janes", "url": "https://www.janes.com/feeds/news", "type": "rss"},
    ],

    # === GEOPOLITICS & FOREIGN POLICY ===
    "geopolitics": [
        {"name": "Foreign Affairs", "url": "https://www.foreignaffairs.com/rss.xml", "type": "rss"},
        {"name": "Foreign Policy", "url": "https://foreignpolicy.com/feed/", "type": "rss"},
        {"name": "The Diplomat", "url": "https://thediplomat.com/feed/", "type": "rss"},
        {"name": "CSIS", "url": "https://www.csis.org/analysis/feed", "type": "rss"},
        {"name": "Brookings", "url": "https://www.brookings.edu/feed/", "type": "rss"},
        {"name": "RAND", "url": "https://www.rand.org/news/press.xml", "type": "rss"},
        {"name": "Carnegie Endowment", "url": "https://carnegieendowment.org/rss/solr/?fa=feeds", "type": "rss"},
        {"name": "Council on Foreign Relations", "url": "https://www.cfr.org/rss/expert-brief", "type": "rss"},
        {"name": "Atlantic Council", "url": "https://www.atlanticcouncil.org/feed/", "type": "rss"},
    ],

    # === WIRE SERVICES & MAJOR NEWS ===
    "wire": [
        {"name": "Reuters World", "url": "https://www.reutersagency.com/feed/?taxonomy=best-topics&post_type=best", "type": "rss"},
        {"name": "AP News", "url": "https://rsshub.app/apnews/topics/world-news", "type": "rss"},
        {"name": "BBC World", "url": "http://feeds.bbci.co.uk/news/world/rss.xml", "type": "rss"},
        {"name": "Al Jazeera", "url": "https://www.aljazeera.com/xml/rss/all.xml", "type": "rss"},
        {"name": "France 24", "url": "https://www.france24.com/en/rss", "type": "rss"},
        {"name": "DW News", "url": "https://rss.dw.com/rdf/rss-en-all", "type": "rss"},
    ],

    # === REGIONAL SPECIALISTS ===
    "regional": [
        {"name": "South China Morning Post", "url": "https://www.scmp.com/rss/91/feed", "type": "rss"},
        {"name": "Nikkei Asia", "url": "https://asia.nikkei.com/rss/feed/nar", "type": "rss"},
        {"name": "The Moscow Times", "url": "https://www.themoscowtimes.com/rss/news", "type": "rss"},
        {"name": "Times of Israel", "url": "https://www.timesofisrael.com/feed/", "type": "rss"},
        {"name": "Middle East Eye", "url": "https://www.middleeasteye.net/rss", "type": "rss"},
        {"name": "Kyiv Independent", "url": "https://kyivindependent.com/feed/", "type": "rss"},
        {"name": "ISW", "url": "https://www.understandingwar.org/rss.xml", "type": "rss"},
    ],

    # === INTELLIGENCE & SECURITY ===
    "intel": [
        {"name": "Bellingcat", "url": "https://www.bellingcat.com/feed/", "type": "rss"},
        {"name": "The Intercept", "url": "https://theintercept.com/feed/?rss", "type": "rss"},
        {"name": "Lawfare", "url": "https://www.lawfaremedia.org/rss.xml", "type": "rss"},
        {"name": "Just Security", "url": "https://www.justsecurity.org/feed/", "type": "rss"},
    ],
}

# Count total sources
total_sources = sum(len(sources) for sources in NEWS_SOURCES.values())
print(f"📰 Loaded {total_sources} news sources across {len(NEWS_SOURCES)} categories")

## 🔄 Step 4: News Fetcher Engine

Core engine for fetching and parsing news from multiple sources.
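
The fetcher keeps only entries published within the last 48 hours, treats timestamps without a timezone as UTC, and keeps entries whose date could not be parsed. A standard-library sketch of that check follows; `is_recent` is a hypothetical name, and it uses `datetime.timezone.utc` where the notebook uses `pytz.UTC`:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_recent(pub_date: Optional[datetime], max_age_hours: int = 48,
              now: Optional[datetime] = None) -> bool:
    """Return True if the entry should be kept.

    Entries with no parseable date are kept, matching the fetcher's behavior.
    """
    if pub_date is None:
        return True
    if pub_date.tzinfo is None:
        # Naive timestamps are assumed to be UTC.
        pub_date = pub_date.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now - pub_date <= timedelta(hours=max_age_hours)


# Fixed reference time so the example is reproducible.
reference = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
print(is_recent(datetime(2024, 1, 9, 12, 0), now=reference))
print(is_recent(datetime(2024, 1, 7, 12, 0, tzinfo=timezone.utc), now=reference))
```

Passing `now` explicitly is only there to make the check testable; in normal use it defaults to the current UTC time.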

In [None]:
import feedparser
import requests
from bs4 import BeautifulSoup
from datetime import datetime, timedelta
from dateutil import parser as date_parser
import time
import re
from typing import List, Dict, Optional
from concurrent.futures import ThreadPoolExecutor, as_completed
import hashlib

class NewsFetcher:
    """Multi-source news aggregation engine."""

    def __init__(self, sources: Dict):
        self.sources = sources
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
            'Accept': 'application/rss+xml, application/xml, text/xml, */*',
        }
        self.session = requests.Session()
        self.session.headers.update(self.headers)

    def fetch_rss(self, source: Dict) -> List[Dict]:
        """Fetch and parse RSS feed."""
        articles = []
        try:
            response = self.session.get(source['url'], timeout=15)
            feed = feedparser.parse(response.content)

            for entry in feed.entries[:20]:  # Limit per source
                # Parse publication date
                pub_date = None
                for date_field in ['published', 'pubDate', 'updated', 'created']:
                    if hasattr(entry, date_field) and getattr(entry, date_field):
                        try:
                            pub_date = date_parser.parse(getattr(entry, date_field))
                            break
                        except (ValueError, OverflowError, TypeError):
                            continue

                # Skip articles older than 48 hours
                if pub_date:
                    if pub_date.tzinfo is None:
                        pub_date = pub_date.replace(tzinfo=pytz.UTC)
                    age = datetime.now(pytz.UTC) - pub_date
                    if age > timedelta(hours=48):
                        continue

                # Clean title
                title = entry.get('title', '').strip()
                title = re.sub(r'\s+', ' ', title)

                # Get description/summary
                description = entry.get('summary', entry.get('description', ''))
                if description:
                    description = BeautifulSoup(description, 'html.parser').get_text()
                    description = re.sub(r'\s+', ' ', description).strip()[:300]

                if title and len(title) > 10:
                    articles.append({
                        'title': title,
                        'url': entry.get('link', ''),
                        'source': source['name'],
                        'published': pub_date,
                        'description': description,
                        'id': hashlib.md5(title.encode()).hexdigest()[:8]
                    })

        except Exception as e:
            print(f"   ⚠️ Error fetching {source['name']}: {str(e)[:50]}")

        return articles

    def fetch_all(self, max_workers: int = 10) -> List[Dict]:
        """Fetch from all sources concurrently."""
        all_articles = []
        all_sources = []

        # Flatten sources
        for category, sources in self.sources.items():
            for source in sources:
                source['category'] = category
                all_sources.append(source)

        print(f"\n🔄 Fetching from {len(all_sources)} sources...\n")

        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            future_to_source = {
                executor.submit(self.fetch_rss, source): source
                for source in all_sources if source['type'] == 'rss'
            }

            for i, future in enumerate(as_completed(future_to_source)):
                source = future_to_source[future]
                try:
                    articles = future.result()
                    if articles:
                        all_articles.extend(articles)
                        print(f"   ✓ {source['name']}: {len(articles)} articles")
                    else:
                        print(f"   ○ {source['name']}: No recent articles")
                except Exception as e:
                    print(f"   ✗ {source['name']}: Failed")

        # Deduplicate by title similarity
        seen_titles = set()
        unique_articles = []
        for article in all_articles:
            title_key = re.sub(r'[^a-z0-9]', '', article['title'].lower())[:50]
            if title_key not in seen_titles:
                seen_titles.add(title_key)
                unique_articles.append(article)

        print(f"\n📊 Total: {len(unique_articles)} unique articles (from {len(all_articles)} raw)")
        return unique_articles

# Initialize fetcher
fetcher = NewsFetcher(NEWS_SOURCES)
print("✅ News Fetcher initialized")
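
Deduplication in `fetch_all` compares a normalized key: the lowercased title with everything except `a-z` and `0-9` removed, truncated to 50 characters. The standalone sketch below, with `title_key` and `dedupe` as illustrative names and made-up headlines, shows which near-duplicates that catches:

```python
import re
from typing import Dict, List


def title_key(title: str, length: int = 50) -> str:
    """Lowercase, drop non-alphanumerics, and truncate, as fetch_all does."""
    return re.sub(r"[^a-z0-9]", "", title.lower())[:length]


def dedupe(articles: List[Dict]) -> List[Dict]:
    """Keep the first article seen for each normalized title key."""
    seen = set()
    unique = []
    for article in articles:
        key = title_key(article["title"])
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique


# Made-up headlines: the first two differ only in case and punctuation.
sample = [
    {"title": "NATO Leaders Meet in Brussels", "source": "A"},
    {"title": "NATO leaders meet in Brussels!", "source": "B"},
    {"title": "Navy Commissions New Destroyer", "source": "C"},
]
print([a["source"] for a in dedupe(sample)])  # ['A', 'C']
```

This is an exact match on the normalized key, so it removes syndicated copies of the same headline but not the same story reworded by another outlet.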

## 🤖 Step 5: AI Categorization Engine

Uses Claude AI to intelligently categorize and prioritize headlines.
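
The categorizer asks the model for a bare JSON array and then pulls that array out of the reply text. A defensive sketch of the parsing step follows, assuming a hypothetical `extract_json_array` helper that returns an empty list for anything malformed; the sample reply is made up:

```python
import json
import re
from typing import Dict, List


def extract_json_array(text: str) -> List[Dict]:
    """Find the outermost [...] span in a model reply and parse it as JSON."""
    match = re.search(r"\[.*\]", text, re.DOTALL)
    if not match:
        return []
    try:
        data = json.loads(match.group())
    except json.JSONDecodeError:
        return []
    # Keep only dict items so callers can safely use .get().
    return [item for item in data if isinstance(item, dict)]


reply = 'Here you go:\n[{"index": 1, "region": "EUROPE", "priority": "HIGH", "topic": "NATO Summit"}]'
print(extract_json_array(reply))
```

Returning an empty list instead of raising lets the caller decide how to label headlines the model did not cover.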

In [None]:
import anthropic
import json

class AICategorizor:
    """AI-powered news categorization and prioritization."""

    REGIONS = [
        "AMERICAS",
        "EUROPE",
        "ASIA_PACIFIC",
        "MIDDLE_EAST",
        "AFRICA",
        "GLOBAL"  # For stories spanning multiple regions
    ]

    PRIORITIES = ["BREAKING", "HIGH", "MEDIUM", "STANDARD"]

    def __init__(self, api_key: str):
        self.client = anthropic.Anthropic(api_key=api_key)

    def categorize_batch(self, articles: List[Dict], batch_size: int = 25) -> List[Dict]:
        """Categorize articles in batches for efficiency."""
        categorized = []

        for i in range(0, len(articles), batch_size):
            batch = articles[i:i+batch_size]
            print(f"   Processing batch {i//batch_size + 1}/{(len(articles)-1)//batch_size + 1}...")

            # Prepare batch for AI
            headlines_text = "\n".join([
                f"{j+1}. [{a['source']}] {a['title']}"
                for j, a in enumerate(batch)
            ])

            prompt = f"""Analyze these geopolitics/defense news headlines and categorize each one.

For each headline, provide:
1. REGION: One of {self.REGIONS}
2. PRIORITY: One of {self.PRIORITIES}
   - BREAKING: Major developing crisis, conflict escalation, significant military action
   - HIGH: Important policy shifts, military movements, diplomatic tensions
   - MEDIUM: Notable developments, analysis pieces
   - STANDARD: General news, routine updates
3. TOPIC: Brief 2-3 word topic tag (e.g., "Ukraine War", "Taiwan Strait", "NATO Expansion")

Headlines:
{headlines_text}

Respond with ONLY a JSON array, no other text:
[
  {{"index": 1, "region": "REGION", "priority": "PRIORITY", "topic": "Topic"}},
  ...
]"""

            try:
                response = self.client.messages.create(
                    model="claude-sonnet-4-20250514",
                    max_tokens=2000,
                    messages=[{"role": "user", "content": prompt}]
                )

                # Parse response
                response_text = response.content[0].text.strip()

                # Extract JSON from response
                json_match = re.search(r'\[.*\]', response_text, re.DOTALL)
                results = json.loads(json_match.group()) if json_match else []

                # Apply the model's labels to the matching headlines
                labeled = set()
                for result in results:
                    idx = result.get('index', 0) - 1
                    if 0 <= idx < len(batch) and idx not in labeled:
                        labeled.add(idx)
                        article = batch[idx].copy()
                        article['region'] = result.get('region', 'GLOBAL')
                        article['priority'] = result.get('priority', 'STANDARD')
                        article['topic'] = result.get('topic', '')
                        categorized.append(article)

                # Headlines the model skipped (or an unparseable reply) get
                # default labels instead of being dropped
                for idx, original in enumerate(batch):
                    if idx not in labeled:
                        article = original.copy()
                        article['region'] = 'GLOBAL'
                        article['priority'] = 'STANDARD'
                        article['topic'] = ''
                        categorized.append(article)

            except Exception as e:
                print(f"   ⚠️ AI error: {e}")
                # Fallback categorization
                for article in batch:
                    article['region'] = self._guess_region(article['title'])
                    article['priority'] = self._guess_priority(article['title'])
                    article['topic'] = ''
                    categorized.append(article)

            time.sleep(0.5)  # Rate limiting

        return categorized

    def _guess_region(self, title: str) -> str:
        """Fallback region detection based on keywords."""
        title_lower = title.lower()
        region_keywords = {
            'AMERICAS': ['us ', 'u.s.', 'america', 'pentagon', 'washington', 'canada', 'mexico', 'brazil', 'venezuela', 'cuba'],
            'EUROPE': ['europe', 'nato', 'eu ', 'ukraine', 'russia', 'uk ', 'britain', 'france', 'germany', 'poland', 'baltic'],
            'ASIA_PACIFIC': ['china', 'taiwan', 'japan', 'korea', 'pacific', 'indo-pacific', 'australia', 'philippines', 'vietnam', 'asean'],
            'MIDDLE_EAST': ['israel', 'iran', 'saudi', 'gaza', 'yemen', 'syria', 'iraq', 'lebanon', 'gulf', 'houthi'],
            'AFRICA': ['africa', 'sahel', 'niger', 'sudan', 'ethiopia', 'libya', 'egypt', 'mali', 'somalia'],
        }
        for region, keywords in region_keywords.items():
            if any(kw in title_lower for kw in keywords):
                return region
        return 'GLOBAL'

    def _guess_priority(self, title: str) -> str:
        """Fallback priority detection based on keywords."""
        title_lower = title.lower()
        if any(kw in title_lower for kw in ['breaking', 'urgent', 'just in', 'developing']):
            return 'BREAKING'
        if any(kw in title_lower for kw in ['attack', 'strike', 'war', 'invasion', 'kills', 'dead', 'troops']):
            return 'HIGH'
        if any(kw in title_lower for kw in ['military', 'defense', 'nuclear', 'missile', 'navy', 'air force']):
            return 'MEDIUM'
        return 'STANDARD'

# Initialize if API key is set
if ANTHROPIC_API_KEY and ANTHROPIC_API_KEY != 'your-anthropic-api-key-here':
    ai_categorizer = AICategorizor(ANTHROPIC_API_KEY)
    print("✅ AI Categorizer initialized with Claude API")
else:
    ai_categorizer = None
    print("⚠️ AI Categorizer not initialized - using fallback keyword matching")
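
The keyword fallback uses plain substring checks, so a short keyword such as `'us '` also matches the tail of longer words (for example "virus "). One possible refinement, shown here as a sketch and not as the notebook's method, is to match on word boundaries with `re`. The keyword lists are trimmed for brevity and `guess_region` is an illustrative standalone function:

```python
import re
from typing import Dict, List

# Trimmed, illustrative keyword lists; the notebook's fallback uses longer ones.
REGION_KEYWORDS: Dict[str, List[str]] = {
    "AMERICAS": ["us", "u.s.", "pentagon", "washington"],
    "EUROPE": ["nato", "eu", "ukraine", "russia"],
    "ASIA_PACIFIC": ["china", "taiwan", "japan"],
}


def guess_region(title: str) -> str:
    """Return the first region whose keyword appears as a whole word."""
    lowered = title.lower()
    for region, keywords in REGION_KEYWORDS.items():
        for kw in keywords:
            # The lookarounds act as word boundaries that also work
            # for keywords ending in punctuation, such as "u.s.".
            if re.search(rf"(?<!\w){re.escape(kw)}(?!\w)", lowered):
                return region
    return "GLOBAL"


print(guess_region("Virus outbreak strains hospitals"))  # GLOBAL
print(guess_region("US Navy deploys carrier"))           # AMERICAS
```

As with the substring version, the first matching region wins, so the order of the dictionary still decides ties for headlines that mention several regions.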

## 🎨 Step 6: Website Generator

Generates a modern, responsive HTML website with the aggregated news.
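
Headlines and URLs come from third-party feeds, so characters such as `<`, `&`, or `"` can break the generated markup if they are interpolated into HTML unescaped. A minimal sketch of escaping them with the standard library's `html.escape` follows; `render_article` is a hypothetical helper, not a method of the generator class:

```python
from html import escape
from typing import Dict


def render_article(article: Dict) -> str:
    """Render one headline as an HTML snippet with all feed text escaped."""
    url = escape(article.get("url") or "#", quote=True)
    title = escape(article.get("title") or "Untitled")
    source = escape(article.get("source") or "Unknown")
    return (
        f'<div class="article">'
        f'<a href="{url}" target="_blank" rel="noopener">{title}</a>'
        f'<span class="article-source">{source}</span>'
        f"</div>"
    )


# Made-up article containing characters that need escaping.
print(render_article({
    "title": "Q&A: <Analysis> of talks",
    "url": "https://example.com/?a=1&b=2",
    "source": "Example",
}))
```

Escaping at render time keeps the stored article data unchanged, so the same dicts can still be passed to other consumers in raw form.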

In [None]:
class WebsiteGenerator:
    """Generates modern, responsive news aggregation website."""

    def __init__(self, site_title: str, site_subtitle: str, timezone: str):
        self.site_title = site_title
        self.site_subtitle = site_subtitle
        self.tz = pytz.timezone(timezone)

    def generate(self, articles: List[Dict]) -> str:
        """Generate complete HTML page."""
        now = datetime.now(self.tz)
        timestamp = now.strftime("%B %d, %Y at %I:%M %p %Z")

        # Organize articles by region and priority
        organized = self._organize_articles(articles)

        # Generate HTML sections
        breaking_html = self._generate_breaking_section(organized.get('BREAKING', []))
        regions_html = self._generate_regions_section(organized)

        html = f'''<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta name="description" content="{self.site_subtitle}">
    <title>{self.site_title}</title>
    <link rel="preconnect" href="https://fonts.googleapis.com">
    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
    <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700;800&family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet">
    <style>
        :root {{
            --bg-primary: #0a0a0b;
            --bg-secondary: #111113;
            --bg-tertiary: #18181b;
            --text-primary: #fafafa;
            --text-secondary: #a1a1aa;
            --text-muted: #71717a;
            --accent-red: #ef4444;
            --accent-orange: #f97316;
            --accent-blue: #3b82f6;
            --accent-green: #22c55e;
            --accent-purple: #a855f7;
            --accent-cyan: #06b6d4;
            --border-color: #27272a;
            --border-hover: #3f3f46;
        }}

        * {{
            margin: 0;
            padding: 0;
            box-sizing: border-box;
        }}

        body {{
            font-family: 'Inter', -apple-system, BlinkMacSystemFont, sans-serif;
            background: var(--bg-primary);
            color: var(--text-primary);
            line-height: 1.6;
            min-height: 100vh;
        }}

        /* Header */
        .header {{
            background: linear-gradient(180deg, var(--bg-secondary) 0%, var(--bg-primary) 100%);
            border-bottom: 1px solid var(--border-color);
            padding: 2rem 1rem;
            text-align: center;
            position: sticky;
            top: 0;
            z-index: 100;
            backdrop-filter: blur(10px);
        }}

        .header h1 {{
            font-size: clamp(1.5rem, 4vw, 2.5rem);
            font-weight: 800;
            letter-spacing: 0.1em;
            background: linear-gradient(135deg, var(--text-primary) 0%, var(--accent-blue) 100%);
            -webkit-background-clip: text;
            -webkit-text-fill-color: transparent;
            background-clip: text;
        }}

        .header .subtitle {{
            color: var(--text-muted);
            font-size: 0.875rem;
            margin-top: 0.5rem;
            font-weight: 500;
        }}

        .header .timestamp {{
            font-family: 'JetBrains Mono', monospace;
            font-size: 0.75rem;
            color: var(--accent-green);
            margin-top: 0.75rem;
            display: flex;
            align-items: center;
            justify-content: center;
            gap: 0.5rem;
        }}

        .header .timestamp::before {{
            content: '';
            width: 8px;
            height: 8px;
            background: var(--accent-green);
            border-radius: 50%;
            animation: pulse 2s infinite;
        }}

        @keyframes pulse {{
            0%, 100% {{ opacity: 1; }}
            50% {{ opacity: 0.5; }}
        }}

        /* Main Content */
        .container {{
            max-width: 1400px;
            margin: 0 auto;
            padding: 1.5rem 1rem;
        }}

        /* Breaking News */
        .breaking {{
            background: linear-gradient(135deg, rgba(239, 68, 68, 0.1) 0%, rgba(239, 68, 68, 0.05) 100%);
            border: 1px solid rgba(239, 68, 68, 0.3);
            border-radius: 12px;
            padding: 1.5rem;
            margin-bottom: 2rem;
        }}

        .breaking-header {{
            display: flex;
            align-items: center;
            gap: 0.75rem;
            margin-bottom: 1rem;
            padding-bottom: 1rem;
            border-bottom: 1px solid rgba(239, 68, 68, 0.2);
        }}

        .breaking-badge {{
            background: var(--accent-red);
            color: white;
            font-size: 0.7rem;
            font-weight: 700;
            padding: 0.25rem 0.75rem;
            border-radius: 4px;
            letter-spacing: 0.05em;
            animation: flash 1.5s infinite;
        }}

        @keyframes flash {{
            0%, 100% {{ opacity: 1; }}
            50% {{ opacity: 0.7; }}
        }}

        .breaking-title {{
            color: var(--accent-red);
            font-size: 0.875rem;
            font-weight: 600;
            letter-spacing: 0.05em;
        }}

        /* Region Grid */
        .regions {{
            display: grid;
            grid-template-columns: repeat(auto-fit, minmax(320px, 1fr));
            gap: 1.5rem;
        }}

        .region {{
            background: var(--bg-secondary);
            border: 1px solid var(--border-color);
            border-radius: 12px;
            overflow: hidden;
            transition: border-color 0.2s ease;
        }}

        .region:hover {{
            border-color: var(--border-hover);
        }}

        .region-header {{
            padding: 1rem 1.25rem;
            border-bottom: 1px solid var(--border-color);
            display: flex;
            align-items: center;
            gap: 0.75rem;
        }}

        .region-icon {{
            font-size: 1.25rem;
        }}

        .region-name {{
            font-weight: 700;
            font-size: 0.875rem;
            letter-spacing: 0.05em;
            text-transform: uppercase;
        }}

        .region-count {{
            margin-left: auto;
            font-family: 'JetBrains Mono', monospace;
            font-size: 0.75rem;
            color: var(--text-muted);
            background: var(--bg-tertiary);
            padding: 0.25rem 0.5rem;
            border-radius: 4px;
        }}

        .region.americas .region-header {{ border-left: 3px solid var(--accent-blue); }}
        .region.europe .region-header {{ border-left: 3px solid var(--accent-purple); }}
        .region.asia-pacific .region-header {{ border-left: 3px solid var(--accent-orange); }}
        .region.middle-east .region-header {{ border-left: 3px solid var(--accent-red); }}
        .region.africa .region-header {{ border-left: 3px solid var(--accent-green); }}
        .region.global .region-header {{ border-left: 3px solid var(--accent-cyan); }}

        /* Article List */
        .articles {{
            padding: 0.5rem 0;
        }}

        .article {{
            padding: 0.875rem 1.25rem;
            border-bottom: 1px solid var(--border-color);
            transition: background 0.2s ease;
        }}

        .article:last-child {{
            border-bottom: none;
        }}

        .article:hover {{
            background: var(--bg-tertiary);
        }}

        .article a {{
            color: var(--text-primary);
            text-decoration: none;
            font-size: 0.9rem;
            font-weight: 500;
            line-height: 1.4;
            display: block;
            transition: color 0.2s ease;
        }}

        .article a:hover {{
            color: var(--accent-blue);
        }}

        .article-meta {{
            display: flex;
            align-items: center;
            gap: 0.75rem;
            margin-top: 0.5rem;
            flex-wrap: wrap;
        }}

        .article-source {{
            font-size: 0.7rem;
            color: var(--text-muted);
            font-weight: 500;
        }}

        .article-time {{
            font-family: 'JetBrains Mono', monospace;
            font-size: 0.65rem;
            color: var(--text-muted);
        }}

        .article-topic {{
            font-size: 0.65rem;
            color: var(--accent-cyan);
            background: rgba(6, 182, 212, 0.1);
            padding: 0.125rem 0.5rem;
            border-radius: 3px;
        }}

        .priority-high .article a {{
            color: var(--accent-orange);
        }}

        /* Footer */
        .footer {{
            text-align: center;
            padding: 2rem 1rem;
            margin-top: 2rem;
            border-top: 1px solid var(--border-color);
            color: var(--text-muted);
            font-size: 0.75rem;
        }}

        .footer a {{
            color: var(--accent-blue);
            text-decoration: none;
        }}

        /* Responsive */
        @media (max-width: 768px) {{
            .regions {{
                grid-template-columns: 1fr;
            }}

            .header {{
                padding: 1.5rem 1rem;
            }}
        }}

        /* Empty State */
        .empty-state {{
            text-align: center;
            padding: 2rem;
            color: var(--text-muted);
        }}
    </style>
</head>
<body>
    <header class="header">
        <h1>{self.site_title}</h1>
        <p class="subtitle">{self.site_subtitle}</p>
        <div class="timestamp">Last Updated: {timestamp}</div>
    </header>

    <main class="container">
        {breaking_html}
        <div class="regions">
            {regions_html}
        </div>
    </main>

    <footer class="footer">
        <p>Powered by AI-driven news aggregation</p>
        <p style="margin-top: 0.5rem;">Headlines sourced from {len(set(a.get("source", "") for a in articles))} publishers</p>
    </footer>
</body>
</html>'''

        return html

    def _organize_articles(self, articles: List[Dict]) -> Dict:
        """Organize articles by region and priority."""
        organized = {'BREAKING': []}
        region_icons = {
            'AMERICAS': '🌎',
            'EUROPE': '🌍',
            'ASIA_PACIFIC': '🌏',
            'MIDDLE_EAST': '🕌',
            'AFRICA': '🌍',
            'GLOBAL': '🌐'
        }

        for region in region_icons.keys():
            organized[region] = []

        for article in articles:
            region = article.get('region', 'GLOBAL')
            priority = article.get('priority', 'STANDARD')

            if priority == 'BREAKING':
                organized['BREAKING'].append(article)

            if region in organized:
                organized[region].append(article)
            else:
                organized['GLOBAL'].append(article)

        # Sort each region by priority then time
        priority_order = {'BREAKING': 0, 'HIGH': 1, 'MEDIUM': 2, 'STANDARD': 3}
        for region in organized:
            organized[region].sort(
                key=lambda x: (
                    priority_order.get(x.get('priority', 'STANDARD'), 3),
                    -(x.get('published') or datetime.min.replace(tzinfo=pytz.UTC)).timestamp()
                )
            )
            # Limit articles per region
            organized[region] = organized[region][:MAX_HEADLINES_PER_REGION]

        return organized

    def _generate_breaking_section(self, articles: List[Dict]) -> str:
        """Generate breaking news section."""
        if not articles:
            return ''

        from html import escape  # feed text is untrusted; escape it before interpolating

        items = ''
        for article in articles[:5]:  # Top 5 breaking
            url = escape(article.get('url') or '#', quote=True)
            title = escape(article.get('title') or 'Untitled')
            source = escape(article.get('source') or 'Unknown')
            topic = escape(str(article.get('topic') or ''))
            items += f'''
            <div class="article">
                <a href="{url}" target="_blank" rel="noopener">{title}</a>
                <div class="article-meta">
                    <span class="article-source">{source}</span>
                    {f'<span class="article-topic">{topic}</span>' if topic else ''}
                </div>
            </div>'''

        return f'''
        <section class="breaking">
            <div class="breaking-header">
                <span class="breaking-badge">BREAKING</span>
                <span class="breaking-title">DEVELOPING STORIES</span>
            </div>
            <div class="articles">
                {items}
            </div>
        </section>'''

    def _generate_regions_section(self, organized: Dict) -> str:
        """Generate regional news sections."""
        region_config = {
            'AMERICAS': {'name': 'Americas', 'icon': '🌎', 'class': 'americas'},
            'EUROPE': {'name': 'Europe', 'icon': '🌍', 'class': 'europe'},
            'ASIA_PACIFIC': {'name': 'Asia-Pacific', 'icon': '🌏', 'class': 'asia-pacific'},
            'MIDDLE_EAST': {'name': 'Middle East', 'icon': '🕌', 'class': 'middle-east'},
            'AFRICA': {'name': 'Africa', 'icon': '🌍', 'class': 'africa'},
            'GLOBAL': {'name': 'Global', 'icon': '🌐', 'class': 'global'},
        }

        from html import escape  # feed text is untrusted; escape it before interpolating

        sections = ''
        for region_key, config in region_config.items():
            articles = organized.get(region_key, [])
            if not articles:
                continue

            items = ''
            for article in articles:
                priority_class = 'priority-high' if article.get('priority') == 'HIGH' else ''
                time_str = ''
                if article.get('published'):
                    try:
                        local_time = article['published'].astimezone(self.tz)
                        time_str = local_time.strftime('%I:%M %p')
                    except Exception:
                        # Leave the time blank if the timestamp cannot be converted
                        pass

                url = escape(article.get('url') or '#', quote=True)
                title = escape(article.get('title') or 'Untitled')
                source = escape(article.get('source') or 'Unknown')
                topic = escape(str(article.get('topic') or ''))
                items += f'''
                <div class="article {priority_class}">
                    <a href="{url}" target="_blank" rel="noopener">{title}</a>
                    <div class="article-meta">
                        <span class="article-source">{source}</span>
                        {f'<span class="article-time">{time_str}</span>' if time_str else ''}
                        {f'<span class="article-topic">{topic}</span>' if topic else ''}
                    </div>
                </div>'''

            sections += f'''
            <section class="region {config['class']}">
                <div class="region-header">
                    <span class="region-icon">{config['icon']}</span>
                    <span class="region-name">{config['name']}</span>
                    <span class="region-count">{len(articles)}</span>
                </div>
                <div class="articles">
                    {items}
                </div>
            </section>'''

        return sections

# Initialize generator
generator = WebsiteGenerator(SITE_TITLE, SITE_SUBTITLE, TIMEZONE)
print("✅ Website Generator initialized")
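
The region markup above interpolates feed-supplied titles and URLs directly into HTML. The following is a minimal sketch, not part of the notebook's `WebsiteGenerator`, of how one headline link could be escaped before rendering. The helper name `render_article_link` and the sample record are hypothetical.

```python
from html import escape


def render_article_link(article: dict) -> str:
    """Render one headline link with every feed-supplied value escaped."""
    url = article.get('url') or '#'
    # Only allow http(s) links; anything else (e.g. javascript:) falls back to '#'
    if not url.lower().startswith(('http://', 'https://')):
        url = '#'
    title = article.get('title') or 'Untitled'
    return (
        f'<a href="{escape(url, quote=True)}" target="_blank" rel="noopener">'
        f'{escape(title)}</a>'
    )


# Made-up record for illustration only
sample = {'url': 'https://example.com/a?x=1&y=2', 'title': 'Talks <stall> at "summit"'}
print(render_article_link(sample))
```

The same `escape` call would apply to the source and topic fields if this approach were adopted.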

## 🚀 Step 7: GitHub Pages Deployment

Automatically deploy the generated website to GitHub Pages.

In [None]:
from github import Github, GithubException

class GitHubDeployer:
    """Deploy generated website to GitHub Pages."""

    def __init__(self, token: str, repo_name: str, branch: str = "gh-pages"):
        self.github = Github(token)
        self.repo = self.github.get_repo(repo_name)
        self.branch = branch

    def deploy(self, html_content: str, filename: str = "index.html") -> str:
        """Deploy HTML file to GitHub Pages. Returns the site URL, or None on failure."""
        try:
            # Check if branch exists
            try:
                self.repo.get_branch(self.branch)
            except GithubException as e:
                if e.status != 404:
                    raise
                # Branch is missing: create it from the default branch
                default_branch = self.repo.default_branch
                source = self.repo.get_branch(default_branch)
                self.repo.create_git_ref(
                    ref=f"refs/heads/{self.branch}",
                    sha=source.commit.sha
                )
                print(f"   Created branch: {self.branch}")

            # Look up the file first, so that a failed update is never
            # mistaken for a missing file
            timestamp = datetime.now().strftime('%Y-%m-%d %H:%M')
            try:
                file = self.repo.get_contents(filename, ref=self.branch)
            except GithubException as e:
                if e.status != 404:
                    raise
                file = None

            if file is not None:
                # Update existing file
                self.repo.update_file(
                    path=filename,
                    message=f"Update {filename} - {timestamp}",
                    content=html_content,
                    sha=file.sha,
                    branch=self.branch
                )
                print(f"   ✓ Updated {filename}")
            else:
                # Create new file
                self.repo.create_file(
                    path=filename,
                    message=f"Create {filename} - {timestamp}",
                    content=html_content,
                    branch=self.branch
                )
                print(f"   ✓ Created {filename}")

            # Return the GitHub Pages URL
            owner = self.repo.owner.login
            repo_name = self.repo.name
            return f"https://{owner}.github.io/{repo_name}/"

        except Exception as e:
            print(f"   ⚠️ Deploy error: {e}")
            return None

# Initialize deployer if token is available
if GITHUB_TOKEN:
    deployer = GitHubDeployer(GITHUB_TOKEN, GITHUB_REPO, GITHUB_BRANCH)
    print("✅ GitHub Deployer initialized")
else:
    deployer = None
    print("⚠️ GitHub Deployer not initialized - will save locally only")
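
The deployer returns a project-site URL of the form `https://<owner>.github.io/<repo>/`. As a small sketch of that rule, the hypothetical helper `pages_url` below also covers the case where the repository itself is named `<owner>.github.io` and is therefore served from the domain root. It assumes no custom domain is configured.

```python
def pages_url(owner: str, repo_name: str) -> str:
    """Return the default GitHub Pages URL for a repository."""
    # A repository named <owner>.github.io is served from the domain root
    if repo_name.lower() == f"{owner}.github.io".lower():
        return f"https://{owner}.github.io/"
    # Any other repository is served as a project site under /<repo_name>/
    return f"https://{owner}.github.io/{repo_name}/"


print(pages_url("ianellisjones", "usn"))
print(pages_url("ianellisjones", "ianellisjones.github.io"))
```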

## ▶️ Step 8: Run the Aggregator

Execute the complete pipeline: Fetch → Categorize → Generate → Deploy

In [None]:
def run_aggregator(deploy: bool = True, save_local: bool = True):
    """
    Run the complete news aggregation pipeline.

    Args:
        deploy: Whether to deploy to GitHub Pages
        save_local: Whether to save HTML file locally

    Returns:
        A (html, categorized) tuple, or (None, []) if no articles were fetched.
    """
    print("="*70)
    print("🌍 GEOPOLITICS NEWS AGGREGATOR")
    print(f"   Started: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print("="*70)

    # Step 1: Fetch news
    print("\n📥 STEP 1: Fetching News...")
    articles = fetcher.fetch_all()

    if not articles:
        print("\n❌ No articles fetched. Check your internet connection.")
        # Return a pair so callers that unpack two values do not fail
        return None, []

    # Step 2: AI Categorization
    print("\n🤖 STEP 2: AI Categorization...")
    if ai_categorizer:
        categorized = ai_categorizer.categorize_batch(articles)
    else:
        # The keyword helpers (_guess_region, _guess_priority) are methods of
        # the categorizer, so without one every article gets the defaults
        print("   No categorizer available; assigning default region and priority...")
        categorized = []
        for article in articles:
            article['region'] = 'GLOBAL'
            article['priority'] = 'STANDARD'
            article['topic'] = ''
            categorized.append(article)

    # Print summary
    print("\n📊 Categorization Summary:")
    regions_count = {}
    priorities_count = {}
    for a in categorized:
        r = a.get('region', 'GLOBAL')
        p = a.get('priority', 'STANDARD')
        regions_count[r] = regions_count.get(r, 0) + 1
        priorities_count[p] = priorities_count.get(p, 0) + 1

    print(f"   By Region: {regions_count}")
    print(f"   By Priority: {priorities_count}")

    # Step 3: Generate HTML
    print("\n🎨 STEP 3: Generating Website...")
    html = generator.generate(categorized)
    print(f"   Generated {len(html):,} characters of HTML")

    # Save locally (reported as part of step 3)
    if save_local:
        local_path = OUTPUT_FILENAME
        with open(local_path, 'w', encoding='utf-8') as f:
            f.write(html)
        print(f"   ✓ Saved to: {local_path}")

    # Step 4: Deploy to GitHub Pages
    if not deploy:
        print("\n⚠️ Skipping GitHub deployment (deploy=False)")
    elif not deployer:
        print("\n⚠️ Skipping GitHub deployment (no token configured)")
    else:
        print("\n🚀 STEP 4: Deploying to GitHub Pages...")
        url = deployer.deploy(html, OUTPUT_FILENAME)
        if url:
            print("\n✅ DEPLOYMENT SUCCESSFUL!")
            print(f"   🌐 Live at: {url}")

    # Final summary
    print("\n" + "="*70)
    print("✅ AGGREGATION COMPLETE!")
    print(f"   Total headlines: {len(categorized)}")
    print("="*70)

    return html, categorized

# Run the aggregator
html_output, articles_data = run_aggregator(deploy=False, save_local=True)
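
The categorization summary inside `run_aggregator` tallies regions and priorities with plain dictionaries. As an illustrative alternative, the sketch below produces the same counts with `collections.Counter`, using the same `GLOBAL` and `STANDARD` defaults. The function name `summarize` and the sample records are hypothetical.

```python
from collections import Counter


def summarize(categorized):
    """Count articles per region and per priority, using the pipeline's defaults."""
    regions = Counter(a.get('region', 'GLOBAL') for a in categorized)
    priorities = Counter(a.get('priority', 'STANDARD') for a in categorized)
    return regions, priorities


# Made-up records for illustration only
sample = [
    {'title': 'A', 'region': 'EUROPE', 'priority': 'HIGH'},
    {'title': 'B', 'region': 'EUROPE'},
    {'title': 'C'},
]
regions, priorities = summarize(sample)
print(regions.most_common())
print(dict(priorities))
```

`Counter.most_common()` also returns the regions sorted by volume, which can be handy when reading the summary.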

## 👀 Step 9: Preview the Website

Display the generated website directly in Colab.

In [None]:
from html import escape
from IPython.display import HTML, display

# Display in an iframe
if html_output:
    # srcdoc is an attribute value, so escape &, <, > and both quote
    # characters rather than only the double quote
    display(HTML(f'''
    <div style="border: 1px solid #333; border-radius: 8px; overflow: hidden; margin: 20px 0;">
        <iframe srcdoc="{escape(html_output, quote=True)}"
                style="width: 100%; height: 800px; border: none;"
                sandbox="allow-same-origin allow-scripts allow-popups allow-forms">
        </iframe>
    </div>
    '''))
    print("👆 Preview above. Scroll to explore all sections.")

## ⏰ Step 10: Schedule Automatic Updates (Optional)

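The cell below defines a simple loop that reruns the pipeline at a fixed interval. It only runs while the Colab session stays connected, so use Colab's scheduling or an external service for unattended updates.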

In [None]:
import time
from datetime import timedelta  # needed for the "next run" timestamp below

def scheduled_run(interval_minutes: int = 60, max_runs: int = 24):
    """
    Run the aggregator on a schedule.

    Args:
        interval_minutes: Time between updates
        max_runs: Maximum number of runs before stopping
    """
    print(f"🕐 Starting scheduled runs every {interval_minutes} minutes")
    print(f"   Max runs: {max_runs}")
    print("   Use Runtime > Interrupt execution to stop\n")

    for i in range(max_runs):
        print(f"\n{'='*50}")
        print(f"RUN {i+1}/{max_runs}")
        print(f"{'='*50}")

        try:
            run_aggregator(deploy=True, save_local=True)
        except Exception as e:
            print(f"\n❌ Error in run {i+1}: {e}")

        if i < max_runs - 1:
            next_run = datetime.now() + timedelta(minutes=interval_minutes)
            print(f"\n⏰ Next run at: {next_run.strftime('%H:%M:%S')}")
            time.sleep(interval_minutes * 60)

    print("\n✅ Scheduled runs complete!")

# Uncomment to run on schedule:
# scheduled_run(interval_minutes=60, max_runs=24)
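
Because `scheduled_run` sleeps for the full interval after each run finishes, the time a run takes is added to every cycle and the schedule slowly drifts. One way to avoid that, sketched here with the hypothetical helpers `planned_run_times` and `seconds_until`, is to compute fixed target times up front and sleep only for the time remaining until the next one.

```python
from datetime import datetime, timedelta


def planned_run_times(start: datetime, interval_minutes: int, max_runs: int) -> list:
    """Return the fixed target time of every run, beginning at `start`."""
    step = timedelta(minutes=interval_minutes)
    return [start + i * step for i in range(max_runs)]


def seconds_until(target: datetime, now: datetime) -> float:
    """Seconds to sleep before `target`; zero if the target has already passed."""
    return max(0.0, (target - now).total_seconds())


# Example: hourly runs starting at 08:00, where the first run took 12 minutes
start = datetime(2025, 1, 1, 8, 0)
times = planned_run_times(start, interval_minutes=60, max_runs=3)
finished_first_run = datetime(2025, 1, 1, 8, 12)
print(seconds_until(times[1], finished_first_run))  # 48 minutes -> 2880.0
```

In a loop, `time.sleep(seconds_until(times[i + 1], datetime.now()))` would replace the fixed `time.sleep(interval_minutes * 60)`.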

## 📥 Step 11: Download the HTML File

In [None]:
try:
    from google.colab import files
    files.download(OUTPUT_FILENAME)
    print(f"✅ Downloaded: {OUTPUT_FILENAME}")
except Exception:
    # Outside Colab the google.colab import fails; point to the local copy
    print(f"📁 File saved locally: {OUTPUT_FILENAME}")
    print("   (Download manually if not in Colab)")

---

## 📖 Usage Guide

### Quick Start
1. Set your `ANTHROPIC_API_KEY` in Colab secrets
2. Run all cells (Runtime > Run all)
3. Preview your site in Step 9
4. Download in Step 11

### GitHub Pages Deployment
1. Create a GitHub Personal Access Token with `repo` scope
2. Add as `GITHUB_TOKEN` in Colab secrets
3. Update `GITHUB_REPO` with your repository
4. Enable GitHub Pages in repo settings (source: `gh-pages` branch)
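
A common slip in step 3 is pasting the full repository URL instead of the `owner/name` form that PyGithub's `get_repo` expects for a repository name. The check below is a permissive sketch for catching that before deploying. The helper name `looks_like_repo_name` is hypothetical, and the pattern is an assumption that is looser than GitHub's actual naming rules.

```python
import re

# Permissive pattern: exactly one slash, with letters, digits, '-', '_' or '.'
# on each side. It will accept some names GitHub itself would reject.
_REPO_PATTERN = re.compile(r"[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")


def looks_like_repo_name(value: str) -> bool:
    """Return True if `value` has the owner/name shape."""
    return _REPO_PATTERN.fullmatch(value) is not None


print(looks_like_repo_name("ianellisjones/usn"))
print(looks_like_repo_name("https://github.com/ianellisjones/usn"))
```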

### Customization
- Edit `NEWS_SOURCES` to add/remove news sources
- Modify `WebsiteGenerator` CSS for different themes
- Adjust `PRIORITY_KEYWORDS` for different focus areas

### Tips
- Without an Anthropic API key, the system uses keyword-based categorization (less accurate)
- Run during off-peak hours for faster RSS fetching
- Use scheduled runs for continuous updates
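
To illustrate what keyword-based categorization means in the first tip, here is a minimal sketch. It is not the notebook's own implementation: the keyword tables `SAMPLE_REGION_KEYWORDS` and `SAMPLE_PRIORITY_KEYWORDS` and both function names are hypothetical, and only the region and priority labels are taken from the notebook.

```python
import re

# Hypothetical keyword tables, for illustration only
SAMPLE_REGION_KEYWORDS = {
    'EUROPE': ['ukraine', 'nato', 'russia', 'eu'],
    'ASIA_PACIFIC': ['china', 'taiwan', 'japan', 'korea'],
    'MIDDLE_EAST': ['israel', 'iran', 'gaza', 'yemen'],
    'AFRICA': ['sudan', 'sahel', 'nigeria', 'ethiopia'],
    'AMERICAS': ['pentagon', 'mexico', 'venezuela', 'brazil'],
}
SAMPLE_PRIORITY_KEYWORDS = ['breaking', 'strike', 'attack', 'invasion']


def _words(title: str) -> set:
    # Whole-word matching, so that 'eu' does not match inside 'reuters'
    return set(re.findall(r"[a-z]+", title.lower()))


def guess_region(title: str) -> str:
    """Return the first region whose keywords appear in the title, else GLOBAL."""
    words = _words(title)
    for region, keywords in SAMPLE_REGION_KEYWORDS.items():
        if words & set(keywords):
            return region
    return 'GLOBAL'


def guess_priority(title: str) -> str:
    """Return HIGH if any priority keyword appears in the title, else STANDARD."""
    return 'HIGH' if _words(title) & set(SAMPLE_PRIORITY_KEYWORDS) else 'STANDARD'


print(guess_region("Taiwan reports new incursions"), guess_priority("Missile strike hits port"))
```

A headline that names two regions is assigned to whichever appears first in the table, which is one reason this approach is less accurate than AI categorization.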

---

*Built with ❤️ for geopolitics enthusiasts*