<a href="https://colab.research.google.com/github/ianellisjones/usn/blob/main/Geopolitics_News_Aggregator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# IEJ Intel Brief - News Aggregator

**Drudge-Style AI-Powered Geopolitics Intelligence**

Dense, single-page news aggregation focused on geopolitics, defense, and global security.

### Layout (3x3 Grid)
```
+---------+---------+---------+
|  GLOBE  |   US    | EUROPE  |
+---------+---------+---------+
|  ASIA   |   IEJ   | MIDEAST |
+---------+---------+---------+
|CONFLICTS| CYBER   | FINANCE |
+---------+---------+---------+
```

### Sections
- **Globe**: International diplomacy, global institutions
- **US**: Pentagon, Washington, domestic policy  
- **Europe**: NATO, EU, UK, Russia, Ukraine
- **Asia**: China, Taiwan, Japan, Korea, Indo-Pacific
- **Middle East**: Israel, Iran, Gaza, Yemen, Syria
- **Conflicts**: Active warfare, strikes, combat
- **Cybersecurity**: Cyber attacks, hacking, digital warfare
- **Finance**: Defense contracts, sanctions, arms deals

---

## 📦 Step 1: Install Dependencies

In [None]:
%%shell
# Install required packages
pip install feedparser anthropic requests beautifulsoup4 newspaper3k lxml_html_clean python-dateutil pytz --quiet

# For GitHub deployment
pip install PyGithub --quiet

## 🔑 Step 2: Configuration

Set your API keys and preferences here. You can store these in Colab secrets for security.
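
For runs outside Colab, one option is to fall back to environment variables rather than hard-coding keys. The sketch below is a minimal illustration of that idea; the helper name `load_secret` and the secret names in the usage lines are hypothetical and not part of this notebook.

```python
import os
from typing import Optional


def load_secret(name: str, default: Optional[str] = None) -> Optional[str]:
    """Return a secret from Colab if available, otherwise from the environment.

    `load_secret` is a hypothetical helper used only for illustration.
    """
    try:
        # google.colab can only be imported inside a Colab runtime
        from google.colab import userdata
        value = userdata.get(name)
        if value:
            return value
    except Exception:
        # Not running in Colab, or the secret is missing or not shared
        pass
    return os.environ.get(name, default)


# Usage with made-up secret names
os.environ["EXAMPLE_SECRET"] = "demo-value"
print(load_secret("EXAMPLE_SECRET"))
print(load_secret("EXAMPLE_MISSING_SECRET", default="fallback"))
```

Reading each secret through one helper also means a missing optional secret cannot affect a required one.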

In [None]:
import os
from datetime import datetime
import pytz

# =============================================================================
# CONFIGURATION - Edit these values
# =============================================================================

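# Option 2 (fallback): set directly (not recommended for production)
ANTHROPIC_API_KEY = "your-anthropic-api-key-here"
GITHUB_TOKEN = None  # Optional: for auto-deploy

# Option 1: Use Colab secrets (recommended). The fallback values above stay
# in place if a lookup fails, so a missing optional GITHUB_TOKEN does not
# discard an ANTHROPIC_API_KEY that was already read.
try:
    from google.colab import userdata
    ANTHROPIC_API_KEY = userdata.get('ANTHROPIC_API_KEY') or ANTHROPIC_API_KEY
    GITHUB_TOKEN = userdata.get('GITHUB_TOKEN')
except Exception:
    pass  # Not in Colab, or a secret is missing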

# Site Configuration
SITE_TITLE = "IEJ"
SITE_SUBTITLE = "Global Intelligence Briefing"
TIMEZONE = "US/Eastern"

# GitHub Pages Configuration (optional)
GITHUB_REPO = "ianellisjones/usn"  # Format: username/repo
GITHUB_BRANCH = "gh-pages"
OUTPUT_FILENAME = "index.html"

# Layout Settings - Dense like Drudge
HEADLINES_PER_SECTION = 8  # 8 sections x 8 = up to 64 headlines
TOTAL_HEADLINE_TARGET = 50

print(f"Configuration loaded")
print(f"   Site: {SITE_TITLE}")
print(f"   Timezone: {TIMEZONE}")
print(f"   API Key: {'Set' if ANTHROPIC_API_KEY and ANTHROPIC_API_KEY != 'your-anthropic-api-key-here' else 'NOT SET'}")

## 📰 Step 3: News Sources Database

Curated list of premium geopolitics and defense news sources.
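
Each entry in `NEWS_SOURCES` is a dict with `name`, `url`, and `type` keys. When adding sources by hand, a small check like the sketch below can catch malformed entries before any network calls are made. The function `validate_sources` and the "Broken Entry" record are hypothetical and used only for illustration.

```python
from typing import Dict, List
from urllib.parse import urlparse

REQUIRED_KEYS = {"name", "url", "type"}


def validate_sources(sources: Dict[str, List[dict]]) -> List[str]:
    """Return human-readable problems; an empty list means every entry looks fine."""
    problems = []
    for category, entries in sources.items():
        for entry in entries:
            missing = REQUIRED_KEYS - set(entry)
            if missing:
                problems.append(
                    f"{category}: {entry.get('name', '?')} is missing {sorted(missing)}"
                )
                continue
            # Only http(s) feeds can be fetched with requests
            if urlparse(entry["url"]).scheme not in ("http", "https"):
                problems.append(f"{category}: {entry['name']} has a non-HTTP URL")
    return problems


example_sources = {
    "defense": [
        {"name": "USNI News", "url": "https://news.usni.org/feed", "type": "rss"},
        {"name": "Broken Entry", "url": "news.example.org/feed"},  # hypothetical bad entry
    ],
}
print(validate_sources(example_sources))
```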

In [None]:
# =============================================================================
# NEWS SOURCES - Organized by category
# =============================================================================

NEWS_SOURCES = {
    # === DEFENSE & MILITARY ===
    "defense": [
        {"name": "Defense News", "url": "https://www.defensenews.com/arc/outboundfeeds/rss/?outputType=xml", "type": "rss"},
        {"name": "Breaking Defense", "url": "https://breakingdefense.com/feed/", "type": "rss"},
        {"name": "Defense One", "url": "https://www.defenseone.com/rss/all/", "type": "rss"},
        {"name": "Military Times", "url": "https://www.militarytimes.com/arc/outboundfeeds/rss/?outputType=xml", "type": "rss"},
        {"name": "USNI News", "url": "https://news.usni.org/feed", "type": "rss"},
        {"name": "War on the Rocks", "url": "https://warontherocks.com/feed/", "type": "rss"},
        {"name": "The War Zone", "url": "https://www.thedrive.com/the-war-zone/feed", "type": "rss"},
        {"name": "Naval News", "url": "https://www.navalnews.com/feed/", "type": "rss"},
        {"name": "Air & Space Forces", "url": "https://www.airandspaceforces.com/feed/", "type": "rss"},
        {"name": "Stars and Stripes", "url": "https://www.stripes.com/rss", "type": "rss"},
        {"name": "Janes", "url": "https://www.janes.com/feeds/news", "type": "rss"},
    ],

    # === GEOPOLITICS & FOREIGN POLICY ===
    "geopolitics": [
        {"name": "Foreign Affairs", "url": "https://www.foreignaffairs.com/rss.xml", "type": "rss"},
        {"name": "Foreign Policy", "url": "https://foreignpolicy.com/feed/", "type": "rss"},
        {"name": "The Diplomat", "url": "https://thediplomat.com/feed/", "type": "rss"},
        {"name": "CSIS", "url": "https://www.csis.org/analysis/feed", "type": "rss"},
        {"name": "Brookings", "url": "https://www.brookings.edu/feed/", "type": "rss"},
        {"name": "RAND", "url": "https://www.rand.org/news/press.xml", "type": "rss"},
        {"name": "Carnegie Endowment", "url": "https://carnegieendowment.org/rss/solr/?fa=feeds", "type": "rss"},
        {"name": "Council on Foreign Relations", "url": "https://www.cfr.org/rss/expert-brief", "type": "rss"},
        {"name": "Atlantic Council", "url": "https://www.atlanticcouncil.org/feed/", "type": "rss"},
    ],

    # === WIRE SERVICES & MAJOR NEWS ===
    "wire": [
        {"name": "Reuters World", "url": "https://www.reutersagency.com/feed/?taxonomy=best-topics&post_type=best", "type": "rss"},
        {"name": "AP News", "url": "https://rsshub.app/apnews/topics/world-news", "type": "rss"},
        {"name": "BBC World", "url": "http://feeds.bbci.co.uk/news/world/rss.xml", "type": "rss"},
        {"name": "Al Jazeera", "url": "https://www.aljazeera.com/xml/rss/all.xml", "type": "rss"},
        {"name": "France 24", "url": "https://www.france24.com/en/rss", "type": "rss"},
        {"name": "DW News", "url": "https://rss.dw.com/rdf/rss-en-all", "type": "rss"},
    ],

    # === REGIONAL SPECIALISTS ===
    "regional": [
        {"name": "South China Morning Post", "url": "https://www.scmp.com/rss/91/feed", "type": "rss"},
        {"name": "Nikkei Asia", "url": "https://asia.nikkei.com/rss/feed/nar", "type": "rss"},
        {"name": "The Moscow Times", "url": "https://www.themoscowtimes.com/rss/news", "type": "rss"},
        {"name": "Times of Israel", "url": "https://www.timesofisrael.com/feed/", "type": "rss"},
        {"name": "Middle East Eye", "url": "https://www.middleeasteye.net/rss", "type": "rss"},
        {"name": "Kyiv Independent", "url": "https://kyivindependent.com/feed/", "type": "rss"},
        {"name": "ISW", "url": "https://www.understandingwar.org/rss.xml", "type": "rss"},
    ],

    # === INTELLIGENCE & SECURITY ===
    "intel": [
        {"name": "Bellingcat", "url": "https://www.bellingcat.com/feed/", "type": "rss"},
        {"name": "The Intercept", "url": "https://theintercept.com/feed/?rss", "type": "rss"},
        {"name": "Lawfare", "url": "https://www.lawfaremedia.org/rss.xml", "type": "rss"},
        {"name": "Just Security", "url": "https://www.justsecurity.org/feed/", "type": "rss"},
    ],
}

# Count total sources
total_sources = sum(len(sources) for sources in NEWS_SOURCES.values())
print(f"📰 Loaded {total_sources} news sources across {len(NEWS_SOURCES)} categories")

## 🔄 Step 4: News Fetcher Engine

Core engine for fetching and parsing news from multiple sources.
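
Two rules in the fetcher are easy to check in isolation: dated articles older than 48 hours are skipped, and duplicates are detected by comparing a normalized prefix of the title. The sketch below restates both as standalone functions. The names `title_key` and `is_fresh` are illustrative and do not appear in the notebook.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional


def title_key(title: str, length: int = 50) -> str:
    """Lowercase the title, keep only a-z and 0-9, and truncate."""
    return re.sub(r"[^a-z0-9]", "", title.lower())[:length]


def is_fresh(published: Optional[datetime], now: datetime, max_age_hours: int = 48) -> bool:
    """Undated articles are kept; dated ones must fall inside the window."""
    if published is None:
        return True
    if published.tzinfo is None:
        # Treat naive timestamps as UTC, as the fetcher does
        published = published.replace(tzinfo=timezone.utc)
    return now - published <= timedelta(hours=max_age_hours)


now = datetime(2025, 1, 3, 12, 0, tzinfo=timezone.utc)

# Punctuation and case differences collapse to the same key
same = title_key("NATO Chief: 'More Spending Needed'") == title_key("NATO chief - more spending needed")
print(same)

print(is_fresh(datetime(2025, 1, 2, 12, 0), now))                         # 24 hours old
print(is_fresh(datetime(2024, 12, 30, 12, 0, tzinfo=timezone.utc), now))  # 4 days old
```

Because the key is an exact match on a prefix, two outlets that word the same story differently are not treated as duplicates.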

In [None]:
import feedparser
import requests
from bs4 import BeautifulSoup
from datetime import datetime, timedelta
from dateutil import parser as date_parser
import time
import re
from typing import List, Dict, Optional
from concurrent.futures import ThreadPoolExecutor, as_completed
import hashlib

class NewsFetcher:
    """Multi-source news aggregation engine."""

    def __init__(self, sources: Dict):
        self.sources = sources
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
            'Accept': 'application/rss+xml, application/xml, text/xml, */*',
        }
        self.session = requests.Session()
        self.session.headers.update(self.headers)

    def fetch_rss(self, source: Dict) -> List[Dict]:
        """Fetch and parse RSS feed."""
        articles = []
        try:
            response = self.session.get(source['url'], timeout=15)
            feed = feedparser.parse(response.content)

            for entry in feed.entries[:20]:  # Limit per source
                # Parse publication date
                pub_date = None
                for date_field in ['published', 'pubDate', 'updated', 'created']:
                    if hasattr(entry, date_field) and getattr(entry, date_field):
                        try:
                            pub_date = date_parser.parse(getattr(entry, date_field))
                            break
                        except (ValueError, OverflowError, TypeError):
                            continue

                # Skip articles older than 48 hours
                if pub_date:
                    if pub_date.tzinfo is None:
                        pub_date = pub_date.replace(tzinfo=pytz.UTC)
                    age = datetime.now(pytz.UTC) - pub_date
                    if age > timedelta(hours=48):
                        continue

                # Clean title
                title = entry.get('title', '').strip()
                title = re.sub(r'\s+', ' ', title)

                # Get description/summary
                description = entry.get('summary', entry.get('description', ''))
                if description:
                    description = BeautifulSoup(description, 'html.parser').get_text()
                    description = re.sub(r'\s+', ' ', description).strip()[:300]

                if title and len(title) > 10:
                    articles.append({
                        'title': title,
                        'url': entry.get('link', ''),
                        'source': source['name'],
                        'published': pub_date,
                        'description': description,
                        'id': hashlib.md5(title.encode()).hexdigest()[:8]
                    })

        except Exception as e:
            print(f"   ⚠️ Error fetching {source['name']}: {str(e)[:50]}")

        return articles

    def fetch_all(self, max_workers: int = 10) -> List[Dict]:
        """Fetch from all sources concurrently."""
        all_articles = []
        all_sources = []

        # Flatten sources
        for category, sources in self.sources.items():
            for source in sources:
                source['category'] = category
                all_sources.append(source)

        print(f"\n🔄 Fetching from {len(all_sources)} sources...\n")

        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            future_to_source = {
                executor.submit(self.fetch_rss, source): source
                for source in all_sources if source['type'] == 'rss'
            }

            for future in as_completed(future_to_source):
                source = future_to_source[future]
                try:
                    articles = future.result()
                    if articles:
                        all_articles.extend(articles)
                        print(f"   ✓ {source['name']}: {len(articles)} articles")
                    else:
                        print(f"   ○ {source['name']}: No recent articles")
                except Exception as e:
                    print(f"   ✗ {source['name']}: Failed ({str(e)[:50]})")

        # Deduplicate on a normalized title prefix (exact match on the first 50 alphanumeric characters)
        seen_titles = set()
        unique_articles = []
        for article in all_articles:
            title_key = re.sub(r'[^a-z0-9]', '', article['title'].lower())[:50]
            if title_key not in seen_titles:
                seen_titles.add(title_key)
                unique_articles.append(article)

        print(f"\n📊 Total: {len(unique_articles)} unique articles (from {len(all_articles)} raw)")
        return unique_articles

# Initialize fetcher
fetcher = NewsFetcher(NEWS_SOURCES)
print("✅ News Fetcher initialized")

## 🤖 Step 5: AI Categorization Engine

Uses Claude AI to intelligently categorize and prioritize headlines.
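
The categorizer asks the model for a JSON array and then maps each item back to a headline by its 1-based index. The sketch below shows that parsing step on its own, with no API call. The function `parse_labels` and the sample reply are assumptions made for illustration, and the sketch additionally rejects section names that are not in the known list.

```python
import json
import re
from typing import Dict

VALID_SECTIONS = {
    "GLOBE", "US", "EUROPE", "ASIA",
    "MIDDLE_EAST", "CONFLICTS", "CYBERSECURITY", "FINANCE",
}


def parse_labels(response_text: str, batch_len: int) -> Dict[int, str]:
    """Map zero-based headline positions to sections found in a model reply."""
    # Grab the outermost [...] span so surrounding chatter is ignored
    match = re.search(r"\[.*\]", response_text, re.DOTALL)
    if not match:
        return {}
    try:
        items = json.loads(match.group())
    except json.JSONDecodeError:
        return {}

    labels = {}
    for item in items:
        if not isinstance(item, dict):
            continue
        index = item.get("index")
        section = item.get("section")
        if not isinstance(index, int):
            continue
        position = index - 1  # the prompt numbers headlines from 1
        if 0 <= position < batch_len and section in VALID_SECTIONS:
            labels[position] = section
    return labels


# Hypothetical reply: one valid label and one unknown section
reply = (
    "Here you go:\n"
    '[{"index": 1, "section": "EUROPE", "priority": "HIGH"}, '
    '{"index": 2, "section": "SPACE", "priority": "STANDARD"}]'
)
print(parse_labels(reply, batch_len=2))
```

Headlines absent from the returned mapping can then be routed through the keyword fallback instead of being lost.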

In [None]:
import anthropic
import json

class AICategorizor:
    """AI-powered news categorization and prioritization."""

    # 8 sections for the 3x3 grid (center is IEJ logo)
    SECTIONS = [
        "GLOBE",         # Global/international stories
        "US",            # United States
        "EUROPE",        # Europe including Russia/Ukraine
        "ASIA",          # Asia-Pacific region
        "MIDDLE_EAST",   # Middle East
        "CONFLICTS",     # Active conflicts, military action
        "CYBERSECURITY", # Cyber attacks, hacking, digital warfare
        "FINANCE",       # Defense industry, sanctions, economic warfare
    ]

    PRIORITIES = ["BREAKING", "HIGH", "STANDARD"]

    def __init__(self, api_key: str):
        self.client = anthropic.Anthropic(api_key=api_key)

    def categorize_batch(self, articles: List[Dict], batch_size: int = 30) -> List[Dict]:
        """Categorize articles in batches for efficiency."""
        categorized = []

        for i in range(0, len(articles), batch_size):
            batch = articles[i:i+batch_size]
            print(f"   Processing batch {i//batch_size + 1}/{(len(articles)-1)//batch_size + 1}...")

            headlines_text = "\n".join([
                f"{j+1}. [{a['source']}] {a['title']}"
                for j, a in enumerate(batch)
            ])

            prompt = f"""Categorize these geopolitics/defense headlines into sections.

SECTIONS (pick ONE per headline):
- GLOBE: International diplomacy, global institutions, multi-region stories, UN, treaties
- US: United States domestic, Pentagon, Washington policy, Congress, military branches
- EUROPE: European nations, NATO, EU, UK, Russia, Ukraine conflict, Balkans
- ASIA: China, Taiwan, Japan, Korea, Indo-Pacific, ASEAN, Australia, India
- MIDDLE_EAST: Israel, Iran, Gaza, Yemen, Syria, Iraq, Gulf states, Lebanon
- CONFLICTS: Active warfare, military strikes, battles, combat operations, casualties
- CYBERSECURITY: Cyber attacks, hacking, data breaches, digital warfare, ransomware, espionage
- FINANCE: Defense contracts, sanctions, economic warfare, arms deals, military spending, trade wars

PRIORITY:
- BREAKING: Major developing events, escalations
- HIGH: Important developments  
- STANDARD: Regular news

Headlines:
{headlines_text}

Return ONLY a JSON array:
[{{"index": 1, "section": "SECTION", "priority": "PRIORITY"}}, ...]"""

            try:
                response = self.client.messages.create(
                    model="claude-sonnet-4-20250514",
                    max_tokens=2000,
                    messages=[{"role": "user", "content": prompt}]
                )

                response_text = response.content[0].text.strip()
                json_match = re.search(r'\[.*\]', response_text, re.DOTALL)
                
                if json_match:
                    results = json.loads(json_match.group())
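                    for result in results:
                        idx = result.get('index', 0) - 1
                        if 0 <= idx < len(batch):
                            article = batch[idx].copy()
                            section = result.get('section', 'GLOBE')
                            priority = result.get('priority', 'STANDARD')
                            # Unknown section labels would otherwise be dropped at render time
                            article['section'] = section if section in self.SECTIONS else 'GLOBE'
                            article['priority'] = priority if priority in self.PRIORITIES else 'STANDARD'
                            categorized.append(article)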
                else:
                    for article in batch:
                        article['section'] = self._guess_section(article['title'])
                        article['priority'] = self._guess_priority(article['title'])
                        categorized.append(article)

            except Exception as e:
                print(f"   AI error: {e}")
                for article in batch:
                    article['section'] = self._guess_section(article['title'])
                    article['priority'] = self._guess_priority(article['title'])
                    categorized.append(article)

            time.sleep(0.3)

        return categorized

    def _guess_section(self, title: str) -> str:
        """Fallback section detection based on keywords."""
        t = title.lower()
        
        # Cybersecurity
        if any(kw in t for kw in ['cyber', 'hack', 'breach', 'ransomware', 'malware', 'phishing', 'data leak', 'encryption', 'zero-day', 'vulnerability', 'apt', 'espionage']):
            return 'CYBERSECURITY'
        
        # Finance/Economic
        if any(kw in t for kw in ['sanction', 'contract', 'billion', 'million', 'defense budget', 'arms deal', 'lockheed', 'raytheon', 'boeing', 'northrop', 'trade war', 'tariff', 'economic']):
            return 'FINANCE'
        
        # Conflicts (active combat)
        if any(kw in t for kw in ['strike', 'attack', 'combat', 'battle', 'offensive', 'airstrike', 'missile strike', 'killed', 'casualties', 'wounded', 'bombing', 'shelling']):
            return 'CONFLICTS'
        
        # Regional matching
        if any(kw in t for kw in ['pentagon', 'washington', 'congress', 'u.s.', 'us ', 'american', 'biden', 'trump', 'white house', 'state department', 'cia', 'fbi']):
            return 'US'
        if any(kw in t for kw in ['china', 'taiwan', 'japan', 'korea', 'beijing', 'tokyo', 'pacific', 'indo-pacific', 'asean', 'philippines', 'vietnam', 'australia', 'india', 'modi']):
            return 'ASIA'
        if any(kw in t for kw in ['europe', 'nato', 'eu ', 'ukraine', 'russia', 'moscow', 'kyiv', 'britain', 'uk ', 'france', 'germany', 'poland', 'baltic', 'putin', 'zelensky']):
            return 'EUROPE'
        if any(kw in t for kw in ['israel', 'iran', 'gaza', 'hamas', 'hezbollah', 'yemen', 'houthi', 'syria', 'iraq', 'saudi', 'gulf', 'lebanon', 'tehran', 'netanyahu']):
            return 'MIDDLE_EAST'
        
        return 'GLOBE'

    def _guess_priority(self, title: str) -> str:
        """Fallback priority detection."""
        t = title.lower()
        if any(kw in t for kw in ['breaking', 'urgent', 'just in', 'developing', 'live:']):
            return 'BREAKING'
        if any(kw in t for kw in ['attack', 'strike', 'kills', 'dead', 'explosion', 'invasion', 'launches']):
            return 'HIGH'
        return 'STANDARD'

# Initialize
if ANTHROPIC_API_KEY and ANTHROPIC_API_KEY != 'your-anthropic-api-key-here':
    ai_categorizer = AICategorizor(ANTHROPIC_API_KEY)
    print("AI Categorizer initialized with Claude API")
else:
    ai_categorizer = AICategorizor("dummy")
    print("AI Categorizer using keyword fallback (no API key)")

## 🎨 Step 6: Website Generator

Generates a modern, responsive HTML website with the aggregated news.
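
Within each grid cell, headlines are ordered by priority (BREAKING, then HIGH, then STANDARD) and by recency within a priority, and each cell is capped at a fixed count. The sketch below reproduces that ordering with two stable sorts. The function `top_headlines` and the sample data are illustrative; unlike the notebook, the sketch does not restrict sections to the eight known names.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List

PRIORITY_ORDER = {"BREAKING": 0, "HIGH": 1, "STANDARD": 2}
OLDEST = datetime.min.replace(tzinfo=timezone.utc)  # stand-in for undated articles


def top_headlines(articles: List[dict], per_section: int = 8) -> Dict[str, List[dict]]:
    """Group articles by section, best first, keeping at most `per_section` each."""
    # Sort newest first, then stable-sort by priority so recency breaks ties
    by_recency = sorted(articles, key=lambda a: a.get("published") or OLDEST, reverse=True)
    ranked = sorted(by_recency, key=lambda a: PRIORITY_ORDER.get(a.get("priority"), 2))

    sections = defaultdict(list)
    for article in ranked:
        bucket = sections[article.get("section", "GLOBE")]
        if len(bucket) < per_section:
            bucket.append(article)
    return dict(sections)


sample = [
    {"title": "A", "section": "ASIA", "priority": "STANDARD",
     "published": datetime(2025, 1, 2, tzinfo=timezone.utc)},
    {"title": "B", "section": "ASIA", "priority": "BREAKING",
     "published": datetime(2025, 1, 1, tzinfo=timezone.utc)},
    {"title": "C", "section": "ASIA", "priority": "STANDARD", "published": None},
]
print([a["title"] for a in top_headlines(sample, per_section=2)["ASIA"]])
```

The older BREAKING item outranks the newer STANDARD one, and the undated item falls off the end once the cap is reached.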

In [None]:
class WebsiteGenerator:
    """Generates Drudge-style dark theme news site with IEJ branding in center."""

    def __init__(self, site_title: str, site_subtitle: str, timezone: str):
        self.site_title = site_title
        self.site_subtitle = site_subtitle
        self.tz = pytz.timezone(timezone)

    def generate(self, articles: List[Dict]) -> str:
        """Generate dark theme HTML page with 3x3 grid, IEJ in center."""
        now = datetime.now(self.tz)
        date_display = now.strftime("%B %d, %Y")
        time_display = now.strftime("%I:%M %p %Z")

        # Organize articles by section
        sections = self._organize_by_section(articles)

        html = f'''<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta name="description" content="{self.site_subtitle}">
    <title>{self.site_title} - {self.site_subtitle}</title>
    <link rel="preconnect" href="https://fonts.googleapis.com">
    <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600&display=swap" rel="stylesheet">
    <style>
        * {{ margin: 0; padding: 0; box-sizing: border-box; }}
        
        body {{
            font-family: 'Inter', Arial, sans-serif;
            font-size: 13px;
            line-height: 1.35;
            background: #0d1117;
            color: #c9d1d9;
            min-height: 100vh;
        }}

        /* Main grid - 3x3 with IEJ in center */
        .main-grid {{
            display: grid;
            grid-template-columns: 1fr 1fr 1fr;
            min-height: 100vh;
        }}

        /* Section styling */
        .section {{
            border-right: 1px solid #21262d;
            border-bottom: 1px solid #21262d;
            display: flex;
            flex-direction: column;
        }}

        .section:nth-child(3n) {{
            border-right: none;
        }}

        .section:nth-child(7),
        .section:nth-child(8),
        .section:nth-child(9) {{
            border-bottom: none;
        }}

        .section-header {{
            background: #161b22;
            color: #8b949e;
            padding: 10px 12px;
            font-size: 11px;
            font-weight: 600;
            text-transform: uppercase;
            letter-spacing: 1.5px;
            border-bottom: 1px solid #21262d;
        }}

        /* Center IEJ cell */
        .center-cell {{
            display: flex;
            flex-direction: column;
            align-items: center;
            justify-content: center;
            background: linear-gradient(135deg, #0d1117 0%, #161b22 50%, #0d1117 100%);
            border-right: 1px solid #21262d;
            border-bottom: 1px solid #21262d;
        }}

        .center-cell .logo {{
            font-family: 'Inter', Arial, sans-serif;
            font-size: 56px;
            font-weight: 600;
            color: #58a6ff;
            letter-spacing: 8px;
            text-shadow: 0 0 30px rgba(88, 166, 255, 0.3);
        }}

        .center-cell .tagline {{
            font-size: 10px;
            color: #8b949e;
            text-transform: uppercase;
            letter-spacing: 3px;
            margin-top: 8px;
        }}

        .center-cell .date {{
            font-size: 11px;
            color: #58a6ff;
            margin-top: 12px;
            font-weight: 500;
        }}

        .center-cell .time {{
            font-size: 10px;
            color: #6e7681;
            margin-top: 4px;
        }}

        /* Headlines list */
        .headlines {{
            padding: 10px;
            flex: 1;
            overflow-y: auto;
        }}

        .headline {{
            margin-bottom: 10px;
            padding-bottom: 10px;
            border-bottom: 1px solid #21262d;
        }}

        .headline:last-child {{
            border-bottom: none;
            margin-bottom: 0;
            padding-bottom: 0;
        }}

        .headline a {{
            color: #c9d1d9;
            text-decoration: none;
            font-size: 13px;
            line-height: 1.4;
            display: block;
            transition: color 0.15s ease;
        }}

        .headline a:hover {{
            color: #58a6ff;
        }}

        .headline.breaking a {{
            color: #f85149;
            font-weight: 500;
        }}

        .headline.high a {{
            color: #d29922;
        }}

        .headline .src {{
            font-size: 10px;
            color: #6e7681;
            text-transform: uppercase;
            margin-top: 3px;
            letter-spacing: 0.5px;
        }}

        /* Footer */
        .footer {{
            grid-column: 1 / -1;
            text-align: center;
            padding: 12px;
            background: #161b22;
            border-top: 1px solid #21262d;
            font-size: 10px;
            color: #6e7681;
        }}

        /* Responsive */
        @media (max-width: 900px) {{
            .main-grid {{
                grid-template-columns: 1fr;
            }}
            .section {{
                border-right: none;
            }}
            .center-cell .logo {{
                font-size: 42px;
            }}
        }}

        /* Scrollbar styling */
        .headlines::-webkit-scrollbar {{
            width: 6px;
        }}
        .headlines::-webkit-scrollbar-track {{
            background: #0d1117;
        }}
        .headlines::-webkit-scrollbar-thumb {{
            background: #30363d;
            border-radius: 3px;
        }}
    </style>
</head>
<body>
    <div class="main-grid">
        <!-- Row 1: Globe | US | Europe -->
        {self._generate_section('GLOBE', sections.get('GLOBE', []))}
        {self._generate_section('US', sections.get('US', []))}
        {self._generate_section('EUROPE', sections.get('EUROPE', []))}
        
        <!-- Row 2: Asia | IEJ Center | Middle East -->
        {self._generate_section('ASIA', sections.get('ASIA', []))}
        
        <div class="center-cell">
            <div class="logo">IEJ</div>
            <div class="tagline">Intel Brief</div>
            <div class="date">{date_display}</div>
            <div class="time">{time_display}</div>
        </div>
        
        {self._generate_section('MIDDLE EAST', sections.get('MIDDLE_EAST', []))}
        
        <!-- Row 3: Conflicts | Cybersecurity | Finance -->
        {self._generate_section('CONFLICTS', sections.get('CONFLICTS', []))}
        {self._generate_section('CYBERSECURITY', sections.get('CYBERSECURITY', []))}
        {self._generate_section('FINANCE', sections.get('FINANCE', []))}
    </div>

    <footer class="footer">
        {len(set(a.get('source', '') for a in articles))} Sources | AI-Aggregated | {time_display}
    </footer>
</body>
</html>'''

        return html

    def _organize_by_section(self, articles: List[Dict]) -> Dict:
        """Organize articles by section."""
        sections = {s: [] for s in ['GLOBE', 'US', 'EUROPE', 'ASIA', 'MIDDLE_EAST', 'CONFLICTS', 'CYBERSECURITY', 'FINANCE']}
        
        priority_order = {'BREAKING': 0, 'HIGH': 1, 'STANDARD': 2}
        sorted_articles = sorted(
            articles,
            key=lambda x: (
                priority_order.get(x.get('priority', 'STANDARD'), 2),
                -(x.get('published') or datetime.min.replace(tzinfo=pytz.UTC)).timestamp()
            )
        )
        
        for article in sorted_articles:
            section = article.get('section', 'GLOBE')
            if section in sections and len(sections[section]) < HEADLINES_PER_SECTION:
                sections[section].append(article)
        
        return sections

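    def _generate_section(self, title: str, articles: List[Dict]) -> str:
        """Generate a section with headlines."""
        from html import escape  # local import; avoids clashing with variables named `html`

        headlines_html = ''
        for article in articles:
            priority_class = ''
            if article.get('priority') == 'BREAKING':
                priority_class = 'breaking'
            elif article.get('priority') == 'HIGH':
                priority_class = 'high'

            # Escape feed-supplied text so stray <, > or quotes cannot break the markup
            url = escape(article.get('url') or '#', quote=True)
            headline = escape(article.get('title', 'Untitled'))
            source = escape(article.get('source', ''))

            headlines_html += f'''
            <div class="headline {priority_class}">
                <a href="{url}" target="_blank" rel="noopener noreferrer">{headline}</a>
                <div class="src">{source}</div>
            </div>'''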
        
        if not headlines_html:
            headlines_html = '<div class="headline"><span style="color:#6e7681;">No recent headlines</span></div>'
        
        return f'''
        <div class="section">
            <div class="section-header">{title}</div>
            <div class="headlines">{headlines_html}</div>
        </div>'''

# Initialize generator
generator = WebsiteGenerator(SITE_TITLE, SITE_SUBTITLE, TIMEZONE)
print("Website Generator initialized - Dark theme 3x3 grid")

## 🚀 Step 7: GitHub Pages Deployment

Automatically deploy the generated website to GitHub Pages.
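
After a successful deploy, the site is served from a URL derived from the repository name. The sketch below shows that mapping for the default `github.io` domain, assuming no custom domain is configured. The function `pages_url` is a hypothetical helper, not part of PyGithub or this notebook.

```python
def pages_url(full_name: str) -> str:
    """Return the default GitHub Pages URL for an 'owner/repo' string."""
    owner, _, repo = full_name.partition("/")
    if not owner or not repo:
        raise ValueError("expected a name in 'owner/repo' form")
    # A repository named <owner>.github.io is served from the domain root
    if repo.lower() == f"{owner.lower()}.github.io":
        return f"https://{repo.lower()}/"
    # Any other repository is served from a path under the owner's domain
    return f"https://{owner}.github.io/{repo}/"


print(pages_url("ianellisjones/usn"))
```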

In [None]:
from github import Github
import base64

class GitHubDeployer:
    """Deploy generated website to GitHub Pages."""

    def __init__(self, token: str, repo_name: str, branch: str = "gh-pages"):
        self.github = Github(token)
        self.repo = self.github.get_repo(repo_name)
        self.branch = branch

    def deploy(self, html_content: str, filename: str = "index.html") -> Optional[str]:
        """Deploy HTML file to GitHub Pages."""
        try:
            # Check if branch exists
            try:
                self.repo.get_branch(self.branch)
            except Exception:
                # Create branch from main/master
                default_branch = self.repo.default_branch
                source = self.repo.get_branch(default_branch)
                self.repo.create_git_ref(
                    ref=f"refs/heads/{self.branch}",
                    sha=source.commit.sha
                )
                print(f"   Created branch: {self.branch}")

            # Check whether the file already exists on the branch
            try:
                file = self.repo.get_contents(filename, ref=self.branch)
            except Exception:
                file = None  # Treat a failed lookup as "file not found"

            timestamp = datetime.now().strftime('%Y-%m-%d %H:%M')
            if file is not None:
                # Update existing file; a failure here is reported, not retried as a create
                self.repo.update_file(
                    path=filename,
                    message=f"Update {filename} - {timestamp}",
                    content=html_content,
                    sha=file.sha,
                    branch=self.branch
                )
                print(f"   ✓ Updated {filename}")
            else:
                # Create new file
                self.repo.create_file(
                    path=filename,
                    message=f"Create {filename} - {timestamp}",
                    content=html_content,
                    branch=self.branch
                )
                print(f"   ✓ Created {filename}")

            # Return the GitHub Pages URL
            owner = self.repo.owner.login
            repo_name = self.repo.name
            return f"https://{owner}.github.io/{repo_name}/"

        except Exception as e:
            print(f"   ⚠️ Deploy error: {e}")
            return None

# Initialize deployer if token is available
if GITHUB_TOKEN:
    deployer = GitHubDeployer(GITHUB_TOKEN, GITHUB_REPO, GITHUB_BRANCH)
    print("✅ GitHub Deployer initialized")
else:
    deployer = None
    print("⚠️ GitHub Deployer not initialized - will save locally only")

## ▶️ Step 8: Run the Aggregator

Execute the complete pipeline: Fetch → Categorize → Generate → Deploy

In [None]:
def run_aggregator(deploy: bool = True, save_local: bool = True):
    """
    Run the complete news aggregation pipeline.

    Args:
        deploy: Whether to deploy to GitHub Pages
        save_local: Whether to save HTML file locally
    """
    print("="*60)
    print("IEJ GEOPOLITICS NEWS AGGREGATOR")
    print(f"Started: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print("="*60)

    # Step 1: Fetch news
    print("\n[1/4] Fetching News...")
    articles = fetcher.fetch_all()

    if not articles:
        print("\nNo articles fetched. Check your internet connection.")
        return None, None

    # Step 2: AI Categorization
    print("\n[2/4] Categorizing Headlines...")
    if ANTHROPIC_API_KEY and ANTHROPIC_API_KEY != 'your-anthropic-api-key-here':
        categorized = ai_categorizer.categorize_batch(articles)
    else:
        print("   Using keyword-based categorization...")
        categorized = []
        for article in articles:
            article['section'] = ai_categorizer._guess_section(article['title'])
            article['priority'] = ai_categorizer._guess_priority(article['title'])
            categorized.append(article)

    # Print summary
    print("\nCategorization Summary:")
    sections_count = {}
    for a in categorized:
        s = a.get('section', 'GLOBE')
        sections_count[s] = sections_count.get(s, 0) + 1
    
    for section, count in sorted(sections_count.items()):
        print(f"   {section}: {count}")

    # Step 3: Generate HTML
    print("\n[3/4] Generating Website...")
    html = generator.generate(categorized)
    print(f"   Generated {len(html):,} bytes")

    # Step 4: Save locally
    if save_local:
        with open(OUTPUT_FILENAME, 'w', encoding='utf-8') as f:
            f.write(html)
        print(f"   Saved to: {OUTPUT_FILENAME}")

    # Step 5: Deploy to GitHub Pages
    if deploy and deployer:
        print("\n[4/4] Deploying to GitHub Pages...")
        url = deployer.deploy(html, OUTPUT_FILENAME)
        if url:
            print(f"\nDEPLOYMENT SUCCESSFUL!")
            print(f"Live at: {url}")
    else:
        print("\n[4/4] Skipping deployment (disabled or no GitHub token)")

    print("\n" + "="*60)
    print(f"COMPLETE - {len(categorized)} headlines aggregated")
    print("="*60)

    return html, categorized

# Run the aggregator
html_output, articles_data = run_aggregator(deploy=False, save_local=True)

## 👀 Step 9: Preview the Website

Display the generated website directly in Colab.

In [None]:
from html import escape
from IPython.display import HTML, display

# Display in an iframe
if html_output:
    # Escape the whole page (including & and quotes) so it survives as an attribute value
    display(HTML(f'''
    <div style="border: 1px solid #333; border-radius: 8px; overflow: hidden; margin: 20px 0;">
        <iframe srcdoc="{escape(html_output, quote=True)}"
                style="width: 100%; height: 800px; border: none;"
                sandbox="allow-same-origin allow-scripts allow-popups allow-forms">
        </iframe>
    </div>
    '''))
    print("👆 Preview above. Scroll to explore all sections.")

## ⏰ Step 10: Schedule Automatic Updates (Optional)

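The helper below reruns the full pipeline at a fixed interval for as long as the Colab session stays connected. For updates that must continue after the session ends, trigger the notebook from an external scheduling service instead.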
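
The scheduling pattern here is a plain loop: run the task, record failures without stopping, and sleep between runs but not after the last one. The sketch below isolates that pattern with an injectable `sleep` so it can be exercised without waiting. The names `run_on_interval` and `flaky_task` are hypothetical.

```python
import time
from typing import Callable, List


def run_on_interval(
    task: Callable[[], None],
    interval_seconds: float,
    max_runs: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call `task` up to `max_runs` times, pausing between runs."""
    outcomes = []
    for i in range(max_runs):
        try:
            task()
            outcomes.append("ok")
        except Exception as exc:
            # One failed run should not cancel the remaining runs
            outcomes.append(f"error: {exc}")
        if i < max_runs - 1:
            sleep(interval_seconds)
    return outcomes


calls = []


def flaky_task() -> None:
    """Hypothetical task that fails on its second call."""
    calls.append(len(calls))
    if len(calls) == 2:
        raise RuntimeError("feed timeout")


naps = []  # records requested pauses instead of actually sleeping
print(run_on_interval(flaky_task, interval_seconds=3600, max_runs=3, sleep=naps.append))
print(naps)
```

Passing `naps.append` in place of `time.sleep` makes the loop finish immediately while still showing how many pauses were requested.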

In [None]:
import time

def scheduled_run(interval_minutes: int = 60, max_runs: int = 24):
    """
    Run the aggregator on a schedule.

    Args:
        interval_minutes: Time between updates
        max_runs: Maximum number of runs before stopping
    """
    print(f"🕐 Starting scheduled runs every {interval_minutes} minutes")
    print(f"   Max runs: {max_runs}")
    print(f"   Press Runtime > Interrupt to stop\n")

    for i in range(max_runs):
        print(f"\n{'='*50}")
        print(f"RUN {i+1}/{max_runs}")
        print(f"{'='*50}")

        try:
            run_aggregator(deploy=True, save_local=True)
        except Exception as e:
            print(f"\n❌ Error in run {i+1}: {e}")

        if i < max_runs - 1:
            next_run = datetime.now() + timedelta(minutes=interval_minutes)
            print(f"\n⏰ Next run at: {next_run.strftime('%H:%M:%S')}")
            time.sleep(interval_minutes * 60)

    print("\n✅ Scheduled runs complete!")

# Uncomment to run on schedule:
# scheduled_run(interval_minutes=60, max_runs=24)

## 📥 Step 11: Download the HTML File

In [None]:
try:
    from google.colab import files
    files.download(OUTPUT_FILENAME)
    print(f"✅ Downloaded: {OUTPUT_FILENAME}")
except Exception:
    # Not running in Colab, or the browser download could not be started
    print(f"📁 File saved locally: {OUTPUT_FILENAME}")
    print("   (Download manually if not in Colab)")

---

## 📖 Usage Guide

### Quick Start
1. Set your `ANTHROPIC_API_KEY` in Colab secrets
2. Run all cells (Runtime > Run all)
3. Preview your site in Step 9
4. Download in Step 11

### GitHub Pages Deployment
1. Create a GitHub Personal Access Token with `repo` scope
2. Add as `GITHUB_TOKEN` in Colab secrets
3. Update `GITHUB_REPO` with your repository
4. Enable GitHub Pages in repo settings (source: `gh-pages` branch)

### Customization
- Edit `NEWS_SOURCES` to add/remove news sources
- Modify `WebsiteGenerator` CSS for different themes
- Adjust the keyword lists in `AICategorizor._guess_section` and `_guess_priority` for different focus areas
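
When tuning the fallback keywords, note that the notebook matches them as plain substrings, so a short keyword such as `apt` also matches "captain" or "adapts". One possible alternative is whole-word matching with a regular expression, sketched below. The function `contains_keyword` is a hypothetical helper, not something the notebook defines.

```python
import re
from typing import Iterable


def contains_keyword(title: str, keywords: Iterable[str]) -> bool:
    """True if any keyword appears as a whole word or phrase, ignoring case."""
    escaped = [re.escape(kw) for kw in keywords]
    if not escaped:
        return False
    # \b anchors each alternative at word boundaries
    pattern = r"\b(?:" + "|".join(escaped) + r")\b"
    return re.search(pattern, title, re.IGNORECASE) is not None


print(contains_keyword("Captain adapts to new role", ["apt", "us"]))
print(contains_keyword("APT group targets US agencies", ["apt", "us"]))
```

The trade-off is that whole-word matching no longer catches inflected forms, so `strike` would not match "strikes" unless both are listed. Keywords that end in punctuation, such as `u.s.`, also need separate handling because `\b` does not match after a trailing period followed by a space.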

### Tips
- Without an Anthropic API key, the system uses keyword-based categorization (less accurate)
- Run during off-peak hours for faster RSS fetching
- Use scheduled runs for continuous updates

---

*Built with ❤️ for geopolitics enthusiasts*