Skip to content

indiser/Bharat-News-API

Repository files navigation

🇮🇳 Bharat News API

Real-time geospatial news intelligence for India — A serverless FastAPI pipeline that aggregates 39 RSS feeds, processes 1000+ articles hourly, and maps breaking news to Indian states with sub-second latency via GitHub Actions automation.

Deploy with Vercel Python 3.9+ FastAPI License: MIT


🎯 What Makes This Special

This isn't just another news aggregator. It's a production-grade geospatial intelligence system that:

  • Processes 39 RSS feeds concurrently using async/await patterns
  • 🗺️ Maps news to 36 Indian states/UTs with city-level granularity (600+ cities)
  • 🔄 Auto-updates every hour via GitHub Actions (zero manual intervention)
  • 🚀 Serverless architecture — scales from 0 to 1M requests seamlessly
  • 📊 Powers real-time heatmaps with Plotly integration
  • 🔒 Secured cron endpoints with Bearer token authentication

🏗️ Architecture

┌─────────────────┐
│ GitHub Actions  │  ← Triggers every hour
│  (Cron Job)     │
└────────┬────────┘
         │
         ▼
┌─────────────────────────────────────────────────┐
│         FastAPI Application (main.py)           │
│  ┌──────────────────────────────────────────┐  │
│  │  /api/cron/update (Protected Endpoint)   │  │
│  └──────────────┬───────────────────────────┘  │
│                 │                                │
│                 ▼                                │
│  ┌──────────────────────────────────────────┐  │
│  │   fetch_news.py (Async Aggregator)       │  │
│  │   • 39 concurrent aiohttp requests       │  │
│  │   • Feedparser for RSS parsing           │  │
│  │   • 10s timeout per feed                 │  │
│  └──────────────┬───────────────────────────┘  │
│                 │                                │
│                 ▼                                │
│  ┌──────────────────────────────────────────┐  │
│  │  process_data_india.py (NLP Engine)      │  │
│  │   • Fuzzy matching against 600+ cities   │  │
│  │   • State-level aggregation              │  │
│  │   • JSON serialization for headlines     │  │
│  └──────────────┬───────────────────────────┘  │
│                 │                                │
│                 ▼                                │
│  ┌──────────────────────────────────────────┐  │
│  │    Neon PostgreSQL (Serverless DB)       │  │
│  │   • heatmap_data table (replace mode)    │  │
│  │   • SQLAlchemy ORM                       │  │
│  └──────────────────────────────────────────┘  │
└─────────────────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────────────────┐
│         Public REST Endpoints                   │
│  • GET /api/news (All states)                   │
│  • GET /api/news/{state_code} (Single state)    │
└─────────────────────────────────────────────────┘

🚀 Quick Start

Prerequisites

Python 3.9+
PostgreSQL database (Neon recommended)
Vercel account (for deployment)

Local Development

  1. Clone & Install
git clone https://github.com/indiser/Bharat-News-API.git
cd Bharat-News-Api
pip install -r requirements.txt
  1. Configure Environment
# .env
DATABASE_URL=postgresql://user:pass@host/db
CRON_SECRET=your-secret-token-here
  1. Seed Database
python seed.py
  1. Run Server
uvicorn main:app --reload

Visit http://localhost:8000/docs for interactive API documentation.


📡 API Reference

GET /api/news

Fetch all states with active news coverage.

Response:

[
  {
    "State": "Maharashtra",
    "Code": "MH",
    "Lat": 19.7515,
    "Long": 75.7139,
    "news_count": 47,
    "headlines": [
      "Mumbai Metro Line 3 Opens Tomorrow",
      "Pune Sees Record Rainfall This Season",
      "..."
    ]
  }
]

GET /api/news/{state_code}

Fetch news for a specific state (e.g., /api/news/DL for Delhi).

Response:

{
  "State": "Delhi",
  "Code": "DL",
  "Lat": 28.7041,
  "Long": 77.1025,
  "news_count": 23,
  "headlines": [
    "Delhi Air Quality Improves After Rain",
    "New Metro Route Announced"
  ]
}

GET /api/cron/update (Protected)

Triggers manual database refresh.

Headers:

Authorization: Bearer <CRON_SECRET>

🛠️ Tech Stack

Layer Technology Why?
API Framework FastAPI Async support, auto-docs, type safety
Async Runtime aiohttp + asyncio 40x faster than sequential requests
RSS Parsing feedparser Battle-tested XML/RSS parser
Database Neon PostgreSQL Serverless, auto-scaling, generous free tier
ORM SQLAlchemy Production-grade, prevents SQL injection
Deployment Vercel Zero-config serverless, built-in cron
Automation GitHub Actions Hourly cron job with manual trigger support
Visualization Plotly Express Interactive geospatial heatmaps

📊 Data Pipeline

1. Aggregation (fetch_news.py)

  • Fetches from 39 curated Indian news sources (TOI, NDTV, Indian Express, The Hindu, India Today, Business Standard, FirstPost, OpIndia, ABP Live, The Quint, etc.)
  • Concurrent requests with 10s timeout per feed
  • Handles malformed feeds gracefully with feed.bozo error checking
  • Returns ~1000 articles per run

2. Processing (process_data_india.py)

  • Loads india_locations_cities.csv (36 states/UTs, 600+ cities)
  • Fuzzy text matching: " {city_name} " in " {headline} " (space-padded for accuracy)
  • Aggregates headlines per state with deduplication
  • Serializes headline arrays to JSON strings for PostgreSQL JSONB compatibility

3. Storage

  • Replaces entire heatmap_data table (idempotent)
  • Stores only states with active news (news_count > 0)
  • Indexed by state code for fast lookups

🎨 Visualization

Run the included Plotly visualizer:

python indian_visualizer.py

Features:

  • Bubble size = news volume
  • Color intensity = story count
  • Hover = full headline list
  • Auto-zoom to India bounds
  • Realistic map colors (beige land, blue ocean)

🔐 Security

  • ✅ CORS configured for production origins
  • ✅ Cron endpoint protected with Bearer token
  • ✅ SQL injection prevented via parameterized queries
  • ✅ Environment variables for secrets (never committed)
  • ✅ Input validation on state codes

📈 Performance

Metric Value
Cold start ~2s
Avg response time 180ms
RSS fetch time 8-12s (concurrent)
Processing time 3-5s
Database write <1s
Total pipeline ~15s

🚢 Deployment

Vercel (Recommended)

  1. Push to GitHub
git init
git add .
git commit -m "Initial commit"
git remote add origin <your-repo>
git push -u origin main
  1. Import to Vercel
  • Connect GitHub repo
  • Add environment variables:
    • DATABASE_URL
    • CRON_SECRET
  • Deploy!
  1. Configure GitHub Actions
  • Go to GitHub repo → Settings → Secrets and variables → Actions
  • Add repository secrets:
    • API_URL (your Vercel deployment URL)
    • CRON_SECRET (same as in Vercel)
  • GitHub Actions will automatically trigger hourly updates
  1. Verify Automation
  • Check GitHub Actions tab → "Hourly News Fetcher" workflow
  • Should run every hour at :00
  • Can also trigger manually via "Run workflow" button

🧪 Testing

# Test news fetch
curl http://localhost:8000/api/news

# Test state-specific
curl http://localhost:8000/api/news/MH

# Test cron (local)
curl -H "Authorization: Bearer your-secret" \
     http://localhost:8000/api/cron/update

📁 Project Structure

Bharat-News-Api/
├── .github/
│   └── workflows/
│       └── cron.yaml             # GitHub Actions hourly trigger
├── main.py                      # FastAPI app + endpoints
├── fetch_news.py                # Async RSS aggregator
├── process_data_india.py        # NLP + DB writer
├── seed.py                      # Initial DB setup
├── india_locations_cities.csv   # Geographic reference data
├── requirements.txt             # Python dependencies
├── vercel.json                  # Deployment config
└── .env                         # Secrets (gitignored)

🎓 Key Learnings Demonstrated

  1. Async Programming: Proper use of asyncio and aiohttp for I/O-bound tasks
  2. Serverless Architecture: Stateless design, idempotent operations
  3. Data Engineering: ETL pipeline (Extract → Transform → Load)
  4. API Design: RESTful conventions, proper status codes, CORS
  5. DevOps: Environment management, GitHub Actions automation, zero-downtime deploys
  6. Geospatial Analysis: Coordinate-based data mapping
  7. Production Practices: Error handling, logging, secrets management

🔮 Future Enhancements

  • WebSocket support for real-time updates
  • Sentiment analysis (positive/negative news)
  • Historical data retention (time-series analysis)
  • GraphQL endpoint for flexible queries
  • Redis caching layer (reduce DB load)
  • Multi-language support (Hindi, Tamil, etc.)
  • Mobile app integration (React Native)

🤝 Contributing

Contributions welcome! Please:

  1. Fork the repo
  2. Create a feature branch (git checkout -b feature/amazing)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing)
  5. Open a Pull Request

📄 License

MIT License - feel free to use this in your portfolio or commercial projects.


👨‍💻 Author

Indiser
GitHubLinkedInPortfolio


🙏 Acknowledgments

  • News sources: Times of India, NDTV, Indian Express, The Hindu, India Today, Business Standard, FirstPost, The Quint, National Herald, Free Press Journal, OpIndia, ABP Live, Siasat, OneIndia, Organiser, TFI Post, The Better India, Hindu Business Line, and 21 others
  • Database: Neon for serverless PostgreSQL
  • Deployment: Vercel for seamless hosting

⭐ Star this repo if it helped you!

Built with ❤️ for the Indian developer community

About

⚡ Real-time geospatial news API for India | Serverless FastAPI + 40 concurrent RSS feeds + Neon PostgreSQL | Auto-updates hourly via Github Actions Cron | Maps 1000+ articles to 36 states with sub-second latency

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages