# How to Create Bar Charts in Python Using Real News Data

Learn to create bar charts using Python's Matplotlib library and real-time news data from the NewsDataHub API.

**What you'll build:**
- Topic distribution chart (vertical bars)
- Language distribution chart (horizontal bars)
- Top 10 news sources chart
- Political leaning analysis (optional)

**Time:** 15-20 minutes  
**Level:** Beginner  
**Stack:** Python, Matplotlib, NewsDataHub API

---

## Setup

You can run this notebook in two ways:
1. **With sample data** (no API key needed) - Uses cached data from `data/sample-news-data.json`
2. **With live data** - Provide your NewsDataHub API key to fetch fresh articles

Get a free API key at: https://newsdatahub.com/login

In [None]:
# Install required packages (uncomment if needed)
# !pip install requests matplotlib

In [None]:
import requests
import matplotlib.pyplot as plt
from collections import Counter
import json
import os

## Step 1: Fetch News Data

Set your API key below, or leave it empty to use sample data.
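
If you would rather not paste a key into the notebook, one option is to read it from an environment variable. The sketch below assumes a variable named `NEWSDATAHUB_API_KEY`, which is a hypothetical name chosen for this example and not something the API requires. An unset variable falls back to an empty string, which selects the sample-data path.

```python
import os

def load_api_key(env_var="NEWSDATAHUB_API_KEY"):
    """Return the key from the environment, or "" if it is unset or blank."""
    # env_var is a hypothetical variable name chosen for this sketch
    return os.environ.get(env_var, "").strip()

API_KEY = load_api_key()
print("Using live API data" if API_KEY else "Using sample data")
```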

In [None]:
# Set your API key here (or leave empty to use sample data)
API_KEY = ""  # Replace with your NewsDataHub API key, or leave empty

# Use the live API only when a key is provided
if API_KEY:
    print("Using live API data...")
    
    url = "https://api.newsdatahub.com/v1/news"
    headers = {"x-api-key": API_KEY}
    
    articles = []
    cursor = None
    
    # Fetch 2 pages (up to 200 articles)
    for _ in range(2):
        params = {
            "per_page": 100,
            "country": "US,FR,DE,ES,BR",
            "source_type": "mainstream_news,digital_native"
        }
        if cursor:
            params["cursor"] = cursor
        
        # Time out instead of hanging indefinitely on a stalled connection
        response = requests.get(url, headers=headers, params=params, timeout=30)
        response.raise_for_status()
        data = response.json()
        
        articles.extend(data.get("data", []))
        cursor = data.get("next_cursor")
        
        if not cursor:
            break
    
    print(f"Fetched {len(articles)} articles from API")
    
else:
    print("No API key provided. Using sample data from data/sample-news-data.json...")
    
    # Load sample data
    with open("data/sample-news-data.json", "r", encoding="utf-8") as f:
        articles = json.load(f)
    
    print(f"Loaded {len(articles)} articles from sample data")

### Understanding the Code

- **`x-api-key` header** — Authenticates your request
- **`per_page` parameter** — Controls batch size (max 100 on free tier)
- **`country` parameter** — Fetches from US, France, Germany, Spain, Brazil for diverse data
- **`source_type` parameter** — Filters for mainstream and digital-native sources
- **`cursor` parameter** — Marks position in result set for next page
- **`next_cursor`** — Token for fetching next page

**Why multi-country filtering?** Pulling articles from five countries produces a meaningful language distribution (English, French, German, Spanish, Portuguese) and a more diverse set of topics.
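
The cursor loop can be easier to follow when it is separated from the HTTP call. The sketch below is a minimal illustration, not the API client itself: `fetch_page` is a hypothetical stand-in that serves two in-memory pages shaped like the `data` / `next_cursor` response used above, so it runs without a key or network access.

```python
# Hypothetical in-memory "API": maps a cursor to a response payload
FAKE_PAGES = {
    None: {"data": [{"id": 1}, {"id": 2}], "next_cursor": "page-2"},
    "page-2": {"data": [{"id": 3}], "next_cursor": None},
}

def fetch_page(cursor=None):
    """Stand-in for requests.get(...).json(); returns one page of results."""
    return FAKE_PAGES[cursor]

def fetch_all(max_pages=2):
    """Follow next_cursor until it is missing or max_pages is reached."""
    articles, cursor = [], None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        articles.extend(page.get("data", []))
        cursor = page.get("next_cursor")
        if not cursor:
            break
    return articles

collected = fetch_all()
print(len(collected))  # 3
```

Capping the loop with `max_pages` keeps the number of requests bounded even if the server keeps returning cursors.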

---

## Step 2: Topic Distribution Chart

Extract topics and create a vertical bar chart showing the most popular topics.

In [None]:
# Extract topics from articles
topics = []
for article in articles:
    article_topics = article.get("topics", [])
    if article_topics:
        # Topics is an array - add all topics from this article
        if isinstance(article_topics, list):
            topics.extend(article_topics)
        else:
            topics.append(article_topics)

# Exclude 'general' topic (articles not yet categorized)
topics = [t for t in topics if t != 'general']

topic_counts = Counter(topics)
print(f"Found {len(topic_counts)} unique topics (excluding 'general')")

# Keep at most the 15 most frequent topics to avoid chart clutter
top_topics = dict(topic_counts.most_common(15))
print(f"Displaying top {len(top_topics)} topics out of {len(topic_counts)} total")

**What this does:**
- NewsDataHub returns `topics` as an array, not a single value
- Each article can have multiple topics, so we use `extend()` to add them all
- Filters out `'general'` — a placeholder for uncategorized articles
- `Counter` aggregates and counts topic occurrences
- Limits to top 15 topics to prevent clutter
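
To see the flatten-and-count step in isolation, here is a small sketch on three made-up articles. The records are invented for illustration; only the `topics` field name comes from the code above.

```python
from collections import Counter

# Invented sample records; real articles carry many more fields
sample_articles = [
    {"topics": ["politics", "economy"]},
    {"topics": ["politics", "general"]},
    {"topics": []},  # no topics: contributes nothing
]

# Flatten every article's topic list, skipping the 'general' placeholder
flat = [
    topic
    for article in sample_articles
    for topic in (article.get("topics") or [])
    if topic != "general"
]

counts = Counter(flat)
print(counts.most_common(2))  # [('politics', 2), ('economy', 1)]
```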

In [None]:
# Color palette for data visualization
vibrant_colors = [
    '#EF4444',  # Red
    '#3B82F6',  # Blue
    '#10B981',  # Green
    '#FBBF24',  # Yellow
    '#8B5CF6',  # Purple
    '#F59E0B',  # Amber
    '#EC4899',  # Pink
    '#14B8A6',  # Teal
    '#6366F1',  # Indigo
    '#F97316'   # Orange
]

plt.figure(figsize=(12, 6))
categories = list(top_topics.keys())
values = list(top_topics.values())
colors = [vibrant_colors[i % len(vibrant_colors)] for i in range(len(categories))]

bars = plt.bar(categories, values, color=colors, edgecolor='white', linewidth=2)

plt.title(f"Top {len(top_topics)} Topics in News Coverage", fontsize=16, fontweight="bold", pad=20)
plt.xlabel("Topic", fontsize=12, fontweight="bold")
plt.ylabel("Article Count", fontsize=12, fontweight="bold")
plt.xticks(rotation=45, ha="right", fontsize=11)
plt.yticks(fontsize=11)
plt.grid(axis="y", linestyle="--", alpha=0.3)

# Add value labels on bars
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{int(height)}', ha='center', va='bottom', fontsize=11, fontweight='bold')

plt.tight_layout()
plt.savefig("topic-distribution-chart.png", dpi=300, bbox_inches="tight")
plt.show()

print("Chart saved as: topic-distribution-chart.png")

**Styling breakdown:**
- **`figsize=(12, 6)`** — Creates wide chart for 15 categories
- **`vibrant_colors`** — Color palette optimized for readability
- **`edgecolor='white', linewidth=2`** — White borders make bars stand out
- **`rotation=45, ha="right"`** — Rotates x-labels to prevent overlap
- **`grid(axis="y")`** — Adds horizontal gridlines for easier comparison
- **Value labels** — Shows exact counts on top of each bar
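
As an alternative to positioning each label with `plt.text`, Matplotlib 3.4 and later provide `Axes.bar_label`, which attaches labels to a bar container in one call. The sketch below uses made-up counts and the non-interactive `Agg` backend so it runs without a display. It is one possible variant, not a replacement for the code above.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt

# Made-up counts for illustration
demo_counts = {"politics": 5, "economy": 3, "sports": 2}

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(list(demo_counts.keys()), list(demo_counts.values()),
              color="#3B82F6", edgecolor="white", linewidth=2)

# bar_label places one text label per bar (requires Matplotlib >= 3.4)
labels = ax.bar_label(bars, fmt="%d", padding=2, fontsize=11, fontweight="bold")

ax.grid(axis="y", linestyle="--", alpha=0.3)
fig.tight_layout()
plt.close(fig)  # free the figure once it is no longer needed
```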

---

## Step 3: Language Distribution (Horizontal Bars)

Horizontal bar charts work better when you have many categories or want to display labels without rotation.

In [None]:
# Extract languages
languages = [
    article.get("language")
    for article in articles
    if article.get("language")
]

lang_counts = Counter(languages)
print(f"Found {len(lang_counts)} languages:")
for lang, count in lang_counts.most_common():  # most frequent language first
    print(f"  {lang}: {count} articles")

In [None]:
plt.figure(figsize=(10, 6))
categories = list(lang_counts.keys())
values = list(lang_counts.values())
colors = [vibrant_colors[i % len(vibrant_colors)] for i in range(len(categories))]

bars = plt.barh(categories, values, color=colors, edgecolor='white', linewidth=2)

plt.title("Language Distribution in News Coverage", fontsize=16, fontweight="bold", pad=20)
plt.xlabel("Article Count", fontsize=12, fontweight="bold")
plt.ylabel("Language", fontsize=12, fontweight="bold")
plt.xticks(fontsize=11)
plt.yticks(fontsize=11)
plt.grid(axis="x", alpha=0.3, linestyle="--")

# Add value labels
for bar in bars:
    width = bar.get_width()
    plt.text(width, bar.get_y() + bar.get_height()/2.,
             f'{int(width)}', ha='left', va='center', fontsize=11, fontweight='bold',
             bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.7, edgecolor='none'))

plt.tight_layout()
plt.savefig("language-distribution-chart.png", dpi=300, bbox_inches="tight")
plt.show()

print("Chart saved as: language-distribution-chart.png")

**When to use horizontal bars:**
- You have many categories (10+)
- Labels are long (horizontal text is easier to read than rotated text)
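- Values are close together (bar lengths along a shared axis make small differences easier to see)
- Categories have a natural order, such as alphabetical or ranked by count, so the chart reads as a top-to-bottom list (note that `barh` places the first item at the bottom)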
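
One detail worth knowing: `barh` draws the first category at the bottom of the axis, so the order of the input lists determines the reading order. A minimal sketch, using made-up language counts, that sorts ascending so the largest bar ends up at the top:

```python
from collections import Counter

# Made-up counts for illustration
demo_lang_counts = Counter({"en": 120, "fr": 30, "de": 25, "pt": 10})

# Ascending by count: barh plots index 0 at the bottom, so the
# largest value lands at the top of the chart
ordered = sorted(demo_lang_counts.items(), key=lambda item: item[1])
categories = [lang for lang, _ in ordered]
values = [count for _, count in ordered]

print(categories)  # ['pt', 'de', 'fr', 'en']
```

Passing these lists to `plt.barh(categories, values)` gives a top-to-bottom ranking. Calling `plt.gca().invert_yaxis()` on data sorted in descending order is an equivalent approach.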

---

## Step 4: Top 10 News Sources

Analyzing source distribution helps identify the most active publishers and potential dataset biases.

In [None]:
# Extract sources
sources = []
for article in articles:
    source = article.get("source", {})
    if isinstance(source, dict):
        source_title = source.get("title") or source.get("name")
    else:
        source_title = None
    
    if not source_title:
        source_title = article.get("source_title")
    
    if source_title:
        sources.append(source_title)

source_counts = Counter(sources)
top10 = source_counts.most_common(10)

print("Top 10 most active sources:")
for rank, (source, count) in enumerate(top10, 1):
    print(f"{rank}. {source}: {count} articles")

In [None]:
plt.figure(figsize=(12, 6))
categories = [x[0] for x in top10]
values = [x[1] for x in top10]
colors = [vibrant_colors[i % len(vibrant_colors)] for i in range(len(categories))]

bars = plt.bar(categories, values, color=colors, edgecolor='white', linewidth=2)

plt.title("Top 10 Most Active News Sources", fontsize=16, fontweight="bold", pad=20)
plt.xlabel("News Source", fontsize=12, fontweight="bold")
plt.ylabel("Article Count", fontsize=12, fontweight="bold")
plt.xticks(rotation=45, ha="right", fontsize=11)
plt.yticks(fontsize=11)
plt.grid(axis="y", alpha=0.3, linestyle="--")

# Add value labels
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{int(height)}', ha='center', va='bottom', fontsize=11, fontweight='bold')

plt.tight_layout()
plt.savefig("top-sources-chart.png", dpi=300, bbox_inches="tight")
plt.show()

print("Chart saved as: top-sources-chart.png")

**Why analyze sources?**
- **Identify dominant publishers** — See which outlets produce most content
- **Detect dataset bias** — Over-representation may skew analysis
- **Assess coverage diversity** — Balanced distribution suggests varied perspectives
- **Plan data collection** — Adjust filters if you need more diversity
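
One rough way to put a number on over-representation is the share of all articles that come from the top few sources. The helper below is a sketch with a hypothetical name (`top_share`) and invented counts. What share counts as too concentrated is a judgment call that depends on your analysis.

```python
from collections import Counter

def top_share(counts, n=3):
    """Fraction of all items contributed by the n most common keys."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top_total = sum(count for _, count in counts.most_common(n))
    return top_total / total

# Invented source counts for illustration
demo_sources = Counter({"Source A": 50, "Source B": 30, "Source C": 10, "Source D": 10})

print(f"Top 3 share: {top_share(demo_sources, 3):.0%}")  # Top 3 share: 90%
```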

---

## Optional: Political Leaning Analysis

NewsDataHub includes political leaning metadata for sources, ranging from far-left to far-right.

**Note:** Political leaning data requires a paid NewsDataHub plan. Check https://newsdatahub.com/plans for feature availability.

In [None]:
# Extract political leaning (skip articles whose source is missing or not a dict)
leanings = []
for article in articles:
    source = article.get("source")
    if isinstance(source, dict) and source.get("political_leaning"):
        leanings.append(source["political_leaning"])

leaning_counts = Counter(leanings)
print(f"Political leaning: {len(leanings)} out of {len(articles)} articles have leaning data")
print(f"All leaning values: {dict(leaning_counts)}")

In [None]:
# Only create chart if we have political leaning data
if leaning_counts:
    # Define order: left to right political spectrum + nonpartisan
    # 'nonpartisan' represents wire services (AP, Reuters, AFP) and fact-based outlets
    order = ['far_left', 'left', 'center_left', 'center', 'center_right', 'right', 'far_right', 'nonpartisan']
    categories = [cat for cat in order if cat in leaning_counts]
    values = [leaning_counts[cat] for cat in categories]
    
    plt.figure(figsize=(12, 6))
    colors = [vibrant_colors[i % len(vibrant_colors)] for i in range(len(categories))]
    bars = plt.bar(categories, values, color=colors, edgecolor='white', linewidth=2)
    
    plt.title('Political Leaning Distribution of News Sources', fontsize=16, fontweight='bold', pad=20)
    plt.xlabel('Political Leaning', fontsize=12, fontweight='bold')
    plt.ylabel('Article Count', fontsize=12, fontweight='bold')
    plt.xticks(rotation=45, ha='right', fontsize=11)
    plt.yticks(fontsize=11)
    plt.grid(axis='y', alpha=0.3, linestyle='--')
    
    # Add value labels
    for bar in bars:
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2., height,
                 f'{int(height)}', ha='center', va='bottom', fontsize=11, fontweight='bold')
    
    plt.tight_layout()
    plt.savefig('political-leaning-chart.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    print("Chart saved as: political-leaning-chart.png")
else:
    print("No political leaning data available in this dataset.")
    print("This feature requires a paid NewsDataHub plan.")

**Understanding the categories:**
- **Political spectrum (far_left → far_right)** — Sources with identifiable political bias
- **Nonpartisan** — Wire services (AP, Reuters, AFP) and fact-based outlets maintaining editorial neutrality
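
The chart code keeps only the labels listed in `order`, so any value outside that list would be dropped without notice. If you prefer to see such values, one option is to append them after the known categories. This is a sketch with invented counts; `'unknown'` is a made-up label used only to exercise the fallback, not a value the API is known to return.

```python
from collections import Counter

ORDER = ["far_left", "left", "center_left", "center",
         "center_right", "right", "far_right", "nonpartisan"]

def ordered_categories(counts, order=ORDER):
    """Known labels in spectrum order, then any unexpected labels (sorted)."""
    known = [label for label in order if label in counts]
    extra = sorted(label for label in counts if label not in order)
    return known + extra

# Invented counts; 'unknown' is a hypothetical out-of-list label
demo_leanings = Counter({"right": 4, "unknown": 1, "left": 6, "center": 9})

print(ordered_categories(demo_leanings))
# ['left', 'center', 'right', 'unknown']
```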

**Use cases:**
- Media bias research
- Comparative topic analysis across political spectrum
- Source diversity metrics for news aggregators
- Trend analysis over time

---

## Summary

You've created:
- ✓ Topic distribution chart
- ✓ Language distribution chart
- ✓ Top 10 sources chart
- ✓ Political leaning chart (if data available)

All charts are saved as high-resolution PNG files in your current directory.

## Next Steps

- **Add more chart types:** Pie charts, line charts, heatmaps
- **Build interactive dashboards:** Try Plotly or Streamlit
- **Automate reports:** Schedule this notebook to run daily
- **Deeper analysis:** Sentiment over time, geographic analysis, keyword extraction

## Learn More

- **NewsDataHub API Docs:** https://newsdatahub.com/docs
- **Matplotlib Gallery:** https://matplotlib.org/stable/gallery/index.html
- **Full Tutorial:** https://newsdatahub.com/learning-center/article/bar-charts-in-python-using-real-news-data
- **Get API Key:** https://newsdatahub.com/login