# US Attention Data

Wikipedia views, Google Trends, and GDELT coverage for US topics in 2025.

**Author**: Luke Steuber | [lukesteuber.com](https://lukesteuber.com) | [@lukesteuber.com](https://bsky.app/profile/lukesteuber.com)

**License**: MIT

---

## Quick Stats

- **Records**: 10
- **Size**: 0.2 MB
- **Date Range**: 2025
- **Geographic Coverage**: United States

---

## Live Visualizations

[Add links to visualizations in ~/html/datavis/ that use this dataset]

- [Visualization Name](https://dr.eamer.dev/path/to/viz) - Description

---

In [None]:
# Load dependencies
import json
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# Set visualization style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

print("✓ Dependencies loaded")

## Load Data

In [None]:
# Load main data file
data_path = Path('wikipedia_event_articles.json')

# For JSON files
with open(data_path, encoding='utf-8') as f:
    data = json.load(f)

# Convert to DataFrame
df = pd.DataFrame(data)

print(f"Loaded {len(df):,} records")
df.head()
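
The load cell above assumes the JSON file holds a flat list of records. If the top level is instead an object that wraps the list, `pd.DataFrame(data)` can raise or produce an unexpected shape. The following is a minimal sketch of a more defensive loader. The helper name `load_records` and the single-wrapper-key layout are assumptions for illustration; nothing here is confirmed about the actual layout of `wikipedia_event_articles.json`.

```python
import json
from pathlib import Path


def load_records(path):
    """Return a list of records from a JSON file (hypothetical helper).

    Accepts either a top-level list of records, or a top-level object
    in which exactly one value is a list (assumed to be the records).
    """
    with open(Path(path), encoding="utf-8") as f:
        data = json.load(f)

    # Case 1: the file is already a list of records.
    if isinstance(data, list):
        return data

    # Case 2: the file is an object wrapping a single list.
    if isinstance(data, dict):
        list_values = [v for v in data.values() if isinstance(v, list)]
        if len(list_values) == 1:
            return list_values[0]

    # Anything else is ambiguous, so fail loudly instead of guessing.
    raise ValueError(f"Unrecognized JSON layout in {path}")
```

With this helper, the DataFrame could be built as `df = pd.DataFrame(load_records(data_path))`.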

## Data Structure

In [None]:
# Display schema
print("Columns:", df.columns.tolist())
print("\nData types:")
print(df.dtypes)
print("\nMissing values:")
print(df.isnull().sum())
print("\nMemory usage:")
print(f"{df.memory_usage(deep=True).sum() / 1024**2:.2f} MB")

## Summary Statistics

In [None]:
# Numeric columns
df.describe()

In [None]:
# Categorical columns (if applicable)
categorical_cols = df.select_dtypes(include=['object']).columns
for col in categorical_cols[:5]:  # Show first 5 categorical columns
    print(f"\n{col} - Top 10 values:")
    print(df[col].value_counts().head(10))
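
The loop above calls `value_counts()` on every `object` column. Data loaded from JSON often contains list or dict values, which are unhashable, so `value_counts()` would raise a `TypeError` on such columns. A minimal sketch for spotting them first follows. The helper name `nested_columns` and the sample columns (`title`, `views`, `tags`) are hypothetical and not taken from this dataset.

```python
import pandas as pd


def nested_columns(df):
    """Return names of columns holding at least one dict or list value."""
    return [
        col
        for col in df.columns
        if df[col].map(lambda v: isinstance(v, (dict, list))).any()
    ]


# Synthetic stand-in data; the real schema may differ.
example = pd.DataFrame([
    {"title": "A", "views": 10, "tags": ["x", "y"]},
    {"title": "B", "views": 20, "tags": []},
])

nested = nested_columns(example)
print("Nested columns:", nested)
```

Columns reported here could be skipped in the categorical loop, or flattened before counting.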

## Visualizations

### Distribution Analysis

In [None]:
# [CUSTOMIZE: Add distribution plots for key numeric columns]
# Example:
# fig, ax = plt.subplots(figsize=(12, 6))
# df['column_name'].hist(bins=50, ax=ax)
# plt.title('Distribution of [COLUMN]')
# plt.xlabel('[COLUMN]')
# plt.ylabel('Frequency')
# plt.show()

### Geographic Distribution (if applicable)

In [None]:
# [CUSTOMIZE: Add map visualization if dataset has coordinates]
# Example:
# fig, ax = plt.subplots(figsize=(15, 10))
# df.plot.scatter(x='longitude', y='latitude', alpha=0.3, s=1, ax=ax)
# plt.title('Geographic Distribution')
# plt.xlabel('Longitude')
# plt.ylabel('Latitude')
# plt.show()

### Temporal Trends (if applicable)

In [None]:
# [CUSTOMIZE: Add time series analysis if dataset has dates]
# Example:
# df['date'] = pd.to_datetime(df['date'])
# df.set_index('date').resample('ME').size().plot(figsize=(12, 6))  # use 'M' on pandas < 2.2
# plt.title('Records Over Time')
# plt.ylabel('Count')
# plt.show()
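
The month-end offset alias used with `resample` changed in pandas 2.2, where `'M'` was deprecated in favor of `'ME'`. One way to sidestep the alias difference is to count records per monthly period instead. This is a minimal sketch on synthetic data; the column name `date` is an assumption, as in the template above.

```python
import pandas as pd

# Synthetic stand-in data; the real column name may differ from 'date'.
sample = pd.DataFrame({
    "date": ["2025-01-05", "2025-01-20", "2025-02-11", "2025-04-02"],
})
sample["date"] = pd.to_datetime(sample["date"])

# Count records per calendar month. Months with no records are
# missing from this result.
monthly = sample.groupby(sample["date"].dt.to_period("M")).size()

# Reindex over the full month range so empty months appear as 0.
full_range = pd.period_range(monthly.index.min(), monthly.index.max(), freq="M")
monthly = monthly.reindex(full_range, fill_value=0)

print(monthly)
```

The resulting Series can be plotted with `monthly.plot(figsize=(12, 6))` in the same way as the resampled version.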

## Example Queries

### Query 1: Top 10 by a Metric

In [None]:
# [CUSTOMIZE: Add example queries relevant to this dataset]
# Example: Find top 10 by some metric
# top_10 = df.nlargest(10, 'column_name')[['name', 'column_name']]
# top_10
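
As a worked illustration of the top-N pattern, the sketch below runs `nlargest` on synthetic data. The column names `title` and `views` are hypothetical. It also shows the `keep='all'` option, which retains every row tied at the cutoff rather than returning only the first one.

```python
import pandas as pd

# Synthetic stand-in data; 'title' and 'views' are hypothetical columns.
sample = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "views": [50, 90, 90, 10],
})

# Ask for the single largest value; keep='all' keeps both tied rows.
top = sample.nlargest(1, "views", keep="all")[["title", "views"]]
print(top)
```

With the default `keep='first'`, only one of the tied rows would be returned, which can hide ties when ranking topics by attention.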

### Query 2: Filter by Criteria

In [None]:
# [CUSTOMIZE: Another example query]
# Example: Filter by criteria
# filtered = df[df['column'] > threshold]
# print(f"Found {len(filtered):,} matching records")
# filtered.head()

## Export Subset

Create a filtered subset for further analysis:

In [None]:
# Example: Export filtered data
# subset = df[df['column'] > threshold]  # any boolean mask works here
# subset.to_json('subset_output.json', orient='records', indent=2)
# print(f"Exported {len(subset):,} records to subset_output.json")
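
To check that an export round-trips cleanly, the subset can be written and then read back. The sketch below does this on synthetic data, writing to a temporary directory. The columns `title` and `views` and the threshold of 20 are assumptions for illustration only.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Synthetic stand-in data; the real columns may differ.
sample = pd.DataFrame({"title": ["A", "B", "C"], "views": [50, 90, 10]})

# Filter with a boolean mask (hypothetical threshold).
subset = sample[sample["views"] > 20]

# Write to a temporary directory so no existing file is overwritten.
out_path = Path(tempfile.mkdtemp()) / "subset_output.json"
subset.to_json(out_path, orient="records", indent=2)

# Read the file back and confirm the record count matches.
with open(out_path, encoding="utf-8") as f:
    reloaded = json.load(f)

assert len(reloaded) == len(subset)
print(f"Exported {len(reloaded):,} records to {out_path.name}")
```

`orient='records'` writes a list of objects, which matches the list-of-records layout that the load step expects.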

---

## Data Sources

[LIST_SOURCES]

## Citation

```bibtex
[BIBTEX_CITATION]
```

---

**Questions or issues?** Contact luke@lukesteuber.com or [@lukesteuber.com](https://bsky.app/profile/lukesteuber.com) on Bluesky.