# Exercise: U.S. Congress Twitter Influence Network

---

## Overview

Analyze a **directed, weighted network** of influence among members of the 117th U.S. Congress, built from their Twitter interactions between February and June 2022. For a detailed description of the dataset, see the original paper: <a href="https://doi.org/10.1016/j.dib.2023.109521">Fink, Christian G., et al. "A Congressional Twitter network dataset quantifying pairwise probability of influence." Data in Brief 50 (2023): 109521</a>.

**Key concepts:**
- **Directed**: Edge A→B means A influences B
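- **Weighted**: The weight of edge A→B is the estimated probability that A influences B
- **Why reverse for PageRank**: PageRank rewards nodes that receive edges from other high-scoring nodes. On the original graph that favours members who are influenced, so the edges are reversed to rank members who exert influence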

## Centrality in Directed Weighted Networks

### Out-Strength
Sum of outgoing edge weights:
$$s_{out}(i) = \sum_{j} w_{ij}$$

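```python
# G: weighted nx.DiGraph; each edge (u, v) carries a 'weight' attribute
out_strength = {}
for node in G.nodes():
    # Sum the weights of all edges leaving `node`
    total = sum(data['weight'] for _, _, data in G.out_edges(node, data=True))
    out_strength[node] = total
```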
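NetworkX also exposes this sum directly as the weighted out-degree, `G.out_degree(weight='weight')`. A minimal check, assuming a hypothetical three-node toy graph `G_toy` with invented weights, that the loop and the built-in agree:

```python
import networkx as nx

# Hypothetical toy graph: edge u -> v, weight = probability that u influences v
G_toy = nx.DiGraph()
G_toy.add_weighted_edges_from([(0, 1, 0.2), (0, 2, 0.5), (1, 2, 0.1)])

# Loop version, as in the snippet above
loop_strength = {
    node: sum(data["weight"] for _, _, data in G_toy.out_edges(node, data=True))
    for node in G_toy.nodes()
}

# Built-in weighted out-degree: the same sums in one call
builtin_strength = dict(G_toy.out_degree(weight="weight"))

for node in G_toy.nodes():
    print(node, loop_strength[node], builtin_strength[node])
```

Node 2 has no outgoing edges, so its out-strength is 0 under both versions.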

### PageRank: Why Reverse?
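- Edge A→B = "A influences B"
- PageRank scores a node by the edges it **receives**, so on the original graph it finds who is **most influenced**
- Reverse: B→A = "B is influenced by A", so A now receives the edge and PageRank finds who is **most influential**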

```python
import networkx as nx

# Reverse every edge (A→B becomes B→A) so PageRank credits the influencer
G_reversed = G.reverse()
pagerank = nx.pagerank(G_reversed, alpha=0.85, weight='weight')
```
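A small illustration of the effect, using a hypothetical toy graph in which member 0 influences members 1, 2 and 3 (weights invented). On the original graph the three influenced members outrank member 0; after reversal member 0 ranks first.

```python
import networkx as nx

# Hypothetical toy graph: member 0 influences members 1, 2 and 3
G_toy = nx.DiGraph()
G_toy.add_weighted_edges_from([(0, 1, 0.5), (0, 2, 0.5), (0, 3, 0.5)])

pr_original = nx.pagerank(G_toy, alpha=0.85, weight="weight")
pr_reversed = nx.pagerank(G_toy.reverse(), alpha=0.85, weight="weight")

# Compare member 0's score with the others under each orientation
for node in G_toy.nodes():
    print(node, round(pr_original[node], 3), round(pr_reversed[node], 3))
```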

## Part 1: Load Data

In [None]:
# Import libraries (e.g. networkx as nx, json, pandas as pd, matplotlib.pyplot as plt)
# YOUR CODE HERE


In [None]:
# Load network from "formatted_congress.edgelist"
# Use nx.read_weighted_edgelist with create_using=nx.DiGraph() and nodetype=int
# YOUR CODE HERE

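`nx.read_weighted_edgelist` expects one edge per line in the form `source target weight`, separated by whitespace. The sketch below writes a hypothetical three-line file (invented IDs and weights, named `toy.edgelist` for illustration only) to a temporary directory and reads it back with the same arguments as the hint, so node IDs come back as `int` and weights as `float`.

```python
import os
import tempfile

import networkx as nx

# Hypothetical three-line edge list in "source target weight" format
sample = "0 1 0.25\n0 2 0.10\n2 1 0.05\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.edgelist")
    with open(path, "w") as f:
        f.write(sample)
    # Same call pattern as the exercise hint
    G_toy = nx.read_weighted_edgelist(path, create_using=nx.DiGraph(), nodetype=int)

print(G_toy.number_of_nodes(), G_toy.number_of_edges())
print(G_toy[0][1]["weight"])
```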

In [None]:
# Load usernames from "congress_network_data.json"
# Extract data[0]['usernameList']
# YOUR CODE HERE

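The JSON file can be read with the standard `json` module. The sketch below uses a hypothetical inline string shaped like the structure the hint implies (a list whose first element holds `usernameList`). It also assumes that node ID `i` in the edge list corresponds to `usernameList[i]`; check this against the dataset description before relying on it.

```python
import json

# Hypothetical miniature of the layout implied by data[0]['usernameList']
raw = '[{"usernameList": ["member_a", "member_b", "member_c"]}]'
data = json.loads(raw)
usernames = data[0]["usernameList"]

# Assumption: node ID i corresponds to usernames[i]
labels = dict(enumerate(usernames))
print(labels[2])  # member_c
```

For the real file, open `congress_network_data.json` and pass the handle to `json.load` instead of calling `json.loads` on a string.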

## Part 2: Compute Centralities

In [None]:
# Compute out-strength for all nodes
# For each node, sum weights of outgoing edges: G.out_edges(node, data=True)
# Print top 5
# YOUR CODE HERE

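All three "Print top 5" steps follow the same pattern, so one small helper can serve them. `top_k` is a hypothetical name, not part of NetworkX; it picks the highest entries of a `{node: score}` dictionary with `heapq.nlargest` and optionally maps node IDs to usernames.

```python
import heapq

def top_k(scores, k=5, names=None):
    """Return the k (label, score) pairs with the highest scores, best first.

    scores: dict mapping node ID -> centrality value
    names:  optional sequence mapping node ID -> username
    """
    best = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    if names is None:
        return best
    return [(names[node], value) for node, value in best]

# Usage with invented scores and names
scores = {0: 0.7, 1: 0.1, 2: 0.4}
for rank, (label, value) in enumerate(top_k(scores, k=2, names=["a", "b", "c"]), start=1):
    print(rank, label, value)
```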

In [None]:
# Compute PageRank on REVERSED graph
# 1. G_reversed = G.reverse()
# 2. pagerank = nx.pagerank(G_reversed, alpha=0.85, weight='weight')
# Print top 5
# YOUR CODE HERE


In [None]:
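# Compute betweenness centrality
# NetworkX treats 'weight' as a path length, so high influence probabilities would make
# strong ties look long. Convert first, e.g. G[u][v]['distance'] = -math.log(G[u][v]['weight'])
# (or 1 / weight), then:
# betweenness = nx.betweenness_centrality(G, weight='distance', normalized=True)
# Print top 5
# YOUR CODE HERE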

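NetworkX's `betweenness_centrality` interprets `weight` as a distance, while the weights here are influence probabilities, where larger means a stronger tie. A hypothetical toy graph shows why this matters: a strong two-step chain 0→1→2 (probability 0.9 per edge) competes with a weak direct edge 0→2 (0.1). With raw weights as lengths the weak edge is the "shortest" path and node 1 gets no credit; with the distance -log(p) the chain wins. The -log transform is one choice, picked because summing -log(p) along a path corresponds to multiplying the probabilities; 1/p is a common alternative.

```python
import math

import networkx as nx

# Hypothetical toy graph: strong chain 0 -> 1 -> 2, weak direct edge 0 -> 2
G_toy = nx.DiGraph()
G_toy.add_weighted_edges_from([(0, 1, 0.9), (1, 2, 0.9), (0, 2, 0.1)])

# Raw probabilities used as path lengths: the weak edge looks shortest
bc_raw = nx.betweenness_centrality(G_toy, weight="weight", normalized=True)

# Probabilities converted to distances: the most probable path is shortest
for u, v, d in G_toy.edges(data=True):
    d["distance"] = -math.log(d["weight"])
bc_dist = nx.betweenness_centrality(G_toy, weight="distance", normalized=True)

print(bc_raw[1], bc_dist[1])
```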

## Part 3: Comparison Table

In [None]:
# Get top 10 for each centrality measure
# Create a pandas DataFrame with columns: Rank, Out-Strength, PageRank, Betweenness
# (each measure's column lists the usernames of its top 10 members)
# Save to "congress_centrality_table.csv"
# YOUR CODE HERE

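One way to lay out the comparison table, sketched with pandas on invented scores for three hypothetical members. `top_names` is a hypothetical helper that ranks a `{node: score}` dictionary and returns usernames; `k=3` stands in for the exercise's 10 only because the toy data is small.

```python
import pandas as pd

def top_names(scores, names, k=10):
    """Usernames of the k highest-scoring nodes, best first."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    return [names[n] for n in ranked]

# Invented stand-ins for the real out_strength, pagerank and betweenness dicts
names = ["member_a", "member_b", "member_c"]
toy_strength = {0: 0.7, 1: 0.1, 2: 0.4}
toy_pagerank = {0: 0.2, 1: 0.5, 2: 0.3}
toy_betweenness = {0: 0.1, 1: 0.5, 2: 0.0}

k = 3
table = pd.DataFrame({
    "Rank": list(range(1, k + 1)),
    "Out-Strength": top_names(toy_strength, names, k),
    "PageRank": top_names(toy_pagerank, names, k),
    "Betweenness": top_names(toy_betweenness, names, k),
})
print(table)
```

In the notebook, `table.to_csv("congress_centrality_table.csv", index=False)` writes the file without the pandas index column.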

## Part 4: Scatter Plots

In [None]:
# Extract centrality values in consistent order
# nodes = list(G.nodes())
# strength_vals = [out_strength[n] for n in nodes]
# pagerank_vals = [pagerank[n] for n in nodes]
# betweenness_vals = [betweenness[n] for n in nodes]
# YOUR CODE HERE


In [None]:
# Create 2 scatter plots side-by-side
# fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Plot 1: strength vs pagerank
# Plot 2: strength vs betweenness
# Save as 'congress_scatter_plots.png'
# YOUR CODE HERE

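A sketch of the two side-by-side scatter plots, using invented values in place of `strength_vals`, `pagerank_vals` and `betweenness_vals`. It selects the non-interactive `Agg` backend so it also runs outside Jupyter, and saves into the system temp directory so it does not overwrite anything; in the notebook, drop the `matplotlib.use` line and save to `'congress_scatter_plots.png'` as the cell asks.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; omit inside Jupyter
import matplotlib.pyplot as plt

# Invented stand-ins for the real value lists (same node order in each)
toy_strength = [0.7, 0.1, 0.4, 1.2]
toy_pagerank = [0.20, 0.35, 0.25, 0.20]
toy_betweenness = [0.00, 0.50, 0.10, 0.05]

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
panels = [(toy_pagerank, "PageRank (reversed graph)"), (toy_betweenness, "Betweenness")]
for ax, (yvals, label) in zip(axes, panels):
    ax.scatter(toy_strength, yvals, alpha=0.6)
    ax.set_xlabel("Out-strength")
    ax.set_ylabel(label)
    ax.set_title(f"Out-strength vs {label}")
fig.tight_layout()

out_path = os.path.join(tempfile.gettempdir(), "congress_scatter_plots.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)
print(out_path)
```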

## Part 5: Network Visualizations

In [None]:
# Normalize centrality values to 0-1 range
# Helper function:
def normalize(values):
    vmin = min(values)
    vmax = max(values)
    if vmax == vmin:
        return [0.5] * len(values)
    return [(v - vmin) / (vmax - vmin) for v in values]

# Apply to all three centralities
# norm_strength = normalize(strength_vals)
# norm_pagerank = normalize(pagerank_vals)
# norm_betweenness = normalize(betweenness_vals)
# YOUR CODE HERE


In [None]:
# Prepare edge widths
# edge_weights = [data['weight'] for _, _, data in G.edges(data=True)]
# max_weight = max(edge_weights)
# edge_widths = [3 * w / max_weight for w in edge_weights]
# YOUR CODE HERE


In [None]:
# Create layout: pos = nx.spring_layout(G, seed=42)
# YOUR CODE HERE

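The visualization cells can share one drawing routine that takes the layout, a normalized centrality list and the edge widths prepared above. `draw_by_centrality` and its `base_size`/`extra_size` defaults are hypothetical choices for illustration, not part of the exercise; node size and colour both encode the normalized value.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit inside Jupyter
import matplotlib.pyplot as plt
import networkx as nx

def draw_by_centrality(G, pos, norm_vals, edge_widths, ax, title,
                       base_size=50, extra_size=500):
    """Draw G with node size and colour driven by a 0-1 normalized centrality.

    norm_vals follows the order of G.nodes(); edge_widths the order of G.edges().
    """
    sizes = [base_size + extra_size * v for v in norm_vals]
    nx.draw_networkx_edges(G, pos, ax=ax, width=edge_widths, alpha=0.3,
                           arrows=True, arrowsize=5, node_size=sizes)
    nodes = nx.draw_networkx_nodes(G, pos, ax=ax, node_size=sizes,
                                   node_color=norm_vals, cmap=plt.cm.viridis)
    ax.set_title(title)
    ax.set_axis_off()
    return nodes  # PathCollection, usable for a colorbar

# Usage on a hypothetical toy graph with invented normalized scores
G_toy = nx.DiGraph()
G_toy.add_weighted_edges_from([(0, 1, 0.9), (1, 2, 0.9), (0, 2, 0.1)])
pos_toy = nx.spring_layout(G_toy, seed=42)
toy_norm = [1.0, 0.5, 0.0]
max_w = max(d["weight"] for _, _, d in G_toy.edges(data=True))
toy_widths = [3 * d["weight"] / max_w for _, _, d in G_toy.edges(data=True)]

fig, ax = plt.subplots(figsize=(6, 5))
collection = draw_by_centrality(G_toy, pos_toy, toy_norm, toy_widths, ax, "Toy graph")
fig.colorbar(collection, ax=ax, label="normalized centrality")
plt.close(fig)
```

For the side-by-side comparison, the same function can be called once per axis of `plt.subplots(1, 3, figsize=(16, 5))`; passing `base_size=80, extra_size=0` gives the fixed node size of 80 while colour still carries the centrality.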

In [None]:
# Visualization 1: Out-Strength
# YOUR CODE HERE


In [None]:
# Visualization 2: PageRank
# YOUR CODE HERE


In [None]:
# Visualization 3: Betweenness
# YOUR CODE HERE


In [None]:
# Visualization 4: Side-by-side comparison
# fig, axes = plt.subplots(1, 3, figsize=(16, 5))
# Draw all three networks side-by-side with smaller nodes (size=80)
# Save as 'congress_comparison.png'
# YOUR CODE HERE


## Discussion Questions

1. **Are the same people at the top of all three measures?** If not, why?

2. **Are there members with low strength but high PageRank or Betweenness?** What might this tell us?

3. **Looking at the visualizations, do you see patterns or clusters?**

4. **Why was it important to reverse the graph for PageRank?**

