##### Stephanie Chiang
##### DATA 620 Summer 2025

## Assignment Week 2 Part 1:
# Graph Visualization

1. Load a graph database of your choosing from a text file or other source. If you take a
large network dataset from the web (such as from https://snap.stanford.edu/data/), please
feel free at this point to load just a small subset of the nodes and edges.

## Importing the Dataset

https://snap.stanford.edu/data/congress-twitter.html
"This network represents the Twitter interaction network for the 117th United States Congress, both House of Representatives and Senate. The base data was collected via the Twitter’s API, then the empirical transmission probabilities were quantified according to the fraction of times one member retweeted, quote tweeted, replied to, or mentioned another member’s tweet."

In [None]:
import json

import matplotlib.pyplot as plt
import networkx as nx

# Helper function to read node data from a JSON file
def read_json_value(file_path):
    with open(file_path, 'r') as file:
        return json.load(file)

# Per the README, usernameList[i] gives the Twitter username corresponding to node i
cnd = read_json_value("congress_network_data.json")[0]
usernameList = cnd["usernameList"]
print(usernameList[:10])

# Read the edgelist from the file and create a graph
G = nx.read_edgelist("congress.edgelist", create_using = nx.Graph())

# Print graph info
print(G)
print("First 10 nodes:", list(G.nodes())[:10])

# Replace anonymized node labels with Twitter handles from usernameList.
# Build the full mapping first so the graph is relabeled once instead of being copied per node.
mapping = {str(i): name for i, name in enumerate(usernameList) if name is not None}
G = nx.relabel_nodes(G, mapping)

# Print updated graph info
print(G)
print("First 10 nodes:", list(G.nodes())[:10])

['SenatorBaldwin', 'SenJohnBarrasso', 'SenatorBennet', 'MarshaBlackburn', 'SenBlumenthal', 'RoyBlunt', 'CoryBooker', 'JohnBoozman', 'SenatorBraun', 'SenSherrodBrown']
Graph with 475 nodes and 10222 edges
First 10 nodes: ['0', '4', '12', '18', '25', '30', '46', '55', '58', '59']
Graph with 475 nodes and 10222 edges
First 10 nodes: ['SenatorBaldwin', 'SenBlumenthal', 'SenatorCardin', 'SenCortezMasto', 'SenatorDurbin', 'LindseyGrahamSC', 'SenAmyKlobuchar', 'SenJeffMerkley', 'ChrisMurphyCT', 'PattyMurray']
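
The cell above loads all 475 nodes. Since the assignment allows working with a subset, here is a minimal sketch of one way to load only the first few edges. It assumes each line of the edge list has the form `source target {'weight': w}`; the `sample_lines` values and the helper name `load_first_edges` are hypothetical and used only for illustration.

```python
from itertools import islice

import networkx as nx

# Hypothetical sample lines in the assumed format "source target {'weight': w}".
sample_lines = [
    "0 4 {'weight': 0.002}",
    "0 12 {'weight': 0.004}",
    "4 12 {'weight': 0.010}",
    "12 18 {'weight': 0.001}",
    "18 25 {'weight': 0.003}",
]

def load_first_edges(lines, max_edges):
    """Build an undirected graph from at most max_edges edge-list lines."""
    return nx.parse_edgelist(
        islice(lines, max_edges),  # stop reading after max_edges lines
        create_using=nx.Graph(),
        data=True,                 # parse the trailing dict as edge attributes
    )

small = load_first_edges(sample_lines, 3)
print(small)
print(small["0"]["4"]["weight"])
```

Because an open file is also an iterable of lines, the same helper could be applied to the real data with `with open("congress.edgelist") as f: G_small = load_first_edges(f, 500)`.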


## Basic EDA

2. Create basic analysis on the graph, including the graph’s diameter, and at least one other
metric of your choosing. You may either code the functions by hand (to build your
intuition and insight), or use functions in an existing package.

## Analysis

In [None]:
# Calculate the diameter of the graph (nx.diameter raises an error if the graph is not connected)
diameter = nx.diameter(G)
print(f"Diameter of the graph: {diameter}")

# Calculate the density of the graph
density = nx.density(G)
print(f"Density of the graph: {density}")

# Degree centrality: the fraction of the other nodes that each node is connected to
degree_centrality = nx.degree_centrality(G)

# Top 10 nodes by degree centrality
print("Top 10 nodes by degree centrality:")
top_degree_centrality = sorted(degree_centrality.items(), key=lambda x: x[1], reverse=True)[:10]
for node, centrality in top_degree_centrality:
    print(f"Node {node}: {centrality:.4f}")

# Draw the full graph (with 475 labeled nodes this is only a rough overview)
plt.figure(figsize=(12, 12))
nx.draw(G, with_labels=True, node_size=50, font_size=8)
plt.title("Congress Graph")
plt.show()
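
The assignment also suggests coding the metrics by hand to build intuition. The following is a rough sketch of that idea, not the approach used in the cell above: diameter via breadth-first search from every node, density as 2m / (n(n - 1)), and degree centrality as degree / (n - 1). The function names and the toy graph are illustrative assumptions, and the results are compared against the NetworkX functions.

```python
from collections import deque

import networkx as nx

def bfs_distances(G, source):
    """Return hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in G[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def diameter_by_hand(G):
    """Longest shortest path in an unweighted, connected, undirected graph."""
    n = G.number_of_nodes()
    longest = 0
    for source in G:
        dist = bfs_distances(G, source)
        if len(dist) < n:
            raise ValueError("graph is not connected; diameter is undefined")
        longest = max(longest, max(dist.values()))
    return longest

def density_by_hand(G):
    """Fraction of possible undirected edges that are present."""
    n, m = G.number_of_nodes(), G.number_of_edges()
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0

def degree_centrality_by_hand(G):
    """Degree of each node divided by the number of other nodes."""
    n = G.number_of_nodes()
    return {node: deg / (n - 1) for node, deg in G.degree()}

# Toy graph for checking: the path 0-1-2-3-4 plus a shortcut edge 0-2.
toy = nx.path_graph(5)
toy.add_edge(0, 2)

print(diameter_by_hand(toy), nx.diameter(toy))
print(density_by_hand(toy), nx.density(toy))
print(degree_centrality_by_hand(toy)[2], nx.degree_centrality(toy)[2])
```

Applied to the 475 nodes and 10222 edges reported earlier, the density formula gives 2 × 10222 / (475 × 474) ≈ 0.0908.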


3. Use a visualization tool of your choice to display information. Use NetworkX directly.

## Visualizing the Graph


In [None]:

# Visualize a small portion of the graph: the subgraph induced by the first 50 nodes
subgraph = G.subgraph(list(G.nodes)[:50])
nx.draw(subgraph, with_labels=True, node_size=30)
plt.show()
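
Taking the first 50 nodes depends on the order in which nodes were read from the file. As an alternative sketch, the subset could be chosen by degree and the node sizes scaled accordingly. The helper name `top_degree_subgraph`, the size scaling, and the use of NetworkX's built-in karate club graph as stand-in data are all assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import networkx as nx

def top_degree_subgraph(G, k):
    """Return the subgraph induced by the k highest-degree nodes of G."""
    top_nodes = sorted(G.nodes, key=lambda n: G.degree[n], reverse=True)[:k]
    return G.subgraph(top_nodes)

# Stand-in data: a small built-in graph, used here instead of the congress network.
demo = nx.karate_club_graph()
sub = top_degree_subgraph(demo, 10)

# Fixed seed so the layout is reproducible between runs.
pos = nx.spring_layout(sub, seed=42)

# Scale node size by degree within the subgraph; the base of 50 keeps isolated nodes visible.
sizes = [50 + 100 * sub.degree[n] for n in sub.nodes]

plt.figure(figsize=(8, 8))
nx.draw_networkx(sub, pos=pos, node_size=sizes, font_size=8, edge_color="gray")
plt.axis("off")
plt.show()
```

For the congress graph, the equivalent call would be `top_degree_subgraph(G, 50)`, which keeps the most connected members rather than an arbitrary slice.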




4. Please record a short video (~ 5 minutes), and submit a link to the video as part of your
homework submission. Put your notebook in GitHub and submit your assignment link by end of day on Sunday.