# Social Network Basics: Friendship Network Analysis

Learn social network analysis fundamentals using a small friendship network.

## Dataset

Friendship network with:
- **10 people** (nodes): Alice, Bob, Carol, David, Eve, Frank, Grace, Henry, Ian, Jane
- **38 friendships** (edges): Weighted by interaction strength
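
The loading cell in this notebook reads `sample_network_data.csv` using the columns `source`, `target`, and `weight`. A minimal sketch of that edge-list layout follows. The rows are made up for illustration and are not the actual file contents, and `read_edge_list` is a hypothetical helper.

```python
import csv
import io

# Hypothetical rows for illustration only; the real file's values may differ.
SAMPLE_CSV = """source,target,weight
Alice,Bob,5
Alice,Carol,3
Bob,Carol,4
Carol,David,2
"""

def read_edge_list(text):
    """Parse an edge-list CSV into an undirected adjacency dict."""
    adjacency = {}
    for row in csv.DictReader(io.StringIO(text)):
        u, v, w = row["source"], row["target"], float(row["weight"])
        # Undirected: store the edge in both directions
        adjacency.setdefault(u, {})[v] = w
        adjacency.setdefault(v, {})[u] = w
    return adjacency

adjacency = read_edge_list(SAMPLE_CSV)
n_edges = sum(len(nbrs) for nbrs in adjacency.values()) // 2
print(f"{len(adjacency)} people, {n_edges} friendships")
```

Each row is one friendship, so an undirected graph built from it has one edge per row.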

## Methods
- Network visualization
- Centrality analysis (degree, betweenness, closeness, eigenvector)
- Community detection
- Path analysis
- Network statistics

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx
import community as community_louvain  # provided by the python-louvain package
import warnings
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

print("✓ Setup complete")

## 1. Load and Create Network

In [None]:
# Load edge list
df = pd.read_csv('sample_network_data.csv')

# Create graph
G = nx.from_pandas_edgelist(df, source='source', target='target', edge_attr='weight')

print("Network Statistics:")
print(f"  Nodes: {G.number_of_nodes()}")
print(f"  Edges: {G.number_of_edges()}")
print(f"  Density: {nx.density(G):.3f}")
print(f"\nNodes: {list(G.nodes())}")
df.head(10)

## 2. Network Visualization

In [None]:
# Create layout
pos = nx.spring_layout(G, k=2, iterations=50, seed=42)

# Plot network
fig, ax = plt.subplots(figsize=(12, 10))

# Draw edges with width based on weight
edges = G.edges()
weights = [G[u][v]['weight'] for u, v in edges]
nx.draw_networkx_edges(G, pos, width=[w*0.5 for w in weights], alpha=0.5, ax=ax)

# Draw nodes
nx.draw_networkx_nodes(G, pos, node_size=3000, node_color='lightblue', 
                       edgecolors='black', linewidths=2, ax=ax)

# Draw labels
nx.draw_networkx_labels(G, pos, font_size=12, font_weight='bold', ax=ax)

ax.set_title('Friendship Network', fontsize=14, fontweight='bold')
ax.axis('off')
plt.tight_layout()
plt.show()

print("Thicker edges indicate stronger friendships (higher interaction).")

## 3. Degree Centrality

In [None]:
# Calculate degree centrality
degree_cent = nx.degree_centrality(G)
degree_df = pd.DataFrame(list(degree_cent.items()), columns=['Person', 'Degree Centrality'])
degree_df = degree_df.sort_values('Degree Centrality', ascending=False)

# Plot
fig, ax = plt.subplots(figsize=(10, 6))
bars = ax.barh(degree_df['Person'], degree_df['Degree Centrality'], 
               color='steelblue', alpha=0.7, edgecolor='black')
ax.set_xlabel('Degree Centrality', fontsize=12)
ax.set_title('Degree Centrality - Who has most connections?', fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.show()

print("\nDegree Centrality Rankings:")
print(degree_df.to_string(index=False))
print("\nInterpretation: Higher degree = more direct connections = more social")

## 4. Betweenness Centrality

In [None]:
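# Calculate betweenness centrality.
# NetworkX treats the weight argument as a distance, so convert interaction
# strength to a distance first (assumes all weights are positive).
for u, v, data in G.edges(data=True):
    data['distance'] = 1 / data['weight']
between_cent = nx.betweenness_centrality(G, weight='distance')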
between_df = pd.DataFrame(list(between_cent.items()), columns=['Person', 'Betweenness Centrality'])
between_df = between_df.sort_values('Betweenness Centrality', ascending=False)

# Plot
fig, ax = plt.subplots(figsize=(10, 6))
bars = ax.barh(between_df['Person'], between_df['Betweenness Centrality'], 
               color='green', alpha=0.7, edgecolor='black')
ax.set_xlabel('Betweenness Centrality', fontsize=12)
ax.set_title('Betweenness Centrality - Who bridges groups?', fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.show()

print("\nBetweenness Centrality Rankings:")
print(between_df.to_string(index=False))
print("\nInterpretation: High betweenness = lies on many shortest paths = broker/bridge")
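
One caveat when combining weights with shortest-path measures: NetworkX interprets the weight passed to path-based functions as a distance (lower means closer), whereas this dataset's weights encode interaction strength (higher means closer). A common convention, assumed here rather than taken from the dataset's documentation, is to use `1 / weight` as the distance. The pure-Python sketch below uses a hypothetical three-person network to show how that choice changes the shortest path.

```python
import heapq

# Hypothetical interaction strengths (higher = stronger friendship)
strength = {
    ("Alice", "Bob"): 1,    # weak direct tie
    ("Alice", "Carol"): 5,  # strong tie
    ("Carol", "Bob"): 5,    # strong tie
}

def build_adjacency(edge_costs):
    """Turn {(u, v): cost} into an undirected adjacency dict."""
    adj = {}
    for (u, v), cost in edge_costs.items():
        adj.setdefault(u, {})[v] = cost
        adj.setdefault(v, {})[u] = cost
    return adj

def dijkstra_path(adj, source, target):
    """Return the minimum-cost path from source to target (Dijkstra)."""
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nbr, edge_cost in adj[node].items():
            if nbr not in visited:
                heapq.heappush(queue, (cost + edge_cost, nbr, path + [nbr]))
    return None

# Treating strength as a cost favours the weak direct tie
path_raw = dijkstra_path(build_adjacency(strength), "Alice", "Bob")

# Inverting strength makes strong ties short
distance = {edge: 1.0 / w for edge, w in strength.items()}
path_inverted = dijkstra_path(build_adjacency(distance), "Alice", "Bob")

print("Strength as cost:  ", path_raw)
print("1/strength as cost:", path_inverted)
```

With raw strengths the route goes over the weakest tie. With inverted strengths it runs through Carol, which is usually the intended reading of "close" in a friendship network.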

## 5. Closeness and Eigenvector Centrality

In [None]:
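# Closeness centrality (the distance argument must be a true distance,
# so use 1 / interaction strength; assumes all weights are positive)
for u, v, data in G.edges(data=True):
    data['distance'] = 1 / data['weight']
close_cent = nx.closeness_centrality(G, distance='distance')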

# Eigenvector centrality
eigen_cent = nx.eigenvector_centrality(G, weight='weight', max_iter=1000)

# Combined dataframe
centrality_df = pd.DataFrame({
    'Person': list(G.nodes()),
    'Degree': [degree_cent[n] for n in G.nodes()],
    'Betweenness': [between_cent[n] for n in G.nodes()],
    'Closeness': [close_cent[n] for n in G.nodes()],
    'Eigenvector': [eigen_cent[n] for n in G.nodes()]
})

centrality_df = centrality_df.sort_values('Degree', ascending=False)

print("All Centrality Measures:")
print(centrality_df.round(3).to_string(index=False))

print("\nCentrality Interpretations:")
print("  Degree: Number of direct connections")
print("  Betweenness: Bridging role between groups")
print("  Closeness: How quickly a person can reach others")
print("  Eigenvector: Connected to well-connected people")

## 6. Community Detection

In [None]:
# Detect communities using Louvain method
communities = community_louvain.best_partition(G, weight='weight')

# Count communities
n_communities = len(set(communities.values()))

print(f"Number of communities detected: {n_communities}")
print("\nCommunity Assignments:")
for community_id in range(n_communities):
    members = [node for node, comm in communities.items() if comm == community_id]
    print(f"  Community {community_id}: {', '.join(members)}")

In [None]:
# Visualize communities
fig, ax = plt.subplots(figsize=(12, 10))

# Color nodes by community
colors = plt.cm.Set2(np.linspace(0, 1, n_communities))
node_colors = [colors[communities[node]] for node in G.nodes()]

# Draw edges
nx.draw_networkx_edges(G, pos, width=[w*0.5 for w in weights], alpha=0.5, ax=ax)

# Draw nodes colored by community
nx.draw_networkx_nodes(G, pos, node_size=3000, node_color=node_colors,
                       edgecolors='black', linewidths=2, ax=ax)

# Draw labels
nx.draw_networkx_labels(G, pos, font_size=12, font_weight='bold', ax=ax)

ax.set_title('Network Communities (Louvain Method)', fontsize=14, fontweight='bold')
ax.axis('off')
plt.tight_layout()
plt.show()

print("Colors indicate detected communities (tightly connected groups).")

## 7. Path Analysis

In [None]:
# Shortest paths
print("Example Shortest Paths:\n")
path_examples = [('Alice', 'Jane'), ('Bob', 'Henry'), ('Eve', 'David')]

for source, target in path_examples:
    try:
        path = nx.shortest_path(G, source=source, target=target)
        length = len(path) - 1
        print(f"{source} → {target}:")
        print(f"  Path: {' → '.join(path)}")
        print(f"  Length: {length} steps\n")
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        print(f"{source} → {target}: No path exists (or a name is not in the network)\n")

# Average shortest path length (requires a connected graph)
avg_path_length = nx.average_shortest_path_length(G)
print(f"Average shortest path length: {avg_path_length:.2f} steps")
print("A short average path relative to network size is one hallmark of a small-world network.")

## 8. Network Statistics

In [None]:
# Calculate various network metrics
stats = {
    'Nodes': G.number_of_nodes(),
    'Edges': G.number_of_edges(),
    'Density': nx.density(G),
    'Average Degree': sum(dict(G.degree()).values()) / G.number_of_nodes(),
    'Clustering Coefficient': nx.average_clustering(G, weight='weight'),
    'Average Path Length': nx.average_shortest_path_length(G),
    'Diameter': nx.diameter(G),
    'Number of Communities': n_communities
}

print("="*60)
print("NETWORK STATISTICS SUMMARY")
print("="*60)
for key, value in stats.items():
    if isinstance(value, float):
        print(f"{key:.<40} {value:.3f}")
    else:
        print(f"{key:.<40} {value}")
print("="*60)

print("\nInterpretations:")
print(f"  Density ({stats['Density']:.3f}): {stats['Density']*100:.1f}% of possible connections exist")
print(f"  Clustering ({stats['Clustering Coefficient']:.3f}): A person's friends tend to be friends with each other (transitivity)")
print(f"  Avg Path Length ({stats['Average Path Length']:.2f}): Average separation between people")
print(f"  Diameter ({stats['Diameter']}): Maximum separation in the network")

## 9. Degree Distribution

In [None]:
# Degree distribution
degrees = [G.degree(n) for n in G.nodes()]

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Histogram
axes[0].hist(degrees, bins=range(min(degrees), max(degrees)+2), 
            color='purple', alpha=0.7, edgecolor='black')
axes[0].set_xlabel('Degree (Number of Friends)', fontsize=11)
axes[0].set_ylabel('Count', fontsize=11)
axes[0].set_title('Degree Distribution', fontsize=12, fontweight='bold')
axes[0].grid(True, alpha=0.3, axis='y')

# Bar chart by person
degree_by_person = dict(G.degree())
sorted_people = sorted(degree_by_person.items(), key=lambda x: x[1], reverse=True)
people = [p[0] for p in sorted_people]
deg_values = [p[1] for p in sorted_people]

axes[1].bar(range(len(people)), deg_values, color='purple', alpha=0.7, edgecolor='black')
axes[1].set_xticks(range(len(people)))
axes[1].set_xticklabels(people, rotation=45, ha='right')
axes[1].set_ylabel('Degree', fontsize=11)
axes[1].set_title('Degree by Person', fontsize=12, fontweight='bold')
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("Degree statistics:")
print(f"  Mean degree: {np.mean(degrees):.2f}")
# With an even number of nodes the median can end in .5, so keep one decimal
print(f"  Median degree: {np.median(degrees):.1f}")
print(f"  Min degree: {min(degrees)}")
print(f"  Max degree: {max(degrees)}")

## Key Concepts Learned

### Network Fundamentals
- **Nodes**: Individuals in the network
- **Edges**: Connections between individuals
- **Weight**: Strength of connection
- **Density**: How connected the network is
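
Density has a simple closed form for an undirected graph: the number of edges divided by the number of possible pairs, n(n-1)/2. As a quick check, the sketch below plugs in the node and edge counts quoted in the Dataset section. The function name `density` is only illustrative.

```python
def density(n_nodes, n_edges):
    """Density of an undirected simple graph: edges / possible pairs."""
    possible_pairs = n_nodes * (n_nodes - 1) / 2
    return n_edges / possible_pairs

# Counts quoted in the Dataset section: 10 people, 38 friendships
d = density(10, 38)
print(f"Density: {d:.3f}")  # 38 edges out of 45 possible pairs
```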

### Centrality Measures
- **Degree**: Who has most connections?
- **Betweenness**: Who bridges groups?
- **Closeness**: Who can reach others fastest?
- **Eigenvector**: Who knows important people?
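
To make the first of these measures concrete, the sketch below computes degree centrality by hand, using the normalization NetworkX applies (degree divided by n - 1). The friendships are invented for illustration and are not the notebook's dataset.

```python
# Hypothetical undirected friendships (not the notebook's dataset)
toy_edges = [("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "David"), ("Bob", "Carol")]

neighbors = {}
for u, v in toy_edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

n = len(neighbors)
# Fraction of the other n - 1 people each person is directly connected to
toy_degree_centrality = {
    person: len(nbrs) / (n - 1) for person, nbrs in neighbors.items()
}

for person, score in sorted(toy_degree_centrality.items(), key=lambda kv: -kv[1]):
    print(f"{person}: {score:.2f}")
```

A score of 1.0 means the person is directly connected to everyone else in the network.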

### Community Structure
- Groups of densely connected nodes
- Louvain method for detection
- Social circles and cliques
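
Louvain searches for a partition that maximizes modularity, a score comparing the share of edges inside communities with the share expected if edges were placed at random given the node degrees. The sketch below computes modularity for an unweighted graph and a given partition. The two-triangle graph and the helper `modularity` are hypothetical illustrations, not the notebook's data or the library's implementation.

```python
def modularity(edges, partition):
    """Modularity Q = sum over communities of L_c / m - (d_c / 2m)^2.

    L_c: edges with both endpoints in community c
    d_c: total degree of the nodes in community c
    m:   total number of edges
    """
    m = len(edges)
    internal = {}
    degree_sum = {}
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )

# Two triangles joined by a single bridge edge (C-D)
triangle_edges = [("A", "B"), ("B", "C"), ("A", "C"),
                  ("D", "E"), ("E", "F"), ("D", "F"),
                  ("C", "D")]

two_groups = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
one_group = {node: 0 for node in two_groups}

q_split = modularity(triangle_edges, two_groups)
q_single = modularity(triangle_edges, one_group)
print(f"Two communities: Q = {q_split:.3f}")
print(f"One community:   Q = {q_single:.3f}")
```

Splitting the graph at the bridge scores higher than lumping everyone together, which is the kind of partition a modularity-based method would favour.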

### Path Analysis
- Shortest paths between nodes
- Average path length
- Network diameter
- Small-world phenomenon
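
For unweighted graphs, shortest paths, average path length, and diameter all follow from breadth-first search. A minimal pure-Python sketch on a hypothetical five-person graph (it assumes the graph is connected):

```python
from collections import deque
from itertools import combinations

# Hypothetical friendships: a triangle (A, B, C) with a tail C-D-E
toy_neighbors = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}

def bfs_distances(neighbors, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in neighbors[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

all_dist = {node: bfs_distances(toy_neighbors, node) for node in toy_neighbors}

# One entry per unordered pair of people
pair_lengths = [all_dist[u][v] for u, v in combinations(toy_neighbors, 2)]

avg_length = sum(pair_lengths) / len(pair_lengths)
toy_diameter = max(pair_lengths)
print(f"Average path length: {avg_length:.2f}")
print(f"Diameter: {toy_diameter}")
```

The diameter is set by the most distant pair (here A or B to E), while the average is pulled down by the many pairs that are direct friends.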

## Next Steps

### Extend the Analysis
- Add node attributes (age, interests)
- Analyze directed networks (followers)
- Temporal networks (evolution over time)
- Multiple edge types (friend, colleague, family)
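
As a starting point for the directed case, the sketch below separates in-degree (followers) from out-degree (following) on a hypothetical follower list. In NetworkX the analogous structure is `nx.DiGraph`, which exposes `in_degree` and `out_degree`.

```python
from collections import Counter

# Hypothetical (follower, followed) pairs
follows = [
    ("Alice", "Bob"),
    ("Carol", "Bob"),
    ("David", "Bob"),
    ("Bob", "Alice"),
    ("Carol", "Alice"),
]

out_degree = Counter(follower for follower, _ in follows)  # how many each person follows
in_degree = Counter(followed for _, followed in follows)   # how many followers each has

# Reciprocated ties: both directions are present
follow_set = set(follows)
mutual = sorted(
    {tuple(sorted(pair)) for pair in follow_set if pair[::-1] in follow_set}
)

print("Followers:", dict(in_degree))
print("Following:", dict(out_degree))
print("Mutual ties:", mutual)
```

In a directed network the two degree counts can diverge sharply, so "most connected" depends on which direction is being asked about.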

### Real Social Networks
- **[Stanford SNAP](http://snap.stanford.edu/data/)**: Large network datasets
- **[Kaggle Social Networks](https://www.kaggle.com/datasets?search=social+network)**: Various datasets
- **[Network Repository](https://networkrepository.com/)**: Thousands of networks

### Advanced Methods
- Link prediction
- Influence propagation
- Network motifs
- Graph neural networks
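
Link prediction, for example, often starts from neighborhood-overlap scores. The sketch below ranks unconnected pairs by the Jaccard coefficient of their neighbor sets on a hypothetical graph. NetworkX offers a comparable `nx.jaccard_coefficient` function; the `jaccard` helper here is a hand-rolled illustration.

```python
from itertools import combinations

# Hypothetical friendships
friends = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"Alice", "Carol", "David"},
    "Carol": {"Alice", "Bob", "David"},
    "David": {"Bob", "Carol", "Eve"},
    "Eve": {"David"},
}

def jaccard(a, b):
    """Shared neighbors divided by all neighbors of either person."""
    union = friends[a] | friends[b]
    return len(friends[a] & friends[b]) / len(union) if union else 0.0

# Score only pairs that are not already friends
candidates = [
    (a, b, jaccard(a, b))
    for a, b in combinations(sorted(friends), 2)
    if b not in friends[a]
]
candidates.sort(key=lambda item: item[2], reverse=True)

for a, b, score in candidates:
    print(f"{a} - {b}: {score:.2f}")
```

Pairs near the top share most of their friends without being connected themselves, which makes them plausible candidates for a future tie.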

## Resources

- **[NetworkX Documentation](https://networkx.org/)**: Python network analysis
- **Textbook**: *Networks, Crowds, and Markets* by Easley & Kleinberg
- **Course**: [Stanford CS224W](http://web.stanford.edu/class/cs224w/) - Machine Learning with Graphs