# Introduction



## Domain Background
- Social networks represent how entities connect and interact, forming complex structures of relationships.

- In the context of Facebook page-page networks, the nodes are public pages (companies, government orgs, politicians, TV shows) and edges represent mutual "likes" between them.

- These connections often reveal shared interests, topical similarity, or strategic alignment between pages. Studying such networks helps uncover communities, influence patterns, and homophily, offering insight into how information and popularity spread across different categories of pages.

## Dataset Description

The dataset used in this project is the Facebook Large Page-Page Network from the Stanford Large Network Dataset Collection (SNAP). The network was collected through the Facebook Graph API in November 2017.

Dataset Info:

- 22,470 nodes (verified Facebook pages)
- 171,002 undirected edges (mutual 'likes' between pages)
- Four node labels (Facebook-defined categories), which support multi-class node classification:
    1. Politicians
    2. Government Organizations
    3. TV Shows
    4. Companies
- Node features extracted from site descriptions that the page owners created to summarize the purpose of the site.
- No edge features.

Citation:

- B. Rozemberczki, C. Allen and R. Sarkar. Multi-scale Attributed Node Embedding. 2019.


## Research Questions

1. Do Facebook pages tend to connect more often with other pages from the same category (e.g., politician-to-politician, brand-to-brand)?

2. Which Facebook pages connect different types of communities together (acting as structural bridges), and what kind of pages are they (e.g., media outlets, celebrities, organizations)?

3. Does the network show signs of structural balance — for example, do pages that both like a third page also tend to like each other?

# Methods

## Overview


Our analysis focused on three main aspects of the Page-Page network. First, we examined homophily, testing whether pages are more likely to connect with others in the same category. Then, we identified bridging pages that connect otherwise separate communities by analyzing centrality and community structure. Finally, we explored structural balance by checking whether pages that like the same targets also tend to connect with each other. All analyses were implemented in Julia using standard network analysis libraries and visualization tools.

### Computational Approach
  1. Data Preprocessing
      - Loaded the data and dropped self-loops and duplicate undirected edges.
      - Built an undirected simple graph.
      - Kept nodes with a missing category and labeled them “Unknown.”

  2. Homophily Analysis
      - Created a category-to-category mixing matrix.
      - Calculated assortativity to measure how often pages connect within same category.
      - Compared results to a random baseline by shuffling category labels.

  3. Structural Bridge Analysis
      - Applied community detection (Louvain/Leiden) to identify page clusters.
      - Calculated betweenness centrality and participation coefficients.
      - Highlighted pages with high cross-community connections as structural bridges.

  4. Structural Balance / Triadic Closure
      - Measured clustering and triangle formation across the network.
      - Calculated how often pages with common neighbors were also connected.
      - Compared closure levels to a random baseline.
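
The label-shuffling baseline in step 2 can be sketched as follows. This is a minimal illustration in plain Julia, not the contents of `src/homophily.jl`: the helper names `edge_homophily` and `shuffled_baseline`, the number of shuffles, the seed, and the toy edge list are all assumptions made for the example.

```julia
using Random, Statistics

# Share of edges whose two endpoints carry the same label.
edge_homophily(edges, labels) =
    count(((u, v),) -> labels[u] == labels[v], edges) / length(edges)

# Permutation null: shuffle labels across nodes R times while keeping the graph fixed.
function shuffled_baseline(edges, labels; R=200, rng=MersenneTwister(42))
    vals = [edge_homophily(edges, shuffle(rng, labels)) for _ in 1:R]
    return (mean = mean(vals), std = std(vals))
end

# Toy graph (hypothetical data): two triangles joined by a single edge.
edges  = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
labels = [1, 1, 1, 2, 2, 2]

obs    = edge_homophily(edges, labels)     # 6 of 7 edges are within-category
base   = shuffled_baseline(edges, labels)
zscore = (obs - base.mean) / base.std      # how far the observed share sits above the null
```

Shuffling labels rather than rewiring edges keeps the degree sequence and category sizes intact, so any gap between `obs` and `base.mean` reflects how labels are arranged on the graph.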

## Data Preprocessing

```julia
# Data Preprocessing
include("src/preprocessing.jl")

data = NetworkPreprocessing.preprocess(
    "data/musae_facebook_edges.csv",
    "data/musae_facebook_target.csv",
    "data/musae_facebook_features.json",
)

# Unpack the preprocessed fields used by the later analyses.
g            = data.g
labels       = data.labels
label_code   = data.label_code
label_levels = data.label_levels
X            = data.X
K            = data.K
targets_df   = data.targets_df
```





| Row | id (Int64) | facebook_id (Int64) | page_name (String) | page_type (String15) |
|---|---|---|---|---|
| 1 | 0 | 145647315578475 | The Voice of China 中国好声音 | tvshow |
| 2 | 1 | 191483281412 | U.S. Consulate General Mumbai | government |
| 3 | 2 | 144761358898518 | ESET | company |
| 4 | 3 | 568700043198473 | Consulate General of Switzerland in Montreal | government |
| 5 | 4 | 1408935539376139 | Mark Bailey MP - Labor for Miller | politician |
| 6 | 5 | 134464673284112 | Victor Dominello MP | politician |
| 7 | 6 | 282657255260177 | Jean-Claude Poissant | politician |
| 8 | 7 | 239338246176789 | Deputado Ademir Camilo | politician |
| 9 | 8 | 544818128942324 | T.C. Mezar-ı Şerif Başkonsolosluğu | government |
| 10 | 9 | 285155655705 | Army ROTC Fighting Saints Battalion | government |
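
The edge-cleaning step (dropping self-loops and duplicate undirected edges) might look roughly like the sketch below. It is an illustration only, not the code in `src/preprocessing.jl`; `clean_edges` is a hypothetical helper and the input is toy data. Because the `id` column starts at 0 while Julia indexing is 1-based, the sketch also shifts every id by one.

```julia
# Canonicalize undirected edges given as 0-based id pairs:
# drop self-loops and treat (u, v) and (v, u) as the same edge.
function clean_edges(raw::Vector{Tuple{Int,Int}})
    seen = Set{Tuple{Int,Int}}()
    for (u, v) in raw
        u == v && continue                            # skip self-loops
        push!(seen, (min(u, v) + 1, max(u, v) + 1))   # ordered pair, shifted to 1-based
    end
    return sort!(collect(seen))
end

raw   = [(0, 1), (1, 0), (2, 2), (1, 2), (0, 1)]   # hypothetical toy edge list
edges = clean_edges(raw)                            # [(1, 2), (2, 3)]
```

Storing each edge as an ordered pair in a `Set` makes deduplication independent of the order in which the endpoints appear in the file.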


## Homophily Analysis

```julia
# Homophily Analysis
include("src/homophily.jl")

hom = NetworkMetrics.summarize_homophily(g, label_code, K; label_levels=label_levels)
(hom.r, hom.edge_h, hom.base, hom.ratio)
```


```text
Edge homophily (same-category edge share): 0.8853
Random-mixing baseline (Σ pₖ²): 0.2651
Homophily ratio (observed / baseline): 3.339
Assortativity (r): 0.821
Mean node-level homophily: 0.883
Median node-level homophily: 1.0

Per-category internal edge share:
  company: 0.836
  government: 0.915
  politician: 0.868
  tvshow: 0.839

(0.8206200407380531, 0.8853198925203281, 0.2651177250822885, 3.339346293219506)
```
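
To make the quantities above concrete, the sketch below computes them on a toy graph. It is not the implementation in `src/homophily.jl`: `mixing_matrix` and `assortativity` are hypothetical helpers, the data are made up, and the baseline assumes that pₖ is the share of nodes in category k. The assortativity formula is Newman's coefficient for categorical attributes.

```julia
using LinearAlgebra

# Normalized K×K mixing matrix: e[i, j] is the share of edge ends joining category i to j.
function mixing_matrix(edges, labels, K)
    e = zeros(K, K)
    for (u, v) in edges
        e[labels[u], labels[v]] += 1
        e[labels[v], labels[u]] += 1   # undirected: count both directions
    end
    return e ./ sum(e)
end

# Newman's categorical assortativity for a symmetric mixing matrix:
# r = (tr(e) - Σ aₖ²) / (1 - Σ aₖ²), with aₖ the row sums of e.
function assortativity(e)
    a = vec(sum(e, dims=2))
    return (tr(e) - sum(abs2, a)) / (1 - sum(abs2, a))
end

# Toy graph (hypothetical data): two triangles joined by a single edge.
edges  = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
labels = [1, 1, 1, 2, 2, 2]
K = 2

e        = mixing_matrix(edges, labels, K)
edge_h   = tr(e)                                                  # same-category edge share
p        = [count(==(k), labels) / length(labels) for k in 1:K]   # node share per category
baseline = sum(abs2, p)                                           # Σ pₖ²
ratio    = edge_h / baseline
r        = assortativity(e)
```

Applied to the reported values, the same arithmetic gives a homophily ratio of 0.8853 / 0.2651 ≈ 3.339.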

## Structural Bridge Analysis

```julia
include("src/structuralBridge.jl")

top = NetworkBridge.summarize_bridges(g; top_n=15)
```



```text
Nodes: 22470   Edges: 170823   Communities: 940
Betweenness: mean=0.0002  max=0.1158
```

Top 15 structural bridges (high betweenness + high participation). The captured output shows only the first 10 rows:

| Row | node (Int64) | community (Int64) | betweenness (Float64) | participation (Float64) | bridge_score (Float64) |
|---|---|---|---|---|---|
| 1 | 702 | 10 | 0.11579 | 0.94097 | 95.0889 |
| 2 | 11004 | 2 | 0.0896283 | 0.807664 | 73.6592 |
| 3 | 21730 | 2 | 0.03982 | 0.63364 | 33.1615 |
| 4 | 19744 | 2 | 0.0398052 | 0.627566 | 33.1267 |
| 5 | 21121 | 29 | 0.0259536 | 0.629275 | 22.0543 |
| 6 | 17984 | 2 | 0.0226966 | 0.827423 | 20.2 |
| 7 | 8483 | 10 | 0.0195573 | 0.64386 | 16.9936 |
| 8 | 20416 | 92 | 0.0193082 | 0.404453 | 15.8872 |
| 9 | 22172 | 2 | 0.0176409 | 0.514949 | 14.9723 |
| 10 | 10380 | 2 | 0.015456 | 0.515432 | 13.2266 |
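
The `participation` column can be read against the standard participation coefficient, Pᵢ = 1 − Σ_c (k_ic / k_i)², where k_ic is the number of node i's neighbors in community c. The sketch below assumes that definition; whether `src/structuralBridge.jl` uses exactly this form is not shown in the output. The `participation` function here is a hypothetical helper on toy data, and it does not attempt to reproduce `bridge_score`, whose formula is not given.

```julia
# Participation coefficient per node from an adjacency list and community labels.
# P = 0 when all of a node's neighbors share one community; P rises toward 1
# as neighbors spread evenly over many communities.
function participation(adj::Vector{Vector{Int}}, community::Vector{Int})
    P = zeros(length(adj))
    for (i, nbrs) in enumerate(adj)
        isempty(nbrs) && continue
        counts = Dict{Int,Int}()
        for j in nbrs
            counts[community[j]] = get(counts, community[j], 0) + 1
        end
        P[i] = 1 - sum((c / length(nbrs))^2 for c in values(counts))
    end
    return P
end

# Toy graph (hypothetical data): node 3 has two neighbors in each community.
adj       = [[2, 3], [1, 3], [1, 2, 4, 5], [3, 5], [3, 4]]
community = [1, 1, 1, 2, 2]

P = participation(adj, community)   # P[3] = 1 - (0.5^2 + 0.5^2) = 0.5
```

With C communities the coefficient is bounded above by 1 − 1/C, so a value such as 0.94 indicates neighbors spread across many communities.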


## Structural Balance

```julia
include("src/structuralBalance.jl")

balance_stats = NetworkBalance.structural_balance_summary(g, label_code; R=50)
```

```text
Triangles (closed triads):            794953
Balanced triads ratio:                0.9991

Friend-of-friend positive closure:
  Qualifying wedges (A–B, A–C +pos):  8462510
  Closed positive wedges (B–C +pos):  2215872
  Closure rate (observed):            0.2618

(n_triads = 794953, balance_ratio = 0.9991408297094294, fof_pos_closure = 0.2618457171690196, fof_baseline_mean = NaN, fof_baseline_std = NaN, fof_lift = NaN, fof_zscore = NaN)
```
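
The closure rate above is the ratio of closed wedges to qualifying wedges, 2215872 / 8462510 ≈ 0.2618. The sketch below shows the unsigned version of that count on a toy graph. It is an assumption-laden illustration rather than the code in `src/structuralBalance.jl`: `wedge_closure` is a hypothetical helper, and it ignores the "positive" edge qualification, which the output does not define.

```julia
# Wedge closure: among all paths B-A-C with B != C, the share where B and C are also linked.
function wedge_closure(adj::Vector{Vector{Int}})
    nbrsets = [Set(n) for n in adj]
    wedges = closed = 0
    for a in eachindex(adj)
        n = adj[a]
        for i in 1:length(n)-1, j in i+1:length(n)
            wedges += 1
            closed += n[j] in nbrsets[n[i]]   # true counts as 1 when the B-C edge exists
        end
    end
    return (wedges = wedges, closed = closed, rate = closed / wedges)
end

# Toy graph (hypothetical data): triangle 1-2-3 with a pendant node 4 attached to node 3.
adj   = [[2, 3], [1, 3], [1, 2, 4], [3]]
stats = wedge_closure(adj)   # 5 wedges, 3 closed, rate 0.6
```

Each triangle contributes three closed wedges, one per center node, so the number of triangles equals `closed ÷ 3`.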

# Results

# Discussion

# Conclusion