# Introduction



## Domain Background
- Social networks represent how entities connect and interact, forming complex structures of relationships.

- In Facebook page-page networks, the nodes are public pages (companies, government organizations, politicians, TV shows) and the edges represent mutual "likes" between them.

- These connections often reveal shared interests, topical similarity, or strategic alignment between pages. Studying such networks helps uncover communities, influence patterns, and homophily, offering insights into how information and popularity spread across different categories of pages.

## Dataset Description

The dataset used in this project is the Facebook Large Page-Page Network from the Stanford Large Network Dataset Collection (SNAP). The network was collected through the Facebook Graph API in November 2017.

Dataset Info:

- 22,470 nodes (verified Facebook pages)
- 171,002 undirected edges (mutual 'likes' between pages)
- Four node labels (Facebook-defined categories), which support multi-class node classification:
    1. Politicians
    2. Government Organizations
    3. TV Shows
    4. Companies
- Node features extracted from site descriptions that the page owners created to summarize the purpose of the site.
- No edge features.

Citation:

- B. Rozemberczki, C. Allen and R. Sarkar. Multi-scale Attributed Node Embedding. 2019.


## Research Questions

1. Do Facebook pages tend to connect more often with other pages from the same category (e.g., politician-to-politician, brand-to-brand)?

2. Which Facebook pages connect different types of communities together (acting as structural bridges), and what kind of pages are they (e.g., media outlets, celebrities, organizations)?

3. Does the network show signs of structural balance? For example, do pages that both like a third page also tend to like each other?

# Methods

## Overview


Our analysis focused on three main aspects of the Page-Page network. First, we examined homophily, testing whether pages are more likely to connect with others in the same category. Then, we identified bridging pages that connect otherwise separate communities by analyzing centrality and community structure. Finally, we explored structural balance by checking whether pages that like the same targets also tend to connect with each other. All analyses were implemented in Julia using standard network analysis libraries and visualization tools.

### Computational Approach
  1. Data Preprocessing
      - Loaded the data and dropped self-loops and duplicate undirected edges.
      - Built an undirected simple graph.
      - Kept nodes with a missing category and labeled them “Unknown.”

  2. Homophily Analysis
      - Created a category-to-category mixing matrix.
      - Calculated assortativity to measure how often pages connect within same category.
      - Compared results to a random baseline by shuffling category labels.

  3. Structural Bridge Analysis
      - Applied community detection (Louvain/Leiden) to identify page clusters.
      - Calculated betweenness centrality and participation coefficients.
      - Highlighted pages with high cross-community connections as structural bridges.

  4. Structural Balance / Triadic Closure
      - Measured clustering and triangle formation across the network.
      - Calculated how often pages with common neighbors were also connected.
      - Compared closure levels to a random baseline.

## Data Preprocessing

In [8]:
# Data Preprocessing
include("src/preprocessing.jl")

data = NetworkPreprocessing.preprocess("data/musae_facebook_edges.csv",
                  "data/musae_facebook_target.csv",
                  "data/musae_facebook_features.json")

g            = data.g
labels       = data.labels
label_code   = data.label_code
label_levels = data.label_levels
X            = data.X
K            = data.K




4
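The edge-cleaning step described in the Methods overview (dropping self-loops and duplicate undirected edges) might look roughly like the sketch below. The helper `simplify_edges` and the toy edge list are hypothetical and are not taken from `src/preprocessing.jl`.

```julia
# Toy edge list (hypothetical) with a self-loop (3, 3) and duplicate
# undirected edges such as (1, 2) / (2, 1).
raw_edges = [(1, 2), (2, 1), (2, 3), (3, 3), (4, 1), (1, 4)]

# Remove self-loops, then store each edge as (min, max) so that (2, 1) and
# (1, 2) become identical and can be de-duplicated.
function simplify_edges(edges)
    canonical = [minmax(u, v) for (u, v) in edges if u != v]
    return sort(unique(canonical))
end

simple_edges = simplify_edges(raw_edges)
println(simple_edges)
```

Canonicalizing the endpoint order before calling `unique` is what makes the de-duplication direction-independent.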

## Homophily Analysis

In [34]:
# Homophily Analysis
include("src/homophily.jl")

hom = NetworkMetrics.summarize_homophily(g, label_code, K; label_levels=label_levels)
(hom.r, hom.edge_h, hom.base, hom.ratio)



(0.8206200407380531, 0.8853198925203281, 0.2651177250822885, 3.339346293219506)
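One way to read the four reported values, assuming `hom.r` is Newman's categorical assortativity coefficient, `hom.edge_h` is the share of same-category edges, and `hom.base` is the expected share under the baseline (these interpretations are assumptions, since `src/homophily.jl` is not shown here): the reported `hom.ratio` is numerically consistent with `hom.edge_h / hom.base`.

```latex
r = \frac{\sum_i e_{ii} - \sum_i a_i^2}{1 - \sum_i a_i^2},
\qquad
\text{ratio} = \frac{h_{\text{edge}}}{h_{\text{base}}}
             = \frac{0.8853}{0.2651} \approx 3.34
```

Here $e_{ii}$ is the fraction of edges joining two pages of category $i$, and $a_i$ is the fraction of edge ends attached to category $i$. Under this reading, same-category edges occur roughly 3.3 times as often as the baseline would predict.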

## Structural Bridge Analysis

In [37]:
include("src/structuralBridge.jl")
using CSV, DataFrames

# Assumption: `targets_df` (undefined in the original run, which raised UndefVarError) is the target CSV table.
targets_df = CSV.read("data/musae_facebook_target.csv", DataFrame)

comm_labels = NetworkBridge.community_labels(g)
results = NetworkBridge.summarize_bridges(g, comm_labels, targets_df)
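The participation coefficient mentioned in the Methods overview can be sketched on a toy graph as follows. This uses the common definition $P_i = 1 - \sum_c (k_{ic}/k_i)^2$ and is an illustrative assumption about what `NetworkBridge` computes, not its actual implementation; the adjacency list, community assignment, and function name `participation` are hypothetical.

```julia
# Toy graph (hypothetical) as an adjacency list; comm[i] is the community of node i.
# Node 3 splits its neighbors evenly across both communities.
adj = Dict(
    1 => [2, 3],
    2 => [1, 3],
    3 => [1, 2, 4, 5],
    4 => [3, 5, 6],
    5 => [3, 4, 6],
    6 => [4, 5],
)
comm = Dict(1 => 1, 2 => 1, 3 => 1, 4 => 2, 5 => 2, 6 => 2)

# Participation coefficient: P_i = 1 - sum_c (k_ic / k_i)^2, where k_ic is the
# number of neighbors of i in community c and k_i is the degree of i.
function participation(adj, comm, i)
    k = length(adj[i])
    k == 0 && return 0.0
    counts = Dict{Int,Int}()
    for j in adj[i]
        counts[comm[j]] = get(counts, comm[j], 0) + 1
    end
    return 1.0 - sum((c / k)^2 for c in values(counts))
end

P = Dict(i => participation(adj, comm, i) for i in keys(adj))
best_value, bridge = findmax(P)   # largest coefficient and the node that attains it
println((bridge, best_value))
```

A node whose neighbors all sit in one community scores 0, while a node that spreads its links evenly across communities approaches 1, which is why high values flag candidate bridges.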

## Structural Balance

In [36]:
include("src/structuralBalance.jl")

balance_stats = NetworkBalance.structural_balance_summary(g, label_code; R=50)

Triangles (closed triads):            794953
Balanced triads ratio:                0.9991

Friend-of-friend positive closure:
  Qualifying wedges (A–B, A–C +pos):  8462510
  Closed positive wedges (B–C +pos):  2215872
  Closure rate (observed):            0.2618





(n_triads = 794953, balance_ratio = 0.9991408297094294, fof_pos_closure = 0.2618457171690196, fof_baseline_mean = NaN, fof_baseline_std = NaN, fof_lift = NaN, fof_zscore = NaN)
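The observed closure rate above is the ratio of the two printed counts: 2215872 / 8462510 ≈ 0.2618. The sketch below shows a simplified, unsigned version of the same idea on a toy graph: count wedges B-A-C and check how many are closed by an edge B-C. It ignores whatever "positive" edge filter `NetworkBalance` applies, and the function name `wedge_closure` and the toy graph are assumptions.

```julia
# Toy undirected graph (hypothetical) as neighbor sets: one triangle 1-2-3
# plus a pendant node 4 attached to node 1.
nbrs = Dict(
    1 => Set([2, 3, 4]),
    2 => Set([1, 3]),
    3 => Set([1, 2]),
    4 => Set([1]),
)

# For every node A, enumerate unordered neighbor pairs (B, C); each pair is a
# wedge B - A - C, and it is closed when B and C are themselves connected.
function wedge_closure(nbrs)
    wedges = 0
    closed = 0
    for ns in values(nbrs)
        vs = sort(collect(ns))
        for i in 1:length(vs)-1, j in i+1:length(vs)
            wedges += 1
            if vs[j] in nbrs[vs[i]]
                closed += 1
            end
        end
    end
    # Guard against graphs with no wedges at all.
    rate = wedges == 0 ? NaN : closed / wedges
    return (wedges = wedges, closed = closed, rate = rate)
end

res = wedge_closure(nbrs)
println(res)
```

In this unsigned setting each triangle closes three wedges, one per center node, so `closed` equals three times the triangle count.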

# Results

# Discussion

# Conclusion