<div style="text-align: right;">© 2026 Moses Boudourides. All Rights Reserved.</div>

# LLMs for Qualitative and Mixed-Methods Social Network Analysis (SNA)
## Moses Boudourides

# Session 4: Research Designs with LLMs and Networks

In [1]:
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
from pyvis.network import Network
import os
import json
import sys
import hashlib
import random
from sklearn.datasets import fetch_20newsgroups
import IPython
from openai import OpenAI
# import google.generativeai as genai

In [2]:
# --- 1. & 2. KEY LOADING & INITIALIZATION ---

# Force Google to use REST to avoid ALTS/GCP credential errors
os.environ["GOOGLE_API_USE_MTLS"] = "never" 

def get_api_key(file_path):
    if os.path.exists(file_path):
        with open(file_path, 'r') as f:
            return f.read().strip().replace('"', '').replace("'", "")
    return None

oa_key = get_api_key("openai_key.txt")
# gem_key = get_api_key("gemini_key.txt")

# Initialize OpenAI
client_oa = OpenAI(api_key=oa_key)

# # Initialize Gemini using 'rest' transport to bypass gRPC/ALTS errors
# genai.configure(api_key=gem_key, transport='rest')

# # Dynamic Model Selection
# available_models = [m.name for m in genai.list_models() if 'generateContent' in m.supported_generation_methods]
# target_model = 'gemini-1.5-flash' if 'models/gemini-1.5-flash' in available_models else available_models[0].split('/')[-1]
# model_gemini = genai.GenerativeModel(target_model)

## Part 1: Sequential and Parallel Research Designs with LLMs

**Goal:**  
Illustrate sequential and parallel design patterns for LLM-augmented qualitative and mixed-methods SNA.

This notebook focuses on **design logic**, not performance. LLM outputs are always provisional and subject to interpretation.

In [3]:
# --- 3. DATA & PERSISTENT QUERY STEP ---

# 2. Persistence Logic
CACHE_FILE = "llm_cache_s4.json"

if os.path.exists(CACHE_FILE):
    with open(CACHE_FILE, "r") as f:
        cache = json.load(f)
else:
    cache = {}

def get_label(model_id, text, api_func, prompt_type="design"):
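    # Key on the model, the prompt type, and a hash of the full text, so that
    # texts sharing their first characters never collide in the cache
    text_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
    cache_key = f"{model_id}_{prompt_type}_{text_hash}"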
    
    if cache_key in cache:
        return cache[cache_key]
    
    # Cache Miss: Call API
    result = api_func(text)
    cache[cache_key] = result
    
    # Save updated cache to disk
    with open(CACHE_FILE, "w") as f:
        json.dump(cache, f)
    return result

# 3. API Execution Wrappers
def query_openai_design(text, design_type="sequential"):
    if design_type == "sequential":
        prompt = f"""Analyze this text for relational patterns. Identify:
1. Trust relations
2. Formal authority structures
3. Hidden conflicts or tensions

Text: {text}"""
    else:  # parallel
        prompt = f"""Describe the roles and relationships of actors in this text.
Focus on their positions, responsibilities, and interpersonal dynamics.

Text: {text}"""
    
    res = client_oa.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7
    )
    return res.choices[0].message.content.strip()

# Small qualitative corpus
texts_small = [
    "I rely on Nina when sensitive issues arise.",
    "Paul formally supervises my work but offers little guidance.",
    "Nina and Paul often disagree behind the scenes."
]

df_small = pd.DataFrame({"text": texts_small})
df_small

Unnamed: 0,text
0,I rely on Nina when sensitive issues arise.
1,Paul formally supervises my work but offers li...
2,Nina and Paul often disagree behind the scenes.
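
The cache in `get_label` is keyed on a `prompt_type` label rather than on the prompt wording, so editing a prompt without renaming its label would silently return stale results. A minimal sketch of one way around this, assuming a hypothetical helper named `make_cache_key` (not part of the notebook) that hashes the model name, the full prompt template, and the full text together:

```python
import hashlib
import json

def make_cache_key(model_id, prompt_template, text):
    """Return a cache key that changes whenever the model, prompt, or text changes."""
    # Serialize the three parts as a JSON list so their boundaries stay unambiguous,
    # then hash the result to get a fixed-length key.
    payload = json.dumps([model_id, prompt_template, text], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

text = "I rely on Nina when sensitive issues arise."
key_v1 = make_cache_key("gpt-4o-mini", "Analyze this text for relational patterns.", text)
key_v2 = make_cache_key("gpt-4o-mini", "Describe the roles and relationships of actors.", text)
print(key_v1 != key_v2)  # a reworded prompt yields a different key
```

The trade-off is that keys are no longer human-readable, so the audit trail, not the cache file, should be the place where prompts are recorded in full.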


## Sequential Design: Phase 1 (Qualitative Anchor)

We begin with a small qualitative corpus to establish meaning and identify key concepts.

In [4]:
print("PHASE 1: QUALITATIVE ANCHOR")
print("="*60)
print()
print("Qualitative Insights from Small Corpus:")
print("- Trust relations: Nina is a trusted advisor for sensitive issues")
print("- Formal authority: Paul has formal supervisory role but weak support")
print("- Hidden conflict: Nina and Paul disagree behind the scenes")
print()
print("These concepts will guide Phase 2 LLM analysis.")

PHASE 1: QUALITATIVE ANCHOR

Qualitative Insights from Small Corpus:
- Trust relations: Nina is a trusted advisor for sensitive issues
- Formal authority: Paul has formal supervisory role but weak support
- Hidden conflict: Nina and Paul disagree behind the scenes

These concepts will guide Phase 2 LLM analysis.


## Sequential Design: Phase 2 (LLM-Assisted Scaling)

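We prompt the LLM with the three Phase 1 concepts (trust, formal authority, hidden conflict) and apply it to the same small corpus. In a real study, this step would extend the analysis to texts too numerous to code by hand.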
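
To make the link between Phase 1 and Phase 2 explicit, the prompt can be assembled from the list of anchor concepts instead of hard-coding them. The following is a sketch only; `PHASE1_CONCEPTS` and `build_sequential_prompt` are hypothetical names introduced for illustration, and the wording mirrors the sequential prompt used in this notebook:

```python
# Concepts identified during the Phase 1 qualitative reading
PHASE1_CONCEPTS = [
    "Trust relations",
    "Formal authority structures",
    "Hidden conflicts or tensions",
]

def build_sequential_prompt(text, concepts):
    """Assemble a Phase 2 prompt from a list of Phase 1 concepts."""
    # Number the concepts so the model answers them one by one
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(concepts, start=1))
    return (
        "Analyze this text for relational patterns. Identify:\n"
        f"{numbered}\n\n"
        f"Text: {text}"
    )

prompt = build_sequential_prompt(
    "I rely on Nina when sensitive issues arise.", PHASE1_CONCEPTS
)
print(prompt)
```

If the qualitative anchor is revised after an interpretive return, only the concept list changes, which keeps the design decision visible in one place.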

In [5]:
print("PHASE 2: LLM-ASSISTED SCALING")
print("="*60)
print()

for i, text in enumerate(texts_small):
    analysis = get_label("openai", text, lambda t: query_openai_design(t, "sequential"), "sequential_phase2")
    print(f"Text {i+1}: {text}")
    print(f"LLM Analysis:\n{analysis}")
    print("-" * 60)

print("\nPhase 2 complete: LLM has identified patterns using Phase 1 concepts.")

PHASE 2: LLM-ASSISTED SCALING

Text 1: I rely on Nina when sensitive issues arise.
LLM Analysis:
Analyzing the text "I rely on Nina when sensitive issues arise," we can identify the following relational patterns:

1. **Trust Relations**:
   - The speaker has a trust relationship with Nina, as indicated by the phrase "I rely on Nina." This suggests that the speaker feels comfortable turning to Nina for support or guidance in sensitive situations, implying a level of trust in her judgment or ability to handle such matters.

2. **Formal Authority Structures**:
   - The text does not explicitly mention any formal authority structures. However, the reliance on Nina could imply that she holds a certain position of influence or expertise in handling sensitive issues, even if that authority is not formally defined. Without additional context, it's unclear whether Nina has an official role or title that grants her formal authority.

3. **Hidden Conflicts or Tensions**:
   - While the text does 

## Sequential Design: Phase 3 (Interpretive Return)

LLM output highlights candidates for closer qualitative analysis. Researchers return to cases, not summaries.

In [6]:
print("PHASE 3: INTERPRETIVE RETURN")
print("="*60)
print()
print("Researcher Reflection:")
print("- LLM outputs are candidates for closer analysis, not conclusions")
print("- Return to original texts to verify and contextualize")
print("- Refine theoretical understanding based on case details")
print("- Maintain interpretive control throughout")
print()
print("Key principle: The researcher, not the LLM, makes final interpretations.")

PHASE 3: INTERPRETIVE RETURN

Researcher Reflection:
- LLM outputs are candidates for closer analysis, not conclusions
- Return to original texts to verify and contextualize
- Refine theoretical understanding based on case details
- Maintain interpretive control throughout

Key principle: The researcher, not the LLM, makes final interpretations.


## Parallel Design: Human vs LLM Analysis

We analyze the same data independently using human coding and LLM analysis, then compare.

In [7]:
print("PARALLEL DESIGN: HUMAN CODING")
print("="*60)
print()

human_codes = {
    "Nina": "Trusted advisor - provides support for sensitive issues",
    "Paul": "Formal authority with weak support - supervises but offers little guidance",
    "Relationship": "Hidden conflict - Nina and Paul disagree behind the scenes"
}

print("Human Coding Results:")
for actor, code in human_codes.items():
    print(f"  {actor}: {code}")

PARALLEL DESIGN: HUMAN CODING

Human Coding Results:
  Nina: Trusted advisor - provides support for sensitive issues
  Paul: Formal authority with weak support - supervises but offers little guidance
  Relationship: Hidden conflict - Nina and Paul disagree behind the scenes


In [8]:
print("\nPARALLEL DESIGN: LLM CODING")
print("="*60)
print()

def query_openai_roles(text):
    # Reuse the "parallel" prompt defined in query_openai_design (same wording,
    # model, and temperature) instead of duplicating it here
    return query_openai_design(text, "parallel")

# Generate LLM coding for the corpus
corpus_text = "\n".join(texts_small)
llm_analysis = get_label("openai", corpus_text, query_openai_roles, "parallel_llm_coding")

print("LLM Coding Results:")
print(llm_analysis)


PARALLEL DESIGN: LLM CODING

LLM Coding Results:
In the provided text, the roles and relationships of the actors, Nina and Paul, can be analyzed as follows:

1. **Nina**:
   - **Position**: Nina appears to be a key support figure, potentially a colleague or peer, who is trusted by the speaker to handle sensitive issues.
   - **Responsibilities**: Her primary responsibility seems to be providing emotional or strategic support during challenging situations. The speaker relies on her for assistance in navigating sensitive matters, indicating that Nina likely has experience or insight that the speaker values.
   - **Interpersonal Dynamics**: The relationship between Nina and the speaker is one of trust and reliance. The speaker feels comfortable turning to Nina for support, suggesting a collaborative and possibly empathetic dynamic.

2. **Paul**:
   - **Position**: Paul holds a formal supervisory role over the speaker's work, indicating a level of authority and responsibility for over

## Comparing Parallel Analyses

Where do human and LLM interpretations converge? Where do they diverge? What does divergence reveal?

In [9]:
print("COMPARING PARALLEL ANALYSES")
print("="*60)
print()
print("Convergences:")
print("- Both human and LLM identify Nina as a trusted advisor")
print("- Both recognize Paul's formal authority with limited support")
print("- Both note tension between Nina and Paul")
print()
print("Divergences:")
print("- LLM may emphasize different aspects of relationships")
print("- LLM may infer motivations not explicitly stated")
print("- Human codes may capture nuances LLM misses")
print()
print("Analytic Resource:")
print("Differences between human and LLM analysis are opportunities for deeper investigation.")
print("They reveal what each method captures and what it misses.")

COMPARING PARALLEL ANALYSES

Convergences:
- Both human and LLM identify Nina as a trusted advisor
- Both recognize Paul's formal authority with limited support
- Both note tension between Nina and Paul

Divergences:
- LLM may emphasize different aspects of relationships
- LLM may infer motivations not explicitly stated
- Human codes may capture nuances LLM misses

Analytic Resource:
Differences between human and LLM analysis are opportunities for deeper investigation.
They reveal what each method captures and what it misses.
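
One rough way to locate convergences and divergences is to check which human-coded concepts are echoed in the LLM text. The sketch below is an assumption-laden illustration, not a measure of agreement: the keyword stems in `human_concepts` and the function `concept_coverage` are hypothetical, and the excerpt is copied from the LLM output shown above.

```python
# Hypothetical keyword stems for each human-coded concept
human_concepts = {
    "trust": ["trust", "rely", "reliance"],
    "formal authority": ["supervis", "authority", "formal"],
    "hidden conflict": ["conflict", "disagree", "tension"],
}

# Short excerpt taken from the LLM coding output displayed in this notebook
llm_excerpt = (
    "Nina appears to be a key support figure, potentially a colleague or peer, "
    "who is trusted by the speaker to handle sensitive issues. "
    "Paul holds a formal supervisory role over the speaker's work"
)

def concept_coverage(llm_text, concepts):
    """Flag, for each concept, whether any of its keyword stems occurs in the text."""
    lowered = llm_text.lower()
    return {
        name: any(stem in lowered for stem in stems)
        for name, stems in concepts.items()
    }

coverage = concept_coverage(llm_excerpt, human_concepts)
for name, found in coverage.items():
    print(f"{name}: {'mentioned' if found else 'not found in excerpt'}")
```

A concept flagged as missing is a prompt to reread the full output and the original text; a keyword match does not by itself establish that the two interpretations agree.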


## Data Cleaning and Ethical Implementation

Before network construction, we implement data cleaning and ethical safeguards including anonymization.

In [10]:
# Simulated raw data from a social media scrape
raw_data = [
    {"user": "Alice", "text": "I really appreciate @Bob's help on the project.", "timestamp": "2023-10-01"},
    {"user": "Bob", "text": "Thanks @Alice! @Charlie also contributed a lot.", "timestamp": "2023-10-02"},
    {"user": "Charlie", "text": "Great collaboration with @Alice and @Bob!", "timestamp": "2023-10-03"}
]

df_raw = pd.DataFrame(raw_data)

# Simple cleaning: lowercase the text (punctuation and @mentions are kept for extraction)
df_raw['clean_text'] = df_raw['text'].str.lower()

print("CLEANED DATA:")
print(df_raw[['user', 'clean_text']])

CLEANED DATA:
      user                                       clean_text
0    Alice  i really appreciate @bob's help on the project.
1      Bob  thanks @alice! @charlie also contributed a lot.
2  Charlie        great collaboration with @alice and @bob!


## Ethical Implementation: Anonymization

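Replace usernames with hashed pseudonyms so that the network structure is preserved while names are hidden. An unsalted hash of a known username can be recovered by hashing candidate names and comparing, so this step is pseudonymization rather than full anonymization; with real data, use a secret key and remember that the post text may still contain identifying mentions.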

In [11]:
# Responsible Use: Anonymizing PII
def anonymize_user(username):
    # Using a hash to protect identity while maintaining network structure
    return hashlib.sha256(username.encode()).hexdigest()[:8]

df_raw['anon_user'] = df_raw['user'].apply(anonymize_user)

print("ANONYMIZED USER MAPPING:")
print(df_raw[['user', 'anon_user']])
print()
print("Note: Anonymization preserves network structure while protecting identity.")

ANONYMIZED USER MAPPING:
      user anon_user
0    Alice  3bc51062
1      Bob  cd9fb1e1
2  Charlie  6e81b125

Note: Anonymization preserves network structure while protecting identity.
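
Because a plain SHA-256 of a username can be reversed by hashing a list of candidate names, a keyed hash is a common alternative. The following is a minimal sketch using Python's standard `hmac` module; the function name `pseudonymize_user` and the placeholder `SECRET_KEY` are assumptions for illustration, and in practice the key would be generated randomly and stored outside the notebook.

```python
import hashlib
import hmac

# Placeholder only: a real key should be random and kept out of version control
SECRET_KEY = b"replace-with-a-randomly-generated-secret"

def pseudonymize_user(username, key=SECRET_KEY):
    """Map a username to a short, stable pseudonym using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, username.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncating keeps labels readable; longer prefixes lower the collision risk
    return digest[:8]

for name in ["Alice", "Bob", "Charlie"]:
    print(name, "->", pseudonymize_user(name))
```

The same username always maps to the same pseudonym under a given key, so edges remain consistent, but someone without the key cannot rebuild the mapping from a list of names.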


## Relational Extraction with Audit Trail

Extract relationships from text while maintaining a complete audit trail of decisions.

In [12]:
# Relational Extraction and Audit Trail
audit_trail = []

def extract_ties_with_audit(text, user_id):
    # In practice, this would call an LLM API
    # Prompt: "Identify mentioned users and the sentiment of the interaction."
    extracted_targets = []
    if "@bob" in text:
        extracted_targets.append("Bob")
    if "@charlie" in text:
        extracted_targets.append("Charlie")
    if "@alice" in text:
        extracted_targets.append("Alice")
    
    sentiment = "Positive" if "appreciate" in text or "great" in text else "Neutral"
    
    # Log the decision for the audit trail
    for target in extracted_targets:
        audit_trail.append({
            "source_user": user_id,
            "input_text": text[:50],
            "identified_target": target,
            "sentiment_assigned": sentiment,
            "logic": "Mention-based extraction"
        })
    
    return extracted_targets, sentiment

# Apply extraction
df_raw[['targets', 'sentiment']] = df_raw.apply(
    lambda x: pd.Series(extract_ties_with_audit(x['clean_text'], x['user'])), 
    axis=1
)

# Reviewing the Audit Trail
audit_df = pd.DataFrame(audit_trail)

print("AUDIT TRAIL SUMMARY:")
print(audit_df)
print()
print("The audit trail documents every extraction decision for transparency and review.")

AUDIT TRAIL SUMMARY:
  source_user                                       input_text  \
0       Alice  i really appreciate @bob's help on the project.   
1         Bob  thanks @alice! @charlie also contributed a lot.   
2         Bob  thanks @alice! @charlie also contributed a lot.   
3     Charlie        great collaboration with @alice and @bob!   
4     Charlie        great collaboration with @alice and @bob!   

  identified_target sentiment_assigned                     logic  
0               Bob           Positive  Mention-based extraction  
1           Charlie            Neutral  Mention-based extraction  
2             Alice            Neutral  Mention-based extraction  
3               Bob           Positive  Mention-based extraction  
4             Alice           Positive  Mention-based extraction  

The audit trail documents every extraction decision for transparency and review.
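
The extraction above checks for three hard-coded handles. As a hedged sketch of how the same mention-based logic could generalize to unseen users, the hypothetical helper `extract_mentions` below uses a regular expression instead; it assumes handles consist of letters, digits, and underscores.

```python
import re

# "@" followed by one or more word characters (letters, digits, underscore)
MENTION_PATTERN = re.compile(r"@(\w+)")

def extract_mentions(text):
    """Return the unique @handles in a text, lowercased, in order of appearance."""
    seen = []
    for handle in MENTION_PATTERN.findall(text):
        handle = handle.lower()
        if handle not in seen:
            seen.append(handle)
    return seen

print(extract_mentions("i really appreciate @bob's help on the project."))
print(extract_mentions("thanks @alice! @charlie also contributed a lot."))
```

Each returned handle can then be logged to the audit trail exactly as in the cell above, with the pattern itself recorded as the extraction logic.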


## Minimal Audit Trail Template

For each LLM use, document the stage, purpose, prompt, model, and researcher decision. Where feasible, also keep the raw output alongside the record.

In [13]:
# Audit Trail Example for LLM Operations
audit_trail_llm = pd.DataFrame([
    {
        "stage": "Phase 2 scaling",
        "purpose": "Identify trust, authority, conflict patterns",
        "prompt": "Analyze text for relational patterns",
        "model": "gpt-4o-mini",
        "decision": "Used for case selection and hypothesis generation only"
    },
    {
        "stage": "Parallel analysis",
        "purpose": "Compare human vs LLM coding",
        "prompt": "Describe roles and relationships",
        "model": "gpt-4o-mini",
        "decision": "Used to identify convergences and divergences"
    }
])

print("AUDIT TRAIL FOR LLM OPERATIONS:")
print(audit_trail_llm.to_string())
print()
print("This template supports transparency and enables peer review of LLM use.")

AUDIT TRAIL FOR LLM OPERATIONS:
               stage                                       purpose                                prompt        model                                                decision
0    Phase 2 scaling  Identify trust, authority, conflict patterns  Analyze text for relational patterns  gpt-4o-mini  Used for case selection and hypothesis generation only
1  Parallel analysis                   Compare human vs LLM coding      Describe roles and relationships  gpt-4o-mini           Used to identify convergences and divergences

This template supports transparency and enables peer review of LLM use.
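
The template can also be filled in programmatically at the moment of each call. The sketch below assumes a hypothetical helper, `make_audit_record`, which adds a UTC timestamp and a hash of the raw output so that a stored output can later be matched to its record; the example output string is a stand-in, not a real model response.

```python
import hashlib
import json
from datetime import datetime, timezone

def make_audit_record(stage, purpose, prompt, model, output, decision):
    """Build one audit-trail entry for a single LLM call."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "stage": stage,
        "purpose": purpose,
        "prompt": prompt,
        "model": model,
        # Fingerprint of the raw output, so the stored text can be verified later
        "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
        "decision": decision,
    }

record = make_audit_record(
    stage="Phase 2 scaling",
    purpose="Identify trust, authority, conflict patterns",
    prompt="Analyze text for relational patterns",
    model="gpt-4o-mini",
    output="(stand-in for the raw LLM output)",
    decision="Used for case selection and hypothesis generation only",
)
print(json.dumps(record, indent=2))
```

Records of this shape can be collected in a list and passed to `pd.DataFrame`, as in the cell above.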


## Validity Reflection

Design choices shape what counts as data, interpretation, and defensible claims. LLMs amplify the consequences of weak design.

In [14]:
print("VALIDITY REFLECTION")
print("="*60)
print()
print("Design Choices and Their Consequences:")
print()
print("1. What counts as DATA:")
print("   - Small qualitative corpus vs. large corpus")
print("   - Raw text vs. cleaned text")
print("   - Anonymized vs. identified participants")
print()
print("2. What counts as INTERPRETATION:")
print("   - Human coding vs. LLM analysis")
print("   - Convergence vs. divergence")
print("   - Provisional vs. final claims")
print()
print("3. What claims are DEFENSIBLE:")
print("   - Claims grounded in qualitative anchor")
print("   - Claims verified through interpretive return")
print("   - Claims documented in audit trail")
print()
print("LLMs amplify the consequences of weak design choices.")
print("Transparent design is essential for validity.")

VALIDITY REFLECTION

Design Choices and Their Consequences:

1. What counts as DATA:
   - Small qualitative corpus vs. large corpus
   - Raw text vs. cleaned text
   - Anonymized vs. identified participants

2. What counts as INTERPRETATION:
   - Human coding vs. LLM analysis
   - Convergence vs. divergence
   - Provisional vs. final claims

3. What claims are DEFENSIBLE:
   - Claims grounded in qualitative anchor
   - Claims verified through interpretive return
   - Claims documented in audit trail

LLMs amplify the consequences of weak design choices.
Transparent design is essential for validity.


## Session 4 Takeaway

LLMs do not simplify research design. They make design choices more consequential and more visible.

## Part 2: Applying Research Designs to 20 Newsgroups Dataset

Now we apply the sequential and parallel design patterns to a larger dataset: the 20 Newsgroups corpus.

In [15]:
# --- CONFIGURATION ---
n = 20  # Number of Nodes (Researchers)
m = 100  # Number of Edges (Interactions/posts)

# Dataset Description
# The 20 Newsgroups dataset is a collection of approximately 18,000 Usenet posts
# spread across 20 newsgroups (loaded via sklearn.datasets.fetch_20newsgroups).
# The original reply structure could be modeled as a directed weighted multigraph,
# but headers are stripped here, so sources and targets are assigned at random
# among n synthetic "Researcher" nodes purely for illustration.

# Generate a unique filename based on m to avoid mixing samples
config_hash = hashlib.md5(f"{m}_newsgroups_s4".encode()).hexdigest()[:8]
SNAPSHOT_FILE = f"news_snapshot_m{m}_{config_hash}.csv"

# CHECK IF WE ALREADY HAVE THE COMPLETE DATA
if os.path.exists(SNAPSHOT_FILE):
    print(f"✅ LOADING PERMANENT SNAPSHOT: {SNAPSHOT_FILE}")
    interactions = pd.read_csv(SNAPSHOT_FILE)
else:
    print("🚀 SNAPSHOT NOT FOUND. GENERATING NEW SAMPLE...")
    
    # 1. Fetch the big dataset (11,000+ posts)
    newsgroups = fetch_20newsgroups(subset='train', remove=('headers', 'footers', 'quotes'))
    full_df = pd.DataFrame({'text': newsgroups.data})
    
    # 2. Filter and Sample M posts
    df = full_df[full_df['text'].str.strip().str.len() > 20].copy()
    subset = df.sample(n=m, random_state=42).reset_index(drop=True)
    
    # 3. Assign the Social Structure (Source/Target)
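    # Sources and targets are drawn at random; a seeded generator keeps a
    # regenerated snapshot reproducible (the snapshot shown here predates the seed)
    rng = random.Random(42)
    user_pool = [f"Researcher_{i:02d}" for i in range(n)]
    sources = [rng.choice(user_pool) for _ in range(m)]
    targets = [rng.choice([u for u in user_pool if u != s]) for s in sources]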

    interactions = pd.DataFrame({
        "source": sources,
        "target": targets,
        "text": subset['text'].str[:300].replace('\n', ' ', regex=True)
    })

    # 4. IMMEDIATELY SAVE (before LLM processing)
    interactions.to_csv(SNAPSHOT_FILE, index=False)
    print(f"💾 PERMANENTLY SAVED: {SNAPSHOT_FILE}")

print(f"\n--- READY: {len(interactions)} interactions between {n} nodes ---")
interactions.head()

🚀 SNAPSHOT NOT FOUND. GENERATING NEW SAMPLE...
💾 PERMANENTLY SAVED: news_snapshot_m100_f13de034.csv

--- READY: 100 interactions between 20 nodes ---


Unnamed: 0,source,target,text
0,Researcher_15,Researcher_12,In case you missed it on the news....the first...
1,Researcher_05,Researcher_11,We have no way of knowing because we cann...
2,Researcher_02,Researcher_00,The lengthy article you quote doesn't imply ...
3,Researcher_11,Researcher_00,"The recent rise of nostalgia in this group, co..."
4,Researcher_09,Researcher_01,"# ## Absolutely nothing, seeing as there is no..."


## Sequential Design on Newsgroups: Phase 1 (Qualitative Anchor)

Select a small sample of newsgroup posts for qualitative analysis.

In [16]:
# Sample a subset for detailed qualitative analysis
sample_size = min(3, len(interactions))
sample_interactions = interactions.sample(n=sample_size, random_state=42)

print("PHASE 1: QUALITATIVE ANCHOR (Newsgroups Sample)")
print("="*60)
print()

for idx, row in sample_interactions.iterrows():
    print(f"Post {idx}:")
    print(f"From: {row['source']} To: {row['target']}")
    print(f"Text: {row['text'][:150]}...")
    print("-" * 60)

print("\nThese posts form the qualitative anchor for Phase 2 analysis.")

PHASE 1: QUALITATIVE ANCHOR (Newsgroups Sample)

Post 83:
From: Researcher_15 To: Researcher_12
Text:     Ok boys & girls, hang on; here we go!      Christ's Eternal Gospel               Robinson & Robinson    The Dead Sea Scrolls & the NT         WS L...
------------------------------------------------------------
Post 53:
From: Researcher_03 To: Researcher_07
Text:     J.N. Darby was one of the founders of the "Plymouth Brethren" and an early supporter of dispensationalism.  F.F. Bruce highly approved of his tran...
------------------------------------------------------------
Post 70:
From: Researcher_11 To: Researcher_17
Text:  I missed the presentations given in the morning session (when Shea gave his "rambling and almost inaudible" presentation), but I did attend the after...
------------------------------------------------------------

These posts form the qualitative anchor for Phase 2 analysis.


## Sequential Design on Newsgroups: Phase 2 (LLM-Assisted Scaling)

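Use the LLM to analyze newsgroup posts with the Phase 1 concepts. To limit API calls, the cell below runs only on the three sampled posts; looping over `interactions` instead would cover the full dataset.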

In [17]:
print("PHASE 2: LLM-ASSISTED SCALING (Newsgroups)")
print("="*60)
print()

# Analyze sample using LLM
for idx, row in sample_interactions.iterrows():
    text = row['text']
    analysis = get_label("openai", text, lambda t: query_openai_design(t, "sequential"), "sequential_newsgroups")
    print(f"Post {idx}: {text[:80]}...")
    print(f"LLM Analysis:\n{analysis}")
    print("-" * 60)

print("\nPhase 2 complete: LLM has identified patterns in newsgroups data.")

PHASE 2: LLM-ASSISTED SCALING (Newsgroups)

Post 83:     Ok boys & girls, hang on; here we go!      Christ's Eternal Gospel          ...
LLM Analysis:
To analyze the provided text for relational patterns, we can break down the components as follows:

### 1. Trust Relations
- The phrase "Ok boys & girls, hang on; here we go!" suggests a casual and friendly tone, indicating a level of trust and camaraderie among the audience, likely implying that the speaker feels comfortable engaging with the group.
- The mention of various authors and works (e.g., "Robinson & Robinson," "RH Eisenman") indicates that the speaker may trust the credibility of these individuals and their scholarly contributions. This can suggest an implicit endorsement of their perspectives and interpretations, which could influence the audience's trust in the information presented.

### 2. Formal Authority Structures
- The structure of the text presents a list of authors and their works, suggesting a hierarchy of knowledg

## Parallel Design on Newsgroups: Human vs LLM Analysis

Compare independent human interpretation with LLM analysis on the same newsgroups sample.

In [18]:
print("PARALLEL DESIGN: HUMAN vs LLM (Newsgroups)")
print("="*60)
print()

print("HUMAN INTERPRETATION:")
print("- Posts show collaborative discussion")
print("- Mix of technical and social content")
print("- Varying levels of formality and expertise")
print()

print("LLM ANALYSIS:")
corpus_text = "\n".join([row['text'][:100] for _, row in sample_interactions.iterrows()])
llm_ng_analysis = get_label("openai", corpus_text, query_openai_roles, "parallel_newsgroups")
print(llm_ng_analysis)
print()
print("Convergences and divergences between human and LLM are analytic resources.")

PARALLEL DESIGN: HUMAN vs LLM (Newsgroups)

HUMAN INTERPRETATION:
- Posts show collaborative discussion
- Mix of technical and social content
- Varying levels of formality and expertise

LLM ANALYSIS:
In the provided text, the roles and relationships of the actors can be inferred, even though the details are limited. Here's an analysis focusing on their positions, responsibilities, and interpersonal dynamics:

1. **J.N. Darby**:
   - **Position**: Historical figure; founder of the Plymouth Brethren movement.
   - **Responsibilities**: As a founder, Darby played a crucial role in establishing the beliefs and practices of the Plymouth Brethren. His contributions to theological discourse, particularly in dispensationalism, would position him as an influential leader and teacher within this religious community.
   - **Interpersonal Dynamics**: Although not directly mentioned in current relationships, his legacy likely impacts the dynamics among contemporary members of the Plymouth Brethren

## Anonymization and Audit Trail for Newsgroups

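Apply the same hashing step to the newsgroups identifiers. The `Researcher_XX` labels are already synthetic, so this cell only demonstrates the workflow; with real participants, the audit trail pattern shown earlier would be maintained alongside it.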

In [19]:
# Anonymize newsgroups researchers
interactions['anon_source'] = interactions['source'].apply(anonymize_user)
interactions['anon_target'] = interactions['target'].apply(anonymize_user)

print("ANONYMIZATION (Newsgroups):")
print(interactions[['source', 'anon_source', 'target', 'anon_target']].head())
print()
print("Anonymization preserves network structure while protecting participant identity.")

ANONYMIZATION (Newsgroups):
          source anon_source         target anon_target
0  Researcher_15    5264903f  Researcher_12    30f11a4b
1  Researcher_05    5c4c51f2  Researcher_11    002f27a4
2  Researcher_02    b4c254e8  Researcher_00    fa949140
3  Researcher_11    002f27a4  Researcher_00    fa949140
4  Researcher_09    8e63bbe2  Researcher_01    1cf82af9

Anonymization preserves network structure while protecting participant identity.


## Network Construction from Newsgroups

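Build a directed graph from the newsgroups interactions using anonymized identifiers. Because `nx.DiGraph` keeps at most one edge per ordered pair, repeated interactions between the same two nodes collapse into a single edge, which is why the 100 interactions yield only 86 edges.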

In [20]:
# Build the Graph from anonymized interactions
G = nx.from_pandas_edgelist(interactions, 'anon_source', 'anon_target', 
                            create_using=nx.DiGraph())

print(f"Graph Statistics:")
print(f"Nodes: {G.number_of_nodes()}")
print(f"Edges: {G.number_of_edges()}")
print(f"Density: {nx.density(G):.3f}")
print()

# Print first 10 edges
print("Sample edges (anonymized):")
for i, e in enumerate(G.edges(data=True)):
    if i < 10:
        print(e)

Graph Statistics:
Nodes: 20
Edges: 86
Density: 0.226

Sample edges (anonymized):
('5264903f', '30f11a4b', {})
('5264903f', '5f69b4a7', {})
('5264903f', '5c4c51f2', {})
('30f11a4b', 'aae702ef', {})
('30f11a4b', '1cf82af9', {})
('30f11a4b', 'fa949140', {})
('30f11a4b', '39119a69', {})
('30f11a4b', '002f27a4', {})
('30f11a4b', '01664e66', {})
('5c4c51f2', '002f27a4', {})
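
The edge attributes above are empty because repeated interactions were merged without being counted. The sketch below shows one way to retain that information as edge weights, and checks the reported density by hand using the directed-graph formula edges / (nodes × (nodes − 1)). The helper names and the toy `pairs` list are hypothetical; the node and edge counts are those reported above.

```python
from collections import Counter

def weighted_edges(pairs):
    """Collapse repeated (source, target) pairs into (source, target, weight) triples."""
    counts = Counter(pairs)
    return [(s, t, w) for (s, t), w in counts.items()]

def directed_density(num_nodes, num_edges):
    """Density of a simple directed graph: edges divided by possible ordered pairs."""
    if num_nodes < 2:
        return 0.0
    return num_edges / (num_nodes * (num_nodes - 1))

# Toy interaction list: the pair ("a", "b") occurs twice
pairs = [("a", "b"), ("a", "b"), ("b", "c"), ("c", "a")]
edges = weighted_edges(pairs)
print(edges)

# Reported graph: 20 nodes and 86 edges, so 86 / (20 * 19)
print(round(directed_density(20, 86), 3))
```

Triples in this form can be passed to networkx via `G.add_weighted_edges_from(edges)`, so that the number of interactions between two nodes is kept as the `weight` attribute.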


## Network Visualization

Visualize the newsgroups social network using pyvis with labels only (no circles).

In [21]:
# Initialize Network
net = Network(height="500px", width="100%", directed=True, bgcolor="#ffffff")

# Add Nodes (labels only, no visible circles)
for node in G.nodes():
    net.add_node(
        node, 
        label=node, 
        shape='dot',
        size=1,
        color='#ffffff',
        borderWidth=0,
        font={'size': 12, 'color': 'black', 'align': 'center'}
    )
    
# Add Edges
for source, target in G.edges():
    net.add_edge(
        source, 
        target, 
        color='#848484',
        arrows={'to': {'enabled': True, 'scaleFactor': 0.5}},
        smooth={'type': 'curvedCW', 'roundness': 0.2},
        font={'align': 'top', 'size': 12, 'color': 'blue'}
    )

# Physics and Rendering
net.set_options("""
var options = {
  "physics": {
    "barnesHut": { "gravitationalConstant": -3000, "springLength": 150 }
  }
}
""")

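html_content = net.generate_html()
# Write with an explicit encoding so the HTML renders the same on every platform
with open("newsgroups_graph_s4.html", "w", encoding="utf-8") as f:
    f.write(html_content)

# Import IFrame explicitly rather than relying on `import IPython`
# to have loaded the `IPython.display` submodule
from IPython.display import IFrame
IFrame(src="newsgroups_graph_s4.html", width='100%', height='550px')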