# Analysis Infrastructure v5 – Main Notebook

**Theory-driven qualitative interview analysis with Meta-Frame architecture**

This notebook runs the full analysis pipeline:
1. **Module A** – Narrative Structure (Schütze, Ricoeur)
2. **Module B** – Subject Positioning (Bamberg, Lucius-Hoene)
3. **Module C** – Discursive Framing with Meta-Frames (Foucault, Goffman, Entman)
4. **Module D** – Affective Dimension (Ahmed, Massumi)
5. **JusticeAnalyzer** – Social (In)Justice Tension Profiles

### Architecture

```
framebook_v3.1.yaml          ← Meta-Frames (universal, topic-independent)
overlays/
  housing_lux.yaml           ← Project-specific extension (optional)
```

**Two-layer model:**
- Layer 1: 10 Meta-Frames + 7 Meta-Topoi (theory-driven; do not modify)
- Layer 2: Topic overlays (optional, per project)
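
The two-layer idea can be sketched as a merge in which the overlay only adds to Layer 1 and never rewrites it. This is a minimal illustration, not the `Framebook` implementation: the function `merge_overlay`, the `{frame_name: {"keywords": [...]}}` layout, and all frame names below are assumptions made for the example.

```python
from copy import deepcopy

def merge_overlay(meta_frames, overlay):
    """Return a new frame dict: Layer 1 plus the overlay's extensions.

    Assumption: frames are plain dicts of the form
    {frame_name: {"keywords": [...]}}. The overlay may extend existing
    Meta-Frames or add topic frames; the inputs are never mutated.
    """
    merged = deepcopy(meta_frames)
    for name, spec in (overlay or {}).items():
        target = merged.setdefault(name, {"keywords": []})
        for kw in spec.get("keywords", []):
            if kw not in target["keywords"]:
                target["keywords"].append(kw)
    return merged

# Toy data; frame names and keywords are invented for illustration.
meta = {"scarcity": {"keywords": ["shortage"]}}
overlay = {
    "scarcity": {"keywords": ["waiting list"]},
    "cross_border": {"keywords": ["commute"]},
}

merged = merge_overlay(meta, overlay)
print(merged)
```

Passing `None` as the overlay returns an unchanged copy of Layer 1, which mirrors the "optional, per project" role of Layer 2.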

### Prerequisites

```bash
conda env create -f environment.yml
conda activate analyseinfrastruktur
python scripts/setup_nltk.py
python -m spacy download de_core_news_sm
```


---
## 1. Load Infrastructure


In [None]:
import sys, os
INFRA_ROOT = os.path.abspath('..')
if INFRA_ROOT not in sys.path:
    sys.path.insert(0, INFRA_ROOT)

from core.datamodel import Corpus, Document
from core.language import LanguageGate
from core.framebook import Framebook
from core.integration import Integrator
from core.justice import JusticeAnalyzer
from core.export import export_all
from modules import ModulNarrativ, ModulPosition, ModulDiskurs, ModulAffekt
from turn_splitter import split_long_turns
from diagnose import run_diagnose

print('✅ Infrastructure loaded.')


---
## 2. Settings

All project-specific parameters live in this cell.
For a new interview, this is the only cell you need to change.


In [None]:
# ── Project Settings ───────────────────────────────────
LANGUAGE          = 'en'           # Transcript language
TRANSCRIPT_FILE   = '../transkripte/Example Interview Transcript.txt'
INTERVIEWER       = {"Interviewer"} # Interviewer speaker name(s)
FRAMEBOOK_PATH    = '../config/framebook_v3.1.yaml'
OVERLAY_PATH      = '../overlays/housing_lux.yaml'  # None = Meta-Frames only
SPLIT_MAX_CHARS   = 800            # Max characters per turn after splitting
DOC_ID            = 'interview_01' # Document ID
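
Before running the pipeline it can help to confirm that the configured files exist. The sketch below uses a hypothetical helper, `missing_paths`, which is not part of the infrastructure; it skips `None` so that an unset overlay is not reported as missing.

```python
from pathlib import Path

def missing_paths(*paths):
    """Return the given paths that do not exist on disk; None entries are skipped."""
    return [str(p) for p in paths if p is not None and not Path(p).exists()]

# Example call with a placeholder path. In the notebook this would be:
# missing_paths(TRANSCRIPT_FILE, FRAMEBOOK_PATH, OVERLAY_PATH)
print(missing_paths(None, "definitely/not/here.yaml"))
```

An empty list means every configured file was found.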


---
## 3. Run Analysis

This cell executes the complete pipeline in the correct order:
Framebook → Document → Turn-Splitting → Modules A–D → Diagnostics → Integrator → Justice

**After a kernel restart:** rerun sections 1–3, then inspect the results in the sections below.
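
To illustrate the turn-splitting step, here is one way a long turn could be cut at sentence boundaries under a character limit. This is a sketch under stated assumptions: `split_text` is a hypothetical function and does not reproduce the logic of `split_long_turns`, which is not shown in this notebook.

```python
import re

def split_text(text, max_chars=800):
    """Split text into chunks of at most max_chars, breaking at sentence ends.

    A single sentence longer than max_chars is kept whole rather than
    cut mid-sentence, so a chunk can exceed the limit in that case.
    """
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ''
    for sentence in sentences:
        candidate = f'{current} {sentence}'.strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

print(split_text("One. Two. Three.", max_chars=9))
```

Splitting before the modules run keeps per-turn densities comparable, because very long turns would otherwise dominate any count-based measure.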


In [None]:
# ══════════════════════════════════════════════════════
#  MASTER START: Complete analysis in correct order
# ══════════════════════════════════════════════════════

from pathlib import Path

# 1) Framebook + LanguageGate
fb = Framebook(FRAMEBOOK_PATH, overlay=OVERLAY_PATH)
gate = LanguageGate(LANGUAGE)

print(f'Framebook: {fb}')
for k, v in fb.summary().items():
    print(f'  {k}: {v}')

# 2) Build document
text = Path(TRANSCRIPT_FILE).read_text(encoding='utf-8')
doc = Document.from_text(text, doc_id=DOC_ID, language=LANGUAGE)

# Define respondent turns
respondent = [t for t in doc.turns if getattr(t, 'sprecher', 'UNKNOWN') not in INTERVIEWER]
doc.get_befragte_turns = lambda: respondent
print(f'\nTurns: {len(doc.turns)} (Respondent: {len(respondent)})')

# 3) Turn splitting (BEFORE analysis!)
split_long_turns(doc, interviewer=INTERVIEWER, max_chars=SPLIT_MAX_CHARS)

# 4) Initialize modules + run analysis
mod_a = ModulNarrativ(gate, fb.textsorten, fb.prozessstrukturen)
mod_b = ModulPosition(gate, fb.pronomen, fb.agency)
mod_c = ModulDiskurs(gate, fb.frames, fb.topoi, fb.frame_spannungen,
                     fb.frame_priorities, fb.frame_conflicts)
mod_d = ModulAffekt(gate, fb.affekt_dimensionen)
modules = {'A': mod_a, 'B': mod_b, 'C': mod_c, 'D': mod_d}

doc.annotations = []
print('\nRunning analysis...')
for name, mod in modules.items():
    n = mod.analyse(doc)
    print(f'  Module {name} ({mod.name}): {n} annotations')
print(f'Total: {len(doc.annotations)}')

# 5) Diagnostics
report = run_diagnose(doc, modules)

# 6) Integrator
integrator = Integrator(doc, mod_a, mod_b, mod_c, mod_d)

# 7) JusticeAnalyzer
ja = JusticeAnalyzer(doc, mod_b, mod_c, mod_d,
                     fb.frame_priorities, fb.frame_conflicts,
                     framebook=fb)

print('\n✅ Analysis complete.')


---
## 4. Integrated Report


In [None]:
integrator.print_bericht()


---
## 5. Social (In)Justice Analysis

The JusticeAnalyzer computes tension profiles from the interplay of
Modules B (agency), C (frames), and D (affect). Social (in)justice is modeled
as the relation between aspiration frames (A) and structural frames (S).


In [None]:
ja.print_profil()
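
To make the A/S relation more concrete, the toy score below counts a tension only where both frame families occur in the same turn and normalizes it per 1,000 characters. This is not the formula used by `JusticeAnalyzer`: the function name, the use of `min`, and the inputs are all assumptions chosen for illustration.

```python
def tension_per_1000_chars(aspiration_hits, structural_hits, turn_length):
    """Toy tension score: co-occurring A and S frame hits per 1,000 characters.

    Assumption: tension only arises where both frame families are present,
    so the smaller of the two counts is used as the raw score.
    """
    if turn_length <= 0:
        return 0.0
    raw = min(aspiration_hits, structural_hits)
    return raw / turn_length * 1000

# A turn of 500 characters with 3 aspiration hits and 2 structural hits.
print(tension_per_1000_chars(3, 2, 500))
# A turn with aspiration hits only yields no tension under this assumption.
print(tension_per_1000_chars(3, 0, 500))
```

Normalizing by length keeps long and short turns comparable, which matters once turns have been split to different sizes.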


---
## 6. Inspect Individual Modules


### Module A: Narrative Structure


In [None]:
print('=== A: Text Type Sequence ===')
for row in mod_a.zusammenfassung(doc):
    ps = f' → {row["prozessstrukturen"]}' if row['prozessstrukturen'] != '-' else ''
    print(f'  Turn {row["turn_id"]}: {row["sequenz_kurz"]}  |  {row["sequenz"]}{ps}')

print('\n=== A: Turning Points ===')
for wp in mod_a.wendepunkt_kandidaten(doc):
    print(f'  Turn {wp["turn_id"]} (Score: {wp["score"]}): {wp["reasons"]}')


### Module B: Subject Positioning


In [None]:
print('=== B: Agency ===')
for row in mod_b.zusammenfassung(doc):
    print(f'  Turn {row["turn_id"]}: {row["dominant_agency"]} ({row["agency_dichte"]}%) | {row["pronomen"]}')


### Module C: Discursive Framing


In [None]:
print('=== C: Frames (Raw → Adjusted → Dominant) ===')
for row in mod_c.zusammenfassung(doc):
    if row['frames']:
        raw = row['frames']
        adj = row.get('frames_adjusted', raw)
        dom = row['dominant_frame']
        
        diffs = []
        for f in raw:
            r = raw[f]
            a = adj.get(f, r)
            if isinstance(a, float) and a < r:
                diffs.append(f'{f}: {r}→{a:.1f}')
        
        print(f'  Turn {row["turn_id"]}: {raw}')
        if diffs:
            print(f'    ⚖ Adjusted: {diffs}')
        print(f'    ★ Dominant: {dom}')

print('\n=== C: Overall Distribution (Raw vs. Adjusted) ===')
from collections import Counter
raw_total = Counter()
adj_total = Counter()
for row in mod_c.zusammenfassung(doc):
    for f, c in row['frames'].items():
        raw_total[f] += c
    for f, c in row.get('frames_adjusted', row['frames']).items():
        adj_total[f] += c

total_raw = sum(raw_total.values())
total_adj = sum(adj_total.values())
print(f'{"Frame":<35} {"Raw":>5} {"Raw%":>6} {"Adj":>6} {"Adj%":>6}  {"Δ":>5}')
print('─' * 70)
for f in sorted(raw_total, key=raw_total.get, reverse=True):
    r = raw_total[f]
    a = adj_total.get(f, r)
    rp = r / total_raw * 100 if total_raw else 0
    ap = a / total_adj * 100 if total_adj else 0
    delta = ap - rp
    marker = ' ▼' if delta < -1 else ' ▲' if delta > 1 else ''
    print(f'  {f:<33} {r:>5} {rp:>5.1f}% {a:>6.1f} {ap:>5.1f}%  {delta:>+.1f}{marker}')

print('\n=== C: Claims ===')
for c in mod_c.generate_claims(doc):
    print(f'  [{c["typ"]}] {c["beschreibung"]}')
    print(f'    {c["prueffrage"]}')
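
When reading the Raw vs. Adjusted table, note that shares are relative: downweighting one frame raises the share of every other frame even if their counts are unchanged. The sketch below reproduces the percentage logic on toy counts; `share_shift` and the frame names are hypothetical and not part of Module C.

```python
def share_shift(raw_counts, adjusted_counts):
    """Return {frame: (raw %, adjusted %, delta in percentage points)}."""
    total_raw = sum(raw_counts.values())
    total_adj = sum(adjusted_counts.values())
    result = {}
    for frame, raw in raw_counts.items():
        adj = adjusted_counts.get(frame, raw)
        raw_pct = raw / total_raw * 100 if total_raw else 0.0
        adj_pct = adj / total_adj * 100 if total_adj else 0.0
        result[frame] = (raw_pct, adj_pct, adj_pct - raw_pct)
    return result

# Toy counts: only "market" is downweighted, "rights" keeps its count.
raw = {"market": 6, "rights": 4}
adjusted = {"market": 3.0, "rights": 4.0}

for frame, (rp, ap, delta) in share_shift(raw, adjusted).items():
    print(f'{frame:<10} {rp:5.1f}% {ap:5.1f}% {delta:+.1f}')
```

In this toy case the share of "rights" rises although its count stays at 4, so a ▲ marker in the table does not by itself mean a frame was upweighted.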


### Module D: Affective Dimension


In [None]:
print('=== D: Affective Condensation Points ===')
for s in mod_d.verdichtungsstellen(doc):
    print(f'  Turn {s["turn_id"]} (Score: {s["score"]}, Density: {s["marker_dichte"]}%): {s["reasons"]}')


### Audit Trail


In [None]:
print('=== Audit Trail (first 10 annotations) ===')
for i, a in enumerate(doc.annotations[:10]):
    print(f'[{i+1}] {a.modul} | {a.kategorie} | Match: "{a.matched_text}" | Rule: {a.regel_id}')
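
Beyond the first ten entries, it is often useful to see which rules fire most often. The sketch below groups annotations by module and rule ID. It relies only on the attribute names printed above (`modul`, `regel_id`); the stand-in objects and their values are invented, and `rule_counts` is a hypothetical helper.

```python
from collections import Counter
from types import SimpleNamespace

def rule_counts(annotations):
    """Count annotations per (modul, regel_id) pair."""
    return Counter((a.modul, a.regel_id) for a in annotations)

# Stand-ins with the same attribute names as the real annotations.
sample = [
    SimpleNamespace(modul='C', kategorie='frame_x', matched_text='rent', regel_id='C-01'),
    SimpleNamespace(modul='C', kategorie='frame_x', matched_text='price', regel_id='C-01'),
    SimpleNamespace(modul='D', kategorie='affect_y', matched_text='afraid', regel_id='D-07'),
]

for (modul, regel_id), n in rule_counts(sample).most_common():
    print(f'{modul} | {regel_id}: {n}')
```

In the notebook, the call would be `rule_counts(doc.annotations)`. A rule that dominates the counts is a natural first candidate for manual review.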


---
## 7. Visualization


In [None]:
import matplotlib.pyplot as plt
import pandas as pd
from collections import Counter, defaultdict

vollbericht = integrator.vollbericht()
profiles = vollbericht['turn_profile']

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# 1) Affective condensation
tids = [p['turn_id'] for p in profiles]
aff = [p['affekt_dichte'] for p in profiles]
# default=0 avoids a ValueError from max() when there are no turn profiles
colors = plt.cm.YlOrRd([d / max(max(aff, default=0), 1) for d in aff])
axes[0,0].bar(tids, aff, color=colors, edgecolor='gray')
axes[0,0].set_title('Affective Condensation')
axes[0,0].set_xlabel('Turn')

# 2) Annotations per module
mcounts = {}
for a in doc.annotations:
    mcounts[a.modul] = mcounts.get(a.modul, 0) + 1
axes[0,1].barh(list(mcounts.keys()), list(mcounts.values()), color='steelblue')
axes[0,1].set_title('Annotations per Module')

# 3) Discursive framing
turn_counts = defaultdict(Counter)
for a in doc.annotations:
    m = str(getattr(a, "modul", ""))
    if not m.startswith("C"):
        continue
    cat = getattr(a, "kategorie", None)
    tid = getattr(a, "turn_id", None)
    if cat and tid is not None:
        turn_counts[int(tid)][str(cat)] += 1

sequence = [{"turn_id": tid, **turn_counts[tid]} for tid in sorted(turn_counts)]
if sequence:
    # A frame absent from a turn shows up as NaN; count it as 0 for stacking.
    df_f = pd.DataFrame(sequence).set_index('turn_id').fillna(0)
    df_f.plot(kind='bar', stacked=True, ax=axes[1,0], colormap='Set2', legend=False)
    axes[1,0].legend(fontsize=7)
else:
    axes[1,0].text(0.5, 0.5, 'No frame data', ha='center', va='center')
axes[1,0].set_title('Discursive Framing')

# 4) Justice tension profile
jp = ja.turn_profiles()
# Sort once by normalized intensity and reuse the result for labels and values
justice_turns = sorted((p for p in jp if p['is_justice_site']),
                       key=lambda p: p['intensity_norm'])
if justice_turns:
    axes[1,1].barh(
        [f'Turn {p["turn_id"]}' for p in justice_turns],
        [p['intensity_norm'] for p in justice_turns],
        color='coral'
    )
    axes[1,1].set_title('(In)Justice Tension Intensity (/1000 chars)')
else:
    axes[1,1].text(0.5, 0.5, 'No justice sites', ha='center', va='center')
    axes[1,1].set_title('(In)Justice Tension Intensity')

plt.suptitle(f'{doc.doc_id} – Analysis Overview', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()


---
## 8. Export


In [None]:
corpus = Corpus(name='project')
corpus.add(doc)
export_all(corpus, modules, output_dir='../output')


---
## 9. Batch Mode (optional)

To process several interviews, uncomment the cell below and adjust the paths.
It reuses the settings from section 2 and the `modules` from section 3.


In [None]:
# import glob
# corpus_batch = Corpus(name='batch')
# for fp in sorted(glob.glob('../transkripte/*.txt')):
#     text = Path(fp).read_text(encoding='utf-8')  # Path is imported in section 3
#     d = Document.from_text(text, doc_id=Path(fp).stem, language=LANGUAGE)
#     split_long_turns(d, interviewer=INTERVIEWER, max_chars=SPLIT_MAX_CHARS)
#     d.annotations = []
#     for mod in modules.values():
#         mod.analyse(d)
#     corpus_batch.add(d)
#     print(f'  {d.doc_id}: {len(d.annotations)} annotations')
# export_all(corpus_batch, modules, output_dir='../output')
