# Analyseinfrastruktur v5 – Demo Notebook

**Theory-driven qualitative interview analysis with a meta-frame architecture**

This notebook demonstrates the complete analysis pipeline:
1. **Module A** – narrative structure (Schütze, Ricoeur)
2. **Module B** – subject positioning (Bamberg, Lucius-Hoene)
3. **Module C** – discursive framing with meta-frames (Foucault, Goffman, Entman)
4. **Module D** – affective dimension (Ahmed, Massumi)
5. **JusticeAnalyzer** – (in)justice tension profiles

### Architecture

```
config/
  framebook_v3.1.yaml        ← meta-frames (universal, topic-independent)
overlays/
  housing_lux.yaml           ← project-specific extension (optional)
```

**Two-level model:**
- Level 1: 10 meta-frames + 7 meta-topoi (theory-driven; never modify)
- Level 2: topic overlays (optional; can be switched on per project)
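
The following sketch shows one way the two levels could be combined. It assumes both YAML files have already been parsed into plain dicts that map a frame ID to a list of indicator terms. The function `merge_overlay`, the dict layout, the frame names, and the rule that an overlay may only extend existing meta-frames are all hypothetical and are not the behavior of `core.framebook.Framebook`. The point it illustrates is that level 1 is copied and never edited in place.

```python
from __future__ import annotations

from copy import deepcopy
from typing import Optional


def merge_overlay(meta_frames: dict[str, list[str]],
                  overlay: Optional[dict[str, list[str]]]) -> dict[str, list[str]]:
    """Return a merged copy; the level-1 meta-frames are never modified in place."""
    merged = deepcopy(meta_frames)
    if overlay is None:                       # no overlay: meta-frames only
        return merged
    for frame_id, extra_terms in overlay.items():
        if frame_id not in merged:            # in this sketch overlays extend, never invent frames
            raise KeyError(f"overlay refers to unknown meta-frame: {frame_id}")
        for term in extra_terms:
            if term not in merged[frame_id]:  # skip duplicate indicators
                merged[frame_id].append(term)
    return merged


meta = {"market": ["price", "demand"], "rights": ["entitled", "fair"]}  # made-up frames
housing = {"market": ["rent", "landlord"]}                              # made-up overlay
print(merge_overlay(meta, housing)["market"])  # ['price', 'demand', 'rent', 'landlord']
print(meta["market"])                          # ['price', 'demand']
```

Passing `None` as the overlay mirrors `OVERLAY_PFAD = None` in the settings: the result is simply a copy of the meta-frames.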

### Prerequisites

```bash
conda env create -f environment.yml
conda activate analyseinfrastruktur
python scripts/setup_nltk.py
python -m spacy download de_core_news_sm
```


---
## 1. Load the infrastructure


```python
import os
import sys

# Make the infrastructure root (one level above this notebook) importable
INFRA_ROOT = os.path.abspath('..')
if INFRA_ROOT not in sys.path:
    sys.path.insert(0, INFRA_ROOT)

from core.datamodel import Corpus, Document
from core.language import LanguageGate
from core.framebook import Framebook
from core.integration import Integrator
from core.justice import JusticeAnalyzer
from core.export import export_all
from modules import ModulNarrativ, ModulPosition, ModulDiskurs, ModulAffekt
from turn_splitter import split_long_turns
from diagnose import run_diagnose

print('✅ Infrastructure loaded.')
```


---
## 2. Settings

All project-specific parameters in one place.
For a new interview, only this cell needs to be changed.


```python
# ── Project settings ──────────────────────────────────
SPRACHE           = 'en'           # language of the transcript
TRANSKRIPT_DATEI  = '../transkripte/Example Interview Transcript.txt'
INTERVIEWER       = {"Interviewer"} # speaker name(s) of the interviewer(s)
FRAMEBOOK_PFAD    = '../config/framebook_v3.1.yaml'
OVERLAY_PFAD      = '../overlays/housing_lux.yaml'  # None = meta-frames only
SPLIT_MAX_CHARS   = 800            # max. characters per turn after splitting
DOC_ID            = 'interview_01' # document ID
```
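
Because every later cell depends on these values, a quick sanity check can catch typos early. The helper `check_settings` below is hypothetical and not part of the infrastructure. The set of supported languages (`"de"`, `"en"`) is an assumption, so adjust it to whatever `LanguageGate` actually accepts.

```python
from pathlib import Path


def check_settings(sprache, transkript_datei, framebook_pfad, overlay_pfad,
                   split_max_chars, supported=("de", "en")):
    """Collect configuration problems; an empty list means the settings look usable."""
    problems = []
    if sprache not in supported:              # `supported` is an assumed default
        problems.append(f"unsupported language: {sprache!r}")
    for label, path in (("transcript", transkript_datei), ("framebook", framebook_pfad)):
        if not Path(path).is_file():
            problems.append(f"missing {label} file: {path}")
    if overlay_pfad is not None and not Path(overlay_pfad).is_file():  # None is allowed
        problems.append(f"missing overlay file: {overlay_pfad}")
    if not isinstance(split_max_chars, int) or split_max_chars <= 0:
        problems.append("SPLIT_MAX_CHARS must be a positive integer")
    return problems
```

In the notebook it could be called right after the settings cell, e.g. `for p in check_settings(SPRACHE, TRANSKRIPT_DATEI, FRAMEBOOK_PFAD, OVERLAY_PFAD, SPLIT_MAX_CHARS): print('⚠', p)`.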


---
## 3. Run the analysis

This cell runs the complete pipeline in the correct order:
Framebook → Document → turn splitting → modules A–D → diagnostics → Integrator → JusticeAnalyzer

**After a kernel restart:** run cells 1–3, then inspect the results in the cells below.


```python
# ══════════════════════════════════════════════════════
#  MASTER START: complete analysis in the correct order
# ══════════════════════════════════════════════════════

from pathlib import Path

# 1) Framebook + LanguageGate
fb = Framebook(FRAMEBOOK_PFAD, overlay=OVERLAY_PFAD)
gate = LanguageGate(SPRACHE)

print(f'Framebook: {fb}')
for k, v in fb.summary().items():
    print(f'  {k}: {v}')

# 2) Build the Document
text = Path(TRANSKRIPT_DATEI).read_text(encoding='utf-8')
doc = Document.from_text(text, doc_id=DOC_ID, language=SPRACHE)

# Respondent turns: filtered lazily, so every call sees the current doc.turns
# (i.e. the turns after splitting, not a list frozen before step 3)
doc.get_befragte_turns = lambda d=doc: [
    t for t in d.turns if getattr(t, 'sprecher', 'UNKNOWN') not in INTERVIEWER
]

# 3) Turn splitting (BEFORE the analysis!)
split_long_turns(doc, interviewer=INTERVIEWER, max_chars=SPLIT_MAX_CHARS)
print(f'\nTurns after splitting: {len(doc.turns)} (respondent: {len(doc.get_befragte_turns())})')

# 4) Initialize the modules and run the analysis
mod_a = ModulNarrativ(gate, fb.textsorten, fb.prozessstrukturen)
mod_b = ModulPosition(gate, fb.pronomen, fb.agency)
mod_c = ModulDiskurs(gate, fb.frames, fb.topoi, fb.frame_spannungen,
                     fb.frame_priorities, fb.frame_conflicts)
mod_d = ModulAffekt(gate, fb.affekt_dimensionen)
module = {'A': mod_a, 'B': mod_b, 'C': mod_c, 'D': mod_d}

doc.annotations = []  # start from an empty annotation list
print('\nAnalysis...')
for name, mod in module.items():
    n = mod.analyse(doc)
    print(f'  Module {name} ({mod.name}): {n} annotations')
print(f'Total: {len(doc.annotations)}')

# 5) Diagnostics
report = run_diagnose(doc, module)

# 6) Integrator
integrator = Integrator(doc, mod_a, mod_b, mod_c, mod_d)

# 7) JusticeAnalyzer
ja = JusticeAnalyzer(doc, mod_b, mod_c, mod_d,
                     fb.frame_priorities, fb.frame_conflicts,
                     framebook=fb)

print('\n✅ Analysis complete.')
```
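
The implementation of `split_long_turns` is not shown here. As a rough illustration of what splitting at `SPLIT_MAX_CHARS` involves, the sketch below greedily packs whole sentences of a respondent turn into chunks of at most `max_chars` characters and leaves interviewer turns untouched. The `Turn` dataclass, the functions `split_turn` and `split_respondent_turns`, the sentence regex, and the `"2.1"`-style IDs are hypothetical stand-ins. The real splitter may choose other boundaries or keep more metadata.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")   # whitespace following ., ! or ?


@dataclass
class Turn:
    """Hypothetical stand-in for a turn object from core.datamodel."""
    turn_id: str
    sprecher: str
    text: str


def split_turn(turn: Turn, max_chars: int) -> list[Turn]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    if len(turn.text) <= max_chars:
        return [turn]
    chunks, current = [], ""
    for sentence in SENTENCE_END.split(turn.text.strip()):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)   # close the chunk before it overflows
            current = sentence       # a single over-long sentence is kept whole
        else:
            current = candidate
    if current:
        chunks.append(current)
    return [Turn(f"{turn.turn_id}.{i}", turn.sprecher, c) for i, c in enumerate(chunks, 1)]


def split_respondent_turns(turns: list[Turn], interviewer: set[str], max_chars: int) -> list[Turn]:
    """Split respondent turns only; interviewer turns pass through unchanged."""
    out: list[Turn] = []
    for t in turns:
        out.extend([t] if t.sprecher in interviewer else split_turn(t, max_chars))
    return out


turns = [Turn("1", "Interviewer", "How did the move go?"),
         Turn("2", "R", "Aaaa aaaa. Bbbb bbbb. Cccc cccc.")]
for t in split_respondent_turns(turns, {"Interviewer"}, max_chars=22):
    print(t.turn_id, repr(t.text))
```

Splitting before the modules run matters because per-turn densities and scores are computed on whatever turn units exist at analysis time.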


---
## 4. Integrated report


```python
integrator.print_bericht()
```


---
## 5. (In)justice analysis

Computes tension profiles from the interplay of Module B (agency),
Module C (frames), and Module D (affect). Social (in)justice is modeled as a
relation between claim frames (A) and structure frames (S); here A and S label frame types, not modules.
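
The A/S relation can be made concrete with a deliberately simple sketch. This is not the scoring used by `JusticeAnalyzer`, which also draws on agency and affect from Modules B and D. Here a turn counts as a justice site when it contains at least one claim-frame hit and at least one structure-frame hit, and its intensity is the number of those hits per 1,000 characters. The frame names, the partition into `CLAIM_FRAMES`/`STRUCTURE_FRAMES`, and the scoring rule are assumptions. Only the output keys `is_justice_site` and `intensity_norm` mirror the fields read in the visualization cell.

```python
from collections import Counter

# Hypothetical partition of frames into the two poles of the relation
CLAIM_FRAMES = {"rights", "fairness"}          # A: claim frames (made-up names)
STRUCTURE_FRAMES = {"market", "bureaucracy"}   # S: structure frames (made-up names)


def justice_profile(frame_hits: Counter, n_chars: int) -> dict:
    """Score one turn from its frame-hit counts and its length in characters."""
    a_hits = sum(frame_hits[f] for f in CLAIM_FRAMES)
    s_hits = sum(frame_hits[f] for f in STRUCTURE_FRAMES)
    is_site = a_hits > 0 and s_hits > 0        # tension requires both poles
    intensity = (a_hits + s_hits) * 1000 / n_chars if is_site and n_chars else 0.0
    return {"a_hits": a_hits, "s_hits": s_hits,
            "is_justice_site": is_site, "intensity_norm": round(intensity, 2)}


print(justice_profile(Counter(rights=2, market=1), n_chars=600))
# {'a_hits': 2, 's_hits': 1, 'is_justice_site': True, 'intensity_norm': 5.0}
```

Normalizing by turn length keeps long narrative turns from dominating the profile simply because they contain more words.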


```python
ja.print_profil()
```


---
## 6. Inspect individual modules


### Module A: Narrative structure


```python
print('=== A: Text-type sequence ===')
for row in mod_a.zusammenfassung(doc):
    # Append process structures only when the module reports any ('-' = none)
    ps = f' → {row["prozessstrukturen"]}' if row['prozessstrukturen'] != '-' else ''
    print(f'  Turn {row["turn_id"]}: {row["sequenz_kurz"]}  |  {row["sequenz"]}{ps}')

print('\n=== A: Turning points ===')
for wp in mod_a.wendepunkt_kandidaten(doc):
    print(f'  Turn {wp["turn_id"]} (score: {wp["score"]}): {wp["reasons"]}')
```


### Module B: Subject positioning


```python
print('=== B: Agency ===')
for row in mod_b.zusammenfassung(doc):
    print(f'  Turn {row["turn_id"]}: {row["dominant_agency"]} ({row["agency_dichte"]}%) | {row["pronomen"]}')
```
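
Module B reports pronoun usage per turn. As a rough illustration of the kind of surface cue involved, the sketch below counts English first-person singular, first-person plural, and third-person plural forms. The pronoun lists, the group labels, and `pronoun_profile` itself are hypothetical and do not reflect the contents of the framebook's `pronomen` section. Such counts are only a starting point for reading positioning, not the analysis itself.

```python
import re
from collections import Counter

# Hypothetical English pronoun groups (not the framebook's `pronomen` section)
PRONOUN_GROUPS = {
    "1sg": {"i", "me", "my", "mine", "myself"},
    "1pl": {"we", "us", "our", "ours", "ourselves"},
    "3pl": {"they", "them", "their", "theirs", "themselves"},
}


def pronoun_profile(text: str) -> Counter:
    """Count pronoun-group hits; a contraction like "I'm" yields the token "i"."""
    counts = Counter()
    for token in re.findall(r"[a-z]+", text.lower()):
        for group, forms in PRONOUN_GROUPS.items():
            if token in forms:
                counts[group] += 1
    return counts


print(pronoun_profile("I told them we would stay, but they said my rent goes up."))
```

A shift from "1sg" towards "3pl" across turns can hint at agency being attributed to others, which is what the `dominant_agency` column summarizes in more detail.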


### Module C: Discursive framing


```python
from collections import Counter

# Compute the summary once; it is needed for both tables below
rows_c = list(mod_c.zusammenfassung(doc))

print('=== C: Frames (Raw → Adjusted → Dominant) ===')
for row in rows_c:
    if not row['frames']:
        continue
    raw = row['frames']
    adj = row.get('frames_adjusted', raw)
    dom = row['dominant_frame']

    # List only the frames whose adjusted count was lowered
    diffs = []
    for f, r in raw.items():
        a = adj.get(f, r)
        if isinstance(a, float) and a < r:
            diffs.append(f'{f}: {r}→{a:.1f}')

    print(f'  Turn {row["turn_id"]}: {raw}')
    if diffs:
        print(f'    ⚖ Adjusted: {diffs}')
    print(f'    ★ Dominant: {dom}')

print('\n=== C: Overall distribution (Raw vs. Adjusted) ===')
raw_total = Counter()
adj_total = Counter()
for row in rows_c:
    for f, c in row['frames'].items():
        raw_total[f] += c
    for f, c in row.get('frames_adjusted', row['frames']).items():
        adj_total[f] += c

total_raw = sum(raw_total.values())
total_adj = sum(adj_total.values())
print(f'{"Frame":<35} {"Raw":>5} {"Raw%":>6} {"Adj":>6} {"Adj%":>6}  {"Δ":>5}')
print('─' * 70)
for f in sorted(raw_total, key=raw_total.get, reverse=True):
    r = raw_total[f]
    a = adj_total.get(f, r)
    rp = r / total_raw * 100 if total_raw else 0
    ap = a / total_adj * 100 if total_adj else 0
    delta = ap - rp  # shift in percentage points; ▼/▲ mark shifts larger than 1 pp
    marker = ' ▼' if delta < -1 else ' ▲' if delta > 1 else ''
    print(f'  {f:<33} {r:>5} {rp:>5.1f}% {a:>6.1f} {ap:>5.1f}%  {delta:>+.1f}{marker}')

print('\n=== C: Claims ===')
for c in mod_c.generate_claims(doc):
    print(f'  [{c["typ"]}] {c["beschreibung"]}')
    print(f'    {c["prueffrage"]}')
```
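
The Δ column is a shift in percentage points between a frame's share of all raw hits and its share of all adjusted hits. So a frame can gain share even though its own count did not change, because another frame was down-weighted. The standalone function `share_shift` below (a hypothetical helper) reproduces that arithmetic for plain dicts, with a `.get(frame, raw)` fallback similar to the one in the cell above. The frame names are made up.

```python
from __future__ import annotations


def share_shift(raw: dict[str, float], adjusted: dict[str, float]) -> dict[str, float]:
    """Percentage-point change of each frame's share between raw and adjusted counts."""
    # Frames missing from `adjusted` keep their raw count
    adj = {f: adjusted.get(f, r) for f, r in raw.items()}
    total_raw = sum(raw.values())
    total_adj = sum(adj.values())
    shift = {}
    for f, r in raw.items():
        rp = r / total_raw * 100 if total_raw else 0.0
        ap = adj[f] / total_adj * 100 if total_adj else 0.0
        shift[f] = round(ap - rp, 1)
    return shift


print(share_shift({"market": 6, "rights": 4}, {"market": 6, "rights": 2.0}))
# {'market': 15.0, 'rights': -15.0}
```

Here "market" moves from 60 % to 75 % (6 of 8 adjusted hits) purely because "rights" was halved from 4 to 2.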


### Module D: Affective dimension


```python
print('=== D: Affective condensation ===')
for s in mod_d.verdichtungsstellen(doc):
    print(f'  Turn {s["turn_id"]} (score: {s["score"]}, density: {s["marker_dichte"]}%): {s["reasons"]}')
```
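
Module D reports a `marker_dichte` percentage per condensation site. Assuming this is the share of affect-marker tokens among all word tokens of a turn (the exact definition is not given here), the calculation could look like the sketch below. The marker list and the function `marker_density` are hypothetical and purely lexical, whereas the module itself works with the framebook's `affekt_dimensionen`.

```python
import re

# Hypothetical affect markers; Module D takes its markers from fb.affekt_dimensionen
AFFECT_MARKERS = {"afraid", "angry", "ashamed", "desperate", "relieved", "worried"}


def marker_density(text: str) -> float:
    """Affect markers as a percentage of the word tokens in a turn."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(token in AFFECT_MARKERS for token in tokens)
    return round(hits / len(tokens) * 100, 1)


print(marker_density("I was worried, honestly afraid we would lose the flat."))  # 20.0
```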


### Audit Trail


```python
print('=== Audit trail (first 10 annotations) ===')
for i, a in enumerate(doc.annotations[:10], start=1):
    print(f'[{i}] {a.modul} | {a.kategorie} | match: "{a.matched_text}" | rule: {a.regel_id}')
```
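
Beyond the first ten entries, it is often useful to see which rules fire most often, for example to spot an over-eager indicator. The helper `top_rules` below is hypothetical. It reads only the `modul` and `regel_id` attributes that the audit trail already prints, and the `Annotation` dataclass is a stand-in so the example runs without the infrastructure. The rule IDs in the sample are made up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical stand-in exposing the attributes printed in the audit trail."""
    modul: str
    kategorie: str
    matched_text: str
    regel_id: str


def top_rules(annotations, n=5):
    """Return the n most frequent (modul, regel_id) pairs with their counts."""
    return Counter((a.modul, a.regel_id) for a in annotations).most_common(n)


sample = [
    Annotation("C", "market", "rent", "C-01"),
    Annotation("C", "market", "price", "C-01"),
    Annotation("D", "fear", "afraid", "D-07"),
]
print(top_rules(sample))  # [(('C', 'C-01'), 2), (('D', 'D-07'), 1)]
```

Since `top_rules` relies only on those two attributes, it should also work directly on `doc.annotations`, e.g. `top_rules(doc.annotations, 10)`.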


---
## 7. Visualization


```python
import matplotlib.pyplot as plt
import pandas as pd
from collections import Counter, defaultdict

vollbericht = integrator.vollbericht()
profiles = vollbericht['turn_profile']

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# 1) Affective condensation per turn (colors scaled to the highest density, at least 1)
tids = [p['turn_id'] for p in profiles]
aff = [p['affekt_dichte'] for p in profiles]
vmax = max(max(aff, default=0), 1)  # `default` avoids a ValueError if there are no profiles
farben = plt.cm.YlOrRd([d / vmax for d in aff])
axes[0, 0].bar(tids, aff, color=farben, edgecolor='gray')
axes[0, 0].set_title('Affective condensation')
axes[0, 0].set_xlabel('Turn')

# 2) Annotations per module
mcounts = Counter(a.modul for a in doc.annotations)
axes[0, 1].barh(list(mcounts.keys()), list(mcounts.values()), color='steelblue')
axes[0, 1].set_title('Annotations per module')

# 3) Discursive framing: Module C categories stacked per turn
turn_counts = defaultdict(Counter)
for a in doc.annotations:
    if not str(getattr(a, 'modul', '')).startswith('C'):
        continue
    cat = getattr(a, 'kategorie', None)
    tid = getattr(a, 'turn_id', None)
    if cat and tid is not None:
        turn_counts[int(tid)][str(cat)] += 1

verlauf = [{'turn_id': tid, **turn_counts[tid]} for tid in sorted(turn_counts)]
df_num = pd.DataFrame()
if verlauf:
    # Categories absent from a turn become NaN; fill with 0 before stacking
    df_num = pd.DataFrame(verlauf).set_index('turn_id').fillna(0).select_dtypes(include='number')
if not df_num.empty:
    df_num.plot(kind='bar', stacked=True, ax=axes[1, 0], colormap='Set2')
    axes[1, 0].legend(fontsize=7)
else:
    axes[1, 0].text(0.5, 0.5, 'No Module C data', ha='center', va='center')
axes[1, 0].set_title('Discursive framing')

# 4) (In)justice tension profile: justice sites sorted by normalized intensity
jp = ja.turn_profiles()
justice_turns = sorted((p for p in jp if p['is_justice_site']),
                       key=lambda p: p['intensity_norm'])
if justice_turns:
    axes[1, 1].barh([f'Turn {p["turn_id"]}' for p in justice_turns],
                    [p['intensity_norm'] for p in justice_turns],
                    color='coral')
    axes[1, 1].set_title('(In)justice intensity (per 1,000 characters)')
else:
    axes[1, 1].text(0.5, 0.5, 'No justice sites', ha='center', va='center')
    axes[1, 1].set_title('(In)justice intensity')

plt.suptitle(f'{doc.doc_id} – Analysis overview', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
```


---
## 8. Export


```python
# Wrap the single document in a corpus and write all exports
corpus = Corpus(name='projekt')
corpus.add(doc)
export_all(corpus, module, output_dir='../output')
```


---
## 9. Batch mode (optional)

For several interviews: uncomment this cell and adjust the paths. It reuses the `module` dict created in section 3.


```python
# from pathlib import Path
# corpus_batch = Corpus(name='batch')
# for fp in sorted(Path('../transkripte').glob('*.txt')):
#     d = Document.from_text(fp.read_text(encoding='utf-8'), doc_id=fp.stem, language=SPRACHE)
#     # Same lazy respondent filter as in the master cell
#     d.get_befragte_turns = lambda d=d: [
#         t for t in d.turns if getattr(t, 'sprecher', 'UNKNOWN') not in INTERVIEWER
#     ]
#     split_long_turns(d, interviewer=INTERVIEWER, max_chars=SPLIT_MAX_CHARS)
#     d.annotations = []
#     for mod in module.values():
#         mod.analyse(d)
#     corpus_batch.add(d)
#     print(f'  {d.doc_id}: {len(d.annotations)} annotations')
# export_all(corpus_batch, module, output_dir='../output')
```
