# Summary Generation
in this notebook we import the summarisation functions from acled_tools.py to generate numerous summaries to get a feel of the method



In [None]:
import acled_tools as at

URI, USER, PWD = "bolt://localhost:7687", "neo4j", "password"



In [2]:
# ── GENERATE JAN–APR VAC BRIEFS & VERIFY CITATIONS ────────────────
store_vac  = {}
prev_sev   = None

for m in range(1, 5):
    prev_summary = store_vac.get(m-1, "")

    n_events, fatalities, severity = at.month_metrics(2025, m, "Violence against civilians",
                                                      uri=URI, user=USER, pwd=PWD)
    delta_sev = ((severity - prev_sev) / prev_sev) if prev_sev not in (None, 0) else 0
    metrics = (f"Events {n_events}, fatalities {fatalities}, severity {severity:.1f}, "
               f"Δ {delta_sev:+.0%} vs prev.")

    vac_summary, _ = at.summarize_events(
        URI, USER, PWD,
        country      = "Mexico",
        event_type   = "Violence against civilians",
        month        = m, year = 2025,
        context      = prev_summary,
        style        = "etype_general",
        metrics_block=metrics,
        max_results  = 1000,          # trimmed list is plenty
        temperature  = 0.3, max_tokens = 1000
    )

    # ---- verify citations right here ----------------------------
    print(f"\n### VAC – {m:02d}/2025 ###\n")
    print(vac_summary)
    print("\nCitation check:")
    at.verify_citations(
        summary_text = vac_summary,
        uri   = URI, user = USER, pwd = PWD,
        year  = 2025, month = m, country = "Mexico", event_type="Violence against civilians"
    )
    print("═"*70, "\n")

    # cache for next iteration
    store_vac[m] = vac_summary
    prev_sev     = severity


### VAC – 01/2025 ###

### **Violence Against Civilians in Mexico – January 2025**

#### **Overall Pattern**
Violence against civilians in January 2025 continued to be marked by high levels of targeted killings, often involving dismemberment, torture, and the use of threatening messages. The month saw a notable increase in attacks linked to organized crime, particularly in states such as Sinaloa, Tabasco, and Guanajuato. Many incidents involved drive-by shootings, abductions, and targeted assassinations of individuals linked to criminal groups, local authorities, or community leaders.

#### **Key Actors and Tactics**
- **Organized Crime Groups**: The Jalisco New Generation Cartel (CJNG) and factions of the Sinaloa Cartel (Los Chapitos, Los Mayitos) were prominent in January, engaging in territorial disputes and retaliatory attacks. The CJNG was particularly active in Tabasco (e.g., ID=MEX97909, ID=MEX98889), while Sinaloa saw intense violence between rival factions (e.g., ID=MEX97404,

In [2]:
# ── GENERATE JAN–APR VAC BRIEFS & VERIFY CITATIONS ────────────────
store_vac  = {}
prev_sev   = None

for m in range(1, 5):
    prev_summary = store_vac.get(m-1, "")

    n_events, fatalities, severity = at.month_metrics(2025, m, "Violence against civilians",
                                                      uri=URI, user=USER, pwd=PWD)
    delta_sev = ((severity - prev_sev) / prev_sev) if prev_sev not in (None, 0) else 0
    metrics = (f"Events {n_events}, fatalities {fatalities}, severity {severity:.1f}, "
               f"Δ {delta_sev:+.0%} vs prev.")

    vac_summary, _ = at.summarize_events(
        URI, USER, PWD,
        country      = "Mexico",
        event_type   = "Violence against civilians",
        month        = m, year = 2025,
        context      = prev_summary,
        style        = "etype_general",
        metrics_block=metrics,
        max_results  = 1000,          # trimmed list is plenty
        temperature  = 0, max_tokens = 1000
    )

    # ---- verify citations right here ----------------------------
    print(f"\n### VAC – {m:02d}/2025 ###\n")
    print(vac_summary)
    print("\nCitation check:")
    at.verify_citations(
        summary_text = vac_summary,
        uri   = URI, user = USER, pwd = PWD,
        year  = 2025, month = m, country = "Mexico", event_type="Violence against civilians"
    )
    print("═"*70, "\n")

    # cache for next iteration
    store_vac[m] = vac_summary
    prev_sev     = severity


### VAC – 01/2025 ###

### Brief on Violence Against Civilians in Mexico (January 2025)

#### **Overall Pattern of Violence Against Civilians**
Violence against civilians in January 2025 continued to be marked by high levels of targeted killings, often involving dismemberment, torture, and the use of threatening messages. The month saw a significant number of fatalities, with multiple incidents involving multiple victims. The use of firearms, particularly in drive-by shootings, remained a common tactic. Additionally, there was a notable increase in the targeting of local officials, community leaders, and individuals linked to organized crime, indicating a pattern of violence aimed at intimidation and control.

#### **Key Actors and Changes in Tactics or Targets**
- **Criminal Groups**: The Jalisco New Generation Cartel (CJNG) and other criminal organizations continued to be prominent actors, often leaving messages to assert dominance or retaliate against rivals. For example, ID=MEX979

In [3]:
# ── GENERATE JAN–APR VAC BRIEFS & VERIFY CITATIONS ────────────────
store_vac  = {}
prev_sev   = None

for m in range(1, 5):
    prev_summary = store_vac.get(m-1, "")

    n_events, fatalities, severity = at.month_metrics(2025, m, "Violence against civilians",
                                                      uri=URI, user=USER, pwd=PWD)
    delta_sev = ((severity - prev_sev) / prev_sev) if prev_sev not in (None, 0) else 0
    metrics = (f"Events {n_events}, fatalities {fatalities}, severity {severity:.1f}, "
               f"Δ {delta_sev:+.0%} vs prev.")

    vac_summary, _ = at.summarize_events(
        URI, USER, PWD,
        country      = "Mexico",
        event_type   = "Violence against civilians",
        month        = m, year = 2025,
        context      = prev_summary,
        style        = "etype_general",
        metrics_block=metrics,
        max_results  = 1000,          # trimmed list is plenty
        temperature  = 0, max_tokens = 1000
    )

    # ---- verify citations right here ----------------------------
    print(f"\n### VAC – {m:02d}/2025 ###\n")
    print(metrics)
    print(vac_summary)
    print("\nCitation check:")
    at.verify_citations(
        summary_text = vac_summary,
        uri   = URI, user = USER, pwd = PWD,
        year  = 2025, month = m, country = "Mexico", event_type="Violence against civilians"
    )
    print("═"*70, "\n")

    # cache for next iteration
    store_vac[m] = vac_summary
    prev_sev     = severity


### VAC – 01/2025 ###

Events 510, fatalities 527, severity 38626.4, Δ +0% vs prev.

#### **Overall Developments**
Violence against civilians in Mexico remained high in January 2025, with a notable increase in targeted assassinations, dismemberments, and drive-by shootings. The month saw a significant number of fatalities, particularly in states like Tabasco, Sinaloa, and Guanajuato. The violence appears to be driven by ongoing conflicts between criminal organizations, as well as disputes over extortion payments and territorial control.

#### **Key Actors Involved and Their Roles**
1. **Criminal Organizations**:
   - **Cártel Jalisco Nueva Generación (CJNG)**: Involved in multiple high-profile killings, including dismemberments and targeted assassinations (e.g., ID=MEX97491, ID=MEX97909, ID=MEX98889).
   - **Sinaloa Cartel (Los Chapitos and Los Mayitos)**: Engaged in violent clashes, particularly in Sinaloa, with multiple incidents of dismemberments and public messages (e.g., ID=MEX97

In [3]:
# ── GENERATE JAN–APR VAC BRIEFS & VERIFY CITATIONS ────────────────
store_vac  = {}
prev_sev   = None

for m in range(1, 7):
    prev_summary = store_vac.get(m-1, "")

    n_events, fatalities, severity = at.month_metrics(2025, m, "Violence against civilians",
                                                      uri=URI, user=USER, pwd=PWD)
    delta_sev = ((severity - prev_sev) / prev_sev) if prev_sev not in (None, 0) else 0
    metrics = (f"Events {n_events}, fatalities {fatalities}, severity {severity:.1f}, "
               f"Δ {delta_sev:+.0%} vs prev.")

    vac_summary, _ = at.summarize_events(
        URI, USER, PWD,
        country      = "Mexico",
        event_type   = "Violence against civilians",
        month        = m, year = 2025,
        context      = prev_summary,
        style        = "etype",
        metrics_block=metrics,
        max_results  = 1000,          # trimmed list is plenty
        temperature  = 0, max_tokens = 1000
    )

    # ---- verify citations right here ----------------------------
    print(f"\n### VAC – {m:02d}/2025 ###\n")
    print(metrics)
    print(vac_summary)
    print("\nCitation check:")
    at.verify_citations(
        summary_text = vac_summary,
        uri   = URI, user = USER, pwd = PWD,
        year  = 2025, month = m, country = "Mexico", event_type="Violence against civilians"
    )
    print("═"*70, "\n")

    # cache for next iteration
    store_vac[m] = vac_summary
    prev_sev     = severity


### VAC – 01/2025 ###

Events 510, fatalities 527, severity 38626.4, Δ +0% vs prev.
### **Violence Against Civilians in Mexico – January 2025**

#### **1. What happened?**
- **Overall scale**: 510 events, 527 fatalities, and a severity score of 38,626.4 (no change from previous month).
- **Salient developments**:
  - **Tabasco**: Multiple high-profile attacks, including the killing of seven people in Villahermosa (MEX97491, MEX97909).
  - **Morelos**: Targeted killings of local leaders and activists (MEX97876, MEX98120).
  - **Guanajuato**: Persistent gang violence, including drive-by shootings and attacks on public figures (MEX97567, MEX98372).
  - **Sinaloa**: Escalating cartel violence, with multiple dismembered bodies and narco messages (MEX97404, MEX98183).

#### **2. Who was involved?**
- **Perpetrators**:
  - **Cartels**: CJNG (MEX97909, MEX98889), Sinaloa Cartel factions (MEX98183, MEX98584).
  - **Local gangs**: La Linea (MEX98087), Los Mayitos (MEX98091).
- **Targets**:
  - 

In [2]:
# ── GENERATE JAN–APR VAC BRIEFS & VERIFY CITATIONS ────────────────
store_vac  = {}
prev_sev   = None

for m in range(1, 7):
    prev_summary = store_vac.get(m-1, "")

    n_events, fatalities, severity = at.month_metrics(2025, m, "Violence against civilians",
                                                      uri=URI, user=USER, pwd=PWD)
    delta_sev = ((severity - prev_sev) / prev_sev) if prev_sev not in (None, 0) else 0
    metrics = (f"Events {n_events}, fatalities {fatalities}, severity {severity:.1f}, "
               f"Δ {delta_sev:+.0%} vs prev.")

    vac_summary, _ = at.summarize_events(
        URI, USER, PWD,
        country      = "Mexico",
        event_type   = "Violence against civilians",
        month        = m, year = 2025,
        context      = prev_summary,
        style        = "etype2",
        metrics_block=metrics,
        max_results  = 1000,          # trimmed list is plenty
        temperature  = 0, max_tokens = 1000
    )

    # ---- verify citations right here ----------------------------
    print(f"\n### VAC – {m:02d}/2025 ###\n")
    print(metrics)
    print(vac_summary)
    print("\nCitation check:")
    at.verify_citations(
        summary_text = vac_summary,
        uri   = URI, user = USER, pwd = PWD,
        year  = 2025, month = m, country = "Mexico", event_type="Violence against civilians"
    )
    print("═"*70, "\n")

    # cache for next iteration
    store_vac[m] = vac_summary
    prev_sev     = severity


### VAC – 01/2025 ###

Events 510, fatalities 527, severity 38626.4, Δ +0% vs prev.
### **Situation Brief: Mexico – January 2025**

#### **Overall Situation**
January 2025 saw sustained high levels of violence across Mexico, with **510 events** resulting in **527 fatalities** and a **severity score of 38,626.4** (unchanged from the previous month). The month was marked by intense cartel activity, particularly in **Sinaloa, Guanajuato, and Tabasco**, with frequent use of **dismemberment, public executions, and narco-messages** to assert territorial control. **Drive-by shootings, targeted assassinations, and kidnappings** remained prevalent, with notable attacks on **politicians, journalists, and community leaders**.

#### **Key Developments**
1. **Cartel Violence & Territorial Disputes**
   - **Sinaloa Cartel factions (Los Chapitos vs. Los Mayitos)** clashed violently, leading to multiple executions and dismembered bodies.
   - **CJNG (Jalisco New Generation Cartel)** expanded operatio

In [2]:
# ── GENERATE JAN–APR VAC BRIEFS & VERIFY CITATIONS ────────────────
store_vac  = {}
prev_sev   = None

for m in range(1, 7):
    prev_summary = store_vac.get(m-1, "")

    n_events, fatalities, severity = at.month_metrics(2025, m, "Violence against civilians",
                                                      uri=URI, user=USER, pwd=PWD)
    delta_sev = ((severity - prev_sev) / prev_sev) if prev_sev not in (None, 0) else 0
    metrics = (f"Events {n_events}, fatalities {fatalities}, severity {severity:.1f}, "
               f"Δ {delta_sev:+.0%} vs prev.")

    vac_summary, _ = at.summarize_events(
        URI, USER, PWD,
        country      = "Mexico",
        event_type   = "Violence against civilians",
        month        = m, year = 2025,
        context      = None,
        style        = "etype_no_context",
        metrics_block=metrics,
        max_results  = 1000,          # trimmed list is plenty
        temperature  = 0, max_tokens = 1000
    )

    # ---- verify citations right here ----------------------------
    print(f"\n### VAC – {m:02d}/2025 ###\n")
    print(metrics)
    print(vac_summary)
    print("\nCitation check:")
    at.verify_citations(
        summary_text = vac_summary,
        uri   = URI, user = USER, pwd = PWD,
        year  = 2025, month = m, country = "Mexico", event_type="Violence against civilians"
    )
    print("═"*70, "\n")

    # cache for next iteration
    store_vac[m] = vac_summary
    prev_sev     = severity


### VAC – 01/2025 ###

Events 510, fatalities 527, severity 38626.4, Δ +0% vs prev.
### **Situation Summary – January 2025**

**Overall Situation:**
January 2025 saw sustained high levels of violence across Mexico, with 510 recorded events resulting in 527 fatalities and a severity score of 38,626.4. The month was marked by intense cartel-related violence, particularly in states like Sinaloa, Guanajuato, and Tabasco, with notable clashes between rival factions. Drive-by shootings, targeted assassinations, and dismembered bodies with narco messages were common, reflecting ongoing territorial disputes and extortion-related violence.

**Key Developments:**
1. **Cartel Violence & Territorial Disputes:**
   - **Sinaloa:** Intense fighting between factions of the Sinaloa Cartel (Los Chapitos vs. Los Mayitos) led to multiple high-profile killings, including the torture and execution of a social media influencer (ID=MEX97696) and a former police officer (ID=MEX98181). Dismembered bodies with 

In [2]:
# ── GENERATE JAN–APR VAC BRIEFS & VERIFY CITATIONS ────────────────
store_vac  = {}
prev_sev   = None

for m in range(1, 6):
    prev_summary = store_vac.get(m-1, "")

    n_events, fatalities, severity = at.month_metrics(2025, m, "Violence against civilians",
                                                      uri=URI, user=USER, pwd=PWD)
    delta_sev = ((severity - prev_sev) / prev_sev) if prev_sev not in (None, 0) else 0
    metrics = (f"Events {n_events}, fatalities {fatalities}, severity {severity:.1f}, "
               f"Δ {delta_sev:+.0%} vs prev.")

    vac_summary, _ = at.summarize_events(
        URI, USER, PWD,
        country      = "Mexico",
        event_type   = "Violence against civilians",
        month        = m, year = 2025,
        context      = None,
        style        = "etype_no_context",
        metrics_block=metrics,
        max_results  = 1000,          # trimmed list is plenty
        temperature  = 0, max_tokens = 1000
    )

    # ---- verify citations right here ----------------------------
    print(f"\n### VAC – {m:02d}/2025 ###\n")
    print(metrics)
    print(vac_summary)
    print("\nCitation check:")
    at.verify_citations(
        summary_text = vac_summary,
        uri   = URI, user = USER, pwd = PWD,
        year  = 2025, month = m, country = "Mexico", event_type="Violence against civilians"
    )
    print("═"*70, "\n")

    # cache for next iteration
    store_vac[m] = vac_summary
    prev_sev     = severity


### VAC – 01/2025 ###

Events 510, fatalities 527, severity 38626.4, Δ +0% vs prev.

#### **Overall Situation**
January 2025 saw a continuation of high levels of violence against civilians across Mexico, with **510 incidents** resulting in **527 fatalities** and a **severity score of 38,626.4** (unchanged from the previous month). The month was marked by persistent cartel-related violence, targeted assassinations, and clashes between criminal groups, particularly in states like Sinaloa, Guanajuato, and Tabasco. Dismemberment, torture, and the use of narco-messages remained common tactics, reflecting the brutal nature of organized crime operations.

#### **Key Developments**
1. **Cartel Violence and Inter-Gang Conflicts**
   - **Sinaloa**: The ongoing rivalry between factions of the Sinaloa Cartel (Los Chapitos vs. Los Mayitos) led to multiple high-profile killings, including the torture and execution of a social media influencer (ID=MEX97696). Dismembered bodies with narco-messages we

### Protests 

In [None]:

import pandas as pd
df = pd.read_csv("/Users/hannes/Documents/master_thesis/acled_mexico_2025.csv")
df


In [6]:
# ── GENERATE JAN–APR VAC BRIEFS & VERIFY CITATIONS ────────────────
store_vac  = {}
prev_sev   = None

for m in range(1, 7):
    prev_summary = store_vac.get(m-1, "")

    vac_summary, _ = at.summarize_events(
        URI, USER, PWD,
        country      = "Mexico",
        event_type   = "Protests",
        month        = m, year = 2025,
        context      = None,
        style        = "etype_no_context",
        max_results  = 1000,          # trimmed list is plenty
        temperature  = 0, max_tokens = 1000
    )

    # ---- verify citations right here ----------------------------
    print(f"\n### VAC – {m:02d}/2025 ###\n")
    print(vac_summary)
    print("\nCitation check:")
    at.verify_citations(
        summary_text = vac_summary,
        uri   = URI, user = USER, pwd = PWD,
        year  = 2025, month = m, country = "Mexico", event_type="Protests"
    )
    print("═"*70, "\n")

    # cache for next iteration
    store_vac[m] = vac_summary
    prev_sev     = severity


### VAC – 01/2025 ###


#### **Summary of the Situation**
Protests in Mexico during January 2025 were marked by widespread labor disputes, demands for justice, and social unrest. Key themes included:
- **Labor disputes**: Teachers, healthcare workers, and public employees protested across multiple states, demanding outstanding payments, better working conditions, and the reinstatement of dismissed workers.
- **Justice and security**: Protests demanding justice for missing persons, femicides, and police brutality were common, particularly in Guerrero, Veracruz, and Tamaulipas.
- **Public services**: Residents protested against inadequate water, electricity, and healthcare services, as well as infrastructure failures.
- **Political and economic grievances**: Protests against government policies, including the proposed reform of the Housing Institute (Infonavit) and public transportation fare increases, were reported in multiple states.

#### **Key Developments**
1. **Labor Protests**
  

In [2]:
# 1) compute metrics
tot_ev, tot_fat, n_peaceful, n_interv, n_excess = at.protest_metrics(
    URI, USER, PWD, year=2025, month=4, country="Mexico"
)

# 2) build your metrics line
metrics = (
    f"Total protests: {tot_ev}  |  Fatalities: {tot_fat}  |  "
    f"Peaceful: {n_peaceful}  |  Intervention: {n_interv}  |  "
    f"Excessive force: {n_excess}"
)

In [3]:
# ── GENERATE JAN–APR VAC BRIEFS & VERIFY CITATIONS ────────────────
store_vac  = {}
prev_sev   = None

for m in range(1, 7):
    prev_summary = store_vac.get(m-1, "")

    tot_ev, tot_fat, n_peaceful, n_interv, n_excess = at.protest_metrics(
    URI, USER, PWD, year=2025, month=m, country="Mexico"
)

    metrics = (
    f"Total protests: {tot_ev}  |  Fatalities: {tot_fat}  |  "
    f"Peaceful: {n_peaceful}  |  Intervention: {n_interv}  |  "
    f"Excessive force: {n_excess}"
)
    vac_summary, _ = at.summarize_events(
        URI, USER, PWD,
        country      = "Mexico",
        event_type   = "Protests",
        month        = m, year = 2025,
        context      = None,
        style        = "protests",
        metrics_block=metrics,
        max_results  = 1000,          # trimmed list is plenty
        temperature  = 0.0, max_tokens = 1000
    )

    # ---- verify citations right here ----------------------------
    print(metrics)
    print(f"\n### VAC – {m:02d}/2025 ###\n")
    print(vac_summary)
    print("\nCitation check:")
    at.verify_citations(
        summary_text = vac_summary,
        uri   = URI, user = USER, pwd = PWD,
        year  = 2025, month = m, country = "Mexico", event_type="Protests"
    )
    print("═"*70, "\n")

    # cache for next iteration
    store_vac[m] = vac_summary
    

Total protests: 447  |  Fatalities: 0  |  Peaceful: 443  |  Intervention: 4  |  Excessive force: 0

### VAC – 01/2025 ###


#### **Summary of Protest Situation**
In January 2025, Mexico experienced **447 protests**, with **443 (99%) peaceful** and **4 (1%) non-peaceful** (Protest with intervention). No fatalities were reported. The majority of protests were labor-related, focusing on demands for wage increases, outstanding payments, and better working conditions. Other key themes included demands for justice, security, and public services.

#### **Key Developments**
1. **Labor Protests**: The most frequent protests were by workers demanding wage adjustments, outstanding payments, and better working conditions. Notable sectors included teachers, healthcare workers, and public transport drivers.
2. **Justice and Security**: Protests demanding justice for missing persons, femicides, and police brutality were common, particularly in Guerrero, Veracruz, and Nuevo León.
3. **Public Services*

In [4]:
# ── GENERATE JAN–APR VAC BRIEFS & VERIFY CITATIONS ────────────────
store_vac  = {}
prev_sev   = None

for m in range(1, 7):
    prev_summary = store_vac.get(m-1, "")

    tot_ev, tot_fat, n_peaceful, n_interv, n_excess = at.protest_metrics(
    URI, USER, PWD, year=2025, month=m, country="Mexico"
)

    metrics = (
    f"Total protests: {tot_ev}  |  Fatalities: {tot_fat}  |  "
    f"Peaceful: {n_peaceful}  |  Intervention: {n_interv}  |  "
    f"Excessive force: {n_excess}"
)
    vac_summary, _ = at.summarize_events(
        URI, USER, PWD,
        country      = "Mexico",
        event_type   = "Protests",
        month        = m, year = 2025,
        context      = None,
        style        = "protests",
        metrics_block=metrics,
        max_results  = 1000,          # trimmed list is plenty
        temperature  = 0.3, max_tokens = 1000
    )

    # ---- verify citations right here ----------------------------
    print(metrics)
    print(f"\n### VAC – {m:02d}/2025 ###\n")
    print(vac_summary)
    print("\nCitation check:")
    at.verify_citations(
        summary_text = vac_summary,
        uri   = URI, user = USER, pwd = PWD,
        year  = 2025, month = m, country = "Mexico", event_type="Protests"
    )
    print("═"*70, "\n")

    # cache for next iteration
    store_vac[m] = vac_summary
    

Total protests: 447  |  Fatalities: 0  |  Peaceful: 443  |  Intervention: 4  |  Excessive force: 0

### VAC – 01/2025 ###


#### **Summary of Protest Situation**
In January 2025, Mexico experienced **447 protests**, with **443 classified as peaceful**, **4 involving intervention**, and **no reported fatalities**. The majority of protests were labor-related, focusing on demands for wage increases, outstanding payments, and better working conditions. Other key themes included **public service disruptions, infrastructure demands, and security concerns**.

#### **Key Developments**
1. **Labor Disputes**:
   - Teachers, healthcare workers, and public sector employees staged multiple protests across states like **Guerrero, Oaxaca, and Veracruz**, demanding wage adjustments and payment of bonuses (e.g., MEX97416, MEX97529, MEX97896).
   - **INFONAVIT employees** protested nationwide against reforms and salary cuts (e.g., MEX97802, MEX97816).

2. **Public Services & Infrastructure**:
   - Resi

# strategic developments

## Mexico

In [1]:
import acled_tools as at

URI, USER, PWD = "bolt://localhost:7687", "neo4j", "password"



In [2]:
# ── GENERATE JAN–APR VAC BRIEFS & VERIFY CITATIONS ────────────────
store_vac  = {}
prev_sev   = None

for m in range(1, 7):
    prev_summary = store_vac.get(m-1, "")

    vac_summary, _ = at.summarize_events(
        URI, USER, PWD,
        country      = "Mexico",
        event_type   = "Strategic developments",
        month        = m, year = 2025,
        context      = None,
        style       = "strategic",
        max_results  = 1000,          # trimmed list is plenty
        temperature  = 0.3, max_tokens = 1000
    )

    # ---- verify citations right here ----------------------------
    print(f"\n### VAC – {m:02d}/2025 ###\n")
    print(vac_summary)
    print("\nCitation check:")
    at.verify_citations(
        summary_text = vac_summary,
        uri   = URI, user = USER, pwd = PWD,
        year  = 2025, month = m, country = "Mexico", event_type="Strategic developments"
    )
    print("═"*70, "\n")

    # cache for next iteration
    store_vac[m] = vac_summary
    


### VAC – 01/2025 ###

### **Strategic Developments in Mexico – January 2025**

#### **Overall Landscape**
January 2025 saw heightened criminal activity across Mexico, with significant operations against drug trafficking, migrant flows, and cartel violence. Key trends included:
- **Drug seizures and lab dismantling**: Multiple narco-laboratories were destroyed, particularly in Sinaloa, Sonora, and Michoacán, with large quantities of methamphetamine, cocaine, and synthetic drugs confiscated.
- **Migrant caravans**: Large groups of migrants (Venezuela, Haiti, Central America) continued moving northward, with caravans reported in Chiapas (e.g., **MEX97413**, **MEX98216**).
- **Cartel violence and retaliation**: Sinaloa, Michoacán, and Guanajuato experienced targeted attacks, including arson, shootings, and roadblocks, often linked to factional disputes (e.g., **MEX97524**, **MEX98034**).
- **Security operations**: Authorities conducted raids, arrests, and defused explosives, particularly

## Congo

In [3]:
# ── GENERATE JAN–APR VAC BRIEFS & VERIFY CITATIONS ────────────────
store_vac  = {}
prev_sev   = None

for m in range(1, 7):
    prev_summary = store_vac.get(m-1, "")

    vac_summary, _ = at.summarize_events(
        URI, USER, PWD,
        country      = "Congo",
        event_type   = "Strategic developments",
        month        = m, year = 2025,
        context      = None,
        style       = "strategic",
        max_results  = 1000,          # trimmed list is plenty
        temperature  = 0.3, max_tokens = 1000
    )

    # ---- verify citations right here ----------------------------
    print(f"\n### VAC – {m:02d}/2025 ###\n")
    print(vac_summary)
    print("\nCitation check:")
    at.verify_citations(
        summary_text = vac_summary,
        uri   = URI, user = USER, pwd = PWD,
        year  = 2025, month = m, country = "Congo", event_type="Strategic developments"
    )
    print("═"*70, "\n")

    # cache for next iteration
    store_vac[m] = vac_summary
    


### VAC – 01/2025 ###


### **Summary of Strategic Landscape**
No significant strategic developments or events were recorded in Congo this month. The absence of reported incidents suggests a period of relative calm, though historical tensions and underlying risks remain.

### **Breakdown by Sub-Event Types**
1. **Political/Military Developments**
   - No notable events reported.

2. **Economic/Resource Disputes**
   - No notable events reported.

3. **Social/Protest Activity**
   - No notable events reported.

4. **External Interference/Alliances**
   - No notable events reported.

### **Spotlight on Key Developments**

**Supporting Citations:**
- (N/A – No events logged for the current month)

**Conclusion:**
The strategic landscape in Congo remains stable, with no immediate indicators of escalation. However, sustained vigilance is recommended given the region’s history of volatility.

**END BRIEF**

Citation check:
Cited IDs : 0  •  Matched : 0
✅ all cited IDs present in slice.
════

## riots

In [None]:
def riot_metrics(uri, user, pwd, year, month, country):
    """
    Returns (total_events, counts_by_subtype) for Riots in the given slice.
    counts_by_subtype is a dict: { 'Violent demonstration': n1, 'Mob violence': n2 }
    """
    q = """
    MATCH (e:Event)-[:TYPE]->(t:EventType {code:"Riots"}),
          (e)-[:SUBTYPE]->(se:SubEventType),
          (e)-[:IN_STATE]->(s:State)
    WHERE e.year = $yr
      AND e.month = $mo
      AND s.country = $cty
    RETURN se.code AS subtype, count(e) AS cnt
    """
    with GraphDatabase.driver(uri, auth=(user, pwd)).session() as sess:
        recs = sess.run(q, yr=year, mo=month, cty=country).data()

    total = sum(r["cnt"] for r in recs)
    counts = {r["subtype"]: r["cnt"] for r in recs}
    # ensure both keys exist
    counts.setdefault("Violent demonstration", 0)
    counts.setdefault("Mob violence", 0)

    return total, counts

In [3]:
# ── GENERATE JAN–APR VAC BRIEFS & VERIFY CITATIONS ────────────────
store_vac  = {}
prev_sev   = None

for m in range(1, 7):
    prev_summary = store_vac.get(m-1, "")

    vac_summary, _ = at.summarize_events(
        URI, USER, PWD,
        country      = "Mexico",
        event_type   = "Riots",
        month        = m, year = 2025,
        context      = None,
        style       = "riots",
        max_results  = 1000,          # trimmed list is plenty
        temperature  = 0.3, max_tokens = 1000
    )

    # ---- verify citations right here ----------------------------
    print(f"\n### VAC – {m:02d}/2025 ###\n")
    print(vac_summary)
    print("\nCitation check:")
    at.verify_citations(
        summary_text = vac_summary,
        uri   = URI, user = USER, pwd = PWD,
        year  = 2025, month = m, country = "Mexico", event_type="Riots"
    )
    print("═"*70, "\n")

    # cache for next iteration
    store_vac[m] = vac_summary
    


### VAC – 01/2025 ###


### **Summary of Riot Situation**
This month, Mexico experienced a surge in **mob violence** and **violent demonstrations**, with at least **15 fatal incidents** and numerous injuries. Mob violence dominated, often targeting alleged criminals, while demonstrations frequently involved protests over security, governance, and service demands.

### **Key Developments**
- **Mob Violence**: The most common form of unrest, with **26 incidents** reported. Most involved vigilante justice against suspected thieves, killers, or abductors, often resulting in severe injuries or fatalities.
- **Violent Demonstrations**: **12 incidents** were recorded, primarily linked to political grievances, infrastructure demands, or security concerns.
- **Geographic Hotspots**: Morelos, Yucatán, and Ciudad de México saw repeated incidents, while states like Puebla, Chihuahua, and Oaxaca also reported significant unrest.

### **Sub-Types of Riots**

#### **1. Violent Demonstrations**
- **P

In [6]:
import acled_tools as at

# ── GENERATE JAN–JUN RIOTS BRIEFS & VERIFY CITATIONS ────────────────
store_riots = {}
prev_summary = None

for m in range(1, 7):  # January (1) through June (6)
    # 1) compute the headline metrics for this month
    total, ev_by_sub, fat_by_sub = at.riot_metrics(
        URI, USER, PWD,
        year=2025, month=m, country="Mexico"
    )

    # 2) build a metrics_block string
    metrics = (
        f"Total riots: {total}  |  "
        f"Violent demonstration: {ev_by_sub.get('Violent demonstration',0)} "
        f"({fat_by_sub.get('Violent demonstration',0)} fat.)  |  "
        f"Mob violence: {ev_by_sub.get('Mob violence',0)} "
        f"({fat_by_sub.get('Mob violence',0)} fat.)"
    )

    # 3) generate the summary with the new metrics_block
    summary, _ = at.summarize_events(
        URI, USER, PWD,
        country       = "Mexico",
        event_type    = "Riots",
        month         = m,
        year          = 2025,
        context       = None,
        metrics_block = metrics,
        style="riots",
        max_results   = 1000,
        temperature   = 0.3,
        max_tokens    = 1000
    )

    # 4) print & verify
    print(metrics)
    print(f"\n### RIOTS – {m:02d}/2025 ###\n")
    print(summary)
    print("\nCitation check:")
    at.verify_citations(
        summary_text = summary,
        uri          = URI,
        user         = USER,
        pwd          = PWD,
        year         = 2025,
        month        = m,
        country      = "Mexico",
        event_type   = "Riots"
    )
    print("═" * 70, "\n")

    # 5) cache for next iteration
    store_riots[m] = summary
    prev_summary  = summary

Total riots: 51  |  Violent demonstration: 14 (0 fat.)  |  Mob violence: 37 (5 fat.)

### RIOTS – 01/2025 ###


#### **Summary**
This month, Mexico experienced **51 riots**, including **14 violent demonstrations** (0 fatalities) and **37 mob violence incidents** (5 fatalities). Mob violence was the dominant sub-type, primarily involving vigilante justice against suspected criminals, while violent demonstrations were linked to political grievances, service demands, and security concerns.

#### **Key Developments**
- **Mob Violence (37 incidents, 5 fatalities)**:
  - Most cases involved lynching or severe assaults on suspected thieves, killers, or abductors (e.g., **MEX97401**, **MEX97407**, **MEX98509**).
  - Several incidents involved police inaction or obstruction (e.g., **MEX97407**, **MEX97399**).
  - Labor disputes (e.g., taxi drivers vs. tuk-tuk drivers in **MEX97679**) and migrant-related clashes (e.g., **MEX98209**) also contributed.

- **Violent Demonstrations (14 incidents, 0 

### Battles

In [1]:
import acled_tools as at
URI, USER, PWD = "bolt://localhost:7687", "neo4j", "password"

In [2]:


store_battles = {}
prev_summary = None

for m in range(1, 7):
    # 1) compute metrics
    total, ev_by_sub, fat_by_sub = at.battle_metrics(
        URI, USER, PWD,
        year=2025, month=m, country="Mexico"
    )

    # 2) format metrics_block
    metrics = (
        f"Total battles: {total}  |  "
        f"Armed clash: {ev_by_sub['Armed clash']} ({fat_by_sub['Armed clash']} fat.)  |  "
        f"Government regains territory: {ev_by_sub['Government regains territory']} "
        f"({fat_by_sub['Government regains territory']} fat.)  |  "
        f"Non-state actor overtakes territory: {ev_by_sub['Non-state actor overtakes territory']} "
        f"({fat_by_sub['Non-state actor overtakes territory']} fat.)"
    )

    # 3) generate summary
    summary, _ = at.summarize_events(
        URI, USER, PWD,
        country       = "Mexico",
        event_type    = "Battles",
        month         = m,
        year          = 2025,
        context       = prev_summary,
        metrics_block = metrics,
        style         = "battles",
        max_results   = 1000,
        temperature   = 0.3,
        max_tokens    = 1000
    )

    # 4) print & verify
    print(metrics)
    print(f"\n### BATTLES – {m:02d}/2025 ###\n")
    print(summary)
    print("\nCitation check:")
    at.verify_citations(
        summary_text = summary,
        uri          = URI,
        user         = USER,
        pwd          = PWD,
        year         = 2025,
        month        = m,
        country      = "Mexico",
        event_type   = "Battles"
    )
    print("═" * 70, "\n")

    # 5) cache for next iteration
    store_battles[m] = summary
    prev_summary    = summary

Total battles: 159  |  Armed clash: 159 (126 fat.)  |  Government regains territory: 0 (0 fat.)  |  Non-state actor overtakes territory: 0 (0 fat.)

### BATTLES – 01/2025 ###


#### **Summary**
In January 2025, Mexico experienced **159 armed clashes**, resulting in **126 fatalities**. No territorial gains or losses were reported by either government forces or non-state actors. The majority of clashes involved **cartel vs. security forces** or **inter-cartel violence**, with **Sinaloa, Michoacán, and Guanajuato** as key hotspots.

#### **Breakdown by Event Type**
- **Armed Clash (159 events, 126 fatalities)** – All recorded battles fell under this category, with no instances of government regaining territory or non-state actors overtaking territory.
- **Government Regains Territory (0 events, 0 fatalities)** – No reported successes by security forces in reclaiming territory.
- **Non-State Actor Overtakes Territory (0 events, 0 fatalities)** – No territorial gains by cartels or armed gro

In [6]:
import acled_tools as at
URI, USER, PWD = "bolt://localhost:7687", "neo4j", "password"

In [3]:
from acled_tools import summarize_riots

# generate + verify citations in one go
summary, bullets, missing = summarize_riots(
    uri=URI, user=USER, pwd=PWD,
    country="Mexico", year=2025, month=4,
    context="",               # previous‐month summary if you have it
    check_citations=True     # or False to skip check
)

print(summary)
if missing:
    print("Missing citations:", missing)

Cited IDs : 15  •  Matched : 15
✅ all cited IDs present in slice.

**Summary of Riot Situation**
This month, Mexico experienced **62 riots**, including **24 violent demonstrations** (2 fatalities) and **38 mob violence incidents** (5 fatalities). The events were concentrated in states such as **Tlaxcala, Puebla, Michoacán, and Guerrero**, with key triggers including **extrajudicial justice, labor disputes, and political grievances**.

**Key Developments**
- **Extrajudicial Justice**: Mob violence frequently involved lynchings and beatings of accused criminals, often targeting alleged thieves, sexual offenders, and cartel members (e.g., MEX102158, MEX103452, MEX103153).
- **Labor and Political Protests**: Violent demonstrations included strikes by police officers (MEX102502), student protests over school conditions (MEX102662), and fare hikes (MEX102595).
- **Cartel and Organized Crime Links**: Some mob violence incidents involved suspected cartel members (MEX103453) or disputes over po

In [None]:
from acled_tools import (
    summarize_riots,
    summarize_protests,
    summarize_battles,
    summarize_strategic,
    summarize_vac
)

for fn, et in [
    (summarize_riots,       "Riots"),
    (summarize_protests,    "Protests"),
    (summarize_battles,     "Battles"),
    (summarize_strategic,   "Strategic developments"),
    (summarize_vac,         "Violence against civilians")
]:
    print(f"\n--- {et.upper()} (Apr 2025) ---")
    summary, bullets, missing = fn(
        uri=URI, user=USER, pwd=PWD,
        country="Mexico", year=2025, month=4,
        context="",            # or pass last month’s summary
        check_citations=True
    )
    print(summary)
    


--- RIOTS (Apr 2025) ---
Cited IDs : 15  •  Matched : 15
✅ all cited IDs present in slice.

#### **Summary**
This month saw **62 riots** across Mexico, with **24 violent demonstrations** (2 fatalities) and **38 mob violence incidents** (5 fatalities). Key themes included vigilante justice, labor disputes, and clashes over local governance. Mob violence was concentrated in Tlaxcala, Puebla, and Michoacán, while violent demonstrations were prominent in Michoacán, Puebla, and Guerrero.

#### **Key Developments**
- **Vigilante Justice**: Mob violence frequently targeted alleged criminals, including sexual assailants (MEX102158, MEX103452), thieves (MEX103153), and suspected cartel members (MEX103453).
- **Labor & Student Protests**: Violent demonstrations involved normalista students (MEX102304, MEX102665) and transport fare protesters (MEX102595, MEX102671).
- **Cartel & Political Tensions**: Mob violence linked to cartel activity (MEX103453) and political disputes (MEX103448).

#### **S

In [5]:
def compose_monthly_overview(
    sub_reports: dict[str,str],
    country: str,
    year: int,
    month: int,
    prev_overview: str = "",
    llm_provider: str = "mistral",
    temperature: float = 0.4,
    max_tokens: int = 600
) -> str:
    """
    Given a dict of event-type → its monthly summary, produce a single
    country-level overview that weaves them together.

    Args:
      sub_reports: e.g.
        {
          "Violence against civilians": "...VAC summary text...",
          "Protests":                  "...Protests summary text...",
          "Riots":                     "...Riots summary text...",
          "Battles":                   "...Battles summary text...",
          "Strategic developments":    "...Strategic summary text..."
        }
      country, year, month: slice identifiers.
      prev_overview: last month’s country-overview text (if any).
    Returns:
      A unified overview string.
    """
    # 1) build the “inbox” of sub-reports with headings
    block = "\n\n".join(f"### {et} ###\n{sub_reports[et].strip()}"
                        for et in [
                          "Violence against civilians",
                          "Protests",
                          "Riots",
                          "Battles",
                          "Strategic developments"
                        ] if et in sub_reports)

    # 2) meta-prompt
    meta = f"""SYSTEM
You are an analyst in a foreign agency writing a conflict-early-warning brief. 


USER
Below are the five summaries of the developments this month {country} by event type. The sections are:  Violence against civilians, Protests, Riots, Battles, and Strategic developments. 
{sub_block}

PREVIOUS OVERVIEW (if any)
--------------------------
{prev_text or 'N/A'}

TASK
----
Based only on those five briefs:
• Write a concise **nation-wide** overview.
• Highlight cross-cutting themes.


GUIDELINES
* Do not introduce new facts—draw only on what’s in the sub-reports.
* Organise logically—no need to restate each sub-heading.
* Keep it under 300 words.

BEGIN OVERVIEW →
""".replace("{sub_block}", block).replace("{prev_text}", prev_overview)

    # 3) LLM call
    return at.ask_llm(
      prompt       = meta,
      provider     = llm_provider,
      temperature  = temperature,
      max_tokens   = max_tokens
    )

In [7]:

from acled_tools import (
    summarize_riots,
    summarize_protests,
    summarize_battles,
    summarize_strategic,
    summarize_vac,
    compose_monthly_overview
)

# 1) Generate all five sub-reports & optionally verify citations
sub_reports = {}
for fn, et in [
    (summarize_riots,     "Riots"),
    (summarize_protests,  "Protests"),
    (summarize_battles,   "Battles"),
    (summarize_strategic, "Strategic developments"),
    (summarize_vac,       "Violence against civilians"),
]:
    print(f"\n--- {et.upper()} (Apr 2025) ---")
    summary, bullets, missing = fn(
        uri=URI, user=USER, pwd=PWD,
        country="Mexico", year=2025, month=4,
        context="",            # or pass last month’s summary
        check_citations=True   # toggle on/off
    )
    print(summary)
    

    # store for later composition
    sub_reports[et] = summary

# 2) Now compose the national overview from those five briefs
overview = compose_monthly_overview(
    sub_reports=sub_reports,
    country="Mexico",
    year=2025,
    month=4,
    prev_overview=None)  # if you have Mar-2025 overview


print("\n=== COUNTRY‐LEVEL OVERVIEW (Apr 2025) ===")
print(overview)


--- RIOTS (Apr 2025) ---
Cited IDs : 11  •  Matched : 11
✅ all cited IDs present in slice.

#### **Summary of Riot Situation**
In April 2025, Mexico experienced **62 riots**, including **24 violent demonstrations** (2 fatalities) and **38 mob violence incidents** (5 fatalities). The majority of events involved **vigilante justice, labor disputes, and protests against security or governance failures**.

#### **Key Developments**
- **Mob Violence Dominance**: Most incidents (38) were mob violence, often targeting alleged criminals, journalists, or officials.
- **Labor & Student Protests**: Violent demonstrations (24) frequently involved **normalista students** (Michoacán, Puebla) and **transport workers** (Nuevo León, Guerrero).
- **Geographic Hotspots**: **Tlaxcala, Puebla, Michoacán, and Guerrero** saw repeated clashes, with **Tlaxcala** recording extreme vigilante violence (e.g., lynchings, arson).

#### **Sub-Types: Violent Demonstrations & Mob Violence**

**1. Violent Demonstration

NameError: name 'sub_block' is not defined

In [1]:
URI, USER, PWD = "bolt://localhost:7687", "neo4j", "password"

In [2]:
from acled_tools import (
    summarize_vac,
    summarize_protests,
    summarize_riots,
    summarize_battles,
    summarize_strategic,
    summarize_overview,
)

# 1) Generate each of the five event-type briefs for May 2025
sub_reports = {}
for fn, et in [
    (summarize_vac,         "Violence against civilians"),
    (summarize_protests,    "Protests"),
    (summarize_riots,       "Riots"),
    (summarize_battles,     "Battles"),
    (summarize_strategic,   "Strategic developments"),
]:
    summary, bullets, missing = fn(
        uri=URI, user=USER, pwd=PWD,
        country="Mexico", year=2025, month=5,
        context="",           # no previous‐month context
        check_citations=True, # skip citation check for speed
        temperature=0.3, max_tokens=1000
    )
    print(f"\n--- {et.upper()} (05/2025) ---\n")
    print(summary)
    sub_reports[et] = summary

# 2) Now build the country-level overview
overview, missing_overview = summarize_overview(
    uri=URI, user=USER, pwd=PWD,
    sub_reports=sub_reports,
    country="Mexico", year=2025, month=5,
    prev_overview="",      # or pass April’s overview if you have it
    check_citations=True   # now we do want to verify any IDs
)

print("\n=== COUNTRY OVERVIEW (05/2025) ===\n")
print(overview)

if missing_overview:
    print("\n⚠️ Missing citations in overview:", missing_overview)
else:
    print("\n✅ All overview citations OK")

Cited IDs : 15  •  Matched : 14
⚠️  missing IDs ➜ MEX103

--- VIOLENCE AGAINST CIVILIANS (05/2025) ---


**Summary of Overall Pattern:**
May 2025 saw a high level of violence against civilians across Mexico, with 480 recorded events resulting in 518 fatalities. The majority of incidents were classified as "Attacks" (458 events, 95.4% of total events), with a smaller number of abductions/forced disappearances (20 events) and sexual violence cases (2 events). Guanajuato, Sinaloa, and Guerrero were among the most affected states, with Guanajuato alone accounting for a significant portion of the violence. The tactics employed included drive-by shootings, targeted assassinations, and torture, often accompanied by threatening messages from criminal groups. The involvement of organized crime, particularly cartels, was evident in many incidents, with disputes over territory, extortion, and settling of scores as key drivers.

**Key Developments:**
1. **Guanajuato as a Hotspot:** Guanajuato expe

In [2]:
# ╔══════════════════════════════════════════════════════════════════╗
# ║  Generate Jan–May 2025 briefs (5 types + overview)             ║
# ╚══════════════════════════════════════════════════════════════════╝

from acled_tools import (
    summarize_vac,
    summarize_protests,
    summarize_riots,
    summarize_battles,
    summarize_strategic,
    summarize_overview,
)

URI, USER, PWD = "bolt://localhost:7687", "neo4j", "password"

# 1) Prepare storage for each slice
event_types = [
    ("Violence against civilians", summarize_vac),
    ("Protests",                    summarize_protests),
    ("Riots",                       summarize_riots),
    ("Battles",                     summarize_battles),
    ("Strategic developments",      summarize_strategic),
]

# per‐event-type summaries: store[etype][month] = text
store = { et: {} for et,_ in event_types }
# overview per month
overview_store = {}

# 2) loop months
for m in range(1, 6):   # 1=Jan … 5=May
    print(f"\n\n===== MONTH {m:02d}/2025 =====\n")

    # 2.1) generate each event‐type brief with previous context
    sub_reports = {}
    for et, fn in event_types:
        prev_ctx = store[et].get(m-1, "")
        summary, bullets, missing = fn(
            uri=URI, user=USER, pwd=PWD,
            country="Mexico", year=2025, month=m,
            context=prev_ctx,
            check_citations=True,      # turn on if you like
            temperature=0.3, max_tokens=800
        )
        store[et][m] = summary
        sub_reports[et] = summary

        print(f"--- {et.upper()} ---")
        print(summary, "\n")

    # 2.2) country overview with prior overview as context
    prev_ov = overview_store.get(m-1, "")
    overview = summarize_overview(
        uri=URI, user=USER, pwd=PWD,
        sub_reports=sub_reports,
        country="Mexico", year=2025, month=m,
        prev_overview=prev_ov,
        check_citations=True,        # or True
        temperature=0.3, max_tokens=600
    )
    overview_store[m] = overview

    print("=== COUNTRY OVERVIEW ===")
    print(overview)



===== MONTH 01/2025 =====

Cited IDs : 14  •  Matched : 13
⚠️  missing IDs ➜ MEX97888
--- VIOLENCE AGAINST CIVILIANS ---

**Summary of Overall Pattern:**
Violence against civilians in Mexico remained high in January 2025, with 510 recorded events resulting in 527 fatalities. The majority of incidents were classified as "Attacks" (475 events, 89% of total), followed by "Abduction/forced disappearance" (32 events) and "Sexual violence" (3 events). The data suggests a continuation of entrenched patterns of organized crime-related violence, with a notable concentration of attacks in states such as Sinaloa, Guanajuato, and Tabasco.

**Key Developments:**
1. **Actors and Tactics:**
   - Cartel-related violence dominated, with groups like the Jalisco New Generation Cartel (CJNG) and Los Chapitos prominently involved in high-profile attacks (e.g., MEX97909, MEX98889).
   - Tactics included drive-by shootings, targeted assassinations, and the use of dismemberment and narco-messages to intimid

In [1]:
# ╔══════════════════════════════════════════════════════════════════╗
# ║  Generate Jan–May 2025 briefs (5 types + overview)             ║
# ╚══════════════════════════════════════════════════════════════════╝

from acled_tools import (
    summarize_vac,
    summarize_protests,
    summarize_riots,
    summarize_battles,
    summarize_strategic,
    summarize_overview,
)

URI, USER, PWD = "bolt://localhost:7687", "neo4j", "password"

# 1) Prepare storage for each slice
event_types = [
    ("Violence against civilians", summarize_vac),
    ("Protests",                    summarize_protests),
    ("Riots",                       summarize_riots),
    ("Battles",                     summarize_battles),
    ("Strategic developments",      summarize_strategic),
]

# per‐event-type summaries: store[etype][month] = text
store = { et: {} for et,_ in event_types }
# overview per month
overview_store = {}

# 2) loop months
for m in range(1, 6):   # 1=Jan … 5=May
    print(f"\n\n===== MONTH {m:02d}/2025 =====\n")

    # 2.1) generate each event‐type brief with previous context
    sub_reports = {}
    for et, fn in event_types:
        prev_ctx = store[et].get(m-1, "")
        summary, bullets, missing = fn(
            uri=URI, user=USER, pwd=PWD,
            country="Mexico", year=2025, month=m,
            context=prev_ctx,
            check_citations=True,      # turn on if you like
            temperature=0.3, max_tokens=1000
        )
        store[et][m] = summary
        sub_reports[et] = summary

        print(f"--- {et.upper()} ---")
        print(summary, "\n")

    # 2.2) country overview with prior overview as context
    prev_ov = overview_store.get(m-1, "")
    overview = summarize_overview(
        uri=URI, user=USER, pwd=PWD,
        sub_reports=sub_reports,
        country="Mexico", year=2025, month=m,
        prev_overview=prev_ov,
        check_citations=True,        # or True
        temperature=0.3, max_tokens=600
    )
    overview_store[m] = overview

    print("=== COUNTRY OVERVIEW ===")
    print(overview)



===== MONTH 01/2025 =====

Cited IDs : 14  •  Matched : 14
✅ all cited IDs present in slice.
--- VIOLENCE AGAINST CIVILIANS ---

**Summary of Overall Pattern:**
Violence against civilians in Mexico remained high in January 2025, with 510 recorded events resulting in 527 fatalities. The majority of incidents (475) were classified as "Attacks," with notable spikes in targeted assassinations, drive-by shootings, and dismemberment-related violence. Abductions/forced disappearances (32) and sexual violence (3) were less frequent but significant. Guanajuato, Sinaloa, and Tabasco were among the most affected states, with organized crime groups (e.g., CJNG, Los Chapitos) frequently implicated.

**Key Developments:**
1. **Organized Crime Dominance:** Cartels like the CJNG and Los Chapitos continued to use extreme tactics, including dismemberment and narco-messages (e.g., MEX97491, MEX97909). Attacks on influencers and local leaders (e.g., MEX98183) suggest expanding territorial disputes.
2. *

KeyboardInterrupt: 

In [1]:
# ╔══════════════════════════════════════════════════════════════════╗
# ║  Re-generate Jan–May 2025 briefs (5 types + overview) with context ║
# ╚══════════════════════════════════════════════════════════════════╝

from acled_tools import (
    summarize_vac,
    summarize_protests,
    summarize_riots,
    summarize_battles,
    summarize_strategic,
    summarize_overview,
)

URI, USER, PWD = "bolt://localhost:7687", "neo4j", "password"

# 1) map each event‐type name to its summarization function
event_types = [
    ("Violence against civilians", summarize_vac),
    ("Protests",                    summarize_protests),
    ("Riots",                       summarize_riots),
    ("Battles",                     summarize_battles),
    ("Strategic developments",      summarize_strategic),
]

# storage for per‐type, per‐month summaries
store = { et: {} for et, _ in event_types }
# storage for the country overview per month
overview_store = {}

# 2) loop over months
for m in range(1, 6):   # Jan=1 … May=5
    print(f"\n\n===== MONTH {m:02d}/2025 =====\n")
    sub_reports = {}

    # 2.1) for each event‐type, pull last month’s summary as context and re‐generate
    for et, fn in event_types:
        prev_ctx = store[et].get(m-1, "")
        summary, bullets, missing = fn(
            uri=URI, user=USER, pwd=PWD,
            country="Mexico", year=2025, month=m,
            context=prev_ctx,
            check_citations=True,
            temperature=0.3, max_tokens=800
        )
        store[et][m] = summary
        sub_reports[et] = summary

        print(f"--- {et.upper()} ({m:02d}/2025) ---")
        print(summary)
        if missing:
            print("⚠️ Missing citations:", missing)
        print()

    # 2.2) now build the country-level overview, feeding it last month’s overview
    prev_ov = overview_store.get(m-1, "")
    overview, missing_ov = summarize_overview(
        uri=URI, user=USER, pwd=PWD,
        sub_reports=sub_reports,
        country="Mexico", year=2025, month=m,
        prev_overview=prev_ov,
        check_citations=True,
        temperature=0.3, max_tokens=600
    )
    overview_store[m] = overview

    print("=== COUNTRY OVERVIEW ===")
    print(overview)
    if missing_ov:
        print("⚠️ Missing citations in overview:", missing_ov)
    print("\n" + "="*80)



===== MONTH 01/2025 =====

Cited IDs : 13  •  Matched : 12
⚠️  missing IDs ➜ MEX97888
--- VIOLENCE AGAINST CIVILIANS (01/2025) ---

**Summary of Overall Pattern:**
January 2025 saw a high level of violence against civilians across Mexico, with 510 recorded events resulting in 527 fatalities. The majority of incidents (475) were classified as "Attacks," with notable spikes in targeted assassinations, dismemberment, and drive-by shootings. Abductions/forced disappearances (32) and sexual violence (3) were less frequent but significant. Guanajuato, Sinaloa, and Tabasco were among the most affected states, with organized crime groups (e.g., CJNG, Los Chapitos) frequently implicated.

**Key Developments:**
1. **Organized Crime Dominance:** Cartels like the CJNG and Los Chapitos were linked to multiple high-profile attacks, including dismemberment and narco-messages (e.g., MEX97491, MEX97909). The CJNG claimed responsibility for several killings in Tabasco (MEX97888).
2. **Targeted Assassi

In [1]:
# ╔══════════════════════════════════════════════════════════════════╗
# ║  Re-generate Jan–May 2025 briefs (5 types + overview) with context ║
# ╚══════════════════════════════════════════════════════════════════╝

from acled_tools import (
    summarize_vac,
    summarize_protests,
    summarize_riots,
    summarize_battles,
    summarize_strategic,
    summarize_overview,
)

URI, USER, PWD = "bolt://localhost:7687", "neo4j", "password"

# 1) map each event‐type name to its summarization function
event_types = [
    ("Violence against civilians", summarize_vac),
    ("Protests",                    summarize_protests),
    ("Riots",                       summarize_riots),
    ("Battles",                     summarize_battles),
    ("Strategic developments",      summarize_strategic),
]

# storage for per‐type, per‐month summaries
store = { et: {} for et, _ in event_types }
# storage for the country overview per month
overview_store = {}

# 2) loop over months
for m in range(1, 6):   # Jan=1 … May=5
    print(f"\n\n===== MONTH {m:02d}/2025 =====\n")
    sub_reports = {}

    # 2.1) for each event‐type, pull last month’s summary as context and re‐generate
    for et, fn in event_types:
        prev_ctx = store[et].get(m-1, "")
        summary, bullets, missing = fn(
            uri=URI, user=USER, pwd=PWD,
            country="Mexico", year=2025, month=m,
            context=prev_ctx,
            check_citations=True,
            temperature=0.3, max_tokens=800
        )
        store[et][m] = summary
        sub_reports[et] = summary

        print(f"--- {et.upper()} ({m:02d}/2025) ---")
        print(summary)
        if missing:
            print("⚠️ Missing citations:", missing)
        print()

    # 2.2) now build the country-level overview, feeding it last month’s overview
    prev_ov = overview_store.get(m-1, "")
    overview, missing_ov = summarize_overview(
        uri=URI, user=USER, pwd=PWD,
        sub_reports=sub_reports,
        country="Mexico", year=2025, month=m,
        prev_overview=prev_ov,
        check_citations=True,
        temperature=0.3, max_tokens=600
    )
    overview_store[m] = overview

    print("=== COUNTRY OVERVIEW ===")
    print(overview)
    



===== MONTH 01/2025 =====

Cited IDs : 14  •  Matched : 12
⚠️  missing IDs ➜ MEX97455, MEX97888
--- VIOLENCE AGAINST CIVILIANS (01/2025) ---

**Summary of Overall Pattern:**
Violence against civilians in Mexico remained high in January 2025, with 510 recorded events resulting in 527 fatalities. Attacks (475 events) were the predominant form of violence, followed by abductions/forced disappearances (32 events) and sexual violence (3 events). The tactics employed included drive-by shootings, targeted assassinations, and the use of dismemberment and torture as a means of intimidation. The Jalisco New Generation Cartel (CJNG) and other criminal groups were implicated in several high-profile attacks, particularly in Tabasco and Sinaloa.

**Key Developments:**
1. **Criminal Group Activity:** The CJNG was linked to multiple attacks in Tabasco (e.g., MEX97491, MEX97909, MEX97888), often leaving threatening messages alongside dismembered bodies. In Sinaloa, rival factions such as Los Chapitos

# Evaluation

In [2]:
import acled_tools as at

URI, USER, PWD = "bolt://localhost:7687", "neo4j", "password"
results = []

# storage for previous‐month summary
prev_ctx = ""

for m in range(1, 6):
    # 1) Fetch the bullet list and generate WITHOUT context
    no_ctx_summary, bullets, _ = at.summarize_vac(
        uri=URI, user=USER, pwd=PWD,
        country="Mexico", year=2025, month=m,
        context="",           # no context
        check_citations=False # we’ll do halluc checks separately
    )

    # 2) Generate WITH context (prev month’s summary)
    with_ctx_summary, bullets, _= at.summarize_vac(
        uri=URI, user=USER, pwd=PWD,
        country="Mexico", year=2025, month=m,
        context=prev_ctx,     # pass last month’s summary
        check_citations=False
    )

    # 3) Evaluate both for hallucinations against the bullet list
    print(f"\n--- Month {m:02d}/2025 — No-Context ---")
    no_ctx_eval = at.evaluate_hallucination(
        summary     = no_ctx_summary,
        test_corpus = bullets,
        num_questions=20      # or pick whatever you like
    )

    print(f"\n--- Month {m:02d}/2025 — With-Context ---")
    with_ctx_eval = at.evaluate_hallucination(
        summary     = with_ctx_summary,
        test_corpus = bullets,
        num_questions=20
    )

    # store for next iteration
    prev_ctx = with_ctx_summary

    results.append({
        "month": m,
        "no_ctx_rate":    no_ctx_eval["hallucination_rate"],
        "with_ctx_rate":  with_ctx_eval["hallucination_rate"],
    })

# 4) display a quick table
import pandas as pd
df = pd.DataFrame(results)
print("\nHallucination rates by month:")
print(df.set_index("month").round(2))


--- Month 01/2025 — No-Context ---


Evaluating: 100%|██████████| 2/2 [00:34<00:00, 17.02s/it]


Hallucination rate: None

--- Month 01/2025 — With-Context ---


Evaluating: 100%|██████████| 2/2 [00:14<00:00,  7.47s/it]


Hallucination rate: None

--- Month 02/2025 — No-Context ---


Evaluating: 100%|██████████| 2/2 [00:22<00:00, 11.46s/it]


Hallucination rate: None

--- Month 02/2025 — With-Context ---
Error found. Retrying to generate questions at step 1.


Evaluating: 100%|██████████| 2/2 [00:31<00:00, 15.54s/it]


Hallucination rate: None

--- Month 03/2025 — No-Context ---


Evaluating: 100%|██████████| 2/2 [01:00<00:00, 30.28s/it]


Hallucination rate: None

--- Month 03/2025 — With-Context ---


Evaluating:   0%|          | 0/2 [00:14<?, ?it/s]


KeyboardInterrupt: 

In [None]:
import pandas as pd
from acled_tools import summarize_vac, evaluate_hallucination

URI, USER, PWD = "bolt://localhost:7687", "neo4j", "password"

results = []
prev_ctx = ""  # we'll pass this as “context” once we have a summary

for m in range(1, 6):  # Jan=1 … May=5
    print(f"\n=== Month {m:02d}/2025 ===")

    # 1) Generate WITHOUT context
    no_ctx_summary, no_ctx_bullets, _ = summarize_vac(
        uri=URI, user=USER, pwd=PWD,
        country="Mexico", year=2025, month=m,
        context="",            # no prior summary
        check_citations=False,
        max_tokens= 1000  # skip citation check here
    )

    # 2) Generate WITH context
    with_ctx_summary, with_ctx_bullets, _ = summarize_vac(
        uri=URI, user=USER, pwd=PWD,
        country="Mexico", year=2025, month=m,
        context=prev_ctx,      # last month’s summary
        check_citations=False
    )

    # 3) Evaluate hallucinations (use the bullet-list as the “trusted corpus”)
    print("\n-- No-Context Summary --")
    no_ctx_eval = evaluate_hallucination(
        summary     = no_ctx_summary,
        test_corpus = no_ctx_bullets,
        num_questions = 20,    # adjust if your bullet list is short
        manual = False,
        iterations = 1
    )

    print("\n-- With-Context Summary --")
    with_ctx_eval = evaluate_hallucination(
        summary     = with_ctx_summary,
        test_corpus = with_ctx_bullets,
        num_questions = 20,
        manual = False,
        iterations = 1
    )

    # record results
    results.append({
        "month": m,
        "no_ctx_questions": no_ctx_eval["total_questions"],
        "no_ctx_halluc_rate": no_ctx_eval["hallucination_rate"],
        "with_ctx_questions": with_ctx_eval["total_questions"],
        "with_ctx_halluc_rate": with_ctx_eval["hallucination_rate"],
    })

    # set up for next iteration
    prev_ctx = with_ctx_summary

# 4) show it as a table
df = pd.DataFrame(results).set_index("month")
print("\nHallucination rates by month:")
print(df.round(2))


=== Month 01/2025 ===

-- No-Context Summary --


Evaluating: 100%|██████████| 2/2 [00:56<00:00, 28.12s/it]


Hallucination rate: None

-- With-Context Summary --


Evaluating: 100%|██████████| 2/2 [03:33<00:00, 106.71s/it]


Hallucination rate: 0.30000000000000004

=== Month 02/2025 ===

-- No-Context Summary --


Evaluating: 100%|██████████| 2/2 [00:28<00:00, 14.49s/it]


Hallucination rate: None

-- With-Context Summary --


Evaluating: 100%|██████████| 2/2 [00:24<00:00, 12.33s/it]


Hallucination rate: None

=== Month 03/2025 ===

-- No-Context Summary --


Evaluating: 100%|██████████| 2/2 [00:29<00:00, 14.80s/it]


Hallucination rate: None

-- With-Context Summary --


Evaluating: 100%|██████████| 2/2 [00:15<00:00,  7.58s/it]


Hallucination rate: 0.0

=== Month 04/2025 ===

-- No-Context Summary --


Evaluating: 100%|██████████| 2/2 [00:21<00:00, 10.55s/it]


Hallucination rate: 0.0

-- With-Context Summary --


Evaluating: 100%|██████████| 2/2 [00:06<00:00,  3.33s/it]


Hallucination rate: 0.0

=== Month 05/2025 ===

-- No-Context Summary --


Evaluating: 100%|██████████| 2/2 [01:59<00:00, 59.66s/it]


Hallucination rate: None

-- With-Context Summary --


Evaluating: 100%|██████████| 2/2 [00:40<00:00, 20.14s/it]

Hallucination rate: 0.0

Hallucination rates by month:
       no_ctx_questions  no_ctx_halluc_rate  with_ctx_questions  \
month                                                             
1                     0                 NaN                  10   
2                     0                 NaN                   0   
3                     0                 NaN                  10   
4                    10                 0.0                  10   
5                     0                 NaN                  20   

       with_ctx_halluc_rate  
month                        
1                       0.3  
2                       NaN  
3                       0.0  
4                       0.0  
5                       0.0  





In [1]:
import acled_tools as at
URI, USER, PWD = "bolt://localhost:7687", "neo4j", "password"

# Example: Mexico, March 2025 – with previous context & citation checks
prev_briefs   = {...}       # ← dict from February if you have it
prev_overview = "..."       # ← February overview

bundle = at.master_monthly_briefs(
    uri=URI, user=USER, pwd=PWD,
    country="Mexico", year=2025, month=3,
    include_context=False,
    check_citations=True,
    prev_event_briefs=None,
    prev_overview=None
)

# Access items
vac_summary, vac_missing = bundle["Violence against civilians"]
overview,  ov_missing    = bundle["_overview"]

Cited IDs : 11  •  Matched : 11
✅ all cited IDs present in slice.
Cited IDs : 16  •  Matched : 16
✅ all cited IDs present in slice.
Cited IDs : 16  •  Matched : 16
✅ all cited IDs present in slice.
Cited IDs : 19  •  Matched : 19
✅ all cited IDs present in slice.
Cited IDs : 25  •  Matched : 25
✅ all cited IDs present in slice.
Cited IDs : 15  •  Matched : 15
✅ all cited IDs present in slice.


In [2]:
print("VAC Summary:")
print(vac_summary)
print("Missing citations in VAC:", vac_missing) 

VAC Summary:

**Summary of Patterns:**
March 2025 saw a high level of violence against civilians, with 483 recorded events resulting in 566 fatalities. Attacks (458 events) were the most common form of violence, followed by abductions/forced disappearances (21 events) and sexual violence (4 events). The month was marked by targeted assassinations, drive-by shootings, and brutal executions, often involving torture and dismemberment. Criminal groups, particularly in Guerrero, Sinaloa, and Guanajuato, were responsible for many attacks, often linked to territorial disputes or extortion.

**Key Developments:**
- **Guerrero and Sinaloa** remained hotspots, with frequent attacks involving dismemberment and messages left by criminal groups (e.g., MEX101778, MEX101779).
- **Guanajuato** saw multiple attacks linked to organized crime, including targeted killings of former police officers and community leaders (e.g., MEX101359, MEX101369).
- **Drive-by shootings** were common, particularly in Gua

In [3]:
import acled_tools as at
URI, USER, PWD = "bolt://localhost:7687", "neo4j", "password"

# Example: Mexico, March 2025 – with previous context & citation checks
prev_briefs   = {...}       # ← dict from February if you have it
prev_overview = "..."       # ← February overview

summaries = at.master_monthly_briefs(
    uri=URI, user=USER, pwd=PWD,
    country="Mexico", year=2025, month=3,
    include_context=False,
    check_citations=False,
    prev_event_briefs=None,
    prev_overview=None
)

# Access items
vac_summary = summaries["Violence against civilians"]


In [2]:
vac_summary = summaries["Violence against civilians"]
print("VAC Summary:")
print(vac_summary)

NameError: name 'summaries' is not defined

In [4]:
# ── cell 1 · imports & config ───────────────────────────────────────────────────
import pandas as pd
import acled_tools as at

# import the functions you pasted earlier; change the filename if necessary
from acled_tools import evaluate_hallucination

# Neo4j connection parameters
URI  = "bolt://localhost:7687"
USER = "neo4j"
PWD  = "password"

countries = ["Mexico", "Congo", "Georgia"]
months    = range(1, 7)          # 1 – 6 (Jan → Jun)
year      = 2025

In [5]:
# ── cell 2 · helper to fetch one month’s summary & reference corpus ─────────────
def fetch_month(country: str, year: int, month: int):
    """
    Returns (summary_text, test_corpus) for the given country/month.
    Adjust the dict keys so they match what master_monthly_briefs returns.
    """
    briefs = at.master_monthly_briefs(
        uri=URI, user=USER, pwd=PWD,
        country=country, year=year, month=month,
        include_context=False,
        check_citations=False,
        prev_event_briefs=None,
        prev_overview=None
    )
    
    # ── WHAT TO SCORE ───────────────────────────────────────────────────────────
    summary_text = briefs["Violence against civilians"]   # ← change if needed
    test_corpus  = briefs["reference_notes"]              # ← change if needed
    
    return summary_text, test_corpus

In [1]:
import pandas as pd
import acled_tools as at          # your local module with all helper functions
from acled_tools import evaluate_hallucination

from neo4j import GraphDatabase   # official Neo4j driver

# Neo4j credentials
URI  = "bolt://localhost:7687"
USER = "neo4j"
PWD  = "password"

countries = ["Mexico", "Congo", "Georgia"]
months    = range(1, 7)          # Jan – Jun
year      = 2025

In [2]:
def get_reference_notes(uri, user, pwd, country, year, month, event_type):
    """
    Returns a single text blob containing the notes for *all events*
    that match the (country, year, month, event_type) filter.
    """
    query = """
    MATCH (e:Event)-[:HAPPENED_IN]->(:Country {name:$country})
    WHERE e.event_type = $etype
      AND e.event_date >= date({year:$year, month:$month, day:1})
      AND e.event_date  <  date({year:$year, month:$month, day:1}) + duration('P1M')
    RETURN e.notes AS note
    ORDER BY e.event_date
    """
    drv = GraphDatabase.driver(uri, auth=(user, pwd))
    with drv.session() as s:
        notes = [rec["note"] for rec in s.run(query, country=country,
                                              etype=event_type,
                                              year=year, month=month)
                 if rec["note"]]
    drv.close()
    return "\n".join(notes)

In [3]:
def fetch_summary_and_corpus(country, year, month, event_type):
    """
    • Calls master_monthly_briefs to get the summary for `event_type`.
    • Fetches reference notes for the same slice from Neo4j.
    • Returns (summary_text, reference_corpus).
    """
    briefs = at.master_monthly_briefs(
        uri=URI, user=USER, pwd=PWD,
        country=country, year=year, month=month,
        include_context=False,
        check_citations=False,
        prev_event_briefs=None,
        prev_overview=None
    )
    
    summary_text = briefs[event_type]                     # raises KeyError if missing
    reference_corpus = get_reference_notes(
        URI, USER, PWD,
        country=country, year=year, month=month,
        event_type=event_type
    )
    return summary_text, reference_corpus

In [5]:
results = []

for country in countries:
    for month in months:
        # pull one month's brief to discover which event-type keys exist
        try:
            base_brief = at.master_monthly_briefs(
                uri=URI, user=USER, pwd=PWD,
                country=country, year=year, month=month,
                include_context=False, check_citations=False,
                prev_event_briefs=None, prev_overview=None
            )
        except Exception as e:
            print(f"⚠️  failed to pull briefs for {country} {year}-{month:02d}: {e}")
            continue
        
        event_types = list(base_brief.keys())   # dynamic: whatever that month gives
        print(f"{country} {year}-{month:02d}: event types → {event_types}")
        
        for etype in event_types:
            try:
                summary, corpus = fetch_summary_and_corpus(
                    country, year, month, etype
                )
                
                stat = evaluate_hallucination(
                    summary       = summary,
                    test_corpus   = corpus,
                    num_questions = 20,    # auto-size
                    manual        = False,
                    iterations    = 1
                )
                stat.update({
                    "country":    country,
                    "year":       year,
                    "month":      month,
                    "event_type": etype
                })
                results.append(stat)
                print(f"   ✓ {etype}")
            
            except Exception as e:
                print(f"   ⚠️  {etype} → {e}")

Mexico 2025-01: event types → ['Violence against civilians', 'Protests', 'Riots', 'Battles', 'Strategic developments', '_overview']


Evaluating: 100%|██████████| 2/2 [00:10<00:00,  5.32s/it]


Hallucination rate: None
   ✓ Violence against civilians


Evaluating: 100%|██████████| 2/2 [00:07<00:00,  3.84s/it]


Hallucination rate: None
   ✓ Protests


KeyboardInterrupt: 

In [6]:
# ------------------------------------------------------------
# 0) Imports & DB creds
# ------------------------------------------------------------
import acled_tools as at

URI, USER, PWD = "bolt://localhost:7687", "neo4j", "password"

# ------------------------------------------------------------
# 1) Generate VAC summary twice
#    – w/out context   (baseline)
#    – with context    (pass previous-month summary)
# ------------------------------------------------------------
# pull April first so we can use it as context
apr_summary, _, _ = at.summarize_vac(
    uri=URI, user=USER, pwd=PWD,
    country="Mexico", year=2025, month=4,
    context="",                 # ← no context for April itself
    check_citations=True,       # quick sanity-check
    temperature=0.3
)

may_noctx_summary, bullets, _ = at.summarize_vac(
    uri=URI, user=USER, pwd=PWD,
    country="Mexico", year=2025, month=5,
    context="",                 # ← truly no context
    check_citations=True,
    temperature=0.3
)

may_ctx_summary, _, _ = at.summarize_vac(
    uri=URI, user=USER, pwd=PWD,
    country="Mexico", year=2025, month=5,
    context=apr_summary,        # ← April as context
    check_citations=True,
    temperature=0.3
)

# ------------------------------------------------------------
# 2) Quick hallucination evaluation (20 MCQs each)
#    bullets from the “no-context” call are good enough as corpus
# ------------------------------------------------------------
print("\n--- NO-CONTEXT SUMMARY ---")
noctx_eval = at.evaluate_hallucination(
    summary      = may_noctx_summary,
    test_corpus  = bullets,
    num_questions=20,
    iterations   = 1            # keep it light for the smoke-test
)

print("\n--- WITH-CONTEXT SUMMARY ---")
ctx_eval = at.evaluate_hallucination(
    summary      = may_ctx_summary,
    test_corpus  = bullets,
    num_questions=20,
    iterations   = 1
)

# ------------------------------------------------------------
# 3) Display results side by side
# ------------------------------------------------------------
import pandas as pd
df = pd.DataFrame(
    {
        "setup": ["no-context", "with-context"],
        "halluc_rate": [
            noctx_eval["hallucination_rate"],
            ctx_eval["hallucination_rate"],
        ],
        "correct": [
            noctx_eval["correct_answers"],
            ctx_eval["correct_answers"],
        ],
        "total_q": [
            noctx_eval["total_questions"],
            ctx_eval["total_questions"],
        ],
    }
)
print("\n=== QUICK TEST RESULT ===")
print(df.set_index("setup").round(2))

Cited IDs : 11  •  Matched : 11
✅ all cited IDs present in slice.
Cited IDs : 14  •  Matched : 14
✅ all cited IDs present in slice.
Cited IDs : 9  •  Matched : 9
✅ all cited IDs present in slice.

--- NO-CONTEXT SUMMARY ---


Evaluating: 100%|██████████| 2/2 [01:09<00:00, 34.84s/it]


Hallucination rate: None

--- WITH-CONTEXT SUMMARY ---


Evaluating: 100%|██████████| 2/2 [00:52<00:00, 26.40s/it]

Hallucination rate: None

=== QUICK TEST RESULT ===
             halluc_rate  correct  total_q
setup                                     
no-context          None        0        0
with-context        None        0        0





In [7]:
from acled_tools import ask_llm, read_json, create_questions_from_summary

TEST_PROMPT_SUMMARY = may_noctx_summary  # or any short paragraph for speed

raw = ask_llm(
    prompt = """
You are a multiple-choice question generation AI.

Based on the following text, generate exactly 3 multiple-choice questions
in the required JSON format.

json[
  {
    "question": "...",
    "options": ["...", "...", "...", "..."],
    "correct_answer": "..."
  },
  ...
]

Text:
""" + TEST_PROMPT_SUMMARY,
    temperature = 0.3,
    max_tokens  = 500
)

print("=== RAW MODEL OUTPUT ===")
print(raw)
try:
    print("\nParsed JSON:\n", read_json(raw))
except Exception as e:
    print("read_json failed →", e)

=== RAW MODEL OUTPUT ===
Here are three multiple-choice questions based on the provided text in the required JSON format:

```json
[
  {
    "question": "Which of the following states was NOT mentioned as one of the most affected by violence against civilians in May 2025?",
    "options": ["Guanajuato", "Sinaloa", "Guerrero", "Veracruz"],
    "correct_answer": "Veracruz"
  },
  {
    "question": "What was the most common type of violent event recorded in May 2025?",
    "options": ["Abductions/forced disappearances", "Sexual violence", "Attacks", "Political violence"],
    "correct_answer": "Attacks"
  },
  {
    "question": "Which of the following was a notable characteristic of violence in Sinaloa during May 2025?",
    "options": ["Mass executions and dismembered bodies", "Targeted killings of religious figures", "Attacks on public transport drivers", "Extortion by state actors"],
    "correct_answer": "Mass executions and dismembered bodies"
  }
]
```
read_json failed → No 'json[' 

In [1]:
import acled_tools as at   # or whatever the module is called

URI, USER, PWD = "bolt://localhost:7687", "neo4j", "password"

# --- pick a slice ----------------------------------------------------
country = "Mexico"
year    = 2025
month   = 5          # May
event   = "Violence against civilians"

# 1) generate the VAC brief *without* context
summary, bullets, _ = at.summarize_vac(
    uri=URI, user=USER, pwd=PWD,
    country=country, year=year, month=month,
    context="",                # <── no previous-month summary
    check_citations=False
)

print("\n--- GENERATED SUMMARY (no context) ---\n")
print(summary[:800] + " …")         # preview

# 2) run the hallucination check (20 MCQs, single iteration)
eval_stats = at.evaluate_hallucination(
    summary      = summary,
    test_corpus  = bullets,   # we treat the bullet log as the “ground-truth” corpus
    num_questions=10,         # any multiple of 10 works
    manual=False,             # LLM does the answering & marking
    iterations=5             # 1 pass is fine for a quick test
)

print("\n--- EVALUATION RESULT ---")
print(eval_stats)


--- GENERATED SUMMARY (no context) ---


#### **Summary of Patterns**
This month saw **481 incidents of violence against civilians**, resulting in **518 fatalities**. The majority of attacks (458) involved shootings, abductions, or torture, often linked to organized crime disputes, territorial control, and political violence. **Guanajuato, Sinaloa, and Guerrero** remained hotspots, with **Guanajuato** experiencing the highest number of attacks (107). **Sexual violence (2 cases)** and **abductions/forced disappearances (21 cases)** were less frequent but significant, particularly in **Baja California, Michoacán, and Nuevo León**.

#### **Key Developments**
- **Organized Crime Dominance**: Cartels (e.g., **Santa Rosa de Lima, CJNG, Los Rusos**) c …
Error found. Retrying to generate questions at step 1.
Error found. Retrying to generate questions at step 1.
--- RAW MODEL OUTPUT ---
 [{'question': 'How many incidents of violence against civilians were recorded in Mexico in May 2025?', 'op

Evaluating: 100%|██████████| 5/5 [00:20<00:00,  4.02s/it]

Hallucination rate: None

--- EVALUATION RESULT ---
{'hallucination_rate': None, 'total_questions': 0, 'correct_answers': 0, 'incorrect_answers': 0, 'failed_batches': 5, 'iterations_per_batch': 5, 'total_batches': 1}





In [3]:
api_key = "1jwUcSzw7IwGdusNjHmnmKfMuWpf4qg3"

In [1]:
import acled_tools as at
from hallucination_eval import evaluate_hallucination

# ------------------------------------------------------------------
# 1.  Get one VAC summary and its bullet-list for a single month
# ------------------------------------------------------------------
URI, USER, PWD = "bolt://localhost:7687", "neo4j", "password"

vac_summary, bullets, _, metrics = at.summarize_vac(
    uri=URI, user=USER, pwd=PWD,
    country="Mexico", year=2025, month=5,
    context="",            # no previous-month context for this test
    check_citations=False  # skip citation check – we only care about hallucinations here
)

# ------------------------------------------------------------------
# 2.  Run the hallucination evaluation
#     • 10 questions  • answered/graded 3 times to stress-test
# ------------------------------------------------------------------
result = evaluate_hallucination(
    summary       = vac_summary,
    corpus  =  f"{metrics}\n\n{bullets}",    # ground-truth slice
    total_q = 10,         # keep small for a quick demo
    iterations    = 3           # ask/answer the same batch 3×
)

print("\n--- EVALUATION SUMMARY ---")
for k, v in result.items():
    print(f"{k:20}: {v}")


--- EVALUATION SUMMARY ---
hallucination_rate  : 1.0
questions           : 10
avg_correct         : 0.0
iterations          : 3


In [3]:
from hallucination_eval import (
    generate_mcq,            # <-- uses your new filters
    _answer_mcq, _score_mcq  # low-level helpers
)
import acled_tools as at

URI, USER, PWD = "bolt://localhost:7687", "neo4j", "password"

# --- get summary + trusted corpus ------------------------------------
vac_summary, bullets, _ , metrics = at.summarize_vac(
    uri=URI, user=USER, pwd=PWD,
    country="Mexico", year=2025, month=5,
    context="", check_citations=False
)

corpus  = bullets + "\n\nHEADLINE METRICS:\n" + metrics

# --- generate non-repeating, evidence-backed MCQs ---------------------
questions = generate_mcq(vac_summary, total_q=10, temp=0.0)

# --- print everything -------------------------------------------------
print("\n================ SUMMARY =================\n")
print(vac_summary)

print("\n================ QUESTIONS (ground-truth) ================\n")
for idx, q in enumerate(questions, 1):
    print(f"Q{idx}. {q['question']}")
    
    # -- print A-D in order with their option text -------------
    for letter in ["A", "B", "C", "D"]:            # guarantees A→D order
        print(f"   – {letter}. {q['options'][letter]}")
    
    print(f"   ✔ correct: {q['correct']}\n")

# --- let the model answer once to see how hard they are --------------
answers  = _answer_mcq(questions, corpus, temp=0.0)
corrects = _score_mcq(questions, answers)

print("================ MODEL ANSWERS ================\n")
for q, ans in zip(questions, answers):
    print(f"{q['question']}\n → model answered: {ans} | correct: {q['correct']}")

print("\n================ QUICK SCORE =================")
print(f"Correct: {corrects}/{len(questions)}   "
      f"hallucination_rate ≈ {1 - corrects/len(questions):.2f}")




**Summary of Patterns:**
May 2025 saw sustained high levels of violence against civilians, with 481 recorded events resulting in 518 fatalities. Attacks (458 events) remained the dominant sub-event type, often involving targeted assassinations, drive-by shootings, and torture. Abductions/forced disappearances (21 events) and sexual violence (2 events) were less frequent but significant in specific contexts. Guanajuato, Sinaloa, and Guerrero were hotspots, with attacks frequently linked to organized crime disputes, territorial control, and political violence.

**Key Developments:**
1. **Organized Crime Dominance:** Cartels (e.g., Santa Rosa de Lima, CJNG, Los Rusos) claimed responsibility for high-profile attacks, often leaving messages to intimidate rivals or authorities (e.g., MEX104338, MEX104942).
2. **Political Targeting:** Attacks on candidates (e.g., Morena candidate in Texistepec, MEX103973) and journalists (e.g., Acapulco, MEX104019) highlighted electoral and press-related v

In [2]:
import json, textwrap
print(textwrap.indent(json.dumps(questions[0], indent=2), "  "))

  {
    "question": "How many recorded events of violence against civilians occurred in Mexico in May 2025?",
    "options": {
      "A": "481",
      "B": "518",
      "C": "21",
      "D": "2"
    },
    "correct": "A"
  }


In [6]:
# ────────────────────────────────────────────────────────────────
#  bulk_eval.py  –  summaries + hallucination check at scale
# ────────────────────────────────────────────────────────────────
import pandas as pd
import acled_tools as at                    # your big helper module
from hallucination_eval import evaluate_hallucination

URI, USER, PWD = "bolt://localhost:7687", "neo4j", "password"

# ---- configuration ------------------------------------------------------
COUNTRIES   = ["Democratic Republic of Congo", "Mexico", "Georgia"]
MONTHS      = range(1, 8)                   # 1 … 7   (Jan – Jul)
EVENT_FUNCS = {                             # wrapper  → style decided inside
    "Violence against civilians": at.summarize_vac,
    "Protests":                   at.summarize_protests,
    "Riots":                      at.summarize_riots,
    "Battles":                    at.summarize_battles,
    "Strategic developments":     at.summarize_strategic,
}
MCQ_N       = 10                            # questions per summary
ITERATIONS  = 5                             # answer/grade passes          

# ---- main loop ----------------------------------------------------------
records = []

for country in COUNTRIES:
    print(f"\n==== {country.upper()} ====")
    # keep last-month context per event-type
    last_summary = {et: "" for et in EVENT_FUNCS}

    for m in MONTHS:
        print(f"\n——  {m:02d}/2025  ——")
        for et, fn in EVENT_FUNCS.items():
            # 1) generate summary (+ bullets) with previous context
            summary, bullets, _ , metrics = fn(
                uri=URI, user=USER, pwd=PWD,
                country=country, year=2025, month=m,
                check_citations=False,               # purely hallucination study
                temperature=0.3, max_tokens=900
            )
            last_summary[et] = summary              # stash for next month

            # 2) hallucination evaluation (QA-accuracy style)
            eval_out = evaluate_hallucination(
                summary      = summary,
                corpus       = bullets + "\n\nHEADLINE METRICS:\n" + metrics,             # all events for the slice
                total_q      = MCQ_N,
                iterations   = ITERATIONS,
                temp_q       = 0.0,                 # deterministic generation
                temp_a       = 0.0                  # deterministic answering
            )

            # 3) store one row
            records.append({
                "country":       country,
                "month":         m,
                "event_type":    et,
                **eval_out       # expands to hallucination_rate, etc.
            })

# ---- results ------------------------------------------------------------
df = pd.DataFrame(records)
df.to_csv("hallucination_report.csv", index=False)

print("\n=== average hallucination rate (Jan–Jul) ===")
print(df.pivot_table(index="country",
                     columns="event_type",
                     values="hallucination_rate").round(2))
print("\nDetailed per-month CSV written to hallucination_report.csv")


==== DEMOCRATIC REPUBLIC OF CONGO ====

——  01/2025  ——

——  02/2025  ——

——  03/2025  ——

——  04/2025  ——

——  05/2025  ——

——  06/2025  ——

——  07/2025  ——

==== MEXICO ====

——  01/2025  ——

——  02/2025  ——

——  03/2025  ——

——  04/2025  ——

——  05/2025  ——

——  06/2025  ——

——  07/2025  ——

==== GEORGIA ====

——  01/2025  ——

——  02/2025  ——

——  03/2025  ——

——  04/2025  ——

——  05/2025  ——

——  06/2025  ——

——  07/2025  ——

=== average hallucination rate (Jan–Jul) ===
event_type                    Battles  Protests  Riots  \
country                                                  
Democratic Republic of Congo     0.06      0.04   0.07   
Georgia                          0.00      0.00   0.03   
Mexico                           0.06      0.07   0.06   

event_type                    Strategic developments  \
country                                                
Democratic Republic of Congo                    0.02   
Georgia                                         0.01   
Mexic

In [8]:
# ────────────────────────────────────────────────────────────────
#  bulk_eval.py  –  summaries + hallucination check at scale
# ────────────────────────────────────────────────────────────────
import pandas as pd
import acled_tools as at                    # your big helper module
from hallucination_eval import evaluate_hallucination

URI, USER, PWD = "bolt://localhost:7687", "neo4j", "password"

# ---- configuration ------------------------------------------------------
COUNTRIES   = ["Democratic Republic of Congo", "Mexico", "Georgia"]
MONTHS      = range(1, 8)                   # 1 … 7   (Jan – Jul)
EVENT_FUNCS = {                             # wrapper  → style decided inside
    "Violence against civilians": at.summarize_vac,
    "Protests":                   at.summarize_protests,
    "Riots":                      at.summarize_riots,
    "Battles":                    at.summarize_battles,
    "Strategic developments":     at.summarize_strategic,
}
MCQ_N       = 10                            # questions per summary
ITERATIONS  = 5                             # answer/grade passes          

# ---- main loop ----------------------------------------------------------
records = []

for country in COUNTRIES:
    print(f"\n==== {country.upper()} ====")
    # keep last-month context per event-type
    last_summary = {et: "" for et in EVENT_FUNCS}

    for m in MONTHS:
        print(f"\n——  {m:02d}/2025  ——")
        for et, fn in EVENT_FUNCS.items():
            # 1) generate summary (+ bullets) with previous context
            summary, bullets, _ , metrics = fn(
                uri=URI, user=USER, pwd=PWD,
                country=country, year=2025, month=m,
                check_citations=False,  
                context=last_summary[et],  # use last month’s summary as context
                temperature=0.3, max_tokens=900
            )
            last_summary[et] = summary              # stash for next month

            # 2) hallucination evaluation (QA-accuracy style)
            eval_out = evaluate_hallucination(
                summary      = summary,
                corpus       = (bullets + "\n\nHEADLINE METRICS:\n" + metrics + "\n\nPREV-MONTH SUMMARY:\n" + last_summary[et]),
             # all events for the slice
                total_q      = MCQ_N,
                iterations   = ITERATIONS,
                temp_q       = 0.0,                 # deterministic generation
                temp_a       = 0.0                  # deterministic answering
            )

            # 3) store one row
            records.append({
                "country":       country,
                "month":         m,
                "event_type":    et,
                **eval_out       # expands to hallucination_rate, etc.
            })

# ---- results ------------------------------------------------------------
df_context = pd.DataFrame(records)
df_context.to_csv("hallucination_report_context.csv", index=False)

print("\n=== average hallucination rate (Jan–Jul) ===")
print(df.pivot_table(index="country",
                     columns="event_type",
                     values="hallucination_rate").round(2))
print("\nDetailed per-month CSV written to hallucination_report.csv")


==== DEMOCRATIC REPUBLIC OF CONGO ====

——  01/2025  ——

——  02/2025  ——

——  03/2025  ——

——  04/2025  ——

——  05/2025  ——

——  06/2025  ——

——  07/2025  ——

==== MEXICO ====

——  01/2025  ——

——  02/2025  ——

——  03/2025  ——

——  04/2025  ——

——  05/2025  ——

——  06/2025  ——

——  07/2025  ——

==== GEORGIA ====

——  01/2025  ——

——  02/2025  ——

——  03/2025  ——

——  04/2025  ——

——  05/2025  ——

——  06/2025  ——

——  07/2025  ——

=== average hallucination rate (Jan–Jul) ===
event_type                    Battles  Protests  Riots  \
country                                                  
Democratic Republic of Congo     0.06      0.04   0.07   
Georgia                          0.00      0.00   0.03   
Mexico                           0.06      0.07   0.06   

event_type                    Strategic developments  \
country                                                
Democratic Republic of Congo                    0.02   
Georgia                                         0.01   
Mexic

In [None]:
# ────────────────────────────────────────────────────────────────
#  bulk_eval.py  –  summaries + hallucination check at scale
# ────────────────────────────────────────────────────────────────
import pandas as pd
import acled_tools as at                    # your big helper module
from hallucination_eval import evaluate_hallucination

URI, USER, PWD = "bolt://localhost:7687", "neo4j", "password"

# ---- configuration ------------------------------------------------------
COUNTRIES   = ["Democratic Republic of Congo", "Mexico", "Georgia"]
MONTHS      = range(1, 8)                   # 1 … 7   (Jan – Jul)
EVENT_FUNCS = {                             # wrapper  → style decided inside
    "Violence against civilians": at.summarize_vac,
    "Protests":                   at.summarize_protests,
    "Riots":                      at.summarize_riots,
    "Battles":                    at.summarize_battles,
    "Strategic developments":     at.summarize_strategic,
}
MCQ_N       = 10                            # questions per summary
ITERATIONS  = 5                             # answer/grade passes          

# ---- main loop ----------------------------------------------------------
records = []

for country in COUNTRIES:
    print(f"\n==== {country.upper()} ====")
    # keep last-month context per event-type
    last_summary = {et: "" for et in EVENT_FUNCS}

    for m in MONTHS:
        print(f"\n——  {m:02d}/2025  ——")
        for et, fn in EVENT_FUNCS.items():
            # 1) generate summary (+ bullets) with previous context
            summary, bullets, _ , metrics = fn(
                uri=URI, user=USER, pwd=PWD,
                country=country, year=2025, month=m,
                check_citations=False,               # purely hallucination study
                temperature=0.3, max_tokens=900
            )
            last_summary[et] = summary              # stash for next month

            # 2) hallucination evaluation (QA-accuracy style)
            eval_out = evaluate_hallucination(
                summary      = summary,
                corpus       = bullets + "\n\nHEADLINE METRICS:\n" + metrics,             # all events for the slice
                total_q      = MCQ_N,
                iterations   = ITERATIONS,
                temp_q       = 0.0,                 # deterministic generation
                temp_a       = 0.0                  # deterministic answering
            )

            # 3) store one row
            records.append({
                "country":       country,
                "month":         m,
                "event_type":    et,
                **eval_out       # expands to hallucination_rate, etc.
            })

# ---- results ------------------------------------------------------------
df = pd.DataFrame(records)
df.to_csv("hallucination_report.csv", index=False)

print("\n=== average hallucination rate (Jan–Jul) ===")
print(df.pivot_table(index="country",
                     columns="event_type",
                     values="hallucination_rate").round(2))
print("\nDetailed per-month CSV written to hallucination_report.csv")

In [1]:
from hallucination_eval import (
    generate_mcq,            # <-- uses your new filters
    _answer_mcq, _score_mcq  # low-level helpers
)
import acled_tools as at

URI, USER, PWD = "bolt://localhost:7687", "neo4j", "password"

# --- get summary + trusted corpus ------------------------------------
vac_summary, bullets, _ , metrics = at.summarize_vac(
    uri=URI, user=USER, pwd=PWD,
    country="Mexico", year=2025, month=5,
    context="", check_citations=True
)

corpus  = bullets + "\n\nHEADLINE METRICS:\n" + metrics

# --- generate non-repeating, evidence-backed MCQs ---------------------
questions = generate_mcq(vac_summary, total_q=10, temp=0.0)

# --- print everything -------------------------------------------------
print("\n================ SUMMARY =================\n")
print(vac_summary)

print("\n================ QUESTIONS (ground-truth) ================\n")
for idx, q in enumerate(questions, 1):
    print(f"Q{idx}. {q['question']}")
    
    # -- print A-D in order with their option text -------------
    for letter in ["A", "B", "C", "D"]:            # guarantees A→D order
        print(f"   – {letter}. {q['options'][letter]}")
    
    print(f"   ✔ correct: {q['correct']}\n")

# --- let the model answer once to see how hard they are --------------
answers  = _answer_mcq(questions, corpus, temp=0.0)
corrects = _score_mcq(questions, answers)

print("================ MODEL ANSWERS ================\n")
for q, ans in zip(questions, answers):
    print(f"{q['question']}\n → model answered: {ans} | correct: {q['correct']}")

print("\n================ QUICK SCORE =================")
print(f"Correct: {corrects}/{len(questions)}   "
      f"hallucination_rate ≈ {1 - corrects/len(questions):.2f}")

Cited IDs : 15  •  Matched : 15
✅ all cited IDs present in slice.



#### **Summary of Patterns**
This month saw **481 recorded incidents** of violence against civilians, resulting in **518 fatalities**. The majority of attacks (458) were direct assaults, with **21 abductions/forced disappearances** and **2 cases of sexual violence**. Guanajuato, Sinaloa, and Guerrero remained hotspots, with **Guanajuato** experiencing repeated drive-by shootings and targeted assassinations, often linked to cartel disputes. **Sinaloa** saw a surge in cartel-related violence, including dismemberment and public displays of bodies with narco-messages. **Guerrero** continued to witness attacks on public transport drivers and political figures, reflecting ongoing territorial disputes between gangs.

#### **Key Developments**
- **Cartel Warfare**: The **Santa Rosa de Lima Cartel** (Guanajuato) and **CJNG** (Guerrero) claimed responsibility for high-profile attacks, including the killing of civilians in publi