# ACLED Event Extraction Notebook

This notebook connects to the ACLED API, fetches global event data, filters for significant events (such as wars, conflicts, and political upheavals), groups the events by country, then by year and event type, and outputs a nested JSON structure. The structure is intended for later integration into a defense spending analysis dashboard.

## Cell 1: Set Up and Fetch ACLED Data

In this cell, we'll define our API access details (replace `YOUR_API_KEY_HERE` and `YOUR_EMAIL_HERE` with your actual ACLED API credentials, and never commit real credentials to a shared notebook) and fetch data from the ACLED API. We'll request events for a given time span and load the JSON response into a DataFrame.

In [2]:
import requests
import pandas as pd

# Set your ACLED API credentials (placeholders; do not hard-code real values)
ACLED_API_KEY = "YOUR_API_KEY_HERE"
ACLED_EMAIL = "YOUR_EMAIL_HERE"

# Define the ACLED API endpoint for the 'acled/read' command
base_url = "https://api.acleddata.com/acled/read"

# Define query parameters
params = {
    "key": ACLED_API_KEY,
    "email": ACLED_EMAIL,
    "limit": 500,  # adjust limit as needed
    # Date range: ACLED expects "start|end" together with a BETWEEN operator
    # (check the ACLED API documentation if the filter appears to be ignored)
    "event_date": "2010-01-01|2020-12-31",
    "event_date_where": "BETWEEN",
    # You can add further filters if desired
}

# Make the API request
response = requests.get(base_url, params=params, timeout=60)

if response.status_code != 200:
    raise Exception(f"Error fetching ACLED data: {response.status_code}")

# The API returns a JSON object; the event records are in its "data" list
payload = response.json()
if not payload.get("success", False):
    raise Exception(f"ACLED API reported an error: {payload.get('messages')}")

acled_df = pd.DataFrame(payload["data"])
print("Successfully fetched ACLED data!")
print(f"Records fetched: {len(acled_df)}")

# Display the first few rows
acled_df.head()
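With `limit` set to 500, a single request returns at most 500 events, so covering a decade of global data would take several requests. The sketch below shows one possible paging loop. It rests on assumptions: `fetch_all_pages` and `fetch_page` are hypothetical helper names, and the idea that the API accepts a 1-based `page` parameter alongside `limit` should be checked against the ACLED API documentation. The page fetcher is passed in as a function so the loop can be tried without network access.

```python
from typing import Callable, Dict, List

import pandas as pd


def fetch_all_pages(
    fetch_page: Callable[[int], List[Dict]],
    page_size: int = 500,
    max_pages: int = 100,
) -> pd.DataFrame:
    """Collect records page by page until a short or empty page is returned.

    fetch_page(page_number) must return the list of event dicts for that
    page (for ACLED, the "data" list of the JSON payload).
    """
    records: List[Dict] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        records.extend(batch)
        # A page shorter than page_size means there is nothing left to fetch
        if len(batch) < page_size:
            break
    return pd.DataFrame(records)


# Offline stand-in for the API: 5 fake events served 2 per page
_fake_events = [{"event_id_cnty": f"XX{i}", "event_type": "Protests"} for i in range(5)]


def fake_fetch_page(page: int) -> List[Dict]:
    start = (page - 1) * 2
    return _fake_events[start:start + 2]


demo_df = fetch_all_pages(fake_fetch_page, page_size=2)
print(len(demo_df))
```

In the notebook, `fetch_page` could wrap the `requests.get` call from this cell, add `"page": page` to `params`, and return the `data` list of the JSON response. The `max_pages` cap is there so a misbehaving endpoint cannot cause an endless loop.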


## Cell 2: Inspect and Prepare the Data

In this cell, we'll inspect the columns and convert the date column to a datetime object. We'll also extract the year for grouping purposes.

In [None]:
# Check the columns and basic info
print(acled_df.columns.tolist())
acled_df.info()

# Convert the event_date column to datetime (adjust column name if needed)
acled_df['event_date'] = pd.to_datetime(acled_df['event_date'], format='%Y-%m-%d', errors='coerce')

# Extract the year into a new column
acled_df['Year'] = acled_df['event_date'].dt.year

# Display the updated DataFrame
acled_df.head()
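Because `errors='coerce'` silently turns unparseable dates into `NaT`, it can be worth counting them before grouping by year. The following is a minimal sketch on hypothetical sample rows (the `sample` frame and its values are made up for illustration). It also casts the year to pandas' nullable `Int64` dtype, which is one way to keep years as integers when some dates are missing.

```python
import pandas as pd

# Hypothetical sample rows; the second date is deliberately malformed
sample = pd.DataFrame({"event_date": ["2015-03-14", "14/03/2015", "2020-12-31"]})

sample["event_date"] = pd.to_datetime(
    sample["event_date"], format="%Y-%m-%d", errors="coerce"
)

# Nullable integer dtype keeps years as integers even when some dates are missing
sample["Year"] = sample["event_date"].dt.year.astype("Int64")

n_bad = int(sample["event_date"].isna().sum())
print(f"Unparseable dates: {n_bad}")
print(sample)
```

Rows with a missing `Year` are dropped by `groupby` by default, so a non-zero count here means some events would quietly disappear from the grouped output.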

## Cell 3: Filter for Significant Events

Now we filter the dataset to the events most likely to affect defense spending, such as wars, conflicts, and political upheavals. ACLED records political violence and protest events rather than economic indicators, so the filter works on the `event_type` column, and the values must match ACLED's labels exactly. Adjust the event types as needed based on the ACLED documentation.

In [None]:
# Define a list of significant event types; labels must match ACLED's event_type values exactly
significant_event_types = ["Battles", "Explosions/Remote violence", "Protests", "Riots"]

# Filter the DataFrame for significant events
filtered_acled_df = acled_df[acled_df['event_type'].isin(significant_event_types)].copy()

print(f"Number of significant events: {len(filtered_acled_df)}")
filtered_acled_df.head()
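Since `isin` matches strings exactly, a misspelled label simply yields no rows rather than an error. A quick check of which labels actually occur can catch that. The sketch below uses hypothetical sample rows standing in for `acled_df`; the names `sample`, `wanted`, and `unmatched` are illustrative only.

```python
import pandas as pd

# Hypothetical sample rows standing in for acled_df
sample = pd.DataFrame({
    "event_type": ["Battles", "Protests", "Strategic developments", "Riots", "Protests"],
})

wanted = ["Battles", "Explosions/Remote violence", "Protests", "Riots"]

# See which labels actually occur, and how often
print(sample["event_type"].value_counts())

# Labels in the filter list that never occur in the data (possible typos)
unmatched = sorted(set(wanted) - set(sample["event_type"].unique()))
print(f"Filter labels with no matching rows: {unmatched}")

filtered = sample[sample["event_type"].isin(wanted)]
print(f"Kept {len(filtered)} of {len(sample)} rows")
```

An unmatched label is not necessarily a mistake, since a small sample may simply contain no events of that type, but it is a cheap signal to look at before trusting the filtered counts.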

## Cell 4: Group the Events by Country, Year, and Event Type

We will now group the filtered events by country. For each country, we'll group the events by year, and then within each year, group them by event type. This will create a nested dictionary structure.

In [None]:
import json

nested_events = {}

# Group by country
for country, country_group in filtered_acled_df.groupby("country"):
    country_dict = {"Country": country, "Time_Series": []}
    
    # Group by year within each country
    for year, year_group in country_group.groupby("Year"):
        # Cast to a plain int so the year is written as a JSON integer (not a string or float)
        year_dict = {"Year": int(year), "Events": {}}
        
        # Group by event type within the year
        for event_type, type_group in year_group.groupby("event_type"):
            events_list = type_group[["event_date", "event_type", "sub_event_type", "notes"]].to_dict(orient="records")
            year_dict["Events"][event_type] = events_list
        
        country_dict["Time_Series"].append(year_dict)
    
    nested_events[country] = country_dict

# Pretty-print the nested structure
print(json.dumps(nested_events, indent=2, default=str))
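To make the target shape concrete, here is a small sketch that builds the same kind of country, year, event type nesting from a few hypothetical rows. It is an illustrative variant, not the cell's exact code: it groups on all three keys in a single pass and keeps only a `notes` field per event.

```python
import json

import pandas as pd

# Hypothetical rows with only the columns the grouping needs
sample = pd.DataFrame({
    "country": ["Mali", "Mali", "Mali", "Yemen"],
    "Year": [2019, 2019, 2020, 2019],
    "event_type": ["Battles", "Protests", "Battles", "Battles"],
    "notes": ["a", "b", "c", "d"],
})

nested = {}
for (country, year, event_type), group in sample.groupby(["country", "Year", "event_type"]):
    country_entry = nested.setdefault(country, {"Country": country, "Time_Series": []})
    series = country_entry["Time_Series"]
    # Start a new year entry when the year changes (groupby yields sorted keys)
    if not series or series[-1]["Year"] != int(year):
        series.append({"Year": int(year), "Events": {}})
    series[-1]["Events"][event_type] = group[["notes"]].to_dict(orient="records")

print(json.dumps(nested, indent=2))
```

Each country maps to a `Time_Series` list with one entry per year, and each year holds an `Events` dictionary keyed by event type. The single-pass version relies on `groupby` returning keys in sorted order, which is its default behavior.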

## Cell 5: Save the Nested Data to a JSON File

Finally, we'll save the nested event data to a JSON file so it can be used later by our API or for further analysis.

In [None]:
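from pathlib import Path

output_path = Path("data/curated_acled_events.json")
# Create the data/ directory if it does not exist yet
output_path.parent.mkdir(parents=True, exist_ok=True)

with output_path.open("w", encoding="utf-8") as outfile:
    json.dump(nested_events, outfile, indent=2, default=str)

print(f"Nested ACLED event data saved to {output_path}")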
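Before another service consumes the file, it can help to read it back and confirm it parses. The sketch below does a round trip on a hypothetical miniature of the nested structure, writing to a temporary directory so it does not touch the real output file.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical miniature of the nested structure
nested = {
    "Mali": {
        "Country": "Mali",
        "Time_Series": [{"Year": 2019, "Events": {"Battles": [{"notes": "example"}]}}],
    }
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "curated_acled_events.json"
    with path.open("w", encoding="utf-8") as outfile:
        json.dump(nested, outfile, indent=2, default=str)

    # Read the file back and confirm nothing was lost or altered
    with path.open(encoding="utf-8") as infile:
        reloaded = json.load(infile)

round_trip_ok = reloaded == nested
print(f"Round trip OK: {round_trip_ok}")
```

In the notebook itself, an exact equality check may fail wherever `default=str` converted values such as timestamps to strings, so comparing keys and record counts is a more forgiving test there.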