<a name='toc'></a>
#<font color=#F46767><b> 💣 Exploring Terror in Europe</b> (2000 - 2020) 🔥</font>

> _"The GTD--Global Terrorism Database-- defines a terrorist attack as the threatened or actual use of illegal force and violence by a nonstate actor to attain a political, economic, religious, or social goal through fear, coercion, or intimidation."_

<br>

## &#9889; _General_<br>

  - <font color='darkturquoise'>Scope</font>:
  Analyze terrorist incidents in Europe from January 1, 2000 through December 31, 2020 using the GTD dataset.

  - <font color='darkturquoise'>Objective</font>:
    - `Primary Obj-0`: _"How did the nature, lethality, and modus operandi of terrorist activity evolve across Europe from 2000 to 2020?"_
    - `Secondary Obj-1`: _"Which perpetrator groups changed their preferred attack types or target profiles over the two decades, and when did those shifts occur?"_
    - `Secondary Obj-2`: _"What country-to-country “spill-over” patterns exist?"_


<br>

##<img src="https://img.icons8.com/?size=100&id=yTvVS6whPDpp&format=png&color=000000" width="30" height="30"/> Main File Structure

&emsp;&emsp;
<img src="https://img.icons8.com/?size=100&id=43817&format=png&color=FCC419" width='25' height='25'/> gtd_project/<br>
&emsp;&emsp;
&#x251C;&#x2500; <img src="https://img.icons8.com/?size=100&id=43817&format=png&color=FCC419" width='25' height='25'/> source_files/<br>
&emsp;&emsp;
&#x251C;&#x2500; <img src="https://img.icons8.com/?size=100&id=43817&format=png&color=FCC419" width='25' height='25'/> notebooks/<br>
&emsp;&emsp;&emsp;&emsp;
&#x251C;&#x2500; <img src="https://img.icons8.com/?size=100&id=13441&format=png&color=000000" width='25' height='25'/> gtd_part1_etl.ipynb<br>
&emsp;&emsp;&emsp;&emsp;
&#x251C;&#x2500; <img src="https://img.icons8.com/?size=100&id=13441&format=png&color=000000" width='25' height='25'/> gtd_part2_obj0.ipynb<br>
&emsp;&emsp;&emsp;&emsp;
&#x251C;&#x2500; <img src="https://img.icons8.com/?size=100&id=13441&format=png&color=000000" width='25' height='25'/> gtd_part2_obj1.ipynb<br>
&emsp;&emsp;&emsp;&emsp;
&#x2514;&#x2500; <img src="https://img.icons8.com/?size=100&id=13441&format=png&color=000000" width='25' height='25'/> gtd_part2_obj2.ipynb<br>
&emsp;&emsp;
&#x251C;&#x2500; <img src="https://img.icons8.com/?size=100&id=13441&format=png&color=000000" width='25' height='25'/> gtd_util.py/<br>
&emsp;&emsp;
&#x251C;&#x2500; <img src="https://img.icons8.com/?size=100&id=43817&format=png&color=FCC419" width='25' height='25'/> etl_outputs/<br>
&emsp;&emsp;
&#x251C;&#x2500; <img src="https://img.icons8.com/?size=100&id=43817&format=png&color=FCC419" width='25' height='25'/> images/<br>
&emsp;&emsp;
&#x251C;&#x2500; <img src="https://img.icons8.com/?size=100&id=43817&format=png&color=FCC419" width='25' height='25'/> processed_data/<br>
&emsp;&emsp;
&#x2514;&#x2500; <img src="https://img.icons8.com/?size=100&id=VUckOuTyLQ7W&format=png&color=19B1FC" width='25' height='25'/> README.md<br>
<br>

###<img src="https://img.icons8.com/?size=100&id=XOQ8AO4LZthX&format=png&color=000000" width="30" height="30"/>References
1. START (National Consortium for the Study of Terrorism and Responses to Terrorism). (2022). Global Terrorism Database, 1970 - 2020. <a href="https://www.start.umd.edu/gtd">Dataset</a>&emsp;<a href="https://www.start.umd.edu/sites/default/files/2024-10/Codebook.pdf">Codebook</a>&emsp;<a href="https://www.start.umd.edu/gtd-terms">TermsOfUse</a>
2. Icons by [icons8](https://icons8.com)

In [None]:
# general libs
import warnings
import numpy as np
import pandas as pd
from datetime import datetime

In [None]:
# define the project directory and the ETL output file (path relative to the project directory)

PROJECT_PATH = '/content/drive/MyDrive/pf_pjs/gtd_project'
DATA_PATH = 'etl_outputs/gtd_final.pkl'

%cd $PROJECT_PATH

/content/drive/MyDrive/pf_pjs/gtd_project


In [None]:
# pandas global setting
pd.options.display.max_columns = None

# ignore warnings
warnings.filterwarnings('ignore')

In [None]:
gtd = pd.read_pickle(DATA_PATH)

In [None]:
gtd.columns

Index(['id', 'date', 'five_year', 'quarter', 'year', 'month', 'month_name',
       'day', 'region', 'country', 'alpha2', 'alpha3', 'province_state',
       'city', 'lat', 'lon', 'is_success', 'is_suicide', 'is_property_damaged',
       'terr_group', 'is_claimed', 'attack_type', 'weapon_type',
       'weapon_subtype', 'target_type', 'target_subtype', 'target_nationality',
       'fatalities_total', 'fatalities_terrorists', 'wounded_total',
       'wounded_terrorists', 'is_hostage', 'hostages_total',
       'hostage_duration', 'is_ransom'],
      dtype='object')

In [None]:
# an adjustment so the geodata align with the GTD dataframe
gtd['alpha3'] = gtd['alpha3'].cat.remove_categories('XKX')
gtd['alpha3'] = gtd['alpha3'].cat.add_categories(['KOS'])
gtd.loc[gtd.alpha3.isna(), 'alpha3'] = 'KOS'

<a name='eda'></a>
##<img src="https://img.icons8.com/?size=100&id=21144&format=png&color=000000" width="25" height="25"/> <font color=orange><b>Analysis</b></font>

### <font color='skyblue'>**Country-to-Country 'Spill-Over' Patterns** (Obj-2)</font>

> **Spill-over** refers to the spread of terrorist activity from one country to another, where activity in a source country appears to influence or trigger attacks in a neighboring country shortly afterward (0 - +3 months).

So, the **goal** here is to **identify** whether **terrorist activity in one country is followed by an uptick in neighboring countries**, suggesting a spill-over risk/spread.



<br>

Process breakdown:
1. Resample terrorist incidents to monthly counts for each country.

2. Build a country adjacency list as a dictionary.

3. Compute a lagged cross-correlation for each pair of neighboring countries.

4. Flag country pairs with a meaningful lead-lag correlation.

5. Identify the actors involved in the flagged country pairs.
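
Step 2 can be pictured as a plain dictionary that maps each country to a list of its neighbors. The sketch below is illustrative only: `neighbors_demo` and `missing_reverse_pairs` are hypothetical names, and the entries are a made-up excerpt, not the mapping this notebook loads from `gtd_util`. It also checks whether every link is listed in both directions, which matters if each pair should be tested as source and as target.

```python
# Hypothetical excerpt of a country adjacency mapping (not the project's data).
neighbors_demo = {
    'Ireland': ['United Kingdom'],
    'United Kingdom': ['Ireland'],
    'Finland': ['Sweden', 'Estonia', 'Russia'],
    'Sweden': ['Finland'],
}

def missing_reverse_pairs(adjacency):
    """Return (source, target) pairs that are absent although the reverse link exists."""
    missing = []
    for src, dest_list in adjacency.items():
        for dest in dest_list:
            if src not in adjacency.get(dest, []):
                missing.append((dest, src))
    return missing

missing_pairs = missing_reverse_pairs(neighbors_demo)
print(missing_pairs)
```

An empty list would mean the mapping is symmetric; here the excerpt deliberately leaves two reverse links out.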

#### <font color='coral'>Identifying Spillovers Between Country Pairs</font>

In [None]:
from gtd_util import obj2

In [None]:
# group by month and country
monthly_counts = (gtd
                  .groupby([pd.Grouper(key='date', freq='M'), 'country'])
                  .size()
                  .reset_index(name='attack_count'))


# pivot to get a time series per country (rows: months, columns: countries)
ts_matrix = (monthly_counts
             .pivot(index='date', columns='country', values='attack_count')
             .fillna(0) # fill NaNs with 0 (months with no attacks)
             )


neighbors = obj2.neighbors
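
Lagged correlation built on row shifts assumes that one row equals one calendar month. If a month had no recorded incident in any country, the pivoted index may skip it. A minimal sketch of a safeguard, using a made-up `incidents` frame (all names and values here are assumptions for illustration): reindex the monthly matrix to the full month range before shifting.

```python
import pandas as pd

# Hypothetical incidents: nothing recorded anywhere in February or March 2020.
incidents = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-05', '2020-01-20', '2020-04-11', '2020-05-02']),
    'country': ['A', 'B', 'A', 'B'],
})

# Monthly counts per country; only months with at least one incident appear.
counts = (incidents
          .groupby([incidents['date'].dt.to_period('M'), 'country'])
          .size()
          .unstack(fill_value=0))

# Reindex to every month in the span so each row is exactly one calendar month.
full_index = pd.period_range(counts.index.min(), counts.index.max(), freq='M')
counts_full = counts.reindex(full_index, fill_value=0)

print(len(counts), len(counts_full))
```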

In [None]:
def lagged_pearson_manual(time_series_1, time_series_2, max_lag=6):
  """
  USAGE:  computes Pearson correlation at lags from -max_lag to +max_lag.
          positive lag means time_series_1 leads time_series_2 (source -> target).
          also calculates zero bias (percentage of months where both are zero)
  INPUT:
    time_series_1/2, pd.Series: time series to correlate.
    max_lag, int: maximum lag to consider.
  OUTPUT:
    result, list: (lag, correlation, zero_bias).
  """
  results = []

  for lag in range(-max_lag, max_lag + 1):

    # pair source[t] with target[t + lag], so a positive lag means the source leads
    shifted1 = time_series_1.copy()
    shifted2 = time_series_2.shift(-lag)

    # drop NaNs caused by shifting
    df = pd.DataFrame({'s1': shifted1, 's2': shifted2}).dropna()

    # acquire enough overlap for meaningful correlation
    if len(df) < 10:
      results.append((lag, np.nan, np.nan))
      continue

    # compute the correlation coefficient between the country pair
    corr = df['s1'].corr(df['s2'])

    # computes zero bias
    both_zero = ((df['s1'] == 0) & (df['s2'] == 0)).sum()
    zero_bias = both_zero / len(df)

    results.append((lag, corr, zero_bias))

  return results
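
The sign convention of a lag is easy to get backwards with `Series.shift`, so it helps to check it on data where the answer is known. A minimal sketch on synthetic series (the helper `corr_at_lag` and the Poisson data are assumptions for illustration, not part of the project): the target repeats the source two months later, so the best lag should come out as +2 when a positive lag is defined as "source leads".

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
months = pd.period_range('2000-01', periods=120, freq='M')

# Synthetic source series, and a target that repeats it two months later.
source = pd.Series(rng.poisson(3, size=120), index=months, dtype=float)
target = source.shift(2)

def corr_at_lag(src, tgt, lag):
    """Correlate src[t - lag] with tgt[t]; a positive lag means src leads."""
    pair = pd.DataFrame({'src': src.shift(lag), 'tgt': tgt}).dropna()
    return pair['src'].corr(pair['tgt'])

corr_by_lag = {lag: corr_at_lag(source, target, lag) for lag in range(-3, 4)}
best_lag = max(corr_by_lag, key=lambda lag: corr_by_lag[lag])
print(best_lag)
```

Shifting the source forward by `lag` is equivalent to shifting the target by `-lag`; either form pairs an earlier source month with a later target month.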

In [None]:
def assess_confidence_level(corr, lag, zero_bias):
  '''
  USAGE: assign a confidence level to the spillover signal.

  INPUT:
    corr, float: correlation coef of country pair
    lag, int: lag between country pair
    zero_bias, float: percentage of months where both countries had no attacks

  OUTPUT:
    returns one of the confidence levels: High, Medium, Low, Invalid
  '''
  if pd.isna(corr) or lag < 0 or lag > 3:
      return 'Invalid'
  if abs(corr) >= 0.4 and zero_bias <= 0.6:
      return 'High'
  if abs(corr) >= 0.4 and zero_bias <= 0.8:
      return 'Medium'
  return 'Low'

In [None]:
results = []

for src, dest_list in neighbors.items():

  # skip if no data for source
  if src not in ts_matrix.columns:
      continue

  for dest in dest_list:
    # skip if no data for target
    if dest not in ts_matrix.columns:
        continue

    # get the monthly time series
    s1 = ts_matrix[src]
    s2 = ts_matrix[dest]

    # implement the lagged correlation
    corr_data = lagged_pearson_manual(s1, s2)

    # Find lag with max absolute correlation
    best_lag, best_corr, best_zero_bias = max(corr_data,
                                              key=lambda x: abs(x[1]) if pd.notna(x[1]) else -1)


    confidence_level = assess_confidence_level(best_corr, best_lag, best_zero_bias)

    # store the result as a dictionary
    results.append({'source': src,
                    'target': dest,
                    'lag': best_lag,
                    'max_corr': round(best_corr, 3),
                    'zero_bias': round(best_zero_bias, 3),
                    'confidence': confidence_level
                    })

In [None]:
spill_results = pd.DataFrame(results)

# probable spillovers
spillover_data = (spill_results[spill_results.confidence != 'Invalid']
                .sort_values(by='max_corr', ascending=False)
                .reset_index(drop=True)
                )

spillover_data.to_csv('processed_data/valid_spillover_results.csv', index=False)

spillover_data

| | source | target | lag | max_corr | zero_bias | confidence |
|---|---|---|---|---|---|---|
| 0 | Finland | Sweden | 2 | 0.824 | 0.788 | Medium |
| 1 | Hungary | Romania | 0 | 0.632 | 0.972 | Low |
| 2 | Romania | Hungary | 0 | 0.632 | 0.972 | Low |
| 3 | Finland | Estonia | 3 | 0.476 | 0.928 | Low |
| 4 | Ireland | United Kingdom | 0 | 0.462 | 0.171 | High |
| 5 | United Kingdom | Ireland | 0 | 0.462 | 0.171 | High |
| 6 | Serbia | Croatia | 0 | 0.327 | 0.837 | Low |
| 7 | Croatia | Serbia | 0 | 0.327 | 0.837 | Low |
| 8 | Russia | Belarus | 3 | 0.317 | 0.116 | Low |
| 9 | North Macedonia | Kosovo | 0 | 0.313 | 0.623 | Low |


In [None]:
# filtering the high-confidence country pairs for spillover
country_spillover = spillover_data[spillover_data.confidence == 'High']

country_spillover

| | source | target | lag | max_corr | zero_bias | confidence |
|---|---|---|---|---|---|---|
| 4 | Ireland | United Kingdom | 0 | 0.462 | 0.171 | High |
| 5 | United Kingdom | Ireland | 0 | 0.462 | 0.171 | High |


<font color='orange'><b>Observations</b></font>

- Out of 41 European countries, 52 neighbor pairs showed valid lag alignment between monthly terrorist activity (lags 0 to +3).

- Only one pair, *Ireland ↔ United Kingdom*, reached the **High-Confidence threshold**:

  - Correlation ≥ 0.4

  - Lag between 0 and +3 months

  - Zero-bias (co-inactivity) ≤ 0.6

- Most other country pairs had:

  - Very high zero-bias (often > 0.8)

  - Weak correlations (typically < 0.3)

  - Invalid lag alignment (outside 0&ndash;3 months)

- One notable Medium-confidence pair, *Finland ➡ Sweden*, had a high correlation (0.824) but fell short of High confidence because its zero-bias (0.788) exceeds the 0.6 limit.
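
The thresholds listed above can be replayed on a few rows of the results table. The sketch below restates the rules in a standalone helper (`classify` is a hypothetical name used only here), with the input values copied from the table.

```python
import math

def classify(corr, lag, zero_bias):
    """Restate the confidence rules: lag window first, then correlation and zero-bias."""
    if math.isnan(corr) or lag < 0 or lag > 3:
        return 'Invalid'
    if abs(corr) >= 0.4 and zero_bias <= 0.6:
        return 'High'
    if abs(corr) >= 0.4 and zero_bias <= 0.8:
        return 'Medium'
    return 'Low'

# (source, target, lag, max_corr, zero_bias) copied from the results table
rows = [
    ('Finland', 'Sweden', 2, 0.824, 0.788),
    ('Hungary', 'Romania', 0, 0.632, 0.972),
    ('Ireland', 'United Kingdom', 0, 0.462, 0.171),
    ('Russia', 'Belarus', 3, 0.317, 0.116),
]

labels = {f'{src} -> {tgt}': classify(corr, lag, zb) for src, tgt, lag, corr, zb in rows}
for pair, label in labels.items():
    print(pair, label)
```

Hungary and Romania clear the correlation cutoff but land in Low because almost every month is quiet in both countries; Russia and Belarus have little shared inactivity but too weak a correlation.
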
<br>

<font color='orange'><b>Insights</b></font>

- **Spill-over in terrorist activity is rare and localized** in Europe. Most countries do not influence or align with neighbors in a measurable, lagged way.

- The high-confidence case of Ireland ↔ UK is an **exception**, likely due to historical, cultural, and operational ties between dissident republican networks.

- High zero-bias in most country pairs suggests **correlation is often driven by shared periods of inactivity**, not coordinated escalation.

- **Cross-border diffusion**, if it occurs, is likely **ideological or covert**, not consistently detectable through monthly incident counts alone.
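
To see why shared inactivity can inflate a correlation, consider a minimal sketch with two synthetic countries (all values are made up for illustration) that are quiet in almost every month and happen to share one active month.

```python
import numpy as np
import pandas as pd

months = pd.period_range('2000-01', periods=120, freq='M')

# Two synthetic, almost always quiet countries that share a single active month.
country_a = pd.Series(np.zeros(120), index=months)
country_b = pd.Series(np.zeros(120), index=months)
country_a.iloc[50] = 3
country_b.iloc[50] = 2
country_b.iloc[90] = 1   # one unrelated incident

corr = country_a.corr(country_b)
zero_bias = ((country_a == 0) & (country_b == 0)).mean()
print(round(corr, 3), round(zero_bias, 3))
```

A single coincident month is enough to push the Pearson coefficient well above the 0.4 cutoff, while the zero-bias stays close to 1. This is the pattern the zero-bias filter is meant to screen out.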

#### <font color='coral'>Identifying Actors Involved in High-Confidence Spillover</font>

In [None]:
def get_peak_months(series, threshold=0.9, min_count=1):
  """
  USAGE: Returns timestamps where the series meets or exceeds the quantile threshold
          (calculated only on non-zero values).

  INPUT:
    series, pd.Series: monthly attack counts indexed by timestamp.
    threshold, float: quantile to define peaks (default 0.90 = top 10%)
    min_count, int: minimum number of attacks to be considered a peak (default 1)

  OUTPUT:
    pandas.DatetimeIndex: timestamps where the series meets or exceeds the threshold.
  """
  # Exclude zeros for threshold calculation
  non_zero_series = series[series > 0]

  if len(non_zero_series) == 0:
      return pd.DatetimeIndex([])  # no peaks at all

  limit = non_zero_series.quantile(threshold)

  # Ensure we also apply a minimum count cutoff
  limit = max(limit, min_count)

  return series[series >= limit].index
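
Why compute the quantile on non-zero months only? A minimal sketch on a sparse synthetic series (values are made up for illustration): with zeros included, the 0.9 quantile collapses to 0 and every month would count as a peak.

```python
import pandas as pd

months = pd.date_range('2000-01-01', periods=100, freq='MS')

# Sparse synthetic series: 95 quiet months and five active ones.
counts = pd.Series(0, index=months)
counts.iloc[[10, 30, 50, 70, 90]] = [1, 2, 3, 4, 10]

# Quantile over all months: zeros dominate, so the cutoff collapses to 0.
limit_all = counts.quantile(0.9)

# Quantile over active months only: the cutoff sits among real attack counts.
limit_active = counts[counts > 0].quantile(0.9)

peaks = counts[counts >= limit_active].index
print(limit_all, limit_active, len(peaks))
```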

In [None]:
group_alignment_results = []

df_obj2_2 = gtd.copy()
df_obj2_2['month_period'] = df_obj2_2['date'].dt.to_period('M')

# convert to plain strings so unused categories do not show up in value_counts()
df_obj2_2['terr_group'] = df_obj2_2['terr_group'].astype(str)

for _, row in country_spillover.iterrows():
  src = row['source']
  tgt = row['target']
  lag = row['lag']

  src_series = ts_matrix[src]
  tgt_series = ts_matrix[tgt]

  # 1. Get peak months in source
  peak_months_src = get_peak_months(src_series)


  # 2. Shift peak months to target based on lag
  aligned_months_tgt = [month + pd.DateOffset(months=lag) for month in peak_months_src]


  # 3. Convert both to Period[M] format to align with df_obj2_2['month_period']
  peak_months_src_periods = pd.to_datetime(peak_months_src).to_period('M')
  aligned_months_tgt_periods = pd.to_datetime(aligned_months_tgt).to_period('M')

  # 4. Get group(s) active in those months
  src_groups = (
      df_obj2_2[df_obj2_2['month_period'].isin(peak_months_src_periods) & (df_obj2_2['country'] == src)]['terr_group']
      .value_counts()
      .head(5)
      .index.tolist()
  )

  tgt_groups = (
      df_obj2_2[df_obj2_2['month_period'].isin(aligned_months_tgt_periods) & (df_obj2_2['country'] == tgt)]['terr_group']
      .value_counts()
      .head(5)
      .index.tolist()
  )

  group_alignment_results.append({
      'source': src,
      'target': tgt,
      'lag': lag,
      'source_peak_months': [period.strftime('%Y-%m') for period in peak_months_src_periods],
      'target_aligned_months': [period.strftime('%Y-%m') for period in aligned_months_tgt_periods],
      'source_groups': src_groups,
      'target_groups': tgt_groups
  })

df_group_aligned = pd.DataFrame(group_alignment_results)
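
The month alignment in steps 2 and 3 can also be done directly on monthly periods, since adding an integer to a `PeriodIndex` moves it by that many periods. A minimal sketch with made-up peak months and a made-up lag, offered as an alternative under those assumptions rather than as the notebook's method:

```python
import pandas as pd

# Hypothetical month-end peak timestamps and lag (illustrative values only).
peak_months = pd.DatetimeIndex(['2012-07-31', '2012-12-31'])
lag = 2

# Period arithmetic: shift each monthly period by `lag` months.
src_periods = peak_months.to_period('M')
tgt_periods = src_periods + lag

# Same result via timestamp offsets followed by conversion to periods.
via_offset = pd.DatetimeIndex(
    [month + pd.DateOffset(months=lag) for month in peak_months]
).to_period('M')

print([str(period) for period in tgt_periods])
```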

In [None]:
df_group_aligned

| | source | target | lag | source_peak_months | target_aligned_months | source_groups | target_groups |
|---|---|---|---|---|---|---|---|
| 0 | Ireland | United Kingdom | 0 | [2008-08, 2012-07, 2012-08, 2012-10, 2013-01, ... | [2008-08, 2012-07, 2012-08, 2012-10, 2013-01, ... | [UKN, Conspiracy theory extremists, Dissident ... | [UKN, Conspiracy theory extremists, Dissident ... |
| 1 | United Kingdom | Ireland | 0 | [2000-09, 2001-01, 2001-02, 2001-08, 2005-07, ... | [2000-09, 2001-01, 2001-02, 2001-08, 2005-07, ... | [UKN, Dissident Republicans, The New Irish Rep... | [UKN, Conspiracy theory extremists, The New Ir... |


In [None]:
(set(df_group_aligned.source_groups[0]) | set(df_group_aligned.target_groups[0])) & \
(set(df_group_aligned.source_groups[1]) | set(df_group_aligned.target_groups[1]))

{'Conspiracy theory extremists',
 'Dissident Republicans',
 'The New Irish Republican Army',
 'UKN'}

<font color='orange'><b>Observations</b></font>

- For the bidirectional pair Ireland ↔ United Kingdom, the **same actor groups** were active in the same months (lag = 0):

  - Dissident Republicans

  - The New Irish Republican Army (NIRA)

  - Conspiracy theory extremists

  - UKN (Unknown group names)

- No conflicting or unrelated groups appeared during these months.

- Both countries show **symmetrical presence** of nationalist or separatist factions.

<br>

<font color='orange'><b>Insights</b></font>

- The actors involved provide **clear operational alignment**, strengthening the case for spillover.

- Presence of the same extremist groups across borders suggests **cross-border activation, shared planning, or ideological propagation**.

- The appearance of UKN reflects partial attribution gaps but doesn’t negate the visibility of key groups like NIRA and Dissident Republicans.

- These actor patterns affirm that the **Ireland-United Kingdom pair is not just statistically correlated**, but likely driven by **historical extremist continuity**.

<font color='lightgreen'><b>Wrapping-up Obj-2</b></font>

*Despite the broad testing across 41 European countries, **no widespread operational spill-over patterns** were found. However, in the case of **Ireland and the United Kingdom**, results show:*
- *Strong correlation of terrorist activity on a **monthly level**.*
- *Aligned activity by **the same groups** or **ideologically similar actors**.*
- *Historical continuity in nationalist extremist movements across borders.*

*This suggests that **terrorism spill-over in Europe is not the norm**, but **can exist under specific geopolitical, historical, and ideological conditions**, particularly in regions with long-standing cross-border disputes and group networks.*