## Insurgency and EU law

#### Integration of datasets

In [1]:
import pandas as pd

Function to keep the desired variables that exist in a table and to retrieve its group names

In [2]:
def filter_tables_groups(table, proposed_variables, group_name):
    """Keep the proposed variables that exist in the table and list its groups.

    table: raw pandas DataFrame as loaded from file
    proposed_variables: variable names taken from the Excel sheet
    group_name: name of the column that holds the group names
    """
    # Keep the table's own column order; proposed names not in the table are dropped
    proposed = set(proposed_variables)
    columns_selected = [column for column in table.columns if column in proposed]
    table = table[columns_selected]
    groups = list(table[group_name].unique())
    return table, groups
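
A minimal usage sketch on toy data may help show what the function returns. The DataFrame `toy` and its values are invented for illustration and are not taken from any of the datasets below. The function is restated (with a list comprehension that keeps the table's column order) so that the block runs on its own.

```python
import pandas as pd

def filter_tables_groups(table, proposed_variables, group_name):
    # Restated here so the example is self-contained
    proposed = set(proposed_variables)
    columns_selected = [column for column in table.columns if column in proposed]
    table = table[columns_selected]
    groups = list(table[group_name].unique())
    return table, groups

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "gname": ["Group A", "Group A", "Group B"],
    "year": [1990, 1991, 1990],
    "ffund": [0, 1, 0],
})

# "tcode" is not a column of `toy`, so it is dropped without any warning
toy_filtered, toy_groups = filter_tables_groups(
    toy, ["gname", "ffund", "tcode"], "gname"
)

print(list(toy_filtered.columns))  # ['gname', 'ffund']
print(toy_groups)                  # ['Group A', 'Group B']
```

Note that `group_name` has to be among the proposed variables; otherwise the lookup `table[group_name]` fails after filtering.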

---
#### Reputation of Terror Groups (RTG) Dataset
Description: The dataset contains all domestic terrorist groups, as defined in Enders et al. (2011) and based on the Global Terrorism Database, with more than 5 terrorist attacks from 1980 to 2011. The data are in group name-year format. They code terrorist groups' actions that can build reputation among the groups' constituency and out-group. Researchers can find originally coded variables on building positive and negative reputation among the audience, as well as existing group-level variables.

[Link to data](http://www.efetokdemir.com/data.html)

In [3]:
rtg_table = pd.read_stata('datasets/replicationdatajpr-oldstata.dta')

In [4]:
rtg_table.head()

Unnamed: 0,year,gname,ffund,childrec,frec,rebel,parterr,terpwing,teraff,govcaus,...,nat,civcausreal,civcauseffreal,outnegrep,cleavage,reputation,last,counter,endedtype,endedtype2
0,1989,1 May Group,0,0,0,0,0,0,0,1,...,0.0,0.25,1.0,1.0,1.0,0.0,3.0,1.0,0.0,0.0
1,1991,1 May Group,0,0,0,0,0,0,0,0,...,0.0,2.333333,0.0,0.0,1.0,0.0,3.0,2.0,0.0,0.0
2,1992,1 May Group,0,0,0,0,0,0,0,0,...,0.0,0.0,0.0,0.0,1.0,0.0,3.0,3.0,1.0,1.0
3,1989,16 January Organization for the Liberation of ...,0,0,0,0,0,0,0,1,...,,22.625,0.0,0.0,,0.0,1.0,1.0,1.0,1.0
4,1983,2 April Group,0,0,0,0,0,0,0,0,...,0.0,0.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0


In [5]:
rtg_proposed_variables = ['gname',
'tcode',
'ffund',
'frec',
'parterr',
'terpwing',
'teraff',
'politics',
'media',
'pgood',
'intposrep',
'intnegrep',
'outnegrep',
'intnetrep',
'netrep',
'reputation',
'age',
'rebel',
'goal',
'broadgoal',
'type',
'terrstrong',
'statespons',
'international',
'nmbrtrr',
'left',
'rel',
'nat',
'peaksize',
'cleavage',
'endedtype',
'endedtype2',
'govcaus',
'educcaus',
'civcaus',
'nkill',
'nkillter',
'nwound',
'propertycount',
'govtargcount',
'civtargcount',
'eductargcount',
'eductargexist',
'civcausreal',
'civcausrealeff',
'ccodecow',
'logarea',
'loggdp',
'logpop',
'logmil']

In [6]:
df_rtg, groups_rtg = filter_tables_groups(rtg_table, rtg_proposed_variables, 'gname')

In [7]:
df_rtg.shape

(2641, 43)
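
The list above proposes 50 variable names, while the filtered table has 43 columns, so seven names were not found in the file and were dropped silently. A small helper such as the one below could list them. The helper name `missing_variables` and the toy inputs are assumptions made for illustration. One possible candidate, judging only from the preview above, is `civcausrealeff`, which appears in the file as `civcauseffreal`.

```python
def missing_variables(proposed_variables, existing_columns):
    """Return the proposed names that are absent from the table, in list order."""
    existing = set(existing_columns)
    return [name for name in proposed_variables if name not in existing]

# Toy inputs, chosen for illustration only
toy_columns = ["gname", "year", "civcauseffreal"]
toy_proposed = ["gname", "tcode", "civcausrealeff"]

print(missing_variables(toy_proposed, toy_columns))  # ['tcode', 'civcausrealeff']

# In this notebook the call would presumably look like:
# missing_variables(rtg_proposed_variables, rtg_table.columns)
```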

In [8]:
len(groups_rtg)

443

---
#### The Foundations of Rebel Group Emergence (FORGE) Dataset

It provides information on the origins of violent non-state actors engaged in armed conflict against their government resulting in 25+ yearly battle deaths, active between 1946 and 2011. The unit of observation in this dataset is the rebel group organization. We also include information on the dyad and conflict in which these groups are participants for ease of integration with various Uppsala Conflict Data Program (UCDP) datasets. We draw upon the population of groups included in the Non-State Actor database described in greater detail here:
    
[Link to data](http://ksgleditsch.com/eacd.html)

In [9]:
forge_table = pd.read_excel('datasets/forge_v1.0_public.xlsx')

In [10]:
forge_table.head()

Unnamed: 0,conflict_id,dyadid,NSAdyadid,actorid,gacronym,gname,ccode,cname,foundloc,foundyear,...,preorgfmr,preorgrel,preorgfor,preorgref,preorgeth,preorgoth,preorgname,merger,splinter,splinterUCDP
0,333.0,725.0,731.0,293.0,Harakat-i Inqilab-i Islami-yi Afghanistan,Movement of the Islamic Revolution/Uprising of...,700,Afghanistan,Pakistan,1978,...,0,0,0,0,0,0,Muslim Youth,0,,
1,333.0,731.0,737.0,298.0,Harakat-i Islami-yi Afghanistan,Islamic Movement,700,Afghanistan,"Qom, Iran",1978,...,0,0,0,0,0,0,,0,,
2,333.0,726.0,412.0,299.0,Hizb-i Islami-yi Afghanistan,Islamic Party of Afghanistan,700,Afghanistan,Pakistan,1976,...,0,0,0,0,0,0,Jam'iyyat-i Islami-yi Afghanistan,0,1.0,0.0
3,333.0,727.0,760.0,294.0,Hizb-i Islami-yi Afghanistan - Khalis faction,Islamic Party of Afghanistan - Khalis faction,700,Afghanistan,"Khugiani, Afghanistan",1979,...,0,0,0,0,0,0,Hizb-i Islami-yi Afghanistan,0,1.0,1.0
4,333.0,732.0,413.0,300.0,Hizb-i Wahdat,Unity Party,700,Afghanistan,"Tehran, Iran",1979,...,0,0,0,0,0,0,"Sazman-i-Nasr, Sepah-i Pasdaran/Pasdaran-i-Jih...",1,0.0,


In [11]:
forge_proposed_variables = [
    'conflict_id',
'dyadid',
'NSAdyadid',
'actorid',
'gacronym',
'gname',
'ccode',
'cname',
'foundloc',
'foundyear',
'foundmo',
'foundday',
'fightyear',
'fightmo',
'fightday',
'goalnominal',
'goalindep',
'goalauto',
'goalrights',
'goalrep',
'goalchange',
'goaldem',
'goalother',
'goalnote',
'ideology',
'ideolcom',
'ideolleft',
'ideolright',
'ideolnat',
'ideolanti',
'ideolrel',
'ideoloth',
'ideolnote',
'religious',
'religion',
'ethnic',
'ethnicity',
'preorg',
'preorgno',
'preorgreb',
'preorgpar',
'preorgmvt',
'preorgyou',
'preorglab',
'preorgrel',
'preorgmil',
'preorgfmr',
'preorggov',
'preorgfor',
'preorgref',
'preorgeth',
'preorgoth',
'preorgname',
'merger',
'splinter',
'splinterUCDP'
]

In [12]:
df_forge, groups_forge = filter_tables_groups(forge_table, forge_proposed_variables, 'gname')

In [13]:
df_forge.shape

(430, 56)

In [14]:
len(groups_forge)

415
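
The table has 430 rows but only 415 distinct values of `gname`, so some names occur on more than one row. A sketch for inspecting such rows is given below. It runs on a toy DataFrame with invented values; in the notebook the same calls would presumably be applied to `df_forge`.

```python
import pandas as pd

# Toy data, invented for illustration only
toy_forge = pd.DataFrame({
    "gname": ["Group A", "Group B", "Group A"],
    "ccode": [100, 100, 200],
})

# Rows whose group name occurs more than once (keep=False marks every occurrence)
repeated = toy_forge[toy_forge.duplicated(subset="gname", keep=False)]
print(repeated.sort_values("gname"))

# Number of rows per repeated name
name_counts = toy_forge["gname"].value_counts()
print(name_counts[name_counts > 1])
```

Whether repeated names refer to the same organization or to different groups that share a name would need to be checked against the other identifier columns.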

---
#### Big Allied and Dangerous (BAAD) Dataset Version 2.0

This dataset is an extract from the Big Allied and Dangerous (BAAD) Version 2.0 dataset used to create the results published in the article entitled "Crime, Conflict and the Legitimacy Tradeoff: Explaining Variation in Insurgents' Participation in Crime".
The Big Allied and Dangerous (BAAD) project focuses on the creation and maintenance of a comprehensive database of terrorist and insurgent organizations, collectively referred to as “violent nonstate actors” (VNSAs), and on linking that data to prominent event, insurgency, and country characteristics datasets. BAAD Version 2.0 (BAAD2) contains data on nearly 600 terrorist and insurgent organizations active 1998-2015 (with extension through 2017 planned). Organized into yearly time slices, BAAD2 records information on organizational characteristics (demographics, ideology, political activity, structure, leadership, exposure to counter-terrorism activity, social service provision, and engagement in violence) and organizational network relationships (both positive and negative).

[Link to data](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/JT6GFR)

In [15]:
baad_table = pd.read_excel('datasets/BAAD2 Insurgency Crime Dataset.xlsx')

In [16]:
baad_table.head()

Unnamed: 0,org,torg,year,torg_year,hbase,hbccode,hbiso,hb_iso_cc,left,reli,...,lead_hierarch,fddrugtk,fdextort,fdkidnap,fdrob,fdsmuggl,fdstate,ucdpbd,socsvcs,crim_degr_py
0,Abu Sayyaf Group (ASG),4,1998,4_1998,Philippines,840,PHL,608,0,1,...,1,1,1,1,1,1,0,55,0,
1,Abu Sayyaf Group (ASG),4,1999,4_1999,Philippines,840,PHL,608,0,1,...,1,1,1,1,1,1,0,0,0,1.0
2,Abu Sayyaf Group (ASG),4,2000,4_2000,Philippines,840,PHL,608,0,1,...,0,1,1,1,1,1,0,379,0,0.0
3,Abu Sayyaf Group (ASG),4,2001,4_2001,Philippines,840,PHL,608,0,1,...,0,1,1,1,1,1,0,333,0,0.0
4,Abu Sayyaf Group (ASG),4,2002,4_2002,Philippines,840,PHL,608,0,1,...,0,1,1,1,1,1,0,249,0,1.0


In [17]:
baad_proposed_variables = [
'ORG',
'TORG',
'YEAR',
'TORG_YEAR',
'HBASE',
'HBCCODE',
'HBISO',
'HB_ISO_CC',
'LEFT',
'RELI',
'ETHN',
'AGE',
'SIZE_REC',
'TERRCNTRL',
'LEAD_HIERARCH',
'FDDRUGTK',
'FDEXTORT',
'FDKIDNAP',
'FDROB',
'FDSMUGGL',
'FDSTATE',
'UCDPBD',
'SOCSVCS',
'CRIM_DEGR_PY'
]


In [18]:
baad_proposed_variables = [x.lower() for x in baad_proposed_variables]

In [19]:
df_baad, groups_baad = filter_tables_groups(baad_table, baad_proposed_variables, 'org')

In [20]:
df_baad.shape

(1386, 24)

In [21]:
len(groups_baad)

140

---
#### Intersections

In [25]:
print('Rebel groups per dataset\nFORGE: {}\nRTG: {}\nBAAD: {}'\
      .format(len(groups_forge), len(groups_rtg), len(groups_baad)))

Rebel groups per dataset
FORGE: 415
RTG: 443
BAAD: 140


In [26]:
len(set(groups_forge) & set(groups_rtg))

14

In [27]:
len(set(groups_baad) & set(groups_rtg))

69

In [28]:
len(set(groups_baad) & set(groups_forge))

17
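
The three pairwise counts above can also be computed in one pass. The sketch below uses `itertools.combinations` over a dictionary of name lists. The helper name `pairwise_overlaps` and the toy lists are assumptions made for illustration; in the notebook the dictionary would presumably hold `groups_forge`, `groups_rtg`, and `groups_baad`.

```python
from itertools import combinations

def pairwise_overlaps(named_groups):
    """Map each pair of dataset labels to the number of shared group names."""
    return {
        (first, second): len(set(named_groups[first]) & set(named_groups[second]))
        for first, second in combinations(named_groups, 2)
    }

# Toy lists, invented for illustration only
toy_named = {
    "FORGE": ["Group A", "Group B"],
    "RTG": ["Group B", "Group C"],
    "BAAD": ["Group A", "Group B", "Group C"],
}

for pair, n_shared in pairwise_overlaps(toy_named).items():
    print(pair, n_shared)
# ('FORGE', 'RTG') 1
# ('FORGE', 'BAAD') 2
# ('RTG', 'BAAD') 2
```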

In [29]:
set(groups_baad) & set(groups_forge) & set(groups_rtg)

{'Al-Shabaab', 'Karen National Union', 'Oromo Liberation Front'}

#### Only 3 exact matches among the 3 datasets
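
Exact string comparison misses names that differ only in case, punctuation, or an appended acronym. One possible next step, sketched below, is to normalize the names and then look for close matches with the standard library's `difflib`. This is an illustrative approach, not the method used in this notebook: the helper names, the `cutoff` value of 0.85, and the toy spellings in `toy_b` are all assumptions, and any candidate pair would still need manual review.

```python
import difflib
import re

def normalize(name):
    """Lowercase, drop a trailing parenthesised acronym, collapse punctuation and spaces."""
    name = re.sub(r"\s*\([^)]*\)\s*$", "", name.lower())
    # Note: this also replaces non-ASCII letters with spaces
    name = re.sub(r"[^a-z0-9]+", " ", name)
    return name.strip()

def candidate_matches(names_a, names_b, cutoff=0.85):
    """Map each name in names_a to its closest name in names_b, if any is close enough."""
    lookup = {normalize(name): name for name in names_b}
    matches = {}
    for name in names_a:
        close = difflib.get_close_matches(
            normalize(name), list(lookup), n=1, cutoff=cutoff
        )
        if close:
            matches[name] = lookup[close[0]]
    return matches

# Toy lists: the spellings in toy_b are invented for illustration
toy_a = ["Abu Sayyaf Group (ASG)", "Al-Shabaab"]
toy_b = ["abu sayyaf group", "Al Shabaab", "Unity Party"]

print(candidate_matches(toy_a, toy_b))
```

A lower `cutoff` returns more candidates at the cost of more false matches, so the value is best tuned by inspecting a sample of the results.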

In [31]:
pd.Series(groups_baad).to_csv('groups_baad.csv')
pd.Series(groups_forge).to_csv('groups_forge.csv')
pd.Series(groups_rtg).to_csv('groups_rtg.csv')