## RAND Database of Worldwide Terrorism Incidents

The RAND Database of Worldwide Terrorism Incidents (RDWTI) is a compilation of data from 1968 through 2009.

This legacy RAND project developed and maintained a database of terrorism incidents stretching back to 1968, which provides comprehensive information on international and domestic terrorism. Over the years, many public and private sponsors have contributed to the maintenance of the RDWTI and its predecessors, the RAND Terrorism Chronology and the RAND-MIPT Terrorism Incident Database.

The data can be found here: https://www.rand.org/nsrd/projects/terrorism-incidents.html

With over 40,000 incidents of terrorism coded and detailed, the quality and completeness of the RDWTI were remarkable for its time. RAND staff conducted extensive research on candidate terrorist attacks, drawing on staff with regional expertise, relevant language skills, and in-country field work experience.

1. Please navigate to this website and download the data.
2. Save the data as a CSV file.
3. Read the CSV file into a pandas DataFrame.

In [9]:
#Your code here:

#import pandas and set it up
!pip install pandas
import pandas as pd



In [10]:
#Read the CSV into a pandas DataFrame; ISO-8859-1 is used because utf-8 failed to decode the file
df = pd.read_csv("RAND.csv", encoding="ISO-8859-1")
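
The cell above hard-codes the encoding after utf-8 failed. A minimal sketch of the same idea as a reusable fallback follows. The helper name `read_csv_with_fallback` and the order of encodings are assumptions for illustration, not part of the assignment.

```python
import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "ISO-8859-1")):
    """Try each encoding in order and return the first DataFrame that decodes."""
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding)
        except UnicodeDecodeError as error:
            # Remember the failure and move on to the next candidate encoding
            last_error = error
    raise last_error


# Usage (assumes RAND.csv is in the working directory):
# df = read_csv_with_fallback("RAND.csv")
```

ISO-8859-1 maps every byte to some character, so it never raises a decode error, but it can silently produce the wrong characters. If names later show stray symbols such as "Õ", the file may use a different legacy encoding (possibly `mac_roman`; that is a guess, not something the source confirms).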

## Pandas groupby and get_group
Please review pandas groupby and get_group methods here: https://www.geeksforgeeks.org/python-pandas-dataframe-groupby/

These methods will be very helpful in completing your homework!
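
As a quick illustration of the two methods, here is a small sketch on made-up data. The `toy` DataFrame and its values are invented for the example; only the column names mirror the RAND file.

```python
import pandas as pd

# Invented rows; the column names follow the RAND CSV
toy = pd.DataFrame({
    "Country": ["Chile", "Peru", "Chile", "Peru", "Chile"],
    "Weapon": ["Explosives", "Firearms", "Explosives", "Explosives", "Firearms"],
    "Fatalities": [0, 2, 1, 0, 3],
})

by_country = toy.groupby("Country")

# Number of rows (incidents) per country
incident_counts = by_country.size()

# Sum of a numeric column per country
fatalities = by_country["Fatalities"].sum()

# All rows belonging to one group, returned as a DataFrame
chile_rows = by_country.get_group("Chile")

print(incident_counts)
print(fatalities)
print(chile_rows)
```

`groupby` does not compute anything until an aggregation such as `size()` or `sum()` is called, while `get_group` simply returns the rows for one key.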


Part 1: Please use the data, and provide evidence to answer the questions below. Please use visualizations/plots where appropriate to convey your evidence more completely.

1. What are the top 5 countries where terrorism incidents occurred?

In [11]:
#Take a look at the head of the data
df.head()

Unnamed: 0,Date,City,Country,Perpetrator,Weapon,Injuries,Fatalities,Description
0,9-Feb-68,Buenos Aires,Argentina,Unknown,Firearms,0,0,ARGENTINA. The second floor of the U.S. embas...
1,12-Feb-68,Santo Domingo,Dominican Republic,Unknown,Explosives,0,0,DOMINICAN REPUBLIC. A homemade bomb was found...
2,13-Feb-68,Montevideo,Uruguay,Unknown,Fire or Firebomb,0,0,URUGUAY. A Molotov cocktail was thrown outsid...
3,20-Feb-68,Santiago,Chile,Unknown,Explosives,0,0,CHILE. An explosion from a single stick of dy...
4,21-Feb-68,"Washington, D.C.",United States,Unknown,Explosives,0,0,UNITED STATES. The Soviet embassy was bombed ...
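
The `Date` column above uses two-digit years (for example `9-Feb-68`). Python's `%y` directive maps the values 0 to 68 onto 2000 to 2068, so a naive parse could place 1968 incidents in 2068, which matters for date-ordered questions such as finding the first incident of some kind. A minimal sketch of one workaround, assuming every date follows the `day-Mon-yy` pattern and the data spans 1968 through 2009 as stated above:

```python
import pandas as pd

# Invented sample in the same format as the Date column
sample = pd.DataFrame({"Date": ["9-Feb-68", "21-Feb-68", "3-Mar-99", "15-Jul-05"]})

# Split each date into day, month abbreviation, and two-digit year
parts = sample["Date"].str.extract(r"^(\d{1,2})-([A-Za-z]{3})-(\d{2})$")
two_digit_year = parts[2].astype(int)

# Assumption: 68-99 means 1968-1999 and 00-67 means 2000-2067
full_year = two_digit_year + 1900 + 100 * (two_digit_year < 68)

# Rebuild the string with a four-digit year so the parse is unambiguous
sample["ParsedDate"] = pd.to_datetime(
    parts[0] + "-" + parts[1] + "-" + full_year.astype(str),
    format="%d-%b-%Y",
)
print(sample)
```

Rows that do not match the pattern would fail the integer conversion, so the real file may need extra handling for missing or partial dates.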


In [96]:
#If "terror" was mentioned in an incident's description, I counted that incident as involving terrorism, then totaled those incidents per country

#Group by country to get a full list of countries
country_group = df.groupby("Country")

#Pull unique countries into a list
unique_countries = df["Country"].unique()

#Create a dictionary with 0 for each country
terror_dictionary = {}
for country in unique_countries:
    terror_dictionary[country] = 0

#Loop over each country
for each in unique_countries:
    each_country = country_group.get_group(each)
    
    #Get the row labels for that country's incidents
    index_list = each_country.index.tolist()

    #Count the rows whose description mentions "terror" (case-sensitive match)
    for each_row in index_list:
        description = each_country.loc[each_row,"Description"]
        if "terror" in str(description):
            terror_dictionary[each] += 1

#Print results
print(terror_dictionary)

#Find the top 5
print(" ")
print(" ")
top_items = sorted(terror_dictionary.items(), key=lambda x: x[1], reverse=True)[:5]
new_dict = dict(top_items)

print("These are the top 5:")
print(new_dict)

{'Argentina': 16, 'Dominican Republic': 1, 'Uruguay': 2, 'Chile': 2, 'United States': 49, 'Israel': 88, 'Ecuador': 1, 'Colombia': 22, 'Guatemala': 7, 'Austria': 13, 'France': 64, 'Brazil': 2, 'Spain': 32, 'Canada': 4, 'Belgium': 12, 'Bahamas': 0, 'Denmark': 5, 'Norway': 1, 'Italy': 24, 'Turkey': 58, 'Bolivia': 5, 'Federal Republic of Germany': 39, 'Greece': 48, 'Switzerland': 8, 'Australia': 2, 'Jordan': 9, 'Angola': 0, 'Pakistan': 41, 'India': 30, 'United Kingdom': 26, 'Sudan': 1, 'Philippines': 32, 'Yugoslavia': 1, 'Japan': 4, 'Egypt': 29, 'Netherlands': 14, 'Ethiopia': 2, 'Peru': 33, 'Lebanon': 34, 'New Zealand': 0, 'West Bank/Gaza': 38, 'Jamaica': 0, 'Paraguay': 1, 'Venezuela': 9, 'South Africa': 1, 'Iran': 14, 'Thailand': 56, 'Costa Rica': 6, 'China, Republic of (Taiwan)': 0, 'Portugal': 6, 'Puerto Rico': 4, 'Cambodia': 5, 'Malaysia': 2, 'Korea, South': 0, 'Sweden': 8, 'USSR': 2, 'Sri Lanka (Ceylon)': 1, 'Mozambique': 0, 'Honduras': 2, 'El Salvador': 7, 'Yemen': 4, 'Ireland': 4, '
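
The nested loops above can also be written with vectorized string matching. The sketch below is one possible equivalent on invented rows, not the notebook's own method; `toy` and its descriptions are made up for illustration.

```python
import pandas as pd

# Invented rows; one description is missing on purpose
toy = pd.DataFrame({
    "Country": ["Israel", "Israel", "France", "Iraq", "Iraq", "Iraq"],
    "Description": [
        "ISRAEL. A terrorist threw a grenade.",
        "ISRAEL. A bomb exploded near a bus stop.",
        "FRANCE. Terrorists bombed an office.",
        "IRAQ. Suspected terrorists attacked a convoy.",
        "IRAQ. A terror cell claimed responsibility.",
        None,
    ],
})

# True where the description contains "terror"; missing values count as False
mentions_terror = toy["Description"].str.contains("terror", na=False)

# Sum the True values per country and sort from most to fewest
terror_counts = mentions_terror.groupby(toy["Country"]).sum().sort_values(ascending=False)
top_five = terror_counts.head(5)
print(top_five)

# Case-insensitive variant, which also catches "Terrorists"
mentions_any_case = toy["Description"].str.contains("terror", case=False, na=False)
any_case_counts = mentions_any_case.groupby(toy["Country"]).sum()
```

Like the `in` test in the cell above, the default match is case-sensitive, so a sentence starting with "Terrorists" is missed unless `case=False` is passed.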


2. In those top 5 countries, what were the common perpetrators, weapons and objectives?

* Manually create a list with the 5 countries in it
* Take the code from above to find the perpetrators and weapons

In [106]:
#Manually make a list of the top 5 countries with the most terrorism incidents
top_five = ["Iraq","Israel","France","Turkey","Thailand"]

#Group by country to get a full list of countries
country_group_1 = df.groupby("Country")

#Find the different perpetrators (put in a list)
perpetrators = df["Perpetrator"].unique()
weapons = df["Weapon"].unique()

#Create a dictionary with 0 for each type of perpetrator and weapon
#(these are created once, before the loop, so counts carry over from one country to the next)
perpetrator_dict = {}
for perpetrator in perpetrators:
    perpetrator_dict[perpetrator] = 0

weapons_dict = {}
for weapon in weapons:
    weapons_dict[weapon] = 0

#Get a specific country
for each in top_five:
    each_country = country_group_1.get_group(each)
    
    #Figure out the number of rows for that country
    index_list = each_country.index.tolist()
    
    #Tally the perpetrator and weapon recorded for each incident
    for each_row in index_list:
        perp_input = each_country.loc[each_row,"Perpetrator"]
        perpetrator_dict[perp_input] += 1
        
        weapon_input = each_country.loc[each_row,"Weapon"]
        weapons_dict[weapon_input] += 1
        
    #Filter out just the top 6 for perpetrators and weapons
    top_perps = sorted(perpetrator_dict.items(), key=lambda x: x[1], reverse=True)[:6]
    top_weapons = sorted(weapons_dict.items(), key=lambda x: x[1], reverse=True)[:6]
    top_perps_dict = dict(top_perps)
    top_weapons_dict = dict(top_weapons)
    #Print results
    print(f"These are the top 6 Perpetrators for {each}:")
    print(top_perps_dict)
    print("  ")
    print(f"These are the top 6 Weapons for {each}:")
    print(top_weapons_dict)
    print("--------------------------------")


These are the top 6 Perpetrators for Iraq:
{'Unknown': 9765, 'Tanzim QaÕidat al-Jihad fi Bilad al-Rafidayn': 208, 'Al Qaeda': 165, 'Islamic State of Iraq': 135, 'Mujahideen Council': 84, 'Ansar al-Sunnah': 72}
  
These are the top 6 Weapons for Iraq:
{'Explosives': 5070, 'Firearms': 4723, 'Unknown': 475, 'Remote-detonated explosive': 337, 'Fire or Firebomb': 65, 'Knives & sharp objects': 52}
--------------------------------
These are the top 6 Perpetrators for Israel:
{'Unknown': 10299, 'Tanzim QaÕidat al-Jihad fi Bilad al-Rafidayn': 209, 'Other': 187, 'Palestine Islamic Jihad (PIJ)': 169, 'Al Qaeda': 165, 'Islamic State of Iraq': 135}
  
These are the top 6 Weapons for Israel:
{'Explosives': 6432, 'Firearms': 4841, 'Unknown': 532, 'Remote-detonated explosive': 358, 'Knives & sharp objects': 120, 'Fire or Firebomb': 115}
--------------------------------
These are the top 6 Perpetrators for France:
{'Unknown': 10935, 'Other': 325, 'Tanzim QaÕidat al-Jihad fi Bilad al-Rafidayn': 209, 'Pa
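
The counts printed above grow from one country to the next (the 'Unknown' perpetrator total rises from 9765 for Iraq to 10299 for Israel and 10935 for France, and Iraq-specific groups appear under Israel). This suggests the tallies are cumulative rather than per country. A minimal sketch that counts each country independently with `value_counts`, shown on invented rows:

```python
import pandas as pd

# Invented rows; the column names follow the RAND CSV
toy = pd.DataFrame({
    "Country": ["Iraq", "Iraq", "Iraq", "Israel", "Israel", "France"],
    "Perpetrator": ["Unknown", "Unknown", "Al Qaeda", "Unknown", "Other", "Other"],
    "Weapon": ["Explosives", "Firearms", "Explosives", "Firearms", "Firearms", "Explosives"],
})

countries = ["Iraq", "Israel", "France"]
top_n = 6
per_country = {}

for country in countries:
    # Select only this country's rows, so nothing carries over between countries
    rows = toy[toy["Country"] == country]
    per_country[country] = {
        "perpetrators": rows["Perpetrator"].value_counts().head(top_n).to_dict(),
        "weapons": rows["Weapon"].value_counts().head(top_n).to_dict(),
    }
    print(f"Top perpetrators for {country}: {per_country[country]['perpetrators']}")
    print(f"Top weapons for {country}: {per_country[country]['weapons']}")
```

Because each tally is built from that country's rows alone, a group that never appears in a country cannot show up in its results.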

3. What were the top 10 most deadly attacks?


4. How often was "kidnapping" or some derivation of the word mentioned in all of the incident reports?

5. When kidnapping was mentioned, how often did the incident result in fatalities?

6. When kidnapping was mentioned, how often was "ransom" mentioned?

7. In all of the incidents, how often were "students" mentioned as perpetrators?


8. What was the first incident where a "suicide bomber" was mentioned?

9. How often were "priests" or "clergy" mentioned?

10. Name all the incidents where a woman or women were identified as terrorists. Not the victims, the terrorists.
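
Several of the questions above reduce to sorting on a numeric column or combining boolean masks built from the `Description` text. The sketch below shows that pattern on invented rows. It assumes `Fatalities` is already numeric (the real column may need `pd.to_numeric(..., errors="coerce")` first), and matching the stem "kidnap" is only one way to capture derivations of the word.

```python
import pandas as pd

# Invented rows; the column names follow the RAND CSV
toy = pd.DataFrame({
    "Fatalities": [0, 3, 0, 12, 1],
    "Description": [
        "Gunmen kidnapped a businessman and demanded a ransom.",
        "A kidnapping attempt left three guards dead.",
        "A bomb was found and defused.",
        "A car bomb exploded in a market.",
        "Kidnappers released the hostage unharmed; one attacker died.",
    ],
})

# Most deadly incidents: the n rows with the largest Fatalities values
deadliest = toy.nlargest(2, "Fatalities")

# Mask for any derivation of "kidnap" (kidnapped, kidnapping, kidnappers)
kidnap = toy["Description"].str.contains("kidnap", case=False, na=False)
ransom = toy["Description"].str.contains("ransom", case=False, na=False)

kidnap_count = int(kidnap.sum())
kidnap_with_fatalities = int((kidnap & (toy["Fatalities"] > 0)).sum())
kidnap_with_ransom = int((kidnap & ransom).sum())

print(deadliest)
print(kidnap_count, kidnap_with_fatalities, kidnap_with_ransom)
```

Dividing `kidnap_with_fatalities` or `kidnap_with_ransom` by `kidnap_count` turns the raw counts into the "how often" proportions the questions ask about.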

## Part 2
In the incident reports, investigate motivations of terrorism and categorize them into economic, political, religious or some combination of those categories. How would you do it? Is it a dictionary of economic words? The mention of money? The mention of religion? The word "liberation"? What's the signal here? Could you use vectors? You could use research or domain knowledge into perpetrators and their stated goals. You will have to resolve or define new categories. For example, is the Irish Republican Army religious or political, or RP as a new category? Is Hamas religious, political or both? Find the signals and create a new column that categorizes the incidents.

There is no "right" answer here, just a way for you as an emerging data scientist to think about how to parse data and categorize it. Just declare categories and justify your logic! I am as interested in your reasoning as I am in your code!
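
One of the options raised above, a dictionary of signal words per category, could be sketched as follows. The keyword lists, the `categorize` helper, the `Motivation` column name, and the "unclassified" label are all hypothetical choices made for illustration; they are a starting point to argue about, not a recommended answer.

```python
import pandas as pd

# Hypothetical keyword lists; choosing and justifying them is the real exercise
CATEGORY_KEYWORDS = {
    "economic": ["ransom", "money", "bank", "extort"],
    "political": ["government", "election", "liberation", "embassy"],
    "religious": ["church", "mosque", "synagogue", "religious"],
}


def categorize(description):
    """Return every category whose keywords appear in the text, joined by '+'."""
    text = str(description).lower()
    matched = [
        category
        for category, words in CATEGORY_KEYWORDS.items()
        if any(word in text for word in words)
    ]
    return "+".join(matched) if matched else "unclassified"


# Invented descriptions for demonstration
toy = pd.DataFrame({"Description": [
    "Militants bombed the embassy and demanded the liberation of prisoners.",
    "Gunmen robbed a bank to fund operations.",
    "A mosque was attacked after the election results were announced.",
    "An explosion occurred near a bus stop.",
]})

# New column holding the assigned category or combination of categories
toy["Motivation"] = toy["Description"].apply(categorize)
print(toy)
```

Plain substring matching is crude: the keyword "bank", for instance, would also match "West Bank", so word-boundary matching or perpetrator-based rules may be worth considering when justifying the categories.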
