# **Project Name**    - Global Terrorism



##### **Project Type**    - EDA/Regression/Classification/Unsupervised
##### **Contribution**    - Individual
##### **Team Member 1 -** Sourav Karak


# **Project Summary -**

### Why did I choose this project?


I am interested in a project that draws on a variety of resources to help me understand complex issues related to war, security, and peace. I want to explore topics like environmental damage and gender-based insecurity, with the goal of gaining a deeper understanding of today's security challenges. I also want to sharpen my skills to better analyze these complicated issues, including those linked to terrorism.

### What is Terrorism?

The word "terrorism" comes from the Latin word "terror," which means "great fear." It was first used during the French Revolution in 1795. Back then, it described the planned use of violence and brutality to scare people, with the goal of pushing a specific political or social idea. Today, terrorism can be seen in many different forms and contexts, used by various groups to spread fear.

Although the United Nations Security Council agrees that terrorism is a threat to peace and security, it hasn't provided a clear definition of what terrorism is in its resolutions. Instead, it asks each country to define terrorism in its own laws. As a result, different countries have their own ways of describing terrorism and who counts as a terrorist.

### International Terrorism:
Acts of violence and criminality perpetrated by individuals and/or groups who draw inspiration from or have affiliations with designated foreign terrorist organizations or state-sponsored entities.

### Domestic terrorism:

Acts of violence and criminality perpetrated by individuals and groups often serve to advance ideological objectives rooted in domestic influences, spanning political, religious, social, racial, or environmental motivations. The criteria for categorizing terrorism typically involve considerations of the perpetrator, the victim, the method employed, and the underlying purpose. Different definitions of terrorism prioritize various characteristics, reflecting the concerns of the respective authorities.

In the past decade, terrorism has resulted in an average of 26,000 fatalities annually worldwide. The global death toll fluctuated over that period, from a low of 8,200 in 2011 to a high of 44,600 in 2014. In 2017, terrorism accounted for 0.05% of global deaths. Geographically, terrorism is highly concentrated, with 95% of fatalities in 2019 occurring in the Middle East, Africa, and South Asia. Although terrorism typically represents less than 0.01% of deaths in most countries, in regions marked by high levels of conflict it can account for several percent of deaths.

While airline hijackings were once commonplace, they are now exceedingly rare. Nonetheless, public apprehension about terrorism remains substantial, with surveys indicating that more than half of respondents in many countries express concerns about becoming victims. Media coverage of terrorism often surpasses its frequency and share of fatalities, contributing to perceptions that may not accurately reflect the relative risk posed by terrorism compared to other threats.

### When I study terrorism, I often use the Global Terrorism Database (GTD) as a key source for information on terrorist incidents and deaths worldwide. It's considered the most thorough database for this kind of data, offering valuable insights into the world of terrorism. But it's important to remember that even a detailed database like the GTD has its limits. We need to be careful about what we conclude or assume from the patterns or trends we see in this data.

# **GitHub Link -**

https://github.com/karakjr

# **Problem Statement**


### As a security/defense analyst, my focus lies in identifying the regions most severely affected by terrorism, as well as understanding the types of weapons commonly employed by terrorist groups. Additionally, I aim to ascertain the most active terrorist organizations operating globally. Through exploratory data analysis (EDA), I seek to uncover various security concerns and glean insights crucial to promoting and safeguarding human rights.

#### **Define Your Business Objective?**

### Every day, thousands of researchers, analysts, policymakers, and students rely on the Global Terrorism Database (GTD) to delve into the complexities of global security. Our goal is to comprehensively grasp the capabilities and shortcomings of current security mechanisms by conducting an exhaustive examination of this database. Our objective is to dissect the root causes and repercussions of terrorism through a meticulous analysis of the GTD. By identifying hot zones of terrorist activity and pinpointing the most active terrorist groups, we aim to maintain vigilant oversight and curb their influence. Furthermore, we endeavor to catalog the types of weapons utilized, enabling us to advocate for their restriction and prohibition. Ultimately, our mission is to prevent and mitigate terrorism, fostering a world characterized by peace and harmony.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart addresses the following format.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
# Importing all the necessary libraries
import numpy as np # Numerical operations on arrays and matrices
import pandas as pd #  Data manipulation and analysis
import matplotlib.pyplot as plt #  Data visualization (basic plots).
import seaborn as sns # Statistical data visualization (built on top of Matplotlib).
import missingno as msno # To visualize missing values
%matplotlib inline
import folium # To create interactive leaflet maps directly in your Jupyter
import plotly.express as px # To create a variety of interactive visualizations
! pip install pycountry


### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

In [None]:

# Read the CSV file.
data = pd.read_csv('/content/drive/MyDrive/Global Terrorism Data.csv', encoding='latin1')
g_df=pd.DataFrame(data)


### Dataset First View

In [None]:
# Dataset First Look
g_df.sample(n=3) #selected values randomly

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
g_df.shape # approx. 1.8 lakh rows (181,691) and 135 columns

In [None]:
# number of rows
number_of_rows = len(g_df.index)
print(number_of_rows)

In [None]:
# Total number of columns
number_of_columns = len(g_df.columns)
print(number_of_columns)

In [None]:
# column names
g_df.columns

### Dataset Information

In [None]:
# Dataset Info
g_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
new_var = g_df.duplicated().value_counts()
new_var


The output shows 181,691 False values and no True values. Since False marks a non-duplicate row and the count equals the total number of rows in g_df, every row is unique.
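
The same check can be expressed as a single count. The sketch below is only an illustration on a made-up toy frame (the values are not GTD data): it counts fully duplicated rows and, separately, repeated identifiers.

```python
import pandas as pd

# Toy frame standing in for g_df (hypothetical values, not GTD data)
toy = pd.DataFrame({
    "eventid": [1, 2, 2, 3, 3],
    "iyear": [1970, 1971, 1971, 1972, 1973],
})

# duplicated() flags every row that repeats an earlier row; sum() counts the flags
n_duplicate_rows = int(toy.duplicated().sum())

# Restricting the check to one column tests the uniqueness of the identifier alone
n_duplicate_ids = int(toy.duplicated(subset=["eventid"]).sum())

print(n_duplicate_rows, n_duplicate_ids)
```

A row counts as a full duplicate only when every column matches, which is why the two numbers can differ.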

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
g_df.isna().sum().sort_values(ascending=False).head(n=10)


This indicates that these 10 columns have the most missing values in the g_df DataFrame.

In [None]:
# Visualizing the missing values
import missingno as msno
msno.bar(g_df)

This visualization is helpful when you want to get a quick snapshot of where the missing data is concentrated in your DataFrame. If you find that some columns are almost completely missing, it might be worth investigating whether those columns are worth keeping or if there are any contextual reasons for the high number of missing values.
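
To turn that visual impression into numbers, one option is to compute the share of missing values per column. This is a minimal sketch on a made-up toy frame; the 50% cut-off is an assumption chosen for illustration, not a rule used elsewhere in this notebook.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for g_df (hypothetical values, not GTD data)
toy = pd.DataFrame({
    "nkill": [1.0, np.nan, 0.0, 2.0],
    "motive": [np.nan, np.nan, np.nan, "unknown"],
    "country_txt": ["Iraq", "India", "Peru", "Iraq"],
})

# isna().mean() gives the fraction of missing values in each column
missing_share = toy.isna().mean().sort_values(ascending=False)

# Assumed cut-off: list the columns that are more than half empty
threshold = 0.5
mostly_missing = missing_share[missing_share > threshold].index.tolist()

print(missing_share)
print(mostly_missing)
```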

### What did you know about your dataset?

Upon examination of the dataset, we comprehended the significance of the values encapsulated within each column. Our analysis revealed that the dataset encompasses comprehensive details of terrorist incidents occurring globally from 1970 to 2017. It furnishes information on various aspects including locations, dates, responsible terrorist groups, weapons utilized, targets, casualties, and more. Additionally, we identified certain column headings with vague descriptors and made the decision to exclude them from our analysis to maintain clarity and focus.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
g_df.columns

In [None]:
# Dataset Describe
g_df.describe(include='all')

### Variables Description

Brief descriptions of selected columns:

1. **eventid**: Unique identifier for each terrorist attack event.
2. **iyear**: Year in which the event occurred.
3. **imonth**: Month of the event.
4. **iday**: Day of the event.
5. **approxdate**: Approximate date of the event in DD/MM/YYYY format.
6. **extended**: Binary value indicating whether the duration of the incident was extended.
7. **resolution**: Resolution details related to the incident.
8. **country**: Numeric code representing the country where the attack occurred.
9. **country_txt**: Name of the country where the attack occurred.
10. **region**: Numeric code representing the region where the attack occurred.
11. **success**: Binary value indicating the success of the attack.
12. **addnotes**: Additional notes providing details about the attack.
13. **scite1**: First source citation for the information about the attack.
14. **scite2**: Second source citation for the information about the attack.
15. **scite3**: Third source citation for the information about the attack.
16. **dbsource**: Source of the data, often the name of the mission or organization.
17. **weapontype**: Type of weapon used by the terrorists.
18. **targettype**: Type of target that was attacked by the terrorists.
19. **gname**: Name of the terrorist organization responsible for the attack.
20. **city**: Name of the city where the attack occurred.


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for i in g_df.columns.tolist():
  print("Number of unique values in ",i,"is ->",g_df[i].nunique(),".")

## 3. ***Data Wrangling***

### Since the dataset has a lot of columns, trying to remember all of them is too hard. So, we'll give the columns simpler names to make them easier to understand. Then, we'll only keep the most important columns we need for our analysis. This will make the dataset simpler and easier to work with, helping us analyze the data more effectively.
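
The rename-then-select idea can be sketched on a small frame first. The frame and its values below are made up for illustration; only the column names mirror the dataset. The sketch also shows a pandas detail that is easy to trip over: with `inplace=True`, `rename` returns `None`.

```python
import pandas as pd

# Toy frame with a few raw column names (hypothetical values)
raw = pd.DataFrame({
    "iyear": [1970, 1971],
    "country_txt": ["Iraq", "India"],
    "nkill": [1.0, 0.0],
    "scite1": ["a", "b"],  # a column we do not need
})

column_map = {"iyear": "Year", "country_txt": "Country", "nkill": "Killed"}

# Without inplace, rename() returns a new frame, so the selection can be chained
tidy = raw.rename(columns=column_map)[["Year", "Country", "Killed"]].copy()

# With inplace=True the frame is changed directly and the call returns None
result = raw.rename(columns=column_map, inplace=True)

print(list(tidy.columns))
print(result)
```

Taking a `.copy()` of the selection avoids pandas warnings when new columns are later assigned to the smaller frame.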

In [None]:
try:
  # Renaming columns in place; rename(..., inplace=True) returns None, so its result is not assigned
  g_df.rename(columns= {'iyear':'Year','imonth':'Month','iday':'Day','country_txt':'Country','provstate':'state','region_txt':'Region','attacktype1_txt':'AttackType','target1':'Target','nkill':'Killed','nwound':'Wounded','summary':'Summary','gname':'Group','targtype1_txt':'Target_type','weaptype1_txt':'Weapon_type','motive':'Motive','success':'Success'},inplace=True)
  # Keeping only the columns needed for the analysis in a separate DataFrame
  terror_master_data = g_df[['Year','Month','Day','Country','state','Region','city','latitude','longitude','AttackType','Killed','Wounded','Target','Summary','Group','Target_type','Weapon_type','Motive','Success']].copy()
except Exception as e:
  print(e)

In [None]:
g_df.head(n=3)

In [None]:
g_df.info()

In [None]:
g_df.shape

In [None]:
# Missing Values/Null Values Count
g_df.isna().sum().sort_values(ascending=False).head(n=10)

## Exploratory Data Analysis

In [None]:
g_df.hist(figsize=(40,20))  # This represents  the distribution of data  on each series in the DataFrame.

In [None]:
# Replace NaN values in the "Killed" column with zero.
# This is useful because NaN typically represents missing or undefined data.
# By filling it with zero, we are treating missing values as cases where no one was killed.
g_df["Killed"] = g_df["Killed"].fillna(0)

# Similarly, replace NaN values in the "Wounded" column with zero.
# This approach ensures that missing values are treated as cases where no one was wounded.
g_df["Wounded"] = g_df["Wounded"].fillna(0)

# Create a new column "Casualty" by summing "Killed" and "Wounded".
# This new column represents the total number of casualties for each record in the dataset.
g_df["Casualty"] = g_df["Killed"] + g_df["Wounded"]


In [None]:
g_df.describe()

## Observations

1. The data includes terrorist activities that happened between the years 1970 and 2017.
2. The most people killed in a single event was 1570.
3. The most people wounded in a single event was 8191.
4. The highest total number of casualties in a single event was 9574.
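
These figures depend on treating missing values as zero. The sketch below, on made-up numbers, shows how that choice affects a summary statistic and how the row with the highest casualty count can be located.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values, not GTD data)
toy = pd.DataFrame({
    "Killed": [2.0, np.nan, 4.0],
    "Wounded": [1.0, 3.0, np.nan],
})

# Mean over reported values only: NaN is skipped by default
mean_reported = toy["Killed"].mean()

# Mean after treating missing values as zero, which pulls the average down
filled = toy.fillna(0)
mean_filled = filled["Killed"].mean()

# Total casualties per event and the index label of the worst event
filled["Casualty"] = filled["Killed"] + filled["Wounded"]
worst_event = filled["Casualty"].idxmax()

print(mean_reported, mean_filled, worst_event)
```

Maximum values are unaffected by filling with zero, but means and totals per group can shift, so it is worth keeping this assumption in mind when reading the later charts.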


### What all manipulations have you done and insights you found?

 We updated the column names to be easier to understand and removed columns that were unclear or confusing. Now, our dataset only includes columns that we can work with effectively.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

Terrorist attacks often shake up a country's economy, causing instability. But some countries are less affected by this because of various reasons.

In [None]:


# This code block plots a bar chart showing the number of attacks in various countries.
try:
    # Count the number of times each country appears in the dataset
    country_wise_attack_count = g_df['Country'].value_counts()

    # Sort the country counts in ascending order
    country_wise_attack_count.sort_values(axis=0, inplace=True, ascending=[True])
except Exception as e:
    # If an error occurs, print the error message
    print(e)
else:
    # If no error occurs, plot a bar graph for the countries with the fewest attacks
    plt.rcParams['figure.figsize'] = (15, 8)  # Set the plot size
    # The counts are sorted in ascending order, so [20::-1] takes positions 20 down to 0:
    # the 21 least-attacked countries, drawn with the fewest attacks at the top
    country_wise_attack_count[20::-1].plot(kind='barh', color="#8481DD")
    plt.ylabel('Country')  # Label for the y-axis
    plt.xlabel('Attack Count')  # Label for the x-axis
    plt.title("Country-wise Attack Count")  # Title for the plot

    # Show the plot with the configured settings
    plt.show()


##### 1. Why did you pick the specific chart?

We picked bar graphs because they're easy to understand. People can see the differences in lengths more clearly than differences in areas or angles. So, to compare the number of attacks in different countries, we chose a bar graph. We used horizontal bars to fit more countries on the screen at once.
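
When choosing which countries go on such a chart, it helps to be explicit about whether the smallest or largest counts are selected. This is a small sketch on made-up labels; `nsmallest` and `nlargest` are shown as an alternative to sorting and slicing, not as the method used in the cell above.

```python
import pandas as pd

# Hypothetical country labels: A appears 5 times, B 3 times, C twice, D once
countries = pd.Series(["A"] * 5 + ["B"] * 3 + ["C"] * 2 + ["D"], name="Country")
counts = countries.value_counts()

# Explicit selection of the least and most frequent labels
fewest = counts.nsmallest(2)
most = counts.nlargest(2)

# Slicing an ascending sort: iloc[n::-1] returns n + 1 rows, from position n down to 0
ascending = counts.sort_values()
sliced = ascending.iloc[1::-1]

print(fewest.index.tolist(), most.index.tolist(), sliced.index.tolist())
```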

##### 2. What is/are the insight(s) found from the chart?

The chart shows which countries recorded the fewest attacks in the dataset, such as North Korea, Antigua & Barbuda, and Vatican City. One possible explanation is that these countries have little religious diversity, which means less disagreement, and strong central governments that maintain law and order effectively. The chart alone cannot confirm this, so it should be read as a hypothesis rather than a finding.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, this insight can help security agencies in other nations, such as RAW and NIA in India, to improve border security and handle internal affairs more effectively. However, it also suggests that religious extremism may be a major factor behind terrorism.

#### Chart - 2

Every life is precious, and inflicting mental anguish is a serious offense. The government should extend support to the families of the deceased and the injured. To do this, they need to accurately count the total number of deaths and injuries.

In [None]:
# Chart - 2 visualization code

In [None]:
#Total Casualties (Killed + Wounded) in each Year
yc=g_df[["Year","Casualty"]].groupby("Year").sum()
yc.head()

In [None]:

# Plotting a bar graph to visualize total casualties each year
yc.plot(
    kind="bar",  # Specify that we want a bar plot
    color="#73C5C5",  # Use a specific color for the bars
    figsize=(15, 6)  # Set the size of the figure
)

# Set the title for the plot
plt.title("Year_wise_Casualties", fontsize=13)  # Title with font size 13

# Set the label for the x-axis and adjust its font size
plt.xlabel("Years", fontsize=13)  # Label for x-axis with font size 13

# Customize the tick labels on the x-axis
plt.xticks(fontsize=15)  # Set font size for x-axis tick labels

# Set the label for the y-axis and adjust its font size
plt.ylabel("Number of Casualties", fontsize=13)  # Label for y-axis with font size 13

# Display the plot
plt.show()  # This command shows the plot on the screen


In [None]:
# calculate killed people in  each year
yk=g_df[["Year","Killed"]].groupby("Year").sum()
yk.head()

In [None]:
# calculate wounded people in each year
yw=g_df[["Year","Wounded"]].groupby("Year").sum()
yw.head()

In [None]:

# Create a new figure with two subplots, one above the other
fig = plt.figure()  # Start a new figure

# Create the first subplot in a 2x1 grid, at position 1 (top plot)
ax0 = fig.add_subplot(2, 1, 1)

# Create the second subplot in a 2x1 grid, at position 2 (bottom plot)
ax1 = fig.add_subplot(2, 1, 2)

# Plot a bar graph for people killed each year
yk.plot(
    kind="bar",  # Specify the type of plot (bar plot)
    color="red",  # Set the color for the bars
    figsize=(15, 15),  # Set the figure size
    ax=ax0  # Plot on the first subplot
)
ax0.set_title("People Killed in Each Year")  # Title for the top subplot
ax0.set_xlabel("Years")  # Label for the x-axis
ax0.set_ylabel("Number of People Killed")  # Label for the y-axis

# Plot a bar graph for people wounded each year
yw.plot(
    kind="bar",
    color="lightgreen",
    figsize=(15, 15),
    ax=ax1  # Plot on the second subplot
)
ax1.set_title("People Wounded in Each Year")  # Title for the bottom subplot
ax1.set_xlabel("Years")  # Label for the x-axis
ax1.set_ylabel("Number of People Wounded")  # Label for the y-axis

# Display the plots
plt.show()  # This command shows the figure with the two subplots


##### 1. Why did you pick the specific chart?

I wrote code that draws two bar charts in the same figure, allowing for an easy comparison of the total number of deaths and the total number of wounded people.


##### 2. What is/are the insight(s) found from the chart?

In 2007, the number of deaths reached a peak of over 12,000 for the year. Similarly, in 2001, more than 20,000 people were wounded, creating a lasting sense of fear and emotional trauma among the general public. These figures are yearly totals across all recorded attacks.
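
Peak years can be read off the aggregated table directly instead of from the bars. The sketch below uses made-up numbers; it sums both columns in one `groupby` and uses `idxmax` to find the year with the highest total.

```python
import pandas as pd

# Toy frame (hypothetical values, not GTD data)
toy = pd.DataFrame({
    "Year": [2000, 2000, 2001, 2001, 2002],
    "Killed": [1, 2, 10, 0, 15],
    "Wounded": [0, 5, 30, 2, 1],
})

# One pass over the data gives yearly totals for both columns
yearly = toy.groupby("Year")[["Killed", "Wounded"]].sum()

# idxmax returns the index label (here, the year) of the largest value
peak_killed_year = int(yearly["Killed"].idxmax())
peak_wounded_year = int(yearly["Wounded"].idxmax())

print(yearly)
print(peak_killed_year, peak_wounded_year)
```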

#### Chart - 3

To find the area most affected by terrorism, we can check data on terrorist attacks. This helps us see which regions have the most incidents. We can then mark these places as "red zones" to let people know they should avoid visiting. We can also alert local governments to increase their security.

In [None]:
#Distribution of Terrorist Attacks over Regions from 1970-2017
regions=pd.crosstab(g_df.Year,g_df.Region)
regions.head()

In [None]:

# Plot an area chart to visualize the number of attacks in different regions over time
regions.plot(
    kind="area",  # Specify that we want an area plot
    stacked=False,  # Do not stack areas to keep them separate
    alpha=0.5,  # Set the transparency level for the plot
    figsize=(20, 10)  # Set the size of the figure
)

# Set the plot title with a larger font size
plt.title("Region-wise Attacks", fontsize=20)

# Label the x-axis as "Years" with a larger font size
plt.xlabel("Years", fontsize=20)

# Label the y-axis as "Number of Attacks" with a larger font size
plt.ylabel("Number of Attacks", fontsize=20)

# Show the plot with the defined configurations
plt.show()  # This command displays the area plot


In [None]:
#Total Terrorist Attacks in each Region from 1970-2017
regt=regions.transpose()
regt["Total"]=regt.sum(axis=1)
ra=regt["Total"].sort_values(ascending=False)
ra

In [None]:
# plotting a bar graph to view the regions in descending order
ra.plot(kind="bar",figsize=(15,6))
plt.title("Total Number of Attacks in each Region from 1970-2017")
plt.xlabel("Region")
plt.ylabel("Number of Attacks")
plt.show()

In [None]:
# calculating the total number of killed people in each region
rk=g_df[["Region","Killed"]].groupby("Region").sum().sort_values(by="Killed",ascending=False)
print(rk)
# calculating the total number of wounded people in each region
rw=g_df[["Region","Wounded"]].groupby("Region").sum().sort_values(by="Wounded",ascending=False)
print(rw)

In [None]:

# Create a new figure for the plots
fig = plt.figure()

# Create two subplots in a single row and two columns
ax0 = fig.add_subplot(1, 2, 1)  # First subplot on the left
ax1 = fig.add_subplot(1, 2, 2)  # Second subplot on the right

# Plot data for people killed by region on the first subplot
rk.plot(kind="bar", color="orange", figsize=(15, 6), ax=ax0)  # Bar plot for 'rk' data
ax0.set_title("People Killed in Each Region")  # Title for the left subplot
ax0.set_xlabel("Regions")  # Label for the x-axis
ax0.set_ylabel("Number of People Killed")  # Label for the y-axis

# Plot data for people wounded by region on the second subplot
rw.plot(kind="bar", color="skyblue", figsize=(15, 6), ax=ax1)  # Bar plot for 'rw' data
ax1.set_title("People Wounded in Each Region")  # Title for the right subplot
ax1.set_xlabel("Regions")  # Label for the x-axis
ax1.set_ylabel("Number of People Wounded")  # Label for the y-axis

# Show the plots with the defined configuration
plt.show()  # Display the figure with the two subplots


 ##### 1. Why did you pick the specific chart?

We use an area plot to display the number of terrorist attacks by region. This type of chart helps us easily see which regions have the most attacks. Using this plot, we can quickly identify the area with the highest number of incidents and declare it a "red zone," indicating that it's a high-risk area for terrorism. This way, we can simplify the results and take appropriate measures to ensure safety.
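
The table behind the area plot is a year-by-region cross-tabulation. A minimal sketch on made-up rows shows what `pd.crosstab` produces and how the regional totals follow from it.

```python
import pandas as pd

# Toy frame: one row per attack (hypothetical rows, not GTD data)
toy = pd.DataFrame({
    "Year": [1970, 1970, 1971, 1971, 1971],
    "Region": ["South Asia", "Western Europe", "South Asia",
               "South Asia", "Western Europe"],
})

# Rows are years, columns are regions, cells are attack counts
by_year_region = pd.crosstab(toy["Year"], toy["Region"])

# Summing down each column gives the total attacks per region
totals = by_year_region.sum(axis=0).sort_values(ascending=False)

print(by_year_region)
print(totals)
```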

##### 2. What is/are the insight(s) found from the chart?

The 11 regions with the most terrorist attacks, in descending order, are:

1. Middle East & North Africa
2. South Asia
3. South America
4. Sub-Saharan Africa
5. Western Europe
6. Southeast Asia
7. Central America & Caribbean
8. Eastern Europe
9. North America
10. East Asia
11. Central Asia

Among these, the Middle East & North Africa and South Asia are the most heavily impacted by terrorist attacks. The Middle East & North Africa has seen 50,474 attacks, leading to 137,642 deaths and 214,308 people wounded. South Asia has also experienced a high number of attacks.

These two regions are considered the most hazardous, with a high likelihood of future attacks, earning them the designation of "ultra red zone." It's critical to take extra precautions when traveling to these areas and to strengthen security measures to minimize future threats.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Since 2010, terrorist attacks in the Middle East and North Africa have dramatically increased, mainly because of political instability and civil conflicts. Governments in these regions need to take strong action to address these issues.

Countries in the region should also advise people against visiting the most affected areas, especially the Middle East, North Africa, and South Asia, where the risk of terrorism is high. This can help keep people safe and reduce the chances of further violence.

#### Chart - 4

Countries might restrict tourists from high-terrorism areas to boost safety and security. Identifying the top 10 countries most affected by terrorism helps governments focus on defense and alerts travelers to risky regions. This way, nations can work to keep their people safe and improve national security.

In [None]:

# Get the top 10 countries with the most attacks
country_wise_attacks = g_df["Country"].value_counts().head(10)  # Top 10 countries by attack count
print(country_wise_attacks)  # Display the top 10 countries and their attack counts

# Plot a pie chart to visualize the proportion of attacks among these top 10 countries
country_wise_attacks.plot(
    kind="pie",  # Create a pie chart
    figsize=(20, 9)  # Set the size of the figure
)

# Set the title for the plot with a specific font size
plt.title("Country-wise Attacks", fontsize=13)

# Label the x-axis with a font size
plt.xlabel("Countries", fontsize=13)  # Although a pie chart doesn't have a traditional x-axis, adding a label for clarity

# Customize the font size of the tick labels on the x-axis
plt.xticks(fontsize=12)  # Set the font size for x-axis tick labels

# Label the y-axis with a font size
plt.ylabel("No of Attacks", fontsize=13)  # Although a pie chart doesn't have a traditional y-axis, a label is added

# Display the plot with the defined configurations
plt.show()  # This command shows the pie chart


##### 1. Why did you pick the specific chart?

A pie chart shows each country's share of the attacks among the top 10 countries at a glance, which makes the comparison simple for people to understand.

##### 2. What is/are the insight(s) found from the chart?

Iraq is the country with the most terrorist attacks, having 24,636 incidents. Pakistan, Afghanistan, and India have a similar number of attacks.
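
One caveat when reading a top-N pie chart is that the slices show shares within the selected countries, not shares of all attacks. The sketch below uses made-up counts to show the difference.

```python
import pandas as pd

# Hypothetical labels: 5 + 3 + 2 = 10 attacks in total (not GTD counts)
countries = pd.Series(["Iraq"] * 5 + ["India"] * 3 + ["Peru"] * 2)

# Keep only the two most frequent countries
top = countries.value_counts().head(2)

# Share of all attacks versus share within the selected countries (pie slices)
share_of_all = (top / len(countries) * 100).round(1)
share_within_top = (top / top.sum() * 100).round(1)

print(share_of_all)
print(share_within_top)
```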

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

These countries need to strengthen their defense systems and monitor terrorist groups:

- Iraq
- Pakistan
- Afghanistan
- India
- Colombia
- Philippines
- Peru
- El Salvador
- United Kingdom
- Turkey

These nations should focus on security to prevent terrorist attacks.

#### Chart - 5

Frequent terrorist attacks can destabilize a country's economy. If we focus only on the impact of terrorism, which countries' economies are least affected?


In [None]:
def terror_hot_zones():
  try:
    country_wise_attack_count = g_df['Country'].value_counts()         #Counting number of times each country name appears
    country_wise_attack_count.sort_values(axis =0 , inplace=True,ascending=[True] )   #Sorting countries in ascending order of occurrences
  except Exception as e:
    print(e)
  else:
    # Plotting bar graph for the 21 countries with the fewest attacks (positions 20 down to 0)
    plt.rcParams['figure.figsize']=(15,8)
    country_wise_attack_count[20::-1].plot(kind='barh', color = "green")
    plt.title('Total terror attacks country wise')
    plt.ylabel('Country')
    plt.xlabel('Attack Count')

terror_hot_zones()

##### 1. Why did you pick the specific chart?

Bar graphs are easy to understand because people can quickly see the difference in lengths. This makes them better than other types of charts that rely on areas or angles. Since we wanted to compare the number of attacks in different countries, we picked a bar graph. We chose a horizontal bar graph because it fits more country names on the screen.

##### 2. What is/are the insight(s) found from the chart?

The chart shows which countries recorded the fewest attacks. Places like North Korea, Antigua & Barbuda, and Vatican City share some common traits: they don't have much religious diversity, which might reduce conflict, and they often have strong central governments, which helps maintain law and order. The chart alone cannot confirm these explanations.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This insight can help security agencies in other countries, such as India's RAW and NIA, strengthen borders and manage internal security more effectively. However, it also suggests that religious extremism might be a major cause of terrorism.

#### Chart - 6

Let's compare how well the governments in Odisha, Jharkhand, and Chhattisgarh have done in reducing Naxalism.

In [None]:
def getNaxalTrend():
  try:

    #Finding records for the states where Maoists have attacked

    #Odisha
    terror_Od=g_df.loc[(g_df['state'] == 'Orissa') & (g_df['Group'] == 'Maoists')]
    #Jharkhand
    terror_Jh=g_df.loc[(g_df['state'] == 'Jharkhand') & (g_df['Group'] == 'Maoists')]
    #Chhattisgarh
    terror_Ch=g_df.loc[(g_df['state'] == 'Chhattisgarh') & (g_df['Group'] == 'Maoists')]

    #Finding count of attacks by Maoists on the 3 states year wise
    od_count = terror_Od.groupby('Year').size()
    jh_count = terror_Jh.groupby('Year').size()
    ch_count = terror_Ch.groupby('Year').size()

  except Exception as e:
    print(e)

  else:
    #Plotting line graphs
    plt.plot(od_count, linewidth = 10)
    plt.plot(jh_count, linewidth = 10)
    plt.plot(ch_count, linewidth = 10)
    plt.legend(["Odisha","Jharkhand","Chhattisgarh"])
    plt.show()

getNaxalTrend()

##### 1. Why did you pick the specific chart?

We want to compare the trends in Maoist activities in the states of Odisha, Jharkhand, and Chhattisgarh. Line graphs are a good way to show trends or changes over time.
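
When comparing yearly counts as lines, years without any recorded attack are simply absent from a `groupby` result, so the line jumps over them. This sketch on made-up rows fills those gaps with zeros; the reindexing step is a suggestion, not something the cell above does.

```python
import pandas as pd

# Toy frame: one row per attack (hypothetical rows, not GTD data)
toy = pd.DataFrame({
    "Year": [2010, 2010, 2012, 2013, 2013],
    "state": ["Orissa", "Orissa", "Orissa", "Jharkhand", "Jharkhand"],
    "Group": ["Maoists"] * 5,
})

# Filter once, then count attacks per year and state in a single table
maoist = toy[toy["Group"] == "Maoists"]
counts = maoist.groupby(["Year", "state"]).size().unstack(fill_value=0)

# Insert the years with no attacks so that each line covers every year
all_years = range(int(toy["Year"].min()), int(toy["Year"].max()) + 1)
counts = counts.reindex(all_years, fill_value=0)

print(counts)
```

Calling `counts.plot()` on the resulting table would draw one line per state with a shared year axis.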

##### 2. What is/are the insight(s) found from the chart?

Odisha successfully tackled Naxalism, almost eliminating it in 2013. However, we noticed a sharp increase in Naxal activity in neighboring states the same year. It seems that political unrest in Jharkhand in 2013 might have been a trigger for this rise.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Government programs like education, cash grants, and surrender benefits can be expanded to further reduce Naxalism in these states. These efforts encourage people to leave the insurgency and build a better life.

#### Chart - 7

Disturbances in various regions are usually caused by different, often local, terrorist groups. Let's look at:

i) The main terrorist groups operating worldwide.
ii) The main terrorist groups operating in the country with the most terrorist attacks.

In [None]:
def get_top_terror_groups_in_hot_zones():
  try:
    #Finding top 10 terror outfits of the world
    terror_attack_count = g_df['Group'].value_counts().head(10)

    #Finding name of the most affected country
    mostEffectedCountry = g_df['Country'].value_counts().index[0]

    #Segregating attacks on the most affected country
    countries_effect_count = g_df.loc[(g_df['Country'] == mostEffectedCountry)]

    #Finding top 5 terror outfits in the most affected country
    countries_effect_count = countries_effect_count["Group"].value_counts()[0:5]

  except Exception as e:
    print(e)


  else:
    #Plotting top 10 terror outfits of the world
    plt.rcParams['figure.figsize']=(15,4)
    terror_attack_count.plot(kind='bar', color = "#4CB140")

    plt.title('Total attacks by the top 10 terrorist organizations (worldwide).')
    plt.ylabel('Attack count')
    plt.xlabel('Terror Group')
    plt.show()

    print("\n \n ")
    #Plotting top terror groups of the most affected country
    countries_effect_count.plot(kind='bar', color = '#009596')
    plt.title(f'Contribution of top 5 organisations (On top terror target country - {mostEffectedCountry})')
    plt.ylabel('Attack count')
    plt.xlabel('Terror Group')
    plt.show()

get_top_terror_groups_in_hot_zones()

##### 1. Why did you pick the specific chart?

We used a bar graph to compare the number of terror attacks by the top 10 groups globally and the top 5 groups in a specific country. The length of each bar shows the relative number of attacks and helps us see the differences between these terrorist organizations.

##### 2. What is/are the insight(s) found from the chart?

As the graphs show, most terrorist attacks are attributed to "Unknown" groups, both globally and in the most affected country.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. This pattern suggests that global peacekeeping agencies should concentrate on identifying the perpetrators behind attacks recorded as "Unknown". The "Unknown" label marks attacks with no attributed group rather than a single organization, so better attribution and monitoring of these attacks can improve national security.
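Because "Unknown" is a label for unattributed attacks rather than one organization, it can be useful to rank groups with that label set aside and to report its share separately. The snippet below is a minimal sketch of that idea; the `Group` column name follows this notebook, while the helper `top_known_groups` and the toy rows are assumptions made purely for illustration.

```python
import pandas as pd

def top_known_groups(df, n=10, unknown_label="Unknown"):
    """Attack counts for the n most active groups, ignoring unattributed attacks."""
    known = df[df["Group"] != unknown_label]
    return known["Group"].value_counts().head(n)

# Toy data for illustration only (not taken from the dataset)
toy = pd.DataFrame({"Group": ["Unknown", "A", "Unknown", "B", "A", "Unknown", "A"]})

top = top_known_groups(toy, n=2)

# Share of attacks that have no attributed group, in percent
unknown_share = (toy["Group"] == "Unknown").mean() * 100

print(top)
print(f"Unattributed share: {unknown_share:.1f}%")
```

On the real data, the same call would be `top_known_groups(g_df)`, assuming `g_df` is loaded as in the cells above.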

#### Chart - 8

Terrorist attacks are usually carefully planned and strategically executed. Let's look at the success rates of some major terrorist groups. Which organizations have the highest success rates?

In [None]:
def get_Terror_Group_Success_Rate():
  try:

    #Counting total attacks by specific groups
    totalAttacks = g_df.groupby('Group')['Group'].count()

    #Counting success of specific groups
    success = g_df.groupby('Group')["Success"].sum()

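    #Merging on terror group names (index). groupby sorts its keys, so [0:100]
    #keeps the first 100 groups in alphabetical order, not the 100 most active ones
    success_Rate =  pd.merge(totalAttacks[0:100], success, how='inner', left_index=True, right_index=True)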

    # Calculating success rate : Success/Total * 100
    success_Rate ["Success_Rate_Value"] = (success_Rate["Success"]/success_Rate["Group"])*100
    success_Rate.sort_values(by ='Success_Rate_Value', inplace=True)

  except Exception as e:
    print(e)

  else:
    #Plotting graph depicting success rate of 100 terror groups
    sns.barplot(x = success_Rate.index, y =success_Rate['Success_Rate_Value'] )
    #Rotating the group names so that the 100 labels do not overlap
    plt.xticks(rotation=90)
    plt.show()

get_Terror_Group_Success_Rate()

##### 1. Why did you pick the specific chart?

We wanted to compare the success rates of 100 terrorist organizations worldwide. To make their data easier to understand, we used a bar graph.

##### 2. What is/are the insight(s) found from the chart?

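The chart shows that most of the selected terrorist groups have a 100% success rate. This suggests that these operations are planned precisely and strategically. However, a group with only a handful of recorded attacks reaches 100% easily, so the rate should be read alongside the number of attacks.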

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

National security agencies can use this data to understand why some groups have high success rates while others have low ones. Analyzing their different strategies can help identify patterns and prevent more attacks in the future.
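One way to make success rates more comparable is to compute them only for groups with a minimum number of recorded attacks. The following is a rough sketch under that assumption; the `Group` and `Success` column names follow this notebook, while the helper `success_rate_by_group`, the `min_attacks` threshold, and the toy rows are hypothetical.

```python
import pandas as pd

def success_rate_by_group(df, min_attacks=5):
    """Success rate (%) per group, limited to groups with at least min_attacks attacks."""
    stats = df.groupby("Group")["Success"].agg(total="count", successes="sum")
    stats = stats[stats["total"] >= min_attacks]
    stats["success_rate"] = stats["successes"] / stats["total"] * 100
    return stats.sort_values("success_rate", ascending=False)

# Toy data for illustration only: group C has a single (successful) attack
toy = pd.DataFrame({
    "Group": ["A"] * 5 + ["B"] * 4 + ["C"],
    "Success": [1, 1, 1, 1, 0, 1, 1, 0, 0, 1],
})

rates = success_rate_by_group(toy, min_attacks=4)
print(rates)
```

Here group C would show a 100% rate from one attack, so the threshold drops it and only A and B remain in the comparison.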

#### Chart - 9

Analyse the birth and growth of Boko Haram in Nigeria. Who are their primary targets?

In [None]:


def get_Nigeria_boko_details():
  try:
    # Filtering those records where Boko Haram has attacked Nigeria
    nigeria_data = g_df.loc[(g_df['Country'] == 'Nigeria') & (g_df['Group'] == 'Boko Haram')]

    #Finding year wise count of the attacks, sorted by year
    year_wise_attack_count = nigeria_data['Year'].value_counts().sort_index()

  except Exception as e:
      print(e)

  else:

    #Plotting line graph to show trend over the years
    plt.rcParams['figure.figsize']=(10,5)
    year_wise_attack_count.plot(kind='line', color = 'teal', linewidth = 6)
    plt.title('Total terror attacks by Boko Haram in Nigeria')

    #Assigning labels for x and y axis
    plt.ylabel('Attack count')
    plt.xlabel('Year')
    plt.show()

  try:
    #Finding count of every target type (value_counts already sorts in decreasing order)
    primary_targets = nigeria_data['Target'].value_counts()

    #Printing total attacks and top 3 targets
    print(f"Total attacks by Boko Haram in Nigeria = {len(nigeria_data)}")
    print(primary_targets.head(3))


  except Exception as e1:
    print(e1)

get_Nigeria_boko_details()

##### 1. Why did you pick the specific chart?

We picked a line graph to show the trends over time because it clearly shows how attacks by Boko Haram went up and down. This graph helps us see when the attacks increased or decreased.

##### 2. What is/are the insight(s) found from the chart?

The graph shows that Boko Haram was a small group from 2002 to 2009, but then became a major terror group in Nigeria from 2010 to 2012. In 2009, the police used too much force against Boko Haram, leading to the group fighting back with bombings and killings. This makes us question if the government at that time had good plans or policies to deal with conflicts through meaningful dialogue, rather than violence.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The graph shows that Boko Haram's terror attacks peaked in 2012 and 2014. In 2015, President Buhari took strong steps to reduce Boko Haram's attacks in Nigeria. His approach could be studied and copied to improve safety in Nigeria and in other countries dealing with violence and civil unrest.

#### Chart - 10

Although terrorism has no religion, terrorist organisations have specific targets. Who are the most vulnerable targets? Civilians, military or politicians?

In [None]:

def get_Primary_Target_Distribution():
    try:
        # Get the count of each target type in the 'Target_type' column
        primary_target = terror_master_data.Target_type.value_counts()
    except Exception as e:
        # If there's an error, print the error message
        print(e)
    else:
        # Set the figure size for the pie chart
        plt.rcParams['figure.figsize'] = (15, 8)

        # Create a pie chart with the target counts and their labels
        plt.pie(primary_target, labels=primary_target.index)

        # Display the pie chart
        plt.show()  # Shows the plot if no exception occurred

# Call the function to get the pie chart for primary target distribution
get_Primary_Target_Distribution()  # Execute the function to see the result


##### 1. Why did you pick the specific chart?

We used a pie chart to show how different social groups make up the total number of casualties or targets. This type of chart helps us see the proportions of each group within the overall total.

##### 2. What is/are the insight(s) found from the chart?

Terrorists mostly target civilians and private property, making up about 25% of all attacks. The military, police, and government officials are also at high risk, with each of these groups facing a similar level of danger.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This information can help improve security in public places. Since terrorists don't have personal disputes with civilians, their main goal is to spread fear. Knowing this, we can focus on making public areas safer to reduce the impact of terrorist attacks.
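The pie chart above carries no percentage labels, so shares such as the "about 25%" figure are easier to check from normalized counts. The sketch below shows one way to do that; the helper `target_shares`, the "Other" bucket, and the toy values are assumptions for illustration, not part of the dataset.

```python
import pandas as pd

def target_shares(target_col, top_n=5):
    """Percentage share of each target type, with the long tail grouped as 'Other'."""
    shares = target_col.value_counts(normalize=True) * 100
    top = shares.head(top_n)
    other = shares.iloc[top_n:].sum()
    if other > 0:
        top = pd.concat([top, pd.Series({"Other": other})])
    return top.round(1)

# Toy data for illustration only
toy_targets = pd.Series(
    ["Civilians"] * 5 + ["Military"] * 3 + ["Police", "Business"]
)

shares = target_shares(toy_targets, top_n=2)
print(shares)
```

The same percentages can also be drawn on the chart itself by passing `autopct='%1.1f%%'` to `plt.pie`.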

#### Chart - 11

Taking a closer look at the trends before and after the War on Terror, we can see significant changes. Before the War on Terror, terrorist attacks were fewer and often more focused on specific targets. After the War on Terror began, the number of attacks increased, and they became more widespread, affecting more civilians and public places.

In [None]:
# Now we will look closer at trend Before and after the War on Terror

# Filter the data for years 2001 and later
data_after = g_df[g_df['Year'] >= 2001]

# Create a figure with two subplots (2 rows, 1 column)
fig, ax = plt.subplots(figsize=(15, 10), nrows=2, ncols=1)

# Plot the change in regions per year for the whole dataset
ax[0] = pd.crosstab(g_df.Year, g_df.Region).plot(ax=ax[0])  # Line plot showing regions by year
ax[0].set_title('Change in Regions per Year')  # Title for the first subplot
ax[0].legend(loc='center left', bbox_to_anchor=(1, 0.5))  # Positioning the legend
ax[0].vlines(x=2001, ymin=0, ymax=7000, colors='red', linestyles='--')  # Vertical line to mark 2001

# Plot the bar chart for the data after 2001, stacked by region
pd.crosstab(data_after.Year, data_after.Region).plot.bar(stacked=True, ax=ax[1])
ax[1].set_title('After Declaration of War on Terror (2001-2017)')  # Title for the second subplot
ax[1].legend(loc='center left', bbox_to_anchor=(1, 0.5))  # Positioning the legend

# Display the plots
plt.show()  # Show the plots created above


##### 1. Why did you pick the specific chart?

We used a line plot of yearly attack counts per region to show the long-term trend, with a dashed vertical line marking 2001. The stacked bar chart below it covers 2001-2017 and shows how much each region contributes to the yearly total after the War on Terror began.

##### 2. What is/are the insight(s) found from the chart?

The first plot shows a big difference in terrorism before and after the War on Terror. Before 2001, the levels of terrorism across different regions were similar, and by 2000, they all dropped to a low point. After 2001, there was a sharp rise in terrorism, especially in the Middle East and South Asia. Sub-Saharan Africa also saw a big increase in terrorist activity during this period.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The first plot clearly shows that terrorism changed a lot before and after the War on Terror. Before 2001, different regions had similar levels of terrorist activity, and by 2000, they all dropped to very low levels. After 2001, however, terrorism rose sharply, especially in the Middle East and South Asia. Sub-Saharan Africa also saw a big increase in terrorist attacks during this time.
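The before/after comparison can also be put into numbers by averaging the yearly attack counts per region on each side of 2001. This is only a sketch of that calculation: the `Year` and `Region` column names follow this notebook, while the helper `mean_yearly_attacks` and the toy rows are hypothetical.

```python
import pandas as pd

def mean_yearly_attacks(df, split_year=2001):
    """Average number of attacks per year and region, before and from split_year on."""
    counts = pd.crosstab(df["Year"], df["Region"])
    # Note: years with no recorded attacks at all do not appear in the crosstab
    before = counts[counts.index < split_year].mean()
    after = counts[counts.index >= split_year].mean()
    return pd.DataFrame({"before": before, "after": after})

# Toy data for illustration only
toy = pd.DataFrame({
    "Year":   [1999, 1999, 2000, 2001, 2001, 2001, 2002, 2002],
    "Region": ["X",  "Y",  "X",  "X",  "X",  "Y",  "X",  "Y"],
})

summary = mean_yearly_attacks(toy)
print(summary)
```

A region whose `after` value is much larger than its `before` value corresponds to the sharp rises visible in the line plot.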

#### Chart - 12

Method of Attack

In [None]:

# Set the figure size for the plot
plt.figure(figsize=(13, 6))

# Create a count plot for the 'AttackType' column, ordered by the frequency of each attack type
sns.countplot(
    data=g_df,  # The DataFrame holding the attack records
    x='AttackType',  # The column for which we want to count occurrences
    order=g_df['AttackType'].value_counts().index,  # Order by the most frequent
    palette='hot'  # Color palette for the plot
)

# Rotate the x-axis labels for better readability
plt.xticks(rotation=90)

# Label the x-axis as 'Method'
plt.xlabel('Method')

# Set the title of the plot to 'Method of Attack'
plt.title('Method of Attack')

# Display the plot
plt.show()  # Show the plot with the configured settings


##### 1. Why did you pick the specific chart?

To identify which attack methods terrorists use most and least often.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that Bombing/Explosion was the most frequently used attack method.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The chart shows that bombings and explosions are the most common method of attack. Governments, global anti-terror organizations, and human rights groups should therefore consider banning or restricting explosives and their raw materials.

#### Chart - 13 - Correlation Heatmap

In [None]:
# Creating a new data frame with all rows and a specific set of columns
df_loc = g_df.loc[:, ["Year", "Month","Day","country","region","latitude","longitude","specificity","vicinity","crit1","crit2","crit3","doubtterr",
                          "multiple","Success","suicide","attacktype1","targtype1","targsubtype1","natlty1","weaptype1"]]

new_df_loc=df_loc.reset_index()
new_df_loc



In [None]:
# Correlation Heatmap visualization code

plt.figure(figsize=(40,20))
#This shows how strongly each pair of parameters in the dataset is related.
sns.heatmap(np.round(new_df_loc.corr(),2),annot=True, cmap='BuPu')

##### 1. Why did you pick the specific chart?

To analyse the correlation between various attributes of the Global Terrorism Database.

##### 2. What is/are the insight(s) found from the chart?

The heatmap shows the pairwise correlation coefficients between the selected attributes of the Global Terrorism Database.
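A large heatmap can be hard to read, so it may help to list the most strongly correlated attribute pairs as a table. The sketch below does this on a toy DataFrame; the helper `strongest_pairs` and the columns `a`, `b`, `c` are made up for illustration and stand in for the numeric columns selected above.

```python
import numpy as np
import pandas as pd

def strongest_pairs(df, top_n=5):
    """Return the top_n attribute pairs ranked by absolute correlation."""
    corr = df.select_dtypes("number").corr()
    # Keep only the upper triangle so each pair appears once and the diagonal is dropped
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(top_n)

# Toy data for illustration only
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 3, 4, 1, 2],
})

pairs = strongest_pairs(toy)
print(pairs)
```

Ranking by absolute value keeps strong negative correlations near the top alongside strong positive ones.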

#### Chart - 14 - Pair Plot

It displays analyzed data showing how different factors from the Global Terrorism Database are related to each other.

In [None]:
# Set style and color codes for Seaborn plots
sns.set(style="ticks", color_codes=True)

# Load the 'iris' dataset
iris = sns.load_dataset("iris")

# Create a pair plot to visualize relationships between different attributes
g = sns.pairplot(iris)

# Display the plot
plt.show()

In [None]:
# List of columns to plot
cols_to_plot = ['Group', 'Region', 'AttackType', 'Killed', 'Wounded']

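# Create a pair plot with the selected columns
# (by default pairplot draws only numeric columns, so non-numeric ones such as 'Group' are skipped)
sns.pairplot(g_df[cols_to_plot])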

# Display the pair plot
plt.show()


##### 1. Why did you pick the specific chart?

Pair plots are particularly effective for understanding relationships among multiple features or for exploring a new dataset.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

1. The Middle East and North Africa are the most frequently targeted regions, so their governments should strengthen their defense and investigation agencies, and restrict access to materials used for bombings and explosions.
2. Iraq is the most targeted country, so its government should reinforce its defense and investigation departments, and restrict access to bomb-making materials.
3. Increase public awareness about the threat of terrorism.
4. Anti-terrorism organizations and defense agencies should closely monitor Taliban and ISIL, which are the most active terrorist groups.
5. The international community should establish stronger laws and take decisive action against terrorism.

# **Conclusion**

1. The number of attacks has increased, but not every attack results in a high number of casualties.
2. Iraq has the highest number of terrorist attacks.
3. The Middle East and North Africa are the most targeted regions for terrorist attacks.
4. Most terrorist attacks involve bombings or explosions.
5. The majority of attacks are aimed at private citizens and property.
6. Taliban and ISIL are among the most active terrorist organizations.
7. The trend analysis shows a significant increase in global terrorist attacks since 1971. Terrorist groups like ISIL, Taliban, Al-Shabaab, Boko Haram, and NPA have contributed to this trend. However, there's been a slight decrease in attacks in recent times.
8. Every human life is valuable, and we must do everything possible to combat terrorism and those who fund it. The best long-term solutions are social, economic, and educational development.
9. It's crucial to raise awareness among the general public about the threat of terrorism.
