<a href="https://colab.research.google.com/github/rahulrajbo/GlobalTerrorismDataset/blob/main/Global_Terrorism_Dataset_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -    Global Terrorism Dataset Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Name**            - Rahul Bora

# **Project Summary -**

Exploratory Data Analysis (EDA) involves examining and understanding the characteristics, patterns, and relationships present in a dataset before conducting more formal statistical analyses or modeling. EDA helps in gaining insights, identifying anomalies, and formulating hypotheses about the data, which can guide subsequent analysis and decision-making.

The "Global Terrorism Dataset" project aims to conduct an exploratory data analysis (EDA) on the open-source database containing comprehensive information on terrorist attacks worldwide from 1970 to 2017. The dataset, maintained by researchers at the National Consortium for the Study of Terrorism and Responses to Terrorism (START) at the University of Maryland, encompasses both domestic and international incidents, totaling over 180,000 attacks.

The project's main objective is to gain insights into the patterns and trends of global terrorism over the decades. By analyzing various dimensions of the dataset, including attack frequency, targeted countries, attack methods, weapon use, casualties, and the evolution of terrorist organizations, the project aims to provide a detailed overview of global terrorism trends. These findings can inform counterterrorism strategies and policies and contribute to ongoing efforts in the field.

Through the project's exploratory analysis, potential trends in terrorism will be unveiled, shedding light on regions prone to attacks and understanding the factors behind their vulnerability. The project will leverage data-driven exploration to uncover complex patterns and provide valuable insights into the nature and dynamics of global terrorism.

The project's approach involves a combination of data manipulation, numerical computation, and graphic visualization techniques. By employing these methods, the project aims to derive a robust comprehension of the dataset and present key findings effectively. The project's end product will be a comprehensive analysis that contributes substantially to counterterrorism efforts and serves as a foundation for further research in this field.

Overall, the "Global Terrorism Dataset" project seeks to leverage the vast repository of terrorism-related information to generate insights, inform policies, and enhance understanding of terrorism's global landscape. Through rigorous analysis and visualization, the project aims to contribute to a safer and more secure world by addressing the complex challenges posed by terrorism.

# **GitHub Link -**

**MyGitHub Link:**  https://github.com/rahulrajbo/GlobalTerrorismDataset

# **Problem Statement**


Use exploratory data analysis (EDA) techniques on the Global Terrorism Dataset (GTD) to identify hot zones of terrorism and discern evolving patterns of terrorist activities.
The analysis aims to identify the regions most affected by terrorism, understand contributing factors, and assess threat severity for resource allocation. Patterns in attack frequency, methods, and targets will be examined to adapt strategies. Correlations between socioeconomic conditions, instability, and terrorism will be explored.

#### **Define Your Business Objective?**

The objective of this project is to leverage the Global Terrorism Dataset (GTD) to derive actionable insights into terrorist activities worldwide from 1970 to 2017. The main objectives include:

**Identification of global hot zones for terrorist activities:** By determining the most affected regions, resources can be allocated more effectively to prevent future attacks.

**Analysis of frequency and intensity of attacks:** Understanding how these have evolved over time provides insights into changing dynamics of terrorism and allows for more accurate risk assessment.

**Examination of methodologies and weapons used in attacks:** Shedding light on the operational preferences of terrorist organizations and potentially providing early indicators of future trends.

**Assessment of casualty trends:** Identifying the most devastating types of attacks and allowing for targeted response planning to minimize human loss.

**Unveiling patterns related to terrorist organizations:** Understanding their strategies supports intelligence agencies in their counterterrorism efforts.

By achieving these objectives, the project aims to contribute to the understanding of global terrorism patterns, enhance risk assessment, and support efforts to counteract terrorist activities effectively.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime

### Dataset Loading

In [None]:
# Load Dataset

from google.colab import drive
drive.mount('/content/drive')

In [None]:
df= pd.read_csv('/content/drive/MyDrive/Projects/Global Terrorism Data.csv', encoding= 'ISO-8859-1')

### Dataset First View

In [None]:
# Dataset First Look

df.head()


In [None]:
df.tail()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count

df.shape

### Dataset Information

In [None]:
# Dataset Info

df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count

duplicate_values= df.duplicated().sum()
print(f'There are {duplicate_values} duplicate values in the dataset')

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

df.isna().sum()

In [None]:
# Visualizing the missing values

plt.figure(figsize=(30, 10))
sns.barplot(x=df.columns, y=df.isna().sum())
plt.xlabel('Columns')
plt.ylabel('Count of Missing Values')
plt.title('Missing Values in Each Column')
plt.xticks(rotation=90)
plt.show()

### What did you know about your dataset?

**Dataset Size:** The data set is quite large containing 181,691 entries or rows.

**Feature Quantity:** The data set contains 135 features or columns.

**Data Types:** The Dataset has a mix of data types. There are 55 features(columns) with floating point numbers(float64), 22 features with integers(int64), and 58 features with objects(object). The object datatype in pandas typically means the column contains string(text) data.

**Missing Values:** Some columns have a large number of missing values. For example, the 'approxdate' column has 172,452 missing values and the 'related' column has 156,653. However, several columns, such as 'eventid', 'iyear', 'imonth', and 'iday', have no missing values.

**Memory Usage:** The Dataset uses over 187.1 MB of memory.
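
The observations above can be collected into a single table instead of being read off `df.info()` and `df.isna().sum()` separately. The following is a minimal sketch: `missing_summary` is a hypothetical helper name, and the toy frame only stands in for the real dataset (its values are made up).

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts, percentages and dtypes, worst first."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
        "dtype": frame.dtypes.astype(str),
    })
    return summary.sort_values("missing", ascending=False)


# Toy frame standing in for the real dataset (values are made up).
toy = pd.DataFrame({
    "eventid": [1, 2, 3, 4],
    "approxdate": [np.nan, np.nan, np.nan, "1970-01-02"],
    "nkill": [1.0, np.nan, 0.0, 2.0],
})
report = missing_summary(toy)
print(report)
```

On the notebook's data, `missing_summary(df).head(20)` would list the columns with the most gaps first, which is easier to read than a bar chart with 135 bars.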

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns

columns_name= df.columns
print([column for column in columns_name])

In [None]:
# Dataset Describe

summary= df.describe()
print(summary)

### Variables Description

**eventid:** Incidents from the GTD follow a 12-digit Event ID system.

• First 8 numbers – date recorded “yyyymmdd”.

• Last 4 numbers – sequential case number for the given day (0001, 0002 etc).
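
The Event ID layout described above can be unpacked with plain string slicing. This is a small sketch: `parse_eventid` and `EventId` are hypothetical names, and the ID passed in is only an illustrative value that follows the 12-digit pattern.

```python
from typing import NamedTuple


class EventId(NamedTuple):
    year: int
    month: int
    day: int
    case_number: int


def parse_eventid(eventid: int) -> EventId:
    """Split a 12-digit event ID into its yyyymmdd date part and case number."""
    text = str(eventid)
    if len(text) != 12 or not text.isdigit():
        raise ValueError(f"expected a 12-digit event ID, got {eventid!r}")
    return EventId(
        year=int(text[0:4]),
        month=int(text[4:6]),
        day=int(text[6:8]),
        case_number=int(text[8:12]),
    )


# Illustrative ID following the yyyymmdd + 4-digit case number layout.
parsed = parse_eventid(200109110004)
print(parsed)
```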

**iyear:** This field contains the year in which the incident initiated.

**imonth:** the month in which the incident occurred.

**iday:** The day when the incident was initiated.

**country_txt:** Name of the country where the terrorist attack occurred.

**region_txt:** Name of the region where the terrorist attack occurred.

**city:** Name of the city, village, or town where the terrorist attack occurred.

**latitude:** Records the latitude of the city in which the event occurred.

**longitude:** Records the longitude of the city in which the event occurred.

**attacktype1_txt:** General method of attack employed.

**success:** Success of a terrorist strike is defined according to the tangible effects of the attack.


**targtype1_txt:** The specific person, building, installation, etc. that is targeted.

**natlty1_txt:** The nationality of the target that was attacked.

**gname:** The name of the group that carried out the attack.

**weaptype1_txt:** The general type of weapon used in the incident.

**nkill:** The number of total confirmed fatalities.

**nwound:** The number of total confirmed wounded.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

for i in df.columns.to_list():
  print(df[i].unique())
  print('****************')
  print('****************')

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

# Rename the columns we keep to more readable names.
df.rename(columns={'iyear': 'Year', 'imonth': 'Month', 'iday': 'Day', 'country_txt': 'Country', 'region_txt': 'Region', 'city': 'City', 'attacktype1_txt': 'Attack_Type', 'targtype1_txt': 'Target_Type', 'gname': 'Group_Name', 'weaptype1_txt': 'Weapon_Type', 'nkill': 'Killed', 'nwound': 'Injured'}, inplace=True)
# Keep only the columns needed for the analysis; .copy() makes an independent frame
# so later in-place edits do not act on a slice of the original.
df = df[['Year', 'Month', 'Day', 'Country', 'Region', 'City', 'latitude', 'longitude', 'Attack_Type', 'success', 'Target_Type', 'Group_Name', 'Weapon_Type', 'Killed', 'Injured']].copy()
df.head()

In [None]:
df.shape

In [None]:
df.info()

In [None]:
df.isna().sum()

In [None]:
mode_Killed = df['Killed'].mode().iloc[0]
df.loc[:, 'Killed'] = df['Killed'].fillna(mode_Killed)

mode_Injured = df['Injured'].mode().iloc[0]
df.loc[:, 'Injured'] = df['Injured'].fillna(mode_Injured)

In [None]:
# Drop the remaining rows with missing values and renumber the index.
df.dropna(inplace=True)
df.reset_index(drop=True, inplace=True)
df.head()

In [None]:
df.isna().sum()

In [None]:
df.describe(include='all')

In [None]:
df.Country.nunique()

In [None]:
print('Top 5 countries with highest number of terrorist attacks:')
df['Country'].value_counts().head()

In [None]:
print('Top 5 Regions with highest number of terrorist attacks:')
df['Region'].value_counts().head()

In [None]:
print('Top 5 cities with highest number of terrorist attacks:')
df[df['City']!='Unknown']['City'].value_counts().head()

In [None]:
df['Casualties']= df['Killed']+ df['Injured']

In [None]:
df.head(1)

In [None]:
print('maximum number of casualties in a single terrorist attack')
df['Casualties'].max()

In [None]:
max_casualties_date_index = df['Casualties'].idxmax()
max_casualties_date = df.loc[max_casualties_date_index, ['Year', 'Month', 'Day']]
# Cast to plain Python ints before building the date.
date_obj = datetime.datetime(int(max_casualties_date['Year']), int(max_casualties_date['Month']), int(max_casualties_date['Day']))
formatted_date = date_obj.strftime('%m/%d/%Y')
print('Date with the highest number of casualties:', formatted_date)

In [None]:
print('most active 5 terrorist groups:')
df[df['Group_Name']!= 'Unknown']['Group_Name'].value_counts().head()

In [None]:
print('most used 5 weapons in terror activities:')
df[df['Weapon_Type']!='Unknown']['Weapon_Type'].value_counts().head()

### What all manipulations have you done and insights you found?

After reviewing the codebook and the CSV file of the Global Terrorism Dataset, I concluded that many of the 135 columns do not carry information that is useful for this analysis.
So we rename some columns to clearer names and then keep only the columns needed for further analysis.

The **longitude** and **latitude** columns have some null values. They are of limited use here because the **country, region, city** columns already describe the location, so we drop the rows with missing values in them.

There are many null values in the **killed** and **injured** columns, so we fill those null values with the **mode** of the respective column.

There are only a few null values in the **city** column, so we drop those rows directly, as approximately 181,257 rows are available.
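
The fill-then-drop steps described above can be sketched as one small function. `fill_with_mode` is a hypothetical helper name and the toy frame contains made-up values; the sketch works on a copy so the input frame is left unchanged.

```python
import numpy as np
import pandas as pd


def fill_with_mode(frame: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy where NaNs in the given columns are replaced by the column mode."""
    out = frame.copy()
    for col in columns:
        mode = out[col].mode(dropna=True)
        if not mode.empty:  # a column that is entirely NaN has no mode
            out[col] = out[col].fillna(mode.iloc[0])
    return out


# Toy frame with made-up values, using the renamed column names.
toy = pd.DataFrame({
    "City": ["Baghdad", None, "Kabul", "Lima"],
    "Killed": [0.0, 2.0, np.nan, 0.0],
    "Injured": [1.0, 1.0, 3.0, np.nan],
})

# Impute the casualty columns first, then drop rows that still have gaps.
cleaned = fill_with_mode(toy, ["Killed", "Injured"]).dropna().reset_index(drop=True)
print(cleaned)
```

Mode imputation replaces every gap with the single most frequent value, so totals computed from the filled columns should be read as approximate.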


**Some Insights**--


1- **205** of **249** (as of 2017) countries have reported terrorist activities at least once between 1970 and 2017.

2- The country that faced the most terrorist attacks: **Iraq**

3- The region that faced the most terrorist attacks: **Middle East and North Africa**

4- The city that faced the most terrorist attacks: **Baghdad**

5- Maximum number of casualties in a single terrorist attack: **9574**

6- Date of the attack with the maximum number of casualties: **9/11/2001**

7- Most active terrorist group: **Taliban**

8- Most used weapon type for terrorist activities: **Explosives**
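
The date of the worst attack is built from the separate `Year`, `Month` and `Day` columns. A hedged alternative to `datetime.datetime(...)` is `pd.to_datetime` with `errors="coerce"`, which turns an invalid combination such as month 0 or day 0 into `NaT` instead of raising. The sketch below uses a toy frame with made-up casualty numbers; whether the real data contains such zero values should be checked against the codebook.

```python
import pandas as pd

# Toy frame with made-up values; the second row has an invalid month/day of 0.
toy = pd.DataFrame({
    "Year": [2001, 1995, 1970],
    "Month": [9, 0, 7],
    "Day": [11, 0, 2],
    "Casualties": [100.0, 5.0, 1.0],
})

# to_datetime assembles dates from year/month/day columns;
# errors="coerce" yields NaT for combinations that are not real dates.
dates = pd.to_datetime(
    toy[["Year", "Month", "Day"]].rename(columns=str.lower),
    errors="coerce",
)

worst = toy["Casualties"].idxmax()
worst_date = dates.loc[worst]
formatted = worst_date.strftime("%m/%d/%Y") if pd.notna(worst_date) else "unknown date"
print("Date with the highest number of casualties:", formatted)
```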


## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

plt.figure(figsize=(15,7))
sns.countplot(data= df, x='Year')
plt.xlabel('Year')
plt.xticks(rotation=90)
plt.ylabel('counts')
plt.title('Count of terrorist activities each year')
plt.show()

##### 1. Why did you pick the specific chart?

The countplot is a suitable choice for this task because it displays the frequency or count of observations in each category.
In this case, the categories are the years, and the countplot allows you to see the distribution and compare the number of terrorist activities across different years.

The year with the most terrorist activities: **2014**

##### 2. What is/are the insight(s) found from the chart?

**Increasing Trend:** The chart shows that the number of terrorist activities has generally increased over the years, with some fluctuations. This indicates a rising trend in global terrorism.

**Peaks and Valleys:** There are noticeable peaks and valleys in certain years, suggesting variations in the occurrence of terrorist activities. These peaks might correspond to significant events or periods of heightened terrorist incidents.
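
The trend and the peaks can also be read as numbers rather than from bar heights. The sketch below counts attacks per year with `value_counts`, finds the peak year, and computes the year-over-year change. The toy `Year` column is made up and only illustrates the calculation.

```python
import pandas as pd

# Toy data: one row per attack, values are made up.
toy = pd.DataFrame(
    {"Year": [2012, 2012, 2013, 2013, 2013, 2014, 2014, 2014, 2014, 2015]}
)

# Attacks per year in chronological order (the same counts a countplot draws).
per_year = toy["Year"].value_counts().sort_index()

# Year with the highest count, and percentage change from the previous year.
peak_year = per_year.idxmax()
yoy_change = per_year.pct_change().mul(100).round(1)

print(per_year)
print("Peak year:", peak_year)
print(yoy_change)
```

Applied to the notebook's frame, the same three lines on `df['Year']` give the peak year directly and show where the sharpest rises and drops occur.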

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Strategic Planning:** Businesses can adapt their strategies based on long-term terrorism trends, identifying safer markets or regions, and developing contingency plans to ensure business continuity.

**Partnerships and Collaborations:** Collaborating with government agencies, security organizations, and industry associations can enhance information sharing and collective efforts to combat terrorism, fostering a safer business environment.


**insights that lead to negative growth--------**

**Disruption of Operations:** Security concerns, infrastructure damage, and travel restrictions can disrupt business operations, leading to delays and impacting growth.

**Increased Costs:** Security measures, insurance premiums, and risk mitigation strategies can increase costs, affecting profit margins and hindering growth.

**Restricted Market Expansion:** Regulatory barriers and security concerns can limit market expansion efforts, limiting growth potential.

**Economic Impact:** Terrorism can result in reduced foreign investment, decreased tourism, and disrupted economic activities, negatively impacting the overall business environment and growth opportunities.



#### Chart - 2

In [None]:
# Chart - 2 visualization code

plt.figure(figsize=(15,7))
sns.countplot(data=df, x='Country', order=df['Country'].value_counts().head(15).index)
plt.xlabel('Country')
plt.xticks(rotation=90)
plt.ylabel('counts')
plt.title('Count of terrorist activities by country')
plt.show()

##### 1. Why did you pick the specific chart?

The code uses a countplot from the seaborn library to visualize the count of terrorist activities by country. This specific chart type is suitable for displaying the frequency of occurrences in categorical data, making it appropriate for showcasing the count of terrorist activities in different countries.

Maximum terrorist activities in country: **Iraq**

##### 2. What is/are the insight(s) found from the chart?

The chart provides insights into the count of terrorist activities in different countries. By analyzing the bar heights, we can identify the countries with the highest number of reported terrorist activities. The top 15 countries with the highest counts are displayed in the chart, allowing us to compare their relative frequencies.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can potentially help create a positive business impact. Understanding the countries with a high incidence of terrorist activities can inform businesses operating in those regions about potential risks and enable them to implement appropriate security measures. It can also guide investment decisions and resource allocation to mitigate risks and ensure the safety of employees, assets, and operations.


**Insights that led to negative growth------------**

A high frequency of terrorist activities in countries where a business operates or plans to expand could have a negative impact on growth. Increased security costs, disruptions to operations, and potential market restrictions can hinder business expansion and profitability. Mitigating these risks and developing effective risk management strategies is crucial to minimize negative growth consequences.

#### Chart - 3

In [None]:
# Chart - 3 visualization code

plt.figure(figsize=(15,7))
sns.countplot(data= df, x='Region')
plt.xlabel('Region')
plt.xticks(rotation=90)
plt.ylabel('counts')
plt.title('Count of terrorist activities by Region')
plt.show()

##### 1. Why did you pick the specific chart?

The code uses a countplot from the seaborn library to visualize the count of terrorist activities by Region. This specific chart type is suitable for displaying the frequency of occurrences in categorical data, making it appropriate for showcasing the count of terrorist activities in different regions.

Maximum terrorist activities in region: **Middle East and North Africa**

##### 2. What is/are the insight(s) found from the chart?

The chart provides insights into the count of terrorist activities in different regions. By analyzing the bar heights, we can identify the regions with the highest number of reported terrorist activities.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can potentially help create a positive business impact. Understanding the regions with a high incidence of terrorist activities can inform businesses operating in those regions about potential risks and enable them to implement appropriate security measures. It can also guide investment decisions and resource allocation to mitigate risks and ensure the safety of employees, assets, and operations.


**Insights that led to negative growth------------**

A high frequency of terrorist activities in regions where a business operates or plans to expand could have a negative impact on growth. Increased security costs, disruptions to operations, and potential market restrictions can hinder business expansion and profitability. Mitigating these risks and developing effective risk management strategies is crucial to minimize negative growth consequences.

#### Chart - 4

In [None]:
# Chart - 4 visualization code

plt.figure(figsize=(15,7))
# Exclude the 'Unknown' placeholder so that only named cities are ranked.
sns.countplot(data=df, x='City', order=df[df['City']!='Unknown']['City'].value_counts().head(15).index)
plt.xlabel('City')
plt.xticks(rotation=90)
plt.ylabel('counts')
plt.title('Count of terrorist activities by City')
plt.show()

##### 1. Why did you pick the specific chart?

The code uses a countplot from the seaborn library to visualize the count of terrorist activities by city. This specific chart type is suitable for displaying the frequency of occurrences in categorical data, making it appropriate for showcasing the count of terrorist activities in different cities.

Maximum terrorist activities in city: **Baghdad**

##### 2. What is/are the insight(s) found from the chart?

The chart provides insights into the count of terrorist activities in different cities. By analyzing the bar heights, we can identify the cities with the highest number of reported terrorist activities. The top 15 cities with the highest counts are displayed in the chart, allowing us to compare their relative frequencies.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can potentially help create a positive business impact. Understanding the cities with a high incidence of terrorist activities can inform businesses operating in those regions about potential risks and enable them to implement appropriate security measures. It can also guide investment decisions and resource allocation to mitigate risks and ensure the safety of employees, assets, and operations.


**Insights that led to negative growth------------**

A high frequency of terrorist activities in cities where a business operates or plans to expand could have a negative impact on growth. Increased security costs, disruptions to operations, and potential market restrictions can hinder business expansion and profitability. Mitigating these risks and developing effective risk management strategies is crucial to minimize negative growth consequences.

#### Chart - 5

In [None]:
# Chart - 5 visualization code

plt.figure(figsize=(15,7))
sns.lineplot(data=df, x='Year', y= 'Casualties', estimator= sum)
plt.xlabel('Year')
plt.xticks(rotation=90)
plt.ylabel('No of casualties')
plt.title('No of casualties by terror attack')
plt.show()

##### 1. Why did you pick the specific chart?

The lineplot is suitable for showcasing the trend or progression of a variable over a continuous axis, which in this case is the number of casualties over the years. It allows for a clear visualization of the changes and patterns in casualty counts over time.

##### 2. What is/are the insight(s) found from the chart?

The lineplot provides insights into the trends in the number of casualties caused by terrorist attacks over the years. By analyzing the line's direction and slope, we can identify periods of high or low casualties, observe any increasing or decreasing trends, and detect significant changes in the pattern of casualties.
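
The yearly totals behind the line can be computed directly with a `groupby`, which makes the high and low years easy to name. This is a minimal sketch on a toy frame with made-up numbers, using the same `Killed`, `Injured` and `Casualties` column names as the notebook.

```python
import pandas as pd

# Toy data with made-up values, one row per attack.
toy = pd.DataFrame({
    "Year": [2000, 2000, 2001, 2001, 2002],
    "Killed": [1, 0, 10, 5, 2],
    "Injured": [2, 3, 20, 0, 1],
})
toy["Casualties"] = toy["Killed"] + toy["Injured"]

# Sum casualties per year; this is the aggregate the line chart is meant to show.
yearly = toy.groupby("Year")["Casualties"].sum()

print(yearly)
print("Year with most casualties:", yearly.idxmax())
```

Plotting the precomputed series, for example with `yearly.plot()`, draws only the totals. By default `sns.lineplot` also draws an error band around the aggregated line, which is not meaningful for a yearly sum.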

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Understanding the trends and patterns in the number of casualties caused by terrorist attacks can inform businesses, governments, and security agencies about the effectiveness of counterterrorism measures, the impact of policies, and the evolving nature of terrorist threats. This knowledge can guide the development of proactive strategies and initiatives to enhance security, mitigate risks, and protect lives and assets.

**insights that lead to negative growth-------------**

Negative growth may occur if the lineplot shows a consistently increasing trend or a sudden surge in the number of casualties over the years. Such insights indicate a worsening security situation and potential challenges for businesses operating in affected areas. Increased security concerns, disruptions to operations, and a decline in consumer confidence may negatively impact business growth and investment opportunities. It becomes crucial for businesses to assess the risks and develop appropriate risk mitigation strategies to navigate these challenges effectively.

#### Chart - 6

In [None]:
# Chart - 6 visualization code

plt.figure(figsize=(15,7))
sns.countplot(data=df, x='Attack_Type')
plt.xlabel('Attack Type')
plt.xticks(rotation=90)
plt.ylabel('counts')
plt.title('Count of terrorist activities by attack type')
plt.show()

##### 1. Why did you pick the specific chart?

The countplot is suitable for displaying the frequency of categorical variables, in this case, the different attack types. It provides a clear visualization of the distribution of attack types and allows for easy comparison between the categories.

##### 2. What is/are the insight(s) found from the chart?

The countplot reveals the most common attack types used in terrorist activities. By examining the heights of the bars, we can identify the attack types that occur most frequently. This insight helps in understanding the preferred methods of terrorists and provides valuable information for counterterrorism efforts and security planning.

Most common attack type in terrorist activities: **Bombing/Explosion**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Understanding the most commonly used attack types in terrorist activities allows businesses, governments, and security agencies to develop targeted security measures and counterterrorism strategies. By aligning their security protocols and risk management strategies with the prevalent attack types, businesses can enhance their preparedness, protect their assets, and ensure the safety of their employees and customers.

**Insights that lead to negative growth-----------------**

Insights from the countplot may not directly lead to negative growth. However, certain attack types that are more frequent and potentially more destructive, such as bombings or armed assaults, can create a negative impact on businesses operating in affected areas. Heightened security concerns, infrastructure damage, and disruptions to normal operations can lead to decreased consumer confidence, reduced economic activity, and limited growth opportunities. It becomes essential for businesses to assess the risks associated with specific attack types and implement robust security measures to mitigate potential negative impacts and ensure business continuity.

#### Chart - 7

In [None]:
# Chart - 7 visualization code

plt.figure(figsize=(15,7))
sns.countplot(data=df, y='Group_Name', order=df[df['Group_Name']!='Unknown']['Group_Name'].value_counts().head(10).index)
plt.xlabel('No of Attacks')
plt.ylabel('Terrorist Group')
plt.title('Count of terrorist activities by Terrorist Group')
plt.show()

##### 1. Why did you pick the specific chart?

The countplot is appropriate for displaying the frequency of categorical variables, in this case, the terrorist groups. By using the y-axis for the categorical variable, we can easily compare the number of attacks carried out by different terrorist groups.

##### 2. What is/are the insight(s) found from the chart?

The countplot reveals the top 10 terrorist groups based on the number of attacks. By examining the lengths of the horizontal bars, we can identify the terrorist groups that have been involved in the highest number of attacks. This insight provides valuable information about the prominent terrorist organizations and their impact on global terrorism.

Most terrorist attacks done by: **Taliban**
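
The "top N, excluding Unknown" ranking is used for cities, groups and weapons, so it can be wrapped in one function. `top_categories` is a hypothetical helper name and the group names in the toy frame are placeholders.

```python
import pandas as pd


def top_categories(frame: pd.DataFrame, column: str, n: int = 10,
                   exclude: tuple = ("Unknown",)) -> pd.Series:
    """Return the n most frequent values of a column, skipping placeholder labels."""
    values = frame.loc[~frame[column].isin(exclude), column]
    return values.value_counts().head(n)


# Toy data with placeholder group names.
toy = pd.DataFrame({
    "Group_Name": ["Unknown", "Unknown", "Unknown", "Group A", "Group A", "Group B"],
})
top = top_categories(toy, "Group_Name", n=2)
print(top)
```

The index of the returned series can be passed as the `order` argument of `sns.countplot`, so the ranking and the chart stay consistent.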

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This information can be beneficial for businesses operating in regions affected by these terrorist groups. It allows businesses to assess the security risks associated with specific groups and implement appropriate security measures to safeguard their assets, employees, and customers.

**Insights that lead to negative growth--------------**

The presence of prominent terrorist groups can create an environment of insecurity and instability, leading to reduced investor confidence, decreased tourism, and disrupted economic activities. Businesses may face challenges such as operational disruptions, increased security costs, and limited market expansion opportunities. It becomes crucial for businesses to assess the risks associated with operating in such regions and implement effective risk management strategies to mitigate potential negative impacts.

#### Chart - 8

In [None]:
# Chart - 8 visualization code

plt.figure(figsize=(15,7))
sns.countplot(data=df, y='Target_Type')
plt.xlabel('count')
plt.ylabel('Target Type')
plt.title('Most common target of terror activities')
plt.show()

##### 1. Why did you pick the specific chart?

The countplot is suitable for displaying the frequency of categorical variables, in this case, the target types of terror activities. By using the y-axis for the categorical variable, we can easily compare the occurrence of different target types.

##### 2. What is/are the insight(s) found from the chart?

The countplot reveals the most common targets of terror activities. By examining the lengths of the horizontal bars, we can identify the target types that have been most frequently attacked by terrorists. This insight provides valuable information about the areas or sectors that are more susceptible to terrorist attacks.

Most common target of terror activities: **Private Citizens and Property**


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

By understanding the most common targets of terror activities, businesses operating in those sectors or locations can better assess the associated risks and take appropriate measures to enhance security and protect their assets, employees, and customers. This proactive approach can contribute to creating a safer environment and ensuring business continuity.

**Insights that lead to negative growth-----------**

 The presence of high-frequency target types indicates a higher risk of attacks, which can create an environment of insecurity and instability. This may result in decreased consumer confidence, reduced investments, and potential disruptions to business operations. It becomes crucial for businesses to assess the risks, implement effective security measures, and collaborate with relevant authorities to mitigate the potential negative impacts and maintain business growth.








#### Chart - 9

In [None]:
India_df= df[df['Country']=='India']
India_df.head()

In [None]:
# Chart - 9 visualization code

plt.figure(figsize=(15,7))
sns.countplot(data= India_df, x='Year')
plt.xlabel('Year')
plt.xticks(rotation=90)
plt.ylabel('counts')
plt.title('Count of terrorist activities each year in India')
plt.show()

In [None]:
plt.figure(figsize=(15,7))
sns.countplot(data= India_df, x='City', order=India_df['City'].value_counts().head(15).index)
plt.xlabel('Cities')
plt.xticks(rotation=90)
plt.ylabel('counts')
plt.title('Count of terrorist activities by city in India')
plt.show()

In [None]:
plt.figure(figsize=(15,7))
sns.lineplot(data=India_df, x='Year', y= 'Casualties', estimator= sum)
plt.xlabel('Year')
plt.xticks(rotation=90)
plt.ylabel('No of casualties')
plt.title('No of casualties by terror attack')
plt.show()

In [None]:
plt.figure(figsize=(15,7))
sns.countplot(data=India_df, x='Attack_Type')
plt.xlabel('Attack Type')
plt.xticks(rotation=90)
plt.ylabel('counts')
plt.title('Most common attack types in India')
plt.show()

In [None]:
plt.figure(figsize=(15,7))
sns.countplot(data=India_df, y='Group_Name', order=India_df[India_df['Group_Name']!='Unknown']['Group_Name'].value_counts().head(10).index)
plt.xlabel('No of Attacks')
plt.ylabel('Terrorist Group')
plt.title('Count of terrorist activities by Terrorist Group')
plt.show()

In [None]:
plt.figure(figsize=(15,7))
sns.countplot(data=India_df, y='Target_Type')
plt.xlabel('count')
plt.ylabel('Target Type')
plt.title('Most common target of terror activities')
plt.show()

##### 1. Why did you pick the specific chart?

These charts repeat the earlier global analysis for a single country, India. The same filter-and-plot approach can be applied to any other country in the dataset.

##### 2. What is/are the insight(s) found from the chart?

Year with the most terror activity in India: **2016**

City with the most terror activity in India: **Srinagar**

Highest yearly casualty total in India (read from the line plot): about **5000**

Most used attack type in India: **Bombing/Explosion**

Most active terrorist group in India: **Communist Party of India**

Most common target of terror activity in India: **Private Citizens and Property**
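The figures above are read off the charts by eye. As a cross-check, the same headline values can be computed directly. The sketch below is only illustrative: `india_demo` is a tiny synthetic stand-in for `India_df` (only the column names come from this notebook), and `summarize_attacks` is a hypothetical helper, not part of the original analysis.

```python
import pandas as pd

# Synthetic stand-in for India_df; only the column names come from the notebook.
india_demo = pd.DataFrame({
    'Year':        [2015, 2016, 2016, 2016, 2014],
    'City':        ['Srinagar', 'Srinagar', 'Imphal', 'Srinagar', 'Imphal'],
    'Casualties':  [3, 10, 0, 4, 7],
    'Attack_Type': ['Bombing/Explosion', 'Armed Assault', 'Bombing/Explosion',
                    'Bombing/Explosion', 'Kidnapping'],
    'Group_Name':  ['Unknown', 'Group A', 'Group A', 'Unknown', 'Group B'],
    'Target_Type': ['Police', 'Private Citizens & Property',
                    'Private Citizens & Property', 'Military',
                    'Private Citizens & Property'],
})

def summarize_attacks(frame):
    """Return the headline figures that the plots show visually (hypothetical helper)."""
    # Exclude unattributed attacks, as the group count plot does
    known_groups = frame.loc[frame['Group_Name'] != 'Unknown', 'Group_Name']
    casualties_by_year = frame.groupby('Year')['Casualties'].sum()
    return {
        'busiest_year': frame['Year'].value_counts().idxmax(),
        'busiest_city': frame['City'].value_counts().idxmax(),
        'deadliest_year': casualties_by_year.idxmax(),
        'max_yearly_casualties': casualties_by_year.max(),
        'top_attack_type': frame['Attack_Type'].value_counts().idxmax(),
        'top_group': known_groups.value_counts().idxmax(),
        'top_target': frame['Target_Type'].value_counts().idxmax(),
    }

summary = summarize_attacks(india_demo)
print(summary)
```

Calling `summarize_attacks(India_df)` on the real filtered frame would give exact values in place of estimates read from the axes, and the same helper could be reused for any other country.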

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This understanding can aid in the development of more effective counterterrorism strategies, resource allocation for security measures, and initiatives focused on mitigating the consequences of such attacks. Ultimately, these actions can contribute to creating a safer environment, enhancing public confidence, and fostering positive social and economic growth.

**Insights that lead to negative growth-----------**

Heightened security concerns, loss of lives, disruptions in business activities, and a decline in tourism and investments can adversely affect economic growth and development. Therefore, it is crucial to use these insights to implement effective counterterrorism measures and preventive actions to minimize the negative impact and promote sustainable growth.







#### Chart - 10

In [None]:
# Chart - 10 visualization code

plt.figure(figsize=(10, 10))
country_counts = df['Country'].value_counts()
top_countries = country_counts.head(10)
plt.pie(top_countries, labels=top_countries.index, autopct='%1.1f%%')
plt.title('Proportion of Terrorist Activities by Country')
plt.show()

##### 1. Why did you pick the specific chart?

The pie chart was chosen because it effectively shows the proportion or distribution of terrorist activities in different countries. It provides a visual representation of the relative sizes of each country's contribution to the total number of terrorist activities.

##### 2. What is/are the insight(s) found from the chart?

The pie chart allows us to identify the top countries with the highest proportion of terrorist activities.

It shows which countries are most affected by terrorism and provides a comparative view of their contributions.

It can highlight countries that may require special attention or resources in terms of security measures, risk assessment, and mitigation strategies.

The chart can also reveal any significant disparities in the distribution of terrorist activities among countries.
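One caveat when reading the percentages: because only the top 10 countries are passed to `plt.pie`, each slice is a share of the top-10 total, not of all recorded incidents. The sketch below illustrates the difference on synthetic labels (the country names `A` to `D` and the cutoff `top_n = 2` are made up for illustration).

```python
import pandas as pd

# Synthetic country labels for illustration only; these are not dataset figures.
countries = pd.Series(['A'] * 5 + ['B'] * 3 + ['C'] * 2 + ['D'] * 10, name='Country')

top_n = 2
counts = countries.value_counts()
top = counts.head(top_n)

share_within_top = top / top.sum() * 100   # what the pie chart's autopct displays
share_of_all = top / counts.sum() * 100    # share of every recorded incident

comparison = pd.DataFrame({
    'share_within_top_%': share_within_top.round(1),
    'share_of_all_%': share_of_all.round(1),
})
print(comparison)
```

In this toy example country `D` fills about two thirds of the pie but accounts for half of all incidents, so the two figures should not be quoted interchangeably.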

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights enable businesses to prioritize and allocate resources effectively based on the countries with the highest proportions of terrorist activities.

They also facilitate collaboration with government agencies and security organizations to address the challenges posed by terrorism in specific countries.

**Negative growth impacts due to the insights gained from the chart:**

Businesses may face limitations or increased challenges when operating in countries with a high proportion of terrorist activities. This could lead to disrupted operations, increased security costs, and restricted market expansion.

Negative perceptions or associations with countries heavily affected by terrorism can impact consumer confidence, investor sentiment, and business partnerships, potentially resulting in reduced growth opportunities.

#### Chart - 11

In [None]:
# Chart - 11 visualization code

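# Count attacks per region and attack type; combinations that never occur become 0
grouped_data = df.groupby(['Region', 'Attack_Type']).size().unstack(fill_value=0)
# Convert each region's counts to percentages of that region's total.
# (Keeping a 'Total' column in the plotted frame would stack an extra 100% on every bar.)
grouped_data_percentage = grouped_data.div(grouped_data.sum(axis=1), axis=0) * 100
# The 'seaborn' style name is deprecated in recent Matplotlib; sns.set_theme() gives a similar look
sns.set_theme()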
grouped_data_percentage.plot(kind='bar', stacked=True, figsize=(10, 6))
plt.title('Distribution of Attack Types by Region')
plt.xlabel('Region')
plt.ylabel('Percentage')
plt.legend(title='Attack Type', bbox_to_anchor=(1, 1))
plt.show()

##### 1. Why did you pick the specific chart?

A stacked bar chart is suitable for displaying the distribution of attack types by region because it allows for easy comparison of the proportions of different attack types within each region. The stacked bars show the overall composition of attack types while preserving the individual breakdown within each region. This helps identify the dominant attack types and their relative significance across regions.

##### 2. What is/are the insight(s) found from the chart?

**Regional Variations:** It enables comparison of attack type distributions across different regions, highlighting variations in the prevalence of specific attack types.

**Dominant Attack Types:** The chart can reveal the most common attack types within each region, indicating the types of threats prevalent in different areas.

**Regional Patterns:** It may uncover patterns or trends in the distribution of attack types, which could be further analyzed and investigated.
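The dominant attack type per region can also be extracted as a table instead of being read from the bar segments. The following is a minimal sketch on synthetic rows (only the column names `Region` and `Attack_Type` come from this notebook); it uses `pd.crosstab` as an alternative to the `groupby`/`unstack` approach in the chart code.

```python
import pandas as pd

# Synthetic rows for illustration; only the column names come from the notebook.
demo = pd.DataFrame({
    'Region':      ['South Asia', 'South Asia', 'South Asia',
                    'Western Europe', 'Western Europe', 'Western Europe'],
    'Attack_Type': ['Bombing/Explosion', 'Bombing/Explosion', 'Armed Assault',
                    'Assassination', 'Assassination', 'Bombing/Explosion'],
})

# normalize='index' turns each row into proportions that sum to 1
attack_share = pd.crosstab(demo['Region'], demo['Attack_Type'], normalize='index') * 100

# Column label with the largest share in each row, i.e. the dominant attack type
dominant_type = attack_share.idxmax(axis=1)

print(attack_share.round(1))
print(dominant_type)
```

Applied to the full `df`, `dominant_type` would list one attack type per region, which makes the "Dominant Attack Types" point above concrete.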

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Informing Security Measures:** Understanding the prevalent attack types in different regions can assist in developing targeted security strategies and measures to mitigate risks effectively.

**Resource Allocation:** Insights on regional patterns can aid in allocating resources, such as security personnel or infrastructure, based on the identified threat landscape.


**Insights from the stacked bar chart that may indicate negative growth:**

If highly destructive attack types dominate in certain regions, this could indicate a heightened risk environment. This could potentially result in negative growth or impact sectors such as tourism, investment, or local economies. A detailed assessment of the specific data and context would be necessary to determine negative growth implications with more certainty.

#### Chart - 12

In [None]:
# Chart - 12 visualization code

plt.figure(figsize=(12, 8))
df['Region'].value_counts().head(10).plot(kind='pie', autopct='%1.1f%%')
plt.title('Distribution of Terrorist Activities by Region')
plt.ylabel('')
plt.show()

##### 1. Why did you pick the specific chart?

I picked the pie chart because it is an effective way to represent proportions or percentages of a whole. In this case, it allows us to visually understand the distribution of terrorist activities across different regions.

##### 2. What is/are the insight(s) found from the chart?

The pie chart provides a clear visualization of the relative distribution of terrorist activities among the top 10 regions.

It helps identify the regions with the highest and lowest proportions of terrorist activities.

It allows for a quick comparison of the contribution of each region to the overall terrorist activities.

Region with the most terrorist attacks: **Middle East & North Africa**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Business operations can be tailored to address the specific risks and challenges associated with regions experiencing higher terrorist activities.

Security measures can be enhanced in regions where the threat of terrorism is more prevalent.

**Insights that may lead to negative growth:**

If the pie chart reveals a significant concentration of terrorist activities in specific regions, it may raise concerns among potential investors, customers, and partners. This can result in negative perceptions and reluctance to engage in business activities in those regions.


#### Chart - 13

In [None]:
# Chart - 13 visualization code

plt.figure(figsize=(12, 8))
df['Attack_Type'].value_counts().head(7).plot(kind='pie', autopct='%1.1f%%')
plt.title('Top 7 Terrorist Attack Types')
plt.ylabel('')
plt.show()

##### 1. Why did you pick the specific chart?

I picked a pie chart to represent the distribution of terrorist attack types. A pie chart is suitable for showing the proportion or percentage of each attack type in relation to the total.


##### 2. What is/are the insight(s) found from the chart?

From the chart, we can observe the distribution of the top 7 terrorist attack types and the relative frequency of each. This information helps in understanding which attack types dominate the overall landscape of terrorist activities.

Most used attack type: **Bombing/Explosion**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can potentially create a positive business impact by informing decision-making related to security measures, risk assessments, and resource allocation. Businesses can tailor their security strategies, develop contingency plans, and allocate resources effectively based on the identified attack types.


**Some insights that lead to negative growth:**

For example, if a particular attack type has a significantly higher occurrence rate, it may indicate a higher level of risk or instability in certain regions or industries. This can lead to negative growth if businesses operating in those areas or industries are perceived as high-risk and face challenges such as reduced investments, decreased consumer confidence, and disruptions to operations.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

plt.figure(figsize=(10,10))
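# Correlation is only defined for numeric columns, so drop text columns first
sns.heatmap(data=df.select_dtypes(include='number').corr(), annot=True, cmap='coolwarm')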
plt.xlabel('Features')
plt.ylabel('Features')
plt.title('Correlation between different features')
plt.show()

##### 1. Why did you pick the specific chart?

The selected heatmap chart is suitable for visualizing the correlation matrix as it allows us to identify patterns and trends in the data. By examining the color-coded correlation coefficients, we can gain insights into the strength and direction of relationships between variables.

##### 2. What is/are the insight(s) found from the chart?

Insights from the correlation matrix heatmap can be used to understand how different variables are related to each other. For example, positive correlations between 'Killed' and 'Casualties' indicate that higher numbers of killed individuals are associated with higher casualty counts. Similarly, positive correlations between 'Injured' and 'Casualties' suggest that higher numbers of injured individuals are also associated with higher casualty counts.
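One point worth keeping in mind when interpreting these coefficients: if `Casualties` was derived as `Killed + Injured` (an assumption here, since that step is not shown in this section), its positive correlation with both components follows largely from how it was built rather than being an independent finding. A small sketch with synthetic values:

```python
import pandas as pd

# Synthetic values; 'Casualties' is ASSUMED here to be Killed + Injured.
demo = pd.DataFrame({
    'Killed':  [0, 1, 5, 2, 10],
    'Injured': [3, 0, 12, 1, 4],
    'Country': ['A', 'B', 'A', 'C', 'B'],   # non-numeric, excluded below
})
demo['Casualties'] = demo['Killed'] + demo['Injured']

# Keep numeric columns only, then compute pairwise Pearson correlations
corr = demo.select_dtypes(include='number').corr()
print(corr.round(2))
```

The more informative cell in such a matrix is usually the one between `Killed` and `Injured`, since neither is computed from the other.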

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

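# Five rows are too few to show any pattern, so plot a random sample instead
sample_df = df.sample(n=min(len(df), 2000), random_state=42)
sns.pairplot(data=sample_df)
# Axis labels are set per panel by pairplot; add one title for the whole grid
plt.suptitle('Pairwise relationships between features', y=1.02)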
plt.show()

##### 1. Why did you pick the specific chart?

I picked the pair plot because it allows us to visualize the relationships and distributions between multiple variables in the dataset. By plotting each variable against every other variable, we can gain insights into the correlations, patterns, and distributions among the variables.

##### 2. What is/are the insight(s) found from the chart?

Correlations: We can identify the strength and direction of correlations between different variables. For example, we can see if there is a positive or negative correlation between the number of casualties and the number of people killed in terrorist attacks.

Distributions: We can examine the distribution of each variable and identify any patterns or outliers. This helps us understand the range and spread of values for each variable.

Variable relationships: We can observe how different variables interact with each other. For example, we can see if there is a relationship between the type of attack and the success of the attack.

Outliers: We can identify any outliers or unusual observations in the dataset that may require further investigation.
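Outliers spotted visually in a pair plot can be confirmed numerically. The notebook does not specify an outlier rule; the sketch below uses the common 1.5 × IQR convention as one possible choice, on made-up casualty counts.

```python
import pandas as pd

# Made-up casualty counts for illustration only.
casualties = pd.Series([0, 1, 1, 2, 2, 3, 3, 4, 250], name='Casualties')

q1, q3 = casualties.quantile([0.25, 0.75])
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr   # values above this are flagged as outliers

outliers = casualties[casualties > upper_fence]
print(f"upper fence = {upper_fence}, outliers = {outliers.tolist()}")
```

For heavily skewed count data such as casualties, a rule like this may flag many legitimate mass-casualty events, so flagged rows are candidates for inspection rather than for removal.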

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Based on the exploratory data analysis conducted on the Global Terrorism Dataset, there are several recommendations that could be provided to a client interested in using this information to decrease the impact of terrorism and thereby meet the stated business objective.

**Focus on Hotspot Regions:** The regions with the highest frequencies of terrorist activities should be prioritized for intervention efforts. These regions may need more robust security measures, targeted socio-economic programs to address root causes of terrorism, or more substantial international assistance.

**Understand Yearly Trends:** Keeping track of the rise or fall of terrorist incidents over the years could help forecast potential future threats and adjust counterterrorism strategies accordingly.

**Prioritize Major Threat Groups:** Our analysis shows that certain terrorist groups are more active than others. Intelligence efforts should be concentrated on these high impact groups to prevent future attacks.

**Target Most Common Attack Types:** Understanding the most common types of attacks used by terrorists can help in developing preventive measures and response strategies. For instance, if bombings are the most common attack type, more resources could be directed towards bomb detection and disposal.
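To make the yearly-trend recommendation more concrete, incident counts per year can be turned into a year-over-year change and a smoothed trend line. This is a minimal sketch on synthetic years (only the column name `Year` comes from this notebook); the 3-year window is an arbitrary choice for illustration.

```python
import pandas as pd

# Synthetic incident years for illustration only.
years = pd.Series([2012] * 2 + [2013] * 4 + [2014] * 6 + [2015] * 3, name='Year')

attacks_per_year = years.value_counts().sort_index()      # incidents per year, in time order
yoy_change = attacks_per_year.pct_change() * 100          # percent change vs. previous year
rolling_mean = attacks_per_year.rolling(window=3).mean()  # 3-year moving average

trend = pd.DataFrame({
    'attacks': attacks_per_year,
    'yoy_change_%': yoy_change.round(1),
    'rolling_mean_3y': rolling_mean.round(2),
})
print(trend)
```

On the real data, the same steps would start from `df['Year']`. A moving average smooths single-year spikes, which may make sustained rises or declines easier to see than raw counts.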

# **Conclusion**

The Exploratory Data Analysis (EDA) conducted on the Global Terrorism Dataset provided significant insights into trends and patterns in global terrorism from 1970 through 2017. With the help of Python libraries such as Pandas, Matplotlib, Seaborn and NumPy, we were able to handle, visualize and interpret the data on terrorist activities.

Through this analysis, we identified trends over time, regional hotspots, dominant terrorist groups and preferred modes of attack. All these findings are crucial for devising effective counterterrorism strategies and interventions.

The process underscored the power of data-driven decision-making. By using EDA, we were able to transform raw data into meaningful insights. For instance, understanding that certain regions are more prone to terrorist attacks or that specific terrorist groups are more active allows security agencies and policymakers to allocate resources more efficiently, thereby potentially saving lives and property.

However, while this data analysis provides a robust foundation, it's important to acknowledge that addressing terrorism requires more than just understanding past data. It necessitates a comprehensive approach that includes current intelligence, geopolitical considerations, and on-the-ground realities.

To conclude, this project demonstrates the potential of data analysis in informing and shaping counterterrorism efforts. It provides a useful starting point for further study and action, emphasizing the importance of continuous data collection, analysis, and interpretation in tackling global security challenges like terrorism.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***