<a href="https://colab.research.google.com/github/kajalwasnik/Global-Terrorism/blob/main/Global_Terrorism.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Global Terrorism Dataset







##### **Project Type**    - EDA
**Name** - Kajal Wasnik
##### **Contribution**    - Individual



# **Project Summary -**

The aim of this project is to perform an exploratory data analysis (EDA) on the Global Terrorism Database (GTD), an open-source dataset containing comprehensive information on terrorist attacks worldwide from 1970 through 2017. Developed and maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism (START) at the University of Maryland, the database records over 180,000 terrorist incidents, both domestic and international. The primary objective of the project is to delve into this extensive dataset, identify noteworthy trends, patterns, and insights related to terrorism, and visually present these discoveries for a more comprehensive understanding.

A crucial component of this initiative involves the extensive utilization of Python libraries tailored for data analysis and visualization. The cornerstone for data manipulation tasks, such as loading the dataset, cleaning data, and executing sophisticated aggregation operations, is the Pandas library. This powerful and high-performance tool offers efficient data structures, simplifying the handling of large datasets.

To facilitate advanced numerical operations and enhance computation speed, the project employs the NumPy library. Renowned for its proficiency in managing multi-dimensional arrays and matrices, NumPy proves to be an ideal companion for various data processing tasks.

The project goes beyond numerical data analysis by bringing extracted insights to life through vivid and informative visualizations, thanks to the Matplotlib and Seaborn libraries. These libraries offer a diverse range of visualization styles, enabling the display of data in visually appealing and informative ways. From bar plots and scatter plots to histograms and heatmaps, the project aims to utilize a minimum of five different visualization types to uncover relationships between variables and provide a graphical representation of the dataset's characteristics.

Exploring the GTD through this project intends to provide a nuanced understanding of terrorism patterns over the decades. The goal is to reveal potential trends in attack frequency, the most targeted countries, preferred methods of attack, types of weapons used, casualties, and the evolution of terrorist organizations, among other relevant dimensions.

By examining these factors, the project aims to offer a detailed overview of global terrorism trends, informing counter-terrorism strategies and policies. Additionally, the findings may contribute to understanding the characteristics of regions prone to attacks and the underlying reasons for their vulnerability.

In conclusion, this project represents a data-driven exploration into the realm of terrorism, aiming to illuminate complex patterns within the vastness of the GTD. The final outcome will be a collection of valuable insights with the potential to significantly contribute to ongoing counter-terrorism efforts and guide future research in this field. The combination of data manipulation, numerical computation, and graphic visualization is expected to result in a robust and comprehensive exploration of the dataset, leading to substantial key findings related to global terrorism.

Provide your GitHub Link here.

GitHub - https://github.com/kajalwasnik/Global-Derrorism-Dataset

# **Problem Statement**


Employing exploratory data analysis (EDA) methods on the Global Terrorism Database (GTD), pinpoint the global hotspots of terrorism and analyze the changing patterns of terrorist activities. What valuable insights concerning security issues can be extracted from this analysis, and how might these findings play a crucial role in shaping effective counter-terrorism strategies?

#### **Define Your Business Objective?**

The primary business goal of this project is to harness the data available in the Global Terrorism Database (GTD) to extract practical insights into global terrorist activities spanning from 1970 to 2017. Through a comprehensive exploratory data analysis (EDA), the objective is to uncover essential patterns, trends, and correlations related to global terrorism, thereby facilitating more informed decision-making for security analysts, policymakers, and counter-terrorism agencies.

Specifically, the project aims to achieve the following:

Identification of global "hot zones" for terrorist activities: By pinpointing the regions most affected, the project seeks to enhance understanding, allowing for optimal resource allocation to prevent future attacks.

Analysis of the frequency and intensity of attacks: Examining the evolution of these factors over time provides insights into the changing dynamics of terrorism, enabling more accurate risk assessments.

Examination of methodologies and weapons used in attacks: Shedding light on the operational preferences of terrorist organizations can offer early indicators of potential future threats.

Assessment of casualty trends: Identifying the most devastating types of attacks helps in targeted response planning to minimize human loss.

Unveiling patterns related to terrorist organizations: This aspect of the project has the potential to aid in understanding the strategies employed by terrorist organizations, thereby supporting intelligence agencies in their counter-terrorism efforts.






# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries


In [None]:
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import numpy as np
import matplotlib.pyplot as plt # visualizing data
import seaborn as sns
import plotly.express as px
%matplotlib inline


### Dataset Loading

In [None]:
# Load Dataset


In [None]:
# Connect to Google Drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
df = pd.read_csv('/content/drive/MyDrive/globalterrorismdb_0718dist (1).csv',encoding='latin')



In [None]:
df

### Dataset First View

In [None]:
# Dataset First Look
df.head()

In [None]:
df.tail()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count


In [None]:
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset duplicate value count
duplicate_rows = df.duplicated().sum()
print(f'There are {duplicate_rows} duplicate rows in the dataset')


#### Missing Values/Null Values

In [None]:
# missing value/null values count
missing_values = df.isnull().sum()
print(missing_values)

In [None]:
# visualizing the missing values
import missingno as msno

# visualise the missing values as a matrix
msno.matrix(df)

### What did you know about your dataset?

**Dataset Size:** The dataset is quite large, containing 181,691 entries or rows.

**Feature Quantity:** The dataset contains 135 features or columns.

**Data Types:** The dataset has a mix of data types. There are 55 features with floating point numbers (float64), 22 features with integers (int64), and 58 features with objects (object). The object datatype in pandas typically means the column contains string (text) data.

**Memory Usage:** The dataset uses over 187.1 MB of memory.

**Missing Values:** There are some columns with a large number of missing values. For example, the 'approxdate' column has 172,452 missing values and the 'related' column has 156,653 missing values. However, several columns do not have any missing values, such as 'eventid', 'iyear', 'imonth', 'iday', 'INT_LOG', 'INT_IDEO', 'INT_MISC', and 'INT_ANY'.
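
The raw counts printed by `df.isnull().sum()` are easier to compare as percentages of the row count. The following is a minimal sketch on a toy frame; the values are made up, and `missing_summary` is a hypothetical helper name, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the GTD frame; all values are made up for illustration.
toy = pd.DataFrame({
    "eventid": [1, 2, 3, 4],
    "approxdate": [np.nan, np.nan, np.nan, "1970-01-02"],
    "related": [np.nan, "x", np.nan, "y"],
    "iyear": [1970, 1970, 1971, 1971],
})


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, worst first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    # Keep only the columns that actually contain gaps.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


summary = missing_summary(toy)
print(summary)
```

On the real data, the same helper could be called as `missing_summary(df)` to rank columns such as 'approxdate' and 'related' by how incomplete they are.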

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns


In [None]:
# columns present in the CSV file
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

eventid: Unique ID for each event or terrorist attack.

iyear: Year the terrorist attack occurred.

imonth: Month the terrorist attack occurred.

iday: Day the terrorist attack occurred.

country_txt: Name of the country where the terrorist attack occurred.

region_txt: Name of the region where the terrorist attack occurred.

city: City where the terrorist attack occurred.

attacktype1_txt: The general method of attack employed.

target1: The specific person, building, installation, etc., that was targeted.

nkill: Number of confirmed fatalities for the incident.

nwound: Number of confirmed non-fatal injuries.

gname: Name of the group that carried out the attack.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.


In [None]:
unique_countries = df['country_txt'].unique()
print(unique_countries)

print()  # this will leave gap

unique_year = df['iyear'].unique()
print(unique_year)

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# write your code to make your dataset analysis ready.
print(df.isnull().sum())

In [None]:
pd.set_option('display.max_rows',None)
print(df.dtypes)

In [None]:
pd.reset_option('display.max_rows')


In [None]:
# Rename the columns used in this analysis; 'region' is lowercase to match the column selection and Chart 2
df.rename(columns={'iyear':'year','imonth':'month','iday':'day','country_txt':'country','region_txt':'region','provstate':'state','attacktype1_txt':'attack_type','targtype1_txt':'target_type','gname':'Group','weaptype1_txt':'weapon_type','nkill':'killed','nwound':'wounded'},inplace = True)


In [None]:
data=df[['year','month','day','country','state','region','city','latitude','longitude','attack_type','killed','motive']]



In [None]:
df.head()

### What all manipulations have you done and insights you found?

The dataset has 135 columns, most of which are not needed for this analysis, so studying all of them would not be productive. The columns of interest were renamed for readability, and only the necessary columns were then extracted into a smaller DataFrame.
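
The rename-then-select step can be sketched on a small frame as follows. This is an illustrative sketch only: the values are made up, `extra_col` is a hypothetical unused column, and `RENAME_MAP` and `KEEP` are names introduced here rather than taken from the notebook.

```python
import pandas as pd

# Toy frame with a few original GTD-style column names; values are made up.
raw = pd.DataFrame({
    "iyear": [1970, 1971],
    "country_txt": ["Iraq", "Peru"],
    "region_txt": ["Middle East & North Africa", "South America"],
    "nkill": [1.0, 0.0],
    "extra_col": ["a", "b"],  # hypothetical column that the analysis does not need
})

RENAME_MAP = {
    "iyear": "year",
    "country_txt": "country",
    "region_txt": "region",
    "nkill": "killed",
}
KEEP = ["year", "country", "region", "killed"]

# rename() returns a new frame; indexing with a list keeps only those columns.
# copy() makes the result independent of the original frame.
tidy = raw.rename(columns=RENAME_MAP)[KEEP].copy()
print(tidy.columns.tolist())
```

Keeping the mapping and the column list in one place makes it harder for a renamed column and a later selection to drift apart in spelling or case.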

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# chart - 1 visualization code
plt.figure(figsize=(15,6))
plt.xticks(rotation=90)
plt.title("attacks per year")
sns.countplot(x='year',data=df);

##### 1. Why did you pick the specific chart?

A count plot (a bar chart of the number of incidents per year) was chosen because it shows clearly how the number of attacks changes over time.

##### 2. What is/are the insight(s) found from the chart?

The insight that can be gained is the trend of terrorist activities over the years. We can see if the frequency of attacks is increasing, decreasing, or remaining relatively stable.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

These insights are crucial for predicting future trends, which could help law enforcement and security agencies plan resources and strategies. However, if the trend shows an increase in terrorist activities, this could lead to a negative impact as it indicates a growing problem.
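
To state the trend in numbers rather than reading it off the bars, the yearly counts can be tabulated directly. The sketch below uses a made-up `year` column, so the figures are illustrative and not GTD results.

```python
import pandas as pd

# Toy incident list: one row per attack, values made up for illustration.
toy = pd.DataFrame({"year": [1970, 1970, 1971, 1972, 1972, 1972]})

# Count incidents per year and put the years in chronological order.
attacks_per_year = toy["year"].value_counts().sort_index()

# Year-over-year change; the first year has no predecessor, so it is NaN.
change = attacks_per_year.diff()

# Year with the highest number of incidents.
peak_year = attacks_per_year.idxmax()

print(attacks_per_year)
print(change)
print("Peak year:", peak_year)
```

With the notebook's renamed frame, replacing `toy` with `df` would give the same table for the full period.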

#### Chart - 2

In [None]:
# Chart - 2 visualization code


In [None]:
plt.figure(figsize=(15,7))
sns.countplot(data=df,x='region')
plt.title ('count of terrorist activities by region')
plt.xticks(rotation=90)
plt.show()

##### 1. Why did you pick the specific chart?

A bar plot is suitable for categorical data and helps in comparing the number of terrorist activities in each region.

##### 2. What is/are the insight(s) found from the chart?

We can see which regions experience the most terrorist activities, providing insight into geographical hotspots of terrorism.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This information is useful for focusing resources and counter-terrorism efforts on the most affected areas. A high frequency of attacks in a particular region could discourage investment and tourism, leading to negative growth.

#### Chart - 3

In [None]:
# Chart - 3 visualization code


In [None]:
plt.figure(figsize=(15,7))
sns.lineplot(data=df,x='year',y='killed',estimator='sum')
plt.title('Number of people killed by terror Attacks')
plt.xticks(rotation=90)
plt.xlabel('year')
plt.ylabel('Number of people killed')
plt.show()

##### 1. Why did you pick the specific chart?

A line plot was chosen to observe the trend of casualties over time.

##### 2. What is/are the insight(s) found from the chart?

The insight is the severity of terrorist activities over the years in terms of human lives lost.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This could influence policy making, disaster management planning, insurance, and healthcare provisions. An increasing trend could lead to negative growth by discouraging population stability, investment, and development.
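
The yearly totals behind the line plot can also be computed explicitly, which makes it visible how many incidents have no recorded fatality count. This is a small sketch on made-up values; note that `sum()` skips missing values, so years with many unknown counts are understated.

```python
import numpy as np
import pandas as pd

# Toy data: one row per incident, with one unknown fatality count.
toy = pd.DataFrame({
    "year": [1970, 1970, 1971, 1971],
    "killed": [2.0, np.nan, 5.0, 1.0],
})

# Total confirmed fatalities per year (NaN values are skipped by sum()).
killed_per_year = toy.groupby("year")["killed"].sum()

# Number of incidents per year whose fatality count is missing.
unknown_per_year = toy["killed"].isnull().groupby(toy["year"]).sum()

print(killed_per_year)
print(unknown_per_year)
```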

#### Chart - 4

In [None]:
# Chart - 4 visualization code


In [None]:
plt.figure(figsize=(15,7))
sns.countplot(data=df,x='attack_type')
plt.title('attack_type')
plt.xticks(rotation=90)
plt.show()

##### 1. Why did you pick the specific chart?

 A bar plot is used to compare the frequencies of different categories - in this case, attack types.

##### 2. What is/are the insight(s) found from the chart?

We can learn about the most commonly used methods in terrorist attacks.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

These insights can help in developing and implementing measures to prevent and respond to these specific types of attacks. If certain types of attacks are prevalent, it may signify a failure to adequately address those threats, possibly leading to negative impacts.
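
Raw counts show which attack types are most common; percentage shares show by how much. A minimal sketch with made-up rows (the category labels are only examples):

```python
import pandas as pd

# Toy data: three incidents of one type and one of another.
toy = pd.DataFrame({
    "attack_type": ["Bombing/Explosion"] * 3 + ["Armed Assault"],
})

# normalize=True returns fractions instead of counts; scale them to percent.
share = toy["attack_type"].value_counts(normalize=True).mul(100).round(1)

print(share)
```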

#### Chart - 5

In [None]:
# Chart - 5 visualization code
city_counts = {
    'Baghdad': 7589,
    'Karachi': 2652,
    'Lima': 2359,
    'Mosul': 2265,
    'Belfast': 2171,
    'Santiago': 1621,
    'Mogadishu': 1581,
    'San Salvador': 1558,
    'Istanbul': 1048
}

cities = list(city_counts.keys())
attack_counts = list(city_counts.values())

plt.figure(figsize=(8, 8))
plt.pie(attack_counts, labels=cities, autopct='%1.1f%%', startangle=140, colors=plt.cm.Paired.colors)
plt.title('Distribution of Terrorist Attacks in Top Cities')

plt.axis('equal')  # Equal aspect ratio ensures the pie chart is circular.

plt.show()


##### 1. Why did you pick the specific chart?

The chosen pie chart is apt for representing the distribution of terrorist attacks in the top cities (excluding "Unknown"). It effectively shows the proportional contribution of each city to the total incidents. This visualization simplifies comparison and highlights the most significant cities in terms of attack occurrences.

##### 2. What is/are the insight(s) found from the chart?

The pie chart reveals the varying degrees of terrorist attack occurrences across the top cities, excluding "Unknown." Baghdad and Karachi stand out as the most affected cities, while others exhibit relatively lower incident frequencies. This visualization underscores the concentrated nature of terrorism incidents within a few specific urban areas.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights can help businesses enhance security measures in high-risk cities, safeguarding assets and personnel. However, potential negative growth might result if businesses neglect security considerations, risking disruptions, financial losses, and reputational harm in vulnerable locations.
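
The city counts in the chart code are typed in by hand. One way to derive them from the data instead is sketched below, assuming that unidentified cities are labelled "Unknown" as the chart description implies. The toy values are made up, and `top_cities` is a hypothetical helper name.

```python
import pandas as pd

# Toy data: one row per incident, values made up for illustration.
toy = pd.DataFrame({
    "city": ["Baghdad", "Unknown", "Baghdad", "Lima", "Unknown",
             "Unknown", "Karachi", "Baghdad", "Lima"],
})


def top_cities(frame: pd.DataFrame, n: int = 3) -> pd.Series:
    """Return incident counts for the n most frequent known cities."""
    known = frame.loc[frame["city"] != "Unknown", "city"]
    return known.value_counts().head(n)


counts = top_cities(toy, n=2)
print(counts)
```

The resulting Series can be passed to `plt.pie` in the same way as the top-groups chart does, using `counts` for the values and `counts.index` for the labels.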

#### Chart - 6

In [None]:
# Chart - 6 visualization code
top_groups = df[df['Group'] != 'Unknown']['Group'].value_counts().head(5)         #data visualisation code
plt.figure(figsize=(8, 8))
plt.pie(top_groups, labels=top_groups.index, autopct='%1.1f%%', startangle=140, colors=plt.cm.Paired.colors)
plt.title('Top 5 Terrorist Groups (Excluding "Unknown")')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a pie chart to display the distribution of the top 5 terrorist groups. The pie chart effectively showcases the relative proportions of these groups, aiding quick visual comparison and understanding of their significance in the dataset.

##### 2. What is/are the insight(s) found from the chart?

The chart presents insights into the most active terrorist groups. The "Taliban" stands out as the highest, followed by "Islamic State of Iraq and the Levant (ISIL)," "Shining Path (SL)," "Farabundo Marti National Liberation Front (FMLN)," and "Al-Shabaab." This information helps focus counter-terrorism efforts on these groups to enhance security measures.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.



The gained insights can positively impact businesses by enabling them to assess security risks in regions affected by these active terrorist groups. Businesses can take preventive measures, potentially avoiding disruptions and ensuring the safety of their operations and personnel. However, if the presence of these groups indicates increased instability, it could lead to negative growth due to reduced investments and market uncertainties.

#### Chart - 7 - Correlation Heatmap

In [None]:


# Correlation matrix of the numeric date-related columns discussed below
corr_columns = ['eventid', 'year', 'month', 'day']
corr_matrix = df[corr_columns].corr()

# Create a heatmap of the pairwise correlations
plt.figure(figsize=(6, 6))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5, linecolor='white')
plt.title('Correlation Heatmap')
plt.show()


##### 1. Why did you pick the specific chart?

I chose a correlation heatmap because it visually presents the relationships between numerical variables. This aids in identifying patterns, strengths, and directions of correlations within the dataset. The heatmap's color-coding makes positive, negative, and negligible correlations quick to interpret, which supports data exploration.

##### 2. What is/are the insight(s) found from the chart?

The correlation matrix shows a correlation close to 1 between the event ID and the year. This is expected, because the event ID is built from the incident date and therefore grows with the year; it does not indicate an independent relationship. The remaining correlations are negligible, indicating little or no linear relationship between month, day, and the other variables.
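
The near-perfect correlation between the event ID and the year follows from how the ID is constructed. The sketch below assumes the 12-digit layout described in the GTD codebook (date as yyyymmdd followed by a 4-digit sequence number); the IDs themselves are made up.

```python
import pandas as pd

# Made-up IDs in the assumed layout: yyyymmdd followed by a 4-digit sequence.
event_ids = pd.Series([197001010001, 199512250003, 201712310010])
years = pd.Series([1970, 1995, 2017])

# Integer division by 10**8 drops the last eight digits (mmdd + sequence),
# leaving only the leading four-digit year.
year_from_id = event_ids // 10**8

print(year_from_id.tolist())
print(event_ids.corr(years))
```

Because the year occupies the leading digits, the ID is almost a rescaled copy of the year, so the correlation carries no analytical meaning and the ID can be left out of correlation analyses.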

#### Chart - 8 - Pair Plot

In [None]:
#'year','month','day','country','state','region','city','latitude','longitude','attack_type','killed','motive'
numerical_columns = ['year','month','day']
#create  a pair plot
sns.pairplot(df[numerical_columns])
plt.show()

##### 1. Why did you pick the specific chart?


I chose a pair plot because it provides a holistic view of pairwise relationships among numerical variables in a dataset. It helps in identifying trends, patterns, and potential correlations, giving insight into how variables interact. This makes it a useful tool for exploratory data analysis.

##### 2. What is/are the insight(s) found from the chart?

The plot showcases numerical variables, and potential insights may involve the identification of correlations, trends, or potential outliers. These visual patterns can provide valuable information regarding the data's behavior and relationships between variables.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Focused Security Measures: Implement heightened security protocols in areas with a high occurrence of terrorist incidents, such as the Middle East & North Africa and South Asia, while maintaining vigilance in regions with lower incident rates.

Strategic Resource Deployment: Allocate resources and strategies based on prevalent terrorist tactics, with a specific emphasis on countering bombings/explosions, armed assaults, and assassinations.

Seasonal Preparedness: Acknowledge and prepare for potential seasonal trends in terrorist activity, ensuring increased vigilance during months like May, July, and August, while addressing the underlying reasons for these patterns.

Cyclical Analysis: Investigate cyclic patterns observed in the temporal distribution of attacks, identify factors driving these cycles, and use insights to enhance counterterrorism strategies.

Global Collaboration: Collaborate with international organizations and governments to address the concentrated nature of terrorist incidents in specific cities and regions, encouraging information sharing and coordinated efforts.

Localized Measures: Tailor security strategies to address the specific target types prevalent in each region, adapting approaches to protect private citizens, military personnel, and police forces.

Targeted Counter-Radicalization: Use insights from the most active terrorist groups to design targeted counter-radicalization initiatives aimed at discouraging individuals from joining organizations like the Taliban, ISIL, and Al-Shabaab.

Success-Failure Analysis: Investigate reasons behind the relatively higher success rate of terrorist attacks and focus on strategies that can disrupt the planning and execution of attacks, leading to a higher percentage of failures.

Long-Term Patterns: Analyze the extended timeframe of terrorism trends to identify historical turning points and correlate them with geopolitical events, enabling a deeper understanding of the impact of global dynamics on terrorism.

Risk Management for Businesses: Provide businesses operating in regions with higher terrorism risks with data-driven risk management strategies, facilitating informed decisions for their operations.

Public Awareness Campaigns: Utilize findings to create impactful public awareness campaigns that educate the general population about terrorism's complexities and the need for ongoing efforts to combat it.

Policy Recommendations: Collaborate with policymakers to develop evidence-based counterterrorism policies, utilizing insights to address vulnerabilities in high-impact regions and refine existing strategies.

Continuous Evaluation: Establish a framework for evaluating the effectiveness of counterterrorism efforts, ensuring a continuous feedback loop to refine strategies and adapt to evolving threat landscapes.

International Research Collaboration: Foster collaboration with the global research community by sharing insights and analysis from the GTD, contributing to a comprehensive understanding of terrorism's multifaceted nature.

Holistic Approach: Recognize the interconnectedness of terrorism with global events, socio-political dynamics, and regional challenges, advocating for a holistic approach that considers all these factors in counterterrorism strategies.
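
The seasonal-preparedness recommendation depends on how incidents are distributed across months, which can be checked with a short tabulation. The sketch below uses made-up values and assumes that a month value of 0 marks an unknown month, so those rows are excluded first.

```python
import pandas as pd

# Toy data: month of each incident; 0 is assumed to mean "month unknown".
toy = pd.DataFrame({"month": [0, 1, 1, 5, 5, 5, 7, 12]})

# Keep only valid calendar months.
known = toy.loc[toy["month"].between(1, 12), "month"]

# Percentage of incidents per calendar month; months without incidents show 0.
monthly_share = (
    known.value_counts(normalize=True)
    .reindex(range(1, 13), fill_value=0.0)
    .mul(100)
    .round(1)
)

print(monthly_share)
print("Busiest month:", monthly_share.idxmax())
```

If the shares on the real data turn out to be close to uniform, the seasonal recommendation would need to be weakened accordingly.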

# **Conclusion**

In summary, the analysis of the Global Terrorism Database (GTD) provides profound insights into the complex realm of terrorism, uncovering patterns, trends, and correlations with broad implications for security, policy formulation, and global cooperation. The thorough examination of the data yields actionable recommendations to meet the specified business objectives. The analysis underscores the urgency of targeted security measures in regions experiencing high incident frequencies, emphasizing the necessity for flexible strategies aligned with prevailing terrorist tactics. Identifying seasonal and cyclical patterns becomes a valuable tool for anticipating and preparing for potential increases in terrorist activity, while localized insights guide efforts to protect vulnerable target types.

Moreover, the revelation of key terrorist groups and their activities forms the basis for tailored counter-radicalization initiatives, contributing to the broader goal of preventing recruitment and radicalization. The success-failure analysis emphasizes the importance of disrupting attack planning, further motivating resource allocation and collaboration to thwart future incidents. The analysis underscores the significance of international cooperation in counterterrorism. By sharing knowledge, insights, and strategies, countries can collectively address the concentrated nature of incidents in specific cities and regions, fostering a global network committed to combating terrorism's evolving challenges.

Reflecting on the insights derived from the GTD analysis, it is evident that addressing terrorism requires a multi-faceted approach that considers historical, seasonal, and cyclical dynamics. Embracing data-driven decision-making, fostering collaboration, and prioritizing the safety of citizens and organizations position the pursuit of outlined business objectives to significantly contribute to a safer and more secure global landscape. The complexities revealed by this analysis reinforce the need for continuous evaluation, adaptable strategies, and a commitment to countering terrorism through holistic and informed efforts.







### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***