<a href="https://colab.research.google.com/github/luckysaxena94/Global-terrorism-data/blob/main/EDA_submission_of_Global_terrorist_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -Global Terrorism





##### **Project Type**    - EDA/Regression/Classification/Unsupervised
##### **Contribution**    - Team
##### **Team Member 1 -** Lucky Saxena
##### **Team Member 2 -** Aditya Raj
##### **Team Member 3 -** Abhisek swain
##### **Team Member 4 -** Swarup sunil mane
##### **Team Member 5 -** Talari venkatesh

# **Project Summary -**

The "Exploring and Analyzing Global Terrorism Trends" project examines the Global Terrorism Database (GTD), which covers the period from 1970 to 2017. This open-source database, maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism (START) at the University of Maryland, contains records of more than 180,000 terrorist incidents worldwide. The project's primary objective is to extract insights, patterns, and trends from this dataset to improve our understanding of global terrorism and to support better counter-terrorism strategies.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**



Terrorism is a global threat that demands rigorous analysis and understanding to develop effective counter-terrorism strategies. The Global Terrorism Database (GTD), spanning from 1970 to 2017, is a vast and meticulously curated resource containing records of over 180,000 terrorist incidents worldwide. However, this wealth of data remains underutilized, and there exists a pressing need to extract actionable insights from it.


#### **Define Your Business Objective?**

The business objective is to leverage the GTD's data to extract actionable insights that support efforts to counter terrorism comprehensively, efficiently, and effectively, ultimately contributing to improved security and safety on a global scale.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]
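The exception-handling guideline above could be met for the data-loading step with something like the sketch below. The helper name `load_csv_safely` and the file name in the usage line are hypothetical and not part of the original notebook; the encoding default simply mirrors the one used later when the dataset is loaded.

```python
from typing import Optional

import pandas as pd


def load_csv_safely(path: str, encoding: str = "ISO-8859-1") -> Optional[pd.DataFrame]:
    """Read a CSV file and return None instead of raising if it cannot be loaded.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    try:
        return pd.read_csv(path, encoding=encoding)
    except FileNotFoundError:
        print(f"File not found: {path}")
    except pd.errors.EmptyDataError:
        print(f"File is empty: {path}")
    except pd.errors.ParserError as exc:
        print(f"Could not parse {path}: {exc}")
    return None


# Usage with a path that does not exist: prints a message and returns None
frame = load_csv_safely("missing_file_for_demo.csv")
```

Returning `None` keeps the notebook running in one go; a stricter design would re-raise after logging so that later cells do not silently operate on missing data.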





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import plotly.express as px

### Dataset Loading

In [None]:
# Load Dataset
#mounting drive
from google.colab import drive
drive.mount('/content/drive')
filepath="/content/drive/MyDrive/Global Terrorism Data.csv"
df= pd.read_csv(filepath,encoding = "ISO-8859-1")



### Dataset First View

In [None]:
# Dataset First Look
df.head()

In [None]:
df.tail()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

In [None]:
df.columns.values

### Dataset Information

In [None]:
# Dataset Info
df.info()

In [None]:
df.describe()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
len(df[df.duplicated()])

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
nan_count=df.isnull().sum()
# Keep only the columns that contain at least one missing value
missing_values=nan_count[nan_count>0]
missing_values


In [None]:
# Visualizing the missing values
plt.figure(figsize=(14,6))
sns.heatmap(df.isnull(), cbar=False)
plt.title('Missing Values Heatmap')
plt.show()

### What did you know about your dataset?

In this dataset, 106 out of 135 columns have missing values, and some columns have more than 90% missing data. This is a substantial amount of missing information that requires careful handling during data analysis.
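A compact way to get both the count and the percentage of missing values per column is sketched below. It runs on a small toy DataFrame rather than the GTD, and the helper name `missing_summary` is an assumption introduced only for illustration.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value count and percentage for columns that have any."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns with at least one missing value, worst first
    return summary[summary["missing_count"] > 0].sort_values(
        "missing_pct", ascending=False
    )


# Toy data for illustration only (not taken from the GTD)
toy = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [1.0, np.nan, np.nan, np.nan],
    "c": ["x", None, "y", "z"],
})
report = missing_summary(toy)
print(report)
```

Applied to the real data, the same function would be called as `missing_summary(df)`.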

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns.values


In [None]:
df.describe(include='all')

In [None]:
# Rename columns for clarity and consistency
# Each key is an original GTD column name and each value is the more readable name used below
df.rename(columns={'iyear':'Year','imonth':'Month','iday':'day','gname':'Group','country_txt':'Country','region_txt':'Region',
                   'provstate':'State','city':'City','attacktype1_txt':'Attacktype','targtype1_txt':'Targettype',
                   'weaptype1_txt':'Weapon','nkill':'kill','nwound':'Wound'},inplace=True)

### Variables Description

Year: The year in which the incident occurred.

Month: The month in which the incident occurred.

Day: The day of the month when the incident occurred.

Country: The country where the incident took place.

State: The state or province within the country.

Region: The geographical region of the incident.

City: The city where the incident occurred.

Latitude: The latitude coordinate of the incident location.

Longitude: The longitude coordinate of the incident location.

Attacktype: The type of attack (Assassination, Bombing/Explosion).

Kill: The number of people killed in the incident.

Wound: The number of people wounded in the incident.

Target1: A description of the primary target of the attack.

Summary: A brief summary or description of the incident.

Group: The group responsible for the attack.

Targettype: The type of target that was attacked (Government, Military).

Weapon: The weapon or method used in the attack (Explosives, Firearms).

Motive: The suspected motive or reason behind the attack.

INT_LOG: A variable indicating whether the incident was logistically international (0 for no, 1 for yes, -9 for unknown).

INT_IDEO: A variable indicating whether the incident was ideologically international (0 for no, 1 for yes, -9 for unknown).

Success: A binary variable indicating the success of the attack (0 for unsuccessful, 1 for successful).

Individual: A binary variable indicating if the attack was carried out by an individual (0 for no, 1 for yes).

Multiple: A binary variable indicating if multiple individuals or groups were involved in the attack (0 for no, 1 for yes).

Dbsource: The source of the data.

Nkillter: The number of terrorists killed in the incident.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for i in df.columns.tolist():
  print("No. of unique values in ",i,"is",df[i].nunique(),".")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# Percentage of NaN values in each column of the dataset
nan_percentage=(nan_count/len(df))*100
nan_percentage

In [None]:
# Identify columns with a high percentage of missing values (>= 50%)
column_with_high_nan=nan_percentage[nan_percentage>=50]
#Iterate through columns with a high percentage of missing values
# and print the column names along with their missing value percentage
for key,values in column_with_high_nan.items():
  print(key,values) # These are the list of columns with high missing values
print(len(column_with_high_nan))

In [None]:
# Identify columns with less than 50% missing values
missing_values_less_then_50=nan_percentage[nan_percentage<50]
print(missing_values_less_then_50)
print(len(missing_values_less_then_50)) # Number of columns with less than 50% missing values

In [None]:
# Select columns that have less missing data and are relevant for the analysis
# .copy() makes df1 an independent DataFrame, so later assignments do not trigger SettingWithCopyWarning
df1=df[['Year','Month','day','Country','State','Region','City','latitude','longitude',"Attacktype",'kill','Wound','target1','summary','Group','Targettype','Weapon','motive','INT_LOG','INT_IDEO','success','individual','multiple','dbsource','nkillter']].copy()

In [None]:
df1.head()

In [None]:
df1.columns

In [None]:
# Number of missing values in each selected column of the new DataFrame
df1.isnull().sum()

In [None]:
df1.describe()

In [None]:
# Identify the columns that have more than 0 missing values
df1.isnull().sum()[df1.isnull().sum()>0]

In [None]:
# Count missing values in the 'kill' column
df1['kill'].isna().sum()

In [None]:
# Draw a Q-Q plot to check whether df1['kill'] is normally distributed
# Missing values are dropped first so they do not affect the plot
sm.qqplot(df1['kill'].dropna(), line='s')
plt.title("Q-Q Plot")
plt.show()

This graph clearly indicates that the data is not normally distributed.

In [None]:
# Replace missing values in the 'kill' column with 0 (the column median)
df1['kill'] = df1['kill'].fillna(0)
# Convert the cleaned 'kill' values to integers
df1['kill'] = df1['kill'].astype(int)

Missing values in the "kill" column are filled in a way that matches the distribution of the data. Most values are zero (the 0th through 50th percentiles are all 0), while the mean is 2.4 and the standard deviation (11.54) is relatively high, so the distribution is strongly right-skewed. The median, which is 0, is therefore a reasonable value for filling the missing entries.
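The reasoning for preferring the median over the mean on a zero-heavy, skewed column can be checked on a small example. The numbers below are toy values chosen for illustration, not GTD records.

```python
import numpy as np
import pandas as pd

# Toy, zero-heavy casualty counts with one extreme value (illustrative only)
kills = pd.Series([0, 0, 0, 0, 1, 2, 150, np.nan, np.nan])

mean_value = kills.mean()      # pulled upward by the single extreme value
median_value = kills.median()  # unaffected by the extreme value

# Assign the result back instead of calling fillna(..., inplace=True) on a column
filled = kills.fillna(median_value).astype(int)

print(f"mean={mean_value:.2f}, median={median_value}")
print(filled.tolist())
```

Filling with the mean would insert a value that almost no real record has, while the median keeps the imputed rows consistent with the bulk of the data.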

In [None]:
# Handling missing values in the 'Wound' column
df1['Wound'].isna().sum()

In [None]:
# Replace missing values in the 'Wound' column with 0 (the column median)
df1['Wound'] = df1['Wound'].fillna(0)
# Convert the cleaned 'Wound' values to integers
df1['Wound'] = df1['Wound'].astype(int)

Missing values in the "Wound" column are filled in a way that matches the distribution of the data. Most values are zero (the 0th through 50th percentiles are all 0), while the mean is 3.16 and the standard deviation (35.94) is relatively high, so the median, which is 0, is a reasonable value for filling the missing entries.

In [None]:
# Replace missing values in the 'nkillter' column with 0
df1['nkillter'] = df1['nkillter'].fillna(0)

In [None]:
# Replace missing values in the 'multiple' column of df1 with 0
df1['multiple'] = df1['multiple'].fillna(0)

The 'multiple' column has only one missing value. It is a binary column (0 or 1), so the missing value can be filled with either 0 or 1; here it is filled with 0.
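One way to make the choice between 0 and 1 less arbitrary is to fill with the most frequent value (the mode). This is a sketch on toy data and an alternative to the fixed fill value used in the notebook, not the notebook's own method.

```python
import numpy as np
import pandas as pd

# Toy binary flag with a single missing entry (illustrative only)
flag = pd.Series([0, 0, 1, 0, np.nan, 0])

# mode() ignores NaN by default; take the first value if there is a tie
most_common = flag.mode().iloc[0]
flag_filled = flag.fillna(most_common).astype(int)

print(flag_filled.tolist())
```

With only one missing value among many rows, either choice has a negligible effect on later aggregates.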

In [None]:
# Count missing values in the City, State and target1 columns
print(df1['City'].isna().sum())
print(df1['State'].isna().sum())
print(df1['target1'].isna().sum())

In [None]:
# Counting the number of records where 'City' is labeled as 'Unknown'
print(len(df1[df1['City']=='Unknown']))
# Counting the number of records where 'State' is labeled as 'Unknown'
print(len(df1[df1['State']=='Unknown']))
# Counting the number of records where 'target1' is labeled as 'Unknown'
print(len(df1[df1['target1']=='Unknown']))

In [None]:
# Replace missing values in the City, target1 and State columns with 'Unknown'
df1 = df1.fillna(value={'City':'Unknown', 'target1': 'Unknown','State':'Unknown'})

Missing values in the 'City', 'State', and 'target1' columns are replaced with 'Unknown' because this value already appears frequently in those columns.
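The per-column placeholder fill can be tried on a small frame first. The rows below are toy values for illustration only; passing a dict to `fillna` leaves columns that are not listed untouched.

```python
import pandas as pd

# Toy rows (illustrative only); "Unknown" already appears as a label
toy = pd.DataFrame({
    "City": ["Baghdad", None, "Unknown"],
    "State": [None, "Punjab", "Unknown"],
    "kill": [1, 0, 2],
})

# One placeholder per text column; numeric columns are not affected
placeholders = {"City": "Unknown", "State": "Unknown"}
cleaned = toy.fillna(value=placeholders)

# Count how many rows now carry the placeholder in each text column
unknown_counts = (cleaned[["City", "State"]] == "Unknown").sum()
print(unknown_counts)
```

After the fill, originally missing and originally 'Unknown' entries can no longer be told apart, which is acceptable here because both mean the location is not known.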

### What all manipulations have you done and insights you found?

Manipulations done so far:

Handling Missing Values:

* Missing values in the 'kill' column were replaced with zeros, and the column was converted to integers.
* Missing values in the 'Wound' column were replaced with zeros, and the column was converted to integers.
* Missing values in the 'nkillter' column were replaced with zeros.
* Missing values in the 'multiple' column were replaced with zeros.
* Missing values in the 'City', 'target1', and 'State' columns were replaced with 'Unknown'.

Counting Unknown Values:

* The number of records labeled 'Unknown' was counted for the 'City', 'State', and 'target1' columns.

## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Data distribution
# Create histograms to visualize the distribution of each numeric column in the DataFrame
df1.hist(figsize=(20,10))

In [None]:
df1.describe()

##### 1. Why did you pick the specific chart?

Histograms were chosen as the specific chart because they are a suitable choice for visualizing the distribution of numeric data. They display how data is distributed across different value ranges, making it easy to identify patterns, central tendencies, and any skewness in the data.

##### 2. What is/are the insight(s) found from the chart?


The insights that can be derived from histograms include:

1. Central Tendency: The peak of each histogram shows where the values are concentrated (the mode). For roughly symmetric distributions this is also close to the mean and median.
2. Spread: The width of the histogram gives an idea of the spread or variability of the data.
3. Skewness: The shape of the histogram (symmetrical or skewed) indicates the distribution's skewness (positive or negative).
4. Outliers: Extreme values or outliers can be observed as values that fall far from the bulk of the data.
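The properties read off a histogram can also be checked numerically. The sketch below uses toy values and the common 1.5 × IQR rule for outliers; that rule is an assumption chosen for illustration and is not used elsewhere in the notebook.

```python
import pandas as pd

# Toy right-skewed values (illustrative only)
values = pd.Series([0, 0, 0, 1, 1, 2, 3, 50])

skewness = values.skew()  # positive for a long right tail

# Interquartile range as a robust measure of spread
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1

# Values above the upper fence are flagged as outliers
upper_fence = q3 + 1.5 * iqr
outliers = values[values > upper_fence]

print(f"skewness={skewness:.2f}, IQR={iqr}, outliers={outliers.tolist()}")
```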



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive business impact

The insights gained from the histograms can be valuable for various business decisions:

1: Market Opportunity: Recognizing countries with high-security challenges can also highlight regions where there is an unmet demand for goods and services. Businesses that can provide products or services to address security-related needs, such as security technology, risk assessment services, or crisis management solutions, may find growth opportunities.

2 : Infrastructure Development: Companies involved in infrastructure development, including construction and rebuilding projects in post-conflict regions, can experience growth as these areas rebuild and require critical infrastructure.

3 : Security Services: Security firms and organizations providing security services can use this data to identify potential clients or areas where their services are in demand.

4 : Travel and Tourism: Companies in the travel and tourism industry can use this data to provide travelers with information about safe destinations and offer guidance on travel security.

Negative Business impact

1: Market Volatility: Businesses operating in regions prone to security incidents may experience market volatility. Protests, violence, and terrorist attacks can disrupt business operations, leading to supply chain interruptions, decreased consumer confidence, and reduced sales.

2: Insurance Costs: Companies operating in regions with security concerns may face higher insurance premiums or limited coverage options. This can increase the cost of doing business and negatively impact profit margins.


#### Chart - 2

In [None]:
# Identify years with a higher frequency of attacks
# Count the number of attacks in each year, ordered chronologically,
# and convert the result to a DataFrame with columns 'year' and 'value'
year_count = (
    df['Year'].value_counts()
    .sort_index()
    .rename_axis('year')
    .reset_index(name='value')
)
# Create a bar plot to visualize the number of attacks per year
plt.figure(figsize=(12, 8))
sns.barplot(x='year',y='value',data=year_count)    # Create the bar plot
# Add labels and a title to the plot for clarity
plt.xlabel('Year')  # X-axis label
plt.ylabel('Number of attacks') # Y-axis label
plt.title('Number of Attacks per Year')
# Rotate the X-axis labels for better readability
plt.xticks(rotation=90)
plt.show()


##### 1. Why did you pick the specific chart?

The choice of a bar chart aligns with the data's characteristics and the goal of visualizing the distribution of attacks over the years. It provides an easy-to-understand representation that allows for quick comparisons and trend analysis

##### 2. What is/are the insight(s) found from the chart?

The chart of attack frequency per year shows that the number of attacks goes up and down: some years have more attacks and some have fewer. For example, 2014 had the highest number of attacks worldwide.

Patterns Change Over Time: If we look closely, we can see that the number of attacks changes over time. Some years are calm, and then there are years with lots of attacks.
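The year-to-year pattern described above can be quantified with a peak year and a year-over-year difference. This is a sketch on toy data; the counts are invented for illustration and do not reproduce GTD figures.

```python
import pandas as pd

# Toy incident records, one row per incident (illustrative only)
toy = pd.DataFrame(
    {"Year": [2012, 2012, 2013, 2013, 2013, 2014, 2014, 2014, 2014, 2015]}
)

# Incidents per year in chronological order
yearly = toy["Year"].value_counts().sort_index()

peak_year = int(yearly.idxmax())  # year with the most incidents
change = yearly.diff()            # year-over-year change in incident count

print(yearly)
print(f"peak year: {peak_year}")
print(change)
```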


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:         
1. Market Expansion: For businesses considering expansion into regions with a history of security challenges, these insights can inform risk assessments and security planning.
2. Investment Decisions: Investors and financial institutions may consider these insights when evaluating opportunities in regions with varying security challenges.

Negative Business Impact:

1.  If specific years have disproportionately high numbers of attacks, it could indicate negative impacts on local economies and stability.        

#### Chart - 3

In [None]:
# Chart - 3 visualization code
# Create a cross-tabulation of the 'Year' and 'Region' columns and then plot it as a stacked area plot
# This shows the number of terrorist attacks by region over the years
pd.crosstab(df1.Year, df1.Region).plot(kind='area',stacked=True,figsize=(20,10))
plt.ylabel('No:of Attacks',fontsize=25)  # Y-axis label
plt.xlabel("Years",fontsize=25)          # X-axis label
plt.title('Terrorist Activities (Region) In Each Year',fontsize=30) # Title of the plot
plt.show()

##### 1. Why did you pick the specific chart?

The choice of an area plot aligns with the data's characteristics and the goal of visualizing how terrorist activities are distributed across regions over a time series. It is an effective choice for this specific analysis as it enables viewers to understand temporal and regional trends simultaneously.


##### 2. What is/are the insight(s) found from the chart?


The chart offers a comprehensive view of how terrorist activity is distributed across regions over time. It clearly shows that terrorist activity increased significantly in the Middle East & North Africa and South Asia between 2012 and 2017. It can serve as a useful tool for policymakers, security analysts, and researchers to better understand the complex nature of security challenges.
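The table behind a stacked area plot is a year-by-region cross-tabulation, and dividing each row by its total gives each region's share of that year's incidents. The sketch below uses toy rows for illustration only.

```python
import pandas as pd

# Toy incident records (illustrative only)
toy = pd.DataFrame({
    "Year": [2013, 2013, 2013, 2014, 2014, 2014, 2014],
    "Region": ["South Asia", "South Asia", "Western Europe",
               "South Asia", "South Asia", "South Asia", "Western Europe"],
})

# Rows are years, columns are regions, cells are incident counts
counts = pd.crosstab(toy["Year"], toy["Region"])

# Share of each year's incidents that falls in each region
shares = counts.div(counts.sum(axis=1), axis=0)

print(counts)
print(shares.round(2))
```

Shares make it easier to tell whether a region's growth outpaces the global trend or simply follows it.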

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:

1. Supply Chain Optimization: Companies with global supply chains can use this information to optimize their routes and logistics. Safer regions can be chosen for supply chain paths, reducing the risk of disruptions.
2. Market Expansion and Targeting: Understanding the regions with changing patterns of terrorist activity can inform decisions about market expansion.
3. Hotspot Monitoring: Terrorism increased significantly in the Middle East and North Africa (MENA) and South Asia regions from 2014 to 2017. Investing in geospatial intelligence to monitor and analyze hotspots of terrorist activity in these regions will enable proactive responses and resource allocation.

Negative business impacts.

1. Unpredictable Patterns: In regions where terrorist activities show unpredictable patterns, businesses may struggle to plan and adapt. Sudden spikes in attacks or shifts in hotspots can lead to market volatility and supply chain disruptions.
2. Insurance Costs: Operating in high-risk regions can lead to increased insurance premiums or limited coverage options.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
# Count the occurrences of each country in the 'Country' column
# and keep the 10 most frequent countries, with columns 'Country' and 'Values'
country_count = (
    df1['Country'].value_counts()
    .head(10)
    .rename_axis('Country')
    .reset_index(name='Values')
)
plt.figure(figsize=(10, 6))
sns.barplot(x='Country', y='Values', data=country_count)
plt.xlabel('Country') # X-axis label
plt.ylabel('Number of Attacks') # Y-axis label
plt.title('Countries with the Most Attacks')
plt.xticks(rotation=60)
plt.show()

##### 1. Why did you pick the specific chart?

The selection of this chart type is based on the nature of the data and the goal of the visualization.

1. Categorical Data: The data being visualized consists of categorical data, specifically, the names of countries. Each country is a distinct category.

2. Count or Frequency: The goal is to display the frequency or count of attacks for each country, showing which countries have the highest number of attacks.

3. Comparison: Bar plots are excellent for comparing values across different categories. In this case, we want to compare the number of attacks in various countries.
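Turning category counts into a tidy two-column table for a bar plot can be done in a way that does not depend on the column names `value_counts()` produces, which differ between pandas versions. The helper `top_n_counts` and the toy labels below are assumptions for illustration.

```python
import pandas as pd


def top_n_counts(series: pd.Series, n: int, value_name: str = "Values") -> pd.DataFrame:
    """Return the n most frequent categories as a two-column DataFrame."""
    counts = series.value_counts().head(n)
    # rename_axis names the category column; reset_index(name=...) names the counts
    return counts.rename_axis(series.name).reset_index(name=value_name)


# Toy categorical data (illustrative only)
toy = pd.Series(["A", "B", "A", "C", "A", "B"], name="Country")
top2 = top_n_counts(toy, n=2)
print(top2)
```

The resulting frame can be passed directly to `sns.barplot(x='Country', y='Values', data=top2)`.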

##### 2. What is/are the insight(s) found from the chart?

The chart of the top 10 countries with the highest number of recorded attacks gives the following insights:

1: Iraq Dominates: Iraq stands out as the country with the highest number of attacks by a significant margin, indicating a history of conflict or instability.

2: Pakistan and Afghanistan: Pakistan and Afghanistan follow Iraq in the ranking, emphasizing their historical involvement in conflicts and security challenges.

3: India and Colombia: India and Colombia also feature in the top 10, underscoring regional security concerns and historical conflict situations.

4: Middle Eastern Presence: Several Middle Eastern countries are prominent in the list, reflecting the region's complex geopolitical landscape.

5: Geopolitical Insights: The chart provides geopolitical insights, highlighting countries where terrorism and security issues have been significant concerns over time.

6: Informative Ranking: The chart efficiently ranks countries by the number of attacks, making it a useful tool for policymakers, researchers, and security analysts.

Overall, the chart provides a visual snapshot of the countries with the most recorded attacks, offering valuable insights into global and regional security challenges.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Gained insights may or may not lead to a positive business impact, depending on the nature of the business and its operations:

Positive Business Impact:

If a business operates in one of the countries with a high frequency of attacks, the insights can inform security measures, risk management, and contingency planning.
Businesses that provide security services or products may find opportunities for growth and support in regions with a high incidence of terrorism.

Negative Growth:

For businesses operating in regions with a high frequency of terrorist attacks, the insights might highlight increased risks to personnel, assets, and operations.
The presence of a business in a country with a high number of attacks could lead to negative growth if not properly managed. It may result in increased security costs and disruptions to operations.

Ultimately, the impact on a specific business depends on its industry, location, and the extent to which it is affected by terrorism in the regions where it operates. Therefore, it's crucial for businesses to assess these insights in the context of their unique circumstances and develop appropriate strategies and risk management plans.






#### Chart - 5

In [None]:
iraq_data = df1[df1['Country'] == 'Iraq']

#Group the data by terrorist group and count the incidents
group_counts = iraq_data['Group'].value_counts()

# Find the group with the highest activity
most_active_group_iraq = group_counts.idxmax()
most_active_group_iraq


According to the data, the most active terror group in Iraq is 'Unknown'. This may be due to a lack of information about the incidents, or because no group took responsibility for the attacks. We therefore exclude the 'Unknown' values in the terrorist group column and check which group that did take responsibility is the most active in Iraq.
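The effect of excluding unattributed incidents before ranking can be seen on a small example. The group labels below are placeholders, not real group names.

```python
import pandas as pd

# Toy incident records with placeholder group labels (illustrative only)
toy = pd.DataFrame(
    {"Group": ["Unknown", "Unknown", "Unknown", "Group A", "Group A", "Group B"]}
)

# Ranking on all rows is dominated by the 'Unknown' label
top_overall = toy["Group"].value_counts().idxmax()

# Ranking on attributed incidents only
attributed = toy.loc[toy["Group"] != "Unknown", "Group"].value_counts()
top_attributed = attributed.idxmax()

# Share of incidents with no attributed group, worth reporting alongside the ranking
unattributed_share = float((toy["Group"] == "Unknown").mean())

print(top_overall, top_attributed, unattributed_share)
```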



In [None]:
iraq_data = df1[df1['Country'] == 'Iraq']

# Filter out incidents with "unknown" group
iraq_data = iraq_data[iraq_data['Group'] != 'Unknown']

# Group the data by terrorist group and count the incidents
group_counts = iraq_data['Group'].value_counts()

# Find the group with the highest activity
most_active_group = group_counts.idxmax()
print(f"The most active terrorist group in Iraq is {most_active_group} with {group_counts.max()} incidents.")

# Create a DataFrame for the most active group in Iraq
active_group_in_country = iraq_data[iraq_data['Group'] == most_active_group]

# Create a scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(active_group_in_country['longitude'], active_group_in_country['latitude'], s=20, alpha=0.5)
plt.xlabel('Longitude')  # X-axis label
plt.ylabel('Latitude')   # Y-axis label
plt.title(f'Scatter Plot of Incidents for {most_active_group} in Iraq')
plt.grid(True)

plt.show()

# Number of incidents plotted for the most active attributed group in Iraq
print(f"Incidents plotted for {most_active_group} in Iraq: {len(active_group_in_country)}")

##### 1. Why did you pick the specific chart?

A scatter plot clearly shows in which areas of Iraq most terrorist activity happened, and it is a good way to visualize the relationship between two numerical variables (here, longitude and latitude).

##### 2. What is/are the insight(s) found from the chart?

In this chart we can clearly see that the area with longitude from 42 to 46 and latitude from 33 to 37 has the most terror activity by the ISIL group, which makes it the most dangerous area in Iraq.
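A visual impression of a hotspot can be backed by counting the share of incidents that fall inside the bounding box. The sketch below uses the longitude and latitude ranges quoted above, but the coordinates are toy values and the helper name `share_in_box` is hypothetical.

```python
import numpy as np
import pandas as pd


def share_in_box(frame: pd.DataFrame, lon_range: tuple, lat_range: tuple) -> float:
    """Return the fraction of located incidents inside a longitude/latitude box."""
    # Rows without coordinates cannot be placed, so they are excluded
    located = frame.dropna(subset=["longitude", "latitude"])
    in_box = (
        located["longitude"].between(*lon_range)
        & located["latitude"].between(*lat_range)
    )
    return float(in_box.mean())


# Toy coordinates (illustrative only)
toy = pd.DataFrame({
    "longitude": [43.1, 44.4, 47.8, 44.0, np.nan],
    "latitude": [36.3, 33.3, 30.5, 35.5, 34.0],
})
share = share_in_box(toy, lon_range=(42, 46), lat_range=(33, 37))
print(f"share of incidents in box: {share:.2f}")
```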




##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights might not necessarily lead to a positive business impact but can be crucial for security and risk management:

Positive Impact:
1. Intelligence agencies should monitor and analyze hotspots of terrorist activity in this area. This will enable proactive responses and resource allocation.
2. The most active group in Iraq is ISIL. Intelligence agencies can track and disrupt the operations of this group, dismantling its networks and preventing attacks.

3. Businesses operating in Iraq or nearby regions can use this information to assess security risks and implement appropriate security measures to protect their personnel and assets.

Negative Growth:

The presence of a highly active terrorist group in a region can lead to negative growth due to increased security costs, potential disruptions to operations, and reputational risks.

It's crucial for businesses to be aware of such security challenges and have risk mitigation strategies in place to address potential negative impacts.

In summary, while the insights from the scatter plot may not directly result in positive business growth, they are essential for risk assessment and security planning, which are critical for maintaining stability and minimizing negative impacts in regions affected by terrorism.







#### Chart - 6

In [None]:
# Identify the most active terrorist groups in the world
# value_counts() counts the occurrences of each unique terrorist group;
# incidents attributed to 'Unknown' are excluded before counting
most_active_group = (
    df1.loc[df1['Group'] != 'Unknown', 'Group']
    .value_counts()
    .head(10)
    .rename_axis('Group')
    .reset_index(name='values')
)
# Plot the 10 most active groups
plt.figure(figsize=(12,6))
sns.barplot(x='Group',y='values',data=most_active_group)
plt.xlabel('Terrorist Group')
plt.ylabel('Number of Attack Attempts')
plt.xticks(rotation=90)
plt.title('Most Active Groups')
plt.show()

##### 1. Why did you pick the specific chart?

The primary goal of this chart is to compare different terrorist groups based on their activity levels (number of attack attempts). Bar plots excel at visualizing and comparing data across categories, making them ideal for this purpose.

##### 2. What is/are the insight(s) found from the chart?

The chart shows the most active terrorist groups during the analyzed period, measured by the number of recorded attack attempts.

1. Taliban: With 7,478 recorded attack attempts, the Taliban is the most active terrorist group during the analyzed period.
2. Islamic State of Iraq and the Levant (ISIL): ISIL ranks second in activity, with 5,613 recorded attack attempts.
3. Shining Path (SL): The Shining Path group is among the top three most active groups, with 4,555 recorded attack attempts.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact
1. Resource Allocation: Counter-terrorism agencies can allocate their resources more effectively by focusing their efforts on monitoring, infiltrating, and disrupting the activities of the most active terrorist groups.
2. Intelligence Gathering: The chart provides a clear picture of which groups are the most prolific in terms of attack attempts. Counter-terrorism units can prioritize intelligence gathering and analysis related to these groups, enabling them to stay ahead of potential threats.
3. International Cooperation: The insights from this chart can facilitate international cooperation and intelligence sharing among countries and organizations. Active groups often operate across borders, making cross-border collaboration essential

Negative Business Impact

While the chart provides insights into the most active terrorist groups, it doesn't directly indicate negative growth.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
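# Which groups have killed the most people in attacks?
# Sum 'kill' and 'Wound' per group, excluding incidents attributed to 'Unknown',
# and keep the 10 groups with the highest number of people killed
df2 = (
    df1[df1['Group'] != 'Unknown']
    .groupby('Group')[['kill','Wound']].sum()
    .sort_values(by='kill',ascending=False)
    .reset_index()
    .head(10)
)
plt.figure(figsize=(15,10))
sns.barplot(x='Group',y='kill',data=df2)
plt.xlabel('Group name')
plt.ylabel('Number of people killed')
plt.title('Groups Responsible for the Most Deaths')
plt.xticks(rotation=90)
plt.show()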

##### 1. Why did you pick the specific chart?

The primary goal of this chart is to compare different terrorist groups based on how many people they have killed. Bar plots excel at visualizing and comparing data across categories, making them ideal for this purpose.

##### 2. What is/are the insight(s) found from the chart?

Insights from the chart:

ISIL is the deadliest terrorist group in the chart, having killed the highest number of people. The Taliban and Boko Haram rank second and third. If any of these groups issues a threat, counter-terrorism agencies need to take it seriously.
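The chart code removes the top row by position with `df2.drop(0)`. A minimal sketch of an alternative is shown here, filtering by label before ranking. It assumes the row being removed is a catch-all label such as `"Unknown"`; that label, the helper name `top_groups_by_kills`, and the toy values are all hypothetical and not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for df1; the values are invented for illustration only.
toy = pd.DataFrame({
    "Group": ["Unknown", "A", "B", "A", "Unknown", "C"],
    "kill": [10, 5, 7, 4, 3, 1],
    "Wound": [2, 1, 0, 3, 1, 0],
})


def top_groups_by_kills(df, n=10, unknown_label="Unknown"):
    """Return the n named groups with the most fatalities.

    unknown_label is an assumed placeholder for unattributed attacks.
    """
    named = df[df["Group"] != unknown_label]
    totals = named.groupby("Group")[["kill", "Wound"]].sum()
    return totals.sort_values("kill", ascending=False).head(n).reset_index()


top_groups = top_groups_by_kills(toy, n=2)
print(top_groups)
```

Filtering by label keeps the ranking correct even if the unattributed row is not in first place, and it leaves a full `n` named groups in the result.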

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights might not necessarily lead to a positive business impact, but they are crucial for risk assessment and security planning:

Positive Impact:

1. If the groups that killed the most people (Islamic State of Iraq and the Levant (ISIL), Taliban) are active in a particular region, organizations offering counter-terrorism training and consulting services may see growth in demand. Businesses, especially those with international operations, may seek expert guidance to enhance their security.

Negative Growth:

1. The presence of such deadly terrorist groups can lead to negative growth due to increased security costs, potential disruptions to operations, and reputational risks.
2. Failure to address the threat posed by these groups could result in severe consequences, including harm to personnel and damage to assets.

In summary, the insights from the bar chart are essential for understanding the impact of terrorist groups and are critical for risk assessment and security planning. Businesses operating in regions where these groups are active should use this information to inform their security strategies and risk management efforts.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
# Count the number of attacks per year
num_attack_df1 = df1.groupby('Year').size()
num_attack_df1.name = "number of attacks"
# Sum the number of people killed and wounded per year
terroristtrends = df1.groupby('Year').agg({'kill': 'sum', 'Wound': 'sum'})
terroristtrends = pd.concat([terroristtrends, num_attack_df1], axis=1)
# Create a new column, victims, as the yearly total of killed and wounded
terroristtrends['victims'] = terroristtrends['kill'] + terroristtrends['Wound']
fig = px.line(terroristtrends, x=terroristtrends.index, y='victims', title='Terrorist attacks trends', template='plotly_dark')
fig.data[0].name = "number of victims"
fig.update_traces(showlegend=True)
fig.add_scatter(x=terroristtrends.index, y=terroristtrends['number of attacks'], mode='lines', name='number of attacks')
fig.update_layout(xaxis_title='Year', yaxis_title='Terrorism Trends')
fig.show()

##### 1. Why did you pick the specific chart?

The line chart depicting terrorism attack trends over the years was selected for several reasons.

1. Temporal Analysis: The primary objective is to analyze how terrorist attacks have evolved over time (by year). A line chart is well-suited for visualizing trends and changes in data over a continuous timeline.
2. Multiple Data Series: The chart accommodates multiple data series (in this case, the number of victims and the number of attacks) on the same graph. This allows for easy comparison of two related but distinct aspects of terrorist incidents.

##### 2. What is/are the insight(s) found from the chart?

Insights

1. Overall Increase in Attacks: The chart reveals that the total number of terrorist attacks has generally increased over the years, particularly from the mid-2000s onwards. This suggests a rising trend in global terrorism incidents.
2. Fluctuations in Victims: The line for the number of victims (killed plus wounded) from 1970 to 2017 shows that most victims of terrorism fall between 2012 and 2017, with 2014 standing out for its more lethal attacks.
3. Correlation between Attacks and Victims: There appears to be a correlation between the number of attacks and the number of victims. When the number of attacks increases, the number of victims also tends to rise.
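The third insight is read off the chart by eye. As a rough sketch, the same relationship could be quantified with a Pearson coefficient on a yearly table shaped like `terroristtrends`. The numbers below are invented for illustration, and the helper name `attacks_victims_correlation` is hypothetical.

```python
import pandas as pd

# Invented yearly totals, used only to show the calculation.
toy_trends = pd.DataFrame(
    {
        "number of attacks": [100, 150, 300, 500, 450],
        "victims": [400, 500, 1200, 2600, 2000],
    },
    index=pd.Index([2010, 2011, 2012, 2013, 2014], name="Year"),
)


def attacks_victims_correlation(trends):
    """Pearson correlation between yearly attack counts and yearly victims."""
    return trends["number of attacks"].corr(trends["victims"])


r = attacks_victims_correlation(toy_trends)
print(round(r, 3))
```

A value close to 1 would support the visual impression. Note that both series trend upward over time, so a high coefficient on yearly totals does not by itself show that individual attacks became more lethal.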

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can be valuable for various purposes, including security, risk assessment, and policy development:

Positive Impact:

1. Businesses and organizations operating in regions or industries prone to terrorism can use this information to assess the security risks they face and develop strategies to mitigate these risks.
2. Government agencies and security organizations can use this data to allocate resources effectively and enhance counterterrorism efforts.

Negative Growth:

1. The positive correlation between the number of attacks and the number of victims implies that an increase in terrorist attacks often leads to a higher number of casualties. This could have negative implications for businesses and regions experiencing such spikes.
2. Negative growth may occur if a business operates in an area with a significant increase in terrorist attacks, as it may lead to higher security costs, operational disruptions, and potential harm to personnel and assets.

In summary, the insights gained from the chart provide a historical perspective on terrorism trends. While they are crucial for understanding the security landscape, they also highlight the potential risks and challenges that businesses and organizations may face in regions affected by terrorism.






#### Chart - 9

In [None]:
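# Identify the most commonly used weapons in terrorist attacks
# autopct='%1.1f%%' prints each slice's percentage on the pie chart;
# only the top five categories are plotted, so each percentage is a share of those five
df1.value_counts('Weapon').head().plot(kind='pie',figsize=(10,6),autopct='%1.1f%%')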

##### 1. Why did you pick the specific chart?

1. Clear Comparison: A pie chart provides a clear and concise way to compare the proportions of different categories (in this case, weapon types). It allows viewers to easily see the relative distribution of each category at a glance.

2. Percentage Representation: Pie charts represent data as percentages of the whole, making it straightforward to understand the proportion of each weapon type in relation to the total number of incidents.

##### 2. What is/are the insight(s) found from the chart?

Insights:

1. Dominance of Explosives: The most striking insight is that explosives are the most commonly used weapon type, accounting for around 51% of the incidents shown in the chart.
2. Significant Use of Firearms: Firearms are the second most frequently used weapon category at around 32%, a notably smaller share than explosives.
3. Use of Unknown Weapons: The presence of an "Unknown" category highlights the challenge of identifying the specific weapon type in some incidents. This emphasizes the importance of improving data accuracy and intelligence in counter-terrorism efforts.
4. Incendiary Devices and Melee Weapons: Incendiary devices and melee weapons represent smaller but still significant portions of the chart. This indicates that these weapon types are used in a notable number of incidents.
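Because the pie chart is drawn from `.head()`, its percentages are shares of the five plotted categories rather than of every incident. The sketch below, on invented weapon labels and counts, shows how the two shares can differ; the variable names are hypothetical.

```python
import pandas as pd

# Invented weapon labels; the counts are illustrative only.
weapons = pd.Series(
    ["Explosives"] * 50 + ["Firearms"] * 30 + ["Unknown"] * 10
    + ["Incendiary"] * 5 + ["Melee"] * 3 + ["Chemical"] * 2,
    name="Weapon",
)

counts = weapons.value_counts()
top = counts.head(5)

# Share within the five plotted categories (what the pie's autopct reports).
share_of_top = (top / top.sum() * 100).round(1)
# Share of all incidents, including categories left out of the pie.
share_of_all = (top / counts.sum() * 100).round(1)

print(pd.DataFrame({"share_of_top5_%": share_of_top, "share_of_all_%": share_of_all}))
```

When the omitted categories are small the two columns are close, but quoting the second one avoids overstating any weapon's share of total incidents.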

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impacts:

1. Security Service Providers: Companies that offer security services, including those specializing in explosives detection, firearm security, and threat assessment, may see increased demand for their expertise and solutions.
2. Intelligence Gathering: Counter-terrorism departments must gather intelligence on terrorist organizations, their tactics, and their access to different weapon types. This information helps in understanding potential threats and planning counter-measures.
3. Explosives Detection: Invest in advanced explosives detection technology and equipment to identify and neutralize explosive threats. This includes bomb-sniffing dogs, X-ray scanners, and chemical detectors.
4. Incendiary Device Detection: Develop capabilities for detecting and neutralizing incendiary devices, particularly in public places where fires can cause significant damage.

Negative Impact:

1. Disruption of Operations: Acts of terrorism involving weapons, such as bombings or shootings, can disrupt business operations. This disruption may result in physical damage to facilities, loss of productivity, and supply chain interruptions.
2. Tourism Downturn: Regions affected by violent conflicts or terrorism often experience a decline in tourism. This can negatively impact businesses in the tourism and hospitality sectors, including hotels, restaurants, and entertainment venues.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
# What are the most common target types, and what percentage does each account for?
df1.value_counts('Targettype').head().plot(kind='pie',figsize=(10,8),autopct='%1.1f%%')

##### 1. Why did you pick the specific chart?

1. Clear Comparison: A pie chart provides a clear and concise way to compare the proportions of different categories (in this case, target types). It allows viewers to easily see the relative distribution of each category at a glance.

2. Percentage Representation: Pie charts represent data as percentages of the whole, making it straightforward to understand the proportion of each target type in relation to the incidents plotted.

##### 2. What is/are the insight(s) found from the chart?

Insights:

1. Private Citizens & Property Predominance: The most significant insight is that attacks on private citizens and property constitute the largest share, around 31% of the incidents shown in the chart. This suggests that terrorists often target civilians and civilian infrastructure.
2. Military and Police Attacks: Military and police targets also account for a substantial portion of terrorist attacks. This highlights the vulnerability of security forces and their involvement in counter-terrorism efforts.
3. Government and Business Targets: Government (General) and Business targets follow closely in terms of attack frequency.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Impact

1. Threat Assessment: Private Citizens & Property are the most frequently targeted. Prioritize threat assessments for civilian areas, critical infrastructure, and public spaces to enhance security measures and protect civilians.
2. Security Investments: Businesses can allocate resources more effectively for security measures. For example, businesses operating in areas with a high risk of attacks on private citizens can invest in enhanced security for employees and customers.
3. Intelligence Gathering: Data on target types can guide intelligence-gathering efforts. Agencies can focus on monitoring and infiltrating groups that have a history of targeting specific sectors.

#### Chart - 11 - Correlation Heatmap

In [None]:
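# Correlation Heatmap visualization code
plt.figure(figsize=(15,10))
# This shows how strongly each numeric parameter is related to the others in the dataset.
# numeric_only=True (pandas 1.5 and later) skips text columns, which pandas 2.x otherwise refuses to correlate.
sns.heatmap(np.round(df1.corr(numeric_only=True),2),annot=True, cmap='BuPu')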

##### 1. Why did you pick the specific chart?

1. Correlation Exploration: A correlation heatmap is specifically designed to display the correlation coefficients between pairs of variables in a dataset.
2. Numeric Data: Heatmaps are best suited to numeric data, and a dataset like the Global Terrorism Database (GTD) contains many numeric attributes.
3. Annotated Values: The annotated value in each cell of the heatmap gives the precise correlation coefficient.

##### 2. What is/are the insight(s) found from the chart?

Insights:

1. Geographical Trends: The "latitude" and "longitude" variables show negative correlations with other variables, indicating that specific geographic coordinates may not be strongly correlated with attack-related factors.
2. Fatalities and Injuries: The "kill" variable is positively correlated with the "Wound" (number of injuries) variable, indicating that when the number of people killed increases, the number of people injured also tends to increase.
3. Terrorist Kill and Civilian Kill: The "kill" variable is positively correlated with the "nkillter" (terrorists killed) column, which shows that when the number of civilians killed increases, the number of terrorists killed also tends to increase.
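The heatmap reports Pearson coefficients, which a few very large incidents can dominate. As a hedged sketch, a rank-based (Spearman) coefficient can be computed alongside it by ranking the numeric columns and then taking the Pearson correlation of the ranks. The toy frame below is invented; only the column names `kill`, `Wound`, and `Group` mirror the notebook.

```python
import pandas as pd

# Invented incident-level values with one extreme row.
toy = pd.DataFrame({
    "Group": ["A", "B", "C", "D", "E"],   # text column, excluded from correlations
    "kill": [0, 1, 2, 3, 100],
    "Wound": [1, 0, 3, 2, 500],
})

numeric = toy.select_dtypes(include="number")

# Pearson: sensitive to the single extreme incident.
pearson = numeric.corr()
# Spearman: Pearson correlation of the ranks, so extremes carry less weight.
spearman = numeric.rank().corr()

print(pearson.loc["kill", "Wound"].round(3), spearman.loc["kill", "Wound"].round(3))
```

If the two coefficients differ noticeably on the real data, the Pearson value in the heatmap is probably driven by a small number of mass-casualty events.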

#### Chart - 12 - Pair Plot

In [None]:
# Pair Plot visualization code
# Select the columns of interest for the pair plot
columns_of_interest = [ 'kill','Wound','nkillter',]
#Create a subset DataFrame with the selected columns
subset_df = df1[columns_of_interest]

# Create a pair plot using seaborn
sns.pairplot(subset_df)
plt.show()


##### 1. Why did you pick the specific chart?

1. Pair plots are primarily used to gain insights into relationships between variables. They provide a visual representation of pairwise interactions, making it easier to identify patterns and correlations.
2. Multivariate Analysis: Pair plots are particularly useful when dealing with multiple variables (columns) in a dataset. They allow us to visualize how each variable relates to every other variable, providing a comprehensive view of data relationships.

##### 2. What is/are the insight(s) found from the chart?

Insights.
1. nkillter and kill: There is a diagonal upward trend between these columns, which means the nkillter (terrorists killed) and kill columns are positively correlated.
2. Wound and kill: There is a slight diagonal upward trend between these columns, which means the Wound (injured) and kill columns are positively correlated.
3. Distribution Shapes: The distributions of kill, Wound, and nkillter do not look normal, which suggests that outliers are present in these columns.
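The third insight infers outliers from the shape of the histograms. One possible way to check this numerically is to compute skewness and count values outside the usual 1.5 × IQR fences. The sketch below uses invented counts, and the helper name `iqr_outlier_counts` is hypothetical.

```python
import pandas as pd

# Invented per-incident counts with one extreme value in each column.
toy_counts = pd.DataFrame({
    "kill": [0, 0, 1, 1, 2, 2, 3, 150],
    "Wound": [0, 1, 1, 2, 2, 3, 4, 300],
})


def iqr_outlier_counts(df, k=1.5):
    """Count values per column lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    mask = (df < q1 - k * iqr) | (df > q3 + k * iqr)
    return mask.sum()


skewness = toy_counts.skew()          # positive values indicate a long right tail
outliers = iqr_outlier_counts(toy_counts)

print(skewness.round(2))
print(outliers)
```

Strongly positive skewness together with a non-zero outlier count would back up the reading of the pair plot's diagonal panels.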

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?


1. Geospatial Intelligence: Terrorism increased significantly in the Middle East and North Africa and in South Asia from 2014 to 2017.

Solution: Invest in geospatial intelligence to monitor and analyze hotspots of terrorist activity in these regions. This will enable proactive responses and resource allocation.

2. Targeted Threat Assessment: Private Citizens & Property are the most frequently targeted.

Solution: Prioritize threat assessments for civilian areas, critical infrastructure, and public spaces to enhance security measures and protect civilians.

3. Focus on Active Groups: Identify and monitor the most active terrorist groups, such as Taliban, ISIL, and Boko Haram.

Solution: Allocate intelligence resources to track and disrupt the operations of these groups, dismantling their networks and preventing attacks.

4. Weapon and Attack Type Analysis: Explosives and firearms are the most commonly used weapons in attacks.

Solution: Enhance border security and implement stricter controls on firearms and explosives to reduce the availability of these weapons to terrorists.

5. Cross-Agency Collaboration:

Solution: Encourage collaboration and information sharing between national and international counter-terrorism agencies, promoting a coordinated response to global threats.

6. Public Awareness:

Solution: Educate the public about recognizing signs of radicalization and reporting suspicious activities to law enforcement.

# **Conclusion**

1. Trends: Terrorism has evolved over the years, with a significant increase in incidents from 2012 to 2017.
2. In 2014 the number of terrorist incidents was highest, and casualties were also highest.
3. Terrorism increased significantly in the Middle East, North Africa, and South Asia regions after 2010.
4. Most Affected Countries: Iraq, Pakistan, Afghanistan, India, and Colombia are the countries most affected by terrorism, with the highest numbers of attacks by a significant margin, indicating a history of conflict or instability.
5. Active Groups: The most active terrorist groups in the world are the Taliban, followed by ISIL and Shining Path. Counter-terrorism departments can monitor and disrupt the operations of these active groups.
6. Weapon and Attack Type: Explosives and firearms are commonly used in attacks. Controls on these weapons can reduce their availability to terrorists.
7. Target Identification: Private Citizens & Property are the targets most frequently attacked by terrorist groups, which suggests that private citizens are easy targets and that attacks on them can do more damage. After private citizens, the military is the second most frequent target.


### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***