<a href="https://colab.research.google.com/github/nm1708/global-terriorism-data-analysis-1/blob/main/edaS.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## <b> The Global Terrorism Database (GTD) is an open-source database including information on terrorist attacks around the world from 1970 through 2017. The GTD includes systematic data on domestic as well as international terrorist incidents that have occurred during this time period and now includes more than 180,000 attacks. The database is maintained by researchers at the National Consortium for the Study of Terrorism and Responses to Terrorism (START), headquartered at the University of Maryland.</b>

# <b> Explore and analyze the data to discover key findings pertaining to terrorist activities. </b>

In [1]:
# Import libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline


In [2]:
# Import Google Colab's Drive helper
from google.colab import drive

In [3]:
#MOUNT THE DRIVE
drive.mount('/content/drive/')

In [None]:
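# Read the data; ISO-8859-1 (Latin-1) encoding avoids decode errors on this file.
# low_memory=False parses the file in one pass so mixed-type columns get consistent dtypes.
data_path = '/content/drive/MyDrive/data/Global Terrorism Data.csv'
df = pd.read_csv(data_path, encoding='ISO-8859-1', low_memory=False)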
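Since only a handful of the GTD columns are used later, one option is to load just those columns with `read_csv`'s `usecols` parameter. A minimal sketch, using a tiny made-up in-memory CSV as a stand-in for the real file (the `extra` column is hypothetical):

```python
import io
import pandas as pd

# Hypothetical stand-in for the GTD CSV: a few real GTD column names plus
# one column we do not need ("extra"). Values are made up.
csv_text = """iyear,imonth,country_txt,nkill,extra
1970,7,Dominican Republic,1,x
1970,0,Mexico,0,y
1970,1,Philippines,,z
"""

# usecols keeps only the listed columns while parsing, which saves memory on
# a wide file. The real call would also pass encoding='ISO-8859-1'.
wanted = ["iyear", "imonth", "country_txt", "nkill"]
df_small = pd.read_csv(io.StringIO(csv_text), usecols=wanted)
print(df_small.columns.tolist())
```

The empty `nkill` field is read as NaN, so the missing-value checks below still apply.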

In [None]:
df.head()

In [None]:
df.tail()

In [None]:
#GETTING TO KNOW THE DATA
df.info()

**Data Cleaning**

In [None]:
#TO VIEW SHAPE
print("There are {} rows and {} columns in the dataset".format(df.shape[0],df.shape[1]))

In [None]:
# NAME OF COLUMNS
df.columns

In [None]:
#DATA TYPES OF COLUMNS
df.dtypes

In [None]:
# Descriptive statistics
# Summarizes central tendency, dispersion and shape of each numeric column's distribution, excluding NaN values

df.describe()

In [None]:
#Calculating % of missing values in dataset
missing_values = (((df.isnull().sum()).sum())/df.size)*100
missing_values
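The overall percentage hides which columns are responsible for the gaps. A minimal sketch on a made-up toy frame (not the GTD data) showing the overall figure next to a per-column breakdown, which is what guides the column selection below:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the GTD data (values are made up).
toy = pd.DataFrame({
    "nkill": [1, np.nan, 3, np.nan],
    "nwound": [0, 2, np.nan, 5],
    "country_txt": ["Iraq", "India", "Peru", "Iraq"],
})

# Overall share of missing cells, same formula as the cell above.
overall_pct = toy.isnull().sum().sum() / toy.size * 100

# Per-column share of missing values: isnull().mean() is the fraction of NaN
# in each column; sorting puts the sparsest columns first.
per_column_pct = toy.isnull().mean().mul(100).sort_values(ascending=False)
print(overall_pct)
print(per_column_pct)
```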

**More than 50% of all values in the dataset are null, so the dataset needs cleaning.**

**Selecting necessary columns only**

In [None]:
# Keep only the columns used in the analysis (each listed once)
df = df[['iyear','imonth','iday','country_txt','provstate','region_txt','latitude','longitude','city','attacktype1_txt','nkill','nwound','gname','target1','targtype1_txt','weaptype1_txt']].copy()
df.head(10)
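Listing a column name twice in a selection produces duplicate columns, which also inflates size-based statistics such as the missing-value percentage. A small sketch on a hypothetical one-row frame that detects repeated labels and de-duplicates a column list while keeping its order:

```python
import pandas as pd

toy = pd.DataFrame({"iyear": [1970], "latitude": [18.4], "longitude": [-69.9]})

# A selection list that accidentally repeats two names.
cols = ["iyear", "latitude", "longitude", "latitude", "longitude"]
with_dupes = toy[cols]
print(with_dupes.columns.duplicated().any())  # repeated labels are present

# dict.fromkeys keeps the first occurrence of each name, in order.
unique_cols = list(dict.fromkeys(cols))
clean = toy[unique_cols]
print(clean.columns.tolist())
```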

**Renaming the columns**

In [None]:
df = df.rename(columns={'iyear':'Year',
                        'imonth':'Month',
                        'iday':'Day',
                        'country_txt':'Country',
                        'provstate':'State',
                        'region_txt':'Region',
                        'city':'City',
                        'latitude':'Latitude',
                        'longitude':'Longitude',
                        'attacktype1_txt':'Attack_Type',
                        'target1':'Target',
                        'nkill':'Killed',
                        'nwound':'Attacked',   # number of people wounded
                        'gname':'Group',
                        'targtype1_txt':'Target_Type',
                        'weaptype1_txt':'Weapon_type'})
df.head(10)

**Again, checking the % of missing values**

In [None]:
missing_values = (((df.isnull().sum()).sum())/df.size)*100
missing_values

**Null values now make up only about 1.2% of the dataset, so it is ready for analysis.**

**Number of unique values in each column**

In [None]:
for i in df.columns:
    print(i, df[i].nunique())
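pandas can produce the same counts in one call with `DataFrame.nunique()`, which returns a Series indexed by column name. A minimal sketch on a made-up toy frame:

```python
import pandas as pd

toy = pd.DataFrame({
    "Region": ["South Asia", "South Asia", "Western Europe"],
    "Year": [2014, 2015, 2016],
})

# nunique() maps each column to its number of distinct non-null values,
# equivalent to the loop above but returned as a Series.
unique_counts = toy.nunique()
print(unique_counts)
```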

In [None]:
df.info()

*The `df.info()` output shows missing values in `Killed` and `Attacked` (the number wounded), so we fill them with each column's mean.*

**Note: filling the gaps with 0 (zero) is an alternative.**

In [None]:
df['Attacked'] = df['Attacked'].fillna(df['Attacked'].mean()).astype(int)
df['Killed'] = df['Killed'].fillna(df['Killed'].mean()).astype(int)
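Because the per-country and per-year death totals later are sums of these columns, the imputation choice changes those totals. A minimal sketch on a made-up series comparing mean, zero and median filling; note that `astype(int)` truncates any fractional fill value:

```python
import numpy as np
import pandas as pd

killed = pd.Series([0, np.nan, 9, np.nan, 1, 2])  # made-up counts

# Mean imputation, as in the cell above: mean of observed values is
# (0 + 9 + 1 + 2) / 4 = 3.0.
mean_filled = killed.fillna(killed.mean()).astype(int)

# Zero imputation, the alternative mentioned in the note.
zero_filled = killed.fillna(0).astype(int)

# Median imputation is common for skewed counts; the median here is 1.5,
# which astype(int) truncates to 1.
median_filled = killed.fillna(killed.median()).astype(int)

print(mean_filled.tolist(), zero_filled.tolist(), median_filled.tolist())
print(mean_filled.sum(), zero_filled.sum(), median_filled.sum())
```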

**Getting to know the new DataFrame**

In [None]:
df.info()

**1. The Attacks**
- Count of different types of attacks

In [None]:
df['Attack_Type'].value_counts() 

In [None]:
plt.figure(figsize=(12,6))
sns.barplot(x = df['Attack_Type'].value_counts().index, y = df['Attack_Type'].value_counts().values, palette='CMRmap')
plt.title('Different Types of Attacks', fontsize=20, weight = 'bold')
plt.xlabel('Type of Attack', fontsize=15)
plt.ylabel('Count', fontsize=15)
plt.xticks(rotation= 90)
plt.show()

**Inference**
The three most common attack types are Bombing/Explosion, Armed Assault and Assassination.

**- Percentage of each attack type**

In [None]:
perc_Attack_Types = (df['Attack_Type'].value_counts()/df.shape[0])*100
perc_Attack_Types
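`value_counts(normalize=True)` returns proportions directly, so the same percentages can be computed without dividing by the row count. A minimal sketch on a made-up series; since `value_counts` drops NaN by default, the two approaches agree only when the column has no missing values, as is the case for `Attack_Type`:

```python
import pandas as pd

attack_type = pd.Series(["Bombing/Explosion", "Armed Assault",
                         "Bombing/Explosion", "Assassination"])

# Proportions times 100 give percentages.
pct = attack_type.value_counts(normalize=True) * 100

# The approach used above: counts divided by the number of rows.
manual = attack_type.value_counts() / len(attack_type) * 100
print(pct)
```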

In [None]:
mylabels = df['Attack_Type'].value_counts().index
myexplode = (0.05,0.05,0,0,0,0,0,0,0)

plt.figure(figsize = (12,12))
plt.pie(perc_Attack_Types, explode=myexplode, labels=mylabels, autopct='%0.1f%%', shadow=False)
#plt.legend()
plt.title('Attack Types Percentage', fontsize=20, weight = 'bold')
plt.show()
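The `myexplode` tuple above is written by hand, and matplotlib's `pie` expects one explode value per wedge. A minimal sketch of a hypothetical helper, `make_explode`, that builds the tuple from the number of categories so it stays in sync with the data:

```python
import pandas as pd

counts = pd.Series({"Bombing/Explosion": 5, "Armed Assault": 3,
                    "Assassination": 1, "Hijacking": 1})  # made-up counts

def make_explode(n_slices, n_highlight=2, offset=0.05):
    """Hypothetical helper: offset the first n_highlight slices, leave the rest at 0."""
    return tuple(offset if i < n_highlight else 0 for i in range(n_slices))

explode = make_explode(len(counts))
print(explode)
# plt.pie(counts, labels=counts.index, explode=explode, autopct='%0.1f%%')
```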

**Inference**

**Top 5 types of attacks in %**

1-Bombing/Explosion 48.6%

2-Armed Assault 23.5%

3-Assassination 10.6%

4-Hostage Taking (Kidnapping) 6.1%

5-Facility/Infrastructure Attack 5.7%



**2. The Countries and The States**
- Top 15 Countries with most attacks

In [None]:
df.Country.value_counts()[:15]

In [None]:
plt.figure(figsize=(12,6))
sns.barplot(x = df['Country'].value_counts()[:15].index, y = df['Country'].value_counts()[:15].values, palette='CMRmap')
plt.title('Top 15 Countries Affected', fontsize=20, weight = 'bold')
plt.xlabel('Country', fontsize=15)
plt.ylabel('Count', fontsize=15)
plt.xticks(rotation= 90)
plt.show()

**Inference**

Iraq, Pakistan and Afghanistan are the most affected countries.

In [None]:
perc_Country = (df['Country'].value_counts()[:15]/df.shape[0])*100
perc_Country

In [None]:
# Comparing No. of Attacks with Killings for top 15 countries

attacked = df.Country.value_counts()[:15].to_frame()
attacked.columns = ['Attacked']

kills = df.groupby(['Country'])['Killed'].sum().sort_values(ascending =False).to_frame()
attacked.merge(kills, how = 'left' , left_index = True, right_index = True ).plot.bar(width = 0.6 , color = sns.color_palette('CMRmap',2))
fig=plt.gcf()
fig.set_size_inches(20,16)
plt.title("Attacks vs Kills in mostly attacked 15 countries", fontsize = 20, weight = 'bold')
plt.ylabel("Attacks vs Kills", fontsize = 15)
plt.xlabel("Country", fontsize = 15)
plt.show()
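The attacks-vs-kills comparison can also be built with a single `groupby` using named aggregation, so both measures share one index and no merge is needed. A minimal sketch on a made-up toy frame; `"size"` counts rows (attacks) and `"sum"` totals deaths:

```python
import pandas as pd

toy = pd.DataFrame({
    "Country": ["Iraq", "Iraq", "India", "Peru", "Iraq", "India"],
    "Killed": [3, 0, 2, 1, 5, 0],
})

# One pass gives the number of attacks and the total deaths per country.
summary = (toy.groupby("Country")
              .agg(Attacks=("Killed", "size"), Kills=("Killed", "sum"))
              .sort_values("Attacks", ascending=False))
top = summary.head(2)
print(top)
# top.plot.bar() would then draw attacks and kills side by side.
```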

**Top 5 countries attacked are :**

1-Iraq

2-Pakistan

3-Afghanistan

4-India
 
5-Colombia

**Inference**
Iraq, Pakistan and Afghanistan have the highest attack counts, and all three also have high death tolls.

**Top 15 States with most attacks (Except Unknown)**

In [None]:
df.State.value_counts()[:16].drop('Unknown')
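The pattern `value_counts()[:16].drop('Unknown')` returns 15 entries only if `Unknown` is among the top 16, and raises a `KeyError` otherwise. A minimal sketch of a hypothetical helper, `top_n_known`, that removes the placeholder first and then takes the top n, using `drop(..., errors="ignore")`:

```python
import pandas as pd

def top_n_known(series, n=15, unknown="Unknown"):
    """Hypothetical helper: top-n value counts with the placeholder removed.

    Works whether or not `unknown` appears in the data at all.
    """
    return series.value_counts().drop(unknown, errors="ignore").head(n)

states = pd.Series(["Baghdad", "Unknown", "Baghdad", "Balochistan", "Sindh"])
result = top_n_known(states, n=2)
print(result)
```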

In [None]:
plt.figure(figsize = (12,6))
sns.barplot(x = df['State'].value_counts()[:16].drop('Unknown').index, y = df['State'].value_counts()[:16].drop('Unknown').values, palette='CMRmap') # or flare_r
plt.title('Top 15 Most Attacked States',fontsize=20, weight = 'bold')
plt.xlabel('States',fontsize=15)
plt.ylabel('Number of Attacks',fontsize=15)
plt.xticks(rotation=90)
plt.show()
#plt.gcf().set_size_inches(15, 5)

**Percentage of attacks in the top 15 states (except Unknown)**

In [None]:
perc_State = (df['State'].value_counts()[:16].drop('Unknown')/df.shape[0])*100
perc_State

**Top 5 states attacked are :**

1-Baghdad

2-Northern Ireland

3-Balochistan

4-Saladin

5-Al Anbar

**Inference** Baghdad, Northern Ireland and Balochistan are the three most attacked states.

**3. The Targets**

**Top 15 Types of Target (Except Unknown)**

In [None]:
df['Target_Type'].value_counts()[:16].drop('Unknown')

In [None]:
plt.figure(figsize=(12,6))
sns.barplot(x = df['Target_Type'].value_counts()[:16].drop('Unknown').index, y= df['Target_Type'].value_counts()[:16].drop('Unknown').values, palette='CMRmap')
plt.title('Top 15 Target Types', fontsize=20, weight = 'bold')
plt.xlabel('Targets', fontsize=15)
plt.ylabel('Count', fontsize=15)
plt.xticks(rotation= 90)
plt.show()

**Percentage distribution of the top 15 target types (except Unknown)**

In [None]:
perc_Target_Types = (df['Target_Type'].value_counts()[:16].drop('Unknown')/df.shape[0])*100
perc_Target_Types

**Top 5 target types are:**

1-Private Citizens & Property

2-Military

3-Police

4-Government (General)

5-Business

**Inference** The most frequent targets are Private Citizens & Property and the Military, followed by the Police.


**4. The Region**
- Regions with most attacks

In [None]:
df['Region'].value_counts()

In [None]:
plt.figure(figsize = (12,6))
sns.barplot(x = df['Region'].value_counts().index, y = df['Region'].value_counts().values, palette='CMRmap') # or flare_r
plt.title('Most Attacked Regions',fontsize=20, weight = 'bold')
plt.xlabel('Regions',fontsize=15)
plt.ylabel('Number of Attacks',fontsize=15)
plt.xticks(rotation=90)
plt.show()

**Percentage of attacks by region**

In [None]:
perc_Attack_Region = (df['Region'].value_counts()/df['Region'].shape[0])*100
perc_Attack_Region

In [None]:
mylabels = df['Region'].value_counts().index
myexplode = (0.05,0.05,0.05,0,0,0,0,0,0,0.5,0.5,0.5)

plt.figure(figsize = (12,12))
plt.pie(perc_Attack_Region, labels=mylabels, explode=myexplode, autopct='%0.1f%%')
#plt.legend()
plt.title('Attacks by Region (%)',fontsize=20, weight = 'bold')
plt.show()

**Top 5 attacked regions are:**

1-Middle East & North Africa-27.8%

2-South Asia-24.8%

3-South America-10.4%

4-Sub-Saharan Africa-9.7%

5-Western Europe-9.2%

**Inference** The most attacked regions are the Middle East & North Africa and South Asia, followed by South America.

**5. The Weapons**
- Top 5 weapons used

In [None]:
df['Weapon_type'].value_counts().head(5)

In [None]:
plt.figure(figsize=(12,8))
sns.barplot(x = df['Weapon_type'].value_counts().head(5).index,y = df['Weapon_type'].value_counts().head(5).values, palette='CMRmap')
plt.title('Top 5 Weapons Types', fontsize=20, weight = 'bold')
plt.xlabel('Weapon', fontsize=15)
plt.ylabel('Count', fontsize=15)
plt.xticks(rotation= 90)
plt.show()

**Percentage of attacks by weapon type**

In [None]:
perc_Weapon_Type = (df['Weapon_type'].value_counts()/df['Weapon_type'].shape[0])*100
perc_Weapon_Type

Chemical, Sabotage Equipment, Vehicle, Other, Biological, Fake Weapons and Radiological each account for less than 1% of attacks, so we leave them out of the pie chart and plot only the top 5 weapon types.

In [None]:
perc_Weapon_Type = (df['Weapon_type'].value_counts().head(5)/df['Weapon_type'].shape[0])*100
perc_Weapon_Type
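`plt.pie` rescales the values it receives to a full circle, so a pie of only the top 5 shows each weapon's share of those 5 rather than of all attacks. One alternative, sketched here on made-up counts, pools the categories below 1% into a single slice so the labels remain shares of all attacks; the slice name "Rest (<1% each)" is an arbitrary choice that avoids clashing with the existing "Other" weapon category:

```python
import pandas as pd

# Made-up counts: 200 attacks in total.
weapons = pd.Series(["Explosives"] * 100 + ["Firearms"] * 60 + ["Unknown"] * 16
                    + ["Incendiary"] * 12 + ["Melee"] * 11 + ["Chemical"] * 1)

pct = weapons.value_counts(normalize=True) * 100

# Keep categories at or above the threshold and pool the rest into one
# slice, so the pie still sums to 100% instead of silently dropping data.
threshold = 1.0
major = pct[pct >= threshold]
minor_total = pct[pct < threshold].sum()
if minor_total > 0:
    major = pd.concat([major, pd.Series({"Rest (<1% each)": minor_total})])
print(major.round(1))
```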

In [None]:
mylabels = df['Weapon_type'].value_counts().head(5).index
myexplode = (0.01,0.01,0.01,0.01,0.01)

plt.figure(figsize = (12,12))
plt.pie(perc_Weapon_Type, labels=mylabels, explode=myexplode, autopct='%0.1f%%')
#plt.legend()
plt.title('Weapon Types',fontsize=25, weight = 'bold')
plt.show()

**Top 5 weapons used**

1-Explosives

2-Firearms

3-Unknown

4-Incendiary

5-Melee

**Inference** Explosives and firearms are the most used weapon types, followed by attacks where the weapon is unknown.

**6. The Attacking Groups**
- Top 15 most active groups (except Unknown)

In [None]:
df['Group'].value_counts()[1:16]

In [None]:
plt.figure(figsize=(12,6))
sns.barplot(x = df['Group'].value_counts()[1:16].index, y = df['Group'].value_counts()[1:16].values, palette='CMRmap')
plt.title('Top 15 Attacking Groups (Except Unknown)', fontsize=20, weight = 'bold')
plt.xlabel('Group', fontsize=15)
plt.ylabel('Count', fontsize=15)
plt.xticks(rotation= 90)
plt.show()

**Percentage of attacks by the top 15 groups (except Unknown)**

In [None]:
perc_Group = (df['Group'].value_counts()[1:16]/df['Group'].shape[0])*100
perc_Group

In [None]:
plt.figure(figsize=(12,8))
sns.barplot(x = perc_Group.index, y = perc_Group.values, palette='CMRmap')
plt.title('Top 15 Attacking Groups (%)', fontsize=20, weight = 'bold')
plt.xlabel('Group', fontsize=15)
plt.ylabel('Percentage of Attacks', fontsize=15)
plt.xticks(rotation= 90)
plt.show()

**Top 5 terrorist groups :**

1-Taliban

2-Islamic State of Iraq and the Levant (ISIL)

3-Shining Path (SL)

4-Farabundo Marti National Liberation Front (FMLN)

5-Al-Shabaab

**Inference** The Taliban, the Islamic State of Iraq and the Levant (ISIL) and the Shining Path (SL) are the three most active known groups.

**7. Year-Wise Analysis of Attacks and Casualties**
- Store the unique years in `x_year`

In [None]:
x_year = df['Year'].unique()
x_year

- Store the number of attacks per year, in chronological order, in `y_count_years`

In [None]:
y_count_years = df['Year'].value_counts(dropna = False).sort_index()
y_count_years

**Top 5 most Attacked Years**

In [None]:
df['Year'].value_counts().head(5)

In [None]:
y_count_years.plot.bar(width = 1.0, edgecolor = 'black')
fig=plt.gcf()
fig.set_size_inches(20,10)
plt.title("Number of Attacks each year", fontsize = 20, weight = 'bold')
plt.ylabel("Number of Attacks", fontsize = 15)
plt.xlabel("Year", fontsize = 15)
plt.xticks(rotation = 90)
plt.show()

**Comparing the number of attacks with deaths for the years 1970 to 2017**

In [None]:
# Compare the number of attacks with deaths for each year (1970 to 2017)
most_attacked_years = df.Year.value_counts().to_frame()
most_attacked_years.columns = ['Attacked']

most_killed_years = df.groupby(['Year'])['Killed'].sum().sort_values(ascending =False).to_frame()

**Top 5 years with most deaths**

In [None]:
most_killed_years.head(5)

In [None]:
# sort_index() puts the bars in chronological order (value_counts sorts by count)
most_attacked_years.merge(most_killed_years, how = 'left', left_index = True, right_index = True).sort_index().plot.bar(width = 1.0, color = sns.color_palette('CMRmap',2))
fig=plt.gcf()
fig.set_size_inches(20,10)
plt.title("Attacks vs Kills in years", fontsize = 20, weight = 'bold')
plt.ylabel("Attacks vs Kills", fontsize = 15)
plt.xlabel("Year", fontsize = 15)
plt.xticks(rotation = 90)
plt.show()

**Inference** 2014, 2015 and 2016 are the top three years for both attacks and deaths.

In [None]:
plt.figure(figsize = (20,10))
# KDE of attack years per region; the y-axis is a smoothed density, not a count
sns.kdeplot(data=df, x='Year', hue='Region')
plt.title('Terrorist Activities by Region in each Year',fontsize=20, weight = 'bold')
plt.xlabel('Years',fontsize=15)
plt.ylabel('Density of Attacks',fontsize=15)
#plt.xticks(rotation=90)
plt.show()
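A KDE shows a smoothed density, so its y-axis is not the number of attacks. If actual yearly counts per region are wanted, `pd.crosstab` builds a year-by-region table of frequencies. A minimal sketch on a made-up toy frame:

```python
import pandas as pd

toy = pd.DataFrame({
    "Year": [2014, 2014, 2015, 2015, 2015, 2016],
    "Region": ["South Asia", "Middle East & North Africa", "South Asia",
               "South Asia", "Middle East & North Africa", "South Asia"],
})

# Rows are years, columns are regions, cells are attack counts; combinations
# with no attacks appear as 0.
per_year_region = pd.crosstab(toy["Year"], toy["Region"])
print(per_year_region)
# per_year_region.plot(figsize=(20, 10)) would draw one line per region.
```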

**SUMMARY**

-The Global Terrorism analysis was done by a group of four members: Nitesh Mishra, Rohit Sharma, Abhishek Verma and Aditya Dhoundiyal. The input for this project was a single CSV file containing the global terrorism data.

We divided the work into four tasks.

Description of each member's work:

**NITESH MISHRA**-We took up this project because of our shared interest in global terrorism and its real-world impact. When we downloaded the CSV file we were surprised by how large the dataset is, so we discussed the project thoroughly and split the tasks.
After splitting the tasks, we agreed that each team member would contribute his own insights. I started the analysis, and the first problem I faced was that some column names in the CSV file were not meaningful. I renamed them with the `rename` function (for example, 'imonth' to 'Month' and 'iyear' to 'Year') and then selected the useful columns for further analysis.
My two insights:

1-The top 3 attack types are Bombing/Explosion, Armed Assault and Assassination.

The top 3 attack types as a percentage of all attacks:

Bombing/Explosion=48.6%

Armed Assault=23.5%

Assassination=10.6%

2-The top 3 countries and states by number of attacks:

(COUNTRIES) IRAQ, PAKISTAN, AFGHANISTAN

(STATES) BAGHDAD, NORTHERN IRELAND, BALOCHISTAN

**ROHIT SHARMA**-When I started, the columns already had meaningful names and Nitesh had produced two insights, so I continued from there. My two insights are:

1-The top 3 target types are:

Private Citizens & Property, Military, Police

2-The top 3 attacked regions, as a percentage of all attacks:

Middle East & North Africa-27.8%,
South Asia-24.8%,
South America-10.4%

**ABHISHEK VERMA**-When I started, the data was already structured and four insights were done, so I worked on insights of my own.

1-The top 3 weapon types used are:

Explosives, Firearms, Unknown

2-The top 3 terrorist groups:

Taliban, Islamic State of Iraq and the Levant (ISIL), Shining Path (SL)

**ADITYA DHOUNDIYAL**-I worked on completing the project with the remaining insight, building on the ideas and clues from my team members. My insight is:

1-Year-wise analysis of attacks and casualties: the three years with the most attacks and deaths, with their total number of deaths, are:

1-2014 = 46534

2-2015 = 40463

3-2016 = 36427



**CONCLUSIONS**

1-The top 3 attack types are Bombing/Explosion, Armed Assault and Assassination.

The top 3 attack types as a percentage of all attacks:

Bombing/Explosion=48.6%

Armed Assault=23.5%

Assassination=10.6%

2-The top 3 countries and states by number of attacks:

(COUNTRIES) IRAQ, PAKISTAN, AFGHANISTAN

(STATES) BAGHDAD, NORTHERN IRELAND, BALOCHISTAN

3-The top 3 target types are:

Private Citizens & Property, Military, Police

4-The top 3 attacked regions, as a percentage of all attacks:

Middle East & North Africa-27.8%, South Asia-24.8%, South America-10.4%

5-The top 3 weapon types used are:

Explosives, Firearms, Unknown

6-The top 3 terrorist groups:

Taliban, Islamic State of Iraq and the Levant (ISIL), Shining Path (SL)

7-Year-wise analysis of attacks and casualties: the three years with the most attacks and deaths, with their total number of deaths, are:

1-2014 = 46534

2-2015 = 40463

3-2016 = 36427