
# **Project Name**    -  Exploratory Data Analysis of the Global Terrorism Dataset



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member**    - Raj Shekhar Singh


# **Project Summary -**

Our project delves into the intricate realm of global terrorism, aiming to extract meaningful insights from a comprehensive dataset. This dataset serves as a crucial repository of information, shedding light on the multifaceted nature of terrorist activities worldwide. To embark on this analytical journey, we first navigate through the basics: understanding what this dataset encapsulates and the tools required for our analysis.

1. What is this dataset about?
The dataset encompasses a wide array of information, detailing terrorist incidents globally. It spans several years and provides data on the motives, methods, and impact of these activities, offering a nuanced understanding of the global terrorism landscape.

2. Installation of Libraries and Dataset
Our analysis relies on Python's data analysis stack: pandas for data manipulation, NumPy for numerical operations, Matplotlib and Seaborn for visualizations, and Plotly Express for the interactive choropleth map (GeoPandas is also imported for geospatial work). We also import the `os` module for file handling. We then load the dataset and preprocess it to make it amenable to our analytical techniques.

3. Basic Analysis
We kick off our exploration with basic statistical analyses, gaining insights into the dataset's structure and dimensions. Descriptive statistics provide an initial understanding of the data's central tendencies and variations.

4. Motive Behind Activities Related to Terrorism
One of the pivotal questions we tackle involves understanding the motives driving terrorist activities. By employing sophisticated analytical methods, we unravel patterns and discern underlying motives, bringing context to seemingly disparate incidents.

5. Number of Terrorist Activities in Each Region by Year
A temporal analysis helps us dissect the dataset across years and regions, discerning trends and fluctuations. This analysis provides a nuanced perspective on the evolving nature of terrorism worldwide.

6. Number of Terrorist Activities vs Year
Visual representations, crafted using Matplotlib and Seaborn, elucidate the relationship between the number of terrorist activities and the passage of time. These visualizations offer compelling insights into the dataset's temporal patterns.

7. Who Are the Main Targets?
By dissecting the data, we identify primary targets of terrorist activities. This information is vital for understanding the societal, political, and economic impacts of such incidents.

8. Hot Zones of Terrorism by Country and City
Geospatial analyses pinpoint terrorism hotspots, unveiling regions and cities most affected by these activities. Visual representations overlaying maps provide a striking portrayal of these hot zones.

9. What Are the Attacking Methods Used?
Our exploration extends to the methods employed in these attacks. By categorizing and visualizing these methods, we discern patterns that offer insights into the strategies of different terrorist groups.

10. Attacks vs Killed
A critical analysis revolves around understanding the correlation between the number of attacks and the fatalities they cause. This analysis underscores the devastating impact of terrorism on human lives.

11. Most Notorious Groups
Delving deep into the dataset, we identify and analyze the most notorious terrorist groups. Understanding these groups is pivotal for global security efforts.

12. Conclusion
In conclusion, our exploratory data analysis paints a comprehensive picture of global terrorism. Through the lens of data, we uncover patterns, motives, and impacts, offering invaluable insights for policymakers, security experts, and researchers. By utilizing Python's powerful libraries, our analysis becomes not just a study in data, but a tool for informed decision-making in the realm of global security.

This project showcases the profound impact of data analysis in deciphering complex phenomena, empowering us with knowledge that transcends borders and ideologies.

# **GitHub Link -**

- https://github.com/rajshekharsingh66/Capstone_project1

# **Problem Statement**


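Terrorist incidents are unevenly distributed across regions, countries, cities, targets and attack methods. The goal of this project is to explore the Global Terrorism dataset to identify where attacks are concentrated, who and what they target, which weapons and methods are used, and which groups are most active, so that security resources can be prioritized.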

#### **Define Your Business Objective?**

## Business Objective
Terrorism involves the use of violence to instill fear within a population, but it's crucial to recognize that not all acts of violence qualify as terrorism.

In my role as a security and defense analyst, I identify regions with a high incidence of terrorism and extract valuable security-related information and insights through Exploratory Data Analysis (EDA).

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Students who provide them will be awarded additional credits.

     The additional credits will give them an advantage over other students during Star Student selection.

     [ Note: Deployment Ready Code is defined as: the whole .ipynb notebook should be executable in one go without a single error logged. ]

3.   Every piece of logic should have proper comments.
4.   You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints: Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin!***

## ***1. Know Your Data***

### Import Libraries

In [None]:
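# Import Libraries
import pandas as pd               # data manipulation
import numpy as np                # numerical operations
import matplotlib.pyplot as plt   # static plots
import seaborn as sns             # statistical plots built on matplotlib
import os                         # file handling
import geopandas as gpd           # geospatial data
import plotly.express as px       # interactive charts (choropleth map)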

### Dataset Loading

In [None]:
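# Load Dataset
# Mount Google Drive (only required when the CSV is stored on Drive; the path below points to /content)
from google.colab import drive
drive.mount('/content/drive')
csv_file_path = '/content/Global Terrorism Data.csv'

# Try UTF-8 first, then fall back to Latin-1 ('ISO-8859-1' and 'latin1' are the same codec,
# and Latin-1 can decode any byte sequence, so the fallback always succeeds)
encodings_to_try = ['utf-8', 'latin1']

df = None
for encoding in encodings_to_try:
    try:
        # low_memory=False reads the file in one pass and avoids mixed-dtype warnings
        df = pd.read_csv(csv_file_path, encoding=encoding, low_memory=False)
        break  # If successful, exit the loop
    except UnicodeDecodeError:
        continue  # If decoding fails, try the next encoding

if df is None:
    raise ValueError(f'Could not read {csv_file_path} with any of {encodings_to_try}')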


### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
print("Number of rows:",df.shape[0])
print("Number of columns:",df.shape[1])

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
dup = df.duplicated().sum()
print(f'Number of duplicated rows: {dup}')

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
null_counts = df.isnull().sum()
# Keep only the columns that actually contain nulls, sorted from most to least missing
null_counts = null_counts[null_counts > 0].sort_values(ascending=False)

# Plotting a bar chart
plt.figure(figsize=(20, 6))
sns.barplot(x=null_counts.index, y=null_counts.values, palette='viridis')
plt.xticks(rotation=90)  # Rotate x-axis labels for better readability
plt.title('Null Values by Column')
plt.xlabel('Columns')
plt.ylabel('Null Count')
plt.show()
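Raw null counts are easier to act on when expressed as a share of all rows, since that shows which columns are too sparse to analyze. The following is a minimal sketch, assuming a pandas DataFrame; `missing_report` and its `min_pct` cut-off are hypothetical names introduced here, and the demo frame contains invented values rather than GTD records.

```python
import numpy as np
import pandas as pd


def missing_report(data: pd.DataFrame, min_pct: float = 0.0) -> pd.DataFrame:
    """Return null count and null percentage per column, sorted descending.

    Only columns whose missing percentage is strictly greater than `min_pct` are kept.
    """
    counts = data.isnull().sum()
    pct = counts / len(data) * 100
    report = pd.DataFrame({'null_count': counts, 'null_pct': pct.round(2)})
    report = report[report['null_pct'] > min_pct]
    return report.sort_values('null_pct', ascending=False)


# Tiny illustrative frame (hypothetical values, not GTD data)
demo = pd.DataFrame({
    'nkill': [1, np.nan, 3, np.nan],
    'motive': [np.nan, np.nan, np.nan, 'x'],
    'iyear': [1970, 1971, 1972, 1973],
})
print(missing_report(demo))
print(missing_report(demo, min_pct=50))
```

On the real data, `missing_report(df, min_pct=50)` would list the columns that are missing in more than half of the rows.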

### What do you know about your dataset?

Answer Here

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

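The columns used in this analysis (original GTD names, with the names they are given in the Data Wrangling step) are:
* `iyear`, `imonth`, `iday` (renamed `Year`, `Month`, `Day`): date of the incident.
* `country_txt` (`Country`), `region_txt` (`Region`), `city`, `latitude`, `longitude`: location of the incident.
* `attacktype1_txt` (`AttackType`): primary method of attack.
* `targtype1_txt` (`Target_type`): broad category of the primary target; `target1` (`Target`): the specific target.
* `weaptype1_txt` (`Weapon_type`): primary weapon type.
* `gname` (`Group`): name of the perpetrator group.
* `nkill` (`Killed`), `nwound` (`Wounded`): number of people killed and wounded.
* `summary` (`Summary`), `motive` (`Motive`): free-text description of the incident and its stated motive.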

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# Rename the cryptic GTD column names to readable ones
df.rename(columns={'iyear': 'Year', 'imonth': 'Month', 'iday': 'Day', 'country_txt': 'Country',
                   'region_txt': 'Region', 'attacktype1_txt': 'AttackType', 'target1': 'Target',
                   'nkill': 'Killed', 'nwound': 'Wounded', 'summary': 'Summary', 'gname': 'Group',
                   'targtype1_txt': 'Target_type', 'weaptype1_txt': 'Weapon_type', 'motive': 'Motive'},
          inplace=True)

# Keep only the columns needed for the analysis
# (.copy() makes df an independent DataFrame, avoiding SettingWithCopyWarning on the assignments below)
df = df[['Year', 'Month', 'Day', 'Country', 'Region', 'city', 'latitude', 'longitude', 'AttackType',
         'Killed', 'Wounded', 'Target', 'Summary', 'Group', 'Target_type', 'Weapon_type', 'Motive']].copy()

# Treat missing casualty counts as 0, then add Casualty = Killed + Wounded
df['Killed'] = df['Killed'].fillna(0)
df['Wounded'] = df['Wounded'].fillna(0)
df['Casualty'] = df['Killed'] + df['Wounded']
df.head(5)

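### What manipulations have you done and what insights did you find?

1. Renamed the cryptic GTD column names (e.g. `iyear`, `country_txt`, `nkill`) to readable names such as `Year`, `Country` and `Killed`.
2. Kept only the 17 columns needed for the analysis.
3. Filled missing `Killed` and `Wounded` values with 0 and added a `Casualty` column equal to `Killed + Wounded`. Because unknown counts are treated as 0, casualty totals are lower bounds.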

## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 Which region has the highest number of terrorist attacks?


In [None]:
# Chart - 1 visualization code
plt.figure(figsize=(15, 6))  # Adjust the width and height as needed

# Count the number of attacks in each region
region_counts = df['Region'].value_counts()

# Identify regions with less than 3 percent of all attacks
threshold = 3
small_regions = region_counts / region_counts.sum() * 100 < threshold

# Merge the small regions into a single 'Others' slice.
# Only the counts are changed; df['Region'] is left untouched so later charts see the original regions.
pie_counts = region_counts[~small_regions].copy()
if small_regions.any():
    pie_counts['Others'] = region_counts[small_regions].sum()

# Create a pie chart
plt.pie(pie_counts, labels=pie_counts.index, autopct='%1.1f%%', startangle=90)

# Add a title
plt.axis('equal')  # Equal aspect ratio ensures that the pie chart is circular
plt.title('Share of Terrorist Attacks by Region')

plt.show()
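The 'Others' grouping used for the pie chart can be expressed as a small reusable function that never modifies its input, so the same logic can be applied to other categorical columns such as `Weapon_type` or `Target_type`. A minimal sketch; `group_small_categories` is a hypothetical helper and the demo counts are invented.

```python
import pandas as pd


def group_small_categories(counts: pd.Series, threshold_pct: float = 3.0,
                           other_label: str = 'Others') -> pd.Series:
    """Merge categories below `threshold_pct` percent of the total into one bucket.

    `counts` is expected to be a value_counts()-style Series (category -> count).
    The input Series is not modified.
    """
    share = counts / counts.sum() * 100
    small = share < threshold_pct
    grouped = counts[~small].copy()
    if small.any():
        grouped[other_label] = counts[small].sum()
    return grouped


# Hypothetical counts, for illustration only
demo_counts = pd.Series({'A': 60, 'B': 30, 'C': 6, 'D': 2, 'E': 2})
print(group_small_categories(demo_counts, threshold_pct=3))
```

On the real data, `group_small_categories(df['Region'].value_counts(), threshold_pct=3)` produces the series that `plt.pie` expects.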


##### 1. Why did you pick the specific chart?

A pie chart expresses a part-to-whole relationship in the data: each region's share of all attacks is shown as a slice of the circle, so percentage comparisons are easy to read. That is why I used a pie chart here; it makes the regional percentage comparison clear and precise.

##### 2. What is/are the insight(s) found from the chart?

The Middle East & North Africa region accounts for the largest share of terrorist attacks.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2 Which country has the highest number of terrorist attacks?

In [None]:
# Chart - 2 visualization code
plt.figure(figsize=(47, 6))
# Count attacks per country, ordered from most to least frequent
sns.countplot(x='Country', data=df, order=df['Country'].value_counts().index)

# Add labels and a title
plt.xlabel('Countries')
plt.ylabel('Number of attacks')
plt.title('Countries having the highest number of terrorist attacks')

# Rotate x-axis labels for better readability
plt.xticks(rotation=90)

# Show the plot
plt.show()
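With every country on one axis the chart above is very wide. A compact companion is a table of the most frequent values with their percentage and cumulative share, which shows how concentrated attacks are in a handful of countries. A minimal sketch; `top_share` is a hypothetical helper and the demo values are invented. It works on any categorical column, e.g. `df['Target_type']` or `df['AttackType']`.

```python
import pandas as pd


def top_share(values: pd.Series, n: int = 10) -> pd.DataFrame:
    """Counts, percentage share and cumulative share of the `n` most frequent values."""
    counts = values.value_counts()
    share = counts / counts.sum() * 100
    table = pd.DataFrame({'Attacks': counts,
                          'Share_pct': share.round(1),
                          'Cumulative_pct': share.cumsum().round(1)})
    return table.head(n)


# Hypothetical values, for illustration only
demo = pd.Series(['A'] * 5 + ['B'] * 3 + ['C'] * 2)
print(top_share(demo, n=2))
```

On the real data, `top_share(df['Country'], n=15)` would show what fraction of all attacks the 15 most affected countries account for.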

##### 1. Why did you pick the specific chart?

Bar charts are used to compare the size or frequency of different categories or groups of data. Bar charts are useful for comparing data across different categories, and they can be used to display a large amount of data in a small space.

##### 2. What is/are the insight(s) found from the chart?

Iraq records the highest number of terrorist attacks of any country.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3 Which weapon do terrorists use most?

In [None]:
# Chart - 3 visualization code
plt.figure(figsize=(34, 6))
# Shorten the long 'Vehicle (...)' label so it fits on the x-axis
df['Weapon_type'] = df['Weapon_type'].replace('Vehicle (not to include vehicle-borne explosives, i.e., car or truck bombs)', 'Vehicle')
# Count attacks per weapon type, ordered from most to least frequent
sns.countplot(x='Weapon_type', data=df, order=df['Weapon_type'].value_counts().index)
plt.xlabel('Weapons')
plt.ylabel('Number of attacks')
plt.title('Weapons Used by Terrorists')
plt.xticks(rotation=55)
plt.show()

##### 1. Why did you pick the specific chart?

Bar charts are used to compare the size or frequency of different categories or groups of data. Bar charts are useful for comparing data across different categories, and they can be used to display a large amount of data in a small space.

##### 2. What is/are the insight(s) found from the chart?

Explosives are the weapon type used most often by terrorists.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4 Number of people killed in each region over the years

In [None]:
# Chart - 4 visualization code
# Total number of people killed for every (Year, Region) pair
df_grouped = df.groupby(['Year', 'Region'])['Killed'].sum().reset_index()

# Set the style of the plot
sns.set(style="whitegrid")
plt.figure(figsize=(34, 6))

# Create a grouped bar plot: one bar per year within each region
sns.barplot(x="Region", y="Killed", hue="Year", data=df_grouped, palette="viridis")

# Customize the plot
plt.xticks(rotation=45)
plt.xlabel("Region")
plt.ylabel("Total Killed")
plt.title("Total Killed per Region Over the Years")

# Show the legend outside the plot area, since there is one entry per year
plt.legend(title="Year", ncol=3, fontsize='small', bbox_to_anchor=(1.01, 1), loc='upper left')

# Show the plot
plt.tight_layout()
plt.show()
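The chart above sums `Killed`, so it measures deaths rather than the number of incidents. If the question is how many terrorist activities occurred in each region per year, a cross-tabulation that counts rows answers it directly. A minimal sketch, assuming the renamed `Year` and `Region` columns; `attacks_per_region_year` is a hypothetical helper and the demo rows are invented.

```python
import pandas as pd


def attacks_per_region_year(data: pd.DataFrame) -> pd.DataFrame:
    """Count incidents (rows) for every Year x Region combination.

    Rows are years, columns are regions; combinations with no incidents are 0.
    """
    return pd.crosstab(data['Year'], data['Region'])


# Hypothetical rows, for illustration only
demo = pd.DataFrame({
    'Year': [2014, 2014, 2014, 2015, 2015],
    'Region': ['South Asia', 'South Asia', 'Western Europe', 'South Asia', 'Western Europe'],
    'Killed': [10, 0, 2, 5, 1],
})
table = attacks_per_region_year(demo)
print(table)
```

On the real data, `attacks_per_region_year(df).plot(figsize=(18, 6))` would draw one line per region, which stays readable across many years better than one bar per year.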

##### 1. Why did you pick the specific chart?

Bar charts are used to compare the size or frequency of different categories or groups of data. Bar charts are useful for comparing data across different categories, and they can be used to display a large amount of data in a small space.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5 Number of Terrorist activities vs Year

In [None]:
# Chart - 5 visualization code
# Count the number of attacks recorded in each year
plt.subplots(figsize=(15, 6))
sns.countplot(x='Year', data=df, palette='GnBu', edgecolor=sns.color_palette('PuBuGn_r', 7))
plt.xticks(rotation=60)
plt.xlabel('Year')
plt.ylabel('Number of attacks')
plt.title('Number of Terrorist Activities Each Year')
plt.show()
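To back the year-by-year chart with numbers, the counts can be turned into a small trend table with the absolute and percentage change from the previous year, and the peak year read off with `idxmax`. A minimal sketch; `yearly_attack_trend` is a hypothetical helper and the demo years are invented.

```python
import pandas as pd


def yearly_attack_trend(data: pd.DataFrame) -> pd.DataFrame:
    """Number of attacks per year plus the absolute and percentage change from the previous year."""
    counts = data['Year'].value_counts().sort_index()
    trend = counts.to_frame(name='Attacks')
    trend['Change'] = trend['Attacks'].diff()
    trend['Pct_change'] = (trend['Attacks'].pct_change() * 100).round(1)
    return trend


# Hypothetical years, for illustration only
demo = pd.DataFrame({'Year': [2012] * 4 + [2013] * 6 + [2014] * 3})
trend = yearly_attack_trend(demo)
print(trend)
print('Peak year:', trend['Attacks'].idxmax())
```

The first year has no previous year to compare against, so its `Change` and `Pct_change` are NaN.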


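##### 1. Why did you pick the specific chart?

A count plot (bar chart) shows the number of attacks recorded in each year side by side, so year-to-year changes and the long-term trend are easy to compare.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?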
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6  Who are the main targets?

In [None]:
# Chart - 6 visualization code
plt.figure(figsize=(34, 6))
# Count attacks per target type, ordered from most to least frequent
sns.countplot(x='Target_type', data=df, order=df['Target_type'].value_counts().index)
plt.xlabel('Targets')
plt.ylabel('Number of attacks')
plt.title('Main Targets')

plt.xticks(rotation=60)
plt.show()

##### 1. Why did you pick the specific chart?

Bar charts are used to compare the size or frequency of different categories or groups of data. Bar charts are useful for comparing data across different categories, and they can be used to display a large amount of data in a small space.

##### 2. What is/are the insight(s) found from the chart?

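Private citizens and property are the most frequent targets of terrorist attacks, with the military and police also among the main targets.

##### 3. Will the gained insights help create a positive business impact?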
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7 Hot zones of terrorism (by Country)

In [None]:
# Chart - 7 visualization code
# Count attacks for the 100 most affected countries.
# rename_axis/reset_index(name=...) gives the columns 'Country' and 'Attacks' in every pandas version.
country_counts = (df['Country'].value_counts()[:100]
                  .rename_axis('Country')
                  .reset_index(name='Attacks'))

# Create a choropleth map. Plotly matches the country names directly, so no separate
# world shapefile is needed (geopandas.datasets was removed in GeoPandas 1.0).
# Historical names in the data (e.g. 'Soviet Union') may not match and are left blank.
fig = px.choropleth(country_counts,
                    locations='Country',
                    locationmode='country names',
                    color='Attacks',
                    title='Top Affected Countries',
                    color_continuous_scale='Plasma',
                    height=600,
                    width=1000,
                    labels={'Attacks': 'Number of Occurrences'})

fig.update_layout(geo=dict(showframe=False, showcoastlines=False, projection_type='natural earth'))

fig.show()

In [None]:
# Bar chart of the 15 most affected countries
top15_countries = df['Country'].value_counts()[:15]
plt.subplots(figsize=(18, 6))
sns.barplot(x=top15_countries.index, y=top15_countries.values, palette='plasma_r')
plt.xlabel('Country')
plt.ylabel('Number of attacks')
plt.title('Top Affected Countries')
plt.xticks(rotation=60)  # Rotate x-axis labels for better readability
plt.show()

##### 1. Why did you pick the specific chart?

Choropleth maps are used to visualize spatial data by shading or coloring geographic regions based on the intensity of a particular variable. They are particularly useful when you want to show variations in data across different geographic areas, allowing you to identify patterns, trends, and hotspots.

Bar charts are used to compare the size or frequency of different categories or groups of data. Bar charts are useful for comparing data across different categories, and they can be used to display a large amount of data in a small space.

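##### 2. What is/are the insight(s) found from the chart?

Iraq is the most affected country, and Pakistan, Afghanistan and India are also among the countries with the most attacks.

##### 3. Will the gained insights help create a positive business impact?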
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8 Hot zones of terrorism (by City)

In [None]:
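# Chart - 8 visualization code
# Exclude cities recorded as 'Unknown' by value and keep the 14 most affected cities
city_counts = df.loc[df['city'] != 'Unknown', 'city'].value_counts()[:14]
plt.subplots(figsize=(18, 6))
sns.barplot(x=city_counts.index, y=city_counts.values, palette='gnuplot2_r')
plt.xlabel('City')
plt.ylabel('Number of attacks')
plt.title('Top Affected Cities')
plt.xticks(rotation=60)
plt.show()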
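Slicing `value_counts()[1:15]` to hide the placeholder for unknown values only works if that placeholder is the most frequent entry; filtering it out by value does not depend on the ordering. A minimal sketch, assuming unknown entries are stored as the literal string `'Unknown'`; `top_known` is a hypothetical helper and the demo values are invented.

```python
import pandas as pd


def top_known(values: pd.Series, n: int = 15, unknown_label: str = 'Unknown') -> pd.Series:
    """Top-n value counts after dropping the placeholder label used for missing entries."""
    return values[values != unknown_label].value_counts().head(n)


# Hypothetical values, for illustration only
cities = pd.Series(['Unknown'] * 5 + ['X'] * 4 + ['Y'] * 2 + ['Z'])
print(top_known(cities, n=2))
```

The same helper applies to the `Group` column, e.g. `top_known(df['Group'], n=10)`.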

##### 1. Why did you pick the specific chart?

Bar charts are used to compare the size or frequency of different categories or groups of data. Bar charts are useful for comparing data across different categories, and they can be used to display a large amount of data in a small space.

##### 2. What is/are the insight(s) found from the chart?

Baghdad, Karachi and Lima are among the cities most affected by terrorist attacks.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9 What are the attacking methods used?

In [None]:
# Chart - 9 visualization code
plt.figure(figsize=(34, 6))
# Count attacks per attack type, ordered from most to least frequent
sns.countplot(x='AttackType', data=df, order=df['AttackType'].value_counts().index)
plt.xlabel('Attack type')
plt.ylabel('Number of attacks')
plt.title('Attacking methods used')

plt.xticks(rotation=60)
plt.show()

##### 1. Why did you pick the specific chart?

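A bar chart (count plot) compares how often each attack type occurs; ordering the bars by frequency makes the most common methods stand out.

##### 2. What is/are the insight(s) found from the chart?

Bombing/explosion is the most common attack method.

##### 3. Will the gained insights help create a positive business impact?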
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10 Attacks vs Killed

In [None]:
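# Chart - 10 visualization code
# Number of attacks for the 15 most affected countries
coun_terror = df['Country'].value_counts()[:15].to_frame()
coun_terror.columns = ['Attacks']
# Total number of people killed in every country
coun_kill = df.groupby('Country')['Killed'].sum().to_frame()
# Join the two on the country index and draw attacks and deaths side by side
coun_terror.merge(coun_kill, left_index=True, right_index=True, how='left').plot.bar(width=0.9)
fig = plt.gcf()
fig.set_size_inches(18, 6)
plt.ylabel('Count')
plt.title('Attacks vs Killed (Top 15 Countries)')
plt.show()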
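Alongside the side-by-side bars, dividing deaths by attacks gives a simple lethality measure per country, and a correlation coefficient summarizes how closely the two counts move together. A minimal sketch; `attacks_vs_killed` is a hypothetical helper and the demo rows are invented. With only 15 countries, the correlation is a rough indication rather than a robust statistic.

```python
import pandas as pd


def attacks_vs_killed(data: pd.DataFrame, top_n: int = 15) -> pd.DataFrame:
    """Attacks, total killed and killed per attack for the `top_n` countries by attack count."""
    summary = data.groupby('Country').agg(Attacks=('Killed', 'size'),  # rows per country
                                          Killed=('Killed', 'sum'))
    summary['Killed_per_attack'] = (summary['Killed'] / summary['Attacks']).round(2)
    return summary.sort_values('Attacks', ascending=False).head(top_n)


# Hypothetical rows, for illustration only
demo = pd.DataFrame({
    'Country': ['A', 'A', 'A', 'B', 'B', 'C'],
    'Killed': [1, 2, 3, 10, 0, 4],
})
summary = attacks_vs_killed(demo, top_n=3)
print(summary)
print('Pearson r (attacks vs killed):', round(summary['Attacks'].corr(summary['Killed']), 3))
```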

##### 1. Why did you pick the specific chart?

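A grouped bar chart places the number of attacks and the number of people killed side by side for each of the 15 most affected countries, which makes it easy to spot countries where attacks are especially deadly relative to how often they occur.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?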
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11 Most Notorious Groups

In [None]:
# Chart - 11 visualization code
# Exclude attacks whose perpetrator is recorded as 'Unknown' and keep the 14 most active groups
group_counts = df.loc[df['Group'] != 'Unknown', 'Group'].value_counts()[:14]
plt.figure(figsize=(16, 6))
sns.barplot(x=group_counts.values, y=group_counts.index, orient='h')
plt.xlabel('Number of attacks')
plt.ylabel('Group')
plt.title('Most Notorious Groups')
plt.show()

##### 1. Why did you pick the specific chart?

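A horizontal bar chart keeps long group names readable on the y-axis while the bar lengths still make the group frequencies easy to compare.

##### 2. What is/are the insight(s) found from the chart?

Among the groups shown, the Taliban has been involved in the highest number of terrorist attacks.

##### 3. Will the gained insights help create a positive business impact?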
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12 Activity of Top Terrorist Groups

In [None]:
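# Chart - 12 visualization code
# Keep attacks by the 10 most active known groups (excluding the 'Unknown' placeholder)
top10_names = df.loc[df['Group'] != 'Unknown', 'Group'].value_counts()[:10].index
top_groups10 = df[df['Group'].isin(top10_names)]
# Count attacks per year for each group and draw one line per group
pd.crosstab(top_groups10.Year, top_groups10.Group).plot(color=sns.color_palette('magma', 10))
fig = plt.gcf()
fig.set_size_inches(18, 6)
plt.ylabel('Number of attacks')
plt.title('Activity of Top Terrorist Groups')
plt.show()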
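The line chart shows when each group was active; the same information can be summarized in a table with each group's first, last and busiest year. A minimal sketch; `group_activity_span` is a hypothetical helper and the demo rows are invented.

```python
import pandas as pd


def group_activity_span(data: pd.DataFrame, groups: list) -> pd.DataFrame:
    """First active year, last active year and busiest year for each listed group."""
    subset = data[data['Group'].isin(groups)]
    yearly = pd.crosstab(subset['Year'], subset['Group'])  # attacks per year, one column per group
    return pd.DataFrame({
        'First_year': subset.groupby('Group')['Year'].min(),
        'Last_year': subset.groupby('Group')['Year'].max(),
        'Peak_year': yearly.idxmax(),      # year with the most attacks (first one on ties)
        'Peak_attacks': yearly.max(),
    })


# Hypothetical rows, for illustration only
demo = pd.DataFrame({
    'Group': ['G1', 'G1', 'G1', 'G2', 'G2'],
    'Year': [2001, 2003, 2003, 2010, 2012],
})
span = group_activity_span(demo, ['G1', 'G2'])
print(span)
```

Passing the ten group names plotted in Chart 12 as `groups` would give one row per group.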

##### 1. Why did you pick the specific chart?

Line charts are used to show how values change over time. Drawing one line per group makes it easy to compare when each of the top 10 groups was active and when its activity peaked.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
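The correlation heatmap cell is still empty. Below is a minimal sketch of the data step, assuming the numeric columns kept during wrangling; `numeric_correlation` is a hypothetical helper and the demo frame is invented. `DataFrame.corr` computes Pearson correlations and ignores missing values pair by pair.

```python
import pandas as pd


def numeric_correlation(data: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Pearson correlation matrix for the given numeric columns, rounded to 2 decimals."""
    return data[columns].corr(method='pearson').round(2)


# Hypothetical values, for illustration only
demo = pd.DataFrame({
    'Year': [2000, 2001, 2002, 2003],
    'Killed': [0, 1, 2, 3],
    'Wounded': [0, 2, 4, 6],
    'Month': [4, 3, 2, 1],
})
corr = numeric_correlation(demo, ['Year', 'Killed', 'Wounded', 'Month'])
print(corr)
```

The result can be drawn with `sns.heatmap(numeric_correlation(df, ['Year', 'Month', 'Day', 'latitude', 'longitude', 'Killed', 'Wounded', 'Casualty']), annot=True, cmap='coolwarm')`. Because `Casualty` is defined as `Killed + Wounded`, its correlation with those two columns partly reflects how it was constructed.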

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 14 - Pair Plot

In [None]:
# Pair Plot visualization code
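The pair plot cell is also empty. A pair plot draws a scatter panel for every pair of columns over every row, which can be slow on a dataset of this size, so a common approach is to plot a random sample. A minimal sketch; `pairplot_sample`, the 5000-row default and the seed are hypothetical choices, and the demo frame is invented.

```python
import pandas as pd


def pairplot_sample(data: pd.DataFrame, columns: list, n: int = 5000,
                    random_state: int = 42) -> pd.DataFrame:
    """Random sample of at most `n` rows with the selected columns, ready for a pair plot.

    Rows with missing values in the selected columns are dropped first.
    """
    subset = data[columns].dropna()
    return subset.sample(n=min(n, len(subset)), random_state=random_state)


# Hypothetical values, for illustration only (the last Killed value is missing)
demo = pd.DataFrame({
    'Killed': [0, 1, 2, 3, 4, 5, 6, 7, 8, None],
    'Wounded': list(range(10, 20)),
    'Summary': ['text'] * 10,
})
sample = pairplot_sample(demo, ['Killed', 'Wounded'], n=4)
print(sample)
```

`sns.pairplot(pairplot_sample(df, ['Year', 'latitude', 'longitude', 'Killed', 'Wounded']))` would then draw the plot on the sample; the fixed `random_state` keeps the sample reproducible between runs.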

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

Answer Here.

# **Conclusion**

In summary, we explored different forms of analysis and answered a series of questions about a dataset that piqued my interest. One lesson stands out: a quick glance at a dataset reveals only its columns and their contents, whereas Exploratory Data Analysis (EDA) turns that raw information into clear findings and greatly simplifies our tasks. Python libraries such as pandas, NumPy, Matplotlib, Seaborn and Plotly equip us to draw conclusions, perform calculations and create visualizations, greatly enhancing our analytical capabilities.


Based on the preceding analysis, we can draw the following conclusions:
* Iraq ranks highest in the number of attacks.
* The Middle East & North Africa region experiences the highest number of attacks.
* The predominant weapon employed by terrorists is explosives.
* Within the analysis, it is evident that in 2014, Iraq witnessed the highest level of terrorist activities.
* Private citizens and property emerge as the primary targets of terrorist activities.
* The most prevalent methods in terrorism activities involve bombing and explosions.
* The Taliban stands out as the most notorious group, having been involved in the highest number of terrorist activities.

***Robust security measures should be provided to countries such as Iraq, Pakistan, Afghanistan, and India, as well as to regions in the Middle East and cities like Baghdad, Karachi, Lima, and numerous others.***

***Precautions must be taken concerning explosive devices, as bombings and explosions are the most frequently utilized weapons by terrorists.***

***Effective security should be extended to safeguard private citizens, property, the military, police, and other entities, as they represent the primary targets of terrorism.***

### ***Hurrah! You have successfully completed your EDA Capstone Project!!!***