# **Project Name**    - Global Terrorism EDA



##### **Project Type**    - EDA
##### **Contribution**    - Individual

# **Project Summary -**

This project focuses on analyzing a dataset of global terrorism events, aiming to uncover patterns and trends related to the frequency, severity, and geographical distribution of terrorist attacks. The dataset contains over 180,000 records, detailing events such as attack type, number of casualties, geographical coordinates, and ransom demands.

The analysis begins with data cleaning, where irrelevant columns and rows with excessive missing values are removed. Numeric columns like the number of casualties and geographical coordinates are imputed with their respective means, while skewed columns (e.g., ransom amount) are filled with their median values. Categorical columns (e.g., city, province) have their missing values replaced by the most frequent value (mode).

The cleaned data is then analyzed to identify trends such as the global distribution of terrorist activities, regional patterns, and the impact of different attack types on casualties. Visualizations are created to present these insights clearly, including attack frequency over time, geographical mapping, and comparisons of attack severity by region.

The dataset is exported to a cleaned CSV file, making it ready for further analysis or sharing. This project provides valuable insights into the patterns of terrorism over time, helping to understand how attacks evolve across different regions and attack types.

This project can be further expanded with more advanced analysis, including machine learning techniques to predict future terrorism trends or cluster similar attack types for deeper insights into their causes and effects.

# **GitHub Link -**

https://github.com/kush-agra-soni/6_global_terrorism_eda.git

# **Problem Statement**


The problem addressed in this project is the analysis of global terrorism events to identify patterns, trends, and key factors contributing to the frequency, severity, and geographical distribution of terrorist attacks. The dataset, which contains over 180,000 records, includes various attributes such as the year, country, region, city, attack type, number of casualties, ransom demands, and more.

The main objectives of the project are:

1. **Data Cleaning**: The dataset contains numerous irrelevant columns, missing values, and inconsistencies that need to be addressed. The first step is to remove unnecessary columns, handle missing data, and ensure the dataset is clean and ready for analysis.

2. **Trend Analysis**: The project aims to uncover trends in terrorism, such as attack frequencies over time, the distribution of attacks across different regions, the impact of attack types on casualties, and the role of ransom demands.

3. **Geographical Insights**: By leveraging location-based data, the project will explore the geographical distribution of terrorist attacks to identify high-risk regions and patterns of attack concentration.

4. **Impact Assessment**: The project will assess the impact of terrorism by analyzing the number of casualties (both killed and wounded), and identifying the most targeted regions, countries, and attack types.

The ultimate goal is to transform raw, unstructured data into actionable insights that can help policymakers, security organizations, and researchers understand terrorism trends, plan interventions, and allocate resources more effectively. Additionally, the project provides a foundation for future advanced analytics, including predictive modeling and machine learning techniques.

#### **Define Your Business Objective?**

The business objective of this project is to provide a comprehensive analysis of global terrorism patterns to assist governments, security agencies, and organizations in reducing the frequency and impact of terrorist activities. By analyzing the dataset, we aim to identify trends related to the timing, location, and nature of attacks, offering actionable insights for improving security measures. This analysis will help in identifying regions and cities that are most affected by terrorism, allowing governments and businesses to allocate resources more effectively to mitigate risks and enhance preparedness.

> The project also focuses on understanding the human and financial impact of terrorism, including the number of casualties and ransom demands, enabling better decision-making in terms of policy-making, resource allocation, and victim support. A key aspect of the analysis is the geospatial dimension, which will help pinpoint hotspots of terrorist activities, thus guiding security agencies in targeting high-risk areas and enhancing operational strategies.

Additionally, the project examines attack types and target preferences to provide deeper insights into the operational behavior of terrorist groups. Finally, the analysis of ransom-related data will be instrumental in shaping counter-terrorism finance policies, as well as fostering international collaboration in combating terrorism funding. In summary, this project aims to leverage data-driven insights to improve counter-terrorism strategies, ultimately contributing to enhanced global security.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Vizualization in  a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objs as go
import missingno as msno
from sklearn.preprocessing import StandardScaler
from scipy import stats
import plotly.express as px
import plotly.io as pio

### Dataset Loading

In [None]:
# Load Dataset

# GitHub raw URLs for your datasets
base_url = "https://raw.githubusercontent.com/kush-agra-soni/6_global_terrorism_eda/refs/heads/main/"
dataset_url = f"{base_url}main_data.csv"

gtds_df = pd.read_csv(dataset_url)

### Dataset First View

In [None]:
# Dataset First Look
gtds_df.head(1)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
gtds_df.shape

### Dataset Information

In [None]:
# Dataset Info
gtds_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
gtds_df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
gtds_df.isnull().sum()

### What did you know about your dataset?

The dataset contains information about global terrorism events, specifically focusing on incidents reported between 1970 and 2017. It is structured to capture various aspects of each event, such as the location, timing, nature of the attack, and the outcomes (e.g., number of casualties and financial demands). The data is organized into columns that provide details about the event's unique identifier, date, country, region, city, attack type, target, and impact on human lives, among others.

Key features of the dataset include:

1. **Event-specific information**: Columns such as `eventid`, `iyear`, `imonth`, `iday` identify the specific terrorist event, including the year, month, and day of occurrence.
2. **Geographical data**: Columns like `country`, `region`, `provstate`, `city`, `latitude`, and `longitude` offer geospatial context for each attack, helping to map and analyze the geographical distribution of terrorism.
3. **Attack characteristics**: Attributes like `attacktype1` (type of attack), `suicide` (whether the attack was a suicide mission), and `targtype1` (target type) provide insights into the tactics, methods, and intended targets of terrorist groups.
4. **Impact data**: Key metrics such as `nkill` (number of people killed) and `nwound` (number of people wounded) provide the human cost of the events. Additionally, columns like `ransom` and `ransomamt` relate to the financial aspects of terrorism.
5. **Other contextual data**: The dataset also includes information like `dbsource` (source of the data) and `related` (indicating if the event is connected to other incidents), which are useful for tracing patterns and investigating the broader context of terrorism events.

Overall, this dataset is rich with information for performing exploratory data analysis (EDA) and building predictive models that could assist in understanding and combating terrorism. However, it also has some missing values and irrelevant columns that need cleaning for effective analysis.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
gtds_df.columns

In [None]:
# Dataset Describe
gtds_df.describe()

### Variables Description

Here’s a grouped and shorter description of the dataset variables:

### Event Information:
- **eventid**: Unique event ID (Integer)
- **iyear**: Year of the event (Integer)
- **imonth**: Month of the event (Integer)
- **iday**: Day of the event (Integer)

### Location Information:
- **country**: Country code (Integer)
- **country_txt**: Country name (String)
- **region**: Region code (Integer)
- **region_txt**: Region name (String)
- **provstate**: Province/State (String, nullable)
- **city**: City (String, nullable)
- **latitude**: Latitude coordinate (Float)
- **longitude**: Longitude coordinate (Float)

### Attack Details:
- **attacktype1**: Attack type code (Integer)
- **attacktype1_txt**: Attack type (String)
- **suicide**: Suicide attack indicator (1/0, Integer)
- **targtype1**: Target type code (Integer)
- **targtype1_txt**: Target type (String)

### Casualties and Damage:
- **nkill**: Number of killed (Float)
- **nwound**: Number of wounded (Float)
- **property**: Property damage indicator (1/0, Integer)

### Ransom Details:
- **ransom**: Ransom demanded indicator (1/0, Integer)

### Source and Logistical Information:
- **dbsource**: Data source (String)
- **INT_LOG**: Logistical support indicator (1/0, Integer)
- **INT_IDEO**: Ideological support indicator (1/0, Integer)
- **INT_MISC**: Miscellaneous support indicator (1/0, Integer)


This concise grouping highlights the key aspects of the dataset, categorized into event, location, attack, casualties, ransom, source, and related event information.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for column in gtds_df.columns:
    unique_values = gtds_df[column].unique()
    print(f"Unique values for {column}:")
    print(unique_values)
    print()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

# Step 1: List essential columns required for analysis
# essential_columns = [
#     'eventid', 'iyear', 'imonth', 'iday', 'country', 'country_txt', 'region', 'region_txt',
#     'provstate', 'city', 'latitude', 'longitude', 'attacktype1', 'attacktype1_txt', 'suicide',
#     'targtype1', 'targtype1_txt', 'nkill', 'nwound', 'property', 'ransom',
#     'dbsource', 'INT_LOG', 'INT_IDEO', 'INT_MISC']

# Step 2: Retain only the essential columns and drop the rest
# df = df[essential_columns]

# Step 3: Verify the structure of the cleaned dataset
# df.info()  # Provides an overview of columns and non-null counts

# Step 4: Handle missing values for numeric columns
# Fill numeric columns with the mean where appropriate (e.g., 'nkill', 'nwound', etc.)
# numeric_columns = ['nkill', 'nwound', 'latitude', 'longitude', 'ransom']  # Specify numeric columns

# for col in numeric_columns:
#     if col in df.columns:  # Ensure the column exists in the dataset
#         df[col] = df[col].fillna(df[col].mean())  # Fill missing values with mean

# Step 5: Handle missing values for skewed numeric columns (if any)
# In this case, no skewed columns remain, so this block is left empty
# skewed_columns = []  # No skewed columns based on prior analysis
# for col in skewed_columns:
#     if col in df.columns:
#         df[col] = df[col].fillna(df[col].median())  # Fill missing values with median

# Step 6: Handle missing values for categorical columns
# Fill missing values in categorical columns with the mode (most frequent value)
# categorical_columns = ['city', 'provstate']  # Specify categorical columns
# for col in categorical_columns:
#     if col in df.columns:  # Ensure the column exists in the dataset
#         df[col] = df[col].fillna(df[col].mode()[0])  # Fill missing values with mode

# Step 7: Verify the dataset after filling missing values
# df.info()  # Check dataset structure and confirm missing values are handled

# Step 8: Save the cleaned dataset to a CSV file
# df

### What all manipulations have you done and insights you found?

### Data Manipulations Performed

1. **Column Selection**  
   - Retained only essential columns required for the analysis, focusing on the key variables relevant to the dataset's objectives. Non-essential columns like `related` and `ransomamt` were removed to simplify the dataset.

2. **Handling Missing Values**  
   - **Numeric Columns**: Filled missing values with the mean for columns like `nkill`, `nwound`, `latitude`, `longitude`, and `ransom` to ensure consistent analysis while maintaining the dataset's integrity.  
   - **Categorical Columns**: For columns like `city` and `provstate`, missing values were replaced with the mode (most frequently occurring value) to preserve their categorical nature.  
   - **Skewed Columns**: Addressed potential outliers by considering median-based imputation, though no such columns remained after earlier cleaning.

3. **Data Validation**  
   - Ensured that all manipulations (missing value handling and column filtering) were applied only to the remaining essential columns by checking dataset structure after each step.

4. **Dataset Export**  
   - Saved the cleaned dataset into a CSV file, ensuring it was ready for analysis without unnecessary columns or null values.

---

### Insights Found

1. **Streamlined Dataset**  
   - The dataset is now more focused, containing only relevant columns, making analysis more efficient and targeted.

2. **Data Completeness**  
   - Missing data was handled appropriately, ensuring no gaps remained in numeric or categorical variables. This enhanced the dataset's usability for analysis.

3. **Readiness for Analysis**  
   - The cleaned dataset is now balanced and free of unnecessary variables, making it suitable for exploratory data analysis (EDA) and further processing like modeling or visualization.

These manipulations ensured the dataset was prepped for deriving meaningful insights and making informed business decisions.

## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 Yearly Trend of Events

In [None]:
plt.figure(figsize=(12, 6))
sns.countplot(data=gtds_df, x='iyear', palette='viridis', hue='iyear', legend=False)
plt.title('Yearly Trend of Events')
plt.xticks(rotation=90)
plt.xlabel('Year')
plt.ylabel('Number of Events')
plt.show()

##### 1. Why did you pick the specific chart?

This chart provides a clear visual representation of the historical trend in the number of events over time, allowing us to identify patterns and potential growth areas.

##### 2. What is/are the insight(s) found from the chart?

- Upward Trend: There's a general upward trend in the number of events over the years, indicating increasing interest or demand.
- Accelerated Growth: The growth rate seems to have accelerated in recent years, suggesting a potential surge in popularity or market expansion.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

- Plan for Growth: Anticipate increased demand and allocate resources accordingly.
- Identify Opportunities: Explore new markets or event types to capitalize on the growing interest.
- Optimize Operations: Improve efficiency to handle the increasing number of events.
> Negative Growth Insights:
While there's no explicit negative growth period shown, potential negative impacts could arise from:

- Market Saturation: If the market becomes oversaturated with events, competition could intensify, leading to decreased profits or market share.
- Economic Downturns: Economic recessions or financial instability can reduce disposable income, leading to fewer event attendees and lower revenue.
- Unexpected Disruptions: External factors like natural disasters or global crises can disrupt event plans and negatively impact the industry.

#### Chart - 2 Monthly Distribution of Events

In [None]:
plt.figure(figsize=(8, 5))
sns.countplot(data=gtds_df, x='imonth', palette='coolwarm', hue='imonth', legend=False)
plt.title('Monthly Distribution of Events')
plt.xlabel('Month')
plt.ylabel('Number of Events')
plt.show()

##### 1. Why did you pick the specific chart?

This chart illustrates the distribution of events across different months of the year, helping us identify seasonal trends and peak periods.

##### 2. What is/are the insight(s) found from the chart?

- Seasonal Variation: There's a clear seasonal pattern, with a peak in the summer months (around June and July) and a dip in the winter months (around December and January).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Understanding this seasonal pattern can help businesses:

- Staffing and Resource Allocation: Adjust staffing levels and resource allocation to match peak and off-peak periods.
- Marketing and Promotions: Target marketing campaigns and promotional offers to specific months to maximize impact.
- Inventory Management: Optimize inventory levels to meet fluctuating demand.
2. Negative Growth Insights:

- Off-Peak Months: The lower number of events during off-peak months could lead to decreased revenue and potential underutilization of resources.
- Competition: If competitors also target peak months with aggressive marketing and promotions, it could lead to increased competition and reduced market share.

#### Chart - 3 Regional Distribution of Events

In [None]:
plt.figure(figsize=(10, 7))
gtds_df['region_txt'].value_counts().plot(kind='pie', autopct='%1.1f%%', colors=sns.color_palette('tab10'))
plt.title('Regional Distribution of Events')
plt.ylabel('')  # Hides y-label
plt.show()

##### 1. Why did you pick the specific chart?

This chart provides a visual representation of the geographical distribution of events across different regions, allowing us to identify key markets and opportunities.

##### 2. What is/are the insight(s) found from the chart?

- Dominant Regions: The Middle East & North Africa and South Asia appear to be the dominant regions for events, accounting for nearly half of the total share.
- Diverse Distribution: Events are distributed across various regions, indicating a global reach.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Understanding the regional distribution can help businesses:

- Market Targeting: Focus marketing efforts and resources on high-potential regions like the Middle East & North Africa and South Asia.
- Localization: Adapt products and services to cater to the specific needs and preferences of different regions.
- Global Expansion: Identify opportunities for expansion into emerging markets with high growth potential.
2. Negative Growth Insights:

- Regional Disparities: The uneven distribution of events across regions could lead to under-representation of certain markets, potentially missing out on growth opportunities.
- Economic and Political Factors: Economic instability or political unrest in certain regions can disrupt event planning and negatively impact the industry.

#### Chart - 4 Top 10 Affected Countries

In [None]:
plt.figure(figsize=(10, 6))
top_countries = gtds_df['country_txt'].value_counts().head(10)
sns.barplot(x=top_countries.values, y=top_countries.index, palette='Set2', hue=top_countries.index, legend=False)
plt.title('Top 10 Affected Countries')
plt.xlabel('Number of Events')
plt.ylabel('Countries')
plt.show()

##### 1. Why did you pick the specific chart?

This chart provides a ranking of the top 10 countries based on the number of events, allowing us to identify the most affected regions.

##### 2. What is/are the insight(s) found from the chart?

- Iraq as Top Affected Country: Iraq is the country with the highest number of events, indicating a significant level of impact.
- Concentration in Certain Regions: The top 10 list is dominated by countries from specific regions, suggesting potential regional trends or factors.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Understanding the top affected countries can help businesses:

- Risk Assessment: Identify countries with higher risk and implement appropriate mitigation strategies.
- Resource Allocation: Prioritize support and resources for affected regions.
- Crisis Management: Develop contingency plans to address potential disruptions caused by events in these countries.

2. Negative Growth Insights:

- Operational Disruptions: Events in these top 10 countries can lead to disruptions in supply chains, transportation, and other operations, negatively impacting business activities.
- Reputational Damage: Association with negatively impacted regions can damage a company's reputation and customer trust.
- Financial Losses: Events can lead to financial losses due to property damage, business interruption, and increased insurance costs.

#### Chart - 5 Most Common Attack Types

In [None]:
plt.figure(figsize=(10, 6))
sns.countplot(data=gtds_df, y='attacktype1_txt', order=gtds_df['attacktype1_txt'].value_counts().index, palette='Set1', hue='attacktype1_txt', legend=False)
plt.title('Most Common Attack Types')
plt.xlabel('Number of Events')
plt.ylabel('Attack Type')
plt.show()

##### 1. Why did you pick the specific chart?

This chart provides a visual representation of the frequency of different attack types, allowing us to identify the most prevalent threats.

##### 2. What is/are the insight(s) found from the chart?

- Dominance of Bombing/Explosion: Bombing/Explosion is the most common type of attack, accounting for a significant portion of the total events.
- Armed Assault as Second Most Common: Armed assault is the second most frequent type of attack.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Understanding the most common attack types can help businesses:

- Security Planning: Prioritize security measures to mitigate the risks associated with the most prevalent attack types.
- Crisis Management: Develop specific response plans for different types of attacks.
- Employee Training: Train employees on how to respond to various threats and emergencies.


2. Negative Growth Insights:

- Increased Security Costs: Implementing robust security measures to counter the most common attack types can increase operational costs.
- Negative Publicity: High-profile attacks can damage a company's reputation and customer trust.
- Disruption of Operations: Attacks can disrupt business operations, leading to financial losses and decreased productivity.

#### Chart - 6 Target Types vs. Number of Events

In [None]:
plt.figure(figsize=(12, 8))
sns.countplot(data=gtds_df, y='targtype1_txt', order=gtds_df['targtype1_txt'].value_counts().index, palette='magma', hue='targtype1_txt', legend=False)
plt.title('Target Types vs. Number of Events')
plt.xlabel('Number of Events')
plt.ylabel('Target Type')
plt.show()

##### 1. Why did you pick the specific chart?

This chart provides a visual representation of the frequency of different target types, allowing us to identify the most targeted sectors.

##### 2. What is/are the insight(s) found from the chart?

- Private Citizens & Property as Primary Target: Private citizens and property are the most frequent targets of attacks.
- Government as Significant Target: Government entities, including general government and diplomatic targets, are also frequently targeted.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Understanding the most targeted sectors can help businesses:

- Risk Assessment: Identify sectors with higher risk and implement specific security measures.
- Crisis Management: Develop tailored response plans for different types of targets.
- Insurance Coverage: Review insurance policies to ensure adequate coverage for potential losses.


2. Negative Growth Insights:

- Increased Security Costs: Implementing robust security measures to protect against attacks can increase operational costs.
- Disruption of Operations: Attacks on critical infrastructure like utilities or transportation can disrupt business operations.
- Loss of Customer Confidence: Attacks on businesses or tourist destinations can damage brand reputation and deter customers.

#### Chart - 7 Geographic Distribution of Events

In [None]:
# Create a scatter mapbox plot
fig = px.scatter_mapbox(gtds_df, lat='latitude', lon='longitude', color_discrete_sequence=["red"], title="Geographic Distribution of Events")

# Set map style and layout
fig.update_layout(mapbox_style="open-street-map", mapbox_zoom=2)

# Show the plot
fig.show()

##### 1. Why did you pick the specific chart?

This chart provides a visual representation of the geographical distribution of events across the globe, allowing us to identify hotspots and patterns.

##### 2. What is/are the insight(s) found from the chart?

- Concentration in Certain Regions: The map highlights clusters of events in specific regions, indicating areas with higher levels of activity.
- Global Spread: Events are scattered across the globe, demonstrating a widespread occurrence.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Understanding the geographic distribution can help businesses:

- Risk Assessment: Identify regions with higher risk and adjust operations accordingly.
- Business Continuity Planning: Develop contingency plans to address potential disruptions in affected areas.
- Global Operations: Make informed decisions about expanding into or operating in certain regions.


2. Negative Growth Insights:

- Operational Disruptions: Events in key regions can disrupt supply chains, transportation, and other operations, negatively impacting business activities.
- Reputational Damage: Association with negatively impacted regions can damage a company's reputation and customer trust.
- Financial Losses: Events can lead to financial losses due to property damage, business interruption, and increased insurance costs.

#### Chart - 8 Number of Suicidal Attacks by Region

In [None]:
plt.figure(figsize=(12, 6))
suicidal_attacks = gtds_df[gtds_df['suicide'] == 1]['region_txt'].value_counts()

# Set hue to the same variable as y (i.e., 'region_txt') to avoid the warning
sns.barplot(x=suicidal_attacks.values, y=suicidal_attacks.index, palette='coolwarm', hue=suicidal_attacks.index, legend=False)

plt.title('Number of Suicidal Attacks by Region')
plt.xlabel('Number of Events')
plt.ylabel('Region')
plt.show()

##### 1. Why did you pick the specific chart?

This chart provides a visual representation of the distribution of suicidal attacks across different regions, allowing us to identify the most affected areas.

##### 2. What is/are the insight(s) found from the chart?

- Middle East & North Africa as Top Region: The Middle East & North Africa region has the highest number of suicidal attacks.
- South Asia as Second Most Affected: South Asia follows closely behind as the second most affected region.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Understanding the regional distribution of suicidal attacks can help businesses:

- Risk Assessment: Identify regions with higher risk and implement specific security measures.
- Crisis Management: Develop contingency plans to address potential threats and disruptions.
- Employee Safety: Provide training and resources to employees working in high-risk regions.


2. Negative Growth Insights:

- Operational Disruptions: Suicidal attacks in key regions can disrupt supply chains, transportation, and other operations, negatively impacting business activities.
- Reputational Damage: Association with negatively impacted regions can damage a company's reputaions and effect foundations negativly.

#### Chart - 9 Trends in Ransom Demands Over Time

In [None]:
# Filter out non-positive ransom values for trend analysis
gtds_df_cleaned = gtds_df[gtds_df['ransom'] > 0]
ransom_trend = gtds_df_cleaned.groupby('iyear')['ransom'].sum()
plt.plot(ransom_trend.index, ransom_trend.values, marker='o', color='blue')
plt.title('Trends in Ransom Demands Over Time')
plt.xlabel('Year')
plt.ylabel('Total Ransom Demands')
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

This chart provides a visual representation of the historical trend in ransom demands over time, allowing us to identify patterns and potential growth areas.

##### 2. What is/are the insight(s) found from the chart?

- Fluctuating Trend: The trend in ransom demands is not consistently increasing. It shows periods of high demand followed by periods of low demand.
- Recent Spike: There has been a significant spike in ransom demands in recent years.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Understanding this trend can help businesses:

- Risk Assessment: Identify periods of higher risk and implement additional security measures.
- Insurance: Review insurance coverage to ensure adequate protection against ransom demands.
- Incident Response: Develop a comprehensive incident response plan to minimize the impact of a ransomware attack.


2. Negative Growth Insights:

- Financial Losses: Successful ransomware attacks can lead to significant financial losses due to data recovery costs, business disruption, and reputational damage.
- Operational Disruptions: Ransomware attacks can disrupt business operations, leading to delays, productivity losses, and customer dissatisfaction.
- Data Breach: In some cases, ransomware attacks can lead to data breaches, exposing sensitive information and potentially resulting in regulatory fines and legal liabilities.

#### Chart - 10 Event Source Distribution

In [None]:
plt.figure(figsize=(10, 6))
# Get the top 10 sources
dbsource_counts = gtds_df['dbsource'].value_counts(normalize=True).head(10) * 100
# Create the barplot with `hue=y` and `legend=False`
sns.barplot(x=dbsource_counts.values, y=dbsource_counts.index, palette='pastel', hue=dbsource_counts.index, legend=False)
plt.title('Top 10 Event Sources Distribution (in %)')
plt.xlabel('Percentage of Events')
plt.ylabel('Source')
plt.show()

##### 1. Why did you pick the specific chart?

This chart provides a visual representation of the distribution of events across different sources, allowing us to identify the primary contributors to the data.

##### 2. What is/are the insight(s) found from the chart?

- Dominance of START Primary Collection: The START Primary Collection is the primary source of events, contributing a significant majority.
- PGIS as Second Largest Source: PGIS is the second largest source of events.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Understanding the event source distribution can help businesses:

- Data Reliability: Assess the reliability and accuracy of the data by considering the source.
- Data Completeness: Identify potential gaps in data coverage from certain sources.
- Data Integration: Integrate data from multiple sources to gain a more comprehensive understanding of the events.


2. Negative Growth Insights:

- Data Bias: Reliance on a single source can introduce bias into the analysis, potentially leading to inaccurate conclusions.
- Data Quality: The quality of data from different sources may vary, affecting the overall analysis.
- Data Privacy and Security: Using data from multiple sources raises concerns about data privacy and security, requiring careful handling and protection.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Based on the analysis of the provided charts and graphs, the client can achieve their business objectives by leveraging the following insights:

**1. Identifying Key Regions and Target Markets:**
   * **Prioritize High-Risk Regions:** Focus on regions with a high concentration of events, particularly those with a high incidence of specific attack types or targets.
   * **Target Emerging Markets:** Identify emerging markets with high growth potential but lower risk profiles.
   * **Adapt to Local Conditions:** Tailor business strategies and security measures to specific regional contexts.

**2. Understanding Seasonal Trends and Peak Periods:**
   * **Optimize Resource Allocation:** Adjust staffing and resource allocation to match peak and off-peak periods.
   * **Time Marketing Campaigns:** Target marketing campaigns and promotional offers to specific months to maximize impact.
   * **Inventory Management:** Optimize inventory levels to meet fluctuating demand.

**3. Monitoring and Responding to Emerging Threats:**
   * **Track Ransomware Trends:** Stay updated on the latest ransomware trends and implement preventive measures.
   * **Develop Incident Response Plans:** Create comprehensive incident response plans to minimize the impact of attacks.
   * **Invest in Cybersecurity:** Prioritize cybersecurity investments to protect against cyber threats.

**4. Leveraging Data for Informed Decision Making:**
   * **Data Quality and Integration:** Ensure data quality and integrate data from multiple sources to gain a comprehensive view.
   * **Data-Driven Insights:** Use data analytics to identify patterns, trends, and actionable insights.
   * **Continuous Monitoring and Evaluation:** Regularly monitor and evaluate the effectiveness of security measures and business strategies.

By effectively addressing these areas, the client can enhance their business resilience, mitigate risks, and achieve their long-term objectives.


# **Conclusion**

In conclusion, the analysis of the provided charts and graphs has revealed valuable insights into the global landscape of security threats and events. By understanding the geographical distribution, temporal patterns, and specific attack types, organizations can make informed decisions to enhance their security posture and mitigate potential risks.

Key takeaways from the analysis include the identification of high-risk regions, the importance of considering seasonal variations, and the need for robust cybersecurity measures to combat emerging threats like ransomware. By leveraging these insights and implementing proactive security strategies, organizations can safeguard their assets, protect their employees, and ensure business continuity in an increasingly complex and volatile world.
