# Climate Events Dataset - Column Descriptions

| Column | Description |
|--------|-------------|
| **event_id** | Unique identifier for each climate event |
| **date** | Date on which the event occurred (YYYY-MM-DD format) |
| **year** | Year in which the event occurred |
| **month** | Month in which the event occurred (1-12) |
| **country** | Country where the climate event took place |
| **event_type** | Type of climate event, including Tsunami, Hurricane, Drought, Heatwave, Wildfire, Earthquake, Cold Wave, and Flood (list not exhaustive) |
| **severity** | Severity rating of the event on a scale of 1-10 |
| **duration_days** | Duration of the event measured in days |
| **affected_population** | Total number of people affected by the event |
| **deaths** | Number of fatalities caused by the event |
| **injuries** | Number of people injured during the event |
| **economic_impact_million_usd** | Economic impact of the event in millions of USD |
| **infrastructure_damage_score** | Infrastructure damage assessment score (0-100 scale) |
| **response_time_hours** | Emergency response time measured in hours |
| **international_aid_million_usd** | Amount of international aid received in millions of USD |
| **latitude** | Latitude coordinate of the event location |
| **longitude** | Longitude coordinate of the event location |
| **total_casualties** | Combined total of deaths and injuries |
| **impact_per_capita** | Economic impact per affected person, in USD (`economic_impact_million_usd` × 1,000,000 ÷ `affected_population`) |
| **aid_percentage** | Percentage of economic impact covered by international aid |
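The three derived columns can be recomputed from the raw ones, which is a quick way to sanity-check the file. The sketch below uses the raw values from the first five rows of `df.head()`. The `total_casualties` and `impact_per_capita` formulas reproduce those rows (after rounding to two decimals); the `aid_percentage` formula (aid ÷ impact × 100, with 0 where impact is 0) is an assumption based only on the column description, and `add_derived_columns` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Raw columns of the first five rows, copied from df.head()
sample = pd.DataFrame({
    "deaths": [0, 1, 0, 2, 2],
    "injuries": [2, 10, 9, 37, 27],
    "affected_population": [420956, 3276, 120382, 185527, 176642],
    "economic_impact_million_usd": [0.01, 0.0, 0.1, 1.27, 2.01],
    "international_aid_million_usd": [0.0, 0.0, 0.0, 0.0, 0.0],
})


def add_derived_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that recomputes the derived columns."""
    out = frame.copy()
    out["total_casualties"] = out["deaths"] + out["injuries"]
    # Convert millions of USD to USD, then divide by the number of people affected
    out["impact_per_capita"] = (
        out["economic_impact_million_usd"] * 1_000_000 / out["affected_population"]
    ).round(2)
    # ASSUMED definition: aid as a percentage of economic impact, 0 where impact is 0
    impact = out["economic_impact_million_usd"]
    out["aid_percentage"] = np.where(
        impact > 0,
        out["international_aid_million_usd"] / impact.where(impact > 0) * 100,
        0.0,
    )
    return out


print(add_derived_columns(sample)[["total_casualties", "impact_per_capita", "aid_percentage"]])
```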

In [1]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [4]:
df = pd.read_csv(r'../data/global_climate_events_economic_impact_2020_2025.csv')

In [5]:
df.head()

,event_id,date,year,month,country,event_type,severity,duration_days,affected_population,deaths,injuries,economic_impact_million_usd,infrastructure_damage_score,response_time_hours,international_aid_million_usd,latitude,longitude,total_casualties,impact_per_capita,aid_percentage
0,EV01539,2020-01-01,2020,1,Japan,Tsunami,1,1,420956,0,2,0.01,4.9,11,0.0,85.4321,138.7206,2,0.02,0.0
1,EV02303,2020-01-01,2020,1,Qatar,Hurricane,1,4,3276,1,10,0.0,3.4,5,0.0,-32.037,14.0111,11,0.0,0.0
2,EV01796,2020-01-02,2020,1,Canada,Drought,3,6,120382,0,9,0.1,8.9,10,0.0,78.4213,-112.7556,9,0.83,0.0
3,EV00175,2020-01-02,2020,1,Poland,Heatwave,6,16,185527,2,37,1.27,17.8,7,0.0,73.6564,115.065,39,6.85,0.0
4,EV01115,2020-01-03,2020,1,UAE,Wildfire,4,16,176642,2,27,2.01,18.7,17,0.0,52.6458,101.5023,29,11.38,0.0


In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 20 columns):
 #   Column                         Non-Null Count  Dtype         
---  ------                         --------------  -----         
 0   event_id                       3000 non-null   object        
 1   date                           3000 non-null   datetime64[ns]
 2   year                           3000 non-null   datetime64[ns]
 3   month                          3000 non-null   datetime64[ns]
 4   country                        3000 non-null   object        
 5   event_type                     3000 non-null   object        
 6   severity                       3000 non-null   int64         
 7   duration_days                  3000 non-null   int64         
 8   affected_population            3000 non-null   int64         
 9   deaths                         3000 non-null   int64         
 10  injuries                       3000 non-null   int64         
 11  economic_impact_m

In [8]:
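# Convert date from text to datetime64
df['date'] = pd.to_datetime(df['date'])
# Convert year to datetime64; each value becomes January 1 of that year (e.g. 2020-01-01)
df['year'] = pd.to_datetime(df['year'], format='%Y')
# Convert month to datetime64; no year is supplied, so pandas fills in 1900 (e.g. 1900-01-01)
df['month'] = pd.to_datetime(df['month'], format='%m')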
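Because `year` and `month` duplicate parts of `date`, turning them into timestamps adds little, and the month timestamps carry a placeholder year of 1900 (visible in the `df.head()` output further down). A minimal sketch on a hypothetical two-row frame that reproduces those conversions and shows one alternative: keeping both columns as integers derived from the parsed `date` through the `.dt` accessor.

```python
import pandas as pd

# Hypothetical rows shaped like the raw CSV: year and month are plain integers
raw = pd.DataFrame({
    "date": ["2020-01-01", "2020-03-15"],
    "year": [2020, 2020],
    "month": [1, 3],
})

# Same conversions as in the notebook cell
parsed_year = pd.to_datetime(raw["year"], format="%Y")    # 2020-01-01, 2020-01-01
parsed_month = pd.to_datetime(raw["month"], format="%m")  # 1900-01-01, 1900-03-01
print(parsed_year.tolist())
print(parsed_month.tolist())

# Alternative: parse date once, then derive integer year and month from it
tidy = raw.copy()
tidy["date"] = pd.to_datetime(tidy["date"])
tidy["year"] = tidy["date"].dt.year
tidy["month"] = tidy["date"].dt.month
print(tidy.dtypes)
```

Integer columns keep grouping and plotting by year or month straightforward, while `date` remains available for anything that needs a real timestamp.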

In [10]:
df.isnull().sum() / len(df) * 100

event_id                         0.0
date                             0.0
year                             0.0
month                            0.0
country                          0.0
event_type                       0.0
severity                         0.0
duration_days                    0.0
affected_population              0.0
deaths                           0.0
injuries                         0.0
economic_impact_million_usd      0.0
infrastructure_damage_score      0.0
response_time_hours              0.0
international_aid_million_usd    0.0
latitude                         0.0
longitude                        0.0
total_casualties                 0.0
impact_per_capita                0.0
aid_percentage                   0.0
dtype: float64

In [11]:
df.duplicated().sum()

np.int64(0)
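`df.duplicated()` only flags rows that match in every column. Since `event_id` is described as a unique identifier, it may also be worth checking that key on its own. In the toy frame below (hypothetical values), one ID appears twice with different countries, so the full-row check reports nothing while the key-level check catches it.

```python
import pandas as pd

# Hypothetical data: EV01539 is repeated, but the two rows differ in country
events = pd.DataFrame({
    "event_id": ["EV01539", "EV02303", "EV01539"],
    "country": ["Japan", "Qatar", "Chile"],
})

full_row_dupes = events.duplicated().sum()        # rows identical in every column
id_dupes = events["event_id"].duplicated().sum()  # repeated identifiers only
print(full_row_dupes, id_dupes)
print(events["event_id"].is_unique)
```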

In [12]:
# Encode event_type as integers in a new event_type_encoding column
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df['event_type_encoding'] = le.fit_transform(df['event_type'])
df.head()

,event_id,date,year,month,country,event_type,severity,duration_days,affected_population,deaths,...,economic_impact_million_usd,infrastructure_damage_score,response_time_hours,international_aid_million_usd,latitude,longitude,total_casualties,impact_per_capita,aid_percentage,event_type_encoding
0,EV01539,2020-01-01,2020-01-01,1900-01-01,Japan,Tsunami,1,1,420956,0,...,0.01,4.9,11,0.0,85.4321,138.7206,2,0.02,0.0,9
1,EV02303,2020-01-01,2020-01-01,1900-01-01,Qatar,Hurricane,1,4,3276,1,...,0.0,3.4,5,0.0,-32.037,14.0111,11,0.0,0.0,6
2,EV01796,2020-01-02,2020-01-01,1900-01-01,Canada,Drought,3,6,120382,0,...,0.1,8.9,10,0.0,78.4213,-112.7556,9,0.83,0.0,1
3,EV00175,2020-01-02,2020-01-01,1900-01-01,Poland,Heatwave,6,16,185527,2,...,1.27,17.8,7,0.0,73.6564,115.065,39,6.85,0.0,5
4,EV01115,2020-01-03,2020-01-01,1900-01-01,UAE,Wildfire,4,16,176642,2,...,2.01,18.7,17,0.0,52.6458,101.5023,29,11.38,0.0,11
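`LabelEncoder` sorts the distinct labels and assigns each one its position in that sorted order, which is why Drought receives a lower code than Hurricane or Tsunami in the output above. A small sketch using only the five event types visible in `df.head()` (so the codes differ from those on the full column) shows how to read the mapping back from `classes_` and invert it:

```python
from sklearn.preprocessing import LabelEncoder

# Only the event types visible in df.head(); the full column has more classes
types = ["Tsunami", "Hurricane", "Drought", "Heatwave", "Wildfire"]
encoder = LabelEncoder()
codes = encoder.fit_transform(types)

# classes_ is sorted, so each label's code is its index in classes_
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)
print(codes.tolist())
print(encoder.inverse_transform(codes).tolist())
```

On the notebook's own encoder, `dict(zip(le.classes_, range(len(le.classes_))))` gives the full event-type mapping, provided `le` has not since been refit on another column.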


In [13]:
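# Encode country with its own LabelEncoder so that le keeps the event_type classes it was fitted on
country_le = LabelEncoder()
df['country_encoding'] = country_le.fit_transform(df['country'])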
df.head()


,event_id,date,year,month,country,event_type,severity,duration_days,affected_population,deaths,...,infrastructure_damage_score,response_time_hours,international_aid_million_usd,latitude,longitude,total_casualties,impact_per_capita,aid_percentage,event_type_encoding,country_encoding
0,EV01539,2020-01-01,2020-01-01,1900-01-01,Japan,Tsunami,1,1,420956,0,...,4.9,11,0.0,85.4321,138.7206,2,0.02,0.0,9,24
1,EV02303,2020-01-01,2020-01-01,1900-01-01,Qatar,Hurricane,1,4,3276,1,...,3.4,5,0.0,-32.037,14.0111,11,0.0,0.0,6,36
2,EV01796,2020-01-02,2020-01-01,1900-01-01,Canada,Drought,3,6,120382,0,...,8.9,10,0.0,78.4213,-112.7556,9,0.83,0.0,1,6
3,EV00175,2020-01-02,2020-01-01,1900-01-01,Poland,Heatwave,6,16,185527,2,...,17.8,7,0.0,73.6564,115.065,39,6.85,0.0,5,34
4,EV01115,2020-01-03,2020-01-01,1900-01-01,UAE,Wildfire,4,16,176642,2,...,18.7,17,0.0,52.6458,101.5023,29,11.38,0.0,11,47
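The integer codes in `country_encoding` are arbitrary: UAE (47) is not "greater" than Japan (24) in any meaningful sense, which matters for models that treat inputs as numeric magnitudes. One common alternative for a nominal column is one-hot encoding. A sketch with `pd.get_dummies`, using the five countries from `df.head()` as sample input:

```python
import pandas as pd

# Countries from the first five rows of df.head()
sample = pd.DataFrame({"country": ["Japan", "Qatar", "Canada", "Poland", "UAE"]})

# One 0/1 indicator column per country; no ordering is implied between countries
one_hot = pd.get_dummies(sample["country"], prefix="country", dtype=int)
print(one_hot.columns.tolist())
print(one_hot.sum(axis=1).tolist())  # exactly one indicator is set per row
```

With roughly fifty countries this adds about fifty columns, so label codes may still be preferable for tree-based models, which are less sensitive to arbitrary code order.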


In [None]:
df.to_csv(r'../data/climate_events_prepared.csv', index=False)
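CSV stores no column types, so the datetime columns written here come back as text unless they are parsed again on load. A self-contained sketch of the round trip, using an in-memory buffer in place of the real file:

```python
import io

import pandas as pd

frame = pd.DataFrame({"event_id": ["EV01539"], "date": pd.to_datetime(["2020-01-01"])})

buffer = io.StringIO()
frame.to_csv(buffer, index=False)

buffer.seek(0)
plain = pd.read_csv(buffer)                        # date comes back as text
buffer.seek(0)
typed = pd.read_csv(buffer, parse_dates=["date"])  # date restored as datetime64
print(plain["date"].dtype, typed["date"].dtype)
```

For the file written above, the equivalent load would be `pd.read_csv(r'../data/climate_events_prepared.csv', parse_dates=['date', 'year', 'month'])`, since `year` and `month` were also saved as timestamps.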