# parse_data.ipynb

This notebook parses the data files used for the FP-2 assignment. 

<br>
<br>

First, let's read the attached data file and summarize its numeric columns with `describe()`:

In [37]:
import pandas as pd

df0 = pd.read_csv('climate_factors_and_disasters.csv')

df0.describe()

| | Year | avg_temperature | Average precipitation in depth (mm per year) | Carbon dioxide (CO2) emissions excluding LULUCF per capita (t CO2e/capita) | Forest area (% of land area) | Renewable energy consumption (% of total final energy consumption) | deaths_drought | injured_drought | total_damages_drought | deaths_flood | ... | total_damages_flood | deaths_storm | injured_storm | total_damages_storm | deaths_wildfire | injured_wildfire | total_damages_wildfire | deaths_temperature | injured_temperature | total_damages_temperature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 480.0 | 440.0 | 440.0 | 460.0 | 460.0 | 460.0 | 101.0 | 101.0 | 101.0 | 438.0 | ... | 438.0 | 297.0 | 297.0 | 297.0 | 91.0 | 91.0 | 91.0 | 154.0 | 154.0 | 154.0 |
| mean | 2009.5 | 18.361273 | 1310.163636 | 4.650892 | 30.647219 | 31.512913 | 213.19802 | 0.633663 | 2094242.0 | 409.780822 | ... | 2206292.0 | 839.43771 | 2070.525253 | 7493842.0 | 24.186813 | 151.274725 | 1677343.0 | 1502.909091 | 26450.05 | 535396.9 |
| std | 5.772297 | 8.235418 | 813.383146 | 5.666606 | 18.650784 | 22.636024 | 1989.574286 | 4.480453 | 4272108.0 | 1219.832328 | ... | 7335758.0 | 8228.135098 | 9271.537386 | 26385140.0 | 47.484012 | 385.991568 | 4130856.0 | 7847.514594 | 204357.8 | 2687169.0 |
| min | 2000.0 | -4.87 | 327.0 | 0.037898 | 1.852782 | 3.5 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 2004.75 | 13.335 | 645.0 | 0.760537 | 14.64803 | 11.1 | 0.0 | 0.0 | 0.0 | 8.0 | ... | 0.0 | 6.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 14.0 | 0.0 | 0.0 |
| 50% | 2009.5 | 21.495 | 975.0 | 1.859517 | 31.225458 | 28.5 | 0.0 | 0.0 | 200000.0 | 41.0 | ... | 18000.0 | 40.0 | 60.0 | 175000.0 | 4.0 | 3.0 | 100000.0 | 61.0 | 0.0 | 0.0 |
| 75% | 2014.25 | 25.2425 | 1738.0 | 5.876323 | 40.863619 | 45.225 | 0.0 | 0.0 | 2910000.0 | 186.75 | ... | 724000.0 | 169.0 | 400.0 | 2015000.0 | 20.0 | 124.5 | 1507500.0 | 293.75 | 200.0 | 0.0 |
| max | 2019.0 | 27.7 | 3240.0 | 21.012618 | 68.493827 | 91.3 | 20000.0 | 32.0 | 25481200.0 | 9819.0 | ... | 70759050.0 | 140985.0 | 92028.0 | 272939000.0 | 221.0 | 2292.0 | 22802000.0 | 74698.0 | 1800413.0 | 21940000.0 |
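
The `count` row differs across columns (480 for `Year` but only 101 for `deaths_drought`, for example). Since `describe()` counts only non-null entries, this suggests that many columns contain missing values. A minimal sketch for tallying them per column is shown below; the small `toy` frame and its values are made up for illustration and merely stand in for `df0`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df0; the values are illustrative only.
toy = pd.DataFrame(
    {
        "Year": [2000, 2001, 2002, 2003],
        "avg_temperature": [18.2, np.nan, 18.9, 19.1],
        "deaths_drought": [np.nan, 12.0, np.nan, np.nan],
    }
)

# count() skips nulls, so non_null + missing equals the number of rows.
non_null = toy.count()
missing = toy.isna().sum()

print(pd.DataFrame({"non_null": non_null, "missing": missing}))
```

Running the same two calls on `df0` should show how many rows each disaster column actually covers.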


<br>
<br>

The dependent and independent variables (DVs and IVs) that we are interested in are:

**DVs**:
- Number of deaths from natural disasters  
  (e.g., deaths caused by floods, droughts, and storms; see the `deaths_flood`, `deaths_drought`, and `deaths_storm` columns in the CSV file)

**IVs**:
- Average temperature (`avg_temperature` column in the CSV file)
- Average precipitation (`Average precipitation in depth (mm per year)` column in the CSV file)
- Carbon dioxide emissions per capita (`Carbon dioxide (CO2) emissions excluding LULUCF per capita (t CO2e/capita)` column in the CSV file)
- Forest area as a percentage of land area (`Forest area (% of land area)` column in the CSV file)
- Renewable energy consumption (`Renewable energy consumption (% of total final energy consumption)` column in the CSV file)
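
Because several of these column names are long, it can help to confirm that every name exists in the data before selecting columns. The sketch below assumes the names shown in the `describe()` output above; `missing_columns` is a hypothetical helper, and `toy` is an empty stand-in frame rather than the real data.

```python
import pandas as pd

# Column names as they appear in the describe() output above.
iv_columns = [
    "avg_temperature",
    "Average precipitation in depth (mm per year)",
    "Carbon dioxide (CO2) emissions excluding LULUCF per capita (t CO2e/capita)",
    "Forest area (% of land area)",
    "Renewable energy consumption (% of total final energy consumption)",
]
dv_columns = ["deaths_flood", "deaths_drought", "deaths_storm"]


def missing_columns(frame, wanted):
    """Return the names in `wanted` that are not columns of `frame`."""
    return [name for name in wanted if name not in frame.columns]


# Hypothetical empty frame whose header lacks one of the DV columns.
toy = pd.DataFrame(columns=iv_columns + dv_columns[:2])
print(missing_columns(toy, iv_columns + dv_columns))
```

An empty list from `missing_columns(df0, iv_columns + dv_columns)` would indicate that all eight columns are available.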



<br>
<br>

Let's extract the relevant columns:


In [38]:
df = df0[
    [
        "avg_temperature",
        "Average precipitation in depth (mm per year)",
        "Carbon dioxide (CO2) emissions excluding LULUCF per capita (t CO2e/capita)",
        "Forest area (% of land area)",
        "Renewable energy consumption (% of total final energy consumption)",
        "deaths_flood",
        "deaths_drought",
        "deaths_storm"
    ]
]

df.describe()


| | avg_temperature | Average precipitation in depth (mm per year) | Carbon dioxide (CO2) emissions excluding LULUCF per capita (t CO2e/capita) | Forest area (% of land area) | Renewable energy consumption (% of total final energy consumption) | deaths_flood | deaths_drought | deaths_storm |
|---|---|---|---|---|---|---|---|---|
| count | 440.0 | 440.0 | 460.0 | 460.0 | 460.0 | 438.0 | 101.0 | 297.0 |
| mean | 18.361273 | 1310.163636 | 4.650892 | 30.647219 | 31.512913 | 409.780822 | 213.19802 | 839.43771 |
| std | 8.235418 | 813.383146 | 5.666606 | 18.650784 | 22.636024 | 1219.832328 | 1989.574286 | 8228.135098 |
| min | -4.87 | 327.0 | 0.037898 | 1.852782 | 3.5 | 0.0 | 0.0 | 0.0 |
| 25% | 13.335 | 645.0 | 0.760537 | 14.64803 | 11.1 | 8.0 | 0.0 | 6.0 |
| 50% | 21.495 | 975.0 | 1.859517 | 31.225458 | 28.5 | 41.0 | 0.0 | 40.0 |
| 75% | 25.2425 | 1738.0 | 5.876323 | 40.863619 | 45.225 | 186.75 | 0.0 | 169.0 |
| max | 27.7 | 3240.0 | 21.012618 | 68.493827 | 91.3 | 9819.0 | 20000.0 | 140985.0 |


<br>
<br>

Next, let's use the DataFrame `rename` method to give the columns simpler names:

In [39]:
df = df.rename(
    columns={
        "avg_temperature": "temperature",
        "Average precipitation in depth (mm per year)": "precipitation",
        "Carbon dioxide (CO2) emissions excluding LULUCF per capita (t CO2e/capita)": "co2_per_capita",
        "Forest area (% of land area)": "forest_area",
        "Renewable energy consumption (% of total final energy consumption)": "renewable_energy",
        "deaths_flood": "flood_deaths",
        "deaths_drought": "drought_deaths",
        "deaths_storm": "storm_deaths"
    }
)

df.describe()

| | temperature | precipitation | co2_per_capita | forest_area | renewable_energy | flood_deaths | drought_deaths | storm_deaths |
|---|---|---|---|---|---|---|---|---|
| count | 440.0 | 440.0 | 460.0 | 460.0 | 460.0 | 438.0 | 101.0 | 297.0 |
| mean | 18.361273 | 1310.163636 | 4.650892 | 30.647219 | 31.512913 | 409.780822 | 213.19802 | 839.43771 |
| std | 8.235418 | 813.383146 | 5.666606 | 18.650784 | 22.636024 | 1219.832328 | 1989.574286 | 8228.135098 |
| min | -4.87 | 327.0 | 0.037898 | 1.852782 | 3.5 | 0.0 | 0.0 | 0.0 |
| 25% | 13.335 | 645.0 | 0.760537 | 14.64803 | 11.1 | 8.0 | 0.0 | 6.0 |
| 50% | 21.495 | 975.0 | 1.859517 | 31.225458 | 28.5 | 41.0 | 0.0 | 40.0 |
| 75% | 25.2425 | 1738.0 | 5.876323 | 40.863619 | 45.225 | 186.75 | 0.0 | 169.0 |
| max | 27.7 | 3240.0 | 21.012618 | 68.493827 | 91.3 | 9819.0 | 20000.0 | 140985.0 |
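
One caveat with `rename`: by default it silently skips mapping keys that match no column, so a typo in a long column name would go unnoticed. The sketch below illustrates this using pandas' `errors="raise"` option; the one-row `toy` frame and the misspelled key `deaths_floods` are made up for illustration.

```python
import pandas as pd

# Hypothetical one-row frame; the value is illustrative only.
toy = pd.DataFrame({"deaths_flood": [41.0]})

# Default behavior (errors="ignore"): the misspelled key is silently skipped.
unchanged = toy.rename(columns={"deaths_floods": "flood_deaths"})
print(list(unchanged.columns))

# With errors="raise", the same typo raises a KeyError instead.
try:
    toy.rename(columns={"deaths_floods": "flood_deaths"}, errors="raise")
    caught = False
except KeyError:
    caught = True
print(caught)
```

Passing `errors="raise"` to the `rename` call above would be one way to guard against such typos.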
