# Brazil_ETL

Here we extract monthly burn data for the State of Amazonas, Brazil.<br>
<br>
We look at a five-year span from 2012 to 2016. The goal is to see <br>
whether the number of forest fires in the region increased and whether the demand <br>
for soybeans and corn is directly linked to it.

### Data files

We use soybean and corn data from Kaggle.com and burn data from <br>
the Global Fire Emissions Database (GFED). GFED data are stored in HDF5 files and <br>
require some drilling down to reach the tables we need: the monthly burned areas.


### Import Libraries



In [1]:
import h5py
import pandas as pd
import numpy as np
import tables
from sqlalchemy import create_engine


### Connect to the database

In [2]:
engine = create_engine('postgresql://sqladmin:password@localhost:5432/brazil_etl')
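
The connection string above embeds the password directly. One alternative, shown here only as a minimal sketch, is to build the same URL at run time from environment variables. The variable names `BRAZIL_ETL_USER` and `BRAZIL_ETL_PASSWORD` and the helper `build_db_url` are assumptions made for this example and are not part of the original notebook.

```python
import os
from urllib.parse import quote_plus


def build_db_url(host="localhost", port=5432, database="brazil_etl"):
    """Build a PostgreSQL URL from environment variables.

    BRAZIL_ETL_USER and BRAZIL_ETL_PASSWORD are hypothetical names chosen
    for this sketch; the fallbacks mirror the values used in the notebook.
    """
    user = os.environ.get("BRAZIL_ETL_USER", "sqladmin")
    password = os.environ.get("BRAZIL_ETL_PASSWORD", "password")
    # Percent-encode the credentials so characters such as '@' or '/'
    # do not break the structure of the URL.
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )
```

The returned string can be passed to `create_engine` in place of the literal URL.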

### Define the data_get function

This opens the HDF5 file and returns its burned_area group.

In [3]:
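def data_get(HDF5_file):
    # Open read-only and keep the file open: the returned group can only
    # be read while its parent file handle is still open.
    hdf = h5py.File(HDF5_file, "r")
    return hdf['burned_area']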
    

### Call the data_get function

In [4]:
# Forward slashes avoid invalid "\G" escape sequences and work on every OS
ba_2016 = data_get("Resources/GFED4_1s_2016.hdf5")
ba_2015 = data_get("Resources/GFED4_1s_2015.hdf5")
ba_2014 = data_get("Resources/GFED4_1s_2014.hdf5")
ba_2013 = data_get("Resources/GFED4_1s_2013.hdf5")
ba_2012 = data_get("Resources/GFED4_1s_2012.hdf5")


### Define the function to drill into the HDF5 file

The table we need is the burned_fraction dataset stored under each month of the year. <br>
We extract this data and put it into a DataFrame.<br>
We then slice the DataFrame to rows 352 to 396 and columns 426 to 493 (`iloc` excludes the end index), a 45 x 68 window.<br>
This window corresponds to the latitude/longitude bounding box that <br>
roughly covers the State of Amazonas, Brazil.
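
The row and column bounds can be related back to coordinates. The sketch below assumes a global 0.25-degree grid of 720 rows by 1440 columns, with row 0 at 90°N and column 0 at 180°W. The helper names are hypothetical, and the layout should be checked against the coordinate information in the GFED files before relying on it.

```python
CELL_DEG = 0.25  # assumed grid resolution in degrees


def lat_to_row(lat):
    """Row index of the grid cell containing latitude `lat` (row 0 starts at 90N)."""
    return int((90.0 - lat) / CELL_DEG)


def lon_to_col(lon):
    """Column index of the grid cell containing longitude `lon` (column 0 starts at 180W)."""
    return int((lon + 180.0) / CELL_DEG)


def bounding_box(row_start, row_stop, col_start, col_stop):
    """Return (north, south, west, east) edges in degrees for an iloc-style slice."""
    north = 90.0 - row_start * CELL_DEG
    south = 90.0 - row_stop * CELL_DEG
    west = -180.0 + col_start * CELL_DEG
    east = -180.0 + col_stop * CELL_DEG
    return north, south, west, east


print(bounding_box(352, 397, 426, 494))
```

Under these assumptions, the slice `iloc[352:397, 426:494]` spans roughly 2.0°N to 9.25°S and 73.5°W to 56.5°W.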

In [5]:
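def drill_down(ba_YYYY,mnth):
    ba_mnth = ba_YYYY[mnth]                # group for one month, e.g. '01'
    bf_mnth = ba_mnth['burned_fraction']   # burned-fraction grid for that month
    bf_mnth_df = pd.DataFrame(bf_mnth)
    # Keep rows 352-396 and columns 426-493: the Amazonas bounding box
    return bf_mnth_df.iloc[352:397,426:494]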

    

### Calling the drill_down function
We call the function for each month, starting with January 2016. <br>
We then load each DataFrame into the brazil_etl database in PostgreSQL, <br>
where we will calculate the percentage of area burned per month, year, and half decade.


#### 2016

In [6]:
bf_2016_01_df = drill_down(ba_2016,'01')
bf_2016_01_df.to_sql('bf_2016_01', engine)

bf_2016_02_df = drill_down(ba_2016,'02')
bf_2016_02_df.to_sql('bf_2016_02', engine)

bf_2016_03_df = drill_down(ba_2016,'03')
bf_2016_03_df.to_sql('bf_2016_03', engine)

bf_2016_04_df = drill_down(ba_2016,'04')
bf_2016_04_df.to_sql('bf_2016_04', engine)

bf_2016_05_df = drill_down(ba_2016,'05')
bf_2016_05_df.to_sql('bf_2016_05', engine)

bf_2016_06_df = drill_down(ba_2016,'06')
bf_2016_06_df.to_sql('bf_2016_06', engine)

bf_2016_07_df = drill_down(ba_2016,'07')
bf_2016_07_df.to_sql('bf_2016_07', engine)

bf_2016_08_df = drill_down(ba_2016,'08')
bf_2016_08_df.to_sql('bf_2016_08', engine)

bf_2016_09_df = drill_down(ba_2016,'09')
bf_2016_09_df.to_sql('bf_2016_09', engine)

bf_2016_10_df = drill_down(ba_2016,'10')
bf_2016_10_df.to_sql('bf_2016_10', engine)

bf_2016_11_df = drill_down(ba_2016,'11')
bf_2016_11_df.to_sql('bf_2016_11', engine)

bf_2016_12_df = drill_down(ba_2016,'12')
bf_2016_12_df.to_sql('bf_2016_12', engine)




#### 2015

In [7]:
bf_2015_01_df = drill_down(ba_2015,'01')
bf_2015_01_df.to_sql('bf_2015_01', engine)

bf_2015_02_df = drill_down(ba_2015,'02')
bf_2015_02_df.to_sql('bf_2015_02', engine)

bf_2015_03_df = drill_down(ba_2015,'03')
bf_2015_03_df.to_sql('bf_2015_03', engine)

bf_2015_04_df = drill_down(ba_2015,'04')
bf_2015_04_df.to_sql('bf_2015_04', engine)

bf_2015_05_df = drill_down(ba_2015,'05')
bf_2015_05_df.to_sql('bf_2015_05', engine)

bf_2015_06_df = drill_down(ba_2015,'06')
bf_2015_06_df.to_sql('bf_2015_06', engine)

bf_2015_07_df = drill_down(ba_2015,'07')
bf_2015_07_df.to_sql('bf_2015_07', engine)

bf_2015_08_df = drill_down(ba_2015,'08')
bf_2015_08_df.to_sql('bf_2015_08', engine)

bf_2015_09_df = drill_down(ba_2015,'09')
bf_2015_09_df.to_sql('bf_2015_09', engine)

bf_2015_10_df = drill_down(ba_2015,'10')
bf_2015_10_df.to_sql('bf_2015_10', engine)

bf_2015_11_df = drill_down(ba_2015,'11')
bf_2015_11_df.to_sql('bf_2015_11', engine)

bf_2015_12_df = drill_down(ba_2015,'12')
bf_2015_12_df.to_sql('bf_2015_12', engine)


#### 2014

In [8]:
bf_2014_01_df = drill_down(ba_2014,'01')
bf_2014_01_df.to_sql('bf_2014_01', engine)

bf_2014_02_df = drill_down(ba_2014,'02')
bf_2014_02_df.to_sql('bf_2014_02', engine)

bf_2014_03_df = drill_down(ba_2014,'03')
bf_2014_03_df.to_sql('bf_2014_03', engine)

bf_2014_04_df = drill_down(ba_2014,'04')
bf_2014_04_df.to_sql('bf_2014_04', engine)

bf_2014_05_df = drill_down(ba_2014,'05')
bf_2014_05_df.to_sql('bf_2014_05', engine)

bf_2014_06_df = drill_down(ba_2014,'06')
bf_2014_06_df.to_sql('bf_2014_06', engine)

bf_2014_07_df = drill_down(ba_2014,'07')
bf_2014_07_df.to_sql('bf_2014_07', engine)

bf_2014_08_df = drill_down(ba_2014,'08')
bf_2014_08_df.to_sql('bf_2014_08', engine)

bf_2014_09_df = drill_down(ba_2014,'09')
bf_2014_09_df.to_sql('bf_2014_09', engine)

bf_2014_10_df = drill_down(ba_2014,'10')
bf_2014_10_df.to_sql('bf_2014_10', engine)

bf_2014_11_df = drill_down(ba_2014,'11')
bf_2014_11_df.to_sql('bf_2014_11', engine)

bf_2014_12_df = drill_down(ba_2014,'12')
bf_2014_12_df.to_sql('bf_2014_12', engine)


#### 2013

In [9]:
bf_2013_01_df = drill_down(ba_2013,'01')
bf_2013_01_df.to_sql('bf_2013_01', engine)

bf_2013_02_df = drill_down(ba_2013,'02')
bf_2013_02_df.to_sql('bf_2013_02', engine)

bf_2013_03_df = drill_down(ba_2013,'03')
bf_2013_03_df.to_sql('bf_2013_03', engine)

bf_2013_04_df = drill_down(ba_2013,'04')
bf_2013_04_df.to_sql('bf_2013_04', engine)

bf_2013_05_df = drill_down(ba_2013,'05')
bf_2013_05_df.to_sql('bf_2013_05', engine)

bf_2013_06_df = drill_down(ba_2013,'06')
bf_2013_06_df.to_sql('bf_2013_06', engine)

bf_2013_07_df = drill_down(ba_2013,'07')
bf_2013_07_df.to_sql('bf_2013_07', engine)

bf_2013_08_df = drill_down(ba_2013,'08')
bf_2013_08_df.to_sql('bf_2013_08', engine)

bf_2013_09_df = drill_down(ba_2013,'09')
bf_2013_09_df.to_sql('bf_2013_09', engine)

bf_2013_10_df = drill_down(ba_2013,'10')
bf_2013_10_df.to_sql('bf_2013_10', engine)

bf_2013_11_df = drill_down(ba_2013,'11')
bf_2013_11_df.to_sql('bf_2013_11', engine)

bf_2013_12_df = drill_down(ba_2013,'12')
bf_2013_12_df.to_sql('bf_2013_12', engine)


#### 2012

In [10]:
bf_2012_01_df = drill_down(ba_2012,'01')
bf_2012_01_df.to_sql('bf_2012_01', engine)

bf_2012_02_df = drill_down(ba_2012,'02')
bf_2012_02_df.to_sql('bf_2012_02', engine)

bf_2012_03_df = drill_down(ba_2012,'03')
bf_2012_03_df.to_sql('bf_2012_03', engine)

bf_2012_04_df = drill_down(ba_2012,'04')
bf_2012_04_df.to_sql('bf_2012_04', engine)

bf_2012_05_df = drill_down(ba_2012,'05')
bf_2012_05_df.to_sql('bf_2012_05', engine)

bf_2012_06_df = drill_down(ba_2012,'06')
bf_2012_06_df.to_sql('bf_2012_06', engine)

bf_2012_07_df = drill_down(ba_2012,'07')
bf_2012_07_df.to_sql('bf_2012_07', engine)

bf_2012_08_df = drill_down(ba_2012,'08')
bf_2012_08_df.to_sql('bf_2012_08', engine)

bf_2012_09_df = drill_down(ba_2012,'09')
bf_2012_09_df.to_sql('bf_2012_09', engine)

bf_2012_10_df = drill_down(ba_2012,'10')
bf_2012_10_df.to_sql('bf_2012_10', engine)

bf_2012_11_df = drill_down(ba_2012,'11')
bf_2012_11_df.to_sql('bf_2012_11', engine)

bf_2012_12_df = drill_down(ba_2012,'12')
bf_2012_12_df.to_sql('bf_2012_12', engine)
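
The per-month calls above all follow one pattern, so they could also be driven by a loop. The following is only a sketch: `month_tables`, `load_all`, `extract`, and `write` are hypothetical names, with `extract` standing in for the `data_get` and `drill_down` steps and `write` for `DataFrame.to_sql`.

```python
def month_tables(years, months=range(1, 13)):
    """Yield (year, 'MM', table_name) for every year/month combination."""
    for year in years:
        for month in months:
            mm = f"{month:02d}"
            yield year, mm, f"bf_{year}_{mm}"


def load_all(years, extract, write):
    """Run extract(year, mm) and pass the result to write(table_name, data)."""
    loaded = []
    for year, mm, table in month_tables(years):
        data = extract(year, mm)
        write(table, data)
        loaded.append(table)
    return loaded


# Stand-in callables so the sketch runs without the HDF5 files or a database.
written = {}
tables_loaded = load_all(
    years=range(2016, 2011, -1),
    extract=lambda year, mm: f"data for {year}-{mm}",
    write=lambda table, data: written.__setitem__(table, data),
)
print(len(tables_loaded), tables_loaded[0], tables_loaded[-1])
```

In the notebook, `extract` could wrap `drill_down(data_get(path_for_year), mm)` and `write` could call `df.to_sql(table, engine)`. Generating the month strings in a loop also makes it harder to skip a month by accident.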


In [11]:
# ba_sum = bf_2016_01_df.values.sum()

In [12]:
# ba_sum/3060  # 3060 = 45 rows x 68 columns
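
The commented-out cells above hint at the calculation: sum the burned fractions and divide by 3060, the number of cells in the 45 x 68 window. A minimal sketch of that average, expressed as a percentage, is given below. It assumes every grid cell counts equally, which ignores the variation of cell area with latitude, and `percent_burned` is a hypothetical helper name.

```python
def percent_burned(fractions):
    """Mean burned fraction over a 2-D grid, as a percentage.

    `fractions` is a sequence of rows, each a sequence of values in [0, 1].
    With a pandas DataFrame `df`, `df.to_numpy().mean() * 100` gives the
    same average.
    """
    total = 0.0
    cells = 0
    for row in fractions:
        for value in row:
            total += value
            cells += 1
    if cells == 0:
        raise ValueError("empty grid")
    return 100.0 * total / cells


# Toy 45 x 68 grid (3060 cells) in which exactly 306 cells burned completely.
grid = [[1.0 if (r * 68 + c) < 306 else 0.0 for c in range(68)] for r in range(45)]
print(percent_burned(grid))
```

The toy grid is made up for illustration and does not represent GFED values.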
