# Data Tech Challenge notebook is divided into the following parts.

## Table of Contents:

    1. Definition of Hypothesis
    2. Datasets
    3. Study Area selection
    4. Data Preparation and initial Visualization of the data
    5. Data Analysis for Hypothesis 1
    6. Data Analysis for Hypothesis 2
    7. Conclusion
        
        

# 1. Definition of Hypothesis:

Forest fires are catastrophes that severely affect the environment through extensive carbon emissions, which in turn contribute to global warming (the rise in temperature at the Earth's surface over a period of time).

Research shows that natural forest fires are more probable when temperature increases and when precipitation (rainfall) and humidity decrease [1].

Hence, taking this into consideration, I would like to form my hypothesis 1. 

#### HYPOTHESIS 1: Forests which have experienced forest fires in the past should have undergone increase in temperature, decrease in precipitation and decrease in humidity before the catastrophe. 

As forests store huge amounts of carbon, they serve as climate protectors for us. However, a forest fire releases a huge amount of carbon, and carbon emissions are one of the most important global concerns [2].

#### HYPOTHESIS 2: The Aerosol Optical Depth changes during and/or after forest fires due to very high carbon emissions.

[1]https://public.wmo.int/en/media/news/climate-change-increases-risk-of-wildfires

[2]https://www.un.org/development/desa/dpad/wp-content/uploads/sites/45/publication/PB_111.pdf

I will try to check whether these hypotheses hold with the help of historical data from the Climate Data Store and the Atmosphere Data Store, provided by the European Commission and implemented by ECMWF.

# 2. Datasets used:

    * Burned Area data from ECMWF CDS database to know about forest fires [1]
    * Meteorological data from ECMWF CDS database like temperature, precipitation and humidity data [2]
    * Aerosol optical depth from ECMWF ADS database like black carbon aerosol optical depth, dust aerosol optical
    depth and total aerosol optical depth [3]
    * Shapefiles of Andhra Pradesh state boundary and Nalla Malla forest area to subset the global dataset to Area of Interest
    
 [1] https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-fire-burned-area?tab=overview
 [2] https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels-monthly-means?tab=overview
 [3] https://ads.atmosphere.copernicus.eu/cdsapp#!/dataset/cams-global-greenhouse-gas-inversion?tab=overview

# 3. Study area selection:

#### Since I need to confine the Area of Interest to a particular region, I am choosing Andhra Pradesh, a state in India.

Area of Andhra Pradesh: 274834.93 km2
Area of the thick forest (combination of Nallamalla and Lankamalla forests): 37723.32 km2

The shapefile of the Andhra Pradesh administrative boundary (state boundary) was downloaded from the internet [3], and the global satellite fire burned area dataset, with all its variables, was cropped to the extent of Andhra Pradesh state. The forest shapefile was screen-digitized using GIS software. Pictures of the Area of Interest are shown below.

Andhra Pradesh was chosen as the area of interest because of its history of extensive wildfires in the years 2009 to 2011. It is among the top 2 states in India that are highly prone to forest fires [4].

Hence, I can analyse both hypotheses with this study area.

For this reason, the above datasets were downloaded for the years 2008, 2009, 2010, 2011 and 2012 and for the months January to June. I have included one year before and one year after the forest fires to check the hypotheses effectively; forest fires in Andhra Pradesh mostly happen from March to May [4].

Andhra Pradesh boundary shapefile:![Andhra%20Pradesh%20Area.png](attachment:Andhra%20Pradesh%20Area.png)

[3] https://www.diva-gis.org/datadown

[4] http://apenvis.nic.in/All%20PDF%20Files/COMMON/Status%20of%20Forest%20Fires%20of%20Andhra%20Pradesh.pdf

Nallamalla and Lankamalla forest shapefile:![Forest%20Area%20in%20Andhra%20Pradesh.png](attachment:Forest%20Area%20in%20Andhra%20Pradesh.png)

# 4. Data Preparation and Initial Visualization of the data

## Importing all the necessary packages

In [52]:
import numpy as np
import pandas as pd
import geopandas as gpd
import xarray as xr
import netCDF4
import glob
import scipy
import altair as alt
import matplotlib.pyplot as plt

## 4.1 Burnt Area Data Preparation

#### The downloaded Burned Area files for all the years are not in a single nc file. They are downloaded as a separate nc file for every month of the respective year.

Hence, I have to combine the data into a single netCDF file, or I can directly open all the netCDF files as one dataset.

Please run the first cell in section 4.2 before running the cells below. Otherwise, a NameError saying that md_merge is not defined will be raised. For the reason, please check the comment block in the code below.

In [53]:
%%time

# Opening all the nc files as a single dataset.
# open_mfdataset combines the data automatically by checking the metadata of all the files.

# Here the time coordinates of the meteorological dataset are used, because the time coordinates of the burned
# area dataset are read incorrectly even though the right files with correct file names are in the respective directory.
# Since both datasets have the same time dimension, I have reused it. Please run the meteorological data cells first and come back here.

ba_merge = xr.open_mfdataset('Data_output/BA/Burned_Area_Grid/Burned_Area_Pixel_2008_2012/*.nc')
time_array = md_merge['time']
ba_merge = ba_merge.assign_coords({'time':time_array})
# ba_merge.to_netcdf('Data_output/Merged_Datasets/Burnt_Area_2008_2012.nc')



CPU times: user 755 ms, sys: 99.8 ms, total: 855 ms
Wall time: 915 ms
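
The time coordinate borrowed from `md_merge` above could also be built independently, which would remove the need to run section 4.2 first. A minimal sketch, assuming the 30 burned area files sort in chronological order and cover January to June of 2008 to 2012, with each month stamped on its first day; the helper name `six_month_index` is hypothetical:

```python
import pandas as pd


def six_month_index(first_year=2008, last_year=2012, months=range(1, 7)):
    """Return one timestamp (first day of the month) per selected month and year."""
    stamps = [
        pd.Timestamp(year=year, month=month, day=1)
        for year in range(first_year, last_year + 1)
        for month in months
    ]
    return pd.DatetimeIndex(stamps)


time_index = six_month_index()
print(len(time_index), time_index[0].date(), time_index[-1].date())
```

If those assumptions hold for the files on disk, `ba_merge.assign_coords({'time': time_index})` would give the same result without depending on `md_merge['time']`.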


In [54]:
# DATASET overview

ba_merge

Output (xarray Dataset summary; every variable is a dask array backed by numpy.ndarray and split into 30 chunks, one per time step):

| Number of arrays | Shape | Chunk shape | Bytes | Chunk bytes | Tasks | Type |
|---|---|---|---|---|---|---|
| 5 | (30, 720, 1440) | (1, 720, 1440) | 124.42 MB | 4.15 MB | 90 | float32 |
| 1 | (30, 18, 720, 1440) | (1, 18, 720, 1440) | 2.24 GB | 74.65 MB | 90 | float32 |
| 1 | (30, 720, 2) | (1, 720, 2) | 345.60 kB | 11.52 kB | 120 | float64 |
| 1 | (30, 1440, 2) | (1, 1440, 2) | 691.20 kB | 23.04 kB | 120 | float64 |
| 1 | (30, 2) | (1, 2) | 480 B | 16 B | 90 | datetime64[ns] |
| 1 | (30, 18) | (1, 18) | 81.00 kB | 2.70 kB | 120 | \|S150 |


#### As our study area is Andhra Pradesh, I will first subset the entire global dataset to Andhra Pradesh and perform the visualization and analysis with that dataset. Later, we can change the subset bounds to the forest area only and run the cells again, as we can expect better results with only the forest area.

#### The reason for choosing the forest-only subset is straightforward. When aggregating data yearly, pixels outside the forest area contribute to the calculation and might affect the test of the hypothesis.

#### To avoid code redundancy, I did not write the same code again for the forest-only area. You can change the extent to the forest-only area in the cell below where extent_vector() is defined: comment out the state shapefile code, uncomment the forest area shapefile code and run all the cells again to see the results for the forest-only area.

In [55]:

# Bounds of Andhra Pradesh. The extent is given by the total_bounds attribute of the GeoDataFrame,
# ordered as (minx, miny, maxx, maxy). This extent is used to slice the main dataset.

#  Full Andhra Pradesh Extent

def extent_vector(vector_layer):
    aoi_lat = [float(vector_layer.total_bounds[1]), float(vector_layer.total_bounds[3])]
    aoi_lon = [float(vector_layer.total_bounds[0]), float(vector_layer.total_bounds[2])]
    
    
    return aoi_lat,aoi_lon

vector_data = gpd.read_file('Data_output/apshapefile/Andhra Pradesh.shp')
vector_data=vector_data.to_crs(epsg= 4326)
aoi_lat,aoi_lon=extent_vector(vector_data)

# ignore below two lines
# aoi_lat = [float(12.611840248), float(19.916088104)]
# aoi_lon = [float(76.756988525), float(84.761016846)]

# uncomment the below when you want to see the results for forest only area

# Extent of Nalla Malla forest in Andhra Pradesh

# vector_data = gpd.read_file('Data_output/forestshp/Forest_area_only_AP.shp')
# vector_data=vector_data.to_crs(epsg= 4326)
# aoi_lat,aoi_lon=extent_vector(vector_data)


# ignore below two lines
# aoi_lat = [float(13.081632269792184), float(16.666194551188916)]
# aoi_lon = [float(78.1464853977142), float(80.03619565150578)]
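
`total_bounds` returns the extent in the order (minx, miny, maxx, maxy), which is why indices 1 and 3 give the latitude range and indices 0 and 2 give the longitude range. A small sketch of the same unpacking without geopandas, using the Andhra Pradesh values from the commented lines above; the function name `extent_from_bounds` is hypothetical:

```python
def extent_from_bounds(bounds):
    """Split (minx, miny, maxx, maxy) into [min_lat, max_lat] and [min_lon, max_lon]."""
    minx, miny, maxx, maxy = (float(value) for value in bounds)
    return [miny, maxy], [minx, maxx]


# Bounds of Andhra Pradesh in EPSG:4326, taken from the commented values above.
ap_bounds = (76.756988525, 12.611840248, 84.761016846, 19.916088104)
aoi_lat_demo, aoi_lon_demo = extent_from_bounds(ap_bounds)
print(aoi_lat_demo, aoi_lon_demo)
```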



#### Slicing the dataset with the calculated bounds: Andhra Pradesh/ Forest Area


In [56]:

ba_merge_ap = ba_merge.sel(
    lat=slice(aoi_lat[1], aoi_lat[0]),
    lon=slice(aoi_lon[0], aoi_lon[1]),
    )


ba_merge_ap

Output (xarray Dataset summary after slicing; every variable is a dask array backed by numpy.ndarray and split into 30 chunks, one per time step):

| Number of arrays | Shape | Chunk shape | Bytes | Chunk bytes | Tasks | Type |
|---|---|---|---|---|---|---|
| 5 | (30, 30, 32) | (1, 30, 32) | 115.20 kB | 3.84 kB | 120 | float32 |
| 1 | (30, 18, 30, 32) | (1, 18, 30, 32) | 2.07 MB | 69.12 kB | 120 | float32 |
| 1 | (30, 30, 2) | (1, 30, 2) | 14.40 kB | 480 B | 150 | float64 |
| 1 | (30, 32, 2) | (1, 32, 2) | 15.36 kB | 512 B | 150 | float64 |
| 1 | (30, 2) | (1, 2) | 480 B | 16 B | 90 | datetime64[ns] |
| 1 | (30, 18) | (1, 18) | 81.00 kB | 2.70 kB | 120 | \|S150 |
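
The latitude slice above runs from the maximum to the minimum because label-based slicing follows the order in which the coordinate is stored; the non-empty selection suggests that latitude is stored from north to south in these files. A sketch of a helper that picks the slice direction from the coordinate itself, assuming a monotonic coordinate; `ordered_slice` is a hypothetical name and the grids below are synthetic:

```python
import numpy as np


def ordered_slice(coord_values, lower, upper):
    """Build a label slice that matches the storage order of a monotonic coordinate."""
    values = np.asarray(coord_values)
    ascending = values[0] < values[-1]
    return slice(lower, upper) if ascending else slice(upper, lower)


# Synthetic 0.25 degree global grids: latitude north to south, longitude west to east.
lat = np.arange(89.875, -90.0, -0.25)
lon = np.arange(-179.875, 180.0, 0.25)

lat_slice = ordered_slice(lat, 12.611840248, 19.916088104)
lon_slice = ordered_slice(lon, 76.756988525, 84.761016846)
print(lat_slice, lon_slice)
```

With the real data, the coordinate values would come from the dataset, for example `ordered_slice(ba_merge['lat'].values, aoi_lat[0], aoi_lat[1])`, and the result would be passed to `.sel()`.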


#### Plotting all burned area data of Andhra Pradesh grouped by year (6 months in every year  from Jan to June)

In [57]:
burnt_area_ap = ba_merge_ap['burned_area']

burnt_area_ap.plot(col='time',cmap=plt.cm.Reds, col_wrap=6)
plt.title('6 months Burned Area in Andhra Pradesh from 2008-2012')

# Additional Plots:

# 1 Dimensional Plot using a point location that has forest fires

# burnt_area_ap_1d = burnt_area_ap.sel(lat = 18.50,lon=80.15, method = 'nearest')
# burnt_area_ap_1d.plot()

# 2 Dimensional Plot using one time period

# burnt_area_ap_2d = burnt_area_ap.sel(time = '2012-03-01', method = 'nearest')
# burnt_area_ap_2d.plot()



Text(0.5, 1, '6 months Burned Area in Andhra Pradesh from 2008-2012')

### Interpretation of the plots:
The above plots (grouped year-wise) show the monthly burned area. Dark red pixels indicate a large burned area and very light red pixels indicate no burn. The units of the burned area scale are sq. meters; 1e8 means that the number should be multiplied by 10^8 to get the area in sq. meters.

Here you can see that most of the forest fires happened in the month of March in the given years. However, there was also a big fire in the state in February 2012.
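
The `1e8` offset on the colour bar and the conversion from sq. meters to sq. km used later in the analysis amount to simple arithmetic. A sketch with a made-up tick value (the number 1.5 is illustrative and not read from the plots); the function name is hypothetical:

```python
def colorbar_tick_to_sq_km(tick_value, exponent=8):
    """Convert a colour bar tick shown with a 1e<exponent> offset from sq. m to sq. km."""
    area_sq_m = tick_value * 10 ** exponent
    return area_sq_m / 1_000_000  # 1 sq. km = 10^6 sq. m


# A tick of 1.5 with the 1e8 offset is 1.5 x 10^8 sq. m.
print(colorbar_tick_to_sq_km(1.5))
```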



## 4.2 Meteorological Data Preparation

#### The meteorological data from the Climate Data Store is downloaded as a single nc file for the given input parameters.

Hence no merging of nc files is necessary. However, I am making sure that this nc file is in the same merged datasets folder for better data management in the next tasks.

In [58]:
md_merge = xr.open_dataset('Data_output/Merged_Datasets/Met_data_2008_2012.nc')

In [59]:
# Overview of the dataset. The dataset contains temperature, precipitation and humidity as variables
# with time, lat and lon as dimensions.

md_merge

### Slicing the global dataset to our Area of Interest: Andhra Pradesh/Forest Area

In [60]:
# using the same lat and lon extent used above for burned area product.

md_merge_ap = md_merge.sel(
    latitude=slice(aoi_lat[1], aoi_lat[0]),
    longitude=slice(aoi_lon[0], aoi_lon[1]),
    )


md_merge_ap

#### Plotting all meteorological data like temperature, precipitation and humidity of Andhra Pradesh grouped by year (6 months in every year  from Jan to June)


In [61]:
md_area_ap_t2m = md_merge_ap['t2m']
md_area_ap_t2m = md_area_ap_t2m - 273.15

md_area_ap_t2m.plot(col='time',cmap=plt.cm.summer, col_wrap=6)
plt.title("6 months Temperature Plots in Andhra Pradesh from 2008-2012 ")


# Additional Plots:

# 1 Dimensional Plot using a point location that has forest fires

# md_area_ap_1d = md_area_ap_t2m.sel(latitude = 18.50,longitude=80.15, method = 'nearest')
# md_area_ap_1d.plot()

# 2 Dimensional Plot using one time period

# md_area_ap_2d = md_area_ap_t2m.sel(time = '2012-03-01', method = 'nearest')
# md_area_ap_2d.plot()



Text(0.5, 1, '6 months Temperature Plots in Andhra Pradesh from 2008-2012 ')

### Interpretation of the plots above: 

#### Temperature plots:

The temperature plots (grouped year-wise) show the air temperature at 2 meters above the surface, converted from kelvin to degrees Celsius in the code above. Dark yellow pixels correspond to high temperature and light green pixels correspond to low temperature.

In all the years considered, the plots show a constant increase in temperature from January to April and a decrease from May to June.

In [62]:
md_area_ap_tp = md_merge_ap['tp']

# precipitation is recorded in meters; it is converted to millimeters here, as mm is the unit generally used
# to represent precipitation
md_area_ap_tp = md_area_ap_tp * 1000

md_area_ap_tp.plot(col='time',cmap=plt.cm.Blues, col_wrap=6)
plt.title("6 months Precipitation Plots in Andhra Pradesh from 2008-2012 ")


# Additional Plots:

# 1 Dimensional Plot using a point location that has forest fires

# md_area_ap_1d = md_area_ap_tp.sel(latitude = 18.50,longitude=80.15, method = 'nearest')
# md_area_ap_1d.plot()

# 2 Dimensional Plot using one time period

# md_area_ap_2d = md_area_ap_tp.sel(time = '2012-03-01', method = 'nearest')
# md_area_ap_2d.plot()


Text(0.5, 1, '6 months Precipitation Plots in Andhra Pradesh from 2008-2012 ')

### Interpretation of the plots above: 

#### Precipitation plots:

The precipitation plots (grouped year-wise) show the precipitation, converted from meters to millimeters in the code above. Dark blue pixels correspond to high precipitation and light blue pixels correspond to low precipitation.

The precipitation is very low in the central parts of the selected region, but some parts of the area experienced high precipitation in the year 2008.
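
The unit conversions applied before the temperature and precipitation plots can be written as small helpers. A sketch, assuming the inputs are in kelvin and meters as stated in the code comments; the function names are hypothetical and the sample values are made up:

```python
import numpy as np


def kelvin_to_celsius(values_k):
    """Convert temperatures from kelvin to degrees Celsius."""
    return np.asarray(values_k, dtype=float) - 273.15


def meters_to_millimeters(values_m):
    """Convert precipitation depths from meters to millimeters."""
    return np.asarray(values_m, dtype=float) * 1000.0


print(kelvin_to_celsius([273.15, 300.0]))
print(meters_to_millimeters([0.001, 0.0005]))
```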


In [63]:
md_area_ap_d2m = md_merge_ap['d2m']
md_area_ap_d2m = md_area_ap_d2m - 273.15


md_area_ap_d2m.plot(col='time',cmap=plt.cm.GnBu, col_wrap=6)
plt.title("6 months Humidity Plots in Andhra Pradesh from 2008-2012 ")

# Additional Plots:

# 1 Dimensional Plot using a point location that has forest fires

# md_area_ap_1d = md_area_ap_d2m.sel(latitude = 18.50,longitude=80.15, method = 'nearest')
# md_area_ap_1d.plot()

# 2 Dimensional Plot using one time period

# md_area_ap_2d = md_area_ap_d2m.sel(time = '2012-03-01', method = 'nearest')
# md_area_ap_2d.plot()


Text(0.5, 1, '6 months Humidity Plots in Andhra Pradesh from 2008-2012 ')

### Interpretation of the plots above: 

#### Humidity plots:

The humidity plots (grouped year-wise) show the 2 m dew point temperature (`d2m`, in degrees Celsius), which is used here as the measure of humidity. Dark blue pixels correspond to high humidity and light green pixels correspond to low humidity.

Humidity increases steadily over the months in all the years.
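
The `d2m` variable is the 2 m dew point temperature, which the notebook uses as a stand-in for humidity. If relative humidity is wanted instead, it can be approximated from air temperature and dew point with the Magnus formula. This is not part of the original analysis; it is a sketch using commonly quoted Magnus coefficients (17.625 and 243.04 °C), and the function name is hypothetical:

```python
import numpy as np

MAGNUS_A = 17.625  # dimensionless coefficient (assumed, commonly quoted value)
MAGNUS_B = 243.04  # degrees Celsius (assumed, commonly quoted value)


def relative_humidity(temp_c, dewpoint_c):
    """Approximate relative humidity (%) from temperature and dew point in deg C."""
    temp_c = np.asarray(temp_c, dtype=float)
    dewpoint_c = np.asarray(dewpoint_c, dtype=float)
    saturation = np.exp(MAGNUS_A * temp_c / (MAGNUS_B + temp_c))
    actual = np.exp(MAGNUS_A * dewpoint_c / (MAGNUS_B + dewpoint_c))
    return 100.0 * actual / saturation


# Example with made-up values: 30 deg C air temperature, 18 deg C dew point.
print(relative_humidity(30.0, 18.0))
```

Applied to the arrays in this notebook, the call would be `relative_humidity(md_area_ap_t2m, md_area_ap_d2m)` after both have been converted to degrees Celsius.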

## 4.3 Carbon emissions and aerosol optical depth Data Preparation

The data is downloaded as a single nc file covering every month of the respective years.

Hence no merging of nc files is necessary. However, I am making sure that this nc file is in the same merged datasets folder for better data management in the next tasks.



In [64]:
# Opening the single nc file as a dataset.
# No merging with open_mfdataset is needed here because all the data is in one file.

ca_merge = xr.open_dataset('Data_output/PD/co2_2008_2012.nc')
# ca_merge.to_netcdf('Data_output/Merged_Datasets/CO2_Fossil_Fuel_Emission_2008_2012.nc')




In [65]:
# overview of the dataset. The dataset contains black carbon aerosol optical depth, dust aerosol optical depth and
# total aerosol optical depth as variables with time,lat and lon as dimensions.

ca_merge

#### Slicing the global dataset to our Area of Interest: Andhra Pradesh/Forest Area

In [66]:
# using the same lat and lon extent used above for burned area product.

ca_merge_ap = ca_merge.sel(
    latitude=slice(aoi_lat[1], aoi_lat[0]),
    longitude=slice(aoi_lon[0], aoi_lon[1]),
    )


ca_merge_ap

#### Plotting Carbon Emissions induced aerosol optical depth grouped by year (6 months in every year  from Jan to June)


In [67]:

ca_area_ap_bcaod = ca_merge_ap['bcaod550']

ca_area_ap_bcaod.plot(col='time',cmap=plt.cm.hot_r, col_wrap=6)
plt.title("6 months Aerosol Optical Depth caused due to Carbon Emissions Plots in Andhra Pradesh from 2008-2012 ")


# Additional Plots:

# 1 Dimensional Plot using a point location that has forest fires

# ca_area_ap_1d = ca_area_ap_bcaod.sel(latitude = 18.50,longitude=80.15, method = 'nearest')
# ca_area_ap_1d.plot()

# 2 Dimensional Plot using one time period

# ca_area_ap_2d = ca_area_ap_bcaod.sel(time = '2012-03-01', method = 'nearest')
# ca_area_ap_2d.plot()



Text(0.5, 1, '6 months Aerosol Optical Depth caused due to Carbon Emissions Plots in Andhra Pradesh from 2008-2012 ')

#### Interpretation of the above plot:

The dark orange areas show high black carbon aerosol optical depth, taken here as an indicator of carbon emissions, and the light orange pixels show low values. Aerosol optical depth is a dimensionless quantity.

I can see high carbon emissions in March and April of every year and very low carbon emissions in June of every year.

#### Plotting total aerosol optical depth data  grouped by year (6 months in every year  from Jan to June)


In [68]:

ca_area_ap_aod = ca_merge_ap['aod550']

ca_area_ap_aod.plot(col='time',cmap=plt.cm.YlOrBr, col_wrap=6)
plt.title("6 months Total Aerosol Optical Depth Plots in Andhra Pradesh from 2008-2012 ")


# Additional Plots:

# 1 Dimensional Plot using a point location that has forest fires

# ca_area_ap_1d = ca_area_ap_aod.sel(latitude = 18.50,longitude=80.15, method = 'nearest')
# ca_area_ap_1d.plot()

# 2 Dimensional Plot using one time period

# ca_area_ap_2d = ca_area_ap_aod.sel(time = '2012-03-01', method = 'nearest')
# ca_area_ap_2d.plot()


Text(0.5, 1, '6 months Total Aerosol Optical Depth Plots in Andhra Pradesh from 2008-2012 ')

#### Interpretation of the above plot:

The dark orange areas show high total AOD and the light orange pixels show low total AOD. It is a dimensionless quantity.

The total AOD increases significantly over the months in all the years.

#### Plotting dust induced aerosol optical depth data grouped by year (6 months in every year  from Jan to June)


In [69]:

ca_area_ap_duaod = ca_merge_ap['duaod550']

ca_area_ap_duaod.plot(col='time',cmap=plt.cm.YlOrBr, col_wrap=6)
plt.title("6 months Dust Aerosol Optical Depth Plots in Andhra Pradesh from 2008-2012 ")


# Additional Plots:

# 1 Dimensional Plot using a point location that has forest fires

# ca_area_ap_1d = ca_area_ap_duaod.sel(latitude = 18.50,longitude=80.15, method = 'nearest')
# ca_area_ap_1d.plot()

# 2 Dimensional Plot using one time period

# ca_area_ap_2d = ca_area_ap_duaod.sel(time = '2012-03-01', method = 'nearest')
# ca_area_ap_2d.plot()


Text(0.5, 1, '6 months Dust Aerosol Optical Depth Plots in Andhra Pradesh from 2008-2012 ')

#### Interpretation of the above plot:

The dark orange areas show high dust AOD and the light orange pixels show low dust AOD. It is a dimensionless quantity.

This looks interesting, as the dust AOD increases over the months in all the years.

# 5. Data Analysis:


I will divide the data analysis for the given hypotheses into two parts. First, I look at the data trend over the given period for the complete state of Andhra Pradesh, both month-wise and year-wise.

Later, I narrow the study area to only the forest region in Andhra Pradesh, calculate the trends of all variables and see whether there is any correlation between the burned area and the variables considered for the hypothesis.

#### To avoid code redundancy, I did not write the same code again for the forest-only area. You can change the extent to the forest-only area in the cell above where extent_vector() is defined: comment out the state shapefile code, uncomment the forest area shapefile code and run all the cells again to see the results for the forest-only area.

## Data Analysis: Hypothesis 1

##  H1 : Forests which have experienced forest fires in the past should have undergone increase in temperature, decrease in precipitation and decrease in humidity before the catastrophe.



#### The sixmonth_year() and year_wise() functions :

These functions take two parameters as input: a DataArray and an identifier.

The identifier selects how the data is aggregated. An identifier equal to 1 means that the variable values are summed over the given dimension (the time dimension in our case) and the result is divided by 10^6 to convert sq. meters to sq. km; any other value means that the mean is taken.

In sixmonth_year(), the values are either summed or averaged over the entire area of interest for every time step.

In year_wise(), the DataArray is first resampled to yearly data by summing or averaging along the time dimension. The resampled data is then summed or averaged over the area of interest for every year.
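
The two aggregation modes described above can be illustrated on a small table. This is a pandas sketch with made-up pixel values rather than the xarray code used in the notebook, and the function name `aggregate` is hypothetical:

```python
import pandas as pd


def aggregate(frame, by_year, use_sum):
    """Aggregate pixel values per month or per year, by sum or by mean."""
    if by_year:
        key = frame["time"].dt.year
    else:
        key = frame["time"].dt.strftime("%Y-%m")
    grouped = frame.groupby(key)["value"]
    return grouped.sum() if use_sum else grouped.mean()


# Two pixels observed in three months (illustrative numbers only).
toy = pd.DataFrame({
    "time": pd.to_datetime(["2008-01-01", "2008-01-01", "2008-02-01",
                            "2008-02-01", "2009-01-01", "2009-01-01"]),
    "value": [1.0, 3.0, 2.0, 4.0, 5.0, 7.0],
})

monthly_sum = aggregate(toy, by_year=False, use_sum=True)
yearly_mean = aggregate(toy, by_year=True, use_sum=False)
print(monthly_sum)
print(yearly_mean)
```

In year_wise(), the mean branch averages over time for each pixel first and over pixels afterwards; with complete data and the same number of months in every year, that gives the same number as the single mean taken in this sketch.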

In [70]:
def sixmonth_year(data_array, identifier):
    # Aggregate the values over the area of interest for every time step.
    # identifier == 1: sum the values and convert sq. meters to sq. km;
    # any other value: take the mean.

    times = data_array['time']

    # created manually for the axis labels for plotting bar charts
    time_6m_axis = ['2008-01','2008-02','2008-03','2008-04','2008-05','2008-06',
                    '2009-01','2009-02','2009-03','2009-04','2009-05','2009-06',
                    '2010-01','2010-02','2010-03','2010-04','2010-05','2010-06',
                    '2011-01','2011-02','2011-03','2011-04','2011-05','2011-06',
                    '2012-01','2012-02','2012-03','2012-04','2012-05','2012-06']

    values = []
    if identifier == 1:
        for i in times:
            # dividing by 10^6 to change units from sq. meters to sq. km
            x = data_array.sel(time = i).to_series().sum()/1000000
            values.append(x)
    else:
        for i in times:
            x = data_array.sel(time = i).to_series().mean()
            values.append(x)

    return time_6m_axis, values

In [71]:
def year_wise(data_array, identifier):
    # Resample the data to yearly values along time, then aggregate over the area of interest.
    # identifier == 1: sum the values and convert sq. meters to sq. km;
    # any other value: take the mean.

    values = []
    global time_year

    if identifier == 1:
        yearly_data = data_array.resample(time = '1Y').sum('time')
        # yearly_data.plot(col='time',cmap=plt.cm.YlOrBr, col_wrap=6)
        time_year = yearly_data['time']

        for i in time_year:
            # dividing by 10^6 to change units from sq. meters to sq. km
            x = yearly_data.sel(time = i).to_series().sum()/1000000
            values.append(x)
    else:
        yearly_data = data_array.resample(time = '1Y').mean('time')
        # yearly_data.plot(col='time',cmap=plt.cm.YlOrBr, col_wrap=6)
        time_year = yearly_data['time']

        for i in time_year:
            x = yearly_data.sel(time = i).to_series().mean()
            values.append(x)

    # time_year.dt.year gives only the year of each time step, which serves as axis labels
    return time_year.dt.year, values

## Calculating all variable values related to hypothesis 1 for six month - year as temporal resolution 

In [72]:
time_range_6m_yr, ba_value_6m_yr = sixmonth_year(burnt_area_ap,1)

time_range_6m_yr, md_value_t2m_6m_yr = sixmonth_year(md_area_ap_t2m,2)

time_range_6m_yr, md_value_tp_6m_yr= sixmonth_year(md_area_ap_tp,2)

time_range_6m_yr, md_value_d2m_6m_yr= sixmonth_year(md_area_ap_d2m,2)


In [73]:
sixmonth_yearwise_data_h1 = {'Date': time_range_6m_yr,
                            'Burned Area': ba_value_6m_yr,
                            'Temperature': md_value_t2m_6m_yr,
                            'Precipitation': md_value_tp_6m_yr,
                            'Humidity': md_value_d2m_6m_yr}

h1_analysis_6m_yr = pd.DataFrame(sixmonth_yearwise_data_h1)


In [74]:
h1_analysis_6m_yr

| | Date | Burned Area | Temperature | Precipitation | Humidity |
|---|---|---|---|---|---|
| 0 | 2008-01 | 1277.594624 | 27.839104 | 0.016584 | 15.440383 |
| 1 | 2008-02 | 1774.851456 | 28.478615 | 0.071373 | 17.465099 |
| 2 | 2008-03 | 6063.141376 | 30.666988 | 0.131535 | 18.386534 |
| 3 | 2008-04 | 1210.674944 | 33.327911 | 0.037541 | 19.676519 |
| 4 | 2008-05 | 1275.501824 | 35.498169 | 0.022004 | 20.313694 |
| 5 | 2008-06 | 6.225099 | 31.56204 | 0.172057 | 22.352571 |
| 6 | 2009-01 | 714.759616 | 28.152271 | 0.0029 | 15.524917 |
| 7 | 2009-02 | 6395.54048 | 30.586935 | 0.00023 | 15.949995 |
| 8 | 2009-03 | 17527.738368 | 32.209351 | 0.014818 | 16.978258 |
| 9 | 2009-04 | 1458.552064 | 34.536671 | 0.008965 | 18.994337 |
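
The relationship between two of these columns can be quantified with the Pearson correlation coefficient. A sketch using only the ten rows displayed above, so the result will differ from a correlation computed on all 30 months; it compares a manual calculation with the `pandas` one, and the helper name `pearson` is hypothetical:

```python
import numpy as np
import pandas as pd

# First ten rows of the table above (burned area in sq. km, dew point in deg C).
sample = pd.DataFrame({
    "Burned Area": [1277.594624, 1774.851456, 6063.141376, 1210.674944, 1275.501824,
                    6.225099, 714.759616, 6395.54048, 17527.738368, 1458.552064],
    "Humidity": [15.440383, 17.465099, 18.386534, 19.676519, 20.313694,
                 22.352571, 15.524917, 15.949995, 16.978258, 18.994337],
})


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    return float((x_centered * y_centered).sum()
                 / np.sqrt((x_centered ** 2).sum() * (y_centered ** 2).sum()))


manual = pearson(sample["Burned Area"], sample["Humidity"])
from_pandas = sample["Burned Area"].corr(sample["Humidity"])
print(manual, from_pandas)
```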


#### Plotting data using altair library 

I used the Altair library as it supports interactive visualization.
Select an interval in any one of the plots to highlight only the selected data in the other plots.
Click on a point or bar to see its value.

#### In case you are viewing this notebook (on GitHub) without running it, the plots will probably not be visible. Please run the code to see the interactive plots.

In [75]:
brush = alt.selection_interval(encodings= ["x"])


# plotting the dataframe using altair


hist1 = alt.Chart(h1_analysis_6m_yr).mark_bar().encode(x = "Date",
    color=alt.condition(brush, alt.value("cadetblue"), alt.value('lightgray'))
).interactive().properties(width=400, height= 250, title = 'Six Month-Year Wise Analysis').add_selection(brush)

hist1.configure_title(
    fontSize=20,
    font='Courier',
    anchor='start',
    color='gray'
)


# concatenating all plots and making them interactive

hist1.encode(alt.Y("Burned Area",title = 'Burned Area [in KM]'),color=alt.condition(brush, alt.value("cadetblue"), alt.value('lightgray')), tooltip=["Date", "Burned Area"]
) & hist1.mark_line(point = True, strokeDash=[2, 2], align= "center").encode(alt.Y("Temperature", title = 'Temperature [in °C]', scale= alt.Scale(domain=(10,40))), color=alt.condition(brush, alt.value("green"), alt.value('lightgray')), tooltip=["Date", "Temperature"]
) | hist1.mark_line(point = True, strokeDash=[2, 2], align= "center").encode(alt.Y("Precipitation",title = 'Precipitation [in mm]', scale= alt.Scale(domain=(0,0.5))),color=alt.condition(brush, alt.value("steelblue"), alt.value('lightgray')), tooltip=["Date", "Precipitation"]
) & hist1.mark_line(point = True, strokeDash=[2, 2], align= "center").encode(alt.Y("Humidity", title = 'Humidity [in °C]', scale= alt.Scale(domain=(10,40))),color=alt.condition(brush, alt.value("#D35400"), alt.value('lightgray')), tooltip=["Date", "Humidity"])
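
The layout of the combined chart follows from Python operator precedence. In Altair, `&` stacks charts vertically and `|` places them side by side, and because `&` binds tighter than `|`, an expression of the form `a & b | c & d` is read as `(a & b) | (c & d)`: two columns, each holding two stacked plots. The sketch below illustrates only this grouping. `Panel` is a hypothetical stand-in class (not part of Altair) so that the example runs without any plotting library.

```python
class Panel:
    """Hypothetical stand-in for a chart; it only records how panels are combined."""

    def __init__(self, label):
        self.label = label

    def __and__(self, other):
        # '&' plays the role of vertical concatenation (top / bottom)
        return Panel(f"V({self.label}, {other.label})")

    def __or__(self, other):
        # '|' plays the role of horizontal concatenation (left | right)
        return Panel(f"H({self.label}, {other.label})")


burned, temperature = Panel("Burned Area"), Panel("Temperature")
precipitation, humidity = Panel("Precipitation"), Panel("Humidity")

# Same operator pattern as the chart expression: a & b | c & d
layout = burned & temperature | precipitation & humidity
print(layout.label)
# H(V(Burned Area, Temperature), V(Precipitation, Humidity))
```

Adding explicit parentheses around each `&` group in the chart expression would make the intended two-column layout visible at a glance.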


In [76]:
corr_table_h1_6m_yr = h1_analysis_6m_yr.drop(columns=['Date']).corr()
corr_table_h1_6m_yr

| | Burned Area | Temperature | Precipitation | Humidity |
|---|---|---|---|---|
| Burned Area | 1.0 | -0.007884 | -0.320451 | -0.427323 |
| Temperature | -0.007884 | 1.0 | 0.17021 | 0.740388 |
| Precipitation | -0.320451 | 0.17021 | 1.0 | 0.645929 |
| Humidity | -0.427323 | 0.740388 | 0.645929 | 1.0 |


#### The Date column is excluded when calculating the correlation between variables.

#### Interpretation of correlation coefficients for six month-year wise data: entire Andhra Pradesh

Temperature is essentially uncorrelated with burned area (r ≈ -0.008). Precipitation and humidity
are negatively correlated with burned area (r ≈ -0.32 and r ≈ -0.43 respectively).
    
#### Interpretation of correlation coefficients for six month-year wise data: Nalla Malla Forest area

Temperature is again close to uncorrelated with burned area (r ≈ -0.0175). Precipitation and humidity
are negatively correlated with burned area (r ≈ -0.171 and r ≈ -0.64 respectively).

## Calculating all variables for Hypothesis 1 at yearly temporal resolution

In [77]:
time_range_yr,ba_value_yr = year_wise(burnt_area_ap,1)

time_range_yr, md_value_t2m_yr = year_wise(md_area_ap_t2m,2)

time_range_yr, md_value_tp_yr= year_wise(md_area_ap_tp,2)

time_range_yr, md_value_d2m_yr= year_wise(md_area_ap_d2m,2)


In [78]:
yearwise_data_h1 = {'Date': time_range_yr,
                            'Burned Area': ba_value_yr,
                            'Temperature': md_value_t2m_yr,
                            'Precipitation': md_value_tp_yr,
                            'Humidity': md_value_d2m_yr}
h1_analysis_yr = pd.DataFrame(yearwise_data_h1)

h1_analysis_yr

| | Date | Burned Area | Temperature | Precipitation | Humidity |
|---|---|---|---|---|---|
| 0 | 2008 | 11607.989248 | 31.228878 | 0.075182 | 18.939163 |
| 1 | 2009 | 26406.182912 | 32.312889 | 0.020939 | 18.557735 |
| 2 | 2010 | 15205.936128 | 32.138435 | 0.080086 | 19.50498 |
| 3 | 2011 | 12564.078592 | 31.403507 | 0.041154 | 18.64043 |
| 4 | 2012 | 30948.091904 | 32.229313 | 0.03288 | 18.322079 |


#### Plotting data using the Altair library

Altair is used because it supports interactive visualization.
Select one or more years in any one of the plots (click, or shift-click to add to the selection) to highlight the same years in the other plots.
Hover over a point or bar to see its value.

In [79]:
brush = alt.selection_multi(encodings= ["x"])


# plotting the dataframe using altair


hist1 = alt.Chart(h1_analysis_yr).mark_bar(size = 20).encode(x = "Date",
    color=alt.condition(brush, alt.value("cadetblue"), alt.value('lightgray'))
).interactive().properties(width=400, height= 250, title = 'Year Wise Analysis').add_selection(brush)

hist1.configure_title(
    fontSize=20,
    font='Courier',
    anchor='start',
    color='gray'
)


# concatenating all plots and making them interactive

hist1.encode(alt.Y("Burned Area",title = 'Burned Area [in KM]'),color=alt.condition(brush, alt.value("cadetblue"), alt.value('lightgray')), tooltip=["Date", "Burned Area"]
) & hist1.mark_line(point = True,strokeDash=[2, 2], align= "center").encode(alt.Y("Temperature", title = 'Temperature [in °C]', scale= alt.Scale(domain=(10,40))), color=alt.condition(brush, alt.value("green"), alt.value('lightgray')), tooltip=["Date", "Temperature"]
) | hist1.mark_line(point = True, strokeDash=[2, 2], align= "center").encode(alt.Y("Precipitation",title = 'Precipitation [in mm]', scale= alt.Scale(domain=(0,0.5))),color=alt.condition(brush, alt.value("steelblue"), alt.value('lightgray')), tooltip=["Date", "Precipitation"]
) & hist1.mark_line(point = True,strokeDash=[2, 2], align= "center").encode(alt.Y("Humidity", title = 'Humidity [in °C]', scale= alt.Scale(domain=(10,40))),color=alt.condition(brush, alt.value("#D35400"), alt.value('lightgray')), tooltip=["Date", "Humidity"])


In [80]:
corr_table_h1_yr = h1_analysis_yr.drop(columns=['Date']).corr()
corr_table_h1_yr

| | Burned Area | Temperature | Precipitation | Humidity |
|---|---|---|---|---|
| Burned Area | 1.0 | 0.805756 | -0.720598 | -0.637048 |
| Temperature | 0.805756 | 1.0 | -0.431962 | -0.1137 |
| Precipitation | -0.720598 | -0.431962 | 1.0 | 0.865386 |
| Humidity | -0.637048 | -0.1137 | 0.865386 | 1.0 |
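
pandas `DataFrame.corr()` computes the Pearson correlation coefficient by default. As a minimal sketch of what one entry of the table means, the snippet below recomputes the Burned Area vs Temperature coefficient by hand from the five yearly values shown above. The helper name `pearson` is introduced here for illustration only and is not part of the notebook.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Yearly values for 2008 to 2012, copied from the year wise table
burned_area = [11607.989248, 26406.182912, 15205.936128, 12564.078592, 30948.091904]
temperature = [31.228878, 32.312889, 32.138435, 31.403507, 32.229313]

r = pearson(burned_area, temperature)
print(round(r, 3))  # 0.806, matching the 0.805756 in the correlation table
```

The same function applied to any other pair of columns should reproduce the corresponding table entry up to rounding of the displayed values.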


#### Interpretation of correlation coefficients for year wise data: entire Andhra Pradesh

Temperature is strongly positively correlated with burned area (r ≈ 0.81).
Precipitation and humidity are negatively correlated with burned area (r ≈ -0.72 and r ≈ -0.64 respectively).

#### Interpretation of correlation coefficients for year wise data: Nalla Malla Forest only

Temperature is positively correlated with burned area (r ≈ 0.584).
Precipitation and humidity are strongly negatively correlated with burned area (r ≈ -0.904 and r ≈ -0.883 respectively).
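
Since the year wise coefficients are based on five annual values, it can help to gauge how much weight a single coefficient carries. The sketch below is a rough check and not part of the original analysis: it converts the Andhra Pradesh Burned Area vs Temperature coefficient into a t statistic, t = r·sqrt((n − 2) / (1 − r²)), and compares it with the standard two-sided 5% critical value of Student's t for 3 degrees of freedom (3.182). It assumes the yearly values are independent and roughly normally distributed.

```python
import math

r = 0.805756  # Burned Area vs Temperature, from the year wise correlation table
n = 5         # number of yearly values (2008 to 2012)

# t statistic for testing whether a Pearson correlation differs from zero
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

# two-sided 5% critical value of Student's t with n - 2 = 3 degrees of freedom
t_crit = 3.182

print(round(t_stat, 2))      # 2.36
print(abs(t_stat) > t_crit)  # False
```

Under these assumptions, even a coefficient of about 0.81 does not reach the 5% significance level with five points, so the year wise correlations are best read as indicative until more years are included.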

## 6. Data Analysis for Hypothesis 2

## H2: Change in the Aerosol Optical Depth during and/or post forest fires due to very high carbon emissions.



## Calculating all variables for Hypothesis 2 at six month-year temporal resolution

In [81]:
time_range_6m_yr, ca_area_ap_bcaod_6m_yr= sixmonth_year(ca_area_ap_bcaod,2)

time_range_6m_yr, ca_area_ap_duaod_6m_yr= sixmonth_year(ca_area_ap_duaod,2)

time_range_6m_yr, ca_area_ap_aod_6m_yr= sixmonth_year(ca_area_ap_aod,2)



In [82]:
sixmonth_yearwise_data_h2 = {'Date': time_range_6m_yr,
                             'Burned Area': ba_value_6m_yr,
                            'Black Carbon AOD': ca_area_ap_bcaod_6m_yr,
                            'Dust AOD': ca_area_ap_duaod_6m_yr,
                            'Total AOD': ca_area_ap_aod_6m_yr}

h2_analysis_6m_yr = pd.DataFrame(sixmonth_yearwise_data_h2)

h2_analysis_6m_yr

| | Date | Burned Area | Black Carbon AOD | Dust AOD | Total AOD |
|---|---|---|---|---|---|
| 0 | 2008-01 | 1277.594624 | 0.011274 | 0.003468 | 0.292205 |
| 1 | 2008-02 | 1774.851456 | 0.011465 | 0.005722 | 0.329403 |
| 2 | 2008-03 | 6063.141376 | 0.012534 | 0.006656 | 0.274198 |
| 3 | 2008-04 | 1210.674944 | 0.01528 | 0.031859 | 0.352928 |
| 4 | 2008-05 | 1275.501824 | 0.014889 | 0.11889 | 0.485269 |
| 5 | 2008-06 | 6.225099 | 0.012137 | 0.197736 | 0.65933 |
| 6 | 2009-01 | 714.759616 | 0.009973 | 0.001217 | 0.276198 |
| 7 | 2009-02 | 6395.54048 | 0.014584 | 0.004512 | 0.336395 |
| 8 | 2009-03 | 17527.738368 | 0.023578 | 0.02355 | 0.419638 |
| 9 | 2009-04 | 1458.552064 | 0.018784 | 0.035925 | 0.4036 |


#### Plotting data using the Altair library

Altair is used because it supports interactive visualization.
Select an interval in any one of the plots to highlight only the selected data in the other plots.
Hover over a point or bar to see its value.

In [83]:
brush = alt.selection_interval(encodings= ["x"])


# plotting the dataframe using altair


hist1 = alt.Chart(h2_analysis_6m_yr).mark_bar().encode(x = "Date",
    color=alt.condition(brush, alt.value("cadetblue"), alt.value('lightgray'))
).interactive().properties(width=400, height= 250, title = 'Six Month-Year Wise Analysis').add_selection(brush)

hist1.configure_title(
    fontSize=20,
    font='Courier',
    anchor='start',
    color='gray'
)


# concatenating all plots and making them interactive

hist1.encode(alt.Y("Burned Area",title = 'Burned Area [in KM]'),color=alt.condition(brush, alt.value("cadetblue"), alt.value('lightgray')), tooltip=["Date", "Burned Area"]
) & hist1.mark_line(point = True, strokeDash=[2, 2], align= "center").encode(alt.Y("Dust AOD", scale= alt.Scale(domain=(0,0.5))), color=alt.condition(brush, alt.value("brown"), alt.value('lightgray')), tooltip=["Date", "Dust AOD"]
) | hist1.mark_line(point = True, strokeDash=[2, 2], align= "center").encode(alt.Y("Black Carbon AOD", scale= alt.Scale(domain=(0,0.1))),color=alt.condition(brush, alt.value("black"), alt.value('lightgray')), tooltip=["Date", "Black Carbon AOD"]
) & hist1.mark_line(point = True, strokeDash=[2, 2], align= "center").encode(alt.Y("Total AOD",scale= alt.Scale(domain=(0,1))),color=alt.condition(brush, alt.value("red"), alt.value('lightgray')), tooltip=["Date", "Total AOD"])


#### Calculating Correlation values between variables

In [84]:
corr_table_h2_6m_yr = h2_analysis_6m_yr.drop(columns=['Date']).corr()
corr_table_h2_6m_yr

| | Burned Area | Black Carbon AOD | Dust AOD | Total AOD |
|---|---|---|---|---|
| Burned Area | 1.0 | 0.679114 | -0.36171 | -0.166386 |
| Black Carbon AOD | 0.679114 | 1.0 | -0.068728 | 0.276215 |
| Dust AOD | -0.36171 | -0.068728 | 1.0 | 0.901497 |
| Total AOD | -0.166386 | 0.276215 | 0.901497 | 1.0 |


#### Interpretation of correlation coefficients for six month-year wise data: Andhra Pradesh area

Black carbon AOD, used here as the indicator of carbon emissions, is positively correlated with burned area (r ≈ 0.68).
Dust AOD and total AOD are negatively correlated with burned area (r ≈ -0.36 and r ≈ -0.17 respectively).

#### Interpretation of correlation coefficients for six month-year wise data: Nalla Malla Forest area only

Black carbon AOD is positively correlated with burned area (r ≈ 0.36).
Dust AOD and total AOD are negatively correlated with burned area (r ≈ -0.36 and r ≈ -0.273 respectively).
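
Hypothesis 2 refers to a change in AOD during and/or after fires, while the correlations above only compare values from the same month. One way to probe the "after" part is a lagged correlation: burned area in one month against black carbon AOD in the following month. The sketch below is illustrative only. It uses just the ten rows displayed in the six month-year table, pairs months within the same year so that the gap between seasons is not treated as a one-month lag, and the helper names `pearson` and `lagged_pairs` are assumptions introduced here, not functions from the notebook.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# (burned area, black carbon AOD) per month, grouped by year,
# copied from the first ten rows of the six month-year table
rows = {
    2008: [(1277.594624, 0.011274), (1774.851456, 0.011465), (6063.141376, 0.012534),
           (1210.674944, 0.01528), (1275.501824, 0.014889), (6.225099, 0.012137)],
    2009: [(714.759616, 0.009973), (6395.54048, 0.014584), (17527.738368, 0.023578),
           (1458.552064, 0.018784)],
}


def lagged_pairs(rows, lag):
    """Pair burned area in month i with AOD in month i + lag, within each year."""
    burned, aod = [], []
    for series in rows.values():
        for i in range(len(series) - lag):
            burned.append(series[i][0])
            aod.append(series[i + lag][1])
    return burned, aod


for lag in (0, 1):
    burned, aod = lagged_pairs(rows, lag)
    print(f"lag {lag} month(s): n = {len(burned)}, r = {pearson(burned, aod):.3f}")
```

Run on the full dataset rather than this excerpt, a clearly larger coefficient at lag 1 than at lag 0 would point to AOD responding after the fires rather than during them.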
    

## Calculating all variables for Hypothesis 2 at yearly temporal resolution

In [85]:
time_range_yr, ca_area_ap_bcaod_yr= year_wise(ca_area_ap_bcaod,2)

time_range_yr, ca_area_ap_duaod_yr= year_wise(ca_area_ap_duaod,2)

time_range_yr, ca_area_ap_aod_yr= year_wise(ca_area_ap_aod,2)



In [86]:
yearwise_data_h2 = {'Date': time_range_yr,
                    'Burned Area': ba_value_yr,
                    'Black Carbon AOD': ca_area_ap_bcaod_yr,
                    'Dust AOD': ca_area_ap_duaod_yr,
                    'Total AOD': ca_area_ap_aod_yr}
h2_analysis_yr = pd.DataFrame(yearwise_data_h2)

h2_analysis_yr

| | Date | Burned Area | Black Carbon AOD | Dust AOD | Total AOD |
|---|---|---|---|---|---|
| 0 | 2008 | 11607.989248 | 0.01293 | 0.060722 | 0.398889 |
| 1 | 2009 | 26406.182912 | 0.015306 | 0.045651 | 0.401053 |
| 2 | 2010 | 15205.936128 | 0.014251 | 0.035658 | 0.386112 |
| 3 | 2011 | 12564.078592 | 0.013805 | 0.04817 | 0.395286 |
| 4 | 2012 | 30948.091904 | 0.015113 | 0.056077 | 0.401261 |


#### Plotting data using the Altair library

Altair is used because it supports interactive visualization.
Select an interval in any one of the plots to highlight only the selected data in the other plots.
Hover over a point or bar to see its value.

In [87]:
brush = alt.selection_interval(encodings= ["x"])


# plotting the dataframe using altair


hist1 = alt.Chart(h2_analysis_yr).mark_bar(size = 20).encode(x = "Date",
    color=alt.condition(brush, alt.value("cadetblue"), alt.value('lightgray'))
).interactive().properties(width=400, height= 250, title = 'Year Wise Analysis').add_selection(brush)

hist1.configure_title(
    fontSize=20,
    font='Courier',
    anchor='start',
    color='gray'
)


# concatenating all plots and making them interactive

hist1.encode(alt.Y("Burned Area",title = 'Burned Area [in KM]'),color=alt.condition(brush, alt.value("cadetblue"), alt.value('lightgray')), tooltip=["Date", "Burned Area"]
) & hist1.mark_line(point = True, strokeDash=[2, 2], align= "center").encode(alt.Y("Dust AOD", scale= alt.Scale(domain=(0,0.5))), color=alt.condition(brush, alt.value("black"), alt.value('lightgray')), tooltip=["Date", "Dust AOD"]
) | hist1.mark_line(point = True, strokeDash=[2, 2], align= "center").encode(alt.Y("Black Carbon AOD", scale= alt.Scale(domain=(0,0.1))),color=alt.condition(brush, alt.value("brown"), alt.value('lightgray')), tooltip=["Date", "Black Carbon AOD"]
) & hist1.mark_line(point = True, strokeDash=[2, 2], align= "center").encode(alt.Y("Total AOD",scale= alt.Scale(domain=(0,1))),color=alt.condition(brush, alt.value("red"), alt.value('lightgray')), tooltip=["Date", "Total AOD"])


#### Calculating Correlation values between variables


In [88]:
corr_table_h2_yr = h2_analysis_yr.drop(columns=['Date']).corr()
corr_table_h2_yr

| | Burned Area | Black Carbon AOD | Dust AOD | Total AOD |
|---|---|---|---|---|
| Burned Area | 1.0 | 0.898556 | 0.084085 | 0.544203 |
| Black Carbon AOD | 0.898556 | 1.0 | -0.332712 | 0.256358 |
| Dust AOD | 0.084085 | -0.332712 | 1.0 | 0.760963 |
| Total AOD | 0.544203 | 0.256358 | 0.760963 | 1.0 |


#### Interpretation of correlation coefficients for year wise data: Andhra Pradesh area

With year wise aggregation, black carbon AOD (the indicator of carbon emissions) is strongly positively correlated with burned area (r ≈ 0.90).
Dust AOD and total AOD are also positively correlated with burned area (r ≈ 0.08 and r ≈ 0.54 respectively).

#### Interpretation of correlation coefficients for year wise data: Nalla Malla Forest area only

With year wise aggregation, black carbon AOD is positively correlated with burned area (r ≈ 0.286).
Dust AOD is weakly positively correlated with burned area (r ≈ 0.07), while total AOD is strongly negatively correlated (r ≈ -0.776).
    

## 7. Conclusion:

### 7.1 Conclusion for Hypothesis 1:

As I do not know the exact dates of the wildfires within the given months, the yearly aggregated data showed a stronger correlation of burned area with air temperature, precipitation and humidity than the six month-year wise data.

Precipitation and humidity are strongly negatively correlated with burned area (r ≈ -0.904 and r ≈ -0.883 respectively) for the forest-only subset, whereas air temperature is strongly positively correlated (r ≈ 0.81) for the complete Andhra Pradesh state subset.


### 7.2 Conclusion for Hypothesis 2:

The yearly aggregated data showed a stronger correlation of burned area with carbon emissions (bcaod), with a very high correlation of r ≈ 0.90 for the entire state of Andhra Pradesh.

For the forest-only subset, the Total Aerosol Optical Depth (aod) showed a strong negative correlation with burned area (r ≈ -0.776).

As the main focus is on carbon emissions during forest fires, the key result is that the yearly aggregated bcaod data for the entire state of Andhra Pradesh showed the highest correlation with burned area. It is interesting to see a much weaker correlation of carbon emissions with burned area when only the forest area is considered, because I expected a high correlation. This may suggest that carbon emissions propagate outside the forest area during or after wildfires.


## Final thoughts:


### Both hypotheses appear to be supported when the data is aggregated yearly, although the yearly correlations rest on only five annual values (2008 to 2012). Hypothesis 2 seems very promising because of the very high correlation of carbon emissions with burned area. If the analysis is extended to all historical data and an ML model is developed, predicting the amount of carbon emitted into the atmosphere during forest fires could give promising results, which would help protect the environment and human health.



## Author: Jayendra Praveen Kumar Chorapalli
#### M.Sc ESPACE, TU Munich

Contact : jayendra.chorapali@tum.de 

Added new function

In [1]:
def hello():
    print("Hello world")

hello()

Hello world
