## Bushfire Project
The project below was the last one developed during the General Assembly bootcamp.

The table of contents will lead to the specific notebooks that develop each individual part of the project.
Running all the code in a single notebook proved inefficient, and some parts required their own environments, so the work is split across several notebooks.

1. <a href='#Problem_Statement'>Problem Statement</a>
2. <a href='#Problem_Research'>Researching the Problem</a> 
    1. <a href='#Empirical_Knowledge'>Empirical Knowledge</a> 
    2. <a href='#Other_works'>Other works on the subject</a>
    3. <a href='#Current_governmental'>Current governmental initiatives</a> 
    4. <a href='#Methodology'>Methodology</a>
3. <a href='#Data_acquisition'>Data acquisition</a>
    1. [Weather Data](./Bushfire_Weather.ipynb)
    2. [Satellite Data](./Bushfire_Satelite.ipynb)
    3. [Map Data](./Bushfire_Map.ipynb)
    4. [Fire Scar](./Bushfire_Fire_Scar.ipynb)
4. <a href='#Data_Wrangling'>Data Wrangling</a>
    1. <a href='#Data_Analysis'>Data Analysis</a>
    2. [Preparing the prediction data](./Bushfire_Weather_Kringe.ipynb)
    3. Understanding the data and developing a bottom-up methodology
5. Modeling
    1. Feature Engineering
    2. Testing different models and feature importance
6. Conclusions



<a id='Problem_Statement'></a>

### 1. Problem Statement
Bushfires are an intrinsic part of Australia's natural environment. The number of bushfires and their concentration at specific times of the year are a recurring problem that can reach critical proportions, as seen at the end of 2019. This project will look into the probability of fires starting and will test feature importance to corroborate the empirical analysis.

<a id='Problem_Research'></a>

### 2. Researching the Problem

<a id='Empirical_Knowledge'></a>

#### A. Empirical Knowledge
- Fires need 3 things to start: heat, fuel and oxygen.
- In bushfires, these 3 elements take more complex forms: temperature, fuel moisture, fuel availability, wind speed, ignition source, type of vegetation, etc.
- Seasonality and geographical position influence the probability of a fire starting.



<a id='Other_works'></a>

#### B. Other works on the subject
We were able to locate a work done by a data scientist in Portugal. This work is now publicly available and can be found [here](http://archive.ics.uci.edu/ml/datasets/Forest+Fires).

This work uses weather information and the Forest Fire Weather Index to predict the burned area of forest fires using an SVM. It is not exactly what we are looking for, but it is an interesting look at a similar problem.



<a id='Current_governmental'></a>

#### C. Current governmental initiatives
CSIRO is by far the most advanced agency working on forecasting and preparing for bushfires in a holistic way. See more [here.](https://www.csiro.au/en/Research/Environment/Extreme-Events/Bushfire)



<a id='Methodology'></a>

#### D. Methodology
The idea is to create a process that **forecasts** fires given other correlated features.

To achieve this, we will subdivide the state into smaller areas and forecast the probability of a fire happening in each specific area. The concept is similar to Lewis Fry Richardson's division of the globe in "Weather Prediction by Numerical Process".

In practice, we will divide the state into 1-degree squares and krige (spatially interpolate) the features for each square, using KNN and feature engineering where necessary.
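
The grid subdivision described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the linked notebooks: the helper names (`cell_of`, `cell_centre`, `fires_per_cell`) and the sample coordinates are hypothetical, and each square is identified by its south-west corner.

```python
import math
from collections import Counter


def cell_of(lat, lon, size=1.0):
    """Return the south-west corner of the grid square containing a point."""
    return (math.floor(lat / size) * size, math.floor(lon / size) * size)


def cell_centre(cell, size=1.0):
    """Return the centre of a grid square given its south-west corner."""
    return (cell[0] + size / 2, cell[1] + size / 2)


def fires_per_cell(hotspots, size=1.0):
    """Count how many hotspots fall into each grid square."""
    return Counter(cell_of(lat, lon, size) for lat, lon in hotspots)


# Hypothetical hotspot coordinates (latitude, longitude), for illustration only
hotspots = [(-33.2, 150.4), (-33.9, 150.1), (-32.5, 151.7), (-33.5, 150.9)]
counts = fires_per_cell(hotspots)

for cell, n in sorted(counts.items()):
    print(cell, cell_centre(cell), n)
```

A count per square like this could then be turned into a target for the model (for example, fire or no fire in a given square and period), with the interpolated features attached to the same square.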

Several important features and datasets need to be procured before any of this is possible. The next section covers exactly that.

<a id='Data_acquisition'></a>

### 3. Data acquisition
Please go to each notebook by following the links below:
####    A. [Weather Data](./Bushfire_Weather.ipynb)
####    B. [Satellite Data](./Bushfire_Satelite.ipynb)
####    C. [Map Data](./Bushfire_Map.ipynb)
####    D. [Fire Scar](./Bushfire_Fire_Scar.ipynb)


**Possible problems going forward:**

I could not find a good dataset covering the vegetation of the whole of NSW, only small pieces.

The fire scar data is unreliable. Without knowing which scars are wildfires and which are prescribed burns, we may be giving the model unreliable information.

<a id='Data_Wrangling'></a>

### 4. Data Wrangling

<a id='Data_Analysis'></a>

#### A. Data Analysis
Now that we have finished procuring data, let's look at what we have and determine what we want to do as far as feature engineering and data manipulation go.

To make this easier, we will build a small [streamlit app](http://13.238.142.157:8501/) for looking at the satellite data.

The code for the app follows:

In [None]:
#import statements
import pandas as pd
import altair as alt
import streamlit as st
import pydeck as pdk
import numpy as np


# Function to load in data (cached so the CSV is not re-read on every rerun)
@st.cache
def get_data():
    satelite_data = "monthly_count.csv"
    df = pd.read_csv(satelite_data)
    return df.set_index("months")

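# Loading in data
try:
    df = get_data()
except Exception:
    st.error(
        """
        **Could not load data**

        Save the data file to the same directory as this script.
    """)
    # Without the data, `df` is undefined and nothing below can run
    st.stop()
data = df.T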
# Title and introduction
'''
# **Bushfires in NSW**
'''

st.image('NSW_2013.jpg', use_column_width=True)
'''
Image from 21/10/2013, Source [here](https://visibleearth.nasa.gov/images/82211/fires-in-new-south-wales-australia/82212w).


How intense are the bushfires in Australia? How do they progress?
Are we able to get a notion of what is going on?

Well, Australia is a big place, so in this project we will be looking at NSW only.

Let's check the number of events that happen over the years.

'''

# Selecting the specific year on a multiselect object
year = st.multiselect(
    "Choose years to see on graph", list(df.columns), ['2001', '2002']
)
if not year:
    st.error("Please select at least one year.")

# Filtering selected years
data = df.T.loc[year]
# Preparing a checkbox to look at the raw data
if st.checkbox('Show raw data'):
    st.subheader('number of events per month/year')
    st.write(data)


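# Preparing data for the chart: one row per (month, year) pair.
# The id column keeps its name "months", which the chart encoding below relies on.
data = pd.melt(data.T.reset_index(), id_vars=["months"]).rename(
    columns={"value": "Fires per month", "variable": "years"})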

# Preparing chart
chart = (
    alt.Chart(data)
    .mark_area(opacity=0.3)
    .encode(
        x="months:O",
        y=alt.Y("Fires per month:Q", stack=None),
        color="years:N",
        # order="order:O"
    )
)
# Plotting the chart using Altair (for more info: https://altair-viz.github.io/user_guide/encoding.html)
st.altair_chart(chart, use_container_width=True)

# Map starts below -----------------------------

'''
### Let's take a look at the fires on a map.

The best way to look at this data is on a map. This way we can clearly understand 
the progression over time and the magnitude of the problem.

The markings below show the density of hotspots within hexagons of 1 km radius. 
This means that the more events that happen in the same area the stronger the color 
and the higher the bars.

We can see the information day-by-day over 2019. 
This is a great format to see how the fire fronts progressed on the most fire-intense days
of 2019.

'''

# preparing variables
DATE_TIME = "datetime"
DATA_URL = ('satelite_2019.csv')

# Preparing the data loading function
@st.cache(persist=True)
def load_data():
    data = pd.read_csv(DATA_URL)
    lowercase = lambda x: str(x).lower()
    data.rename(lowercase, axis="columns", inplace=True)
    data[DATE_TIME] = pd.to_datetime(data[DATE_TIME])
    return data

# loading data in
df = load_data()
data = df
# Month slider
month = st.slider("Select a month to look at", 1, 12)
# Day slider
day = st.slider("Select a day to look at", 1, 31)
# Map angle (pitch) slider
angle = st.slider("Select map angle", 1, 51, step=5)
# Filtering using the sliders
data = data[(data[DATE_TIME].dt.day == day) & (data[DATE_TIME].dt.month == month)]
# creating subheader to show what is being displayed on the map
st.subheader("Geo data on %i/%i/2019 " %(day, month))
# Getting the midpoint to zoom the map in
midpoint = (np.average(df["latitude"]), np.average(df["longitude"]))

st.write(pdk.Deck(
    map_style="mapbox://styles/mapbox/light-v9",
    initial_view_state={
        "latitude": midpoint[0],
        "longitude": midpoint[1],
        "zoom": 5, 
        "pitch": angle, 
    },
    layers=[
        pdk.Layer(
            "HexagonLayer",
            data=data,
            get_position=["longitude", "latitude"],
            radius=1000, 
            elevation_scale=6, 
            elevation_range=[0, 3000], 
            pickable=True,
            extruded=True,
        )
    ]
))
if st.checkbox('Want to know more about what we are looking at?'):
    
    st.write(
        '''The data that this project uses was collected by NASA using the MODIS instrument. 
More information [here](https://modis.gsfc.nasa.gov/data/).

MODIS has several products; this project focuses on the active fire product. 
This product reads the brightness emitted from Earth to space to determine active events.

The map represents the density of fires in NSW. The tallest bars represent a maximum of 6 fires within
a hexagon roughly 2 km across, and the lowest represent one.

Want to know more about this [project?](https://juliocent.github.io/portfolio/)

''')



<a id='Preparing_data'></a>

#### B. Preparing the prediction data
In order to create a model that works, we need to krige (spatially interpolate) the weather data for each of the squares.

This process seems straightforward, but it needs to be done carefully. A mistake in this step can change the results significantly.
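
As a rough illustration of this step and of one way to check it, the sketch below estimates a weather value at a grid-square centre from the k nearest stations, weighted by inverse distance, and then runs a leave-one-out check on the stations themselves. Several things here are assumptions rather than the method used in the linked notebook: the function names, the station values, the use of plain Euclidean distance in degrees, and inverse-distance weighting, which is a simpler stand-in for true kriging.

```python
import math


def knn_interpolate(target, stations, k=3):
    """Estimate a value at `target` (lat, lon) from the k nearest stations.

    `stations` is a list of (lat, lon, value) tuples. Neighbours are weighted
    by inverse distance; distance is plain Euclidean distance in degrees,
    which is a simplification.
    """
    nearest = sorted(
        (math.dist(target, (lat, lon)), value) for lat, lon, value in stations
    )[:k]
    if nearest[0][0] == 0:  # the target coincides with a station
        return nearest[0][1]
    weights = [1 / d for d, _ in nearest]
    weighted_sum = sum(w * v for w, (_, v) in zip(weights, nearest))
    return weighted_sum / sum(weights)


def leave_one_out_error(stations, k=3):
    """Mean absolute error when each station is predicted from the others."""
    errors = []
    for i, (lat, lon, value) in enumerate(stations):
        others = stations[:i] + stations[i + 1:]
        errors.append(abs(knn_interpolate((lat, lon), others, k) - value))
    return sum(errors) / len(errors)


# Hypothetical stations: (latitude, longitude, temperature in degrees Celsius)
stations = [
    (-34.0, 150.0, 30.0),
    (-34.0, 151.0, 32.0),
    (-33.0, 150.0, 34.0),
    (-33.0, 151.0, 36.0),
]

centre_estimate = knn_interpolate((-33.5, 150.5), stations, k=4)
loo_error = leave_one_out_error(stations, k=3)
print(centre_estimate, loo_error)
```

A large leave-one-out error relative to the natural variation of the feature would suggest that the interpolated values for each square should be treated with caution before they are fed to the model.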

