# Predicting Impact of Forest Fires

## Introduction

Over the past few years, extreme forest fires have become a global concern as a major environmental problem. Conditions on Earth are now more conducive to fire proliferation because climate change is creating hotter and drier weather. However, the complexity and extent of forest fires go beyond statistics such as human casualties, financial costs, and burnt land [1]. This problem is recognized as a "major threat to public health security in the 21st century" [2]. From our forest fires data set, we want to **predict how much area will be burned in a forest fire by analyzing weather conditions known to affect fire occurrences**.

The data set consists of 517 records, each corresponding to a fire that occurred in the Montesinho natural park, in the northeast region of Portugal, from January 2000 to December 2003. The list below contains the predictors we selected from the data set. Our response variable is "area", which is measured in hectares.

**Dataset Attributes:**
- Month (Month of the year)
- FFMC (Fine Fuel Moisture Code index from the Fire Weather Index (FWI) system) -> moisture content of surface litter (key to ignition and fire spread)
- DMC (Duff Moisture Code index from the FWI system) -> moisture content of shallow organic layers (important to surface fire intensity and difficulty of control)
- DC (Drought Code index from the FWI system) -> moisture content of deep organic layers (important to surface fire intensity and difficulty of control)
- ISI (Initial Spread Index from the FWI system) -> score that correlates with the velocity of fire spread
- Temp (temperature in degrees Celsius)
- RH (relative humidity in %)
- Wind (wind speed in km/h)
- Rain (outside rain in mm/m2)


## Exploratory Data Analysis

## Methods: Plan

In this analysis, we are drawing from a data set that is trustworthy. First, there are no missing values. This means we don’t have to omit any observations or fill in blanks ourselves, which makes our report more reliable. Additionally, the nature of the data is objective. Because the data consist of measurements of location, weather, and area burnt, very little human subjectivity is involved. Finally, by setting the seed to maintain consistent results and explaining our decision-making along the way, we will produce a reproducible and trustworthy report from the data.
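The two reproducibility checks described above (confirming there are no missing values, and fixing a seed) could look roughly like the sketch below. It is written in Python with NumPy and runs on a made-up array rather than the real data; the function names, the seed value, and the 70/30 split are all assumptions chosen for illustration.

```python
import numpy as np

def count_missing(data):
    """Return the number of NaN entries in a 2-D numeric array."""
    return int(np.isnan(data).sum())

def train_test_split(data, test_fraction=0.3, seed=2023):
    """Shuffle rows with a fixed seed, then split into (train, test).

    The seed and the default 70/30 split are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(data))
    n_test = int(round(test_fraction * len(data)))
    return data[shuffled[n_test:]], data[shuffled[:n_test]]

# Made-up numeric stand-in with the same number of rows as the data set.
toy = np.random.default_rng(0).normal(size=(517, 10))
n_missing = count_missing(toy)  # 0 here, so nothing to drop or impute

train, test = train_test_split(toy)
train_again, test_again = train_test_split(toy)
# Same seed, same split: rerunning the analysis gives identical results.
same_split = np.array_equal(train, train_again)
```

Fixing the seed inside the split function, rather than relying on global state, keeps the split identical no matter what other random operations run before it.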

**Methods we plan to use:**

- Use training data to apply forward selection with Bayesian Information Criterion (BIC) values to choose predictor variables
- Use training data to fit linear regression models that predict "area burned", our response variable, with inference based on both asymptotic and bootstrap methods
- Compare the asymptotic and bootstrap methods
- Check for issues such as heteroscedasticity, non-normal residuals, and multicollinearity in our model, and adjust the model to address them
- With our test data, evaluate the statistical model using the Test Mean Squared Error
- Make a prediction using local forest data from British Columbia, while keeping in mind issues around generalizability
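
To make the selection and evaluation steps in the list above concrete, here is a minimal sketch of greedy forward stepwise selection scored by BIC, followed by a test mean squared error calculation. It uses Python with NumPy on synthetic data, since this plan does not fix an implementation. The helper names (`design_matrix`, `fit_ols`, `bic`, `forward_selection`), the synthetic coefficients, and the 300/100 split are assumptions for illustration, and the BIC formula is the Gaussian form up to an additive constant.

```python
import numpy as np

def design_matrix(X, cols):
    """Intercept column followed by the chosen predictor columns."""
    return np.column_stack([np.ones(len(X))] + [X[:, j] for j in cols])

def fit_ols(X, y, cols):
    """Ordinary least squares on the chosen columns; returns (beta, RSS)."""
    design = design_matrix(X, cols)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return beta, float(residuals @ residuals)

def bic(rss, n, n_params):
    """Gaussian BIC up to a constant: n * log(RSS / n) + k * log(n)."""
    return n * np.log(rss / n) + n_params * np.log(n)

def forward_selection(X, y):
    """Add the predictor that lowers BIC the most; stop when none does."""
    n, p = X.shape
    selected = []
    _, rss = fit_ols(X, y, selected)  # intercept-only model
    best_bic = bic(rss, n, 1)
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        scores = []
        for j in candidates:
            _, rss = fit_ols(X, y, selected + [j])
            # Parameters: intercept + already selected + the candidate.
            scores.append(bic(rss, n, len(selected) + 2))
        best = int(np.argmin(scores))
        if scores[best] >= best_bic:
            break  # no candidate improves BIC
        best_bic = scores[best]
        selected.append(candidates[best])
    return selected

# Synthetic data (an assumption): only columns 0 and 3 affect the response.
rng = np.random.default_rng(301)
X = rng.normal(size=(400, 5))
y = 2.0 + 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(size=400)
X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

# Select and fit on training data only, then score on held-out test data.
chosen = forward_selection(X_train, y_train)
beta, _ = fit_ols(X_train, y_train, chosen)
predictions = design_matrix(X_test, chosen) @ beta
test_mse = float(np.mean((y_test - predictions) ** 2))
```

Selection and fitting touch only the training rows, so the test MSE remains an honest estimate of out-of-sample error. On the real data, the categorical month variable would first need to be encoded numerically, which this sketch does not cover.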


**We expect to achieve:**

Given the necessary information about a location, we expect to be able to predict with reasonable accuracy the area burned if a forest fire were to occur in that location. We also expect to go through a systematic, reproducible process to determine which predictor variables create a good statistical model.

**Potential impact:**

Our report could be useful in many ways. First, our results could help with resource allocation, as fire-fighting resources are finite. If we know, for example, that low humidity is a strong indicator of forest fire area, we can allocate more fire-fighting resources to low-humidity areas than to high-humidity areas. In addition, our results could help with fire preparation. If a fire is predicted to cover a very large area, alerting citizens to the danger with ample warning so that they can evacuate could be life-saving.

## References

[1] Amorim, J., Keizer, J. J., Miranda, A. I., & Monaghan, K. (2011). Forest fires research: beyond burnt area statistics.

[2] Pitman, A. J., Narisma, G. T., & McAneney, J. (2007). The impact of climate change on the risk of forest and grassland fires in Australia. Climatic Change, 84(3-4), 383–401. https://doi.org/10.1007/s10584-007-9243-6 
