## Forest Fires - Optimizing Model Prediction

Sourced from the UCI Machine Learning Repository: [Forest Fires](https://archive.ics.uci.edu/dataset/162/forest+fires)

## Import Libraries

In [2]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

## Load Data

In [3]:
fires = pd.read_csv('data/fires.csv')

## Explore Data

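A quick description of the columns, following the UCI documentation. The Missing Values column describes the original dataset; the copy loaded in this notebook does contain missing values, as the `fires.info()` output in this section shows.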

| Variable Name | Type        | Description                                                      | Units | Missing Values |
|---------------|-------------|------------------------------------------------------------------|-------|----------------|
| X             | Integer     | x-axis spatial coordinate within the Montesinho park map: 1 to 9 | -     | No             |
| Y             | Integer     | y-axis spatial coordinate within the Montesinho park map: 2 to 9 | -     | No             |
| month         | Categorical | Month of the year: 'jan' to 'dec'                                | -     | No             |
| day           | Categorical | Day of the week: 'mon' to 'sun'                                  | -     | No             |
| FFMC          | Continuous  | FFMC index from the FWI system: 18.7 to 96.20                    | -     | No             |
| DMC           | Continuous  | DMC index from the FWI system: 1.1 to 291.3                      | -     | No             |
| DC            | Continuous  | DC index from the FWI system: 7.9 to 860.6                       | -     | No             |
| ISI           | Continuous  | ISI index from the FWI system: 0.0 to 56.10                      | -     | No             |
| temp          | Continuous  | Temperature: 2.2 to 33.30                                        | °C    | No             |
| RH            | Integer     | Relative humidity: 15.0 to 100                                   | %     | No             |
| wind          | Continuous  | Wind speed: 0.40 to 9.40                                         | km/h  | No             |
| rain          | Continuous  | Outside rain: 0.0 to 6.4                                         | mm/m² | No             |
| area          | Continuous  | Burned area of the forest: 0.00 to 1090.84. This output variable is heavily skewed towards 0.0, so it may make sense to model it after a logarithm transform. | ha    | No             |
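
The note on `area` suggests a logarithm transform. Since many recorded areas are exactly 0, a plain `log` would be undefined for them; one common workaround is `log(1 + x)`. The sketch below uses NumPy's `np.log1p` on made-up values (not taken from the dataset) and is only one possible choice of transform.

```python
import numpy as np

# Hypothetical burned areas in hectares (illustrative values only)
area = np.array([0.0, 0.0, 0.5, 10.0, 1000.0])

# log1p computes log(1 + x), so zero areas map to 0 instead of -inf
log_area = np.log1p(area)

# expm1 is the inverse, useful for converting predictions back to hectares
recovered = np.expm1(log_area)
```

If the model is trained on the transformed target, its predictions are on the log scale and would need `np.expm1` before being compared with the original areas.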


In [4]:
fires.head()

Unnamed: 0.1,Unnamed: 0,X,Y,month,day,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area
0,1,7,5,mar,fri,86.2,26.2,94.3,5.1,,51.0,6.7,0.0,0.0
1,2,7,4,oct,tue,90.6,,669.1,6.7,18.0,33.0,0.9,0.0,0.0
2,3,7,4,oct,sat,90.6,43.7,,6.7,14.6,33.0,1.3,0.0,0.0
3,4,8,6,mar,fri,91.7,33.3,77.5,9.0,8.3,97.0,4.0,0.2,0.0
4,5,8,6,mar,sun,89.3,51.3,102.2,9.6,11.4,99.0,,0.0,0.0


In [None]:
# Drop unnecessary index column
fires.drop(columns=['Unnamed: 0'], inplace=True)
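
As an alternative to dropping the column after loading, `pd.read_csv` can treat the first column as the index via `index_col=0`. The sketch below assumes the extra column comes from a file that was saved together with its index; the two-row CSV text is hypothetical and only mimics that layout.

```python
import io
import pandas as pd

# Hypothetical CSV saved with its index: the first header cell is empty
csv_text = ",X,Y,area\n1,7,5,0.0\n2,7,4,0.0\n"

# Default read: the unnamed first column shows up as 'Unnamed: 0'
default_read = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses that column as the row index instead
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)
```

Either approach leaves the same feature columns; the difference is whether the original row labels are kept as the index or discarded.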

In [9]:
fires.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 517 entries, 0 to 516
Data columns (total 13 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   X       517 non-null    int64  
 1   Y       517 non-null    int64  
 2   month   517 non-null    object 
 3   day     517 non-null    object 
 4   FFMC    469 non-null    float64
 5   DMC     496 non-null    float64
 6   DC      474 non-null    float64
 7   ISI     515 non-null    float64
 8   temp    496 non-null    float64
 9   RH      487 non-null    float64
 10  wind    482 non-null    float64
 11  rain    485 non-null    float64
 12  area    517 non-null    float64
dtypes: float64(9), int64(2), object(2)
memory usage: 52.6+ KB


In [11]:
fires.isnull().sum()

X         0
Y         0
month     0
day       0
FFMC     48
DMC      21
DC       43
ISI       2
temp     21
RH       30
wind     35
rain     32
area      0
dtype: int64
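
Raw counts are easier to judge as a share of the 517 rows; FFMC, the largest gap, is missing in 48 of them (about 9%). A minimal sketch of that calculation, using a small toy frame with illustrative values in place of `fires`:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `fires` (values are illustrative only)
toy = pd.DataFrame({
    "temp": [18.0, np.nan, 14.6, 8.3],
    "wind": [6.7, 0.9, np.nan, np.nan],
    "area": [0.0, 0.0, 0.0, 0.0],
})

# Percentage of missing values per column
missing_pct = toy.isnull().mean().mul(100).round(1)

# Number of rows with at least one missing value
rows_with_missing = toy.isnull().any(axis=1).sum()
```

The row count matters when deciding between dropping and imputing: if the gaps fall in different rows, dropping every incomplete row can discard far more data than any single column's count suggests.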

## Establish Features, Target, and Reference

In [10]:
# Reference features
reference_features = ['temp', 'wind']

# Target variable
target = 'area'

# Instantiate reference model
model = LinearRegression()
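
Both reference features contain missing values, and `LinearRegression` does not accept NaN input, so the reference model cannot be fit on the raw columns. The sketch below shows one way it could be fit, assuming incomplete rows are simply dropped; the toy data is illustrative, and imputation would be an equally valid choice.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy frame standing in for `fires` (values are illustrative only)
toy = pd.DataFrame({
    "temp": [18.0, np.nan, 14.6, 8.3, 22.1, 11.4],
    "wind": [6.7, 0.9, 1.3, 4.0, np.nan, 2.2],
    "area": [0.0, 0.0, 1.2, 0.0, 5.4, 0.3],
})

reference_features = ["temp", "wind"]
target = "area"

# Keep only rows where both features and the target are present
complete = toy.dropna(subset=reference_features + [target])

# Fit the reference model and predict on the same rows
model = LinearRegression()
model.fit(complete[reference_features], complete[target])
predictions = model.predict(complete[reference_features])
```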

## Data Processing