# DTSC 100 Group 18 Project




### INTRODUCTION

Forest fires are disasters that damage both the environment and the economy. Algeria is one of many countries that have been severely affected by them. The Algerian Forest Fires data set from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Algerian+Forest+Fires+Dataset++) contains weather observations and fire-weather indices collected to help determine how likely a forest fire is to occur. We ask whether fires can be accurately predicted with a k-nearest neighbors classifier using factors such as rain, wind speed, temperature, and humidity.

The data set covers two regions of Algeria: the Bejaia region in the northeast and the Sidi Bel-abbes region in the northwest. Both regions are combined into one data set; we use only the Bejaia region in this project. The data set includes 11 attributes (counting day, month, and year together as a single date) and 1 output attribute (class). The attributes are: date (Day, Month, Year), Temperature, Relative Humidity (RH), Wind Speed (Ws), Rain, and the Fire Weather Index (FWI) system components: Fine Fuel Moisture Code (FFMC), Duff Moisture Code (DMC), Drought Code (DC), Initial Spread Index (ISI), Buildup Index (BUI), and the Fire Weather Index (FWI) itself.


### METHOD AND RESULTS

In [None]:
#describe in written English the methods you used to perform your analysis from beginning to end, narrating the code that does the analysis.

We conduct our analysis with k-nearest neighbors classification. From the data set we keep the date, Temperature, RH, Ws, Rain, and the FWI system components FFMC, DMC, DC, ISI, and BUI; the classifier itself uses Rain, RH, Ws, and Temperature as predictors. We chose these attributes because we expect them to be the most useful for predicting fires. We report the results with plots that show how these factors relate to whether a fire occurs, along with the accuracy (%) of our classification model.
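
To make the idea behind k-nearest neighbors concrete, here is a minimal base R sketch, not the tidymodels pipeline used in this project. The toy values and the helper `knn_predict` are hypothetical and chosen only for illustration: the predictors are standardized, Euclidean distances to the new observation are computed, and the class is decided by majority vote among the k closest training rows.

```r
# Toy training data (made-up values, NOT taken from the Algerian data set)
train <- data.frame(
  Temperature = c(35, 36, 34, 25, 24, 26),
  RH          = c(40, 45, 42, 80, 85, 78),
  Classes     = factor(c("fire", "fire", "fire", "not fire", "not fire", "not fire"))
)
new_obs <- data.frame(Temperature = 33, RH = 50)

# Hypothetical helper: classify one observation by majority vote
# among its k nearest training rows
knn_predict <- function(train, new_obs, predictors, label, k = 3) {
  # standardize with the training mean and standard deviation
  mu  <- colMeans(train[, predictors])
  sdv <- apply(train[, predictors], 2, sd)
  train_z <- scale(train[, predictors], center = mu, scale = sdv)
  new_z   <- (unlist(new_obs[, predictors]) - mu) / sdv

  # Euclidean distance from the new observation to every training row
  dists <- sqrt(rowSums(sweep(train_z, 2, new_z)^2))

  # labels of the k closest rows, then majority vote
  nearest <- train[[label]][order(dists)[1:k]]
  names(which.max(table(nearest)))
}

prediction <- knn_predict(train, new_obs, c("Temperature", "RH"), "Classes", k = 3)
print(prediction)
```

Standardizing first matters because otherwise a predictor measured on a larger scale would dominate the distance.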

In [None]:
library(tidyverse)
library(repr)
library(tidymodels)

In [None]:
url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/00547/Algerian_forest_fires_dataset_UPDATE.csv"

download.file(url, destfile = "algerian_fires.csv")
fire_df <- read_csv("algerian_fires.csv", skip = 1)
#original data set

In [None]:
# Build the analysis data frame:
#  - convert Classes to a factor and combine month, day, and year into one date
#  - drop FWI and the now-redundant month, year, and day columns
#  - move date to the first column
#  - keep only the first 122 rows (the Bejaia region)
df <- fire_df %>% 
        mutate(Classes = as.factor(Classes), 
               # %Y parses a four-digit year; %y would read only its first two digits
               date = as.Date(paste(month, day, year, sep = "."), format = "%m.%d.%Y")) %>% 
        select(-FWI, -month, -year, -day) %>% 
        select(date, everything()) %>% 
        slice(1:122)
df

# convert every column except date and Classes (columns 2 to 10) to numeric
df[, 2:10] <- sapply(df[, 2:10] , as.numeric)

#splitting into training and testing data
set.seed(1)
fire_split <- initial_split(df, prop = 0.75, strata = Classes)
fire_train <- training(fire_split)
fire_test <- testing(fire_split)
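
The `strata = Classes` argument asks for a split in which both pieces have similar class proportions. The base R sketch below illustrates that idea on made-up labels. It is not how `initial_split` is implemented, and the helper `stratified_indices` is hypothetical.

```r
set.seed(1)

# Made-up class labels (NOT the real data): 60 "fire" and 40 "not fire"
labels <- factor(rep(c("fire", "not fire"), times = c(60, 40)))

# Hypothetical helper: sample a proportion `prop` of the row indices within each class
stratified_indices <- function(labels, prop = 0.75) {
  idx_by_class <- split(seq_along(labels), labels)
  picked <- lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), size = round(prop * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

train_idx <- stratified_indices(labels, prop = 0.75)
test_idx  <- setdiff(seq_along(labels), train_idx)

# class proportions in the training and testing pieces
print(prop.table(table(labels[train_idx])))
print(prop.table(table(labels[test_idx])))
```

Because sampling happens separately inside each class, neither piece can end up with too few examples of one class by chance.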

In [None]:
#why did we make this summary table and what does it tell the readers? 
summary(fire_train)

In [None]:
## Visualization plot examples 
#explain why we made these plots and what is the significance of using these particular predictors 
cbPalette <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7", "#999999")
Temp_RH_plot <- ggplot(fire_train, aes(x = Temperature, y = RH, colour = Classes)) + 
                geom_point() + 
                labs(x = "Max Temperature at noon (Celsius degrees)", y = "Relative Humidity (%)", color = "Fire?") + 
                scale_color_manual(values = cbPalette) +
                theme(text = element_text(size = 14))


Temp_RH_plot

WS_Rain_plot <- ggplot(fire_train, aes(x = Ws, y = Rain, colour = Classes)) + 
                geom_point() + 
                labs(x = "Wind speed (km/h)", y = "Total Rain per day (mm)", color = "Fire?") + 
                scale_color_manual(values = cbPalette) + 
                theme(text = element_text(size = 14))


WS_Rain_plot

In [None]:
### Tuning the number of neighbors with Rain, RH, Ws, and Temperature as predictors
set.seed(1738)
fire_training_vfold <- vfold_cv(fire_train, v = 30, strata = Classes)

knn_tune <- nearest_neighbor(weight_func = "rectangular", neighbors = tune()) %>%
       set_engine("kknn") %>%
      set_mode("classification") 

fire_recipe <- recipe(Classes ~ Rain + RH + Ws + Temperature,  data = fire_train) %>%
        step_scale(all_predictors()) %>%
        step_center(all_predictors()) 


cross_val_metrics <- workflow() %>%
                add_recipe(fire_recipe) %>%
                add_model(knn_tune) %>%
                tune_grid(resamples = fire_training_vfold, grid = 15) %>%
                collect_metrics()


accuracies <- cross_val_metrics %>% 
       filter(.metric == "accuracy")

cross_val_plot <- ggplot(accuracies, aes(x = neighbors, y = mean))+
       geom_point() +
       geom_line() +
       labs(x = "Neighbors", y = "Accuracy Estimate") +
       scale_x_continuous(breaks = seq(0, 14, by = 1)) +  # adjusting the x-axis
       scale_y_continuous(limits = c(0.4, 1.0)) # adjusting the y-axis



cross_val_plot
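
The number of neighbors can be read off the plot, but the same choice can also be made in code. The sketch below assumes a table with the same two columns that the plot uses (`neighbors` and `mean`). The accuracy values are invented for illustration and are not this project's results.

```r
# Made-up cross-validation results (NOT the project's actual numbers);
# the column names mirror those used for the plot: neighbors and mean
accuracies_example <- data.frame(
  neighbors = c(1, 3, 5, 7, 9, 11, 13, 15),
  mean      = c(0.70, 0.74, 0.78, 0.80, 0.84, 0.81, 0.80, 0.79)
)

# pick the row with the highest mean accuracy (the first one if there are ties)
best_row <- accuracies_example[which.max(accuracies_example$mean), ]
best_k <- best_row$neighbors
print(best_k)
```

When several values of k give nearly the same accuracy, it can be reasonable to prefer one where the neighboring values of k also perform well, since the estimate is then less sensitive to the exact choice.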

In [None]:
fire_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = 13) %>%
      set_engine("kknn") %>%
      set_mode("classification")

fire_fit <- workflow() %>%
      add_recipe(fire_recipe) %>%
      add_model(fire_spec) %>%
      fit(data = fire_train)

fire_preds <- predict(fire_fit, fire_test) %>%
      bind_cols(fire_test)


fire_metrics <- fire_preds %>% 
    metrics(truth = Classes, estimate = .pred_class)

fire_conf_mat <- fire_preds %>% 
    conf_mat(truth = Classes, estimate = .pred_class) 

fire_metrics
fire_conf_mat
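
The accuracy metric and the confusion matrix are closely related: accuracy is the share of observations on the diagonal of the matrix. The base R sketch below shows this with made-up labels and predictions, which are not the project's results.

```r
# Made-up true labels and predictions (NOT the project's results)
truth <- factor(c("fire", "fire", "fire", "not fire",
                  "not fire", "not fire", "fire", "not fire"))
pred  <- factor(c("fire", "fire", "not fire", "not fire",
                  "not fire", "fire", "fire", "not fire"),
                levels = levels(truth))

# rows are predictions, columns are the true classes
cm <- table(Prediction = pred, Truth = truth)
print(cm)

# accuracy = correctly classified observations / all observations
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

The off-diagonal cells separate the two kinds of mistakes, a missed fire and a false alarm, which accuracy alone does not distinguish.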

### DISCUSSION

In [None]:
#summarize what you found
#discuss whether this is what you expected to find?
#discuss what impact could such findings have?
#discuss what future questions could this lead to?

### REFERENCES

In [None]:
#At least 2 citations of literature relevant to the project (format is your choice, just be consistent across the references).
#Make sure to cite the source of your data as well.

Faroudja ABID et al., “Predicting Forest Fire in Algeria using Data Mining Techniques: Case Study of the Decision Tree Algorithm”, International Conference on Advanced Intelligent Systems for Sustainable Development (AI2SD 2019), 08-11 July 2019, Marrakech, Morocco.