In [18]:
#proposal idea:
#Title: The Relationship between Fine Fuel Moisture, Build-up Index, and the presence of Wildfire in the Sidi Bel-abbes region of northwest Algeria
#Introduction:
#There are several contributing factors that lead to the presence of forest fires, and in this project, we will classifying whether a wildfire can happen under specified fine fuel moisture and build-up index values. 
    #Fine Fuel moisture Code (FFMC) represents fuel moisture of forest litter fuels under the shade of a forest canopy
    #The Build-Up index (BUI) is a numeric rating of the total amount of fuel available for combustion.
#The dataset we are using includes 9 columns classifiers (of which we chose two), as well as date and time classifiers and class labels.
    #For the class labels, there are two options: fire or not fire. 
    #For our analysis, we will only analyze the Sidi-Bel Abbes Region, as different regions may have varying abiotic and biotic characterstics that can interfere with our analysis
#In this classification project, we will create a model that will be able to classify the values of our two variables into whether we should expect a fire or not.
#Our exploratory table shows how many of each class (fire or no fire) there is, so we are able to visualize the distribution of our classes, and know we have enough data from both class to use in our analysis.
#We will visualize our results with a scatterplot, where we can clearly see the relationship between our independent and dependent variables, and whether our predictions align with a previously seen trend.
#Based on our preliminary observations, we expect that an observation with >80 FFMC and >40 Build-up Index will be classified as Fire. 
#These findings could help Wildfire Services develop ways to distrupt or control the spread of wildfires,
#Future questions include how will the loss of available fuel (aka forests) impact the prevalence of wildfires in Algeria, and ultimately, the destroy-regrowth cycle of it's ecosystem.

In [19]:
library(tidyverse)
library(repr)
library(tidymodels)
options(repr.matrix.max.rows = 6)

In [20]:
set.seed(1)
data<-read_csv("FIREDATA.csv", skip=126)
  

data_clean<-data%>%
    filter(Classes!= is.na())

    

data_split<-initial_split(data,prop=0.75, strata=Classes)
data_train<-training(data_split)
data_test<-testing(data_split)

data_train
data_test
	
    


Parsed with column specification:
cols(
  day = [32mcol_double()[39m,
  month = [32mcol_double()[39m,
  year = [32mcol_double()[39m,
  Temperature = [32mcol_double()[39m,
  RH = [32mcol_double()[39m,
  Ws = [32mcol_double()[39m,
  Rain = [32mcol_double()[39m,
  FFMC = [32mcol_double()[39m,
  DMC = [32mcol_double()[39m,
  DC = [31mcol_character()[39m,
  ISI = [32mcol_double()[39m,
  BUI = [32mcol_double()[39m,
  FWI = [31mcol_character()[39m,
  Classes = [31mcol_character()[39m
)



ERROR: Error: Problem with `filter()` input `..1`.
[31m✖[39m 0 arguments passed to 'is.na' which requires 1
[34mℹ[39m Input `..1` is `Classes != is.na()`.


In [None]:
data_summarize<-data_test%>%
    group_by(Classes)%>%
    summarize(n=n())
data_summarize
#from our analysis, we can see that the "fire" and "no fire" classes have 19 and 10 counts respectively in our training data. However, there is one row with missing data/variables present in the testing data

In [None]:
data_variables<-data_train%>%
    select(FFMC, BUI, Classes)
data_variables

data_plot_initial<-data_variables%>%
    ggplot(aes(x=FFMC, y=BUI, group=Classes))+
    geom_point(aes(color= Classes))+
           geom_smooth(method= lm, se=FALSE)+
    labs(x="Fine Fuel Moisture", y="Build-Up Index", color="Class", title= "Build-up Index vs Fine Fuel Moisture")+
    theme(text=element_text(size=18))
          
data_plot_initial

data_plot_initial_3<-data_variables%>%
    ggplot(aes(x=FFMC, y=BUI, group=Classes))+
    geom_point(aes(color= Classes))+
    labs(x="Fine Fuel Moisture", y="Build-Up Index", color="Class",title= "Build-up Index vs Fine Fuel Moisture")+
    theme(text=element_text(size=18))

data_plot_initial_2<-data_variables%>%
ggplot(aes(x=FFMC, y=BUI))+
    geom_point()+
   geom_smooth(method= lm, se=FALSE)+
labs(x="Fine Fuel Moisture", y="Build-Up Index", color="Class",title= "Build-up Index vs Fine Fuel Moisture")+
    theme(text=element_text(size=18))

data_plot_initial_2
data_plot_initial_3

In [None]:


#Cross validation:

data_vfold<-vfold_cv(data_train, v=5,  strata= Classes) 

#Recipe:
data_recipe<- recipe( Classes~ FFMC+ BUI, data= data_train)%>%
step_scale(all_predictors())%>%
step_center(all_predictors()) 



data_spec<-nearest_neighbor(weight_func="rectangular", neighbors=tune())%>%
    set_engine("kknn")%>%
    set_mode("classification")

data_workflow<-workflow()%>%
    add_recipe(data_recipe)%>%
    add_model(data_spec)%>%
    fit_resamples(resamples=data_vfold)


data_workflow