# Fire or Not?  
<hr style="border:2px solid magenta"> 

**Description** 
* Wildfires are unpredictable and can start without warning. Using satellite images, we can detect a wildfire in near real time and warn the proper authorities so they can limit the damage.

**Objective**
* Create a model that detects whether an image contains a wildfire, with a high F1 score, a measure that combines the recall and precision of the model. Of the two, recall matters more here: we would rather have authorities respond to a false alarm than not be alerted to a real wildfire. It is better to be safe than sorry.
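
To make the trade-off between F1 and "better safe than sorry" concrete, here is a minimal sketch of the F-beta score, of which F1 is the special case `beta = 1`. The confusion-matrix counts are hypothetical and chosen only for illustration; they are not results from this notebook.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from confusion-matrix counts (beta=1 gives F1)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for two models evaluated on the same 100 real wildfires.
cautious = dict(tp=90, fp=30, fn=10)  # more false alarms, fewer missed fires
strict = dict(tp=72, fp=4, fn=28)     # fewer false alarms, more missed fires

for name, counts in (("cautious", cautious), ("strict", strict)):
    print(name,
          "F1 =", round(f_beta(**counts), 4),
          "F2 =", round(f_beta(**counts, beta=2), 4))
```

Both hypothetical models have the same F1, but F2 (which weights recall more heavily) favours the cautious one. If missed wildfires are the main concern, it may be worth reporting recall or F2 next to F1.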

**Methodology**
* A Convolutional Neural Network (CNN) is used for wildfire detection. The architecture was designed with the Keras API and implemented in Python with TensorFlow.

**Data**
* The dataset was provided by Kaggle: [Wildfire Prediction Dataset (Satellite Images)](https://www.kaggle.com/datasets/abdelghaniaaba/wildfire-prediction-dataset/data). It has been divided into three directories: test, train, and validation. Each file name gives the coordinates of the location shown in the image.

### Imports
<hr style="border:2px solid magenta">  

Grabbing the important imports needed

In [1]:
import tensorflow as tf
import numpy as np
import pandas as pd
import os
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

from pathlib import Path
from tensorflow.keras.preprocessing.image import ImageDataGenerator

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D, BatchNormalization
from tensorflow.keras import backend

from sklearn.metrics import confusion_matrix, classification_report
from tensorflow.keras.metrics import Recall, Precision, AUC

In [2]:
from src.extract_to_df import extract_to_df
from src.visualizations import plot_cm, plot_graph
from src.metric_notes import metric_note

In [3]:
# Some images in the dataset are truncated; let PIL load them instead of raising an error
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True


### Extracting Data  
<hr style="border:2px solid magenta">  

Setting the paths to the data

In [4]:
test = Path('Data/test')
train = Path('Data/train')
valid = Path('Data/valid')

The extract_to_df function from the src folder extracts useful information from each file and returns a dataframe containing the relative path, the latitude and longitude coordinates, and the class of the image: wildfire or nowildfire.

In [5]:
train_df = extract_to_df(train, 'Train')
test_df = extract_to_df(test, 'Test')
val_df = extract_to_df(valid,'Valid')
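
The real extract_to_df lives in src and is not shown in this notebook. As a rough idea of what such a helper might do, here is a minimal sketch under stated assumptions: each split folder contains one sub-folder per class, and each file is named with two comma-separated numbers ordered as longitude, latitude. The function names, the `Latitude` and `Longitude` column names, and the coordinate order are all assumptions; only the `Path` and `Label` columns are taken from the generator calls later in the notebook.

```python
from pathlib import Path

import pandas as pd


def parse_coordinates(file_name: str) -> tuple[float, float]:
    """Split a name such as '-73.5,45.25.jpg' into two floats.

    Assumption: the numbers are ordered (longitude, latitude);
    check this against the real files before relying on it.
    """
    first, second = Path(file_name).stem.split(",")
    return float(first), float(second)


def sketch_extract_to_df(root: Path) -> pd.DataFrame:
    """Hypothetical stand-in for extract_to_df: one row per image file."""
    rows = []
    for image_path in sorted(Path(root).glob("*/*.jpg")):
        longitude, latitude = parse_coordinates(image_path.name)
        rows.append({
            "Path": str(image_path),
            "Latitude": latitude,
            "Longitude": longitude,
            "Label": image_path.parent.name,  # class = name of the parent folder
        })
    return pd.DataFrame(rows, columns=["Path", "Latitude", "Longitude", "Label"])
```

Calling `sketch_extract_to_df(Path('Data/train'))` would then return one row per `.jpg` file found one level below the split folder.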

Creating the generators used to extract the images

In [6]:
train_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    width_shift_range = 0.2, 
    height_shift_range = 0.2,
    horizontal_flip = True, 
    vertical_flip = True
)

test_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale = 1./255
)
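
To show what the rescale and flip settings do to pixel data, here is a small NumPy sketch on a tiny made-up image. It only imitates the effect of those options; it is not how Keras implements them, and the random shifts are left out.

```python
import numpy as np

# Hypothetical 2x3 RGB "image" with 8-bit pixel values (height, width, channels).
image = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3) * 14

# rescale=1./255 maps the 0-255 range onto 0-1.
rescaled = image.astype(np.float32) / 255.0

# horizontal_flip mirrors the width axis, vertical_flip mirrors the height axis.
h_flipped = rescaled[:, ::-1, :]
v_flipped = rescaled[::-1, :, :]

print("value range:", rescaled.min(), "to", rescaled.max())
print("shapes:", rescaled.shape, h_flipped.shape, v_flipped.shape)
```

Flips keep the image size and the label unchanged, which is why they are a cheap way to show the model more varied views of the same scene during training.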

Extracting the images, starting at 32x32 pixels. This might change later to 64x64 or 224x224, which is another standard image size. The size is kept in the pixel variable so it can be changed in one place, and the size string helps with naming files and labelling results later on.

In [7]:
pixel = 32
size = f'{pixel}x{pixel}'
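
The choice of image size limits how deep a network can go. The sketch below tracks the side length of the feature map through a stack of 3x3 convolutions (stride 1), each followed by 2x2 max pooling, which is the pattern used by the models in this notebook. The helper name is hypothetical, and the formulas are the standard ones for 'valid' and 'same' padding.

```python
def feature_map_sizes(input_size, n_blocks, kernel_size=3, pool_size=2, padding="valid"):
    """Side length after each conv + max-pool block (square inputs, stride-1 conv)."""
    if padding not in ("valid", "same"):
        raise ValueError("padding must be 'valid' or 'same'")
    sizes = []
    size = input_size
    for block in range(1, n_blocks + 1):
        if padding == "valid":
            size = size - kernel_size + 1  # a 'valid' convolution trims the border
        if size < 1:
            raise ValueError(f"block {block}: input is smaller than the kernel")
        size //= pool_size                 # max pooling with stride = pool_size
        if size < 1:
            raise ValueError(f"block {block}: nothing left to pool")
        sizes.append(size)
    return sizes


for pixel in (32, 64, 224):
    for padding in ("valid", "same"):
        try:
            print(pixel, padding, feature_map_sizes(pixel, 4, padding=padding))
        except ValueError as error:
            print(pixel, padding, "fails:", error)
```

With 'valid' padding a 32x32 input runs out of pixels before a fourth block, while 64x64 and 224x224 inputs, or 'same' padding, leave room for it.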

In [8]:
train_images = train_generator.flow_from_dataframe(dataframe=train_df,
                                                 x_col = 'Path',
                                                 y_col = 'Label',                           
                                                 target_size = (pixel,pixel),
                                                 class_mode = 'binary',
                                                 color_mode = 'rgb',
                                                 shuffle = True,
                                                 seed = 42,
                                                 batch_size = 128)

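# Validation images are only rescaled (no random augmentation),
# so validation metrics are comparable from epoch to epoch
valid_images = test_generator.flow_from_dataframe(dataframe=val_df,
                                                 x_col = 'Path',
                                                 y_col = 'Label',
                                                 target_size=(pixel,pixel),
                                                 class_mode = 'binary',
                                                 color_mode = 'rgb',
                                                 shuffle = True,
                                                 seed = 42,
                                                 batch_size = 64)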

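# shuffle=False keeps the prediction order aligned with test_images.labels,
# which the classification report below relies on
test_images = test_generator.flow_from_dataframe(dataframe=test_df,
                                                 x_col='Path',
                                                 y_col='Label',
                                                 target_size=(pixel,pixel),
                                                 class_mode='binary',
                                                 color_mode='rgb',
                                                 shuffle=False,
                                                 batch_size=128)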


Found 30250 validated image filenames belonging to 2 classes.
Found 6300 validated image filenames belonging to 2 classes.
Found 6300 validated image filenames belonging to 2 classes.


There are a lot of images already: 42,850 in total across the three splits.
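
As a quick check of that total, and of how many batches each generator yields per pass, here is a short calculation using the counts printed above and the batch sizes passed to the generators.

```python
import math

# Image counts from the generator output and batch sizes from the generator calls.
counts = {"train": 30250, "valid": 6300, "test": 6300}
batch_sizes = {"train": 128, "valid": 64, "test": 128}

total_images = sum(counts.values())
# The last batch of each split is smaller, hence the ceiling.
steps = {split: math.ceil(counts[split] / batch_sizes[split]) for split in counts}

print("total images:", total_images)
print("batches per pass:", steps)
```

The number of training batches is what Keras reports as the number of steps per epoch during fitting.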

In [9]:
metrics = metric_note(train_images, test_images, valid_images)

### Modeling TIME  
<hr style="border:2px solid magenta">  

Let's start with a simple CNN with one convolutional layer.

In [10]:
cnn1 = Sequential()
cnn1.add(Conv2D(filters=32,
                kernel_size=(3, 3),
                activation='relu',
                input_shape=(pixel,pixel, 3)))
cnn1.add(MaxPooling2D(pool_size=(2,2)))

cnn1.add(Flatten())

cnn1.add(Dense(128, activation='relu'))
cnn1.add(Dense(1, activation='sigmoid'))

cnn1.compile(optimizer='adam',
             loss='binary_crossentropy',
             metrics=['accuracy', Precision(), Recall(), AUC()])
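
To get a feel for where the weights of this small network sit, the parameter counts can be worked out by hand. This is a sketch using the standard formulas for Conv2D and Dense layers with biases, assuming the 32x32 input used here; the helper names are hypothetical, and the result can be cross-checked against `cnn1.summary()`.

```python
def conv2d_params(kernel: int, in_channels: int, filters: int) -> int:
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs: int, units: int) -> int:
    """Weights plus one bias per unit."""
    return inputs * units + units


pixel = 32
conv_side = pixel - 3 + 1            # 3x3 'valid' convolution: 32 -> 30
pooled_side = conv_side // 2         # 2x2 max pooling: 30 -> 15
flattened = pooled_side ** 2 * 32    # 15 * 15 * 32 feature values

layer_params = {
    "conv2d": conv2d_params(3, 3, 32),
    "dense_128": dense_params(flattened, 128),
    "dense_1": dense_params(128, 1),
}
total_params = sum(layer_params.values())

print(layer_params)
print("total:", total_params)
```

Almost all of the parameters sit in the first dense layer, because the flattened feature map is large compared with the convolution kernel.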


Adding early stopping to prevent overfitting and to save computational resources and time.

In [11]:
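# Monitor the validation loss: the training loss keeps improving
# even while the model overfits, so it is a poor stopping signal
early_stop = tf.keras.callbacks.EarlyStopping(
            monitor='val_loss',
            patience=4,
            restore_best_weights=True
        )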
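
The effect of the patience setting can be illustrated with a small simulation. This is a simplified sketch of the idea (stop once the monitored loss has failed to improve for `patience` epochs in a row), not the Keras implementation, and the loss values are made up.

```python
def early_stopping_epoch(losses, patience):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    stop_epoch is None when the patience limit is never reached.
    """
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Hypothetical monitored loss per epoch.
history = [0.50, 0.40, 0.35, 0.36, 0.37, 0.36, 0.38, 0.30]
stop_epoch, best_epoch = early_stopping_epoch(history, patience=4)
print("stopped after epoch index", stop_epoch, "best epoch index", best_epoch)
```

In this made-up run, training stops before the final, lower value is ever reached, which is the price of a short patience. With restore_best_weights=True the model is rolled back to the weights from the best epoch rather than keeping those from the last one.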

Fit the one-convolutional-layer CNN on train_images for up to 50 epochs, with valid_images as the validation data. The batch size is already fixed by the generators (128 for train_images, 64 for valid_images), so batch_size is not passed to fit. Early stopping is used for the reason mentioned above, and workers = 6 sets how many workers load batches from the generator in parallel (loosely similar to n_jobs in sklearn).

In [12]:
results1 = cnn1.fit(train_images,
                    epochs = 50,
                    validation_data = valid_images,
                    callbacks = [early_stop],
                    workers = 6)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


Already at 95% accuracy with a log loss of 0.1220. This is a strong model for a first attempt.

In [13]:
metrics.evaluate(cnn1, 'CNN 1', size)



In [14]:
metrics.printout

Model: CNN1, Size: 32x32

| Split | log_loss | accuracy | precision | recall | auc |
|-------|----------|----------|-----------|--------|-----|
| train | 0.204928 | 0.920165 | 0.919472 | 0.927936 | 0.973508 |
| test | 0.267886 | 0.904444 | 0.971494 | 0.852012 | 0.978724 |
| val | 0.190346 | 0.926032 | 0.940386 | 0.924713 | 0.979167 |
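
The objective is stated in terms of F1, which the printout does not list directly. It can be derived from the reported precision and recall, as in this short sketch (the values are copied from the CNN1 row above).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per split, copied from the CNN1 printout.
reported = {
    "train": (0.919472, 0.927936),
    "test": (0.971494, 0.852012),
    "val": (0.940386, 0.924713),
}

f1_by_split = {split: f1_score(p, r) for split, (p, r) in reported.items()}
for split, value in f1_by_split.items():
    print(f"{split}: F1 = {value:.4f}")
```

The test split has the lowest F1 of the three even though its precision is the highest, because its recall is the lowest. Given the objective, recall is the number to watch as the models get deeper.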


In [None]:
plot_graph(results1, size, 'CNN 1')

In [None]:

cnn1_predictions = plot_cm(cnn1, 'CNN 1', test_images, size)

In [None]:
print(classification_report(y_true = test_images.labels, y_pred = cnn1_predictions))
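
The plot_cm helper lives in src and is not shown here. A plausible core of it, sketched under the assumption that the sigmoid outputs are cut at 0.5, is turning probabilities into class labels and counting the four outcomes. The probabilities and labels below are made up, and the sketch uses plain NumPy instead of sklearn.

```python
import numpy as np

# Hypothetical sigmoid outputs and true labels (0 = nowildfire, 1 = wildfire).
probabilities = np.array([0.92, 0.08, 0.55, 0.30, 0.71, 0.49])
true_labels = np.array([1, 0, 1, 1, 0, 0])

# Assumed decision rule: predict wildfire when the probability is at least 0.5.
predicted = (probabilities >= 0.5).astype(int)

# Rows are true classes, columns are predicted classes (the sklearn convention).
confusion = np.zeros((2, 2), dtype=int)
np.add.at(confusion, (true_labels, predicted), 1)
tn, fp, fn, tp = confusion.ravel()

print(confusion)
print("recall:", tp / (tp + fn), "precision:", tp / (tp + fp))
```

This comparison only makes sense when predictions and labels are in the same order, so the generator used for prediction should not shuffle its samples.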

In [None]:
backend.clear_session()


<hr style="border:2px solid magenta">  

Let's increase the complexity by going four convolutional layers deep. I am going to add dropout and batch normalization to help prevent overfitting.

In [None]:
cnn4 = Sequential()

# padding='same' keeps the feature map size through each convolution.
# With the default 'valid' padding a 32x32 input shrinks to 2x2 after
# three blocks, which is too small for a fourth 3x3 convolution.
cnn4.add(Conv2D(filters=32,
                kernel_size=3,
                padding='same',
                activation='relu',
                input_shape=(pixel,pixel, 3)))
cnn4.add(MaxPooling2D(pool_size=2))
cnn4.add(BatchNormalization())
cnn4.add(Dropout(0.2))

cnn4.add(Conv2D(filters=32,
                kernel_size=3,
                padding='same',
                activation='relu'))
cnn4.add(MaxPooling2D(pool_size=2))
cnn4.add(BatchNormalization())
cnn4.add(Dropout(0.2))

cnn4.add(Conv2D(filters=64,
                kernel_size=3,
                padding='same',
                activation='relu'))
cnn4.add(MaxPooling2D(pool_size=2))
cnn4.add(BatchNormalization())
cnn4.add(Dropout(0.2))

cnn4.add(Conv2D(filters=128,
                kernel_size=3,
                padding='same',
                activation='relu'))
cnn4.add(MaxPooling2D(pool_size=2))
cnn4.add(BatchNormalization())
cnn4.add(Dropout(0.2))

cnn4.add(Flatten())

cnn4.add(Dense(128, activation='relu'))
cnn4.add(BatchNormalization())
cnn4.add(Dropout(0.2))

cnn4.add(Dense(1, activation='sigmoid'))

cnn4.compile(optimizer='adam',
             loss='binary_crossentropy',
             metrics=['accuracy', Precision(), Recall(), AUC()])


In [None]:
cnn4.summary()

In [None]:
# The batch size comes from the generators, so it is not passed to fit
results4 = cnn4.fit(train_images,
                    epochs = 50,
                    validation_data = valid_images,
                    callbacks = [early_stop],
                    workers = 6)

In [None]:
plot_graph(results4, size, 'CNN 4')

In [None]:
metrics.evaluate(cnn4, 'CNN 4', size)


In [None]:
cnn4_predictions = plot_cm(cnn4, 'CNN 4', test_images, size)

In [None]:
print(classification_report(y_true = test_images.labels, y_pred = cnn4_predictions))

In [None]:
backend.clear_session()

<hr style="border:2px solid magenta">  