# 🔥 Fire...or No Fire?

## **Step 1:** Setup
We need to set up the dataframe to initial state.
Then we can manipulate it more in later steps, by cleaning it, digesting it, etc.

### Import Packages
> "If I have seen further, it is by standing on the shoulders of Giants."<br />
> &mdash; Isaac Newton


In [1]:
# glob crawler for reading files and folders
import glob

# regex parser for advanced string-parsing
import re

# pandas used for dataframe manipulation
import pandas

# numpy is used for odds and ends, like high-efficiency arrays
import numpy

# python imaging library; used for deconstructing images
from PIL import Image

# other imports are described as they are used
from keras.models import Sequential
from keras.losses import binary_crossentropy
from keras.optimizers import Adadelta
from keras.preprocessing.image import img_to_array
from sklearn.model_selection import train_test_split

Using TensorFlow backend.


### Crawl Datafiles
The **glob** package is great and sets up our project for success.
Be sure you have your directory-structure set up correctly as stated in the README&hellip;
Otherwise, I cannot guarentee that this project will work for you!

In [2]:
# get all files from dirs in the data dir
files = glob.glob('data/*/*.*')

In [3]:
# prepare dataset in an array
dataset = []

# loop through every file that "glob" found.
for filepath in files:
	# regex used for Windows/MacOS compatibility
	filecrawl = re.split(r'\\+|/+', filepath)

	# remove the "data" folder entry; its not needed
	filecrawl = filecrawl[1:]

	# tag images from the fire-images folder as "fire"
	if filecrawl[0] == 'Fire images':
		filecrawl.append(1)
	else:
		filecrawl.append(0)

	# add filecrawl findings to dataset
	dataset.append(filecrawl)
	# ==NOTE==
	# Because the "glob" package arbitrarily crawls files, the
	# index of an item may not be the same each time this is run.

### Create Dataframe

In [4]:
# create our project's main dataframe
dataframe = pandas.DataFrame(dataset, columns=['folder', 'filename', 'fire'])

# display shows these tables neatly, shown below
display(dataframe.head(), dataframe.tail())

Unnamed: 0,folder,filename,fire
0,Fire images,1.jpg,1
1,Fire images,10-9-15-2-400.jpg,1
2,Fire images,11_10_19-mjs_ft_hotel-fire_19183862.jpg,1
3,Fire images,132-img1.png,1
4,Fire images,132343342_21n.jpg,1


Unnamed: 0,folder,filename,fire
646,Normal Images 5,x14862823.jpg,0
647,Normal Images 5,xbR1sVO.png,0
648,Normal Images 5,xR7vfcy.jpg,0
649,Normal Images 5,xsylvia-hotel-queen-room.jpg.pagespeed.ic.hw9T...,0
650,Normal Images 5,y5IQiH.jpg,0


## Step 2: Data Manipulation

### Check for Duplicates

In [5]:
# check for any duplicate filenames in the dataframe
duplicates = dataframe['filename'].duplicated(keep=False)

# Select rows with duplicate filenames
duplicate_rows = dataframe[duplicates]
display(duplicate_rows)

# create a warning if duplicates exist
if duplicates.sum() != 0:
	warning = Warning(
		f'There are {duplicates.sum()}'
		' duplicated filenames in the dataframe.'
		' proceed with caution.'
	) 
	display(warning)

Unnamed: 0,folder,filename,fire
0,Fire images,1.jpg,1
7,Fire images,14.jpg,1
70,Fire images,images.jpg,1
77,Fire images,maxresdefault.jpg,1
124,Normal Images 1,1.jpg,0
133,Normal Images 1,14.jpg,0
417,Normal Images 3,images.jpg,0
462,Normal Images 3,maxresdefault.jpg,0




### Rebalance Datapoints
I used [this article][rebalancing] to help figure things out.

[rebalancing]: https://towardsdatascience.com/having-an-imbalanced-dataset-here-is-how-you-can-solve-it-1640568947eb

In [6]:
# == FIXME ==
# this does not seem to be working properly at the moment

"""
from imblearn.ensemble import BalancedBaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# create an object of the classifier, called "rebalancinator"
rebalancinator = BalancedBaggingClassifier(
	base_estimator=DecisionTreeClassifier(),
	sampling_strategy='auto',
	replacement=False,
	random_state=0
)

'''
# train the classifier.
rebalancinator.fit(x_train, y_train)
preds = rebalancinator.predict(x_train)
'''
"""

pass

### Train/Test Split
*Why do we need train/test split?*
*What does it do?*

In [7]:
# for x: keep the dataframe but drop the fire column
x = dataframe.drop(columns=['fire'])

# for y: drop everything in the dataframe but fire
y = dataframe.loc[:, ['fire']]

In [26]:
# Use the built-in "train test split" function
# to generate the four desireable segments of data.
x_train, x_test, y_train, y_test = train_test_split(
	x, y, test_size=0.35)

x_train = x_train.reset_index().drop('index', 1)
y_train = y_train.reset_index().drop('index', 1)
x_test = x_test.reset_index().drop('index', 1)
y_test = y_test.reset_index().drop('index', 1)

display(x_test.head())
display(y_test.head())

Unnamed: 0,folder,filename
0,Fire images,o-WASHINGTON-FIRE-900.jpg
1,Fire images,christmas_tree_fire.jpg
2,Fire images,fire storm.jpg
3,Normal Images 2,exterior-paint-colors-for-houses.jpg
4,Normal Images 4,Office.jpg


Unnamed: 0,fire
0,1
1,1
2,1
3,0
4,0


### Create Image Vectors

In [9]:
image_length = 256

In [10]:
def get_img_vector(x, index):
	# find filepath
	filename = x['filename'][index]
	folder = x['folder'][index]
	filepath = f'data/{folder}/{filename}'

	# open the image via its filepath
	img = Image.open(filepath)
	# ==NOTE==
	# the Image class was imported from PIL
	# (python image library)

	# remove transparency layer
	img = img.convert('RGB')

	# resize the image
	img = img.resize((image_length, image_length))

	# return the image vector
	return img_to_array(img)

In [11]:
def data_gen(x, y, batch_size):
	# n_batch variables are empty arrays of constant size.
	# x_batch is has RGB values for each pixel's coordinate.
	# y_batch represents whether there is fire or not (0/1).
	x_batch = numpy.zeros((batch_size, image_length, image_length, 3))
	y_batch = numpy.zeros((batch_size, 1))

	# loop through entire dataframe, index by index
	for index in range(len(x)):
		# using batch_size, we can determine 
		x_batch[index % batch_size] = get_img_vector(x, index)
		y_batch[index % batch_size] = y['fire'][index]

		# if there has been {batch_size} items, we yield.
		# the last batch is an outlier; its batch is smaller.
		if ((index + 1) % batch_size == 0
		or (index + 1) == len(dataframe)):
			yield (x_batch, y_batch)

## Step 3: Train Model

In [12]:
# use "sequential" mode from keras module
# see https://jovianlin.io/keras-models-sequential-vs-functional/
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
model = Sequential()
model.add(Conv2D(16, (4,4), activation='relu', padding='same', input_shape=(image_length, image_length, 3)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

In [13]:
# model = Sequential()
# model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(image_length, image_length, 3)))
# model.add(Conv2D(64, (3, 3), activation='relu'))
# model.add(MaxPooling2D(pool_size=(2, 2)))
# model.add(Dropout(0.25))
# model.add(Flatten())
# model.add(Dense(128, activation='relu'))
# model.add(Dropout(0.5))
# model.add(Dense(1, activation='sigmoid'))

In [14]:
model.compile(
	loss=binary_crossentropy,
	optimizer=Adadelta(),
	metrics=['accuracy']
)

In [15]:
batch_size = 64
epochs = 5

for epoch in range(epochs):
	print(f"Epoch {epoch+1} / {epochs}")
	for x_batch, y_batch in data_gen(x_train, y_train, batch_size):
		model.train_on_batch(
			x_batch, y_batch
		)
		loss, accuracy = model.evaluate(x_batch, y_batch)
		print('accuracy:', accuracy)

Epoch 1 / 5
accuracy: 0.890625
accuracy: 0.8125
accuracy: 0.15625
accuracy: 0.828125
accuracy: 0.875
accuracy: 0.859375
Epoch 2 / 5
accuracy: 0.890625
accuracy: 0.34375
accuracy: 0.84375
accuracy: 0.828125
accuracy: 0.875
accuracy: 0.859375
Epoch 3 / 5
accuracy: 0.890625
accuracy: 0.796875
accuracy: 0.84375
accuracy: 0.859375
accuracy: 0.78125
accuracy: 0.75
Epoch 4 / 5
accuracy: 0.765625
accuracy: 0.78125
accuracy: 0.84375
accuracy: 0.6875
accuracy: 0.875
accuracy: 0.859375
Epoch 5 / 5
accuracy: 0.890625
accuracy: 0.796875
accuracy: 0.84375
accuracy: 0.828125
accuracy: 0.875
accuracy: 0.859375


## Step 4: Affirm Model Accuracy

In [116]:
def check_fire(index):
	fire = model.predict(get_img_vector(x_test, index).reshape(-1, 256, 256, 3))
# 	print(fire)
	if fire > 0.3:
		return (True, y_test['fire'][index]==1)
	else:
		return (False, y_test['fire'][index]==1)

In [117]:
def construct_confusion_matrix():
	TP = 0 # true positive
	TN = 0 # true negative
	FP = 0 # false positive
	FN = 0 # false negative
	for index in list(range(len(x_test))):
		if check_fire(index) == (True, True):
			TP += 1
		elif check_fire(index) == (False, False):
			TN += 1
		elif check_fire(index) == (True, False):
			FP += 1
		elif check_fire(index) == (False, True):
			FN += 1
	return [[TP, TN],[FP, FN]]


construct_confusion_matrix()

[[42, 87], [96, 3]]

In [132]:
for index in list(range(50)):
	predict = model.predict(get_img_vector(x_test, index).reshape(-1, 256, 256, 3))
	print(
		f'\n{predict[0][0]} \n' 
		'<img src="./data/'
		f'{x_train["folder"][index]}'
		'/'
		f'{x_train["filename"][index]}">'
	)


0.4992389976978302 
<img src="./data/Normal Images 2/choosing-right-cctv-for-office.jpg">

0.4992389976978302 
<img src="./data/Normal Images 2/ES_execconfroom_6_712x342_FitToBoxSmallDimension_Center.jpg">

0.4992389976978302 
<img src="./data/Normal Images 2/crystal-cave-8.jpg">

0.4992389976978302 
<img src="./data/Fire images/10-9-15-2-400.jpg">

0.4992389976978302 
<img src="./data/Normal Images 1/8471644323_90981b7693.jpg">

0.4992389976978302 
<img src="./data/Normal Images 2/dark-red-living-room.jpg">

0.0 
<img src="./data/Normal Images 2/darcy434_largeimage.jpg">

0.0 
<img src="./data/Normal Images 1/5724aaf0e4b00a870d466396_853x480_U_v1.jpg">

0.0 
<img src="./data/Normal Images 3/main-qimg-f42d9fe3ea8e255dece2196f1e588ce0-c.jpg">

0.4992389976978302 
<img src="./data/Normal Images 2/cdnassets.hw.net.jpg">

0.0 
<img src="./data/Normal Images 5/warm-tech-living-room.jpg">

0.4992389976978302 
<img src="./data/Normal Images 3/Le-Meridien-Piccadilly--Classic-Room.jpg">

0.0 



<img src="./data/Normal Images 1/29a74da6d3a8a4577a43d76b519128c1.jpg">