# 🔥 Fire...or No Fire?

## **Step 1:** Setup
We first need to set up the dataframe in its initial state.
Later steps manipulate it further: cleaning it, digesting it, and so on.

### Import Packages
> "If I have seen further, it is by standing on the shoulders of Giants."<br />
> &mdash; Isaac Newton


In [56]:
# glob finds every file path that matches a wildcard pattern (used to crawl the data folders)
import glob

# regex parser for advanced string-parsing
import re

# pandas used for dataframe manipulation
import pandas

# numpy is used for odds and ends, like high-efficiency arrays
import numpy

# python imaging library; used for deconstructing images
from PIL import Image

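# keras converts PIL images into numpy arrays
# (newer Keras releases expose this as keras.utils.img_to_array)
from keras.preprocessing.image import img_to_array

# scikit-learn splits the data into training and testing sets
from sklearn.model_selection import train_test_split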

### Crawl Datafiles
The **glob** package finds every file path that matches a wildcard pattern, so a single call crawls the whole data directory.
Be sure your directory structure is set up as described in the README;
otherwise, I cannot guarantee that this project will work for you!

In [57]:
# get all files from dirs in the data dir
files = glob.glob('data/*/*.*')
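To see exactly what the `data/*/*.*` pattern picks up, here is a small sketch that builds a throwaway tree in a temporary folder and globs it. The folder and file names are made up; they only mimic the `data/<folder>/<file>` layout this notebook expects.

```python
import glob
import os
import tempfile

# build a throwaway tree shaped like data/<folder>/<file>
# (folder and file names here are hypothetical examples)
root = tempfile.mkdtemp()
layout = {
    'Fire images': ['1.jpg', 'blaze.png'],
    'Normal Images 5': ['room.jpg', 'README'],  # README has no extension
}
for folder, names in layout.items():
    os.makedirs(os.path.join(root, 'data', folder))
    for name in names:
        # create an empty file
        open(os.path.join(root, 'data', folder, name), 'w').close()

# '*' matches one path component; '*.*' additionally requires a dot,
# so files without an extension (like README) are skipped
pattern = os.path.join(root, 'data', '*', '*.*')
matches = sorted(glob.glob(pattern))

# print paths relative to the temporary root for readability
for path in matches:
    print(os.path.relpath(path, root))
```

Wrapping `glob.glob` in `sorted()`, as in this sketch, makes the file order (and therefore the dataframe's row indices) the same on every run.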

In [58]:
# prepare dataset in an array
dataset = []

# loop through every file that "glob" found.
for filepath in files:
	# split on backslashes (Windows) or forward slashes (macOS/Linux)
	filecrawl = re.split(r'\\+|/+', filepath)

	# remove the "data" folder entry; it's not needed
	filecrawl = filecrawl[1:]

	# label images from the "Fire images" folder as fire (1),
	# and everything else as no fire (0)
	if filecrawl[0] == 'Fire images':
		filecrawl.append(1)
	else:
		filecrawl.append(0)

	# add filecrawl findings to dataset
	dataset.append(filecrawl)
	# ==NOTE==
	# "glob" returns files in an arbitrary, OS-dependent order, so an
	# item's index may differ between runs; wrap the glob call in
	# sorted() if you need stable indices.
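The regex split in the loop above handles both separator styles, and the `+` also collapses doubled separators. A quick check, alongside the standard-library `pathlib` equivalent (shown as an alternative; the notebook itself uses the regex):

```python
import re
from pathlib import PurePosixPath, PureWindowsPath

# the notebook's split: one or more backslashes OR one or more forward slashes
SEPARATORS = r'\\+|/+'

windows_path = r'data\Fire images\1.jpg'
posix_path = 'data/Normal Images 5/room.jpg'
messy_path = 'data//Fire images\\1.jpg'  # doubled slash plus a backslash

print(re.split(SEPARATORS, windows_path))  # ['data', 'Fire images', '1.jpg']
print(re.split(SEPARATORS, posix_path))    # ['data', 'Normal Images 5', 'room.jpg']
print(re.split(SEPARATORS, messy_path))    # ['data', 'Fire images', '1.jpg']

# pathlib yields the same pieces without a regex, but you must pick the
# flavour that matches the path's separators
print(list(PureWindowsPath(windows_path).parts))  # ['data', 'Fire images', '1.jpg']
print(list(PurePosixPath(posix_path).parts))      # ['data', 'Normal Images 5', 'room.jpg']
```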

### Create Dataframe

In [59]:
# create our project's main dataframe
dataframe = pandas.DataFrame(dataset, columns=['folder', 'filename', 'fire'])

# display() is built into Jupyter; it renders the first and last rows below
display(dataframe.head(), dataframe.tail())

|     | folder | filename | fire |
|-----|--------|----------|------|
| 0 | Fire images | 1.jpg | 1 |
| 1 | Fire images | 10-9-15-2-400.jpg | 1 |
| 2 | Fire images | 11_10_19-mjs_ft_hotel-fire_19183862.jpg | 1 |
| 3 | Fire images | 132-img1.png | 1 |
| 4 | Fire images | 132343342_21n.jpg | 1 |

|     | folder | filename | fire |
|-----|--------|----------|------|
| 646 | Normal Images 5 | x14862823.jpg | 0 |
| 647 | Normal Images 5 | xbR1sVO.png | 0 |
| 648 | Normal Images 5 | xR7vfcy.jpg | 0 |
| 649 | Normal Images 5 | xsylvia-hotel-queen-room.jpg.pagespeed.ic.hw9T... | 0 |
| 650 | Normal Images 5 | y5IQiH.jpg | 0 |


## **Step 2:** Manipulation

### Checking for Duplicates

In [69]:
import warnings

# check for any duplicate filenames in the dataframe
# (keep=False marks every copy, not just the second and later ones)
# ==NOTE==
# the same filename in two different folders is not necessarily
# the same image, so review these rows by hand
duplicates = dataframe['filename'].duplicated(keep=False)

# select and display the rows with duplicate filenames
duplicate_rows = dataframe[duplicates].sort_values('filename')
display(duplicate_rows)

# warn, rather than halt the notebook with an assert, if duplicates exist
if duplicates.any():
	warnings.warn(
		f'there are {duplicates.sum()} rows with duplicated filenames '
		'in the dataframe. proceed with caution.'
	)
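Two details of pandas' `duplicated()` matter for a duplicate check like this one: by default it flags only the second and later copies, and checking `filename` alone cannot tell whether two same-named files in different folders are really the same image. A toy comparison (the rows are invented for illustration):

```python
import pandas

# a tiny stand-in for the notebook's dataframe (made-up rows)
df = pandas.DataFrame({
    'folder':   ['Fire images', 'Normal Images 5', 'Normal Images 5'],
    'filename': ['1.jpg',       '1.jpg',           'room.jpg'],
    'fire':     [1,             0,                 0],
})

# default keep='first': only later copies are flagged
first_only = df['filename'].duplicated().tolist()
print(first_only)   # [False, True, False]

# keep=False: every copy is flagged, which is handier for inspection
all_copies = df['filename'].duplicated(keep=False).tolist()
print(all_copies)   # [True, True, False]

# folder + filename together: same-named files in different folders
# count as distinct rows
by_path = df.duplicated(subset=['folder', 'filename']).tolist()
print(by_path)      # [False, False, False]
```

Comparing file contents (for example, a hash of each file's bytes) would be the stricter test for truly identical images.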

### Rebalancing Datapoints
I used [this article][rebalancing] as a guide for handling an imbalanced dataset.

[rebalancing]: https://towardsdatascience.com/having-an-imbalanced-dataset-here-is-how-you-can-solve-it-1640568947eb
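The simplest rebalancing technique is random undersampling: drop rows from the larger class until both classes are the same size. The sketch below does this with plain pandas; it is a swapped-in, simpler alternative to the ensemble approach in the next cell, and the `undersample` helper and toy rows are hypothetical.

```python
import pandas

def undersample(df, label_col, random_state=0):
    """Randomly drop rows from the larger classes until every class
    has as many rows as the smallest one (hypothetical helper)."""
    smallest = df[label_col].value_counts().min()
    # GroupBy.sample draws n rows from each class without replacement
    balanced = df.groupby(label_col).sample(n=smallest, random_state=random_state)
    # shuffle so the classes are not stored in contiguous blocks
    return balanced.sample(frac=1, random_state=random_state).reset_index(drop=True)

# made-up example: 5 fire rows, 2 non-fire rows
toy = pandas.DataFrame({
    'filename': [f'img{i}.jpg' for i in range(7)],
    'fire': [1, 1, 1, 1, 1, 0, 0],
})
balanced = undersample(toy, 'fire')
print(balanced['fire'].value_counts().sort_index())
```

Undersampling throws data away, which hurts on a dataset of only a few hundred images. If you use it, apply it to the training split only, so the test set keeps the real class ratio.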

In [70]:
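from imblearn.ensemble import BalancedBaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# create an object of the classifier, called "rebalancinator".
# BalancedBaggingClassifier is a bagging ensemble that (by default) randomly
# undersamples each bootstrap sample, so every tree trains on balanced classes.
# ==NOTE==
# recent imbalanced-learn releases name this parameter "estimator";
# older releases call it "base_estimator".
rebalancinator = BalancedBaggingClassifier(
	estimator=DecisionTreeClassifier(),
	sampling_strategy='auto',
	replacement=False,
	random_state=0
)

# == FIXME ==
# fitting needs numeric features (e.g. image vectors), but x_train only
# holds folder/filename strings, so training stays disabled for now:
# rebalancinator.fit(x_train, y_train['fire'])
# preds = rebalancinator.predict(x_train)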

### Train/Test Split
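*Why do we need a train/test split?*
A model has to be judged on images it never saw during training; otherwise a high score may only mean it memorized the training set.

*What does it do?*
`train_test_split` shuffles the rows and sets aside a fraction of them (here 35%) as a test set, leaving the rest for training.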

In [61]:
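# features: every column except the "fire" label
x = dataframe.drop(columns=['fire'])
# labels: just the "fire" column (kept as a one-column dataframe)
y = dataframe.loc[:, ['fire']]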

display(x.head(), y.head())

|     | folder | filename |
|-----|--------|----------|
| 0 | Fire images | 1.jpg |
| 1 | Fire images | 10-9-15-2-400.jpg |
| 2 | Fire images | 11_10_19-mjs_ft_hotel-fire_19183862.jpg |
| 3 | Fire images | 132-img1.png |
| 4 | Fire images | 132343342_21n.jpg |

|     | fire |
|-----|------|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |


In [73]:
# use scikit-learn's "train_test_split" function to produce the four
# pieces we need (training/testing features and labels); 35% of rows
# go to the test set
x_train, x_test, y_train, y_test = train_test_split(
	x, y, test_size=0.35, random_state=0)

display(x_train.shape)
display(x_test.shape)

(423, 2)

(228, 2)
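The shapes above follow from how `train_test_split` rounds: a float `test_size` is rounded up to get the test-set size, and the training set gets the rest. Because fire and non-fire counts differ, passing `stratify=y` (not used in the notebook's call) would also keep the class ratio identical in both splits. A sketch with made-up labels:

```python
import math

import numpy
from sklearn.model_selection import train_test_split

# 651 rows with test_size=0.35: ceil(227.85) = 228 test rows, 423 train rows
n_samples, test_size = 651, 0.35
n_test = math.ceil(n_samples * test_size)
print(n_test, n_samples - n_test)  # 228 423

# stratify=y keeps the class ratio the same in both splits
# (made-up labels: 80 negatives, 20 positives)
y_toy = numpy.array([0] * 80 + [1] * 20)
x_toy = numpy.arange(100).reshape(-1, 1)
x_tr, x_te, y_tr, y_te = train_test_split(
    x_toy, y_toy, test_size=0.25, random_state=0, stratify=y_toy)
print(len(y_te), int(y_te.sum()))  # 25 5
```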

---

In [79]:
def get_img_vector(index):
	# build the filepath from the row's folder and filename
	filename = dataframe['filename'][index]
	folder = dataframe['folder'][index]
	filepath = f'data/{folder}/{filename}'

	# open the image via its filepath
	# (the Image class comes from PIL, the Python Imaging Library)
	with Image.open(filepath) as img:
		# force 3 RGB channels; PNGs may be RGBA or grayscale
		img = img.convert('RGB')

		# resize every image to the same 1024x1024 shape
		img = img.resize((1024, 1024))

	# return the image as a (1024, 1024, 3) float array
	return img_to_array(img)
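The channel count and the array size are easy to check without Keras. This sketch uses `numpy.asarray(..., dtype='float32')` as a stand-in for `img_to_array` (with Keras's default settings, both give a channels-last `float32` array) and a synthetic image instead of a real file.

```python
import numpy
from PIL import Image

# a synthetic RGBA image stands in for a PNG with transparency
rgba = Image.new('RGBA', (300, 200), (255, 80, 0, 128))

# arrays are (height, width, channels); RGBA has 4 channels
print(numpy.asarray(rgba).shape)  # (200, 300, 4)

# convert to RGB, then resize to a fixed square size (64 keeps the demo small)
rgb = rgba.convert('RGB').resize((64, 64))
vector = numpy.asarray(rgb, dtype='float32')
print(vector.shape, vector.dtype)  # (64, 64, 3) float32

# memory for one 1024x1024 RGB float32 image, the size get_img_vector returns
bytes_per_image = 1024 * 1024 * 3 * 4
print(bytes_per_image / 2**20, 'MiB')  # 12.0 MiB
```

At roughly 12 MiB per image, holding every resized image in memory at once adds up quickly, which is one reason to stream batches from a generator instead.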

In [83]:
get_img_vector(0)

array([[[117., 131., 140.],
        [137., 151., 160.],
        [135., 149., 158.],
        ...,
        [147., 169., 183.],
        [154., 176., 190.],
        [131., 153., 167.]],

       [[117., 131., 140.],
        [137., 151., 160.],
        [135., 149., 158.],
        ...,
        [147., 169., 183.],
        [154., 176., 190.],
        [131., 153., 167.]],

       [[138., 152., 161.],
        [158., 172., 181.],
        [157., 171., 180.],
        ...,
        [169., 191., 205.],
        [176., 198., 212.],
        [153., 175., 189.]],

       ...,

       [[110., 108., 130.],
        [119., 117., 138.],
        [113., 110., 129.],
        ...,
        [149., 116., 107.],
        [147., 114., 105.],
        [131.,  98.,  89.]],

       [[111., 103., 118.],
        [121., 110., 126.],
        [112., 101., 115.],
        ...,
        [138., 104.,  95.],
        [138., 104.,  95.],
        [125.,  91.,  82.]],

       [[111., 103., 118.],
        [121., 110., 126.],
        [112., 1

### Merge dfs

In [55]:
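# merge the features and labels back into one dataframe per split
# (train_test_split keeps the original index, so the rows line up)
dataframe_train = pandas.concat([x_train, y_train], axis=1)
dataframe_test = pandas.concat([x_test, y_test], axis=1)

def data_gen(dataframe, batch_size):
	# loop forever; Keras ends each epoch after steps_per_epoch batches
	while True:
		# step through the dataframe one full batch at a time
		# (a final partial batch is skipped)
		for j in range(len(dataframe) // batch_size):
			start = j * batch_size
			finish = (j + 1) * batch_size

			# fresh arrays per batch; channels-last to match img_to_array
			x_batch = numpy.zeros((batch_size, 1024, 1024, 3), dtype='float32')
			y_batch = numpy.zeros((batch_size, 1), dtype='float32')

			rows = zip(
				dataframe['folder'].values[start:finish],
				dataframe['filename'].values[start:finish],
				dataframe['fire'].values[start:finish]
			)
			for b, (folder, filename, fire) in enumerate(rows):
				with Image.open(f'data/{folder}/{filename}') as img:
					img = img.convert('RGB').resize((1024, 1024))
				x_batch[b] = img_to_array(img)
				y_batch[b] = fire

			yield (x_batch, y_batch)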
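`steps_per_epoch = len(dataframe_train) // batch_size` in the training cell matches the number of full batches the generator yields per pass. The sketch below works through that arithmetic for the 423 training rows; `batch_size = 32` is an arbitrary example value (the notebook never defines one), and `batch_slices` is a hypothetical helper.

```python
def batch_slices(n_rows, batch_size):
    """Start/finish positions the generator walks through in one pass;
    a final partial batch is dropped (hypothetical helper)."""
    return [(j * batch_size, (j + 1) * batch_size)
            for j in range(n_rows // batch_size)]

n_rows, batch_size = 423, 32
slices = batch_slices(n_rows, batch_size)
print(len(slices), slices[0], slices[-1])  # 13 (0, 32) (384, 416)

# rows that never appear in a batch during one pass
print(n_rows - slices[-1][1], 'rows skipped per pass')  # 7 rows skipped per pass

# memory for one x_batch of 1024x1024x3 float32 images
print(batch_size * 1024 * 1024 * 3 * 4 / 2**20, 'MiB per batch')  # 384.0 MiB per batch
```

Because the generator always starts at row 0, the same trailing rows are skipped every epoch; shuffling `dataframe_train` between passes would spread that loss around.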

In [54]:
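# ==NOTE==
# "model", "batch_size", and "epochs" must be defined before this cell runs.
# Model.fit accepts Python generators directly; fit_generator is
# deprecated in TensorFlow 2.x and removed in Keras 3.
model.fit(
	data_gen(
		dataframe_train,
		batch_size = batch_size
	),
	steps_per_epoch = len(dataframe_train) // batch_size,
	epochs = epochs
)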

NameError: name 'model' is not defined