# 🔥 Fire...or No Fire?

## **Step 1:** Setup
First, we set up the dataframe in its initial state.
Later steps then manipulate it further by cleaning it, digesting it, and so on.

### Import Packages
> "If I have seen further, it is by standing on the shoulders of Giants."<br />
> &mdash; Isaac Newton


In [1]:
# glob crawler for reading files and folders
import glob

# regex parser for advanced string-parsing
import re

# pandas used for dataframe manipulation
import pandas

# numpy is used for odds and ends, like high-efficiency arrays
import numpy

# python imaging library; used for deconstructing images
from PIL import Image

# other imports are described as they are used
# Keras helper that turns a PIL image into a numpy array of pixel values
from keras.preprocessing.image import img_to_array

Using TensorFlow backend.


### Crawl Datafiles
The **glob** package finds every file that matches a path pattern, so the folder layout does most of the labeling work for us.
Be sure your directory structure is set up as described in the README&hellip;
otherwise, I cannot guarantee that this project will work for you!

In [2]:
# get every file inside each subfolder of data/ (i.e. data/<folder>/<file>.<ext>)
files = glob.glob('data/*/*.*')

In [3]:
# prepare dataset in an array
dataset = []

# loop through every file that "glob" found.
for filepath in files:
	# regex used for Windows/MacOS compatibility
	filecrawl = re.split(r'\\+|/+', filepath)

	# remove the "data" folder entry; it's not needed
	filecrawl = filecrawl[1:]

	# tag images from the fire-images folder as "fire"
	if filecrawl[0] == 'Fire images':
		filecrawl.append(1)
	else:
		filecrawl.append(0)

	# add filecrawl findings to dataset
	dataset.append(filecrawl)
	# ==NOTE==
	# glob returns files in an arbitrary, filesystem-dependent order, so the
	# index of an item may differ between runs or between machines.
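If a stable row order matters (for example, to get the same indices on every run), one option is to sort the paths before building the rows. The sketch below does the same crawl with `pathlib`, which also handles Windows and macOS separators without a regex. The function name `crawl` and its parameters are hypothetical; only the `data/<folder>/<file>` layout and the `'Fire images'` folder name come from the notebook.

```python
from pathlib import Path


def crawl(data_dir='data', fire_folder='Fire images'):
    """Return [folder, filename, fire] rows for every file one level below data_dir."""
    rows = []
    # sorted() pins down the order that glob otherwise leaves to the filesystem
    for path in sorted(Path(data_dir).glob('*/*.*')):
        folder = path.parent.name  # e.g. "Fire images"
        label = 1 if folder == fire_folder else 0
        rows.append([folder, path.name, label])
    return rows
```

Usage would be `dataset = crawl()` in place of the loop above; the rows have the same `[folder, filename, fire]` shape the dataframe expects.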

### Create Dataframe

In [4]:
# create our project's main dataframe
dataframe = pandas.DataFrame(dataset, columns=['folder', 'filename', 'fire'])

# display() is provided by Jupyter/IPython; it renders both tables neatly (shown below)
display(dataframe.head(), dataframe.tail())

|   | folder | filename | fire |
|---|---|---|---|
| 0 | Fire images | 1.jpg | 1 |
| 1 | Fire images | 10-9-15-2-400.jpg | 1 |
| 2 | Fire images | 11_10_19-mjs_ft_hotel-fire_19183862.jpg | 1 |
| 3 | Fire images | 132-img1.png | 1 |
| 4 | Fire images | 132343342_21n.jpg | 1 |

|   | folder | filename | fire |
|---|---|---|---|
| 646 | Normal Images 5 | x14862823.jpg | 0 |
| 647 | Normal Images 5 | xbR1sVO.png | 0 |
| 648 | Normal Images 5 | xR7vfcy.jpg | 0 |
| 649 | Normal Images 5 | xsylvia-hotel-queen-room.jpg.pagespeed.ic.hw9T... | 0 |
| 650 | Normal Images 5 | y5IQiH.jpg | 0 |
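The head and tail only show ten rows, so they say nothing about how many fire versus non-fire images there are. Before rebalancing, it helps to measure that split directly. A minimal sketch; `class_balance` is a hypothetical helper, and the toy rows are made up for illustration (on the real data you would call `class_balance(dataframe)`).

```python
import pandas


def class_balance(df, label_col='fire'):
    """Return the fraction of rows carrying each label value, sorted by label."""
    return df[label_col].value_counts(normalize=True).sort_index()


# hypothetical toy rows, not taken from the dataset
toy = pandas.DataFrame({
    'folder': ['Fire images', 'Fire images', 'Normal Images 5', 'Normal Images 5', 'Normal Images 5'],
    'filename': ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg', 'e.jpg'],
    'fire': [1, 1, 0, 0, 0],
})
print(class_balance(toy))
```

A label whose fraction is far from 0.5 is the imbalance the rebalancing step below is meant to address.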


## **Step 2:** Manipulation

In [5]:
# == FIXME ==
# this does not seem to be working properly at the moment

'''
# check for any duplicate filenames in the dataframe
duplicates = dataframe['filename'].duplicated()

# Select rows with duplicate filenames
duplicate_rows = dataframe[duplicates]
display(duplicate_rows)

# create a warning if duplicates exist
assert duplicates.sum() == 0, '\n' \
f'WARNING: there are {duplicates.sum()} duplicated filenames in the dataframe. proceed with caution.'
'''

pass
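The disabled cell above uses `assert`, so any duplicate halts the notebook even though its message says "proceed with caution". Also, `duplicated()` with its default `keep='first'` leaves the first copy of each name out of `duplicate_rows`. It is not certain that either is what the FIXME refers to; the sketch below is one way to warn instead of halting and to list every copy. The function name `warn_on_duplicate_filenames` is hypothetical. Identical filenames in different folders are not necessarily the same picture, so this only flags candidates for a manual look.

```python
import warnings

import pandas


def warn_on_duplicate_filenames(df, col='filename'):
    """Warn (instead of halting) when a filename appears more than once; return all copies."""
    # keep=False marks every copy, not just the second and later ones
    mask = df[col].duplicated(keep=False)
    dupes = df[mask].sort_values(col)
    if not dupes.empty:
        n_extra = df[col].duplicated().sum()  # copies beyond the first
        warnings.warn(
            f'there are {n_extra} duplicated filenames in the dataframe. proceed with caution.'
        )
    return dupes
```

In the notebook this would be called as `display(warn_on_duplicate_filenames(dataframe))`.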

### Rebalancing Datapoints
I used [this article][rebalancing] as a guide for rebalancing the fire and non-fire classes.

[rebalancing]: https://towardsdatascience.com/having-an-imbalanced-dataset-here-is-how-you-can-solve-it-1640568947eb

In [11]:
from imblearn.ensemble import BalancedBaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# create an object of the classifier, called "rebalancinator"
# (newer imbalanced-learn releases rename `base_estimator` to `estimator`)
rebalancinator = BalancedBaggingClassifier(
	base_estimator=DecisionTreeClassifier(),
	sampling_strategy='auto',
	replacement=False,
	random_state=0
)

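'''
# these lines appear to be carried over from the article's example:
# `credit_df` and its 'Class' column are not part of this project's dataframe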
Y_train = credit_df['Class']
X_train = credit_df.drop(['Class'], axis=1, inplace=False)

# train the classifier.
rebalancinator.fit(X_train, Y_train)
preds = rebalancinator.predict(X_train)
'''

pass
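`BalancedBaggingClassifier` rebalances inside each bagging round, so it needs numeric feature vectors (`X_train`) rather than a table of file paths. A simpler, different technique that works directly on this dataframe is random undersampling: keep every row of the minority class and sample the same number of rows from the majority class. This is a swapped-in sketch, not the article's method; the helper name `undersample` is hypothetical.

```python
import pandas


def undersample(df, label_col='fire', random_state=0):
    """Randomly drop majority-class rows until every label has the same count."""
    smallest = df[label_col].value_counts().min()
    # draw the same number of rows from each label group
    parts = [
        group.sample(n=smallest, random_state=random_state)
        for _, group in df.groupby(label_col)
    ]
    # concatenate, then shuffle so the labels are interleaved
    return pandas.concat(parts).sample(frac=1, random_state=random_state).reset_index(drop=True)
```

The trade-off is that discarded majority rows are never seen in training; with only a few hundred images, that loss can matter.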

### Train/Test Split

In [None]:
# Y_train = credit_df['Class']
# X_train = credit_df.drop(['Class'], axis=1, inplace=False)

# # train the classifier.
# rebalancinator.fit(X_train, Y_train)
# preds = rebalancinator.predict(X_train)

---

In [267]:
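# open the first image listed in the dataframe
img = Image.open(f'data/{dataframe["folder"][0]}/{dataframe["filename"][0]}')
print(img.size)  # PIL reports (width, height)
# stretch to a fixed 1024x1024 size, then convert to a (height, width, channels) array
image_red = img.resize((1024, 1024))
image = img_to_array(image_red)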

(852, 480)
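The printed `(852, 480)` is (width, height), while arrays built from the image are (height, width, channels), and resizing to 1024x1024 also stretches the aspect ratio. The sketch below shows those layouts on a blank stand-in image, using `numpy.asarray` instead of `img_to_array` so it runs without Keras (with its default channels-last setting, `img_to_array` yields the same shape as float32). It also shows that RGBA PNGs carry a fourth channel unless converted.

```python
import numpy
from PIL import Image

# blank stand-in with the same size as the image printed above
img = Image.new('RGB', (852, 480))
print(img.size)          # (852, 480): PIL reports (width, height)

arr = numpy.asarray(img)
print(arr.shape)         # (480, 852, 3): numpy reports (height, width, channels)

resized = img.resize((1024, 1024))  # resize also takes (width, height)
print(numpy.asarray(resized).shape)  # (1024, 1024, 3)

# PNGs may be RGBA or palette mode; convert('RGB') forces exactly 3 channels
rgba = Image.new('RGBA', (10, 10))
print(numpy.asarray(rgba).shape)                 # (10, 10, 4)
print(numpy.asarray(rgba.convert('RGB')).shape)  # (10, 10, 3)
```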


In [268]:
image

array([[[117., 131., 140.],
        [137., 151., 160.],
        [135., 149., 158.],
        ...,
        [147., 169., 183.],
        [154., 176., 190.],
        [131., 153., 167.]],

       [[117., 131., 140.],
        [137., 151., 160.],
        [135., 149., 158.],
        ...,
        [147., 169., 183.],
        [154., 176., 190.],
        [131., 153., 167.]],

       [[138., 152., 161.],
        [158., 172., 181.],
        [157., 171., 180.],
        ...,
        [169., 191., 205.],
        [176., 198., 212.],
        [153., 175., 189.]],

       ...,

       [[110., 108., 130.],
        [119., 117., 138.],
        [113., 110., 129.],
        ...,
        [149., 116., 107.],
        [147., 114., 105.],
        [131.,  98.,  89.]],

       [[111., 103., 118.],
        [121., 110., 126.],
        [112., 101., 115.],
        ...,
        [138., 104.,  95.],
        [138., 104.,  95.],
        [125.,  91.,  82.]],

       [[111., 103., 118.],
        [121., 110., 126.],
        [112., 1

### Merge dfs

In [269]:
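# split df into train and test sets, stratified on the "fire" label column
# (holding out 20% for testing is an assumed ratio, not one fixed by the original)
from sklearn.model_selection import train_test_split

df_train, df_test = train_test_split(
	dataframe, test_size=0.2, stratify=dataframe['fire'], random_state=0
)

def data_gen(df, batch_size):
	"""Yield (images, labels) batches forever, as Keras generators expect."""
	while True:
		for j in range(len(df) // batch_size):
			# fresh channels-last arrays, matching img_to_array's default layout
			x_batch = numpy.zeros((batch_size, 1024, 1024, 3), dtype='float32')
			y_batch = numpy.zeros((batch_size, 1), dtype='float32')
			rows = df.iloc[j*batch_size:(j+1)*batch_size]
			for b, (folder, filename, label) in enumerate(
				zip(rows['folder'], rows['filename'], rows['fire'])
			):
				# convert('RGB') turns RGBA/palette PNGs into 3 channels
				img = Image.open(f'data/{folder}/{filename}').convert('RGB')
				image_red = img.resize((1024, 1024))
				x_batch[b] = img_to_array(image_red)
				y_batch[b] = label
			yield (x_batch, y_batch)


# `model`, `batch_size`, and `epochs` must already be defined at this point;
# in TensorFlow 2's tf.keras, fit_generator is deprecated in favor of model.fit
model.fit_generator(
	generator=data_gen(df_train, batch_size=batch_size),
	steps_per_epoch=len(df_train) // batch_size,
	epochs=epochs
)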
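`steps_per_epoch=len(df_train) // batch_size` means each epoch covers only whole batches, and the generator's per-epoch loop should run the same `len(df) // batch_size` times. The leftover rows are skipped, and unless the dataframe is reshuffled between epochs, the same trailing rows are skipped every time. The slicing can be checked without any images; `batch_slices` is a hypothetical helper for illustration.

```python
def batch_slices(n_rows, batch_size):
    """Return the (start, stop) row ranges a generator visits in one epoch."""
    # floor division drops the final partial batch, matching steps_per_epoch
    return [(j * batch_size, (j + 1) * batch_size) for j in range(n_rows // batch_size)]


slices = batch_slices(10, 4)
print(slices)               # [(0, 4), (4, 8)]
print(10 - slices[-1][1])   # 2 rows skipped each epoch
```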