<a id="item41"></a>

# Concrete Crack Detector with a Neural Net
### By Joshua Montgomery

This notebook contains all the code required to train and test the model. It took about 16 hours to execute on my laptop, so I would advise against trying to run the code yourself. Instead, I will upload this notebook to GitHub, where you will be able to see my outputs.

### Part 1: Getting the data

The data set used to train and test this model consists of almost 40,000 images, split evenly between positive (cracked concrete) and negative (uncracked concrete) images. The data set is downloaded from the URL below and then unzipped.

In [None]:
!wget  --no-check-certificate https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0321EN/data/concrete_data_week4.zip

In [None]:
!unzip concrete_data_week4.zip

After the dataset has been unzipped you will find it is already split into training and validation sets.
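
As a quick sanity check before training, the folder layout can be inspected by counting the files in each class subfolder. The sketch below is a minimal example rather than part of the training pipeline. The helper name `count_images` is hypothetical, and it assumes the unzipped archive contains the `train` and `valid` folders used later in this notebook, each with one subfolder per class.

```python
from pathlib import Path


def count_images(split_dir):
    """Return {class_folder_name: number_of_files} for one split directory.

    Hypothetical helper: assumes one subfolder per class, which is the layout
    Keras' flow_from_directory expects. Returns {} if the folder is missing.
    """
    root = Path(split_dir)
    if not root.is_dir():
        return {}
    return {
        class_dir.name: sum(1 for f in class_dir.iterdir() if f.is_file())
        for class_dir in sorted(root.iterdir())
        if class_dir.is_dir()
    }


for split in ("train", "valid"):
    print(split, count_images(Path("concrete_data_week4") / split))
```

If the dataset has not been downloaded, each split simply prints an empty dictionary.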

### Part 2: Initialising the model  

To reduce training time, the pretrained VGG16 model will be used. This model is included in the open-source Keras package.

In [7]:
from keras.preprocessing.image import ImageDataGenerator
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.applications import VGG16
from keras.applications.vgg16 import preprocess_input
# Imports all of the required modules

In [8]:
num_classes = 2  # Positive or negative
image_resize = 224  #The images need to be resized so they are a fixed dimension
batch_size_training = 100 
batch_size_validation = 100
# This section of code defines several key values that are used in the code.

In [9]:
data_generator = ImageDataGenerator(
    preprocessing_function=preprocess_input,
)
#This ImageDataGenerator 'data_generator' will be called as the parent for subsequent generators

train_generator = data_generator.flow_from_directory(
    'concrete_data_week4/train',
    target_size=(image_resize, image_resize),
    batch_size=batch_size_training,
    class_mode='categorical')
#The 'train_generator' is called when the model is being trained and contains 30,001 images

validation_generator = data_generator.flow_from_directory(
    'concrete_data_week4/valid',
    target_size=(image_resize, image_resize),
    batch_size=batch_size_validation,
    class_mode='categorical')
#The 'validation_generator' is called when the model is being tested and contains 9,501 images

Found 30001 images belonging to 2 classes.
Found 9501 images belonging to 2 classes.
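
For context, the `preprocess_input` function passed to the generator works in what Keras calls 'caffe' mode: each image is converted from RGB to BGR and the ImageNet per-channel means are subtracted, with no further scaling. The sketch below reproduces this for a single pixel in plain Python so the arithmetic is visible. `preprocess_pixel` is a hypothetical name used only for illustration; the real function operates on whole image arrays.

```python
# ImageNet channel means in BGR order, as used by Keras' 'caffe' preprocessing mode
IMAGENET_BGR_MEANS = (103.939, 116.779, 123.68)


def preprocess_pixel(rgb):
    """Illustrative single-pixel version of 'caffe'-style preprocessing."""
    r, g, b = rgb
    bgr = (b, g, r)  # reorder the channels from RGB to BGR
    return tuple(value - mean for value, mean in zip(bgr, IMAGENET_BGR_MEANS))


# A pure white pixel ends up with three different positive values, one per channel
print(preprocess_pixel((255.0, 255.0, 255.0)))
```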


In [10]:
model = Sequential()
model.add(VGG16(
    include_top=False,
    pooling='avg',
    weights='imagenet',
    ))
# The model is initialised with the pretrained VGG16 base (without its original classification layers)




In [11]:
model.add(Dense(num_classes, activation='softmax'))
model.layers[0].trainable = False
model.summary()
# Adds the output layer, freezes the pretrained VGG16 base and prints a summary of the model

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
vgg16 (Model)                (None, 512)               14714688  
_________________________________________________________________
dense_1 (Dense)              (None, 2)                 1026      
Total params: 14,715,714
Trainable params: 1,026
Non-trainable params: 14,714,688
_________________________________________________________________
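
The parameter counts in this summary can be reproduced by hand. The sketch below is illustrative only: it assumes the standard VGG16 layout of thirteen 3×3 convolutional layers, counts weights plus biases for each, and then adds the dense layer that maps the 512 pooled features to the 2 classes.

```python
# (input_channels, output_channels) for the 13 conv layers of the standard VGG16 layout
VGG16_CONV_CHANNELS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]


def conv_params(in_ch, out_ch, kernel=3):
    """Weights (kernel x kernel x in x out) plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_features, out_features):
    """Weights (in x out) plus one bias per output unit."""
    return in_features * out_features + out_features


vgg16_params = sum(conv_params(i, o) for i, o in VGG16_CONV_CHANNELS)
head_params = dense_params(512, 2)

print(vgg16_params)                # 14714688 (frozen, non-trainable)
print(head_params)                 # 1026 (trainable)
print(vgg16_params + head_params)  # 14715714 (total)
```

Only the 1,026 parameters of the dense layer are updated during training, which is why freezing the base keeps each epoch cheaper than training the whole network.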


### Part 3: Fitting the model

In this section the model is trained on the image data. Because the VGG16 base is frozen, only the final dense layer is updated.

In [12]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
steps_per_epoch_training = len(train_generator)
steps_per_epoch_validation = len(validation_generator)
num_epochs = 2  # More epochs would be better, but each epoch takes approximately 7 hours to execute on my current hardware.
# Compiling the model is the final step before fitting and defines the remaining inputs
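
The length of a Keras directory generator is the number of batches needed to cover every image once, so `len(train_generator)` is the image count divided by the batch size, rounded up. A minimal sketch of that arithmetic, using the image counts reported by the generators earlier (`batches_per_epoch` is a hypothetical helper name):

```python
import math


def batches_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)


print(batches_per_epoch(30001, 100))  # 301 training steps
print(batches_per_epoch(9501, 100))   # 96 validation steps
```

In both cases the final batch holds only the single leftover image.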

In [None]:
fit_history = model.fit_generator(
    train_generator,
    steps_per_epoch=steps_per_epoch_training,
    epochs=num_epochs,
    validation_data=validation_generator,
    validation_steps=steps_per_epoch_validation,
    verbose=1,
)

# With verbose set to 1, fitting prints details such as the current accuracy, the loss and an
# expected time of completion. The code will run silently if verbose=0.

In [None]:
model.save('classifier_VGG16.h5')
# This line of code saves the model

In [16]:
model = keras.models.load_model('classifier_VGG16.h5')
# If the model has been saved into the working directory it can be loaded using this line of code






### Part 4: Evaluation

In [17]:
data_generator = ImageDataGenerator(
    preprocessing_function=preprocess_input,
)
validation_generator = data_generator.flow_from_directory(
    'concrete_data_week4/valid',
    target_size=(image_resize, image_resize),
    #batch_size=batch_size_validation,
    #class_mode='categorical',
    shuffle=False)

# This data generator is very similar to the previously defined validation generator, except that shuffle is now False, so the images are read in a fixed order.

Found 9501 images belonging to 2 classes.


In [18]:
evalmodelvgg16=model.evaluate_generator(validation_generator)
# This code evaluates the model, and can take a while to execute

In [19]:
print('The loss of the VGG16 model is ' + str(round(evalmodelvgg16[0], 5))+ ' and the accuracy is '+ str(100*round(evalmodelvgg16[1],3)) +'%')

The loss of the VGG16 model is 0.00083 and the accuracy is 99.6%
The loss of the ResNet50 model is 0.0 and the accuracy is 82.0%


### Part 5: Prediction

In [20]:
predictor_generator = data_generator.flow_from_directory(
    'concrete_data_week4/valid',
    target_size=(image_resize, image_resize),
    shuffle=False)

Found 9501 images belonging to 2 classes.


In [21]:
predictvgg16=model.predict_generator(predictor_generator)

In [67]:
number_of_predictions=len(predictvgg16)

In [70]:
predictions=[]
for i in range (0,number_of_predictions):
    if predictvgg16[i][0]>predictvgg16[i][1]:
        predictions.append('Negative')
    else:
        predictions.append('Positive')

print(predictions[0:5])

['Negative', 'Negative', 'Negative', 'Negative', 'Negative']


Because the downloaded data is stored in two subfolders, and the negative folder is read first, almost all of the first predictions are negative. Once the generator reaches the positive subfolder, almost all of the predictions are positive.
I've also printed the last five predictions as proof of this.

In [72]:
print(predictions[(number_of_predictions-5):number_of_predictions])

['Positive', 'Positive', 'Positive', 'Positive', 'Positive']
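
The loop above relies on the order in which Keras assigns class indices. When no explicit class list is given, `flow_from_directory` infers the classes from the subfolder names in alphanumeric order, so index 0 should be the negative class and index 1 the positive class (this can be confirmed with `predictor_generator.class_indices`). The sketch below shows one way to do the labelling with an explicit index-to-name mapping. The folder names `negative` and `positive` are assumed, and the probability values are made-up stand-ins, not outputs of the model.

```python
def label_predictions(probabilities, class_indices):
    """Map each row of class probabilities to the name of its most likely class."""
    index_to_name = {index: name for name, index in class_indices.items()}
    labels = []
    for row in probabilities:
        best_index = max(range(len(row)), key=lambda i: row[i])
        labels.append(index_to_name[best_index].capitalize())
    return labels


# Stand-in for predictor_generator.class_indices (assumed folder names)
class_indices = {name: i for i, name in enumerate(sorted(["positive", "negative"]))}

# Made-up probabilities, for illustration only
example_probabilities = [[0.98, 0.02], [0.10, 0.90]]
print(label_predictions(example_probabilities, class_indices))
```

Using the mapping instead of hard-coded indices means the labels stay correct even if the folders are renamed.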


The full list of output predictions shows that almost all of the predictions are correct, in line with the evaluated accuracy of 99.6%.
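
Rather than reading through the list by eye, the agreement between predictions and true labels can be computed directly. With `shuffle=False`, the generator's `classes` attribute holds the true class index of each image in the same order as the predictions. The sketch below uses short made-up lists in place of the real arrays, and `accuracy` is a hypothetical helper.

```python
def accuracy(predicted_labels, true_labels):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("Both lists must have the same length")
    if not true_labels:
        return 0.0
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


# Made-up example data; in the notebook these would come from `predictions`
# and from predictor_generator.classes mapped to 'Negative'/'Positive'.
example_predicted = ["Negative", "Negative", "Positive", "Positive"]
example_true = ["Negative", "Positive", "Positive", "Positive"]
print(accuracy(example_predicted, example_true))  # 0.75
```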
