# Train a model to identify street signs

For support in using these steps in your image classification project, please go to the [Survey123 early adopter community forum](https://earlyadopter.esri.com/project/home.html?cap=e69ef91f45744b98882c651f7b518eb7).

This notebook demonstrates how to build and verify a model that can be used to automatically identify street signs using Survey123. It is designed to be used as part of the [Train a model to identify street signs](https://learn.arcgis.com/en/projects/train-a-model-to-identify-street-signs/) Learn ArcGIS tutorial. In the first part of the tutorial, you'll build a survey to collect images that can be used by this notebook to train a model. In the final part of the tutorial, you'll build a survey that uses the model created by this notebook.

The steps in this notebook include:

* [Set up the environment](#setup)
* [Download training images from the feature layer](#download)
* [Train the model](#train)
* [Test the model (optional)](#test)


This notebook must be run as an **Advanced with GPU support version 4.0** notebook. It will consume **30 credits/hour** in ArcGIS Online credits.
If you followed the instructions in the [Train a Model](https://learn.arcgis.com/en/projects/train-a-model-to-identify-street-signs/#train-a-model) section of the tutorial, you should be ready to start. You can run each code block in this notebook to train the model.

Alternatively, you can create a new notebook of your own and paste the code from this notebook into it. **If you choose to create your own notebook and copy and paste the code**, follow the five steps below; **otherwise, skip to the next cell.**

1. From your ArcGIS Online account, click **Notebook**.
2. Click **New Notebook** and click **Advanced with GPU support - 4.0**.
3. Name the new notebook *SignImageCollectionModel_(YourInitials)*.
4. Give the notebook at least one tag and a description.  
5. Keep this notebook open in a separate tab or window so you can follow along with the instructions. Remember to save your notebook often.

## Set up the environment <a class="anchor" id="setup"></a>

The first step is to check your ```arcgis``` version.  

1. Click in the code cell below and click the **Run** button on the notebook toolbar to run the cell.  

The cell will be highlighted in green when you click in it.

While the cell is running, you will see an asterisk in the brackets beside the cell.  

After the cell finishes running, you will see a number beside the cell.  Since this is the first cell to be run, the number will be 1.  After the cell runs, the next cell in the notebook will be highlighted.

The  ```arcgis``` version of the notebook will be printed out below the cell.

In [None]:
from arcgis import __version__ as arcgisv
print(arcgisv)

2. Check the ```arcgis``` version that was printed below the previous cell. If the version number is below 1.8.4, click in the cell below and click **Run** to update arcgis in the notebook to version 1.8.4.   

If the version number is above 1.8.4, you should exit the notebook and change the Notebook Runtime version to **Python 3 Advanced with GPU support - 4.0** on the Notebook item **Settings** tab, then reopen the notebook. 

This will take a little while. When the cell finishes running, a number will appear in the brackets beside it. At this point, proceed to the next instruction.

In [None]:
pip install arcgis==1.8.4
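
The version check in step 2 can also be done in code rather than by eye. The following is a minimal sketch, assuming version strings of the form `major.minor.patch` (optionally followed by a suffix). The helper names `version_tuple` and `needs_update` are hypothetical and are not part of the `arcgis` package.

```python
import re

REQUIRED = (1, 8, 4)  # the arcgis version this notebook expects

def version_tuple(version):
    """Convert a string such as '1.8.4' into a tuple of ints, e.g. (1, 8, 4)."""
    # keep the first three numeric components and ignore any suffix
    numbers = [int(n) for n in re.findall(r'\d+', version)[:3]]
    # pad short versions such as '2.0' to three components
    return tuple(numbers + [0] * (3 - len(numbers)))

def needs_update(version, required=REQUIRED):
    """Return True when the version is older than the required one."""
    return version_tuple(version) < required

for v in ('1.8.2', '1.8.4', '2.0.1'):
    print(v, 'older than required' if needs_update(v) else 'required version or newer')
```

Comparing tuples of integers avoids a pitfall of comparing the raw strings, where `'1.10.0'` would sort before `'1.8.4'`.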

The next step is to restart the kernel.

**This step is critical.**   

3. On the **Kernel** menu, choose **Restart**. Do not choose **Restart & Clear Output**.  

If you get errors later in the notebook, restarting the kernel at this point is the most likely remedy.

The next step is to import packages required for the code.  
4. Click in the cell below and click the **Run** button.  

Importing the packages will take some time.

In [None]:
%env ARCGIS_ENABLE_TF_BACKEND=1
import os
import shutil
import sys
import tensorflow as tf

from pathlib import Path
from arcgis.gis import GIS
from arcgis.learn import prepare_data, FeatureClassifier

from PIL import Image
from PIL import ImageFont
from PIL import ImageDraw

tf.keras.backend.clear_session() 
gis = GIS("home")

The next step is to set a variable to the feature service ID. This tells the model where to get its training data. 

> Use the default value in the cell below if you are using the supplied sample geodatabase, or change the feature_service_id variable to match your own feature service ID, if you collected your own training data.

> To identify the ```feature service id``` of your own layer go to ```Add``` on the navigation bar, find the **feature layer** and click the ```Add to Notebook``` button. A cell with sample code with the ID will be added. Copy just the ID and replace the value in the variable-setting statement below. Remove the sample feature service id.

5. Click in the next cell and click **Run**.

In [None]:
feature_service_id = 'fa40cf680eb4436daf4109b887b52b30'

The next step is to set some parameters for the model.

> Use the defaults listed if using the supplied data files, or change to suit your data.  

6. Click in the next cell and click **Run**.

In [None]:
tmp_dir = '/arcgis/home/tmp'
notebook_title = "SignageClassificationModel"
feature_layer_index = 0
classification_field = 'signType'
model_name = 'signs'
model_title = 'Signage Classification Model'
model_description = 'Classify speed and stop signs'
model_tags = 'SignDetection'
model_folder = 'SignDetection'

## Download training images from the feature layer <a class="anchor" id="download"></a>

The first step in this section is to access the feature layer.  

1. Click in the cell below and click **Run**.

In [None]:
feature_service = gis.content.get(feature_service_id)
feature_service

Next, you will print the layer's name and whether it has attached images.  

2. Click in the cell below and click **Run**.

In [None]:
feature_layer = feature_service.layers[feature_layer_index]
print('layer name:', feature_layer.properties.name)
print('has attachments:', feature_layer.properties.hasAttachments)

Next, you will get all the features from the layer and review the first few in the table head.  

3. Click in the cell below and click **Run**.

In [None]:
features = feature_layer.query(return_geometry=False, fields=classification_field)
features.sdf.head()

Next, you will go through the features and find the number of categories there are and how many images there are for each category.  

4. Click in the cell below and click **Run**.

In [None]:
# get the list of unique feature classes
feature_classes = {v:[k] for v, k in enumerate(features.sdf[classification_field].unique().tolist())}

# count the number of images in each class
total_images = 0
for i in feature_classes:
    image_count = len(features.sdf[features.sdf[classification_field] == feature_classes[i][0]])
    feature_classes[i].append(image_count)
    total_images += image_count

print('images per class', feature_classes)
print('total:', total_images)
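
The per-class counting in the cell above can be illustrated without a feature layer. This is a minimal sketch using `collections.Counter` on a made-up list of labels; the `labels` values and the name `feature_classes_sketch` are assumptions for illustration only.

```python
from collections import Counter

# hypothetical values, standing in for the classification field of each feature
labels = ['stop', 'speed', 'stop', 'stop', 'speed']

counts = Counter(labels)

# same shape as feature_classes in the notebook: {index: [class_name, count]}
feature_classes_sketch = {
    i: [name, n] for i, (name, n) in enumerate(counts.items())
}
total = sum(counts.values())

print('images per class', feature_classes_sketch)
print('total:', total)
```

Both versions count features per class, so the totals match the number of images only if every feature has exactly one image attached.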

Next, you will create a directory in your running Notebook environment where the code will download and save the data and images from this feature layer.  

5. Click in the cell below and click **Run**.

In [None]:
data_path = Path(os.path.join(tmp_dir, feature_service.id))
images_path = Path(os.path.join(data_path, 'images'))

if (os.path.exists(images_path)):
    print('deleting existing directory:', images_path)
    shutil.rmtree(images_path)

print('creating new images directory:', images_path)
os.makedirs(images_path)

Next, you will download the images from the feature layer, then check to ensure the number of downloaded images is correct.  

6. Click in the cell below and click **Run**.

In [None]:
# Set initial variables to 0 for counts of images and for image size
images_count = 0
images_size = 0

def is_image_attachment(attachment):
    """
    Determine if the attachment is an image
    :param attachment: feature attachment
    """
    contentType = attachment['contentType']
    return contentType == 'image/png' or contentType == 'image/jpeg'


def resize_image(image_path):
    """
    Resize the image to 160x160. Crop to the center of the 
    image where necessary. We assume here that the source image
    will always be larger than the output size.
    :param str image_path: image path
    """
    with Image.open(image_path) as img:
        size = 160
        width, height = img.size
        # resize the image to the target dimensions
        if height > width:
            img = img.resize((size, round(size / width * height)), Image.BICUBIC)
        else:
            img = img.resize((round(size / height * width), size), Image.BICUBIC)
        # crop the image to the center
        width, height = img.size
        x = (width / 2) - (size / 2)
        y = (height / 2) - (size / 2)
        cropped = img.crop((x, y, x + size, y + size))
        cropped.save(image_path)
    

for index, feature in features.sdf.iterrows():
    objectId = int(feature.objectid)
    attachments = feature_layer.attachments.get_list(oid=objectId)

    # ensure the output folder exists for the image class
    classification = feature[classification_field]
    attachments_path = Path(os.path.join(images_path, classification))
    if not os.path.exists(attachments_path):
        os.makedirs(attachments_path)
    
    # download image attachments to the images folder
    for a in attachments:
        if is_image_attachment(a):
            if not os.path.exists(os.path.join(attachments_path, a['name'])):
                print(f'downloading ({a["id"]}) {a["name"]}')
                feature_layer.attachments.download(oid=objectId, attachment_id=a['id'], save_path=attachments_path)
                resize_image(os.path.join(attachments_path, a['name']))
            images_count += 1
            images_size += a['size']

print('\ndownloaded:', images_count, "images")

7. Click **Files** and browse to the images folder to see downloaded images.
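
To see what `resize_image` does to an image of a given size, the arithmetic can be reproduced without opening any files. This is a minimal sketch of the same resize-then-center-crop calculation; the function name `crop_geometry` is hypothetical.

```python
def crop_geometry(width, height, size=160):
    """Return (resized_dimensions, crop_box) for a source image of width x height."""
    # scale so that the shorter side equals the target size
    if height > width:
        resized = (size, round(size / width * height))
    else:
        resized = (round(size / height * width), size)
    # take a size x size window from the center of the resized image
    x = resized[0] / 2 - size / 2
    y = resized[1] / 2 - size / 2
    return resized, (x, y, x + size, y + size)

# a landscape and a portrait example
print(crop_geometry(640, 480))
print(crop_geometry(480, 640))
```

For a landscape image, the crop removes equal strips from the left and right. For a portrait image, it removes them from the top and bottom.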

## Train the model <a class="anchor" id="train"></a>


Now that you have the training data available, you can start building the model.  

The following steps use the [arcgis.learn module](https://developers.arcgis.com/python/api-reference/arcgis.learn.toc.html#prepare-data). 

> If you get an error message, the problem may be the notebook runtime. Verify that you are using an Advanced Runtime Notebook with GPU support.

The first step is to divide the training data into **training** and **validation** sets. The default split percentage is 0.1, which means 10% of the data is kept as **validation**.  

1. Click in the cell below and click **Run**.

In [None]:
batch_size=16
data = prepare_data(
    path=data_path,
    dataset_type='Imagenet',
    batch_size=batch_size,
    seed=42,
    resize_to=224
)
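
To make the idea of a 10% validation split concrete, here is a minimal sketch in plain Python. It is not how `prepare_data` is implemented internally; it only illustrates holding out a fixed fraction with a fixed seed. The function name `split_train_validation` and the file names are assumptions.

```python
import random

def split_train_validation(items, val_fraction=0.1, seed=42):
    """Shuffle reproducibly, then hold out val_fraction of the items."""
    rng = random.Random(seed)   # a fixed seed gives the same split every run
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

# hypothetical image file names
images = [f'img_{i}.jpg' for i in range(100)]
train, val = split_train_validation(images)
print(len(train), 'training images,', len(val), 'validation images')
```

Fixing the seed, as the cell above does with `seed=42`, keeps the split the same between runs so that results are comparable.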

Next, display samples of the prepared images in a defined number of rows.  

2. Click in the cell below and click **Run**.

In [None]:
data.show_batch(rows=4)

Next, create a deep learning model for image classification with defined **backbone** and **backend**.  

3. Click in the cell below and click **Run**.

In [None]:
backbone = 'MobileNetV2'
backend = 'tensorflow'
model = FeatureClassifier(data, backbone=backbone, backend=backend)

Next, find the **learning rate** of the model.  

4. Click in the cell below and click **Run**.

> This step takes approximately 3 minutes to complete.

In [None]:
learning_rate = model.lr_find()
print(learning_rate)

Train the model for the specified number of **epochs**, using the specified **learning rate**.  

5. Click in the cell below and click **Run**.

> This step takes approximately 15 minutes to complete.

In [None]:
epochs = 20
model.fit(epochs=epochs, lr=learning_rate)

Plot **validation losses** and **training losses** after training the model.  

6. Click in the cell below and click **Run**.

In [None]:
model.plot_losses()

Display the **ground truth** images and results of the corresponding **prediction**.  

7. Click in the cell below and click **Run**.

In [None]:
model.show_results(rows=2, thresh=0.5)

The next step is to save the model with a unique name and publish it to your ArcGIS account.  

8. Click in the cell below and click **Run**.

>If you do not save the model it will be lost when the current session is disconnected.  

The variable model_name was set equal to 'signs' earlier in this notebook.  The current timestamp will be added to signs to make a unique name.

In [None]:
from datetime import datetime
timestamp = datetime.now().strftime('_%d%m%Y_%H%M%S')
model_name2 = model_name + timestamp
model_path = model.save(model_name2, framework="tflite", publish=True)
model_path
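
The naming scheme can be tried on its own to see what the result looks like. This is a minimal sketch using the same `strftime` pattern as the cell above; the function name `unique_model_name` and the example date are assumptions.

```python
from datetime import datetime

def unique_model_name(base, now=None):
    """Append a _DDMMYYYY_HHMMSS timestamp to the base name."""
    now = now or datetime.now()
    return base + now.strftime('_%d%m%Y_%H%M%S')

# a fixed, made-up date so the result is predictable
example = unique_model_name('signs', datetime(2024, 3, 5, 14, 7, 9))
print(example)
```

Because the day comes first in the pattern, these names are unique but do not sort chronologically in a file list.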

Next, update the model metadata.  

9. Click in the cell below and click **Run**.

If you get an error, please try increasing the number of ```max_items``` in ```gis.content.search```, where it says ```max_items=1000```.

In [None]:
model_data_name = model_name2 + '.dlpk'
model_data_path = os.path.join(model_path, model_data_name)
model_item_properties = {
    'type': 'Deep Learning Package',
    'title': model_title + timestamp,
    'description': model_description + ". " + "File Database: " + feature_service.title,
    'tags': model_tags
}

# determine if there is an existing deep learning package with the same name
dl_packages = gis.content.search('owner:' + gis.properties.user.username, item_type = 'Deep Learning Package', max_items=1000)

model_item = None
for i in range(len(dl_packages)):
    if dl_packages[i].name == model_data_name:
        model_item = dl_packages[i]
        break

# update the model metadata
if model_item:
    print('Updating:', model_item)
    model_item.update(item_properties=model_item_properties, data=model_data_path)
else:
    print('Adding:', model_data_path, "folder:", model_folder)
    model_item = gis.content.add(item_properties=model_item_properties, data=model_data_path, folder=model_folder)
    
model_item

Next, you will save the configuration of your model building process. This includes:

<ul>
<li>Model name and dataset name</li>
<li>Notebook name</li>
<li>Feature service ID, number of objects, file geodatabase, last modified time</li>
<li>Data information (such as class name and saved path) and batch size</li>
<li>Model information (backbone and backend)</li>
<li>Learning rate</li>
<li>Other notes (you can change or add more)</li>
</ul>  

10. Click in the cell below and click **Run**.


In [None]:
import textwrap

wrapper = textwrap.TextWrapper(initial_indent='\t', subsequent_indent='\t')
str_data = wrapper.fill(str(data))

outfilepath = str(model_path) + "/SMTCMR_CONFIG.txt"
text_file = open(outfilepath, "w")
text_file.write(
    "TEST#" + str(model_name2) + " - " + str(feature_service.title)                                 +   "\n" +
    "ARCGIS NOTEBOOKS: " + str(notebook_title)                                                   +   "\n" + 
    "\n" + 
    "ID: " + str(feature_service_id)                                                               +   "\n" + 
    "OBJECTS: " + str(total_images)                                                              +   "\n" +
    "FGDB_LAST_MODIFIED: " + str(datetime.fromtimestamp(feature_service.modified/1000))           +   "\n" +
    "\n" + 
    "\n" + 
    "DATA: "                                                                                     +   "\n" +
    "\t" + str(feature_classes)                                                                        +   "\n" +
    str(str_data)                                                                                +   "\n" +
    "\t" + "batch_size: " + str(batch_size)                                                      +   "\n" +
    "\n" + 
    "\n" + 
    "MODEL: "                                                                                    +   "\n" +
    "\t" + str(backbone) + ", " + str(backend)                                                   +   "\n" +
    "\n" + 
    "\n" + 
    "BEST LR: " + str(learning_rate)                                                                        +   "\n" +
    "\n" + 
    "\n" + 
    "**NOTE"                                                                                     +   "\n" +
    "\t- The images have been cropped from top & bottom (160 pixels each) into square"           +   "\n" +
    "\t- All images are checked"                                                                 +   "\n" +
    "\t- Model is saved as zip file"
    )
text_file.close()
print("Config file saved to: " + str(outfilepath))

Next, update the ModelFormat parameter to be "NHWC". 
This is a requirement for recent versions of Survey123.

11. Click in the cell below and click **Run**.

In [None]:
import json

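def update_emd(new_data, file_name):
    # add the key in new_data to the JSON file unless it is already present
    with open(file_name, 'r+') as file:
        file_data = json.load(file)
        if list(new_data)[0] not in file_data:
            file_data.update(new_data)
            # rewrite from the start and drop any leftover bytes
            file.seek(0)
            json.dump(file_data, file, indent = 4)
            file.truncate()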

model_format = {"ModelFormat":"NHWC"}
filename = os.path.join(model_path, ''.join([os.path.basename(model_path),'.emd']))
update_emd(model_format, filename)
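
The .emd file is a JSON document, so the update can be tried safely on a throwaway file first. This is a minimal sketch that adds a key only when it is missing. The function name `add_key_if_missing`, the file name `example.emd`, and the `Framework` entry are made up for illustration and do not reflect the contents of a real .emd file.

```python
import json
import os
import tempfile

def add_key_if_missing(new_data, file_name):
    """Add the first key of new_data to a JSON file unless it already exists."""
    with open(file_name, 'r+') as f:
        data = json.load(f)
        key = next(iter(new_data))
        if key not in data:
            data.update(new_data)
            f.seek(0)                      # rewrite from the beginning
            json.dump(data, f, indent=4)
            f.truncate()                   # remove any leftover old content

with tempfile.TemporaryDirectory() as tmp:
    emd_path = os.path.join(tmp, 'example.emd')
    with open(emd_path, 'w') as f:
        json.dump({'Framework': 'placeholder'}, f)

    add_key_if_missing({'ModelFormat': 'NHWC'}, emd_path)
    add_key_if_missing({'ModelFormat': 'NCHW'}, emd_path)  # ignored: key exists

    with open(emd_path) as f:
        result = json.load(f)

print(result)
```

An existing `ModelFormat` value is left untouched, so running the step twice does no harm.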


Next, create a .zip file containing the TensorFlow Lite model and results.  You will download the model and use it in Survey123.

12. Click in the cell below and click **Run**.

In [None]:
import shutil
zip_name = model_path
directory_name = model_path
shutil.make_archive(zip_name, 'zip', directory_name)
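
It helps to know where the .zip file ends up before you look for it. This is a minimal sketch of the same `shutil.make_archive` call on a temporary folder; the folder name `signs_example` and the file `model.txt` are placeholders.

```python
import os
import shutil
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # stand-in for the model folder
    model_dir = os.path.join(tmp, 'signs_example')
    os.makedirs(model_dir)
    with open(os.path.join(model_dir, 'model.txt'), 'w') as f:
        f.write('placeholder')

    # base name and source folder are the same path, as in the cell above
    archive_path = shutil.make_archive(model_dir, 'zip', model_dir)
    archive_name = os.path.basename(archive_path)
    archive_parent = os.path.dirname(archive_path)

    with zipfile.ZipFile(archive_path) as z:
        members = z.namelist()

print(archive_name, members)
```

The archive is written next to the folder, with `.zip` appended to the folder name, rather than inside it.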

Optionally, delete the folder of model and results to save storage space.  

13. Click in the cell below and click **Run**.

In [None]:
# To delete the folder, uncomment the two lines below by removing the leading # signs, then run the cell
# import shutil
# shutil.rmtree(model_path) 

You have completed the training and packaging of the model. All files are stored in the .zip file.  

14. Click **Files**, browse to the model path location, and download the .zip file to save it to your computer.



At this point you can return to the [Test the model](https://learn.arcgis.com/en/projects/train-a-model-to-identify-street-signs/#test-the-model) section of the Learn tutorial, to see how to use these files with Survey123 to test the model in the field.  

If you choose to go back to the Learn tutorial, you can save and close this notebook now. This will stop the notebook from continuing to consume ArcGIS Online credits.  

15. Click Save.  

16.  Close this browser tab.

## Programmatically test the model (optional) <a class="anchor" id="test"></a>

The following steps demonstrate how to validate the **accuracy** and **confidence rate** of your model. The last part of the accompanying Learn tutorial demonstrates how to test your model in the field with Survey123. When using this model with your own images, you might choose to programmatically test the model first, to gain confidence, before going out into the field. 


1. Create a new folder to hold test data files that we'll download from an image collection.

In [None]:
test_path = tmp_dir + "/test"

if (os.path.exists(test_path)):
    print('deleting existing directory:', test_path)
    shutil.rmtree(test_path)

print('creating new test data directory:', test_path)
os.makedirs(test_path)

2. Set test image collection.

>Use the default listed if using the supplied image collection, or change to match your own image collection. 

In [None]:
test_images_collection_id = '67137e0896fa455dabcc78db7e20aa9c'

3. Retrieve metadata about the test image collection.

In [None]:
imagecol = gis.content.get(test_images_collection_id)
imagecol

4. Download the test image collection and then extract it into a local folder. 

In [None]:
# extract the contents of the zip file into our test images folder
import zipfile
with zipfile.ZipFile(imagecol.download(), 'r') as archive:
    for filename in archive.namelist():
        # ignore MacOS specific metadata folder
        if not filename.startswith('__MACOSX'):
            archive.extract(filename, test_path)

# the new folder containing our test images
test_images_path = os.path.join(test_path, imagecol.title)
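
The filter on `__MACOSX` can be checked on a small archive built in memory. This is a minimal sketch; the member names are made up and only mimic the kind of metadata entries that macOS adds to .zip files.

```python
import io
import zipfile

# build a small in-memory archive with hypothetical member names
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('signs/stop/a.jpg', b'placeholder')
    archive.writestr('__MACOSX/signs/stop/._a.jpg', b'placeholder')

# read it back and keep only the members that would be extracted
buffer.seek(0)
with zipfile.ZipFile(buffer, 'r') as archive:
    kept = [n for n in archive.namelist() if not n.startswith('__MACOSX')]

print(kept)
```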

5. Count the total number of test images in the collection. 

In [None]:
# count the number of images in the test folder
import glob
prediction_images_count = 0
for file in glob.glob(test_images_path + "/**/*.jpg", recursive=True):
    prediction_images_count += 1

6. Prepare the dataset of prediction images. Leave the parameters unchanged, or adjust them if you are familiar with the options.

In [None]:
test_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)
test_images = test_datagen.flow_from_directory(
    test_images_path, 
    batch_size=prediction_images_count,
    shuffle=False,
    target_size=(224,224),
    interpolation="nearest",
    color_mode="rgb",
)
test_image, test_label = next(iter(test_images))
# sort so the names line up with the alphanumeric label indices from flow_from_directory
class_names = sorted(f for f in os.listdir(images_path) if f != "models")
print(class_names)
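
The class names are later used as column headings for the predictions, so their order must match the label indices. Keras assigns label indices to subfolders in alphanumeric order, while `os.listdir` returns names in arbitrary order. This is a minimal sketch of listing class folders in a matching order; the helper `list_class_names` and the folder names are assumptions.

```python
import os
import tempfile

def list_class_names(root, exclude=('models',)):
    """Return subfolder names in sorted order, skipping excluded folders."""
    return sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name)) and name not in exclude
    )

with tempfile.TemporaryDirectory() as tmp:
    # hypothetical class folders plus a folder that should be ignored
    for folder in ('stop', 'models', 'speed'):
        os.makedirs(os.path.join(tmp, folder))
    names = list_class_names(tmp)

print(names)
```

The exclusion list is a tuple, so `name not in exclude` tests for an exact match. The same expression against a plain string would test for a substring instead.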

7. Run the prediction and save the prediction of each image in your model path.

In [None]:
import pandas as pd
interpreter = tf.lite.Interpreter(model_path=str(model_path) + "/" + str(model_name2) + ".tflite")

input_details = interpreter.get_input_details()

interpreter.resize_tensor_input(input_details[0]['index'], (prediction_images_count, 224, 224, 3))
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

interpreter.set_tensor(input_details[0]['index'], test_image)
interpreter.invoke()

predictions = interpreter.get_tensor(output_details[0]['index'])
print("Prediction results shape:", predictions.shape)
predFrame = pd.DataFrame(predictions)
predFrame.columns = class_names
predFrame.to_csv(str(model_path)+"/prediction.csv")

8. Display the result numerically with **accuracy** and **average confidence rate**.

In [None]:
import numpy as np
tCount = 0
confLe = 0
trueLabel = np.argmax(test_label, axis=-1)
ids = np.argmax(predictions, axis=-1)
for n in range(prediction_images_count):
    if ids[n] == trueLabel[n]:
        tCount += 1
        confLe += predictions[n][ids[n]]
rs = 'Accuracy: ' + str(tCount/prediction_images_count) + '; ' + 'Average Confidence: ' + str(confLe/tCount)
print(rs)
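
The two numbers printed above can be worked through by hand on a tiny example. This is a minimal sketch in plain Python with made-up scores. As in the cell above, the average confidence is taken over the correctly classified images only. The function names are hypothetical.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)

def accuracy_and_confidence(predictions, true_ids):
    """Return (accuracy, mean confidence of the correct predictions)."""
    correct_confidences = []
    for scores, true_id in zip(predictions, true_ids):
        predicted = argmax(scores)
        if predicted == true_id:
            correct_confidences.append(scores[predicted])
    accuracy = len(correct_confidences) / len(true_ids)
    # avoid dividing by zero when nothing was classified correctly
    if correct_confidences:
        avg_confidence = sum(correct_confidences) / len(correct_confidences)
    else:
        avg_confidence = 0.0
    return accuracy, avg_confidence

# three hypothetical images, two classes; the third is misclassified
scores = [[0.75, 0.25], [0.25, 0.75], [0.625, 0.375]]
truth = [0, 1, 1]
accuracy, avg_confidence = accuracy_and_confidence(scores, truth)
print('Accuracy:', accuracy, '; Average Confidence:', avg_confidence)
```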

9. Display the result visually, showing the prediction for each image. Save the figure in your model path.

In [None]:
import math
import matplotlib.pyplot as plt 

trueLabel = np.argmax(test_label, axis=-1)
ids = np.argmax(predictions, axis=-1)
predLabels = [np.array(class_names)[j] for j in ids]

# create a plot showing each image and the associated prediction
plt.figure(figsize=(20,9))
plt.subplots_adjust(hspace=0.5)
cols = 10
rows = math.ceil(prediction_images_count / cols)
for i in range(len(test_image)):
    row = math.floor(i / cols)
    col = i - (row * cols)
    plt.subplot(rows, cols, i + 1)
    plt.imshow((test_image[i]).astype(np.float32))
    color = "green" if ids[i] == trueLabel[i] else "red"
    plt.title(predLabels[i], color=color)
    plt.axis('off')
_ = plt.suptitle("Model predictions (correct: green, incorrect: red)\n" + rs)

# save the plot to a file
plt.savefig(str(model_path)+"/prediction.png")
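
The grid layout used for the plot follows from simple arithmetic. This is a minimal sketch of how an image index maps to a row, a column, and a subplot number; the function names `grid_shape` and `grid_position` are hypothetical.

```python
import math

def grid_shape(count, cols=10):
    """Number of rows and columns needed to show count images."""
    return math.ceil(count / cols), cols

def grid_position(i, cols=10):
    """Row, column, and 1-based subplot number for the image at index i."""
    row = i // cols
    col = i - row * cols
    return row, col, i + 1

print(grid_shape(25))      # shape for 25 images
print(grid_position(23))   # placement of the image at index 23
```

Matplotlib numbers subplots from 1, which is why the plotting loop passes `i + 1` rather than `i`.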

10. Save prediction results in your .zip file. 

In [None]:
zip_name = model_path
directory_name = model_path
shutil.make_archive(zip_name, 'zip', directory_name)

Click **Files**, browse to the model path location, and download the .zip file to save it locally.