**Classifying apples and bananas using an image classifier**

In today's tutorial we are going to train an existing image classifier with additional images so that it can classify a given fruit image as an apple or a banana. We are going to use the ImageAI library to train our model. If you are new to Google Colab, just keep in mind that commands preceded by ***!*** are run as shell commands, while all other commands are run in the Python environment.
  
Since image classification requires intensive computing resources, we are going to use a GPU runtime to train our model. So first go to the **'Runtime'** menu in the menu bar above and select **'Change runtime type'**. In the **'Hardware accelerator'** dropdown select the **'GPU'** option and click **'Save'**.

Next we are going to install the ImageAI library, so select the cell below and press 'Shift + Enter' to install imageai.


In [0]:
!pip3 install https://github.com/OlafenwaMoses/ImageAI/releases/download/2.0.2/imageai-2.0.2-py3-none-any.whl 

Now we need to create directories for training our model. The **'mkdir'** command below will generate all the folders we need, based on the nested brace structure passed to it.

In [0]:
!mkdir -p data/fruits/{train/{apple,banana},test/{apple,banana},valid}
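If the nested braces in the shell command are unfamiliar, the same folder layout can also be created from Python. The following is only a minimal sketch: the helper name `make_fruit_dirs` and its arguments are illustrative and not part of the original notebook.

```python
import os

def make_fruit_dirs(root="data/fruits", classes=("apple", "banana")):
    """Create train/<class>, test/<class> and valid under root.

    Hypothetical helper: mirrors the mkdir -p brace expansion above.
    Returns the list of directories it ensured exist.
    """
    created = []
    for split in ("train", "test"):
        for name in classes:
            path = os.path.join(root, split, name)
            os.makedirs(path, exist_ok=True)  # like mkdir -p: no error if present
            created.append(path)
    valid = os.path.join(root, "valid")
    os.makedirs(valid, exist_ok=True)
    created.append(valid)
    return created
```

Calling `make_fruit_dirs()` from `/content` should produce the same five folders as the shell command.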

Let's check that the folders were created properly by installing the ***tree*** package and running the tree command.

In [0]:
!apt-get install tree
!tree data/

Next we need to install the subversion package, which we will use to download images for the training and test datasets from GitHub.

In [0]:
!apt-get install subversion

Now let us populate the training data in the apple folder by downloading images from a GitHub repository using the commands below.

In [0]:
cd /content/data/fruits/train/apple/

We are first going to download fixed-resolution images of apples from the fruit images made available on GitHub here: [Fruit Image Dataset](https://github.com/Horea94/Fruit-Images-Dataset). Since these images all have the same resolution, they will allow our model to identify the basic shape and attributes of an apple with ease.

The svn export command below downloads a single sub-folder of the given GitHub repository, so that we only download the apple images instead of the entire repo. Next we simply move the images from the temporary download folder to the parent folder and remove the temporary download folder.

In [0]:
!svn export https://github.com/Horea94/Fruit-Images-Dataset.git/trunk/Training/Apple%20Red%201
!mv 'Apple Red 1'/* ./
!rm -rf 'Apple Red 1'/
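The move-and-remove step above can also be done in Python, which avoids quoting issues with folder names that contain spaces. This is a rough sketch only; `flatten_folder` is a hypothetical helper name, and unlike the shell glob it also moves hidden files.

```python
import os
import shutil

def flatten_folder(subfolder, destination="."):
    """Move every entry of subfolder into destination, then delete subfolder.

    Hypothetical helper, equivalent in spirit to:
        mv 'subfolder'/* ./ ; rm -rf 'subfolder'/
    Returns the sorted list of names that were moved.
    """
    moved = []
    for name in sorted(os.listdir(subfolder)):
        shutil.move(os.path.join(subfolder, name),
                    os.path.join(destination, name))
        moved.append(name)
    shutil.rmtree(subfolder)  # the folder is empty by now
    return moved
```

For the cell above, the call would be `flatten_folder("Apple Red 1")`.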

We also need to add real-world images with different resolutions to the input dataset to make our model more robust when making predictions. Here we download them from the Pixabay website using its API, whose documentation is available here: [Pixabay Api](https://pixabay.com/api/docs/). This gives us more real-world photos in our input dataset.

In the code below we request 2 pages (pages 1 and 2) with 200 results per page from the Pixabay API using a for loop, then download the preview images to our folder using the **urlretrieve** function from the urllib library. Page 3 is kept for the test set later on. We store the images using their ids as names.

In [0]:
import requests
import urllib.request

# range(1, 3) yields pages 1 and 2; the end value 3 is excluded.
for x in range(1, 3):
    response = requests.get("https://pixabay.com/api/?key=10623603-93416acbea978c20fa9a7a48d&q=apple+fruit&order=popular&image_type=photo&per_page=200&page={pageno}".format(pageno=x))
    imagesArray = response.json()["hits"]
    for imageObject in imagesArray:
        # Save each preview image in the current folder under its Pixabay id.
        print("getting file" + str(imageObject["id"]) + ".jpg")
        urllib.request.urlretrieve(imageObject["previewURL"], str(imageObject["id"]) + ".jpg")
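To make the paging explicit, the request URLs can be built from a parameter dictionary instead of one long string. The sketch below is an illustration, not the notebook's method: `build_page_urls` is a hypothetical helper and `"YOUR_API_KEY"` is a placeholder. The parameter names are the ones already used in the request above.

```python
from urllib.parse import urlencode

API_ROOT = "https://pixabay.com/api/"

def build_page_urls(api_key, query, first_page, last_page, per_page=200):
    """Return one search URL per page, first_page..last_page inclusive.

    Hypothetical helper; it only builds strings and sends no requests.
    """
    urls = []
    for page in range(first_page, last_page + 1):
        params = {
            "key": api_key,
            "q": query,              # urlencode turns the space into '+'
            "order": "popular",
            "image_type": "photo",
            "per_page": per_page,
            "page": page,
        }
        urls.append(API_ROOT + "?" + urlencode(params))
    return urls

# Pages 1 and 2 for training; page 3 would be requested separately for testing.
train_urls = build_page_urls("YOUR_API_KEY", "apple fruit", 1, 2)
```

Each URL in `train_urls` could then be passed to `requests.get` exactly as in the loop above.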


Next we will try to visualize some of the images using **matplotlib** library just to see what kind of images we have downloaded.

In [0]:
import os
files = os.listdir(f'./')[:7]
files
import matplotlib.pyplot as plt 
img = plt.imread(f'./{files[6]}')
plt.imshow(img);

Next we move one of the downloaded images to the valid folder; we will use it later to check our model's predictions.

In [0]:
!mv 1_100.jpg /content/data/fruits/valid/apple.jpg

Now we are going to run similar steps in the test folder to add testing images of apples, which will form part of our testing dataset.

In [11]:
cd /content/data/fruits/test/apple

/content/data/fruits/test/apple


We are importing test images from same github project below.

In [0]:
!svn export https://github.com/Horea94/Fruit-Images-Dataset.git/trunk/Test/Apple%20Red%201
!mv 'Apple Red 1'/* ./
!rm -rf 'Apple Red 1'/

And add some real world images to test folder too from pixabay.

In [0]:
response = requests.get("https://pixabay.com/api/?key=10623603-93416acbea978c20fa9a7a48d&q=apple+fruit&order=popular&image_type=photo&per_page=200&page=3")
imagesArray = response.json()["hits"]
for imageObject in imagesArray:
  print("getting file"+str(imageObject["id"])+".jpg")
  urllib.request.urlretrieve(imageObject["previewURL"], str(imageObject["id"])+".jpg") 

Next we are printing the count of total images in train and test directory.

In [0]:
!echo "Apples train directory has $(ls -l /content/data/fruits/train/apple | egrep -c '^-') files"
!echo "Apples test directory has $(ls -l /content/data/fruits/test/apple | egrep -c '^-') files"
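The same count can be obtained in Python, which is handy if you want to use the numbers later in the notebook. This is a small sketch; `count_files` is a hypothetical helper that, like the `ls -l | egrep -c '^-'` pipeline, counts only regular files and ignores sub-folders and symbolic links.

```python
import os

def count_files(directory):
    """Count regular files directly inside directory (hypothetical helper)."""
    with os.scandir(directory) as entries:
        # follow_symlinks=False so symlinks are not counted, matching '^-' in ls -l
        return sum(1 for entry in entries if entry.is_file(follow_symlinks=False))
```

For example, `count_files("/content/data/fruits/train/apple")` should report the same number as the first echo command.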


Now we will run same steps for populating banana images in train and test folder.

In [0]:
cd /content/data/fruits/train/banana

In [0]:
!svn export https://github.com/Horea94/Fruit-Images-Dataset.git/trunk/Training/Banana
!mv 'Banana'/* ./
!rm -rf 'Banana'/

In [0]:
import requests
import urllib.request
for x in range(1, 3):  
  response = requests.get("https://pixabay.com/api/?key=10623603-93416acbea978c20fa9a7a48d&q=banana+fruit&order=popular&image_type=photo&per_page=200&page={pageno}".format(pageno=x))
  imagesArray = response.json()["hits"]
  for imageObject in imagesArray:
     print("getting file"+str(imageObject["id"])+".jpg")
     urllib.request.urlretrieve(imageObject["previewURL"], str(imageObject["id"])+".jpg") 


In [0]:
import os
files = os.listdir(f'./')[:7]
files
import matplotlib.pyplot as plt 
img = plt.imread(f'./{files[1]}')
plt.imshow(img);

In [0]:
!mv 0_100.jpg /content/data/fruits/valid/banana.jpg

In [0]:
cd /content/data/fruits/test/banana

In [0]:
!svn export https://github.com/Horea94/Fruit-Images-Dataset.git/trunk/Test/Banana
!mv 'Banana'/* ./
!rm -rf 'Banana'/

In [0]:
response = requests.get("https://pixabay.com/api/?key=10623603-93416acbea978c20fa9a7a48d&q=banana+fruit&order=popular&image_type=photo&per_page=200&page=3")
imagesArray = response.json()["hits"]
for imageObject in imagesArray:
  print("getting file"+str(imageObject["id"])+".jpg")
  urllib.request.urlretrieve(imageObject["previewURL"], str(imageObject["id"])+".jpg") 

In [0]:
!echo "Banana train directory has $(ls -l /content/data/fruits/train/banana | egrep -c '^-') files"
!echo "Banana test directory has $(ls -l /content/data/fruits/test/banana | egrep -c '^-') files"


In [0]:
cd /content

In [0]:
!ls

Now we will perform the model training steps.
First, import the ModelTraining class from imageai.

In [0]:
from imageai.Prediction.Custom import ModelTraining

Next we set our model type to ResNet. You can find more information about ResNet here: [Medium](https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035)

In [0]:
model_trainer = ModelTraining()
model_trainer.setModelTypeAsResNet()

We specify data directory for our model to get input dataset images from.

In [0]:
model_trainer.setDataDirectory("./data/fruits")

If you run the training a second time, first uncomment and run the commands below to clear the data from the first run.

In [0]:
#!rm -rf data/fruits/json/
#!rm -rf data/fruits/models/
#!ls data/fruits/

We are now all set to start training our model using the **trainModel** function below, with the following parameters:   

• num_objects : The number of different types of fruit in our dataset.  
• num_experiments : The number of times the model trainer will go through all the images in the dataset (that is, the number of epochs) in order to achieve maximum accuracy.  
• enhance_data (optional) : Tells the model trainer to create modified copies of the images in the dataset to help it reach a higher accuracy.  
• batch_size : The number of images the model trainer processes at once; it works through the dataset one batch at a time until it has seen every image.  
• show_network_summary (optional) : Shows the structure of the model type you are using to train the model.  
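To get a feel for how batch_size and num_experiments interact, the arithmetic can be sketched as below. The image count of 1000 is a made-up figure, and the sketch assumes the final partial batch is rounded up; whether the trainer rounds up or down is an implementation detail not confirmed here.

```python
import math

def batches_per_experiment(num_images, batch_size):
    """Number of batches needed to see every image once (rounding up)."""
    return math.ceil(num_images / batch_size)

def total_batches(num_images, batch_size, num_experiments):
    """Batches processed over the whole training run."""
    return batches_per_experiment(num_images, batch_size) * num_experiments

# Hypothetical dataset of 1000 images with the settings used in this tutorial.
per_run = batches_per_experiment(1000, 32)   # 1000 / 32 = 31.25, rounded up
overall = total_batches(1000, 32, 10)
```

A larger batch_size means fewer batches per experiment but more GPU memory per step.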

In [0]:
model_trainer.trainModel(num_objects=2, num_experiments=10, enhance_data=False, batch_size=32, show_network_summary=True)


Once our model has finished training, we can have a look at all the models created by running the command below. The number after **ex-** specifies the training step/epoch and the number after **acc-** tells us the accuracy achieved.

In [0]:
!ls /content/data/fruits/models/

Now we will load our model to make predictions on validation data.
First specify the name of the model with the highest accuracy from the previous step on line 4 below. For example, if your most accurate model is named ***"model_ex-010_acc-0.826705.h5"***, then the path passed to setModelPath will be ***"./data/fruits/models/model_ex-010_acc-0.826705.h5"***.

In [0]:
from imageai.Prediction.Custom import CustomImagePrediction
prediction = CustomImagePrediction()
prediction.setModelTypeAsResNet()
prediction.setModelPath("./data/fruits/models/model_ex-010_acc-0.811080.h5")
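Instead of copying the file name by hand, the most accurate model can be picked by parsing the names. This is a sketch that assumes every model file follows the `model_ex-NNN_acc-X.XXXXXX.h5` pattern shown above; `best_model` is a hypothetical helper, not part of ImageAI.

```python
import re

# Captures the experiment number and the accuracy from a model file name.
MODEL_NAME = re.compile(r"model_ex-(\d+)_acc-(\d+\.\d+)\.h5$")

def best_model(filenames):
    """Return the file name with the highest accuracy, or None if none match."""
    best_name, best_acc = None, -1.0
    for name in filenames:
        match = MODEL_NAME.search(name)
        if match is None:
            continue  # skip files that do not follow the naming pattern
        accuracy = float(match.group(2))
        if accuracy > best_acc:
            best_name, best_acc = name, accuracy
    return best_name
```

In the notebook this could be used as `best_model(os.listdir("./data/fruits/models"))`, with the result joined onto the models folder path before calling setModelPath.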

Now we load our model into the prediction variable, with the number of input objects specified.

In [0]:
prediction.setJsonPath("./data/fruits/json/model_class.json")
prediction.loadModel(num_objects=2)

Let us check images in our validation folder.

In [0]:
!ls /content/data/fruits/valid

You can view the images by specifying their filenames in the step below.

In [0]:
import matplotlib.pyplot as plt 
img = plt.imread(f'./data/fruits/valid/apple.jpg')
plt.imshow(img);

We will now run a prediction on the image we moved to the valid folder and print the result below.

In [0]:
predictions, probabilities = prediction.predictImage("./data/fruits/valid/apple.jpg", result_count=2)
for eachPrediction, eachProbability in zip(predictions, probabilities):
    print(eachPrediction , " : " , eachProbability)
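Since the loop above pairs each label with its probability, picking the single most likely label is a small step further. The sketch below uses made-up sample values in place of real model output, and `top_prediction` is a hypothetical helper.

```python
def top_prediction(predictions, probabilities):
    """Return (label, probability) for the highest-scoring prediction.

    Assumes the two lists are parallel, as in the zip() loop above.
    """
    label, score = max(zip(predictions, probabilities),
                       key=lambda pair: float(pair[1]))
    return label, float(score)

# Made-up values standing in for the two lists returned by predictImage.
sample_labels = ["banana", "apple"]
sample_scores = [12.5, 87.5]
best_label, best_score = top_prediction(sample_labels, sample_scores)
```

With real output you would pass the `predictions` and `probabilities` variables from the cell above.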

In [0]:
predictions, probabilities = prediction.predictImage("./data/fruits/valid/banana.jpg", result_count=2)

for eachPrediction, eachProbability in zip(predictions, probabilities):
    print(eachPrediction , " : " , eachProbability)

Let us check whether our model works on an image of an apple downloaded from the internet.

In [0]:
!wget "https://images.pexels.com/photos/102104/pexels-photo-102104.jpeg" -O /content/data/fruits/valid/applereal.jpg

In [0]:
predictions, probabilities = prediction.predictImage("./data/fruits/valid/applereal.jpg", result_count=2)

for eachPrediction, eachProbability in zip(predictions, probabilities):
    print(eachPrediction , " : " , eachProbability)

And that concludes today's tutorial on training an existing image classifier to classify custom images. If you would like to experiment a bit, you can add your own custom images of different kinds and test the accuracy of the predictions made by the model. Happy coding :D