# ComputerVisionLab02 Intro

The goal of this lab is to show the importance of visual datasets when building a machine learning model. It also introduces the steps you need to follow to build and deploy a machine learning model in a production environment.

<br><br>

<img src="./images/VisualDataML.png">

<br><br><br><br><br>

<img src="./images/KeyFactorsML.png">

<br><br><br><br><br>

<img src="./images/ImportanceBigData.png">

<br><br><br><br><br>

<img src="./images/ModelML.png">

<br><br><br><br><br>

<img src="./images/DeepLearning.png">

<br><br><br><br><br>

<img src="./images/MLCategories.png">

<br><br><br><br><br>

<img src="./images/AlgorithmsML.png">

<br><br><br><br><br>

<img src="./images/RepresentingML.png">

<br><br><br><br><br>

# Keras

Keras is a model-level library, providing high-level building blocks for developing deep learning models. It does not handle low-level operations such as tensor products, convolutions and so on itself. 
<img src="https://keras.io/img/keras-logo-small.jpg" width="140">

Keras development is backed primarily by Google, and the Keras API comes packaged in TensorFlow as tf.keras. Additionally, Microsoft maintains the CNTK Keras backend, and Amazon AWS maintains a Keras fork with MXNet support. Other contributing companies include NVIDIA, Uber, and Apple (with CoreML).

<img src="./images/KerasStack.png">
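
To make "high-level building blocks" concrete, the sketch below stacks a few layers into a small image classifier. It assumes TensorFlow 2.x with `tf.keras` is installed. The function name `build_classifier`, the layer sizes, and the 32x32x3 input shape are illustrative choices and are not prescribed by this lab.

```python
def build_classifier(input_shape=(32, 32, 3), num_classes=10):
    """Assemble a small CNN from Keras layers (illustrative sizes only)."""
    # Imported lazily so this definition loads even without TensorFlow installed.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=3, activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    # Integer class labels are assumed, hence the sparse loss.
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_classifier().summary()` prints the layer stack. The tensor operations themselves are executed by the backend, not by Keras.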


## Keras backend
Instead of implementing low-level tensor operations itself, Keras relies on a specialized, well-optimized tensor manipulation library that serves as its "backend engine". Rather than picking a single tensor library and tying the implementation of Keras to it, Keras handles the problem in a modular way, so several different backend engines can be plugged into Keras seamlessly.

At this time, Keras has three backend implementations available: the TensorFlow backend, the Theano backend, and the CNTK backend.

* [TensorFlow](https://www.tensorflow.org/) is an open-source symbolic tensor manipulation framework developed by Google.
* [Theano](http://deeplearning.net/software/theano/) is an open-source symbolic tensor manipulation framework developed by LISA Lab at Université de Montréal. Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. The latest release of Theano was on 2017/11/15.
* [CNTK](https://docs.microsoft.com/en-us/cognitive-toolkit/) is an open-source toolkit for deep learning developed by Microsoft. It describes neural networks as a series of computational steps via a directed graph. CNTK allows the user to easily realize and combine popular model types such as feed-forward DNNs, convolutional neural networks (CNNs) and recurrent neural networks (RNNs/LSTMs). CNTK implements stochastic gradient descent (SGD, error backpropagation) learning with automatic differentiation and parallelization across multiple GPUs and servers.<br><br><br><br>
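
As a rough sketch of how the backend is selected in multi-backend Keras: the `backend` field of `~/.keras/keras.json` names the engine, and the `KERAS_BACKEND` environment variable overrides it. The helper `resolve_backend` below is hypothetical; it mirrors that lookup order for illustration and is not Keras's own code.

```python
import json
import os
from pathlib import Path

# The three backends named in this section.
SUPPORTED_BACKENDS = ("tensorflow", "theano", "cntk")


def resolve_backend(environ=None, config_path=None):
    """Return the backend name Keras would be asked to use (illustrative)."""
    environ = os.environ if environ is None else environ
    if config_path is None:
        config_path = Path.home() / ".keras" / "keras.json"
    config_path = Path(config_path)

    backend = "tensorflow"  # default when nothing else is configured
    if config_path.is_file():
        with open(config_path, encoding="utf-8") as handle:
            backend = json.load(handle).get("backend", backend)

    # The environment variable takes precedence over the config file.
    backend = environ.get("KERAS_BACKEND", backend)
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unknown Keras backend: {backend!r}")
    return backend
```

In practice, setting `KERAS_BACKEND=theano` before importing Keras is the quickest way to switch engines for a single run.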


## Keras popularity

<img src="./images/KerasPop01.png" width="600">
<br><br>
<img src="./images/KerasPop02.png" width="600">
<br><br><br><br><br>

# The Keras Ecosystem

* ### [Keras Tuner](https://keras-team.github.io/keras-tuner/)
scalable hyperparameter search framework
* ### [AutoKeras](https://autokeras.com/)
widely accessible and easy to learn and use machine learning AutoML system based on Keras
* ### [TensorFlow Cloud](https://github.com/tensorflow/cloud)
set of utilities for running large-scale Keras training jobs on Google Cloud Platform
* ### [TensorFlow.js](https://www.tensorflow.org/js)
used for running TF models in the browser or on a Node.js server
* ### [TensorFlow Lite](https://www.tensorflow.org/lite) 
open source deep learning framework for deploying ML models on mobile and IoT devices
* ### [Model optimization toolkit](https://www.tensorflow.org/model_optimization)
toolkit is used to make ML models faster and more memory and power efficient
* ### [TFX integration](https://www.tensorflow.org/tfx) 
ML platform for deploying and maintaining production machine learning pipelines

<br><br><br><br><br>

# Interoperability and Compatibility
Keras also supports the [ONNX](https://onnx.ai/) format. 

<img src="https://onnx.ai/assets/mlogo.png" width="140">

ONNX is an open format to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools and choose the combination that is best for them. ONNX is developed and supported by a community of partners.

<img src="./images/ONNXCommunity.png">

The keras2onnx model converter enables users to convert Keras models into the ONNX model format. Initially, the Keras converter was developed in the onnxmltools project. To support more kinds of Keras models and reduce the complexity of mixing multiple converters, keras2onnx was created to convert Keras models only.
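
A minimal sketch of such a conversion is shown here, assuming the `keras2onnx` package is installed and that `model` is an already built Keras model. The helper name `export_to_onnx` and the default file name are hypothetical. The two `keras2onnx` calls follow the usage shown in that project's README, so verify them against the version you install.

```python
def export_to_onnx(model, output_path="model.onnx"):
    """Convert a Keras model to ONNX and write it to disk (illustrative)."""
    # Imported lazily so this definition loads even without keras2onnx installed.
    import keras2onnx

    # Build the in-memory ONNX representation of the Keras model.
    onnx_model = keras2onnx.convert_keras(model, model.name)
    # Serialize the ONNX model to the given file path.
    keras2onnx.save_model(onnx_model, output_path)
    return output_path
```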
<br><br><br><br><br>

# End to end ML model requirements

#### 1. Acquiring/Creating a dataset 
#### 2. Preparing a dataset
#### 3. Dataset split to train and test data
#### 4. Training the ML model
#### 5. The ML model validation
#### 6. ML deployment
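
Step 3 can be sketched with the standard library alone. The helper `split_dataset`, the 80/20 ratio, and the fixed seed are assumptions made for illustration. Libraries such as scikit-learn offer more complete utilities for the same task.

```python
import random


def split_dataset(items, test_fraction=0.2, seed=42):
    """Shuffle items reproducibly and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")

    shuffled = list(items)
    # A seeded generator makes the split repeatable across runs.
    random.Random(seed).shuffle(shuffled)

    test_size = max(1, round(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]


# Example with hypothetical file names.
image_files = [f"img_{i}.jpg" for i in range(10)]
train_files, test_files = split_dataset(image_files)
```

Shuffling before splitting matters because downloaded images are often ordered by class or by search rank, which would otherwise bias the test set.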

<br><br><br><br><br>

# Datasets 

### 1. [CIFAR-10/100](https://www.cs.toronto.edu/~kriz/cifar.html) dataset
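The CIFAR-10 dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. CIFAR-100 has the same number of images, split into 100 classes of 600 images each.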
<img src="./images/CIFAR.png" width="500">

### 2. [MNIST](http://yann.lecun.com/exdb/mnist/) Dataset
The MNIST database has a training set of 60,000 examples, and a test set of 10,000 examples. 
<img src="./images/MNIST.png" width="500">

### 3. IMDB-Wiki Dataset
The dataset has 520k face images and is used for gender and age prediction.
<img src="./images/IMDB.png" width="500">

### 4. [ImageNet](http://www.image-net.org/challenges/LSVRC/) dataset
Evaluation of object detection and image classification algorithms.
<img src="./images/imagenet.jpeg" width="500">

### 5. Places2 Database dataset
The dataset has more than 10 million images and over 400 scene categories. It is used for scene classification and scene parsing.
<img src="./images/Places.png" width="500">

### 6. [COCO](http://cocodataset.org/#external) dataset
A large-scale object detection, segmentation, and captioning dataset. Its features include object segmentation, recognition in context, superpixel stuff segmentation, 330K images (>200K labeled), 1.5 million object instances, 80 object categories, 91 stuff categories, 5 captions per image, and 250,000 people with keypoints.
<img src="./images/COCO.png" width="500">

### 7. [Kaggle Datasets Collection](https://www.kaggle.com/datasets) 
Kaggle enables you to search among 36924 datasets. 
<img src="./images/KaggleData.png" width="500">

<br><br><br>

## Acquiring Datasets using [googleimagesdownload](https://pypi.org/project/google_images_download/2.3.0/) or [flickerimagesdownloader](https://github.com/ultralytics/flickr_scraper)

# 1. googleimagesdownload

Searching for images and downloading them to create your ML dataset can be performed using [googleimagesdownload](https://pypi.org/project/google_images_download/2.3.0/). Before using the library, we need to download and install it. This can be done using pip as shown below. Make sure the dependencies are installed as well (pip install requests-futures, pip install pandas, pip install bs4, pip install requests).

In [15]:
#!pip install google_images_download

The second prerequisite is downloading [Chromedriver](https://sites.google.com/a/chromium.org/chromedriver/home). For **googleimagesdownload** to work, the downloaded version of Chromedriver has to match the version of the Chrome browser installed on your PC.  
You can then download images through the CLI, from a Python file, or from a Jupyter notebook.
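
The version requirement can be checked with a small helper like the one sketched here. The function names and the sample version strings are hypothetical, and comparing only the major version number is an assumption. A stricter check would compare more components of the version.

```python
import re


def major_version(version_text):
    """Extract the leading major number from a dotted version string."""
    match = re.search(r"(\d+)(?:\.\d+)+", version_text)
    if match is None:
        raise ValueError(f"No version number found in {version_text!r}")
    return int(match.group(1))


def versions_compatible(chrome_version, chromedriver_version):
    """Return True when both version strings share the same major version."""
    return major_version(chrome_version) == major_version(chromedriver_version)
```

The input strings can be copied from the browser's "About" page and from the output of `chromedriver --version`.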

In [8]:
import os
os.getcwd()

'C:\\AIclass\\ComputerVisionLab02'

In [37]:
!googleimagesdownload -k "polar bear" -l 10 -o dataset/train -i polar -cd "./drivers/chromedriver.exe"


Item no.: 1 --> Item name = bear
Evaluating...
Starting Download...


Unfortunately all 10 could not be downloaded because some images were not downloadable. 0 is all we got for this search filter!

Errors: 0


Everything downloaded!
Total errors: 0
Total time taken: 2.3350934982299805 Seconds
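
The same search can also be started from Python instead of the CLI. The sketch below assumes the `google_images_download` package is installed and that its argument keys mirror the long forms of the CLI options used above (`keywords`, `limit`, `output_directory`, `image_directory`, `chromedriver`). The helper names `build_arguments` and `download_images` are hypothetical, so check the keys against the documentation of your installed version.

```python
def build_arguments(keywords, limit, output_directory, image_directory, chromedriver):
    """Collect the download options in the dictionary form the library expects."""
    return {
        "keywords": keywords,                  # CLI: -k
        "limit": limit,                        # CLI: -l
        "output_directory": output_directory,  # CLI: -o
        "image_directory": image_directory,    # CLI: -i
        "chromedriver": chromedriver,          # CLI: -cd
    }


def download_images(arguments):
    """Run the download; needs network access and the library installed."""
    # Imported lazily so this definition loads even without the package.
    from google_images_download import google_images_download

    downloader = google_images_download.googleimagesdownload()
    return downloader.download(arguments)


# Same options as the CLI call above.
polar_bear_arguments = build_arguments(
    "polar bear", 10, "dataset/train", "polar", "./drivers/chromedriver.exe"
)
```

Passing `polar_bear_arguments` to `download_images` would then perform the download.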


# 2. flickerimagesdownloader

In [50]:
!python ./flickr_scraper_master/flickr_scraper.py --search 'honeybees&on&flowers' --n 10 --download

0/10 https://live.staticflickr.com/65535/48620326787_f87bafc371_o.jpg
1/10 https://live.staticflickr.com/65535/48129479887_01a6ef34b9_o.jpg
2/10 https://farm3.staticflickr.com/2850/33242579343_62e21599f2_b.jpg
3/10 https://live.staticflickr.com/1650/25826570813_9275af2eb2_o.jpg
4/10 https://farm4.staticflickr.com/3233/3359914159_de6e521fe0_b.jpg
5/10 https://farm4.staticflickr.com/3445/3359918993_d7dd1d4916_b.jpg
6/10 https://live.staticflickr.com/4469/36897146383_1827dc68ae_o.jpg
7/10 https://farm66.staticflickr.com/65535/49001595021_3730a6df28_b.jpg
8/10 https://live.staticflickr.com/4571/37795143414_8ccae77768_o.jpg
9/10 https://live.staticflickr.com/917/42416468264_e803e9c54b_o.jpg
Done. (0.8s)


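'on' is not recognized as an internal or external command,
operable program or batch file.
'flowers'' is not recognized as an internal or external command,
operable program or batch file.

The two errors above come from the Windows command shell, which treats `&` as a command separator and does not recognize single quotes as quoting characters, so `on` and `flowers'` were run as separate commands. Wrapping the search term in double quotes (for example `--search "honeybees on flowers"`) avoids this.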


## Creating Datasets in Cloud environment

This is an example of how an image dataset was created in a cloud environment to detect objects in images. In this example, the Google Cloud Platform (GCP) [Vision module](https://console.cloud.google.com/vision/datasets?project=tvdetection01) was used to upload, store, create, and label the dataset, and then to train, validate, and export the ML model to mobile, web, or containerized platforms.