# ComputerVisionLab02 Intro

The goal of this lab is to show the importance of visual datasets when building a machine learning model. It also introduces analytical tools for extracting information from visual datasets.

<br><br>

<img src="./images/VisualDataML.png">

<br><br><br><br><br>

<img src="./images/KeyFactorsML.png">

<br><br><br><br><br>

<img src="./images/ImportanceBigData.png">

<br><br><br><br><br>

<img src="./images/ModelML.png">

<br><br><br><br><br>

<img src="./images/DeepLearning.png">

<br><br><br><br><br>

<img src="./images/MLCategories.png">

<br><br><br><br><br>

<img src="./images/AlgorithmsML.png">

<br><br><br><br><br>

<img src="./images/RepresentingML.png">

<br><br><br><br><br>

# Keras

Keras is a model-level library, providing high-level building blocks for developing deep learning models. It does not handle low-level operations such as tensor products, convolutions and so on itself. 
<img src="https://keras.io/img/keras-logo-small.jpg" width="140">

Keras development is backed primarily by Google, and the Keras API comes packaged in TensorFlow as tf.keras. Additionally, Microsoft maintains the CNTK Keras backend. Amazon AWS is maintaining the Keras fork with MXNet support. Other contributing companies include NVIDIA, Uber, and Apple (with CoreML).

<img src="./images/KerasStack.png">


## Keras backend
Because Keras does not implement low-level tensor operations itself, it relies on a specialized, well-optimized tensor manipulation library that serves as its "backend engine". Rather than picking a single tensor library and tying the implementation of Keras to it, Keras handles the problem in a modular way, so several different backend engines can be plugged into Keras seamlessly.

At this time, Keras has three backend implementations available: the TensorFlow backend, the Theano backend, and the CNTK backend.

* [TensorFlow](https://www.tensorflow.org/) is an open-source symbolic tensor manipulation framework developed by Google.
* [Theano](http://deeplearning.net/software/theano/) is an open-source symbolic tensor manipulation framework developed by LISA Lab at Université de Montréal. Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. The latest release of Theano was on 2017/11/15.
* [CNTK](https://docs.microsoft.com/en-us/cognitive-toolkit/) is an open-source toolkit for deep learning developed by Microsoft. It describes neural networks as a series of computational steps via a directed graph. CNTK allows the user to easily realize and combine popular model types such as feed-forward DNNs, convolutional neural networks (CNNs) and recurrent neural networks (RNNs/LSTMs). CNTK implements stochastic gradient descent (SGD, error backpropagation) learning with automatic differentiation and parallelization across multiple GPUs and servers.<br><br><br><br>
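
In multi-backend Keras, the active backend is typically selected through the `backend` field of `~/.keras/keras.json` or through the `KERAS_BACKEND` environment variable, with the environment variable taking precedence. The sketch below only mimics that lookup order with the standard library; `resolve_backend` is a hypothetical helper for illustration, not part of the Keras API, and it reads a temporary file rather than your real configuration.

```python
import json
import os
import tempfile

DEFAULT_BACKEND = "tensorflow"


def resolve_backend(config_path, environ):
    """Return a backend name: environment variable > config file > default."""
    backend = DEFAULT_BACKEND
    # 1. a "backend" entry in the JSON config file, if the file exists
    if os.path.exists(config_path):
        with open(config_path) as f:
            backend = json.load(f).get("backend", backend)
    # 2. the KERAS_BACKEND environment variable overrides the file
    return environ.get("KERAS_BACKEND", backend)


# demonstration with a temporary config file instead of ~/.keras/keras.json
with tempfile.TemporaryDirectory() as tmp:
    cfg = os.path.join(tmp, "keras.json")
    with open(cfg, "w") as f:
        json.dump({"backend": "theano"}, f)
    from_file = resolve_backend(cfg, {})
    from_env = resolve_backend(cfg, {"KERAS_BACKEND": "cntk"})
    fallback = resolve_backend(os.path.join(tmp, "missing.json"), {})

print(from_file, from_env, fallback)
```

Passing the environment as a plain dictionary keeps the sketch free of side effects; in a real session you would pass `os.environ`.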


## Keras popularity

<img src="./images/KerasPop01.png" width="600">
<br><br>
<img src="./images/KerasPop02.png" width="600">
<br><br><br><br><br>

# The Keras Ecosystem

* ### [Keras Tuner](https://keras-team.github.io/keras-tuner/)
scalable hyperparameter search framework
* ### [AutoKeras](https://autokeras.com/)
widely accessible and easy to learn and use machine learning AutoML system based on Keras
* ### [TensorFlow Cloud](https://github.com/tensorflow/cloud)
set of utilities for running large-scale Keras training jobs on Google Cloud Platform
* ### [TensorFlow.js](https://www.tensorflow.org/js)
used for running TF models in the browser or on a Node.js server
* ### [TensorFlow Lite](https://www.tensorflow.org/lite) 
open source deep learning framework for deploying ML models on mobile and IoT devices
* ### [Model optimization toolkit](https://www.tensorflow.org/model_optimization)
toolkit is used to make ML models faster and more memory and power efficient
* ### [TFX integration](https://www.tensorflow.org/tfx) 
ML platform for deploying and maintaining production machine learning pipelines

<br><br><br><br><br>

# Interoperability and Compatibility
Keras also supports the [ONNX](https://onnx.ai/) format. 

<img src="https://onnx.ai/assets/mlogo.png" width="140">

ONNX is an open format to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools and choose the combination that is best for them. ONNX is developed and supported by a community of partners.

<img src="./images/ONNXCommunity.png">

The keras2onnx model converter enables users to convert Keras models into the ONNX model format. Initially, the Keras converter was developed in the onnxmltools project. To support more kinds of Keras models and reduce the complexity of mixing multiple converters, keras2onnx was created to convert Keras models only.
<br><br><br><br><br>

# End to end ML model requirements

#### 1. Acquiring/Creating a dataset
#### 2. Preparing the dataset
#### 3. Splitting the dataset into train and test data
#### 4. Training the ML model
#### 5. Validating the ML model
#### 6. Deploying the ML model
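
As a minimal illustration of step 3, the sketch below shuffles a list of file names and splits it into train and test subsets using only the standard library. The `train_test_split` helper, the 80/20 ratio, the seed, and the file names are all assumptions made for this example; they are not prescribed by the lab.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of `items` and split it into (train, test) lists."""
    shuffled = list(items)
    # a seeded generator makes the split reproducible between runs
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# hypothetical file names standing in for a real image dataset
files = ["img_{:04d}.jpg".format(i) for i in range(100)]
train_files, test_files = train_test_split(files, test_fraction=0.2)
print(len(train_files), len(test_files))  # 80 20
```

Shuffling before splitting matters when files are stored grouped by class; without it, the test set could end up containing a single class.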

<br><br><br>

<img src="./images/tdsp-lifecycle.png" width="500">

# Datasets 

### 1. [CIFAR-10/100](https://www.cs.toronto.edu/~kriz/cifar.html) dataset
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
<img src="./images/CIFAR.png" width="500">

### 2. [MNIST](http://yann.lecun.com/exdb/mnist/) Dataset
The MNIST database has a training set of 60,000 examples, and a test set of 10,000 examples. 
<img src="./images/MNIST.png" width="500">

### 3. IMDB-Wiki Dataset
The dataset has about 520k face images and is used for gender and age prediction.
<img src="./images/IMDB.png" width="500">

### 4. [ImageNet](http://www.image-net.org/challenges/LSVRC/) dataset
Evaluation of object detection and image classification algorithms.
<img src="./images/imagenet.jpeg" width="500">

### 5. Places2 Database dataset
The dataset has more than 10 million images and over 400 scene categories. It is used for scene classification and scene parsing.
<img src="./images/Places.png" width="500">

### 6. [COCO](http://cocodataset.org/#external) dataset
A large-scale object detection, segmentation, and captioning dataset with several features including Object segmentation, Recognition in context, Superpixel stuff segmentation, 330K images (>200K labeled), 1.5 million object instances, 80 object categories, 91 stuff categories, 5 captions per image, 250,000 people with keypoints.
<img src="./images/COCO.png" width="500">

### 7. [Kaggle Datasets Collection](https://www.kaggle.com/datasets) 
Kaggle enables you to search among 36924 datasets. 
<img src="./images/KaggleData.png" width="500">

<br><br><br>

## Acquiring Datasets using MS Azure Bing Search API

1. Go to [Azure Portal](https://portal.azure.com/#home) and sign in with your Microsoft account. If you don't have a Microsoft account, click Create one!.
2. From the portal, type Bing in the search box.
3. Under Marketplace in the search results, select the Bing service you're interested in (for example, Bing Search or Bing Custom Search).
4. If you have a free trial or pay account, skip to Create your Bing resource.
5. On the Create a free account splash screen, click Start free.
6. Next, you have the option of continuing with the free trial (click Start free again) or paying for an Azure subscription (click Or buy now). You can always start with the free trial and pay for a subscription later.


### Install using pip install azure-cognitiveservices-search-imagesearch

In [1]:
pip install azure-cognitiveservices-search-imagesearch

Collecting azure-cognitiveservices-search-imagesearch
  Downloading azure_cognitiveservices_search_imagesearch-2.0.0-py2.py3-none-any.whl (46 kB)
Collecting msrest>=0.5.0
  Downloading msrest-0.6.21-py2.py3-none-any.whl (85 kB)
Collecting azure-common~=1.1
  Downloading azure_common-1.1.28-py2.py3-none-any.whl (14 kB)
Collecting msrestazure<2.0.0,>=0.4.32
  Downloading msrestazure-0.6.4-py2.py3-none-any.whl (40 kB)
Collecting isodate>=0.6.0
  Downloading isodate-0.6.1-py2.py3-none-any.whl (41 kB)

Collecting adal<2.0.0,>=0.6.0
  Downloading adal-1.2.7-py2.py3-none-any.whl (55 kB)
Collecting cryptography>=1.1.0
  Downloading cryptography-37.0.2-cp36-abi3-win_amd64.whl (2.4 MB)
Collecting PyJWT<3,>=1.0.0
  Downloading PyJWT-2.4.0-py3-none-any.whl (18 kB)
Installing collected packages: PyJWT, isodate, cryptography, msrest, adal, msrestazure, azure-common, azure-cognitiveservices-search-imagesearch
Successfully installed PyJWT-2.4.0 adal-1.2.7 azure-cognitiveservices-search-imagesearch-2.0

In [2]:
pip install azure-cognitiveservices-search-visualsearch

Collecting azure-cognitiveservices-search-visualsearch
  Downloading azure_cognitiveservices_search_visualsearch-0.2.0-py2.py3-none-any.whl (124 kB)
Installing collected packages: azure-cognitiveservices-search-visualsearch
Successfully installed azure-cognitiveservices-search-visualsearch-0.2.0
Note: you may need to restart the kernel to use updated packages.


### Import necessary packages

In [35]:
from azure.cognitiveservices.search.imagesearch import ImageSearchClient
from msrest.authentication import CognitiveServicesCredentials
import requests
from requests import exceptions
import cv2
import os
import matplotlib.pyplot as plt
from PIL import Image
from io import BytesIO
from pprint import pprint

### Create variables for your subscription key and search term.

In [103]:
subscription_key = ''
subscription_endpoint = 'https://api.bing.microsoft.com/v7.0/images/search'
max_images=20
group_size=20
search_term = "car"

In [104]:
# when attempting to download images from the web both the Python programming language and the requests library have a number of
# exceptions that can be thrown so let's build a list of them now so we can filter on them
EXCEPTIONS = set([IOError, FileNotFoundError,exceptions.RequestException, exceptions.HTTPError, exceptions.ConnectionError, exceptions.Timeout])
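
When filtering on a collection of exception classes, keep in mind that a membership test on `type(e)` only matches the exact classes listed, whereas `isinstance` also matches their subclasses. The sketch below shows the difference using built-in exceptions only; `matches_exact` and `matches_subclass` are hypothetical helper names. The same distinction applies to libraries that raise subclasses of their base exceptions, for instance `requests` raising `ReadTimeout`, which derives from `Timeout`.

```python
# built-in classes only, so the example runs without third-party packages
EXPECTED = (IOError, FileNotFoundError)


def matches_exact(exc):
    """True only if the exception's class is literally one of EXPECTED."""
    return type(exc) in set(EXPECTED)


def matches_subclass(exc):
    """True if the exception is an instance of any class in EXPECTED."""
    return isinstance(exc, EXPECTED)


# PermissionError is a subclass of OSError, and IOError is an alias of OSError
err = PermissionError("access denied")
exact = matches_exact(err)
by_subclass = matches_subclass(err)
print(exact, by_subclass)  # False True
```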

In [106]:
# store the search term in a variable then set the headers and search parameters
term = search_term
headers = {'Ocp-Apim-Subscription-Key' : subscription_key}
params = {'q': term, 'offset': 0, 'count': group_size}
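
To see what such a parameter dictionary turns into on the wire, the sketch below encodes it into a query string with the standard library. It does not contact the API; the `demo_` names are introduced only for this illustration, and the second search term is an assumed example.

```python
from urllib.parse import urlencode

demo_endpoint = "https://api.bing.microsoft.com/v7.0/images/search"
demo_params = {"q": "car", "offset": 0, "count": 20}

# urlencode joins key=value pairs with '&' and escapes special characters
demo_url = "{}?{}".format(demo_endpoint, urlencode(demo_params))
print(demo_url)
# https://api.bing.microsoft.com/v7.0/images/search?q=car&offset=0&count=20

# spaces in a multi-word search term are encoded as '+'
multi_word = urlencode({"q": "sports car"})
print(multi_word)  # q=sports+car
```

The subscription key is deliberately absent here: it travels in the `Ocp-Apim-Subscription-Key` header, not in the URL.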

In [107]:
# make the search
print("Searching Bing API for '{}'".format(term))
search = requests.get(subscription_endpoint, headers=headers, params=params)
search.raise_for_status()

Searching Bing API for 'car'


In [108]:
pprint(search.json())

{'_type': 'Images',
 'currentOffset': 0,
 'instrumentation': {'_type': 'ResponseInstrumentation'},
 'nextOffset': 20,
 'pivotSuggestions': [{'pivot': 'car',
                       'suggestions': [{'displayText': 'Planes Movie',
                                        'searchLink': 'https://api.bing.microsoft.com/api/v7/images/search?q=Planes+Movie&tq=%7b%22pq%22%3a%22car%22%2c%22qs%22%3a%5b%7b%22cv%22%3a%22car%22%2c%22pv%22%3a%22car%22%2c%22hps%22%3atrue%2c%22iqp%22%3afalse%7d%2c%7b%22cv%22%3a%22Planes+Movie%22%2c%22pv%22%3a%22%22%2c%22hps%22%3afalse%2c%22iqp%22%3atrue%7d%5d%7d',
                                        'text': 'Planes Movie',
                                        'thumbnail': {'thumbnailUrl': 'https://tse2.mm.bing.net/th?q=Planes+Movie&pid=Api&mkt=en-WW&cc=HR&setlang=hr&adlt=moderate&t=1'},
                                        'webSearchUrl': 'https://www.bing.com/images/search?q=Planes+Movie&tq=%7b%22pq%22%3a%22car%22%2c%22qs%22%3a%5b%7b%22cv%22%3a%22car%22%2c%22

            'name': 'Lamborghini body kit auto quad | Smart car, Tuner cars, '
                    'Mini cars',
            'thumbnail': {'height': 474, 'width': 474},
            'thumbnailUrl': 'https://tse3.mm.bing.net/th?id=OIP.-RAC_ZLjCDLuwu1wvvKcEgHaHa&pid=Api',
            'webSearchUrl': 'https://www.bing.com/images/search?view=detailv2&FORM=OIIRPO&q=car&id=86DD586EAD6D19A59A9E2CA6770E19F1CC36CB24&simid=607992869233641029',
            'width': 640},
           {'accentColor': '202021',
            'contentSize': '153986 B',
            'contentUrl': 'https://www.lovethispic.com/uploaded_images/82611-Sexy-Sports-Car.jpg',
            'datePublished': '2021-04-24T06:49:00.0000000Z',
            'encodingFormat': 'jpeg',
            'height': 638,
            'hostPageDiscoveredDate': '2019-10-15T00:00:00.0000000Z',
            'hostPageDisplayUrl': 'https://www.lovethispic.com/image/82611/sexy-sports-car',
            'hostPageFavIconUrl': 'https://www.bing.com/th?id=ODF.S1qdbTa
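
The full response is verbose, so it can help to reduce each entry of the `value` list to the few fields needed for building a dataset. The sketch below does this on a hand-trimmed sample that keeps only fields visible in the output above; `sample` and `summarize` are hypothetical names, and a real response contains many more keys per entry.

```python
# hand-trimmed stand-in for search.json(); real responses carry more fields
sample = {
    "_type": "Images",
    "nextOffset": 20,
    "value": [
        {
            "contentUrl": "http://www.drivingyourdream.com/uploads/9/6/0/4/96041650/marciano-prototipi-2.jpg",
            "encodingFormat": "jpeg",
            "width": 640,
            "height": 800,
        },
        {
            "contentUrl": "https://i.pinimg.com/originals/ba/55/1e/ba551ead85c46f0e9e5adc0df9a8481d.jpg",
            "encodingFormat": "jpeg",
            "width": 640,
            "height": 640,
        },
    ],
}


def summarize(results):
    """Return (url, format, width, height) tuples for every image entry."""
    rows = []
    for v in results.get("value", []):
        # .get() tolerates entries that lack an optional field
        rows.append((v["contentUrl"], v.get("encodingFormat"),
                     v.get("width"), v.get("height")))
    return rows


rows = summarize(sample)
for url, fmt, width, height in rows:
    print("{} {}x{} {}".format(fmt, width, height, url))
```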

In [109]:
# grab the results from the search, including the total number of
# estimated results returned by the Bing API
results = search.json()
estNumResults = min(results["totalEstimatedMatches"], max_images)
print("There are {} search results for '{}'".format(estNumResults,term))

There are 20 search results for 'car'


In [110]:
# loop over the estimated number of results in `group_size` groups
results_all = []
for offset in range(0, estNumResults, group_size):
    # update the search parameters using the current offset, then make the request to fetch the results
    print("Making request for group {}-{} of {}...".format(offset, offset + group_size, estNumResults))
    params["offset"] = offset
    search = requests.get(subscription_endpoint, headers=headers, params=params)
    search.raise_for_status()
    results = search.json()
    results_all.append(results)
    #print("Saving images for group {}-{} of {}...".format(offset, offset + group_size, estNumResults))

Making request for group 0-20 of 20...
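
With `max_images=20` and `group_size=20` the loop above makes a single request. When the total is not a multiple of the group size, the last group is smaller than the others, and the log message would report an end index past the total. The sketch below computes the offset and the number of items for each request; `page_requests` is a hypothetical helper and the totals are made-up values.

```python
def page_requests(total, group_size):
    """Yield (offset, count) pairs that together cover `total` results."""
    for offset in range(0, total, group_size):
        # the last page only asks for the results that are still missing
        yield offset, min(group_size, total - offset)


pages = list(page_requests(50, 20))
print(pages)  # [(0, 20), (20, 20), (40, 10)]

for offset, count in pages:
    print("group {}-{} of {}".format(offset, offset + count, 50))
```

In the request loop, the pair would map onto `params["offset"]` and `params["count"]`.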


In [111]:
pprint(results_all)

[{'_type': 'Images',
  'currentOffset': 0,
  'instrumentation': {'_type': 'ResponseInstrumentation'},
  'nextOffset': 20,
  'pivotSuggestions': [{'pivot': 'car',
                        'suggestions': [{'displayText': 'Planes Movie',
                                         'searchLink': 'https://api.bing.microsoft.com/api/v7/images/search?q=Planes+Movie&tq=%7b%22pq%22%3a%22car%22%2c%22qs%22%3a%5b%7b%22cv%22%3a%22car%22%2c%22pv%22%3a%22car%22%2c%22hps%22%3atrue%2c%22iqp%22%3afalse%7d%2c%7b%22cv%22%3a%22Planes+Movie%22%2c%22pv%22%3a%22%22%2c%22hps%22%3afalse%2c%22iqp%22%3atrue%7d%5d%7d',
                                         'text': 'Planes Movie',
                                         'thumbnail': {'thumbnailUrl': 'https://tse2.mm.bing.net/th?q=Planes+Movie&pid=Api&mkt=en-WW&cc=HR&setlang=hr&adlt=moderate&t=1'},
                                         'webSearchUrl': 'https://www.bing.com/images/search?q=Planes+Movie&tq=%7b%22pq%22%3a%22car%22%2c%22qs%22%3a%5b%7b%22cv%22%3a%22ca

             'imageInsightsToken': 'ccid_nNIeKcIk*cp_AE2F1E2B0BC1F64F00989AC1C356E512*mid_F0E29A6D6BAD2C777FBCC586890A987C9E3294E0*simid_608024407181188343*thid_OIP.nNIeKcIk31b6ccZFsiq1ogHaHa',
             'insightsMetadata': {'availableSizesCount': 1,
                                  'pagesIncludingCount': 1},
             'isFamilyFriendly': True,
             'name': 'Samsung is partnering up with car brands like @BMW to '
                     'bring a connected ...',
             'thumbnail': {'height': 474, 'width': 474},
             'thumbnailUrl': 'https://tse3.mm.bing.net/th?id=OIP.nNIeKcIk31b6ccZFsiq1ogHaHa&pid=Api',
             'webSearchUrl': 'https://www.bing.com/images/search?view=detailv2&FORM=OIIRPO&q=car&id=F0E29A6D6BAD2C777FBCC586890A987C9E3294E0&simid=608024407181188343',
             'width': 640},
            {'accentColor': '8EA922',
             'contentSize': '105907 B',
             'contentUrl': 'https://i.pinimg.com/736x/61/9f/de/619fde5cfecfec12f0360d77d6

In [112]:
ROOT_DIR = os.getcwd()
output_dir = os.path.join(ROOT_DIR, search_term)
# create the output folder for this search term if it does not exist yet
os.makedirs(output_dir, exist_ok=True)
print('Folder exist')

Folder exist


In [113]:
# loop over the results
total = 0
for results in results_all:
    for v in results["value"]:
        # try to download the image
        print(v)
        try:
            # make a request to download the image
            print("Image fetching: {}".format(v["contentUrl"]))
            r = requests.get(v["contentUrl"], timeout=30)
            print(v["contentUrl"])

            # build the path to the output image
            ext = v["contentUrl"][v["contentUrl"].rfind("."):]

            p = os.path.join(output_dir, "{}{}".format(str(total).zfill(8), ext))
            print(p)

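            # write the image to disk; the context manager closes the file even on errors
            with open(p, "wb") as f:
                f.write(r.content)

        # catch any errors that would prevent us from downloading the image
        except Exception as e:

            # skip the image if the exception is one of the expected ones;
            # isinstance also matches subclasses of the listed exceptions
            if isinstance(e, tuple(EXCEPTIONS)):
                print("[INFO] skipping: {}".format(v["contentUrl"]))
                continue
            # anything else is unexpected, so re-raise it instead of
            # continuing with an undefined or stale output path
            raise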
        # try to load the image from disk
        image = cv2.imread(p)

        # if the image is `None` then we could not properly load the
        # image from disk (so it should be ignored)
        if image is None:
            print("[INFO] deleting: {}".format(p))
            os.remove(p)
            continue
        #cv2.imshow('image', image)
        #cv2.waitKey(0)
        #cv2.destroyAllWindows() 
        # update the counter
        total += 1                            

{'webSearchUrl': 'https://www.bing.com/images/search?view=detailv2&FORM=OIIRPO&q=car&id=9CEAE0578FACF289CACE6B9C3E1B9CD4876F6FBF&simid=608056293013785957', 'name': 'Italian Sports Car Brands - Driving your dream', 'thumbnailUrl': 'https://tse1.mm.bing.net/th?id=OIP.RZ21QL_IqKgA4afpS1aSRwHaJQ&pid=Api', 'datePublished': '2020-11-24T02:53:00.0000000Z', 'isFamilyFriendly': True, 'contentUrl': 'http://www.drivingyourdream.com/uploads/9/6/0/4/96041650/marciano-prototipi-2.jpg', 'hostPageUrl': 'https://www.drivingyourdream.com/articles/italiancarbrands', 'contentSize': '103507 B', 'encodingFormat': 'jpeg', 'hostPageDisplayUrl': 'https://www.drivingyourdream.com/articles/italiancarbrands', 'width': 640, 'height': 800, 'hostPageDiscoveredDate': '2020-10-02T00:00:00.0000000Z', 'thumbnail': {'width': 474, 'height': 592}, 'imageInsightsToken': 'ccid_RZ21QL/I*cp_A8FF4BB73ACC86E7F8D1139CE6EF0825*mid_9CEAE0578FACF289CACE6B9C3E1B9CD4876F6FBF*simid_608056293013785957*thid_OIP.RZ21QL!_IqKgA4afpS1aSRwHaJ

https://d.ibtimes.co.uk/en/full/1631045/poo-powered-car-australia.jpg
C:\pyprojects\ComputerVisionLab02\car\00000005.jpg
{'webSearchUrl': 'https://www.bing.com/images/search?view=detailv2&FORM=OIIRPO&q=car&id=68A65BE31D1186A3B555CA0F112E5D3F82D8163C&simid=608011535161500508', 'name': 'That Cars edit on @1fastbee Hellcat 👌 #challenger | Vehicles, Sports ...', 'thumbnailUrl': 'https://tse1.mm.bing.net/th?id=OIP.QdkCIRqbujdnX4tyZoXsLAHaHa&pid=Api', 'datePublished': '2021-08-19T07:15:00.0000000Z', 'isFamilyFriendly': True, 'contentUrl': 'https://i.pinimg.com/originals/ba/55/1e/ba551ead85c46f0e9e5adc0df9a8481d.jpg', 'hostPageUrl': 'https://www.pinterest.com/pin/640496378226383616/', 'contentSize': '80462 B', 'encodingFormat': 'jpeg', 'hostPageDisplayUrl': 'https://www.pinterest.com/pin/640496378226383616', 'width': 640, 'height': 640, 'hostPageFavIconUrl': 'https://www.bing.com/th?id=ODF.PmATFqOwm9_sUEmusAtcwA&pid=Api', 'hostPageDomainFriendlyName': 'Pinterest', 'hostPageDiscoveredDate': '2

https://i.pinimg.com/originals/23/65/15/236515455857fea8e3afc497ac68d3e5.jpg
C:\pyprojects\ComputerVisionLab02\car\00000011.jpg
{'webSearchUrl': 'https://www.bing.com/images/search?view=detailv2&FORM=OIIRPO&q=car&id=86DD586EAD6D19A59A9E2CA6770E19F1CC36CB24&simid=607992869233641029', 'name': 'Lamborghini body kit auto quad | Smart car, Tuner cars, Mini cars', 'thumbnailUrl': 'https://tse3.mm.bing.net/th?id=OIP.-RAC_ZLjCDLuwu1wvvKcEgHaHa&pid=Api', 'datePublished': '2021-04-25T12:53:00.0000000Z', 'isFamilyFriendly': True, 'contentUrl': 'https://i.pinimg.com/736x/61/9f/de/619fde5cfecfec12f0360d77d63fbf98--smart-car-lamborghini.jpg', 'hostPageUrl': 'https://www.pinterest.com/pin/134122895122499095/', 'contentSize': '105907 B', 'encodingFormat': 'jpeg', 'hostPageDisplayUrl': 'https://www.pinterest.com/pin/134122895122499095', 'width': 640, 'height': 640, 'hostPageFavIconUrl': 'https://www.bing.com/th?id=ODF.PmATFqOwm9_sUEmusAtcwA&pid=Api', 'hostPageDomainFriendlyName': 'Pinterest', 'hostPage

https://www.asiancommunitynews.com/wp-content/uploads/2020/12/Honda.jpg
C:\pyprojects\ComputerVisionLab02\car\00000017.jpg
[INFO] deleting: C:\pyprojects\ComputerVisionLab02\car\00000017.jpg
{'webSearchUrl': 'https://www.bing.com/images/search?view=detailv2&FORM=OIIRPO&q=car&id=A67CDB55C81BF1A4912CC8DCC6F0628572A5C493&simid=608052388889838970', 'name': 'Vinyl Wrap For Cars - It’s A Wrap! - Boost And Camber in 2020 | Vinyl ...', 'thumbnailUrl': 'https://tse1.mm.bing.net/th?id=OIP.McR26gcfStZ-zkJM0OKXAgHaHa&pid=Api', 'datePublished': '2020-10-05T11:16:00.0000000Z', 'isFamilyFriendly': True, 'contentUrl': 'https://i.pinimg.com/736x/f1/79/3d/f1793dc0010dd47cf56594e5a2ec4d54.jpg', 'hostPageUrl': 'https://www.pinterest.com/pin/776378423254147450/', 'contentSize': '90341 B', 'encodingFormat': 'jpeg', 'hostPageDisplayUrl': 'https://www.pinterest.com/pin/776378423254147450', 'width': 640, 'height': 640, 'hostPageFavIconUrl': 'https://www.bing.com/th?id=ODF.PmATFqOwm9_sUEmusAtcwA&pid=Api', 'host
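
Taking everything after the last dot of the URL as the file extension works for plain image URLs, but it can misfire when the URL carries a query string or has no extension at all. The sketch below is one possible alternative that parses the URL path first; `image_filename` and its `.jpg` fallback are assumptions for illustration, and the `example.com` address is made up.

```python
import os
from urllib.parse import urlparse


def image_filename(url, index, default_ext=".jpg"):
    """Build a zero-padded file name whose extension comes from the URL path."""
    # urlparse separates the path from the query string and fragment
    path = urlparse(url).path
    ext = os.path.splitext(path)[1].lower()
    if not ext:
        # no extension in the path: fall back to an assumed default
        ext = default_ext
    return "{}{}".format(str(index).zfill(8), ext)


plain = image_filename(
    "https://i.pinimg.com/736x/f1/79/3d/f1793dc0010dd47cf56594e5a2ec4d54.jpg", 5)
with_query = image_filename("https://example.com/photo.PNG?width=640&v=1.2", 11)
no_ext = image_filename(
    "https://tse1.mm.bing.net/th?id=OIP.McR26gcfStZ-zkJM0OKXAgHaHa&pid=Api", 17)

print(plain)       # 00000005.jpg
print(with_query)  # 00000011.png
print(no_ext)      # 00000017.jpg
```

A file saved with a guessed extension may still not be a valid image, so a later load-and-verify step remains useful.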

### Exercise

#### 1. Create two folders with two groups of images, for example for an image classification task.
#### 2. Increase the number of downloadable images.
#### 3. How would you change the size of the images?
#### 4. How would you download only square, standard-aspect-ratio images?
#### 5. How would you download only gray/monochrome images?
#### 6. How would you download only PNG images?

### Tip: Check options at https://docs.microsoft.com/en-us/bing/search-apis/bing-image-search/reference/query-parameters.


<br><br><br>

## Creating Datasets in Cloud environment

This example shows how an image dataset was created in a cloud environment for detecting objects in images. The GCP [Google Cloud Platform Vision module](https://console.cloud.google.com/vision/datasets?project=tvdetection01) was used to upload, store, create, and label the dataset, and then to train, validate, and export the ML model to mobile, web, or containerized platforms.