<center><img src="https://github.com/sebderhy/visionapi/blob/master/logos/cloud_vision_api.PNG?raw=true"></center>

# State-of-the-art computer vision in a single line of code

The Cloud Vision API lets developers integrate state-of-the-art computer vision algorithms with a single line of code, without having to implement the algorithms or deal with their integration. The algorithms currently available in the API are:
* **Object Detection**: Locates and classifies objects in a given picture. <br />
    * Option 1: Based on the paper [DETR: End-to-End Object Detection With Transformers](https://alcinos.github.io/detr_page/)
    * Option 2: Based on the paper ["EfficientDet: Scalable and Efficient Object Detection"](https://arxiv.org/pdf/1911.09070.pdf), ranked #2 as of May 2020 on [COCO's Test set](https://paperswithcode.com/sota/object-detection-on-coco).
* **Panoptic Segmentation**: Each pixel is assigned a class label and all object instances are uniquely segmented.
    * Based on the paper [DETR: End-to-End Object Detection With Transformers](https://alcinos.github.io/detr_page/)
* **Monocular Depth Estimation**: Estimates how far each pixel is from the camera. <br />
    * Based on the paper ["From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation"](https://arxiv.org/pdf/1907.10326v5.pdf), state-of-the-art at the time of writing on [KITTI and MIT Datasets](https://paperswithcode.com/task/monocular-depth-estimation), and its [PyTorch implementation](https://github.com/Navhkrin/Bts-PyTorch). A video of the algorithm's results can be found [here](https://www.youtube.com/watch?v=ekezJiGaiQk&feature=youtu.be).
    
Think another algorithm should be included in this API? Let me know at sebderhy@gmail.com.

<table>
    <tr><td><center>Object Detection</center></td><td><center>Depth</center></td></tr>
    <tr><td><img src='img_out/efficientdet-d7.png'></td><td><img src='img_out/depth-bts.png'></td></tr>
</table>

## Read this before you use it

* Be patient! When you submit an image, the results may take about 20 seconds to arrive.
* The *semantic segmentation and depth estimation* algorithms work well **on road pictures** (i.e. pictures taken from a car), because they were trained on such datasets.
* The background segmentation algorithm works **only on portraits/selfies**, and currently gives **only a rough contour** (typically, it misses the subtleties of hair).
* Keep in mind that this is a side project and not a finished product yet! Although I do my best to keep everything working and resilient, the results may be disappointing, and the server may fail (apologies if that's the case). In any case, please share your feedback with me (sebderhy@gmail.com), so that I can improve it accordingly.
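
Since a request can take around 20 seconds and the server may occasionally fail, it can help to wrap the API call in a small retry loop. The sketch below is only an illustration: `call_with_retry` and its parameters are hypothetical names, not part of the API or of `utils`, and `fn` stands for whatever callable actually performs the request.

```python
import time


def call_with_retry(fn, retries=3, delay_s=2.0, sleep=time.sleep):
    """Call fn() up to `retries` times, pausing `delay_s` seconds after each failure.

    Hypothetical helper for illustration; `fn` is any zero-argument callable
    that performs the request and raises an exception on failure.
    """
    if retries < 1:
        raise ValueError('retries must be at least 1')
    last_error = None
    for attempt in range(retries):
        try:
            return fn()
        except Exception as err:  # deliberately broad: this is only a sketch
            last_error = err
            if attempt < retries - 1:
                sleep(delay_s)
    # Every attempt failed: surface the last error to the caller
    raise last_error
```

In a notebook, `fn` could be a `lambda` wrapping the call made in the click handlers further down.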


In [1]:
from utils import *

In [2]:
from ipywidgets import widgets

In [3]:
## VIZ FOR OBJECT DETECTION
import matplotlib.pyplot as plt
import numpy as np

def vis_bbox(img_pil, bbox, label=None, score=None,
             instance_colors=None, alpha=1., linewidth=2., ax=None, min_score=0.4):
    """Visualize bounding boxes inside the image.
    Args:
        img_pil (PIL.Image.Image): The image to display, in RGB format.
        bbox (~numpy.ndarray): An array of shape :math:`(R, 4)`, where
            :math:`R` is the number of bounding boxes in the image.
            Each element is organized
            by :math:`(x_{min}, y_{min}, x_{max}, y_{max})` in the second axis.
        label (~numpy.ndarray): An integer array of shape :math:`(R,)`.
            The values are indices into the module-level :obj:`obj_list`.
            This is optional.
        score (~numpy.ndarray): A float array of shape :math:`(R,)`.
             Each value indicates how confident the prediction is.
             This is optional.
        instance_colors (iterable of tuples): List of colors.
            Each color is RGB format and the range of its values is
            :math:`[0, 255]`. The :obj:`i`-th element is the color used
            to visualize the :obj:`i`-th instance.
            If :obj:`instance_colors` is :obj:`None`, blue is used for
            all boxes.
        alpha (float): The value which determines transparency of the
            bounding boxes. The range of this value is :math:`[0, 1]`.
        linewidth (float): The thickness of the edges of the bounding boxes.
        ax (matplotlib.axes.Axes): The visualization is displayed on this
            axis. If this is :obj:`None` (default), a new axis is created.
        min_score (float): Boxes whose score is below this value are skipped.
    Returns:
        tuple: ``(fig, ax)``, the matplotlib Figure and Axes holding the
        plot, for further tweaking.
    Adapted from: https://github.com/chainer/chainercv
    """

    if label is not None and not len(bbox) == len(label):
        raise ValueError('The length of label must be same as that of bbox')
    if score is not None and not len(bbox) == len(score):
        raise ValueError('The length of score must be same as that of bbox')

    # Create a new figure sized to the image's aspect ratio if ax is None
    if ax is None:
        fig = plt.figure()
        w, h = img_pil.size
        w_ = w / 60.0
        h_ = w_ * (h / w)
        fig.set_size_inches((w_, h_))
        ax = plt.axes([0, 0, 1, 1])
    else:
        # Reuse the figure that owns the caller-supplied axes
        fig = ax.figure
    ax.imshow(img_pil)
    ax.axis('off')
    # If there is no bounding box to display, visualize the image and exit.
    if len(bbox) == 0:
        return fig, ax

    if instance_colors is None:
        # Blue
        instance_colors = np.zeros((len(bbox), 3), dtype=np.float32)
        instance_colors[:, 0] = 51
        instance_colors[:, 1] = 51
        instance_colors[:, 2] = 224
    instance_colors = np.array(instance_colors)

    for i, bb in enumerate(bbox):
        # Skip low-confidence detections (only when scores are provided)
        if score is not None and score[i] < min_score:
            continue

        # bb is (x_min, y_min, x_max, y_max)
        xy = (bb[0], bb[1])
        height = bb[3] - bb[1]
        width = bb[2] - bb[0]
        color = instance_colors[i % len(instance_colors)] / 255
        ax.add_patch(plt.Rectangle(
            xy, width, height, fill=False,
            edgecolor=color, linewidth=linewidth, alpha=alpha))

        caption = []
        if label is not None:
            caption.append(obj_list[label[i]])
        if score is not None:
            sc = int(score[i]*100)
            caption.append('{}%'.format(sc))

        if len(caption) > 0:
            face_color = np.array([225, 51, 123])/255
            ax.text(bb[0], bb[1],
                    ': '.join(caption),
                    fontsize=12,
                    color='black',
                    style='italic',
                    bbox={'facecolor': face_color, 'edgecolor': face_color, 'alpha': 1, 'pad': 0})
    return fig, ax
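
The drawing loop above boils down to three small steps per detection: skip low-confidence boxes, convert the corner coordinates into the anchor/width/height form that `plt.Rectangle` expects, and build a caption. The helper below, `box_to_rect_and_caption`, is a hypothetical name used only for illustration. It assumes boxes are given as `(x_min, y_min, x_max, y_max)`, which is how the loop indexes them.

```python
def box_to_rect_and_caption(box, class_name, score, min_score=0.4):
    """Return ((x, y), width, height) and a caption, or None for low scores.

    Illustrative helper only; it mirrors the per-box logic of the loop
    without drawing anything.
    """
    if score < min_score:
        return None
    x_min, y_min, x_max, y_max = box
    # Rectangle arguments: anchor corner, then width and height
    rect = ((x_min, y_min), x_max - x_min, y_max - y_min)
    # Scores are shown as truncated integer percentages
    caption = '{}: {}%'.format(class_name, int(score * 100))
    return rect, caption


print(box_to_rect_and_caption((10, 20, 110, 70), 'car', 0.875))
```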

In [4]:
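import io
from PIL import Image

def pltfigure2img(fig, ax):
    """Render a matplotlib figure to an in-memory PNG and return it as a PIL image."""
    buf = io.BytesIO()
    fig.savefig(buf, format='png', dpi=100)
    plt.close(fig)  # free the figure once it has been rendered
    buf.seek(0)
    # Copy so the image remains usable after the buffer is closed
    img_pil_out = Image.open(buf).copy()
    buf.close()
    return img_pil_out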
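
A note on sizes: `vis_bbox` sets the figure width to `w / 60` inches, and `pltfigure2img` saves at `dpi=100`, so the rendered PNG comes out roughly 100/60 ≈ 1.67 times larger than the input in each dimension. That is presumably why `generate_objdet_img` resizes the result back to the input size. The arithmetic can be sketched as follows; `rendered_size` is a hypothetical helper, and matplotlib's own rounding may differ by a pixel.

```python
def rendered_size(width_px, height_px, inches_divisor=60.0, dpi=100):
    """Approximate pixel size of the saved figure for a given input image size."""
    width_in = width_px / inches_divisor
    # Height follows the image's aspect ratio, as in vis_bbox
    height_in = width_in * (height_px / width_px)
    return round(width_in * dpi), round(height_in * dpi)


print(rendered_size(600, 300))
```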

In [5]:
obj_list = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
            'fire hydrant', '', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep',
            'cow', 'elephant', 'bear', 'zebra', 'giraffe', '', 'backpack', 'umbrella', '', '', 'handbag', 'tie',
            'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
            'skateboard', 'surfboard', 'tennis racket', 'bottle', '', 'wine glass', 'cup', 'fork', 'knife', 'spoon',
            'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut',
            'cake', 'chair', 'couch', 'potted plant', 'bed', '', 'dining table', '', '', 'toilet', '', 'tv',
            'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
            'refrigerator', '', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier',
            'toothbrush']

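import ast

def parse_objdet_response(r):
    # Each field arrives as a string-encoded list. ast.literal_eval parses
    # literals only, so it cannot execute arbitrary code the way eval can.
    payload = r.json()
    rois = np.array(ast.literal_eval(payload['rois']))
    class_ids = np.array(ast.literal_eval(payload['class_ids']))
    scores = np.array(ast.literal_eval(payload['scores']))
    return rois, class_ids, scores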

def generate_objdet_img(r, img_in):
    rois, class_ids, scores = parse_objdet_response(r)    
    fig, ax = vis_bbox(img_pil=img_in, bbox=rois,
                   label=class_ids, score=scores)
    img_out = pltfigure2img(fig,ax)
    img_out = img_out.resize(img_in.size)
    return img_out
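
Judging from the parsing code above, the object-detection endpoint returns a JSON object whose `rois`, `class_ids` and `scores` fields are themselves strings containing list literals. The sketch below parses such a payload with `ast.literal_eval`, which accepts only literals and is therefore safer than `eval` on data coming from a server. The payload values are made up for illustration, and `parse_objdet_payload` is a hypothetical name.

```python
import ast


def parse_objdet_payload(payload):
    """Parse string-encoded detection fields into plain Python lists."""
    rois = ast.literal_eval(payload['rois'])
    class_ids = ast.literal_eval(payload['class_ids'])
    scores = ast.literal_eval(payload['scores'])
    # All three lists describe the same detections, so their lengths must agree
    if not (len(rois) == len(class_ids) == len(scores)):
        raise ValueError('rois, class_ids and scores must have the same length')
    return rois, class_ids, scores


# Made-up payload in the assumed format: two detections
example_payload = {
    'rois': '[[10, 20, 110, 70], [5, 5, 50, 60]]',
    'class_ids': '[2, 0]',
    'scores': '[0.875, 0.5]',
}
rois, class_ids, scores = parse_objdet_payload(example_payload)
print(rois, class_ids, scores)
```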

In [6]:
URL_BG = 'https://img.theculturetrip.com/768x432/wp-content/uploads/2018/01/webp-net-compress-image-45.jpg'
def viz_out(model, img_in, img_out, url_bg=URL_BG):
    if model[0] == 'binseg':
        response = requests.get(url_bg)
        bg = Image.open(BytesIO(response.content))
        bg2 = bg.resize(img_in.size)
        img_out2=Image.composite(img_in, bg2, img_out)
    else:
        img_out2 = img_out
    return img_out2
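
For the background-extraction model, `viz_out` relies on `Image.composite`, which keeps the first image where the mask is white, the second where it is black, and blends the two in between. Below is a pure-Python sketch of that per-pixel rule on single-channel 8-bit values. The helper names and sample values are illustrative only, and Pillow's internal rounding may differ slightly.

```python
def composite_pixel(fg, bg, mask):
    """Blend one 8-bit channel: mask=255 keeps fg, mask=0 keeps bg."""
    return (fg * mask + bg * (255 - mask)) // 255


def composite_row(fg_row, bg_row, mask_row):
    """Apply the blend to a row of pixel values."""
    return [composite_pixel(f, b, m) for f, b, m in zip(fg_row, bg_row, mask_row)]


# Foreground 200, background 50, with a white, a black and a mid-gray mask value
print(composite_row([200, 200, 200], [50, 50, 50], [255, 0, 128]))
```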

## Option 1: call from URL 

Enter the URL of the image you want to test below, then click Process. Both the input and the output images will be displayed.

**Be patient, the results can take ~20 seconds to appear**

In [7]:
url_placeholder = widgets.Text(
    placeholder='URL of an image',
    value = 'https://www.go-telaviv.com/images/driving-in-israel-tel-aviv-traffic-jam1.jpg',
    disabled=False
)

In [8]:
out_pl_url = widgets.Output()
out_pl_url.clear_output()

In [9]:
out_pl_url_2 = widgets.Output()
out_pl_url_2.clear_output()

In [10]:
models_list_url = widgets.Dropdown(
    options=[('Super-resolution', ['superres', 'superres-2b']),
             ('Style Transfer 1', ['styletransfer', 'styletransf-1']),
             ('Style Transfer 2', ['styletransfer', 'styletransf-2']),
             ('Style Transfer 3', ['styletransfer', 'styletransf-3']),
             ('Semantic Segmentation', ['semseg', 'semseg-3']),
             ('Depth', ['depth', 'depth-bts']),
             ('Object Detection', ['objdet', 'efficientdet-d4']),
             ('Background Extraction', ['binseg', 'binseg-3'])],
    value=['semseg', 'semseg-3'],
    disabled=False,
)

In [11]:
btn_run_url = widgets.Button(description='Process')

In [12]:
lbl_status_url = widgets.Label()
lbl_status_url.value = 'waiting for user input'

In [13]:
def on_click_process_url(change):
    out_pl_url.clear_output()
    out_pl_url_2.clear_output()
    lbl_status_url.value = 'loading input image...'
    response = requests.get(url_placeholder.value)
    img = Image.open(BytesIO(response.content))
    lbl_status_url.value = 'Image loaded (see below). Processing...'
    with out_pl_url: display(img)
    r = URLImgAPICall(url_placeholder.value, f'{models_list_url.value[0]}/{models_list_url.value[1]}/')
    if models_list_url.value[0] != 'objdet':
        img_out = response2img(r)
        img_out = viz_out(models_list_url.value, img, img_out)
    else: 
        img_out = generate_objdet_img(r, img)
    with out_pl_url_2: display(img_out)
    lbl_status_url.value = 'Here is the output image !'

btn_run_url.on_click(on_click_process_url)
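
Each dropdown entry carries a `[family, model]` pair. The click handler above joins the two into the endpoint path passed to `URLImgAPICall`, and treats the `objdet` family differently because it returns boxes rather than an image. A minimal sketch of that mapping, using a subset of the options from this notebook (`endpoint_path` and `returns_boxes` are hypothetical helpers, and the base URL of the API is not shown here):

```python
# Subset of the dropdown options defined in this notebook
MODELS = {
    'Depth': ['depth', 'depth-bts'],
    'Object Detection': ['objdet', 'efficientdet-d4'],
    'Background Extraction': ['binseg', 'binseg-3'],
}


def endpoint_path(model):
    """Build the relative endpoint path from a [family, model] pair."""
    family, name = model
    return f'{family}/{name}/'


def returns_boxes(model):
    """Object detection returns boxes to draw; the other families return an image."""
    return model[0] == 'objdet'


for label, model in MODELS.items():
    print(label, endpoint_path(model), returns_boxes(model))
```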

In [14]:
widgets.VBox([widgets.Label('Choose your algorithm'), models_list_url,
              widgets.Label('Write your image URL'), url_placeholder, 
              widgets.Label('Click below to process the image'), btn_run_url, 
              lbl_status_url, out_pl_url, out_pl_url_2])


## Option 2: Import local picture

In [15]:
btn_upload = widgets.FileUpload()

In [16]:
out_pl = widgets.Output()
out_pl.clear_output()

In [17]:
out_pl2 = widgets.Output()
out_pl2.clear_output()

In [18]:
models_list = widgets.Dropdown(
    options=[('Super-resolution', ['superres', 'superres-2b']),
             ('Style Transfer 1', ['styletransfer', 'styletransf-1']),
             ('Style Transfer 2', ['styletransfer', 'styletransf-2']),
             ('Style Transfer 3', ['styletransfer', 'styletransf-3']),
             ('Semantic Segmentation', ['semseg', 'semseg-3']),
             ('Depth', ['depth', 'depth-bts']),
             ('Object Detection', ['objdet', 'efficientdet-d4']),
             ('Background Extraction', ['binseg', 'binseg-3'])],
    value=['objdet', 'efficientdet-d4'],
    disabled=False,
)

In [19]:
btn_run = widgets.Button(description='Process')

In [20]:
lbl_status = widgets.Label()
lbl_status.value = 'waiting for user input'

In [21]:
def on_click_process(change):
    out_pl.clear_output()
    out_pl2.clear_output()
    lbl_status.value = 'loading input image...'
    img = Image.open(BytesIO(btn_upload.data[-1]))
    with out_pl: display(img)
    lbl_status.value = 'Image loaded (see below). Processing...'
    r = pilImgAPICall(img, f'{models_list.value[0]}/{models_list.value[1]}/')
    if models_list.value[0] != 'objdet':
        img_out = response2img(r)
        img_out = viz_out(models_list.value, img, img_out)
    else: 
        img_out = generate_objdet_img(r, img)
    with out_pl2: display(img_out)
    lbl_status.value = 'Here is the output image !'

btn_run.on_click(on_click_process)

Upload your image below, then click Process. Both the input and the output images will be displayed.

**Be patient, the results can take ~20 seconds to appear**

In [22]:
widgets.VBox([widgets.Label('Choose your algorithm'), models_list,
              widgets.Label('Choose your image'), btn_upload, 
              widgets.Label('Click below to process the image'), btn_run, 
              lbl_status, out_pl, out_pl2])


Elapsed time: 10.800711870193481
