# Introductory notebook with Colab basics and a simple neural network example

## Outline

1.   Notebook saving
2.   Colab basic concepts
3.   Running a neural network
4.   Google Drive integration and automation

This is a Colaboratory notebook, a document containing information and executable code.


## Integration with Drive

Colaboratory is integrated with Google Drive. It allows you to share, comment, and collaborate on the same document with multiple people:

* **File->Make a Copy** creates a copy of the notebook in Drive.

* **File->Save** saves the file to Drive. **File->Save and checkpoint** pins the version so it doesn't get deleted from the revision history.

* **File->Revision history** shows the notebook's revision history.

* The **Share** button (top-right of the toolbar) allows you to share the notebook and control permissions set on it.


## Runtime
To execute the notebook, you must connect to a runtime by clicking **Connect** in the upper-right corner.

This runtime *is* a server in a datacenter somewhere. So everything that you do **here** is really done **there**.

**When you stop the runtime everything you have not saved is lost.**

When you finish working, remember to stop the runtime: sessions have a time limit, and an idle runtime wastes resources. To stop it, open the **Runtime** menu, click **Manage Sessions**, and click **Terminate** on the current runtime. It is similar to shutting down a computer.


## Files

Since everything in Colab runs on a remote server, all files are stored there too. A simple file browser is available in the left sidebar; open it by clicking the 📁 icon.

## Cells
A notebook is a list of cells. Cells contain either explanatory text or executable code and its output. Click a cell to select it.

### Code cells
Below is a **code cell**. Once you are connected to a runtime, click the cell to select it and run it by clicking the **Play icon** in its left gutter.

When the cell has finished executing, a green check mark appears on its left.


In [None]:
10 + 10

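Code cells can run Python code, as above, or run commands on the remote machine: a line starting with `!` is executed as a shell command, and a line starting with `%` is an IPython magic command (for example `%cd` to change folder).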

In the next cell we list the contents of the current folder.

In [None]:
!ls
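
`!ls` is notebook syntax handled by Colab, not Python. As a sketch, a similar listing can be produced in plain Python with the standard library, which is useful in scripts that run outside a notebook. The function name `list_folder` is just for illustration, and hidden entries (starting with `.`) are skipped to mimic what `ls` shows by default.

```python
from pathlib import Path


def list_folder(folder="."):
    """Return the sorted names of the non-hidden entries in `folder`, similar to `!ls`."""
    return sorted(
        entry.name for entry in Path(folder).iterdir()
        if not entry.name.startswith(".")
    )


# In Colab the current folder is usually /content
print(list_folder())
```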

# Running computer vision neural networks with a library and automating the process

We will see how to run computer vision neural networks easily, with a few lines of code, using a library. Then we will see how to automate the processing.

## Outline

1.   Simple way to run different neural networks for different tasks
  1.   Hugging Face
  2.   Specific libraries (YOLO)
2.   Google Drive
3.   Creating a GUI and automating computer vision


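First we install the libraries. The `locale` line is a workaround for a Colab issue where `pip` fails with an error saying that a UTF-8 locale is required.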

In [None]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

!pip install -q transformers gradio ultralytics

## Hugging Face

The Hugging Face Transformers library can run many different tasks through what it calls [pipelines](https://huggingface.co/docs/transformers/pipeline_tutorial). Each pipeline is defined by a task or by its input/output format, such as DepthEstimation or TextToAudio ([list](https://huggingface.co/docs/transformers/main_classes/pipelines)).

We can create a pipeline from the task name:

```
transcriber = pipeline(task="automatic-speech-recognition")
```
or by choosing a specific model:
```
transcriber = pipeline(model="openai/whisper-large-v2")
```

In [None]:
from transformers import pipeline

# device=0 runs the model on the first GPU (select a GPU runtime); use device=-1 to run on the CPU
captioner = pipeline(model="Salesforce/blip-image-captioning-base", device=0)

Once created we can give it an input and it will process it and give us an output.

In [None]:
output = captioner("https://farm4.staticflickr.com/3129/3189318645_5466feb31a_z.jpg")
print(output)
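
The image-to-text pipeline returns a list with one dictionary per generated caption, with the text under the `generated_text` key. A small helper to pull out just the text could look like this; `caption_text` is a hypothetical name and the sample list is made-up data in that shape, not real model output.

```python
def caption_text(pipeline_output):
    """Return the first caption from an image-to-text pipeline result, or '' if empty."""
    # The pipeline output looks like [{"generated_text": "..."}]
    if not pipeline_output:
        return ""
    return pipeline_output[0]["generated_text"]


sample_output = [{"generated_text": "an example caption"}]  # illustrative only
print(caption_text(sample_output))  # prints: an example caption
```

In the notebook, `caption_text(output)` would return the caption produced by the cell above.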

We can create other pipelines for other tasks. Here we use a depth-estimation pipeline, which estimates how far each part of the image is from the camera.

In [None]:
depth_estimator = pipeline(task="depth-estimation", model="Intel/dpt-hybrid-midas", device=0)
output = depth_estimator("https://farm4.staticflickr.com/3129/3189318645_5466feb31a_z.jpg")
display(output["depth"])
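
Besides the `depth` image, the pipeline output also contains the raw `predicted_depth` values, and the grayscale image is a rescaled version of them. A minimal sketch of that kind of rescaling (min-max normalization to the 0-255 range of an 8-bit image) on a plain nested list is shown below; the exact normalization used inside the pipeline may differ between library versions, and `to_grayscale` is a hypothetical helper.

```python
def to_grayscale(depth_rows):
    """Min-max normalize a 2D list of depth values to integers in 0..255."""
    values = [v for row in depth_rows for v in row]
    lo, hi = min(values), max(values)
    span = hi - lo or 1  # avoid division by zero on a flat depth map
    return [[round((v - lo) * 255 / span) for v in row] for row in depth_rows]


raw = [[0.5, 1.0], [2.0, 3.0]]  # made-up values for illustration
print(to_grayscale(raw))  # [[0, 51], [153, 255]]
```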

## Initialize another library (YOLO) for other computer vision tasks

First we load the models.


In [None]:
from ultralytics import YOLO

yolo_detection = YOLO('yolov8m.pt')  # load a model for object detection
yolo_pose = YOLO('yolov8m-pose.pt')  # load a model for pose detection
yolo_seg = YOLO('yolov8m-seg.pt')    # load a model for segmentation
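
The weight files follow a naming pattern: `yolov8`, then a size letter (`n`, `s`, `m`, `l`, `x`, from smallest and fastest to largest and most accurate), then an optional task suffix (`-seg`, `-pose`). The notebook uses the medium (`m`) models. A small helper to build these names, for example to switch to the nano size on a CPU runtime, could look like this; `yolo_weights` is hypothetical and not part of Ultralytics.

```python
SIZES = ("n", "s", "m", "l", "x")
TASK_SUFFIXES = {"detection": "", "segmentation": "-seg", "pose": "-pose"}


def yolo_weights(task, size="m"):
    """Build a YOLOv8 weight filename such as 'yolov8m-seg.pt'."""
    if size not in SIZES:
        raise ValueError(f"size must be one of {SIZES}")
    return f"yolov8{size}{TASK_SUFFIXES[task]}.pt"


print(yolo_weights("pose"))            # yolov8m-pose.pt
print(yolo_weights("detection", "n"))  # yolov8n.pt
```

`YOLO(yolo_weights("segmentation", "n"))` would then load the nano segmentation model instead of the medium one.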

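Then we define the functions that process the images. Each one draws its results on a black canvas with the same size as the input (YOLO's `orig_shape` is `(height, width)`, hence the swapped indices for `Image.new`) and also returns the results as JSON.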

In [None]:
from PIL import Image

def process_detections(image):
  """
  This function applies object detection to the image

  Args:
    image (PIL.Image): The input image.

  Returns:
    tuple: A tuple containing the image with detections plotted and the JSON representation of the detections.
  """
  detections = yolo_detection(image)
  detection_image = detections[0].plot(img=Image.new('RGB', (detections[0].orig_shape[1], detections[0].orig_shape[0])))
  return detection_image, detections[0].tojson()

def process_pose(image):
  """
  This function applies pose detection to the image

  Args:
    image (PIL.Image): The input image.

  Returns:
    tuple: A tuple containing the image with pose plotted and the JSON representation of the pose.
  """
  pose = yolo_pose(image)
  pose_image = pose[0].plot(boxes=False, labels=False, img=Image.new('RGB', (pose[0].orig_shape[1], pose[0].orig_shape[0])))
  return pose_image, pose[0].tojson()

def process_seg(image):
  """
  This function applies segmentation to the image

  Args:
    image (PIL.Image): The input image.

  Returns:
    tuple: A tuple containing the image with segmentation plotted and the JSON representation of the segmentation.
  """
  seg = yolo_seg(image)
  seg_image = seg[0].plot(boxes=False, labels=False, img=Image.new('RGB', (seg[0].orig_shape[1], seg[0].orig_shape[0])))
  return seg_image, seg[0].tojson()

Process an image with the different models:

In [None]:
import requests
from PIL import Image

# Load an image
url = "https://farm4.staticflickr.com/3129/3189318645_5466feb31a_z.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Apply object detection
# plot() returns a NumPy array, so convert it to a PIL image for display
detection_image, detection_json = process_detections(image)
display(Image.fromarray(detection_image))

# Apply pose detection
pose_image, pose_json = process_pose(image)
display(Image.fromarray(pose_image))

# Apply segmentation
seg_image, seg_json = process_seg(image)
display(Image.fromarray(seg_image))
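
`tojson()` returns a JSON string. For detection it is a list of objects, each with keys such as `name`, `class`, `confidence` and `box` (the exact set of keys can vary between Ultralytics versions). A sketch that counts detections per class above a confidence threshold, using a hand-written sample string in that shape rather than real output; `count_objects` is a hypothetical helper.

```python
import json
from collections import Counter


def count_objects(detection_json, min_confidence=0.5):
    """Count detected objects per class name, ignoring low-confidence ones."""
    detections = json.loads(detection_json)
    return Counter(d["name"] for d in detections if d["confidence"] >= min_confidence)


# Hand-written sample in the shape returned by tojson(), not real output
sample = json.dumps([
    {"name": "person", "class": 0, "confidence": 0.91,
     "box": {"x1": 10, "y1": 20, "x2": 110, "y2": 300}},
    {"name": "person", "class": 0, "confidence": 0.42,
     "box": {"x1": 200, "y1": 30, "x2": 260, "y2": 180}},
    {"name": "dog", "class": 16, "confidence": 0.77,
     "box": {"x1": 150, "y1": 200, "x2": 300, "y2": 320}},
])
print(count_objects(sample))  # Counter({'person': 1, 'dog': 1})
```

In the notebook, `count_objects(detection_json)` would apply it to the result of the cell above.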

## Google Drive

Optional: connect to Google Drive if you want to use images from there.

In [None]:
from google.colab import drive
drive.mount('/content/drive')
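
Once mounted, your Drive files appear under `/content/drive/MyDrive` like any other folder. A sketch that collects the images in a folder and its subfolders so they can be processed one by one; `find_images` is a hypothetical helper and the folder `MyDrive/photos` is only an example.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_images(folder):
    """Return sorted paths of JPEG/PNG files under `folder`, searched recursively."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(
        path for path in folder.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


# Hypothetical folder in your Drive; adjust to your own layout
for image_path in find_images("/content/drive/MyDrive/photos"):
    print(image_path)
```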

## Give it a GUI

We can also create a GUI in Colab.


In [None]:
#@markdown  ▶  We first define some processing functions

#@markdown This cell has the code hidden, double click to view it.

import os
from pathlib import Path

import gradio as gr
from PIL import Image, ImageOps

def process_caption(image):
  output = captioner(image)
  return None, output

def process_depth(image):
  output = depth_estimator(image)
  return output["depth"], None

def process_single(image, task):
  """
  This function processes the given image based on the specified task.

  Args:
    image (PIL.Image): The input image.
    task (str): The task to be performed. It can be "detection", "segmentation", "pose", "depth" or "caption".

  Returns:
    tuple: A tuple containing the image with the task results plotted and the JSON representation of the results.
    If the task is not recognized, it defaults to processing depth.
  """
  if task == "detection":
    return process_detections(image)
  elif task == "segmentation":
    return process_seg(image)
  elif task == "pose":
    return process_pose(image)
  elif task == "depth":
    return process_depth(image)
  elif task == "caption":
    return process_caption(image)
  else:
    # Unrecognized task: fall back to depth, as documented above
    return process_depth(image)

def process(uploaded_files, local_files, output_folder, task):
    """
    This function processes the images selected in the GUI

    Args:
      uploaded_files: files uploaded through the GUI
      local_files: files selected from the local file system
      output_folder: the folder where the results will be saved
      task (str): The task to be performed. It can be "detection", "segmentation", "pose", "depth" or "caption".

    Returns:
      tuple: A tuple of lists containing the processed images and the corresponding JSON files
    """

    # Check the input (the file widgets may return None or an empty list)
    if not output_folder:
        raise gr.Error("You have to select an output folder!")

    if not uploaded_files and not local_files:
        raise gr.Error("You have to select at least one file or folder!")

    output_folder = os.path.dirname(output_folder[0]) if not os.path.isdir(output_folder[0]) else output_folder[0]

    input_files = []
    output_images = []
    output_json = []

    # List all the files to be processed
    if uploaded_files is not None:
        if isinstance(uploaded_files, list):
            input_files.extend(uploaded_files)
        else:
            input_files.append(uploaded_files)

    input_files.extend(local_files or [])

    # Keep only JPEG/PNG files, skipping folders
    input_files = [
        input_file for input_file in input_files
        if input_file.lower().endswith((".jpg", ".jpeg", ".png")) and not os.path.isdir(input_file)
    ]

    # Process all the files
    if len(input_files) > 0:
        for input_file in input_files:
            # Open the image
            image = Image.open(input_file)
            image = ImageOps.exif_transpose(image)

            # Process the image
            result_image, result_json = process_single(image, task)
            output_json.append(result_json)

            # Save the resulting image
            if result_image is not None:
                output_images.append(result_image)
                output_filename = f"{Path(input_file).stem}-{task}.png"
                output_path = os.path.join(output_folder, output_filename)
                if not isinstance(result_image, Image.Image):
                    result_image = Image.fromarray(result_image)
                result_image.save(output_path)

            yield output_images, output_json
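
Two small steps inside `process`, keeping only image files and naming each result `<original name>-<task>.png`, can be pulled out into pure functions that are easy to test without loading any model. A sketch; the function names `is_image_file` and `output_path_for` are illustrative, not part of the notebook.

```python
import os
from pathlib import Path

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def is_image_file(path):
    """True for .jpg/.jpeg/.png paths (any letter case) that are not folders."""
    return path.lower().endswith(IMAGE_EXTENSIONS) and not os.path.isdir(path)


def output_path_for(input_file, output_folder, task):
    """Build the path where the processed image for `task` is saved."""
    return os.path.join(output_folder, f"{Path(input_file).stem}-{task}.png")


print(output_path_for("/content/photos/cat.JPG", "/content/results", "pose"))
# /content/results/cat-pose.png
```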

In [None]:
#@markdown  ▶  Then we create the GUI

import os

import gradio as gr

with gr.Blocks() as demo:
  with gr.Column():
    input_file = gr.File(file_count="multiple", file_types=[".jpg", ".jpeg", ".png"], label="Input images")
    input_files = gr.FileExplorer(label="Remote files")
    output_folder = gr.FileExplorer(label="Remote output folder")
    task = gr.Radio(["detection", "segmentation", "pose", "depth", "caption"], label="Task", value="detection")
    process_button = gr.Button(value="Process")

  with gr.Column():
    with gr.Tabs():
      with gr.Tab(label="Images"):
        gallery = gr.Gallery(label="Processed images", show_label=False, columns=[3], object_fit="contain", height="auto")
      with gr.Tab(label="Data"):
        json = gr.JSON(label="Output data")

  process_button.click(process, [input_file, input_files, output_folder, task], [gallery, json])

demo.launch(quiet=True, debug=False, height=768)
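
Because `process` is a generator (it uses `yield`), Gradio refreshes the gallery after every image instead of waiting for the whole batch. The same function can be reused without the GUI by iterating over it and keeping the last value, which is one way to automate a batch run. The sketch below uses a stand-in generator, `fake_process` (hypothetical), so it runs without the models.

```python
def fake_process(files, task):
    """Stand-in for `process`: yields the growing lists of results."""
    images, data = [], []
    for name in files:
        images.append(f"{name}-{task}.png")
        data.append({"file": name, "task": task})
        yield images, data


def run_to_completion(generator):
    """Consume a generator and return its last yielded value (None if it yields nothing)."""
    last = None
    for last in generator:
        pass
    return last


images, data = run_to_completion(fake_process(["a", "b"], "detection"))
print(images)  # ['a-detection.png', 'b-detection.png']
```

With the real function, a call such as `run_to_completion(process(None, ["/content/drive/MyDrive/photos/cat.jpg"], ["/content/results"], "detection"))` should process and save the listed images; these paths are hypothetical and the output folder must already exist.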


# Finalizing

When you finish working, remember to **stop the runtime**: sessions have a time limit, and an idle runtime wastes resources. To stop it, open the **Runtime** menu, click **Manage Sessions**, and click **Terminate** on the current runtime.

> But when you stop the runtime everything you have not saved is ⚠ **lost** ⚠, so be sure to **download** everything you want to keep before stopping it.
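
One way to keep your results is to pack the output folder into a single zip file and download that. A sketch with the standard library; `zip_folder` is a hypothetical helper and `/content/results` is only an example folder. In Colab, `files.download` from `google.colab` then sends the file to your browser; it is shown as a comment because it only works inside Colab.

```python
import shutil
from pathlib import Path


def zip_folder(folder, archive_base):
    """Create `<archive_base>.zip` with the contents of `folder` and return its path."""
    return shutil.make_archive(archive_base, "zip", root_dir=folder)


# Example folder name; adjust to where you saved your results
results = Path("/content/results")
if results.is_dir():
    archive = zip_folder(results, "/content/results")
    print(archive)
    # from google.colab import files
    # files.download(archive)
```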


# Credits

Taller Estampa https://tallerestampa.com / https://github.com/estampa
