# Running a neural network using a library

This notebook continues from the first one: we keep using [Hugging Face Transformers](https://github.com/huggingface/transformers). The same workflow also applies to [Hugging Face Diffusers](https://github.com/huggingface/diffusers), a library for image generation.

Here we take a closer look at the code we used.


First we install the libraries.

In [None]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

!pip install -q transformers gradio ultralytics

## Pipelines

The library supports many tasks, organized into what it calls [pipelines](https://huggingface.co/docs/transformers/pipeline_tutorial). Pipelines are defined by task or by input/output format, such as depth estimation or text-to-audio (see the [full list](https://huggingface.co/docs/transformers/main_classes/pipelines)).

We can create a pipeline by task name:

```
transcriber = pipeline(task="automatic-speech-recognition")
```
or by selecting a model:
```
transcriber = pipeline(model="openai/whisper-large-v2")
```

In [None]:
from transformers import pipeline

captioner = pipeline(model="Salesforce/blip-image-captioning-base", device=0)

Once the pipeline is created, we can pass it an input; it processes the input and returns an output.

In [None]:
output = captioner("https://farm4.staticflickr.com/3129/3189318645_5466feb31a_z.jpg")

The output format varies with the pipeline.

In [None]:
print(output)
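For the image captioning pipeline used here, the output is typically a list with one dictionary per generated caption, with the text stored under a `generated_text` key. Below is a minimal sketch of pulling the captions out. It uses a hand-written `sample_output` (an assumed stand-in with an invented caption, not a real model result) so that it runs without loading the model; `extract_captions` is a hypothetical helper name.

```python
# Hypothetical stand-in for what `captioner(...)` returns; the caption text is invented.
sample_output = [{"generated_text": "a dog sitting on a bench "}]

def extract_captions(pipeline_output):
    """Return the caption strings from an image-to-text pipeline output."""
    return [item["generated_text"].strip() for item in pipeline_output]

captions = extract_captions(sample_output)
print(captions)
```

Other pipelines return different structures, so it is worth printing the raw output once before writing code that depends on it.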

In [None]:
output = captioner("https://farm4.staticflickr.com/3129/3189318645_5466feb31a_z.jpg", max_new_tokens=8)
print(output)

## Models

We can also search for a specific model and use it directly, without a pipeline. The code is more complex, but it gives us more control.

We can search for a model at https://huggingface.co/models; model pages usually include the code needed to run it.

In this example we use a DPT model for depth estimation. The model family is described at https://huggingface.co/Intel/dpt-large; note that the cell below loads the `Intel/dpt-hybrid-midas` checkpoint.


In [None]:
#@title ▶ Use the DPT model for depth estimation

#@markdown `processor = DPTImageProcessor.from_pretrained("Intel/dpt-hybrid-midas")`

#@markdown `model = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas")`

#@markdown This cell has the code hidden, double click to view it.

from transformers import DPTImageProcessor, DPTForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "https://farm4.staticflickr.com/3129/3189318645_5466feb31a_z.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = DPTImageProcessor.from_pretrained("Intel/dpt-hybrid-midas")
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas")
model.to('cuda')

def process_depth(image):
  # prepare image for the model
  inputs = processor(images=image, return_tensors="pt").to('cuda')

  with torch.no_grad():
      outputs = model(**inputs)
      predicted_depth = outputs.predicted_depth

  # interpolate to original size
  prediction = torch.nn.functional.interpolate(
      predicted_depth.unsqueeze(1),
      size=image.size[::-1],
      mode="bicubic",
      align_corners=False,
  )

  # visualize the prediction
  output = prediction.squeeze().cpu().numpy()
  formatted = (output * 255 / np.max(output)).astype("uint8")

  return formatted, ""

depth, _ = process_depth(image)

from IPython.display import clear_output
clear_output()

Display the output.

In [None]:
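# `depth` is a NumPy array, so convert it to a PIL image to render it as a picture
display(Image.fromarray(depth))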
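The conversion to an 8-bit image inside `process_depth` rescales the raw prediction so that its largest value maps to 255. Here is a small sketch of that step on a toy array, assuming NumPy is available. The helper name `to_uint8` and the guard against an all-zero map are additions for illustration, not part of the original cell.

```python
import numpy as np

def to_uint8(depth):
    """Scale a non-negative depth map so that its maximum becomes 255."""
    depth = np.asarray(depth, dtype=np.float32)
    max_value = float(depth.max())
    if max_value <= 0:
        # Assumption: a map with no positive values is rendered as black.
        return np.zeros(depth.shape, dtype=np.uint8)
    # astype truncates the fractional part, as in the original cell.
    return (depth * 255 / max_value).astype(np.uint8)

toy_depth = np.array([[0.0, 1.0], [2.0, 4.0]])
print(to_uint8(toy_depth))
```

Because the scale depends on each image's own maximum, grey levels from two different images are not directly comparable.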

## Initialize another library (YOLO) for other computer vision tasks

First we load the models.


In [None]:
from ultralytics import YOLO

yolo_detection = YOLO('yolov8m.pt')  # load a model for object detection
yolo_pose = YOLO('yolov8m-pose.pt')  # load a model for pose detection
yolo_seg = YOLO('yolov8m-seg.pt')    # load a model for segmentation

Then we define the functions that process the images.

In [None]:
def process_detections(image):
  """
  This function applies object detection to the image

  Args:
    image (PIL.Image): The input image.

  Returns:
    tuple: A tuple containing the image with detections plotted and the JSON representation of the detections.
  """
  detections = yolo_detection(image)
  detection_image = detections[0].plot(img=Image.new('RGB', (detections[0].orig_shape[1], detections[0].orig_shape[0])))
  return detection_image, detections[0].tojson()

def process_pose(image):
  """
  This function applies pose detection to the image

  Args:
    image (PIL.Image): The input image.

  Returns:
    tuple: A tuple containing the image with pose plotted and the JSON representation of the pose.
  """
  pose = yolo_pose(image)
  pose_image = pose[0].plot(boxes=False, labels=False, img=Image.new('RGB', (pose[0].orig_shape[1], pose[0].orig_shape[0])))
  return pose_image, pose[0].tojson()

def process_seg(image):
  """
  This function applies segmentation to the image

  Args:
    image (PIL.Image): The input image.

  Returns:
    tuple: A tuple containing the image with segmentation plotted and the JSON representation of the segmentation.
  """
  seg = yolo_seg(image)
  seg_image = seg[0].plot(boxes=False, labels=False, img=Image.new('RGB', (seg[0].orig_shape[1], seg[0].orig_shape[0])))
  return seg_image, seg[0].tojson()

Process an image with the different models

In [None]:
import requests
from PIL import Image

# Load an image
url = "https://farm4.staticflickr.com/3129/3189318645_5466feb31a_z.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The plotted results are NumPy arrays, so convert them to PIL images for display

# Apply object detection
detection_image, detection_json = process_detections(image)
display(Image.fromarray(detection_image))

# Apply pose detection
pose_image, pose_json = process_pose(image)
display(Image.fromarray(pose_image))

# Apply segmentation
seg_image, seg_json = process_seg(image)
display(Image.fromarray(seg_image))
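The JSON strings returned alongside the images can be parsed with the standard `json` module. The sketch below counts detections per label. It assumes each entry is a dictionary with at least a `name` and a `confidence` key, which should be checked against the actual output of your installed Ultralytics version. The `sample_json` string is hand-written with invented values, and `count_labels` is a hypothetical helper.

```python
import json
from collections import Counter

# Hypothetical JSON string in the assumed shape; the values are invented.
sample_json = """
[
  {"name": "person", "class": 0, "confidence": 0.91},
  {"name": "person", "class": 0, "confidence": 0.42},
  {"name": "dog", "class": 16, "confidence": 0.77}
]
"""

def count_labels(json_text, min_confidence=0.5):
    """Count detections per label, ignoring those below min_confidence."""
    detections = json.loads(json_text)
    return Counter(
        d["name"] for d in detections if d["confidence"] >= min_confidence
    )

counts = count_labels(sample_json)
print(dict(counts))
```

With the real data, `count_labels(detection_json)` would be the equivalent call, provided the keys match the assumption above.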

## Google Drive

Optional: connect to Google Drive if you want to use images stored there.

In [None]:
from google.colab import drive
drive.mount('/content/drive')
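Once the drive is mounted, its files are available as ordinary paths under `/content/drive`. Here is a minimal sketch for listing the images in one folder. The folder name `images` is a hypothetical example, and `list_images` is a helper introduced here for illustration; the extensions match the ones the GUI below accepts.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".png"}

def list_images(folder):
    """Return the image files directly inside folder, sorted by name."""
    return sorted(
        path for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )

# Hypothetical folder; adjust to wherever your images live on Drive.
drive_folder = Path("/content/drive/MyDrive/images")
if drive_folder.is_dir():
    for image_path in list_images(drive_folder):
        print(image_path.name)
```

The `is_dir()` check keeps the cell from failing when the drive is not mounted or the folder does not exist.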

## Give it a GUI

We can also create a GUI in Colab.


In [None]:
#@markdown  ▶  We first define some processing functions

#@markdown This cell has the code hidden, double click to view it.

import os
from pathlib import Path

import cv2
import gradio as gr
from PIL import Image, ImageOps

def process_caption(image):
  output = captioner(image)
  return None, output

def process_single(image, task):
  """
  This function processes the given image based on the specified task.

  Args:
    image (PIL.Image): The input image.
    task (str): The task to be performed. It can be "detection", "segmentation", "pose", "depth" or "caption".

  Returns:
    tuple: A tuple containing the image with the task results plotted and the JSON representation of the results.
    If the task is not recognized, it defaults to processing depth.
  """
  if task == "detection":
    return process_detections(image)
  elif task == "segmentation":
    return process_seg(image)
  elif task == "pose":
    return process_pose(image)
  elif task == "caption":
    return process_caption(image)
  # "depth" and any unrecognized task fall back to depth estimation, as documented above
  return process_depth(image)

def process(uploaded_files, local_files, output_folder, task):
    """
    This function processes the images selected in the GUI

    Args:
      uploaded_files: files uploaded through the GUI
      local_files: files selected from the local file system
      output_folder: the folder where the results will be saved
      task (str): The task to be performed. It can be "detection", "segmentation", "pose", "depth" or "caption".

    Returns:
      tuple: A tuple of lists containing the processed images and the corresponding JSON files
    """

    # Check the input
    if len(output_folder) == 0:
        raise gr.Error("You have to select an output folder!")

    if uploaded_files is None and len(local_files) == 0:
        raise gr.Error("You have to select at least one file or folder!")

    output_folder = os.path.dirname(output_folder[0]) if not os.path.isdir(output_folder[0]) else output_folder[0]

    input_files = []
    output_images = []
    output_json = []

    # List all the files to be processed
    if uploaded_files is not None:
        if isinstance(uploaded_files, list):
            input_files.extend(uploaded_files)
        else:
            input_files.append(uploaded_files)

    input_files.extend(local_files)

    # Keep only .jpg and .png files (the types the GUI accepts) and skip directories
    input_files = [input_file for input_file in input_files if input_file.lower().endswith((".jpg", ".png")) and not os.path.isdir(input_file)]

    # Process all the files
    if len(input_files) > 0:
        for input_file in input_files:
            # Open the image
            image = Image.open(input_file)
            image = ImageOps.exif_transpose(image)

            # Process the image
            result_image, result_json = process_single(image, task)
            output_json.append(result_json)

            # Save the resulting image
            if result_image is not None:
              output_images.append(result_image)
              output_filename = f"{Path(input_file).stem}-{task}.png"
              output_path = os.path.join(output_folder, output_filename)
              Image.fromarray(result_image).save(output_path)

            yield output_images, output_json
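The file handling in `process` comes down to two small decisions: which folder to write to, and how to name each result. Below is a sketch of both using `pathlib`. The helper names `resolve_output_folder` and `output_path_for` are hypothetical and do not exist in the notebook.

```python
from pathlib import Path

def resolve_output_folder(selection):
    """Use the selection itself if it is a folder, otherwise its parent folder."""
    selected = Path(selection)
    return selected if selected.is_dir() else selected.parent

def output_path_for(input_file, task, output_folder):
    """Build '<input stem>-<task>.png' inside output_folder."""
    return Path(output_folder) / f"{Path(input_file).stem}-{task}.png"

print(output_path_for("photos/cat.jpg", "depth", "results"))
```

Including the task in the file name means that running several tasks on the same image does not overwrite earlier results.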

In [None]:
#@markdown  ▶  Then we create the GUI

import os

import gradio as gr

with gr.Blocks() as demo:
  with gr.Column():
    input_file = gr.File(file_count="multiple", file_types=[".jpg", ".png"], label="Input images")
    input_files = gr.FileExplorer(label="Remote files")
    output_folder = gr.FileExplorer(label="Remote output folder")
    task = gr.Radio(["detection", "segmentation", "pose", "depth", "caption"], label="Task", value="detection")
    process_button = gr.Button(value="Process")

  with gr.Column():
    with gr.Tabs():
      with gr.Tab(label="Images"):
        gallery = gr.Gallery(label="Processed images", show_label=False, columns=[3], object_fit="contain", height="auto")
      with gr.Tab(label="Data"):
        json_output = gr.JSON(label="Output data")  # avoid shadowing the standard `json` module

  process_button.click(process, [input_file, input_files, output_folder, task], [gallery, json_output])

demo.launch(quiet=True, debug=False, height=768)
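The task names offered by the radio button are mapped to functions by the `if`/`elif` chain in `process_single`. A dictionary lookup is one alternative way to express the same mapping. The sketch below uses stand-in functions (`fake_depth`, `fake_caption`) instead of the real models so that it runs on its own; all names in it are hypothetical.

```python
def fake_depth(image):
    # Stand-in for process_depth: returns (image, data) like the real functions.
    return f"depth({image})", ""

def fake_caption(image):
    # Stand-in for process_caption: no image, only data.
    return None, [{"generated_text": f"caption for {image}"}]

TASKS = {"depth": fake_depth, "caption": fake_caption}

def run_task(image, task):
    """Call the handler registered for task, defaulting to depth if unknown."""
    handler = TASKS.get(task, fake_depth)
    return handler(image)

print(run_task("photo.jpg", "caption"))
```

With this layout, adding a task means adding one dictionary entry, and the list of radio choices could be derived from `TASKS.keys()`.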


# Finalizing

When you finish working, remember to **stop the runtime**: sessions have a time limit, and stopping avoids wasting resources. To stop the runtime, click Manage Sessions in the Runtime menu. Once the dialog opens, click Terminate on the current runtime.

> When you stop the runtime, everything you have not saved is ⚠ **lost** ⚠, so be sure to **download** everything you want to keep before stopping it.
