# PipelineAI demo.

Useful links:

- [Hugging Face: Zero-shot image classification](https://huggingface.co/docs/transformers/tasks/zero_shot_image_classification)

- [Roboflow: What is Zero-Shot Classification?](https://blog.roboflow.com/what-is-zero-shot-classification/)

## A. Install Python packages.

In [None]:
!pip install transformers
!pip install torch
!pip install Pillow

## B. Required modules.

In [13]:
from transformers import CLIPProcessor, CLIPModel
from PIL import Image
import requests
import torch
import matplotlib.pyplot as plt
import time

# Show version of installed packages.
import pkg_resources
from importlib.metadata import version

## C. Optional: show the modules and versions used.

In [29]:
TORCH_PACKAGE_NAME = 'torch'
REQUESTS_PACKAGE_NAME = 'requests'
MATPLOTLIB_PACKAGE_NAME = 'matplotlib'
PIL_PACKAGE_NAME = 'Pillow'
TRANSFORMERS_PACKAGE_NAME = 'transformers'

INSTALLED_PACKAGES = [
    TORCH_PACKAGE_NAME,
    REQUESTS_PACKAGE_NAME,
    MATPLOTLIB_PACKAGE_NAME,
    PIL_PACKAGE_NAME,
    TRANSFORMERS_PACKAGE_NAME,
    ]

for package_name in INSTALLED_PACKAGES:
    print(f'{package_name}=={version(package_name)}')

torch==2.1.0+cu118
requests==2.31.0
matplotlib==3.7.1
Pillow==9.4.0
transformers==4.35.2
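
The same version report can be produced in a way that tolerates packages missing from the environment. A minimal sketch, assuming Python 3.8 or newer and using only the standard-library `importlib.metadata`; the helper name `installed_versions` is hypothetical and not part of the notebook:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


# Report the packages this notebook relies on.
for name, ver in installed_versions(["torch", "requests", "Pillow"]).items():
    print(f"{name}=={ver}" if ver else f"{name} is not installed")
```

Returning `None` instead of raising lets the loop finish and report every package, which is handy when checking a fresh environment.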


## D. Zero-Shot Image Classification models used.

In [32]:
OPENAI_CLIP_VIT_LARGE_PATCH_14 = "openai/clip-vit-large-patch14"
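# Note: OpenAI publishes the patch16 and patch32 checkpoints as "base" models, not "large".
# Requesting a repository ID that does not exist makes Hugging Face ask for an auth token.
OPENAI_CLIP_VIT_BASE_PATCH_16 = "openai/clip-vit-base-patch16"
OPENAI_CLIP_VIT_BASE_PATCH_32 = "openai/clip-vit-base-patch32"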

## E. Select and load the pretrained model from Hugging Face.

In [26]:
model_name = OPENAI_CLIP_VIT_LARGE_PATCH_14

model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(model_name)

`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["bos_token_id"]` will be overriden.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["eos_token_id"]` will be overriden.
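
Loading a checkpoint is the slow step, so when switching between the models listed in section D it may help to load each one only once. A minimal sketch of caching by model name; `make_cached_loader` and `fake_load` are hypothetical names, and `fake_load` is only a stand-in for the two `from_pretrained` calls above so that the example runs without downloading weights:

```python
from functools import lru_cache


def make_cached_loader(load_fn):
    """Wrap load_fn so each model name is loaded at most once per process."""

    @lru_cache(maxsize=None)
    def cached(model_name):
        return load_fn(model_name)

    return cached


# Hypothetical stand-in for the real loader. In the notebook it would return
# (CLIPModel.from_pretrained(name), CLIPProcessor.from_pretrained(name)).
def fake_load(model_name):
    print(f"loading {model_name}")
    return {"name": model_name}


get_model = make_cached_loader(fake_load)
first = get_model("openai/clip-vit-large-patch14")
second = get_model("openai/clip-vit-large-patch14")  # Served from the cache.
print(first is second)
```

Passing the loader in as a parameter keeps the cache independent of any particular library, at the cost of holding every loaded model in memory until the process exits.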


## F. Source code to build the PipelineAI

In [38]:
def give_me_probabilities_about_the_image(url: str, user_sentences_input: list, show_image: bool = True) -> None:
    """
        Receive an image URL and the user's candidate sentences, then print
        the probability that the model assigns to each sentence for that image.
    """
    # Note: process_time() measures CPU time, not wall-clock time.
    start = time.process_time()
    response = requests.get(url, stream=True, timeout=30)
    response.raise_for_status()  # Fail early on HTTP errors instead of decoding an error page.
    image = Image.open(response.raw)

    inputs = processor(text=user_sentences_input, images=image, return_tensors="pt", padding=True)

    # Inference only: skip gradient tracking to save memory and time.
    with torch.no_grad():
        outputs = model(**inputs)
    logits_per_image = outputs.logits_per_image  # Image-text similarity scores.
    probs = logits_per_image.softmax(dim=1)  # Softmax over the sentences gives the label probabilities.

    values = probs.tolist()

    print(f'\nImage: {url}.\nProbabilities for each set input:\n')
    for index, item in enumerate(values[0]):
        print(f'\tSentence {index + 1}: {item:.2f} => {user_sentences_input[index]}')

    if show_image:
        plt.imshow(image)
        print(f'\nImage shape (width, height): {image.size}')

    total_time = f'{time.process_time() - start:.2f}'
    print(f'\nPipelineAI total processing time: {total_time} seconds')

class FilterAI:
    """A named group of candidate sentences to score against each image."""

    def __init__(self, filter_name: str, model_name: str, filter_text_inputs: list):

        self.filter_name = filter_name
        # Stored for reference only: run() scores with the globally loaded model.
        self.model_name = model_name
        self.filter_text_inputs = filter_text_inputs

class PipelineAI:
    """Run every filter against every image URL and print the results."""

    def __init__(self, filters_ai: list, images_url: list):
        self.filters_ai = filters_ai
        self.images_url = images_url

    def run(self):
        start = time.process_time()
        print('\nStarting PipelineAI...\n')

        for img in self.images_url:
            # Named filter_ai to avoid shadowing the built-in filter().
            for filter_ai in self.filters_ai:

                print(f'Processing {img} in {filter_ai.filter_name}')
                give_me_probabilities_about_the_image(url=img, user_sentences_input=filter_ai.filter_text_inputs, show_image=False)

                print('------------------------------------------------------------------------------------------------------')

        total_time = f'{time.process_time() - start:.2f}'
        print(f'\nPipelineAI total processing time: {total_time} seconds')
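
`give_me_probabilities_about_the_image` turns `logits_per_image` into probabilities with a softmax across the candidate sentences, so the scores of one filter always sum to 1 and are only meaningful relative to the other sentences in that filter. A minimal pure-Python sketch of that step; the helper names are hypothetical and the logit values are made up for illustration, not real model outputs:

```python
import math


def softmax(logits):
    """Convert raw similarity scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing on large logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def best_sentence(sentences, logits):
    """Return the (sentence, probability) pair with the highest probability."""
    probs = softmax(logits)
    index = max(range(len(probs)), key=probs.__getitem__)
    return sentences[index], probs[index]


# Hypothetical logits, chosen only to illustrate the computation.
sentences = ["the photo contains one cat", "the photo contains several cats"]
sentence, prob = best_sentence(sentences, [20.0, 21.1])
print(f"{prob:.2f} => {sentence}")
```

Because the probabilities are forced to sum to 1, a catch-all sentence such as "the photo contains something else" gives the model somewhere to put its score when none of the specific sentences fit.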

## G. Main cell: create the objects and run the PipelineAI.

In [39]:
pipeline_ai = PipelineAI(
    filters_ai=[
        FilterAI(
            filter_name="FilterA",
            model_name="openai/clip-vit-large-patch14",
            filter_text_inputs=[
                "the photo contains one cat",
                "the photo contains several cats",
                "the photo contains one person",
                "the photo contains several people",
                "the photo contains something else"
            ]),
        FilterAI(filter_name="FilterB",
            model_name="openai/clip-vit-large-patch14",
            filter_text_inputs=[
                "the photo contains a black cat",
                "the photo contains a brown cat",
                "the photo contains something else"
            ]),
        FilterAI(filter_name="FilterC",
            model_name="openai/clip-vit-large-patch14",
            filter_text_inputs=[
                "the photo contains a sitting cat",
                "the photo contains a standing cat",
                "the photo contains something else"
            ])
        ], images_url= [
            "http://images.cocodataset.org/val2017/000000039769.jpg",
            "https://www.wbm.ca/isl/uploads/2020/11/hp-most-secure-printer-banner-1024x326.jpg",
            "https://www.theupsstore.com/Image%20Library/theupsstore/general-content/gc1/gc1_print-copies.jpg"
        ]
    )

pipeline_ai.run()


Starting PipelineAI...

Processing http://images.cocodataset.org/val2017/000000039769.jpg in FilterA

Image: http://images.cocodataset.org/val2017/000000039769.jpg.
Probabilities for each set input:

	Sentence 1: 0.25 => the photo contains one cat
	Sentence 2: 0.75 => the photo contains several cats
	Sentence 3: 0.00 => the photo contains one person
	Sentence 4: 0.00 => the photo contains several people
	Sentence 5: 0.00 => the photo contains something else

PipelineAI total processing time: 3.83 seconds
------------------------------------------------------------------------------------------------------
Processing http://images.cocodataset.org/val2017/000000039769.jpg in FilterB

Image: http://images.cocodataset.org/val2017/000000039769.jpg.
Probabilities for each set input:

	Sentence 1: 0.05 => the photo contains a black cat
	Sentence 2: 0.95 => the photo contains a brown cat
	Sentence 3: 0.01 => the photo contains something else

PipelineAI total processing time: 2.44 seconds
---