# Plant Verification Using CLIP Model

### Overview:

This Jupyter notebook demonstrates how OpenAI's CLIP (Contrastive Language–Image Pretraining) model can be used to verify whether a given image contains a plant. CLIP scores how well an image matches each of several text descriptions, so the cells below compare a "plant" description against a "not a plant" description for each image and check which one the model rates higher.

### Environment Setup:

- The notebook requires Python 3.x.
- Necessary libraries: Pillow (imported as `PIL`), requests, transformers, and PyTorch (`torch`), which the `return_tensors="pt"` calls below depend on.
- An internet connection is required to download the model and retrieve the images.

### Workflow:

- **Model Initialization**: Load the pretrained CLIP model (`openai/clip-vit-large-patch14`) and its processor.
- **Image Processing**: Download each test image from its URL and open it with PIL. The test images include a plant and a non-plant subject (a dog).
- **Input Preparation**: Pair each image with two text descriptions: one stating that the photo shows a plant and one stating that it does not.
- **Model Prediction**: Use the CLIP model to compute a similarity score between the image and each description, then convert the scores to probabilities with softmax.
- **Result Analysis**: Compare the two probabilities. The image is treated as a plant when the "plant" description receives the higher probability.
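
The workflow above can be sketched as a single helper. This is a minimal sketch rather than the notebook's own code: the name `plant_probability` and its parameters are hypothetical, and it assumes that `model` and `processor` behave like the `CLIPModel` and `CLIPProcessor` objects loaded in the first cell and that the "plant" description is listed first.

```python
def plant_probability(image, model, processor, prompts):
    """Return the softmax probability assigned to prompts[0] for one image.

    Hypothetical helper: `model` and `processor` are assumed to behave like
    the CLIPModel and CLIPProcessor objects loaded in this notebook.
    """
    # Input preparation: tokenize the prompts and preprocess the image.
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    # Model prediction: logits_per_image has shape (num_images, num_prompts).
    outputs = model(**inputs)
    # Softmax across the prompts turns the similarity scores into probabilities.
    probs = outputs.logits_per_image.softmax(dim=1)
    # Row 0 is the single image; column 0 is the first prompt.
    return float(probs[0][0])
```

With two prompts the probabilities sum to 1, so checking whether this value exceeds 0.5 expresses the same comparison as `probs[0][0] > probs[0][1]` in the cells below.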

In [1]:
from PIL import Image
import requests

from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")


2023-12-27 15:36:46.672636: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [16]:
url = "https://firebasestorage.googleapis.com/v0/b/tree-hops.appspot.com/o/plants%2FV3ZBN1N68mSLhg2mw9fQVUiiQum2_1703375486488_indoor-plants-1643136651.jpeg?alt=media&token=940e10eb-5f73-4b3a-a878-977f17c41c1c"
image = Image.open(requests.get(url, stream=True).raw)

# Candidate descriptions: index 0 claims "plant", index 1 claims "not a plant".
inputs = processor(
    text=["This photo of a plant", "This not a photo of a plant"],
    images=image, return_tensors="pt", padding=True,
)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores, shape (1, 2)
probs = logits_per_image.softmax(dim=1)  # softmax over the two descriptions

In [17]:
probs[0][0] > probs[0][1]

tensor(False)

In [30]:
url = "https://firebasestorage.googleapis.com/v0/b/tree-hops.appspot.com/o/plants%2FV3ZBN1N68mSLhg2mw9fQVUiiQum2_1703572721555_image.jpg?alt=media&token=cfc9ecf6-b3e0-4fa2-9b55-636091ae0aab"
image = Image.open(requests.get(url, stream=True).raw)

# Candidate descriptions: index 0 claims "plant", index 1 claims "not a plant".
# The prompts are reproduced exactly as run ("his" is a typo for "This");
# the recorded outputs below depend on this wording.
inputs = processor(
    text=[
        "This is a photo of a living plant, showing characteristics like leaves, stems, or flowers commonly found in botanical subjects.",
        "his photo does not depict a plant but may include objects, animals, landscapes, or people that are clearly distinguishable from botanical subjects.",
    ],
    images=image, return_tensors="pt", padding=True,
)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores, shape (1, 2)
probs = logits_per_image.softmax(dim=1)  # softmax over the two descriptions

In [31]:
probs[0][0] > probs[0][1]

tensor(False)

In [32]:
probs[0][0]

tensor(0.0279, grad_fn=<SelectBackward0>)

In [33]:
probs[0][1]

tensor(0.9721, grad_fn=<SelectBackward0>)
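
Because only two descriptions are compared, the two probabilities above are determined entirely by the difference between the two logits. The following sketch recovers that difference from the recorded values 0.0279 and 0.9721 using plain Python. The function names are hypothetical and nothing here calls the model.

```python
import math


def two_way_softmax(logit_a, logit_b):
    """Softmax over exactly two scores, computed in a numerically stable way."""
    m = max(logit_a, logit_b)
    exp_a = math.exp(logit_a - m)
    exp_b = math.exp(logit_b - m)
    total = exp_a + exp_b
    return exp_a / total, exp_b / total


def logit_gap(p_a, p_b):
    """Difference logit_a - logit_b implied by two softmax probabilities."""
    return math.log(p_a / p_b)


# Probabilities recorded in the cells above ("plant" vs. "not a plant").
gap = logit_gap(0.0279, 0.9721)
# Feeding the gap back through softmax reproduces the same probabilities.
p_plant, p_not_plant = two_way_softmax(gap, 0.0)
print(f"logit gap: {gap:.2f}, p_plant: {p_plant:.4f}, p_not_plant: {p_not_plant:.4f}")
```

The gap comes out at about -3.55, meaning the "not a plant" logit was roughly 3.55 higher than the "plant" logit for this image.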

In [52]:
url = "https://firebasestorage.googleapis.com/v0/b/tree-hops.appspot.com/o/plants%2FVSxKaW0cWhYMIZHzYNBFwAVUYz52_1703389831411_image.jpg?alt=media&token=013f3f81-0019-4aaa-b6b9-2c3e1f974c12"

In [53]:
image = Image.open(requests.get(url, stream=True).raw)

# Same two descriptions as in the previous example, reproduced exactly as run
# ("his" is a typo for "This"); index 0 claims "plant", index 1 claims "not a plant".
inputs = processor(
    text=[
        "This is a photo of a living plant, showing characteristics like leaves, stems, or flowers commonly found in botanical subjects.",
        "his photo does not depict a plant but may include objects, animals, landscapes, or people that are clearly distinguishable from botanical subjects.",
    ],
    images=image, return_tensors="pt", padding=True,
)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores, shape (1, 2)
probs = logits_per_image.softmax(dim=1)  # softmax over the two descriptions

In [54]:
probs[0][0] > probs[0][1]

tensor(True)
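
The comparison `probs[0][0] > probs[0][1]` accepts an image as a plant whenever the plant probability exceeds 0.5, however narrowly. One possible refinement, sketched here with plain floats, is to require a margin before committing to either answer. The function name `decide` and the default margin of 0.2 are illustrative assumptions, not values taken from this notebook.

```python
def decide(p_plant, margin=0.2):
    """Map a plant probability to a label, leaving a band of uncertainty.

    Hypothetical rule: `margin` is an assumed value, not one tuned on data.
    """
    if not 0.0 <= p_plant <= 1.0:
        raise ValueError("p_plant must be a probability between 0 and 1")
    if p_plant >= 0.5 + margin:
        return "plant"
    if p_plant <= 0.5 - margin:
        return "not a plant"
    return "uncertain"


# The plant probability recorded in cell 32 above.
print(decide(0.0279))  # prints "not a plant"
```

To apply this to a notebook result, the tensor would first be converted to a float, for example with `probs[0][0].item()`.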