# Imageomics BioCLIP Demo
BioCLIP is a vision foundation model for general organismal biology, built on the CLIP architecture and trained on the TreeOfLife-10M dataset. Its training gives it an understanding of the hierarchical structure that relates species across the tree of life.

For the Plant Communicator project, I use this model to identify common houseplants.
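The hierarchical structure mentioned above enters through the text side of the model: the BioCLIP authors describe training with labels that spell out an organism's taxonomy from kingdom down to species. As a rough illustration, the sketch below builds such a string from a per-rank dictionary. The `taxonomic_label` helper and the dictionary layout are hypothetical, and the exact text format BioCLIP was trained on may differ.

```python
# Hypothetical helper: the rank order and dict layout are assumptions for illustration.
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def taxonomic_label(taxon: dict) -> str:
    """Join the ranks from kingdom to species with spaces, skipping any that are missing."""
    return " ".join(taxon[rank] for rank in RANKS if taxon.get(rank))


rubber_tree = {
    "kingdom": "Plantae",
    "phylum": "Tracheophyta",
    "class": "Magnoliopsida",
    "order": "Rosales",
    "family": "Moraceae",
    "genus": "Ficus",
    "species": "elastica",
}

print(taxonomic_label(rubber_tree))
# Plantae Tracheophyta Magnoliopsida Rosales Moraceae Ficus elastica
```

Strings like this could serve as candidate labels in place of, or alongside, common names; whether that helps for houseplants is something to test rather than assume.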

## 1. Load the Model and Tokenizer
To begin, after installing the required packages (`open_clip_torch`, `torch`, `pandas`, and `Pillow`), we import them and load the pre-trained BioCLIP model, its image preprocessing transforms, and its tokenizer with the `open_clip` library, which downloads the weights from the Hugging Face Hub (`hf-hub:imageomics/bioclip`). The tokenizer converts input text into the token IDs the model's text encoder expects, the preprocessing transform converts an image into a normalized tensor for the image encoder, and the model encodes both into a shared embedding space where they can be compared. Together these are the components needed to classify a houseplant photo against a list of candidate names.

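```python
import open_clip
import torch
import requests
import numpy as np
from PIL import Image
from io import BytesIO
import pandas as pd

# Load BioCLIP from the Hugging Face Hub along with its train/eval image transforms
model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms('hf-hub:imageomics/bioclip')
model.eval()  # switch to inference mode; open_clip models are created in train mode
tokenizer = open_clip.get_tokenizer('hf-hub:imageomics/bioclip')
```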

## 2. Tokenize Common Houseplant Names

We load a CSV file listing common houseplant names; these serve as the candidate labels that BioCLIP chooses between. Each name is converted to token IDs with the model's tokenizer, which turns the text into the fixed-length format the text encoder can process.
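CLIP-style zero-shot classifiers are often given each label wrapped in a short prompt (for example, "a photo of Pothos.") rather than the bare name, and CSV columns can contain blank cells, stray whitespace, or duplicates. Below is a minimal sketch of cleaning the names and building prompts, assuming a hypothetical `build_prompts` helper and an illustrative template. It returns the cleaned labels alongside the prompts so that index `i` of the model's output still maps to label `i`.

```python
def build_prompts(names, template="a photo of {}."):
    """Clean raw label strings and wrap each one in a text prompt.

    Hypothetical helper: the default template is an illustrative assumption,
    not something prescribed by BioCLIP. Returns (labels, prompts) with
    matching indices.
    """
    labels, prompts, seen = [], [], set()
    for name in names:
        if not isinstance(name, str):
            continue  # skip missing cells (pandas reads them as NaN floats)
        clean = " ".join(name.split())  # trim and collapse internal whitespace
        key = clean.lower()
        if not clean or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        seen.add(key)
        labels.append(clean)
        prompts.append(template.format(clean))
    return labels, prompts


labels, prompts = build_prompts(["Pothos", "  Snake   Plant ", "pothos", float("nan"), ""])
print(labels)   # ['Pothos', 'Snake Plant']
print(prompts)  # ['a photo of Pothos.', 'a photo of Snake Plant.']
```

With this helper, the `prompts` list would be passed to `tokenizer` and the `labels` list used to look up the predicted name.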

```python
# Load the CSV of candidate houseplant names (adjust the path if needed)
df = pd.read_csv("./houseplants.csv")

# Use the "Common Name" column as the list of candidate labels
plant_names = df["Common Name"].tolist()

# Convert every name into a fixed-length row of token IDs
tokenized_names = tokenizer(plant_names)

print("Tokenized Plant Names:", tokenized_names)
```

```
Tokenized Plant Names: tensor([[49406,  8798,  3912,  ...,     0,     0,     0],
        [49406,  3021, 10647,  ...,     0,     0,     0],
        [49406,   628, 41965,  ...,     0,     0,     0],
        ...,
        [49406,  3329,   539,  ...,     0,     0,     0],
        [49406,  9287,  4108,  ...,     0,     0,     0],
        [49406,  1192,   917,  ...,     0,     0,     0]])
```
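Each row of the printed tensor starts with 49406 and ends in a run of zeros. This matches how CLIP-style tokenizers pack text: a start-of-text token, the byte-pair-encoded IDs for the name, an end-of-text token, then zero padding up to a fixed context length. The sketch below reproduces only that packing step. The constants are assumptions based on the standard CLIP vocabulary and its default context length of 77, and `pack_tokens` is a hypothetical helper, not part of `open_clip`.

```python
# Assumed constants from the standard CLIP BPE vocabulary.
SOT_TOKEN = 49406       # <start_of_text>
EOT_TOKEN = 49407       # <end_of_text>
CONTEXT_LENGTH = 77     # default CLIP text context length


def pack_tokens(bpe_ids, context_length=CONTEXT_LENGTH):
    """Hypothetical helper: wrap BPE ids in start/end tokens and zero-pad to a fixed length."""
    tokens = [SOT_TOKEN, *bpe_ids, EOT_TOKEN]
    if len(tokens) > context_length:
        tokens = tokens[:context_length]
        tokens[-1] = EOT_TOKEN  # keep an end token after truncation
    return tokens + [0] * (context_length - len(tokens))


# Toy input: two ids copied from the start of the second printed row.
row = pack_tokens([3021, 10647])
print(len(row), row[:4], row[-1])  # 77 [49406, 3021, 10647, 49407] 0
```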


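## 3. Classify a Houseplant Image

We open a test photo and apply BioCLIP's evaluation transform (`preprocess_val`), then call `unsqueeze(0)` to add a batch dimension so the tensor has shape `(1, 3, H, W)`.

```python
image_path = "./RubberTreePlant.webp"
# Apply the evaluation transform and add a batch dimension
image = preprocess_val(Image.open(image_path)).unsqueeze(0)
```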
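`preprocess_val` is the evaluation-time transform returned by `open_clip`; for CLIP-style models it typically resizes the image, center-crops it to a square, scales pixel values to [0, 1], and normalizes each color channel. The NumPy sketch below mimics only the crop, scale, and normalize steps plus the batch axis that `unsqueeze(0)` adds. It assumes the image is already resized so both sides are at least 224 pixels, and it uses the OpenAI CLIP mean and standard deviation as assumed constants; the values BioCLIP actually uses come from its own `open_clip` configuration. The `to_model_input` helper is hypothetical.

```python
import numpy as np

# Assumed normalization constants (OpenAI CLIP values), not read from BioCLIP's config.
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)


def center_crop(img: np.ndarray, size: int) -> np.ndarray:
    """Return the central size x size window of an H x W x C array."""
    h, w = img.shape[:2]
    if h < size or w < size:
        raise ValueError("image is smaller than the crop size; resize it first")
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]


def to_model_input(img_uint8: np.ndarray, size: int = 224) -> np.ndarray:
    """Hypothetical helper: crop, scale to [0, 1], normalize, reorder to C x H x W, add a batch axis."""
    crop = center_crop(img_uint8, size).astype(np.float32) / 255.0
    normalized = (crop - CLIP_MEAN) / CLIP_STD   # broadcasts over the channel axis
    chw = normalized.transpose(2, 0, 1)          # H x W x C -> C x H x W
    return chw[np.newaxis, ...]                  # same role as .unsqueeze(0)


# Toy input: a blank 240 x 300 RGB image
batch = to_model_input(np.zeros((240, 300, 3), dtype=np.uint8))
print(batch.shape)  # (1, 3, 224, 224)
```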

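Both inputs are encoded into the shared embedding space, and the name whose text embedding has the highest cosine similarity with the image embedding is taken as the prediction.

```python
# Run on the GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
image = image.to(device)
tokenized_names = tokenized_names.to(device)  # text tokens must be on the same device as the model

# Encode the image and all candidate names (no gradients needed for inference)
with torch.no_grad():
    image_features = model.encode_image(image)           # shape (1, D)
    text_features = model.encode_text(tokenized_names)   # shape (N, D)

# Cosine similarity between the image and every name; (1, D) broadcasts against (N, D)
similarities = torch.cosine_similarity(image_features, text_features, dim=-1)  # shape (N,)
predicted_index = similarities.argmax().item()
predicted_plant = plant_names[predicted_index]

print("Predicted Plant:", predicted_plant)
```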

```
Predicted Plant: Pothos
```
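The test file is named `RubberTreePlant.webp`, yet the top prediction is Pothos, so it helps to look beyond the single argmax. CLIP-style models usually turn cosine similarities into probabilities by multiplying them by a learned scale and applying a softmax, which makes it easy to list the top few candidates and see how close the runner-up was. A NumPy sketch of that ranking follows. `zero_shot_rank` is a hypothetical helper, `logit_scale=100.0` is an assumed value (in `open_clip` the model's learned scale is stored as a log in `model.logit_scale`), and the toy embeddings stand in for real `image_features` and `text_features`.

```python
import numpy as np


def zero_shot_rank(image_emb, text_embs, labels, k=3, logit_scale=100.0):
    """Hypothetical helper: rank candidate labels for one image embedding.

    image_emb: shape (D,); text_embs: shape (N, D); labels: N strings.
    logit_scale is an assumed softmax temperature, not read from BioCLIP.
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    cosine = txt @ img                          # (N,) cosine similarities
    logits = logit_scale * cosine
    probs = np.exp(logits - logits.max())       # subtract max for numerical stability
    probs /= probs.sum()                        # softmax over the N labels
    top = np.argsort(-probs)[:k]                # indices of the k most likely labels
    return [(labels[i], float(probs[i])) for i in top]


# Toy 3-D embeddings standing in for real model outputs.
toy_labels = ["Pothos", "Rubber Plant", "Snake Plant"]
toy_text_embs = np.eye(3)
toy_image_emb = np.array([0.2, 0.9, 0.1])

for name, p in zero_shot_rank(toy_image_emb, toy_text_embs, toy_labels):
    print(f"{name}: {p:.3f}")
```

With the real tensors, the equivalent would be converting `image_features[0]` and `text_features` to NumPy (for example with `.cpu().numpy()`) and passing `plant_names` as the labels.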
