# Geminize TruLens Custom Feedback Function

We created a game in which two players are given a canvas and a word to draw. They race to draw the word well enough that the AI model (Gemini Pro) guesses it when asked "What is this an object of? Give a one word answer." The first player to send an image that the model guesses correctly wins.
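The win condition described above could be checked along these lines. This is a minimal sketch, not the game's actual code: `is_correct_guess` and its normalization rules (lowercasing, stripping surrounding whitespace and punctuation) are assumptions made for illustration.

```python
import string


def is_correct_guess(model_answer: str, target_word: str) -> bool:
    """Return True if the model's one-word answer matches the target word.

    Hypothetical helper: the normalization below is an assumption,
    not taken from the game's code.
    """
    # Strip whitespace first, then punctuation such as a trailing period
    cleaned = model_answer.strip().strip(string.punctuation).strip().lower()
    return cleaned == target_word.strip().lower()


print(is_correct_guess(" Apple.", "apple"))  # True
print(is_correct_guess("pear", "apple"))     # False
```

Normalizing before comparing means a reply such as "Apple." still counts as a correct guess even though the prompt asks for a single word.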

To gauge Gemini Pro's ability to recognize a given word, we tested it on drawings of "apple" that our team judged to be potential winners. We asked the model whether each image depicted an apple and recorded the confidence it reported. The average confidence across this dataset gives us a rough measure of how well the model performs. The analysis uses a custom feedback function from TruLens.

## Install Dependencies

In [None]:
%pip install trulens-eval==0.19.2 llama-index "google-generativeai>=0.3.0" matplotlib qdrant_client ipython

In [7]:
import os
os.environ["GOOGLE_API_KEY"] = "..."

## Initialize Model and Load Images from URLs

Here we collect our dataset of apple drawings. The images are also stored in this repo in the folder 'apple_test_images'. We also define two URLs of real apple images, which were helpful when testing the notebook.

In [8]:
from llama_index.multi_modal_llms.gemini import GeminiMultiModal

from llama_index.multi_modal_llms.generic_utils import (
    load_image_urls,
)

REAL_APPLE_IMAGE = "https://t4.ftcdn.net/jpg/01/68/57/49/240_F_168574964_oT96d6RDgQfar4OqdljIxwl7qYdOzGfe.jpg"
REAL_APPLE_IMAGE_2 = "https://t3.ftcdn.net/jpg/01/76/97/96/240_F_176979696_hqfioFYq7pX13dmiu9ENrpsHZy1yM3Dt.jpg"

image_urls = [
    "https://i.imgur.com/48MZGWM.png",
    "https://i.imgur.com/gj7xoIz.png",
    "https://i.imgur.com/dn2p2mh.png",
    "https://i.imgur.com/CXrtEV7.png",
    "https://i.imgur.com/VkaBMhF.png",
    "https://i.imgur.com/MQLkY89.png",
    "https://i.imgur.com/Q7MmZvC.png",
    "https://i.imgur.com/PPPxgsX.png",
    "https://i.imgur.com/JuYydqX.png",
    "https://i.imgur.com/xxqX440.png"
]

image_documents = load_image_urls(image_urls)

gemini_pro = GeminiMultiModal(model_name="models/gemini-pro-vision")

## Setup TruLens Instrumentation

We set up our Tru instance and the model we are using (Gemini Pro).

In [9]:
from trulens_eval import TruCustomApp
from trulens_eval import Tru
from trulens_eval.tru_custom_app import instrument
from trulens_eval import Provider
from trulens_eval import Feedback
from trulens_eval import Select

tru = Tru()
tru.reset_database()

# create a custom class to instrument
class Gemini:
    @instrument
    def complete(self, prompt, image_documents):
        completion = gemini_pro.complete(
            prompt=prompt,
            image_documents=image_documents,
        )
        return completion

gemini = Gemini()

🦑 Tru initialized with db url sqlite:///default.sqlite .
🛑 Secret keys may be written to the database. See the `database_redact_keys` option of `Tru` to prevent this.


## Setup TruLens Custom Provider with Gemini

Here we create a TruLens custom feedback function that evaluates whether the model recognizes drawings of one of our words: apple. We ask it specifically whether the image is an apple, and to report the likelihood as a probability between 0.00 and 1.00.

In [10]:

# Create a custom Gemini feedback provider
class Gemini_Provider(Provider):
    def apple_guess_rating(self, image_url) -> float:
        # Load the image and ask the model for a likelihood between 0.00 and 1.00
        image_documents = load_image_urls([image_url])
        response = gemini_pro.complete(
            prompt="Is the image of an apple? Respond with the float likelihood from 0.00 (not apple) to 1.00 (apple).",
            image_documents=image_documents,
        )
        # float() raises ValueError if the reply is not a bare number
        return float(response.text)

gemini_provider = Gemini_Provider()

f_custom_function = Feedback(
    gemini_provider.apple_guess_rating, name="Apple Guess Rating"
).on(Select.Record.calls[0].args.image_documents[0].image_url)

✅ In Apple Guess Rating, input image_url will be set to __record__.calls[0].args.image_documents[0].image_url .
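The feedback function above passes the model's reply straight to `float()`, which fails if the model adds any words around the number. A more tolerant parser might look like the sketch below. `parse_likelihood` is a hypothetical helper, not part of TruLens or LlamaIndex, and clamping the value to the range 0.0 to 1.0 is an assumption.

```python
import re
from typing import Optional


def parse_likelihood(text: str) -> Optional[float]:
    """Extract the first number from a model reply and clamp it to [0.0, 1.0].

    Hypothetical helper: returns None when the reply contains no number.
    """
    match = re.search(r"[-+]?\d*\.?\d+", text)
    if match is None:
        return None
    value = float(match.group())
    # Clamping is an assumption; rejecting out-of-range values is another option
    return min(1.0, max(0.0, value))


print(parse_likelihood("0.95"))
print(parse_likelihood("Likelihood: 0.9"))
print(parse_likelihood("I cannot tell"))
```

Returning `None` for an unparseable reply lets the caller decide whether to skip the image, retry the request, or record a failure, rather than stopping the whole evaluation loop.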


## Evaluate Model

We now run our TruLens custom feedback function on the 10 test images to see how well the model recognizes the drawings. The team deemed each of them worthy of a win, so let's see how confident the model is about them.

In [11]:
import statistics

from IPython.display import Image, display

confidence_levels = []

for image_url in image_urls:
    # Display the image
    display(Image(url=image_url, width=100, height=100))

    # Compute and print the confidence for the image
    confidence = gemini_provider.apple_guess_rating(image_url=image_url)
    confidence_levels.append(confidence)
    print(f"confidence: {confidence}")

    print("\n")

print(f"average confidence: {statistics.mean(confidence_levels)}")

confidence: 0.9
confidence: 0.99
confidence: 0.99
confidence: 0.99
confidence: 0.95
confidence: 0.9
confidence: 0.99
confidence: 0.99
confidence: 0.9
confidence: 0.9

average confidence: 0.95
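The mean alone hides how the scores are spread. As a minimal sketch, the values printed in the run above can be summarized a little further. The `THRESHOLD` cutoff is a hypothetical value chosen for illustration, not something the game uses.

```python
import statistics

# Confidence values copied from the run shown above
confidence_levels = [0.9, 0.99, 0.99, 0.99, 0.95, 0.9, 0.99, 0.99, 0.9, 0.9]

THRESHOLD = 0.95  # hypothetical cutoff chosen for illustration

summary = {
    "mean": statistics.mean(confidence_levels),
    "min": min(confidence_levels),
    "max": max(confidence_levels),
    "stdev": statistics.stdev(confidence_levels),
    # Number of drawings scoring strictly below the cutoff
    "below_threshold": sum(c < THRESHOLD for c in confidence_levels),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Reporting the minimum and the count below a cutoff makes it easier to spot individual drawings the model is less sure about, which an average of 0.95 would otherwise mask.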


## Wrapping Up/Next Steps

We saw that the model reports high confidence (0.95 on average) that these drawings depict apples. Because of time constraints, we worked with a small dataset for this exercise. If we were to scale this application in the cloud, we could capture and store successful drawings from our player community and build a test set from them.