In [None]:
!pip install -q -U google-generativeai

In [None]:
import pathlib
import textwrap

import google.generativeai as genai

from IPython.display import display
from IPython.display import Markdown


def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

In [None]:
# Used to securely store your API key
from google.colab import userdata

In [None]:
# Or use `os.getenv('GOOGLE_API_KEY')` to fetch an environment variable.
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')

genai.configure(api_key=GOOGLE_API_KEY)

In [None]:
for m in genai.list_models():
  if 'generateContent' in m.supported_generation_methods:
    print(m.name)

models/gemini-pro
models/gemini-pro-vision


In [None]:
model = genai.GenerativeModel('gemini-pro')

## Emotion Experiments

#### Emotion Experiment Design:

Step 1: Run a pilot study of how well the most advanced language models and vision-language models handle emotion understanding, to check whether their capabilities are already near-perfect (Completed)

To-Do:
Step 2: Design multimodal prompts in multiple ways.
  1. Simple questions: the approach used in the examples below, e.g., asking the model which emotion is expressed in an image of a facial expression.
  2. Conversational prompts: ask the model about different parts of the image first, then ask for an emotion inference.
  3. Chain-of-thought prompts: ask for an emotion inference and have the model reason step by step to arrive at it.
  4. Explanations: for every emotion inference generated, ask for a corresponding explanation.
  5. Prompting with and without a sample class space: ask for emotional responses with and without a set of ground-truth classes, to find the model's default granularity.
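
The five prompt designs above could be sketched as plain text templates. This is only an illustration: `build_prompts`, the strategy names, and every wording other than the base question are assumptions, not prompts that have been run in this notebook.

```python
# Hypothetical prompt templates for the five strategies listed above.
# Only BASE_QUESTION is taken from the notebook; all other wordings are assumptions.
BASE_QUESTION = "What is the emotion expressed in the given picture?"


def build_prompts(labels=None):
    """Return a dict mapping strategy name -> list of text turns to send with the image."""
    prompts = {
        # 1. Simple question (the form used in the pilot cells).
        "simple": [BASE_QUESTION],
        # 2. Conversational: ask about parts of the image, then ask for the emotion.
        "conversational": [
            "Describe the person's face in this picture.",
            "What do the eyes, eyebrows, and mouth suggest?",
            BASE_QUESTION,
        ],
        # 3. Chain-of-thought: ask for step-by-step reasoning before the answer.
        "chain_of_thought": [
            BASE_QUESTION
            + " Think step by step about the visible cues before giving a final answer."
        ],
        # 4. Explanation: ask for a justification alongside the answer.
        "with_explanation": [
            BASE_QUESTION + " Then explain which visual cues support your answer."
        ],
    }
    # 5. Optional class space: only added when a label set is supplied.
    if labels:
        prompts["with_class_space"] = [
            BASE_QUESTION + " Choose exactly one of: " + ", ".join(labels) + "."
        ]
    return prompts
```

Calling `build_prompts()` gives the four open-ended variants; passing a label list, e.g. `build_prompts(["happy", "surprise", "fear"])`, adds the constrained variant so the two conditions in item 5 can be compared.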

Step 3: Apply all of these prompting techniques to both expressed and evoked emotions, and tally the results for each prompting method.
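
One possible way to tally results per prompting method is sketched below. The record layout `(method, emotion_type, true_label, predicted_label)` and the function name `tally_accuracy` are assumptions made for illustration; no results have been collected yet.

```python
from collections import defaultdict


def tally_accuracy(records):
    """Compute accuracy per (method, emotion_type) pair.

    records: iterable of (method, emotion_type, true_label, predicted_label).
    Returns a dict mapping (method, emotion_type) -> fraction of exact matches.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [correct, total]
    for method, emotion_type, truth, predicted in records:
        key = (method, emotion_type)
        counts[key][1] += 1
        if truth == predicted:
            counts[key][0] += 1
    return {key: correct / total for key, (correct, total) in counts.items()}


# Placeholder records only, to show the expected input shape (not real results).
example_records = [
    ("simple", "expressed", "fear", "fear"),
    ("simple", "expressed", "fear", "sadness"),
    ("chain_of_thought", "evoked", "awe", "awe"),
]
accuracy_by_method = tally_accuracy(example_records)
```

Exact string matching is the simplest choice here; if the model answers without a fixed class space, predictions would likely need to be normalized (e.g. "happiness" vs. "happy") before tallying.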

#### Datasets to be used (in addition to the pilot dataset):

1. EMOTIC Dataset: https://paperswithcode.com/dataset/emotic
2. EmoSet Dataset: https://vcc.tech/EmoSet
3. The ArtPhoto Dataset
4. The AbstractArt Dataset
5. Emotion6 Dataset

Possible extensions: carry out the same experiments for emotional video understanding.

In [None]:
import PIL.Image

In [None]:
vision_model = genai.GenerativeModel('gemini-pro-vision')

#### Experiments in recognizing emotions from human faces

Using a small subset of images from a facial expression recognition dataset: https://www.kaggle.com/datasets/jonathanoheix/face-expression-recognition-dataset?resource=download
All images are taken from the training set, from their respective emotion categories.

In [None]:
# case 1: Happiness (easy)
happiness_easy = "emotion_face_images/happy/82.jpg"
img = PIL.Image.open(happiness_easy)
response1 = vision_model.generate_content(["What is the emotion expressed in the given picture?", img], stream = True)

In [None]:
response1.resolve()

In [None]:
to_markdown(response1.text)

>  The emotion expressed in the given picture is happiness.

Observation: Easily identifies happiness in pictures that clearly depict it.

In [None]:
# case 2: Happiness (difficult)
happiness_difficult = "emotion_face_images/happy/239.jpg"
img = PIL.Image.open(happiness_difficult)
response2 = vision_model.generate_content(["What is the emotion expressed in the given picture?", img], stream = True)

In [None]:
response2.resolve()

In [None]:
to_markdown(response2.text)

>  The emotion expressed in the picture is sadness.

Observation: Misclassifies ambiguous examples, or examples that do not have clear identifiers of an emotion.

In [None]:
# case 3: Happiness (moderate)
happiness_moderate = "emotion_face_images/happy/1550.jpg"
img = PIL.Image.open(happiness_moderate)
response3 = vision_model.generate_content(["What is the emotion expressed in the given picture?", img], stream = True)

In [None]:
response3.resolve()

In [None]:
to_markdown(response3.text)

>  The emotion expressed in the given picture is happiness.

Observation: Is able to classify moderately ambiguous examples.

In [None]:
# case 4: Happiness (moderate), second example
happiness_moderate2 = "emotion_face_images/happy/2733.jpg"
img = PIL.Image.open(happiness_moderate2)
response4 = vision_model.generate_content(["What is the emotion expressed in the given picture?", img], stream = True)

In [None]:
response4.resolve()

In [None]:
to_markdown(response4.text)

>  The emotion expressed in the picture is happiness.

Observation: Is able to classify moderately difficult images.

#### Comparing *surprise* and *fear*

In [None]:
# case 5: Surprise (positive)
surprise_p = "emotion_face_images/surprise/1552.jpg"
img = PIL.Image.open(surprise_p)
response5 = vision_model.generate_content(["What is the emotion expressed in the given picture?", img], stream = True)

In [None]:
response5.resolve()

In [None]:
to_markdown(response5.text)

>  The emotion expressed in the picture is happiness.

Observation: Picks up on the more basic emotion (happiness) instead of classifying the image as surprise.

In [None]:
# case 6: Surprise (negative)
surprise_n = "emotion_face_images/surprise/964.jpg"
img = PIL.Image.open(surprise_n)
response6 = vision_model.generate_content(["What is the emotion expressed in the given picture?", img], stream = True)

In [None]:
response6.resolve()

In [None]:
to_markdown(response6.text)

>  The emotion expressed in the given picture is fear.

Observation: Classifies negative surprise as fear. This is arguably a reasonable answer, because the evaluation is zero-shot.

In [None]:
# case 7: Actual Fear (easy)
fear_easy = "emotion_face_images/fear/669.jpg"
img = PIL.Image.open(fear_easy)
response7 = vision_model.generate_content(["What is the emotion expressed in the given picture?", img], stream = True)

In [None]:
response7.resolve()

In [None]:
to_markdown(response7.text)

>  The emotion expressed in the given picture is fear.

In [None]:
# case 8: Fear (Difficult)
fear_diff = "emotion_face_images/fear/739.jpg"
img = PIL.Image.open(fear_diff)
response8 = vision_model.generate_content(["What is the emotion expressed in the given picture?", img], stream = True)

In [None]:
response8.resolve()

In [None]:
to_markdown(response8.text)

>  The emotion expressed in the given picture is sadness.

Observation: Can categorize easy images of fear, but confuses fear with other common negative emotions when the image is more ambiguous.
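
The eight cases above repeat the same load, query, and read steps, so they could be folded into a loop. The sketch below is an assumption about how that might look: `extract_label`, `run_cases`, and the `ask` callable are hypothetical helpers, and the label extraction only handles replies shaped like "The emotion expressed in the given picture is fear."

```python
import re


def extract_label(response_text):
    """Return the last word of a reply ending in '... is <label>.', lowercased, or None."""
    match = re.search(r"\bis\s+([A-Za-z]+)\s*\.?\s*$", response_text.strip())
    return match.group(1).lower() if match else None


def run_cases(cases, ask):
    """Query each case and compare the extracted label with the expected one.

    cases: list of (name, image_path, expected_label).
    ask:   callable taking an image path and returning the model's reply text.
    Returns a list of (name, expected_label, predicted_label, is_match).
    """
    results = []
    for name, image_path, expected in cases:
        predicted = extract_label(ask(image_path))
        results.append((name, expected, predicted, predicted == expected))
    return results


# In this notebook, `ask` could wrap the existing model call, for example:
# ask = lambda path: vision_model.generate_content(
#     ["What is the emotion expressed in the given picture?", PIL.Image.open(path)]
# ).text
```

Passing `ask` in as a parameter keeps the loop testable without an API key; the expected labels would need to use the model's wording ("happiness" rather than the folder name "happy") for the exact-match comparison to be meaningful.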

Ask Gemini to write code or use predefined tools for calculations instead of performing them on the fly.