Cell 1: Set Up Environment Variables and API Key
Objective: Load the environment variables and configure the Google Generative AI client with the API key.

In [None]:
import warnings
warnings.filterwarnings('ignore')

# Load environment variables and API keys
import os
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())  # Load .env file
GOOGLE_API_KEY = os.getenv('GOOGLE_API_KEY')  # Load API Key from .env

# Set up the generative AI library
import google.generativeai as genai
from google.api_core.client_options import ClientOptions

# Configure the API client
genai.configure(
        api_key=GOOGLE_API_KEY,
        transport="rest",
        client_options=ClientOptions(
            api_endpoint=os.getenv("GOOGLE_API_BASE"),
        ),
)
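
If GOOGLE_API_KEY is missing from the .env file, os.getenv returns None and the problem may only surface later, at the first API call. A minimal sketch of an early check follows; require_env is a hypothetical helper, not part of the original notebook or of any library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message.

    Hypothetical helper for illustration; not part of google.generativeai.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; check your .env file"
        )
    return value


# Usage (after load_dotenv has run):
# GOOGLE_API_KEY = require_env("GOOGLE_API_KEY")
```

The same check could be applied to GOOGLE_API_BASE if a custom endpoint is required in your setup.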


Cell 2: Helper Functions
Objective: Import the display utilities and define a helper that formats model output as a Markdown blockquote.

In [None]:
import textwrap
import PIL.Image
from IPython.display import Markdown, Image, display

# Convert text to markdown format
def to_markdown(text):
    """
    Convert input text to Markdown for better display in notebooks.
    
    Args:
        text (str): Text to be converted to Markdown.
    
    Returns:
        Markdown: Markdown formatted text.
    """
    text = text.replace('•', '  *')
    return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))
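
To see what to_markdown hands to Markdown, the sketch below reproduces only the string transformation, using the standard library. quote_block is a hypothetical name used here so the example runs without IPython.

```python
import textwrap


def quote_block(text: str) -> str:
    """Same string transformation as to_markdown, without the Markdown wrapper."""
    text = text.replace('•', '  *')
    # predicate=lambda _: True also prefixes blank lines, so the whole
    # response stays inside a single blockquote
    return textwrap.indent(text, '> ', predicate=lambda _: True)


print(quote_block("Summary:\n\n• first point"))
```

Every line, including the empty one, gets the "> " prefix, and the bullet character becomes a Markdown list marker.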


Cell 3: Call the Large Multimodal Model (LMM)
Objective: Define a function that sends an image and a prompt to the Google Generative AI model.



In [None]:
def call_LMM(image_path: str, prompt: str) -> Markdown:
    """
    Call the Large Multimodal Model (LMM) to analyze the provided image and prompt.

    Args:
        image_path (str): Path to the image file to be analyzed.
        prompt (str): Text prompt describing the task for the model.

    Returns:
        Markdown: Model's response wrapped in a Markdown display object.
    """
    # Load the image using PIL
    img = PIL.Image.open(image_path)
    
    # Call generative model using the image and prompt
    model = genai.GenerativeModel('gemini-1.5-flash')
    response = model.generate_content([prompt, img], stream=True)
    response.resolve()
    
    # Return the model's response in markdown format
    return to_markdown(response.text)
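
call_LMM requests a streamed response and then calls resolve() to wait for all of it. If you would rather show text as it arrives, a streamed response can be iterated chunk by chunk. The sketch below assumes each chunk exposes a .text attribute, as streamed chunks in google.generativeai do; print_stream is a hypothetical helper, and the fake chunks stand in for a real response so the example runs offline.

```python
from types import SimpleNamespace
from typing import Iterable


def print_stream(chunks: Iterable) -> str:
    """Print each chunk's text as it arrives and return the full text."""
    parts = []
    for chunk in chunks:
        print(chunk.text, end="")
        parts.append(chunk.text)
    print()
    return "".join(parts)


# Stand-in for model.generate_content([prompt, img], stream=True)
fake_response = [
    SimpleNamespace(text="The chart "),
    SimpleNamespace(text="shows an index."),
]
full_text = print_stream(fake_response)
```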


Cell 4: Analyze Image with LMM
Objective: Use the LMM to analyze the content of an image and return a description.



In [None]:
# Display the S&P 500 chart; display() is needed because only the last
# expression in a cell is rendered automatically
display(Image(filename="SP-500-Index-Historical-Chart.jpg"))

# Ask the LMM to describe the image
call_LMM("SP-500-Index-Historical-Chart.jpg", "Explain what you see in this image.")


Cell 5: Analyze a Harder Image
Objective: Test the model with a more complex image (e.g., a diagram) to see how it handles harder cases.

In [None]:
# Display the more complex image; display() is needed because only the last
# expression in a cell is rendered automatically
display(Image(filename="clip.png"))

# Ask the LMM to describe the figure and its use
call_LMM("clip.png", "Explain what this figure is and where it is used.")


Cell 6: Decode a Hidden Message in an Image
Objective: Test the model by giving it an image with a hidden message and asking it to reveal the content.

In [None]:
# Display the image with the hidden message; display() is needed because only
# the last expression in a cell is rendered automatically
display(Image(filename="blankimage3.png"))

# Ask the LMM to read the hidden message in the image
call_LMM("blankimage3.png", "Read what you see on this image.")


Cell 7: Visualizing How the Model "Sees" the Image
Objective: Convert the image to a NumPy array and threshold its red channel so that the low-contrast text the model picked up becomes visible to the eye.

In [None]:
import imageio.v2 as imageio
import numpy as np
import matplotlib.pyplot as plt

# Load the image and convert it to a NumPy array
image = imageio.imread("blankimage3.png")
image_array = np.array(image)

# Threshold the red channel: pixels above 120 map to 0 and the rest to 1,
# which separates the low-contrast text from the background
plt.imshow(np.where(image_array[:, :, 0] > 120, 0, 1), cmap='gray')
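
The threshold of 120 is specific to this image. For an image whose two colors are unknown, one simple option is to split the red channel at the midpoint between its smallest and largest values. This is a sketch on a synthetic array, not the notebook's method; midpoint_mask is a hypothetical helper, and the pixel values are chosen only to mimic a low-contrast pair.

```python
import numpy as np


def midpoint_mask(channel: np.ndarray) -> np.ndarray:
    """Return 1 where the channel is at or below the min/max midpoint, else 0."""
    # Cast to int first: adding two uint8 values can overflow
    lo, hi = int(channel.min()), int(channel.max())
    threshold = (lo + hi) / 2
    return np.where(channel > threshold, 0, 1)


# Synthetic 3x4 red channel: background 126, "text" pixels 115
red = np.full((3, 4), 126, dtype=np.uint8)
red[1, 1:3] = 115
mask = midpoint_mask(red)
print(mask)
```

With a real image, the call would be midpoint_mask(image_array[:, :, 0]), and the result can be passed to plt.imshow as in the cell above.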


Cell 8: Create Your Own Hidden Message
Objective: Create an image with a hidden message, save it, and use the LMM to decode the message.

In [None]:
def create_image_with_text(text, font_size=20, font_family='sans-serif', text_color='#73D955', background_color='#7ED957'):
    """
    Create an image with hidden text inside.

    Args:
        text (str): Text to be displayed in the image.
        font_size (int): Size of the font.
        font_family (str): Font family to be used.
        text_color (str): Color of the text.
        background_color (str): Background color of the image.

    Returns:
        plt.Figure: Figure object of the created image.
    """
    # Create a plot with the given text and styling
    fig, ax = plt.subplots(figsize=(5, 5))
    fig.patch.set_facecolor(background_color)
    ax.text(0.5, 0.5, text, fontsize=font_size, ha='center', va='center', color=text_color, fontfamily=font_family)
    ax.axis('off')
    plt.tight_layout()
    return fig

# Generate an image with a hidden message
fig = create_image_with_text("Hello, world!")

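# Save the image before showing it. Passing facecolor keeps the background
# color on older Matplotlib versions that save with a white face by default
fig.savefig("extra_output_image.png", facecolor=fig.get_facecolor())

# Display the generated image with hidden text
plt.show()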

# Use LMM to decode the hidden message
call_LMM("extra_output_image.png", "Read what you see on this image.")


Cell 9: Decode the Created Hidden Message
Objective: Convert the image with the hidden message to a NumPy array and visualize it.

In [None]:
# Load the image created with hidden text
image = imageio.imread("extra_output_image.png")

# Convert the image to a NumPy array
image_array = np.array(image)

# Visualize the hidden text by thresholding the red channel
plt.imshow(np.where(image_array[:, :, 0] > 120, 0, 1), cmap='gray')
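
The threshold of 120 works for this image because of the default colors in create_image_with_text: the red components of the text and background colors fall on opposite sides of it. A short check using only the standard library (hex_to_rgb is a hypothetical helper):

```python
def hex_to_rgb(color: str) -> tuple:
    """Convert a '#RRGGBB' string to an (R, G, B) tuple of ints."""
    color = color.lstrip('#')
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


text_rgb = hex_to_rgb('#73D955')        # default text_color
background_rgb = hex_to_rgb('#7ED957')  # default background_color

print(text_rgb, background_rgb)
# Red channel: text is below 120, background is above it
print(text_rgb[0] < 120 < background_rgb[0])
```

The red values are 115 for the text and 126 for the background, so background pixels map to 0 and text pixels map to 1. Anti-aliased edge pixels take intermediate values, so glyph outlines may look slightly rough.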


The code above shows how to call a Large Multimodal Model (LMM) through the Google Generative AI API to analyze images together with text prompts. The key components are:

Environment Setup: The code loads environment variables, including the Google API key, and configures the generative AI client to make API calls.
Helper Functions: A utility function, to_markdown, formats model output as a Markdown blockquote for display in the notebook.
LMM Function: The call_LMM function loads an image, sends it to the LMM together with a prompt, and returns the model's description or explanation.
Image Analysis: The code analyzes several images with the LMM, including an S&P 500 chart, a complex figure, and an image with hidden text.
Hidden Message Creation: The notebook creates an image whose text is almost the same color as its background and then asks the LMM to read it.
Image Processing: Finally, the notebook loads the hidden-message images as NumPy arrays and thresholds the red channel to make the low-contrast text visible.
Overall, the code shows how an LMM can be used for combined image and text analysis, and how to generate and inspect test images interactively.