# Open-source Experiments

This notebook shows how to run the open-source models from our experiments on COLUMBUS: Fuyu-8b (non-instruction-tuned VQA), BLIP-2 Flan-T5-XXL (instruction-tuned VQA), and Mistral-7B (text-only instruction-tuned QA). All results computed in this notebook are stored in the `model_results` folder (in the same directory as this notebook), and all models are downloaded to `model_downloads`.

## Setup
To get started, run the setup as shown below.

<!-- Run these commands to install necessary libraries -->
%pip install accelerate

In [None]:
# Add root folder to allow module imports
import sys
sys.path.append("../")

In [None]:
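# Import Python modules
import os
import json
import random

import torch
from tqdm import tqdm
from PIL import Image
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Blip2ForConditionalGeneration,
    Blip2Processor,
    FuyuForCausalLM,
    FuyuProcessor,
)

# Import COLUMBUS benchmark
from puzzles.Benchmark import Benchmark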

The following code can be tweaked to alter the number of puzzles which the models will run on (first *n* puzzles), as well as the prompts that will be used. These prompts correspond to prompt 2 (see the paper) for the vision-language models, and prompt 3 for the text-only models.

In [None]:
N_PUZZLES = 1008
MODELS_DIR = "./model_downloads"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

PROMPT_TEMPLATE = "You are given a rebus puzzle. It consists of text or icons that is used to convey a word or phrase. It needs to be solved through creative thinking. Which word/phrase is conveyed in this image from the following options (either A, B, C, or D)?\n(A) {} (B) {} (C) {} (D) {}"
MISTRAL_PROMPT_TEMPLATE = "You are given a description of a graph that is used to convey a word or phrase. It needs to be solved through creative thinking. The nodes are elements that contain text or icons, which are then manipulated through the attributes of their node. The description is as follows:\n{}\nWhich word/phrase is conveyed in this description from the following options (either A, B, C, or D)?\n(A) {} (B) {} (C) {} (D) {}"
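
To make the placeholder mechanics concrete, here is a minimal sketch of how the four `{}` slots get filled. The `example_options` dictionary and the shortened `demo_template` are made up for illustration; the real options come from each puzzle's `options` field, assumed here to be a dict whose values are in A-D order.

```python
# Hypothetical options for one puzzle; real ones come from puzzle["options"].
example_options = {"A": "big deal", "B": "small talk", "C": "high five", "D": "long shot"}

# Shortened stand-in for PROMPT_TEMPLATE with the same four positional slots.
demo_template = "Which word/phrase is conveyed?\n(A) {} (B) {} (C) {} (D) {}"

# Dicts preserve insertion order, so the values fill slots A-D in order.
demo_prompt = demo_template.format(*example_options.values())
print(demo_prompt)
```

The Mistral template works the same way, except that its first slot takes the graph description and the remaining four take the options.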

## Non-instruction-tuned VQA Model (Fuyu-8b)

The following code will download and run Fuyu-8b on COLUMBUS.

In [None]:
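# Download model and processor for Fuyu-8b
# (loaded in float16 so the weights match the float16 inputs created in the loop below)
model = FuyuForCausalLM.from_pretrained(
    "adept/fuyu-8b",
    cache_dir=MODELS_DIR,
    device_map={"": 0},
    torch_dtype=torch.float16,
)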

processor = FuyuProcessor.from_pretrained(
    "adept/fuyu-8b",
    cache_dir=MODELS_DIR
)

# Get the first N_PUZZLES puzzles from the benchmark (assumes get_puzzles() returns a list)
benchmark = Benchmark(with_metadata=True)
puzzles = benchmark.get_puzzles()[:N_PUZZLES]

# Loop over the selected puzzles and prompt Fuyu-8b to solve each one
for puzzle in tqdm(puzzles, desc="Prompting Fuyu-8b"):
    # Load the puzzle image and get the answer options
    image = Image.open(puzzle["image"]).convert("RGB")
    options = puzzle["options"]

    # Format prompt
    prompt_format = list(options.values())
    prompt = PROMPT_TEMPLATE.format(*prompt_format)
    puzzle["prompt"] = prompt

    # Prompt Fuyu-8b
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(device=DEVICE, dtype=torch.float16)
    generated_ids = model.generate(**inputs, max_new_tokens=200)
    generated_text = processor.batch_decode(generated_ids[:, -200:], skip_special_tokens=True)[0].strip()
    puzzle["output"] = generated_text

# Save results under the 'model_results' folder
with open("./model_results/fuyu_prompt_2.json", "w") as file:
    json.dump(puzzles, file, indent=3)
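
Note that `open(..., "w")` raises `FileNotFoundError` when the `model_results` folder does not exist yet. The following is a minimal sketch of a save helper that creates the folder first. `save_results` is a hypothetical name and not part of COLUMBUS, and the demo writes a made-up record to a temporary directory rather than to the real results folder.

```python
import json
import os
import tempfile


def save_results(results, folder, filename):
    """Write results as JSON to folder/filename, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    with open(path, "w") as file:
        json.dump(results, file, indent=3)
    return path


# Demo with a made-up record in a throwaway directory.
demo_dir = os.path.join(tempfile.mkdtemp(), "model_results")
demo_path = save_results([{"prompt": "demo", "output": "(A)"}], demo_dir, "demo.json")

# Read the file back to confirm the round trip.
with open(demo_path) as file:
    reloaded = json.load(file)
```

In the notebook, the equivalent call would pass `puzzles`, `"./model_results"`, and the result filename for the model being run.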

## Instruction-tuned VQA Model (BLIP-2 Flan-T5-XXL)

The following code will download and run BLIP-2 Flan-T5-XXL on COLUMBUS.

In [None]:
# Download model and processor for BLIP-2 Flan-T5-XXL
processor = Blip2Processor.from_pretrained(
    "Salesforce/blip2-flan-t5-xxl",
    cache_dir=MODELS_DIR
)

model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-flan-t5-xxl",
    cache_dir=MODELS_DIR,
    device_map={"": 0},
    torch_dtype=torch.float16
)

# Get the first N_PUZZLES puzzles from the benchmark (assumes get_puzzles() returns a list)
benchmark = Benchmark(with_metadata=True)
puzzles = benchmark.get_puzzles()[:N_PUZZLES]

# Loop over the selected puzzles and prompt BLIP-2 Flan-T5-XXL to solve each one
for puzzle in tqdm(puzzles, desc="Prompting BLIP-2 Flan-T5-XXL"):
    # Load the puzzle image and get the answer options
    image = Image.open(puzzle["image"]).convert("RGB")
    options = puzzle["options"]
    
    # Format prompt
    prompt_format = list(options.values())
    prompt = PROMPT_TEMPLATE.format(*prompt_format)
    puzzle["prompt"] = prompt

    # Prompt BLIP-2 Flan-T5-XXL
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(device=DEVICE, dtype=torch.float16)
    generated_ids = model.generate(**inputs, max_length=512)
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
    puzzle["output"] = generated_text

# Save results under the 'model_results' folder
with open("./model_results/blip2-flan-t5-xxl_prompt_2.json", "w") as file:
    json.dump(puzzles, file, indent=3)

## Instruction-tuned text-only QA Model (Mistral-7B)

The following code will download and run Mistral-7B on COLUMBUS. This requires an API key for authentication, which by default is read from the `MISTRAL_API_KEY` environment variable.
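
The cell below reads the key with `os.getenv`, which returns `None` when the variable is unset, so a missing key would only surface later as an authentication error during the download. A small sketch of failing early instead is shown here. `require_env` is a hypothetical helper, and the demo uses a placeholder variable so the snippet runs on its own.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Placeholder so the demo runs; in practice, export the real key in your shell.
os.environ.setdefault("DEMO_API_KEY", "placeholder-value")
demo_key = require_env("DEMO_API_KEY")
```

With this helper, `MISTRAL_API_KEY = require_env("MISTRAL_API_KEY")` would stop the notebook with a clear message before any download starts.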

In [None]:
# Variable for Mistral API key (change this if you do not have the environment variable set)
MISTRAL_API_KEY = os.getenv("MISTRAL_API_KEY")

# Download model and tokenizer for Mistral-7B
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    cache_dir=MODELS_DIR,
    token=MISTRAL_API_KEY
).to(DEVICE)

tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    cache_dir=MODELS_DIR,
    token=MISTRAL_API_KEY
)

# Get the first N_PUZZLES puzzles from the benchmark (assumes get_puzzles() returns a list)
benchmark = Benchmark(with_metadata=True)
puzzles = benchmark.get_puzzles()[:N_PUZZLES]

# Loop over the selected puzzles and prompt Mistral-7B to solve each one
for puzzle in tqdm(puzzles, desc="Prompting Mistral-7B"):
    # Get the answer options
    options = puzzle["options"]

    # Format prompt
    prompt_format = [puzzle["metadata"]["nodes"]] + list(options.values())
    prompt = MISTRAL_PROMPT_TEMPLATE.format(*prompt_format)
    puzzle["prompt"] = prompt

    # Prompt Mistral-7B
    messages = [
        {"role": "user", "content": prompt}
    ]

    model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(DEVICE)
    generated_ids = model.generate(model_inputs, max_new_tokens=100, do_sample=True)
    generated_text = tokenizer.batch_decode(generated_ids)[0]
    puzzle["output"] = generated_text

# Save results under the 'model_results' folder
# (separate filename so the BLIP-2 results are not overwritten; prompt 3 is the text-only prompt)
with open("./model_results/mistral-7b_prompt_3.json", "w") as file:
    json.dump(puzzles, file, indent=3)
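
Because `batch_decode` is called on the full generated sequence without `skip_special_tokens`, the stored `output` contains the prompt as well as the answer. Below is a hedged sketch of recovering only the reply. It assumes the decoded text follows the Mistral instruct format, in which the user turn ends with `[/INST]` and generation ends with `</s>`. `extract_reply` and the sample string are illustrative and not part of COLUMBUS.

```python
def extract_reply(decoded):
    """Return the text after the last [/INST] marker, without the end-of-sequence token."""
    reply = decoded.rsplit("[/INST]", 1)[-1]
    return reply.replace("</s>", "").strip()


# Made-up decoded sequence in the assumed format.
sample = "<s> [INST] Which option is correct? [/INST] (B) small talk</s>"
print(extract_reply(sample))
```

If the marker is absent, the function returns the stripped input unchanged, so it is safe to apply to outputs that were already cleaned.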