# **Prompt gen**


**Documentation**

This project generates structured, creative image generation prompts for a vision-language model like Show-O using Qwen 1.5-0.5B-Chat as the backbone language model. The process is designed to create prompts that vary by difficulty (Easy, Medium, Hard) while being grounded in real-world object co-occurrence data from large-scale vision-language datasets.

The input is a list of target objects (from interesting_objects_v2.csv) that we want to generate prompts for. For each object, the pipeline generates three types of prompts:

Easy Prompt: A simple, short prompt that includes only the main object and uses light, natural phrasing suitable for image generation. No extra objects are added.

Medium Prompt: A longer, more descriptive prompt that includes the main object along with up to three additional objects that frequently co-occur with it in vision-language datasets. These co-occurring objects are retrieved from two JSON files (object_cooccurences_LLaVA-mix665k.json and object_cooccurences_LLaVA-Pretrain.json), each of which maps an object to the frequencies of the objects it co-occurs with; the frequencies from both files are summed. The additional objects are sampled at random from the most frequent co-occurring entries (the top 5% with the code's default top_percent=5) to maintain diversity.

Hard Prompt: The hardest level of prompt includes the main object and up to three rarely co-occurring objects, sampled at random from the least frequent entries (the bottom 10% with the code's default bottom_percent=10). This challenges the model to generalize beyond common pairings.
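
The selection step described for the Medium and Hard prompts can be sketched as follows. This is a minimal illustration on toy data, not the notebook's own function: the helper names `merge_frequencies` and `pick_cooccurring`, the toy object names, and the 10% cutoff used in the demo are all assumptions made for the example.

```python
import random
from collections import Counter


def merge_frequencies(*freq_maps):
    """Sum co-occurrence counts for the same object across several mappings."""
    merged = Counter()
    for freq_map in freq_maps:
        merged.update(freq_map)  # Counter.update adds counts instead of replacing them
    return dict(merged)


def pick_cooccurring(freqs, mode="high", percent=5, k=3, rng=random):
    """Pick up to k (object, count) pairs at random from the top or bottom slice."""
    # Sort once, most frequent first, so both slices below are unambiguous.
    ranked = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return []
    cutoff = max(1, int(len(ranked) * percent / 100))
    pool = ranked[:cutoff] if mode == "high" else ranked[-cutoff:]
    return rng.sample(pool, min(k, len(pool)))


# Toy data (hypothetical): 40 objects with counts 40, 39, ..., 1.
toy_a = {f"obj{i}": 40 - i for i in range(40)}
toy_b = {"obj0": 10, "obj39": 0}
merged = merge_frequencies(toy_a, toy_b)

common = pick_cooccurring(merged, mode="high", percent=10, rng=random.Random(0))
rare = pick_cooccurring(merged, mode="low", percent=10, rng=random.Random(0))
print(common)
print(rare)
```

Sorting in a single, fixed direction keeps the meaning of "top" and "bottom" independent of the mode, and passing a seeded `random.Random` makes a run repeatable when that matters.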

The language model used is Qwen1.5-0.5B-Chat, a small but capable instruction-following model. It is prompted to produce a single, complete sentence that starts with a phrase like "Create an image of..." or "Visualize..." and explicitly asks a generative model to produce an image. Each prompt is meant to be descriptive, fluent, and grounded in the specified objects.
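
Because a small model does not always follow these constraints, it can help to check each generated sentence before saving it. The sketch below is one possible check, not part of the original pipeline: `check_prompt` and the `ALLOWED_OPENERS` tuple are hypothetical, and the list of openers is an assumption based on the phrases mentioned above.

```python
# Assumed list of acceptable opening phrases (lowercase); adjust as needed.
ALLOWED_OPENERS = (
    "create an image of",
    "generate a picture showing",
    "visualize",
    "imagine",
    "depict",
)


def check_prompt(prompt, main_object, extra_objects=()):
    """Return a list of problems found; an empty list means the prompt passed."""
    problems = []
    text = prompt.strip()
    lowered = text.lower()
    # str.startswith accepts a tuple and matches any of its entries.
    if not lowered.startswith(ALLOWED_OPENERS):
        problems.append("does not start with an expected opener")
    if "\n" in text:
        problems.append("spans more than one line")
    if main_object.lower() not in lowered:
        problems.append(f"missing main object: {main_object}")
    for obj in extra_objects:
        if obj.lower() not in lowered:
            problems.append(f"missing co-occurring object: {obj}")
    return problems


good = check_prompt("Create an image of a red bicycle leaning on a fence", "bicycle", ["fence"])
bad = check_prompt("A bicycle", "bicycle", ["helmet"])
print(good)
print(bad)
```

A check like this uses plain substring matching, so it will miss plurals or paraphrases of an object name; it is meant as a cheap first filter rather than a guarantee.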

The entire pipeline is configurable to generate prompts for all objects or only the first N objects in the list (e.g., the first 10, via NUM_OBJECTS), and uses a progress bar for user feedback. For each generated prompt, the associated co-occurrence objects and their frequencies are logged for traceability. The final output is saved as generated_prompts_with_cooccurrence.csv, containing the original object, the generated prompts for each difficulty level, and the high/low co-occurrence metadata.
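
The co-occurrence metadata is stored in the CSV as a single string per cell, in the form "object:frequency, object:frequency". A minimal sketch for turning such a cell back into a dictionary is shown below; `parse_freq_string` is a hypothetical helper, and it assumes integer counts and object names that do not contain ", ".

```python
def parse_freq_string(freq_string):
    """Turn 'table:120, chair:95' into {'table': 120, 'chair': 95}."""
    result = {}
    # Empty cells come back from pandas as NaN (a float), so guard against non-strings.
    if not isinstance(freq_string, str) or not freq_string.strip():
        return result
    for entry in freq_string.split(", "):
        # Split on the last colon so names that contain a colon stay intact.
        name, _, count = entry.rpartition(":")
        result[name] = int(count)
    return result


print(parse_freq_string("table:120, chair:95"))
print(parse_freq_string(""))
```

With pandas, this could be applied to a whole column, for example `df["obj_high"].apply(parse_freq_string)`.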

This setup is intended to provide prompt diversity and structure together with real-world grounding and control over task difficulty, which makes the prompts suitable for training or evaluating multimodal models like Show-O.

In [None]:
!pip install transformers accelerate -q
!pip install --upgrade einops -q


In [None]:
import os
import json
import random
import pandas as pd
from tqdm import tqdm

In [None]:
# CONFIGURATION
NUM_OBJECTS = 10  # process only the first N objects while testing; set to None for all

BASE_PATH = "/content/drive/MyDrive/Grad/CAP6412-0001/Project"

# input/output filenames
CSV_FILENAME = f"{BASE_PATH}/interesting_objects_v2.csv"
MIX_JSON = f"{BASE_PATH}/object_cooccurences_LLaVA-mix665k.json"
PRETRAIN_JSON = f"{BASE_PATH}/object_cooccurences_LLaVA-Pretrain.json"
OUTPUT_FILENAME = f"{BASE_PATH}/generated_prompts_with_cooccurrence.csv"


In [None]:
!pip install -q transformers accelerate tqdm

import os, json, random
import torch
import pandas as pd
from tqdm import tqdm
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load Qwen 0.5B model
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-0.5B-Chat", trust_remote_code=True).to("cuda" if torch.cuda.is_available() else "cpu")
model.eval()


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.


Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(151936, 1024)
    (layers): ModuleList(
      (0-23): 24 x Qwen2DecoderLayer(
        (self_attn): Qwen2Attention(
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (o_proj): Linear(in_features=1024, out_features=1024, bias=False)
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=1024, out_features=2816, bias=False)
          (up_proj): Linear(in_features=1024, out_features=2816, bias=False)
          (down_proj): Linear(in_features=2816, out_features=1024, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm((1024,), eps=1e-06)
        (post_attention_layernorm): Qwen2RMSNorm((1024,), eps=1e-06)
      )
    )
    (norm): Qwen2RMSNorm((1024,), eps=1e-06)
    (rotary_emb): 

In [None]:
def get_cooccurring_objects(main_object, mode="high", top_percent=5, bottom_percent=10) -> tuple[list[str], str]:
    file_paths = [MIX_JSON, PRETRAIN_JSON]
    combined_freq = {}

    for path in file_paths:
        try:
            with open(path, "r") as f:
                data = json.load(f)
                if main_object in data:
                    for obj, freq in data[main_object].items():
                        combined_freq[obj] = combined_freq.get(obj, 0) + freq
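        except (OSError, json.JSONDecodeError) as err:
            # Missing or malformed file: report it and continue with the other source
            print(f"Could not load or find {path}: {err}")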

    if not combined_freq:
        return [], ""

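    # Always sort by descending frequency so that [:cutoff] is the most frequent
    # slice and [-cutoff:] is the least frequent one, regardless of mode.
    sorted_items = sorted(combined_freq.items(), key=lambda x: x[1], reverse=True)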
    count = len(sorted_items)

    if mode == "high":
        cutoff = max(1, int(count * (top_percent / 100)))
        candidates = sorted_items[:cutoff]
    else:
        cutoff = max(1, int(count * (bottom_percent / 100)))
        candidates = sorted_items[-cutoff:]

    selected = random.sample(candidates, min(3, len(candidates)))
    selected_objs = [obj for obj, _ in selected]
    freq_string = ", ".join([f"{obj}:{freq}" for obj, freq in selected])

    return selected_objs, freq_string


In [None]:
def generate_prompt_with_qwen(object_name: str, level: str, additional_objects=None) -> str:
    system = "You are a creative prompt engineer for a unified vision-language generation model like Show-O."

    level = level.lower()
    token_limit = {
        "easy": 20,
        "medium": 35,
        "hard": 50
    }.get(level, 35)

    # Instruction prompt
    instruction = f"""Create a {level} image generation prompt that explicitly includes the object: {object_name}"""
    if additional_objects:
        instruction += f" along with: {', '.join(additional_objects)}"
    instruction += """.
- Begin with a phrase like "Create an image of" or "Generate a picture showing"
- Use natural, diverse phrasing and a creative verb (e.g., imagine, visualize, depict, dream of)
- Make the prompt more descriptive as level increases (easy < medium < hard)
- It must be a single, complete sentence with no extra prefixes or trailing punctuation
- Do not include quotes or the words 'Prompt:' or 'Output:'
- Ensure the prompt is suitable for a model like Show-O and includes the object
- Output only the prompt, nothing else"""

    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": instruction}
    ]

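    # add_generation_prompt=True appends the assistant turn marker so the model
    # writes a reply instead of continuing the user message.
    input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)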
    attention_mask = torch.ones_like(input_ids)

    with torch.no_grad():
        output_ids = model.generate(
            input_ids,
            attention_mask=attention_mask,
            max_new_tokens=token_limit,
            do_sample=True,
            temperature=0.9,
            top_p=0.95,
        )

    # Decode only the newly generated tokens; slicing the decoded string by the
    # length of the decoded prompt is fragile if decoding does not round-trip exactly.
    new_tokens = output_ids[0][input_ids.shape[-1]:]
    raw_response = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
    for line in raw_response.split("\n"):
        line = line.strip(' "\'.:')
        if line and object_name.lower() in line.lower():
            return line
    return raw_response.strip(' "\'.:')


In [None]:
# Load object list
df = pd.read_csv(CSV_FILENAME)
object_names = df["Object"].dropna().astype(str).tolist()

# Select top N or all
if NUM_OBJECTS is not None:
    object_names = object_names[:NUM_OBJECTS]

output_data = []

for obj in tqdm(object_names, desc="Generating prompts"):
    easy_prompt = generate_prompt_with_qwen(obj, "easy")

    high_objs, high_freqs = get_cooccurring_objects(obj, mode="high")
    medium_prompt = generate_prompt_with_qwen(obj, "medium", high_objs)

    low_objs, low_freqs = get_cooccurring_objects(obj, mode="low")
    hard_prompt = generate_prompt_with_qwen(obj, "hard", low_objs)

    output_data.append({
        "object": obj,
        "easy_prompt": easy_prompt,
        "medium_prompt": medium_prompt,
        "hard_prompt": hard_prompt,
        "obj_high": high_freqs,
        "obj_low": low_freqs
    })

# Save to CSV
df_output = pd.DataFrame(output_data)
df_output.to_csv(OUTPUT_FILENAME, index=False)
print(f"✅ Saved: {OUTPUT_FILENAME}")


Generating prompts: 100%|██████████| 10/10 [03:39<00:00, 21.98s/it]

✅ Saved: /content/drive/MyDrive/Grad/CAP6412-0001/Project/generated_prompts_with_cooccurrence.csv





# 159 Objects refined prompt gen

In [None]:
import os
import json
import random
import torch
import pandas as pd
from tqdm import tqdm
from pathlib import Path
from transformers import AutoTokenizer, AutoModelForCausalLM

# ==== CONFIGURATION ====
NUM_OBJECTS = 159  # number of objects to process; set to None for all
BASE_PATH = "/content/drive/MyDrive/Grad/CAP6412-0001/Project"
CSV_FILENAME = f"{BASE_PATH}/interesting_objects_v3.csv"
CO_OCCUR_JSON = f"{BASE_PATH}/final_merged_sorted_cooccur.json"
OUTPUT_FILENAME = f"{BASE_PATH}/generated_prompts_from_final_cooccur.csv"

# Loading Qwen 0.5B model
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-0.5B-Chat", trust_remote_code=True).to("cuda" if torch.cuda.is_available() else "cpu")
model.eval()

with open(CO_OCCUR_JSON, "r") as f:
    cooccur_data = json.load(f)

def get_cooccurring_objects(main_object: str, mode: str = "high", top_percent=5, bottom_percent=10) -> tuple[list[str], str]:
    main_object = main_object.lower()
    if main_object not in cooccur_data:
        return [], ""

    related = list(cooccur_data[main_object].items())
    count = len(related)

    if count == 0:
        return [], ""

    if mode == "high":
        cutoff = max(1, int(count * (top_percent / 100)))
        candidates = related[:cutoff]
    else:
        cutoff = max(1, int(count * (bottom_percent / 100)))
        candidates = related[-cutoff:]

    selected = random.sample(candidates, min(3, len(candidates)))
    selected_objs = [obj for obj, _ in selected]
    freq_string = ", ".join([f"{obj}:{freq}" for obj, freq in selected])
    return selected_objs, freq_string

# Prompt generator
def generate_prompt_with_qwen(object_name: str, level: str, additional_objects=None) -> str:
    system = "You are a creative prompt engineer for a unified vision-language generation model like Show-O."

    level = level.lower()
    token_limit = {
        "easy": 20,
        "medium": 35,
        "hard": 50
    }.get(level, 35)

    instruction = f"""Create a {level} image generation prompt that focuses primarily on the object: {object_name}"""
    if additional_objects:
        instruction += f" while subtly including: {', '.join(additional_objects)}"
    instruction += """.
- Begin with a phrase like "Create an image of" or "Generate a picture showing"
- Use natural, diverse phrasing and a creative verb (e.g., imagine, visualize, depict, dream of)
- The main object must be the primary focus of the sentence
- It must be a single, complete sentence with no extra prefixes or trailing punctuation
- Do not include quotes or the words 'Prompt:' or 'Output:'
- Ensure the prompt is suitable for a model like Show-O and includes the object
- Output only the prompt, nothing else"""

    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": instruction}
    ]

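    # add_generation_prompt=True appends the assistant turn marker so the model
    # writes a reply instead of continuing the user message.
    input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)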
    attention_mask = torch.ones_like(input_ids)

    with torch.no_grad():
        output_ids = model.generate(
            input_ids,
            attention_mask=attention_mask,
            max_new_tokens=token_limit,
            do_sample=True,
            temperature=0.9,
            top_p=0.95,
        )

    # Decode only the newly generated tokens; slicing the decoded string by the
    # length of the decoded prompt is fragile if decoding does not round-trip exactly.
    new_tokens = output_ids[0][input_ids.shape[-1]:]
    raw_response = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

    for line in raw_response.split("\n"):
        line = line.strip(' "\'.:')
        if line and object_name.lower() in line.lower():
            return line
    return raw_response.strip(' "\'.:')

# Loading objects
df = pd.read_csv(CSV_FILENAME)
object_names = df["Object"].dropna().astype(str).tolist()
if NUM_OBJECTS is not None:
    object_names = object_names[:NUM_OBJECTS]

# Generating prompts
output_data = []
for obj in tqdm(object_names, desc="Generating prompts"):
    easy_prompt = generate_prompt_with_qwen(obj, "easy")
    high_objs, high_freqs = get_cooccurring_objects(obj, mode="high")
    medium_prompt = generate_prompt_with_qwen(obj, "medium", high_objs)
    low_objs, low_freqs = get_cooccurring_objects(obj, mode="low")
    hard_prompt = generate_prompt_with_qwen(obj, "hard", low_objs)

    output_data.append({
        "object": obj,
        "easy_prompt": easy_prompt,
        "medium_prompt": medium_prompt,
        "hard_prompt": hard_prompt,
        "obj_high": high_freqs,
        "obj_low": low_freqs
    })

# Save to CSV
df_output = pd.DataFrame(output_data)
df_output.to_csv(OUTPUT_FILENAME, index=False)
print(f"Saved to: {OUTPUT_FILENAME}")



Generating prompts: 100%|██████████| 159/159 [05:35<00:00,  2.11s/it]

Saved to: /content/drive/MyDrive/Grad/CAP6412-0001/Project/generated_prompts_from_final_cooccur.csv





## 7B Model Qwen run

In [None]:
import os
import json
import random
import torch
import pandas as pd
from tqdm import tqdm
from pathlib import Path
from transformers import AutoTokenizer, AutoModelForCausalLM

# ==== CONFIGURATION ====
NUM_OBJECTS = 159  # number of objects to process; set to None for all
BASE_PATH = "/content/drive/MyDrive/Grad/CAP6412-0001/Project"
CSV_FILENAME = f"{BASE_PATH}/interesting_objects_v3.csv"
CO_OCCUR_JSON = f"{BASE_PATH}/final_merged_sorted_cooccur.json"
OUTPUT_FILENAME = f"{BASE_PATH}/generated_prompts_7b_from_final_cooccur.csv"

# Load Qwen 7B model
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-7B-Chat", trust_remote_code=True).to("cuda" if torch.cuda.is_available() else "cpu")
model.eval()

# Load co-occurrence data
with open(CO_OCCUR_JSON, "r") as f:
    cooccur_data = json.load(f)

# Co-object selection function
def get_cooccurring_objects(main_object: str, mode: str = "high", top_percent=5, bottom_percent=10) -> tuple[list[str], str]:
    main_object = main_object.lower()
    if main_object not in cooccur_data:
        return [], ""

    related = list(cooccur_data[main_object].items())
    count = len(related)

    if count == 0:
        return [], ""

    if mode == "high":
        cutoff = max(1, int(count * (top_percent / 100)))
        candidates = related[:cutoff]
    else:
        cutoff = max(1, int(count * (bottom_percent / 100)))
        candidates = related[-cutoff:]

    selected = random.sample(candidates, min(3, len(candidates)))
    selected_objs = [obj for obj, _ in selected]
    freq_string = ", ".join([f"{obj}:{freq}" for obj, freq in selected])
    return selected_objs, freq_string

# Prompt generator
def generate_prompt_with_qwen(object_name: str, level: str, additional_objects=None) -> str:
    system = "You are a creative prompt engineer for a unified vision-language generation model like Show-O."

    level = level.lower()
    token_limit = {
        "easy": 20,
        "medium": 35,
        "hard": 50
    }.get(level, 35)

    instruction = f"""Create a {level} image generation prompt that focuses primarily on the object: {object_name}"""
    if additional_objects:
        instruction += f" while subtly including: {', '.join(additional_objects)}"
    instruction += """.
- Begin with a phrase like "Create an image of" or "Generate a picture showing"
- Use natural, diverse phrasing and a creative verb (e.g., imagine, visualize, depict, dream of)
- The main object must be the primary focus of the sentence
- It must be a single, complete sentence with no extra prefixes or trailing punctuation
- Do not include quotes or the words 'Prompt:' or 'Output:'
- Ensure the prompt is suitable for a model like Show-O and includes the object
- Output only the prompt, nothing else"""

    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": instruction}
    ]

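    # add_generation_prompt=True appends the assistant turn marker so the model
    # writes a reply instead of continuing the user message.
    input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)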
    attention_mask = torch.ones_like(input_ids)

    with torch.no_grad():
        output_ids = model.generate(
            input_ids,
            attention_mask=attention_mask,
            max_new_tokens=token_limit,
            do_sample=True,
            temperature=0.9,
            top_p=0.95,
        )

    # Decode only the newly generated tokens; slicing the decoded string by the
    # length of the decoded prompt is fragile if decoding does not round-trip exactly.
    new_tokens = output_ids[0][input_ids.shape[-1]:]
    raw_response = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

    for line in raw_response.split("\n"):
        line = line.strip(' "\'.:')
        if line and object_name.lower() in line.lower():
            return line
    return raw_response.strip(' "\'.:')

# Loading objects
df = pd.read_csv(CSV_FILENAME)
object_names = df["Object"].dropna().astype(str).tolist()
if NUM_OBJECTS is not None:
    object_names = object_names[:NUM_OBJECTS]

# Generate prompts
output_data = []
for obj in tqdm(object_names, desc="Generating prompts"):
    easy_prompt = generate_prompt_with_qwen(obj, "easy")
    high_objs, high_freqs = get_cooccurring_objects(obj, mode="high")
    medium_prompt = generate_prompt_with_qwen(obj, "medium", high_objs)
    low_objs, low_freqs = get_cooccurring_objects(obj, mode="low")
    hard_prompt = generate_prompt_with_qwen(obj, "hard", low_objs)

    output_data.append({
        "object": obj,
        "easy_prompt": easy_prompt,
        "medium_prompt": medium_prompt,
        "hard_prompt": hard_prompt,
        "obj_high": high_freqs,
        "obj_low": low_freqs
    })

# Save to CSV
df_output = pd.DataFrame(output_data)
df_output.to_csv(OUTPUT_FILENAME, index=False)
print(f"Saved to: {OUTPUT_FILENAME}")



Generating prompts: 100%|██████████| 159/159 [10:12<00:00,  3.85s/it]

Saved to: /content/drive/MyDrive/Grad/CAP6412-0001/Project/generated_prompts_7b_from_final_cooccur.csv





# **Minor improvements**

In [None]:
import os
import json
import random
import torch
import pandas as pd
from tqdm import tqdm
from pathlib import Path
from transformers import AutoTokenizer, AutoModelForCausalLM

# ==== CONFIGURATION ====
NUM_OBJECTS = 159  # number of objects to process; set to None for all
BASE_PATH = "/content/drive/MyDrive/Grad/CAP6412-0001/Project"
CSV_FILENAME = f"{BASE_PATH}/interesting_objects_v3.csv"
CO_OCCUR_JSON = f"{BASE_PATH}/final_merged_sorted_cooccur.json"
OUTPUT_FILENAME = f"{BASE_PATH}/generated_prompts_7b_from_final_cooccur.csv"  # same path as the previous 7B run, so that CSV is overwritten

# Load Qwen 7B model
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-7B-Chat", trust_remote_code=True).to("cuda" if torch.cuda.is_available() else "cpu")
model.eval()

# Load co-occurrence data
with open(CO_OCCUR_JSON, "r") as f:
    cooccur_data = json.load(f)

# Co-object selection function
def get_cooccurring_objects(main_object: str, mode: str = "high", top_percent=5, bottom_percent=10) -> tuple[list[str], str]:
    main_object = main_object.lower()
    if main_object not in cooccur_data:
        return [], ""

    related = list(cooccur_data[main_object].items())
    count = len(related)

    if count == 0:
        return [], ""

    if mode == "high":
        cutoff = max(1, int(count * (top_percent / 100)))
        candidates = related[:cutoff]
    else:
        cutoff = max(1, int(count * (bottom_percent / 100)))
        candidates = related[-cutoff:]

    selected = random.sample(candidates, min(3, len(candidates)))
    selected_objs = [obj for obj, _ in selected]
    freq_string = ", ".join([f"{obj}:{freq}" for obj, freq in selected])
    return selected_objs, freq_string

# Prompt generator
def generate_prompt_with_qwen(object_name: str, level: str, additional_objects=None) -> str:
    system = (
        "You are a creative prompt engineer for a unified vision-language generation model like Show-O.\n"
        "The output must be a single, fluent sentence in English.\n"
        "Do not include the words 'Prompt:', 'Output:', quotes, or any prefix/label.\n"
        "The sentence must focus primarily on the object mentioned and subtly reference others if included.\n"
    )

    level = level.lower()
    token_limit = {
        "easy": 20,
        "medium": 35,
        "hard": 50
    }.get(level, 35)

    # Prompt instruction
    instruction = f"""Create a {level} image generation prompt that emphasizes the object: {object_name}"""
    if additional_objects:
        instruction += f" while subtly referencing: {', '.join(additional_objects)}"
    instruction += """.
- Begin with a phrase like "Create an image of", "Generate a picture showing", "Imagine", or "Depict"
- The sentence must highlight the main object clearly
- The sentence must not have any formatting, label, or introductory keyword
- Do not include any quotation marks or non-English words
- Output only the sentence, nothing else"""

    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": instruction}
    ]

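    # add_generation_prompt=True appends the assistant turn marker so the model
    # writes a reply instead of continuing the user message.
    input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)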
    attention_mask = torch.ones_like(input_ids)

    with torch.no_grad():
        output_ids = model.generate(
            input_ids,
            attention_mask=attention_mask,
            max_new_tokens=token_limit,
            do_sample=True,
            temperature=0.9,
            top_p=0.95,
            repetition_penalty=1.1,
        )

    # Decode only the newly generated tokens; slicing the decoded string by the
    # length of the decoded prompt is fragile if decoding does not round-trip exactly.
    new_tokens = output_ids[0][input_ids.shape[-1]:]
    raw_response = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

    for line in raw_response.split("\n"):
        line = line.strip(' "\'.:').replace("Prompt:", "").replace("Output:", "").strip()
        if line and object_name.lower() in line.lower() and line.isascii():
            return line

    return raw_response.strip(' "\'.:')

# Loading objects
df = pd.read_csv(CSV_FILENAME)
object_names = df["Object"].dropna().astype(str).tolist()
if NUM_OBJECTS is not None:
    object_names = object_names[:NUM_OBJECTS]

# Generating prompts
output_data = []
for obj in tqdm(object_names, desc="Generating prompts"):
    easy_prompt = generate_prompt_with_qwen(obj, "easy")
    high_objs, high_freqs = get_cooccurring_objects(obj, mode="high")
    medium_prompt = generate_prompt_with_qwen(obj, "medium", high_objs)
    low_objs, low_freqs = get_cooccurring_objects(obj, mode="low")
    hard_prompt = generate_prompt_with_qwen(obj, "hard", low_objs)

    output_data.append({
        "object": obj,
        "easy_prompt": easy_prompt,
        "medium_prompt": medium_prompt,
        "hard_prompt": hard_prompt,
        "obj_high": high_freqs,
        "obj_low": low_freqs
    })

# Save to CSV
df_output = pd.DataFrame(output_data)
df_output.to_csv(OUTPUT_FILENAME, index=False)
print(f"Saved to: {OUTPUT_FILENAME}")



Generating prompts: 100%|██████████| 159/159 [08:44<00:00,  3.30s/it]

Saved to: /content/drive/MyDrive/Grad/CAP6412-0001/Project/generated_prompts_7b_from_final_cooccur.csv



