<a href="https://colab.research.google.com/github/pawlowski-ai/PROMPT_GALLERY/blob/main/prompt_eval_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
#@title <h1>🚀 LLM Prompt Evaluation Environment</h1>
#@markdown ### Model: TinyLlama-1.1B
#@markdown ---
#@markdown **Instructions:**
#@markdown 1. Click the "▶" (Run) button to start the entire process.
#@markdown 2. If no `HF_TOKEN` secret is available in Colab Secrets, you will be asked to paste a Hugging Face token. You can get one [here](https://huggingface.co/settings/tokens).
#@markdown 3. The setup will take **1-3 minutes**.
#@markdown 4. At the end, a list of all available prompts will be printed.
#@markdown 5. You will be prompted to **copy and paste the full path** of the prompt you want to test.
#@markdown ---

# --- 1. ENVIRONMENT SETUP ---
print("▶ Step 1 of 4: Installing required libraries...")
# We install only what is strictly necessary. The quantization library (bitsandbytes) is omitted for reliability.
!pip install -q -U transformers accelerate
print("✅ Libraries installed.")

# --- 2. HUGGING FACE AUTHENTICATION ---
import getpass
from huggingface_hub import login
import os
print("\n▶ Step 2 of 4: Authenticating with Hugging Face...")
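hf_token = None
try:
    from google.colab import userdata
    hf_token = userdata.get('HF_TOKEN')
    print("🔑 Hugging Face token loaded from Colab Secrets.")
except Exception:
    # Not running in Colab, or the HF_TOKEN secret is missing or not shared with this notebook.
    # (Naming userdata.SecretNotFoundError in the except clause would raise NameError if the import itself failed.)
    hf_token = getpass.getpass('🔑 Please paste your Hugging Face Hub token and press Enter: ')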
if not hf_token:
    raise ValueError("🛑 Hugging Face token is required to proceed.")
try:
    login(token=hf_token, add_to_git_credential=True)
    print("✅ Successfully logged into the Hugging Face Hub.")
except Exception as e:
    raise ValueError(f"🛑 Failed to log in to Hugging Face: {e}")

# --- 3. REPOSITORY CLONING & PROMPT LOADING ---
print("\n▶ Step 3 of 4: Cloning prompt gallery and loading prompts...")
!git config --global user.email "colab@example.com"
!git config --global user.name "Colab User"
REPO_URL = "https://github.com/pawlowski-ai/PROMPT_GALLERY.git"
REPO_NAME = "PROMPT_GALLERY"
if os.path.exists(REPO_NAME):
    !rm -rf {REPO_NAME}
!git clone {REPO_URL}
from pathlib import Path

all_prompts = {}
root = Path(REPO_NAME)
ignore_dirs = ['.git', '.github']
for folder in root.iterdir():
    if folder.is_dir() and folder.name not in ignore_dirs:
        for file in folder.glob('*.md'):
            prompt_key = f"{folder.name}/{file.name}"
            all_prompts[prompt_key] = file.read_text(encoding='utf-8')
print(f"✅ {len(all_prompts)} prompts loaded successfully.")

# --- 4. MODEL LOADING (SIMPLIFIED & ROBUST) ---
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
print("\n▶ Step 4 of 4: Loading model (TinyLlama-1.1B)... This may take a few minutes.")
MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
try:
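    # Load the weights in bfloat16, the dtype the checkpoint is stored in. At 1.1B parameters
    # (roughly 2.2 GB of weights) the model fits on a T4 GPU without quantization.
    # Note: the T4 has no native bfloat16 hardware support, so torch.float16 may run faster there.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # Half-precision weights: about half the memory of float32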
        device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model_loaded = True
    print("✅ Model loaded successfully!")
except Exception as e:
    model_loaded = False
    print(f"🛑 An error occurred while loading the model: {e}")

# --- 5. INTERACTIVE TEXT-BASED EVALUATION ---
from IPython.display import display, HTML, clear_output

if model_loaded:
    print("\n" + "="*50)
    print("✅✅✅ SETUP COMPLETE! READY FOR EVALUATION. ✅✅✅")
    print("="*50 + "\n")

    while True:
        print("Available prompts to test:")
        for prompt_path in sorted(all_prompts.keys()):
            print(f"- {prompt_path}")

        print("\nTo exit, type 'quit' or 'exit'.")
        # Strip stray whitespace that often comes along with a pasted path.
        selected_prompt_path = input("➡️ Copy and paste the full path of the prompt you want to test: ").strip()

        if selected_prompt_path.lower() in ['quit', 'exit']:
            print("\nExiting evaluation. Goodbye!")
            break

        if selected_prompt_path in all_prompts:
            prompt_text = all_prompts[selected_prompt_path]

            clear_output(wait=True)
            print("🧠 Processing request... Please wait.")

            messages = [
                {"role": "system", "content": "You are a friendly chatbot who always gives concise answers."},
                {"role": "user", "content": prompt_text}
            ]
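            # add_generation_prompt=True appends the assistant turn marker so the model answers
            # instead of continuing the user message; return_dict=True also yields the attention mask.
            inputs = tokenizer.apply_chat_template(
                messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
            ).to(model.device)
            outputs = model.generate(
                **inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.95,
            )
            # Keep only the newly generated tokens (everything after the prompt).
            response = outputs[0][inputs["input_ids"].shape[-1]:]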
            decoded_response = tokenizer.decode(response, skip_special_tokens=True)

            clear_output(wait=True)
            import html  # Standard library; escape text so prompts or responses containing <tags> are shown literally
            display(HTML(f"<h3>Testing Prompt: <code>{html.escape(selected_prompt_path)}</code></h3>"))
            display(HTML(f"<h4>Original Prompt Content:</h4><pre style='background-color:#f0f0f0; padding:10px; border-radius:5px; white-space: pre-wrap;'>{html.escape(prompt_text)}</pre>"))
            display(HTML(f"<h4>Model Response (TinyLlama-1.1B):</h4><div style='background-color:#e6f3ff; padding:10px; border: 1px solid #b3d9ff; border-radius:5px; white-space: pre-wrap;'>{html.escape(decoded_response)}</div>"))
            print("\n" + "="*50)

        else:
            clear_output(wait=True)
            print(f"❌ Error: Prompt '{selected_prompt_path}' not found. Please check the path and try again.")
            print("="*50 + "\n")
else:
    print("\n🛑 Model was not loaded due to an error. Cannot start the evaluation.")


Available prompts to test:
- adversarial/fact_chain_folding.md
- adversarial/impossible_question_loop.md
- adversarial/readme.md
- adversarial/self_negation_trap.md
- few shot/cognitive_bias_oracle.md
- few shot/readme.md
- few shot/realistic_hallucination_alignment.md
- hallucinations/README.md
- hallucinations/academic_case.md
- hallucinations/pseudohistorical_anchoring.md
- jailbreaks/escalating_roleplay_example.md
- jailbreaks/readme.md
- live projects/readme.md
- structured output/csv_table_generator.md
- structured output/json_profile_card.md
- structured output/readme.md
- structured output/xml_faq_export.md
- verbosity control/100words_limit.md
- verbosity control/extreme_verbosity_comparison.md
- verbosity control/readme.md
- verbosity control/tl_dr_summarizer.md

To exit, type 'quit' or 'exit'.
➡️ Copy and paste the full path of the prompt you want to test: exit

Exiting evaluation. Goodbye!
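
The listing above shows each folder's `readme.md` next to the actual prompt files, because the loader collects every `*.md` file one level below the repository root. The sketch below wraps that same loop in a function and optionally leaves the readme files out. The function name `load_prompts` and the `skip_readme` flag are hypothetical, introduced only for illustration, and the sketch assumes that a file named `readme.md` (in any casing) is documentation rather than a prompt.

```python
from pathlib import Path


def load_prompts(root, ignore_dirs=(".git", ".github"), skip_readme=True):
    """Map '<folder>/<file>.md' to file text for Markdown files one level below root.

    Hypothetical helper: it mirrors the loading loop in the cell above and adds an
    optional readme filter. It is a sketch, not part of the original notebook.
    """
    prompts = {}
    for folder in sorted(Path(root).iterdir()):
        # Skip plain files at the top level and housekeeping directories.
        if not folder.is_dir() or folder.name in ignore_dirs:
            continue
        for file in sorted(folder.glob("*.md")):
            # Assumption: readme.md / README.md describes the folder and is not a prompt.
            if skip_readme and file.stem.lower() == "readme":
                continue
            prompts[f"{folder.name}/{file.name}"] = file.read_text(encoding="utf-8")
    return prompts
```

In the notebook this could stand in for the inline loop, for example `all_prompts = load_prompts(REPO_NAME)`. Passing `skip_readme=False` reproduces the listing shown above.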
