# Evidence-Based Policy Scoring with Open LLMs (Zero-Shot)
This notebook scores policy documents with an open-source LLM and a rubric-based, zero-shot prompt. The steps are:
* Load Mistral-7B-Instruct via Hugging Face
* Accept .txt policy files in a folder
* Prompt the model using the evidence-based policy (EBP) rubric
* Parse the response and write a .csv with a score and a justification for each rubric dimension

In [None]:
!pip install -q transformers accelerate bitsandbytes sentencepiece
!pip install -q unstructured


In [None]:
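from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"

# 4-bit quantization via bitsandbytes reduces the GPU memory needed for the 7B model.
# Passing load_in_4bit=True directly to from_pretrained is deprecated in recent
# transformers releases; the supported route is a BitsAndBytesConfig object.
quant_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", quantization_config=quant_config)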

llm = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=1024)


In [None]:
rubric_prompt = """You are a policy analyst evaluating how evidence-based a policy document is.
Use the rubric below to assess the document on a 0–3 scale for each dimension.
Provide both a score and a short justification for each.

### Rubric
1. Use of Empirical Research
- 0: No references to empirical evidence or data
- 1: Vague or anecdotal references (e.g., “studies show”)
- 2: Clear empirical support, but limited sourcing
- 3: Multiple, clearly cited, high-quality sources (e.g., peer-reviewed, systematic reviews)

2. Formal Evidence-Gathering Process
- 0: No structured data gathering
- 1: Informal or anecdotal input
- 2: Basic assessments (e.g., internal reports, cost estimates)
- 3: Formal tools (e.g., RCTs, modeling, pilot programs)

3. Transparency and Accessibility
- 0: No documentation or rationale
- 1: Minimal or internal-only documentation
- 2: Public access with basic explanation
- 3: Fully open access, replicable, with detailed methods

4. Expert and Stakeholder Input
- 0: No input from external experts or stakeholders
- 1: Informal or internal-only consultation
- 2: Formal expert or stakeholder involvement
- 3: Broad, interdisciplinary consultation, including marginalized groups

5. Evaluation and Iteration
- 0: No evaluation mechanism
- 1: Evaluation mentioned but vague
- 2: Evaluation planned or metrics included
- 3: Evaluation built-in with feedback loops

### Document:
{document_text}

### Task:
Provide a JSON-formatted output like this:
{
  "Use of Empirical Research": {"score": 2, "justification": "..."},
  "Formal Evidence-Gathering Process": {"score": 1, "justification": "..."},
  "Transparency and Accessibility": {"score": 3, "justification": "..."},
  "Expert and Stakeholder Input": {"score": 1, "justification": "..."},
  "Evaluation and Iteration": {"score": 2, "justification": "..."}
}
"""
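
The rubric fixes five dimension names and a 0–3 score range, so a parsed response can be checked against it before it is written to the CSV. The sketch below is one way to do that; `RUBRIC_DIMENSIONS` and `validate_scores` are hypothetical names introduced here for illustration and are not part of the original notebook.

```python
# Dimension names copied from the rubric prompt above.
RUBRIC_DIMENSIONS = [
    "Use of Empirical Research",
    "Formal Evidence-Gathering Process",
    "Transparency and Accessibility",
    "Expert and Stakeholder Input",
    "Evaluation and Iteration",
]

def validate_scores(parsed):
    """Return a list of problems found in a parsed model response (empty if none)."""
    problems = []
    for dim in RUBRIC_DIMENSIONS:
        entry = parsed.get(dim)
        if not isinstance(entry, dict):
            problems.append(f"missing or malformed entry: {dim}")
            continue
        score = entry.get("score")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 3:
            problems.append(f"score out of range for {dim}: {score!r}")
        if not isinstance(entry.get("justification"), str):
            problems.append(f"missing justification for {dim}")
    return problems
```

A document whose response yields a non-empty list could be logged and skipped, or re-prompted, rather than silently producing blank cells in the output.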


In [None]:
import os

folder_path = "/content/policies"
os.makedirs(folder_path, exist_ok=True)

print(f"📂 Upload your .txt policy documents into: {folder_path}")
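
The scoring cell truncates each document to its first 4000 characters, which can cut a sentence in half. A minimal sketch of a gentler truncation follows, assuming it is acceptable to drop a little more text in exchange for ending on a paragraph or sentence break. `truncate_at_boundary` is a hypothetical helper, not part of the original notebook.

```python
def truncate_at_boundary(text, max_chars=4000):
    """Cut text to at most max_chars, preferring a paragraph or sentence break."""
    if len(text) <= max_chars:
        return text
    window = text[:max_chars]
    # Prefer the last paragraph break, then the last sentence end.
    for sep in ("\n\n", ". "):
        cut = window.rfind(sep)
        # Only accept a break in the second half, so most of the window is kept.
        if cut > max_chars // 2:
            return window[:cut + len(sep)].rstrip()
    # No suitable break found: fall back to a hard cut.
    return window
```

In the scoring loop, `doc_text[:4000]` could then be swapped for `truncate_at_boundary(doc_text)`. The limit is still counted in characters, not tokens, so it remains only a rough guard on prompt length.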


In [None]:
import json
import pandas as pd

results = []

for filename in sorted(os.listdir(folder_path)):  # sorted for a reproducible row order
    if not filename.endswith(".txt"):
        continue
    with open(os.path.join(folder_path, filename), "r", encoding="utf-8") as f:
        doc_text = f.read()

    full_prompt = rubric_prompt.replace("{document_text}", doc_text[:4000])  # truncate if needed
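    # By default the text-generation pipeline returns the prompt followed by the completion.
    # The prompt itself contains an example JSON object, which would confuse the brace
    # search below, so ask for the completion only.
    response = llm(full_prompt, return_full_text=False)[0]["generated_text"]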

    try:
        json_start = response.find("{")
        json_end = response.rfind("}") + 1
        parsed = json.loads(response[json_start:json_end])
    except Exception as e:
        print(f"❌ Failed to parse output for {filename}: {e}")
        continue

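    # One row per document: a score column and a justification column per rubric dimension.
    flat = {"filename": filename}
    for k, v in parsed.items():
        if not isinstance(v, dict):  # skip entries that do not follow the requested schema
            continue
        flat[f"{k} Score"] = v.get("score")
        flat[f"{k} Justification"] = v.get("justification")
    results.append(flat)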

df = pd.DataFrame(results)
df.to_csv("/content/evidence_scores.csv", index=False)
print("✅ Scoring complete! Download your CSV from /content/evidence_scores.csv")
df.head()
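
The parsing step above takes everything between the first `{` and the last `}`, which fails when the model adds commentary containing braces after the JSON. As a possible alternative, the sketch below uses the standard library's `json.JSONDecoder.raw_decode` to decode the first complete JSON object and ignore whatever follows. `extract_first_json_object` is a hypothetical helper name, not part of the original notebook.

```python
import json

def extract_first_json_object(text):
    """Return the first decodable JSON object (dict) in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not valid JSON from this brace: try the next one.
        start = text.find("{", start + 1)
    return None
```

Inside the loop, `parsed = extract_first_json_object(response)` followed by a `None` check could stand in for the `find`/`rfind` slice and the `try`/`except` around `json.loads`.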
