# LLM Simplification Exploration

This notebook tests the basic connection to Groq and runs a simple text simplification task.

In [22]:
# 1. Setup and Imports
import os
from dotenv import load_dotenv
from groq import Groq
from datasets import load_dataset
import pandas as pd

# Load environment variables from .env file
load_dotenv()

api_key = os.getenv("GROQ_API_KEY")

if not api_key:
    print("‚ùå ERROR: GROQ_API_KEY not found in .env")
else:
    print("‚úÖ GROQ_API_KEY and libraries found.")
    client = Groq(api_key=api_key)

‚úÖ GROQ_API_KEY and libraries found.


In [23]:
# 2. Define Evaluation Function
def evaluate_text(original, simplified):
    if not simplified: return "No text."
    
    try:
        # improved evaluation prompt
        prompt = f"""
        Compare the Original and Simplified texts.
        Rate the Simplified text on 1-10 for:
        1. Meaning Preservation (Did it lose info?)
        2. Simplicity (Is the language easier?)
        3. Structural Change (Did it change from paragraph to bullets?)

        Original: "{original[:500]}..."
        Simplified: "{simplified[:500]}..."
        
        Output format: 
        Meaning: [Score]/10
        Simplicity: [Score]/10
        Structure: [Comment]
        Analysis: [1 sentence reason]
        """
        
        completion = client.chat.completions.create(
            model="llama-3.1-8b-instant",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.1
        )
        return completion.choices[0].message.content
        
    except Exception as e:
        return f"Error evaluating: {e}"
print("Evaluation function defined.")

Evaluation function defined.


In [24]:
# 3. Define Prompt Strategies
# This is where we TEST different ways of asking the model to simplify.
prompts = {
    "Generic": "Simplify this text for a general audience.",
    "ELI5": "Explain this like I am 5 years old. Use short sentences and simple words.",
    "Lexical Only": "Keep the exact same sentence structure, but replace complex words with simpler synonyms. Do NOT use bullet points.",
    "Non-Native": "Rewrite this for someone learning English (Level A2). Avoid idioms and complex grammar."
}

print("Prompts defined. Ready to test.")

Prompts defined. Ready to test.


In [25]:
# 4. Load Data
try:
    print("‚è≥ Loading dataset slice (wiki_lingua)...")
    dataset = load_dataset("wiki_lingua", "english", split="train[:1]", trust_remote_code=True)
    example_text = dataset[0]['article']['document'][0]
    print(f"‚úÖ Loaded 1 example text (length {len(example_text)}).")
except:
    print("‚ö†Ô∏è Failed to load dataset, using fallback.")
    with open('../data/samples/sample_en.txt', 'r') as f: example_text = f.read()

`trust_remote_code` is not supported anymore.
Please check that the Hugging Face dataset 'wiki_lingua' isn't based on a loading script and remove `trust_remote_code`.
If the dataset is based on a loading script, please ask the dataset author to remove it and convert it to a standard format like Parquet.


‚è≥ Loading dataset slice (wiki_lingua)...
‚úÖ Loaded 1 example text (length 2515).


In [26]:
# 5. The Experiment Loop
print(f"üìù ORIGINAL TEXT:\n{example_text[:300]}...\n{'='*50}\n")

for name, system_prompt in prompts.items():
    print(f"\nüß™ TESTING STRATEGY: {name.upper()}")
    print(f"Command: \"{system_prompt}\"")
    
    try:
        # Generate
        completion = client.chat.completions.create(
            model="llama-3.1-8b-instant",
            messages=[
                {"role": "system", "content": system_prompt}, 
                {"role": "user", "content": example_text}
            ],
            temperature=0.3
        )
        result = completion.choices[0].message.content
        
        # Show Output
        print(f"\n‚ú® RESULT:\n{result[:300]}...\n")
        
        # Automatically Evaluate
        print(f"üìä EVALUATION:")
        score = evaluate_text(example_text, result)
        print(score)
        print("-"*50)
        
    except Exception as e:
        print(f"Error: {e}")

üìù ORIGINAL TEXT:
make sure that the area is a safe place, especially if you plan on walking home at night.  It‚Äôs always a good idea to practice the buddy system.  Have a friend meet up and walk with you. Research the bus, train, or streetcar routes available in your area to find safe and affordable travel to your de...


üß™ TESTING STRATEGY: GENERIC
Command: "Simplify this text for a general audience."

‚ú® RESULT:
When going out, especially at night, make sure you're in a safe area and consider walking with a friend. Here are some tips to help you get home safely:

1. **Plan ahead**: Research public transportation options, like buses, trains, or streetcars, and check the schedules for your outgoing and return...

üìä EVALUATION:
Meaning: 8/10
The simplified text lost some information about the specific details of public transportation, but it still conveys the main idea of staying safe while traveling.

Simplicity: 9/10
The language in the simplified text is easier to understa