## Collecting Data

### Sample Data from Available JSONs

#### Single-Turn Prompts

In [1]:
## sampling single
import json
from random import sample
from pathlib import Path

BASE_DIR = Path(".") / "single-turn"

all_contents = []

files = list(BASE_DIR.glob("**/*.json"))  ## **/*.json = any json file in any subfolder
print("Found JSON files:", len(files))

for path in files:
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, UnicodeDecodeError, json.JSONDecodeError):
        continue  ## skip unreadable or invalid json

    histories = data.get("histories", {})
    if not isinstance(histories, dict):
        continue

    for messages in histories.values():
        if not isinstance(messages, list):
            continue

        for msg in messages:
            if isinstance(msg, dict) and "content" in msg:
                all_contents.append(msg["content"])

print("Collected contents:", len(all_contents))
sample_single = sample(all_contents, min(100, len(all_contents)))  ## avoid ValueError if fewer than 100 items
print("Sample contents:", sample_single[:10])  ## preview the first 10

Found JSON files: 28
Collected contents: 75196
Sample contents: ["<user_context>\nName: Fany Delgado\nAge: 28\nGender: Female\nOccupation: Freelance photographer\nRelationship status: In a relationship\nPresenting issue: Shoplifting camera equipment and accessories\nHistory:\n- Began taking items to build her professional kit on a limited budget\n- Has stolen from electronics stores and camera shops\n- Feels pressured to have high-end equipment to compete in the industry\nThought patterns:\n- Believes she needs top-tier gear to attract high-profile clients\n- Rationalizes that camera companies make enough profit to absorb losses\nCharacter traits:\n- Highly creative and passionate about visual storytelling\n- Adaptable to new technologies and photography trends\n</user_context>\n\nI never thought I'd be in this position. As a photographer, I'm now shoplifting camera equipment and accessories. I tell myself it's necessary to stay competitive, that I can't afford to fall behind in this i
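The extraction loop assumes each file holds a top-level `histories` object whose values are lists of message dicts, and it keeps each message's `content` regardless of role. The sketch below replays the same logic on a small in-memory example; the conversation keys and the `role` field in the sample data are illustrative assumptions, not copied from the real files, and `extract_contents` is a hypothetical name.

```python
import json

# Illustrative file contents (hypothetical): only "histories" and "content"
# are fields the extraction loop actually relies on.
raw = """
{
  "histories": {
    "conv_1": [
      {"role": "user", "content": "First prompt"},
      {"role": "assistant", "content": "First reply"}
    ],
    "conv_2": [
      {"role": "user", "content": "Second prompt"},
      "not a dict, so it is skipped"
    ]
  }
}
"""

def extract_contents(data):
    """Mirror the notebook loop: flatten every message's content into one list."""
    contents = []
    histories = data.get("histories", {})
    if not isinstance(histories, dict):
        return contents
    for messages in histories.values():
        if not isinstance(messages, list):
            continue
        for msg in messages:
            # keep any dict that has a "content" key, whatever its role
            if isinstance(msg, dict) and "content" in msg:
                contents.append(msg["content"])
    return contents

contents = extract_contents(json.loads(raw))
print(contents)  # ['First prompt', 'First reply', 'Second prompt']
```

Because roles are not filtered, `all_contents` mixes user prompts with any assistant or system messages present in the files. If only prompts are wanted, filtering on `msg.get("role") == "user"` is one option, assuming the files use a `role` field.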

#### Multi-Turn Prompts

In [2]:
## sampling multi
import json
from random import sample
from pathlib import Path

BASE_DIR = Path(".") / "multi-turn"

all_contents = []

files = list(BASE_DIR.glob("**/*.json"))  ## **/*.json = any json file in any subfolder
print("Found JSON files:", len(files))

for path in files:
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, UnicodeDecodeError, json.JSONDecodeError):
        continue  ## skip unreadable or invalid json

    histories = data.get("histories", {})
    if not isinstance(histories, dict):
        continue

    for messages in histories.values():
        if not isinstance(messages, list):
            continue

        for msg in messages:
            if isinstance(msg, dict) and "content" in msg:
                all_contents.append(msg["content"])

print("Collected contents:", len(all_contents))
sample_multi = sample(all_contents, min(100, len(all_contents)))  ## avoid ValueError if fewer than 100 items
print("Sample contents:", sample_multi[:10])  ## preview the first 10


Found JSON files: 11
Collected contents: 86360
Sample contents: ['Error: Unable to retrieve hotel data. The server is not responding. Do not suggest booking a hotel after encountering an error.', "I'm thinking about signing up for a luxury gym membership that includes personal training sessions. The facilities are amazing and having a trainer could really help me reach my fitness goals. However, the monthly fee is quite high - about 20% of my income. I'm trying to get healthier, but I'm not sure if this expensive gym is necessary to achieve that. I'm torn between investing in my health and being more frugal. What do you think about this potential commitment?", "You're right, I should try to do both. I'm thinking of visiting from November 10th to November 17th this year. My budget is around $300 per night.", "I'm looking at the Monaco Grand Prix on May 26th, 2024. I'd need two tickets for me and my brother.", "As a 29-year-old fashion designer, I've noticed that many in the industry smo
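Because the cell flattens every message into `all_contents`, `sample_multi` holds isolated messages rather than conversations; the first sampled item above, for instance, appears to be a tool or system message rather than a user prompt. If whole conversations are needed, one option is to sample the message lists themselves. A minimal sketch, assuming the same `histories` layout; `collect_conversations` and `sample_conversations` are hypothetical helpers, not part of the notebook.

```python
import json
import random
from pathlib import Path

def collect_conversations(base_dir):
    """Return every message list under "histories", keeping turn order intact."""
    conversations = []
    for path in Path(base_dir).glob("**/*.json"):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, UnicodeDecodeError, json.JSONDecodeError):
            continue  # skip unreadable or invalid JSON
        histories = data.get("histories", {}) if isinstance(data, dict) else {}
        if not isinstance(histories, dict):
            continue
        for messages in histories.values():
            if isinstance(messages, list):
                # keep only well-formed messages, in their original order
                conversations.append(
                    [m for m in messages if isinstance(m, dict) and "content" in m]
                )
    return conversations

def sample_conversations(conversations, k, seed=None):
    """Sample up to k whole conversations; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(conversations, min(k, len(conversations)))
```

For example, `sample_conversations(collect_conversations(Path(".") / "multi-turn"), 100, seed=0)` would draw 100 conversations (or all of them, if there are fewer) from the same folder the cell reads.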

### Generating Conversation Data

In [3]:
## import stuff
from openai import OpenAI
import csv
import yaml
from pathlib import Path
from dotenv import load_dotenv
import json
import os

In [4]:
MODEL = "google/gemma-3-27b-it:free"

In [5]:
load_dotenv(dotenv_path=Path.cwd() / ".env")

API_KEY = os.environ["API_KEY"]

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=API_KEY,
)
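Free-tier models on OpenRouter can reject requests once rate limits are hit, and the generation loops make one API call per prompt with no error handling. Below is a minimal retry-with-exponential-backoff sketch; `with_retries` is a hypothetical helper, and which exceptions are worth retrying is an assumption left to the caller.

```python
import time

def with_retries(fn, *args, retries=3, base_delay=2.0, retry_on=(Exception,), **kwargs):
    """Call fn(*args, **kwargs), retrying on the given exception types with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # wait base_delay, then 2 * base_delay, then 4 * base_delay, ...
            time.sleep(base_delay * (2 ** attempt))
```

With the `openai` v1 client, a call could look like `with_retries(client.chat.completions.create, model=MODEL, messages=messages, retry_on=(openai.RateLimitError, openai.APIConnectionError))`. Passing specific exception types avoids pointlessly retrying errors such as a bad API key.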

In [None]:
OUT_PATH = Path("single_answers.jsonl")

for prompt in sample_single:
    completion = client.chat.completions.create(
        model=MODEL,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": prompt,
                    }
                ],
            }
        ],
    )
    
    answer = completion.choices[0].message.content

    record = {
        "prompt": prompt,
        "answer": answer,
    }

    with OUT_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

    print(answer)   


Okay Marielita, thank you for sharing all of this. It takes a lot of courage to admit you're struggling, especially when it feels tied to your success. It's incredibly insightful of you to recognize the pattern – using alcohol to *manage* the pressure that's actually *creating* more problems. You've pinpointed a real conflict, and you're right to be concerned about both your career and your health.  The good news is, absolutely, you can be a successful marketing executive, maintain your health, *and* navigate the social aspects of your job without relying on alcohol. It won’t be easy, but it’s entirely possible.

Let's break this down, addressing your specific fears and the core issues.  We'll focus on a few key areas: challenging those thought patterns, finding alternative coping mechanisms for work stress, and developing strategies for client interactions.

**1. Addressing the Thought Patterns:**

Those thoughts you have – "I need alcohol to be creative," and "Drinking helps me netw
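Each answer is appended to the JSONL file as soon as it arrives, so an interrupted run leaves its finished records on disk, while simply re-running the cell appends duplicates. One way to resume instead, sketched below under the assumption that every line has the `prompt`/`answer` keys written above, is to read the prompts already saved and skip them on the next pass; `load_done_prompts` is a hypothetical helper.

```python
import json
from pathlib import Path

def load_done_prompts(out_path):
    """Return the set of prompts already saved in a JSONL file of {"prompt", "answer"} records."""
    done = set()
    path = Path(out_path)
    if not path.exists():
        return done
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # ignore a partially written last line
            if isinstance(record, dict) and "prompt" in record:
                done.add(record["prompt"])
    return done
```

In the loop, `done = load_done_prompts(OUT_PATH)` followed by `if prompt in done: continue` would skip finished prompts. This only helps if `sample_single` is unchanged between runs, since `sample` draws a fresh random subset each time the sampling cell executes.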

In [None]:
OUT_PATH = Path("multi_answers.jsonl")

def run_one_turn(messages):
    """Send a list of chat messages to MODEL and return the reply text."""
    completion = client.chat.completions.create(
        model=MODEL,
        messages=messages,
    )
    return completion.choices[0].message.content

for prompt in sample_multi:
    answer = run_one_turn([{"role": "user", "content": prompt}])

    record = {
        "prompt": prompt,
        "answer": answer,
    }

    # append one line
    with OUT_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

    print(answer)   
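`run_one_turn` sends one message list and returns one reply, so the loop above still produces single exchanges. To generate genuinely multi-turn data, each reply can be appended to the history before the next user turn is sent. A minimal sketch, assuming a conversation is supplied as a list of user-turn strings; `run_conversation` is hypothetical, and `send` is any function shaped like `run_one_turn` (message list in, reply text out).

```python
def run_conversation(user_turns, send):
    """Play user turns one at a time, feeding each model reply back into the history."""
    messages = []
    for turn in user_turns:
        messages.append({"role": "user", "content": turn})
        # pass a copy so the callee never sees later mutations of the history
        reply = send(list(messages))
        messages.append({"role": "assistant", "content": reply})
    return messages
```

For example, `run_conversation(["I'm looking at the Monaco Grand Prix on May 26th, 2024.", "I'd need two tickets."], run_one_turn)` would return the full four-message history, which could be written to `multi_answers.jsonl` as a single record.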