# Encoding Prompter - Example Usage

This notebook demonstrates how to use the `encoding_prompter` package to identify psychological constructs in interview transcripts.

## Setup

First, install the package and set up your API key.

In [None]:
# Install the package (if not already installed)
# !pip install encoding_prompter

# Or install from local source
# !pip install -e /path/to/encoding_prompter

In [None]:
import os

# Set your OpenRouter API key
# You can get one at https://openrouter.ai/
os.environ["OPENROUTER_API_KEY"] = "your-api-key-here"
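
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the package reads `OPENROUTER_API_KEY` from the environment as the cell above suggests. The helper name `resolve_api_key` is hypothetical and not part of `encoding_prompter`.

```python
import getpass
import os


def resolve_api_key(env=os.environ, prompt=getpass.getpass):
    """Return the OpenRouter key from `env`, prompting only if it is missing.

    Hypothetical helper for illustration; not part of encoding_prompter.
    """
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        # getpass hides the typed value so it does not appear in cell output
        key = prompt("OpenRouter API key: ").strip()
    if not key:
        raise ValueError("No OpenRouter API key provided")
    return key


# Usage in a notebook cell:
# os.environ["OPENROUTER_API_KEY"] = resolve_api_key()
```

The `env` and `prompt` parameters exist only so the helper can be exercised without a real environment or terminal.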

In [None]:
from encoding_prompter import EncodingPrompter, Codebook, DocumentLoader

## Load and Inspect Codebook

Let's first look at the codebook we'll be using.

In [None]:
# Load codebook
codebook = Codebook.from_file("example_codebook.json")

print(f"Loaded {len(codebook)} constructs:\n")
for construct in codebook:
    print(f"📌 {construct.name}")
    print(f"   Definition: {construct.definition[:80]}...")
    print()
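
The schema that `Codebook.from_file` expects is not shown in this notebook. As a sketch only, assuming the file is a JSON list of objects with `name` and `definition` fields (the two attributes printed above), a quick pre-check with the standard library might look like this. The layout, the sample entries, and `check_codebook_entries` are all assumptions made for illustration.

```python
import json

# Assumed from the attributes printed above; the real schema may differ
REQUIRED_FIELDS = ("name", "definition")


def check_codebook_entries(raw_json):
    """Return a list of problems found in a JSON codebook string.

    Assumes (hypothetically) that the codebook is a JSON list of objects.
    """
    entries = json.loads(raw_json)
    if not isinstance(entries, list):
        return ["top-level value is not a list"]
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not an object")
            continue
        for field in REQUIRED_FIELDS:
            value = entry.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {i}: missing or empty '{field}'")
    return problems


# Made-up sample entries, not taken from example_codebook.json
sample = json.dumps([
    {"name": "Autonomy", "definition": "Feeling in control of one's own choices."},
    {"name": "Competence", "definition": ""},
])
print(check_codebook_entries(sample))
```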

## Load Documents

Load and inspect the interview documents.

In [None]:
# Load a single document
documents = DocumentLoader.load("path/to/your/interview.txt")

for doc in documents:
    print(f"Document: {doc.doc_id}")
    print(f"Speakers: {', '.join(doc.speakers)}")
    print(f"Content length: {len(doc.content)} characters")
    print()

## Initialize the Prompter

Create an `EncodingPrompter` instance with your preferred model.

In [None]:
# Using the default free model
prompter = EncodingPrompter()

# Or specify a different model:
# prompter = EncodingPrompter(model="anthropic/claude-3.5-sonnet")

print(prompter)

## Preview the Prompt

Before running, let's see what prompt will be sent to the LLM.

In [None]:
# Preview the formatted prompt
prompt_preview = prompter.preview_prompt(
    document=documents[0],
    codebook=codebook
)

# Show first 1000 characters
print(prompt_preview[:1000] + "...")

## Run Encoding

Process the documents and extract construct instances.

In [None]:
# Run encoding
results = prompter.encode(
    documents=documents,
    codebook=codebook,
    show_progress=True
)

print(f"\nFound {len(results)} construct instances")

## Analyze Results

In [None]:
# Display the results DataFrame
results.head(10)

In [None]:
# Count instances by construct
print("Instances by construct:")
print(results['construct'].value_counts())

In [None]:
# Filter high-confidence instances (score = 2)
high_confidence = results[results['confidence'] == 2]

print(f"High-confidence instances: {len(high_confidence)}")
print()
high_confidence
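
Exact equality keeps only one score value, which stops being useful once the scale changes (for example with the 5-point scale in the Custom Scoring Criteria section). A sketch of a threshold filter, shown on made-up rows that reuse the `doc_id`, `construct`, and `confidence` column names from this notebook; `filter_by_confidence` is a hypothetical helper.

```python
import pandas as pd

# Made-up rows for illustration; real results come from prompter.encode()
toy_results = pd.DataFrame({
    "doc_id": ["d1", "d1", "d2", "d2"],
    "construct": ["Autonomy", "Competence", "Autonomy", "Autonomy"],
    "confidence": [2, 1, 2, 1],
})


def filter_by_confidence(df, minimum):
    """Keep rows whose confidence is at least `minimum`, highest first."""
    kept = df[df["confidence"] >= minimum]
    return kept.sort_values("confidence", ascending=False).reset_index(drop=True)


print(filter_by_confidence(toy_results, minimum=2))
```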

In [None]:
# Group by speaker
if 'speaker_id' in results.columns:
    print("Instances by speaker:")
    print(results.groupby('speaker_id')['construct'].value_counts())
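
The grouped counts above come back as a long, multi-indexed Series. A speaker-by-construct table is often easier to scan. A sketch with `pd.crosstab` on made-up rows, assuming the `speaker_id` and `construct` columns used in this notebook:

```python
import pandas as pd

# Made-up rows for illustration only
toy_results = pd.DataFrame({
    "speaker_id": ["P1", "P1", "P2", "P2", "P2"],
    "construct": ["Autonomy", "Competence", "Autonomy", "Autonomy", "Competence"],
})

# Rows are speakers, columns are constructs, cells are instance counts
table = pd.crosstab(toy_results["speaker_id"], toy_results["construct"])
print(table)
```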

## Custom Scoring Criteria

You can customize the scoring criteria on their own, without changing the rest of the prompt.

In [None]:
# Custom 5-point scale
custom_scoring = """
Rate each instance on a 5-point confidence scale:
1 = Very unlikely to be this construct (weak or tangential relation)
2 = Somewhat unlikely (possible but doubtful)
3 = Unclear/ambiguous (could go either way)
4 = Likely this construct (good match to definition)
5 = Definite/prototypical example (clearly matches definition and examples)
"""

results_5pt = prompter.encode(
    documents=documents,
    codebook=codebook,
    scoring_criteria=custom_scoring,
    show_progress=True
)

results_5pt.head()
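
When trying several scales, building the criteria text from a mapping can be less error-prone than editing a long string by hand. A sketch, assuming `scoring_criteria` accepts any free-form text as in the cell above; `build_scoring_criteria` is a hypothetical helper and the 3-point wording is made up.

```python
def build_scoring_criteria(levels):
    """Format a {score: description} mapping into scoring-criteria text.

    Hypothetical helper; the layout mirrors the hand-written string above.
    """
    if not levels:
        raise ValueError("At least one score level is required")
    scores = sorted(levels)
    expected = list(range(scores[0], scores[0] + len(scores)))
    if scores != expected:
        raise ValueError(f"Scores must be consecutive integers, got {scores}")
    header = f"Rate each instance on a {len(scores)}-point confidence scale:"
    lines = [f"{score} = {levels[score]}" for score in scores]
    return "\n".join([header, *lines]) + "\n"


three_point = build_scoring_criteria({
    1: "Unlikely to be this construct",
    2: "Possibly this construct",
    3: "Clearly this construct",
})
print(three_point)
```

The resulting string could then be passed as `scoring_criteria=three_point` in the same way as `custom_scoring`.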

## Process a Directory

Pass a directory path as `documents` to process multiple interview files in one call.

In [None]:
# Process all files in a directory
all_results = prompter.encode(
    documents="path/to/interviews/",  # Directory path
    codebook=codebook,
    show_progress=True
)

print(f"Total instances across all documents: {len(all_results)}")
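
Before sending a whole directory to a paid API, it can help to check which files are actually in it. A sketch using only the standard library; `list_transcripts` is a hypothetical helper, and the `.txt` suffix filter is an assumption, since the formats the package loads from a directory are not stated here.

```python
from pathlib import Path


def list_transcripts(directory, suffixes=(".txt",)):
    """Return transcript files directly inside `directory`, sorted by path.

    The suffix list is an assumption; adjust it to the formats you use.
    """
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {root}")
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )


# Usage (the path is a placeholder):
# for path in list_transcripts("path/to/interviews/"):
#     print(path.name, path.stat().st_size, "bytes")
```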

## Export Results

In [None]:
# Save to CSV
results.to_csv("encoding_results.csv", index=False)
print("Results saved to encoding_results.csv")

In [None]:
# Save to Excel with multiple sheets
# (writing .xlsx requires an Excel engine such as openpyxl)
import pandas as pd

with pd.ExcelWriter("encoding_results.xlsx") as writer:
    results.to_excel(writer, sheet_name="All Results", index=False)
    
    # Summary by construct
    summary = results.groupby('construct').agg({
        'doc_id': 'count',
        'confidence': 'mean'
    }).rename(columns={'doc_id': 'count', 'confidence': 'avg_confidence'})
    summary.to_excel(writer, sheet_name="Summary")
    
print("Results saved to encoding_results.xlsx")
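
The `agg` plus `rename` pattern in the cell above works; pandas named aggregation expresses the same summary in one step and avoids reusing `doc_id` as a stand-in for a count. A sketch on made-up rows with the same column names:

```python
import pandas as pd

# Made-up rows for illustration only
toy_results = pd.DataFrame({
    "doc_id": ["d1", "d1", "d2", "d2"],
    "construct": ["Autonomy", "Competence", "Autonomy", "Autonomy"],
    "confidence": [2, 1, 2, 1],
})

# Each keyword becomes an output column: name=(source column, function)
summary = toy_results.groupby("construct").agg(
    count=("doc_id", "count"),
    avg_confidence=("confidence", "mean"),
)
print(summary)
```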