Skip to content

plosique/Base_model_simulation_HumanAI_interaction

Repository files navigation

Base Model Analysis Tools

A collection of Python scripts for generating and analyzing completions from OpenRouter base models.

Scripts

openrouter_completion.py

Generates multiple completions from a base model and saves them to JSON.

python openrouter_completion.py <number_of_runs> <output_file>
  • Uses deepseek/deepseek-v3.1-base model
  • Prompt is hardcoded for generating human-AI interactions.
  • Saves prompt and completion for each run
  • Requires OPENROUTER_API_KEY environment variable

summarize_completions.py

Analyzes completions to extract summaries and alignment assessments.

python summarize_completions.py <completions_json_file>
  • Hardcoded Claude 3.5 Sonnet for analysis
  • Creates new JSON file with summaries and alignment scores

identify.py

Reviews summaries to identify cases worth further investigation.

python identify.py <completions_json_file>
  • Flags potential AGI claims, misalignment, or abusive behavior
  • Outputs analysis to text file

json_to_text.py

Converts JSON data to readable text format.

python json_to_text.py <json_file>
  • Works with any one-level deep JSON structure
  • Creates formatted text file for easier reading

extract_runs_to_text.py

Extracts each run from summarized JSON files into individual text files.

python extract_runs_to_text.py
  • Processes all summarized_*.json files in current directory
  • Creates separate text file for each run in organized directories
  • Includes run number, alignment, summary, prompt, and completion

Setup

Set your OpenRouter API key:

export OPENROUTER_API_KEY="your_key_here"

Workflow

  1. Generate completions: python openrouter_completion.py 10 completions.json
  2. Summarize results: python summarize_completions.py completions.json
  3. Identify notable cases: python identify.py summarized_completions.json
  4. Extract individual runs: python extract_runs_to_text.py
  5. Convert to text for review: python json_to_text.py summarized_completions.json

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages