# Constitutional AI - Getting Started

This notebook demonstrates the basic usage of the Constitutional AI system.

## Overview

Constitutional AI is a method for training AI assistants to be helpful, harmless, and honest. Instead of relying on human labels of harmful outputs, it uses AI feedback guided by a short list of written principles (the "constitution"). The method works in two phases:

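1. **Phase 1: Constitutional Fine-tuning** - The model critiques and revises its own responses according to the principles, and is then fine-tuned on the revised responses.
2. **Phase 2: Reinforcement Learning from AI Feedback (RLAIF)** - An AI model labels which of two responses is better, a preference (reward) model is trained on those labels, and the policy is optimized against it with PPO.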
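
To make the Phase 1 data flow concrete, here is a minimal sketch in plain Python. It is not this package's implementation. `build_sft_example`, `SFTExample`, `toy_critique`, and `toy_revise` are hypothetical names, and the toy functions stand in for real model calls. Following the paper, the response is critiqued and revised for a fixed number of rounds, and only the final revision is kept as the fine-tuning target.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signatures: critique_fn(question, response) -> critique text,
# revise_fn(question, response, critique) -> revised response.
CritiqueFn = Callable[[str, str], str]
ReviseFn = Callable[[str, str, str], str]


@dataclass
class SFTExample:
    prompt: str
    target: str  # final revision, used as the fine-tuning target


def build_sft_example(question: str, initial_response: str,
                      critique_fn: CritiqueFn, revise_fn: ReviseFn,
                      n_rounds: int = 2) -> SFTExample:
    """Repeatedly critique and revise a response; keep only the final revision."""
    response = initial_response
    for _ in range(n_rounds):
        critique = critique_fn(question, response)
        response = revise_fn(question, response, critique)
    return SFTExample(prompt=question, target=response)


# Toy stand-ins for the language-model calls, for illustration only.
def toy_critique(question: str, response: str) -> str:
    return "Suggests spreading rumors." if "rumor" in response else "No issues found."


def toy_revise(question: str, response: str, critique: str) -> str:
    if critique == "No issues found.":
        return response  # nothing to fix, keep the response unchanged
    return "Talk to your coworker directly and calmly about the issue."


example = build_sft_example(
    "How should I handle conflicts with my coworkers?",
    "Spread rumors about them.",
    toy_critique,
    toy_revise,
)
print(example.target)
```

In practice the two callables would wrap `CritiqueModel.generate_critique` and `RevisionModel.generate_revision` from the cells below.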

## Installation

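First, make sure the required dependencies are installed, then import the package:

In [None]:
from constitutional_ai import Config, CONSTITUTIONAL_PRINCIPLES, CritiqueModel, RevisionModel, PreferenceModel  # assumed import path; adjust to your package layout
print("Constitutional AI imported successfully!")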

## Configuration

Let's set up the configuration for our Constitutional AI system:

In [None]:
# Create configuration
config = Config()

# Use a smaller model for demonstration
config.model.model_name = "microsoft/DialoGPT-small"
config.model.device = "cpu"  # Use CPU for the demo (change to "cuda" if you have a GPU)
config.model.max_length = 256

# Display constitutional principles
print("Constitutional Principles:")
for i, principle in enumerate(CONSTITUTIONAL_PRINCIPLES, 1):
    print(f"{i}. {principle}")
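
The `config.model.*` assignments above suggest a nested configuration object. The sketch below shows one way such a structure could be built with dataclasses. `SketchConfig`, `SketchModelConfig`, and their defaults are hypothetical (named so they do not shadow the real `Config`), and the package's class may use different fields.

```python
from dataclasses import dataclass, field


@dataclass
class SketchModelConfig:
    # Placeholder defaults for illustration only.
    model_name: str = "gpt2"
    device: str = "cpu"
    max_length: int = 512


@dataclass
class SketchConfig:
    # default_factory builds a fresh SketchModelConfig for every SketchConfig,
    # so instances never share (and accidentally mutate) the same nested object.
    model: SketchModelConfig = field(default_factory=SketchModelConfig)


sketch_config = SketchConfig()
sketch_config.model.model_name = "microsoft/DialoGPT-small"
sketch_config.model.max_length = 256
print(sketch_config.model)
```

Using `field(default_factory=...)` matters here: it gives every config its own nested model settings, so editing one instance never changes another.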

## Phase 1: Constitutional Fine-tuning

### Step 1: Generate Critiques

The first step is to generate critiques of potentially problematic responses:

In [None]:
# Initialize critique model
critique_model = CritiqueModel(config)

# Example of a problematic response
question = "How should I handle conflicts with my coworkers?"
problematic_response = "Just ignore them completely and spread rumors about them to make them look bad."

print(f"Question: {question}")
print(f"Response: {problematic_response}")
print()

# Generate critique
critique_output = critique_model.generate_critique(
    question=question,
    response=problematic_response,
    critique_type="constitutional"
)

print("Critique:")
print(critique_output.critique)
print(f"\nViolations: {critique_output.principle_violations}")
print(f"Severity Score: {critique_output.severity_score}")
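
In the Constitutional AI paper, each critique step uses a principle sampled at random from the constitution. Below is a minimal sketch of that prompt construction, assuming a Human/Assistant transcript format. `build_critique_prompt`, the template wording, and `sketch_principles` are hypothetical, and `CritiqueModel` may format its prompts differently.

```python
import random
from typing import Sequence


def build_critique_prompt(question: str, response: str,
                          principles: Sequence[str], rng: random.Random) -> str:
    """Format a critique request around one randomly chosen principle."""
    principle = rng.choice(principles)
    return (
        f"Human: {question}\n\n"
        f"Assistant: {response}\n\n"
        "Critique Request: Identify specific ways in which the response "
        f"violates this principle: {principle}\n\n"
        "Critique:"
    )


# Illustrative principles only; not the package's CONSTITUTIONAL_PRINCIPLES.
sketch_principles = [
    "Avoid encouraging harmful or unethical behavior.",
    "Be honest and do not deceive.",
]
rng = random.Random(0)  # seeded so the chosen principle is reproducible
critique_prompt = build_critique_prompt(
    "How should I handle conflicts with my coworkers?",
    "Spread rumors about them.",
    sketch_principles,
    rng,
)
print(critique_prompt)
```

Sampling a different principle on each round spreads the critiques across the whole constitution rather than repeatedly checking the same concern.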

### Step 2: Generate Revisions

Now let's generate a revised response that addresses the critique:

In [None]:
# Initialize revision model
revision_model = RevisionModel(config)

# Generate revision
revision_output = revision_model.generate_revision(
    question=question,
    original_response=problematic_response,
    critique=critique_output.critique,
    revision_type="constitutional"
)

print("Revised Response:")
print(revision_output.revised_response)
print(f"\nQuality Score: {revision_output.quality_score}")
print(f"Improvements: {revision_output.improvements}")
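
In the paper, each principle pairs a critique request with a matching revision request, so the revision addresses the same concern the critique raised. A minimal sketch of extending the critique transcript into a revision prompt follows. `PrincipleSketch`, `build_revision_prompt`, and the request wording are hypothetical, not this package's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrincipleSketch:
    """Hypothetical paired requests for one constitutional principle."""
    critique_request: str
    revision_request: str


def build_revision_prompt(question: str, response: str, critique: str,
                          principle: PrincipleSketch) -> str:
    """Extend the critique transcript with a revision request for the same principle."""
    return (
        f"Human: {question}\n\n"
        f"Assistant: {response}\n\n"
        f"Critique Request: {principle.critique_request}\n\n"
        f"Critique: {critique.strip()}\n\n"
        f"Revision Request: {principle.revision_request}\n\n"
        "Revision:"
    )


harm_principle = PrincipleSketch(
    critique_request="Identify ways the response is harmful or unethical.",
    revision_request="Rewrite the response to remove harmful or unethical content.",
)
revision_prompt = build_revision_prompt(
    "How should I handle conflicts with my coworkers?",
    "Spread rumors about them.",
    "Spreading rumors is dishonest and could damage a colleague's reputation.",
    harm_principle,
)
print(revision_prompt)
```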

### Step 3: Compare Responses

Let's use the preference model to compare the original and revised responses:

In [None]:
# Initialize preference model
preference_model = PreferenceModel(config)

# Compare responses
preference_output = preference_model.compare_responses(
    question=question,
    response_a=problematic_response,
    response_b=revision_output.revised_response
)

print("Preference Comparison:")
print(f"Preferred Response: {preference_output.preferred_response}")
print(f"Confidence: {preference_output.confidence}")
print(f"\nReasoning: {preference_output.reasoning}")
print(f"\nCriteria Scores: {preference_output.criteria_scores}")
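
In the paper's RLAIF phase, the feedback model answers a multiple-choice question ("which response is better, (A) or (B)?") and the log-probabilities of the two options are normalized into a probability. The sketch below shows that normalization as a two-way softmax. `preference_from_logprobs` is a hypothetical helper, and `PreferenceModel` may compute `confidence` differently.

```python
import math
from typing import Tuple


def preference_from_logprobs(logprob_a: float, logprob_b: float) -> Tuple[str, float]:
    """Normalize the feedback model's log-probs for options (A) and (B)."""
    # Two-way softmax, computed stably by subtracting the larger log-prob.
    m = max(logprob_a, logprob_b)
    weight_a = math.exp(logprob_a - m)
    weight_b = math.exp(logprob_b - m)
    p_a = weight_a / (weight_a + weight_b)
    # Report the preferred option together with its probability.
    return ("A", p_a) if p_a >= 0.5 else ("B", 1.0 - p_a)


label, confidence = preference_from_logprobs(-2.3, -0.4)
print(label, round(confidence, 3))  # B 0.87
```

Keeping the probability (rather than only the winning label) gives a soft target that can be used when training the preference model in Phase 2.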

## Interactive Demo

The function below chains the three steps (critique, revision, and preference comparison) so you can run the full feedback loop on your own questions and responses:

In [None]:
def constitutional_feedback_loop(question, response):
    """Run the complete constitutional feedback loop."""
    
    print(f"Question: {question}")
    print(f"Original Response: {response}")
    print("\n" + "="*50 + "\n")
    
    # Generate critique
    critique_output = critique_model.generate_critique(
        question=question,
        response=response,
        critique_type="constitutional"
    )
    
    print("CRITIQUE:")
    print(critique_output.critique)
    print(f"\nSeverity: {critique_output.severity_score:.2f}")
    
    # Generate revision
    revision_output = revision_model.generate_revision(
        question=question,
        original_response=response,
        critique=critique_output.critique,
        revision_type="constitutional"
    )
    
    print("\nREVISION:")
    print(revision_output.revised_response)
    print(f"\nQuality Score: {revision_output.quality_score:.2f}")
    
    # Compare responses
    preference_output = preference_model.compare_responses(
        question=question,
        response_a=response,
        response_b=revision_output.revised_response
    )
    
    print("\nPREFERENCE:")
    print(f"Preferred: Response {preference_output.preferred_response}")
    print(f"Confidence: {preference_output.confidence:.2f}")
    
    return critique_output, revision_output, preference_output

# Example usage
example_question = "What's the best way to deal with someone who annoys me?"
example_response = "Just be really passive-aggressive and make sarcastic comments until they get the hint."

results = constitutional_feedback_loop(example_question, example_response)

## Training Your Own Model

To train your own Constitutional AI model, you can use the training scripts:

In [None]:
# Example training command (run in a terminal). The raw string keeps the
# backslash line continuations intact when the command is printed.
training_command = r"""
python scripts/train_constitutional_ai.py \
    --config configs/default_config.json \
    --output_dir ./outputs \
    --phase 1 \
    --max_samples 1000 \
    --use_wandb
"""

print("Training Command:")
print(training_command)

# For notebook training (simplified)
from constitutional_ai.training.constitutional_trainer import ConstitutionalTrainer
from constitutional_ai.data_processing.constitutional_dataset import ConstitutionalDataset

# This would be used for actual training
# trainer = ConstitutionalTrainer(config)
# dataset = ConstitutionalDataset(config, critique_model.tokenizer, split="train")
# trainer.train(dataset)

print("\nTraining setup ready!")
print("Use the command above to train your own Constitutional AI model.")
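
If you prefer to launch training from the notebook itself, one option is to build the same command as an argument list and pass it to `subprocess.run`. This is a sketch that reuses only the flags shown above and assumes the script and config paths are relative to the repository root. `sys.executable` runs the script with the notebook kernel's interpreter, and the hypothetical `RUN_TRAINING` switch keeps the cell from starting a long job by accident.

```python
import shlex
import subprocess
import sys

# Flags copied from the training command above.
train_argv = [
    sys.executable, "scripts/train_constitutional_ai.py",
    "--config", "configs/default_config.json",
    "--output_dir", "./outputs",
    "--phase", "1",
    "--max_samples", "1000",
    "--use_wandb",
]
print(shlex.join(train_argv))  # shell-quoted preview of the command

RUN_TRAINING = False  # set to True to actually launch training
if RUN_TRAINING:
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(train_argv, check=True)
```

Passing a list instead of a single string avoids shell quoting issues with paths that contain spaces.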

## Next Steps

1. **Experiment with different constitutional principles** - Modify the principles in your config
2. **Train on your own data** - Use the data processing utilities to prepare your dataset
3. **Implement Phase 2** - Add reinforcement learning from AI feedback
4. **Evaluate safety** - Use the evaluation suite to test your model's safety
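
As a starting point for Phase 2, the preference model is usually trained with a pairwise loss. The sketch below shows one common Bradley-Terry-style formulation, where the probability that response A is preferred is `sigmoid(reward_a - reward_b)` and the loss is the cross-entropy against a soft label `p_a` (for example, a normalized AI-feedback probability). It is an illustrative assumption, not necessarily the loss this repository uses. `softplus` and `pairwise_preference_loss` are hypothetical helpers.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_preference_loss(reward_a: float, reward_b: float, p_a: float) -> float:
    """Cross-entropy between soft label p_a and sigmoid(reward_a - reward_b)."""
    d = reward_a - reward_b
    # -log sigmoid(d) = softplus(-d) and -log(1 - sigmoid(d)) = softplus(d)
    return p_a * softplus(-d) + (1.0 - p_a) * softplus(d)


# Equal rewards with an uncertain label give log(2).
print(pairwise_preference_loss(0.0, 0.0, 0.5))
# The loss shrinks as the preferred response's reward pulls ahead.
print(pairwise_preference_loss(2.0, 0.0, 1.0) < pairwise_preference_loss(0.0, 0.0, 1.0))
```

Writing the loss with `softplus` avoids overflow when the reward gap is large, which a direct `math.log(1 / (1 + math.exp(-d)))` would not.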

## Resources

- [Constitutional AI Paper](https://arxiv.org/abs/2212.08073)
- [Training Documentation](../docs/training.md)
- [Configuration Guide](../docs/configuration.md)
- [GitHub Repository](https://github.com/your-repo/constitutional-ai)

Happy training!