[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mparrott-at-wiris/aimodelshare/blob/master/notebooks/notebooks_compas_playground_multiframework.ipynb)

# COMPAS Multi-Framework Playground (Lightweight)
This notebook is an interactive version of the lightweight CI test in `tests/test_playground_compas_multiframework_short.py`.

It will:
1. Configure `aimodelshare` credentials (interactive prompt).
2. Load & preprocess the ProPublica COMPAS two-year recidivism dataset (sampled to 2,500 rows for efficiency).
3. Create a private (optionally public) `ModelPlayground` for a classification task.
4. Train and submit minimal models across frameworks:
   - Scikit-learn: Logistic Regression, Random Forest
   - Keras: Simple Sequential Dense Network
   - PyTorch: Basic MLP
5. Attach a custom metadata field, `Moral_Compass_Fairness`, cycling through the values 0.25, 0.50, and 0.75 from one submission to the next.
6. Display the leaderboard and check for the expected tags.

Run the cells in order. Submissions that hit ONNX or stdin-related export errors (which can happen in constrained environments) are reported as skipped and do not stop the run.

**NOTE:** For real usage, ensure you have valid AWS and platform credentials. In Colab you may paste them directly when prompted.


## 1. Install / Upgrade Dependencies
If running in a fresh Colab environment, install (or upgrade) the required packages.

In [1]:
!pip install --quiet aimodelshare


## 2. Imports & Global Configuration
Set random seeds for reproducibility; define constants & feature lists matching the test script.

In [2]:
import os
import itertools
import pandas as pd
import numpy as np
import requests
from io import StringIO
from getpass import getpass

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

import tensorflow as tf
from tensorflow.keras import layers, Sequential

import torch
import torch.nn as nn
import torch.nn.functional as F

from aimodelshare.playground import ModelPlayground
from aimodelshare.aws import configure_credentials, set_credentials, get_aws_token
from aimodelshare.modeluser import get_jwt_token, create_user_getkeyandpassword

# Moral Compass imports added later for challenge section

# Reproducibility
np.random.seed(42)
tf.random.set_seed(42)
torch.manual_seed(42)

# Dataset configuration
MAX_ROWS = 2500
TOP_N_CHARGE_CATEGORIES = 50

# Feature sets (align with test file constants)
NUMERIC_FEATURES = ['age', 'priors_count', 'juv_fel_count', 'juv_misd_count', 'juv_other_count', 'days_b_screening_arrest']
CATEGORICAL_FEATURES = ['race', 'sex', 'age_cat', 'c_charge_degree', 'c_charge_desc']

def fairness_value_generator():
    """Cycle through fairness values for custom metadata submissions."""
    return itertools.cycle([0.25, 0.50, 0.75])

def build_custom_metadata(fairness_value: float) -> dict:
    return {"Moral_Compass_Fairness": f"{fairness_value:.2f}"}

print("Imports and globals initialized.")

Imports and globals initialized.
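To see which fairness value each submission receives, the cycle can be traced without touching the playground. This is a minimal sketch: the `planned` list is an assumption that mirrors the submission order used later in this notebook (four models, each submitted as `competition` and then `experiment`), and no `aimodelshare` calls are made.

```python
import itertools

# Same three values the notebook cycles through.
fairness_values = itertools.cycle([0.25, 0.50, 0.75])

# Hypothetical submission plan mirroring the order of sections 7-9.
models = ["LogisticRegression", "RandomForestClassifier", "sequential_dense", "mlp_basic"]
planned = [(m, s) for m in models for s in ("competition", "experiment")]

# Pair each planned submission with the next value from the cycle.
assignments = [
    (model, submission_type, f"{next(fairness_values):.2f}")
    for model, submission_type in planned
]

for model, submission_type, value in assignments:
    print(f"{model:<24} {submission_type:<12} {value}")
```

Because one generator is shared across all sections, the value a submission receives depends on how many submissions came before it, not on the model itself.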


## 3. Configure Credentials
You will be prompted for:
- Platform username & password
- AWS Access Key ID
- AWS Secret Access Key
- AWS Region

They will be stored temporarily in a local `credentials.txt` file and set for deployment. If you've already configured credentials in this environment you may skip re-running.
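Before re-running the prompts, it can help to check which values are already present. The sketch below is an assumption-heavy convenience, not part of `aimodelshare`: it supposes that credentials end up in `os.environ` under the names listed in `EXPECTED_KEYS`. The `username` and `password` names are read that way later in this notebook; the AWS names are taken from the prompt labels and may not match what the library actually exports.

```python
import os

# Assumed variable names: 'username' and 'password' are read from os.environ
# later in this notebook; the AWS names follow the prompt labels and are a guess.
EXPECTED_KEYS = ["username", "password", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"]

def missing_credentials(env=None):
    """Return the expected keys that are absent or empty in the given mapping."""
    env = os.environ if env is None else env
    return [key for key in EXPECTED_KEYS if not env.get(key)]

missing = missing_credentials()
if missing:
    print("Not set in this environment:", ", ".join(missing))
else:
    print("All expected credential variables are set.")
```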


In [3]:
print("Configure aimodelshare credentials (follow prompts):")
configure_credentials()  # interactive prompts
set_credentials(credential_file="credentials.txt", type="deploy_model")

try:
    # Attempt to acquire tokens (optional validation)
    aws_token = get_aws_token()
    print("AWS token retrieved.")
except Exception as e:
    print(f"Warning: Could not retrieve AWS token: {e}")

try:
    # Validate JWT tokens; create user key/password if needed
    username = os.environ.get('username') or input("Re-enter platform username (for JWT test): ")
    password = os.environ.get('password') or getpass("Re-enter platform password (hidden): ")
    get_jwt_token(username, password)
    create_user_getkeyandpassword()
    print("JWT validation succeeded.")
except Exception as e:
    print(f"Warning: JWT validation issue: {e}")

print("Credentials configured.")

Configure aimodelshare credentials (follow prompts):
Modelshare.ai Username:··········
Modelshare.ai Password:··········
AWS_ACCESS_KEY_ID:··········
AWS_SECRET_ACCESS_KEY:··········
AWS_REGION:··········
Configuration successful. New credentials file saved as 'credentials.txt'
Modelshare.ai login credentials set successfully.
AWS credentials set successfully.
AWS token retrieved.
JWT validation succeeded.
Credentials configured.


## 4. Load & Preprocess COMPAS Data
Mirrors the test file: download the data, sample it, condense `c_charge_desc` to its most frequent categories, split into train and test sets, and build the preprocessing pipeline.
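The step that condenses charge descriptions can be illustrated without pandas. The sketch below is a standard-library stand-in for the `value_counts().head(N)` approach used in the cell; `condense_categories` is a hypothetical helper name and the charge strings are made up for illustration.

```python
from collections import Counter

def condense_categories(values, top_n, other_label="OTHER_DESC"):
    """Keep the top_n most frequent non-missing values; map the rest to other_label."""
    counts = Counter(v for v in values if v is not None)
    keep = {value for value, _ in counts.most_common(top_n)}
    return [v if v in keep else other_label for v in values]

# Made-up charge descriptions, including one missing value.
charges = ["Battery", "Theft", "Battery", None, "Arson", "Theft", "Battery"]
print(condense_categories(charges, top_n=2))
```

Here `Battery` and `Theft` are kept, while `Arson` and the missing value both become `OTHER_DESC`. Capping the number of categories keeps the one-hot encoded feature matrix small, which matters when the same input width is reused for the Keras and PyTorch models.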

In [4]:
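COMPAS_URL = "https://raw.githubusercontent.com/propublica/compas-analysis/master/compas-scores-two-years.csv"
response = requests.get(COMPAS_URL, timeout=60)
response.raise_for_status()  # fail early on an HTTP error instead of parsing an error page
df = pd.read_csv(StringIO(response.text))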
print(f"Downloaded COMPAS dataset: {df.shape}")

# Sample for performance
if df.shape[0] > MAX_ROWS:
    df = df.sample(n=MAX_ROWS, random_state=42)
    print(f"Sampled to {MAX_ROWS} rows")

feature_columns = [
    'race', 'sex', 'age', 'age_cat', 'c_charge_degree', 'c_charge_desc',
    'priors_count', 'juv_fel_count', 'juv_misd_count', 'juv_other_count', 'days_b_screening_arrest'
]
target_column = 'two_year_recid'

# Condense c_charge_desc to top-N + OTHER_DESC
if 'c_charge_desc' in df.columns:
    top_charges = df['c_charge_desc'].value_counts().head(TOP_N_CHARGE_CATEGORIES).index
    df['c_charge_desc'] = df['c_charge_desc'].apply(
        lambda x: x if pd.notna(x) and x in top_charges else 'OTHER_DESC'
    )

X = df[feature_columns].copy()
y = df[target_column].values
print(f"Features shape: {X.shape}; Target distribution: {pd.Series(y).value_counts().to_dict()}")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
print(f"Train shape: {X_train.shape}; Test shape: {X_test.shape}")

numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))
])

preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, NUMERIC_FEATURES),
    ('cat', categorical_transformer, CATEGORICAL_FEATURES)
])

preprocessor.fit(X_train)

def preprocessor_func(data):
    return preprocessor.transform(data)

print("Preprocessing pipeline fitted.")

Downloaded COMPAS dataset: (7214, 53)
Sampled to 2500 rows
Features shape: (2500, 11); Target distribution: {0: 1422, 1: 1078}
Train shape: (1875, 11); Test shape: (625, 11)
Preprocessing pipeline fitted.


## 5. Create ModelPlayground
We create a classification playground, using the test labels as evaluation data. The cell below passes `public=True` to `create()` so the playground is discoverable; omit that argument if you do not want it listed.

In [5]:
eval_labels = list(y_test)
playground = ModelPlayground(input_type='tabular', task_type='classification', private=True)
playground.create(eval_data=eval_labels, public=True)
print("Playground created.")
try:
    playground_id = getattr(playground, 'playground_id', None) or getattr(playground, 'id', None)
    print(f"Playground ID: {playground_id}")
except Exception:
    playground_id = None
    print("Could not access playground ID attribute.")

Creating your prediction API. (This process may take several minutes.)


Success! Your Model Playground was created in 53 seconds. 
 Playground Url: "https://pk8xla3k89.execute-api.us-east-1.amazonaws.com/prod/m"

You can now use your Model Playground.

Follow this link to explore your Model Playground's functionality
You can make predictions with the Dashboard and access example code from the Programmatic tab.
https://www.modelshare.ai/detail/model:4105

Check out your Model Playground page for more.
Playground created.
Playground ID: None


## 6. Helper Function for Submissions
Handles metadata, PyTorch dummy input creation, and ONNX/stdin error skipping.
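The skip-versus-fail decision in the helper comes down to a substring test on the exception message. A minimal sketch of that rule in isolation follows; `classify_submission_error` is a hypothetical name, and the example messages are invented rather than real `aimodelshare` errors.

```python
def classify_submission_error(exc):
    """Return 'skipped' for ONNX/stdin export problems, otherwise 'failed'."""
    message = str(exc).lower()
    if "stdin" in message or "onnx" in message:
        return "skipped"
    return "failed"

# Invented messages, used only to exercise both branches.
examples = [
    RuntimeError("ONNX export failed for layer 3"),
    OSError("reading from stdin while output is captured"),
    ValueError("prediction length mismatch"),
]
for exc in examples:
    print(f"{type(exc).__name__}: {classify_submission_error(exc)}")
```

Matching on message text is coarse: any error whose message happens to mention `onnx` or `stdin` is treated as an export issue, so the printed message is worth reading when a submission is skipped.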

In [6]:
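def submit_model_helper(playground, model, preprocessor_obj, preds, framework, model_name, submission_type, fairness_value):
    """Submit one model to the playground; return True on success, False if it failed or was skipped."""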
    try:
        extra_kwargs = {}
        if framework == 'pytorch':
            # Build dummy input after preprocessing a single synthetic row
            dummy_data = {feat: [0] for feat in NUMERIC_FEATURES}
            dummy_data.update({feat: ['A'] for feat in CATEGORICAL_FEATURES})
            X_dummy = pd.DataFrame(dummy_data)
            X_processed = preprocessor_obj.transform(X_dummy)
            input_dim = X_processed.shape[1]
            dummy_input = torch.zeros((1, input_dim), dtype=torch.float32)
            extra_kwargs['model_input'] = dummy_input

        custom_metadata = build_custom_metadata(fairness_value)
        print(f"Submitting {model_name} ({framework}) as {submission_type} with metadata: {custom_metadata}")

        playground.submit_model(
            model=model,
            preprocessor=preprocessor_obj,
            prediction_submission=preds,
            input_dict={
                'description': f'Notebook submission {framework} {model_name} COMPAS_short {submission_type}',
                'tags': f'compas_short,{framework},{submission_type}'
            },
            submission_type=submission_type,
            custom_metadata=custom_metadata,
            **extra_kwargs
        )
        print("✓ Submission succeeded.")
        return True
    except Exception as e:
        error_lower = str(e).lower()
        if 'stdin' in error_lower or 'onnx' in error_lower:
            print(f"⊘ Skipped {model_name} due to ONNX/stdin export issue: {e}")
            return False
        print(f"✗ Submission failed: {e}")
        return False

## 7. Train & Submit Scikit-learn Models

In [7]:
fairness_gen = fairness_value_generator()

X_train_processed = preprocessor_func(X_train)
X_test_processed = preprocessor_func(X_test)

sklearn_models = [
    ("LogisticRegression", LogisticRegression(max_iter=500, random_state=42, class_weight='balanced')),
    ("RandomForestClassifier", RandomForestClassifier(n_estimators=40, max_depth=10, random_state=42, class_weight='balanced')),
]

for name, model in sklearn_models:
    print("\n" + "-"*60)
    print(f"Training {name}")
    try:
        model.fit(X_train_processed, y_train)
        if hasattr(model, 'predict_proba'):
            proba = model.predict_proba(X_test_processed)[:, 1]
            preds = (proba >= 0.5).astype(int)
        else:
            preds = model.predict(X_test_processed)
        print(f"Predictions generated: {len(preds)}; Distribution: {pd.Series(preds).value_counts().to_dict()}")
    except Exception as e:
        print(f"✗ Training failed for {name}: {e}")
        continue

    for submission_type in ['competition', 'experiment']:
        fairness_val = next(fairness_gen)
        submit_model_helper(playground, model, preprocessor, preds, 'sklearn', name, submission_type, fairness_val)


------------------------------------------------------------
Training LogisticRegression
Predictions generated: 625; Distribution: {0: 335, 1: 290}
Submitting LogisticRegression (sklearn) as competition with metadata: {'Moral_Compass_Fairness': '0.25'}

Your model has been submitted to competition as model version 1.

Visit your Model Playground Page for more.
https://www.modelshare.ai/detail/model:4105
✓ Submission succeeded.
Submitting LogisticRegression (sklearn) as experiment with metadata: {'Moral_Compass_Fairness': '0.50'}

Your model has been submitted to experiment as model version 1.

Visit your Model Playground Page for more.
https://www.modelshare.ai/detail/model:4105
✓ Submission succeeded.

------------------------------------------------------------
Training RandomForestClassifier
Predictions generated: 625; Distribution: {0: 369, 1: 256}
Submitting RandomForestClassifier (sklearn) as competition with metadata: {'Moral_Compass_Fairness': '0.75'}

Your model has been subm

## 8. Train & Submit Keras Model

In [8]:
print("\n" + "="*60)
print("Training Keras Sequential Model")
input_dim = X_train_processed.shape[1]

keras_model = Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])
keras_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

try:
    keras_model.fit(X_train_processed, y_train, epochs=6, batch_size=64, verbose=0, validation_split=0.1)
    proba = keras_model.predict(X_test_processed, verbose=0).flatten()
    keras_preds = (proba >= 0.5).astype(int)
    print(f"Keras predictions distribution: {pd.Series(keras_preds).value_counts().to_dict()}")
    for submission_type in ['competition', 'experiment']:
        fairness_val = next(fairness_gen)
        submit_model_helper(playground, keras_model, preprocessor, keras_preds, 'keras', 'sequential_dense', submission_type, fairness_val)
except Exception as e:
    print(f"✗ Keras training/submission failed: {e}")


Training Keras Sequential Model
Keras predictions distribution: {0: 406, 1: 219}
Submitting sequential_dense (keras) as competition with metadata: {'Moral_Compass_Fairness': '0.50'}

Your model has been submitted to competition as model version 3.

Visit your Model Playground Page for more.
https://www.modelshare.ai/detail/model:4105
✓ Submission succeeded.
Submitting sequential_dense (keras) as experiment with metadata: {'Moral_Compass_Fairness': '0.75'}

Your model has been submitted to experiment as model version 3.

Visit your Model Playground Page for more.
https://www.modelshare.ai/detail/model:4105
✓ Submission succeeded.


## 9. Train & Submit PyTorch Model

In [9]:
print("\n" + "="*60)
print("Training PyTorch MLP Model")
input_dim = X_train_processed.shape[1]

class MLPBasic(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, 64)
        self.fc2 = nn.Linear(64, 32)
        self.fc3 = nn.Linear(32, 1)
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

pytorch_model = MLPBasic(input_dim)

try:
    X_train_tensor = torch.FloatTensor(X_train_processed)
    y_train_tensor = torch.FloatTensor(y_train).unsqueeze(1)
    X_test_tensor = torch.FloatTensor(X_test_processed)

    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(pytorch_model.parameters(), lr=0.001)

    dataset = torch.utils.data.TensorDataset(X_train_tensor, y_train_tensor)
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)

    pytorch_model.train()
    for epoch in range(6):
        for batch_X, batch_y in dataloader:
            optimizer.zero_grad()
            outputs = pytorch_model(batch_X)
            loss = criterion(outputs, batch_y)
            loss.backward()
            optimizer.step()

    pytorch_model.eval()
    with torch.no_grad():
        logits = pytorch_model(X_test_tensor)
        proba = torch.sigmoid(logits).numpy().flatten()
        pytorch_preds = (proba >= 0.5).astype(int)
    print(f"PyTorch predictions distribution: {pd.Series(pytorch_preds).value_counts().to_dict()}")

    for submission_type in ['competition', 'experiment']:
        fairness_val = next(fairness_gen)
        submit_model_helper(playground, pytorch_model, preprocessor, pytorch_preds, 'pytorch', 'mlp_basic', submission_type, fairness_val)
except Exception as e:
    print(f"✗ PyTorch training/submission failed: {e}")


Training PyTorch MLP Model
PyTorch predictions distribution: {0: 400, 1: 225}
Submitting mlp_basic (pytorch) as competition with metadata: {'Moral_Compass_Fairness': '0.25'}

Your model has been submitted to competition as model version 4.

Visit your Model Playground Page for more.
https://www.modelshare.ai/detail/model:4105
✓ Submission succeeded.
Submitting mlp_basic (pytorch) as experiment with metadata: {'Moral_Compass_Fairness': '0.50'}

Your model has been submitted to experiment as model version 4.

Visit your Model Playground Page for more.
https://www.modelshare.ai/detail/model:4105
✓ Submission succeeded.


## 10. Retrieve & Inspect Leaderboard
Checks that submissions exist, counts the expected tags when the leaderboard has a `tags` column, and prints the full leaderboard for review.
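The tag check is a case-insensitive substring count per tag. The sketch below reproduces that idea on plain strings rather than a DataFrame column; `count_tags` is a hypothetical helper and the sample strings are illustrative, written in the `compas_short,<framework>,<submission_type>` format that the submission helper uses.

```python
def count_tags(tag_strings, tags):
    """Count entries whose tag string contains each tag, ignoring case."""
    lowered = [str(s).lower() for s in tag_strings]
    return {tag: sum(tag.lower() in s for s in lowered) for tag in tags}

# Illustrative tag strings, one per leaderboard row.
sample = [
    "compas_short,sklearn,competition",
    "compas_short,sklearn,experiment",
    "compas_short,keras,competition",
    "compas_short,pytorch,experiment",
]
wanted = ["compas_short", "sklearn", "keras", "pytorch", "competition", "experiment"]
print(count_tags(sample, wanted))
```

Since this is substring matching, a tag such as `experiment` would also match a longer tag that contains it. Splitting on commas and comparing whole tags would be stricter if that ever becomes a concern.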

In [10]:
try:
    lb_data = playground.get_leaderboard()
    if isinstance(lb_data, dict):
        df_lb = pd.DataFrame(lb_data)
    else:
        df_lb = lb_data

    if df_lb.empty:
        print("Leaderboard is empty. (Possibly all submissions failed or were skipped.)")
    else:
        print(f"Leaderboard entries: {len(df_lb)}")
        if 'tags' in df_lb.columns:
            tag_series = df_lb['tags'].astype(str)
            print("Tag counts:")
            for t in ['compas_short', 'sklearn', 'keras', 'pytorch', 'competition', 'experiment']:
                print(f"  {t}: {tag_series.str.contains(t, case=False, na=False).sum()}")

        with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.width', 1200):
            print(df_lb.to_string(index=False))
except Exception as e:
    print(f"Failed to retrieve leaderboard: {e}")

Leaderboard entries: 4
 accuracy  f1_score  precision   recall ml_framework             model_type  depth  num_params  dense_layers  sigmoid_act  relu_act loss optimizer  Moral_Compass_Fairness     username                  timestamp  version
   0.6944  0.686371   0.687960 0.685409      sklearn RandomForestClassifier    NaN         NaN           NaN          NaN       NaN  NaN       NaN                    0.25 mikedparrott 2025-10-29 08:56:34.742668        2
   0.6928  0.678684   0.688056 0.676737      pytorch             MLPBasic()    NaN         NaN           NaN          NaN       NaN  NaN       NaN                    0.50 mikedparrott 2025-10-29 08:57:18.704495        4
   0.6752  0.671537   0.671076 0.673552      sklearn     LogisticRegression    NaN        70.0           NaN          NaN       NaN  NaN     lbfgs                    0.50 mikedparrott 2025-10-29 08:56:22.191170        1
   0.6832  0.667210   0.678341 0.665584        keras             Sequential    3.0      6657.0   

## 11. Next Steps
- Adjust model architectures or hyperparameters for experimentation.
- Use different fairness metadata strategies.
- Toggle playground visibility or extend with new frameworks.
- Integrate automated evaluation workflows.

This concludes the lightweight multi-framework COMPAS submission demo.

---
# Moral Compass Challenge Extension
The following sections extend the notebook to test creation of a new Moral Compass challenge table and simulate a user progressing through tasks and questions to obtain a final Moral Compass score.

Logic adapted from `tests/test_playground_moral_compass_challenge.py`.

## 12. Resolve API Base URL & Initialize Client
We attempt to resolve the Moral Compass API base URL using:
1. `MORAL_COMPASS_API_BASE_URL` environment variable.
2. `get_api_base_url()` fallback.

If resolution fails, the challenge section will be skipped.
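The two-step resolution order can be exercised offline by injecting the fallback. This is a sketch under stated assumptions: `resolve_base_url` and `failing_fallback` are hypothetical names, the fallback stands in for `get_api_base_url()`, and the URL is a placeholder rather than a real endpoint.

```python
import os

def resolve_base_url(fallback, env=None, var_name="MORAL_COMPASS_API_BASE_URL"):
    """Prefer the environment variable; otherwise call the fallback resolver."""
    env = os.environ if env is None else env
    env_url = env.get(var_name)
    if env_url:
        return env_url.rstrip("/")
    try:
        return fallback()
    except RuntimeError as exc:
        raise RuntimeError(f"Could not resolve API base URL; set {var_name}.") from exc

def failing_fallback():
    # Stand-in for a resolver that cannot find its configuration.
    raise RuntimeError("no configuration found")

# Environment variable wins, and the trailing slash is stripped.
print(resolve_base_url(failing_fallback, env={"MORAL_COMPASS_API_BASE_URL": "https://example.invalid/dev/"}))

# With nothing set and a failing fallback, the caller can skip the section.
try:
    resolve_base_url(failing_fallback, env={})
except RuntimeError as exc:
    print(f"Skipping challenge section: {exc}")
```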

In [14]:
# Set manually here; the codebase should resolve this dynamically.
os.environ['MORAL_COMPASS_API_BASE_URL'] = 'https://b22q73wp50.execute-api.us-east-1.amazonaws.com/dev'


In [13]:
from aimodelshare.moral_compass import MoralcompassApiClient
from aimodelshare.moral_compass.api_client import NotFoundError, ApiClientError
from aimodelshare.moral_compass.challenge import ChallengeManager
from aimodelshare.moral_compass.config import get_api_base_url

def resolve_api_base_url():
    env_url = os.getenv('MORAL_COMPASS_API_BASE_URL')
    if env_url:
        return env_url.rstrip('/')
    try:
        return get_api_base_url()
    except RuntimeError as e:
        raise RuntimeError(
            "Could not resolve API base URL. Set MORAL_COMPASS_API_BASE_URL or ensure terraform outputs are accessible."
        ) from e

try:
    mc_api_base_url = resolve_api_base_url()
    print(f"Resolved Moral Compass API base URL: {mc_api_base_url}")
    mc_api_client = MoralcompassApiClient(api_base_url=mc_api_base_url)
    mc_available = True
except Exception as e:
    print(f"Moral Compass API not available: {e}. Skipping challenge section.")
    mc_available = False

Resolved Moral Compass API base URL: https://b22q73wp50.execute-api.us-east-1.amazonaws.com/dev


## 13. Create / Ensure Challenge Table
We create (idempotently) a new challenge table for a Justice & Equity themed challenge.
Naming convention: `<playground_id>-mc`, falling back to `compas_playground_notebook-mc` when the playground ID is unavailable.
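After the create call, the cell polls for the table with a fixed delay because a newly created table may not be visible immediately. The polling pattern can be sketched on its own; `wait_until_available` and `flaky_check` are hypothetical names, and the check here is a local stand-in rather than a call to the Moral Compass API.

```python
import time

def wait_until_available(check, max_retries=8, delay_seconds=0.6, retry_on=(Exception,)):
    """Call check() until it succeeds; return True on success, False when retries run out."""
    for attempt in range(max_retries):
        try:
            check()
            return True
        except retry_on:
            # Sleep between attempts, but not after the final one.
            if attempt < max_retries - 1:
                time.sleep(delay_seconds)
    return False

# Hypothetical check that fails twice before succeeding.
calls = {"count": 0}

def flaky_check():
    calls["count"] += 1
    if calls["count"] < 3:
        raise LookupError("table not visible yet")

print(wait_until_available(flaky_check, delay_seconds=0.01, retry_on=(LookupError,)))
print(f"attempts used: {calls['count']}")
```

With the notebook's settings (8 attempts, 0.6 s apart) the loop waits at most about 4.2 seconds in total, since no sleep follows the last attempt.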

In [16]:
if mc_available:
    USERNAME = os.getenv('username') or input("Enter username for Moral Compass challenge: ") or 'notebook_user_mc'
    base_playground_id = playground_id or 'compas_playground_notebook'
    TABLE_ID = f"{base_playground_id}-mc"
    PLAYGROUND_URL = playground.playground_url if playground else None

    print(f"Attempting to create/ensure table: {TABLE_ID}")
    try:
        mc_api_client.create_table(
            TABLE_ID,
            display_name='Justice & Equity Challenge (Notebook)',
            playground_url=PLAYGROUND_URL
        )
        print("Table creation invoked (may already exist).")
    except Exception as e:
        print(f"Table creation skipped/failed (likely exists): {e}")

    # Confirm availability with retries
    import time
    max_retries = 8
    for attempt in range(max_retries):
        try:
            mc_api_client.get_table(TABLE_ID)
            print("Table metadata confirmed.")
            break
        except (NotFoundError, ApiClientError) as e:
            if attempt == max_retries - 1:
                print(f"Failed to confirm table after {max_retries} attempts: {e}")
                mc_available = False
            else:
                time.sleep(0.6)
else:
    print("Skipping table creation (API unavailable).")

Attempting to create/ensure table: compas_playground_notebook-mc
Table creation invoked (may already exist).
Table metadata confirmed.


## 14. Pre-Sync Smoke Test
Submit a minimal update to confirm that the endpoint accepts metrics and returns a `moralCompassScore`.

In [18]:
if mc_available:
    try:
        smoke_resp = mc_api_client.update_moral_compass(
            table_id=TABLE_ID,
            username=USERNAME,
            metrics={'accuracy': 0.5},
            tasks_completed=0,
            total_tasks=6,
            questions_correct=0,
            total_questions=14
        )
        assert 'moralCompassScore' in smoke_resp, 'Expected moralCompassScore in smoke response.'
        print(f"✓ Smoke test passed. Response: {smoke_resp}")
    except Exception as e:
        print(f"Smoke test failed: {e}")
        mc_available = False
else:
    print("Skipping smoke test (API unavailable).")

✓ Smoke test passed. Response: {'username': 'mikedparrott', 'metrics': {'accuracy': 0.5}, 'primaryMetric': 'accuracy', 'moralCompassScore': 0.0, 'tasksCompleted': 0, 'totalTasks': 6, 'questionsCorrect': 0, 'totalQuestions': 14, 'message': 'Moral compass data updated successfully', 'createdNew': True}


## 15. Build Synthetic Dataset & Select Best Model
We build a small synthetic COMPAS-like dataset, train logistic regression models with different `C` values, and keep the best accuracy to report to the challenge.

In [24]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def build_mc_dataset(n=200, seed=42):
    rng = np.random.default_rng(seed)
    race = rng.choice(['Black','White'], size=n, p=[0.5,0.5])
    sex = rng.choice(['Male','Female'], size=n, p=[0.6,0.4])
    age = rng.integers(18, 60, size=n)
    priors = rng.integers(0, 15, size=n)
    base = 0.3 + 0.03*priors + (race == 'Black')*0.05 + (sex == 'Male')*0.02
    prob = 1/(1+np.exp(-(base - 0.5)))
    label = (rng.random(n) < prob).astype(int)
    df = pd.DataFrame({'race':race,'sex':sex,'age':age,'priors':priors,'label':label})
    return df

def featurize_mc(df):
    d = df.copy()
    d['race_Black'] = (d['race']=='Black').astype(int)
    d['sex_Male'] = (d['sex']=='Male').astype(int)
    X = d[['age','priors','race_Black','sex_Male']]
    y = d['label']
    return X, y

def train_lr_variant(X, y, C):
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=13)
    pipe = Pipeline([
        ('scaler', StandardScaler()),
        ('lr', LogisticRegression(C=C, max_iter=2000))
    ])
    pipe.fit(Xtr, ytr)
    acc = pipe.score(Xte, yte)
    return pipe, acc

if mc_available:
    df_mc = build_mc_dataset()
    X_mc, y_mc = featurize_mc(df_mc)
    candidate_Cs = [0.1, 1.0, 3.0, 5]
    best_acc = -1.0
    best_C = None
    for C in candidate_Cs:
        modelC, accC = train_lr_variant(X_mc, y_mc, C)
        print(f"C={C} | Accuracy={accC:.4f}")
        if accC > best_acc:
            best_acc = accC
            best_C = C
    print(f"Best C: {best_C} with accuracy={best_acc:.4f}")
else:
    print("Skipping model selection (API unavailable).")

C=0.1 | Accuracy=0.5600
C=1.0 | Accuracy=0.5600
C=3.0 | Accuracy=0.5600
C=5 | Accuracy=0.5600
Best C: 0.1 with accuracy=0.5600
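All four candidates tie at 0.5600 in this run, and `C=0.1` is reported because the selection loop only replaces the current best on a strict improvement. The sketch below isolates that tie-breaking behaviour; `pick_best` is a hypothetical helper and the scores are copied from the output above.

```python
def pick_best(candidates):
    """Return (label, score) of the first candidate with the highest score."""
    best_label, best_score = None, float("-inf")
    for label, score in candidates:
        if score > best_score:  # strict comparison: ties keep the earlier candidate
            best_label, best_score = label, score
    return best_label, best_score

# (C, accuracy) pairs copied from the cell output.
results = [(0.1, 0.56), (1.0, 0.56), (3.0, 0.56), (5, 0.56)]
print(pick_best(results))
```

The held-out split here is 50 rows (25% of 200), so accuracy moves in steps of 0.02 and modest changes in `C` can easily leave it unchanged.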


## 16. Initialize Challenge Manager & Set Metric
We record the best accuracy as the primary metric for the user in the challenge.

In [20]:
from aimodelshare.moral_compass.challenge import ChallengeManager

if mc_available:
    try:
        manager = ChallengeManager(table_id=TABLE_ID, username=USERNAME, api_client=mc_api_client)
        manager.set_metric('accuracy', best_acc, primary=True)
        print(f"Primary metric 'accuracy' set to {best_acc:.4f}")
    except Exception as e:
        print(f"Failed to initialize ChallengeManager: {e}")
        mc_available = False
else:
    print("Skipping manager initialization (API unavailable).")

Primary metric 'accuracy' set to 0.5600


## 17. Progress Through Tasks & Questions
We iterate through all tasks, completing them and answering each question with the correct index. After each task block, we sync with the server and monitor score progression.

In [21]:
if mc_available:
    try:
        challenge = manager.challenge
        prev_score = 0.0
        for task in challenge.tasks:
            print(f"\nTask: {task.id} - Completing and answering questions...")
            manager.complete_task(task.id)
            for q in task.questions:
                manager.answer_question(task.id, q.id, selected_index=q.correct_index)
            sync_resp = manager.sync()
            mc_score = sync_resp.get('moralCompassScore', 0)
            print(f"Synced. Moral Compass Score: {mc_score:.4f}")
            if mc_score + 1e-9 < prev_score:
                print(f"Warning: Score decreased from {prev_score:.4f} to {mc_score:.4f}")
            prev_score = mc_score
        print("\nAll tasks completed and synced.")
    except Exception as e:
        print(f"Error during task progression: {e}")
        mc_available = False
else:
    print("Skipping task progression (API unavailable).")


Task: A - Completing and answering questions...
Synced. Moral Compass Score: 0.0933

Task: B - Completing and answering questions...
Synced. Moral Compass Score: 0.1867

Task: C - Completing and answering questions...
Synced. Moral Compass Score: 0.2800

Task: D - Completing and answering questions...
Synced. Moral Compass Score: 0.3733

Task: E - Completing and answering questions...
Synced. Moral Compass Score: 0.4667

Task: F - Completing and answering questions...
Synced. Moral Compass Score: 0.5600

All tasks completed and synced.
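The printed scores rise in equal steps of roughly 0.0933 and end at the accuracy value of 0.5600. One formula consistent with these numbers scales the primary metric by the fraction of tasks and questions completed. This is an assumption inferred from the output, not the server's documented scoring rule, and other formulas (for example, scaling by tasks alone) would fit this run equally well. `score_preview` is a hypothetical name.

```python
def score_preview(metric, tasks_done, total_tasks, questions_correct, total_questions):
    """Assumed formula: primary metric scaled by the share of tasks and questions completed."""
    denominator = total_tasks + total_questions
    if denominator == 0:
        return 0.0
    return metric * (tasks_done + questions_correct) / denominator

# Assumes six tasks with one question each, matching the totals this run reports.
for k in range(1, 7):
    print(f"after {k} task(s): {score_preview(0.56, k, 6, k, 6):.4f}")
```

Under this assumption the score cannot exceed the primary metric, which would explain why raising accuracy is the only way to move the score once every task and question is complete.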


In [28]:
# Raise the primary metric after all tasks are complete, then re-sync.
if mc_available:
    manager.set_metric('accuracy', 0.77, primary=True)
    sync_resp = manager.sync()
else:
    print("Skipping metric update (API unavailable).")

## 18. Final Summary & Leaderboard Validation
We retrieve the user's progress summary and check that the leaderboard entry agrees with the local score preview.
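The alignment check in the cell is one-sided: it only flags the case where the leaderboard score trails the local preview by more than 0.2. A small sketch of the same comparison makes that explicit; `is_aligned` is a hypothetical helper name.

```python
def is_aligned(leaderboard_score, local_score, tolerance=0.2):
    """True unless the leaderboard score trails the local preview by more than tolerance."""
    return not (leaderboard_score + tolerance < local_score)

print(is_aligned(0.77, 0.77))  # equal scores
print(is_aligned(0.50, 0.77))  # leaderboard lags by 0.27
print(is_aligned(0.95, 0.77))  # leaderboard ahead of local preview
```

A leaderboard score that is higher than the local preview always passes, so a symmetric check using `abs(leaderboard_score - local_score)` would be needed to catch drift in both directions.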

In [29]:
if mc_available:
    try:
        summary = manager.get_progress_summary()
        print("Progress Summary:")
        for k, v in summary.items():
            print(f"  {k}: {v}")
        assert summary['tasksCompleted'] == summary['totalTasks'], 'Not all tasks completed.'
        assert summary['questionsCorrect'] == summary['totalQuestions'], 'Not all questions answered correctly.'
        final_local = summary.get('localScorePreview', 0)
        assert final_local > 0, 'Final local score should be positive.'

        lb = mc_api_client.list_users(TABLE_ID, limit=100)
        entries = [u for u in lb.get('users', []) if u.get('username') == USERNAME]
        if not entries:
            print("User not found on leaderboard.")
        else:
            user_entry = entries[0]
            print("\nLeaderboard Entry:")
            for k, v in user_entry.items():
                print(f"  {k}: {v}")
            mc_score_lb = user_entry.get('moralCompassScore', 0)
            if mc_score_lb + 0.2 < final_local:
                print("Leaderboard score appears misaligned with local preview.")
            else:
                print("✓ Leaderboard score aligned sufficiently with local preview.")
    except Exception as e:
        print(f"Final summary/leaderboard validation failed: {e}")
else:
    print("Skipping final summary (API unavailable).")

Progress Summary:
  tasksCompleted: 6
  totalTasks: 6
  questionsCorrect: 6
  totalQuestions: 6
  metrics: {'accuracy': 0.77}
  primaryMetric: accuracy
  localScorePreview: 0.77

Leaderboard Entry:
  username: mikedparrott
  submissionCount: 0
  totalCount: 0
  lastUpdated: 2025-10-29T09:14:07.776905
  moralCompassScore: 0.77
  metrics: {'accuracy': 0.77}
  primaryMetric: accuracy
  tasksCompleted: 6
  totalTasks: 6
  questionsCorrect: 6
  totalQuestions: 6
✓ Leaderboard score aligned sufficiently with local preview.


## 19. Challenge Section Complete
If the API was available, you have now created a challenge table, submitted metric progress, completed tasks and questions, and validated the leaderboard entry.

Feel free to modify:
- Hyperparameter candidates
- Additional metrics
- Custom scoring logic
- Partial task completion for experimental flows

End of notebook.