huzr1999/OSCS

OSCS: Online Selection with Provable FAR Control for LLM Safety


Environment Setup

We use uv for dependency and environment management.

The project has been tested with Python 3.10.19.

1. Install uv

If you have not installed uv, run:

curl -LsSf https://astral.sh/uv/install.sh | sh

or

pip install uv

2. Create and Sync the Environment

The project ships with pyproject.toml and uv.lock, so you can create and synchronize the environment in one step:

uv sync

This will automatically:

  • create a virtual environment,
  • install the correct Python dependencies,
  • and reproduce the locked package versions from uv.lock.

Activate the environment if needed:

Linux / macOS

source .venv/bin/activate

Windows (PowerShell)

.venv\Scripts\Activate.ps1
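
To confirm that the setup took effect, a quick check along the following lines may help. It is a minimal sketch, not part of the repository: the helper names in_virtualenv and matches_tested_python are hypothetical, and the comparison only looks at the major.minor version (3.10), on the assumption that patch releases other than 3.10.19 behave the same.

```python
import sys

# major.minor of the release the README reports testing with (3.10.19)
TESTED_PYTHON = (3, 10)


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def matches_tested_python(version_info=sys.version_info) -> bool:
    """Return True when major.minor equals the tested release line."""
    return tuple(version_info[:2]) == TESTED_PYTHON


if __name__ == "__main__":
    print(f"virtual environment active: {in_virtualenv()}")
    print(f"Python {sys.version.split()[0]} on tested 3.10 line: "
          f"{matches_tested_python()}")
```

Saved under a name of your choice, the script can be run with the activated interpreter, or through uv run, which executes a command inside the project environment without manual activation.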

Running Experiments

We provide a run.sh script to reproduce experiments across different datasets, backdoor attack methods, and scoring functions.

Supported Configurations

Dataset

  • agnews
  • Yelp
  • HSOL

Backdoor Attacks (Poisoners)

  • badnets
  • addsent
  • stylebkd
  • synbkd

Scoring Functions

  • md (Mahalanobis Distance)
  • badacts

Defense Method

  • OSCS

Pretrained Models

  • roberta-base (default)
  • bert-base-uncased (optional)

Run All Experiments

First, make the script executable (strictly required only if you invoke it directly as ./run.sh):

chmod +x run.sh

Then execute:

bash run.sh

The script automatically iterates over all combinations of:

  • scoring functions,
  • backdoor attack methods,
  • and datasets,

and runs main.py with the corresponding configurations.
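
The grid that run.sh covers can be sketched in Python as a dry run that only builds the command lines. This is an illustration under assumptions, not the contents of run.sh: the loop nesting order, the exact spelling of the dataset names on the command line, and the helper name build_commands are all hypothetical, while the flags and values are taken from this README.

```python
import shlex
from itertools import product

# Values copied from "Supported Configurations" in this README.
DATASETS = ["agnews", "Yelp", "HSOL"]
POISONERS = ["badnets", "addsent", "stylebkd", "synbkd"]
SCORES = ["md", "badacts"]


def build_commands(model_name="roberta-base", T=20000):
    """Build one main.py argument list per (score, poisoner, dataset) triple."""
    commands = []
    # Assumed nesting order: scoring function, then attack, then dataset.
    for score, poisoner, dataset in product(SCORES, POISONERS, DATASETS):
        commands.append([
            "python", "main.py",
            "--dataset_name", dataset,
            "--poisoner_name", poisoner,
            "--method", "OSCS",
            "--score_name", score,
            "--model_name", model_name,
            "--T", str(T),
        ])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in build_commands():
        print(shlex.join(cmd))
```

With 2 scoring functions, 4 attacks, and 3 datasets, the sketch yields 24 command lines. Printing them first is a cheap way to review the full sweep before committing compute to it.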


Script Details

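The core command executed by run.sh is shown below for the agnews dataset; <poisoner_name> and <score_name> are placeholders for the values listed under Supported Configurations: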

python main.py \
  --dataset_name agnews \
  --poisoner_name <poisoner_name> \
  --method OSCS \
  --score_name <score_name> \
  --model_name roberta-base \
  --T 20000
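
To make the accepted values of each flag explicit, the following sketch mirrors the command above with argparse. The real parser in main.py is not shown in this README, so everything here is an assumption: the function name build_parser, the use of choices, which flags are required, and the default for --T (the README only shows the value 20000 passed by run.sh and does not describe what T controls).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for main.py's command-line interface."""
    parser = argparse.ArgumentParser(
        description="Sketch of the flags listed in the README (not main.py itself)"
    )
    parser.add_argument("--dataset_name", required=True,
                        choices=["agnews", "Yelp", "HSOL"])
    parser.add_argument("--poisoner_name", required=True,
                        choices=["badnets", "addsent", "stylebkd", "synbkd"])
    parser.add_argument("--method", default="OSCS", choices=["OSCS"])
    parser.add_argument("--score_name", required=True,
                        choices=["md", "badacts"])
    # The README names roberta-base as the default model.
    parser.add_argument("--model_name", default="roberta-base",
                        choices=["roberta-base", "bert-base-uncased"])
    # Assumed default: the value run.sh passes according to the README.
    parser.add_argument("--T", type=int, default=20000)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

Restricting each flag to the documented values means a typo such as an unknown poisoner name fails at parse time rather than partway through a run.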

To switch from RoBERTa to BERT, change the --model_name argument in run.sh to:

--model_name bert-base-uncased

Model Checkpoints and Outputs

By default, trained models and intermediate results are saved to:

./models

You may change the output directory by modifying MODEL_SAVE_PATH in run.sh.

About

Code repository for the ICML 2026 paper "OSCS: Online Selection with Provable FAR Control for LLM Safety".
