OSCS: Online Selection with Provable FAR Control for LLM Safety

Environment Setup

We use uv for dependency and environment management.

The project has been tested with Python 3.10.19.

1. Install uv

If you have not installed uv, run:

curl -LsSf https://astral.sh/uv/install.sh | sh

or

pip install uv

2. Create and Sync the Environment

Since the project already includes pyproject.toml and uv.lock, you can directly create and synchronize the environment with:

uv sync

This will automatically:

create a virtual environment in .venv,
install the project's Python dependencies,
and pin them to the exact versions recorded in uv.lock.

Activate the environment if you want to call python directly (commands launched through uv run do not need activation):

Linux / macOS

source .venv/bin/activate

Windows (PowerShell)

.venv\Scripts\Activate.ps1
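
After activation on Linux / macOS, one way to confirm that the interpreter on your PATH belongs to the tested Python 3.10 series is a small check like the one below. This is only a sketch: the helper name is_tested_python is hypothetical and is not part of the repository.

```shell
# Hypothetical helper (not part of the repository): succeed when a
# "Python X.Y.Z" version string belongs to the 3.10 series.
is_tested_python() {
  case "$1" in
    "Python 3.10."*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report on whichever python is first on PATH, if there is one.
if command -v python >/dev/null 2>&1; then
  version="$(python --version 2>&1)"
  if is_tested_python "$version"; then
    echo "OK: $version"
  else
    echo "Note: $version is outside the tested 3.10 series"
  fi
fi
```

A mismatch does not necessarily mean the project will fail; it only means the interpreter differs from the one the project was tested with.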

Running Experiments

We provide a run.sh script to reproduce experiments across different datasets, backdoor attack methods, and scoring functions.

Supported Configurations

Datasets

agnews
Yelp
HSOL

Backdoor Attacks (Poisoners)

badnets
addsent
stylebkd
synbkd

Scoring Functions

md (Mahalanobis Distance)
badacts

Defense Method

OSCS

Pretrained Models

roberta-base (default)
bert-base-uncased (optional)

Run All Experiments

Optionally make the script executable (this is only required if you invoke it as ./run.sh):

chmod +x run.sh

Then run it:

bash run.sh

The script automatically iterates over all combinations of:

scoring functions,
backdoor attack methods,
and datasets,

and runs main.py with the corresponding configurations.
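
The sweep described above can be pictured as three nested loops. The sketch below is a dry run that only prints the commands instead of launching them; the loop order, the variable names, and the function name list_commands are assumptions, and the actual run.sh may be organized differently.

```shell
# Dry-run sketch of the sweep (assumed structure, not the actual run.sh).
datasets=(agnews Yelp HSOL)
poisoners=(badnets addsent stylebkd synbkd)
scores=(md badacts)

# Print one main.py command per (score, poisoner, dataset) combination.
list_commands() {
  local score poisoner dataset
  for score in "${scores[@]}"; do
    for poisoner in "${poisoners[@]}"; do
      for dataset in "${datasets[@]}"; do
        printf 'python main.py --dataset_name %s --poisoner_name %s --method OSCS --score_name %s --model_name roberta-base --T 20000\n' \
          "$dataset" "$poisoner" "$score"
      done
    done
  done
}

list_commands
```

With the configurations listed in this README (2 scoring functions, 4 attacks, 3 datasets), a full sweep of this shape amounts to 24 runs.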

Script Details

The core command executed by run.sh is:

python main.py \
  --dataset_name agnews \
  --poisoner_name <poisoner_name> \
  --method OSCS \
  --score_name <score_name> \
  --model_name roberta-base \
  --T 20000

To switch from RoBERTa to BERT, modify the model name in run.sh:

--model_name bert-base-uncased
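
If you prefer not to edit run.sh by hand, a copy with the model name swapped can be generated with sed. The sketch below is illustrative only: swap_model and run_bert.sh are hypothetical names, and the demo operates on a throwaway file so that the real run.sh stays untouched. It assumes the flag appears in the script literally as --model_name roberta-base.

```shell
# Hypothetical helper: write a copy of a script with the model flag swapped.
swap_model() {
  local src="$1" dst="$2"
  sed 's/--model_name roberta-base/--model_name bert-base-uncased/' "$src" > "$dst"
}

# Demo on a temporary stand-in file rather than the real run.sh.
demo_dir="$(mktemp -d)"
printf 'python main.py --model_name roberta-base --T 20000\n' > "$demo_dir/run.sh"
swap_model "$demo_dir/run.sh" "$demo_dir/run_bert.sh"
cat "$demo_dir/run_bert.sh"
```

Writing to a separate file keeps the default RoBERTa configuration available alongside the BERT variant.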

Model Checkpoints and Outputs

Trained models and intermediate results are saved to:

./models

You may change the output directory by modifying MODEL_SAVE_PATH in run.sh.
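
One common pattern for making such a path configurable without editing the script each time is an environment override with a default. The sketch below assumes MODEL_SAVE_PATH is a plain shell variable in run.sh; the helper resolve_save_path is hypothetical, and how the value is passed on to main.py is not shown here.

```shell
# Hypothetical helper: return the given path, or ./models when it is empty.
resolve_save_path() {
  printf '%s\n' "${1:-./models}"
}

# Use MODEL_SAVE_PATH from the environment if set, else the documented default.
MODEL_SAVE_PATH="$(resolve_save_path "${MODEL_SAVE_PATH:-}")"
echo "Saving checkpoints to: $MODEL_SAVE_PATH"
```

With a pattern like this, a different directory could be chosen per run by setting MODEL_SAVE_PATH in the environment before invoking the script.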
