mj064/meta_hack

title: ContentGuardEnv
emoji: 🛡️
colorFrom: indigo
colorTo: blue
sdk: docker
app_port: 7860
tags:
  - openenv
  - trust-and-safety
  - meta
  - llama-3
  - moderation-research
pinned: false

ContentGuardEnv

I built ContentGuardEnv for the Meta x Hugging Face Hackathon 2026 as a practical moderation environment where an AI agent has to do more than just classify text.

Instead of only asking "is this toxic?", the environment asks the model to:

  1. Detect the policy category.
  2. Choose a proportional enforcement action.
  3. Explain the decision in an appeal-style format.

Live Deployment

What This Project Does

ContentGuardEnv is an OpenEnv-style environment with three difficulty tiers:

  • Easy: category detection
  • Medium: enforcement action + severity
  • Hard: appeal ruling + policy references
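As a rough illustration of how the three tiers differ in what the agent must return, here is a minimal sketch. The class names, field names, and the tier-to-field mapping are all assumptions made for this example; the actual schema lives in server/env/ and may look different.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Tier(str, Enum):
    EASY = "easy"      # category detection
    MEDIUM = "medium"  # enforcement action + severity
    HARD = "hard"      # appeal ruling + policy references


@dataclass
class ModerationAction:
    """Hypothetical agent output; field names are illustrative only."""
    category: Optional[str] = None
    enforcement: Optional[str] = None
    severity: Optional[int] = None
    ruling: Optional[str] = None
    policy_refs: List[str] = field(default_factory=list)


# Assumed mapping from tier to the fields the agent is expected to fill in.
REQUIRED_FIELDS = {
    Tier.EASY: ("category",),
    Tier.MEDIUM: ("enforcement", "severity"),
    Tier.HARD: ("ruling", "policy_refs"),
}


def missing_fields(tier: Tier, action: ModerationAction) -> List[str]:
    """Return the required fields that the action left empty for this tier."""
    empty_values = (None, "", [])
    return [
        name for name in REQUIRED_FIELDS[tier]
        if getattr(action, name) in empty_values
    ]
```

A check like this could run before grading so that an incomplete answer gets clear feedback instead of a silent zero reward.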

It includes:

  • A FastAPI backend for reset/step/state APIs
  • A WebSocket reasoning stream for live agent traces
  • A browser dashboard to run episodes and inspect rewards
  • A grading pipeline that returns reward + feedback for each decision
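To make the "reward + feedback" idea concrete, here is a minimal sketch of a grader for the easy tier. It assumes exact-match scoring on the category label, which is a simplification chosen for this example; `GradeResult` and `grade_category` are hypothetical names, not the project's actual grader API.

```python
from dataclasses import dataclass


@dataclass
class GradeResult:
    reward: float   # score for this decision, here either 0.0 or 1.0
    feedback: str   # short human-readable explanation


def grade_category(predicted: str, expected: str) -> GradeResult:
    """Exact-match grading (assumed rule), ignoring case and outer whitespace."""
    if predicted.strip().lower() == expected.strip().lower():
        return GradeResult(1.0, "Correct category.")
    return GradeResult(0.0, f"Expected '{expected}', got '{predicted}'.")
```

The medium and hard tiers would likely need partial credit (for example, a correct action with the wrong severity), which this sketch does not attempt.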

Why I Built It

The goal was to simulate the type of moderation decisions that are messy in real systems: ambiguous context, policy tradeoffs, and high-cost mistakes.

This project is meant to be usable as both:

  • A demo app for human-in-the-loop moderation testing
  • A benchmark harness for agent evaluation loops

Stack

  • Python + FastAPI
  • Vanilla JS/CSS frontend
  • OpenAI/Hugging Face compatible inference routing
  • Dockerized runtime for Hugging Face Spaces

Run Locally

  1. Install dependencies.
     pip install -r requirements.txt
  2. Set environment variables (or use a local .env).
     API_BASE_URL=https://api.openai.com/v1
     MODEL_NAME=gpt-4o-mini
     HF_TOKEN=your_token_here
  3. Start the app.
     python server/app.py

Open http://localhost:7860
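The three environment variables listed above could be read and validated at startup roughly like this. This is a sketch only: `InferenceConfig` and `load_config` are hypothetical names, and the real app may load its settings differently (for example through a .env loader).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


@dataclass(frozen=True)
class InferenceConfig:
    api_base_url: str
    model_name: str
    hf_token: str


def load_config(env: Optional[Mapping[str, str]] = None) -> InferenceConfig:
    """Build the config from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        # Fail early with a message that names every missing variable.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return InferenceConfig(
        api_base_url=env["API_BASE_URL"].rstrip("/"),
        model_name=env["MODEL_NAME"],
        hf_token=env["HF_TOKEN"],
    )
```

Failing fast on a missing variable tends to be easier to debug than an authentication error on the first inference call.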

API Overview

  • POST /reset
  • POST /step/{episode_id}
  • GET /state/{episode_id}
  • GET /health
  • WS /ws
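The HTTP routes above can be exercised with nothing but the standard library. The sketch below only builds the requests; the JSON bodies are placeholders, since the request schemas are not documented here and the field names you pass in are your own assumptions. The helper names are hypothetical as well.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:7860"  # local default from the Run Locally section


def _json_post(url: str, payload: dict) -> Request:
    """Build a POST request carrying a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="POST")


def reset_request(payload: dict, base_url: str = BASE_URL) -> Request:
    """POST /reset (payload shape is an assumption)."""
    return _json_post(f"{base_url}/reset", payload)


def step_request(episode_id: str, action: dict, base_url: str = BASE_URL) -> Request:
    """POST /step/{episode_id} (action shape is an assumption)."""
    return _json_post(f"{base_url}/step/{quote(episode_id, safe='')}", action)


def state_request(episode_id: str, base_url: str = BASE_URL) -> Request:
    """GET /state/{episode_id}."""
    return Request(f"{base_url}/state/{quote(episode_id, safe='')}", method="GET")
```

With the server running, any of these can be sent with `urllib.request.urlopen(req)` and the response decoded with `json.load`.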

Deploy to Hugging Face Space

This repo includes a helper script:

python sync_repo.py

It syncs the project folder to the Space while ignoring local-only artifacts.
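The "ignoring local-only artifacts" step can be pictured as a simple pattern filter over relative paths. This is a sketch of the idea, not the contents of sync_repo.py; the ignore patterns and the `files_to_sync` helper are assumptions for illustration.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Sequence

# Assumed examples of local-only artifacts; the real list may differ.
IGNORE_PATTERNS = (".env", ".git/*", "*.pyc", "__pycache__/*")


def files_to_sync(
    paths: Iterable[str],
    ignore: Sequence[str] = IGNORE_PATTERNS,
) -> List[str]:
    """Keep only the relative paths that match none of the ignore patterns."""
    return [
        path for path in paths
        if not any(fnmatchcase(path, pattern) for pattern in ignore)
    ]
```

For example, `files_to_sync(["server/app.py", ".env"])` keeps only `server/app.py`, so secrets in a local .env never reach the Space.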

Project Layout

  • server/app.py: FastAPI app + WebSocket gateway
  • server/env/: environment, tasks, graders, data generation
  • server/static/: dashboard HTML/CSS/JS
  • inference.py: script for benchmark/evaluation flows
  • sync_repo.py: one-command Hugging Face Space sync

Notes

This project is still being actively iterated on as part of the hackathon, so the UI and evaluation behavior may change as edge cases are discovered.
