SCM Secret Scanner

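A local-first scanner that looks for possible secret leaks in local checkouts and organizes them into reviewable results.

The first goal is not a tool you run once and forget. The goal is to complete a small pipeline that can reproduce the same scan results and decisions from the same input.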

targets.local.yaml -> workspace -> Gitleaks -> Finding -> local store -> report -> gate

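This is a public repository. Do not commit real target lists, scan results, exports from external security tools, credentials, internal names, or raw findings.

Keep local input/output artifacts of that kind under private/ or outside the repository. Operations DB and notification integrations are not implemented directly in this repository. What this public repository owns is the provider-neutral adapter/env contract and the redacted snapshot export.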

What works today

The current implementation focuses on the local execution path.

Reads the local checkouts listed in targets.local.yaml.
Runs Gitleaks to find secret candidates.
Converts scanner output into Finding, the internal standard model.
Stores results in JSONL or a DynamoDB-compatible local store.
Regenerates the report and gate decision from stored results.
Computes precision, recall, and false negatives against a synthetic corpus.
Assists finding triage with an Ollama-compatible verifier.
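
As an illustration of the normalization step, the sketch below maps one Gitleaks JSON report entry onto a small record. The `Finding` fields and the `normalize` and `to_jsonl` helpers are hypothetical and are not this repository's actual model; the input keys (`RuleID`, `File`, `StartLine`, `Commit`, `Fingerprint`, `Secret`) follow the Gitleaks v8 JSON report format. The raw secret is replaced by a short hash prefix so the record stays reviewable without carrying the value, and keys are sorted so the same input serializes to the same line.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class Finding:
    """Hypothetical normalized record; the real Finding model may differ."""
    rule_id: str
    file: str
    line: int
    commit: str
    fingerprint: str
    secret_sha256_prefix: str  # redacted stand-in for the raw secret


def normalize(entry: dict) -> Finding:
    """Convert one Gitleaks report entry into a Finding without keeping the secret."""
    digest = hashlib.sha256(entry.get("Secret", "").encode("utf-8")).hexdigest()
    return Finding(
        rule_id=entry["RuleID"],
        file=entry["File"],
        line=int(entry["StartLine"]),
        commit=entry.get("Commit", ""),
        fingerprint=entry.get("Fingerprint", ""),
        secret_sha256_prefix=digest[:12],
    )


def to_jsonl(findings: List[Finding]) -> str:
    """Serialize one JSON object per line, with sorted keys for stable output."""
    return "\n".join(json.dumps(asdict(f), sort_keys=True) for f in findings)


# Synthetic example entry; the value below is not a real credential.
sample = {
    "RuleID": "generic-api-key",
    "File": "config/settings.py",
    "StartLine": 12,
    "Commit": "",
    "Fingerprint": "config/settings.py:generic-api-key:12",
    "Secret": "not-a-real-secret",
}
finding = normalize(sample)
print(to_jsonl([finding]))
```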

Detailed progress is tracked in the progress dashboard and the project overview.

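Current scope

In scope today:

Scanning local filesystem checkouts
Gitleaks-based secret detection
Normalizing results into the Finding model
Storing scan history and findings in a local store
Generating report/gate output from stored results
Measuring quality against a synthetic corpus
Verifier experiments that use only redacted metadata

Out of scope today:

Committing real external security tool exports or alert data
Organization-wide repository discovery
Integration with managed cloud execution environments
Integration with internal endpoints or notification systems
Introducing a primary scanner that replaces Gitleaks
Managed SAST/SCA integration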

Quick start

Prerequisites: uv, gitleaks v8, a security-scanner source checkout, and a separate local checkout to scan.

Run the commands below from the root of the security-scanner repository. List the paths of the repositories to scan in targets.local.yaml.

uv sync
cp examples/targets.local.example.yaml targets.local.yaml
# Add your local checkout paths to targets.local.yaml.

uv run security-scanner scan   --manifest targets.local.yaml --output private/findings.jsonl
uv run security-scanner report --findings private/findings.jsonl
uv run security-scanner gate   --findings private/findings.jsonl --max 0
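
The gate step can be pictured as a threshold check over stored findings. The following is a minimal sketch, assuming the findings file holds one JSON object per line and that `--max` is the largest finding count still allowed to pass. `count_findings` and `gate` are hypothetical names, and the real gate may apply policy filters that this sketch omits.

```python
import json
import tempfile
from pathlib import Path


def count_findings(path: Path) -> int:
    """Count non-empty JSONL lines, failing loudly on malformed JSON."""
    count = 0
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                json.loads(line)  # raises ValueError if the line is not valid JSON
                count += 1
    return count


def gate(path: Path, max_findings: int) -> int:
    """Return a process exit code: 0 when within the limit, 1 otherwise."""
    return 0 if count_findings(path) <= max_findings else 1


# Demonstration with a temporary file that contains a single synthetic finding.
with tempfile.TemporaryDirectory() as tmp:
    findings_file = Path(tmp) / "findings.jsonl"
    findings_file.write_text('{"rule_id": "example"}\n', encoding="utf-8")
    strict = gate(findings_file, max_findings=0)
    lenient = gate(findings_file, max_findings=1)

print(strict, lenient)
```

Because the decision depends only on the stored file, rerunning the check on the same file gives the same exit code.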

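private/ is gitignored. Real scan results and local settings may live inside this boundary, but operations integration code and execution wrappers do not belong in this public repository.

For step-by-step instructions, manifest fields, and troubleshooting, see the getting started guide.

Local NoSQL store

The DynamoDB-compatible backend is a store for validating query patterns locally. Managed storage integration is currently out of scope.

The local DB runs as the DynamoDB Local container defined in the repository's docker-compose.yml. From the host it responds at http://localhost:4567, and data persists in a named Compose volume.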

docker compose up -d db

If port 4567 on the host is already in use, set SECURITY_SCANNER_DYNAMO_HOST_PORT to start the container on a different port. Worker containers use the Compose-internal endpoint, so they keep working unchanged.

SECURITY_SCANNER_DYNAMO_HOST_PORT=14567 docker compose up -d db
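
When the host port is remapped, host-side commands need the matching endpoint URL for `--dynamodb-endpoint-url`. The helper below is a hypothetical sketch of deriving that URL from the same variable; the source does not say whether the CLI reads `SECURITY_SCANNER_DYNAMO_HOST_PORT` itself, so treat `dynamo_endpoint_url` as an illustration only.

```python
import os
from typing import Mapping, Optional


def dynamo_endpoint_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Build the host-side endpoint URL, defaulting to port 4567."""
    env = os.environ if env is None else env
    port = int(env.get("SECURITY_SCANNER_DYNAMO_HOST_PORT", "4567"))
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"http://localhost:{port}"


default_url = dynamo_endpoint_url({})
remapped_url = dynamo_endpoint_url({"SECURITY_SCANNER_DYNAMO_HOST_PORT": "14567"})
print(default_url, remapped_url)
```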

To verify a single public HTTPS repo right away on a new machine, you can use the Docker path.

SECURITY_SCANNER_QUICKSTART_TARGET=https://github.com/<owner>/<repo> \
  docker compose up --build --abort-on-container-exit --exit-code-from quickstart quickstart

For a custom GitLab domain, the provider cannot be determined from the URL alone, so specify a provider hint as well.

SECURITY_SCANNER_QUICKSTART_TARGET=https://source.example.test/<group>/<repo> \
SECURITY_SCANNER_SCM_PROVIDER=gitlab \
  docker compose up --build --abort-on-container-exit --exit-code-from quickstart quickstart
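
The reason a hint is needed can be sketched as follows: a well-known host identifies its provider, while a self-hosted domain does not. The `KNOWN_HOSTS` mapping and `resolve_provider` function are assumptions made for illustration, not the scanner's actual resolution logic.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed mapping of well-known hosts to provider names.
KNOWN_HOSTS = {"github.com": "github", "gitlab.com": "gitlab"}


def resolve_provider(target_url: str, hint: Optional[str] = None) -> str:
    """Prefer an explicit hint; otherwise infer the provider from the host."""
    if hint:
        return hint.strip().lower()
    host = (urlparse(target_url).hostname or "").lower()
    if host in KNOWN_HOSTS:
        return KNOWN_HOSTS[host]
    raise ValueError(f"cannot infer provider for host {host!r}; pass a hint")


inferred = resolve_provider("https://github.com/owner/repo")
hinted = resolve_provider("https://source.example.test/group/repo", hint="gitlab")

try:
    resolve_provider("https://source.example.test/group/repo")
    unresolved = False
except ValueError:
    unresolved = True  # custom domain without a hint cannot be resolved

print(inferred, hinted, unresolved)
```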

Compose runtime contract (#87)

docker-compose.yml now defines the public container scale-unit contract for db, quickstart, secret-discovery, secret-scan-worker, llm-verify, ghas-compare, vuln-scan, vuln-freshness, and notification-publisher.

quickstart is the scanner runtime path reserved for local verification and the only scanner runtime service that bypasses the runtime owner gate. db is local infrastructure. Every operating role starts through the owner gate and stays disabled until its SECURITY_SCANNER_OWNER_<ROLE> value matches SECURITY_SCANNER_RUNTIME_LAUNCHER=compose.
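
The owner gate rule can be read as a simple equality check. The sketch below assumes that `<ROLE>` is the service name upper-cased with hyphens turned into underscores; that naming rule, and the `owner_env_name` and `role_enabled` helpers, are assumptions rather than the repository's implementation.

```python
from typing import Mapping


def owner_env_name(role: str) -> str:
    """Assumed naming rule: upper-case the role and replace '-' with '_'."""
    return "SECURITY_SCANNER_OWNER_" + role.upper().replace("-", "_")


def role_enabled(role: str, env: Mapping[str, str]) -> bool:
    """Fail closed: a role runs only when its owner equals the active launcher."""
    launcher = env.get("SECURITY_SCANNER_RUNTIME_LAUNCHER", "")
    owner = env.get(owner_env_name(role), "")
    return bool(launcher) and owner == launcher


env = {
    "SECURITY_SCANNER_RUNTIME_LAUNCHER": "compose",
    "SECURITY_SCANNER_OWNER_SECRET_SCAN_WORKER": "compose",
    "SECURITY_SCANNER_OWNER_LLM_VERIFY": "nomad",
}
worker_on = role_enabled("secret-scan-worker", env)  # owner matches launcher
verify_on = role_enabled("llm-verify", env)          # owned by another launcher
unset_on = role_enabled("vuln-scan", env)            # no owner configured
print(worker_on, verify_on, unset_on)
```

Keeping the default as disabled means two launchers cannot both start the same role unless someone sets the same owner value for both.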

notification-publisher is only a public contract slot. The default command is fail-closed and does not send notifications, publish to an operations store, or claim delivery. A private adapter can replace that slot outside this public repository.

#87 stops at the Docker/Compose contract. #105 Phase A decides scaling intent and rollout guardrails, #88 owns elastic discovery work distribution, and #105 Phase B maps the settled contracts to Nomad.

#105 Phase A keeps scaling intent at contract level. It does not set fixed worker counts, fixed shard ownership, Nomad job specs, or autoscale policy.

Role	Phase A scaling intent
`db`	Singleton local infrastructure dependency.
`quickstart`	Singleton local proof/batch path and the owner-gate bypass exception.
`secret-discovery`	Scheduled/batch discovery producer today; elastic discovery worker target only after #88 defines safe work distribution.
`secret-scan-worker`	Horizontally scalable queue worker when lease and idempotency gates are respected.
`llm-verify`	Horizontally scalable worker, constrained by verifier endpoint budget and timeout behavior.
`ghas-compare`	Scheduled/batch comparison role, not an always-on scale-out target in Phase A.
`vuln-scan`	Horizontally scalable scan worker/batch role when artifact and cache boundaries remain isolated.
`vuln-freshness`	Scheduled/batch freshness evaluation role.
`notification-publisher`	Singleton public fail-closed slot; private adapters decide real delivery topology outside this repository.

#88 must treat secret-discovery scale-out as elastic work distribution, not fixed producer ownership. The handoff checklist is:

provider-specific backlog namespace;
lease-based work claiming;
idempotent enqueue/write boundary;
retry and dead-letter state by provider/work type;
duplicate prevention when worker count changes;
no fixed producer instance ownership;
launcher-neutral invocation contract;
owner gate / activation gate reuse so multiple launchers do not run the same producer or worker concurrently.

#88 adds that launcher-neutral discovery work contract as provider-scoped commands. discovery-work enqueue --provider <provider> materializes provider/repo work rows from the catalog. discovery-work drain --provider <provider> leases repo work and reuses the existing single-repo incremental discovery path to enqueue deterministic SCAN_JOB rows. discovery-work backlog --provider <provider> and discovery-work reap-expired --provider <provider> expose the provider-scoped observability/recovery seam. The Compose secret-discovery projection runs enqueue once, then drain --daemon; scaled replicas rely on idempotent enqueue and lease fencing rather than fixed producer ownership. discover-updates remains the compatibility path for direct local sweeps.
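
To make "idempotent enqueue and lease fencing" concrete, here is a minimal in-memory sketch. `WorkRow` and `WorkTable` are hypothetical stand-ins; the actual work rows, key format, and store operations are not shown in this document, and a real store would express these checks as conditional writes.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class WorkRow:
    """One provider/repo work item (hypothetical shape)."""
    lease_owner: Optional[str] = None
    lease_expires_at: float = 0.0
    fence: int = 0  # incremented on every successful claim
    done: bool = False


class WorkTable:
    """In-memory stand-in for a store that supports conditional writes."""

    def __init__(self) -> None:
        self.rows: Dict[str, WorkRow] = {}

    def enqueue(self, provider: str, repo: str) -> bool:
        """Idempotent enqueue: a deterministic key makes repeats a no-op."""
        key = f"{provider}#{repo}"
        if key in self.rows:
            return False
        self.rows[key] = WorkRow()
        return True

    def claim(self, worker: str, now: float, ttl: float) -> Optional[Tuple[str, int]]:
        """Lease the first unfinished row whose lease is free or expired."""
        for key, row in self.rows.items():
            if not row.done and row.lease_expires_at <= now:
                row.lease_owner = worker
                row.lease_expires_at = now + ttl
                row.fence += 1
                return key, row.fence
        return None

    def complete(self, key: str, fence: int) -> bool:
        """Accept completion only from the holder of the latest fence token."""
        row = self.rows[key]
        if row.done or row.fence != fence:
            return False
        row.done = True
        return True


table = WorkTable()
first_enqueue = table.enqueue("gitlab", "group/repo")
second_enqueue = table.enqueue("gitlab", "group/repo")  # duplicate, ignored

key_a, fence_a = table.claim("worker-a", now=0.0, ttl=10.0)
blocked = table.claim("worker-b", now=5.0, ttl=10.0)  # lease still held
key_b, fence_b = table.claim("worker-b", now=20.0, ttl=10.0)  # lease expired

stale_result = table.complete(key_a, fence_a)  # worker-a lost the lease
fresh_result = table.complete(key_b, fence_b)
print(first_enqueue, second_enqueue, blocked, stale_result, fresh_result)
```

With this shape, adding or removing workers changes who claims a row, not which rows exist, which is the property the handoff checklist asks for.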

uv run security-scanner init-storage \
  --storage-backend dynamodb \
  --dynamodb-endpoint-url http://localhost:4567 \
  --dynamodb-table security_scanner_local_dev

uv run security-scanner scan \
  --manifest targets.local.yaml \
  --storage-backend dynamodb

uv run security-scanner report \
  --storage-backend dynamodb \
  --scan-run-id scan_<id>

uv run security-scanner gate \
  --storage-backend dynamodb \
  --scan-run-id scan_<id> \
  --max 0

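Running a scan prints a scan run ID. To look at one run only, pass that value to --scan-run-id. Omit it only when the decision should cover the entire store.

For the scan-all flow, which scans multiple repositories registered in the catalog (add-target) in one pass, see the "Periodic scan local test" section of the getting started guide. By default scan-all does not call the verifier; pass --verify-artifacts explicitly to reflect terminal verifier verdicts in the disposition.

When an external operations runner needs to read the latest state, export only the public-safe snapshot. This command is read-only and neither publishes to an operations DB nor sends notifications.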

uv run security-scanner export-latest-snapshot \
  --storage-backend dynamodb \
  --no-findings \
  --redact-repo-identifiers \
  --output private/latest-snapshot.json

Even when an external runner attaches a publish/notification adapter, this repository validates only the generic env contract. Delivery is disabled by default, and this command does not perform any real publish or send.

uv run security-scanner validate-operations-contract --json

The schema and query criteria are documented in the source scan results NoSQL Schema.

Evaluation and verifier

Quality is evaluated first against a synthetic corpus, not against real private repositories.

uv run security-scanner scan \
  --manifest eval/synthetic-corpus/targets.local.example.yaml \
  --output private/eval-findings.jsonl

uv run security-scanner evaluate \
  --expected eval/synthetic-corpus/expected-findings.example.json \
  --findings private/eval-findings.jsonl
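
For reference, the metrics reported by an evaluation of this kind can be computed from two sets. The sketch below identifies a finding by a `(file, rule_id, line)` tuple, which is an assumed identity; the `evaluate` function is illustrative and the CLI's actual matching rules may differ.

```python
from typing import Dict, Set, Tuple

FindingKey = Tuple[str, str, int]  # (file, rule_id, line), a hypothetical identity


def evaluate(expected: Set[FindingKey], found: Set[FindingKey]) -> Dict[str, object]:
    """Compare scanner output against the expected findings of a corpus."""
    true_positives = expected & found
    false_negatives = expected - found
    # An empty denominator is reported as 0.0 here by convention.
    precision = len(true_positives) / len(found) if found else 0.0
    recall = len(true_positives) / len(expected) if expected else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "false_negatives": sorted(false_negatives),
    }


expected = {("a.py", "rule-x", 1), ("b.py", "rule-y", 7), ("c.py", "rule-x", 3)}
found = {("a.py", "rule-x", 1), ("b.py", "rule-y", 7), ("d.md", "rule-z", 9)}
metrics = evaluate(expected, found)
print(metrics)
```

Here two of three reported findings are expected and two of three expected findings are reported, so precision and recall are both 2/3, and the one missed entry is listed as a false negative.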

Compare results before and after the verifier in the same way.

If Ollama is installed on the Ubuntu host that runs the scanner, point the host setting at localhost.

export SECURITY_SCANNER_OLLAMA_HOST=http://127.0.0.1:11434
export SECURITY_SCANNER_OLLAMA_MODEL=lfm2.5-thinking

uv run security-scanner verify \
  --findings private/eval-findings.jsonl \
  --output private/eval-verified-findings.jsonl

uv run security-scanner evaluate \
  --expected eval/synthetic-corpus/expected-findings.example.json \
  --findings private/eval-findings.jsonl \
  --after-findings private/eval-verified-findings.jsonl

The false-positive reduction flow is checked with a separate corpus that includes detector-visible documentation candidates.

uv run security-scanner scan \
  --manifest eval/verifier-corpus/targets.local.example.yaml \
  --output private/verifier-findings.jsonl

uv run security-scanner verify \
  --findings private/verifier-findings.jsonl \
  --output private/verifier-verified-findings.jsonl

uv run security-scanner evaluate \
  --expected eval/verifier-corpus/expected-findings.example.json \
  --findings private/verifier-findings.jsonl \
  --after-findings private/verifier-verified-findings.jsonl \
  --precision-min 0.5

The verifier is not a detector. It does not delete findings; it only attaches a triage status for human reviewers to consult.

Response failures, timeouts, and low confidence all leave the finding in a "needs review" state.
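
This fail-closed rule can be sketched as a small mapping function. The status labels, the response shape (`verdict`, `confidence`), and the 0.8 threshold are all hypothetical; only the rule that failure, timeout, and low confidence fall back to review comes from the text above.

```python
from typing import Optional

NEEDS_REVIEW = "needs_review"
ACCEPTED_VERDICTS = {"likely_secret", "likely_false_positive"}  # assumed labels


def triage_status(response: Optional[dict], min_confidence: float = 0.8) -> str:
    """Map a verifier response to a triage status, defaulting to review."""
    if response is None:  # request failed or timed out
        return NEEDS_REVIEW
    verdict = response.get("verdict")
    confidence = response.get("confidence")
    if verdict not in ACCEPTED_VERDICTS:
        return NEEDS_REVIEW
    if not isinstance(confidence, (int, float)) or confidence < min_confidence:
        return NEEDS_REVIEW
    return verdict


timed_out = triage_status(None)
low_confidence = triage_status({"verdict": "likely_false_positive", "confidence": 0.4})
confident = triage_status({"verdict": "likely_false_positive", "confidence": 0.95})
malformed = triage_status({"verdict": "unknown"})
print(timed_out, low_confidence, confident, malformed)
```

The function only returns a label; it never drops the finding, which matches the verifier's role as a triage aid rather than a detector.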

Architecture

The dependency direction is simple.

core <- scanners/storage/llm/adapters <- cli/runtime

core/ owns Finding, policy, report, and evaluation.
scanners/gitleaks/ owns running Gitleaks and parsing its results.
storage/ owns the JSONL and DynamoDB-compatible stores.
llm/ollama/ owns verifier calls based on redacted metadata.
Operations integrations must enter only through provider-neutral adapter/env contracts; private runtime source code is out of scope.

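For more detail, see the system architecture and execution environment documents.

Documentation

The entry point for the documentation is docs/README.md.

docs/views/, docs/assets/, and docs/dashboards/ are publish candidates.

docs/workbench/ and docs/reference/ are not published by default.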

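Public repository safety rules

Before committing:

Check staged files with git status.
Search the diff for secret-like values, private names, hosts, paths, and raw findings.
Stage files by name, and avoid git add -A and git add . entirely.
Keep real evidence under private/ or outside the repository. Attach operations integration code to the generic env contract outside committed source; this public repository holds only the provider-neutral adapter/env contract and the redacted snapshot surface.

Detailed rules are in the public repository safety policy.

Next steps

Stabilize the local store and workspace flow first.
Add extension adapters only in ways that reuse the core pipeline.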
