Skip to content

osctrl: defensive hardening round — login timing, rate-limit XFF bypass, JWT rotation, info-disclosure closes#826

Merged
javuto merged 8 commits into
jmpsec:mainfrom
alvarofraguas:pr/security-hardening
May 18, 2026
Merged

osctrl: defensive hardening round — login timing, rate-limit XFF bypass, JWT rotation, info-disclosure closes#826
javuto merged 8 commits into
jmpsec:mainfrom
alvarofraguas:pr/security-hardening

Conversation

@alvarofraguas
Copy link
Copy Markdown
Collaborator

osctrl: defensive hardening round — login timing, rate-limit XFF bypass, JWT rotation, info-disclosure closes

This is the first of a multi-PR series. Each PR after this one builds
on its predecessor and will go up separately once you've had a chance
to look at this one. This PR is intentionally the smallest of the
series — pure defensive hardening, no new endpoints, no behaviour
change for legitimate users.

Eight discrete fixes, surfaced during the pentest passes I ran against
the dev stack. Numbered below in the order I'd suggest reading the
diff.

1. pkg/users: close username-enumeration timing leak

CheckLoginCredentials returned in ~15 ms when the user didn't
exist (DB miss → no bcrypt) vs ~300 ms when the user existed but
the password was wrong (DB hit → bcrypt cost-12 compare). A trivial
remote distinguisher for "valid username, wrong password" vs
"unknown username."

Fix: add a package-init dummyHash bcrypt-hashed at the same cost,
and run bcrypt.CompareHashAndPassword(dummyHash, …) on the DB-miss
path. Result is discarded; we still return false. The compare's
sole purpose is to burn the same wall-clock as a real compare.

Regression test in pkg/users/users_test.go:
TestCheckLoginCredentials_TimingEqualization asserts ratio < 2.0×.
Measured locally: 1.00×. Skipped under -short because bcrypt
cost-12 burns ~300ms per iteration.

A second regression test TestCheckLoginCredentials_UnknownUserStillReturnsFalse
pins that the dummyHash compare doesn't accidentally turn an unknown
user into a successful login — the compare result must be discarded.

2. pkg/ratelimit: close X-Forwarded-For rotation bypass

KeyByIP keyed the token bucket on X-Forwarded-For (or
X-Real-IP). When osctrl-api sits behind nginx that's the right
key — but the same code path runs when osctrl-api is directly
exposed (or when nginx isn't stripping client-supplied X-Forwarded-For).
Result: an attacker rotates the header on each request and the
bucket never fills.

Fix: switch KeyByIP to use a new utils.RemoteIP(r) helper that
returns the TCP peer address from r.RemoteAddr. The trusted-proxy
case is handled at the proxy layer (nginx replaces XFF rather than
appending), not in our rate-limiter.

Regression tests: TestKeyByIPIgnoresForwardingHeaders covers 5
sub-cases (no headers, single XFF, chained XFF, X-Real-IP, both).
TestMiddlewareXFFRotationDoesNotBypass is end-to-end: burst=2 then
5 rotated-XFF attempts all 429.

3. osctrl-api: require auth on /queries/samples + /carves/samples

Both endpoints returned a sample-template library
(SELECT * FROM osquery_info LIMIT 5-style) without any auth check.
Two issues:

  • Pre-auth fingerprint: the template payload uniquely identifies the
    osctrl-api version (each release ships a different starter pack),
    letting a scanner version-pin without authenticating.
  • Carve target disclosure: the carve samples include file-path
    arguments ('/etc/shadow', '/var/log/secure') that name
    privileged read targets, useful for someone scoping a future
    CarveLevel-credential phishing campaign.

Fix: move the route registrations from the pre-auth block to the
authenticated block in cmd/api/main.go. The handler functions
themselves were unchanged.

4. osctrl-api: RootHandler 404s on misrouted GETs

Go's ServeMux uses / as a wildcard catch-all for any GET request
the mux doesn't have a more-specific pattern for. With RootHandler
returning 200 unconditionally, typos like GET /api/v1/totally-fake
silently succeeded — confusing for clients debugging an integration
and a weak fingerprint signal for scanners ("endpoint X returned
200 → must exist").

Tighten the contract: respond 200 ONLY when r.URL.Path == "/".
Otherwise fall through to http.NotFound.

5. deploy/nginx: strip nginx version from Server header

Without server_tokens off, nginx emits Server: nginx/1.27.x
a free fingerprint for vuln scanners (Shodan, nuclei) that match
known CVEs by version. server_tokens off makes it just
Server: nginx. Doesn't hide that this IS nginx (no portable way
in OSS nginx) but stops version-keyed CVE matching.

Applied to both deploy/docker/conf/nginx/osctrl.conf and the
dev-stack frontend-dev.conf.

6. osctrl-admin: disable autoindex on /static/

Go's http.FileServer autoindexed /static/GET /static/
returned an HTML directory listing. Useful recon for a scanner
mapping the deployed JS / CSS / icon set.

Fix: wrap http.FileServer in a noDirListing middleware that
404s any path ending in /. Static asset serving for real files
(/static/js/foo.js, etc.) is unaffected.

7. pkg/users: stamp random jti on every JWT

Foundation for the rotation fix in (8). CreateToken's claims
were deterministic: Username + Issuer + ExpiresAt (at 1s
resolution). HMAC-SHA256 is deterministic for the same key +
payload, so two CreateToken calls for the same user in the same
second returned identical JWT strings.

That silently broke any caller depending on token rotation as a
revocation primitive. The auth middleware in cmd/api/auth.go
compares every presented JWT against the stored AdminUser.APIToken
(constant-time) — so logging in is supposed to invalidate the
previous session by overwriting the stored token. But if the new
"minted" token is bitwise identical to the old one, UpdateToken
is a no-op, the stored value doesn't change, and any previously-
issued copy keeps validating.

Stamp a random 16-byte hex jti (RFC 7519 §4.1.7) on every issuance
so claims are guaranteed distinct.

Regression tests in pkg/users/users_test.go:

  • TestCreateTokenIsNonDeterministic — two CreateToken calls
    return different strings.
  • TestCreateTokenStampsJTI — the parsed claims carry distinct
    non-empty jti values.

Both fail loudly if the jti claim is removed (verified locally).

8. osctrl-api: rotate JWT on every login

LoginHandler had a 60s-freshness branch: if the user's stored
APIToken had >60s of life left, login returned it as-is instead
of minting fresh. The original intent was to avoid handing out a
token that would fail mid-request. The side effect was that a
second login from a different device got the SAME JWT — leaving
the previous device's copy valid until natural expiry.

Drop the reuse branch. Always CreateToken + UpdateToken on
successful login. With the jti claim from (7), the new token is
provably distinct, so UpdateToken actually rotates the stored
value. The auth middleware's constant-time compare in
cmd/api/auth.go then fails the old JWT 401 on its next request —
even though the old JWT is still cryptographically valid against
the secret.

The DB-row APIToken check IS the revocation primitive; this
commit just makes login exercise it correctly.

Note: a server-side /logout endpoint that explicitly invalidates
the stored token is part of the next PR in the series (OIDC support
on osctrl-api). The richer logout flow there returns
idp_logout_url for federated sessions, which makes more sense to
introduce alongside OIDC than as a half-feature here.

Verified

  • go build ./... clean.
  • go vet ./... clean.
  • go test ./... green across all packages on pr/security-hardening.
  • Live smoke on a dev stack against real Postgres + Redis + Keycloak:
    • Login timing: bad-user 0.21s median vs unknown-user 0.21s
      median, ratio 1.01×.
    • 15 burst requests with rotated X-Forwarded-For headers from
      the same TCP peer all returned 429 starting at request A lot of changes everywhere #11.
    • GET /api/v1/queries/samples and GET /api/v1/carves/samples
      unauthenticated → 401.
    • GET /api/v1/totally-fake-path → 404.
    • curl -I http://host:8088/Server: nginx with no version.
    • GET /static/ on osctrl-admin → 404.
    • Login JWT carries a non-empty jti claim.
    • Re-login from a fresh session overwrites APIToken in
      admin_users; the previous JWT then fails 401 on
      /checks-auth despite still being signature-valid.

Test plan

  • CI: build + tests pass once workflows are approved.
  • Manual: hit /api/v1/login with a known-bad user and then
    an unknown user; medians should be within 2×.
  • Manual: curl -I http://host/ — Server header should not
    include a version.
  • Manual: curl http://admin-host/static/ → 404.

Follow-ups (separate PRs, will not open until you signal)

Five more PRs are prepared on my fork and waiting:

  • B (pr/auth-shared-package) — pure refactor lifting the
    existing OIDC code in cmd/admin/oidc.go into a reusable
    pkg/auth/ + pkg/auth/oidc/ package. Legacy admin behaviour
    preserved.
  • C (pr/oidc-api-spa) — OIDC support on osctrl-api + the
    React SPA. Introduces /api/v1/logout, /auth/methods,
    /auth/oidc/login, /auth/oidc/callback.
  • D (pr/saml-api-spa) — SAML 2.0 provider for osctrl-api +
    SPA. 41/41 pentest probes pass against a Keycloak IdP.
  • E1 (pr/spa-users-page) — SPA /users page reaches parity
    with the legacy admin (Add User / Delete User / Reset Password,
    env dropdown, bulk grant).
  • E2 (pr/spa-ux-polish) — TargetSelector + carves-new layout
    • node-detail cleanup + audit-page self-pollution fix + CSRF-
      from-cookie boot priming.

Happy to open them in sequence as you merge each predecessor, or
file all six at once with stacked-PR base branches set on my
fork — your preference.

Go's ServeMux uses '/' as a wildcard catch-all for any GET request
the mux doesn't have a more-specific pattern for. With RootHandler
returning 200 unconditionally, typos like 'GET /api/v1/totally-fake'
would silently succeed — confusing for clients debugging an
integration and a weak fingerprint signal for scanners
('endpoint X returned 200 → must exist').

Tighten the contract: respond 200 ONLY when r.URL.Path == '/'.
Otherwise fall out to http.NotFound. Doesn't leak endpoint
structure beyond what's already in the public OpenAPI; just stops
the api from silently claiming success on misrouted requests.
…tials

Pentest finding: POST /api/v1/login takes ~300 ms for a valid
username with the wrong password but only ~15-25 ms for an unknown
username, a 10-15x wall-clock differential that lets an
unauthenticated attacker enumerate valid usernames by timing the
response.

Root cause: CheckLoginCredentials short-circuits when m.Get(username)
returns gorm.ErrRecordNotFound and never runs
bcrypt.CompareHashAndPassword in that branch.

Fix: precompute a dummy bcrypt hash at the current BcryptCost
during package init and, on the DB-miss branch, run a discard
CompareHashAndPassword against it. Both branches now spend the
same ~200 ms of bcrypt-cost-12 work; the result is discarded and
the caller still receives (false, AdminUser{}) for non-existent
users.

dummyHash will be nil only if bcrypt init failed (platform crypto
broken); in that case the comparison is skipped and the timing
leak is accepted — but the authentication subsystem is already
unusable in that scenario.
… bypass)

Pentest finding: an attacker behind an edge proxy that *appends*
to X-Forwarded-For (nginx default, ELB, Cloudflare with default
settings) could rotate the header value per request and cycle
rate-limit buckets, defeating the limiter that protects /login
and the preAuth surfaces.

Root cause: ratelimit.KeyByIP returned utils.GetIP(r), which when
--trusted-proxies is configured walks the XFF chain right-to-left
and returns the first untrusted hop. A typical edge appends the
client-supplied XFF on the way in; the right-most-untrusted hop is
then the attacker-controlled value.

Fix: introduce utils.RemoteIP that returns the direct TCP peer's
IP, ignoring every forwarding header even when --trusted-proxies
is configured, and switch ratelimit.KeyByIP to use it. utils.GetIP
remains the source of truth for audit-log 'real client IP' fields
where the operator wants to see past the proxy; the rate-limit
key trades that visibility for unspoofability.
Both endpoints accepted anonymous GET and returned osctrl-internal
data — the SQL-template starter pack and the carve target list.
Two problems:

1. Either response uniquely fingerprints the deployment as osctrl,
   helping a scanner target version-keyed CVEs rather than
   guessing what's running.
2. /carves/samples specifically returns the canonical exfiltration
   shopping list — /etc/passwd, /etc/shadow, \Windows\System32
   \config\SAM, browser-keychain paths, shell history. An attacker
   who reaches the network surface gets a free hint about what data
   the operator considers carve-worthy.

The pre-auth posture was justified on 'the data is static, ships
with the binary, the login page can lazy-load it.' Tracing actual
consumers shows that's not happening — QuickTemplates lives on
/queries/new and the carve-targets picker on /carves/new, both
post-login. The pre-auth exposure was a wrong design choice with
no consumer benefit.

Fix: move both routes inside the existing flagParams.Osquery.Query
and flagParams.Osquery.Carve blocks where they're guarded by
handlerAuthCheck. The endpoints themselves are unchanged; only the
route registration moves.
Two of the four bundled nginx configs (the legacy admin server
block and the SPA frontend's dev config) didn't have server_tokens
set, so they emit 'Server: nginx/1.27.x' — a free fingerprint of
the deployed build for vuln scanners like Shodan, Censys, and
nuclei templates that index by exact version string.

The other two configs (deploy/nginx/nginx.conf and
deploy/docker/conf/nginx/nginx.conf) already have it set; this
brings the two stragglers in line.

server_tokens off doesn't hide that the server is nginx — there's
no portable way to do that in OSS nginx — but it stops
version-keyed CVE matching cold.
GET /static/ on the admin returned an HTML directory listing of
css/, fonts/, img/, js/. Source was Go's http.FileServer, which
enables autoindex by default.

Fix: wrap the FileServer in a noDirListing middleware that 404s
any request whose URL path ends with '/'. Legitimate file requests
(/static/css/custom.css etc.) pass through unchanged.

404 instead of 403 because the listing was the only signal that
/static/* exists at all — surfacing a 403 would confirm structure
that 404 hides.

Verified:
  GET /static/             -> 404 (was 200 with listing)
  GET /static/css/         -> 404 (was 200 with listing)
  GET /static/css/<real>   -> 200 (unchanged)
CreateToken's claims were deterministic: Username + Issuer + ExpiresAt
(at 1s resolution). HMAC-SHA256 is deterministic for the same key +
payload, so two CreateToken calls for the same user in the same second
returned identical JWT strings.

That silently broke any caller that depends on token rotation as a
revocation primitive. The auth middleware in cmd/api/auth.go compares
every presented JWT against the stored AdminUser.APIToken (constant-
time) — so logging in is supposed to invalidate the previous session
by overwriting the stored token to a new value. But if the new "minted"
token is bitwise identical to the old one, UpdateToken is a no-op, the
stored value doesn't change, and any previously-issued copy (e.g. a
JWT exfiltrated from the user's previous device) keeps validating
against the unchanged stored value.

Stamp a random 16-byte hex jti on every issuance. Even back-to-back
calls in the same nanosecond now produce distinct claims → distinct
signatures → distinct JWT strings. UpdateToken's overwrite now
actually rotates.

Regression tests pin both halves: (a) two CreateToken calls return
different strings, (b) the parsed claims carry distinct jti values.
Verified by neutralising the ID claim — both tests fail loudly.
LoginHandler had a 60s-freshness branch: if the user's stored
APIToken had >60s of life left, login returned it as-is instead
of minting fresh. The original intent was to avoid handing out
a token that would fail mid-request. The side effect was that
a second login from a different device (or from the same user
after a credential rotation) got the SAME JWT — leaving the
previous device's copy valid indefinitely.

Drop the reuse branch. Always CreateToken + UpdateToken on
successful login. With the jti claim from the previous commit,
the new token is provably distinct from any previously-issued
token for this user, so UpdateToken actually rotates the stored
value. The auth middleware in cmd/api/auth.go compares every
presented JWT against the stored APIToken (constant-time), so
once UpdateToken overwrites, any previously-issued copy fails
401 on its next request — even though the old JWT is still
cryptographically valid against the secret.

The DB-row APIToken check IS the revocation primitive; this
commit just makes login actually exercise it.

Server-side /logout is intentionally NOT in this PR: the
richer logout flow (clearing cookies + APIToken + redirecting
to IdP end-session URL for federated users) is part of the
OIDC PR, which introduces the endpoint with its full
behaviour rather than requiring a follow-up rewrite.
@javuto javuto added ✨ enhancement New feature or request 🔐 security Security related issues osctrl-api osctrl-api related changes osctrl-admin osctrl-admin related changes labels May 18, 2026
Copy link
Copy Markdown
Collaborator

@javuto javuto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thanks for the PR!

@javuto javuto merged commit 064f5b6 into jmpsec:main May 18, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

✨ enhancement New feature or request osctrl-admin osctrl-admin related changes osctrl-api osctrl-api related changes 🔐 security Security related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants