chore(ai): Add check-code-attribution skill (JAVA-499) #5449
Draft
0xadam-brown wants to merge 1 commit into
Adds a check-code-attribution skill that validates license headers + THIRD_PARTY_NOTICES.md entries for code copied or adapted from third parties. Also verifies license compatibility against Sentry's licensing policy. Focus is limited to the branch diff. Reports any issues found via PR comments (when run on CI) or to the terminal (when run locally).

To run it in Claude Code:

```
/check-code-attribution
```

Runs on CI automatically via [Warden](https://warden.sentry.dev/).

- Purely advisory / does not block merge.
- Generates PR comments with code suggestions for all discovered issues.
- Automatically removes stale comments as PRs are updated.

Current Warden configs:

| Setting | Value | Effect |
|---|---|---|
| model | anthropic/claude-sonnet-4-6 | Model used for analysis |
| maxTurns | 30 | Max tool calls per chunk |
| skill | check-code-attribution | Per-file vendored code attribution check |
| failOn | off | Do not fail workflow if attribution issues found |
| reportOn | medium | Show findings at >= medium severity via PR comment |
| requestChanges | false | Never post REQUEST_CHANGES comments on PRs |
| failCheck | false | No red X on workflow in GitHub UI if it fails |
| triggers | pull_request + local | Runs on PR open/sync and local warden invocations |
| reportOnSuccess | false (default) | No comment when everything is clean |

Going forward, we can consider blocking PRs once we've had a chance to vet behavior in the wild.
📜 Description
Adds a `check-code-attribution` skill that validates license headers and `THIRD_PARTY_NOTICES.md` entries for code copied or adapted from third parties. Also verifies license compatibility against Sentry's Licensing Policy.

The skill focuses on the branch diff only. It reports findings via PR comments when run on CI, or to the terminal when run locally.
To run it from Claude Code, use the `/check-code-attribution` command.
Warden configuration ensures the skill runs automatically on all PRs:
💡 Motivation and Context
Third-party code attribution is a legal and compliance requirement. Currently, attribution errors are caught only during manual code review. This skill automates detection of vendored code in branch diffs and helps flag missing or incomplete attributions before a PR is merged.
Background
Sentry SDKs and third-party code
There are three ways third-party code enters Sentry’s SDKs (including sentry-java):
1. Plain vanilla dependencies
2. Shaded code
3. Vendored code
All third-party code must be properly attributed, and licenses must be compatible with Sentry’s licensing policies.
Plain deps + shaded code: We run an `enforce-license-compliance` GitHub workflow that applies a FOSSA check to all plain vanilla dependencies and our few shaded dependencies. This ensures their licenses are properly attributed and are compatible with Sentry’s licensing policies.

Vendored code: Relies on a manual process in which developers add attributions to files containing vendored code and include a corresponding entry in the `THIRD_PARTY_NOTICES.md` file that ships with the SDK. Developers are also responsible for ensuring license compatibility.
The criteria for what counts as a proper attribution of vendored code live in the `CODE_ATTRIBUTION_CRITERIA.md` file under the heading “Third-Party Code Attribution”.
Goal of this PR: Create a skill that helps us properly attribute vendored code
Types of vendored code:
The skill introduced in this PR protects (1) from regression and identifies instances of (2). (Addressing (3) is out of scope – and is obviously non-trivial.)
The skill does not require license headers to exactly match the template from `AGENTS.md`, so long as all template fields are present.

This lets us keep our current, diverse header formats and remain relatively unopinionated going forward.
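As a rough illustration of "all fields present, any format", the check can be thought of as a presence test rather than a template match. The sketch below is not the skill's implementation (the skill is model-driven); the field names and regular expressions are assumptions chosen for illustration, and the actual template in `AGENTS.md` may require additional fields.

```python
import re

# Hypothetical field patterns, assumed for this sketch only.
# The real template in AGENTS.md may list more or different fields.
REQUIRED_FIELD_PATTERNS = {
    "copyright": re.compile(r"Copyright\s+(?:\(C\)\s+)?\d{4}", re.IGNORECASE),
    "license": re.compile(r"License", re.IGNORECASE),
}


def missing_fields(header):
    """Return the names of required fields that do not appear in the header."""
    return [
        name
        for name, pattern in REQUIRED_FIELD_PATTERNS.items()
        if not pattern.search(header)
    ]


# Two differently formatted headers that both carry every assumed field.
block_style = """/*
 * Copyright (C) 2010 Square, Inc.
 * Licensed under the Apache License, Version 2.0
 */"""
line_style = "// Copyright 2010 Square, Inc. Apache License 2.0."

# A header that names no copyright holder and no license.
incomplete = "// Adapted from an open-source queue implementation."

print(missing_fields(block_style))
print(missing_fields(line_style))
print(missing_fields(incomplete))
```

Both the block-style and single-line headers pass because only the presence of each field is tested, which is what allows existing header formats to remain as they are.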
Example output
Local runs
PR comments
💚 How did you test it?
[1] Automated validation tests (`check-code-attribution-tests.sh`) with scenario files covering:

Note: the tests are not run on CI and are currently only run manually (see the `check-code-attribution-tests.sh` script).

[2] Ran the skill on branches with known attribution issues to verify correct detection and reporting.
Manual tests + output
Note the skill's output format has changed since these tests were run, but the behavior remains the same.
Diff 1: Remove entire license header
Output 1
Required attribution field(s) removed:
Diff 2: Modify existing license header, but retain all required fields
Output 2
Vendored code detected (Apache Commons Collections) – verify that `THIRD_PARTY_NOTICES.md` reflects your updates.

Diff 3: Modify existing license header by removing one or more required fields
Output 3
Required attribution field(s) removed:
`Copyright (C) 2016 Matej Tymes` was removed from the license header. Please restore it.

Diff 4: Leave existing license header unchanged, but make an inconsistent modification to THIRD_PARTY_NOTICES.md entry
Output 4
Entry metadata inconsistent with source file headers:
`THIRD_PARTY_NOTICES.md`, but the source files (`QueueFile.java`, `FileObjectQueue.java`, `ObjectQueue.java`) all still say "Copyright (C) 2010 Square, Inc."
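The consistency check in this scenario amounts to comparing the copyright line in the notices entry against the one in each source header. The following is a minimal sketch of that comparison, not the skill's actual logic; the regular expression, the function names, and the edited notice text are all hypothetical.

```python
import re

# Assumed shape of a copyright line; real headers may vary more than this.
COPYRIGHT_RE = re.compile(
    r"Copyright\s+(?:\(C\)\s+)?(\d{4})\s+(.+)", re.IGNORECASE
)


def copyright_of(text):
    """Return (year, holder) from the first copyright line in text, or None."""
    match = COPYRIGHT_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


def inconsistent_files(notice_entry, headers):
    """Return sorted file names whose copyright differs from the notice entry."""
    expected = copyright_of(notice_entry)
    return sorted(
        name
        for name, header in headers.items()
        if copyright_of(header) != expected
    )


# Hypothetical notice entry whose year was edited without touching the sources.
notice_entry = "Copyright (C) 2011 Square, Inc."
headers = {
    "QueueFile.java": "/*\n * Copyright (C) 2010 Square, Inc.\n */",
    "FileObjectQueue.java": "/*\n * Copyright (C) 2010 Square, Inc.\n */",
    "ObjectQueue.java": "/*\n * Copyright (C) 2010 Square, Inc.\n */",
}

mismatches = inconsistent_files(notice_entry, headers)
print(mismatches)
```

With the edited notice entry, all three files are reported; restoring the year in the entry to 2010 yields an empty list.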
Diff 5: Leave existing license header unchanged, but remove THIRD_PARTY_NOTICES.md entry
Output 5
Source file(s) still reference this library:
- `io.sentry.android.core.SentryShakeDetector` still contains attribution header for Square's Seismic. Either restore the `THIRD_PARTY_NOTICES.md` entry or remove the vendored code.

Diff 6: Add newly-vendored code with valid license header and THIRD_PARTY_NOTICES.md entry
Output 6
Vendored code detected (Dropwizard Metrics SlidingWindowReservoir) – verify that `THIRD_PARTY_NOTICES.md` reflects your updates.

Diff 7: Add newly-vendored code with valid license header but no THIRD_PARTY_NOTICES.md entry
Output 7
Vendored code detected (Caffeine Cache) — attribution header is complete.
`THIRD_PARTY_NOTICES.md`. An entry needs to be added.

Diff 8: Add newly-vendored code with an invalid license header and existing THIRD_PARTY_NOTICES.md entry
Output 8
Vendored code detected (Resilience4j RateLimiter) — missing required fields:
`Copyright 2019 Robert Winkler and Bohdan Storozhuk` and `Licensed under the Apache License, Version 2.0` (per the existing `THIRD_PARTY_NOTICES.md` entry).

Diff 9: Add newly-vendored code with an invalid license header and no THIRD_PARTY_NOTICES.md entry
Output 9
Vendored code detected (Guava RateLimiter) — missing required fields:
`THIRD_PARTY_NOTICES.md`. An entry needs to be added for Guava RateLimiter.

Diff 10: Add newly-vendored code with an invalid license header, no THIRD_PARTY_NOTICES.md entry, and a new license type
Output 10
Vendored code detected (compact-writer) — missing required fields:
`THIRD_PARTY_NOTICES.md`. An entry needs to be added.
`THIRD_PARTY_NOTICES.md`. Please verify it is compatible with Sentry's licensing policies: https://open.sentry.io/licensing/.
Diff 11: False positive
Output 11 (numbered as “1.” because it’s the first entry in the False Positives section)
[3] Local Warden runs.
Running Warden locally

Added to my `.*profile`:

[4] Pushed up diffs with attribution violations in a draft PR to vet the UX (see #5444).
📝 Checklist
`sendDefaultPII` is enabled.

🔮 Next steps
`failOn`/`failCheck` once we've vetted behavior in the wild

#skip-changelog