Omega: How should we handle attestations of work completed? #28
in-toto/attestation#77 is relevant here. We've been talking about defining human review predicates for code review, probably starting with VCS reviews and dependency reviews like with
Another minor pass at an attestation schema:
We should also take a look at OSCAL. At first glance it seems very complicated and doesn't actually cover this use case, but I could be wrong there.
I tried taking a pass at this spec as an in-toto attestation. Note that I haven't included the signature here, because this statement would be the payload of a DSSE envelope, and the envelope carries the actual signature. I'm a little unclear on a couple of the fields in your passes, but I think some discussion will help clarify that for me!
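For anyone new to DSSE, here is a minimal sketch of how a statement becomes a DSSE payload. DSSE signs the pre-authentication encoding (PAE) of the payload type and body, not the raw JSON, and `application/vnd.in-toto+json` is the payload type in-toto uses. The HMAC signer and key ID below are placeholders (assumptions) standing in for a real asymmetric signing key.

```python
import base64
import hashlib
import hmac
import json

PAYLOAD_TYPE = "application/vnd.in-toto+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: "DSSEv1 <len> <type> <len> <body>"."""
    type_bytes = payload_type.encode("utf-8")
    return b" ".join([
        b"DSSEv1",
        str(len(type_bytes)).encode("ascii"),
        type_bytes,
        str(len(payload)).encode("ascii"),
        payload,
    ])


def make_envelope(statement: dict, key: bytes, keyid: str) -> dict:
    """Wrap an in-toto statement in a DSSE envelope.

    NOTE: HMAC-SHA256 is a hypothetical stand-in signer for illustration only;
    a real deployment would sign the PAE bytes with an asymmetric key.
    """
    body = json.dumps(statement, separators=(",", ":"), sort_keys=True).encode("utf-8")
    sig = hmac.new(key, pae(PAYLOAD_TYPE, body), hashlib.sha256).digest()
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(body).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }
```

Because the signature lives in `signatures` and the statement is only the base64 `payload`, the statement itself never needs a `signature` field.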
Thanks @adityasaky! Yep, I like this better; it's headed in the right direction. In the meantime, I made some other updates, including using Rego, the policy language of Open Policy Agent (OPA), to set up consumption policies on the assertions. Example assertion (for security advisories):
{
"_type": "https://in-toto.io/Statement/v0.1",
"operational": {
"execution_start": "2022-11-03T19:38:42.195814Z",
"execution_stop": "2022-11-03T19:38:42.196237Z",
"environment": {
"operator": null,
"hostname": "scovetta-xps",
"machine_identifier": "00000000-0000-0000-0000-cc96e5042f66"
}
},
"subject": [
{
"type": "https://github.com/ossf/alpha-omega/omega-analysis-toolchain/Types/PackageURL/v0.1",
"purl": "pkg:pypi/django@3.0.0"
}
],
"predicateType": "https://github.com/ossf/alpha-omega/v0.1",
"predicateGenerator": {
"name": "openssf.omega.security_advisories[deps.dev]",
"version": "0.1.0"
},
"predicate": {
"security_advisories": {
"medium": 4,
"high": 4,
"critical": 2,
"unknown": 10
}
},
"timestamp": "2022-09-08T08:30:01Z",
"signature": "MEQCIDrirKX+puM8agLVAPxO88kG7zcuI7GK9RMaLnh/bud/AiAcq+AjpYfamogV/JHBiC1ybXw4a9yIhUZNJmxu3YNiRw=="
}
And then the policy that would check for this:
I'm certain I'm not writing the Rego policies the "right" way; I'm still getting my head around the language. Yep, I agree: whether the signature is DSSE, COSE, or something else, it should probably live outside the main assertion above.
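The Rego policy itself isn't reproduced in this thread. As a rough, language-neutral illustration of the kind of consumption check being discussed, here is a Python sketch that reads `predicate.security_advisories` from the example assertion above and rejects a package whose counts exceed per-severity limits. The limits and the `evaluate` name are hypothetical, not the policy that was actually written.

```python
from typing import Dict, List

# Hypothetical consumption policy: maximum allowed advisories per severity.
# Severities not listed here are not restricted.
DEFAULT_LIMITS: Dict[str, int] = {"critical": 0, "high": 0}


def evaluate(assertion: dict, limits: Dict[str, int] = DEFAULT_LIMITS) -> List[str]:
    """Return violation messages; an empty list means the assertion passes."""
    counts = assertion.get("predicate", {}).get("security_advisories", {})
    violations = []
    for severity, limit in limits.items():
        found = counts.get(severity, 0)
        if found > limit:
            violations.append(f"{severity}: {found} advisories (limit {limit})")
    return violations
```

For the `pkg:pypi/django@3.0.0` example, this returns `["critical: 2 advisories (limit 0)", "high: 4 advisories (limit 0)"]`. Returning the list of reasons rather than a bare boolean mirrors the common "deny set" style in OPA policies.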
Could you explain the distinction between the execution times in
I'm not very familiar with Rego myself, but I understand what you're checking for, and it makes sense to me. Here's my pass at the attestation statement, which would then go into DSSE as a payload.
Yep, that makes sense! I'll rework it to put everything within the predicate.
I was thinking of the operational timestamps as being "as the assertion is being created". I don't know that we need start/end, but we need some sort of tie back to the system that created it, as opposed to the systems that generate the outer levels of the assertion. The "timestamp" was more "at what time were the facts true?" For example, suppose security advisories are refreshed weekly: we pull them on 11/7, but they're true as of 11/4. We'd have 11/7 in the operational section and 11/4 in the timestamp. Maybe we need better names for the fields, but that was my thinking...
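To make the two clocks concrete, here is a minimal sketch, assuming a generator is a plain callable that returns its predicate content together with the date its source data was current. `run_generator` and the `(content, facts_as_of)` return shape are hypothetical; the field names follow the examples in this thread, with `timestamp` nested under `operational` as in the later iteration.

```python
from datetime import datetime, timezone
from typing import Callable, Tuple


def _iso(dt: datetime) -> str:
    # Render UTC times with a trailing "Z", as in the examples in this thread.
    return dt.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


def run_generator(generate: Callable[[], Tuple[dict, datetime]]) -> dict:
    """Run a (hypothetical) generator and record both kinds of time.

    execution_start/stop: when this system produced the assertion.
    timestamp: when the underlying facts were true (e.g. the feed refresh date).
    """
    start = datetime.now(timezone.utc)
    content, facts_as_of = generate()
    stop = datetime.now(timezone.utc)
    return {
        "operational": {
            "execution_start": _iso(start),
            "execution_stop": _iso(stop),
            "timestamp": _iso(facts_as_of),
        },
        "content": content,
    }
```

In the weekly-refresh scenario, a run on 11/7 whose generator reports `datetime(2022, 11, 4, tzinfo=timezone.utc)` gets 11/7 execution times but `"timestamp": "2022-11-04T00:00:00Z"`.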
@scovetta, @adityasaky - Nice to meet you.
I think it would be worth adding a hash value for all of the scan artifacts in the operational data. In an ideal world, these attestations would be perfectly reproducible by others. From a more practical perspective, hashes create a more verifiable connection between these public attestations and the private results.
Agree. There needs to be a connection between the public attestation and the private results.
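One way to record the scan-artifact hashes suggested above, as a sketch: walk a scan's output directory and record a SHA-256 digest per file, keyed by relative path, using the `{"sha256": ...}` digest-set shape in-toto uses for subjects. Where this map would live (for example a hypothetical `artifacts` key under `operational`) is an assumption. A maintainer who later receives the private artifacts can recompute the map and compare.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_file(path: Path, chunk_size: int = 1 << 16) -> str:
    # Stream the file so large scan outputs don't need to fit in memory.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digest_artifacts(root: str) -> Dict[str, Dict[str, str]]:
    """Map each file under `root` (relative POSIX path) to an in-toto-style digest set."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): {"sha256": sha256_file(p)}
        for p in sorted(base.rglob("*"))
        if p.is_file()
    }


def verify_artifacts(root: str, expected: Dict[str, Dict[str, str]]) -> bool:
    """True if the files under `root` hash to exactly the published values."""
    return digest_artifacts(root) == expected
```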
Agree that hashes of inputs would be valuable here. I have a couple of other incomplete thoughts: what if we thought of "reproducibility" as a property that emerges from having multiple attestation authors? If three different trusted parties all provided a "no publicly known vulnerabilities" attestation, does that meet the underlying goal? Alternatively, how do we feel about reproducibility being less boolean and more nuanced, e.g. for anything where the analysis results could change over time? I'll keep noodling on this.
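A rough sketch of the "multiple attestation authors" idea: canonicalize each attester's predicate content, hash it, and count how many distinct trusted attesters agree on each digest. Hashing the full content, not just summary counts, also separates two attestations that each report one critical finding but report different findings. The canonicalization (sorted-key compact JSON), the trusted set, and the `quorum` of 3 are assumptions.

```python
import hashlib
import json
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def content_digest(content: dict) -> str:
    # Sorted keys + compact separators so equal content always hashes identically.
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def agreed_content(
    attestations: Iterable[Tuple[str, dict]],
    trusted: Set[str],
    quorum: int = 3,
) -> Optional[str]:
    """Return the digest of content that at least `quorum` distinct trusted
    attesters agree on, or None. `attestations` yields (attester_id, content)."""
    supporters: Dict[str, Set[str]] = defaultdict(set)
    for attester, content in attestations:
        if attester in trusted:  # untrusted attesters don't count toward quorum
            supporters[content_digest(content)].add(attester)
    for digest, who in supporters.items():
        if len(who) >= quorum:
            return digest
    return None
```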
Here's another iteration:
{
"_type": "https://in-toto.io/Statement/v0.1",
"subject": [
{
"type": "https://github.com/ossf/alpha-omega/omega-analysis-toolchain/Types/PackageURL/v0.1",
"purl": "pkg:npm/express@4.4.3"
}
],
"predicateType": "https://github.com/ossf/alpha-omega/v0.1",
"predicate": {
"generator": {
"name": "openssf.omega.security_advisories[deps.dev]",
"version": "0.1.0"
},
"operational": {
"execution_start": "2022-11-07T23:32:21.749384Z",
"execution_stop": "2022-11-07T23:32:21.749801Z",
"environment": {
"operator": null,
"hostname": "scovetta-xps",
"machine_identifier": "00000000-0000-0000-0000-cc96e5042f66"
},
"timestamp": "2021-12-14T19:30:14Z"
},
"content": {
"security_advisories": {
"medium": 2
}
},
"evidence": {
"_type": "https://github.com/ossf/alpha-omega/types/evidence/v0.1",
"reproducibility": "temporal",
"source-type": "url",
"source": "https://deps.dev/_/s/npm/p/express/v/4.4.3",
"content": [
{
"source": "GHSA",
"sourceID": "GHSA-gpvr-g6gh-9mc2",
"sourceURL": "https://github.com/advisories/GHSA-gpvr-g6gh-9mc2",
"title": "No Charset in Content-Type Header in express",
"description": "Vulnerable versions of express do not specify a charset field in the content-type header while displaying 400 level response messages. The lack of enforcing user's browser to set correct charset, could be leveraged by an attacker to perform a cross-site scripting attack, using non-standard encodings, like UTF-7.\n\n\n## Recommendation\n\nFor express 3.x, update express to version 3.11 or later.\nFor express 4.x, update express to version 4.5 or later. ",
"referenceURLs": [
"https://nvd.nist.gov/vuln/detail/CVE-2014-6393",
"https://github.com/advisories/GHSA-gpvr-g6gh-9mc2",
"https://www.npmjs.com/advisories/8",
"https://bugzilla.redhat.com/show_bug.cgi?id=1203190",
"https://nodesecurity.io/advisories/express-no-charset-in-content-type-header"
],
"severity": "MEDIUM",
"gitHubSeverity": "MODERATE",
"scoreV3": 6.1,
"aliases": [
"CVE-2014-6393"
],
"disclosedAt": 1540315374,
"observedAt": 1639539014
},
{
"source": "NSWG",
"sourceID": "NSWG-ECO-8",
"sourceURL": "https://github.com/nodejs/security-wg/blob/main/vuln/npm/8.json",
"title": "No Charset in Content-Type Header",
"description": "Vulnerable versions of express do not specify a charset field in the content-type header while displaying 400 level response messages. The lack of enforcing user's browser to set correct charset, could be leveraged by an attacker to perform a cross-site scripting attack, using non-standard encodings, like UTF-7.",
"referenceURLs": [],
"severity": "MEDIUM",
"gitHubSeverity": "UNKNOWN",
"scoreV3": 5.4,
"aliases": [
"CVE-2014-6393"
],
"disclosedAt": 1445040000,
"observedAt": 1639538873
}
]
}
}
}
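Since `content` in this iteration summarizes `evidence`, a consumer could check that the two agree. A minimal sketch, assuming the summary is simply a count of evidence entries by lowercased `severity`. That matches this example (two `MEDIUM` entries, `"medium": 2`), but note that both entries are aliases of CVE-2014-6393, so a generator that deduplicated by alias would report 1 instead; the real generator's rule may differ.

```python
from collections import Counter
from typing import Dict


def summarize_advisories(evidence: dict) -> Dict[str, int]:
    """Count advisory entries in evidence["content"] by lowercased severity."""
    return dict(Counter(entry["severity"].lower() for entry in evidence.get("content", [])))


def summary_matches_evidence(predicate: dict) -> bool:
    """True if predicate.content.security_advisories equals the recomputed counts."""
    claimed = predicate["content"]["security_advisories"]
    return summarize_advisories(predicate["evidence"]) == claimed
```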
@scovetta - I think if three well-known independent researchers put out the same attestation for something, that's probably sufficient for the attestation's credibility. In that case, hashing the artifacts for reproducibility would mainly be an extra layer of protection, e.g. for situations where two attestations each report one critical finding, but they are reporting different findings. More importantly, though, a hash of the artifacts gives assurance to anyone digging into the attestation. For example, someone could publicly attest to a vulnerability in a piece of open source, then privately give the authors the details and artifacts; the authors can independently verify that these artifacts hash to the value in the public attestation. EDIT: including the evidence as in your latest example serves the same purpose functionally, as long as that doesn't put too much information in the attestation.
@antrompl - If I understand your perspective correctly, would it make sense to separate the findings from the attestation? Would it work to have the attester information plus a reference to the attested findings in the payload, while keeping the findings as a separate in-toto statement or file that the attestation statement refers to? I am assuming that we still need the findings in some form to allow policy evaluation on them.
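A sketch of the separation being proposed: the findings live in their own document, and the attestation's predicate carries only the attester and a pointer to the findings (a URI plus a SHA-256 digest), so a verifier can fetch them and confirm they weren't altered. The `attester`/`findings` keys are hypothetical; the `uri`/`digest` pair is loosely modeled on in-toto's resource-descriptor idea.

```python
import hashlib
import json


def findings_reference(findings: dict, uri: str) -> dict:
    """Build a pointer to a separately stored findings document."""
    body = json.dumps(findings, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {"uri": uri, "digest": {"sha256": hashlib.sha256(body).hexdigest()}}


def make_predicate(attester: str, findings: dict, findings_uri: str) -> dict:
    # Hypothetical predicate layout: attester identity + reference, no findings inline.
    return {"attester": attester, "findings": findings_reference(findings, findings_uri)}


def findings_match(predicate: dict, fetched_findings: dict) -> bool:
    """Check that findings fetched from the referenced URI hash to the attested digest."""
    ref = predicate["findings"]
    return findings_reference(fetched_findings, ref["uri"]) == ref
```

Hashing the stored file's raw bytes would be stricter; canonical JSON is used here only to keep the sketch self-contained. This also keeps the attestation small, which speaks to the scalability concern about inline evidence.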
For reproducibility, shouldn't we focus on the characteristics of the code and the scanner? For example, a tool with a rule set (perhaps identified by a hash) was run on a specific repository (at a given commit hash), and we found a set of things. To reproduce that later, we have to run the same tool on the same code; if we run it on different code (a different branch or commit), we will not be able to reproduce the result. These may matter more than the time. Time is more useful as recency information, so that an older attestation on a branch is replaced by a newer one (with a more recent commit hash).
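The distinction drawn here could be sketched as two separate checks, assuming each attestation records the tool name and version, a rule-set digest, the repository, the commit, the branch, and a timestamp (all hypothetical field names): a reproduction key built only from what must match for a rerun to reproduce the result, and a recency rule that uses time solely to pick the latest record per branch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class ScanRecord:
    # Hypothetical fields; none of these are defined by the schema in this thread.
    tool: str
    tool_version: str
    ruleset_sha256: str
    repo: str
    commit: str
    branch: str
    # ISO 8601 UTC in one fixed format, e.g. "2022-11-09T22:58:20Z",
    # so that string comparison orders records chronologically.
    timestamp: str

    def reproduction_key(self) -> Tuple[str, str, str, str, str]:
        """Everything that must be identical for a rerun to reproduce the result."""
        return (self.tool, self.tool_version, self.ruleset_sha256, self.repo, self.commit)


def latest_per_branch(records: Iterable[ScanRecord]) -> Dict[Tuple[str, str], ScanRecord]:
    """Keep only the most recent record per (repo, branch); time is used for recency only."""
    latest: Dict[Tuple[str, str], ScanRecord] = {}
    for r in records:
        key = (r.repo, r.branch)
        if key not in latest or r.timestamp > latest[key].timestamp:
            latest[key] = r
    return latest
```

Two records with equal `reproduction_key()` but different timestamps are reruns of the same analysis; a record on a newer commit supersedes older ones on that branch without claiming to reproduce them.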
@openrefactory Agree - we should probably have enough information in the assertion to re-create it, but there are a bunch of reasons why this might not be possible:
But we should try our best anyway. How would:
What else do you think we should include?
@dasarpjonam - My only concern regarding evidence in the attestation is scalability. The more evidence that can be provided the better, as long as it isn't "too much", but I have no idea where that line would be. So separating it out makes sense to me, mainly as a way to keep the scalability problem away from the attestations themselves. I see evidence as a nice-to-have rather than a need-to-have, because anyone who sees an attestation and wants more information about it will always have two options: do their own security analysis and try to replicate the findings using the same parameters (and then post their own attestation if they want), or contact whoever posted the attestation and seek the information directly from them.
Another round of updates in:
{
"_type": "https://in-toto.io/Statement/v0.1",
"predicate": {
"content": {
"file_extensions": [
"",
"md",
"json",
"js",
"npmignore"
],
"programming_languages": [
"package.json",
"Unknown",
"javascript"
]
},
"evidence": {
"_type": "https://github.com/ossf/alpha-omega/types/evidence/v0.1",
"content": {
"output": {
"appVersion": "Microsoft Application Inspector 1.6.16-beta+7d93aa8d85",
"metaData": {
"CPUTargets": [],
"OSTargets": [],
"appTypes": [
"web.application",
"web.service"
],
"applicationName": "/opt/src/npm/express/4.4.3",
"authors": "\"",
"cloudTargets": [],
"dateScanned": "11/09/2022 22:58:20",
"description": "Sinatra inspired web development framework",
"detailedMatchList": [
{
"confidence": "High",
"endLocationColumn": 24,
"endLocationLine": 6,
"excerpt": " */\n\nvar escapeHtml = require('escape-html');\nvar http = require('http');\nvar path = require('path');\nvar mixin = require('utils-merge');\nvar sign = require('cookie-signature').sign;\n",
"fileName": "/opt/src/npm/express/4.4.3/reference-binaries/npm-express@4.4.3.tgz/npm-express@4.tar/package/lib/response.js",
"language": "javascript",
"pattern": "require\\(['\\\"](http|https|request|axios|superagent|got)",
"ruleDescription": "Network Connection: HTTP",
"ruleId": "AI032400",
"ruleName": "Network Connection: HTTP",
"sample": "require('http",
"severity": "Moderate",
"startLocationColumn": 11,
"startLocationLine": 6,
"tags": [
"OS.Network.Connection.Http"
],
"type": "Regex"
},
...
],
"filesSkipped": 0,
"filesTimeOutSkipped": 0,
"filesTimedOut": 0,
"languages": {
"Unknown": 8,
"javascript": 24,
"package.json": 2
},
"lastUpdated": "01/01/0001 00:00:00",
"outputs": [],
"packageTypes": [],
"sourcePath": "/opt/src/npm/express/4.4.3",
"sourceVersion": "4.4.3",
"tagCounters": [
{
"count": 44,
"tag": "Metric.Code.Function.Defined"
},
{
"count": 2,
"tag": "Metric.Code.URL"
},
{
"count": 2,
"tag": "Metric.Code.Logging.Call"
},
{
"count": 14,
"tag": "Metric.Code.Exception.Caught"
}
],
"targets": [],
"timedOut": false,
"totalFiles": 34,
"totalMatchesCount": 78,
"uniqueDependencies": [],
"uniqueMatchesCount": 17,
"uniqueTags": [
"Application.Type.Web.Service",
"CloudServices.Code.Repo.GitHub",
"Cryptography.Encoding.Base64",
"Cryptography.Encryption.General",
"Cryptography.HashAlgorithm.Legacy",
"Data.Sensitive.Credentials",
"Data.Sensitive.Identification",
"Data.Sensitive.Secret",
"Metadata.Application.Author",
"Metadata.Application.Description",
"Metadata.Application.Version",
"OS.Network.Connection.General",
"OS.Network.Connection.Http",
"OS.Network.Connection.Http.Ajax",
"OS.Network.Connection.Socket",
"OS.Process.DynamicExecution",
"WebApp.Cookies.Attr.Expires"
]
},
"resultCode": 0
}
},
"reproducibility": "high",
"source": "docker run --rm -t -v /tmp/omega-dee912:/opt/export --env-file .env openssf/omega-toolshed:latest pkg:npm/express@4.4.3",
"source-type": "command"
},
"generator": {
"name": "openssf.omega.programming_languages",
"version": "0.1.0"
},
"operational": {
"environment": {
"hostname": "scovetta-xps",
"machine_identifier": "00000000-0000-0000-0000-cc96e5042f66",
"operator": null
},
"execution_start": "2022-11-09T23:01:31.165813Z",
"execution_stop": "2022-11-09T23:01:31.166015Z",
"timestamp": "2022-11-09T22:58:20Z"
}
},
"predicateType": "https://github.com/ossf/alpha-omega/v0.1",
"subject": [
{
"purl": "pkg:npm/express@4.4.3",
"type": "https://github.com/ossf/alpha-omega/omega-analysis-toolchain/Types/PackageURL/v0.1"
}
]
}
Still to do (~today):
@antrompl - My view is that an attestation is the source of a package's security score against some standard. The attester's identity provides credibility, and the findings are the information the attester used to make the attestation. So keeping them separate makes sense to me.
Thanks @dasarpjonam!
Yes, the subject should contain both the reference (e.g. pkg:npm/express@4.4.3) and the hash of the underlying files scanned. For the tool, I'm not sure a hash is needed, but certainly a version reference that's enough to get back to it. Then again, I suppose versions aren't always clear or immutable, but I wonder how slippery a slope this is (do I need to include the architecture of the box the tool was run on? Environment variables present? Host distro?). I'm not sure where to stop; for a PoC/MVP, maybe stopping at "Application Inspector version X" is enough?
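A sketch of a subject entry carrying both identifiers. The `type`/`purl` fields follow the examples above, and `digest` uses the in-toto digest-set shape. Hashing the downloaded archive as a whole (rather than each extracted file) is an assumption.

```python
import hashlib


def make_subject(purl: str, archive: bytes) -> dict:
    """Subject = what was analyzed (purl) + hash of the exact bytes scanned."""
    return {
        "type": "https://github.com/ossf/alpha-omega/omega-analysis-toolchain/Types/PackageURL/v0.1",
        "purl": purl,
        "digest": {"sha256": hashlib.sha256(archive).hexdigest()},
    }
```

For example, `make_subject("pkg:npm/express@4.4.3", tarball_bytes)`, where `tarball_bytes` is a hypothetical variable holding the downloaded package archive. The purl says what the package claims to be, and the digest pins exactly which bytes were analyzed.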
Yes, "subject" is the target of analysis (express). From the perspective of the assertion above, "Application Inspector" is just an implementation detail: it could be swapped out in favor of something else, with the assertion remaining the same.
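One way to keep the assertion independent of the tool is a small adapter per tool that maps its raw output into the tool-neutral `content`. A sketch, assuming the Application Inspector field names shown in the output above (`metaData.languages`, a map of language to file count); `from_appinspector`, the `ADAPTERS` registry, and its key are hypothetical, and the real generator may build `programming_languages` differently.

```python
from typing import Callable, Dict, List

Content = Dict[str, List[str]]


def from_appinspector(output: dict) -> Content:
    """Map Application Inspector's metaData.languages to tool-neutral content."""
    languages = output.get("metaData", {}).get("languages", {})
    # Sorted so the result doesn't depend on the tool's key order.
    return {"programming_languages": sorted(languages)}


# Hypothetical registry: swapping tools means adding an adapter, not changing the assertion.
ADAPTERS: Dict[str, Callable[[dict], Content]] = {
    "application-inspector": from_appinspector,
}


def build_content(tool: str, output: dict) -> Content:
    return ADAPTERS[tool](output)
```

With the `languages` map from the express scan, this yields `["Unknown", "javascript", "package.json"]` (uppercase sorts first), the same set as the example assertion, just in a deterministic order.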
I have a quick comment on the CLI API regarding single vs. double dashes; see #35.
Does anyone have thoughts on where we should store assertions for the PoC? We have a local SQLite store now, and a local file system (git repo) store will be pushed later today. Do we need an actual running service for get/put operations? Also, I added a few thousand assertions at https://github.com/scovetta/test-assertion1/tree/main/assertions for easier testing.
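For the PoC, a local get/put store can be quite small. A sketch using Python's built-in `sqlite3`, keyed on the subject purl and the generator name (read from `predicate.generator.name`, as in the later iterations above). The table layout and the `AssertionStore` class are hypothetical, not the SQLite store already in the branch.

```python
import json
import sqlite3
from typing import List, Optional


class AssertionStore:
    """Hypothetical minimal get/put store for assertion JSON, backed by SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS assertions ("
            " id INTEGER PRIMARY KEY,"
            " purl TEXT NOT NULL,"
            " generator TEXT NOT NULL,"
            " body TEXT NOT NULL)"
        )
        self.db.execute(
            "CREATE INDEX IF NOT EXISTS idx_purl_generator ON assertions (purl, generator)"
        )

    def put(self, statement: dict) -> None:
        purl = statement["subject"][0]["purl"]
        generator = statement["predicate"]["generator"]["name"]
        with self.db:  # commits the transaction on success
            self.db.execute(
                "INSERT INTO assertions (purl, generator, body) VALUES (?, ?, ?)",
                (purl, generator, json.dumps(statement)),
            )

    def get(self, purl: str, generator: Optional[str] = None) -> List[dict]:
        sql = "SELECT body FROM assertions WHERE purl = ?"
        params = [purl]
        if generator is not None:
            sql += " AND generator = ?"
            params.append(generator)
        rows = self.db.execute(sql + " ORDER BY id", params)
        return [json.loads(row[0]) for row in rows]
```

The internal-tracking question ("did we already look at X?") then becomes `bool(store.get(purl))`; a running service would only need to wrap these two methods.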
@scovetta, can you link me to where you're generating the attestation in the scanner?
Sure thing, @adityasaky - code starts here: https://github.com/ossf/alpha-omega/tree/scovetta/attest-2/omega/oaf and the main entry point is omega/oaf.py. The generators themselves are in omega/assertions.
Thanks! On a side note, would you be interested in having some of these bits (especially base.py) upstreamed to https://github.com/in-toto/in-toto? Our Go implementation defines models for in-toto attestations, but the Python implementation doesn't yet.
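For reference, a minimal Python model of an in-toto v0.1 Statement could look roughly like this, using dataclasses. It is a sketch, not the in-toto project's API or the contents of `base.py`; the subject uses the in-toto `name` + `digest` shape rather than the purl-based subject used elsewhere in this thread.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List

STATEMENT_TYPE_V01 = "https://in-toto.io/Statement/v0.1"


@dataclass
class Subject:
    name: str
    digest: Dict[str, str]  # algorithm -> hex digest, e.g. {"sha256": "..."}


@dataclass
class Statement:
    subject: List[Subject]
    predicate_type: str
    predicate: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        return {
            "_type": STATEMENT_TYPE_V01,
            "subject": [{"name": s.name, "digest": dict(s.digest)} for s in self.subject],
            "predicateType": self.predicate_type,
            "predicate": self.predicate,
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Statement":
        # Reject anything that isn't a v0.1 statement rather than guessing.
        if data.get("_type") != STATEMENT_TYPE_V01:
            raise ValueError(f"unsupported statement type: {data.get('_type')!r}")
        return cls(
            subject=[Subject(s["name"], s["digest"]) for s in data["subject"]],
            predicate_type=data["predicateType"],
            predicate=data.get("predicate", {}),
        )

    def to_json(self) -> str:
        return json.dumps(self.to_dict(), sort_keys=True)
```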
@adityasaky Of course! I can't guarantee things will be stable on our end yet, but I'd be thrilled to align and share with in-toto.
For Omega, we're targeting the top 10,000 projects, using tooling (Omega Analysis Toolchain, etc.) and triage.
We need to provide some evidence that work was completed, both for internal tracking ("did we already look at X?") and external assurance ("has X been reviewed and believed to be safe to use?").
Let's use this thread to discuss how we can do this.
Some preliminaries:
We'll want to assert that some activity took place. That activity could be a tool execution (with some result), a manual action ("reviewing a thing"), or some combination of the two. In either case, the target could be a physical artifact (foo.tar.gz) or something else, like a GitHub repo (as with Scorecard results).
The in-toto attestation format seems to be a reasonable vehicle for providing attestations, but it delegates the actual predicate content to the user (us). That said, I haven't read the specs in a while, so I might be wrong here.
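One possible way to model the preliminaries above, as a sketch: an activity is a tool run, a manual review, or both, and its target is either an artifact identified by digest or a non-artifact resource such as a repository URL. Every name here (`ActivityKind`, `Target`, `make_activity_predicate`) is hypothetical and not part of in-toto; in-toto would only see the result as an opaque predicate.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class ActivityKind(Enum):
    TOOL_EXECUTION = "tool_execution"
    MANUAL_REVIEW = "manual_review"
    COMBINED = "combined"


@dataclass(frozen=True)
class Target:
    # Exactly one of `digest` (a physical artifact) or `uri` (e.g. a GitHub repo) is set.
    name: str
    digest: Optional[Dict[str, str]] = None
    uri: Optional[str] = None

    def __post_init__(self) -> None:
        if (self.digest is None) == (self.uri is None):
            raise ValueError("set exactly one of digest or uri")


def make_activity_predicate(kind: ActivityKind, target: Target, result: dict) -> dict:
    """Build a predicate body; the in-toto statement leaves this content to us."""
    ref = {"digest": target.digest} if target.digest is not None else {"uri": target.uri}
    return {"activity": kind.value, "target": {"name": target.name, **ref}, "result": result}
```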
Some questions:
I came up with a slightly Frankenstein'ed attestation format to see what this could look like. I'll push up the code that generates it shortly.
Thoughts?