Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collecting metadata for impacted functions in advisories #68

Closed
tarcieri opened this issue Nov 28, 2018 · 15 comments
Closed

Collecting metadata for impacted functions in advisories #68

tarcieri opened this issue Nov 28, 2018 · 15 comments

Comments

@tarcieri
Copy link
Member

RustPräzi is a crater-like tool which builds a call graph across all of the crates published to crates.io:

https://internals.rust-lang.org/t/prototype-dev-tool-rustprazi-a-tool-to-build-an-entire-call-graph-of-crates-io/8912

It would be interesting to use this tool, perhaps in conjunction with crates-audit, to generate a list of impacted crates based on this call graph.

To do that we'd need to collect structured information about vulnerable functions in advisories. Looking at our existing advisories, that information is often already there, but buried in an unstructured prose description, e.g.:

https://rustsec.org/advisories/RUSTSEC-2018-0003

If an iterator passed to SmallVec::insert_many panicked in Iterator::next, destructors were run during unwinding while the vector was in an inconsistent state, possibly causing a double free (a destructor running on two copies of the same value).

I think it'd be good to come up with a structured format for this sort of information that can be fed into tools like RustPräzi to determine impacted users of vulnerable crates.

@tarcieri
Copy link
Member Author

tarcieri commented Dec 13, 2018

I think we should just collect a canonical path starting with the crate name, and sans any parameters as an optional attribute of the advisory.

Using the advisory from the above description as an example:

affected_functions = ["smallvec::SmallVec::insert_many"]

@Inventitech
Copy link
Contributor

Sorry to chime in on this after it's been merged.

One problem with the current design is that it does not allow for the specification of a version.

Why could this be important?
Some security bugs affect quite a broad change of versions of a crate, and as such, the vulnerability might itself be subject to refactoring, renames, etc. So just specifying a name in affected_functions is perhaps overly simplistic.

Example

v1 introduce vulnerable function f1
v2 rename f1 to f2
v3 introduce new function f1, which is not vulnerable
v4 fix vulnerability in f2

Just naming all affected_functions f1 and f2 is not a great solution, because f1 would be considered vulnerable from v2 to v3, even though we know it isn't.

What would the change proposal be?
For each function, add a range of affected versions.

affected_functions = [ ["f1","<2"], ["f2",">2 <4"]]

What are the consequences of the proposed change?

  • It makes the declaration of this array more complicated.
  • It increases precision and resolves possible ambiguities. I don't know how often this would occur in practice.

Another question would be whether we would want to restrict this to functions, only. Couldn't a data structure have a vulnerability, too?

@tarcieri
Copy link
Member Author

tarcieri commented Dec 21, 2018

Haha, no worries.

Modifying the existing affected_functions might be tricky, but we can just abandon it, especially as this is a good reason:

Another question would be whether we would want to restrict this to functions, only. Couldn't a data structure have a vulnerability, too?

I definitely agree with this. There have been a few advisories filed which don't pertain to a function, but e.g. C FFI-related memory corruption (e.g. RUSTSEC-2018-0011).

How about this:

  • Deprecate and remove affected_functions. We can delete the existing ones.
  • Add a new attribute called affected_paths. How about using a TOML table where the keys are VersionReqs? e.g.
[affected_paths]
">= 1.0.0, < 1.2.3" = ["mycrate::foo::bar"]
"1.2.3" = ["mycrate::foo::baz"]

Or (these are two different TOML syntaxes for the same thing):

affected_paths."1.2.3" = [
  "mycrate::foo::bar",
  "mycrate::foo::baz",
  "mycrate::foo::quux",
]

@Inventitech
Copy link
Contributor

That sounds good!

Personally, I would prefer the first affected_paths syntax you have given.

@tarcieri
Copy link
Member Author

Both syntaxes will work, although we can document the first one.

@Shnatsel
Copy link
Member

Shnatsel commented Jan 7, 2019

We need to decide and document whether affected_paths should list public functions so that humans could cross-reference them with their codebase, or private functions that are the root cause of the issue because that's what enables the most precise machine analysis.

@tarcieri
Copy link
Member Author

tarcieri commented Jan 8, 2019

It could potentially list both. Curious what @Inventitech thinks

@Shnatsel
Copy link
Member

Shnatsel commented Jan 8, 2019

Listing both would probably result in conflicting information, too much room for human error, and increased burden for filing advisories.

@Inventitech
Copy link
Contributor

Hmm ... hard question.

Pros for listing public affected paths are that these are easy to consume for most users.
Problems are that whoever writes the advisory has to build a call graph or data flow graph to see where they are being used (as you said), so there is more potential for wrong hard-coded information in the advisory.

Pros for listing just the offending (maybe private, maybe public) functionality is that it is easy to do for the advisory creator, it is easiest to check for correctness and it keeps the advisories short (this might be an important practical argument). In theory, once we have the directly affected paths, we can just auto-expand to all affected paths in the module. Of course, the danger is that people take a quick manual look at the advisory, see they're not using anything of what is listed under affected_paths directly and wrongly assume they're safe. Maybe a simple fix of this would be to rename the attribute to root_cause_paths. Also, another argument for listing what is directly affected might be Rust's re-export feature, or if reflection will be further developed, one can no longer assume it is impossible to get access to private paths. Thus it seems that the root_cause_paths solution is also more future proof.

I also discussed this with @jhejderup, and like you, we see pros and cons for both approaches, but in the end, would only mention the root cause paths. Main arguments are that it is easy for the advisory creator, it keeps the advisories short, it does not include parts that can be auto-generated (transitive uses), and it seems to be more future-proof.

@tarcieri
Copy link
Member Author

An alternative to listing both under [affected_paths] is having advisory reporters continue to document the most relevant user-facing functions in the advisory description, which is what they're doing already organically.

@tarcieri
Copy link
Member Author

tarcieri commented Mar 3, 2019

We tried using the new syntax in #89, which provides a pretty good real-world example:

[affected_paths]
">= 0.4.0, <= 0.10.0"  = ["safe_transmute::guarded_transmute_vec_permissive"]
"= 0.10.0"             = ["safe_transmute::guarded_transmute_to_bytes_vec"]

This now has me wondering if instead of a map from paths to VersionReqs, it should be the other way around:

[affected_paths]
"safe_transmute::guarded_transmute_vec_permissive" = [">= 0.4.0, <= 0.10.0"]
"safe_transmute::guarded_transmute_to_bytes_vec" = ["= 0.10.0"]

(although ironically this seems to be breaking GitHub's TOML parser?)

Perhaps we should use an array of tables instead?

[[affected_path]]
name = "safe_transmute::guarded_transmute_vec_permissive"
versions = [">= 0.4.0, <= 0.10.0"]

[[affected_path]]
name = "safe_transmute::guarded_transmute_to_bytes_vec"
versions = ["= 0.10.0"]

@Inventitech
Copy link
Contributor

Inventitech commented Mar 4, 2019

This now has me wondering if instead of a map from paths to VersionReqs, it should be the other way around:

IMHO, it should (see my post #68 (comment)). Anyway, this is only a minor change.

Your other suggestion seems a bit more verbose than needs-be, by repeating [[affected_path]].

@tarcieri
Copy link
Member Author

tarcieri commented Mar 4, 2019

@Inventitech another way of writing the same thing is:

affected_paths = [
    { name = "safe_transmute::guarded_transmute_vec_permissive", versions = [">= 0.4.0, <= 0.10.0"] }
]

Your version is pretty close to that, aside from using nested arrays instead of an array of tables.

The reason for the more verbose form is the less verbose form seems to freak out TOML parsers (see the other version in my GitHub comment).

@vks
Copy link
Contributor

vks commented Sep 4, 2019

@tarcieri How would you specify implemented methods of a trait? crate::<Type as Trait>::method?

@tarcieri
Copy link
Member Author

tarcieri commented Sep 4, 2019

@vks right now the path parser doesn't support parameters, so it'd be crate::Type::method.

I'm not sure it's worth the effort to disambiguate in the case there are methods with the same name in multiple traits impl'd on the same type: it's fairly uncommon, and even if it were to happen that there's a vulnerability in a method with the same name impl'd in multiple traits, where the code in question happens to be using the not-vulnerable method of a different trait, the worst case is a false positive in the taint analysis, which is the state of affairs for everything today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants