To discuss: Issues and prioritization for a '1.0' #62

design/1.0.md (97 additions)
I recently opened a Kubernetes Enhancement Proposal to introduce `go-flow-levee` during testing as a defense against accidental credential logging.
See [KEP-1933](https://github.com/kubernetes/enhancements/pull/1936) for details.
Such integration demands a stable API and well-defined scope.

I propose the following.
Portions may already be implemented, but are included here as a "big picture."

# 1.0 Targets

## Basic taint propagation detection

Given a spec defining sources and sinks (below), diagnostics are reported for
any direct call to a sink that includes a source. Additionally, we detect any
taint propagation from a source to a sink within the scope of a single function.

Analysis does not explicitly track across the use of reflection.

## Analysis Configuration

Configuration will consist of the following key components:

* Identification of sources, sinks, and sanitizers
* Identification of analysis scope, e.g. to skip third-party or testing packages.
* Specification of diagnostics to be ignored, e.g., known false positives

### Identification of sources, sinks, and sanitizers

Much of our work deals directly with `ssa.Value`s and `ssa.Instruction`s.
Whatever element a given value or instruction represents, the configuration
that identifies it should be as uniform as possible.
In that light, the specification should be structured as follows:

#### ValueSpecifier
A ValueSpecifier marks an `ssa.Value`, e.g., for identification as a source or as the safe result of a sanitizer.
Values should be specifiable via any or all of:

* Specifier name: For identification during reporting
* Type / Field: Package path and type name, optional field name
* Field tags
* Scope: local variable, free variable, or global
* Context: within a specific package or function call. See CallSpecifier below.
* Is a reference
* Is "reference-like," e.g., slices, maps, chans.
* Const value (e.g., `"PASSWORD"`)
* Argument position (for use below)
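
For concreteness, a pseudo-config sketch in yaml, mirroring the style used later in this thread; field names are illustrative, not a committed schema:

```yaml
Sources:
  ValueSpecifiers:
    - name: "credential fields"
      type:
        package: "path/to/my/pkg"
        typeName: "Credentials"
        field: "Password"
      scope: local
    - name: "password literals"
      constValue: "PASSWORD"
```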

#### CallSpecifier
A CallSpecifier marks an `*ssa.Call`, e.g., as a sink or sanitizer.
Calls should be specifiable via any or all of:

* Specifier name: For identification during reporting
* Symbol: package path, function name, optional receiver name.
**Contributor:**

Do we want to handle the following cases:

1. funcs stored in struct fields, e.g.

   ```go
   type MySinker struct {
       sinkFunc func(args ...interface{})
   }
   ```

2. funcs passed as function parameters:

   ```go
   func callSink(arg interface{}, sink func(...interface{})) {
       sink(arg)
   }
   ```

In theory, since funcs are first class in Go, the actual range of cases we could potentially handle is much wider, e.g. a user could put functions in a map[string]func(), store functions in local variables, etc.

There are some cases we already handle, e.g.:

```go
func Test(s core.Source) {
	f := core.Sink
	f(s) // a diagnostic is produced here
}
```

But it is quite easy to throw our analysis off in this case:

```go
func Test(s core.Source) {
	var f func(...interface{})
	if true {
		f = core.Sink
	}
	f(s) // no diagnostic
}
```

My opinion: I think we should handle funcs stored in struct fields, because that seems to be a fairly common pattern. The other cases seem more outlandish and seem like they would require some kind of cross-function or even whole-program analysis, so I think for a 1.0 they are out of scope.

**Collaborator Author:**

> My opinion: I think we should handle funcs stored in struct fields, because that seems to be a fairly common pattern. The other cases seem more outlandish and seem like they would require some kind of cross-function or even whole-program analysis, so I think for a 1.0 they are out of scope.

We currently rely on the static dispatch in our call matching. In effect, we cover only case (a) of the four kinds of Call described in the docs.

I think we should make a reasonable effort to cover all four cases. (b) seems achievable in our current framework, and (c) seems a bit out-of-scope. Dynamic dispatch in case (d) is probably a whole can of worms. We could rely on pointer.PointsTo to get a set of possible functions, but that predicates whole-program analysis.

I don't think the fact that a given func is stored in a struct field will make the problem any simpler. I'd be inclined to put this at a "post-1.0 but preferably 1.1" sort of target.

**Collaborator Author:**

Rereading this, I think well-defined specifiers catch both of these cases. It's just a difference of a sink being identified by a ValueSpecifier rather than a CallSpecifier. Pseudo-config in yaml for readability and ignoring implementation details:

```yaml
Sinks:
  ValueSpecifiers:
    - name: "MySinker matcher"
      symbol: "path/to/my/pkg.MySinker.sinkFunc"
    - name: "log wrapper"
      context:
        # This is a CallSpecifier
        symbol: "path/to/my/pkg.callSink"
      argPosition: 2
```

[edit:] We still have the aforementioned issue of currently only following statically-linked calls. I'm just saying I think the problem is a lot more tractable than I originally thought.

* Context: A CallSpecifier indicating scope, such as package or specific functions.
* Based on argument value and position (ValueSpecifier).
* Based on return values (ValueSpecifier).
**Contributor:**

I'm not sure it makes sense to identify a sink/sanitizer based on the return value. Could you give an example?

**Collaborator Author:**

I'm not sure either. I was just throwing stuff out there.

All the instances that come to my mind are specific runtime values, like a buffer writer returning non-zero bytes written. But we won't see that in SSA.

Probably best to scratch it from the list.


## Inference

Inference represents the scope of what a user does *not* need to configure.
For instance, we can currently detect getter methods and mark them as
propagating a marked field without requiring any explicit configuration.
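
A minimal sketch of the getter pattern (`Source` and `Sink` are stand-ins for configured definitions, not the analyzer's API):

```go
package main

import "fmt"

// Source is a stand-in for a configured source type.
type Source struct{ password string }

// Data is a getter; inference should mark its result as tainted
// without any explicit configuration.
func (s Source) Data() string { return s.password }

// Sink is a stand-in for a configured sink.
func Sink(args ...interface{}) { fmt.Println(args...) }

func main() {
	s := Source{password: "hunter2"}
	Sink(s.Data()) // a diagnostic would be reported here
}
```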

If a source type is aliased, the alias is properly identified as equivalent to the aliased source type.
**Contributor:**

For context, we already handle aliases, e.g. `type MySource = core.Source`, but not named types, e.g. `type MySource core.Source`. (Interestingly, in the short investigation I just did, the SSA was exactly the same for both cases.)

Do we want named types to also count as "aliases"? Playing devil's advocate: what if a user defines a named type `type SafeSource core.Source` and uses it to explicitly differentiate between sources that have and have not been properly sanitized?

**Collaborator Author:**

> Do we want named types to also count as "aliases"?

Opinion: I think we should include named types. The only difference between the two types would be the associated method set, which to me suggests that the kind of data stored is reasonably similar. Also, as mentioned below, I think it would be the easier way to handle the fact that field tags are naturally inherited in a way that a source specified by type path and name would not be.

Although...

> Playing devil's advocate: what if a user defines a named type `type SafeSource core.Source` and uses it to explicitly differentiate between sources that have and have not been properly sanitized?

I hadn't thought of that. I think I might prefer an edge-case false-positive to a systematic false-negative, though.

Also and alternatively: this could be enabled/disabled in configuration. And/or we could allow a user to include a safe-list for types that should not be considered sources, though that feels like it's getting a bit over-elaborate.

**Contributor:**

I think you make a good case for named type inference.

The scenario I proposed is sufficiently unlikely that we should probably wait until we actually hit it before we think too deeply about how to handle it.

Also, as a user I think I would expect a named type that aliases a Source to also be considered a Source. Of course, if it weren't, it wouldn't be a big deal: I would just need to specify it in the config. But if I forget to do that, then I might think everything is fine when in reality named type Sources are reaching sinks.


If `Source` is identified as a source type, collection types built on it (e.g., `[]Source`, `map[string]Source`, `chan Source`) should also be identified as source types.
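
For instance, with the same stand-in names as before, a collection of sources reaching a sink should be flagged:

```go
package main

import "fmt"

// Source is a stand-in for a configured source type.
type Source struct{ secret string }

// Sink is a stand-in for a configured sink.
func Sink(args ...interface{}) { fmt.Println(args...) }

func main() {
	// A slice of sources should itself be treated as a source.
	batch := []Source{{secret: "hunter2"}}
	Sink(batch) // a diagnostic would be reported here
}
```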

## Internal Process

Add benchmarks.
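
One possible shape for those benchmarks, sketched with a hypothetical stand-in (`analyzePackage`); the real benchmarks would run the analyzers against fixture packages. `testing.Benchmark` lets the sketch run outside `go test -bench`:

```go
package main

import (
	"fmt"
	"testing"
)

// analyzePackage is a hypothetical stand-in for one analyzer pass over
// a package; it just counts a subset of "functions" as findings.
func analyzePackage(funcs int) (findings int) {
	for i := 0; i < funcs; i++ {
		if i%7 == 0 {
			findings++
		}
	}
	return findings
}

func main() {
	// testing.Benchmark runs a benchmark function and reports timings.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			analyzePackage(1000)
		}
	})
	fmt.Println(res.N > 0)
}
```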

# Beyond a 1.0

The following seem like significant improvements, but are larger in scope than what constitutes a stable 1.0.
These items are open to interpretation and warrant discussion.

## Improved Inference
**Contributor:**

I assume propagation/sanitization involving collections would fall under here as well?

**Collaborator Author:**

Hmmm...

I think propagation in general should be part of 1.0, and I believe you've made strides to cover most collections and are working on the remaining?

I think propagation and sanitization of specific elements within a collection might be a Beyond 1.0 target.

**Contributor:**

Ah. Important distinction here. Yes, propagation/sanitization of specific elements is what I had in mind.

I think for now all that's left is chans.


* Detect partial sanitization of collections.
* Detect sanitization via zero-value.
* Explicit sanitization of `type Source struct{ data string }` is easily done via `s.data = ""`.
* Alternatively, a user may specify redaction placeholders, e.g. `"REDACTED"`, rather than requiring a zero value.
* A type defined from a source type (e.g., `type Foo core.Source`) should also be considered a source type.
* Note: as tags take part in the type identity of the underlying struct literal, field tags are "inherited" by types defined from existing types.
This may result in conflicting behavior between sources identified by tag and those identified by package path and name.
* A type which contains a known source type as a field, e.g. `type Wrapper struct { data core.Source }`, should itself be considered a source type, with the containing field marked.
* Current effort under the umbrella of *cross-function analysis* greatly refines inference of safety in scenarios that would currently produce a finding.
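
The zero-value and placeholder styles from the list above, as a stand-in sketch (a real analyzer would also have to verify the sanitizing write dominates the sink call):

```go
package main

import "fmt"

// Source is a stand-in for a configured source type.
type Source struct{ data string }

// Sink is a stand-in for a configured sink.
func Sink(args ...interface{}) { fmt.Println(args...) }

func main() {
	s := Source{data: "hunter2"}
	// Zero-value sanitization: after this write, s carries no secret.
	s.data = ""
	Sink(s) // ideally, no diagnostic here

	t := Source{data: "hunter2"}
	// Placeholder redaction: equally safe if "REDACTED" is configured
	// as an accepted redaction value.
	t.data = "REDACTED"
	Sink(t) // ideally, no diagnostic here either
}
```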

## Reporting

At present, reporting of Diagnostics is a generic `a source has reached a sink` message with source and sink positions included.
As we iterate to include wider inference and definitions of sources, sinks, cross-function analysis, etc., the path between a source and a sink can become ambiguous.
If much inference is targeted for a 1.0 release, reporting will need to be improved to remain meaningful.
PurelyApplied marked this conversation as resolved.
Show resolved Hide resolved

## Extensibility

A user may have highly specific considerations for the definition of a source/sink/sanitizer and may wish to implement custom specification code that interacts directly with `ssa` / `types` packages.
We should expose an extensible interface for this purpose.
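
A sketch of what such an extension point might look like; all names here are hypothetical. In the real analyzer a matcher would receive `ssa.Value` / `types.Type` values from the `ssa` and `types` packages, but a plain symbol string stands in so the sketch stays self-contained:

```go
package main

import (
	"fmt"
	"strings"
)

// SourceMatcher sketches a hypothetical extension point: user code
// decides whether a symbol should be treated as a source.
type SourceMatcher interface {
	IsSource(symbol string) bool
}

// prefixMatcher treats every symbol under one package path as a source.
type prefixMatcher struct{ prefix string }

func (m prefixMatcher) IsSource(symbol string) bool {
	return strings.HasPrefix(symbol, m.prefix)
}

func main() {
	var m SourceMatcher = prefixMatcher{prefix: "path/to/secrets."}
	fmt.Println(m.IsSource("path/to/secrets.Token"))
	fmt.Println(m.IsSource("fmt.Sprintf"))
}
```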