Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
40 commits
Select commit Hold shift + click to select a range
cc44e89
C#: Introduce support for Negative summary models.
michaelnebel Jul 21, 2022
0cf4e64
C#: Update test cases with new empty query predicate.
michaelnebel Jul 21, 2022
87c7dd9
C#: Disregard destructors in model generation.
michaelnebel Jul 21, 2022
9c6bcec
C#: Add model generator testcase for Negative summaries for abstract …
michaelnebel Jul 26, 2022
136bdeb
C#: Add test for Negative summary generation.
michaelnebel Aug 3, 2022
7d46d15
C#: Update summary model generation test output.
michaelnebel Aug 3, 2022
929f1b3
C#: Prepare for .NET negative summaries and use in UnsupportedExterna…
michaelnebel Aug 3, 2022
0578d3e
C#: Improve python script to allow generation of Negative summary mod…
michaelnebel Aug 3, 2022
15c05e2
Java: Re-factor specialized CSV predicates into overrides of the row …
michaelnebel Jul 22, 2022
5255e16
Java: Sync files and make framework specific code.
michaelnebel Aug 3, 2022
120fb25
Java: Sync files and model generator and tests.
michaelnebel Aug 3, 2022
fbc0e6a
Ruby: Sync files and make dummy negative summary implementation.
michaelnebel Aug 3, 2022
8db454a
Swift: Sync files and make dummy negative summary implementation.
michaelnebel Aug 3, 2022
19469a2
C#: Re-factor CSV validation into a separate file.
michaelnebel Aug 4, 2022
4c59cfb
C#: Re-factor the invalidModelRow predicate.
michaelnebel Aug 4, 2022
3315d76
C#: Introduce negative summary column count validation.
michaelnebel Aug 4, 2022
053460f
C#: Introduce validation of negative summaries.
michaelnebel Aug 4, 2022
4939439
Java: Re-factor CSV Validation into standalone module.
michaelnebel Aug 4, 2022
9f9129d
Java: Introduce column validation for negative summaries.
michaelnebel Aug 4, 2022
37f01fe
Go: Re-factor CSV validation into separate file.
michaelnebel Aug 4, 2022
2c2e09b
Go: Add summary model validation on the kind column.
michaelnebel Aug 4, 2022
54e85ff
Swift: Remove some of the copied (and dead) language specific (to C#)…
michaelnebel Aug 4, 2022
9b16192
Swift: Re-factor CsvValidation into a separate file.
michaelnebel Aug 4, 2022
00d1b86
C#: Add negative generated .NET Runtime models.
michaelnebel Aug 4, 2022
ad671f7
C#: Update test expected output after addition of negative summaries.
michaelnebel Aug 4, 2022
8949f71
C#: Fixup CSV validation refactor.
michaelnebel Aug 4, 2022
581824a
C#/Java/Ruby/Swift: Fix various typos.
michaelnebel Aug 4, 2022
d2087ec
C#: Update negative summaries reported by FlowSummaries test after re…
michaelnebel Aug 11, 2022
37976d5
C#/Java/Go/Swift: Move CsvValidation back into ExternalFlow.
michaelnebel Aug 11, 2022
c3e21e8
C#: Move NegativeSummary.qll to the internal folder.
michaelnebel Aug 11, 2022
160ae93
C#/Java/Ruby/Swift: Fix typo in QL doc.
michaelnebel Aug 11, 2022
592b60d
C#: Fix rebase error merge (validation on encryption kind was un-inte…
michaelnebel Aug 12, 2022
30d5545
C#/Java: Fix some QL doc spelling typos.
michaelnebel Aug 12, 2022
2e273f2
C#: Re-arange the import order, such that CsvValidation follows Exter…
michaelnebel Aug 12, 2022
fbc3680
C#: Fix merge issues after re-base.
michaelnebel Aug 16, 2022
f728ddf
C#: Update negative summaries (there has been a rebase since last upd…
michaelnebel Aug 18, 2022
761ed28
C#/Java/Ruby/Swift: Address review comments.
michaelnebel Aug 18, 2022
51e7b08
C#: Update negative models.
michaelnebel Aug 18, 2022
e446eab
C#: Update C# Flowsummaries test expected out (Negative models has be…
michaelnebel Aug 19, 2022
a412c95
Java: One implementation of the interface has no flow (which seems un…
michaelnebel Aug 18, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
207 changes: 137 additions & 70 deletions csharp/ql/lib/semmle/code/csharp/dataflow/ExternalFlow.qll
Original file line number Diff line number Diff line change
Expand Up @@ -5,12 +5,13 @@
*
* The CSV specification has the following columns:
* - Sources:
* `namespace; type; subtypes; name; signature; ext; output; kind`
* `namespace; type; subtypes; name; signature; ext; output; kind; provenance`
* - Sinks:
* `namespace; type; subtypes; name; signature; ext; input; kind`
* `namespace; type; subtypes; name; signature; ext; input; kind; provenance`
* - Summaries:
* `namespace; type; subtypes; name; signature; ext; input; output; kind`
*
* `namespace; type; subtypes; name; signature; ext; input; output; kind; provenance`
* - Negative Summaries:
* `namespace; type; name; signature; provenance`
* The interpretation of a row is similar to API-graphs with a left-to-right
* reading.
* 1. The `namespace` column selects a namespace.
Expand Down Expand Up @@ -163,11 +164,27 @@ class SummaryModelCsv extends Unit {
abstract predicate row(string row);
}

private predicate sourceModel(string row) { any(SourceModelCsv s).row(row) }
/**
* A unit class for adding negative summary model rows.
*
* Extend this class to add additional flow summary definitions.
*/
class NegativeSummaryModelCsv extends Unit {
/** Holds if `row` specifies a negative summary definition. */
abstract predicate row(string row);
}

/** Holds if `row` is a source model. */
predicate sourceModel(string row) { any(SourceModelCsv s).row(row) }

private predicate sinkModel(string row) { any(SinkModelCsv s).row(row) }
/** Holds if `row` is a sink model. */
predicate sinkModel(string row) { any(SinkModelCsv s).row(row) }

private predicate summaryModel(string row) { any(SummaryModelCsv s).row(row) }
/** Holds if `row` is a summary model. */
predicate summaryModel(string row) { any(SummaryModelCsv s).row(row) }

/** Holds if `row` is a negative summary model. */
predicate negativeSummaryModel(string row) { any(NegativeSummaryModelCsv s).row(row) }

/** Holds if a source model exists for the given parameters. */
predicate sourceModel(
Expand Down Expand Up @@ -230,6 +247,20 @@ predicate summaryModel(
)
}

/** Holds if a summary model exists indicating there is no flow for the given parameters. */
predicate negativeSummaryModel(
string namespace, string type, string name, string signature, string provenance
) {
exists(string row |
negativeSummaryModel(row) and
row.splitAt(";", 0) = namespace and
row.splitAt(";", 1) = type and
row.splitAt(";", 2) = name and
row.splitAt(";", 3) = signature and
row.splitAt(";", 4) = provenance
)
}

private predicate relevantNamespace(string namespace) {
sourceModel(namespace, _, _, _, _, _, _, _, _) or
sinkModel(namespace, _, _, _, _, _, _, _, _) or
Expand Down Expand Up @@ -286,38 +317,7 @@ predicate modelCoverage(string namespace, int namespaces, string kind, string pa

/** Provides a query predicate to check the CSV data for validation errors. */
module CsvValidation {
/** Holds if some row in a CSV-based flow model appears to contain typos. */
query predicate invalidModelRow(string msg) {
exists(
string pred, string namespace, string type, string name, string signature, string ext,
string provenance
|
sourceModel(namespace, type, _, name, signature, ext, _, _, provenance) and pred = "source"
or
sinkModel(namespace, type, _, name, signature, ext, _, _, provenance) and pred = "sink"
or
summaryModel(namespace, type, _, name, signature, ext, _, _, _, provenance) and
pred = "summary"
|
not namespace.regexpMatch("[a-zA-Z0-9_\\.]+") and
msg = "Dubious namespace \"" + namespace + "\" in " + pred + " model."
or
not type.regexpMatch("[a-zA-Z0-9_<>,\\+]+") and
msg = "Dubious type \"" + type + "\" in " + pred + " model."
or
not name.regexpMatch("[a-zA-Z0-9_<>,]*") and
msg = "Dubious member name \"" + name + "\" in " + pred + " model."
or
not signature.regexpMatch("|\\([a-zA-Z0-9_<>\\.\\+\\*,\\[\\]]*\\)") and
msg = "Dubious signature \"" + signature + "\" in " + pred + " model."
or
not ext.regexpMatch("|Attribute") and
msg = "Unrecognized extra API graph element \"" + ext + "\" in " + pred + " model."
or
not provenance = ["manual", "generated"] and
msg = "Unrecognized provenance description \"" + provenance + "\" in " + pred + " model."
)
or
private string getInvalidModelInput() {
exists(string pred, AccessPath input, string part |
sinkModel(_, _, _, _, _, _, input, _, _) and pred = "sink"
or
Expand All @@ -332,9 +332,11 @@ module CsvValidation {
part = input.getToken(_) and
parseParam(part, _)
) and
msg = "Unrecognized input specification \"" + part + "\" in " + pred + " model."
result = "Unrecognized input specification \"" + part + "\" in " + pred + " model."
)
or
}

private string getInvalidModelOutput() {
exists(string pred, string output, string part |
sourceModel(_, _, _, _, _, _, output, _, _) and pred = "source"
or
Expand All @@ -343,58 +345,123 @@ module CsvValidation {
invalidSpecComponent(output, part) and
not part = "" and
not (part = ["Argument", "Parameter"] and pred = "source") and
msg = "Unrecognized output specification \"" + part + "\" in " + pred + " model."
result = "Unrecognized output specification \"" + part + "\" in " + pred + " model."
)
}

private string getInvalidModelKind() {
exists(string row, string kind | summaryModel(row) |
kind = row.splitAt(";", 8) and
not kind = ["taint", "value"] and
result = "Invalid kind \"" + kind + "\" in summary model."
)
or
exists(string row, string kind | sinkModel(row) |
kind = row.splitAt(";", 7) and
not kind = ["code", "sql", "xss", "remote", "html"] and
not kind.matches("encryption-%") and
result = "Invalid kind \"" + kind + "\" in sink model."
)
or
exists(string row, string kind | sourceModel(row) |
kind = row.splitAt(";", 7) and
not kind = ["local", "file"] and
result = "Invalid kind \"" + kind + "\" in source model."
)
}

private string getInvalidModelSubtype() {
exists(string pred, string row |
sourceModel(row) and pred = "source"
or
sinkModel(row) and pred = "sink"
or
summaryModel(row) and pred = "summary"
|
exists(string b |
b = row.splitAt(";", 2) and
not b = ["true", "false"] and
result = "Invalid boolean \"" + b + "\" in " + pred + " model."
)
)
}

private string getInvalidModelColumnCount() {
exists(string pred, string row, int expect |
sourceModel(row) and expect = 9 and pred = "source"
or
sinkModel(row) and expect = 9 and pred = "sink"
or
summaryModel(row) and expect = 10 and pred = "summary"
or
negativeSummaryModel(row) and expect = 5 and pred = "negative summary"
|
exists(int cols |
cols = 1 + max(int n | exists(row.splitAt(";", n))) and
cols != expect and
msg =
result =
"Wrong number of columns in " + pred + " model row, expected " + expect + ", got " + cols +
" in " + row + "."
)
or
exists(string b |
b = row.splitAt(";", 2) and
not b = ["true", "false"] and
msg = "Invalid boolean \"" + b + "\" in " + pred + " model."
)
)
or
exists(string row, string kind | summaryModel(row) |
kind = row.splitAt(";", 8) and
not kind = ["taint", "value"] and
msg = "Invalid kind \"" + kind + "\" in summary model."
)
or
exists(string row, string kind | sinkModel(row) |
kind = row.splitAt(";", 7) and
not kind = ["code", "sql", "xss", "remote", "html"] and
not kind.matches("encryption-%") and
msg = "Invalid kind \"" + kind + "\" in sink model."
)
or
exists(string row, string kind | sourceModel(row) |
kind = row.splitAt(";", 7) and
not kind = ["local", "file"] and
msg = "Invalid kind \"" + kind + "\" in source model."
}

private string getInvalidModelSignature() {
exists(
string pred, string namespace, string type, string name, string signature, string ext,
string provenance
|
sourceModel(namespace, type, _, name, signature, ext, _, _, provenance) and pred = "source"
or
sinkModel(namespace, type, _, name, signature, ext, _, _, provenance) and pred = "sink"
or
summaryModel(namespace, type, _, name, signature, ext, _, _, _, provenance) and
pred = "summary"
or
negativeSummaryModel(namespace, type, name, signature, provenance) and
ext = "" and
pred = "negative summary"
|
not namespace.regexpMatch("[a-zA-Z0-9_\\.]+") and
result = "Dubious namespace \"" + namespace + "\" in " + pred + " model."
or
not type.regexpMatch("[a-zA-Z0-9_<>,\\+]+") and
result = "Dubious type \"" + type + "\" in " + pred + " model."
or
not name.regexpMatch("[a-zA-Z0-9_<>,]*") and
result = "Dubious member name \"" + name + "\" in " + pred + " model."
or
not signature.regexpMatch("|\\([a-zA-Z0-9_<>\\.\\+\\*,\\[\\]]*\\)") and
result = "Dubious signature \"" + signature + "\" in " + pred + " model."
or
not ext.regexpMatch("|Attribute") and
result = "Unrecognized extra API graph element \"" + ext + "\" in " + pred + " model."
or
not provenance = ["manual", "generated"] and
result = "Unrecognized provenance description \"" + provenance + "\" in " + pred + " model."
)
}

/** Holds if some row in a CSV-based flow model appears to contain typos. */
query predicate invalidModelRow(string msg) {
msg =
[
getInvalidModelSignature(), getInvalidModelInput(), getInvalidModelOutput(),
getInvalidModelSubtype(), getInvalidModelColumnCount(), getInvalidModelKind()
]
}
}

private predicate elementSpec(
string namespace, string type, boolean subtypes, string name, string signature, string ext
) {
sourceModel(namespace, type, subtypes, name, signature, ext, _, _, _) or
sinkModel(namespace, type, subtypes, name, signature, ext, _, _, _) or
sourceModel(namespace, type, subtypes, name, signature, ext, _, _, _)
or
sinkModel(namespace, type, subtypes, name, signature, ext, _, _, _)
or
summaryModel(namespace, type, subtypes, name, signature, ext, _, _, _, _)
or
negativeSummaryModel(namespace, type, name, signature, _) and ext = "" and subtypes = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes anything depending on the other kinds of models also depend on negative summaries. This might be fine for now, but if they grow too large that we don't want to scan them for regular queries then we'd likely need to split this predicate here (and do the implementation sharing bits with parameterised modules instead). But it's probably fine to leave that as potential future work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that the modules containing the negative summaries are included near the queries that relies on them. Are you worried that if a query that imports the negative summaries are included in query package containing many other queries that these other queries will be slower?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently inclusion is based on file module activation (aka transitive import), and the negative models aren't activated for regular queries, so they'll be fine. But all of the positive models and their dependencies will be recomputed once negative models are included, so any queries referring to negative models need to do extra work. Again, this is probably fine for now.
I originally didn't notice that the negative models were excluded from the computation by way of not being activated, since the picture in my mind was a more explicit split at the predicate level. And once we move the MaD rows to external files then we need to refactor this, since at that point the exclusion-by-no-import won't work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Thank you very much for the explanation! 😃

}

private predicate elementSpec(
Expand Down Expand Up @@ -508,7 +575,7 @@ private Element interpretElement0(
)
}

/** Gets the source/sink/summary element corresponding to the supplied parameters. */
/** Gets the source/sink/summary/negativesummary element corresponding to the supplied parameters. */
Element interpretElement(
string namespace, string type, boolean subtypes, string name, string signature, string ext
) {
Expand Down
2 changes: 1 addition & 1 deletion csharp/ql/lib/semmle/code/csharp/dataflow/SSA.qll
Original file line number Diff line number Diff line change
Expand Up @@ -173,7 +173,7 @@ module Ssa {
}

/**
* Holds is this SSA definition is live at the end of basic block `bb`.
* Holds if this SSA definition is live at the end of basic block `bb`.
* That is, this definition reaches the end of basic block `bb`, at which
* point it is still live, without crossing another SSA definition of the
* same source variable.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2129,18 +2129,37 @@ module Csv {
if isBaseCallableOrPrototype(c) then result = "true" else result = "false"
}

/** Computes the first 6 columns for CSV rows of `c`. */
private predicate partialModel(
DotNet::Callable c, string namespace, string type, string name, string parameters
) {
c.getDeclaringType().hasQualifiedName(namespace, type) and
c.hasQualifiedName(_, name) and
parameters = "(" + parameterQualifiedTypeNamesToString(c) + ")"
}

/** Computes the first 6 columns for positive CSV rows of `c`. */
string asPartialModel(DotNet::Callable c) {
exists(string namespace, string type, string name |
c.getDeclaringType().hasQualifiedName(namespace, type) and
c.hasQualifiedName(_, name) and
exists(string namespace, string type, string name, string parameters |
partialModel(c, namespace, type, name, parameters) and
result =
namespace + ";" //
+ type + ";" //
+ getCallableOverride(c) + ";" //
+ name + ";" //
+ "(" + parameterQualifiedTypeNamesToString(c) + ")" + ";" //
+ parameters + ";" //
+ /* ext + */ ";" //
)
}

/** Computes the first 4 columns for negative CSV rows of `c`. */
string asPartialNegativeModel(DotNet::Callable c) {
exists(string namespace, string type, string name, string parameters |
partialModel(c, namespace, type, name, parameters) and
result =
namespace + ";" //
+ type + ";" //
+ name + ";" //
+ parameters + ";" //
)
}
}
Loading