Skip to content

Java: Add synthetic fields; model Commons Lang's MutableObject type #5796

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 16, 2021

Conversation

smowton
Copy link
Contributor

@smowton smowton commented Apr 28, 2021

Based on #5772

This adds support for synthetic fields (for now, specifically in external flow specifications), then uses one to model Commons Lang's MutableObject type.

@smowton smowton requested a review from a team as a code owner April 28, 2021 13:23
@github-actions github-actions bot added the Java label Apr 28, 2021
@Marcono1234
Copy link
Contributor

Not trying to discredit this pull request, it sounds extremely useful for cases where multiple synthetic fields exist, but for MutableObject wouldn't the existing dataflow capabilities suffice? Or is this partially done here for demonstration purposes?

@smowton
Copy link
Contributor Author

smowton commented May 10, 2021

It actually does add value even with just one field: it enables us to use value flow instead of taint flow. Value flow has to be type-preserving, so we need an Object-typed field, as opposed to the MutableObject or Mutable-typed container. With value flow it's possible to preserve access paths: for example, with taint rules, this example doesn't work:

MyClass c = new MyClass();
c.field = unsafe(); // Access path created; we expect a `.field` get to re-expose unsafe data
Mutable<MyClass> container = new Mutable<MyClass>(c);
sink(container.get().field);

It doesn't work because taint rules only apply with an empty access path: we can say that if c is tainted then container is tainted then container.get() is tainted, but not that latent taint that requires a field access to re-expose is preserved. Value-flow on the other hand can do that: if we document that the exact value of c is stored in some field of container that is fetched by its get() method, then the fetch from .field will work as expected.

* The owning type has no significance except to provide a source location, and to
* aid in avoiding name clashes.
*/
abstract class SyntheticField extends Unit {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether this would work when you have more than one synthetic field - as this class extends unit, it only has one instance; which might just return multiple values for its methods once there are multiple imlementations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This observation is spot on. Declaring class SyntheticField extends Unit means that there's only ever one SyntheticField.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was written back when I thought we identified instances with rows in the charpred table (i.e., with owningType/fieldName tuples), will change...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewritten in the style of the CSV row extension point, and rebased. Ready for re-review.

@smowton smowton force-pushed the smowton/feature/apache-mutable-flow branch from 1764b65 to 9845c27 Compare June 18, 2021 13:09
@github-actions
Copy link
Contributor

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged. The following differences were found:

java

Generated file changes for java

  • Changes to framework-coverage-java.rst:
-    `Apache Commons Lang <https://commons.apache.org/proper/commons-lang/>`_,``org.apache.commons.lang3``,,420,,,,,,,,
+    `Apache Commons Lang <https://commons.apache.org/proper/commons-lang/>`_,``org.apache.commons.lang3``,,417,,,,,,,,
-    Totals,,84,1625,181,13,6,6,,33,1,58
+    Totals,,84,1622,181,13,6,6,,33,1,58
  • Changes to framework-coverage-java.csv:
- org.apache.commons.lang3,,,420,,,,,,,,,,,,,,324,96
+ org.apache.commons.lang3,,,417,,,,,,,,,,,,,,324,93

@github-actions
Copy link
Contributor

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged. The following differences were found:

java

Generated file changes for java

  • Changes to framework-coverage-java.rst:
-    `Apache Commons Lang <https://commons.apache.org/proper/commons-lang/>`_,``org.apache.commons.lang3``,,420,,,,,,,,
+    `Apache Commons Lang <https://commons.apache.org/proper/commons-lang/>`_,``org.apache.commons.lang3``,,417,,,,,,,,
-    Totals,,84,1625,181,13,6,6,,33,1,58
+    Totals,,84,1622,181,13,6,6,,33,1,58
  • Changes to framework-coverage-java.csv:
- org.apache.commons.lang3,,,420,,,,,,,,,,,,,,324,96
+ org.apache.commons.lang3,,,417,,,,,,,,,,,,,,324,93

@smowton
Copy link
Contributor Author

smowton commented Jun 18, 2021

^^ Checked, these diffs are reversed. Actually we gained 3 rows, as expected.

@github-actions
Copy link
Contributor

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged. The following differences were found:

java

Generated file changes for java

  • Changes to framework-coverage-java.rst:
-    `Apache Commons Lang <https://commons.apache.org/proper/commons-lang/>`_,``org.apache.commons.lang3``,,420,,,,,,,,
+    `Apache Commons Lang <https://commons.apache.org/proper/commons-lang/>`_,``org.apache.commons.lang3``,,417,,,,,,,,
-    Totals,,84,1625,181,13,6,6,,33,1,58
+    Totals,,84,1622,181,13,6,6,,33,1,58
  • Changes to framework-coverage-java.csv:
- org.apache.commons.lang3,,,420,,,,,,,,,,,,,,324,96
+ org.apache.commons.lang3,,,417,,,,,,,,,,,,,,324,93

@aschackmull
Copy link
Contributor

@tamasvajk The action is a bit spammy - is there any way to easily do something about that? E.g. don't post another comment if it already posted an identical comment?

@tamasvajk
Copy link
Contributor

tamasvajk commented Jun 18, 2021

@tamasvajk The action is a bit spammy - is there any way to easily do something about that? E.g. don't post another comment if it already posted an identical comment?

Hmm, good point. The fix is coming in #6115.

I'll figure something out. (I can probably figure out what was the run_id of a previous execution, and then compare the artifacts before adding a new comment.)

Note to self: the following could be used to select the second to last successful run of the job that produces change artifacts, where the head_branch would need to come from the current execution of the workflow_run:
gh api -X GET repos/github/codeql/actions/runs -f event=pull_request -f status=success -f name="Check framework coverage changes" --jq '[.workflow_runs.[] | select(.head_branch=="smowton/feature/apache-mutable-flow") | { created_at: .created_at, run_id: .id}] | sort_by(.created_at) | reverse | .[1]'

(todo: head_repository also needs to be filtered)

@smowton
Copy link
Contributor Author

smowton commented Jun 21, 2021

@aschackmull this is ready for review

@smowton smowton force-pushed the smowton/feature/apache-mutable-flow branch from 58a77f5 to c066e39 Compare June 29, 2021 11:12
@aschackmull
Copy link
Contributor

Synthetic field support superseded by #6291

@smowton smowton force-pushed the smowton/feature/apache-mutable-flow branch from c066e39 to f3d9894 Compare July 15, 2021 14:17
@smowton
Copy link
Contributor Author

smowton commented Jul 15, 2021

Rebased. I've kept the improvement to clearsContent, which implicitly assumes these synthetic fields are more like ordinary fields than like ArrayElement etc which cannot be cleared. If we want Synthetic things which aren't subject to that sort of strong update then we'll need either a way to customise them or a cousin to SyntheticField that doesn't appear in clearsContent.

@github-actions
Copy link
Contributor

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged. The following differences were found:

java

Generated file changes for java

  • Changes to framework-coverage-java.rst:
-    `Apache Commons Lang <https://commons.apache.org/proper/commons-lang/>`_,``org.apache.commons.lang3``,,420,,,,,,,,
+    `Apache Commons Lang <https://commons.apache.org/proper/commons-lang/>`_,``org.apache.commons.lang3``,,423,,,,,,,,
-    Totals,,84,2109,296,13,6,6,107,33,1,66
+    Totals,,84,2112,296,13,6,6,107,33,1,66
  • Changes to framework-coverage-java.csv:
- org.apache.commons.lang3,,,420,,,,,,,,,,,,,,,292,128
+ org.apache.commons.lang3,,,423,,,,,,,,,,,,,,,292,131

@joefarebrother
Copy link
Contributor

Rebased. I've kept the improvement to clearsContent, which implicitly assumes these synthetic fields are more like ordinary fields than like ArrayElement etc which cannot be cleared. If we want Synthetic things which aren't subject to that sort of strong update then we'll need either a way to customise them or a cousin to SyntheticField that doesn't appear in clearsContent.

In fact I have a use case for this kind of thing; in #6174 I have a Table type in which its Rows and Columns are like MapKey so shouldn't be cleared. So far I've been working around it by using things like SyntheticField[Table.row] of MapKey of ..., which is less satisfactory.

@aschackmull
Copy link
Contributor

I don't think we should try to include clearing and it also won't work as-is. The reason it won't work is that clearing flow isn't summarised, so if it happens inside a method then callers won't see it happening. And the reason for not summarising clearing flow is that all the calculated flow by the library is existential flow ("there exists at least one path such that"), for which something like setter- and getter-flows can be summarised and incorporated at the call site, but clearing flow is essentially a barrier rather than flow, so it would have to be expressed in terms of universal flow ("for all paths it holds that ..." or "no path exists such that ...").

@aschackmull
Copy link
Contributor

Clearing-as-summary is partially supported by FlowSummaryImpl.qll, but it isn't something that's currently hooked up to csv rows, as it would need special handling there (it's a separate kind of summary).

@aschackmull
Copy link
Contributor

Hmm, I see the test does seem to indicate that it actually works... I wonder why that is. I'm not sure it should.

@smowton smowton force-pushed the smowton/feature/apache-mutable-flow branch from f3d9894 to 9cde13b Compare July 16, 2021 08:45
@smowton
Copy link
Contributor Author

smowton commented Jul 16, 2021

@aschackmull dropped clearing for synthetic fields. @joefarebrother this means synthetic fields always behave like Element et al so your case can probably be simplified.

@aschackmull aschackmull merged commit ef9d096 into github:main Jul 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants