Java: model top 100 JDK APIs #11572

jcogs33 · 2022-12-05T17:51:34Z

Description

This PR adds MaD models for top 100 JDK APIs which were not previously modeled. It also adds a test case that checks that a manual model exists for each of the top 100 APIs.

Consideration

Models:

Please take a close look at the models added for exceptions as I'm not sure if I handled the field flow for them correctly.
I left the following two apis unmodeled based on comments here, and here. Let me know if I should add a model for either of these after all: java.lang.String#valueOf(Object) and java.lang.Throwable#printStackTrace().
I put java.lang.String#charAt(int) as a positive model based on the comment here, let me know if I should remove the model instead.

Test Case:

Let me know if the test case needs to be changed. I'm not sure if I structured it as intended.

java/ql/lib/ext/TopJdkApis.model.yml

aschackmull · 2022-12-06T13:53:37Z

Should all of these models be stored in one file as a record of the top-100 models? Or should I separate these into files by package name?

We shouldn't duplicate rows, and I think these generally make more sense in the <packagename>.model.yml files. I believe going through this top100 was mainly meant to ensure that we do have manual models checked in where applicable, and that we hadn't missed any among the top 100.
We can see that we're missing some models for exceptions, but as noted these aren't that relevant yet.

jcogs33 · 2022-12-07T04:40:52Z

We shouldn't duplicate rows, and I think these generally make more sense in the .model.yml files. I believe going through this top100 was mainly meant to ensure that we do have manual models checked in where applicable, and that we hadn't missed any among the top 100.

Would it make sense to store them in a single file if maybe the model generator will be run on a list of top APIs like you suggested in the planning meeting today? Or at least tag the existing models somehow so they can be identified as top APIs by the generator?

yo-h · 2022-12-08T20:15:10Z

We shouldn't duplicate rows, and I think these generally make more sense in the .model.yml files.

I think this makes sense - it will put the added data where it naturally belongs.

Or at least tag the existing models somehow so they can be identified as top APIs by the generator?

Good point that we want to ideally retain the information about top APIs somewhere.

I believe going through this top100 was mainly meant to ensure that we do have manual models checked in where applicable, and that we hadn't missed any among the top 100.

Perhaps capture the information about the list of top 100 APIs in a test case that checks that a manual model exists for each of the APIs in the list?

yo-h · 2022-12-08T20:29:29Z

Perhaps capture the information about the list of top 100 APIs in a test case that checks that a manual model exists for each of the APIs in the list?

This should be possible to do in a database-independent way. (Unlike running the metric queries or generator queries.)

aschackmull · 2022-12-09T12:15:48Z

Perhaps capture the information about the list of top 100 APIs in a test case that checks that a manual model exists for each of the APIs in the list?

Yes, I think that makes the most sense. The top 100 is a moving target (although unlikely to have changes that often), so taking a snapshot a wrapping that up as a test seems fine.

This should be possible to do in a database-independent way. (Unlike running the metric queries or generator queries.)

Unfortunately, not quite. As an example we have java.util.Set.add(Object) in the top 100, but this is covered by a model for Collection.add(). However, as a qltest this can be easily fixed by adding a bit of Java to the Java side of the qltest to actually reference those 100 api endpoints, thus ensuring that they and their supertypes are in the db.

github-actions · 2022-12-16T20:43:17Z

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

java

Generated file changes for java

Changes to framework-coverage-java.rst:

-    Java Standard Library,``java.*``,3,591,130,28,,,7,,,10
+    Java Standard Library,``java.*``,3,602,130,28,,,7,,,10
-    Totals,,217,8438,1563,129,6,10,107,33,1,86
+    Totals,,217,8449,1563,129,6,10,107,33,1,86

Changes to framework-coverage-java.csv:

- java.lang,13,,66,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,54,12
+ java.lang,13,,75,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,56,19
+ java.math,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,
- java.sql,11,,,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,,
+ java.sql,11,,1,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,1,

github-actions · 2022-12-19T01:04:24Z

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

java

Generated file changes for java

Changes to framework-coverage-java.rst:

-    Java Standard Library,``java.*``,3,591,130,28,,,7,,,10
+    Java Standard Library,``java.*``,3,602,130,28,,,7,,,10
-    Totals,,217,8438,1563,129,6,10,107,33,1,86
+    Totals,,217,8449,1563,129,6,10,107,33,1,86

Changes to framework-coverage-java.csv:

- java.lang,13,,66,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,54,12
+ java.lang,13,,75,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,56,19
+ java.math,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,
- java.sql,11,,,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,,
+ java.sql,11,,1,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,1,

…ted by String.length

github-actions · 2022-12-19T02:49:46Z

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

java

Generated file changes for java

Changes to framework-coverage-java.rst:

-    Java Standard Library,``java.*``,3,591,130,28,,,7,,,10
+    Java Standard Library,``java.*``,3,602,130,28,,,7,,,10
-    Totals,,217,8438,1563,129,6,10,107,33,1,86
+    Totals,,217,8449,1563,129,6,10,107,33,1,86

Changes to framework-coverage-java.csv:

- java.lang,13,,66,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,54,12
+ java.lang,13,,75,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,56,19
+ java.math,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,
- java.sql,11,,,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,,
+ java.sql,11,,1,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,1,

jcogs33 · 2022-12-19T04:15:38Z

Review Consideration for commit bf6148c onwards:

Is the code style for TopJdkApis.qll and TopJdkApisTest.ql okay? Should anything else be updated? See #11572 comment.
In f3fc683 and f933fc7, I updated some existing test cases that were affected by the addition of the Integer.parseInt MaD model. I'm a little bit confused as to why these needed to be updated since it looks like Integer.parseInt was already supported by NumberTaintPreservingCallable. Please let me know if anything looks wrong with the MaD model.
Regarding this comment, would it be okay to leave the parameterless static methods as neutral models for now and remove them later if an exclusion specific to summary models is added? See #11717 comment.
Let me know if the category and description in the change note look okay.

atorralba · 2022-12-19T10:37:41Z

Is the code style for TopJdkApis.qll and TopJdkApisTest.ql okay? Should anything else be updated? See #11572 (comment).

I think what you did is pretty similar to what @aschackmull meant, with the addition that you could have Callable c as a parameter of hasCallable (renamed to e.g. hasApiName), making getApiName unnecessary. So something like:

/** Holds if `c` has the MaD-formatted name `apiName`. */
predicate hasApiName(Callable c, string apiName) {
  apiName =
    c.getDeclaringType().getPackage() + "." + c.getDeclaringType().getSourceDeclaration() + "#" +
      c.getName() + paramsString(c)
}

Which would still remove the code duplication, since the uses would look like:

class TopJdkApi extends SummarizedCallableBase {
  TopJdkApi() {
    exists(string apiName |
      hasApiName(this.asCallable(), apiName) and
      topJdkApiName(apiName)
    )
  }
}

and

 // top jdk api names for which there isn't a manual model
 exists(TopJdkApi topApi |
  not topApi.hasManualMadModel() and
  hasApiName(topApi.asCallable(), apiName) and
  message = "no manual model"
)

In f3fc683 and f933fc7, I updated some existing test cases that were affected by the addition of the Integer.parseInt MaD model. I'm a little bit confused as to why these needed to be updated since it looks like Integer.parseInt was already supported by NumberTaintPreservingCallable. Please let me know if anything looks wrong with the MaD model.

Those changes only add new edges and nodes, but the select results are the same. I'm not an expert, but AFAIK that has to do with how MaD rows are represented in the dataflow graph compared to regular CodeQL steps. So nothing you should worry about as long as the results don't change.

Regarding this comment, would it be okay to leave the parameterless static methods as neutral models for now and remove them later if an exclusion specific to summary models is added? See #11717 (comment).

Given that it seems that the model generator will still be generally interested in parameterless static methods (just not as sinks nor summaries), I think it's a good idea to leave them as neutral models for the moment. If an hypothetical exclusion for summary models happens, those can be removed.

Let me know if the category and description in the change note look okay.

I think it looks right. As an optional nitpick, I'm not sure if specifying the kind of models (summary and neutral) is of general interest, so maybe a simpler Added more dataflow models for top-100 JDK APIs. would work.

I felt tempted to categorize this as a majorAnalysis change given the widespread impact that changes in JDK models could have, but I think majorAnalysis is more appropriate for significant CodeQL API changes and not just modeling changes, so minorAnalysis is probably the right category.

github-actions · 2022-12-19T13:44:53Z

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

java

Generated file changes for java

Changes to framework-coverage-java.rst:

-    Java Standard Library,``java.*``,3,591,130,28,,,7,,,10
+    Java Standard Library,``java.*``,3,602,130,28,,,7,,,10
-    Totals,,217,8438,1563,129,6,10,107,33,1,86
+    Totals,,217,8449,1563,129,6,10,107,33,1,86

Changes to framework-coverage-java.csv:

- java.lang,13,,66,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,54,12
+ java.lang,13,,75,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,56,19
+ java.math,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,
- java.sql,11,,,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,,
+ java.sql,11,,1,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,1,

jcogs33 · 2022-12-19T13:45:40Z

Thanks @atorralba! I've added a hasApiName predicate in 42ddd66 and updated the change note as suggested in f37f0a0.

atorralba

Looks great! Just a few more minor suggestions after taking a closer look.

java/ql/test/ext/TopJdkApis/TopJdkApis.qll

java/ql/test/ext/TopJdkApis/TopJdkApisTest.ql

github-actions · 2022-12-19T14:11:57Z

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

java

Generated file changes for java

Changes to framework-coverage-java.rst:

-    Java Standard Library,``java.*``,3,591,130,28,,,7,,,10
+    Java Standard Library,``java.*``,3,602,130,28,,,7,,,10
-    Totals,,217,8438,1563,129,6,10,107,33,1,86
+    Totals,,217,8449,1563,129,6,10,107,33,1,86

Changes to framework-coverage-java.csv:

- java.lang,13,,66,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,54,12
+ java.lang,13,,75,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,56,19
+ java.math,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,
- java.sql,11,,,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,,
+ java.sql,11,,1,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,1,

atorralba

Happy to proxy @aschackmull almost-complete review (I didn't re-review in full) and get this merged if DCA is happy 😃.

github-actions · 2022-12-19T19:26:09Z

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

java

Generated file changes for java

Changes to framework-coverage-java.rst:

-    Java Standard Library,``java.*``,3,591,130,28,,,7,,,10
+    Java Standard Library,``java.*``,3,602,130,28,,,7,,,10
-    Totals,,217,8438,1563,129,6,10,107,33,1,86
+    Totals,,217,8449,1563,129,6,10,107,33,1,86

Changes to framework-coverage-java.csv:

- java.lang,13,,66,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,54,12
+ java.lang,13,,75,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,56,19
+ java.math,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,
- java.sql,11,,,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,,
+ java.sql,11,,1,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,1,

jcogs33 · 2022-12-19T19:28:19Z

Happy to proxy @aschackmull almost-complete review (I didn't re-review in full) and get this merged if DCA is happy 😃.

Thanks @atorralba! DCA looks good now I think (the only failures are the typical apache_commons ones).
Can you re-approve when you have a chance? 🙏 I pushed one very minor update to store the neutral models alphabetically.

java/ql/lib/change-notes/2022-12-16-add-top-jdk-models.md

Co-authored-by: yo-h <55373593+yo-h@users.noreply.github.com>

github-actions · 2022-12-20T12:58:23Z

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

java

Generated file changes for java

Changes to framework-coverage-java.rst:

-    Java Standard Library,``java.*``,3,591,130,28,,,7,,,10
+    Java Standard Library,``java.*``,3,602,130,28,,,7,,,10
-    Totals,,217,8438,1563,129,6,10,107,33,1,86
+    Totals,,217,8449,1563,129,6,10,107,33,1,86

Changes to framework-coverage-java.csv:

- java.lang,13,,66,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,54,12
+ java.lang,13,,75,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,56,19
+ java.math,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,
- java.sql,11,,,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,,
+ java.sql,11,,1,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,1,

github-actions bot added the Java label Dec 5, 2022