Skip to content

Java: model top 100 JDK APIs #11572

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Dec 20, 2022
Merged

Conversation

jcogs33
Copy link
Contributor

@jcogs33 jcogs33 commented Dec 5, 2022

Description

This PR adds MaD models for top 100 JDK APIs which were not previously modeled. It also adds a test case that checks that a manual model exists for each of the top 100 APIs.

Consideration

Models:

  • Please take a close look at the models added for exceptions as I'm not sure if I handled the field flow for them correctly.
  • I left the following two apis unmodeled based on comments here, and here. Let me know if I should add a model for either of these after all: java.lang.String#valueOf(Object) and java.lang.Throwable#printStackTrace().
  • I put java.lang.String#charAt(int) as a positive model based on the comment here, let me know if I should remove the model instead.

Test Case:

  • Let me know if the test case needs to be changed. I'm not sure if I structured it as intended.

@github-actions github-actions bot added the Java label Dec 5, 2022
@aschackmull
Copy link
Contributor

Should all of these models be stored in one file as a record of the top-100 models? Or should I separate these into files by package name?

We shouldn't duplicate rows, and I think these generally make more sense in the <packagename>.model.yml files. I believe going through this top100 was mainly meant to ensure that we do have manual models checked in where applicable, and that we hadn't missed any among the top 100.
We can see that we're missing some models for exceptions, but as noted these aren't that relevant yet.

@jcogs33
Copy link
Contributor Author

jcogs33 commented Dec 7, 2022

We shouldn't duplicate rows, and I think these generally make more sense in the .model.yml files. I believe going through this top100 was mainly meant to ensure that we do have manual models checked in where applicable, and that we hadn't missed any among the top 100.

Would it make sense to store them in a single file if maybe the model generator will be run on a list of top APIs like you suggested in the planning meeting today? Or at least tag the existing models somehow so they can be identified as top APIs by the generator?

@yo-h
Copy link
Contributor

yo-h commented Dec 8, 2022

We shouldn't duplicate rows, and I think these generally make more sense in the .model.yml files.

I think this makes sense - it will put the added data where it naturally belongs.

Or at least tag the existing models somehow so they can be identified as top APIs by the generator?

Good point that we want to ideally retain the information about top APIs somewhere.

I believe going through this top100 was mainly meant to ensure that we do have manual models checked in where applicable, and that we hadn't missed any among the top 100.

Perhaps capture the information about the list of top 100 APIs in a test case that checks that a manual model exists for each of the APIs in the list?

@yo-h
Copy link
Contributor

yo-h commented Dec 8, 2022

Perhaps capture the information about the list of top 100 APIs in a test case that checks that a manual model exists for each of the APIs in the list?

This should be possible to do in a database-independent way. (Unlike running the metric queries or generator queries.)

@aschackmull
Copy link
Contributor

Perhaps capture the information about the list of top 100 APIs in a test case that checks that a manual model exists for each of the APIs in the list?

Yes, I think that makes the most sense. The top 100 is a moving target (although unlikely to have changes that often), so taking a snapshot a wrapping that up as a test seems fine.

This should be possible to do in a database-independent way. (Unlike running the metric queries or generator queries.)

Unfortunately, not quite. As an example we have java.util.Set.add(Object) in the top 100, but this is covered by a model for Collection.add(). However, as a qltest this can be easily fixed by adding a bit of Java to the Java side of the qltest to actually reference those 100 api endpoints, thus ensuring that they and their supertypes are in the db.

@github-actions
Copy link
Contributor

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

java

Generated file changes for java

  • Changes to framework-coverage-java.rst:
-    Java Standard Library,``java.*``,3,591,130,28,,,7,,,10
+    Java Standard Library,``java.*``,3,602,130,28,,,7,,,10
-    Totals,,217,8438,1563,129,6,10,107,33,1,86
+    Totals,,217,8449,1563,129,6,10,107,33,1,86
  • Changes to framework-coverage-java.csv:
- java.lang,13,,66,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,54,12
+ java.lang,13,,75,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,56,19
+ java.math,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,
- java.sql,11,,,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,,
+ java.sql,11,,1,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,1,

@github-actions
Copy link
Contributor

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

java

Generated file changes for java

  • Changes to framework-coverage-java.rst:
-    Java Standard Library,``java.*``,3,591,130,28,,,7,,,10
+    Java Standard Library,``java.*``,3,602,130,28,,,7,,,10
-    Totals,,217,8438,1563,129,6,10,107,33,1,86
+    Totals,,217,8449,1563,129,6,10,107,33,1,86
  • Changes to framework-coverage-java.csv:
- java.lang,13,,66,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,54,12
+ java.lang,13,,75,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,56,19
+ java.math,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,
- java.sql,11,,,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,,
+ java.sql,11,,1,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,1,

@github-actions
Copy link
Contributor

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

java

Generated file changes for java

  • Changes to framework-coverage-java.rst:
-    Java Standard Library,``java.*``,3,591,130,28,,,7,,,10
+    Java Standard Library,``java.*``,3,602,130,28,,,7,,,10
-    Totals,,217,8438,1563,129,6,10,107,33,1,86
+    Totals,,217,8449,1563,129,6,10,107,33,1,86
  • Changes to framework-coverage-java.csv:
- java.lang,13,,66,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,54,12
+ java.lang,13,,75,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,56,19
+ java.math,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,
- java.sql,11,,,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,,
+ java.sql,11,,1,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,1,

@jcogs33
Copy link
Contributor Author

jcogs33 commented Dec 19, 2022

Review Consideration for commit bf6148c onwards:

  1. Is the code style for TopJdkApis.qll and TopJdkApisTest.ql okay? Should anything else be updated? See #11572 comment.
  2. In f3fc683 and f933fc7, I updated some existing test cases that were affected by the addition of the Integer.parseInt MaD model. I'm a little bit confused as to why these needed to be updated since it looks like Integer.parseInt was already supported by NumberTaintPreservingCallable. Please let me know if anything looks wrong with the MaD model.
  3. Regarding this comment, would it be okay to leave the parameterless static methods as neutral models for now and remove them later if an exclusion specific to summary models is added? See #11717 comment.
  4. Let me know if the category and description in the change note look okay.

@atorralba
Copy link
Contributor

Is the code style for TopJdkApis.qll and TopJdkApisTest.ql okay? Should anything else be updated? See #11572 (comment).

I think what you did is pretty similar to what @aschackmull meant, with the addition that you could have Callable c as a parameter of hasCallable (renamed to e.g. hasApiName), making getApiName unnecessary. So something like:

/** Holds if `c` has the MaD-formatted name `apiName`. */
predicate hasApiName(Callable c, string apiName) {
  apiName =
    c.getDeclaringType().getPackage() + "." + c.getDeclaringType().getSourceDeclaration() + "#" +
      c.getName() + paramsString(c)
}

Which would still remove the code duplication, since the uses would look like:

class TopJdkApi extends SummarizedCallableBase {
  TopJdkApi() {
    exists(string apiName |
      hasApiName(this.asCallable(), apiName) and
      topJdkApiName(apiName)
    )
  }
}

and

 // top jdk api names for which there isn't a manual model
 exists(TopJdkApi topApi |
  not topApi.hasManualMadModel() and
  hasApiName(topApi.asCallable(), apiName) and
  message = "no manual model"
)

In f3fc683 and f933fc7, I updated some existing test cases that were affected by the addition of the Integer.parseInt MaD model. I'm a little bit confused as to why these needed to be updated since it looks like Integer.parseInt was already supported by NumberTaintPreservingCallable. Please let me know if anything looks wrong with the MaD model.

Those changes only add new edges and nodes, but the select results are the same. I'm not an expert, but AFAIK that has to do with how MaD rows are represented in the dataflow graph compared to regular CodeQL steps. So nothing you should worry about as long as the results don't change.

Regarding this comment, would it be okay to leave the parameterless static methods as neutral models for now and remove them later if an exclusion specific to summary models is added? See #11717 (comment).

Given that it seems that the model generator will still be generally interested in parameterless static methods (just not as sinks nor summaries), I think it's a good idea to leave them as neutral models for the moment. If an hypothetical exclusion for summary models happens, those can be removed.

Let me know if the category and description in the change note look okay.

I think it looks right. As an optional nitpick, I'm not sure if specifying the kind of models (summary and neutral) is of general interest, so maybe a simpler Added more dataflow models for top-100 JDK APIs. would work.

I felt tempted to categorize this as a majorAnalysis change given the widespread impact that changes in JDK models could have, but I think majorAnalysis is more appropriate for significant CodeQL API changes and not just modeling changes, so minorAnalysis is probably the right category.

@github-actions
Copy link
Contributor

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

java

Generated file changes for java

  • Changes to framework-coverage-java.rst:
-    Java Standard Library,``java.*``,3,591,130,28,,,7,,,10
+    Java Standard Library,``java.*``,3,602,130,28,,,7,,,10
-    Totals,,217,8438,1563,129,6,10,107,33,1,86
+    Totals,,217,8449,1563,129,6,10,107,33,1,86
  • Changes to framework-coverage-java.csv:
- java.lang,13,,66,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,54,12
+ java.lang,13,,75,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,56,19
+ java.math,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,
- java.sql,11,,,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,,
+ java.sql,11,,1,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,1,

@jcogs33
Copy link
Contributor Author

jcogs33 commented Dec 19, 2022

Thanks @atorralba! I've added a hasApiName predicate in 42ddd66 and updated the change note as suggested in f37f0a0.

Copy link
Contributor

@atorralba atorralba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Just a few more minor suggestions after taking a closer look.

@github-actions
Copy link
Contributor

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

java

Generated file changes for java

  • Changes to framework-coverage-java.rst:
-    Java Standard Library,``java.*``,3,591,130,28,,,7,,,10
+    Java Standard Library,``java.*``,3,602,130,28,,,7,,,10
-    Totals,,217,8438,1563,129,6,10,107,33,1,86
+    Totals,,217,8449,1563,129,6,10,107,33,1,86
  • Changes to framework-coverage-java.csv:
- java.lang,13,,66,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,54,12
+ java.lang,13,,75,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,56,19
+ java.math,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,
- java.sql,11,,,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,,
+ java.sql,11,,1,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,1,

atorralba
atorralba previously approved these changes Dec 19, 2022
Copy link
Contributor

@atorralba atorralba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to proxy @aschackmull almost-complete review (I didn't re-review in full) and get this merged if DCA is happy 😃.

@github-actions
Copy link
Contributor

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

java

Generated file changes for java

  • Changes to framework-coverage-java.rst:
-    Java Standard Library,``java.*``,3,591,130,28,,,7,,,10
+    Java Standard Library,``java.*``,3,602,130,28,,,7,,,10
-    Totals,,217,8438,1563,129,6,10,107,33,1,86
+    Totals,,217,8449,1563,129,6,10,107,33,1,86
  • Changes to framework-coverage-java.csv:
- java.lang,13,,66,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,54,12
+ java.lang,13,,75,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,56,19
+ java.math,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,
- java.sql,11,,,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,,
+ java.sql,11,,1,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,1,

@jcogs33
Copy link
Contributor Author

jcogs33 commented Dec 19, 2022

Happy to proxy @aschackmull almost-complete review (I didn't re-review in full) and get this merged if DCA is happy 😃.

Thanks @atorralba! DCA looks good now I think (the only failures are the typical apache_commons ones).
Can you re-approve when you have a chance? 🙏 I pushed one very minor update to store the neutral models alphabetically.

Co-authored-by: yo-h <55373593+yo-h@users.noreply.github.com>
@github-actions
Copy link
Contributor

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

java

Generated file changes for java

  • Changes to framework-coverage-java.rst:
-    Java Standard Library,``java.*``,3,591,130,28,,,7,,,10
+    Java Standard Library,``java.*``,3,602,130,28,,,7,,,10
-    Totals,,217,8438,1563,129,6,10,107,33,1,86
+    Totals,,217,8449,1563,129,6,10,107,33,1,86
  • Changes to framework-coverage-java.csv:
- java.lang,13,,66,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,54,12
+ java.lang,13,,75,,,,,,,,,,,,8,,,,,,4,,,1,,,,,,,,,,,,,,,,56,19
+ java.math,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,
- java.sql,11,,,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,,
+ java.sql,11,,1,,,,,,,,4,,,,,,,,,,,,,,,,,7,,,,,,,,,,,,1,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants