Make use of multi-language and indirect tracing #744

edoardopirovano · 2021-09-15T13:51:17Z

This PR changes the CodeQL Action to use the new --db-cluster (available in CLI 2.5.6+) and --begin-tracing (available in CLI 2.6.0+) options. I think it makes sense to use these two options together or not at all, as the two are closely intertwined so if we wanted to support using just --db-cluster in CLIs 2.5.6 to 2.6.0 this would add another path to the code. I do not think there is a need for this.

Mostly, the tests do not need changing as the existing integration tests will use this functionality when testing against new versions of the CLI. One exception is the test modified by #495 to check that the .win32env file created by the action is read successfully. My understanding is that now that we have the proper solution to the above using multi-language tracing, we no longer need to create that file in the Action, nor do we need to have that modification to the test.

Merge / deployment checklist

Confirm this change is backwards compatible with existing workflows.
Confirm the readme has been updated if necessary.
Confirm the changelog has been updated if necessary.

criemen · 2021-09-21T09:01:24Z

I agree with your comment about .win32env.
There might be one bit missing in this PR: There's a step in the action where it outputs the build-tracer.log, either as artifact or to stdout (I can't remember). That needs to be adjusted to point at the correct log file in the db-cluster case.
Potentially that step needs to be enabled by setting some debug setting first.

edoardopirovano · 2021-09-21T09:51:37Z

I agree with your comment about .win32env.
There might be one bit missing in this PR: There's a step in the action where it outputs the build-tracer.log, either as artifact or to stdout (I can't remember). That needs to be adjusted to point at the correct log file in the db-cluster case.
Potentially that step needs to be enabled by setting some debug setting first.

Are you sure this step is part of the CodeQL Action rather than something that is added onto workflows being debugged in a separate step using upload-artifact? The only reference to build-tracer.log I can find in this repository is in

codeql-action/src/tracer-config.ts

Line 114 in 7112cda

"compound-build-tracer.log"

where we set a path for the concatenated tracer spec that the Action needs to create before this PR.

We could certainly create a spec to put the log in the same place to not break anyone uploading the tracer logs, but since my understanding is the log uploading is only done temporarily while debugging a workflow, I don't think we want to do this as it seems overkill to define a new tracer spec just for this reason (previously, we needed to define one anyway to combine the environment).

criemen · 2021-09-21T09:56:16Z

Ah, then I probably misremembered. Then keeping it as-is is fine! I'd rather now override anything in the spec file, as the whole point of moving this into the CLI was to not depend on tracer internals in the action anymore.

CC @aibaars maybe you have something to contribute to this conversation?

hmakholm

Looks sensible, but we also need to change util.ts such that the action doesn't pass CODEQL_FEATURE_MULTI_LANGUAGE and CODEQL_FEATURE_SANDWICH when it uses the CLI features. Those feature flags specifically indicate that the Action's DIY implementation is used.

hmakholm · 2021-09-22T12:43:59Z

.github/workflows/pr-checks.yml

-        # Note we want to make sure that the .win32env file is read correctly, so we unset the CODEQL_EXTRACTOR_CSHARP_ROOT from the .sh file.
        run: |
          cat ./codeql-runner/codeql-env.sh | Invoke-Expression
-          $Env:CODEQL_EXTRACTOR_CSHARP_ROOT = ""


IIUC, the tracer is still supposed to be resistant towards build systems that clobber nonessential environment variables like this one. Even if the concrete regression this test was introduced to catch, wouldn't it be a good idea to keep it present, as defense in depth against tracer breakage?

This is a good point - in fact, I'm not convinced this test was ever testing that the .win32env file was doing what it should as the tracer is resistant towards the unsetting the environment variable as you say. I've left the code as it is, and updated the comment to better describe what is going on here.

The .win32env file is how the tracer resists unsetting ...

Ah, right. So if I understand correctly, before this the test was checking that the .win32env file that the Action created for the new compound environment it synthesised worked correctly. Now, the test is just checking that the tracer is resistant to environment variables being unset by whatever mechanism it chooses which happens to also be creating a .win32env file.

edoardopirovano · 2021-09-22T13:51:26Z

Thanks @hmakholm, I totally forgot I should've removed those! I've done that now and also rebased to fix a merge conflict.

hmakholm

We don't want to remove the feature flags unconditionally. Starting from release 2.5.7, the CLI depends on seeing CODEQL_ACTION_FEATURE_MULTI_LANGUAGE for enabling special-case code that will set CODEQL_SCRATCH_DIR consistently between languages. So for release 2.5.7, 2.5.8 and 2.5.9 it is important that the action passes the variable since then it is actually doing DIY language combination then.

I can see it doesn't look entirely trivial to make initializeEnvironment depend on the CodeQL version, but it's a problem that I think we'll need to solve at some point anyway.

hmakholm · 2021-09-22T14:23:43Z

A possible quick-and-dirty workaround could be to set the feature flag explicitly in getTracerEnv() when it launches codeql database trace-command).

Even though that doesn't set it for all codeql invocation, that doesn't matter -- we now know all the exact codeql releases it will ever need to work with, and for all of them trace-command is the only subcommand that actually looks for the flag!

edoardopirovano · 2021-09-22T14:32:13Z

Right, whoops. As you say, making initializeEnvironment depend on a CodeQL version is not so easily done. In fact, it's sort of impossible to have all CLI calls have an environment that depends on the version because we can't actually know what version is running without doing codeql version in the first place.

The choices are I think two:

Delay setting the environment until after we've obtained the CLI we're using and done an initial codeql version call to establish what version it is.
As you suggest, set the environment as it is now at the start, but then override it later on before any calls for which it is relevant.

I think (1) is the more morally right solution, so I'll have a look at how tricky that is to implement before possibly resigning myself to the more hacky solution.

edoardopirovano · 2021-09-22T15:20:17Z

Okay, so (1) above doesn't quite work because there are code paths where we care about some of the environment variables but don't actually have a CodeQL CLI available. In particular, the upload SARIF step doesn't currently use a CLI and it's my understanding that we may not want to make it need one because some users are using this as a way to upload SARIF from non-CodeQL tools to Code Scanning. However, this step also does care about some of the environment variables (in particular, it sends a status report that includes the version of the Action for telemetry purposes).

Thus, I've gone for splitting the variables into two groups - one that we unconditionally set at the start of every run as we do now, and another that we set only on code paths where we use a CLI once we have enough information to know what CLI version we are going to use.

hmakholm

This looks like it has a chance of working :-)

I'm happy with merging this as is now, but just floating a suggestion: instead of needing to remember to call enrichEnvironment in the right places, we could also make it a side effect of the first getCodeQL that actually reads the CLI version.

edoardopirovano · 2021-09-22T17:12:29Z

Hmm, that does sound very sensible in principle. However, there's a small issue with that - we need to know whether we are running on Actions or the Runner, as we set the environment variables differently depending on which scenario we are in and that information isn't readily available in getCodeQL. We could of course thread it through the relevant function calls, but that seems a little messy.

edoardopirovano force-pushed the use-db-cluster branch 5 times, most recently from 9f5383d to 3ec870a Compare September 15, 2021 15:46

edoardopirovano marked this pull request as ready for review September 15, 2021 16:05

edoardopirovano requested a review from a team as a code owner September 15, 2021 16:05

edoardopirovano requested a review from hmakholm September 15, 2021 16:05

hmakholm requested changes Sep 22, 2021

View reviewed changes

edoardopirovano force-pushed the use-db-cluster branch from 3ec870a to cc955cc Compare September 22, 2021 13:48

edoardopirovano requested a review from hmakholm September 22, 2021 13:51

hmakholm requested changes Sep 22, 2021

View reviewed changes

Make use of multi-language and indirect tracing

1f4460b

edoardopirovano force-pushed the use-db-cluster branch from cc955cc to 1f4460b Compare September 22, 2021 15:12

hmakholm approved these changes Sep 22, 2021

View reviewed changes

Merge branch 'main' into use-db-cluster

ee57c2e

edoardopirovano enabled auto-merge September 23, 2021 07:54

edoardopirovano merged commit f151a3c into github:main Sep 23, 2021

This was referenced Sep 27, 2021

Merge main into v1 #752

Closed

Merge main into v1 #758

Merged

edoardopirovano mentioned this pull request Oct 6, 2021

CodeQL 1.0.16 fails on Linux (1.0.15 ok) #760

Closed

edoardopirovano deleted the use-db-cluster branch October 7, 2021 07:27

This was referenced Oct 8, 2021

Temporarily disable new style of tracing #766

Merged

Re-enable new style of tracing #770

Merged

snyk-bot mentioned this pull request Apr 24, 2023

[Snyk] Fix for 21 vulnerabilities aliscco/codeql-action#85

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make use of multi-language and indirect tracing #744

Make use of multi-language and indirect tracing #744

edoardopirovano commented Sep 15, 2021 •

edited

Loading

criemen commented Sep 21, 2021

edoardopirovano commented Sep 21, 2021

criemen commented Sep 21, 2021

hmakholm left a comment

hmakholm Sep 22, 2021

edoardopirovano Sep 22, 2021

hmakholm Sep 22, 2021

edoardopirovano Sep 22, 2021

edoardopirovano commented Sep 22, 2021

hmakholm left a comment

hmakholm commented Sep 22, 2021

edoardopirovano commented Sep 22, 2021

edoardopirovano commented Sep 22, 2021

hmakholm left a comment

edoardopirovano commented Sep 22, 2021

Make use of multi-language and indirect tracing #744

Make use of multi-language and indirect tracing #744

Conversation

edoardopirovano commented Sep 15, 2021 • edited Loading

Merge / deployment checklist

criemen commented Sep 21, 2021

edoardopirovano commented Sep 21, 2021

criemen commented Sep 21, 2021

hmakholm left a comment

Choose a reason for hiding this comment

hmakholm Sep 22, 2021

Choose a reason for hiding this comment

edoardopirovano Sep 22, 2021

Choose a reason for hiding this comment

hmakholm Sep 22, 2021

Choose a reason for hiding this comment

edoardopirovano Sep 22, 2021

Choose a reason for hiding this comment

edoardopirovano commented Sep 22, 2021

hmakholm left a comment

Choose a reason for hiding this comment

hmakholm commented Sep 22, 2021

edoardopirovano commented Sep 22, 2021

edoardopirovano commented Sep 22, 2021

hmakholm left a comment

Choose a reason for hiding this comment

edoardopirovano commented Sep 22, 2021

edoardopirovano commented Sep 15, 2021 •

edited

Loading