Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug: merge command generates multiple runs #2513

Merged
merged 17 commits into from
Aug 19, 2022

Conversation

yongyan-gh
Copy link
Collaborator

@yongyan-gh yongyan-gh commented Jul 28, 2022

Description

Fixes for below issues:

  • merge command generates SARIF with multiple runs ([SARIF Multitool] merge command generates SARIF with multiple runs #2488)
    By default, the results are grouped into multiple runs per rule id + tool version.
    When merge-runs argument is set, group the results by tool version, this will place all results of runs produced by same tool same version into a single run.

  • Another issue it fixes is System.IO.IOException/System.IO.DirectoryNotFoundException exception when execute merge command with --split PerRule option, and the Sarif files to be merged contains rule Id has invalid character for a file name. E.g. a sub rule id "SEC105/001".
    With --split PerRule option, merge command creates separate Sarif files for each rule id. E.g. "SEC105/001_merged.sarif". The character "/" in the rule Id is an invalid file name character, it will cause the error to create the merged file.

@@ -117,7 +121,7 @@ public int Run(MergeOptions mergeOptions)
mergedLog.SchemaUri = mergedLog.Version.ConvertToSchemaUri();

FileSystem.DirectoryCreateDirectory(outputDirectory);
outputFilePath = Path.Combine(outputDirectory, GetOutputFileName(_options, key));
outputFilePath = Path.Combine(outputDirectory, ReplaceInvalidCharInFileName(GetOutputFileName(_options, key), "."));
Copy link
Collaborator Author

@yongyan-gh yongyan-gh Jul 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReplaceInvalidCharInFileName

If PerRule split strategy is specified, it creates a SarifLog file per rule, the file name is {ruleId}_merged.sarif.

If the ruleId is a sub rule Id, e.g. SEC105/001, it contains invalid file name char '/' and will cause saving the log file failed.

This is the fix for this issue. #ByDesign

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the unit test for this issue? Did I miss it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dedicated unit tests added. pls review.

@@ -179,7 +183,14 @@ private async Task<bool> MergeSarifLogsAsync()
};
}

key = CreateRuleKey(result.RuleId, run);
if (_options.MergeRuns)
Copy link
Collaborator Author

@yongyan-gh yongyan-gh Jul 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MergeRuns

Acccording to merge-runs argument description

 [Option( 
     "merge-runs", 
     HelpText = "Merge runs of the same tool + verion combination (requires " + 
                "eliding run-specific details such as invocations data.")] 
 public bool MergeRuns { get; set; } 

If merge-runs is false (default), the merged log runs are aggregated by key ruleId+tool name+tool version.
if it is set to true, the merged log runs will be aggregated by key tool name+tool version. This means all results produced by same tool and same version will be merged into 1 single run. #ByDesign

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, your understanding is correct. Do we really have the term verion burned into our help? If so, can you please correct this to version?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed typo.

@yongyan-gh yongyan-gh marked this pull request as ready for review July 28, 2022 23:46
@KalleOlaviNiemitalo
Copy link

KalleOlaviNiemitalo commented Jul 29, 2022

Does the tool rewrite JSON Pointers in sarif: hyperlinks as noted in SARIF-v2.1.0 §3.11.6 Messages with embedded links? If so, does this pull request require any changes in how the tool does that?


In reply to: 1199064853

@@ -1,5 +1,9 @@
# SARIF Package Release History (SDK, Driver, Converters, and Multitool)

## Unreleased

* BUGFIX: Fix issue `merge` command generates SARIF with multiple runs [#2488](https://github.com/microsoft/sarif-sdk/pull/2488)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix

This description does not clearly describe the issue and the fix. @marmegh or @EasyRhinoMSFT can you please help Yong with this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated pls review. cc @VCOLAWTON

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, let's look at what we're fixing here.

The problem: the merge command was generating a sarif with a run per rule instead of a run per tool.

The description here does not capture that. We need to tell the customers what behavior they may have been experiencing that is now changed. Specific first, concise second.

"Fixed merge command producing runs per rule to the expected runs per tool."

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestions. Updated both, pls review.

@@ -1,5 +1,9 @@
# SARIF Package Release History (SDK, Driver, Converters, and Multitool)

## Unreleased

* BUGFIX: Fix issue `merge` command generates SARIF with multiple runs [#2488](https://github.com/microsoft/sarif-sdk/pull/2488)
Copy link
Member

@michaelcfanning michaelcfanning Jul 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BUGFIX

There's no reference in the release notes to the invalid characters issue. #Resolved

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has someone built/run your tests and examined code coverage? Do you have any information for us on the completeness of code & data coverage for your tests?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran code coverage analysis in VS. All code paths I made change were covered. In term of MergeCommand coverage, it was 84%, added couple test cases its now 94%

@@ -307,6 +318,11 @@

return prefix ?? string.Empty;
}

private static string ReplaceInvalidCharInFileName(string fileName, string replacement)
Copy link
Member

@michaelcfanning michaelcfanning Jul 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReplaceInvalidCharInFileName

You provided no unit tests for this helper. Generally, a helper with a single line of code with a single upstream caller doesn't warrant dedicating its own method. #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dedicated unit tests added pls review

@@ -31,6 +32,9 @@
private readonly Dictionary<string, HashSet<Result>> _ruleIdToResultsMap;
private readonly Dictionary<string, RunMergingVisitor> _ruleIdToMergeVisitorsMap;

private static readonly string s_invalidFileNameCharRegexString = $"[{Regex.Escape(new string(Path.GetInvalidFileNameChars()))}]";
Copy link
Member

@michaelcfanning michaelcfanning Jul 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s_invalidFileNameCharRegexString

This functionality seems very general and not something at all specific to merging SARIF logs. Should it be somewhere else? A file utilities class? Should it simply be an extension method somewhere? #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, moved the function to PathExtensions changed it to an extension method of string.

}
}

private static Run CreateTestRun(int numberOfResult, string toolName = null, bool createSubRule = false)
Copy link
Member

@michaelcfanning michaelcfanning Jul 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CreateTestRun

Don't we have other code somewhere that autogenerates test files? If so, can we please us it instead of creating this duplicative helper? #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the RandomSarifLogGenerator from Test.UnitTests.Sarif to Test.Utilities.Sarif so can re-use its functionalities.

public void MergeCommand_WhenSplitPerRule_LogShouldAggregateByRuleToolVersion()
{
// 2 logs 2 runs, same tool version, 6 rules
SarifLog sarifLog1 = new SarifLog { Runs = new[] { CreateTestRun(6, null, true) } };
Copy link
Member

@michaelcfanning michaelcfanning Jul 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SarifLog

this code doesn't meet our style standard, you should be using 'var' instead of an explicit type, because the type itself is clearly specified after the new keyword.

Why did this code pass build? Are we not enforcing this standard in our diagnostics configuration? #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Its no enforced. In VS there are messages (not warning or erro) in error list showing this kind of issue.

@michaelcfanning
Copy link
Member

Um, this is an excellent question. We support multiple varieties of intra-logfile link, one of which is a simply reference to a location id (which I think should generally be stable as I believe it's associated with an id). The support for the SARIF schema, I also think, is mostly non-existent. And so I'd guess we have no code whatsoever to assist with this. Yong, we should create an issue for this?


In reply to: 1199064853

@yongyan-gh
Copy link
Collaborator Author

Right, the merge command does not update the pointers. Instead it calls FixupVisitor to resolve result's artifact location if it has index pointer points to artifacts array, and remove the artifact array. This needs further efforts, created issue #2520 for tracking.


In reply to: 1200469594

@@ -1,5 +1,10 @@
# SARIF Package Release History (SDK, Driver, Converters, and Multitool)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you send me a few sentences that explain what this document is used for? I can refine the language for you.

Copy link

@VCOLAWTON VCOLAWTON left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need an overview, please see comment. Thanks, Yong! @yongyan-gh

## Unreleased

* BUGFIX: Fix the issue `merge` command generates SARIF with multiple runs even specified `--merge-runs` option [#2488](https://github.com/microsoft/sarif-sdk/pull/2488)
* BUGFIX: Fix `System.IO.IOException` and `System.IO.DirectoryNotFoundException` when merge Sarif files contains invalid file name characters in the rule id with `--split PerRule` option.
Copy link
Contributor

@marmegh marmegh Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any other references to these exceptions in this change? Was this broken out into one of your other PRs? Or was this included and not documented in a closed PR? #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error was not reported by any issue, but found during my testing for merge command. My other 2 PRs are for match-result-forward command. Its a straightforward fix, so I think it can be fixed along with the multiple runs issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point me to the specific test implemented to prevent regression?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, in this test `MergeCommandUnitTests.MergeCommand_WhenSplitPerRule_LogShouldAggregateByRuleToolVersion', it creates sarif with id contains invalid char '/', e.g. 'TESTRULE/001', and the fix replaced the invalid char and generate valid file name e.g. 'TESTRULE.001_merged.sarif'

@@ -19,6 +20,8 @@ namespace Microsoft.CodeAnalysis.Sarif.Driver
/// <summary>Path extension functions.</summary>
public static class PathExtensions
{
private static readonly Regex s_invalidFileNameCharRegex = new Regex($"[{Regex.Escape(new string(Path.GetInvalidFileNameChars()))}]", RegexOptions.Compiled | RegexOptions.CultureInvariant);
Copy link
Collaborator

@shaopeng-gh shaopeng-gh Aug 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path.GetInvalidFileNameChars()

not an actual issue,
just to note that with current way we will possibly generate different name in Linux and Windows,
and a name without any invalid char (win+Linux) would be
working in both os, easier to share.

the other way of cause is to hard code the combination of them.
each has its pros and cons, just to call it out.

Copy link
Collaborator

@shaopeng-gh shaopeng-gh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@shaopeng-gh
Copy link
Collaborator

shaopeng-gh commented Aug 18, 2022

Need to resolve the conflicts.


In reply to: 1218848244


In reply to: 1218848244

@yongyan-gh
Copy link
Collaborator Author

resolved


In reply to: 1218848244

@@ -13,7 +13,7 @@
/// <summary>
/// Generate random sarif logs for testing.
/// </summary>
internal static class RandomSarifLogGenerator
public static class RandomSarifLogGenerator
Copy link
Collaborator Author

@yongyan-gh yongyan-gh Aug 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public

Move this helper to the common Teset.Utilities.Sarif project so it can be used by other test projects. #ByDesign

@@ -1,5 +1,10 @@
# SARIF Package Release History (SDK, Driver, Converters, and Multitool)

## Unreleased

* BUGFIX: Fixed `merge` command producing runs per rule to the expected runs per tool. [#2488](https://github.com/microsoft/sarif-sdk/pull/2488)
Copy link
Member

@michaelcfanning michaelcfanning Aug 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

er rule to the expected r

this doesn't describe the adverse condition precisely or the state once resolved #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated as suggested

## Unreleased

* BUGFIX: Fixed `merge` command producing runs per rule to the expected runs per tool. [#2488](https://github.com/microsoft/sarif-sdk/pull/2488)
* BUGFIX: Fixed `System.IO.IOException` or `System.IO.DirectoryNotFoundException` when `merge` command producing per rule files.
Copy link
Member

@michaelcfanning michaelcfanning Aug 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

command producing per rule files

I don't understand this #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the note as suggested

outputFilePath = Path.Combine(outputDirectory, GetOutputFileName(_options, key));
outputFilePath = Path.Combine(
outputDirectory,
GetOutputFileName(_options, key).ReplaceInvalidCharInFileName("."));
Copy link
Member

@michaelcfanning michaelcfanning Aug 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetOutputFileName

Please comment this code to describe why we're doing this.

// We must replace invalid file characters in the constructed output file name that may appear in any rule ids incorporated into the file name candidate.

#Resolved

@@ -1,5 +1,10 @@
# SARIF Package Release History (SDK, Driver, Converters, and Multitool)

## Unreleased

* BUGFIX: Fixed `merge` command producing runs per rule to the expected runs per tool. [#2488](https://github.com/microsoft/sarif-sdk/pull/2488)
Copy link
Member

@michaelcfanning michaelcfanning Aug 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Use this:

Update merge command to properly produce runs by tool and version when passed the --merge-runs argument. #Resolved

## Unreleased

* BUGFIX: Fixed `merge` command producing runs per rule to the expected runs per tool. [#2488](https://github.com/microsoft/sarif-sdk/pull/2488)
* BUGFIX: Fixed `System.IO.IOException` or `System.IO.DirectoryNotFoundException` when `merge` command producing per rule files.
Copy link
Member

@michaelcfanning michaelcfanning Aug 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed `System.IO.IOException

Please use this:

'Eliminate IOException and DirectoryNotFoundException exceptions thrown by merge command when splitting by rule (due to invalid file characters in rule ids). #Resolved

outputHelper.WriteLine($"TestName: {nameof(PathExtensionsTests)} has random seed {randomSeed}");
}

[Theory]
Copy link
Member

@michaelcfanning michaelcfanning Aug 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theory

Please stop introducing [Theory] usage into the code base, this functionality uses reflection and historically has caused performance issues.

All this is really buying is is the need not to inject a foreach in the method that calls into your helper. #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to Fact pattern

VerifyReplaceInvalidCharInFileName(fileName, replacement, expectFileName);
}

[Theory]
Copy link
Member

@michaelcfanning michaelcfanning Aug 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theory

No more [Theory] usage. #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to [Fact] test pattern

Copy link
Member

@michaelcfanning michaelcfanning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@michaelcfanning michaelcfanning merged commit e1bb6fe into main Aug 19, 2022
@michaelcfanning michaelcfanning deleted the users/yongyan-gh/resultmatching-splitlogbug branch August 19, 2022 14:48
This was referenced Feb 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants