Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing AnalyzeCommandBase and MultithreadedAnalyzeCommandBase artifacts generation #2433

Merged
merged 16 commits into from
Jan 31, 2022
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions src/ReleaseHistory.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

## Unreleased

* BUGFIX: Fix `AnalyzeCommandBase` and `MultithreadedAnalyzeCommandBase` from outputting all artifacts to SARIF even if no results were produced when Hashes is enabled. [#2433](https://github.com/microsoft/sarif-sdk/pull/2433)
Copy link
Member

@michaelcfanning michaelcfanning Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BUGFIX

  • BREAKING: AnalyzeCommandBase previously persisted all scan target artifacts to SARIF logs rather than only persisting artifacts referenced by an analysis result, when an option to persist hashes, text file or binary information was set. MultithreadedAnalyzeCommandBase previously persisted all scan targets artifacts to SARIF logs in cases when hash insertion was eenabled rather than only persisting artifacts referenced by an analysis result. #Resolved

* BUGFIX: Adjust Json Serialization field order for ReportingDescriptor and skip emit empty AutomationDetails node. [#2420](https://github.com/microsoft/sarif-sdk/pull/2420)
* BREAKING: Fix `InvalidOperationException` when using PropertiesDictionary in a multithreaded application, and remove `[Serializable]` from it. Now use of BinaryFormatter on it will result in `SerializationException`: Type `PropertiesDictionary` is not marked as serializable. [#2415](https://github.com/microsoft/sarif-sdk/pull/2415)

Expand Down
11 changes: 6 additions & 5 deletions src/Sarif.Driver/Sdk/AnalyzeCommandBase.cs
Original file line number Diff line number Diff line change
Expand Up @@ -169,7 +169,7 @@ private void Analyze(TOptions options, AggregatingLogger logger)
targets = ValidateTargetsExist(_rootContext, targets);

// 5. Initialize report file, if configured.
InitializeOutputFile(options, _rootContext, targets);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InitializeOutputFile(options, _rootContext, targets);

we won't initialize the outputfile with targets because we don't want to emit all targets in the artifacts.

Copy link
Member

@michaelcfanning michaelcfanning Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

targets

Revert this change please it's unnecessarily breaking customers. #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API isn't exposed to customers. Only tools that use it will see a difference.

InitializeOutputFile(options, _rootContext);

// 6. Instantiate skimmers.
ISet<Skimmer<TContext>> skimmers = CreateSkimmers(options, _rootContext);
Expand Down Expand Up @@ -287,13 +287,14 @@ private ISet<string> ValidateTargetsExist(TContext context, ISet<string> targets

if ((options.DataToInsert.ToFlags() & OptionallyEmittedData.Hashes) != 0)
{
if (_pathToHashDataMap != null && _pathToHashDataMap.TryGetValue(filePath, out HashData hashData))
if (_pathToHashDataMap?.TryGetValue(filePath, out HashData hashData) == true)
{
context.Hashes = hashData;
}
else
{
context.Hashes = HashUtilities.ComputeHashes(filePath);
_pathToHashDataMap?.Add(filePath, context.Hashes);
}
}
}
Expand Down Expand Up @@ -352,7 +353,7 @@ protected virtual void InitializeConfiguration(TOptions options, TContext contex
}
}

private void InitializeOutputFile(TOptions analyzeOptions, TContext context, ISet<string> targets)
private void InitializeOutputFile(TOptions analyzeOptions, TContext context)
{
string filePath = analyzeOptions.OutputFilePath;
AggregatingLogger aggregatingLogger = (AggregatingLogger)context.Logger;
Expand Down Expand Up @@ -388,7 +389,7 @@ private void InitializeOutputFile(TOptions analyzeOptions, TContext context, ISe
dataToRemove,
tool: _tool,
run: _run,
analysisTargets: targets,
analysisTargets: null,
quiet: analyzeOptions.Quiet,
invocationTokensToRedact: GenerateSensitiveTokensList(),
invocationPropertiesToLog: analyzeOptions.InvocationPropertiesToLog,
Expand All @@ -404,7 +405,7 @@ private void InitializeOutputFile(TOptions analyzeOptions, TContext context, ISe
dataToRemove,
tool: _tool,
run: _run,
analysisTargets: targets,
analysisTargets: null,
invocationTokensToRedact: GenerateSensitiveTokensList(),
invocationPropertiesToLog: analyzeOptions.InvocationPropertiesToLog,
levels: analyzeOptions.Level,
Expand Down
7 changes: 3 additions & 4 deletions src/Sarif.Driver/Sdk/MultithreadedAnalyzeCommandBase.cs
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@ public abstract class MultithreadedAnalyzeCommandBase<TContext, TOptions> : Plug
private Channel<int> _resultsWritingChannel;
private Channel<int> _fileEnumerationChannel;
private Dictionary<string, List<string>> _hashToFilesMap;
private IDictionary<string, HashData> _pathToHashDataMap;
private ConcurrentDictionary<int, TContext> _fileContexts;

public Exception ExecutionException { get; set; }
Expand Down Expand Up @@ -485,12 +486,9 @@ private async Task<bool> HashAsync()
_hashToFilesMap[hashData.Sha256] = paths;
}

_run?.GetFileIndex(new ArtifactLocation { Uri = context.TargetUri },
dataToInsert: _dataToInsert,
hashData: hashData);
Copy link
Collaborator Author

@eddynaka eddynaka Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing this to prevent us from generating artifacts for all the files #ByDesign


paths.Add(localPath);
context.Hashes = hashData;
_pathToHashDataMap?.Add(localPath, hashData);
}
Copy link
Collaborator Author

@eddynaka eddynaka Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating this will enable us to use the code from SarifLogger to create the artifacts #ByDesign


await _fileEnumerationChannel.Writer.WriteAsync(index);
Expand Down Expand Up @@ -705,6 +703,7 @@ private void InitializeOutputFile(TOptions analyzeOptions, TContext context)
kinds: analyzeOptions.Kind,
insertProperties: analyzeOptions.InsertProperties);
}
_pathToHashDataMap = sarifLogger.AnalysisTargetToHashDataMap;
sarifLogger.AnalysisStarted();
aggregatingLogger.Loggers.Add(sarifLogger);
},
Expand Down
5 changes: 2 additions & 3 deletions src/Sarif/HashUtilities.cs
Original file line number Diff line number Diff line change
Expand Up @@ -31,15 +31,14 @@ internal static IFileSystem FileSystem

public static IDictionary<string, HashData> MultithreadedComputeTargetFileHashes(IEnumerable<string> analysisTargets, bool suppressConsoleOutput = false)
{
if (analysisTargets == null) { return null; }
Copy link
Member

@michaelcfanning michaelcfanning Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null

Please revert this change. You should be careful about allocating objects to indicate no results. Why not just have the caller handle null? This code succeeded previously, so you probably have only just introduced the requirement for this to be non-null. #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted

var fileToHashDataMap = new ConcurrentDictionary<string, HashData>();
if (analysisTargets == null) { return fileToHashDataMap; }
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will prevent a NullReferenceException when we use the reerence variable from the SarifLogger


if (!suppressConsoleOutput)
{
Console.WriteLine("Computing file hashes...");
}

var fileToHashDataMap = new ConcurrentDictionary<string, HashData>();

var queue = new ConcurrentQueue<string>(analysisTargets);

var tasks = new List<Task>();
Expand Down
3 changes: 3 additions & 0 deletions src/Sarif/OptionallyEmittedData.cs
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,9 @@ public enum OptionallyEmittedData : int
// Enrich SARIF log with git blame information
GitBlameInformation = 0x1000,

// Enrich with artifacts only.
Artifacts = 0x2000,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Artifacts

To keep the same behavior as before, I introduced a new flag that will be able to emit the artifacts only


// A special enum value that indicates that insertion should overwrite any existing
// information in the SARIF log file. In the absence of this setting, any existing
// data that would otherwise have been overwritten by the insert operation will
Expand Down
7 changes: 6 additions & 1 deletion src/Sarif/Writers/SarifLogger.cs
Original file line number Diff line number Diff line change
Expand Up @@ -119,6 +119,7 @@ public class SarifLogger : BaseLogger, IDisposable, IAnalysisLogger

_persistArtifacts =
(_dataToInsert & OptionallyEmittedData.Hashes) != 0 ||
(_dataToInsert & OptionallyEmittedData.Artifacts) != 0 ||
(_dataToInsert & OptionallyEmittedData.TextFiles) != 0 ||
(_dataToInsert & OptionallyEmittedData.BinaryFiles) != 0;
}
Expand Down Expand Up @@ -447,12 +448,16 @@ private void CaptureArtifact(ArtifactLocation fileLocation)
catch (ArgumentException) { } // Unrecognized encoding name
}

HashData hashData = null;
AnalysisTargetToHashDataMap?.TryGetValue(fileLocation.Uri.OriginalString, out hashData);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AnalysisTargetToHashDataMap?.TryGetValue

if we have the hash, let's re-use it.
this will prevent us from hashing the same file twice


// Ensure Artifact is in Run.Artifacts and ArtifactLocation.Index is set to point to it
int index = _run.GetFileIndex(
fileLocation,
addToFilesTableIfNotPresent: _persistArtifacts,
_dataToInsert,
encoding);
encoding,
hashData);

// Remove redundant Uri and UriBaseId once index has been set
if (index > -1 && this.Optimize)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,7 @@ protected override string ConstructTestOutputFromInputResource(string inputResou
PrettyPrint = true,
Optimize = true,
Kind = new List<ResultKind> { ResultKind.Fail },
DataToInsert = new List<OptionallyEmittedData> { OptionallyEmittedData.Artifacts },
Level = new List<FailureLevel> { FailureLevel.Error, FailureLevel.Warning, FailureLevel.Note, FailureLevel.None },
};

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -480,6 +480,7 @@ protected override string ConstructTestOutputFromInputResource(string inputResou
PrettyPrint = true,
Optimize = true,
Kind = new List<ResultKind> { ResultKind.Fail },
DataToInsert = new List<OptionallyEmittedData> { OptionallyEmittedData.Artifacts },
Level = new List<FailureLevel> { FailureLevel.Error, FailureLevel.Warning, FailureLevel.Note, FailureLevel.None },
};

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,42 +9,42 @@
"rules": [
{
"id": "SARIF2003",
"name": "ProvideVersionControlProvenance",
"fullDescription": {
"text": "Provide 'versionControlProvenance' to record which version of the code was analyzed, and to enable paths to be expressed relative to the root of the repository."
},
"helpUri": "http://docs.oasis-open.org/sarif/sarif/v2.1.0/sarif-v2.1.0.html",
"messageStrings": {
"Note_Default": {
"text": "{0}: This run does not provide 'versionControlProvenance'. As a result, it is not possible to determine which version of code was analyzed, nor to map relative paths to their locations within the repository."
}
},
"name": "ProvideVersionControlProvenance",
"defaultConfiguration": {
"level": "note"
},
"helpUri": "http://docs.oasis-open.org/sarif/sarif/v2.1.0/sarif-v2.1.0.html"
}
},
{
"id": "SARIF2002",
"name": "ProvideMessageArguments",
"fullDescription": {
"text": "In result messages, use the 'message.id' and 'message.arguments' properties rather than 'message.text'. This has several advantages. If 'text' is lengthy, using 'id' and 'arguments' makes the SARIF file smaller. If the rule metadata is stored externally to the SARIF log file, the message text can be improved (for example, by adding more text, clarifying the phrasing, or fixing typos), and the result messages will pick up the improvements the next time it is displayed. Finally, SARIF supports localizing messages into different languages, which is possible if the SARIF file contains 'message.id' and 'message.arguments', but not if it contains 'message.text' directly."
},
"helpUri": "http://docs.oasis-open.org/sarif/sarif/v2.1.0/sarif-v2.1.0.html",
"messageStrings": {
"Note_Default": {
"text": "{0}: The 'message' property of this result contains a 'text' property. Consider replacing it with 'id' and 'arguments' properties. This potentially reduces the log file size, allows the message text to be improved without modifying the log file, and enables localization."
}
},
"name": "ProvideMessageArguments",
"defaultConfiguration": {
"level": "note"
},
"helpUri": "http://docs.oasis-open.org/sarif/sarif/v2.1.0/sarif-v2.1.0.html"
}
},
{
"id": "SARIF2005",
"name": "ProvideToolProperties",
"fullDescription": {
"text": "Provide information that makes it easy to identify the name and version of your tool.\r\n\r\nThe tool's 'name' property should be no more than three words long. This makes it easy to remember and allows it to fit into a narrow column when displaying a list of results. If you need to provide more information about your tool, use the 'fullName' property.\r\n\r\nThe tool should provide either or both of the 'version' and 'semanticVersion' properties. This enables the log file consumer to determine whether the file was produced by an up to date version, and to avoid accidentally comparing log files produced by different tool versions.\r\n\r\nIf 'version' is used, facilitate comparison between versions by specifying a version number that starts with an integer, optionally followed by any desired characters."
},
"helpUri": "http://docs.oasis-open.org/sarif/sarif/v2.1.0/sarif-v2.1.0.html",
"messageStrings": {
"Warning_ProvideToolVersion": {
"text": "{0}: The tool '{1}' does not provide any of the version-related properties {2}. Providing version information enables the log file consumer to determine whether the file was produced by an up to date version, and to avoid accidentally comparing log files produced by different tool versions."
Expand All @@ -59,14 +59,14 @@
"text": "{0}: The tool '{1}' does not provide 'informationUri'. This property helps the developer responsible for addessing a result by providing a way to learn more about the tool."
}
},
"helpUri": "http://docs.oasis-open.org/sarif/sarif/v2.1.0/sarif-v2.1.0.html"
"name": "ProvideToolProperties"
},
{
"id": "SARIF2012",
"name": "ProvideRuleProperties",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name

the order of the properties were enforced by shaopeng in a previous PR.

"fullDescription": {
"text": "Rule metadata should provide information that makes it easy to understand and fix the problem.\r\n\r\nProvide the 'name' property, which contains a \"friendly name\" that helps users see at a glance the purpose of the rule. For uniformity of experience across all tools that produce SARIF, the friendly name should be a single Pascal-case identifier, for example, 'ProvideRuleFriendlyName'.\r\n\r\nProvide the 'helpUri' property, which contains a URI where users can find detailed information about the rule. This information should include a detailed description of the invalid pattern, an explanation of why the pattern is poor practice (particularly in contexts such as security or accessibility where driving considerations might not be readily apparent), guidance for resolving the problem (including describing circumstances in which ignoring the problem altogether might be appropriate), examples of invalid and valid patterns, and special considerations (such as noting when a violation should never be ignored or suppressed, noting when a violation could cause downstream tool noise, and noting when a rule can be configured in some way to refine or alter the analysis)."
},
"helpUri": "http://docs.oasis-open.org/sarif/sarif/v2.1.0/sarif-v2.1.0.html",
"messageStrings": {
"Note_FriendlyNameNotAPascalIdentifier": {
"text": "{0}: '{1}' is not a Pascal-case identifier. For uniformity of experience across all tools that produce SARIF, the friendly name should be a single Pascal-case identifier, for example, 'ProvideRuleFriendlyName'."
Expand All @@ -84,10 +84,10 @@
"text": "'{0}' does not provide metadata for rule '{1}'. Rule metadata contains information that helps the user understand why each rule fires and what the user can do to fix it."
}
},
"name": "ProvideRuleProperties",
"defaultConfiguration": {
"level": "note"
},
"helpUri": "http://docs.oasis-open.org/sarif/sarif/v2.1.0/sarif-v2.1.0.html"
}
}
]
}
Expand Down
Loading