Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New model #2625

Merged
merged 44 commits into from
Feb 24, 2023
Merged

New model #2625

merged 44 commits into from
Feb 24, 2023

Conversation

michaelcfanning
Copy link
Member

@michaelcfanning michaelcfanning commented Feb 22, 2023

Foundational work for updated threading model that better utilizes CPU while throttling memory use and allows for cancellation and timeout. The code is refactored to allow API consumers to plug-in at the highest level possible (rather than calling into individual methods to instantiate skimmers, analyze targets, etc.). Consumers can drive analysis strictly from command-line options (mirroring client tool behavior closely) or simply create a new (greatly expanded) context object that is sufficient to drive end-to-end analysis.

As a result of the context work, all parameterization can be expressed as an XML configuration file as well.

  • BRK: MultithreadedAnalyzeCommandBase IDispose implementation now manages logging dispose. Be sure to call base.Dispose() in any derived type implementations. #2614
  • BRK: Rename LogFilePersistenceOptions to FilePersistenceOptions (due to its general applicability in other file persistence contexts other than output logs).#2625
  • BRK: Many breaking changes in IAnalysisContext and AnalyzeContextBase. #2625
  • BUG: Eliminate per-context allocations contributing to unnecessary memory use. #2625
  • NEW: Rewrite MultithreadedAnalyzeCommandBase pipeline to allow for timeout, cancellation, and better API-driven use. #2625
  • NEW: Move large amounts of scan data to the context object, to streamline pipeline and allow for XML-driven configuration. #2625
  • NEW: Switch file processing to an ArtifactProvider model where enumerated artifacts consist of URI and optional content. #2625
  • NEW: Add new FailureLevelSet and ResultKindSet types that are compatible with XML-based configuration. #2625
  • NEW: Add PeakWorkingSet to --trace command to report maximum working set value during analysis. #2619

This is a large change and it's going in with many TBD comments I will address next week. Everyone will have further opportunity to review as it goes into sarif-pattern-matcher, where the new pipeline is highly exercised.

@michaelcfanning michaelcfanning marked this pull request as ready for review February 24, 2023 16:21

namespace Microsoft.CodeAnalysis.Sarif
{
public class AggregatingArtifactsProvider : IArtifactProvider
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AggregatingArtifactsProvider

An aggregating artifacts provider is an artifacts provider that itself aggregates scan targets generated by one or more providers. Directly analogous to our aggregating logger.

Note that individual providers own problems such as skipping files which aren't appropriate for scanning (because, for example, they exceed a size threshold) and for reporting those that were skipped.

@@ -16,42 +19,254 @@ public AnalyzeContextBase()
// All of these properties are persisted to configuration XML and can be
// passed using that mechanism. All command-line arguments are
// candidates to follow this pattern.
public IEnumerable<IOption> GetOptions()
public virtual IEnumerable<IOption> GetOptions()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetOptions

This is the way that a class declares it has serializable properties. This array is not complete.

An 'export-config' command will use MEF to retrieve all instances of IOptionsProvider and subsequently dump all published properties to an XML representation. What's serialized is what's expressed in the underlying static descriptors.

public bool PrettyPrint => OutputFileOptions.HasFlag(FilePersistenceOptions.PrettyPrint);
public bool ForceOverwrite => OutputFileOptions.HasFlag(FilePersistenceOptions.ForceOverwrite);

public virtual string AutomationId
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AutomationId

This is the way we create class 'surface area' for properties that are backed by the policy property bag. That dictionary can be kept empty, in which case any retrieval of the static gets the default value from the static descriptor.

This also means that the policy dictionary only contains data values that override defaults, which allows for very compact serialized representation.


namespace Microsoft.CodeAnalysis.Sarif
{
public class EnumeratedArtifact : IEnumeratedArtifact
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EnumeratedArtifact

EnumeratedArtifact is a thing.

It may exist only as a URI reference, in which case the URI can be resolved to retrieve it.

It can be populated with a stream, in which case there's no need to retrieve/download/load it.

It can be pre-populated with a string representation. We allow this because in the majority of cases we are in the business of analyzing strings. We might need to reconsider this approach, though, as other scan targets won't always be a string.

Probably, this type needs to be generic, with EnumeratedArtifact ultimately resolving to a string.

@@ -84,7 +88,7 @@ public static void LogExceptionLoadingTarget(IAnalysisContext context)
// Could not instantiate skimmers from the following plugins: {0}
context.Logger.LogConfigurationNotification(
CreateNotification(
context.TargetUri,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TargetUri

there is no target uri for this and some other error handlers i cleaned up.

@@ -190,15 +190,16 @@ public static string ToAndPhrase(this List<string> words)
}
}

public static OptionallyEmittedData ToFlags(this IEnumerable<OptionallyEmittedData> optionallyEmittedData)
public static TFlags ToFlags<TFlags>(this IEnumerable<TFlags> enumeratedFlags) where TFlags : Enum, IConvertible
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ToFlags

made this work for any enum type that is int32

{
[Serializable]
[JsonConverter(typeof(TypedPropertiesDictionaryConverter))]
public class FailureLevelSet : HashSet<FailureLevel>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FailureLevelSet

Enable clean serialization of this set type to and from XML.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we could make these sets immutable, could be desirable.

Msg002_FileSkippedDueToSize,
ruleId: Msg002_FileSkippedDueToSize,
ruleId: null,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null

oops, this is an associated rule id, not the notification id, and should have been null.

@@ -95,6 +95,20 @@ public static class PropertiesDictionaryExtensionMethods
continue;
}

FailureLevelSet failureLevelSet = property as FailureLevelSet;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failureLevelSet

changes in this file allow for clean serialization and deserialization of sets that contain failure levels and result kinds, highly used to configure analysis.

sb.Append(" : ").Append(this.Descriptor.Id);
sb.Append(" : ").Append(this.AssociatedRule?.Id);
sb.Append(" : ").Append(this.Level);
if (this.TimeUtc != default) { sb.Append($"{this.TimeUtc}"); }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this

Provides a very nice way in the debugger to see what unexpected stuff is getting persisted to configuration and tool notifications, often something like an unhandled engine exception.

@@ -15,7 +15,7 @@ public class ConsoleLogger : BaseLogger, IAnalysisLogger
// TODO: We directly instantiate this logger in two classes, creating
// unamanged dependencies. Fix this pattern with dependency injection or a factory.
// #2272 https://github.com/microsoft/sarif-sdk/issues/2272
public ConsoleLogger(bool quietConsole, string toolName, IImmutableSet<FailureLevel> levels = null, IImmutableSet<ResultKind> kinds = null) : base(levels, kinds)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IImmutableSet

immutable set can't easily be serialized/deserialized, so we create alias types to these constructs.

@michaelcfanning michaelcfanning enabled auto-merge (squash) February 24, 2023 23:09
@michaelcfanning michaelcfanning merged commit a29f202 into main Feb 24, 2023
@michaelcfanning michaelcfanning deleted the new-model branch February 24, 2023 23:13
airtower-luna added a commit to airtower-luna/convert-to-sarif that referenced this pull request Apr 22, 2023
The option was removed before the 4.1.0 release [1], unfortunately the
documentation wasn't updated until significantly later [2].

[1] microsoft/sarif-sdk#2625
[2] microsoft/sarif-sdk#2656
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant