Skip to content

Conversation

hvitved
Copy link
Contributor

@hvitved hvitved commented Aug 30, 2019

This change makes a big impact on CoreFX and Mono:

Project Build + extraction time before Build + extraction time after Factor Same DB (# source files + size)
CoreFX 02:07:03 01:07:38 0.5323363505 yes
Mono 03:26:02 01:31:02 0.4418378903 yes
EntityFrameworkCore 00:21:10 00:22:06 1.044094488 yes
CoreCLR 00:06:35 00:06:52 1.043037975 yes
OneOf 00:05:39 00:06:02 1.067846608 yes
Powershell 00:06:43 00:06:58 1.037220844 yes
OrchardCore 00:33:02 00:33:42 1.020181635 yes
ql 00:02:55 00:02:46 0.9485714286 yes
roslyn 01:04:43 01:04:31 0.996909606 yes

@hvitved hvitved added the C# label Aug 30, 2019
@hvitved hvitved force-pushed the csharp/early-identify-duplicate-extraction branch from a68bad1 to a369db9 Compare September 2, 2019 08:04
@hvitved hvitved force-pushed the csharp/early-identify-duplicate-extraction branch 3 times, most recently from f9260b1 to e63e791 Compare September 11, 2019 14:06
@hvitved hvitved force-pushed the csharp/early-identify-duplicate-extraction branch from e63e791 to 8f3f940 Compare September 11, 2019 18:47
@hvitved hvitved marked this pull request as ready for review September 12, 2019 09:41
@hvitved hvitved requested a review from a team as a code owner September 12, 2019 09:41
@hvitved hvitved requested a review from calumgrant September 12, 2019 09:41
Copy link
Contributor

@calumgrant calumgrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice sleuthing. This is an extremely reasonable solution. I'm sorry that it wasn't clear enough that the args-files should have been the verbatim command line arguments that can be passed by @ to the extractor command line.

/// Logs information about the extractor, as well as the arguments to Roslyn.
/// </summary>
/// <param name="roslynArgs">The arguments passed to Roslyn.</param>
/// <returns>A Boolean indicating whether to proceed with extraction.</returns>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, "A Boolean indicating whether the same arguments have been logged previously." The decision about extraction is taken elsewhere.


layout = new Layout();
this.options = options;

extractor = new Extraction.Extractor(false, GetOutputName(compilation, commandLineArguments), Logger);
finalizeInit = comp =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be clearer to just pull this out into a method FinalizeInitialization(CSharpCompilation comp ...), and to remove the finalizeInit argument on line 51. Then just call FinalizeInitialization explicitly.

@@ -259,7 +260,7 @@ void DoAnalyseAssembly(PortableExecutableReference r)
var projectLayout = layout.LookupProjectOrDefault(assemblyPath);
using (var trapWriter = projectLayout.CreateTrapWriter(Logger, assemblyPath, true, options.TrapCompression))
{
var skipExtraction = FileIsCached(assemblyPath, trapWriter.TrapFile);
var skipExtraction = options.Cache && File.Exists(trapWriter.TrapFile);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optionally, push logic into FileIsCached.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FileIsCached actually does more, it also checks for timestamps, which is what didn't work for Mono and CoreFX.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant, change FileIsCached to use File.Exists and not timestamps.

{
Logger.Log(Severity.Info, $" Arguments to Roslyn: {string.Join(' ', roslynArgs)}");
streamWriter.WriteLine($"Arguments to Roslyn: {string.Join(' ', roslynArgs)}");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want this, because we want to be able to @-include the arguments file on the command line arguments for debugging purposes, if we need to reproduce extraction of a particular compilation command.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason it is there is because we need the hash of the file to include all arguments to Roslyn, not just the ones from the response file, in order to identify identical compilations. However, it appears that response files can contain comments, so adding this line as a comment should work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment worked beautifully, nice find. However, the hash of the file and the contents of the file could be different. You hash all the "inputs" (i.e. the command line), and could even have an empty file just to indicate that the arguments have been seen.

calumgrant
calumgrant previously approved these changes Sep 17, 2019
@@ -259,7 +260,7 @@ void DoAnalyseAssembly(PortableExecutableReference r)
var projectLayout = layout.LookupProjectOrDefault(assemblyPath);
using (var trapWriter = projectLayout.CreateTrapWriter(Logger, assemblyPath, true, options.TrapCompression))
{
var skipExtraction = FileIsCached(assemblyPath, trapWriter.TrapFile);
var skipExtraction = options.Cache && File.Exists(trapWriter.TrapFile);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant, change FileIsCached to use File.Exists and not timestamps.

{
Logger.Log(Severity.Info, $" Arguments to Roslyn: {string.Join(' ', roslynArgs)}");
streamWriter.WriteLine($"Arguments to Roslyn: {string.Join(' ', roslynArgs)}");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment worked beautifully, nice find. However, the hash of the file and the contents of the file could be different. You hash all the "inputs" (i.e. the command line), and could even have an empty file just to indicate that the arguments have been seen.

Copy link
Contributor

@calumgrant calumgrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small suggestion but the overall approach LGTM.

@@ -456,7 +456,7 @@ public bool LogRoslynArgs(string[] roslynArgs, string extractorVersion)
bool argsWritten;
using (var streamWriter = new StreamWriter(new FileStream(tempFile, FileMode.Append, FileAccess.Write)))
{
streamWriter.WriteLine($"# Arguments to Roslyn: {string.Join(' ', roslynArgs)}");
streamWriter.WriteLine($"# Arguments to Roslyn: {string.Join(' ', roslynArgs.Where(arg => arg[0] != '@'))}");
Copy link
Contributor

@calumgrant calumgrant Sep 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!arg.StartsWith("@")

and fix CommandLineExtensions.cs:19 as well?

It's just possible that arg is empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is better.

@hvitved hvitved force-pushed the csharp/early-identify-duplicate-extraction branch from 804521d to 61bd9f2 Compare September 19, 2019 11:39
@semmle-qlci semmle-qlci merged commit 0387177 into github:master Sep 19, 2019
@hvitved hvitved deleted the csharp/early-identify-duplicate-extraction branch September 19, 2019 18:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants