Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use hostfxr hosting layer API #161

Merged
merged 12 commits into from
Aug 10, 2022
Merged

Conversation

Forgind
Copy link
Member

@Forgind Forgind commented Jun 23, 2022

This makes the process of finding .NET SDK instances more efficient and fix dotnet/msbuild#7466

This makes the process of finding .NET SDK instances more efficient and fix dotnet/msbuild#7466
@jzabroski
Copy link

@rainersigwald I remain perplexed how you can review and/or test quality assurance of any of these changes.

@jzabroski
Copy link

Hi @Forgind

I'm not trying to negatively criticize your improvements here. I just see a lot of changes to MSBuildLocator with no tests. There is a huge pile of assumptions that underlies how MSBuildLocator even works that isn't routinely proven and/or is legacy.

For example, if the goal of this commit is to improve performance, it would be nice to benchmark that.

I'm curious what other build tools like Bazel do for stuff like this.

@Forgind
Copy link
Member Author

Forgind commented Jun 28, 2022

@jzabroski,

You're absolutely right that the lack of strong testing infrastructure is a major issue in this repo. My question is: how would you test it? Take this change, for example. This changes how we first find a list of sdks that might have MSBuild. We could try to mock the native call and verify that the mocked native call returns as expected, but that's really just testing the test, not testing the change. We could download a specific set of SDKs and verify that those and only those are found, but that's an end-to-end test, not a unit test; having a long list of those would be very expensive computationally. We could have a short list of end-to-end tests, but at that point, how different is it from manually testing a change?

If you have a good idea for how to write unit tests for MSBuildLocator or feel moved to write a few solid end-to-end tests, we would likely take them. Short of that, I think manual testing is sufficient. This is only issue/pr#161; if it would take our team a lot of effort to create good tests, I don't think it's worth it unless MSBuildLocator suddenly sees a lot more activity.

@jzabroski
Copy link

jzabroski commented Jun 28, 2022

how would you test it?

Docker.

Short of that, I think manual testing is sufficient.

I don't even know how you do manual testing for this stuff. Where is the testing script?

@jzabroski
Copy link

To elaborate a bit more, there are two open source projects that allow creating ad-hoc docker containers using the Docker API:

  1. https://github.com/Deffiss/testenvironment-docker -> https://www.nuget.org/packages/TestEnvironment.Docker
  2. https://github.com/testcontainers/testcontainers-dotnet -> https://www.nuget.org/packages/DotNet.Testcontainers

I prefer the first one. I tried to get involved in the second one, but I found it was fairly difficult to communicate with the maintainers. TestContainers is a broader initiative supporting many different languages, so I imagine some of the change control process involves fitting the overall project's architecture. Each supported language doesn't really simply do a language binding, either, but a ground-up native re-implementation.

Despite the fact Deffiss's TestEnvironment.Docker project has about 4x fewer downloads, I think it is a lot cleaner, especially how they take advantage of C# 9 records' with syntax. The main disadvantage relative to TestContainers.Net is that Deffiss's project doesn't currently support Moby Ryuk for "resource reaping" (garbage collection for Docker test artifacts like images and containers) . In general, that shouldn't be that big of a deal.

Windows also now supports podman, so you can use that in some scenarios, but Docker is still better supported, I believe.

Cheers.

@Forgind
Copy link
Member Author

Forgind commented Jun 28, 2022

Docker is certainly better than what I'd originally been imagining. It would let you run tests with different environments in a fairly clean way. On the other hand, there are still only two relevant parts of MSBuildLocator: does it find the MSBuilds you expect, and after you've picked one, does it let you use the API as expected. Using this change as an example, only the first half is relevant, so if I had great tests for it, they'd create an environment with some SDKs and verify that those SDKs are found. I'd also want to make sure that works on both linux and windows. So that's what I tested. I have several SDKs on my windows computer, and I made sure it found the ones I expected it to find and didn't find the ones I didn't expect it to find. I verified that it put the right one at the top, i.e., sorted by version in descending order. Then I tested it on Ubuntu. That testing is why I noticed the PATH variable is different on windows and Ubuntu, hence the latest change. I also noticed that the call to hostfxr_resolve_sdk_2 returned the path to the global.json rather than the path to the best MSBuild, though I haven't figured out an ideal solution to that yet.

The problem is: there aren't very many tests I can run for MSBuildLocator, and I do run those every time I make a change because they build off of each other. I just run them manually rather than automatically. As I said, I'd love to have it all automated; I'm just not sure if it's worth the time at the moment, given everything else we have to do.

@Forgind
Copy link
Member Author

Forgind commented Jun 28, 2022

Edit: figured out that last part 🙂

@jzabroski
Copy link

I think it's pretty easy to create an image that installs a set of SDKs, and associate that image with negative/positive test cases. Certainly about as much time as manually checking it. I also think it would be cleaner in terms of catching issues, like returning multiple matches or duplicate matches.

Why do you think it would take longer?

I do agree, the initial investment/learning curve is a bit annoying, but once the guidewire architecture for writing tests is in place, the tests write themselves practically.

Comment on lines 115 to 119
string dotnetPath = File.Exists("/usr/share/dotnet/dotnet") ? "/usr/share/dotnet" :
File.Exists($"/home/{Environment.GetEnvironmentVariable("USER")}/share/dotnet/dotnet") ? $"/home/{Environment.GetEnvironmentVariable("USER")}/share/dotnet" :
File.Exists("dotnet") ? "." :
!string.IsNullOrEmpty(Environment.GetEnvironmentVariable("DOTNET_HOME")) ? Environment.GetEnvironmentVariable("DOTNET_HOME") :
null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this. Can you explain?

For instance, given a local unzipped SDK I expect something like

$ PATH=/tmp/dotnet_6.0.400-preview:$PATH /home/raines/src/MSBuildLocator/samples/BuilderApp/bin/Debug/net6.0/BuilderApp MSBuildLocator.sln
Sample MSBuild Builder App 1.4.6+65bb4606a6.

0) Custom path
1) .NET Core SDK - 6.0.400

Select an instance of MSBuild:

But with your change I instead get

$ PATH=/tmp/dotnet_6.0.400-preview:$PATH /home/raines/src/MSBuildLocator/samples/BuilderApp/bin/Debug/net6.0/BuilderApp MSBuildLocator.sln
Sample MSBuild Builder App 1.4.11+1e1e39dd5d.

0) Custom path
1) .NET Core SDK - 6.0.300
2) .NET Core SDK - 5.0.408

Select an instance of MSBuild:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked about the recommended way to find dotnet. I believe your test got different results because the old version skipped to just checking your PATH, whereas this new version looks at several other places that seem more reasonable (if dotnet is there) than PATH. Those other places don't know about this random SDK you just installed but do know about the SDKs you installed the normal way. This is not a case in which I think we should be optimizing for bad customers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to replicate "the build that you would get if you typed dotnet build instead of toolThatUsesLocator". That would respect PATH and multi-level lookup and friends and would not bypass them to find a machine-installed SDK.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would respect PATH and multi-level lookup and friends and would not bypass them to find a machine-installed SDK.

That would be a good integration test 👍

Comment on lines 123 to 139
// Windows
foreach (string dir in Environment.GetEnvironmentVariable("PATH").Split(';'))
{
if (File.Exists(Path.Combine(dir, "dotnet.exe")))
{
WorkingDirectory = workingDirectory,
CreateNoWindow = true,
UseShellExecute = false,
RedirectStandardOutput = true
dotnetPath = dir;
}
};

// Ensure that we set the DOTNET_CLI_UI_LANGUAGE environment variable to "en-US" before
// running 'dotnet --info'. Otherwise, we may get localized results.
process.StartInfo.EnvironmentVariables[DOTNET_CLI_UI_LANGUAGE] = "en-US";
}

process.OutputDataReceived += (_, e) =>
// Unix
foreach (string dir in Environment.GetEnvironmentVariable("PATH").Split(':'))
{
if (!string.IsNullOrWhiteSpace(e.Data))
if (File.Exists(Path.Combine(dir, "dotnet")))
{
lines.Add(e.Data);
dotnetPath = dir;
}
};

process.Start();
}
catch
{
// when error running dotnet command, consider dotnet as not available
yield break;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could unify these with Path.PathSeparator and a runtime check for full executable name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I don't think I understand why a PATH search is required here. Is there nothing else we can do? No hostfxr API we can call to get the directory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked. I don't think unifying those makes it better. There are three differences between unix and windows here: Path.PathSeparator, the full executable name, and the delimiter in the PATH variable. Having three places each doing those double checks seems much messier to me, given how little code this actually is. Note that we can't split by both PATH delimiters, as that would break up, for instance, C:\anything

{
lines.Add(e.Data);
dotnetPath = dir;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. realpath is for fixing character widths, and CharSet.Auto seems to have taken care of that. I did test this on both windows and WSL, and it worked for both.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

realpath is for fixing character widths

What gave you that impression? realpath is for resolving symlinks to their final destination: https://www.man7.org/linux/man-pages/man3/realpath.3.html

// running 'dotnet --info'. Otherwise, we may get localized results.
process.StartInfo.EnvironmentVariables[DOTNET_CLI_UI_LANGUAGE] = "en-US";
// Windows
foreach (string dir in Environment.GetEnvironmentVariable("PATH").Split(';'))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RuntimeInformation.IsOSPlatform

Comment on lines 81 to 125
enum hostfxr_resolve_sdk2_flags_t
{
disallow_prerelease = 0x1,
};

enum hostfxr_resolve_sdk2_result_key_t
{
resolved_sdk_dir = 0,
global_json_path = 1,
};

[UnmanagedFunctionPointer(CallingConvention.Cdecl, CharSet = CharSet.Auto)]
private delegate void hostfxr_resolve_sdk2_result_fn(
hostfxr_resolve_sdk2_result_key_t key,
string value);

[UnmanagedFunctionPointer(CallingConvention.Cdecl, CharSet = CharSet.Auto)]
private delegate void hostfxr_get_available_sdks_result_fn(
hostfxr_resolve_sdk2_result_key_t key,
[MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 0)]
string[] value);

[DllImport("hostfxr", CharSet = CharSet.Auto, ExactSpelling = true, CallingConvention = CallingConvention.Cdecl)]
private static extern int hostfxr_resolve_sdk2(
string exe_dir,
string working_dir,
hostfxr_resolve_sdk2_flags_t flags,
hostfxr_resolve_sdk2_result_fn result);

[DllImport("hostfxr", CharSet = CharSet.Auto, ExactSpelling = true, CallingConvention = CallingConvention.Cdecl)]
private static extern int hostfxr_get_available_sdks(string exe_dir, hostfxr_get_available_sdks_result_fn result);

[DllImport("libc", ExactSpelling = true, CallingConvention = CallingConvention.Cdecl)]
private static extern IntPtr realpath(string path, IntPtr buffer);

[DllImport("libc", ExactSpelling = true, CallingConvention = CallingConvention.Cdecl)]
private static extern void free(IntPtr ptr);

private static string realpath(string path)
{
IntPtr ptr = realpath(path, IntPtr.Zero);
string result = Marshal.PtrToStringAuto(ptr);
free(ptr);
return result;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .NET convention for this type of thing is to put these into a class named NativeMethods: https://docs.microsoft.com/dotnet/fundamentals/code-analysis/quality-rules/ca1060

process = new Process()
{
StartInfo = new ProcessStartInfo("dotnet", "--info")
foreach (string dir in Environment.GetEnvironmentVariable("PATH").Split(';'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
foreach (string dir in Environment.GetEnvironmentVariable("PATH").Split(';'))
foreach (string dir in Environment.GetEnvironmentVariable("PATH").Split(Path.PathSeparator))

This provides semantic meaning and gets rid of semi-magic constants.

}
else
{
foreach (string dir in Environment.GetEnvironmentVariable("PATH").Split(':'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
foreach (string dir in Environment.GetEnvironmentVariable("PATH").Split(':'))
foreach (string dir in Environment.GetEnvironmentVariable("PATH").Split(Path.PathSeparator))

process = new Process()
{
StartInfo = new ProcessStartInfo("dotnet", "--info")
foreach (string dir in Environment.GetEnvironmentVariable("PATH").Split(';'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please comment the path search with the reasoning you figured out in consultation with the .NET Runtime folks.

Copy link
Member

@rainersigwald rainersigwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should the version impact of this be? It's theoretically a same-behavior bugfix but it "feels" bigger than that; I'd probably bump the minor version.


lineSdkIndex++;
}
if (rc != 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need any means of error reporting here? As is we're silently ignoring failures, which might be reasonable but that isn't fully obvious to me. Justify with a comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand this (and didn't test it, since I don't know how to introduce an error in hostfxr), but I think errors are automatically printed to stderr:
https://github.com/dotnet/runtime/blob/8e407e32bbb13874a7a8b8651d6d3f1d70574754/src/native/corehost/hostfxr.h#L64

I can add a comment to that effect if you think it reasonable. This part was more aimed at doing something reasonable after writing the error. Note that the docs I looked at all checked the return value but didn't do anything with it other than returning false or whatever.

That said, this made me notice the more important bug: I wasn't resetting rc with the hostfxr_get_available_sdks call, so this part is completely invalid 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. I don't love us writing to stderr accidentally. What I was wondering was whether we should throw an exception if the native call returns anything other than 0. I think my ideal would be to overwrite that error mechanism, capture it to a string, and throw with that string. What does the SDK resolver do?

Copy link
Member Author

@Forgind Forgind Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, even though those are public methods, they're probably assuming some invariants from the caller.
The right thing to do would be to patch the sdk resolver.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they have another case in which they don't check at all:
https://github.com/dotnet/sdk/blob/a30e465a2e2ea4e2550f319a2dc088daaafe5649/src/Resolvers/Microsoft.DotNet.NativeWrapper/NETCoreSdkResolverNativeWrapper.cs#L45

They get the errorCode, and do nothing with it - that should change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're not wrong; we could do better. On the other hand, I imagine the scenarios in which the difference between "prints to stderr" and "we caught the exception and sent our own error message" really matters are few and far between, noting that the SDK resolver has been used extensively, and that code has peacefully existed for years. I also believe failure here is pretty rare; we aren't doing anything unusual or passing in arguments, so there isn't much that can go wrong.

Including moving to version 1.5
@Forgind
Copy link
Member Author

Forgind commented Aug 9, 2022

What should the version impact of this be? It's theoretically a same-behavior bugfix but it "feels" bigger than that; I'd probably bump the minor version.

That sounds reasonable to me. If we were pushing to MSBuildLocator regularly, I might push back, but we haven't had a version bump in over a year. It's time. I'm thinking we can push a release version after this goes in and the target framework change goes in. Sound good to you?

@Forgind
Copy link
Member Author

Forgind commented Aug 9, 2022

Forgot I'd already pushed the version update to the tf PR. Ah, well.

@Forgind Forgind merged commit d0d15f7 into microsoft:master Aug 10, 2022
@Forgind Forgind deleted the new-sdk-find-logic branch August 10, 2022 23:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MSBuild hangs and eventually times out for OOP builds queued from .NET 6 process running in VS apphost
3 participants