Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spurious CI error for MSVC #81890

Open
ehuss opened this issue Feb 8, 2021 · 7 comments
Open

spurious CI error for MSVC #81890

ehuss opened this issue Feb 8, 2021 · 7 comments
Labels
A-spurious Area: Spurious failures in builds (spuriously == for no apparent reason) C-bug Category: This is a bug. O-windows Operating system: Windows O-windows-msvc Toolchain: MSVC, Operating system: Windows T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.

Comments

@ehuss
Copy link
Contributor

ehuss commented Feb 8, 2021

There have been a few instances of this error happening on CI:

= note: LINK : fatal error LNK1201: error writing to program database 'D:\a\rust\rust\build\i686-pc-windows-msvc\test\run-make-fulldeps\extern-flag-fun\extern-flag-fun\foo.pdb'; check for insufficient disk space, invalid path, or insufficient privilege

Or some variant of that. See:

(and probably many more)

@ehuss ehuss added the C-bug Category: This is a bug. label Feb 8, 2021
@JohnTitor JohnTitor added A-spurious Area: Spurious failures in builds (spuriously == for no apparent reason) O-windows-msvc Toolchain: MSVC, Operating system: Windows T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. labels Feb 10, 2021
@kennytm kennytm changed the title spurious CI error for i686 MSVC spurious CI error for MSVC Nov 13, 2021
@hkratz
Copy link
Contributor

hkratz commented Nov 28, 2021

The cargo test lto::complicated has been triggering this a lot lately. And cargo itself prints warnings, because the same .pdb file is written multiple times:

021-11-28T17:13:11.5135303Z The bin target `test` in package `test v0.0.0 (D:\a\rust\rust\build\x86_64-pc-windows-msvc\stage2-tools\x86_64-pc-windows-msvc\tmp\cit\t1300\foo)` has the same output filename as the lib target `test` in package `test v0.0.0 (D:\a\rust\rust\build\x86_64-pc-windows-msvc\stage2-tools\x86_64-pc-windows-msvc\tmp\cit\t1300\foo)`.
2021-11-28T17:13:11.5137366Z Colliding filename is: D:\a\rust\rust\build\x86_64-pc-windows-msvc\stage2-tools\x86_64-pc-windows-msvc\tmp\cit\t1300\foo\target\release\deps\test.pdb
2021-11-28T17:13:11.5138476Z The targets should have unique names.
2021-11-28T17:13:11.5139250Z Consider changing their names to be unique or compiling them separately.
2021-11-28T17:13:11.5140516Z This may become a hard error in the future; see <https://github.com/rust-lang/cargo/issues/6313>.
2021-11-28T17:13:11.5141536Z warning: output filename collision.
2021-11-28T17:13:11.5143164Z The bin target `test` in package `test v0.0.0 (D:\a\rust\rust\build\x86_64-pc-windows-msvc\stage2-tools\x86_64-pc-windows-msvc\tmp\cit\t1300\foo)` has the same output filename as the lib target `test` in package `test v0.0.0 (D:\a\rust\rust\build\x86_64-pc-windows-msvc\stage2-tools\x86_64-pc-windows-msvc\tmp\cit\t1300\foo)`.
2021-11-28T17:13:11.5145380Z Colliding filename is: D:\a\rust\rust\build\x86_64-pc-windows-msvc\stage2-tools\x86_64-pc-windows-msvc\tmp\cit\t1300\foo\target\release\test.pdb
2021-11-28T17:13:11.5146482Z The targets should have unique names.
2021-11-28T17:13:11.5147438Z Consider changing their names to be unique or compiling them separately.
2021-11-28T17:13:11.5148495Z This may become a hard error in the future; see <https://github.com/rust-lang/cargo/issues/6313>.
[...]
021-11-28T17:13:11.5246422Z   = note: LINK : fatal error LNK1201: error writing to program database 'D:\a\rust\rust\build\x86_64-pc-windows-msvc\stage2-tools\x86_64-pc-windows-msvc\tmp\cit\t1300\foo\target\release\deps\test.pdb'; check for insufficient disk space, invalid path, or insufficient privilege
2021-11-28T17:13:11.5247882Z           
2021-11-28T17:13:11.5248143Z 
2021-11-28T17:13:11.5248698Z error: could not compile `test` due to previous error

So I guess something locks the file after it is written so it can't be overwritten shortly thereafter. Can the test be updated not to do this?

@ehuss
Copy link
Contributor Author

ehuss commented Nov 29, 2021

@hkratz I'm looking into fixing that test (and doing some other collision stuff).

I'm also trying to dig into LNK1201. Most of the errors seem to be on tests that build the same crate twice in a row. Unfortunately everything I've tried to reproduce it haven't succeeded.

Just on the off chance we can find someone who wants to help:

@rustbot ping windows

This issue is causing significant trouble on CI, and I'm having a hard time making progress on it. A huge help would be if someone can find some way to reproduce it. I've tried running rustc -g in a loop, but I can't trigger the error. A really trivial test that triggered this error in the past is https://github.com/rust-lang/rust/blob/master/src/test/run-make-fulldeps/override-aliased-flags/Makefile.

@rustbot
Copy link
Collaborator

rustbot commented Nov 29, 2021

Hey Windows Group! This bug has been identified as a good "Windows candidate".
In case it's useful, here are some instructions for tackling these sorts of
bugs. Maybe take a look?
Thanks! <3

cc @arlosi @danielframpton @gdr-at-ms @kennykerr @luqmana @lzybkr @nico-abram @retep998 @rylev @sivadeilra @wesleywiser

@rustbot rustbot added the O-windows Operating system: Windows label Nov 29, 2021
@hkratz
Copy link
Contributor

hkratz commented Nov 29, 2021

@hkratz I'm looking into fixing that test (and doing some other collision stuff).

@ehuss Great! I wrote a little script to grep through the CI logs and of the last 30 failed msvc runs which had LNK1201 in it all failed at lto::complicated. So fixing this test will solve this for us.

The script for reference (excuse my Python, not a Python native):

import github
import requests
from github import Github


def main():
    token = "<replace with token>"
    g = Github(token)
    # github.enable_console_debug_logging()
    repo_name = "rust-lang-ci/rust"
    repo = g.get_repo(repo_name)
    workflow_runs = repo.get_workflow_runs(status="failure", branch="auto")
    for run in workflow_runs:
        jobs_url = run.jobs_url + "?per_page=100"
        response = requests.get(jobs_url, headers={"Authorization": f"token {token}"})
        response.raise_for_status()
        for job in response.json()["jobs"]:
            if job["conclusion"] == "failure" and "msvc" in job["name"]:
                log_url = f"https://api.github.com/repos/{repo_name}/actions/jobs/{ job['id']}/logs"
                log_response = requests.get(
                    log_url, headers={"Authorization": f"token {token}"}
                )
                log_response.raise_for_status()
                log = log_response.text
                if "LNK1201" in log:
                    print(log_url)
                    print("\n\n")
                    print(log)

bors added a commit to rust-lang/cargo that referenced this issue Nov 29, 2021
Fix some tests with output collisions.

This fixes some tests which run afoul of creating colliding outputs (tracked in #6313). In particular, these tests are creating duplicate pdb files on Windows because they have a binary and a library (dylib) with the same name. This is causing significant issues on rust-lang's CI (rust-lang/rust#81890) where the MSVC linker is failing with a mysterious LNK1201 error. Presumably two LINK.exe processes are trying to write to the same PDB file at the same time, which causes it to fail.

Ideally this shouldn't happen, but I don't really have any ideas on how to resolve it, as the name of the PDB has some importance.

I have not been able to reproduce the LNK1201 error. My hope is that this change will help alleviate the issue, though.

I updated the `doc_all_member_dependency_same_name` test to illustrate that it is hitting a collision, which is a fundamental part of that test (and something we should probably figure out how to resolve in the future).
@wesleywiser
Copy link
Member

So I guess something locks the file after it is written so it can't be overwritten shortly thereafter.

I talked with an engineer internally and they said that LNK1201 most often occurs because some program has a lock on the file. I don't know what the configuration of the GH runners looks like but perhaps Windows Defender (or similar AV/AM software) is scanning the file right after the linker finishes writing it the first time?

Regardless, rust-lang/cargo#10137 looks like the correct approach to resolving this to me. Thanks @ehuss!

@ematipico
Copy link

I tried to upgrade the toolchain to rust 1.72.0 and I hit this issue: https://github.com/biomejs/biome/actions/runs/5975779047/job/16212389309?pr=66

Interestingly, this issue occurs only when running tests using nextest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-spurious Area: Spurious failures in builds (spuriously == for no apparent reason) C-bug Category: This is a bug. O-windows Operating system: Windows O-windows-msvc Toolchain: MSVC, Operating system: Windows T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants