Skip to content
This repository has been archived by the owner on Aug 31, 2021. It is now read-only.

Ask a question: Can we fix the link-checker in this repo? #102

Closed
sshine opened this issue Jan 16, 2020 · 18 comments
Closed

Ask a question: Can we fix the link-checker in this repo? #102

sshine opened this issue Jan 16, 2020 · 18 comments
Labels
status/help-wanted Extra attention is needed type/ci Relates to the Continuous Integration in this repository. type/discussion Discussion regarding v3

Comments

@sshine
Copy link
Contributor

sshine commented Jan 16, 2020

The link-checker CI check is bumming me out.

  • When it fails, I get 1800 lines of output, but the actual error isn't highlighted.

  • When I scroll to the culprit on line 773 (there you are!), the failed link is this:

    FILE: ./languages/csharp/exercises/concept/dates/.docs/hints.md
    [✖] https://csharp.net-tutorials.com/data-types/working-with-dates-time/
    
    6 links checked.
    [✖] https://csharp.net-tutorials.com/data-types/working-with-dates-time/ → Status: 0
    

    which means to say, (a) this link works when I check it, so it's a flaky check, (b) this link is not in my pull request, so my pull request has failed because of a link placed in another part of the repo at another point in time.

Perhaps the link-check run during pull requests should be limited to the files being changed? Then periodic link-checking can report on existing links that are broken, so that I don't have to address C#-related links when pushing Haskell-related documentation that has no links.

Also, wrt. flakiness, one strategy we have at work is to run it twice and only report failure if it fails in both cases. Otherwise the amount of links added over time will make the link-check fail a linearly growing percentage of the time.

@sshine sshine added the status/help-wanted Extra attention is needed label Jan 16, 2020
@ErikSchierboom
Copy link
Member

I like these ideas!

@SleeplessByte
Copy link
Member

Perhaps the link-check run during pull requests should be limited to the files being changed? Then periodic link-checking can report on existing links that are broken, so that I don't have to address C#-related links when pushing Haskell-related documentation that has no links.

Absolutely agreed!

The files that have been changed I think come in into GH via the API.

Also, wrt. flakiness, one strategy we have at work is to run it twice and only report failure if it fails in both cases. Otherwise the amount of links added over time will make the link-check fail a linearly growing percentage of the time.

I'd hope we can just turn off absolute paths, because they shouldn't be checked imo.

@iHiD
Copy link
Member

iHiD commented Jan 16, 2020

I'd hope we can just turn off absolute paths, because they shouldn't be checked imo.

I'm good with this.

Then periodic link-checking can report on existing links that are broken, so that I don't have to address C#-related links when pushing Haskell-related documentation that has no links.

Master should be green, so this should never be a problem though? ie, the only errors you should see is ones that the PR introduces.

The other problem is that you might remove a file, that has links in other files pointing to it. There is lots of valuable cross-referencing.

When I scroll to the culprit on line 773 (there you are!), the failed link is this:

Can we pipe the output through a grep pattern that only shows errors? That sounds straight-forward for someone who can linux.

@SaschaMann
Copy link
Contributor

Master should be green, so this should never be a problem though? ie, the only errors you should see is ones that the PR introduces.

It's possible that an external link breaks in the time between the last master build and the PR.

@SleeplessByte
Copy link
Member

It's possible that an external link breaks in the time between the last master build and the PR.

TBH I don't care that much about external links at time of merge, but we should check it periodically.

@SaschaMann
Copy link
Contributor

(sorry I clicked the wrong button)

SaschaMann added a commit that referenced this issue Jan 16, 2020
@SleeplessByte
Copy link
Member

Can we pipe the output through a grep pattern that only shows errors? That sounds straight-forward for someone who can linux.

Sounds like a great idea!

@ErikSchierboom
Copy link
Member

Can we pipe the output through a grep pattern that only shows errors? That sounds straight-forward for someone who can linux.

We'd have to make sure we don't lose the context of the errors, i.e. the file in which the errors are contained.

@SaschaMann
Copy link
Contributor

We'd have to make sure we don't lose the context of the errors, i.e. the file in which the errors are contained.

The current link verifier works on individual files anyway, so that'd be doable. I'd still like to switch to a faster one anyway.

@iHiD
Copy link
Member

iHiD commented Jan 16, 2020

For clarity, if someone PRs it to be better, I'll merge it :)

@ErikSchierboom - You can use -b 1 to grep to previous lines I believe (I don't understand piping really though, so it might not work in that context)

@ErikSchierboom
Copy link
Member

Ah cool, didn't know!

@SleeplessByte
Copy link
Member

https://github.com/exercism/v3/pull/127/checks?check_run_id=393575497

Is it possible that it skips _files.md and then not allow /root/based/paths?

@iHiD
Copy link
Member

iHiD commented Jan 16, 2020

I'm merging #134 for now.

@iHiD
Copy link
Member

iHiD commented Jan 16, 2020

If someone wants to edit the OP with the tangible steps that need doing that will probably solicit fixes form the community.

@ghost
Copy link

ghost commented Feb 11, 2020

I believe this is similar to #222 (comment). It asserts markdown persisted in the repository metadata ( .git/) when it probably shouldn't.
The logic is here for future contributors:

deleted_or_renamed=$(git diff --no-commit-id --name-only --diff-filter DR origin/master | grep -i .md$ | grep -v -i _sidebar.md | grep -v -i ISSUE_TEMPLATE | wc -l)
if [ "$GITHUB_REF" = "master" ] || [ $deleted_or_renamed -ne 0 ]
then
files=$(find . -name \*.md ! -iname _sidebar.md ! -ipath \*/ISSUE_TEMPLATE/\*.md)
else
files=$(git diff --no-commit-id --name-only --diff-filter AM origin/master | grep -i .md$ | grep -v -i _sidebar.md | grep -v -i ISSUE_TEMPLATE | cat)
fi
for file in $files; do
if [ -f "$file" ]
then
markdown-link-check -q -c .github/markdown-link-check-config.json $file
fi
done

@ErikSchierboom
Copy link
Member

@sshine I think this issue can be closed. Correct?

@sshine
Copy link
Contributor Author

sshine commented Feb 11, 2020

Reading back it seems that @iHiD suggested one thing that I don't see fulfilled:

The other problem is that you might remove a file, that has links in other files pointing to it. There is lots of valuable cross-referencing.

Maybe I overlooked a detail in a PR related to this issue, and maybe this needs to be addressed elsewhere. Moving or removing files should probably trigger a repo-wide check for references to precisely those files.

If we're limited to only checking local links, then doing a repo-wide check on every PR should be equivalent to only checking affected files. It is not my impression that we have reached a consensus here?

Feel free to close this issue if the above is irrelevant or should be addressed later.

@ErikSchierboom
Copy link
Member

Maybe I overlooked a detail in a PR related to this issue, and maybe this needs to be addressed elsewhere. Moving or removing files should probably trigger a repo-wide check for references to precisely those files.

This is in the workflow:

deleted_or_renamed=$(git diff --no-commit-id --name-only --diff-filter DR origin/master | grep  -i .md$ | grep -v -i _sidebar.md | grep -v -i ISSUE_TEMPLATE | wc -l)
if [ "$GITHUB_REF" = "master" ] || [ $deleted_or_renamed -ne 0 ]
then
    files=$(find . -name \*.md ! -iname _sidebar.md ! -ipath \*/ISSUE_TEMPLATE/\*.md -not -path "./.git/*")
else
    files=$(git diff --no-commit-id --name-only --diff-filter AM origin/master | grep  -i .md$ | grep -v -i _sidebar.md | grep -v -i ISSUE_TEMPLATE | cat)
fi

It could of course be that this does not work.

@ErikSchierboom ErikSchierboom added type/discussion Discussion regarding v3 and removed discussion labels Apr 21, 2020
@iHiD iHiD closed this as completed Jan 28, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
status/help-wanted Extra attention is needed type/ci Relates to the Continuous Integration in this repository. type/discussion Discussion regarding v3
Projects
None yet
Development

No branches or pull requests

6 participants