Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support linking to anchors in directories #49

Closed
loilo opened this issue Dec 12, 2019 · 15 comments · Fixed by #50
Closed

Support linking to anchors in directories #49

loilo opened this issue Dec 12, 2019 · 15 comments · Fixed by #50
Labels
💪 phase/solved Post is done

Comments

@loilo
Copy link
Contributor

loilo commented Dec 12, 2019

Subject of the feature

Linking directly to folders, showing the folder's readme, is supported by all three supported Git hosts — GitHub, Gitlab and Bitbucket, and also by this plugin.

However, this plugin considers anchors appended to those directory links invalid without checking (reported as missing-heading-in-file).

Problem

Linking to a section in a folder's readme directly (without linking explicitly to the readme file) is a pattern I like to use a lot in larger projects. It works perfectly fine in GitHub and Gitlab and should therefore not be reported as an issue by this plugin. (It should theoretically work in BitBucket as well since that also renders the readme below a directory's file list. But since BitBucket behaves more like an SPA it generally has trouble with anchors in cross-folder links.)

Expected behaviour

When provided a directory link with a hash, this plugin should consider that a link to the README.md of that folder and evaluate the hash accordingly.

If no readme exists, a missing-file should be appropriate.

I added a repro at loilo/repro-remark-directory-links-49. It shows some of the described warnings but exits early due to a runtime error that occurs when linking to the root folder with a hash appended.

Caveats

  • In addition to headline anchors, the #readme anchor should be supported — however, this should already be taken care of by Allow links to point to the top of the document #48.

  • Case sensitivity is a little bit unclear: While GitHub supports uppercase and lowercase readme file names, there's no documentation on which one is preferred if more than one matching file is present.

    I tested this a while ago and if I remember correctly, there is no way to determine priority from the file name, but actually the readme file committed first will be used. GitHub will even prefer a README.rst over a README.md if it was added earlier.

    Gitlab and BitBucket also support case-insensitive readme file names, but I have no idea (and have not done any tests) how they determine the preferred file.

    However, I'd probably ignore all this "multiple readme files" fluff as "weird edge case" and hands down just pick the first thing that looks like a /^readme.md$/g.

@loilo loilo added 🙉 open/needs-info This needs some more info 🦋 type/enhancement This is great to have labels Dec 12, 2019
@wooorm
Copy link
Member

wooorm commented Dec 12, 2019

This won’t be trivial, but, may be doable, and is useful!

  • Here, new files are added on the CLI:

    fileSet.add(vfile({cwd: file.cwd, path: path.relative(file.cwd, fp)}))

  • Here, files are accessed. We could use fs.stats instead. Stats has a method .isDirectory that could be used to find out that something is a folder and that we need to find out which readme file is probably uses:

    function checkIfExists(filePath) {
    fs.access(filePath, fs.F_OK, onaccess)
    function onaccess(err) {
    var noEntry = err && err.code === 'ENOENT'
    landmarks[filePath] = {'': !noEntry}

  • The file extensions to support for readmes are documented in github/markup. For the file name, /^readme$/i should be fine. On the CLI, we use:

    https://github.com/remarkjs/remark/blob/24cd52d6075966e9113524f002c0cb37133f1a37/packages/remark-cli/cli.js#L5

  • Maybe you are right about older-file-first! That would be difficult. I was thinking that maybe they do alpha-sort, and prefer the first? That suggests why .asciidoc is preferred here. I would say to ignore all non-markdown extensions entirely. This is a markdown project. People using it will probably only use Markdown.

  • What may be hard is that we now have URLs to things (/path/to/folder#heading) and (/path/to/folder/readme.md#heading) that are populated from different URLs: (/path/to/folder/readme.md#heading). Maybe a solution is, for every /^readme.md$/i file, to also expose URLs to its folder?

@loilo
Copy link
Contributor Author

loilo commented Dec 12, 2019

I was thinking that maybe they do alpha-sort, and prefer the first?

I just tested again, and you're absolutely correct. Either my memory trolled me or GitHub changed this.

I would say to ignore all non-markdown extensions entirely.

Definitely. But GitHub still allows you to have a README.md and a readme.md in the same folder. However, README.md is always preferred, so apparently alphasort it is.

What may be hard is that we now have URLs to things (/path/to/folder#heading) and (/path/to/folder/readme.md#heading) that are populated from different URLs: ( /path/to/folder/readme.md#heading). Maybe a solution is, for every /^readme.md$/i file, to also expose URLs to its folder?

Sorry, couldn't really follow you on that. Would you mind to rephrase or elaborate on this? 🙂

@wooorm
Copy link
Member

wooorm commented Dec 12, 2019

👍

We process every file to find outgoing links and defined links.
I think that when generating defined links (/path/to/folder/readme.md#heading) it makes sense to set defined links for the folder if the file matches a supported readme filename (/path/to/folder#heading).

These places are I believe here and here

@loilo
Copy link
Contributor Author

loilo commented Dec 12, 2019

Oh okay, I think I got it. So instead of internally resolving the directory to the readme file as I suggested, we'd do it the other way around and internally copy over the found anchors from each readme to its corresponding directory?

@wooorm
Copy link
Member

wooorm commented Dec 12, 2019

Yes! So, paths to folders currently fail, we fix them, by loading a correct readme.
For every file that is processed, that has a valid readme name, expose both links from that file, but also links from its directory!

This solves the case where links are mixed: both to a readme, and to its folder. Only need to process that file once.

@loilo
Copy link
Contributor Author

loilo commented Dec 12, 2019

Sounds good. I'll take a look and attach a PR if I manage to wrap my head around it. 😉

@loilo
Copy link
Contributor Author

loilo commented Dec 12, 2019

One more concern I just came up with: If we want to detect a readme file in a folder, we'd have to read all files in the directory and match them against the readme pattern. This might become a performance concern with really big folders.

Do you think it would be worth it to check for for the most common readme.md and README.md explicitely, and only then fall back to scanning the directory?

@wooorm
Copy link
Member

wooorm commented Dec 12, 2019

Hmm, Readme.md is also likely.
I don’t know, the list becomes relatively long that I’m worried it wouldn’t add much of a performance improvement.

One potential approach is fs.Dir for streaming directory entries. It’s able to stop reading when a match is found. However, it doesn’t return results in a particular order, so that’s annoying.

🤷‍♂️ I don’t know! Maybe just go with the simplest approach and see if there are bottlenecks later? We can always fix it in the future.

@loilo
Copy link
Contributor Author

loilo commented Dec 12, 2019

Alright, potential case of YAGNI detected. 😁

@loilo
Copy link
Contributor Author

loilo commented Dec 12, 2019

Ok, I'm a bit stuck and need a little heads-up: Given I don't want to do a major program flow change to the plugin, I'll have to detect readme files inside the find function.

Is it fine to do it there using the fs module?

I'm asking because I don't know what the browser support story is for this module and how it relates to vfile.

@wooorm
Copy link
Member

wooorm commented Dec 12, 2019

I would suggest using adding a resolve file, with a resolve.browser.js alternative that is a “noop” like find-repo.browser.js.

I think actual fs access can go in check-files.js, which already has a simplified version for browsers?

@loilo
Copy link
Contributor Author

loilo commented Dec 12, 2019

That's pretty much where I'm stuck.

My latest approach was roughly like "inject the file set into the check-files.js and add the readme vfile from there". This works in part as the added file will actually go through a find call, however that call only comes when the previous check & validation cycle is already finished (which is too late, obviously).

To be honest, I'm not sure whether it's a good idea for me to finish up this issue. I'm not too confident working with Node streams and not at all familiar with how remark works overall — I'll probably cause you more work by live-learning these things and then delivering a mediocre solution than I would if I left the implementation to you. 🙈

@wooorm
Copy link
Member

wooorm commented Dec 12, 2019

🤗 alright, thanks for working on it! And for helping figure out a solution. I'm away for the weekend so can't code on it now. But I'll get to it soon!

@loilo
Copy link
Contributor Author

loilo commented Dec 12, 2019

No hurry, thanks for offering all these great packages in the first place. 😁

Looking forward to how you're going to approach this, I'll probably learn more about remark from the resulting code diff than from the last hours of reading the existing code. 😉

@loilo
Copy link
Contributor Author

loilo commented Dec 12, 2019

By the way, here is the markdown file that I added to my branch for future tests of this feature. All links containing the term "invalid" should produce some kind of warning, all others should pass.

(The link to the project root at the end of the file might be a bit odd, but it's there to prevent regression of a currently present bug: Linking to the project root at some point leads to effectively calling path.relative(baseFolder, baseFolder), which results in the empty string. That empty string is in turn used as the path for a vfile, which causes a runtime error.)

wooorm added a commit that referenced this issue Jan 15, 2020
Previously, a link to a directory such as `folder#anchor`, didn’t
work.  This commit adds support for, in such a case:

1.  Loading the particular readme file that vendors such as GitHub,
    GitLab, and BitBucket display as an index on those page
2.  For every processed file that is a readme of a directory, marking
    its anchors as valid

Closes GH-49.
wooorm added a commit that referenced this issue Jan 23, 2020
Previously, a link to a directory such as `folder#anchor`, didn’t
work.  This commit adds support for, in such a case:

1.  Loading the particular readme file that vendors such as GitHub,
    GitLab, and BitBucket display as an index on those page
2.  For every processed file that is a readme of a directory, marking
    its anchors as valid

Closes GH-49.
Closes GH-50.
@wooorm wooorm added ⛵️ status/released and removed 🙉 open/needs-info This needs some more info 🦋 type/enhancement This is great to have labels Mar 30, 2020
@wooorm wooorm added the 💪 phase/solved Post is done label Aug 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💪 phase/solved Post is done
Development

Successfully merging a pull request may close this issue.

2 participants