Error with base dir <empty> #320

Closed
ran-huang opened this issue Sep 13, 2021 · 5 comments

Comments

@ran-huang

Hi, I use Lychee in GitHub Actions, which worked perfectly until this week. I got this error today:

Error: Error with base dir `<empty>` : Found absolute local link "/download" but no base directory was set. Set with `--base`.
+ exit_code=1

I tried to add --exclude /download in args, but the error persists. It still checks /download.

I can't set --base because it will affect other relative links with different base URL. Is there a way to fix the error? Thank you!

@MichaIng
Member

MichaIng commented Sep 13, 2021

Do you use the latest stable release v0.7.1, or do you build it from master? Dumb question: it says --base instead of --base-url, so it must be the master branch 😄.

Local file check support has been added: #262
This means you can use a base path now instead of a base URL and lychee will then not do a network request for internal links but instead check if the file/dir exists locally. Probably this solves your issue with the different base URLs? Else it may make sense to check the websites for different domains/hosts with dedicated lychee calls so that you can define the correct base path (or URL) separately, to have internal links with absolute paths checked as well. For those with relative paths, local file/dir checking is done automatically resolving the relative path in the directory tree.
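To illustrate the distinction described above, here is a minimal sketch of how a link could be mapped to a local path. It is not lychee's actual code (lychee is written in Rust); the function name `resolve_local_link` and its parameters are hypothetical, and the error text only imitates the message quoted in this issue.

```python
from pathlib import Path
from typing import Optional


def resolve_local_link(link: str, source_file: Path, base_dir: Optional[Path]) -> Path:
    """Return the local path that would be checked for `link` (hypothetical helper)."""
    if link.startswith("/"):
        # Absolute local link: it can only be located relative to a base directory.
        if base_dir is None:
            raise ValueError(
                f'Found absolute local link "{link}" but no base directory was set'
            )
        return base_dir / link.lstrip("/")
    # Relative link: resolved against the directory of the file that contains it.
    return source_file.parent / link
```

With a base of `public`, `/download` would map to `public/download`, while a relative link such as `img/a.png` needs no base at all.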

But before this feature was added, internal links with absolute paths were silently ignored if no --base-url was given, if I'm not mistaken. And I think this should be kept instead of throwing an error. Probably a warning like here or a silent exclusion like for unsupported schemes makes sense @mre?

Another thing is that URLs matching an --exclude rule should probably be skipped before any further check on the URL is done.


Related code for the first issue: https://github.com/lycheeverse/lychee/blob/f143087/lychee-lib/src/helpers/path.rs#L37-L58

  • Turning this invalid base path error into a warning (like unknown response status) or a silent exclude (like unsupported scheme).

Related code where excludes are handled: https://github.com/lycheeverse/lychee/blob/93948d7/lychee-lib/src/client.rs#L171-L200

  • This is indeed after all the extraction, collection, interpretation, turning internal links into file:// URIs and related checks have been done already. We may think about switching the order so that excludes are applied as a very early step within the collector on the raw URI as given in the inputs. This has quite some implications, so needs to be tested carefully, but it would solve some other issues I ran into, e.g. that it was not possible to exclude the given private addresses while checking internal links against a locally running webserver. In this case we use 127.0.0.1 as base URL, but it would be excluded with -E, while we actually would like to use -E to exclude any literal 127.0.0.1 within the input files. Of course it does not make much sense to apply an exclude filter on the base URL which was explicitly given as such by the user.
  • Also it would fit a bit more my expectations/logic: If I define an exclude rule, I would expect it to be applied on the raw URI (or path) given in the input file rather than the final URI constructed by lychee. Especially when having internal links, both can be quite different, and the base path or URL is often only valid within the test environment and may contain path elements which do not exist on the deployed website, which are currently unexpectedly excluded. E.g. /home root path within the GitHub actions VM does currently exclude all internal links when using an --exclude '/home' rule, while probably it was only wanted to exclude https://example.org/home*.
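The ordering problem described in the two points above can be sketched as follows. This is an illustration only, assuming exclude rules are regular expressions searched within the URI; the helper names (`resolve`, `is_excluded_late`, `is_excluded_early`) and the example base path are hypothetical and not taken from lychee.

```python
import re
from typing import List


def resolve(raw_link: str, base: str) -> str:
    """Turn an absolute internal link into a file:// URI under `base` (simplified)."""
    if raw_link.startswith("/"):
        return "file://" + base.rstrip("/") + raw_link
    return raw_link


def is_excluded_late(raw_link: str, base: str, patterns: List[str]) -> bool:
    # Filter applied to the final URI: path elements of the base can match too.
    resolved = resolve(raw_link, base)
    return any(re.search(p, resolved) for p in patterns)


def is_excluded_early(raw_link: str, patterns: List[str]) -> bool:
    # Filter applied to the raw link exactly as it appears in the input file.
    return any(re.search(p, raw_link) for p in patterns)
```

With a base such as `/home/runner/work/site/public` and the rule `/home`, the late filter drops the internal link `/docs/install`, because the resolved URI contains `/home`. The early filter keeps it and only drops links that literally contain `/home`, such as `https://example.org/home`.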

@mre
Member

mre commented Sep 16, 2021

I see three options moving forward:

  1. We silently ignore internal links with absolute paths.
  2. We print a warning but check the rest.
  3. We assume the input directory as the base for absolute links if no base is set.

In the long run I lean towards option 3. It would be what I expect if I run lychee public/ for example where public is the build directory of a static site generator.
It gets tricky when we look at multiple inputs. Then we'd have to track the input source for each link to determine the base. We could add a source field to the Input, which would allow us to backtrack the source of each link for that use-case.

This requires quite some refactoring, so in the meantime we could go with option 1 to not break any existing user workflows.
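For comparison, the three options could be sketched like this. It is a rough illustration under stated assumptions, not a proposal for the real implementation: `Policy`, `handle_absolute_link`, and the `input_dir` parameter are hypothetical names, and returning `None` stands for "this link is not checked".

```python
import warnings
from enum import Enum
from pathlib import Path
from typing import Optional


class Policy(Enum):
    IGNORE = 1            # option 1: silently skip the link
    WARN = 2              # option 2: warn, skip the link, check the rest
    ASSUME_INPUT_DIR = 3  # option 3: treat the input directory as the base


def handle_absolute_link(
    link: str, base_dir: Optional[Path], input_dir: Path, policy: Policy
) -> Optional[Path]:
    """Return the local path to check, or None if the link is skipped."""
    if base_dir is not None:
        # An explicit base always wins, regardless of the policy.
        return base_dir / link.lstrip("/")
    if policy is Policy.IGNORE:
        return None
    if policy is Policy.WARN:
        warnings.warn(f'Skipping absolute local link "{link}": no base directory set')
        return None
    # Option 3: needs to know which input each link came from.
    return input_dir / link.lstrip("/")
```

The `input_dir` argument shows why option 3 needs the extra bookkeeping: with multiple inputs, each link has to carry its source so the right directory can be passed in.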

@MichaIng
Member

I vote for option 1, plus a note in the usage text that --base is required to check relative URLs with an absolute path (or "internal links with absolute path"; I'm not sure how to name this precisely and understandably). With option 2, I fear we'd need another CLI option to mute the warning. There may be cases where files with different base paths/URLs are meant to be checked in a single run, so that setting a base would be right for some but wrong for others.

Option 3 would imply too many assumptions and/or complicated checks and steps to get it mostly right, IMHO. Better to force users to be explicit about what they want than to be implicit at great effort and still risk assuming wrong.

@mre
Member

mre commented Sep 16, 2021

You have a good point there. I changed my mind. Unless anyone else brings up a good argument against it, let's go with option 1.

@ran-huang
Author

Thank you! This issue is fixed. ❤️
