Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong markdown prevents link to get parsed correctly #9045

Closed
4 of 7 tasks
alexandercerutti opened this issue Jun 7, 2023 · 8 comments · Fixed by #9046
Closed
4 of 7 tasks

Wrong markdown prevents link to get parsed correctly #9045

alexandercerutti opened this issue Jun 7, 2023 · 8 comments · Fixed by #9046
Labels
bug An error in the Docusaurus core causing instability or issues with its execution domain: markdown Related to Markdown parsing or syntax

Comments

@alexandercerutti
Copy link

alexandercerutti commented Jun 7, 2023

Have you read the Contributing Guidelines on issues?

Prerequisites

  • I'm using the latest version of Docusaurus.
  • I have tried the npm run clear or yarn clear command.
  • I have tried rm -rf node_modules yarn.lock package-lock.json and re-installing packages.
  • I have tried creating a repro with https://new.docusaurus.io.
  • I have read the console error message carefully (if applicable).

Description

Hello there, thanks for Docusaurus! This is really great.
Yesterday I pushed a version tag which triggered a pipeline to publish documentation on Github Pages.

The build suddenly started failing without the docs having been touched.

On both CI and local, the error reported what follows:

Please check the pages of your site in the list below, and make sure you don't reference any path that does not exist.
Note: it's possible to ignore broken links with the 'onBrokenLinks' Docusaurus configuration, and let the build pass.

Exhaustive list of all broken links found:

- On source page path = /Trackers/MyService2/:
   -> linking to ../../session-data.md (resolved as: /session-data.md)

Note that my configuration includes these two properties:

{
	...
	onBrokenLinks: "throw",
	onBrokenMarkdownLinks: "throw",
	...
}

When running docusaurus start, it wasn't returning the error but the link wasn't being resolved correctly: I could see the link pointing to raw ../../session-data.md instead of /session-data. So obliviously it was failing.

What's weird is that the link was existing and other pages on the same tree had the same link working.

My folder tree is this one:

.
├── Trackers
│   ├── MyService1
│   │   └── README.md
│   ├── MyService2
│   │   └── README.md
│   ├── MyService3
│   │   └── README.md
│   ├── MyService4
│   │   └── README.md
│   ├── add-a-tracker.md
│   └── README.md
├── intro.md
├── errors-reference.md
└── session-data.md

So I started to look back in git to see when the docs was edited the last time but at a first look I didn't see anything wrong (but actually there was something wrong). As I forgot to push the previous version tag, I didn't see the error earlier.

So I started debugging the project by running it with NODE_OPTIONS="--inspect-brk" but after several hours of debugging, I definitely failed to understand the whole picture and why it was failing.

Then I found out there was a markdown syntax error only on that specific page: a set of ``` that wasn't getting closed (as you can see from the repro case below, on this line).

So, it is fine to me that there was an error there so it failed compiling. What's odd to me is the error message and the fact that the compiling process reached such point without giving any hint about wrong markdown issue.

Another odd thing to me is that it wasn't actually a real syntax error because the html box can get generated without any problem (just a UI issue to me).

Clearly, if the link is put before the syntax error, it gets resolved correctly.

I don't know if this is an issue about docusaurus or a dependency, but I wasn't able to go any further in my investigation.

Thank you!

Reproducible demo

https://stackblitz.com/edit/github-qtdreu?file=docs%2FMyService%2FREADME.md

Please note that I had to disable on stackblitz the auto-formatting on save.

Steps to reproduce

  1. Create two pages. The second should have a link to the first one
  2. Insert in the second page, before the link, a wrong markdown (see the repro case, it might be a problem with preformatted code block)

Expected behavior

A better error message or at least links getting resolved correctly, but according to what I wrote above, I don't actually know what should I really expect.

Actual behavior

A completely unrelated error message:

Please check the pages of your site in the list below, and make sure you don't reference any path that does not exist.
Note: it's possible to ignore broken links with the 'onBrokenLinks' Docusaurus configuration, and let the build pass.

Exhaustive list of all broken links found:

- On source page path = /Trackers/MyService2/:
   -> linking to ../../session-data.md (resolved as: /session-data.md)

Your environment

  • Public source code: n/a
  • Public site URL: n/a
  • Docusaurus version used: 2.4.1
  • Environment name and version (e.g. Chrome 89, Node.js 16.4): Node 16.x
  • Operating system and version (e.g. Ubuntu 20.04.2 LTS): Ubuntu latest (CI), MacOS 12.6

Self-service

  • I'd be willing to fix this bug myself.
@alexandercerutti alexandercerutti added bug An error in the Docusaurus core causing instability or issues with its execution status: needs triage This issue has not been triaged by maintainers labels Jun 7, 2023
@Josh-Cena
Copy link
Collaborator

Your repro contains unpaired fences:

## Import via CDN

```html
<script src="https://myservice-core.umd.js"></script>
<script src="https://myservice.umd.js"></script>

This script is also available for ES5:

```html
<script src="https://myservice.umd.es5.js"></script>
```

## Activation and deactivation

This service gets automatically triggeredwhen [session data](../session-data.md) gets updated and certain conditions are honored.

We may be able to fix this, but it's not obligatory for us to handle non-well-formed Markdown.

@Josh-Cena Josh-Cena added domain: markdown Related to Markdown parsing or syntax and removed status: needs triage This issue has not been triaged by maintainers labels Jun 7, 2023
@Josh-Cena Josh-Cena changed the title Wrong markdown prevents link to get parsed correctly (but the rest of file gets parsed correctly) Wrong markdown prevents link to get parsed correctly Jun 7, 2023
@alexandercerutti
Copy link
Author

alexandercerutti commented Jun 7, 2023

Hey @Josh-Cena, thanks for prompt reply!

What's actually weird is that this seems to be a valid Markdown on the HTML, but that doesn't explain to me the connection with link resolution.

It is okay for me if your don't handle the Markdown error, but I feel like there must be something else wrong somewhere in the middle, that prevents the link to get resolved if this situation happens.

@Josh-Cena
Copy link
Collaborator

I'm preparing a PR already. Note that Markdown is very resilient: whatever the input is, the parser almost never throws a syntax error. The problem here is that Docusaurus believes the link is inside a code block (because it has encountered three code fences so far, so what comes next is probably code), so we don't replace the text, while to the actual Markdown parser, the second fence actually does not close the first fence.

@alexandercerutti
Copy link
Author

alexandercerutti commented Jun 7, 2023

So, if I understood correctly, there are two "actors" that read the parsed markdown structure (I expect to be there a metastructure - for tokens - somewhere if the parsing phase happens once, or two different markdown parsers) and handle it in two different ways?

An actor converts markdown to html, the other checks the links and other things for docusaurus, right?

Thanks for the PR!

@Josh-Cena
Copy link
Collaborator

Yes, you are right! Docusaurus actually has its own way of parsing Markdown links that does not use a full-blown Markdown parser. It's more performant but also means lots of bugs and inconsistencies. Sorry if it wasn't obvious previously.

@alexandercerutti
Copy link
Author

alexandercerutti commented Jun 7, 2023

Ugh, I hate these situations ahah Having two "flows" that do almost the same thing without a solution that could satisfy both, is something that code-smells to me!

Anyway, IMHO since docusaurus is a tool I'd probably sacrifice performance for correctness, but I'd also have just a parser.

Anyway, thank your for your support in triaging this issue 😄

@Josh-Cena
Copy link
Collaborator

We've tried using a proper parser before, but it didn't always work because we can't use the global Markdown configuration here. It is planned though. V3 will be about MDX v2, and then we may investigate further for a better Markdown infrastructure.

@slorber
Copy link
Collaborator

slorber commented Jun 8, 2023

Helpful issue to track for a better md linkification process: #9048

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An error in the Docusaurus core causing instability or issues with its execution domain: markdown Related to Markdown parsing or syntax
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants