Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Please make SUMMARY.md support urlencode markdown file name #1640

Open
xiefucai opened this issue Sep 3, 2021 · 9 comments
Open

Please make SUMMARY.md support urlencode markdown file name #1640

xiefucai opened this issue Sep 3, 2021 · 9 comments
Labels
A-Links Area: Issues with links A-Summary Area: The summary page, organization of pages. E-Help-wanted Experience: Help Needed

Comments

@xiefucai
Copy link

xiefucai commented Sep 3, 2021

Example,There is a markdown file with a question mark in its file name.

How can I get data from AsyncStorage and add it to initial state when using useReducer?.md

now i have to show it in SUMMARY.md

- [How can I get data from AsyncStorage and add it to initial state when using useReducer?](React/How can I get data from AsyncStorage and add it to initial state when using useReducer?)

and the result in toc is

<a href="React/How can I get data from AsyncStorage and add it to initial state when using useReducer?.html">How can I get data from AsyncStorage and add it to initial state when using useReducer?</a>

after click the link, the page will jump to the wrong url, the part "?.html" of the url will be the search part of url.

I hope I can write the markdown as

- [How can I get data from AsyncStorage and add it to initial state when using useReducer?](React/How%20can%20I%20get%20data%20from%20AsyncStorage%20and%20add%20it%20to%20initial%20state%20when%20using%20useReducer%3F.md)

and the result is

<a href="React/How%20can%20I%20get%20data%20from%20AsyncStorage%20and%20add%20it%20to%20initial%20state%20when%20using%20useReducer%3F.html">How can I get data from AsyncStorage and add it to initial state when using useReducer?</a>

so, please support

  1. someone can urldecode the file name in SUMMARY.md;
  2. urldecode the file path before check file exists;
  3. urlencode the name of markdown file before output as html;
@ehuss ehuss added A-Links Area: Issues with links A-Summary Area: The summary page, organization of pages. E-Help-wanted Experience: Help Needed labels Sep 3, 2021
@apatniv
Copy link
Contributor

apatniv commented Oct 4, 2021

Can I take this if no one else is already working on this?

@ehuss
Copy link
Contributor

ehuss commented Oct 4, 2021

Yea, of course! I would probably start looking at around here. I don't know if mdbook has any existing escape function, or if one is readily available in a dependency.

@apatniv
Copy link
Contributor

apatniv commented Oct 9, 2021

Thank you @ehuss for the pointers.

I am trying to see how mdbook reacts to the following entry in summary.md

- [Hello World](./Hello World?.md)

mdbook build fails with the following error:

021-10-09 16:47:47 [WARN] (mdbook::book::summary): Expected a start of a link, actually got Some(Text(Borrowed("[")))
2021-10-09 16:47:47 [ERROR] (mdbook::utils): Error: Summary parsing failed for file="/tmp/mdBook/sample_testing/src/SUMMARY.md"
2021-10-09 16:47:47 [ERROR] (mdbook::utils):    Caused By: There was an error parsing the numbered chapters
2021-10-09 16:47:47 [ERROR] (mdbook::utils):    Caused By: There was an error parsing the numbered chapters
2021-10-09 16:47:47 [ERROR] (mdbook::utils):    Caused By: failed to parse SUMMARY.md line 10, column 3: The link items for nested chapters must only contain a hyperlink

The mdbook uses pulldown-cmark library to parse links in summary.md. The presense of "?" breaks the parser.

let text = "[Hello World](./Hello World?.md)";
let parser = pulldown_cmark::Parser::new(text);
for event in parser {
    println!("event={:?}", event);
}

The above code dumps the following response:

event=Start(Paragraph)
event=Text(Borrowed("["))
event=Text(Borrowed("Hello World"))
event=Text(Borrowed("]"))
event=Text(Borrowed("(./Hello World?.md)"))
event=End(Paragraph)

The library is able to parse the links if we use the format [](<>) format

Example:

let text = "[Hello World](<./Hello World?.md>)";
let parser = pulldown_cmark::Parser::new(text);
for event in parser {
    println!("event={:?}", event);
}
event=Start(Paragraph)
event=Start(Link(Inline, Borrowed("./Hello World?.md"), Borrowed("")))
event=Text(Borrowed("Hello World"))
event=End(Link(Inline, Borrowed("./Hello World?.md"), Borrowed("")))
event=End(Paragraph)

There is an existing function but it only works for "<" and ">".

I found a percent_encoding which lets us control what characters need to be escaped.

I am assuming since the toc should only contain names of the md book file, is it safe to escape "?" character as well?

There seems to be some weird rules like this.

@YJDoc2
Copy link
Contributor

YJDoc2 commented Oct 10, 2021

Hey, sorry to jump in the middle of your work, but I would like to ask, would it be better to sanitize the path instead of urlencoding it?

Urlencoding can get tricky with multiple characters, and as per the stack overflow link @apatniv has added, it depends on which RFC is used, which encoding is used etc. Added to that, the url-encoded name will not be easily legible as a file name, if generating a file, with possible added issues that the filesystem, depending on OS might not support the characters used, especially with chapter names that might contain Unicode non-ascii symbols.
Looking from #1611 , if mdBook is ready to support that, would it be a better idea, that rather than urlencoding, sanitize the names, so that all blanks or other characters are replaced by '_' or such a character, which is valid in url and would produce quite legible generated file names?
What do you think @ehuss @xiefucai @apatniv ?
Again, apologies for jumping in between your work 😅

@ehuss
Copy link
Contributor

ehuss commented Oct 24, 2021

The mdbook uses pulldown-cmark library to parse links in summary.md. The presense of "?" breaks the parser.

I don't think it is the ?, it is the space. Spaces in links are not valid markdown.
I'm actually not sure I now follow the original report, since the given example is rejected by mdbook.

In general, mdbook doesn't seem to support any sort of escaping in the summary other than space (added in #1293). Although I'd be reluctant to encourage anyone to use strange characters like ?, it may be worthwhile to add full support for unescaping % encoding in the link, so a question mark could be %3f. In which case, the fix would go here.

@apatniv
Copy link
Contributor

apatniv commented Oct 24, 2021

Good observation. It is actually the space that is breaking the parser.

text = [first](./HelloWorld?.md)

event=Start(Paragraph)
event=Start(Link(Inline, Borrowed("./HelloWorld?.md"), Borrowed("")))
event=Text(Borrowed("first"))
event=End(Link(Inline, Borrowed("./HelloWorld?.md"), Borrowed("")))
event=End(Paragraph)

text = [first](./Hello World?.md)

event=Start(Paragraph)
event=Text(Borrowed("["))
event=Text(Borrowed("first"))
event=Text(Borrowed("]"))
event=Text(Borrowed("(./Hello World?.md)"))
event=End(Paragraph)

@ISSOtm
Copy link
Contributor

ISSOtm commented Jun 18, 2022

Considering that #1766 wants "URLs" in SUMMARY.md to all be treated as paths (which is also more or less what the code is already doing), I agree with @YJDoc2 that paths should be sanitized rather than URL-encoded. (Doing so would also take care of that issue.)

@plule
Copy link

plule commented Sep 13, 2023

Hello, I have hit this issue when working with an external tool that generate markdowns with url-encoded links. The current implementation only replaces %20 with spaces, so any other special character (accents for example) create 404 links.

The common mark specification qualifies links as "URI" and other editors supporting markdown (vscode, obsidian) also support percent-encoded links, I think that urldecoding links is a sane approach here.

I have an implementation using urlencoding to do this, and simply decoding the url from summary fixes my issue with exported markdown and the question mark one of this issue. It does not enforce it, so both [💜](%F0%9F%92%9C.md) and [💜](💜.md) will generate valid links.

@ehuss and other readers, if this approach is alright, I'll open the PR :) (the only missing thing is documenting the change I think)

@ISSOtm
Copy link
Contributor

ISSOtm commented Sep 14, 2023

Good observation. It is actually the space that is breaking the parser.

text = [first](./HelloWorld?.md)

event=Start(Paragraph)
event=Start(Link(Inline, Borrowed("./HelloWorld?.md"), Borrowed("")))
event=Text(Borrowed("first"))
event=End(Link(Inline, Borrowed("./HelloWorld?.md"), Borrowed("")))
event=End(Paragraph)

text = [first](./Hello World?.md)

event=Start(Paragraph)
event=Text(Borrowed("["))
event=Text(Borrowed("first"))
event=Text(Borrowed("]"))
event=Text(Borrowed("(./Hello World?.md)"))
event=End(Paragraph)

By the way, the fix according to CommonMark is to enclose the destination in angle brackets. So both of these should function normally:

  • [first](<./Hello World?.md>)
  • - [How can I get data from AsyncStorage and add it to initial state when using useReducer?](<React/How can I get data from AsyncStorage and add it to initial state when using useReducer?>)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Links Area: Issues with links A-Summary Area: The summary page, organization of pages. E-Help-wanted Experience: Help Needed
Projects
None yet
Development

No branches or pull requests

6 participants