Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Urls with space aren't rendered correctly #5153

Closed
ThatAIGeek opened this issue Aug 18, 2018 · 20 comments
Closed

Urls with space aren't rendered correctly #5153

ThatAIGeek opened this issue Aug 18, 2018 · 20 comments

Comments

@ThatAIGeek
Copy link

@ThatAIGeek ThatAIGeek commented Aug 18, 2018

Links to local files that have space in their names aren't rendered as links in jupyter lab. Example:
folder: https://github.com/jupyter-widgets/ipywidgets/tree/master/docs/source/examples
notebook: https://github.com/jupyter-widgets/ipywidgets/blob/master/docs/source/examples/Index.ipynb
Same thing with jupyter notebook. Changing space to %20 helps for jupyter notebooks whicle jupyter lab starts to render the text as link, but still can't open the file which it refers to.
example

@ThatAIGeek ThatAIGeek changed the title Urls with space in name aren't rendered correctly Urls with space aren't rendered correctly Aug 18, 2018
@mhuttner
Copy link

@mhuttner mhuttner commented Aug 30, 2018

I had the same issue as and tracked it down.
@Liberus using   to encode a space seems to work for me

Description below:

Setup

  • Place a link to another Notebook containing a space in a Notebook markdown cell
 [Link](./Local%20Notebook.ipynb)

Expected Outcome

Clicking the Link opens a new tab with the Notebook

Actual Outcome

The API Request for the file fails with a 404, because the the file string gets URL Encoded to
Local%2520Notebook.ipynb
I tracked this down to the following code path

return docManager.services.contents
.get(path, { content: false })

get(
path: string,
options?: Contents.IFetchOptions
): Promise<Contents.IModel> {
let [drive, localPath] = this._driveForPath(path);
return drive.get(localPath, options).then(contentsModel => {

get(
localPath: string,
options?: Contents.IFetchOptions
): Promise<Contents.IModel> {
let url = this._getUrl(localPath);

private _getUrl(...args: string[]): string {
let parts = args.map(path => URLExt.encodeParts(path));
let baseUrl = this.serverSettings.baseUrl;
return URLExt.join(baseUrl, this._apiEndpoint, ...parts);
}

export function encodeParts(url: string): string {
return join(...url.split('/').map(encodeURIComponent));
}

I'm just not sure at what point a fix would be appropriate, to not break other api requests.

@ThatAIGeek
Copy link
Author

@ThatAIGeek ThatAIGeek commented Aug 30, 2018

Good job @mhuttner ! Unfortunately, I'm not a JS/TS developer so I can't help with the fix, but if you'll provide one I'm willing to test it.
Maybe someone from developers will be willing to help, now that you did some research on it?

@minrk
Copy link
Contributor

@minrk minrk commented Aug 30, 2018

I think the bug is in the very top, where rendermime creates the link. Since the .get API expects API paths (not escaped), rendermime should unescape URL targets for relative links before calling the contents.get API.

The rendering issue is apparently a longtime issue in marked.js, where it would implicitly encode markdown link targets with spaces in them. The CommonMark spec explicitly defines link targets as having no spaces, so those aren't valid markdown in current specs. A recent release of marked.js fixed this compatibility issue, which has resulted in notebooks that relied on this not rendering as they used to, since those aren't valid markdown links.

@ian-r-rose
Copy link
Member

@ian-r-rose ian-r-rose commented Aug 30, 2018

Thanks for the report @Liberus and @mhuttner, and thanks for the context @minrk. I'm not entirely sure what we should do to proceed here: since the links rendered by marked are not escaped, it would seem error prone to unescape them. That is, that should break filenames that have %XX in them.

Maybe spaces are a common enough character that we would want to break the spec and replace '%20' with ' ' ourselves. It would certainly make me a bit nervous, though.

@ian-r-rose
Copy link
Member

@ian-r-rose ian-r-rose commented Aug 30, 2018

(And yes, if we did want to mess with the paths rendered by marked, the appropriate place for that would be in handleAttr and handleAnchor in packages/rendermime/src/renderers.ts).

@minrk
Copy link
Contributor

@minrk minrk commented Sep 3, 2018

The marked change, while potentially frustrating for authors who put spaces in their link targets (this includes most IPython/Jupyter tutorial docs), makes the right thing to do unambiguous from the JupyterLab perspective, I think: link targets from markdown are now always and only escaped URLs. They can no longer be unescaped. That means there shouldn't be any ambiguity in handling them. Either:

  1. unescape from markdown to get the contents API "true" name, or
  2. support an "already-escaped" flag to the contents API that skips the call to encodeParts (more icky, but avoids unescape->re-escape roundtrip errors, which can technically happen for improperly escaped initial targets)

@blink1073 blink1073 added this to the Future milestone Sep 6, 2018
@jasongrout jasongrout removed this from the Future milestone Sep 6, 2018
@jasongrout jasongrout added this to the 1.0 milestone Sep 6, 2018
@jasongrout
Copy link
Contributor

@jasongrout jasongrout commented Sep 21, 2018

More context:

  1. gfm explicitly disallows spaces too: https://github.github.com/gfm/#link-destination
  2. Indeed, GitHub does not allow spaces in md files. However, GitHub itself does render the links with spaces in notebooks as valid links: https://github.com/jupyter-widgets/ipywidgets/blob/e27c7a81fd7ff96ea74726575845b61e81c5c855/docs/source/examples/Index.ipynb

See also microsoft/vscode#24763 and nteract/nteract#914 for examples of projects supporting this view.

Closing, since the links in question are not valid markdown links, as @minrk pointed out.

@jasongrout
Copy link
Contributor

@jasongrout jasongrout commented Sep 21, 2018

Closing, since the links in question are not valid markdown links, as @minrk pointed out.

I didn't mean to say that. What I meant to say was that yes, I think we should do @minrk's first suggestion, at least for now, where we (essentially) unescape marked to get the contents api url.

@ian-r-rose
Copy link
Member

@ian-r-rose ian-r-rose commented Sep 21, 2018

The links from marked are not escaped, though.

@jasongrout
Copy link
Contributor

@jasongrout jasongrout commented Sep 21, 2018

What I'm seeing is that this source in a markdown cell is not recognized or rendered as a link: [file](file with spaces.txt).

This is recognized as a link, but makes an incorrect call to the contents api (it makes the call with the %20s, instead of with spaces like it should: [file](file%20with%20spaces.txt).

I think that we should either unescape (e.g., convert %20 to space) the path parameter before the link handler is called at

linkHandler.handleLink(anchor, path, hash);
, or we should unescape it in the handlelink function we provide before we use it in the contents api get call at
.get(path, { content: false })
If we did it this way, I think we should unescape it before adding it to the DOM, i.e., unescape it here:

Either way, we should clarify that the link handler is called with either an escaped or unescaped path, whichever is the case, in the docs at

* @param path: the path to open when the link is clicked.

@ian-r-rose
Copy link
Member

@ian-r-rose ian-r-rose commented Sep 21, 2018

The problem is that there are other escapable characters that are not spaces which could get caught up by that, no?

@jasongrout
Copy link
Contributor

@jasongrout jasongrout commented Sep 21, 2018

Example?

@ian-r-rose
Copy link
Member

@ian-r-rose ian-r-rose commented Sep 21, 2018

What if my file path included %24? That would be unescaped accidentally.

@jasongrout
Copy link
Contributor

@jasongrout jasongrout commented Sep 21, 2018

You would need to escape the %, so it should be [file](myfile%2524)

@ian-r-rose
Copy link
Member

@ian-r-rose ian-r-rose commented Sep 21, 2018

So we would require that people authoring markdown should manually escape valid file paths?

@jasongrout
Copy link
Contributor

@jasongrout jasongrout commented Sep 21, 2018

I think that's basically what we are saying, right? Markdown links should be escaped urls.

@jasongrout
Copy link
Contributor

@jasongrout jasongrout commented Sep 21, 2018

I think that's basically what we are saying, right? Markdown links should be escaped urls.

No, that's not what we are saying (sorry for claiming that!). We are saying that markdown links follow the markdown spec, where any spaces or control characters are escaped, as well as unmatched parens.

I think there is a bug in marked, that it doesn't escape explicit % characters (i.e., not part of an escape) in links. Compare https://marked.js.org/demo/?outputType=html&text=%5Blink%5D(file%25a) with the commonmark reference implementation: https://spec.commonmark.org/dingus/?text=%5Blink%5D(file%25a)

@minrk
Copy link
Contributor

@minrk minrk commented Sep 26, 2018

I think the marked spec is that certain characters must be escaped (space, percent, paren), while others may be escaped.

So the following examples:

  • [has space.ipynb](has space.ipynb) invalid
  • [has space.ipynb](has%20space.ipynb)
  • [ünicode.ipynb](ünicode.ipynb) (valid and escapes to %C3%BCnicode.ipynb)
  • [ünicode.ipynb](%C3%BCnicode.ipynb) (this and above produce the exact same link target)

I wouldn't worry too much about an edge case like % not as part of an escape, since I'd call that an invalid URL. marked and commonjs handle it being invalid differently, but I think it's still not valid.

The more severe issue in jupyterlab right now is that correct markdown links don't resolve because they get double-escaped. I think it's entirely impossible to link to files with spaces in the name with jupyterlab right now, and correctly escaped URLs have never resolved properly as far as I can tell.

@minrk
Copy link
Contributor

@minrk minrk commented Sep 26, 2018

An edge case that shows the ambiguity: You want to link to a file that has actual name has%25percent.ipynb. There is only one correct way to link to this file: has%2525percent.ipynb, because has%25percent.ipynb actually means has%percent.ipynb. So I think always calling decodeURI on the path from marked is the right thing to do, because it gets the right answer in every case above:

  • already escaped (decodes to the right path)
  • not escaped (decode does nothing, path stays correct)
  • not escaped but contains escape sequences (valid but incorrect, the user has specified the wrong URL according to the markdown spec)

@minrk
Copy link
Contributor

@minrk minrk commented Sep 26, 2018

I believe #5383 fixes this

@blink1073 blink1073 mentioned this issue Sep 28, 2018
31 tasks
@lock lock bot locked as resolved and limited conversation to collaborators Aug 8, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

6 participants