Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trying to specify doc-open-url results in a useless F1 webpage #1798

Open
blerner opened this issue Sep 6, 2017 · 8 comments
Open

Trying to specify doc-open-url results in a useless F1 webpage #1798

blerner opened this issue Sep 6, 2017 · 8 comments

Comments

@blerner
Copy link

blerner commented Sep 6, 2017

In the past, because our lab machines have DrRacket installed virtually, we've been using a hack to redirect the F1 help-search page. @elibarzilay revised that hack into a supported mechanism, so now we can simply say

raco pkg config --set -i doc-open-url http://docs.racket-lang.org/

and F1 should simply work. (It used to be a different URL, http://download.racket-lang.org/docs/<version>/html/, apparently, but I'm guessing this is the new correct one...) Trouble is, if you actually point DrRacket at this URL, you get no results, because the context is set to, e.g. Intermediate+Student+Language -- note the plusses. @stchang and I tracked this behavior down today, and we unraveled the following:

  • /usr/share/racket/pkgs/scribble-lib/help/search.rkt (which I can't find in the repo, so I can't link to it directly) defines perform-search, which builds up a uri-encoded query string containing the context and the search term.
  • It then passes that string into send-main-page, which runs (url-query (string->uri (format "q?~a" query))), to parse that string back into a url structure, then peel off the query portion of it as an alist.
  • It appends that alist with any existing query from the doc-open-url, and builds up a new url which it then converts to a string via url->string and sends over the network.

Now for the tricky bits:

  • The initial context includes things like L: Intermediate Student Language, with spaces. Those spaces get uri-encoded to %20.
  • The various components of the query string get concatenated with +.
  • Then string->uri decodes the whole thing back to an alist whose entries are strings with spaces.
  • Then the final url->string encodes the query string using alist->form-urlencoded which turns the spaces in the strings into +, and then concatenates the various query components with & (I think; it might be ;).
  • Finally, the server seems to choose not to translate the + back to spaces, despite the warning in the code (uri-codec.rkt):
;; NB: RFC 2396 supersedes RFC 1738.

;; This differs slightly from the straight encoding in RFC 2396 in
;; that `+' is allowed, and represents a space.  We follow this
;; convention, encoding a space as `+' and decoding `+' as a space.
;; There appear to be some brain-dead decoders on the web, so we also
;; encode `!', `~', `'', `(' and ')' using their hex representation.
;; This is the same choice as made by the Java URLEncoder.

;; Draws inspiration from encode-decode.scm by Kurt Normark and a code
;; sample provided by Eli Barzilay

Presumably, this is because identifiers like scene+line need their pluses intact. But it leads to a broken context on the search page, and no results get found.

So it seems like one or more of the following things are going wrong here:

  • send-main-page shouldn't tear apart its query string, or it should do it properly, or something
  • url-string shouldn't be using alist->form-urlencoded, or alist->form-urlencoded shouldn't use form-url-encode, to handle spaces in the provided arguments
  • And, the server needs to correctly handle + characters in its query strings and/or it needs to conspire better with send-main-page to encode +s in Racket identifier names so that +s in query strings can be unambiguously decoded.

Note: This doesn't affect local installations of DrRacket, because those send F1 pages to a local file:// URL, and that control path doesn't muck with the query string at all.

I don't think this can be fixed purely server-side, since it affects code in DrRacket too, so this fix won't make it into 6.10. But if there's a patch we can try, we can probably hack that into our deployment and try it out...

@stchang
Copy link
Contributor

stchang commented Sep 6, 2017

Here's the start of the referenced code: https://github.com/racket/scribble/blob/master/scribble-lib/help/search.rkt#L63

@mflatt
Copy link
Member

mflatt commented Sep 8, 2017

As far as I can tell, the part that's going wrong is on the server side, although it's JavaScript sent back to the browser. The URL is parsed using string operations, and a "label" query is converted using unescape in JavaScript, which does not decode + into a space.

To avoid the immediate problem, I see advice like
https://stackoverflow.com/questions/12042592/decoding-url-parameters-with-javascript,
but surely that's not the right approach.

The deeper problem is that the JavaScript code shouldn't try to do its own parsing of a URL string. The obviously better choice is to use the URL class in JavaScript... but apparently URL is new and not always supported by browsers???

So, it seems like a stupid question, but: How do I correctly and portably parse a URL in JavaScript to extract a query value?

@samth
Copy link
Sponsor Member

samth commented Sep 8, 2017

This suggests that you can use URL with high probability: http://caniuse.com/#feat=url

@samth
Copy link
Sponsor Member

samth commented Sep 8, 2017

And if you want a backup solution for IE, here's a polyfill: https://github.com/webcomponents/URL

@jackfirth
Copy link
Sponsor Contributor

jackfirth commented Sep 8, 2017

Part of the cause here is that the HTTP URI scheme RFC does not standardize any format whatsoever for the query string; it can basically be any blob of characters. The foo=1&bar=2 syntax is purely by convention of (nearly all) implementations. Many servers and clients have slightly different interpretations and formatting rules with no clear indication which set of rules is "correct".

@mflatt
Copy link
Member

mflatt commented Sep 8, 2017

@blerner I've updated docs.racket-lang.org. Does using "F1" in DrRacket now work?

@blerner
Copy link
Author

blerner commented Sep 8, 2017

Looks like it helps: F1 on scene+line yields the URL http://docs.racket-lang.org/search/search-context.html?q=scene%2Bline&hq=O%3A%7B+L%3Alang%2Fhtdp-beginner+T%3Ateachpack+T%3Apicturing-programs+%7D&label=Beginning+Student which then seems to work properly -- at least, on Linux, in Chromium and in Firefox. I can't test on the lab machines right now, because I don't have access to changing their config myself, but it should suffice...

Thanks!

mflatt added a commit that referenced this issue Sep 10, 2017
Use the `URL` class in JavaScript instead of manually parsing a URL
string.

Related to #1798
@blerner
Copy link
Author

blerner commented Sep 13, 2017

Ah, it's not quite fixed. The resulting page doesn't work in IE. The relevant JS code is

if (location.search.length > 0) {
  var u = new URL(location);
  var newsearch = "";
  for(var key of u.searchParams.keys()) { // <<<=== this line
    var val = u.searchParams.get(key);
    // an empty "hq=" can be used to clear the cookie
    if (key == "hq") {
      SetCookie("PLT_ContextQuery", val);
    } else if (key == "label") {
      SetCookie("PLT_ContextQueryLabel", val);
    } else {
      if (newsearch == "") newsearch = "?";
      newsearch = newsearch + "&" + key + "=" + encodeURIComponent(val);
    }
  }
  // localtion.replace => jump without leaving the current page in the history
  // (the new url uses "index.html" and the new search part)
  location.replace(location.href.replace(/\/[^\/?#]*[?][^#]*/,
                                         "/index.html" + newsearch));
} else {
  // no parameters found? just jump to the search page...
  location.href = "index.html";
}

I'm pretty sure IE doesn't support for-of loops. You'd need to use the wordier for-in loop with a hasOwnProperty check...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants