New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New math engine SsKaTeX #455

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
2 participants
@ccorn
Contributor

ccorn commented Sep 21, 2017

Introduction

The branch de.1tein.ccorn.sskatex contains a new math-to-HTML/CSS/MathML engine: SsKaTeX, meaning server-side KaTeX, as opposed to client-side KaTeX. It eliminates the need (and the flexibility, and the overhead) for client-side math-rendering Javascript.

Consider this a lightweight alternative to Mathjax-Node. SsKaTeX uses KaTeX's katex.min.js (not included), interpreted by the Duktape JS engine instead of MathJax and Node.js.

In contrast to Mathjax-Node, you do not need a NodeJS installation (whose dependency V8 is unavailable for my apple-darwin9-powerpc64 platform), and no inter-process communication is needed because the JS is handled by the Duktape library via the Ruby bindings provided by the duktape gem. Javascript execution context initialization is done only once. As a result, the performance is reasonable.

What you need

On the kramdown side:

  • Ruby version 1.9 or later, mainly for the Unicode regexps used,
  • Ruby gem Duktape,
  • katex.min.js from KaTeX.

All these requirements need to be met only if SsKaTeX is actually used.

SsKaTeX quickstart configuration:

:math_engine: sskatex
:math_engine_opts: 
  katexjs: 'path-to-katex/katex.min.js'

In the HTML templates:

  • References to KaTeX's CSS and Fonts
  • No Javascript necessary!

On the actual server:

  • KaTeX's CSS and Fonts (assuming you mirror those)
  • No more Javascript!

All that KaTeX stuff is conveniently packaged in a tarball available at the KaTeX releases page.

There are some more configuration options; for the details I refer you to the documentation.

Things done

  • Documentation (rake doc), both for sources (RDoc) and for Webgen.
  • Tests. Only the most basic ones, but rake test likes them.
  • .travis.yml updated, so your CI should install the duktape gem and download katex.min.js into a katex folder and run the tests automatically.

Things to consider

  • Why not use execjs instead of hard-coding duktape? Because

    • duktape is lightweight and known to work (currently);
    • the automatic interpreter selection by execjs could accidentally break your kramdown processing once it finds another shiny Javascript interpreter that happens not to work with KaTeX.

    However, if there are downsides to this approach, replacing duktape with execjs should be straightforward, as the usage is similar.

  • Encoding:

    • I implicitly assume that in SsKaTeX::call(converter, el, opts) the el.value is encoded in UTF-8. Otherwise, an extra .encode('UTF-8') might be necessary.
    • I do some quoting of non-printing and non-ASCII characters in JS string literals. Maybe there are more elegant ways to do that in Ruby than I have used in my function jsquote.
  • Semantics:

    • The SsKaTeX::AVAILABLE constant being true is not sufficient for availability: It mostly checks for the presence of duktape. One would also have to test for the presence of katex.min.js, but its location is configuration-dependent, and the module does not (seem to) have access to such configuration at initialization time.

Looking forward to your feedback.

@ccorn

This comment has been minimized.

Contributor

ccorn commented Sep 22, 2017

On branch de.1tein.ccorn.sskatex.execjs, I have prepared an ExecJS-based version. This might enable SsKaTeX use under JRuby which cannot install the Duktape gem. It might be interesting to test this with the CI's Node.js installation as well as with other engine bindings like therubyracer/therubyrhino which I do not have available.

In case of interest, I could merge that branch here, or create another pull request, although that sounds like a bad idea to me.

The Caveat still applies: The engine flexibility of ExecJS increases the likelihood of malfunctioning configurations and corresponding ticket filings. I'd rather prefer something specific that is known to work.

@ccorn

This comment has been minimized.

Contributor

ccorn commented Sep 22, 2017

By the way, private functions of SsKaTeX are documented too. rdoc -a -o htmldoc/rdoc lib should enable you to browse the code conveniently.

@gettalong gettalong self-assigned this Sep 22, 2017

@gettalong

This comment has been minimized.

Owner

gettalong commented Sep 22, 2017

Thank you for you pull request!

I'm currently rather busy so I had just a quick look - here are some of the more obvious things I noticed:

  • Ruby uses snake_case for method and variable names
  • No need to test for Ruby < 2.0 - since the last release kramdown only support Ruby 2.x
  • The commits need to be cleaned up for a merge, either into one big or a few cohesive ones, getting rid of the "minor cleanup" things and such
  • Don't add the downloaded katex files to .gitignore, this is not useful in the general case
  • The JS files should live in data/kramdown/sskatex/, not somewhere in the lib/ directory

As for duktape vs execjs: I don't know either but if duktape is self-contained, with less dependencies, I would go for it.

I will have a more in-depth look when I have some time in the coming weeks.

@ccorn

This comment has been minimized.

Contributor

ccorn commented Sep 23, 2017

New commits addressing the following:

  • Ruby uses snake_case for method and variable names
  • No need to test for Ruby < 2.0 - since the last release kramdown only support Ruby 2.x
  • [...]
  • Don't add the downloaded katex files to .gitignore, this is not useful in the general case
  • The JS files should live in data/kramdown/sskatex/, not somewhere in the lib/ directory

I am used to contributing diffs, not entire branches, so I knew I would mess this up:

  • The commits need to be cleaned up for a merge, either into one big or a few cohesive ones, getting rid of the "minor cleanup" things and such

Not sure how to repair this without dents in the Github workflow. Rebasing seems out of the question because this branch is already public. I could transfer the changes summarily to a new branch, close this pull request and start a new one.

As for duktape vs execjs: I don't know either but if duktape is self-contained, with less dependencies, I would go for it.

Installing Duktape needs not much more than a C99 compiler and a version of Ruby that can load native object files. Even my old PowerMac G5 can install Duktape (as opposed to V8, Node.js, and all that). However, your CI has demonstrated that JRuby cannot install Duktape.

With ExecJS that should not be an issue. It would even use Node.js, so your CI tests should pass without additional gems even for JRuby (not tested). ExecJS is described as being able to use other engines such as therubyracer or therubyrhino, and it selects one of the available engines automatically. The list of supported engines includes Duktape, so the ExecJS-based approach can be seen as a generalization. It is certainly more flexible. Which might mean: more tickets about engines no one here has ever tested.

There should be a way to preempt ExecJS's automatic JS engine selection on startup and make it user-configurable instead. This way, users could stick to a configuration known to work. Until that is the case with ExecJS, I'd favor Duktape.

@gettalong

This comment has been minimized.

Owner

gettalong commented Sep 23, 2017

Not sure how to repair this without dents in the Github workflow. Rebasing seems out of the question because this branch is already public. I could transfer the changes summarily to a new branch, close this pull request and start a new one.

No, that's not necessary. Just work on your branch but rebase, squash, etc. and then force push. After all: This is just a branch in your repo and although public there is no need to be conservative in this case. So please don't open a new pull request - thanks!

@ccorn

This comment has been minimized.

Contributor

ccorn commented Sep 23, 2017

No, that's not necessary. Just work on your branch but rebase, squash, etc. and then force push. After all: This is just a branch in your repo and although public there is no need to be conservative in this case. So please don't open a new pull request - thanks!

Radically squashed now. The CI failed, but that isnpm install mathjax-node failing because the machine could not download or verify mathjax-2.7.2.tgz. To me, this appears to be temporary.

@ccorn

This comment has been minimized.

Contributor

ccorn commented Sep 23, 2017

Due to repeated rebasing and squashing, there are some commits referenced here that are no longer part of a named branch. I have deleted them locally (their SHA1 ids no longer exist), but Github keeps them even with git push --force --all --prune, probably because they are associated (via commit message) to an open pull request. Sigh. Please pay no attention to commits introduced here with the message

ccorn added a commit to ccorn/kramdown that referenced this pull request

Mission accomplished: No dents in the workflow (but noise instead).

@ccorn

This comment has been minimized.

Contributor

ccorn commented Sep 23, 2017

I find that the engine selection (and the associated availability tests) by ExecJS can be overridden with an environment variable EXECJS_RUNTIME and set later via assignment to ExecJS.runtime.
In particular, one can

ENV['EXECJS_RUNTIME'] ||= 'Disabled'
require 'execjs'
[...]
ExecJS.runtime = ExecJS::Runtimes::Duktape

This seems to encourage an ExecJS-based approach.
I will work on that branch a bit.
If it turns out to be satisfactory, I will add a commit here.

@ccorn

This comment has been minimized.

Contributor

ccorn commented Sep 24, 2017

I have now merged an ExecJS-based version.

The results are encouraging. ExecJS found three JS engines on my old powerpc64-apple-darwin9 platform, including the gem Duktape of course, but also the MacOS-specific JavaScriptCore and an ancient MacPorts-based Mozilla SpiderMonkey 1.8.5. And each engine worked. Duktape was automatically preferred and the fastest, since it does not need an external process.

For better control, I have introduced an additional option for math_engine_opts, called jsrun, which allows the user to override the engine autodetection and specify the Javascript engine to use. If it is not set, the usual ExecJS criteria are followed: If an environment variable EXECJS_RUNTIME is set, its contents will be used, otherwise autodetection takes place.

Note that there is no explicit dependency on Duktape any longer. For the CI, Node should be autodetected. Serious SsKaTeX users and testers should try RubyRacer/MiniRacer, RubyRhino (for JRuby), or Duktape because those avoid external processes and are therefore significantly faster. Of these, Duktape is most modest in its requirements and the only option I have tried.

@ccorn

This comment has been minimized.

Contributor

ccorn commented Sep 25, 2017

I have done a performance comparison of ExecJS (with Duktape engine) vs. Duktape alone (the previous commit).

What I expected was that ExecJS would be slower because of the extra indirection. Instead, runtimes turned out to be about 10% shorter.

I suspect that to be so because ExecJS adds actual "compile this" hints to the context creation routine ExecJS.compile.

Edit: After inspection of the ExecJS implementation, the only optimization I have found is the Duktape option complex_object: nil. Anyway, however they have done it, it works.

Edit: After adding the same optimization to the Duktape-only branch, it is now 3% faster than ExecJS. So the overhead incurred from the ExecJS layer is about 3%. This is small enough to be outweighed by the flexibility of ExecJS, I'd say.

@gettalong

Thanks for your pull request! I have had time today to review your changes and provide some feedback.

.gitignore Outdated
@@ -1,3 +1,4 @@
htmldoc
katex

This comment has been minimized.

@gettalong

gettalong Oct 30, 2017

Owner

Why is this line here?

This comment has been minimized.

@ccorn

ccorn Oct 31, 2017

Contributor

The intention was to avoid recursive file listings by git status when katex.tar.gz has been downloaded und unpacked to get SsKaTeX covered in the tests. Turns out that git status lists only katex if that line is missing, no matter how big the tree below that is (in contrast to e.g. monotone's mtn list unknown), so I'll remove it.

@@ -0,0 +1,8 @@
#! /bin/bash

This comment has been minimized.

@gettalong

gettalong Oct 30, 2017

Owner

Please remove this file, it is not a binary in the sense that it needs to be available to the end user.

This comment has been minimized.

@ccorn

ccorn Oct 31, 2017

Contributor

It is a developer tool, and therefore it is not installed. It is used when updating KaTeX which tends to vary the generated HTML+MathML with each version. I'd move it to another subdirectory, if you find bin misleading. tools perhaps? Or the existing test? Or test/bin?

This comment has been minimized.

@gettalong

gettalong Oct 31, 2017

Owner

The bin/ directory is for application binaries only, so not development tools. Best way would be to incorporate this into the Rakefile under the dev namespace.

@@ -0,0 +1,31 @@
// https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/assign#Polyfill

This comment has been minimized.

@gettalong

gettalong Oct 30, 2017

Owner

This file can only be included in kramdown if its license is compatible. Please make sure that it is, include the necessary statements as comment in this file and also place a note in COPYING.

This comment has been minimized.

@gettalong

gettalong Oct 30, 2017

Owner

Same procedure for escape_nonascii_html.js unless this is your code.

This comment has been minimized.

@ccorn

ccorn Oct 31, 2017

Contributor

Will do. The polyfill for object_assign.js is from MDN and seems to be covered by this copyright spec. The history of the MDN entry dates it to 2014 or later. Therefore the following paragraph seems to apply:

Code samples added on or after August 20, 2010 are in the public domain. No licensing notice is necessary, but if you need one, you can use: "Any copyright is dedicated to the Public Domain. http://creativecommons.org/publicdomain/zero/1.0/".

The code for escape_nonascii_html.js is by me.

This comment has been minimized.

@gettalong

gettalong Oct 31, 2017

Owner

Thanks for investigating! Then please add the information that this file is in the public domain to the COPYING file.

This comment has been minimized.

@ccorn

ccorn Nov 3, 2017

Contributor

Turns out that we can rewrite tex_to_html.js such that object_assign.js is not needed and the resulting Javascript looks slightly better.

and [SsKaTeX](../math_engine/sskatex.html).
Each one requires a Javascript engine installed where kramdown runs,
in order to perform the precompilation.
The resulting pages still require CSS and fonts, but no Javascript anymore.

This comment has been minimized.

@gettalong

gettalong Oct 30, 2017

Owner

Please wrap the paragraphs with a line length of 100 characters.

This comment has been minimized.

@gettalong

gettalong Oct 30, 2017

Owner

Same for doc/math_engine/sskatex.page.

A typical SsKaTeX configuration looks like this:
~~~ yaml
math_engine: sskatex

This comment has been minimized.

@gettalong

gettalong Oct 30, 2017

Owner

Why does the line start with two spaces?

This comment has been minimized.

@ccorn

ccorn Oct 31, 2017

Contributor

So that one can paste it below an unindented math_engine_opts:. As this seems to be non-obvious, I'll unindent the fragments.
Update: Not math_engine_opts: here, but the kramdown: in Jekyll's _config.yml. Yet another reason for unindenting.

This comment has been minimized.

@gettalong

gettalong Oct 31, 2017

Owner

Ah 😄 Indeed, I hadn't thought of that. I agree, it is better to unindent the fragments.

# Given a +math_engine_opts+ dictionary,
# return a new JS engine context, initialized with the JS helper files,
# and with a JS object +katexopts+ containing general KaTeX options.
def newcontext(config)

This comment has been minimized.

@gettalong

gettalong Oct 30, 2017

Owner

Should be called new_context.

'Disabled').to_s.to_sym
ExecJS.runtime = JSRUN_FROMSYM[jsrun]
if config[:verbose]
warn "Available JS runtimes: #{jsruns_available.join(', ')}"

This comment has been minimized.

@gettalong

gettalong Oct 30, 2017

Owner

Don't use warn, use converter.warning.

This comment has been minimized.

@ccorn

ccorn Oct 31, 2017

Contributor

Seems better (will try), unless such use implies actual warning semantics. The use by SsKaTeX is rather meant for logging under the verbose or debug options.

This comment has been minimized.

@gettalong

gettalong Nov 2, 2017

Owner

Yeah, I get that. However, if one consciously enables verbose output it is okay to use the warning facility. This way the output will be correctly shown by whatever tool uses kramdown.

As for debug: That's different since it should only ever be used for development purposes. So using warn would be fine by me.

This comment has been minimized.

@ccorn

ccorn Nov 3, 2017

Contributor

I have now tried using converter.warning for the verbose option. For that, I had to reorganize SsKaTeX's private interfaces a bit (passing converter instead of just config). Unfortunately, a jekyll build now does not output those messages at all. It seems I have to go back to warn directly.
However, I'd keep the changed interfaces, just in case we need converter.warning for other things.

# Return a properly initialized JS engine context,
# depending on the given _config_. Cache its reference for reuse.
def jscontext(config)

This comment has been minimized.

@gettalong

gettalong Oct 30, 2017

Owner

Should be called js_context.

# This should really be provided by +ExecJS::Runtimes+:
# A list of available JS engines, as symbols, in the order of preference.
def jsruns_available

This comment has been minimized.

@gettalong

gettalong Oct 30, 2017

Owner

Shouldn't this be called available_jsruns since it returns the available JS runtimes?

# Return a properly initialized JS engine context,
# depending on the given _config_. Cache its reference for reuse.
def jscontext(config)
JSCTX[config] ||= newcontext(config)

This comment has been minimized.

@gettalong

gettalong Oct 30, 2017

Owner

I think that this could potentially be used for a denial of service attack since if kramdown is used by server application and parses client input, a client could modify math_engine_opts continuously to create new entries here.

How much memory does one entry take? How much time overhead is this compilation step?

This comment has been minimized.

@ccorn

ccorn Oct 31, 2017

Contributor

Caching the context is necessary because the MathEngine API does not seem to provide a method for just the engine initialization and processing of options. I have no figures yet, but I remember that the speedup compared to re-initialization for each formula was substantial enough that I would not go back.

In a typical Jekyll setup, kramdown: math_engine_opts: are configured globally (in _config.yaml), so the context initialization gets done only once per jekyll build (or jekyll serve). Therefore it is shared across all kramdown source files, which is particularly beneficial for large sites.

Regarding overall security: If the user controls part of the filesystem (e.g. can upload files) and can tweak math_engine_opts without restriction, the user could point katexjs (or any of the libs) to a file containing a definition of katex.renderToString (or our entry function tex_to_html) with an endless loop or more nefarious intent. The Javascript in there could do anything, with the privileges of the kramdown process. So, unless user-tweakable kramdown options can be restricted for such a service, the gates to hell are open. What threat model do we have?

Attack scenarios aside: Indeed a long-running process should have a way of forgetting no-longer used configs, otherwise memory usage would be unbounded.
A minimalistic approach would be to cache only the last used config. This already works for Jekyll.
A better approach would be to clear the context cache when it takes more than a prescribed limit of memory. I'd work in that direction.

This comment has been minimized.

@ccorn

ccorn Oct 31, 2017

Contributor

Some benchmarks for a small site with 178 formulae, JS engine Duktape and default options:

  • 0.41s for newcontext
  • avg 0.018s per compile_tex_math (excluding jscontext)
  • Many inline formulae take less than 3ms.

Therefore, caching at least one context is necessary.

This comment has been minimized.

@ccorn

ccorn Nov 1, 2017

Contributor

It occurs to me that recursive computation of context object sizes would be mostly unnecessary overhead. Instead, the configuration cache could be cleared if the number of its key-value pairs is about to exceed some maximum, say 10.
I'd even make that limit configurable via math_engine_opts because not doing so does not really improve security as long as users can (and need to) specify arbitrary katexjs.

This comment has been minimized.

@gettalong

gettalong Nov 2, 2017

Owner

Thanks for investigating!

Since caching is necessary, instead of simple hash, it would probably be better than to use an LRU data structure, like this https://github.com/gettalong/hexapdf/blob/master/lib/hexapdf/utils/lru_cache.rb.

As for making the number of cached items configurable: Since the math_engine_opts can change with every input document, I don't see the use of making this configurable. I'd say that caching the last 10 context objects would just be fine.

This comment has been minimized.

@ccorn

ccorn Nov 2, 2017

Contributor

Since caching is necessary, instead of simple hash, it would probably be better than to use an LRU data structure, like this https://github.com/gettalong/hexapdf/blob/master/lib/hexapdf/utils/lru_cache.rb.

Cool. That one is Affero GPL, so I suppose that code needs a separate notice in COPYING, unless you, the author, decide otherwise for kramdown. (I'd rather copy that module instead of adding a dependency on hexapdf.)

This comment has been minimized.

@ccorn

ccorn Nov 2, 2017

Contributor

I don't see the use of making this configurable. I'd say that caching the last 10 context objects would just be fine.

Agreed because making that configurable exposes an implementation detail which we may want to change without having to change the user documentation.

@ccorn

This comment has been minimized.

Contributor

ccorn commented Nov 3, 2017

I have committed changes according to the above review and discussion. These changes have now been squashed to two commits:

  • The first commit contains actually new code (new tasks in the Rakefile to check for KaTeX and to update SsKaTeX test reference outputs, obsoleting my previous developer bash script). Needs review.
  • The second comprises the remainder of the changes, mostly moving code around, renaming and reformatting, but also slight changes in the code because of internal interface changes (passing converter instead of just config).
  • The second commit additionally contains the LRUCache code adapted from HexaPDF. I have tentatively changed its copyright comment to be consistent with the remainder of kramdown, as the author is the same, but that needs approval of course.
@ccorn

This comment has been minimized.

Contributor

ccorn commented Nov 4, 2017

The CI failed again, not due to SsKaTeX but apparently because of mathjax-node resp. its dependencies becoming ever more demanding.

@ccorn

This comment has been minimized.

Contributor

ccorn commented Nov 4, 2017

With those changes (using mathjax-node-cli instead of mathjax-node, increasing Node.js version to at least 4.5) rake test works again even where Mathjax-Node thinks it is available. I should probably rebase that on master and issue a pull request, but I do not want to get sidetracked, and working on a platform with Node.js is hard for me.

@gettalong

This comment has been minimized.

Owner

gettalong commented Nov 5, 2017

I so regret having this pile of [use your imagination here, you may use "reprocessed food stuff"] called NodeJS as a test dependency... I have used your instructions to update the mathjax-node-cli integration, lights are green again on the master branch.

Thank you very much for your latest work!! I have looked through the files and only have two remaining requests:

  • As discussed please use converter.warning instead of warn when showing verbose messages.
  • Squash your commits into one and rebase on master.

I will then merge your changes and release a new version next week.

@gettalong

This comment has been minimized.

Owner

gettalong commented Nov 5, 2017

I forgot:

  • Bringing the LRUCache into kramdown makes it MIT, fine by me.
  • Even if Jekyll doesn't display the warnings, using converter.warning is the best way to go forward regarding the verbose messages (debug messages are a different thing because they are only meant for developers). The warnings kramdown outputs are sometimes very useful in determining whether there was some mistake. webgen naturally uses them and it's rather easy to spot dangling link references this way. I have opened an issue for Jekyll.
  • Another possibility: We could introduce a new kramdown option 'debug' that makes kramdown output warnings as warn messages. This would benefit all tools (although in a possibly worse way) that use kramdown but don't handle the generated warnings.
@ccorn

This comment has been minimized.

Contributor

ccorn commented Nov 6, 2017

I have rebased on master and squashed the previous commits into one; one new commit I have not squashed yet because that makes its changes easier to review; when we are done, I will do the final squashing.
Squashed everything now. Using a separate branch for Latest changes to ease review.

  • SsKaTeX's logging code refactored, looks nicer now.
  • verbose uses converter.warning now. That's not too bad with current Jekyll because debug still uses warn and in the latest commit includes the verbose-level messages as well (which makes sense because the channels are nominally different).
  • verbose now also logs the contents of the katexopts sub-dictionary. Interestingly, the key duplication (as strings and as symbols) in the configuration makes .to_json output duplicate key-value pairs. Of course the JS parser on the receiving side would deduplicate those, but it looked bad in the logs, therefore I added a nondestructive key deduplication function.
  • Sanity checks would now raise Kramdown::Error instead of RuntimeError
  • Updated the affected parts of the documentation and tried to reduce stylistic inconsistencies by marking up option keywords as bold (like the corresponding list items).
  • Thanks for the Jekyll issue. They are already working on it.
New math-to-HTML/CSS/MathML engine: SsKaTeX
The name SsKaTeX means server-side KaTeX, as opposed to client-side KaTeX.
It eliminates the need (and the flexibility, and the overhead)
for client-side math-rendering Javascript.

Consider this a lightweight alternative to Mathjax-Node.
SsKaTeX uses KaTeX's katex.min.js (not included) instead of MathJax
and uses a configurable or auto-selected JS engine (via ExecJS),
e.g. Duktape, rather than depending strictly on Node.js.

Javascript execution context initialization is done only once.
With some JS engines, no inter-process communication is needed
because the Javascript is handled by library functions.
As a result, the performance is reasonable.
@gettalong

This comment has been minimized.

Owner

gettalong commented Nov 12, 2017

Thanks for your changes!

I have cherry-picked your commit and modified it a bit:

  • Comments are now wrapped at 100 characters per line
  • Removed the Rakefile task for checking for KaTeX since writing ls katex effectively does the same
  • The AVAILABLE constant now also takes the available Javascript runtimes into account, i.e. only when there is at least one available will SsKaTeX be available
  • Removal of the private_constant declarations (I don't think that they are really useful in this case)
  • Some small formatting changes
  • Fixed the changes to test/test_files.rb since many tests weren't run anymore

The changes are already on Github and Travis is green.

Thanks again for your contribution!

@gettalong gettalong closed this Nov 12, 2017

@ccorn

This comment has been minimized.

Contributor

ccorn commented Nov 12, 2017

  • Removed the Rakefile task for checking for KaTeX since writing ls katex effectively does the same

The removed task is (still) a dependency for the remaining Rakefile task:

    task update_katex_tests: [:test_katexjs]

I added it to have something that shows developers how to actually get KaTeX. (They should not need to peek into .travis.yml for that.) If you remove the test and katex/katex.min.js does not exist, you get wrong reference output (the MathJax markup). Thereafter tests will fail when KaTeX becomes available. Therefore that check is important.

  • The AVAILABLE constant now also takes the available Javascript runtimes into account, i.e. only when there is at least one available will SsKaTeX be available

I omitted that deliberately because the availability tests for AVAILABLE might increase startup times significantly (they check for external binaries, for example), even if SsKaTeX is not actually used. On my platform, there is not much of a difference (10ms), but that may be because not many big-overhead engines are available there. This might be different on other platforms or when ExecJS adds more engines. So I decided to make the availability list only when given the verbose option at the time of first use. There should be no need to pay for something that is not used.
At least one might replace the .select construct with a .find so it terminates with the first engine found. That is along the lines of ExecJS's .best_available.

  • Removal of the private_constant declarations (I don't think that they are really useful in this case)

As you like. The intent was to make clear that the constants are not considered part of a stable interface and might disappear in the future, so other modules should better not rely on them. If in doubt, I make implementation details private.

  • Fixed the changes to test/test_files.rb since many tests weren't run anymore

Ouch. Awfully sorry for that. Your fix gives the intended semantics. Thanks.

@ccorn

This comment has been minimized.

Contributor

ccorn commented Nov 12, 2017

I might add that there is a subtlety with ExecJS runtime availability. The Disabled runtime is always available, but not included in the runtimes list. Configuring with jsrun: Disabled would therefore cause a runtime error on first use, as documented. With the availability tests at loading time, SsKaTeX without actual ExecJS runtime would simply become unavailable even if Disabled had been configured, and then math fragments would not raise runtime errors.

@ccorn

This comment has been minimized.

Contributor

ccorn commented Nov 12, 2017

To summarize, I'd propose the following conservative changes, although I understand the motivation for the current state:

  • Add the proposed Rakefile target :test_katexjs or at least its functionality as needed for :update_katex_tests
  • Defer JS engine availability tests until the relevant configuration is available. (The whole point of setting Disabled prior to loading ExecJS was to avoid a bunch of file system accesses and possibly other unrequested ExecJS automatisms when it is not even certain whether SsKaTeX is used at all, or what engine the math_engine_opts would request.)
diff --git a/Rakefile b/Rakefile
index 44e22d3..d30fbf3 100644
--- a/Rakefile
+++ b/Rakefile
@@ -267,6 +267,18 @@ EOF
     puts "Look through the above mentioned files and correct all problems" if inserted
   end
 
+  desc "Check for KaTeX availability"
+  task :test_katexjs do
+    katexjs = 'katex/katex.min.js'
+    raise (<<TKJ) unless File.exists? katexjs
+Cannot find file '#{katexjs}'.
+You need to download KaTeX e.g. from https://github.com/Khan/KaTeX/releases/
+and extract at least '#{katexjs}'.
+Alternatively, if you have a copy of KaTeX unpacked somewhere else,
+you can create a symbolic link 'katex' pointing to that KaTeX directory.
+TKJ
+  end
+
   desc "Update kramdown SsKaTeX test reference outputs"
   task update_katex_tests: [:test_katexjs] do
     # Not framed in terms of rake file tasks to prevent accidental overwrites.
diff --git a/lib/kramdown/converter/math_engine/sskatex.rb b/lib/kramdown/converter/math_engine/sskatex.rb
index bb30dfd..0e5a98f 100644
--- a/lib/kramdown/converter/math_engine/sskatex.rb
+++ b/lib/kramdown/converter/math_engine/sskatex.rb
@@ -25,7 +25,8 @@ module Kramdown::Converter::MathEngine
       require 'json'
       ENV['EXECJS_RUNTIME'] = 'Disabled' # Defer automatic JS engine selection
       require 'execjs'
-      ExecJS::Runtimes.runtimes.select(&:available?).size > 0
+      # No test for any JS engine availability here; specifics are config-dependent anyway
+      true
     rescue LoadError
       false
     ensure
@gettalong

This comment has been minimized.

Owner

gettalong commented Nov 12, 2017

Thanks for your comments. I agree with the second point concerning the availability of the runtimes.

And thanks for pointing out the dangling task dependency. However, the removed task does only check the availability of the katex JS file. Even if that is available but execjs is not, it will provide an invalid output. Since only those who who develop kramdown will make use of this task, they will know how to install the required dependencies. Additionally, there is a remark in the README file that points to the travis configuration file for information on regarding the installation of the dependencies. So I'm inclined to not include that KaTeX file check.

@ccorn

This comment has been minimized.

Contributor

ccorn commented Nov 12, 2017

Thanks for the feedback. Since it is important that KaTeX availability is properly checked when updating the reference outputs, let me improve the test dependency so that it actually shows whether SsKaTeX can be used. I have renamed it to test_sskatex_deps now. Here is the revised and tested proposal for Rakefile:

diff --git a/Rakefile b/Rakefile
index 44e22d3..e36ffd1 100644
--- a/Rakefile
+++ b/Rakefile
@@ -267,8 +267,26 @@ EOF
     puts "Look through the above mentioned files and correct all problems" if inserted
   end
 
+  desc "Check for SsKaTeX availability"
+  task :test_sskatex_deps do
+    katexjs = 'katex/katex.min.js'
+    raise (<<TKJ) unless File.exists? katexjs
+Cannot find file '#{katexjs}'.
+You need to download KaTeX e.g. from https://github.com/Khan/KaTeX/releases/
+and extract at least '#{katexjs}'.
+Alternatively, if you have a copy of KaTeX unpacked somewhere else,
+you can create a symbolic link 'katex' pointing to that KaTeX directory.
+TKJ
+    html = %x{echo '$$a$$' | #{RbConfig.ruby} -Ilib bin/kramdown --math-engine sskatex}
+    raise (<<XJS) unless / class="katex"/ === html
+Some static dependency of SsKaTeX, probably the 'execjs' gem, is not available.
+If you 'gem install execjs', also make sure that some JS engine is available,
+e.g. by installing one of the gems 'duktape', 'therubyracer', or 'therubyrhino'.
+XJS
+  end
+
   desc "Update kramdown SsKaTeX test reference outputs"
-  task update_katex_tests: [:test_katexjs] do
+  task update_katex_tests: [:test_sskatex_deps] do
     # Not framed in terms of rake file tasks to prevent accidental overwrites.
     stems = ['test/testcases/block/15_math/sskatex',
              'test/testcases/span/math/sskatex']

NB: Herein, multi-line output strings have been formatted to 80 characters per line because those go to the terminal.

@ccorn

This comment has been minimized.

Contributor

ccorn commented Nov 13, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment