Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RES could use snudown compiled to JS via emscripten #452

Closed
wants to merge 1 commit into from
Closed

RES could use snudown compiled to JS via emscripten #452

wants to merge 1 commit into from

Conversation

aidanhs
Copy link
Contributor

@aidanhs aidanhs commented Sep 4, 2013

Issue to track progress of 'emsnudown', as initially discussed in #434

This issue will eventually be converted to a PR.

@mturley
Copy link

mturley commented Jun 25, 2013

What is currently used to render markdown in RES?

@aidanhs
Copy link
Contributor Author

aidanhs commented Jun 25, 2013

A manual port of snudown to js - https://github.com/gamefreak/snuownd

@aidanhs
Copy link
Contributor Author

aidanhs commented Jun 28, 2013

Initial testing and benchmarking looks promising.

EmSnudown currently passes all Snudown sanity tests (SnuOwnd fails one edge case).

SANITY
============
=== BODY:
/r/all-minus-something
=== SNUDOWN:
<p><a href="/r/all-minus-something">/r/all-minus-something</a></p>

=== EMSNUDOWN:
<p><a href="/r/all-minus-something">/r/all-minus-something</a></p>

=== SNUOWND:
<p><a href="/r/all">/r/all</a>-minus-something</p>

============
FAIL: 1
SUCCESS: 51
FAILED SANITY TESTS

Running through a few thousand random comments has EmSnudown, Snudown and SnuOwnd all producing identical results.

Benchmarking shows EmSnudown as actually being faster than SnuOwnd which I found a little surprising. I might play around with this some more and see if minifying SnuOwnd helps it. They are micro-benchmarks though and were run on a virtual machine just to see if EmSnudown would be badly slower (apparently not).

Testing initialisation speed
SnuOwnd  9.1412460804
EmSnudown  8.01900315285
Testing medium number of renders
SnuOwnd  6.8896510601
EmSnudown  5.82095980644
Testing large number of renders
SnuOwnd  5.66779303551
EmSnudown  2.73442697525

Size of a minified EmSnudown is 235KB. Size of the current unminified SnuOwnd is ~100KB. I can't actually check because closure compiler chokes on variables named 'char', but uglifyjs suggests SnuOwnd might be reducible to about 33KB.

The current status is that EmSnudown seems to work, and reasonably well. Given that RES currently (I think) only uses a single simple call to SnuOwnd it would be replacable as-is with a few line changes (and putting in the EmSnudown code instead of SnuOwnd).

Todo:

  • it'd be nice if it was a smaller file
  • @gamefreak mentioned callbacks for sanitisation
  • benchmarks
  • better tests

Code at https://github.com/aidanhs/emsnudown, tests at https://github.com/aidanhs/emsnudown/blob/79588aff539102adcc321f4634ca3f6f5d974d2d/test/test.py

@honestbleeps
Copy link
Owner

how big is the output? I thought last time we'd looked at emscripten /
snudown it was like a 2mb javascript file?

On Thu, Jun 27, 2013 at 11:08 PM, aidanhs notifications@github.com wrote:

Initial testing and benchmarking looks promising.

EmSnudown currently passes all Snudown sanity tests (SnuOwnd fails one
edge case).

SANITY

=== BODY:
/r/all-minus-something
=== SNUDOWN:

/r/all-minus-something

=== EMSNUDOWN:

/r/all-minus-something

=== SNUOWND:

/r/all-minus-something

FAIL: 1
SUCCESS: 51
FAILED SANITY TESTS

Running through a few thousand random comments has EmSnudown, Snudown and
SnuOwnd all producing identical results.

Benchmarking shows EmSnudown as actually being faster than SnuOwnd which I
found a little surprising. I might play around with this some more and see
if minifying SnuOwnd helps it. They are micro-benchmarks though and were
run on a virtual machine just to see if EmSnudown would be badly slower
(apparently not).

BENCHMARK
PREPARING
RUNNING
Testing initialisation speed
SnuOwnd 9.1412460804
EmSnudown 8.01900315285
Testing medium number of renders
SnuOwnd 6.8896510601
EmSnudown 5.82095980644
Testing large number of renders
SnuOwnd 5.66779303551
EmSnudown 2.73442697525

Size of a minified EmSnudown is 235KB. Size of the current unminified
SnuOwnd is ~100KB. I can't actually check because closure compiler chokes
on variables named 'char', but uglifyjs suggests SnuOwnd might be reducible
to about 33KB.

The current status is that EmSnudown seems to work, and reasonably well.
Given that RES currently (I think) only uses a single simple call to
SnuOwnd it would be replacable as-is with a few line changes (and putting
in the EmSnudown code instead of SnuOwnd).

Todo:

Code at https://github.com/aidanhs/emsnudown, tests at
https://github.com/aidanhs/emsnudown/blob/79588aff539102adcc321f4634ca3f6f5d974d2d/test/test.py


Reply to this email directly or view it on GitHubhttps://github.com//issues/452#issuecomment-20169468
.

@patricksnape
Copy link
Collaborator

@honestbleeps didn't he say 235kb?

​I'm not surprised its faster, the javascript engine guys have been doing alot of work on optimising this kind of code. You didn't say what browser you tested in (unless I missed it), but Firefox should be particularly good at this with OdinMonkey.

​Also, I've noticed this when minimising RES myself, char is actually a reserved keyword in javascript so it's not surprising that it jokes. Really, that should be changed.

On Fri, Jun 28, 2013 at 5:59 AM, honestbleeps notifications@github.com
wrote:

how big is the output? I thought last time we'd looked at emscripten /
snudown it was like a 2mb javascript file?
On Thu, Jun 27, 2013 at 11:08 PM, aidanhs notifications@github.com wrote:

Initial testing and benchmarking looks promising.

EmSnudown currently passes all Snudown sanity tests (SnuOwnd fails one
edge case).

SANITY

=== BODY:
/r/all-minus-something
=== SNUDOWN:

/r/all-minus-something

=== EMSNUDOWN:

/r/all-minus-something

=== SNUOWND:

/r/all-minus-something

FAIL: 1
SUCCESS: 51
FAILED SANITY TESTS

Running through a few thousand random comments has EmSnudown, Snudown and
SnuOwnd all producing identical results.

Benchmarking shows EmSnudown as actually being faster than SnuOwnd which I
found a little surprising. I might play around with this some more and see
if minifying SnuOwnd helps it. They are micro-benchmarks though and were
run on a virtual machine just to see if EmSnudown would be badly slower
(apparently not).

BENCHMARK
PREPARING
RUNNING
Testing initialisation speed
SnuOwnd 9.1412460804
EmSnudown 8.01900315285
Testing medium number of renders
SnuOwnd 6.8896510601
EmSnudown 5.82095980644
Testing large number of renders
SnuOwnd 5.66779303551
EmSnudown 2.73442697525

Size of a minified EmSnudown is 235KB. Size of the current unminified
SnuOwnd is ~100KB. I can't actually check because closure compiler chokes
on variables named 'char', but uglifyjs suggests SnuOwnd might be reducible
to about 33KB.

The current status is that EmSnudown seems to work, and reasonably well.
Given that RES currently (I think) only uses a single simple call to
SnuOwnd it would be replacable as-is with a few line changes (and putting
in the EmSnudown code instead of SnuOwnd).

Todo:

Code at https://github.com/aidanhs/emsnudown, tests at
https://github.com/aidanhs/emsnudown/blob/79588aff539102adcc321f4634ca3f6f5d974d2d/test/test.py


Reply to this email directly or view it on GitHubhttps://github.com//issues/452#issuecomment-20169468
.


Reply to this email directly or view it on GitHub:
#452 (comment)

@aidanhs
Copy link
Contributor Author

aidanhs commented Jun 28, 2013

Yup, 235KB. I'm going to look at with a view to potentially reducing this at the cost of speed...might be interesting to see the tradeoffs.

Benchmarks were run in node, which uses v8 - essentially chrome, which is pleasing because they don't even optimise asm.js yet. Asm.js aware firefox should be even faster. That said, I should do some in-browser benchmarks, particularly for Safari, Opera, IE.

@aidanhs
Copy link
Contributor Author

aidanhs commented Jun 29, 2013

Ok, I messed up with benchmarks. It did seem odd that they were faster than handwritten js (especially as v8 doesn't optimise asm.js)...I guess it's nice to know my intuition was right. Anyway:

Initialisation speed
SnuOwnd 10.1511340141
EmSnudown 21.0092520714
Medium number of renders
SnuOwnd 7.11319208145
EmSnudown 14.9687621593
Large number of renders
SnuOwnd 4.14359998703
EmSnudown 6.31308412552

As you can see, EmSnudown is actually still faster than SnuOwnd for rendering, it's just the initial setup which makes it slower (which makes a lot of sense given it needs to initialise a view of memory). Given that this initial setup can be deferred until when comment preview is actually loaded (as RES currently does with SnuOwnd), I'm not sure this is actually an issue.

As noted above, I'm not putting EmSnudown forward for inclusion yet - I just wanted to correct the benchmarks.

@gamefreak
Copy link
Collaborator

I'm confused... Wouldn't smaller numbers be faster?
Regardless of which is faster I can say that SnuOwnd doesn't really try to be efficient. I just threw around string concatenation and string slices everywhere.

And this is the sanitization function I was talking about. As you can see, it overrides the callbacks and specifies custom flags and whitelist. This would be pretty easy to do as a modification at the C end.

function sanitizeHTML(htmlStr) {
    if (!this.sanitizer) {
        var SnuOwnd = window.SnuOwnd;
        var redditCallbacks = SnuOwnd.getRedditCallbacks();
        var callbacks = SnuOwnd.createCustomCallbacks({
            paragraph: function(out, text, options){
                if (text) out.s += text.s;
            },
            autolink: redditCallbacks.autolink,
            raw_html_tag: redditCallbacks.raw_html_tag
        });
        var rendererConfig = SnuOwnd.defaultRenderState();
        rendererConfig.flags = SnuOwnd.DEFAULT_WIKI_FLAGS;
        rendererConfig.html_element_whitelist = [
            'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'span', 'div', 'code',
            'br', 'hr', 'p', 'a', 'img', 'pre', 'blockquote', 'table',
            'thead', 'tbody', 'tfoot', 'tr', 'th', 'td', 'strong', 'em',
            'i', 'b', 'u', 'ul', 'ol', 'li', 'dl', 'dt', 'dd',
            'font', 'center', 'small', 's', 'q', 'sub', 'sup', 'del'
        ];
        rendererConfig.html_attr_whitelist = [
            'href', 'title', 'src', 'alt', 'colspan',
            'rowspan', 'cellspacing', 'cellpadding', 'scope',
            'face', 'color', 'size', 'bgcolor', 'align'
        ];
        this.sanitizer = SnuOwnd.getParser({
        callbacks: callbacks,
        context: rendererConfig
        });
    }
    return this.sanitizer.render(htmlStr);
}

@aidanhs
Copy link
Contributor Author

aidanhs commented Jul 11, 2013

Sorry, I was trying to be clever by comparing the proportional speedup (i.e. rendering speed) but I've realised that 3 datapoints really isn't enough to do that. Smaller numbers are faster, which is why I was saying that the load speed of emsnudown is quite a bit slower.

Interesting, ok, in theory I should be able to just hook into the stuff that's already in the C code. Might I ask where you use this? I had a look in the RES code but can't see it (nor can I actually see anything but the default renderer being used, implying wiki rendering doesn't work).

@aidanhs
Copy link
Contributor Author

aidanhs commented Aug 11, 2013

This is stalled while I think about emscripten-core/emscripten#1382

It's not very good if the strange unicode faces people like to use don't preview correctly.

@aidanhs
Copy link
Contributor Author

aidanhs commented Aug 20, 2013

The above is solved in the emscripten unstable branch, just need it to graduate to stable...

@aidanhs
Copy link
Contributor Author

aidanhs commented Sep 1, 2013

Now waiting on a bug in the actual snudown code - reddit/snudown#54

@aidanhs
Copy link
Contributor Author

aidanhs commented Sep 1, 2013

Ok, I’m happy with the rendering. I’ll convert this issue to a PR when I get a bit of time to do the legwork and see what I need to change to fully replace SnuOwnd. Some brief notes:

  • Emsnudown has been reduced to 193KB for ‘free’ (courtesy of a more recent Emscripten).
  • I’ve run tests on the snudown test suite and on some 63000 comments. I’ve found a few different cases where snuownd doesn’t quite match up with snudown - they’re all quite minor.
  • Done more benchmarking (I can share numbers or the code is all on the repo) - apparently SnuOwnd is ~2x faster for rendering comments and EmSnudown is ~3x faster at initialisation...but it’s all a little irrelevant because both rendering and initialisation are ~millisecond operations.
  • I’ve set up a live comparison page for curiosity - http://aidanhs.github.io/emsnudown/ (holding down a key isn’t jerky at all...which kinda makes me wonder why doing it in RES is…).

@gavin19
Copy link
Collaborator

gavin19 commented Sep 1, 2013

I don't get the jerkiness unless RES is still working on the page. I just tested a ~600 comment thread and it took almost a minute to append all the save-RES/bitcoin links etc. During that time it was still jumpy, but after that it was smooth.

For the user docs link on the live page use this instead. Yours is missing the permalink part so you need to dig through the 'load comments' to find it.

@gamefreak
Copy link
Collaborator

  • I looked at your examples in the browser and I see any differences in the unicode ones other than the link one.
  • The unicode issues are most likely rooted in how JavaScript handles characters instead of bytes. emsnudown is probably making the conversion while SnuOwnd isn't. I feel like it could be more trouble than it would be worth to fix
  • The /r/all-stuff issue was something I thought I fixed. I could fix that, but if SnuOwnd is just going to be replaced, then why bother.
  • RES deliberately waits until the user stops typing to re-render the markdown.

@aidanhs
Copy link
Contributor Author

aidanhs commented Sep 1, 2013

Thanks for that @gavin19, I thought I'd done that but managed to fail.
And thanks for the jerkiness hint, I guess it must be that.

@gamefreak they are all minor, to the extent that some of them are literally just notes for interest and would never be an issue in reality. To go through "test - behavior - impact" for each of them:

  • minus subreddits - kinda obvious - only really relevant for gold users so meh
  • unicode link - snuownd doesn't urlencode unicode - can't see this being an issue, it seems chrome handles it fine and I hope other browsers do too
  • byte link - there's a non-character byte at the end of the url which is stripped instead of urlencoded - who cares, why do people even have non-character bytes in their comments!?
  • zalgo - looking at it again, I think this is just snuownd collapsing whitespace - no effect since browsers collapse whitespace everywhere that snuownd collapses whitespace
  • mixed whitespace - as above

I absolutely agree about not fixing it (or any of the above). I wouldn't, even if emsnudown didn't exist. WRT replacement, we'll see if I can fill in for the HTML sanitisation functions...

@spladug
Copy link

spladug commented Sep 1, 2013

FYI, Snudown and its upstream Sundown are frozen and will be replaced. See vmg/sundown@37728fb for more info.

@aidanhs
Copy link
Contributor Author

aidanhs commented Sep 2, 2013

@spladug thanks for that, I actually did notice the plans to replace snudown elsewhere, though it's interesting to now have the details of it being a coordinated effort and the intended roadmap.

But: that commit was (almost exactly) 9 months ago and had a stated aim to have a successor to sundown (almost exactly) 8 months ago and had a 'soon' promise made (almost exactly) 3 months ago.
Since the original announcement, while sundown has been frozen, snudown hasn't (minor changes/bugfixes I admit).

This isn't me doubting it'll happen! It's just it's difficult to measure progress with no news. Unless you have some to share? ;) I look forward to sundown's successor - the learning process for emsnudown is coming to an end. And what is life for, if not for learning?

@aidanhs
Copy link
Contributor Author

aidanhs commented Sep 4, 2013

As promised, this has been converted into a PR.

However, the final piece of work (read: hackery) I've done to get a sanitiser working for RES has tipped the balance (in my mind) back to keeping snuownd. The original idea behind emsnudown was to create a thin shim wrapper to easily allow new snudown versions without having to do any work.

Unfortunately the hack that I've had to resort to to expose the custom callback functionality has left it more fragile than I would like. In combination with snudown being deprecated (though I imagine we'll see minor changes slipping in...) the original motivation behind emsnudown is mostly lost and all it provides is a larger, slower, more difficult to customise version of snuownd in exchange for very edge case correctness.

If you do have a compelling reason to want emsnudown that's fine (it's fairly easy for me to maintain) but I personally suggest rejecting this pull request and letting it be an example for what was attempted and/or starting point for emscripten compilation of the successor of snudown. Long live snuownd!

@aidanhs
Copy link
Contributor Author

aidanhs commented Sep 20, 2013

No objections, closing.

@aidanhs aidanhs closed this Sep 20, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants