Implement new md parser #33
Thanks so much @didrocks! Will take a look at it soon. /cc @marcacohen @cassierecher in case you guys want to take a look too.
@x1ddos bump, let me know if I can help push this through!
Hey, sorry for the long wait. Looking at it now and will probably continue tomorrow.
I just want to make sure this won't introduce incompatible changes with internal usage.
Sorry, I have an urgent change to make, to close #34. Will come back to this right after.
Meanwhile, @didrocks could you rebase this onto latest master?
e1de21c to 4ee1dcb
And rebased! The existing tests pass locally and on Travis.
Man, sorry. I'm back to this. Thanks for rebasing!
@x1ddos bump.
Yeah, I'm here :)
I want to bump this; it would be a great addition if we can get it in. Can we get a decision on whether it can be accepted, or on what blocks acceptance?
@x1ddos I did some basic memory usage analysis of this PR. I used the current
Here's the usage from with the current built-from-master
And here's the usage with
The key thing to see here is that
Is this enough evidence to drop the memory concern? Or would you like me to run a different kind of test?
So, I've just run some simple benchmarking using the following function:

    // testdata/codelab.md is
    // https://github.com/canonical-websites/tutorials.ubuntu.com/blob/master/tutorials/server/access-remote-desktop/access-remote-desktop.md
    func BenchmarkParser(b *testing.B) {
        lab, err := ioutil.ReadFile("testdata/codelab.md")
        if err != nil {
            b.Fatal(err)
        }
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            // Use a fresh reader on every iteration: a shared reader would
            // already be at EOF after the first Parse call.
            r := bytes.NewReader(lab)
            p := Parser{}
            if _, err := p.Parse(r); err != nil {
                b.Fatal(err)
            }
        }
    }

The results using the existing parser:
The new parser in this change:
I have to admit I was expecting worse due to switching away from the tokenizer-based parser, but it's not that bad at all.
Hey @didrocks would you mind resolving conflicts? There's only one change in parse.go which adds support for
I'm reviewing the code meanwhile. It's too bad the tests were removed but @samtstern promised he would add them back. :)
@x1ddos I went and sent a PR doing the work of fixing the merge conflicts:
In the event that @didrocks is no longer interested in continuing this work we can use my fork of his branch.
Yes, of course, although it would be very nice if we all could use the same tool without forking.
Oh no, what I meant was we can create a PR from my fork in the event that
this PR is "dead". And then I can respond to review comments on behalf of
didrocks. It's a last resort but at least we can still keep moving!
Even if it's been some months, I'm still around :) Rather than using a merged PR, I just rebased on latest master. Resolving the conflict was indeed using didrocks@95d2c4e (thanks Sam!). Just note that you have a typo: it's not
CI passes on Go 1.8. Note that master doesn't pass due to a formatting issue in an unrelated file:
If this is fine for you, I'll let you close the other PR and once merged, I'll move ubuntu (even if I'm not in charge of that anymore) to use the claat master tool ;)
Seems very nice. Just a couple minor nits, because you'd want to rebase after #53 anyway.
Thanks for this!
claat/parser/md/parse.go
Outdated
func code(ds *docState, term bool) types.Node {
    elem := findParent(ds.cur, atom.Pre)
    // inline <code> text
    // TODO, here
What does this TODO mean, could you expand it to a phrase?
This was mostly a leftover (when I reworked the parser, I created all the functions up front so as not to miss any tag, and added TODOs to list them). Removed.
claat/parser/md/parse.go
Outdated
}

title := nodeAttr(ds.cur, "title")
if title != "" {
nit: you might as well make this more compact:
    if v := nodeAttr(ds.cur, "title"); v != "" {
        n.Title = v
    }
The same goes for alt above. Then the whole image function will get quite a bit smaller.
Correct, we don't reuse v later on, so the if scope is enough. I'm not even sure why I didn't change it earlier. Done now!
The markdown parser wasn't reentrant, so a list in a list, italic text in a list or note, and other similar cases weren't supported. Also, some objects like surveys weren't supported either. Thus, use the gdoc parser as a base to mirror its functionality and overall logic. Adapt it, as we don't have CSS and the output from the blackfriday conversion is a little different.
The gdoc parser was always emitting Start = 1 when exporting an ordered list; in markdown, however, there is no default value (of 1), so ordered lists were incorrectly marked as ul instead of ol.
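The ordered-list fix described in this commit message can be sketched roughly as follows. This is not the parser's actual code: `listNode`, `isOrdered` and `listStart` are hypothetical names, and a plain struct stands in for the parsed HTML node. The idea is that the list kind comes from the tag itself, and a missing start attribute falls back to 1 instead of turning the list into an unordered one.

```go
package main

import (
	"fmt"
	"strconv"
)

// listNode is a hypothetical, simplified stand-in for a parsed HTML list element.
type listNode struct {
	Tag   string            // "ol" or "ul"
	Attrs map[string]string // raw attributes, e.g. {"start": "3"}
}

// isOrdered decides the list kind from the tag, not from the start attribute.
func isOrdered(n listNode) bool {
	return n.Tag == "ol"
}

// listStart returns the first item number of an ordered list.
// It returns 0 for unordered lists and defaults to 1 when the start
// attribute is missing or invalid, as in markdown-generated HTML.
func listStart(n listNode) int {
	if !isOrdered(n) {
		return 0
	}
	if v, ok := n.Attrs["start"]; ok {
		if i, err := strconv.Atoi(v); err == nil && i > 0 {
			return i
		}
	}
	return 1
}

func main() {
	fromGdoc := listNode{Tag: "ol", Attrs: map[string]string{"start": "1"}}
	fromMarkdown := listNode{Tag: "ol"} // no start attribute at all
	unordered := listNode{Tag: "ul"}
	fmt.Println(listStart(fromGdoc), listStart(fromMarkdown), listStart(unordered)) // prints "1 1 0"
}
```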
Test the correct current node based on the previous and next ones.
We are now using the gdoc parser implementation. This should be abstracted into a common toolbox in the long run. Tests are removed for now; they should be reshaped and re-added.
A survey is a specially prefixed table, with each cell being a different survey. A survey is composed of a text (the title) and one or more list items.
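The survey shape described above (a title plus one or more list items per table cell) could be modelled along these lines. This is a simplified sketch under assumptions: `surveyGroup` and `newSurveyGroup` are hypothetical names, and the input is plain strings rather than parsed table cells.

```go
package main

import (
	"fmt"
	"strings"
)

// surveyGroup is a hypothetical, simplified model of one survey:
// a title and the list of answers offered for it.
type surveyGroup struct {
	Name    string
	Options []string
}

// newSurveyGroup builds a group from the text of one table cell (title)
// and its list items. Blank items are skipped. ok is false when the
// title is empty or no usable item remains.
func newSurveyGroup(title string, items []string) (g surveyGroup, ok bool) {
	g.Name = strings.TrimSpace(title)
	for _, it := range items {
		if s := strings.TrimSpace(it); s != "" {
			g.Options = append(g.Options, s)
		}
	}
	return g, g.Name != "" && len(g.Options) > 0
}

func main() {
	g, ok := newSurveyGroup(" How will you use this tutorial? ",
		[]string{"Only read through it", "", "Read it and complete the exercises"})
	fmt.Println(ok, g.Name, len(g.Options))
}
```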
Rebased and nitpick modifications done. Thanks for the review! CI passes on 1.8.x and 1.10.x.

func isButton(hn *html.Node) bool {
    // TODO: implements
    return false
Sorry, I missed this the first time.
Do you guys not use the "download" button at all?
It was implemented before in tools/claat/parser/md/parse.go, line 443 in 2aac38f:
    ps.emit(types.NewButtonNode(true, true, true, newBreaklessTextNode(s[1])))
Actually, I found the usage below in func button, but because this isButton always returns false, the condition never holds.
I should have grepped for TODO… (but with the tests issue and no time given to work on those, this is why I didn't dare propose it beforehand). Indeed, we don't recognize buttons, as we don't use them on tutorials.ubuntu.com. I should spend time looking at it again, and at what syntax would generate a button (if I remember correctly, in gdoc it was a link plus a special name).
I can't handle this today, but I will probably be available to look into it later this week. Is that fine?
yeah, we can add it later. maybe @samtstern can help as well.
Follow up TODOs:
- the [download] button
- tests!
Thanks for merging!
Thanks for sticking with this!
Here is the md parser we have been using for almost a year at Ubuntu (powering https://tutorial.ubuntu.com).
It's inspired by the Google Doc parser. I didn't propose it beforehand as I wasn't given time to rewrite the md test suite, and I think we should have an interface and share some code between the 2 parsers.
However, @samtstern convinced me to put this PR up against the upstream repo and will handle the necessary test additions. ;)
You can see a scaffold of a markdown tutorial with a syntax description here: https://github.com/canonical-websites/tutorials.ubuntu.com/blob/master/examples/example-tutorial.md (and see that it formats well with any markdown renderer, like GitHub's).
Note as well that I wrote https://github.com/Ubuntu/tutorial-deployment (with tests) to generate an offline tutorial website (compatible with our web team's site). The feature you may find interesting there is the ability to save any markdown file and have the current tutorial web page refreshed (a bit like the online gdoc refresh Google App Engine service for codelabs), so you can easily view your modifications. It only needs this Polymer component: https://github.com/canonical-websites/tutorials.ubuntu.com/blob/master/src/elements/websocket-reloader.html, which listens for a websocket signal and reloads the page if it shows the tutorial that was just modified.
Hope that helps!