Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement new md parser #33

Merged
merged 5 commits into from
Jun 19, 2018
Merged

Conversation

didrocks
Copy link
Contributor

Here is the md parser we are using for quite almost a year in ubuntu. (powering https://tutorial.ubuntu.com).

It's inspired by the google doc parser. I didn't propose it beforehand as I wasn't given time to rewrite the md test suite and I think we should have an interface and share some code between the 2 parsers.
However, @samtstern convinced me to put this PR up against the upstream repo and will handle the necessary tests additions. ;)

You can see a scaffold of a markdown tutorial with syntax description here: https://github.com/canonical-websites/tutorials.ubuntu.com/blob/master/examples/example-tutorial.md (and see that it formats well with any markdown parser, like github).

Note as well that I wrote https://github.com/Ubuntu/tutorial-deployment (with tests) to generate an offline tutorial website (compatible with our web team site). The feature you can find interesting there is the possibility to save any markdown file and have the current tutorial web page refreshed (a little bit like the online gdoc refresh google appengine service for codelabs) to easily view your modifications. It only needs to use this polymer component: https://github.com/canonical-websites/tutorials.ubuntu.com/blob/master/src/elements/websocket-reloader.html which websocket signal and reload the page if this is the currently modified tutorial.

Hope that helps!

@x1ddos
Copy link
Contributor

x1ddos commented Jan 18, 2018

Thanks so much @didrocks! Will take a look at it soon.
And of course thanks @samtstern for convincing you :)

/cc @marcacohen @cassierecher in case you guys want to take a look too

@samtstern
Copy link
Contributor

@x1ddos I told @didrocks I would take on the work of writing some tests / doing other cleanup if this PR gets merged. As I showed you in an email this MD parser is really solid and has a bunch of features we could use.

@samtstern
Copy link
Contributor

@x1ddos bump, let me know if I can help push this through!

@x1ddos
Copy link
Contributor

x1ddos commented Jan 24, 2018

Hey, sorry for a long wait. Looking at it now and will probably continue tomorrow.

@x1ddos
Copy link
Contributor

x1ddos commented Jan 24, 2018

I just want to make sure this won't introduce incompatible changes with internal usage.

@x1ddos
Copy link
Contributor

x1ddos commented Jan 30, 2018

Sorry, have an urgent change to make, to close #34. Will come back to this right after.

@x1ddos x1ddos self-assigned this Jan 30, 2018
@x1ddos
Copy link
Contributor

x1ddos commented Jan 30, 2018

Meanwhile, @didrocks could you rebase this onto latest master?
I switched to Travis CI, to make sure it still compiles and passes all existing tests.

@didrocks
Copy link
Contributor Author

And rebased! The existing tests pass locally and on Travis.

@x1ddos
Copy link
Contributor

x1ddos commented Feb 14, 2018

Man, sorry. I'm back to this. Thanks for rebasing!

@samtstern
Copy link
Contributor

@x1ddos bump.

@x1ddos
Copy link
Contributor

x1ddos commented Mar 7, 2018

Yeah, I'm here :)
Have it open all the time. Still getting back to it.
Sorry for such a huge delay.

@rachelmyers
Copy link

I want to bumb this; this would be a great addition if we can get it in. Can we get a decision on if it can be accepted or what blocks acceptance?

@samtstern
Copy link
Contributor

@x1ddos did some basic memory usage analysis of this PR. I used the current README.md from master and converted it to a codelab.

Here's the usage from with the current built-from-master claat:

$ /usr/bin/time -v ~/gocode/bin/claat export -o /usr/local/google/home/samstern/Scratch/codelabs-out README.md
ok
        Command being timed: "/usr/local/google/home/samstern/gocode/bin/claat export -o /usr/local/google/home/samstern/Scratch/codelabs-out README.md"
        User time (seconds): 0.01
        System time (seconds): 0.00
        Percent of CPU this job got: 106%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.01
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 9400
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 731
        Voluntary context switches: 115
        Involuntary context switches: 1
        Swaps: 0
        File system inputs: 0
        File system outputs: 32
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

And here's the usage with claat built from this branch:

$ /usr/bin/time -v ~/gocode/bin/claat export -o /usr/local/google/home/samstern/Scratch/codelabs-out README.md
ok
        Command being timed: "/usr/local/google/home/samstern/gocode/bin/claat export -o /usr/local/google/home/samstern/Scratch/codelabs-out README.md"
        User time (seconds): 0.01
        System time (seconds): 0.00
        Percent of CPU this job got: 106%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.01
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 9244
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 737
        Voluntary context switches: 82
        Involuntary context switches: 1
        Swaps: 0
        File system inputs: 0
        File system outputs: 32
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

The key thing to see here is that maximum resident set size is about the same. When I run the command repeatedly I actually get slight variance in the number but always in the 9000kb-9600kb range.

Is this enough evidence to drop the memory concern? Or would you like me to do a different kind of test.

@x1ddos
Copy link
Contributor

x1ddos commented Jun 18, 2018

So, I've just ran some simple benchmarking using the following function:

// testdata/codelab.md is
// https://github.com/canonical-websites/tutorials.ubuntu.com/blob/master/tutorials/server/access-remote-desktop/access-remote-desktop.md
func BenchmarkParser(b *testing.B) {
	lab, err := ioutil.ReadFile("testdata/codelab.md")
	if err != nil {
		b.Fatal(err)
	}
	r := bytes.NewReader(lab)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
	        p := Parser{}
		if _, err := p.Parse(r); err != nil {
			b.Fatal(err)
		}
	}
}

The results using existing parser:

# Old MD parser.
$ go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: github.com/googlecodelabs/tools/claat/parser/md
BenchmarkParser-4   	 1000000	      2120 ns/op	   10240 B/op	      14 allocs/op
PASS
ok  	github.com/googlecodelabs/tools/claat/parser/md	2.158s

The new parser in this change:

# New MD parser.
$ go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: github.com/googlecodelabs/tools/claat/parser/md
BenchmarkParser-4   	  500000	      2703 ns/op	   10920 B/op	      22 allocs/op
PASS
ok  	github.com/googlecodelabs/tools/claat/parser/md	1.396s

I have to admit I was expecting worse due to switching away from tokenizer-based parser, but it's not that bat at all.

Hey @didrocks would you mind resolving conflicts? There's only one change in parse.go which adds support for alt and title image attributes. I briefly checked this change and didn't see anything similar in func image(ds *docState) types.Node.

I'm reviewing the code meanwhile.

It's too bad the tests were removed but @samtstern promised he would add them back. :)

@samtstern
Copy link
Contributor

@x1ddos I went and sent a PR doing the work of fixing the merge conflicts:
didrocks#2

In the event that @didrocks is no longer interested in continuing this work we can use my fork of his branch.

@x1ddos
Copy link
Contributor

x1ddos commented Jun 18, 2018

In the event that @didrocks is no longer interested in continuing this work we can use my fork of his branch.

Yes, of course, although it would be very nice if we all could use the same tool without forking.

@samtstern
Copy link
Contributor

samtstern commented Jun 18, 2018 via email

@didrocks
Copy link
Contributor Author

Even if it's been some months, I'm still around :)

Rather than using a merged PR, I just rebased on latest master. Resolving the conflict was indeed using didrocks@95d2c4e (thanks Sam!). Just note that you have a typo: it's not m.Title but n.Title.

CI passes on Go 1.8. Note that master doesn't pass due to formatting issue in an unrelated file: parser/gdoc/parse_test.go:198: T.Errorf format %q has arg clab.Meta.Status of wrong type *github.com/googlecodelabs/tools/claat/types.LegacyStatus. Seems trivial to fix, but I don't want to mix up in PR. I can handle it in another PR or you can fix it directly in master, which sounds quicker.

If this is fine for you, I'll let you close the other PR and once merged, I'll move ubuntu (even if I'm not in charge of that anymore) to use claat master tool ;)

Copy link
Contributor

@x1ddos x1ddos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems very nice. Just a couple minor nits, because you'd want to rebase after #53 anyway.

Thanks for this!

func code(ds *docState, term bool) types.Node {
elem := findParent(ds.cur, atom.Pre)
// inline <code> text
// TODO, here
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this TODO mean, could you expand it to a phrase?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was mostly a leftover (when I reworked the parser, I created all functions to not miss some balise and added TODO to list them). Removed.

}

title := nodeAttr(ds.cur, "title")
if title != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you might as well make this more compact:

if v := nodeAttr(ds.cur, "title"); v != "" {
  n.Title = v
}

the same for alt above.
then, the whole image function will get quite smaller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, we don't reuse v later on, to if scope is enough. I'm even unsure why I didn't modify it. Done now!

@x1ddos
Copy link
Contributor

x1ddos commented Jun 19, 2018

I merged #53: @didrocks could you rebase again? hopefully the last time. thanks!

The markdown parser wasn't reentrant, so list in list, italic text in a list
or note, and other similar cases weren't supported.
Also, some objects like survey and others weren't supported either.

Use the gdoc parser thus to mirror the functionality and global logic. Adapt
it as we don't have css and the output from blackfriday conversion is a little
bit different..
gdoc parser was always emiting Start = 1 when exporting ordered list,
however, in markdown, it doesn't have default value (being 1), and so, ordered
lists were incorrectly marked as ul instead of ol.
Test the correct current node based on previous and next ones.
We are now using the gdoc parser implementation. This should be abstracted on
a common toolbox in the long run.
Tests are removed from now on, should be reshaped and readded.
A survey is a special prefixed table, with one cell being a different survey.
The survey is composed of a text (title), and one or more list items.
@didrocks
Copy link
Contributor Author

Rebased and nitpicks modifications done. Thanks for the review! CI passes on 1.8.x and 1.10.x.


func isButton(hn *html.Node) bool {
// TODO: implements
return false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed this the first time.
Do you guys not use the "download" button at all?
It was implemented before in

ps.emit(types.NewButtonNode(true, true, true, newBreaklessTextNode(s[1])))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I found the usage below in func button but because this isButton always returns false, the condition never holds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have grepped for TODO… (but with the tests issue and no time given to work on those, this is why I didn't dare proposing it beforehand). Indeed, we don't recognize button, we don't use it on tutorials.ubuntu.com. I should spend time looking at it again, and what syntax would generate a button (if I remember correctly on gdoc, it was a link + special name).

I can't handle this today, but will probably be available to look into it later this week, is that fine?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, we can add it later. maybe @samtstern can help as well.

Copy link
Contributor

@x1ddos x1ddos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow up TODOs:

  • the [download] button
  • tests!

@x1ddos x1ddos merged commit d557ecb into googlecodelabs:master Jun 19, 2018
This was referenced Jun 19, 2018
@didrocks
Copy link
Contributor Author

Thanks for merging!

@x1ddos
Copy link
Contributor

x1ddos commented Jun 19, 2018

Thanks for sticking with this!
Sorry it took so long.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants