Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve org mode support #5545

Merged
merged 2 commits into from Jun 8, 2019

Conversation

@niklasfasching
Copy link
Contributor

commented Dec 20, 2018

As the title says, this PR improves the Org mode support of hugo. For a basic overview of the changes, check out this page - it showcases the html output of goorgeous (left, currently included with hugo) and go-org (right, this PR) for the same input. The css is adapted to go-org, so open up dev tools for a more fair comparison.

For an overview of the open issues of goorgeous that are fixed in go-org take a look here

If you want to try out any Org mode snippets, you can do so in your browser here.

Also take a look at the discussion for more information.

I included a second commit in this PR that adds the Org mode front matter to the main content as Org mode itself also uses the front matter for configuration - the impact of this is much smaller (as go-org only uses the TODO front matter variable right now - so we could move that to a separate PR.

// contrary to other document and front matter formats, part of the main section.
// l.emit advances the offset (start) of the lexer (and thus excludes the front matter
// from the main section) so we have to reset it.
frontMatterStart := l.start

This comment has been minimized.

Copy link
@bep

bep Dec 20, 2018

Member

This doesn't look right. I don't understand the comment, but even if I did: This will produce more content than it consumes, and I'm fairly confident that this can be solved without this workaround.

This comment has been minimized.

Copy link
@niklasfasching

niklasfasching Dec 20, 2018

Author Contributor

I'll try to explain my goal, maybe you can point me into the right direction:
I want hugo to send the complete file contents to go-org - not just what hugo calls the MainSection (the part after the front matter). This change means to, in a way, use the front matter section twice: (1) for the hugo front matter and (2) as part of the actual content (main section) that is rendered / send to go-org.


The reason for this is that the front matter possibly contains information that is important for the rendering as #+FOO: bar is the settings format of the org-mode parser.

An example:

#+TITLE: Foo
#+TODO: TODO DONE CUSTOM

content [...]

Hugo currently only sends

content [...]

to the renderer. However, the #+TODO: TODO DONE CUSTOM is actually configuration for the Org mode renderer - go-org would use it to update the list of keywords that are recognized as TODO keywords in headlines - see the headline with CUSTOM todo status example.

So I guess the problem this change is trying to solve is that the front matter of an org file actually serves two purposes: (1) configure hugo (2) configure the Org mode parser.

This comment has been minimized.

Copy link
@bep

bep Dec 20, 2018

Member

You could store away the start/end positions for front matter here:

https://github.com/gohugoio/hugo/blob/master/hugolib/page_content.go#L92

We currently don't use it.

This comment has been minimized.

Copy link
@bep

bep Dec 21, 2018

Member

So I guess the problem this change is trying to solve is that the front matter of an org file actually serves two purposes: (1) configure hugo (2) configure the Org mode parser.

Why not send the unmarshaled frontmatter?

https://github.com/gohugoio/hugo/blob/master/hugolib/page_content.go#L94

This comment has been minimized.

Copy link
@niklasfasching

niklasfasching Dec 21, 2018

Author Contributor

Thanks! this sounds like it will get more involved than a 2 line change...

As the benefit of this is pretty small for now (there's not much to be configured in go-org for now) I suggest we split this into a separate PR and just focus on org mode support itself for now.

Is that ok with you?


Why not send the unmarshaled frontmatter?

Good idea - but this requires some more work on the go-org side first - as already discussed in the forum Org mode itself does not support anything but plain string values [1] - so right now we have a problem mapping hugo front matter map[string]interface{} to the map[string]string org mode understands. I am not sure how I want to go ahead with this yet and would like to think about it some more.

[1] The approach suggested by kaushalmodi is custom to ox-hugo

This comment has been minimized.

Copy link
@bep

bep Dec 21, 2018

Member

As the benefit of this is pretty small for now

Agree. And Hugo's API in this area is also rather clumsy. I have been planning on redoing that for a while ... But time.

This comment has been minimized.

Copy link
@bep

bep Dec 21, 2018

Member

Also note that If you need some "configuration for the rendererd", we should add that as configuration in Hugo which you could receive in some constructor.

This comment has been minimized.

Copy link
@niklasfasching

niklasfasching Dec 21, 2018

Author Contributor

Also note that If you need some "configuration for the rendererd", we should add that as configuration in Hugo which you could receive in some constructor.

Maybe I shouldn't have called it configuration for the renderer. It's not configuration in the sense that it should be passed to the constructor. Rather, we currently remove part of the org document because we conflate hugo configuration (front matter) and org mode configuration (buffer settings) by using the same syntax (#+KEY: VALUE), location (top of the file) and namespace for both of them.
So by removing the front matter from the content that is send to the renderer we actually remove a part of the document.

Coming back to the point before

Why not send the unmarshaled frontmatter?

I like the idea because it sounds like a good way to provide go-org with the full document without polluting the hugo code base with org specific stuff - having the front matter available sounds like it could be useful for other things as well. But, ignoring hugo, just sending the content + the raw front matter together to go-org would be the best thing to do.

I hope that makes sense...

@niklasfasching niklasfasching force-pushed the niklasfasching:improve-org-mode-support branch 6 times, most recently from 37c24c1 to 3ddb2fb Dec 21, 2018

@niklasfasching

This comment has been minimized.

Copy link
Contributor Author

commented Dec 26, 2018

Updated with support for .TableOfContents
Please let me know if there's anything else that should be done - I'll try to keep this branch rebased in the meanwhile.

@bep bep requested a review from kaushalmodi Dec 28, 2018

@kaushalmodi

This comment has been minimized.

Copy link
Member

commented Dec 28, 2018

I'm traveling at the moment. I'll test out this branch once I get back home (Jan 3).

@niklasfasching niklasfasching force-pushed the niklasfasching:improve-org-mode-support branch 3 times, most recently from 20ccbb6 to 1e67499 Dec 30, 2018

@iNecas

This comment has been minimized.

Copy link

commented Jan 1, 2019

I've started exploring hugo as a platform for my blog. As Emacs user, the native support for org mode was definitely +1 when selecting the tool for the job. I've already hit limitations of goorgeous (such as missing support for definition lists).

Not that it would matter much, but I've checkout this branch and rendering of org-mode work just fine for me: looking forward for hugo having an active maintenance of the org-mode backend again.

@niklasfasching niklasfasching force-pushed the niklasfasching:improve-org-mode-support branch from 1e67499 to 9ddb80a Jan 2, 2019

@niklasfasching

This comment has been minimized.

Copy link
Contributor Author

commented Jan 4, 2019

Not at my PC now. But can you confirm if that doesn't cause any Org lint errors? I.e have keywords ending in [] and then do any arbitrary export, like ox-html, and see that there are no errors.

My reason to have #+begin_src toml blocks for verbatim toml front matter in ox-hugo was that regular Org exports shouldn't break or show weird outputs.

Good point! I didn't know about org-lint, thx :).

It doesn't seem to result in any errors

image

Also Org mode expects at least some keywords to contain square brackets. see org-element-dual-keywords:

Documentation:
List of affiliated keywords which can have a secondary value.

In Org syntax, they can be written with optional square brackets
before the colons.  For example, RESULTS keyword can be
associated to a hash value with the following:

  #+RESULTS[hash-string]: some-source
@kaushalmodi

This comment has been minimized.

Copy link
Member

commented Jan 4, 2019

Good point! I didn't know about org-lint, thx :).
It doesn't seem to result in any errors

Thanks for checking that.

Also Org mode expects at least some keywords to contain square brackets. see org-element-dual-keywords

And I did not know that. Good to know :)


About testing go-org

As I wasn't a Goorgeous user, I did not have any handy posts in Org format. But I know that @zamansky uses it and blogs often. So I built hugo locally using this PR branch and ran it on his repo (https://github.com/cestlaz/hugo-blog-generator). I got just this one warning:

go-org: Could not parse token org.token{kind:"beginBlock", lvl:0, content:"COMMENT", matches:[]string{"#+BEGIN_COMMENT", "", "COMMENT", ""}}: Falling back to treating it as plain text.

Looks like you just need to support #+begin_comment .. #+end_comment.

Awesome job!

@kaushalmodi
Copy link
Member

left a comment

Looks good except for that one minor feature request.

@niklasfasching

This comment has been minimized.

Copy link
Contributor Author

commented Jan 4, 2019

Looks like you just need to support #+begin_comment .. #+end_comment.

Awesome, thanks for testing it out :)!

Regarding the warnings:
I cloned the repo and ran it locally. I got 2 warnings:

$ go run ~/go/src/github.com/gohugoio/hugo/ 

Building sites … 

WARN 2019/01/04 19:19:02 Could not parse token org.token{kind:"beginBlock", lvl:0, content:"SRC", matches:[]string{"#+BEGIN_SRC emacs-lisp", "", "SRC", " emacs-lisp"}}: Falling back to treating it as plain text.
go-org: Could not parse token org.token{kind:"beginBlock", lvl:0, content:"COMMENT", matches:[]string{"#+BEGIN_COMMENT", "", "COMMENT", ""}}: Falling back to treating it as plain text.

WARN 2019/01/04 19:19:03 Could not parse token org.token{kind:"endBlock", lvl:0, content:"COMMENT", matches:[]string{"#+END_COMMENT", "", "COMMENT"}}: Falling back to treating it as plain text.
  1. Warning 1 is an error in the source file: The last source block contains an unclosed {{< highlight "emacs-lisp" >}} (can't link to the line as github renders it, just look at the last src block).
    The unclosed highlight tag results in the rest of the file getting highlighted and go-org can't find an #+END_SRC line because there isn't one (rather, there is a line with <span style="color:#960050;background-color:#1e0010">#</span>+End_SRC. [1]

  2. Warning 2 is due to the comment block starting at the top of the file. Hugo sees the #+BEGIN_COMMENT line and thinks it's Org mode front matter and removes it from the document that get's send to go-org (only the non-front matter part of the document gets rendered). So... not really a markup error but nothing we can do much about I think.

To summarize, as far as i can tell, neither is a bug in go-org. The first warning is due to a markup error, the second warning is due to a shortcoming in how Org mode front matter is recognized in hugo [2].

[1] I don't know how the blog is rendered as it is at https://cestlaz-nikola.github.io, but using hugo with goorgeous actually swallows everything after the opening tag of the source block in question. If you want to try it out yourself the path to the post is /posts/using-emacs-30-elfeed-2/

[2] Hugo treats lines starting with #+ as Org mode front matter (link). It would be more robust to check whether the line is a keyword with a regexp like this but that adds quite a bit of complexity I'm not sure is worth it.

@kaushalmodi

This comment has been minimized.

Copy link
Member

commented Jan 4, 2019

Hugo treats lines starting with #+ as Org mode front matter (link). It would be more robust to check whether the line is a keyword with a regexp like this but that adds quite a bit of complexity I'm not sure is worth it.

I would have thought that #+begin_comment .. #+end_comment would be treated similar to how you would treat #+begin_src .. .. #+end_src, and such. Is that not the case?

@niklasfasching

This comment has been minimized.

Copy link
Contributor Author

commented Jan 4, 2019

I'm not sure if I understand you correctly, could you expand on that?

Maybe there's a misunderstanding: What you quote was meant to be about how hugo treats lines starting with #+, not about go-org.

The warning in question is generated by a file starting with

#+BEGIN_COMMENT
[...]
#+END_COMMENT

Hugo thinks the first line is front matter (as it starts with #+) and removes it from the content of the file.
go-org receives just

[...]
#+END_COMMENT

As there's no #+BEGIN_COMMENT line, it logs a warning about an out of place #+END_COMMENT line.

PS: This did point to a bug in my integration of go-org into hugo - while i did set the logger to jww for the content rendering, i did not do so for the front matter parsing. Due to this there's also a log message from go-org not prefixed by WARN. Will fix.

@kaushalmodi

This comment has been minimized.

Copy link
Member

commented Jan 4, 2019

I'm not sure if I understand you correctly, could you expand on that?

Maybe there's a misunderstanding: What you quote was about how hugo treats lines starting with #+, not about what go-org does.

Yes, I quoted correctly. Sorry for not being clear.

I meant to ask what hugo does with #+begin_src blocks (because those also begin with #+ but are not confused as keywords). How do hugo and go-org interact with those blocks so that the code blocks are rendered correctly? Can the same be achieved for #+begin_comment blocks?

@niklasfasching

This comment has been minimized.

Copy link
Contributor Author

commented Jan 4, 2019

Thx, got it now :).

Hugo treats the #+BEGIN_COMMENT line as front matter because it starts with #+ and is at the top of the file (i.e. there's no non-empty lines that don't start with #+ ahead of it).
A #+BEGIN_SRC line at the top of the file would also be treated as front matter.

It's just about where in the file the line is, nothing specific about the #+BEGIN_COMMENT.

The fix for the above case is simply

foobar
the above line ends the front matter section / starts the content section because it 1. is not empty and 2. does not start with #+. the below will now work.
#+BEGIN_COMMENT
[...]
#+END_COMMENT
@kaushalmodi

This comment has been minimized.

Copy link
Member

commented Jan 4, 2019

It's just about where in the file the line is, nothing specific about the #+BEGIN_COMMENT.

Understood. Thanks. That means that #+begin_comment mid post should work as expected then?

Update: Answering myself to the above question: Yes, based on your edit of your comment above.


This is a separate issue then for hugo.. it should consider #+.. as Org keywords only if #+ is not followed by #+begin_ or #+end_ (case-insensitive). That is not 100% accurate, but that will cover a lot of cases.

@niklasfasching niklasfasching force-pushed the niklasfasching:improve-org-mode-support branch from 9ddb80a to f5a2a6c Jan 4, 2019

@kaushalmodi
Copy link
Member

left a comment

Looks good to me.

@niklasfasching

This comment has been minimized.

Copy link
Contributor Author

commented Jan 4, 2019

Understood. Thanks. That means that #+begin_comment mid post should work as expected then?

👍

This is a separate issue then for hugo

👍

Thanks for taking the time to review this :)

PS: The last force push just fixes the front matter logging & updates go-org (no functional changes, just refactoring)

@kaushalmodi

This comment has been minimized.

Copy link
Member

commented Jan 4, 2019

This is a separate issue then for hugo.. it should consider #+.. as Org keywords only if #+ is not followed by #+begin_ or #+end_ (case-insensitive). That is not 100% accurate, but that will cover a lot of cases.

You are officially the most qualified person knowing both Go and Org. Do you think you can tackle fixing that in hugo in a separate PR?

@niklasfasching

This comment has been minimized.

Copy link
Contributor Author

commented Jan 4, 2019

Do you think you can tackle fixing that in hugo in a separate PR?

I have it on my todo list (along with #5436) :).

@niklasfasching niklasfasching force-pushed the niklasfasching:improve-org-mode-support branch from f5a2a6c to 5ab3942 Jan 4, 2019

@kaushalmodi

This comment has been minimized.

Copy link
Member

commented Jan 9, 2019

@bep Do you have further comments on this PR?

@niklasfasching niklasfasching force-pushed the niklasfasching:improve-org-mode-support branch from 5ab3942 to b518737 Jan 9, 2019

@sunbing81 sunbing81 referenced this pull request Jan 13, 2019

@niklasfasching niklasfasching force-pushed the niklasfasching:improve-org-mode-support branch from b518737 to 7a04299 Jan 13, 2019

@niklasfasching niklasfasching force-pushed the niklasfasching:improve-org-mode-support branch from 7a04299 to d0237aa Jan 25, 2019

@kaushalmodi

This comment has been minimized.

Copy link
Member

commented Jan 31, 2019

@bep Another tiny nudge to review this.

@niklasfasching niklasfasching force-pushed the niklasfasching:improve-org-mode-support branch 2 times, most recently from d30c4ef to 4023f82 Mar 9, 2019

niklasfasching added some commits Jun 4, 2019

Improve Org mode support: Replace goorgeous with go-org
Sadly, goorgeous has not been updated in over a year and still has a lot of
open issues (e.g. no support for nested lists).

go-org fixes most of those issues and supports a larger subset of Org mode
syntax.
Refactor Org mode front matter: Introduce '#+KEY[]:' array notation
Hugo requires some front matter values to be arrays (e.g. for taxonomies).
Org mode front matter syntax (`#+KEY: VALUE`) does however not support anything
but string values normally - which is why goorgeous hardcoded the values for
the keys tags, categories & aliases to be parsed as string arrays. This causes
problems with custom taxonomies.

A simple thing we can do instead is make keywords ending in '[]' be parsed as
string arrays.

@niklasfasching niklasfasching force-pushed the niklasfasching:improve-org-mode-support branch from 4023f82 to e06720c Jun 4, 2019

@niklasfasching

This comment has been minimized.

Copy link
Contributor Author

commented Jun 4, 2019

@bep is there anything specific blocking this? i can totally understand if this just doesn't have priority, just stumbled across this again by chance :)

@bep

This comment has been minimized.

Copy link
Member

commented Jun 8, 2019

@niklasfasching really sorry about this. I was reminded about this when I got some version issues with that other library.

@bep bep merged commit fad183c into gohugoio:master Jun 8, 2019

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
license/cla Contributor License Agreement is signed.
Details
@niklasfasching

This comment has been minimized.

Copy link
Contributor Author

commented Jun 8, 2019

No worries, thanks for all your work on hugo :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.