Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve org mode support #5545

Merged
merged 2 commits into from Jun 8, 2019

Conversation

niklasfasching
Copy link
Contributor

@niklasfasching niklasfasching commented Dec 20, 2018

As the title says, this PR improves the Org mode support of hugo. For a basic overview of the changes, check out this page - it showcases the html output of goorgeous (left, currently included with hugo) and go-org (right, this PR) for the same input. The css is adapted to go-org, so open up dev tools for a more fair comparison.

For an overview of the open issues of goorgeous that are fixed in go-org take a look here

If you want to try out any Org mode snippets, you can do so in your browser here.

Also take a look at the discussion for more information.

I included a second commit in this PR that adds the Org mode front matter to the main content as Org mode itself also uses the front matter for configuration - the impact of this is much smaller (as go-org only uses the TODO front matter variable right now - so we could move that to a separate PR.

// contrary to other document and front matter formats, part of the main section.
// l.emit advances the offset (start) of the lexer (and thus excludes the front matter
// from the main section) so we have to reset it.
frontMatterStart := l.start
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look right. I don't understand the comment, but even if I did: This will produce more content than it consumes, and I'm fairly confident that this can be solved without this workaround.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to explain my goal, maybe you can point me into the right direction:
I want hugo to send the complete file contents to go-org - not just what hugo calls the MainSection (the part after the front matter). This change means to, in a way, use the front matter section twice: (1) for the hugo front matter and (2) as part of the actual content (main section) that is rendered / send to go-org.


The reason for this is that the front matter possibly contains information that is important for the rendering as #+FOO: bar is the settings format of the org-mode parser.

An example:

#+TITLE: Foo
#+TODO: TODO DONE CUSTOM

content [...]

Hugo currently only sends

content [...]

to the renderer. However, the #+TODO: TODO DONE CUSTOM is actually configuration for the Org mode renderer - go-org would use it to update the list of keywords that are recognized as TODO keywords in headlines - see the headline with CUSTOM todo status example.

So I guess the problem this change is trying to solve is that the front matter of an org file actually serves two purposes: (1) configure hugo (2) configure the Org mode parser.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could store away the start/end positions for front matter here:

https://github.com/gohugoio/hugo/blob/master/hugolib/page_content.go#L92

We currently don't use it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess the problem this change is trying to solve is that the front matter of an org file actually serves two purposes: (1) configure hugo (2) configure the Org mode parser.

Why not send the unmarshaled frontmatter?

https://github.com/gohugoio/hugo/blob/master/hugolib/page_content.go#L94

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! this sounds like it will get more involved than a 2 line change...

As the benefit of this is pretty small for now (there's not much to be configured in go-org for now) I suggest we split this into a separate PR and just focus on org mode support itself for now.

Is that ok with you?


Why not send the unmarshaled frontmatter?

Good idea - but this requires some more work on the go-org side first - as already discussed in the forum Org mode itself does not support anything but plain string values [1] - so right now we have a problem mapping hugo front matter map[string]interface{} to the map[string]string org mode understands. I am not sure how I want to go ahead with this yet and would like to think about it some more.

[1] The approach suggested by kaushalmodi is custom to ox-hugo

Copy link
Member

@bep bep Dec 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the benefit of this is pretty small for now

Agree. And Hugo's API in this area is also rather clumsy. I have been planning on redoing that for a while ... But time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that If you need some "configuration for the rendererd", we should add that as configuration in Hugo which you could receive in some constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that If you need some "configuration for the rendererd", we should add that as configuration in Hugo which you could receive in some constructor.

Maybe I shouldn't have called it configuration for the renderer. It's not configuration in the sense that it should be passed to the constructor. Rather, we currently remove part of the org document because we conflate hugo configuration (front matter) and org mode configuration (buffer settings) by using the same syntax (#+KEY: VALUE), location (top of the file) and namespace for both of them.
So by removing the front matter from the content that is send to the renderer we actually remove a part of the document.

Coming back to the point before

Why not send the unmarshaled frontmatter?

I like the idea because it sounds like a good way to provide go-org with the full document without polluting the hugo code base with org specific stuff - having the front matter available sounds like it could be useful for other things as well. But, ignoring hugo, just sending the content + the raw front matter together to go-org would be the best thing to do.

I hope that makes sense...

@niklasfasching niklasfasching force-pushed the improve-org-mode-support branch 6 times, most recently from 37c24c1 to 3ddb2fb Compare December 26, 2018 17:56
@niklasfasching
Copy link
Contributor Author

Updated with support for .TableOfContents
Please let me know if there's anything else that should be done - I'll try to keep this branch rebased in the meanwhile.

@bep bep requested a review from kaushalmodi December 28, 2018 09:50
@kaushalmodi
Copy link
Contributor

I'm traveling at the moment. I'll test out this branch once I get back home (Jan 3).

@niklasfasching niklasfasching force-pushed the improve-org-mode-support branch 3 times, most recently from 20ccbb6 to 1e67499 Compare December 30, 2018 17:27
@iNecas
Copy link

iNecas commented Jan 1, 2019

I've started exploring hugo as a platform for my blog. As Emacs user, the native support for org mode was definitely +1 when selecting the tool for the job. I've already hit limitations of goorgeous (such as missing support for definition lists).

Not that it would matter much, but I've checkout this branch and rendering of org-mode work just fine for me: looking forward for hugo having an active maintenance of the org-mode backend again.

@niklasfasching
Copy link
Contributor Author

Not at my PC now. But can you confirm if that doesn't cause any Org lint errors? I.e have keywords ending in [] and then do any arbitrary export, like ox-html, and see that there are no errors.

My reason to have #+begin_src toml blocks for verbatim toml front matter in ox-hugo was that regular Org exports shouldn't break or show weird outputs.

Good point! I didn't know about org-lint, thx :).

It doesn't seem to result in any errors

image

Also Org mode expects at least some keywords to contain square brackets. see org-element-dual-keywords:

Documentation:
List of affiliated keywords which can have a secondary value.

In Org syntax, they can be written with optional square brackets
before the colons.  For example, RESULTS keyword can be
associated to a hash value with the following:

  #+RESULTS[hash-string]: some-source

@kaushalmodi
Copy link
Contributor

Good point! I didn't know about org-lint, thx :).
It doesn't seem to result in any errors

Thanks for checking that.

Also Org mode expects at least some keywords to contain square brackets. see org-element-dual-keywords

And I did not know that. Good to know :)


About testing go-org

As I wasn't a Goorgeous user, I did not have any handy posts in Org format. But I know that @zamansky uses it and blogs often. So I built hugo locally using this PR branch and ran it on his repo (https://github.com/cestlaz/hugo-blog-generator). I got just this one warning:

go-org: Could not parse token org.token{kind:"beginBlock", lvl:0, content:"COMMENT", matches:[]string{"#+BEGIN_COMMENT", "", "COMMENT", ""}}: Falling back to treating it as plain text.

Looks like you just need to support #+begin_comment .. #+end_comment.

Awesome job!

Copy link
Contributor

@kaushalmodi kaushalmodi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good except for that one minor feature request.

@niklasfasching
Copy link
Contributor Author

niklasfasching commented Jan 4, 2019

Looks like you just need to support #+begin_comment .. #+end_comment.

Awesome, thanks for testing it out :)!

Regarding the warnings:
I cloned the repo and ran it locally. I got 2 warnings:

$ go run ~/go/src/github.com/gohugoio/hugo/ 

Building sites … 

WARN 2019/01/04 19:19:02 Could not parse token org.token{kind:"beginBlock", lvl:0, content:"SRC", matches:[]string{"#+BEGIN_SRC emacs-lisp", "", "SRC", " emacs-lisp"}}: Falling back to treating it as plain text.
go-org: Could not parse token org.token{kind:"beginBlock", lvl:0, content:"COMMENT", matches:[]string{"#+BEGIN_COMMENT", "", "COMMENT", ""}}: Falling back to treating it as plain text.

WARN 2019/01/04 19:19:03 Could not parse token org.token{kind:"endBlock", lvl:0, content:"COMMENT", matches:[]string{"#+END_COMMENT", "", "COMMENT"}}: Falling back to treating it as plain text.
  1. Warning 1 is an error in the source file: The last source block contains an unclosed {{< highlight "emacs-lisp" >}} (can't link to the line as github renders it, just look at the last src block).
    The unclosed highlight tag results in the rest of the file getting highlighted and go-org can't find an #+END_SRC line because there isn't one (rather, there is a line with <span style="color:#960050;background-color:#1e0010">#</span>+End_SRC. [1]

  2. Warning 2 is due to the comment block starting at the top of the file. Hugo sees the #+BEGIN_COMMENT line and thinks it's Org mode front matter and removes it from the document that get's send to go-org (only the non-front matter part of the document gets rendered). So... not really a markup error but nothing we can do much about I think.

To summarize, as far as i can tell, neither is a bug in go-org. The first warning is due to a markup error, the second warning is due to a shortcoming in how Org mode front matter is recognized in hugo [2].

[1] I don't know how the blog is rendered as it is at https://cestlaz-nikola.github.io, but using hugo with goorgeous actually swallows everything after the opening tag of the source block in question. If you want to try it out yourself the path to the post is /posts/using-emacs-30-elfeed-2/

[2] Hugo treats lines starting with #+ as Org mode front matter (link). It would be more robust to check whether the line is a keyword with a regexp like this but that adds quite a bit of complexity I'm not sure is worth it.

@kaushalmodi
Copy link
Contributor

Hugo treats lines starting with #+ as Org mode front matter (link). It would be more robust to check whether the line is a keyword with a regexp like this but that adds quite a bit of complexity I'm not sure is worth it.

I would have thought that #+begin_comment .. #+end_comment would be treated similar to how you would treat #+begin_src .. .. #+end_src, and such. Is that not the case?

@niklasfasching
Copy link
Contributor Author

niklasfasching commented Jan 4, 2019

I'm not sure if I understand you correctly, could you expand on that?

Maybe there's a misunderstanding: What you quote was meant to be about how hugo treats lines starting with #+, not about go-org.

The warning in question is generated by a file starting with

#+BEGIN_COMMENT
[...]
#+END_COMMENT

Hugo thinks the first line is front matter (as it starts with #+) and removes it from the content of the file.
go-org receives just

[...]
#+END_COMMENT

As there's no #+BEGIN_COMMENT line, it logs a warning about an out of place #+END_COMMENT line.

PS: This did point to a bug in my integration of go-org into hugo - while i did set the logger to jww for the content rendering, i did not do so for the front matter parsing. Due to this there's also a log message from go-org not prefixed by WARN. Will fix.

@kaushalmodi
Copy link
Contributor

I'm not sure if I understand you correctly, could you expand on that?

Maybe there's a misunderstanding: What you quote was about how hugo treats lines starting with #+, not about what go-org does.

Yes, I quoted correctly. Sorry for not being clear.

I meant to ask what hugo does with #+begin_src blocks (because those also begin with #+ but are not confused as keywords). How do hugo and go-org interact with those blocks so that the code blocks are rendered correctly? Can the same be achieved for #+begin_comment blocks?

@niklasfasching
Copy link
Contributor Author

niklasfasching commented Jan 4, 2019

Thx, got it now :).

Hugo treats the #+BEGIN_COMMENT line as front matter because it starts with #+ and is at the top of the file (i.e. there's no non-empty lines that don't start with #+ ahead of it).
A #+BEGIN_SRC line at the top of the file would also be treated as front matter.

It's just about where in the file the line is, nothing specific about the #+BEGIN_COMMENT.

The fix for the above case is simply

foobar
the above line ends the front matter section / starts the content section because it 1. is not empty and 2. does not start with #+. the below will now work.
#+BEGIN_COMMENT
[...]
#+END_COMMENT

@kaushalmodi
Copy link
Contributor

kaushalmodi commented Jan 4, 2019

It's just about where in the file the line is, nothing specific about the #+BEGIN_COMMENT.

Understood. Thanks. That means that #+begin_comment mid post should work as expected then?

Update: Answering myself to the above question: Yes, based on your edit of your comment above.


This is a separate issue then for hugo.. it should consider #+.. as Org keywords only if #+ is not followed by #+begin_ or #+end_ (case-insensitive). That is not 100% accurate, but that will cover a lot of cases.

Copy link
Contributor

@kaushalmodi kaushalmodi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@niklasfasching
Copy link
Contributor Author

niklasfasching commented Jan 4, 2019

Understood. Thanks. That means that #+begin_comment mid post should work as expected then?

👍

This is a separate issue then for hugo

👍

Thanks for taking the time to review this :)

PS: The last force push just fixes the front matter logging & updates go-org (no functional changes, just refactoring)

@kaushalmodi
Copy link
Contributor

kaushalmodi commented Jan 4, 2019

This is a separate issue then for hugo.. it should consider #+.. as Org keywords only if #+ is not followed by #+begin_ or #+end_ (case-insensitive). That is not 100% accurate, but that will cover a lot of cases.

You are officially the most qualified person knowing both Go and Org. Do you think you can tackle fixing that in hugo in a separate PR?

@niklasfasching
Copy link
Contributor Author

niklasfasching commented Jan 4, 2019

Do you think you can tackle fixing that in hugo in a separate PR?

I have it on my todo list (along with #5436) :).

@kaushalmodi
Copy link
Contributor

@bep Do you have further comments on this PR?

@kaushalmodi
Copy link
Contributor

@bep Another tiny nudge to review this.

@niklasfasching niklasfasching force-pushed the improve-org-mode-support branch 2 times, most recently from d30c4ef to 4023f82 Compare March 9, 2019 14:24
Sadly, goorgeous has not been updated in over a year and still has a lot of
open issues (e.g. no support for nested lists).

go-org fixes most of those issues and supports a larger subset of Org mode
syntax.
Hugo requires some front matter values to be arrays (e.g. for taxonomies).
Org mode front matter syntax (`#+KEY: VALUE`) does however not support anything
but string values normally - which is why goorgeous hardcoded the values for
the keys tags, categories & aliases to be parsed as string arrays. This causes
problems with custom taxonomies.

A simple thing we can do instead is make keywords ending in '[]' be parsed as
string arrays.
@niklasfasching
Copy link
Contributor Author

@bep is there anything specific blocking this? i can totally understand if this just doesn't have priority, just stumbled across this again by chance :)

@bep
Copy link
Member

bep commented Jun 8, 2019

@niklasfasching really sorry about this. I was reminded about this when I got some version issues with that other library.

@bep bep merged commit fad183c into gohugoio:master Jun 8, 2019
@niklasfasching
Copy link
Contributor Author

No worries, thanks for all your work on hugo :)

@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants