proposal: x/tools/cmd/godoc: GORDO enriched Go documentation format. #35947

Open
ohir opened this issue Dec 3, 2019 · 4 comments

@ohir ohir commented Dec 3, 2019

Proposal: GORDO enriched Go documentation format.

Author: Ohir Ripe [Wojciech S. Czarnecki]

Last updated: 2019/12/03

Discussion at https://golang.org/issue/35947

Related to: #7873, #16666, #35896 and other "rich format please" issues.

Abstract

GORDO (dʒɔrˈdo) stands for GO Rich DOcs

I propose using ´gordo´ annotations within ¨Go¨ source documentation. Gordo enrichment of the source text is unobtrusive even if read in the ˉraw formˉ. Did you notice the gordo annotations? This would render as:

I propose using gordo annotations within Go source documentation. Gordo enrichment of the source text is unobtrusive even if read in the raw form.

Background

The current state of Go's source documentation processing is good enough for documenting individual entities, i.e. functions, variables, and constants. It falls short when one must convey a new idea or an unobvious implementation of an algorithm, or even just describe a sequence of events (there are no lists, sadly).

Proposal

I propose extending godoc processing with a gordo annotation parser that implements both console and html output of the format described below.

::: Gordo specification :::

 Styling:
 
 ˚  U+02DA ringabove  ˚dismiss / back to normal
 ´  U+00B4 acute      ´italics´       ´italics˙   𝑖𝑡𝑎𝑙𝑖𝑐𝑠           
 ¨  U+00A8 diaeresis  ¨bold¨             ¨bold˙   𝐛𝐨𝐥𝐝             
 ˘  U+02D8 breve      ˘ibold˘    ˘bold+italics˙   𝒃𝒐𝒍𝒅-𝒊𝒕𝒂𝒍𝒊𝒄𝒔       
 ˉ  U+00AF macron     ˉfixedˉ           ˉfixedˉ   fixed width span
 «» guillemets        «notable or related text»   p͜a͜y͜ ͜a͜t͜t͜e͜n͜t͜i͜o͜n͜ span

An emphasis (styled text) begins after an acute, diaeresis, or breve
character that is not followed by a ringabove. It ends at its own stop
character, at the start of a different emphasis (a breve, acute, or
diaeresis of another kind), at a macron, at a left guillemet, or at a
ringabove "dismiss" character. The 'fixed' and 'notable' spans begin and
end only with their respective special characters, so the other three
emphases can be used inside them. An empty line ends all running
emphases and spans.

Editing software may apply styles while keeping the syntax visible.
In the final form a style is applied and syntax characters are hidden.
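To make the emphasis rules concrete, here is a minimal Go sketch of an html renderer for the three emphases and the fixed-width span. It is an illustration under stated assumptions, not a reference implementation: the name `renderEmphasis` and the chosen tags (`<i>`, `<b>`, `<b><i>`, `<code>`) are hypothetical, notable spans, lists, links, and refids are not handled, and a left guillemet merely ends a running emphasis. The fixed-span delimiter is the ˉ character as it appears in this proposal's examples.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

const (
	dismiss = '˚' // U+02DA ring above
	fixedCh = 'ˉ' // fixed-width span delimiter, as written in this proposal
	lguil   = '«' // U+00AB left guillemet
)

// Hypothetical html tags for the three emphases.
var openTag = map[rune]string{'´': "<i>", '¨': "<b>", '˘': "<b><i>"}
var closeTag = map[rune]string{'´': "</i>", '¨': "</b>", '˘': "</i></b>"}

// isSpecial reports whether r is one of the 11 gordo special characters.
func isSpecial(r rune) bool { return strings.ContainsRune("˚´¨˘ˉ«»þ•¶§", r) }

// renderEmphasis turns gordo emphasis and fixed-width markup into html.
func renderEmphasis(src string) string {
	rs := []rune(src)
	var b strings.Builder
	var emph rune // running emphasis character, 0 if none
	fixed := false
	closeEmph := func() {
		if emph != 0 {
			b.WriteString(closeTag[emph])
			emph = 0
		}
	}
	closeFixed := func() {
		if fixed {
			b.WriteString("</code>")
			fixed = false
		}
	}
	for i := 0; i < len(rs); i++ {
		r := rs[i]
		// Escape: a special character immediately followed by a dismiss is ordinary.
		if isSpecial(r) && i+1 < len(rs) && rs[i+1] == dismiss {
			b.WriteString(html.EscapeString(string(r)))
			i++
			continue
		}
		switch r {
		case '´', '¨', '˘':
			own := emph == r
			closeEmph() // its own stop, or another emphasis start, ends the running one
			if !own {
				emph = r
				b.WriteString(openTag[r])
			}
		case dismiss:
			if emph == 0 {
				b.WriteRune(r) // nothing to dismiss: ordinary character
			}
			closeEmph()
		case fixedCh:
			closeEmph()
			if fixed {
				closeFixed()
			} else {
				b.WriteString("<code>")
				fixed = true
			}
		case lguil:
			closeEmph() // a left guillemet ends an emphasis; kept as text here
			b.WriteRune(r)
		case '\n':
			if i+1 < len(rs) && rs[i+1] == '\n' { // an empty line ends everything
				closeEmph()
				closeFixed()
			}
			b.WriteRune(r)
		default:
			b.WriteString(html.EscapeString(string(r)))
		}
	}
	closeEmph()
	closeFixed()
	return b.String()
}

func main() {
	fmt.Println(renderEmphasis("a ´gordo´ note, ¨bold˚ and ˉfixed ˘mixed˘ˉ text"))
}
```

Closing any running emphasis before a fixed span opens or closes keeps the generated tags properly nested.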


::: Lists ::: 

 •  U+2022 bullet       • bulleted list item 
 •a                     • lettered list item
 •1                     • numbered list item

List items must be given without blank lines in between.
A list ends at an empty line, like any other gordo-introduced styling.

 Godoc considerations:

 • List items should be recognized as such even if user-indented.
   They cannot be code, as • cannot possibly open a line of
   valid Go source (if it could, it would need to be escaped anyway).
 • Terminal output should impose uniform indentation of lists.
 • Gofmt may impose uniform indentation of consecutive list
   items, as godoc itself does not allow nested lists.
   (Other gordo processors may allow nesting, though.)
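A list-aware console renderer might recognize and number items roughly as follows. This Go sketch is hypothetical: it assumes every marker (•, •a, •1) is followed by a space, that a change of marker kind starts a new count, and that a two-space indent is the uniform indentation. Item continuation lines are passed through unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// listMarker recognizes a gordo list-item lead at the start of a line,
// ignoring user indentation. It returns the item kind ('•' bulleted,
// 'a' lettered, '1' numbered) and the item text.
// Assumption: a marker must be followed by a space, so the TOC digraphs
// •þ, •¶, •§ and the •˚ escape are not list items.
func listMarker(line string) (kind rune, text string, ok bool) {
	s := strings.TrimLeft(line, " \t")
	if !strings.HasPrefix(s, "•") {
		return 0, "", false
	}
	rest := strings.TrimPrefix(s, "•")
	kind = '•'
	if strings.HasPrefix(rest, "a ") || strings.HasPrefix(rest, "1 ") {
		kind = rune(rest[0])
		rest = rest[1:]
	}
	if !strings.HasPrefix(rest, " ") {
		return 0, "", false
	}
	return kind, strings.TrimSpace(rest), true
}

// renderListsConsole numbers or letters items and applies a uniform
// two-space indentation.
func renderListsConsole(src string) string {
	var out []string
	n := 0        // position within the current list
	var prev rune // kind of the previous item, 0 at a list start
	for _, line := range strings.Split(src, "\n") {
		if strings.TrimSpace(line) == "" {
			n, prev = 0, 0 // an empty line ends the list
		}
		kind, text, ok := listMarker(line)
		if !ok {
			out = append(out, line) // ordinary line or item continuation
			continue
		}
		if kind != prev {
			n, prev = 0, kind // assumption: a new marker kind restarts the count
		}
		n++
		lead := "•"
		switch kind {
		case '1':
			lead = fmt.Sprintf("%d.", n)
		case 'a':
			lead = fmt.Sprintf("%c.", 'a'+n-1) // sketch: no wrap-around after 'z'
		}
		out = append(out, "  "+lead+" "+text)
	}
	return strings.Join(out, "\n")
}

func main() {
	fmt.Println(renderListsConsole("•1 first\n    •1 second\n\n• plain\n•a x\n•a y"))
}
```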


::: Structure :::

 ¶ U+00B6 pilcrow        quotable section head   ¶(refid)   // 
 § U+00A7 paragraph      quotable paragraph lead §(refid)   //
 » U+00BB rguillemet             « quotable note »(refid)   // 

 «( U+00AB lguillemet    quote a refid here «(refid)        // also: 
                       «(quote adding no quote characters)  // «(refid)
                       «"quote in double quotation marks"   // «"refid"
                       «'quote in single quotation marks'   // «'refid'
                       «[quote in brackets]                 // «[refid]
                       «/quote in slashes/                  // «/refid/
                       «|quote in bars|                     // «|refid|
                       «.|just a notable span|»  // dot escapes the bar

The «(refid) "quote an internal link" token always outputs its target's
text. On the console, the text is followed by the refid in parentheses.
Html and info versions instead turn the quoted text into a link to its
place of origin. For example, the source:

    Annolex Editor  ¶(Sect.2)
    ... Please read «"Sect.2" for the primer. 

 should output on the console:

    Annolex Editor (Sect.2)
    ... Please read "Annolex Editor" (Sect.2) for the primer.

 but in html it is expected to output a link:

    ✻ Annolex Editor
    ... Please read "͟A͟n͟n͟o͟l͟e͟x͟ ͟E͟d͟i͟t͟o͟r" for the primer.

The document author is expected to keep ´refids´ both short ¨and¨ meaningful.
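The console behaviour of internal links could be sketched in Go with two regular expressions from the standard `regexp` package. Assumptions: anchors sit at the end of their line, as in the Annolex example; only ¶ and § anchors are collected (»(refid) notes are ignored); an unknown refid is left untouched. The name `renderRefsConsole` is illustrative only.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// anchorRe matches a line ending in a ¶(refid) or §(refid) anchor.
var anchorRe = regexp.MustCompile(`(?m)^(.*?)[ \t]*[¶§]\(([^)]*)\)[ \t]*$`)

// quoteRe matches the six quote forms «(..) «".." «'..' «[..] «/../ «|..|.
var quoteRe = regexp.MustCompile(`«(?:\([^)]*\)|"[^"]*"|'[^']*'|\[[^\]]*\]|/[^/]*/|\|[^|]*\|)`)

// wrap gives the quote characters put around the target text on the console.
var wrap = map[byte][2]string{
	'(': {"", ""}, '"': {`"`, `"`}, '\'': {"'", "'"},
	'[': {"[", "]"}, '/': {"/", "/"}, '|': {"|", "|"},
}

func renderRefsConsole(src string) string {
	targets := map[string]string{} // refid -> quotable text
	for _, m := range anchorRe.FindAllStringSubmatch(src, -1) {
		targets[m[2]] = strings.TrimSpace(m[1])
	}
	out := anchorRe.ReplaceAllString(src, "$1 ($2)") // heading keeps its refid
	return quoteRe.ReplaceAllStringFunc(out, func(tok string) string {
		n := len("«") // the ASCII opener follows the 2-byte «
		refid := tok[n+1 : len(tok)-1]
		text, ok := targets[refid]
		if !ok {
			return tok // unknown refid: leave the token visible
		}
		w := wrap[tok[n]]
		return w[0] + text + w[1] + " (" + refid + ")"
	})
}

func main() {
	src := "Annolex Editor  ¶(Sect.2)\n... Please read «\"Sect.2\" for the primer."
	fmt.Println(renderRefsConsole(src))
}
```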


::: Making a Table of Contents :::

Console output, which does not support hypertext navigation, elides
the TOC part. Otherwise (as for the html and info output formats), TOC
entries are made of quoted text and link to the respective points of
the document. Quotable text may also link back to its TOC entry.

In an autogenerated TOC the ¶ and § references are quoted and styled as
"TOC-Section" and "TOC-Subsection", respectively. Any quotable portion
of the document can also be listed in a TOC manually:

            // toc entry from:
 «þ 'refid' // a quotable note,  "TOC-inlined" style
 •þ 'refid' // a quotable note,  "TOC-item" style (listed)
 «¶ 'refid' // a section head,   "TOC-inlined" style
 •¶ 'refid' // a section head,   "TOC-Section" style (listed)
 «§ 'refid' // a paragraph lead, "TOC-inlined" style
 •§ 'refid' // a paragraph lead, "TOC-Subsection" style (listed)
            // apostrophes can be omitted if refid contains no spaces.
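Parsing a manual TOC line amounts to a lookup on its digraph. A minimal Go sketch, assuming a hypothetical `parseTOCEntry` helper that returns the style name from the table above and the refid, where an unquoted refid ends at the first space:

```go
package main

import (
	"fmt"
	"strings"
)

// tocStyle maps each TOC digraph to the style named in the table above.
var tocStyle = map[string]string{
	"«þ": "TOC-inlined", "•þ": "TOC-item",
	"«¶": "TOC-inlined", "•¶": "TOC-Section",
	"«§": "TOC-inlined", "•§": "TOC-Subsection",
}

// parseTOCEntry reads a manual TOC line such as •¶ 'Sect.2' or «§ intro.
func parseTOCEntry(line string) (style, refid string, ok bool) {
	rs := []rune(strings.TrimSpace(line))
	if len(rs) < 3 {
		return "", "", false
	}
	if style, ok = tocStyle[string(rs[:2])]; !ok {
		return "", "", false
	}
	rest := strings.TrimSpace(string(rs[2:]))
	if strings.HasPrefix(rest, "'") {
		end := strings.Index(rest[1:], "'")
		if end < 0 {
			return "", "", false // unterminated quoted refid
		}
		return style, rest[1 : 1+end], true
	}
	if f := strings.Fields(rest); len(f) > 0 {
		return style, f[0], true // unquoted refid: no spaces allowed
	}
	return "", "", false
}

func main() {
	for _, l := range []string{" •¶ 'Sect.2'", "«§ intro", "•þ 'two words'", "• item"} {
		style, refid, ok := parseTOCEntry(l)
		fmt.Printf("%q -> %q %q %v\n", l, style, refid, ok)
	}
}
```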


::: External links :::

 »þ               « link description »þ          // text description of 
 þ  U+00FE thorn    þ somesite.tld/path/tolink   // an url listed below

External links are introduced by a « note ending in a »þ digraph.
The url path, without the protocol, must be given on the following line,
prefixed with a þ (usually indented). If more than one »þ is present in
a line, their respective url paths are given on separate lines below:

  in our «IEEE-ITSS Open Journal »þ and also on « our faculty »þ site.
     þ www.ieee-itss.org/oj-its
     þ www.ivt.ethz.ch

The final form of the output, including the hypertext protocol used, is
defined by the gordo processor. This specification only mandates that
the plain text renderer, if used at all, removes gordo special characters
and any superfluous space left after this removal, including spaces
following the « of a notable or link description span. Also, links
rendered under the sentence should be given numeric indices and be
prefixed with a protocol:

  in our IEEE-ITSS Open Journal¹ and also on our faculty² site.
     ¹ https://www.ieee-itss.org/oj-its
     ² https://www.ivt.ethz.ch

Up to three external links can be referenced in a single source line,
as the WGL4 set provides only the ¹, ², and ³ superscript digits. If a
single line contains more references, the processor may insert an error
message into all outputs.
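The plain-text link rendering, including the three-link limit, could look like the following Go sketch. It assumes the processor chose https as the protocol, that every « on a line containing »þ opens a link description (notable spans on such lines are not handled), and that the þ lines follow in the same order as the »þ digraphs. `renderLinksPlain` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var superscripts = []string{"¹", "²", "³"} // the only superscript digits in WGL4

var (
	descOpen  = regexp.MustCompile(`«[ \t]*`)  // « plus the spaces after it
	descClose = regexp.MustCompile(`[ \t]*»þ`) // spaces before »þ plus the digraph
)

// renderLinksPlain renders »þ external links as plain text with numbered url lines.
func renderLinksPlain(src string) (string, error) {
	lines := strings.Split(src, "\n")
	var out []string
	for i := 0; i < len(lines); i++ {
		line := lines[i]
		n := strings.Count(line, "»þ")
		if n == 0 {
			out = append(out, line)
			continue
		}
		if n > len(superscripts) {
			return "", fmt.Errorf("line %d: %d external links, at most %d allowed", i+1, n, len(superscripts))
		}
		k := 0
		line = descClose.ReplaceAllStringFunc(line, func(string) string {
			k++
			return superscripts[k-1]
		})
		out = append(out, descOpen.ReplaceAllString(line, ""))
		for j := 0; j < n; j++ { // one þ url line per »þ, in order
			i++
			if i >= len(lines) {
				return "", fmt.Errorf("missing þ url line for link %d", j+1)
			}
			body := strings.TrimLeft(lines[i], " \t")
			if !strings.HasPrefix(body, "þ") {
				return "", fmt.Errorf("line %d: expected a þ url line", i+1)
			}
			indent := lines[i][:len(lines[i])-len(body)]
			url := strings.TrimSpace(strings.TrimPrefix(body, "þ"))
			out = append(out, indent+superscripts[j]+" https://"+url)
		}
	}
	return strings.Join(out, "\n"), nil
}

func main() {
	src := "  in our «IEEE-ITSS Open Journal »þ and also on « our faculty »þ site.\n" +
		"     þ www.ieee-itss.org/oj-its\n" +
		"     þ www.ivt.ethz.ch"
	text, err := renderLinksPlain(src)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(text)
}
```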


::: Escapes :::
 
Every special gordo character becomes ordinary if it is immediately
followed by a dismiss. The ˚ dismiss itself is ordinary where it has
nothing to dismiss. The ¶, §, þ, and • characters are ordinary outside
a valid digraph or position. Gordo characters escaped:

        ˚˚   ´˚   ¨˚   ˘˚   ˉ˚   «˚   »˚   þ˚   •˚   ¶˚   §˚   
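Because every special character can be made ordinary with a trailing dismiss, a tool that inserts arbitrary text into gordo documentation could escape it mechanically. This Go sketch uses hypothetical `Escape` and `Unescape` helpers; `Unescape` only removes escaping dismisses and is not a full renderer.

```go
package main

import (
	"fmt"
	"strings"
)

const dismiss = '˚' // U+02DA ring above

// specials holds the 11 gordo special characters.
const specials = "˚´¨˘ˉ«»þ•¶§"

// Escape makes every gordo special character in s ordinary by
// following it with an immediate dismiss.
func Escape(s string) string {
	var b strings.Builder
	for _, r := range s {
		b.WriteRune(r)
		if strings.ContainsRune(specials, r) {
			b.WriteRune(dismiss)
		}
	}
	return b.String()
}

// Unescape keeps each escaped special character and drops the dismiss
// that follows it; unescaped markup is left as is.
func Unescape(s string) string {
	rs := []rune(s)
	var b strings.Builder
	for i := 0; i < len(rs); i++ {
		b.WriteRune(rs[i])
		if strings.ContainsRune(specials, rs[i]) && i+1 < len(rs) && rs[i+1] == dismiss {
			i++ // skip the escaping dismiss
		}
	}
	return b.String()
}

func main() {
	e := Escape("§ 2.1 «see» ¶")
	fmt.Println(e, "->", Unescape(e))
}
```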


::: Writing annotations :::

All gordo special characters are from the common WGL4 set, which is
guaranteed to have screen representations on all major OSes. Gordo special
characters can be typed using either the US (macOS) or the US-Intl (Windows
US-International) keyboard layout. See Appendix A, "US keyboards".

But there are some 1000 keyboard layouts defined for hundreds of languages
using tens of scripts, and this number is multiplied by hundreds of custom
configurations for tens of editors. In most of these combinations, one, two,
or more gordo characters might not be at the user's fingertips.

Yet gordo can be entered easily using any combination of layout and editor.

The GORDO environment variable contains the translation table that the
formatting processor (gofmt for Go) uses. In it, each user-provided
character (on the left) is paired with a gordo character (on the right).
Eleven space-separated pairs form the table that the formatter uses to
translate annotations from the user-configured characters to gordo's
canonical form. GORDO should be set locally by each user, to suit her
own convenience.

GORDO allows a twelfth table position: a "lead". If the lead position is
not empty, the formatter treats as special (and then translates) only a
character that comes right after the configured lead. This enables
abbreviations that vary only in the last position, where all lead and
surrogate characters can be in the user's script, or can even all be
ASCII:  GORDO=',˚ /´ =¨ +˘ |ˉ <« >» o• Lþ p§ s¶ ,,'.
With the above GORDO, the typed [,,|Wfix,,, and ,,=Bold,,,] will read
[ˉWfix˚ and ¨Bold˚] after a write and format.
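A formatter's translation step might be sketched in Go as below. `ParseGORDO`, `Table`, and `Translate` are hypothetical names, not part of gofmt; the sketch reads the GORDO variable with `os.Getenv` and falls back to the ASCII example above. The doubled-surrogate corner case described below is deliberately not handled.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Table is a parsed GORDO value: user characters mapped to canonical
// gordo characters, plus an optional twelfth "lead" entry.
type Table struct {
	m    map[rune]rune
	lead []rune
}

// ParseGORDO reads 11 space-separated two-character pairs and an optional lead.
func ParseGORDO(v string) (Table, error) {
	f := strings.Fields(v)
	if len(f) != 11 && len(f) != 12 {
		return Table{}, fmt.Errorf("GORDO: want 11 pairs and an optional lead, got %d fields", len(f))
	}
	t := Table{m: make(map[rune]rune)}
	for _, p := range f[:11] {
		rs := []rune(p)
		if len(rs) != 2 {
			return Table{}, fmt.Errorf("GORDO: %q is not a two-character pair", p)
		}
		t.m[rs[0]] = rs[1]
	}
	if len(f) == 12 {
		t.lead = []rune(f[11])
	}
	return t, nil
}

// hasPrefix reports whether rs starts with p.
func hasPrefix(rs, p []rune) bool {
	if len(rs) < len(p) {
		return false
	}
	for i := range p {
		if rs[i] != p[i] {
			return false
		}
	}
	return true
}

// Translate rewrites user-typed surrogates into canonical gordo characters.
// With a lead, only a character right after the lead is translated.
func (t Table) Translate(s string) string {
	rs := []rune(s)
	var b strings.Builder
	for i := 0; i < len(rs); i++ {
		if len(t.lead) > 0 {
			j := i + len(t.lead)
			if hasPrefix(rs[i:], t.lead) && j < len(rs) {
				if g, ok := t.m[rs[j]]; ok {
					b.WriteRune(g)
					i = j // skip the lead and the translated character
					continue
				}
			}
			b.WriteRune(rs[i])
			continue
		}
		if g, ok := t.m[rs[i]]; ok {
			b.WriteRune(g)
		} else {
			b.WriteRune(rs[i])
		}
	}
	return b.String()
}

func main() {
	v := os.Getenv("GORDO")
	if v == "" {
		v = ",˚ /´ =¨ +˘ |ˉ <« >» o• Lþ p§ s¶ ,," // the ASCII example above
	}
	tbl, err := ParseGORDO(v)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(tbl.Translate(",,|Wfix,,, and ,,=Bold,,,"))
}
```

With a lead, ordinary punctuation such as a lone comma passes through untouched, which is what makes all-ASCII surrogates practical.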

Default per-OS maps serve stock US/US-Intl layouts of respective major OSes:

  GORDO='˚˚ ´´ ¨¨ ˘˘ ˉˉ «« »» •• þþ §§ ¶¶' # linux has user defined layouts
  GORDO='°˚ ´´ ¨¨ ˘˘ ˉˉ «« »» •• …þ §§ ¶¶' # macOS substitutes … with þ
  GORDO='°˚ ´´ ¨¨ ‘˘ ¦ˉ «« »» ¤• þþ §§ ¶¶' # MSwin substs: ‘˘, ¦ˉ, and ¤•

The only corner case of gordo escaping arises for a dismiss surrogate,
which must be doubled to be ordinary. Hence, to output an ordinary lone
ringabove character, four dismiss surrogates in a row must be used.

Note that the ° degree sign is mapped to the dismiss character in the
default mac/win maps, so on both OSes degrees are written doubled, if
desired: °°K, °°C, °°F, and nnn°°.


::: Appendix A. US keyboards :::

 Stock US layouts mappings:
                           Opt/AltGr + MacOS  MsWin  Unix (proposed/user)
 ˚ U+02DA ringabove   ˚dismiss       ‖ °map ‖ °map ‖    / ‖ 
 ´ U+00B4 acute       ´italics´      ‖ sh E ‖    ' ‖    , ‖  𝑖𝑡𝑎𝑙𝑖𝑐𝑠
 ¨ U+00A8 diaeresis   ¨bold¨         ‖ sh U ‖ sh ' ‖    . ‖  𝐛𝐨𝐥𝐝
 ˘ U+02D8 breve       ˘ibold˘        ‖ sh . ‖ ‘map ‖    m ‖  𝒃𝒐𝒍𝒅-𝒊𝒕𝒂𝒍𝒊𝒄𝒔
 ˉ U+00AF macron      ˉfixedˉ        ‖ sh , ‖ ¦map ‖    - ‖  fixed width
 « U+00AB lguillemet  «important»    ‖    \ ‖    [ ‖ sh < ‖  a͜t͜t͜e͜n͜t͜i͜o͜n͜
 þ U+00FE thorn       þ linkurl      ‖ …map ‖    t ‖    w ‖  ͟l͟i͟n͟k͟
 • U+2022 bullet      • list item    ‖    8 ‖ ¤map ‖    0 ‖  • bulleted item
 ¶ U+00B6 pilcrow     ¶(refid)       ‖    7 ‖ sh ; ‖ sh P ‖  section anchor  
 § U+00A7 paragraph   §(refid)       ‖    6 ‖ sh S ‖    p ‖  paragraph anchor
 » U+00BB rguillemet  »(refid)       ‖ sh \ ‖    ] ‖ sh > ‖  notable anchor 

The MacOS and MsWin columns of the table above show where gordo characters
are on the stock US/US-Intl layouts. Entries marked "map" show the default
surrogate characters of the respective platform, entered with these chords:

  mac:  Option   ; ‖  … to þ  ͟l͟i͟n͟k͟ 
  mac:  Option   k ‖  ° to ˚  dismiss 
  win:  AltGr sh : ‖  ° to ˚  dismiss 
  win:  AltGr    9 ‖  ‘ to ˘  𝒃𝒐𝒍𝒅-𝒊𝒕𝒂𝒍𝒊𝒄𝒔
  win:  AltGr sh \ ‖  ¦ to ˉ  fixed width
  win:  AltGr    4 ‖  ¤ to •  bulleted item
  
The third column of the table above shows ¨proposed¨ bindings for Linux/Unix
OSes. Unlike Windows and Mac, Linux gives the user full control over her
keyboard: she may add, remove, or change the produced characters at will, and
have the change picked up by the X GUI and terminal immediately. (Almost)
every key may produce up to four additional characters in a chord. With
Compose key sequences, a user may type several thousand more Unicode runes.


::: Appendix B. All gordo recognized "specials" :::

 11 characters (shown escaped):  ˚˚  ´˚  ¨˚  ˘˚  ˉ˚  «˚  »˚  þ˚  •˚  ¶˚  §˚

 19 digraphs:
  »þ                 // close a « link description »þ
  •a •1              // •1 numbered and •a lettered list item lead
  «þ •þ «¶ •¶ «§ •§  // make a TOC entry linked to given quotable text place
  »( §( ¶(           // provide refid string, for a span, paragraph, section
  «( «" «' «[ «/ «|  // link and quote refid here (a quotable piece of text)
  «.                 // just a notable span (starting with above quote char)


Rationale

Documentation that can be styled, even with only bold and italics, and structured to fit the domain may help package authors be more precise and unambiguous, and help documentation consumers avoid misunderstandings.

A gordo-enabled godoc may encourage well-structured documentation written into the program sources even, or especially, for the most sophisticated ideas, solutions, and code. Currently, packages of only moderate complexity often resort to external descriptions of their API.

Gordo parsing is fast and introduces no ambiguities.

Unlike Markdown, which makes raw annotated text almost unreadable, gordo annotations are barely noticeable unless the reader is deliberately scanning for formatting hints.

Compatibility

This proposal extends the documentation source syntax, and the methods for parsing it, in a way that does not influence any program source but, in theory, might alter the visible html output of some existing documentation.

Even if this happened, such a change would likely be limited to font decoration or size and would not affect the meaning.

Implementation

Enabling gordo annotations would need support from both gofmt and godoc. While implementing the basic formatting could be trivial, the real power of the proposed format and methods lies in making documentation both easy to skim on the console and usable as an interactive manual in the browser. The latter also requires working internal links between the "quotable" and "quote" places. Implementing this might need more resources.

@gopherbot gopherbot added this to the Unreleased milestone Dec 3, 2019
@gopherbot gopherbot added the Tools label Dec 3, 2019
@taruti taruti commented Dec 4, 2019

There are tons of readily available lightweight markup syntaxes (markdown, asciidoc, reStructuredText, Textile, ...). Why are you proposing yet another markup language?

This seems hard to type. And having to type different things on different operating systems that are translated to various symbols (with a per os GORDO environment variable) seems like a bad idea.

Also using accents in formatting does not make the documents very readable in my personal opinion.

@cagedmantis cagedmantis changed the title x/tools/cmd/godoc: GORDO enriched Go documentation format. proposal: x/tools/cmd/godoc: GORDO enriched Go documentation format. Dec 4, 2019
@gopherbot gopherbot added the Proposal label Dec 4, 2019
@rsc rsc commented Dec 4, 2019

Go docs are meant to be unobtrusive plain text. Obscure Unicode markup does not count as plain text. When reading your example, I did notice the "gordo annotations", but I thought something was wrong with the browser's text rendering. That's not a good thing for documentation.

If we add any more support, it is most likely going to be using a very limited subset of Markdown, like maybe just adopting one bullet list syntax. Even that is still a ways down the priority list though.

@rsc rsc added this to Incoming in Proposals Dec 4, 2019
@ohir ohir commented Dec 5, 2019

@rsc

> Go docs are meant to be unobtrusive plain text.

Gordo is meant to keep Go docs unobtrusive plain text.

> Obscure Unicode

All characters used in gordo came with DEC's then brand-new VT100 terminal unit in 1983, thirty-six years ago. I used this set in 1989 software, and these characters were available on the dated daisy-wheel printers my first client had back then.

> Obscure

Used daily with Latin letters by a billion people or more.

> Unicode markup does not count as plain text. When reading your example, I did notice the "gordo annotations", but I thought something was wrong with the browser's text rendering.

These will not render in the browser. They are visible only in the source, and there they are the least obtrusive. Please click through the raw button.

> If we add any more support, it is most likely going to be using a very limited subset of Markdown,

Do **bold**, _italics_, **_bold-italics_**, and lists introduced by significant whitespace really let one make better sense of the words than ¨´˘ with a space under?


@taruti

> There are tons of readily available lightweight markup syntaxes (markdown, asciidoc, reStructuredText, Textile, ...). Why are you proposing yet another markup language?

Because other markups are obtrusive for anyone who reads them in the source.

markdown source:
this version uses the [**Atkin**](https://fylux.github.io/2017/03/16/Sieve-Of-Atkin/) sieve
instead of previously used [**Pritchard's wheel**](https://link.springer.com/article/10.1007/BF00264164) one.

gordo source:
this version uses the «¨Atkin¨»þ sieve instead of previously used «¨Pritchard's wheel¨»þ one.
    þ fylux.github.io/2017/03/16/Sieve-Of-Atkin/
    þ link.springer.com/article/10.1007/BF00264164

markdown renders:
this version uses the Atkin sieve
instead of previously used Pritchard's wheel one.

gordo renders:
this version uses the Atkin sieve instead of previously used Pritchard's wheel one.

> This seems hard to type. And having to type different things on different operating systems that are translated to various symbols (with a per os GORDO environment variable) seems like a bad idea.

Please re-read. For my part, I will try to edit this section so that it is not understood as saying exactly the opposite.

> This seems hard to type.

How to type gordo is the user's choice. The example provided in the proposal even shows how to type it using only ASCII characters, just like Markdown.

> that are translated to various symbols

No. The opposite!

Various characters of the user's choice are translated to the fixed set of eleven "gordo" characters.

The author types whatever keystrokes she finds convenient and available on her national keyboard layout, given the IDE or editor she uses.
It is the target (canonical) set of 11 characters that does not change.
The GORDO table sets the input; the output is fixed and the same on all OSes and in all editors.

> Also using accents in formatting does not make the documents very readable in my personal opinion.

It depends on what one wants to focus on. If it is the markup that a reader needs to analyse, then yes, single dots or rings at the top of the line need special attention.

Note, though, that for all readers but the author, the less noticeable the markup is, the better.

We (I at least) work with source documentation laid out in fixed-width fonts on screens of a given capacity. The html version is important beforehand: it lets us read faster and assess quality better. When I work with others' sources in vim, the marked parts of the docs (in the source) are four to six keystrokes away.

Gordo aims to be unobtrusive in the source, so that it is as readable on the terminal as on the web, while keeping the web version searchable and, in a way, interactive.

@rsc rsc commented Dec 5, 2019

I didn't say anything about **bold**, _italics_, **_bold-italics_**.
In general we don't want markup in doc comments.
I said we might recognize bullets.
