Better code blocks #17

Merged
merged 6 commits into from
Jun 5, 2023

jonludlam
Collaborator

This commit adds two pieces of functionality to code blocks. First, it allows arbitrary delimiters, so that code containing the usual code block terminator can now be parsed correctly. The syntax for this is:

{delimiter@ocaml[ ... ]delimiter}

The delimiter can contain the chars [ a-z A-Z 0-9 _ - ], the same as the language tag that comes after the '@' symbol. Note that there's no way to have a delimited code block without a language tag.
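
As an illustration of the delimiter rule described above, here is a minimal sketch in plain OCaml. It is not the lexer from this PR: `find_sub`, `is_tag_char` and `parse_code_block` are hypothetical names, the character set follows the description above, and the sketch only handles a block without an output section.

```ocaml
(* First index >= [from] at which [sub] occurs in [s], if any. *)
let find_sub s sub from =
  let n = String.length s and m = String.length sub in
  let rec go i =
    if i + m > n then None
    else if String.sub s i m = sub then Some i
    else go (i + 1)
  in
  go from

(* Characters allowed in a delimiter or language tag, per the description
   above (assumption: both use the same set). *)
let is_tag_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' | '-' -> true
  | _ -> false

(* Parse "{delim@lang[code]delim}" at the start of [s].
   Returns (delimiter, language, code); the delimiter may be empty,
   the language tag may not. *)
let parse_code_block s =
  let n = String.length s in
  if n = 0 || s.[0] <> '{' then None
  else
    (* Index just past the run of tag characters starting at [i]. *)
    let rec span i = if i < n && is_tag_char s.[i] then span (i + 1) else i in
    let d_end = span 1 in
    if d_end >= n || s.[d_end] <> '@' then None
    else
      let l_end = span (d_end + 1) in
      if l_end = d_end + 1 || l_end >= n || s.[l_end] <> '[' then None
      else
        let delim = String.sub s 1 (d_end - 1) in
        let lang = String.sub s (d_end + 1) (l_end - d_end - 1) in
        (* The code ends at the first "]delim}", not at the first "]}". *)
        let terminator = "]" ^ delim ^ "}" in
        match find_sub s terminator (l_end + 1) with
        | None -> None
        | Some t -> Some (delim, lang, String.sub s (l_end + 1) (t - l_end - 1))
```

With this sketch, `parse_code_block "{x@ocaml[ a ]} b ]x}"` keeps the inner `]}` as part of the code, because only `]x}` terminates the block.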

The second piece of functionality is that code blocks can now have associated output:

{@ocaml[

... ocaml code ...

][

... odoc formatted output ...

]}

This syntax also supports delimiters as above. The delimiter applies only to the code block, not to the output:

{delim@ocaml[

... ocaml code containing ][ or ]} ...

]delim[

... odoc formatted output ...

]}

The idea is that the odoc-formatted output should be well formed, so any escaping within it is done in the usual way.
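
To make the split between code and output concrete, here is a rough sketch under simplifying assumptions. `split_code_and_output` and `find_sub` are hypothetical helpers, not functions from this PR. The input is assumed to be the text following the opening `[`, and the output section is assumed to run up to a final `]}` at the very end of the input (the real parser treats the output as odoc markup rather than as a raw string).

```ocaml
(* First index >= [from] at which [sub] occurs in [s], if any. *)
let find_sub s sub from =
  let n = String.length s and m = String.length sub in
  let rec go i =
    if i + m > n then None
    else if String.sub s i m = sub then Some i
    else go (i + 1)
  in
  go from

(* Split [body] into the code and an optional output section.
   Only the code section is protected by [delim]: it ends at "]delim[" when
   output follows, or at "]delim}" when there is none. *)
let split_code_and_output ~delim body =
  let sep = "]" ^ delim ^ "[" and close = "]" ^ delim ^ "}" in
  let n = String.length body in
  match (find_sub body sep 0, find_sub body close 0) with
  | Some i, Some j when j < i ->
      (* The block closes before any separator: no output section. *)
      Some (String.sub body 0 j, None)
  | Some i, _ ->
      let start = i + String.length sep in
      (* Assumption: the output is closed by a plain "]}" at the very end. *)
      if n - 2 >= start && String.sub body (n - 2) 2 = "]}" then
        Some (String.sub body 0 i, Some (String.sub body start (n - 2 - start)))
      else None
  | None, Some j -> Some (String.sub body 0 j, None)
  | None, None -> None
```

For example, `split_code_and_output ~delim:"d" "a ][ b ]d[out]}"` treats the undelimited `][` as ordinary code and splits only at `]d[`.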

The output can then contain, for example, error blocks produced by mdx:

{delim@ocaml[
  let x = "]}
]delim[
{err@mdx-error[
Line 1, characters 9-10:
Error: String literal not terminated]err}
]}

In addition, this allows code blocks to produce rich output, i.e. markup such as tables, headings, images and so on, that is associated with the code block and can hence be managed by a 'test-promote' workflow.

@jonludlam
Collaborator Author

Review comment from group review in Cambridge: Could we perhaps make the delimiting generic, e.g.:

{delim@ocaml[ ... ]delim[ ]}
{delim|v ... verbatim v|delim}
{delim|m ... |delim}
{delim|math ... |delim}
{|v ... v|}

@jonludlam
Collaborator Author

While I think having delimiters elsewhere is a worthy goal, we can do that in another PR.

Contributor

@Julow Julow left a comment

The syntax looks reasonable; there is no need to block on more generic delimiters (which would probably still look different on code blocks).

| In_explicit_list -> (List.rev acc, next_token, where_in_line)
| In_tag -> (List.rev acc, next_token, where_in_line)
| In_table_cell -> (List.rev acc, next_token, where_in_line)
| In_code_results -> (List.rev acc, next_token, where_in_line))
Contributor

Most cases have the same type and could perhaps be written with an or-pattern?

Collaborator Author

Annoyingly not (the return types are different in each case). See the comment immediately below.
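
A small sketch of why textually identical branches can still have different types. This uses a hypothetical, much-simplified context type, not the parser's real one. Each constructor refines the type index, so the same expression is checked at a different type in each branch. As far as I understand, merging the two branches into one or-pattern would lose those per-constructor type equations and fail to typecheck.

```ocaml
(* Hypothetical, much-simplified stand-in for the parser's context type.
   The type index records which tokens may stop parsing in that context. *)
type _ context =
  | In_list : [ `End | `Item ] context
  | In_tag : [ `End | `Tag ] context

(* Both branches look the same, but `End is checked against
   [ `End | `Item ] in the first and [ `End | `Tag ] in the second. *)
let stop : type a. a context -> int list -> int list * a =
 fun context acc ->
  match context with
  | In_list -> (List.rev acc, `End)
  | In_tag -> (List.rev acc, `End)
```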

Contributor

Indeed.

@@ -219,18 +217,19 @@ let emit_verbatim input start_offset buffer =
let t = trim_trailing_blank_lines t in
emit input (`Verbatim t) ~start_offset

let emit_code_block ~start_offset input metadata c =
let c = trim_trailing_blank_lines c in
let emit_code_block ~start_offset content_offset input metadata delim terminator c has_results =
Contributor

The location calculations in this function are quite complicated and deserve some comments.
Why is content_offset needed?

Collaborator Author

I've added a comment.

Contributor

Thanks!

Contributor

@Julow Julow left a comment

I have a remark on delim_char but otherwise I think it's ready to merge.
The CI failures are not related, everything is OK.

src/lexer.mll Outdated
@@ -267,6 +267,9 @@ let raw_markup_target =
let language_tag_char =
['a'-'z' 'A'-'Z' '0'-'9' '_' '-' ]

let delim_char =
['a'-'z' 'A'-'Z' '0'-'9' '_' '-' ]
Contributor

Should this be without `-`? It might interfere with the {- ...} syntax under a different parser engine.

Collaborator Author

Huh, I thought I had removed that, as per your previous comment. Will fix, thanks!

@jonludlam jonludlam merged commit f12420a into ocaml-doc:main Jun 5, 2023
2 of 3 checks passed