Better code blocks #17

jonludlam · 2023-04-24T15:55:37Z

This commit adds some new functionality to code blocks. Firstly, it allows arbitrary delimiters, meaning that code containing the usual code block terminator can now be parsed correctly. The syntax for this is:

{delimiter@ocaml[ ... ]delimiter}

The delimiter can contain the characters [ a-z A-Z 0-9 _ - ], the same set as the language tag that comes after the '@' symbol. Note that there's no way to have a delimited code block without a language tag.

The second piece of functionality is that code blocks can now have associated output:

{@ocaml[
... ocaml code ...
][
... odoc formatted output ...
]}

This syntax also supports delimiters as above. The delimiters apply only to the code block, not to the output:

{delim@ocaml[
... ocaml code containing ][ or ]} ...
]delim[
... odoc formatted output ...
]}

The idea is that the odoc-formatted output should be well formed, so any escaping is done in the usual way.

The output can then contain, for example, error blocks produced by mdx:

{delim@ocaml[
  let x = "]}
]delim[
{err@mdx-error[
Line 1, characters 9-10:
Error: String literal not terminated]err}
]}

In addition, this allows code blocks to produce rich output, i.e. markup such as tables, headings, images and so on, in such a way that the output is associated with the code block and hence can be manipulated by a 'test-promote' workflow.
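
To make the delimiter rule concrete, here is a minimal sketch of how the end of the code part could be located once the opening `{delim@lang[` has been consumed. It is not the lexer from this PR (which is written in ocamllex); `find_sub`, `split_code` and the `split` record are hypothetical names used only for illustration.

```ocaml
(* Hypothetical helper: first index >= [from] at which [sub] occurs in [s]. *)
let find_sub s sub from =
  let n = String.length s and m = String.length sub in
  let rec go i =
    if i + m > n then None
    else if String.sub s i m = sub then Some i
    else go (i + 1)
  in
  go from

(* Hypothetical result type: the code text, whether an output section
   follows, and the remaining input after the terminator. *)
type split = { code : string; has_output : bool; rest : string }

(* [split_code ~delim body] takes the text that follows the opening
   "{delim@lang[" and cuts it at the first "]delim[" (output follows)
   or "]delim}" (no output). With [delim = ""] these are the usual
   "][" and "]}" terminators. *)
let split_code ~delim body =
  let found tail has_output =
    Option.map
      (fun i -> (i, has_output))
      (find_sub body ("]" ^ delim ^ tail) 0)
  in
  let pick a b =
    match (a, b) with
    | Some (i, _), Some (j, _) -> if i <= j then a else b
    | Some _, None -> a
    | None, _ -> b
  in
  match pick (found "[" true) (found "}" false) with
  | None -> None
  | Some (i, has_output) ->
      (* Skip "]" ^ delim ^ one closing character. *)
      let skip = String.length delim + 2 in
      Some
        {
          code = String.sub body 0 i;
          has_output;
          rest = String.sub body (i + skip) (String.length body - i - skip);
        }
```

With `~delim:"delim"`, a `]}` inside the code is left alone because only `]delim[` or `]delim}` ends the code part. With an empty delimiter the same input would be cut at the first `]}`, which is the problem the delimiters are meant to solve.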


jonludlam · 2023-04-27T13:49:16Z

Review comment from group review in Cambridge: Could we perhaps make the delimiting generic, e.g.:

{delim@ocaml[ ... ]delim[ ]}
{delim|v ... verbatim v|delim}
{delim|m ... |delim}
{delim|math ... |delim}
{|v ... v|}

jonludlam · 2023-06-01T09:01:05Z

While I think having delimiters elsewhere is a worthy goal, we can do that in another PR.

Julow

The syntax looks reasonable, no need to block for more generic delimiters (which would probably still look different on code blocks).

Julow · 2023-06-01T11:37:16Z

src/syntax.ml

+        | In_explicit_list -> (List.rev acc, next_token, where_in_line)
+        | In_tag -> (List.rev acc, next_token, where_in_line)
+        | In_table_cell -> (List.rev acc, next_token, where_in_line)
+        | In_code_results -> (List.rev acc, next_token, where_in_line))


Most cases have the same type and could perhaps be written with an or-pattern?

Annoyingly not (the return types are different in each case). See the comment immediately below.
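
A small sketch of why an or-pattern may not work here. It assumes the context type is a GADT whose constructors fix different result types, which the hunk above does not show; `context`, `In_list`, `In_tag` and `finish` below are simplified, hypothetical stand-ins. As far as I understand, OCaml does not carry the per-constructor type equations into the body of an or-pattern, so the branches have to stay separate even though they look identical.

```ocaml
(* Hypothetical, simplified context type: each constructor pins the
   accumulator to a different list type. *)
type _ context =
  | In_list : int list context
  | In_tag : string list context

(* Each branch is checked under its own equation ([a = int list] or
   [a = string list]), so [List.rev acc] typechecks in both.
   Merging them as [In_list | In_tag -> ...] is expected to be rejected,
   because [acc] would then only be known to have the abstract type [a]. *)
let finish : type a. a context -> a -> a * int =
 fun ctx acc ->
  match ctx with
  | In_list -> (List.rev acc, 0)
  | In_tag -> (List.rev acc, 0)
```

The duplicated right-hand sides are the price of keeping the return type precise per context.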

src/token.ml

src/lexer.mll

Julow · 2023-06-01T11:50:15Z

src/lexer.mll

@@ -219,18 +217,19 @@ let emit_verbatim input start_offset buffer =
  let t = trim_trailing_blank_lines t in
  emit input (`Verbatim t) ~start_offset

-let emit_code_block ~start_offset input metadata c =
-  let c = trim_trailing_blank_lines c in
+let emit_code_block ~start_offset content_offset input metadata delim terminator c has_results =


The location calculations in this function are quite complicated and deserve some comments.
Why is content_offset needed?

I've added a comment.

Julow

I have a remark on delim_char but otherwise I think it's ready to merge.
The CI failures are not related, everything is OK.

Julow · 2023-06-02T15:41:25Z

src/lexer.mll

@@ -267,6 +267,9 @@ let raw_markup_target =
 let language_tag_char =
  ['a'-'z' 'A'-'Z' '0'-'9' '_' '-' ]

+let delim_char =
+  ['a'-'z' 'A'-'Z' '0'-'9' '_' '-' ]


Without '-'? This might interfere with the {- ...} syntax with a different parser engine.

Huh, I thought I had removed that, as per your previous comment. Will fix, thanks!
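
For reference, a plain-OCaml sketch of the character class once '-' is dropped. This assumes the fix simply removes '-' from delim_char and changes nothing else; `is_delim_char` and `is_valid_delimiter` are hypothetical helpers, not functions in the lexer.

```ocaml
(* Hypothetical mirror of delim_char with '-' removed, so a delimiter
   can never start with the '-' used by the {- ...} list item syntax. *)
let is_delim_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' -> true
  | _ -> false

(* The empty string is accepted here and stands for the undelimited
   form, e.g. {@ocaml[ ... ]}. *)
let is_valid_delimiter s =
  let rec go i =
    i >= String.length s || (is_delim_char s.[i] && go (i + 1))
  in
  go 0
```

Under this rule `delim` and `err_1` are accepted, while `my-delim` is not.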

Julow · 2023-06-02T15:42:38Z

src/lexer.mll

@@ -219,18 +217,19 @@ let emit_verbatim input start_offset buffer =
  let t = trim_trailing_blank_lines t in
  emit input (`Verbatim t) ~start_offset

-let emit_code_block ~start_offset input metadata c =
-  let c = trim_trailing_blank_lines c in
+let emit_code_block ~start_offset content_offset input metadata delim terminator c has_results =


jonludlam added 2 commits April 24, 2023 16:53

Better named field

5ae8034

Julow reviewed Jun 1, 2023

jonludlam added 3 commits June 1, 2023 14:51

Review fixes

b8a9cf1

Add comment explaining tricky location adjustments

9238b07

Remove esy test

387ac5c

Julow approved these changes Jun 2, 2023

Remove ambiguity from delim_char

cae9a69

jonludlam merged commit f12420a into ocaml-doc:main Jun 5, 2023
2 of 3 checks passed

panglesd mentioned this pull request Sep 29, 2023

Vendor odoc-parser realworldocaml/mdx#430

Merged

gpetiot mentioned this pull request Oct 2, 2023

Upgrade mdx to use last version of odoc-parser realworldocaml/mdx#439

Merged

panglesd mentioned this pull request May 29, 2024

Bad parsing or ][ in block code ocaml/odoc#1137

Open
