add ANSI writer #9565
Nice! Upload a screenshot? The slowness does concern me. I think the main application of this would be reading files on your terminal, and a 2.7 sec delay seems too much for that. Can you pin down where the slowdown comes from, by experimentation or profiling?
Ah, there's definitely something with super-linear time complexity going on with …
See jgm/doclayout#29 for the performance issue.
Staring at my first screenshot, I realized that DocLayout's line breaking is broken by the styled text. I filed jgm/doclayout#30 to track it, since I don't think I'm going to plow all the way to a solution right away.
Rebased. Please note d664dbe; something about jgm/doclayout#31 has caused a whitespace regression here. Actually, I just realized what it probably is: … The djot reader parses `-\n` as an empty list item just fine, but I'm still not sure whether tweaking this test is fine or whether we should put the missing space back where it belongs.
I'm not worried about that space; I think removing it is actually the intended behavior, so I'll consider this a bug fix.
Some comments on the design choices:
I think the horizontal line with the emoji is too opinionated (even though it looks cool).
Is the issue just that skylighting produces a fully rendered string? I don't see why that's a problem, necessarily.
Probably a good idea.
Is there any way to check for it?
I'd like to avoid C dependencies.
For placeholder purposes, one option would be to include a table generated by the … I wonder whether it could work to use the …
Why not use skylighting for these too, for consistency?
Admitted. I'll calm this down.
Honestly, I only partly remember what I was talking about. The main problem is that if you select a highlight-style that uses a non-default background color, you get what I would consider subpar results. If we could get a … Overall, I think the state of affairs with code blocks is fine for starters.
Not really, I guess. Someone has a list of the terminal emulators that are known to support it, but there's no … I think the ANSI writer does need to be able to print links visibly, as there are certainly widely used terminal emulators that don't, and may never, support OSC 8 (Terminal.app, xterm, urxvt). Some users with supporting terminals might not want it anyway. I think it has to be a writer option provided by a CLI flag.
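To make the two link renderings concrete, here is a minimal Haskell sketch: an OSC 8 hyperlink (ESC ] 8 ; ; URI ESC \ before the link text, closed by an OSC 8 with an empty URI) and a fallback that prints the URL visibly. `LinkMode`, `osc8`, and `renderLink` are hypothetical names for illustration, not the writer's actual code or an existing Pandoc option, and the `text <url>` fallback format is an assumption.

```haskell
-- Hypothetical setting (not a real Pandoc option): emit OSC 8 hyperlinks,
-- or print the link target as visible text.
data LinkMode = Osc8Links | VisibleUrls
  deriving (Eq, Show)

-- OSC 8 hyperlink: "ESC ] 8 ; ; URI ST", the link text, then an OSC 8 with an
-- empty URI to end the link. ST (string terminator) is written here as ESC \.
osc8 :: String -> String -> String
osc8 uri text = open uri ++ text ++ open ""
  where
    open u = "\ESC]8;;" ++ u ++ "\ESC\\"

-- Render a link either as a terminal hyperlink or as text followed by its URL.
renderLink :: LinkMode -> String -> String -> String
renderLink Osc8Links uri text = osc8 uri text
renderLink VisibleUrls uri text
  | text == uri = text                       -- autolink: don't print the URL twice
  | otherwise   = text ++ " <" ++ uri ++ ">"
```

A CLI flag would then only need to choose the `LinkMode`; terminals without OSC 8 support still show the destination in the `VisibleUrls` case.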
Valid. I don't think the lofi outputs supported by Chafa are a critical feature for Pandoc. The Kitty and iTerm 2 protocols will probably cover most of the users who would want to see images in the terminal anyway.
Ah yes, that is bad. I'd be open to having skylighting-format-ansi export a function that yields a Doc Text.
Another thought about the 4-space indentation on body text. This echoes the formatting of man pages, and it looks good and familiar. But one drawback is that it makes it hard to copy/paste content from a document you are viewing this way -- since you'll get these initial indents.
I think the copy-paste drawback you mention is of limited relevance to the ANSI writer use cases. The analogy to viewing man pages is apt; I think the ANSI writer should optimize for reading over other considerations. Even if there weren't an indented margin, copy-pasting would still be a bit annoying due to hard breaks in the formatted output. At least in Pandoc, if someone wants a chunk of more pasteable text, they can rerun with …
I see relevant examples in other writers that support syntax highlighting. I sort of assume code spans marked up with their language are rare, but it makes sense to provide as much support for it here as elsewhere. We'll still need some kind of fallback for generic code spans. The basic issue there, of course, is that all the text in the terminal is already fixed-width, so we can't use that for contrast. Magenta-on-white sort of resembles what Slack does with code/fixed-width spans. The GitHub CLI uses Glamour to render gfm, and its code spans default to bright red on dark red, plus an extra space's worth of padding on each end of the span for additional contrast. I want to land on something here that doesn't require me to get to jgm/doclayout#32 just yet, because I, uh, don't want to work on it right now. Any particular preferences or suggestions?
Copy-paste: I'm not sure. I certainly have wanted to copy code from man pages and READMEs before, and wouldn't have wanted to context-shift, exit man, fire up a new program, and find the relevant section again. Even if we're optimizing for reading, copying text IS something one often wants to do when reading a document. Certainly if I wanted to copy some text from a Word or HTML document or EPUB, I'd rather fire up … Note also that eliminating the 4-space indent would make part of the problem with highlighted code blocks with backgrounds go away. (There would still be the odd color shift at the end of the first line, which I don't really understand... maybe this is something that needs to be fixed in skylighting-format-ansi?)
I'm not positive I totally understand it either. Removing … but like the prior screenshot, when echoing directly to the terminal. I think the different behavior of …
(wow, there I go copy-pasting from the man page) So I interpret this to mean that … In any case, the best thing to do is use doclayout to put code blocks into an actual rectangle, but as noted I think the current state is acceptable. The default … I don't want to bikeshed the matter of the margin; I can drop it from this PR, with the idea that it makes sense to err on the side of less-opinionated output, at least to begin with. That does make the question of how to style headings a bit more of a head-scratcher. What's here so far isn't thoroughly worked out, but shifting the headings into the margin like man output was at least an available option. Since we can't make headings larger or use a different font family, it's not obvious how to create hierarchy. Lynx's defaults are not at all exuberant. The style used by the Charm.sh folks in https://github.com/charmbracelet/gum uses box-drawing characters for this in a way that seems fairly sensible, on top of a color contrast. I could take or leave the color part. (They have a 2-column left margin too, heh.) We could try modestly more spacious decorations with box-drawing characters as well, cf. …
If we …, we could even take advantage of Markdown's familiarity and put the appropriate number of … Converting H1s to all caps would match how man pages are written, but Pandoc doesn't really do any such thing by default elsewhere, and it makes no difference for many scripts. Do you have any inclinations on this? I think getting headings right is the big remaining design decision I want to make.
One question is whether it's necessary to distinguish different heading levels visually. Personally, I think it's okay not to. Using indentation would give you a way to distinguish a couple of levels of heading, but not six (unless you indent the body text really far, or do what lynx seems to do, which is just ugly -- with the heading indented more than the contained body text). Putting the heading in boldface and a different color from the body text definitely sets it apart as a heading, even without indentation. I don't like the box-drawing character options much.
Rebased and force-pushed.
I'm not a big fan of the new idea for sections. Having section headings indented to the right relative to body text just seems wrong. And I'm not sure I like the section symbols. Would it be so bad just to make all headings boldface, flush-left, with space above and below? You wouldn't then be able to distinguish heading levels, but how big a drawback is that? If you have a document with many levels and need to keep track of them, you could always render with … Another option would be, e.g., level 1 - bold, underlined, all caps or something like that. (Though personally I think this might be ugly.)
Numbered sections aren't implemented so far. I can spend some time on it, but I think it could go in the backlog for after merge. I think the minimum I want to ship with is H1s and H2s that are distinguishable from each other and from H3-H6. At this point that'd probably be bold+caps for H1, bold for H2, italic for H3-H6. This is slightly worse for scripts without capitals, but I guess most such scripts also commonly don't have italics, so there's some degradation anyway. The remaining option, which I have mostly discounted until now, is to add a color; since I've used color sparingly elsewhere, I think a contrasting color shared by all headings might be helpful after all. Since you are pretty firmly on the side of keeping it simple, I'll do another iteration with something like the above.
That sounds reasonable. I think the idea of having a color for the headings is a good one. (Might as well be the same for all levels, because color isn't a good way to distinguish heading levels.) The scheme you suggest is fine with me. Anyway, we can get a version of this out and see what people say; the heading scheme can always be tweaked in the future.
Implementing numbering shouldn't be too complicated. The basic approach is to do something like `let blocks' = makeSections (writerNumberSections opts) Nothing blocks` where …
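In the suggested approach, `makeSections` takes care of the section structure. Purely as a standalone illustration of the numbering scheme (not Pandoc's implementation), the hypothetical `sectionNumbers` below turns heading levels in document order into hierarchical numbers, using 0 for skipped levels. It assumes levels start at 1.

```haskell
import Data.List (intercalate)

-- Compute hierarchical section numbers ("1", "1.1", "1.2", "2", ...) from
-- heading levels given in document order. Assumes every level is >= 1.
sectionNumbers :: [Int] -> [String]
sectionNumbers = go []
  where
    go :: [Int] -> [Int] -> [String]
    go _ [] = []
    go counters (lvl : rest) =
      let counters' = bump lvl counters
      in render counters' : go counters' rest

    -- Keep the counters of the enclosing levels (0 for skipped levels),
    -- drop deeper ones, and increment the counter at this level.
    bump :: Int -> [Int] -> [Int]
    bump lvl counters =
      let padded = counters ++ repeat 0
      in take (lvl - 1) padded ++ [padded !! (lvl - 1) + 1]

    render :: [Int] -> String
    render = intercalate "." . map show
```

For example, levels `[1, 2, 2, 1, 2]` yield `["1", "1.1", "1.2", "2", "2.1"]`.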
For reasons not currently clear to me(!), the tip of doclayout consumes the breaking space that is followed by nothing (i.e. the ignored raw LaTeX inline). The tests of djoths itself still pass.
The ANSI writer (-t ansi) outputs a document formatted with ANSI control sequences for reading on the console.
Most Pandoc elements are supported and printed in a reasonable way, if not always ideally. This version does no detection of terminal capabilities, nor does it fall back to different output styles for less-capable terminals.
Some gory details:
- Title blocks are formatted with modest extravagance in --standalone mode.
- Strong, Emph, Underline, and Strikeout spans are all formatted accordingly using SGR codes (which will be silently ignored by terminals that don't support them).
- Headings have somewhat arbitrary styles applied to them that probably need immediate improvement.
- Blockquotes and all flavors of list look pretty good.
- Code spans are colored magenta-on-white, which on the author's terminal looks kind of like the pinkish treatment of code spans used by many stylesheets. This probably isn't a good final decision.
- Code blocks are formatted by Skylighting's formatANSI using standard writer options and included directly in the output. This has some issues; see code comments.
- Links are printed with OSC 8 to create hyperlinks and colored cyan. The author's terminal automatically adds a dotted underline to OSC 8 hyperlinks, but only colors them differently on command-mouseover. Setting an underlined style on links may be more broadly accessible. OSC 8 support is not checked for, so on terminals not supporting it or with support disabled, the link text will be colored but not do anything, and the link targets will not be printed.
- Images are displayed as their alt text. Support for the Kitty and iTerm 2 inline image protocols is planned. Supporting other terminals by using Chafa (https://hpjansson.org/chafa/) to print sixels, etc. would be cool too, but the author would have to do some FFI work, and it would add a dependency to Pandoc.
- Tables are replaced with a useless placeholder. Table output using box-drawing characters is desired.
- Subscripts and superscripts are just parenthesized when accurate Unicode representations aren't available. Because these span types could have all kinds of semantics, there's no obvious thing to do with them.
- Simple math is translated to Pandoc inlines using existing functionality. An ambitious person could look into emulating the console-mode math output of a computer algebra system, or rendering each display math element as an image with TeX or Typst and including it, or some other thing.
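The SGR treatment of inline spans in these details can be sketched as follows. This is a hypothetical helper, not the writer's code: each style is paired with its own "off" parameter (22, 23, 24, 29, and 39;49 for default colors) rather than the blanket reset SGR 0, so closing an inner span does not cancel the enclosing style. The magenta-on-white code-span colors (35;47) follow the description.

```haskell
-- Inline styles from the description (names here are illustrative).
data Style = Strong | Emph | Underline | Strikeout | CodeSpan
  deriving (Eq, Show)

-- SGR parameters that switch a style on, and the ones that switch only that
-- style off again (instead of SGR 0, which would reset everything).
sgrCodes :: Style -> (String, String)
sgrCodes Strong    = ("1", "22")        -- bold / normal intensity
sgrCodes Emph      = ("3", "23")        -- italic / not italic
sgrCodes Underline = ("4", "24")        -- underlined / not underlined
sgrCodes Strikeout = ("9", "29")        -- crossed out / not crossed out
sgrCodes CodeSpan  = ("35;47", "39;49") -- magenta on white / default colors

-- CSI <params> m
sgr :: String -> String
sgr params = "\ESC[" ++ params ++ "m"

styled :: Style -> String -> String
styled style text = sgr on ++ text ++ sgr off
  where
    (on, off) = sgrCodes style
```

For example, in `styled Strong ("a " ++ styled Emph "b" ++ " c")` the trailing " c" stays bold, because closing the italic span sends only 23.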
All headings at the document top-level are green. Headings inner to structures like blockquotes and lists are not. H1 is bold and all caps. H2 is bold. H3-H6 are italic. The design here is meant to be relatively boring/simple and allow telling H1 and H2 apart from each other and from the remaining heading levels.
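A sketch of this heading scheme, assuming the caller already knows the heading level and whether it sits at the document top level: green (SGR 32) for top-level headings, bold plus all caps for H1, bold for H2, italic for H3-H6. `renderHeading` and `wrapSgr` are hypothetical names. `map toUpper` maps characters one at a time, so text in caseless scripts passes through unchanged.

```haskell
import Data.Char (toUpper)

-- Wrap text in an SGR "on" code and the matching "off" code.
wrapSgr :: String -> String -> String -> String
wrapSgr on off text = "\ESC[" ++ on ++ "m" ++ text ++ "\ESC[" ++ off ++ "m"

-- topLevel: the heading is not inside a blockquote or list.
-- level: heading level, 1-6.
renderHeading :: Bool -> Int -> String -> String
renderHeading topLevel level text
  | topLevel  = wrapSgr "32" "39" styledText   -- green, then default foreground
  | otherwise = styledText
  where
    styledText :: String
    styledText
      | level == 1 = wrapSgr "1" "22" (map toUpper text)  -- H1: bold, all caps
      | level == 2 = wrapSgr "1" "22" text                -- H2: bold
      | otherwise  = wrapSgr "3" "23" text                -- H3-H6: italic
```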
Code blocks are indented four spaces. Unhighlighted code blocks are colored red. A quirk is that code blocks that are numbered (class numberLines) but don't set a language get run through skylighting, but the code itself isn't actually highlighted, so the default red style won't get applied to them. This isn't reasonably solvable right now.
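For the unhighlighted case only, the indent-and-color behavior might look like the sketch below. It is an illustration under assumptions, not the writer's implementation: each line gets its own red SGR pair (31, then 39 for the default foreground), so the four-space indent itself is never colored and blank lines stay empty. Highlighted blocks go through skylighting's ANSI output and are not modeled here.

```haskell
-- Render an unhighlighted code block: four-space indent, red foreground.
-- Each line gets its own SGR pair, so the indentation is never colored and
-- no color is left switched on at the end of an output line.
plainCodeBlock :: String -> String
plainCodeBlock = unlines . map renderLine . lines
  where
    renderLine :: String -> String
    renderLine "" = ""                                 -- keep blank lines blank
    renderLine l  = "    \ESC[31m" ++ l ++ "\ESC[39m"  -- indent, red, default fg
```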
This table renderer is pretty basic and uses minimal non-data ink. It presently renders correctly in the presence of colspans but not rowspans. Code blocks in table cells are never syntax highlighted, because the present approach of using skylighting's ANSI output is incompatible with the reflowing done by doclayout's block elements (i.e., you get terminal escapes littering your output). No attempt is made to shrink table width below the available maximum.
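A minimal sketch of a table layout with little non-data ink: column widths from the widest cell, two-space gutters, and a single rule of box-drawing characters under the header. `renderTable` is a hypothetical name; it ignores colspans and rowspans and assumes each `Char` occupies one terminal column.

```haskell
import Data.List (intercalate, transpose)

-- Header row, a rule of box-drawing characters, then body rows, with columns
-- separated by two spaces. Assumes every row has the same number of cells
-- (no colspans or rowspans) and that each Char takes one terminal column.
renderTable :: [String] -> [[String]] -> String
renderTable header rows = unlines (fmt header : rule : map fmt rows)
  where
    widths :: [Int]
    widths = map (maximum . map length) (transpose (header : rows))

    pad :: Int -> String -> String
    pad w s = s ++ replicate (w - length s) ' '

    fmt :: [String] -> String
    fmt cells = intercalate "  " (zipWith pad widths cells)

    rule :: String
    rule = intercalate "  " [replicate w '─' | w <- widths]
```

Because widths come from `length`, escape sequences inside a cell would also be counted as visible characters and throw off the alignment; wide (e.g. CJK) characters would need a display-width function instead.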
@jgm Hi, I'm back! Thinking too hard about table rendering, plus the temperature in my room during the summer, deterred me for a few months, but I have finally banged out a reasonable first draft of table rendering for the ANSI writer. I haven't put it through an aggressive stress test, but it renders simple tables pretty well. The most obvious thing that doesn't work is rowspans. I don't want to try to work on rowspans any more now than I did in the spring. With this, I believe all Pandoc elements are rendered by the ANSI writer to a reasonable degree of quality. If you have any further thoughts on the default aesthetics of tables, or anything else I worked on previously, let me know. Apart from aesthetics, what else needs to be done to merge? Should I add docs + tests to this PR?
I don't know what happened here; I didn't intend to merge this quite yet! In any case, it's OK I think; it was quite close to being ready. I would have rebased it, though.
Haha oops, I was a bit surprised. Good news though, I did rebase it and it looks like it ff-merged. If you make an issue for me with test and docs tasks I’ll try to tackle them.
One of the things I'm not sure about is the addition of the … Another question: will the ANSI output look good with various color schemes in the terminal? E.g., if I have white text on a black background? I think I can take care of basic tests and docs.
I added a basic test: test/ansi-test.txt is the (Markdown) input, and test/ansi-test.ansi is the expected output. Feel free to add to the test case if you like.
The table thing I did by analogy to figures. It looks like leaving off …
Maybe we should just omit the label "Table N"? It's only really useful when you also have a mechanism for referring to tables, as in LaTeX. And it adds some unwelcome clutter, esp. for tables without captions.
Yeah, I agree.
I'll make that change.
@silby you mentioned way up there that superscript and subscript are just parenthesised. I wonder if it wouldn't be better to render them in a more Pandoc Markdown or TeX style as …
@bpj those are some good ideas; it'll be easier for me to keep track of them when I come back to this if you make some fresh issues!
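For reference, the current behavior under discussion (Unicode forms where an accurate one exists, otherwise parentheses) can be sketched as below. The character tables cover only digits, +, -, =, parentheses, and superscript n, and applying the fallback to the whole span when any character lacks a Unicode form is an assumption; the writer may decide per character or differently.

```haskell
import Data.Maybe (fromMaybe)

-- Unicode superscript/subscript forms for a handful of characters.
superChar :: Char -> Maybe Char
superChar c = lookup c (zip "0123456789+-=()n" "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ")

subChar :: Char -> Maybe Char
subChar c = lookup c (zip "0123456789+-=()" "₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎")

-- Use the Unicode forms only if every character has one; otherwise fall
-- back to the plain text in parentheses (assumed all-or-nothing policy).
unicodeOrParens :: (Char -> Maybe Char) -> String -> String
unicodeOrParens f s = fromMaybe ("(" ++ s ++ ")") (traverse f s)

superscript, subscript :: String -> String
superscript = unicodeOrParens superChar
subscript   = unicodeOrParens subChar
```

So `superscript "2"` gives "²", while `superscript "th"` falls back to "(th)".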
This requires jgm/doclayout#28 in addition to the stuff I already got merged; cabal.project does not reflect this. (Update: the required doclayout commit is added to cabal.project.)
I think that the surface area supported by what I have here is a minimum scope to be worth releasing, but I don't really think this is quite good enough as is to merge. I'm filing a draft PR to get some feedback on the choices and compromises made so far and on what the requirements for a shipping ANSI writer should be. I'd really love some ideas on how to format headings! Some of what's here reflects arbitrary design choices that make me happy, like putting a four-column margin on the whole output.
It's a bit slow; rendering MANUAL.txt to ANSI takes 2.77 seconds on my machine, compared to 0.66 seconds for rendering it to HTML. I have an inkling this may be due to the quantity and size of definition lists in MANUAL.txt, but who knows. (Update: the slowness is fixed.)
There are no new tests or docs so far.