Suggested syntax for {|underline|}, {!strikeout!} and {.small caps.} #10

Open
bpj opened this issue Jul 18, 2022 · 18 comments

Comments

@bpj

bpj commented Jul 18, 2022

In the announcement thread on pandoc-discuss I suggested adding these syntaxes:

{|underline|} 
{!strikeout!}
{.small caps.}

@jgm asked me to open an issue here.

I would very much appreciate having a syntax for small caps in particular.

@uvtc
Contributor

uvtc commented Jul 20, 2022

Note that djot already has syntax for:

  • {-deleted-} (strikeout / strikethrough) and
  • {+inserted+} ("underline").

As for smallcaps... I'm seeing three kinds of span syntax in djot:

  • text bounded by {char and char} (where sometimes the braces are optional)
    • eg.: {*bold*}, {_italic_}, {+inserted+}, {-deleted-}, {~sub~}, {^sup^}, {=highlighted=}
  • text bounded by colons (note: adding curlies {:foo:} stops the colon syntax)
    • eg.: :smiley:, :+1:
  • text bounded by backticks and prefixed with something
    • eg.: $`a = b + c` for math

So, some alternatives for a smallcaps syntax that might work with djot:

  • {.Some Small Caps.}
  • Can't just use .bound by dots. (too common in regular text), though maybe prefix bounding colons, like .:Some Small Caps:
  • .`Some Small Caps` (prefixing the backticks by something other than $)

I kinda' like {.That First Option.}. The only problem is that {.foo.} (for smallcaps) looks an awful lot like {.foo} (for attributes (what you'd write to get class="foo")) --- I think those are just too similar.

Maybe another punctuation character:

{,Some Small Caps,}

{;Some Small Caps;}

I think the comma looks good --- maybe even better than the dot; if you squint, the comma looks a little like a very tiny arrow pointing down, which is very nice for small caps. 😃

@jgm
Owner

jgm commented Jul 20, 2022

Only the first form works to apply formatting to an arbitrary sequence of inline elements. The second and third form operate on plain unformatted text.

For other characters, I think =, #, %, and . are out, because of conflicts with attribute (and command and raw) syntax. That leaves

{,
{@
{$
{&
{|
{,
{;
{/
{?

@uvtc
Contributor

uvtc commented Jul 21, 2022

Hm. My 2 cents:

  • {/Foo Bar/} looks like it would be for italic, but there's already syntax for italic.
  • $ is already used for math.
  • {|Foo Bar|} is pretty, though not sure what it looks like it would be syntax for.
  • {?Foo Bar?} looks too much like a question ({?Have you tried the eggplant?})
  • {&Foo Bar&} and {@Foo Bar@} are pretty bulky, and, like |, they don't suggest any markup to me.

That leaves the comma and the semicolon.

@jgm
Owner

jgm commented Jul 26, 2022

Maybe:

  • {,small caps,}
  • {!strikeout!}
  • {;underline;}

Problem is that none of these are at all suggestive of what they mean.
! looks like it is emphasizing the text rather than striking it out.
, is not TOO bad for small caps.
; is horrible for underline. If we wanted underline, it could be better to reserve {_ for that and use {/ for italic emphasis. However, it's important to have a syntax for emphasis that generally doesn't require the {, and / is widely used between words.

One option could be to require the { when _ or * or / is used inside a word, and only allow them to be used "bare" when they are on the edge of a word. I considered that option earlier, but identifying word boundaries is tricky unless we build a lot of unicode logic into the parser (detecting character classes), which I'd hoped to avoid.
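
To illustrate why word-edge detection pulls Unicode logic into the parser, here is a minimal Python sketch. It is not djot's implementation: the helper names `is_word_char` and `at_word_edge` are hypothetical, and treating the Unicode general categories L, M and N as "word characters" is an assumption made only for this example.

```python
import unicodedata


def is_word_char(ch: str) -> bool:
    """Assumption: letters (L*), marks (M*) and numbers (N*) count as word characters."""
    return unicodedata.category(ch)[0] in ("L", "M", "N")


def at_word_edge(text: str, i: int) -> bool:
    """True if the delimiter at text[i] is NOT surrounded by word characters on both sides.

    The start and the end of the text count as edges.
    """
    before = text[i - 1] if i > 0 else ""
    after = text[i + 1] if i + 1 < len(text) else ""
    inside_word = (
        bool(before) and bool(after)
        and is_word_char(before) and is_word_char(after)
    )
    return not inside_word


print(at_word_edge("snake_case", 5))            # False: inside an ASCII word
print(at_word_edge("_emph_ text", 0))           # True: at the start of a word
print(at_word_edge("caf\u00e9_na\u00efve", 4))  # False: the left neighbour is a non-ASCII letter
```

An ASCII-only test would call the third case a word edge, which is the kind of mistake that the character-class lookup is there to avoid.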

@wooorm
Contributor

wooorm commented Jul 26, 2022

, gives me thin space vibes (from LaTeX). Not sure if something like that would make sense here. But that would be my first hunch.

@bpj
Author

bpj commented Jul 26, 2022

@jgm what's wrong with {|underline|}? It's a vertical line but at least a line! The idea with {!strikeout!} is that strikeout "cancels" text and ! means negation in many programming languages. It's far from perfect but it's something, although I agree that it's not at all obvious to non-programmers.

@bpj
Author

bpj commented Jul 26, 2022

BTW I'm fine with commas for small caps. I just took the dots from my old Perl script which I mentioned on pandoc-discuss, not realizing at the moment that it might clash with classes in attributes. As I mentioned I used to use {/italics/} but /italics/ is horrible. Requiring that it is flanked with whitespace or ASCII punctuation won't cut it, since there is plenty of non-ASCII punctuation.

@bpj
Author

bpj commented Jul 26, 2022

The main reason /italics/ is horrible from a linguist's POV is that it clashes with phonemic notation, no matter what characters you require it to (not) be surrounded by. It's on a par with not allowing mathematicians to use < and > for less-than and greater-than.

@jgm
Owner

jgm commented Jul 26, 2022

> @jgm what's wrong with {|underline|}? It's a vertical line but at least a line!

True! Maybe that's not so bad.

So, maybe the best idea would be {|underline|} and {,small caps,} and {!strikeout!}.
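
As a rough illustration of how the three proposed spans could map to output, here is a toy Python sketch. It is a regex substitution, not djot's parser, and the HTML targets (`<u>`, `<s>`, and a `span` with class `smallcaps`) are assumptions chosen for the example rather than anything decided in this thread. The names `SPANS`, `SPAN_RE` and `render` are hypothetical.

```python
import html
import re

# Hypothetical mapping from the proposed delimiter characters to HTML.
# The tag and class names are assumptions, not djot's actual output.
SPANS = {
    "|": ("<u>", "</u>"),
    ",": ('<span class="smallcaps">', "</span>"),
    "!": ("<s>", "</s>"),
}

# Matches {X ... X} where X is one of the three proposed delimiter characters.
SPAN_RE = re.compile(r"\{([|,!])(.+?)\1\}", re.DOTALL)


def render(text: str) -> str:
    """Escape the text, then replace each proposed span with its HTML.

    Nested and overlapping spans are not handled in this sketch.
    """
    def repl(match: re.Match) -> str:
        open_tag, close_tag = SPANS[match.group(1)]
        return open_tag + match.group(2) + close_tag

    return SPAN_RE.sub(repl, html.escape(text, quote=False))


print(render("{|underline|}, {,small caps,} and {!strikeout!}"))
# <u>underline</u>, <span class="smallcaps">small caps</span> and <s>strikeout</s>
```

Because the opening brace is mandatory here, a bare `!`, `,` or `|` in running text is left alone.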

@uvtc
Contributor

uvtc commented Jul 27, 2022

> > @jgm what's wrong with {|underline|}? It's a vertical line but at least a line!
>
> True! Maybe that's not so bad.
>
> So, maybe the best idea would be {|underline|} and {,small caps,} and {!strikeout!}.

@jgm, are you suggesting replacing:

  • {+underline+} with {|underline|}, and
  • {-strikeout-} with {!strikeout!}?

I think that {-strikeout-} already looks good. It looks like it suggests strikethrough / strikeout. {!this!} not only makes me think "warning", but also, the tall slim characters (including |) don't look good inside the curlies, IMO.

Underline markup is not used very often. If I needed it, and if it weren't {_underline_} I bet I'd have to look up its syntax to figure out whether it's {+this+} or {|this|} (neither of which makes me think, "underline").

Since bold is used much less often than italic, if I were starting from scratch, I'd consider:

  • *italic* or {*italic*}
  • {+bold+}
  • {_underline_} (aka inserted)
  • keep {-strikeout-} (aka deleted)
  • {,Small Caps,}

and keep the others as they are ({~sub~}, {^sup^}, {=highlighted=}).

@bpj
Author

bpj commented Jul 27, 2022

@uvtc I can't answer for @jgm but my thought is not to replace anything, but rather that <ins> and <del> are specifically for material which was inserted/marked for deletion in the current revision which will be unmarked/removed in the next revision and so are inappropriate for material which is to be more "permanently" underlined/struck out for whatever reason. While it is true that the Pandoc AST doesn't currently have any dedicated elements for making the distinction (beyond rendering one or the other with a span with a class), I think it is an important distinction and I see no reason why djot cannot make it. Djot can output HTML directly and the distinction may anyway (unfortunately) be moot in some other formats like LaTeX, so IMO its being lost in transition to the Pandoc AST isn't a huge deal, however unfortunate. Thanks to the braces it is feasible to remove deleted material or remove the {+ and +} markup with a regex (see footnote 1), which provisionally makes the semantic distinction meaningful anyway.
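
For illustration, the "accept changes" idea can be sketched in Python roughly as follows. This is only a sketch under the assumption that deletions and insertions are not nested; the names `DELETED`, `INSERTED` and `accept_changes` are hypothetical and not part of djot or pandoc.

```python
import re

# Non-greedy match of {- ... -}; assumes deletions are not nested.
DELETED = re.compile(r"\{-.*?-\}", re.DOTALL)
# Captures the text between {+ and +} so that only the markers are dropped.
INSERTED = re.compile(r"\{\+(.*?)\+\}", re.DOTALL)


def accept_changes(text: str) -> str:
    """Remove deleted material and keep inserted material without its markup."""
    text = DELETED.sub("", text)
    return INSERTED.sub(r"\1", text)


print(accept_changes("The {-quick -}{+slow +}brown fox"))
# The slow brown fox
```

Material marked with a separate underline or strikeout syntax would pass through this step untouched, which is the practical payoff of keeping the two kinds of markup distinct.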

Footnotes

  1. In most dialects something like \{\-.*?\-\} (in Lua %{%-.-%-%}) will do since nested deletions are probably not a thing anyway. In Perl you could handle even that with the Regexp::Common::balanced module:

    use Regexp::Common qw[balanced];
    
    $text =~ s/$RE{balanced}{-begin=>'{-'}{-end=>'-}'}//g;
    

@dumblob

dumblob commented Jul 27, 2022

> @uvtc I can't answer for @jgm but my thought is not to replace anything, but rather that <ins> and <del> are specifically for material which was inserted/marked for deletion in the current revision which will be unmarked/removed in the next revision and so are inappropriate for material which is to be more "permanently" underlined/struck out for whatever reason.

For discussion about ins, del, sub, etc., see #15.

@jgm
Owner

jgm commented Jul 27, 2022

Yes, that's my thinking. Semantically, "inserted" and "deleted" are different from "underline" and "strikethrough," even if that's how browsers render them typically.

@uvtc
Contributor

uvtc commented Jul 30, 2022

In that case, then those look to me like pretty good uses of {|pipes|}, {!bangs!}, and {,commas,}.

The {|pipes|} make some sense for underlines to me too, since pipes are also used for tables (which are, of course, themselves made up of lines). And the bangs seem good too given, as @bpj points out, their association with "not".

It seems like a nice feature of djot that it may contain not only ways to mark insert and delete (for providing feedback to someone on proposed changes), but also explicit underline and strikethrough. I don't know of another light markup format that provides that.

And it also leaves (at the least) {@at@}, {&amp&}, {;semi;}, and {?question mark?} still available for possible future use if something else is needed down the road.

BTW, I really like djot's simplicity of using the curlies to disambiguate syntax when necessary.

@waldyrious
Contributor

> > @uvtc I can't answer for @jgm but my thought is not to replace anything, but rather that <ins> and <del> are specifically for material which was inserted/marked for deletion in the current revision which will be unmarked/removed in the next revision and so are inappropriate for material which is to be more "permanently" underlined/struck out for whatever reason.
>
> For discussion about ins, del, sub, etc., see #15.

See also #13 for discussion about underline and strike-through (including an alternative syntax proposal for them).

@bpj
Author

bpj commented Nov 6, 2022

@waldyrious what do you mean about #13? I see nothing there which is relevant to this, or in the spirit of djot; djot specifically rejects doubled delimiter characters, IMO for good reasons.

@waldyrious
Contributor

Indeed, I was careless with my comment there and the reference to the thread here — my apologies. I have now added a (hopefully) more considered comment to that thread.

The relevance of both of my comments there to this issue lies in the syntax proposal for presentational tags, including those discussed here, namely underline and strikeout.

@evanrelf

evanrelf commented Nov 7, 2022

I was reading the cheatsheet, and I noticed that code didn't have support for {` and `}, which would bring it in line with italic and bold:

| Markup | Result |
| --- | --- |
| _italic_ or {_italic_} | italic |
| *bold* or {*bold*} | bold |
| `verbatim/code` | verbatim/code |


7 participants