Add --wrap=sentence #7435

jwflory · 2021-07-09T16:01:10Z

Describe your proposed improvement and the problem it solves.

Improvement: AsciiDoc outputs should place each sentence on a new line, and paragraphs should be separated by an additional new line (or an empty line).
Solved problem: The output would follow the AsciiDoc Recommended Practices convention of "one sentence per line" (see excerpt below).

It’s important to note that this technique works because AsciiDoc doesn’t treat wrapped lines in prose as hard line breaks. At least, it doesn’t show up that way to the reader. The line breaks between contiguous lines of prose will not be visible in the rendered document (i.e., as the reader sees it). While a single line break doesn’t appear in the output, two consecutive line breaks starts a new paragraph (or other block).

This is important to me because I always hand-edit pandoc-exported AsciiDoc files to make these changes. It would save me manual time and effort to have AsciiDoc output from pandoc follow the one sentence per line recommended practice. This practice is one of the reasons I choose to use AsciiDoc in projects, so I often rely on pandoc for converting content into AsciiDoc.

Conversion example

Before (gfm):

**Lorem ipsum dolor sit amet, consectetur adipiscing elit.** Nulla non accumsan diam. _Etiam vel augue neque_. Mauris sit amet varius arcu. [Nulla aliquam](https://github.com/jgm/pandoc) consectetur felis, eu lobortis ligula fermentum semper.

Integer odio dui, pretium in leo ut: pharetra bibendum elit. Sed id lacinia arcu; proin efficitur pretium consectetur. Nam semper in diam quis facilisis. Vestibulum sit amet vehicula ante. Interdum et malesuada fames ac ante ipsum primis in faucibus. Nulla facilisi.

* **Nulla pulvinar ante vel nisi consectetur**: eu tempor lectus cursus.
* _Quisque at diam at libero viverra sagittis_: Sed fermentum porta facilisis.

After (AsciiDoc):

*Lorem ipsum dolor sit amet, consectetur adipiscing elit.*
Nulla non accumsan diam.
_Etiam vel augue neque_.
Mauris sit amet varius arcu.
https://github.com/jgm/pandoc[Nulla aliquam] consectetur felis, eu lobortis ligula fermentum semper.

Integer odio dui, pretium in leo ut:
pharetra bibendum elit.
Sed id lacinia arcu;
proin efficitur pretium consectetur.
Nam semper in diam quis facilisis.
Vestibulum sit amet vehicula ante.
Interdum et malesuada fames ac ante ipsum primis in faucibus.
Nulla facilisi.

* *Nulla pulvinar ante vel nisi consectetur*:
  eu tempor lectus cursus.
* _Quisque at diam at libero viverra sagittis_:
  Sed fermentum porta facilisis.
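The hand edit shown in the Before/After example can be approximated with a small post-processing script. This is a minimal sketch, not pandoc behavior: the function name `one_sentence_per_line` and the break rule are assumptions made for illustration. It breaks after `.`, `!`, `?`, `:` and `;` (the example above also breaks at `:` and `;`) and keeps one empty line between paragraphs. It knows nothing about inline markup, list indentation, or abbreviations, so a period followed directly by a closing `*` is not treated as a break.

```python
import re

# Break after sentence-like punctuation when it is followed by whitespace.
# Assumption: ':' and ';' count as break points, mirroring the example above.
BREAK = re.compile(r"(?<=[.!?:;])\s+")


def one_sentence_per_line(text):
    """Reflow plain prose so each sentence sits on its own line."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    reflowed = []
    for paragraph in paragraphs:
        joined = " ".join(paragraph.split())  # undo any existing wrapping
        reflowed.append("\n".join(BREAK.split(joined)))
    return "\n\n".join(reflowed)  # one empty line between paragraphs


sample = (
    "Nulla non accumsan diam. Mauris sit amet\n"
    "varius arcu.\n"
    "\n"
    "Sed id lacinia arcu; proin efficitur pretium consectetur."
)
print(one_sentence_per_line(sample))
```

Because a colon inside a URL such as `https://github.com/jgm/pandoc` is not followed by whitespace, the rule leaves links intact.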

Describe alternatives you've considered.

Hand-editing these changes into pandoc output. This is tedious and time-consuming for long documents. Implementing this would save me and other AsciiDoc writers time when following this convention.

Thanks for your consideration! pandoc helps me a lot and it is incredibly useful! 🙌🏻

This commit converts the Open Hardware reading list from Markdown into an AsciiDoc format. This is a clean conversion with no additional changes made by me. This is done in part to demonstrate how I normally hand-convert these files using a `pandoc`-driven workflow. Part of #43. A conversion example for jgm/pandoc#7435. Signed-off-by: Justin W. Flory (he/him) [UNICEF Innovation] <jflory@unicef.org>

jgm · 2021-07-09T18:48:12Z

Do you know about --wrap=preserve?

jgm · 2021-07-09T18:51:35Z

I mention --wrap=preserve because that allows you to do your one-sentence-per-line thing in the source document and preserve this in the asciidoc output.
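For reference, a conversion with `--wrap=preserve` could be scripted roughly as follows. This is a sketch under assumptions: the file names `reading-list.md` and `reading-list.adoc` and the helper `pandoc_preserve_cmd` are hypothetical, and the conversion only runs when pandoc is installed and the input file exists.

```python
import os
import shutil
import subprocess


def pandoc_preserve_cmd(src, dest):
    """Build a pandoc command line that keeps the source's own line breaks."""
    return [
        "pandoc",
        "--from=gfm",
        "--to=asciidoc",
        "--wrap=preserve",
        f"--output={dest}",
        src,
    ]


# Hypothetical file names, used only for illustration.
cmd = pandoc_preserve_cmd("reading-list.md", "reading-list.adoc")
print(" ".join(cmd))

# Run the conversion only if pandoc is available and the input is present.
if shutil.which("pandoc") and os.path.exists(cmd[-1]):
    subprocess.run(cmd, check=True)
```

With this option the sentence-per-line layout has to exist in the Markdown source already; pandoc carries the line breaks over rather than creating them.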

Automatically detecting sentence boundaries isn't easy or reliable (esp. when you support many languages). For example, consider:

He's the Prof. Barnes is in his study.

He said that Prof. Barnes is in his study.

In the first example there are two sentences; in the second just one. Figuring that out requires a pretty good grasp of English syntax and abbreviation conventions, and pandoc isn't that smart.
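The difficulty can be made concrete with two simple heuristics. The sketch below is illustrative only and is not pandoc's algorithm; the function names and the abbreviation list are assumptions. The first rule splits after `.`, `!` or `?` when an uppercase letter follows; the second never splits after a known abbreviation. Each rule gets exactly one of the two examples right.

```python
import re

# Heuristic 1 (assumption): a sentence ends at '.', '!' or '?' followed by
# whitespace and an uppercase letter.
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

# Heuristic 2 (assumption): same end marks, but never after these abbreviations.
ABBREVIATIONS = {"Prof.", "Dr.", "Mr.", "Mrs.", "e.g.", "i.e."}


def split_naive(text):
    return BOUNDARY.split(text)


def split_with_abbreviations(text):
    sentences, current = [], []
    for word in text.split():
        current.append(word)
        if word[-1] in ".!?" and word not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # keep any trailing words that lack an end mark
        sentences.append(" ".join(current))
    return sentences


two = "He's the Prof. Barnes is in his study."      # really two sentences
one = "He said that Prof. Barnes is in his study."  # really one sentence

print(split_naive(two))               # splits after "Prof." (correct here)
print(split_naive(one))               # also splits after "Prof." (wrong)
print(split_with_abbreviations(two))  # no split (wrong here)
print(split_with_abbreviations(one))  # no split (correct)
```

Neither rule can tell the two inputs apart, because the difference lies in the syntax of the sentence rather than in the characters around the period.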

jgm · 2021-07-09T18:54:52Z

That said, in the man writer we do split sentences, because roff treats a line-ending period differently from a line-internal one. For this we use splitSentences from Text.Pandoc.Shared; however, it is not entirely reliable for reasons given above.

jgm · 2021-07-09T18:56:13Z

Perhaps we could contemplate adding --wrap=sentence.

jwflory added the enhancement label Jul 9, 2021

jwflory mentioned this issue Jul 9, 2021: [CONTENT] Convert Markdown content to AsciiDoc unicef/inventory#43 (closed, 21 tasks)

jgm changed the title ~~asciidoc: Parse each sentence on a newline to follow AsciiDoc Recommended Practices~~ asciidoc: write each sentence on a separate line to follow AsciiDoc Recommended Practices Jul 9, 2021

jgm changed the title ~~asciidoc: write each sentence on a separate line to follow AsciiDoc Recommended Practices~~ Add --wrap=sentence Sep 19, 2021

MatthijsBlom mentioned this issue Oct 16, 2022: Add markdown version learnyouahaskell/learnyouahaskell.github.io#35 (merged)
