Add --wrap=sentence #7435

Open
jwflory opened this issue Jul 9, 2021 · 4 comments
jwflory commented Jul 9, 2021

Describe your proposed improvement and the problem it solves.

  • Improvement: AsciiDoc outputs should place each sentence on a new line, and paragraphs should be separated by an additional new line (or an empty line).
  • Solved problem: Follows AsciiDoc Recommended Practices of "one sentence per line" (see excerpt below)

It’s important to note that this technique works because AsciiDoc doesn’t treat wrapped lines in prose as hard line breaks. At least, it doesn’t show up that way to the reader. The line breaks between contiguous lines of prose will not be visible in the rendered document (i.e., as the reader sees it). While a single line break doesn’t appear in the output, two consecutive line breaks starts a new paragraph (or other block).
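The behaviour described in the excerpt can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `render_paragraphs` is a hypothetical helper, not part of pandoc or any AsciiDoc processor, and it ignores lists, hard breaks (`+`), and every other block type.

```python
import re


def render_paragraphs(source: str) -> list[str]:
    """Toy model of AsciiDoc paragraph handling (not a real parser)."""
    # One or more blank lines separate paragraphs.
    blocks = re.split(r"\n\s*\n", source.strip())
    # Inside a paragraph, a single line break reads as a space to the reader.
    return [
        " ".join(line.strip() for line in block.splitlines())
        for block in blocks
    ]


source = """First sentence.
Second sentence.

New paragraph."""

print(render_paragraphs(source))
# ['First sentence. Second sentence.', 'New paragraph.']
```

Because the two sentences of the first paragraph are rejoined on rendering, writing them on separate lines changes the source layout only.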

This is important to me because I always hand-edit pandoc-exported AsciiDoc files to make these changes. It would save me time and effort if pandoc's AsciiDoc output followed the one-sentence-per-line recommended practice. This practice is one of the reasons I choose AsciiDoc for projects, and I often rely on pandoc to convert content into AsciiDoc.

Conversion example

Before (gfm):

**Lorem ipsum dolor sit amet, consectetur adipiscing elit.** Nulla non accumsan diam. _Etiam vel augue neque_. Mauris sit amet varius arcu. [Nulla aliquam](https://github.com/jgm/pandoc) consectetur felis, eu lobortis ligula fermentum semper.

Integer odio dui, pretium in leo ut: pharetra bibendum elit. Sed id lacinia arcu; proin efficitur pretium consectetur. Nam semper in diam quis facilisis. Vestibulum sit amet vehicula ante. Interdum et malesuada fames ac ante ipsum primis in faucibus. Nulla facilisi.

* **Nulla pulvinar ante vel nisi consectetur**: eu tempor lectus cursus.
* _Quisque at diam at libero viverra sagittis_: Sed fermentum porta facilisis.

After (AsciiDoc):

*Lorem ipsum dolor sit amet, consectetur adipiscing elit.*
Nulla non accumsan diam.
_Etiam vel augue neque_.
Mauris sit amet varius arcu.
https://github.com/jgm/pandoc[Nulla aliquam] consectetur felis, eu lobortis ligula fermentum semper.

Integer odio dui, pretium in leo ut:
pharetra bibendum elit.
Sed id lacinia arcu;
proin efficitur pretium consectetur.
Nam semper in diam quis facilisis.
Vestibulum sit amet vehicula ante.
Interdum et malesuada fames ac ante ipsum primis in faucibus.
Nulla facilisi.

* *Nulla pulvinar ante vel nisi consectetur*:
  eu tempor lectus cursus.
* _Quisque at diam at libero viverra sagittis_:
  Sed fermentum porta facilisis.

Describe alternatives you've considered.

Hand-editing these changes into pandoc output. This is tedious and time-consuming for long documents. Implementing this would save me and other AsciiDoc writers time when following this convention.

Thanks for your consideration! pandoc helps me a lot and it is incredibly useful! 🙌🏻

jwflory added a commit to unicef/inventory that referenced this issue Jul 9, 2021
This commit converts the Open Hardware reading list from Markdown into
an AsciiDoc format. This is a clean conversion with no additional
changes made by me. This is done in part to demonstrate how I normally
hand-convert these files using a `pandoc`-driven workflow.

Part of #43. A conversion example for jgm/pandoc#7435.

Signed-off-by: Justin W. Flory (he/him) [UNICEF Innovation] <jflory@unicef.org>

jgm (Owner) commented Jul 9, 2021

Do you know about --wrap=preserve?

jgm commented Jul 9, 2021

I mention --wrap=preserve because it lets you write one sentence per line in the source document and keeps those line breaks in the AsciiDoc output.

Automatically detecting sentence boundaries isn't easy or reliable (esp. when you support many languages). For example, consider:

He's the Prof. Barnes is in his study.

He said that Prof. Barnes is in his study.

In the first example there are two sentences; in the second just one. Figuring that out requires a pretty good grasp of English syntax and abbreviation conventions, and pandoc isn't that smart.
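A minimal sketch of why this is hard, assuming a purely punctuation-based rule. `naive_split_sentences` is a hypothetical helper written for illustration; it is not pandoc's `splitSentences` and makes no claim about how pandoc works internally.

```python
import re


def naive_split_sentences(text: str) -> list[str]:
    """Break after '.', '!' or '?' when whitespace and a capital letter follow."""
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)


two_sentences = "He's the Prof. Barnes is in his study."
one_sentence = "He said that Prof. Barnes is in his study."

print(naive_split_sentences(two_sentences))
# ["He's the Prof.", 'Barnes is in his study.']
print(naive_split_sentences(one_sentence))
# ['He said that Prof.', 'Barnes is in his study.']
```

The rule splits both inputs after "Prof.", which is right for the first and wrong for the second. An abbreviation list that suppressed the break after "Prof." would fix the second input and break the first, so no surface-level rule handles both.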

jgm commented Jul 9, 2021

That said, in the man writer we do split sentences, because roff treats a line-ending period differently from a line-internal one. For this we use splitSentences from Text.Pandoc.Shared; however, it is not entirely reliable, for the reasons given above.

jgm commented Jul 9, 2021

Perhaps we could contemplate adding --wrap=sentence.

jgm changed the title from "asciidoc: Parse each sentence on a newline to follow AsciiDoc Recommended Practices" to "asciidoc: write each sentence on a separate line to follow AsciiDoc Recommended Practices" on Jul 9, 2021
jgm changed the title from "asciidoc: write each sentence on a separate line to follow AsciiDoc Recommended Practices" to "Add --wrap=sentence" on Sep 19, 2021