Consider an example: the source Markdown file includes Greek, Cyrillic, and Armenian letters. Pandoc converts it to a man page without problems, but in the generated .man file all the non-Latin characters are represented as escape sequences. It is not a showstopper, since the rendered man page looks good, but every non-Latin character takes 5 bytes (Greek) or 8 bytes (Cyrillic and Armenian). Unescaped, in UTF-8, each of them would occupy only 2 bytes. It is just a waste of space.
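The byte arithmetic can be checked with a short sketch. It assumes the escapes have the form `\[uXXXX]` (8 bytes each), which matches the figure given for Cyrillic and Armenian; the 5-byte Greek escapes are presumably named glyphs such as `\[*e]` and are not modeled here. The function names are illustrative only.

```python
def groff_unicode_escape(ch: str) -> str:
    """Escape one character as a groff \\[uXXXX] special character (assumed form)."""
    return f"\\[u{ord(ch):04X}]"


def byte_costs(text: str) -> tuple[int, int]:
    """Return (escaped_bytes, utf8_bytes) for the non-ASCII characters of text."""
    non_ascii = [ch for ch in text if ord(ch) > 0x7F]
    escaped = sum(len(groff_unicode_escape(ch).encode("ascii")) for ch in non_ascii)
    utf8 = sum(len(ch.encode("utf-8")) for ch in non_ascii)
    return escaped, utf8


for sample in ["српски", "հայերեն"]:
    escaped, utf8 = byte_costs(sample)
    print(f"{sample}: {escaped} bytes escaped vs {utf8} bytes as UTF-8")
# српски: 48 bytes escaped vs 12 bytes as UTF-8
# հայերեն: 56 bytes escaped vs 14 bytes as UTF-8
```

Cyrillic (U+0400 to U+04FF) and Armenian (U+0530 to U+058F) code points all fall below U+0800, so UTF-8 encodes each in exactly 2 bytes, a fourfold saving over the 8-byte escape.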
Modern groff allows using UTF-8 encoding in source files:
$ cat test.man
.\" Automatically generated by Pandoc 2.14.0.3
.\"
.TH "" "" "" "" ""
.hy
.SH Ελληνικά
.PP
српски հայերեն
$ groff -D utf8 -m man -T utf8 < test.man
() ()
Ελληνικά
српски հայերեն
()
Thus, I request that the man writer output non-Latin characters as-is, without converting them to escape sequences.
Pandoc version:
$ pandoc --version
pandoc 2.14.0.3
Compiled with pandoc-types 1.22.1, texmath 0.12.3.3, skylighting 0.10.5.2,
citeproc 0.4.0.1, ipynb 0.1.0.1
User data directory: /home/vdb/.local/share/pandoc
Copyright (C) 2006-2021 John MacFarlane. Web: https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.
It is not the latest available version. However, I scanned the pandoc release notes for releases after 2.14.0.3, and it seems there were no changes to the man writer.
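Until the writer changes, one possible workaround is to post-process the generated file. A minimal sketch, assuming pandoc emits Cyrillic and Armenian letters as `\[uXXXX]` escapes (consistent with the 8-byte figure above); `unescape_non_ascii` is a hypothetical helper, not part of pandoc. Named glyphs such as `\[*a]` and composite escapes are left untouched, and the result would then be rendered with UTF-8 input enabled, as in the `groff -D utf8` command shown earlier.

```python
import re

# Matches groff Unicode escapes such as \[u0441] (4 to 6 uppercase hex digits).
# Assumption: composite forms like \[u0435_0301] and named glyphs like \[*a]
# are deliberately not matched by this sketch.
UNICODE_ESCAPE = re.compile(r"\\\[u([0-9A-F]{4,6})\]")


def unescape_non_ascii(roff: str) -> str:
    """Replace \\[uXXXX] escapes for non-ASCII code points with literal characters."""

    def replace(match: re.Match) -> str:
        code_point = int(match.group(1), 16)
        # Keep ASCII escapes as they are: their literal form may be significant
        # to roff (a backslash, for example). Also skip out-of-range values.
        if 0x7F < code_point <= 0x10FFFF:
            return chr(code_point)
        return match.group(0)

    return UNICODE_ESCAPE.sub(replace, roff)


print(unescape_non_ascii(r"\[u0441]\[u0440]\[u043F]\[u0441]\[u043A]\[u0438] \e"))
```

Restricting the replacement to code points above U+007F keeps the transformation safe for roff syntax, since only characters that roff treats as ordinary text are turned back into literals.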
BTW, in Fedora 37, man pages in languages with non-Latin writing systems do not use escape sequences. For example, Serbian:
$ cat /usr/share/man/sr/man1/cat.1.gz | gunzip | head -n20
.\" -*- coding: UTF-8 -*-
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.48.5.
.\"*******************************************************************
.\"
.\" This file was generated with po4a. Translate the source file.
.\"
.\"*******************************************************************
.TH CAT 1 "Августа 2022" "ГНУ coreutils 9.1" "Корисничке наредбе"
.SH НАЗИВ
cat \- concatenate files and print on the standard output
.SH УВОД
\fBcat\fP [\fI\,ОПЦИЈА\/\fP]... [\fI\,ДАТОТЕКА\/\fP]...
.SH ОПИС
.\" Add any additional description here
.PP
Надовежите ДАТОТЕКУ(Е) на стандардни излаз.
.PP
Без ДАТОТЕКЕ, или када је ДАТОТЕКА \-, чита стандардни улаз.
.TP
\fB\-A\fP, \fB\-\-show\-all\fP
Or Japanese:
$ cat /usr/share/man/ja/man1/cat.1.gz | gunzip | head -n20
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.13.
.TH CAT "1" "2021年5月" "GNU coreutils" "ユーザーコマンド"
.SH 名前
cat \- ファイルの内容を連結して標準出力に出力する
.SH 書式
.B cat
[\fI\,オプション\/\fR]... [\fI\,ファイル\/\fR]...
.SH 説明
.\" Add any additional description here
.PP
ファイル (複数可) の内容を結合して標準出力に出力します。
.PP
ファイルの指定がない場合や FILE が \- の場合, 標準入力から読み込みを行います。
.HP
\fB\-A\fR, \fB\-\-show\-all\fR \fB\-vET\fR と同じ
.TP
\fB\-b\fR, \fB\-\-number\-nonblank\fR
空行以外に行番号を付ける。\-n より優先される
.HP
\fB\-e\fR \fB\-vE\fR と同じ
I am not aware of other distros, though.
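Note the first line of the Serbian page, `.\" -*- coding: UTF-8 -*-`. groff's preconv(1) looks for such an Emacs-style coding tag in the first or second line of the input when deciding the input encoding; the Japanese page carries no such tag. A simplified sketch of that check follows; the regular expression is an approximation, not preconv's actual parser, and `find_coding_tag` is a hypothetical name.

```python
import re
from typing import Optional

# Simplified approximation of an Emacs-style coding tag, e.g. "-*- coding: UTF-8 -*-".
# Assumption: real tools accept more variants than this pattern does.
CODING_TAG = re.compile(r"-\*-.*?\bcoding:\s*([\w.:-]+).*?-\*-", re.IGNORECASE)


def find_coding_tag(source: str) -> Optional[str]:
    """Return the encoding declared in the first two lines, lowercased, or None."""
    for line in source.splitlines()[:2]:
        match = CODING_TAG.search(line)
        if match:
            return match.group(1).lower()
    return None


serbian = '.\\" -*- coding: UTF-8 -*-\n.\\" DO NOT MODIFY THIS FILE!\n'
japanese = '.\\" DO NOT MODIFY THIS FILE!\n.TH CAT "1"\n'
print(find_coding_tag(serbian))   # utf-8
print(find_coding_tag(japanese))  # None
```

Only the first two lines are inspected, mirroring where the tag is expected; a tag further down the file would be ignored.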
It used to be that UTF-8 in man pages was not reliably supported.
Perhaps that situation has changed and we can revisit this. In any case, we could keep the present behavior when the --ascii option is used.
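A minimal sketch of the behavior suggested here: non-ASCII characters pass through as UTF-8 by default and are escaped as `\[uXXXX]` only in an ASCII-only mode, standing in for `--ascii`. The function `escape_roff_text`, its `ascii_only` parameter, and the exact set of roff-special characters (backslash to `\e`, hyphen to `\-`, a leading `.` or `'` guarded with `\&`) are assumptions for illustration, not pandoc's actual writer code.

```python
# Hypothetical table of characters that are significant to roff in running text.
ROFF_SPECIAL = {"\\": "\\e", "-": "\\-"}


def escape_roff_text(text: str, ascii_only: bool) -> str:
    """Escape one line of roff text; keep non-ASCII as-is unless ascii_only is set."""
    out = []
    for ch in text:
        if ch in ROFF_SPECIAL:
            out.append(ROFF_SPECIAL[ch])
        elif ascii_only and ord(ch) > 0x7F:
            # ASCII-only output: fall back to groff's \[uXXXX] special characters.
            out.append(f"\\[u{ord(ch):04X}]")
        else:
            out.append(ch)
    escaped = "".join(out)
    # A line starting with a control character would be parsed as a request.
    if escaped.startswith((".", "'")):
        escaped = "\\&" + escaped
    return escaped


print(escape_roff_text("српски հայերեն", ascii_only=False))  # српски հայերեն
print(escape_roff_text("ср", ascii_only=True))              # \[u0441]\[u0440]
```

Keeping the escaping of roff syntax independent of the ASCII decision means the default UTF-8 output and the ASCII-only output differ only in how non-ASCII characters are written.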