[RFE] column: option to not exit with error 'Invalid or incomplete multibyte or wide character' #542

Closed
ProBackup-nl opened this issue Nov 21, 2017 · 7 comments

Comments

@ProBackup-nl

ProBackup-nl commented Nov 21, 2017

  • util-linux: 2.31
  • LANG=C, or a ??_??.UTF-8 locale whose locale files are not installed

Imagine a minimal (rescue) boot environment like initramfs.

Trying to print, in columns, a formatted string that contains multi-byte characters such as ¹ (superscript 1), using the command:

$ printf '%s\n' $'fw\302\271' | column -c111
column: read failed: Invalid or incomplete multibyte or wide character

results in an error, either because LANG=C (or POSIX, or another non-*.UTF-8 setting) is in effect or, when a *.UTF-8 locale string is set, because the corresponding locale files are not present in the environment.

It would be nice if column had an option to continue without an error and produce a best-effort result instead.

Assumption

My guess is that column needs i18n and/or UTF-8 support to figure out that \302\271 should be counted as one character position when printing.
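To make the counting problem concrete, here is a minimal sketch (the function name `count_bytes` is made up for illustration). The string `fw¹` encoded as UTF-8 occupies four bytes, while a tool that successfully decodes it as UTF-8 sees three characters. Without a usable UTF-8 locale, only the byte view is available.

```shell
# Hypothetical helper: count the raw bytes of "fw¹" encoded as UTF-8.
# The bytes are: f, w, 0xC2, 0xB9 (octal \302 \271).
count_bytes() {
    # LC_ALL=C forces byte semantics; tr strips the padding some wc
    # implementations print before the number.
    printf 'fw\302\271' | LC_ALL=C wc -c | tr -d ' '
}

count_bytes   # prints 4
```

The gap between four bytes and three character positions is what column has to resolve in order to align output.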

Proposal

-w, --widechar-one    assume all widechars print as 1 character when locale/i18n lookup fails (default is exit with error)
or
-m, --imperfect-print   better to print something not perfectly aligned than to bail out with an error
ProBackup-nl changed the title from "column: feature request: option to not exit with error 'Invalid or incomplete multibyte or wide character'" to "[RFE] column: option to not exit with error 'Invalid or incomplete multibyte or wide character'" on Nov 21, 2017
@karelzak
Collaborator

We already support this use case (ignoring wide chars) when column is compiled without wide-character support. Maybe we can also use this code path when the user requests it.

Not sure about the option name, maybe --force would be good enough.

karelzak added a commit that referenced this issue Nov 22, 2017
Addresses: #542
Signed-off-by: Karel Zak <kzak@redhat.com>
@karelzak
Collaborator

karelzak commented Nov 22, 2017

OK, I have implemented it in another way.

  • invalid multi-byte sequences are replaced by \x<hex>
  • the feature is enabled by default, you don't have to use any option (like --force)

I think it's better to be optimistic for tools like column(1) and print as much as possible rather than exit with EILSEQ.

@karelzak
Collaborator

example:

$ printf '%s bbb ccc\n' $'fw\302\271' | LANG=C ./column -c111 
fw\xc2\xb9 bbb ccc
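The output above can be approximated with a small shell sketch. This is not the util-linux implementation (which is written in C and decides validity according to the active locale); the function name `escape_non_ascii` is hypothetical. It assumes a plain C locale, where every byte outside printable ASCII counts as invalid, and for simplicity it also escapes control bytes such as tab.

```shell
# Hypothetical sketch: rewrite every byte outside printable ASCII as \xNN,
# roughly mimicking what column prints under LANG=C.
escape_non_ascii() {
    # od dumps stdin as one hex pair per byte; tr puts each pair on its own line.
    LC_ALL=C od -An -v -tx1 | tr -s ' ' '\n' | while read -r hex; do
        [ -z "$hex" ] && continue
        dec=$((0x$hex))
        if [ "$dec" -ge 32 ] && [ "$dec" -le 126 ]; then
            # Printable ASCII: emit the byte unchanged via an octal escape.
            printf "\\$(printf '%03o' "$dec")"
        elif [ "$dec" -eq 10 ]; then
            # Keep newlines so line structure survives.
            printf '\n'
        else
            # Anything else becomes a literal \xNN sequence.
            printf '\\x%s' "$hex"
        fi
    done
}

printf 'fw\302\271 bbb ccc\n' | escape_non_ascii
# prints: fw\xc2\xb9 bbb ccc
```

Escaping keeps the original bytes recoverable from the output, which fits the stated goal of printing as much as possible instead of exiting with EILSEQ.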

@ProBackup-nl
Author

ProBackup-nl commented Nov 22, 2017

I love your optimistic approach for this issue, to print as much as possible, without adding a new option. Will test as soon as the release drips down to Arch Linux.

karelzak added a commit that referenced this issue Dec 14, 2017
Addresses: #542
Signed-off-by: Karel Zak <kzak@redhat.com>
@alindeman

@karelzak Could this be related to why I'm seeing column escape ANSI color sequences in a recent version? Is there any way to make column not escape these color sequences?

Simplified example:

printf '\x1b[33mHello\x1b[m\n' | column -t
\x1b[33mHello\x1b[m

In older versions of column (e.g., 2.23 from CentOS 7), the color sequences are preserved.

@karelzak
Collaborator

karelzak commented Jan 2, 2018

The current stable version (v2.31.1) works as expected:

$ printf '\x1b[33mHello\x1b[m\n' | ./column -t
Hello

See #490

@ProBackup-nl
Author

Also

printf '%s\n' $'fw\302\271' | column -c111
fw\xc2\xb9

works as expected in v2.31.1.
