Make it easier to work with encoded (non-ascii) strings #1

rwstauner opened this Issue · 3 comments

RJBS recently made this change:

Here is his report, excerpted from IRC:
should look like

Things to report:

  1. T::VC expects "string" to be a byte string, not text. I had to encode it first.
  2. T::VC returns a byte string, not a text string. I had to decode it afterward.
  3. I had to communicate the character encoding of the bytestream to Vim, which meant passing '+set fenc=utf-8', but I could not do this without copying and pasting the default options as well.

Thoughts on addressing this:

  1. Either document that it expects a byte string or fix it to expect unicode, which it will then encode, and document that it expects text, not bytes. I suggest the latter. People passing a string should be passing unencoded text. It also eliminates the question of passing the encoding, because if you pass a byte string, you will also need to allow for the encoding to be passed.
  2. I strongly suggest returning a character string, but otherwise document that an encoded byte string is returned. In either case, you probably need to pass an argument to ensure that it is always returned in one encoding, so it can be reliably decoded.
  3. Provide access to @VIM_OPTIONS (or whatever it was called) as ->default_vim_options, so it can be included in "those options plus more" as (options => [ T::VC->default_vim_options, ... ]).
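
A minimal sketch of the "text in, text out" contract suggested in points 1 and 2, assuming the underlying tool only deals in bytes. Here `highlight_bytes` is a hypothetical stand-in for the call out to vim and `highlight_text` is an illustrative wrapper name; neither is part of Text::VimColor. Only the core Encode module is used.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical stand-in for the byte-oriented step (e.g. running vim on a file).
# It takes octets and returns octets, wrapped in a <pre> element.
sub highlight_bytes {
    my ($octets) = @_;
    return "<pre>" . $octets . "</pre>";
}

# Illustrative wrapper: accept a character string, return a character string.
# The encoding is fixed internally, so the caller never has to pass one.
sub highlight_text {
    my ($text) = @_;

    # Work on a copy: Encode may modify its input when a CHECK value is given.
    my $copy        = $text;
    my $octets      = encode('UTF-8', $copy, Encode::FB_CROAK);
    my $html_octets = highlight_bytes($octets);

    return decode('UTF-8', $html_octets, Encode::FB_CROAK);
}

# Non-ASCII input written with escapes so the example does not depend on
# the encoding of the source file.
my $html = highlight_text("na\x{ef}ve caf\x{e9} \x{2603}");

binmode(STDOUT, ':encoding(UTF-8)');
print "$html\n";
```

Because encoding and decoding both happen inside the wrapper, the caller cannot end up with a mismatch between the encoding it used and the one the tool assumed.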

Looking at usage by reverse deps I found this:

    # any encoding will do if vim automatically detects it
    my $vim_encoding = 'utf-8';
    my $BOM = "\x{feff}";
    my $syn = Text::VimColor->new(
            filetype    => $lang,
            string      => encode($vim_encoding, $BOM . $str),
    );
    $str = decode($vim_encoding, $syn->html);
    $str =~ s/^$BOM//;
    return $str;

For reference, rjbs's similar code is here:

    sub build_html {
      my ($self, $str, $param) = @_;

      my $octets = Encode::encode('utf-8', $str, Encode::FB_CROAK);

      my $vim = Text::VimColor->new(
        string   => $octets,
        filetype => $param->{filetype},

        vim_options => [
          qw( -RXZ -i NONE -u NONE -N -n ), "+set nomodeline", '+set fenc=utf-8',
        ],
      );

      my $html_bytes = $vim->html;
      my $html = Encode::decode('utf-8', $html_bytes);

      return $html;
    }

rwstauner referenced this issue from a commit:
Test methods of specifying encoding used by other modules
Ensure that we don't break backward compatibility
with other modules that have already implemented workarounds.

This is the beginning of addressing gh-1.

I have added extra_vim_options => [] for an easy way to append options to the list after the defaults.
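
As a rough illustration of what "append options to the list after the defaults" means, the sketch below merges the two lists in plain Perl. `resolve_vim_options` is a hypothetical helper, not the module's actual code, and the default list is copied from the workaround quoted earlier in this thread, so the module's real defaults may differ. It also assumes that an explicit `vim_options` replaces the defaults entirely.

```perl
use strict;
use warnings;

# Assumption: defaults taken from the workaround quoted in this thread.
my @DEFAULT_VIM_OPTIONS = (qw( -RXZ -i NONE -u NONE -N -n ), '+set nomodeline');

# Hypothetical helper: use vim_options if given (replacing the defaults),
# otherwise the defaults, then append any extra_vim_options.
sub resolve_vim_options {
    my (%args) = @_;

    my @base  = $args{vim_options} ? @{ $args{vim_options} } : @DEFAULT_VIM_OPTIONS;
    my @extra = @{ $args{extra_vim_options} || [] };

    return (@base, @extra);
}

# The caller adds one option without copying and pasting the defaults.
my @opts = resolve_vim_options(extra_vim_options => ['+set fenc=utf-8']);
print join(' ', @opts), "\n";
```

With this shape, the third item in the report is addressed without exposing the default list itself: callers only name the options they want to add.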

rwstauner closed this issue from a commit:
Accept (and return) character strings
closes gh-1.

Thanks, RJBS!
rwstauner closed this in e227c46
rwstauner referenced this issue from a commit:
v0.23
  - Attempt to do the right thing with character strings:
    Encode them in UTF-8, tell vim the file encoding (UTF-8),
    and return a (decoded) character string.
    Thanks to Ricardo Signes for the very helpful report (gh-1).