Make it easier to work with encoded (non-ascii) strings #1

Closed
rwstauner opened this Issue November 04, 2011 · 3 comments

Randy Stauner
Owner

RJBS recently made this change:
rjbs/Pod-Elemental-Transformer-VimHTML@d465c21

Here is his report, excerpted from IRC:

https://skitch.com/rjbs/gfti4/advcal-unicode
should look like
https://skitch.com/rjbs/gfthb/advcal-unicode

Things to report:

  1. T::VC expects "string" to be a byte string, not text. I had to encode it first.
  2. T::VC returns a byte string, not a text string. I had to decode it afterward.
  3. I had to communicate the character encoding of the bytestream to Vim, which meant passing '+set fenc=utf-8', but I could not do this without copying and pasting the default options as well.

Thoughts on addressing this:

  1. Either document that it expects a byte string, or fix it to expect Unicode text, which it will then encode, and document that it expects text, not bytes. I suggest the latter: people passing a string should be passing unencoded text. It also removes the question of how to pass the encoding, since accepting a byte string means you must also accept its encoding.
  2. I strongly suggest returning a character string; otherwise, document that an encoded byte string is returned. In either case, you probably need to pass an argument to ensure that it is always returned in one encoding, so it can be reliably decoded.
  3. Provide access to @VIM_OPTIONS (or whatever it was called) as ->default_vim_options so it can be included in "those options plus more" as (options => [ T::VC->default_vim_options, ... ])
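
To make suggestions 1 and 2 concrete, here is a minimal sketch of the "text in, text out" shape, using only the core Encode module. The names highlight_octets and highlight_text are hypothetical: highlight_octets is just a stand-in for a byte-oriented highlighter, not T::VC's actual code.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for a byte-oriented highlighter:
# it takes UTF-8 octets and returns UTF-8 octets.
sub highlight_octets {
    my ($octets) = @_;
    return "<pre>$octets</pre>";
}

# Text in, text out: encode on the way in, decode on the way out,
# so callers never deal with octets or encoding names.
sub highlight_text {
    my ($text) = @_;

    # LEAVE_SRC keeps Encode from modifying $text in place when a
    # CHECK value such as FB_CROAK is given.
    my $octets = Encode::encode('UTF-8', $text,
        Encode::FB_CROAK | Encode::LEAVE_SRC);

    my $html_octets = highlight_octets($octets);

    return Encode::decode('UTF-8', $html_octets, Encode::FB_CROAK);
}

# Character string containing U+00E9 and U+263A.
my $html = highlight_text("caf\x{e9} \x{263a}");
```

Because the encoding is fixed inside the wrapper, the caller has nothing to configure, which is the point of the second half of suggestion 1.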
Randy Stauner
Owner

Looking at usage by reverse dependencies, I found this:
https://metacpan.org/source/MORITZ/App-Mowyw-v0.7.1/lib/App/Mowyw.pm#L566

    # any encoding will do if vim automatically detects it
    my $vim_encoding = 'utf-8';
    my $BOM = "\x{feff}";
    my $syn = Text::VimColor->new(
            filetype    => $lang,
            string      => encode($vim_encoding, $BOM . $str),
            );
    $str = decode($vim_encoding, $syn->html);
    $str =~ s/^$BOM//;
    return $str;
Randy Stauner
Owner

For reference, rjbs's similar code is here:
https://metacpan.org/source/RJBS/Pod-Elemental-Transfomer-VimHTML-0.093581/lib/Pod/Elemental/Transformer/VimHTML.pm#L15

    sub build_html {
      my ($self, $str, $param) = @_;

      my $octets = Encode::encode('utf-8', $str, Encode::FB_CROAK);

      my $vim = Text::VimColor->new(
        string   => $octets,
        filetype => $param->{filetype},

        vim_options => [
          qw( -RXZ -i NONE -u NONE -N -n ), "+set nomodeline", '+set fenc=utf-8',
        ],
      );

      my $html_bytes = $vim->html;
      my $html = Encode::decode('utf-8', $html_bytes);

      return $html;
    }
Randy Stauner rwstauner referenced this issue from a commit January 17, 2012
Randy Stauner Test methods of specifying encoding used by other modules
Ensure that we don't break backward compatibility
with other modules that have already implemented workarounds.

This is the beginning of addressing gh-1.
d97f6c9
Randy Stauner
Owner

I have added extra_vim_options => [] for an easy way to append options to the list after the defaults.
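
As a rough illustration of the "defaults plus more" behaviour, the sketch below models how the two option lists might be combined. The helper assemble_vim_options is hypothetical and not part of Text::VimColor, and the default list is simply the one copied in the build_html example earlier in this thread, so the module's real defaults may differ.

```perl
use strict;
use warnings;

# Defaults as copied in the build_html example above (an assumption;
# the module's own list may differ).
my @DEFAULT_VIM_OPTIONS = (qw( -RXZ -i NONE -u NONE -N -n ), '+set nomodeline');

# Hypothetical helper: vim_options replaces the defaults outright,
# while extra_vim_options is appended after whichever list is in use.
sub assemble_vim_options {
    my (%args) = @_;
    my @options = @{ $args{vim_options} || \@DEFAULT_VIM_OPTIONS };
    push @options, @{ $args{extra_vim_options} || [] };
    return @options;
}

# The caller only names the option it wants to add.
my @options = assemble_vim_options(
    extra_vim_options => ['+set fenc=utf-8'],
);
```

With this shape, the caller in the report above would no longer need to copy and paste the defaults just to add '+set fenc=utf-8'.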

Randy Stauner rwstauner closed this issue from a commit February 02, 2013
Randy Stauner Accept (and return) character strings
closes gh-1.

Thanks, RJBS!
e227c46
Randy Stauner rwstauner referenced this issue from a commit February 02, 2013
Randy Stauner v0.23
  - Attempt to do the right thing with character strings:
    Encode them in UTF-8, tell vim the file encoding (UTF-8),
    and return a (decoded) character string.
    Thanks to Ricardo Signes for the very helpful report (gh-1).
7420633