
Make it easier to work with encoded (non-ascii) strings #1

Closed
rwstauner opened this Issue · 3 comments

@rwstauner
Owner

RJBS recently made this change:
rjbs/Pod-Elemental-Transformer-VimHTML@d465c21

Here is his report, excerpted from IRC:

https://skitch.com/rjbs/gfti4/advcal-unicode
should look like
https://skitch.com/rjbs/gfthb/advcal-unicode

Things to report:

  1. T::VC (Text::VimColor) expects "string" to be a byte string, not text. I had to encode it first.
  2. T::VC returns a byte string, not a text string. I had to decode it afterward.
  3. I had to tell Vim the character encoding of the byte stream, which meant passing '+set fenc=utf-8', but I could not do that without also copying and pasting the default options.
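To make items 1 and 2 concrete, here is a minimal sketch of the byte-string/text-string distinction, using only the core Encode module (no Vim involved). The sample string is arbitrary; it just contains non-ASCII characters.

```perl
use strict;
use warnings;
use Encode ();

# A character (text) string: 6 characters, two of them non-ASCII.
my $text = "caf\x{e9} \x{2603}";

# Characters -> UTF-8 octets, as a caller had to do before passing "string".
# FB_CROAK dies on unencodable input; LEAVE_SRC stops Encode from
# modifying $text in place (which it may do when CHECK is set).
my $check  = Encode::FB_CROAK() | Encode::LEAVE_SRC();
my $octets = Encode::encode('UTF-8', $text, $check);

# Octets -> characters, as a caller had to do with the returned HTML.
my $decoded = Encode::decode('UTF-8', $octets, $check);

printf "%d characters, %d bytes\n", length $text, length $octets;
print "round trip ok\n" if $decoded eq $text;
```

The length difference is the whole problem in miniature: code that treats the octets as text will count, slice, or escape the wrong units.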

Thoughts on addressing this:

  1. Either document that it expects a byte string, or change it to expect Unicode text, which it then encodes itself, and document that it expects text, not bytes. I suggest the latter: people passing a string should be passing unencoded text. It also settles the question of the encoding, because if you accept a byte string, you also have to let the caller say which encoding it is in.
  2. I strongly suggest returning a character string; failing that, document that an encoded byte string is returned. Either way, you probably need to pass an argument (such as a Vim option) that guarantees the output is always in one encoding, so it can be reliably decoded.
  3. Provide access to @VIM_OPTIONS (or whatever it was called) as ->default_vim_options, so that it can be included in "those options plus more", as in (vim_options => [ T::VC->default_vim_options, ... ]).
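Suggestion 3 from the caller's side, as a rough sketch: `My::VimColorish` is a hypothetical stand-in, not Text::VimColor, and its default list is assumed to be the one rjbs copied by hand in his build_html (minus his fenc addition).

```perl
use strict;
use warnings;

package My::VimColorish;    # hypothetical stand-in for Text::VimColor

# Assumed defaults: the list rjbs copied by hand, minus '+set fenc=utf-8'.
my @VIM_OPTIONS = (qw( -RXZ -i NONE -u NONE -N -n ), '+set nomodeline');

# Returns a copy of the list, so callers cannot modify the defaults.
sub default_vim_options { return @VIM_OPTIONS }

package main;

# "Those options plus more", without copying the defaults by hand.
my @vim_options = (My::VimColorish->default_vim_options, '+set fenc=utf-8');
print "@vim_options\n";
```

Returning a list rather than a reference keeps the package's own array private; a caller can only ever build a new list from it.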
@rwstauner
Owner

Looking at how reverse dependencies use the module, I found this:
https://metacpan.org/source/MORITZ/App-Mowyw-v0.7.1/lib/App/Mowyw.pm#L566

    # any encoding will do if vim automatically detects it
    my $vim_encoding = 'utf-8';
    my $BOM = "\x{feff}";
    my $syn = Text::VimColor->new(
            filetype    => $lang,
            string      => encode($vim_encoding, $BOM . $str),
            );
    $str = decode($vim_encoding, $syn->html);
    $str =~ s/^$BOM//;
    return $str;
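The Mowyw workaround relies on a byte order mark: U+FEFF encoded as UTF-8 becomes the signature EF BB BF, which Vim can recognise when its 'fileencodings' option includes ucs-bom (whether it does depends on Vim's settings). A minimal sketch of just the encode/strip round trip, with an arbitrary sample string and no Vim call:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $BOM = "\x{feff}";                 # U+FEFF BYTE ORDER MARK
my $str = "na\x{ef}ve \x{263a}";      # arbitrary sample character string

# Prefixing the BOM before encoding gives the byte stream a UTF-8
# signature (EF BB BF) that an encoding detector can recognise.
my $octets = encode('utf-8', $BOM . $str);
printf "first bytes: %vX\n", substr($octets, 0, 3);

# Decoding yields characters again; the BOM comes back as U+FEFF and
# has to be stripped, as the Mowyw code does with s/^$BOM//.
my $back = decode('utf-8', $octets);
$back =~ s/^$BOM//;
```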
@rwstauner
Owner

For reference, rjbs's similar code is here:
https://metacpan.org/source/RJBS/Pod-Elemental-Transfomer-VimHTML-0.093581/lib/Pod/Elemental/Transformer/VimHTML.pm#L15

    use Encode ();
    use Text::VimColor;

    sub build_html {
      my ($self, $str, $param) = @_;

      my $octets = Encode::encode('utf-8', $str, Encode::FB_CROAK);

      my $vim = Text::VimColor->new(
        string   => $octets,
        filetype => $param->{filetype},

        vim_options => [
          qw( -RXZ -i NONE -u NONE -N -n ), "+set nomodeline", '+set fenc=utf-8',
        ],
      );

      my $html_bytes = $vim->html;
      my $html = Encode::decode('utf-8', $html_bytes);

      return $html;
    }
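rjbs's build_html follows three steps: encode the text, tell Vim the encoding, decode the result. That is roughly what the v0.23 changelog entry at the end of this thread describes. The sketch below generalizes the pattern with a hypothetical `highlight_text` function and a fake runner standing in for Vim, so it runs without Vim installed; it is not Text::VimColor's actual implementation.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: characters in, characters out. $run stands in for
# whatever actually invokes Vim; it receives UTF-8 octets plus an option
# list and must return UTF-8 octets.
sub highlight_text {
    my ($text, $run, @vim_options) = @_;

    # 1. Characters -> bytes; die on anything that cannot be encoded.
    my $octets = Encode::encode('UTF-8', $text,
        Encode::FB_CROAK() | Encode::LEAVE_SRC());

    # 2. Tell Vim which encoding the bytes are in.
    my $html_bytes = $run->($octets, [ @vim_options, '+set fenc=utf-8' ]);

    # 3. Bytes -> characters for the caller.
    return Encode::decode('UTF-8', $html_bytes, Encode::FB_CROAK());
}

# Fake runner so the sketch works without Vim: it checks that the
# encoding option is present and wraps the bytes in <pre> unchanged.
my $fake_vim = sub {
    my ($bytes, $opts) = @_;
    die "no fenc option\n" unless grep { $_ eq '+set fenc=utf-8' } @$opts;
    return "<pre>$bytes</pre>";
};

my $html = highlight_text("my \$snowman = '\x{2603}';", $fake_vim, qw(-RXZ -n));
```

Injecting the runner keeps the encoding logic testable on its own; with a real Vim call in its place, the caller never sees octets.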
@rwstauner rwstauner referenced this issue from a commit
@rwstauner Test methods of specifying encoding used by other modules
Ensure that we don't break backward compatibility
with other modules that have already implemented workarounds.

This is the beginning of addressing gh-1.
d97f6c9
@rwstauner
Owner

I have added extra_vim_options => [] as an easy way to append options to the list after the defaults.
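From the module's side, the appending could look something like this. `resolve_vim_options` is hypothetical, the default list is assumed from rjbs's code, and how it treats an explicit vim_options together with extra_vim_options is an assumption, not documented Text::VimColor behaviour.

```perl
use strict;
use warnings;

# Assumed defaults (the list rjbs copied by hand, minus his fenc option).
my @DEFAULTS = (qw( -RXZ -i NONE -u NONE -N -n ), '+set nomodeline');

# Hypothetical resolution logic: vim_options, if given, replaces the
# defaults; extra_vim_options is appended after whichever base was used.
sub resolve_vim_options {
    my (%args) = @_;
    my @base  = $args{vim_options}       ? @{ $args{vim_options} }       : @DEFAULTS;
    my @extra = $args{extra_vim_options} ? @{ $args{extra_vim_options} } : ();
    return (@base, @extra);
}

# rjbs's case: keep the defaults and add one setting, with no copy-and-paste.
my @opts = resolve_vim_options(extra_vim_options => ['+set fenc=utf-8']);
print "@opts\n";
```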

@rwstauner rwstauner closed this issue from a commit
@rwstauner Accept (and return) character strings
closes gh-1.

Thanks, RJBS!
e227c46
@rwstauner rwstauner closed this in e227c46
@rwstauner rwstauner referenced this issue from a commit
@rwstauner v0.23
  - Attempt to do the right thing with character strings:
    Encode them in UTF-8, tell vim the file encoding (UTF-8),
    and return a (decoded) character string.
    Thanks to Ricardo Signes for the very helpful report (gh-1).
7420633