=encoding utf8 not supported by rtf encoder #92

Closed
japharl opened this issue Oct 3, 2018 · 4 comments

Comments

@japharl

japharl commented Oct 3, 2018

If the POD string is a UTF-8 string and =encoding utf8 is specified at the beginning of the POD, the UTF-8 characters are not written to the RTF output file.

@japharl
Author

japharl commented Oct 4, 2018

Note: as a tentative workaround, I used the Perl module MsOffice::Word::HTML::Writer as a replacement for Pod::Simple::RTF. The code change was reasonably straightforward for my purposes.

@khwilliamson
Contributor

Thanks for the report. This will be fixed in the next release, in a couple of weeks.

khwilliamson added a commit to khwilliamson/pod-simple that referenced this issue May 15, 2019
This resolves GitHub issue perl-pod#92.

The problem was that, on all but ancient perls, a pattern was matching
every non-ASCII character, when it was supposed to stop at \xff.
I think it was me who introduced this bug at some point, and since we
didn't have a test file that included Unicode RTF, the bug went
unnoticed until reported. Code existed in this module to handle these
characters; it's just that this pattern caused all code points above 255
to be deleted from the input before that code got executed.

The solution here is to not use that pattern at all, but to generate a
new pattern based on just the characters that we have escapes for.  This
also works on EBCDIC platforms of all vintages.
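The bug and the fix described above can be illustrated with a small Python sketch (not Pod::Simple's actual Perl code). Everything here is an assumption for illustration: the escape table ESCAPES, the helper escape_with, and the idea that a matched character missing from the table is replaced by an empty string are hypothetical, chosen only to reproduce the reported symptom of code points above 255 disappearing.

```python
import re

# Hypothetical escape table: every Latin-1 upper-half character (0x80-0xFF)
# maps to an RTF hex escape such as \'e9. The real Pod::Simple::RTF table
# differs; this only illustrates the shape of the bug and of the fix.
ESCAPES = {chr(cp): "\\'%02x" % cp for cp in range(0x80, 0x100)}

# Buggy shape (assumed): the class was meant to stop at \xff but was written
# as "anything that is not ASCII", so it also matches code points above 0xFF.
BUGGY = re.compile(r"[^\x00-\x7f]")

# Fixed shape: the class is generated from exactly the keys of ESCAPES,
# so characters without an escape are never touched by this step.
FIXED = re.compile("[" + "".join(re.escape(ch) for ch in ESCAPES) + "]")


def escape_with(pattern, text):
    # Characters the pattern matches but the table lacks map to "",
    # i.e. they are silently deleted: the symptom reported in this issue.
    return pattern.sub(lambda m: ESCAPES.get(m.group(0), ""), text)


sample = "caf\u00e9 \u20ac5"  # e-acute (0xE9) and the euro sign (U+20AC)
buggy = escape_with(BUGGY, sample)
fixed = escape_with(FIXED, sample)
print("\u20ac" in buggy)  # False: the euro sign was deleted
print("\u20ac" in fixed)  # True: left for the separate Unicode code path
```

Generating the class from the table's keys ties "what gets escaped here" to "what we actually have an escape for", so anything else falls through untouched to the later Unicode handling.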

However, I notice that the code doesn't handle code points above 0xFFFF.
This will be addressed in the next few commits.
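For the remaining gap with code points above 0xFFFF, here is a minimal Python sketch of one common approach, assuming the standard RTF \uN control word (a signed 16-bit decimal parameter followed by one fallback character under the default \uc1). The function rtf_unicode_escape is hypothetical and is not claimed to be how the follow-up commits solved it: it splits a supplementary character into a UTF-16 surrogate pair and emits one \uN per half.

```python
def rtf_unicode_escape(ch, fallback="?"):
    """Return RTF \\uN escapes for one character (sketch, assumptions noted).

    Assumes \\uN takes a signed 16-bit decimal N and, under the default
    \\uc1, is followed by exactly one fallback character. The fallback must
    not be a digit, or it would be read as part of N.
    """
    cp = ord(ch)
    if cp > 0xFFFF:
        # Supplementary character: split into a UTF-16 surrogate pair.
        cp -= 0x10000
        units = [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]
    else:
        units = [cp]
    out = []
    for u in units:
        n = u - 0x10000 if u > 0x7FFF else u  # reinterpret as signed 16-bit
        out.append("\\u%d%s" % (n, fallback))
    return "".join(out)


print(rtf_unicode_escape("\u20ac"))      # \u8364?
print(rtf_unicode_escape("\U0001F600"))  # \u-10179?\u-8704?
```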
khwilliamson added a commit to khwilliamson/pod-simple that referenced this issue May 15, 2019
khwilliamson added a commit to khwilliamson/pod-simple that referenced this issue May 20, 2019
khwilliamson added a commit to khwilliamson/pod-simple that referenced this issue May 20, 2019
khwilliamson added a commit to khwilliamson/pod-simple that referenced this issue May 20, 2019
khwilliamson added a commit to khwilliamson/pod-simple that referenced this issue May 21, 2019
khwilliamson added a commit to khwilliamson/pod-simple that referenced this issue May 21, 2019
@khwilliamson
Contributor

Thanks for the report. This is fixed in version 3.36, now available on CPAN.

@japharl
Author

japharl commented May 22, 2019

++ Thanks.
