[PATCH] fix utf8.c overflowed VC6's preproc macro buffer and syntax errored #16499

Closed
p5pRT opened this issue Apr 8, 2018 · 7 comments

Comments

@p5pRT
commented Apr 8, 2018

Migrated from rt.perl.org#133088 (status was 'resolved')

Searchable as RT133088$

@p5pRT

commented Apr 8, 2018

From @bulk88

Created by @bulk88

See attached patch. This is intended for 5.28; maybe backport too. VC6
DEBUGGING can't be built without this patch. KHW's input is needed on
whether U8 is the right type for var c or whether it should be U32. Most
of the perl codebase uses U8; a couple of examples use U32. I don't know
what the return type of the EIGHT_BIT_UTF8_TO_NATIVE macro is, and it's
not documented.

details, console error
----------------------------------------

cl -c -nologo -GF -W3 -I.\include -I. -I.. -DWIN32 -D_CONSOLE -DNO_STRICT -DPERLDLL -DPERL_CORE -Od -MD -Zi -DDEBUGGING -DPERL_EXTERNAL_GLOB -DPERL_IS_MINIPERL -Fomini\utf8.obj -Fdmini\utf8.pdb ..\utf8.c
utf8.c
..\utf8.c(4057) : error C2017: illegal escape sequence
..\utf8.c(4057) : error C2017: illegal escape sequence
..\utf8.c(4057) : error C2017: illegal escape sequence
..\utf8.c(4057) : error C2017: illegal escape sequence
..\utf8.c(4057) : error C2143: syntax error : missing ')' before 'string'
..\utf8.c(4057) : error C2017: illegal escape sequence
..\utf8.c(4057) : error C2017: illegal escape sequence
..\utf8.c(4057) : error C2017: illegal escape sequence
..\utf8.c(4057) : error C2059: syntax error : ')'
..\utf8.c(4057) : error C2059: syntax error : ')'
..\utf8.c(4057) : error C2059: syntax error : ')'
..\utf8.c(4057) : error C2017: illegal escape sequence
..\utf8.c(4057) : fatal error C1013: compiler limit : too many open parentheses
GNUmakefile:1439: recipe for target 'mini\utf8.obj' failed
------------------------------------

I've attached a code-formatted, preprocessed version of the C function
that caused the syntax error, and a screenshot of my code highlighter
pointing out the stray backslashes.
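
For anyone without VC6 at hand, here is a minimal sketch of why the backslash count grows so quickly when an asserting macro ends up inside another assert's message. It is not perl code: `STR`, `LEVEL0`, `LEVEL1`, `LEVEL2` and `check` are hypothetical names made up for this illustration, and the path string is borrowed from the error output only to make the strings look familiar. Each stringification level has to escape every quote and backslash already present in the previous level, which matches the `\"..\\\\utf8.c\"` runs visible in the preprocessed function.

```c
#include <assert.h>
#include <stddef.h>

/* Two-step stringification: the argument is macro-expanded first, then
 * turned into a string literal, much like an assert() message. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Hypothetical stand-ins, not perl macros. `check` is never called; it
 * only ever appears inside the generated strings. */
#define LEVEL0 p != 0
#define LEVEL1 check(STR(LEVEL0), "..\\utf8.c")   /* one assert deep  */
#define LEVEL2 check(STR(LEVEL1), "..\\utf8.c")   /* two asserts deep */

/* Count the backslash characters in a NUL-terminated string. */
size_t count_backslashes(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (*s == '\\')
            n++;
    return n;
}

/* Text an outer assert would embed as its message at each depth. */
const char *level1_text(void) { return STR(LEVEL1); }
const char *level2_text(void) { return STR(LEVEL2); }
```

With a conforming preprocessor, `level1_text()` contains 2 backslashes and `level2_text()` contains 10, so one extra level of nesting multiplies the escaping a preprocessor has to track. Printing both strings with `puts()` makes the growth easy to see.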

Perl Info
---
Flags:
               category=core
               severity=low
---
Site configuration information for perl 5.27.9:

Configured by Administrator at Tue Jan 30 20:34:30 2018.

Summary of my perl5 (revision 5 version 27 subversion 9) configuration:

             Platform:
               osname=MSWin32
               osvers=5.2.3790
               archname=MSWin32-x86-multi-thread
               uname=''
               config_args='undef'
               hint=recommended
               useposix=true
               d_sigaction=undef
               useithreads=define
               usemultiplicity=define
               use64bitint=undef
               use64bitall=undef
               uselongdouble=undef
               usemymalloc=n
               default_inc_excludes_dot=define
               bincompat5005=undef
             Compiler:
               cc='cl'
               ccflags ='-nologo -GF -W3 -O1 -MD -Zi -DNDEBUG -GL -DWIN32 -D_CONSOLE -DNO_STRICT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE  -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DWIN32_NO_REGISTRY'
               optimize='-O1 -MD -Zi -DNDEBUG -GL'
               cppflags='-DWIN32'
               ccversion='15.00.30729.01'
               gccversion=''
               gccosandvers=''
               intsize=4
               longsize=4
               ptrsize=4
               doublesize=8
               byteorder=1234
               doublekind=3
               d_longlong=undef
               longlongsize=8
               d_longdbl=define
               longdblsize=8
               longdblkind=0
               ivtype='long'
               ivsize=4
               nvtype='double'
               nvsize=8
               Off_t='__int64'
               lseeksize=8
               alignbytes=8
               prototype=define
             Linker and Libraries:
               ld='link'
               ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf -ltcg -libpath:"c:\perl\lib\CORE"        -machine:x86'
               libpth="C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\lib"
               libs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
               perllibs=oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib comctl32.lib msvcrt.lib
               libc=msvcrt.lib
               so=dll
               useshrplib=true
               libperl=perl527.lib
               gnulibc_version=''
             Dynamic Linking:
               dlsrc=dl_win32.xs
               dlext=dll
               d_dlsymun=undef
               ccdlflags=' '
               cccdlflags=' '
               lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf -ltcg -libpath:"c:\perl\lib\CORE"        -machine:x86'


---
@INC for perl 5.27.9:
               lib
               C:/p527/srcnew/lib

---
Environment for perl 5.27.9:
               CYGWIN=tty
               HOME (unset)
               LANG (unset)
               LANGUAGE (unset)
               LD_LIBRARY_PATH=/usr/lib/x86:/usr/X11R6/lib
               LOGDIR (unset)
               PATH=C:\WINDOWS\system32;C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\BIN;C:\Program Files\Microsoft SDKs\Windows\v6.0A\bin;C:\Perl\bin;C:\WINDOWS;C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE;C:\Program Files (x86)\Git\bin;C:\sp3220\c\bin;
               PERL_BADLANG (unset)
               SHELL (unset)



@p5pRT

commented Apr 8, 2018

From @bulk88

0001-fix-utf8.c-overflowed-VC6-s-preproc-macro-buffer-and.patch
From b94b17165dea31b2a11600bf3353f6c3ac38af6e Mon Sep 17 00:00:00 2001
From: Daniel Dragan <bulk88@hotmail.com>
Date: Sun, 8 Apr 2018 00:19:11 -0400
Subject: [PATCH] fix utf8.c overflowed VC6's preproc macro buffer and syntax
 errored

Only happened with CFG=Debug/-DDEBUGGING. Non-debugging VC6 build not
affected.

cl -c -nologo -GF -W3 -I.\include -I. -I.. -DWIN32 -D_CONSOLE -DNO_STRICT
-DPERLDLL -DPERL_CORE  -Od -MD -Zi -DDEBUGGING  -DPERL_EXTERNAL_GLOB
-DPERL_IS_MINIPERL -Fomini\utf8.obj -Fdmini\utf8.pdb ..\utf8.c
utf8.c
..\utf8.c(4057) : error C2017: illegal escape sequence
..\utf8.c(4057) : error C2017: illegal escape sequence
..\utf8.c(4057) : error C2017: illegal escape sequence
..\utf8.c(4057) : error C2017: illegal escape sequence
..\utf8.c(4057) : error C2143: syntax error : missing ')' before 'string'
..\utf8.c(4057) : error C2017: illegal escape sequence
..\utf8.c(4057) : error C2017: illegal escape sequence
..\utf8.c(4057) : error C2017: illegal escape sequence
..\utf8.c(4057) : error C2059: syntax error : ')'
..\utf8.c(4057) : error C2059: syntax error : ')'
..\utf8.c(4057) : error C2059: syntax error : ')'
..\utf8.c(4057) : error C2017: illegal escape sequence
..\utf8.c(4057) : fatal error C1013: compiler limit : too many open
parentheses
GNUmakefile:1439: recipe for target 'mini\utf8.obj' failed

The VC6 C preprocessor breaks down and miscounts the number of \s
when escaping asserts in asserts in asserts to make a double-quoted string
literal for an assert message. VC7/VC 2003 doesn't have this problem.

Fix the asserts in asserts by factoring out the EIGHT_BIT_UTF8_TO_NATIVE
macro, which has asserts inside it, from L1_func aka toFOLD_LC, which is
another macro that has asserts inside it.

Some additional details are in the RT ticket associated with the patch.
---
 utf8.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/utf8.c b/utf8.c
index 94e3267..f7b9638 100644
--- a/utf8.c
+++ b/utf8.c
@@ -3881,13 +3881,12 @@ S_check_and_deprecate(pTHX_ const U8 *p,
         }                                                                    \
     }                                                                        \
     else if UTF8_IS_NEXT_CHAR_DOWNGRADEABLE(p, e) {                          \
+        U8 c = EIGHT_BIT_UTF8_TO_NATIVE(*p, *(p+1));                         \
         if (flags & (locale_flags)) {                                        \
-            result = LC_L1_change_macro(EIGHT_BIT_UTF8_TO_NATIVE(*p,         \
-                                                                 *(p+1)));   \
+            result = LC_L1_change_macro(c);                                  \
         }                                                                    \
         else {                                                               \
-            return L1_func(EIGHT_BIT_UTF8_TO_NATIVE(*p, *(p+1)),             \
-                           ustrp, lenp,  L1_func_extra_param);               \
+            return L1_func(c, ustrp, lenp,  L1_func_extra_param);            \
         }                                                                    \
     }                                                                        \
     else {  /* malformed UTF-8 or ord above 255 */                           \
-- 
2.5.0.windows.1
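
The shape of the fix in the patch above can be sketched outside of perl as follows. This is only an illustration under stated assumptions: `DECODE2` and `FOLD_ASCII` are hypothetical stand-ins for an inner asserting macro and an outer asserting macro, not the real EIGHT_BIT_UTF8_TO_NATIVE or toFOLD_LC, and the folding rule is deliberately simplified to ASCII. The point is only that evaluating the inner macro into a local variable first means the outer macro's assert stringifies a plain `c` instead of the whole expanded inner macro.

```c
#include <assert.h>

typedef unsigned char U8;   /* stand-in for perl's U8 */

/* Hypothetical inner macro: decodes a two-byte sequence and asserts on
 * both bytes, so its expansion already contains assert message strings. */
#define DECODE2(hi, lo)                                   \
    (assert(((hi) & 0xFE) == 0xC2),                       \
     assert(((lo) & 0xC0) == 0x80),                       \
     (U8)((((hi) & 0x03) << 6) | ((lo) & 0x3F)))

/* Hypothetical outer macro: asserts on its argument and uses it several
 * times, so whatever is passed in gets expanded and stringified again. */
#define FOLD_ASCII(c)                                     \
    (assert((c) != 0xDF),                                 \
     (U8)(((c) >= 'A' && (c) <= 'Z') ? (c) + ('a' - 'A') : (c)))

/* Before: the inner macro is passed straight to the outer one, producing
 * asserts inside an assert's message string. */
U8 fold_nested(const U8 *p)
{
    return FOLD_ASCII(DECODE2(p[0], p[1]));
}

/* After: hoist the inner macro into a local, as the patch does with
 * `U8 c = ...`, so the outer assert only ever sees `c`. */
U8 fold_hoisted(const U8 *p)
{
    U8 c = DECODE2(p[0], p[1]);
    return FOLD_ASCII(c);
}
```

Both functions return the same value for valid input; only the size and nesting depth of the preprocessed text differ. Running the file through the compiler's preprocess-only mode is a quick way to compare the two expansions.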

@p5pRT

commented Apr 8, 2018

From @bulk88

UV
Perl__to_utf8_fold_flags(const U8 * p, const U8 * e, U8 * ustrp, STRLEN * lenp, U8 flags,
  const char *const file, const int line)
{
  UV result;
  const U32 utf8n_flags =
  S_check_and_deprecate(p, &e, 3, ((flags) ? (char) 1 : (char) 0), file, line);

  (void) ((p) || (_assert("p", "..\\utf8.c", 4047), 0));
  (void) ((ustrp) || (_assert("ustrp", "..\\utf8.c", 4047), 0));
  (void) ((file) || (_assert("file", "..\\utf8.c", 4047), 0));

  (void) ((!((flags & 0x1) && (flags & 0x4)))
  ||
  (_assert
  ("! ((flags & FOLD_FLAGS_LOCALE) && (flags & FOLD_FLAGS_NOMIX_ASCII))", "..\\utf8.c", 4050),
  0));

  (void) ((p != ustrp) || (_assert("p != ustrp", "..\\utf8.c", 4052), 0));

  if (flags & (0x1)) {
  do {
  if ((((PL_warn_locale) ? (char) 1 : (char) 0))) {
  Perl__warn_problematic_locale();
  }
  } while (0);
  if (PL_in_utf8_CTYPE_locale) {
  flags &= ~(0x1);
  }
  }
  if (((U64) (((*p) | 0) | 0) < 128)) {
  if (flags & (0x1)) {
  result =
  (((((((*p)) == 0xB5) ? (char) 1 : (char) 0))
  && PL_in_utf8_CTYPE_locale) ? 0x03BC
  : ((void)
  ((!PL_in_utf8_CTYPE_locale || ((*p)) != 0xDF)
  || (_assert("! PL_in_utf8_CTYPE_locale || ((*p)) != 0xDF", "..\\utf8.c", 4055), 0)),
  (!((sizeof((*p)) == 1)
  || !(((U64) (((*p)) | 0)) & ~0xFF)) ? ((*p)) : (PL_in_utf8_CTYPE_locale) ?
  PL_latin1_lc[(U8) ((*p))] : (U8) tolower((U8) ((*p))))));
  } else {
  return Perl__to_fold_latin1(*p, ustrp, lenp, ((flags) & (0x2 | 0x4)));
  }
  } else
  if (((void)
  ((((sizeof(*(p)) == 1) || !(((U64) ((*(p)) | 0)) & ~0xFF)))
  || (_assert("( (sizeof(*(p)) == 1) || !(((U64)((*(p)) | 0)) & ~0xFF))", "..\\utf8.c", 4055),
  0)), (((U8) ((*(p)) | 0)) & 0xfe) == 0xc2) && ((e) - (p) > 1)
  &&
  ((void)
  ((((sizeof(*((p) + 1)) == 1) || !(((U64) ((*((p) + 1)) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof(*((p)+1)) == 1) || !(((U64)((*((p)+1)) | 0)) & ~0xFF))", "..\\utf8.c", 4055),
  0)), (((U8) ((*((p) + 1)) | 0)) & 0xC0) == 0x80)) {
  if (flags & (0x1)) {
  result =
  (((((((((void)
  ((((void)
  ((((sizeof(*p) == 1) || !(((U64) ((*p) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))", "..\\utf8.c", 4055),
  0)), (((U8) ((*p) | 0)) & 0xfe) == 0xc2))
  ||
  (_assert
  ("((void)( (( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))) || (_assert(\"( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))\", \"..\\\\utf8.c\", 4055), 0) ), (((U8)((*p) | 0)) & 0xfe) == 0xc2)",
  "..\\utf8.c", 4055), 0)),
  (void) ((((void)
  ((((sizeof(*(p + 1)) == 1) || !(((U64) ((*(p + 1)) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))",
  "..\\utf8.c", 4055), 0)), (((U8) ((*(p + 1)) | 0)) & 0xC0) == 0x80))
  ||
  (_assert
  ("((void)( (( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))) || (_assert(\"( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))\", \"..\\\\utf8.c\", 4055), 0) ), (((U8)((*(p+1)) | 0)) & 0xC0) == 0x80)",
  "..\\utf8.c", 4055), 0)),
  ((U8)
  (((void)
  ((((sizeof((*(p + 1))) == 1) || !(((U64) (((*(p + 1))) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof((*(p+1))) == 1) || !(((U64)(((*(p+1))) | 0)) & ~0xFF))",
  "..\\utf8.c", 4055), 0)),
  (((((U8) (*p)) & (((2) >= 7) ? 0x00 : (0x1F >> ((2) - 2))))) << 6) |
  ((((U8) ((U8) (*(p + 1))))) & ((U8) ((1U << 6) - 1))))))))) ==
  0xB5) ? (char) 1 : (char) 0))
  && PL_in_utf8_CTYPE_locale) ? 0x03BC
  : ((void)
  ((!PL_in_utf8_CTYPE_locale
  ||
  ((((void)
  ((((void)
  ((((sizeof(*p) == 1) || !(((U64) ((*p) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))", "..\\utf8.c", 4055),
  0)), (((U8) ((*p) | 0)) & 0xfe) == 0xc2))
  ||
  (_assert
  ("((void)( (( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))) || (_assert(\"( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))\", \"..\\\\utf8.c\", 4055), 0) ), (((U8)((*p) | 0)) & 0xfe) == 0xc2)",
  "..\\utf8.c", 4055), 0)),
  (void) ((((void)
  ((((sizeof(*(p + 1)) == 1) || !(((U64) ((*(p + 1)) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))",
  "..\\utf8.c", 4055), 0)), (((U8) ((*(p + 1)) | 0)) & 0xC0) == 0x80))
  ||
  (_assert
  ("((void)( (( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))) || (_assert(\"( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))\", \"..\\\\utf8.c\", 4055), 0) ), (((U8)((*(p+1)) | 0)) & 0xC0) == 0x80)",
  "..\\utf8.c", 4055), 0)),
  ((U8)
  (((void)
  ((((sizeof((*(p + 1))) == 1) || !(((U64) (((*(p + 1))) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof((*(p+1))) == 1) || !(((U64)(((*(p+1))) | 0)) & ~0xFF))",
  "..\\utf8.c", 4055), 0)),
  (((((U8) (*p)) & (((2) >= 7) ? 0x00 : (0x1F >> ((2) - 2))))) << 6) |
  ((((U8) ((U8) (*(p + 1))))) & ((U8) ((1U << 6) - 1))))))))) != 0xDF)
  ||
  (_assert
  ("! PL_in_utf8_CTYPE_locale || ((( (void)( (((void)( (( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))) || (_assert(\"( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))\", \"..\\\\utf8.c\", 4055), 0) ), (((U8)((*p) | 0)) & 0xfe) == 0xc2)) || (_assert(\"((void)( (( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))) || (_assert(\\\"( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))\\",
  \\\"..\\\\utf8.c\\", 4055), 0)),
  (((U8) ((*p) | 0)) & 0xfe) ==
  0xc2) \
  ", \"..\\\\utf8.c\", 4055), 0) ), (void)( (((void)( (( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))) || (_assert(\"( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))\", \"..\\\\utf8.c\", 4055), 0) ), (((U8)((*(p+1)) | 0)) & 0xC0) == 0x80)) || (_assert(\"((void)( (( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))) || (_assert(\\\"( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))\\",
  \\\"..\\\\utf8.c\\", 4055), 0)),
  (((U8) ((*(p + 1)) | 0)) & 0xC0) ==
  0x80) \
  ", \"..\\\\utf8.c\", 4055), 0) ), ((U8)(((void)( (( (sizeof((*(p+1))) == 1) || !(((U64)(((*(p+1))) | 0)) & ~0xFF))) || (_assert(\"( (sizeof((*(p+1))) == 1) || !(((U64)(((*(p+1))) | 0)) & ~0xFF))\", \"..\\\\utf8.c\", 4055), 0) ), ((( ((U8) (*p)) & (((2) >= 7) ? 0x00 : (0x1F >> ((2)-2))))) << 6) | ((((U8) ((U8)(*(p+1))))) & ((U8) ((1U << 6) - 1))))))))) != 0xDF",
  "..\\utf8.c", 4055), 0)),
  (!((sizeof
  ((((void)
  ((((void)
  ((((sizeof(*p) == 1) || !(((U64) ((*p) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))", "..\\utf8.c", 4055),
  0)), (((U8) ((*p) | 0)) & 0xfe) == 0xc2))
  ||
  (_assert
  ("((void)( (( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))) || (_assert(\"( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))\", \"..\\\\utf8.c\", 4055), 0) ), (((U8)((*p) | 0)) & 0xfe) == 0xc2)",
  "..\\utf8.c", 4055), 0)),
  (void) ((((void)
  ((((sizeof(*(p + 1)) == 1) || !(((U64) ((*(p + 1)) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))",
  "..\\utf8.c", 4055), 0)), (((U8) ((*(p + 1)) | 0)) & 0xC0) == 0x80))
  ||
  (_assert
  ("((void)( (( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))) || (_assert(\"( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))\", \"..\\\\utf8.c\", 4055), 0) ), (((U8)((*(p+1)) | 0)) & 0xC0) == 0x80)",
  "..\\utf8.c", 4055), 0)),
  ((U8)
  (((void)
  ((((sizeof((*(p + 1))) == 1) || !(((U64) (((*(p + 1))) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof((*(p+1))) == 1) || !(((U64)(((*(p+1))) | 0)) & ~0xFF))",
  "..\\utf8.c", 4055), 0)),
  (((((U8) (*p)) & (((2) >= 7) ? 0x00 : (0x1F >> ((2) - 2))))) << 6) |
  ((((U8) ((U8) (*(p + 1))))) & ((U8) ((1U << 6) - 1))))))))) == 1)
  ||
  !(((U64)
  (((((void)
  ((((void)
  ((((sizeof(*p) == 1) || !(((U64) ((*p) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))", "..\\utf8.c", 4055),
  0)), (((U8) ((*p) | 0)) & 0xfe) == 0xc2))
  ||
  (_assert
  ("((void)( (( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))) || (_assert(\"( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))\", \"..\\\\utf8.c\", 4055), 0) ), (((U8)((*p) | 0)) & 0xfe) == 0xc2)",
  "..\\utf8.c", 4055), 0)),
  (void) ((((void)
  ((((sizeof(*(p + 1)) == 1) || !(((U64) ((*(p + 1)) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))",
  "..\\utf8.c", 4055), 0)), (((U8) ((*(p + 1)) | 0)) & 0xC0) == 0x80))
  ||
  (_assert
  ("((void)( (( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))) || (_assert(\"( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))\", \"..\\\\utf8.c\", 4055), 0) ), (((U8)((*(p+1)) | 0)) & 0xC0) == 0x80)",
  "..\\utf8.c", 4055), 0)),
  ((U8)
  (((void)
  ((((sizeof((*(p + 1))) == 1) || !(((U64) (((*(p + 1))) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof((*(p+1))) == 1) || !(((U64)(((*(p+1))) | 0)) & ~0xFF))",
  "..\\utf8.c", 4055), 0)),
  (((((U8) (*p)) & (((2) >= 7) ? 0x00 : (0x1F >> ((2) - 2))))) << 6) |
  ((((U8) ((U8) (*(p + 1))))) & ((U8) ((1U << 6) - 1))))))))) | 0)) & ~0xFF))
  ? ((((void)
  ((((void)
  ((((sizeof(*p) == 1) || !(((U64) ((*p) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))", "..\\utf8.c", 4055),
  0)), (((U8) ((*p) | 0)) & 0xfe) == 0xc2))
  ||
  (_assert
  ("((void)( (( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))) || (_assert(\"( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))\", \"..\\\\utf8.c\", 4055), 0) ), (((U8)((*p) | 0)) & 0xfe) == 0xc2)",
  "..\\utf8.c", 4055), 0)),
  (void) ((((void)
  ((((sizeof(*(p + 1)) == 1) || !(((U64) ((*(p + 1)) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))",
  "..\\utf8.c", 4055), 0)), (((U8) ((*(p + 1)) | 0)) & 0xC0) == 0x80))
  ||
  (_assert
  ("((void)( (( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))) || (_assert(\"( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))\", \"..\\\\utf8.c\", 4055), 0) ), (((U8)((*(p+1)) | 0)) & 0xC0) == 0x80)",
  "..\\utf8.c", 4055), 0)),
  ((U8)
  (((void)
  ((((sizeof((*(p + 1))) == 1) || !(((U64) (((*(p + 1))) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof((*(p+1))) == 1) || !(((U64)(((*(p+1))) | 0)) & ~0xFF))",
  "..\\utf8.c", 4055), 0)),
  (((((U8) (*p)) & (((2) >= 7) ? 0x00 : (0x1F >> ((2) - 2))))) << 6) |
  ((((U8) ((U8) (*(p + 1))))) & ((U8) ((1U << 6) - 1)))))))))
  : (PL_in_utf8_CTYPE_locale) ?
  PL_latin1_lc[(U8)
  ((((void)
  ((((void)
  ((((sizeof(*p) == 1) || !(((U64) ((*p) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))", "..\\utf8.c",
  4055), 0)), (((U8) ((*p) | 0)) & 0xfe) == 0xc2))
  ||
  (_assert
  ("((void)( (( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))) || (_assert(\"( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))\", \"..\\\\utf8.c\", 4055), 0) ), (((U8)((*p) | 0)) & 0xfe) == 0xc2)",
  "..\\utf8.c", 4055), 0)),
  (void) ((((void)
  ((((sizeof(*(p + 1)) == 1)
  || !(((U64) ((*(p + 1)) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))",
  "..\\utf8.c", 4055), 0)),
  (((U8) ((*(p + 1)) | 0)) & 0xC0) == 0x80))
  ||
  (_assert
  ("((void)( (( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))) || (_assert(\"( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))\", \"..\\\\utf8.c\", 4055), 0) ), (((U8)((*(p+1)) | 0)) & 0xC0) == 0x80)",
  "..\\utf8.c", 4055), 0)),
  ((U8)
  (((void)
  ((((sizeof((*(p + 1))) == 1) || !(((U64) (((*(p + 1))) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof((*(p+1))) == 1) || !(((U64)(((*(p+1))) | 0)) & ~0xFF))",
  "..\\utf8.c", 4055), 0)),
  (((((U8) (*p)) & (((2) >= 7) ? 0x00 : (0x1F >> ((2) - 2))))) << 6) |
  ((((U8) ((U8) (*(p + 1))))) & ((U8) ((1U << 6) - 1)))))))))] : (U8)
  tolower((U8)
  ((((void)
  ((((void)
  ((((sizeof(*p) == 1) || !(((U64) ((*p) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))", "..\\utf8.c",
  4055), 0)), (((U8) ((*p) | 0)) & 0xfe) == 0xc2))
  ||
  (_assert
  ("((void)( (( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))) || (_assert(\"( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))\", \"..\\\\utf8.c\", 4055), 0) ), (((U8)((*p) | 0)) & 0xfe) == 0xc2)",
  "..\\utf8.c", 4055), 0)),
  (void) ((((void)
  ((((sizeof(*(p + 1)) == 1) || !(((U64) ((*(p + 1)) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))",
  "..\\utf8.c", 4055), 0)),
  (((U8) ((*(p + 1)) | 0)) & 0xC0) == 0x80))
  ||
  (_assert
  ("((void)( (( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))) || (_assert(\"( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))\", \"..\\\\utf8.c\", 4055), 0) ), (((U8)((*(p+1)) | 0)) & 0xC0) == 0x80)",
  "..\\utf8.c", 4055), 0)),
  ((U8)
  (((void)
  ((((sizeof((*(p + 1))) == 1) || !(((U64) (((*(p + 1))) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof((*(p+1))) == 1) || !(((U64)(((*(p+1))) | 0)) & ~0xFF))",
  "..\\utf8.c", 4055), 0)),
  (((((U8) (*p)) & (((2) >= 7) ? 0x00 : (0x1F >> ((2) - 2))))) << 6) |
  ((((U8) ((U8) (*(p + 1))))) & ((U8) ((1U << 6) - 1)))))))))))));
  } else {
  return
  Perl__to_fold_latin1(((void)
  ((((void)
  ((((sizeof(*p) == 1) || !(((U64) ((*p) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))",
  "..\\utf8.c", 4055), 0)), (((U8) ((*p) | 0)) & 0xfe) == 0xc2))
  ||
  (_assert
  ("((void)( (( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))) || (_assert(\"( (sizeof(*p) == 1) || !(((U64)((*p) | 0)) & ~0xFF))\", \"..\\\\utf8.c\", 4055), 0) ), (((U8)((*p) | 0)) & 0xfe) == 0xc2)",
  "..\\utf8.c", 4055), 0)),
  (void) ((((void)
  ((((sizeof(*(p + 1)) == 1)
  || !(((U64) ((*(p + 1)) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))",
  "..\\utf8.c", 4055), 0)),
  (((U8) ((*(p + 1)) | 0)) & 0xC0) == 0x80))
  ||
  (_assert
  ("((void)( (( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))) || (_assert(\"( (sizeof(*(p+1)) == 1) || !(((U64)((*(p+1)) | 0)) & ~0xFF))\", \"..\\\\utf8.c\", 4055), 0) ), (((U8)((*(p+1)) | 0)) & 0xC0) == 0x80)",
  "..\\utf8.c", 4055), 0)),
  ((U8)
  (((void)
  ((((sizeof((*(p + 1))) == 1)
  || !(((U64) (((*(p + 1))) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof((*(p+1))) == 1) || !(((U64)(((*(p+1))) | 0)) & ~0xFF))",
  "..\\utf8.c", 4055), 0)),
  (((((U8) (*p)) & (((2) >= 7) ? 0x00 : (0x1F >> ((2) - 2))))) << 6)
  | ((((U8) ((U8) (*(p + 1))))) & ((U8) ((1U << 6) - 1))))))), ustrp,
  lenp, ((flags) & (0x2 | 0x4)));
  }
  } else {
  STRLEN len_result;
  result = Perl_utf8n_to_uvchr_msgs(p, e - p, &len_result, 0x10000, 0, 0);
  if (len_result == (STRLEN) - 1) {
  Perl__force_out_malformed_utf8_message(p, e, utf8n_flags, 1);
  };

  result =
  (flags & 0x2) ? S__to_utf8_case(result, p, ustrp, lenp, PL_utf8_tofold, Case_Folding_invmap,
  CF_AUX_TABLE_ptrs, CF_AUX_TABLE_lengths,
  "foldcase") : S__to_utf8_case(result, p, ustrp, lenp,
  PL_utf8_tosimplefold,
  Simple_Case_Folding_invmap,
  ((void *) 0), ((void *) 0),
  "foldcase");

  if (flags & 0x1) {

  if ((((sizeof("\xE1\xBA\x9E") - 1) == (PL_utf8skip[*(const U8 *) (p)]))
  && (memcmp(((char *) p), ("" "\xE1\xBA\x9E" ""), (sizeof("\xE1\xBA\x9E") - 1)) == 0))) {

  Perl_ck_warner((63),
  "Can't do fc(\"\\x{1E9E}\") on non-UTF-8 locale; "
  "resolved to \"\\x{17F}\\x{17F}\".");
  goto return_long_s;
  } else

  if ((((sizeof("\xEF\xAC\x85") - 1) == (PL_utf8skip[*(const U8 *) (p)]))
  && (memcmp(((char *) p), ("" "\xEF\xAC\x85" ""), (sizeof("\xEF\xAC\x85") - 1)) ==
  0))) {

  Perl_ck_warner((63),
  "Can't do fc(\"\\x{FB05}\") on non-UTF-8 locale; "
  "resolved to \"\\x{FB06}\".");
  goto return_ligature_st;
  }

  return S_check_locale_boundary_crossing(p, result, ustrp, lenp);
  } else if (!(flags & 0x4)) {
  return result;
  } else {

  UV original;

  U8 * s = ustrp;
  U8 * e = ustrp + *lenp;
  while (s < e) {
  if (((U64) ((*s) | 0) < 128)) {

  original = Perl_valid_utf8_to_uvchr(p, lenp);

  if (original == 0xDF || original == 0x1E9E

  ) {
  goto return_long_s;
  } else if (original == 0xFB05) {
  goto return_ligature_st;
  }

  ((void)
  (((((((sizeof(size_t) < sizeof(*lenp)
  || sizeof(char) >
  ((size_t) 1 << 8 *
  (sizeof(size_t) - sizeof(*lenp)))) ? (size_t) (*lenp) : ((size_t) -
  1) / sizeof(char)) >
  ((size_t) - 1) / sizeof(char))) ? (char) 1 : (char) 0))
  && (S_croak_memory_wrap(), 0)), (void) ((((void *) (ustrp)) != 0)
  ||
  (_assert
  ("((void*)(ustrp)) != 0", "..\\utf8.c",
  4146), 0)), (void) ((((void *) (p)) != 0)
  ||
  (_assert
  ("((void*)(p)) != 0",
  "..\\utf8.c", 4146),
  0)),
  (void) memcpy((char *) (ustrp), (const char *) (p), (*lenp) * sizeof(char)));
  return original;
  }
  s += PL_utf8skip[*(const U8 *) (s)];
  }

  return result;
  }
  }

  if (((U64) (((result) | 0) | 0) < 128)) {
  *ustrp = (U8) result;
  *lenp = 1;
  } else {
  *ustrp =
  ((void)
  ((((sizeof((U8) result) == 1) || !(((U64) (((U8) result) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof((U8) result) == 1) || !(((U64)(((U8) result) | 0)) & ~0xFF))",
  "..\\utf8.c", 4163), 0)), (((void) ((!((U64) (((U8) result) | 0) < 128))
  ||
  (_assert
  ("! ((U64)(((U8) result) | 0) < 128)",
  "..\\utf8.c", 4163), 0)),
  ((U8)
  ((((U8) ((U8) result)) >> 6) |
  (((2) >
  7) ? 0xFF : (0xFF & (0xFE <<
  (7 - (2))))))))));
  *(ustrp + 1) =
  ((void)
  ((((sizeof((U8) result) == 1) || !(((U64) (((U8) result) | 0)) & ~0xFF)))
  ||
  (_assert
  ("( (sizeof((U8) result) == 1) || !(((U64)(((U8) result) | 0)) & ~0xFF))",
  "..\\utf8.c", 4164), 0)), (((void) ((!((U64) (((U8) result) | 0) < 128))
  ||
  (_assert
  ("! ((U64)(((U8) result) | 0) < 128)",
  "..\\utf8.c", 4164), 0)),
  ((U8)
  ((((U8) ((U8) result)) &
  ((U8) ((1U << 6) - 1))) | 0x80)))));
  *lenp = 2;
  }

  return result;

  return_long_s:

  *lenp = 2 * sizeof("\xC5\xBF") - 2;

  ((void)
  (((((((sizeof(size_t) < sizeof(*lenp)
  || sizeof(U8) >
  ((size_t) 1 << 8 *
  (sizeof(size_t) - sizeof(*lenp)))) ? (size_t) (*lenp) : ((size_t) -
  1) / sizeof(U8)) >
  ((size_t) - 1) / sizeof(U8))) ? (char) 1 : (char) 0))
  && (S_croak_memory_wrap(), 0)), (void) ((((void *) (ustrp)) != 0)
  ||
  (_assert
  ("((void*)(ustrp)) != 0", "..\\utf8.c",
  4179), 0)),
  (void) ((((void *) ("\xC5\xBF" "\xC5\xBF")) != 0)
  ||
  (_assert("((void*)(\"\\xC5\\xBF\" \"\\xC5\\xBF\")) != 0", "..\\utf8.c", 4179),
  0)), (void) memcpy((char *) (ustrp), (const char *) ("\xC5\xBF" "\xC5\xBF"),
  (*lenp) * sizeof(U8)));
  return 0x017F;

  return_ligature_st:

  *lenp = sizeof("\xEF\xAC\x86") - 1;
  ((void)
  (((((((sizeof(size_t) < sizeof(*lenp)
  || sizeof(U8) >
  ((size_t) 1 << 8 *
  (sizeof(size_t) - sizeof(*lenp)))) ? (size_t) (*lenp) : ((size_t) -
  1) / sizeof(U8)) >
  ((size_t) - 1) / sizeof(U8))) ? (char) 1 : (char) 0))
  && (S_croak_memory_wrap(), 0)), (void) ((((void *) (ustrp)) != 0)
  ||
  (_assert
  ("((void*)(ustrp)) != 0", "..\\utf8.c",
  4187), 0)),
  (void) ((((void *) ("\xEF\xAC\x86")) != 0)
  || (_assert("((void*)(\"\\xEF\\xAC\\x86\")) != 0", "..\\utf8.c", 4187), 0)),
  (void) memcpy((char *) (ustrp), (const char *) ("\xEF\xAC\x86"),
  (*lenp) * sizeof(U8)));
  return 0xFB06;

  }

@p5pRT

commented Apr 8, 2018

From @khwilliamson

Thanks, applied as 1a75116

EIGHT_BIT_UTF8_TO_NATIVE is not documented because it's too low level to encourage others to use. But the name is supposed to signify that the result fits in a U8.
--
Karl Williamson
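
To back up the "fits in a U8" point with arithmetic, here is a small sketch based only on the masks and shifts visible in the preprocessed function earlier in this ticket (start byte checked with `& 0xfe == 0xc2`, continuation byte with `& 0xC0 == 0x80`, then `(hi & 0x1F) << 6 | (lo & 0x3F)`). The function names are made up for the example, and it assumes an ASCII platform; EBCDIC builds use different tables and are not covered here.

```c
#include <assert.h>

typedef unsigned char U8;   /* stand-in for perl's U8 */

/* Decode a downgradeable two-byte sequence using the same masks and
 * shifts that appear in the preprocessed output (ASCII platforms). */
unsigned decode_two_byte(U8 hi, U8 lo)
{
    assert((hi & 0xFE) == 0xC2);   /* start byte is 0xC2 or 0xC3   */
    assert((lo & 0xC0) == 0x80);   /* continuation byte 0x80..0xBF */
    return ((unsigned)(hi & 0x1F) << 6) | (unsigned)(lo & 0x3F);
}

/* Try every valid byte pair and report the largest decoded value. */
unsigned max_downgradeable_value(void)
{
    unsigned max = 0;
    for (unsigned hi = 0xC2; hi <= 0xC3; hi++) {
        for (unsigned lo = 0x80; lo <= 0xBF; lo++) {
            unsigned v = decode_two_byte((U8)hi, (U8)lo);
            if (v > max)
                max = v;
        }
    }
    return max;
}
```

The largest value over all valid pairs is 0xFF, so storing the result in a `U8` local, as the patch does, loses nothing under these assumptions.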

@p5pRT

commented Apr 8, 2018

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

commented Apr 8, 2018

@khwilliamson - Status changed from 'open' to 'resolved'
