Code points after \uFFFF #33

Closed
rurban opened this Issue June 21, 2013 · 1 comment

rurban commented June 21, 2013

It seems that potion assumes that Unicode code points are 4 hex max:

    sromanov@killdozer ~/mydev/potion $ cat example/unicode.pn
    "I'm snowman - \u2603\n" print
    "I'm bactrian - \u1f42b\n" print

The bactrian line is printed wrong (the output is not displayable here).

\u is sometimes defined to take exactly 4 hex chars, so I propose to add \U, not as in Python, which requires exactly 8 chars, but
in a relaxed way that allows 4 or 5 hex chars.
Theoretically one could also allow 6 chars, but that range, U+100000 - U+10FFFF (Plane 16, Private Use only),
is not used yet. So I go for 4-5 and take the 6th char as the start of the next symbol.

The forms are \Uxxxx and \Uxxxxx. If the 5th char is accidentally a hex char that should not belong to the Unicode char,
it is consumed anyway, so this will be an incompatible change.

syntax.y:

    escU = esc 'U' < hexl hexl hexl hexl hexl? >

The second possibility is to use \u with 4 or 5 chars.
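
The relaxed `\U` rule proposed above can be sketched in C. This is only an illustration of the greedy "4 hex chars, optionally a 5th" matching, not potion's actual parser code; the names `EscU`, `hex_value` and `parse_escU` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: the decoded code point and the number of
 * hex digits consumed (4 or 5), or len == 0 if the escape is invalid. */
typedef struct {
    unsigned long cp;
    int len;
} EscU;

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse the text that follows "\U". Greedy like
 *   hexl hexl hexl hexl hexl?
 * so a 5th hex digit is always taken, and a 6th is left for the
 * next symbol. */
EscU parse_escU(const char *s) {
    EscU r = {0, 0};
    unsigned long value = 0;
    int n = 0;

    assert(s != NULL);
    while (n < 5 && hex_value(s[n]) >= 0) {
        value = value * 16 + (unsigned long)hex_value(s[n]);
        n++;
    }
    if (n < 4) return r;   /* fewer than 4 hex digits: not a valid escape */
    r.cp = value;
    r.len = n;
    return r;
}
```

With this sketch, `parse_escU("1f42b")` consumes 5 digits and yields U+1F42B, while `parse_escU("2603a")` also consumes 5 digits and yields U+2603A, which is the incompatibility described above. With at most 5 digits the largest reachable code point is U+FFFFF, consistent with leaving out U+100000 - U+10FFFF.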

Perl gets this right (since 5.8.9, at least), because it has {} delimiters:

    sromanov@killdozer ~/mydev/potion $ cat example/unicode.pl
    binmode STDOUT, ":encoding(UTF-8)";
    print "I'm snowman - \x{2603}\n";
    print "I'm bactrian - \x{1f42b}\n";
Reini Urban rurban closed this June 21, 2013
Reini Urban rurban reopened this June 21, 2013
rurban commented June 21, 2013

The 4-byte UTF-8 form is missing:

    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
See http://www.fileformat.info/info/unicode/utf8.htm
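
The 4-byte pattern is the form UTF-8 uses for code points U+10000 to U+10FFFF. A minimal sketch of an encoder in C that includes this case follows; it is an illustration only, not potion's implementation, and the names `Utf8` and `utf8_encode` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: up to 4 encoded bytes and their count.
 * len == 0 marks an invalid code point. */
typedef struct {
    unsigned char bytes[4];
    size_t len;
} Utf8;

/* Encode one code point as UTF-8. Surrogates (U+D800-U+DFFF) and
 * values above U+10FFFF are rejected with len == 0. */
Utf8 utf8_encode(unsigned long cp) {
    Utf8 u = {{0, 0, 0, 0}, 0};

    if (cp < 0x80) {                        /* 0xxxxxxx */
        u.bytes[0] = (unsigned char)cp;
        u.len = 1;
    } else if (cp < 0x800) {                /* 110xxxxx 10xxxxxx */
        u.bytes[0] = (unsigned char)(0xC0 | (cp >> 6));
        u.bytes[1] = (unsigned char)(0x80 | (cp & 0x3F));
        u.len = 2;
    } else if (cp < 0x10000) {              /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF) return u;
        u.bytes[0] = (unsigned char)(0xE0 | (cp >> 12));
        u.bytes[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        u.bytes[2] = (unsigned char)(0x80 | (cp & 0x3F));
        u.len = 3;
    } else if (cp <= 0x10FFFF) {            /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        u.bytes[0] = (unsigned char)(0xF0 | (cp >> 18));
        u.bytes[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        u.bytes[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        u.bytes[3] = (unsigned char)(0x80 | (cp & 0x3F));
        u.len = 4;
    }
    return u;
}
```

For the two characters in the report, this gives E2 98 83 for the snowman (U+2603) and F0 9F 90 AB for the bactrian camel (U+1F42B); the second one needs the 4-byte branch.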

And see http://blogs.perl.org/users/rurban/2013/06/ruby-vs-perl-github-cannot-use-utf8.html for why we could not file the original report: GitHub cannot parse these chars either.

Reini Urban rurban referenced this issue from a commit in perl11/p2 June 21, 2013
Reini Urban [potion #33] add \Uxxxxx? 5 chars for U+FFFF .. U+10FFFF
parse \U with 4 or 5 hexchars (not 8). fixes issue #33.
919fdc9
Reini Urban rurban closed this September 16, 2013
Reini Urban rurban referenced this issue from a commit October 07, 2013
Reini Urban mass backport from p2-0.0 2013-10-06 r1580 4310cfd (199k patch)
backported from p2 en masse

optional signature args:
  fill in nil/0 for optional unprovided args, or defaults if declared.
  add arity and minargs closure and sig fields and methods
  add X86_ARGO_IMM to the jit to fill in values directly

added method argument typechecks with -DDEBUG
parse \Uxxxxx with 4-5 hexchars for U+FFFF .. U+10FFFF. fixes issue #33
parse messages with embedded . and : within the name (module support)
allow methods named loop if the 3rd ast argument is no block (wanted for
  the debug loop and other event loops)
Report error: 'continue' outside of loop, 'break' outside of loop
Start support for typed messages

new parser errors:
  Invalid utf-8 unicode character (U+D800-U+DFFF)
  Invalid utf-8 unicode character (U+C0,U+C1)
new compiler errors:
  Not enough arguments to %s. Required %d, given %d
  Not enough arguments to %s. Required %d to %d, given %d
  Too many arguments to %s. Allowed %d, given %d

Methods:
  tuple.bsearch returns now false if element not found, not -1
  add tuple.shift
  renamed source.dump to dumpbc
  dump takes now an optional backend arg and passes it to the compile/backend module
  num.step returns now the correct number of steps
  add num.string
  add lobby.can (boolean bind)
  add lobby.print, lobby.say
  string.ord is now utf8 safe

docs: documented all methods and structs
  added doxygen and gtags/htags support
  see make doc docall
  http://perl11.org/potion/html/ and http://perl11.org/potion/ref/

add -d OP_DEBUG/AST_DEBUG framework
  store OP_DEBUG for each new line in the ast in potion_source_debug(),
  add debug ops for each new line in the src, pointing to the ast (in f->debugs)
  store fileno, lineno and line in AST
  add global pn_filenames
  source_file and source_line methods, fileno indexing pn_filenames
  OP_DEBUG does not call the potion_debug repl yet

fix vasprintf for mingw32 and mingw64

PNFile:
  return a catchable Error object not exit with perror.
  check close for EINTR
  enhance file.write: write a binary representation of any primitive obj to
    the file handle: string, number or bool
  add file.print: "print" a stringification of any object to the filehandle

Various:
New PNSource methods: size, file, lineno, line, loc, compile, dumpbc, clone
Add a name field to protos
Add proto_clone (unused)
Support -DDISABLE_CALLCC to support emcc emscripten (compile to js, runs potion in the browser)
Allow empty strlen to potion_find_file
Allow empty file arg to potion_load, use self then
Add global argv tuple
Add potion_filename_find, potion_filename_push, potion_find_file
Changed potion_eval arity
Add potion_utf8char_decode for ord
Document potion_vm_proto usage with optional args
Fixed mingw64 cross-compilation, use absolute -I and -L paths

Internals
---------
gc: stabilize walking inter-ptrs on the stack
    if an invalid object on the stack is detected, (such as ptr->data)
    skip it, (size = 0).
    fixes gc_forward test
    GC various missed object fields

Optimize tuple allocation, overallocate by 3. Add alloc field
Optimized potion_sig_name_at, faster than potion_sig_at
Optimized bytecode loop via CGOTO computed goto gcc extension by ~3-10%
potion_io_error takes char* now
change potion_type_char file f=>F
add PN_TDECIMAL for double ffi
added potion_type_name, potion_type_error, potion_type_error_want, potion_type_default
added potion_class_type, potion_object_size (unused in GC, not GC-safe)
removed potion_cmd_compile, potion_cmd_exec does all
rearrange Potion_State fields: most used first
rename message to msg
fix lineno counting in the parser
Improve gc and sig tests
added test/closures/optional.pn, test/strings/unicode.pn tests
Use -fno-omit-frame-pointer also for core/internal and core/gc as it uses STACK_DIR,
  which is affected. Note that icc has STACKDIR 1, always uses esp addressing, not ebp.
  icc is broken
Replace __WORDSIZE  by PN_SIZE_T for cross compilers, esp. for x86_64-w64-mingw32-gcc

DEBUG:
  add GC timings to potion -Dv
  gc + asan: linux and darwin reserved stackzone for asan
  omit subsequent nils on source_string
  add DBG_c -Dc compiler messages
  improve potion_dump_stack
faca25d