[tools] typetables: fix gen_charset_tables.pl and regenerate
Removes the defunct Parrot_ascii_typetable, unused since 2010.
Adds \v to CCLASS_NEWLINE manually (confirmed).
\x85 and \xa0 are confirmed to be in the whitespace cclass now, but
several old systems fail the whitespace test for \xa0 (non-breaking
space).
Several chars in 160..191 are no longer in the [[:punct:]] class.

Added a bootstrap-tables make target to update the tables automatically.
Improved src/string/encoding/tables.c pod.
Closes PR #1087
Reini Urban committed Oct 5, 2014
1 parent 391b79b commit 1463167
Showing 5 changed files with 64 additions and 37 deletions.
22 changes: 14 additions & 8 deletions ChangeLog
@@ -1,16 +1,22 @@
2014-10-21 release 6.9.0
- Core
+ Add platform encoding functions for darwin, FreeBSD, OpenBSD and NetBSD,
fixing rakudo problems with UTF-8 locales. [GH #1092]
+ Enable trap op (int3) on x86_64 also
+ Add const to env api STRING args
+ Add platform encoding functions for darwin, FreeBSD, OpenBSD and NetBSD,
fixing rakudo problems with UTF-8 locales. [GH #1092]
+ Enable trap op (int3) on x86_64 also
+ Add const to env api STRING args
+ Regenerated and fixed the iso_8859_1_typetable. Add \x2028, \x2029 and \v
to be of cclass newline. [GH #1086, perl6 RT #122341].
Several chars 160..191 are not in the [[:punct:]] class anymore.
Removed defunct Parrot_ascii_typetable, unused since 2010.
- Build
+ Fix all -Wshadow instances
+ Fix all -Wshadow instances
+ Added bootstrap-tables make target
- Documentation
+ Improved src/string/encoding/tables.c pod.
- Tests
+ Run fulltests with the runcore=fast,-O1,-O2 fast, without -D040, --gc-debug [GH #1086]
+ Simplify smolder resend usage
+ Fix mingw issues with \r\n
+ Run fulltests with the runcore=fast,-O1,-O2 fast, without -D040, --gc-debug [GH #1086]
+ Simplify smolder resend usage
+ Fix mingw issues with \r\n
- Community

2014-09-16 release 6.8.0
8 changes: 8 additions & 0 deletions config/gen/makefiles/root.in
@@ -864,6 +864,8 @@ help :
@echo " bootstrap-ops: Generate C code from .ops files. Requires already built parrot."
@echo " bootstrap-nci: Generate C code for NCI. Requires already built parrot."
@echo " bootstrap-prt0: Generate prt0.pir. Requires already built parrot."
@echo " bootstrap-tables: Generate src/string/encoding/tables.[ch]."
@echo " bootstrap-namealias: Generate src/string/namealias.c via gperf."
@echo ""
@echo "Release:"
@echo " release: Create a tarball."
@@ -903,6 +905,12 @@ $(RUN_INC_DIR)/config.fpmc : myconfig config_lib.pir \
$(MINIPARROT) -I$(RUN_INC_DIR) config_lib.pir > $@
@$(ADDGENERATED) "$@" "[]"

# Check the generated tables before submitting updates.
bootstrap-tables \
src/string/encoding/tables.h \
src/string/encoding/tables.c: tools/dev/gen_charset_tables.pl
$(PERL) tools/dev/gen_charset_tables.pl

bootstrap-prt0: $(WINXED) $(FRPTWO_DIR)/prt0.winxed
$(WINXED) --noan -c $(FRPTWO_DIR)/prt0.winxed

29 changes: 14 additions & 15 deletions src/string/encoding/tables.c
@@ -1,16 +1,15 @@
/*
* Copyright (C) 2005-2011, Parrot Foundation.
/* ex: set ro ft=c: -*- buffer-read-only:t -*-
* !!!!!!! DO NOT EDIT THIS FILE !!!!!!!
*
* This file is generated automatically from 'tools/dev/gen_charset_tables.pl'.
*
* Generate the 8-bit character set classification table for
* en_US.iso88591. Unicode is managed by icu.
*
* DO NOT EDIT THIS FILE DIRECTLY!
* please update the tools/dev/gen_charset_tables.pl script instead.
* Convenient definitions are: WHITESPACE, WORDCHAR, PUNCTUATION, DIGIT.
* See F<include/parrot/cclass.h> for all.
*
* Created by gen_charset_tables.pl 19534 2007-07-02 02:12:08Z petdance
* Overview:
* This file contains various charset tables.
* Data Structure and Algorithms:
* History:
* Notes:
* References:
* Copyright (C) 2005-2014, Parrot Foundation.
*/

/* HEADERIZER HFILE: none */
@@ -38,10 +37,10 @@ const INTVAL Parrot_iso_8859_1_typetable[256] = {
0x0200, 0x0200, 0x0200, 0x0200, 0x0200, 0x0200, 0x0200, 0x0200, /* 136-143 */
0x0200, 0x0200, 0x0200, 0x0200, 0x0200, 0x0200, 0x0200, 0x0200, /* 144-151 */
0x0200, 0x0200, 0x0200, 0x0200, 0x0200, 0x0200, 0x0200, 0x0200, /* 152-159 */
0x04e0, 0x04c0, 0x04c0, 0x04c0, 0x04c0, 0x04c0, 0x04c0, 0x04c0, /* 160-167 */
0x04c0, 0x04c0, 0x28c4, 0x04c0, 0x04c0, 0x04c0, 0x04c0, 0x04c0, /* 168-175 */
0x04c0, 0x04c0, 0x04c0, 0x04c0, 0x04c0, 0x28c6, 0x04c0, 0x04c0, /* 176-183 */
0x04c0, 0x04c0, 0x28c4, 0x04c0, 0x04c0, 0x04c0, 0x04c0, 0x04c0, /* 184-191 */
0x0160, 0x04c0, 0x04c0, 0x04c0, 0x04c0, 0x04c0, 0x04c0, 0x04c0, /* 160-167 */
0x04c0, 0x04c0, 0x28c6, 0x04c0, 0x04c0, 0x04c0, 0x04c0, 0x04c0, /* 168-175 */
0x04c0, 0x04c0, 0x00c0, 0x00c0, 0x04c0, 0x28c6, 0x04c0, 0x04c0, /* 176-183 */
0x04c0, 0x00c0, 0x28c6, 0x04c0, 0x00c0, 0x00c0, 0x00c0, 0x04c0, /* 184-191 */
0x28c5, 0x28c5, 0x28c5, 0x28c5, 0x28c5, 0x28c5, 0x28c5, 0x28c5, /* 192-199 */
0x28c5, 0x28c5, 0x28c5, 0x28c5, 0x28c5, 0x28c5, 0x28c5, 0x28c5, /* 200-207 */
0x28c5, 0x28c5, 0x28c5, 0x28c5, 0x28c5, 0x28c5, 0x28c5, 0x04c0, /* 208-215 */
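
The entries in this hunk are bit masks of character classes, so the effect of the regeneration can be read off by XOR-ing an old entry with its replacement. The following is a minimal sketch of that, not code from the commit. The bit values 0x0200 through 0x2000 appear in classify() in tools/dev/gen_charset_tables.pl; the lower bits are assumed to match include/parrot/cclass.h, and the CC_* names, has_cclass, changed_bits and report_change are hypothetical helpers introduced only for illustration.

```c
#include <stdio.h>

/* Character-class bits. 0x0200 through 0x2000 appear in classify() in
 * tools/dev/gen_charset_tables.pl; the lower bits are ASSUMED to match
 * include/parrot/cclass.h and should be checked against that header. */
enum {
    CC_UPPERCASE    = 0x0001, /* assumed */
    CC_LOWERCASE    = 0x0002, /* assumed */
    CC_ALPHABETIC   = 0x0004, /* assumed */
    CC_NUMERIC      = 0x0008, /* assumed */
    CC_HEXADECIMAL  = 0x0010, /* assumed */
    CC_WHITESPACE   = 0x0020, /* assumed */
    CC_PRINTING     = 0x0040, /* assumed */
    CC_GRAPHICAL    = 0x0080, /* assumed */
    CC_BLANK        = 0x0100, /* assumed */
    CC_CONTROL      = 0x0200,
    CC_PUNCTUATION  = 0x0400,
    CC_ALPHANUMERIC = 0x0800,
    CC_NEWLINE      = 0x1000,
    CC_WORD         = 0x2000
};

/* Return 1 if the table entry has every bit of mask set, else 0. */
static int has_cclass(unsigned entry, unsigned mask)
{
    return (entry & mask) == mask;
}

/* Return the class bits that differ between an old and a new entry. */
static unsigned changed_bits(unsigned old_entry, unsigned new_entry)
{
    return old_entry ^ new_entry;
}

/* Print one line describing how a table entry changed. */
static void report_change(int codepoint, unsigned old_entry, unsigned new_entry)
{
    printf("%3d: 0x%04x -> 0x%04x (changed bits 0x%04x)\n",
           codepoint, old_entry, new_entry,
           changed_bits(old_entry, new_entry));
}
```

For code point 178 the entry goes from 0x04c0 to 0x00c0, so changed_bits returns 0x0400: only the punctuation bit was dropped. For code point 160 (\xa0) the new entry 0x0160 still carries the assumed whitespace bit 0x0020 but no longer the punctuation bit.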
15 changes: 10 additions & 5 deletions src/string/encoding/tables.h
@@ -1,10 +1,15 @@
/*
* Copyright (C) 2005-2010, Parrot Foundation.
/* ex: set ro ft=c: -*- buffer-read-only:t -*-
* !!!!!!! DO NOT EDIT THIS FILE !!!!!!!
*
* This file is generated automatically from 'tools/dev/gen_charset_tables.pl'.
*
* Generate the 8-bit character set classification table for
* en_US.iso88591. Unicode is managed by icu.
*
* DO NOT EDIT THIS FILE DIRECTLY!
* please update the tools/dev/gen_charset_tables.pl script instead.
* Convenient definitions are: WHITESPACE, WORDCHAR, PUNCTUATION, DIGIT.
* See F<include/parrot/cclass.h> for all.
*
* This file contains various charset tables.
* Copyright (C) 2005-2014, Parrot Foundation.
*/

/* HEADERIZER HFILE: none */
27 changes: 18 additions & 9 deletions tools/dev/gen_charset_tables.pl
@@ -15,7 +15,11 @@ =head1 SYNOPSIS
=head1 DESCRIPTION
Generate character set tables.
Generate the 8-bit character set classification table for
en_US.iso88591. Unicode is managed by icu.
Convenient definitions are: WHITESPACE, WORDCHAR, PUNCTUATION, DIGIT.
See F<include/parrot/cclass.h> for all.
=cut

@@ -26,7 +30,7 @@ =head1 DESCRIPTION
* Local variables:
* c-file-style: "parrot"
* End:
* vim: expandtab shiftwidth=4:
* vim: expandtab shiftwidth=4 cinoptions='\:2=2' :
*/
EOF

@@ -35,17 +39,22 @@ =head1 DESCRIPTION
#
my %table = (
'en_US.iso88591' => 'Parrot_iso_8859_1_typetable',
'POSIX' => 'Parrot_ascii_typetable',
# 'POSIX' => 'Parrot_ascii_typetable', # (removed 2010)
);

my $header = <<"HEADER";
/*
* Copyright (C) 2005-2010, Parrot Foundation.
/* ex: set ro ft=c: -*- buffer-read-only:t -*-
* !!!!!!! DO NOT EDIT THIS FILE !!!!!!!
*
* This file is generated automatically from '$0'.
*
* Generate the 8-bit character set classification table for
* en_US.iso88591. Unicode is managed by icu.
*
* DO NOT EDIT THIS FILE DIRECTLY!
* please update the $0 script instead.
* Convenient definitions are: WHITESPACE, WORDCHAR, PUNCTUATION, DIGIT.
* See F<include/parrot/cclass.h> for all.
*
* This file contains various charset tables.
* Copyright (C) 2005-2014, Parrot Foundation.
*/
/* HEADERIZER HFILE: none */
@@ -77,7 +86,7 @@ sub classify {
$ret |= 0x0200 if $chr =~ /^[[:cntrl:]]$/; # CCLASS_CONTROL
$ret |= 0x0400 if $chr =~ /^[[:punct:]]$/; # CCLASS_PUNCTUATION
$ret |= 0x0800 if $chr =~ /^[[:alnum:]]$/; # CCLASS_ALPHANUMERIC
$ret |= 0x1000 if $chr =~ /^[\n\r\f\x85]$/; # CCLASS_NEWLINE
$ret |= 0x1000 if $chr =~ /^[\n\r\f\v\x85]$/; # CCLASS_NEWLINE
$ret |= 0x2000 if $chr =~ /^[[:alnum:]_]$/; # CCLASS_WORD

return $ret;
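
The updated CCLASS_NEWLINE rule above matches a fixed set of 8-bit code points, which now includes \v. As a rough C rendering of that one rule (a sketch for illustration, not part of the generator or of Parrot), the same decision can be written as a switch. NEWLINE_BIT and newline_bits are hypothetical names; the value 0x1000 is the one used in classify().

```c
/* Bit set by classify() for CCLASS_NEWLINE (value taken from the script). */
#define NEWLINE_BIT 0x1000u

/* Mirror of the updated rule /^[\n\r\f\v\x85]$/ for one 8-bit code point:
 * returns NEWLINE_BIT for LF, CR, FF, VT and NEL (0x85), otherwise 0. */
static unsigned newline_bits(unsigned char c)
{
    switch (c) {
    case '\n':
    case '\r':
    case '\f':
    case '\v':   /* added by this commit */
    case 0x85:
        return NEWLINE_BIT;
    default:
        return 0u;
    }
}
```

In the generator this bit is OR-ed into the entry together with the other class bits, so newline_bits only covers one term of the full classification.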