/*    regcomp.h
 */

typedef OP OP_4tree;    /* Will be redefined later. */

/*
 * The "internal use only" fields in regexp.h are present to pass info from
 * compile to execute that permits the execute phase to run lots faster on
 * simple cases. They are:
 *
 * regstart  sv that must begin a match; Nullch if none obvious
 * reganch   is the match anchored (at beginning-of-line only)?
 * regmust   string (pointer into program) that match must include, or NULL
 *  [regmust changed to SV* for bminstr()--law]
 * regmlen   length of regmust string
 *  [regmlen not used currently]
 *
 * Regstart and reganch permit very fast decisions on suitable starting points
 * for a match, cutting down the work a lot. Regmust permits fast rejection
 * of lines that cannot possibly match. The regmust tests are costly enough
 * that pregcomp() supplies a regmust only if the r.e. contains something
 * potentially expensive (at present, the only such thing detected is * or +
 * at the start of the r.e., which can involve a lot of backup). Regmlen is
 * supplied because the test in pregexec() needs it and pregcomp() is computing
 * it anyway.
 * [regmust is now supplied always. The tests that use regmust have a
 * heuristic that disables the test if it usually matches.]
 *
 * [In fact, we now use regmust in many cases to locate where the search
 * starts in the string, so if regback is >= 0, the regmust search is never
 * wasted effort. The regback variable says how many characters back from
 * where regmust matched is the earliest possible start of the match.
 * For instance, /[a-z].foo/ has a regmust of 'foo' and a regback of 2.]
 */
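
/*
 * Illustration only, not part of regcomp.h: a minimal sketch of how a
 * regmust/regback pair can narrow the search. It assumes plain C strings
 * and strstr() in place of the SV-based search mentioned above; the name
 * earliest_start() is hypothetical.
 */

```c
#include <stddef.h>
#include <string.h>

/* Return the earliest position in `subject` where a match could start,
 * or NULL if the required substring `must` rules a match out.
 * `back` plays the role of regback (negative means "unknown"). */
static const char *
earliest_start(const char *subject, const char *must, ptrdiff_t back)
{
    const char *hit = strstr(subject, must);

    if (back < 0)                       /* offset unknown: only accept or reject */
        return hit ? subject : NULL;

    while (hit != NULL) {
        if (hit - subject >= back)      /* enough room before this occurrence */
            return hit - back;
        if (*hit == '\0')               /* empty `must` found at end of string */
            break;
        hit = strstr(hit + 1, must);    /* try the next occurrence */
    }
    return NULL;                        /* fast rejection: no match possible */
}
```

/*
 * For /[a-z].foo/ (regmust "foo", regback 2) and the subject "xxab.foo",
 * the sketch returns a pointer to "b.foo".
 */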

/*
 * Structure for regexp "program". This is essentially a linear encoding
 * of a nondeterministic finite-state machine (aka syntax charts or
 * "railroad normal form" in parsing technology). Each node is an opcode
 * plus a "next" pointer, possibly plus an operand. "Next" pointers of
 * all nodes except BRANCH implement concatenation; a "next" pointer with
 * a BRANCH on both ends of it is connecting two alternatives. (Here we
 * have one of the subtle syntax dependencies: an individual BRANCH (as
 * opposed to a collection of them) is never concatenated with anything
 * because of operator precedence.) The operand of some types of node is
 * a literal string; for others, it is a node leading into a sub-FSM. In
 * particular, the operand of a BRANCH node is the first node of the branch.
 * (NB this is *not* a tree structure: the tail of the branch connects
 * to the thing following the set of BRANCHes.) The opcodes are:
 */

/*
 * A node is one char of opcode followed by two chars of "next" pointer.
 * "Next" pointers are stored as two 8-bit pieces, high order first. The
 * value is a positive offset from the opcode of the node containing it.
 * An operand, if any, simply follows the node. (Note that much of the
 * code generation knows about this implicit relationship.)
 *
 * Using two bytes for the "next" pointer is vast overkill for most things,
 * but allows patterns to get big without disasters.
 *
 * [The "next" pointer is always aligned on an even
 * boundary, and reads the offset directly as a short. Also, there is no
 * special test to reverse the sign of BACK pointers since the offset is
 * stored negative.]
 */

struct regnode_string {
    U8  flags;
    U8  type;
    U16 next_off;
    U8  string[1];
};

struct regnode_1 {
    U8  flags;
    U8  type;
    U16 next_off;
    U32 arg1;
};

struct regnode_2 {
    U8  flags;
    U8  type;
    U16 next_off;
    U16 arg1;
    U16 arg2;
};
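
/*
 * Illustration only, not part of regcomp.h: a minimal sketch of following
 * "next" links through a node array. It assumes a 4-byte node header like
 * the structs above, offsets counted in nodes rather than bytes, and an
 * offset of 0 marking the end of a chain. sketch_node and count_chain()
 * are hypothetical names.
 */

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the common node header (assumed layout). */
typedef struct {
    uint8_t  flags;
    uint8_t  type;
    uint16_t next_off;   /* distance to the next node, in nodes; 0 = end */
} sketch_node;

/* Follow "next" offsets from `start` and count the nodes visited. */
static size_t
count_chain(const sketch_node *start)
{
    const sketch_node *p = start;
    size_t n = 1;

    while (p->next_off != 0) {
        p += p->next_off;    /* the offset is relative to the current node */
        n++;
    }
    return n;
}
```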

/* XXX fix this description.
   Impose a limit of REG_INFTY on various pattern matching operations
   to limit stack growth and to avoid "infinite" recursions.
*/
/* The default size for REG_INFTY is I16_MAX, which is the same as
   SHORT_MAX (see perl.h). Unfortunately I16 isn't necessarily 16 bits
   (see handy.h). On the Cray C90, sizeof(short)==4 and hence I16_MAX is
   ((1<<31)-1), while on the Cray T90, sizeof(short)==8 and I16_MAX is
   ((1<<63)-1). To limit stack growth to reasonable sizes, supply a
   smaller default.
        --Andy Dougherty 11 June 1998
*/
#if SHORTSIZE > 2
#  ifndef REG_INFTY
#    define REG_INFTY ((1<<15)-1)
#  endif
#endif

#ifndef REG_INFTY
#  define REG_INFTY I16_MAX
#endif
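
/*
 * Illustration only, not part of regcomp.h: a standalone sketch of the
 * same default selection, using <limits.h> in place of Perl's SHORTSIZE
 * and I16_MAX. The bound check in quantifier_max_ok() is a hypothetical
 * example of how such a limit might be applied, not the check Perl uses.
 */

```c
#include <limits.h>

/* Pick a 16-bit-sized limit even when short is wider than 16 bits. */
#if USHRT_MAX > 0xFFFF
#  define SKETCH_REG_INFTY ((1 << 15) - 1)
#else
#  define SKETCH_REG_INFTY SHRT_MAX
#endif

/* Hypothetical check: accept a repeat count only if it is below the limit. */
static int
quantifier_max_ok(long max)
{
    return max >= 0 && max < SKETCH_REG_INFTY;
}
```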

#define ARG_VALUE(arg) (arg)
#define ARG__SET(arg,val) ((arg) = (val))

#define ARG(p) ARG_VALUE(ARG_LOC(p))
#define ARG1(p) ARG_VALUE(ARG1_LOC(p))
#define ARG2(p) ARG_VALUE(ARG2_LOC(p))
#define ARG_SET(p, val) ARG__SET(ARG_LOC(p), (val))
#define ARG1_SET(p, val) ARG__SET(ARG1_LOC(p), (val))
#define ARG2_SET(p, val) ARG__SET(ARG2_LOC(p), (val))

#ifndef lint
#  define NEXT_OFF(p) ((p)->next_off)
#  define NODE_ALIGN(node)
#  define NODE_ALIGN_FILL(node) ((node)->flags = 0xde) /* deadbeef */
#else /* lint */
#  define NEXT_OFF(p) 0
#  define NODE_ALIGN(node)
#  define NODE_ALIGN_FILL(node)
#endif /* lint */

#define SIZE_ALIGN NODE_ALIGN

#define OP(p) ((p)->type)
#define OPERAND(p) (((struct regnode_string *)p)->string)
#define NODE_ALIGN(node)
#define ARG_LOC(p) (((struct regnode_1 *)p)->arg1)
#define ARG1_LOC(p) (((struct regnode_2 *)p)->arg1)
#define ARG2_LOC(p) (((struct regnode_2 *)p)->arg2)
#define NODE_STEP_REGNODE 1 /* sizeof(regnode)/sizeof(regnode) */
#define EXTRA_STEP_2ARGS EXTRA_SIZE(struct regnode_2)

#define NODE_STEP_B 4

#define NEXTOPER(p) ((p) + NODE_STEP_REGNODE)
#define PREVOPER(p) ((p) - NODE_STEP_REGNODE)

#define FILL_ADVANCE_NODE(ptr, op) STMT_START { \
    (ptr)->type = op; (ptr)->next_off = 0; (ptr)++; } STMT_END
#define FILL_ADVANCE_NODE_ARG(ptr, op, arg) STMT_START { \
    ARG_SET(ptr, arg); FILL_ADVANCE_NODE(ptr, op); (ptr) += 1; } STMT_END
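
/*
 * Illustration only, not part of regcomp.h: a function-based sketch of
 * what the two FILL_ADVANCE macros do to the write pointer. It assumes a
 * 4-byte node header and a 32-bit argument occupying the following slot,
 * and uses memcpy() instead of the struct cast. sketch_node, emit_node()
 * and emit_node_arg() are hypothetical names.
 */

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  flags;
    uint8_t  type;
    uint16_t next_off;
} sketch_node;

/* Like FILL_ADVANCE_NODE: set the opcode, clear the link, step one slot. */
static sketch_node *
emit_node(sketch_node *p, uint8_t op)
{
    p->type = op;
    p->next_off = 0;
    return p + 1;
}

/* Like FILL_ADVANCE_NODE_ARG: store the argument in the slot after the
 * header, fill the header, then step past both slots. */
static sketch_node *
emit_node_arg(sketch_node *p, uint8_t op, uint32_t arg)
{
    memcpy(p + 1, &arg, sizeof arg);
    p = emit_node(p, op);
    return p + 1;
}
```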

#define REG_MAGIC 0234

#define SIZE_ONLY (PL_regcode == &PL_regdummy)

/* Flags for first parameter byte of ANYOF */
#define ANYOF_INVERT  0x40
#define ANYOF_FOLD    0x20
#define ANYOF_LOCALE  0x10
#define ANYOF_ISA     0x0F
#define ANYOF_ALNUML  0x08
#define ANYOF_NALNUML 0x04
#define ANYOF_SPACEL  0x02
#define ANYOF_NSPACEL 0x01

/* Utility macros for bitmap of ANYOF */
#define ANYOF_BYTE(p,c) (p)[1 + (((c) >> 3) & 31)]
#define ANYOF_BIT(c) (1 << ((c) & 7))
#define ANYOF_SET(p,c) (ANYOF_BYTE(p,c) |= ANYOF_BIT(c))
#define ANYOF_CLEAR(p,c) (ANYOF_BYTE(p,c) &= ~ANYOF_BIT(c))
#define ANYOF_TEST(p,c) (ANYOF_BYTE(p,c) & ANYOF_BIT(c))

#define ANY_SKIP ((33 - 1)/sizeof(regnode) + 1)
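
/*
 * Illustration only, not part of regcomp.h: a minimal sketch of the
 * bitmap macros applied to a 33-byte operand (one flag byte followed by
 * a 32-byte bitmap, one bit per character code 0..255). The macros are
 * copied from above; class_add_range() is a hypothetical helper.
 */

```c
#include <string.h>

#define ANYOF_BYTE(p,c)  (p)[1 + (((c) >> 3) & 31)]
#define ANYOF_BIT(c)     (1 << ((c) & 7))
#define ANYOF_SET(p,c)   (ANYOF_BYTE(p,c) |= ANYOF_BIT(c))
#define ANYOF_CLEAR(p,c) (ANYOF_BYTE(p,c) &= ~ANYOF_BIT(c))
#define ANYOF_TEST(p,c)  (ANYOF_BYTE(p,c) & ANYOF_BIT(c))

/* Mark every character code in lo..hi as a member of the class. */
static void
class_add_range(unsigned char *operand, unsigned char lo, unsigned char hi)
{
    unsigned int c;

    for (c = lo; c <= hi; c++)
        ANYOF_SET(operand, c);
}
```

/*
 * Byte 0 of the operand is left for the ANYOF_* flags; the macros only
 * touch bytes 1..32.
 */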

/*
 * Utility definitions.
 */
#ifndef lint
#ifndef CHARMASK
#define UCHARAT(p) ((int)*(unsigned char *)(p))
#else
#define UCHARAT(p) ((int)*(p)&CHARMASK)
#endif
#else /* lint */
#define UCHARAT(p) PL_regdummy
#endif /* lint */

#define FAIL(m) \
    STMT_START { \
        if (!SIZE_ONLY) \
            SAVEDESTRUCTOR(S_clear_re,(void*)PL_regcomp_rx); \
        Perl_croak(aTHX_ "/%.127s/: %s", PL_regprecomp,m); \
    } STMT_END

#define FAIL2(pat,m) \
    STMT_START { \
        if (!SIZE_ONLY) \
            SAVEDESTRUCTOR(S_clear_re,(void*)PL_regcomp_rx); \
        S_re_croak2(aTHX_ "/%.127s/: ",pat,PL_regprecomp,m); \
    } STMT_END

#define EXTRA_SIZE(guy) ((sizeof(guy)-1)/sizeof(struct regnode))
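
/*
 * Illustration only, not part of regcomp.h: a sketch of the arithmetic
 * behind EXTRA_SIZE, which appears to give the number of node-sized slots
 * a larger node type needs beyond the basic header. It assumes a 4-byte
 * basic node; the sketch_* names are hypothetical stand-ins.
 */

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_node   { uint8_t flags; uint8_t type; uint16_t next_off; };
struct sketch_node_1 { uint8_t flags; uint8_t type; uint16_t next_off;
                       uint32_t arg1; };
struct sketch_node_2 { uint8_t flags; uint8_t type; uint16_t next_off;
                       uint16_t arg1; uint16_t arg2; };

/* Same formula as EXTRA_SIZE, applied to the stand-in types. */
#define SKETCH_EXTRA_SIZE(guy) \
    ((sizeof(guy) - 1) / sizeof(struct sketch_node))
```

/*
 * With these layouts the basic node needs 0 extra slots, and both
 * argument-carrying variants need 1.
 */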

#define REG_SEEN_ZERO_LEN   1
#define REG_SEEN_LOOKBEHIND 2
#define REG_SEEN_GPOS       4
#define REG_SEEN_EVAL       8

START_EXTERN_C

#include "regnodes.h"

/* The following have no fixed length. char* since we do strchr on it. */
#ifndef DOINIT
EXTCONST char PL_varies[];
#else
EXTCONST char PL_varies[] = {
    BRANCH, BACK, STAR, PLUS, CURLY, CURLYX, REF, REFF, REFFL,
    WHILEM, CURLYM, CURLYN, BRANCHJ, IFTHEN, SUSPEND, CLUMP, 0
};
#endif

/* The following always have a length of 1. char* since we do strchr on it. */
/* (Note that length 1 means "one character" under UTF8, not "one octet".) */
#ifndef DOINIT
EXTCONST char PL_simple[];
#else
EXTCONST char PL_simple[] = {
    REG_ANY, ANYUTF8, SANY, SANYUTF8, ANYOF, ANYOFUTF8,
    ALNUM, ALNUMUTF8, ALNUML, ALNUMLUTF8,
    NALNUM, NALNUMUTF8, NALNUML, NALNUMLUTF8,
    SPACE, SPACEUTF8, SPACEL, SPACELUTF8,
    NSPACE, NSPACEUTF8, NSPACEL, NSPACELUTF8,
    DIGIT, DIGITUTF8, NDIGIT, NDIGITUTF8, 0
};
#endif
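
/*
 * Illustration only, not part of regcomp.h: a minimal sketch of testing
 * opcode membership with strchr() on a zero-terminated char array, which
 * is what the "char* since we do strchr on it" remarks refer to. The
 * opcode values and the names below are hypothetical; the real values
 * come from regnodes.h.
 */

```c
#include <string.h>

enum { OP_STAR = 1, OP_PLUS = 2, OP_CURLY = 3, OP_DIGIT = 4 };

/* Zero-terminated list of opcodes whose match length can vary. */
static const char sketch_varies[] = { OP_STAR, OP_PLUS, OP_CURLY, 0 };

/* Nonzero if `op` is listed. strchr() also finds the terminating 0,
 * so this sketch excludes opcode 0 explicitly. */
static int
op_varies(int op)
{
    return op != 0 && strchr(sketch_varies, op) != NULL;
}
```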

END_EXTERN_C

typedef struct re_scream_pos_data_s
{
    char **scream_olds;    /* match pos */
    I32 *scream_pos;       /* Internal iterator of scream. */
} re_scream_pos_data;

struct reg_data {
    U32 count;
    U8 *what;
    void* data[1];
};

struct reg_substr_datum {
    I32 min_offset;
    I32 max_offset;
    SV *substr;
};

struct reg_substr_data {
    struct reg_substr_datum data[3];    /* Actual array */
};

#define anchored_substr  substrs->data[0].substr
#define anchored_offset  substrs->data[0].min_offset
#define float_substr     substrs->data[1].substr
#define float_min_offset substrs->data[1].min_offset
#define float_max_offset substrs->data[1].max_offset
#define check_substr     substrs->data[2].substr
#define check_offset_min substrs->data[2].min_offset
#define check_offset_max substrs->data[2].max_offset
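
/*
 * Illustration only, not part of regcomp.h: a minimal sketch of how the
 * accessor macros above read when applied to a structure that has a
 * `substrs` member. It assumes const char * in place of SV *, and the
 * sketch_* types and prefer_float() are hypothetical.
 */

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_substr_datum {
    int32_t     min_offset;
    int32_t     max_offset;
    const char *substr;
};

struct sketch_substr_data {
    struct sketch_substr_datum data[3];   /* anchored, floating, check */
};

struct sketch_regexp {
    struct sketch_substr_data *substrs;
};

#define float_substr     substrs->data[1].substr
#define float_min_offset substrs->data[1].min_offset
#define float_max_offset substrs->data[1].max_offset
#define check_substr     substrs->data[2].substr
#define check_offset_min substrs->data[2].min_offset
#define check_offset_max substrs->data[2].max_offset

/* Use the floating substring as the one to check first. */
static void
prefer_float(struct sketch_regexp *r)
{
    r->check_substr     = r->float_substr;       /* r->substrs->data[2].substr = ... */
    r->check_offset_min = r->float_min_offset;
    r->check_offset_max = r->float_max_offset;
}
```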