<title>PerlGuts Illustrated</title>
span.i58, span.i510 {display: none;}
img.i514, div.i514 {text-align: center;}
img.i58, img.i510, img.i514, img.i518 {border: none;}
<script type="text/javascript">
function Toggle(id) {
  var ele = document.getElementById(id);
  if (ele.style.display == 'inline') {
    ele.style.display = 'none';
  } else {
    ele.style.display = 'inline';
  }
}
</script>
<body bgcolor="#FFFFFF" text="#000000" link="#000055" vlink="#550000" alink="#000000" topmargin="0">
<h1 align="center" name="top">PerlGuts Illustrated<br><br>
<small>Version 0.49, for perl 5.20 and older</small></h1>
<p>This document is meant to supplement the <i><a href="">perlguts(1)</a></i> manual
page that comes with Perl. It contains commented illustrations of all major internal Perl data structures.
Having this document handy hopefully makes reading the Perl source code easier.
It might also help you interpret the <i><a href="">Devel::Peek</a></i> dumps.
<p>Most of the internal perl structures were refactored twice, in 5.10 and 5.14. The comparison links and illustrations for 5.8 - 5.20 are now included in this single document, but are also available as separate files. The only change from 5.10 to 5.12 is <a href="#SvOOK"><i>OOK</i></a>.
<li><a href="index-8.html">illguts for 5.8 and older</a>
<li><a href="index-10.html">illguts for 5.10</a>
<li><a href="index-12.html">illguts for 5.12</a>
<li><a href="index-14.html">illguts for 5.14 - 5.18</a>
<li><a href="index-18.html">illguts since 5.18</a>
<p>The first things to look at are the data structures that represent
Perl data: scalars of various kinds, arrays and hashes. Internally
Perl calls a scalar <i>SV</i> (scalar value), an array <i>AV</i>
(array value) and a hash <i>HV</i> (hash value). In addition it uses
<i>IV</i> for integer value, <i>NV</i> for numeric value (aka double),
<i>PV</i> for a pointer value (aka string value (char*), but 'S' was
already taken), and <i>RV</i> for reference value. The <i>IVs</i> are
further guaranteed to be big enough to hold a <code>void*</code> pointer.
<p>The internal relationship between the Perl data types is really
object oriented. Perl relies on using C's structural equivalence to
help emulate something like C++ inheritance of types. The various
data types that Perl implements are illustrated in this class hierarchy
diagram. The arrows indicate inheritance (IS-A relationships).
<p><a name="svtypes"><center><img src="svtypes.png"></center></a>
<p>As you can see, Perl uses multiple inheritance with <i>SvNULL</i>
(also named just <i>SV</i>) acting as some kind of virtual base class.
All the Perl types are identified by small numbers, and the internal
Perl code often gets away with testing the ISA-relationship between
types with the &lt;= operator. As you can see from the figure above,
this can only work reliably for some comparisons. All Perl data value
objects are tagged with their type, so you can always ask an object
what its type is and act according to this information.
<p>The symbolic <b>SvTYPE</b> names (and their associated values) as of <a name="SvTYPE">5.14</a> are:
<tr><th>svtype</th> <th>5.20</th> <th>5.14 - 5.18</th> <th>5.10</th> <th>5.6 + 5.8</th></tr>
<tr><td><b>0</b></td> <td><b>SVt_NULL</b></td> <td>SVt_NULL</td> <td>SVt_NULL</td> <td>SVt_NULL</td></tr>
<tr><td><b>1</b></td> <td><b>SVt_IV</b></td> <td>SVt_BIND</td> <td>SVt_BIND</td> <td>SVt_IV</td> </tr>
<tr><td><b>2</b></td> <td><b>SVt_NV</b></td> <td>SVt_IV</td> <td>SVt_IV</td> <td>SVt_NV</td> </tr>
<tr><td><b>3</b></td> <td><b>SVt_PV</b></td> <td>SVt_NV</td> <td>SVt_NV</td> <td>SVt_RV</td> </tr>
<tr><td><b>4</b></td> <td><b>SVt_INVLIST</b></td><td>SVt_PV</td> <td>SVt_RV</td> <td>SVt_PV</td> </tr>
<tr><td><b>5</b></td> <td><b>SVt_PVIV</b></td> <td>SVt_PVIV</td> <td>SVt_PV</td> <td>SVt_PVIV</td> </tr>
<tr><td><b>6</b></td> <td><b>SVt_PVNV</b></td> <td>SVt_PVNV</td> <td>SVt_PVIV</td> <td>SVt_PVNV</td> </tr>
<tr><td><b>7</b></td> <td><b>SVt_PVMG</b></td> <td>SVt_PVMG</td> <td>SVt_PVNV</td> <td>SVt_PVMG</td> </tr>
<tr><td><b>8</b></td> <td><b>SVt_REGEXP</b></td> <td>SVt_REGEXP</td><td>SVt_PVMG</td><td>SVt_PVBM</td> </tr>
<tr><td><b>9</b></td> <td><b>SVt_PVGV</b></td> <td>SVt_PVGV</td> <td>SVt_PVGV</td><td>SVt_PVLV</td> </tr>
<tr><td><b>10</b></td><td><b>SVt_PVLV</b></td> <td>SVt_PVLV</td> <td>SVt_PVLV</td> <td>SVt_PVAV</td> </tr>
<tr><td><b>11</b></td><td><b>SVt_PVAV</b></td> <td>SVt_PVAV</td> <td>SVt_PVAV</td> <td>SVt_PVHV</td> </tr>
<tr><td><b>12</b></td><td><b>SVt_PVHV</b></td> <td>SVt_PVHV</td> <td>SVt_PVHV</td> <td>SVt_PVCV</td> </tr>
<tr><td><b>13</b></td><td><b>SVt_PVCV</b></td> <td>SVt_PVCV</td> <td>SVt_PVCV</td> <td>SVt_PVGV</td> </tr>
<tr><td><b>14</b></td><td><b>SVt_PVFM</b></td> <td>SVt_PVFM</td> <td>SVt_PVFM</td> <td>SVt_PVFM</td> </tr>
<tr><td><b>15</b></td><td><b>SVt_PVIO</b></td> <td>SVt_PVIO</td> <td>SVt_PVIO</td> <td>SVt_PVIO</td> </tr>
<p>In addition to the simple type names already mentioned, the
following names are found in the hierarchy figure: A <i>PVIV</i> value can
hold a string and an integer value. A <i>PVNV</i> value can hold a
string, an integer and a double value. The <i>PVMG</i> is used when
magic is attached or the value is blessed. The
<i>PVLV</i> represents an LValue object.
<i>RV</i> is now a separate scalar of type <i>SVt_IV</i>.
<i>CV</i> is a code value, which represents a perl
function/subroutine/closure or contains a pointer to an XSUB.
<i>GV</i> is a glob value and <i>IO</i> contains pointers to open
files and directories and various state information about these. The
<i>PVFM</i> is used to hold information on forms.
<i>P5RX</i> was formerly called <i>PVBM</i> for Boyer-Moore
(match information), but now contains regex information.
<i>BIND</i> was an unused placeholder for read-only aliases or VIEW. (#29544, #29642)
<i>INVLIST</i> is a CORE-internal inversion list object, used for faster
utf8 matching since 5.19.2. It has the same layout as a PV.
<p>A Perl data object can change type as the value is modified. The SV is
said to be upgraded in this case. Type changes only go down the
hierarchy. (See the sv_upgrade() function in <tt>sv.c</tt>.)
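<p>The type numbering and the one-way nature of upgrades can be sketched in a few lines of C. This is an illustration only, not code from <tt>sv.h</tt> or <tt>sv.c</tt>: the enum repeats the 5.20 column of the table above, and <code>toy_has_magic_fields</code> and <code>toy_upgrade</code> are hypothetical helpers. The first shows an ordered comparison standing in for an IS-A test, the second refuses any request to move a value back to a smaller type.</p>

```c
#include <stdbool.h>

/* Type numbers from the 5.20 column of the SvTYPE table. */
enum toy_svtype {
    SVt_NULL, SVt_IV, SVt_NV, SVt_PV, SVt_INVLIST, SVt_PVIV, SVt_PVNV,
    SVt_PVMG, SVt_REGEXP, SVt_PVGV, SVt_PVLV, SVt_PVAV, SVt_PVHV,
    SVt_PVCV, SVt_PVFM, SVt_PVIO
};

/* Hypothetical helper: an ordered comparison standing in for an IS-A test.
 * Every type from PVMG upwards has MAGIC and STASH fields. */
static bool toy_has_magic_fields(enum toy_svtype t)
{
    return t >= SVt_PVMG;
}

/* Hypothetical helper: the type may stay the same or grow, never shrink.
 * Returns false when a downgrade is requested. */
static bool toy_upgrade(enum toy_svtype *type, enum toy_svtype wanted)
{
    if (wanted < *type)
        return false;          /* refused: type changes only go one way */
    *type = wanted;
    return true;
}
```

<p>As the text notes, such comparisons are only reliable for some pairs of types, so the sketch should not be read as a general subtype test.</p>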
<p>The actual layout in memory does not really match how a typical C++
compiler would implement a hierarchy like the one depicted above.
Let's see how it is done.
In the description below we use field names that match the macros that
are used to access the corresponding field. For instance the
<code>xpv_cur</code> field of the <code>xpvXX</code> structs is
accessed with the <code>SvCUR()</code> macro. The field is referred
to as <b>CUR</b> in the description below. This also matches the field
names reported by the <i>Devel::Peek</i> module.
<a name="_SV_HEAD"><h2>_SV_HEAD and struct sv</h2></a>
<p>The simplest type is the "struct sv". It represents the common
structure for an SV, <a href="#gv">GV</a>, <a href="#cv">CV</a>, <a href="#av">AV</a>, <a href="#hv">HV</a>, <a href="#io">IO</a> and P5RX, without any <a href="#svpv">struct
xpv<i>&lt;xx&gt;</i></a> attached to it. It consists of four words: the _SV_HEAD with 3
values and the SV_U union with one pointer.</p>
<b>_SV_HEAD and SV_U union</b>
<p><center><img src="svhead.png"></center>
<!-- <div align="center"><span class="i58" id="svnull-8"><a name="svnull-8" onclick="javascript:Toggle('svnull-8')"><span title="Click to hide">Until 5.8:</span><img class="i58" src="svnull-8.png" alt="svnull 5.8"></a></span>
<span class="i514" id="svhead"><a href="#svnull-8" onclick="javascript:Toggle('svnull-8')"><span title="Click to show other">Since 5.10:<img src="svhead.png" alt="_SV_HEAD 5.10"></span></a></span></div> -->
<p>The first word contains the <b>ANY</b> pointer to the optional body.
All types are implemented by attaching additional data to the ANY pointer,
except for the <a href="#svrv">RV</a>.
<p>The second word is a 32-bit unsigned integer reference counter
(<b>REFCNT</b>) which should tell us how many pointers reference this object.
When Perl data types are created this value is initialized to 1. The
field must be incremented when a new pointer is made to point to it
and decremented when the pointer is destroyed or assigned a different
value. When the reference count reaches zero the object is freed.
<p>The third word contains a <b>FLAGS</b> field and a <b>TYPE</b> field, packed into one 32-bit
unsigned integer.
<p>Since 5.10 the fourth and last HEAD word contains the <b>sv_u union</b>, which
contains a pointer to another SV (an RV), the <a href="#sviv">IV</a>
value, the <a href="#svpv">PV</a> string, the <a href="#av">AV</a> svu_array,
an <a href="#he">HE</a> hash or a <a href="#gp">GP</a> struct.
<a name="flags">The TYPE field contains a small number (0-127, mask <code>0xff</code>) that
represents one of the <code>SVt_</code> types shown in the type hierarchy figure above.
The FLAGS field has room for 24 flag bits (<code>0x00000100-0x80000000</code>),
which encode how various fields of the object should be interpreted,
and other state information. Some flags are just used as
optimizations in order to avoid having to dereference several levels
of pointers just to find that the information is not there.
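<p>A minimal sketch of the head may help to keep the four parts apart. It is not the real <code>struct sv</code>: the names <code>toy_sv</code>, <code>toy_new</code>, <code>toy_type</code>, <code>toy_refcnt_inc</code> and <code>toy_refcnt_dec</code> are made up for this example, and only three union members are shown. The mask and the two flag values are the ones given in this document.</p>

```c
#include <stdint.h>
#include <stdlib.h>

#define TOY_TYPEMASK 0x000000ffu   /* low byte of the flags word: TYPE */
#define TOY_SVf_IOK  0x00000100u   /* flag values as listed in this document */
#define TOY_SVf_POK  0x00000400u

struct toy_sv {
    void    *any;      /* ANY: pointer to the optional body */
    uint32_t refcnt;   /* REFCNT */
    uint32_t flags;    /* FLAGS in the upper 24 bits, TYPE in the low 8 */
    union {            /* sv_u (only a few members shown) */
        long           iv;
        char          *pv;
        struct toy_sv *rv;
    } u;
};

/* Create a head of the given type with a reference count of 1. */
static struct toy_sv *toy_new(uint32_t type)
{
    struct toy_sv *sv = calloc(1, sizeof *sv);
    if (sv) {
        sv->refcnt = 1;
        sv->flags  = type & TOY_TYPEMASK;
    }
    return sv;
}

static uint32_t toy_type(const struct toy_sv *sv)
{
    return sv->flags & TOY_TYPEMASK;
}

static void toy_refcnt_inc(struct toy_sv *sv)
{
    sv->refcnt++;
}

/* Returns 1 when the last reference went away and the head was freed. */
static int toy_refcnt_dec(struct toy_sv *sv)
{
    if (--sv->refcnt == 0) {
        free(sv);
        return 1;
    }
    return 0;
}
```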
<p><center><img src="flags.png"></center>
<p>The purposes of the <strong>SvFLAGS</strong> bits are:
<dt> 0x00000100 <b><a name="SVf_IOK">SVf_IOK</a></b> (public integer)
<dd> This flag indicates that the object has a valid public IVX field value.
It can only be set for value type SvIV or subtypes of it.
<br><i>(SVf_IOK was 0x00010000 until 5.10)</i>
<dt> 0x00000200 <b><a name="SVf_NOK">SVf_NOK</a></b> (public number)
<dd> This flag indicates that the object has a valid public NVX field value.
It can only be set for value type SvNV or subtypes of it.
<br><i>(SVf_NOK was 0x00020000 until 5.10)</i>
<dt> 0x00000400 <b><a name="SVf_POK">SVf_POK</a></b> (public string)
<dd> This flag indicates that the object has a valid public PVX, CUR and LEN
field values (i.e. a valid string value).
It can only be set for value type SvPV or subtypes of it.
<br><i>(SVf_POK was 0x00040000 until 5.10)</i>
<dt> 0x00000800 <b><a name="SVf_ROK">SVf_ROK</a></b> (valid reference pointer)
<dd> This flag indicates that the type should be treated as an SvRV
and that the RV field contains a valid reference pointer.
<br><i>(SVf_ROK was 0x00080000 until 5.10)</i>
<dt> 0x00001000 <b><a name="SVp_IOK">SVp_IOK</a></b> (private integer)
<dd> This flag indicates that the object has a valid non-public
IVX field value. It can only be set for value type SvIV or
subtypes of it.
<br><i>(SVp_IOK was 0x01000000 until 5.10)</i>
<p>The private OK flags (SVp_IOK, SVp_NOK, SVp_POK) are used
by the magic system. During execution of a magic callback,
the private flags are used to set the public flags.
When the callback returns, the public flags are
cleared again. This is effectively used to pass the value to
get or set to and from magic callbacks.
<dt> 0x00002000 <b><a name="SVp_NOK">SVp_NOK</a></b> (private number)
<dd> This flag indicates that the object has a valid non-public NVX
field value, a double float. It can only be set for value type SvNV
or subtypes of it.
<br><i>(SVp_NOK was 0x02000000 until 5.10)</i>
<dt> 0x00004000 <b><a name="SVp_POK">SVp_POK</a></b> (private string)
<dd> This flag indicates that the object has a valid non-public PVX, CUR and LEN
field values (i.e. a valid string value).
It can only be set for value type SvPV or subtypes of it.
<br><i>(SVp_POK was 0x04000000 until 5.10)</i>
<dt> 0x00008000 <b><a name="SVp_SCREAM">SVp_SCREAM</a></b>
<dd> A string SvPV* type has been studied.
<br><i>(SVp_SCREAM was 0x08000000 until 5.10)</i>
<dt> 0x00008000 <b><a name="SVphv_CLONEABLE">SVphv_CLONEABLE</a></b>
<dd> For PVHV (<a href="#stash">stashes</a> only) to clone its objects.
<br><i>(Introduced with 5.8.7)</i>
<dt> 0x00008000 <b><a name="SVpgv_GP">SVpgv_GP</a></b>
<dd> GV has a valid GP.
<br><i>(Introduced with 5.10)</i>
<dt> 0x00008000 <b><a name="SVprv_PCS_IMPORTED">SVprv_PCS_IMPORTED</a></b>
<dd> RV is a proxy for a constant
subroutine in another package. Set the
CvIMPORTED_CV_ON() if it needs to be
expanded to a real GV.
<br><i>(Introduced with 5.8.9)</i>
<dt> 0x00010000 <b><a name="SVf_IsCOW">SVf_IsCOW</a></b>
<dd> copy on write or shared hash key if SvLEN == 0.
<br><i>(Introduced with 5.18. This bit was used for SVs_PADSTALE, SVpad_STATE before)</i>
<dt> 0x00020000 <b><a name="SVs_PADTMP">SVs_PADTMP</a></b>
<dd> in use as tmp
<br><i>(SVs_PADTMP was 0x00000200 from 5.6-5.8)</i>
<dt> 0x00020000 <b><a name="SVs_PADSTALE">SVs_PADSTALE</a></b>
<dd> lexical has gone out of scope
<br><i>(SVs_PADSTALE was 0x00010000 from 5.10-5.14)</i>
<dt> 0x00020000 <b><a name="SVpad_TYPED">SVpad_TYPED</a></b>
<dd> pad name is a typed Lexical
<br><i>(SVpad_TYPED was 0x40000000 in 5.8)</i>
<dt> 0x00040000 <b><a name="SVs_PADMY">SVs_PADMY</a></b>
<dd> in use as a "my" variable
<br><i>(SVs_PADMY was 0x00000400 in 5.6-5.8)</i>
<dt> 0x00040000 <b><a name="SVpad_OUR">SVpad_OUR</a></b>
<dd> pad name is "our" instead of "my"
<br><i>(SVpad_OUR was 0x80000000 in 5.6-5.8)</i>
<dt> 0x00080000 <b><a name="SVs_TEMP">SVs_TEMP</a></b>
<dd> string is stealable
<br><i>(SVs_TEMP was 0x00000800 in 5.6-5.8)</i>
<dt> 0x00100000 <b><a name="SVs_OBJECT">SVs_OBJECT</a></b>
<dd> This flag is set when the object is "blessed". It can only be
set for value type SvPVMG or subtypes of it. This flag also
indicates that the STASH pointer is valid and
points to a namespace HV.
<br><i>(SVs_OBJECT was 0x00001000 in 5.6-5.8)</i>
<dt> 0x00200000 <b><a name="SVs_GMG">SVs_GMG</a></b> (Get Magic)
<dd> This flag indicates that the object has a magic <i>get</i> or
<i>len</i> method to be invoked.
It can only be set for value type SvPVMG or subtypes
of it. This flag also indicates that the MAGIC pointer is valid.
Formerly called GMAGICAL.
<br><i>(SVs_GMG was 0x00002000 in 5.6-5.8)</i>
<dt> 0x00400000 <b><a name="SVs_SMG">SVs_SMG</a></b> (Set Magic)
<dd> This flag indicates that the object has a magic <i>set</i> method to
be invoked. Formerly called SMAGICAL.
<br><i>(SVs_SMG was 0x00004000 in 5.6-5.8)</i>
<dt> 0x00800000 <b><a name="SVs_RMG">SVs_RMG</a></b> (Random Magic)
<dd> This flag indicates that the object has any other magical methods
(besides get/len/set magic method) or even methodless magic attached.
<br><i>(SVs_RMG was 0x00008000 in 5.6-5.8)</i>
<p>The SVs_RMG flag (formerly called RMAGICAL) is used mainly
for tied HV and AV (having 'P' magic) and SVs which have magic
<i>clear</i> method. It is used as an optimization to avoid
setting SVs_GMG and SVs_SMG flags for SVs which need to be
marked as MAGICAL otherwise.
Any of SVs_GMG, SVs_SMG and SVs_RMG is called MAGICAL.
<dt> 0x01000000 <b><a name="SVf_FAKE">SVf_FAKE</a></b>
<dd>0: glob or lexical is just a copy<br>
1: SV head arena wasn't malloc()ed<br>
2: in conjunction with <a href="#SVf_READONLY">SVf_READONLY</a>
marked a shared hash key scalar
(SvLEN == 0) or a copy on write
string (SvLEN != 0) until 5.18, which came with a separate SvIsCOW(sv)<br>
3: For <a href="#cv">PVCV</a>, whether CvUNIQUE(cv)
refers to an eval or once only
[CvEVAL(cv), CvSPECIAL(cv)]<br>
4: On a pad name SV, that slot in the
frame AV is a REFCNT'ed reference
to a lexical from "outside"
<br><i>(SVf_FAKE was 0x00100000 in 5.6-5.8)</i>
<dt> 0x01000000 <b><a name="SVphv_REHASH">SVphv_REHASH</a></b>
<dd>5: On a PVHV, hash values are being recalculated
<br><i>(SVphv_REHASH was 0x10000000 in 5.8)</i>
<dt> 0x02000000 <b><a name="SVf_OOK">SVf_OOK</a></b> (Offset OK)
<dd> For a PVHV this means that a hv_aux struct is present after the main array.
This flag indicates that the string has an offset at the beginning.
This flag can only be set for value type SvPVIV
or subtypes of it. It also follows that the IOK (and IOKp) flag must
be off when OOK is on. Take a look at the <a href="#SvOOK"><i>SvOOK</i></a> figure
<br><i>(SVf_OOK was 0x00200000 in 5.6-5.8)</i>
<dt> 0x04000000 <b><a name="SVf_BREAK">SVf_BREAK</a></b>
<dd>REFCNT is artificially low. Used by
SVs in final arena cleanup. Set in S_regtry on PL_reg_curpm, so that
perl_destruct() will skip it
<br><i>(SVf_BREAK was 0x00400000 in 5.6-5.8)</i>
<dt> 0x08000000 <b><a name="SVf_READONLY">SVf_READONLY</a></b>
<dd> This flag indicates that the value of the object may not be
modified. But it is also used together with SVf_FAKE and SVf_ROK
for other purposes.
<br><i>(SVf_READONLY was 0x00800000 in 5.6-5.8)</i>
<dt> 0x10000000 <b><a name="SVf_AMAGIC">SVf_AMAGIC</a></b>
<dd>has magical overloaded methods
<dt> 0x20000000 <b><a name="SVphv_SHAREKEYS">SVphv_SHAREKEYS</a></b>
<dd> Only used by HVs when the keys live on a shared string table. See
<a href="#hv">HV</a> below.
<dt> 0x20000000 <b><a name="SVf_UTF8">SVf_UTF8</a></b>
<dd>SvPV is UTF-8 encoded.
This is also set on RVs whose overloaded
stringification is UTF-8. This might
only happen as a side effect of SvPV().
<dt> 0x40000000 <b><a name="SVpav_REAL">SVpav_REAL</a></b>
<dd> Free old entries in AVs only. See description of <a href="#av">AV</a> below.
<br><i>(Introduced with 5.10)</i>
<dt> 0x40000000 <b><a name="SVphv_LAZYDEL">SVphv_LAZYDEL</a></b>
<dd> Only used by HVs. This is only set true on a PVGV when
it's playing "PVBM", but is tested for on any regular scalar
(anything &lt;= PVLV). See description of <a href="#hv">HV</a> below.
<dt> 0x40000000 <b><a name="SVpbm_VALID">SVpbm_VALID</a></b>
<dd>Clashes with SVpad_NAME. See description of <a href="#svpvbm">PVBM</a> below.
<br><i>(SVpbm_VALID was 0x80000000 in 5.6-5.8)</i>
<dt> 0x40000000 <b><a name="SVrepl_EVAL">SVrepl_EVAL</a></b>
<dd>Replacement part of s///e
<dt> 0x80000000 <b><a name="SVf_IVisUV">SVf_IVisUV</a></b>
<dd> Use XPVUV instead of XPVIV. For <a href="#sviv">IV</a>s only
(IV, PVIV, PVNV, PVMG, PVGV and maybe PVLV).
<dt> 0x80000000 <b><a name="SVpav_REIFY">SVpav_REIFY</a></b>
<dd> Can become real. For <a href="#svpvav">PVAV</a> only.
<br><i>(Introduced with 5.10)</i>
<dt> 0x80000000 <b><a name="SVphv_HASKFLAGS">SVphv_HASKFLAGS</a></b>
<dd> Keys have flag byte after hash. For <a href="#svpvhv">PVHV</a> only.
<br><i>(Introduced with 5.8.0)</i>
<dt> 0x80000000 <b><a name="SVpfm_COMPILED">SVpfm_COMPILED</a></b>
<dd> FORMLINE is compiled. For <a href="#svpvbm">PVFM</a> only.
<dt> 0x80000000 <b><a name="SVpbm_TAIL">SVpbm_TAIL</a></b>
<dd> PVGV when SVpbm_VALID is true.
Only used by SvPVBMs. See description of <a href="#svpvbm">PVBM</a> below.
<br><i>(SVpbm_TAIL was 0x40000000 in 5.6-5.8)</i>
<dt> 0x80000000 <b><a name="SVprv_WEAKREF">SVprv_WEAKREF</a></b>
<dd> RV upwards. However, SVf_ROK and SVp_IOK are exclusive. For <a href="#svrv">RV</a> only.
<dt> 0x80000000 <b><a name="SVpad_STATE">SVpad_STATE</a></b>
<dd> pad name is a "state" var
<br><i>(SVpad_STATE was 0x00010000 in 5.10-5.14)</i>
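<p>In the values listed above, each private OK flag sits exactly four bits above its public counterpart (0x00000100 and 0x00001000, and so on). The sketch below uses that to show the "private flags set the public flags, public flags get cleared afterwards" dance described for magic callbacks. The helper names <code>toy_publish_ok</code> and <code>toy_hide_ok</code> and the <code>TOY_</code> prefix are assumptions made for this example; the real code lives in the magic handling of the Perl core and is not reproduced here.</p>

```c
#include <stdint.h>

/* Flag values as listed above (5.10 and later). */
#define TOY_SVf_IOK 0x00000100u
#define TOY_SVf_NOK 0x00000200u
#define TOY_SVf_POK 0x00000400u
#define TOY_SVp_IOK 0x00001000u
#define TOY_SVp_NOK 0x00002000u
#define TOY_SVp_POK 0x00004000u

#define TOY_PRIVSHIFT  4   /* each private bit sits 4 bits above its public twin */
#define TOY_PRIVATE_OK (TOY_SVp_IOK | TOY_SVp_NOK | TOY_SVp_POK)
#define TOY_PUBLIC_OK  (TOY_SVf_IOK | TOY_SVf_NOK | TOY_SVf_POK)

/* Copy every set private OK bit onto its public counterpart. */
static uint32_t toy_publish_ok(uint32_t flags)
{
    return flags | ((flags & TOY_PRIVATE_OK) >> TOY_PRIVSHIFT);
}

/* Clear the public OK bits again, leaving the private ones alone. */
static uint32_t toy_hide_ok(uint32_t flags)
{
    return flags & ~TOY_PUBLIC_OK;
}
```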
<p>The <code>struct sv</code> is common for all variable types in
Perl. In the Perl source code this structure is typedefed to
<i>SV</i>, <i>RV</i>, <i>AV</i>, <i>HV</i>, <i>CV</i>, <i>GV</i>,
<i>IO</i> and <i>P5RX</i>. Routines that can take any type as parameter
will have <code>SV*</code> as parameter. Routines that only work with
arrays or hashes have <code>AV*</code> or <code>HV*</code>
respectively in their parameter list. Likewise for the rest.
<h2><a name="arena">Arena</a></h2>
<p>Since 5.10 SV heads and bodies are allocated in 4K arena chunks.
Heads need 4 fields, bodies are kept in unequally sized arena sets.
Some types need no body (<i>NULL, IV, RV</i>), and some allocate
only partial bodies with <i>"ghost"</i> fields.</p>
<p><b>PL_sv_arenaroot</b> points to the first reserved SV arena head with
some private arena data, a link to the next arena, some flags and the
number of free slots.<br>
<b>PL_sv_root</b> points to the chained list of free SV head slots.
When this becomes empty a new arena is allocated.</p>
<b>PL_body_arenas</b> is the head of the uneven sized linked-list of body arenas.<br>
<b>PL_body_roots[]</b> contains pointers to the list of free SV bodies per svtype.
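<p>The free-list idea behind <b>PL_sv_root</b> can be shown with a toy allocator. Everything here is a simplification and an assumption of this sketch: the arena holds only four slots, the link to the next arena is a separate field rather than a reserved head, and the names <code>toy_arenaroot</code>, <code>toy_root</code>, <code>toy_more_heads</code>, <code>toy_new_head</code> and <code>toy_del_head</code> are invented. What it does show is that a new arena is only allocated when the chain of free slots runs empty, and that a released slot is handed out again first.</p>

```c
#include <stdlib.h>
#include <stddef.h>

#define TOY_ARENA_SLOTS 4   /* tiny on purpose; not Perl's real arena size */

struct toy_head {
    struct toy_head *next_free;   /* link, only meaningful while the slot is free */
    int payload;
};

struct toy_arena {
    struct toy_arena *next;                   /* chain of all arenas */
    struct toy_head   slots[TOY_ARENA_SLOTS];
};

static struct toy_arena *toy_arenaroot;   /* plays the role of PL_sv_arenaroot */
static struct toy_head  *toy_root;        /* plays the role of PL_sv_root */
static int toy_arena_count;

/* Allocate one arena and thread all of its slots onto the free list. */
static int toy_more_heads(void)
{
    struct toy_arena *a = malloc(sizeof *a);
    if (!a)
        return 0;
    a->next = toy_arenaroot;
    toy_arenaroot = a;
    toy_arena_count++;
    for (int i = 0; i < TOY_ARENA_SLOTS; i++) {
        a->slots[i].next_free = toy_root;
        toy_root = &a->slots[i];
    }
    return 1;
}

/* Take a slot from the free list, growing by one arena when it is empty. */
static struct toy_head *toy_new_head(void)
{
    if (!toy_root && !toy_more_heads())
        return NULL;
    struct toy_head *h = toy_root;
    toy_root = h->next_free;
    return h;
}

/* Put a slot back at the front of the free list. */
static void toy_del_head(struct toy_head *h)
{
    h->next_free = toy_root;
    toy_root = h;
}
```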
<p><center><img src="arena.png"></center>
<h2><a name="svpv">SvPV</a></h2>
<p>A scalar that can hold a string value is called an
<i>SvPV</i>. In addition to the <i>SV</i> struct of SvNULL,
an <i>xpv</i> struct ("body") is allocated and it contains 3-4 fields.
<b>svu_pv</b> was formerly called <b>PVX</b> and before 5.10
it was the first field of xpv. svu_pv/PVX is the pointer to
an allocated char array. The old field names <b>must</b> be
accessed through the corresponding macros, in this case SvPVX().
<b>CUR</b> is an integer giving the current length of the string.
<b>LEN</b> is an integer giving the length of the allocated string.
The byte at (PVX + CUR) should always be '\0' in order to
make sure that the string is NUL-terminated if passed to C
library routines. This requires that LEN is always at least
1 larger than CUR.
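<p>The relation between PVX, CUR and LEN can be tried out with a small stand-alone struct. This is a sketch under simplifying assumptions, not the <code>xpv</code> layout: all three fields live in one struct called <code>toy_pv</code>, and <code>toy_grow</code> and <code>toy_catpvn</code> are hypothetical helpers. The point is the invariant from the paragraph above: after every append the byte at PVX + CUR is '\0', which forces LEN to be at least CUR + 1.</p>

```c
#include <stdlib.h>
#include <string.h>

struct toy_pv {
    char  *pvx;   /* PVX: the character buffer */
    size_t cur;   /* CUR: bytes in use, not counting the trailing NUL */
    size_t len;   /* LEN: bytes allocated */
};

/* Make sure at least `need` bytes are allocated. Returns 0 on failure. */
static int toy_grow(struct toy_pv *s, size_t need)
{
    if (need <= s->len)
        return 1;
    char *p = realloc(s->pvx, need);
    if (!p)
        return 0;
    s->pvx = p;
    s->len = need;
    return 1;
}

/* Append n bytes and keep the byte at pvx + cur equal to '\0'. */
static int toy_catpvn(struct toy_pv *s, const char *src, size_t n)
{
    if (!toy_grow(s, s->cur + n + 1))   /* +1 keeps LEN larger than CUR */
        return 0;
    memcpy(s->pvx + s->cur, src, n);
    s->cur += n;
    s->pvx[s->cur] = '\0';
    return 1;
}
```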
<p><center><img src="svpv.png"></center>
<p>The <b>POK</b> flag indicates that the string pointed to by PVX
contains a valid string value. If the POK flag is off and
the ROK flag is turned on, then the PVX field is used as a
pointer to an RV (see <a href="#svrv">SvRV</a> below) and
the struct xpv is unused. An SvPV with both the POK and ROK flags
turned off represents <i>undef</i>. The PVX pointer can
also be NULL when POK is off and no string storage has been allocated.
<p>If the string is shared, created by <b>sharepvn</b>, the PVX
is part of a <a href="#hek"><b>HEK</b></a>, i.e. the PVX points to the hek_key of the <tt>struct hek</tt>.
<p>Since 5.18 there is now a separate <b>IsCOW</b> flag indicating that
the PVX is shared as long as nobody is changing the value. The current
implementation adds a <b>COW_REFCNT</b> byte at the aligned end of the
PVX, which makes it unusable for COW in the static compiler and threads.
It also requires that LEN is always at least 2 larger than CUR to keep
the \0 byte. But beware: shared COWs use SvLEN=0 and set hek_len.
<h2><a name="svpviv">SvPVIV</a> and <a name="svpvnv">SvPVNV</a></h2>
<p>The <i>SvPVIV</i> type is like <i>SvPV</i> but has an additional
field to hold a single integer value called <b>IVX</b> in <b>xiv_u</b>. The <b>IOK</b> flag
indicates if the IVX value is valid. If both the IOK and POK flags are
on, then the PVX will (usually) be a string representation of the same
number found in IVX.
<p><center><img src="svpviv.png"></center>
<p>The <i>SvPVNV</i> type is like <i>SvPVIV</i> but uses the
single <i>double</i> value called NVX in xnv_u.
The corresponding flag is called NOK.
<p><center><img src="svpvnv.png"></center>
<h2><a name="SvOOK">SvOOK</a></h2>
As a special hack, in order to improve the speed of removing characters
from the beginning of a string, the <a href="#SVf_OOK"><i>OOK flag</i></a> is used.
<i>SvOOK_offset</i> used to be stored in SvIVX, but since 5.12 it is
stored within the first 8 bits (one char) of the buffer.
PVX, CUR and LEN are adjusted to point within the allocated string instead.
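<p>A rough sketch of the offset trick, assuming a single chop of fewer than 256 bytes. The struct <code>toy_pv</code>, the <code>ook</code> member standing in for the OOK flag, and the helpers <code>toy_chop</code> and <code>toy_alloc_start</code> are all hypothetical. The sketch keeps the offset in the byte just before the adjusted PVX, which is one way of storing it inside the buffer and an assumption of this example; the exact placement used by Perl should be checked in <tt>sv.c</tt>.</p>

```c
#include <assert.h>
#include <stddef.h>

struct toy_pv {
    char  *pvx;
    size_t cur;
    size_t len;
    int    ook;   /* stands in for the OOK flag */
};

/* Drop the first n bytes without moving the rest of the string.
 * Sketch only: handles a single chop of 1..255 bytes. */
static void toy_chop(struct toy_pv *s, size_t n)
{
    assert(!s->ook && n > 0 && n < 256 && n <= s->cur);
    s->pvx += n;
    s->cur -= n;
    s->len -= n;
    s->pvx[-1] = (char)(unsigned char)n;   /* remember the offset inside the buffer */
    s->ook = 1;
}

/* Recover the start of the allocation, e.g. before freeing it. */
static char *toy_alloc_start(const struct toy_pv *s)
{
    return s->ook ? s->pvx - (unsigned char)s->pvx[-1] : s->pvx;
}
```

<p>No bytes are copied: removing a prefix costs a pointer adjustment and one stored byte, which is the whole point of the hack.</p>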
<p><center><img src="ook.png"></center>
<h2><a name="sviv">SvIV</a></h2>
Since 5.10 for a raw IV (without PV) the IVX slot is in the HEAD,
there is no xpviv struct ("body") allocated.
The <i>SvIVX</i> macro abuses SvANY pointer arithmetic to point
to a compile-time calculated negative offset from HEAD-1 to sv_u.svu_iv,
so that PVIV and IV can use the same SvIVX macro.
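<p>The trick is easier to see with toy structs. In the sketch below nothing is taken from the Perl headers: <code>toy_xpviv</code> has an illustrative field order, ANY is kept as an integer (<code>uintptr_t</code>) so that the example does not form out-of-range pointers, and <code>toy_ivx</code>, <code>toy_init_iv</code> and <code>toy_init_pviv</code> are hypothetical names. For the bodyless IV, ANY is biased by the offset of the integer field inside the body, so that one and the same accessor ends up at the slot in the head.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Toy body for a PVIV; the field order is illustrative only. */
struct toy_xpviv {
    void  *stash;
    void  *magic;
    size_t cur;
    size_t len;
    long   ivx;
};

struct toy_sv {
    uintptr_t any;      /* ANY, kept as an integer for this sketch */
    uint32_t  refcnt;
    uint32_t  flags;
    union { long iv; char *pv; } u;   /* sv_u */
};

/* One accessor for both layouts: always "ANY plus the offset of ivx". */
static long *toy_ivx(struct toy_sv *sv)
{
    return (long *)(sv->any + offsetof(struct toy_xpviv, ivx));
}

/* Bodyless IV: bias ANY so that the accessor lands on sv_u.iv in the head. */
static void toy_init_iv(struct toy_sv *sv)
{
    sv->any = (uintptr_t)&sv->u.iv - offsetof(struct toy_xpviv, ivx);
}

/* PVIV: ANY refers to a real body. */
static void toy_init_pviv(struct toy_sv *sv, struct toy_xpviv *body)
{
    sv->any = (uintptr_t)body;
}
```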
<p><center><img src="sviv.png"></center>
<h2><a name="svnv">SvNV</a></h2>
Since 5.10 for a raw NV (without PV) the xpvnv struct is not fully allocated,
only the needed body size.
<p><center><img src="svnv.png"></center>
<h2><a name="svrv">SvRV</a></h2>
The <i>SvRV</i> type uses the fourth HEAD word sv_u.svu_rv as pointer to an SV
(which can be any of the SvNULL subtypes), AV or HV.<br>
An SvRV object with ROK flag off represents an undefined value.<br>
The separate SVt_RV was replaced in 5.12 with <a href="#sviv">SVt_IV</a> and an SVf_ROK flag.
<p><center><img src="svrv.png"></center>
<h2><a name="svpvmg">SvPVMG</a></h2>
<p>Used for blessed scalars or scalars with other magic attached. <i>SvPVMG</i> has two
additional fields: MAGIC and STASH. MAGIC is a pointer to additional
structures that contain callback functions and other data. If the
MAGIC pointer is non-NULL, then one or more of the MAGICAL flags will
be set.
<p>STASH (<b>s</b>ymbol <b>t</b>able h<b>ash</b>) is a pointer to a HV
that represents some namespace/class/package. (That the HV represents a
namespace means that the NAME field of the HV must be non-NULL. See
description of <a href="#hv">HVs</a> and <a href="#stash">stashes</a> below.)
The STASH field is set when the value is blessed into a package
(becomes an object). The OBJECT flag will be set when STASH is.
<small><i>(IMHO, this field should really have been named "CLASS".
The GV and CV subclasses introduce their own unrelated fields called
STASH which might be confusing.)</i></small>
<p><center><img src="svpvmg.png"></center>
<p>The field MAGIC points to an instance of <code>struct magic</code>
(typedef'ed as <code>MAGIC</code>). This struct has 8 fields:
<li><i>moremagic</i> is a pointer to another MAGIC and is used to form a
singly linked list of the MAGICs attached to an SV.
<li><i>virtual</i> is a pointer to a struct containing 5-8 function
pointers. The functions (if set) are invoked when the corresponding
action happens to the SV.
<li><i>private</i> is a 16 bit number (U16) not used by Perl.
<li><i>type</i> is a character field and is used to denote which kind
of magic this is. The interpretation of the rest of the fields depends
on the <i>type</i> (actually it is the callbacks attached to
<i>virtual</i> that do any interpretation). There is usually a direct
correspondence between the <i>type</i> field and the <i>virtual</i> field.
<li><i>flags</i> contains 8 flag bits, where 2 of them are generally used. Bit
2 is the <b>REFCOUNTED</b> flag. It indicates that the <i>obj</i> is assumed to
be an SV and that its reference count must be decremented when this magic is
freed. Self-referencing magic (obj &lt;=&gt; sv) does not have the REFCOUNTED flag set,
so that no self-referencing loops can appear on destruction. The <b>GSKIP</b> flag
indicates that invocation of the magical GET method should be suppressed. Other
flag bits are used depending on the kind of magic.
<li><i>obj</i> is usually a pointer to some SV, <i>SvTIED_obj</i>.
How it is used depends on the kind of magic this is.
<li><i>ptr</i> is usually a pointer to some character <i>MgPV</i> string. How it
is used depends on the kind of magic this is. If the <i>len</i> field
is &gt;= 0, then <i>ptr</i> is assumed to point to a malloced buffer and
will be automatically freed when the magic is.
<li><i>len</i> is usually the length of the character string pointed
to by <i>ptr</i>. How it is used depends on the kind of magic this is.
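<p>Since the MAGICs attached to one SV form a linked list through <i>moremagic</i>, looking for a particular kind of magic is a plain list walk on the <i>type</i> character. The following sketch uses a cut-down struct with only three of the fields, and both <code>toy_magic</code> and <code>toy_mg_find</code> are names invented for this example rather than the definitions from the Perl headers.</p>

```c
#include <stddef.h>

/* Cut-down MAGIC: only the fields this sketch needs. */
struct toy_magic {
    struct toy_magic *moremagic;   /* next MAGIC attached to the same SV */
    char              type;        /* which kind of magic this is */
    void             *obj;
};

/* Walk the moremagic chain and return the first entry of the given type. */
static struct toy_magic *toy_mg_find(struct toy_magic *mg, char type)
{
    for (; mg; mg = mg->moremagic)
        if (mg->type == type)
            return mg;
    return NULL;
}
```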
The <code>struct magic_state</code> is stored on the global <a
href="#stacks">savestack</a>. <i>mgs_sv</i> points to our magical sv,
and <i>mgs_ss_ix</i> points on the savestack after the saved destructor.
<h2><a name="svpvbm">SvPVBM (old)</a></h2>
Since 5.10 <i>SvPVBM</i>s are really <i>PVGV</i>s, with the <b>VALID</b> flag set
and "B" magic attached. Before that, <i>SvPVBM</i>s were <i>SV</i> objects of their own.<p>
<p>The <i>SvPVBM</i> is like <a href="#svpvmg">SvPVMG</a> above.
It uses the <code>xnv_u</code> union for three additional values in <code>xbm_s</code>:
<code>U32 BmPREVIOUS, U8 BmUSEFUL, U8 BmRARE</code>.
The SvPVBM value types are used internally to implement very
fast lookup of the string in PVX using the "Boyer-Moore" algorithm.
They are used by the Perl <code>index()</code> builtin when the search string is a
constant, as well as in the RE engine. The <tt>fbm_compile()</tt>
function turns normal SvPVs into this value type.<p>
<p>A table of 256 elements is appended to the PVX. This table
contains the distance from the end of string of the last occurrence of
each character in the original string. (In recent Perls, the table is
not built for strings shorter than 3 characters.) In addition
fbm_compile() locates the rarest character in the string (using
builtin letter frequency tables) and stores this character in the
<i>BmRARE</i> field. The <i>BmPREVIOUS</i> field is set to the location of the
first occurrence of the rare character. <i>BmUSEFUL</i> is incremented
(decremented) by the RE engine when this constant substring (does not)
help in optimizing RE engine access away. If it goes below 0, then
the corresponding substring is forgotten and freed.
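<p>The distance table described above can be computed in two short loops. This is a generic sketch of that description, not a copy of <tt>fbm_compile()</tt>: the function name <code>toy_bm_table</code> is invented, the entries are <code>size_t</code> rather than whatever width Perl stores after the PVX, and bytes that do not occur in the string are given the full string length, which is the usual convention for this kind of table and an assumption here.</p>

```c
#include <stddef.h>

/* Fill table[c] with the distance from the end of `pat` to the last
 * occurrence of byte c; bytes that do not occur get the pattern length. */
static void toy_bm_table(const unsigned char *pat, size_t len, size_t table[256])
{
    for (size_t c = 0; c < 256; c++)
        table[c] = len;
    for (size_t i = 0; i < len; i++)
        table[pat[i]] = len - 1 - i;   /* later occurrences overwrite earlier ones */
}
```

<p>During a search, the entry for the text byte aligned with the end of the pattern tells how far the pattern can be moved forward without missing a match.</p>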
<p><center><img src="svpvbm.png"></center>
<p>The extra SvPVBM information and the character distance table is
only valid when the <b>VALID</b> flag is on. A magic structure with
the sole purpose of turning off the VALID flag on assignment, is always
attached to a <i>valid</i> SvPVBM.
<p>The <b>TAIL</b> flag is used to indicate that the search for the SvPVBM
should be <i>tail anchored</i>, i.e. a match should only be considered
at the end of the string (or before newline at the end of the string).
<h2><a name="p5rx">REGEXP (P5RX)</a></h2>
<p>The structures behind the P5RX, the <i>struct regexp</i>, store the compiled and
optimized state of a perl regular expression. New here are support for pluggable
regex engines (the original engine was <a
href="">criticized</a>: <i>"Thompson NFA
for abnormal expressions would be linear, but does not support backtracking"</i>),
non-recursive execution, and faster trie structures for alternations.
See <a href="">re::engine::RE2</a>
for the fast DFA implementation without backrefs.</p>
<p>The <i>struct regexp</i> contains the compiled bytecode of the expression and
some meta-information about the regex,
such as the engine used, the precomp and the number of pairs of backreference parentheses.
<i>reg_data</i> contains code and pad pointers for EXEC items in the bytecode.</p>
<p>Since 5.11 the REGEXP is separate from a PVMG, blessed into the "Regexp" package, with the
SvANY pointing to the struct regexp, and SvPVX pointing to the string representation of the qr//.<br/>
Since 5.17.6 the SvANY ptr is the same as the SvPVX pointer, and the SvPVX pointer (i.e. <tt>sv_u.svu_rx</tt>)
is now used to access the regexp via ReANY().</p>
<p align="center"><img src="mjd-regexp.gif"><br>
<i>Marc Jason Dominus -</i></p>
Nobody has so far managed a successful freeze/thaw of those internal structures,
but we have Abhijit's <code>re_dup()</code> to clone a regexp,
and we can simply recompile along these lines:
<pre>
PM_SETRE(&amp;pm, CALLREGCOMP(newSVpv($restring), $op->pmflags));
RX_EXTFLAGS(PM_GETRE(&amp;pm)) = $op->reflags;
</pre>
BTW: Marc-Jason Dominus implemented a debugger for the compiled Rx bytecode
<a href=""></a>.<p>
See <b>perlreguts</b> for some details.
<h2><a name="svpvlv">SvPVLV</a></h2>
The <i>SvPVLV</i> is like <a href="#svpvmg"><i>SvPVMG</i></a> above, but has four additional
fields; TARGOFF, TARGLEN, TARG, TYPE. The typical use is for Perl
builtins that can be used in the LValue context (substr, vec,...).
They will return an SvPVLV value which, when assigned to, uses magic to
affect the <i>target</i> object, which it keeps a pointer to in the TARG field.
The xiv_u union is used as the GvNAME field, pointing to a namehek.
<p>The TYPE is a character variable. It encodes the kind of LValue
this is. Interpretation of the other LValue fields depends on the TYPE.
The SvPVLVs are (almost) always magical. The magic type will match
the TYPE field of the SvPVLV. The types are:
<blockquote><dl compact>
<dt> <b>'x'</b>
<dd> Type-x LVs are returned by the <code>substr($string,
$offset, $len)</code> builtin.
<dt> <b>'v'</b>
<dd> Type-v LVs are returned by the <code>vec($string,
$offset, $bits)</code> builtin.
<dt> <b>'.'</b>
<dd> Type-. LVs are returned by the <code>pos($scalar)</code> builtin.
<dt> <b>'k'</b>
<dd> Type-k LVs are returned when <code>keys %hash</code> is
used on the left side of the assignment operator.
<dt> <b>'y'</b>
<dd> Type-y LVs are used by auto-vivification (of hash and array
elements) and the foreach array iterator variable.
<dt> <b>'/'</b>
<dd> Used by <i>pp_pushre</i>. <i>(I don't understand this yet.)</i>
<p>The figure below shows an SvPVLV as returned from the
<code>substr()</code> builtin. The first substr parameter (the
string to be affected) is assigned to the TARG field. The substr
offset value goes in the TARGOFF field and the substr length parameter
goes in the TARGLEN field.
<p><center><img src="svpvlv.png"></center>
<p>When assignment to an SvPVLV type occurs, then the value to be
assigned is first copied into the SvPVLV itself (and affects the PVX,
IVX or NVX). After this the magic SET method is invoked, which will
update the TARG accordingly.
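<p>The two-step assignment described above (copy the value into the LV, then
let the SET method update TARG) can be sketched in plain C. This is only an
illustration: <code>toy_lv</code>, <code>toy_lv_set</code> and
<code>toy_lv_assign</code> are hypothetical names, strings are simple
NUL-terminated buffers instead of SVs, and none of this is the actual Perl
implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for a type-'x' SvPVLV. */
typedef struct {
    char   *pvx;      /* value most recently assigned to the LV */
    char  **targ;     /* TARG: the string that substr() was applied to */
    size_t  targoff;  /* TARGOFF: start of the affected range */
    size_t  targlen;  /* TARGLEN: length of the affected range */
} toy_lv;

/* Models the magic SET method: splice lv->pvx into the target string. */
static void toy_lv_set(toy_lv *lv)
{
    const char *old = *lv->targ;
    size_t oldlen = strlen(old);
    size_t newlen = strlen(lv->pvx);
    size_t tail   = lv->targoff + lv->targlen;  /* first byte after the range */
    char  *res;

    assert(tail <= oldlen);
    res = malloc(oldlen - lv->targlen + newlen + 1);  /* error handling omitted */
    memcpy(res, old, lv->targoff);                    /* part before the range */
    memcpy(res + lv->targoff, lv->pvx, newlen);       /* the new value */
    strcpy(res + lv->targoff + newlen, old + tail);   /* part after the range */
    free(*lv->targ);
    *lv->targ = res;
}

/* Models "substr($str, $off, $len) = $value": copy first, then run SET. */
static void toy_lv_assign(toy_lv *lv, const char *value)
{
    free(lv->pvx);
    lv->pvx = malloc(strlen(value) + 1);              /* error handling omitted */
    strcpy(lv->pvx, value);
    toy_lv_set(lv);
}
```

<p>With a heap-allocated target holding <code>"hello world"</code>, TARGOFF 6
and TARGLEN 5, assigning <code>"perl"</code> leaves the target as
<code>"hello perl"</code>.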
<h2><a name="av">AV</a></h2>
<p>An array is in many ways represented similarly to strings.
An AV contains all the fields of SvPVMG, but not more.
Some fields of xpvav and sv have been renamed.
ARYLEN uses the MAGIC field to point to a magic SV
(which is returned when <code>$#array</code> is requested) and is only created on demand.
IVX has become ALLOC, which is a pointer to the allocated array.
PVX in the sv_u has become ARRAY, the direct pointer to the current array start,
CUR has become FILL and LEN has become MAX.
One difference is that the value of FILL/MAX is always one
less than CUR/LEN would be in an SvPV.
The NVX field is unused.
<p>The previous extra FLAGS field in the xpvav has been merged into the sv_flags field.
<p><center><img src="av.png"></center>
<p>The array pointed to by ARRAY contains pointers to any of the
SvNULL subtypes. Usually ALLOC and ARRAY both point to the start of
the allocated array. The use of two pointers is similar to the OOK
hack described <a href="#SvOOK">above</a>. The shift operation can be implemented
efficiently by just adjusting the ARRAY pointer (and FILL/MAX).
Similarly, the pop just involves decrementing the FILL count.
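<p>A minimal sketch of why two pointers make <code>shift</code> cheap. The
struct and function names (<code>toy_av</code>, <code>toy_av_shift</code>,
<code>toy_av_pop</code>) are made up for this example, and the elements are
plain ints rather than SV pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the AV fields discussed above. */
typedef struct {
    int      *alloc;  /* ALLOC: start of the allocated block */
    int      *array;  /* ARRAY: current first element */
    ptrdiff_t fill;   /* FILL: index of the last used element, -1 if empty */
    ptrdiff_t max;    /* MAX: index of the last allocated slot, from ARRAY */
} toy_av;

/* shift: nothing is copied; ARRAY moves forward and FILL/MAX shrink. */
static int toy_av_shift(toy_av *av)
{
    int first;
    assert(av->fill >= 0);
    first = *av->array;
    av->array++;
    av->fill--;
    av->max--;
    return first;
}

/* pop: only FILL changes. */
static int toy_av_pop(toy_av *av)
{
    assert(av->fill >= 0);
    return av->array[av->fill--];
}
```

<p>After a shift, ALLOC still points at the start of the block, so the slots
in front of ARRAY remain available for later reuse or for freeing the block.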
<p>There are only 2 array flags defined:
<dl>
<dt> <b>SVpav_REAL</b>
<dd> It basically means that all
SVs contained in this array are owned by it and must have their
reference counters decremented when the reference is removed
from the array. All normal arrays are REAL. For the
<code>stack</code> the REAL flag is turned off.
For <code>@_</code> the REAL flag is initially turned off.
<dt> <b>SVpav_REIFY</b>
<dd> The array is <i>not</i> REAL but should be made REAL if modified.
The <code>@_</code> array will have the REIFY flag turned on.
</dl>
<h2><a name="hv">HV</a></h2>
<p>Hashes are the most complex of the Perl data types. In addition to
what we have seen above, the very last index in the HE*[] points to a
new xpvhv_aux struct. HVs use <i>HE</i> structs to represent
"hash element" key/value pairs and <i>HEK</i> structs to represent
"hash element keys".
<dl>
<dt><b>RITER, EITER</b>:
<dd>These two fields are used to implement a single iterator over the
elements in the hash.
RITER is an integer index into the array referenced by ARRAY and
EITER is a pointer to an HE. In order to find the next hash
element one first looks at EITER-&gt;next and, if it turns out to be
NULL, RITER is incremented until ARRAY[RITER] is non-NULL. The
iterator starts out with RITER = -1 and EITER = NULL.<p>
<dt><b>NAME</b>:
<dd>Until 5.8 NAME was a NUL-terminated string which denoted the fully qualified name of the
name space (aka <i>package</i>). This was one of the few places where
Perl did not allow strings with embedded NULs.<br>
Since 5.10 the value of NAME points to a HEK,
and since 5.14 to one HEK if name_count == 0, or to two HEKs,
where HEK[0] is the effective stash name (HvENAME_HEK_NN)
if name_count &gt; 0 or HEK[1] if name_count &lt; 0.
<dt><b>GvSTASH</b> (until 5.8):
<dd>When the hash represented a name space (<a href="#stash"><i>stash</i></a>)
GvSTASH (formerly called PMROOT) pointed to a node in the Perl syntax
tree. It was used to implement the reset() builtin for REs.<p>
</dl>
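<p>The RITER/EITER iterator described above can be sketched as follows. The
types and the function <code>toy_hv_iternext</code> are hypothetical
simplifications for illustration and do not match the real structs.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_he {
    struct toy_he *next;   /* next HE in the same bucket */
    const char    *key;
} toy_he;

typedef struct {
    toy_he  **array;  /* ARRAY: bucket heads */
    ptrdiff_t max;    /* MAX: number of buckets minus one */
    ptrdiff_t riter;  /* RITER: current bucket index, -1 before the start */
    toy_he   *eiter;  /* EITER: current element, NULL before the start */
} toy_hv;

/* Returns the next element, or NULL (and resets the iterator) at the end. */
static toy_he *toy_hv_iternext(toy_hv *hv)
{
    /* First try the chain of the current element ... */
    toy_he *he = hv->eiter ? hv->eiter->next : NULL;

    /* ... then advance RITER until a non-empty bucket is found. */
    while (he == NULL) {
        if (hv->array == NULL || ++hv->riter > hv->max) {
            hv->riter = -1;
            hv->eiter = NULL;
            return NULL;
        }
        he = hv->array[hv->riter];
    }
    hv->eiter = he;
    return he;
}
```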
<p>The first few fields of the xpvhv have been renamed in the same way
as for AVs. <b>MAX</b> is the number of elements in ARRAY minus one. (The
size of the ARRAY is required to be a power of 2, since the code that
deals with hashes just masks off the last few bits of the HASH value to
locate the correct HE column for a key: <code>ARRAY[HASH &amp;
MAX]</code>). Also note that ARRAY can be NULL when the hash is empty
(but the MAX value will still be at least 7, which is the minimum
value assigned by Perl.)<br>
The <b>FILL</b> is the number of elements in ARRAY which are not NULL. The
IVX field has been renamed <b>KEYS</b> and is the number of hash elements in
the hash.
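<p>As a small illustration of the masking step, assuming an unsigned hash
value and a hypothetical helper named <code>toy_bucket</code>: because the
bucket count is a power of two, <code>HASH &amp; MAX</code> gives the same
result as <code>HASH % (MAX + 1)</code> without a division.

```c
#include <assert.h>
#include <stddef.h>

/* Bucket index for a hash value, given MAX = (number of buckets) - 1.
   Only meaningful when the bucket count is a power of two. */
static size_t toy_bucket(unsigned long hash, size_t max)
{
    assert(((max + 1) & max) == 0);   /* bucket count is a power of 2 */
    return (size_t)(hash & max);
}
```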
<p><center><img src="hv.png"></center>
<p>The <a name="he"><b>HE</b></a>s are simple structs containing 3 pointers: a pointer to the
next HE, a pointer to the key and a pointer to the value of the given hash
element.
<p>The <a name="hek"><b>HEK</b></a>s are special variable sized structures that store the hash
keys. They contain 4 fields: the computed <i>hash</i> value of the string,
the <i>len</i>gth of the string, <i>len</i>+1 bytes for the
key string itself (including trailing NUL), and a trailing byte for
HEK_FLAGS <i>(since 5.8)</i>.
As a special case, a <i>len</i> value of <code>HEf_SVKEY</code> (-2)
indicates that a pointer to an SV is stored in the HEK instead of a
string. This hack is used for some magical hashes.
<p>In a perfect hash both KEYS and FILL are the same value. This
means that all HEs can be located directly from the pointer in the
ARRAY (and all the he-&gt;next pointers are NULL).
<p>The following two hash specific flags are found among the common
SvNULL flags:
<dl>
<dt> 0x20000000 <b><a href="#SVphv_SHAREKEYS">SVphv_SHAREKEYS</a></b>
<dd> When this flag is set, the hash will share the HEK structures
with a special hash pointed to by the <code>strtab</code> variable.
This reduces the storage occupied by hash keys, especially when we
have lots of hashes with the same keys.
The SHAREKEYS flag is on by default for newly created HVs.
<center><img src="strtab.png"></center>
What is special about the <code>strtab</code> hash is that the <i>val</i>
field of the HE structs is used as a reference counter for the
HEK. The counter is incremented when new hashes link up this HEK
and decremented when the key is removed from the hashes.
When the reference count reaches 0, the HEK (and corresponding HE)
is removed from <code>strtab</code> and the storage is freed.
<dt> 0x40000000 <b><a href="#SVphv_LAZYDEL">SVphv_LAZYDEL</a></b>
<dd>This flag indicates that the hash element pointed to by EITER has
really been deleted. When you delete the current hash element, perl
only marks the HV with the LAZYDEL flag, and the element is zapped
once the iterator is advanced. This makes it possible
to delete elements in a hash while iterating over it.
</dl>
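<p>The reference counting used by <code>strtab</code> for shared keys can be
illustrated with a toy table. This is a sketch only: it uses a linked list
instead of a hash, and the names <code>toy_shared</code>,
<code>toy_share</code> and <code>toy_unshare</code> are invented for the
example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared-key table: one entry per distinct key. The slot a
   normal HE uses for its value holds a use count here. */
typedef struct toy_shared {
    struct toy_shared *next;
    size_t             refcnt;  /* plays the role of the HE "val" field */
    char               key[];   /* key bytes follow the header */
} toy_shared;

static toy_shared *toy_strtab = NULL;

/* Return the shared copy of key, creating it on first use. */
static const char *toy_share(const char *key)
{
    toy_shared *e;
    for (e = toy_strtab; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            e->refcnt++;
            return e->key;
        }
    }
    e = malloc(sizeof *e + strlen(key) + 1);   /* error handling omitted */
    strcpy(e->key, key);
    e->refcnt = 1;
    e->next = toy_strtab;
    toy_strtab = e;
    return e->key;
}

/* Drop one use; unlink and free the entry when the count reaches zero. */
static void toy_unshare(const char *key)
{
    toy_shared **p;
    for (p = &toy_strtab; *p != NULL; p = &(*p)->next) {
        if (strcmp((*p)->key, key) == 0) {
            toy_shared *e = *p;
            if (--e->refcnt == 0) {
                *p = e->next;
                free(e);
            }
            return;
        }
    }
}
```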
<h2><a name="gv">GV</a></h2>
The <i>GV</i> ("glob value" aka "symbol") shares the same structure as the <i>SvPVMG</i>.
<p>The <a href="#gp">GP</a> is a pointer to a structure that holds pointers to data of
various kinds. Perl uses a pointer, instead of including the GP fields
in the xpvgv, in order to implement the proper glob aliasing
behavior (i.e. different GVs can share the same GP).
<p>The NAMEHEK denotes the unqualified name of this symbol
and GvSTASH points to the symbol table where this symbol
belongs. The fully qualified symbol name is obtained by
taking the NAME of the GvSTASH (see <a href="#hv">HV</a>
above) and appending "::" and NAME to it. The hash pointed
to by GvSTASH will usually contain an element with NAME as
key and a pointer to this GV as value. See description of
<a href="#stash">stashes</a> below.
<p>A magic of type '*' is always attached to the GV (not shown in the
figure). The magic GET method is used to stringify the glob (as the fully
qualified name prefixed with '*'). The magic SET method is used to alias
a glob based on the name of another glob.
<p><center><img src="gv.png"></center>
<p><a name="GvFLAGS"><b>GvFLAGS</b></a>:
<dl>
<dt>0x1) <b>INTRO</b>
<dt>0x2) <b>MULTI</b>
<dd> Have we seen more than one occurrence of this glob? Used to
implement the "possible typo" warning.
<dt>0x4) <b>ASSUMECV</b>
<dd> The GV is most likely a CV.
<dt>0x8) <b>IN_PAD</b>
<dd> With ithreads new GVs are created temporarily on the PAD, and not as global SVs.
<dt>0x10) <b>IMPORTED_SV</b>
<dt>0x20) <b>IMPORTED_AV</b>
<dt>0x40) <b>IMPORTED_HV</b>
<dt>0x80) <b>IMPORTED_CV</b>
</dl>
<h3><a name="gp">GP</a></h3>
<p>GPs can be shared between one or more GVs. The data type fields
for the GP are: SV, IO, FORM, AV, HV, CV. These hold a pointer to the
corresponding data type object. (The SV must point to some simple SvNULL
subtype (i.e. with type &lt;= SVt_PVLV). The FORM field must point to a
SvPVFM if non-NULL. The IO field must point to an IO if non-NULL, the AV
to an AV, etc.) The SV is always present (but might point to a
SvNULL object). All the others are initially NULL.
<p>The additional administrative fields in the GP are: CVGEN, REFCNT, EGV,
FILE_HEK and LINE.
<p>REFCNT is a reference counter. It says how many GVs have a pointer
to this GP. It is incremented/decremented as new GVs reference/forget
this GP. When the counter reaches 0 the GP is freed.
<p>EGV, the "effective gv", if *glob, is a pointer to the GV that
originally created this GP (used to tell the real name of any aliased
symbol). If the original GV is freed, but the GP should stay since
another GV references it, then the EGV is NULLed.
<p>CVGEN is an integer used to validate method cache CV entries in the
GP. If CVGEN is zero, then the CV is real. If CVGEN is non-zero, but
less than the global variable <tt>subgeneration</tt>, then the CV
contains a stale method cache entry. If CVGEN is equal to
<tt>subgeneration</tt> then the CV contains a valid method cache entry.
Every time some operation that might invalidate some of the
method caches is performed, the <tt>subgeneration</tt> variable
is incremented.
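<p>The CVGEN rule can be written down as a small classification function.
The enum and the names <code>toy_classify</code> and
<code>toy_subgeneration</code> are hypothetical; only the three cases come
from the description above.

```c
#include <assert.h>

typedef enum { TOY_REAL_CV, TOY_CACHE_VALID, TOY_CACHE_STALE } toy_cv_state;

/* Stand-in for the global subgeneration counter. */
static unsigned long toy_subgeneration = 1;

/* Classify the CV slot of a GP from its CVGEN value. */
static toy_cv_state toy_classify(unsigned long cvgen)
{
    if (cvgen == 0)
        return TOY_REAL_CV;          /* a real sub, not a cache entry */
    if (cvgen == toy_subgeneration)
        return TOY_CACHE_VALID;      /* cached since the last invalidation */
    return TOY_CACHE_STALE;          /* cached before it: look up again */
}
```

<p>Invalidating every cached entry at once therefore costs a single
increment of the counter; no GP has to be visited.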
<p>FILE_HEK is the name of the file where this symbol was first created.
<p>LINE is the corresponding line number in the file.
<h3><a name="stash">Stashes</a></h3>
GVs and stashes work together to implement the name spaces of Perl.
Stashes are named HVs with all the element values being pointers to
GVs. The root of the namespace is pointed to by the global variable
<code>defstash</code>.
<p>In the figure below we have simplified the representation of
stashes to a single box. The text in the blue field is the NAME of
the HV/stash. The hash element keys are shown as field names and the
element values are shown as pointers to globs (GV). The GVs are
also simplified to a single box. The text in the green field is the
fully qualified name of the GV. Only the GP data fields are shown (and
FORM has been eliminated because it was not 2 letters long :-).
<p>The figure illustrates how the scalar variables <code>$::foo</code>
and <code>$foo::bar::baz</code> are represented by Perl.
<p><center><img src="stash.png"></center>
<p>All resolution of qualified names starts with the stash pointed to
by the <code>defstash</code> variable. Nested name spaces are
implemented by a stash entry with a key ending in "<code>::</code>".
The entry for "<code>main::</code>" ensures that <code>defstash</code> is also
known as the "<code>main</code>" package (and has the side-effect that the
"<code>main::main::main</code>" package is <code>defstash</code> too.)
Unqualified names are resolved starting at <code>curstash</code> or
<code>curcop-&gt;cop_stash</code> which are influenced by the
<code>package</code> declaration in Perl.
<p>As you can see from this figure, there are lots of pointers to
dereference in order to look up deeply nested names. Each stash
is at least 4 levels deep and each glob is 3 levels, giving at least
24 pointer dereferences to access the data in the
<code>$foo::bar::baz</code> variable from <code>defstash</code>.
<p>The <code>defstash</code> stash is also a place where globs
representing source files are entered. These entries are prefixed
with "<code>_&lt;</code>". The FILEGV field of the GP points to the
same glob as the corresponding "<code>_&lt;</code>" entry in
<code>defstash</code> does.
<h2><a name="cv">CV</a></h2>
The <i>CV</i> ("code value") is like <a href="#svpvmg"><i>SvPVMG</i></a> above, but has
some renamed and additional fields, including CvSTASH, START, ROOT, GV, FILE,
DEPTH and PADLIST.
<p><center><img src="cv.png"></center>
The <code>CvSTASH</code> is a pointer to the <a href="#stash">stash</a> in which the CV was <i>compiled</i>.<p>
<code>START</code> and <code>ROOT</code> point to the start and the root of the compiled op tree for this function.<p>
DEPTH and <a href="#pad">PADLIST</a> are needed to access and check the
current scratchpad.
Lexicals are accessed by the OP->targ index into the PADLIST.
<p>See <a href="#pad">PAD</a>s and <a href="#op">OP</a>s below.
<h2><a name="svpvfm">SvPVFM</a></h2>
The <i>SvPVFM</i> is like <a href="#cv"><i>CV</i></a> above, but adds a single field
called LINES.
<p><center><img src="svpvfm.png"></center>
<a name="io"><h2>IO</h2></a>
The <i>IO</i> is like <a href="#svpvmg"><i>SvPVMG</i></a> above, but has quite a few
additional fields.
<p><center><img src="io.png"></center>
<dt>1 IOf_ARGV this fp iterates over ARGV
<dt>2 IOf_START check for null ARGV and substitute '-'
<dt>4 IOf_FLUSH this fp wants a flush after write op
<dt>8 IOf_DIDTOP just did top of form
<dt>16 IOf_UNTAINT consider this fp (and its data) "safe"
<dt>32 IOf_NOLINE slurped a pseudo-line from empty file
<dt>64 IOf_FAKE_DIRP xio_dirp is fake (source filters kludge)
<a name="pad"><h2>PAD</h2></a>
<p>A <code>PAD</code> is a list (AV) of elements for Perl variables for
each subroutine. PADs ("Scratchpads") are used by Perl to store
lexical variables, op targets and constants. Every <code>TARG</code>
argument for an OP (see below) is an index into the <code>PAD</code>,
and each recursion level has its own <code>PAD</code>.</p>
<p><center><img src="pad.png"></center></p>
<p>Each new sub creates a <code>PADLIST</code> of length 1, which points
to the current PAD, the <code>PL_curpad</code>, indexed by
<code>TARG</code>. The 0'th entry of the <code>CvPADLIST</code> is an
AV which represents the "names" or rather the "static type
information" for lexicals.</p>
<p>The <code>CvDEPTH</code>'th entry of <code>CvPADLIST</code> AV is an
AV which is the stack frame at that depth of recursion into the
CV. The 0'th slot of a frame AV is an AV which is
<code>@_</code>. Other entries are storage for variables and op
targets, the scratchpads.</p>
<p>During compilation a simplified scratchpad is used. The current
<code>PL_comppad</code> is just a PAD which holds the <code>TARG</code>
variables directly, without the indirection which is needed for run-time
recursion and threading.
<p>During compilation: <code>PL_comppad_name</code> is set to the names
AV, the declared type information. <code>PL_comppad</code> is set to
the frame AV for the frame <code>CvDEPTH == 1</code>.
<code>PL_curpad</code> is set to the body of the frame AV
(i.e. <code>AvARRAY(PL_comppad)</code>).<br> During
execution, <code>PL_comppad</code> and <code>PL_curpad</code> refer to
the live frame of the currently executing sub.</p>
<p>Since 5.18 PADLISTs are refcounted, with a separate <code>struct padlist</code>,
not depicted here yet.</p>
<p>Lexicals (my and our variables) have <code>SVs_PADMY</code> /
<code>SVs_PADOUR</code> set, and targets have <code>SVs_PADTMP</code>
set. A <code>SVs_PADTMP</code> (targets/GVs/constants) has a
<code>&amp;PL_sv_undef</code> name, as they are looked up by the TARG index;
only <code>SVs_PADMY</code> / <code>SVs_PADOUR</code> get valid names.</p>
<h2><a name="op" href="op.html">OP</a></h2>
A Perl program/subroutine is represented internally by a syntax tree built from OP nodes.
This tree really is just a linked list of ops in <i>exec</i> order.
Perl 5.005 had 346 different OP-codes, Perl 5.16 has 372 OP-codes, see <tt>opnames.h</tt>.
Each op represents a <tt>pp_<i>opname</i>()</tt> function. Note that some <tt>pp_</tt> functions
are just macros, and that several opcodes share the same function.<br>
In Perl there are 12 different OP classes, which are related as shown in the following
class hierarchy diagram:
<p><center><img src="optypes.png"></center>
<p><center><img src="op1.png"></center>
<p><center><img src="op2.png"></center>
<p>A typical small optree for <code>$a = $b + 42</code> would be:
<center><img src="opsample.png"> <img src="opsamp2.png"></center>
<table><tr valign="top"><td><pre>
$ perl-nonthreaded -MO=Concise -e '$a = $b + 42'
8 &lt;@&gt; leave[1 ref] vKP/REFC -&gt;(end)
1 &lt;0&gt; enter -&gt;2
2 &lt;;&gt; nextstate(main 1 -e:1) v:{ -&gt;3
7 &lt;2&gt; sassign vKS/2 -&gt;8
5 &lt;2&gt; add[t1] sK/2 -&gt;6
- &lt;1&gt; ex-rv2sv sK/1 -&gt;4
3 &lt;$&gt; gvsv(*b) s -&gt;4
4 &lt;$&gt; const(IV 42) s -&gt;5
- &lt;1&gt; ex-rv2sv sKRM*/1 -&gt;7
6 &lt;$&gt; gvsv(*a) s -&gt;7
</pre><i>(Note: ex-ops are Nullified)</i></td><td><pre>
$ perl-nonthreaded -MO=Concise,-exec -e '$a = $b + 42'
1 &lt;0&gt; enter
2 &lt;;&gt; nextstate(main 1 -e:1) v:{
3 &lt;$&gt; gvsv(*b) s
4 &lt;$&gt; const(IV 42) s
5 &lt;2&gt; add[t1] sK/2
6 &lt;$&gt; gvsv(*a) s
7 &lt;2&gt; sassign vKS/2
8 &lt;@&gt; leave[1 ref] vKP/REFC
</pre></td></tr></table>
<p>We have two BINOPs, SASSIGN and ADD, shown as &lt;2&gt;, and three SVOPs (two GVSV
and one CONST), shown as &lt;$&gt;. <i>Note that for a threaded perl the GVSV OPs
would have been PADOPs.</i>
An SVOP pushes an SV onto the stack. A BINOP takes two args from the
stack, and pushes a result.<p>
<b>B::Concise Types</b>:<br>
<table cellpadding="3"><tr valign="top"><td><pre>
S scalar
L list
A array value
H hash value
C code value
F file value
R scalar reference
</pre></td><td><pre>
0 baseop
1 unop
2 binop
| logop
@ listop
/ pmop
$ svop_or_padop
# padop
" pvop_or_svop <!-- " -->
{ loop
; cop
% baseop_or_unop
- filestatop
} loopexop
</pre></td><td><pre>
parsed op_flags:
v Want void
s Want scalar (single value)
l Want list of any length
K Kids
P Parens, or block needs explicit scope entry
M MOD. Will modify (lvalue)
S Stacked. Some arg is arriving on the stack
* Special. Do something weird for this op
</pre></td><td><pre>
static %opflags
m needs stack mark
f fold constants
s always produces scalar
t needs target scalar
T ... which may be lexical
i always produces integer
I has corresponding int op
d danger, unknown side effects
u defaults to $_
</pre></td></tr></table>
<p>A word on <b>cop.cop_warnings</b>:
The value of the lexical warnings can be one of the special numeric values 0, 1 or 2, which is then stored in
the cop_warnings pointer itself. cop_warnings may also point to a string buffer holding a bitmask of warning categories. Since 5.10 this string buffer is in the second word of cop_warnings, and its length is stored in the first word (i.e. a Pascal string). <!-- no picture yet -->
<h2><a name="stacks">Stacks</a></h2>
During compilation and runtime Perl uses various stacks to manage itself and the
running program. There are several data stacks (for variable scope and subroutine arguments)
and also code context stacks (for block context).<br>
<h3><a name="scope">Scope</a></h3>
The first three data stacks implement <b>scopes</b>, including variables and
values which are restored (or actions to be performed) when the scope is left.
<p>The <b><code>scopestack</code></b> pushes the <code>savestack_ix</code>
when <code>ENTER</code> is executed. On <code>LEAVE</code> the top
<code>savestack_ix</code> entry is popped and all things saved on the
<code>savestack</code> since then are restored. This means that
<code>ENTER/LEAVE</code> pairs represent dynamic, nestable scopes.
<p>The <b><code>savestack</code></b> contains records of things saved in
order to be restored when the scopes are left. Each record consists of
2-4 ANY elements. The first one is a type code, which is used to
decide how long the record is and how to interpret the other elements.
(In the figure the type codes are marked in a pinkish color.)
Restoring involves updating memory locations of various types as well
as more general callbacks (destructors).
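<p>How the two stacks cooperate can be sketched with a toy model that can
only save ints. All names are hypothetical, and every record here has the
same fixed layout, so the type code of the real savestack is left out.

```c
#include <assert.h>

#define TOY_MAX 32

/* One "saved int" record: where to restore, and what to restore. */
typedef struct { int *addr; int old; } toy_save;

static toy_save toy_savestack[TOY_MAX];
static int      toy_savestack_ix = 0;
static int      toy_scopestack[TOY_MAX];
static int      toy_scopestack_ix = 0;

/* ENTER: remember how tall the savestack is right now. */
static void toy_enter(void)
{
    assert(toy_scopestack_ix < TOY_MAX);
    toy_scopestack[toy_scopestack_ix++] = toy_savestack_ix;
}

/* Save the current value of *p so that LEAVE can restore it. */
static void toy_save_int(int *p)
{
    assert(toy_savestack_ix < TOY_MAX);
    toy_savestack[toy_savestack_ix].addr = p;
    toy_savestack[toy_savestack_ix].old  = *p;
    toy_savestack_ix++;
}

/* LEAVE: undo everything saved since the matching ENTER, newest first. */
static void toy_leave(void)
{
    int base;
    assert(toy_scopestack_ix > 0);
    base = toy_scopestack[--toy_scopestack_ix];
    while (toy_savestack_ix > base) {
        toy_save *s = &toy_savestack[--toy_savestack_ix];
        *s->addr = s->old;
    }
}
```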
<p>The <b><code>tmps_stack</code></b> implements mortal SVs. Each time a new
mortal is made, <code>tmps_ix</code> is incremented and the
corresponding entry in <code>tmps_stack</code> is made to point to it.
When <code>SAVETMPS</code> is executed, the old
<code>tmps_floor</code> value is saved on the <code>savestack</code> and
then <code>tmps_floor</code> is set equal to <code>tmps_ix</code>.
When <code>FREETMPS</code> is executed, all SVs pointed to by the
pointers between <code>tmps_floor</code> and <code>tmps_ix</code> will
have their REFCNT decremented. How many this will be depends on how
many scopes have been left. Note that the <code>tmps_floor</code> and
<code>tmps_ix</code> values are indices of the last SV* pushed. They
both start out as -1 when the stack is empty.
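<p>The interplay of <code>tmps_ix</code> and <code>tmps_floor</code> can be
modelled as below. This is a sketch with invented names; in particular
<code>toy_savetmps</code> simply returns the old floor, where Perl would
record it on the savestack.

```c
#include <assert.h>

#define TOY_TMPS_MAX 16

typedef struct { int refcnt; } toy_sv;

static toy_sv *toy_tmps_stack[TOY_TMPS_MAX];
static int     toy_tmps_ix    = -1;  /* index of the last mortal pushed */
static int     toy_tmps_floor = -1;  /* mortals at or below this survive */

/* Make sv mortal: remember it so a later FREETMPS drops one reference. */
static void toy_mortalize(toy_sv *sv)
{
    assert(toy_tmps_ix + 1 < TOY_TMPS_MAX);
    toy_tmps_stack[++toy_tmps_ix] = sv;
}

/* SAVETMPS: raise the floor to the current top and hand back the old one. */
static int toy_savetmps(void)
{
    int old_floor = toy_tmps_floor;
    toy_tmps_floor = toy_tmps_ix;
    return old_floor;
}

/* FREETMPS: release every mortal above the current floor. */
static void toy_freetmps(void)
{
    while (toy_tmps_ix > toy_tmps_floor)
        toy_tmps_stack[toy_tmps_ix--]->refcnt--;
}
```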
<p><center><img src="scope.png"></center>
<h3>The @_ stack</h3>
<p>The next two stacks handle the arguments passed to subroutines, and also the return values.
<p><a name="curstack">The first one</a> is simply denoted as <b>the stack</b>
and is really an AV. The variable <b><code>curstack</code></b> points to this AV. To
speed up access Perl also maintains direct pointers to the start
(<code>stack_base</code>) and the end (<code>stack_max</code>) of the allocated
ARRAY of this AV. This AV is so special that it is marked as not REAL and the FILL
field is not updated. Instead we use a dedicated pointer called
<code>stack_sp</code>, the stack pointer. The stack is used to pass arguments
to PP operations and subroutines and is also the place where the results of these
operations as well as subroutine return values are placed.
<p>The <a name="markstack"><b><code>markstack</code></b></a> is used to indicate the
extent of the stack to be passed as @_ to Perl subroutines. When a subroutine
is to be called, first the start of the arguments is marked by pushing the
<code>stack_sp</code> offset onto <code>markstack</code>, then the arguments
themselves are calculated and pushed on the stack. Then the <code>@_</code>
array is set up with pointers to the SV*s on the stack between the <code>MARK</code>
and <code>stack_sp</code> and the subroutine starts running. For XSUB routines,
the creation of <code>@_</code> is suppressed, and the routine will use the
<code>MARK</code> directly to find its arguments.
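<p>The calling convention above can be tried out with a toy stack of ints.
Everything here is an illustrative assumption: the names are invented, ints
stand in for SV pointers, slot 0 is simply left unused, and
<code>toy_sum</code> plays the role of an XSUB-style callee that reads its
arguments through the mark.

```c
#include <assert.h>

#define TOY_STACK_MAX 32

static int  toy_stack[TOY_STACK_MAX];          /* "the stack" */
static int *toy_stack_sp = toy_stack;          /* points at the last item */
static int  toy_markstack[TOY_STACK_MAX];
static int *toy_markstack_ptr = toy_markstack;

/* Record the current stack height; the arguments start right above it. */
static void toy_pushmark(void)
{
    *++toy_markstack_ptr = (int)(toy_stack_sp - toy_stack);
}

static void toy_push(int v)
{
    assert(toy_stack_sp + 1 < toy_stack + TOY_STACK_MAX);
    *++toy_stack_sp = v;
}

/* Callee: sums its arguments and leaves one return value in their place. */
static void toy_sum(void)
{
    int  mark  = *toy_markstack_ptr--;     /* pop the mark */
    int *arg   = toy_stack + mark + 1;     /* first argument */
    int  total = 0;

    for (; arg <= toy_stack_sp; arg++)
        total += *arg;
    toy_stack_sp = toy_stack + mark;       /* discard the arguments */
    toy_push(total);                       /* push the return value */
}
```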
<!--<p>pre-5.10: <i>The <code>retstack</code> contains pointers to the operation to go
to after subroutines return. Each time a subroutine is called a new
OP* is pushed on this stack. When a subroutine returns, Perl pops the
top OP* from <code>retstack</code> and continues execution from this
OP.</i>-->
<p><center><img src="stack.png"></center>
<h3><a name="context">Context</a></h3>
<p>The <a name="cxstack"><b><code>cxstack</code></b></a> for <em>context stack</em>
contains <code>cx</code> records that describe the current block context. Each time a
subroutine, an eval, a loop, a format block or a given/when block is entered,
a new PERL_CONTEXT cx record is pushed on the <code>cxstack</code>. When the
context block finishes at any LEAVE* op, the top record is popped and the
corresponding values are restored.<p> A cxstack record, the cx, is either
a block context or a subst context. A block context has a common header of size 6,
followed by one of the structs for sub, format, eval, loop or given/when contexts, also
of size 6. The subst context is of size 12.
<p><center><img src="context.png"></center>
<h2><a name="sub">sub</a></h2>
At <b>entersub</b>, the context setup for a Perl or XS subroutine does:
<pre>
cx-&gt;blk_sub.retop = PL_op-&gt;op_next;
/* push args */
/* call sub */
</pre>
and at <b>leavesub</b>:
<pre>
/* pop return value(s) */
POPSUB(cx,sv); /* release CV and @_ ... */
PL_curpm = newpm; /* ... and pop $1 et al */
return cx-&gt;blk_sub.retop;
</pre>
The <em>ENTER/LEAVE</em> pair handles the scope- and savestack.<p>
The <em>PUSHBLOCK/POPBLOCK</em> pair handles the cxstack header of the current
context; the special <code>blk_sub</code> values are handled in the subsequent
SUB calls.<br> The PUSHBLOCK arguments are the type and stack; the
<code>POPBLOCK</code> return value <code>newpm</code> is the
<code>cx-&gt;blk_oldpm</code>, which was <code>PL_curpm</code> at
entry. <code>PUSHBLOCK</code> increments <code>cxstack_ix</code>, <code>POPBLOCK</code>
decrements it.<p>
The <em>PUSHSUB/POPSUB</em> pair handles the <code>cx-&gt;blk_sub</code> record
from the very same <code>cxstack</code>; the <code>POPSUB</code> return value
<code>sv</code> is the <code>cv</code>
from <code>PUSHSUB</code>. <code>POPSUB</code> also releases <code>@_</code>,
the <code>blk_sub.argarray</code>.</p>
<h2><a name="eval">eval</a></h2>
<p><center><img src="eval.png"></center>
<p>An eval call is similar to a sub call. The <b>evaltry</b> and <b>eval</b> op for
<code>eval{}</code> and <code>eval ""</code> just pack the op sequence into a
simple try/catch switch between <code>JMPENV_PUSH</code> and
<code>JMPENV_POP</code> calls.
<p>The <b>struct jmpenv</b> packages the state
required to perform a proper non-local jump, <b>top_env</b> being the initial
JMPENV record. In case of abnormal exceptions (e.g. <code>die</code>) a
<code>JMPENV_JUMP</code> must be done, a non-local jump out to the previous
JMPENV level with a proper <em>setjmp</em> record.
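<p>The try/catch pattern built from a chain of jump environments can be
sketched with standard <code>setjmp</code>/<code>longjmp</code>. The names
(<code>toy_jmpenv</code>, <code>toy_eval</code>, <code>toy_die</code> and the
sample bodies) are hypothetical and the error code is passed through a
global; this is not the actual JMPENV macro set.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* One jump environment; prev links to the enclosing level. */
typedef struct toy_jmpenv {
    struct toy_jmpenv *prev;
    jmp_buf            env;
} toy_jmpenv;

static toy_jmpenv *toy_top_env  = NULL;
static int         toy_die_code = 0;

/* Non-local jump out to the innermost active level (like JMPENV_JUMP). */
static void toy_die(int code)
{
    assert(toy_top_env != NULL && code != 0);
    toy_die_code = code;
    longjmp(toy_top_env->env, 1);
}

/* Runs body like an eval block: returns 0 on normal completion, or the
   code passed to toy_die(). */
static int toy_eval(void (*body)(void))
{
    toy_jmpenv cur;
    int ret = 0;

    cur.prev = toy_top_env;          /* push a new level */
    toy_top_env = &cur;
    if (setjmp(cur.env) == 0)
        body();                      /* "try" */
    else
        ret = toy_die_code;          /* "catch" */
    toy_top_env = cur.prev;          /* pop the level again */
    return ret;
}

static void toy_body_ok(void)   { }
static void toy_body_dies(void) { toy_die(3); }

/* An inner eval catches the first error; the body then dies again. */
static void toy_body_nested(void)
{
    if (toy_eval(toy_body_dies) == 3)
        toy_die(7);
}
```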
<div align="right">
<i>&copy; 1998-1999 Gisle Aas. 2009,2010,2012,2013,2014 Reini Urban</i><br>
$Date: 2014-06-12 10:39:18 rurban$