EEP: 22
Title: Range checking for binaries
Version: $Revision$
Last-Modified: $Date$
Author: Richard A. O'Keefe <ok@cs.otago.ac.nz>
Status: Draft
Type: Standards Track
Erlang-Version: R12B-4
Content-Type: text/plain
Created: 27-Aug-2008
Post-History:
Abstract
A module may request that bit fields be range checked.
Specification
A new directive is added.
-bit_range_check(Wanted).
where Wanted is 'false' or 'true'.
Recall that a segment of a bit string (or binary) has the form
Value [':' Size] ['/' Type_Specifier_List]
where Type_Specifier_List includes such things as 'integer',
'signed', and 'unsigned'. Currently the documentation states
that
"Signedness ... Only matters for matching and when the type
is integer. The default is unsigned."
Combining the Size with the Unit gives a Size_In_Bits.
The on-line Erlang manual does not state in section 6.16 that
in constructing a bit string the bottom Size_In_Bits bits of
an integer are used with the rest quietly ignored, but it is so.
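A minimal sketch of the truncation just described, as current Erlang/OTP
behaves. The module name truncation_demo and both function names are
hypothetical, chosen only for illustration.

```erlang
-module(truncation_demo).
-export([pack8/1, pack_nibbles/1]).

%% Size = 8, default Unit = 1, so Size_In_Bits = 8.
%% Only the bottom 8 bits of Value are kept; no error is raised.
pack8(Value) ->
    <<Value:8/unsigned-integer>>.

%% Size = 2, Unit = 4, so Size_In_Bits = 2 * 4 = 8.
pack_nibbles(Value) ->
    <<Value:2/integer-unit:4>>.
```

For example, pack8(511) and pack8(-1) both give <<255>>, pack8(256)
gives <<0>>, and pack_nibbles(257) gives <<1>>.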
The directive -bit_range_check(false) makes explicit the
programmer's intention that this C-like truncation should happen.
The directive -bit_range_check(true) says that it is a checked
run-time error in
Value:Size/unsigned-integer-unit:1
or constructions otherwise equivalent to it if Value does not
lie in the range 0 <= Value < 2**Size, and it is a checked
run-time error in
Value:Size/signed-integer-unit:1
or constructions otherwise equivalent to it if Value does not
lie in the range -(2**(Size-1)) <= Value < 2**(Size-1).
The error that is raised is like the error that would be raised
for (1 div 0):Size/Type_Specifier_List except for using 'badrange'
instead of 'badarith'.
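The two range rules can be emulated in ordinary Erlang today. The
following is only a sketch of the intended semantics, not the proposed
BEAM instructions: the module range_check_sketch and the helpers
put_unsigned/2 and put_signed/2 are hypothetical, and requiring
Size >= 1 in the signed case is an assumption, since 2**(Size-1) is not
an integer when Size is 0.

```erlang
-module(range_check_sketch).
-export([put_unsigned/2, put_signed/2]).

%% Accepts Value only if 0 =< Value < 2**Size.
put_unsigned(Value, Size)
  when is_integer(Value), is_integer(Size), Size >= 0 ->
    case Value >= 0 andalso Value < (1 bsl Size) of
        true  -> <<Value:Size/unsigned-integer-unit:1>>;
        false -> error(badrange, [Value, Size])
    end.

%% Accepts Value only if -(2**(Size-1)) =< Value < 2**(Size-1).
%% Assumption: Size must be at least 1 for a signed segment.
put_signed(Value, Size)
  when is_integer(Value), is_integer(Size), Size >= 1 ->
    Half = 1 bsl (Size - 1),
    case Value >= -Half andalso Value < Half of
        true  -> <<Value:Size/signed-integer-unit:1>>;
        false -> error(badrange, [Value, Size])
    end.
```

With these helpers, put_unsigned(255, 8) gives <<255>> while
put_unsigned(256, 8) raises badrange; put_signed(-128, 8) gives <<128>>
while put_signed(128, 8) raises badrange.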
The behaviour of integer bit syntax segments in the absence of
a -bit_range_check directive is implementation defined and
subject to change.
The BEAM system is extended with a new instruction or instructions
similar to the existing instruction or instructions for integer
segments but checking the range. The compiler is extended to
generate them for <<...>> expressions in the range of a
-bit_range_check(true) directive.
A -bit_range_check directive may not appear after a bit syntax
pattern or expression or after another -bit_range_check directive.
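As a sketch of the placement rule, the directive would appear once,
before the first bit syntax form in the module. The module name
checked_packets and the function header/2 are hypothetical. An
unmodified compiler treats the line as an ordinary user-defined
attribute and ignores it, so the module below compiles today but
performs no range checking.

```erlang
-module(checked_packets).
%% Proposed directive; today it is only stored as a module attribute.
-bit_range_check(true).
-export([header/2]).

%% Under the proposal, a Type outside 0..15 or a Length outside
%% 0..4095 would raise badrange instead of being truncated.
header(Type, Length) ->
    <<Type:4/unsigned-integer, Length:12/unsigned-integer>>.
```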
Motivation
It keeps on coming as an unpleasant surprise to Erlang programmers
that this truncation happens. Quiet destruction of information is
otherwise alien to Erlang: integer arithmetic is unbounded, not
wrapped as in some (but not all) C systems; element/2 doesn't take
indices modulo tuple size but raises an exception if the index is
out of range, and so on.
In any case where the truncation is wanted, an Erlang programmer
can already write
(Value rem 256):unsigned-integer
and the Erlang compiler could notice this and optimise the 'rem'
operation away, so the truncation is not only unusual in Erlang,
it is also unexpected in this particular case.
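A sketch of requesting the truncation explicitly. It uses 'band'
rather than 'rem', which is a substitution of mine and not part of the
proposal: 'rem' keeps the sign of its left operand, so -1 rem 256 is
-1, whereas -1 band 255 is 255 and always lands in 0..255. The module
name explicit_truncation and the function low_byte/1 are hypothetical.

```erlang
-module(explicit_truncation).
-export([low_byte/1]).

%% Keeps the bottom 8 bits of Value on purpose, so the segment value
%% is always within 0..255 whether or not range checking is enabled.
low_byte(Value) when is_integer(Value) ->
    <<(Value band 255):8/unsigned-integer>>.
```

For example, low_byte(257) gives <<1>> and low_byte(-1) gives <<255>>,
matching what implicit truncation produces today.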
It is not only unexpected, it removes a chance to find mistakes,
so it would seem to be undesirable.
Edwin Fine asked "How difficult could it be to add optional run-
time checking to detect this condition without a serious risk of
adverse effects on the correctness of Erlang run-time execution?"
Björn Gustavsson replied "it would be better to add optional
support in the compiler to turn on checks (either for an entire
module, or for individual segments of a binary). If someone
writes an EEP, we will consider implementing it."
This is that EEP.
Rationale
The Erlang/OTP team regard the old behaviour as a feature,
and wish to retain it. In particular, they wish modules that
were written expecting the old behaviour to continue to work
(for now) without modification.
One alternative would be to add new syntax, such as having a
new 'checked' specifier, so that
Value/checked-unsigned-integer
would require a value in the range 0..255.
But many Erlang programmers will want to use this as the normal
case, and will not like the safe version being so much more effort
to write than the unsafe version.
It appears that "truncation wanted/not wanted" is not a matter
of this expression or that, but of this programmer or that,
and we can expect that each module will be written by someone
expecting only one behaviour or expecting only the other.
Adding a
-bit_range_check(true).
directive to a module is more work than doing nothing at all,
but programmers who want this behaviour should be able to set up
their editing environment to have this line in their template for
creating new Erlang modules.
There are several questions:
- Should this apply to bit strings as well as integers?
- What should the name of the directive be?
- What should the argument(s) of the directive be?
- Should multiple instances of the directive be allowed in
a module?
Bit strings: Assume X = <<5:3/unsigned-integer-unit:1>>.
Currently, <<X:2/bits>> quietly truncates X. This drops bits
from the right of X, giving <<2:2>>. If this worked the same
as integers, you would expect <<1:2>>. This is certainly
very odd. Since we get truncation on the left and padding on
the left for integers, we naturally expect padding on the
right for bit strings to go with truncation on the right.
But <<X:4/bits>> isn't <<10:4>>, it's a runtime exception.
All very odd indeed. It would certainly be desirable to have
an easy way for the programmer to indicate whether they wanted
truncation on the left or the right and padding on the left or
the right. Perhaps a new built in function
    set_bit_size(Bit_String, Desired_Width,
                 Truncation, Padding, Fill)

    Bit_String:    a bit string
    Desired_Width: a non-negative integer, the width wanted
    Truncation:    'left' | 'right' | 'error';
                   if bit_size(Bit_String) > Desired_Width
                   truncate on the left/truncate on the right/
                   report an error
    Padding:       'left' | 'right' | 'error';
                   if bit_size(Bit_String) < Desired_Width
                   pad on the left/pad on the right/report an error
    Fill:          0 | 1 | 'copy';
                   pad with 0/pad with 1/pad with a copy of the
                   last bit at the end where padding is done.
However, that idea is only partly baked, and is not part of the
current proposal. As things currently stand, using the bit
syntax and relying on implicit truncation is the simplest way
to extract the leading bits of a bit string.
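To make the set_bit_size idea concrete, here is one possible sketch in
plain Erlang. It is not an existing built in function and is no more
baked than the idea itself. The module name bit_size_sketch, the
helper names, the use of 'badrange' for the 'error' options, and the
choice to treat 'copy' on an empty bit string as 0 are all assumptions.

```erlang
-module(bit_size_sketch).
-export([set_bit_size/5]).

set_bit_size(Bits, Width, Truncation, Padding, Fill)
  when is_bitstring(Bits), is_integer(Width), Width >= 0 ->
    case bit_size(Bits) of
        Width -> Bits;
        Size when Size > Width -> truncate(Bits, Width, Truncation);
        Size -> pad(Bits, Width - Size, Padding, Fill)
    end.

%% 'right' drops trailing bits, 'left' drops leading bits.
truncate(Bits, Width, right) ->
    <<Keep:Width/bits, _/bits>> = Bits,
    Keep;
truncate(Bits, Width, left) ->
    Drop = bit_size(Bits) - Width,
    <<_:Drop/bits, Keep/bits>> = Bits,
    Keep;
truncate(_Bits, _Width, error) ->
    error(badrange).    % assumption: error name not fixed by the text

pad(_Bits, _N, error, _Fill) ->
    error(badrange);    % assumption: error name not fixed by the text
pad(Bits, N, left, Fill) ->
    <<(filler(N, fill_bit(Bits, left, Fill)))/bits, Bits/bits>>;
pad(Bits, N, right, Fill) ->
    <<Bits/bits, (filler(N, fill_bit(Bits, right, Fill)))/bits>>.

%% Which bit to pad with; 'copy' repeats the bit at the padded end.
fill_bit(_Bits, _Side, 0) -> 0;
fill_bit(_Bits, _Side, 1) -> 1;
fill_bit(<<>>, _Side, copy) -> 0;    % assumption: nothing to copy
fill_bit(<<B:1, _/bits>>, left, copy) -> B;
fill_bit(Bits, right, copy) ->
    Skip = bit_size(Bits) - 1,
    <<_:Skip/bits, B:1>> = Bits,
    B.

%% N bits, all equal to the given bit.
filler(N, 0) -> <<0:N>>;
filler(N, 1) -> <<((1 bsl N) - 1):N>>.
```

With X = <<5:3>>, set_bit_size(X, 2, right, error, 0) gives <<2:2>>
(what <<X:2/bits>> does now), set_bit_size(X, 2, left, error, 0) gives
<<1:2>> (the integer-like result), and
set_bit_size(X, 4, error, right, 0) gives <<10:4>>.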
As long as the name of the directive is intention-revealing,
it doesn't matter very much what it is.
I proposed 'bit_range_check' because it is all about checking
ranges in bit syntax, but since in this draft it does NOT apply
to bit string segments, perhaps 'bit_integer_range_check' would
be better.
The arguments false and true seem clear enough.
Alternatives would be something like
-bit_integer_range(check).
-bit_integer_range(no_check).
That would be fine too.
Classical Pascal compilers let you do things like
{$I-} (* disable index checks *)
(* code with no index checks *)
{$I+} (* re-enable index checks *)
Allowing multiple -bit_range_check directives in a module could
let you use code written for the old approach inside a module
that otherwise uses the new approach. I don't believe that we
want to encourage that sort of thing: it is MUCH easier when
reading a module if all of it follows the same rule.
It is also easier for an Erlang compiler that expects to be able
to process function definitions in any order. The compiler can
check for one of these directives anywhere in a module before it
handles any bit syntax forms anywhere. However, it is easier for
people reading a module if, when they first see a <<...>>
construction, they have already seen any directive that might
affect what it means.
The restrictions on the number and placement of these directives
can always be relaxed later if necessary.
Backwards Compatibility
All existing Erlang code remains acceptable with unchanged
semantics.
Reference Implementation
None, because I still can't find my way around the compiler.
References
None.
Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: