getelementptr is underspecified #4642

@sunfishcode

Bugzilla Link: 4270
Resolution: FIXED
Resolved on: Aug 25, 2009 23:27
Version: trunk
OS: All
CC: @lattner, @nlewycky

Extended Description

This paragraph from LangRef.html:

Note that it is undefined to access an array out of bounds: array and pointer indexes must always be within the defined bounds of the array type when accessed with an instruction that dereferences the pointer (e.g. a load or store instruction). The one exception for this rule is zero length arrays. These arrays are defined to be accessible as variable length arrays, which requires access beyond the zero'th element.

raises several questions. I'm working on adding the concept of undefined
integer arithmetic overflow to LLVM, and also GEP expansion, so I'm filing
this bug in order to work towards clarification of the rules.

The first sentence seems to suggest that it's well defined to compute
arbitrary addresses, as long as they are not dereferenced. Especially since
there is no other mention of C's "one-past-the-end" provision, this
sentence seems to take that role by saying that in LLVM IR, addresses
N-past-the-end, or even N-ahead-of-the-beginning, may be computed, for
any N.

However, the second sentence makes a special provision for
zero-length array types. If N-past-the-end addresses are permitted, this
wouldn't really be an exception, but instead just an example of the
standard rule.

There is also a rumor that GEP overflow is intended to be
undefined behavior. This isn't mentioned in LangRef.html, but it has been
said in a variety of places, and if it's true, it would seem to
rule out N-past-the-end. However in that case, there's nothing guaranteeing
one-past-the-end, which is needed for C support.

So first, assuming %A points to an array of [10 x double], which of the
following instructions are intended to be undefined?
%a = getelementptr double* %A, i64 -1
%b = getelementptr double* %A, i64 9223372036854775807
%c = getelementptr double* %A, i64 10
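
To make the arithmetic behind these three cases concrete, here is a small Python model of the byte offset each GEP would add to %A. It is only a sketch: it assumes 8-byte doubles and 64-bit two's complement address arithmetic, neither of which LangRef.html states, and the helper names `wrap_signed` and `gep_offset` are hypothetical, not anything in LLVM.

```python
# Hypothetical model of a single-index GEP on double*; not LLVM's implementation.
ELEM_SIZE = 8    # assumed sizeof(double) in bytes
PTR_BITS = 64    # assumed pointer width in bits


def wrap_signed(value, bits=PTR_BITS):
    """Reduce value to a signed two's complement integer of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value


def gep_offset(index):
    """Return (exact byte offset, byte offset after wrapping to PTR_BITS)."""
    exact = index * ELEM_SIZE
    return exact, wrap_signed(exact)


ARRAY_BYTES = 10 * ELEM_SIZE  # size of [10 x double]

for name, index in [("%a", -1), ("%b", 9223372036854775807), ("%c", 10)]:
    exact, wrapped = gep_offset(index)
    overflowed = exact != wrapped                 # offset does not fit in 64 bits
    within_or_one_past = 0 <= exact <= ARRAY_BYTES
    print(name, exact, wrapped, overflowed, within_or_one_past)
```

Under these assumptions, %c lands exactly one past the end, %a lands one element before the start, and the multiplication for %b does not fit in 64 bits; if it were allowed to wrap, it would yield the same offset as %a. Which of these is undefined is exactly what needs clarifying.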

Second, is there anything undefined about this code?

%p3 = getelementptr [3 x [3 x double]]* %p, i64 0, i64 0, i64 3
store double 0.0, double* %p3

(assume %p3 points to sufficient storage)
The last index, 3, is outside the bounds of the static type
implied by the base pointer and the GEP; however, the computed address
is within the bounds of the underlying allocated storage.
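
As a rough illustration of why the address is still within the allocation, the sketch below computes the byte offset for indices into [3 x [3 x double]]*, assuming 8-byte doubles and a densely packed layout. The function name `gep_offset_3x3` is hypothetical and only models the offset arithmetic.

```python
ELEM_SIZE = 8  # assumed sizeof(double) in bytes


def gep_offset_3x3(i, j, k):
    """Byte offset for indices (i, j, k) applied to a [3 x [3 x double]]*."""
    inner = 3 * ELEM_SIZE  # size of one [3 x double] row
    outer = 3 * inner      # size of the whole [3 x [3 x double]]
    return i * outer + j * inner + k * ELEM_SIZE


out_of_static_bounds = gep_offset_3x3(0, 0, 3)  # last index exceeds [3 x double]
in_static_bounds = gep_offset_3x3(0, 1, 0)      # first element of the next row
print(out_of_static_bounds, in_static_bounds)   # 24 24
```

Both index lists give the same offset, so under these assumptions the store through %p3 touches the same bytes as a store to element [1][0].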

The following comment from BasicAliasAnalysis.cpp suggests that this
code is valid and that optimizers should handle it correctly:

// We have to be careful here about array accesses. In particular, consider:
// A[1][0] vs A[0][i]
// In this case, we don't know that the array will be accessed in bounds:
// the index could even be negative. Because of this, we have to
// conservatively give up and return may alias. We disregard differing
// array subscripts that are followed by a variable index without going
// through a struct.

Third, if there are any cases where a getelementptr by itself (with no
load or store) is "undefined", is it Undefined Behavior, as in
"demons may fly out your nose", or is it merely that the getelementptr
may return an unspecified result?
