Treat address-of array subscripts the same way as address-of dereferences #1163

Merged: 6 commits, merged Aug 23, 2021


@kkjeer (Contributor) commented Aug 18, 2021

Fixes #1148

This PR modifies the type checker so that, if an expression e has pointer type T, then &e[idx] and &idx[e] also have type T. This mirrors the current behavior for dereferences: if e has type T, then &*e also has type T.

From the C spec section 6.5.3.2:

Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator.

This is similar to the rules for &*e:

If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue.
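In plain C the two rules quoted above can be observed directly. The following sketch (C11, for `_Generic`; no Checked C pointer types; all function and macro names are hypothetical) checks that &p[i], &i[p] and p + i denote the same address, that the compiler gives all of these forms the type int *, and that &*q does not dereference q even when q is a null pointer (a footnote to 6.5.3.2 states that &*E is equivalent to E even if E is a null pointer).

```c
#include <assert.h>
#include <stddef.h>

/* Evaluates to 1 when the static type of e is int *. _Generic does not
   evaluate its controlling expression, so nothing is dereferenced. */
#define HAS_TYPE_INT_PTR(e) _Generic((e), int *: 1, default: 0)

/* &p[i] and &i[p] behave as p + i: no load from memory takes place. */
int subscript_address_is_sum(int *p, size_t i) {
  return &p[i] == p + i && &i[p] == p + i;
}

/* Taking the address of a subscript or of a dereference yields the
   pointer type of the operand (int * here), just like p + 1. */
int address_of_keeps_type(int *p) {
  return HAS_TYPE_INT_PTR(&p[1]) && HAS_TYPE_INT_PTR(&1[p]) &&
         HAS_TYPE_INT_PTR(&*p) && HAS_TYPE_INT_PTR(p + 1);
}

/* &*q is equivalent to q, so this is well defined for a null pointer. */
int address_of_deref_of_null_is_null(void) {
  int *q = NULL;
  return &*q == NULL;
}
```

For ordinary pointers the result type already matches; the PR concerns Checked C pointer types, where the type of &e[idx] could otherwise differ from the type of e.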

@dtarditi (Contributor) left a comment

It's great to see this change. I have a question about whether this is Checked C specific or something that should be upstreamed.

@@ -14201,6 +14199,15 @@ QualType Sema::CheckAddressOfOperand(ExprResult &OrigOp, SourceLocation OpLoc) {

CheckAddressOfPackedMember(op);

if (getLangOpts().C99) {
Review comment (Contributor):

I'm curious why this code is not placed where the comment was removed. Why place it here?

Is this a change that is specific to the Checked C version of clang? Or should it be propagated upstream? Put another way, is this a bug in clang? Or is there something specific about Checked C that requires this change?

Reply (Contributor Author):

This code was placed here (rather than where the comment was removed) so that it runs after the check on line 14127 that the operand of & is an lvalue. If this code is placed where the comment was removed, then certain tests (e.g. Sema/complex-imag.c) fail due to missing expected errors (they expect the error "cannot take the address of an rvalue of type ..." to be emitted).

I can add a comment explaining why this code is placed where it is.

To the best of my knowledge, this change isn't Checked C specific.

Reply (Contributor Author):

If this code is moved to where the comment was removed, then the Sema/expr-address-of.c test fails. This test expects the following error:

void foo() {
  register int x[10];
  &x[10];              // expected-error {{address of register variable requested}}
}

The "address of register variable requested" error is emitted in the call to diagnoseAddressOfInvalidType on line 14187. If the code in this PR is added before line 14187, then this expected error will not be emitted.
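The ordering argument in this thread can be illustrated with a toy model. The sketch below is not clang's code: `Operand`, `AddrOfResult`, `check_address_of` and `has_diagnostic` are hypothetical simplifications of the logic in Sema::CheckAddressOfOperand. It models only the placement question: if the subscript rule runs first and returns a type, the later diagnostics (such as the register-variable check) are never reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified view of an operand of unary &. */
typedef struct {
  bool is_lvalue;          /* & requires an lvalue operand */
  bool names_register_var; /* operand designates a `register` object */
  bool is_subscript;       /* operand has the form e1[e2] */
  const char *base_type;   /* type of the subscripted pointer, e.g. "int *" */
  const char *plain_type;  /* pointer-to-operand type from the usual rule */
} Operand;

typedef struct {
  const char *result_type; /* NULL when a diagnostic was emitted */
  const char *diagnostic;  /* NULL when the operand was accepted */
} AddrOfResult;

/* When shortcut_first is true, the subscript rule returns before any
   diagnostic can be emitted, reproducing the failing-test scenario. */
AddrOfResult check_address_of(const Operand *op, bool shortcut_first) {
  AddrOfResult r = {NULL, NULL};
  if (shortcut_first && op->is_subscript) {
    r.result_type = op->base_type; /* early return skips the checks below */
    return r;
  }
  if (!op->is_lvalue) {
    r.diagnostic = "cannot take the address of an rvalue";
    return r;
  }
  if (op->names_register_var) {
    r.diagnostic = "address of register variable requested";
    return r;
  }
  /* Placement chosen in the PR: the subscript rule runs after the checks. */
  r.result_type = op->is_subscript ? op->base_type : op->plain_type;
  return r;
}

/* True when r carries exactly the diagnostic message msg. */
bool has_diagnostic(AddrOfResult r, const char *msg) {
  return r.diagnostic != NULL && strcmp(r.diagnostic, msg) == 0;
}
```

With the subscript rule placed after the checks, the `&x[10]` case on a register array still produces "address of register variable requested"; with `shortcut_first` set, the diagnostic silently disappears.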

Review comment (Contributor):

I looked at the code you are adding and I don't think it is needed for the regular C path. Taking the address of a subscript expression e1[e2], where e1 or e2 is a pointer to T, will result in a pointer to T being created with the existing code. The subscript expression will have type T and the final statement will create a pointer to T.

I believe the code is really only needed for the Checked C path. It might be better to put this under a Checked C flag. You could then explain in the comment that the code avoids the unexpected result of &e1[e2] having a different kind of pointer type than the pointer type that is being subscripted. This can happen in unchecked scopes where the & operator is applied to a subscript expression involving a checked pointer.
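The Checked C-specific concern can be modeled with a small enum of pointer kinds. In the sketch below, `PtrKind`, `addr_of_subscript_kind_before`, `addr_of_subscript_kind_after` and `kinds_consistent` are hypothetical. The sketch assumes, following the comment above and the linked issue about `str` and `&str[0]`, that in an unchecked scope the pre-existing rule built an ordinary unchecked pointer from the lvalue e1[e2], whatever kind of pointer was subscripted. The rule adopted in this PR instead gives &e1[e2] the same type as the subscripted pointer, matching how &*e is treated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pointer kinds modeled on Checked C's unchecked pointers,
   _Array_ptr and _Nt_array_ptr. This is a model, not compiler code. */
typedef enum { PTR_UNCHECKED, PTR_ARRAY, PTR_NT_ARRAY } PtrKind;

/* Assumed pre-PR behavior in an unchecked scope: & produces an unchecked
   T * from the lvalue e1[e2], ignoring the kind of the base pointer. */
PtrKind addr_of_subscript_kind_before(PtrKind base) {
  (void)base; /* ignoring the base kind is what causes the mismatch */
  return PTR_UNCHECKED;
}

/* Rule from this PR: &e1[e2] is typed like e1 + e2, so the result keeps
   the kind of the subscripted pointer, just as &*e keeps the type of e. */
PtrKind addr_of_subscript_kind_after(PtrKind base) {
  return base;
}

/* True when &e[i] and e (for example &str[0] and str) get the same kind. */
bool kinds_consistent(PtrKind (*rule)(PtrKind), PtrKind base) {
  return rule(base) == base;
}
```

Under the old rule, only unchecked bases give a consistent result; under the new rule every base kind does, which is the inconsistency the linked issue reports.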

@sulekhark (Contributor) left a comment

LGTM! Thank you!

@dtarditi (Contributor) left a comment

I looked at the code and think the change is only needed for Checked C. I think it would be helpful to clarify that.

@dtarditi (Contributor) left a comment

LGTM. Thanks!

@kkjeer merged commit 4085e87 into master on Aug 23, 2021
@kkjeer deleted the address-of-array-subscript-type branch on August 23, 2021, at 23:36
arbipher added a commit that referenced this pull request on Aug 26, 2021:
* Revert "[BoundsWidening] Determine checked scope specifier per statement (#1139)" (#1141)

This reverts commit 980321d.

* Determine checked scopes per statement (#1142)

We introduce a 2-bit field called CheckedScopeSpecifier in the Stmt class.
During parsing when a compound statement is created we iterate the elements
(statements) of the compound statement and set the checked scope specifier for
each element to the checked scope specifier of the compound statement.

We can get the checked scope specifier for a statement by calling the
getCheckedScopeSpecifier method on the statement.

* Update the instructions for upgrade of LLVM/Clang. (#1146)

* Updated the instructions for upgrade of LLVM/Clang.
Also added a new file LLVM-Upgrade-Notes.md to track important
information related to upgrades.

* Fixed typos.

* Addressed review comments.

* Fixed an inadvertent deletion.

* Addressed review comments.

* Incorporated review comments.

* Fixed minor typos.

* Fixed typos.

* Add new flags for available facts analysis

* Add the analysis into the build script and the sema bounds

* Add utility functions to check whether a var is used in an Expr and a BoundsExpr

* Add AbstractFact as a basic available fact;
Add InferredFact and adjust WhereClauseFact to be a subclass of AbstractFact

* Add data structures used in the analysis

* Add print and dump functions

* Add utility functions which are also used by BoundsWideningAnalysis.

* Add other utility functions.

`IsSwitchCaseBlock`: use `dyn_cast_or_null` to cover the null pointer
case.

`ConditionOnEdge`: do not test whether there is an edge between
pred and curr, since it is only called when such an edge exists.

`GetModifiedVars`: use `TranspareCasts` to bypass some casts.
Handling of member access and array indexing is
still TODO.

* Add fact comparison and fact-related set operations (contains TODO).

* Add test cases (one covers basic features and the other is converted
from the previous available facts analysis)

* Dataflow analysis: Add statement-based Gen/Kill.

* Dataflow analysis: Add block-edge-based Gen set.

* Dataflow analysis: Add function to compute In and Out set.

* Dataflow analysis: Add worklist algorithm.

* Add destructors to release the memory

* Fix: modify the Gen/Kill rules to match the design doc.
This also fixes a bug related to visiting dead blocks.

* Cleanup comments

* Fix: use the existing functions to find a `VarDecl` in an expr

* Change the equal check on fact collections to equal size check

* Update the testcases with the updated Gen/Kill

* Remove debug flag for available facts.

* Use lexicographic comparison for `EqualityOpFact` and `InferredFact`.

* Add a map to store the comparison results of facts.

* Change the source location of a fact to its near expr.

* Use a dedicated list to collect created facts and clean them finally.

* Verify if an expr contains errors before checking invertibility (#1154)

The community has introduced a new annotation called "contains-errors" on AST
nodes that contain semantic errors. As a result, after the upgrade of Checked C
sources to LLVM 12 we need to check if an expr contains errors before operating
on the expr. One such place is in InverseUtil::IsInvertible where we need to
check if the input modifying expr contains errors.

* Added containsErrors checks to InverseUtil::Inverse

* [BoundsWidening] Handle complex conditionals in bounds widening (#1149)

Support bounds widening in presence of complex conditionals like:
  "if (*p != 0)", "if ((c = *p) == 'a')", etc.

* Don't record temporary equality between expressions such as x and x + 1 in TargetSrcEquality (#1162)

* Add AllowTempEquality parameter to RecordEqualityWithTarget

* Use a ModifiedSameValue variable to determine the return value for UpdateSameValueAfterAssignment

* Rename ModifiedSameValue to RemovedAnyExprs and clean up comments

* Treat address-of array subscripts the same way as address-of dereferences (#1163)

* In CheckAddressOfOperand, add case for address-of array subscripts to C99-specific logic

* Move address-of array subscript check after other checks such as taking the address of an lvalue

* Adjust expected AST output to account for different types of address-of array subscripts

* Restore deleted comment about checking for array subscript expressions

* Add comment explaining the placement of the address-of array subscript logic

* Put &e1[e2] typing rules under a Checked C flag

* Update the available facts analysis.

Co-authored-by: Mandeep Singh Grang <magrang@microsoft.com>
Co-authored-by: Sulekha Kulkarni <Sulekha.Kulkarni@microsoft.com>
Co-authored-by: Katherine Kjeer <6687333+kkjeer@users.noreply.github.com>
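The available facts commits above describe a forward dataflow analysis with statement-level Gen/Kill sets, In/Out sets and a worklist. The following generic sketch is not the Checked C implementation: facts are bits in an unsigned mask, `Block`, `FactSet` and `available_facts` are hypothetical names, and the block-edge-based Gen sets are omitted. It assumes a "must" analysis, where In is the intersection of the predecessors' Out sets and Out = Gen ∪ (In \ Kill).

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 8
#define ALL_FACTS 0xFFu /* at most 8 facts, one bit per fact */

typedef unsigned FactSet;

/* One basic block: its Gen/Kill sets and its CFG edges. */
typedef struct {
  FactSet gen, kill;
  int npreds, preds[MAX_BLOCKS];
  int nsuccs, succs[MAX_BLOCKS];
} Block;

/* Worklist solver over a CFG whose entry is block 0:
   In[0] = {}, In[b] = AND of Out[p] over preds p,
   Out[b] = Gen[b] | (In[b] & ~Kill[b]). */
void available_facts(const Block *blocks, int n, FactSet *in, FactSet *out) {
  int queue[MAX_BLOCKS];
  bool queued[MAX_BLOCKS];
  int head = 0, count = 0;

  /* Optimistic start for a must analysis: everything is available. */
  for (int b = 0; b < n; b++) {
    in[b] = (b == 0) ? 0u : ALL_FACTS;
    out[b] = ALL_FACTS;
    queue[(head + count++) % MAX_BLOCKS] = b;
    queued[b] = true;
  }

  while (count > 0) {
    int b = queue[head];
    head = (head + 1) % MAX_BLOCKS;
    count--;
    queued[b] = false;

    if (b != 0) {
      FactSet meet = ALL_FACTS;
      for (int i = 0; i < blocks[b].npreds; i++)
        meet &= out[blocks[b].preds[i]];
      in[b] = meet;
    }
    FactSet new_out = blocks[b].gen | (in[b] & ~blocks[b].kill);
    if (new_out != out[b]) {
      out[b] = new_out;
      /* Out changed, so every successor must be recomputed. */
      for (int i = 0; i < blocks[b].nsuccs; i++) {
        int s = blocks[b].succs[i];
        if (!queued[s]) {
          queue[(head + count++) % MAX_BLOCKS] = s;
          queued[s] = true;
        }
      }
    }
  }
}
```

On a diamond-shaped CFG where the entry block generates a fact and one branch kills it, the fact is not available at the join block, which is the intersection behavior of a must analysis. The `queued` flags keep at most one copy of each block in the queue, so a circular buffer of `MAX_BLOCKS` entries is enough.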
Issue fixed by this pull request: Inconsistent behavior with str and &str[0]
3 participants