Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VIP: Specify behavior of uninitialized variables #1493

Closed
charles-cooper opened this issue Jun 21, 2019 · 20 comments · Fixed by #1597
Closed

VIP: Specify behavior of uninitialized variables #1493

charles-cooper opened this issue Jun 21, 2019 · 20 comments · Fixed by #1597

Comments

@charles-cooper
Copy link
Member

charles-cooper commented Jun 21, 2019

Simple Summary

Require newly declared variables to be initialized.

Abstract

Motivation

Uninitialized variables can currently refer to garbage memory (#1476 (comment)). Probably not what people expect.

Specification

option 1) Require variables to be set by the programmer
option 2) if the programmer declares a new variable without setting it, auto set it to a default value of 0

EDIT: the eventual consensus was to choose option 1. See #1493 (comment) for the rationale.

Backwards Compatibility

Programs which don't initialize variables will no longer compile.

Dependencies

Modifies existing behavior of creating new memory variables

Copyright

Copyright and related rights waived via CC0

@jacqueswww
Copy link
Contributor

jacqueswww commented Jun 25, 2019

This shouldn't be happening. I assume this is happening after calls to private functions? (ie. a variable get defined after a private call) :/

@fubuloubu
Copy link
Member

Even if it shouldn't be happening, IMO (1) is the best way to mitigate a mistake in calling code or whatever.

@jacqueswww
Copy link
Contributor

jacqueswww commented Jun 25, 2019

Yes, the compiler should have a flag that adds mstore <var_pos> 0 to memory after a variable is created, the flag is only set after a self call to private occurs.

@fubuloubu
Copy link
Member

fubuloubu commented Jun 25, 2019

That... Seems way more complicated then just enforcing you set a local on definition if it is not set before it's read.

@jakerockland
Copy link
Contributor

My 2¢: Also seems to me that (1) makes it the most explicit as to what is going on / less error prone.

@jacqueswww
Copy link
Contributor

jacqueswww commented Jun 25, 2019

I feel like there are two separate problems here, one is changing how vyper currently works to be more explicit about initial values.
The other is that currently vyper has default assignment of initial values, but after changing to CALL the default values of the EVM are not zero by default in memory, as it was before.

@charles-cooper
Copy link
Member Author

I am leaning towards having the programmer explicitly set it. Then they don't have to depend on what the language decides as the default value. Also, there isn't much reason to not initialize a variable - most of the time, the programmer wants to set it at some point in the function. The main use case I can think of is when a programmer defines a variable, and then the value assigned to it is branch dependent. That's a kind of 'edge case' that we could discuss separately (maybe worth implementing something like the balanced return checker but for setting variables) but in any case it is still clearer to set it to a sentinel value at declaration! As a secondary concern, since declared variables are usually assigned to, it's better performance to skip the compiler generated mstore 0 and go straight to assigning whatever the programmer was planning to assign to it.

@jakerockland
Copy link
Contributor

Yeah definitely are two problems at play @jacqueswww, so we may want to update the issues to more accurately reflect that.

On the note of the first problem though, I think I'd be in favor of a VIP that makes Vyper more explicit about initial values.

That may warrant more discussion though that just fixing the currently broken behavior of how the initial values are supposed to behave.

@jakerockland
Copy link
Contributor

Thinking about this more, fixing the current broken behavior seems higher priority change than removing default initial values, as the later would be non backwards compatible.

Especially given this is explicitly marked to be the case in the docs: https://vyper.readthedocs.io/en/v0.1.0-beta.10/types.html#initial-values

Still, I'd be in favor of a VIP that removes default initial values and requires them to be set by the programmer. 😅

@fubuloubu
Copy link
Member

One thing to note about the suggestion to require the developer to set a variable before reading it: it doesn't have to be at creation of the variable, but it should be set before it's read (aka write once before read). That would allow the use case @CharlesCooper was describing with dependent settings.

@davesque
Copy link
Contributor

davesque commented Jul 1, 2019

@fubuloubu Re: Setting var values before read. That would waste gas in the case that the variable is set to zero intially since all memory is well-defined initially as zero. Also, as @jacqueswww described it to me, if the internal call context storage is working correctly, even uninitialized zero values will be explicitly copied to memory and then restored later when the internal call returns. So maybe it's actually the case that there's a bug in the context storage mechanism for internal calls?

It's a bit hard to tell if anyone is suggesting this, but I wouldn't want to insert extra ops meant to zero-out memory when, as far as we know, the internal call context storage mechanism should be preserving variable values across calls, even if those values were uninitialized zero values. We should instead try to figure out why our mental model of what should be happening isn't matching up with what actually happens.

However, I'm all in favor of requiring that variables syntactically have an initial value. But if that initial value is zero, we shouldn't do any extra work that doesn't need to happen. Having new memory set to zero is a detail of the EVM spec (it's not a Vyper language feature) and is not likely to change.

@jakerockland
Copy link
Contributor

@davesque Yeah that's a good point that if we make a VIP around values needing initial values to be explicitly set by the programmer when implementing we are sure to have a compile-time optimization that doesn't add extra ops if the default value is just 0.

@charles-cooper
Copy link
Member Author

charles-cooper commented Jul 1, 2019

@davesque it has to do with when a private function has a larger frame than the caller, and a variable is declared after the private function call.

@private
def priv(): # frame is > 30 words
    xs: uint256[30]
    for i in range(0, 30):
        xs[i] = 7 # write values to memory

@public
def caller() -> uint256: # frame is < 10 words
    self.priv()
    x: uint256 # looks like should be zero, but it's occupying space that was written to by priv()
    return x # not 0

@davesque
Copy link
Contributor

davesque commented Jul 1, 2019

@charles-cooper @jacqueswww

Okay, yeah that seems correct.

I've been looking into this for an hour or two and here's my current understanding of this:

Variable records are only created as assignment statements are encountered in the code. However, the private call builder just uses whatever variables it finds in the context to copy local vars to the stack and pop them off. For variable assignments with explicit values that occur after private calls, this isn't a problem since the value will always be set at the appropriate location in memory. However, for uninitialized variable assignments, it will just use whatever memory was not written over by the frame recovery which will be junk values from the callee's frame if that frame was larger than the caller's frame.

So it seems like, ideally, the compiler would be aware of all assignment statements in the caller's scope and include them all (including implicit, uninitialized zero values) when copying the caller's frame to the stack. Or, it would have to zero-out memory for uninitialized variables after any private call, because how could it otherwise know whether or not the private call had written beyond the caller's frame in memory? I guess it's a question of whether copying all local vars to the stack (including ones that haven't even been used yet) would be more expensive than zeroing out memory for uninitialized vars that occur after private calls. My guess is that zeroing-out memory would be less expensive.

Sorry if any of this is redundant. Partially, I'm writing this all down here so you guys can tell me if I'm on the right track with my thinking and also just in case any of this is novel to the discussion.

@davesque
Copy link
Contributor

davesque commented Jul 1, 2019

@jacqueswww Is there any point in time at which we would start asking whether or not doing private calls with our own stack discipline is worth it over just using an EVM opcode? I'm not keenly aware of the gas costs involved but just though I'd pose the question.

@charles-cooper
Copy link
Member Author

@davesque there is an issue on this, basically vyper used to use CALL to self but that was too expensive. There might be a couple more options if EIPs 615 or 1380 are merged in the next fork so we want to keep from making too large of an overhaul right now before we have all options at our disposal.

@charles-cooper
Copy link
Member Author

FWIW I really think the best course of action at this stage is to get rid of declaring variables without initialization. Technically it's not fixing the bug but it's simpler for the compiler and cleaner for the user for the reasons detailed above.

@davesque
Copy link
Contributor

davesque commented Jul 1, 2019

@charles-cooper Yeah, I was aware that we used to do it that way. I was more asking if we should ever revert back to using that approach. But I forgot about those EIPs. Thanks!

And yeah I think you're right about requiring explicit assignment. I'm not sure there's a viable alternative. The one exception I can think of is if the assignment is the default value for a type and it occurs in a public call, before any private calls. Then we should be able to safely optimize out the mstore <pos> 0. Otherwise, I'm not sure we can make that assumption.

@jakerockland
Copy link
Contributor

@davesque @charles-cooper thanks for the elaboration/conversation here, it's helped me understand the underlying issue a lot more

Given the above, I agree with your approach @charles-cooper that the faster path to implementation is likely just removing uninitialized variables + making them explicit.

@dHonerkamp
Copy link

Given how unexpected & potentially critical this behaviour is, I'd like to just add a reminder that it is still not reflected in the documentation (https://vyper.readthedocs.io/en/latest/types.html#initial-values). I think a short note would already help a lot

@charles-cooper charles-cooper changed the title Specify behavior of uninitialized variables VIP: Specify behavior of uninitialized variables Sep 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants