New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VIP: Specify behavior of uninitialized variables #1493
Comments
This shouldn't be happening. I assume this is happening after calls to private functions? (ie. a variable get defined after a private call) :/ |
Even if it shouldn't be happening, IMO (1) is the best way to mitigate a mistake in calling code or whatever. |
Yes, the compiler should have a flag that adds mstore <var_pos> 0 to memory after a variable is created, the flag is only set after a self call to private occurs. |
That... Seems way more complicated then just enforcing you set a local on definition if it is not set before it's read. |
My 2¢: Also seems to me that (1) makes it the most explicit as to what is going on / less error prone. |
I feel like there are two separate problems here, one is changing how vyper currently works to be more explicit about initial values. |
I am leaning towards having the programmer explicitly set it. Then they don't have to depend on what the language decides as the default value. Also, there isn't much reason to not initialize a variable - most of the time, the programmer wants to set it at some point in the function. The main use case I can think of is when a programmer defines a variable, and then the value assigned to it is branch dependent. That's a kind of 'edge case' that we could discuss separately (maybe worth implementing something like the balanced return checker but for setting variables) but in any case it is still clearer to set it to a sentinel value at declaration! As a secondary concern, since declared variables are usually assigned to, it's better performance to skip the compiler generated mstore 0 and go straight to assigning whatever the programmer was planning to assign to it. |
Yeah definitely are two problems at play @jacqueswww, so we may want to update the issues to more accurately reflect that. On the note of the first problem though, I think I'd be in favor of a VIP that makes Vyper more explicit about initial values. That may warrant more discussion though that just fixing the currently broken behavior of how the initial values are supposed to behave. |
Thinking about this more, fixing the current broken behavior seems higher priority change than removing default initial values, as the later would be non backwards compatible. Especially given this is explicitly marked to be the case in the docs: https://vyper.readthedocs.io/en/v0.1.0-beta.10/types.html#initial-values Still, I'd be in favor of a VIP that removes default initial values and requires them to be set by the programmer. 😅 |
One thing to note about the suggestion to require the developer to set a variable before reading it: it doesn't have to be at creation of the variable, but it should be set before it's read (aka write once before read). That would allow the use case @CharlesCooper was describing with dependent settings. |
@fubuloubu Re: Setting var values before read. That would waste gas in the case that the variable is set to zero intially since all memory is well-defined initially as zero. Also, as @jacqueswww described it to me, if the internal call context storage is working correctly, even uninitialized zero values will be explicitly copied to memory and then restored later when the internal call returns. So maybe it's actually the case that there's a bug in the context storage mechanism for internal calls? It's a bit hard to tell if anyone is suggesting this, but I wouldn't want to insert extra ops meant to zero-out memory when, as far as we know, the internal call context storage mechanism should be preserving variable values across calls, even if those values were uninitialized zero values. We should instead try to figure out why our mental model of what should be happening isn't matching up with what actually happens. However, I'm all in favor of requiring that variables syntactically have an initial value. But if that initial value is zero, we shouldn't do any extra work that doesn't need to happen. Having new memory set to zero is a detail of the EVM spec (it's not a Vyper language feature) and is not likely to change. |
@davesque Yeah that's a good point that if we make a VIP around values needing initial values to be explicitly set by the programmer when implementing we are sure to have a compile-time optimization that doesn't add extra ops if the default value is just 0. |
@davesque it has to do with when a private function has a larger frame than the caller, and a variable is declared after the private function call. @private
def priv(): # frame is > 30 words
xs: uint256[30]
for i in range(0, 30):
xs[i] = 7 # write values to memory
@public
def caller() -> uint256: # frame is < 10 words
self.priv()
x: uint256 # looks like should be zero, but it's occupying space that was written to by priv()
return x # not 0 |
Okay, yeah that seems correct. I've been looking into this for an hour or two and here's my current understanding of this: Variable records are only created as assignment statements are encountered in the code. However, the private call builder just uses whatever variables it finds in the context to copy local vars to the stack and pop them off. For variable assignments with explicit values that occur after private calls, this isn't a problem since the value will always be set at the appropriate location in memory. However, for uninitialized variable assignments, it will just use whatever memory was not written over by the frame recovery which will be junk values from the callee's frame if that frame was larger than the caller's frame. So it seems like, ideally, the compiler would be aware of all assignment statements in the caller's scope and include them all (including implicit, uninitialized zero values) when copying the caller's frame to the stack. Or, it would have to zero-out memory for uninitialized variables after any private call, because how could it otherwise know whether or not the private call had written beyond the caller's frame in memory? I guess it's a question of whether copying all local vars to the stack (including ones that haven't even been used yet) would be more expensive than zeroing out memory for uninitialized vars that occur after private calls. My guess is that zeroing-out memory would be less expensive. Sorry if any of this is redundant. Partially, I'm writing this all down here so you guys can tell me if I'm on the right track with my thinking and also just in case any of this is novel to the discussion. |
@jacqueswww Is there any point in time at which we would start asking whether or not doing private calls with our own stack discipline is worth it over just using an EVM opcode? I'm not keenly aware of the gas costs involved but just though I'd pose the question. |
@davesque there is an issue on this, basically vyper used to use CALL to self but that was too expensive. There might be a couple more options if EIPs 615 or 1380 are merged in the next fork so we want to keep from making too large of an overhaul right now before we have all options at our disposal. |
FWIW I really think the best course of action at this stage is to get rid of declaring variables without initialization. Technically it's not fixing the bug but it's simpler for the compiler and cleaner for the user for the reasons detailed above. |
@charles-cooper Yeah, I was aware that we used to do it that way. I was more asking if we should ever revert back to using that approach. But I forgot about those EIPs. Thanks! And yeah I think you're right about requiring explicit assignment. I'm not sure there's a viable alternative. The one exception I can think of is if the assignment is the default value for a type and it occurs in a public call, before any private calls. Then we should be able to safely optimize out the |
@davesque @charles-cooper thanks for the elaboration/conversation here, it's helped me understand the underlying issue a lot more Given the above, I agree with your approach @charles-cooper that the faster path to implementation is likely just removing uninitialized variables + making them explicit. |
Given how unexpected & potentially critical this behaviour is, I'd like to just add a reminder that it is still not reflected in the documentation (https://vyper.readthedocs.io/en/latest/types.html#initial-values). I think a short note would already help a lot |
Simple Summary
Require newly declared variables to be initialized.
Abstract
Motivation
Uninitialized variables can currently refer to garbage memory (#1476 (comment)). Probably not what people expect.
Specification
option 1) Require variables to be set by the programmer
option 2) if the programmer declares a new variable without setting it, auto set it to a default value of 0
EDIT: the eventual consensus was to choose option 1. See #1493 (comment) for the rationale.
Backwards Compatibility
Programs which don't initialize variables will no longer compile.
Dependencies
Modifies existing behavior of creating new memory variables
Copyright
Copyright and related rights waived via CC0
The text was updated successfully, but these errors were encountered: