Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array literals should be convertible to both dynamically-sized and statically-sized arrays #11879

Open
chriseth opened this issue Sep 1, 2021 · 27 comments
Assignees
Labels
annoys users 😢 high effort A lot to implement but still doable by a single person. The task is large or difficult. high impact Changes are very prominent and affect users or the project in a major way. language design :rage4: Any changes to the language, e.g. new features selected for development It's on our short-term development
Projects

Comments

@chriseth
Copy link
Contributor

chriseth commented Sep 1, 2021

We could implement this using a new type for array literals and the codegen could store the elements in memory while reserving a memory slot for the length (and just not use it for the static case).

@cameel cameel added feature language design :rage4: Any changes to the language, e.g. new features labels Sep 1, 2021
@cameel cameel added this to New issues in Solidity via automation Sep 1, 2021
@chriseth chriseth moved this from New issues to Design Backlog in Solidity Sep 2, 2021
@chriseth
Copy link
Contributor Author

chriseth commented Sep 2, 2021

This could also solve the weird type deduction for array literals: Currently, [1,2,3] is a uint8[3] and uint[3] t = [1,2,3]; is invalid. If we do not assign a "proper" array type to the array literal expression right away, but rather keep it more similar to how tuple types currently work, this should be much easier to use.

@ekpyron
Copy link
Member

ekpyron commented Sep 2, 2021

We don't need to (nor want to?) allow naming these types, do we?
Then it would only be usable in contexts in which we could deduce whether it will end up dynamically or statically sized, so code generation would already know and be able to handle the cases separately without reserving slots and the like.

@chriseth
Copy link
Contributor Author

chriseth commented Sep 2, 2021

That is true, but we would need to pass that through the type checker, which is not always easy to do.

@ekpyron
Copy link
Member

ekpyron commented Sep 2, 2021

At one point or another, the type checker needs to check if the array literal expression is convertible to the type it'll end up in and can just annotate that then, can't it?
It'd even consistently work for plain [1,2,3]; or even for [f(), g(), h()];, which wouldn't be annotated by any concrete array type, meaning that nothing needs to be stored anywhere and the subexpressions can just be evaluated and the results discarded in codegen...

@chriseth
Copy link
Contributor Author

chriseth commented Sep 2, 2021

((([1,2,3]))) - it can be arbitrarily nested.

@chriseth
Copy link
Contributor Author

chriseth commented Sep 2, 2021

Oh and there also is [[1,2], [4,5]]

@ekpyron
Copy link
Member

ekpyron commented Sep 2, 2021

Ok, maybe it's nontrivial :-). But should still be possible, shouldn't it :-)...?

And those nested single-component tuples are quite annoying everywhere, maybe we should just get rid of that once and for all :-D. But that's a different topic.

@chriseth
Copy link
Contributor Author

chriseth commented Sep 6, 2021

Implementation note:
Should introduce a new type similar to TupleType (maybe that one can be re-used with an additional boolean) which has one separate type per component. In TypeChecker::visit(TupleExpression, most of the checks can be removed. The type of the expression should be the new inline array type, where each component is just the type of the component.

Convertibility of this new type should be to statically-sized and dynamically-sized array type where each component type is convertible and the size matches.

Code generation should store each component in memory (and probably already convert to the target type).

Since we already need to convert to the target type anyway, we can maybe use @ekpyron 's idea of forwarding the target conversion type.

So it might be that this gets a bit more complicated.

@chriseth
Copy link
Contributor Author

chriseth commented Sep 7, 2021

Forwarding the target conversion type will be very tricky, since there are too many situations in which this can happen, in my opinion.

In summary, I think we can only achieve the following without considerable effort:

We add an additional flag to ArrayType that is valid for statically-sized arrays in memory and tells that it is fine to convert this array into a dynamically-sized array. The flag is only set for array types deduced from TupleExpressions that are inline arrays for now.

Such array types are convertible to other such array types if the size matches (this is needed for two-dimensional inline arrays). They are convertible to statically-sized array types of the same size or dynamically-sized array types.

In code generation, they are handled as dynamically-sized memory arrays.

When converted to statically-sized memory arrays, we just add 0x20 to the pointer, while conversion to dynamically-sized memory arrays is a no-op.

The .length member and index access has to be done accordingly.

Another option to implement this is to make mobileType() to be the dynamically-sized memory array. I think this could require less implementation effort, but I'm not sure.

The difference between the first (A) and the second (B) way to implement it is that:

In (A), the expression [[1, 2], [4, 5]] can be converted to uint8[2][2], while it can only be converted to uint[][2] in (B).
On the other hand, [[1, 2], [3]] is valid in (B), while it is invalid in (A).

@ekpyron
Copy link
Member

ekpyron commented Sep 9, 2021

I still don't see the problem in determining up front whether the array is static or dynamic.

We do type checking - if it's possible to type-check these things, we have to know which type it has to end up in in order to validate that it's convertible - so we do know the type statically up front anyways in that sense. The worst thing that can happen is having to store it in an additional annotation and/or pass it down through a few nesting levels just as we do for the one-element-tuple-things at a lot of places anyways...

But I may be missing something and if I'm wrong and that really is problematic to do, the reserved length slot is ok-ish I guess.

@chriseth
Copy link
Contributor Author

chriseth commented Sep 9, 2021

We can do it, yes, but I fear it might cause more problems than it solves, not only in using f for uint[]; / [1,2,3].f().

@ekpyron
Copy link
Member

ekpyron commented Sep 9, 2021

I don't see how the implementation detail for code generation affects stuff likeusing.

@chriseth
Copy link
Contributor Author

chriseth commented Sep 9, 2021

Ah then I misunderstood you, I thought you wanted to use that for type checking as well, like determining the type of [[1,2], [4,5,6]]

@ekpyron
Copy link
Member

ekpyron commented Sep 9, 2021

My point is that in any situation in which we actually need to write anything to memory (or anywhere), we will have had to typecheck the array literal already and thus know whether it's dynamic or static. So we can store that in an annotation and have code-generation generate properly typed code for it.

@ekpyron
Copy link
Member

ekpyron commented Sep 9, 2021

Whenever we don't have that, e.g. in [expr1,expr2,expr3];, we can just generate code as if expr1; expr2; expr3;, i.e. it doesn't matter if things are static or dynamic, because there really isn't any array anyways :-).

@chriseth chriseth moved this from Design Backlog to In progress in Solidity Sep 15, 2021
@cameel
Copy link
Member

cameel commented Sep 17, 2021

Looks like there's already an older issue about this: #9645.

@nventuro
Copy link
Contributor

Stopping by to enthusiastically voice support for this feature: creating simple arrays is extremely un-ergonomic (see e.g.: https://github.com/balancer-labs/balancer-v2-monorepo/blob/9e7a59e2281c1c182f1c604458a1391c0dbe44df/pkg/vault/contracts/authorizer/TimelockAuthorizerMigrator.sol#L145).

Is there an estimated timeline for this feature to be complete?

@cameel
Copy link
Member

cameel commented Mar 18, 2022

Yeah, we're aware of this particular pain point and we get a lot of complaints about it. It's one of the tasks on our roadmap for Q2 2022.

@chriseth
Copy link
Contributor Author

I'm on @ekpyron 's side now. Maybe I just don't see the problems anymore, though...

@NeverFearTomorrow
Copy link

Is there an ETA on this?

@cameel cameel added selected for development It's on our short-term development high effort A lot to implement but still doable by a single person. The task is large or difficult. high impact Changes are very prominent and affect users or the project in a major way. labels Aug 18, 2022
@drortirosh
Copy link

currently, there is a "int_constant" type, but when creating an array, it is reduced to "uint8".
instead, should have an "array_constant" (with members which are "*_constant").
The array is reduced to an actual type when used, either to a dynamic or static array.

@robercano
Copy link

I'm also very interested in this issue. Are there any updates on it? Some way we can help moving it forward? Thanks!

@cameel
Copy link
Member

cameel commented Feb 8, 2024

This is unfortunately blocked, pending improvements in the optimizer so we have to postpone it until we have those. That will probably only happen on top of the new type system, which right now takes priority.

Here's the summary from the PR where we last attempted to implement it: #12883 (comment)

The problem is the memory tracking inside DataFlowAnalyzer. For every memory write, it has to check if the memory write would overwrite any other memory state it has stored inside m_state.memory. If the memory mapping stays large, this can be quite expensive.
Two ways to overcome this come to my mind right now:

  • only make use of the memory mapping if the optimizer step deriving from DataFlowAnalyzer actually uses it
  • limit the size of the memory mapping

@drortirosh
Copy link

I don't understand why it is an optimizer issue: the conversion from an untyped array to a typed (dynamic or static) array should be done by the compiler, even if the optimizer is completely disabled.
e.g.
uint[3]a = [1,2,3];
vs
uint[] a = [1,2,3];

(the same is true for the conversion from <const int> to uint256)

@cameel
Copy link
Member

cameel commented Feb 13, 2024

It's because of how the IR codegen is designed. In the legcacy codegen we'd just allocate memory and write each element already at the target location. Which sounds simple, but in practice results in pretty complicated code with tons of special cases due to conversions, cleanup, multiple possible locations with different data encoding (memory, storage, transient storage), implicit conversions between nested array types, etc.

The approach in the IR codegen is to emit the most straightforward code and let the optimizer deal with it. So in this case it was basically to process each element independently, putting it in its own Yul variable and use the existing cleanup/conversion helpers that already have the logic to deal with all the complexity. Such code should be much easier to be proven correct and then the optimizer can deal with the redundancy using known and well tested transformations.

Unfortunately it turned out the unoptimized code is very likely to run into Stack Too Deep or even overflow the whole stack (which can hold at most 1024 elements) for large arrays. While we do have the stack-to-memory mover now, we don't want rely on it here because moving items to memory and them moving them again would be very inefficient way to do this. The optimizer (or Yul->EVM transform) has to be smarter about it.

Another problem, already mentioned in #11879 (comment), is that tracking so many memory writes would significantly slow down our dataflow analyzer, which is used by many optimizer steps. Even if we deemed the emitted code acceptable, it would make the optimizer even slower than it is now. We have to prevent that to make the feature viable.

@drortirosh
Copy link

I still think it is the wrong place. a fixed-size array is not "an optimization" over a dynamic array (of known size) - it is a completely different type, that only happens to have the same syntax at the source level.
I'm not familiar with the compiler's architecture, but to my understanding, after initial parsing of the source tree (and syntax checking), it builds an AST, and out of that, it generates a YUL representation.
in such a model, the array assignment (in the AST tree) is converted from a generic type to an explicit type, as required by the context. only after this AST transformation, we start a pass to convert it into yul.

@ekpyron
Copy link
Member

ekpyron commented Feb 19, 2024

This is indeed not an optimizer issue. The initial attempt for doing this was to still always treat array literals as statically sized and to merely allow an implicit conversion to dynamic arrays - and for that conversion we'd then have had to explicitly generate the actual conversion code - which then did become an optimizer issue, since we'd need the optimizer to generate code as if we had properly determined the type as dynamically sized from the start (nobody will want this to be as expensive as creating a static array first and then copying things over to a dynamic array).

I had argued from the beginning that we should rather consider array literals as "midly polymorphic", i.e. to allow treating them themselves as either dynamically or statically sized depending on the context as determined by type checking, and then to generate the appropriate code in the first place. That way this could indeed be solved without any involvement of or improvement in the optimizer.

So to be clear: the previous approach for implementing this failed to work out due to it relying on better optimizations. The approach I advocated from the start, though, could still be done. For that it's merely a matter of us finding resources for actually doing it (it's still non-trivial, since as of now, all expressions have a fixed type independent of their context and that invariant would break here).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
annoys users 😢 high effort A lot to implement but still doable by a single person. The task is large or difficult. high impact Changes are very prominent and affect users or the project in a major way. language design :rage4: Any changes to the language, e.g. new features selected for development It's on our short-term development
Projects
No open projects
Solidity
  
In progress
9 participants