New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge fields of the same type on multi-case union structs #699

Open
teo-tsirpanis opened this Issue Sep 30, 2018 · 3 comments

Comments

Projects
None yet
5 participants
@teo-tsirpanis

teo-tsirpanis commented Sep 30, 2018

Merge fields of the same type on multi-case union structs

I propose we change the way multi-case union structs are represented.

Let's have the following as an example:

[<Struct>]
type MyType =
    | Case1 of v1:int
    | Case2 of v2:int
    | Case3 of v3:int

Internally, this type is represented as three integers, plus another one to indicate the union case.

My proposal is to make unions like this (structs where all cases are of the same type (or tuple of types)), internally store the value only once, like the following example.

The existing way of approaching this problem in F# is this one, but it ruins type-safety:

type MyTypeTag =
    | Case1 = 1
    | Case2 = 2
    | Case3 = 3

[<Struct>]
type MyType = {
    Tag: MyTypeTag
    Value: int
}

Pros and Cons

The advantage of making this adjustment to F# is generation of more compact types. Struct multi-case unions are mainly used for low-level code where space efficiency is an advantage.

The disadvantage of making this adjustment to F# is lack of backwards-compatibility. Actually, not at all. We would still enforce the unique names requirement for the structs, just they would have to be redirected to the same field, for C# consumers, or return the default element when accessing the wrong field for the wrong union case.

Some other thoughts

  • For unions with 256 cases or less, we could compact the data structures even more by storing the tag field as a byte. This would apply to all DU types in general, but I am not sure about the backwards compatibility, we could still expose them as 4-byte signed integers in a property for C# consumers. There might also be issues mith memory alignment and other black magic like that.

  • For even more compactness, we could partially merge the same-type elements of the DU cases as in the example:

[<Struct>]
type MyType =
    | Case1 of i1:int * c1: char
    | Case2 of i2:int
    | Case3 of s3: string * i3:int

The type above could be represented as a struct with only three fields instead of five: one int for the i1, i2 and i3 fields, one string for s3, and one char for c1. And their position inside the tuple wouldn't matter at all. But this would make the proposal significantly more complex. And it would also raise questions about nested tuples and other stuff like that.

Extra information

Estimated cost (XS, S, M, L, XL, XXL): S

Affidavit (please submit!)

Please tick this by placing a cross in the box:

  • This is not a question (e.g. like one you might ask on stackoverflow) and I have searched stackoverflow for discussions of this issue
  • I have searched both open and closed suggestions on this site and believe this is not a duplicate
  • This is not something which has obviously "already been decided" in previous versions of F#. If you're questioning a fundamental design decision that has obviously already been taken (e.g. "Make F# untyped") then please don't submit it.

Please tick all that apply:

  • This is not a breaking change to the F# language design
  • I or my company would be willing to help implement and/or test this
@dsyme

This comment has been minimized.

Show comment
Hide comment
@dsyme

dsyme Oct 12, 2018

Collaborator

I would be happy to see this change made

Collaborator

dsyme commented Oct 12, 2018

I would be happy to see this change made

@abelbraaksma

This comment has been minimized.

Show comment
Hide comment
@abelbraaksma

abelbraaksma Oct 12, 2018

My current way of dealing with this is to split the union in nested unions, which improves performance for large DUs, but it's sub optimal at best.

I'd be interested in timings for the tag being a byte or a short. Since many DU matches translate into a table lookup in IL, which I believe requires a native int, it may lead to performance degradation in terms of speed. If so, I'd think the trade-off of size vs speed isn't worth it.

abelbraaksma commented Oct 12, 2018

My current way of dealing with this is to split the union in nested unions, which improves performance for large DUs, but it's sub optimal at best.

I'd be interested in timings for the tag being a byte or a short. Since many DU matches translate into a table lookup in IL, which I believe requires a native int, it may lead to performance degradation in terms of speed. If so, I'd think the trade-off of size vs speed isn't worth it.

@realvictorprm

This comment has been minimized.

Show comment
Hide comment
@realvictorprm

realvictorprm Oct 12, 2018

Member

@dsyme How complex is this task? If you think it's not too complex I would like to try to develop a initial implementation.

Member

realvictorprm commented Oct 12, 2018

@dsyme How complex is this task? If you think it's not too complex I would like to try to develop a initial implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment