Dynamically sized types #6308

Closed
nikomatsakis opened this Issue May 7, 2013 · 16 comments

Comments

Projects
None yet
8 participants
@nikomatsakis
Contributor

nikomatsakis commented May 7, 2013

As described here:

http://smallcultfollowing.com/babysteps/blog/2013/04/30/dynamically-sized-types/

Rough notes on the plan:

  • Thinking about dynamically sized types. What needs to change:

    • trans code:
      • need to ensure that all dynamically sized types have a consistent
        layout (e.g., (ptr, meta))
        • change repr of ~[T], ~str, `~Trait', etc
        • change repr of ~Trait to be a pair, moving type descriptor into vtable
        • change repr of @fn, ~fn, &fn to skip header for &fn pointer
      • make the various Datum routines for deref'ing and so forth do
        the right thing based on the ty::t
        • not sure what other places in the code think they know about
          the structure of a pointer, probably a few
        • we could keep the types as they are now and get this part right,
          by essentially translating in trans from something like
          AutoBorrowVec into the combination of datum.deref() and then
          datum.take_ref()
      • at type system level, 9then, we'll want to:
        • implement kind bounds on fns (I can do that relatively easily)
        • any other changes that would stand in the way of types being
          separated from & etc
        • do the change
  • Semantic questions:

    • [a, b, c] has type [T, ..3]
    • ~[a, b, c] has type ~[T]
    • [a, b, c] --> [T]
    • Relevant to custom smart pointer types:
    • If we for a DWIM solution:
      • ~([a, b, c]) <-- what should this do?
      • basically we'll propagate information from ~ operator, @ operator, etc
  • Modify kindck (probably?) to enforce Sized requirement

    • good idea to do this early, gives us an idea of the burden aspect
    • maybe move existing code to kindck as a first step?
  • We may want to allow kind bounds of type definitions:

    struct Foo {
    vec: ~[T] <--- we would want to know here that T:Sized
    }

    or maybe just enforce this bound at use points instead

To be implemented by @pcwalton and @nikomatsakis.

@graydon

This comment has been minimized.

Show comment
Hide comment
@graydon

graydon May 16, 2013

Contributor

accepted for well-defined milestone

Contributor

graydon commented May 16, 2013

accepted for well-defined milestone

@nikomatsakis

This comment has been minimized.

Show comment
Hide comment
@nikomatsakis

nikomatsakis May 28, 2013

Contributor

Here is a rough plan I worked out:

First snapshot will contain:

  • Implement the Sized bound and require its use:
    • initially all types will be considered Sized except for type parameters w/o a Sized bound
    • type of all local variables must be sized
    • whenever you index into a vector, type of elements must be sized
    • whenever you dereference a pointer in a non-lvalue context, type of the pointee must be sized
      • moves is already computing a lot of this information, if we generalize it a bit it can tell us whether a given context is lvalue vs rvalue, which would be useful for other things as well
  • Add proc type, proc literals
  • Enforce fn:K bounds on the upvars
    • check that upvars have the proper bounds
    • consider bounds when computing TypeContents of a closure type
  • Add Trait:K bounds on object types (I think they are not parsed today)

Second snapshot will contain

  • Port uses of @fn and ~fn to use proc
  • Remove @fn, ~fn
  • Add new fn type (just closures) and interpret &fn as fn
  • Enforce Trait:K bounds on trait types
    • as with the "closure" case above

Third sdnapshot

  • Remove &fn as a special case
  • Modify ty.rs to remove evec, estr, sigils from objects. Deal with fallout.

Obviously the last step is a little vague.

Contributor

nikomatsakis commented May 28, 2013

Here is a rough plan I worked out:

First snapshot will contain:

  • Implement the Sized bound and require its use:
    • initially all types will be considered Sized except for type parameters w/o a Sized bound
    • type of all local variables must be sized
    • whenever you index into a vector, type of elements must be sized
    • whenever you dereference a pointer in a non-lvalue context, type of the pointee must be sized
      • moves is already computing a lot of this information, if we generalize it a bit it can tell us whether a given context is lvalue vs rvalue, which would be useful for other things as well
  • Add proc type, proc literals
  • Enforce fn:K bounds on the upvars
    • check that upvars have the proper bounds
    • consider bounds when computing TypeContents of a closure type
  • Add Trait:K bounds on object types (I think they are not parsed today)

Second snapshot will contain

  • Port uses of @fn and ~fn to use proc
  • Remove @fn, ~fn
  • Add new fn type (just closures) and interpret &fn as fn
  • Enforce Trait:K bounds on trait types
    • as with the "closure" case above

Third sdnapshot

  • Remove &fn as a special case
  • Modify ty.rs to remove evec, estr, sigils from objects. Deal with fallout.

Obviously the last step is a little vague.

@graydon

This comment has been minimized.

Show comment
Hide comment
@graydon

graydon May 28, 2013

Contributor

Sounds good. Exciting!

There was some talk at the end of last discussion on this about alternatives to adding a full proc type-family, either forcing use of traits for the use cases, or moving it into another possibly disjoint fn-qualifier-name (like extern, unsafe and once), just to avoid the superficial charge of "having multiple types of function". Did you have feelings about that?

Contributor

graydon commented May 28, 2013

Sounds good. Exciting!

There was some talk at the end of last discussion on this about alternatives to adding a full proc type-family, either forcing use of traits for the use cases, or moving it into another possibly disjoint fn-qualifier-name (like extern, unsafe and once), just to avoid the superficial charge of "having multiple types of function". Did you have feelings about that?

@nikomatsakis

This comment has been minimized.

Show comment
Hide comment
@nikomatsakis

nikomatsakis May 29, 2013

Contributor

@graydon it seems I never responded to you. I had a long response prepared but I guess I closed (or lost?) the tab. Short version: I am open to discussion, but I think that as far as my personal preference goes, I'd prefer to change extern fn to a different noun rather than an adjective+noun. Even I find the current fn types kind of confusing, and I think they will only be more confusing if we add in env fn or something like that. I would sort of prefer that you select the noun based on the memory ownership rules then you can use the "adjectives" to designate restrictions that the caller must observe (caller must be unsafe, caller can only call once) and the bounds (fn:K, proc:K) to restrict the values that can be closed over. At least that was the logic. But my main goal is to make the types comprehensible, so if others feel that it's more comprehensible and/or feels less complicated to always include fn, I'm persuadable.

Contributor

nikomatsakis commented May 29, 2013

@graydon it seems I never responded to you. I had a long response prepared but I guess I closed (or lost?) the tab. Short version: I am open to discussion, but I think that as far as my personal preference goes, I'd prefer to change extern fn to a different noun rather than an adjective+noun. Even I find the current fn types kind of confusing, and I think they will only be more confusing if we add in env fn or something like that. I would sort of prefer that you select the noun based on the memory ownership rules then you can use the "adjectives" to designate restrictions that the caller must observe (caller must be unsafe, caller can only call once) and the bounds (fn:K, proc:K) to restrict the values that can be closed over. At least that was the logic. But my main goal is to make the types comprehensible, so if others feel that it's more comprehensible and/or feels less complicated to always include fn, I'm persuadable.

@bblum

This comment has been minimized.

Show comment
Hide comment
@bblum

bblum May 29, 2013

Contributor

We probably don't want to allow unsizedness ever to "infect" containing types the same way the rest of the builtin bounds can, do we? The hardest case is with type parameters. For example:

struct Foo<T> {
    data: T,
}
struct Bar<T> {
    data: ~T,
}

and in another file:

x: Foo<[int]>, // must be either illegal or dynamically-sized
y: Bar<[int]>, // ok, but should we support it?

edit: In fact, the first case should almost certainly be illegal, due to struct Baz { x: Foo<[Int]>, y: Bar<[Int]>, } -- if allowed, the offset of y would have to be computed at runtime.

edit 2: This hints at other places where you'd need to use the Sized bound even when Copy is not in play -- container data structures. Examples: moving something into an ARC, mapping an option to an option, inserting into a hashtable.

Contributor

bblum commented May 29, 2013

We probably don't want to allow unsizedness ever to "infect" containing types the same way the rest of the builtin bounds can, do we? The hardest case is with type parameters. For example:

struct Foo<T> {
    data: T,
}
struct Bar<T> {
    data: ~T,
}

and in another file:

x: Foo<[int]>, // must be either illegal or dynamically-sized
y: Bar<[int]>, // ok, but should we support it?

edit: In fact, the first case should almost certainly be illegal, due to struct Baz { x: Foo<[Int]>, y: Bar<[Int]>, } -- if allowed, the offset of y would have to be computed at runtime.

edit 2: This hints at other places where you'd need to use the Sized bound even when Copy is not in play -- container data structures. Examples: moving something into an ARC, mapping an option to an option, inserting into a hashtable.

@glaebhoerl

This comment has been minimized.

Show comment
Hide comment
@glaebhoerl

glaebhoerl May 30, 2013

Contributor

I think there's three options WRT the first case:

  1. Don't allow it at all.
  2. Allow it if the unsized type is the only member (newtype-like), resulting type is unsized. This might allow str as a library?
  3. Allow it if the unsized type is the last member (superset of 2.), resulting type is unsized. This would be a safe encoding of C's flexible array member idiom.
Contributor

glaebhoerl commented May 30, 2013

I think there's three options WRT the first case:

  1. Don't allow it at all.
  2. Allow it if the unsized type is the only member (newtype-like), resulting type is unsized. This might allow str as a library?
  3. Allow it if the unsized type is the last member (superset of 2.), resulting type is unsized. This would be a safe encoding of C's flexible array member idiom.
@nikomatsakis

This comment has been minimized.

Show comment
Hide comment
@nikomatsakis

nikomatsakis May 30, 2013

Contributor

I had not expected to allow either Foo<[int]> or Bar<[int]> right now, though I would eventually love it if we allowed unsized members as the final member of a struct. And yes I expect to use the sized bound on container data structures and higher-order functions that operate on them.

Contributor

nikomatsakis commented May 30, 2013

I had not expected to allow either Foo<[int]> or Bar<[int]> right now, though I would eventually love it if we allowed unsized members as the final member of a struct. And yes I expect to use the sized bound on container data structures and higher-order functions that operate on them.

@bblum

This comment has been minimized.

Show comment
Hide comment
@bblum

bblum May 31, 2013

Contributor

We'll also need Sized bounds on all functions that use polymorphic vectors.

Um... should construction of dynamically-sized vector element types be forbidden, or only construction of values inhabiting such type? What I mean is:

fn bar<T: Sized>() { }
fn foo<T>() {
     bar::<~[T]>()
}

(Likewise for enums, structs, tuples, etc.)

Contributor

bblum commented May 31, 2013

We'll also need Sized bounds on all functions that use polymorphic vectors.

Um... should construction of dynamically-sized vector element types be forbidden, or only construction of values inhabiting such type? What I mean is:

fn bar<T: Sized>() { }
fn foo<T>() {
     bar::<~[T]>()
}

(Likewise for enums, structs, tuples, etc.)

@bblum

This comment has been minimized.

Show comment
Hide comment
@bblum

bblum Jun 3, 2013

Contributor

Consider this pattern:

#[deriving(Clone, DeepClone, Eq)]
pub struct Cell<T> {
    priv value: Option<T>
}

To make this work we'll have to make the auto-generated impl for Clone and DeepClone have the Sized bound on it.

Contributor

bblum commented Jun 3, 2013

Consider this pattern:

#[deriving(Clone, DeepClone, Eq)]
pub struct Cell<T> {
    priv value: Option<T>
}

To make this work we'll have to make the auto-generated impl for Clone and DeepClone have the Sized bound on it.

@bblum

This comment has been minimized.

Show comment
Hide comment
@bblum

bblum Jun 3, 2013

Contributor

OK, I have implemented a basic check for this, and added enough Sized bounds in libstd to make it compile with the new check, and have some stats to give a feeling for how often Sized will be needed.

Keep in mind, the check is less strict than it will eventually need to be. It currently prohibits unsized types in function return types, bound locals (including arguments, patterns, and static globals), struct/vector/enum/tuple constructors, and by-value self. It does not yet prohibit referring to impossible types such as ~[T] (with unsized T), and it doesn't check function signatures inside traits.

Stats:

  • 50 files need Sized bounds at all; 84 do not. Among the 50 that do, on average 12.7 Sizeds per file.
  • Among the 572 fn/impl declarations that need Sized, 475 (83%) of them don't also have Copy. 283 (49%) of them don't have any other trait bounds at all.
  • The most popular reason for needing Sized are as follows, according to build errors before adding Sized:

33.4% function return type
20.1% local variable bound in pattern
19.3% function argument
12.6% tuple element
7.5% statically-allocated variable
3.2% struct field
2.5% by-value self type
1.4% vector element

  • (edit to add) There are 322 remaining polymorphic functions/impls that don't have to use Sized (36% of total). (I obtained this with the command grep -R '\(fn\|impl\)[^<(]*<.*>' . | grep -v Sized | grep -v "<'\(self\|r\|a\|a,'b\)>" | wc -l.)
Contributor

bblum commented Jun 3, 2013

OK, I have implemented a basic check for this, and added enough Sized bounds in libstd to make it compile with the new check, and have some stats to give a feeling for how often Sized will be needed.

Keep in mind, the check is less strict than it will eventually need to be. It currently prohibits unsized types in function return types, bound locals (including arguments, patterns, and static globals), struct/vector/enum/tuple constructors, and by-value self. It does not yet prohibit referring to impossible types such as ~[T] (with unsized T), and it doesn't check function signatures inside traits.

Stats:

  • 50 files need Sized bounds at all; 84 do not. Among the 50 that do, on average 12.7 Sizeds per file.
  • Among the 572 fn/impl declarations that need Sized, 475 (83%) of them don't also have Copy. 283 (49%) of them don't have any other trait bounds at all.
  • The most popular reason for needing Sized are as follows, according to build errors before adding Sized:

33.4% function return type
20.1% local variable bound in pattern
19.3% function argument
12.6% tuple element
7.5% statically-allocated variable
3.2% struct field
2.5% by-value self type
1.4% vector element

  • (edit to add) There are 322 remaining polymorphic functions/impls that don't have to use Sized (36% of total). (I obtained this with the command grep -R '\(fn\|impl\)[^<(]*<.*>' . | grep -v Sized | grep -v "<'\(self\|r\|a\|a,'b\)>" | wc -l.)
@nikomatsakis

This comment has been minimized.

Show comment
Hide comment
@nikomatsakis

nikomatsakis Jun 3, 2013

Contributor

Regarding bounds on type definitions: I think we should just permit them, at least builtin bounds (kinds). While not strictly required, it feels like we're working awfully hard to avoid them, and for no particular reason. It's easy enough to enforce the bounds at the creation site. It would also help to address an issue with Drop, because right now it makes no sense to have any bounds at all on the type parameters for a drop impl --- otherwise, what, you only run the dtor if the type parameter is owned?

Regarding the statistics, that's great information to have.

Contributor

nikomatsakis commented Jun 3, 2013

Regarding bounds on type definitions: I think we should just permit them, at least builtin bounds (kinds). While not strictly required, it feels like we're working awfully hard to avoid them, and for no particular reason. It's easy enough to enforce the bounds at the creation site. It would also help to address an issue with Drop, because right now it makes no sense to have any bounds at all on the type parameters for a drop impl --- otherwise, what, you only run the dtor if the type parameter is owned?

Regarding the statistics, that's great information to have.

@bblum

This comment has been minimized.

Show comment
Hide comment
@bblum

bblum Jun 7, 2013

Contributor

I didn't convert these into percentages, but here they are for the other libraries:

libextra stats:

      2 vector element
     21 tuple element
     22 struct field
     53 statically-allocated variable
     70 local variable bound in pattern
    149 function argument
    229 function return type
    230 does not fulfill `Sized`

libsyntax stats:

      1 local variable bound in pattern
      1 statically-allocated variable
      1 tuple element
      2 struct field
      3 vector element
     20 function return type
     52 does not fulfill `Sized`
    204 function argument

librustc stats:

     15 tuple element
     21 struct field
     31 does not fulfill `Sized`
     45 function return type
     47 statically-allocated variable
     48 function argument
     73 local variable bound in pattern
Contributor

bblum commented Jun 7, 2013

I didn't convert these into percentages, but here they are for the other libraries:

libextra stats:

      2 vector element
     21 tuple element
     22 struct field
     53 statically-allocated variable
     70 local variable bound in pattern
    149 function argument
    229 function return type
    230 does not fulfill `Sized`

libsyntax stats:

      1 local variable bound in pattern
      1 statically-allocated variable
      1 tuple element
      2 struct field
      3 vector element
     20 function return type
     52 does not fulfill `Sized`
    204 function argument

librustc stats:

     15 tuple element
     21 struct field
     31 does not fulfill `Sized`
     45 function return type
     47 statically-allocated variable
     48 function argument
     73 local variable bound in pattern
@bstrie

This comment has been minimized.

Show comment
Hide comment
@bstrie

bstrie Jul 15, 2013

Contributor

Nominating to change milestone. I feel like this is important enough that it ought to be a priority for 0.8.

Contributor

bstrie commented Jul 15, 2013

Nominating to change milestone. I feel like this is important enough that it ought to be a priority for 0.8.

@catamorphism

This comment has been minimized.

Show comment
Hide comment
@catamorphism

catamorphism Aug 29, 2013

Contributor

@bstrie We're not really using the 0.8 milestone. Maturity 1 is important as it gets.

Contributor

catamorphism commented Aug 29, 2013

@bstrie We're not really using the 0.8 milestone. Maturity 1 is important as it gets.

@brson

This comment has been minimized.

Show comment
Hide comment
@brson

brson Apr 15, 2014

Contributor

Nominating for close. Dupe of #12938.

Contributor

brson commented Apr 15, 2014

Nominating for close. Dupe of #12938.

@brson brson added the I-nominated label Apr 15, 2014

@pnkfelix

This comment has been minimized.

Show comment
Hide comment
@pnkfelix

pnkfelix Apr 17, 2014

Member

dupe is categorized as P-backcompat-lang and 1.0, so we can indeed close this ticket.

Member

pnkfelix commented Apr 17, 2014

dupe is categorized as P-backcompat-lang and 1.0, so we can indeed close this ticket.

@pnkfelix pnkfelix closed this Apr 17, 2014

nivkner added a commit to nivkner/rust that referenced this issue Oct 7, 2017

address more FIXME whose associated issues were marked as closed
update FIXME(#6298) to point to open issue 15020
update FIXME(#6268) to point to RFC 811
update FIXME(#10520) to point to RFC 1751
remove FIXME for emscripten issue 4563 and include target in `test_estimate_scaling_factor`
remove FIXME(#18207) since node_id isn't used for `ref` pattern analysis
remove FIXME(#6308) since DST was implemented in #12938
remove FIXME(#2658) since it was decided to not reorganize module
remove FIXME(#20590) since it was decided to stay conservative with projection types
remove FIXME(#20297) since it was decided that solving the issue is unnecessary
remove FIXME(#27086) since closures do correspond to structs now
remove FIXME(#13846) and enable `function_sections` for windows
remove mention of #22079 in FIXME(#22079) since this is a general FIXME
remove FIXME(#5074) since the restriction on borrow were lifted
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment