- Feature Name:
dyn_sized - Start Date: 2018-01-25
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)
Summary
Add a (non-default-bound) marker trait DynSized and the corresponding lints to statically prevent
an extern type from being used where run-time size and alignment are needed.
Motivation
Foreign type (extern type, RFC #1861) is equivalent to C’s incomplete struct, often used to
create opaque pointers. These types have no known size and alignment, even at run-time. RFC #1861
noted that size_of_val and align_of_val should not be defined for extern type, but does not
specify how.
… we must also be careful that
size_of_valandalign_of_valdo not work either, as there is not necessarily a way at run-time to get the size of extern types either. For an initial implementation, those methods can just panic, but before this is stabilized there should be some trait bound or similar on them that prevents their use statically.
These functions are later implemented to return size of 0 and alignment of 1 as a stopgap solution in PR #44295.
DynSized was introduced as part of the competing RFC #1993. All types except opaque data types
implement DynSized, and thus can solve the size_of_val problem. Like Sized, it is an implied
bound that needs to be opt-out via ?DynSized, and T: ?Sized will still imply T: DynSized.
The DynSized trait was implemented in PR #46108. However, making DynSized an implied bound
caused push back from the team. The ?Trait feature was considered confusing, and causing pressure
and churn to package authors to generalization every ?Sized to ?DynSized. Further details can be
found in RFC issue #2255.
This RFC attempts to find a solution which will
- not cause breakage within the current epoch,
- can statically detect misuse of
size_of_val, and - does not require
?DynSized.
Guide-level explanation
In Rust there are types which the compiler does not know the exact size. Slices [T] is one
example: the length of a slice is unknown when compiling. These types are known as
dynamically sized types (DSTs), as the exact size can only be known at run-time.
There are types which the size cannot be known even at run-time. They are mostly used when
interacting with external library, where objects are encapsulated behind opaque pointers. These are
called foreign types in Rust, and declared using the syntax extern { type Foreign; }.
Some Rust features expect types have a run-time size and thus cannot accept foreign types. An
obvious example is the std::mem::size_of_val function which computes the run-time size of a value.
Some less obvious examples are:
-
Rust allows DSTs to be used as a struct field:
struct Data { checksum: u32, data: [u8], // <-- a DST }
However, we need to know the alignment of the DST field to calculate its offset. A foreign type has no run-time alignment, and thus cannot be used inside a struct.
-
Allocation of an object obviously requires its size and alignment. Moreover, deallocation also needs these information. This makes
Box<Foreign>invalid.
To prevent us from accidentally allowing a foreign type in where run-time size is needed, the
standard library provides an additional marker trait DynSized, to indicate the type has a run-time
size. DynSized is automatically implemented for all types except foreign types.
We may bound a generic parameter by DynSized to totally exclude foreign types.
unsafe fn as_byte_slice<T: DynSized + ?Sized>(val: &T) -> &[u8] {
// ^~~~~~~~ we require the type `T` to have run-time size.
let size = size_of_val(val);
let ptr = val as *const T as *const u8;
unsafe { slice::from_raw_parts(ptr, size) }
}
let _ = as_byte_slice(&1); // ok! sized types implement DynSized.
let _ = as_byte_slice("foo"); // ok! str also implement DynSized.
extern {
type Foreign;
fn get_foreign() -> *mut Foreign;
}
let _ = as_byte_slice(&*get_foreign()); // error! extern type does not implement DynSized.Additionally, when we use types which possibly doesn’t have a run-time size in size_of_val, Box
or a struct field, there will be warnings:
let _: Box<Foreign> = Box::from_raw(get_foreign());
//^ warning! cannot use extern type in a box.
struct Foo<T: ?Sized> {
a: u8,
b: T, // warning! T might not have a run-time size
}Reference-level explanation
DynSized trait
Introduce a new marker trait core::marker::DynSized.
#[unstable(feature = "dyn_sized", issue = "999999")]
#[lang = "dyn_sized"]
#[rustc_on_unimplemented = "`{Self}` does not have a size known at run-time"]
#[fundamental]
pub trait DynSized {}Modify the core::marker::Sized marker trait to:
#[stable(feature = "rust1", since = "1.0.0")]
#[lang = "sized"]
#[rustc_on_unimplemented = "`{Self}` does not have a constant size known at compile-time"]
#[fundamental]
pub trait Sized: DynSized {
// ^~~~~~~~ new
}DynSized implementations are automatically generated by the compiler. Trying to implement
DynSized should emit an error, stating that DynSized is a special trait that is automatically
implemented (E0322).
DynSized will be implemented for the following types:
Sizedtypes i.e.- primitives
iN,uN,fN,char,bool, - pointers
*const T,*mut T, - references
&'a T,&'a mut T, - function pointers
fn(T, U) -> V, - arrays
[T; n], - never type
!, - unit tuple
(), - closures and generators
- primitives
- slices
[T] - string slice
str - trait objects
dyn Trait - structs, tuples and enums where all fields are
DynSized - unions where at most one field is regular-sized*, and the rest are
Sized
DynSized will not be implemented for the following types:
- foreign types
- structs, tuples and enums where at least one field is not
DynSized - unions where at least one field is not regular-sized*
DynSized is not a default bound. When T: ?Sized, we do not assume T: DynSized. Traits will
not have an implicit DynSized super-bound.
Note: “regular sized” is a subset of DynSized where the size_of_val is known via the type and
pointer metadata alone. All built-in DynSized types are regular sized, but when we support custom
DST there can be irregular sized types which are still DynSized e.g. a thin CStr. This RFC does
not attempt to specify the actual semantics of an unsized union beyond what is proposed in
PR #47650. We do not specify whether union U([u8], [u16]) is DynSized or not. The decision
should be postponed until we actually want to support such unions.
#[assume_dyn_sized] attribute
To uphold stability guarantee, we are not going to introduce DynSized bound to existing generic
bounds in the standard library.
Instead, we introduce a generic-parameter attribute #[assume_dyn_sized]:
fn size_of_val<#[assume_dyn_sized] T: ?Sized>(val: &T) -> usize;
// ^~~~~~~~~~~~~~~~~~~#[assume_dyn_sized] T: X can be thought as T: DynSized + X, except that when we substitute a
non-DynSized type into T, it simply triggers a warning instead of an error.
#[assume_dyn_sized] should not affect inference result.
The #[assume_dyn_sized] attribute is considered an implementation detail and should not be
stabilized. Use of this attribute outside of the standard library is strongly discouraged, and the
proper bound DynSized + ?Sized should be used instead.
#[assume_dyn_sized] T does not really mean T: DynSized, e.g. the following should cause a
type-check error:
fn foo<T: DynSized + ?Sized>() {
}
fn bar<#[assume_dyn_sized] T: ?Sized>() {
foo::<T>(); // <-- error: T does not implement DynSized.
}The #[assume_dyn_sized] attribute should be added to the following functions and types:
align_of_val,size_of_valRefCellRc,WeakArc,WeakBoxMutexRwLock
In rustdoc, render use of this attribute #[assume_dyn_sized] T: X as T: DynSized + X.
Lints
It is an error to use to non-DynSized types as a struct field or in size_of_val/align_of_val.
To avoid introducing breaking changes, we are going to close this gap across 3 milestones (one
milestone is one epoch or smaller time units).
Milestone 0: Warning period
- Move into this milestone after
DynSizedis implemented (but is unstable). Do not stabilizeextern typebefore this milestone is complete.
If a type cannot prove that it implements DynSized, but is used in places where size_of_val
etc are needed, a lint (not_dyn_sized, warn-by-default) will be issued. The places which triggers
the lint check are:
-
In a struct/tuple field, except the following conditions:
- The field is the first and only field in the struct, or
- The struct is
#[repr(packed)]
-
In an enum variant (if we allow DST enum)
-
A type
Tsubstituted into a generic parameter which is annotated#[assume_dyn_sized].
The type T passes the check when:
- It can be proved to implement
DynSized, or - It originates from a generic parameter annotated
#[assume_dyn_sized], or - It is a struct/tuple/enum where all fields satisfy one of these 3 conditions.
This lint must never be emitted in stable/beta channels in this milestone.
Examples
Check 1: A type which does not have guaranteed run-time size should not be allowed in a struct field.
struct Foo<T: ?Sized> {
a: u8,
b: T,
}warning: type `T` is not guaranteed to have a known size and alignment, and may cause panic at run-time
--> src/foo.rs:3:7
|
1 | struct Foo<T: ?Sized> {
| ------ hint: change to `DynSized + ?Sized`
2 | a: u8,
3 | b: T,
| ^ alignment maybe undefined
|
= note: #[warn(not_dyn_sized)] on by default
= note: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
Check 2: #[assume_dyn_sized] can be used to assume the type does have a run-time size. This
also puts a lint check on the input side.
struct Bar<#[assume_dyn_sized] T: ?Sized> {
a: u8,
b: T, // no lint here!
}
extern { type Opaque; }
let _: Bar<Opaque>; // lint here!warning: type `Opaque` has no known size and alignment, and may cause panic at run-time
--> src/bar.rs:6:11
|
6 | let _: Bar<Opaque>; // lint here!
| ^^^^^^ size is undefined
|
= note: #[warn(not_dyn_sized)] on by default
= note: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
Check 3: DynSized-ness can be checked in nested type.
struct A<T: DynSized + ?Sized>(u8, (T,)); // no lint!
struct B<T: ?Sized>(u8, (T,)); // lint!
struct C<#[assume_dyn_sized] T: ?Sized>(u8, (T,)); // no lint!Milestone 1: Denial period
- Move into this milestone after the
DynSizedtrait is stabilized.
Make not_dyn_sized deny-by-default, and enable the lint on stable/beta channel.
Milestone 2: A new epoch
- Move into this milestone after a new epoch.
Turn the not_dyn_sized lint to a hard-error when the new epoch is selected.
Milestone 3: Proper trait bounds
Replace all #[assume_dyn_sized] by proper T: DynSized bounds.
Since “breaking changes to the standard library are not possible” using epochs (RFC #2052), it may be impossible to reach this milestone.
Run-time behavior
Since violation of #[assume_dyn_sized] is just a lint, even if we misuse size_of_val the code
can still be successfully compiled, and we need to consider their behavior at run-time (i.e. after
monomorphization):
-
When a struct/tuple field is a non-
DynSizedtype, assume its alignment is 1. This ensures accessing a struct field&foo.barwill never panic. -
When
size_of_val/align_of_valis instantiated with a non-DynSizedtype, generate a panic call. (An alternative is adapt the status-quo of returning 0 and 1 respectively.)
Survey of existing DST usage
As a sanity check, we want to know how the community uses DST in order to know how the new lint will
affect them. We gather this information by categorizing DST usage of the “[top 100 packages] +
dependencies” (totally 191 packages) provided by Rust playground. We consider a piece of code is
“using DST” whenever ?Sized, size_of_val or align_of_val appears.
From the list, many usage patterns are not affected by DynSized, since they often just needs to
give a pointer address to method.
-
142 packages (≈76%) do not use any of these DST features
adler32 advapi32-sys atty backtrace-sys bit-set bit-vec bitflags build_const cc cfg-if chrono cmake color_quant cookie crc crossbeam crypt32-sys csv-core data-encoding dbghelp-sys debug_unreachable deflate docopt dtoa either enum_primitive env_logger extprim filetime fixedbitset flate2 foreign-types foreign-types-shared fuchsia-zircon fuchsia-zircon-sys futf futures-cpupool gcc getopts glob hpack html5ever httparse hyper-tls idna image inflate iovec itoa jpeg-decoder kernel32-sys language-tags lazy_static lazycell libflate libz-sys log lzw mac markup5ever matches memchr memmap mime mime_guess miniz-sys native-tls num num-bigint num-complex num-integer num-iter num-rational num-traits percent-encoding phf_codegen phf_generator pkg-config precomputed-hash regex-syntax relay rustc-demangle rustc_version safemem same-file scoped-tls scoped_threadpool scopeguard secur32-sys select semver semver-parser serde_codegen_internals serde_derive serde_derive_internals siphasher slab smallvec solicit string_cache string_cache_codegen string_cache_shared strsim synom syntex_errors syslog take tempdir term term_size termcolor termion textwrap thread-id threadpool time tokio-core tokio-proto tokio-tls typeable unicode-bidi unicode-normalization unicode-segmentation unicode-width unicode-xid unreachable url utf-8 utf8-ranges uuid vcpkg vec_map version_check void walkdir winapi winapi-build winapi-i686-pc-windows-gnu winapi-x86_64-pc-windows-gnu wincolor ws2_32-sys xattr -
Static trait object —
fn read_from<R: Read + ?Sized>(r: &mut R) -> Result<Self>; // ^~~~ ^~~~~~
A function takes an
&Tor&mut Treference, where the typeTimplements a trait. The?Sizedbound allows the function to cover dynamic trait objects sincedyn Trait: Trait. Its usage is typically limited to functions provided by the trait, and seldom needs to know the size or alignment.This pattern is used in 14 packages.
aho-corasick ansi_term csv (via serde) mio nix phf_shared png rayon-core reqwest serde serde_json serde_urlencoded (via serde) syn toml (via serde) -
AsRef —
fn open_file<P: AsRef<Path> + ?Sized>(p: &P) -> Result<Self>; // ^~~~~~~~~~~ ^~
A function takes an
&Treference, whereTimplementsAsRef<X>meaning the&Tcan be converted to an&Xwithout allocation. This is typically used to accept various kinds of strings which are unsized, thus the?Sizedbound. The function usually immediately call.as_ref()to obtain the&X, and again seldom access the runtime size or alignment.This pattern is used in 10 packages.
aho-corasick base64 clap mio miow quote regex syntex_pos unicase xml-rs -
Delegation —
impl<'a, T: Read + ?Sized> Read for &'a mut T { ... } // ^~~~ ^~~~ ^~~~~~~~~
A trait is reimplemented for smart pointers implementing the trait. The implementation typically just dereference the pointer and forward the method. Thus, the allocation aspect of the smart pointers are not touched.
Delegation targets are seen in various forms:
-
&Tand&mut T: 10 packagesaho-corasick bytes futures quote rand rayon rustc-serialize serde tokio-io toml -
Box<T>: 8 packagesbytes futures quote rand rustc-serialize serde tokio-io tokio-service -
Cow<T>: 4 packagesquote rayon rustc-serialize serde -
Rc<T>andArc<T>: 2 packagesserde tokio-service
-
-
Extension trait —
impl<T: Read + ?Sized> ReadExt for T { ... } // ^~~~ ^~~~~~~ ^
A trait is blanket-implemented for an existing trait. The
?Sizedis for completeness, and otherwise usually implemented like a typical trait which doesn’t need the size and alignment.This pattern is used in 5 packages.
byteorder gif itertools petgraph png -
Using
size_of_vallike C’ssizeof—let value: c_int = 1; setsockopt( sck, SOL_SOCKET, SO_REUSEPORT, &value as *const c_int as *const c_void, size_of_val(&value), // ^~~~~~~~~~~~~~~~~~~ );
Most
size_of_valcalls are not used to obtain the runtime size of a DST, but to mimic C’ssizeofoperator on a value, which are sized types. It is usually used in FFI scenario.This pattern is used in 10 packages.
backtrace error-chain libc mio miow net2 num_cpus schannel syntex_syntax unix_socket -
Comparison —
fn find_first<Q: PartialEq<K> + ?Sized>(&self, key: &Q) -> Option<&K> { ... } // ^~~~~~~~~~~~ ^~
A function takes an
&Treference which implementsPartialEq<X>,PartialOrd<X>,Borrow<X>,Hashor some similar methods that allows comparing the&Twith an&X. This is typically used in data-structure types.This pattern is used in 5 packages.
bytes hyper ordermap phf serde_json -
Miscellaneous usages for
?Sizedbounds such as,- just want to use a
&Twithout caring whatTis (error-chain,serde_json,tendril) - use it for
Cow<T>(ansi_term) - use it in associated type
type T: ?Sizedin order to be generic in accepting a string or byte slice (ansi_term,regex,tendril)
- just want to use a
Now some usages which will be affected by DynSized.
-
Box —
struct P<T: ?Sized>(Box<T>); // ^~~~~~
A type which contains a box of arbitrary unsized type. This is used in:
- hyper (
PtrMapCell<V>) - syntex_syntax (
P<T>) - thread_local (
TableEntry<T>)
- hyper (
-
DST struct —
struct Spawn<T: ?Sized> { id: usize, data: LocalMap, obj: T, // ^~~~~~ }
This is used in:
- futures (
Spawn<T>) - tar (
Archive<R>)
- futures (
-
Unsafe memory copying —
let mut target = vec![0u8; size_of_val(&v)]; // ^~~~~~~~~~~~~~~
Using
size_of_valon a maybe-unsized type tomemcpythe content somewhere else. This is used in:- nix (
copy_bytes)
- nix (
-
Transmuting —
Just tries to inspect the DST detail by transmutation or other unsafe tricks. This is used in:
- traitobject
Due to Rust’s stability guarantee, all above usage should continue to compile, even if it may panic at runtime.
As we can see from the above statistics, in the 49 packages using DST features, only 12 packages
really assume DynSized, and out of which, 5 packages uses Box<T>/Rc<T>/Arc<T> simply for
delegation. This means it is more popular for ?Sized to just mean “nothing is assumed”.
Drawbacks
Foreign type is a very exotic feature, but the not_dyn_sized lint applies to a much broader area.
Existing code seldom need to care about foreign types, and thus DynSized becomes an annoying
paper-cut for most users.
Even with this RFC, we still need to support size_of_val/align_of_val for non-DynSized input.
If we make size_of_val panic, dropping a Box<Foreign> will also panic which is undesirable. On
the other hand, if we make size_of_val return 0 or some made-up value, the allocator would free
the memory using the wrong size, which at best leaks the memory, and at worst overwrites unrelated
memory and causes undefined behavior.
Rationale and alternatives
Rationales
-
DynSizedin this RFC is an empty marker trait. Previous RFCs suggest this trait to provide asize_of_val/align_of_valmethod which allows “custom DST”. We believe this is not the correct level to place such methods:-
These two methods need to be defined even for non-
DynSizedtypes. To the compiler, puttingsize_of_val/align_of_valinsideDynSizedis not generic enough. It should be defined on a global trait likeAnyminus the'staticbound. -
Like
Sized, we do not want users to implementDynSizedthemselves. Thus the customized of size and alignment should be implemented via a sub-trait ofDynSized.
In either case,
DynSizedshould be empty even when we have custom DST. -
-
DynSizedis automatically implemented by the compiler, but is not an auto trait. It can be made into an auto trait:#[unstable(feature = "dyn_sized", issue = "999999")] #[lang = "dyn_sized"] #[rustc_on_unimplemented = "`{Self}` does not have a size known at run-time"] #[fundamental] pub auto trait DynSized {}
Since
extern typewill not implement any auto traits, this line is enough to distinguish foreign types.Whether
DynSizedis an auto trait is an implementation detail, and this RFC does not dictate the choice. However, there may be compilation performance problems with auto traits.
DynSized as implied bound
This RFC does not make DynSized an implied bound, in order to workaround RFC issue #2255. This
means existing ?Sized bounds will continue to accept foreign types as input, and we cannot fix the
bounds of size_of_val without breaking backward compatibility. Instead we introduce a lint, and
make use of epochs to eventually turn the lint into an error. Still, the bounds of size_of_val
will forever be incorrect to support Epoch 2015.
An alternative is just accept that we want ?DynSized. As stated in RFC #1993, making DynSized
an implied bound has the following benefits:
- Libraries, including the standard library, can relax
?Sizedto?DynSizedat their own pace without losing backward compatibility. - The standard library can provide the correct bound for all types without the
#[assume_dyn_sized]hack. ?Sized/?DynSizedlooks better than(DynSized + ?Sized)/?Sized.
On the other hand, adding new implied bounds is currently blocked by the compiler issue #21974.
Furthermore, concerns are raised about new implied bounds like ?DynSized and ?Move:
- Implied bound is a "negative feature", and it is confusing to reason about.
- Unlike the lints, the user needs to manually evaluate every use of
?Sizedand determine whether moving to?DynSizedis acceptable.
Replace #[assume_dyn_sized] by DynSized itself
Instead of introducing a new attribute for linting, we may make DynSized itself serve the purpose
for linting. We still restrict the bounds for size_of_val and friends:
fn size_of_val<T: DynSized + ?Sized>(val: &T) -> usize;
// ^~~~~~~~However, when we instantiate size_of_val::<Foreign>, instead of a type-check error, we emit a lint
instead. This allows us to describe the parameter using the correct bound without breaking existing
code.
The drawback is that it makes the trait bound concept very unnatural — the DynSized bound is
written there, but it cannot be used for inference, and violating the bound just causes a warning
Post-monomorphization error
Instead of introducing DynSized or lints, we may simply refuse to compile during monomorphization
of size_of_val, align_of_val or struct field using a foreign type. This gives an accurate
compile-time error and only affects the rare cases where foreign type is actually misused.
Rust currently has no stable post-monomorphization lints or type errors, except recursion overflow. All current post-monomorphization errors are related to unstable features e.g. using platform intrinsics with wrong type (E0511) or invalid inline assembly. These errors are expected to be checked in earlier phase before the features can be stabilized.
Introducing such errors marks a serious departure of this policy.
Do nothing
Just make size_of_val/align_of_val panic or return some sensible value, and hope the user won’t
put a foreign type in them.
Unresolved questions
- Should
DynSizedbe#[fundamental]?