Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#14902 Write a MIR lint for rooting analysis #20264

Closed
wants to merge 15 commits into from
Closed

Conversation

@mrowqa
Copy link

mrowqa commented Mar 10, 2018

There are some problems in "new" methods. MIR iterates over all temporary variables, so it detects unrooted subexpressions (I guess).

Also I don't get it why we allow box in new() in lines 62-64 of unrooted_must_root.rs. I couldn't write some example test that behaves differently with absent and present check of being boxed.

Futhermore, I left some TODOs and error messages in tests. Help is appreciated.

CC @jdm @nikomatsakis


  • ./mach build -d does not report any errors
  • ./mach test-tidy does not report any errors
  • These changes fix #14902

This change is Reviewable

@highfive
Copy link

highfive commented Mar 10, 2018

Thanks for the pull request, and welcome! The Servo team is excited to review your changes, and you should hear from @asajeffrey (or someone else) soon.

@KiChjang
Copy link
Member

KiChjang commented Mar 11, 2018

You can use the vars_iter function to loop through all user-declared locals and skipping all the temporaries created by the compiler.

@mrowqa
Copy link
Author

mrowqa commented Mar 11, 2018

The problem is that I probably want to iterate over temporaries to find some unrooted subexpressions (see example below), but we have one exception for constructors (methods named "new" and starting with prefix "new_"). Within such methods we want to loose a little bit the rules (see is_unrooted_ty). I can see MIR, so I know that Box::new(Foo(0)) creates temporary for Foo(0), but I don't know how it exactly looks with the HIR. I thought the old code was walking recursively the expression, so it could also reach Foo(0), but then the maybe_walk in is_unrooted_ty wouldn't make sense.

Given this test:

    /**
    ```
    #![feature(plugin)]
    #![plugin(script_plugins)]

    #[must_root] struct Foo([i32; 42]);

    fn foo(x: &Foo) -> i32 {
        let y = Foo(x.0).0;
        y[4]
    }

    fn main() {}
    ```
    */
    pub fn expression2() {}

Do we want to report the subexpression Foo(x.0)? I have checked and the HIR plugin allows doing so.

BTW I have checked the version with checking only user defined bindings suggested by @KiChjang (is that what we want?) and compilation progressed further; I shouldn't have deleted in_derive_expn check.

@mrowqa
Copy link
Author

mrowqa commented Mar 11, 2018

Also, how about this one:

    /**
    ```
    #![feature(plugin)]
    #![plugin(script_plugins)]

    #[must_root] struct Foo(i32);

    fn foo(x: &Foo) -> i32 {
        let y = &Box::new(Foo(x.0 + 3)); // 1st version - ok?
        // let y = Box::new(Foo(x.0 + 3)); // 2nd version - fail?
        y.0
    }

    fn main() {}
    ```
    */
    pub fn ban_box() {}
@mrowqa
Copy link
Author

mrowqa commented Mar 11, 2018

In the commit above I have removed checking the subexpressions. Now all of included tests pass and servo builds without any warnings (build -d after clean).

@mrowqa
Copy link
Author

mrowqa commented Mar 11, 2018

Hm, maybe some additional test which fails with HIR and passes with MIR? Can someone provide such one? I still don't clearly understand the impact of type monomorphization if we're talking about differences between HIR and MIR.

@jdm
Copy link
Member

jdm commented Mar 12, 2018

In general, the cases that we're concerned about are case where we could have a type like:

struct SomeContainer<T>(T);
impl<T> SomeContainer<T> {
    fn new(val: T) -> SomeContainer<T> {
        SomeContainer(T)
    }
}

Or consider a type that derives Default that should be rooted, and then there is code that does:

fn create_foo<T: Default>() -> T;

and we end up invoking it with a type that requires rooting. I believe these are cases that could be missed by the HIR lint today.

@mrowqa
Copy link
Author

mrowqa commented Mar 14, 2018

Currently we're using LateLintPass::check_fn which iterates over HIR. So we iterate only over definitions of generic functions. Requesting MIR for it doesn't scan whole HIR for each instance of generic function, so @jdm, your both examples pass my current MIR check since HIR pass sees only generic T.

Is there a way to iterate over these monomorphized generic functions? Also, we'd like to somehow know if given function is generated by derive attribute. Otherwise Servo compilation will fail due to deriving some trait, Clone AFAIR.
Maybe some clever trick with subexpression (there should be one temporary local variable for storing result of function call, I guess)? Maybe somehow "dig" into MIR? I couldn't find any Statement nor InlineAsm for function call nor don't know if it's a good idea.

CC @nikomatsakis

…orer usages involving generics)
Copy link
Author

mrowqa left a comment

@@ -87,7 +90,7 @@ impl<'a, 'tcx> LateLintPass<'a, 'tcx> for UnrootedPass {
cx: &LateContext,
def: &hir::VariantData,
_n: ast::Name,
_gen: &hir::Generics,
_gen: &hir::Generics, // use enum {generics(..), def_id(..)} instead of just def_id in UnrootedCx to (see below)

This comment has been minimized.

Copy link
@mrowqa

This comment has been minimized.

Copy link
@nikomatsakis

nikomatsakis Mar 19, 2018

Contributor

I don't really understand this comment -- oh, are you saying you could pass either a DefId *or Generics`? i.e., an enum like

enum Gen {
  Generics(Generics), DefId(DefId)
}

?

If so, I wouldn't bother. Calls to generics_of are quite cheap, no need to avoid them.

This comment has been minimized.

Copy link
@nikomatsakis

nikomatsakis Mar 19, 2018

Contributor

(In particular, they are memoized)

This comment has been minimized.

Copy link
@mrowqa

mrowqa Mar 19, 2018

Author

I could have passed the DefId of context, but what is the proper one here in check_struct_def and below in check_variant? I thought that this _gen should provide good context. That's why I came up with this enum for either one or the another. Anyway, I think getting DefId is the preferred way.

This comment has been minimized.

Copy link
@nikomatsakis

nikomatsakis Mar 19, 2018

Contributor

@mrowqa in check_struct_def, it should be the DefId of the struct. In check_variant it should be the DefId of the enum.

@@ -96,7 +99,7 @@ impl<'a, 'tcx> LateLintPass<'a, 'tcx> for UnrootedPass {
if item.attrs.iter().all(|a| !a.check_name("must_root")) {
for ref field in def.fields() {
let def_id = cx.tcx.hir.local_def_id(field.id);
if is_unrooted_ty(cx, cx.tcx.type_of(def_id), false) {
if is_unrooted_ty(cx, cx.tcx.type_of(def_id), false) { // ... be able to call is_unrooted_ty here? then we don't need to query generics_of

This comment has been minimized.

Copy link
@mrowqa
@@ -112,7 +115,7 @@ impl<'a, 'tcx> LateLintPass<'a, 'tcx> for UnrootedPass {
hir::VariantData::Tuple(ref fields, _) => {
for ref field in fields {
let def_id = cx.tcx.hir.local_def_id(field.id);
if is_unrooted_ty(cx, cx.tcx.type_of(def_id), false) {
if is_unrooted_ty(cx, cx.tcx.type_of(def_id), false) { // here same as above?

This comment has been minimized.

Copy link
@mrowqa
if is_unrooted_ty(cx, ty, false) {
cx.span_lint(UNROOTED_MUST_ROOT, arg.span, "Type must be rooted")
}
if !in_derive_expn(span) { // why? shouldn't it be improper usage anyway?

This comment has been minimized.

Copy link
@mrowqa

mrowqa Mar 19, 2018

Author

AFAIR it fails cause something derives traits like Clone.

This comment has been minimized.

Copy link
@mrowqa

mrowqa Mar 19, 2018

Author

Anyone? @jdm?

ty::TyParam(param_ty) => {
let ty_param_def = cx.tcx.generics_of(self.def_id).type_param(&param_ty, cx.tcx);
// will work for `foo::<Foo>(..)`
// but won't get enough information in `bar<T>(){foo::<T>(..);}`?

This comment has been minimized.

Copy link
@mrowqa

This comment has been minimized.

Copy link
@mrowqa

mrowqa Mar 19, 2018

Author

@nikomatsakis I guess we don't care about it right know?

This comment has been minimized.

Copy link
@nikomatsakis

nikomatsakis Mar 19, 2018

Contributor

I don't understand the question :)

This comment has been minimized.

Copy link
@mrowqa

mrowqa Mar 19, 2018

Author

In normal functions we will know all of the types in MIR, but in case we are in generic functions, we can use our generic parameter (which MIR, that we have, doesn't know exact type) to call other generic functions (bar in example above). Then we need to recursively track the functions, I guess. Maybe I have complicated it too much. Once I'll reach "our subgoal", I'll just write some test for it and we'll see then :)

This comment has been minimized.

Copy link
@nikomatsakis

nikomatsakis Mar 20, 2018

Contributor

We don't need to recursively track functions -- if we call another generic function, that function will have its #[must_root] annotations (or not). Example:

fn foo<#[must_root] T: Default>() { bar::<T>(); }
fn bar<U>() { }

Here, when we are all done, the call to bar will yield an error. This is because U is not declared with #[must_root], and therefore we can't supply a value for U (in this case, T) that is "must-root".

This comment has been minimized.

Copy link
@mrowqa

mrowqa Mar 20, 2018

Author

Oh, I see. I should have tried to write it down on a paper instead of just communicating my thought.

fn nested_visit_map<'this>(&'this mut self) -> hir::intravisit::NestedVisitorMap<'this, 'tcx> {
hir::intravisit::NestedVisitorMap::OnlyBodies(&self.cx.tcx.hir)
// how to check return value and arguments?
// unpack local and check the number? (0=ret place, 1..arg_count = args?) how to nicely pass arg_count/mir to this method?

This comment has been minimized.

Copy link
@mrowqa

This comment has been minimized.

Copy link
@mrowqa

mrowqa Mar 19, 2018

Author

@nikomatsakis I haven't removed the check from check_fn cause I don't know how to nicely move it here.

This comment has been minimized.

Copy link
@nikomatsakis

nikomatsakis Mar 19, 2018

Contributor

I would put the &Mir into the context and then use mir.local_kind()

#![feature(generic_param_attrs)]
#[must_root] struct Foo(i32);
struct Bar<#[must_root] T>(T); // is not detected

This comment has been minimized.

Copy link
@mrowqa

mrowqa Mar 19, 2018

Author

Even though I have written it explicitly, the check_struct_def hasn't detected it.

This comment has been minimized.

Copy link
@nikomatsakis

nikomatsakis Mar 19, 2018

Contributor

this is presumably because has_attr doesn't work -- in particular, it will spuriously return false for TyParam until it is fixed (we talked about the fix on gitter)

}
fn test() {
SomeContainer(Foo(3));

This comment has been minimized.

Copy link
@mrowqa

mrowqa Mar 19, 2018

Author

It still compiles. I haven't took a look at rustc yet.

late_cx: cx,
def_id: def_id,
in_new_function: in_new_function,
};

This comment has been minimized.

Copy link
@nikomatsakis

nikomatsakis Mar 19, 2018

Contributor

the main reason to define this was to define is_unrooted_ty as a method, so that it could access self.def_id to compute generics_of

This comment has been minimized.

Copy link
@nikomatsakis

nikomatsakis Mar 19, 2018

Contributor

it may not be necessary, just an idea to keep code nicer and not pass 100 arguments

This comment has been minimized.

Copy link
@mrowqa

mrowqa Mar 19, 2018

Author

I have implemented it as a method.

@@ -38,6 +40,7 @@ impl UnrootedPass {
}
}

// left to let the code compile
/// Checks if a type is unrooted or contains any owned unrooted types
fn is_unrooted_ty(cx: &LateContext, ty: &ty::TyS, in_new_function: bool) -> bool {

This comment has been minimized.

Copy link
@nikomatsakis

nikomatsakis Mar 19, 2018

Contributor

I'm confused. It seems bad to have two copies of this. =)

Is there a reason to do so just because we didn't refactor everything yet?

This comment has been minimized.

Copy link
@mrowqa

mrowqa Mar 19, 2018

Author

Yes, I haven't refactored the code yet. That's why I asked questions about that num "either def_id or generics".

}
}
// how to check return value and arguments?

This comment has been minimized.

Copy link
@nikomatsakis

nikomatsakis Mar 19, 2018

Contributor

the return value and arguments are locals -- so you could just check the number, yes. As for passing mir, I would put a &Mir<'tcx> in the UnrootedCx.

This comment has been minimized.

Copy link
@mrowqa

mrowqa Mar 19, 2018

Author

Your comment above is not displayed in files changed section, that's why I have commented another dot below.

kennytm added a commit to kennytm/rust that referenced this pull request Mar 22, 2018
Pass attributes to hir::TyParam

Required by servo/servo#20264
Discussed here: https://gitter.im/servo/servo?at=5aafdcd1012ff2bf681da97a

CC @nikomatsakis
kennytm added a commit to kennytm/rust that referenced this pull request Mar 22, 2018
Pass attributes to hir::TyParam

Required by servo/servo#20264
Discussed here: https://gitter.im/servo/servo?at=5aafdcd1012ff2bf681da97a

CC @nikomatsakis
@nikomatsakis
Copy link
Contributor

nikomatsakis commented Mar 24, 2018

So next let's work towards making this an error:

fn foo<T>() { }
fn bar<#[must_root] U>() {
    let x = foo::<U>;
    x();
}
fn main() {
}

As you observed before, if we look at the MIR for bar, we see this:

    bb0: {                              
        StorageLive(_1);                 // bb0[0]: scope 0 at src/main.rs:3:9: 3:10
        _1 = const foo;                  // bb0[1]: scope 0 at src/main.rs:3:13: 3:21
                                         // ty::Const
                                         // └ ty: fn() {foo::<U>}
                                         // └ val: Function(DefId(0/0:3 ~ playground[dcfd]::foo[0]), Slice([U]))
                                         // mir::Constant
                                         // └ span: src/main.rs:3:13: 3:21
                                         // └ ty: fn() {foo::<U>}
                                         // └ literal: const foo
        StorageLive(_3);                 // bb0[2]: scope 1 at src/main.rs:4:5: 4:6
        _3 = _1;                         // bb0[3]: scope 1 at src/main.rs:4:5: 4:6
        _2 = move _3() -> bb1;           // bb0[4]: scope 1 at src/main.rs:4:5: 4:8
    }

The key is this const expression. We can intercept those by overriding visit_const. Constants are defined like so in MIR:

https://github.com/rust-lang/rust/blob/a0b0f5fba5d7bf18b24d1fa0e454a4fe871fecee/src/librustc/mir/mod.rs#L1848-L1853

I think the easiest thing to do, at least to start, would be to match just on the type of the constant. We are interested in seeing if it has a TyFnDef type, is the type given to function definitions (like foo):

https://github.com/rust-lang/rust/blob/a0b0f5fba5d7bf18b24d1fa0e454a4fe871fecee/src/librustc/ty/sty.rs#L126-L128

So something like this:

match constant.ty.sty {
    ty::TyFnDef(callee_def_id, callee_substs) => {
        // check goes here
    }
    _ => { }
}

Now you have the def-id and you have the substs (the set of values provided for each type parameter). So, next thing you can do is to fetch the Generics for that DefId:

let callee_generics = tcx.generics_of(callee_def_id);

Then you can go each of the substs and check if it is "must root". Although, given the mildly annoying way that the data structure is setup (which btw is being actively refactored) this works a bit better if we iterate through the generics:

// In our example, iterate over the generic parameters
// declared on `foo` (which is just `T`)...
for ty_param_def in &callee_generics.types {
    // If T has `#[must_root]`, then it is ok to
    // give it a must-root type, so just skip.
    // But in our example it doesn't.
    if has_unroot_attr(ty_param_def) {
        continue;
    }
    // <rest of code goes here>
}

The generic type declarations live in a struct like this:

https://github.com/rust-lang/rust/blob/a0b0f5fba5d7bf18b24d1fa0e454a4fe871fecee/src/librustc/ty/mod.rs#L713-L727

The field index has its index in the substs. So for each one we fetch the type that was given as its value from the substs using type_at:

let arg_ty = callee_substs.type_at(ty_param_def.index);
if is_must_root_ty(arg_ty) {
    report_error()
}

Give that a try! There are still a few more details to tidy up but it should handle our example.

@mrowqa
Copy link
Author

mrowqa commented Mar 26, 2018

I've done like you instructed and now it behaves as expected on your example. There are some tests that still fail (temporarily called generic_container and generic_container2; check out last commit).

I guess the next step is to make this code fail to compile:

#[must_root] struct Foo(i32);
struct SomeContainer<T>(T);

fn test() {
   SomeContainer(Foo(3));
}
Copy link
Contributor

nikomatsakis left a comment

Lookin' good so far!

}
}
ty::TyAdt(_, _) => {
cx.span_lint(UNROOTED_MUST_ROOT, constant.span, "xd")

This comment has been minimized.

Copy link
@nikomatsakis

nikomatsakis Mar 27, 2018

Contributor

you probably don't need this here -- this corresponds to a constant value of ADT type, e.g. const FOO: Option<i32> = Some(22) or something. This arm is checking the final type of the constant, and I think that's better done elsewhere. We only wanted to catch the case where we are supply unsuitable values for #[must_root] type parameters.

This comment has been minimized.

Copy link
@mrowqa

mrowqa Mar 27, 2018

Author

Oh, I forgot to remove this check. I was just playing around and then cleaning the code late in the night, so I missed it. Thanks for the explanation what it actually does!

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Mar 27, 2018

@mrowqa

Regarding this code:

#[must_root] struct Foo(i32);
struct SomeContainer<T>(T);

fn test() {
   SomeContainer(Foo(3));
}

Indeed, that should be an error. Handling this case is quite analogous -- that ought to be a Aggregate:

https://github.com/rust-lang/rust/blob/a0b0f5fba5d7bf18b24d1fa0e454a4fe871fecee/src/librustc/mir/mod.rs#L1607-L1612

and in particular there is this code here where we have a substitution that must be paired with the generics, in the same fashion we did before:

https://github.com/rust-lang/rust/blob/a0b0f5fba5d7bf18b24d1fa0e454a4fe871fecee/src/librustc/mir/mod.rs#L1642-L1647

The def-id of the AdtDef is accessible via its did field:

https://github.com/rust-lang/rust/blob/a0b0f5fba5d7bf18b24d1fa0e454a4fe871fecee/src/librustc/ty/mod.rs#L1585

@mrowqa
Copy link
Author

mrowqa commented Mar 27, 2018

@nikomatsakis That's what I actually tried and described on gitter. The Aggregate branch is not even taken. Following code doesn't panic on the test:

    fn visit_rvalue(&mut self, rvalue: &mir::Rvalue<'tcx>, _location: mir::Location) {
        match rvalue {
            &mir::Rvalue::Aggregate(_, _) => {
                panic!(" :( ");
            }
            _ => { }
        }
    }

Does it mean we have another bug in the compiler?

As I wrote on the gitter, in the MIR there is no Aggregate in the comments (I haven't checked how MIR dump formats it though). What I found, there are moves like this:

    let mut _1: SomeContainer<Foo>;
    let mut _2: Foo;

    bb0: {                              
        StorageLive(_2);                 // bb0[0]: scope 0 at src/main.rs:5:18: 5:24
        (_2.0: i32) = const 3i32;        // bb0[1]: scope 0 at src/main.rs:5:18: 5:24
                                         // ty::Const
                                         // + ty: i32
                                         // + val: Value(ByVal(Bytes(3)))
                                         // mir::Constant
                                         // + span: src/main.rs:5:22: 5:23
                                         // + ty: i32
                                         // + literal: const 3i32
here -> (_1.0: Foo) = move _2;           // bb0[2]: scope 0 at src/main.rs:5:4: 5:25

So I thought about checking Use viariant containg the Move operand, but I do have only Place inside with no information about the whole structure, so this way leads nowhere, I guess.

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Mar 27, 2018

@mrowqa Hmm. The problem seems to be that you are coming quite late in the pipeline, and by then the Aggregate instructions have been optimized away and converted into individual writes. That said, it occurs to me now that we can fix this in another way, by looking at the types of temporaries. Currently, in visit_local_decl, you are only looking at args/user-variables, but if you looked at temporaries (such as _2, in your example), you would still detect errors. The spans on these temporaries should correspond to source expressions in the MIR (so you can use them for error reporting).

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Mar 27, 2018

There may also be a way to insert your check earlier in the pipeline, but I'm not sure how to do that.

@Manishearth
Copy link
Member

Manishearth commented Mar 29, 2018

I'm okay with a whitelist.

@mrowqa
Copy link
Author

mrowqa commented Apr 8, 2018

I have pushed some fixes. Please take in mind that this change hasn't used compiler with my bugfix yet, so CI logs aren't accurate. I have opened another PR for bumping compiler version: #20589.

A little bit offtopic:
Through a couple of past days I was experiencing problems with building the project on Windows. It worked fine until I wanted to rebase my change on top of the master since I wanted to fix new errors introduced by better MIR lint. It broke a lot of stuff like disappearing cargo.exe, but after reinstalling everything (MSVC build on Windows), I cannot build mozjs_sys which was building fine before. This call fails with:

error: failed to run custom build command for `mozjs_sys v0.50.1`
process didn't exit successfully: `D:\dev\servo\target\debug\build\mozjs_sys-c45ac733464bcb6f\build-script-build` (exit code: 101)
--- stderr
thread 'main' panicked at 'Failed to run `make`: Os { code: 2, kind: NotFound, message: "Nie można odnaleźć określonego pliku." }', libcore\result.rs:916:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

The foreign text reads "File cannot be found.". Anyone have an idea what could be wrong?

I have temporarily set up a VM with Arch and Servo builds there, but development there is really inconvenient and slow for me right now.

CC @nikomatsakis @Manishearth

@mrowqa
Copy link
Author

mrowqa commented Apr 10, 2018

@nikomatsakis
I tried to fix the rest of the servo code to compile with the lint improvements.

Next test I am trying to pass:

#[must_root] struct Foo(i32);

trait Bar {
    fn extract(&self) -> i32;
}

impl Bar for Foo {
    fn extract(&self) -> i32 { self.0 }
}

fn test() {
    Foo(3).extract();
}

Here, we start with extract() call. Then we get into has_unrooted_generic_substs. Then we go to generics.parent which is Bar trait with one subst which is Foo which is #[must_root], so it is detected as an error. It shouldn't (FormData is #[dom_struct] which implies #[must_root] IIRC).

I tried to solve it in some different ways, but usually I was getting internal compiler error. One of my ideas was to use impl_trait_ref. I guess it's not exactly what I am looking for, but we should be aware of potential more substs than just only one for the type we're implementing, shouldn't we? But it can get even more complicated if we take a look at MIR dump (note: I added some extra generic parameters for soundness):

// ty::Const
// + ty: for<'r> fn(&'r Foo<i32>) -> i32 {<Foo<i32> as Bar<i32>>::extract}
// + val: Value(ByVal(Undef))

In case you'd like to run the code, merge this PR onto 07dd37a (it was today's master when I was merging it) and apply this diff with some useful debugging code.

Edit (in case Niko missed gitter message):

the only thing i came up with was to check path_def if it contains "{{impl}}" to differentiate generic params for impl<#[must_root] T> and impl X for Y where Y is #[must_root], but it seems to be a little bit hacky and I don't know if it's a good solution.

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Apr 13, 2018

@mrowqa

I don't quite understand the problem just yet. I think the problem is this:

The trait parameters are not declared as #[must_root] -- notably, the implicit Self parameter on the trait. Therefore, when we try to use a #[must_root] type with them (Foo, in this case), we get an error, because a #[must_root] type can only be used with a #[must_root] type parameter. Or, to put it our "trait-like" terms, it's as if the trait has trait Bar: Unrooted, so get an error for Foo, as it does not implement Unrooted.

This seems like a problem indeed. The obvious sound way to do this is to label the trait as #[must_root] -- meaning that its Self parameter is #[must_root]. This would imply then that code within the trait -- e.g., default method implementations -- have to treat Self as if it may be #[must_root].

We might also be able to do something like say: all traits type parameters are #[must_root] by default if they have no default methods, since I think that's the only case where it would matter if Self (and other trait parameters) were tagged with #[must_root].

(I'm not ignoring associated types, but we probably do want to think about them too, eventually -- the easiest thing though is to ban them from being bound to #[must_root] types.)

Anyway, that would probably solve the immediate problem, but there is some further complications around impls. For example, imagine we have this:

trait Foo { fn method(&self) { .. } }
impl<T> Foo for Option<T> { .. } 

I'm not 100% sure but I imagine we do not want to be able to use Option<X> with Foo where X is #[must_root]? Maybe we don't have to worry about that, though, since Option<Foo> would already itself be malformed (since Foo is #[must_root], and Option is not declared to take a #[must_root] type parameter).

In short, I have to think a bit about how best to apply the trait system to #[must_root] types. It would be interesting to get a list of examples where we impl things for #[must_root] types, just to see what sorts of patterns we have to support.

@mrowqa
Copy link
Author

mrowqa commented Apr 13, 2018

@nikomatsakis

You understood the problem right.

Annotating trait with #[must_root] solves the problem. Annotating all traits may be problematic, but many of them are auto generated as I can see.

We might also be able to do something like say: all traits type parameters are #[must_root] by default if they have no default methods, since I think that's the only case where it would matter if Self (and other trait parameters) were tagged with #[must_root].

I don't know if that's the case and also don't know how to implement this so far, but we haven't decided the solution yet, so it doesn't matter.

trait Foo { fn method(&self) { .. } }
impl<T> Foo for Option<T> { .. } 

We can't instantiate Option<Foo>. But we can have Box<T> in new methods. I performed very simple test on Box<T>: we can't call method, but we can call new_method (after changing the method name) where restrictions are loosened a little bit (for constructors purposes).

It would be interesting to get a list of examples where we impl things for #[must_root] types, just to see what sorts of patterns we have to support.

Here you are: error log. To get this error log I took commits from this PR, merged them with 07dd37a and applied this changes. Note: I have changed the following:

-pub type DomRoot<T> = Root<Dom<T>>;
+pub type DomRoot<#[must_root] T> = Root<Dom<T>>;

I guess that's correct. Anyway, some extra errors (about 700 I guess) appeared after this change.
To understand the log: you look for error: Callee generic type must be rooted.. The first line after it is the method which violated the check. Then par$$ means we checked recursively generics.parent in has_unrooted_generic_substs - usually it is trait. > shows specific #[must_root] type. Please note that some of the traits are auto generated bindings by codegen.

@Manishearth
Copy link
Member

Manishearth commented May 2, 2018

@mrowqa Just wanted to know where you're at on this now -- do you need any help? Do you have any questions?

@mrowqa
Copy link
Author

mrowqa commented May 2, 2018

@Manishearth After the last message I was waiting a few days for @nikomatsakis (check his last paragraph of his last message), then I haven't had time due to other classes on my university. I hope I will finish my assignments by the end of this week, but this is an optimistic estimation.

The next thing I wanted to try, and probably will do until anyone says there is a better solution, is to modify the codegen to put #[must_root] before each trait.
Then, next iteration - check what errors are left and try to fix them. At the end I should also change rules for #[derive(...)] to allow only specific derivations (currently every one is allowed).

@mrowqa
Copy link
Author

mrowqa commented May 9, 2018

Tonight I've got some time, so I'm continuing the work.

I've merged this with master since old base commit had too old rust-toolchain (without bugfix mentioned here before).
@nikomatsakis I've applied your suggestion to make all auto generated traits by codegen to be #[must_root]. This dropped amount of errors from ~9.1k to ~7.9k, but interesting is that without my debug code (not uploaded with latest commit) it decreases to even ~7.2k.

I'll try to solve as much of them as I can tonight.

CC @Manishearth @jdm

@mrowqa
Copy link
Author

mrowqa commented May 9, 2018

Now, there should be ~2k errors. I have also added debugging code this time, since it nicely visualize what exactly caused the violation of plugin policy - AppVeyor should list compilation errors with this extra information.

Could someone take a look at my two previous commits to check if my changes make sense? I was mainly putting #[must_root] in some places, and if error came from outside the script component, then added it to white list - there are // -- is ok? comments. Note: the rules in neighborhood without these comments were copy pasted from is_unrooted_ty back then in March or April. I haven't checked if all of them are necessary and valid from our point of view.

Tomorrow I should also have some time, so I'll continue the work.

CC @Manishearth @jdm

|| match_def_path(cx, did, &["core", "mem", "size_of"]) // -- is ok?
|| match_def_path(cx, did, &["malloc_size_of", "MallocSizeOf"]) // -- is ok?
|| match_def_path(cx, did, &["core", "option", "{{impl}}"]) // -- is ok?
|| match_def_path(cx, did, &["core", "option", "Option"]) // -- is ok? looks like misuse

This comment has been minimized.

Copy link
@Manishearth

Manishearth May 9, 2018

Member

Option is an owning thing, so it shouldn't be here. Option<MustRootType> must be rooted. I'm not sure why MallocSizeOf, size_of, and Deref need to be here.

This comment has been minimized.

Copy link
@mrowqa

mrowqa May 10, 2018

Author

That's what I actually thought about Option. I have put all of these on this list, because they needed to be rooted and they come from outside of script component, so I muted the errors in this way and I plan to do similar thing with the remaining errors. There is also another way - by modifying is_unrooted_ty to add rules that Option (and other ADTs) are implicitly #[must_root] if they contain #[must_root] types. I didn't come up with the latter solution in the first place, and it looks better than current one, so I'll try to move some stuff there.

This comment has been minimized.

Copy link
@Manishearth

Manishearth May 10, 2018

Member

This list is for things that don't need to be rooted even if they have rootables in their type parameters; i.e. reference types and similar.

This comment has been minimized.

Copy link
@mrowqa

mrowqa May 10, 2018

Author

Same states the comment in is_unrooted_ty. My intention is to put there things that maybe should be there or things that I can't easily solve in any other way, just to get rid of the compilation errors, and leave some TODO comment for items to be solved later, maybe even in separate pull request.

This comment has been minimized.

Copy link
@Manishearth

Manishearth May 10, 2018

Member

Oh that's fine I guess. Leave a visible comment saying they don't belong 😄

@mrowqa
Copy link
Author

mrowqa commented May 10, 2018

Now there is about 840 errors. It's getting tougher and tougher. I'll continue tomorrow.

@mrowqa
Copy link
Author

mrowqa commented May 11, 2018

Servo should compile now. I was trying to fix violations manually to some point when it got really tough and then just automatically generated list of remaining violations and masked some last errors.

What do we want to do next? I could try solve some violations in script component from the autogenerated list. Anyway, what will we do with the rest? It would be good to take one problem at a time and discuss how it can be solved, so then I would be able to fix similar violations. Note that I have muted some errors with #[allow(unrooted_must_root)], too. And I am not sure if I put #[must_root] in places where it should be, but if the code compiles, it shouldn't lead to crashes, I guess.

What more should be done to close this PR? I think one thing is removing debug logs. Or, actually they are really helpful to find out what violates the rooting policy, so I think they could remain, but be better formatted in some kind of tree and contain better explanation.

Other untouched thing is checking content of #[derive(...)] attributes, cause it is currently skipped. There is TODO note in code, but not so descriptive. I think that it has much less priority than the main problem, so it can be ignored now.

CC @jdm @Manishearth

@mrowqa
Copy link
Author

mrowqa commented May 11, 2018

Note: I added Box<T> to exception list, and there was a test that checks if it is disallowed, so I commented the test out.

@mrowqa
Copy link
Author

mrowqa commented May 19, 2018

I am still waiting for the answer. I haven't touched anything since last time, cause I have had more assignments at university recently.

Copy link
Member

jdm left a comment

Thank you for doing all this work! I looked closely at some of the early changes and skimmed a bunch of others, and I want to make sure I understand the reasoning behind some of the changes before I spend a bunch of time looking at all of them. In particular, I'm not sure I understand what it means to add a #[must_root] annotation to a trait, or why it's necessary in many cases.

@@ -13,12 +13,13 @@ use std::sync::mpsc::{Receiver, Sender};
/// common event loop messages. While this SendableWorkerScriptChan is alive, the associated
/// Worker object will remain alive.
#[derive(Clone, JSTraceable)]
pub struct SendableWorkerScriptChan<T: DomObject> {
#[must_root]
pub struct SendableWorkerScriptChan<#[must_root] T: DomObject> {

This comment has been minimized.

Copy link
@jdm

jdm May 31, 2018

Member

I'm confused by these changes - Trusted<T> is a type that does not require rooting (due to the allow_unrooted_interior attribute), so it's not clear why T needs to be marked must_root here.

This comment has been minimized.

Copy link
@mrowqa

mrowqa Jun 4, 2018

Author

Sorry, I don't understand the relation between Trusted<T> and this place. I was putting #[must_root] in some places I thought it should be, without further understanding all of the code (there were too many errors).

For methods generic over T that we want to be able to accept #[must_root] we must specify explicitly that it is #[must_root] or mark the method with allow_unrooted_interior and/or allow(unrooted_must_root) (not sure which one).

This comment has been minimized.

Copy link
@jdm

jdm Jun 4, 2018

Member

The reason I brought it up is that T is only used as an argument for the Trusted<T> member that is stored in SendableWorkerScriptChan. Since Trusted<T> is marked #[allow_unrooted_interior] it is allowed to be stored anywhere. SendableWorkerScriptChan should not be marked #[must_root] in that case. Is that case possible to represent with the current plugin?

This comment has been minimized.

Copy link
@mrowqa

mrowqa Jun 5, 2018

Author

Oh, I see. I didn't notice the field 2 lines below. I got your point.

I don't remember how exactly all of these checks work. As I wrote somewhere else, here we are dealing with struct definition which are handled only on the HIR level if I remember correctly.

The intention with using #[must_root] is that it should be properly and explicitly propagated. So if we want T to be able to represent a #[must_root] type, then it should be marked as a #[must_root] type. Then, if we pass T to Trusted<T> and it is marked with allow_unrooted_interior, it means that Trusted<T> isn't #[must_root], so the whole SendableWorkerScriptChan structure doesn't have to be #[must_root], because it consists of regular fields.

Why is SendableWorkerScriptChan a #[must_root]? I marked it because there were some compilation errors, I believe. It is the heuristics I wrote about here and there.

There is also one side note: I am not sure if having sth<T: U> where U is #[must_root] enforces T to also be #[must_root]. But it should be possible to mute the error with allow_unrooted_interior and/or allow(unrooted_must_root).

@@ -12,14 +12,15 @@ use style::thread_state::{self, ThreadState};
/// This extends the API of `std::cell::RefCell` to allow unsafe access in
/// certain situations, with dynamic checking in debug builds.
#[derive(Clone, Debug, Default, MallocSizeOf, PartialEq)]
pub struct DomRefCell<T> {
value: RefCell<T>,
//#[must_root] // TODO should it be #[must_root]? then -> there are usages with #[must_root] types and with normal ones; two different structs for them? or hacking plugin compiler further to mark structure as #[must_root] if it generic type is #[must_root]?

This comment has been minimized.

Copy link
@jdm

jdm May 31, 2018

Member

This is an interesting question. What we want to represent is that DomRefCell<T> is must_root if and only if T is must_root for each particular instance of T. Is this something that is possible with the current plugin?

This comment has been minimized.

Copy link
@mrowqa

mrowqa Jun 4, 2018

Author

I see the intention. It is what I came up with, but didn't implement so far. After short recalling I am not sure if it was very problematic in some way. It could potentially require some changes to part of the plugin that checks only HIR (check_struct_def method if I remember the name correctly) and maybe some cooperation with the method that analyzes the MIR.

@@ -5629,7 +5629,7 @@ def contains_unsafe_arg(arguments):

if methods:
self.cgRoot = CGWrapper(CGIndenter(CGList(methods, "")),
pre="pub trait %sMethods {\n" % descriptor.interface.identifier.name,
pre="#[must_root] pub trait %sMethods {\n" % descriptor.interface.identifier.name,

This comment has been minimized.

Copy link
@jdm

jdm May 31, 2018

Member

Why is this change required?

This comment has been minimized.

Copy link
@mrowqa

mrowqa Jun 4, 2018

Author

Because of this. We want traits to be #[must_root].

@@ -62,6 +62,7 @@ use servo_config::opts;
use std::{char, ffi, ptr, slice};

/// A trait to check whether a given `JSObject` implements an IDL interface.
#[must_root]

This comment has been minimized.

Copy link
@jdm

jdm May 31, 2018

Member

Why is this change required?

This comment has been minimized.

Copy link
@mrowqa

mrowqa Jun 4, 2018

Author

Like mentioned somewhere before - some heuristics. It is a trait that is implemented by #[must_root] types, I believe.

@@ -13,7 +13,7 @@ use std::default::Default;

/// Create the reflector for a new DOM object and yield ownership to the
/// reflector.
pub fn reflect_dom_object<T, U>(
pub fn reflect_dom_object<#[must_root] T, #[must_root] U>(

This comment has been minimized.

Copy link
@jdm

jdm May 31, 2018

Member

I think the right solution for Box<T> where T is actually a rooted type is creating a #[allow_unrooted_interior] struct Unrooted<#[must_root] T>(Box<T>) and using that instead of Box<T>.

This comment has been minimized.

Copy link
@mrowqa

mrowqa Jun 4, 2018

Author

Sounds reasonable. If I remember correctly, this is the only usage of Box<T> with #[must_root] T, though. At least, the only usage reported by the compiler when I was gradually fixing the violations. So I can be wrong.

@@ -74,6 +74,7 @@ impl Reflector {
}

/// A trait to provide access to the `Reflector` for a DOM object.
#[must_root]

This comment has been minimized.

Copy link
@jdm

jdm May 31, 2018

Member

Why is this necessary?

This comment has been minimized.

Copy link
@mrowqa

mrowqa Jun 4, 2018

Author

Like mentioned before - it is just some heuristics of mine.

@@ -309,20 +309,20 @@ impl<'root, T: RootedReference<'root> + 'root> RootedReference<'root> for Option
/// on the stack, the `Dom<T>` can point to freed memory.
///
/// This should only be used as a field in other DOM objects.
#[must_root]
pub struct Dom<T> {
// #[must_root]

This comment has been minimized.

Copy link
@jdm

jdm May 31, 2018

Member

Why is this commented out? It's pretty important for the safety of Dom<T>.

This comment has been minimized.

Copy link
@mrowqa

mrowqa Jun 4, 2018

Author

To be honest, I don't remember. Maybe I thought that making T a #[must_root] is enough, but then the compiler would skip the pointer reference. Maybe I commented it out, because it produced a lot of errors that I couldn't get rid of. If it is the case, then it was probably before I started to put items in temporary exception list.

@@ -502,7 +502,7 @@ impl JsTimerTask {
},
InternalTimerCallback::FunctionTimerCallback(ref function, ref arguments) => {
let arguments = self.collect_heap_args(arguments);
let _ = function.Call_(this, arguments, Report);
let _ = function.Call_(this, arguments, Report); // TODO this is of type T, I can't locate where function is defined (somewhere in bind gen?)

This comment has been minimized.

Copy link
@jdm

jdm May 31, 2018

Member

Function is generated by CodegenRust.py due to components/script/dom/webidl/Function.webidl

This comment has been minimized.

Copy link
@mrowqa

mrowqa Jun 4, 2018

Author

I believe I managed to hack it in some way and let Servo compile. I wanted to make the function to accept #[must_root] type, I guess.

@mrowqa
Copy link
Author

mrowqa commented Jun 4, 2018

In particular, I'm not sure I understand what it means to add a #[must_root] annotation to a trait, or why it's necessary in many cases.

It is a matter of compiler internals. The problem with simple test and suggested solution can be found here.

@bors-servo
Copy link
Contributor

bors-servo commented Jul 5, 2018

The latest upstream changes (presumably #21118) made this pull request unmergeable. Please resolve the merge conflicts.

@asajeffrey
Copy link
Member

asajeffrey commented Jan 6, 2020

What's the state of play with this PR?

@jdm
Copy link
Member

jdm commented Jan 6, 2020

One open question is whether the compiler interfaces we're using here are something we could reasonably expect to keep using in the future. Given the concerns from upstream about our existing plugin usage, I'm not confident that's the case. Let's close this until we're ready to dedicate more resources to it.

@jdm jdm closed this Jan 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

8 participants
You can’t perform that action at this time.