Skip to content
This repository has been archived by the owner on Sep 11, 2024. It is now read-only.

COM trait methods should probably always be unsafe #97

Closed
bearmate opened this issue Oct 31, 2019 · 18 comments
Closed

COM trait methods should probably always be unsafe #97

bearmate opened this issue Oct 31, 2019 · 18 comments
Labels
!safety! Issues dealing with memory safety of the crate

Comments

@bearmate
Copy link

#[com_interface(00000000-0000-0000-C000-000000000046)]
pub trait IUnknown {
    unsafe fn query_interface(
        &self,
        riid: winapi::shared::guiddef::REFIID,
        ppv: *mut *mut winapi::ctypes::c_void
    ) -> winapi::shared::winerror::HRESULT;
    fn add_ref(&self) -> u32;
    unsafe fn release(&self) -> u32;
}

This way of declaring COM interfaces is really convenient, but IMO it provides a wrong feeling of safety. An erroneous COM server may cause memory errors, even in simple functions like add_ref. Further I would argue this functions are all special kinds of FFI calls (using the COM ABI), so like every normal FFI function they should always be unsafe. Currently the easiest way to use the com_interface macro (to not use unsafe at all) is probably the most dangerous one.

This extends to the com::InterfaceRc type. I think constructing this struct should also be unsafe. The user should be forced to verify the called COM server or signal their trust by constructing this type within an unsafe block.

This would lead to a less convenient usage, but I think this would still be a better approach, because most of the time the COM interfaces will probably not be used directly in idiomatic Rust code. Most of the time their will probably be behind another layer of abstraction and I think this should also be the layer providing the safe interface for the COM types.

However I see that especially in cases where COM is used as a system for plugin-like components it is in general impossible to verify the correctness, so their is probably a practical compromise needed.

@Rantanen
Copy link
Contributor

Rantanen commented Nov 1, 2019

Sorry, I'm going to hijack this a bit. I've been struggling with the same issue in Intercom and I'm curious to hear how my thoughts resonate. Given com-rs has the same issue, I don't feel too guilty for typing this out here.

TL;DR: Bad COM libraries can do bad things, just like bad Rust crates can do bad things. We assume Rust crates follow Rust rules, thus we should also assume COM libraries follow COM rules. unsafe should be used for the difference.

How far should unsafe reach?

The safety-extreme point of view would be that unless we can 100% guarantee that everything is safe, we should mark things as unsafe. However this would mean everything that touches extern functions would need to be recursively unsafe, infecting the whole program in the end.

Actually we go even further than that: I don't know what some_other_crate does. For all I know, they could do something extremely unsafe the moment I invoke one of their funtions. So any call to a different crate should be marked as unsafe? Clearly this isn't the case. We feel it's safe to call other crates, because they are Rust crates, which implies certain contract.

So what about COM?

I feel that last point applies to COM libraries as well. If a library claims to be a valid COM library, that claim comes with certain requirements - just like the claim to be a valid Rust crate does.

So just like with Rust crates we assume that the other crates abide by borrow rules, etc., I feel we should also assume that COM libraries abide by COM rules. This includes things such as the syscall calling convention and add_ref/release being correctly implemented, etc. So what is left for Rust's unsafe would be things required by Rust, but not covered by COM "guarantees": pointer aliasing, mutable references, etc.

@rylev
Copy link
Contributor

rylev commented Nov 1, 2019

We actually struggled a lot with this question and one reason we open sourced the library so early in its development was to get people's feedback about this, so I'm very happy to see this being brought up. Bringing in @snf and @adrianwithah who have also given some thought to the issue.

I personally am actually more sympathetic to the cause of requiring the user to mark all COM interface traits as unsafe. The reason I think that @Rantanen's argument isn't completely sound is for the following reasons:

Marking an API as unsafe means that the user of that API must verify the safety of the API against Rust's rules around memory and data race safety because the author of the API cannot or has not done so. If a crate has exposed an API as safe than the user of that API should assume that the author has verified that that that API can never been used in an unsafe way. In other words, if a user experiences memory unsafe issues in their program it can only be for one of two reasons, the user has used an unsafe API and failed to uphold the invariants required by the API OR there is a bug with respect to Rust's memory safety rules in that library.

In the case of this com crate, we as the authors of it, cannot verify that its usage will always be 100% safe. It is up to the user of the library to verify that the COM interface that they are working against conforms to the proper COM conventions. It's true (modulo bugs in this crate), that if the library being interfaced with is a valid COM library, then the user can be sure that things will work properly, but it is up to the user of com to verify this. Hinting to the user that they have a responsibility to verify the validity of an API beyond what the crate author can do is exactly what unsafe is for.

@raphlinus
Copy link

I agree with @rylev here. I think it makes sense to assume that the COM parts of the system play by the COM rules, and the Rust parts of the system are safe according to the Rust rules. I see a number of problems, though, in that the former is nowhere nearly as precisely defined and documented.

For a COM server that does something simple, I think it makes sense to allow its trait methods to be safe. But if it's possible to create unsafety by doing things that are allowable in the COM world, then I think it's better to err on the side of requiring "unsafe". I think someone basically needs to put on a "security researcher" hat and look for ways we can get to unsafety. I have a collection (the pointer return from IDWriteTextAnalysisSource::GetLocaleName method is fun, and then there's the whole question about whether there should be mutable access to ID3DBlob).

@Rantanen
Copy link
Contributor

Rantanen commented Nov 1, 2019

Now I'm conflicted... :)

I think it all boils down to "what should be verified?". Especially given COM's dynamic nature (you can register a different implementation in the system and that gets used instead), verifying everything is just not possible. In many cases verifying anything is difficult due to the binary-nature of COM libraries: What would I need to verify in order to provide a safe wrapper for Excel.Application for example - or OLE DB, which may end up calling third party ODBC drivers depending on parameters?

Another issue I have with marking otherwise safe trait methods as unsafe is the noise. I'm afraid that if every single COM call requires an unsafe { ... } block, then it might become too common and thus too easy to write without verifying the usage. Although this might not apply to com, given..

most of the time the COM interfaces will probably not be used directly in idiomatic Rust code. Most of the time their will probably be behind another layer of abstraction and I think this should also be the layer providing the safe interface for the COM types.

Maybe unsafe { extern com_crate Excel } would be the way to go. Requiring unsafe and thus forcing someone somewhere to verify that the Excel COM API is safe in COM terms, but not causing every single use of add_ref to require an unsafe block. :)

This of course doesn't cover method-based invariants, where whoever writes the trait should still add unsafes, just like with the release() above.

.. although why is query_interface marked as unsafe? Unfortunately the method documentation didn't have a Safety header:

/// The COM [`QueryInterface` Method]
///
/// This method normally should not be called directly. Interfaces that implement
/// `IUnknown` also implement [`IUknown::get_interface`] which is a safe wrapper around
/// `IUknown::query_interface`.
///
/// [`QueryInterface` Method]: https://docs.microsoft.com/en-us/windows/win32/api/unknwn/nf-unknwn-iunknown-queryinterface(refiid_void)
/// [`IUnknown::get_interface`]: trait.IUnknown.html#method.get_interface
unsafe fn query_interface(&self, riid: REFIID, ppv: *mut *mut c_void) -> HRESULT;

@adrianwithah
Copy link
Contributor

+1 regarding @Rantanen's point on usability. I think making unsafe code blocks too commonplace will diminish it's significance as a warning flag. Suppose we are relatively confident that a COM component (maybe one created using Rust?) is free of memory safety bugs, we should have a way for the developer to "trust" a component and expose it's methods as safe (assuming it has safe implementation).

That being said, as @raphlinus has mentioned, COM rules are not as precisely defined, which has been my experience while working on this library. There are many gotchas (such as uninitialised out-pointers) that could present some nasty bugs. With that in mind, I am hesitant to just introduce the "trust" function without proper documentation about a vetting process. In my opinion, this would often encourage developers to blindly use the "trust" function.

Using @Rantanen's example of Excel.Application, if we are unable to verify the functionality of third-party software, then I feel like it might not a valid case for "trusting" the component. We have to make sure that we still conform to Rust expectations that we do our due diligence before creating a safe wrapper.

@Rantanen
Copy link
Contributor

Rantanen commented Nov 4, 2019

What would unsafe on the function mean?

Personally, I feel unsafe on the function means that the user needs to somehow vet that calling that function is okay. What would this mean for COM objects?

  • The underlying pointer is a real COM pointer with a valid VTable?
    • InterfaceRc::drop ends up calling a release already so I feel that requirement is already part of the InterfacePtr::new safety requirements.
  • The caller is allowed to call the method from this thread?
    • Doesn't com already marshal through STA so this is a non-issue?
  • The calling convention is correct and the callee doesn't trash the stack.
    • Specifying one calling conventions to use is kind of what allows COM to work.
  • The parameters are valid for the purposes of the COM call.
    • I guess this is the can of worms. :)

More specifically what should not be included in the function-level blanket unsafe are:

  • Safety requirements specific to certain methods.
    • A method requires that a value behind a pointer is held alive for longer than just the call itself?
      • Manually mark the method as unsafe
    • A method requires that a buffer behind a pointer is large enough compared to another parameter?
      • Manually mark the method as unsafe
  • Any assumptions on the returned values.
    • Pointers returned by the function may be dangling. Creating dangling pointers isn't unsafe in Rust. The following code is safe Rust code:
fn get_ptr() -> *mut usize {
    1234 as *mut _
}

Using (dereferencing) any pointer requires unsafe block. The safety concerns of any return values should be resolved at that unsafe { ... }.

So what would be included in parameter validity then? The big concern I've got is *const pointers, especially given const qualifiers are lost in type libraries/MIDL generated code. I was thinking *mut pointers would be fair game to pass, given my understanding was that the caller would require &mut reference to acquire such a pointer, but surprisingly, the following code is valid "safe" Rust:

fn do_stuff_with_borrow(data: &Data) {
    call_com_method(data as *const _ as *mut _);
}

I think this is enough to turn my mind on the subject for any method that accepts pointer values of any kind. As long as all the parameters are Copy-ish types, there should be nothing the other side can do to affect Rust's memory model - however given how much Rust relies on the unsafe consideration taking place at pointer usage time, I think marking all the methods that use pointer parameters in any form as unsafe would be justified as there's no more need for unsafe { ... } once the execution passes to a C++ COM component.

The specific &Data as *const _ as *mut _ safety issue when calling COM interfaces could be avoided by having com support reference parameters directly (so it can be certain that the caller had &mut reference if the method needs a *mut pointer). However this would then cause an aliasing issue when implementing COM interfaces... :)

Supporting reference parameters wouldn't do anything to the callee being able to modify data behind *const pointers though.

@rylev
Copy link
Contributor

rylev commented Nov 4, 2019

I agree this is definitely a hair problem. I agree that we should not require the user to mark each usage of a method on a COM interfaces as unsafe (unless the user themselves cannot verify safe usage in which case they must mark the function as unsafe as well). Requiring marking the COM interface traits themselves as unsafe should be enough. This specifies to the user that they have verified that all ways this COM interface can be used from safe Rust are indeed safe. Given the current state of documentation, this is indeed a tall order, however!

We need to then document what assumptions about the COM interface we make in this library. This is an extremely difficult thing to do correctly, but I believe we can at the very least start this process. Some things are obvious.

For example, we assume that pointers to the interface remain valid as long as the reference count is above 0. We assume reference counts are decremented by 1 each time release is called. We assume that the pointer is freed once the reference count reaches 0. There are many other well known assumptions that we should document. Of course, there are even more "hidden" assumptions that we hopefully will discover with time. The user, at least, will have opted into this risk by marking the COM interface as unsafe.

@Rantanen
Copy link
Contributor

Rantanen commented Nov 4, 2019

We assume reference counts are decremented by 1 each time release is called. We assume that the pointer is freed once the reference count reaches 0.

This misses couple of real world usage cases:

  • Proxy implementations may intercept AddRef and implement their own reference counting. As far as I know .NET Interop only calls AddRef once for each proxy object and then tracks its own reference count on the proxy objects themselves.
  • If the library provides a COM interface to a static object, it shouldn't ever be released no matter how many times release is called.
  • Given static objects are never released, there's no need to waste time tracking an actual reference count on them.

When it comes to add_ref/release, do you need to assume anything other than the following?

  • Pointers are not freed until the last reference added with add_ref has been released with release.
    • Being intentionally ambiguous that the pointer doesn't need to be freed at this point.
  • Input COM interface pointers parameters are either null or point to a valid COM object, which is held referenced by the caller for the duration of the call.
  • Output COM interface pointers are considered to point to uninitialized memory when passed to a function. The callee should not attempt to dereference them for use (such as invoking release).
  • Output COM interface pointer parameters must point to either null or a valid COM object when the function returns. If these point to a valid COM object, the callee must have add_ref'd them.

And just a reminder, the return values of add_ref and release states:

The method returns the new reference count. This value is intended to be used only for test purposes.

@adumbidiot
Copy link

Sorry in advance if this is a dumb question as I haven't used COM before. Wouldn't dropping the runtime while keeping the object alive make all calls to that object crash the program, including add_ref? If so, shouldn't all calls to that object's methods be unsafe, in the event that the runtime has been dropped?

@Rantanen
Copy link
Contributor

That's a good point. It might be worth it to mark the runtime uninitialization as unsafe.

I feel this case is similar to normal Rust method calls. foo.bar() may crash the program, if foo is a reference built on top of a bad pointer - still, it's the dereference of the pointer that needs to be marked unsafe, even if the dereference isn't usually the one that triggers the crash.

@rylev
Copy link
Contributor

rylev commented Nov 15, 2019

Thanks for bringing this up. I believe you are correct though I think we should do more research to understand exactly what happens when a COM interface gets called after uninitialization. My gut tells me that it leaves our binary in an undefined state so this really is unsafe, but if Windows is doing some sort of magic to abort the process immediately upon this happening then I don't believe that would qualify as unsafe. Of course, if the process is still allowed to run (e.g., to unwind its stack), then this is indeed UB and unsafe.

@Rantanen
Copy link
Contributor

Looking further into ApartmentThreadedRuntime, it feels somewhat unsafe currently.

In addition to #105, the bigger issue is the not-unsafe CoUninitialize during Drop and no relationship between the runtime lifetime and the returned InterfaceRc results.

Currently the following is "safe":

fn return_instance() -> InterfaceRc<IFoo> {
    ApartmentThreadedRuntime::new()
        .unwrap()
        .create_instance::<IFoo>(&FOO_CLSID)
        .unwrap()
}

The runtime gets dropped inside the function while the InterfaceRc is returned out of it and gets dropped later. This results in the runtime having been uninitialized by the time the InterfaceRc::drop calls release().

Given Drop cannot be marked as unsafe, the easiest solution would be to mark the ApartmenThreadedRuntime::new as unsafe.

Another option would be to include the runtime lifetime in the InterfaceRc, but this might have unnecessarily noisy result for the otherwise quite safe pattern of calling CoInitialize at program start and CoUninitialize right before the program exit.

@adumbidiot
Copy link

Would using reference counting be too complex of a solution? Like each InterfaceRc having a reference-counted handle to a runtime, and only calling CoUninitalize when the number of handles goes to 0?

If that is too complex, would making the drop implementation panic unconditionally to force users of the library to call a special unsafe function to call CoUninitalize be viable?

I don't think marking the runtime's initialization as unsafe is a good idea, as the runtime's Drop implementation is what causes the UB. I feel like the result would be hard to track down program crashes, which would defeat the purpose of unsafe in the first place, though I also feel like I'm missing some of the rationale behind that option.

I'd also like to say that the rust SDL2 wrapper seems to wrestle with a similar problem where textures could potentially outlive their creators, and cause UB if they were dropped after their creator had been destroyed. The wrapper seems to have dealt with this problem by providing two implementations behind a feature gate: one reliant on lifetimes and another reliant on manual memory management. In the lifetime based solution, a created texture cannot outlive its creator by giving it a lifetime parameter, and it safely frees its memory through its drop implementation. In the solution based on manual memory management, there is no Drop implementation. Instead, the programmer is expected to call an unsafe destroy function on the texture to delete it and free the memory.

I'm not sure if that project handles the problem correctly; it just came to mind when I saw this problem and I feel like the the more possible solutions there are the better.

@adumbidiot
Copy link

Actually, would #101 be sufficient if it forces all traits to be unsafe?

@Rantanen
Copy link
Contributor

I feel the options are:

  • Make ApartmentThreadedRuntime::new() unsafe or add unsafe ApartmentThreadedRuntime::uninitialize() and panic on drop() if the uninitialize hasn't been called.
    • I see both of these approaches as rather similar. Both of them have trade offs of some sort, but in the end they both boil down to having the user ensure they manage the runtime in a safe way.
  • Have the InterfaceRc/InterfacePtr/etc. carry the runtime lifetime wherever they go.
    • The most "correct" approach. Might be noisy though by having those runtime lifetimes present everwyhere.
  • Reference count the interface refereces and delay the proper CoUninitailize till there are no references remaining.
    • Comes with a bit of a runtime cost.
    • There might be need for reference counting for DllCanUnloadNow anyway at some point - although the runtime reference counting is needed for COM clients while DllCanUnloadNow concerns COM servers.

I also feel like I'm missing some of the rationale behind that option.

I see unsafe as a requirement for the user to read the documentation of the method and assert to the compiler with unsafe { .. ] that they have done so and any safety requirements are maintained. Whether that exist son new or uninitialize doesn't matter that much in my books as long as it exists on one of them. Of course it would be better to have it closer to the point of the possible crash/undefined behavior/etc.,

Actually, would #101 be sufficient if it forces all traits to be unsafe?

#101 affects mostly COM servers. The runtime lifetime is a problem for COM clients. Especially if there is ever support for IDL-based code generation, then all of those unsafe traits would be defined in generated code and the writer of the COM client would never need to deal with a single unsafe keyword.

Persoanlly I'd be curious to see just how invasive 'runtime lifetime on InterfaceRc/InterfacePtr would be. Rust has gotten better at inferring those lifetimes so it might be an okay approach - although there would probably still need to be a way to convert those into 'static objects if they need to be boxed in a Box<T> or Rc<T> for example. Such omit_lifetime() method would be unsafe.

@rylev
Copy link
Contributor

rylev commented Nov 20, 2019

This issue has turned into a massive collection of related issues. I'd like us to return back to the question of a COM trait's methods should be unsafe or not. I realize that the other issues are related to this question, but I think we can first decide here what we want, and then use this decision to drive the answers to the other questions.

At the center of this is the fact that when a user declares a COM interface, they are not necessarily sure whether the concrete implementation backing this interface is safe or not. The reason for this is that it might not be known until runtime what actual implementation will be chosen. The user therefore might not be able to verify the safety of the implementation ahead of time.

There are several solutions to this issue:

  • Mark all COM interface methods as unsafe. This means that when the user actually calls the methods they will be required to verify the safety of the method call. This is arguably the only correct way to handle this as at the method call site, the user has the most amount of information on whether calling the method in that context is actually memory safe. This does however come with severe ergonomic downsides as each method call must be marked as unsafe.
  • Mark the COM interface declaration as unsafe. This means that when declaring the COM interface, the user is responsible for ensuring that the interface can never be used in an unsafe way. The user can then freely use the COM methods without having to verify safety. It should be possible to do this verification in a large amount of cases but might not always be. Safety could be supplemented with other unsafe markers. For instance, ApartmentThreadedRuntime::get_class_object could be unsafe, and require that the user verify that the use of all Interfaces with that concrete CO_CLASS is safe.
  • Do not require the user to mark the COM interface as unsafe at all. This is what we currently have, and this feels like the least correct option. I am personally convinced that we need to change this crate somehow.

I'm personally leaning to the first option. Calls to COM interface methods are inherently unsafe. Hiding that fact merely because it's inconvenient betrays the purpose of Rust. However, I know several people are against this because it makes the crate much harder to use, so I'm definitely interested in finding a difference solution.

If we believe that marking both the COM interface trait and the initialization of co_classes as unsafe could sufficiently require the user to verify safety, then perhaps that's the right way to go. In a large code base I'm not sure that is possible, however.

@rylev rylev added the !safety! Issues dealing with memory safety of the crate label Nov 20, 2019
@Rantanen
Copy link
Contributor

Rantanen commented Nov 20, 2019

TL;DR; I'm again leaning on marking everything unsafe - an opinion that changed again while writing this response...

This has been a bit of a roller coaster for me. Like I stated in my first comment in this issue, I've been giving this issue some thought previously from Intercom's perspective. Given Intercom aims to provide a user-level API, from that perspective my knee-jerk reaction was that unsafe is bad and makes the API difficult to use.

I've since then clarified the difference between Intercom and com-rs in my head. And given com-rs is a lot lower level API that is likely to be wrapped by a safe abstraction, I don't see everything being unsafe as such a bad thing anymore and started leaning towards the first option.

However finally I started to wonder what that unsafe would mean.

Especially currently when InterfaceRc is safe to construct, there's already an assumption that calling release() or add_ref() is safe.

I think at this point I feel like com-rs could go either way and it all depends on..

We need to then document what assumptions about the COM interface we make in this library.

I feel com-rs can't assume everything is safe. So some methods definitely should be unsafe. What would those methods be? I feel a somewhat obvious concern are methods that deal with pointers.

  • Calling IFoo::set_interface(&self, itf: *const Interface) should almost certainly be unsafe - the user is responsible for ensuring the parameter itf is not dangling.
  • Calling IFoo::get_value(&self, output: *mut u32) is similarly unsafe as the user is responsible for ensuring the output points to valid memory that can receive the value.

Now given COM uses HRESULT return values and relegates all meaningful outputs into output parameters that are pointers to caller allocated memory, this alone would make most COM methods unsafe anyway - and given most COM methods should be unsafe no matter what the conclusion to this issue is, the conclusion could just as well be "require everything is unsafe" without really affecting the ergonomics of most COM APIs.

@rylev rylev mentioned this issue Feb 17, 2020
@rylev
Copy link
Contributor

rylev commented Feb 18, 2020

I believe this issue is mostly resolved through #120 . The topic of safety still deserves a deeper look, but I'd like to continue those in different threads.

@rylev rylev closed this as completed Feb 18, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
!safety! Issues dealing with memory safety of the crate
Projects
None yet
Development

No branches or pull requests

6 participants