Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document explicitly that Weak::from_raw(ptr::null()) is ub #114525

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

est31
Copy link
Member

@est31 est31 commented Aug 5, 2023

Fixes #114517 . I was confused for a moment, as the docs were very terse on the question of whether Weak::from_raw(ptr::null()) is ub or not. It is UB, as the implementation uses NonNull, and this is intended, but as_ptr and into_raw and into_raw_and_alloc still can return null pointers in the future.

@rustbot
Copy link
Collaborator

rustbot commented Aug 5, 2023

r? @cuviper

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Aug 5, 2023
@est31 est31 mentioned this pull request Aug 5, 2023
@Zalathar
Copy link
Contributor

Zalathar commented Aug 6, 2023

To me, the existing documentation seems pretty clear about the fact that you are only allowed to call from_raw on a pointer that came from into_raw. Any other pointer is not allowed, regardless of whether it came from ptr::null or as_ptr or somewhere else.

I worry that drawing attention to specific implementation details might end up being more confusing to the reader, by muddying the message of what is and isn't allowed.

I suppose it could still make sense to point out that ptr::null and as_ptr are definitely not OK, since they're readily-available things that people might think to try. (And the as_ptr case might temporarily appear to work until everything suddenly goes horribly wrong.)

@RalfJung
Copy link
Member

RalfJung commented Aug 6, 2023

I worry that drawing attention to specific implementation details might end up being more confusing to the reader, by muddying the message of what is and isn't allowed.

Yeah, the null pointer is just one of many pointers that you are not allowed to use, so it's somewhat confusing to put special emphasis on it.

@est31
Copy link
Member Author

est31 commented Aug 6, 2023

To me, the existing documentation seems pretty clear about the fact that you are only allowed to call from_raw on a pointer that came from into_raw. Any other pointer is not allowed, regardless of whether it came from ptr::null or as_ptr or somewhere else.

Right, the my concern was just if as_ptr or from_raw say that they might return null pointers, people might think that from_raw can also take in a null pointer, because it can theoretically come from from_raw: it boils down to different interpretations of what "comes from from_raw" means, and whether two null pointers are the same thing.

Yeah, the null pointer is just one of many pointers that you are not allowed to use, so it's somewhat confusing to put special emphasis on it.

I have put special emphasis on null pointers, because the internally used type is NonNull, and because it is very common to initialize invalid pointers via null. As @Zalathar says, null pointers are readily available. The docs do mention other invalid pointers as well:

This extends to arbitrary invalid/unaligned/etc. pointers.

And the as_ptr case might temporarily appear to work until everything suddenly goes horribly wrong

I've considered changing the docs to explicitly allow the result of as_ptr to be passed, but then with the additional requirement that forget needs to have been called, or the value needs to have been leaked in some other way. But explaining it seemed too complicated to me, and people can just use into_raw. Shrug.

@RalfJung
Copy link
Member

RalfJung commented Aug 6, 2023

Right, the my concern was just if as_ptr or from_raw say that they might return null pointers, people might think that from_raw can also take in a null pointer,

It can take in a null pointer if into_raw ever actually returns one. The input is a value that into_raw actually did return, not one that it might have returned.

Just because into_raw returns 0xadfe in one execution doesn't mean 0xadfe is always a valid input to from_raw.

@cuviper
Copy link
Member

cuviper commented Aug 6, 2023

Right -- if into_raw ever changes to return null, then from_raw must be updated at that time. But generally, these should be treated like opaque values with provenance, so you're not allowed to bring your own null/dangling/etc.

@est31
Copy link
Member Author

est31 commented Aug 6, 2023

It can take in a null pointer if into_raw ever actually returns one. The input is a value that into_raw actually did return, not one that it might have returned.

That second sentence is more or less what this PR is adding to the docs, to make this explicit. I read the current docs, but they were confusing enough for me to open #114517.

Just because into_raw returns 0xadfe in one execution doesn't mean 0xadfe is always a valid input to from_raw.

Yeah for particular values like 0xadfe it makes less sense, maybe someone thinks they can initialize some Weak reference with a null pointer to then later fill it with data, because into_raw: this is absolutely not a legal use.

But generally, these should be treated like opaque values with provenance, so you're not allowed to bring your own null/dangling/etc.

Yes. Clarifying that situation is what this PR is about.

@@ -2818,6 +2818,8 @@ impl<T: ?Sized> Weak<T> {
impl<T: ?Sized, A: Allocator> Weak<T, A> {
/// Returns a raw pointer to the object `T` pointed to by this `Weak<T>`.
///
/// # Safety
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is strange to see a "safety" comment on a safe function

@RalfJung
Copy link
Member

RalfJung commented Aug 6, 2023

I find the docs with your change more confusing than without. The new paragraph is a big hypothetical and hard to follow. Maybe try expressing what the underlying problem is in a different way? I understand you were confused by the current docs, but I think more people will be confused by your new docs than by the current docs, so this is not an improvement.

The current docs explicitly state "must still own its potential weak reference", which already means "you can only call this exactly as often as into_raw was called". Is that the part that didn't click for you? Should there be a paragraph explaining this in more detail?

@est31 est31 force-pushed the weak_null_docs branch 2 times, most recently from 1a32f3e to 92e8760 Compare August 7, 2023 00:41
@est31
Copy link
Member Author

est31 commented Aug 7, 2023

It is strange to see a "safety" comment on a safe function

Good point. I'd added it to add more structure and make it obvious that this is relevant for unsafe things people might do with it. That was my understanding of the # Safety heading. If it is exclusively used for preconditions of unsafe functions, then obviously it's wrong there. I will remove it.

I understand you were confused by the current docs, but I think more people will be confused by your new docs than by the current docs, so this is not an improvement.

Thanks for the feedback, I have pushed a new version that I think is better, do you like it?

must still own its potential weak reference

That is also something that's hard to understand tbh, because there is two things here at play: the thing the reference points to, and also the weak reference itself, which is an abstract concept encoded by the Weak type, but also partially lives at the site of the pointer target because the ref count is stored there. It also doesn't help that you have the word "potential" in the sentence. I think i won't touch it though because it's explained a little bit further in the safety section.

/// functions, not what these functions can maybe return, or are documented to potentially return.
/// For example, [`into_raw`] can return dangling pointers, but this doesn't allow you to create
/// a dangling pointer yourself and pass it to `from_raw`. This rule holds even if that pointer
/// created via some other means has the same value than some output you have observed from
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it has entirely the same value then you can use it. The subtle thing is that "same value" on pointers means "same address and same provenance".

What is that last sentence supposed to defend against?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first was to explain that even though into_raw might return some dangling pointer, including possibly the null pointer, from_raw does not allow you to create an arbitrary dangling pointer. The second was to explain that even if the address is the same (or whatever else goes into the PartialEq/Hash impls on pointers), it's still not allowed. I don't know how provenance is defined, and if the docs still say that provenance's definition is vague, then I think explaining the impact here in clear words is helpful. I thought provenance was only a concept for valid pointers for example, didn't think that different invocations of ptr::invalid_mut() with the same address also yielded different provenances, instead of just having one and the same "INVALID" provenance (whereas proper allocations each create their own provenance).

Also, even if provenance allows people to use ptr::invalid_mut(usize::MAX), which is equivalent to Weak::into_inner(Weak::new()), we still want to forbid that, because we reserve changing the implementation in the future, e.g. to use null pointers instead of usize::MAX. That's at least how I understand the current documentation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second was to explain that even if the address is the same (or whatever else goes into the PartialEq/Hash impls on pointers), it's still not allowed.

But the first already says that. It sounds like you are saying the same thing 3 different times using different words, and I still find that more confusing than helpful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I'll remove the repetition, but I'll keep the part with Weak::new as @cuviper likes it.

Comment on lines 2783 to 2788
/// Note that `from_raw` expects values that actually originated from a call to one of these
/// functions, not what these functions can maybe return, or are documented to potentially return.
/// For example, [`into_raw`] can return dangling pointers, but this doesn't allow you to create
/// a dangling pointer yourself and pass it to `from_raw`. This rule holds even if that pointer
/// created via some other means has the same value than some output you have observed from
/// [`into_raw`]: use [`Weak::new`] instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Note that `from_raw` expects values that actually originated from a call to one of these
/// functions, not what these functions can maybe return, or are documented to potentially return.
/// For example, [`into_raw`] can return dangling pointers, but this doesn't allow you to create
/// a dangling pointer yourself and pass it to `from_raw`. This rule holds even if that pointer
/// created via some other means has the same value than some output you have observed from
/// [`into_raw`]: use [`Weak::new`] instead.
/// Note that `from_raw` expects values that actually originated from a call to one of these
/// functions and have not been used with `from_raw` yet,
/// not what these functions can maybe return, or are documented to potentially return.
/// For example, [`into_raw`] can return dangling pointers, but this doesn't allow you to create
/// a dangling pointer yourself and pass it to `from_raw`

/// Note that `from_raw` expects values that actually originated from a call to one of these
/// functions, not what these functions can maybe return, or are documented to potentially return.
/// For example, [`into_raw`] can return dangling pointers, but this doesn't allow you to create
/// a dangling pointer yourself and pass it to `from_raw`. This rule holds even if that pointer
Copy link
Member

@RalfJung RalfJung Aug 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is better, thanks.

I am still somewhat concerned though, do we have to write a whole paragraph explaining the meaning of "must have originated from" every single time we use that phrase? Is that phrase really so unclear? Should we use another phrase that is more clear? Will people read this, then see "must have originated from" elsewhere where we do not have a whole paragraph about this notion, and conclude "they had a clarification there, but not here, hence they obviously mean something else by 'must have originated' because otherwise they would have added the same clarification"?

There s a real danger with having too many clarifications, which is that people think they describe a special case rather than clarifying a general principle.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have to write a whole paragraph explaining the meaning of "must have originated from" every single time we use that phrase?

The obvious solution to that problem is to link the phrase to a centralized location (e.g. the rust reference) giving the explanation of what it means.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I guess that would be the alternative here.

With my suggestion above applied, I can live with this PR, though I am on the fence about whether it truly is an improvement -- I'm happy to leave that to the libs-api reviewer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still somewhat concerned though, do we have to write a whole paragraph explaining the meaning of "must have originated from" every single time we use that phrase?

The main problem here is that as_ptr (and into_inner linking to it) says that it can potentially return null or invalid pointers. We say this to be future proof, not because we return null pointers, but this creates this potential misunderstanding that I'm trying to guard against.

With valid allocations, it makes sense that you require the pointer to come from a specific function, to prevent the malloc + delete problem. But with invalid ones it seems weird to do that.

One can very easily assume that this was written only keeping the valid pointers in mind, but not the invalid ones. Often even specifications leave something unspecified and then implementers interpret it differently.

Of course there is other misunderstandings too, and yeah eventually you run into the situation where you are creating more misunderstandings than fixing, but I think this PR is improving the situation.

There s a real danger with having too many clarifications, which is that people think they describe a special case rather than clarifying a general principle.

Good point, that's what the word "Note" at the start tries to accomplish.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Weak::new reference is good, but we could pare down to mostly just that suggestion. If you're trying to second-guess the pointer origin, this is probably what you actually need.

@rust-log-analyzer

This comment has been minimized.

@est31
Copy link
Member Author

est31 commented Sep 3, 2023

@cuviper do you like the new version?

Comment on lines +2787 to +2788
/// a dangling pointer yourself and pass it to `from_raw`. Even if it has the same address
/// as a pointer created by [`into_raw`], use [`Weak::new`] instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// a dangling pointer yourself and pass it to `from_raw`. Even if it has the same address
/// as a pointer created by [`into_raw`], use [`Weak::new`] instead.
/// a dangling pointer yourself and pass it to `from_raw`, even if it has the same address
/// as a pointer created by [`into_raw`]. Use [`Weak::new`] instead.

/// return, or are documented to potentially return.
/// For example, [`into_raw`] can return dangling pointers, but this doesn't allow you to create
/// a dangling pointer yourself and pass it to `from_raw`. Even if it has the same address
/// as a pointer created by [`into_raw`], use [`Weak::new`] instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same diff as above

/// or is documented to potentially return.
/// For example, [`into_raw`] can return dangling pointers, but this doesn't allow you to create
/// a dangling pointer yourself and pass it to `from_raw`. Even if it has the same address
/// as a pointer created by [`into_raw`], use [`Weak::new`] instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and yet another time

@cuviper
Copy link
Member

cuviper commented Jan 25, 2024

@rustbot author
(for Ralf's suggestions)

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 25, 2024
@est31
Copy link
Member Author

est31 commented Jan 26, 2024

yes sorry I didn't get to them, I will try to take a look soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can Weak be null?
7 participants