-
Notifications
You must be signed in to change notification settings - Fork 949
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: enforce protocols to always be valid utf8 strings #3745
Conversation
@mxinden please review despite broken build, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preliminary review. Still need more time.
@@ -158,7 +196,7 @@ pub trait InboundUpgrade<C>: UpgradeInfo { | |||
/// method is called to start the handshake. | |||
/// | |||
/// The `info` is the identifier of the protocol, as produced by `protocol_info`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs updating.
This is turning out to be pretty invasive but I think it is worth the breaking change. We reduce complexity by removing one layer of abstraction. This looses us some type safety as mentioned in the description but in reality, this just means we can no longer do |
I'd be hoping that we can ship deprecations like #3807 first such that I can delete all this code and don't have to touch it in these breaking changes :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall direction looks good to me.
core/src/upgrade/select.rs
Outdated
EitherName::A(info) => EitherFuture::First(self.0.upgrade_inbound(sock, info)), | ||
EitherName::B(info) => EitherFuture::Second(self.1.upgrade_inbound(sock, info)), | ||
fn upgrade_inbound(self, sock: C, selected_protocol: Protocol) -> Self::Future { | ||
if self.0.protocols().any(|p| p == selected_protocol) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As described in your pull request, having to do this now is a bummer. This is in the hot-path. With our move to one-stream-per-request this will be executed often.
All that said, I am fine moving forward here. Based on intuition, this warrants a benchmark before introducing any optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One issue is that SelectUpgrade
is constructed for each upgrade, meaning it is pointless to try and cache something.
Once we get rid of upgrades for protocols, we can try and cache some information in the handler, i.e. a HashMap of Protocol
to sub-handler.
I'd suggest we defer it until then. Despite being the hotpath, I think iterating a handful of protocols (which should be allocation-free) shouldn't be very slow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative that came to my mind:
Introduce a ToProtocol
trait which Protocol
implements by default. Our ToProtocolsIter
would require an Iterator
where Item
implements ToProtocol
.
With this abstraction we still have the simplicity for the user, namely that they can just use Protocol
and that they can't mistake a Protocol
for a str
given that the latter is wrapped in a newtype. At the same time we maintain the functionality for e.g. the SelectUpgrade
.
Just a thought. I don't feel strongly about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is effectively the same solution as we had before, just with a different name. It doesn't remove a "layer" for the user because they still have to specify the Item
type.
But perhaps we should go with this path until we fully tackle: #2863
At that point, we will also be able to implement performance optimizations in the Select
handler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the introduction of separate newtypes in multistream-select
and libp2p-core
, this is proving to be difficult.
On the one hand, I'd like to maintain this separation, I think multistream-select
should be an implementation detail of libp2p-core
and libp2p-swarm
.
But, with the two new-types in place, I cannot re-create the type-safety we had before because multistream-select
doesn't know about the type used in libp2p-core
.
Perhaps it is too early to try and remove this abstraction. Here is a different idea:
- Resurrect feat: replace
ProtocolName
withAsRef<str>
#3746 - Still introduce a
Protocol
newtype but have it inlibp2p-swarm
. This means protocols can use it and we can use it within our event types and commands
#[derive(Clone, Debug, PartialEq, Eq)] | ||
pub struct Protocol(Bytes); | ||
#[derive(Clone, Debug, PartialEq, Eq, Hash)] | ||
pub struct Protocol(Cow<'static, str>); // TODO: Instead of `Cow`, we should probably be storing `Arc<str>` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we start with as is and introduce an Arc<str>
if necessary? None of it is exposed, thus easy to change in a non-breaking way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not fully sure. It improves performance for cases where we dynamically construct a Protocol
. Ideally, users use from_static
as much as possible so it is kind of a niche usecase but I still want it to be performant.
It is not much work so I think I am just gonna do it. I don't see why a protocol string should ever be mutable which is the only thing that becomes impossible with Arc<str>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An Arc<str>
would not be enough, right? Arc
would require a heap allocation, no? So you would have Either<&'static str, Arc<String>>
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we would have an Either
of &'static str
and Arc<String>
.
This comment was marked as resolved.
This comment was marked as resolved.
f1a9349
to
f909139
Compare
core/src/upgrade/select.rs
Outdated
EitherName::A(info) => EitherFuture::First(self.0.upgrade_inbound(sock, info)), | ||
EitherName::B(info) => EitherFuture::Second(self.1.upgrade_inbound(sock, info)), | ||
fn upgrade_inbound(self, sock: C, selected_protocol: Protocol) -> Self::Future { | ||
if self.0.protocols().any(|p| p == selected_protocol) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative that came to my mind:
Introduce a ToProtocol
trait which Protocol
implements by default. Our ToProtocolsIter
would require an Iterator
where Item
implements ToProtocol
.
With this abstraction we still have the simplicity for the user, namely that they can just use Protocol
and that they can't mistake a Protocol
for a str
given that the latter is wrapped in a newtype. At the same time we maintain the functionality for e.g. the SelectUpgrade
.
Just a thought. I don't feel strongly about this.
#[derive(Clone, Debug, PartialEq, Eq)] | ||
pub struct Protocol(Bytes); | ||
#[derive(Clone, Debug, PartialEq, Eq, Hash)] | ||
pub struct Protocol(Cow<'static, str>); // TODO: Instead of `Cow`, we should probably be storing `Arc<str>` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An Arc<str>
would not be enough, right? Arc
would require a heap allocation, no? So you would have Either<&'static str, Arc<String>>
instead?
These functions were only used for some code in the interop-tests which is easily mitigated and perhaps even easier to understand now. We can thus deprecate these functions and their related types and thereby reduce the API surface of `libp2p-core` and the maintenance burden. This change is motivated by the work around making protocols always strings which requires/required updates to all these upgrades. Related #3806. Related #3271. Related #3745. Pull-Request: #3807.
This comment was marked as resolved.
This comment was marked as resolved.
@thomaseizinger please ping me when I should take another look. |
Will do! |
This comment was marked as outdated.
This comment was marked as outdated.
This pull request has merge conflicts. Could you please resolve them @thomaseizinger? 🙏 |
I am going with a different approach now: #3746 |
Description
Previously, a protocol could be any sequence of bytes as long as it started with
/
. Now, we directly parse a protocol asString
which enforces it to be valid UTF8. This has a significant API impact. Instead of any type being able to represent a protocol, we use aProtocol
newtype now that can only be constructed from aString
or a&'static str
.Technically, this would not be necessary to for protocols to always be strings. We could also only change the
ProtocolName
trait to just return&str
instead of&[u8]
. However, both of these are breaking changes and I am of the opinion that there are currently too many abstraction layers in regards to stream upgrades. This change removes some of these layers by introducing theUpgradeProtocols
trait which replaces theUpgradeInfo
trait.We will eventually lose a bit of type safety with this approach. In particular, once we also ship #2863, types like
DeniedUpgrade
won't work anymore because each protocol is represented as the same type.Notes & open questions
Is the change in abstraction layers good? Should we go this far or retain the type-safety of upgrades we get from protocols being able to be any type and just having to satisfy a trait? I am going to prototype that too and see what it is like.
This is a draft because currently, only the
multistream-select
tests compile and work. I've only touched the code necessary for that to work.Dependencies
In order to make the implementation of this PR simpler, I've extracted several others. They are all deprecations, meaning we can release them first, then delete the code and finally merge this PR (once it is ready in itself).
{In,Out}boundUpgradeExt
#3807OptionalUpgrade
#3806libp2p-request-response
Change checklist