Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional unsafe operations #97

Closed
nockty opened this issue Sep 22, 2023 · 2 comments
Closed

Add optional unsafe operations #97

nockty opened this issue Sep 22, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@nockty
Copy link
Contributor

nockty commented Sep 22, 2023

Context

For messages that contain many string fields (e.g. repeated string with many elements coming in), UnmarshalVT can spend a lot of CPU time in runtime.slicetobytestring. Indeed, when decoding the []byte data, it does a string(bytes) cast (e.g. m.Foo1 = string(dAtA[iNdEx:postIndex])). Since []byte is mutable and string is not, this cast requires an allocation for safety. This allocation, repeated many times, sometimes turns out to be expensive.

Feature request

We could avoid this by using the unsafe package that allows us to perform this cast without an allocation:

func unsafeBytesToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

Of course, the user has to be careful because if they overwrite the []byte data they received from the wire, then the string is mutated. So, this feature should be opt-in in my opinion.

This feature is mentioned in another issue: Per Msg/Field Features. I think having per-message / per-field features is not mandatory to implement unsafe casting, though.

Note that this feature applies to bytes fields as well where we could reference data instead of copying it.

Proposal

I think this feature is worth implementing and I'm open to ideas. In my opinion, a simple and pragmatic approach to implement it would be to add unsafe functions, such as UnmarshalVTUnsafe, which perform such operations for all applicable fields. I've actually started such an implementation on my personal fork and it seems to work well (diff for anyone curious, although it doesn't work yet for bytes fields). Now, before spending more time on it, I would love to gauge interest and hear considerations from others!

A few considerations I had on my side:

  • I think having a different function from UnmarshalVT is mandatory because several applications can use the same generated code for a message and we cannot ask them to all be careful not overwriting data received from the wire.
  • I considered adding a feature but it sounds weird because it would be transversal to other features.
    • Either we could just always generate both the safe and unsafe versions of the function;
    • or I think we could add an --unsafe flag (different from features) to trigger the generation of unsafe functions.

Let me know what you folks think about all this!

@nockty nockty added the enhancement New feature or request label Sep 22, 2023
@vmg
Copy link
Member

vmg commented Sep 22, 2023

This is a perfectly sensible feature, and in fact we already have an open PR for it: #79

It's just missing some cleanup and some better naming. I'm definitely in support of it, by providing a separate API (UnmarshalUnsafe) instead of an --unsafe flag (which would be dangerous IMO).

@nockty
Copy link
Contributor Author

nockty commented Sep 22, 2023

Ah, I didn't realized there was already a PR. 🤦‍♂️ Thanks for pointing it out, I'll take a look into it!


Regarding the --unsafe flag, I was actually thinking about it this way:

  • without --unsafe (marshal+unmarshal features): generate UnmarshalVT and MarshalVT only
  • with --unsafe (marshal+unmarshal features): generate UnmarshalVT, UnmarshalUnsafeVT, MarshalVT, and MarshalUnsafeVT

I actually like the PR's approach more. It would consist in adding new features like unmarshal_unsafe and marshal_unsafe. I'm going to iterate on that. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants