
Proposal: add an option to use shallow partial #378

aikoven opened this issue Nov 1, 2021 · 17 comments · Fixed by #412

@aikoven
Collaborator

aikoven commented Nov 1, 2021

Deep partial is a convenient way to create objects, but it has some drawbacks:

  1. It requires traversing the whole object tree, which may hurt performance e.g. for large arrays.

  2. Sometimes you already have the complete (non-partial) object and want to assign it to a field of a parent object created from a partial. In this case, the child object is deep-cloned unnecessarily:

    declare function createChild(): Child;
    
    const parent = Parent.fromPartial({
      child: createChild(),
    });
  3. It makes type checks weaker. For example, given the following Protobuf message:

    message Response {
      repeated Item items = 1;
    
      message Item {    
        string key = 1;
        string value = 2;
      }
    }

    Creating the Response object from a deep partial:

    const items: Array<[key: string, value: string]> = [];
    
    const response = Response.fromPartial({
      items: items.map(([key, val]) => ({
        key,
        // mistake in field name, not detected by TypeScript
        val,
      }))
    })

    To fix it, we have to specify the return type:

    const response = Response.fromPartial({
      items: items.map(([key, val]): DeepPartial<Response_Item> => ({
        key,
        // TypeScript error:
        // Object literal may only specify known properties, and 'val' does not exist in type
        // '{ key?: string | undefined; value?: string | undefined; }'
        val,
      }))
    })

My proposal is to add an option useShallowPartial that, when set to true, makes the fromPartial methods accept the built-in Partial instead of DeepPartial.
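Roughly, the generated signature would go from accepting a recursive DeepPartial to the built-in Partial. A simplified sketch (the DeepPartial here is a minimal stand-in for ts-proto's type, and the function names are made up):

// Minimal stand-in for ts-proto's DeepPartial, for illustration only.
type DeepPartial<T> = { [K in keyof T]?: T[K] extends object ? DeepPartial<T[K]> : T[K] };

interface Child { name: string }
interface Parent { child: Child }

// Current behavior: nested values may themselves be partial, so they are
// recursively traversed and cloned.
declare function fromDeepPartial(object: DeepPartial<Parent>): Parent;

// With useShallowPartial: only the top level is partial, so an
// already-complete Child is used as-is instead of being traversed.
declare function fromShallowPartial(object: Partial<Parent>): Parent;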

@stephenh
Owner

Hey @aikoven; sorry for not replying sooner, I got behind on ts-proto email/issue notifications and am finally working my way through them.

I was thinking exact types would solve your 2nd point, and while it's not yet built into TS, I think there is a sufficiently neat / works-for-us mapped type that we can start using to finally have typo-free fromPartials; see #412.
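For illustration, an "exact"-style constraint could look something like this (a simplified sketch, not necessarily the exact mapped type from #412):

interface Item {
  key: string;
  value: string;
}

// Any key in the inferred input type that isn't a known field is mapped to
// `never`, so a typo'd property fails to type-check even inside Array.map.
type Exact<P, I> = P & { [K in Exclude<keyof I, keyof P>]: never };

declare function itemFromPartial<I extends Exact<Partial<Item>, I>>(input: I): Item;

itemFromPartial({ key: "k", value: "v" }); // ok
// itemFromPartial({ key: "k", val: "v" }); // compile error: 'val' is not assignable to 'never'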

Do you think this would be good enough to alleviate your ask for shallow partial?

If so, that's great; if not, np, I'd be good with something like a useDeepPartial that defaults to true but can be set to false. Wdyt? If you want to submit a PR for that, that'd be great.

@stephenh
Owner

I also wonder if maybe this could be a separate method, i.e. Message.from? And maybe a flag like useFrom=simple,partial that defaults to partial (current behavior, adds a Message.fromPartial), but lets users decide if they want just Message.from instead, or both Message.from and Message.fromPartial?

Tangentially, I've been wondering about renaming the fromPartial to something based on "create", i.e. Message.create (would be the proposed Message.from) and maybe like Message.createDeep (would be the current Message.fromPartial).

@aikoven
Collaborator Author

aikoven commented Nov 28, 2021

My primary concern was the unneeded deep traversals, e.g. when you already have the complete object (maybe received from another service) but have to deep-clone it just to pass it along.

However, the shallow partial has its drawbacks as well, and I still think DeepPartial is a good default, so Exact is a nice addition.

Thanks for your comments, I'll work on a PR when I have some spare time.

@stephenh
Owner

@aikoven cool, sounds good.

Just talking out loud, but maybe we could detect "already a full message" and skip the deep traversal, i.e. like add a hidden message.__is_message__ marker attribute on the message, and when fromPartial sees that, it would stop traversing into that message and just use it as-is.

Granted, lists are a little bit tricky because if you do:

const foo = FooMessage.fromPartial({
  firstName: "f",
  children: [/* ...lots of already-valid children... */]
});

Then we'd at least have to do a children.map(c => c.__is_message__ ? c : Child.fromPartial(c)) which I suppose is only a linear scan so wouldn't be too bad...

Coincidentally the type-registry feature's $type field would work really well for this, i.e. for any message with the $type field we'd just stop traversing...so if the type-registry is enabled, fromPartial could probably start doing this proposed detection "for free" w/o any new __is_message__ marker attributes...
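Something along these lines, purely as an illustration (the names and the guard function are made up, not generated code):

interface Child {
  $type: "example.Child";
  name: string;
}

function childFromPartial(object: { name?: string }): Child {
  return { $type: "example.Child", name: object.name ?? "" };
}

function coerceChild(input: { $type?: string; name?: string }): Child {
  // Anything already carrying the type-registry marker is trusted as-is;
  // everything else goes through the normal deep conversion.
  return input.$type === "example.Child" ? (input as Child) : childFromPartial(input);
}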

@stephenh stephenh reopened this Nov 28, 2021
@stephenh
Owner

🎉 This issue has been resolved in version 1.92.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@aikoven
Collaborator Author

aikoven commented Nov 29, 2021

Tangentially, I've been wondering about renaming the fromPartial to something based on "create", i.e. Message.create (would be the proposed Message.from) and maybe like Message.createDeep (would be the current Message.fromPartial).

I like the idea of renaming it to something more basic, like create, from, or init, because to me this method is the only canonical way of creating message objects. I don't ever create them using object literals, because that code would break if we added a field to the message.

Just talking out loud, but maybe we could detect "already a full message" and skip the deep traversal, i.e. like add a hidden message.__is_message__ marker attribute on the message

Coincidentally the type-registry feature's $type field would work really well for this

This is an interesting idea, though I can see how it could lead to non-obvious and surprising behavior. For example, consider object destructuring:

const {someField, ...rest} = fullProtoObject;

Message.fromPartial(rest); // rest has the $type field, but it is not "full"

@stephenh
Owner

I don't ever create them using object literals

Yeah, maybe I'll think about doing a breaking change that renames fromPartial to create and adds an option to keep fromPartial if users need/want that for backwards compatibility/migration.

For example, consider the object destructuring:

Ah yeah, I didn't think of that...

This is admittedly maybe "too cute", but the check could be:

return !!o.$type && Object.keys(o).length === 10 ? o : Message.fromPartial(o);

Where 10 is the known number of fields the message has in the *.proto file.

So that would catch someField being destructured out in your example...

@aikoven
Collaborator Author

aikoven commented Dec 24, 2021

This is admittedly maybe "too cute", but the check could be:

return !!o.$type && Object.keys(o).length === 10 ? o : Message.fromPartial(o);

It still sounds kinda fragile to me, though right now I can't think of other cases where it could break in regular code.

Yeah, maybe I'll think about doing a breaking change that renames fromPartial to create and adds an option to keep fromPartial if users need/want that for backwards compatibility/migration.

There's also an option to make the Message itself a factory function:

export interface Message {
  // ...
}

const create = (input: DeepPartial<Message>): Message => {
  // ...
}

export const Message = Object.assign(create, {
  encode,
  decode,
  // ...
});

Then we can use Message as both a type and a factory, somewhat similar to classes:

const message: Message = Message({...})

@stephenh
Owner

I had another thought, what about using classes for message types?

Granted, "just data instead of classes" is/was one of the main goals of ts-proto, so at first that seems kind of sacrilegious (and unless we got really lucky, would surely? be a breaking change).

But, after thinking about it, TypeScript's structural typing already lets anyone "implement a class" with "just an object literal" by matching the class's public interface, as long as there aren't any private methods.

So, in a way, I think if we flipped ts-proto messages to classes but a) did not have any private fields, and b) used only static methods, I'm tempted to think that most ts-proto users / code would be none the wiser...

And, specifically the upshot would be that fromPartial could use instanceof FooMessage to detect an already-full / trusted message.

Three more musings, given your PR #457:

  1. It'd be great to have a mini benchmark for these sorts of changes.

  2. I heard a while ago that v8's optimizations are based around making pseudo-classes for objects, and then re-assigning the "class" each time a property is added to the object, so idioms like:

const a = {
  foo: 1,
  bar: 2 ,
};

Are faster than:

const a = {};
a.foo = 1;
a.bar = 2;

Because v8 effectively sees each property assignment as a potentially "class"-changing operation (similarly, assigning all fields in a constructor, vs. leaving some empty and lazily assigning them, is also "supposed to be better").

Anyway, given you're thinking of performance, I'd been meaning to make the methods fromPartial, toJSON, and fromJSON all use the first style, i.e. return { foo: ..., bar: ... }. It should really be pretty easy, especially after @webmaster128's work on making most/all? property assignments one-liners.

The one method that is more tricky is decode, since we incrementally assign fields as we see tags. Unless we decided to create an initial instance up-front, with all fields set to default values, and then assign over them with the real values, as tags are seen.

  3. Separate from the "shallow / optimized fromPartial", having actual classes for messages might help v8 with optimizations just in general? Not sure.

@aikoven
Collaborator Author

aikoven commented Dec 28, 2021

I had another thought, what about using classes for message types?

I'm usually opposed to classes, because almost always a problem can be solved using much simpler abstractions (i.e. plain objects). But in the case of ts-proto I think it might be a viable solution.

Granted, "just data instead of classes" is/was one of the main goals of ts-proto, so at first that seems kind of sacrilegious (and unless we got really lucky, would surely? be a breaking change).

This could go into the next major version along with other planned breaking changes, like changing option defaults and maybe cleaning up options.

  1. It'd be great to have a mini benchmark for these sorts of changes.

Since my primary concern is performance of deep traversal, I'll try to assemble a benchmark that compares deep vs shallow.

  2. I heard a while ago that v8's optimizations are based around making pseudo-classes for objects, and then re-assigning the "class" each time a property is added to the object, so idioms like:
const a = {
  foo: 1,
  bar: 2 ,
};

Are faster than:

const a = {};
a.foo = 1;
a.bar = 2;

Yeah, I heard about that too, though I feel like it shouldn't be too hard to optimize the second variant. Probably the first created object will be non-optimized, but for subsequent ones v8 has all the info to build a pseudo-class.

The one method that is more tricky is decode, since we incrementally assign fields as we see tags. Unless we decided to create an initial instance up-front, with all fields set to default values, and then assign over them with the real values, as tags are seen.

In #457 all objects are constructed from a base instance that already has all fields initialized.
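Roughly along these lines (an illustrative sketch, not necessarily the exact shape generated in #457):

interface Foo {
  name: string;
  age: number;
}

// A base instance with every field already initialized to its default.
const baseFoo: Foo = { name: "", age: 0 };

function fooFromPartial(object: Partial<Foo>): Foo {
  // Start from a copy of the base and overlay the provided values, so the
  // resulting object always has the same shape and no property is ever
  // added after creation.
  const message: Foo = { ...baseFoo };
  if (object.name !== undefined) message.name = object.name;
  if (object.age !== undefined) message.age = object.age;
  return message;
}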

  3. Separate from the "shallow / optimized fromPartial", having actual classes for messages might help v8 with optimizations just in general? Not sure.

Not sure either. Intuitively a class should be more expensive than a plain object because classes are a heavier abstraction.


I've got another idea: what if the encode method just accepted DeepPartial directly? It could do all the coercion to default values inline. This way there'd be no need to first build a full object just to pass it to encode, and no wasted traversals.
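A rough illustration of the idea (the protobuf writer is stubbed out with a callback to keep the sketch small; none of this is generated code):

interface Item {
  key: string;
  value: string;
}

function encodeItem(
  message: Partial<Item>,
  writeStringField: (fieldNo: number, value: string) => void,
): void {
  // Coerce to proto3 defaults inline while writing, instead of first
  // materializing a complete Item just to hand it to encode.
  const key = message.key ?? "";
  if (key !== "") writeStringField(1, key);
  const value = message.value ?? "";
  if (value !== "") writeStringField(2, value);
}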

@webmaster128
Collaborator

I'm not a fan of changing objects to classes. Coming here from protobuf.js, this was one of the best choices we saw here. In theory classes are objects anyway, but in practice plain objects make things like equality checks very simple. I also don't see how classes would help here.

Adding another variant of the fromPartial method should be straightforward. Ideally fromPartial calls fromShallowPartial to avoid code duplication.

Adding a benchmarking framework here would be great. Then it would be possible to isolate and avoid regressions. Personally I don't have performance-critical applications, so having someone push this would be great.

@aikoven
Collaborator Author

aikoven commented Dec 28, 2021

In theory classes are objects anyways, but in practice the objects make things like equality checks very simple. I also don't see how classes would help here.

Not an apology for classes, but for example lodash#isEqual should work fine with classes as well as plain objects. We could also have ts-proto generate equality-check functions, which would be many times faster than a generic deep-equality check like lodash#isEqual.

Classes would let us distinguish message instances from other objects using instanceof. However, the same could be achieved for plain objects too, for example by adding non-enumerable symbols.
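For example, a non-enumerable symbol marker could look roughly like this (a sketch; the names are made up):

const isMessage = Symbol("ts-proto.message");

interface Foo {
  name: string;
}

function createFoo(name: string): Foo {
  const foo: Foo = { name };
  // Non-enumerable, so it stays out of Object.keys, spreads, and
  // key-based deep-equality checks.
  Object.defineProperty(foo, isMessage, { value: true, enumerable: false });
  return foo;
}

function isFullMessage(o: object): boolean {
  return (o as Record<symbol, unknown>)[isMessage] === true;
}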

@webmaster128
Collaborator

Not an apology for classes, but for example lodash#isEqual should work fine with classes as well as plain objects. We could also have ts-proto generate equality-check functions, which would be many times faster than a generic deep-equality check like lodash#isEqual.

I'm more thinking about testing frameworks, where you don't really get to choose which equality implementation you want to use if you want nice test code like expect(uut).toEqual({ my: expectation }). We use Jasmine, and it is really picky about the prototype. E.g. you don't get equality if you compare a Buffer and a Uint8Array with the same content. I'm sure there are solutions to each particular problem, but all this mess simply does not exist right now, which is a nice thing.

Classes would let us distinguish message instances from other objects using instanceof.

I see.

@stephenh
Owner

We use Jasmine and this is really picky about the prototype

Ah shoot, I was just going to say, we could use prototype assignment w/o going all the way to class. I.e. in @aikoven's #457, the createFoo methods could become:

const markerProto = {};
function createFooMessage() {
  return Object.assign(Object.create(markerProto), { /* ...fields... */ });
}

And then inside of fromPartial we could detect "is this already a fully-instantiated object" by checking Object.getPrototypeOf(foo) === markerProto.

I'm brainstorming ways of observing that an object is "already fully instantiated". We could:

  1. Use class and instanceof
  2. Use our own markerProto / prototype assignment
  3. Add a dummy o.__isFullyInstantiated field on each instance
  4. Use a weak set, i.e. inside of createFooMessage, put the newly created object into a module-level const fullyCreated = new WeakSet() and check existence in that set before recursing inside of fromPartial
  5. Use Object.keys(o).length === numExpectedKeys (...actually doesn't work for multi-level fromPartials where the 1st level has all of the keys, but that doesn't actually guarantee that nested values are fully populated)
  6. ...other ideas...?

I kinda like the markerProto idea; at one point ts-proto actually used a base prototype for what has now become the createFooMessage instance...

But the weak set idea is the least invasive, i.e. it doesn't change the look/shape/protos of messages at all; I think the only downside would be potentially (?) increased GC overhead from the weak set? I really don't know if that'd have any effect on GC or not.

Dunno, @aikoven / @webmaster128 wdyt about the weak set approach to short-circuiting deep traversals inside of fromPartial?

I suppose we could also provide multiple options...like default to the weak set approach, but if we do actually see performance overhead from that (although maybe we wouldn't, I'm really just speculating), we could let users fall back to the markerProto approach which has this imagined less GC overhead but at the "cost" of a non-default prototype...
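Concretely, the weak set approach might look something like this (a minimal sketch with made-up names):

interface Foo {
  name: string;
}

// Module-level registry of objects that ts-proto itself fully instantiated.
const fullyCreated = new WeakSet<object>();

function fooFromPartial(object: Partial<Foo>): Foo {
  // Short-circuit: anything we created earlier is already "full" / trusted.
  if (fullyCreated.has(object)) {
    return object as Foo;
  }
  const message: Foo = { name: object.name ?? "" };
  fullyCreated.add(message);
  return message;
}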

@webmaster128
Collaborator

webmaster128 commented Dec 29, 2021

Taking one step back, I wonder if this is really something that needs to be implemented here, creating code and drawbacks for all other users. The proto message objects are fully public. Wouldn't it make more sense to create optimized constructors for a particular use case in the caller code? The caller code:

  • Knows where the pain points are
  • Can decide if a deep copy is desired for that particular field. Even if we had the marker, it does not mean the library knows if a shallow copy is desired. An application can set a child message foo: Foo to one message, then mutate it, then set it to another message. Should both be mutated now, or just the latter?

@stephenh
Owner

really something that needs to be implemented here

Yeah, good to sanity check that, but personally I'm pretty okay with trying to solve this in a general way / directly in ts-proto.

My rationale is that I've made probably ~2-3 libraries to "create nested data" (one being an ORM creating backend entities, another for client-side GraphQL types, and also ts-proto's fromPartial) that all want to let the programmer easily create a nested graph/tree of data as succinctly as possible, and then have the library "apply / fill in defaults".

I don't know that I'd really articulated / realized it before now, but all three have this same problem of "ergonomically let the programmer pass 'just literals' / partials (that need defaults applied) or already instantiated 'blessed' objects / entities (that don't)".

So, dunno, just given I've seen the same need in a few places, I think it's a pretty common need, and IMO, if we figure out how to do it well, it can be a nice ergonomic win/pro for ts-proto users.

create optimized constructors for a particular use case in the caller code

Yeah, I agree this can work / be a good approach, but I think it can also end up being "every call site wants its own new / dedicated constructor", i.e. test suites are a great example of where fromPartials are nice for "creating an object with only the data this specific test method cares about". Granted, we don't necessarily want to mix testing concerns with production code concerns.

Even if we had the marker, it does not mean the library knows if a shallow copy is desired.
An application can set a child message foo: Foo to one message, then mutate it, then set
it to another message. Should both be mutated now, or just the latter?

I think it's less about shallow copies and more about "full messages", i.e. I think the scenario you're describing is:

// create a 'full' / 'trusted' message, sets a default `bar` field
const foo = FooMessage.fromPartial({ foo: 1 });
// I change bar to `someOtherBar`, should `foo` be considered "not full / trusted" anymore
foo.bar = someOtherBar;

I'd assert this should be fine b/c someOtherBar was either constructed itself from fromPartial / decode / fromJSON (so trusted) or declared with an object literal like const bar: BarMessage = { ... }, which, if you're not using partial fields, means all of BarMessage's fields should be set (and recursively), so we can still consider bar to be "full / trusted".

(Granted, if you passed someOtherBar to a fromPartial, it would not be "marked", so we would technically fromPartial it, but the idea is that, if you stick to creating objects from fromPartial / decode / fromJSON, then pragmatically most of your arguments to fromPartial will be already-marked / valid messages.)

Back to @aikoven's original ask: it was actually in line with your nudge, @webmaster128, of "let the client code deal with it", insofar as client code that is currently passing nested levels of an object literal like:

const foo = FooMessage.fromPartial({
  first: ...,
  address: { street: ... },
});

Would have to put fromPartial around each child literal, i.e.:

const foo = FooMessage.fromPartial({
  first: ...,
  address: AddressMessage.fromPartial({ street: ... }),
});

Which, pros/cons:

  • Pro: is simple for ts-proto to do, i.e. we just add a partial=shallow type option
  • Con: imo creates a lot of boilerplate / noise at each call site

Which, dunno, I'm admittedly biased b/c of similar approaches in my other ORM and GQL libraries (and ts-proto so far): that "con" of "keep re-typing fromPartial" (or newFactory for the GQL library) has bothered me enough that I've really tried to keep the "just data" syntax of { first: ..., address: { street: ... } } possible...

So that's why, when @aikoven brought up the admittedly simple "just give me a non-deep fromPartial" ask, I've been pushing back / brainstorming a way to keep the current API but still achieve his goal of avoiding re-fromPartial-izing / copying messages that are already "good".

(Just thinking of the other libraries, the ORM used classes, so could do instanceof to recognize "already good" entities, and the GQL library used a Set of "already created objects".)

So, dunno @aikoven this is your ask and likely your PR :-) so wdyt, i.e. it seems like leading options are now a) add a shallow partial and punt on "optimized deep partials" or b) use a weak set to mark ts-proto-created objects? (...or both, and let users choose).

@aikoven
Collaborator Author

aikoven commented Dec 30, 2021

My rationale is that I've made probably ~2-3 libraries to "create nested data" (one being an ORM creating backend entities, another for client-side GraphQL types, and also ts-proto's fromPartial) that all want to let the programmer easily create a nested graph/tree of data as succinctly as possible, and then have the library "apply / fill in defaults".

One of the things that led me to this proposal is this: I rarely create nested objects inline; more often they are created in separate functions. For example, consider this gRPC API:

service Users {
  rpc GetUsers(GetUsersRequest) returns (GetUsersResponse);
}

message User {
  string id = 1;
  string name = 2;
}

message GetUsersRequest {};

message GetUsersResponse {
  repeated User users = 1;
};

Now imagine that we're implementing this API. Our GetUsers implementation needs to go to the database and then convert the received data to proto:

const usersImpl = {
  async getUsers(request: GetUsersRequest): Promise<GetUsersResponse> {
    const usersFromDb = await getUsersFromDb();

    return GetUsersResponse.fromPartial({
      users: usersFromDb.map(item => ({...}))
    });
  }
}

For now it's fine to do the conversion inline and have the mapper function item => ({...}) return deep-partial data. Now we add another method, GetUserById, that uses the same User message in its response:

message GetUserByIdResponse {
  User user = 1;
}

The implementation of this new method requires the same mapper function, so we need to extract it and reuse it:

function convertUserToProto(userFromDb: UserFromDb): User

It feels natural to return the full User here, but what I in fact tend to do is return DeepPartial<User>, as this is what the caller actually needs.

Theoretically it is fine to just always have "outgoing" objects be DeepPartial. In practice, though, these objects become kinda second-class citizens (in contrast to "incoming" objects) that have to carry the DeepPartial wrapper with them, making the code more verbose and obscure.
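Sketched out (UserFromDb, the field mapping, and this minimal DeepPartial are placeholders for illustration):

type DeepPartial<T> = { [K in keyof T]?: T[K] extends object ? DeepPartial<T[K]> : T[K] };

interface User {
  id: string;
  name: string;
}

interface UserFromDb {
  user_id: string;
  display_name: string;
}

// Typed as DeepPartial<User> because that's all the parent fromPartial needs,
// but now every caller of this "outgoing" helper carries the wrapper with it.
function convertUserToProto(u: UserFromDb): DeepPartial<User> {
  return { id: u.user_id, name: u.display_name };
}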

So, dunno @aikoven this is your ask and likely your PR :-) so wdyt, i.e. it seems like leading options are now a) add a shallow partial and punt on "optimized deep partials" or b) use a weak set to mark ts-proto-created objects? (...or both, and let users choose).

The WeakSet idea is smart, though it could break some code that uses mutation:

const nested = Nested.fromPartial({...})
const msg = Msg.fromPartial({nested});
msg.nested.field = '';

Currently, this code won't mutate the nested object, only its clone inside msg. But if we introduce the WeakSet optimization, it will, because msg.nested would then be the very same object.

Also, it only helps with objects, but not with nested arrays: we still have to iterate over them.
