Skip to content

Conversation

@dnwpark
Copy link
Contributor

@dnwpark dnwpark commented Aug 22, 2025

Given a schema

type A {
   total_val := sum(.<a[is B].val);
};
type B {
    a: A;
    val: int64;
};

If a user does:

a = default.A()
b = default.B(a=a, val=7)

client.sync(a, b)

The order of inserts will be a, then b and so the value of a.total_val fetched during the insert will be 0.

This PR adds a refetch query for the computed properties of newly inserted objects.

Tests are based on #848

@dnwpark dnwpark requested review from 1st1 and msullivan August 22, 2025 19:40
@dnwpark dnwpark force-pushed the refetch-new-computed branch from 963073a to d36e8c0 Compare August 22, 2025 22:57
@dnwpark dnwpark force-pushed the refetch-new-computed branch from d36e8c0 to b753f12 Compare August 22, 2025 22:58
@dnwpark dnwpark force-pushed the refetch-new-computed branch from b753f12 to a4ac639 Compare August 22, 2025 23:09
@dnwpark dnwpark force-pushed the refetch-new-computed branch from a4ac639 to 19fe9fa Compare August 22, 2025 23:14
Base automatically changed from sync-tests to master August 23, 2025 01:23
@dnwpark dnwpark force-pushed the refetch-new-computed branch 2 times, most recently from 309abbd to 0b99ffa Compare August 25, 2025 16:48
ref_shape.fields[field_name] = ptr_info

else:
# New objects only need computeds refetched
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, strictly speaking this isn't true and I'm not sure why we need this "optimization". Even new object's links can be rewritten by a trigger/mutation rewrites.

So the question is why do we need to bother and should we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this because the design says we should refetch all computed properties for new objects, but only set properties for existing objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably part of the intended behaviour.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not by a trigger, but by a rewrite certainly. We probably do want to refetch all of them?

(Which design do you mean here?)


else:
# New objects only need computeds refetched
for field_name in obj.__pydantic_computed_fields__:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use pointer introspection here, not __pydantic_computed_fields__

)
for obj in shape.models:
link_ids = obj_links_all[obj.id]
link_ids = obj_links_all[self._get_id(obj)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch


def __post_init__(self) -> None:
self.new_object_ids = {}
self.new_objects = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the collection type for new_object_ids

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please rebase onto the fresh master

# see the above comment.
multi_prop_commit_recursive(linked)

if new_obj_id := self.new_object_ids.get(id(obj)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, rebase is needed. Basically with the latest master you'd just self.new_object_ids.get(obj)

@dnwpark dnwpark force-pushed the refetch-new-computed branch 3 times, most recently from 86d8765 to 8eb33b0 Compare August 26, 2025 00:00
@dnwpark dnwpark marked this pull request as ready for review August 26, 2025 01:23
@dnwpark dnwpark force-pushed the refetch-new-computed branch from 8eb33b0 to c2debad Compare August 26, 2025 04:08
@dnwpark dnwpark marked this pull request as draft August 26, 2025 04:45
@dnwpark dnwpark marked this pull request as ready for review August 26, 2025 20:31
@dnwpark dnwpark force-pushed the refetch-new-computed branch from 929784c to 134a7a0 Compare August 27, 2025 16:27
@1st1 1st1 requested a review from vpetrovykh September 2, 2025 17:36
@dnwpark dnwpark force-pushed the refetch-new-computed branch 3 times, most recently from acdf183 to dc41bdb Compare September 3, 2025 00:09
@dnwpark
Copy link
Contributor Author

dnwpark commented Sep 3, 2025

This now includes fixes for #884

tp_pointers = tp_obj.__gel_reflection__.pointers
ref_shape = refetch_ops[tp_obj]
ref_shape.models.append(obj)
# Existing objects should refetch anything that was previously set
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not just existing objects, right? New objects too?

if (
# prop not refetched
prop.name not in new_obj.__dict__
# TODO: Refetching links for new objects
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Comment on lines 1713 to 1718
if (
self.refetch
and obj in self.new_object_ids
and (new_obj_id := self.new_object_ids.get(obj))
):
new_obj = self.new_objects.get(new_obj_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on here? Why does new_objects need a different path for updating the objects with refetched data than the old objects?

Copy link
Contributor Author

@dnwpark dnwpark Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 3 instances of the same object:

  • the user instance
  • the batch instance
  • the refetch instance

New objects are updated differently from existing objects since they don't have an id to start with.

The sequence of operations:

  1. Client._save_impl: Batch queries are run in the client
  2. QueryBatch.record_inserted_data: batch instances saved in new_objects and the relationship to the user instances is saved in new_object_ids.
  3. SaveExecutor._commit:
    1. Computed props from the batch instance are applied to the user instance
    2. apply_refetched_data: the batch instance is updated with the refetch instance's data
    3. _commit_recursive: the user instance is updated with the batch instance's data.

Step 3 has a lot of redundant work so I've simplified it to:

  1. apply_refetched_data: the batch instance is updated with the refetch instance's data
  2. Copy all refetched props to the user instance.

@dnwpark dnwpark force-pushed the refetch-new-computed branch from 4fcac61 to 464610e Compare September 3, 2025 23:03
@dnwpark dnwpark merged commit 854a79c into master Sep 4, 2025
73 of 83 checks passed
@dnwpark dnwpark deleted the refetch-new-computed branch September 4, 2025 01:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants