Update the SQL script used for generating latest snapshot.#703
sophie4869 wants to merge 3 commits into firebase:next
The old script uses FIRST_VALUE and OVER, which sorts the entire changelog to find the first record for each document. This can cause a memory error when a BigQuery query reads from the latest snapshot. (Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 110% of limit. Top memory consumer(s): sort operations used for analytic OVER() clauses: 96%) The updated script instead selects the maximum timestamp for each document_id and joins back to the table on that latest timestamp.
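For illustration, here is a minimal sketch of the "maximum timestamp, then join back" pattern described above. It is not the script from this PR: it runs on SQLite through Python's standard library rather than on BigQuery, and the `changelog` table, its `data` column, and the sample rows are assumptions made up for the example.

```python
import sqlite3

# Hypothetical stand-in for the changelog table; the real schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE changelog (document_id TEXT, timestamp INTEGER, data TEXT)"
)
conn.executemany(
    "INSERT INTO changelog VALUES (?, ?, ?)",
    [
        ("a", 1, "a-v1"),
        ("a", 3, "a-v3"),
        ("a", 2, "a-v2"),
        ("b", 5, "b-v1"),
    ],
)

# Aggregate the latest timestamp per document, then join back to the
# changelog to pick up the full row. No analytic OVER() clause is used.
LATEST_SQL = """
SELECT c.document_id, c.timestamp, c.data
FROM changelog AS c
JOIN (
  SELECT document_id, MAX(timestamp) AS latest_timestamp
  FROM changelog
  GROUP BY document_id
) AS latest
  ON c.document_id = latest.document_id
 AND c.timestamp = latest.latest_timestamp
ORDER BY c.document_id
"""

latest_rows = conn.execute(LATEST_SQL).fetchall()
print(latest_rows)  # [('a', 3, 'a-v3'), ('b', 5, 'b-v1')]
```

The inner query only needs a per-document aggregate, which is the property the description relies on to avoid sorting the whole changelog.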
|
Thanks @sophie4869. This will be tested internally and has been added to the tracker for reviewing. |
…chema view. There's no need to find the latest value again.
|
@sophie4869 There is a slight delay in reviewing this, as I am currently experiencing installation issues. This is perhaps related to #701 |
|
Hi @sophie4869. The latest updates from the With these updates, The updates look great after reviewing on a test installation - If you can update the above we can look at getting this approved. Any questions, let me know! |
|
+1 to solving this problem. I could work on this if a hand is required |
|
Thanks @dgilperez. We appreciate PRs/updates from the community (and provide credit for contributions). Otherwise this is still in our backlog to update/complete. |
|
Thanks for the quick reply @dackers86. Shall I open a new PR as a fork from this one, or @sophie4869, do you want me to work on your fork (I guess you'll need to grant me permissions)? I will do the former if there is no quick response from Sophie, if that's OK. |
|
I can take a look by the end of the week. Does that work for you?
On Mon, 14 Mar 2022, 16:32 David Gil wrote:
> Thanks for the quick reply @dackers86. Shall I open a new PR as a fork from this one, or @sophie4869 do you want me to work on your fork (I guess you'll need to grant me permissions). I will do the former if there is no quick response from Sophie, if that's OK
|
|
@sophie4869 I created a new PR at #915, with your changes rebased. I could not run the test suite (did not know how) and did not hook the extension to a real Firebase project either. But I am running your queries in my BigQuery views with good results 👍 |
|
Will this cover the cases where timestamps are null, or where the latest timestamp for a document is shared by two entries? |
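As a rough way to probe both cases, the sketch below runs a generic max-timestamp join-back query against SQLite via Python's standard library. It is not the script from this PR, and the `changelog` table, `data` column, and sample rows are hypothetical. The behaviour shown follows from standard SQL semantics (MAX ignores NULLs, and an equality comparison with NULL never matches), which BigQuery should share, but it is worth confirming against the real view.

```python
import sqlite3

# Hypothetical changelog with a timestamp tie and NULL timestamps.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE changelog (document_id TEXT, timestamp INTEGER, data TEXT)"
)
conn.executemany(
    "INSERT INTO changelog VALUES (?, ?, ?)",
    [
        ("tie", 7, "tie-x"),        # two entries share the latest timestamp
        ("tie", 7, "tie-y"),
        ("nul", None, "nul-only"),  # the only entry has a NULL timestamp
        ("mix", None, "mix-null"),  # one NULL and one real timestamp
        ("mix", 4, "mix-v4"),
    ],
)

JOIN_BACK_SQL = """
SELECT c.document_id, c.timestamp, c.data
FROM changelog AS c
JOIN (
  SELECT document_id, MAX(timestamp) AS latest_timestamp
  FROM changelog
  GROUP BY document_id
) AS latest
  ON c.document_id = latest.document_id
 AND c.timestamp = latest.latest_timestamp
ORDER BY c.document_id, c.data
"""

edge_rows = conn.execute(JOIN_BACK_SQL).fetchall()
docs_returned = {row[0] for row in edge_rows}
print(edge_rows)
```

In this sketch the tied document comes back twice, the document whose only timestamp is NULL does not come back at all, and the mixed document returns its non-NULL entry. A plain join-back would therefore need an extra tie-breaker and explicit NULL handling if those cases can occur in the changelog.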
|
Closing this as reopened (these commits rebased on next) and tracked in a PR here: #1288 |