Skip to content

Update the SQL script used for generating latest snapshot.#703

Closed
sophie4869 wants to merge 3 commits intofirebase:nextfrom
sophie4869:sophie-next
Closed

Update the SQL script used for generating latest snapshot.#703
sophie4869 wants to merge 3 commits intofirebase:nextfrom
sophie4869:sophie-next

Conversation

@sophie4869
Copy link
Copy Markdown
Contributor

The old script uses FIRST_VALUE and OVER, which sorts the entire changelog and finds the first record for each document. It can result in a memory issue when running BigQuery reading from the latest snapshot. (Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 110% of limit. Top memory consumer(s): sort operations used for analytic OVER() clauses: 96%)

The updated script selects the maximum timestamp for each document_id, and joins back with the table by the latest timestamp instead.

The old script uses FIRST_VALUE and OVER, which sorts the entire changelog and finds the first record for each document. It can result in a memory issue when running BigQuery reading from the latest snapshot. (Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 110% of limit. Top memory consumer(s):  sort operations used for analytic OVER() clauses: 96%)

The updated script selects the maximum timestamp for each document_id, and joins back with the table by the latest timestamp instead.
@google-cla google-cla Bot added the cla: yes Author has signed the CLA label Jul 21, 2021
@dackers86
Copy link
Copy Markdown
Member

Thanks @sophie4869.

This will be tested internally and has been added to the tracker for reviewing.

…chema view. There's no need to find the latest value again.
@dackers86
Copy link
Copy Markdown
Member

@sophie4869 There is a slight delay for reviewing this, I am currently experiencing installation issues...

ext-firestore-bigquery-export-fsexportbigquery
{"@type":"type.googleapis.com/google.cloud.audit.AuditLog","status":{"code":3,"message":"Build failed: npm ERR! cipm can only install packages when your package.json and package-lock.json or npm-shrinkwrap.json are in sync. Please update your lock file with `npm install` before continuing.\nnpm ERR! \nnpm ERR! \nnpm ERR! Missing: @firebaseextensions/firestore-bigquery-change-tracker@^1.1.10\nnpm ERR! \n\nnpm ERR! A complete log of this run can be found in:\nnpm ERR! /www-data-home/.npm/_logs/2021-08-10T12_35_21_209Z-debug.log; Error ID: beaf8772"},"authenticationInfo":{"principalEmail":"219368645393@cloudservices.gserviceaccount.com"},"serviceName":"cloudfunctions.googleapis.com","methodName":"google.cloud.functions.v1.CloudFunctionsService.CreateFunction","resourceName":"projects/extensions-testing/locations/us-central1/functions/ext-firestore-bigquery-export-fsexportbigquery"}

This is perhaps related to #701

@dackers86
Copy link
Copy Markdown
Member

Hi @sophie4869.

The latest updates from the next branch will now fix npm errors when installing locally.

With these updates, CI should also now run the tests which also need to be updated to match the changes you have made.

The updates look great after reviewing on a test installation - If you can update the above we can look at getting this approved. Any questions, let me know!

@dackers86 dackers86 added the needs: author feedback Pending additional information from the author label Jan 6, 2022
@dgilperez
Copy link
Copy Markdown
Contributor

+1 to solving this problem. I could work on this if a hand is required

@dackers86
Copy link
Copy Markdown
Member

Thanks @dgilperez. We are appreciate prs/updates from the community (and provide credit for contributions).

Otherwise this is still in our backlog to update/complete.

@dgilperez
Copy link
Copy Markdown
Contributor

Thanks for the quick reply @dackers86. Shall I open a new PR as a fork from this one, or @sophie4869 do you want me to work on your fork (I guess you'll need to grant me permissions). I will do the former if there is no quick response from Sophie, if that's OK

@sophie4869
Copy link
Copy Markdown
Contributor Author

sophie4869 commented Mar 14, 2022 via email

@dgilperez
Copy link
Copy Markdown
Contributor

@sophie4869 I created a new PR at #915, with your changes rebased. I could not run the test suite (did not know how) and did not hook the extension to a real Firebase project either.

But I am running your queries in my BigQuery views with good results 👍

@meyerovb
Copy link
Copy Markdown

meyerovb commented Nov 3, 2022

Will this cover if there are null time stamps or the latest entry for a document has two entries with the same timestamp?

@cabljac
Copy link
Copy Markdown
Contributor

cabljac commented Nov 7, 2022

Closing this as reopened (these commits rebased on next) and tracked in a PR here: #1288

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes Author has signed the CLA needs: author feedback Pending additional information from the author

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants