Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve bug preventing affectedRows count for mutation queries #72

Merged

Conversation

pmooney-socraticworks
Copy link
Contributor

Resolves this issue

Though there was code in parseResults to return the affectedRows count for INSERT/UPDATE/DELETE queries, it was never being called because of an extra check for a truthy currentResultSet, which was initialized to null and only populated if there was a message that was a RowDescriptionMessage in the list of messages, which may not be present.

Beyond that though, the currentResultSet was only pushed into the returning resultSet if the message was RowDescriptionMessage. I moved this push to CommandCompleteMessage instead, which seems to always be either the only or last message returned from a query:

The response to a SELECT query (or other queries that return row sets, such as EXPLAIN or SHOW) normally consists of RowDescription, zero or more DataRow messages, and then CommandComplete. COPY to or from the frontend invokes special protocol as described in Section 55.2.6. All other query types normally produce only a CommandComplete message.

source

Beyond that main fix, there were a couple of housekeeping items I took care of, including:

  • Running the prettier command defined in the package.json
  • Updating a few timestamp-related tests to compare agains the UTC milliseconds of the returned timestamps, not their javascript dates. This standardizes the comparison for developers running the tests in any timezone, as it would only currently pass for developers in the GMT timezone (+00).
  • Add pgdata-test to .gitignore

Thanks for working on this project @samwillis, it's been great seeing this evolve. Please let me know if you see any issues or have any comments.

@@ -116,7 +123,7 @@ test("basic types", async (t) => {
SELECT * FROM test;
`);

t.deepEqual(res, {
t.like(res, {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deepEqual verifies all enumerable properties match on the res object. like does not. This was the only thing I could think of to standardize timestamp comparisons across timezones.

Copy link
Collaborator

@samwillis samwillis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @pmooney-socraticworks

Thanks for the PR, good spot on the missing affectedRows for queries returning no rows!

Something important to note, parseResults is used on results from both query() and exec() and that exec can take multiple sql statements and so return multiple sets of results. I think your change fixes the missing affectedRows but breaks the multiple results sets for exec.

currentResultSet was initiated and used when first seeing a RowDescriptionMessage as messages could look like:

  • RowDescriptionMessage
  • DataRowMessage
  • DataRowMessage
  • RowDescriptionMessage
  • DataRowMessage
  • DataRowMessage
  • CommandCompleteMessage

I think the best fix is to retain the existing logic, but add a condition in the CommandCompleteMessage branch to create and push a resultSet if there isn't one.

This to some extent raises a question about affectedRows. For a exec() that does an insert then select the affectedRows will be impacted only from the insert but placed on the select result set as it's the last statement. Essentially affectedRows is an accumulator for all statements, but added to the last result set. I think I'm ok with that for now and I don't really want to start nesting result sets in a wrapper object... something to think about.

@pmooney-socraticworks
Copy link
Contributor Author

@samwillis Thanks for looking at this, and I'd be happy to check and see how exec changed in response to these changes. Just to make sure we're on the same page and I have the correct expectations, given this query:

  const multiQueryResult = await db.exec(`
    INSERT INTO test (name) VALUES ('test');
    UPDATE test SET name = 'test2';
    SELECT * FROM test;
  `);

The main branch currently gives these results:

[
  {
    rows: [ { id: 1, name: 'test2' } ],
    fields: [ { name: 'id', dataTypeID: 23 }, { name: 'name', dataTypeID: 25 } ],
    affectedRows: 0
  },
  {
    rows: [ { id: 1, name: 'test2' } ],
    fields: [ { name: 'id', dataTypeID: 23 }, { name: 'name', dataTypeID: 25 } ],
    affectedRows: 0
  },
  {
    rows: [ { id: 1, name: 'test2' } ],
    fields: [ { name: 'id', dataTypeID: 23 }, { name: 'name', dataTypeID: 25 } ],
    affectedRows: 0
  }
]

My naive guess would that they would have returned:

[
  {
    rows: [],
    fields: [],
    affectedRows: 1
  },
  {
    rows: [],
    fields: [],
    affectedRows: 1
  },
  {
    rows: [ { id: 1, name: 'test2' } ],
    fields: [ { name: 'id', dataTypeID: 23 }, { name: 'name', dataTypeID: 25 } ],
    affectedRows: 0
  }
]

Since postgres doesn't return rows from inserts and updates unless the RETURNING keyword is used.

Your thoughts?

@samwillis
Copy link
Collaborator

Hmm, If main is returning the rows three times that's a bug 😬

The affectedRows is only possible to infer from the CommandCompleteMessageafter all statements and so can't be applied to the individual resultSets. Unfortunately that means that without changing the output structure it should be this:

[
  {
    rows: [],
    fields: [],
  },
  {
    rows: [],
    fields: [],
  },
  {
    rows: [ { id: 1, name: 'test2' } ],
    fields: [ { name: 'id', dataTypeID: 23 }, { name: 'name', dataTypeID: 25 } ],
    affectedRows: 2
  }
]

If we leave off the affectedRows property except for on the last result I think that's ok.

@pmooney-socraticworks
Copy link
Contributor Author

@samwillis Ahh, ok, that makes sense. I'll write a test for that then and see if I can make it behave. Thanks for getting back to me.

@samwillis
Copy link
Collaborator

@pmooney-socraticworks no worries, awesome that your keen to contribute!

- Relevant for `.exec` queries containing one or more mutation queries
- Prevent `rows` from SELECT queries bleeding over into previous result sets.
Comment on lines +82 to +89
function retrieveRowCount(msg: CommandCompleteMessage): number {
const parts = msg.text.split(" ");
switch (parts[0]) {
case "INSERT":
return parseInt(parts[2], 10);
case "UPDATE":
case "DELETE":
return parseInt(parts[1]);
return parseInt(parts[1], 10);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to retrieveRowCount, because I realized that INSERT statements send back a message that looks like INSERT 0 1, where the second number is actually the count of rows inserted. Per our previous conversation, if we want an affectedRows count of 2, then we would need to consider that number.

Not sure if a newly created row is "affected" though.

Oh, and I added a 10 for the radix, to ensure a base-10 number (force of habit)


for (const msg of messages) {
const filteredMessages = messages.filter(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filtering because there are messages after the last CommandCompletedMessage that we don't care about, but we need to check to see, when we're iterating, if we're on the last CommandCompletedMessage, so that we can append affectedRows.

@pmooney-socraticworks
Copy link
Contributor Author

@pmooney-socraticworks no worries, awesome that your keen to contribute!

My pleasure. I'm pretty excited about this project. It has a lot of potential to become standard in people's testing infrastructure for Postgres integration tests. In one test suite I did manage to get running, the time it took to run went down from 55 seconds to about 9 seconds. With improvements between JS <-> WASM down the road, I can see that number shrinking even more.

@samwillis samwillis self-requested a review April 6, 2024 07:12
Copy link
Collaborator

@samwillis samwillis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work 👍

@samwillis samwillis merged commit 7110f37 into electric-sql:main Apr 6, 2024
@pmooney-socraticworks pmooney-socraticworks deleted the update-affectedRows-count branch April 6, 2024 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

affectedRows not being returned in UPDATE .query method
2 participants