Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bulk copy return rows in NpgsqlBinaryImporter #2115

Closed
wants to merge 3 commits into from

Conversation

chrisatx
Copy link

See issue #2112 .
I am requesting that npgsql consider integrating this enhancement to return the number of rows copied in NpgsqlBinaryImport.Complete() method.

See PostgreSQLCopyHelper/PostgreSQLCopyHelper#30 for downstream implementation.

In this pull request: one commit with 2 files to implement the minor enhancement in npgsql.

I have tested this commit on the latest dev code in the following way:

  • Cloned npgsql/dev.

  • Changed 2 files to implement enhancement

  • Compiled npgsql:BulkCopyReturnRows branch

  • Implemented downstream enhancement in PostgreSQLCopyHelper (checked out latest there)

  • Compiled PostgreSQLCopyHelper referencing new npgsql binaries

  • Tested COPY using new libraries (npgsql and PostgreSQLCopyHelper) dropped into my application to verify that copy works and returns the row counts.

  • Have not yet had a chance to add tests as I wanted to push this out for comment.

@austindrenski
Copy link
Contributor

Just a little nit: try not to use #issue in your commit messages. GitHub links each commit to the referenced issue, and (sadly) doesn't remove them on close/rewrites/force-pushes.

Use something like #-issue or just the issue number while you're iterating, and we can properly ref them when we squash commit.

Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea seems good, but see implementation comments.

Expect<ReadyForQueryMessage>(_connector.ReadMessage());
_state = ImporterState.Committed;

CommandCompleteMessage cmdComplete;
switch (msg1.Code)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why check the code here? You've made sure with Expect<> above that it's CommandComplete, no?

Copy link
Author

@chrisatx chrisatx Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all very helpful and professional feedback comments!
Thanks very much for taking the time to provide the quality check, given your deep knowledge and stewardship of this fine project.

I think the simple answer is: a) I can't quite remember since I wrote this in a hurry to solve a problem in May/June. b) I think for expediency and completeness, I was re-using a similar "property checking" structure either from NpgsqlBinaryExporter or another class, that clearly required a lot more checks than needed for the simpler case in NpgsqlBinaryImporter.

(Continued in next code review comment)

{
case BackendMessageCode.CompletedResponse:
cmdComplete = (CommandCompleteMessage)msg1;
if (cmdComplete.StatementType == StatementType.Copy)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again, how can the statement type be anything other than copy? You can put this in a Debug.Assert() but it doesn't seem to make sense to test for it...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c) I have been living in F# world recently, which is far more explicit using its type system to disallow invalid states. So my caution in jumping into C# without knowing the codebase probably led me be ok in over-using available checks.

For example: as a completely new user to this codebase and after your prompting, it is (now) intuitive that the following is true:

If msg1 is of type CommandCompleteMessage, then msg1.Code must equal BackendMessageCode.CompletedResponse.

However, when I was adding this code, at first glance I could not be certain of that relationship even though it seemed correct, and so out of caution and limited time, I re-used a check from another part of the codebase.
In F#, I would at this point briefly ponder whether to restructure the types/classes in such a way to remove (not require) these checks, a common refactoring, but this has been less natural to do in C# until the recent introduction of discriminated unions.

Your comment is well noted, and I will revise and streamline this series of logic.
I agree, the checks are unnecessary now that I understand the type/property relationships.

{
CheckReady();
var copyRows = 0; // number of rows copied on success. See "Outputs" section of https://www.postgresql.org/docs/current/static/sql-copy.html
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to declare this here rather than just returning the value directly below?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll double check this. At some point a default of 0 should be set.

@@ -301,9 +302,22 @@ public void Complete()
_buf.EndCopyMode();

_connector.SendMessage(CopyDoneMessage.Instance);
Expect<CommandCompleteMessage>(_connector.ReadMessage());

var msg1 = _connector.ReadMessage();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expect<> already returns the message cast to the right type, no need to read it, call Expect<> on it and then downcast to CommandCompleteMessage below.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is helpful. I'll revert with some improvements soon. Thanks again!

@chrisatx
Copy link
Author

Thanks @austindrenski. Point noted, and I definitely agree with the goal of reducing noise or inconvenience during the issue tagging process.
I'll follow the suggested approach for tagging issues in commits.

Complete() method. Thanks for code review advice from @austindrenski and
@roji.  This helps fix #-2112 .
@chrisatx
Copy link
Author

I think the code cleanup I committed on Aug 15 incorporates all the suggested code review items.
Let me know if I can assist more, or if there are other concerns.
Thanks!

Expect<ReadyForQueryMessage>(_connector.ReadMessage());
_state = ImporterState.Committed;
// Number of rows copied on success - see "Outputs" section of https://www.postgresql.org/docs/current/static/sql-copy.html
return (int)cmdComplete.Rows;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't this cast throw?

@roji More broadly, is there any reason to believe that the protocol restricts this to 2^32?

class CommandCompleteMessage : IBackendMessage
{
internal StatementType StatementType { get; private set; }
internal uint OID { get; private set; }
internal uint Rows { get; private set; }

Per the protocol docs:

CommandComplete (B)
  Byte1('C')
    Identifies the message as a command-completed response.

  Int32
    Length of message contents in bytes, including self.

  String
    The command tag. This is usually a single word that identifies which SQL command was completed.

    For an INSERT command, the tag is INSERT oid rows, where rows is the number of rows inserted. oid is the object ID of the inserted row if rows is 1 and the target table has OIDs; otherwise oid is 0.

    For a DELETE command, the tag is DELETE rows where rows is the number of rows deleted.

    For an UPDATE command, the tag is UPDATE rows where rows is the number of rows updated.

    For a SELECT or CREATE TABLE AS command, the tag is SELECT rows where rows is the number of rows retrieved.

    For a MOVE command, the tag is MOVE rows where rows is the number of rows the cursor's position has been changed by.

    For a FETCH command, the tag is FETCH rows where rows is the number of rows that have been retrieved from the cursor.

    For a COPY command, the tag is COPY rows where rows is the number of rows copied. (Note: the row count appears only in PostgreSQL 8.2 and later.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's a good question.

There are other places in the wire protocol where the number of rows is an Int32 (e.g. the max number of rows to return in the Execute message). This doesn't prove that here it's the same but it's an indication.

Also, the ADO.NET API's DbDataReader.RecordsAffected is an int. This doesn't mean that PostgreSQL returns an int, but the standard .NET API does... And it seems to make sense to align the COPY API with this.

We can test, or wait until someone complains about inserting too many rows :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll see if I can find some time (and space....) to test this for the sake of curiosity.

In API terms, I was picturing something like LongRecordsAffected to mirror an array's Length and LongLength members.

Copy link
Member

@roji roji Aug 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. In any case it makes sense for me to leave this returning int, and if ever someone complains we can always add a CompleteLong().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldnt it just be easier to change the return of this method to long or uint now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@say25 Generally speaking, there's value in maintaining conformance with ADO.NET APIs.

That being said, since this is still open, we could just add CompleteLong() here, rather than punting for a future exception. But we first need to verify that the protocol can return an out-of-range result.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continuing this below.

Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but we're lacking testing for this new functionality. You can add an assert in CopyTests.BinaryRoundtrip.

@austindrenski
Copy link
Contributor

Continuing from the review above:

For reference, it appears that COPY returns a uint64 for the number of processed rows:

src/backend/commands/copy.c#L223-L229:

/* DestReceiver for COPY (query) TO */
typedef struct
{
    DestReceiver pub;       /* publicly-known function pointers */
    CopyState    cstate;    /* CopyStateData for the command */
    uint64       processed; /* # of tuples processed */
} DR_copy;

src/backend/commands/copy.c#L989-L1010:

if (is_from)
{
    Assert(rel);

    /* check read-only transaction and parallel mode */
    if (XactReadOnly && !rel->rd_islocaltemp)
        PreventCommandIfReadOnly("COPY FROM");
    PreventCommandIfParallelMode("COPY FROM");

    cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
                           NULL, stmt->attlist, stmt->options);
    *processed = CopyFrom(cstate);  /* copy from file to database */
    EndCopyFrom(cstate);
}
else
{
    cstate = BeginCopyTo(pstate, rel, query, relid,
                         stmt->filename, stmt->is_program,
                         stmt->attlist, stmt->options);
    *processed = DoCopyTo(cstate);  /* copy from database to file */
    EndCopyTo(cstate);
}

@roji
Copy link
Member

roji commented Sep 6, 2018

@austindrenski thanks for looking this up in the PostgreSQL sources... This makes me think of reversing my previous comment - if the protocol really does return a ulong, maybe we should too as @say25 suggests above. After all this is a totally Npgsql-specific API...

@say25 say25 mentioned this pull request Sep 9, 2018
@roji
Copy link
Member

roji commented Sep 11, 2018

@chrisatx are you interested in continuing work on this PR? I'm asking because @say25 has submitted #2150.

@chrisatx
Copy link
Author

@roji - @say25 's #2150 looks great with the additional testing and uint/ulong work.
If it is nearing completion and acceptance, then it makes sense to move forward with #2150 that builds on and supercedes #2115.
We can close this PR in favor of #2150, and I can contribute there.

@roji
Copy link
Member

roji commented Sep 19, 2018

Superceded by #2150

@roji roji closed this Sep 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants