
Add-MdbcData only adds 1 document at a time #51

Closed
awsles opened this issue Dec 16, 2020 · 7 comments

awsles commented Dec 16, 2020

The Add-MdbcData cmdlet only adds one document at a time when an array is passed to the -InputObject parameter. For example, in Add-MdbcData -InputObject $MyArray -Collection $coll, where $MyArray contains an array of documents, the cmdlet simply adds a single document containing the array of documents, which is not what was intended.
However, using the PowerShell piping iterator $MyArray | Add-MdbcData -Collection $coll does work.
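The difference comes from how PowerShell binds input: a value passed to a parameter is bound once, as a whole, while an array sent down the pipeline is enumerated and bound one element per process-block call. The sketch below reproduces this with a hypothetical function, Test-InputBinding, that only reports how many items each invocation receives. It is an illustration of general PowerShell binding, not Mdbc's code.

```powershell
# Hypothetical stand-in for a cmdlet that accepts documents from the pipeline.
# It only records what each process-block invocation receives; it is NOT Mdbc.
function Test-InputBinding {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline = $true)]
        [object] $InputObject
    )
    process {
        # @() keeps an array intact and wraps a single object in a 1-element array.
        [pscustomobject]@{ Received = @($InputObject).Count }
    }
}

$MyArray = @(@{ x = 1 }, @{ x = 2 }, @{ x = 3 })

# Parameter binding: process runs once, with the whole array as one value.
$byParameter = @(Test-InputBinding -InputObject $MyArray)

# Pipeline binding: the array is enumerated, process runs once per element.
$byPipeline = @($MyArray | Test-InputBinding)

"By parameter: $($byParameter.Count) call(s), first received $($byParameter[0].Received) item(s)"
# By parameter: 1 call(s), first received 3 item(s)
"By pipeline:  $($byPipeline.Count) call(s), first received $($byPipeline[0].Received) item(s)"
# By pipeline:  3 call(s), first received 1 item(s)
```

A cmdlet written this way would see the whole array as one document when it comes through -InputObject, which matches the behaviour described above.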

This behaviour differs from the PowerShell practice of also allowing a set of objects to be passed as a parameter value and iterated by the implementation (possibly using db.collection.bulkWrite()?). At a minimum, the documentation page should be updated to reflect this behaviour. Ideally, though, the underlying code could detect an array passed this way (which is faster than going through the PowerShell pipeline).

nightroman (Owner) commented Dec 16, 2020

FWIW, many official and community PowerShell cmdlets do not follow this "practice". Compare:

Out-String -InputObject @(@{x=1}, @{x=2})
@(@{x=1}, @{x=2}) | Out-String

Common PowerShell practice is to use the pipeline for multiple input objects.
Having said that, I agree that the current behavior is not useful. Let me think.
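One common way to make a cmdlet accept both forms is to declare the input parameter as an array and loop over it inside the process block. The sketch below shows that general pattern with a hypothetical function, Add-ItemSketch, that only counts items instead of writing to MongoDB; it is an assumption about a typical approach, not a description of how Mdbc implemented the change.

```powershell
# Hypothetical sketch (not Mdbc's code): a common pattern that makes passing an
# array to -InputObject behave like piping the same array.
function Add-ItemSketch {
    [CmdletBinding()]
    param(
        # An array-typed parameter lets a single binding carry many items.
        [Parameter(ValueFromPipeline = $true)]
        [object[]] $InputObject
    )
    begin {
        # Stand-in for the target collection; a real cmdlet would write to a database.
        $added = [System.Collections.Generic.List[object]]::new()
    }
    process {
        # Pipeline input arrives one element per call; parameter input arrives
        # as the whole array in one call. Looping handles both cases.
        foreach ($item in $InputObject) {
            $added.Add($item)
        }
    }
    end {
        # Emit how many items were "added".
        $added.Count
    }
}

$MyArray = @(@{ x = 1 }, @{ x = 2 }, @{ x = 3 })

$viaParameter = Add-ItemSketch -InputObject $MyArray   # 3
$viaPipeline = $MyArray | Add-ItemSketch               # 3
```

Because the loop runs inside process, the same function body serves both call styles without detecting arrays explicitly.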

awsles (Author) commented Dec 16, 2020

Writing one document at a time is quite slow, so a bulk insert option would be quite handy. I also tried calling db.collection.bulkWrite() using Invoke-MdbcCommand, but it doesn't appear that shell methods can be called via that cmdlet (an Invoke-MdbcShellCommand cmdlet would be awesome to have).

nightroman (Owner)

@lesterw1 Does all this exist in the C# driver? Mdbc does not use the shell; it uses the C# driver. Examples/suggestions in shell format are not that useful...

nightroman (Owner)

Done, v6.5.8

awsles (Author) commented Dec 17, 2020

Just tested v6.5.8. Using a collection for -InputObject is about 8% faster than pipelining. Great result! Thank you.
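A comparison like this can be reproduced with Measure-Command by timing the same data in both call styles. The sketch below uses a hypothetical no-op function, Add-Stub, so it runs without a database; because Add-Stub does no I/O, its numbers reflect only PowerShell's binding overhead and will not match the 8% figure. Against a real server, the same two forms would be timed with Add-MdbcData -InputObject $docs -Collection $coll and $docs | Add-MdbcData -Collection $coll.

```powershell
# Hypothetical stand-in for a cmdlet such as Add-MdbcData; it does no I/O,
# so the timings below reflect only PowerShell's parameter-binding overhead.
function Add-Stub {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline = $true)]
        [object[]] $InputObject
    )
    process {
        foreach ($item in $InputObject) { $null = $item }
    }
}

# Sample documents as hashtables.
$docs = foreach ($i in 1..10000) { @{ _id = $i; name = "doc$i" } }

# One call with the whole array bound to -InputObject.
$byParameter = Measure-Command { Add-Stub -InputObject $docs }

# One process-block call per element via the pipeline.
$byPipeline = Measure-Command { $docs | Add-Stub }

'Parameter: {0:N0} ms' -f $byParameter.TotalMilliseconds
'Pipeline:  {0:N0} ms' -f $byPipeline.TotalMilliseconds
```

Running each measurement a few times and comparing medians gives a steadier picture than a single run.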

awsles (Author) commented Dec 17, 2020

What about using the C# driver collection.BulkWriteAsync() as a further optimization?

nightroman (Owner)

What about using the C# driver collection.BulkWriteAsync() as a further optimization?

Unlike the suggestion in this issue, BulkWriteAsync is not as straightforward to adopt. I am sure it is doable, but currently I have no such plans: (1) I have no free time; (2) it needs some thinking about the design/concept. Bulk write is about all kinds of write operations combined together, not just adding documents. This operation probably needs a new cmdlet, and it is not obvious how to design its input.
