-
Notifications
You must be signed in to change notification settings - Fork 77
insert with buffer #2684
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
insert with buffer #2684
Conversation
tnaum-ms
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xingfan-msft Thank you for your first PR. Congratulations! 🎉
My comments in this review are there to help us align with the way we work in this repository.
- I suggested changes to the buffer implementation to make it easier to use.
- I've not started the extension with your changes, but it looks like it would work
- I've spotted changes to
package.jsonthat were unexpected, this reminds me that I didn't share our dev-guide with you (note-to-self: we should really add it to this repo), but in general, in this case: we prefer more commits instead of a single one where many change are bundled together - we're aware that a clean cut is not always possible, but even if it's fuzzy, it still helps to follow the 'chain-of-thought' ;)
I'll follow up on this in our call later today.
| "@microsoft/vscode-azext-dev": "^2.1.0", | ||
| "@pmmmwh/react-refresh-webpack-plugin": "^0.5.15", | ||
| "@swc/cli": "^0.6.0", | ||
| "@swc/core": "^1.11.7", | ||
| "@swc/jest": "^0.2.37", | ||
| "@types/documentdb": "^1.10.13", | ||
| "@types/fs-extra": "^11.0.4", | ||
| "@types/jest": "^29.5.14", | ||
| "@types/lodash.debounce": "^4.0.9", | ||
| "@types/lodash.omit": "^4.5.9", | ||
| "@types/mocha": "^10.0.10", | ||
| "@types/node": "~20.9.5", | ||
| "@types/pg": "^8.11.11", | ||
| "@types/react": "^18.3.18", | ||
| "@types/react-dom": "^18.3.5", | ||
| "@types/semver": "^7.3.12", | ||
| "@types/uuid": "^10.0.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder, did you have a conflict on this file?
We introduced some changes to package.json recently and did update a few packages. Your pull request seems to introduce changes here.
Was any change to package.json intended?
| export type InsertDocumentsResult = { | ||
| /** Indicates whether this write result was acknowledged. If not, then all other members of this result will be undefined */ | ||
| acknowledged: boolean; | ||
| /** The number of inserted documents for this operations */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was acknowledged removed?
| skip: number, | ||
| limit: number, | ||
| ): Promise<WithId<Document>[]> { | ||
| // eslint-disable-next-line @typescript-eslint/no-unsafe-assignment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like our lint configs differ as this shouldn't be necessary. Let's discuss live.
| forceServerObjectId: true, | ||
| ordered: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forceServerObjectId: this comment is not strictly related to this ticket. Let's create a research-ticket to figure out which setting here is recommended (as I don't want to invest in measurements at this stage - we're building a GUI, not a migration library) for bulk-insert scenarios when high throughput is expected.
| constructor(node: CollectionItem) { | ||
| this.node = node; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not on you here, we've started with a dependency on a node as it's a part of a command, but let's try to get rid of it the deeper we go. Here, a ClusterModel is a good candidate.
| private maxDocumentSize: number = 16 * 1024 * 1024; | ||
| private maxTotalSize: number = 32 * 1024 * 1024; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's remove maxDocumentSize and the document size check. We'll let the backend/the database deal with it.
Today it's 16MB, but, IIRC, it's in BSON, so measuring JSON can be a bit off and we'd run into errors that'd be hard to resolve in the future.
| let flushResponse: { count: number; error: Array<{ index: number; errmsg: string }> | undefined } | undefined; | ||
| if (this.documents.length >= this.maxBatchSize || this.totalSize + docSize > this.maxTotalSize) { | ||
| flushResponse = await this.flush(); | ||
| } | ||
|
|
||
| this.documents.push(document); | ||
| this.totalSize += docSize; | ||
| return flushResponse; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the logic here.
We insert documents, but get no feedback, i.e. the return value is undefined and I'm supposed to interpret it as success: the document has been added to the buffer.
Then, once the buffer is flushed on it's own due to it being full, I'd get a response stating how many documents have been inserted.
On top, the flush function is public so I can call it in case of me being done with a buffer and no documents left. I see that you're doing it in another section.
However, it's hard to read and the behavior of the entire class is unexpected with use over time. Some calls will be fast, some will be slow. I'd suggest splitting it so that addDocuments focuses on filling the buffer and returning a success/failure information. Boolean would do, but a status response might be helpful as well (although today the only rejection reason would be the fact that the buffer is full.).
Then, The user would call flush when needed.
Having a size function on the buffer would be helpful.
| import * as fse from 'fs-extra'; | ||
| import * as vscode from 'vscode'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the benefit of fse here? (out of interest, I'm not saying it's a bad choice.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR titled "insert with buffer" introduces a buffering mechanism for Mongo document insertion and improves error reporting in the insert flow.
- Extended the InsertDocumentsResult type to include error indicators and detailed write errors.
- Added a new DocumentBufferforMongo class to batch document inserts with size checks and flushing.
- Adjusted dependency versions and ESLint disable comments to align with the updated code.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/documentdb/ClustersClient.ts | Enhanced insert operation by extending error reporting and adding try–catch around document replacement. |
| src/commands/importDocuments/importDocuments.ts | Introduced the DocumentBufferforMongo class for batching document insertion and refactored insertion logic. |
| package.json | Updated dependency versions and adjusted the extension version to match the intended release state. |
Comments suppressed due to low confidence (1)
src/commands/importDocuments/importDocuments.ts:23
- [nitpick] Consider renaming 'DocumentBufferforMongo' to 'DocumentBufferForMongo' for consistent CamelCase styling.
export class DocumentBufferforMongo {
| acknowledged: boolean; | ||
| /** The number of inserted documents for this operations */ | ||
| insertedCount: number; | ||
| /** Indicates whether error occured duiring insertion. If not, then the member `errorMessage` will undefined */ |
Copilot
AI
May 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typographical errors in the comment; consider replacing 'occured' with 'occurred', 'duiring' with 'during', and inserting 'be' before 'undefined'.
| /** Indicates whether error occured duiring insertion. If not, then the member `errorMessage` will undefined */ | |
| /** Indicates whether an error occurred during insertion. If not, then the member `errorMessage` will be undefined */ |
| insertedCount: number; | ||
| /** Indicates whether error occured duiring insertion. If not, then the member `errorMessage` will undefined */ | ||
| errorsOccured: boolean; | ||
| /** Status code of insertion opertion */ |
Copilot
AI
May 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in documentation; replace 'opertion' with 'operation'.
| /** Status code of insertion opertion */ | |
| /** Status code of insertion operation */ |
| errorMessage: string | undefined; | ||
| /** Array of writeError while processing the bulk write */ | ||
| /** If writeErrors is valid, means the failure happened to specific documents, and the database acknowledged bulk write operation */ | ||
| /** If writeErrors is undefined, means the failure happened to the whole operation, may related to doccument size or some other errors, where retry should be applied */ |
Copilot
AI
May 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typographical error: 'doccument' should be corrected to 'document'.
| /** If writeErrors is undefined, means the failure happened to the whole operation, may related to doccument size or some other errors, where retry should be applied */ | |
| /** If writeErrors is undefined, means the failure happened to the whole operation, may related to document size or some other errors, where retry should be applied */ |
No description provided.