
feat: add recursiveDelete() #1427

Merged: 11 commits on Feb 23, 2021
2 changes: 1 addition & 1 deletion dev/src/bulk-writer.ts
@@ -688,7 +688,7 @@ export class BulkWriter {
* Throws an error if the BulkWriter instance has been closed.
* @private
*/
-private _verifyNotClosed(): void {
+_verifyNotClosed(): void {
if (this._closing) {
throw new Error('BulkWriter has already been closed.');
}
137 changes: 129 additions & 8 deletions dev/src/index.ts
@@ -16,14 +16,14 @@

import * as firestore from '@google-cloud/firestore';

-import {CallOptions, grpc, RetryOptions} from 'google-gax';
+import {CallOptions, GoogleError, grpc, RetryOptions, Status} from 'google-gax';
import {Duplex, PassThrough, Transform} from 'stream';

import {URL} from 'url';

import {google} from '../protos/firestore_v1_proto_api';
import {ExponentialBackoff, ExponentialBackoffSetting} from './backoff';
-import {BulkWriter} from './bulk-writer';
+import {BulkWriter, BulkWriterError} from './bulk-writer';
import {BundleBuilder} from './bundle';
import {fieldsFromJson, timestampFromJson} from './convert';
import {
@@ -141,7 +141,7 @@ const CLOUD_RESOURCE_HEADER = 'google-cloud-resource-prefix';
/*!
* The maximum number of times to retry idempotent requests.
*/
-const MAX_REQUEST_RETRIES = 5;
+export const MAX_REQUEST_RETRIES = 5;

/*!
* The default number of idle GRPC channel to keep.
@@ -166,7 +166,7 @@ const MAX_CONCURRENT_REQUESTS_PER_CLIENT = 100;
*
* @private
*/
-const REFERENCE_NAME_MIN_ID = '__id-9223372036854775808__';
+export const REFERENCE_NAME_MIN_ID = '__id-9223372036854775808__';

/**
* Document data (e.g. for use with
@@ -399,6 +399,26 @@ export class Firestore implements firestore.Firestore {
*/
private registeredListenersCount = 0;

/**
* A lazy-loaded BulkWriter instance to be used with recursiveDelete() if no
* BulkWriter instance is provided.
*
* @private
*/
private _bulkWriter: BulkWriter | undefined;

Contributor:
Could we move this to BulkWriter (kind of like a singleton pattern)? This could then be a shared instance that could be used by other operations in the future as well.

I also wonder whether the default ramp-up makes sense here. It will slow down the delete (especially when compared to the CLI), and it also doesn't quite fit the use case for 5/5/5. We do not have to ramp up to give the database a chance to shard: the data is already sharded; we just want to delete it.

Author:
> Could we move this to BulkWriter (kind of like a singleton pattern)? This could then be a shared instance that could be used by other operations in the future as well.

Do you mean that calling firestore.bulkWriter() would return the same instance each time, or having a Firestore-internal BulkWriter.bulkWriter() getter that lazy-loads?

> I also wonder whether the default ramp-up makes sense here.

Yeah, that's something I was thinking about as well as part of the RST_STREAM retries. Deletes aren't subject to 5/5/5 throttling, which creates two possibilities:

1) Add an internal override on BulkWriter that turns off throttling when recursive delete is called. However, this would mean that writes being performed at the same time won't get throttled.

2) Add logic for BulkWriter to automatically ignore throttling limits for deletes, but then we'd have to batch deletes separately from writes (or think of another solution that allows writes to still be throttled).
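
For reference, a caller can already opt out of the default ramp-up by passing a custom BulkWriter to recursiveDelete(). A minimal sketch using the public BulkWriterOptions throttling settings (the collection path here is made up):

```ts
import {Firestore} from '@google-cloud/firestore';

const firestore = new Firestore();

// Disable the 500/50/5 ramp-up entirely for a delete-only workload...
const deleteWriter = firestore.bulkWriter({throttling: false});

// ...or keep throttling but start at a higher initial rate.
const rampedWriter = firestore.bulkWriter({
  throttling: {initialOpsPerSecond: 500},
});

await firestore.recursiveDelete(firestore.collection('users'), deleteWriter);
```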

Contributor:
Firestore internal getter that lazy-loads. I want to keep it similar to what you have, but try to move these internals out of Firestore.

Author:
Added a static BulkWriter.bulkWriter() method.
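
For illustration, the lazy-loading accessor could look roughly like this (a hypothetical sketch; the final method isn't shown in this diff, and the per-client WeakMap cache is an assumption):

```ts
export class BulkWriter {
  // One shared default writer per Firestore client, created on first use.
  private static readonly defaultWriters = new WeakMap<Firestore, BulkWriter>();

  /**
   * Lazily creates and caches the default BulkWriter used by internal
   * callers such as recursiveDelete().
   * @private
   */
  static bulkWriter(firestore: Firestore): BulkWriter {
    let writer = BulkWriter.defaultWriters.get(firestore);
    if (!writer) {
      writer = firestore.bulkWriter();
      BulkWriter.defaultWriters.set(firestore, writer);
    }
    return writer;
  }

  // ... rest of the class ...
}
```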


/**
* Lazy-load the Firestore's default BulkWriter.
*
* @private
*/
private getBulkWriter(): BulkWriter {
if (!this._bulkWriter) {
this._bulkWriter = this.bulkWriter();
}
return this._bulkWriter;
}

/**
* Number of pending operations on the client.
*
@@ -1198,27 +1218,128 @@ export class Firestore implements firestore.Firestore {
this.bulkWritersCount -= 1;
}

/**
* Recursively deletes all documents and subcollections at and under the
* specified level.
*
* If any delete fails, the promise is rejected with an error message
* containing the number of failed deletes and the stack trace of the last
* failed delete. The provided reference is deleted regardless of whether
* all deletes succeeded.
*
* `recursiveDelete()` uses a BulkWriter instance with default settings to
* perform the deletes. To customize throttling rates or add success/error
* callbacks, pass in a custom BulkWriter instance.
*
Contributor:
You should add a code snippet here that shows how to collect the references for the failed delete.

Author:
Done. I included the default error handler to make it easier to paste. Do you think including it would confuse developers?

Contributor:
Looks good.

* @param ref The reference of a document or collection to delete.
* @param bulkWriter Custom BulkWriter instance used to perform the deletes.
* @return A promise that resolves when all deletes have been performed.
* The promise is rejected if any of the deletes fail.
*
* @example
* // Recursively delete a reference and log the references of failures.
* const bulkWriter = firestore.bulkWriter();
* bulkWriter
* .onWriteError((error) => {
* if (
* error.code === GrpcStatus.UNAVAILABLE &&

Contributor:
I think we should drop this line. Deletes are almost always retryable, and we should show that here and do a blind retry up to MAX_RETRY_ATTEMPTS times.

Regardless of what the developer does, we should apply a sensible backoff on top of these attempts. This means we should back off longer for RESOURCE_EXHAUSTED.

Author:
Done. Added the RESOURCE_EXHAUSTED note to the TODO list at the top of the PR.

* error.failedAttempts < MAX_RETRY_ATTEMPTS
* ) {
* return true;
* } else {
* console.log('Failed write at document: ', error.documentRef);

Contributor:
We should probably add a toString() to make this output sensible.

"That sounds like a good idea, but it seems unrelated to this PR."

I came up with an answer for you to copy & paste.

Author:
I often rely on printing the JavaScript object for DocumentReference when debugging to view all the different attributes of the reference, so I'm afraid to replace that with a pretty-print.

Updated to use documentRef.path, which should be sufficient. Thoughts?

* return false;
* }
* });
* await firestore.recursiveDelete(docRef, bulkWriter);
*/
recursiveDelete(
ref:
| firestore.CollectionReference<unknown>
| firestore.DocumentReference<unknown>,
bulkWriter?: BulkWriter
): Promise<void> {
// Capture the error stack to preserve stack tracing across async calls.
const stack = Error().stack!;

const writer = bulkWriter ?? this.getBulkWriter();

Contributor:
Why do we need the extra writer variable?

Author:
writer has type BulkWriter, whereas bulkWriter always has type BulkWriter | undefined, which makes using the variable later messier.

writer._verifyNotClosed();
const docStream = this._getAllDescendants(
ref instanceof CollectionReference
? (ref as CollectionReference<unknown>)
: (ref as DocumentReference<unknown>)
);
const resultDeferred = new Deferred<void>();
let errorCount = 0;
let lastError: Error | undefined;
const incrementErrorCount = (err: Error): void => {
errorCount++;
lastError = err;
};
const onStreamEnd = (): void => {
if (ref instanceof DocumentReference) {
writer.delete(ref).catch(err => incrementErrorCount(err));
}
writer.flush().then(async () => {
if (lastError === undefined) {
resultDeferred.resolve();
} else {
let error = new GoogleError(
`${errorCount} ` +
`${errorCount !== 1 ? 'deletes' : 'delete'} ` +
'failed. The last delete failed with: '
);
if (lastError instanceof BulkWriterError) {
error.code = (lastError.code as number) as Status;
}

Author:
Not sure why these lines aren't covered by Codecov, but I'm pretty sure they're being called. Maybe it's because I put them in a lambda function to access lastError and errorCount. Is this an anti-pattern?

error = wrapError(error, stack);

// Wrap the BulkWriter error last to provide the full stack trace.
resultDeferred.reject(
lastError.stack ? wrapError(error, lastError.stack ?? '') : error
);
}
});
};

docStream
.on('error', err => {
err.code = Status.FAILED_PRECONDITION;

Contributor:
Please take a look at https://developers.google.com/maps-booking/reference/grpc-api/status_codes

FAILED_PRECONDITION means that a simple retry won't work here. The developer has to change something about the state of the system before retrying the operation. This is likely not the case here. I would guess that we will have an error code here already when we hit this code path, but if we don't, I would suggest using UNAVAILABLE as the return code.

Author:
Done.

err.message = 'Failed to fetch children documents.';
lastError = err;
onStreamEnd();
})
.on('data', (snap: QueryDocumentSnapshot) => {
writer.delete(snap.ref).catch(err => incrementErrorCount(err));
})
.on('end', () => onStreamEnd());

return resultDeferred.promise;
}
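
From the caller's perspective, the aggregated failure surfaces like this (a usage sketch; the collection path is made up):

```ts
try {
  await firestore.recursiveDelete(firestore.collection('users'));
  console.log('All documents deleted.');
} catch (err) {
  // Rejected with a GoogleError whose message reads, for example:
  //   "3 deletes failed. The last delete failed with: ..."
  // err.code carries the status code of the last BulkWriterError.
  console.error('Recursive delete failed:', err);
}
```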

/**
* Retrieves all descendant documents nested under the provided reference.
*
* @private
* @return {Stream<QueryDocumentSnapshot>} Stream of descendant documents.
*/
-// TODO(chenbrian): Make this a private method after adding recursive delete.
-_getAllDescendants(
-  ref: CollectionReference | DocumentReference
+private _getAllDescendants(
+  ref: CollectionReference<unknown> | DocumentReference<unknown>
): NodeJS.ReadableStream {
// The parent is the closest ancestor document to the location we're
// deleting. If we are deleting a document, the parent is the path of that
// document. If we are deleting a collection, the parent is the path of the
// document containing that collection (or the database root, if it is a
// root collection).

let parentPath = ref._resourcePath;
if (ref instanceof CollectionReference) {
parentPath = parentPath.popLast();
}
const collectionId =
-  ref instanceof CollectionReference ? ref.id : ref.parent.id;
+  ref instanceof CollectionReference

Contributor:
CollectionReference means CollectionReference<DocumentData>. If you change this to ref instanceof CollectionReference<unknown> you should be able to remove the as DocumentReference cast (which should also be as DocumentReference<unknown>).

Author:
The TS compiler was not happy with that suggestion:
TS2348: Value of type 'typeof CollectionReference' is not callable. Did you mean to include 'new'?

Contributor:
I think the TS compiler is right, since the generics support of TypeScript is pretty much like Java's: erased at compile time. I was able to get the code to transpile even without the ref as DocumentReference cast, though.

FWIW, an as DocumentReference<unknown> would still be the correct way to type this if we were to keep the cast. Right now, you are dropping the converter in the cast, when really we don't know anything about it.

Author:
Added the <unknown> type to the DocumentReference cast.

+    ? ref.id
+    : (ref as DocumentReference).parent.id;

let query: Query = new Query(
this,
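
The rest of the method is truncated in this view. Conceptually it builds a kindless all-descendants query under parentPath and streams the results; the sketch below is an approximation, and QueryOptions.forKindlessAllDescendants as well as the __name__ bound built from REFERENCE_NAME_MIN_ID are assumptions inferred from the constant exported above:

```ts
// Query every document under parentPath regardless of collection ID
// (a "kindless" query).
let query: Query = new Query(
  this,
  QueryOptions.forKindlessAllDescendants(parentPath, collectionId)
);

// When deleting a document, keep only descendants of that document by
// bounding __name__ (illustrative; the real code builds a proper
// reference value rather than a raw string).
if (ref instanceof DocumentReference) {
  query = query.where(
    FieldPath.documentId(),
    '>=',
    `${ref.path}/${REFERENCE_NAME_MIN_ID}`
  );
}

return query.stream();
```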
140 changes: 83 additions & 57 deletions dev/system-test/firestore.ts
@@ -2629,69 +2629,95 @@ describe('BulkWriter class', () => {
return firestore.terminate();
});

-// TODO(chenbrian): This is a temporary test used to validate that the
-// StructuredQuery calls work properly. Remove these tests after adding
-// recursive delete tests.
-it('finds nested documents and collection', async () => {
-  // ROOT-DB
-  // └── randomCol
-  //     ├── anna
-  //     └── bob
-  //         └── parentsCol
-  //             ├── charlie
-  //             └── daniel
-  //                 └── childCol
-  //                     ├── ernie
-  //                     └── francis
-  const batch = firestore.batch();
-  batch.set(randomCol.doc('anna'), {name: 'anna'});
-  batch.set(randomCol.doc('bob'), {name: 'bob'});
-  batch.set(randomCol.doc('bob/parentsCol/charlie'), {name: 'charlie'});
-  batch.set(randomCol.doc('bob/parentsCol/daniel'), {name: 'daniel'});
-  batch.set(randomCol.doc('bob/parentsCol/daniel/childCol/ernie'), {
-    name: 'ernie',
-  });
-  batch.set(randomCol.doc('bob/parentsCol/daniel/childCol/francis'), {
-    name: 'francis',
-  });
-  await batch.commit();
+describe('recursiveDelete()', () => {
+  async function countDocumentChildren(
+    ref: DocumentReference
+  ): Promise<number> {
+    let count = 0;
+    const collections = await ref.listCollections();
+    for (const collection of collections) {
+      count += await countCollectionChildren(collection);
+    }
+    return count;
+  }

-  const numStreamItems = async (
-    stream: NodeJS.ReadableStream
-  ): Promise<number> => {
+  async function countCollectionChildren(
+    ref: CollectionReference
+  ): Promise<number> {
     let count = 0;
-    // eslint-disable-next-line @typescript-eslint/no-unused-vars
-    for await (const _ of stream) {
-      ++count;
+    const docs = await ref.listDocuments();
+    for (const doc of docs) {
+      count += (await countDocumentChildren(doc)) + 1;
     }
     return count;
-  };
+  }

-  // Query all descendants of collections.
-  let descendantsStream = await firestore._getAllDescendants(randomCol);
-  expect(await numStreamItems(descendantsStream)).to.equal(6);
-  descendantsStream = await firestore._getAllDescendants(
-    randomCol.doc('bob').collection('parentsCol')
-  );
-  expect(await numStreamItems(descendantsStream)).to.equal(4);
-  descendantsStream = await firestore._getAllDescendants(
-    randomCol.doc('bob').collection('parentsCol/daniel/childCol')
-  );
-  expect(await numStreamItems(descendantsStream)).to.equal(2);
+  beforeEach(async () => {
+    // ROOT-DB
+    // └── randomCol
+    //     ├── anna
+    //     └── bob
+    //         └── parentsCol
+    //             ├── charlie
+    //             └── daniel
+    //                 └── childCol
+    //                     ├── ernie
+    //                     └── francis
+    const batch = firestore.batch();
+    batch.set(randomCol.doc('anna'), {name: 'anna'});
+    batch.set(randomCol.doc('bob'), {name: 'bob'});
+    batch.set(randomCol.doc('bob/parentsCol/charlie'), {name: 'charlie'});
+    batch.set(randomCol.doc('bob/parentsCol/daniel'), {name: 'daniel'});
+    batch.set(randomCol.doc('bob/parentsCol/daniel/childCol/ernie'), {
+      name: 'ernie',
+    });
+    batch.set(randomCol.doc('bob/parentsCol/daniel/childCol/francis'), {
+      name: 'francis',
+    });
+    await batch.commit();
+  });

-  // Query all descendants of documents.
-  descendantsStream = await firestore._getAllDescendants(
-    randomCol.doc('bob')
-  );
-  expect(await numStreamItems(descendantsStream)).to.equal(4);
-  descendantsStream = await firestore._getAllDescendants(
-    randomCol.doc('bob/parentsCol/daniel')
-  );
-  expect(await numStreamItems(descendantsStream)).to.equal(2);
-  descendantsStream = await firestore._getAllDescendants(
-    randomCol.doc('anna')
-  );
-  expect(await numStreamItems(descendantsStream)).to.equal(0);
+  it('recursiveDelete on top-level collection', async () => {
+    await firestore.recursiveDelete(randomCol);
+    expect(await countCollectionChildren(randomCol)).to.equal(0);
+  });

+  it('recursiveDelete on nested collection', async () => {
+    const coll = randomCol.doc('bob').collection('parentsCol');
+    await firestore.recursiveDelete(coll);
+
+    expect(await countCollectionChildren(coll)).to.equal(0);
+    expect(await countCollectionChildren(randomCol)).to.equal(2);
+  });

+  it('recursiveDelete on nested document', async () => {
+    const doc = randomCol.doc('bob/parentsCol/daniel');
+    await firestore.recursiveDelete(doc);
+
+    const docSnap = await doc.get();
+    expect(docSnap.exists).to.be.false;
+    expect(await countDocumentChildren(randomCol.doc('bob'))).to.equal(1);
+    expect(await countCollectionChildren(randomCol)).to.equal(3);
+  });

+  it('recursiveDelete on leaf document', async () => {
+    const doc = randomCol.doc('bob/parentsCol/daniel/childCol/ernie');
+    await firestore.recursiveDelete(doc);
+
+    const docSnap = await doc.get();
+    expect(docSnap.exists).to.be.false;
+    expect(await countCollectionChildren(randomCol)).to.equal(5);
+  });

+  it('recursiveDelete with custom BulkWriter instance', async () => {
+    const bulkWriter = firestore.bulkWriter();
+    let callbackCount = 0;
+    bulkWriter.onWriteResult(() => {
+      callbackCount++;
+    });
+    await firestore.recursiveDelete(randomCol, bulkWriter);
+    expect(callbackCount).to.equal(6);
+  });

Contributor:
Can we also get a test that has "randomCollA/docA" and "randomCollB/docB" and only deletes randomCollA? There might be some nuances in how the parent collection is deleted (in other ports), and I want to make sure that all of our clients handle this case correctly.

Author:
Good point; done.
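
For reference, such a test could look roughly like this (a hypothetical sketch; the test actually added to the PR isn't shown in this diff):

```ts
it('recursiveDelete does not delete sibling collections', async () => {
  const batch = firestore.batch();
  batch.set(randomCol.doc('a/collA/docA'), {name: 'docA'});
  batch.set(randomCol.doc('a/collB/docB'), {name: 'docB'});
  await batch.commit();

  await firestore.recursiveDelete(randomCol.doc('a').collection('collA'));

  // Only collA is deleted; the sibling collB must be untouched.
  expect((await randomCol.doc('a/collA/docA').get()).exists).to.be.false;
  expect((await randomCol.doc('a/collB/docB').get()).exists).to.be.true;
});
```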

+});

it('can retry failed writes with a provided callback', async () => {