
feat: add recursiveDelete() #1427

Merged
merged 11 commits on Feb 23, 2021
18 changes: 17 additions & 1 deletion dev/src/bulk-writer.ts
@@ -296,6 +296,8 @@ export class BulkWriter {
*/
readonly _rateLimiter: RateLimiter;

private static instance: BulkWriter;

/**
* The user-provided callback to be run every time a BulkWriter operation
* successfully completes.
@@ -372,6 +374,20 @@ export class BulkWriter {
}
}

/**
* Returns a lazy-loaded BulkWriter instance.
*
* @param firestore Used to instantiate the BulkWriter instance the first
* time.
* @private
*/
static _getInstance(firestore: Firestore): BulkWriter {
if (!this.instance || this.instance.firestore !== firestore) {
Contributor: I didn't think about the fact that this would break if multiple Firestore instances are used together. I am slightly inclined to ask you to undo this change, since it would simplify the instance creation and its lifetime management (at the cost of further increasing the size of Firestore).

Author: I can see arguments either way, but at this point I'm more in favor of leaving it on the Firestore instance, since trying to track the different Firestore instances to pass into BulkWriter is more confusing than just having the BulkWriter instance available on the Firestore instance.

Reverted back to before.

this.instance = new BulkWriter(firestore);
}
return this.instance;
}
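
A minimal sketch (not part of the diff) of how the guard above behaves when two Firestore clients are in play; _getInstance is library-internal and the project IDs are hypothetical, so this is for illustration only:

import {Firestore} from '@google-cloud/firestore';
import {BulkWriter} from './bulk-writer';

const firestoreA = new Firestore({projectId: 'project-a'});
const firestoreB = new Firestore({projectId: 'project-b'});

// First call caches a BulkWriter bound to firestoreA.
const writerA = BulkWriter._getInstance(firestoreA);

// The cached instance belongs to a different client, so the
// `instance.firestore !== firestore` guard replaces it with a fresh
// BulkWriter. The dropped singleton is not flushed first, which is the
// lifetime-management concern raised in the thread above.
const writerB = BulkWriter._getInstance(firestoreB);

console.log(writerA === writerB); // false: each client got its own writer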

/**
* Create a document with the provided data. This single operation will fail
* if a document exists at its location.
@@ -688,7 +704,7 @@ export class BulkWriter {
* Throws an error if the BulkWriter instance has been closed.
* @private
*/
private _verifyNotClosed(): void {
_verifyNotClosed(): void {
if (this._closing) {
throw new Error('BulkWriter has already been closed.');
}
…

116 changes: 108 additions & 8 deletions dev/src/index.ts
@@ -16,14 +16,14 @@

import * as firestore from '@google-cloud/firestore';

import {CallOptions, grpc, RetryOptions} from 'google-gax';
import {CallOptions, GoogleError, grpc, RetryOptions, Status} from 'google-gax';
import {Duplex, PassThrough, Transform} from 'stream';

import {URL} from 'url';

import {google} from '../protos/firestore_v1_proto_api';
import {ExponentialBackoff, ExponentialBackoffSetting} from './backoff';
import {BulkWriter} from './bulk-writer';
import {BulkWriter, BulkWriterError} from './bulk-writer';
import {BundleBuilder} from './bundle';
import {fieldsFromJson, timestampFromJson} from './convert';
import {
@@ -141,7 +141,7 @@ const CLOUD_RESOURCE_HEADER = 'google-cloud-resource-prefix';
/*!
* The maximum number of times to retry idempotent requests.
*/
const MAX_REQUEST_RETRIES = 5;
export const MAX_REQUEST_RETRIES = 5;

/*!
* The default number of idle GRPC channel to keep.
@@ -166,7 +166,7 @@ const MAX_CONCURRENT_REQUESTS_PER_CLIENT = 100;
*
* @private
*/
const REFERENCE_NAME_MIN_ID = '__id-9223372036854775808__';
export const REFERENCE_NAME_MIN_ID = '__id-9223372036854775808__';

/**
* Document data (e.g. for use with
@@ -1198,27 +1198,127 @@ export class Firestore implements firestore.Firestore {
this.bulkWritersCount -= 1;
}

/**
* Recursively deletes all documents and subcollections at and under the
* specified level.
*
* If any delete fails, the promise is rejected with an error message
* containing the number of failed deletes and the stack trace of the last
* failed delete. The provided reference is deleted regardless of whether
* all deletes succeeded.
*
* `recursiveDelete()` uses a BulkWriter instance with default settings to
* perform the deletes. To customize throttling rates or add success/error
* callbacks, pass in a custom BulkWriter instance.
*
Contributor: You should add a code snippet here that shows how to collect the references for the failed delete.

Author: Done. I included the default error handler to make it easier to paste. Do you think including it would confuse developers?

Contributor: Looks good.
* @param ref The reference of a document or collection to delete.
* @param bulkWriter Custom BulkWriter instance used to perform the deletes.
* @return A promise that resolves when all deletes have been performed.
* The promise is rejected if any of the deletes fail.
*
* @example
* // Recursively delete a reference and log the references of failures.
* const bulkWriter = firestore.bulkWriter();
* bulkWriter.onWriteError((error) => {
*   if (error.failedAttempts < MAX_RETRY_ATTEMPTS) {
*     return true;
*   } else {
*     console.log('Failed write at document: ', error.documentRef.path);
*     return false;
*   }
* });
* await firestore.recursiveDelete(docRef, bulkWriter);
*/
recursiveDelete(
ref:
| firestore.CollectionReference<unknown>
| firestore.DocumentReference<unknown>,
bulkWriter?: BulkWriter
): Promise<void> {
// Capture the error stack to preserve stack tracing across async calls.
const stack = Error().stack!;

const writer = bulkWriter ?? BulkWriter._getInstance(this);
writer._verifyNotClosed();
const docStream = this._getAllDescendants(
ref instanceof CollectionReference
? (ref as CollectionReference<unknown>)
: (ref as DocumentReference<unknown>)
);
const resultDeferred = new Deferred<void>();
let errorCount = 0;
let lastError: Error | undefined;
const incrementErrorCount = (err: Error): void => {
errorCount++;
lastError = err;
};
const onStreamEnd = (): void => {
if (ref instanceof DocumentReference) {
writer.delete(ref).catch(err => incrementErrorCount(err));
}
writer.flush().then(async () => {
if (lastError === undefined) {
resultDeferred.resolve();
} else {
let error = new GoogleError(
`${errorCount} ` +
`${errorCount !== 1 ? 'deletes' : 'delete'} ` +
'failed. The last delete failed with: '
);
if (lastError instanceof BulkWriterError) {
error.code = (lastError.code as number) as Status;
}
Author: Not sure why these lines aren't covered by Codecov, but I'm pretty sure they're being called. Maybe it's because I put it in a lambda function to access lastError and errorCount. Is this an anti-pattern?

error = wrapError(error, stack);

// Wrap the BulkWriter error last to provide the full stack trace.
resultDeferred.reject(
lastError.stack ? wrapError(error, lastError.stack ?? '') : error
);
}
});
};

docStream
.on('error', err => {
err.code = Status.UNAVAILABLE;
err.message = 'Failed to fetch children documents.';
lastError = err;
onStreamEnd();
})
.on('data', (snap: QueryDocumentSnapshot) => {
writer.delete(snap.ref).catch(err => incrementErrorCount(err));
})
.on('end', () => onStreamEnd());

return resultDeferred.promise;
}
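
Following the snippet request in the thread above, a sketch (not part of the diff) of how a caller could collect the references of failed deletes; it assumes a Firestore client `firestore` and a `docRef` in scope, and MAX_RETRY_ATTEMPTS is a caller-defined constant rather than a library export:

import {DocumentReference} from '@google-cloud/firestore';

const MAX_RETRY_ATTEMPTS = 3; // caller-defined retry budget
const failedRefs: DocumentReference[] = [];

const bulkWriter = firestore.bulkWriter();
bulkWriter.onWriteError(error => {
  if (error.failedAttempts < MAX_RETRY_ATTEMPTS) {
    return true; // ask BulkWriter to retry this delete
  }
  failedRefs.push(error.documentRef); // give up and record the reference
  return false;
});

try {
  await firestore.recursiveDelete(docRef, bulkWriter);
} catch (err) {
  console.log(
    `${failedRefs.length} deletes failed:`,
    failedRefs.map(ref => ref.path)
  );
}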

/**
* Retrieves all descendant documents nested under the provided reference.
*
* @private
* @return {Stream<QueryDocumentSnapshot>} Stream of descendant documents.
*/
// TODO(chenbrian): Make this a private method after adding recursive delete.
_getAllDescendants(
ref: CollectionReference | DocumentReference
private _getAllDescendants(
ref: CollectionReference<unknown> | DocumentReference<unknown>
): NodeJS.ReadableStream {
// The parent is the closest ancestor document to the location we're
// deleting. If we are deleting a document, the parent is the path of that
// document. If we are deleting a collection, the parent is the path of the
// document containing that collection (or the database root, if it is a
// root collection).

let parentPath = ref._resourcePath;
if (ref instanceof CollectionReference) {
parentPath = parentPath.popLast();
}
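// Worked example (hypothetical paths): deleting the collection
// 'users/bob/pets' yields parentPath 'users/bob' and collectionId
// 'pets'; deleting the document 'users/bob/pets/doggo' keeps
// parentPath 'users/bob/pets/doggo' and also yields collectionId
// 'pets'.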
const collectionId =
ref instanceof CollectionReference ? ref.id : ref.parent.id;
ref instanceof CollectionReference
? ref.id
: (ref as DocumentReference).parent.id;

Contributor: CollectionReference means CollectionReference<DocumentData>. If you change this to ref instanceof CollectionReference<unknown> you should be able to remove the as DocumentReference cast (which should also be as DocumentReference<unknown>).

Author: The TS compiler was not happy with that suggestion:
TS2348: Value of type 'typeof CollectionReference' is not callable. Did you mean to include 'new'?

Contributor: I think the TS compiler is right, since the generics support of TypeScript is pretty much like Java and "erased at compile time". I was able to get the code to transpile even without the ref as DocumentReference cast, though.

FWIW, an as DocumentReference<unknown> would still be the correct way to type this if we were to keep the cast. Right now, you are dropping the converter in the cast, when really we don't know anything about it.

Author: Added the <unknown> type to the DocumentReference cast.

let query: Query = new Query(
this,
…
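
Summarizing the generics thread above: TypeScript erases type arguments at runtime, so `instanceof` can only test the bare class, and the converter type has to be restored via a cast. A sketch (not part of the diff), with `ref` declared only for illustration:

import {
  CollectionReference,
  DocumentReference,
} from '@google-cloud/firestore';

declare const ref:
  | CollectionReference<unknown>
  | DocumentReference<unknown>;

// Invalid: a type argument is not a runtime value, so this fails with
// "TS2348: Value of type 'typeof CollectionReference' is not callable."
// const bad = ref instanceof CollectionReference<unknown>;

// Valid: test against the runtime class, then cast. Using
// DocumentReference<unknown> keeps the converter unknown instead of
// silently assuming DocumentData.
const collectionId =
  ref instanceof CollectionReference
    ? ref.id
    : (ref as DocumentReference<unknown>).parent.id;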
154 changes: 97 additions & 57 deletions dev/system-test/firestore.ts
@@ -2629,69 +2629,109 @@ describe('BulkWriter class', () => {
return firestore.terminate();
});

// TODO(chenbrian): This is a temporary test used to validate that the
// StructuredQuery calls work properly. Remove these tests after adding
// recursive delete tests.
it('finds nested documents and collection', async () => {
  // ROOT-DB
  // └── randomCol
  //     ├── anna
  //     └── bob
  //         └── parentsCol
  //             ├── charlie
  //             └── daniel
  //                 └── childCol
  //                     ├── ernie
  //                     └── francis
  const batch = firestore.batch();
  batch.set(randomCol.doc('anna'), {name: 'anna'});
  batch.set(randomCol.doc('bob'), {name: 'bob'});
  batch.set(randomCol.doc('bob/parentsCol/charlie'), {name: 'charlie'});
  batch.set(randomCol.doc('bob/parentsCol/daniel'), {name: 'daniel'});
  batch.set(randomCol.doc('bob/parentsCol/daniel/childCol/ernie'), {
    name: 'ernie',
  });
  batch.set(randomCol.doc('bob/parentsCol/daniel/childCol/francis'), {
    name: 'francis',
  });
  await batch.commit();

  const numStreamItems = async (
    stream: NodeJS.ReadableStream
  ): Promise<number> => {
    let count = 0;
    // eslint-disable-next-line @typescript-eslint/no-unused-vars
    for await (const _ of stream) {
      ++count;
    }
    return count;
  };

  // Query all descendants of collections.
  let descendantsStream = await firestore._getAllDescendants(randomCol);
  expect(await numStreamItems(descendantsStream)).to.equal(6);
  descendantsStream = await firestore._getAllDescendants(
    randomCol.doc('bob').collection('parentsCol')
  );
  expect(await numStreamItems(descendantsStream)).to.equal(4);
  descendantsStream = await firestore._getAllDescendants(
    randomCol.doc('bob').collection('parentsCol/daniel/childCol')
  );
  expect(await numStreamItems(descendantsStream)).to.equal(2);

  // Query all descendants of documents.
  descendantsStream = await firestore._getAllDescendants(
    randomCol.doc('bob')
  );
  expect(await numStreamItems(descendantsStream)).to.equal(4);
  descendantsStream = await firestore._getAllDescendants(
    randomCol.doc('bob/parentsCol/daniel')
  );
  expect(await numStreamItems(descendantsStream)).to.equal(2);
  descendantsStream = await firestore._getAllDescendants(
    randomCol.doc('anna')
  );
  expect(await numStreamItems(descendantsStream)).to.equal(0);
});

describe('recursiveDelete()', () => {
  async function countDocumentChildren(
    ref: DocumentReference
  ): Promise<number> {
    let count = 0;
    const collections = await ref.listCollections();
    for (const collection of collections) {
      count += await countCollectionChildren(collection);
    }
    return count;
  }

  async function countCollectionChildren(
    ref: CollectionReference
  ): Promise<number> {
    let count = 0;
    const docs = await ref.listDocuments();
    for (const doc of docs) {
      count += (await countDocumentChildren(doc)) + 1;
    }
    return count;
  }

  beforeEach(async () => {
    // ROOT-DB
    // └── randomCol
    //     ├── anna
    //     └── bob
    //         └── parentsCol
    //             ├── charlie
    //             └── daniel
    //                 └── childCol
    //                     ├── ernie
    //                     └── francis
    const batch = firestore.batch();
    batch.set(randomCol.doc('anna'), {name: 'anna'});
    batch.set(randomCol.doc('bob'), {name: 'bob'});
    batch.set(randomCol.doc('bob/parentsCol/charlie'), {name: 'charlie'});
    batch.set(randomCol.doc('bob/parentsCol/daniel'), {name: 'daniel'});
    batch.set(randomCol.doc('bob/parentsCol/daniel/childCol/ernie'), {
      name: 'ernie',
    });
    batch.set(randomCol.doc('bob/parentsCol/daniel/childCol/francis'), {
      name: 'francis',
    });
    await batch.commit();
  });
it('on top-level collection', async () => {
await firestore.recursiveDelete(randomCol);
expect(await countCollectionChildren(randomCol)).to.equal(0);
});

it('on nested collection', async () => {
const coll = randomCol.doc('bob').collection('parentsCol');
await firestore.recursiveDelete(coll);

expect(await countCollectionChildren(coll)).to.equal(0);
expect(await countCollectionChildren(randomCol)).to.equal(2);
});

it('on nested document', async () => {
const doc = randomCol.doc('bob/parentsCol/daniel');
await firestore.recursiveDelete(doc);

const docSnap = await doc.get();
expect(docSnap.exists).to.be.false;
expect(await countDocumentChildren(randomCol.doc('bob'))).to.equal(1);
expect(await countCollectionChildren(randomCol)).to.equal(3);
});

it('on leaf document', async () => {
const doc = randomCol.doc('bob/parentsCol/daniel/childCol/ernie');
await firestore.recursiveDelete(doc);

const docSnap = await doc.get();
expect(docSnap.exists).to.be.false;
expect(await countCollectionChildren(randomCol)).to.equal(5);
});

it('does not affect other collections', async () => {
const collA = randomCol.doc('bob').collection('parentsCol');

// Add other nested collection under 'bob' that shouldn't be deleted.
const collB = randomCol.doc('bob').collection('pets');
await collB.doc('doggo').set({name: 'goodboi'});

await firestore.recursiveDelete(collA);

expect(await countCollectionChildren(collA)).to.equal(0);
expect(await countCollectionChildren(randomCol)).to.equal(3);
expect(await countCollectionChildren(collB)).to.equal(1);
Contributor: I think this is now a similar test to what you have above (something akin to on nested collection). I think the edge case is for root collections, so something like randomCol.doc('bob') and a recursive delete on "otherCollection" (which could really be any name). We can then ensure that randomCol still has one document.

Author: Thanks for clarifying what you meant. Added a test for that.

});

it('with custom BulkWriter instance', async () => {
const bulkWriter = firestore.bulkWriter();
let callbackCount = 0;
bulkWriter.onWriteResult(() => {
callbackCount++;
});
await firestore.recursiveDelete(randomCol, bulkWriter);
expect(callbackCount).to.equal(6);
});
Contributor: Can we also get a test that has "randomCollA/docA" and "randomCollB/docB" and only deletes randomCollA? There might be some nuances in how to delete parent collections (in other ports) and I want to make sure that all of our clients handle this case correctly.

Author: Good point, done.
});

it('can retry failed writes with a provided callback', async () => {
…