feat: add recursiveDelete() #1427

Merged: 11 commits, Feb 23, 2021
120 changes: 113 additions & 7 deletions dev/src/index.ts
@@ -16,14 +16,14 @@

import * as firestore from '@google-cloud/firestore';

import {CallOptions, grpc, RetryOptions} from 'google-gax';
import {CallOptions, GoogleError, grpc, RetryOptions, Status} from 'google-gax';
import {Duplex, PassThrough, Transform} from 'stream';

import {URL} from 'url';

import {google} from '../protos/firestore_v1_proto_api';
import {ExponentialBackoff, ExponentialBackoffSetting} from './backoff';
import {BulkWriter} from './bulk-writer';
import {BulkWriter, BulkWriterError} from './bulk-writer';
import {BundleBuilder} from './bundle';
import {fieldsFromJson, timestampFromJson} from './convert';
import {
@@ -76,6 +76,7 @@ const serviceConfig = interfaces['google.firestore.v1.Firestore'];

import api = google.firestore.v1;
import {CollectionGroup} from './collection-group';
import {DocumentData} from '@google-cloud/firestore';

export {
CollectionReference,
@@ -141,7 +142,7 @@ const CLOUD_RESOURCE_HEADER = 'google-cloud-resource-prefix';
/*!
* The maximum number of times to retry idempotent requests.
*/
const MAX_REQUEST_RETRIES = 5;
export const MAX_REQUEST_RETRIES = 5;

/*!
* The default number of idle GRPC channels to keep.
@@ -166,7 +167,7 @@ const MAX_CONCURRENT_REQUESTS_PER_CLIENT = 100;
*
* @private
*/
const REFERENCE_NAME_MIN_ID = '__id-9223372036854775808__';
export const REFERENCE_NAME_MIN_ID = '__id-9223372036854775808__';

/**
* Document data (e.g. for use with
@@ -399,6 +400,26 @@ export class Firestore implements firestore.Firestore {
*/
private registeredListenersCount = 0;

/**
* A lazy-loaded BulkWriter instance to be used with recursiveDelete() if no
* BulkWriter instance is provided.
*
* @private
*/
private _bulkWriter: BulkWriter | undefined;
Contributor:

Could we move this to BulkWriter (kind of like a singleton pattern)? This could then be a shared instance that could be used by other operations in the future as well.

I also wonder whether the default ramp up makes sense here. It will slow down the delete (especially when compared to the CLI) and it also doesn't quite fit the use case for 5/5/5. We do not have to ramp up to give the database a chance to shard - the data is already sharded, we just want to delete it.

Author:

> Could we move this to BulkWriter (kind of like a singleton pattern)? This could then be a shared instance that could be used by other operations in the future as well.

Do you mean that calling firestore.bulkWriter() would return the same instance each time, or having a Firestore-internal BulkWriter.bulkWriter() getter that lazy-loads?

> I also wonder whether the default ramp up makes sense here.

Yeah, that's something I was thinking about as well to implement as part of the RST_STREAM retries. Deletes aren't subject to 5/5/5 throttling, which creates two possibilities: 1) We can add an internal override on BulkWriter to turn off throttling when recursive delete is called. However, this would mean that writes being performed at the same time won't get throttled. 2) We can add logic for BulkWriter to automatically ignore throttling limits for deletes, but then we'd have to batch deletes separately from writes (or think of another solution that allows writes to still be throttled).

Contributor:

Firestore internal getter that lazy-loads. I want to keep it similar to what you have, but try to move these internals out of Firestore.

Author:

Added a static BulkWriter.bulkWriter() method.
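
For reference, a minimal sketch of the lazy-loaded shared writer discussed in this thread, written as a free function with hypothetical names; the merged BulkWriter.bulkWriter() static may be shaped differently:

import {BulkWriter, Firestore} from '@google-cloud/firestore';

// One shared BulkWriter per Firestore client, created lazily on first use.
const defaultWriters = new WeakMap<Firestore, BulkWriter>();

function defaultBulkWriter(firestore: Firestore): BulkWriter {
  let writer = defaultWriters.get(firestore);
  if (!writer) {
    writer = firestore.bulkWriter();
    defaultWriters.set(firestore, writer);
  }
  return writer;
}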


/**
* Lazy-loads this Firestore instance's default BulkWriter.
*
* @private
*/
private getBulkWriter(): BulkWriter {
if (!this._bulkWriter) {
this._bulkWriter = this.bulkWriter();
}
return this._bulkWriter;
}

/**
* Number of pending operations on the client.
*
@@ -1198,15 +1219,100 @@ export class Firestore implements firestore.Firestore {
this.bulkWritersCount -= 1;
}

/**
* Recursively deletes all documents and subcollections at and under the
* specified level.
*
* If any deletes fail, the promise is rejected with an error message
* containing the number of failed writes and the stack trace of the last
* failed write. The provided reference is deleted regardless of whether
* all deletes succeeded, except when Firestore fails to fetch the provided
* reference's descendants.
*
* Firestore uses a BulkWriter instance with default settings to perform the
* deletes. To customize throttling rates or add success/error callbacks,
* pass in a custom BulkWriter instance.
*
Contributor:

You should add a code snippet here that shows how to collect the references for the failed delete.

Author:

Done. I included the default error handler to make it easier to paste. Do you think including it would confuse developers?

Contributor:

Looks good.
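
For readers following this thread, a sketch of the kind of snippet discussed: recording which documents ultimately failed via BulkWriter.onWriteError before recursiveDelete() rejects. The collection path and retry limit are illustrative:

import {Firestore} from '@google-cloud/firestore';

async function deleteAllUsers(firestore: Firestore): Promise<void> {
  const bulkWriter = firestore.bulkWriter();
  const failedPaths: string[] = [];

  bulkWriter.onWriteError(error => {
    // Retry a few times before giving up; once a delete permanently
    // fails, remember its document path.
    if (error.failedAttempts < 5) {
      return true;
    }
    failedPaths.push(error.documentRef.path);
    return false;
  });

  try {
    await firestore.recursiveDelete(firestore.collection('users'), bulkWriter);
  } catch (err) {
    console.error(`${failedPaths.length} document(s) were not deleted:`, failedPaths);
  }
}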

* @param ref The reference of a document or collection to delete.
* @param bulkWriter Custom BulkWriter instance with which to perform the
* deletes.
* @return A promise that resolves when all deletes have been performed.
* The promise is rejected if any of the deletes fail.
*/
recursiveDelete<T = DocumentData>(
ref: CollectionReference<T> | DocumentReference<T>,
Contributor:

You don't need T here. It can be unknown.

Author:

I had to change the signature to firestore.DocumentRef and add some casting logic from the firestore typed classes to our internal classes. Can you double check to see if there's a cleaner way? Thanks!

bulkWriter?: BulkWriter
): Promise<void> {
const docStream = this.getAllDescendants(ref);
Contributor (@schmidt-sebastian, Feb 19, 2021):

Suggested change:
const docStream = this.getAllDescendants(ref);

I also wonder if we should move this somewhere else (not that there is a good place for this). The class is getting a bit big.

Author:

Hmm, I'm not really sure where to move it. Are you suggesting we encapsulate all the behavior in a separate class?

const writer = bulkWriter ?? this.getBulkWriter();
const deleteCompleted = new Deferred<void>();
let errorCount = 0;
let lastError: Error | undefined;

// Capture the error stack to preserve stack tracing across async calls.
const stack = Error().stack!;

docStream
.on('error', err => {
err.code = Status.FAILED_PRECONDITION;
Contributor:

Please take a look at https://developers.google.com/maps-booking/reference/grpc-api/status_codes

FAILED_PRECONDITION means that a simple retry won't work here. The developer has to change something about the state of the system before retrying the operation. This is likely not the case here. I would guess that we will have an error code here already when we hit this code path, but if we don't, I would suggest using UNAVAILABLE as the return code.

Author:

done.
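
Sketch of the resolved handler implied by this thread (an assumption, not the merged code: the stream surfaces a GoogleError, so an error code may already be attached):

err.code = err.code ?? Status.UNAVAILABLE;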

err.message =
'Failed to fetch children documents. ' +
'The provided reference was not deleted.';
deleteCompleted.reject(wrapError(err, stack));
Author:

Is not deleting the provided reference in the case of a failed StructuredQuery the ideal, expected behavior? I need a quick sanity check. Here are the options:

  1. Do not delete the provided reference if StructuredQuery stream ever errors.
  2. Do not delete provided reference iff StructuredQuery stream errors before any deletes are enqueued.
  3. Delete the reference regardless of what happens.

I think we originally agreed on Option 2. It makes logical sense -- if Firestore can't fetch any of the descendants, that's a failed precondition, so Firestore shouldn't delete the provided reference. However, what's tripping me up is if the StructuredQuery stream errors out midway through the deletion process. In this case, an argument can be made both ways -- Firestore shouldn't delete the provided reference because it couldn't fetch the descendants (failed precondition argument) OR Firestore should delete the provided reference to be consistent in behavior (the provided reference is deleted even if BulkWriter fails to delete some references).

Trying to document all these edge cases led me to consider Option 3 -- if I'm a developer who simply wants the collection or document deleted, I don't care about the provided reference, and Firestore should just nuke whatever it can get a hold of.

Contributor:

The query can error at any point in time. I would suggest we make no guarantees whatsoever, which makes the overall behavior of the API much more predictable. If any operation fails, the database tree is in an undefined state, and the user needs to try again.

Author:

Done. Changed it to always delete the provided reference when possible, but kept the error message in for query stream failures. Not sure if we want the FAILED_PRECONDITION status code though.

})
.on('data', (snap: QueryDocumentSnapshot) => {
const docRef = new DocumentReference(this, snap.ref._path);
writer.delete(docRef).catch(err => {
errorCount++;
lastError = err;
});
})
.on('end', () => {
writer.flush().then(async () => {
if (ref instanceof DocumentReference) {
try {
await ref.delete();
Author:

I can't think of a more elegant way to differentiate between the final document reference delete failing vs. the BulkWriter deletes. Any suggestions?

Contributor:

I haven't looked at the rest of the PR, but if you want to work on it before I do - couldn't you just let the BulkWriter handle this delete as well?

Author:

I still have a few more changes/cleanup I need to push upstream, so reviewing now might result in some wasted work.

Thanks for the suggestion! I originally figured a normal delete is less verbose than enqueuing another operation to BulkWriter and calling flush(), but using BulkWriter allows for any user callbacks to also run.
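
A sketch of that alternative, reusing the patterns already in this diff (enqueue the final delete on the same writer so user callbacks fire, then flush):

writer.delete(ref).catch(err => {
  errorCount++;
  lastError = err;
});
await writer.flush();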

} catch (err) {
logger(
'Firestore.recursiveDelete',
null,
'failed to delete the provided reference',
ref.path
);
lastError = err;
}
}

if (lastError === undefined) {
deleteCompleted.resolve();
} else {
let error = new GoogleError(
`${errorCount} ` +
`${errorCount > 1 ? 'deletes' : 'delete'} ` +
'failed. The last delete failed with: '
);
if (lastError instanceof BulkWriterError) {
error.code = (lastError.code as number) as Status;
}
error = wrapError(error, stack);

// Wrap the BulkWriter error last to provide the full stack trace.
deleteCompleted.reject(wrapError(error, lastError.stack ?? ''));
}
});
});

return deleteCompleted.promise;
}

/**
* Retrieves all descendant documents nested under the provided reference.
*
* @private
* @return {Stream<QueryDocumentSnapshot>} Stream of descendant documents.
*/
// TODO(chenbrian): Make this a private method after adding recursive delete.
_getAllDescendants(
ref: CollectionReference | DocumentReference
private getAllDescendants<T = DocumentData>(
Contributor:

This should still be underscore prefixed to prevent "usage by code completion".

Author:

Renamed. Aren't private methods hidden from code completion? How do they show up? Also, what's the difference between the _method() and method_() patterns in this file?

Contributor:

"private" is only hidden from TS developers. JS developers will still see it in code completion.

Underscore suffix is the same as underscore prefix, but from about 10 years ago. I am surprised this client uses underscore suffix.

ref: CollectionReference<T> | DocumentReference<T>
): NodeJS.ReadableStream {
// The parent is the closest ancestor document to the location we're
// deleting. If we are deleting a document, the parent is the path of that
140 changes: 83 additions & 57 deletions dev/system-test/firestore.ts
@@ -2629,69 +2629,95 @@ describe('BulkWriter class', () => {
return firestore.terminate();
});

// TODO(chenbrian): This is a temporary test used to validate that the
// StructuredQuery calls work properly. Remove these tests after adding
// recursive delete tests.
it('finds nested documents and collection', async () => {
// ROOT-DB
// └── randomCol
// ├── anna
// └── bob
// └── parentsCol
// ├── charlie
// └── daniel
// └── childCol
// ├── ernie
// └── francis
const batch = firestore.batch();
batch.set(randomCol.doc('anna'), {name: 'anna'});
batch.set(randomCol.doc('bob'), {name: 'bob'});
batch.set(randomCol.doc('bob/parentsCol/charlie'), {name: 'charlie'});
batch.set(randomCol.doc('bob/parentsCol/daniel'), {name: 'daniel'});
batch.set(randomCol.doc('bob/parentsCol/daniel/childCol/ernie'), {
name: 'ernie',
});
batch.set(randomCol.doc('bob/parentsCol/daniel/childCol/francis'), {
name: 'francis',
});
await batch.commit();
describe('recursiveDelete()', () => {
async function countDocumentChildren(
ref: DocumentReference
): Promise<number> {
let count = 0;
const collections = await ref.listCollections();
for (const collection of collections) {
count += await countCollectionChildren(collection);
}
return count;
}

const numStreamItems = async (
stream: NodeJS.ReadableStream
): Promise<number> => {
async function countCollectionChildren(
ref: CollectionReference
): Promise<number> {
let count = 0;
// eslint-disable-next-line @typescript-eslint/no-unused-vars
for await (const _ of stream) {
++count;
const docs = await ref.listDocuments();
for (const doc of docs) {
count += (await countDocumentChildren(doc)) + 1;
}
return count;
};
}

// Query all descendants of collections.
let descendantsStream = await firestore._getAllDescendants(randomCol);
expect(await numStreamItems(descendantsStream)).to.equal(6);
descendantsStream = await firestore._getAllDescendants(
randomCol.doc('bob').collection('parentsCol')
);
expect(await numStreamItems(descendantsStream)).to.equal(4);
descendantsStream = await firestore._getAllDescendants(
randomCol.doc('bob').collection('parentsCol/daniel/childCol')
);
expect(await numStreamItems(descendantsStream)).to.equal(2);
beforeEach(async () => {
// ROOT-DB
// └── randomCol
// ├── anna
// └── bob
// └── parentsCol
// ├── charlie
// └── daniel
// └── childCol
// ├── ernie
// └── francis
const batch = firestore.batch();
batch.set(randomCol.doc('anna'), {name: 'anna'});
batch.set(randomCol.doc('bob'), {name: 'bob'});
batch.set(randomCol.doc('bob/parentsCol/charlie'), {name: 'charlie'});
batch.set(randomCol.doc('bob/parentsCol/daniel'), {name: 'daniel'});
batch.set(randomCol.doc('bob/parentsCol/daniel/childCol/ernie'), {
name: 'ernie',
});
batch.set(randomCol.doc('bob/parentsCol/daniel/childCol/francis'), {
name: 'francis',
});
await batch.commit();
});

// Query all descendants of documents.
descendantsStream = await firestore._getAllDescendants(
randomCol.doc('bob')
);
expect(await numStreamItems(descendantsStream)).to.equal(4);
descendantsStream = await firestore._getAllDescendants(
randomCol.doc('bob/parentsCol/daniel')
);
expect(await numStreamItems(descendantsStream)).to.equal(2);
descendantsStream = await firestore._getAllDescendants(
randomCol.doc('anna')
);
expect(await numStreamItems(descendantsStream)).to.equal(0);
it('recursiveDelete on top-level collection', async () => {
await firestore.recursiveDelete(randomCol);
expect(await countCollectionChildren(randomCol)).to.equal(0);
});

it('recursiveDelete on nested collection', async () => {
const coll = randomCol.doc('bob').collection('parentsCol');
await firestore.recursiveDelete(coll);

expect(await countCollectionChildren(coll)).to.equal(0);
expect(await countCollectionChildren(randomCol)).to.equal(2);
});

it('recursiveDelete on nested document', async () => {
const doc = randomCol.doc('bob/parentsCol/daniel');
await firestore.recursiveDelete(doc);

const docSnap = await doc.get();
expect(docSnap.exists).to.be.false;
expect(await countDocumentChildren(randomCol.doc('bob'))).to.equal(1);
expect(await countCollectionChildren(randomCol)).to.equal(3);
});

it('recursiveDelete on leaf document', async () => {
const doc = randomCol.doc('bob/parentsCol/daniel/childCol/ernie');
await firestore.recursiveDelete(doc);

const docSnap = await doc.get();
expect(docSnap.exists).to.be.false;
expect(await countCollectionChildren(randomCol)).to.equal(5);
});

it('recursiveDelete with custom BulkWriter instance', async () => {
const bulkWriter = firestore.bulkWriter();
let callbackCount = 0;
bulkWriter.onWriteResult(() => {
callbackCount++;
});
await firestore.recursiveDelete(randomCol, bulkWriter);
expect(callbackCount).to.equal(6);
});
Contributor:

Can we also get a test that has "randomCollA/docA" and "randomCollB/docB" and only deletes randomCollA? There might be some nuances in how parent collections are deleted (in other ports) and I want to make sure that all of our clients handle this case correctly.

Author:

good point, done.
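
The added test is not visible in this excerpt; a sketch of what it might look like, following the style of the tests above (collection and document names are illustrative):

it('recursiveDelete on a collection leaves sibling collections intact', async () => {
  const batch = firestore.batch();
  batch.set(firestore.doc('randomCollA/docA'), {name: 'a'});
  batch.set(firestore.doc('randomCollB/docB'), {name: 'b'});
  await batch.commit();

  await firestore.recursiveDelete(firestore.collection('randomCollA'));

  expect((await firestore.doc('randomCollA/docA').get()).exists).to.be.false;
  expect((await firestore.doc('randomCollB/docB').get()).exists).to.be.true;
});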

});

it('can retry failed writes with a provided callback', async () => {