ISnapshotTree.blobs has unwanted usage - used to store blobID mapping, as well as blob content #4746

vladsud · 2021-01-06T17:27:13Z

There are multiple (but not that many) places in our repo where we rely on the fact that ISnapshotTree.blobs contains two distinct, non-overlapping sets of data:

Mapping of path (like "header" or ".attributes") to storage blobId.
Mapping of blobId to actual content of the blob.

This looks totally wrong for two reasons:

There is no guarantee that these two sets (path names and blobIds) do not overlap. Nothing prevents storage from generating blobId = "header". Both kinds of entries share one key space, so such a collision overwrites an entry and we would no longer be able to open the file, which means user data loss.
As we change how data flows through storage/driver/runtime, blobs will become binary, i.e. ArrayBufferLike rather than strings. We will either have to split these two sets, or live with constant casts, suppressed compiler warnings and a lot of unneeded ifs.
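The collision in the first point can be reproduced with a plain object standing in for `ISnapshotTree.blobs`. This is a minimal sketch: `storeBlob`, the blobIds and the contents are made up for illustration; only the pattern of writing path -> blobId and blobId -> content into one record comes from the issue.

```typescript
// Stand-in for ISnapshotTree.blobs when it is used for both purposes.
type OverloadedBlobs = { [key: string]: string };

// Hypothetical helper mirroring the overloaded pattern:
// path -> blobId and blobId -> content share a single key space.
function storeBlob(blobs: OverloadedBlobs, path: string, blobId: string, content: string): void {
    blobs[path] = blobId;
    blobs[blobId] = content;
}

const blobs: OverloadedBlobs = {};
storeBlob(blobs, "header", "abc", "header-content");
// Storage is free to choose any id; here it happens to choose "header" for another blob.
storeBlob(blobs, ".attributes", "header", "attributes-content");

// The "header" path no longer resolves to its blobId "abc"; that mapping was overwritten.
console.log(blobs["header"]); // prints attributes-content
```

The content stored under "abc" is still in the record, but no path points to it any more.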

I believe we are in this situation only because unsaved changes (e.g. features like Draft mode, and container runtime reload due to a code proposal) store data right there in the blobs property instead of using a separate collection.
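A separate collection could look like the sketch below: the tree's `blobs` only ever holds path -> blobId, and unsaved contents live in their own map keyed by blobId. `SnapshotTreeLike`, `SnapshotWithContents` and `getBlobContent` are hypothetical names, and `SnapshotTreeLike` keeps only the two `ISnapshotTree` fields used here.

```typescript
// Reduced stand-in for ISnapshotTree.
interface SnapshotTreeLike {
    blobs: { [path: string]: string };           // path -> blobId only
    trees: { [path: string]: SnapshotTreeLike };
}

// Hypothetical pairing of a tree with its unsaved blob contents (blobId -> bytes).
interface SnapshotWithContents {
    tree: SnapshotTreeLike;
    contents: Map<string, ArrayBufferLike>;
}

// Resolves a path to its content through the two separate key spaces.
function getBlobContent(snapshot: SnapshotWithContents, path: string): ArrayBufferLike | undefined {
    const blobId = snapshot.tree.blobs[path];
    return blobId === undefined ? undefined : snapshot.contents.get(blobId);
}

// A blobId equal to a path name is harmless now.
const snapshot: SnapshotWithContents = {
    tree: { blobs: { header: "header" }, trees: {} },
    contents: new Map<string, ArrayBufferLike>([["header", new Uint8Array([1, 2, 3]).buffer]]),
};
console.log(getBlobContent(snapshot, "header")?.byteLength); // prints 3
```

Because contents are typed as `ArrayBufferLike` in their own map, moving to binary blobs does not force casts on the path -> blobId side.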

loadAndInitializeProtocolState() is a good example here. Instead of using the same path as on "normal" load (i.e. always having a storage interface and reading from it), we added another branch that assumes blob contents can be fetched from the same ISnapshotTree. We got here because convertProtocolAndAppSummaryToSnapshotTreeCore() made the design choice of stuffing everything into the same ISnapshotTree; I believe it should not do that. I believe this is the only place where we do it (based on scanning who is using fromUtf8ToBase64).
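Keeping one load path could mean always handing the loader a storage object, even when the only blobs are in memory. A minimal sketch, assuming a hypothetical `BlobStorage` interface with a single `readBlob` method (the real `IDocumentStorageService` is larger) and a hypothetical `InMemoryBlobStorage` that would be populated from unsaved content instead of from `ISnapshotTree.blobs`:

```typescript
// Hypothetical minimal read interface standing in for IDocumentStorageService.
interface BlobStorage {
    readBlob(blobId: string): Promise<ArrayBufferLike>;
}

// Serves blobs that exist only in memory, e.g. unsaved content of a detached container.
class InMemoryBlobStorage implements BlobStorage {
    private readonly contents: Map<string, ArrayBufferLike>;

    constructor(contents: Map<string, ArrayBufferLike>) {
        this.contents = contents;
    }

    public async readBlob(blobId: string): Promise<ArrayBufferLike> {
        const content = this.contents.get(blobId);
        if (content === undefined) {
            throw new Error(`Blob not found: ${blobId}`);
        }
        return content;
    }
}

// Load code depends only on BlobStorage, so attached and detached loads share one path.
async function readUtf8(storage: BlobStorage, blobId: string): Promise<string> {
    return new TextDecoder().decode(new Uint8Array(await storage.readBlob(blobId)));
}

// Copies the encoded bytes into a fresh, exactly sized ArrayBuffer.
function utf8(text: string): ArrayBufferLike {
    return new TextEncoder().encode(text).slice().buffer;
}

const storage = new InMemoryBlobStorage(new Map([["id1", utf8("hello")]]));
readUtf8(storage, "id1").then((text) => console.log(text)); // prints hello
```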

Places where we rely on that behavior (reading) are easy to find once https://github.com/microsoft/FluidFramework/pull/4530/files makes it into main: search for the few remaining usages of fromBase64ToUtf8().

Examples:

readAndParseFromBlobs():

export function readAndParseFromBlobs<T>(blobs: {[index: string]: string}, id: string): T {
    const encoded = blobs[id];
    const decoded = fromBase64ToUtf8(encoded);
    return JSON.parse(decoded) as T;
}
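For comparison, a storage-based counterpart only needs a blobId and something that can read it, so it keeps working when blobs become binary. The function below is named after the existing `readAndParse` helper used in getDocumentAttributes(), but it is a standalone sketch; `BlobReader` is a hypothetical minimal interface, and the decoding assumes UTF-8 JSON content.

```typescript
// Hypothetical minimal interface: anything that can fetch a blob's bytes by id.
interface BlobReader {
    readBlob(blobId: string): Promise<ArrayBufferLike>;
}

// Reads a blob through storage, decodes it as UTF-8 and parses it as JSON.
// The content never passes through ISnapshotTree.blobs, and there is no base64 step.
async function readAndParse<T>(storage: BlobReader, blobId: string): Promise<T> {
    const bytes = new Uint8Array(await storage.readBlob(blobId));
    return JSON.parse(new TextDecoder().decode(bytes)) as T;
}

// Example: a stub reader backed by a Map.
const stored = new Map<string, ArrayBufferLike>([
    ["attrs", new TextEncoder().encode(JSON.stringify({ sequenceNumber: 5 })).slice().buffer],
]);
const reader: BlobReader = {
    readBlob: async (blobId) => {
        const value = stored.get(blobId);
        if (value === undefined) {
            throw new Error(`Missing blob ${blobId}`);
        }
        return value;
    },
};
readAndParse<{ sequenceNumber: number }>(reader, "attrs")
    .then((attrs) => console.log(attrs.sequenceNumber)); // prints 5
```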

convertSnapshotTreeToSummaryTree():

    // The entries in blobs are supposed to be blobPath -> blobId and blobId -> blobValue
    // and we want to push blobPath to blobValue in tree entries.
    if (snapshot.blobs[value] !== undefined) {
        const decoded = fromBase64ToUtf8(snapshot.blobs[value]);
        builder.addBlob(key, decoded);
    }
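With contents kept in a separate map keyed by blobId, the conversion no longer has to test whether a value in `blobs` also happens to be a key. A minimal sketch with hypothetical simplified shapes (`SnapshotTreeLike`, `SummaryTreeLike`); unlike the excerpt, which silently skips entries without content, this version throws, and real code might instead fall back to reading from storage.

```typescript
// Reduced stand-in for ISnapshotTree.
interface SnapshotTreeLike {
    blobs: { [path: string]: string };           // path -> blobId only
    trees: { [path: string]: SnapshotTreeLike };
}

// Hypothetical, simplified summary: each path maps to blob content or a subtree.
interface SummaryTreeLike {
    [path: string]: string | SummaryTreeLike;
}

// Builds path -> content entries by resolving every blobId in a separate contents map.
function toSummaryTree(tree: SnapshotTreeLike, contents: Map<string, string>): SummaryTreeLike {
    const result: SummaryTreeLike = {};
    for (const [path, blobId] of Object.entries(tree.blobs)) {
        const content = contents.get(blobId);
        if (content === undefined) {
            throw new Error(`No content for blob ${blobId} at path ${path}`);
        }
        result[path] = content;
    }
    for (const [path, subtree] of Object.entries(tree.trees)) {
        result[path] = toSummaryTree(subtree, contents);
    }
    return result;
}

const summary = toSummaryTree(
    { blobs: { header: "header" }, trees: { ".protocol": { blobs: { attributes: "a1" }, trees: {} } } },
    new Map([["header", "header-content"], ["a1", "{}"]]),
);
console.log(summary["header"]); // prints header-content
```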

getDocumentAttributes():

    // Back-compat: old docs would have ".attributes" instead of "attributes"
    const attributesHash = ".protocol" in tree.trees
        ? tree.trees[".protocol"].blobs.attributes
        : tree.blobs[".attributes"];

    const attributes = storage !== undefined ? await readAndParse<IDocumentAttributes>(storage, attributesHash)
        : readAndParseFromBlobs<IDocumentAttributes>(tree.trees[".protocol"].blobs, attributesHash);
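If a storage object is always available (possibly an in-memory one), the back-compat lookup only has to find the blobId, and there is a single read path. A minimal sketch; `SnapshotTreeLike`, `BlobReader`, `getAttributesBlobId` and `readDocumentAttributes` are hypothetical names, and the path resolution mirrors the excerpt.

```typescript
// Reduced stand-in for ISnapshotTree.
interface SnapshotTreeLike {
    blobs: { [path: string]: string };           // path -> blobId only
    trees: { [path: string]: SnapshotTreeLike };
}

// Hypothetical minimal read interface: blob contents as UTF-8 strings.
interface BlobReader {
    readBlob(blobId: string): Promise<string>;
}

// Back-compat lookup from the excerpt: new docs keep the id at .protocol/attributes,
// old docs at the root under ".attributes".
function getAttributesBlobId(tree: SnapshotTreeLike): string | undefined {
    const protocolTree = tree.trees[".protocol"];
    return protocolTree !== undefined ? protocolTree.blobs["attributes"] : tree.blobs[".attributes"];
}

// Content is always read through storage, never looked up in a blobs record.
async function readDocumentAttributes<T>(storage: BlobReader, tree: SnapshotTreeLike): Promise<T> {
    const blobId = getAttributesBlobId(tree);
    if (blobId === undefined) {
        throw new Error("Snapshot has no attributes blob");
    }
    return JSON.parse(await storage.readBlob(blobId)) as T;
}

const byId = new Map([["a-new", '{"sequenceNumber":10}'], ["a-old", '{"sequenceNumber":3}']]);
const storage: BlobReader = {
    readBlob: async (blobId) => {
        const text = byId.get(blobId);
        if (text === undefined) {
            throw new Error(`Missing blob ${blobId}`);
        }
        return text;
    },
};
const newDoc: SnapshotTreeLike = { blobs: {}, trees: { ".protocol": { blobs: { attributes: "a-new" }, trees: {} } } };
const oldDoc: SnapshotTreeLike = { blobs: { ".attributes": "a-old" }, trees: {} };
readDocumentAttributes<{ sequenceNumber: number }>(storage, oldDoc)
    .then((attrs) => console.log(attrs.sequenceNumber)); // prints 3
```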

loadAndInitializeProtocolState():

            members = readAndParseFromBlobs<[string, ISequencedClient][]>(snapshot.trees[".protocol"].blobs,
                baseTree.blobs.quorumMembers);
            proposals = readAndParseFromBlobs<[number, ISequencedProposal, string[]][]>(
                snapshot.trees[".protocol"].blobs, baseTree.blobs.quorumProposals);
            values = readAndParseFromBlobs<[string, ICommittedProposal][]>(snapshot.trees[".protocol"].blobs,
                baseTree.blobs.quorumValues);
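The same three reads can go through storage instead of `readAndParseFromBlobs`, using the path -> blobId entries only to find ids. A minimal sketch: `protocolBlobs` plays the role of `baseTree.blobs` in the excerpt, `BlobReader`, `readJson` and `readQuorumState` are hypothetical, and `unknown` stands in for `ISequencedClient`, `ISequencedProposal` and `ICommittedProposal`.

```typescript
// Hypothetical minimal read interface: blob contents as UTF-8 strings.
interface BlobReader {
    readBlob(blobId: string): Promise<string>;
}

// Placeholders for the tuple types in the excerpt.
type Members = [string, unknown][];
type Proposals = [number, unknown, string[]][];
type Values = [string, unknown][];

// Resolves a path in the .protocol tree to its blobId, then reads the content via storage.
async function readJson<T>(storage: BlobReader, protocolBlobs: { [path: string]: string }, path: string): Promise<T> {
    const blobId = protocolBlobs[path];
    if (blobId === undefined) {
        throw new Error(`Missing protocol blob: ${path}`);
    }
    return JSON.parse(await storage.readBlob(blobId)) as T;
}

// Reads the three quorum blobs through storage, in parallel.
async function readQuorumState(storage: BlobReader, protocolBlobs: { [path: string]: string }) {
    const [members, proposals, values] = await Promise.all([
        readJson<Members>(storage, protocolBlobs, "quorumMembers"),
        readJson<Proposals>(storage, protocolBlobs, "quorumProposals"),
        readJson<Values>(storage, protocolBlobs, "quorumValues"),
    ]);
    return { members, proposals, values };
}

const blobsById = new Map([["m", '[["client1", {}]]'], ["p", "[]"], ["v", "[]"]]);
const storage: BlobReader = {
    readBlob: async (blobId) => {
        const text = blobsById.get(blobId);
        if (text === undefined) {
            throw new Error(`Missing blob ${blobId}`);
        }
        return text;
    },
};
readQuorumState(storage, { quorumMembers: "m", quorumProposals: "p", quorumValues: "v" })
    .then((state) => console.log(state.members.length)); // prints 1
```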

convertProtocolAndAppSummaryToSnapshotTreeCore():

function convertProtocolAndAppSummaryToSnapshotTreeCore(
    // ... (parameters and the enclosing switch statement are omitted in this excerpt)
            case SummaryType.Blob: {
                const blobId = uuid();
                treeNode.blobs[key] = blobId;
                const content = typeof summaryObject.content === "string" ?
                    summaryObject.content : Uint8ArrayToString(summaryObject.content, "base64");
                treeNode.blobs[blobId] = fromUtf8ToBase64(content);
                break;
            }
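A variant of this conversion could record contents in a separate map instead of writing them into `treeNode.blobs`, which also removes the base64 round trip. The summary shapes below (`SummaryObjectLike`) are simplified hypothetical stand-ins for the real summary types, and the `generateId` parameter replaces `uuid()` so the example is deterministic.

```typescript
// Simplified stand-ins for the real summary types (hypothetical).
type SummaryObjectLike =
    | { type: "blob"; content: string | Uint8Array }
    | { type: "tree"; tree: { [path: string]: SummaryObjectLike } };

// Reduced stand-in for ISnapshotTree.
interface SnapshotTreeLike {
    blobs: { [path: string]: string };           // path -> blobId only
    trees: { [path: string]: SnapshotTreeLike };
}

// Converts a summary into a snapshot tree; contents go into `contents` keyed by blobId.
function convertSummaryToSnapshot(
    summary: { [path: string]: SummaryObjectLike },
    contents: Map<string, Uint8Array>,
    generateId: () => string,
): SnapshotTreeLike {
    const node: SnapshotTreeLike = { blobs: {}, trees: {} };
    for (const [key, obj] of Object.entries(summary)) {
        if (obj.type === "blob") {
            const blobId = generateId();
            node.blobs[key] = blobId;
            // Keep bytes as bytes: strings are UTF-8 encoded once, no base64 step.
            contents.set(blobId, typeof obj.content === "string" ? new TextEncoder().encode(obj.content) : obj.content);
        } else {
            node.trees[key] = convertSummaryToSnapshot(obj.tree, contents, generateId);
        }
    }
    return node;
}

let nextId = 0;
const contents = new Map<string, Uint8Array>();
const snapshot = convertSummaryToSnapshot(
    {
        header: { type: "blob", content: "hi" },
        ".protocol": { type: "tree", tree: { attributes: { type: "blob", content: "{}" } } },
    },
    contents,
    () => `blob${nextId++}`,
);
console.log(snapshot.blobs["header"], contents.size); // prints blob0 2
```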


vladsud · 2021-01-06T17:29:06Z

Related item: #4695

vladsud · 2021-02-25T02:33:12Z

Please note that addressing this issue properly will likely mean introducing a storage object where previously we did not have one.
Note that we are already pretty bad at tracking when we have storage, when we do not, and the transitions between the two. I've added the comment below in code; it's easy to confirm by adding an assert here and hitting it right away:

    public get storage(): IDocumentStorageService {
        // This code is plain wrong. It lies that it never returns undefined!!!
        // All callers should be fixed, as this API is called in detached state of container when we have
        // no storage and it's passed down the stack without right typing.
        if (!this._storage && this.context.storage) {
            // Note: BlobAggregationStorage is smart enough for double-wrapping to be no-op
            this._storage = BlobAggregationStorage.wrap(this.context.storage, this.logger);
        }
        // eslint-disable-next-line @typescript-eslint/no-non-null-assertion
        return this._storage!;
    }
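The getter could instead expose the detached state in its type. A minimal sketch, assuming a hypothetical `StorageLike` interface and a `getContextStorage` callback in place of `this.context.storage`; the `BlobAggregationStorage` wrapping from the excerpt is left out.

```typescript
// Hypothetical minimal storage interface standing in for IDocumentStorageService.
interface StorageLike {
    readBlob(blobId: string): Promise<ArrayBufferLike>;
}

class RuntimeSketch {
    private _storage: StorageLike | undefined;
    private readonly getContextStorage: () => StorageLike | undefined;

    constructor(getContextStorage: () => StorageLike | undefined) {
        this.getContextStorage = getContextStorage;
    }

    // The return type admits the detached case, so callers must handle it explicitly
    // instead of receiving undefined through a non-null assertion.
    public get storage(): StorageLike | undefined {
        if (this._storage === undefined) {
            this._storage = this.getContextStorage();
        }
        return this._storage;
    }
}

// Example: storage appears only after attach.
let contextStorage: StorageLike | undefined;
const runtime = new RuntimeSketch(() => contextStorage);
console.log(runtime.storage === undefined); // prints true (detached)
contextStorage = { readBlob: async () => new ArrayBuffer(0) };
console.log(runtime.storage !== undefined); // prints true (attached)
```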

It would be great to re-evaluate how we deal with the transition from no storage -> storage; to fix this bug, it likely needs to become temp storage -> permanent storage.
For the most part, that can be hidden from the layers below by always providing a storage object and doing the switch under the covers, though it raises the question: can we do that safely? Or does it need to be an addition (i.e. old blobs remain available, but we gain write capabilities)?
I guess the first step in solving the problems here is understanding how the code works today, and how we can remove the eslint suppressions and make the code work correctly with proper types; then learn from that and move on to fixing this issue.
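One way to provide a storage object from the start and switch under the covers is an adapter that keeps serving the detached blobs after it gains a permanent backing store, i.e. the "addition" option above. `SwitchableStorage`, `StorageLike` and `connect` are hypothetical names; the sketch covers reads only.

```typescript
// Hypothetical minimal read interface (contents as strings for brevity).
interface StorageLike {
    readBlob(blobId: string): Promise<string>;
}

// Handed to lower layers from the start. While detached it serves in-memory blobs;
// once connected it also reads from permanent storage, and in-memory blobs stay readable.
class SwitchableStorage implements StorageLike {
    private readonly detachedBlobs: Map<string, string>;
    private permanent: StorageLike | undefined;

    constructor(detachedBlobs: Map<string, string>) {
        this.detachedBlobs = detachedBlobs;
    }

    public connect(permanent: StorageLike): void {
        if (this.permanent !== undefined) {
            throw new Error("Already connected to permanent storage");
        }
        this.permanent = permanent;
    }

    public async readBlob(blobId: string): Promise<string> {
        const local = this.detachedBlobs.get(blobId);
        if (local !== undefined) {
            return local;
        }
        if (this.permanent === undefined) {
            throw new Error(`Blob ${blobId} is not available while detached`);
        }
        return this.permanent.readBlob(blobId);
    }
}

const storage = new SwitchableStorage(new Map([["a", "local"]]));
storage.connect({ readBlob: async (blobId) => `remote:${blobId}` });
storage.readBlob("b").then((text) => console.log(text)); // prints remote:b
```

Whether the in-memory blobs should keep taking precedence after attach (as here) or be dropped is exactly the safety question raised above.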

jatgarg · 2021-07-30T23:30:26Z

Closing with all the above-mentioned PRs.
Following up on removing the leftover back-compat code in the future: #6938

vladsud added the bug Something isn't working label Jan 6, 2021

ghost added the triage label Jan 6, 2021

vladsud assigned jatgarg Jan 6, 2021

vladsud added this to the February 2021 milestone Jan 6, 2021

ghost added triage and removed triage labels Jan 6, 2021

vladsud mentioned this issue Jan 6, 2021

Changing all runtime calls of IDocumentStorageService.read() to readString() and more #4530

Closed

curtisman added area: driver Driver related issues and removed triage labels Jan 8, 2021

ghost added the triage label Jan 8, 2021

curtisman removed the triage label Jan 8, 2021

vladsud modified the milestones: February 2021, March 2021 Jan 20, 2021

vladsud modified the milestones: March 2021, April 2021 Feb 25, 2021

vladsud modified the milestones: April 2021, May 2021 Mar 13, 2021

vladsud modified the milestones: May 2021, June 2021 Apr 22, 2021

This was referenced Jun 3, 2021

Storage handling in detach container across layers. #6342

Closed

Add container storage adapter to handle storage in detached container #6380

Merged

vladsud modified the milestones: June 2021, July 2021 Jun 29, 2021

This was referenced Jun 29, 2021

Add api to allow read contents from snapshot synchronously in rehydrated container when required #6582

Closed

Draft for removing blob contents from ISnapshotTree.blobs #6608

Closed

This was referenced Jul 19, 2021

Remove usages of readAndParseBlobs from sync constructors methods #6795

Merged

Remove readAndParseFromBlobs api as no longer needed #6816

Merged

Remove blob contents from snapshot in rehydrate container, loader changes #6822

Merged

This was referenced Jul 28, 2021

Try to read from new format while summarizing in detached container #6903

Merged

Follow up on snapshot blobs issue in rehydrating detached container issue #6938

Closed

jatgarg closed this as completed Jul 30, 2021
