Skip to content

Conversation

@Spoffy
Copy link
Contributor

@Spoffy Spoffy commented May 20, 2025

Context

When importing a document that uses external attachments from another installation, the document's attachment store id still refers to the one on the original installation.

This can cause issues such as:

  • API errors when trying to access an invalid store
  • Documents being set to 'external' on installations with no external storage.

The current fix is to toggle the document from 'External' -> 'Internal' -> 'External' to refresh the store id, but this is opaque and undocumented, and only works on installations with external stores configured.

Multiple solutions were considered for this:

  • Add a 'magic value' store id that always refers to the installation's default store - Backwards compatibility would be an issue, with this problem persisting for documents currently in use.
  • Update the attachment store when a document is opened - This could theoretically lead to unexpected behaviour with document's attachment store ids changing without notifying the user. Especially in the future if stores are explicitly chosen for a document.
  • Modify the document on import - This avoids issues with surprising store ID changes for existing documents (as fewer assumptions exist around doc behaviour during import), but is trickier as we don't open the document as an ActiveDoc, so existing APIs for modifying the setting won't work.

Proposed solution

This PR adds the 'modify document on import' solution.

When a document is imported, the document's settings checked to see if the attachment store id exists. If not, the attachment store id is updated to the installation's default store id, or internal if the installation has no external stores.

Related issues

#1021

Has this been tested?

  • 👍 yes, I added tests to the test suite

@paulfitz paulfitz self-requested a review May 20, 2025 15:06
Copy link
Member

@paulfitz paulfitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, your reasoning seems fair. Awkward little corner, handled in a reasonable way.

if (!columns.some(column => column.name === 'documentSettings')) {
return;
}
const results = await db.all('SELECT id, schemaVersion, documentSettings FROM _grist_DocInfo');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_grist_DocInfo is a singleton, so db.get might read more simply?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of thing always makes me nervous (as it breaks if the singleton assumption is ever violated), but it seems like a reasonable simplification here (since presumably the rest of the code would also break).

const settings: DocumentSettings | undefined = safeJsonParse(row.documentSettings, undefined);
const newSettings = makeChanges(settings);
await db.run('UPDATE _grist_DocInfo SET documentSettings = ? WHERE id = ?',
JSON.stringify(newSettings, (k, v) => v === undefined ? null : v),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm. A little scary, maybe undertested, but probably fine?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it looks very scary.

For example:
On line 771 you expect to get DocumentSetttings | undefined

Yet on this line, you expect to write null back to database, as this might happen:

JSON.stringify(undefined, (k, v) => v === undefined ? null : v) === 'null'

So you don't expect to handle what you write to database yourself.

Copy link
Contributor Author

@Spoffy Spoffy May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're completely right, I'm really glad you spotted this. Something like this sneaking through was one of the reasons I was apprehensive about using this approach!

I'll give this another pass and add some additional tests here.

Edit: Looks like safeJsonParse was dealing with null values, which isn't obvious at first glance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've significantly reinforced the method with a lot more checks - it should be far more robust now.

const updatePromises = results.map(async (row) => {
const settings: DocumentSettings | undefined = safeJsonParse(row.documentSettings, undefined);
const newSettings = makeChanges(settings);
await db.run('UPDATE _grist_DocInfo SET documentSettings = ? WHERE id = ?',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice I think not to run the update if no change is needed, to lower the impact if something goes wrong some day with some new funky addition to documentSettings.

@paulfitz
Copy link
Member

(if you rebase you'll pick up a fix for a test failure)

const settings: DocumentSettings | undefined = safeJsonParse(row.documentSettings, undefined);
const newSettings = makeChanges(settings);
await db.run('UPDATE _grist_DocInfo SET documentSettings = ? WHERE id = ?',
JSON.stringify(newSettings, (k, v) => v === undefined ? null : v),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it looks very scary.

For example:
On line 771 you expect to get DocumentSetttings | undefined

Yet on this line, you expect to write null back to database, as this might happen:

JSON.stringify(undefined, (k, v) => v === undefined ? null : v) === 'null'

So you don't expect to handle what you write to database yourself.

row.id
);
});
await Promise.all(updatePromises);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Promise.all won't wait for other promises to stop if only one fails.
  2. You don't close db after the end (remember that you probably should wait for all promises to stop before closing db).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do a quick test for an error we are expecting here? Something like:

Suggested change
// This isn't critical to document operation, so shouldn't block importing.
// This isn't critical to document operation, so shouldn't block importing.
if (!err.message.include('....')) { throw err; }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no specific error here I'm trying to catch. It's more that this is an optional operation - if it goes wrong for any reason, it shouldn't be grounds for blocking the import.

There's a lot of possible ones that could crop up here: File, SQlite, type checker, to name a few.

What do you think? I'm open to just propagating the error, if that's more sensible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So lets propagate the error then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Been thinking on this - and I totally agree, there could be unforeseen side effects that get suppressed.

Comment on lines 799 to 831
if (!dbManager) { throw new Error("HomeDbManager not available"); }
// Sets up authorization of the document.
const org = client.authSession.org;
if (!org) { throw new Error('Documents can only be opened in the context of a specific organization'); }

// We use docId in the key, and disallow urlId, so we can be sure that we are looking at the
// right doc when we re-query the DB over the life of the websocket.
const useShareUrlId = Boolean(originalUrlId && parseUrlId(originalUrlId).shareKey);
const urlId = useShareUrlId ? originalUrlId! : docId;
auth = new DocAuthorizerImpl({dbManager, urlId, openMode, authSession: client.authSession});
await auth.assertAccess('viewers');
const docAuth = auth.getCachedAuth();
if (docAuth.docId !== docId) {
// The only plausible way to end up here is if we called openDoc with a urlId rather
// than a docId.
throw new Error(`openDoc expected docId ${docAuth.docId} not urlId ${docId}`);
}
} else {
log.debug(`DocManager.openDoc not using authorization for ${docId} because GRIST_SINGLE_USER`);
auth = new DummyAuthorizer('owners', docId);
}

const docSessionPrecursor: DocSessionPrecursor = new DocSessionPrecursor(client, auth, {linkParameters});

// Fetch the document, and continue when we have the ActiveDoc (which may be immediately).
return this._withUnmutedDoc(docSessionPrecursor, docId, async () => {
const activeDoc: ActiveDoc = await this.fetchDoc(docSessionPrecursor, docId);

// Get a fresh DocSession object.
const docSession = activeDoc.addClient(client, docSessionPrecursor);

// If opening in (pre-)fork mode, check if it is appropriate to treat the user as
// an owner for granular access purposes.
if (openMode === 'fork') {
if (await activeDoc.canForkAsOwner(docSession)) {
// Mark the session specially and flush any cached access
// information. It is easier to make this a property of the
// session than to try computing it later in the heat of
// battle, since it introduces a loop where a user property
// (user.Access) depends on evaluating rules, but rules need
// the user properties in order to be evaluated. It is also
// somewhat justifiable even if permissions change later on
// the theory that the fork is theoretically happening at this
// instance).
docSession.forkingAsOwner = true;
activeDoc.flushAccess(docSession);
} else {
// TODO: it would be kind to pass on a message to the client
// to let them know they won't be able to fork. They'll get
// an error when they make their first change. But currently
// we only have the blunt instrument of throwing an error,
// which would prevent access to the document entirely.
}
}

const [metaTables, recentActions, user, userOverride] = await Promise.all([
activeDoc.fetchMetaTables(docSession),
activeDoc.getRecentMinimalActions(docSession),
activeDoc.getUser(docSession),
activeDoc.getUserOverride(docSession),
]);

let docUsage: FilteredDocUsageSummary | undefined;
try {
docUsage = await activeDoc.getFilteredDocUsageSummary(docSession);
} catch (e) {
log.warn("DocManager.openDoc failed to get doc usage", e);
}

const result: OpenLocalDocResult = {
docFD: docSession.fd,
clientId: docSession.client.clientId,
doc: metaTables,
log: recentActions,
recoveryMode: activeDoc.recoveryMode,
user: user.toUserInfo(),
userOverride,
docUsage,
isTimingOn: activeDoc.isTimingOn,
};

if (!activeDoc.muted) {
this.emit('open-doc', this.storageManager.getPath(activeDoc.docName));
}

this.gristServer.getTelemetry().logEvent(docSession, 'openedDoc', {
full: {
docIdDigest: docId,
userId: client.authSession.userId,
altSessionId: client.authSession.altSessionId,
},
});

return {activeDoc, result};
});
}

/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about this version?

Suggested change
try {
const columns = await db.all("PRAGMA table_info(_grist_DocInfo)");
if (!columns.some(column => column.name === 'documentSettings')) {
return;
}
const results = await db.all('SELECT id, schemaVersion, documentSettings FROM _grist_DocInfo');
const updatePromises = results.map(async (row) => {
const parsedSettings: unknown = safeJsonParse(row.documentSettings, undefined);
const isValidSettingsObject = DocumentSettingsChecker.test(parsedSettings);
// Throw if there's something expected in the settings object.
// This shouldn't occur unless there's a bug or a malformed doc.
if (parsedSettings && !isValidSettingsObject) {
DocumentSettingsChecker.check(parsedSettings);
}
const settings = parsedSettings && isValidSettingsObject ? parsedSettings : undefined;
const newSettings = makeChanges(settings);
// Avoid unnecessary DB updates
if (isDeepEqual(settings, newSettings)) {
return;
}
await db.run('UPDATE _grist_DocInfo SET documentSettings = ? WHERE id = ?',
JSON.stringify(newSettings),
row.id
);
});
// Ensures every update has resolved before closing the database.
await Promise.allSettled(updatePromises);
// Throws if anything has errored.
await Promise.all(updatePromises);
try {
const columns = await db.all("PRAGMA table_info(_grist_DocInfo)");
if (!columns.some(column => column.name === 'documentSettings')) {
return;
}
const row = await db.get('SELECT id, schemaVersion, documentSettings FROM _grist_DocInfo');
if (!row) {
throw new Error(`No _grist_DocInfo row found in ${fname}`);
}
const parsedSettings: unknown = safeJsonParse(row.documentSettings, undefined);
if (DocumentSettingsChecker.test(parsedSettings)) {
const newSettings = makeChanges(parsedSettings);
// Avoid unnecessary DB updates
if (isDeepEqual(parsedSettings, newSettings)) {
return;
}
await db.run('UPDATE _grist_DocInfo SET documentSettings = ? WHERE id = ?',
JSON.stringify(newSettings),
row.id
);
} else {
// Perform a check that will throw an error as this is not a valid DocumentSettings object.
DocumentSettingsChecker.check(parsedSettings);
}
  • It uses db.get as we expect only one row
  • It actually checks that we have a row there
  • It facilitates Checker.test, to remove unnecessary 'casting' later on

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see this, and just pushed a set of changes that are very similar anyway. Could you have a look at that? :)

@berhalak berhalak self-requested a review June 2, 2025 09:07
Copy link
Contributor

@berhalak berhalak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Spoffy !

@Spoffy Spoffy merged commit 415f099 into main Jun 5, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants