Skip to content

Make database storage paths more unique#3456

Merged
koesie10 merged 7 commits intomainfrom
koesie10/unique-database-names
Mar 11, 2024
Merged

Make database storage paths more unique#3456
koesie10 merged 7 commits intomainfrom
koesie10/unique-database-names

Conversation

@koesie10
Copy link
Copy Markdown
Member

@koesie10 koesie10 commented Mar 8, 2024

This tries to pick more unique database storage paths. For GitHub databases, we will now try to use the actual repository name instead of the last part of the URL, which is always the language (e.g. java or cpp). For example, this is the old and new storage location of the C++ database of google/brotli:

  • Old: ${storageUri}/cpp/cpp
  • New: ${storageUri}/google-brotli/cpp

This makes it less likely to run into the 100 databases limit that we have if more than 100 databases have the same storage path. It also makes it easier to find which folder in the storage corresponds to which database.

Separately, this also adds a fallback for the 100 databases limit where we will now try to append a nanoid instead of a counter. So, when all folders from java-1 to java-100 exist, we would error out with "Could not find a unique name for downloaded database." before, which wasn't helpful for users (and because of #3455 they couldn't get out of this situation by deleting databases, even if they knew that would help resolve the problem). Now, we will generate a nanoid and append it to the base name, so we would try to use e.g. java-odkDB1k61hq7PDJnv0qGY instead. This should essentially remove the databases limit.

Checklist

  • CHANGELOG.md has been updated to incorporate all user visible changes made by this pull request.
  • Issues have been created for any UI or other user-facing changes made by this pull request.
  • [Maintainers only] If this pull request makes user-facing changes that require documentation changes, open a corresponding docs pull request in the github/codeql repo and add the ready-for-doc-review label there.

@koesie10 koesie10 marked this pull request as ready for review March 8, 2024 14:17
@koesie10 koesie10 requested a review from a team as a code owner March 8, 2024 14:17
Comment thread extensions/ql-vscode/src/databases/database-fetcher.ts Outdated
@@ -429,6 +459,11 @@ async function getStorageFolder(storagePath: string, urlStr: string) {
counter++;
folderName = join(realpath, `${lastName}-${counter}`);
if (counter > 100) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any need to keep the behaviour of calling databases java-{1-100} before we switch to another method? This is only for generating new databases, so it doesn't need to be compatible with existing databases, does it?

Why not use java-${nanoid()} all the time?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for not doing that is that this name may be shown to the user (as mentioned in the comment: "we need to generate a folder name for the unzipped archive, this needs to be human readable since we may use this name as the initial name for the database"). java-100 is still more readable than java-odkDB1k61hq7PDJnv0qGY and would probably make it easier for users to recognize which database they have just added.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining. It seems we use the filename only if there wasn't a name override (which comes when downloading from github) or and the database itself doesn't define a name (I don't know how common this is, but I expect any database built from a git repo will have a name).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another alternative thought I had was what's the relative performance of checking a single file exists vs listing all files in the directory? If we just list all files and then do our checks in memory I would expect it to work fine up to thousands of databases.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would make sense. I'll change it to readdir once and change the maximum counter value to 10,000.

@koesie10 koesie10 requested a review from a team as a code owner March 11, 2024 10:32
folderName = join(realpath, `${lastName}-${nanoid()}`);
}
if (counter > 200) {
throw new Error("Could not find a unique name for downloaded database.");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this error shouldn't happen anymore if we use nanoid, shall we still update this error message to indicate that the problem is to do with having too many databases?

Maybe just:

Suggested change
throw new Error("Could not find a unique name for downloaded database.");
throw new Error("Could not find a unique name for downloaded database. Please remove some databases and try again.");

?

Copy link
Copy Markdown
Contributor

@robertbrignull robertbrignull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Apologies for the multiple rounds of reviewing, but I think the algorithm is looking good now.

Comment thread extensions/ql-vscode/src/common/filenames.ts Outdated
Comment thread extensions/ql-vscode/src/databases/database-fetcher.ts Outdated
Comment thread extensions/ql-vscode/src/databases/database-fetcher.ts Outdated
Copy link
Copy Markdown
Contributor

@robertbrignull robertbrignull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍🏼

@koesie10 koesie10 merged commit 16a0fce into main Mar 11, 2024
@koesie10 koesie10 deleted the koesie10/unique-database-names branch March 11, 2024 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants