Make database storage paths more unique by koesie10 · Pull Request #3456 · github/vscode-codeql

koesie10 · 2024-03-08T14:02:15Z

This tries to pick more unique database storage paths. For GitHub databases, we will now try to use the actual repository name instead of the last part of the URL, which is always the language (e.g. java or cpp). For example, this is the old and new storage location of the C++ database of google/brotli:

Old: ${storageUri}/cpp/cpp
New: ${storageUri}/google-brotli/cpp

This makes it less likely that we hit the limit of 100 databases sharing the same base storage path. It also makes it easier to find which folder in the storage corresponds to which database.
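As a rough illustration of the new layout, the sketch below derives a folder name from the repository's owner/name string and joins it with the language. The helpers `toFolderName` and `storagePathFor` are hypothetical names, and the sanitising rule (collapse anything non-alphanumeric into a dash) is an assumption; the actual rules of the `createFilenameFromString` function introduced in this PR are not shown in this conversation.

```typescript
import { posix } from "node:path";

// Hypothetical helper: make an arbitrary string safe to use as a folder name.
// Assumption: every run of non-alphanumeric characters becomes a single dash.
function toFolderName(input: string): string {
  return input
    .replace(/[^a-zA-Z0-9]+/g, "-") // "google/brotli" -> "google-brotli"
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
}

// Hypothetical helper: build the storage location for a GitHub database.
// posix.join is used so the result is the same on every platform.
function storagePathFor(
  storageRoot: string,
  nameWithOwner: string,
  language: string,
): string {
  return posix.join(storageRoot, toFolderName(nameWithOwner), language);
}

console.log(storagePathFor("/storage", "google/brotli", "cpp"));
// "/storage/google-brotli/cpp"
```

With the old scheme, both path segments would have come from the language, so every C++ database competed for the same base name.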

Separately, this also adds a fallback for the 100 databases limit where we will now try to append a nanoid instead of a counter. So, when all folders from java-1 to java-100 exist, we would error out with "Could not find a unique name for downloaded database." before, which wasn't helpful for users (and because of #3455 they couldn't get out of this situation by deleting databases, even if they knew that would help resolve the problem). Now, we will generate a nanoid and append it to the base name, so we would try to use e.g. java-odkDB1k61hq7PDJnv0qGY instead. This should essentially remove the databases limit.
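A minimal sketch of the counter-then-random-suffix idea described above, written against an in-memory set of existing folder names. The function name `pickUniqueName` is hypothetical, `randomSuffix` is only a stand-in for `nanoid()` so that the example has no third-party dependency, and trying the bare base name first is an assumption about the surrounding code.

```typescript
import { randomBytes } from "node:crypto";

const MAX_COUNTER = 100;

// Stand-in for nanoid(): the PR uses nanoid, this sketch uses random hex instead.
function randomSuffix(): string {
  return randomBytes(12).toString("hex");
}

// Hypothetical helper: choose a folder name that is not in `existing`.
function pickUniqueName(
  baseName: string,
  existing: ReadonlySet<string>,
  generateId: () => string = randomSuffix,
): string {
  // Assumption: the plain base name is preferred when it is free.
  if (!existing.has(baseName)) {
    return baseName;
  }
  // Human-readable names first: java-1, java-2, ..., java-100.
  for (let counter = 1; counter <= MAX_COUNTER; counter++) {
    const candidate = `${baseName}-${counter}`;
    if (!existing.has(candidate)) {
      return candidate;
    }
  }
  // Fallback: a random suffix, which is very unlikely to collide.
  const fallback = `${baseName}-${generateId()}`;
  if (existing.has(fallback)) {
    throw new Error("Could not find a unique name for downloaded database.");
  }
  return fallback;
}

console.log(pickUniqueName("java", new Set(["java"]))); // "java-1"
```

The readable counter names are kept as the first choice because the folder name may end up as the initial database name shown to the user.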

Checklist

CHANGELOG.md has been updated to incorporate all user visible changes made by this pull request.
Issues have been created for any UI or other user-facing changes made by this pull request.
[Maintainers only] If this pull request makes user-facing changes that require documentation changes, open a corresponding docs pull request in the github/codeql repo and add the ready-for-doc-review label there.

robertbrignull · 2024-03-11T10:01:45Z

@@ -429,6 +459,11 @@ async function getStorageFolder(storagePath: string, urlStr: string) {
    counter++;
    folderName = join(realpath, `${lastName}-${counter}`);
    if (counter > 100) {


Is there any need to keep the behaviour of calling databases java-{1-100} before we switch to another method? This is only for generating new databases, so it doesn't need to be compatible with existing databases, does it?

Why not use java-${nanoid()} all the time?

The reason for not doing that is that this name may be shown to the user (as mentioned in the comment: "we need to generate a folder name for the unzipped archive, this needs to be human readable since we may use this name as the initial name for the database"). java-100 is still more readable than java-odkDB1k61hq7PDJnv0qGY and would probably make it easier for users to recognize which database they have just added.

Thanks for explaining. It seems we use the filename only if there wasn't a name override (which comes when downloading from GitHub) and the database itself doesn't define a name (I don't know how common this is, but I expect any database built from a git repo will have a name).

Another thought I had: what's the relative performance of checking that a single file exists vs. listing all files in the directory? If we list all files once and then do our checks in memory, I would expect it to work fine up to thousands of databases.

Yes, that would make sense. I'll change it to readdir once and change the maximum counter value to 10,000.
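The approach agreed on here could look roughly like the sketch below: list the storage directory once, then search for a free name in memory. The names `firstFreeName` and `findFreeFolderName` are hypothetical, and this is not the code that was merged; only `readdir` and the 10,000 limit come from the discussion above.

```typescript
import { readdir } from "node:fs/promises";

// Hypothetical pure helper: find the first unused name among
// baseName, baseName-1, ..., baseName-maxCounter.
function firstFreeName(
  existing: ReadonlySet<string>,
  baseName: string,
  maxCounter = 10_000,
): string | undefined {
  if (!existing.has(baseName)) {
    return baseName;
  }
  for (let counter = 1; counter <= maxCounter; counter++) {
    const candidate = `${baseName}-${counter}`;
    if (!existing.has(candidate)) {
      return candidate;
    }
  }
  return undefined;
}

// Hypothetical wrapper: one readdir call replaces up to maxCounter
// separate existence checks against the file system.
async function findFreeFolderName(
  storagePath: string,
  baseName: string,
): Promise<string> {
  const existing = new Set(await readdir(storagePath));
  const name = firstFreeName(existing, baseName);
  if (name === undefined) {
    throw new Error("Could not find a unique name for downloaded database.");
  }
  return name;
}
```

Keeping the search in a pure function makes the naming logic easy to unit test without touching the file system.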

robertbrignull · 2024-03-11T11:28:10Z

+      folderName = join(realpath, `${lastName}-${nanoid()}`);
+    }
+    if (counter > 200) {
      throw new Error("Could not find a unique name for downloaded database.");


Although this error shouldn't happen anymore if we use nanoid, shall we still update this error message to indicate that the problem is to do with having too many databases?

Maybe just:

Suggested change

- throw new Error("Could not find a unique name for downloaded database.");
+ throw new Error("Could not find a unique name for downloaded database. Please remove some databases and try again.");

?

robertbrignull

Looking good. Apologies for the multiple rounds of reviewing, but I think the algorithm is looking good now.

robertbrignull

LGTM 👍🏼

Make database storage paths more unique

2c35a97

koesie10 marked this pull request as ready for review March 8, 2024 14:17

koesie10 requested a review from a team as a code owner March 8, 2024 14:17

robertbrignull reviewed Mar 11, 2024

Introduce createFilenameFromString function

e8efbbb

koesie10 requested a review from a team as a code owner March 11, 2024 10:32

robertbrignull reviewed Mar 11, 2024

koesie10 added 2 commits March 11, 2024 12:41

Use readdir instead of repeated pathExists calls

fe01360

Merge remote-tracking branch 'origin/main' into koesie10/unique-datab…

bbc09f3

…ase-names

robertbrignull reviewed Mar 11, 2024

Comment thread extensions/ql-vscode/src/common/filenames.ts Outdated

Comment thread extensions/ql-vscode/src/databases/database-fetcher.ts Outdated

Comment thread extensions/ql-vscode/src/databases/database-fetcher.ts Outdated

koesie10 added 3 commits March 11, 2024 14:14

Fix location of removing dots

7681a56

Improve readability of duplicate filename logic

0e3665b

Reduce nanoid tries

e003175

robertbrignull approved these changes Mar 11, 2024

koesie10 merged commit 16a0fce into main Mar 11, 2024

koesie10 deleted the koesie10/unique-database-names branch March 11, 2024 15:40

koesie10 merged 7 commits into main from koesie10/unique-database-names