Make database storage paths more unique #3456
@@ -429,6 +459,11 @@ async function getStorageFolder(storagePath: string, urlStr: string) {
    counter++;
    folderName = join(realpath, `${lastName}-${counter}`);
    if (counter > 100) {
Is there any need to keep the behaviour of calling databases java-{1-100} before we switch to another method? This is only for generating new databases, so it doesn't need to be compatible with existing databases, does it?
Why not use java-${nanoid()} all the time?
The reason for not doing that is that this name may be shown to the user (as mentioned in the comment: "we need to generate a folder name for the unzipped archive, this needs to be human readable since we may use this name as the initial name for the database"). java-100 is still more readable than java-odkDB1k61hq7PDJnv0qGY and would probably make it easier for users to recognize which database they have just added.
Thanks for explaining. It seems we use the filename only if there wasn't a name override (which comes when downloading from GitHub) and the database itself doesn't define a name (I don't know how common this is, but I expect any database built from a git repo will have a name).
Another alternative thought I had was what's the relative performance of checking a single file exists vs listing all files in the directory? If we just list all files and then do our checks in memory I would expect it to work fine up to thousands of databases.
Yes, that would make sense. I'll change it to readdir once and change the maximum counter value to 10,000.
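A minimal sketch of the "list once, check in memory" idea discussed above. The helper names `pickUniqueName` and `getUniqueFolder` are hypothetical and not taken from the PR, and the sketch assumes the bare base name is tried before any numbered suffix.

```typescript
import { readdir } from "node:fs/promises";
import { join } from "node:path";

// Hypothetical helper: pick the first free name from a set of names that
// are already taken. No file system access happens in here.
export function pickUniqueName(
  baseName: string,
  existing: ReadonlySet<string>,
  maxCounter = 10_000,
): string {
  // Assumption: the unsuffixed name is preferred when it is free.
  if (!existing.has(baseName)) {
    return baseName;
  }
  for (let counter = 1; counter <= maxCounter; counter++) {
    const candidate = `${baseName}-${counter}`;
    if (!existing.has(candidate)) {
      return candidate;
    }
  }
  throw new Error(
    "Could not find a unique name for downloaded database. Please remove some databases and try again.",
  );
}

// Hypothetical wrapper: list the directory once, then do every check in memory.
export async function getUniqueFolder(
  parentDir: string,
  baseName: string,
): Promise<string> {
  const existing = new Set(await readdir(parentDir));
  return join(parentDir, pickUniqueName(baseName, existing));
}
```

Keeping the name selection free of I/O means it costs one directory listing no matter how many suffixes are already taken, and it can be tested without touching the disk.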
        folderName = join(realpath, `${lastName}-${nanoid()}`);
    }
    if (counter > 200) {
        throw new Error("Could not find a unique name for downloaded database.");
Although this error shouldn't happen anymore if we use nanoid, shall we still update this error message to indicate that the problem is to do with having too many databases?
Maybe just:
- throw new Error("Could not find a unique name for downloaded database.");
+ throw new Error("Could not find a unique name for downloaded database. Please remove some databases and try again.");
?
robertbrignull left a comment
Looking good. Apologies for the multiple rounds of reviewing, but I think the algorithm is looking good now.
This tries to pick more unique database storage paths. For GitHub databases, we will now try to use the actual repository name instead of the last part of the URL, which is always the language (e.g. java or cpp). For example, this is the old and new storage location of the C++ database of google/brotli:
${storageUri}/cpp/cpp
${storageUri}/google-brotli/cpp
This makes it less likely to run into the 100 databases limit that we have if more than 100 databases have the same storage path. It also makes it easier to find which folder in the storage corresponds to which database.
Separately, this also adds a fallback for the 100 databases limit where we will now try to append a nanoid instead of a counter. So, when all folders from java-1 to java-100 exist, we would error out with "Could not find a unique name for downloaded database." before, which wasn't helpful for users (and because of #3455 they couldn't get out of this situation by deleting databases, even if they knew that would help resolve the problem). Now, we will generate a nanoid and append it to the base name, so we would try to use e.g. java-odkDB1k61hq7PDJnv0qGY instead. This should essentially remove the databases limit.
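The counter-then-random-suffix fallback described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's code. The function name `pickNameWithFallback` and the `exists` callback are hypothetical, and `randomBytes` from Node's `crypto` module stands in for `nanoid()` so the example has no external dependency.

```typescript
import { randomBytes } from "node:crypto";

// Stand-in for nanoid(): a URL-safe random string. The PR itself uses nanoid.
const defaultId = (): string => randomBytes(16).toString("base64url");

// Hypothetical helper: try readable names first (java, java-1 ... java-100),
// then switch to random suffixes, and give up after 200 attempts.
export function pickNameWithFallback(
  baseName: string,
  exists: (name: string) => boolean,
  generateId: () => string = defaultId,
): string {
  let name = baseName;
  let counter = 0;
  while (exists(name)) {
    counter++;
    name =
      counter > 100
        ? `${baseName}-${generateId()}`
        : `${baseName}-${counter}`;
    if (counter > 200) {
      throw new Error("Could not find a unique name for downloaded database.");
    }
  }
  return name;
}
```

Passing the ID generator as a parameter keeps the readable names as the common case while making the random branch deterministic in tests.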