Hash-based object store #53
The full path/object store notion is flawed for the same reasons seen in #35, when we first opted for a directory structure on disk: if each item stores its full path, then renaming a folder makes every stored path beneath it stale. Renaming or re-rooting a folder could then require rewriting large portions of the file hierarchy, i.e. renaming hundreds or thousands of files. Arguably this violates "least surprise": renaming a single node should not be so expensive. And for what? The file already knows its path; it merely needs an association with the record responsible for the data block (i.e., a foreign key to the other table). This association already exists. It is currently called "backup", and it points to a location on Amazon instead of a location on disk. We should instead rename this table to "data_blocks" and make it point to a location on disk as well. The method Metis::File#location should, instead of constructing a file system path, defer to Metis::DataBlock#location. Currently the Backup object is created by the "archive" command. The DataBlock should instead be created by the completed Upload, which hands the actual data off to the DataBlock and attaches it to the appropriate Metis::File.
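A minimal sketch of the proposed delegation, assuming plain Ruby objects in place of the real database models. The `md5_hash` attribute and the `root` directory argument are hypothetical; the point is only that Metis::File#location asks its data block for a location, so renaming the file leaves the stored data untouched.

```ruby
# Hypothetical plain-Ruby stand-ins for the proposed Metis::File and
# Metis::DataBlock records; the real classes would be database models.
module Metis
  class DataBlock
    attr_reader :md5_hash

    def initialize(md5_hash, root)
      @md5_hash = md5_hash
      @root = root # assumed directory holding all data blocks
    end

    # The on-disk location derives from the block itself, not from any folder names.
    # ::File is the core File class; plain File here would resolve to Metis::File.
    def location
      ::File.join(@root, @md5_hash)
    end
  end

  class File
    attr_accessor :file_name, :data_block

    def initialize(file_name, data_block)
      @file_name = file_name
      @data_block = data_block # stands in for the foreign key to data_blocks
    end

    # Instead of building a path from folder names, defer to the data block.
    def location
      data_block.location
    end
  end
end

block = Metis::DataBlock.new("d41d8cd98f00b204e9800998ecf8427e", "/data/blocks")
file = Metis::File.new("raw/sample.fastq", block)
file.file_name = "renamed/sample.fastq" # a rename only touches the file record
puts file.location # prints /data/blocks/d41d8cd98f00b204e9800998ecf8427e
```

Renaming or re-rooting a folder then becomes a metadata update on Metis::File records, while the data block and its location stay where they are.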
Closed by #56
Currently each file on Metis is stored on disk in an actual directory structure. This is cumbersome to manage and requires several filesystem operations to move a file to a different path. Ostensibly the reason for this was that the file store could be inspected on disk. In practice this isn't really the case (the hex-encoded file paths are hard to read), and it causes some serious issues: a Linux filesystem limit on file name length leaks into Metis, since it takes two hex characters to encode each file name character.
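To make the length problem concrete, here is a small sketch. It assumes that names are hex-encoded byte by byte (the exact Metis encoding is an assumption here) and that the filesystem caps each path component at 255 bytes, as ext4 does. Under those assumptions, a plain ASCII name longer than 127 characters can no longer be stored.

```ruby
# Assumed per-component limit in bytes (e.g. ext4's maximum file name length).
NAME_MAX = 255

# Hex-encode a name byte by byte: two hex characters per byte.
def hex_encode(name)
  name.unpack1("H*")
end

# Whether the encoded name still fits in a single path component.
def fits_on_disk?(name)
  hex_encode(name).bytesize <= NAME_MAX
end

puts hex_encode("ab")           # prints 6162
puts fits_on_disk?("a" * 127)   # 254 encoded bytes: prints true
puts fits_on_disk?("a" * 128)   # 256 encoded bytes: prints false
```

Non-ASCII names fare worse, because each multi-byte UTF-8 character expands to four or more hex characters.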
A better object store could use MD5s to organize files. Each file's content is stored in a directory structure according to its hash; the object store maintains a table mapping a file key (i.e., a full path including :project_name/:bucket_name) to an MD5. Newly uploaded files are stored at a temporary location until the object store can hash them, after which they are moved to their MD5 location.
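One way the upload step could look, as a sketch only: hash the file at its temporary location, then move it to a path derived from the digest. The helper name `store_upload` and the two-character fan-out directory (`root/5d/5d41...`) are assumptions for illustration, not the layout Metis adopted. The returned MD5 is what the object store would record against the file key.

```ruby
require "digest/md5"
require "fileutils"

# Hypothetical helper: hash a freshly uploaded file sitting at tmp_path and
# move it to a content-addressed location under root. Returns the MD5.
def store_upload(tmp_path, root)
  md5 = Digest::MD5.file(tmp_path).hexdigest
  dest_dir = File.join(root, md5[0, 2]) # assumed fan-out on the first two hex chars
  dest = File.join(dest_dir, md5)
  FileUtils.mkdir_p(dest_dir)
  if File.exist?(dest)
    # Identical content is already stored; the temporary copy is redundant.
    FileUtils.rm(tmp_path)
  else
    FileUtils.mv(tmp_path, dest)
  end
  md5
end
```

For example, uploading a file containing `hello` would store it at `root/5d/5d41402abc4b2a76b9719d911017c592`. This sketch does not handle concurrent uploads of the same content.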
This has several advantages: duplicate files don't take up extra space, and moving files from one path to another merely involves changing a database entry in the object store. This also abstracts the "object store" away from Metis' folder/file structure, paving the way for future use of other object stores (e.g., a cloud-based store or a Ceph store).
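The table side of the design can be sketched with an in-memory hash standing in for the database table. The class `ObjectStoreIndex` and the sample keys are hypothetical. The sketch shows the two advantages above: a move rewrites one entry, and two keys with identical content share a single blob.

```ruby
# Hypothetical in-memory stand-in for the object store's key -> md5 table;
# a real store would persist these entries in the database.
class ObjectStoreIndex
  def initialize
    @entries = {} # "project/bucket/path" => md5
  end

  def put(key, md5)
    @entries[key] = md5
  end

  def md5_for(key)
    @entries.fetch(key) # raises KeyError for unknown keys
  end

  # Moving a file only rewrites the table entry; no bytes on disk change.
  def move(from_key, to_key)
    raise KeyError, "no such key: #{from_key}" unless @entries.key?(from_key)
    raise ArgumentError, "destination exists: #{to_key}" if @entries.key?(to_key)
    @entries[to_key] = @entries.delete(from_key)
  end

  # Number of distinct blobs actually needed on disk.
  def blob_count
    @entries.values.uniq.size
  end
end

index = ObjectStoreIndex.new
index.put("labors/files/a.txt", "5d41402abc4b2a76b9719d911017c592")
index.put("labors/files/copy-of-a.txt", "5d41402abc4b2a76b9719d911017c592")
index.move("labors/files/a.txt", "labors/archive/a.txt")
puts index.blob_count # prints 1
```

Deleting a key would also require checking whether any other key still references the same MD5 before the blob itself can be removed. That reference check is left out here.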