-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use closure tables to implement file cache without full path column #4209
Comments
A problem is that the most used query on that table is selecting a row by path (or rather path_hash), closure tables can speedup getting all children for copy/move operations but we will still need to keep the full path somehow if we want to keep fast reads |
@butonic is this still a problem? What’s the call on this? |
It is a completely different approach to storing the file tree in the database. If someone feels inclined to do some performance testing I'd still like to try this. I have a branch somewhere but it needs to add a join for every subdirectory. Depending on the DB that might be costly ... |
moving to someday maybe milestone |
Regarding the most used query being the lookup of the fileid by path: Those are only SELECTS. They can be cached and scaled easily. UPDATES cant. We could also use self joins to look up more than one directory depth in a single query. The only theoretical problem I see is additional space requirement because the closure table grows exponentially: https://b2berry.files.wordpress.com/2011/12/b2berryscalingrtctables1.pdf I prefer closure tables becaus we have to update mtime, etag and size up to all tree roots (multiple because of sharing). Closure tables allow us to select or update all parents in one query. |
@DeepDiver1975 @cmonteroluque @karlitschek bringing this to the top of the backlog to help open the discussion. |
to be honest - with a different database structure we can gain speed up - maybe - this has to be analysed. But - we have a more pressing issue with the database: concurrent access. But the whole way we operate on the database is still fishy and requires a detailed analysis to ensure data integrity. There have been thoughs from @dragotin on this topic - thanks a lot once more. I prefer to first work on this area instead of restructuring the schema. |
Just to be clear: I totally agree with @DeepDiver1975 here. Concurrent access is a problem because our updates / etag and mtime propagation are not atomic and we don't handle transaction rollbacks correctly. AFAICT Closure tables solve the data integrity problems. BUT the only way to know for sure if they solve all our problems without creating new ones is by trying them: do the necessary implementation and test with huge data sets. |
@butonic I suppose it would be possible to make such experiments as a separate app. The only change needed in core is a config switch to make it possible to use another Storage class as home storage, so we could switch it to that one. |
Well or it could simply be an external storage backend (provided by the app) that actually uses another local folder to store the files. Then mount it somewhere for testing. |
@butonic where's the POC branch you mentioned ? |
Also found this Doctrine extension that brings nested sets and closure tables: https://github.com/Atlantic18/DoctrineExtensions/blob/master/doc/tree.md |
Hmm that sounds cool. But that might not work for sizes, as the size will be different for each folder. Unless there is an aggregate query that can automatically calculate the sizes on the go while updating. |
@butonic relevant for your rawstorage experiments as well ? |
no but we could try implementing a local storage with a closure table based cache ... could ... if we had the time ... |
other cases where closure tables would come in handy:
|
I need to track this somewhere, so I might just as well do it publicly. Past and current filecache implementations rely on a path column containing the full path of file. Currently it is limited to 512 chars: https://github.com/owncloud/core/blob/master/db_structure.xml#L224
There has been a lot of research on how to store tree like structures in relational databases. A good starting point is this stackoverflow entry. Another comparison can be found in One more Nested Intervals vs. Adjacency List comparison.
The most promising approach which I consider superior to eg. Nested Sets are Closure Tables, due to the clear meaning of column values. I know few people who actually understand nested sets ...
I already had a rudimentary version of them working in the OC4.5 file cache ... dunno if the code is still in some branch on one of my local machines ...
cc @icewind1991 @bartv2 opinions?
The text was updated successfully, but these errors were encountered: