[WIP] server-side file metadata operations with rpc (mdhim alternative) #485
Conversation
I've commented on this before in a separate PR here and here and here, but all these data structures are basically 90% copy-n-pasted seg_tree.c code. For example: It sounds like we need a generic tree library that abstracts away the ugliness of the low-level BSD tree.h that we have. Using seg_tree.c as a template, you can write a generic tree library that takes in a void *data with a custom comparator function. You'd then cast the void *data to your custom data structure (like extent_tree_node, unifyfs_inode, int, etc.) so it works for whatever you want to put in the tree. We should even rewrite seg_tree to use the generic tree implementation. The generic tree would look something like this:

int unifyfs_tree_init(struct unifyfs_tree* tree, int (*compare_func)(void* data1, void* data2));
int unifyfs_tree_add(struct unifyfs_tree* tree, void* data);
struct unifyfs_tree_node* unifyfs_tree_find(struct unifyfs_tree* tree, void* data);
void unifyfs_tree_remove(struct unifyfs_tree* tree, void* data);

This will reduce the amount of code substantially. We would also be able to verify that the generic tree library works simply by running the existing seg_tree tests (assuming we adapt seg_tree to use the generic tree underneath). That should exercise all of the tree's codepaths.
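A minimal sketch of what such a comparator-driven generic tree could look like, using a plain binary search tree for brevity. The `unifyfs_tree_*` names mirror the suggestion above; a real implementation would presumably wrap the existing BSD tree.h red-black tree rather than hand-roll node links like this.

```c
/* Sketch only: a void*-keyed tree driven by a caller-supplied comparator.
 * A production version would wrap BSD tree.h instead of a naive BST. */
#include <stdlib.h>

struct unifyfs_tree_node {
    void* data;
    struct unifyfs_tree_node* left;
    struct unifyfs_tree_node* right;
};

struct unifyfs_tree {
    struct unifyfs_tree_node* root;
    int (*compare_func)(void* data1, void* data2);
};

int unifyfs_tree_init(struct unifyfs_tree* tree,
                      int (*compare_func)(void* data1, void* data2))
{
    tree->root = NULL;
    tree->compare_func = compare_func;
    return 0;
}

int unifyfs_tree_add(struct unifyfs_tree* tree, void* data)
{
    /* walk down to the empty link where this item belongs */
    struct unifyfs_tree_node** link = &tree->root;
    while (*link != NULL) {
        int cmp = tree->compare_func(data, (*link)->data);
        link = (cmp < 0) ? &(*link)->left : &(*link)->right;
    }
    *link = calloc(1, sizeof(**link));
    if (*link == NULL) {
        return -1;
    }
    (*link)->data = data;
    return 0;
}

struct unifyfs_tree_node* unifyfs_tree_find(struct unifyfs_tree* tree,
                                            void* data)
{
    struct unifyfs_tree_node* node = tree->root;
    while (node != NULL) {
        int cmp = tree->compare_func(data, node->data);
        if (cmp == 0) {
            return node;
        }
        node = (cmp < 0) ? node->left : node->right;
    }
    return NULL;
}
```

The caller's comparator does the casting back from `void*`, so the same code serves `extent_tree_node`, `unifyfs_inode`, or plain ints.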
force-pushed af75dbe to 78e550e
@tonyhutter Thanks for the feedback, and I agree that we need some refactoring there. I will address this after some other parts are done.
    /* set return value
     * TODO: check if we have an error and handle it */
    ret = out.ret;
On these return values from the children, I think we need to compute an OR operation or something. This code will use the return value from the last child, but I think we want to report an error if any child reports an error.
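A hedged sketch of the kind of folding suggested here: collect the children's return codes and let any child's error win over a later success. This assumes 0 means success and any nonzero value is an error code worth reporting; the helper name and the "keep the first error" policy are illustrative choices, not the PR's actual code.

```c
/* Illustrative only: fold return codes from N children so an error from
 * any child is reported, rather than just the last child's value.
 * Assumes 0 == success; the first nonzero code seen is preserved. */
#include <stddef.h>

static int fold_child_rets(const int* child_rets, size_t n_children)
{
    int ret = 0;
    for (size_t i = 0; i < n_children; i++) {
        if (ret == 0 && child_rets[i] != 0) {
            ret = child_rets[i]; /* remember the first error */
        }
    }
    return ret;
}
```

If the error codes are bit flags rather than enum values, a plain `ret |= child_rets[i]` OR-accumulation would work too, which may be what the comment has in mind.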
Thanks @adammoody . Somehow I wasn't able to check your reviews last week. I will fix this.
We could drop the int2void if that's no longer needed. I think it was just used to look up a collective state structure given an integer tag, but that's not necessary with the improved collectives using non-blocking margo calls.
Do we still have the server-side local extent optimization? That was on the old branch in this commit:

Oh, now I see this was dropped in a later commit. Never mind.
@tonyhutter , refactoring is a good idea. It will be more involved than just those four functions, though. In the segment/extent trees, adding a node is more like a merge operation, where the node being inserted can merge with the node just before and just after if they line up just right. Whereas in the inode tree, adding a new node is a simple insert operation. So we likely need to define an "add" function pointer in addition to the "compare" function.

Another wrinkle is that the client and server structures may require different locking. The seg_tree for the client already includes pthread locks. The server potentially needs to deal with both pthreads and margo threads, because it uses both internally. It's not yet clear whether we need both pthread locks and margo locks, or whether just one of those will suffice, and if so which one. Anyway, I think that means we'll need some sort of "lock" and "unlock" function pointers.

Then there are a few big pieces missing on the server side that this PR is trying to address as it replaces MDHIM. We still need to add support to look up extent info in the read path, and we need to add support for distributed extent data and queries in addition to the broadcasted extents that we have now. Once all of that is done, we'll have a big mess of potential deadlocks and race conditions to think about, which will drive us to figure out the pthread/margo locking requirements.

It's not clear how much the server-side data structures might diverge by the time we're done, or whether we'll even need them at all. In fact, you can see that int2void was tossed out completely, since it's no longer used anywhere after the collective rewrite, so one of those three copy-and-paste structures is already gone. I think we'll have a better picture of how different/similar these structures are after the main work in the PR is done, and then it should be obvious how best to refactor and clean up.
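The "add" and "lock"/"unlock" function pointers discussed here could hang off a per-tree operations table. The sketch below is purely illustrative (none of these names are from the PR): the extent tree would plug a merging insert into `add` while the inode tree plugs in a plain insert, and each user supplies its own lock callbacks, here backed by a pthread mutex; a margo/Argobots build could swap in `ABT_mutex` wrappers behind the same two pointers.

```c
/* Hypothetical per-tree ops table: compare + add are customized per data
 * type, lock/unlock are customized per threading model. Illustrative
 * names only; not actual UnifyFS code. */
#include <pthread.h>

struct unifyfs_tree; /* opaque generic tree type */

struct unifyfs_tree_ops {
    int  (*compare)(void* a, void* b);
    int  (*add)(struct unifyfs_tree* t, void* data); /* insert or merge */
    void (*lock)(void* lock_arg);
    void (*unlock)(void* lock_arg);
    void* lock_arg; /* e.g. a pthread_mutex_t* or ABT_mutex */
};

/* pthread-backed lock callbacks for the client-side seg_tree case */
static void tree_pthread_lock(void* arg)
{
    pthread_mutex_lock((pthread_mutex_t*) arg);
}

static void tree_pthread_unlock(void* arg)
{
    pthread_mutex_unlock((pthread_mutex_t*) arg);
}
```

The generic tree would then call `ops->lock(ops->lock_arg)` around every operation without knowing which threading package is underneath.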
force-pushed 2acfb37 to bb69277
TEST_CHECKPATCH_SKIP_FILES=common/src/unifyfs_configurator.h
- merge the collective status into unifyfs_coll_state_t
- revive some necessary metadata logic from the old code
- propagate errors in the collective fops
- apply Swen's fix
- fixed warnings and segmentation faults; now deadlocks -- we exchange buffers successfully, but still need to merge them (causing a deadlock now)
- still needs some cleaning in unifyfs_inode
- filesize, truncate, unlink, metaset: remove the previous synchronous implementations
@sandrain , how hard would it be to reapply the sequence of commits from the original PR that you used to get to this PR? I started another temporary branch, which is the original PR rebased on the current dev branch. I need to test that to check that it still works; I've only checked that it compiles so far. Anyway, I was hoping we could layer the commits you and @boehms made on top of the original PR, if possible.
@adammoody I see the margotree branch is synced with dev. I will apply the updates here onto the original margotree branch.
completely. Instead of using a distributed kv store (mdhim), each server daemon maintains information about all files in memory. The file information includes file attributes and extent information (extent tree). When the file information needs to be shared, we use collective operations (broadcast/reduce).
- new data structures in server: unifyfs_inode_tree, unifyfs_inode
- new collective operations: unifyfs_collectives.h, unifyfs_collectives.c
- connecting the previous operations with mdhim
- re-create fsync rpc
- connecting the collective operations (except read/mread)
- also, discarding the int2void tree that is not used anymore
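Server-side broadcast/reduce collectives like the ones this commit wires up are commonly structured as a k-ary fan-out tree over the server ranks. The helper below is a generic illustration of that pattern only; the PR's actual collectives live in unifyfs_collectives.c and may be organized differently.

```c
/* Generic k-ary broadcast-tree math, illustrative only: given my rank,
 * the root rank, the server count, and fan-out k, compute my parent and
 * children. Ranks are rotated so the root acts as virtual rank 0. */
#include <stddef.h>

/* writes up to k child ranks into child_ranks; returns how many */
static size_t bcast_tree_children(int rank, int root, int nranks, int k,
                                  int* child_ranks)
{
    int vrank = (rank - root + nranks) % nranks; /* rotate root to 0 */
    size_t n = 0;
    for (int i = 1; i <= k; i++) {
        int vchild = vrank * k + i;
        if (vchild < nranks) {
            child_ranks[n++] = (vchild + root) % nranks;
        }
    }
    return n;
}

static int bcast_tree_parent(int rank, int root, int nranks, int k)
{
    int vrank = (rank - root + nranks) % nranks;
    if (vrank == 0) {
        return -1; /* root has no parent */
    }
    return (((vrank - 1) / k) + root) % nranks;
}
```

A broadcast walks parent-to-children down this tree; a reduce folds child results back up the same links, which is where the error-propagation question from the review comes in.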
force-pushed bb69277 to 095b94b
I am closing this PR since these changes are now in margotreenew.
This is a draft PR for removing the mdhim dependency from our metadata management. The initial PRs were #427 and #466. This is particularly based on #427, which has diverged quite a lot from the dev branch. Some notable changes are:

Group RPC operations for synchronizing file metadata across servers

#427 elaborates on this. The current group operations include
Command line option to specify the operation mode: mdhim or rpc

The current mdhim-based operations are all preserved for now. The runtime argument -z specifies whether the UnifyFS server uses mdhim (-z mdhim) or the rpc alternative (-z rpc).

Inode abstraction in the server

When the server runs in rpc mode (-z rpc), the new data structures (inode and inode tree) are initialized to manage the file attributes (including the extent trees) for each file.

TODO
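To make the inode abstraction concrete, here is a hypothetical sketch of what a server-side inode carrying attributes plus extent information might look like. Every name and field here is an illustrative guess (the extent "tree" is even flattened to a list for brevity), not the PR's actual `unifyfs_inode` definition.

```c
/* Hypothetical in-memory inode: global file id, attributes, and extents,
 * guarded by a lock. Illustrative only; not the PR's real structures. */
#include <pthread.h>
#include <stddef.h>
#include <sys/stat.h>

struct extent_node {
    size_t start;             /* first byte offset of the extent */
    size_t end;               /* last byte offset of the extent */
    int owner_rank;           /* server holding the data */
    struct extent_node* next; /* stand-in for real extent-tree links */
};

struct unifyfs_inode {
    int gfid;                 /* global file id */
    struct stat attrs;        /* file attributes */
    pthread_mutex_t lock;     /* protects attrs and extents */
    struct extent_node* extents;
};

/* logical file size = highest extent end offset + 1 */
static size_t inode_filesize(struct unifyfs_inode* ino)
{
    size_t size = 0;
    pthread_mutex_lock(&ino->lock);
    for (struct extent_node* e = ino->extents; e != NULL; e = e->next) {
        if (e->end + 1 > size) {
            size = e->end + 1;
        }
    }
    pthread_mutex_unlock(&ino->lock);
    return size;
}
```

With every server holding such inodes in memory, operations like filesize become local scans once the extent information has been broadcast, instead of mdhim key-value lookups.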
More TODO

fsync(2))

Types of changes