Conversation
server/src/unifyfs_tree.c
Outdated
static void filesize_response_forward(unifyfs_state_filesize_t* st)
{
    printf("%d: BUCKEYES response_forward\n", glb_pmi_rank); fflush(stdout);
I think we need to find a way to get a "BUCKEYES" printf in the final product ;)
Yeah, keen eye :-). Needed some way to find my new debug messages, and for some reason that name stands out to me. O-H!
Force-pushed from 9755176 to 5ed78f0
@CamStan , I'll check this again. I see my other PRs are running the tests just fine, so it must be something I messed up.
I cleared Travis' cache and ran this again. Watching it, it appears to hang after it gets to that last test that prints successfully, then eventually fails out. I'm not seeing anything obvious, but I keep running into these situations where the unit tests hang in various places. Still haven't been able to figure out why, but there could be a bigger underlying issue that keeps getting bumped.
@CamStan , I did find a bug that I think was leading to a segfault. When I cleaned that up, the tests started passing. So it was my fault after all. Sorry for the false alarm, but thanks for helping me figure it out.
Force-pushed from fb30c21 to bdb5d39
TEST_CHECKPATCH_SKIP_FILES=common/src/unifyfs_configurator.h
Is there any way you can extend seg_tree to get what you want, rather than copy-n-pasting that code into separate extent_tree/gfid2ext_tree/int2void implementations? Maybe add a
- merging the collective status into unifyfs_coll_state_t
}

/* Remove and free all nodes in the unifyfs_inode_tree. */
void unifyfs_inode_tree_destroy(
Are we also freeing the tree structure itself here, and if so, should we destroy the rwlock?
Yes, you are right that we need to destroy the lock. But the structure is allocated by the caller for now.
Reiterating my earlier comment, are there any plans to extend seg_tree to get rid of the separate, but very similar tree implementations of extent_tree, int2void, unifyfs_inode_tree?
That makes sense, but we're saving that step for later. It may be that the requirements of these different data structures diverge as more functionality is added. We've kept them separate to allow for that. We can revisit this idea down the road.
@sandrain and @boehms , here are the last bits we need to add so that we can replace our use of MDHIM for storing file attributes. Would you mind adding these in? Considering how to code up a replacement for the extra logic we have in unifyfs_set_file_attribute(), it basically comes down to selectively updating the size and is_laminated fields in the metadata. I think we have the following cases:
- client-to-server metaset_rpc with create==1:
- client-to-server metaset_rpc with create==0:
- truncate:
- laminate:

Once we have that, we should be able to replace all calls to unifyfs_set_file_attribute() and unifyfs_get_file_attribute().
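The per-case details above were elided in the thread, but since every case reduces to selectively updating the size and is_laminated fields, one way to express that is a field mask over the metadata, so each operation only touches the fields it is allowed to change. This is a hypothetical sketch; the names (file_attr_t, attr_apply, the ATTR_* flags) are illustrative, not the UnifyFS API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bitmask of metadata fields an operation may update. */
#define ATTR_SIZE       (1u << 0)
#define ATTR_LAMINATED  (1u << 1)

typedef struct {
    uint64_t size;
    int is_laminated;
} file_attr_t;

/* Copy only the masked fields from `in` into `cur`, leaving
 * everything else untouched. */
static void attr_apply(file_attr_t* cur, const file_attr_t* in, unsigned mask)
{
    if (mask & ATTR_SIZE) {
        cur->size = in->size;
    }
    if (mask & ATTR_LAMINATED) {
        cur->is_laminated = in->is_laminated;
    }
}
```

Under this scheme, laminate might pass ATTR_SIZE | ATTR_LAMINATED while a plain metaset passes only ATTR_SIZE; the exact mask per case would follow whatever semantics the team settles on.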
Please include this step in this PR though; it's not worth racking up the technical debt, especially since this PR isn't time sensitive. The seg_tree code already has test cases, so we know it works. It should be extended in this PR to be a more general tree library that would accommodate these new use cases. It should be doable if you add a
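To illustrate the kind of generalization being suggested: a tree node can carry an opaque payload pointer and a caller-supplied comparator, so one tree implementation can back extent_tree, int2void, and unifyfs_inode_tree alike. This is a hedged sketch with illustrative names (gnode_t, gnode_insert), using an unbalanced binary tree for brevity where the real seg_tree uses a balanced one.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct gnode {
    struct gnode* left;
    struct gnode* right;
    void* payload;                 /* type-specific data lives behind this */
} gnode_t;

/* Comparator supplied by each user of the tree. */
typedef int (*gnode_cmp_fn)(const void* a, const void* b);

/* Insert payload into an (unbalanced, for brevity) binary search tree. */
static gnode_t* gnode_insert(gnode_t* root, void* payload, gnode_cmp_fn cmp)
{
    if (root == NULL) {
        gnode_t* n = calloc(1, sizeof(*n));
        n->payload = payload;
        return n;
    }
    if (cmp(payload, root->payload) < 0) {
        root->left = gnode_insert(root->left, payload, cmp);
    } else {
        root->right = gnode_insert(root->right, payload, cmp);
    }
    return root;
}

/* Find the node whose payload compares equal to `key`. */
static gnode_t* gnode_find(gnode_t* root, const void* key, gnode_cmp_fn cmp)
{
    if (root == NULL) {
        return NULL;
    }
    int c = cmp(key, root->payload);
    if (c == 0) {
        return root;
    }
    return gnode_find(c < 0 ? root->left : root->right, key, cmp);
}

/* Example comparator for int payloads (an int2void-style use). */
static int int_cmp(const void* a, const void* b)
{
    int x = *(const int*)a;
    int y = *(const int*)b;
    return (x > y) - (x < y);
}
```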
- applying Swen's fix - fixed warnings, segmentation faults, now deadlock
-- we exchange buffers successfully, but need to merge them (causing a deadlock now)
- still needs some cleaning in unifyfs_inode
- filesize, truncate, unlink, metaset - remove the previous synchronous implementations
Closing as margotree work has been merged.
Using margo rpcs between servers, this implements a tree-based reduction to compute the file size across servers.
One challenge here is that these operations are on-demand, non-blocking collectives. When a file size request comes in to a server, it allocates a temporary structure to track the state of the collective. It assigns a tag, unique to that local server, that the server will use to look up its state structure on later rpcs. It forwards the tag to any children, and the children include that tag in their reply.
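The tag-to-state bookkeeping described above can be sketched as a small table keyed by a server-local counter. All names here (coll_state_t, coll_state_alloc) are illustrative assumptions, not the actual UnifyFS structures, and the fixed-size table ignores wraparound collisions for brevity.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_COLLS 64

typedef struct {
    int tag;           /* server-local tag echoed back in child replies */
    int num_children;  /* replies we expect before responding to parent */
    int num_replies;   /* replies received so far */
    long filesize;     /* running max of child-reported sizes */
} coll_state_t;

static coll_state_t* coll_table[MAX_COLLS];
static int next_tag = 0;

/* Allocate a state structure for a new collective and register it
 * under a tag unique on this server. */
static coll_state_t* coll_state_alloc(int num_children)
{
    coll_state_t* st = calloc(1, sizeof(*st));
    st->tag = next_tag++;
    st->num_children = num_children;
    coll_table[st->tag % MAX_COLLS] = st;
    return st;
}

/* Look up the state structure for a tag carried in a child's reply. */
static coll_state_t* coll_state_lookup(int tag)
{
    coll_state_t* st = coll_table[tag % MAX_COLLS];
    return (st != NULL && st->tag == tag) ? st : NULL;
}
```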
To implement a file size lookup, the server calls a file size request rpc on each of its children in the tree. Each child in turn forwards that request down the tree. At the leaves, the servers lookup their local max write offset for the given file and respond to their parent. On each file size response, a server takes the max of the incoming message with its current max and saves that on the state structure. Once it receives replies from all children, it forwards a reply to its own parent and releases the state structure.
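The parent/child relationships in such a reduction are typically computed from server ranks in a k-ary tree rooted at rank 0. This is a sketch under that assumption; the function names are illustrative, not the UnifyFS tree API.

```c
#include <assert.h>

/* Parent of `rank` in a k-ary tree rooted at rank 0; -1 for the root. */
static int tree_parent(int rank, int k)
{
    return (rank == 0) ? -1 : (rank - 1) / k;
}

/* i-th child (0 <= i < k) of `rank`, or -1 if that child does not
 * exist among `nranks` servers (i.e., `rank` is a leaf there). */
static int tree_child(int rank, int k, int i, int nranks)
{
    int child = rank * k + i + 1;
    return (child < nranks) ? child : -1;
}
```

A server forwards the request to each existing child, and a server with no children (all tree_child calls return -1) is a leaf that replies immediately with its local max offset.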
This type of tree-based bcast/reduce can be used for other operations like truncate and unlink. If we at some point opt to replicate the metadata info for each file on every server, this could be used for operations that act on the stat info too like chmod, chgrp. Large read operations could be done with tree-based scatter/bcast of extents, followed by tree-based gathers to collect the data replies.
The second commit in this PR adds server-based segment trees to track file extents on each server. These segment trees are updated when new file extents are fsync'd from the client. The file size reduction is modified to use these segment trees to get the max extent offset on each node as input to the reduction. Additionally, the segment trees define operations for file truncate and unlink to be used in a future commit.
The last commit uses the new server-based segment tree to service client read requests. For each read request from the client, the server processes as much as it can from its local segment tree. If any request has leftover segments after this local processing, the server falls back to querying the global key-value store and requests data from other servers. This eliminates remote key/value lookups in the case a client is reading data local to the node, which should speed up those reads. Also, it will flatten writes across fsync operations from a single process as well as processes from the same node. This optimization is not valid in the case that a process on another node writes to the same byte, since the local server would then retrieve its stale local value. It can be used with truncate, though, so long as the extents are truncated appropriately.
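The local-first read path above amounts to interval subtraction: satisfy what the local extents cover, and collect the uncovered holes for the global key/value fallback. The sketch below assumes the local extents arrive as a sorted, non-overlapping array; the names (range_t, find_gaps) are illustrative, not the UnifyFS implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    long start;  /* inclusive */
    long end;    /* exclusive */
} range_t;

/* Fill `gaps` with the subranges of the request [start, end) that
 * are NOT covered by the sorted local extents `ext`; these are the
 * ranges that must fall back to the global key/value store.
 * Returns the number of gaps written. */
static size_t find_gaps(const range_t* ext, size_t n,
                        long start, long end,
                        range_t* gaps, size_t max_gaps)
{
    size_t ngaps = 0;
    long pos = start;  /* cursor: everything before `pos` is resolved */
    for (size_t i = 0; i < n && pos < end; i++) {
        if (ext[i].end <= pos) {
            continue;   /* extent entirely before the cursor */
        }
        if (ext[i].start >= end) {
            break;      /* extent entirely after the request */
        }
        if (ext[i].start > pos && ngaps < max_gaps) {
            gaps[ngaps].start = pos;  /* uncovered hole before extent */
            gaps[ngaps].end = (ext[i].start < end) ? ext[i].start : end;
            ngaps++;
        }
        if (ext[i].end > pos) {
            pos = ext[i].end;  /* advance past the covered region */
        }
    }
    if (pos < end && ngaps < max_gaps) {
        gaps[ngaps].start = pos;  /* tail of the request not covered */
        gaps[ngaps].end = end;
        ngaps++;
    }
    return ngaps;
}
```

When find_gaps returns 0, the read was fully local and no remote lookup is needed, which is the fast path the commit is after.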
Description
Motivation and Context
How Has This Been Tested?
Types of changes
Checklist: