there should be an mdu command #143
Comments
bonus: it should calculate usage by adding up the |
This would be really valuable. Ideally, the tool should take into account both (1) the number of copies (as reported from the HTTP headers), as well as (2) the physical size used, which you can get by calling stat(1) or stat(2) from a job and looking at the number of blocks. It could report both logical size and physical size. It would be really neat if it had an option to export in ncdu's format, which means we could import it into that tool to help visualize space used.

The way I'd probably go about this is to have the tool run an mfind on its arguments looking only for objects and then run a two-phase job on those objects: the first phase would be a C program that just calls stat(2) and reports the object name, the logical size, the number of blocks used, the size of each block, and anything else that would be useful there. The second phase would be a reduce phase that would sort the inputs by object name and then create the JSON structure that ncdu expects.

The trickiest part of that is probably dealing with the encoding between phases. If we just make the mapper a Node program, then we could just JSON.stringify the name, or even the entire record, but the startup cost would be substantially larger. That may not matter too much, given Manta's parallelism.
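For illustration, here is a rough sketch of the Node variant of that two-phase idea: a map step that stats one file and emits a JSON line, and a reduce step that sorts by name and totals both sizes. The function names (statRecord, reduceRecords) and the record fields are made up for this example, and the physical size assumes st_blocks is counted in 512-byte units. It does not attempt to build ncdu's export structure.

```javascript
'use strict';
const fs = require('fs');

// Map phase (sketch): stat one object and emit a single JSON line.
// JSON.stringify takes care of encoding awkward characters in the name.
// ASSUMPTION: st.blocks is in 512-byte units, as on Linux and illumos.
function statRecord(name) {
    const st = fs.statSync(name);
    return JSON.stringify({
        name: name,
        logical: st.size,
        blocks: st.blocks,
        blksize: st.blksize,
        physical: st.blocks * 512
    });
}

// Reduce phase (sketch): parse the records, sort them by object name,
// and total the logical and physical sizes.
function reduceRecords(lines) {
    const records = lines.map((line) => JSON.parse(line));
    records.sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
    const totals = { logical: 0, physical: 0 };
    for (const r of records) {
        totals.logical += r.logical;
        totals.physical += r.physical;
    }
    return { records: records, totals: totals };
}
```

In a real job the map step would run once per object and the reduce step would read the mapper output line by line from stdin; both are left out here to keep the sketch short.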
Note that the etag for a snaplink'd file is the same. I wonder if it is |
That's a good point, although I don't know if we want to commit to that. I think it would be reasonable if an Etag for the same content was the same, even if it was a separate copy.
FWIW, obj.etag appears to be the metadata objectId: https://github.com/joyent/manta-muskie/blob/054fcc04fe724a0e319d6dedb185632c0f0c61bf/lib/obj.js#L520 Not sure that is a promised interface.
Shouldn't it capture other metadata as well? content-type, 'm-*' headers. |
Possibly. I'd have to review RFC 2616. Even if so, is it plausible that an implementation could allow separate copies of the same data with the same headers to have the same etag? On the other hand, unless we add a new header for object-id, I'm not sure how else mdu could deal with snaplinks. |
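To make the snaplink question concrete, here is a minimal sketch of what etag-based deduplication could look like. It leans on exactly the behavior questioned above: it assumes that entries with the same etag refer to the same stored object, which is not a promised interface. The function name dedupedUsage and the entry shape ({ name, etag, size }) are hypothetical.

```javascript
'use strict';

// Sum sizes while counting each distinct etag only once.
// ASSUMPTION: entries sharing an etag share one stored object (for
// example, an object and its snaplinks). If separate copies could ever
// carry the same etag, this would undercount.
function dedupedUsage(entries) {
    const seen = new Set();
    let total = 0;
    for (const e of entries) {
        if (seen.has(e.etag)) {
            continue;
        }
        seen.add(e.etag);
        total += e.size;
    }
    return total;
}
```

If a dedicated object-id header were ever added, the same loop could key on that instead of the etag.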
@davepacheco I wonder if it would be faster to just do directory listings and use the "size" and "durability" fields for listed files:
That is logical instead of physical block usage, which is a bit of a departure from standard |
Yes, a tool that looked at logical usage would be much easier to build and would run much faster. For end users, that's probably more appropriate, too. But as operators, we've wasted lots of time in the past clearing out usage of lots of logical space that freed up very little physical space, and I really don't want to do that again. Ideally, this tool would have two modes, but it's the slower-running, harder-to-build one that we really want at the moment. |
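For the logical-usage mode, a rough sketch of the arithmetic from directory listings might look like the following. It assumes each listing entry is already parsed into an object with "size" and "durability" fields, as mentioned above; the "type" field, the function name logicalUsage, and treating a missing durability as one copy are assumptions made for this example.

```javascript
'use strict';

// Logical usage (sketch): bytes stored = size * number of copies.
// ASSUMPTIONS: entries have a type of 'object' or 'directory', and a
// missing durability is treated as a single copy.
function logicalUsage(entries) {
    let bytes = 0;
    let objects = 0;
    for (const e of entries) {
        if (e.type !== 'object') {
            continue; // directories carry no object data of their own
        }
        bytes += e.size * (e.durability || 1);
        objects += 1;
    }
    return { bytes: bytes, objects: objects };
}
```

This says nothing about compression or snaplinks, which is why it can diverge so far from the physical space actually freed.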
I've started work on this... might still be a while tho. |
@davepacheco I concur that a du-like tool would be awesome. Having had to go through this to try to understand where "all the disk space went", I'm wondering if it would be easier/possible to expose the compression ratio per file in the directory listing? Then if one file was 1x compressed vs. another 6x compressed, I think it would get you to the physical sizing a "bit" easier.
@tebbers That would definitely be nice, but it's surprisingly difficult. The problem is that ZFS doesn't appear to calculate the correct number of physical bytes used until the transaction group that created the file has been written out, which is likely well after the object's metadata has been committed. We could populate this information asynchronously, but that's non-trivial itself and leaves a window shortly after object creation where the information would be incorrect. |
@davepacheco you have any pointers? I'm trying to create a du -ks in shell scripts by doing mfind's and mget's but it's pretty time consuming. |
I've been using manta-mdu, which is close to the point where it could be polished and brought into node-manta. |
client.ftw is the perfect API for creating a du-like command: something that can calculate usage on a file/directory basis. I may start working on this, so I'm opening this issue as more of a placeholder than anything.