there should be an mdu command #143

bahamas10 · 2013-11-01T00:57:49Z

client.ftw is the perfect API for creating a du-like command: something that can calculate usage on a per-file and per-directory basis.

I may start working on this, so I'm opening this issue more as a placeholder than anything else.

bahamas10 · 2013-11-01T00:58:37Z

Bonus: by default it should calculate usage by adding up the size attributes, but with a command-line switch it should calculate Manta usage by adding up size * copies.
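A minimal sketch of the two accounting modes described above, assuming each object is available as a record with a logical "size" in bytes and a "copies" count (the record shape and function names are hypothetical, not an existing node-manta API):

```python
from typing import Iterable, Mapping

# Hypothetical record shape: {"size": <logical bytes>, "copies": <replica count>}.


def logical_usage(objects: Iterable[Mapping[str, int]]) -> int:
    """Default mode: add up the objects' size attributes."""
    return sum(obj["size"] for obj in objects)


def manta_usage(objects: Iterable[Mapping[str, int]]) -> int:
    """Switch-enabled mode: every stored copy counts against usage."""
    return sum(obj["size"] * obj["copies"] for obj in objects)


if __name__ == "__main__":
    objs = [{"size": 15, "copies": 2}, {"size": 1052, "copies": 2}]
    print(logical_usage(objs), manta_usage(objs))  # 1067 2134
```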

davepacheco · 2016-06-27T16:51:34Z

This would be really valuable. Ideally, the tool should take into account both (1) the number of copies (as reported from the HTTP headers), as well as (2) the physical size used, which you can get by calling stat(1) or stat(2) from a job and looking at the number of blocks. It could report both logical size and physical size. It would be really neat if it had an option to export in ncdu's format, which means we could import it into that tool to help visualize space used.
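To illustrate the logical vs. physical distinction, here is a sketch in Python rather than C, assuming the job runs on a Unix-like system where Python's os.stat exposes st_blocks. Python documents st_blocks as a count of 512-byte blocks; st_blksize is the preferred I/O size and is not the unit of st_blocks. The "asize"/"dsize" keys are chosen to mirror ncdu's apparent-size and disk-usage fields.

```python
import os
import sys

# Python documents os.stat_result.st_blocks as a count of 512-byte blocks
# (Unix only). st_blksize is the preferred I/O size, not the unit of st_blocks.
BLOCK_UNIT = 512


def usage(path: str) -> dict:
    """Report logical and physical usage for one file, as a mapper might."""
    st = os.stat(path)
    return {
        "name": path,
        "asize": st.st_size,                 # logical size in bytes
        "dsize": st.st_blocks * BLOCK_UNIT,  # bytes actually allocated on disk
        "blocks": st.st_blocks,
        "blksize": st.st_blksize,
    }


if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(usage(p))
```

On a compressing filesystem such as ZFS, "dsize" can be much smaller than "asize", which is exactly the gap between logical and physical usage discussed here.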

The way I'd probably go about this is to have the tool run an mfind on its arguments, looking only for objects, and then run a two-phase job on those objects: the first phase would be a C program that just calls stat(2) and reports the object name, the logical size, the number of blocks used, the size of each block, and anything else that would be useful there. The second phase would be a reduce phase that would sort the inputs by object name and then create the JSON structure that ncdu expects.
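The reduce phase could look roughly like the sketch below. It assumes each mapper record is {"name": full path, "asize": logical bytes, "dsize": physical bytes} and follows ncdu's JSON export layout as documented by ncdu: a top-level array of major version, minor version, a metadata object, and the root directory, where each directory is an array whose first element is its info object. The "mdu" progname and "0.0" progver are placeholders.

```python
import json
import time
from typing import Dict, List


def build_ncdu(root: str, records: List[dict]) -> list:
    """Build an ncdu JSON export (major version 1, minor 0) from object records.

    Each record is assumed to look like
    {"name": "<full object path>", "asize": <logical bytes>, "dsize": <physical bytes>}.
    """
    root_dir: list = [{"name": root}]  # a directory is [info, *entries]
    dirs: Dict[str, list] = {root: root_dir}

    def get_dir(path: str) -> list:
        # Create any intermediate directories that have no record of their own.
        if path not in dirs:
            parent, _, base = path.rpartition("/")
            node = [{"name": base}]
            get_dir(parent).append(node)
            dirs[path] = node
        return dirs[path]

    # Sorting by name, as in the proposed reduce phase, keeps output stable.
    for rec in sorted(records, key=lambda r: r["name"]):
        if not rec["name"].startswith(root + "/"):
            raise ValueError("object outside root: " + rec["name"])
        parent, _, base = rec["name"].rpartition("/")
        get_dir(parent).append(
            {"name": base, "asize": rec["asize"], "dsize": rec["dsize"]})

    # Placeholder metadata; ncdu expects progname, progver and timestamp.
    meta = {"progname": "mdu", "progver": "0.0", "timestamp": int(time.time())}
    return [1, 0, meta, root_dir]


if __name__ == "__main__":
    recs = [{"name": "/acct/stor/tmp/afile", "asize": 15, "dsize": 512}]
    print(json.dumps(build_ncdu("/acct/stor", recs)))
```

The output could then be loaded with ncdu's import option (ncdu -f) for browsing.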

The trickiest part of that is probably dealing with the encoding between phases. If we just make the mapper a Node program, then we could just JSON.stringify the name, or even the entire record, but the startup cost would be substantially larger. That may not matter too much, given Manta's parallelism.
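One way to sidestep the inter-phase encoding problem is to emit one JSON object per line, as suggested above for a Node mapper. The sketch below shows the same idea in Python (the function names are hypothetical): json.dumps escapes a newline embedded in an object name, so each record stays on exactly one line and the reducer can split on newlines safely.

```python
import json
from typing import Iterable, Iterator


def encode_records(records: Iterable[dict]) -> str:
    """Mapper side: one JSON object per line.

    json.dumps escapes control characters (including a newline embedded in
    an object name) and, by default, all non-ASCII text, so each record is
    guaranteed to occupy exactly one physical line.
    """
    return "".join(json.dumps(rec) + "\n" for rec in records)


def decode_records(text: str) -> Iterator[dict]:
    """Reducer side: parse the mapper output back into records."""
    for line in text.split("\n"):
        if line:
            yield json.loads(line)
```

A C mapper would need to implement equivalent escaping itself, which is the added complexity being traded against Node's startup cost.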

trentm · 2016-06-27T17:19:46Z

[trent.mick@us-east /trent.mick/stor/tmp/snap]$ cat afile
this is a file
[trent.mick@us-east /trent.mick/stor/tmp/snap]$ cat samecontent
this is a file
[trent.mick@us-east /trent.mick/stor/tmp/snap]$ ln afile snaplink
[trent.mick@us-east /trent.mick/stor/tmp/snap]$ ^D

$ minfo /trent.mick/stor/tmp/snap/afile
HTTP/1.1 200 OK
etag: 3c53eadf-0fa8-c3c0-c7de-db74fbecdd01
last-modified: Mon, 27 Jun 2016 17:16:29 GMT
durability-level: 2
content-length: 15
content-md5: JrtzVWzrMqXfMLczxTVe5Q==
content-type: text/plain
date: Mon, 27 Jun 2016 17:17:05 GMT
server: Manta
x-request-id: 350a0f51-9e6e-4c96-9aa9-b4ab3ca6e829
x-response-time: 24
x-server-name: 39adec6c-bded-4a14-9d80-5a8bfc1121f9
connection: keep-alive
x-request-received: 1467047825291
x-request-processing-time: 369

$ minfo /trent.mick/stor/tmp/snap/snaplink
HTTP/1.1 200 OK
etag: 3c53eadf-0fa8-c3c0-c7de-db74fbecdd01
last-modified: Mon, 27 Jun 2016 17:16:55 GMT
durability-level: 2
content-length: 15
content-md5: JrtzVWzrMqXfMLczxTVe5Q==
content-type: text/plain
date: Mon, 27 Jun 2016 17:17:09 GMT
server: Manta
x-request-id: a24a4ffa-0799-4a04-968d-c64e10894cb7
x-response-time: 10
x-server-name: 39adec6c-bded-4a14-9d80-5a8bfc1121f9
connection: keep-alive
x-request-received: 1467047829543
x-request-processing-time: 343

$ minfo /trent.mick/stor/tmp/snap/samecontent
HTTP/1.1 200 OK
etag: 3be48426-f15c-4938-9a20-c66a318093b0
last-modified: Mon, 27 Jun 2016 17:16:39 GMT
durability-level: 2
content-length: 15
content-md5: JrtzVWzrMqXfMLczxTVe5Q==
content-type: text/plain
date: Mon, 27 Jun 2016 17:17:29 GMT
server: Manta
x-request-id: 410af1ea-ccbf-4ab5-be22-a6e025e170e3
x-response-time: 11
x-server-name: 3d2b5d91-5cd9-4123-89a5-794f44eab9fd
connection: keep-alive
x-request-received: 1467047849735
x-request-processing-time: 422

Note that the etag for a snaplink'd file is the same. I wonder if it is
semantically correct to derive an "ino" value for the ncdu format from this
etag, as a way to indicate to ncdu that the snaplink is the equivalent of a
hardlink in terms of not taking extra space.
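A sketch of what that could look like, assuming (as observed in the minfo output above, not as a promised interface) that snaplinks share an etag and that etags are UUID-shaped. The 63-bit folding into an "ino" is a hypothetical choice for consumers that expect a signed 64-bit integer; because folding can in principle collide, deduplication keys on the full etag string.

```python
from typing import Iterable, Mapping


def etag_to_ino(etag: str) -> int:
    """Fold a UUID-shaped etag into a non-negative 63-bit integer "ino"."""
    return int(etag.replace("-", ""), 16) & 0x7FFFFFFFFFFFFFFF


def dedup_logical_usage(objects: Iterable[Mapping]) -> int:
    """Sum object sizes, counting each etag (each snaplinked object) once."""
    seen = set()
    total = 0
    for obj in objects:
        if obj["etag"] in seen:
            continue  # another name for content that was already counted
        seen.add(obj["etag"])
        total += obj["size"]
    return total
```

With afile, snaplink and samecontent from above (15 bytes each), this counts 30 bytes rather than 45, since afile and snaplink share an etag while samecontent does not.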

davepacheco · 2016-06-27T18:14:07Z

That's a good point, although I don't know if we want to commit to that. I think it would be reasonable if an Etag for the same content was the same, even if it was a separate copy.

trentm · 2016-06-27T19:08:08Z

FWIW, obj.etag appears to be the metadata objectId: https://github.com/joyent/manta-muskie/blob/054fcc04fe724a0e319d6dedb185632c0f0c61bf/lib/obj.js#L520

Not sure that is a promised interface.

> I think it would be reasonable if an Etag for the same content was the same, even if it was a separate copy.

Shouldn't it capture other metadata as well? content-type, 'm-*' headers.

davepacheco · 2016-06-27T20:41:44Z

> Shouldn't it capture other metadata as well? content-type, 'm-*' headers.

Possibly. I'd have to review RFC 2616. Even if so, is it plausible that an implementation could allow separate copies of the same data with the same headers to have the same etag?

On the other hand, unless we add a new header for object-id, I'm not sure how else mdu could deal with snaplinks.

trentm · 2016-06-28T16:21:45Z

@davepacheco I wonder if it would be faster to just do directory listings and use the "size" and "durability" fields for listed files:

$ mls -j ~~/stor/tmp
{"name":"5f67e820-1489-4db7-9df2-1d8e3ec5cd90-file.gz","etag":"142ad91b-73d8-6cb4-9cd9-efacf7df7a9a","size":229535627,"type":"object","mtime":"2014-10-08T22:53:25.146Z","durability":2,"parent":"/trent.mick/stor/tmp"}
{"name":"5f67e820-1489-4db7-9df2-1d8e3ec5cd90.imgmanifest","etag":"88ac47b9-e53f-c065-b446-e2d0455c0c00","size":1052,"type":"object","mtime":"2014-10-08T22:52:44.298Z","durability":2,"parent":"/trent.mick/stor/tmp"}
...

That is logical instead of physical block usage, which is a bit of a departure from standard du. I don't know if that limitation would be unacceptable.
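The listing-based (logical) approach could be sketched as follows, assuming each input line is one JSON record as printed by mls -j above, with "type", "size", "durability" and "parent" fields. As du does, each object's totals are rolled up into every ancestor directory. Covering a whole tree would require listing every subdirectory, which this sketch leaves to the caller.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def du_from_listing(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Return {directory: (logical_bytes, bytes_times_durability)}.

    Each line is assumed to be one JSON record as printed by `mls -j`.
    Totals are rolled up into every ancestor directory, as du does.
    """
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        if not line.strip():
            continue
        ent = json.loads(line)
        if ent.get("type") != "object":
            continue  # directories contribute via the objects they contain
        logical = ent["size"]
        stored = ent["size"] * ent["durability"]
        path = ent["parent"]
        while path:  # "/a/b/c" -> "/a/b" -> "/a" -> "" (stop)
            totals[path][0] += logical
            totals[path][1] += stored
            path = path.rpartition("/")[0]
    return {d: (t[0], t[1]) for d, t in totals.items()}
```

For the two objects listed above, /trent.mick/stor/tmp totals 229536679 logical bytes and 459073358 bytes counting both copies. As noted, this is still logical usage, not physical block usage.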

davepacheco · 2016-06-28T18:42:57Z

Yes, a tool that looked at logical usage would be much easier to build and would run much faster. For end users, that's probably more appropriate, too. But as operators, we've wasted lots of time in the past clearing out usage of lots of logical space that freed up very little physical space, and I really don't want to do that again. Ideally, this tool would have two modes, but it's the slower-running, harder-to-build one that we really want at the moment.

trentm · 2016-07-14T19:41:11Z

I've started work on this... might still be a while, though.

tebbers · 2016-08-22T10:03:32Z

@davepacheco I agree that a du-like tool would be awesome. Having had to go through this to try to understand where "all the disk space went", I'm wondering whether it would be possible (or easier) to expose each file's compression ratio in the directory listing. If you could see that one file was compressed 1x and another 6x, I think that would get you to the physical sizing a "bit" more easily.

davepacheco · 2016-08-22T15:41:17Z

@tebbers That would definitely be nice, but it's surprisingly difficult. The problem is that ZFS doesn't appear to calculate the correct number of physical bytes used until the transaction group that created the file has been written out, which is likely well after the object's metadata has been committed. We could populate this information asynchronously, but that's non-trivial itself and leaves a window shortly after object creation where the information would be incorrect.

tebbers · 2016-08-22T16:16:14Z

@davepacheco Do you have any pointers? I'm trying to build the equivalent of du -ks in shell scripts using mfind and mget, but it's pretty time-consuming.

davepacheco · 2016-08-22T16:20:47Z

I've been using manta-mdu, which is close to the point where it could be polished and brought into node-manta.

davepacheco added the enhancement label Aug 15, 2015

trentm self-assigned this Jul 14, 2016

trentm removed their assignment Sep 30, 2020
