better verbosity for writes #7792

Open
mcast opened this issue Jul 1, 2021 · 0 comments

mcast commented Jul 1, 2021

We're using icommands and server, both at 4.2.7, on Ubuntu Bionic.

The write commands we use (locally, in the /cgp zone) are:

  1. iput of "small" files: does not report the destination
  2. iput of "big" files: the -V flag reports the destination server, but not the resource
  3. irepl, any size: does not report the destination

I'm asking specifically about write commands because we're seeing some odd performance problems, which @kript and @bh9 are dealing with for us. I can't easily generate stats on which transfers were fast and which were slow when the icommands don't say where the file was put.

Things that would be useful:

  1. tell the DATA_ID of the resulting object
  2. tell the DATA_RESC_ID(s) of resulting copies as they're created
  3. machine readable output
  4. consistent output format & contents, regardless of the different code paths used to do the work
  5. consistent output format, even when the operation fails
  6. efficiency; using information already to hand, not spending time or generating load by doing more lookups.
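
To make items 3 to 5 concrete, here is a minimal sketch of the kind of record I have in mind. It is purely illustrative: no icommand emits this today, and the class name `WriteReport` and every field name other than the catalog columns DATA_ID and DATA_RESC_ID are my own assumptions. The point is only that success and failure share one key set, one record per line.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


# Hypothetical record layout (an assumption, not an existing iRODS format).
@dataclass
class WriteReport:
    command: str                        # e.g. "iput" or "irepl"
    logical_path: str                   # destination data object path
    status: str                         # "ok" or "error"
    data_id: Optional[int] = None       # DATA_ID, if known
    data_resc_id: Optional[int] = None  # DATA_RESC_ID of the new replica, if known
    error: Optional[str] = None         # error text, only on failure


def to_line(report: WriteReport) -> str:
    """Serialise one report as a single JSON line with a fixed key set."""
    return json.dumps(asdict(report), sort_keys=True)


# Placeholder values, not real catalog IDs.
ok = WriteReport("iput", "/cgp/sandbox/mca/MANIFEST", "ok",
                 data_id=1, data_resc_id=2)
failed = WriteReport("irepl", "/cgp/sandbox/mca/any-old.bam", "error",
                     error="placeholder error text")

print(to_line(ok))
print(to_line(failed))
```

Because the failed record carries the same keys (with nulls), a consumer can parse every line the same way and filter on `status` afterwards.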

I realise this isn't going to happen in time to deal with our current problem, but please can it guide future changes to icommand output?

Without the above, I may compare the catalog state before and after irepl runs. One significant advantage of this in the current context is that I already have the "before" state, with the DATA_IDs of these files in bulk. From that I can discover later where the files ended up, and it is much more efficient to query them in bulk:

$ iquest '%s %s:%s %s/%s' "select order(DATA_ID), order(DATA_REPL_NUM), DATA_RESC_ID, COLL_NAME, DATA_NAME where DATA_ID in ('49188939', '49189577', '49189592', '49189595', '49201390', '49201414', '49201434', '49203573')"
49188939 0:48662204 /cgp/intproj/2500/sample/foo/foo.v1.sample.dupmarked.bam
49188939 1:48624007 /cgp/intproj/2500/sample/foo/foo.v1.sample.dupmarked.bam
49189577 0:38224944 /cgp/intproj/2500/sample/foo/foo.v1.sample.dupmarked.bam.bai
49189577 1:41692990 /cgp/intproj/2500/sample/foo/foo.v1.sample.dupmarked.bam.bai
49189592 0:48662204 /cgp/intproj/2500/sample/foo/foo.v1.sample.dupmarked.bam.met.gz
49189592 1:41692989 /cgp/intproj/2500/sample/foo/foo.v1.sample.dupmarked.bam.met.gz
49189595 0:41704469 /cgp/intproj/2500/sample/foo/foo.v1.sample.dupmarked.bam.bas
49189595 1:41693077 /cgp/intproj/2500/sample/foo/foo.v1.sample.dupmarked.bam.bas
49201390 0:38224943 /cgp/intproj/2500/sample/foo/foo.v1.ascat.counts.gz
49201390 1:41692991 /cgp/intproj/2500/sample/foo/foo.v1.ascat.counts.gz
49201414 0:48662203 /cgp/intproj/2500/sample/foo/foo.v1.ascat.counts.gz.tbi
49201414 1:48624007 /cgp/intproj/2500/sample/foo/foo.v1.ascat.counts.gz.tbi
49201434 0:41704467 /cgp/intproj/2500/sample/foo/foo.v1.ascat.counts.is_male.txt
49201434 1:41693079 /cgp/intproj/2500/sample/foo/foo.v1.ascat.counts.is_male.txt
49203573 0:48662208 /cgp/intproj/2500/sample/foo/foo.v1.merged.bw
49203573 1:41692984 /cgp/intproj/2500/sample/foo/foo.v1.merged.bw

This way I have the option of patching over the lack of verbosity with more programming but less runtime, because I can query around 1000 individual files or a range of 100k files in one bite.
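
A rough sketch of that extra programming, assuming the same format string and select clause as the iquest call above. The function names, the `Replica` tuple and the chunk size of 1000 are my own choices for illustration; the code only builds query strings and parses output text, it does not run iquest itself.

```python
import re
from typing import Iterator, List, NamedTuple

# Same format string as the iquest call above.
IQUEST_FORMAT = "%s %s:%s %s/%s"


class Replica(NamedTuple):
    data_id: int
    repl_num: int
    resc_id: int
    path: str


def build_queries(data_ids: List[int], chunk_size: int = 1000) -> Iterator[str]:
    """Yield one query string per chunk of DATA_IDs."""
    for start in range(0, len(data_ids), chunk_size):
        chunk = data_ids[start:start + chunk_size]
        id_list = ", ".join(f"'{i}'" for i in chunk)
        yield ("select order(DATA_ID), order(DATA_REPL_NUM), DATA_RESC_ID, "
               f"COLL_NAME, DATA_NAME where DATA_ID in ({id_list})")


# Matches lines such as "49188939 0:48662204 /cgp/.../file.bam".
_LINE = re.compile(r"^(\d+) (\d+):(\d+) (/.*)$")


def parse_iquest(output: str) -> List[Replica]:
    """Turn iquest output in IQUEST_FORMAT into Replica tuples, skipping other lines."""
    replicas = []
    for line in output.splitlines():
        m = _LINE.match(line.strip())
        if m:
            replicas.append(Replica(int(m[1]), int(m[2]), int(m[3]), m[4]))
    return replicas
```

Each string from `build_queries` would be passed to iquest as the final argument, after `IQUEST_FORMAT`, and the captured stdout handed to `parse_iquest`. Comparing the parsed replicas from before and after an irepl run then shows which DATA_RESC_ID each new copy landed on.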


Current outputs look like this, when I ask for Christmas-tree verbosity:

$ iput -v -V -P -f MANIFEST /cgp/sandbox/mca/MANIFEST
0/1 -  0.00% of files done   0.000/0.003 MB -  0.00% of file sizes done
Processing MANIFEST - 0.003 MB   2021-07-01.11:27:25
   MANIFEST                        0.003 MB | 0.531 sec | 0 thr |  0.005 MB/s

$ iput -v -V -P -f ~/ln.any-old.bam /cgp/sandbox/mca/any-old.bam
0/1 -  0.00% of files done   0.000/12531.713 MB -  0.00% of file sizes done
Processing ln.any-old.bam - 12531.713 MB   2021-07-01.11:05:07
From server: NumThreads=16, addr:irods-cgp-sb01, port:20185, cookie=48059554
ln.any-old.bam - 280.000/12531.713 MB -  2.23% done   2021-07-01.11:05:12
ln.any-old.bam - 720.000/12531.713 MB -  5.75% done   2021-07-01.11:05:12
ln.any-old.bam - 1320.000/12531.713 MB - 10.53% done   2021-07-01.11:05:12
ln.any-old.bam - 1840.000/12531.713 MB - 14.68% done   2021-07-01.11:05:13
ln.any-old.bam - 2520.000/12531.713 MB - 20.11% done   2021-07-01.11:05:13
ln.any-old.bam - 3040.000/12531.713 MB - 24.26% done   2021-07-01.11:05:14
ln.any-old.bam - 3400.000/12531.713 MB - 27.13% done   2021-07-01.11:05:14
ln.any-old.bam - 4120.000/12531.713 MB - 32.88% done   2021-07-01.11:05:14
ln.any-old.bam - 4840.000/12531.713 MB - 38.62% done   2021-07-01.11:05:15
ln.any-old.bam - 5400.000/12531.713 MB - 43.09% done   2021-07-01.11:05:16
ln.any-old.bam - 6160.000/12531.713 MB - 49.16% done   2021-07-01.11:05:16
ln.any-old.bam - 6520.000/12531.713 MB - 52.03% done   2021-07-01.11:05:17
ln.any-old.bam - 7240.000/12531.713 MB - 57.77% done   2021-07-01.11:05:17
ln.any-old.bam - 8040.000/12531.713 MB - 64.16% done   2021-07-01.11:05:18
ln.any-old.bam - 8640.000/12531.713 MB - 68.95% done   2021-07-01.11:05:18
ln.any-old.bam - 9440.000/12531.713 MB - 75.33% done   2021-07-01.11:05:19
ln.any-old.bam - 9880.000/12531.713 MB - 78.84% done   2021-07-01.11:05:19
ln.any-old.bam - 11063.232/12531.713 MB - 88.28% done   2021-07-01.11:05:20
ln.any-old.bam - 11836.160/12531.713 MB - 94.45% done   2021-07-01.11:05:21
ln.any-old.bam - 12095.553/12531.713 MB - 96.52% done   2021-07-01.11:05:21
ln.any-old.bam - 12531.713/12531.713 MB - 100.00% done   2021-07-01.11:05:21
   any-old.bam                 12531.713 MB | 37.172 sec | 16 thr | 337.124 MB/s

$ irepl -R red -v -V -P /cgp/sandbox/mca/any-old.bam
0/1 -  0.00% of files done   0.000/12531.713 MB -  0.00% of file sizes done

so since 3.3.1 I have regarded the -v and -V flags as producing no useful information, and have avoided using them.
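
For what the current iput output does contain, this is a minimal sketch of scraping it. It assumes the line shapes shown in the transcripts above (as seen on 4.2.7), which are not a documented or stable interface; the function name `parse_iput_transcript` and the dictionary keys are mine. It picks up the per-file summary line and, where present, the server address from the "From server:" line. It cannot recover the resource, because that is never printed.

```python
import re
from typing import Dict, List, Optional

# "   any-old.bam    12531.713 MB | 37.172 sec | 16 thr | 337.124 MB/s"
_SUMMARY = re.compile(
    r"^\s*(\S.*?)\s+([\d.]+) MB \| ([\d.]+) sec \| (\d+) thr \|\s+([\d.]+) MB/s\s*$"
)
# "From server: NumThreads=16, addr:irods-cgp-sb01, port:20185, cookie=..."
_SERVER = re.compile(r"^From server: NumThreads=(\d+), addr:([^,\s]+), port:(\d+)")


def parse_iput_transcript(text: str) -> List[Dict[str, object]]:
    """Collect one dict per summary line, tagged with the last server seen."""
    results: List[Dict[str, object]] = []
    server: Optional[str] = None
    for line in text.splitlines():
        s = _SERVER.match(line)
        if s:
            server = s.group(2)
            continue
        m = _SUMMARY.match(line)
        if m:
            results.append({
                "name": m.group(1),
                "size_mb": float(m.group(2)),
                "seconds": float(m.group(3)),
                "threads": int(m.group(4)),
                "rate_mb_s": float(m.group(5)),
                "server": server,   # None when no "From server:" line was printed
            })
            server = None           # do not carry the address over to the next file
    return results
```

Fed the two iput transcripts above, this gives a record with no server for MANIFEST and one naming irods-cgp-sb01 for any-old.bam, which is exactly the gap: the small-file path leaves nothing to group the timings by.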

@korydraughn korydraughn transferred this issue from irods/irods_client_icommands Jun 12, 2024
@alanking alanking added this to the 4.3.4 milestone Jun 12, 2024