`elfshaker list` output changes proposal #74

peterwaller-arm · 2022-04-04T09:23:11Z

At the moment 'elfshaker list' shows packs by default. This is inconvenient as the output is not something you can feed into 'elfshaker extract' and other tools.

@veselink1 and I discussed this briefly and here are some considerations:

The output should be sorted.
- The sort incurs a delay/cost, but we think it would be worse to have something which 'often appears to be sorted but is not', which would be the alternative. I might revisit this decision if the sorting turned out to be expensive, because we have to read in the full set of snapshots across all of the packs.
The output should be in canonical PACK:SNAPSHOT form.
- Should we elide [PACK:] where possible? Probably not because then we might be back to the 'looks like sorted order problem'. Unless we think it's clear enough to have the snapshot name as the sort key.
Only the fully-qualified snapshot name should be shown by default.
- Extra information about snapshots (files? size?) is convenient to have but should not be at the expense of breaking machine readability by the default, so should probably be available via a flag. --format?
UI inputs
- elfshaker list shows all snapshots.
- elfshaker list <pack> shows snapshots within a pack.
- elfshaker list <snapshot> could show files within a snapshot.
UI outputs
- Machine readability applies to all of the above, they should share behaviours when it comes to --format considerations.
- I think possibly we should just always drop the header, and document what it is. I think it's also somewhat discoverable in the sense that it is 'somewhat obvious' what it is. Having the header print to stderr is a neat idea but nonstandard and more often than not I end up piping it to /dev/null. I'm flexible on this point.
Unique snapshots
- What to do if a snapshot name is not unique? A common way this arises right now is if you have both a loose snapshot and a snapshot living within a pack. One observation I have here is that the snapshot is unique in the presence of duplicates if the hashes of the contents of the extract would be the same. I suspect this is not too expensive to compute/maintain. One possibility is to memoize it as another field to the .pack.idx file and compute it on demand if the information is required but not present.
Documentation
- We'll need to update docs, probably.

Another thing to consider is that we're talking about changing the interface here, which could be confusing to users not prepared for it. I don't think it's big deal at this stage in the project, and the failures would be pretty obvious to users in this case, but not something I would do lightly in the future.

The text was updated successfully, but these errors were encountered:

veselink1 · 2022-05-05T12:43:07Z

Agreed on all points expect the machine readability by default.

I think useful output by default is better for new (all?) users. When machine readability is desired it could be specified by (e.g.) --format "%s" to only print the snapshot name. My personal experience with tools that rely on a format string to print useful output is that I cannot commit to memory all the format specifiers and end up with one extra alias in my .bashrc. Those are my thoughts but I'm happy to also have machine readability by default if you think it will make things easier in the more common case.

Headers on stderr - I'm not sure what the standard way of doing this is. We could omit them on !isatty(stderr). But I don't think they are that useful either way.
Unique snapshots - This does not seem relevant to elfshaker list. Perhaps you were thinking about the possibility of using wildcards with elfshaker extract. I like the idea of memoizing snapshot hashes; we could implement this separately in the future if time permits.

peterwaller-arm · 2022-05-06T14:12:17Z

My day went differently than I imagined so I didn't get much time to hack on this today, sorry.

One issue I'd not considered much yet was what to do if a snapshot and a pack are ambiguous, if list were to accept both packs or snapshot. I'm wondering if really this should be elfshaker list [packs]... where the packs are optional, but elfshaker list-packs to get a list of packs and elfshaker show <snapshot> to list files within a snapshot (note, I don't think show is the right verb here, but naming can come later if it's right to separate these things).

Machine readability: For me the question of machine readability is "How much value does breaking machine readability by default bring?". If the extra human readable information is vital then maybe it's worth it. In my mind at the moment that's not the case though, the extra information is 'how big is the snapshot' and 'how many files does the snapshot have', which aren't particularly actionable information, and are available through other obvious routes (e.g. extract the snapshot, run du, or run find | wc). It's not like we'd be making the information inconvenient to have, either. It's just if I'm composing a one-liner to grab something out of the list I'd prefer not to have to bloat it by specifying extra flags to make it compose nicely together in the shell. As an aside, the primary actionable information I've encountered so far there is whether (e.g. for manyclangs) the build failed, and therefore the snapshot is essentially empty. Maybe it might be useful to have another way to expose his information. More thinking required.

Headers on stderr: my experience of this so far is just that it's weird. As an experienced unix user I haven't seen this behaviour in other apps, so I fear it is too surprising. And yeah, I think the value brought by the header is low so it's probably best omitted.

Unique snapshots: my concern there was what to do for the output of elfshaker list, if a snapshot appears in multiple packs. One possibility is to list the snapshot only once if it does not matter in practice which snapshot you extract (because they are bit identical). And in the case where disambiguation is required maybe they should be fully qualified. The other obvious possibility is for the output to always be fully qualified, but this feels like bloating the output with a lot of redundant information. I'm not fully decided on this yet so think I'll experiment.

veselink1 · 2022-05-06T15:21:49Z

One issue I'd not considered much yet was what to do if a snapshot and a pack are ambiguous, if list were to accept both packs or snapshot.

I like the list-packs, list distinction. The command to print the files in a snapshot could be list-files.

Machine readability: "How much value does breaking machine readability by default bring?" We seem to have entirely different perspectives on this. Whichever way we end up implementing this, it's not going to make people not want to use elfshaker. I'd optimise for OOTB interactive usage. But having to extract a snapshot to compute size is not straightforward: the directory contents will have to be replaced by the new snapshot, also find | wc won't work because of elfshaker_data. There would have to be a flag to enable snapshot sizes in the output.

Headers on stderr: Ack

Unique snapshots: I think the output should always be fully-qualified because of the issues with sorting that could arise otherwise.

veselink1 · 2022-07-15T07:05:49Z

After more than 2 months, I have some more thoughts :)

Unique snapshots: I like the idea of outputting the simple, unqualified snapshots and I understand the role disambiguating between snapshots could play in simplifying the output. But, I just realised that even if we can disambiguate between A/snapshot1 and B/snapshot1 in the majority of cases when they are identical, they are not necessarily identical. This was perhaps discussed at some point but I cannot recall clearly. In any case, I believe this makes a strong point in support of always outputting fully-qualified snapshots in the list output.

pwaller · 2022-07-16T13:32:26Z

in the majority of cases when they are identical, they are not necessarily identical.

Yes, my thinking was that if they are not identical, it could print fully qualified. Perhaps it is too weird to have something which looks like it only prints snapshot names but then suddenly prints pack:snapshot unexpectedly. However, my further thinking there was that if an element of elfshaker list's output is taken somewhere, the most likely thing is that it will be reused as input to elfshaker somewhere. In that case, it doesn't seem to harmful so long as the item printed identifies the pack.

On the other hand, I'm assuming time proximity between reading the output and reusing the input, and ignoring the fact that things could change in the meantime to make them ambiguous. This seems reasonable since I expect ambiguities arriving by name to be rare according to 'sensible intended use'.

One option in my mind is to document that the output of elfshaker-list is a snapshot-ish, that is, it may be a snapshot or a fully-qualified pack:snapshot pair. If we did that, we could start with short outputs, and migrate to always-qualified if a use case was discovered which necessitated it, and it wouldn't break anything since the the now changed output should be usable in most places that the original outputs would be usable.

veselink1 · 2022-07-16T13:39:08Z

Given

m:a1
m:a2
n:a1
n:a3

What would the list output be according to your suggestion?

a2
a3
m:a1
n:a1

Wouldn't that be confusing if expecting to see consecutive snapshots?

pwaller · 2022-07-16T13:52:22Z

Yes, it could be confusing to someone. Would it matter in practice, so long as users are anyway feeding the input into elfshaker commands which don't mind being fully qualified?

I'm not sure if we can promise consecutive snapshots efficiently, also. It would be good if we can avoid the need to sort, which could get expensive with sufficient snapshots. Maybe we have to sort because the alternative would be too confusing. I'd like to try without and see how bad it is.

Also, if we can identify identical snapshots, we don't need to print m:a1 and n:a1 separately anyway in the case they're identical, we could drop the second instance.

veselink1 · 2022-07-16T13:58:22Z

elfshaker find prints all snapshots without sorting and the output can be fed back into elfshaker. We do that in manyclangs-run to match commit SHA to snapshot. I prefer optimising elfshaker list for ease of use and readability. In any case, we can try both approaches and see which we like more. It should be 1 line change anyway.

elfshaker list has been split into several commands. - elfshaker list <pack>... lists the snapshots in the packs - elfshaker list-files <snapshot> lists the files in the snapshot - elfshaker list-packs lists the packs in the repo See #74 and the help output.

peterwaller-arm changed the title ~~Output changes proposal~~ elfshaker list output changes proposal Apr 4, 2022

veselink1 mentioned this issue Jul 15, 2022

Use checksums to disambiguate snapshots in >1 pack #84

Merged

veselink1 mentioned this issue Jul 17, 2022

New elfshaker list behaviour #85

Merged

veselink1 closed this as completed in #85 Sep 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`elfshaker list` output changes proposal #74

`elfshaker list` output changes proposal #74

peterwaller-arm commented Apr 4, 2022

veselink1 commented May 5, 2022

peterwaller-arm commented May 6, 2022

veselink1 commented May 6, 2022

veselink1 commented Jul 15, 2022

pwaller commented Jul 16, 2022

veselink1 commented Jul 16, 2022

pwaller commented Jul 16, 2022

veselink1 commented Jul 16, 2022

elfshaker list output changes proposal #74

elfshaker list output changes proposal #74

Comments

peterwaller-arm commented Apr 4, 2022

veselink1 commented May 5, 2022

peterwaller-arm commented May 6, 2022

veselink1 commented May 6, 2022

veselink1 commented Jul 15, 2022

pwaller commented Jul 16, 2022

veselink1 commented Jul 16, 2022

pwaller commented Jul 16, 2022

veselink1 commented Jul 16, 2022

`elfshaker list` output changes proposal #74

`elfshaker list` output changes proposal #74