A new output format suitable for the tsort utility. #23

Closed
Earnestly opened this issue Jul 28, 2015 · 10 comments

@Earnestly

I currently have some code which outputs packages along with their dependencies in a format that can be used by tsort. However, this is slow because it appears I have to run expac once for each package in the list. E.g.

    for dep in $(expac -S '%E' "$pkg"); do 
        printf '%s %s\n' "$pkg" "$dep"
    done

It would be really nice if I could use expac -Sl '\n%n ' '%n %E' "${packages[@]}" instead, where the string that -l accepts is formatter-aware. This is what happens at the moment:

teapot earnest %i master ~ expac -l '\n%n ' -S '%n %E' systemd
systemd acl
%n bash
%n dbus
%n iptables
%n kbd
%n kmod
%n hwids
%n libcap
%n libgcrypt
%n libsystemd
%n libidn
%n lz4
%n pam
%n libseccomp
%n util-linux
%n xz

However as you can see, the %n is inserted literally.

This would make cycle detection, for example, really simple:

expac -Sl '\n%n ' '%n %E' "$(pacman -Sql core)" | tsort
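A minimal sketch of what cycle detection with tsort can look like, using hand-written edges in place of expac output. It assumes GNU coreutils tsort, which reports a loop on stderr and exits non-zero; other implementations may behave differently. The function name has_cycle is made up for illustration.

```shell
#!/bin/sh
# Succeeds (exit 0) when tsort rejects the edge list read from stdin.
# Assumption: GNU tsort, which exits non-zero when the input contains a loop.
# Note that malformed input (an odd number of tokens) is also rejected.
has_cycle() {
    ! tsort >/dev/null 2>&1
}

# Hand-written sample edges: a -> b -> c -> a forms a loop.
if printf '%s\n' 'a b' 'b c' 'c a' | has_cycle; then
    echo "cycle found"
fi
```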

The other benefit would be to make building associative arrays easier too (obviously with %n replaced by the appropriate expansion):

% expac -l ' %n ' -S '%n %E' systemd
systemd acl %n bash %n dbus %n iptables ... # and so on.

Repurposing -l makes the command-line syntax perhaps a little less pretty, but it does seem to get close to providing this functionality.

What are your thoughts?

@falconindy
Owner

I'm intrigued by the idea of allowing the -l argument to be interpreted. However, I'm confused by your example. What do you expect expac -l ' %n ' '%n' -S '%E' systemd to do? Expac would parse this as: a delimiter of ' %n ', an output format of '%n', and two targets -- '%E' and 'systemd'.

@Earnestly
Author

Sorry, that was a mistake, I've fixed it in the main issue text.

The correct invocation was expac -l ' %n ' -S '%n %E' systemd which currently outputs:

systemd acl %n bash %n dbus %n iptables ...

@falconindy
Owner

Implemented this locally by allowing formatting for -d and -l. I noticed that while it works for core, it fails for extra because there exist packages which have no dependencies, i.e.

./expac -1Sl '\n%n ' '%n %E' $(pacman -Sql extra) | awk 'NF == 1 { n += 1 } END { print n }'
359

In this case, tsort refuses to process the input because there's not a valid edge. I'm not sure this invalidates the patch, but it complicates your use case.

@Earnestly
Author

Yeah, it does make things a bit more complex. It can probably be scripted around, but that's not ideal.

This is probably a terrible idea, but perhaps if %E (or any value) is empty, a replacement could be used as a placeholder. jshon, by keenerd, lets you do this for missing keys so that anything expecting a certain structure will still find it intact. It does this by inserting "null" markers via the -C flag. For expac this sounds fairly hacky, though, for just one use case.

edit: Of course, by mixing content and structure like this you will basically assume no key or object can be "null" in jshon's case.

@Earnestly
Author

Could it maybe be an idea to just insert the package name itself as a dependency if the package has none? This appears to work with tsort, e.g.

% tsort <<EOF
a a
a b
b b
EOF
=> a
=> b

@Earnestly
Author

Actually, after a bit of thinking and reading ESR's notes on DSV (delimiter-separated values), I've instead used expac to print an awk-friendly format, package:dep:dep:dep:dep\n, using expac -S '%n:%E' -l ':'.

This can be trivially parsed with awk -F : as follows:

p = $1

# If we find that a package contains no dependency, make it a
# dependency of itself to provide tsort a complete graph.
if($2 == ""){
    $2 = $1
}

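# Start at field 2: field 0 is the whole line and field 1 is the
# package name itself, neither of which is a dependency.
for(i = 2; i <= NF; ++i){
    print p, $i
}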

This appears to work on every repo in Arch Linux, but I'm unsure of its correctness.
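A self-contained sketch of the same idea, with sample input standing in for expac output. The package names and the function name to_edges are made up for illustration, and the sketch assumes that a package with no dependencies is printed as "name:" with nothing after the colon. It starts the loop at field 2, the first dependency.

```shell
#!/bin/sh
# Convert "pkg:dep:dep" lines into "pkg dep" edge pairs suitable for tsort.
# A package with no dependencies becomes an edge to itself.
to_edges() {
    awk -F : '{
        if ($2 == "") {
            print $1, $1
            next
        }
        # Field 1 is the package name; dependencies start at field 2.
        for (i = 2; i <= NF; ++i) {
            if ($i != "") print $1, $i
        }
    }'
}

# Hypothetical sample input in place of: expac -S '%n:%E' -l ':'
printf '%s\n' 'foo:bar:baz' 'bar:baz' 'baz:' | to_edges | tsort
```

With this sample the only valid ordering is foo, bar, baz, so tsort lists each package before the packages it depends on.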

For science, here is a compressed version of the script I am testing it with:

expac -S '%n:%E' -l ':' < <(pacman -Sql extra) | awk -F : '{p=$1; if($2==""){$2=$1}; for(i=2; i<=NF;++i){print p, $i}}' | uniq | tsort > /dev/null

If this is valid, this bug can be closed as invalid.

@AladW

AladW commented Feb 21, 2016

While awk can be used to circumvent the issue, it would still be nice to have an option like --isempty to replace empty fields. For example, to account for packages without dependencies:

expac '%n %E' --isempty '%E=%n'

Together with the suggested format support for -l, the above one-liner would be reduced to

expac '%n %E' -l'\n%n ' --isempty '%E=%n' < <(pacman -Sql extra) | tsort

Alternatively you could do something like --empty '%n' to replace any empty field with %n.

@falconindy
Owner

I'm not clear on what you're suggesting. What would the arg to --isempty be replacing? What is it which is "empty" that would trigger this behavior?

@Earnestly
Author

I think what would be nice is a way to prefix the output. Using expac in a pipeline can be awkward because of how it formats output for fields with multiple items.

For example, %n %E might return multiple items which, instead of being presented in a columnar format, are printed on a single line. Using -l '\n' to turn them into a column causes more issues than it solves, because the first line then contains both the name and the first dependency ("name firstdep").

I'm not really sure how to design this whole thing and I think expac will always be used along with a filter for most uses, so if we could somehow get a column based output such as:

% expac --prefix '%n' '%E' foo qux
foo bar
foo baz
foo qux
qux frob
qux norf

If in the above there turns out to be nothing to satisfy '%E', then that's fine too, because even a read loop such as

while read -r name dep; do
    if [[ ! $dep ]]; then
        dep=$name
    fi
    ...
done

could easily handle it. Of course the trade-off here is that it assumes %n is always valid.
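A runnable sketch of that read loop, fed with hand-written lines in the shape the proposed --prefix option would produce. The option does not exist in expac; the sample data and the function name fill_self_edges are made up for illustration.

```shell
#!/bin/sh
# Read "name dep" lines from stdin. A line that carries only a name
# (no dependency) is emitted as an edge from the name to itself.
fill_self_edges() {
    while read -r name dep; do
        if [ -z "$dep" ]; then
            dep=$name
        fi
        printf '%s %s\n' "$name" "$dep"
    done
}

# Hypothetical --prefix style output; "norf" has no dependencies.
printf '%s\n' 'foo bar' 'foo baz' 'qux frob' 'norf' | fill_self_edges
```

The result always has two fields per line, so it can be piped straight into tsort.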

But I don't really know how to define a general command-line interface or approach where this will work. Perhaps allowing formatters in -l or -d can work, with missing values validated as part of the pipeline.

For example, jshon uses -C to continue parsing when it encounters missing values, replacing them with an in-band null string.

Of course, the trade-off here is that you now have in-band data, but I've made use of it in some cases because I can guarantee that the values I use as sentinels can never be the string "null". That neatly allowed me to skip any real validation, which would have been complex and annoying.

In the case of %n we can probably guarantee that it'll always be inhabited, or am I wrong?

@AladW

AladW commented May 2, 2016

On a side note, I noticed a while ago that you can use -v to achieve a similar effect to jshon -C:

% expac -Sv '%n %R %S' linux
linux None None
