Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miscellanous fixups of a miscellaneous nature #171

Closed
wants to merge 7 commits into from

Conversation

eli-schwartz
Copy link
Collaborator

Initially motivated by the desire to update the PGP error string, then I found subprocesses using shells everywhere and jumped down the rabbit hole.

@polygamma
Copy link
Owner

I am glad you found the way back out of the rabbit hole! Going to review and test your changes in detail.

expac_base_query = "expac {} '{}'".format(option, "?!".join(["%{}".format(formatter) for formatter in formatting]))
base_query_length = len(expac_base_query.encode("utf8"))
queries_parameters = split_query_helper(max_query_length, base_query_length, 1, targets)
cmd = ["expac", option, "?!".join(["%{}".format(formatter) for formatter in formatting])] + targets

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't even understand what this used to be doing. Why does this need to be split? ARG_MAX is quite reasonable, but if not, then it should accept names on stdin.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When being run with subprocess.run, the command may be too long to be executed, I had those values researched when I coded that part, it really is needed to split the query, if it is too long.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, splitting it on 131071 doesn't even make sense, how do you know what ARG_MAX is.

But like I said, if you're genuinely afraid the number of targets will overflow ARG_MAX, the appropriate response is not to try splitting it into multiple commands, but switch modes of operation and tell expac to read from the not-ARG_MAX-limited stdin.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not about the number of arguments, but the number of characters of the query

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ARG_MAX is the kernel-defined character (not argument) limit of a command line being passed through exec(2). Is there something different, which is expac-specific, that you are referring to?

Because the number in question is the current value of ARG_MAX from uapi/linux/limits.h but again this is not really a reliable way of determining what is safe, and it's probably better to skip the issue entirely.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry, I misinterpreted the meaning of ARG_MAX then, guess it's the value I looked up back then, when I researched the max limit. I do not see any way listed on https://github.com/falconindy/expac to make expac read from stdin however.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use - as a target (and /dev/stdin is not a terminal), expac will read targets from standard input.

For example (use printf so standard input is not a terminal)

$ printf "%s\n" aurman coreutils pacman
aurman
coreutils
pacman
$ printf "%s\n" aurman coreutils pacman | expac -Q "%n %v" -
aurman 2.12-1
coreutils 8.29-1
pacman 5.1.0-2

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, like I said and, specifically, implemented in 4ac65c9

test_command("rm -rf expac-git/")

# install aurman
test_command("sudo python setup.py install --optimize=1", "/home/aurman/aurman-git")
test_command("sudo python setup.py install --optimize=1", dir_to_execute=d(d(d(a(__file__)))))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests are less reliant on being run inside docker, hardcoded paths kind of suck.

return_code = run(command, shell=True, stdout=DEVNULL, stderr=DEVNULL).returncode
else:
return_code = run(command, shell=True, stdout=DEVNULL, stderr=DEVNULL, cwd=dir_to_execute).returncode
test_output = run(command, shell=True, stdout=DEVNULL, stderr=PIPE, universal_newlines=True, cwd=dir_to_execute)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole function actually could be used a lot more generically everywhere, I stopped trying to see error output on Travis partway through, so it only touched this file. Oh well.

@eli-schwartz
Copy link
Collaborator Author

It often seems to claim failure on aurman -Syu cower --noedit --pgp_fetch --keyserver hkp://ipv4.pool.sks-keyservers.net:11371 --noconfirm, with keyserver issues.

Rebase with more thorough fix for error codes, which might fix the timing or else expose the error more obviously.

@polygamma
Copy link
Owner

polygamma commented Jun 19, 2018

Actually I had planned to not use "cower" anymore due to the needed PGP key. I had noticed that, too, and wanted to change all that when rewriting most of the docker tests. had planned to do that after my exams, which start on Monday... fun times


# if last commit seen hash file does not exist - create
if not os.path.isfile(last_commit_hash_file):
empty_tree_hash = run(
"git hash-object -t tree --stdin < /dev/null",
shell=True,
["git", "hash-object", "-t", "tree", "/dev/null"],
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e. g. here, how do you know, that this is doing exactly the same as executing git hash-object -t tree --stdin < /dev/null

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, running them both produces the same results, because they're both reading the contents of /dev/null. It's actually a constant in the git source code, but using hash-object is recommended because that does not depend on git using SHA1.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, maybe I have not made clear, what I wanted to know. The List writing ["git", "hash-object", "-t", "tree", "/dev/null"] leaves out --stdin <, but why is still exactly that being executed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

["git", "hash-object", "-t", "tree", "/dev/null"] will execute git hash-object -t tree /dev/null

git hash-object <file> is equivalent to git hash-object --stdin < <file> (see man git-hash-object)

git hash-object -t tree /dev/null is the recommended method of deriving the empty tree object (for example, the official sample pre-commit hook)

test_command("rm -rf expac-git/")

# install aurman
test_command("sudo python setup.py install --optimize=1", "/home/aurman/aurman-git")
test_command("sudo python setup.py install --optimize=1", dir_to_execute=d(d(d(a(__file__)))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the previous invocation, you omit dir_to_execute. Since this invocation is in the same directory, doesn't it make sense to omit it here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who omitted what?

I modified an existing instance to derive the dir_to_execute from the code itself rather than making assumptions about where the testsuite is cloned to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I was referring to the fact that the tests are executing from /home/aurman anyway. But your approach seems more robust.

print("Error with: '{}'".format(command))

if CurrentTest.to_return == 0 and return_code != 0:
if CurrentTest.to_return == 0 and test_output.returncode != 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it necessary to test for CurrentTest.to_return == 0?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not especially relevant to this PR, I guess.

@polygamma
Copy link
Owner

Thanks to @eli-schwartz for your effort and also thanks to @saleemrashid for the explanations and also the effort trying to further improve this PR. I am currently preparing for exams, hence I am not going to merge this over the next days (I guess), because I also want to look through the whole aurman sourcecode to make sure, that there is nothing left, which has not been changed accordingly. I am however going to rewrite the docker tests first, so that it is less pain to work with them.

@eli-schwartz
Copy link
Collaborator Author

Rebase on top of #173

@polygamma
Copy link
Owner

Have changed the docker tests, see: 63b126a

Currently still using cower and cower-git, do you have any suggestions for better options? Should be quite fast to build, conflict each other and do not need a PGP key.

@eli-schwartz
Copy link
Collaborator Author

Rebase on master, dropping the docker test changes and dropping 016ad45 since the same change got manually added in fc5d55b.

@polygamma
Copy link
Owner

polygamma commented Jun 20, 2018

I changed the keyserver for the docker tests, I guess using the "pool" was the problem, I took the ubuntu keyserver for the joke of it

@eli-schwartz eli-schwartz force-pushed the fixups branch 2 times, most recently from 321746f to de85983 Compare June 20, 2018 22:53
@eli-schwartz
Copy link
Collaborator Author

Status?

@polygamma
Copy link
Owner

got my last exam next week, will look into and merge this PR probably on the weekend, so 6-8 July

@eli-schwartz
Copy link
Collaborator Author

Rebased to resolve conflicts with #186

@polygamma
Copy link
Owner

I have cherrypicked some commits, rest to be done next week

Don't fork subprocesses in order to handle things in the stdlib (python
has a very decent in-process C module for this). Also DRY. Also better
chained error messages.
@eli-schwartz
Copy link
Collaborator Author

Rebased to remove cherry-picked commits 989ce00 7984f37 2666fb7

In nearly every case it is better to run subprocess.run() without the
burden of forking a bash shell too. All it takes is to turn strings into
arrays of strings and *remove* some use of formatting.

Aurman specifically, does not use *any* of the fancy shell
interpretation that sometimes makes forking a shell to fork the
commands, a worthwhile endeavor. So don't do so.

This is further cleanup after the removal of subprocesses for file
operations. Only changes where this cleanly applies have been done for
now... in a followup commit, subprocesses that handle arguments as
heavily-quoted strings will be handled.
Some functions pass around lots of string-formatted arguments. Handling
this properly would require major refactoring to use lists everywhere.
For now, use shlex to convert strings to a proper list-based array that
subprocess.run can elegantly handle without running its own shell
parser.
Instead of decoding the stdout of subprocess.run and then doing our own
writes, just hand subprocess the file object and let it write directly.
Appending is a bit faster than constucting new lists on top of the old
ones. It also avoids the need to use two lists.
@eli-schwartz
Copy link
Collaborator Author

Develop branch looks good, and has the heroic quality of making aurman actually use native lists for this, instead of the shlex dance, which is definitely nice.

Thanks!

@polygamma
Copy link
Owner

see: 5a40935...32e2ac6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants