Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

limit how much data we read from a command substitution #3822

Closed
krader1961 opened this issue Feb 7, 2017 · 12 comments
Closed

limit how much data we read from a command substitution #3822

krader1961 opened this issue Feb 7, 2017 · 12 comments
Assignees
Milestone

Comments

@krader1961
Copy link
Contributor

@krader1961 krader1961 commented Feb 7, 2017

At the moment we don't impose any limits on the amount of data that we accept from read or a command substitution (e.g., (cmd)). Issue #3712 proposes setting a limit of 100 MiB for any single read execution. This issue is to discuss whether the same, or a similar, limit should be imposed on command substitution output. See issue #3817 for examples of what can happen without such a limit.

@ChrisJefferson
Copy link

@ChrisJefferson ChrisJefferson commented Feb 7, 2017

Also (don't know if this should be a separate issue), ctrl+c doesn't work while a command substitution is running, leaving no method of escaping from a bad substitution.

@krader1961
Copy link
Contributor Author

@krader1961 krader1961 commented Feb 7, 2017

Yes, @ChrisJefferson, that is a distinct issue separate from this one. In fact, it's already been reported. See issue #3527.

@ChrisJefferson
Copy link

@ChrisJefferson ChrisJefferson commented Feb 8, 2017

thanks, didn't want it lost. Sorry.

@faho
Copy link
Member

@faho faho commented Feb 9, 2017

ARG_MAX is probably of interest here. See e.g. http://www.in-ulm.de/~mascheck/various/argmax/ and http://pubs.opengroup.org/onlinepubs/7908799/xsh/limits.h.html.

It is defined as "Maximum length of argument to the exec functions including environment data. Minimum Acceptable Value: _POSIX_ARG_MAX ".

So that means once we exceed it, we can't rely on external commands being launchable anymore. We could allow more for our builtins, so instead of doing one invocation of an external command with a huge amount of data you'd store it in a variable and then run the external command with smaller chunks.

But the goal here isn't to disallow "illegal" invocations, it's to avoid a hung shell, so we should probably just pick a high limit and use it for everything. If you need to e.g. remove a gazillion files, use rm -r or find -delete instead of a command substitution.

(On my system, ARG_MAX is 2MiB, btw)

@krader1961
Copy link
Contributor Author

@krader1961 krader1961 commented Feb 11, 2017

We should not distinguish between the ARG_MAX limit for external commands and builtins (or functions) not subject to that limit. Doing so just leads to inconsistencies that confuses users. At least with respect to limiting the text interpolated by a command substitution. Not least because other text on the command line outside of the command substitution could easily result in ARG_MAX being exceeded. The two failure modes are independent of each other.

Should we use the maximum of ARG_MAX or 10 MiB for command substitution? I vote no for consistency with the read command.

@zanchey
Copy link
Member

@zanchey zanchey commented Feb 27, 2017

Which bit are you voting no to? :-)

@krader1961
Copy link
Contributor Author

@krader1961 krader1961 commented Feb 27, 2017

No to using ARG_MAX. Using that limit for any single command substitution would not guarantee that an external command could be launched because other arguments on the command line could push us over the OS limit. Too, on many systems ARG_MAX can be quite small (e.g., a few tens of KB). It would also be inconsistent with the recently imposed limit for the read command and unnecessarily limit the utility of fish builtins and functions.

@floam
Copy link
Member

@floam floam commented Mar 3, 2017

Should this perhaps be something queryable/set through our ulimit builtin rather than an environment variable?

@krader1961
Copy link
Contributor Author

@krader1961 krader1961 commented Mar 3, 2017

Should this perhaps be something queryable/set through our ulimit builtin rather than an environment variable?

Yes, but only if we can be reasonably confident it won't need to be set prior to reading the users ~/.config/fish/config.fish. That's because all the other ulimit options affect child processes whereas this would not. So using ulimit is not 100% equivalent to using an env var for this particular setting. Unless, of course, we magically mapped it to an env var and vice-versa. But if we did that why bother involving ulimit since this is a var, like fish_use_posix_spawn, that we do not expect anyone to ever need to set except for debugging; e.g., like I did in the read unit tests.

@krader1961
Copy link
Contributor Author

@krader1961 krader1961 commented Mar 4, 2017

Since no one has argued for not implementing a limit on how much data we will accept from a command substitution I'm going to remove the RFC label.

The only question I think still needs an answer is what to do if the limit is exceeded for a non-interactive shell. We should obviously emit a warning to stderr whether or not the shell is interactive. But if non-interactive should the shell immediately exit or simply treat that command substitution has having failed with a non-zero status (and zero characters substituted)?

@krader1961 krader1961 removed the RFC label Mar 4, 2017
@zanchey
Copy link
Member

@zanchey zanchey commented Jun 1, 2017

I think the behaviour should mirror that of read - that is, exit with a non-zero status without any substitution.

@krader1961 krader1961 modified the milestones: fish-3.0, fish-future Jun 23, 2017
@krader1961
Copy link
Contributor Author

@krader1961 krader1961 commented Jun 23, 2017

We snuck in the same change to the read command in the 2.6.0 release. Which was arguably wrong to do since it wasn't backward compatible. We should only do the same thing for command substitutions in a major release. So retargeting this to the fish 3.0.0 release it looks like we're going to do a major release in the near future.

Also, it is a good sign that no one has reported a problem with the 10 MiB limit on read commands since 2.6.0 was released. That increases my confidence this is a low risk change that will only make it harder for badly written fish script to result in a denial of service type problem.

@krader1961 krader1961 self-assigned this Jul 27, 2017
krader1961 added a commit that referenced this issue Aug 4, 2017
This makes command substitutions impose the same limit on the amount
of data they accept as the `read` builtin. It does not limit output of
external commands or builtins in other contexts.

Fixes #3822
@krader1961 krader1961 closed this Aug 4, 2017
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants