Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix proc_* usages for running shell commands not correctly returning the child process status code due to quirky proc_* behaviours #3745

Closed
wants to merge 5 commits into from

Conversation

luispabon
Copy link

@luispabon luispabon commented Jan 4, 2024

Using proc_* instead of shell_exec does give an extra degree of control over the execution of commands, but also introduces quirky behaviour (present for nearly 20 years) whereby if the pipes aren't read before we close them, the return code given by proc_close is incorrect.

This fixes a number of issues on the AcmeClient where commands would apparently fail with return code 120 when in fact they actually succeeded.

This actually fixes #1915 and #2710

The better solution would be to use one of the more established libraries (eg symfony/process) but it seems that introducing dependencies is a no-no given the project structure.

I've noticed that all of the proc_ usages dotted around the code are pretty much copy paste.

Synthetic test: running a command that fails with before / after the fix:
image

Similar test, but with a succeeding command
image

Here's an example of the anomalous behaviour while running a google cloud DNS cert update
image

And this is after the patches above:
image

Again, when running acme.sh:
image

And after the fix:
image

…ss status code due to quirky proc_* behaviours
@fichtner
Copy link
Member

fichtner commented Jan 4, 2024

To be honest this is one reason why random and custom “run shell command” functions make no sense in the plugin scope.

@luispabon luispabon changed the title Fix LeUtil::run_shell_command not correctly returning the child process status code due to quirky proc_* behaviours Fix proc_* usages for running shell commands not correctly returning the child process status code due to quirky proc_* behaviours Jan 4, 2024
@luispabon
Copy link
Author

luispabon commented Jan 4, 2024

Ready for review.

@fichtner I don't know enough of the design philosophy of this codebase to say one way or the other, but this at least fixes the existing problem. Longer term, I'd tentatively suggest to provide a library on base opnsense code to let plugins run commands safely and with less shenanigans. But it's out of scope for this fix I'd say.

@fraenki
Copy link
Member

fraenki commented Jan 5, 2024

why random and custom “run shell command” functions make no sense in the plugin scope

Any suggestion for improvements?

@fichtner
Copy link
Member

fichtner commented Jan 5, 2024

First consolidate all spots into one library function, then discuss what requirement was needed to reimplement. It may have been the stdin pipe use?

fraenki added a commit to fraenki/plugins that referenced this pull request Jan 6, 2024
@fraenki fraenki added the bug Production bug label Jan 6, 2024
@fraenki
Copy link
Member

fraenki commented Jan 6, 2024

I was unable to reproduce this on OPNsense:

root@opnsense:~ # php php_proc.php
The shell command 'ls -l /tmp' returned exit code '0'

However, when running the same PHP code on Linux this was reliably reproducable:

$ php php_proc.php
The shell command 'ls -l /tmp' returned exit code '2'

🤷‍♂️
I'll add the workaround anyway, because it was reported to fix gcloud.

@fraenki
Copy link
Member

fraenki commented Jan 6, 2024

@fichtner I've tried to implement your suggestion in #3749. Would appreciate a review.

@luispabon Thanks for submitting this PR. I've implemented other enhancements and incoorporated your fix into #3749. I'll close this PR now.

@fraenki fraenki closed this Jan 6, 2024
@luispabon
Copy link
Author

Brilliant, thank you 👍

fraenki added a commit to fraenki/plugins that referenced this pull request Jan 8, 2024
@fraenki
Copy link
Member

fraenki commented Jan 9, 2024

@luispabon, upon further testing I've noticed that the change may cause the acme.sh process get stuck (it just hangs forever and will never terminate). This is the case because the stdout/stderr streams are blocking streams by default. The following change to LeUtils.php should fix this:

--- a/security/acme-client/src/opnsense/mvc/app/library/OPNsense/AcmeClient/LeUtils.php
+++ b/security/acme-client/src/opnsense/mvc/app/library/OPNsense/AcmeClient/LeUtils.php
@@ -197,6 +197,8 @@ class LeUtils
             // This workaround ensures that the accurate return code
             // is reliably returned.
             fclose($proc_pipes[0]);
+            stream_set_blocking($proc_pipes[1], false);
+            stream_set_blocking($proc_pipes[2], false);
             $output = array();
             while (!feof($proc_pipes[1])) {
                 $output[] = rtrim(fgets($proc_pipes[1], 1024), "\n");

@luispabon would you please verify that this does not negatively affect gcloud? Thanks in advance!

EDIT: Sigh. This causes other issues. Currently investigating...

@luispabon
Copy link
Author

@fraenki I haven't been able to reproduce acme.sh getting stuck, but the above doesn't seem to hurt it either, I was able to create a cert from scratch. What other issues does it cause?

@fraenki
Copy link
Member

fraenki commented Jan 9, 2024

@luispabon I see acme.sh reliably getting stuck when trying to issue a new certificate. Is this reproducable for you when using the code from #3749?

@luispabon
Copy link
Author

No, not at all, see for yourself:

recording.webm

I've run this a few times to make sure.

@fraenki
Copy link
Member

fraenki commented Jan 9, 2024

@luispabon Thanks! This is really odd.

Using my initial non-blocking code I've had one occurence of PHP memory exhaustion and high CPU usage.

I did extensive testing and the following code change seems to fix the stuck processes for me:

--- a/security/acme-client/src/opnsense/mvc/app/library/OPNsense/AcmeClient/LeUtils.php
+++ b/security/acme-client/src/opnsense/mvc/app/library/OPNsense/AcmeClient/LeUtils.php
@@ -197,14 +197,14 @@ class LeUtils
             // This workaround ensures that the accurate return code
             // is reliably returned.
             fclose($proc_pipes[0]);
-            $output = array();
-            while (!feof($proc_pipes[1])) {
-                $output[] = rtrim(fgets($proc_pipes[1], 1024), "\n");
+            stream_set_blocking($proc_pipes[1], false);
+            stream_set_blocking($proc_pipes[2], false);
+            while (!feof($proc_pipes[1]) || !feof($proc_pipes[2])) {
+                $stdout = fread($proc_pipes[1], 1024);
+                $stderr = fread($proc_pipes[2], 1024);
+                usleep(50000);
             }
             fclose($proc_pipes[1]);
-            while (!feof($proc_pipes[2])) {
-                $output[] = rtrim(fgets($proc_pipes[2], 1024), "\n");
-            }
             fclose($proc_pipes[2]);
 
             // Get exit code

@luispabon Would you please once more test if this code still works for you, especially when using gcloud?

@luispabon
Copy link
Author

With the following code:

    public static function run_shell_command($proc_cmd, $proc_env = array())
    {
        $proc_desc = array(  // descriptor array for proc_open()
            0 => array("pipe", "r"), // stdin
            1 => array("pipe", "w"), // stdout
            2 => array("pipe", "w")  // stderr
        );
        $proc_pipes = array();
        $proc = proc_open($proc_cmd, $proc_desc, $proc_pipes, null, $proc_env);

        // Make sure the resource could be setup properly
        if (is_resource($proc)) {
            // This workaround ensures that the accurate return code
            // is reliably returned.
            fclose($proc_pipes[0]);

            stream_set_blocking($proc_pipes[1], false);
            stream_set_blocking($proc_pipes[2], false);

            while (!feof($proc_pipes[1]) || !feof($proc_pipes[2])) {
                $stdout = fread($proc_pipes[1], 1024);
                $stderr = fread($proc_pipes[2], 1024);
                usleep(50000);
            }

            fclose($proc_pipes[1]);
            fclose($proc_pipes[2]);
     
            // Get exit code
            $result = proc_close($proc);
            log_error(sprintf("AcmeClient: The shell command returned exit code '%d': '%s'", $result, $proc_cmd));
            return($result);
        } else {
            log_error(sprintf("AcmeClient: Unable to prepare shell command '%s'", $proc_cmd));
            return(-999);
        }
    }

Still works fine and no issues with lingering processes:

image

@luispabon luispabon deleted the fix-le-utils-exec branch January 9, 2024 16:05
@fraenki
Copy link
Member

fraenki commented Jan 9, 2024

@luispabon Thanks again. So I'll commit this code and let others test it in the next release candidate.

fraenki added a commit to fraenki/plugins that referenced this pull request Jan 9, 2024
…e#3745

This deals with the following issues that occured while performing
extensive testing:
- acme.sh processes got stuck forever
- PHP memory exhaustion
- PHP processes with high CPU usage
fraenki added a commit to fraenki/plugins that referenced this pull request Jan 9, 2024
…e#3745

This deals with the following issues that occured while performing
extensive testing:
- acme.sh processes got stuck forever
- PHP memory exhaustion
- PHP processes with high CPU usage
@luispabon
Copy link
Author

Great stuff, thank you @fraenki

fichtner added a commit that referenced this pull request Jan 10, 2024
fraenki added a commit to fraenki/plugins that referenced this pull request Jan 10, 2024
…e#3745

This deals with the following issues that occured while performing
extensive testing:
- acme.sh processes got stuck forever
- PHP memory exhaustion
- PHP processes with high CPU usage
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Production bug
Development

Successfully merging this pull request may close these issues.

security/acme-client: gcloud DNS - error
3 participants