Scripts stuck in "Upcoming" #17695

Closed
pacamaster opened this issue Mar 18, 2024 · 17 comments
Assignees
Labels
~agent Related to Fleet's osquery runtime and agent autoupdater (Orbit) bug Something isn't working as documented ~csa Issue was created by or deemed important by the Customer Solutions Architect. customer-preston #g-mdm MDM product group P2 Prioritize as urgent :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. ~released bug This bug was found in a stable release.
Milestone

Comments

@pacamaster
Member

pacamaster commented Mar 18, 2024

Fleet version:
Reported in Fleet 4.47.0 (Go go1.21.7)
osquery 5.11.0
Fleetd 1.22.0
Web browser and operating system:
Current version


💥  Actual behavior

After running scripts on an endpoint, the script is stuck in "Upcoming."

This doesn't let other scripts behind it run.

The expected behavior is that scripts that take over 5 minutes to run time out and are moved to "Past." Then the scripts queued behind them run.
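
To make the expected behavior concrete, here is a minimal Go sketch of a queue that gives each item its own deadline, records a timed-out item as finished, and then moves on to the next one. It is not Fleet's or fleetd's implementation: `task`, `result`, `drain`, `demoTasks`, and `demoTimeout` are hypothetical names, and the tasks are plain functions standing in for scripts.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// task is a hypothetical stand-in for a queued script.
type task struct {
	name string
	run  func(ctx context.Context) error
}

// result records how a task left the queue.
type result struct {
	name     string
	timedOut bool
	err      error
}

// drain runs tasks one at a time, in order. Each task gets its own deadline.
// A task that exceeds the deadline is recorded as timed out, and the next
// task still runs.
func drain(tasks []task, timeout time.Duration) []result {
	past := make([]result, 0, len(tasks))
	for _, t := range tasks {
		ctx, cancel := context.WithTimeout(context.Background(), timeout)
		done := make(chan error, 1) // buffered so a late task never blocks on send
		go func(t task) { done <- t.run(ctx) }(t)

		var err error
		select {
		case err = <-done:
		case <-ctx.Done():
			err = ctx.Err()
		}
		cancel()
		past = append(past, result{
			name:     t.name,
			timedOut: errors.Is(err, context.DeadlineExceeded),
			err:      err,
		})
	}
	return past
}

// demoTimeout is deliberately tiny; the issue describes a 5 minute limit.
const demoTimeout = 100 * time.Millisecond

// demoTasks returns one task that outlives the deadline and one that
// finishes immediately.
func demoTasks() []task {
	return []task{
		{name: "slow-install", run: func(ctx context.Context) error {
			select {
			case <-time.After(2 * time.Second): // pretend to download large apps
				return nil
			case <-ctx.Done():
				return ctx.Err()
			}
		}},
		{name: "recovery-key", run: func(ctx context.Context) error { return nil }},
	}
}

func main() {
	for _, r := range drain(demoTasks(), demoTimeout) {
		fmt.Printf("%s: timedOut=%v err=%v\n", r.name, r.timedOut, r.err)
	}
}
```

The point of the sketch is the ordering guarantee: the second task runs even though the first one never finished on its own, which is what the report says is not happening.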

🧑‍💻  Steps to reproduce

  1. Enroll a Windows host with --enable-scripts
  2. Run this script that attempts to install several Windows applications
  3. See that sometimes the script is stuck in "Upcoming"
@pacamaster pacamaster added bug Something isn't working as documented customer-preston :incoming New issue in triage process. labels Mar 18, 2024
@sharon-fdm
Contributor

@georgekarrv, assign to MDM team?

@nonpunctual nonpunctual added #g-mdm MDM product group ~released bug This bug was found in a stable release. ~csa Issue was created by or deemed important by the Customer Solutions Architect. labels Mar 18, 2024
@nonpunctual
Contributor

@sharon-fdm sorry if I assigned to the wrong team!

@valentinpezon-primo

Related to #17180

@nonpunctual
Contributor

From customer-preston:

(per request for a use case for script purging)

Since:
  - we are doing script-based app management on our own
  - we sometimes have to download multi-GB apps
  - we use one script to install all customer apps

And, on the Fleet side of script rules:
  - there is currently no way to manually clear or adjust the script queue
  - scripts will attempt execution 2x
  - if a script needs to run a second time after this, it must be queued again
  - re-queueing does not clear any previous failures or attempts
  - script history can be obtained with the API (or activity feed), but the Fleet UI modal shows the latest (or pending)

This is what happens:
  - Our app-install scripts time out
  - We consider the script still running, since it's in the queue, so we do not attempt to run it again
  - Any other script (for recovery key, MB agent, ...) is queued and NOT RUN since the queue can't be purged

This is obviously a HUGE problem, since App Management is a key part of any MDM value prop, especially for SMBs, and this blocks any other form of script execution.

We really need solutions from you on this, starting with but not limited to:
  - More permissive rules around script run time
  - Script purging

@noahtalerman noahtalerman changed the title from "Scripts not running on endpoints with enable-scripts" to 'Scripts stuck in "Upcoming"' Mar 19, 2024
@noahtalerman
Member

noahtalerman commented Mar 19, 2024

EDIT: The customer added multiple applications to the following script and found that at around 6 or 7 apps that take a while to download, the script gets stuck in "Pending":

$diskName = (get-location).Drive.Name;
    
    function ExecInstallation($wingetDirectory) {
        $tempFolderPath = "$($diskName):\Temp"
        if (-Not (Test-Path -Path $tempFolderPath)) {
            echo "Creating temporary repository - $tempFolderPath"
            New-Item -Path $tempFolderPath -ItemType Directory
        }

        cd $wingetDirectory
        echo "Running $wingetDirectory\winget.exe"
        
        
        cmd.exe /c "winget.exe install --disable-interactivity --silent --accept-package-agreements --accept-source-agreements --force SlackTechnologies.Slack"
    }

    Set-ExecutionPolicy -ExecutionPolicy Bypass -Force
    $expectedGlobalPath = "$($diskName):\Program Files\WindowsApps\Microsoft.DesktopAppInstaller_*_x64_*"

    echo $expectedGlobalPath

    if (Test-Path "$expectedGlobalPath\winget.exe") {
        $wingetDirectory = $expectedGlobalPath
        echo "Global winget directory found for - $wingetDirectory"
        ExecInstallation "$wingetDirectory"
    } else {
        echo "Global winget directory not found"
        $userFolders = Get-ChildItem -Path "$($diskName):\Users" -Directory

        foreach ($userFolder in $userFolders) {
            $user = $userFolder.Name
            $wingetDirectory = "C:\Users\$user\AppData\Local\Microsoft\WindowsApps\Microsoft.DesktopAppInstaller*"

            if (Test-Path "$wingetDirectory\winget.exe") {
                echo "Winget directory found for user $user - $wingetDirectory"
                ExecInstallation "$wingetDirectory"
                break
            }
        }
    }

@JoStableford
Contributor

@noahtalerman noahtalerman added the :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. label Mar 20, 2024
@noahtalerman
Member

Hey @valentinpezon-primo! When you get the chance, can you please share the exact script y'all ran? (the one that includes the 6 or 7 apps)

This way, we can try to reproduce on our end and get a fix in.

@valentinpezon-primo

Hi @noahtalerman, sure I can, but it's not script related, since it's not the same script that gets blocked every time. I will copy-paste the scripts that get blocked so you have them anyway 👍

Also, I saw this small note in the Fleet UI: "Script is running or will run when the host comes online."
Maybe the issue is not related to the queue itself but to the way you tell your script queue that the host is online? Being online is the trigger to start the scripts in the queue.

Here are the scripts :

Script to re-enroll a Windows device when enrollment fails:

# Registry key that holds one subkey per MDM enrollment
$EnrollmentsPath = "HKLM:\SOFTWARE\Microsoft\Enrollments\"
$Enrollments = Get-ChildItem -Path $EnrollmentsPath

# Discovery URLs that identify the enrollments to reset
$DiscoveryServerFullUrls = @("https://aventa.mdm.getprimo.com/api/mdm/microsoft/discovery")

Foreach ($Enrollment in $Enrollments) {
    $EnrollmentObject = Get-ItemProperty Registry::$Enrollment
    if ($EnrollmentObject."DiscoveryServiceFullURL" -in $DiscoveryServerFullUrls ) {
        $EnrollmentPath = $EnrollmentsPath + $EnrollmentObject."PSChildName"
        Write-Host "Deleting enrollment: $EnrollmentPath"
        Remove-Item -Path $EnrollmentPath -Recurse
        Write-Host "Enrollment deleted. Re-enrolling the device..."
        C:\Windows\System32\deviceenroller.exe /c /AutoEnrollMDM
        Write-Host "The device has been re-enrolled."
    }
}

Script to install apps (on macOS):

#!/bin/bash

curl -o Installomator.sh https://raw.githubusercontent.com/Installomator/Installomator/main/Installomator.sh
chmod +x Installomator.sh

sudo ./Installomator.sh microsoftoffice365 DEBUG=0 NOTIFY=silent BLOCKING_PROCESS_ACTION=silent_fail
sudo ./Installomator.sh microsoftteams DEBUG=0 NOTIFY=silent BLOCKING_PROCESS_ACTION=silent_fail
sudo ./Installomator.sh 1password8 DEBUG=0 NOTIFY=silent BLOCKING_PROCESS_ACTION=silent_fail
rm -rf Installomator.sh
    

Script that installs multiple apps on Windows:

    $diskName = (get-location).Drive.Name;
    
    function ExecInstallation($wingetDirectory) {
        $tempFolderPath = "$($diskName):\Temp"
        if (-Not (Test-Path -Path $tempFolderPath)) {
            echo "Creating temporary repository - $tempFolderPath"
            New-Item -Path $tempFolderPath -ItemType Directory
        }

        cd $wingetDirectory
        echo "Running $wingetDirectory\winget.exe"
        
        
        cmd.exe /c "winget.exe install --disable-interactivity --silent --accept-package-agreements --accept-source-agreements --force Microsoft.Office"
        cmd.exe /c "winget.exe install --disable-interactivity --silent --accept-package-agreements --accept-source-agreements --force Microsoft.Teams"
        cmd.exe /c "winget.exe install --disable-interactivity --silent --accept-package-agreements --accept-source-agreements --force 1Password.1Password"
    }

    Set-ExecutionPolicy -ExecutionPolicy Bypass -Force
    $expectedGlobalPath = "$($diskName):\Program Files\WindowsApps\Microsoft.DesktopAppInstaller_*_x64_*"

    echo $expectedGlobalPath

    if (Test-Path "$expectedGlobalPath\winget.exe") {
        $wingetDirectory = $expectedGlobalPath
        echo "Global winget directory found for - $wingetDirectory"
        ExecInstallation "$wingetDirectory"
    } else {
        echo "Global winget directory not found"
        $userFolders = Get-ChildItem -Path "$($diskName):\Users" -Directory

        foreach ($userFolder in $userFolders) {
            $user = $userFolder.Name
            $wingetDirectory = "C:\Users\$user\AppData\Local\Microsoft\WindowsApps\Microsoft.DesktopAppInstaller*"

            if (Test-Path "$wingetDirectory\winget.exe") {
                echo "Winget directory found for user $user - $wingetDirectory"
                ExecInstallation "$wingetDirectory"
                break
            }
        }
    }

@noahtalerman
Member

@valentinpezon-primo thanks!

it's not the same script that gets blocked every time

Got it.

Also, I saw this small note in the Fleet UI: "Script is running or will run when the host comes online."
Maybe the issue is not related to the queue itself but to the way you tell your script queue that the host is online? Being online is the trigger to start the scripts in the queue.

Hmm, that's an interesting thought. For the hosts with scripts stuck in "Upcoming," are these hosts offline?

@noahtalerman
Member

cc @georgekarrv ^^

@valentinpezon-primo

Hmm, that's an interesting thought. For the hosts with scripts stuck in "Upcoming," are these hosts offline?

No, they come back online, or at least the UI says they are online. My idea was that maybe the front end "knows" but the script queue doesn't.

But I don't want to interfere with your debugging. It's just a random idea.

@martinpannier

Hey @noahtalerman, I think there may be 2 issues that are getting mixed up in one? There has been a lot of focus on scripts timing out. But we are seeing the issue on a fresh device, just enrolled, online — no failed scripts so far, scripts are just not running:
[Image: Screenshot_2024-03-28_at_11_41_04]

As a reminder, we rely on scripts to install applications.

@noahtalerman
Member

Thanks @martinpannier!

I think there may be 2 issues that are getting mixed up in one? There has been a lot of focus on scripts timing out. But we are seeing the issue on a fresh device, just enrolled, online — no failed scripts so far, scripts are just not running

This issue covers the "scripts are just not running" bit.

I updated the issue description to clarify this. Does that capture the issue y'all are seeing?

FYI @dantecatalfamo

@dantecatalfamo
Member

I'm looking into this, but haven't been able to recreate it yet

@martinpannier

@noahtalerman Yes, perfect thanks!
@dantecatalfamo Happy to do a live call so you can see the problem first hand and run some tests

We would also love to have some mitigation steps if you guys have any idea (up to & including the dreaded "have you tried restarting your computer?")

@noahtalerman
Member

noahtalerman commented Apr 1, 2024

Happy to do a live call so you can see the problem first hand and run some tests

Hey @martinpannier! I think this is the plan for the Tuesday (4/2) call.

We would also love to have some mitigation steps if you guys have any idea (up to & including the dreaded "have you tried restarting your computer?")

Currently, running these cleanup queries in the Fleet DB is one known workaround:

Delete pending scripts for a single host matching host_id X:

DELETE FROM host_script_results WHERE host_id = X AND exit_code IS NULL

Delete pending scripts for all hosts:

DELETE FROM host_script_results WHERE exit_code IS NULL
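
The two statements above differ only in the host filter. As a rough sketch, assuming a MySQL-style `?` placeholder, they could be built with a bound parameter instead of pasting the host ID into the SQL text. The table and column names come from the queries above; `pendingCleanupQuery` is a hypothetical helper, not part of Fleet.

```go
package main

import "fmt"

// pendingCleanupQuery returns the DELETE statement from the workaround and
// its bound arguments. A nil hostID targets every host; otherwise only the
// given host is affected.
func pendingCleanupQuery(hostID *uint) (string, []any) {
	const base = "DELETE FROM host_script_results WHERE exit_code IS NULL"
	if hostID == nil {
		return base, nil
	}
	return base + " AND host_id = ?", []any{*hostID}
}

func main() {
	// Host ID 42 is an arbitrary example value.
	id := uint(42)
	query, args := pendingCleanupQuery(&id)
	fmt.Println(query, args)

	query, args = pendingCleanupQuery(nil)
	fmt.Println(query, args)
}
```

The returned pair is shaped for `db.Exec(query, args...)` from `database/sql`. Running the matching `SELECT` first is a cheap way to check how many rows would be removed.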

Obviously, this is a workaround that neither the IT admin nor the end user can take.

Taking this back to the Fleet team for ideas on workarounds for IT admin / end user.

@georgekarrv georgekarrv added this to the 4.49.0-tentative milestone Apr 2, 2024
@georgekarrv georgekarrv removed the :incoming New issue in triage process. label Apr 3, 2024
@noahtalerman noahtalerman added the P2 Prioritize as urgent label Apr 10, 2024
dantecatalfamo added a commit that referenced this issue Apr 16, 2024
#17695

The Windows exit code is a 32-bit unsigned integer, but the command
interpreter treats it like a signed integer. When a process is killed,
it returns 0xFFFFFFFF (interpreted as -1). We convert the integer to a
signed 32-bit integer to flip it to -1, to match our expectations and
fit in our db column.

https://en.wikipedia.org/wiki/Exit_status#Windows

Fixed on both the client and server side.
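
The conversion described in the commit message can be illustrated with a few lines of Go. This is a sketch of the idea only, not the code from the commit; `normalizeExitCode` is a hypothetical name.

```go
package main

import "fmt"

// normalizeExitCode reinterprets the unsigned 32-bit exit code reported on
// Windows as a signed 32-bit value, so 0xFFFFFFFF becomes -1.
func normalizeExitCode(raw uint32) int32 {
	return int32(raw) // same 32 bits, read as two's complement
}

func main() {
	for _, raw := range []uint32{0, 1, 0xFFFFFFFF} {
		fmt.Printf("raw=0x%08X signed=%d\n", raw, normalizeExitCode(raw))
	}
}
```

Values below 0x80000000 are unchanged by the conversion, so ordinary exit codes such as 0 and 1 are stored as before.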
@georgekarrv georgekarrv modified the milestones: 4.49.0-tentative, 4.48.3 Apr 16, 2024
sharon-fdm pushed a commit that referenced this issue Apr 16, 2024
#17695

@georgekarrv georgekarrv added the ~agent Related to Fleet's osquery runtime and agent autoupdater (Orbit) label Apr 16, 2024
@fleet-release
Contributor

No scripts stuck, stalled,
Fleet flows like river, unblocked.
Clear path for code's call.

@georgekarrv georgekarrv added :demo and removed :demo labels Apr 19, 2024
Development

No branches or pull requests

10 participants