Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing output from Start-RSJob *bug* #121

Closed
hou8182 opened this issue Jan 24, 2017 · 14 comments
Closed

Missing output from Start-RSJob *bug* #121

hou8182 opened this issue Jan 24, 2017 · 14 comments

Comments

@hou8182
Copy link

hou8182 commented Jan 24, 2017

I ran a script to return the number of days that a list of 10,000 computers has been online. I sent the output to a CSV file, but when I opened it I was missing results from over 4,000 machines. I have had other scripts that I've fed into Start-RSJob that have performed in a similar manner where the output results do not come back 100%.

I have to rerun the script over on everything that Start-RSJob did not return the first time around. Sometimes I have to do this several times. I'm not sure what the problem could be since I was able to verify that all machines were online and reachable. Is this a known bug that was fixed or is being worked on in a later release of PoshRSJob?

If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem

` $computers = Get-Content "C:\Users\Admin\Desktop\PowerShell Testing\DifferenceResults.txt"

$Computers | Start-RSJob -Name {"$($_)"} -Throttle 200 -ScriptBlock {
Param($Computer)
$PSBoundParameters.GetEnumerator() | ForEach {

    $OSObject = Get-WmiObject win32_operatingsystem -ComputerName $Computer
    $Today = Get-Date
    $Uptime = $OSObject.ConvertToDateTime($OSObject.LastBootUpTime)
    $Diff = $Today.Subtract($Uptime)
    $DaysUp  = $Diff.days
    

    $results = @()
    $eachcomputer = [ordered]@{"Computer"=$Computer;
                               "DaysOnline"=$DaysUp}

    $newobj = New-Object psobject -property $eachcomputer
    $results += $newobj }
    
    $results | Export-Csv "C:\Users\Admin\Desktop\PowerShell Testing\DaysSinceLastReboot2.csv" -Append -NoTypeInformation }`

What is the expected behavior?

The script should return 100% of the uptime results into the CSV file that I named, but seems to only return about half of the results.

Which versions of Powershell and which OS are affected by this issue? Did this work in previous versions of our scripts?

I am running PS5 on Win7.

@MVKozlov
Copy link
Contributor

except online/offline problems I suspect bad behaviour of export-csv in multithreaded environment
you may surrond your code block to try {} catch { } and redirect any possible errors to log file

@hou8182
Copy link
Author

hou8182 commented Jan 24, 2017

Thanks for the reply. I'm not sure how to check the PoshRS release version that I am currently running, but could this possibly be a bug that was fixed from an earlier release? I see that there is a new version of PoshRS that was released 9 days ago. I may try that if you think it could help. I'll also try the try/catch blocks like you suggested.

@proxb
Copy link
Owner

proxb commented Jan 25, 2017

This definitely sounds like an caused by multithreading with different runspaces fighting over who gets to write to a single file. I would recommend using something like a mutex in your scriptblock to help ensure that only run thread has access at a time to update the CSV file to avoid missing information. I wrote about using mutexes in PowerShell at the link below that might help you out.

Using Mutexes to Write Data to the Same Logfile Across Processes With PowerShell

@hou8182
Copy link
Author

hou8182 commented Jan 25, 2017 via email

@proxb
Copy link
Owner

proxb commented Jan 29, 2017

Awesome, thanks for the reply! I'll read up on it and give it a shot. Not
sure if this could be an option to add into the next release of PoshRS or
not.

@hou8182 Unfortunately, this would be difficult to do as you would really only want to use the mutex where needed which would rely on the person writing the code to identify where that would be. I could add some mutex code at the beginning and end of the scriptblock, but that type of block would impact the performance on the runspace jobs as each one would have to wait for the runspace holding the lock to complete before beginning the next job.

@proxb proxb self-assigned this Jan 31, 2017
@proxb proxb added the question label Jan 31, 2017
@codykonior
Copy link
Contributor

codykonior commented Feb 21, 2017

@hou8182 do the jobs finish? Do they set the HasErrors flags? Or do they finish and with no errors but Receive-RSJob shows nothing?

If so try setting Set-StrictMode -VersionLatest; $ErrorActionPreference = "Stop"; at the top of your script block so that any errors terminate and print out the cause. (Those will be shown on screen by Receive-RSJob but not directed to a variable, so you'd have to run it interactively or redirect the error stream).

@MVKozlov
Copy link
Contributor

MVKozlov commented Apr 3, 2017

Just catched the same problem
Jobs not always return output
Tested with latest Pester (v4.0.3)
1-5% of all tests failed, Seems, somewhere between Wait-RSJob and Receive-RSJob we have race condition

function DoTest {
	$jobs = 1,2,3 |	Start-RSJob { $_ }

	#bad
	$jobs | Wait-RSJob | Receive-RSJob | Sort

	#good
#	$wj = $jobs | Wait-RSJob
#	$jobs | Receive-RSJob | Sort

	$jobs | Remove-RSJob
}

Describe "Test Wait-RSJob | Receive-RSJob Scenario" -Tags Scenario {
	    Context "output" {
			1..200 | Foreach-Object {
				It "try $_, get 1,2,3" {
				   DoTest  | Should -Be 1,2,3
				}
			}
		}
}

btw, for 1000 iterations I get memory problems

@MVKozlov
Copy link
Contributor

MVKozlov commented Apr 3, 2017

debug output on good and error conditions:

      [+] try 180, get 1,2,3 225ms
VERBOSE: ParameterSet: All
DEBUG: Bound: False
VERBOSE: Adding RSJob
VERBOSE: Adding RSJob
VERBOSE: Adding RSJob
VERBOSE: WhereString: $_.id -match "^1747$|^1748$|^1749$"
VERBOSE: Using scriptblock
VERBOSE: 3
VERBOSE: Wait: 0
VERBOSE: Completed: (3)
VERBOSE: Total: (3)
VERBOSE: Status: 1
DEBUG: ParameterSet: Job
DEBUG: ScriptBlock:
      [+] try 181, get 1,2,3 106ms
VERBOSE: ParameterSet: All
DEBUG: Bound: False
VERBOSE: Adding RSJob
VERBOSE: Adding RSJob
VERBOSE: Adding RSJob
VERBOSE: WhereString: $_.id -match "^1750$|^1751$|^1752$"
VERBOSE: Using scriptblock
VERBOSE: 3
VERBOSE: Wait: 3
VERBOSE: Completed: (0)
VERBOSE: Total: (3)
VERBOSE: Status: 0
VERBOSE: Wait: 3
VERBOSE: Completed: (0)
VERBOSE: Total: (3)
VERBOSE: Status: 0
VERBOSE: Wait: 3
VERBOSE: Completed: (0)
VERBOSE: Total: (3)
VERBOSE: Status: 0
VERBOSE: Wait: 3
VERBOSE: Completed: (0)
VERBOSE: Total: (3)
VERBOSE: Status: 0
VERBOSE: Wait: 3
VERBOSE: Completed: (0)
VERBOSE: Total: (3)
VERBOSE: Status: 0
VERBOSE: Wait: 3
VERBOSE: Completed: (0)
VERBOSE: Total: (3)
VERBOSE: Status: 0
VERBOSE: Wait: 3
VERBOSE: Completed: (0)
VERBOSE: Total: (3)
VERBOSE: Status: 0
VERBOSE: Wait: 3
VERBOSE: Completed: (0)
VERBOSE: Total: (3)
VERBOSE: Status: 0
VERBOSE: Wait: 3
VERBOSE: Completed: (0)
VERBOSE: Total: (3)
VERBOSE: Status: 0
VERBOSE: Wait: 3
VERBOSE: Completed: (0)
VERBOSE: Total: (3)
VERBOSE: Status: 0
VERBOSE: Wait: 3
VERBOSE: Completed: (0)
VERBOSE: Total: (3)
VERBOSE: Status: 0
VERBOSE: Wait: 2
VERBOSE: Completed: (1)
VERBOSE: Total: (3)
VERBOSE: Status: 0.333333333333333
VERBOSE: Wait: 0
VERBOSE: Completed: (3)
VERBOSE: Total: (3)
VERBOSE: Status: 1
DEBUG: ParameterSet: Job
DEBUG: ScriptBlock:
      [-] try 182, get 1,2,3 243ms
        Expected: {1 2 3}
        But was:  {2 3}
        at <ScriptBlock>, D:\Work\ps\Scripts\Pester\m\rsj.Tests.ps1: line 18
        18:                                DoTest  | Should -Be 1,2,3

@codykonior
Copy link
Contributor

Are you using the current version? Because Boe included a fix of mine a month or so ago.

@MVKozlov
Copy link
Contributor

MVKozlov commented Apr 3, 2017

definitely.
for test just cloned latest from github
Tests Passed: 184, Failed: 16

 D:\Work\ps\Scripts\Pester\m> $PSVersionTable

Name                           Value
----                           -----
PSVersion                      5.1.14409.1005
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.14409.1005
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1

 D:\Work\ps\Scripts\Pester\m> Get-Module PoshRSJob

ModuleType Version    Name                                ExportedCommands
---------- -------    ----                                ----------------
Script     1.7.3.8    PoshRSJob                           {Get-RSJob, Receive-RSJob, Remove-RSJob, Start-RSJob...}

@MVKozlov
Copy link
Contributor

MVKozlov commented Apr 3, 2017

Try to launch on other computer with more memory and Win10 onboard
errors happen much less often
for around 800 iterations I get "low memory" error from system
Powershell take about 1600 Mb. and there is

PS D:\Work\1> $PoshRS_RunspacePools.Count
810

may be $PoshRS_RunspacePoolCleanup does not work as expected ?

btw, I'm not found any .InnerJob.dispose() in Remove-RSJob

@MVKozlov
Copy link
Contributor

MVKozlov commented Apr 3, 2017

I think at least it is need to move [gc]::collect() out of

[System.Threading.Monitor]::Enter($PoshRS_RunspacePools.syncroot)
[System.Threading.Monitor]::Exit($PoshRS_RunspacePools.syncroot)

or (minimum) out of foreach into code branch after
If ($DisposePoshRS_RunspacePools) {

and move Remove-Variable TempCollection -ErrorAction SilentlyContinue into same code branch

@MVKozlov
Copy link
Contributor

MVKozlov commented Apr 4, 2017

I'm got it
since Wait-RSJob nor Receive-RSJob does not use Monitor::Enter/Exit
there is race condition when .Completed flag set before output saved.

so, to resolve race condition and garbage collection outside of Monitor for #48

diff --git a/PoshRSJob/PoshRSJob.psm1 b/PoshRSJob/PoshRSJob.psm1
index cb85242..1df46cb 100644
--- a/PoshRSJob/PoshRSJob.psm1
+++ b/PoshRSJob/PoshRSJob.psm1
@@ -153,13 +153,13 @@ $PoshRS_jobCleanup.PowerShell = [PowerShell]::Create().AddScript({
                 }
                 #$PoshRS_jobCleanup.Host.UI.WriteVerboseLine("$($Job.Id) Disposing job")
                 $job.InnerJob.dispose() 
-                $job.Completed = $True  
                 #Return type from Invoke() is a generic collection; need to verify the first index is not NULL
                 If (($Data.Count -gt 0) -AND (-NOT ($Data.Count -eq 1 -AND $Null -eq $Data[0]))) {   
                     $job.output = $Data
                     $job.HasMoreData = $True                           
                 }              
                 $Error.Clear()
+                $job.Completed = $True
             } 
         }        
         [System.Threading.Monitor]::Exit($PoshRS_Jobs.syncroot)
@@ -192,6 +192,10 @@ $PoshRS_RunspacePoolCleanup.PowerShell = [PowerShell]::Create().AddScript({
     #Routine to handle completed runspaces
     Do { 
         #$ParentHost.ui.WriteVerboseLine("Beginning Do Statement")
+        If ($DisposePoshRS_RunspacePools) {
+            #Perform garbage collection
+            [gc]::Collect()
+        }
         $DisposePoshRS_RunspacePools=$False  
         If ($PoshRS_RunspacePools.Count -gt 0) { 
             #$ParentHost.ui.WriteVerboseLine("$($PoshRS_RunspacePools | Out-String)")           
@@ -205,11 +209,6 @@ $PoshRS_RunspacePoolCleanup.PowerShell = [PowerShell]::Create().AddScript({
                         $RunspacePool.RunspacePool.Dispose()
                         $RunspacePool.CanDispose = $True
                         $DisposePoshRS_RunspacePools=$True
-
-                        #Perform garbage collection
-                        [gc]::Collect()
-                        Start-Sleep -Seconds 5
-                        [gc]::Collect()
                     }
                 } Else {
                     $RunspacePool.LastActivity = (Get-Date)
@@ -224,12 +223,20 @@ $PoshRS_RunspacePoolCleanup.PowerShell = [PowerShell]::Create().AddScript({
                     #$ParentHost.ui.WriteVerboseLine("Removing runspacepool <$($_.RunspaceID)>")
                     [void]$PoshRS_RunspacePools.Remove($_)
                 }
+                #Not setting this to silentlycontinue seems to cause another runspace to be created if an error occurs
+                Remove-Variable TempCollection -ErrorAction SilentlyContinue
             }
-            #Not setting this to silentlycontinue seems to cause another runspace to be created if an error occurs
-            Remove-Variable TempCollection -ErrorAction SilentlyContinue
             [System.Threading.Monitor]::Exit($PoshRS_RunspacePools.syncroot)
         }
             #$ParentHost.ui.WriteVerboseLine("Sleeping")
+        If ($DisposePoshRS_RunspacePools) {
+            #Perform garbage collection
+            [gc]::Collect()
+            # already have sleep later,
+            #Start-Sleep -Seconds 5
+            # optimization: moved to top of a cycle
+            #[gc]::Collect()
+        }
         Start-Sleep -Milliseconds 5000     
     } while ($PoshRS_RunspacePoolCleanup.Flag)
 })

MVKozlov added a commit to MVKozlov/PoshRSJob that referenced this issue Apr 4, 2017
@MVKozlov
Copy link
Contributor

MVKozlov commented Apr 4, 2017

on much more powerful computer (which can create a lot of RunspacePools in small time) there is a memory problems
it opens about 600+ runspacepools and fails to open more while old not cleaned up
It even ruined my graphics driver
about 18000 handlers opened in a process
I tried to set $RunspacePool.CleanupInterval for automatic runspaces (without -Batch parameter) from 2 minutes to 10 seconds. not helps a much.

diff --git a/PoshRSJob/Public/Start-RSJob.ps1 b/PoshRSJob/Public/Start-RSJob.ps1
index 2ff9c86..88243d1 100644
--- a/PoshRSJob/Public/Start-RSJob.ps1
+++ b/PoshRSJob/Public/Start-RSJob.ps1
@@ -419,7 +419,12 @@ Function Start-RSJob {
                 }
                 [void]$RunspacePool.SetMaxRunspaces($Throttle)
                 If ($PSVersionTable.PSVersion.Major -gt 2) {
-                    $RunspacePool.CleanupInterval = [timespan]::FromMinutes(2)
+                    if ($PSBoundParameters.ContainsKey('Batch')) {
+                        $RunspacePool.CleanupInterval = [timespan]::FromMinutes(2)
+                    }
+                    else {
+                        $RunspacePool.CleanupInterval = [timespan]::FromSeconds(10)
+                    }
                 }
                 $RunspacePool.Open()
                 $RSPObject = New-Object RSRunspacePool -Property @{

$PoshRS_RunspacePoolCleanup.Timeout setting to smaller value solves this, but for even better processors.....

so there is a fundamental problem with such usage pattern (or .Net limitations I not know about)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants