
cannot affect GPU to task #10

Closed
willDTF opened this issue Oct 29, 2019 · 19 comments
willDTF commented Oct 29, 2019

Hi,
Currently we don't have any GPU options in the GafferDeadline submitter. It would be great to add a "GPU affinity" option to assign a specific number of GPUs per task; then we could use GafferCycles (or the OpenGL render) with multiple GPUs.

Best

ericmehl (Member)

That's a good idea, and CPU affinity too. I'm thinking that since there are so many Deadline options I should add a general "additional options" field where you can input key / value combinations that correspond to Deadline options. That way it would keep the main UI relatively clean with the most common options but give you the ability to add less common options as well.

Do you think that would be a useful feature, or would you rather have it as an explicit option in the UI?

willDTF (Author) commented Oct 30, 2019

Hi, I think it depends a lot on the workflow.

As I use CPU and GPU affinity (sometimes both at the same time) for almost every dispatch, in my case it's a common option, so to me it should be a fixed part of the UI. Having custom fields for more exotic options would be great too.

willDTF (Author) commented Apr 6, 2020

Hello, I have made some attempts to implement this, sadly without success (I tried to mimic the Redshift one).
Are you still considering adding this?
Best

ericmehl (Member) commented May 2, 2020

Hey @Kaiz3rTool, I didn't forget about your request! I'm not entirely sure how to implement this because I'm not familiar with the use case for GPU / CPU affinity. It looks like it's not common to all Deadline jobs, only certain plugins like Redshift, Nuke and some others.

How are you running Redshift from Gaffer? Is it through a SystemCommand or PythonCommand? Or something else?

willDTF (Author) commented May 9, 2020

Hey, I don't run Redshift from Gaffer. I was trying to understand how GPU affinity works in Redshift's Deadline plugin so I could port it to the Gaffer one, but sadly it's too complicated for me.

ericmehl (Member) commented Jan 25, 2021

@Kaiz3rTool, following up on this and your post on the Deadline forum.

As far as I can tell, GPU and CPU affinity are specific to the renderer you are using, so supporting it will depend on how you render through Gaffer.

You can use the new custom environment variables in GafferDeadline, or you can bake it into the command itself. If you are using a SystemCommand node, there is also a set of substitutions you can use (part of Gaffer, not GafferDeadline).

Here's a very rough template you can copy and paste into Gaffer to try to explain my thoughts a little better.


import Gaffer
import GafferDispatch
import IECore
import imath

Gaffer.Metadata.registerValue( parent, "serialiser:milestoneVersion", 0, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:majorVersion", 58, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:minorVersion", 3, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:patchVersion", 2, persistent=False )

__children = {}

__children["SystemCommand"] = GafferDispatch.SystemCommand( "SystemCommand" )
parent.addChild( __children["SystemCommand"] )
__children["SystemCommand"]["dispatcher"]["deadline"]["environmentVariables"].addChild( Gaffer.NameValuePlug( "", Gaffer.StringPlug( "value", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), True, "member1", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["SystemCommand"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["SystemCommand"]["dispatcher"]["deadline"]["pool"].setValue( 'none' )
__children["SystemCommand"]["dispatcher"]["deadline"]["secondaryPool"].setValue( 'none' )
__children["SystemCommand"]["dispatcher"]["deadline"]["group"].setValue( 'none' )
__children["SystemCommand"]["dispatcher"]["deadline"]["onJobComplete"].setValue( 'Nothing' )
__children["SystemCommand"]["dispatcher"]["deadline"]["dependencyMode"].setValue( 'Auto' )
__children["SystemCommand"]["dispatcher"]["deadline"]["environmentVariables"]["member1"]["name"].setValue( 'gpuAffinity' )
__children["SystemCommand"]["dispatcher"]["deadline"]["environmentVariables"]["member1"]["value"].setValue( 'bigBossGPU' )
__children["SystemCommand"]["command"].setValue( '/opt/redshift/bin/redshift --gpu=${gpuAffinity} redshiftFile' )
__children["SystemCommand"]["__uiPosition"].setValue( imath.V2f( 11.1500006, 6.8499999 ) )

del __children

ericmehl (Member)

It looks like Deadline / Redshift set an environment variable called REDSHIFT_GPUDEVICES so if you have that set to the right value it may do what you are looking for. Try this updated node for a couple of different ways of setting that environment variable.


import Gaffer
import GafferDispatch
import IECore
import imath

Gaffer.Metadata.registerValue( parent, "serialiser:milestoneVersion", 0, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:majorVersion", 58, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:minorVersion", 3, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:patchVersion", 2, persistent=False )

__children = {}

__children["SystemCommand"] = GafferDispatch.SystemCommand( "SystemCommand" )
parent.addChild( __children["SystemCommand"] )
__children["SystemCommand"]["dispatcher"]["deadline"]["environmentVariables"].addChild( Gaffer.NameValuePlug( "", Gaffer.StringPlug( "value", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), True, "member1", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["SystemCommand"]["environmentVariables"].addChild( Gaffer.NameValuePlug( "", Gaffer.StringPlug( "value", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), True, "member1", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["SystemCommand"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["SystemCommand"]["dispatcher"]["deadline"]["pool"].setValue( 'none' )
__children["SystemCommand"]["dispatcher"]["deadline"]["secondaryPool"].setValue( 'none' )
__children["SystemCommand"]["dispatcher"]["deadline"]["group"].setValue( 'none' )
__children["SystemCommand"]["dispatcher"]["deadline"]["onJobComplete"].setValue( 'Nothing' )
__children["SystemCommand"]["dispatcher"]["deadline"]["dependencyMode"].setValue( 'Auto' )
__children["SystemCommand"]["dispatcher"]["deadline"]["environmentVariables"]["member1"]["name"].setValue( 'REDSHIFT_GPUDEVICES' )
__children["SystemCommand"]["dispatcher"]["deadline"]["environmentVariables"]["member1"]["value"].setValue( 'bigBossGPU' )
__children["SystemCommand"]["command"].setValue( '/opt/redshift/bin/redshift redshiftFile' )
__children["SystemCommand"]["environmentVariables"]["member1"]["name"].setValue( 'REDSHIFT_GPUDEVICES' )
__children["SystemCommand"]["__uiPosition"].setValue( imath.V2f( 11.1500006, 6.8499999 ) )

del __children

willDTF (Author) commented Jan 31, 2021

Hi Eric,
Thanks for your tests!
I don't run Redshift; I was just trying to understand how it handles GPU affinity so I could mimic it for Arnold.
I use both Arnold and Cycles to render in Gaffer, and neither has GPU affinity from any DCC in Deadline (that's why it's complex, I guess), so I was hoping they could give you a hand on this :)

ericmehl (Member) commented Feb 1, 2021

Ah, I get it now. I don't know how Arnold does its GPU settings; it looks like the version currently shipping with Gaffer still marks GPU support as experimental. But with a little luck it may be similar to Cycles. Unfortunately I don't have a multiple-GPU configuration to test on, but give this a go:

With Cycles I think it might be pretty straightforward. In the CyclesOptions node you can set the device value with an expression that pulls from an environment variable. Try something like this in a new Python expression:
import os
parent["CyclesOptions"]["options"]["device"] = os.environ.get("gpuDevice", "CPU")

You may need to replace the CyclesOptions name with the name of your options node. I don't know how your devices will be named, but you can use the presets dropdown and then get the value using
root["CyclesOptions"]["options"]["device"].getValue()
in the Gaffer Python panel.

Then in GafferDeadline, add a new environment variable called gpuDevice, corresponding to the gpuDevice text in the expression, and set its value to the appropriate setting from your getValue() test. When you submit the job and it runs in Deadline, the environment variable will be set before Gaffer starts and the expression will read it. And if you don't have any value set, it will default to "CPU".

willDTF (Author) commented Feb 4, 2021

Hi Eric,
Sorry, I think I poorly explained what I'm trying to do.
Your idea works well if I want to render on a specific GPU, but I can already choose that in the CyclesOptions node.
What I'm trying to do is dispatch one GPU per task, like Redshift can, so multiple frames render on multiple GPUs at the same time, like:
GPU0 computes frames 0-10
GPU1 computes frames 11-20
GPU2 computes frames 21-30

and so on...
The Redshift GPU affinity in Maya allows this kind of thing, so I guess that with your suggestion, the gpuDevice variable would have to change for every task, looping back at the 8th one (as I have 8 GPUs).
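The dispatch pattern above can be sketched in plain Python (a toy illustration of the idea, not Deadline code; NUM_GPUS and CHUNK_SIZE are made-up values for the example):

```python
# Toy sketch of the dispatch pattern described above: contiguous chunks of
# frames are fanned out across GPUs, wrapping back to GPU0 after the last one.
NUM_GPUS = 8
CHUNK_SIZE = 11  # frames 0-10 inclusive is 11 frames

def gpu_for_frame(frame):
    # chunk index, then wrap around the available GPUs
    return (frame // CHUNK_SIZE) % NUM_GPUS
```

So frame 5 lands on GPU0, frame 15 on GPU1, frame 25 on GPU2, and so on.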

From what I can see in Redshift's Deadline plugin (in Redshift.py), the interesting part that talks to Deadline is:
def getGPUList( self ):
    gpusPerTask = self.GetIntegerPluginInfoEntryWithDefault( "GPUsPerTask", 0 )
    selectGPUDevices = self.GetPluginInfoEntryWithDefault( "SelectGPUDevices", "" ).strip()
    resultGPUs = []

    if self.OverrideGpuAffinity():
        slaveGPUAffinity = list(self.GpuAffinity())
        if gpusPerTask == 0 and selectGPUDevices != "":
            tempGPUs = selectGPUDevices.split( "," )
            notFoundGPUs = []
            for gpu in tempGPUs:
                gpu = gpu.strip()
                if int( gpu ) in slaveGPUAffinity:
                    resultGPUs.append( gpu )
                else:
                    notFoundGPUs.append( gpu )

            if len( notFoundGPUs ) > 0:
                self.LogWarning( "The Worker is overriding its GPU affinity and the following GPUs do not match the Workers affinity so they will not be used: %s" % ",".join( notFoundGPUs ) )

            if len( resultGPUs ) == 0:
                self.FailRender( "The Worker does not have affinity for any of the GPUs specified in the job." )

        elif gpusPerTask > 0:
            if gpusPerTask > len( slaveGPUAffinity ):
                self.LogWarning( "The Worker is overriding its GPU affinity and the Worker only has affinity for %s Workers of the %s requested." % ( len( slaveGPUAffinity ), gpusPerTask ) )
                resultGPUs = [ str( gpu ) for gpu in slaveGPUAffinity ]
            else:
                startingGPU = self.GetThreadNumber() * gpusPerTask
                numOverrideGPUs = len( slaveGPUAffinity )
                startIndex = startingGPU % numOverrideGPUs
                endIndex = ( startingGPU + gpusPerTask ) % numOverrideGPUs
                if startIndex < endIndex:
                    gpus = slaveGPUAffinity[startIndex:endIndex]
                else:
                    # If there are multiple render threads going we could roll over the available GPUs in which case we need to grab from both ends of the available GPUs
                    gpus = slaveGPUAffinity[ :endIndex ] + slaveGPUAffinity[ startIndex: ]

                resultGPUs = [ str( gpu ) for gpu in gpus ]

        else:
            resultGPUs = [ str( gpu ) for gpu in slaveGPUAffinity ]

        self.LogInfo( "The Worker is overriding the GPUs to render, so the following GPUs will be used: %s" % ",".join( resultGPUs ) )

    elif gpusPerTask == 0 and selectGPUDevices != "":
        self.LogInfo( "Specific GPUs specified, so the following GPUs will be used: %s" % selectGPUDevices )
        resultGPUs = selectGPUDevices.split( "," )

    elif gpusPerTask > 0:
        # As of Redshift 1.3 there is no command line option to specify multiple threads, but this code should still work
        startIndex = self.GetThreadNumber() * gpusPerTask

        for i in range( startIndex, startIndex + gpusPerTask ):
            resultGPUs.append( str( i ) )

        self.LogInfo( "GPUs per task is greater than 0, so the following GPUs will be used: " + ",".join( resultGPUs ) )

    return resultGPUs

That corresponds to this panel:

[screenshot: the Redshift plugin's GPU affinity options]
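For what it's worth, the core of that OverrideGpuAffinity branch (pick a contiguous slice of the Worker's GPU list per task thread, wrapping around the end of the list) can be boiled down to a small standalone function. This is my own paraphrase of the quoted Redshift.py logic, not code from the plugin:

```python
def gpus_for_task(affinity, thread_number, gpus_per_task):
    """Paraphrase of the slice-with-wraparound logic quoted above.

    affinity: the Worker's GPU affinity list, e.g. [0, 1, 2, 3]
    thread_number: the task's thread number (0-based)
    gpus_per_task: how many GPUs each task should get
    """
    n = len(affinity)
    start = (thread_number * gpus_per_task) % n
    end = (thread_number * gpus_per_task + gpus_per_task) % n
    if start < end:
        gpus = affinity[start:end]
    else:
        # wrapped past the end of the list: take the tail plus the head
        gpus = affinity[:end] + affinity[start:]
    return [str(g) for g in gpus]
```

With 4 GPUs and 2 GPUs per task, thread 0 gets GPUs 0-1 and thread 1 gets GPUs 2-3; with an odd-sized affinity list the slice wraps around as in the plugin's comment.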

ericmehl (Member) commented Feb 5, 2021

I see, that's a much cleverer use than I realized; I didn't know Redshift did that. Thanks for the really good explanation.

My thought is to change the GafferDeadline plugin (on the Deadline side) to add the GPU thread and CPU thread as environment variables set before Gaffer launches. Then you could create an expression for the GPU device to mimic what Deadline does to assign GPU threads to GPU devices.

That would be pretty simple on my end and I think a great addition so I'll put it on my list for next week.

In the meantime, if you are rendering multiple frames you may be able to get by with a modulo expression that sets the GPU device based on the frame number?
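For example, something along these lines (the "CUDA:N" device naming is a guess on my part; check the presets on your own CyclesOptions node for the real strings):

```python
def device_for_frame(frame, num_gpus=8):
    # hypothetical device naming; substitute whatever your CyclesOptions presets show
    return "CUDA:%d" % (int(frame) % num_gpus)
```

Inside a Gaffer Python expression that would look something like parent["CyclesOptions"]["options"]["device"] = "CUDA:%d" % (int(context.getFrame()) % 8).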

willDTF (Author) commented Feb 6, 2021

Thanks Eric, sounds great! Thank you.
With this solution, my concern is: in case of a failed job, how will Deadline manage reassignment? And how does it work across multiple machines that don't have the same number of GPUs?
As Deadline already knows how to handle GPUs per task, maybe it's easier / more flexible to match that?

I searched in plugins/MayaBatch.py to see how GPU-per-task is done there. It seems the process is to create a GPU override with the specific renderer's flag to lock onto a GPU (I found descriptions for Arnold, but not for Blender so far), then populate the scriptBuilder and let Deadline handle the GPU assignment.
That sounds easy when I say it, but I'm pretty sure it's not; still, I guess it could be simpler / more efficient to match Deadline's way.
Let me know if you want me to research something to make your life easier, or do some tests.

Best

Thanks!

ericmehl (Member) commented Feb 6, 2021

I found this post on the Deadline blog that gives a good explanation: https://www.awsthinkbox.com/blog/cpu-and-gpu-affinity-in-deadline

Depending on how your jobs typically come out, you can:

  1. Launch multiple Deadline Workers on the same machine, each with one GPU (or several) assigned to it. They would pick up multiple frames and multiple jobs at the same time, with some overhead from running multiple Workers.
  2. If you usually render sequences of tasks for the same job, have one Worker take on multiple concurrent tasks, and each task will get one or more GPUs assigned. (That's what I get from the section at the very end.)

I take that to mean that if a task errors, when it's picked up by the next available Worker it gets assigned whichever GPU is available. It would also work in the case of some frames finishing faster than others: the next Worker just takes whatever thread / GPU is assigned to it, and it should utilize your full set of GPUs pretty well.

And luckily for us with either of those methods I can just pass Deadline the GPU thread we're given into Gaffer and then it can be given to the renderer to handle appropriately.

willDTF (Author) commented Feb 6, 2021

That's great. I use case 2 most of the time, and sometimes both at the same time for very fast tasks.

ericmehl (Member) commented Feb 9, 2021

Hey @Kaiz3rTool, give the latest release a try, I think it should sort out your GPU affinity: https://github.com/hypothetical-inc/GafferDeadline/releases/tag/0.54.0.0

If you are using the second method from that blog post, I think you'll need to actually use the environment variable CPUTHREAD similar to how they do it in Redshift, reading it in an expression on the GPU device plug.

startIndex = self.GetThreadNumber() * gpusPerTask

for i in range( startIndex, startIndex + gpusPerTask ):
    resultGPUs.append( str( i ) )

Or if you can only use one GPU for rendering at a time, I expect you can just use the CPUTHREAD directly.

willDTF (Author) commented Feb 11, 2021

Hey @ericmehl,
Thank you for the update. I have made some tests with GPU affinity and Arnold, and it seems to work nicely. Well done!
I can fire 4 GPUs per Worker, following method one from the blog post.

About the second method (multiple tasks with 1 GPU per task), I don't get it, sorry.
I guess the variable you wanted to quote was 'GPUAFFINITY', but where does the gpusPerTask variable come from? I can't find it in the GafferDeadline submitter.

Thanks

ericmehl (Member)

Sorry, I was a little unclear with the quoted code; it's from the Redshift example you posted earlier. If you only want one GPU per task, you can ignore the gpusPerTask variable and just assign the CPUTHREAD environment variable as the GPU number in an expression:

import os
parent["renderNode"]["gpuPlug"] = os.environ["CPUTHREAD"]  # already a string; wrap in int() if the plug expects a number

So if your single Worker is launching 4 concurrent tasks, each will get a single CPUTHREAD value (0 through 3) that maps to a GPU.

If you want multiple GPUs per task, you can add a variable called gpusPerTask to the Deadline Dispatcher environment variables. Then you can use an expression to determine which GPUs to use, similar to my last post, but I'll be clearer here:

import os

gpusPerTask = int(os.environ["gpusPerTask"])   # environment variables are strings, so convert them
startIndex = int(os.environ["CPUTHREAD"]) * gpusPerTask
resultGPUs = []
for i in range(startIndex, startIndex + gpusPerTask):
    resultGPUs.append(i)

That will give you a list of GPU numbers to use for that task in resultGPUs. I imagine you'd need to convert it in some way for Cycles / Arnold before setting the GPU selection plug. Maybe as simple as ",".join(str(g) for g in resultGPUs)?

Apologies if any of that code isn't quite right, but it should be pretty close.
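If it helps to double-check the arithmetic, here's the same idea pulled out into a standalone function (again my sketch, not GafferDeadline code), with the string-to-int conversions the environment variables need:

```python
def gpus_for_thread(cpu_thread, gpus_per_task):
    # CPUTHREAD and gpusPerTask arrive as environment-variable strings,
    # so convert them before doing arithmetic
    per_task = int(gpus_per_task)
    start = int(cpu_thread) * per_task
    return ",".join(str(i) for i in range(start, start + per_task))
```

So a Worker running task thread 1 with 2 GPUs per task would get "2,3".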

willDTF (Author) commented Mar 25, 2021

Hi Eric,
Sorry for the late answer, things have been busy.

Thanks for your help!
willDTF closed this as completed Mar 25, 2021
ericmehl (Member)

No worries, I'm glad it's working!
