I915_CONTEXT_PARAM_PERSISTENCE support #228

BartoszDunajski · 2019-10-24T11:41:58Z

Add new IOCTL call to disable persistence on given context

Signed-off-by: Dunajski, Bartosz bartosz.dunajski@intel.com

MichalMrozek

looks good to me

Add new IOCTL call to disable persistence on given context Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>

SlawomirMakowski · 2019-10-29T13:20:54Z

Internal testing was done.

Testing scenario:

spawn 2 separate processes that will occupy GPU (i915) for long time (like couple of minutes)
kill one of the processes

Expected behavior: second process runs until completion and no GPU hang is present.

I have executed above scenario on Kaby Lake with:

Ubuntu 18.04.1 LTS default kernel and compute-runtime driver from top
Ubuntu 18.04.1 LTS with kernel containing following patchwork: https://patchwork.freedesktop.org/series/68515/ and compute-runtime driver containing changes from this PR

For 1. I have observed GPU reset because driver has finished and there were still work on i915 engine present for first process.
For 2. No GPU reset occurred and second process finished without any issues because after killing first process driver finished and no work on i915 engine was present for this process.

Our existing behaviour is to allow contexts and their GPU requests to persist past the point of closure until the requests are complete. This allows clients to operate in a 'fire-and-forget' manner where they can setup a rendering pipeline and hand it over to the display server and immediately exit. As the rendering pipeline is kept alive until completion, the display server (or other consumer) can use the results in the future and present them to the user. The compute model is a little different. They have little to no buffer sharing between processes as their kernels tend to operate on a continuous stream, feeding the results back to the client application. These kernels operate for an indeterminate length of time, with many clients wishing that the kernel was always running for as long as they keep feeding in the data, i.e. acting like a DSP. Not all clients want this persistent "desktop" behaviour and would prefer that the contexts are cleaned up immediately upon closure. This ensures that when clients are run without hangchecking (e.g. for compute kernels of indeterminate runtime), any GPU hang or other unexpected workloads are terminated with the process and does not continue to hog resources. The default behaviour for new contexts is the legacy persistence mode, as some desktop applications are dependent upon the existing behaviour. New clients will have to opt in to immediate cleanup on context closure. If the hangchecking modparam is disabled, so is persistent context support -- all contexts will be terminated on closure. We expect this behaviour change to be welcomed by compute users, who have often been caught between a rock and a hard place. They disable hangchecking to avoid their kernels being "unfairly" declared hung, but have also experienced true hangs that the system was then unable to clean up. Naturally, this leads to bug reports. Testcase: igt/gem_ctx_persistence Link: intel/compute-runtime#228 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Jon Bloomfield Reviewed-by: Jon Bloomfield Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Link: https://patchwork.freedesktop.org/patch/msgid/20191029202338.8841-1-chris@chris-wilson.co.uk

MichalMrozek · 2020-01-07T09:19:07Z

Change is merged 07cacd4

Our existing behaviour is to allow contexts and their GPU requests to persist past the point of closure until the requests are complete. This allows clients to operate in a 'fire-and-forget' manner where they can setup a rendering pipeline and hand it over to the display server and immediately exit. As the rendering pipeline is kept alive until completion, the display server (or other consumer) can use the results in the future and present them to the user. The compute model is a little different. They have little to no buffer sharing between processes as their kernels tend to operate on a continuous stream, feeding the results back to the client application. These kernels operate for an indeterminate length of time, with many clients wishing that the kernel was always running for as long as they keep feeding in the data, i.e. acting like a DSP. Not all clients want this persistent "desktop" behaviour and would prefer that the contexts are cleaned up immediately upon closure. This ensures that when clients are run without hangchecking (e.g. for compute kernels of indeterminate runtime), any GPU hang or other unexpected workloads are terminated with the process and does not continue to hog resources. The default behaviour for new contexts is the legacy persistence mode, as some desktop applications are dependent upon the existing behaviour. New clients will have to opt in to immediate cleanup on context closure. If the hangchecking modparam is disabled, so is persistent context support -- all contexts will be terminated on closure. We expect this behaviour change to be welcomed by compute users, who have often been caught between a rock and a hard place. They disable hangchecking to avoid their kernels being "unfairly" declared hung, but have also experienced true hangs that the system was then unable to clean up. Naturally, this leads to bug reports. Testcase: igt/gem_ctx_persistence Link: intel/compute-runtime#228 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Link: https://patchwork.freedesktop.org/patch/msgid/20191029202338.8841-1-chris@chris-wilson.co.uk (cherry picked from commit a0e047156cdebbccf253768b39d7e1dbf954c449) BUG=b:152719649 TEST=Test Graphics/Media/Display use cases Signed-off-by: Ap, Kamal <kamal.ap@intel.com> Change-Id: Iccf876394da8eaebfad011c4e45c8b466aec6f53

Our existing behaviour is to allow contexts and their GPU requests to persist past the point of closure until the requests are complete. This allows clients to operate in a 'fire-and-forget' manner where they can setup a rendering pipeline and hand it over to the display server and immediately exit. As the rendering pipeline is kept alive until completion, the display server (or other consumer) can use the results in the future and present them to the user. The compute model is a little different. They have little to no buffer sharing between processes as their kernels tend to operate on a continuous stream, feeding the results back to the client application. These kernels operate for an indeterminate length of time, with many clients wishing that the kernel was always running for as long as they keep feeding in the data, i.e. acting like a DSP. Not all clients want this persistent "desktop" behaviour and would prefer that the contexts are cleaned up immediately upon closure. This ensures that when clients are run without hangchecking (e.g. for compute kernels of indeterminate runtime), any GPU hang or other unexpected workloads are terminated with the process and does not continue to hog resources. The default behaviour for new contexts is the legacy persistence mode, as some desktop applications are dependent upon the existing behaviour. New clients will have to opt in to immediate cleanup on context closure. If the hangchecking modparam is disabled, so is persistent context support -- all contexts will be terminated on closure. We expect this behaviour change to be welcomed by compute users, who have often been caught between a rock and a hard place. They disable hangchecking to avoid their kernels being "unfairly" declared hung, but have also experienced true hangs that the system was then unable to clean up. Naturally, this leads to bug reports. Testcase: igt/gem_ctx_persistence Link: intel/compute-runtime#228 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Link: https://patchwork.freedesktop.org/patch/msgid/20191029202338.8841-1-chris@chris-wilson.co.uk (cherry picked from commit a0e0471) BUG=b:152719649 TEST=Test Graphics/Media/Display use cases Signed-off-by: Ap, Kamal <kamal.ap@intel.com> Change-Id: Iccf876394da8eaebfad011c4e45c8b466aec6f53

Our existing behaviour is to allow contexts and their GPU requests to persist past the point of closure until the requests are complete. This allows clients to operate in a 'fire-and-forget' manner where they can setup a rendering pipeline and hand it over to the display server and immediately exit. As the rendering pipeline is kept alive until completion, the display server (or other consumer) can use the results in the future and present them to the user. The compute model is a little different. They have little to no buffer sharing between processes as their kernels tend to operate on a continuous stream, feeding the results back to the client application. These kernels operate for an indeterminate length of time, with many clients wishing that the kernel was always running for as long as they keep feeding in the data, i.e. acting like a DSP. Not all clients want this persistent "desktop" behaviour and would prefer that the contexts are cleaned up immediately upon closure. This ensures that when clients are run without hangchecking (e.g. for compute kernels of indeterminate runtime), any GPU hang or other unexpected workloads are terminated with the process and does not continue to hog resources. The default behaviour for new contexts is the legacy persistence mode, as some desktop applications are dependent upon the existing behaviour. New clients will have to opt in to immediate cleanup on context closure. If the hangchecking modparam is disabled, so is persistent context support -- all contexts will be terminated on closure. We expect this behaviour change to be welcomed by compute users, who have often been caught between a rock and a hard place. They disable hangchecking to avoid their kernels being "unfairly" declared hung, but have also experienced true hangs that the system was then unable to clean up. Naturally, this leads to bug reports. Testcase: igt/gem_ctx_persistence Link: intel/compute-runtime#228 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Link: https://patchwork.freedesktop.org/patch/msgid/20191029202338.8841-1-chris@chris-wilson.co.uk

…org/drm/drm-intel into drm-next UAPI Changes: - Make context persistence optional Allow userspace to tie the context lifetime to FD lifetime, effectively allowing Ctrl-C killing of a process to also clean up the hardware immediately. Compute changes: intel/compute-runtime#228 The compute driver is shipping in Ubuntu. uAPI acked by Mesa folks. - Put future HW and their uAPIs under STAGING & BROKEN Introduces DRM_I915_UNSTABLE Kconfig menu for working on the new uAPI for future HW in upstream. We already disable driver loading by default the platform is deemed ready. This is a second level of protection based on compile time switch (STAGING & BROKEN). - Under DRM_I915_UNSTABLE: Add the fake lmem region on iGFX Fake local memory region on integrated GPU through cmdline: memmap=2G$16G i915.fake_lmem_start=0x400000000 Currently allows testing non-mappable GGTT behavior and running kernel selftest for local memory. Driver Changes: - Fix Bugzilla #112084: VGA external monitor not working (Ville) - Add support for half float framebuffers (Ville) - Add perf support on TGL (Lionel) - Replace hangcheck by heartbeats (Chris) - Allow SPT PCH on all AML devices (James) - Add new CNL PCH for CML platform (Imre) - Allow 100 ms (Kconfig) for workloads to exit before reset (Chris, Jon, Joonas) - Forcibly pre-empt a context after 100 ms (Kconfig) of delay (Chris) - Make timeslice duration Kconfig configurable (Chris) - Whitelist PS_(DEPTH|INVOCATION)_COUNT for Tigerlake (Tapani) - Support creating LMEM objects in kernel (Matt A) - Adjust the location of RING_MI_MODE in the context image for TGL (Chris) - Handle AUX interrupts for TC ports (Matt R) - Add support for devices without mappable GGTT aperture (Daniele) - Rename "inject_load_failure" module parameter to "inject_probe_failure" (Janusz) - Handle fused off HDCP, FBC, DMC and DSC (Jose) - Add support to one DP-MST stream on Tigerlake (Lucas) - Add HuC firmware (and GuC) for TGL (Daniele) - Allow ICL+ DSI on any pipe (Ville) - Check some transcoder timing minimum limits (Ville) - Don't set queue_priority_hint if we don't kick the submission (Chris) - Introduce barrier pulses along engines to flush idle/in-flight requests (Chris) - Drop assertion that ce->pin_mutex guards state updates (Chris) - Cancel banned contexts on schedule-out (Chris) - Cancel contexts when hangchecking is disabled (Chris) - Catch GTT fault errors for gen11+ planes (Matt R) - Print in debugfs if PSR is not enabled because of sink (Jose) - Do not set MOCS control values on dgfx (Lucas) - Setup io-mapping for LMEM (Abdiel) - Support kernel mapping of LMEM objects (Abdiel) - Add LMEM selftests (Matt A) - Initialise PMU spinlock before registering (Chris) - Clear DKL_TX_PMD_LANE_SUS before program TC voltage swing (Jose) - Flip interpretation of ips fmin/fmax to max rps (Chris) - Add VBT compression parameter block definition (Jani) - Limit the blitter sizes to ensure low preemption latency (Chris) - Fixup block_size rounding on BLT (Matt A) - Don't try to place HWS in non-existing mappable region (Michal Wa) - Don't allocate the ring in stolen if we lack aperture (Matt A) - Add AUX B & C to DC_OFF_POWER_DOMAINS for Tigerlake (Matt R) - Avoid HPD poll detect triggering a new detect cycle (Imre) - Document the userspace fail with possible_crtcs (Ville) - Drop lrc header page now unused by GuC (Daniele) - Do not switch aux to TBT mode for non-TC ports (Jose) - Restructure code to avoid depending on i915 but smaller structs (Chris, Tvrtko, Andi) - Remove pm park/unpark notifications (Chris) - Avoid lockdep cross-contamination between object types (Chris) - Restructure DSC code (Jani) - Fix dead locking in early workload shadow (Zhenyu) - Split the legacy submission backend from the common CS ring buffer (Chris) - Move intel_engine_context_in/out into intel_lrc.c (Tvrtko) - Describe perf/wakeref structure members in documentation (Anna) - Update renamed header files names in documentation (Anna) - Add debugs to distingiush a cd2x update from a full cdclk pll update (Ville) - Rework atomic global state locking (Ville) - Allow planes to declare their minimum acceptable cdclk (Ville) - Eliminate skl_check_pipe_max_pixel_rate() and simplify skl_max_scale() (Ville) - Making loglevel of PSR2/SU logs same (Ap) - Capture aux page table error register (Lionel) - Add is_dgfx to device info (Jose) - Split gen11_irq_handler to make it shareable (Lucas) - Encapsulate kconfig constant values inside boolean predicates (Chris) - Split memory_region initialisation into its own file (Chris) - Use _PICK() for CHICKEN_TRANS() and add CHICKEN_TRANS_D (Ville) - Add perf helper macros for comparing with whitelisted registers (Umesh) - Fix i915_inject_load_error() name to read *_probe_* (Janusz) - Drop unused AUX register offsets (Matt R) - Provide more information on DP AUX failures (Matt R) - Add GAM/SFC instdone to error state (Mika) - Always track callers to intel_rps_mark_interactive() (Chris) - Nuke 'mode' argument to intel_get_load_detect_pipe() (Ville) - Simplify LVDS crtc_mask and pipe_mask setup (Ville) - Stop frobbing crtc->base.mode (Ville) - Do s/crtc_mask/pipe_mask/ (Ville) - Split detaching and removing the vma (Chris) - Selftest improvements (Chris, Tvrtko, Mika, Matt A, Lionel) - GuC code improvements (Rob, Andi, Daniele) - Check against i915_selftest only under CONFIG_SELFTEST (Chris) - Refine occupancy test in kill_context() (Chris) - Start kthreads before stopping (Chris) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191101104718.GA14323@jlahtine-desk.ger.corp.intel.com

MichalMrozek reviewed Oct 24, 2019

View reviewed changes

I915_CONTEXT_PARAM_PERSISTENCE support

1bf1fdc

Add new IOCTL call to disable persistence on given context Signed-off-by: Dunajski, Bartosz <bartosz.dunajski@intel.com>

BartoszDunajski force-pushed the persistence_patch branch from 0935d4d to 1bf1fdc Compare October 25, 2019 08:39

MichalMrozek closed this Jan 7, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

I915_CONTEXT_PARAM_PERSISTENCE support #228

I915_CONTEXT_PARAM_PERSISTENCE support #228

Uh oh!

BartoszDunajski commented Oct 24, 2019

Uh oh!

MichalMrozek left a comment

Uh oh!

SlawomirMakowski commented Oct 29, 2019

Uh oh!

MichalMrozek commented Jan 7, 2020 •

edited

Loading

Uh oh!

Uh oh!

I915_CONTEXT_PARAM_PERSISTENCE support #228

I915_CONTEXT_PARAM_PERSISTENCE support #228

Uh oh!

Conversation

BartoszDunajski commented Oct 24, 2019

Uh oh!

MichalMrozek left a comment

Choose a reason for hiding this comment

Uh oh!

SlawomirMakowski commented Oct 29, 2019

Uh oh!

MichalMrozek commented Jan 7, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

MichalMrozek commented Jan 7, 2020 •

edited

Loading