intel/gpu: Add support for Gen9+ #4260

Closed
ssumpf opened this issue Sep 21, 2021 · 10 comments

@ssumpf
Member

ssumpf commented Sep 21, 2021

Currently, the GPU multiplexer supports only the IGT devices found on Broadwell (Gen8). We want to add support for Skylake (Gen9) and Kaby Lake (Gen9 and Gen9.5) as well.

ssumpf added a commit to ssumpf/genode that referenced this issue Sep 21, 2021
Offer GPU session for testing 3D on Genode. Note, this will also work in
Qemu because the multiplexer will probe for supported GPUs during
startup.

issue genodelabs#4260
ssumpf added a commit to ssumpf/genode that referenced this issue Sep 21, 2021
* Wait for completion before returning from 'execbuffer2'. This makes
  buffer execution synchronous.

* Because the Iris driver manages the virtual address space of the GPU
  and creates one GEM context for each batch buffer, we have to map/unmap
  all buffer objects before and after batch buffer execution.

issue genodelabs#4260
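
The two points of the commit message above can be pictured as a scope guard around a blocking submission. This is only a sketch under stated assumptions: `Buffer`, `Mapped_buffers`, `execute_batch`, and the `submit_and_wait` callback are hypothetical names for illustration and are not taken from the Genode sources.

```cpp
#include <functional>
#include <vector>

// Hypothetical buffer object: it only tracks whether it is currently
// mapped into the GPU virtual address space of the executing context.
struct Buffer { bool mapped = false; };

// Scope guard (assumption, not the actual driver code): maps all buffer
// objects on construction and unmaps them again on destruction.
class Mapped_buffers
{
    std::vector<Buffer *> const &_buffers;

public:

    explicit Mapped_buffers(std::vector<Buffer *> const &buffers)
    : _buffers(buffers)
    {
        for (Buffer *b : _buffers) b->mapped = true;
    }

    ~Mapped_buffers()
    {
        for (Buffer *b : _buffers) b->mapped = false;
    }

    Mapped_buffers(Mapped_buffers const &) = delete;
    Mapped_buffers &operator=(Mapped_buffers const &) = delete;
};

// 'submit_and_wait' stands in for submitting the batch buffer and
// blocking until the GPU has completed it (synchronous execution).
inline void execute_batch(std::vector<Buffer *> const &buffers,
                          std::function<void()> const &submit_and_wait)
{
    Mapped_buffers guard(buffers);  // map before execution
    submit_and_wait();              // returns only after completion
}                                   // guard unmaps after execution
```

Because the call blocks until completion, the buffers stay mapped for the entire time the batch buffer may touch them.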
ssumpf added a commit to ssumpf/genode that referenced this issue Sep 21, 2021
This commit contains features and bug fixes:

* Catch errors during resource allocation

* Because Mesa tries to allocate fence (hardware) registers for each
  batch buffer execution, do not allocate new fences for buffer objects
  that are already fenced

* Add support for global hardware status page. Each context additionally
  has a per-process hardware status page, which we used to set the
  global hardware status page during Vgpu switch. This was obviously
  wrong. There is only one global hardware status page (set once during
  initialization) and a distinct per-process page for contexts.

* Write the sequence number of the currently executing batch buffer to
  dword 52 of the per-process hardware status page. We use the pipeline
  command with QW_WRITE (quad word write), GLOBAL_GTT_IVB disabled
  (address space is per-process address space), and STORE_DATA_INDEX
  enabled (write goes to offset of hardware status page). This command
  used to write to the scratch page. But Linux now uses the first
  reserved word of the per-process hardware status page.

* Add Gen9+ WaEnableGapsTsvCreditFix workaround. This sets the "GAPS TSV
  Credit fix Enable" bit of the Arbiter control register (GARBCNTLREG).
  According to the documentation, this bit should be set by the BIOS,
  but it is not on most Gen9/9.5 platforms. Not setting this bit leads
  to random GPU hangs.

* Increase the context size from 20 to 22 pages for Gen9. On Gen8 the
  hardware context is 20 pages (1 hardware status page + 19 ring context
  register pages). On Gen9 the ring context registers take 81.3125 KBytes
  according to the IGD documentation, which rounds up to 21 pages (two
  more than on Gen8). Together with the hardware status page, this makes
  22 pages.

* The logical ring size in the ring buffer control of the execlist
  context has to be programmed with number of pages - 1, so 0 means 1
  page. We programmed the actual number of pages before, which led to
  the execution of NOOPs if the page behind our ring buffer was empty,
  or to GPU hangs if there was data on the page.

issue genodelabs#4260
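
The page arithmetic and the ring-size encoding from the last two items of the commit message can be checked with a few constant expressions. This is a minimal sketch, assuming 4 KiB pages; the names `pages_for`, `gen9_context_pages`, and `ring_length_field` are hypothetical helpers for illustration and are not taken from the driver.

```cpp
#include <cstddef>

// Assumption: the GPU page size is 4 KiB.
constexpr std::size_t GPU_PAGE_SIZE = 4096;

// Round a byte count up to whole pages.
constexpr std::size_t pages_for(std::size_t bytes)
{
    return (bytes + GPU_PAGE_SIZE - 1) / GPU_PAGE_SIZE;
}

// 81.3125 KBytes as stated in the commit message (81.3125 * 1024 bytes).
constexpr std::size_t GEN9_RING_CONTEXT_BYTES = 83264;

// Every context carries one per-process hardware status page.
constexpr std::size_t HW_STATUS_PAGES = 1;

constexpr std::size_t gen8_context_pages() { return HW_STATUS_PAGES + 19; }

constexpr std::size_t gen9_context_pages()
{
    return HW_STATUS_PAGES + pages_for(GEN9_RING_CONTEXT_BYTES);
}

// Hypothetical helper: value to program as the logical ring size.
// The field holds "number of pages - 1", so 0 means one page.
// The caller must pass ring_pages >= 1.
constexpr std::size_t ring_length_field(std::size_t ring_pages)
{
    return ring_pages - 1;
}

static_assert(gen8_context_pages() == 20, "Gen8: 1 + 19 pages");
static_assert(gen9_context_pages() == 22, "Gen9: 1 + 21 pages");
static_assert(ring_length_field(1) == 0,  "one page is encoded as 0");
```

Programming the plain page count instead of `ring_length_field(pages)` makes the hardware treat the ring as one page longer than it is, which matches the failure described above.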
@ssumpf
Member Author

ssumpf commented Sep 21, 2021

The commits above add support for Intel Gen9/9.5 GPUs to Genode.

@ssumpf ssumpf added the fixed label Sep 21, 2021
@nfeske
Member

nfeske commented Sep 21, 2021

Congratulations @ssumpf, @alex-ab, and @cnuke for solving all these tricky issues!

ssumpf added a commit to ssumpf/genode that referenced this issue Sep 21, 2021
* Add 'Mi_arb_on_off' and 'Mi_arb_check' to commands.h
* Use Mi_* commands from commands.h

issue genodelabs#4260
@ssumpf
Member Author

ssumpf commented Sep 21, 2021

@nfeske: 3c0e644 uses classes from commands.h to set up MI_* commands instead of macros.

ssumpf added a commit to ssumpf/genode that referenced this issue Sep 22, 2021
This is a leftover from Mesa-11 and we replaced it with a
'wait_and_dispatch_one_io_signal' for synchronous signal waits.

issue genodelabs#4260
@ssumpf
Member Author

ssumpf commented Sep 22, 2021

@nfeske: ccb8037 removes the signal entrypoint from libdrm for Iris. It is not present in etnaviv.

chelmuth pushed a commit that referenced this issue Sep 23, 2021
chelmuth pushed a commit that referenced this issue Sep 23, 2021
chelmuth pushed a commit that referenced this issue Sep 23, 2021
chelmuth pushed a commit that referenced this issue Sep 23, 2021
chelmuth pushed a commit that referenced this issue Sep 23, 2021
alex-ab added a commit to alex-ab/genode that referenced this issue Sep 23, 2021
and don't assume 8M, which leads to Region_conflicts if size is >8M (X201).

Issue genodelabs#4260
alex-ab added a commit to alex-ab/genode that referenced this issue Sep 23, 2021
alex-ab added a commit to alex-ab/genode that referenced this issue Sep 23, 2021
chelmuth pushed a commit that referenced this issue Sep 23, 2021
alex-ab added a commit to alex-ab/genode that referenced this issue Sep 24, 2021
chelmuth pushed a commit that referenced this issue Sep 24, 2021
- Lenovo T470p, T490, T490s

Issue #4260
ssumpf added a commit to ssumpf/genode that referenced this issue Sep 24, 2021
* use Gpu::addr_t (64 Bit) where necessary instead of Genode::addr_t.

issue genodelabs#4260
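
Why a fixed 64-bit type matters on 32-bit builds can be shown with a small example. It is a sketch under assumptions: `native_addr32_t` stands in for a machine-word-sized address type as it would look on a 32-bit platform, `gpu_addr_t` stands in for the 64-bit 'Gpu::addr_t' named in the commit message, and the example address is made up.

```cpp
#include <cstdint>

// Hypothetical stand-in for a native address type on a 32-bit platform.
using native_addr32_t = std::uint32_t;

// Hypothetical stand-in for a GPU address type that is always 64 bit.
using gpu_addr_t = std::uint64_t;

// Made-up GPU virtual address above the 4 GiB boundary.
constexpr gpu_addr_t EXAMPLE_GPU_VA = 0x100002000ULL;

// Storing a GPU address in a 32-bit type silently drops the upper bits.
constexpr native_addr32_t truncated(gpu_addr_t va)
{
    return static_cast<native_addr32_t>(va);
}

static_assert(truncated(EXAMPLE_GPU_VA) == 0x2000u,
              "the upper 32 bits are lost");
static_assert((EXAMPLE_GPU_VA >> 32) == 1,
              "the 64-bit type keeps the upper bits");
```

GPU virtual addresses are independent of the CPU's word size, so keeping them in a 64-bit type avoids this kind of truncation when the multiplexer is built for a 32-bit platform.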
@ssumpf
Member Author

ssumpf commented Sep 24, 2021

@chelmuth: 67f4019 enables building the multiplexer for 32bit platforms.

chelmuth pushed a commit that referenced this issue Sep 24, 2021
alex-ab added a commit to alex-ab/genode that referenced this issue Sep 24, 2021
Adjust drivers_interactive, drivers_managed and sculpt accordingly.

Issue genodelabs#4260
@alex-ab
Member

alex-ab commented Sep 24, 2021

@chelmuth: fc87fe4 implements the discussed feature of adding an additional supported device on Sculpt on the fly.

@chelmuth
Member

@alex-ab I have no objections to merging the commit, except that I would like to ask you to move gpu_drv.config into gpu/intel. It feels more natural to host this driver-specific information in the driver directory, like we did with the event-filter chargen files.

alex-ab added a commit to alex-ab/genode that referenced this issue Sep 27, 2021
@alex-ab
Member

alex-ab commented Sep 27, 2021

@chelmuth: please find the adjusted commit 895647a which moves the gpu_drv config

chelmuth pushed a commit that referenced this issue Sep 27, 2021
nfeske pushed a commit that referenced this issue Sep 28, 2021
nfeske pushed a commit that referenced this issue Sep 28, 2021
nfeske pushed a commit that referenced this issue Sep 28, 2021
nfeske added a commit that referenced this issue Oct 8, 2021
This patch reverts the commit "driver_interactive-pc: start GPU
multiplexer". The addition of the GPU service introduces inconsistencies
with boards other than PC, and increases the quota demand of the drivers
subsystem, which breaks run scripts like demo.run on OKL4 or seL4.

The generic drivers_interactive subsystems should remain deliberately
limited to plain input and framebuffer devices. To cater for additional
peripherals like networking, audio, or GPU, we should use Sculpt as a
testbed or create board-specific run scripts.

Issue #4260
@nfeske
Member

nfeske commented Oct 8, 2021

Commit 395ae75 reverts the addition of the GPU service to the drivers_interactive-pc subsystem, restoring the demo.run tests on sel4, hw, and okl4.

nfeske pushed a commit that referenced this issue Oct 13, 2021
nfeske pushed a commit that referenced this issue Oct 13, 2021
nfeske pushed a commit that referenced this issue Oct 13, 2021
nfeske pushed a commit that referenced this issue Oct 13, 2021
nfeske pushed a commit that referenced this issue Oct 13, 2021
nfeske pushed a commit that referenced this issue Oct 13, 2021
nfeske pushed a commit that referenced this issue Oct 13, 2021
Adjust drivers_managed and sculpt accordingly.

Issue #4260
@nfeske
Member

nfeske commented Oct 14, 2021

Fixed in master.

@nfeske nfeske closed this as completed Oct 14, 2021