RX570/POLARIS12 panic during GPU post on 13/stable aarch64 #84
Just to be sure, try CURRENT with manually built 5.4-lts. But looks like it might be a PCIe issue. Is this the early revision LX2160 or the newer one? Also, why is it POSTing here, doesn't the UEFI include the QEMU package to run the VBIOS? Would be interesting to see what happens on an already-POSTed GPU. (On my mcbin, POSTing POLARIS10 from the driver works fine too, but still) |
Interesting though. I haven't updated the firmware in a while, will pull a new BSP image from SR later today for testing. I'm not sure why it seems to be POSTing so late actually, especially since it clearly already did and I typically live in efifb just fine. Is it worth reflashing the GPU firmware with an arm64 blob? Currently my GPUs only have the vendor-installed x64 GOP driver, but adding the AArch64 GOP driver seems easy enough and might help? (https://www.workofard.com/2020/12/aarch64-option-roms-for-amd-gpus/). In the meantime, another similar coredump with a different AMD GPU, although still POLARIS12.
pciconf -lv
|
I don't think it is.
huh. Well then probably the same thing that causes the POST to fail also causes the driver to wrongly detect the GPU as uninitialized. Again, is this the early revision LX2160 or the newer one? And please try CURRENT with manually built 5.4-lts from this repo. |
Yep, still the early hardware revision, but I haven't had any PCI problems since we last talked about it. Will test more with CURRENT GENERIC + 5.4-lts once it's all built. |
No dice:
with
|
There's this
Soooooo try |
Of course the error is in a screenshot where it's not text-searchable... Anyways, added the following to
Do we still need the syscons disable? I see issue 60 where you finally puzzled that out. |
The fix is in #61 which is still unmerged :/ but you can apply that yourself (rebase the branch onto current 5.4-lts or cherry-pick the commit). Also you didn't definitely need syscons disable, only if your efifb resolution was high enough that the memory overlapped. (I needed it for >=1440p) To 100% make sure there's no weirdness with the tunable stuff, try doing this in code instead, changing drm-kmod/drivers/gpu/drm/amd/include/amd_pcie.h Lines 43 to 47 in b45715c
#define AMDGPU_DEFAULT_PCIE_GEN_MASK (CAIL_PCIE_LINK_SPEED_SUPPORT_GEN3 | CAIL_ASIC_PCIE_LINK_SPEED_SUPPORT_GEN3)
|
Same panic :( It looks like I'm still using a UEFI from a while ago and there may be some updates there. I'll drop a new one in and see if that helps as well... |
Tried with a fresh firmware build, same panic. |
Found this: ROCm/ROCK-Kernel-Driver#62 (comment)
This is curious, because there shouldn't be another GPU attached? And this: http://macchiatobin.net/forums/topic/gpu/#post-7368 Added |
I also found some more in threads about IOMMU and iirc we do not support IOMMU (SMMU on Arm?) and I think that's something Jon was building into the firmware. Possibly that's an issue? |
Debug flag for us is
That's just one possible cause, absolutely not the only one. (Also "GPU posting now" on its own is not an error, if you don't have efifb it's expected.)
Oh, as you can see this post is already talking about OpenGL stuff, not early init. I've seen the corruption myself :) This was solved upstream some time ago, my backport was FreeBSDDesktop/kms-drm@7fe2f58
We support SMMU 3 since https://reviews.freebsd.org/D24618 but not SMMU 2. In any case it shouldn't be mandatory to use the IOMMU. Especially since you do have other PCIe cards working… hmm hmm I wonder if the PCIe link gen is not being applied through LinuxKPI somehow |
Wrong button... Latest |
|
Oddly the Try this --- i/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ w/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4013,9 +4013,11 @@ static void amdgpu_device_get_pcie_info(struct amdgpu_device *adev)
adev->pm.pcie_gen_mask = AMDGPU_DEFAULT_PCIE_GEN_MASK;
if (adev->pm.pcie_mlw_mask == 0)
adev->pm.pcie_mlw_mask = AMDGPU_DEFAULT_PCIE_MLW_MASK;
- return;
+ // return;
}
+ adev->pm.pcie_gen_mask = CAIL_PCIE_LINK_SPEED_SUPPORT_GEN3 | CAIL_ASIC_PCIE_LINK_SPEED_SUPPORT_GEN3;
+
if (adev->pm.pcie_gen_mask && adev->pm.pcie_mlw_mask)
return; |
Same panic... |
I just noticed this:
|
Hm, very similar on my mcbin actually (even one more corrected error), but everything works. pciconf here
Changes after loading amdgpu: @@ -5,14 +5,14 @@
subclass = VGA
bar [10] = type Prefetchable Memory, range 64, base 0x800000000, size 268435456, enabled
bar [18] = type Prefetchable Memory, range 64, base 0x810000000, size 2097152, enabled
- bar [20] = type I/O Port, range 32, base 0, size 256, disabled
+ bar [20] = type I/O Port, range 32, base 0, size 256, enabled
bar [24] = type Memory, range 32, base 0xc0000000, size 262144, enabled
cap 09[48] = vendor (length 8)
cap 01[50] = powerspec 3 supports D0 D1 D2 D3 current D0
cap 10[58] = PCI-Express 2 legacy endpoint max data 128(256) RO NS
max read 512
- link x4(x16) speed 8.0(8.0) ASPM disabled(L1)
- cap 05[a0] = MSI supports 1 message, 64 bit
+ link x4(x16) speed 5.0(8.0) ASPM disabled(L1)
+ cap 05[a0] = MSI supports 1 message, 64 bit enabled with 1 message
ecap 000b[100] = Vendor [1] ID 0001 Rev 1 Length 16
ecap 0001[150] = AER 2 0 fatal 0 non-fatal 2 corrected
ecap 0015[200] = Resizable BAR 1
@@ -33,7 +33,7 @@
cap 01[50] = powerspec 3 supports D0 D1 D2 D3 current D3
cap 10[58] = PCI-Express 2 legacy endpoint max data 128(256) RO NS
max read 512
- link x4(x16) speed 8.0(8.0) ASPM disabled(L1)
+ link x4(x16) speed 5.0(8.0) ASPM disabled(L1)
cap 05[a0] = MSI supports 1 message, 64 bit
ecap 000b[100] = Vendor [1] ID 0001 Rev 1 Length 16
ecap 0001[150] = AER 2 0 fatal 0 non-fatal 2 corrected
That's just what the device is, that's static data IIUC. Seems like link speed renegotiation is initiated from the GPU side firmware, and that parameter in the driver should tell it what speeds to support, so there kinda shouldn't be differences between Linux and FreeBSD in terms of that stuff. To be sure: have you tested under Linux? |
I have not had a chance to test under Linux, will try to get to that this week. Meanwhile, I just saw this on the OpenBSD 6.9 release notes:
That got me to this commit: openbsd/src@9e1dc75 From bluerise on Discord:
|
Huh.
Well, our implementations do too. Even though there is an odd "XXX This is all x86 specific" comment, it works fine on the MACCHIATObin, so we can be sure there's nothing affecting arm64 in general. Looking at the impl:
__io_br();  /* __compiler_membar(): __asm __volatile(" " : : : "memory") */
v = le32toh(__raw_readl(addr));
__io_ar();  /* rmb() if rmb is defined; dmb(ld) in arm64/include/atomic.h, and that's how it is on Linux too */
|
Sorry, forgot to answer - Yes, still the early board rev, but I haven't had problems with PCI in months. |
Confirmed the WX 2100 works just fine with |
And just to be sure I'm not doing something super dumb elsewhere, here's
No kld lines in |
Having an absolutely atrocious time getting Linux to run on this thing, but Fedora 34 gave me this:
There's a lot more, but the system hangs for arbitrary amounts of time and I get significant graphical distortions. Importantly, I had to set
If I can get it to behave, I'll be able to post cleaner, fuller output... |
Some more context: https://gist.github.com/agrajag9/b0c3722f472d4e8ef6f27c194fc2cf19 |
That read on my RX 480 + mcbin returns
No. The thing on the mcbin (and socionext developerbox) is that the DesignWare controller doesn't filter TLPs properly, so some devices — mostly "legacy" ones — would appear duplicated, possibly into all the slots. AMD GPUs actually do their own filtering (just like devices supporting ARI, except it doesn't support ARI), so all we had to do was remove the workaround that basically only allowed legacy devices to work. The gen4 controller used in the early rev LX2160 is a completely different controller. drm-kmod/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c Lines 179 to 182 in e5194b8
this is how these reads work. You could add logging right there to see what other similar reads return. Maybe also add a |
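The MM_INDEX/MM_DATA indirection mentioned here can be sketched with a fake register file. This is a made-up model for illustration only (the array "device" and `fake_*` helpers are not real driver code; only the index-then-data access shape mirrors the driver):

```c
#include <assert.h>
#include <stdint.h>

/* Made-up model of the MM_INDEX/MM_DATA indirect access pattern: the
 * driver writes the target register offset to MM_INDEX in the small
 * directly-mapped MMIO window, then reads the value back through MM_DATA.
 * The "device" here is just an array; nothing below touches hardware. */
#define mmMM_INDEX 0x0
#define mmMM_DATA  0x1

static uint32_t regs[16];  /* fake device-internal register file */
static uint32_t mmio[2];   /* fake directly-mapped window (index + data) */

static void fake_writel(uint32_t v, unsigned off)
{
    mmio[off / 4] = v;
}

static uint32_t fake_readl(unsigned off)
{
    if (off / 4 == mmMM_DATA)
        return regs[mmio[mmMM_INDEX] / 4];  /* indirect read via MM_INDEX */
    return mmio[off / 4];
}

/* Shape of the slow read path: write the index, then read the data port. */
static uint32_t mm_rreg(uint32_t reg)
{
    fake_writel(reg * 4, mmMM_INDEX * 4);
    return fake_readl(mmMM_DATA * 4);
}
```

If the data port read goes to the wrong physical address, every register in this scheme comes back garbage, which is consistent with the repeated bogus values seen later in the thread.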
dmesg spammed with Current patch is looking like this: --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c.orig 2021-06-29 00:04:05.165929000 +0000
+++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 2021-06-29 13:16:16.977577000 +0000
@@ -178,10 +178,12 @@
spin_lock_irqsave(&adev->mmio_idx_lock, flags);
writel((reg * 4), ((void __iomem *)adev->rmmio) + (mmMM_INDEX * 4));
+ msleep(2);
ret = readl(((void __iomem *)adev->rmmio) + (mmMM_DATA * 4));
spin_unlock_irqrestore(&adev->mmio_idx_lock, flags);
}
trace_amdgpu_mm_rreg(adev->pdev->device, reg, ret);
+ DRM_INFO("In amdgpu_mm_rreg: ret == %x\n", ret);
return ret;
}
@@ -836,8 +838,10 @@
{
uint32_t reg;
- if (amdgpu_sriov_vf(adev))
+ if (amdgpu_sriov_vf(adev)) {
+ DRM_INFO("amdgpu_sriov_vf(adev) == false\n");
return false;
+ }
if (amdgpu_passthrough(adev)) {
/* for FIJI: In whole GPU pass-through virtualization case, after VM reboot
@@ -846,6 +850,7 @@
* vpost executed for smc version below 22.15
*/
if (adev->asic_type == CHIP_FIJI) {
+ DRM_INFO("adev->asic_type == CHIP_FIJI\n");
int err;
uint32_t fw_ver;
err = request_firmware(&adev->pm.fw, "amdgpu/fiji_smc.bin", adev->dev);
@@ -860,13 +865,17 @@
}
if (adev->has_hw_reset) {
+ DRM_INFO("adev->has_hw_reset == false\n");
adev->has_hw_reset = false;
return true;
}
/* bios scratch used on CIK+ */
- if (adev->asic_type >= CHIP_BONAIRE)
+ if (adev->asic_type >= CHIP_BONAIRE) {
+ DRM_INFO("adev->asic_type >= CHIP_BONAIRE\n");
+ DRM_INFO("calling amdgpu_atombios_scratch_need_asic_init(adev)\n");
return amdgpu_atombios_scratch_need_asic_init(adev);
+ }
/* check MEM_SIZE for older asics */
reg = amdgpu_asic_get_config_memsize(adev);
@@ -4013,8 +4022,10 @@
adev->pm.pcie_gen_mask = AMDGPU_DEFAULT_PCIE_GEN_MASK;
if (adev->pm.pcie_mlw_mask == 0)
adev->pm.pcie_mlw_mask = AMDGPU_DEFAULT_PCIE_MLW_MASK;
- return;
+ // return;
}
+
+ adev->pm.pcie_gen_mask = CAIL_PCIE_LINK_SPEED_SUPPORT_GEN3 | CAIL_ASIC_PCIE_LINK_SPEED_SUPPORT_GEN3;
if (adev->pm.pcie_gen_mask && adev->pm.pcie_mlw_mask)
return; |
Added a little more to the patch and seeing some other interesting things:
This repeats several thousand times until the eventual panic at 10s. Either we're failing to properly read the registers or we're pointed at the wrong location in memory. I went further down the rabbit hole and I'm wondering if maybe the linuxkpi pci code is doing something wrong here? drm-kmod/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c Lines 2684 to 2686 in b45715c
|
WAIT WAIT WAIT.
Is this... supposed to happen... or are these host physical addresses and did both PCIe controllers map their devices into the same address?? Just in case I'm not completely stupid, please test without an NVMe drive, using SATA or USB for the system disk. |
Removed the NVMe and booted from USB. It still tries to post and panics, but it looks like it might be at least accessing different registers eventually?
|
Well, that's curious... |
That's range minimum, but they have different translation offset. In pciconf output we only see that offset applied in "Prefetchable Memory" BARs but not ones marked just "Memory". So might not be an issue after all (?) I really don't know at this point, it just looks suspicious. |
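The translation-offset arithmetic under discussion can be sketched as follows. The structure and logic mirror the shape of `generic_pcie_translate_resource`; the numeric values in the test below are hypothetical, chosen only to resemble the large offset seen in this thread:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the range translation the host bridge driver performs:
 * a PCI bus address inside [pci_base, pci_base + size) maps to
 * start - pci_base + phys_base on the CPU side. */
struct pcie_range {
    uint64_t pci_base;   /* window base as seen on the PCI bus */
    uint64_t phys_base;  /* the same window as seen by the CPU */
    uint64_t size;
};

static bool translate(const struct pcie_range *r, uint64_t start,
    uint64_t *new_start)
{
    if (start < r->pci_base || start >= r->pci_base + r->size)
        return false;                        /* not inside this range */
    *new_start = start - r->pci_base + r->phys_base;
    return true;
}
```

Two controllers can legitimately expose the same range minimum on the PCI side as long as their `phys_base` values (the translation offsets) differ, which is why the apparent overlap may be a red herring.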
This is interesting... https://gist.github.com/agrajag9/cfc0a6887a8a001d0b35335ba4de0400 Starts with
but then almost immediately after:
From
|
You've replicated pciconf with log statements :) Look at the BTW, full info about the BARs is found at https://rocmdocs.amd.com/en/latest/GCN_ISA_Manuals/PCIe-features.html#bar-memory-overview — so the Looking at the Linux dmesg again
so, we are supposed to assign |
Potentially useful logging: diff --git i/sys/dev/pci/pci_host_generic.c w/sys/dev/pci/pci_host_generic.c
index 0c45f5d316e..3f999d86c5b 100644
--- i/sys/dev/pci/pci_host_generic.c
+++ w/sys/dev/pci/pci_host_generic.c
@@ -345,6 +345,7 @@ generic_pcie_translate_resource(device_t dev, int type, rman_res_t start,
phys_base = sc->ranges[i].phys_base;
size = sc->ranges[i].size;
+ device_printf(dev, "translate: start %lx pci_base %lx phys_base %lx size %x\n", start, pci_base, phys_base, size);
if (start < pci_base || start >= pci_base + size)
continue;
@@ -364,6 +365,7 @@ generic_pcie_translate_resource(device_t dev, int type, rman_res_t start,
if (type == space) {
*new_start = start - pci_base + phys_base;
*new_end = end - pci_base + phys_base;
+ device_printf(dev, "translate: new start %lx end %lx\n", *new_start, *new_end);
found = true;
break;
}
@@ -412,6 +414,10 @@ pci_host_generic_core_alloc_resource(device_t dev, device_t child, int type,
device_get_nameunit(child));
return (NULL);
}
+ device_printf(dev,
+ "translated resource %jx-%jx type %x for %s to %lx-%lx\n",
+ (uintmax_t)start, (uintmax_t)end, type,
+ device_get_nameunit(child), phys_start, phys_end);
if (bootverbose) {
device_printf(dev,
@@ -456,9 +462,14 @@ generic_pcie_activate_resource(device_t dev, device_t child, int type,
start = rman_get_start(r);
end = rman_get_end(r);
+ rman_res_t ostart = start, oend = end;
if (!generic_pcie_translate_resource(dev, type, start, end, &start,
&end))
return (EINVAL);
+ device_printf(dev,
+ "activate:translated resource %jx-%jx type %x for %s to %lx-%lx\n",
+ (uintmax_t)ostart, (uintmax_t)oend, type,
+ device_get_nameunit(child), start, end);
rman_set_start(r, start);
rman_set_end(r, end);
diff --git i/sys/dev/pci/pci_host_generic_acpi.c w/sys/dev/pci/pci_host_generic_acpi.c
index 763a84d2fd5..d16d614f5b1 100644
--- i/sys/dev/pci/pci_host_generic_acpi.c
+++ w/sys/dev/pci/pci_host_generic_acpi.c
@@ -157,6 +157,7 @@ pci_host_generic_acpi_parse_resource(ACPI_RESOURCE *res, void *arg)
res->Data.Address.ResourceType == ACPI_IO_RANGE) {
sc->base.ranges[r].pci_base = min;
sc->base.ranges[r].phys_base = min + off;
+ device_printf(dev, "ACPIPCI-parse range %d pci_base %lx phys_base %lx\n", r, min, min + off);
sc->base.ranges[r].size = max - min + 1;
if (res->Data.Address.ResourceType == ACPI_MEMORY_RANGE)
sc->base.ranges[r].flags |= FLAG_TYPE_MEM; Not tested so you'd have to fix the errors if there are any. |
No change to the output when loading the module, but dmesg here: https://gist.github.com/agrajag9/be5c9c58b91497923ae9512dac32f0d3
Also had to change a few things in your patch to make it work right, also in that gist. |
Interestingly I think we only see this for pcib1. There are no translation lines for pcib0, when I'm pretty sure there should be. |
Oh, you should've loaded amdgpu in that dmesg, the In any case, diff --git i/sys/dev/pci/pci_host_generic.c w/sys/dev/pci/pci_host_generic.c
index 0c45f5d316e..99927487e29 100644
--- i/sys/dev/pci/pci_host_generic.c
+++ w/sys/dev/pci/pci_host_generic.c
@@ -419,7 +419,7 @@ pci_host_generic_core_alloc_resource(device_t dev, device_t child, int type,
start, end, count);
}
- res = rman_reserve_resource(rm, start, end, count, flags, child);
+ res = rman_reserve_resource(rm, phys_start, phys_end, count, flags, child);
if (res == NULL)
goto fail; try |
New dmesg: https://gist.github.com/agrajag9/7a46de7807c43a8bea3c876727d85820 Except now it panics WAY faster:
|
oh.. okay. So it must be done this way for a reason. BTW:
I'm not sure if there's any harm caused by this (quite possibly none) but I've noticed https://reviews.freebsd.org/D30953 has appeared recently to fix this |
Okay, now I think I see it. LinuxKPI uses BUS_TRANSLATE_RESOURCE, which pci_host_generic does not implement (only ofw_pci does). Because of that, LinuxKPI returns the PCI address instead of the translated physical address to the driver. diff --git i/sys/dev/pci/pci_host_generic.c w/sys/dev/pci/pci_host_generic.c
index 0c45f5d316e..6694da9d43c 100644
--- i/sys/dev/pci/pci_host_generic.c
+++ w/sys/dev/pci/pci_host_generic.c
@@ -324,7 +324,7 @@ pci_host_generic_core_release_resource(device_t dev, device_t child, int type,
}
static bool
-generic_pcie_translate_resource(device_t dev, int type, rman_res_t start,
+generic_pcie_translate_resource_end(device_t dev, int type, rman_res_t start,
rman_res_t end, rman_res_t *new_start, rman_res_t *new_end)
{
struct generic_pcie_core_softc *sc;
@@ -380,6 +380,16 @@ generic_pcie_translate_resource(device_t dev, int type, rman_res_t start,
return (found);
}
+static int
+generic_pcie_translate_resource(device_t bus, int type,
+ rman_res_t start, rman_res_t *newstart)
+{
+ rman_res_t newend; /* unused */
+
+ return (generic_pcie_translate_resource_end(
+ bus, type, start, 0, newstart, &newend));
+}
+
struct resource *
pci_host_generic_core_alloc_resource(device_t dev, device_t child, int type,
int *rid, rman_res_t start, rman_res_t end, rman_res_t count, u_int flags)
@@ -404,7 +414,7 @@ pci_host_generic_core_alloc_resource(device_t dev, device_t child, int type,
type, rid, start, end, count, flags));
/* Translate the address from a PCI address to a physical address */
- if (!generic_pcie_translate_resource(dev, type, start, end, &phys_start,
+ if (!generic_pcie_translate_resource_end(dev, type, start, end, &phys_start,
&phys_end)) {
device_printf(dev,
"Failed to translate resource %jx-%jx type %x for %s\n",
@@ -456,7 +466,7 @@ generic_pcie_activate_resource(device_t dev, device_t child, int type,
start = rman_get_start(r);
end = rman_get_end(r);
- if (!generic_pcie_translate_resource(dev, type, start, end, &start,
+ if (!generic_pcie_translate_resource_end(dev, type, start, end, &start,
&end))
return (EINVAL);
rman_set_start(r, start);
@@ -527,6 +537,7 @@ static device_method_t generic_pcie_methods[] = {
DEVMETHOD(bus_activate_resource, generic_pcie_activate_resource),
DEVMETHOD(bus_deactivate_resource, generic_pcie_deactivate_resource),
DEVMETHOD(bus_release_resource, pci_host_generic_core_release_resource),
+ DEVMETHOD(bus_translate_resource, generic_pcie_translate_resource),
DEVMETHOD(bus_setup_intr, bus_generic_setup_intr),
DEVMETHOD(bus_teardown_intr, bus_generic_teardown_intr), |
No dice:
dmesg: https://gist.github.com/agrajag9/8c7d7f03b536f9287d913b6c6ea3a4e4 |
Looks like you haven't reverted the bad one-line patch from above that changes |
🤦♂️ yes, you are correct... But still broken :(
dmesg.boot and the current patchset: https://gist.github.com/agrajag9/5d5242d920f4b4b1a90ea2d0c29f479b |
That's weird. Now first, revert ALL patches, rebuild everything — make sure you're in the state that was there when the thread started — Then apply this: --- i/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ w/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2682,7 +2682,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
/* Registers mapping */
/* TODO: block userspace mapping of io register */
if (adev->asic_type >= CHIP_BONAIRE) {
- adev->rmmio_base = pci_resource_start(adev->pdev, 5);
+ adev->rmmio_base = pci_resource_start(adev->pdev, 5) + 0xa000000000;
adev->rmmio_size = pci_resource_len(adev->pdev, 5);
} else {
adev->rmmio_base = pci_resource_start(adev->pdev, 2);
@@ -4013,9 +4013,11 @@ static void amdgpu_device_get_pcie_info(struct amdgpu_device *adev)
adev->pm.pcie_gen_mask = AMDGPU_DEFAULT_PCIE_GEN_MASK;
if (adev->pm.pcie_mlw_mask == 0)
adev->pm.pcie_mlw_mask = AMDGPU_DEFAULT_PCIE_MLW_MASK;
- return;
+ // return;
}
+ adev->pm.pcie_gen_mask = CAIL_PCIE_LINK_SPEED_SUPPORT_GEN3 | CAIL_ASIC_PCIE_LINK_SPEED_SUPPORT_GEN3;
+
if (adev->pm.pcie_gen_mask && adev->pm.pcie_mlw_mask)
return; If it works, two things to test:
|
No panic! Need to test all the other stuff too, but certainly this is progress of some sort.
|
Loads without forcing PCIe 3.0, even without the sysctl in loader.conf!
|
So that confirms the issue 🥳 🎉 🚀 The "suspicious overlap" wasn't itself the issue, but it did point me in the right direction eventually. Now, using the Funny how it still logs
because the logging statement cuts the address down to a 32-bit type |
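The truncation described here can be shown in a few lines. This is an illustration only; the offset value is hypothetical, mirroring the translation offset discussed earlier in the thread:

```c
#include <assert.h>
#include <stdint.h>

/* Why the translated address can look unchanged in a log line: pushing a
 * 64-bit resource address through a 32-bit type (or a "%x" conversion)
 * silently drops the translation offset living in the upper bits. */
static uint32_t logged_value(uint64_t addr)
{
    return (uint32_t)addr;  /* what ends up shown by a 32-bit format */
}
```

So a log line printing the low 32 bits of `0xa000000000 + 0x40100000` shows only `40100000`, making the translated and untranslated addresses indistinguishable.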
Well this is interesting:
It still tries to POST and fails, but now it doesn't panic? |
oh I didn't check what return value was expected, sorry *facepalm* --- i/sys/dev/pci/pci_host_generic.c
+++ w/sys/dev/pci/pci_host_generic.c
@@ -324,7 +324,7 @@ pci_host_generic_core_release_resource(device_t dev, device_t child, int type,
}
static bool
-generic_pcie_translate_resource(device_t dev, int type, rman_res_t start,
+generic_pcie_translate_resource_end(device_t dev, int type, rman_res_t start,
rman_res_t end, rman_res_t *new_start, rman_res_t *new_end)
{
struct generic_pcie_core_softc *sc;
@@ -380,6 +380,16 @@ generic_pcie_translate_resource(device_t dev, int type, rman_res_t start,
return (found);
}
+static int
+generic_pcie_translate_resource(device_t bus, int type,
+ rman_res_t start, rman_res_t *newstart)
+{
+ rman_res_t newend; /* unused */
+
+ return (!generic_pcie_translate_resource_end(
+ bus, type, start, 0, newstart, &newend));
+}
+
struct resource *
pci_host_generic_core_alloc_resource(device_t dev, device_t child, int type,
int *rid, rman_res_t start, rman_res_t end, rman_res_t count, u_int flags)
@@ -404,7 +414,7 @@ pci_host_generic_core_alloc_resource(device_t dev, device_t child, int type,
type, rid, start, end, count, flags));
/* Translate the address from a PCI address to a physical address */
- if (!generic_pcie_translate_resource(dev, type, start, end, &phys_start,
+ if (!generic_pcie_translate_resource_end(dev, type, start, end, &phys_start,
&phys_end)) {
device_printf(dev,
"Failed to translate resource %jx-%jx type %x for %s\n",
@@ -456,7 +466,7 @@ generic_pcie_activate_resource(device_t dev, device_t child, int type,
start = rman_get_start(r);
end = rman_get_end(r);
- if (!generic_pcie_translate_resource(dev, type, start, end, &start,
+ if (!generic_pcie_translate_resource_end(dev, type, start, end, &start,
&end))
return (EINVAL);
rman_set_start(r, start);
@@ -527,6 +537,7 @@ static device_method_t generic_pcie_methods[] = {
DEVMETHOD(bus_activate_resource, generic_pcie_activate_resource),
DEVMETHOD(bus_deactivate_resource, generic_pcie_deactivate_resource),
DEVMETHOD(bus_release_resource, pci_host_generic_core_release_resource),
+ DEVMETHOD(bus_translate_resource, generic_pcie_translate_resource),
DEVMETHOD(bus_setup_intr, bus_generic_setup_intr),
DEVMETHOD(bus_teardown_intr, bus_generic_teardown_intr), |
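The negation added in that wrapper comes down to a return-convention mismatch, sketched below with minimal, hypothetical functions (not the actual kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/* The helper reports success as bool true, while a newbus-style bus
 * method reports success as 0 (errno convention), so the wrapper must
 * invert the helper's result before returning it. */
static bool helper_translate_ok(void)
{
    return true;                      /* bool convention: true == success */
}

static int method_translate(void)
{
    return (!helper_translate_ok());  /* errno convention: 0 == success */
}
```

Without the negation, every successful translation would be reported to the caller as an error, which matches the "still fails but doesn't panic" behavior seen with the first version of the patch.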
That did it!
Now to see what happens if I run glmark2... |
Confirmed: glmark2 runs at >5k FPS with https://gist.github.com/agrajag9/62cbce1b92662079e34c8d445eb4116d I think we can call this one solved and move further discussion to https://reviews.freebsd.org/D30986 👍 |
In D21096 BUS_TRANSLATE_RESOURCE was introduced to allow LinuxKPI to get physical addresses in pci_resource_start for PowerPC and implemented in ofw_pci. When the translation was implemented in pci_host_generic in 372c142, this method was not implemented; instead a local static function was added for a similar purpose. Rename the static function to "_common" and implement the bus function as a wrapper around that. With this a LinuxKPI driver using physical addresses correctly finds the configuration registers of the GPU. This unbreaks amdgpu on NXP Layerscape LX2160A SoC (SolidRun HoneyComb LX2K workstation) which has a Translation Offset in ACPI for below-4G PCI addresses. More info: freebsd/drm-kmod#84 Tested by: dan.kotowski_a9development.com Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D30986
When attempting to kldload amdgpu, the system panics.
FreeBSD version
FreeBSD honeycomb 13.0-STABLE FreeBSD 13.0-STABLE #2 stable/13-n245851-02966cbdf03: Wed Jun 2 23:16:06 UTC 2021 agrajag9@honeycomb:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64
PCI Info
pciconf -lv
DRM KMOD version
To Reproduce
Steps to reproduce the behavior:
kldload -v amdgpu
Additional context