
Installation failing on Raspberry Pi CM4 for PCI-E driver #280

Open
timonsku opened this issue Dec 13, 2020 · 93 comments
Labels
comp:model (Model related issues) · Hardware:M.2 Accelerator A+E (Coral M.2 Accelerator A+E key issues) · type:support (Support question or issue)

Comments

@timonsku

Following the installation guide for the M.2 module, I get several compilation errors when it's trying to install gasket.
Here is the log of the make process:
gasket-make.log

It seems it's mostly the same few errors repeating:
invalid use of undefined type ‘struct msix_entry’
implicit declaration of function ‘writeq_relaxed’; did you mean ‘writel_relaxed’
implicit declaration of function ‘readq_relaxed’; did you mean ‘readw_relaxed’
implicit declaration of function ‘pci_disable_msix’; did you mean ‘pci_disable_sriov’

This is using gcc version 8.3.0 on the latest Raspbian with kernel 5.4.51-v7l+.
I'm unsure whether this is a compiler, kernel header, or code issue.

@Namburger added the PCIe (Issue relating to our PCIe modules) label on Dec 15, 2020
@Namburger

Namburger commented Dec 15, 2020

Hello @timonsku, we have investigated the CM4 previously and, unfortunately, determined that it won't work with our PCIe modules, as the CPU doesn't have the MSI-X support our modules require.

@timonsku
Author

Hey Namburger,
the Pi engineers have worked on this and have added support for MSI-X in the latest kernel.
See this forum discussion: https://www.raspberrypi.org/forums/viewtopic.php?p=1772216&sid=fa34ae6597591c1f80cb68c8138c6a67#p1772216

@Namburger

As I mentioned, we have explored this path and there are still some ongoing efforts, but I don't believe it's something we can promise. @mbrooksx might be able to give you more info on this.

@timonsku
Author

Oh I see. If it doesn't turn out to be a true hardware limitation, I would be very interested in seeing this get supported.
I currently have hardware in development that would make good use of the M.2 modules.

@usbguru

usbguru commented Dec 15, 2020

@timonsku
Unfortunately this ARM hardware does not support MSI-X. The Raspberry Pi discussion you referenced raised my hopes that limited performance with emulated interrupts might work. Although it still does not work, the ongoing work is encouraging and might lead to performance nearly as good as if MSI-X hardware interrupts were on the ARM silicon. Stay tuned!

@usbguru usbguru reopened this Dec 15, 2020
@usbguru usbguru closed this as completed Dec 15, 2020
@mbrooksx
Member

@timonsku: Yes, I'm actively working with the people in the Pi forum discussion. While MSI-X isn't technically supported by the BCM2711, as you saw from that patch, if software indicates it works, then the PCIe hardware is actually able to map some MSI-X interrupts correctly.

We've validated further than you have (including MSI-X); your errors occur because you're building for the 32-bit kernel while the driver expects 64-bit reads/writes (which is why writeq/readq don't exist). My plan is to customize the driver for the Pi (including 32-bit workarounds) and likely submit it to the Pi kernel rather than trying to update our DKMS package. I will keep you informed of the status.

@mbrooksx mbrooksx reopened this Dec 15, 2020
@timonsku
Author

Awesome that is great to hear :)

@Valdiolus

Great to hear that somebody is working on this issue! I've already received my RPi CM4 + IO Board + PCIe Coral accelerator.
Any news? Maybe I can help?

@markus-k

markus-k commented Jan 15, 2021

Has anyone had a go at this? I've done a bit of debugging and hacking myself and got the kernel module to load and libedgetpu to start an inference (although it never finishes; some event is missing, and there is a HIB error).

There are some changes needed in both the kernel module and the user-space driver, so far primarily replacing 64-bit memory accesses with two 32-bit ones. My progress is here for the module, which I have updated to the latest version from the DKMS package, and here for libedgetpu, but these changes are of course nowhere near merge quality.

This is what libedgetpu logs:

I :273] Starting in normal mode
I :83] Opening /dev/apex_0. read_only=0
I :97] mmap_offset=0x0000000000040000, mmap_size=4096
I :108] Got map addr at 0x0xb6fde000
I :97] mmap_offset=0x0000000000044000, mmap_size=4096
I :108] Got map addr at 0x0xb6fdd000
I :97] mmap_offset=0x0000000000048000, mmap_size=4096
I :108] Got map addr at 0x0xb6fdc000
I :229] Read: offset = 0x00000000000486f0, value: = 0x0000000000000000, w0=0x00000000, w1=0x00000000
I :191] Write: offset = 0x00000000000487a8, value = 0x0000000000000000
I :229] Read: offset = 0x0000000000048578, value: = 0x0000000000000010, w0=0x00000010, w1=0x00000000
I :136] MmuMapper#Map() : 00000000b6627000 -> 0000000001000000 (1 pages) flags=00000000.
I :55] MapMemory() page-aligned : device_address = 0x0000000001000000
I :169] Queue base : 0xb6627000 -> 0x0000000001000000 [4096 bytes]
I :136] MmuMapper#Map() : 00000000b6628000 -> 0000000001001000 (1 pages) flags=00000000.
I :55] MapMemory() page-aligned : device_address = 0x0000000001001000
I :179] Queue status block : 0xb6628000 -> 0x0000000001001000 [16 bytes]
I :191] Write: offset = 0x0000000000048590, value = 0x0000000001000000
I :191] Write: offset = 0x0000000000048598, value = 0x0000000001001000
I :191] Write: offset = 0x00000000000485a0, value = 0x0000000000000100
I :191] Write: offset = 0x0000000000048568, value = 0x0000000000000005
I :229] Read: offset = 0x0000000000048570, value: = 0x0000000000000001, w0=0x00000001, w1=0x00000000
I :229] Read: offset = 0x00000000000486d0, value: = 0x0000000000000000, w0=0x00000000, w1=0x00000000
I :191] Write: offset = 0x0000000000044018, value = 0x0000000000000001
I :191] Write: offset = 0x0000000000044158, value = 0x0000000000000001
I :191] Write: offset = 0x0000000000044198, value = 0x0000000000000001
I :191] Write: offset = 0x00000000000441d8, value = 0x0000000000000001
I :191] Write: offset = 0x0000000000044218, value = 0x0000000000000001
I :191] Write: offset = 0x0000000000048788, value = 0x000000000000007f
I :229] Read: offset = 0x0000000000048788, value: = 0x000000000000007f, w0=0x0000007f, w1=0x00000000
I :191] Write: offset = 0x00000000000400c0, value = 0x0000000000000001
I :191] Write: offset = 0x0000000000040150, value = 0x0000000000000001
I :191] Write: offset = 0x0000000000040110, value = 0x0000000000000001
I :191] Write: offset = 0x0000000000040250, value = 0x0000000000000001
I :191] Write: offset = 0x0000000000040298, value = 0x0000000000000001
I :191] Write: offset = 0x00000000000402e0, value = 0x0000000000000001
I :191] Write: offset = 0x0000000000040328, value = 0x0000000000000001
I :191] Write: offset = 0x0000000000040190, value = 0x0000000000000001
I :191] Write: offset = 0x00000000000401d0, value = 0x0000000000000001
I :191] Write: offset = 0x0000000000040210, value = 0x0000000000000001
I :191] Write: offset = 0x00000000000486e8, value = 0x0000000000000000
I :45] Set event fd : event_id:0 -> event_fd:7,
I :45] Set event fd : event_id:4 -> event_fd:11,
I :62] event_fd=7. Monitor thread begin.
I :45] Set event fd : event_id:5 -> event_fd:12,
I :45] Set event fd : event_id:6 -> event_fd:13,
I :62] event_fd=12. Monitor thread begin.
I :62] event_fd=11. Monitor thread begin.
I :45] Set event fd : event_id:7 -> event_fd:14,
I :62] event_fd=13. Monitor thread begin.
I :45] Set event fd : event_id:8 -> event_fd:15,
I :62] event_fd=14. Monitor thread begin.
I :45] Set event fd : event_id:9 -> event_fd:16,
I :45] Set event fd : event_id:10 -> event_fd:17,
I :62] event_fd=15. Monitor thread begin.
I :45] Set event fd : event_id:11 -> event_fd:18,
I :62] event_fd=16. Monitor thread begin.
I :62] event_fd=17. Monitor thread begin.
I :45] Set event fd : event_id:12 -> event_fd:19,
I :62] event_fd=18. Monitor thread begin.
I :191] Write: offset = 0x00000000000486a0, value = 0x000000000000000f
I :191] Write: offset = 0x00000000000485c0, value = 0x0000000000000001
I :191] Write: offset = 0x00000000000486c0, value = 0x0000000000000001
I :172] Opening device at /dev/apex_0
I :62] event_fd=19. Monitor thread begin.
I :75] event_fd=19. Monitor thread got num_events=1.
I :191] Write: offset = 0x00000000000486c0, value = 0x0000000000000000
I :191] Write: offset = 0x00000000000486c8, value = 0x0000000000000000
I :229] Read: offset = 0x00000000000486f0, value: = 0x0000000000000001, w0=0x00000001, w1=0x00000000
I :229] Read: offset = 0x0000000000048700, value: = 0x0000000000000001, w0=0x00000001, w1=0x00000000
E :254] HIB Error. hib_error_status = 0000000000000001, hib_first_error_status = 0000000000000001
I :75] event_fd=19. Monitor thread got num_events=1.
I :191] Write: offset = 0x00000000000486c0, value = 0x0000000000000000
I :191] Write: offset = 0x00000000000486c8, value = 0x0000000000000000
I :229] Read: offset = 0x00000000000486f0, value: = 0x0000000000000001, w0=0x00000001, w1=0x00000000
I :229] Read: offset = 0x0000000000048700, value: = 0x0000000000000001, w0=0x00000001, w1=0x00000000
E :254] HIB Error. hib_error_status = 0000000000000001, hib_first_error_status = 0000000000000001
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
I :47] Adding input "map/TensorArrayStack/TensorArrayGatherV3" with 150528 bytes.
I :58] Adding output "prediction" with 965 bytes.
I :167] Request prepared, total batch size: 1, total TPU requests required: 1.
I :310] Request [0]: Submitting P0 request immediately.
I :373] Request [0]: Need to map parameters.
I :136] MmuMapper#Map() : 00000000ad93d000 -> 8000000000000000 (953 pages) flags=00000002.
I :55] MapMemory() page-aligned : device_address = 0x8000000000000000
I :252] Mapped params : Buffer(ptr=0xad93d000) -> 0x8000000000000000, 3900864 bytes.
I :252] Mapped params : Buffer(ptr=(nil)) -> 0x0000000000000000, 0 bytes.
I :387] Request [0]: Need to do parameter-caching.
I :80] [0] Request constructed.
I :46] InstructionBuffers created.
I :653] Created new instruction buffers.
I :75] Mapped scratch : Buffer(ptr=(nil)) -> 0x0000000000000000, 0 bytes.
I :368] MapDataBuffers() done.
I :187] Linking Parameter: 0x8000000000000000
I :136] MmuMapper#Map() : 0000000001266000 -> 8000000000400000 (3 pages) flags=00000002.
I :55] MapMemory() page-aligned : device_address = 0x8000000000400000
I :223] Mapped "instructions" : Buffer(ptr=0x1266000) -> 0x8000000000400000, 9680 bytes. Direction=1
I :384] MapInstructionBuffers() done.
I :481] [0] SetState old=0, new=1.
I :393] [0] NotifyRequestSubmitted()
I :481] [0] SetState old=1, new=2.
I :83] Request[0]: Submitted
I :401] [0] NotifyRequestActive()
I :481] [0] SetState old=2, new=3.
I :133] Request[0]: Scheduling DMA[0]
I :394] Adding an element to the host queue.
I :191] Write: offset = 0x00000000000485a8, value = 0x0000000000000001
I :80] [1] Request constructed.
I :113] Adding input "map/TensorArrayStack/TensorArrayGatherV3" with 150528 bytes.
I :188] Adding output "prediction" with 965 bytes.
I :46] InstructionBuffers created.
I :653] Created new instruction buffers.
I :75] Mapped scratch : Buffer(ptr=(nil)) -> 0x0000000000000000, 0 bytes.
I :136] MmuMapper#Map() : 0000000001226000 -> 8000000000440000 (38 pages) flags=00000002.
I :55] MapMemory() page-aligned : device_address = 0x8000000000440000
I :223] Mapped "map/TensorArrayStack/TensorArrayGatherV3" : Buffer(ptr=0x1226440) -> 0x8000000000440440, 150528 bytes. Direction=1
I :136] MmuMapper#Map() : 0000000001276000 -> 8000000000404000 (1 pages) flags=00000004.
I :55] MapMemory() page-aligned : device_address = 0x8000000000404000
I :223] Mapped "prediction" : Buffer(ptr=0x1276000) -> 0x8000000000404000, 968 bytes. Direction=2
I :368] MapDataBuffers() done.
I :93] Linking map/TensorArrayStack/TensorArrayGatherV3[0]: 0x8000000000440440
I :93] Linking prediction[0]: 0x8000000000404000
I :136] MmuMapper#Map() : 00000000012b9000 -> 8000000000420000 (32 pages) flags=00000002.
I :55] MapMemory() page-aligned : device_address = 0x8000000000420000
I :223] Mapped "instructions" : Buffer(ptr=0x12b9000) -> 0x8000000000420000, 129536 bytes. Direction=1
I :384] MapInstructionBuffers() done.
I :481] [1] SetState old=0, new=1.
I :393] [1] NotifyRequestSubmitted()
I :481] [1] SetState old=1, new=2.
I :83] Request[1]: Submitted
I :401] [1] NotifyRequestActive()
I :481] [1] SetState old=2, new=3.
I :133] Request[1]: Scheduling DMA[0]
I :394] Adding an element to the host queue.
I :191] Write: offset = 0x00000000000485a8, value = 0x0000000000000002

Also, the only interrupt firing seems to be the fatal-error one:

cat /sys/class/apex/apex_0/interrupt_counts
0x00: 0
0x01: 0
0x02: 0
0x03: 0
0x04: 0
0x05: 0
0x06: 0
0x07: 0
0x08: 0
0x09: 0
0x0a: 0
0x0b: 0
0x0c: 2

@Namburger

@markus-k whoa, thanks for sharing that.
@mbrooksx for awareness

@hiwudery

@markus-k thank you for sharing.
I added othbootargs=gasket.dma_bit_mask=32 to avoid the HIB error.
But after running the sample program, I still get the following errors.
Do you have any ideas? (Raspbian OS is 32-bit; all the code was downloaded from markus-k's repo.)
Thank you
-Jack

(screenshots attached: messageImage_1612070152087, messageImage_1612070000068)
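For anyone else trying this: gasket's dma_bit_mask is a module parameter, so besides the boot arguments it can be set via a modprobe config. A sketch (paths are the standard Raspberry Pi OS locations; the parameter name is taken from the comments above):

```shell
# Option 1: kernel command line - append to /boot/cmdline.txt (one line):
#   gasket.dma_bit_mask=32
# Option 2: modprobe configuration:
echo 'options gasket dma_bit_mask=32' | sudo tee /etc/modprobe.d/gasket.conf
# Reload the modules so the parameter takes effect
# (apex depends on gasket, so remove it first):
sudo rmmod apex gasket 2>/dev/null; sudo modprobe apex
# Verify the live value:
cat /sys/module/gasket/parameters/dma_bit_mask
```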

@markus-k

@hiwudery That's weird. Your upper and lower 32 bits are cloned when reading from the device (see the lines with I :229), which my patch should fix. Maybe the compiler optimized the two reads into one ldrd? But since that still performs two 32-bit accesses, I don't really understand why that happens.

I just tried setting dma_bit_mask but still get HIB errors, in addition to out-of-memory errors when mapping buffers. Also from dmesg:

[  971.201472] apex 0000:01:00.0: gasket_perform_mapping i 0
[  971.201480] apex 0000:01:00.0: gasket_page_table_map done: ha b657c000 daddr 1000000 num 1, flags 0 ret 0
[  971.201552] apex 0000:01:00.0: gasket_perform_mapping i 0
[  971.201558] apex 0000:01:00.0: gasket_page_table_map done: ha b657d000 daddr 1001000 num 1, flags 0 ret 0
[  971.271839] apex 0000:01:00.0: gasket_alloc_extended_subtable -> fail to map page ffffffffffffffff [pfn 6d9fed66 phys 732d8923]
[  971.271854] apex 0000:01:00.0: no memory for extended addr subtable
[  971.271861] apex 0000:01:00.0: page table slots (0,0) (@ 0x8000000000000000) to (8191,511) are not available
[  971.271868] apex 0000:01:00.0: gasket_page_table_map done: ha ad63c000 daddr 8000000000000000 num 953, flags 2 ret -12
[  971.271907] apex 0000:01:00.0: gasket_alloc_extended_subtable -> fail to map page ffffffffffffffff [pfn 6d9fed66 phys 732d8923]
[  971.271915] apex 0000:01:00.0: no memory for extended addr subtable
[  971.271921] apex 0000:01:00.0: page table slots (0,0) (@ 0x8000000000000000) to (8191,511) are not available
[  971.271928] apex 0000:01:00.0: gasket_page_table_map done: ha ad63c000 daddr 8000000000000000 num 953, flags 0 ret -12

I'm also not sure if dma_bit_mask is right here. The comment says it's for PCIe controllers which can't do 64-bit addressing, but the Raspberry Pi's PCIe controller can do 64-bit addressing; it just only supports 32-bit wide accesses (as noted by PhilE here).

@mbrooksx
Member

mbrooksx commented Feb 3, 2021

Yes, what you've done is essentially everything I've done for debugging. The only additional change you alluded to is correct: the compiler is too smart for libedgetpu and expects a competent system that would be able to do 64-bit-wide accesses. I fixed this by using volatile variables to skip caching. My repos with the progress are:
https://github.com/mbrooksx/libedgetpu (Userspace)
https://github.com/mbrooksx/pi-cm4-gasket-hacks (Kernel)

Note that I added an additional print: the host-side page address for the failed DMA transaction (it reports 0x100004000000000, which is outside of the Pi's RAM). The hope was that dma_bit_mask and the command line swiotlb=65536 would create shadow registers in the 32-bit space, but the Pi's PCIe restrictions are very challenging. It is likely the coherent memory (set up in libedgetpu) is corrupted, and thus the shared memory between the two is passing invalid information.

The other option that may be easier is the 32-bit kernel. It has issues with allocating enough BAR memory, but with some device tree tweaks this could likely be fixed. Paired with the 32-bit-"aware" user space, this may be an easier path. I've asked the Pi team to investigate this as well.

@geerlingguy

@mbrooksx - And for the benefit of anyone who hasn't touched BAR space allocations, here's a guide I wrote a few months back while testing graphics cards on the CM4: https://gist.github.com/geerlingguy/9d78ea34cab8e18d71ee5954417429df

The latest 5.10.y kernels for Pi OS have already increased the default allocation to 1 GB, I think (maybe even 4 or 8 GB? I don't remember if I followed up and checked those commits).
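For anyone checking their own board, a quick sketch for inspecting what the kernel actually assigned (assuming the Coral enumerates at 01:00.0, as in the dmesg output elsewhere in this thread):

```shell
# BAR regions assigned to the device
sudo lspci -vv -s 01:00.0 | grep -i region
# Overall PCIe memory windows reported at boot
dmesg | grep -i -E 'pci.*(mem|bridge window)'
```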

@markus-k

markus-k commented Feb 4, 2021

> Yes, what you've done is essentially everything I've done for debug. The only additional change you alluded to is correct - the compiler is too smart for libedgetpu and expects a competent system that would be able to have 64-bit wide accesses. I fixed this by using volatile variables to skip caching. My repos of progress are:
> https://github.com/mbrooksx/libedgetpu (Userspace)
> https://github.com/mbrooksx/pi-cm4-gasket-hacks (Kernel)
>
> Note that I added an additional print - the host-side page address for the failed DMA transaction (it reports 0x100004000000000 - which is outside of the Pi RAM). The hope is that dma_bit_mask and command line swiotlb=65536 would create shadow registers in the 32-bit space but the Pi PCIe restrictions are very challenging. It is likely the coherent memory (setup in libedgetpu) is corrupted and thus the shared memory between the two is passing invalid information.
>
> The other option that may be easier is the 32-bit kernel. It has issues with allocating enough BAR memory, but with some device tree tweaks this could likely be fixed. This paired with the 32-bit "aware" user-space may be an easier path. I've asked the Pi team to investigate this as well.

Alright, at least I haven't been looking in the completely wrong place. I've done most of my debugging on a 32-bit kernel so far. The default BAR space seems to be 1 GB; I'm not sure if that's enough, but I'm not seeing any BAR allocation errors.

In case this helps anyone, here are some more debug logs. I've added your additional debug print, on a 32-bit kernel without any additional parameters:

[   77.630936] apex 0000:01:00.0: Fault VA: 0x0
[   77.630952] apex 0000:01:00.0: Fault VA: 0x0
[   77.635926] apex 0000:01:00.0: Fault VA: 0x0
[   77.635940] apex 0000:01:00.0: Fault VA: 0x0
[   77.635953] apex 0000:01:00.0: Fault VA: 0x0
[   77.635966] apex 0000:01:00.0: Fault VA: 0x0
[   77.635978] apex 0000:01:00.0: Fault VA: 0x0
[   77.635990] apex 0000:01:00.0: Fault VA: 0x0
[   77.636002] apex 0000:01:00.0: Fault VA: 0x0
[   77.636014] apex 0000:01:00.0: Fault VA: 0x0
[   83.141193] apex 0000:01:00.0: Fault VA: 0x1001000
[   83.141216] apex 0000:01:00.0: Failing in first (simple) read access. Extended_level0: 0x8, Simple: 0x1001
[   83.141237] apex 0000:01:00.0: Computed Failing Bus Addr: 0x40c800000
[   83.141259] apex 0000:01:00.0: Fault VA: 0x1001000
[   83.141277] apex 0000:01:00.0: Failing in first (simple) read access. Extended_level0: 0x8, Simple: 0x1001
[   83.141296] apex 0000:01:00.0: Computed Failing Bus Addr: 0x40c800000
[   83.141320] apex 0000:01:00.0: Fault VA: 0xffffffffffffffff
[   83.141345] apex 0000:01:00.0: Fault VA: 0xffffffff
[   83.141362] apex 0000:01:00.0: Failing in first (simple) read access. Extended_level0: 0x7ff, Simple: 0x1fff
[   83.141381] apex 0000:01:00.0: Computed Failing Bus Addr: 0x0
[   83.141402] apex 0000:01:00.0: Fault VA: 0x0
[   83.150222] apex 0000:01:00.0: Fault VA: 0x0
[   83.150243] apex 0000:01:00.0: Fault VA: 0x0
[   83.150263] apex 0000:01:00.0: Fault VA: 0x0
[   83.150284] apex 0000:01:00.0: Fault VA: 0x0
[   83.150309] apex 0000:01:00.0: Fault VA: 0xffffffffffffffff

I've also tried using gasket.dma_bit_mask=32 swiotlb=65536 on a 32-bit kernel:

[   41.372303] apex 0000:01:00.0: Fault VA: 0x0
[   41.372321] apex 0000:01:00.0: Fault VA: 0x0
[   41.378062] apex 0000:01:00.0: Fault VA: 0x0
[   41.378079] apex 0000:01:00.0: Fault VA: 0x0
[   41.378094] apex 0000:01:00.0: Fault VA: 0x0
[   41.378109] apex 0000:01:00.0: Fault VA: 0x0
[   41.378124] apex 0000:01:00.0: Fault VA: 0x0
[   41.378139] apex 0000:01:00.0: Fault VA: 0x0
[   41.378153] apex 0000:01:00.0: Fault VA: 0x0
[   41.378168] apex 0000:01:00.0: Fault VA: 0x0
[   41.628343] ------------[ cut here ]------------
[   41.628367] WARNING: CPU: 3 PID: 707 at kernel/dma/swiotlb.c:683 swiotlb_map+0x38c/0x43c
[   41.628374] apex 0000:01:00.0: swiotlb addr 0x0000000415400000+4096 overflow (mask ffffffff, bus limit 47fffffff).
[   41.628379] Modules linked in: sha256_generic cfg80211 rfkill 8021q garp stp llc binfmt_misc v3d raspberrypi_hwmon vc4 gpu_sched dwc2 cec roles drm_kms_helper drm bcm2835_isp(C) i2c_bcm2835 bcm2835_codec(C) bcm2835_v4l2(C) drm_panel_orientation_quirks v4l2_mem2mem bcm2835_mmal_vchiq(C) videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc apex(C) snd_soc_core vc_sm_cma(C) gasket(C) snd_compress snd_pcm_dmaengine snd_pcm snd_timer snd syscopyarea sysfillrect sysimgblt fb_sys_fops backlight rpivid_mem uio_pdrv_genirq uio i2c_dev ip_tables x_tables ipv6
[   41.628599] CPU: 3 PID: 707 Comm: python3 Tainted: G         C        5.10.6-v7l+ #6
[   41.628602] Hardware name: BCM2711
[   41.628605] Backtrace:
[   41.628617] [<c0b84b94>] (dump_backtrace) from [<c0b84f24>] (show_stack+0x20/0x24)
[   41.628621]  r7:ffffffff r6:00000000 r5:60000013 r4:c12e6c98
[   41.628626] [<c0b84f04>] (show_stack) from [<c0b892bc>] (dump_stack+0xcc/0xf8)
[   41.628632] [<c0b891f0>] (dump_stack) from [<c02216d4>] (__warn+0xfc/0x114)
[   41.628637]  r10:00001000 r9:00000009 r8:c02a5a50 r7:000002ab r6:00000009 r5:c02a5a50
[   41.628640]  r4:c0e3cd00 r3:c1205094
[   41.628645] [<c02215d8>] (__warn) from [<c0b856c8>] (warn_slowpath_fmt+0xa4/0xd8)
[   41.628648]  r7:000002ab r6:c0e3cd00 r5:c1205048 r4:c0e3ccbc
[   41.628654] [<c0b85628>] (warn_slowpath_fmt) from [<c02a5a50>] (swiotlb_map+0x38c/0x43c)
[   41.628658]  r9:c1b8b070 r8:c1205048 r7:00000000 r6:ffffffff r5:00000000 r4:ffffffff
[   41.628664] [<c02a56c4>] (swiotlb_map) from [<c02a0668>] (dma_map_page_attrs+0x254/0x394)
[   41.628668]  r10:00000001 r9:00001000 r8:c1b8b1e0 r7:00000000 r6:ffffffff r5:c1205048
[   41.628671]  r4:c1b8b070
[   41.628690] [<c02a0414>] (dma_map_page_attrs) from [<bf115184>] (gasket_map_extended_pages+0x100/0x45c [gasket])
[   41.628694]  r10:00000000 r9:c4112000 r8:c32ab700 r7:f09dc000 r6:00000200 r5:000003b9
[   41.628697]  r4:f085d018
[   41.628717] [<bf115084>] (gasket_map_extended_pages [gasket]) from [<bf115900>] (gasket_page_table_map+0xa8/0x100 [gasket])
[   41.628721]  r10:c32ab740 r9:ad63c000 r8:00000000 r7:80000000 r6:c2f97c00 r5:c32ab700
[   41.628724]  r4:000003b9
[   41.628741] [<bf115858>] (gasket_page_table_map [gasket]) from [<bf112a9c>] (gasket_map_buffers_common+0x90/0xa8 [gasket])
[   41.628745]  r10:00000005 r9:00000001 r8:c30e1180 r7:4028dc0c r6:c2f97c00 r5:c2f97c00
[   41.628748]  r4:c32a5d90
[   41.628767] [<bf112a0c>] (gasket_map_buffers_common [gasket]) from [<bf112cac>] (gasket_handle_ioctl+0x1f8/0x8e0 [gasket])
[   41.628770]  r5:beb40fa0 r4:c1205048
[   41.628788] [<bf112ab4>] (gasket_handle_ioctl [gasket]) from [<bf1106f8>] (gasket_ioctl+0x9c/0x118 [gasket])
[   41.628792]  r9:beb40fa0 r8:c2f97c00 r7:bf09a1b0 r6:4028dc0c r5:c30e1180 r4:c1205048
[   41.628805] [<bf11065c>] (gasket_ioctl [gasket]) from [<c0451180>] (sys_ioctl+0x1d4/0x8ec)
[   41.628809]  r9:c32a4000 r8:00000000 r7:c30e1180 r6:c30e1181 r5:c1205048 r4:4028dc0c
[   41.628815] [<c0450fac>] (sys_ioctl) from [<c0200040>] (ret_fast_syscall+0x0/0x28)
[   41.628818] Exception stack(0xc32a5fa8 to 0xc32a5ff0)
[   41.628822] 5fa0:                   beb40f9c 00000000 00000005 4028dc0c beb40fa0 00000005
[   41.628826] 5fc0: beb40f9c 00000000 b454da7c 00000036 00000001 01f0349c 00000000 b48a4bbc
[   41.628829] 5fe0: b454db58 beb40f74 b443ba3f b6cd551c
[   41.628833]  r10:00000036 r9:c32a4000 r8:c0200204 r7:00000036 r6:b454da7c r5:00000000
[   41.628836]  r4:beb40f9c
[   41.628840] ---[ end trace a2d67e6b70f87dd2 ]---
[   41.628855] apex 0000:01:00.0: no memory for extended addr subtable
[   41.628861] apex 0000:01:00.0: page table slots (0,0) (@ 0x8000000000000000) to (8191,511) are not available
[   41.628911] apex 0000:01:00.0: no memory for extended addr subtable
[   41.628917] apex 0000:01:00.0: page table slots (0,0) (@ 0x8000000000000000) to (8191,511) are not available
[   41.646322] apex 0000:01:00.0: Fault VA: 0x1001000
[   41.646330] apex 0000:01:00.0: Failing in first (simple) read access. Extended_level0: 0x8, Simple: 0x1001
[   41.646338] apex 0000:01:00.0: Computed Failing Bus Addr: 0xc800000
[   41.646347] apex 0000:01:00.0: Fault VA: 0x1001000
[   41.646352] apex 0000:01:00.0: Failing in first (simple) read access. Extended_level0: 0x8, Simple: 0x1001
[   41.646359] apex 0000:01:00.0: Computed Failing Bus Addr: 0xc800000
[   41.646372] apex 0000:01:00.0: Fault VA: 0xffffffffffffffff
[   41.646384] apex 0000:01:00.0: Fault VA: 0xffffffff
[   41.646389] apex 0000:01:00.0: Failing in first (simple) read access. Extended_level0: 0x7ff, Simple: 0x1fff
[   41.646396] apex 0000:01:00.0: Computed Failing Bus Addr: 0xdeadbeef
[   41.646405] apex 0000:01:00.0: Fault VA: 0x0
[   41.648266] apex 0000:01:00.0: Fault VA: 0x0
[   41.648275] apex 0000:01:00.0: Fault VA: 0x0
[   41.648283] apex 0000:01:00.0: Fault VA: 0x0
[   41.648292] apex 0000:01:00.0: Fault VA: 0x0
[   41.648305] apex 0000:01:00.0: Fault VA: 0xffffffffffffffff

In this case mapping the buffer fails in libedgetpu:

I :192] Write: offset = 0x00000000000486a0, value = 0x000000000000000f
I :62] event_fd=19. Monitor thread begin.
I :192] Write: offset = 0x00000000000485c0, value = 0x0000000000000001
I :192] Write: offset = 0x00000000000486c0, value = 0x0000000000000001
I :75] event_fd=19. Monitor thread got num_events=1.
I :192] Write: offset = 0x00000000000486c0, value = 0x0000000000000000
I :192] Write: offset = 0x00000000000486c8, value = 0x0000000000000000
I :231] Read: offset = 0x00000000000486f0, value: = 0x0000000000000001, w0=0x00000001, w1=0x00000000
I :172] Opening device at /dev/apex_0
I :231] Read: offset = 0x0000000000048700, value: = 0x0000000000000001, w0=0x00000001, w1=0x00000000
E :254] HIB Error. hib_error_status = 0000000000000001, hib_first_error_status = 0000000000000001
I :75] event_fd=19. Monitor thread got num_events=1.
I :192] Write: offset = 0x00000000000486c0, value = 0x0000000000000000
I :192] Write: offset = 0x00000000000486c8, value = 0x0000000000000000
I :231] Read: offset = 0x00000000000486f0, value: = 0x0000000000000001, w0=0x00000001, w1=0x00000000
I :231] Read: offset = 0x0000000000048700, value: = 0x0000000000000001, w0=0x00000001, w1=0x00000000
E :254] HIB Error. hib_error_status = 0000000000000001, hib_first_error_status = 0000000000000001
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
I :47] Adding input "map/TensorArrayStack/TensorArrayGatherV3" with 150528 bytes.
I :58] Adding output "prediction" with 965 bytes.
I :167] Request prepared, total batch size: 1, total TPU requests required: 1.
I :310] Request [0]: Submitting P0 request immediately.
I :373] Request [0]: Need to map parameters.
I :118] Failed to map buffer with flags, error -1
Traceback (most recent call last):
  File "classify_image.py", line 126, in <module>
    main()
  File "classify_image.py", line 115, in main
    interpreter.invoke()
  File "/home/pi/venv/lib/python3.7/site-packages/tflite_runtime/interpreter.py", line 540, in invoke
    self._interpreter.Invoke()
RuntimeError: Failed to execute request. Could not map pages : 5 (Cannot allocate memory)Node number 1 (EdgeTpuDelegateForCustomOp) failed to invoke.

I :226] Releasing Edge TPU device at /dev/apex_0
I :178] Closing Edge TPU device at /dev/apex_0

@hiwudery

hiwudery commented Feb 4, 2021

@markus-k In gasket_page_table.c, the page table uses a 64-bit format, not a 32-bit one. I think gasket_page_table also needs to be modified for the 32-bit kernel.

 * Address format:
 * Simple addresses - those whose containing pages are directly placed in the
 *   device's address translation registers - are laid out as:
 *     [ 63 - 25: 0 | 24 - 12: page index | 11 - 0: page offset ]

@geerlingguy

geerlingguy commented Feb 17, 2021

I also wanted to note something here that may be of interest. I noticed earlier that someone mentioned writeq being present on 64-bit OSes. I'll soon be testing the Coral TPU (M.2 A+E key version) on a Pi, so I haven't yet had first-hand experience, but from a different driver I was taking a look at, it seems one problem may be that writeq is not supported on Pi OS / the Pi's PCIe bus the way it may be on some other 64-bit systems.

Edit: New bug reported relating to that driver issue is here: raspberrypi/linux#4158

@geerlingguy

On 64-bit Pi OS (with the latest kernel, compiled at 5.10.14-v8+), I get the following kernel panic after running through the default steps in the setup guide:

(photo attached: IMG_3633, showing the kernel panic)

(Cross-linking to geerlingguy/raspberry-pi-pcie-devices#44 (comment))

@markus-k

You should probably read the rest of this issue; there hasn't been any development since my last comment, to my knowledge. The default gasket module won't work at all. My fixed one at least loads and can read the temperature, but something is still wrong with the DMA, so it won't work either. There are probably still a few other things broken in the user-space driver as well.

I don't have the time to dig into this right now, and my knowledge of kernel development is limited anyway. So the best we can do is hope that someone with a deep understanding of how the DMA and the TPU work can find some time to look into it.

@timonsku
Author

@mbrooksx It sounded like Google was working on it? Maybe he could update us. I still have a very big interest in this for my product but don't have the resources or know-how to dig into this.

@markus-k

If someone at Google is working on it, or is going to, it would be nice to get a very rough ETA (weeks, months) on when we can expect to know whether or not the TPU will ever work over PCIe on a CM4. I'll be creating a new revision of my product's PCB in a few weeks, and if there's very little chance the PCIe TPU will work anytime soon, I'll have to switch both to USB.

@manishbuttan

manishbuttan commented Apr 13, 2022 via email

@manoj7410

@manishbuttan Please contact Coral sales via the link given at https://coral.ai/products/accelerator-module#tech-specs

@manishbuttan

manishbuttan commented Apr 13, 2022 via email

@manishbuttan

Hello, is anyone here from the Google Coral team? I have filled out the online sales contact form twice, but there has been no response. Please share the USB 3 implementation details for the Coral Accelerator Module. I have already received over 200 Coral Accelerator Modules and can't proceed with the PCB design without this information. Thanks.

@hjonnala
Contributor

Hello @manishbuttan, our sales team has responded to your inquiry on our website. Please check your email and follow up there. Thanks!

@manishbuttan
Copy link

Thanks Hemanth. Yes, I just received an email from Bill from Google. I'm working with him to get this completed.

@magic-blue-smoke
Copy link

Designed an M.2 card with the Coral Accelerator Module that seems to work fine with the Piunora CM4 baseboard.
Test suggestions are welcome.

@TimPearson
Copy link

Designed an M.2 card with the Coral Accelerator Module that seems to work fine with the Piunora CM4 baseboard.
Test suggestions are welcome.

Brilliant idea - why not try the cheap and widely available Waveshare boards, which are only around $20 and have an M-key M.2 interface? I had all but given up on Coral since they don't work over PCIe on the RPi 4, but this now becomes a new possibility.

@vukitoso
Copy link

Waveshare board that are only around $20 and have an M key M.2 interface

Can you give a link to the "Waveshare board"?

@timonsku
Copy link
Author

Designed an M.2 card with the Coral Accelerator Module that seems to work fine with the Piunora CM4 baseboard. Test suggestions are welcome.

Glad to see Piunora coming full circle. It was the original reason I opened this thread :)
Hope it worked out for you so far!

@langestefan
Copy link

Designed an M.2 card with the Coral Accelerator Module that seems to work fine with the Piunora CM4 baseboard. Test suggestions are welcome.

Cool stuff. Is this available somewhere?

@JakobTewes
Copy link

Designed an M.2 card with the Coral Accelerator Module that seems to work fine with the Piunora CM4 baseboard. Test suggestions are welcome.

Cool stuff. Is this available somewhere?

Also quite some interest here 😜

@CipherLab
Copy link

Let's say I wanted to use the Coral USB and inserted an M.2-to-USB riser adapter... on a PC you'd go into the BIOS and switch the slot from M.2 to PCIe, but would I have to do something similar on the Yellow? If so, can someone guide me on how?

@hjonnala
Copy link
Contributor

Designed an M.2 card with the Coral Accelerator Module that seems to work fine with the Piunora CM4 baseboard. Test suggestions are welcome.

Hello @magic-blue-smoke feel free to run the CTS test to evaluate the hardware design. Thanks!
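Before a full CTS run, a quick enumeration check can save time. A sketch; the `1ac1` vendor ID for the Edge TPU is an assumption here, so cross-check it against your own `lspci -nn` output:

```shell
# Does the Coral module enumerate on the PCIe bus at all?
# 1ac1 is the vendor ID the Edge TPU usually reports (assumption; verify
# against your own `lspci -nn` output).
lspci -nn 2>/dev/null | grep -i '1ac1' || echo "no Coral device found on the PCIe bus"
```

If the device doesn't show up here, the problem is electrical or link training rather than anything the CTS would catch.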

@fhloston
Copy link

fhloston commented Feb 9, 2023

This looks very promising - Coral and NVMe

https://twitter.com/Merocle/status/1622644808626970624/photo/1

@ghollingworth
Copy link

No it doesn't... It's no more promising than any of the other products that will not work, due to hardware limitations of the BCM2711 and the Google Coral device.

@will127534
Copy link

will127534 commented Feb 9, 2023

I'm just going to do a shameless plug here: https://github.com/will127534/Coral-USB3-M2-Module
A fully open-sourced design that has passed the CTS test.

@kklem0
Copy link

kklem0 commented Feb 9, 2023

I'm just going to do a shameless plug here: https://github.com/will127534/Coral-USB3-M2-Module

A fully open-sourced design that has passed the CTS test.

Nice! With MIT license 👍👍👍!

@EnziinSystem
Copy link

EnziinSystem commented Apr 15, 2023

I'm just going to do a shameless plug here: https://github.com/will127534/Coral-USB3-M2-Module. A fully open-sourced design that has passed the CTS test.

Your board design is very professional.

Does it work with the Pi CM4 at full performance?

Thanks.

@jambamamba
Copy link

I'm just going to do a shameless plug here: https://github.com/will127534/Coral-USB3-M2-Module. A fully open-sourced design that has passed the CTS test.

Where can we get this from? I want to try it out

@will127534
Copy link

will127534 commented May 21, 2023

Does it work with the Pi CM4 at full performance?

I think that's what the CTS was testing?

Where can we get this from? I want to try it out

I'm not going to sell this. I've been using this board to evaluate the Coral module, but its performance, its availability (the one you saw in the image took 6 months of waiting), and having a USB3 controller sitting between a device and a host that both support PCIe just don't make sense in terms of power, cost, and complexity. The git repo is more about documenting the USB3 capability of the Coral module that Google hides from its datasheet.

@jambamamba
Copy link

I'm not going to sell this. I've been using this board to evaluate the Coral module, but its performance, its availability (the one you saw in the image took 6 months of waiting), and having a USB3 controller sitting between a device and a host that both support PCIe just don't make sense in terms of power, cost, and complexity. The git repo is more about documenting the USB3 capability of the Coral module that Google hides from its datasheet.

Really appreciate you spending so much time and energy on this.

@gtxaspec
Copy link

gtxaspec commented May 26, 2023

offtopic a bit:

@will127534 have you thought of a Coral IC > USB 3 / 3.1 / 3.2 design (since the availability of the USB Coral is limited at the moment)? Would this technically work with a new design?

Couldn't you add multiple Corals to a system using this method, if the Coral IC supports USB?

@will127534
Copy link

will127534 commented May 26, 2023

offtopic a bit:

@will127534 have you thought of a Coral IC > USB 3 / 3.1 / 3.2 design (since the availability of the USB Coral is limited at the moment)? Would this technically work with a new design?

There must be a reason why Google hid the USB3 function in the datasheet in the first place; my guess is that it needs more care (signal boosting) if the USB3 traces get longer. So yes, you probably could, but I'm not sure it would work with a longer USB3 cable. Also, the Coral module itself is limited too...

Couldn't you add multiple Corals to a system using this method, if the Coral IC supports USB?

Adding more Coral modules is indeed possible, but at that point I'd move on to Nvidia's solution, probably saving some money and some optimization effort for that setup.

@n1mda
Copy link

n1mda commented Mar 28, 2024

Is there any progress on making the Coral work with the CM4?

@geerlingguy
Copy link

@n1mda - Coral and CM4 are a no-go. The Coral seems to work on the Pi 5 (and hopefully the CM5 when it is released), as it has a more compliant PCIe bus.
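For anyone verifying this on a Pi 5: MSI-X support was the original blocker at the top of this issue, so checking whether the capability is reported is a quick first test. A sketch; treating `1ac1` as the Edge TPU vendor ID is an assumption, so verify it with `lspci -nn`:

```shell
# Check whether the Edge TPU reports an MSI-X capability (root is needed
# to read the capability list). 1ac1: is assumed to be the Edge TPU vendor ID.
sudo lspci -vv -d 1ac1: 2>/dev/null | grep -i 'msi-x' \
    || echo "no MSI-X capability reported (or no device / insufficient privileges)"
```

Seeing `MSI-X: Enable+` after the gasket driver binds is a good sign that the interrupt setup that failed on the CM4 is working.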
