Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

probe: handle pipelined OUT transfers properly #47

Closed
wants to merge 1 commit into from

Conversation

P33M
Copy link
Contributor

@P33M P33M commented Nov 22, 2022

NB: don't merge until I find out why this causes the VL805 integrated hub to spew full-speed bus garbage around SOF and why picoprobe doesn't recover from a bus error condition.

A consequence of tinyUSB's implementation of vendor interfaces is that
it treats pipe traffic as free-flowing, i.e. no distinction is made if a
short packet is received. CMSIS-DAP should ordinarily encapsulate a set
of commands in a single packet, but if multiple packets are receieved
before the FIFO is drained then all the commands need to be processed at
once.

Ensure there is always enough RX space, and TX space for the generated
response.
@tom-van
Copy link

tom-van commented Nov 22, 2022

Tested with Ubuntu 22.04 x86_64
SWCLK seems overclocked more than 200%, tested @ adapter speed 10000 (actually more than 20 Mhz).
Speed is better, approx 340 KiB/s down, 330 KiB/s up (not as good as K82, it now gives 581/425 KiB/s with the not-yet-published OpenOCD CMSIS-DAP tuning).
Unfortunately picoprobe is not 100% reliable: a packet is lost approx in 1 to 2 of 10 memory dumps
'''

dump_image /dev/null 0x20000000 0x40000
CMSIS-DAP command mismatch. Expected 0x5 received 0xa1
SWD DPIDR 0x0bc12477, DLPIDR 0x00000001
Failed to read memory at 0x20019c00

dump_image /dev/null 0x20000000 0x40000
dumped 262144 bytes in 0.787822s (324.947 KiB/s)

dump_image /dev/null 0x20000000 0x40000
CMSIS-DAP command mismatch. Expected 0x5 received 0x6e
SWD DPIDR 0x0bc12477, DLPIDR 0x00000001
Failed to read memory at 0x20038c00

dump_image /dev/null 0x20000000 0x40000
dumped 262144 bytes in 0.779235s (328.527 KiB/s)
'''

Your version correctly picks concatenated packet from the input FIFO.
But what if two output packets get joined in the output FIFO?
No debugger copes with that.

If you change DAP_PACKET_COUNT to 4U (OpenOCD now uses up to 4
to allow tweaking) the packet errors happen quite often.

@P33M
Copy link
Contributor Author

P33M commented Nov 22, 2022

Tested with Ubuntu 22.04 x86_64 SWCLK seems overclocked more than 200%, tested @ adapter speed 10000 (actually more than 20 Mhz).

Is this verified with a logic probe? I can't think of anything I did that would break swclk setup.

Edit: IN transfers will likely violate the constraint that DAP responses must not cross packet boundaries. I need to find out if tinyUSB has some method of enforcing a transfer boundary in this case.

@tom-van
Copy link

tom-van commented Nov 22, 2022

Tested with Ubuntu 22.04 x86_64 SWCLK seems overclocked more than 200%, tested @ adapter speed 10000 (actually more than 20 Mhz).

Is this verified with a logic probe? I can't think of anything I did that would break swclk setup.

LA sampled @ 40 MHz

Edit: IN transfers will likely violate the constraint that DAP responses must not cross packet boundaries. I need to find out if tinyUSB has some method of enforcing a transfer boundary in this case.

I already proposed:

Seems me that picoprobe should define an application tusb driver for vendor class instead and avoid concatenating packet data in FIFOs.

I don't know details, but https://github.com/hathach/tinyusb#readme
references
raspberrypi/pico-sdk@ed45e8a
as an example of using usbd_app_driver_get_cb()

@rgrr
Copy link

rgrr commented Dec 3, 2022

I think there are overlaps between this PR and this one: #51

In fact I think the mentioned PR makes this one superfluous (sorry).

@tom-van
Copy link

tom-van commented Dec 5, 2022

Unfortunately picoprobe is not 100% reliable: a packet is lost approx in 1 to 2 of 10 memory dumps

I tried a tud_vendor_flush() after
https://github.com/raspberrypi/picoprobe/blob/pipeline-packets/src/main.c#L83

With the flush the fw seems to work reliably up to DAP_PACKET_COUNT 3.
tud_vendor_flush() is probably not blocking so the FIFO can concatenate response packets with higher packet count.

Measured (target is not a RP2040 this time):

> load_image /run/user/1000/ram128k.bin 0x1fff0000
downloaded 131072 bytes in 0.405028s (316.028 KiB/s) 

> dump_image /dev/null 0x1fff0000 0x40000
dumped 262144 bytes in 0.939887s (272.373 KiB/s)

@P33M Any progress on your side?

@rgrr
Copy link

rgrr commented Dec 5, 2022

@tom-van : could you please tell me, what those load_image and dump_image commands are? Could be of help for my tests.

@tom-van
Copy link

tom-van commented Dec 5, 2022

@rgrr
Copy link

rgrr commented Dec 5, 2022

https://openocd.org/doc/html/General-Commands.html#index-image-loading

Thanks for that.

I find your numbers above very astonishing! When I'm doing this load/dump_image I'm getting ~25KByte/s. With my CMSIS-DAPv2 "patch" I'm getting ~100KByte/s.

I'm wondering what part of my setup is broken.

@rgrr rgrr mentioned this pull request Dec 5, 2022
@P33M
Copy link
Contributor Author

P33M commented Dec 5, 2022

Unfortunately picoprobe is not 100% reliable: a packet is lost approx in 1 to 2 of 10 memory dumps

I tried a tud_vendor_flush() after https://github.com/raspberrypi/picoprobe/blob/pipeline-packets/src/main.c#L83

With the flush the fw seems to work reliably up to DAP_PACKET_COUNT 3. tud_vendor_flush() is probably not blocking so the FIFO can concatenate response packets with higher packet count.

Measured (target is not a RP2040 this time):

> load_image /run/user/1000/ram128k.bin 0x1fff0000
downloaded 131072 bytes in 0.405028s (316.028 KiB/s) 

> dump_image /dev/null 0x1fff0000 0x40000
dumped 262144 bytes in 0.939887s (272.373 KiB/s)

@P33M Any progress on your side?

Testing this has exposed an actual hardware bug in rp2040, so I've been a bit busy dealing with mitigations for that. You're correct in that responses can get arbitrarily merged by tinyUSB, and that we need a dedicated per-packet callback instead of a fifo.

@P33M
Copy link
Contributor Author

P33M commented Sep 21, 2023

The actual fix went in via 58fa7a1

@P33M P33M closed this Sep 21, 2023
@tom-van
Copy link

tom-van commented Oct 3, 2023

Tested with the current OpenOCD git master and patched OpenOCD which really pipelines CMSIS-DAP USB-bulk requests
(https://review.openocd.org/c/openocd/+/7365/7 and https://review.openocd.org/c/openocd/+/7366/6 applied)

git master:

> load_image ram256k.bin 0x20000000
262144 bytes written at address 0x20000000
downloaded 262144 bytes in 1.103554s (231.978 KiB/s)

> dump_image /dev/null 0x20000000 0x40000
dumped 262144 bytes in 2.389915s (107.117 KiB/s)

pipelining patches applied:

> load_image ram256k.bin 0x20000000
262144 bytes written at address 0x20000000
downloaded 262144 bytes in 0.681045s (375.893 KiB/s)

> dump_image /dev/null 0x20000000 0x40000
dumped 262144 bytes in 0.791802s (323.313 KiB/s)

Although not the fastest USB FS adapter, the speed-up due to pipelining is significant.
No more problems with picoprobe tusb FIFO.
I'll release OpenOCD patches for review/merging process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants