
Problem clearing software breakpoints with other halted hart on smp #893

Open
wxjstz opened this issue Jul 20, 2023 · 25 comments

Comments

@wxjstz
Contributor

wxjstz commented Jul 20, 2023

This patch (39a4f37) introduces a problem. After clearing a software breakpoint, fence.i needs to run to refresh the instruction cache, and that instruction can only be run by the hart on which the breakpoint was set.

@wxjstz
Contributor Author

wxjstz commented Jul 20, 2023

@eosea @timsifive

@TommyMurphyTM1234
Collaborator

TommyMurphyTM1234 commented Jul 20, 2023

But Zifencei is an optional extension, so OpenOCD can't assume that the target supports the fence.i instruction, can it?

@timsifive
Collaborator

Can you elaborate a bit more on what exactly the problem case is?

OpenOCD already tries to execute fence.i whenever a hart is resumed, so I don't see how you can end up in a situation where the "wrong" hart clearing a software breakpoint causes a problem.

@TommyMurphyTM1234, you are right, and OpenOCD has always just executed fence.i anyway. I imagine we'll fix that when someone is motivated or it breaks on hardware.

@TommyMurphyTM1234
Collaborator

TommyMurphyTM1234 commented Jul 20, 2023

@TommyMurphyTM1234, you are right, and OpenOCD has always just executed fence.i anyway. I imagine we'll fix that when someone is motivated or it breaks on hardware.

In practice that's probably OK until (if it ever happens) someone has an implementation that doesn't support Zifencei/fence.i. I may have been mixing this up with the confusion that arises when people build the toolchain and forget to build multilibs for rvXX..._zifencei, ever since Zifencei (and Zicsr) was split out from the base integer ISA. 🙂

@wxjstz
Contributor Author

wxjstz commented Jul 21, 2023

RV32G/RV64G are general-purpose platforms, and G includes IMAFD_Zicsr_Zifencei, so general-purpose platforms support the Zifencei extension. We may need to test whether the target platform supports Zifencei, or add an option for it. But the patch that clears software breakpoints through an available core doesn't just affect RISC-V; I believe other platforms have cache-synchronization issues as well.

@wxjstz
Contributor Author

wxjstz commented Jul 21, 2023

Although Zifencei is an optional extension, it is commonly implemented; only some simple embedded platforms have no need for it.

@TommyMurphyTM1234
Collaborator

TommyMurphyTM1234 commented Jul 21, 2023

Apologies for going off topic on the Zifencei/fence.i issue.
I was not aware earlier that OpenOCD already uses (depends on?) this extension/instruction.
The wider issue is what extensions/instructions OpenOCD depends on the target supporting.
(In my opinion this should ideally be the base I/E integer ISAs with graceful degradation in service where other "desirable" extensions are not available).
And this issue may have arisen already in other contexts - including the wider context of the ongoing RISC-V debug specification.
Either way, this is a separate issue from the one in hand so please ignore it here from now on. :-)

Edit: I see that the issue was previously mentioned here:

And there's a potentially related but wider in scope discussion here:

@wxjstz
Contributor Author

wxjstz commented Jul 24, 2023

What I mean is: this patch introduces bugs and needs to be removed. Whether the Zifencei extension is supported is another question.

@TommyMurphyTM1234
Collaborator

What I mean is: this patch introduces bugs and needs to be removed. Whether the Zifencei extension is supported is another question.

But you haven't addressed @timsifive's question here:

@wxjstz
Contributor Author

wxjstz commented Jul 28, 2023

Can you elaborate a bit more on what exactly the problem case is?

OpenOCD already tries to execute fence.i whenever a hart is resumed, so I don't see how you can end up in a situation where the "wrong" hart clearing a software breakpoint causes a problem.

Assume that the software breakpoint of hart0 is cleared through hart1 while hart0 is running: hart1 modifies the memory, but the modification may not be synchronized to hart0's instruction cache. hart0 itself needs to run fence.i.

@TommyMurphyTM1234, you are right, and OpenOCD has always just executed fence.i anyway. I imagine we'll fix that when someone is motivated or it breaks on hardware.

@timsifive
Collaborator

Assume that the software breakpoint of hart0 is cleared through hart1 while hart0 is running: hart1 modifies the memory, but the modification may not be synchronized to hart0's instruction cache. hart0 itself needs to run fence.i.

That will only happen in unusual circumstances. We're looking for harts within an SMP group, and OpenOCD tries to keep all harts in a single SMP group in the same state. If it's not successful, then at best gdb is going to get confused. Having one hart in an SMP group running while another is halted is not, I think, a supported use case. (It could happen because the debugger doesn't have complete control over a target, but if things change spontaneously on the target there are all kinds of possible problems.)

@wxjstz
Contributor Author

wxjstz commented Jul 29, 2023

Assume that the software breakpoint of hart0 is cleared through hart1 while hart0 is running: hart1 modifies the memory, but the modification may not be synchronized to hart0's instruction cache. hart0 itself needs to run fence.i.

That will only happen in unusual circumstances. We're looking for harts within an SMP group, and OpenOCD tries to keep all harts in a single SMP group in the same state. If it's not successful, then at best gdb is going to get confused. Having one hart in an SMP group running while another is halted is not, I think, a supported use case. (It could happen because the debugger doesn't have complete control over a target, but if things change spontaneously on the target there are all kinds of possible problems.)

39a4f37#diff-926a25c5eb17bac57cb7ba08fb85a02d9ffe1df033fd3336368b906ea3672805R373-R379
This modification attempts to use a halted hart to clear a running hart's software breakpoint.

@timsifive
Collaborator

Yes, and the only way I'm aware of that code actually being executed is if the running hart became running/unavailable without the debugger requesting it. What do you think should happen in such a case?

@wxjstz
Contributor Author

wxjstz commented Aug 1, 2023

+--------------+      +----------------+
| running hart |      | available hart |
+--------------+      +----------------+
        ↑fetch                ↓write (clean software breakpoint)
   +---------+             +---------+
   | I-Cache |             | D-Cache |
   +---------+             +---------+
        ↕                      ↕
        .                       .
        .                       .
        .                       .
+-----------------------------------------+
|               memory                    |
+-----------------------------------------+

The FENCE.I instruction provides explicit synchronization between writes to instruction memory and instruction fetches on the same hart.

Therefore, using an available hart instead of the running hart to clear software breakpoints has a cache-synchronization problem: the available hart may only write the instruction into its own D-Cache, while the running hart's I-Cache is not updated.

@timsifive
Collaborator

  1. That situation doesn't occur during normal operation, because OpenOCD will ensure that harts in the same SMP group are all halted or all running.
  2. If it does happen, what should OpenOCD do?

@wxjstz
Contributor Author

wxjstz commented Aug 1, 2023

The I-Cache, unlike the D-Cache, does not require frequent synchronization with the next level of memory; it only loads from memory when a fetch misses the cache.

When this happens there are two possible results:

  1. if the address where the software breakpoint is located is cached by the I-Cache, then clearing the software breakpoint will fail (the stale ebreak is still fetched);
  2. if the address where the software breakpoint is located is not cached by the I-Cache, then clearing the software breakpoint can succeed.

@aap-sc
Collaborator

aap-sc commented Aug 1, 2023

Could it be the case that such HW configurations should not use an SMP configuration at all? I believe such configurations have difficulties even with setting SW breakpoints. The problem is the same: we set the SW breakpoint from only one hart.

@timsifive
Collaborator

@aap-sc, when setting SW breakpoints, generally all harts are halted. OpenOCD writes the breakpoint through one hart and then has all of them execute some fence instructions, which should let the caches synchronize.

@aap-sc
Collaborator

aap-sc commented Aug 1, 2023

execute some fence instructions which should let the caches synchronize.

But does execution of a fence instruction guarantee that the caches are always synchronized? If I understand correctly, such synchronization is only guaranteed if some load from, say, hart1 actually observes a store from hart0, which may never happen (meaning hart1 could read a previous value even if hart0 issued a fence). Or is my understanding incorrect?

@wxjstz
Contributor Author

wxjstz commented Aug 2, 2023

@aap-sc
Collaborator

aap-sc commented Aug 18, 2023

Each hart must perform write and fence.i

  1. So if each hart must perform a write and a fence, then setting SW breakpoints may not work in the current implementation, since AFAIK we set the SW breakpoint only once, from the active core.

  2. Moreover, this implies that functions like load_image would need to write the same program for each hart separately, to be sure that memory is synchronized across cores. @timsifive, do you know if there is any chance the debug spec allows some relaxation regarding this?

@TommyMurphyTM1234
Collaborator

@timsifive
Collaborator

RISC-V and caches are still a bit behind feature completeness. This is in part because for a long time the official position was that caches are an implementation detail not visible to the ISA. There are more extensions focused on cache behavior these days. In any case, the problem of downloading a program through a debugger and executing a "process" on a different hart is identical, so the debugger should be able to do whatever software does, because it can execute all the instructions that software can. This is a long way of saying I don't think there will be any changes to the debug spec to handle caching using the program buffer.

@aap-sc
Collaborator

aap-sc commented Aug 31, 2023

@TommyMurphyTM1234

Is this article of any relevance in this context?

Thanks for the article. However, it focuses mostly on cache-maintenance operations issued by the same core. I guess the current implementation in OpenOCD is fine if the hart supports fence.i. My understanding is that the issue discussed here is mostly about maintaining some kind of I-cache coherence when we have several cores.

This is a long way of saying I don't think there will be any changes to the debug spec to handle caching using the program buffer.

Got it, thanks.

Sorry for the ratholing.

However, as per @wxjstz comment:

Each hart must perform write and fence.i

This means that the current implementation is not quite correct. It looks like when a SW breakpoint is set or cleared, the memory should be updated by all harts. This is not what OpenOCD does currently. Should this be fixed?

@aap-sc
Collaborator

aap-sc commented Aug 31, 2023

This means that the current implementation is not quite correct. It looks like when a SW breakpoint is set or cleared, the memory should be updated by all harts. This is not what OpenOCD does currently. Should this be fixed?

Never mind, my bad; it looks like I'm wrong. Sorry for the potential confusion. If we set a breakpoint, the operation is performed only by the active hart, and that is by design. So the potential issue exists only for the case where we clear the breakpoint while the original hart is unavailable.
