Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lldb] Add support for large watchpoints in lldb #79962

Conversation

jasonmolenda
Copy link
Collaborator

This patch is the next piece of work in my Large Watchpoint proposal, https://discourse.llvm.org/t/rfc-large-watchpoint-support-in-lldb/72116

This patch breaks a user's watchpoint into one or more WatchpointResources which reflect what the hardware registers can cover. This means we can watch objects larger than 8 bytes, and we can watched unaligned address ranges. On a typical 64-bit target with 4 watchpoint registers you can watch 32 bytes of memory if the start address is doubleword aligned.

Additionally, if the remote stub implements AArch64 MASK style watchpoints (e.g. debugserver on Darwin), we can watch any power-of-2 size region of memory up to 2GB, aligned to that same size.

I updated the Watchpoint constructor and CommandObjectWatchpoint to create a CompilerType of Array when the size of the watched region is greater than pointer-size and we don't have a variable type to use. For pointer-size and smaller, we can display the watched granule as an integer value; for larger-than-pointer-size we will display as an array of bytes.

I have watchpoint list now print the WatchpointResources used to implement the watchpoint.

I added a WatchpointAlgorithm class which has a top-level static method that takes an enum flag mask WatchpointHardwareFeature and a user address and size, and returns a vector of WatchpointResources covering the request. It does not take into account the number of watchpoint registers the target has, or the number still available for use. Right now there is only one algorithm, which monitors power-of-2 regions of memory. For up to pointer-size, this is what Intel hardware supports. AArch64 Byte Address Select watchpoints can watch any number of contiguous bytes in a pointer-size memory granule, that is not currently supported so if you ask to watch bytes 3-5, the algorithm will watch the entire doubleword (8 bytes). The newly default "modify" style means we will silently ignore modifications to bytes outside the watched range.

I've temporarily skipped TestLargeWatchpoint.py for all targets. It was only run on Darwin when using the in-tree debugserver, which was a proxy for "debugserver supports MASK watchpoints". I'll be adding the aforementioned feature flag from the stub and enabling full mask watchpoints when a debugserver with that feature is enabled, and re-enable this test.

I added a new TestUnalignedLargeWatchpoint.py which only has one test but it's a great one, watching a 22-byte range that is unaligned and requires four 8-byte watchpoints to cover.

I also added a unit test, WatchpointAlgorithmsTests, which has a number of simple tests against WatchpointAlgorithms::PowerOf2Watchpoints. I think there's interesting possible different approaches to how we cover these; I note in the unit test that a user requesting a watch on address 0x12e0 of 120 bytes will be covered by two watchpoints today, a 128-bytes at 0x1280 and at 0x1300. But it could be done with a 16-byte watchpoint at 0x12e0 and a 128-byte at 0x1300, which would have fewer false positives/private stops. As we try refining this one, it's helpful to have a collection of tests to make sure things don't regress.

I tested this on arm64 macOS, (genuine) x86_64 macOS, and AArch64 Ubuntu. I have not modifed the Windows process plugins yet, I might try that as a standalone patch, I'd be making the change blind, but the necessary changes (see ProcessGDBRemote::EnableWatchpoint) are pretty small so it might be obvious enough that I can change it and see what the Windows CI thinks.

There isn't yet a packet (or a qSupported feature query) for the gdb remote serial protocol stub to communicate its watchpoint capabilities to lldb. I'll be doing that in a patch right after this is landed, having debugserver advertise its capability of AArch64 MASK watchpoints, and have ProcessGDBRemote add eWatchpointHardwareArmMASK to WatchpointAlgorithms so we can watch larger than 32-byte requests on Darwin.

I haven't yet tackled WatchpointResource sharing by multiple Watchpoints. This is all part of the goal, especially when we may be watching a larger memory range than the user requested, if they then add another watchpoint next to their first request, it may be covered by the same WatchpointResource (hardware watchpoint register). Also one "read" watchpoint and one "write" watchpoint on the same memory granule need to be handled, making the WatchpointResource cover all requests.

As WatchpointResources aren't shared among multiple Watchpoints yet, there's no handling of running the conditions/commands/etc on multiple Watchpoints when their shared WatchpointResource is hit. The goal beyond "large watchpoint" is to unify (much more) the Watchpoint and Breakpoint behavior and commands. I have a feeling I may be slowly chipping away at this for a while.

rdar://108234227

This patch is the next piece of work in my Large Watchpoint proposal,
https://discourse.llvm.org/t/rfc-large-watchpoint-support-in-lldb/72116

This patch breaks a user's watchpoint into one or more WatchpointResources
which reflect what the hardware registers can cover. This means we
can watch objects larger than 8 bytes, and we can watched unaligned
address ranges.  On a typical 64-bit target with 4 watchpoint
registers you can watch 32 bytes of memory if the start address is
doubleword aligned.

Additionally, if the remote stub implements AArch64 MASK style
watchpoints (e.g. debugserver on Darwin), we can watch any power-of-2
size region of memory up to 2GB, aligned to that same size.

I updated the Watchpoint constructor and CommandObjectWatchpoint
to create a CompilerType of Array<UInt8> when the size of the watched
region is greater than pointer-size and we don't have a variable
type to use.  For pointer-size and smaller, we can display the
watched granule as an integer value; for larger-than-pointer-size
we will display as an array of bytes.

I have `watchpoint list` now print the WatchpointResources used to
implement the watchpoint.

I added a WatchpointAlgorithm class which has a top-level static
method that takes an enum flag mask WatchpointHardwareFeature and
a user address and size, and returns a vector of WatchpointResources
covering the request.  It does not take into account the number of
watchpoint registers the target has, or the number still available
for use.  Right now there is only one algorithm, which monitors
power-of-2 regions of memory.  For up to pointer-size, this is what
Intel hardware supports. AArch64 Byte Address Select watchpoints
can watch any number of contiguous bytes in a pointer-size memory
granule, that is not currently supported so if you ask to watch
bytes 3-5, the algorithm will watch the entire doubleword (8 bytes).
The newly default "modify" style means we will silently ignore
modifications to bytes outside the watched range.

I've temporarily skipped TestLargeWatchpoint.py for all targets.
It was only run on Darwin when using the in-tree debugserver, which
was a proxy for "debugserver supports MASK watchpoints".  I'll be
adding the aforementioned feature flag from the stub and enabling
full mask watchpoints when a debugserver with that feature is enabled,
and re-enable this test.

I added a new TestUnalignedLargeWatchpoint.py which only has one
test but it's a great one, watching a 22-byte range that is unaligned
and requires four 8-byte watchpoints to cover.

I also added a unit test, WatchpointAlgorithmsTests, which has a
number of simple tests against WatchpointAlgorithms::PowerOf2Watchpoints.
I think there's interesting possible different approaches to how
we cover these; I note in the unit test that a user requesting a
watch on address 0x12e0 of 120 bytes will be covered by two watchpoints
today, a 128-bytes at 0x1280 and at 0x1300.  But it could be done
with a 16-byte watchpoint at 0x12e0 and a 128-byte at 0x1300, which
would have fewer false positives/private stops.  As we try refining
this one, it's helpful to have a collection of tests to make sure
things don't regress.

I tested this on arm64 macOS, x86_64 macOS, and AArch64 Ubuntu.  I
have not modifed the Windows process plugins yet, I might try that
as a standalone patch, I'd be making the change blind, but the
necessary changes (see ProcessGDBRemote::EnableWatchpoint) are
pretty small so it might be obvious enough that I can change it and
see what the Windows CI thinks.

There isn't yet a packet (or a qSupported feature query) for the
gdb remote serial protocol stub to communicate its watchpoint
capabilities to lldb.  I'll be doing that in a patch right after
this is landed, having debugserver advertise its capability of
AArch64 MASK watchpoints, and have ProcessGDBRemote add
eWatchpointHardwareArmMASK to WatchpointAlgorithms so we can watch
larger than 32-byte requests on Darwin.

I haven't yet tackled WatchpointResource *sharing* by multiple
Watchpoints.  This is all part of the goal, especially when we may
be watching a larger memory range than the user requested, if they
then add another watchpoint next to their first request, it may be
covered by the same WatchpointResource (hardware watchpoint register).
Also one "read" watchpoint and one "write" watchpoint on the same
memory granule need to be handled, making the WatchpointResource
cover all requests.

As WatchpointResources aren't shared among multiple Watchpoints
yet, there's no handling of running the conditions/commands/etc on
multiple Watchpoints when their shared WatchpointResource is hit.
The goal beyond "large watchpoint" is to unify (much more) the
Watchpoint and Breakpoint behavior and commands.  I have a feeling
I may be slowly chipping away at this for a while.

rdar://108234227
@llvmbot llvmbot added the lldb label Jan 30, 2024
@jasonmolenda jasonmolenda requested review from DavidSpickett and removed request for JDevlieghere January 30, 2024 08:19
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 30, 2024

@llvm/pr-subscribers-lldb

Author: Jason Molenda (jasonmolenda)

Changes

This patch is the next piece of work in my Large Watchpoint proposal, https://discourse.llvm.org/t/rfc-large-watchpoint-support-in-lldb/72116

This patch breaks a user's watchpoint into one or more WatchpointResources which reflect what the hardware registers can cover. This means we can watch objects larger than 8 bytes, and we can watched unaligned address ranges. On a typical 64-bit target with 4 watchpoint registers you can watch 32 bytes of memory if the start address is doubleword aligned.

Additionally, if the remote stub implements AArch64 MASK style watchpoints (e.g. debugserver on Darwin), we can watch any power-of-2 size region of memory up to 2GB, aligned to that same size.

I updated the Watchpoint constructor and CommandObjectWatchpoint to create a CompilerType of Array<UInt8> when the size of the watched region is greater than pointer-size and we don't have a variable type to use. For pointer-size and smaller, we can display the watched granule as an integer value; for larger-than-pointer-size we will display as an array of bytes.

I have watchpoint list now print the WatchpointResources used to implement the watchpoint.

I added a WatchpointAlgorithm class which has a top-level static method that takes an enum flag mask WatchpointHardwareFeature and a user address and size, and returns a vector of WatchpointResources covering the request. It does not take into account the number of watchpoint registers the target has, or the number still available for use. Right now there is only one algorithm, which monitors power-of-2 regions of memory. For up to pointer-size, this is what Intel hardware supports. AArch64 Byte Address Select watchpoints can watch any number of contiguous bytes in a pointer-size memory granule, that is not currently supported so if you ask to watch bytes 3-5, the algorithm will watch the entire doubleword (8 bytes). The newly default "modify" style means we will silently ignore modifications to bytes outside the watched range.

I've temporarily skipped TestLargeWatchpoint.py for all targets. It was only run on Darwin when using the in-tree debugserver, which was a proxy for "debugserver supports MASK watchpoints". I'll be adding the aforementioned feature flag from the stub and enabling full mask watchpoints when a debugserver with that feature is enabled, and re-enable this test.

I added a new TestUnalignedLargeWatchpoint.py which only has one test but it's a great one, watching a 22-byte range that is unaligned and requires four 8-byte watchpoints to cover.

I also added a unit test, WatchpointAlgorithmsTests, which has a number of simple tests against WatchpointAlgorithms::PowerOf2Watchpoints. I think there's interesting possible different approaches to how we cover these; I note in the unit test that a user requesting a watch on address 0x12e0 of 120 bytes will be covered by two watchpoints today, a 128-bytes at 0x1280 and at 0x1300. But it could be done with a 16-byte watchpoint at 0x12e0 and a 128-byte at 0x1300, which would have fewer false positives/private stops. As we try refining this one, it's helpful to have a collection of tests to make sure things don't regress.

I tested this on arm64 macOS, (genuine) x86_64 macOS, and AArch64 Ubuntu. I have not modifed the Windows process plugins yet, I might try that as a standalone patch, I'd be making the change blind, but the necessary changes (see ProcessGDBRemote::EnableWatchpoint) are pretty small so it might be obvious enough that I can change it and see what the Windows CI thinks.

There isn't yet a packet (or a qSupported feature query) for the gdb remote serial protocol stub to communicate its watchpoint capabilities to lldb. I'll be doing that in a patch right after this is landed, having debugserver advertise its capability of AArch64 MASK watchpoints, and have ProcessGDBRemote add eWatchpointHardwareArmMASK to WatchpointAlgorithms so we can watch larger than 32-byte requests on Darwin.

I haven't yet tackled WatchpointResource sharing by multiple Watchpoints. This is all part of the goal, especially when we may be watching a larger memory range than the user requested, if they then add another watchpoint next to their first request, it may be covered by the same WatchpointResource (hardware watchpoint register). Also one "read" watchpoint and one "write" watchpoint on the same memory granule need to be handled, making the WatchpointResource cover all requests.

As WatchpointResources aren't shared among multiple Watchpoints yet, there's no handling of running the conditions/commands/etc on multiple Watchpoints when their shared WatchpointResource is hit. The goal beyond "large watchpoint" is to unify (much more) the Watchpoint and Breakpoint behavior and commands. I have a feeling I may be slowly chipping away at this for a while.

rdar://108234227


Patch is 28.63 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/79962.diff

17 Files Affected:

  • (added) lldb/include/lldb/Breakpoint/WatchpointAlgorithms.h (+38)
  • (modified) lldb/include/lldb/lldb-enumerations.h (+26)
  • (modified) lldb/packages/Python/lldbsuite/test/concurrent_base.py (+6-1)
  • (modified) lldb/source/Breakpoint/CMakeLists.txt (+1)
  • (modified) lldb/source/Breakpoint/Watchpoint.cpp (+24-4)
  • (added) lldb/source/Breakpoint/WatchpointAlgorithms.cpp (+141)
  • (modified) lldb/source/Breakpoint/WatchpointResource.cpp (+3-1)
  • (modified) lldb/source/Commands/CommandObjectWatchpoint.cpp (+14-1)
  • (modified) lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp (+6-7)
  • (modified) lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp (+17-17)
  • (modified) lldb/source/Target/StopInfo.cpp (+3-1)
  • (modified) lldb/test/API/functionalities/watchpoint/large-watchpoint/TestLargeWatchpoint.py (+5)
  • (added) lldb/test/API/functionalities/watchpoint/unaligned-large-watchpoint/Makefile (+3)
  • (added) lldb/test/API/functionalities/watchpoint/unaligned-large-watchpoint/TestUnalignedLargeWatchpoint.py (+68)
  • (added) lldb/test/API/functionalities/watchpoint/unaligned-large-watchpoint/main.c (+16)
  • (modified) lldb/unittests/Breakpoint/CMakeLists.txt (+1)
  • (added) lldb/unittests/Breakpoint/WatchpointAlgorithmsTests.cpp (+162)
diff --git a/lldb/include/lldb/Breakpoint/WatchpointAlgorithms.h b/lldb/include/lldb/Breakpoint/WatchpointAlgorithms.h
new file mode 100644
index 0000000000000..8bfffba477930
--- /dev/null
+++ b/lldb/include/lldb/Breakpoint/WatchpointAlgorithms.h
@@ -0,0 +1,38 @@
+//===-- WatchpointAlgorithms.h ------------------------------------*- C++
+//-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLDB_BREAKPOINT_WATCHPOINTALGORITHMS_H
+#define LLDB_BREAKPOINT_WATCHPOINTALGORITHMS_H
+
+#include "lldb/Breakpoint/WatchpointResource.h"
+#include "lldb/Utility/ArchSpec.h"
+#include "lldb/lldb-public.h"
+
+#include <vector>
+
+namespace lldb_private {
+
+class WatchpointAlgorithms {
+
+public:
+  static std::vector<lldb::WatchpointResourceSP> AtomizeWatchpointRequest(
+      lldb::addr_t addr, size_t size, bool read, bool write,
+      lldb::WatchpointHardwareFeature supported_features, ArchSpec &arch);
+
+  // Should be protected, but giving access to the algorithms in the unit
+  // tests is not easy, so it's public.
+  static std::vector<std::pair<lldb::addr_t, size_t>>
+  PowerOf2Watchpoints(lldb::addr_t user_addr, size_t user_size,
+                      size_t min_byte_size, size_t max_byte_size,
+                      uint32_t address_byte_size);
+};
+
+} // namespace lldb_private
+
+#endif // LLDB_BREAKPOINT_WATCHPOINTALGORITHMS_H
diff --git a/lldb/include/lldb/lldb-enumerations.h b/lldb/include/lldb/lldb-enumerations.h
index 392d333c23a44..7a106ca0fec20 100644
--- a/lldb/include/lldb/lldb-enumerations.h
+++ b/lldb/include/lldb/lldb-enumerations.h
@@ -448,6 +448,32 @@ enum WatchpointWriteType {
   eWatchpointWriteTypeOnModify
 };
 
+/// The hardware and native stub capabilities for a given target,
+/// for translating a user's watchpoint request into hardware
+/// capable watchpoint resources.
+FLAGS_ENUM(WatchpointHardwareFeature){
+    /// lldb will fall back to a default that assumes the target
+    /// can watch up to pointer-size power-of-2 regions, aligned to
+    /// power-of-2.
+    eWatchpointHardwareFeatureUnknown = (1u << 0),
+
+    /// Intel systems can watch 1, 2, 4, or 8 bytes (in 64-bit targets),
+    /// aligned naturally.
+    eWatchpointHardwareX86 = (1u << 1),
+    ///
+    /// ARM systems with Byte Address Select watchpoints
+    /// can watch any consecutive series of bytes up to the
+    /// size of a pointer (4 or 8 bytes), at a pointer-size
+    /// alignment.
+    eWatchpointHardwareArmBAS = (1u << 2),
+
+    /// ARM systems with MASK watchpoints can watch any power-of-2
+    /// sized region from 8 bytes to 2 gigabytes, aligned to that
+    /// same power-of-2 alignment.
+    eWatchpointHardwareArmMASK = (1u << 3),
+};
+LLDB_MARK_AS_BITMASK_ENUM(WatchpointHardwareFeature)
+
 /// Programming language type.
 ///
 /// These enumerations use the same language enumerations as the DWARF
diff --git a/lldb/packages/Python/lldbsuite/test/concurrent_base.py b/lldb/packages/Python/lldbsuite/test/concurrent_base.py
index 72e04bbb20a04..39eb27fd99747 100644
--- a/lldb/packages/Python/lldbsuite/test/concurrent_base.py
+++ b/lldb/packages/Python/lldbsuite/test/concurrent_base.py
@@ -166,7 +166,12 @@ def do_thread_actions(
 
         # Initialize the (single) watchpoint on the global variable (g_watchme)
         if num_watchpoint_threads + num_delay_watchpoint_threads > 0:
-            self.runCmd("watchpoint set variable g_watchme")
+            # The concurrent tests have multiple threads modifying a variable
+            # with the same value.  The default "modify" style watchpoint will
+            # only report this as 1 hit for all threads, because they all wrote
+            # the same value.  The testsuite needs "write" style watchpoints to
+            # get the correct number of hits reported.
+            self.runCmd("watchpoint set variable -w write g_watchme")
             for w in self.inferior_target.watchpoint_iter():
                 self.thread_watchpoint = w
                 self.assertTrue(
diff --git a/lldb/source/Breakpoint/CMakeLists.txt b/lldb/source/Breakpoint/CMakeLists.txt
index 3b39189e52587..2fa659f803c28 100644
--- a/lldb/source/Breakpoint/CMakeLists.txt
+++ b/lldb/source/Breakpoint/CMakeLists.txt
@@ -21,6 +21,7 @@ add_lldb_library(lldbBreakpoint NO_PLUGIN_DEPENDENCIES
   StoppointSite.cpp
   StopPointSiteList.cpp
   Watchpoint.cpp
+  WatchpointAlgorithms.cpp
   WatchpointList.cpp
   WatchpointOptions.cpp
   WatchpointResource.cpp
diff --git a/lldb/source/Breakpoint/Watchpoint.cpp b/lldb/source/Breakpoint/Watchpoint.cpp
index 1a8ef87c1e67a..7728fd09a07ae 100644
--- a/lldb/source/Breakpoint/Watchpoint.cpp
+++ b/lldb/source/Breakpoint/Watchpoint.cpp
@@ -45,10 +45,16 @@ Watchpoint::Watchpoint(Target &target, lldb::addr_t addr, uint32_t size,
       LLDB_LOG_ERROR(GetLog(LLDBLog::Watchpoints), std::move(err),
                      "Failed to set type: {0}");
     } else {
-      if (auto ts = *type_system_or_err)
-        m_type =
-            ts->GetBuiltinTypeForEncodingAndBitSize(eEncodingUint, 8 * size);
-      else
+      if (auto ts = *type_system_or_err) {
+        if (size <= target.GetArchitecture().GetAddressByteSize()) {
+          m_type =
+              ts->GetBuiltinTypeForEncodingAndBitSize(eEncodingUint, 8 * size);
+        } else {
+          CompilerType clang_uint8_type =
+              ts->GetBuiltinTypeForEncodingAndBitSize(eEncodingUint, 8);
+          m_type = clang_uint8_type.GetArrayType(size);
+        }
+      } else
         LLDB_LOG_ERROR(GetLog(LLDBLog::Watchpoints), std::move(err),
                        "Failed to set type: Typesystem is no longer live: {0}");
     }
@@ -352,6 +358,20 @@ void Watchpoint::DumpWithLevel(Stream *s,
       s->Printf("\n    declare @ '%s'", m_decl_str.c_str());
     if (!m_watch_spec_str.empty())
       s->Printf("\n    watchpoint spec = '%s'", m_watch_spec_str.c_str());
+    if (IsEnabled()) {
+      if (ProcessSP process_sp = m_target.GetProcessSP()) {
+        auto &resourcelist = process_sp->GetWatchpointResourceList();
+        size_t idx = 0;
+        s->Printf("\n    watchpoint resources:");
+        for (WatchpointResourceSP &wpres : resourcelist.Sites()) {
+          if (wpres->ConstituentsContains(this)) {
+            s->Printf("\n       #%zu: ", idx);
+            wpres->Dump(s);
+          }
+          idx++;
+        }
+      }
+    }
 
     // Dump the snapshots we have taken.
     DumpSnapshots(s, "    ");
diff --git a/lldb/source/Breakpoint/WatchpointAlgorithms.cpp b/lldb/source/Breakpoint/WatchpointAlgorithms.cpp
new file mode 100644
index 0000000000000..c1fc0f75ab288
--- /dev/null
+++ b/lldb/source/Breakpoint/WatchpointAlgorithms.cpp
@@ -0,0 +1,141 @@
+//===-- WatchpointAlgorithms.cpp ------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "lldb/Breakpoint/WatchpointAlgorithms.h"
+#include "lldb/Breakpoint/WatchpointResource.h"
+#include "lldb/Target/Process.h"
+#include "lldb/Utility/ArchSpec.h"
+
+#include <utility>
+#include <vector>
+
+using namespace lldb;
+using namespace lldb_private;
+
+std::vector<WatchpointResourceSP>
+WatchpointAlgorithms::AtomizeWatchpointRequest(
+    addr_t addr, size_t size, bool read, bool write,
+    WatchpointHardwareFeature supported_features, ArchSpec &arch) {
+
+  std::vector<std::pair<addr_t, size_t>> entries;
+
+  if (supported_features &
+      WatchpointHardwareFeature::eWatchpointHardwareArmMASK) {
+    entries =
+        PowerOf2Watchpoints(addr, size,
+                            /* min_byte_size */ 1,
+                            /* max_byte_size */ 2147483648,
+                            /* address_byte_size */ arch.GetAddressByteSize());
+  } else {
+    // As a fallback, assume we can watch any power-of-2
+    // number of bytes up through the size of an address in the target.
+    entries =
+        PowerOf2Watchpoints(addr, size,
+                            /* min_byte_size */ 1,
+                            /* max_byte_size */ arch.GetAddressByteSize(),
+                            /* address_byte_size */ arch.GetAddressByteSize());
+  }
+
+  std::vector<WatchpointResourceSP> resources;
+  for (std::pair<addr_t, size_t> &ent : entries) {
+    addr_t addr = std::get<0>(ent);
+    size_t size = std::get<1>(ent);
+    WatchpointResourceSP wp_res_sp =
+        std::make_shared<WatchpointResource>(addr, size, read, write);
+    resources.push_back(wp_res_sp);
+  }
+
+  return resources;
+}
+
+/// Convert a user's watchpoint request (\a user_addr and \a user_size)
+/// into hardware watchpoints, for a target that can watch a power-of-2
+/// region of memory (1, 2, 4, 8, etc), aligned to that same power-of-2
+/// memory address.
+///
+/// If a user asks to watch 4 bytes at address 0x1002 (0x1002-0x1005
+/// inclusive) we can implement this with two 2-byte watchpoints
+/// (0x1002 and 0x1004) or with an 8-byte watchpoint at 0x1000.
+/// A 4-byte watchpoint at 0x1002 would not be properly 4 byte aligned.
+///
+/// If a user asks to watch 16 bytes at 0x1000, and this target supports
+/// 8-byte watchpoints, we can implement this with two 8-byte watchpoints
+/// at 0x1000 and 0x1008.
+
+std::vector<std::pair<addr_t, size_t>>
+WatchpointAlgorithms::PowerOf2Watchpoints(addr_t user_addr, size_t user_size,
+                                          size_t min_byte_size,
+                                          size_t max_byte_size,
+                                          uint32_t address_byte_size) {
+
+  // Can't watch zero bytes.
+  if (user_size == 0)
+    return {};
+
+  // The aligned watch region will be less than/equal to the size of
+  // an address in this target.
+  const int address_bit_size = address_byte_size * 8;
+
+  size_t aligned_size = std::max(user_size, min_byte_size);
+  /// Round up \a user_size to the next power-of-2 size
+  /// user_size == 8   -> aligned_size == 8
+  /// user_size == 9   -> aligned_size == 16
+  /// user_size == 15  -> aligned_size == 16
+  /// user_size == 192 -> aligned_size == 256
+  /// Could be `std::bit_ceil(aligned_size)` when we build with C++20?
+
+  aligned_size = 1ULL << (address_bit_size - __builtin_clzll(aligned_size - 1));
+
+  addr_t aligned_start = user_addr & ~(aligned_size - 1);
+
+  // Does this power-of-2 memory range, aligned to power-of-2 that the
+  // hardware can watch, completely cover the requested region.
+  if (aligned_size <= max_byte_size &&
+      aligned_start + aligned_size >= user_addr + user_size)
+    return {{aligned_start, aligned_size}};
+
+  // If the maximum region we can watch is larger than the aligned
+  // size, try increasing the region size by one power of 2 and see
+  // if aligning to that amount can cover the requested region.
+  //
+  // Increasing the aligned_size repeatedly instead of splitting the
+  // watchpoint can result in us watching large regions of memory
+  // unintentionally when we could use small two watchpoints.  e.g.
+  //    user_addr 0x
+  if (max_byte_size >= (aligned_size << 1)) {
+    aligned_size <<= 1;
+    aligned_start = user_addr & ~(aligned_size - 1);
+    if (aligned_size <= max_byte_size &&
+        aligned_start + aligned_size >= user_addr + user_size)
+      return {{aligned_start, aligned_size}};
+
+    // Go back to our original aligned size, to try the multiple
+    // watchpoint approach.
+    aligned_size >>= 1;
+  }
+
+  // We need to split the user's watchpoint into two or more watchpoints
+  // that can be monitored by hardware, because of alignment and/or size
+  // reasons.
+  aligned_size = std::min(aligned_size, max_byte_size);
+  aligned_start = user_addr & ~(aligned_size - 1);
+
+  std::vector<std::pair<addr_t, size_t>> result;
+  addr_t current_address = aligned_start;
+  const addr_t user_end_address = user_addr + user_size;
+  while (current_address + aligned_size < user_end_address) {
+    result.push_back({current_address, aligned_size});
+    current_address += aligned_size;
+  }
+
+  if (current_address < user_end_address) {
+    result.push_back({current_address, aligned_size});
+  }
+
+  return result;
+}
diff --git a/lldb/source/Breakpoint/WatchpointResource.cpp b/lldb/source/Breakpoint/WatchpointResource.cpp
index c1eb50c0358b3..8f15fc7c49583 100644
--- a/lldb/source/Breakpoint/WatchpointResource.cpp
+++ b/lldb/source/Breakpoint/WatchpointResource.cpp
@@ -9,6 +9,7 @@
 #include <assert.h>
 
 #include "lldb/Breakpoint/WatchpointResource.h"
+#include "lldb/Utility/Stream.h"
 
 #include <algorithm>
 
@@ -113,7 +114,8 @@ bool WatchpointResource::ShouldStop(StoppointCallbackContext *context) {
 }
 
 void WatchpointResource::Dump(Stream *s) const {
-  return; // LWP_TODO
+  s->Printf("addr = 0x%8.8" PRIx64 " size = %zu", m_addr, m_size);
+  return;
 }
 
 wp_resource_id_t WatchpointResource::GetNextID() {
diff --git a/lldb/source/Commands/CommandObjectWatchpoint.cpp b/lldb/source/Commands/CommandObjectWatchpoint.cpp
index c80868d33905e..438a16c50bd67 100644
--- a/lldb/source/Commands/CommandObjectWatchpoint.cpp
+++ b/lldb/source/Commands/CommandObjectWatchpoint.cpp
@@ -1139,9 +1139,22 @@ class CommandObjectWatchpointSetExpression : public CommandObjectRaw {
 
     // Fetch the type from the value object, the type of the watched object is
     // the pointee type
-    /// of the expression, so convert to that if we  found a valid type.
+    /// of the expression, so convert to that if we found a valid type.
     CompilerType compiler_type(valobj_sp->GetCompilerType());
 
+    std::optional<uint64_t> valobj_size = valobj_sp->GetByteSize();
+    // Set the type as a uint8_t array if the size being watched is
+    // larger than the ValueObject's size (which is probably the size
+    // of a pointer).
+    if (valobj_size && size > *valobj_size) {
+      auto type_system = compiler_type.GetTypeSystem();
+      if (type_system) {
+        CompilerType clang_uint8_type =
+            type_system->GetBuiltinTypeForEncodingAndBitSize(eEncodingUint, 8);
+        compiler_type = clang_uint8_type.GetArrayType(size);
+      }
+    }
+
     Status error;
     WatchpointSP watch_sp =
         target->CreateWatchpoint(addr, size, &compiler_type, watch_type, error);
diff --git a/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp b/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp
index 565941d3168f1..d756354f9bd27 100644
--- a/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp
+++ b/lldb/source/Plugins/Process/Utility/StopInfoMachException.cpp
@@ -491,14 +491,13 @@ static StopInfoSP GetStopInfoForHardwareBP(Thread &thread, Target *target,
                                            uint64_t exc_sub_sub_code) {
   // Try hardware watchpoint.
   if (target) {
-    // LWP_TODO: We need to find the WatchpointResource that matches
-    // the address, and evaluate its Watchpoints.
-
     // The exc_sub_code indicates the data break address.
-    lldb::WatchpointSP wp_sp =
-        target->GetWatchpointList().FindByAddress((lldb::addr_t)exc_sub_code);
-    if (wp_sp && wp_sp->IsEnabled()) {
-      return StopInfo::CreateStopReasonWithWatchpointID(thread, wp_sp->GetID());
+    WatchpointResourceSP wp_rsrc_sp =
+        target->GetProcessSP()->GetWatchpointResourceList().FindByAddress(
+            (addr_t)exc_sub_code);
+    if (wp_rsrc_sp && wp_rsrc_sp->GetNumberOfConstituents() > 0) {
+      return StopInfo::CreateStopReasonWithWatchpointID(
+          thread, wp_rsrc_sp->GetConstituentAtIndex(0)->GetID());
     }
   }
 
diff --git a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
index 4a06027501a89..2e06d4d22f74e 100644
--- a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
+++ b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
@@ -24,6 +24,7 @@
 #include <sys/types.h>
 
 #include "lldb/Breakpoint/Watchpoint.h"
+#include "lldb/Breakpoint/WatchpointAlgorithms.h"
 #include "lldb/Breakpoint/WatchpointResource.h"
 #include "lldb/Core/Debugger.h"
 #include "lldb/Core/Module.h"
@@ -3153,23 +3154,22 @@ Status ProcessGDBRemote::EnableWatchpoint(WatchpointSP wp_sp, bool notify) {
   bool write = wp_sp->WatchpointWrite() || wp_sp->WatchpointModify();
   size_t size = wp_sp->GetByteSize();
 
-  // New WatchpointResources needed to implement this Watchpoint.
-  std::vector<WatchpointResourceSP> resources;
-
-  // LWP_TODO: Break up the user's request into pieces that can be watched
-  // given the capabilities of the target cpu / stub software.
-  // As a default, breaking the watched region up into target-pointer-sized,
-  // aligned, groups.
-  //
-  // Beyond the default, a stub can / should inform us of its capabilities,
-  // e.g. a stub that can do AArch64 power-of-2 MASK watchpoints.
-  //
-  // And the cpu may have unique capabilities. AArch64 BAS watchpoints
-  // can watch any sequential bytes in a doubleword, but Intel watchpoints
-  // can only watch 1, 2, 4, 8 bytes within a doubleword.
-  WatchpointResourceSP wp_res_sp =
-      std::make_shared<WatchpointResource>(addr, size, read, write);
-  resources.push_back(wp_res_sp);
+  ArchSpec target_arch = GetTarget().GetArchitecture();
+  WatchpointHardwareFeature supported_features =
+      eWatchpointHardwareFeatureUnknown;
+
+  // LWP_TODO: enable MASK watchpoint for arm64 debugserver
+  // when it reports that it supports them.
+  if (target_arch.GetTriple().getOS() == llvm::Triple::MacOSX &&
+      target_arch.GetTriple().getArch() == llvm::Triple::aarch64) {
+#if 0
+       supported_features |= eWatchpointHardwareArmMASK;
+#endif
+  }
+
+  std::vector<WatchpointResourceSP> resources =
+      WatchpointAlgorithms::AtomizeWatchpointRequest(
+          addr, size, read, write, supported_features, target_arch);
 
   // LWP_TODO: Now that we know the WP Resources needed to implement this
   // Watchpoint, we need to look at currently allocated Resources in the
diff --git a/lldb/source/Target/StopInfo.cpp b/lldb/source/Target/StopInfo.cpp
index 3b65d661c1abd..95f78056b1644 100644
--- a/lldb/source/Target/StopInfo.cpp
+++ b/lldb/source/Target/StopInfo.cpp
@@ -987,8 +987,10 @@ class StopInfoWatchpoint : public StopInfo {
 
         // Don't stop if the watched region value is unmodified, and
         // this is a Modify-type watchpoint.
-        if (m_should_stop && !wp_sp->WatchedValueReportable(exe_ctx))
+        if (m_should_stop && !wp_sp->WatchedValueReportable(exe_ctx)) {
+          wp_sp->UndoHitCount();
           m_should_stop = false;
+        }
 
         // Finally, if we are going to stop, print out the new & old values:
         if (m_should_stop) {
diff --git a/lldb/test/API/functionalities/watchpoint/large-watchpoint/TestLargeWatchpoint.py b/lldb/test/API/functionalities/watchpoint/large-watchpoint/TestLargeWatchpoint.py
index c5e161497e628..f7ceb47c0b615 100644
--- a/lldb/test/API/functionalities/watchpoint/large-watchpoint/TestLargeWatchpoint.py
+++ b/lldb/test/API/functionalities/watchpoint/large-watchpoint/TestLargeWatchpoint.py
@@ -25,6 +25,11 @@ def continue_and_report_stop_reason(self, process, iter_str):
     @skipIf(archs=no_match(["arm64", "arm64e", "aarch64"]))
     @skipUnlessDarwin
 
+    # LWP_TODO: until debugserver advertises that it supports
+    # MASK watchpoints, this test can't be enabled, lldb won't
+    # try to send watchpoints larger than 8 bytes.
+    @skipIfDarwin
+
     # debugserver only gained the ability to watch larger regions
     # with this patch.
     @skipIfOutOfTreeDebugserver
diff --git a/lldb/test/API/functionalities/watchpoint/unaligned-large-watchpoint/Makefile b/lldb/test/API/functionalities/watchpoint/unaligned-large-watchpoint/Makefile
new file mode 100644
index 0000000000000..10495940055b6
--- /dev/null
+++ b/lldb/test/API/functionalities/watchpoint/unaligned-large-watchpoint/Makefile
@@ -0,0 +1,3 @@
+C_SOURCES := main.c
+
+include Makefile.rules
diff --git a/lldb/test/API/functionalities/watchpoint/unaligned-large-watchpoint...
[truncated]

trying to minimize the number of hardware watchpoint registers used
for a single watchpoint.  The pathological cases can lead to watching
giant regions of memory for a small object.
Copy link
Member

@bulbazord bulbazord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the algorithm in great detail yet but a lot of the surrounding stuff looks pretty reasonable I think.

lldb/include/lldb/Breakpoint/WatchpointAlgorithms.h Outdated Show resolved Hide resolved
lldb/source/Breakpoint/WatchpointAlgorithms.cpp Outdated Show resolved Hide resolved
Add doxygen comments for WatchpointAlgorithms.h.
Use INT32_MAX instead of the hardcoded value in
WatchpointAlgorithms.cpp.
Watch a 16-byte object and make sure we get a watchpoint hit for
every write to it.  This isn't an unaligned object but it's an easy
place to add this test for now, to make sure we pick up the object
size properly from the variable's type.
Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job on documenting the code, it's a pleasure to read through. The algorithm makes sense to me and it has solid test coverage. I've left some nits but overall this LGTM.

lldb/include/lldb/Breakpoint/WatchpointAlgorithms.h Outdated Show resolved Hide resolved
lldb/include/lldb/lldb-enumerations.h Outdated Show resolved Hide resolved
lldb/source/Breakpoint/WatchpointAlgorithms.cpp Outdated Show resolved Hide resolved
lldb/source/Breakpoint/WatchpointAlgorithms.cpp Outdated Show resolved Hide resolved
lldb/source/Breakpoint/WatchpointAlgorithms.cpp Outdated Show resolved Hide resolved
lldb/include/lldb/Breakpoint/WatchpointAlgorithms.h Outdated Show resolved Hide resolved
@jasonmolenda jasonmolenda merged commit 57c66b3 into llvm:main Jan 31, 2024
4 checks passed
@jasonmolenda jasonmolenda deleted the large-watchpoint-algorithm-to-create-watchpointresources branch January 31, 2024 17:41
Comment on lines +101 to +105
// For the unittests to have access to the individual algorithms
class WatchpointAlgorithmsTest : public WatchpointAlgorithms {
public:
using WatchpointAlgorithms::PowerOf2Watchpoints;
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would've put this in the unittest.

WatchpointHardwareFeature::eWatchpointHardwareArmMASK) {
entries =
PowerOf2Watchpoints(addr, size,
/*min_byte_size*/ 1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still missing the = at the end of the comment.

/*min_byte_size=*/ 1,

jasonmolenda added a commit that referenced this pull request Jan 31, 2024
@jasonmolenda
Copy link
Collaborator Author

Merged this PR and found two unexpected (and one expected that I forgot to skip a test on) CI bot failures. I've collected enough logs to debug them locally I think, I reverted this patch from main until I have fixes and I will reland.

jasonmolenda added a commit that referenced this pull request Feb 1, 2024
This patch is the next piece of work in my Large Watchpoint proposal,
https://discourse.llvm.org/t/rfc-large-watchpoint-support-in-lldb/72116

This patch breaks a user's watchpoint into one or more
WatchpointResources which reflect what the hardware registers can cover.
This means we can watch objects larger than 8 bytes, and we can watched
unaligned address ranges. On a typical 64-bit target with 4 watchpoint
registers you can watch 32 bytes of memory if the start address is
doubleword aligned.

Additionally, if the remote stub implements AArch64 MASK style
watchpoints (e.g. debugserver on Darwin), we can watch any power-of-2
size region of memory up to 2GB, aligned to that same size.

I updated the Watchpoint constructor and CommandObjectWatchpoint to
create a CompilerType of Array<UInt8> when the size of the watched
region is greater than pointer-size and we don't have a variable type to
use. For pointer-size and smaller, we can display the watched granule as
an integer value; for larger-than-pointer-size we will display as an
array of bytes.

I have `watchpoint list` now print the WatchpointResources used to
implement the watchpoint.

I added a WatchpointAlgorithm class which has a top-level static method
that takes an enum flag mask WatchpointHardwareFeature and a user
address and size, and returns a vector of WatchpointResources covering
the request. It does not take into account the number of watchpoint
registers the target has, or the number still available for use. Right
now there is only one algorithm, which monitors power-of-2 regions of
memory. For up to pointer-size, this is what Intel hardware supports.
AArch64 Byte Address Select watchpoints can watch any number of
contiguous bytes in a pointer-size memory granule, that is not currently
supported so if you ask to watch bytes 3-5, the algorithm will watch the
entire doubleword (8 bytes). The newly default "modify" style means we
will silently ignore modifications to bytes outside the watched range.

I've temporarily skipped TestLargeWatchpoint.py for all targets. It was
only run on Darwin when using the in-tree debugserver, which was a proxy
for "debugserver supports MASK watchpoints". I'll be adding the
aforementioned feature flag from the stub and enabling full mask
watchpoints when a debugserver with that feature is enabled, and
re-enable this test.

I added a new TestUnalignedLargeWatchpoint.py which only has one test
but it's a great one, watching a 22-byte range that is unaligned and
requires four 8-byte watchpoints to cover.

I also added a unit test, WatchpointAlgorithmsTests, which has a number
of simple tests against WatchpointAlgorithms::PowerOf2Watchpoints. I
think there's interesting possible different approaches to how we cover
these; I note in the unit test that a user requesting a watch on address
0x12e0 of 120 bytes will be covered by two watchpoints today, a
128-bytes at 0x1280 and at 0x1300. But it could be done with a 16-byte
watchpoint at 0x12e0 and a 128-byte at 0x1300, which would have fewer
false positives/private stops. As we try refining this one, it's helpful
to have a collection of tests to make sure things don't regress.

I tested this on arm64 macOS, (genuine) x86_64 macOS, and AArch64
Ubuntu. I have not modifed the Windows process plugins yet, I might try
that as a standalone patch, I'd be making the change blind, but the
necessary changes (see ProcessGDBRemote::EnableWatchpoint) are pretty
small so it might be obvious enough that I can change it and see what
the Windows CI thinks.

There isn't yet a packet (or a qSupported feature query) for the gdb
remote serial protocol stub to communicate its watchpoint capabilities
to lldb. I'll be doing that in a patch right after this is landed,
having debugserver advertise its capability of AArch64 MASK watchpoints,
and have ProcessGDBRemote add eWatchpointHardwareArmMASK to
WatchpointAlgorithms so we can watch larger than 32-byte requests on
Darwin.

I haven't yet tackled WatchpointResource *sharing* by multiple
Watchpoints. This is all part of the goal, especially when we may be
watching a larger memory range than the user requested, if they then add
another watchpoint next to their first request, it may be covered by the
same WatchpointResource (hardware watchpoint register). Also one "read"
watchpoint and one "write" watchpoint on the same memory granule need to
be handled, making the WatchpointResource cover all requests.

As WatchpointResources aren't shared among multiple Watchpoints yet,
there's no handling of running the conditions/commands/etc on multiple
Watchpoints when their shared WatchpointResource is hit. The goal beyond
"large watchpoint" is to unify (much more) the Watchpoint and Breakpoint
behavior and commands. I have a feeling I may be slowly chipping away at
this for a while.

Re-landing this patch after fixing two undefined behaviors in
WatchpointAlgorithms found by UBSan and by failures on different
CI bots.

rdar://108234227
jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Feb 1, 2024
This patch is the next piece of work in my Large Watchpoint proposal,
https://discourse.llvm.org/t/rfc-large-watchpoint-support-in-lldb/72116

This patch breaks a user's watchpoint into one or more
WatchpointResources which reflect what the hardware registers can cover.
This means we can watch objects larger than 8 bytes, and we can watched
unaligned address ranges. On a typical 64-bit target with 4 watchpoint
registers you can watch 32 bytes of memory if the start address is
doubleword aligned.

Additionally, if the remote stub implements AArch64 MASK style
watchpoints (e.g. debugserver on Darwin), we can watch any power-of-2
size region of memory up to 2GB, aligned to that same size.

I updated the Watchpoint constructor and CommandObjectWatchpoint to
create a CompilerType of Array<UInt8> when the size of the watched
region is greater than pointer-size and we don't have a variable type to
use. For pointer-size and smaller, we can display the watched granule as
an integer value; for larger-than-pointer-size we will display as an
array of bytes.

I have `watchpoint list` now print the WatchpointResources used to
implement the watchpoint.

I added a WatchpointAlgorithm class which has a top-level static method
that takes an enum flag mask WatchpointHardwareFeature and a user
address and size, and returns a vector of WatchpointResources covering
the request. It does not take into account the number of watchpoint
registers the target has, or the number still available for use. Right
now there is only one algorithm, which monitors power-of-2 regions of
memory. For up to pointer-size, this is what Intel hardware supports.
AArch64 Byte Address Select watchpoints can watch any number of
contiguous bytes in a pointer-size memory granule, that is not currently
supported so if you ask to watch bytes 3-5, the algorithm will watch the
entire doubleword (8 bytes). The newly default "modify" style means we
will silently ignore modifications to bytes outside the watched range.

I've temporarily skipped TestLargeWatchpoint.py for all targets. It was
only run on Darwin when using the in-tree debugserver, which was a proxy
for "debugserver supports MASK watchpoints". I'll be adding the
aforementioned feature flag from the stub and enabling full mask
watchpoints when a debugserver with that feature is enabled, and
re-enable this test.

I added a new TestUnalignedLargeWatchpoint.py which only has one test
but it's a great one, watching a 22-byte range that is unaligned and
requires four 8-byte watchpoints to cover.

I also added a unit test, WatchpointAlgorithmsTests, which has a number
of simple tests against WatchpointAlgorithms::PowerOf2Watchpoints. I
think there's interesting possible different approaches to how we cover
these; I note in the unit test that a user requesting a watch on address
0x12e0 of 120 bytes will be covered by two watchpoints today, a
128-bytes at 0x1280 and at 0x1300. But it could be done with a 16-byte
watchpoint at 0x12e0 and a 128-byte at 0x1300, which would have fewer
false positives/private stops. As we try refining this one, it's helpful
to have a collection of tests to make sure things don't regress.

I tested this on arm64 macOS, (genuine) x86_64 macOS, and AArch64
Ubuntu. I have not modifed the Windows process plugins yet, I might try
that as a standalone patch, I'd be making the change blind, but the
necessary changes (see ProcessGDBRemote::EnableWatchpoint) are pretty
small so it might be obvious enough that I can change it and see what
the Windows CI thinks.

There isn't yet a packet (or a qSupported feature query) for the gdb
remote serial protocol stub to communicate its watchpoint capabilities
to lldb. I'll be doing that in a patch right after this is landed,
having debugserver advertise its capability of AArch64 MASK watchpoints,
and have ProcessGDBRemote add eWatchpointHardwareArmMASK to
WatchpointAlgorithms so we can watch larger than 32-byte requests on
Darwin.

I haven't yet tackled WatchpointResource *sharing* by multiple
Watchpoints. This is all part of the goal, especially when we may be
watching a larger memory range than the user requested, if they then add
another watchpoint next to their first request, it may be covered by the
same WatchpointResource (hardware watchpoint register). Also one "read"
watchpoint and one "write" watchpoint on the same memory granule need to
be handled, making the WatchpointResource cover all requests.

As WatchpointResources aren't shared among multiple Watchpoints yet,
there's no handling of running the conditions/commands/etc on multiple
Watchpoints when their shared WatchpointResource is hit. The goal beyond
"large watchpoint" is to unify (much more) the Watchpoint and Breakpoint
behavior and commands. I have a feeling I may be slowly chipping away at
this for a while.

Re-landing this patch after fixing two undefined behaviors in
WatchpointAlgorithms found by UBSan and by failures on different
CI bots.

rdar://108234227
(cherry picked from commit 147d7a6)
@DavidSpickett
Copy link
Collaborator

DavidSpickett commented Feb 1, 2024

I didn't have time to review before merge, just looking at this now.

see what the Windows CI thinks.

I assume you mean the buildbots (lldb isn't built in the pre-commit CI), but I think given:
https://github.com/llvm/llvm-zorg/blob/590f0a62919ad313758362d18a31e7d40255e6e6/buildbot/osuosl/master/config/builders.py#L1342
The tests wouldn't run there anyway due to --skip-category=watchpoint.

I don't know specifically why we skip those tests on our bot. Could be that the Windows APIs are target specific like ptrace is, and we just didn't implement it for Windows on Arm yet, could be that the hardware we have doesn't support it.

I'll found out what the status is there.

Copy link
Collaborator

@DavidSpickett DavidSpickett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprised this isn't horribly broken on Linux :)

Just a bunch of nits.

class WatchpointAlgorithms {

public:
/// Convert a user's watchpoint request into an array of memory
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean
one register = one memory region
Or
one register = all the memory regions (at once)

I think the former but the phrasing implies the latter.

How about:
"an array of memory regions, each of which can be watched by one"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good clarification.

struct Region {
lldb::addr_t addr;
size_t size;
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks a lot like Range from lldb/include/lldb/Utility/RangeMap.h. If any of the existing methods on that seem useful, consider using that.

(though iirc, it does get cumbersome at times)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was originally using a std::pair, it's a simple POD tuple used only within the WatchpointAlgorithms class (and the unittest). If I was passing it outside of these methods I'd agree shouldn't introduce Yet Another Range object.


protected:
/// Convert a user's watchpoint request into an array of addr+size that
/// can be watched with power-of-2 style hardware watchpoints.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So again, "each of which can be watched with".

///
/// This is the default algorithm if we have no further information;
/// most watchpoint implementations can be assumed to be able to watch up
/// to pointer-size regions of memory in power-of-2 sizes and alingments.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pointer sized means sizeof(void*) or literally 2^(virtual address bits)-1 sized?

alingments -> alignments

/// The user's specified byte length.
///
/// \param[in] min_byte_size
/// The minimum byte size supported on this target.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be helpful to say that this is the minimum byte size of the range of memory.

Lest anyone think this thing supports architectures where a byte is in fact 8 bytes.

@@ -113,7 +114,8 @@ bool WatchpointResource::ShouldStop(StoppointCallbackContext *context) {
}

void WatchpointResource::Dump(Stream *s) const {
return; // LWP_TODO
s->Printf("addr = 0x%8.8" PRIx64 " size = %zu", m_addr, m_size);
return;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need a return, unless this thing is supposed to return something?

"""Test watching an unaligned region of memory that requires multiple watchpoints."""
self.build()
self.main_source_file = lldb.SBFileSpec("main.c")
(target, process, thread, bkpt) = lldbutil.run_to_source_breakpoint(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need ():
target, process, thread, bkpt = ...
Works fine.

Comment on lines +61 to +72
c_count = 0
reason = self.continue_and_report_stop_reason(process, "continue #%d" % c_count)
while reason == lldb.eStopReasonWatchpoint:
c_count = c_count + 1
reason = self.continue_and_report_stop_reason(
process, "continue #%d" % c_count
)
self.assertLessEqual(c_count, 22)

self.assertEqual(c_count, 22)
self.expect("watchpoint list -v", substrs=["hit_count = 22"])
self.assertEqual(wp.GetHitCount(), 22)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could go in some expected_watchpoint_hits(self, n) function and be called below as well.


# Now try watching a 16 byte variable
# (not unaligned, but a good check to do anyway)
#
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stray #

#
frame = thread.GetFrameAtIndex(0)
err = lldb.SBError()
wp = frame.locals["variable"][0].Watch(True, False, True, err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the bools here doing?

@jasonmolenda
Copy link
Collaborator Author

Thanks for the great feedback @DavidSpickett sorry I merged it before you had a chance to look. These are good points, I'll land a cleanup patch in a bit to address them.

carlosgalvezp pushed a commit to carlosgalvezp/llvm-project that referenced this pull request Feb 1, 2024
carlosgalvezp pushed a commit to carlosgalvezp/llvm-project that referenced this pull request Feb 1, 2024
This patch is the next piece of work in my Large Watchpoint proposal,
https://discourse.llvm.org/t/rfc-large-watchpoint-support-in-lldb/72116

This patch breaks a user's watchpoint into one or more
WatchpointResources which reflect what the hardware registers can cover.
This means we can watch objects larger than 8 bytes, and we can watched
unaligned address ranges. On a typical 64-bit target with 4 watchpoint
registers you can watch 32 bytes of memory if the start address is
doubleword aligned.

Additionally, if the remote stub implements AArch64 MASK style
watchpoints (e.g. debugserver on Darwin), we can watch any power-of-2
size region of memory up to 2GB, aligned to that same size.

I updated the Watchpoint constructor and CommandObjectWatchpoint to
create a CompilerType of Array<UInt8> when the size of the watched
region is greater than pointer-size and we don't have a variable type to
use. For pointer-size and smaller, we can display the watched granule as
an integer value; for larger-than-pointer-size we will display as an
array of bytes.

I have `watchpoint list` now print the WatchpointResources used to
implement the watchpoint.

I added a WatchpointAlgorithm class which has a top-level static method
that takes an enum flag mask WatchpointHardwareFeature and a user
address and size, and returns a vector of WatchpointResources covering
the request. It does not take into account the number of watchpoint
registers the target has, or the number still available for use. Right
now there is only one algorithm, which monitors power-of-2 regions of
memory. For up to pointer-size, this is what Intel hardware supports.
AArch64 Byte Address Select watchpoints can watch any number of
contiguous bytes in a pointer-size memory granule, that is not currently
supported so if you ask to watch bytes 3-5, the algorithm will watch the
entire doubleword (8 bytes). The newly default "modify" style means we
will silently ignore modifications to bytes outside the watched range.

I've temporarily skipped TestLargeWatchpoint.py for all targets. It was
only run on Darwin when using the in-tree debugserver, which was a proxy
for "debugserver supports MASK watchpoints". I'll be adding the
aforementioned feature flag from the stub and enabling full mask
watchpoints when a debugserver with that feature is enabled, and
re-enable this test.

I added a new TestUnalignedLargeWatchpoint.py which only has one test
but it's a great one, watching a 22-byte range that is unaligned and
requires four 8-byte watchpoints to cover.

I also added a unit test, WatchpointAlgorithmsTests, which has a number
of simple tests against WatchpointAlgorithms::PowerOf2Watchpoints. I
think there's interesting possible different approaches to how we cover
these; I note in the unit test that a user requesting a watch on address
0x12e0 of 120 bytes will be covered by two watchpoints today, a
128-bytes at 0x1280 and at 0x1300. But it could be done with a 16-byte
watchpoint at 0x12e0 and a 128-byte at 0x1300, which would have fewer
false positives/private stops. As we try refining this one, it's helpful
to have a collection of tests to make sure things don't regress.

I tested this on arm64 macOS, (genuine) x86_64 macOS, and AArch64
Ubuntu. I have not modifed the Windows process plugins yet, I might try
that as a standalone patch, I'd be making the change blind, but the
necessary changes (see ProcessGDBRemote::EnableWatchpoint) are pretty
small so it might be obvious enough that I can change it and see what
the Windows CI thinks.

There isn't yet a packet (or a qSupported feature query) for the gdb
remote serial protocol stub to communicate its watchpoint capabilities
to lldb. I'll be doing that in a patch right after this is landed,
having debugserver advertise its capability of AArch64 MASK watchpoints,
and have ProcessGDBRemote add eWatchpointHardwareArmMASK to
WatchpointAlgorithms so we can watch larger than 32-byte requests on
Darwin.

I haven't yet tackled WatchpointResource *sharing* by multiple
Watchpoints. This is all part of the goal, especially when we may be
watching a larger memory range than the user requested, if they then add
another watchpoint next to their first request, it may be covered by the
same WatchpointResource (hardware watchpoint register). Also one "read"
watchpoint and one "write" watchpoint on the same memory granule need to
be handled, making the WatchpointResource cover all requests.

As WatchpointResources aren't shared among multiple Watchpoints yet,
there's no handling of running the conditions/commands/etc on multiple
Watchpoints when their shared WatchpointResource is hit. The goal beyond
"large watchpoint" is to unify (much more) the Watchpoint and Breakpoint
behavior and commands. I have a feeling I may be slowly chipping away at
this for a while.

Re-landing this patch after fixing two undefined behaviors in
WatchpointAlgorithms found by UBSan and by failures on different
CI bots.

rdar://108234227
@jasonmolenda
Copy link
Collaborator Author

I didn't have time to review before merge, just looking at this now.

see what the Windows CI thinks.

I assume you mean the buildbots (lldb isn't built in the pre-commit CI), but I think given: https://github.com/llvm/llvm-zorg/blob/590f0a62919ad313758362d18a31e7d40255e6e6/buildbot/osuosl/master/config/builders.py#L1342 The tests wouldn't run there anyway due to --skip-category=watchpoint.

Ah, thanks. I saw the buildbot fail and assumed it was from the TestUnalignedLargeWatchpoint test I added. But it was more likely by the fact (Jonas later pointed out) that the compiler on Windows doesn't have __builtin_clzll which I was using (I switched to the llvm equiv in the final version).

jasonmolenda added a commit that referenced this pull request Feb 2, 2024
David Spickett had several suggestions for
#79962 after I'd
already merged it.  Address those.
@DavidSpickett
Copy link
Collaborator

I don't know specifically why we skip those tests on our bot. Could be that the Windows APIs are target specific like ptrace is, > and we just didn't implement it for Windows on Arm yet, could be that the hardware we have doesn't support it.

I'll found out what the status is there.

@jasonmolenda, #80665.

They could work but there is work to be done to get there. You don't need to consider it The tests should work on x86 Windows, but there's no bot for that anymore. Maybe someone pops up in future with a report if it's being tested elsewhere.

@DavidSpickett
Copy link
Collaborator

Unlikely even that given the range breakpoint stuff is all on AArch64 anyway.

agozillon pushed a commit to agozillon/llvm-project that referenced this pull request Feb 5, 2024
This patch is the next piece of work in my Large Watchpoint proposal,
https://discourse.llvm.org/t/rfc-large-watchpoint-support-in-lldb/72116

This patch breaks a user's watchpoint into one or more
WatchpointResources which reflect what the hardware registers can cover.
This means we can watch objects larger than 8 bytes, and we can watched
unaligned address ranges. On a typical 64-bit target with 4 watchpoint
registers you can watch 32 bytes of memory if the start address is
doubleword aligned.

Additionally, if the remote stub implements AArch64 MASK style
watchpoints (e.g. debugserver on Darwin), we can watch any power-of-2
size region of memory up to 2GB, aligned to that same size.

I updated the Watchpoint constructor and CommandObjectWatchpoint to
create a CompilerType of Array<UInt8> when the size of the watched
region is greater than pointer-size and we don't have a variable type to
use. For pointer-size and smaller, we can display the watched granule as
an integer value; for larger-than-pointer-size we will display as an
array of bytes.

I have `watchpoint list` now print the WatchpointResources used to
implement the watchpoint.

I added a WatchpointAlgorithm class which has a top-level static method
that takes an enum flag mask WatchpointHardwareFeature and a user
address and size, and returns a vector of WatchpointResources covering
the request. It does not take into account the number of watchpoint
registers the target has, or the number still available for use. Right
now there is only one algorithm, which monitors power-of-2 regions of
memory. For up to pointer-size, this is what Intel hardware supports.
AArch64 Byte Address Select watchpoints can watch any number of
contiguous bytes in a pointer-size memory granule, that is not currently
supported so if you ask to watch bytes 3-5, the algorithm will watch the
entire doubleword (8 bytes). The newly default "modify" style means we
will silently ignore modifications to bytes outside the watched range.

I've temporarily skipped TestLargeWatchpoint.py for all targets. It was
only run on Darwin when using the in-tree debugserver, which was a proxy
for "debugserver supports MASK watchpoints". I'll be adding the
aforementioned feature flag from the stub and enabling full mask
watchpoints when a debugserver with that feature is enabled, and
re-enable this test.

I added a new TestUnalignedLargeWatchpoint.py which only has one test
but it's a great one, watching a 22-byte range that is unaligned and
requires four 8-byte watchpoints to cover.

I also added a unit test, WatchpointAlgorithmsTests, which has a number
of simple tests against WatchpointAlgorithms::PowerOf2Watchpoints. I
think there's interesting possible different approaches to how we cover
these; I note in the unit test that a user requesting a watch on address
0x12e0 of 120 bytes will be covered by two watchpoints today, a
128-bytes at 0x1280 and at 0x1300. But it could be done with a 16-byte
watchpoint at 0x12e0 and a 128-byte at 0x1300, which would have fewer
false positives/private stops. As we try refining this one, it's helpful
to have a collection of tests to make sure things don't regress.

I tested this on arm64 macOS, (genuine) x86_64 macOS, and AArch64
Ubuntu. I have not modifed the Windows process plugins yet, I might try
that as a standalone patch, I'd be making the change blind, but the
necessary changes (see ProcessGDBRemote::EnableWatchpoint) are pretty
small so it might be obvious enough that I can change it and see what
the Windows CI thinks.

There isn't yet a packet (or a qSupported feature query) for the gdb
remote serial protocol stub to communicate its watchpoint capabilities
to lldb. I'll be doing that in a patch right after this is landed,
having debugserver advertise its capability of AArch64 MASK watchpoints,
and have ProcessGDBRemote add eWatchpointHardwareArmMASK to
WatchpointAlgorithms so we can watch larger than 32-byte requests on
Darwin.

I haven't yet tackled WatchpointResource *sharing* by multiple
Watchpoints. This is all part of the goal, especially when we may be
watching a larger memory range than the user requested, if they then add
another watchpoint next to their first request, it may be covered by the
same WatchpointResource (hardware watchpoint register). Also one "read"
watchpoint and one "write" watchpoint on the same memory granule need to
be handled, making the WatchpointResource cover all requests.

As WatchpointResources aren't shared among multiple Watchpoints yet,
there's no handling of running the conditions/commands/etc on multiple
Watchpoints when their shared WatchpointResource is hit. The goal beyond
"large watchpoint" is to unify (much more) the Watchpoint and Breakpoint
behavior and commands. I have a feeling I may be slowly chipping away at
this for a while.

Re-landing this patch after fixing two undefined behaviors in
WatchpointAlgorithms found by UBSan and by failures on different
CI bots.

rdar://108234227
agozillon pushed a commit to agozillon/llvm-project that referenced this pull request Feb 5, 2024
David Spickett had several suggestions for
llvm#79962 after I'd
already merged it.  Address those.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants