Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating 24.2.x branch #109

Merged
merged 45 commits into from
Apr 30, 2024
Merged

Creating 24.2.x branch #109

merged 45 commits into from
Apr 30, 2024

Conversation

graphcareful
Copy link

@graphcareful graphcareful commented Apr 25, 2024

This PR creates a new branch with all of the upstream changes from seastar/master along with all of the changes that haven't yet been merged upstream that exist in our current fork, specifically the diff between seastar/master and 24.1.x. To achieve this normally we just perform a plain git rebase, however due to the fact that the order of the commits merged upstream did not always match the order of the commits as they entered our fork, the rebase was quiet difficult. Instead I opted to start with master and apply the missing commits on top. This is partly why the base of 24.2.x is seastar/master, that way reviewers can see only the differences I've applied between the forks.

How this was performed

The negative of such a process is that a commit may be missed due to human error. As such I've devised a strategy , using a master list of the delta between the two branches using the git diff command, I could determine which commits could be dropped (as they were already upstreamed) and which had to be cherry-picked on.

Diff between redpanda/24.1.x and upstream/master

git log master..redpanda/v24.1.x --oneline
Author Status Conflict Relevent PR Commit
@rockwotj Unmerged None ee2b79d Merge pull request #103 from redpanda-data/rockwood/wasm-escape-hatch
6c9d388 core/cpu_profiler: add danger zone escape hatch
@oleiman Unmerged e22885b Merge pull request #101 from oleiman/cb-trust-file-blob
Manual 7246372 tls: Include trust file contents with reload callback
@BenPope Unmerged eaeb66d Merge pull request #98 from BenPope/reactor_timers_missing_include
None 81676b6 internal/timers: Add missing #include
@pgellert Merged None Seastar PR 1847312 Merge pull request #97 from pgellert/path-param-decode
f3baa58 http/request: add get_path_param method
8f23b44 http/request: get_query_param refactor
51b0584 http/util: pass string_view by value
fb5a925 http/util: add path_decode method
@StephanDollberg Merged None Seastar PR 0e06279 Merge pull request #96 from redpanda-data/stephan/cp-io-submit-error
7f921ba reactor: Print correct errno on io_submit failure -
@StephanDollberg Merged None Seastar PR 1e1382e Merge pull request #95 from
fec20e9 perf: Calculate length of name column in perf tests
@andrwng Merged None Seastar PR 2ec2f4d Merge pull request #93 from andrwng/metrics-value-vector-deque
a19b192 metrics: change value_vector type to std::deque
@NyaliaLui Merged None Seastar PR 09eca6e metrics: replace vector with deque
aa40b12 metrics: change metadata vector to deque
14a130f Revert "metrics: change metadata vector to chunked_fifo"
@travisdowns Merged None Seastar PR 6cb1dc6 Fix edge case in memory sampler at OOM
@StephanDollberg Merged None d1d5354 Merge pull request #90 from redpanda-data/stephan/upstreams
b3c398c tls: linearize small packets on send
8c1e54c epoll: Avoid spinning on aborted connections
3c8bc61 Revert "epoll: Avoid spinning on aborted connections"
a8adecc Revert "tls: linearize small packets on send"
Seastar PR 7ca1eae Merge pull request #87 from redpanda-data/stephan/short-flush
Seastar PR c69081e Merge pull request #88 from redpanda-data/stephan/tls-linearize
1d6c9bc tls: linearize small packets on send
@StephanDollberg Unmerged d9175fc aio_general_context: Allow more than max_poll() queued iocbs
None 9fff5f3 aio_general_context: flush: abort on unexpected errors
None 5138d5f aio_general_context: flush: return short flush on error
@oleiman Unmerged None 08b399a Merge pull request #86 from oleiman/cert-info-bytes
342d73e net/tls: Adjust type for cert_info.serial
@michael-redpanda Merged None Seastar PR 87373c5 Merge pull request #83 from michael-redpanda/expose-addr-in-req
4cd5f11 httpd: Added server and client addresses to request structure
158c897 net: Added remote address accessors
e797fe8 Revert "httpd: Added server and client addresses to request structure
@StephanDollberg Unmerged None d3b7520 Merge pull request #84 from redpanda-data/stephan/update-aggregate
d593d0b metrics: Add update_aggregate_labels()
@StephanDollberg Merged None Seastar PR ecc25b4 Merge pull request #85 from redpanda-data/stephan/backport-when-all-
3b5a16b loop: Fix iterator_range_estimate_vector_capacity for random iters
@BenPope N/A Note: Upstream commit via Avi 576a2f0 tests: perf: shard_token_bucket: avoid capturing unused variables in
@oleiman Unmerged 2c1a195 Merge pull request #78 from oleiman/expose-tls-certs-rp
None b561bad tls_test: Add tests for new reload callback and cert_info accessors
Conflict 81595bb net/tls: Introduce cert_info and accessors
None e14d19d net/tls: Add reload_callback_with_creds
@oleiman Dropped Note: Build broke when commited, change backed out eaaf179 Merge pull request #79 from oleiman/revert-tls-get-impl
Dropped af0d071 Revert "Merge pull request #75 from oleiman/tls_cert_creds"
Dropped ab57ab6 Merge pull request #75 from oleiman/tls_cert_creds
@michael-redpanda c83e017 Merge pull request #76 from redpanda-data/boquard/add-client-info-
@oleiman Dropped 3a398d3 net/tls: Add reload_callback_with_creds
Merged None 5a8e7b8 net/tls: Introduce get_impl
@Lazin 58d13ee Merge pull request #77 from redpanda-data/expose-bandwidth-conf-v23.3
Unmerged Conflict eebb4d7 Store original values from io-priorities yaml
@michael-redpanda Merged Conflict Seastar PR 703d29a httpd: Added server and client addresses to request structure
@andijcr ca621a5 Merge pull request #69 from andijcr/fix/http_reply_namespace
Unmerged Dropped Note: Fixed upstream 27e72a9 http: fix deprecated warning, reply -> http::reply
@ballard26 Unmerged ee8c8da Merge pull request #68 from redpanda-data/cpu-profiler-fixes
None 3f6a975 core/cpu_profiler: finer-grain stats for why a sample was dropped
Conflict 67646ee tests/cpu_profiler: add tests that ensure correct behavior during exc
@ballard26 6f66c48 Merge pull request #70 from redpanda-data/thread-local
Unmerged None 1ba4eed tests/cpu_profiler: add test to verify that on_signal doesn't alloc
Unmerged None f564303 core/cpu_profiler: avoid taking samples during exception unwinding
@donwat Merged None Seastar PR 2dd01e1 smp: make service management semaphore thread local
@VladLazar Merged None Seastar PR c98680f Merge pull request #67 from VladLazar/23-3-tls-optional-eof-skip
a00ee53 tls: Optionally skip client EOF wait
@StephanDollberg Merged None Seastar PR 0a6c064 Merge pull request #61 from redpanda-data/stephan/epoll-fix-spin
None 7dc31f1 epoll: Avoid spinning on aborted connections
------------------ ---------- ---------- ------------------------------------------------------------- --------------------------------------------------------------------------------
@BenPope Note: 23.3.x cut e5207c4 Merge pull request #57 from BenPope/v23.3.x
------------------ ---------- ---------- ------------------------------------------------------------- --------------------------------------------------------------------------------
@BenPope Merged None Seastar Hash c54583d build: pass -DBoost_NO_CXX98_FUNCTION_BASE to C++ compiler
@andrwng Unmerged Dropped 310c15f metrics: change metadata vector to chunked_fifo
Unmerged Dropped 2bf2afd chunked_fifo: add default ctor for basic_iterator
@travisdowns Unmerged None 205b792 Match type for pure_check_for_work
@travisdowns 4674eb8 Do not use std::function for check_for_work()
@travisdowns Merged None Seastar PR abca2ea Add a basic sized deletion test
Unmerged Dropped 1e9c3e6 Optimize the allocator free path
Unmerged Dropped ac6de30 Optimize allocate path
@ballard26 Unmerged Conflict Note: Double check 7f48723 core: cpu profiler implementation
Unmerged None 7c391b5 core: add required include to circular_buffer_fixed_capacity
Unmerged Conflict 7692de6 tests/unit: refactor stall_detector test functions
Unmerged Conflict f97e35d reactor: refactor timers out of stall_detector
@travisdowns Merged Seastar PR 5a0b583 memory: Optimize free fast path
Merged Seastar PR cbd2ee8 memory: Optimize small alloc alloation path
Merged Seastar PR 0e2be7c memory: Limit alloc_sites size
Merged Seastar PR dfaf887 memory: Add general comment about sampling strategy
Merged Seastar PR 1c97d12 memory: Use probabilistic sampler
Merged Seastar PR 8878d0e util: Adapt memory sampler to seastar
Merged Seastar PR 760da9b util: Import Android Memory Sampler
Unmerged Dropped ab1a8ec memory: Use separate small pool for tracking sampled allocations
Unmerged Dropped 9ba50a9 memory: Support enabling memory profiling at runtime
Merged Seastar PR 4a3d6e3 memory: Add allocator perf tests
Unmerged None 149acb9 Revert default backend to aio
Unmerged None 8fa957f http: rename http client logger
@VladLazar Unmerged None 70f238b net/dns: fixed socket concurrent access
@VladLazar Unmerged None 2c540c1 scheduling_group: expose usage statistics
------------------ ---------- ---------- ------------------------------------------------------------- --------------------------------------------------------------------------------
@VladLazar Note: Unmerged Metrics PR
------------------ ---------- ---------- ------------------------------------------------------------- --------------------------------------------------------------------------------
Unmerged None 6e0f3d4 tests: add metrics replication unit tests
Unmerged Conflict 31fd128 metrics: public family replication interface
Unmerged None 0617543 metrics: register replicated metrics dinamically
Unmerged Conflict 1009a9a metrics: add metric replication internal interface
Unmerged None 42c4940 metrics: allow for removal of replicated metrics
Unmerged Conflict 06b347f metrics: add helpers for creation of replicas
Unmerged None 960d849 metrics: Expose 'skip_when_empty' for metrics
Unmerged None 7c75b52 scollectd: select internal metrics implementation
Unmerged Conflict d860e9f prometheus: support multiple metric impls
Unmerged Conflict 369c7c3 metrics: Use handle for impl object
Unmerged None ed9b04e metrics: expose metric impl handle to external api
Unmerged None 0fe29ce metrics: expose handle in metric_groups_impl
Unmerged Conflict ccb8be9 metrics: expose metric impl handle to internal api
Unmerged Conflict b4f9acf metrics: allow multiple metrics::impl instances
------------------ ---------- ---------- ------------------------------------------------------------- --------------------------------------------------------------------------------
Unmerged ?? 8539e51 http: use base_exception content type in non-json errors
Unmerged ?? bcd8ae6 http: don't jsonize exception if has content type
Unmerged ?? d978718 http: enable specifying a content type on exceptions
Merged Seastar PR d46f34d http: make redirect_exception work from inside futures
Merged Seastar PR 73fb553 http: permit custom status on redirect_exception
Unmerged Conflict 041e9de httpd: added listener index to http request
Merged None Seastar PR 88ed72a httpd: per listener credentials in http_server_control
Merged None Seastar PR 62d7c87 Fix escaping of regexes
Merged None Seastar PR 5b67af4 core: add sleep_abortable instantiation for manual_clock

https://redpandadata.atlassian.net/browse/CORE-2434

mmaslankaprv and others added 30 commits April 22, 2024 16:43
Seastar http server implementation supports multiple listeners. It may
be required for the handler logic to know which listener the connection
is coming from. Added listener_idx field to `httpd::request` to allow
handler recognize listener.

Signed-off-by: Michal Maslanka <michal@vectorized.io>
Since an exception carries some text for the response
body text, the raising site might like to specify
the content type if it's e.g. json.

Signed-off-by: John Spray <jcs@vectorized.io>
This enables throwing a base_exception from
a json request handler with a json payload
inside it.

Signed-off-by: John Spray <jcs@vectorized.io>
Signed-off-by: John Spray <jcs@vectorized.io>
Prior to this patch seastar only exposes one global metrics::impl::impl
object which holds all metric related data for one application.

This patch changes the implementation details such that multiple
metrics::impl::impl objects can exist for any given application.
Said objects are stored into a map on each shard and created
dinamically whenever requested. A metrics::impl::impl is identified
by an integer handle that acts as the key for the storage map.

Implementation note: in order to avoid issues caused by the ordering
of static thread_local objects I had to declare the storage in
reactor.cc.

(cherry picked from commit 585a8af)
This patch extends the metrics internal apis to use a specific
metrics::impl::impl object identified by its integer handle.

(cherry picked from commit 6ee4af7)
Add a public method to metric_groups_impl that exposes the handle
of the internal implementation it is using. This is required in order
for the metric_groups class to be able to reset itself to the configured
implementation handle.
This patch extends the metrics user facing apis to use a specific
metrics::impl::impl object identified by its integer handle.

Note that the constructor of 'metric_groups' is marked explicit
in this patch and updates two call sites where the constructor was used
implicitly.
This patch removes two subsequent calls to `get_local_impl` and reuses
the returned handle in that scope.
This patch extends the user facing prometheus apis allowing the user to
specify the internal metrics implementation to be used through a handle.
Additionally, 'add_prometheus_routes' now takes an argument that
specifies the route on which to advertise the metrics. This enables
different metrics "namespaces" to be served by different endpoints in
isolation.

(cherry picked from commit 6189522)
This patch extends the scollectd apis with the ability to select the
internal metrics implementation to be used by providing a handle.

(cherry picked from commit d4331d1)
This patch adds a 'get_skip_when_empy' getter to the 'registered_metric'
class. It is used by follow-up patches in order to replicate metrics.
This patch adds private methods to the 'metrics::impl' class that deal
with the creation of replicated metrics. They will be used to build the
public api in future commits.
This patch adds private helpers to 'metrics::impl' that deal with the
removal of replicated metric families from their destintation
implementation. These methods will be used in subsequent commits to
manage the lifetime of replicated metrics.
This patch adds a public method to the 'metrics::impl' class:
'set_metric_families_to_replicate'. When this method is called
the families that match any of the specifications will be replicated
on the specified destinations.
This patch extends the metric registration and unregistration processes
to make them aware of metric replication.

In the case of metric registration, if the new metric belongs to a
family that matches one of the replication specs, then a replicated
metric is created accordingly.

For unregistration of a metric, the replicated metric is unregistered
too if one exists.
This patch exposes a method in the public interface of the metrics
module ('replicate_metric_families'), which enables metric replication
internally for the requested metric families.
This commit extends the public interface of scheduling_group to expose
usage statistics (e.g. runtime).
The problem with DNS queries is caused by the fact that the UDP socket
is used concurrently i.e. write is scheduled while the other one is
pending.

DNS queries are based on c-ares library. The library is initialized with
a set of functions (callbacks) that are called whenever there is a need
to send, receive, create socket, connect or close.

Fixed processing of send requests leading to assertion being triggered
in reactor.

The fix involves checking if send is already pending and if so returning
an error. The error returned from send callback will instruct c-ares
library to retry the query.

Signed-off-by: Michal Maslanka <michal@vectorized.io>
Redpanda uses this logger name for its http client and we choose to
change the logger name in Seastar to avoid duplicate logger registration
exceptions.
We sort of inadvertently picked up the change to io_uring when
rebasing our seastar fork, but for the coming release we'd like to
keep using aio to reduce risk and give sufficient time to do
performance tests on io_uring.

This effectively rolls back the upstream commit:

eedca15

We simply put aio and epoll above io_uring in the available list,
the default backend is the first one in that list

Issue redpanda-data/redpanda#10105.
This change is to allow for the timer code in the stall detectors to be
used in a profiler implementation. The hope is to be able to reuse a lot
of the stall_detector codebase in the profiler without complicating the
existing stall_detector implementation.
Moves some functions and classes defined in the stall_detector test to a
separate header file so they can be reused in other tests.
Adds an include for <algorithm> as it's where `move_backward` is
defined.
The check_for_work() function is called in the core reactor loop (and
not just in the path where we are about to enter idle) and is assigned
to a std::function object outside the loop. This forces all invocations
of this function to go through the std::function machinery, which
means that among other things they cannot be inlined.

This does not appear to be intentional, rather the intent seems to have
been to DRY this small function as it is called from two places in
the reactor loop. Instead, use auto type to infer the lambda type and
avoid assignment to std::function, leading to something akin to a direct
call.
Before the reactor main loop a local pure_check_for_work std::function
is created on the stack. This is only ever passed to
the _idle_cpu_handler, which expects a
ss::noncopyable_function. The type mismatch means we need to clone the
std::function every time, creating a temporary
object to bind to the seastar function.

This causes the construction and destruction methods
to show up as a non-trivial portion of performance
profiles whenever the idle check occurs frequently enough.

Instead, just change the type to non-copyable function so the
reference can be passed directly without any copy.
In libgcc there is a critical section where the stack is being modified so
execution can return to an exceptions landing pad. However, this
partially modified state isn’t valid or specified by the dwarf info in
the eh_frames. So a seg fault tends to occur when `backtrace()` tries to
unwind through this partially modified stack and follows an invalid
pointer in it.
StephanDollberg and others added 8 commits April 24, 2024 14:41
Extends the metrics api to allow changing the aggregation labels of a
metrics family.

Otherwise one had to un-register every single metric instance in a
metric family and then re-register with the changed aggregation labels.

For metric families with thousands of instances (e.g.: histograms with
lots of different labels) this is quite expensive.

With this change we avoid the full reconstruction of the metrics family
and all its metrics. Only the work associated with marking the metrics
`dirty()` is needed then.
GnuTLS provides serial numbers not as a legible string of
hex digits but as a byte array corresponding to an integer
value up to 20 octets wide. The cert_info interface should
reflect this.

To that end, this commit updates the type of `cert_info::serial`
to an non-null-terminated basic_sstring of uint8_t.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
As seen in scylladb#1784,
having flush retry `io_submit` in a loop might be futile
if the kernel internal queue is full and we need to reap
completions to free up space for more iocb:s.

This change breaks out of the loop as soon as
`io_submit` fails and just moves the remaining entries
in the iocbs queue into the front so more entries can
be queued (otherwise, we'd need a circular queue that
will penalize the error-free path for the unlikely event of
an error)

Fixes scylladb#1784

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
EAGAIN is expected here when "Insufficient resources are available to queue any iocbs" (see io_submit(2)).
Abort on any other error, as those indicate an internal error on our side.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The `aio_general_context` had the implicit assumption that in a single
tick we would never queue more than `--max-networking-io-control-blocks`
events/iocbs. This however ignores situations such as queuing multiple
iocbs per socket per tick, having left over iocbs in the queue from
previous ticks via the new EAGAIN handling or simply because a lot more
sockets are in use which isn't prevented anywhere else.

If this condition was hit (`last > end`) the reactor would just assert
out and crash.

To avoid this situation this patch introduces a backlog into which
elements are being enqueued when the original array is full and which
can grow unbounded. This mirrors how the `aio_storage_context` works
which uses the `io_sink` for the same purpose.

To avoid oversized allocations after startup the split into two separate
data structures is needed (instead of just regrowing the array). Further
the datastructure from which the iocbs are passed into `io_submit` needs to
be in contigiuous memory (and also provide an API to use it which most
containers don't).

`std::deque` is used in the backlog to avoid oversized allocations in
the backlog itself. The existing array solution for `iocbs` is kept to
fulfill the contigiuous memory requirement.

Further we slightly change how EAGAIN is handled. Instead of
backshifting the array we keep the array as is and just track the
`begin` of the array across `flush` calls. This is possible now as the
backlog handling is in place.

This introduces "batching" and prevents degenerate cases where only a
single element is being submitted which would then result in repeated
shifting of the whole array on each `flush` call. Given we use a chunked
data structure like `std::deque` erasing from the front of the backlog
is relatively cheap and does not require shifting all the elements in
the backlog. Hence, the per-iocb overhead is amortized constant.

Note that in general we try to submit as many iocbs per `io_submit`
call. Given the new behaviour of not backshifting the iocb array and
immediately backfilling from the backlog we might introduce `io_submit`
calls that don't try to submit the max amount of iocbs.

However we assume that if we ran into EAGAIN then either:

 - We are still behind the next time around: it's unlikely we would
   succeed in submitting all the iocbs anyway
 - We have now caugt up: we have introduced a single additional
   `io_submit` call which only submits `max_poll()/2` iocbs on average.
   The backlog will be drained at full `max_poll` per `io_submit`.
Signed-off-by: Ben Pope <ben@redpanda.com>
Passed up to application code as a tls::blob (basic_string_view<char>).

Also provide the ability to fetch this blob directly via
credentials_builder::get_trust_file_blob()

Updates unit tests to accommodate the new reload callback signature.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Add the ability to disable the CPU profiler in regions with known
issues. Specifically for Redpanda this is to disable stacktraces
while Wasmtime is running, as JIT doesn't generate stack trace
entries (for other reasons).

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
@oleiman
Copy link
Member

oleiman commented Apr 26, 2024

@graphcareful - The ones w/ my name on them lgtm.

  • For e347da9, looks like just a conflicting include stmt
  • For 9696233, looks like some unrelated code was added to tls.hh at some intervening point

@ballard26
Copy link

Note that in data/projects/seastar/tests/unit/cpu_profiler_test.cc the following tests are not expected to pass;

  • exception_handler_case
  • config_thrashing
    I'm not sure what had happened here, but the configuration of the test was changed at some point. Will have a followup PR to this one to fix.

Comment on lines +352 to +356
/// This is a low level method for use cases where it is very important to
/// avoid any allocations. The \arg writer will be passed a
/// internal::log_buf::inserter_iterator that allows it to write into the log
/// buffer directly, avoiding the use of any intermediary buffers.
void log(log_level level, log_writer& writer, format_info_t<> fmt = {}) noexcept {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this used and can its use be avoided?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -22,6 +22,7 @@
#pragma once

#ifndef SEASTAR_MODULE
#include <atomic>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. This code is not upstream.

Copy link

@michael-redpanda michael-redpanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changes on my notes look good

Copy link

@pgellert pgellert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commits for me look good

@graphcareful graphcareful merged commit a7ea1cd into v24.2.x Apr 30, 2024
0 of 14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet