Skip to content

Conversation

@Okabe-Rintarou-0
Copy link
Member

@Okabe-Rintarou-0 Okabe-Rintarou-0 commented Jul 12, 2024

What type of PR is this?
feature

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #

Proposal here #397

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


@Okabe-Rintarou-0
Copy link
Member Author

Okabe-Rintarou-0 commented Jul 12, 2024

how should i get the cluster id? Using cluster name will exceed the stack size limit?
may be extract hashName to a public package?

@Okabe-Rintarou-0
Copy link
Member Author

image
It's quite confusing

@LiZhenCheng9527
Copy link
Contributor

Some files need to be updated, please run 'make gen' and include any changed files in your PR

@codecov
Copy link

codecov bot commented Jul 13, 2024

Codecov Report

Attention: Patch coverage is 64.06250% with 23 lines in your changes missing coverage. Please review.

Project coverage is 49.54%. Comparing base (1902392) to head (e9af645).
Report is 48 commits behind head on main.

Files with missing lines Patch % Lines
pkg/cache/v2/cluster.go 79.41% 5 Missing and 2 partials ⚠️
pkg/bpf/bpf.go 0.00% 5 Missing ⚠️
pkg/utils/hash_name.go 0.00% 4 Missing ⚠️
pkg/controller/ads/cache.go 66.66% 3 Missing ⚠️
pkg/controller/controller.go 0.00% 3 Missing ⚠️
pkg/controller/workload/workload_processor.go 50.00% 1 Missing ⚠️
Files with missing lines Coverage Δ
pkg/bpf/ads/loader.go 47.43% <100.00%> (+1.38%) ⬆️
pkg/bpf/workload/sock_connection.go 54.01% <ø> (ø)
pkg/bpf/workload/sock_ops.go 55.88% <ø> (ø)
pkg/bpf/workload/xdp.go 48.57% <ø> (ø)
pkg/controller/ads/ads_controller.go 80.70% <100.00%> (ø)
pkg/controller/ads/ads_processor.go 75.33% <100.00%> (ø)
pkg/controller/client.go 65.62% <100.00%> (ø)
pkg/controller/workload/workload_processor.go 61.67% <50.00%> (+2.51%) ⬆️
pkg/controller/ads/cache.go 48.67% <66.66%> (+0.16%) ⬆️
pkg/controller/controller.go 0.00% <0.00%> (ø)
... and 3 more

... and 10 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a4ca182...e9af645. Read the comment docs.

@Okabe-Rintarou-0
Copy link
Member Author

/unhold

@Okabe-Rintarou-0 Okabe-Rintarou-0 force-pushed the feat/cluster_conn branch 2 times, most recently from 0119ade to a9e3c58 Compare July 16, 2024 14:06
@Okabe-Rintarou-0 Okabe-Rintarou-0 changed the title [WIP] maintain cluster active connection counter for circuit breaker maintain cluster active connection counter for circuit breaker Jul 21, 2024
@Okabe-Rintarou-0
Copy link
Member Author

Okabe-Rintarou-0 commented Jul 21, 2024

I use a special way to reject connection, plz check MARK_REJECTED macro.

Tested in oe23.03.

Copy link
Member

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly LGTM

Defer to @nlgwcy for the connection reset


core.ApiStatus api_status = 128;
string name = 1;
uint32 id = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weired, seems the tag number is random

stats = kmesh_map_lookup_elem(&map_of_cluster_stats, key);
if (!stats) {
struct cluster_stats new_stats = {0};
new_stats.active_connections = delta;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it happen delta = -1 here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, need a check here.

"Current active connections %d exceeded max connections %d, reject connection\n",
stats->active_connections,
cbs->max_connections);
return 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should always use 0 as success and negative number as error code

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

(ctx)->user_ip4 = (address)->ipv4; \
(ctx)->user_port = (address)->port

#define MARK_REJECTED(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what is this for

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cgroups/connect4 and sockops both use cluster_manager, without it will compile fail

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In kernel-native mode, if the running environment is not enhanced kernel(that is, L7 kernel traffic management is not supported), Kmesh completes L4 traffic management in the connect hook.

// cgroup_sock.c
cgroup_connect4_prog -> sock4_traffic_control -> listener_manager -> SEC_TAIL(KMESH_TAIL_CALL_FILTER_CHAIN) -> SEC_TAIL(KMESH_TAIL_CALL_FILTER) -> tcp_proxy_manager -> SEC_TAIL(KMESH_TAIL_CALL_CLUSTER) -> on_cluster_sock_bind

So, on this path, if circuit breaker is triggered, the service also needs to be rejected.

}
break;
case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
if (bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_STATE_CB_FLAG) != 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why set cb for inbound

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will delete it

Comment on lines +85 to +83
if (daddr == 0 && dport == 0) {
tcp_set_state(sk, TCP_CLOSE);
sk->sk_route_caps = 0;
inet_sk(sk)->inet_dport = 0;
err = -1;
goto out;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nlgwcy to review

stats->active_connections,
cbs->max_connections);
return 0;
return 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1

Signed-off-by: Okabe-Rintarou-0 <923048992@qq.com>
Signed-off-by: 923048992@qq.com <923048992@qq.com>
Signed-off-by: Okabe-Rintarou-0 <923048992@qq.com>
Signed-off-by: Okabe-Rintarou-0 <923048992@qq.com>
Signed-off-by: Okabe-Rintarou-0 <923048992@qq.com>
Signed-off-by: Okabe-Rintarou-0 <923048992@qq.com>
Signed-off-by: Okabe-Rintarou-0 <923048992@qq.com>
Signed-off-by: Okabe-Rintarou-0 <923048992@qq.com>
Signed-off-by: Okabe-Rintarou-0 <923048992@qq.com>
Signed-off-by: 923048992@qq.com <923048992@qq.com>
Signed-off-by: Okabe-Rintarou-0 <923048992@qq.com>
Signed-off-by: 923048992@qq.com <923048992@qq.com>
Signed-off-by: 923048992@qq.com <923048992@qq.com>
@Okabe-Rintarou-0 Okabe-Rintarou-0 force-pushed the feat/cluster_conn branch 2 times, most recently from b9585ec to 2510d8f Compare October 24, 2024 14:25
* limitations under the License.
* Author: nlgwcy
* Create: 2022-02-26
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

"kmesh.net/kmesh/daemon/options"
)

type BpfTracePoint struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have split bpf progs, should not revert

// resourceHash[0]:cds resourceHash[1]:eds
resourceHash map[string][2]uint64
resourceHash map[string][2]uint64
hashName *utils.HashName
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is persisting in hashName, do we need it here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not, i would think we can pkg github.com/google/uuid to generate uuid


for it.Next(&key, &value) {
if key.ClusterId == clusterId {
log.Debugf("remove cluster stats with key %v", key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove

Signed-off-by: 923048992@qq.com <923048992@qq.com>
Copy link
Member

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#570 (comment) can be done later

/lgtm

@kmesh-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hzxuzhonghu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kmesh-bot kmesh-bot merged commit a94c2d1 into kmesh-net:main Oct 30, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants