Skip to content

[improve][cli] pulsar-perf: refactor to reduce code duplication#8

Closed
pgier wants to merge 8 commits intomasterfrom
pulsar-perf-cli-refactor
Closed

[improve][cli] pulsar-perf: refactor to reduce code duplication#8
pgier wants to merge 8 commits intomasterfrom
pulsar-perf-cli-refactor

Conversation

@pgier
Copy link
Owner

@pgier pgier commented Dec 16, 2022

CI run for apache#19279

@pgier pgier force-pushed the pulsar-perf-cli-refactor branch 4 times, most recently from d83ad42 to 1c3a319 Compare December 17, 2022 16:22
@github-actions
Copy link

The pr had no activity for 30 days, mark with Stale label.

@github-actions github-actions bot added the Stale label Jan 17, 2023
@pgier pgier force-pushed the pulsar-perf-cli-refactor branch 6 times, most recently from cd04110 to 3f01ce1 Compare January 18, 2023 22:20
@pgier pgier force-pushed the pulsar-perf-cli-refactor branch from 3f01ce1 to 6fb9cc8 Compare February 1, 2023 16:45
@pgier pgier force-pushed the pulsar-perf-cli-refactor branch from 6fb9cc8 to 44f676d Compare February 17, 2023 16:21
michaeljmarshall and others added 7 commits May 19, 2023 19:19
…essage skip to avoid unnecessary consumption stuck (apache#20335)

### Motivation
- apache#7105 provide a mechanism to avoid a stuck consumer affecting the consumption of other consumers: 
  - if all consumers can not accept more messages, stop delivering messages to the client.
  - if one consumer can not accept more messages, just read new messages and deliver them to other consumers.
- apache#7553 provide a mechanism to fix the issue of lost order of consumption: If the consumer cannot accept any more messages, skip the consumer for the next round of message delivery because there may be messages with the same key in the replay queue.
- apache#10762 provide a mechanism to fix the issue of lost order of consumption: If there have any messages with the same key in the replay queue, do not deliver the new messages to this consumer.

apache#10762 and apache#7553 do the same thing and apache#10762 is better than apache#7553 , so apache#7553 is unnecessary. 

### Modifications
remove the mechanism provided by apache#7553 to avoid unnecessary consumption stuck.
…lict (apache#20359)

Signed-off-by: Paul Gier <paul.gier@datastax.com>
…e#20315)

### Motivation

Load balancer extension needs to shut down gracefully, especially when shutting down the leader broker. When the leader broker closes the leader election service too late, service unit lookups to the leader broker could fail during the shutdown. This could delay client reconnection time.

Lookup failure logs
```
(shutdown starts)
pulsar-broker-8 pulsar-broker 2023-04-22T00:19:52,630+0000 [pulsar-service-shutdown] INFO  org.apache.pulsar.broker.PulsarService - Closing PulsarService
pulsar-broker-3 pulsar-broker 2023-04-22T00:19:52,690+0000 [pulsar-io-4-5] INFO  org.apache.pulsar.client.impl.ConnectionHandler - [persistent://pulsar/system/loadbalancer-service-unit-state] [pulsar-18-9] Closed connection [id: 0x907e74b0, L:/10.0.13.6:35838 ! R:pulsar-broker-8.pulsar-broker.pulsar.svc.cluster.local/10.0.18.18:6650] -- Will try again in 0.1 s
pulsar-broker-3 pulsar-broker 2023-04-22T00:19:52,691+0000 [pulsar-io-4-5] INFO  org.apache.pulsar.client.impl.ConnectionHandler - [persistent://pulsar/system/loadbalancer-service-unit-state] [reader-30e6b40dd9] Closed connection [id: 0x907e74b0, L:/10.0.13.6:35838 ! R:pulsar-broker-8.pulsar-broker.pulsar.svc.cluster.local/10.0.18.18:6650] -- Will try again in 0.1 s

(znode is gone)
pulsar-broker-8 pulsar-broker 2023-04-22T00:19:52,652+0000 [pulsar-load-manager-1-1] INFO  org.apache.pulsar.broker.loadbalance.extensions.channel.ServiceUnitStateChannelImpl - BrokerRegistry detected the broker:pulsar-broker-8.pulsar-broker.pulsar.svc.cluster.local:8080 registry has been deleted.

(lookup failure)
org.apache.pulsar.client.impl.ConnectionHandler - [persistent://pulsar/system/loadbalancer-service-unit-state] [reader-30e6b40dd9] Could not get connection to broker: org.apache.pulsar.client.api.PulsarClientException$ConnectException: Disconnected from server at pulsar-broker-3.pulsar-broker.pulsar.svc.cluster.local/10.0.13.6:6650 -- Will try again in 0.193 s
pulsar-broker-3 pulsar-broker 2023-04-22T00:19:52,827+0000 [main-EventThread] INFO  org.apache.pulsar.broker.lookup.TopicLookupBase - Failed to lookup null for topic persistent://pulsar/system/loadbalancer-service-unit-state with error Failed to look up a broker registry:pulsar-broker-8.pulsar-broker.pulsar.svc.cluster.local:8080 for bundle:pulsar/system/0x40000000_0x50000000
pulsar-broker-3 pulsar-broker 2023-04-22T00:19:52,827+0000 [main-EventThread] INFO  org.apache.pulsar.broker.lookup.TopicLookupBase - Failed to lookup null for topic persistent://pulsar/system/loadbalancer-service-unit-state with error Failed to look up a broker registry:pulsar-broker-8.pulsar-broker.pulsar.svc.cluster.local:8080 for bundle:pulsar/system/0x40000000_0x50000000

(leader election service has been closed)
pulsar-broker-3 pulsar-broker 2023-04-22T00:19:55,569+0000 [pulsar-load-manager-1-1] INFO  org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl - This broker:pulsar-broker-3.pulsar-broker.pulsar.svc.cluster.local:8080 plays the leader now.

```

### Modifications
During the shutdown flow, when calling loadbalancer.disableBroker(), the load balancer extension gracefully gives up the owned bundles to other brokers. After that, it closes the leader election service and removes its register in the metadata store.
PIP: apache#16691

### Motivation
When upgrading the pulsar version and changing the pulsar load manager to `ExtensibleLoadManagerImpl` it might cause NPE. The root cause is the old version of pulsar does not contain the `loadManagerClassName` field.
```
2023-05-18T05:42:50,557+0000 [pulsar-io-4-1] INFO  org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.6:51345] connected with role=[pulsarinstance-v3-0-n@test.dev](mailto:pulsarinstance-v3-0-n@test.dev) using authMethod=token, clientVersion=Pulsar Go 0.9.0, clientProtocolVersion=18, proxyVersion=null
2023-05-18T05:42:50,558+0000 [pulsar-io-4-1] WARN  org.apache.pulsar.broker.lookup.TopicLookupBase - Failed to lookup [pulsarinstance-v3-0-n@test.dev](mailto:pulsarinstance-v3-0-n@test.dev) for topic persistent://xxx with error java.lang.NullPointerException: Cannot invoke “String.equals(Object)” because the return value of “org.apache.pulsar.broker.loadbalance.extensions.data.BrokerLookupData.getLoadManagerClassName()” is null
java.util.concurrent.CompletionException: java.lang.NullPointerException: Cannot invoke “String.equals(Object)” because the return value of “org.apache.pulsar.broker.loadbalance.extensions.data.BrokerLookupData.getLoadManagerClassName()” is null
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315) ~[?:?]
	at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1194) ~[?:?]
	at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2309) ~[?:?]
	at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.selectAsync(ExtensibleLoadManagerImpl.java:385) ~[io.streamnative-pulsar-broker-3.0.0.1.jar:3.0.0.1]
	at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.lambda$assign$6(ExtensibleLoadManagerImpl.java:336) ~[io.streamnative-pulsar-broker-3.0.0.1.jar:3.0.0.1]
	at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1187) ~[?:?]
	at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2309) ~[?:?]
	at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.lambda$assign$10(ExtensibleLoadManagerImpl.java:333) ~[io.streamnative-pulsar-broker-3.0.0.1.jar:3.0.0.1]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.put(ConcurrentOpenHashMap.java:409) ~[io.streamnative-pulsar-common-3.0.0.1.jar:3.0.0.1]
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.computeIfAbsent(ConcurrentOpenHashMap.java:243) ~[io.streamnative-pulsar-common-3.0.0.1.jar:3.0.0.1]
	at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.assign(ExtensibleLoadManagerImpl.java:327) ~[io.streamnative-pulsar-broker-3.0.0.1.jar:3.0.0.1]
	at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerWrapper.findBrokerServiceUrl(ExtensibleLoadManagerWrapper.java:66) ~[io.streamnative-pulsar-broker-3.0.0.1.jar:3.0.0.1]
	at org.apache.pulsar.broker.namespace.NamespaceService.lambda$getBrokerServiceUrlAsync$0(NamespaceService.java:191) ~[io.streamnative-pulsar-broker-3.0.0.1.jar:3.0.0.1]
```

### Modifications

* Add null check when using`getLoadManagerClassName`.
* Add test to cover this case.
* Add `RedirectManager` unit test.
@pgier pgier force-pushed the pulsar-perf-cli-refactor branch 2 times, most recently from fa77970 to ceeb6d6 Compare May 22, 2023 14:48
* reduce code duplication between perf clients
* move common topic args and validation into new class PerformanceTopicListArguments
* improve CLI arg validation
* deprecate inconsistent args

Signed-off-by: Paul Gier <paul.gier@datastax.com>
@pgier pgier force-pushed the pulsar-perf-cli-refactor branch from ceeb6d6 to 3e52d72 Compare May 22, 2023 14:51
@pgier
Copy link
Owner Author

pgier commented Mar 13, 2024

Upstream PR is merged

@pgier pgier closed this Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants