Cleanup Task Difference #1392

Closed
9 tasks done
simonhir opened this issue Mar 1, 2024 · 6 comments
@simonhir
Member

simonhir commented Mar 1, 2024

Clean up the existing task difference between the engine and the tasklist. The difference is visible through the new monitoring. To better monitor newly occurring differences for #1348, the current difference needs to be cleaned up on all environments.
This can be done by comparing the task entries in the engine DB with the task entries in the tasklist DB and removing all entries from the tasklist that are not present in the engine. If this does not completely fix the difference, we also need to check whether there are tasks in the engine that don't exist in the tasklist.
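
The comparison described above could be sketched roughly as follows. This is only an illustration, assuming the task IDs from both databases have already been exported into two plain collections; `TaskDiff` and `diff_tasks` are hypothetical names and not part of digiwf.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskDiff:
    """Result of comparing the task IDs of the two stores."""
    only_in_tasklist: frozenset  # stale entries: candidates for removal from the tasklist
    only_in_engine: frozenset    # entries the tasklist never received: need investigation


def diff_tasks(engine_ids, tasklist_ids):
    """Compare two collections of task IDs (hypothetical helper)."""
    engine = frozenset(engine_ids)
    tasklist = frozenset(tasklist_ids)
    return TaskDiff(
        only_in_tasklist=tasklist - engine,
        only_in_engine=engine - tasklist,
    )


if __name__ == "__main__":
    # Made-up IDs, purely for illustration.
    result = diff_tasks(["a", "b", "c"], ["b", "c", "d"])
    print("remove from tasklist:", sorted(result.only_in_tasklist))
    print("missing in tasklist:", sorted(result.only_in_engine))
```

Both directions are computed in one pass, so the second check (tasks in the engine that are missing in the tasklist) does not need a separate run.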

After the cleanup, we need an alert to notify us if this problem occurs again.

Acceptance criteria

  • Dev has no task difference
  • Test has no task difference
  • Demo has no task difference
  • Processes-Test has no task difference
  • Processes-Demo has no task difference
  • Processes-Hotfix has no task difference
  • Processes-Training has no task difference
  • Prod has no task difference
  • Alert on Processes-* and Prod on Task-Diff (implemented with digiwf-ops!422)

simonhir added the bug label Mar 1, 2024
simonhir self-assigned this Mar 1, 2024

@darenegade
Member

One could use a clever SQL statement or a script to correct the order of the events in domain_event_entry and then reset the token to December 2023, so that all incorrect events correct themselves again.

@simonhir
Member Author

simonhir commented Apr 9, 2024

The following user tasks are missing from the tasklist on Prod.

Task-ID					Type			Instance-ID
6e9df7c1-c198-11ee-876a-0a580a8a338b	Zurückziehen		6e262f24-c198-11ee-876a-0a580a8a338b
7c5153f5-c190-11ee-876a-0a580a8a338b	Zurückziehen		7bc7d817-c190-11ee-876a-0a580a8a338b
2e2dbec3-c178-11ee-876a-0a580a8a338b	Zurückziehen		2d5119d6-c178-11ee-876a-0a580a8a338b
b1f2537c-bc14-11ee-8ffa-0a580a8a32ae	Zurückziehen		b17c117e-bc14-11ee-8ffa-0a580a8a32ae

The Zurückziehen (withdraw) tasks were deliberately not recreated; their number has also dropped significantly since the last cleanup.

Updated: 29.04.2024

@simonhir
Member Author

Prod tasks (see above) cancelled.

Deleted Processes-Test tasks from December 2023 that had a different application name. They were probably caused by a configuration error.

@simonhir
Member Author

simonhir commented Apr 30, 2024

Deleted the following three tasks from the tasklist:

  • c24d74f1-0624-11ef-b23a-0a580a8a19e8
  • 0012cd9f-e2c9-11ee-99fa-0a580a8a0ff2
  • 273b352b-0075-11ef-88e9-0a580a8a1019

Restarted the following two tasks via Engine Modify:

  • 9e1e7ec0-f710-11ee-a425-0a580a8a1371 (Instance: 9dbb9e9c-f710-11ee-a425-0a580a8a1371)
  • 247022d6-0625-11ef-b23a-0a580a8a19e8 (Instance: 24118887-0625-11ef-b23a-0a580a8a19e8)

@simonhir
Member Author

simonhir commented May 3, 2024

Processes-Demo task 480c2b60-0869-11ef-8770-0a580a8a2e6e is missing in the tasklist.
The digiwf-engine-service log shows the following error:

Log
EventListener [RoutingKafkaEventPublisher] failed to handle event [576e8609-ea8b-4751-84a6-e6998786c8d0] (io.holunda.camunda.taskpool.api.task.TaskCreatedEngineEvent). Continuing processing with next listener
org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.
Wrapped by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.
	at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:97)
	at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:79)
	at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:30)
	at org.axonframework.extensions.kafka.eventhandling.producer.KafkaPublisher.waitForPublishAck(KafkaPublisher.java:194)
	... 26 common frames omitted
Wrapped by: org.axonframework.messaging.EventPublicationFailedException: Event publication failed, exception occurred while waiting for event publication.
	at org.axonframework.extensions.kafka.eventhandling.producer.KafkaPublisher.waitForPublishAck(KafkaPublisher.java:203)
	at org.axonframework.extensions.kafka.eventhandling.producer.KafkaPublisher.send(KafkaPublisher.java:162)
	at org.axonframework.extensions.kafka.eventhandling.producer.KafkaEventPublisher.handle(KafkaEventPublisher.java:80)
	at de.muenchen.oss.digiwf.task.polyflow.kafka.RoutingKafkaEventPublisher.handle(RoutingKafkaEventPublisher.java:31)
	at org.axonframework.eventhandling.SimpleEventHandlerInvoker.invokeHandlers(SimpleEventHandlerInvoker.java:128)
	at org.axonframework.eventhandling.SimpleEventHandlerInvoker.handle(SimpleEventHandlerInvoker.java:114)
	at org.axonframework.eventhandling.MultiEventHandlerInvoker.handle(MultiEventHandlerInvoker.java:91)
	at org.axonframework.eventhandling.AbstractEventProcessor.processMessageInUnitOfWork(AbstractEventProcessor.java:195)
	at org.axonframework.eventhandling.AbstractEventProcessor.lambda$null$1(AbstractEventProcessor.java:173)
	at org.axonframework.messaging.DefaultInterceptorChain.proceed(DefaultInterceptorChain.java:57)
	at org.axonframework.messaging.interceptors.CorrelationDataInterceptor.handle(CorrelationDataInterceptor.java:67)
	at org.axonframework.messaging.DefaultInterceptorChain.proceed(DefaultInterceptorChain.java:55)
	at org.axonframework.eventhandling.TrackingEventProcessor.lambda$new$1(TrackingEventProcessor.java:181)
	at org.axonframework.messaging.DefaultInterceptorChain.proceed(DefaultInterceptorChain.java:55)
	at org.axonframework.eventhandling.AbstractEventProcessor.lambda$null$2(AbstractEventProcessor.java:174)
	at org.axonframework.tracing.Span.runCallable(Span.java:132)
	at org.axonframework.eventhandling.AbstractEventProcessor.lambda$null$3(AbstractEventProcessor.java:170)
	at org.axonframework.messaging.unitofwork.BatchingUnitOfWork.executeWithResult(BatchingUnitOfWork.java:92)
	at org.axonframework.eventhandling.AbstractEventProcessor.lambda$processInUnitOfWork$4(AbstractEventProcessor.java:166)
	at org.axonframework.tracing.Span.runCallable(Span.java:132)
	at org.axonframework.eventhandling.AbstractEventProcessor.processInUnitOfWork(AbstractEventProcessor.java:165)
	at org.axonframework.eventhandling.TrackingEventProcessor.processBatch(TrackingEventProcessor.java:491)
	at org.axonframework.eventhandling.TrackingEventProcessor.processingLoop(TrackingEventProcessor.java:316)
	at org.axonframework.eventhandling.TrackingEventProcessor$TrackingSegmentWorker.run(TrackingEventProcessor.java:1200)
	at org.axonframework.eventhandling.TrackingEventProcessor$WorkerLauncher.cleanUp(TrackingEventProcessor.java:1402)
	at org.axonframework.eventhandling.TrackingEventProcessor$WorkerLauncher.run(TrackingEventProcessor.java:1379)
	at java.base/java.lang.Thread.run(Thread.java:840)

Processes-Test task 2f402ec0-0861-11ef-8959-0a580a8a36ac was not deleted from the tasklist.
The digiwf-engine-service log shows the following error:

Log
EventListener [RoutingKafkaEventPublisher] failed to handle event [a79cafcd-4201-4194-af1c-e9337fc0e16c] (io.holunda.camunda.taskpool.api.task.TaskCompletedEngineEvent). Continuing processing with next listener
org.apache.kafka.common.errors.NetworkException: Disconnected from node 2
Wrapped by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.NetworkException: Disconnected from node 2
	at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:97)
	at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:79)
	at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:30)
	at org.axonframework.extensions.kafka.eventhandling.producer.KafkaPublisher.waitForPublishAck(KafkaPublisher.java:194)
	... 26 common frames omitted
Wrapped by: org.axonframework.messaging.EventPublicationFailedException: Event publication failed, exception occurred while waiting for event publication.
	at org.axonframework.extensions.kafka.eventhandling.producer.KafkaPublisher.waitForPublishAck(KafkaPublisher.java:203)
	at org.axonframework.extensions.kafka.eventhandling.producer.KafkaPublisher.send(KafkaPublisher.java:162)
	at org.axonframework.extensions.kafka.eventhandling.producer.KafkaEventPublisher.handle(KafkaEventPublisher.java:80)
	at de.muenchen.oss.digiwf.task.polyflow.kafka.RoutingKafkaEventPublisher.handle(RoutingKafkaEventPublisher.java:31)
	at org.axonframework.eventhandling.SimpleEventHandlerInvoker.invokeHandlers(SimpleEventHandlerInvoker.java:128)
	at org.axonframework.eventhandling.SimpleEventHandlerInvoker.handle(SimpleEventHandlerInvoker.java:114)
	at org.axonframework.eventhandling.MultiEventHandlerInvoker.handle(MultiEventHandlerInvoker.java:91)
	at org.axonframework.eventhandling.AbstractEventProcessor.processMessageInUnitOfWork(AbstractEventProcessor.java:195)
	at org.axonframework.eventhandling.AbstractEventProcessor.lambda$null$1(AbstractEventProcessor.java:173)
	at org.axonframework.messaging.DefaultInterceptorChain.proceed(DefaultInterceptorChain.java:57)
	at org.axonframework.messaging.interceptors.CorrelationDataInterceptor.handle(CorrelationDataInterceptor.java:67)
	at org.axonframework.messaging.DefaultInterceptorChain.proceed(DefaultInterceptorChain.java:55)
	at org.axonframework.eventhandling.TrackingEventProcessor.lambda$new$1(TrackingEventProcessor.java:181)
	at org.axonframework.messaging.DefaultInterceptorChain.proceed(DefaultInterceptorChain.java:55)
	at org.axonframework.eventhandling.AbstractEventProcessor.lambda$null$2(AbstractEventProcessor.java:174)
	at org.axonframework.tracing.Span.runCallable(Span.java:132)
	at org.axonframework.eventhandling.AbstractEventProcessor.lambda$null$3(AbstractEventProcessor.java:170)
	at org.axonframework.messaging.unitofwork.BatchingUnitOfWork.executeWithResult(BatchingUnitOfWork.java:92)
	at org.axonframework.eventhandling.AbstractEventProcessor.lambda$processInUnitOfWork$4(AbstractEventProcessor.java:166)
	at org.axonframework.tracing.Span.runCallable(Span.java:132)
	at org.axonframework.eventhandling.AbstractEventProcessor.processInUnitOfWork(AbstractEventProcessor.java:165)
	at org.axonframework.eventhandling.TrackingEventProcessor.processBatch(TrackingEventProcessor.java:491)
	at org.axonframework.eventhandling.TrackingEventProcessor.processingLoop(TrackingEventProcessor.java:316)
	at org.axonframework.eventhandling.TrackingEventProcessor$TrackingSegmentWorker.run(TrackingEventProcessor.java:1200)
	at org.axonframework.eventhandling.TrackingEventProcessor$WorkerLauncher.cleanUp(TrackingEventProcessor.java:1402)
	at org.axonframework.eventhandling.TrackingEventProcessor$WorkerLauncher.run(TrackingEventProcessor.java:1379)
	at java.base/java.lang.Thread.run(Thread.java:840)

Both errors were probably caused by the hotfix rollout.

@simonhir
Member Author

simonhir commented May 8, 2024

Tasks remain synchronised, and the main problem is fixed. Differences can still occur when the Kafka cluster is unavailable, because Axon-Kafka does not retry failed publications. Such differences are detected by the existing monitoring and can then be analysed separately if necessary.
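
To illustrate what a retry around event publication generally looks like, here is a minimal sketch with exponential backoff. It is not the Axon-Kafka API and not something implemented in this issue; `publish_with_retry`, its parameters, and the use of `ConnectionError` as a stand-in for transient broker errors are all assumptions made for the example.

```python
import time


def publish_with_retry(publish, event, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call publish(event) and retry transient failures with exponential backoff.

    Returns the number of attempts that were needed. If every attempt fails,
    the last error is re-raised so the caller can still log or alert on it.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            publish(event)
            return attempt
        except ConnectionError:  # assumption: stands in for transient broker errors
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_publish(event):
        # Hypothetical publisher that fails twice before succeeding.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("broker unavailable")

    attempts = publish_with_retry(flaky_publish, "TaskCreatedEngineEvent", sleep=lambda s: None)
    print("published after", attempts, "attempts")
```

A retry like this only narrows the window; if the cluster stays down longer than the retries last, the event is still lost, which is why the monitoring remains necessary.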

simonhir closed this as completed May 8, 2024