
Split Brain on HTTP requests for features flag requests : Dacha restart fixes #1102

Closed
chkp-yaakovg opened this issue Jan 3, 2024 · 5 comments
Labels
waiting on feedback Waiting for feedback from user

Comments

@chkp-yaakovg

Describe the bug
HTTP requests to FeatureHub respond with a different number/size of feature flags between requests.
Restarting Dacha fixes the issue; it looks like some sort of split brain.

Which area does this issue belong to?

  • FeatureHub Admin Web app
  • SDK
  • SDK examples
  • Documentation
  • Other

To Reproduce
Steps to reproduce the behavior:

  1. Deploy FeatureHub 1.6.3 with the Helm chart defaults (2 Dacha replicas)
  2. Add and remove feature flags over time
  3. Send requests with a client-eval API key over time
  4. Add and remove features over time
  5. Observe that some requests to the features/ endpoint return a different set/number of flags when using client eval
  6. Restart a Dacha pod: this fixes the issue
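Step 5 can be checked mechanically by diffing the flag keys returned by successive requests. A minimal, hypothetical sketch in Python (the payload shape, a list of environments each carrying a `features` array of objects with a `key` field, is an assumption about the client-eval features/ response; `flag_keys`, `diff_flag_keys`, and the sample payloads are made up for illustration):

```python
# Hypothetical helper: compare the sets of feature keys returned by two
# successive GET features/ responses (client-eval). A non-empty diff on a
# quiescent system suggests the Dacha caches have diverged (split brain).

def flag_keys(payload):
    """Collect (environment id, feature key) pairs from a features/ response.

    Assumes the response is a list of environment objects, each with a
    'features' list whose entries have a 'key' field.
    """
    return {
        (env.get("id"), feature["key"])
        for env in payload
        for feature in env.get("features", [])
    }

def diff_flag_keys(resp_a, resp_b):
    """Return keys present in one response but not the other."""
    a, b = flag_keys(resp_a), flag_keys(resp_b)
    return {"only_in_a": a - b, "only_in_b": b - a}

if __name__ == "__main__":
    # Two hypothetical responses, as if served by two diverged Dacha pods.
    healthy = [{"id": "env1", "features": [{"key": "FLAG_A"}, {"key": "FLAG_B"}]}]
    stale = [{"id": "env1", "features": [{"key": "FLAG_A"}]}]
    print(diff_flag_keys(healthy, stale))
    # {'only_in_a': {('env1', 'FLAG_B')}, 'only_in_b': set()}
```

Running this against responses captured a few seconds apart (with no feature changes in between) would make the inconsistency easy to demonstrate.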

Expected behavior
All requests to features/ should return the same number of flags, updated according to changes made in MR

Versions

  • FeatureHub version 1.6.3
  • K8s on AWS EKS 1.28
  • Helm chart version: 4.0.5
  • Container image: featurehub/dacha2:1.6.3

Additional context
The following errors appear in the Dacha logs:
"data": { "@timestamp": "2023-12-29T17:37:31.133+0000", "connect.response.statusCode": "200", "connect.rest.method": "responded: GET - http://10.26.49.151:8701/metrics", "host": "featurehub-dacha-68c4d6c968-j4jdp", "message": "An I/O error has occurred while writing a response message entity to the container output stream.", "path": "org.glassfish.jersey.server.ServerRuntime$Responder", "priority": "ERROR", "stack_trace": "org.glassfish.jersey.server.internal.process.MappableException: java.io.IOException: Connection is closed\n\torg.glassfish.jersey.server.internal.MappableExceptionWrapperInterceptor.aroundWriteTo(MappableExceptionWrapperInterceptor.java:67) ~[jersey-server-3.1.1.jar:?]\n\torg.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:139) ~[jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.message.internal.MessageBodyFactory.writeTo(MessageBodyFactory.java:1116) ~[jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.server.ServerRuntime$Responder.writeResponse(ServerRuntime.java:677) [jersey-server-3.1.1.jar:?]\n\torg.glassfish.jersey.server.ServerRuntime$Responder.processResponse(ServerRuntime.java:385) [jersey-server-3.1.1.jar:?]\n\torg.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:375) [jersey-server-3.1.1.jar:?]\n\torg.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:264) [jersey-server-3.1.1.jar:?]\n\torg.glassfish.jersey.internal.Errors$1.call(Errors.java:248) [jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.internal.Errors$1.call(Errors.java:244) [jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.internal.Errors.process(Errors.java:292) [jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.internal.Errors.process(Errors.java:274) [jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.internal.Errors.process(Errors.java:244) [jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265) 
[jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:240) [jersey-server-3.1.1.jar:?]\n\torg.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:697) [jersey-server-3.1.1.jar:?]\n\torg.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.service(GrizzlyHttpContainer.java:367) [jersey-container-grizzly2-http-3.1.1.jar:?]\n\torg.glassfish.grizzly.http.server.HttpHandler$1.run(HttpHandler.java:190) [grizzly-http-server-4.0.0.jar:4.0.0]\n\torg.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:535) [grizzly-framework-4.0.0.jar:4.0.0]\n\torg.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:515) [grizzly-framework-4.0.0.jar:4.0.0]\n\tjava.lang.Thread.run(Unknown Source) [?:?]\n\tCaused by: java.io.IOException: Connection is closed\n\torg.glassfish.grizzly.nio.NIOConnection.assertOpen(NIOConnection.java:420) ~[grizzly-framework-4.0.0.jar:4.0.0]\n\torg.glassfish.grizzly.http.io.OutputBuffer.write(OutputBuffer.java:613) ~[grizzly-http-4.0.0.jar:4.0.0]\n\torg.glassfish.grizzly.http.server.NIOOutputStreamImpl.write(NIOOutputStreamImpl.java:60) ~[grizzly-http-server-4.0.0.jar:4.0.0]\n\tjava.io.ByteArrayOutputStream.writeTo(Unknown Source) ~[?:?]\n\torg.glassfish.jersey.message.internal.CommittingOutputStream.flushBuffer(CommittingOutputStream.java:278) ~[jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.message.internal.CommittingOutputStream.write(CommittingOutputStream.java:218) ~[jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.logging.BaseFilteringLogger$LoggingStream.write(BaseFilteringLogger.java:340) ~[connect-jersey-common-2.1.jar:?]\n\tjava.io.FilterOutputStream.write(Unknown Source) ~[?:?]\n\torg.glassfish.jersey.message.internal.WriterInterceptorExecutor$UnCloseableOutputStream.write(WriterInterceptorExecutor.java:276) ~[jersey-common-3.1.1.jar:?]\n\tsun.nio.cs.StreamEncoder.writeBytes(Unknown Source) 
~[?:?]\n\tsun.nio.cs.StreamEncoder.implWrite(Unknown Source) ~[?:?]\n\tsun.nio.cs.StreamEncoder.implWrite(Unknown Source) ~[?:?]\n\tsun.nio.cs.StreamEncoder.write(Unknown Source) ~[?:?]\n\tsun.nio.cs.StreamEncoder.write(Unknown Source) ~[?:?]\n\tjava.io.OutputStreamWriter.write(Unknown Source) ~[?:?]\n\torg.glassfish.jersey.message.internal.ReaderWriter.writeToAsString(ReaderWriter.java:238) ~[jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.message.internal.AbstractMessageReaderWriterProvider.writeToAsString(AbstractMessageReaderWriterProvider.java:107) ~[jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.message.internal.StringMessageProvider.writeTo(StringMessageProvider.java:76) ~[jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.message.internal.StringMessageProvider.writeTo(StringMessageProvider.java:36) ~[jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.invokeWriteTo(WriterInterceptorExecutor.java:242) ~[jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.aroundWriteTo(WriterInterceptorExecutor.java:227) ~[jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:139) ~[jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.logging.BaseFilteringLogger.aroundWriteTo(BaseFilteringLogger.java:264) ~[connect-jersey-common-2.1.jar:?]\n\torg.glassfish.jersey.logging.FilteringServerLoggingFilter.aroundWriteTo(FilteringServerLoggingFilter.java:24) ~[connect-jersey-common-2.1.jar:?]\n\torg.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:139) ~[jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.server.internal.JsonWithPaddingInterceptor.aroundWriteTo(JsonWithPaddingInterceptor.java:85) 
~[jersey-server-3.1.1.jar:?]\n\torg.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:139) ~[jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.spi.ContentEncoder.aroundWriteTo(ContentEncoder.java:113) ~[jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:139) ~[jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.server.internal.MappableExceptionWrapperInterceptor.aroundWriteTo(MappableExceptionWrapperInterceptor.java:61) ~[jersey-server-3.1.1.jar:?]\n\tCaused by: java.io.IOException: Locally closed\n\t", "thread": "grizzly-http-server-3" },

Also seen in the logs: received update for unknown feature 492b9cc7-edef-457e-bc55-xxxxxxxxxx

@rvowles
Contributor

rvowles commented Jan 4, 2024

Heya - could you consider upgrading to 1.7.0, please? We went on a big bug hunt in that version for Dacha2 edge cases, particularly around webhooks, and there were some concurrency issues we managed to iron out. We will, however, add a longer-running e2e test to the suite that checks this; we will need to add some test APIs that let us knock out the Dacha2 instances so they lose state. Are you running your EKS on Fargate (which will use Karpenter), or are you using something like Karpenter yourselves?

That stack trace indicates that your Prometheus server is terminating the GET request early, so while the server is writing out the metrics it gets a socket-closed fault (which it logs).

@chkp-yaakovg
Author

chkp-yaakovg commented Jan 4, 2024

Hi @rvowles, thanks for the answer. We updated from 1.6.0 to 1.6.3 two weeks ago; on a mostly-production system we were worried that the jump to 1.7.0 was too big.
After discussing with my team:
We were running on EKS EC2 instances of type t3.medium, with 2 replicas each for Dacha and Edge.
Karpenter is on our roadmap; we currently use Auto Scaling groups.
We have now changed to instance type m5.large with 3 replicas each for Dacha and Edge, and added topologySpreadConstraints across availability zones (AZs).
We noticed that Network In (bytes) and Network Out (bytes) increased slightly.
Perhaps the application is sensitive to network issues.
We will plan an update to 1.7.0.
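For reference, the zone spread described above can be expressed roughly as follows in a pod spec. This is a generic Kubernetes sketch, not the FeatureHub Helm chart's actual values schema; the label selector and label values are assumptions and would need to match the chart's real pod labels.

```yaml
# Hypothetical sketch: spread Dacha pods evenly across availability zones.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: featurehub-dacha   # assumed label; match your chart's pod labels
```

With `whenUnsatisfiable: ScheduleAnyway` the constraint is best-effort; `DoNotSchedule` would enforce it strictly at the cost of possible pending pods.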

@rvowles
Contributor

rvowles commented Jan 26, 2024

1.7.1 is already out, by the way.

@rvowles rvowles added the waiting on feedback Waiting for feedback from user label Jan 26, 2024
@chkp-yaakovg
Author

Thanks, please close.

@rvowles
Contributor

rvowles commented Feb 15, 2024

@chkp-yaakovg if you need any further help, feel free to open a discussion item or, ideally, jump onto our Slack!
