Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix re-setting properties on BytesMessage #1042

Closed
wants to merge 22 commits into from
Closed

Conversation

me3x
Copy link

@me3x me3x commented Nov 21, 2019

Sleuth JMS instrumentation cleared all properties on ActiveMQBytesMessage and was unable to put them back.

fix MessageNotWriteableException - Message body is read-only while re-setting properties on BytesMessage

please squash commits with merge

Copy link
Member

@jeqo jeqo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@me3x thanks for this PR! I've added some comments to do some small refactoring. But before moving forward, this issue with BytesMessage is specific to ActiveMQ as far as I remember. If we introduce this changes, we have to make sure is not impacting users that are using other JMS providers. Have you had the opportunity to check it with other providers?

On my side, I'll try to find time to do this check with IBM MQ or Oracle JMS and validate if it works before/after this changes.

me3x and others added 2 commits November 25, 2019 10:54
Co-Authored-By: Jorge Quilcate Otoya <quilcate.jorge@gmail.com>
@me3x
Copy link
Author

me3x commented Nov 25, 2019

I only tested with ActiveMQ. Please check.

@me3x
Copy link
Author

me3x commented Nov 25, 2019

@jeqo could you please help me to fix a test? I believe it shouldn't be failing. It probably relies on clearing properties even if there's no need to clear.

ITJmsTracingMessageConsumer>ITJms_1_1_TracingMessageConsumer.receive_customSampler:300

@jeqo
Copy link
Member

jeqo commented Nov 25, 2019

Hey @me3x thanks for the fixes!

I've test it further and actually that test case is preventing to break functionality here. Let me explain:

  • That case is a non-traced producer sending a message, and a traced consumer with a sampler based on queue name.
  • traced consumer should be able to extract context from producer (null in this case) and inject context for further processing.
  • as in the current implementation we are returning when disjoint before reset properties, properties are kept as read-only, failing to inject context later. Check by debugging a breaking point on MessagePropagation:45

Based on this I recommend to remove disjoint flag as we need to reset properties anyway, simplifying loop logic.

@jeqo
Copy link
Member

jeqo commented Nov 25, 2019

btw, I've test it with Oracle WebLogic JMS and works before (doesn't have a similar bug as ActiveMQ) and works after. So, I think we are good to merge on that side.

@jeqo
Copy link
Member

jeqo commented Nov 25, 2019

@me3x , could you re-enable this IT that was validating this issue: ITJms_1_1_TracingMessageConsumer:253

If that one goes green we should be good to merge.

we need to always reset properties
@me3x
Copy link
Author

me3x commented Nov 27, 2019

I removed disjoint check. It's a little bit weird design to rely on properties being writable for this scenario, but let's merge without it.

@me3x
Copy link
Author

me3x commented Nov 27, 2019

@jeqo Should we have similar test with BytesMessage. It failed only without instance of condition. In case of ActiveMQBytesMessage, setObjectProperty will not work after I call bytesMessage.reset();

Copy link
Member

@jeqo jeqo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@me3x just got the unit test enabled to validate this fix. I left some comments as well. Thanks for your patience :)

@me3x
Copy link
Author

me3x commented Nov 28, 2019

@jeqo do you have any idea why it started to fail? [ERROR] ITJmsTracingMessageConsumer>ITJms_1_1_TracingMessageConsumer.receive_resumesTrace_bytes:255->ITJms_1_1_TracingMessageConsumer.receive_resumesTrace:275 expected:<"463ac35c9f6413ad-8fdeb02e21ce5744-1"> but was:

@jeqo
Copy link
Member

jeqo commented Nov 28, 2019

@me3x take a look to the last bytesMessage.reset() on PropertyFilter#resetBytesMessageProperties, it is closing the message again; triggering failure later when injecting context at the end:

  void handleReceive(Message message) {
    if (message == null || tracing.isNoop()) return;
    MessageConsumerRequest request = new MessageConsumerRequest(message, destination(message));

    TraceContextOrSamplingFlags extracted =
      jmsTracing.extractAndClearProperties(extractor, request, message);
    Span span = jmsTracing.nextMessagingSpan(sampler, request, extracted);

    if (!span.isNoop()) {
      span.name("receive").kind(Span.Kind.CONSUMER);
      Destination destination = destination(message);
      if (destination != null) jmsTracing.tagQueueOrTopic(request, span);
      if (remoteServiceName != null) span.remoteServiceName(remoteServiceName);

      // incur timestamp overhead only once
      long timestamp = tracing.clock(span.context()).currentTimeMicroseconds();
      span.start(timestamp).finish(timestamp);
    }
    injector.inject(span.context(), request);
  }

I think we can remove this reset and let JMS to close the message before sending it.

@me3x
Copy link
Author

me3x commented Nov 28, 2019

I tried and it didn't work without reset. It failed on JmsListener ... unable to read, because body was in write mode.

@@ -62,6 +63,9 @@ void handleReceive(Message message) {
span.start(timestamp).finish(timestamp);
}
injector.inject(span.context(), request);
if (message instanceof BytesMessage) {
(BytesMessage message).reset();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@me3x sorry if I wasn't clear enough. This part is not compiling, and actually not needed. I believe jms will take care of the message itself; no need to reset ourselves. I've give a try removing this and verifying ITs locally and all's green :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instrumentation should left the message in the same state it started. I tried to read body in JmsListener and it failed.

@jeqo
Copy link
Member

jeqo commented Nov 28, 2019 via email

@me3x
Copy link
Author

me3x commented Nov 29, 2019

I don't have a test, but here is a stack trace:

Caused by: javax.jms.MessageNotReadableException: Message body is write-only
	at org.apache.activemq.command.ActiveMQBytesMessage.checkWriteOnlyBody(ActiveMQBytesMessage.java:867) ~[activemq-client-5.15.10.jar:5.15.10]
	at org.apache.activemq.command.ActiveMQBytesMessage.initializeReading(ActiveMQBytesMessage.java:872) ~[activemq-client-5.15.10.jar:5.15.10]
	at org.apache.activemq.command.ActiveMQBytesMessage.getBodyLength(ActiveMQBytesMessage.java:203) ~[activemq-client-5.15.10.jar:5.15.10]
	at com.cs.ifs.oxygen.common.JmsUtils.getBytesMessagePayloadAsString(JmsUtils.java:27) ~[classes/:na]
	... proxies and transactions ...

	at com.cs.ifs.oxygen.am.CoreRefDataListener.onMessage(CoreRefDataListener.java:60) ~[classes/:na]

@jeqo
Copy link
Member

jeqo commented Nov 29, 2019

@me3x I won't be able to help much if we don't have a test-case to check this scenario, I'm sorry.

If we want to cover this case where this change is needed, we need test cases first to prove that works and doesn't affect other scenarios.

If we need to review how we are handling JMS messages and if we need to reset them or not at the end, I'd prefer to take this on another PR and only solve issue with ActiveMQ Bytes Message here first.

@me3x
Copy link
Author

me3x commented Nov 29, 2019

from javax.jms.BytesMessage:

  • If {@code clearBody} is called on a message in read-only mode,

  • the message body is cleared and the message is in write-only mode.
  • If a client attempts to read a message in write-only mode, a

  • {@code MessageNotReadableException} is thrown.

@me3x
Copy link
Author

me3x commented Nov 29, 2019

I removed body reset. I'll create another PR

@jeqo
Copy link
Member

jeqo commented Nov 29, 2019

@me3x I've second thoughts about my previous comments. I realize now that we are only "touching" message content on this PR and not in the existing implementation where we deal with properties only, then we have to solve the read-only issue here. Sorry for the confusion.

To have temporal bytesMessage handling on the same place, we can move changes upper to TracingConsumer instead of PropertyFilter. Here is a draft with some comments:

  void handleReceive(Message message) {
    if (message == null || tracing.isNoop()) return;
    MessageConsumerRequest request = new MessageConsumerRequest(message, destination(message));

    byte[] body = null;
    if (message instanceof BytesMessage) {
      try {
        BytesMessage bytesMessage = (BytesMessage) message;
        body = new byte[(int) bytesMessage.getBodyLength()];
        //bytesMessage.reset(); // do we need this reset() before reading?
        bytesMessage.readBytes(body);
        bytesMessage.clearBody(); // any exception after this can lead to loose message
      } catch (Throwable t) {
        propagateIfFatal(t);
        //log(t, "error getting property {0} from message {1}", name, message);
        return;
      }
    }

    TraceContextOrSamplingFlags extracted =
      jmsTracing.extractAndClearProperties(extractor, request, message);
    // ....current code
    injector.inject(span.context(), request);

    if (message instanceof BytesMessage) {
      try {
        BytesMessage bytesMessage = (BytesMessage) message;
        bytesMessage.writeBytes(body);
        bytesMessage.reset();
      } catch (Throwable t) {
        propagateIfFatal(t);
        //log(t, "error getting property {0} from message {1}", name, message);
        //return;
      }
    }
  }

This will also mean we can rollback some of the changes on PropertyFilter back: mainly property names enumeration was enough (we are not using disjoint anymore).

Lastly, I'm worried about the impact of this changes as we are carrying the technical debt of an ActiveMQ bug, as well as touching message content. I'm asking other contributors (@openzipkin/brave) to give feedback here.

There are different options from my point of view: (1) add this workaround and maintain it until activemq client is fixed. (2) move this workaround to a new module (e.g. activemq) which will only impact/help ActiveMQ users (3) have this workaround on another repo (e.g. inside openzipkin-contrib) until bug is fixed.

Sorry if this is late in the review process.

@me3x
Copy link
Author

me3x commented Dec 2, 2019

@jeqo We should read properties immediately after clearing the body and than we can write body back. We don't have to wait. Clearing it is the only way to put the message to write mode, but it will remain writable until reset.

@jeqo
Copy link
Member

jeqo commented Dec 2, 2019

@me3x right, so we can do the write back right after the clearBody isn't it? This is much better. What do you think on moving things to TracingConsumer?

We can test if possible reduce the filter even further by adding (message instanceof BytesMessage && message.getClass().getName().equals("org.apache.activemq.command.ActiveMQBytesMessage")), so we can go for option 1 and keep the fix as part of the current module. I'll test this today.

@me3x
Copy link
Author

me3x commented Dec 2, 2019

@jeqo It's a good idea. I found that it's used also in class JmsTracing and TracingMessageListener, so there are 3 places. Do we have ActiveMQBytesMessage class available? If not, maybe it could be part of some recovery after exception?

@me3x
Copy link
Author

me3x commented Dec 2, 2019

@jeqo I mean current behavior is the worst possible, because we loose user's properties completely.

@me3x
Copy link
Author

me3x commented Dec 2, 2019

we could rethrow MessageNotWriteableException and handle it with clearing the body

Copy link
Member

@codefromthecrypt codefromthecrypt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate a pragmatic approach to this, but this solution is dangerous to the app. Some changes requested:

  • revert the loop logic change as it is less efficient than before and not sure what if anything was needed to change there
  • consider how to make this optional. For example, is there any signal used in activeMQ message only (besides type name)?
  • see if you can fix or nudge the underlying problem to get it resolved. Spaghetti causes problems not just here but the whole world working around the bug. Sometimes you can get someone to fix the actual problem and give hope for the future

} catch (Throwable t) {
propagateIfFatal(t);
log(t, "error getting property names from {0}", message, null);
return;
}

while (names.hasMoreElements()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this logic changed? was there a problem? Using collections.list will add allocate an array, and changing this logic makes it hard to tell what change is actually a problem or in focus.

}
}

private static void resetBytesMessageProperties( Message message, List<Object> out ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will cause a lot of overhead as we are subject to OOM. as this is a bug in activemq, we need to be more specific than this.

@codefromthecrypt
Copy link
Member

@jeqo if I recall correctly, there's a fix to this upstream now?

@jeqo
Copy link
Member

jeqo commented Mar 11, 2020

@adriancole still waiting for the next ActiveMQ client release to be available.

I will create a PR with the changes and notes about target versions on README, and close this one.

@jeqo jeqo mentioned this pull request Mar 13, 2020
@codefromthecrypt
Copy link
Member

we should close this now with ActiveMQ release fixing this internally correct? meanwhile it is nearing a year old :D

@codefromthecrypt
Copy link
Member

closing as out-of-date and seems possible addressed upstream. If not the case, please re-open

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants