Performance problems when deserializing #40

Closed
jschneider opened this Issue Oct 11, 2014 · 6 comments

jschneider commented Oct 11, 2014

I ran some performance tests comparing bson4jackson with plain Jackson when deserializing via the streaming API.

Surprisingly, bson4jackson is much slower for me.
I deserialized a very simple file 500,000 times.

Plain Jackson needs 330-390 ms, while bson4jackson takes between 870 and 1010 ms.
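
For anyone who wants to repeat the comparison, a timing loop along the following lines would do. This is only a sketch, not the harness used for the numbers above: `TimingLoop`, `timeMillis`, and the warm-up count are hypothetical names and choices, and a dedicated benchmarking tool would give more reliable figures.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for rough wall-clock measurements; not part of Jackson or bson4jackson. */
final class TimingLoop {
    private TimingLoop() {}

    /**
     * Runs the task warmupRuns times without timing it (so the JIT can compile
     * the hot path), then measuredRuns times and returns the elapsed milliseconds
     * of the measured part only.
     */
    static long timeMillis(Runnable task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            task.run();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

The deserialization code would be passed as the `Runnable`, once with a plain Jackson factory and once with the bson4jackson factory, with 500,000 as the measured run count.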

Since your benchmark results (http://www.michel-kraemer.com/binary-json-with-bson4jackson) seem to be much better, there must be something wrong on my side. Here is the code I am using:

The file I am parsing (generated using bson4jackson):

00000000026964000a00000043616e6f6e205261770008646570656e64656e74000003657874656e73696f6e000000000002657874656e73696f6e0004000000637232000864656661756c7400010264656c696d6974657200020000002e000000

And here is the code used to deserialize:

      JsonParser parser = factory.createParser( contentSample );

      assertEquals( JsonToken.START_OBJECT, parser.nextToken() );

      assertEquals( JsonToken.FIELD_NAME, parser.nextToken() );
      assertEquals( "id", parser.getCurrentName() );
      assertEquals( JsonToken.VALUE_STRING, parser.nextToken() );
      String id = parser.getText();
      assertEquals( "Canon Raw", id );

      assertEquals( JsonToken.FIELD_NAME, parser.nextToken() );
      assertEquals( "dependent", parser.getCurrentName() );
      assertEquals( JsonToken.VALUE_FALSE, parser.nextToken() );
      boolean dependent = parser.getBooleanValue();
      assertFalse( dependent );

      assertEquals( JsonToken.FIELD_NAME, parser.nextToken() );
      assertEquals( "extension", parser.getCurrentName() );
      assertEquals( JsonToken.START_OBJECT, parser.nextToken() );

      assertEquals( JsonToken.FIELD_NAME, parser.nextToken() );
      assertEquals( "extension", parser.getCurrentName() );
      assertEquals( JsonToken.VALUE_STRING, parser.nextToken() );
      String extension = parser.getText();
      assertEquals( "cr2", extension );

      assertEquals( JsonToken.FIELD_NAME, parser.nextToken() );
      assertEquals( "default", parser.getCurrentName() );
      assertEquals( JsonToken.VALUE_TRUE, parser.nextToken() );
      boolean isDefault = parser.getBooleanValue();
      assertTrue( isDefault );

      assertEquals( JsonToken.FIELD_NAME, parser.nextToken() );
      assertEquals( "delimiter", parser.getCurrentName() );
      assertEquals( JsonToken.VALUE_STRING, parser.nextToken() );
      String delimiter = parser.getText();
      assertEquals( ".", delimiter );

      assertEquals( JsonToken.END_OBJECT, parser.nextToken() );
      assertEquals( JsonToken.END_OBJECT, parser.nextToken() );
      assertNull( parser.nextToken() );

      parser.close();

      FileType type = new FileType( id, new Extension( delimiter, extension, isDefault ), dependent );
      assertNotNull( type );

The factory is only created once:

      BsonFactory jsonFactory = new BsonFactory();
      jsonFactory.enable( BsonGenerator.Feature.ENABLE_STREAMING );

Any ideas?

jschneider commented Oct 11, 2014

I did a little bit of profiling:

  • Most of the time (about 50%) is spent in LittleEndianInputStream.readUTF.
  • UTF_8.newDecoder takes about 6% of the time. Maybe the CharsetDecoder instance could be cached?
  • About 6% is spent in CharsetDecoder.decode.
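
The caching idea from the second bullet could look roughly like this. It is a sketch under assumptions, not bson4jackson code: `CachedUtf8Decoder` is a hypothetical class, and it assumes one instance per parser because a `CharsetDecoder` must not be shared between threads.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: keeps one UTF-8 decoder and reuses it for every string. Not thread safe. */
final class CachedUtf8Decoder {
    // Created once instead of calling UTF_8.newDecoder() for each string.
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);

    /** Decodes length bytes of buf starting at offset into a String. */
    String decode(byte[] buf, int offset, int length) {
        try {
            // CharsetDecoder.decode(ByteBuffer) resets the decoder before decoding,
            // so the same instance can be reused from call to call.
            return decoder.decode(ByteBuffer.wrap(buf, offset, length)).toString();
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Whether this recovers the full 6% would have to be measured; the sketch only removes the per-string decoder allocation.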

Owner

michel-kraemer commented Oct 14, 2014

Strange. I'll check this and get back to you.

Michel

michel-kraemer commented Oct 15, 2014

I tried to reproduce your example, but I get an exception reading your sample file. I suppose it's base64 encoded, but still no luck. Maybe you can send me the original file and also your test code via email? Thanks a lot!

Michel

jschneider commented Oct 15, 2014

No, it is hex encoded. Sorry - should have mentioned that.
I will create an executable project. Just give me a few days.
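
In the meantime, the hex dump from the first post can be turned back into raw bytes with a few lines of plain Java. This is a minimal sketch; `HexSample` and `decodeHex` are hypothetical names, not part of bson4jackson or Jackson.

```java
/** Hypothetical helper: converts a hex dump into the raw bytes it represents. */
final class HexSample {
    private HexSample() {}

    /** Decodes a hex string such as "0a2e" into the bytes {0x0a, 0x2e}. */
    static byte[] decodeHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Character.digit returns -1 for anything that is not a hex digit.
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit near index " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```

The resulting array can be wrapped in a `java.io.ByteArrayInputStream` and handed to `createParser` on the factory.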

michel-kraemer commented Oct 17, 2014

Oh. I should have figured that out myself!

No worries. No need for a sample project. I was able to reproduce the issue. I'll see what I can do.

Michel

michel-kraemer commented Oct 21, 2014

OK. I was able to improve the performance a lot by replacing the JDK's ByteArrayInputStream and BufferedInputStream with my own implementations that are not thread safe (96d09f8).

However, I think there's still room for improvement. I will continue working on this issue.

Cheers,
Michel
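
For readers wondering what such a replacement looks like: the JDK's `ByteArrayInputStream` declares its read methods `synchronized`, and an unsynchronized variant drops that locking. The class below is only an illustrative sketch with a hypothetical name; it is not the code from commit 96d09f8.

```java
import java.io.InputStream;

/** Hypothetical unsynchronized byte-array stream; must only be used from one thread. */
final class UnsyncByteArrayInputStream extends InputStream {
    private final byte[] buf;
    private final int end; // index one past the last readable byte
    private int pos;       // index of the next byte to read

    UnsyncByteArrayInputStream(byte[] buf) {
        this(buf, 0, buf.length);
    }

    UnsyncByteArrayInputStream(byte[] buf, int offset, int length) {
        this.buf = buf;
        this.pos = offset;
        this.end = Math.min(offset + length, buf.length);
    }

    /** Returns the next byte as a value from 0 to 255, or -1 at end of input. No locking. */
    @Override
    public int read() {
        return pos < end ? (buf[pos++] & 0xff) : -1;
    }

    /** Copies up to len bytes into dst and returns the count, or -1 at end of input. */
    @Override
    public int read(byte[] dst, int off, int len) {
        if (len == 0) {
            return 0;
        }
        if (pos >= end) {
            return -1;
        }
        int n = Math.min(len, end - pos);
        System.arraycopy(buf, pos, dst, off, n);
        pos += n;
        return n;
    }

    @Override
    public int available() {
        return end - pos;
    }
}
```

Dropping the locks is only safe because a parser reads its input from a single thread; sharing such a stream between threads would need external synchronization.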
