Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speedup #30

Merged
merged 22 commits into from
Mar 28, 2017
Merged

Speedup #30

merged 22 commits into from
Mar 28, 2017

Conversation

altaite
Copy link
Contributor

@altaite altaite commented Mar 6, 2017

Several changes in decoder, resulting in about 4x speedup of BioJava MMTF parsing. Largest change (but not most significant in terms of time) is replacing Jackson with Jmol version of MessagePack parsing (sources were extracted, no dependency). A test for that was created, comparing objects created by both methods.

@coveralls
Copy link

Coverage Status

Coverage decreased (-17.9%) to 74.114% when pulling 1570040 on altaite:speedup into 7d9a709 on rcsb:master.

@josemduarte
Copy link
Member

josemduarte commented Mar 7, 2017

Thanks. I'm trying to review it, give me some time. As an advice for next time, please don't do code reformatting at the same time that you do important changes. The diffs as they are shown are not very helpful.

Having said that, there's a really nice trick to ignore whitespaces in the diff that will make them more readable: add a ?w=1 at the end of the URL (see https://github.com/blog/967-github-secrets). For this PR's diff it would be: https://github.com/rcsb/mmtf-java/pull/30/files?w=1

Copy link
Member

@josemduarte josemduarte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me like the file pdb_entry_type.txt shouldn't be here, it's quite a large file. Is it needed?

return new GenericDecoder(ReaderUtils.getDataFromFile(inFile));
} catch (IOException e) {
e.printStackTrace();
throw new RuntimeException();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not your change, but it'd be nice to fix: don't catch the exception, just declare a "throws IOException" and error handling will be much clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Text files are in gz now.

mmtf-api/pom.xml Outdated
<parent>
<groupId>org.rcsb</groupId>
<artifactId>mmtf</artifactId>
<version>1.0.5-SNAPSHOT</version>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to change the version here, SNAPSHOT versions are for development. Versions in pom should only change when a release happens. Same applies to other pom files.

lines.add(line);
}
return lines;
} catch (IOException e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to declare the exception as thrown in the method signature. Error handling will be easier like that

}
return codes;
} catch (Exception ex) {
throw new RuntimeException(ex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just declare the exception as throws in signature. It will have the same effect and will be clearer

MessagePackSerialization.setJackson(true);
StructureDataInterface sdiJackson = parse(bytes);

ReflectionAssert.assertReflectionEquals(sdiJackson, sdiJmol);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to use this assert method instead of the standard assert from the junit library?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert performs deep comparison, while standard assert just checks boolean.

* creation gives identical data.
*
*/
public class MessagePackTest extends TestCase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better use junit4. It doesn't require extending TestCase, just adding the @test annotation to the method you want to make into a test

<groupId>org.unitils</groupId>
<artifactId>unitils-core</artifactId>
<version>3.4.2</version>
<scope>test</scope>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really needed? Can't we just use junit4?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for the deep comparison of objects.

mmtfBean = objectMapper.readValue(inputStream, MmtfStructure.class);
} catch (IOException e) {
e.printStackTrace();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exception should be declared in signature and not caught

return s;
} catch (Exception e) {
throw new RuntimeException(e);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Declare IOException in signature

try {
objectMapper.writeValue(outputStream, mmtfStructure);
} catch (IOException e) {
e.printStackTrace();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't catch the exception, declare it in signature

@josemduarte
Copy link
Member

One concern is the 17.9% drop in test coverage, I guess because of the new message pack decoding code from jmol that has no tests. Do you have some metrics on the performance difference of the new decoder versus the msgpack library? It's a pity to lose a well-tested library in favor of an ad-hoc decoder.

Anyway thanks a lot for the patch! looks good in general. If you could address the comments that'd be great.

@coveralls
Copy link

Coverage Status

Coverage decreased (-17.9%) to 74.085% when pulling 3822c89 on altaite:speedup into 7d9a709 on rcsb:master.

RuntimeExceptions by IOException and try with resources.
}
}

}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need gzipping rather than zipping here.

This is how you do it with gzipping:

public static void gzipFile(File inFile, File outFile) throws IOException {
		GZIPOutputStream zos = new GZIPOutputStream(new FileOutputStream(outFile));
		FileInputStream is = new FileInputStream(inFile);

		int b;
		while ( (b=is.read())!=-1) {
			zos.write(b);
		}
		zos.close();
		is.close();
	}

And ungzip is:

public static void gunzipFile(File inFile, File outFile) throws IOException {
		GZIPInputStream zis = new GZIPInputStream(new FileInputStream(inFile));
		FileOutputStream os = new FileOutputStream(outFile);
		int b;
		while ( (b=zis.read())!=-1) {
			os.write(b);
		}
		zis.close();
		os.close();
	}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that my code is probably faster (buffer). And more importantly, it is easier to prepare the resource, because I do not need to do it with Java. I just do it by file manager or command line, which is far more convenient. Also I would leave it because it does not matter much I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here is zip vs gzip, not so much about efficiency. Zip is an archive format (to put a bunch of files together), while gzip is purely a compression format.

You can also do gzip easily from command line with tools like gzip and gunzip, so that's also not a concern.

previous was failing on Windows again. Going to use URL elsewhere 
too in the next commit.
whenever possible (simpler and more reliable), File left only when 
necessary for API tests.
@altaite
Copy link
Contributor Author

altaite commented Mar 23, 2017

Concerning test coverage (I do not see way how to comment directly):
There is a test which allows to compare the new code creates exactly the same objects as Jackson, on the whole PDB: MessagePackTest. But it must be in different subproject mmtf-codec because of dependencies. Is there some easy way how to connect this to the code, so that coverage would be better? It is integration test, but it test equality of Jackson and new code completely, if you run it on whole database.

@josemduarte
Copy link
Member

Thanks for all the changes! the only thing missing is putting back the version to 1.0.4-SNAPSHOT. After that I think we should merge. By the way there's a merge conflict now, could you take care of that too? (just merge into this branch the latest master). The conflict should be minimal, the only file that diverged since you branched is mmtf-codec/src/main/java/org/rcsb/mmtf/decoder/ReaderUtils.java

*
*/

public class SB {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just found this one: could we get rid of it? This looks like a replacement for StringBuilder needed in jmol so that the java to js transpiler is happy. We don't need it here, let's replace it by StringBuilder

*
* @author Antonin Pavelka
*/
public class ObjectTree {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have some documentation for this class? or at least change the method names to something more usable, one letter method names without javadocs are going to be difficult to understand....

}
}

public Hashtable l(String s) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method doesn't seem to be used. Do we need it?

*
* @author Antonin Pavelka
*/
public class MmtfStructureFactory {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some javadocs to explain what this class does? The individual methods are more self-explanatory

…after 12/01/2016). Increased -Xmx for

testing from 1500 to 6000 MB because sometimes GC timeout occured in MessagePackTest. Improved comments on the MessagePack parser.
@coveralls
Copy link

Coverage Status

Coverage decreased (-11.6%) to 80.377% when pulling cd39c88 on altaite:speedup into 8aa07a8 on rcsb:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-11.5%) to 80.479% when pulling 457bf2c on altaite:speedup into 8aa07a8 on rcsb:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-11.4%) to 80.58% when pulling 07425c1 on altaite:speedup into 8aa07a8 on rcsb:master.

Copy link
Member

@josemduarte josemduarte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good now, thank you so much for all the changes. Just one final detail: I'd propose method names for ObjectTree like getString, getInteger, getByteArray etc. As I said, single letter names make things difficult to understand. Let's merge after you refactor those, it should only take a couple of minutes.

@coveralls
Copy link

Coverage Status

Coverage decreased (-11.4%) to 80.58% when pulling 72e468e on altaite:speedup into 8aa07a8 on rcsb:master.

@josemduarte josemduarte merged commit 2978574 into rcsb:master Mar 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants