
Don't initialize cursor and encoder during constructor. This improves performance of input split calculation. #58

bpfoster commented Aug 22, 2012
Calling getCursor() in the constructor for every split is slow, especially when auth is enabled: getCursor() calls MongoConfigUtil.getCollection(), which, on an authenticated cluster, triggers an authentication for each split. In one example with 572 splits, split calculation went from 20-30 seconds to under 1 second. The eager loading of the cursor seems pointless anyway, since the splits are serialized out to the processing nodes before any records are pulled.
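The lazy-loading idea can be sketched in plain Java with no Hadoop or MongoDB dependencies. This is a minimal illustration under stated assumptions, not the project's code: `LazySplit`, `openCursor`, and the `connectionsOpened` counter are hypothetical stand-ins, with `openCursor` playing the role of getCursor() plus MongoConfigUtil.getCollection() and the per-split authentication.

```java
import java.util.ArrayList;
import java.util.List;

public class LazySplitDemo {
    // Hypothetical counter standing in for expensive, authenticated connection setup
    static int connectionsOpened = 0;

    // Hypothetical stand-in for getCursor() -> MongoConfigUtil.getCollection() + auth
    static Iterable<String> openCursor(String query) {
        connectionsOpened++;
        return List.of(query + ":doc1", query + ":doc2");
    }

    // Hypothetical split: holds only the query; the cursor is created on first use
    static final class LazySplit {
        private final String query;
        private Iterable<String> cursor; // in a real split this would not be serialized

        LazySplit(String query) {
            this.query = query; // no I/O in the constructor
        }

        Iterable<String> getCursor() {
            if (cursor == null) {
                cursor = openCursor(query); // first call pays the setup cost
            }
            return cursor; // later calls reuse the cached cursor
        }
    }

    public static void main(String[] args) {
        List<LazySplit> splits = new ArrayList<>();
        for (int i = 0; i < 572; i++) {
            splits.add(new LazySplit("range" + i));
        }
        System.out.println("after split calculation: " + connectionsOpened);
        // Only a task that actually reads a split opens a connection
        for (String doc : splits.get(0).getCursor()) {
            System.out.println(doc);
        }
        System.out.println("after reading one split: " + connectionsOpened);
    }
}
```

Creating all 572 splits opens no connections; only the split that is actually read pays the setup cost, and only once.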


visualzhou commented Dec 12, 2013

Cursor initialization has been moved out into MongoRecordReader and the splitter has been refactored. Sorry for the late response on this. Thanks for all your contributions to the project!
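The resulting division of labor can be sketched as follows: the split carries only plain data that is serialized to the task, and the reader opens the cursor when it is initialized on the task side. This is a hypothetical, dependency-free model loosely shaped like Hadoop's Writable and RecordReader lifecycle (write/readFields, initialize/nextKeyValue/getCurrentValue/close); `QuerySplit`, `SimpleRecordReader`, and `roundTrip` are illustrative names, not the project's MongoRecordReader.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

public class RecordReaderDemo {
    // Hypothetical split: only plain, serializable data (no live cursor)
    static final class QuerySplit {
        String query;
        QuerySplit() {}
        QuerySplit(String query) { this.query = query; }
        void write(DataOutput out) throws IOException { out.writeUTF(query); }
        void readFields(DataInput in) throws IOException { query = in.readUTF(); }
    }

    // Hypothetical reader: the cursor is opened in initialize(), on the task side
    static final class SimpleRecordReader implements Closeable {
        private Iterator<String> cursor;
        private String current;

        void initialize(QuerySplit split) {
            // Stand-in for connecting, authenticating, and running split.query
            cursor = List.of(split.query + ":doc1", split.query + ":doc2").iterator();
        }

        boolean nextKeyValue() {
            if (cursor != null && cursor.hasNext()) {
                current = cursor.next();
                return true;
            }
            return false;
        }

        String getCurrentValue() { return current; }

        @Override
        public void close() { cursor = null; }
    }

    // Simulates shipping a split to a processing node by serializing it
    static QuerySplit roundTrip(QuerySplit split) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        split.write(new DataOutputStream(bytes));
        QuerySplit copy = new QuerySplit();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        QuerySplit shipped = roundTrip(new QuerySplit("range0"));
        try (SimpleRecordReader reader = new SimpleRecordReader()) {
            reader.initialize(shipped);
            while (reader.nextKeyValue()) {
                System.out.println(reader.getCurrentValue());
            }
        }
    }
}
```

Because the split never holds a connection, computing and serializing splits stays cheap, and the connection cost moves to the task that actually reads the data.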

visualzhou closed this Dec 12, 2013

bpfoster deleted the bpfoster:split-creation-lazyload branch Jan 10, 2014
