Save best model state in CPU memory instead of file (#224)
Saving the best model state to a file still has a potential race condition when ensemble models are trained in parallel. This diff changes the behavior to keep the best state in CPU memory instead.
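
A minimal sketch of the idea, not the code in this diff: each model keeps its own in-memory snapshot of its best state, so parallel ensemble members have no shared file to collide on. `BestStateTracker` and the plain-dict state are hypothetical stand-ins. With PyTorch, the snapshot would more likely be built from `model.state_dict()` with each tensor moved to CPU via `.detach().cpu().clone()`.

```python
import copy


class BestStateTracker:
    """Hypothetical helper: holds a private in-memory copy of the best state."""

    def __init__(self):
        self.best_loss = float("inf")
        self.best_state = None

    def update(self, loss, state):
        """Snapshot `state` if `loss` beats the best so far. Returns True on improvement."""
        if loss < self.best_loss:
            self.best_loss = loss
            # Deep copy so later training steps cannot mutate the snapshot.
            self.best_state = copy.deepcopy(state)
            return True
        return False


# One tracker per ensemble member: nothing is written to disk, nothing is shared.
trackers = [BestStateTracker(), BestStateTracker()]
state_a = {"w": [0.1, 0.2]}  # stand-in for a real model state
state_b = {"w": [0.9, 0.8]}
trackers[0].update(0.5, state_a)
trackers[1].update(0.3, state_b)

# Further training mutates the live state; the snapshot is unaffected.
state_a["w"][0] = 42.0
```

The trade-off is that every snapshot occupies host memory for the lifetime of its tracker, and the best state is lost if the process dies before it is restored or written out.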

