This repository has been archived by the owner on Jan 26, 2021. It is now read-only.

How is failure handling done on the server side? #3

Closed
YongCHN opened this issue Nov 10, 2015 · 4 comments

@YongCHN

YongCHN commented Nov 10, 2015

If I use MPI, how does Multiverso handle a server failure? Does it support failover?

@chivee
Contributor

chivee commented Nov 10, 2015

Given the MPI mechanism, fault tolerance is an important feature for us. In the current version, we don't offer a checkpoint API. As a quick hack, you can checkpoint on the client side (dump the whole parameter set every couple of epochs). We will add this feature soon.
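
A minimal sketch of the client-side hack described above, assuming the client can read the full parameter set and write it back. The names `run_epoch`, `fetch_params`, and `restore_params` are hypothetical callbacks supplied by the caller, not Multiverso APIs, and the pickle file format is likewise only an illustration.

```python
import os
import pickle
import tempfile


def save_checkpoint(params, epoch, path):
    """Atomically write the epoch number and parameters to `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so that a crash mid-write cannot
    # corrupt the previous checkpoint.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump({"epoch": epoch, "params": params}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic when both are on one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Return (epoch, params), or (None, None) if no checkpoint exists."""
    if not os.path.exists(path):
        return None, None
    with open(path, "rb") as f:
        state = pickle.load(f)
    return state["epoch"], state["params"]


def train(num_epochs, run_epoch, fetch_params, restore_params,
          path="checkpoint.pkl", every=2):
    """Run training, dumping all parameters every `every` epochs.

    run_epoch, fetch_params and restore_params are hypothetical,
    caller-supplied callbacks (assumptions, not Multiverso functions).
    """
    last_epoch, params = load_checkpoint(path)
    start = 0
    if last_epoch is not None:
        restore_params(params)  # push the saved values back before resuming
        start = last_epoch + 1
    for epoch in range(start, num_epochs):
        run_epoch(epoch)
        if (epoch + 1) % every == 0:
            save_checkpoint(fetch_params(), epoch, path)
```

After a failure, restarting the job with the same `path` resumes from the epoch after the last dump, so at most `every - 1` completed epochs of work are repeated.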

@YongCHN
Author

YongCHN commented Nov 11, 2015

How about ZeroMQ? Is failover supported there? Thanks.

@feiga
Contributor

feiga commented Nov 11, 2015

Not supported yet.

@YongCHN YongCHN closed this as completed Nov 11, 2015
@chivee chivee reopened this Nov 11, 2015
@chivee
Contributor

chivee commented Nov 11, 2015

I think it is necessary to add failure recovery; let's keep this issue open.

@chivee chivee closed this as completed Nov 13, 2015
chivee pushed a commit that referenced this issue Apr 14, 2016
yarn ApplicationMaster for multiverso_zmq

3 participants