Integrate callback functionality into elephas (History, checkpoints, etc.) #131

sd12832 · 2019-02-10T23:57:05Z

https://github.com/keras-team/keras/blob/master/keras/callbacks.py#L341

The `fit` function in Keras returns a `History` object whose `history` attribute records the loss and metrics for every epoch. Plotting those curves shows whether the model is overfitting. This would be very useful to have in Elephas.
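For reference, Keras' `History.history` is a plain dict that maps metric names (`loss`, plus `val_loss` when validation data is passed) to one value per epoch. Below is a minimal sketch of an overfitting check over such a dict. The numbers are hand-written for illustration, and the helper name `overfitting_epochs` is hypothetical.

```python
def overfitting_epochs(history, min_gap=0.0):
    """Return 0-based epochs where training loss falls but validation loss rises.

    `history` is assumed to have the shape of keras History.history:
    {"loss": [...], "val_loss": [...]}, one value per epoch.
    """
    loss = history["loss"]
    val_loss = history["val_loss"]
    flagged = []
    for epoch in range(1, len(loss)):
        train_improved = loss[epoch] < loss[epoch - 1]
        val_worsened = val_loss[epoch] - val_loss[epoch - 1] > min_gap
        if train_improved and val_worsened:
            flagged.append(epoch)
    return flagged


# Hand-written values in the shape of History.history (illustrative, not real results).
history = {
    "loss":     [0.90, 0.60, 0.45, 0.35, 0.30],
    "val_loss": [0.95, 0.70, 0.65, 0.68, 0.74],
}
print(overfitting_epochs(history))  # → [3, 4]
```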

maxpumperla · 2019-02-11T07:10:38Z

@sd12832 very interesting idea. With elephas, each of the N workers would receive a callback instance. The question is how to consolidate the callback data once training is done, i.e. after we send updates back to the master network. A list of callbacks? Suggestions?

sd12832 · 2019-02-11T09:04:47Z

@maxpumperla Maybe each worker gets its own history based on asynchronous callbacks, with an option to average the values across all the workers? We can follow the Keras code, which has a callback base class that is subclassed for different purposes.

In terms of the list of callbacks, wouldn't N asynchronous callbacks suffice? Pardon my lack of understanding of the elephas code, I have not looked at it in much detail.
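One way to combine the "list of callbacks" and "average among workers" ideas: keep one history dict per worker, and derive an averaged view on demand. This is a sketch that assumes every worker reports the same metric names in the `History.history` format; `average_histories` is a hypothetical helper, not part of elephas.

```python
from statistics import mean


def average_histories(worker_histories):
    """Average per-epoch metrics across workers.

    worker_histories: list of dicts (one per worker) mapping metric name to a
    list of per-epoch values. Assumes all workers report the same metrics.
    Epochs are truncated to the shortest worker so every average uses all workers.
    """
    if not worker_histories:
        return {}
    averaged = {}
    for metric in worker_histories[0]:
        series = [h[metric] for h in worker_histories]
        n_epochs = min(len(s) for s in series)
        averaged[metric] = [mean(s[e] for s in series) for e in range(n_epochs)]
    return averaged


# The raw list is kept as-is; the averaged dict is just a derived view.
workers = [
    {"loss": [1.0, 0.5], "accuracy": [0.50, 0.75]},
    {"loss": [0.8, 0.3], "accuracy": [0.60, 0.85]},
]
print(average_histories(workers))
```

Keeping the per-worker list around means nothing is lost if a different aggregation (min, max, weighted by partition size) turns out to be more useful later.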

I would very much like to accelerate the building of this feature by helping as much as I can. Please guide me on how to do so!

maxpumperla · 2019-02-11T09:14:15Z

@sd12832 thanks for your feedback. Yeah, so the first step is to init the master network with a callback, serialize it so that elephas can ship it to workers, and then deserialize the callback and set it on each of the N worker networks. That's the straightforward part (in a way). The question is what to do with the callback data. For model history you basically accumulate logs, right? Hence my specific, naive suggestion of lists.
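The serialize → ship → deserialize → accumulate flow can be sketched without Spark or Keras. Here `LogRecorder` is a hypothetical stand-in that mimics how `keras.callbacks.History` appends each epoch's logs to per-metric lists, `pickle` stands in for whatever serialization elephas would actually use, and the "workers" are plain function calls producing placeholder losses.

```python
import pickle


class LogRecorder:
    """Hypothetical, minimal History-like callback.

    Appends each epoch's logs to per-metric lists. It holds no model
    reference, so it pickles cleanly.
    """

    def __init__(self):
        self.history = {}

    def on_epoch_end(self, epoch, logs=None):
        for name, value in (logs or {}).items():
            self.history.setdefault(name, []).append(value)


# Master: create the callback once and serialize it for shipping.
payload = pickle.dumps(LogRecorder())


def run_worker(worker_id, payload, epochs=2):
    """Simulated worker: deserialize a private copy and record fake logs."""
    callback = pickle.loads(payload)
    for epoch in range(epochs):
        fake_loss = 1.0 / (epoch + 1 + worker_id)  # placeholder, not real training
        callback.on_epoch_end(epoch, logs={"loss": fake_loss})
    return callback.history


# Master: collect one history per worker (the naive "list" option).
worker_histories = [run_worker(i, payload) for i in range(3)]
print(worker_histories)
```

Real Keras callbacks are given a model reference through `set_model()`, so shipping them to workers may need more care than pickling a bare object like this one.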

Now, what do you do about things like early stopping? Do you want individual workers to stop, or would you rather have the whole training process stopped? In the latter case the callback on master is enough and we just need to evaluate the callback properly.
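If the whole job should stop (the second option), the decision can live entirely on the master. The following sketch assumes the master sees one aggregated validation loss per epoch. `MasterEarlyStopping` is a hypothetical class whose `patience`/`min_delta` semantics only loosely follow `keras.callbacks.EarlyStopping`, and the loss values are illustrative.

```python
class MasterEarlyStopping:
    """Hypothetical master-side early stopping on an aggregated metric.

    The master calls update() once per epoch; a True return value means the
    whole distributed job should stop, not just one worker.
    """

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0

    def update(self, value):
        if value < self.best - self.min_delta:
            self.best = value  # improvement: reset the patience counter
            self.wait = 0
            return False
        self.wait += 1
        return self.wait >= self.patience


stopper = MasterEarlyStopping(patience=2)
aggregated_val_loss = [0.9, 0.7, 0.72, 0.75, 0.6]  # illustrative values
stopped_at = None
for epoch, value in enumerate(aggregated_val_loss):
    if stopper.update(value):
        stopped_at = epoch
        break
print(f"stopping all workers after epoch {stopped_at}")
```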

Come to think of it, maybe this is what you want. I.e. do you just want to register callbacks on the master network? In the end, it doesn't really matter where the updates come from.

You see, there are a lot of questions :D

maxpumperla · 2019-02-11T09:20:12Z

After all, elephas' master network is a Keras model, which can have callbacks. It may really just be a question of how to incorporate this into the top-level API of elephas.

sd12832 · 2019-02-11T23:25:23Z

So this is my understanding of how Elephas works right now (please correct me if I'm wrong). We can just serialize the callbacks, which are registered on the master network, and ship them to the workers. Once we get the data back from the callbacks (grouped together?), we can accumulate it.

I think early stopping should stop the entire training process, right? Why would we want early stopping to happen on a single node? In that case, the callback on the master should be enough?

I think the top-level API should work the same way as Keras: `fit` automatically returns the history, and you can pass in callbacks.
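A Keras-style entry point could look roughly like this sketch. Everything here is hypothetical: `distributed_fit` is not an elephas function, and each "worker" is a plain function that returns that worker's logs for an epoch, standing in for real distributed training. Worker logs are averaged, and only master-side callbacks (including a History-like recorder that is always returned) see them.

```python
from statistics import mean


class History:
    """Hypothetical History-like callback living on the master."""

    def __init__(self):
        self.history = {}

    def on_epoch_end(self, epoch, logs):
        for name, value in logs.items():
            self.history.setdefault(name, []).append(value)


def distributed_fit(worker_epoch_fns, epochs, callbacks=()):
    """Hypothetical fit(..., callbacks=...) that returns a History.

    worker_epoch_fns: one function per worker (at least one); each takes an
    epoch index and returns that worker's logs dict for the epoch.
    """
    history = History()
    for epoch in range(epochs):
        worker_logs = [fn(epoch) for fn in worker_epoch_fns]
        # Average each metric across workers before any callback sees it.
        logs = {k: mean(w[k] for w in worker_logs) for k in worker_logs[0]}
        for cb in (history, *callbacks):
            cb.on_epoch_end(epoch, logs)
    return history


workers = [lambda e: {"loss": 1.0 / (e + 1)}, lambda e: {"loss": 0.5 / (e + 1)}]
history = distributed_fit(workers, epochs=2)
print(history.history)  # → {'loss': [0.75, 0.375]}
```

Averaging before dispatch is one design choice among several; keeping per-worker logs and letting callbacks aggregate them would also fit the discussion above.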

sd12832 · 2019-02-12T20:10:24Z

Any updates? I can start exploring the code today.

maxpumperla · 2019-02-14T08:51:32Z

@sd12832 I can barely keep up with my GitHub notifications these days, sorry. So no "updates" from me. But your reasoning sounds good. If you poke around a little and see what we can do, I'm happy to discuss here. Let's come up with a plan first and execute in a bit (I can help).

Ben-Epstein · 2020-05-14T15:19:24Z

Hello, are there any updates on this? I think this is a crucial feature.

danielenricocahall · 2021-01-15T00:20:35Z

I will start working on this for the next release, as it seems like a useful and desired feature!

OscarDPan · 2021-02-12T03:04:27Z

Hi @danielenricocahall, I was wondering whether you have started on this yet? As far as I can tell, Elephas doesn't support any callbacks for now. Could I help by adding some high-level integration that accepts Callback objects? If so, I can start working on it sometime next week.

danielenricocahall · 2021-02-12T11:17:13Z

I started working on it but I've been a bit busy. Nothing too groundbreaking, just playing with the idea of making the callback objects broadcast variables. I hit a few minor bumps that I haven't worked around yet. I can push up my branch over the weekend in case you want to use it for reference, or if you want to collaborate on things.

hassanmehmud · 2022-08-04T14:20:24Z

Hello, is the feature working now? How can we get the loss for every epoch when using Pipelines?

Thank you for the support.

danielenricocahall · 2022-10-11T13:07:42Z

Moved this issue to the new fork: danielenricocahall#9. Closing for now but still on the radar!

sd12832 mentioned this issue Feb 11, 2019: Integrate Keras based model checkpoint based callbacks #130 (Closed)

maxpumperla changed the title ~~Add the functionality to get the history from the fit function~~ Integrate callback functionality into elephas (History, checkpoints, etc.) Feb 11, 2019

danielenricocahall mentioned this issue Jan 19, 2021: How to show log on every epoch ? #40 (Closed)

danielenricocahall mentioned this issue Oct 11, 2022: Integrate Callbacks danielenricocahall/elephas#9 (Open)

danielenricocahall closed this as completed Oct 11, 2022
