
Added at_train_end logic to base pipeline #1932

Merged
17 commits merged into nerfstudio-project:main on May 23, 2023

Conversation

maturk
Collaborator

@maturk maturk commented May 16, 2023

Here I have added at_train_end() logic to the base pipeline. To accommodate different user needs, I added similar methods to base_datamanager and base_model, callable with **kwargs.

Here is a small example of how I am personally using it within my custom pipeline for color correction:

    def at_train_end(self) -> None:
        self.eval()
        camera_ray_bundle, batch = self.datamanager.at_train_end()
        preds, refs = self.model.at_train_end(camera_ray_bundle=camera_ray_bundle, batch=batch)
        cc_images = self.color_correct(img=preds.cpu(), ref=refs.cpu())
        save_image((cc_images * 255).astype(np.uint8), self.save_path, log=True)
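
For readers skimming the thread, the hook pattern being proposed can be sketched in miniature: a base class exposes a no-op at_train_end, and the trainer calls it exactly once after its loop. The Pipeline, ColorCorrectPipeline, and train names below are illustrative toys, not nerfstudio code:

```python
from typing import Any, Optional


class Pipeline:
    """Toy stand-in for the base pipeline: the hook is a no-op by default."""

    def at_train_end(self, **kwargs: Any) -> Optional[Any]:
        """Called once at the end of training; subclasses may override."""
        return None


class ColorCorrectPipeline(Pipeline):
    """User pipeline overriding the hook with post-training work."""

    def __init__(self) -> None:
        self.finished = False

    def at_train_end(self, **kwargs: Any) -> Optional[Any]:
        self.finished = True  # e.g. run color correction and save images here
        return "done"


def train(pipeline: Pipeline, num_steps: int) -> None:
    for _ in range(num_steps):
        pass  # ... one training iteration ...
    pipeline.at_train_end()  # invoked exactly once, after the loop


pipeline = ColorCorrectPipeline()
train(pipeline, num_steps=3)
print(pipeline.finished)  # → True
```

The review below turns on exactly this contract: if the base class advertises the hook, something must actually call it.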

@SauravMaheshkar SauravMaheshkar added the python (Pull requests that update Python code), quality of life, and enhancement (New feature or request) labels May 16, 2023
Contributor

@tancik tancik left a comment


We should either make sure that the datamanager and model train_end are called, or remove them and leave it to the user to add such a function and call it in the pipeline.

Comment on lines 289 to 291
def at_train_end(self, **kwargs: Any) -> Optional[Any]:  # pylint: disable=unused-argument,no-self-use
"""Called at end of training for optional datamanager outputs."""

Contributor

This is never called, I worry that users will override this function and expect it to be called.

Comment on lines 222 to 224

def at_train_end(self, **kwargs: Any) -> Optional[Any]:  # pylint: disable=unused-argument,no-self-use
"""Called at end of training for optional model outputs."""
Contributor

Also never called, same potential confusion as above.

@jkulhanek
Contributor

Wouldn’t it be better to implement this using a new callback type? That way no code would have to be added to the model and the datamanager. Also, shouldn’t it be “on_train_end” instead of at..?

@maturk
Collaborator Author

maturk commented May 17, 2023

Thanks for the feedback again. I have added a new callback type, OnTrainEndCallback, and removed the unnecessary on_train_end calls from the base model and base pipeline. This callback has access to the pipeline object. I also made the naming consistent: "on train end".

Here is an example of how I am using the new callback type within my custom pipeline similar to my previous implementation:

    def get_on_train_end_callbacks(self) -> List[OnTrainEndCallback]:
        on_train_end_callbacks = []

        def color_correct_images():
            self.eval()
            camera_ray_bundle, batch = self.datamanager.at_train_end()  # no longer in base datamanager, just a user-defined function
            preds, refs = self.model.at_train_end(camera_ray_bundle=camera_ray_bundle, batch=batch)  # no longer in base model, just a user-defined function
            preds = self.color_correct(img=preds.cpu(), ref=refs.cpu())
            save_image((preds * 255).astype(np.uint8), self.cc_save_paths, log=True)

        on_train_end_callbacks.append(OnTrainEndCallback(func=color_correct_images))
        return on_train_end_callbacks
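
The OnTrainEndCallback class itself is not shown in the thread; a plausible minimal version, assuming it simply wraps a function plus optional arguments (the class name is from the PR, but this body is a guess), might look like:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class OnTrainEndCallback:
    """Hypothetical sketch: holds a function and the arguments to call it with."""

    func: Callable[..., Any]
    args: List[Any] = field(default_factory=list)
    kwargs: Dict[str, Any] = field(default_factory=dict)

    def run(self) -> Any:
        """Invoke the wrapped function; the trainer would call this once at train end."""
        return self.func(*self.args, **self.kwargs)


calls = []
cb = OnTrainEndCallback(func=lambda tag: calls.append(tag), args=["train-end"])
cb.run()
print(calls)  # → ['train-end']
```

As the next comments point out, a dedicated class like this duplicates machinery nerfstudio already has.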

@jkulhanek
Contributor

I am very sorry I didn't explain it clearly. I meant the already existing callback infrastructure, i.e. registering the callback here:

    class TrainingCallbackLocation(Enum):
        ...

@jkulhanek
Contributor

I think that would integrate in a more concise way with the rest of the code.
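
The mechanism jkulhanek is pointing to can be mirrored in a few lines: a location enum plus a callback that fires only when asked to run at one of its registered locations. This is a simplified toy of nerfstudio's TrainingCallback (the real class also handles update_every_num_iters, iters, args, and kwargs), using the ON_TRAIN_END location name as it first appears in this PR:

```python
from enum import Enum, auto
from inspect import signature
from typing import Callable, List


class TrainingCallbackLocation(Enum):
    BEFORE_TRAIN_ITERATION = auto()
    AFTER_TRAIN_ITERATION = auto()
    ON_TRAIN_END = auto()  # the new location this PR adds


class TrainingCallback:
    """Simplified mirror of nerfstudio's callback: runs only at its locations."""

    def __init__(self, where_to_run: List[TrainingCallbackLocation], func: Callable) -> None:
        # nerfstudio enforces this same contract on the callback function.
        assert "step" in signature(func).parameters, "'step' must be an argument of func"
        self.where_to_run = where_to_run
        self.func = func

    def run_callback_at_location(self, step: int, location: TrainingCallbackLocation) -> None:
        if location in self.where_to_run:
            self.func(step=step)


fired = []
cb = TrainingCallback([TrainingCallbackLocation.ON_TRAIN_END], func=lambda step: fired.append(step))
cb.run_callback_at_location(step=100, location=TrainingCallbackLocation.AFTER_TRAIN_ITERATION)
cb.run_callback_at_location(step=100, location=TrainingCallbackLocation.ON_TRAIN_END)
print(fired)  # → [100]
```

Because the trainer already sweeps all callbacks at each location, adding one more enum member is all the integration the feature needs.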

@maturk
Collaborator Author

maturk commented May 17, 2023

Whew, no problem :) Happy to learn, maybe the third time is the charm! I have integrated it into the existing callback locations. The only thing I am not sure about is the "step" variable when calling run_callback_at_location(step, location) at the end of training. There is no way to know whether step equals the maximum number of training iterations inside the TrainingCallback class, since this is not exposed. I have set step=None for now to signify that the callback is being called at train end.
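
Under the step=None convention described here, an end-of-training callback would treat None as a sentinel and ignore regular per-iteration steps. A sketch of that convention only (the reviewers push back on it below, preferring to keep step an int):

```python
from typing import List, Optional

log: List[str] = []


def on_train_end(step: Optional[int]) -> None:
    # step=None is the sentinel meaning "training has finished";
    # an int means this is an ordinary per-iteration invocation.
    if step is None:
        log.append("train-end work")


on_train_end(step=500)   # mid-training: does nothing
on_train_end(step=None)  # end of training: fires
print(log)  # → ['train-end work']
```

The cost of the sentinel is that every callback signature widens from int to Optional[int], which is exactly what the review comments below object to.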

@@ -44,6 +44,7 @@ class TrainingCallbackLocation(Enum):

 BEFORE_TRAIN_ITERATION = auto()
 AFTER_TRAIN_ITERATION = auto()
+ON_TRAIN_END = auto()
Contributor

Can we name it AFTER_TRAIN_END?

Contributor

Or perhaps "AFTER_TRAIN"?

Collaborator Author

I have now renamed it to AFTER_TRAIN.

@@ -297,6 +297,10 @@ def train(self) -> None:
 table.add_row("Checkpoint Directory", str(self.checkpoint_dir))
 CONSOLE.print(Panel(table, title="[bold][green]:tada: Training Finished :tada:[/bold]", expand=False))

+# on train end callbacks
+for callback in self.callbacks:
+    callback.run_callback_at_location(step=None, location=TrainingCallbackLocation.ON_TRAIN_END)
Contributor

Can you pass the actual step here?


-def run_callback_at_location(self, step: int, location: TrainingCallbackLocation) -> None:
+def run_callback_at_location(self, step: Union[int, None], location: TrainingCallbackLocation) -> None:
Contributor

Please keep the signature as int

self.where_to_run = where_to_run
self.update_every_num_iters = update_every_num_iters
self.iters = iters
self.func = func
self.args = args if args is not None else []
self.kwargs = kwargs if kwargs is not None else {}

-def run_callback(self, step: int) -> None:
+def run_callback(self, step: Union[int, None]) -> None:
Contributor

Please keep int here.

@@ -71,15 +72,15 @@ def __init__(
):
assert (
"step" in signature(func).parameters.keys()
-), f"'step: int' must be an argument in the callback function 'func': {func.__name__}"
+), f"'step: Union[int, None]' must be an argument in the callback function 'func': {func.__name__}"
Contributor

Same here, please keep int

Contributor

@tancik tancik left a comment

LGTM

@tancik tancik merged commit 33d95f3 into nerfstudio-project:main May 23, 2023
4 checks passed