
Avoid casting with numpy() in multiprocessing.py #19945

Closed
Peiffap opened this issue Jun 4, 2024 · 1 comment · Fixed by #20005
Labels
help wanted Open to be worked on refactor

Comments

Peiffap (Contributor) commented Jun 4, 2024

Outline & Motivation

Currently, get_extra_results() casts callback metrics to numpy to avoid problems with memory sharing:

callback_metrics: dict = apply_to_collection(
    trainer.callback_metrics, Tensor, lambda x: x.cpu().numpy()
)  # send as numpy to avoid issues with memory sharing
return {"callback_metrics": callback_metrics}

Then update_main_process_results() casts back to Tensor:
# NOTE: `get_extra_results` needs to be called before
callback_metrics = extra["callback_metrics"]
trainer.callback_metrics.update(
    apply_to_collection(callback_metrics, np.ndarray, lambda x: torch.tensor(x))
)
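For context, the round-trip above can be illustrated with a minimal, self-contained sketch. The `apply_to_collection` below is a simplified stand-in for the real helper from lightning_utilities, and plain Python floats stand in for torch tensors so the example stays torch-free:

```python
import numpy as np

def apply_to_collection(data, dtype, func):
    # Simplified stand-in for lightning_utilities' apply_to_collection:
    # recursively apply `func` to every element of `data` that is a `dtype`.
    if isinstance(data, dtype):
        return func(data)
    if isinstance(data, dict):
        return {k: apply_to_collection(v, dtype, func) for k, v in data.items()}
    if isinstance(data, (list, tuple)):
        return type(data)(apply_to_collection(v, dtype, func) for v in data)
    return data

# Worker side: cast every "tensor" (here: float, standing in for torch.Tensor)
# to numpy before sending the metrics across the process boundary.
worker_metrics = {"loss": 0.25, "nested": {"acc": 0.9}}
as_numpy = apply_to_collection(worker_metrics, float, lambda x: np.asarray(x))

# Main-process side: cast back (the real code uses torch.tensor instead of float).
restored = apply_to_collection(as_numpy, np.ndarray, lambda x: float(x))
```

This is exactly the double conversion the issue proposes to eliminate: the data is structurally identical before and after, and the casts exist only to sidestep memory-sharing problems when tensors are pickled between processes.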

It would be neater (and part of the broader goal of removing the dependency on the numpy package, see #16649) to avoid this workaround.

Pitch

Remove the cast to numpy without introducing errors, and remove the numpy dependency in multiprocessing.py.

Additional context

As discussed in #19841 (ref.).

cc @justusschock @awaelchli

@Peiffap Peiffap added needs triage Waiting to be triaged by maintainers refactor labels Jun 4, 2024
@awaelchli awaelchli added help wanted Open to be worked on and removed needs triage Waiting to be triaged by maintainers labels Jun 4, 2024
Sar2580P commented Jun 7, 2024

I looked at the code of get_extra_results(). It solely converts the torch tensors to detached numpy arrays. If we remove the usage of numpy from it, the function reduces to the following body:

def get_extra_results(self, trainer: "pl.Trainer") -> Dict[str, Any]:
        """Gather extra state from the Trainer and return it as a dictionary for sending back to the main process.

        Args:
            trainer: reference to the Trainer.

        Returns:
            A dictionary with items to send back to the main process where :meth:`update_main_process_results` will
            process this output.

        """
        return {"callback_metrics": trainer.callback_metrics}

I think we can use plain Python containers like list and dict to avoid using numpy.
In that case, apply_to_collection would still be the right helper to perform the conversion.

Please share your thoughts.
Thanks
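A minimal sketch of the plain-containers idea suggested above. numpy arrays stand in for torch.Tensor here so the sketch is self-contained; the real code would call `Tensor.tolist()` on the worker side and `torch.tensor(...)` on the main-process side:

```python
import numpy as np

# Hypothetical stand-in metrics: numpy arrays play the role of torch tensors.
worker_metrics = {"loss": np.array(0.25), "per_class": np.array([0.1, 0.9])}

# Worker side: serialize to builtin types before crossing the process boundary.
# .tolist() yields a Python scalar for a 0-d array and nested lists otherwise,
# so the result contains only plain Python objects (safe to pickle and share).
plain = {k: v.tolist() for k, v in worker_metrics.items()}

# Main-process side: rebuild array-like objects from the plain containers
# (with torch this would be `torch.tensor(v)` rather than `np.asarray(v)`).
rebuilt = {k: np.asarray(v) for k, v in plain.items()}
```

One caveat with this approach: `tolist()` drops the dtype, so the rebuilt tensors may not match the originals' precision exactly; whether that matters for callback metrics would need to be checked.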
