
Avoid casting with numpy() in multiprocessing.py #19945

Closed
Peiffap opened this issue Jun 4, 2024 · 1 comment · Fixed by #20005
Labels
help wanted Open to be worked on refactor

Comments

Peiffap (Contributor) commented Jun 4, 2024

Outline & Motivation

Currently, get_extra_results() casts callback metrics to numpy to avoid problems with memory sharing:

callback_metrics: dict = apply_to_collection(
    trainer.callback_metrics, Tensor, lambda x: x.cpu().numpy()
)  # send as numpy to avoid issues with memory sharing
return {"callback_metrics": callback_metrics}

Then update_main_process_results() casts back to Tensor:
# NOTE: `get_extra_results` needs to be called before
callback_metrics = extra["callback_metrics"]
trainer.callback_metrics.update(
    apply_to_collection(callback_metrics, np.ndarray, lambda x: torch.tensor(x))
)
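For context, the round-trip above can be illustrated with a minimal, self-contained sketch. The `apply_to_collection` below is a simplified stand-in for the real helper from lightning_utilities, and plain Python floats stand in for torch tensors so the example stays torch-free:

```python
import numpy as np

def apply_to_collection(data, dtype, func):
    # Simplified stand-in for lightning_utilities' apply_to_collection:
    # recursively apply `func` to every element of `data` that is a `dtype`.
    if isinstance(data, dtype):
        return func(data)
    if isinstance(data, dict):
        return {k: apply_to_collection(v, dtype, func) for k, v in data.items()}
    if isinstance(data, (list, tuple)):
        return type(data)(apply_to_collection(v, dtype, func) for v in data)
    return data

# Worker side: cast every "tensor" (here: float, standing in for torch.Tensor)
# to numpy before sending the metrics across the process boundary.
worker_metrics = {"loss": 0.25, "nested": {"acc": 0.9}}
as_numpy = apply_to_collection(worker_metrics, float, lambda x: np.asarray(x))

# Main-process side: cast back (the real code uses torch.tensor instead of float).
restored = apply_to_collection(as_numpy, np.ndarray, lambda x: float(x))
```

This is exactly the double conversion the issue proposes to eliminate: the data is structurally identical before and after, and the casts exist only to sidestep memory-sharing problems when tensors are pickled between processes.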

It would be neater (and part of the broader goal of removing the dependency on the numpy package, see #16649) to avoid this workaround.

Pitch

Remove the cast to numpy without introducing errors, and remove the numpy dependency in multiprocessing.py.

Additional context

As discussed in #19841 (ref.).

cc @justusschock @awaelchli

@Peiffap Peiffap added needs triage Waiting to be triaged by maintainers refactor labels Jun 4, 2024
@awaelchli awaelchli added help wanted Open to be worked on and removed needs triage Waiting to be triaged by maintainers labels Jun 4, 2024
Sar2580P commented Jun 7, 2024

I looked at the code of get_extra_results(). It solely converts the torch tensors to detached numpy arrays. If we remove the usage of numpy from it, the function reduces to the following body:

def get_extra_results(self, trainer: "pl.Trainer") -> Dict[str, Any]:
        """Gather extra state from the Trainer and return it as a dictionary for sending back to the main process.

        Args:
            trainer: reference to the Trainer.

        Returns:
            A dictionary with items to send back to the main process where :meth:`update_main_process_results` will
            process this output.

        """
        return {"callback_metrics": trainer.callback_metrics}

I think we can use plain Python containers like list and dict to avoid using numpy.
In that case, apply_to_collection would still be the right helper to perform the conversion.

Please share your thoughts.
Thanks
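A minimal sketch of the plain-containers idea suggested above. numpy arrays stand in for torch.Tensor here so the sketch is self-contained; the real code would call `Tensor.tolist()` on the worker side and `torch.tensor(...)` on the main-process side:

```python
import numpy as np

# Hypothetical stand-in metrics: numpy arrays play the role of torch tensors.
worker_metrics = {"loss": np.array(0.25), "per_class": np.array([0.1, 0.9])}

# Worker side: serialize to builtin types before crossing the process boundary.
# .tolist() yields a Python scalar for a 0-d array and nested lists otherwise,
# so the result contains only plain Python objects (safe to pickle and share).
plain = {k: v.tolist() for k, v in worker_metrics.items()}

# Main-process side: rebuild array-like objects from the plain containers
# (with torch this would be `torch.tensor(v)` rather than `np.asarray(v)`).
rebuilt = {k: np.asarray(v) for k, v in plain.items()}
```

One caveat with this approach: `tolist()` drops the dtype, so the rebuilt tensors may not match the originals' precision exactly; whether that matters for callback metrics would need to be checked.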
