fft_method pyfftw causes unexpected noise additions when using multi-threading #337

RubenImhoff · 2023-08-17T13:27:32Z

If I run an ensemble using multiple cores (so, dask will parallellize the ensemble members over the cores), it seems that the ensemble order is lost, resulting in weird transitions (due to noise from a different member that ends up in that member). See for instance (these are three 15-min instance in the forecast):

Or maybe even clearer:

If I run on 1 core, this problem does not occur, so it gives the impression that this has to do with parallellizing using dask. After having contact with @mpvginde, I could not reproduce the error with the Belgian test case that we have (the figures above are with our Dutch data and setup). Only difference between our setups turned out to be the setting that 'fft_method' = pyfftw instead of numpy in the Belgian setup. After changing this, the problems disappeared when running multi-threaded. This gives the impression that pyfftw should not be used for nowcasting and blended forecasting when using more than 1 worker/thread.
Is this a familiar issue to you and is there anything we can do about it?

RubenImhoff · 2023-11-07T14:56:04Z

Update on this possible bug: I see that it still comes back in the STEPS nowcasts (not the blending anymore) with both pyfftw and numpy as fft_methods. Seems that in the newer Dask versions, we have to more explicitly pin the activities to specific cores and keep track of it. For the nowcast loop this still goes well multi-threaded, provided that numpy is used as method, but for all other multi-threaded processes (cascade decomposition, noise initialization, etc.) it seems to go wrong. We should either fix those options to one worker or find a different solution, I'm afraid.

dnerini · 2023-11-07T19:13:43Z

Hi @RubenImhoff thanks for the update. Would it be possible to get a better idea of the changes you are suggesting? If I understand well, one option would be to explicitly set some arguments for dask, right?

RubenImhoff · 2023-11-08T06:33:20Z

Hi @dnerini, of course. The simple solution is to only use one worker (thread) for the parts where it goes wrong. I have tested it by fixing num_workers to 1 in the steps.py code in the nowcasting module, except for the nowcasting main loop:

precip_forecast = nowcast_main_loop(
        precip,
        velocity,
        state,
        timesteps,
        extrap_method,
        _update,
        extrap_kwargs=extrap_kwargs,
        velocity_pert_gen=velocity_perturbators,
        params=params,
        ensemble=True,
        num_ensemble_members=n_ens_members,
        callback=callback,
        return_output=return_output,
        num_workers=num_ensemble_workers,
        measure_time=measure_time,
    )

where num_workers can be > 1.

I think ideally, we would make full use of dask. In that case, we would have to pin the work to specific cores or so. I believe that is possible in Dask too, but I have no experience with it. Maybe you do or @pulkkins?

…o dask multi-threading (pySTEPS#337)

mpvginde · 2024-02-09T10:48:50Z

Hi everyone,
I think I found the issue:
In the _update helper function in nowcasts.steps the numpy-array holding the final forecasted precipation: precip_forecast_out is created from a list which is initialized at the start of the _update function:

pysteps/pysteps/nowcasts/steps.py

Lines 708 to 709 in be8eea4

    
           def _update(state, params): 
        
               precip_forecast_out = []

The workers inside the _update function then append their result to this list:

pysteps/pysteps/nowcasts/steps.py

Lines 822 to 831 in be8eea4

    
           if params["mask_method"] == "incremental": 
        
               state["mask_prec"][j] = nowcast_utils.compute_dilated_mask( 
        
                   precip_forecast >= params["precip_thr"], 
        
                   params["struct"], 
        
                   params["mask_rim"], 
        
               ) 
        
           precip_forecast[params["domain_mask"]] = np.nan 
        
           precip_forecast_out.append(precip_forecast)

Finally the list is converted to a numpy array:

pysteps/pysteps/nowcasts/steps.py

Lines 833 to 846 in be8eea4

    
           if ( 
        
               DASK_IMPORTED 
        
               and params["n_ens_members"] > 1 
        
               and params["num_ensemble_workers"] > 1 
        
           ): 
        
               res = [] 
        
               for j in range(params["n_ens_members"]): 
        
                   res.append(dask.delayed(worker)(j)) 
        
               dask.compute(*res, num_workers=params["num_ensemble_workers"]) 
        
           else: 
        
               for j in range(params["n_ens_members"]): 
        
                   worker(j) 
        
           return np.stack(precip_forecast_out), state

When dask is not used their is no concern since the workers will always be triggered in the same order. But when dask is used the order in which the workers are triggered is quite random. This will cause that during the _update the ensemble members might get shuffled.

I have proposed a fix in PR #347.

@RubenImhoff could you check if this solves your issues?

* Bugfix: fix random placement of ensemble members in numpy array due to dask multi-threading (#337) * Bugfix: make STEPS (blending) nowcast reproducable when the seed argument is given (#346) * Bugfix: make STEPS (blending) nowcast reproducable, independent of number of workers (#346) * Formatting with black --------- Co-authored-by: ned <daniele.nerini@meteoswiss.ch>

RubenImhoff · 2024-04-09T14:27:11Z

After some testing, I can confirm that #347 fixes the issue. :)

RubenImhoff added the bug Something isn't working label Aug 17, 2023

RubenImhoff assigned dnerini and pulkkins Aug 17, 2023

RubenImhoff mentioned this issue Aug 17, 2023

Parallellized ensemble run seems to lose track of ensemble number and order mpvginde/pysteps#20

Closed

mpvginde mentioned this issue Feb 9, 2024

No reproducibility of steps blended nowcast when using noise_stddev_adj='auto' #346

Closed

mpvginde added a commit to mpvginde/pysteps that referenced this issue Feb 9, 2024

Bugfix: fix random placement of ensemble members in numpy array due t…

324419e

…o dask multi-threading (pySTEPS#337)

mpvginde mentioned this issue Feb 9, 2024

Bugfix reproducibility & ensemble member order with dask #347

Merged

RubenImhoff closed this as completed Apr 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fft_method pyfftw causes unexpected noise additions when using multi-threading #337

fft_method pyfftw causes unexpected noise additions when using multi-threading #337

RubenImhoff commented Aug 17, 2023

RubenImhoff commented Nov 7, 2023

dnerini commented Nov 7, 2023

RubenImhoff commented Nov 8, 2023

mpvginde commented Feb 9, 2024 •

edited

Loading

RubenImhoff commented Apr 9, 2024

fft_method pyfftw causes unexpected noise additions when using multi-threading #337

fft_method pyfftw causes unexpected noise additions when using multi-threading #337

Comments

RubenImhoff commented Aug 17, 2023

RubenImhoff commented Nov 7, 2023

dnerini commented Nov 7, 2023

RubenImhoff commented Nov 8, 2023

mpvginde commented Feb 9, 2024 • edited Loading

RubenImhoff commented Apr 9, 2024

mpvginde commented Feb 9, 2024 •

edited

Loading