Background loading / data pre-fetching #6

Closed
scottstanie opened this issue Jan 31, 2023 · 3 comments
Labels: enhancement (New feature or request), run time (Related to workflow processing speed)

Comments

@scottstanie
Collaborator

Currently the data I/O might account for the largest share of the total run time prior to unwrapping.
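For context, the idea would be to read the next block of SLC data on a background thread while the current block is being processed, so the GDAL reads overlap with the computation instead of blocking it. A minimal sketch of that pattern (the names here are illustrative, not dolphin's actual API):

import queue
import threading


def prefetched(block_iter, max_queue_size=2):
    """Run a block iterator on a background thread, reading ahead.

    Up to ``max_queue_size`` blocks are read while the consumer is still
    busy processing earlier ones.
    """
    q = queue.Queue(maxsize=max_queue_size)
    _sentinel = object()

    def _reader():
        for block in block_iter:
            q.put(block)      # blocks here if the consumer falls behind
        q.put(_sentinel)      # signal that the iterator is exhausted

    threading.Thread(target=_reader, daemon=True).start()

    while True:
        block = q.get()
        if block is _sentinel:
            return
        yield block


# Usage: wrap whatever currently yields blocks (e.g. the output of io.iter_blocks):
#   for data, rows, cols in prefetched(block_iterator):
#       process_block(data)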

scottstanie added the enhancement label on Jan 31, 2023
@scottstanie
Collaborator Author

scottstanie commented Feb 1, 2023

Pre-fetching is probably going to be a big speedup:

From a test run with

  • 2 bursts
  • 15 dates per burst
  • no unwrapping
  • 20 meter output spacing (strides = {x:4, y:2})
  • using the GPU

It took 15 minutes total (about 7.5 minutes per burst):

[02/01 09:53:03] [INFO _log.py] Total elapsed time for run : 15.51 minutes (930.62 seconds)
         3997323 function calls (3836454 primitive calls) in 931.307 seconds

...where each PS step took ~160 seconds (5 min 20 seconds total for the 2 bursts), meaning the wrapped phase step took a little over 4 minutes per burst.
The fact that the PS step, which only does a few simple nanmean/nanstd computations, is close in runtime to the phase linking step means that #24 should also be a big speedup.
BUT that won't be as relevant for the online-updating PS case, where we'd only pull and iterate over 1-2 SLCs to get a new PS map.

The top time spent, from python -m cProfile -s time `which dolphin` run dolphin_config.yaml:

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      410  490.677    1.197  490.677    1.197 {built-in method osgeo._gdal_array.DatasetIONumPy}
     3682   70.823    0.019   70.823    0.019 {built-in method osgeo._gdal_array.BandRasterIONumPy}
     3624   69.144    0.019   69.144    0.019 {built-in method osgeo._gdal.Band_FlushCache}
        2   29.233   14.616  323.911  161.955 ps.py:18(create_ps)
     1047   21.461    0.020   21.461    0.020 {built-in method numpy.array}
      232   18.691    0.081  509.818    2.197 io.py:654(iter_blocks)
     4998   18.075    0.004   18.075    0.004 {method 'reduce' of 'numpy.ufunc' objects}
     4816   16.160    0.003   16.160    0.003 {built-in method osgeo._gdal.Open}
     1104   13.529    0.012   13.529    0.012 {method 'read' of '_io.BufferedReader' objects}
      832   12.733    0.015   12.733    0.015 {method 'repeat' of 'numpy.ndarray' objects}
      208   12.439    0.060   20.947    0.101 mle.py:262(_fill_ps_pixels)
      208   12.055    0.058   57.488    0.276 sequential.py:315(compress)
9850/8474   11.845    0.001   89.194    0.011 {built-in method numpy.core._multiarray_umath.implement_array_function}
      416    7.788    0.019   20.570    0.049 utils.py:424(upsample_nearest)
      496    7.746    0.016    7.746    0.016 {method 'copy' of 'numpy.ndarray' objects}
  661/659    7.177    0.011    7.178    0.011 {method 'astype' of 'numpy.ndarray' objects}
      208    7.078    0.034    7.078    0.034 {method 'conj' of 'numpy.ndarray' objects}
  189/188    6.986    0.037    6.989    0.037 {built-in method _imp.create_dynamic}
        2    6.669    3.334  512.544  256.272 sequential.py:34(run_evd_sequential)
       16    6.468    0.404   20.647    1.290 nanfunctions.py:1616(nanvar)
      208    5.886    0.028   89.729    0.431 mle.py:24(run_mle)
    10762    5.764    0.001    5.796    0.001 {built-in method posix.stat}
      448    5.642    0.013   30.522    0.068 nanfunctions.py:68(_replace_nan)
      416    5.581    0.013    5.584    0.013 {method 'get' of 'cupy._core.core._ndarray_base' objects}
      208    5.417    0.026    5.417    0.026 {built-in method cupy.cuda.device.get_cublas_handle}
       56    4.775    0.085    4.775    0.085 {built-in method osgeo._gdal.InfoInternal}
     1076    4.526    0.004    4.526    0.004 {built-in method io.open_code}
       50    4.114    0.082    4.115    0.082 {built-in method posix.mkdir}
       54    3.518    0.065    3.518    0.065 {built-in method osgeo._gdal.Dataset_FlushCache}
       16    2.938    0.184   44.982    2.811 ps.py:95(calc_ps_block)
      208    2.294    0.011   10.621    0.051 linalg.py:2357(norm)
      208    2.137    0.010   23.149    0.111 _mle_gpu.py:14(run_gpu)
     3140    2.017    0.001    2.022    0.001 driver.py:317(safe_cuda_api_call)
      224    1.876    0.008   19.697    0.088 nanfunctions.py:952(nanmean)
       28    1.815    0.065    1.839    0.066 io.py:201(copy_projection)
        7    1.795    0.256    1.795    0.256 {built-in method _ctypes.dlopen}
       54    1.677    0.031    1.677    0.031 {built-in method osgeo._gdal.Driver_Create}
        4    1.594    0.399    2.157    0.539 stack.py:133(_write)
      232    1.424    0.006  511.715    2.206 std.py:1174(__iter__)
      208    1.409    0.007    1.456    0.007 mle.py:253(_check_all_nans)
       28    1.326    0.047    1.352    0.048 stitching.py:254(_blend_new_arr)
      208    1.294    0.006    1.318    0.006 {cupy.cusolver.syevj}
      256    1.008    0.004    1.018    0.004 nanfunctions.py:187(_divide_by_count)
        2    1.007    0.503    1.007    0.503 core.py:3435(__setmask__)
     1489    0.942    0.001    0.942    0.001 {built-in method posix.lstat}
      208    0.554    0.003    7.949    0.038 mle.py:162(mle_stack)
        1    0.470    0.470    0.470    0.470 {built-in method cupy_backends.cuda.libs.nvrtc.compileProgram}
     3624    0.440    0.000   81.974    0.023 io.py:506(write_block)
       54    0.433    0.008    6.097    0.113 io.py:365(write_arr)
       32    0.422    0.013    0.422    0.013 {built-in method posix.rename}
      112    0.376    0.003    0.376    0.003 {built-in method osgeo._gdal.InvGeoTransform}
        1    0.368    0.368   67.388   67.388 stitching.py:20(merge_by_date)
       14    0.367    0.026   67.004    4.786 stitching.py:73(merge_images)
       32    0.341    0.011    0.341    0.011 ufunclike.py:127(isposinf)
      160    0.301    0.002    0.301    0.002 {built-in method osgeo._gdal.Dataset_GetProjection}
       28    0.292    0.010    7.537    0.269 interferogram.py:188(__init__)
      478    0.273    0.001    0.487    0.001 {built-in method io.open}
       32    0.262    0.008    0.262    0.008 ufunclike.py:199(isneginf)
        2    0.258    0.129  858.900  429.450 wrapped_phase.py:11(run)
      172    0.238    0.001   13.329    0.077 io.py:44(get_raster_xysize)
       56    0.214    0.004    0.214    0.004 {built-in method osgeo._gdal.new_GDALInfoOptions}
      208    0.199    0.001    0.206    0.001 basic.py:196(zeros)
    11248    0.195    0.000    0.195    0.000 {built-in method unicodedata.east_asian_width}
       54    0.193    0.004    0.196    0.004 crs.py:183(__init__)
      842    0.180    0.000    0.191    0.000 ffi.py:149(__call__)
   596672    0.161    0.000    0.187    0.000 {built-in method builtins.isinstance}
86012/14153    0.159    0.000    0.262    0.000 ir.py:313(_rec_list_vars)
      230    0.155    0.001    0.156    0.001 numerictypes.py:356(issubdtype)
        1    0.154    0.154    0.154    0.154 nvvm.py:215(compile)
     1070    0.126    0.000    0.126    0.000 {built-in method marshal.loads}
        2    0.124    0.062    0.124    0.062 core.py:1003(__call__)
83355/82097    0.119    0.000    0.151    0.000 {built-in method builtins.getattr}
4570/4519    0.115    0.000    0.240    0.000 {built-in method builtins.__build_class__}
       16    0.093    0.006   22.638    1.415 nanfunctions.py:1777(nanstd)
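
(Side note: if the profile is saved to a file with -o instead of printed with -s time, the same ordering can be reproduced afterwards with the standard-library pstats module; the dolphin.prof filename below is just an example.)

import pstats

# Assumes the run was profiled with:
#   python -m cProfile -o dolphin.prof `which dolphin` run dolphin_config.yaml
stats = pstats.Stats("dolphin.prof")
stats.sort_stats("tottime").print_stats(25)  # "tottime" matches the -s time ordering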

You can also see how much time the GPU spends waiting from its utilization over the run:

[Image: GPU utilization plot over the run]

@scottstanie
Collaborator Author

scottstanie commented Feb 2, 2023

OK, in my quick attempt to implement it in #25, it's already working pretty well.

Total time:

[02/02 07:49:06] [INFO _log.py] Total elapsed time for run : 6.57 minutes (394.25 seconds)
  • PS creation went from 160 seconds -> 60 seconds per burst
  • The wrapped phase step went from 256 seconds -> 124 seconds per burst
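
(For anyone curious about the mechanics: the gist of the change is to kick off the read of the next block while the current block is still being processed. A rough one-block-ahead sketch of that pattern, with placeholder names, not the actual diff in #25:)

from concurrent.futures import ThreadPoolExecutor


def run_prefetched(read_block, process_block, block_slices):
    """Overlap reading block i+1 with processing block i."""
    if not block_slices:
        return
    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(read_block, block_slices[0])
        for next_slices in block_slices[1:]:
            data = future.result()                             # wait for the current read
            future = executor.submit(read_block, next_slices)  # start the next read
            process_block(data)                                # compute while it reads
        process_block(future.result())                         # final block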

GPU usage (blue spikes that are closer together mean less waiting-around time for the GPU)

[Image: GPU utilization plot after the change]

@scottstanie
Collaborator Author

Closed by #25
