Background loading / data pre-fetching #6

Closed
scottstanie opened this issue Jan 31, 2023 · 3 comments
Labels: enhancement (New feature or request), run time (Related to workflow processing speed)

Comments

@scottstanie
Collaborator

Currently the data I/O might account for the largest share of the total run time prior to unwrapping.
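For context, the idea would be to read the next block of SLC data on a background thread while the current block is being processed, so the GDAL reads overlap with the computation instead of blocking it. A minimal sketch of that pattern (the names here are illustrative, not dolphin's actual API):

import queue
import threading


def prefetched(block_iter, max_queue_size=2):
    """Run a block iterator on a background thread, reading ahead.

    Up to ``max_queue_size`` blocks are read while the consumer is still
    busy processing earlier ones.
    """
    q = queue.Queue(maxsize=max_queue_size)
    _sentinel = object()

    def _reader():
        for block in block_iter:
            q.put(block)      # blocks here if the consumer falls behind
        q.put(_sentinel)      # signal that the iterator is exhausted

    threading.Thread(target=_reader, daemon=True).start()

    while True:
        block = q.get()
        if block is _sentinel:
            return
        yield block


# Usage: wrap whatever currently yields blocks (e.g. the output of io.iter_blocks):
#   for data, rows, cols in prefetched(block_iterator):
#       process_block(data)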

scottstanie added the enhancement label on Jan 31, 2023
@scottstanie
Collaborator Author

scottstanie commented Feb 1, 2023

Pre-fetching is probably going to be a big speedup:

From a test run with

  • 2 bursts
  • 15 dates per burst
  • no unwrapping
  • 20 meter output spacing (strides = {x:4, y:2})
  • using the GPU

It took 15 minutes total (about 7.5 minutes per burst):

[02/01 09:53:03] [INFO _log.py] Total elapsed time for run : 15.51 minutes (930.62 seconds)
         3997323 function calls (3836454 primitive calls) in 931.307 seconds

...where each PS step took ~160 seconds (5 min 20 seconds total for the 2 bursts), meaning the wrapped phase step took a little over 4 minutes per burst.
The fact that the PS step, which only does a few simple nanmean/nanstd computations, is close in runtime to the phase linking step means that #24 should also be a big speedup.
BUT that won't be as relevant for the online-updating PS case, where we'd only pull and iterate over 1-2 SLCs to get a new PS map.

The top time spent, from python -m cProfile -s time `which dolphin` run dolphin_config.yaml:

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      410  490.677    1.197  490.677    1.197 {built-in method osgeo._gdal_array.DatasetIONumPy}
     3682   70.823    0.019   70.823    0.019 {built-in method osgeo._gdal_array.BandRasterIONumPy}
     3624   69.144    0.019   69.144    0.019 {built-in method osgeo._gdal.Band_FlushCache}
        2   29.233   14.616  323.911  161.955 ps.py:18(create_ps)
     1047   21.461    0.020   21.461    0.020 {built-in method numpy.array}
      232   18.691    0.081  509.818    2.197 io.py:654(iter_blocks)
     4998   18.075    0.004   18.075    0.004 {method 'reduce' of 'numpy.ufunc' objects}
     4816   16.160    0.003   16.160    0.003 {built-in method osgeo._gdal.Open}
     1104   13.529    0.012   13.529    0.012 {method 'read' of '_io.BufferedReader' objects}
      832   12.733    0.015   12.733    0.015 {method 'repeat' of 'numpy.ndarray' objects}
      208   12.439    0.060   20.947    0.101 mle.py:262(_fill_ps_pixels)
      208   12.055    0.058   57.488    0.276 sequential.py:315(compress)
9850/8474   11.845    0.001   89.194    0.011 {built-in method numpy.core._multiarray_umath.implement_array_function}
      416    7.788    0.019   20.570    0.049 utils.py:424(upsample_nearest)
      496    7.746    0.016    7.746    0.016 {method 'copy' of 'numpy.ndarray' objects}
  661/659    7.177    0.011    7.178    0.011 {method 'astype' of 'numpy.ndarray' objects}
      208    7.078    0.034    7.078    0.034 {method 'conj' of 'numpy.ndarray' objects}
  189/188    6.986    0.037    6.989    0.037 {built-in method _imp.create_dynamic}
        2    6.669    3.334  512.544  256.272 sequential.py:34(run_evd_sequential)
       16    6.468    0.404   20.647    1.290 nanfunctions.py:1616(nanvar)
      208    5.886    0.028   89.729    0.431 mle.py:24(run_mle)
    10762    5.764    0.001    5.796    0.001 {built-in method posix.stat}
      448    5.642    0.013   30.522    0.068 nanfunctions.py:68(_replace_nan)
      416    5.581    0.013    5.584    0.013 {method 'get' of 'cupy._core.core._ndarray_base' objects}
      208    5.417    0.026    5.417    0.026 {built-in method cupy.cuda.device.get_cublas_handle}
       56    4.775    0.085    4.775    0.085 {built-in method osgeo._gdal.InfoInternal}
     1076    4.526    0.004    4.526    0.004 {built-in method io.open_code}
       50    4.114    0.082    4.115    0.082 {built-in method posix.mkdir}
       54    3.518    0.065    3.518    0.065 {built-in method osgeo._gdal.Dataset_FlushCache}
       16    2.938    0.184   44.982    2.811 ps.py:95(calc_ps_block)
      208    2.294    0.011   10.621    0.051 linalg.py:2357(norm)
      208    2.137    0.010   23.149    0.111 _mle_gpu.py:14(run_gpu)
     3140    2.017    0.001    2.022    0.001 driver.py:317(safe_cuda_api_call)
      224    1.876    0.008   19.697    0.088 nanfunctions.py:952(nanmean)
       28    1.815    0.065    1.839    0.066 io.py:201(copy_projection)
        7    1.795    0.256    1.795    0.256 {built-in method _ctypes.dlopen}
       54    1.677    0.031    1.677    0.031 {built-in method osgeo._gdal.Driver_Create}
        4    1.594    0.399    2.157    0.539 stack.py:133(_write)
      232    1.424    0.006  511.715    2.206 std.py:1174(__iter__)
      208    1.409    0.007    1.456    0.007 mle.py:253(_check_all_nans)
       28    1.326    0.047    1.352    0.048 stitching.py:254(_blend_new_arr)
      208    1.294    0.006    1.318    0.006 {cupy.cusolver.syevj}
      256    1.008    0.004    1.018    0.004 nanfunctions.py:187(_divide_by_count)
        2    1.007    0.503    1.007    0.503 core.py:3435(__setmask__)
     1489    0.942    0.001    0.942    0.001 {built-in method posix.lstat}
      208    0.554    0.003    7.949    0.038 mle.py:162(mle_stack)
        1    0.470    0.470    0.470    0.470 {built-in method cupy_backends.cuda.libs.nvrtc.compileProgram}
     3624    0.440    0.000   81.974    0.023 io.py:506(write_block)
       54    0.433    0.008    6.097    0.113 io.py:365(write_arr)
       32    0.422    0.013    0.422    0.013 {built-in method posix.rename}
      112    0.376    0.003    0.376    0.003 {built-in method osgeo._gdal.InvGeoTransform}
        1    0.368    0.368   67.388   67.388 stitching.py:20(merge_by_date)
       14    0.367    0.026   67.004    4.786 stitching.py:73(merge_images)
       32    0.341    0.011    0.341    0.011 ufunclike.py:127(isposinf)
      160    0.301    0.002    0.301    0.002 {built-in method osgeo._gdal.Dataset_GetProjection}
       28    0.292    0.010    7.537    0.269 interferogram.py:188(__init__)
      478    0.273    0.001    0.487    0.001 {built-in method io.open}
       32    0.262    0.008    0.262    0.008 ufunclike.py:199(isneginf)
        2    0.258    0.129  858.900  429.450 wrapped_phase.py:11(run)
      172    0.238    0.001   13.329    0.077 io.py:44(get_raster_xysize)
       56    0.214    0.004    0.214    0.004 {built-in method osgeo._gdal.new_GDALInfoOptions}
      208    0.199    0.001    0.206    0.001 basic.py:196(zeros)
    11248    0.195    0.000    0.195    0.000 {built-in method unicodedata.east_asian_width}
       54    0.193    0.004    0.196    0.004 crs.py:183(__init__)
      842    0.180    0.000    0.191    0.000 ffi.py:149(__call__)
   596672    0.161    0.000    0.187    0.000 {built-in method builtins.isinstance}
86012/14153    0.159    0.000    0.262    0.000 ir.py:313(_rec_list_vars)
      230    0.155    0.001    0.156    0.001 numerictypes.py:356(issubdtype)
        1    0.154    0.154    0.154    0.154 nvvm.py:215(compile)
     1070    0.126    0.000    0.126    0.000 {built-in method marshal.loads}
        2    0.124    0.062    0.124    0.062 core.py:1003(__call__)
83355/82097    0.119    0.000    0.151    0.000 {built-in method builtins.getattr}
4570/4519    0.115    0.000    0.240    0.000 {built-in method builtins.__build_class__}
       16    0.093    0.006   22.638    1.415 nanfunctions.py:1777(nanstd)
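
(Side note: if the profile is saved to a file with -o instead of printed with -s time, the same ordering can be reproduced afterwards with the standard-library pstats module; the dolphin.prof filename below is just an example.)

import pstats

# Assumes the run was profiled with:
#   python -m cProfile -o dolphin.prof `which dolphin` run dolphin_config.yaml
stats = pstats.Stats("dolphin.prof")
stats.sort_stats("tottime").print_stats(25)  # "tottime" matches the -s time ordering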

You can also see how much time the GPU spends waiting from its utilization over the run:

[Image: GPU utilization plot over the run]

@scottstanie
Collaborator Author

scottstanie commented Feb 2, 2023

OK, in my quick attempt to implement it in #25, it's already working pretty well.

Total time:

[02/02 07:49:06] [INFO _log.py] Total elapsed time for run : 6.57 minutes (394.25 seconds)
  • PS creation went from 160 seconds -> 60 seconds per burst
  • The wrapped phase step went from 256 seconds -> 124 seconds per burst
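
(For anyone curious about the mechanics: the gist of the change is to kick off the read of the next block while the current block is still being processed. A rough one-block-ahead sketch of that pattern, with placeholder names, not the actual diff in #25:)

from concurrent.futures import ThreadPoolExecutor


def run_prefetched(read_block, process_block, block_slices):
    """Overlap reading block i+1 with processing block i."""
    if not block_slices:
        return
    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(read_block, block_slices[0])
        for next_slices in block_slices[1:]:
            data = future.result()                             # wait for the current read
            future = executor.submit(read_block, next_slices)  # start the next read
            process_block(data)                                # compute while it reads
        process_block(future.result())                         # final block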

GPU usage (blue spikes that are closer together mean less waiting-around time for the GPU)

[Image: GPU utilization plot after the change]

@scottstanie
Collaborator Author

Closed by #25
