block-by-block IO - part 2 #488
Merged
+ block-by-block IO for tropo_pyaps3 using writefile.layout_hdf5() and writefile.write_hdf5_block()
+ add run_or_skip() within calculate_delay_timeseries() for auto-skip
+ add --ram/--memory option for custom memory usage, with a default value of 2 GB and template reading support
+ import cluster.split_box2sub_boxes() for patch splitting
+ block-by-block processing using writefile.layout_hdf5()/write_hdf5_block()
+ fully integrate the bootstrap method with complex time func support
+ add --ram/--memory option for max memory usage setup
+ split run_timeseries2time_func() into:
  - read_inps2model() to get the model dict and print key model info
  - layout_hdf5() to create the HDF5 file with the time func structure
  - write_hdf5_block() to write each block of the time func
+ objects.cluster.split_box2sub_boxes: refactor
+ dem_error: do not write step_model into file/disk, because:
  1. the step function estimation is now supported via timeseries2velocity, which has more powerful functionality
  2. the step model HDF5 file from dem_error.py is different from the one from ts2vel.h5, and the latter is preferred for its simplicity
+ drop support for the common operations on the timeseriesStepModel.h5 file, which sometimes has one date in the time dimension, in the following scripts:
  - geocode.py
  - mask.py
  - multilook.py
  - subset.py
+ dem_error: replace split2boxes() with cluster.split_box2sub_boxes()
+ test_sbApp: plot velocity alone for snap and aria as well
…or 3D dset
+ utils.readfile:
  - read_hdf5_file/binary(): fix the size discrepancy when x/ystep > 1, to be consistent with the multilooking output size
  - read_hdf5_file(): use a for loop when ystep * xstep > 1 for 3D datasets, to save memory
+ multilook.multilook_dataset(): add method arg to support/switch between average and nearest
+ view: expand multilook_num to all multiple-subplot scenarios to save memory, because readfile.read(x/ystep) now won't distort the data (nearest sampling instead of the previous averaging)
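The memory-saving for-loop read for decimated 3D datasets might look like the sketch below (a hypothetical helper; the real readfile.read_hdf5_file() operates on h5py datasets and handles many more options). Looping over the first axis keeps only one 2D slice in memory at a time:

```python
import numpy as np

def read_3d_with_step(dset, ystep=1, xstep=1):
    """Read a (time, y, x) dataset decimated by (ystep, xstep) via nearest
    sampling, one 2D slice at a time to limit peak memory."""
    num, length, width = dset.shape
    if ystep * xstep == 1:
        return dset[:]                       # no decimation: read as-is
    out_len, out_wid = length // ystep, width // xstep
    out = np.empty((num, out_len, out_wid), dtype=dset.dtype)
    for i in range(num):
        # only dset[i] (one 2D slice) is materialized per iteration
        out[i] = dset[i][:out_len * ystep:ystep, :out_wid * xstep:xstep]
    return out
```

With an h5py dataset, `dset[i]` reads a single slice from disk, so the full 3D cube never resides in memory.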
+ For executable scripts, use
  ```
  if __name__ == '__main__':
      main(sys.argv[1:])
  ```
  instead of
  ```
  if __name__ == '__main__':
      main()
  ```
  because the latter raises an error in interactive Python when cmd_line_parse() is not called in the main body of main(), as in tsview.py; the former is therefore more generic and useful
+ defaults/auto_path: use watermask.msk for ARIA
+ use pyresample as the default software for geocoding datasets produced by isce (lut in radar-coord) and gamma (lut in geo-coord)
+ a comparison of geocoding results between pyresample and scipy on the Wells EQ dataset gives identical results on 99.1% of all valid pixels; thus change the default geocoding software from scipy to pyresample for:
  1. consistency with the config for other processors
  2. flexibility, i.e. customized SNWE and lat/lon step
  3. efficiency: pyresample supports 3D matrices and is thus more efficient
+ consistent internal definitions: clean up the following concepts/variables in resample objects:
  - always use coordinates at the pixel center for interpolation
  - SNWE indicates the bounding box at the pixel outer boundary, consistent with the Y/X_FIRST definition, unless noted in the adjacent comments
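The pixel-center vs. outer-boundary conventions can be illustrated with a small sketch (hypothetical helper names; it assumes, per the note above, that Y/X_FIRST references the outer corner of the first pixel and that y_step is negative for north-up grids):

```python
import numpy as np

def meta2snwe(y_first, x_first, y_step, x_step, length, width):
    """SNWE bounding box at the pixel OUTER boundary, consistent with
    the Y/X_FIRST definition (Y/X_FIRST at the outer corner)."""
    return (y_first + y_step * length,   # S (y_step < 0 for descending lat)
            y_first,                     # N
            x_first,                     # W
            x_first + x_step * width)    # E

def pixel_center_coords(y_first, x_first, y_step, x_step, length, width):
    """1D lat/lon vectors at pixel CENTERS, the coordinates used
    for interpolation."""
    lats = y_first + y_step * (np.arange(length) + 0.5)
    lons = x_first + x_step * (np.arange(width) + 0.5)
    return lats, lons
```

Keeping these two conventions distinct avoids the half-pixel shifts that otherwise creep in when mixing bounding boxes and interpolation grids.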
+ rename mintpy.compute.memorySize to mintpy.compute.maxMemory for a more intuitive name
+ change the default max memory from 2 GB to 4 GB
+ dem_error: add a dask parallel option in preparation for dask support
+ sbApp(_auto).cfg: merge mintpy.geocode.latStep and mintpy.geocode.lonStep into one option, mintpy.geocode.laloStep, for consistency with the mintpy.objects.resample object
* geocode.py
  + add --ram option from utils.arg_group.py
  + merge -y/x into --lalo-step for consistency with the resample object
  + more explicit checking / error messages for the --lalo-step option, since it is only customizable if radar2geo AND the lut is in radar-coord
  + block-by-block IO for both HDF5 and binary files; the latter is block-by-block in read only
* objects/resample.py
  + move all configuration into __init__() to simplify run_resample()
  + consistent member variables across all scenarios (radar2geo/geo2radar, geo/radar-coord lookup table, scipy/pyresample), including:
    - lalo_step
    - SNWE
    - length/width
    - src/dest_box_list
    - src/dest_def_list (for pyresample)
    - src/dest_pts and interp_mask (for scipy)
  + add get_num_box()
  + prepare_geometry_definition_radar():
    - add block-by-block geometry preparation for radar2geo
    - add custom SNWE support for geo2radar
  + prepare_geometry_definition_geo():
    - add custom SNWE support for radar2geo
+ use max_memory to calculate the block size in temp_avg / pha_closure / ifg_inv
+ round the block step to the nearest 10
+ add elapsed-time info for load_data and plot_sbApp
+ update mkdocs.yml
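Deriving a block size from max_memory could look like the following sketch. The safety factor and the rounding direction are assumptions for illustration; MintPy's per-script factors differ:

```python
import numpy as np

def estimate_block_rows(num_date, width, max_memory_gb=4.0,
                        dtype=np.float32, scale=3.0):
    """Number of rows per block such that `scale` copies of one
    (num_date, num_row, width) block fit within max_memory_gb.
    `scale` is an assumed safety factor for intermediate arrays."""
    bytes_per_row = num_date * width * np.dtype(dtype).itemsize * scale
    num_row = int(max_memory_gb * 1024**3 / bytes_per_row)
    # round down to a multiple of 10 (the changelog rounds the block
    # step to the nearest 10); keep at least 10 rows per block
    return max(10, num_row // 10 * 10)
```

For the 128 GB example in the description, a (384, 8412, 5276) float32 stack with a 4 GB budget yields blocks of roughly 170 rows, i.e. about 50 passes over the file instead of one full load.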
+ change the default mintpy.compute.cluster value from no to none, to be consistent with utils.arg_group.add_parallel_argument()
+ dem_error: add parallel computing support via dask
+ docs/hdfeos5.md: add a metadata section with require / recommend / auto-grab sub-sections, to facilitate manual specification
+ save_hdfeos5: change the following metadata:
  - remove "frame" from the UNAVCO definition completely and use first/last_frame only, for simplicity
  - uncomment processing_software
  - hardwire processing_type = LOS_TIMESERIES; this can be changed in the future if velocity/interferogram capability is added
  - hardwire post_processing_method = MintPy; this can be changed in the future if the script supports products from other software
+ save_hdfeos5: add date-by-date IO to save memory / handle big data
+ move the following sub-functions into a new sub-module utils.attribute:
  - utils.utils0.subset_attribute()  --> update_attribute4subset()
  - multilook.multilook_attribute()  --> update_attribute4multilook()
  - geocode.metadata_radar2geo()     --> update_attribute4radar2geo()
  - geocode.metadata_geo2radar()     --> update_attribute4geo2radar()
+ update docs/api/module_hierarchy.md for utils.attribute/arg_group
+ add `mintpy.load.x/ystep` with a default value of 1 for smallbaselineApp.py
+ multilook.multilook_data():
  - add default lks_y/x value of 1
  - return directly if no multilook number is specified
+ load_data.py:
  - use iDict to replace inpsDict for simplicity
  - read mintpy.load.x/ystep and pass them to the ifgramStackDict/geometryDict objects
+ objects/stackDict.py: support x/ystep in all write2hdf5()
+ prep_aria: support multilook via mintpy.load.x/ystep
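A minimal sketch of the multilook_data() behavior described above: default looks of 1 with an early return, and a method switch between nearest sampling and box averaging. The reshape-and-mean idiom is a common NumPy pattern, not necessarily MintPy's exact implementation:

```python
import numpy as np

def multilook_data(data, lks_y=1, lks_x=1, method='average'):
    """Downsample the last two axes of `data` by (lks_y, lks_x) looks."""
    if lks_y == 1 and lks_x == 1:
        return data                      # no multilook number: return directly
    length, width = data.shape[-2:]
    out_len, out_wid = length // lks_y, width // lks_x
    if method == 'nearest':
        # keep one pixel per window: same output size, no averaging distortion
        return data[..., :out_len * lks_y:lks_y, :out_wid * lks_x:lks_x]
    # 'average': reshape into (..., out_len, lks_y, out_wid, lks_x) windows
    data = data[..., :out_len * lks_y, :out_wid * lks_x]
    shape = data.shape[:-2] + (out_len, lks_y, out_wid, lks_x)
    return data.reshape(shape).mean(axis=(-3, -1))
```

Both methods truncate to whole look windows, so `nearest` and `average` produce the same output size, which is the consistency fix the readfile change above relies on.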
Description of proposed changes
This PR, together with #478, addresses the memory issue for large datasets in the routine workflow smallbaselineApp.py, e.g., #199, #216, #473. Testing on a 128 GB ifgramStack.h5 file, with unwrapPhase in the shape of (384, 8412, 5276), on my laptop (with 16 GB of memory) shows a max memory usage of 4 GB.

+ block-by-block IO for the following scripts:
+ memory-efficient view.py via readfile.read(x/ystep), with improved handling of large 3D matrices
+ timeseries2velocity: integrate the complex time func with bootstrap, so that one could use 1) bootstrap or 2) normal least squares with error propagation for the estimation of the complex time function and its uncertainty
+ dem_error: mintpy.compute.* options, in the same way as ifgram_inversion.py
+ add mintpy.load.x/ystep option to support multilooking during the load_data step, to downsize the dataset

Reminders