DaskCluster: a general dask wrapper for mintpy by Ovec8hkin · Pull Request #357 · insarlab/MintPy

Ovec8hkin · 2020-05-27T19:49:37Z

Generalizes the use of Dask by creating a DaskCluster object that handles cluster setup, client connections, and worker submission and processing (#354). This should make integrating Dask into other scripts much easier.

Also updates the Dask performance figure of runtime vs. number of cores (6 in #347).

Reminders

Pass Codacy code review (green)
Pass testing with $MINTPY_HOME/test/test_smallbaselineApp.py
Make sure that your code follows our style. Use the other functions/files as a basis.
If modifying functionality, describe changes to function behavior and arguments in a comment below the function declaration.
If adding new functionality, add a detailed description to the documentation and/or an example.

Ovec8hkin · 2020-05-27T19:51:12Z

@yunjunz @falkamelung I would have someone check out this branch and test it on a reasonably large dataset (AND LOOK AT THE RESULTS) to make sure it works as expected. I messed it up in a few places when I initially wrote it, which caused only part of the dataset to run correctly. Kilauea would be a good dataset to test on, in my opinion.

+ move initiation and scale code to open() for a lightweight __init__()
+ simplify submit_work() and collect_result() and add more content to run()
+ bring back the comments from the old ifgram_inversion.py for bookkeeping
+ change the default value for config_name from 'no' to None in ifgram_inversion.py and from False to None in the template file, so that they are consistent.
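
The open() / run() / close() split described in this commit message can be sketched as follows. This is only an illustration of the pattern, not the code in mintpy/objects/cluster.py: it uses the standard library's concurrent.futures.ThreadPoolExecutor as a stand-in for a Dask cluster and client so that it runs without Dask installed. The names SketchCluster, split_box and count_pixels, and the (x0, y0, x1, y1) box convention, are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def split_box(box, num_split):
    """Split (x0, y0, x1, y1) into up to num_split sub-boxes along y."""
    x0, y0, x1, y1 = box
    step = -(-(y1 - y0) // num_split)  # ceiling division
    return [(x0, y, x1, min(y + step, y1)) for y in range(y0, y1, step)]


class SketchCluster:
    """Hypothetical wrapper: lightweight __init__, resources created in open()."""

    def __init__(self, num_worker=4):
        self.num_worker = num_worker
        self.pool = None  # nothing is started here

    def open(self):
        # stand-in for creating the cluster, scaling it and connecting a client
        self.pool = ThreadPoolExecutor(max_workers=self.num_worker)

    def run(self, func, box):
        """Submit func on each sub-box and collect results keyed by sub-box."""
        futures = {self.pool.submit(func, sub): sub
                   for sub in split_box(box, self.num_worker)}
        results = {}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
        return results

    def close(self):
        self.pool.shutdown(wait=True)
        self.pool = None


def count_pixels(sub_box):
    """Toy workload: number of pixels in the sub-box."""
    x0, y0, x1, y1 = sub_box
    return (x1 - x0) * (y1 - y0)


cluster = SketchCluster(num_worker=3)
cluster.open()
results = cluster.run(count_pixels, box=(0, 0, 10, 10))
cluster.close()
total = sum(results.values())
```

Keying each result by its sub-box makes it easy to check that every part of the dataset was processed, whatever order the workers finish in.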

yunjunz · 2020-05-31T06:40:17Z

From my testing with two datasets, the actual memory usage does not quite match what is specified with the -r option. We need to re-check the following to make sure the memorySize we input is the maximum memory the program will use during the whole run. This is important for job scheduling on HPC, where control/checks are stricter.

the calculation from memorySize to chunkSize
for -w var, a temporary float64 array is used, which might increase the memory usage (now fixed)
at the end of the cluster job, there is a memory surge; need to find out why.
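
To make the first two items concrete, here is a rough sketch of how a memory budget could translate into a chunk size, and how a float64 temporary changes the answer. The function rows_per_chunk, its parameters and the formula are assumptions for illustration; this is not the calculation used in ifgram_inversion.py.

```python
def rows_per_chunk(memory_gb, num_ifgram, width, itemsize=4, temp_itemsize=8):
    """Rows of a (num_ifgram, rows, width) stack that fit in memory_gb.

    itemsize:      bytes per stored sample (4 for float32)
    temp_itemsize: bytes per sample of a temporary copy (8 for float64);
                   set to 0 if no temporary is created
    """
    bytes_per_row = num_ifgram * width * (itemsize + temp_itemsize)
    budget = int(memory_gb * 1024 ** 3)
    return max(1, budget // bytes_per_row)


# hypothetical stack: 500 interferograms, 2000 pixels wide, 4 GB budget
rows_f32_only = rows_per_chunk(4, 500, 2000, temp_itemsize=0)
rows_with_f64 = rows_per_chunk(4, 500, 2000)
```

With these made-up numbers, accounting for the float64 temporary cuts the number of rows per chunk to about one third. If the chunk size is computed without it, the process can use several times the requested memory.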

yunjunz · 2020-05-31T06:57:12Z

On my laptop, when I had two ifgram_inversion.py processes running, the following message appeared. I guess it's fine. The "6 workers" is a little bit weird; I will check again in more detail tomorrow (it was a bug, now it's fixed).

------- start parallel processing using Dask -------
input Dask cluster type: local
initiate Dask cluster
/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/distributed/node.py:244: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 57863 instead
  http_address["port"], self.http_server.port
scale the cluster to 6 workers

FYI, I was running ifgram_inversion.py inputs/ifgramStack.h5 -w var for the Fernandina dataset in one terminal and $MINTPY_HOME/test/test_smallbaselineApp.py in another terminal.
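
The "Port 8787 is already in use" warning reflects a common fallback pattern: try a preferred port, and if it is taken, let the operating system pick a free one. The sketch below reproduces that pattern with plain sockets from the standard library. It is not Dask's implementation, and bind_with_fallback is a hypothetical helper.

```python
import socket


def bind_with_fallback(preferred_port, host="127.0.0.1"):
    """Bind to preferred_port, or to an OS-chosen free port if it is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, preferred_port))
    except OSError:
        # port already in use: start over and let the OS pick a free port
        sock.close()
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind((host, 0))
    sock.listen(1)
    return sock


first = bind_with_fallback(0)            # OS picks a free port
taken_port = first.getsockname()[1]
second = bind_with_fallback(taken_port)  # same port requested again
second_port = second.getsockname()[1]
first.close()
second.close()
```

The second server ends up on a different port, which is what the message reports for the second ifgram_inversion.py process.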

yunjunz · 2020-05-31T07:28:21Z

@Ovec8hkin could we remove the dask-worker-space folder after the run finishes?
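
A minimal sketch of such a cleanup step, assuming the dask-worker-space folder sits in the working directory and no other running process is still using it. The helper remove_worker_space is hypothetical and not part of MintPy.

```python
import shutil
import tempfile
from pathlib import Path


def remove_worker_space(work_dir, folder_name="dask-worker-space"):
    """Delete the scratch folder under work_dir; return True if it existed."""
    path = Path(work_dir) / folder_name
    if not path.is_dir():
        return False
    shutil.rmtree(path, ignore_errors=True)
    return True


# demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    scratch = Path(tmp) / "dask-worker-space"
    scratch.mkdir()
    (scratch / "worker-1.lock").write_text("")
    removed = remove_worker_space(tmp)
    still_there = scratch.exists()
    removed_again = remove_worker_space(tmp)
```

Calling it after the cluster and client are closed avoids deleting files that workers still hold open.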

ifgram_inversion:
+ add block-by-block writing for all output data.
+ move weight calculation before obs reading to reduce memory usage due to the internal float64 format while calculating weight.
+ remove obsolete write_aux2hdf5_file()
writefile:
+ add write_hdf5_block() to support block-by-block writing.
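
The idea behind block-by-block writing can be sketched without HDF5: pre-allocate the output once, then let each block write only its own rows. The example below uses a flat float32 binary file as a stand-in for an HDF5 dataset; create_file, write_block and read_all are hypothetical helpers, not the write_hdf5_block() added in this commit.

```python
import os
import tempfile
from array import array


def create_file(path, length, width):
    """Pre-allocate a zero-filled float32 raster of shape (length, width)."""
    with open(path, "wb") as f:
        f.truncate(length * width * 4)


def write_block(path, data_rows, row_start, width):
    """Write consecutive rows (lists of floats) starting at row_start."""
    with open(path, "r+b") as f:
        f.seek(row_start * width * 4)
        for row in data_rows:
            array("f", row).tofile(f)


def read_all(path):
    """Read the whole file back as a flat list of floats."""
    values = array("f")
    with open(path, "rb") as f:
        values.frombytes(f.read())
    return list(values)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.bin")
    create_file(path, length=4, width=2)
    # blocks can arrive in any order; each one only touches its own rows
    write_block(path, [[5.0, 6.0], [7.0, 8.0]], row_start=2, width=2)
    write_block(path, [[1.0, 2.0], [3.0, 4.0]], row_start=0, width=2)
    values = read_all(path)
```

Only one block needs to be held in memory at a time, which is the point of the change.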

Ovec8hkin

@yunjunz these changes to cluster.py use quite a few non-Pythonic anti-patterns that I think we should avoid. Namely, most of the functions do not need empty returns, and most of the variables do not need to be written as self.parameter_name, especially those that are simply computed and then used once within their own function (i.e. self.futures, self.start_time_sub). Permission to clean this up?
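
A small before/after sketch of the two points raised here. The classes Before and After are made up for illustration and are not taken from cluster.py.

```python
import time


class Before:
    """Pattern under discussion: one-off values stored on self, bare return."""

    def run(self, items):
        self.start_time = time.time()
        self.results = [x * 2 for x in items]
        self.elapsed = time.time() - self.start_time
        return


class After:
    """Same work with locals; the result is returned to the caller."""

    def run(self, items):
        start_time = time.time()
        results = [x * 2 for x in items]
        elapsed = time.time() - start_time
        return results, elapsed


before = Before()
before_return = before.run([1, 2, 3])
after = After()
doubled, elapsed = after.run([1, 2, 3])
```

In the second version the object carries no leftover state between calls, so it is easier to see what each method depends on and produces.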

Ovec8hkin · 2020-06-01T03:52:22Z

This is probably due to how Dask creates its worker Client object. I'm not sure what happens if multiple Client objects are running simultaneously. It's possible they are sharing workers. I would advise against running multiple ifgram_inversion processes simultaneously, as it's very possible that workers submitted from one client might finish and get collected by a different running client, which could mess up the results. This would need to be tested more carefully.

yunjunz · 2020-06-01T04:18:10Z

Hi @Ovec8hkin, the Dask cluster port change message is fine as explained, so I think we can ignore it.

For the variables in cluster.py, if you find any that are used only once and can be cleaned up, yes please.

From my side, the last thing to check is the memory. With the default 4 GB memory input, ifgram_inversion.py actually uses up to 10 GB during processing. We definitely want to avoid that. But this is not related to Dask, so I suggest we handle it in another issue / pull request.

The rest looks all good to me!

yunjunz · 2020-06-01T19:06:11Z

Here is an overview of what got changed by this pull request:

Complexity increasing per file
==============================
- mintpy/ifgram_inversion.py  1
         

Complexity decreasing per file
==============================
+ mintpy/objects/cluster.py  -5
         

Clones removed
==============
+ mintpy/ifgram_inversion.py  -2

See the complete overview on Codacy

yunjunz

All the tests look good and normal. After so many rounds of refactoring, I think now ifgram_inversion.py is the most beautiful script in this repo!

Ovec8hkin added 11 commits May 26, 2020 12:51

Created DaskCluster object for generalization of parallel routines

b87b1af

Added static formatting methods for config name and walltime

0413873

Created separate function to write cluster job script to file when prompted via object argument (default True)

fb4fe83

Created submit_workers function for submitting dask workers to the client scheduler

5e5b8fc

Updated ifgram_inversion.py to use cluster.submit_workers

3f1c1de

Created DaskCluster.compile_workers to recompile subbox results

38f053b

Simplified DaskCluster API by adding run() wrapper method around submit_workers and compile_workers. Also added methods for closing cluster/client connections and moving out/err files.

9aaa98b

Fixed equality check for importing dask_jobqueue

462e4cb

Fixed early return error in DaskCluster.compile_workers

9ef72db

Removed unecesarry print statements

3c4165c

Commented cluster.py and ifgram_inversion.py with information about new cluster abstraction. Moved old_cluster.py to legacy/.

5165308

Ovec8hkin added the enhancement label May 27, 2020

Ovec8hkin requested review from gravelcycles and yunjunz May 27, 2020 19:49

Ovec8hkin and others added 4 commits May 27, 2020 15:03

Updated dask.md documentation

197a30a

Moved write_job_script call so as to only run for HPC cluster types

ef155a5

Changed default value of weightFunc to 'var'

1079121

Update dask.md

aa910aa

yunjunz reviewed May 28, 2020

Comment thread docs/dask.md Outdated

yunjunz reviewed May 28, 2020

Comment thread mintpy/objects/cluster.py Outdated

yunjunz reviewed May 28, 2020

Comment thread mintpy/objects/cluster.py Outdated

falkamelung and others added 8 commits May 29, 2020 09:27

Update dask.md

ad120f3

Update dask.md

2ea6d8c

uses working options

a750497

working slurm options for comet

659abd2

Update dask.md

cdbce21

Update dask.md

f85925b

Updated dask.md documentation

8486f4d

Removed --walltime command line option

e787707

yunjunz and others added 4 commits May 30, 2020 13:54

Update dask.md

ab46d9d

Update smallbaselineApp.cfg

bed6775

Delete old_cluster.py

f2dba06

It's fine to remove the old version as it's in the git history.

yunjunz added 3 commits May 31, 2020 12:35

fix bug for numWorker = all translation

e8e3224

in utils1.check_template_auto_value()

remove ifgram_inversion from the stdout/err_folder name

a04d0af

Ovec8hkin commented Jun 1, 2020

add more del in ifgram_inversion to cut memory usage

b5ea3df

add cluster to the module_hierarchy doc

Ovec8hkin and others added 9 commits May 31, 2020 23:32

Made DaskCluster object implementation more pythonic

1fef224

Fixed type in DaskCluster.collect_result documentation

2e5bfcd

Redefined cluster_kwargs

cd3b559

Update cluster.py

f301aa5

Update dask.md

c72a3e8

remove the duplicated write_hdf5_block in stack.timeseries obj

e00fe78

following codacy's suggestion

add cluster.DaskCluster.format_num_worker()

0ef53e2

Update ifgram_inversion.py

a0f7c2e

Update cluster.py

94eb132

yunjunz approved these changes Jun 1, 2020

yunjunz merged commit 34478e7 into master Jun 1, 2020

yunjunz changed the title ~~DaskCluster object as a general dask wrapper for mintpy~~ DaskCluster: a general dask wrapper for mintpy Jun 1, 2020

yunjunz deleted the general-dask-wrapper branch June 1, 2020 19:45

yunjunz mentioned this pull request Jun 2, 2020

ifg_inv: calibrate memory usage #361

Merged

5 tasks

s-sasaki-earthsea-wizard mentioned this pull request May 2, 2026

RFC: opt-in GPU backend for invert_network (torch.linalg.lstsq, CUDA) #1489

Open

yunjunz merged 50 commits into master from general-dask-wrapper

3 participants
