Skip to content

DaskCluster: a general dask wrapper for mintpy#357

Merged
yunjunz merged 50 commits into
masterfrom
general-dask-wrapper
Jun 1, 2020
Merged

DaskCluster: a general dask wrapper for mintpy#357
yunjunz merged 50 commits into
masterfrom
general-dask-wrapper

Conversation

@Ovec8hkin
Copy link
Copy Markdown
Contributor

@Ovec8hkin Ovec8hkin commented May 27, 2020

Generalized use of Dask with creation of DaskCluster object that handles cluster setup, client connections, and worker submission and processing (#354). This should make integrating dask into other scripts much easier.

And update the Dask performance figure of runtime vs number of cores (6 in #347).

Reminders

  • Pass Codacy code review (green)
  • Pass testing with $MINTPY_HOME/test/test_smallbaselineApp.py
  • Make sure that your code follows our style. Use the other functions/files as a basis.
  • If modifying functionality, describe changes to function behavior and arguments in a comment below the function declaration.
  • If adding new functionality, add a detailed description to the documentation and/or an example.

@Ovec8hkin
Copy link
Copy Markdown
Contributor Author

@yunjunz @falkamelung I would have someone checkout this branch and test it on reasonably large dataset (AND LOOK AT THE RESULTS) to make sure it works as expected. I messed it up in a few places when I initially wrote it, which caused only part of the dataset to run correctly. Kilauea would be a good dataset to test on in my opinion.

Comment thread docs/dask.md Outdated
Comment thread mintpy/objects/cluster.py Outdated
Comment thread mintpy/objects/cluster.py Outdated
yunjunz and others added 4 commits May 30, 2020 13:54
It's fine to remove the old version as it's in the git history.
+ move initiation and scale code to open() for a light weight __init()__

+ simplify the submit_work() and collect_result() and add more content to run()

+ bring back the comments from old ifgram_inversion.py for book keeping

+ change the default value for config_name from 'no' to None in ifgram_inversion.py and False to None in template file, so that they are consistent.
@yunjunz
Copy link
Copy Markdown
Member

yunjunz commented May 31, 2020

From my testing with two datasets, the current memory usage is not quite as specified in the -r option. We need to re-check the following to make sure memorySize we input is the max memory the program is gonna use during the whole time period. This is important for job scheduling on HPC where control/check is more restrict.

  • the calculation from memorySize to chunkSize
  • for -w var a temporary float64 is used, this might bring up the memory (now fixed)
  • in the end of the cluster job, there is a memory surge, need to find out why.

@yunjunz
Copy link
Copy Markdown
Member

yunjunz commented May 31, 2020

On my laptop, when I have 2 ifgram_inversion.py running, the following message came out. I guess it's fine. The "6 workers" is a little bit weird, will check again in more details tomorrow (it was a bug, now it's fixed).

------- start parallel processing using Dask -------
input Dask cluster type: local
initiate Dask cluster
/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/distributed/node.py:244: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 57863 instead
  http_address["port"], self.http_server.port
scale the cluster to 6 workers

FYI, I was running ifgram_inversion.py inputs/ifgramStack.h5 -w var for the Fernandina dataset in one terminal and $MINTPY_HOME/test/test_smallbaselineApp.py in another terminal.

@yunjunz
Copy link
Copy Markdown
Member

yunjunz commented May 31, 2020

@Ovec8hkin could we remove the dask-worker-space folder after the running?

yunjunz added 3 commits May 31, 2020 12:35
in utils1.check_template_auto_value()
ifgram_inversion:
+ add block-by-block writing for all output data.
+ move weight calculation before obs reading to reduce memory usage due to the internal float64 format while calculating weight.
+ remove obsolete write_aux2hdf5_file()

writefile:
+ add write_hdf5_block() to support block-by-block writing.
Copy link
Copy Markdown
Contributor Author

@Ovec8hkin Ovec8hkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yunjunz these changes to cluster.py use quite a few non-pythonic anti patterns that I think we should avoid. Namely, there doesn't need to be empty returns in most of the functions, and most of the variables do not need to be written as self.parameter_name, especially those that are simply computed and then used once within their own function (ie. self.futures, self.start_time_sub) Permission to clean this up?

@Ovec8hkin
Copy link
Copy Markdown
Contributor Author

On my laptop, when I have 2 ifgram_inversion.py running, the following message came out. I guess it's fine. The "6 workers" is a little bit weird, will check again in more details tomorrow (it was a bug, now it's fixed).

------- start parallel processing using Dask -------
input Dask cluster type: local
initiate Dask cluster
/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/distributed/node.py:244: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 57863 instead
  http_address["port"], self.http_server.port
scale the cluster to 6 workers

FYI, I was running ifgram_inversion.py inputs/ifgramStack.h5 -w var for the Fernandina dataset in one terminal and $MINTPY_HOME/test/test_smallbaselineApp.py in another terminal.

This it probably due to how dask creates its worker Client object. Im not sure what happens if multiple client objects are running simultaneously. Its possible they are sharing workers. I would advise against running multiple ifgram_inversion processes simultaneously, as its very possible that workers submitted from one client, might finish and get collected by a different running client, which could mess up results. This would need to be tested more carefully.

add cluster to the module_hierarchy doc
@yunjunz
Copy link
Copy Markdown
Member

yunjunz commented Jun 1, 2020

Hi @Ovec8hkin, the Dask cluster port change message is fine as explained, so we can ignore it I think.

For the variables in the cluster.py, if you find variables used once and can be cleaned, yes please.

From my side, the last thing to check is the memory. With the default 4GB memory input, ifgram_inversion.py is actually using up to 10 GB in the process. We definitely want to avoid that. But this is not related to Dask, I guess we do that in another issue / pull request.

The rest looks all good to me!

Copy link
Copy Markdown
Member

yunjunz commented Jun 1, 2020

Codacy Here is an overview of what got changed by this pull request:

Complexity increasing per file
==============================
- mintpy/ifgram_inversion.py  1
         

Complexity decreasing per file
==============================
+ mintpy/objects/cluster.py  -5
         

Clones removed
==============
+ mintpy/ifgram_inversion.py  -2
         

See the complete overview on Codacy

Copy link
Copy Markdown
Member

@yunjunz yunjunz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the tests look good and normal. After so many rounds of refactoring, I think now ifgram_inversion.py is the most beautiful script in this repo!

@yunjunz yunjunz merged commit 34478e7 into master Jun 1, 2020
@yunjunz yunjunz changed the title DaskCluster object as a general dask wrapper for mintpy DaskCluster: a general dask wrapper for mintpy Jun 1, 2020
@yunjunz yunjunz deleted the general-dask-wrapper branch June 1, 2020 19:45
@yunjunz yunjunz mentioned this pull request Jun 2, 2020
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants