
ENH: Adding and documenting configs in nx-parallel #75

Merged: 22 commits into networkx:main on Aug 26, 2024

Conversation

@Schefflera-Arboricola (Member) commented on Aug 19, 2024:

This PR makes nx-parallel compatible with networkx's config system and documents how to configure nx-parallel using joblib.parallel_config.

In PR #68, I attempted to create a unified configuration system in nx-parallel by wrapping the `joblib.Parallel()` calls (inside nx-parallel algorithms) within a `with joblib.parallel_config(configs)` context, where `configs` are extracted from `nx.config.backends.parallel`. This approach made NetworkX's config closely mirror joblib's, giving the appearance of synchronization between the two systems. However, in the last meeting, Dan mentioned that this approach complicates things.

In this PR, I tried to simplify things by clearly documenting both configuration methods for nx-parallel, and I also added an implementation to make nx-parallel compatible with NetworkX's config. The `_set_nx_config` decorator wraps the nx-parallel function call within a `joblib.parallel_config` context only when NetworkX configs are enabled; if not, the configs set via joblib take precedence. (To make nx-parallel compatible with `joblib.parallel_config`, I just had to remove `n_jobs` from the internal `joblib.Parallel` call.)
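
For illustration, here is a minimal, self-contained sketch of how such a decorator could behave (this is not the PR's exact code; the `ParallelConfig` stand-in and its fields are assumptions in place of `nx.config.backends.parallel`):

```python
from dataclasses import asdict, dataclass
from functools import wraps

import joblib


@dataclass
class ParallelConfig:
    """Stand-in for nx.config.backends.parallel; the fields are illustrative."""

    active: bool = False       # whether the NetworkX-side config should be applied
    n_jobs: int | None = None
    backend: str = "loky"
    verbose: int = 0


_cfg = ParallelConfig()


def _set_nx_config(func):
    """Run `func` inside a joblib.parallel_config context only when NetworkX config is enabled."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        if _cfg.active:
            params = {k: v for k, v in asdict(_cfg).items() if k != "active"}
            with joblib.parallel_config(**params):
                return func(*args, **kwargs)
        # NetworkX config not enabled: whatever joblib.parallel_config the user
        # has set (if anything) applies to the internal joblib.Parallel call.
        return func(*args, **kwargs)

    return wrapper
```

Inside the algorithms, `joblib.Parallel()` is then called without an explicit `n_jobs`, so it picks up either the NetworkX-derived context above or the user's own `joblib.parallel_config` settings.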

In the last meeting, Dan also questioned why we need to make nx-parallel compatible with networkx's config and why we can't simply use joblib to configure nx-parallel. This compatibility is necessary to keep nx-parallel working smoothly with other networkx backends and backend functionalities. For example, we would need it if we wanted to change `backend_priority` and some backend configs in a context like this:

with nx.config(backend_priority=["cugraph", "parallel"], backends={"parallel": {"n_jobs": 5}}):
    nx.square_clustering(G)

Refer to the Config.md file in this PR for more details.

In issue #76, I have outlined the challenges in creating a unified configuration system in nx-parallel.

ref. PR #68 for more context

Thank you :)

@Schefflera-Arboricola added the label "type: Enhancement" (New feature or request) on Aug 19, 2024

@Schefflera-Arboricola (Member, Author) commented:

Notes for reviewers (@dschult):

  • TODOs (not done yet, to avoid overcrowding this PR and for ease of review; currently only done for square_clustering):
    • add the _set_nx_config decorator to all algorithms
    • rename cpu_count() to get_n_jobs()
    • rename total_cores to n_jobs
  • Please LMK if it would be possible, based on your schedule, to get this PR reviewed and merged before GSoC ends (on August 26th), so that I can mention it in my GSoC final report accordingly. I'm open to scheduling a meeting to discuss this PR if that would help you review it better and faster.

_nx_parallel/config.py (review thread, outdated, resolved)
_nx_parallel/__init__.py (review thread, outdated, resolved):
]


def _configure_if_nx_active():

@Schefflera-Arboricola (Member, Author) commented:

We agreed on this name in the last meeting. (ref. Notes)

@@ -86,7 +86,7 @@ line-length = 88
 target-version = 'py310'

 [tool.ruff.lint.per-file-ignores]
-"__init__.py" = ['I', 'F403']
+"__init__.py" = ['I', 'F403', 'F401']

@Schefflera-Arboricola (Member, Author) commented:

'F401' is added here so that we don't get a "module imported but unused" error while running pre-commit on nx_parallel/__init__.py, because we import _configure_if_nx_active in that file only to include it in the public API and don't otherwise use it.
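
As a rough illustration (the real import path inside the package may differ), the situation in `nx_parallel/__init__.py` looks something like this:

```python
# nx_parallel/__init__.py (sketch; the import path below is an assumption)

# Imported only so that `_configure_if_nx_active` is reachable from the
# nx_parallel namespace; nothing in this file calls it, so Ruff reports
# F401 ("imported but unused") unless the per-file ignore is in place.
from _nx_parallel import _configure_if_nx_active
```

An alternative would be to list the name in `__all__`, which Ruff treats as an explicit re-export and therefore does not flag with F401.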

nx_parallel/__init__.py (review thread resolved)

@dschult (Member) left a comment:

This is looking good -- I have a couple of questions below. Probably tweaks in wording will clear them up. Thanks...

Config.md: two review threads (outdated, resolved)

@dschult (Member) left a comment:

I took another look at the whole PR and the files rather than the diffs. So I've got a few more suggestions to think about. Adapt or reject them as you see fit.

Config.md: five more review threads (outdated, resolved)

Config.md (outdated):

In this example, if `another_nx_backend` also internally utilizes `joblib.Parallel` (without exposing it to the user) within its implementation of the `square_clustering` algorithm, then the `nx-parallel` configurations set by `joblib.parallel_config` will influence the internal `joblib.Parallel` used by `another_nx_backend`. To prevent unexpected behavior, it is advisable to configure these settings through the NetworkX configuration system.

If you're using `nx-parallel` independently, either the NetworkX or `joblib` configuration system can be used, depending on your preference.

@dschult (Member) commented:

Addressing the question of what "independently" means: one attempt.

Suggested change:
-If you're using `nx-parallel` independently, either the NetworkX or `joblib` configuration system can be used, depending on your preference.
+If your code using `nx-parallel` does not simultaneously use joblib (including inside another networkx backend) then either the `networkx.config` system or the `joblib.parallel_config` system can be used, depending on your preference.

Also, I think this paragraph should appear at the top of this section, as I think most users will be in this situation.

@Schefflera-Arboricola (Member, Author) commented:

I don't think it would be a great idea to mention this here because there might be other conflicts too that we don't know of right now. This was just one of the cases where a conflict can occur. LMK what you think.

@dschult (Member) commented:

I agree that we don't want to exclude other possible conflicts -- so let's not be too specific about them.

The first point I'm looking for here is a better phrase than "using nx-parallel independently". (Independently of what?)

The second point is that this seems like a nice first paragraph for this section rather than coming down here. A reader who is using nx-parallel without other joblib code will be able to read this sentence and skip the rest of this section if it doesn't apply to them.

@Schefflera-Arboricola (Member, Author) commented:

I hope the recent commit resolves this concern :)

@dschult (Member) commented:

Is there any chance of trouble when the other backend doesn't use joblib?
For example, could the config using joblib.parallel_config interfere in any way with nx-cugraph or graphblas-algorithms?

I think the only way to have a "conflict" is if the other code is using joblib too. Right?

@Schefflera-Arboricola (Member, Author) commented on Aug 26, 2024:

Thank you for your question. I'm not familiar with the codebases of other backends, so I can't say for certain. However, the occurrence of unexpected behaviours also largely depends on the specific joblib backend (loky, threading, multiprocessing, dask, etc.) that the user is employing.

Also, it's beneficial to use NetworkX's configuration when setting backend_priority and configurations within a context for multiple networkx backends, like this:

with nx.config(backend_priority=['parallel', 'another_nx_backend'], backends=Config(parallel=<nx_parallel_configs>, another_nx_backend=<another_nx_backend_configs>)):
    ...

@dschult (Member) commented:

I think it doesn't matter what the codebases of the other backends are: the only config options nx-parallel uses are related to joblib. And the only way for another backend to be influenced by those choices is if they look at or use the config options in joblib. So only backends which use joblib will have a possible conflict over config values.

In the interest of putting this into place, I am going to merge this PR. And I'll open another one to make further suggestions.
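
To make the kind of conflict being discussed concrete, here is a small illustrative sketch (the helper function and the "other backend" it stands for are made up; it only assumes joblib >= 1.3, which provides `joblib.parallel_config`):

```python
import joblib
from joblib import Parallel, delayed


def _square(x):
    return x * x


def another_backend_helper(data):
    # A Parallel() call that does not pin its own n_jobs/backend picks up
    # whatever joblib.parallel_config context is active at call time, even
    # if it lives inside a completely different NetworkX backend.
    return Parallel()(delayed(_square)(x) for x in data)


# Settings intended for nx-parallel also reach this other library's call:
with joblib.parallel_config(n_jobs=4, backend="threading"):
    print(another_backend_helper(range(8)))
```

A backend that never touches joblib simply never reads these values, which is the point made above.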

@@ -107,6 +107,13 @@ Note that for all functions inside `nx_code.py` that do not have an nx-parallel
import networkx as nx
import nx_parallel as nxp

# enabling networkx's config for nx-parallel

@dschult (Member) commented on Aug 24, 2024:

I'm wondering if putting config issues into this section is a distraction. Readers here are trying to understand backend usage, not how to configure. Maybe a comment line pointing to the config docs would be sufficient. If you want to keep this here, there should at least be a word indicating that the config stuff is optional.

Suggested change:
-# enabling networkx's config for nx-parallel
+# optional enabling networkx's config for nx-parallel

@Schefflera-Arboricola (Member, Author) commented on Aug 24, 2024:

I don't think it is optional. We don't get any errors when we run with the default configs, but we are also not running the algorithms on multiple CPU cores or threads, because n_jobs=None by default. So, to actually run an algorithm in parallel, a user has to set at least the n_jobs config to a non-None integer other than 0 or 1. I have updated the README.md accordingly.

@dschult (Member) commented:

Wait -- the default value gives no parallel computation? That doesn't seem right to me. Shouldn't the default be to use all the cpus we can get?

@Schefflera-Arboricola (Member, Author) commented:

> Wait -- the default value gives no parallel computation? That doesn't seem right to me.

I understand your concern. However, this behavior is consistent with joblib.parallel_config, where parallel computation isn't enabled by default (i.e. n_jobs=None by default). Since modifying the defaults within joblib.parallel_config is outside my control (that would mean changing joblib's codebase), changing the default n_jobs in NetworkX's configuration from None to -1 would introduce inconsistencies between the two configuration systems, leading to user confusion and potential pitfalls.

> Shouldn't the default be to use all the cpus we can get?

That was indeed the previous default, but I've since updated the documentation to reflect the current behavior. Reverting this change would necessitate another extensive documentation update across the entire package, which might not be feasible before the GSoC deadline.
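
To make the default concrete, here is a minimal sketch of opting in to parallelism via `joblib.parallel_config` (it assumes joblib >= 1.3 and that nx-parallel exposes `square_clustering` at the package level; the graph and parameter values are illustrative):

```python
import joblib
import networkx as nx
import nx_parallel as nxp

G = nx.erdos_renyi_graph(200, 0.1, seed=42)

# With the default n_jobs=None, joblib falls back to sequential execution,
# so this call does not actually use multiple CPU cores.
nxp.square_clustering(G)

# Opting in explicitly: n_jobs=-1 asks joblib to use all available CPU cores.
with joblib.parallel_config(n_jobs=-1):
    nxp.square_clustering(G)
```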

@Schefflera-Arboricola added the label "Infrastructure" (Related to the general infrastructure and organisation of code in the repo) on Aug 24, 2024

@dschult (Member) left a comment:

I approve of this PR.

I will make some suggested wording changes about when to use which config system once I figure out what they should be. But there is no point in delaying this PR for those changes.

Thanks very much @Schefflera-Arboricola for putting this together and guiding it through!!

@dschult merged commit 15de782 into networkx:main on Aug 26, 2024
11 checks passed
@jarrodmillman added this to the 0.3 milestone on Aug 26, 2024
Labels:
  • Infrastructure (Related to the general infrastructure and organisation of code in the repo)
  • type: Enhancement (New feature or request)