
Adds functions for sampling nodes from a graph #1536

Closed
wants to merge 2 commits into from


jfinkels
Contributor

This pull request includes code from issue #463. I'm not sure how to properly test randomized sampling functions, though.

This module may also be a good place for functions that sample from trees, see issue #1125.
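On the question of how to test randomized sampling functions, one common approach has three parts. First, let the caller inject a seeded random generator so that any failing run can be replayed exactly. Second, assert invariants that must hold for every sample. Third, run a loose statistical check under a fixed seed so the test is not flaky. The sketch below assumes a hypothetical `uniform_node_sample(G, k, rng)` helper; it is not the function from this pull request.

```python
import random
from collections import Counter

import networkx as nx


def uniform_node_sample(G, k, rng=None):
    """Hypothetical sampler: k distinct nodes of G, chosen uniformly at random."""
    rng = rng if rng is not None else random.Random()
    return rng.sample(list(G), k)


def test_same_seed_same_sample():
    # A seeded generator makes a failing run exactly reproducible.
    G = nx.path_graph(10)
    assert uniform_node_sample(G, 3, random.Random(1)) == uniform_node_sample(
        G, 3, random.Random(1)
    )


def test_invariants():
    # Properties that must hold for every sample, whatever the seed.
    G = nx.path_graph(10)
    sample = uniform_node_sample(G, 4, random.Random(0))
    assert len(sample) == 4
    assert len(set(sample)) == 4  # no repeats: sampling without replacement
    assert all(n in G for n in sample)  # only nodes of G


def test_roughly_uniform():
    # Statistical check with a fixed seed and a loose tolerance: each of the
    # 5 nodes should be drawn about trials / 5 times.
    G = nx.path_graph(5)
    rng = random.Random(12345)
    trials = 20_000
    counts = Counter()
    for _ in range(trials):
        counts.update(uniform_node_sample(G, 1, rng))
    expected = trials / len(G)
    assert all(abs(counts[n] - expected) < 0.1 * expected for n in G)
```

The tolerance (10% of the expected count) is far wider than the binomial standard deviation here, so the check catches gross bias without depending on luck.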

networkx/algorithms/sampling.py:

+           'uniform_independent_node_sample']
+
+# def estimate_mean(sample, values, weights=None):
Contributor


Why commented out?

Contributor Author


I question whether this function should be provided by NetworkX. This function has nothing in particular to do with graphs, so it should not be part of the public NetworkX API. At best, these should be provided as examples only.
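To illustrate the point that such an estimator needs no graph at all, here is a hypothetical body for the commented-out `estimate_mean(sample, values, weights=None)` signature. The real body is not shown in this thread. This sketch assumes `values` and `weights` are mappings from sampled items to numbers, and it uses a self-normalized inverse-weight estimator: an item drawn with weight w contributes with factor 1/w.

```python
def estimate_mean(sample, values, weights=None):
    """Hypothetical sketch: estimate a population mean from a sample.

    ``values`` maps each sampled item to the quantity of interest.
    ``weights`` (optional) maps each item to a number proportional to the
    probability it was drawn; such items are down-weighted by 1/weight.
    Nothing here depends on a graph.
    """
    if weights is None:
        # Uniform sample: the plain sample mean.
        return sum(values[x] for x in sample) / len(sample)
    # Weighted sample: self-normalized inverse-weight (ratio) estimator.
    num = sum(values[x] / weights[x] for x in sample)
    den = sum(1 / weights[x] for x in sample)
    return num / den
```

For a degree-weighted node sample, one could pass `weights=dict(G.degree())`; that is the only point where a graph would enter.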

Member


I think it is questionable whether any of these functions belong in the
public networkx API.

On Thu, May 21, 2015 at 9:52 AM, jfinkels notifications@github.com wrote:

In networkx/algorithms/sampling.py
#1536 (comment):

+# NetworkX is distributed under a BSD license; see LICENSE.txt for more
+# information.
+"""Provides functions for sampling nodes from a graph."""
+from __future__ import division
+
+import bisect
+import itertools
+import random
+
+import networkx as nx
+
+__all__ = ['degree_weighted_independent_node_sample', 'random_walk',
+           'uniform_independent_node_sample']
+
+# def estimate_mean(sample, values, weights=None):

I question whether this function should be provided by NetworkX.
This function has nothing in particular to do with graphs, so it should not
be part of the public NetworkX API. At best, these should be
provided as examples only.


https://github.com/networkx/networkx/pull/1536/files#r30816423.

Contributor Author


Also a fair question; it should probably be discussed directly on the pull request.

@hagberg
Member

hagberg commented May 21, 2015

There are some random sample helpers in networkx.utils too.

@jfinkels
Contributor Author

I'll take a look at combining those with this code.

@hagberg
Member

hagberg commented May 21, 2015

Maybe they should be in "utils" or as examples?
There is also this old thread on the mailing list about choosing random edges
https://groups.google.com/forum/#!topic/networkx-discuss/qH3WynBjp3g
Maybe something useful there?

@jfinkels
Contributor Author

jfinkels commented Jun 2, 2015

@hagberg wrote:

There are some random sample helpers in networkx.utils too.

I looked at the random_* functions in networkx.utils, but they were not particularly applicable to the task of sampling nodes or edges from a graph.

Maybe they should be in "utils" or as examples?

As I proposed in #1533, only internal functions should live in a utils package; if these functions are not used elsewhere in the code as helper functions, they should not be put in a utils package.

There is, however, an argument to be made for these to be examples only. I'm not making that argument, though, so perhaps someone else can. I think these should be part of the public API; the issues I referenced originally indicate some interest in sampling.

There is also this old thread on the mailing list about choosing random edges
https://groups.google.com/forum/#!topic/networkx-discuss/qH3WynBjp3g
Maybe something useful there?

I added a modified version of the edge sampling code from your post there and updated the pull request. One problem is that one of the functions you provided comes from ActiveState, which has a silly and burdensome copyright license requirement, as I describe at the head of the module.
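As a rough illustration of what edge sampling can look like with only the standard library (this is not the code from the mailing-list post or from ActiveState), here are two hypothetical helpers. The first takes a uniform sample of distinct edges; the second draws edges with replacement, with probability proportional to an edge attribute assumed to be named `weight` (missing weights default to 1).

```python
import random

import networkx as nx


def uniform_edge_sample(G, k, seed=None):
    """Hypothetical helper: k distinct edges of G, uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(list(G.edges()), k)


def weighted_edge_sample(G, k, weight="weight", seed=None):
    """Hypothetical helper: k edges drawn with replacement, each with
    probability proportional to its ``weight`` attribute (default 1)."""
    rng = random.Random(seed)
    # (u, v, w) triples; edges lacking the attribute get w = 1.
    triples = list(G.edges(data=weight, default=1))
    return rng.choices(
        [(u, v) for u, v, _ in triples],
        weights=[w for _, _, w in triples],
        k=k,
    )
```

Keeping the two cases separate mirrors the usual split: `random.sample` cannot repeat an edge, while `random.choices` can, which is what makes weighted draws simple.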

@jfinkels
Contributor Author

Okay, I've updated this pull request so there are three major modules in the sampling package: one for sampling nodes, one for sampling edges, and one for generating random walks. A fourth module contains the generic sampling methods. The package also has unit tests now.
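For readers unfamiliar with the random-walk part, a simple random walk can be sketched as follows. `simple_random_walk` is a hypothetical name with an assumed signature; the PR's own `random_walk` function may behave differently.

```python
import random

import networkx as nx


def simple_random_walk(G, source, length, seed=None):
    """Hypothetical sketch: a simple random walk of up to ``length`` steps.

    At each step the walker moves to a neighbor chosen uniformly at random
    (successors, for a directed graph). The walk stops early at a node
    with no neighbors to move to.
    """
    rng = random.Random(seed)
    walk = [source]
    node = source
    for _ in range(length):
        nbrs = list(G.neighbors(node))
        if not nbrs:
            break
        node = rng.choice(nbrs)
        walk.append(node)
    return walk
```

The returned list includes the starting node, so a walk that never gets stuck has `length + 1` entries.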

I removed the code for sampling with replacement from the Google Groups message referenced above because it didn't apply to sample sizes greater than the size of the population. I also removed the code for estimating the relative size of a set given a sample of a population; someone else might make a pull request for it in a separate module, say algorithms.statistics. That code now lives in a gist here.
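The distinction between sampling with and without replacement maps directly onto Python's standard library. `random.sample` draws distinct items and raises `ValueError` when asked for more items than the population holds; `random.choices` draws with replacement and accepts optional weights. In this small sketch, the degree-weighted line is only one plausible reading of a "degree-weighted independent node sample" (independent draws with probability proportional to degree); the PR's own definition is not shown here.

```python
import random

import networkx as nx

G = nx.path_graph(4)  # 4 nodes
nodes = list(G)
rng = random.Random(0)

# Without replacement: at most len(nodes) items; asking for more is an error.
try:
    rng.sample(nodes, 10)
    too_many_ok = True
except ValueError:
    too_many_ok = False  # sample() cannot draw 10 distinct items from 4

# With replacement: any sample size works, and repeats are expected.
with_repl = rng.choices(nodes, k=10)

# With replacement and probability proportional to degree.
deg_weighted = rng.choices(nodes, weights=[G.degree(n) for n in nodes], k=10)
```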

This commit adds a new package, `networkx.algorithms.sampling`, that
includes functions for sampling nodes and edges from a graph. This
package also has functions for generating random walks in a graph.
@dschult
Member

dschult commented Nov 9, 2023

I'm commenting here based on a chat comment from yesterday's community meeting.

This is an 8-year-old PR that adds sampling functions for edges and nodes. Basically, the functions let you select a set of N random nodes (or random edges). Most are chosen uniformly at random, but there is also functionality to prescribe a probability for each.

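I think this can be done now with something like:

import numpy as np
rng = np.random.default_rng()
# p weights each node by its (numeric) label; pass replace=False to get distinct nodes
sample_nodes = rng.choice(list(G.nodes), size=15, p=[n / sum(G.nodes) for n in G.nodes])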
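Extending that idea, here is a sketch with NumPy's `Generator.choice` that draws distinct nodes with probability proportional to degree, and samples edges by drawing indices. Passing a list of edge tuples to `choice` directly would return a 2-D array rather than a list of pairs. The graph, seed, and sample sizes are arbitrary choices for illustration.

```python
import networkx as nx
import numpy as np

G = nx.karate_club_graph()
rng = np.random.default_rng(seed=0)

# Nodes: 5 distinct nodes, probability proportional to degree.
nodes = list(G)
deg = np.array([G.degree(n) for n in nodes], dtype=float)
node_sample = rng.choice(nodes, size=5, replace=False, p=deg / deg.sum())

# Edges: choose distinct edge indices, then look up the tuples, so the
# result stays a list of (u, v) pairs.
edges = list(G.edges())
idx = rng.choice(len(edges), size=5, replace=False)
edge_sample = [edges[i] for i in idx]
```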

So perhaps this should be closed -- or maybe find a place in the docs to point this out? Or suggest a nx-guide topic on this? Do we have a place to suggest nx-guide topics? Maybe in the mentored projects?

@dschult
Copy link
Member

dschult commented Dec 13, 2023

I'm going to close this issue. I think sampling interfaces generally and the NX random walks functions make this easier.
Comment here if you are still interested in this.

@dschult dschult closed this Dec 13, 2023

4 participants