Adds functions for sampling nodes from a graph #1536
networkx/algorithms/sampling.py
           'uniform_independent_node_sample']

# def estimate_mean(sample, values, weights=None):
Why commented out?
I question whether this function should be provided by NetworkX. It has nothing in particular to do with graphs, so it should not be part of the public API of NetworkX. At best, these should be provided as examples only.
I think it is questionable whether any of these functions belong in the
public networkx API.
On Thu, May 21, 2015 at 9:52 AM, jfinkels notifications@github.com wrote:
In networkx/algorithms/sampling.py
#1536 (comment):
+# NetworkX is distributed under a BSD license; see LICENSE.txt for more
+# information.
+"""Provides functions for sampling nodes from a graph."""
+from __future__ import division
+
+import bisect
+import itertools
+import random
+
+import networkx as nx
+
+__all__ = ['degree_weighted_independent_node_sample', 'random_walk',
+           'uniform_independent_node_sample']
+
+# def estimate_mean(sample, values, weights=None):
I question whether this function should be provided by NetworkX.
It has nothing in particular to do with graphs, so it should not
be part of the public API of NetworkX. At best, these should be
provided as examples only.
Also a fair question; it should probably be discussed (directly on the pull request).
There are some random sample helpers in networkx.utils too.
I'll take a look at combining those with this code.
Maybe they should be in "utils" or as examples?
@hagberg wrote:
I looked at the
As I proposed in #1533, only internal functions should live in a There is, however, an argument to be made for these to be examples only. I'm not making that argument, though, so perhaps someone else can. I think these should be part of the public API; the issues I referenced originally indicate some interest in sampling.
I added a modified version of the edge sampling code from your post there and updated the pull request. One problem is that one of the functions you provided comes from ActiveState, which has a silly and burdensome copyright license requirement, as I describe at the head of the module.
Okay, I've updated this pull request so there are three major modules in the I removed the code for sampling with replacement from the Google groups message referenced above because it didn't apply to sample sizes greater than the size of the population. I removed the code for estimating the relative size of a set given a sample of a population, but someone else might make a pull request for them in a separate
This commit adds a new package, `networkx.algorithms.sampling`, that includes functions for sampling nodes and edges from a graph. This package also has functions for generating random walks in a graph.
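As a rough illustration of the random-walk part of that description, here is a minimal standard-library sketch. It is not the code from this pull request: the `random_walk` signature, the `seed` parameter, and the plain adjacency dict standing in for a graph are all assumptions made for the example.

```python
import random


def random_walk(adjacency, start, length, seed=None):
    """Return a walk of at most `length` steps starting at `start`.

    `adjacency` maps each node to a list of its neighbors (a hypothetical
    stand-in for a graph object). The walk stops early at a node that has
    no neighbors.
    """
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length):
        neighbors = adjacency[walk[-1]]
        if not neighbors:
            break
        # Each neighbor of the current node is equally likely to be next.
        walk.append(rng.choice(neighbors))
    return walk


# Path graph 0 - 1 - 2 - 3 as a toy example.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
walk = random_walk(path, start=0, length=5, seed=7)
```

Every consecutive pair of nodes in `walk` is an edge of the toy graph, which is the property a sampling routine built on top of a walk would rely on.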
I'm commenting here based on a chat comment from yesterday's community meeting. This is an 8-year-old PR that adds sampling functions for edges and nodes. Basically, the functions let you select a set of N random nodes (or random edges). Most are chosen uniformly at random, but there's also functionality to prescribe probabilities to each. I think this can be done now with something like

    rng = np.random.default_rng()
    sample_nodes = rng.choice(G.nodes, size=15, p=[n/sum(G.nodes) for n in G.nodes])

So perhaps this should be closed -- or maybe find a place in the docs to point this out? Or suggest a nx-guide topic on this? Do we have a place to suggest nx-guide topics? Maybe in the mentored projects?
I'm going to close this issue. I think sampling interfaces generally and the NX random walks functions make this easier.
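The same idea (drawing nodes with prescribed probabilities) can be sketched with only the standard library. This is an illustrative variant, not the snippet from the comment: `weighted_node_sample` is a hypothetical helper, the weights here are node degrees rather than node labels, and a plain adjacency dict stands in for the graph. With NetworkX, `dict(G.degree())` would supply the same kind of node-to-weight mapping.

```python
import random


def weighted_node_sample(weights, size, seed=None):
    """Draw `size` nodes with replacement, proportionally to their weight.

    `weights` maps node -> non-negative weight (hypothetical interface).
    """
    rng = random.Random(seed)
    nodes = list(weights)
    return rng.choices(nodes, weights=[weights[n] for n in nodes], k=size)


# Star graph centered on "a", written as an adjacency dict.
adjacency = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a"], "d": ["a"]}
degrees = {node: len(nbrs) for node, nbrs in adjacency.items()}

# The hub "a" has degree 3 out of a total of 6, so it is drawn about half the time.
sample = weighted_node_sample(degrees, size=15, seed=42)
```

Like `rng.choice` with its default `replace=True`, `random.choices` samples with replacement, so the same node can appear more than once.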
This pull request includes code from issue #463. I'm not sure how to properly test randomized sampling functions, though.
This module may also be a good place for functions that sample from trees; see issue #1125.
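On the question of testing randomized sampling functions, one common approach is to combine three kinds of checks: reproducibility under a fixed seed, structural properties that must hold for every draw, and a loose frequency check over many trials. The sketch below assumes a hypothetical `uniform_node_sample` helper that takes an explicit `random.Random` instance; it is not the function from this pull request.

```python
import random
from collections import Counter


def uniform_node_sample(nodes, size, rng):
    """Return `size` distinct nodes chosen uniformly at random (hypothetical helper)."""
    return rng.sample(list(nodes), size)


def test_reproducible_with_seed():
    # Two generators with the same seed must produce the same sample.
    first = uniform_node_sample(range(10), 3, random.Random(1))
    second = uniform_node_sample(range(10), 3, random.Random(1))
    assert first == second


def test_structural_properties():
    # These hold for every possible draw, so no seed-specific values are needed.
    sample = uniform_node_sample(range(10), 4, random.Random(2))
    assert len(sample) == 4
    assert len(set(sample)) == 4
    assert set(sample) <= set(range(10))


def test_roughly_uniform():
    # With a fixed seed the counts are deterministic; the wide tolerance
    # keeps the test meaningful if the seed is ever changed.
    rng = random.Random(0)
    trials = 10_000
    counts = Counter()
    for _ in range(trials):
        counts.update(uniform_node_sample(range(5), 1, rng))
    expected = trials / 5
    assert all(abs(counts[n] - expected) < 0.15 * expected for n in range(5))
```

Passing the generator in explicitly, rather than relying on the module-level `random` state, is what makes the first and third tests deterministic.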