used queue instead of ordinary list #5217
Conversation
It's my impression that the `queue` module is more focused on threading support than on "queue" objects in the data structure sense. Maybe the `collections.deque` container would be more suitable? It has the properties you've identified (e.g. O(1) insertion/popping from the left) and a (IMO at least) nicer API, i.e. `popleft()`. What do you think?
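For context, a minimal sketch of the `deque` API being suggested here (illustrative only, not part of the patch):

```python
from collections import deque

# deque supports O(1) appends and pops at both ends, unlike
# list.pop(0), which shifts every remaining element left.
q = deque([1, 2, 3])
q.append(4)          # enqueue at the right end
first = q.popleft()  # dequeue from the left end in O(1)
print(first)         # 1
print(list(q))       # [2, 3, 4]
```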
At face value I think this change makes sense, but it'd be great to also have some benchmarking numbers (even if they're very basic) to evaluate what effect this has on performance, if any.
Thank you for your feedback! Yes, you are right. It looks like `deque` is the right choice if we just use it as a queue in the data structure sense.
It looks like `deque` has a 10–20% speed advantage over the ordinary list in many cases. Especially when the graph is large, the difference is noticeable. Also, it turned out that using `Queue` was no faster. Therefore, using `deque` looks good to me. What do you think? I pushed another commit. The graphs I used are
The benchmark code I used is:

```python
import time

import networkx as nx
from scipy.io import mmread
from networkx.algorithms.centrality.betweenness import betweenness_centrality

graph_name = 'graph_name.mtx'
G = nx.from_scipy_sparse_matrix(mmread(graph_name))
print(f'Graph name: {graph_name}')
print(f'Number of nodes: {G.number_of_nodes()}')
print(f'Number of edges: {G.number_of_edges()}')

# Use ordinary list
t0 = time.time()
betweenness = betweenness_centrality(G, normalized=True, endpoints=True, mode="list")
print('-' * 20)
print('ordinary list')
print(f'Time taken: {round(time.time() - t0, 3)} s')

# Use Queue
t1 = time.time()
betweenness = betweenness_centrality(G, normalized=True, endpoints=True, mode="queue")
print('-' * 20)
print('queue')
print(f'Time taken: {round(time.time() - t1, 3)} s')

# Use deque
t2 = time.time()
betweenness = betweenness_centrality(G, normalized=True, endpoints=True, mode="deque")
print('-' * 20)
print('deque')
print(f'Time taken: {round(time.time() - t2, 3)} s')
```

In `betweenness.py`:
```python
from collections import deque
from queue import Queue


def keep_loop(Q, mode):
    # added: Queue exposes no len(), so emptiness is checked via empty()
    if mode == 'queue':
        return not Q.empty()
    else:
        return len(Q) != 0


def _single_source_shortest_path_basic(G, s, mode=None):
    S = []
    P = {}
    for v in G:
        P[v] = []
    sigma = dict.fromkeys(G, 0.0)  # sigma[v]=0 for v in G
    D = {}
    sigma[s] = 1.0
    D[s] = 0
    # added: pick the FIFO container according to mode
    if mode == 'queue':
        Q = Queue()
        Q.put(s)
    elif mode == 'deque':
        Q = deque([s])
    else:  # 'list' or None
        Q = [s]
    while keep_loop(Q, mode):  # use BFS to find shortest paths
        # added: dequeue from the front
        if mode == 'queue':
            v = Q.get()
        elif mode == 'deque':
            v = Q.popleft()
        else:
            v = Q.pop(0)
        S.append(v)
        Dv = D[v]
        sigmav = sigma[v]
        for w in G[v]:
            if w not in D:
                # added: enqueue at the back
                if mode == 'queue':
                    Q.put(w)
                else:
                    Q.append(w)
                D[w] = Dv + 1
            if D[w] == Dv + 1:  # this is a shortest path, count paths
                sigma[w] += sigmav
                P[w].append(v)  # predecessors
    return S, P, sigma, D
```
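Since the discussion settles on `deque`, the three-way mode switch collapses to a single code path. A sketch of the deque-only version (the function name is mine; this is not necessarily the exact code that was merged):

```python
from collections import deque


def _single_source_shortest_path_basic_deque(G, s):
    """BFS shortest-path counting using a deque as the FIFO queue.

    Works with any mapping from node to an iterable of neighbors
    (e.g. a dict-of-lists adjacency or a networkx graph).
    """
    S = []
    P = {v: [] for v in G}
    sigma = dict.fromkeys(G, 0.0)  # sigma[v]=0 for v in G
    D = {}
    sigma[s] = 1.0
    D[s] = 0
    Q = deque([s])
    while Q:                 # a deque is falsy when empty
        v = Q.popleft()      # O(1) dequeue from the front
        S.append(v)
        Dv = D[v]
        sigmav = sigma[v]
        for w in G[v]:
            if w not in D:
                Q.append(w)  # O(1) enqueue at the back
                D[w] = Dv + 1
            if D[w] == Dv + 1:  # this is a shortest path, count paths
                sigma[w] += sigmav
                P[w].append(v)  # predecessors
    return S, P, sigma, D
```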
Awesome, thanks for the benchmarking and sharing all your work! I was surprised by the Erdos992 values so I tried that one locally - I saw more modest improvements but IMO that's not surprising since benchmarking can be notoriously difficult (also: your computer is much faster than mine 😂 ). Regardless, I think this is a very nice improvement both in terms of pedagogical value (i.e. leveraging the correct data structures for BFS) and performance.
Thanks @kryuki !
It's my pleasure. Yes, actually, the benchmark results differ even from run to run on my computer. But I am happy to show at least some improvement.
Thank you for the PR, and especially for the quick adaptations to deque and for your performance exploration. This looks good!
* used queue instead of ordinary list
* use deque instead of queue
I noticed that you are using an ordinary list when performing BFS. If you `pop(0)` from a list, it takes O(N) time, since all the elements in the list have to shift one step to the left. However, if you use the `Queue` class from the `queue` module, the dequeue operation (`get()`) is done in O(1) time, so it's going to be faster.
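A quick way to see the asymptotic difference described here is to drain the same elements through both containers (a rough timing sketch; absolute numbers will vary by machine):

```python
import timeit
from collections import deque

n = 50_000

def drain_list():
    q = list(range(n))
    while q:
        q.pop(0)     # O(n) per pop: remaining elements shift left

def drain_deque():
    q = deque(range(n))
    while q:
        q.popleft()  # O(1) per pop

t_list = timeit.timeit(drain_list, number=1)
t_deque = timeit.timeit(drain_deque, number=1)
print(f'list.pop(0):   {t_list:.4f} s')
print(f'deque.popleft: {t_deque:.4f} s')
```

On any reasonable machine the deque version finishes orders of magnitude faster, because draining the list this way is quadratic overall.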