
used queue instead of ordinary list #5217

Merged
merged 2 commits into networkx:main on Dec 9, 2021
Conversation

@kryuki (Contributor) commented Dec 6, 2021

I noticed that you are using an ordinary list when performing BFS. Popping from the front of a list with pop(0) takes O(N) time, since all the elements in the list have to shift one position to the left. However, if you use the Queue class from the queue module, the dequeue operation (get()) runs in O(1) time, so it should be faster.
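
As a rough illustration of this point, here is a minimal timing sketch (added for this write-up, not taken from the PR) that drains a FIFO with list.pop(0) versus queue.Queue.get():

# Minimal timing sketch (not part of the PR): draining a FIFO with list.pop(0)
# versus queue.Queue.get(). N is kept small because the list version is
# quadratic overall; absolute numbers will vary by machine and Python version.
import time
from queue import Queue

N = 50_000

def drain_list():
    q = list(range(N))
    while q:
        q.pop(0)        # O(N) per pop: the remaining elements shift one slot left

def drain_queue():
    q = Queue()
    for i in range(N):
        q.put(i)
    while not q.empty():
        q.get()         # O(1) dequeue, FIFO order

for fn in (drain_list, drain_queue):
    t0 = time.time()
    fn()
    print(f'{fn.__name__}: {time.time() - t0:.3f} s')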

@rossbar (Contributor) left a comment

It's my impression that the queue module is more focused on threading support than on "queue" objects in the data structure sense - maybe the collections.deque container would be more suitable? It has the properties that you've identified (e.g. O(1) insertion/popping from the left) and a (IMO at least) nicer API, i.e. popleft(). What do you think?

At face value I think this change makes sense, but it'd be great to also have some benchmarking numbers (even if they're very basic) to evaluate what effect this has on performance, if any.
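
For context (an illustration added here, not part of rossbar's comment), the two APIs he compares look like this side by side:

# queue.Queue is a thread-safe FIFO: put()/get() acquire a lock on every call,
# which is overhead a single-threaded BFS does not need.
from collections import deque
from queue import Queue

q = Queue()
q.put("a")
q.put("b")
print(q.get())      # -> a

# collections.deque is a plain double-ended container: O(1) appends and pops at
# both ends, no locking, and popleft() makes the FIFO intent explicit.
d = deque(["a", "b"])
d.append("c")
print(d.popleft())  # -> a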

@kryuki (Contributor, Author) commented Dec 8, 2021

Thank you for your feedback! Yes, you are right: deque looks like the right choice if we just need a "queue" in the data-structure sense.
I tested the performance on several kinds of graphs with Python 3.9.6; here are the benchmark results.

         polbooks.mtx   mycielskian8.mtx   GD00_a.mtx   Vehicle_10NN.mtx   Erdos992.mtx
nodes    105            191                352          846                6100
edges    441            2360               458          5447               7515
list     0.078 s        0.338 s            0.079 s      5.195 s            196.86 s
queue    0.142 s        0.436 s            0.119 s      6.616 s            179.035 s
deque    0.069 s        0.314 s            0.081 s      4.48 s             107.628 s

It looks like deque has a 10~20% speed advantage over the ordinary list in many cases, and the larger the graph, the more noticeable the difference. It also turned out that using queue was generally no faster.

Therefore, using deque looks good to me. What do you think? I have pushed another commit.

The graphs I used are the five .mtx files listed in the table above.

The benchmark code I used is:

import time
import networkx as nx
from scipy.io import mmread
from networkx.algorithms.centrality.betweenness import betweenness_centrality

# note: 'mode' is not a NetworkX keyword; it is threaded through the locally
# modified betweenness.py (shown below) so the three containers can be compared
graph_name = 'graph_name.mtx'  # placeholder: one of the .mtx files from the table above
G = nx.from_scipy_sparse_matrix(mmread(graph_name))

print(f'Graph name: {graph_name}')
print(f'Number of nodes: {G.number_of_nodes()}')
print(f'Number of edges: {G.number_of_edges()}')

#Use ordinary list
t0 = time.time()
betweenness = betweenness_centrality(G, normalized=True, endpoints=True, mode="list")
print('-' * 20)
print("ordinary list")
print(f'Time taken: {round(time.time() - t0, 3)} s')

#Use Queue
t1 = time.time()
betweenness = betweenness_centrality(G, normalized=True, endpoints=True, mode="queue")
print('-' * 20)
print('queue')
print(f'Time taken: {round(time.time() - t1, 3)} s')

#Use deque
t2 = time.time()
betweenness = betweenness_centrality(G, normalized=True, endpoints=True, mode="deque")
print('-' * 20)
print("deque")
print(f'Time taken: {round(time.time() - t2, 3)} s')

In the modified betweenness.py:

# added for the benchmark: imports for the two extra container types
from collections import deque
from queue import Queue

# added: queue.Queue has no len(), so the loop condition depends on the mode
def keep_loop(Q, mode):
    if mode == 'queue':
        return not Q.empty()
    else:
        return len(Q) != 0

def _single_source_shortest_path_basic(G, s, mode=None):
    S = []
    P = {}
    for v in G:
        P[v] = []
    sigma = dict.fromkeys(G, 0.0)  # sigma[v]=0 for v in G
    D = {}
    sigma[s] = 1.0
    D[s] = 0

    # added: choose the BFS container according to mode (a plain list by default)
    Q = [s]
    if mode == 'queue':
        Q = Queue()
        Q.put(s)
    elif mode == 'deque':
        Q = deque([s])

    while keep_loop(Q, mode):  # use BFS to find shortest paths
        # added: dequeue the next node with the operation each container provides
        if mode == 'queue':
            v = Q.get()
        elif mode == 'deque':
            v = Q.popleft()
        else:
            v = Q.pop(0)
        S.append(v)
        Dv = D[v]
        sigmav = sigma[v]
        for w in G[v]:
            if w not in D:
                # added: enqueue the neighbor (put() for Queue, append() otherwise)
                if mode == 'queue':
                    Q.put(w)
                else:
                    Q.append(w)
                D[w] = Dv + 1
            if D[w] == Dv + 1:  # this is a shortest path, count paths
                sigma[w] += sigmav
                P[w].append(v)  # predecessors
    return S, P, sigma, D
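
For reference, here is a minimal sketch of what the helper presumably reduces to after the follow-up commit ("use deque instead of queue"), i.e. with the benchmark-only mode parameter removed and the list simply replaced by a deque. This is a reconstruction from the code above, not a copy of the merged diff:

# Sketch (assumption: the merged version keeps the original signature and only
# swaps the list for a collections.deque as the BFS queue).
from collections import deque

def _single_source_shortest_path_basic(G, s):
    S = []
    P = {}
    for v in G:
        P[v] = []
    sigma = dict.fromkeys(G, 0.0)  # sigma[v]=0 for v in G
    D = {}
    sigma[s] = 1.0
    D[s] = 0
    Q = deque([s])
    while Q:  # use BFS to find shortest paths
        v = Q.popleft()
        S.append(v)
        Dv = D[v]
        sigmav = sigma[v]
        for w in G[v]:
            if w not in D:
                Q.append(w)
                D[w] = Dv + 1
            if D[w] == Dv + 1:  # this is a shortest path, count paths
                sigma[w] += sigmav
                P[w].append(v)  # predecessors
    return S, P, sigma, D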

@rossbar (Contributor) left a comment

Awesome, thanks for the benchmarking and sharing all your work! I was surprised by the Erdos992 values so I tried that one locally - I saw more modest improvements but IMO that's not surprising since benchmarking can be notoriously difficult (also: your computer is much faster than mine 😂 ). Regardless, I think this is a very nice improvement both in terms of pedagogical value (i.e. leveraging the correct data structures for BFS) and performance.

Thanks @kryuki !

@kryuki (Contributor, Author) commented Dec 9, 2021

It's my pleasure. Yes, actually, the benchmark results vary even from run to run on my computer, but I am happy that they show at least some improvement.
Thank you as well for your wonderful library! I often use networkx in my research involving graphs, and it's really nice and robust. I really appreciate your work. Thanks @rossbar !

@dschult (Member) left a comment

Thank you for the PR, and especially for the quick adaptations to deque and for your performance exploration. This looks good!

@dschult dschult merged commit 96a236e into networkx:main Dec 9, 2021
@jarrodmillman jarrodmillman added this to the networkx-2.7 milestone Dec 13, 2021
dschult pushed a commit to dschult/networkx that referenced this pull request Jan 25, 2022
* used queue instead of ordinary list

* use deque instead of queue
MridulS pushed a commit to MridulS/networkx that referenced this pull request Feb 4, 2023
* used queue instead of ordinary list

* use deque instead of queue
cvanelteren pushed a commit to cvanelteren/networkx that referenced this pull request Apr 22, 2024
* used queue instead of ordinary list

* use deque instead of queue