

---


Exercise 2.3 (thinkcomplexity2) In my implementation of reachable_nodes, you might be bothered by the apparent inefficiency of adding all neighbors to the stack without checking whether they are already in seen. Write a version of this function that checks the neighbors before adding them to the stack. Does this "optimization" change the order of growth? Does it make the function faster?


---





In [1]:
import networkx as nx

In [2]:
def all_pair(nodes):
  for i, j in enumerate(nodes):
    for u, v in enumerate(nodes):
      if i > u:
        yield j, v

In [3]:
def make_complete_graph(n):
  G = nx.Graph()
  nodes = range(n)
  G.add_nodes_from(nodes)
  G.add_edges_from(all_pair(nodes))
  return G


---


Below is the author's code for the reachable_nodes.


---



In [4]:
def reachable_nodes(G,start):
  seen = set() #creates an empty set
  stack = [start] #creates a list with our parameter 'start'
  while stack:
    node = stack.pop() #moves out the latest object of 'stack'
    if node not in seen: #this checks if this popped out object is in 'seen'. if not...
      seen.add(node)#we add this node to the empty set 'seen'
      stack.extend(G.neighbors(node))#we add the neighbors of node to stack.
  return seen






---


The quick solution for this is to simply define a new set variable after the seen.add(node) line such that we can subtract the 'seen' set from it. 


---



In [5]:
def reachable_nodes_modified(G,start):
  seen = set()
  stack = [start]
  while stack:
    node = stack.pop()
    if node not in seen:
      seen.add(node)
      check_neighbors = set(G.neighbors(node)) - seen
      stack.extend(check_neighbors)
    return seen




---


Definitions:
\begin{align}
\begin{split}
n &= \textit{number of nodes} \\m &= \textit{number of edges}
\end{split}
\end{align}

Here, I am only able to reduce the added nodes in 'stack'. The original function definition 'reachable_nodes(G,start)' adds a total of $2m$ nodes to stack *because we considered every edge twice* (I cannot understand this intuitively). 

In our modified code above, we reduced the added nodes to 'stack' into $m$. I cannot prove why but for a complete graph, this is true. Hence, the order of growth for our modified function is 
\begin{align}
\mathcal{O}(n+m),
\end{align}
as compared to the previous code $\mathcal{O}(n+2m)$. This implies that our 'optimization' does not change the order of growth for the function (order/power is still the same).


---



In [6]:
complete = make_complete_graph(1000)

In [7]:
%timeit reachable_nodes(complete,0)

10 loops, best of 3: 114 ms per loop


In [8]:
%timeit reachable_nodes_modified(complete,0)

The slowest run took 10.89 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 26.2 µs per loop







---


Surprisingly, the difference between our modified code and the original code is a magnitude of three orders. However, as $n \to \infty$, both function definition should have the same order of magnitude for the time it takes to run it.






---







---
*END* of my answer to the exercise


---








---

*Reflection*




In my implementation of reachable_nodes, you might be bothered by the apparent inefficiency of adding all neighbors to the stack *without checking whether they are already in* **seen**...


---
I was initially trying to make a code that checks if the nodes are already in **stack**. I had a hard time and I am thinking that I will be able to achieve it using a loop, which will make it more inefficient. Is this task possible (checking of stack instead of seen)?


---
Up until this point, I have not looked at the solution manual. Rest assure that after looking at it, no edits were done at all lines above this point (From our course guide, I am assuming that this is allowed as long as I do not copy-paste a code and I am rewritting it under the assumption that I *understood* it. Below is the author's solution:



In [9]:
def reachable_nodes_precheck(G, start):
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            neighbors = set(G[node]) - seen
            stack.extend(neighbors)
    return seen

I do not understand the syntax

neighbors = set(G[node]) - seen

I am cofused specifically at the G[node] syntax. Also, as means of checking, he used the len function. Why did he use it instead of using the raw function?