# Connected Components (Depth-First Search)

- Learned how to handle graph data structures.

- Implemented depth-first search to identify the connected components of an undirected graph, with the procedure `Search` as a subroutine.

- Used the [NetworkX](https://networkx.github.io/) Python package to represent and manipulate graphs.

- Worked with a dataset recording the interactions between characters in Homer's *Iliad*.

In [24]:
import networkx
import urllib2
homer = urllib2.urlopen('http://people.sc.fsu.edu/~jburkardt/datasets/sgb/homer.dat')

The format of the data is straightforward. After some comment lines (beginning with \*), the file lists a codename for each character (i.e., node of the graph), followed by a description. The file then lists the groups of characters that interact in each chapter, from which you will form the edges. For instance, the first line has the form:

```1:CH,AG,ME,GS;AP,CH;HE,AC;AC,AG,CA;HE,AT;AT,AC;AT,OG;NE,AG,AC;CS,OD```

This means that CH, AG, ME, and GS interacted, so there is an edge for every pair of these nodes (six edges for this group of four). Groups of characters that interacted are separated by semicolons. Each line starts with chapter information of the form `1:` or `&:`, which can be ignored for this problem.
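
The expansion of one chapter line into edges can be sketched as follows. This is a minimal illustration built on `itertools.combinations`, not the parser used in this notebook; the helper name `line_to_edges` and the shortened sample line are assumptions made for the example.

```python
from itertools import combinations

def line_to_edges(line):
    """Expand one chapter line into undirected edges (hypothetical helper)."""
    # Drop the chapter prefix such as '1:' or '&:'.
    _, _, body = line.strip().partition(':')
    edges = []
    for group in body.split(';'):
        members = group.split(',')
        # Every pair of characters within a group shares an edge.
        edges.extend(combinations(members, 2))
    return edges

sample = '1:CH,AG,ME,GS;AP,CH;HE,AC'
edges = line_to_edges(sample)
print(edges[:6])   # the six pairs from the group CH,AG,ME,GS
print(len(edges))  # 6 + 1 + 1 = 8
```

A group of k characters contributes k(k-1)/2 edges, so large groups dominate the edge count.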

##### Function to read in the nodes from the input file

In [25]:
def read_nodes(gfile):
    """Reads in the nodes of the graph from the input file.
    
    Args:
        gfile: A handle for the file containing the graph data, starting at the top.
        
    Returns:
        A generator that yields the nodes of the graph one at a time, so that
        list(read_nodes(gfile)) has the form:
            ['CH', 'AG', 'ME', ...]
    """
   
    content = gfile.readline()
    while content and content != '\n':         # stop at end of file or at the first blank line
        content = content.replace('\n', '')    # strip the trailing newline
        if content and content[0] != '*':      # skip empty lines and comment lines starting with *
            if content[:2] == '1:':            # safeguard: the edge section has started
                break
            yield content[:2]                  # the codename is the first two characters of the line
        content = gfile.readline()

##### Function to read in the edges from the input file.

In [34]:
def read_edges(gfile):
    """Reads in the edges of the graph from the input file.
    
    Args:
        gfile: A handle for the file containing the graph data, starting at the top 
            of the edges section.
            
    Returns:
        A generator that yields the edges of the graph one pair at a time, so that
        list(read_edges(gfile)) has the form:
            [('CH', 'AG'), ('AG', 'ME'), ...]
    """
    
    for content in gfile.readlines():
        content = content.replace('\n', '')             # strip the trailing newline
        if len(content) > 1 and ':' in content:
            mark = content.index(':')
            content = content[(mark+1):]                # keep only the interactions after the chapter prefix
            groups = content.split(';')                 # groups are separated by ';'
            for interactions in groups:
                interactions = interactions.split(',')  # characters are separated by ','
                for node1 in range(len(interactions)-1):
                    for node2 in range(node1+1, len(interactions)):
                        yield (interactions[node1], interactions[node2])

# Prints [] because `homer` had already been read to the end when this cell ran
# (compare the execution counts of the cells).
print(list(read_edges(homer)))

[]
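
The empty list is what a file handle produces once it has been read to the end: judging by the execution counts, this cell ran after the graph-building cell had already consumed `homer`. The sketch below reproduces the effect with an in-memory file. `io.StringIO` and its made-up contents stand in for the downloaded data, and the sketch does not assume that the object returned by `urlopen` supports `seek`; re-opening the URL is the safer way to read the data again.

```python
import io

# io.StringIO stands in for the downloaded file; the contents are made up.
handle = io.StringIO(u"1:CH,AG\n2:AP,CH\n")

first = handle.readlines()    # consumes the whole handle
second = handle.readlines()   # nothing left to read
print(len(first), len(second))

handle.seek(0)                # rewind; only possible on seekable handles
third = handle.readlines()
print(len(third))
```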


#### Create the graph

In [27]:
import networkx as nx
G = nx.Graph()
G.add_nodes_from(read_nodes(homer))
G.add_edges_from(read_edges(homer))

In [33]:
G.number_of_edges()

1629

##### Function to implement procedure Search. The function takes in a graph and a root node, and returns a list of the nodes visited during the search. The nodes appear in the order in which they were *first visited*. The neighbors of a node are processed in *alphabetical order*, where numbers come before letters.

In [28]:
def Search(graph, root):
    """Runs depth-first search through a graph, starting at a given root. Neighboring
    nodes are processed in alphabetical order.
    
    Args:
        graph: the given graph, with nodes encoded as strings.
        root: the node from which to start the search.
        
    Returns:
        A list of nodes in the order in which they were first visited.
    """
  
    stack = [root]                        # initialize a stack with the root node
    explored = []                         # nodes in the order in which they were first visited
    seen = set()                          # the same nodes as a set, for fast membership tests
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        explored.append(node)             # record the node the first time it is popped
        # Push the neighbors in reverse alphabetical order so that the smallest
        # neighbor is popped, and therefore visited, first.
        stack.extend(sorted(graph.neighbors(node), reverse=True))
    return explored
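
The effect of the push order on an explicit stack can be checked on a small example. The sketch below is an illustrative stand-in that works on a plain dictionary of adjacency lists rather than a NetworkX graph; the function name `dfs_order`, its `reverse` parameter, and the toy graph are assumptions made for the example. Because a stack pops the most recently pushed item, the neighbors have to be pushed in reverse alphabetical order for the smallest one to be visited first.

```python
def dfs_order(adj, root, reverse=True):
    """Iterative DFS on a dict-of-lists graph (illustrative stand-in, not the notebook's Search)."""
    stack, order, seen = [root], [], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # With reverse=True the smallest neighbor is pushed last and popped first.
        stack.extend(sorted(adj[node], reverse=reverse))
    return order

# Toy graph, made up for illustration: edges A-B, A-C, B-D, C-D.
adj = {'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A', 'D'], 'D': ['B', 'C']}

alphabetical = dfs_order(adj, 'A', reverse=True)
reversed_order = dfs_order(adj, 'A', reverse=False)
print(alphabetical)    # ['A', 'B', 'D', 'C']
print(reversed_order)  # ['A', 'C', 'D', 'B']
```

Both orders are valid depth-first traversals; only the first one follows the alphabetical tie-breaking rule.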

##### Function to implement DFS to find the connected components of the character graph. When choosing roots for components, always pick the *smallest unvisited node* according to alphabetical ordering. 

In [30]:
def connected_components(graph):
    """Computes the connected components of the given graph.
    
    Args: 
        graph: the given graph, with nodes encoded as strings.
        
    Returns:
        The connected components of the graph. Components are listed in
        alphabetical order of their root nodes.
    """
   
    vertices = sorted(graph.nodes())                # all nodes in alphabetical order
    seen = set()                                    # nodes already assigned to a component
    components = []                                 # list of components, each a list of nodes
    for v in vertices:
        if v not in seen:
            connected_component = Search(graph, v)
            components.append(connected_component)
            seen.update(connected_component)
    return components
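
The root-selection rule can be illustrated on a toy graph. This is a self-contained sketch on a plain dictionary of adjacency lists, not the NetworkX-based function above; the name `components` and the example graph are assumptions made for the illustration.

```python
def components(adj):
    """Connected components of a dict-of-lists graph (illustrative sketch)."""
    seen, result = set(), []
    for root in sorted(adj):              # the smallest unvisited node becomes the next root
        if root in seen:
            continue
        stack, comp = [root], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.append(node)
            stack.extend(sorted(adj[node], reverse=True))
        result.append(comp)
    return result

# Toy graph, made up for illustration: a path A-C-D, an isolated node B, and an edge E-F.
adj = {'A': ['C'], 'B': [], 'C': ['A', 'D'], 'D': ['C'], 'E': ['F'], 'F': ['E']}
found = components(adj)
print(found)                    # [['A', 'C', 'D'], ['B'], ['E', 'F']]
print([len(c) for c in found])  # [3, 1, 2]
```

Each component starts with its root, and the roots appear in alphabetical order because the outer loop scans the sorted node list.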

In [31]:
character_interactions = connected_components(G)

In [32]:
component_sizes = [len(c) for c in character_interactions]
# Single-argument print calls produce the same output on Python 2 and Python 3.
print("There are 12 connected components in the Iliad: %s" % (len(component_sizes) == 12))
print("The giant component has size 542: %s" % (max(component_sizes) == 542))
print("There are 5 isolated characters: %s" % (len([c for c in component_sizes if c == 1]) == 5))

There are 12 connected components in the Iliad: True
The giant component has size 542: True
There are 5 isolated characters: True
