###### Introduction to Network Analysis 2023/24 (ii)

## Network representations, basic network algorithms

You are given four networks in the Pajek format presented in lectures.

+ Tiny toy network for testing ([toy.net](http://lovro.fri.uni-lj.si/ina/nets/toy.net))
+ Zachary karate club network ([karate_club.net](http://lovro.fri.uni-lj.si/ina/nets/karate_club.net))
+ IMDb actors collaboration network ([collaboration_imdb.net](http://lovro.fri.uni-lj.si/ina/nets/collaboration_imdb.net))
+ A small part of Google web graph ([www_google.net](http://lovro.fri.uni-lj.si/ina/nets/www_google.net))

### I. Adjacency list representation

1. **(code)** Assume that all networks are undirected. Implement your own adjacency list representation of the networks as an array of lists and represent all four networks.


#### 1. Solution with our implementation

In [16]:
# Begin iterating over a list of network names
for name in ["toy", "karate_club", "collaboration_imdb", "www_google"]:

    # For each network, we start with an empty graph G (which will later represent the graph as an adjacency list)
    # and set the number of nodes, n, in the graph to be zero
    G, n = None, 0

    # Open the corresponding '.net' file for the current network
    with open("./networks/" + name + ".net", 'r') as file:

        # Read the first line of the file which contains the number of nodes
        # Extract this number from the line, convert it to an integer, and store it in n
        n = int(file.readline().split()[1])

        # Now, initialize G to be a list of n empty lists
        # Each of these empty lists will eventually contain the indices of the nodes adjacent to a given node
        G = [[] for _ in range(n)]

        # Continue reading lines from the file until we find a line that starts with "*"
        # This marks the end of node information
        for line in file:
            if line.startswith("*"):
                break

        # After the node information, the file contains edge information
        # Each line in this section of the file contains two node indices, indicating an edge between these nodes

        # So, we continue reading lines from the file
        for line in file:
            # For each line, we extract the first two numbers
            # These are the indices of the nodes for the current edge
            # Note that the indices in the file start at 1, but Python lists are 0-indexed
            # So, we subtract 1 from these indices before storing them in i and j
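            # Skip blank lines and any further section header (e.g. "*arcs"), which would make int() fail
            if not line.strip() or line.startswith("*"):
                continue
            i, j = (int(x) - 1 for x in line.split()[:2])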

            # Since this is an undirected graph, each edge goes both ways
            # So, we add j to the adjacency list of node i and also add i to the adjacency list of node j
            G[i].append(j)
            G[j].append(i)

    print("{:s} {:s}: {:,d}".format('Number of nodes in', name, len(G)))

Number of nodes in toy: 5
Number of nodes in karate_club: 34
Number of nodes in collaboration_imdb: 17,577
Number of nodes in www_google: 875,713


#### 1. Solution using NetworkX

In [17]:
import networkx as nx

def read_and_convert_to_adj_list(file_url):
    # nx.read_pajek reads the Pajek file and returns a MultiGraph or MultiDiGraph.
    # Since the task treats all networks as undirected, nx.Graph(...) converts it
    # to a simple undirected graph (direction and parallel links are dropped).
    G = nx.Graph(nx.read_pajek(file_url))
    # nx.to_dict_of_lists converts G into a dictionary that maps each node label
    # to the list of its neighbours, i.e. an adjacency list.
    adj_list = nx.to_dict_of_lists(G)

    return adj_list


for name in ["toy", "karate_club", "collaboration_imdb", "www_google"]:
    adj_list = read_and_convert_to_adj_list("./networks/" + name + ".net")
    print("{:s} {:s}: {:,d}".format('Number of nodes in', name, len(adj_list)))

Number of nodes in toy: 5
Number of nodes in karate_club: 34
Number of nodes in collaboration_imdb: 17,577
Number of nodes in www_google: 875,713


2. **(discuss)** Now, assume that all networks are directed. How would you extend your network representation?

If the networks are directed, we can extend the representation so that each node stores a pair of lists instead of a single list: one for its outgoing edges and one for its incoming edges. For an edge from node i to node j, j is appended to the outgoing list of i and i is appended to the incoming list of j. Keeping both lists lets us retrieve the successors and the predecessors of a node directly, without scanning the lists of all other nodes.
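A minimal sketch of this idea, assuming 0-based node indices as in our implementation above. The function name `build_directed` and the four-node arc list are hypothetical and serve only as an illustration; they are not one of the given networks.

```python
def build_directed(n, arcs):
    """Build a directed adjacency structure for n nodes.

    G[i] is a pair (outgoing, incoming): G[i][0] lists the targets of
    edges leaving node i, G[i][1] lists the sources of edges entering it.
    """
    G = [([], []) for _ in range(n)]
    for i, j in arcs:
        G[i][0].append(j)  # j is a successor of i
        G[j][1].append(i)  # i is a predecessor of j
    return G


# Hypothetical example: edges 0->1, 0->2, 2->1 and 3->0
G = build_directed(4, [(0, 1), (0, 2), (2, 1), (3, 0)])
for node, (outgoing, incoming) in enumerate(G):
    print(node, "out:", outgoing, "in:", incoming)
# 0 out: [1, 2] in: [3]
# 1 out: [] in: [0, 2]
# 2 out: [1] in: [0]
# 3 out: [0] in: []
```

When reading a Pajek file, the only change to the loop in our implementation would be to replace the two symmetric `append` calls with the two calls shown here.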

3. **(discuss)** Does your network representation allow for multiple links between the nodes, loops on nodes and isolated nodes?

Our network representation can indeed support multiple links between nodes, loops on nodes, and isolated nodes.

- **Multiple links between the nodes**: The representation has no mechanism to prevent multiple edges between two nodes. If the file contains several edges between the same pair of nodes, the adjacency list contains duplicate entries. For example, if there are two edges between the nodes stored at (0-based) indices 1 and 2, then G[1] contains 2 twice and G[2] contains 1 twice.
- **Loops on nodes**: We do not explicitly prevent self-loops (edges from a node to itself). If the file contains a line where both node indices are the same, the code appends to the same list twice, so the adjacency list of that node gets two entries pointing to itself.
- **Isolated nodes**: Isolated nodes (nodes with no edges) are also supported by our representation. If a node has no edges in the graph, its corresponding sublist in the adjacency list will remain empty. For example, if node 3 has no edges, G[3] will be [].
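
The three cases can be checked on a small example. This is a sketch that repeats the append logic of our implementation on a hypothetical four-node edge list (0-based indices); `build_undirected` is an illustrative helper, not part of the solution above.

```python
def build_undirected(n, edges):
    """Array-of-lists adjacency representation, as in our implementation."""
    G = [[] for _ in range(n)]
    for i, j in edges:
        G[i].append(j)
        G[j].append(i)
    return G


# Two parallel edges between nodes 1 and 2, a loop on node 0, node 3 isolated
G = build_undirected(4, [(1, 2), (1, 2), (0, 0)])
print(G)  # [[0, 0], [2, 2], [1, 1], []]
```

One consequence is that `len(G[i])` counts a loop twice and counts each parallel edge separately, which matches the usual definition of degree in a multigraph.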