a. Program preferential attachment by hand, using only basic networkx functions, such that one
node appears at each step and creates m links.

b. Compute the average degree of your graph.

c. Choose the parameter m so that the average degree of the random graph is as close as possible
to that of your graph.

d. Compare the two graphs on at least two characteristics: average clustering, degree
distribution, transitivity, average distance from one node, diameter, any centrality distribution, etc.

e. (Optional) Propose a variant that improves the prediction.

In [2]:
import random
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt

def preferential_attachment(G, m, new_nodes):
    """Add new_nodes nodes one at a time; each links to m distinct existing
    nodes chosen with probability proportional to their current degree."""
    for _ in range(new_nodes):
        # Only nodes with a positive degree can be selected
        degrees = {node: degree for node, degree in G.degree() if degree > 0}
        if len(degrees) < m:
            # Without this guard the sampling loop below would never terminate
            raise ValueError("m exceeds the number of nodes with nonzero degree")
        candidates = list(degrees.keys())
        weights = list(degrees.values())  # random.choices normalizes the weights

        # Keep drawing until m distinct target nodes have been collected
        target_nodes = set()
        while len(target_nodes) < m:
            target_nodes.update(
                random.choices(candidates, weights=weights, k=m - len(target_nodes))
            )

        # Add the new node and link it to the m selected nodes
        new_node_id = max(G.nodes) + 1  # Assumes numeric, incremental node IDs
        G.add_edges_from((new_node_id, target_node) for target_node in target_nodes)

    return G

# Initialize a new graph and add nodes and edges from CSV data
G = nx.Graph()
nodes_df = pd.read_csv('../Graph/nodes.csv')
edges_df = pd.read_csv('../Graph/edges.csv')
G.add_nodes_from(nodes_df['Id'])
G.add_edges_from(edges_df[['Source', 'Target']].values)

# Calculate the average degree of the graph before preferential attachment
average_degree = sum(dict(G.degree()).values()) / G.number_of_nodes()

# Display the average degree
print(f"Average degree before preferential attachment: {average_degree}")

# Parameters for the PA model
new_nodes = 100  # Number of new nodes to add
m = 9  # Number of edges each new node will create

# Apply the preferential attachment model to the graph
G = preferential_attachment(G, m, new_nodes)


# Calculate the new average degree of the graph after preferential attachment
new_average_degree = sum(dict(G.degree()).values()) / G.number_of_nodes()

# Display the new average degree
print(f"New average degree after preferential attachment: {new_average_degree}")




Average degree before preferential attachment: 8.542857142857143
New average degree after preferential attachment: 14.105882352941176


In [3]:
# Calculate the probability p for the random graph
n = G.number_of_nodes()  # Total number of nodes in the PA graph
p = new_average_degree / (n - 1)

# Generate a random graph G_random with the calculated probability p
G_random = nx.erdos_renyi_graph(n, p)

# Calculate the average degree of the random graph
average_degree_random = sum(dict(G_random.degree()).values()) / G_random.number_of_nodes()

# Display the average degree of the random graph and the probability p
print(f"Average degree of the random graph: {average_degree_random}")
print(f"Probability p used for the random graph: {p}")


Average degree of the random graph: 14.282352941176471
Probability p used for the random graph: 0.08346675948485903
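
Task c is phrased in terms of a parameter m, so one possible reading is the G(n, M) random graph, which fixes the number of edges instead of the link probability. The sketch below is an illustration under that assumption, not part of the original solution. It uses n = 170 nodes and 1199 edges, both recovered from the values printed above (14.105882... × 170 / 2 = 1199); the seed is arbitrary.

```python
import networkx as nx

# Values recovered from the printed output above (assumption: n = 170 nodes)
n = 170
num_edges = 1199  # average degree * n / 2

# G(n, M) model: exactly num_edges edges placed uniformly at random
G_gnm = nx.gnm_random_graph(n, num_edges, seed=0)

# The average degree is 2E / N, so it matches the target exactly
avg_degree_gnm = 2 * G_gnm.number_of_edges() / n
print(f"Average degree of the G(n, M) graph: {avg_degree_gnm}")
```

With this model the average degree equals that of the PA graph by construction, whereas `nx.erdos_renyi_graph` only matches it in expectation.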


In [4]:
# Function to calculate characteristics of a graph
def graph_characteristics(G):
    characteristics = {}
    # Calculate Average clustering coefficient
    characteristics['average_clustering'] = nx.average_clustering(G)
    # Calculate Transitivity
    characteristics['transitivity'] = nx.transitivity(G)
    # Get the Degree Distribution
    degrees = [degree for node, degree in G.degree()]
    characteristics['degree_distribution'] = degrees
    
    # For the largest connected component
    if nx.is_connected(G):
        LCC = G  # If the graph is connected, no need to find LCC
    else:
        # Find the largest connected component
        largest_cc = max(nx.connected_components(G), key=len)
        LCC = G.subgraph(largest_cc).copy()
    
    # Calculate Average shortest path length (only for LCC)
    characteristics['average_shortest_path_length'] = nx.average_shortest_path_length(LCC)
    # Calculate Diameter (only for LCC)
    characteristics['diameter'] = nx.diameter(LCC)
    # Degree Centrality Distribution
    characteristics['degree_centrality'] = nx.degree_centrality(G)
    
    return characteristics

# Calculate characteristics for PA graph and Random graph
characteristics_PA_corrected = graph_characteristics(G)
characteristics_random_corrected = graph_characteristics(G_random)

# Print out the results for comparison
print("Preferential Attachment Graph Characteristics :")
for k, v in characteristics_PA_corrected.items():
    if k != 'degree_distribution' and k != 'degree_centrality':
        print(f"{k}: {v}")

print("\nRandom Graph Characteristics:")
for k, v in characteristics_random_corrected.items():
    if k != 'degree_distribution' and k != 'degree_centrality':
        print(f"{k}: {v}")



Preferential Attachment Graph Characteristics :
average_clustering: 0.24377997310058974
transitivity: 0.18435368937246188
average_shortest_path_length: 2.1554472676644623
diameter: 4

Random Graph Characteristics (Corrected):
average_clustering: 0.07919391347481182
transitivity: 0.08042895442359249
average_shortest_path_length: 2.188722589627567
diameter: 4


**Preferential Attachment Graph Characteristics:**
- Average clustering coefficient: 0.244
- Transitivity: 0.184
- Average shortest path length: 2.155 (calculated for the largest connected component)
- Diameter: 4 (calculated for the largest connected component)

**Random Graph Characteristics:**
- Average clustering coefficient: 0.079
- Transitivity: 0.080
- Average shortest path length: 2.189 (calculated for the largest connected component)
- Diameter: 4 (calculated for the largest connected component)

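The average clustering coefficient and transitivity are higher in the PA graph than in the random graph: average clustering is about three times higher and transitivity more than twice as high. Preferential attachment alone does not explain this, because in a pure preferential attachment model clustering stays low and shrinks as the network grows. A more likely source is the original graph loaded from the CSV files, whose triangles are preserved when the 100 new nodes are attached. In the random graph, two neighbors of a node are linked with probability p, so both measures stay close to p ≈ 0.083.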
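
The low clustering of the random graph can be checked against theory: in an Erdős-Rényi graph, two neighbors of a node are linked with probability p, so the average clustering should be close to p. The sketch below is a rough check, assuming n = 170 (which follows from p = k / (n - 1) with the values printed earlier) and p ≈ 0.0835; the number of repetitions and the seeds are arbitrary choices.

```python
import networkx as nx

n, p = 170, 0.0835  # size and link probability taken from the output above

# Average the clustering coefficient over several independent random graphs
clusterings = []
for seed in range(20):
    G_er = nx.erdos_renyi_graph(n, p, seed=seed)
    clusterings.append(nx.average_clustering(G_er))

mean_clustering = sum(clusterings) / len(clusterings)
print(f"Mean average clustering over 20 graphs: {mean_clustering:.3f} (p = {p})")
```

Averaging over several draws separates the systematic value from run-to-run noise in a single random graph.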

The average shortest path length is slightly lower in the PA graph, so a typical pair of nodes is separated by marginally fewer steps than in the random graph. This is consistent with preferential attachment creating highly connected hubs that act as shortcuts across the network, although the gap here is small and could change from one random draw to the next.
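
The role of hubs can be illustrated by comparing the largest degree in a preferential attachment graph with that of a random graph of the same density. The sketch below is only an illustration: it uses networkx's built-in `barabasi_albert_graph` as a stand-in for the hand-built PA graph (which started from the CSV data), with n = 170 and m = 9 taken from the notebook and arbitrary seeds.

```python
import networkx as nx

n, m = 170, 9  # sizes taken from the notebook; the seeds are arbitrary

# Stand-in for the PA graph: the built-in Barabasi-Albert generator
G_ba = nx.barabasi_albert_graph(n, m, seed=1)

# Random graph with the same expected average degree
k_ba = 2 * G_ba.number_of_edges() / n
G_er = nx.erdos_renyi_graph(n, k_ba / (n - 1), seed=1)

def largest_component(G):
    """Return the subgraph induced by the largest connected component."""
    return G.subgraph(max(nx.connected_components(G), key=len))

max_deg_ba = max(degree for _, degree in G_ba.degree())
max_deg_er = max(degree for _, degree in G_er.degree())
print(f"Largest degree, PA stand-in: {max_deg_ba}, random graph: {max_deg_er}")

# Path lengths are computed on the largest component so the call cannot fail
for name, graph in [("PA stand-in", G_ba), ("random", G_er)]:
    length = nx.average_shortest_path_length(largest_component(graph))
    print(f"Average shortest path length ({name}): {length:.3f}")
```

A much larger maximum degree in the PA stand-in points to hubs; the path lengths printed alongside show how much, or how little, those hubs shorten typical distances at this density.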

The diameter, which is the longest of all shortest paths in the network, is 4 in both graphs. Despite their different local structure, the two networks therefore have the same maximum extent.
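
Because the diameter is a single maximum, two graphs can share it while differing in how far typical nodes are from the rest. The eccentricity of a node (its distance to the farthest node) gives a per-node view, and the diameter is simply the largest eccentricity. A minimal sketch on a toy path graph, chosen here only for illustration:

```python
import networkx as nx

# Toy example: a path 0 - 1 - 2 - 3 - 4
G_path = nx.path_graph(5)

# Eccentricity: distance from each node to the node farthest from it
ecc = nx.eccentricity(G_path)
diameter = nx.diameter(G_path)  # largest eccentricity
radius = nx.radius(G_path)      # smallest eccentricity

print(f"Eccentricities: {ecc}")
print(f"Diameter: {diameter}, radius: {radius}")
```

Applying `nx.eccentricity` to the largest connected component of each graph and comparing the two distributions would show whether the shared diameter hides a difference in how many nodes actually sit at that maximum distance.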
