# Data Generation examples
- This notebook shows how to use the package, focusing on part 1 (data generation)
- See the other notebooks for the remaining parts (contestants and the multi-criteria approach)

## Data Generation
- Generate data (graphs) from a set of parameters and display their summaries

In [14]:
%load_ext autoreload
%autoreload 2
import igraph
from moo.data_generation import ExpConfig, DataGenerator

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [15]:
# Define an experiment configuration instance (parameters for data generation)
expconfig = ExpConfig(
    L=100, U=500,
    NumEdges=1000, ML=0.4, MU=0.4,
    BC=0.1, NumGraphs=30,
    shuffle=True, # Shuffle labels (or not)
    seed=None # For reproducibility (this is the default, but can be changed)
    )
print(expconfig) # Print parameters, or access individually, e.g., expconfig.NumEdges

<ExpConfig: L=100, U=500, NumNodes=600, NumEdges=1000, ML=0.4, MU=0.4, BC=0.1, NumGraphs=30, shuffle=True, seed=42>
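
As a rough sanity check on these parameters, note that the printed `NumNodes=600` equals `L + U`. Assuming `L` and `U` are the sizes of the two vertex sets of the bipartite graph (an interpretation this notebook does not state explicitly), the requested edge count can be compared with the maximum possible number of edges, `L * U`. The helper `bipartite_density` below is hypothetical and not part of the package.

```python
def bipartite_density(n_lower: int, n_upper: int, n_edges: int) -> float:
    """Fraction of possible edges between the two vertex sets that are present.

    Assumes a simple bipartite graph, which has at most n_lower * n_upper edges.
    """
    max_edges = n_lower * n_upper
    if max_edges == 0:
        raise ValueError("both vertex sets must be non-empty")
    return n_edges / max_edges


# Values copied from the configuration above; reading L and U as the sizes
# of the two vertex sets is an assumption.
n_lower, n_upper, num_edges = 100, 500, 1000
total_nodes = n_lower + n_upper
density = bipartite_density(n_lower, n_upper, num_edges)
print(total_nodes, density)
```

Under that reading, the configuration asks for a sparse graph: 1000 of the 50000 possible edges.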


In [16]:
# Generate data following the defined experiment configuration
expgen = DataGenerator(expconfig=expconfig) # Pass defined parameters
print(expgen)
datagen = expgen.generate_data() # datagen is an iterator

<DataGenerator: ExpConfig: L=100, U=500, NumNodes=600, NumEdges=1000, ML=0.4, MU=0.4, BC=0.1, NumGraphs=30, shuffle=True, seed=42>
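
The comment in the cell above notes that `generate_data()` returns an iterator. A practical consequence of Python iterators in general is that they can be traversed only once. The sketch below illustrates this with a hypothetical stand-in generator (`fake_generate_data` is not part of the package and yields placeholder strings rather than graphs).

```python
from typing import Iterator


def fake_generate_data(num_graphs: int) -> Iterator[str]:
    """Hypothetical stand-in for a data generator; yields placeholder strings."""
    for idx in range(num_graphs):
        yield f"graph-{idx + 1}"


stand_in = fake_generate_data(3)
first_pass = list(stand_in)   # consumes the iterator
second_pass = list(stand_in)  # nothing left to yield

print(first_pass)   # ['graph-1', 'graph-2', 'graph-3']
print(second_pass)  # []
```

If the graphs are needed more than once, one option is to collect them into a list on the first pass; another is to create a fresh iterator.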


In [17]:
# Iterate through the generated data (graphs)
# Possible usages include saving to disk, visualizing, feeding to downstream steps, etc.
for idx, graph in enumerate(datagen):
    # Save graphs, visualize, etc.
    # Here, I print the graph summary
    print(f'\nGraph {idx+1} out of {expconfig.NumGraphs}')
    igraph.summary(graph)


Graph 1 out of 30
IGRAPH U--T 534 1000 -- 
+ attr: GT (v), VX (v), type (v)

Graph 2 out of 30
IGRAPH U--T 520 1000 -- 
+ attr: GT (v), VX (v), type (v)

Graph 3 out of 30
IGRAPH U--T 518 1000 -- 
+ attr: GT (v), VX (v), type (v)

Graph 4 out of 30
IGRAPH U--T 524 1000 -- 
+ attr: GT (v), VX (v), type (v)

Graph 5 out of 30
IGRAPH U--T 528 1000 -- 
+ attr: GT (v), VX (v), type (v)

Graph 6 out of 30
IGRAPH U--T 529 1000 -- 
+ attr: GT (v), VX (v), type (v)

Graph 7 out of 30
IGRAPH U--T 522 1000 -- 
+ attr: GT (v), VX (v), type (v)

Graph 8 out of 30
IGRAPH U--T 525 1000 -- 
+ attr: GT (v), VX (v), type (v)

Graph 9 out of 30
IGRAPH U--T 531 1000 -- 
+ attr: GT (v), VX (v), type (v)

Graph 10 out of 30
IGRAPH U--T 519 1000 -- 
+ attr: GT (v), VX (v), type (v)

Graph 11 out of 30
IGRAPH U--T 523 1000 -- 
+ attr: GT (v), VX (v), type (v)

Graph 12 out of 30
IGRAPH U--T 530 1000 -- 
+ attr: GT (v), VX (v), type (v)

Graph 13 out of 30
IGRAPH U--T 537 1000 -- 
+ attr: GT (v),

## Fig. 6 data generation and saving in legacy format
- This shows how to generate graphs with the new code while saving them in the legacy format (for debugging purposes)
- The utils module provides functions for saving and loading graphs, which format the data so that the legacy code and the new code can be used together

In [18]:
# Save data in the old format
# Define an experiment configuration instance (parameters for data generation)
# Fig. 6 (see paper)
expconfig = ExpConfig(
    L=30, U=30, NumEdges=100, ML=0.5, MU=0.5, BC=0.1, NumGraphs=100, # 30 graphs in the paper
    shuffle=True, seed=42
)
print(expconfig) # Print parameters, or access individually, e.g., expconfig.NumEdges

# Generate data following the defined experiment configuration
expgen = DataGenerator(expconfig=expconfig) # Pass defined parameters
print(expgen)
datagen = expgen.generate_data() # datagen is an iterator

import os
from moo.utils import save_graphs_to_path
path = "./_temp/data_2_shuffled_fig06/"
os.makedirs(path, exist_ok=True)
save_graphs_to_path(path, datagen)

<ExpConfig: L=30, U=30, NumNodes=60, NumEdges=100, ML=0.5, MU=0.5, BC=0.1, NumGraphs=100, shuffle=True, seed=42>
<DataGenerator: ExpConfig: L=30, U=30, NumNodes=60, NumEdges=100, ML=0.5, MU=0.5, BC=0.1, NumGraphs=100, shuffle=True, seed=42>


In [19]:
# Save data in the old format
# Fig. 6 (see paper)
expconfig = ExpConfig(
    L=30, U=30, NumEdges=100, ML=0.5, MU=0.5, BC=0.1, NumGraphs=100, # 30 graphs in the paper
    shuffle=False, seed=42
)
print(expconfig) # Print parameters, or access individually, e.g., expconfig.NumEdges

# Generate data following the defined experiment configuration
expgen = DataGenerator(expconfig=expconfig) # Pass defined parameters
print(expgen)
datagen = expgen.generate_data() # datagen is an iterator

import os # Needed for os.makedirs below if this cell is run on its own
from moo.utils import save_graphs_to_path
path = "./_temp/data_1_nonshuffled_fig06/"
os.makedirs(path, exist_ok=True)
save_graphs_to_path(path, datagen)

<ExpConfig: L=30, U=30, NumNodes=60, NumEdges=100, ML=0.5, MU=0.5, BC=0.1, NumGraphs=100, shuffle=False, seed=42>
<DataGenerator: ExpConfig: L=30, U=30, NumNodes=60, NumEdges=100, ML=0.5, MU=0.5, BC=0.1, NumGraphs=100, shuffle=False, seed=42>


## Using third-party bipartite graphs and ground truth
- The following code is for users who have bipartite graphs in igraph format but whose ground truth is provided separately
- It shows how to augment the graph structure with the ground truth so that it can be used with the contestant or the multi-criteria approaches

In [20]:
# For simplicity, let's use the current package to generate a bipartite graph, then remove the embedded ground truth to simulate a typical graph that a user may provide
# Fig. 6 (see paper)
import igraph
from moo.data_generation import ExpConfig, DataGenerator
expconfig = ExpConfig(L=30, U=30, NumEdges=100, ML=0.5, MU=0.5, BC=0.1, NumGraphs=1, # 30 graphs in the paper
    shuffle=False, seed=42
)
expgen = DataGenerator(expconfig=expconfig) # Pass defined parameters
datagen = expgen.generate_data() # datagen is an iterator
graph = next(datagen)
igraph.summary(graph)




IGRAPH U--T 55 100 -- 
+ attr: GT (v), VX (v), type (v)


In [21]:
# The above graph has the ground truth embedded
# Let's strip the ground truth (this does it in place)
from moo.utils import strip_groundtruth
strip_groundtruth(graph, 'GT') # The ground truth is encoded as the 'GT' attribute
igraph.summary(graph)


IGRAPH U--T 55 100 -- 
+ attr: VX (v), type (v)


In [22]:
# It can be seen that the graph has no ground truth
# Now, let's create a random ground truth vector and embed it into the graph
import numpy as np
gt = np.random.choice([0,1], size=len(graph.vs))
gt

array([0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0,
       1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0])
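
The vector above changes from run to run because `np.random.choice` draws from NumPy's global random state, which is not seeded in this cell. If a repeatable placeholder ground truth is preferred, one option is a seeded `numpy.random.Generator`. This is only a sketch: the seed value is arbitrary, and the vertex count of 55 is copied from the graph summary above.

```python
import numpy as np

n_vertices = 55  # copied from the summary above; use len(graph.vs) in practice
rng = np.random.default_rng(seed=42)  # seed chosen arbitrarily for illustration

# Draw one binary label per vertex (high is exclusive, so values are 0 or 1)
gt_seeded = rng.integers(low=0, high=2, size=n_vertices)

# Re-creating the generator with the same seed reproduces the same vector
gt_repeat = np.random.default_rng(seed=42).integers(low=0, high=2, size=n_vertices)
print(np.array_equal(gt_seeded, gt_repeat))  # True
```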

In [23]:
# Embed the ground truth into the graph
graph.vs['GT'] = gt
igraph.summary(graph)

IGRAPH U--T 55 100 -- 
+ attr: GT (v), VX (v), type (v)
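
A ground truth supplied separately is easy to misalign with the graph. Before an assignment such as `graph.vs['GT'] = gt`, it may be worth checking that there is exactly one label per vertex and looking at how the labels are distributed. The function `check_groundtruth` below is a hypothetical helper, not part of the package, and it assumes integer community labels.

```python
from collections import Counter
from typing import Sequence


def check_groundtruth(labels: Sequence[int], n_vertices: int) -> Counter:
    """Validate a label vector and return the number of vertices per label."""
    if len(labels) != n_vertices:
        raise ValueError(f"expected {n_vertices} labels, got {len(labels)}")
    # int() also accepts NumPy integer scalars, so arrays work as input too
    return Counter(int(label) for label in labels)


counts = check_groundtruth([0, 0, 1, 0, 1], n_vertices=5)
print(counts)
```

With the notebook's objects, the call would look like `check_groundtruth(gt, len(graph.vs))`.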


In [24]:
# The graph can now be used as input to any contestant approach and to the multi-criteria approach, since the ground truth makes it possible to compute performance metrics