# Computing Graph Descriptors with D2C
This notebook demonstrates how to use the D2C class to compute graph descriptors from synthetic datasets.

## Setup
First, we import necessary libraries and suppress warnings:

In [1]:
import sys
import os
import pandas as pd 
sys.path.append(os.path.abspath('..')) 
from src.descriptors import D2C, DataLoader

In [2]:
# Suppress all warnings. TODO: handle them properly instead of silencing them.
import warnings
warnings.filterwarnings("ignore")
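
Silencing every warning globally can hide real problems. As one possible way to address the TODO above, the sketch below limits suppression to a single block with the standard-library `warnings.catch_warnings` context manager. The `noisy_step` function is a hypothetical stand-in for any call that emits warnings; it is not part of D2C.

```python
import warnings

def noisy_step():
    # Hypothetical stand-in for a call that emits a warning.
    warnings.warn("this is noisy", RuntimeWarning)
    return 42

# Warnings are ignored only inside this block; the previous filters
# are restored when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy_step()

print(result)
```

Outside the `with` block, warnings raised by other cells remain visible.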

## Loading Synthetic Data
We load the previously generated synthetic data with the `DataLoader` class, which keeps the data-storage format transparent to the user.

In [3]:
dataloader = DataLoader()
dataloader.from_pickle('../data/example/synthetic_data.pkl')
observations = dataloader.get_observations()
dags = dataloader.get_dags()
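
To illustrate what keeping the storage format transparent can mean, here is a minimal sketch of a loader with the same method names. It is a hypothetical stand-in, not the actual `src.descriptors.DataLoader`: the class name `SketchLoader` and the assumed pickle layout (a tuple of observations and DAGs) are illustrative assumptions only.

```python
import os
import pickle
import tempfile

class SketchLoader:
    """Hypothetical stand-in; not the real DataLoader."""

    def __init__(self):
        self._observations = None
        self._dags = None

    def from_pickle(self, path):
        # Assumed layout: the pickle holds an (observations, dags) tuple.
        with open(path, "rb") as fh:
            self._observations, self._dags = pickle.load(fh)

    def get_observations(self):
        return self._observations

    def get_dags(self):
        return self._dags

# Round trip with toy data written to a temporary file.
toy = ([[0.1, 0.2], [0.3, 0.4]], [{"X0": ["X1"]}, {"X1": ["X0"]}])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.pkl")
    with open(path, "wb") as fh:
        pickle.dump(toy, fh)
    loader = SketchLoader()
    loader.from_pickle(path)

print(len(loader.get_observations()), len(loader.get_dags()))
```

Callers only use `get_observations()` and `get_dags()`, so in a design like this the on-disk format could change without touching the rest of the notebook.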

## Computing Graph Descriptors
We initialize the D2C object with:

* observations: The synthetic datasets
* dags: The corresponding ground truth DAGs
* seed: 42 for reproducibility
* n_jobs: the number of parallel processes used for the computation (10 in the cell below)
* full: True to compute all available descriptors (not set in the cell below)

In [4]:
d2c = D2C(observations=observations,
        dags=dags, 
        seed=42,
        n_jobs=10)
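
The right value for `n_jobs` depends on the machine. The small sketch below, which assumes `n_jobs` is simply the number of worker processes, caps the requested value at the number of available CPU cores. The helper name `pick_n_jobs` is hypothetical and not part of D2C.

```python
import os

def pick_n_jobs(requested):
    # os.cpu_count() can return None, so fall back to a single process.
    available = os.cpu_count() or 1
    return max(1, min(requested, available))

n_jobs = pick_n_jobs(10)
print(n_jobs)
```

The resulting value could then be passed as `n_jobs=n_jobs` when constructing the object.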

The `initialize` method performs the actual computation of the descriptors.

In [5]:
d2c.initialize()

## Retrieving and Processing Descriptors

In [None]:
matrices = d2c.get_matrices()

TypeError: list indices must be integers or slices, not str

In [None]:
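graph_id = 0
# Assumption: the original expression was left incomplete; we index by integer position.
descriptor = matrices[graph_id]
descriptor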

[{'graph_id': 0,
  'edge_source': 'X0',
  'edge_dest': 'X1',
  'is_causal': True,
  'coeff_cause': np.float64(2.5576713465322745),
  'coeff_eff': np.float64(0.22833046534129237),
  'HOC_3_1': np.float64(1.370204539597046),
  'HOC_1_2': np.float64(0.080718200297899),
  'HOC_2_1': np.float64(-0.07314626052733092),
  'HOC_1_3': np.float64(1.7229424756245193),
  'kurtosis_ca': np.float64(-1.1874278749072331),
  'kurtosis_ef': np.float64(-0.600940328209409),
  'skewness_ca': np.float64(-0.09622432399583232),
  'skewness_ef': np.float64(0.25290543627127865),
  'com_cau': np.float64(0.5828091101004058),
  'cau_eff': np.float64(0.5816377361293104),
  'eff_cau': np.float64(0.5833339137553621),
  'eff_cau_mbeff': np.float64(0.5834512838697649),
  'cau_eff_mbcau': np.float64(0.5828091101004058),
  'mca_mef_cau_q0': np.float64(0.0),
  'mca_mef_cau_q1': np.float64(0.0),
  'mca_mef_cau_q2': np.float64(0.0),
  'mca_mef_cau_q3': np.float64(0.0),
  'mca_mef_cau_q4': np.float64(0.0),
  'mca_mef_cau_q5':

In [None]:
df = d2c.descriptors_df
df.columns

Index(['graph_id', 'edge_source', 'edge_dest', 'is_causal', 'coeff_cause',
       'coeff_eff', 'HOC_3_1', 'HOC_1_2', 'HOC_2_1', 'HOC_1_3', 'kurtosis_ca',
       'kurtosis_ef', 'skewness_ca', 'skewness_ef', 'com_cau', 'cau_eff',
       'eff_cau', 'eff_cau_mbeff', 'cau_eff_mbcau', 'mca_mef_cau_q0',
       'mca_mef_cau_q1', 'mca_mef_cau_q2', 'mca_mef_cau_q3', 'mca_mef_cau_q4',
       'mca_mef_cau_q5', 'mca_mef_cau_q6', 'mca_mef_eff_q0', 'mca_mef_eff_q1',
       'mca_mef_eff_q2', 'mca_mef_eff_q3', 'mca_mef_eff_q4', 'mca_mef_eff_q5',
       'mca_mef_eff_q6', 'cau_m_eff_q0', 'cau_m_eff_q1', 'cau_m_eff_q2',
       'cau_m_eff_q3', 'cau_m_eff_q4', 'cau_m_eff_q5', 'cau_m_eff_q6',
       'eff_m_cau_q0', 'eff_m_cau_q1', 'eff_m_cau_q2', 'eff_m_cau_q3',
       'eff_m_cau_q4', 'eff_m_cau_q5', 'eff_m_cau_q6', 'm_cau_q0', 'm_cau_q1',
       'm_cau_q2', 'm_cau_q3', 'm_cau_q4', 'm_cau_q5', 'm_cau_q6',
       'eff_cau_mbcau_plus_q0', 'eff_cau_mbcau_plus_q1',
       'eff_cau_mbcau_plus_q2', 'eff_cau_mbcau_
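
In the column listing above, the first four columns (`graph_id`, `edge_source`, `edge_dest`, `is_causal`) appear to identify and label an edge, while the remaining ones are descriptors. A minimal sketch of splitting them follows. It uses a toy frame with a few of the column names from the output, and the values are made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "graph_id": [0, 0],
    "edge_source": ["X0", "X1"],
    "edge_dest": ["X1", "X0"],
    "is_causal": [True, False],
    "coeff_cause": [2.5, 0.2],
    "m_cau_q0": [0.0, 0.1],
    "m_cau_q1": [0.0, 0.2],
})

meta_cols = ["graph_id", "edge_source", "edge_dest", "is_causal"]
features = toy.drop(columns=meta_cols)  # descriptor columns only
labels = toy["is_causal"]               # per-edge label

# Select the columns whose names end in "_q" followed by digits.
q_columns = toy.filter(regex=r"_q\d+$")

print(list(features.columns))
print(list(q_columns.columns))
```

The same calls should work on the full `df`, since they rely only on column names.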

We add a `function_id` column to identify which causal mechanism generated each dataset. Because we generated 40 graphs per function, each consecutive block of 40 graph_ids corresponds to one function.

In [7]:
df['function_id'] = df['graph_id'] // 40
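
As a quick check of the integer division, the toy example below maps a few graph ids at the block boundaries to their function ids. The frame is made up for illustration and only assumes that graph ids start at 0.

```python
import pandas as pd

toy = pd.DataFrame({"graph_id": [0, 39, 40, 79, 80]})
# Floor division groups ids 0-39 into block 0, 40-79 into block 1, and so on.
toy["function_id"] = toy["graph_id"] // 40

print(toy["function_id"].tolist())  # [0, 0, 1, 1, 2]
```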

In [8]:
df.to_csv('../data/example/descriptors.csv', index=False)
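
Passing `index=False` keeps the row index out of the file, so reading it back does not produce an extra unnamed column. The sketch below shows the round trip on a toy frame written to a temporary directory; the data and the path are illustrative only.

```python
import os
import tempfile

import pandas as pd

toy = pd.DataFrame({
    "graph_id": [0, 40],
    "is_causal": [True, False],
    "function_id": [0, 1],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "descriptors.csv")
    toy.to_csv(path, index=False)   # no index column in the file
    restored = pd.read_csv(path)

print(list(restored.columns))
```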