# Computing Graph Descriptors with D2C
This notebook demonstrates how to use the D2C class to compute graph descriptors from synthetic datasets.

## Setup
First, we import the necessary libraries and suppress warnings:

In [1]:
import sys
import os
import pandas as pd 
sys.path.append(os.path.abspath('..')) 
from src.descriptors import D2C, DataLoader

In [2]:
# Suppress all warnings globally (TODO: handle them properly instead)
import warnings
warnings.filterwarnings("ignore")
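
As a possible way to address the TODO above, warnings can be silenced by category and only around the call that produces them, instead of globally. The sketch below uses only the standard `warnings` module; `noisy_computation` is a hypothetical stand-in for whichever library call emits the warning, not part of D2C.

```python
import warnings


def noisy_computation():
    # Hypothetical stand-in for a library call that emits a warning.
    warnings.warn("convergence not reached", RuntimeWarning)
    return 42


# Ignore RuntimeWarning only inside this block; filters are restored on exit.
with warnings.catch_warnings(record=True) as caught_inside:
    warnings.simplefilter("ignore", category=RuntimeWarning)
    result = noisy_computation()

# Outside the scoped block the same call is reported again.
with warnings.catch_warnings(record=True) as caught_outside:
    warnings.simplefilter("always")
    noisy_computation()
```

Scoping the filter this way keeps unrelated warnings visible in the rest of the notebook.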

## Loading Synthetic Data
We load the previously generated synthetic data using the DataLoader class, which hides the data-storage format from the user.

In [3]:
dataloader = DataLoader()
dataloader.from_pickle('../data/example/synthetic_data.pkl')
observations = dataloader.get_observations()
dags = dataloader.get_dags()
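
To illustrate what hiding the storage format can look like, here is a minimal sketch of a loader with the same three method names. It is not the project's DataLoader: the class name `SimpleLoader` and the assumed pickle layout (a pair of observations and DAGs) are hypothetical choices made for this example only.

```python
import os
import pickle
import tempfile


class SimpleLoader:
    """Hypothetical loader for illustration; not the project's DataLoader."""

    def __init__(self):
        self._observations = None
        self._dags = None

    def from_pickle(self, path):
        with open(path, "rb") as fh:
            payload = pickle.load(fh)
        # Assumed layout: an (observations, dags) pair.
        self._observations, self._dags = payload

    def get_observations(self):
        return self._observations

    def get_dags(self):
        return self._dags


# Write a toy pickle to a temporary directory, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.pkl")
    with open(toy_path, "wb") as fh:
        pickle.dump(([[1.0, 2.0]], [{"X0": ["X1"]}]), fh)
    loader = SimpleLoader()
    loader.from_pickle(toy_path)

toy_observations = loader.get_observations()
toy_dags = loader.get_dags()
```

Callers only see the two getters, so a different backend could be added as another `from_...` method without changing downstream code.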

## Computing Graph Descriptors
We initialize the D2C object with:

* observations: The synthetic datasets
* dags: The corresponding ground truth DAGs
* seed: 42 for reproducibility
* n_jobs: 10 parallel processes for computation
* full: True to compute all available descriptors (not passed explicitly in the cell below)

In [None]:
d2c = D2C(
    observations=observations,
    dags=dags,
    seed=42,
    n_jobs=10,
)
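
How D2C uses `seed` and `n_jobs` internally is not shown in this notebook. As a rough illustration of why the two are commonly combined, the sketch below gives each dataset its own random generator derived from the seed, so the results do not depend on the number of workers. All names here (`describe`, `compute_all`, `toy_datasets`) are hypothetical, and the example uses threads from the standard library, not the processes mentioned above.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def describe(task):
    index, values, seed = task
    # One generator per dataset, derived from the global seed.
    rng = random.Random(seed + index)
    sample = rng.sample(values, k=3)
    return index, sum(sample) / len(sample)


def compute_all(datasets, seed, n_jobs):
    tasks = [(i, values, seed) for i, values in enumerate(datasets)]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # map() returns results in task order, whatever the worker count.
        return list(pool.map(describe, tasks))


toy_datasets = [list(range(10)), list(range(10, 20)), list(range(20, 30))]
run_a = compute_all(toy_datasets, seed=42, n_jobs=1)
run_b = compute_all(toy_datasets, seed=42, n_jobs=3)
```

Under these assumptions, `run_a` and `run_b` are identical even though they used different worker counts.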

The `initialize` method performs the actual computation of the descriptors.

In [5]:
d2c.initialize()

X0 X1
0 1
Index(['X0', 'X1', 'X2', 'X3', 'U0', 'U1'], dtype='object')
X0    0
X1    0
X2    0
X3    0
U0    0
U1    0
dtype: int64
[2] [2]
[0.1832542  1.00765057 0.39356003 2.69629826 4.67755903 1.60118443
 ...
(output truncated)

KeyboardInterrupt: 

## Retrieving and Processing Descriptors

In [None]:
df = pd.DataFrame(d2c.descriptors_df)
df

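AttributeError: 'D2C' object has no attribute 'descriptors_df'

The `initialize` run above was interrupted (KeyboardInterrupt), so the descriptors may never have been stored. Rerun `initialize` to completion; if the error persists, check the D2C class for the accessor it provides.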

We add a 'function_id' column to identify which causal mechanism generated each dataset. Because we generated 40 graphs per function, each consecutive block of 40 graph_ids corresponds to one function.

In [None]:
df['function_id'] = df['graph_id'] // 40
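
The integer division can be checked on a small toy frame. The `graph_id` values below are made up for illustration and cover the block boundaries (39 to 40, 79 to 80); `GRAPHS_PER_FUNCTION` is a name introduced here for the constant 40.

```python
import pandas as pd

GRAPHS_PER_FUNCTION = 40  # same block size as in the cell above

# Toy graph_ids chosen around the block boundaries.
toy = pd.DataFrame({"graph_id": [0, 1, 39, 40, 79, 80, 119]})
toy["function_id"] = toy["graph_id"] // GRAPHS_PER_FUNCTION

print(toy["function_id"].tolist())  # [0, 0, 0, 1, 1, 2, 2]
```

This mapping only holds if graph_ids are assigned consecutively, starting at 0, in generation order.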

In [None]:
df.to_csv('../data/example/descriptors.csv', index=False)