# Usage



First, let's import the necessary modules and classes:

In [8]:
import sys
sys.path.append("..")

In [9]:
from d2c.simulatedDAGs import SimulatedDAGs
from d2c.D2C import D2C
import pandas as pd

Next, let's set the values for the essential parameters:

In [10]:
N_JOBS = 1

n_dags = 3
n_observations = 100
n_nodes = 5

We can now create our simulated DAGs by instantiating the `SimulatedDAGs` object and calling the `generate_dags()` method. 

In [11]:
simulated_dags = SimulatedDAGs(n_dags, n_observations, n_nodes, n_jobs=N_JOBS)
simulated_dags.generate_dags()

Once the DAGs have been created, we can generate observations that respect their structure: for every `parent -> child` connection in the graph, the `child` variable is generated as a function of its `parent` variable(s).

In [12]:
simulated_dags.simulate_observations()
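To make the idea of structure-respecting observations concrete, here is a minimal standalone sketch. It is not the `SimulatedDAGs` implementation: the helper `simulate_from_dag`, the linear coefficient `0.8`, the Gaussian noise, and the toy three-node DAG are all assumptions chosen for illustration. Nodes are visited in topological order so that every parent has a value before its children are generated.

```python
import random
from graphlib import TopologicalSorter  # Python 3.9+


def simulate_from_dag(parents, n_observations, seed=0):
    """Draw observations in which each child is a noisy function of its parents.

    `parents` maps each node to the list of its parent nodes (hypothetical format).
    """
    rng = random.Random(seed)
    # TopologicalSorter takes node -> predecessors, so parents are ordered first.
    order = list(TopologicalSorter(parents).static_order())
    rows = []
    for _ in range(n_observations):
        row = {}
        for node in order:
            noise = rng.gauss(0.0, 0.1)
            # Root nodes reduce to noise; children add a linear term per parent.
            row[node] = sum(0.8 * row[p] for p in parents[node]) + noise
        rows.append(row)
    return order, rows


# Hypothetical three-node DAG: 0 -> 1, 0 -> 2, 1 -> 2
toy_parents = {0: [], 1: [0], 2: [0, 1]}
order, observations = simulate_from_dag(toy_parents, n_observations=100)
```

Each entry of `observations` is one sample, stored as a dictionary that maps a node to its simulated value.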

Let's take a look at one of the generated DAGs using the `plot_DAG()` method. We choose the DAG with index `1`. 

In [13]:
simulated_dags.plot_DAG(1)

We now have DAGs and their corresponding observations. We can proceed by initializing the `D2C` object, which will allow us to compute the descriptors.

In [15]:
d2c = D2C(simulated_dags.get_dags(), simulated_dags.get_observations(), n_jobs=N_JOBS)

We can calculate the descriptors by calling `d2c.initialize()`. The object also offers a `load_descriptors_df()` method that loads previously computed descriptors from a file, which avoids repeating the computation.

In [17]:
#d2c.initialize()
d2c.load_descriptors_df('dataframe.csv')
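The compute-once, load-afterwards pattern can be sketched independently of `D2C`. The helper `load_or_compute_rows`, the file name `descriptors_cache.csv`, and the stand-in function `expensive_descriptors` below are hypothetical, and the sketch uses only the standard library instead of the library's own loading code.

```python
import csv
import os
import tempfile


def load_or_compute_rows(path, compute_rows):
    """Return cached rows from `path` if it exists, else compute and store them."""
    if os.path.exists(path):
        with open(path, newline="") as handle:
            # Values read back from CSV are strings; convert them as needed.
            return list(csv.DictReader(handle))
    rows = compute_rows()
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return rows


calls = []


def expensive_descriptors():
    # Stand-in for a slow descriptor computation (hypothetical).
    calls.append(1)
    return [{"graph_id": 0, "edge_source": 1, "edge_dest": 0}]


with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "descriptors_cache.csv")
    first = load_or_compute_rows(cache_path, expensive_descriptors)   # computes, then writes
    second = load_or_compute_rows(cache_path, expensive_descriptors)  # reads the cached file
```

The second call never runs the expensive function, at the cost of returning string values that the caller has to convert back to numbers.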

We can take a look at the descriptors. Let's retrieve the dataframe from the `D2C` object using the `get_descriptors_df()` method and print it.

In [19]:
df = d2c.get_descriptors_df()
print(df)

    graph_id  edge_source  edge_dest         effca         effef    comcau  \
0          0            1          0  6.892631e-01  4.801133e-16  0.449849   
1          0            0          1  4.801133e-16  4.640393e-16  0.449732   
2          0            2          0  2.193653e-02 -3.165594e-16  0.178385   
3          0            0          2 -5.255358e-17  2.840123e-16  0.178429   
4          0            2          1 -3.772340e-01  2.678803e-16  0.353385   
..       ...          ...        ...           ...           ...       ...   
95         4            1          4  1.830362e-16  1.620039e-16  0.402702   
96         4            4          2 -3.613654e-19  4.052794e-01  0.137863   
97         4            2          4 -4.352466e-16 -2.635870e-16  0.138677   
98         4            4          3  1.157671e-16 -5.090505e+13  0.220710   
99         4            3          4 -6.852847e-17  4.148920e-16  0.218113   

       delta    delta2      delta.i1      delta.i2  ...   Int3.

The `D2C` object offers a `get_score()` method to quickly assess how well these descriptors support causal discovery. Descriptors are also computed for `child -> parent` edges, with their label set to zero, so we can treat this dataframe as a standard classification task. By default, this method uses a `RandomForestClassifier()`.

In [22]:

# Get the score of a Random Forest Classifier
score = d2c.get_score(n_splits=5)
print(f'The accuracy of the model is: {score}')


The accuracy of the model is: 0.9199999999999999
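The labelling scheme behind this classification task can be sketched as follows. This is an illustration under assumptions, not the library's code: the helper `labelled_pairs` is hypothetical, and it assumes that every ordered pair of distinct nodes is a candidate, with label `1` only when the pair is a true `parent -> child` edge.

```python
from itertools import permutations


def labelled_pairs(n_nodes, edges):
    """Label every ordered node pair: 1 if it is a DAG edge, else 0."""
    edge_set = set(edges)
    return [
        (source, dest, int((source, dest) in edge_set))
        for source, dest in permutations(range(n_nodes), 2)
    ]


# Hypothetical DAG with edges 0 -> 1, 0 -> 2, 1 -> 2
toy_edges = [(0, 1), (0, 2), (1, 2)]
pairs = labelled_pairs(3, toy_edges)
```

In this toy case the reversed pair `(1, 0)` receives label `0` while `(0, 1)` receives label `1`, so a classifier trained on such rows has to learn the direction of each edge, not just whether two variables are related.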
