# Usage



First, let's import the necessary modules and classes:

In [1]:
import sys
sys.path.append("..")

In [2]:
from d2c.simulatedDAGs import SimulatedDAGs
from d2c.D2C import D2C
import pandas as pd

Next, let's set the values for the essential parameters:

In [3]:
N_JOBS = 1

n_dags = 3
n_observations = 100
n_nodes = 5

We can now create our simulated DAGs by instantiating the `SimulatedDAGs` object and calling the `generate_dags()` method. 

In [4]:
simulated_dags = SimulatedDAGs(n_dags, n_observations, n_nodes, n_jobs=N_JOBS)
simulated_dags.generate_dags()
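
To make concrete what a randomly generated DAG can look like, here is a minimal standard-library sketch. It is not how `SimulatedDAGs` is implemented: the function `random_dag` and its `edge_prob` parameter are hypothetical names chosen for illustration. The sketch draws a random ordering of the nodes and only allows edges that point forward in that ordering, which guarantees that no cycle can form.

```python
import random

def random_dag(n_nodes, edge_prob=0.4, seed=None):
    """Return (parents, order) for a random DAG.

    `parents` maps each node to the list of its parents and `order` is a
    topological order. Hypothetical helper, not part of the d2c package.
    """
    rng = random.Random(seed)
    order = list(range(n_nodes))
    rng.shuffle(order)  # random causal ordering of the nodes
    parents = {node: [] for node in order}
    for i, child in enumerate(order):
        for parent in order[:i]:
            # Edges only go from earlier to later nodes, so the graph stays acyclic.
            if rng.random() < edge_prob:
                parents[child].append(parent)
    return parents, order

# Three small DAGs with five nodes each, mirroring n_dags = 3 and n_nodes = 5.
dags = [random_dag(n_nodes=5, seed=s) for s in range(3)]
```

Fixing the ordering first is one common way to sample DAGs; the density of the resulting graphs is controlled here by the assumed `edge_prob` value.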

Once the DAGs have been created, we can generate observations that respect their structure: whenever the graph contains a `parent -> child` edge, the `child` variable is generated as a function of its `parent` variable(s).

In [5]:
simulated_dags.simulate_observations()
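
The idea of generating each child as a function of its parents can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's procedure: the three-node DAG, the helper `sample_observations`, the linear form, and the Gaussian noise are all hypothetical choices made for brevity.

```python
import random

# Hypothetical DAG written as child -> list of parents: 0 -> 1, 0 -> 2, 1 -> 2.
parents = {0: [], 1: [0], 2: [0, 1]}
order = [0, 1, 2]  # topological order: every parent precedes its children

def sample_observations(parents, order, n_observations, seed=None):
    """Draw rows in which each node is a noisy linear function of its parents."""
    rng = random.Random(seed)
    # One fixed random weight per (parent, child) edge.
    weights = {(p, c): rng.uniform(0.5, 2.0) for c in order for p in parents[c]}
    rows = []
    for _ in range(n_observations):
        values = {}
        for node in order:
            # Root nodes have no parents, so their signal is 0 and they are pure noise.
            signal = sum(weights[(p, node)] * values[p] for p in parents[node])
            values[node] = signal + rng.gauss(0.0, 1.0)
        rows.append([values[node] for node in sorted(values)])
    return rows

observations = sample_observations(parents, order, n_observations=100, seed=0)
print(len(observations), len(observations[0]))  # 100 3
```

Visiting the nodes in topological order is what makes this work: by the time a child is generated, the values of all its parents are already available.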

Let's take a look at one of the generated DAGs using the `plot_DAG()` method. We choose the DAG with index `1`. 

In [6]:
simulated_dags.plot_DAG(1)

Now that we have the DAGs and their corresponding observations, we can initialize the `D2C` object, which will allow us to compute the descriptors.

In [7]:
d2c = D2C(simulated_dags, n_jobs=N_JOBS)

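Note that, in the run recorded here, this call fails with `TypeError: __init__() missing 1 required positional argument: 'observations'`. The error indicates that the `D2C` constructor also requires an `observations` argument, so the simulated observations must be passed explicitly along with the DAGs.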

Calling `d2c.initialize()` computes the descriptors. The object also provides a `load_descriptors()` method that loads previously computed descriptors from a file, which avoids repeating the computation.

In [None]:
# d2c.initialize()  # uncomment to compute the descriptors from scratch
d2c.load_descriptors('dataframe.csv')
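
The compute-once, reload-later pattern described above can be sketched with the standard library alone. The helper `load_or_compute` and the column names are hypothetical and say nothing about how `load_descriptors()` is implemented or which file layout it expects; the sketch only shows the general idea of caching an expensive table of numbers in a CSV file.

```python
import csv
import os
import tempfile

def load_or_compute(path, compute_rows, header):
    """Return numeric rows from `path` if it exists, else compute and save them."""
    if os.path.exists(path):
        with open(path, newline="") as f:
            reader = csv.reader(f)
            next(reader)  # skip the header row
            return [[float(x) for x in row] for row in reader]
    rows = compute_rows()  # the expensive step, run only when no file exists
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return rows

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "descriptors.csv")
    first = load_or_compute(path, lambda: [[0.1, 1.0], [0.7, 0.0]], ["feature", "label"])
    second = load_or_compute(path, lambda: [], ["feature", "label"])  # read from disk
    print(first == second)  # True
```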

We can take a look at the descriptors. Let's retrieve the dataframe from the `D2C` object using the `get_df()` method and print it.

In [None]:
df = d2c.get_df()
print(df)

The `D2C` object offers a `get_score()` method to quickly assess how well these descriptors support causal discovery. Descriptors are also computed for the reversed `child -> parent` edges, but their label is set to zero. This lets us treat the dataframe as a standard classification task. By default, this method uses a `RandomForestClassifier()`.

In [None]:

# Get the score of a Random Forest Classifier
score = d2c.get_score(test_size=0.2, metric='accuracy')
print(f'The accuracy of the model is: {score}')


(0.725, 0.7027027027027027)
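
The labelling scheme behind this classification task can be illustrated with a small sketch. It assumes that true `parent -> child` pairs receive label 1, which the text implies but does not state, and it covers only edges and their reversals. The helper `labelled_pairs` and the example edge list are hypothetical, and the actual descriptors computed by `D2C` are not reproduced here.

```python
def labelled_pairs(edges):
    """Return (source, target, label) rows: 1 for a true edge, 0 for its reversal."""
    rows = []
    for parent, child in edges:
        rows.append((parent, child, 1))  # parent -> child: the causal direction
        rows.append((child, parent, 0))  # child -> parent: the reversed direction
    return rows

edges = [(0, 1), (0, 2), (1, 2)]  # hypothetical DAG with three edges
for row in labelled_pairs(edges):
    print(row)
```

Because every true edge contributes exactly one positive and one negative row, the two classes are balanced by construction in this sketch.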