Code to demonstrate basic FastCDA capabilities in a Jupyter notebook.

In [None]:
from fastcda import FastCDA
from dgraph_flex import DgraphFlex
import semopy
import pprint as pp

# create an instance of FastCDA
fc = FastCDA()

### *Reading in data*

For this demo, we use a sample EMA dataset that is built into
the fastcda package.

To read in your own CSV file, "mydata.csv", use the pandas package,
a powerful package for working with dataframes:
```
import pandas as pd

df = pd.read_csv("mydata.csv")

```

In [None]:
# read in the sample EMA dataset and view it
df = fc.getEMAData()

df

In [None]:
# add the lags, with a suffix of '_lag'
lag_stub = '_lag'
df_lag = fc.add_lag_columns(df, lag_stub=lag_stub)
df_lag
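
To make the lagging step concrete, here is a minimal pandas sketch of what adding lag columns can look like. It is an illustration only: the toy data and the choice to drop the first row are assumptions, and it is not confirmed that `add_lag_columns` works this way internally.

```python
import pandas as pd

# Hypothetical toy data; "x" and "y" are placeholder names, not EMA variables
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [10.0, 20.0, 30.0, 40.0]})

lag_stub = "_lag"

# Shift every column down one row so that row t holds the value from row t-1
lagged = toy.shift(1).add_suffix(lag_stub)

# Place current and lagged columns side by side, then drop the first row,
# which has no previous observation (an assumption made for this sketch)
toy_lag = pd.concat([toy, lagged], axis=1).dropna().reset_index(drop=True)
print(toy_lag)
```

Each remaining row pairs a measurement with the one taken just before it, which is what lets the search relate earlier values to later ones.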

In [None]:
# standardize the data
df_lag_std = fc.standardize_df_cols(df_lag)
df_lag_std
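
Standardizing usually means converting each column to z-scores. The sketch below shows that calculation with plain pandas on toy data. The use of the population standard deviation (`ddof=0`) is an assumption; `standardize_df_cols` may use a different convention.

```python
import pandas as pd

# Hypothetical toy data; "x" and "y" are placeholder names
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [10.0, 20.0, 30.0, 40.0]})

# z-score each column: subtract the column mean, divide by the column
# standard deviation (ddof=0 is an assumption made for this sketch)
toy_std = (toy - toy.mean()) / toy.std(ddof=0)
print(toy_std)
```

After this step every column has mean 0 and unit variance, so variables measured on different scales contribute comparably to the score and test.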

In [None]:
# get the dataframe column names
cols = df.columns
cols

In [None]:
# Create the knowledge prior for temporal order.
# Tier 0 (lag variables) precedes tier 1 (non-lag variables),
# so a non-lag variable can never be a parent of a lag variable.

knowledge = {'addtemporal': {
                            0: [col + lag_stub for col in cols],
                            1: [col for col in cols]
                            }
            }
knowledge
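
To see the shape of the knowledge dictionary without loading the dataset, the same comprehension can be run on a short list of column names. The two names below appear later in this notebook; restricting the list to two columns is only for illustration.

```python
import pprint

lag_stub = "_lag"

# Two column names from this notebook, used here as a small stand-in
cols = ["PANAS_PA", "PANAS_NA"]

# Tier 0 holds the lagged variables, tier 1 the current ones
knowledge = {
    "addtemporal": {
        0: [col + lag_stub for col in cols],
        1: [col for col in cols],
    }
}
pprint.pprint(knowledge)
```

Tier 0 then contains `PANAS_PA_lag` and `PANAS_NA_lag`, and tier 1 contains `PANAS_PA` and `PANAS_NA`.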

In [None]:
# run model with run_model_search
result, graph = fc.run_model_search(
    df_lag_std,
    model='gfci',
    score={'sem_bic': {'penalty_discount': 1.0}},
    test={'fisher_z': {'alpha': 0.01}},
    knowledge=knowledge,
)


In [None]:
graph.show_graph()

Show the graph with just the directed edges.

In [None]:
# show the directed edges only
graph.show_graph(directed_only=True)

Let's make the nodes stand out. Lag variables get a dotted outline, PANAS_PA gets a light green fill, PANAS_NA gets a light pink fill, and the alcohol_bev* nodes get a rectangular shape with a purple fill and white label text.

In [None]:
node_styles = [
    {"pattern": "*_lag",        "style": "dotted"},
    {"pattern": "PANAS_PA*",    "style": "filled", "fillcolor": "lightgreen"},
    {"pattern": "PANAS_NA*",    "style": "filled", "fillcolor": "lightpink"},
    {"pattern": "PANAS_PA_lag", "style": "filled,dotted", "fillcolor": "lightgreen"},
    {"pattern": "PANAS_NA_lag", "style": "filled,dotted", "fillcolor": "lightpink"},
    {"pattern": "alcohol_bev*", "shape": "box", "style": "filled", "fillcolor": "purple", "fontcolor": "white"},
]

fc.show_styled_graph(graph, node_styles)
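
The rule list above names `PANAS_PA_lag` and `PANAS_NA_lag` explicitly even though wildcard rules already match them. One plausible reason is sketched below, under two assumptions that are not confirmed for fastcda: patterns are shell-style wildcards, and when several rules match a node, later rules overwrite the attributes set by earlier ones. The helper `resolve_style` is hypothetical and is not part of the fastcda API.

```python
from fnmatch import fnmatchcase

def resolve_style(node, rules):
    """Hypothetical helper: merge every matching rule, later rules win."""
    merged = {}
    for rule in rules:
        if fnmatchcase(node, rule["pattern"]):
            # Copy all attributes except the pattern itself
            merged.update({k: v for k, v in rule.items() if k != "pattern"})
    return merged

rules = [
    {"pattern": "*_lag", "style": "dotted"},
    {"pattern": "PANAS_PA*", "style": "filled", "fillcolor": "lightgreen"},
    {"pattern": "PANAS_PA_lag", "style": "filled,dotted", "fillcolor": "lightgreen"},
]

print(resolve_style("PANAS_PA", rules))
print(resolve_style("PANAS_PA_lag", rules))
```

Under these assumptions, dropping the last rule would leave `PANAS_PA_lag` with `style` set to `filled` only, because the `PANAS_PA*` rule would overwrite the dotted outline. The explicit rule restores both.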

In [None]:
# Show styled graph with directed edges only
fc.show_styled_graph(graph, node_styles, directed_only=True)

### *Multi-Graph Comparison*

When comparing causal discovery results from different configurations, it helps to have nodes placed in the same positions across all graphs. The `show_n_graphs` method computes a shared layout from the union of all graphs and pins nodes at consistent coordinates. Disconnected nodes (those with no edges in a particular graph) are grayed out by default.
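
The idea of a shared layout can be sketched in plain Python: collect the union of nodes across all graphs, assign each node one fixed position, and record which nodes have no edges in each graph. The edge lists, node names, and grid placement below are all hypothetical. How `show_n_graphs` actually computes its layout is not shown in this notebook.

```python
def nodes_of(edges):
    """Return the set of nodes that appear in at least one edge."""
    return {node for edge in edges for node in edge}

# Hypothetical edge lists standing in for three search results
graphs = {
    "PD=1.0": [("a_lag", "a"), ("a", "b"), ("b_lag", "b")],
    "PD=2.0": [("a_lag", "a"), ("a", "b")],
    "PD=3.0": [("a_lag", "a")],
}

# Union of nodes across all graphs, sorted for a stable order
all_nodes = sorted(set().union(*(nodes_of(e) for e in graphs.values())))

# One fixed (column, row) position per node; a simple two-column grid
# stands in for a real layout engine here
positions = {node: (i % 2, i // 2) for i, node in enumerate(all_nodes)}

# Nodes with no edges in a given graph are the ones that would be grayed out
disconnected = {
    label: sorted(set(all_nodes) - nodes_of(edges))
    for label, edges in graphs.items()
}
print(positions)
print(disconnected)
```

Because `positions` is computed once from the union, a node sits at the same coordinates in every graph, including graphs where it has no edges.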

In [None]:
# Run two more model searches with different penalty discounts
result2, graph2 = fc.run_model_search(
    df_lag_std,
    model='gfci',
    score={'sem_bic': {'penalty_discount': 2.0}},
    test={'fisher_z': {'alpha': 0.01}},
    knowledge=knowledge,
)

result3, graph3 = fc.run_model_search(
    df_lag_std,
    model='gfci',
    score={'sem_bic': {'penalty_discount': 3.0}},
    test={'fisher_z': {'alpha': 0.01}},
    knowledge=knowledge,
)

In [None]:
# Compare three graphs side-by-side with shared node layout
# Use graph_size to force identical dimensions (width,height in inches)
# Nodes without edges in a graph are grayed out by default
fc.show_n_graphs(
    [graph, graph2, graph3],
    node_styles=node_styles,
    gray_disconnected=True,
    labels=["PD=1.0", "PD=2.0", "PD=3.0"],
    graph_size="10,8"
)

In [None]:
# Compare three graphs with directed edges only
fc.show_n_graphs(
    [graph, graph2, graph3],
    node_styles=node_styles,
    gray_disconnected=True,
    directed_only=True,
    labels=["PD=1.0", "PD=2.0", "PD=3.0"],
    graph_size="10,8"
)

In [None]:
# Same comparison but without graying out disconnected nodes
fc.show_n_graphs(
    [graph, graph2, graph3],
    node_styles=node_styles,
    gray_disconnected=False,
    labels=["PD=1.0", "PD=2.0", "PD=3.0"],
    graph_size="10,8"
)

In [None]:
# Save graphs to PNG files
fc.save_n_graphs(
    [graph, graph2, graph3],
    ["paired_graph_pd1", "paired_graph_pd2", "paired_graph_pd3"],
    node_styles=node_styles,
    gray_disconnected=True,
    labels=["PD=1.0", "PD=2.0", "PD=3.0"],
    graph_size="10,8",
    res=300
)
print("Saved: paired_graph_pd1.png, paired_graph_pd2.png, paired_graph_pd3.png")