
I have a question about Dynotears #74

Closed
minsik-bioinfo opened this issue Nov 3, 2020 · 11 comments
Labels
bug Something isn't working

Comments

@minsik-bioinfo

Description

I would like to know what input data dynotears expects and what results it produces.

Context

I tried to use dynotears.from_pandas on the DREAM4 challenge data, but I get an empty graph.
I constructed a list of 10 dataframes, each shaped as shown below.
In each dataframe, the columns are nodes and the rows are time points:
```
   g1  g2
1   1   2
2   4   2
3   3   1
```

@qbphilip
Contributor

qbphilip commented Nov 3, 2020

Did you have a look at
`from causalnex.structure.transformers import DynamicDataTransformer`?

It's a utility to get the data into the right shape.
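To make the "right shape" concrete, here is a minimal pandas sketch of the lagged column layout discussed in this thread (columns such as g1_lag0, g1_lag1). It is not DynamicDataTransformer's implementation: the helper name `make_lagged`, the column order and the dropping of incomplete rows are assumptions for illustration only.

```python
import pandas as pd


def make_lagged(df: pd.DataFrame, p: int) -> pd.DataFrame:
    """Hypothetical helper: build {col}_lag{k} columns for k = 0..p.

    Row t of the result holds x(t) in the *_lag0 columns and x(t-k) in the
    *_lagk columns. The first p rows have no complete history and are dropped.
    """
    parts = []
    for k in range(p + 1):
        # shift(k) moves values down by k rows, so row t now sees x(t-k)
        shifted = df.shift(k)
        shifted.columns = [f"{c}_lag{k}" for c in df.columns]
        parts.append(shifted)
    return pd.concat(parts, axis=1).iloc[p:]


# The small example from the question: columns are nodes, rows are time points
ts = pd.DataFrame({"g1": [1, 4, 3], "g2": [2, 2, 1]})
print(make_lagged(ts, p=1))
```

With p=1 this yields the columns g1_lag0, g2_lag0, g1_lag1, g2_lag1 and two complete rows, which is the same pattern the next comment describes.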

@minsik-bioinfo
Author

@qbphilip
Thank you for your comment.
I had a look at DynamicDataTransformer.
It transforms the data into columns like g1_lag0 g2_lag0 g1_lag1 g2_lag1 ... g1_lagp g2_lagp.
However, from_pandas_dynamic already uses DynamicDataTransformer internally.
Could you give me a simple example, with code and data, for dynotears?

@GabrielAzevedoFerreiraQB
Contributor

GabrielAzevedoFerreiraQB commented Nov 9, 2020

Hi!
Thanks for your interest in causalnex.

If you provide a pandas DataFrame whose index represents timestamps, from_pandas_dynamic should work. That is, row 0 is timestamp zero, row 1 is timestamp one, and so on. You do not need to create the "lagged" columns, because from_pandas_dynamic creates them for you.
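A minimal sketch of getting raw series into that layout, assuming each raw series comes with a time column (the column name "time" and the helper `prepare_series` are hypothetical). The rows are sorted by time and the index is reset, so the index is just the sampling order. Only row order is kept, so this assumes equally spaced time points. Whether from_pandas_dynamic accepts a list of such DataFrames (one per independent realisation, e.g. the 10 DREAM4 series) depends on your CausalNex version; check its docstring.

```python
import pandas as pd


def prepare_series(raw: pd.DataFrame, time_col: str) -> pd.DataFrame:
    """Put one raw series into the layout described above.

    Rows are sorted by the time column, which is then dropped, and the index
    is reset so that row 0 is the earliest timestamp, row 1 the next, and so on.
    Only the row order survives, so equally spaced time points are assumed.
    """
    ordered = raw.sort_values(time_col).drop(columns=time_col)
    return ordered.reset_index(drop=True)


# Placeholder: the values from the question, stored out of order with an
# assumed "time" column (replace with each of the 10 DREAM4 series)
raw = pd.DataFrame({"time": [3, 1, 2], "g1": [3, 1, 4], "g2": [1, 2, 2]})
series = prepare_series(raw, time_col="time")
print(series)

# One DataFrame per independent realisation of the process
series_list = [series]
```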

Below is some code that uses a simulated dataset. The API may have changed a little, because this code is many months old:

```python
#### GET A SIMULATED TIME SERIES ####
import warnings

from causalnex.structure.data_generators import gen_stationary_dyn_net_and_df

# silence warnings
warnings.filterwarnings("ignore")

# Obtain simulated structure (g), dataset sampled from g and list of intra- and inter-slice node names
g, df, intra_nodes, inter_nodes = gen_stationary_dyn_net_and_df(
    num_nodes=10,
    n_samples=10000,
    p=1,
    degree_intra=3,
    degree_inter=2,
    graph_type_intra="erdos-renyi",
    graph_type_inter="erdos-renyi",
    w_min_intra=0.3,
    w_max_intra=2,
    w_min_inter=0.3,
    w_max_inter=0.5,
    w_decay=1.0,
    sem_type="linear-gauss",
    noise_scale=1,
    max_data_gen_trials=1000,
)

# Keep only the intra-slice (lag 0) columns and strip the lag suffix,
# e.g. "3_lag0" -> "3", so that each column is one variable
df = df[intra_nodes]
df.columns = [el.split("_")[0] for el in df.columns]
df.head()  # <<--- this is time series data: each row is a timestamp
```


Then you can call dynotears:

```python
from causalnex.structure.dynotears import from_pandas_dynamic

# p=1: allow dependencies on values up to one timestamp in the past
g_learnt = from_pandas_dynamic(df, p=1, lambda_w=0.1, lambda_a=0.1, w_threshold=0.1)
g_learnt
```

Then, to see the graph, you can do:

```python
from copy import deepcopy

from causalnex.plots import plot_structure
from IPython.display import Image

# Work on a copy so that g_learnt itself is left unchanged
g_learnt_2 = deepcopy(g_learnt)
g_learnt_2.remove_edges_below_threshold(0.1)

# Plot the largest subgraph and save it to an image file
viz = plot_structure(g_learnt_2.get_largest_subgraph())
f = "dbn_learnt.jpg"
viz.draw(f)
Image(f)
```
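If you want the same pruning without the plotting stack, here is a minimal networkx sketch. It assumes, without checking the CausalNex source, that remove_edges_below_threshold drops edges whose absolute weight is below the threshold and that get_largest_subgraph keeps the largest weakly connected component. The toy graph, its weights and the helper name `prune_and_keep_largest` are made up for illustration.

```python
import networkx as nx


def prune_and_keep_largest(g: nx.DiGraph, threshold: float) -> nx.DiGraph:
    """Sketch: drop edges with |weight| < threshold, then keep the largest weakly connected component."""
    pruned = g.copy()  # leave the input graph untouched, like the deepcopy in the snippet
    weak = [(u, v) for u, v, w in pruned.edges(data="weight") if abs(w) < threshold]
    pruned.remove_edges_from(weak)
    largest = max(nx.weakly_connected_components(pruned), key=len)
    return pruned.subgraph(largest).copy()


# Toy graph with lag-style node names (hypothetical weights)
toy = nx.DiGraph()
toy.add_weighted_edges_from([
    ("1_lag1", "5_lag0", 0.8),
    ("5_lag0", "2_lag0", -0.4),
    ("3_lag1", "3_lag0", 0.05),  # |0.05| < 0.1, so this edge is removed
])
small = prune_and_keep_largest(toy, threshold=0.1)
print(sorted(small.edges(data="weight")))
```

Removing the weak edge leaves 3_lag1 and 3_lag0 isolated, so only the three-node component survives.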


@GabrielAzevedoFerreiraQB
Contributor

Each variable has a _lagX suffix, which represents the variable shifted by X timestamps. For example,
the edge 1_lag1 --> 5_lag0 indicates that variable 5 at time $t$ is affected by the value of variable 1 at time $t-1$.
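The lag semantics can be checked numerically. This sketch uses plain NumPy, not DYNOTEARS: it simulates a made-up process in which variable 5 at time $t$ depends only on variable 1 at time $t-1$, i.e. $x_5(t) = 0.8\,x_1(t-1) + \varepsilon(t)$, then regresses on the lagged pair and on the same-time pair. The coefficient 0.8, the noise level and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# x1 is white noise; x5 at time t is driven by x1 at time t-1 (the edge 1_lag1 --> 5_lag0)
x1 = rng.normal(size=n)
x5 = np.zeros(n)
x5[1:] = 0.8 * x1[:-1] + 0.1 * rng.normal(size=n - 1)

# Pair each x5(t) ("5_lag0") with x1(t-1) ("1_lag1"); t starts at 1 because t-1 must exist
lag0_x5 = x5[1:]
lag1_x1 = x1[:-1]
w, *_ = np.linalg.lstsq(lag1_x1[:, None], lag0_x5, rcond=None)
print(round(float(w[0]), 2))  # close to the true lagged weight 0.8

# The same-time pairing ("1_lag0" --> "5_lag0") carries no signal in this process
w0, *_ = np.linalg.lstsq(x1[1:, None], lag0_x5, rcond=None)
print(round(float(w0[0]), 2))  # close to 0
```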

@GabrielAzevedoFerreiraQB
Contributor

Let me know if it helps :)

@LukaJakovljevic

Hi @GabrielAzevedoFerreiraQB ,

I tried to run your code from #74 (comment).

When running the second snippet (with g_learnt), I get this error in a Jupyter notebook (Python 3.7.6):

```
~\anaconda3\lib\site-packages\causalnex\structure\transformers.py in _check_input_from_pandas(self, time_series)
    203
    204     if t.index.dtype != int:
--> 205         raise TypeError("Index must be integers")
    206
    207     if self.columns is not None:

TypeError: Index must be integers
```

I got the same error when I first tried to run from_pandas_dynamic on my own dataframe.

Do you know what the problem might be?

cc: @qbphilip
Thanks!

@GabrielAzevedoFerreiraQB
Contributor

Hmm, somehow the returned index is not of an integer type. Maybe the generators changed and the index is no longer integer.

The index of your dataframes must represent the sampling time of your time series (i.e. x_1, x_2, x_3, ...). This is very important.

I suggest checking df.index to make sure that its values are (1) integers, (2) in increasing order, and (3) increasing by exactly one from each row to the next.
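The three checks above can be sketched as a small helper. The name `index_problems` and its messages are hypothetical; it only uses pandas' `is_integer_dtype` and `numpy.diff`, and returns a list of problems instead of raising, so you can inspect every series in a list at once.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_integer_dtype


def index_problems(df: pd.DataFrame) -> list:
    """Return the problems found in df.index; an empty list means the three checks pass."""
    idx = df.index
    if not is_integer_dtype(idx.dtype):
        # (1) integers; checks (2) and (3) below assume an integer index
        return [f"index dtype is {idx.dtype}, expected an integer dtype"]
    problems = []
    steps = np.diff(idx.to_numpy())
    if (steps <= 0).any():
        problems.append("index is not strictly increasing")  # (2)
    if (steps != 1).any():
        problems.append("consecutive index values do not differ by exactly 1")  # (3)
    return problems


for index in ([1, 2, 3], [0.0, 1.0, 2.0], [0, 2, 4], [2, 1, 0]):
    print(index, index_problems(pd.DataFrame({"g1": [1, 4, 3]}, index=index)))
```

If only the index is wrong and the rows are already in time order, df.reset_index(drop=True) replaces it with 0, 1, 2, ...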

@panas89

panas89 commented Dec 4, 2020

Hi, I can execute the code in #74 (comment).
Is there a way to impose tabu_edges between lag1 and lag0 variables?
How should we use the learned structure to instantiate a DBN?

@oentaryorj
Contributor

More robust integer type checking has been implemented in this commit and will be available in the next CausalNex release.
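One plausible reason the check in the traceback above fails (that traceback comes from a Windows Anaconda install) is that comparing a dtype with Python's `int` only matches the platform's default integer dtype, which is 32-bit on Windows with NumPy 1.x. This is an explanation consistent with the traceback, not something confirmed in this thread. The sketch reproduces the mismatch on any platform by picking the integer width that is not the default, and shows that pandas' `is_integer_dtype` accepts both.

```python
import numpy as np
from pandas.api.types import is_integer_dtype

default_int = np.dtype(int)  # int64 on Linux/macOS; int32 on Windows with NumPy 1.x
# Pick the integer width that is NOT the platform default
other_int = np.dtype("int32") if default_int == np.dtype("int64") else np.dtype("int64")

# Naive check, as in the traceback: only the default integer dtype compares equal to int
print(default_int != int, other_int != int)  # False True

# A dtype-family check accepts any integer width
print(is_integer_dtype(default_int), is_integer_dtype(other_int))  # True True
```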

@donaldRwilliams

Hi,

How do I access the adjacency matrix of the "true" graph from the above code?

@oentaryorj
Contributor

oentaryorj commented Sep 3, 2021

Hi @donaldRwilliams,

My understanding is that the StructureModel object produced by from_pandas_dynamic extends (inherits from) the networkx.DiGraph class (see here). So, in principle, you can extract the adjacency matrix using networkx functions such as networkx.adjacency_matrix. Alternatively, you can write your own code to extract the matrix by iterating over g_learnt.nodes() or g_learnt.edges().
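A minimal sketch of both routes, using a plain `networkx.DiGraph` as a stand-in because StructureModel is described above as a DiGraph subclass. The node names and weights are made up; with the code in this thread you would pass `g` (the simulated "true" graph) or `g_learnt` instead. `networkx.to_numpy_array` is used here rather than `adjacency_matrix` because the latter returns a SciPy sparse matrix.

```python
import networkx as nx
import numpy as np

# Stand-in for a StructureModel: edges carry a "weight" attribute
sm = nx.DiGraph()
sm.add_weighted_edges_from([
    ("1_lag1", "5_lag0", 0.8),
    ("5_lag0", "2_lag0", -0.4),
])

# Fix the node order so that matrix rows/columns are reproducible
nodes = sorted(sm.nodes)
# adj[i, j] is the weight of the edge nodes[i] -> nodes[j] (0 if there is no edge)
adj = nx.to_numpy_array(sm, nodelist=nodes, weight="weight")

# The same matrix built by iterating over the edges
pos = {name: i for i, name in enumerate(nodes)}
manual = np.zeros((len(nodes), len(nodes)))
for u, v, w in sm.edges(data="weight"):
    manual[pos[u], pos[v]] = w

print(nodes)
print(adj)
```

Rows are source nodes and columns are target nodes, so a lagged edge such as 1_lag1 --> 5_lag0 appears in the 1_lag1 row.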

Hope this helps. Thanks.

@qbphilip qbphilip mentioned this issue Nov 10, 2021
7 participants