
About from_pandas #56

Closed

ziyuwzf opened this issue Jun 30, 2020 · 17 comments
Labels
enhancement New feature or request

Comments


ziyuwzf commented Jun 30, 2020

Description

When I use `from_pandas` to learn a causal graph with NOTEARS, running `watch -n 1 free -m` shows only about 3 of 16 GB in use. I am running 370 thousand rows of data but only ~3 GB of memory is used. How can I improve efficiency?

Context

Every 1.0s: free -m    Tue Jun 30 16:18:36 2020

              total        used        free      shared  buff/cache   available
Mem:          16384        2799       12213           0        1371       13584
Swap:             0           0           0


ziyuwzf commented Jun 30, 2020

Also, `from_pandas` is very slow.

@SteveLerQB (Contributor)

Hi @ziyuwzf, thanks for your question. We have recently added the option to run the NOTEARS algorithm with PyTorch on the develop branch, and it will be included in the upcoming release. To use this feature:

  1. git clone and cd to root of the repo
  2. git checkout develop
  3. pip install ".[pytorch]"
  4. In your code, do
from causalnex.structure.pytorch.notears import from_pandas

from_pandas(....)

Hope this will solve your problem. Please let me know if you still run into the same issue. Thanks. 🙂
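A quick sanity check after step 3 (a hedged sketch; it only verifies that the packages are importable after `pip install ".[pytorch]"`, not that the algorithm runs):

```python
import importlib.util

# Check that causalnex and the optional torch dependency are importable
# after installing the develop branch with the pytorch extra.
for mod in ("causalnex", "torch"):
    found = importlib.util.find_spec(mod) is not None
    print(f"{mod} importable: {found}")
```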


ziyuwzf commented Jun 30, 2020


Thanks! I will try it now.


ziyuwzf commented Jun 30, 2020

I cannot use the git command, so I downloaded causalnex-develop.zip, unzipped it, and cd'd into the directory.
But at step 3, `pip install ".[pytorch]"` fails with:
ERROR: Could not find a version that satisfies the requirement torch<2.0,>=1.4.0 (from causalnex==0.7.0) (from versions: 0.1.2, 0.1.2.post1, 0.3.1, 0.4.0, 0.4.1, 1.0.0, 1.0.1, 1.0.1.post2)
ERROR: No matching distribution found for torch<2.0,>=1.4.0 (from causalnex==0.7.0)

@SteveLerQB (Contributor)

Hi @ziyuwzf, what Python version are you using?

Run `python -V` to check.


ziyuwzf commented Jun 30, 2020


It works!

from causalnex.structure.pytorch.notears import from_pandas

The cause of the error was trivial: I had run `pip install '.[pytorch]' -i XX.XX.XX.XX` against a custom package index. It works now, thanks!

@SteveLerQB (Contributor)

ok sure no worries! 🙂


ziyuwzf commented Jun 30, 2020

I have already switched to `from causalnex.structure.pytorch.notears import from_pandas`,
but it is still not fast, and the memory shows:

Every 1.0s: free -m    Tue Jun 30 21:09:37 2020

              total        used        free      shared  buff/cache   available
Mem:          16384        2313       10290           0        3780       14070
Swap:             0           0           0

@SteveLerQB SteveLerQB added the enhancement New feature or request label Jun 30, 2020
@SteveLerQB (Contributor)

@ziyuwzf Thanks for sharing this. We will take a look into it. Just a quick question: do you have a GPU? We have an internal WIP branch that adds the option to use a GPU for causalnex.structure.pytorch.notears.from_pandas. Would this be useful to you?

qbphilip (Contributor) commented Jun 30, 2020

@ziyuwzf Thanks for your question, do I understand correctly that you have 370k rows? How many variables/features do you have?

You can often speed it up massively by using a larger beta (the L1 regularisation strength). Note that this parameter is not normalised to the number of variables; for ~1000 features a beta of 1 works quite well on real-world data. Obviously, the quality of the result depends on the graph of the ground truth / data-generating process.

NOTE: On my MacBook (4 cores, 8 threads) I ran the PyTorch implementation with up to 1000 features and 1000 rows. The method should scale linearly with the number of observations and cubically with the number of features (due to the gradients and the acyclicity constraint).
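The advice above can be sketched as follows (a minimal example on a small synthetic DataFrame; `beta=0.5` and the other values are illustrative, not recommendations, and the keyword names follow the `from_pandas` signature visible in the traceback later in this thread):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset: 200 rows x 5 continuous features.
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(size=(200, 5)),
                  columns=[f"x{i}" for i in range(5)])

try:
    from causalnex.structure.pytorch.notears import from_pandas
    # A larger beta (L1 penalty) prunes more candidate edges and can speed
    # up fitting; it is not normalised by the number of features, so tune it.
    sm = from_pandas(df, beta=0.5, max_iter=50, w_threshold=0.3)
except ImportError:
    sm = None  # causalnex with the pytorch extra is not installed
```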


ziyuwzf commented Jul 1, 2020

I have GPUs. When can I use them via your internal WIP branch that adds the GPU option to causalnex.structure.pytorch.notears.from_pandas?


ziyuwzf commented Jul 1, 2020


Yes, 370k rows and 30 columns.

@SteveLerQB (Contributor)

Hi @ziyuwzf I have moved the WIP branch here: #57. Please go to the feature/pytorch_gpu branch, download the zip file from this branch and follow the same steps as before. Thanks 🙂

@SteveLerQB (Contributor)

Example usage:

from_pandas(dataset,...., use_gpu=True)


ziyuwzf commented Jul 21, 2020

I used `from_pandas(dataset, use_gpu=True)`, but it shows this error:

/usr/local/python3/lib/python3.6/site-packages/causalnex/structure/pytorch/notears.py in from_pandas(X, beta, max_iter, w_threshold, tabu_edges, tabu_parent_nodes, tabu_child_nodes, **kwargs)
    191         tabu_parent_nodes,
    192         tabu_child_nodes,
--> 193         **kwargs
    194     )
    195 
/usr/local/python3/lib/python3.6/site-packages/causalnex/structure/pytorch/notears.py in from_numpy(X, beta, w_threshold, max_iter, tabu_edges, tabu_parent_nodes, tabu_child_nodes, **kwargs)
    105     ]
    106 
--> 107     model = NotearsMLP(n_features=d, lasso_beta=beta, bounds=bnds, **kwargs)
    108 
    109     model.fit(X, max_iter=max_iter)
TypeError: __init__() got an unexpected keyword argument 'use_gpu'

@rbartelme

Is the GPU acceleration in the released codebase, or only on the develop branch?

@oentaryorj (Contributor)

The GPU acceleration has been implemented in this commit and will be made available in the next release.

@qbphilip qbphilip mentioned this issue Nov 10, 2021