
About from_pandas #56

Closed

ziyuwzf opened this issue Jun 30, 2020 · 17 comments
Labels
enhancement New feature or request

Comments


ziyuwzf commented Jun 30, 2020

Description

When I use `from_pandas` to learn a causal graph with NOTEARS, running `watch -n 1 free -m` shows only about 3 of 16 GB in use. I am running 370 thousand rows of data but only ~3 GB of memory is used. How can I improve efficiency?

Context

Every 1.0s: free -m    Tue Jun 30 16:18:36 2020

              total        used        free      shared  buff/cache   available
Mem:          16384        2799       12213           0        1371       13584
Swap:             0           0           0


ziyuwzf commented Jun 30, 2020

Also, `from_pandas` is very slow.

@SteveLerQB (Contributor)

Hi @ziyuwzf, thanks for your question. We have recently added the option to run the NOTEARS algorithm with PyTorch on the develop branch, and it will be included in the upcoming release. To use this feature:

  1. git clone and cd to root of the repo
  2. git checkout develop
  3. pip install ".[pytorch]"
  4. In your code, do
from causalnex.structure.pytorch.notears import from_pandas

from_pandas(....)

Hope this will solve your problem. Please let me know if you still run into the same issue. Thanks. 🙂
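A quick sanity check after step 3 (a hedged sketch; it only verifies that the packages are importable after `pip install ".[pytorch]"`, not that the algorithm runs):

```python
import importlib.util

# Check that causalnex and the optional torch dependency are importable
# after installing the develop branch with the pytorch extra.
for mod in ("causalnex", "torch"):
    found = importlib.util.find_spec(mod) is not None
    print(f"{mod} importable: {found}")
```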


ziyuwzf commented Jun 30, 2020


Thanks! I will try it now.


ziyuwzf commented Jun 30, 2020

I cannot use the git command, so I downloaded causalnex-develop.zip, unzipped it, and cd'd into the directory.
But at step 3, `pip install ".[pytorch]"` fails with:
ERROR: Could not find a version that satisfies the requirement torch<2.0,>=1.4.0 (from causalnex==0.7.0) (from versions: 0.1.2, 0.1.2.post1, 0.3.1, 0.4.0, 0.4.1, 1.0.0, 1.0.1, 1.0.1.post2)
ERROR: No matching distribution found for torch<2.0,>=1.4.0 (from causalnex==0.7.0)

@SteveLerQB (Contributor)

Hi @ziyuwzf, what Python version are you using?

Run `python -V` to check.


ziyuwzf commented Jun 30, 2020


It works!

from causalnex.structure.pytorch.notears import from_pandas

The cause of the error was trivial: I had run `pip install '.[pytorch]' -i XX.XX.XX.XX` against a custom package index. It works now, thanks!

@SteveLerQB (Contributor)

ok sure no worries! 🙂


ziyuwzf commented Jun 30, 2020

I have already switched to `from causalnex.structure.pytorch.notears import from_pandas`,
but it is still not fast, and the memory shows:

Every 1.0s: free -m    Tue Jun 30 21:09:37 2020

              total        used        free      shared  buff/cache   available
Mem:          16384        2313       10290           0        3780       14070
Swap:             0           0           0

@SteveLerQB SteveLerQB added the enhancement New feature or request label Jun 30, 2020
@SteveLerQB (Contributor)

@ziyuwzf Thanks for sharing this. We will take a look into it. Just a quick question: do you have a GPU? We have an internal WIP branch that adds the option to use a GPU for causalnex.structure.pytorch.notears.from_pandas. Would this be useful to you?

qbphilip (Contributor) commented Jun 30, 2020

@ziyuwzf Thanks for your question, do I understand correctly that you have 370k rows? How many variables/features do you have?

You can often speed it up massively by using a larger beta (the L1 regularisation strength). Note that this parameter is not normalised to the number of variables; for ~1000 features a beta of 1 works quite well on real-world data. Obviously, the quality of the result depends on the graph of the ground truth / data-generating process.

NOTE: On my MacBook (4 cores, 8 threads) I ran the PyTorch implementation with up to 1000 features and 1000 rows. The method should scale linearly with the number of observations and cubically with the number of features (due to the gradients and the acyclicity constraint).
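The advice above can be sketched as follows (a minimal example on a small synthetic DataFrame; `beta=0.5` and the other values are illustrative, not recommendations, and the keyword names follow the `from_pandas` signature visible in the traceback later in this thread):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset: 200 rows x 5 continuous features.
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(size=(200, 5)),
                  columns=[f"x{i}" for i in range(5)])

try:
    from causalnex.structure.pytorch.notears import from_pandas
    # A larger beta (L1 penalty) prunes more candidate edges and can speed
    # up fitting; it is not normalised by the number of features, so tune it.
    sm = from_pandas(df, beta=0.5, max_iter=50, w_threshold=0.3)
except ImportError:
    sm = None  # causalnex with the pytorch extra is not installed
```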


ziyuwzf commented Jul 1, 2020

I have GPUs. When can I use them via your internal WIP branch that adds the GPU option to causalnex.structure.pytorch.notears.from_pandas?


ziyuwzf commented Jul 1, 2020


Yes, 370k rows and 30 columns.

@SteveLerQB (Contributor)

Hi @ziyuwzf I have moved the WIP branch here: #57. Please go to the feature/pytorch_gpu branch, download the zip file from this branch and follow the same steps as before. Thanks 🙂

@SteveLerQB (Contributor)

Example usage:

from_pandas(dataset,...., use_gpu=True)


ziyuwzf commented Jul 21, 2020

I used `from_pandas(dataset, use_gpu=True)`, but it shows this error:

/usr/local/python3/lib/python3.6/site-packages/causalnex/structure/pytorch/notears.py in from_pandas(X, beta, max_iter, w_threshold, tabu_edges, tabu_parent_nodes, tabu_child_nodes, **kwargs)
    191         tabu_parent_nodes,
    192         tabu_child_nodes,
--> 193         **kwargs
    194     )
    195 
/usr/local/python3/lib/python3.6/site-packages/causalnex/structure/pytorch/notears.py in from_numpy(X, beta, w_threshold, max_iter, tabu_edges, tabu_parent_nodes, tabu_child_nodes, **kwargs)
    105     ]
    106 
--> 107     model = NotearsMLP(n_features=d, lasso_beta=beta, bounds=bnds, **kwargs)
    108 
    109     model.fit(X, max_iter=max_iter)
TypeError: __init__() got an unexpected keyword argument 'use_gpu'

@rbartelme

Is the GPU acceleration in the released codebase, or only on the develop branch?

@oentaryorj (Contributor)

The GPU acceleration has been implemented in this commit and will be made available in the next release.

@qbphilip qbphilip mentioned this issue Nov 10, 2021