Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update PNL #85

Merged
merged 3 commits into from
Nov 5, 2022
Merged

Update PNL #85

merged 3 commits into from
Nov 5, 2022

Conversation

ErdunGAO
Copy link
Contributor

@ErdunGAO ErdunGAO commented Nov 4, 2022

Updates

  • Change the gradient descent training method to stochastic gradient descent (For acceleration).
  • Delete the dele_abnormal function, which is useless since our new implementation is robust.
  • Add the testing results of our method on the real data and an additional simulation case simulation_dataset_3.

Description

  • Files in data folder TestData/
    • Reduce the number of samples in the two previous simulation datasets from $1000$ to $200$, to make the test faster. (Actually, $200$ is enough for finding the causal direction of two variables.)
    • Add one more simulation dataset.
  • TestPNL.py file
    • Increase the p-value threshold from $0.1$ to $0.5$.
    • Update the stored results since the previous datasets are changed.
    • Add the test functions for the third simulation dataset and the real data.
  • PNL.py file
    • Add a PairDataset(Dataset) function (for batch training).
    • Reduce the total epochs from $100000$ to $3000$.
    • Codes in lines $96-107$ are cleaned. Actually, the code logic is not changed.
    • Change the learning rate from $1e-5$ to $1e-4$.
    • Delete the loss recording variables named loss_all, loss_pdf_all, and loss_jacob_all.
    • Variables $y1$, $y2$ are deleted. $y1$ is useless and $y2$ is actually the estimated noise. We use $e$ to replace $y2$ and $e_estimated$ to name the estimated noise.
    • zero_grad() of G1 and G2 is replaced by optimizer.zero_grad().
    • The dele_abnormal function is deleted.

Notes

  • Please kindly note that the p_value_threshold is set to $0.5$ in TestPNL.py. This is because the p_value_backward (0.394) on real data is high, which may caused by the model misspecification on real data.
  • Please try to normalize your testing data if it was collected from the real world.

Test plan
python -m unittest tests.TestPNL # should pass

image


# Set the threshold for independence test
p_value_threshold = 0.1 # useless now but left
p_value_threshold = 0.5 # useless now but left
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the only concern I have.

Do we have enough evidence to claim something if we compare with threshold 0.5? This basically means you can have false positive error as large as 0.5?

@tofuwen
Copy link
Contributor

tofuwen commented Nov 5, 2022

cc @kunwuz to merge

@kunwuz kunwuz merged commit e717ad1 into py-why:main Nov 5, 2022
@tofuwen tofuwen mentioned this pull request Dec 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants