The performance gap between `KCI_UInd` and `KCI_CInd` under a similar setting #56

**The issue is based on the code in Pull request #55**

I ran into a strange performance gap between `KCI_UInd` and `KCI_CInd`. Intuitively, testing $X \perp Y$ and testing $X \perp Y \mid Z=1$ (where Z is a constant) should perform similarly, or the latter test (which uses `KCI_CInd`) should perform somewhat worse because it handles a more general case. However, when I ran the code, the results were not what I expected.
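To make the intuition concrete, here is a minimal NumPy sketch (not the causal-learn implementation). It assumes a Gaussian kernel and the residualization $R_Z = \epsilon (K_Z + \epsilon I)^{-1}$, $K_{X|Z} = R_Z K_X R_Z$ as I understand it from the KCI formulation. The helper names `gaussian_kernel`, `center`, and `residualize`, as well as the width and `eps` values, are made up for illustration. With a constant Z the centered kernel is zero, so $R_Z$ is the identity and conditioning should leave the kernel of X unchanged.

```python
import numpy as np


def gaussian_kernel(data, width=1.0):
    """Gaussian (RBF) kernel matrix for samples stored as rows (hypothetical helper)."""
    sq_norms = np.sum(data ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * data @ data.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * width ** 2))


def center(K):
    """Center a kernel matrix: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def residualize(Kx, Kz, eps=1e-3):
    """Assumed form of the conditioning step: R_z Kx R_z with R_z = eps * (Kz + eps I)^-1."""
    n = Kz.shape[0]
    Rz = eps * np.linalg.inv(Kz + eps * np.eye(n))
    return Rz @ Kx @ Rz


rng = np.random.default_rng(0)
n = 50
X = rng.random((n, 3))
Z = np.ones((n, 1))  # the constant conditioning variable

Kx = center(gaussian_kernel(X))
Kz = center(gaussian_kernel(Z))  # all-ones kernel, so centering removes everything
Kx_given_z = residualize(Kx, Kz)

print(np.allclose(Kz, 0.0))         # centered kernel of a constant vanishes
print(np.allclose(Kx_given_z, Kx))  # conditioning on a constant changes nothing
```

This only covers the idealized math. Differences in kernel width selection, data standardization, or the null-distribution approximation between the two tests are not modeled here and could still produce a gap.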

![image](https://user-images.githubusercontent.com/37894651/178270371-2edd7cb4-b148-4121-81b2-2819082fc61c.png)

I tested the code on a random collider dataset, in which $X \perp Z$ and $X \not\perp Y$. I also printed the `test statistics`, `mean`, and `var` for easier debugging. The results show similar p-values for $X \perp Z$ and $X \perp Y$, but different p-values for $X \perp Z \mid 1$ and $X \perp Y \mid 1$.
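As a quick sanity check of the dependence structure of the collider data, the sketch below regenerates it in vectorized form and compares per-coordinate Pearson correlations. It is only a rough check under the assumption that linear correlation is informative here. The helper `mean_coordinate_corr` and the seed are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
size, dim = 1000, 10

# Collider: X -> Y <- Z, same construction as the sample generator in this issue
X = rng.random((size, dim))
Z = rng.random((size, dim))
Y = rng.random((size, dim)) + X + Z


def mean_coordinate_corr(A, B):
    """Average Pearson correlation between matching columns of A and B."""
    corrs = [np.corrcoef(A[:, j], B[:, j])[0, 1] for j in range(A.shape[1])]
    return float(np.mean(corrs))


corr_xz = mean_coordinate_corr(X, Z)  # X and Z are generated independently
corr_xy = mean_coordinate_corr(X, Y)  # Y contains X, so this should be clearly positive

print(f"corr(X, Z) = {corr_xz:.3f}")
print(f"corr(X, Y) = {corr_xy:.3f}")
```

Since Y is the sum of three independent uniform terms, one of which is X, the population correlation between matching coordinates of X and Y is $1/\sqrt{3}$, while that between X and Z is 0.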

Here is my test code:

```python
from icecream import ic
from causallearn.utils.cit import CIT
from tqdm import trange
import numpy as np


def generate_single_sample(kind, dim):
    if kind == 'chain':
        # X -> Y -> Z
        X = np.random.random(dim)
        Y = np.random.random(dim) + X
        Z = np.random.random(dim) + Y
    elif kind == 'collider':
        # X -> Y <- Z
        X = np.random.random(dim)
        Z = np.random.random(dim)
        Y = np.random.random(dim) + X + Z
    else:
        raise ValueError(f"unknown sample type: {kind}")
    #Y = np.zeros(dim)+np.average(Y)
    # Layout for dim = 10 (31 columns): X: 0..9, Y: 10..19, Z: 20..29, constant 1: 30
    return list(X) + list(Y) + list(Z) + [1]

def generate_dataset(dim, size):
    dataset = []
    for i in range(size):
        datapoint = generate_single_sample('collider', dim)
        dataset.append(datapoint)
    dataset = np.array(dataset)
    return dataset


if __name__ == '__main__':
    dataset = generate_dataset(10, 1000)
    cit_tester = CIT(dataset, method = 'kci')
    #ic(cit_tester.kci(0, 20, []))
    # The original version cannot pass this because feature 30 has the same value in every sample
    #ic(cit_tester.kci(0, 20, [30]))
    # The following comes from one of my recent requirements: using CIT to test high-dimensional variables.
    # Testing high-dimensional variables is not supported by the current CIT class, contrary to the documentation,
    # so I also implemented this in the last commit.
    # I will open a separate issue about "CIT with high-dimensional variables" later.
    ic(cit_tester.kci(range(10), range(20,30), range(10,20)))
    ic(cit_tester.kci(range(10), range(20,30), []))
    ic(cit_tester.kci(range(10), range(10,20), []))
    ic(cit_tester.kci(range(10), range(20,30), [30]))
    ic(cit_tester.kci(range(10), range(10,20), [30]))
```


