FisherS return nan value when ID is pretty large #10

Closed
marsggbo opened this issue Dec 9, 2021 · 2 comments

marsggbo commented Dec 9, 2021

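import numpy as np
import skdim

def test_ID_estimate(D, name='TwoNN', *args, **kwargs):
  # Embed a D-dimensional hyperball in a 3*32*32-dimensional ambient space
  # and estimate its intrinsic dimension (ID) for increasing sample sizes N.
  ids = []
  Ns = [64, 128, 256, 512, 1024, 2048]
  for N in Ns:
    data = np.zeros((N, 3*32*32))
    data[:, :D] = skdim.datasets.hyperBall(n=N, d=D, radius=2, random_state=666)
    # Look the estimator class up by name instead of using eval
    _id = getattr(skdim.id, name)(*args, **kwargs).fit_transform(X=data)
    ids.append(_id)
    print(name, N, _id)
  return ids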

Results

test_ID_estimate(10, name='FisherS')
>>>
FisherS 64 11.258882790713601
FisherS 128 12.293955630520735
FisherS 256 10.300845319222297
FisherS 512 10.0622264687654
FisherS 1024 10.117248646587507
FisherS 2048 10.083245836860842
test_ID_estimate(20, name='FisherS')
>>>
FisherS 64 nan
FisherS 128 25.11715241495022
FisherS 256 19.984501791074795
FisherS 512 20.91391435506641
FisherS 1024 19.95265685270067
FisherS 2048 19.77165730041636
test_ID_estimate(50, name='FisherS')
>>>
FisherS 64 nan
FisherS 128 nan
FisherS 256 nan
FisherS 512 nan
FisherS 1024 nan
FisherS 2048 49.44956027057592
test_ID_estimate(100, name='FisherS')
>>>
FisherS 64 nan
FisherS 128 nan
FisherS 256 nan
FisherS 512 nan
FisherS 1024 nan
FisherS 2048 nan
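
The pattern in the output above can be made explicit with a small summary. This is only a sketch: the numbers are copied from the runs above, and `smallest_working_n` is a hypothetical helper name, not part of skdim.

```python
import math

nan = float("nan")
sample_sizes = [64, 128, 256, 512, 1024, 2048]

# FisherS estimates copied from the output above, keyed by the true dimension D
estimates = {
    10: [11.258882790713601, 12.293955630520735, 10.300845319222297,
         10.0622264687654, 10.117248646587507, 10.083245836860842],
    20: [nan, 25.11715241495022, 19.984501791074795,
         20.91391435506641, 19.95265685270067, 19.77165730041636],
    50: [nan, nan, nan, nan, nan, 49.44956027057592],
    100: [nan, nan, nan, nan, nan, nan],
}

def smallest_working_n(values):
    """Return the smallest sample size with a non-NaN estimate, or None."""
    for n, value in zip(sample_sizes, values):
        if not math.isnan(value):
            return n
    return None

summary = {d: smallest_working_n(v) for d, v in estimates.items()}
print(summary)  # {10: 64, 20: 128, 50: 2048, 100: None}
```

In these runs, the sample size needed to get a non-NaN estimate grows quickly with D, and D=100 never succeeds for N up to 2048.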

marsggbo (Author) commented Dec 9, 2021

Should I modify some arguments of FisherS?

j-bac (Collaborator) commented Dec 9, 2021

FisherS has theoretical restrictions on the maximum detectable ID, depending on dataset cardinality (see Figure 1-E in https://arxiv.org/pdf/2001.11739.pdf). ESS or DANCo should be more robust in such situations, and possibly TwoNN as well.
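
A rough way to compare these estimators on the same data is sketched below. It assumes scikit-dimension is installed and that each estimator's `fit_transform` returns a single global ID value, as it does for FisherS in the runs above. `default_estimators` and `compare_estimators` are hypothetical helper names, and the estimators are built with default arguments only.

```python
import numpy as np

def default_estimators():
    """Build the estimators named above (assumes scikit-dimension is installed)."""
    import skdim  # imported lazily so compare_estimators works without it
    return {
        "FisherS": skdim.id.FisherS(),
        "ESS": skdim.id.ESS(),
        "DANCo": skdim.id.DANCo(),
        "TwoNN": skdim.id.TwoNN(),
    }

def compare_estimators(data, estimators):
    """Return {name: estimated ID}, reporting NaN estimates as None."""
    results = {}
    for name, estimator in estimators.items():
        value = float(estimator.fit_transform(data))
        results[name] = None if np.isnan(value) else value
    return results
```

Usage would look like `compare_estimators(data, default_estimators())` with `data` built as in the test function from the original post. Some of these estimators may be slow on 3072-dimensional inputs.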

If you want to keep using FisherS in such a case, you can extend the alphas parameter to lower values. E.g. instead of the default
alphas = np.arange(0.6, 1, 0.02)[None] you can use alphas = np.arange(0.3, 1, 0.02)[None]
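
As a minimal sketch of that suggestion: the snippet below builds both alpha grids and passes the wider one to the estimator. It assumes `FisherS` accepts an `alphas` keyword, as implied above. `make_alphas` and `fishers_id` are hypothetical helper names, and whether 0.3 is low enough for a given D and N has not been checked here.

```python
import numpy as np

def make_alphas(lowest, highest=1.0, step=0.02):
    """Return a (1, k) row of alpha values, the shape used in the reply above."""
    return np.arange(lowest, highest, step)[None]

def fishers_id(data, lowest_alpha=0.3):
    """Estimate ID with FisherS over a wider alpha range (assumes scikit-dimension)."""
    import skdim  # imported lazily; only needed when this function is called
    return skdim.id.FisherS(alphas=make_alphas(lowest_alpha)).fit_transform(data)

default_alphas = make_alphas(0.6)  # range quoted as the default above
wider_alphas = make_alphas(0.3)    # suggested wider range
print(default_alphas.shape, wider_alphas.shape)
```

With the function from the original post, the same setting could be passed through its keyword arguments, e.g. `test_ID_estimate(50, name='FisherS', alphas=make_alphas(0.3))`.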

j-bac closed this as completed Dec 11, 2021