Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

much improved NN graph #43

Open
wants to merge 92 commits into
base: master
Choose a base branch
from
Open

much improved NN graph #43

wants to merge 92 commits into from

Conversation

mdeff
Copy link
Collaborator

@mdeff mdeff commented Feb 25, 2019

Contains PR #21 with some improvements.

Major fixes:

  • correct symmetrization of distance matrix (695272b)
  • squared width in gaussian kernel (f544e1e)
  • FLANN returns distance^p for a p-norm (8cc3539)

Major improvements:

  • much better and complete documentation
  • better and more complete tests
  • feature standardization (bf7427f)
  • radius estimation (26e12e3)
  • user choice of similarity kernel (9c8e86e)

Ohers:

  • width = radius / 2 for radius graphs (011dc81)

naspert and others added 30 commits March 19, 2018 10:57
- do not use underscores in functions args
# Conflicts:
#	pygsp/graphs/nngraphs/nngraph.py
#	pygsp/tests/test_graphs.py
@nperraud
Copy link
Collaborator

I have just added the function I coded to this branch to get the nearest neighbor search out of the Knn graph. The Knn graph function still needs to be modified.

@mdeff
Copy link
Collaborator Author

mdeff commented Jul 23, 2019

Thanks! Can you fix the tests?

@nperraud
Copy link
Collaborator

I tried, but failed. They work on my machine but the seed is different. I have to think about how to do it... Maybe you can help me. Let us discuss later...

@nperraud
Copy link
Collaborator

nperraud commented Jul 26, 2019

Tests are passing... The coverage is reduced (probably less raise Error check...) but I think it will increase again once the functions in _nearest_neighboors are cleaned. I wil do that before my vacation...

Currently the code is not very clean...

@nperraud
Copy link
Collaborator

Ok Should be ready for the merge...

@nperraud
Copy link
Collaborator

@mdeff Everything is done. For me, it can be merged to master.

@gcuendet
Copy link

Very nice to see more efficient knn search implementations for knn graph building!
Is there anything still missing before merging the PR? Do you have a rough idea of when this will be merged?
Thanks for all the hard work on this very useful python package!

@mdeff
Copy link
Collaborator Author

mdeff commented Oct 17, 2019

Thanks for the kind words @gcuendet. I mostly need to address review comments. I would also like to merge knn and radius graphs. I'll probably get to this in the not too far future as two other PRs (#55 and #57) depend on it. Then I'll make a new release.

if kind == 'knn':
params['k'] = k + 1
elif kind == 'radius':
params['k'] = features.shape[0] # number of vertices

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a problem here. This does not scale.
According to the doc of scipy.spatial.cKDTree.query

If x has shape tuple+(self.m,), then d has shape tuple+(k,).

Which means tree.query() is going to return a huge matrix of NxN.

This is actually different from scipy.spatial.KDTree.query which, according to the doc, returns :

If k is None, then d is an object array of shape tuple, containing lists of distances.

Which, if I understand correctly, means that tree.query() is going to return an array of size N containing lists with varying length, depending on the number of neighbours in the epsilon ball.

Anyway, I am testing this locally with the fountain dataset that you made available here and it seems that constructing a radius NN graph using this branch of pygsp crashes

G = graphs.NNGraph(points, kind='radius', radius=0.01, 
                   standardize=False, kernel_width=0.1)

whereas using the pip version of pygsp does not.

G = graphs.NNGraph(points, NNtype='radius', epsilon=0.01, 
                   rescale=False, sigma=0.1)

On my initial experiments, I also used scipy.spatial.cKDTree.query_ball_point() which only returns the indices, so distances need to be computed seperately. That might be an alternative maybe?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See PR #60 about the use of scipy.spatial.cKDTree.query_ball_point()

@mdeff
Copy link
Collaborator Author

mdeff commented Dec 18, 2020

@nperraud
Copy link
Collaborator

nperraud commented Dec 19, 2020

I suggest the following:

  • NearestNeighbors instead of NNGraph
  • I find all possibilities bad for the variable kernel_width because it is linked to the gaussian function. So I am not against scale, but I am not pushing for it...
  • Do yo want to do learn graph is the same PR?

@mdeff
Copy link
Collaborator Author

mdeff commented Dec 19, 2020

  • Let's go with NearestNeighbors then. :)
  • Agreed. Some synonyms for ideas: spread, extent, size, width, span, stretch. Maybe we should think of it as a step before the kernel, as in the end we do similarity = kernel(distance / scale). So it's acting on the distances. What about something that "adjusts" the metric? Like expansion, contraction, inflation, amplification, growth, dilation. Contraction of space/distances = expansion of the kernel.
  • learned-graph stays in its own PR, as does sphere-graphs.

@nperraud
Copy link
Collaborator

Seeing all your proposition, I believe scale is the best option. It is rescaling or renormalisation, but I would prefer not to use something like norm because it is confusing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants