Skip to content
This repository has been archived by the owner on Sep 1, 2023. It is now read-only.

Update KNNClassifier docstrings #3535

Merged
merged 9 commits into from
Apr 12, 2017
Merged

Conversation

rhyolight
Copy link
Member

@rhyolight rhyolight commented Apr 11, 2017

Fixes #3534

TODO: Somebody help me write this.

:return:
"""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@numenta/nupic-reviewers I need help writing this. You can simply respond in a comment with a function description and I'll do the rest.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def finishLearning(self):
    """Used for batch scenarios.  This method needs to be called between learning and inference."""

:param numSVDSamples:
:param finalize:
:return:
"""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@numenta/nupic-reviewers I need help writing this. You can simply respond in a comment with a function description and I'll do the rest.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


def computeSVD(self, numSVDSamples=None, finalize=True):
    """
    Compute the singular value decomposition (SVD). The SVD is a factorization of a real or complex matrix. It factors the matrix `a` as `u * np.diag(s) * v`, where `u` and `v` are unitary and `s` is a 1-d array of `a`'s singular values.

    Reason for computing the SVD: 
    There are cases where you want to feed a lot of vectors to the KNNClassifier. However, this can be slow. You can speed up training by (1) computing the SVD of the input patterns which will give you the eigenvectors, (2) only keeping a fraction of the eigenvectors, and (3) projecting the input patterns onto the remaining eigenvectors. 

    Note that all input patterns are projected onto the eigenvectors in the same fashion. Keeping only the highest eigenvectors increases training performance since it reduces the dimensionality of the input. 

    :param numSVDSamples: (int) the number of samples to use for the SVD computation.
    :param finalize: (bool) whether to apply SVD to the input patterns.
    :return s: (array) The singular values for every matrix, sorted in descending order.
    """

:param singularValues:
:param fractionOfMax:
:return:
"""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@numenta/nupic-reviewers I need help writing this. You can simply respond in a comment with a function description and I'll do the rest.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def getAdaptiveSVDDims(self, singularValues, fractionOfMax=0.001):
    """Compute the number of eigenvectors (singularValues) to keep."""

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also provide short descriptions of the params and return value?


:param numSVDDims:
:return:
"""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@numenta/nupic-reviewers I need help writing this. You can simply respond in a comment with a function description and I'll do the rest.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this should be a private method _finalizeSVD.

  def finalizeSVD(self, numSVDDims=None):
    """
    Called by finalizeLearning(). This will project all the patterns onto the SVD eigenvectors.
    :param numSVDDims: (int) number of egeinvectors used for projection.
    """

@marionleborgne
Copy link
Member

@rhyolight I have experience using the KNNClassifier, but I am not super familiar with its internals. After chatting with @subutai, I put together a few docstring comments in reply to your questions. Let me know if this works for you.


:returns: a list of numPatterns elements where the i'th position contains
the integer partition Id associated with pattern i. If pattern i had no
partition Id, it's value will be numpy.inf
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marionleborgne Does this method make sense to you? It says it accepts i and returns one id, but it actually returns all of them. I'll see if it is actually used anywhere, but if not I'll probably remove it. It's false advertising.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only place I see this method used is within a unit test:

https://github.com/numenta/nupic/blob/3680b5589fc1a84c3eb1fd6f9b8e7de1104cc31c/tests/unit/nupic/algorithms/knn_classifier_test.py#L355-L358

I recommend we remove the function, unless @subutai has a reason to leave it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like a useful accessor unless there is some other way to get _partitionIdList. Just because it isn't used in NuPIC doesn't mean it isn't useful for clients.

Copy link
Member Author

@rhyolight rhyolight Apr 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scottpurdy Then I'll rename it to getPartitionIds and rewrite the :returns: docstring more correctly. It implies a singular return value given a pattern.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we used it recently in research

@rhyolight
Copy link
Member Author

I have test failures to fix.

@rhyolight
Copy link
Member Author

@marionleborgne Thanks for the help. I have a couple comments above. Still working on fixing some broken tests.

@rhyolight
Copy link
Member Author

@marionleborgne @scottpurdy This is again ready for review.

@@ -843,33 +867,33 @@ def getPartitionId(self, i):
return partitionId


def getPartitionIdPerPattern(self):
def getPartitionIds(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it is essentially just a getter, it might be nice to match the attribute name (getPartitionIdList) but up to you.

@rhyolight rhyolight merged commit b981848 into numenta:master Apr 12, 2017
@rhyolight rhyolight deleted the docstrings branch April 12, 2017 23:06
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants