
Added distribution for dot products / intercepts #768

Merged (1 commit) Dec 17, 2015
Conversation

arvoelke (Contributor) commented Jul 3, 2015:

Subclassed SubvectorLength to get the distribution that I use for my adaptive cleanup (heteroassociative memory). This is important for setting the intercepts of neurons so that they fire with some chosen probability (see docstring for details).

Also added the inverse cdf to SqrtBeta, and some optional unit tests for the SqrtBeta distribution.
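The intercept-setting idea can be sketched numerically. Assuming the cosine similarity between a fixed unit encoder and a uniformly random d-dimensional unit vector (whose square follows Beta(1/2, (d-1)/2)), an intercept giving a chosen firing probability comes from SciPy's `betaincinv`; the helper name below is illustrative, not the PR's API:

```python
import numpy as np
from scipy.special import betaincinv

def cosine_intercept(d, p):
    """Intercept c with P(dot(e, x) > c) = p for a unit encoder e and a
    uniformly random d-dimensional unit vector x (illustrative helper).

    The squared cosine similarity follows Beta(1/2, (d-1)/2), so the
    signed inverse CDF is a mirrored square root of betaincinv.
    """
    y = 1.0 - p                # c = ppf(1 - p)
    sign = np.sign(y - 0.5)    # the distribution is symmetric about 0
    z = betaincinv(0.5, (d - 1) / 2.0, abs(2.0 * y - 1.0))
    return sign * np.sqrt(z)
```

A neuron whose intercept is set this way fires for roughly a fraction `p` of unit-length inputs, which is the heteroassociative-memory use case described above.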

arvoelke (Contributor, Author) commented Jul 4, 2015:

Also, I think the invcdf I added for SqrtBeta would be useful for setting the radius in an EnsembleArray in order to cover some proportion of inputs (as suggested in @jgosmann's paper). Maybe this should be documented more clearly?
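That radius suggestion can be sketched the same way. Assuming the length of a q-dimensional subvector of a random d-dimensional unit vector (whose square follows Beta(q/2, (d-q)/2)), a radius covering a chosen proportion of inputs is the inverse CDF at that proportion; the function name is illustrative:

```python
import numpy as np
from scipy.special import betaincinv

def subvector_radius(d, q, proportion=0.99):
    # The squared length of a q-dim subvector of a random d-dim unit
    # vector follows Beta(q/2, (d-q)/2); the radius covering
    # `proportion` of inputs is the inverse CDF (the SqrtBeta ppf)
    # evaluated at that proportion.
    return np.sqrt(betaincinv(q / 2.0, (d - q) / 2.0, proportion))
```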

jgosmann (Collaborator) commented Jul 5, 2015:

Could definitely be useful (but I don't need it for the paper/spaopt).

Seanny123 (Contributor) commented:

@arvoelke why is this marked as "work in progress"? What else is left to be done to this pull request before it can be properly reviewed?

arvoelke (Contributor, Author) commented Dec 7, 2015:

I think I did that in case @jgosmann had suggestions based on his research, and because of the two TODOs in the original post (better naming? changelog?). These can be subsumed by the review, so yes, it's ready. 😄

ndarray
Evaluation points `x` in [0, 1] such that `P(X <= x) = y`.
"""
from scipy.special import betaincinv
A reviewer (Contributor) commented on the diff above:

Usually if we have a SciPy dependency, we throw a special error if it's not found. Should this be done here as well?

arvoelke (Author) replied:

This is how it was done in a couple other places in this file.
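The guarded-import pattern the reviewer refers to can be sketched like this (a sketch only; the exact error type and message used elsewhere in the file may differ):

```python
import numpy as np

def sqrt_beta_ppf(y, n, m=1):
    # Inverse CDF of a SqrtBeta(n, m) distribution.  SciPy is imported
    # lazily so the failure is a clear, actionable message rather than
    # a bare ImportError deep inside the call stack.
    try:
        from scipy.special import betaincinv
    except ImportError:
        raise ImportError("sqrt_beta_ppf requires the 'scipy' package")
    # X**2 ~ Beta(m/2, n/2), so the ppf is the square root of the
    # inverse regularized incomplete beta function.
    return np.sqrt(betaincinv(m / 2.0, n / 2.0, y))
```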

arvoelke (Contributor, Author) commented:

Renamed invcdf to ppf. I think this should be ready if there are no complaints.

ppf = dist.ppf(cdf)

# The pdf should reflect the samples
np.random.seed(seed)
A collaborator commented on the diff above:
You shouldn't have to do this. Make a numpy.random.RandomState and pass it where you need it.

The same collaborator added:
Actually, you can just get one with the rng fixture. And same thing in the tests above and below.

arvoelke (Author) replied:
Done
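The reviewer's suggestion, passing an explicit `numpy.random.RandomState` (or pytest's `rng` fixture) instead of reseeding the global generator, looks roughly like this (names are illustrative):

```python
import numpy as np

def sample_via_ppf(ppf, n, rng):
    # Inverse-transform sampling with an explicit RandomState; nothing
    # touches np.random's global state, so tests stay reproducible and
    # independent of each other.
    return ppf(rng.uniform(low=0.0, high=1.0, size=n))

rng = np.random.RandomState(42)
samples = sample_via_ppf(np.sqrt, 5, rng)  # np.sqrt stands in for a real ppf
```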

hunse (Collaborator) commented Dec 11, 2015:

Made a few comments. Go ahead and add the changelog entry, too (that's the submitter's job).

arvoelke (Contributor, Author) commented:
Added changelog and addressed above comments (thanks)!

arvoelke (Contributor, Author) commented:
Bumping since this is needed by @mundya and @neworderofjamie this week.

def __init__(self, dimensions):
super(CosineSimilarity, self).__init__(dimensions)

def sample(self, num, rng=np.random):
A collaborator commented on the diff above:
sample doesn't have the same signature as we typically use for Distribution.sample. We might as well take the extra d parameter like SubvectorLength does, if it's not too much trouble (and I don't think it should be).

arvoelke (Author) replied:
Good point. Fixed this now, added a test, and removed some repeat logic in the process.
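The agreed-upon signature can be sketched as a small Distribution-style class (a sketch, not the merged code; the class name and method bodies are illustrative):

```python
import numpy as np
from scipy.special import betaincinv

class CosineSimilaritySketch:
    """Distribution of cosine similarity between random unit vectors,
    with the usual Distribution.sample(n, d=None, rng=...) signature."""

    def __init__(self, dimensions):
        self.dimensions = dimensions

    def ppf(self, y):
        # Signed inverse CDF: mirror the SqrtBeta ppf about zero, since
        # the cosine similarity is symmetric around 0.
        sign = np.sign(y - 0.5)
        z = betaincinv(0.5, (self.dimensions - 1) / 2.0,
                       np.abs(2.0 * y - 1.0))
        return sign * np.sqrt(z)

    def sample(self, n, d=None, rng=np.random):
        # Inverse-transform sampling; the optional `d` selects an
        # (n, d) sample shape, matching the other Distributions.
        shape = (n,) if d is None else (n, d)
        return self.ppf(rng.uniform(size=shape))
```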

def __init__(self, dimensions, subdimensions=1):
super(SubvectorLength, self).__init__(
dimensions - subdimensions, subdimensions)


class CosineSimilarity(SubvectorLength):
"""Distribution of dot products between random unit vectors.
A collaborator commented on the docstring above:
"Distribution of cosine similarity between two random vectors."
or maybe
"Distribution of the cosine of the angle between two random vectors."

then below: "The cosine similarity is given by the cosine of the angle between two vectors, which is equal to the norm of the dot product of the vectors, divided by the norms of the individual vectors".

arvoelke (Author) replied:
Done. But "norm of the dot product" -> just "dot product" (note the result is signed, unlike SubVectorLength).
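The corrected wording corresponds to the standard formula, the dot product divided by the product of the norms; a minimal sketch:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between a and b: the (signed) dot product
    # divided by the product of the norms.  The result lies in [-1, 1],
    # unlike a subvector length, which is nonnegative.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
```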

arvoelke (Contributor, Author) commented:
Improved documentation to clarify the connection between this distribution and the cosine angle.

It is the distribution of the cosine of the angle between two
random vectors, and can be useful to calculate intercepts such
that a particular neuron has a given probability `p` of firing
in response to a unit length input.
@hunse hunse merged commit 4a6917a into master Dec 17, 2015
@hunse hunse deleted the dotproductdist branch December 17, 2015 16:56
@arvoelke arvoelke mentioned this pull request Dec 20, 2015