Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Samples distribution #1233

Merged
merged 1 commit into from
Mar 29, 2017
Merged

Add Samples distribution #1233

merged 1 commit into from
Mar 29, 2017

Conversation

tbekolay
Copy link
Member

@tbekolay tbekolay commented Dec 12, 2016

Motivation and context:
Look, people want to pass in NumPy arrays to EnsembleArray. They just do! See #691 and and #766 for proof, it's a thing. In #766 @arvoelke suggested a "no-op" distribution, which no one at the dev meeting disliked, so here it is.

Closes #691 and #766.

How has this been tested?
New EnsembleArray test included 👍

How long should this take to review?

  • Quick (less than 40 lines changed or changes are straightforward)

Types of changes:

  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have read the CONTRIBUTING.rst document.
  • I have updated the documentation accordingly.
  • I have included a changelog entry.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Still to do:

  • Discussion point: right now I make a copy of the array passed into Samples. Should we?
  • Add to docs
  • Add changelog entry
  • Are there other places Samples would simplify code?

Copy link
Contributor

@arvoelke arvoelke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet. Looks good! I realized half way that this was only meant for ndarray, but my feedback is working under the assumption that we could make this to handle any iterable / array_like parameter?

shape = (n,) if d is None else (n, d)

if d is None:
samples = samples.squeeze()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if d is not None and does not match the second dimension? Is this handled somewhere downstream?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh whoops, totally forgot to check that ;) Originally the check below checked the whole shape, but then I changed it and forgot to add in the check for the second dimension. Will update tomorrow.

nengo/dists.py Outdated
self.samples = samples

def __repr__(self):
return "Samples(samples=%r)" % self.samples
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... % (self.samples,), otherwise if self.samples is a tuple, then you will get a string formatting TypeError.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, samples is always a numpy array here so we should be good. I considered the defensive tuple though, so I can slip it in there for kicks.


def __init__(self, samples):
super(Samples, self).__init__()
self.samples = samples
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cast to ndarray here instead of in sample? This will then work if given a generator, and will also resolve the comment in __repr__ indirectly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done automatically since samples is an NdarrayParam.

@@ -89,6 +89,10 @@ def __init__(self, n_neurons, n_ensembles, ens_dimensions=1,

super(EnsembleArray, self).__init__(label, seed, add_to_container)

for param in ens_kwargs:
if isinstance(ens_kwargs[param], np.ndarray):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be generalized to handle any iterable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right yeah, I'll use nengo.utils.compat.is_array_like.

assert np.allclose(built.max_rates, max_rates)
assert np.allclose(built.intercepts, intercepts)
assert np.allclose(built.eval_points, eval_points)
assert built.eval_points.shape == eval_points.shape
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include test for mismatched shape?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do! 👍

@arvoelke
Copy link
Contributor

arvoelke commented Dec 12, 2016

For your discussion point,

right now I make a copy of the array passed into Samples

I also think a copy is fine. Better safe than sorry (risk/reward is too high).

@@ -89,6 +89,10 @@ def __init__(self, n_neurons, n_ensembles, ens_dimensions=1,

super(EnsembleArray, self).__init__(label, seed, add_to_container)

for param in ens_kwargs:
if isinstance(ens_kwargs[param], np.ndarray):
ens_kwargs[param] = nengo.dists.Samples(ens_kwargs[param])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a little bit inconsistent and magical to do this only for ensemble arrays.

I would prefer that any array (or equivalent) assigned to a DistOrArrayParam will be automatically cast in the Samples distribution. PR #1207 adds some functionality to make such casts require less boilerplate, particularly the commit 7af0856.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so like we talked about in the dev meeting, the issue there is that then we lose the model construction-time validation of shapes. This isn't a simple no-op, it means that if people are accessing ensemble.encoders they will never get a NumPy array, which is almost always what they want if they're accessing it to do some math or some such. I see the benefit of this distribution for fixing this specific problem, but I think it's premature to switch a lot of core functionality over to it because of issues like this. Not saying we won't, but I'd prefer it in a separate PR. I'm happy to switch over other specific instances in which nothing changes to the user, but yeah, mostly I wanted to get this out quickly and haven't looked around for other places to do this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if people are accessing ensemble.encoders

How common is that? I think I never set encoders to something different than a distribution.

I see the benefit of this distribution for fixing this specific problem

How feel you about leaving this block out of the PR? It requires a little bit more boilerplate on the user side, but also makes explicit what's happening. Also with this change ensemble_array.ensembles[i].encoders will never return a NumPy array. So if we worry about that in general, why not in the case of ensemble arrays?

@tbekolay
Copy link
Member Author

OK, pushed a fixup addressing @arvoelke's comments. I opted for testing mismatched shapes on the distribution itself, so added tests to tests_dists.py for that.

Also, for future PRs, could you read over our reviewing guidelines @arvoelke? We're trying to move away from reviews of the form "please change this" and instead having the reviewer make the change themselves. We're also not using Github's reviewing interface and opting for labels instead, though we might move to using that in the new year.

Re @jgosmann's comments, setting encoders to numbers is why Xuan raised #691 so people do it. As for not including the block in EnsembleArray, if we didn't include it then we couldn't close #691 or #766, so I'm not a fan of that.

One possible way around it would be for the DistributionParam to return a NumPy array when accessing the value of the distribution is a Samples instance, but that's definitely pretty magical so I'd want to sit on that for a while. Setting parameters is far more common than reading them, so while in the ideal case what you set is always what you get back, that's already not quite the case in all situations. It's also not at very common to access the individual ensembles in an EnsembleArray, so given all these factors, this half-measure seems okay for now and not worth blocking this PR which fixes two open issues. Happy to hear other perspectives though.

@jgosmann
Copy link
Collaborator

setting encoders to numbers is why Xuan raised #691 so people do it

This was not about setting, but accessing encoders. So how common is it for people to set the encoders to numbers and access them afterwards to do math on them? (And even if, one could easily keep a variable with those encoders around or query them from Samples.)

Why was it that we're updating the config and do not pass **ens_kwargs to the Ensemble constructor? This seems to me to have exactly the same effect except that NumPy arrays instead of distributions would work out of the box.

@tbekolay
Copy link
Member Author

This was not about setting, but accessing encoders.

Sorry, I was replying to the comment "I think I never set encoders to something different than a distribution."

Why was it that we're updating the config and do not pass **ens_kwargs to the Ensemble constructor?

To save a few lines of code, and so that if you add more ensembles to the ensemble array after it's created those parameters still apply. I brought this up as a possible resolution to these issues when we discussed it in the dev meeting but no one had an opinion on it, but yeah, we could do that instead.

Anyone else have thoughts?

@jgosmann
Copy link
Collaborator

To save a few lines of code,

Doesn't this add a line of code?

so that if you add more ensembles to the ensemble array after it's created those parameters still apply.

To me this seems to be a pretty weird use case ... if no one is actually doing this or has any other arguments, I would favour passing **ens_kwargs to the Ensemble. Even though, the Samples distribution can still be useful in other situations.

@jgosmann
Copy link
Collaborator

Added a small commit, apart from the discussion about the usage in EnsembleArray this LGTM.

Copy link
Contributor

@Seanny123 Seanny123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've addressed the concerns of @arvoelke and the code looks good, so I personally believe that this is ready to merge.

@tbekolay
Copy link
Member Author

Pushed a commit for one comment I hadn't addressed yet. @arvoelke and/or @jgosmann, could you take another look at this?

@jgosmann
Copy link
Collaborator

The commit looks fine to me.

@tbekolay
Copy link
Member Author

Sorry should have been more explicit... could you take a look at the whole PR, and give it a review approval if it looks good?

Copy link
Collaborator

@jgosmann jgosmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't given this a close look, but it looked good to me before. I would still prefer to pass ens_kwargs directly to the ensembles instead of modifying the config, but I won't block this PR because of it.

This is essentially a no-op distribution that takes a set of
samples and provides them when the `samples` method is called.
It's intended to simplify situations in which a distribution or
samples could be provided by converting samples to a distribution.

The main place this happens is when passing `Ensemble` arguments
to the `EnsembleArray` constructor. Previously, a distribution had
to be provided, since the passed parameters are set as defaults
on the `EnsembleArray` network. Now, any samples passed in are
wrapped in the `Samples` distribution, which makes it possible to
pass in samples. Tests have been added to verify this.

Addresses #691 and #766.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants