Gains grow without bound as the neuron count is increased #1534
Comments
Any idea whether these effects would be exaggerated for learning networks? I've often found that learning models (even learning a communication channel) have a hard time accurately learning extreme values like this, so I'm curious whether this is the root cause.
I'm definitely liking #3 as the way to go... especially since it's such a small change to the code.
In my thesis I also observed that there is an increased distortion error close to the boundary of the radius. I attributed it to the effect that uniformly distributed evaluation points do not fully cover the unit hypersphere (because they only generate a "hyper-polygon"). But the problem described here probably contributes too; I'd be curious what part of the error can be attributed to each of these explanations. Also, this taken together with the discussion in #1243 and #1248 leads me to a rather radical proposal of a fourth possible solution (or might this be equivalent to proposal 2? that one seems to be underspecified on how it should be done): one should not specify the distributions of intercepts and max rates, but rather specify the distribution of gains directly.
Of course there are certain disadvantages:
I'd be curious how specifying a gain distribution would actually affect the error.
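For what it's worth, `nengo.Ensemble` already accepts `gain` and `bias` directly (in place of `max_rates`/`intercepts`), so a crude form of this proposal can be tried today; the particular distributions below are arbitrary placeholders, not a recommendation:

```python
import nengo

# Specify the gain/bias distributions directly instead of max_rates/intercepts.
# These particular distributions are arbitrary illustrations.
with nengo.Network() as model:
    ens = nengo.Ensemble(
        100, 1,
        gain=nengo.dists.Uniform(1, 10),
        bias=nengo.dists.Uniform(-10, 10),
    )
```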
I hadn't considered this, but definitely possible. Might be a good idea to codify all of these different observations into a series of benchmarks that we can use to assess different options.
I think a neuron parameter also makes sense, as you did in the code you shared with me:

```python
import numpy as np
import nengo
from nengo.exceptions import ValidationError


class LIFRateSafe(nengo.LIFRate):
    x_max = nengo.params.NumberParam('x_max')

    def __init__(self, x_max=1.1, tau_rc=0.02, tau_ref=0.002, amplitude=1):
        super().__init__(tau_rc=tau_rc, tau_ref=tau_ref, amplitude=amplitude)
        self.x_max = x_max

    def gain_bias(self, max_rates, intercepts):
        """Analytically determine gain, bias."""
        max_rates = np.array(max_rates, dtype=float, copy=False, ndmin=1)
        intercepts = np.array(intercepts, dtype=float, copy=False, ndmin=1)
        inv_tau_ref = 1. / self.tau_ref if self.tau_ref > 0 else np.inf
        if np.any(max_rates > inv_tau_ref):
            raise ValidationError("Max rates must be below the inverse "
                                  "refractory period (%0.3f)" % inv_tau_ref,
                                  attr='max_rates', obj=self)
        x = 1.0 / (1 - np.exp(
            (self.tau_ref - (1.0 / max_rates)) / self.tau_rc))
        # Anchoring the max-rate point at x_max > 1 keeps the gain bounded
        # as intercepts -> 1.
        gain = (1 - x) / (intercepts - self.x_max)
        bias = 1 - gain * intercepts
        return gain, bias
```

One way of going about option 2 that is equally simple is to replace the intercept in the gain calculation with a fixed anchor point:

```python
class LIFRateUnbiased(nengo.LIFRate):
    anchor = nengo.params.NumberParam('anchor')

    def __init__(self, anchor=-1, tau_rc=0.02, tau_ref=0.002, amplitude=1):
        super().__init__(tau_rc=tau_rc, tau_ref=tau_ref, amplitude=amplitude)
        self.anchor = anchor

    def gain_bias(self, max_rates, intercepts):
        """Analytically determine gain, bias."""
        max_rates = np.array(max_rates, dtype=float, copy=False, ndmin=1)
        intercepts = np.array(intercepts, dtype=float, copy=False, ndmin=1)
        inv_tau_ref = 1. / self.tau_ref if self.tau_ref > 0 else np.inf
        if np.any(max_rates > inv_tau_ref):
            raise ValidationError("Max rates must be below the inverse "
                                  "refractory period (%0.3f)" % inv_tau_ref,
                                  attr='max_rates', obj=self)
        x = 1.0 / (1 - np.exp(
            (self.tau_ref - (1.0 / max_rates)) / self.tau_rc))
        # The gain depends only on max_rates (through x), not on the intercept.
        gain = (1 - x) / (self.anchor - 1)
        bias = 1 - gain * intercepts
        return gain, bias
```

This is what the distribution of gains looks like in each case. Note that
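A minimal sketch for regenerating this kind of gain-versus-intercept comparison (assuming the `LIFRateSafe` and `LIFRateUnbiased` classes above are in scope; ensemble size and seed are arbitrary):

```python
import matplotlib.pyplot as plt
import nengo
import numpy as np

# Build one ensemble per neuron type and scatter gain against intercept.
neuron_types = {
    "LIFRate": nengo.LIFRate(),
    "LIFRateSafe": LIFRateSafe(),
    "LIFRateUnbiased": LIFRateUnbiased(),
}

fig, axes = plt.subplots(1, len(neuron_types), figsize=(12, 4), sharey=True)
for ax, (name, neuron_type) in zip(axes, neuron_types.items()):
    with nengo.Network(seed=0) as model:
        ens = nengo.Ensemble(1000, 1, neuron_type=neuron_type)
    with nengo.Simulator(model, progress_bar=None) as sim:
        pass
    ax.scatter(sim.data[ens].intercepts, sim.data[ens].gain, s=2)
    ax.set_title(name)
    ax.set_xlabel("Intercept")
    ax.set_yscale("log")
axes[0].set_ylabel("Gain")
fig.show()
```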
Right on point. I've been finding that this interacts with
Surprisingly, I've been finding that you need to be entirely in the Variant column in order to see an improvement. Any combination of parameters that has one or more Default choices still suffers in some way. I'm also seeing that
I'm thinking that the
Yay! It's encouraging that we were both thinking along the same lines. I gave a concrete instance of proposal 2 with `LIFRateUnbiased` above. Does my proposal help with the disadvantages you mentioned? This redefines the meaning of `max_rates`.
For what it's worth, here's the above (figure omitted). Likewise, here's a zoom-in of the graph from the previous post (figure omitted).
Interesting, this is sort of specifying the distribution of gains (as in my proposal), but via the
The 0.1 value is for spiking LIFs, and I was already aware that 0.01 works better for LIFRate, presumably due to the missing spiking noise. So I'm not yet fully convinced that we were over-regularizing because of the gains, but that might be an additional effect. By the way: maybe you'll find the benchmarking and plotting code in this notebook useful. It can give you nice plots of the error (separated into noise and distortion, though the former isn't that relevant*) across the representational space with CIs.
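If I'm reading this right, the 0.1 / 0.01 values are the least-squares regularization, which is set per connection via the solver (a sketch; I'm assuming `LstsqL2` here, whose default `reg` is 0.1):

```python
import nengo

# Assuming the values above are LstsqL2's `reg` parameter:
# 0.1 (the default) for spiking LIF, 0.01 reportedly better for LIFRate.
with nengo.Network() as model:
    a = nengo.Ensemble(100, 1, neuron_type=nengo.LIFRate())
    b = nengo.Ensemble(100, 1, neuron_type=nengo.LIFRate())
    nengo.Connection(a, b, solver=nengo.solvers.LstsqL2(reg=0.01))
```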
I accidentally stumbled into this issue again in the context of solvers with `weights=True`. Since each postsynaptic neuron becomes a separate target, the ones with larger gains become more difficult to fit in proportion. Notably, the intercept doesn't even need to become very large for the error to become large. With only 10 separate simulations of 100 postsynaptic neurons, the worst-case RMSE can become approximately 200 times worse from one trial to the next (depending on how [un]lucky the intercept placement is). On one hand this makes sense, because the targets and errors are varying by the same factor as the gain. However, an important observation is that the variance in the corresponding postsynaptic weights can then vary by a factor of 40,000 (or 200^2):

```python
import nengo
import numpy as np
import matplotlib.pyplot as plt
from nengo.builder.ensemble import get_activities
from nengo.utils.numpy import rmse

def trial(seed, solver=nengo.solvers.LstsqL2(weights=True),
          n_pre=50, n_post=100, test_size=1000):
    with nengo.Network(seed=seed) as model:
        x = nengo.Ensemble(n_pre, 1)
        y = nengo.Ensemble(n_post, 1)
        conn = nengo.Connection(x, y, solver=solver)
    with nengo.Simulator(model, progress_bar=None) as sim:
        pass
    assert conn.solver.weights

    # Evaluate the full-weight solution against the ideal encoded targets.
    eval_points = nengo.dists.Uniform(-1, 1).sample(test_size, 1, rng=sim.rng)
    A = get_activities(sim.data[x], x, eval_points)
    Y = sim.data[y].scaled_encoders.dot(eval_points.T)
    rmses = rmse(sim.data[conn].weights.dot(A.T), Y, axis=1)
    j = np.argmax(rmses)  # postsynaptic neuron with the worst fit
    return (
        sim.data[y].gain[j],
        sim.data[y].intercepts[j],
        np.mean(sim.data[conn].weights[j, :] ** 2),
        rmses[j],
    )


gains = []
intercepts = []
weights = []
errors = []
for i in range(10):
    g, c, w, e = trial(i)
    gains.append(g)
    intercepts.append(c)
    weights.append(w)
    errors.append(e)

print(np.max(weights) / np.min(weights), np.max(errors) / np.min(errors))

fig, ax = plt.subplots(1, 3, figsize=(16, 4), sharey=True)
ax[0].scatter(gains, errors)
ax[1].scatter(intercepts, errors)
ax[2].scatter(weights, errors)
ax[0].set_ylabel("RMSE")
ax[0].set_xlabel("Gain")
ax[1].set_xlabel("Intercept")
ax[2].set_xlabel("Var(w)")
fig.show()
```

Note: I am computing the RMSEs manually because of #1539 for `weights=True` connections.
I'm not convinced that having gains that are independent of intercepts is ideal. I think we need some sort of metric to measure the "effectiveness" of a neuron. One such metric would be to measure how much the RMSE increases if that neuron is left out. Of course, we want to do it in such a way that we also account for the noise of the neuron (neurons with lower firing rates are noisier), which is related to the decoder weight for the neuron that we regularize, so maybe this comes out in the wash with regularization. To make it even more complicated, we want to consider the neuron's effect on RMSE across the space of representable functions, not just for a particular one. All in all, this seems like a fair bit of work to do rigorously.

Intuitively, though, I think there is sense behind neurons with intercepts closer to 1 having larger gains. That way, they can fire at about the same rate as other neurons in that region, and contribute equally. If they have lower rates (as they do when gains are independent of intercepts), then they're going to be a lot noisier.

One limitation of your experiments above, @arvoelke, is that they don't account for this spike noise at all. They're all using the ideal rate curves, which ignore the noise we get when we switch to spikes (though of course the solvers do try to deal with it via regularization).
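As a crude sketch of the leave-one-out idea (an illustration only: identity target function, rate neurons, no spike noise, and a hypothetical helper name):

```python
import nengo
import numpy as np
from nengo.builder.ensemble import get_activities
from nengo.utils.numpy import rmse


def leave_one_out_effectiveness(n_neurons=50, seed=0, reg=0.1):
    """How much does the decoding RMSE for f(x) = x grow when neuron j is dropped?"""
    with nengo.Network(seed=seed) as model:
        ens = nengo.Ensemble(n_neurons, 1, neuron_type=nengo.LIFRate())
    with nengo.Simulator(model, progress_bar=None) as sim:
        pass

    eval_points = np.linspace(-1, 1, 500).reshape(-1, 1)
    A = get_activities(sim.data[ens], ens, eval_points)
    solver = nengo.solvers.LstsqL2(reg=reg)

    d_full, _ = solver(A, eval_points)
    base = rmse(A.dot(d_full), eval_points)

    effectiveness = np.zeros(n_neurons)
    for j in range(n_neurons):
        keep = np.arange(n_neurons) != j
        d_j, _ = solver(A[:, keep], eval_points)
        effectiveness[j] = rmse(A[:, keep].dot(d_j), eval_points) - base
    return effectiveness, sim.data[ens].gain, sim.data[ens].intercepts
```

Averaging this over a family of target functions (and adding spike noise) would be the harder, more rigorous version described above.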
I did come up with a normalized variant. You can see it in action in the https://github.com/nengo/nengo/tree/gain-bounds-norm branch. Unfortunately, I ran into some problems getting the generic version working:

```python
import numpy as np
import nengo
from nengo.exceptions import ValidationError


class LIFRateNorm(nengo.LIFRate):
    max_x = nengo.params.NumberParam('max_x')

    def __init__(self, max_x=1.0, tau_rc=0.02, tau_ref=0.002, amplitude=1):
        super().__init__(tau_rc=tau_rc, tau_ref=tau_ref, amplitude=amplitude)
        self.max_x = max_x

    def gain_bias(self, max_rates, intercepts):
        """Analytically determine gain, bias."""
        max_rates = np.array(max_rates, dtype=float, copy=False, ndmin=1)
        intercepts = np.array(intercepts, dtype=float, copy=False, ndmin=1)
        inv_tau_ref = 1. / self.tau_ref if self.tau_ref > 0 else np.inf
        if np.any(max_rates > inv_tau_ref):
            raise ValidationError("Max rates must be below the inverse "
                                  "refractory period (%0.3f)" % inv_tau_ref,
                                  attr='max_rates', obj=self)
        x = -1 / np.expm1((self.tau_ref - 1 / max_rates) / self.tau_rc)
        # == 1 when max_x == 1, 0.5 * max_x when max_x >> 1
        normalizer = 0.5 * self.max_x + 0.5
        gain = (x - 1) * normalizer / (self.max_x - intercepts)
        bias = 1 - gain * intercepts
        return gain, bias
```
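As a quick check of how the normalizer tames the extreme gains (a sketch, assuming the `LIFRateNorm` class above is in scope; the intercept and max rate are arbitrary):

```python
import numpy as np
import nengo

# Compare the gain assigned to an intercept very close to 1.
max_rates = np.array([300.0])
intercepts = np.array([0.999])

gain_default, _ = nengo.LIFRate().gain_bias(max_rates, intercepts)
gain_norm, _ = LIFRateNorm(max_x=1.1).gain_bias(max_rates, intercepts)
print(gain_default, gain_norm)  # the default gain is roughly 100x larger here
```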
We merged #1561, which addresses the most common cases by changing the default intercept distribution. But I would say this issue is not quite finished; it might be good to have the functionality in
Related Issues
Background Analysis
As the intercept of a neuron, `c`, approaches `1` from the left-hand side, the gain on the response curve `r` grows without bound. To make this precise, the curve must transition from `r(c) = 0` to `r(1) = m`, where `m` is the `max_rate`, which is by default sampled from `U[200, 400)`. This results in a secant with a slope of `(m - 0) / (1 - c)` between these two points. As `c -> 1` this slope goes off to infinity.

Let `n` be the number of neurons. The probability, `p`, of generating at least one uniform random variable from `[-1, 1)` that falls within the interval `[c, 1)` is:

`p = 1 - (1 - (1 - c) / 2)^n`

Solving for `c` and plugging this into the secant equation gives a slope of:

`m / (1 - c) = m / (2 * (1 - (1 - p)^(1 / n)))`
Example Manifestation

For example, given `20000` neurons there is a 99% chance that at least one of the ReLU's gains is on the order of 10^6 (see this by plugging `m = 300`, `p = 0.99`, `n = 20000` into Mathematica). This is validated by the following figure+code, which also demonstrates that this happens regardless of neuron model (although the analysis for the precise gain differs by a constant factor for models other than ReLU):
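A rough sketch along those lines (not the original code; the neuron models, ensemble sizes, and seeds below are arbitrary choices):

```python
import matplotlib.pyplot as plt
import nengo
import numpy as np

# Sample ensembles of increasing size and record the largest gain per model.
neuron_types = [nengo.RectifiedLinear(), nengo.LIFRate(), nengo.LIF()]
n_neurons = np.logspace(1, 4, 7).astype(int)  # 10 ... 10000

plt.figure()
for neuron_type in neuron_types:
    max_gains = []
    for seed, n in enumerate(n_neurons):
        with nengo.Network(seed=seed) as model:
            ens = nengo.Ensemble(int(n), 1, neuron_type=neuron_type)
        with nengo.Simulator(model, progress_bar=None) as sim:
            pass
        max_gains.append(np.max(sim.data[ens].gain))
    plt.loglog(n_neurons, max_gains, label=type(neuron_type).__name__)
plt.xlabel("Number of neurons")
plt.ylabel("Maximum gain")
plt.legend()
plt.show()
```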
Impact on Network Accuracy

The consequence is that the error shoots up at the end-points of the representation (+/- 1). The error in the tuning curves of some `LIFRate` neurons is shown below.

To illustrate the severity of this issue, we can repeat this a number of times while scaling `n` and evaluating the curves at different input values approaching the positive edge (`0.9`, `0.99`, `0.999`). As `u` becomes closer to `1`, the RMSE increases (on a log-y scale). The quality of the representation is roughly 10x worse at `1000` neurons on average. Similarly, `100` neurons at `u = 0.9` does about as well as `1000` neurons at `u = 0.99`.

This same effect happens for the negative edge, and the relative magnitude of the effect can become more or less pronounced by altering the magnitude of the L2-regularization. This also applies to the spiking case and to other neuron models, although `RectifiedLinear` does much better at constant and linear functions due to the shape of its response curve.
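A rough sketch of that kind of benchmark (not the original code; the target function, trial counts, and solver defaults are guesses):

```python
import nengo
import numpy as np
from nengo.builder.ensemble import get_activities
from nengo.utils.numpy import rmse


def edge_rmse(n, u, seed):
    """Error of decoding f(x) = x with n LIFRate neurons, probed at x = u."""
    with nengo.Network(seed=seed) as model:
        ens = nengo.Ensemble(n, 1, neuron_type=nengo.LIFRate())
        out = nengo.Node(size_in=1)
        conn = nengo.Connection(ens, out)
    with nengo.Simulator(model, progress_bar=None) as sim:
        pass
    x = np.array([[u]])
    a = get_activities(sim.data[ens], ens, x)
    return rmse(a.dot(sim.data[conn].weights.T), x)


for u in (0.9, 0.99, 0.999):
    for n in (100, 1000):
        errors = [edge_rmse(n, u, seed) for seed in range(5)]
        print(u, n, np.mean(errors))
```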
Why Didn't I Notice this Before?

- For all models except for `RectifiedLinear`, saturation effects kick in to handle this somewhat gracefully by producing a response curve that is essentially a step-function. (As an aside: if pure step-functions are useful response curves, then why have them only at +/- 1, and not at other points in the vector space?)
- The effect is most apparent when scaling up the number of neurons and measuring the error. Most of our models and tests do not analyze this scaling systematically to find the conditions where it might be falling apart.
- The effect is only evident for input values close to the radius. We often set the radius to be slightly larger than what we need. Moreover, we often add neurons until the network's function falls within specification, rather than decomposing the error into its absolute quantities in order to understand their relative contributions with respect to the input statistics.
- The effect is not deterministic, in the sense that individual trials can be better or worse at many different points within the input space; the edges aren't always worse than every other point. But this is still an issue, because it introduces a systematic bias in performance that can require 10x as many neurons on average if certain operating points are critical.
- Decoder regularization helps compensate for exploding gains to some extent, by regularizing the decoding weights that correspond to the neurons with high gains. However, this is not a principled solution, as L2-regularization assumes all neurons should be treated equally; the first plot reveals that they are not. Moreover, the ability of the optimization problem to be sensitive to these extreme slopes relies on the density of evaluation points near +/- 1.
Possible Solutions

Switching the default intercept distribution to `CosineSimilarity(d+2)` might solve this for `d > 1` (nengo/enhancement-proposals#10). An interim solution is to always pick the radius to be larger than you need.
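Concretely, the first option is just a different `intercepts` distribution on the ensemble (a sketch; `d` below is an arbitrary example dimensionality):

```python
import nengo

d = 3  # example dimensionality
with nengo.Network() as model:
    ens = nengo.Ensemble(100, d, intercepts=nengo.dists.CosineSimilarity(d + 2))
```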
The following possibilities would require a change to our definition of `max_rates`. It also remains to be seen whether any of them resolve the impact on network accuracy shown above without requiring some other change, such as extending evaluation points outside the radius.

1. Clip the gains at some reasonable maximum (see the sketch at the end of this section).
2. Balance the response curves such that gains are distributed independently of intercept (i.e., such that you would get a flat band in the first plot).
3. @tcstewar suggested shifting the anchor point for `max_rates` to `1 + eps`, where `eps` is some fudge factor (e.g., `eps = 0.1`).

Preventing intercepts from approaching 1 is a sub-optimal solution, because it is still helpful to get a dynamic range for input values close to +/- 1 that is comparable to the dynamic range for all other values. It's just that we don't want this dynamic range to approach infinite slope, since that introduces extreme sensitivity to the magnitude of that neuron's corresponding decoder (with respect to small changes in the input) in a way that is not captured by the optimization problem.
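For reference, proposal 1 could be as simple as clipping inside `gain_bias` (a sketch, not an endorsed implementation; the cap of `10 * max_rates` is arbitrary):

```python
import numpy as np
import nengo


class LIFRateClipped(nengo.LIFRate):
    def gain_bias(self, max_rates, intercepts):
        gain, bias = super().gain_bias(max_rates, intercepts)
        # Clip the gains at an arbitrary maximum, then recompute the bias so
        # that the intercepts are preserved. Note that clipping lowers the
        # realized max_rate for the affected neurons.
        gain = np.minimum(gain, 10 * np.asarray(max_rates, dtype=float))
        bias = 1 - gain * np.asarray(intercepts, dtype=float)
        return gain, bias
```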