# Parameter Bounds
We need to know the model's parameter bounds for the model to fit successfully. These notes report
what we've found. For model fitting, we can start with the following bounds array:

In [None]:
import numpy as np


lb = np.finfo(float).eps
ub = 1-np.finfo(float).eps

bounds = [
    (lb, ub),
    (lb, ub),
    (lb, ub),
    (lb, ub),
    (lb, ub),
    (lb, ub),
    (lb, 100),
    (lb, 100),
    (lb, ub),
    (lb, 10),
    (lb, 10)
]

where `lb` and `ub` specify default lower and upper bounds, respectively.

### Drift Rates

In past explorations, we found that `drift_rate` parameters cannot exceed `1.0`, because the `rho`
quantity that enforces the length of `context` to 1.0 then becomes `nan`. In practice, `drift_rate` more or
less defines the extent to which `context_input` determines `context` going forward. At a `drift_rate`
of `1.0`, `context` becomes identical to `context_input`, and at zero, `context` is static. So we should only
support parameter values in that range.
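
To make the range argument concrete, here is a minimal sketch of a context update. It assumes the standard CMR form of `rho`, which may not match how these notes' model actually computes it; `update_context` is a hypothetical helper name, not a function from the model code.

```python
import numpy as np

def update_context(context, context_input, drift_rate):
    """Drift `context` toward `context_input`, using rho to keep unit length."""
    # Normalize the input so only its direction matters.
    context_input = context_input / np.linalg.norm(context_input)
    overlap = np.dot(context, context_input)
    # Assumed CMR form of rho. Once drift_rate exceeds 1, the square-root
    # argument can go negative, which yields nan.
    with np.errstate(invalid="ignore"):
        rho = np.sqrt(1 + drift_rate**2 * (overlap**2 - 1)) - drift_rate * overlap
    return rho * context + drift_rate * context_input

context = np.array([1.0, 0.0, 0.0])
context_input = np.array([0.0, 1.0, 0.0])

static = update_context(context, context_input, 0.0)    # context is unchanged
replaced = update_context(context, context_input, 1.0)  # context equals the input
blended = update_context(context, context_input, 0.5)   # unit-length mixture
broken = update_context(context, context_input, 1.5)    # rho is nan
```

Under this assumed formula, the two endpoints behave as described above, and values above `1.0` produce `nan` whenever the old context and the input are far enough apart.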

#### Potential Problems
`rho` doesn't seem to reliably enforce `context` to a length of 1. Instead, `context` slowly grows in length
over the course of the experiment. I have noticed this before, but so far it seems unaddressed. This is
probably a result of `context_input` having too high a magnitude? No; I normalize `context_input`. The
explanation has to lie in how I compute `rho`. My guess is that it's a matter of number precision; nothing I
can do, and unlikely to impact performance.

### Shared Support

This is supposed to encode uniform support items initially have for one another in recall competition. It's a
very odd parameter, and I'm considering holding it static to some low value. 

In the original CMR, the `shared_support` parameter set the nondiagonal values of pre-experimental Mcf. This
meant that if an item were arbitrarily presented as a probe to the model, there would be a minimum amount of
support for recall of all items, though especially the presented item itself. I already have something to
ensure recall rates are always above zero for every item for any cue, but why doesn't `shared_support` handle
that? Why do I need both?

This alone should set a theoretical bound of 1.0 for this parameter since I otherwise never let items support
themselves more than that. In theory, too, I imagine I'd also normalize every instance vector to have a
length of 1 despite this parameter setting. I'm going to require `shared_support` to be above 0 but below 1.0
and work from there, excluding the minimum positivity constant in activations. 

Shared support can be zero and item support can be zero, but both can't be zero at the same time, probably
because that sometimes results in zero vectors. Behavior when one is nonzero and the other is `lb` is
identical to when one is nonzero and the other is zero, though.
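
The zero-vector guess can be illustrated with a small matrix-form sketch. It assumes pre-experimental Mcf simply holds `item_support` on the diagonal and `shared_support` everywhere else; `initial_mcf` and `recall_support` are hypothetical helpers, and the actual model may build its memory differently.

```python
import numpy as np

def initial_mcf(item_count, shared_support, item_support):
    """Pre-experimental Mcf: item_support on the diagonal, shared_support elsewhere."""
    mcf = np.full((item_count, item_count), float(shared_support))
    np.fill_diagonal(mcf, item_support)
    return mcf

def recall_support(mcf, probe_index):
    """Support each item receives when a single item is used as the probe."""
    return mcf[probe_index]

both_zero = recall_support(initial_mcf(4, 0.0, 0.0), 0)    # all zeros
only_shared = recall_support(initial_mcf(4, 0.1, 0.0), 0)  # nonzero off the diagonal
only_item = recall_support(initial_mcf(4, 0.0, 0.5), 0)    # nonzero on the diagonal
```

In this sketch, only the case where both parameters are zero produces an all-zero support vector, which cannot be normalized into recall probabilities.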

### Learning Rate

This is the gamma parameter of the other model. It would set the initial diagonal of $M_{fc}$ (always zero
otherwise), as well as weight the impact of new experiences on $M_{cf}$'s structure (specifically the
importance of new associations between experience features and experience context). The bigger the gamma, the
weaker the pre-experimental associations and the stronger the experimental associations. An assertion error
happens at $1.0$, presumably because I'm setting the feature half of my instance vectors to $0$ if I do that.
At $1-lb$ (lower bound), I have a left-lopsided CRP, which is theory-inconsistent. At $0$, the exact opposite
occurs. And there's no bug for setting it that low either. I'll still have a lower bound though.

I assume we shouldn't have negative numbers in our instance vectors, anyway, so we'll ceiling the parameter
at $1.0-lb$ (so diagonals of Mfc are always $>0$). Similarly, they should not exceed $1.0$, so we'll do the
same in the other direction too and floor at zero.

### Item Support Rate

Corresponds to the parameter deciding the initial strength of the diagonal of Mcf, so it must range between
0 and 1. When the parameter is near zero, CRPs are symmetrical, but by `lb` they get flat. My CRPs also seem
to reach a peak amount of right-weighting. Can I go above 1? If I do, the CRPs get left-lopsided. There
seems to be no bound to the height, but we're controlling things.

As noted under Shared Support, shared support and item support cannot both be zero at the same time. Just
enforce that parameters never set an array entry above the value 1.0.

### Primacy Scale and Decay

Primacy scale is the scaling of the primacy gradient in the learning rate on Mcf. It multiplies an
exponential function: the higher it is, the bigger the difference in impact on memory for early versus later
encoded items. Increasing it increases the recall rate for early items. Primacy decay is the rate of decay
from peak primacy in the learning rate on Mcf. The higher it is, the faster primacy decays over successive
items.

Both seem unbounded above 0, and together with the `stop_probability` parameters they codetermine recall
rates. Primacy scale certainly strongly controls PFR for the first item (stronger support for the first
item). Primacy decay increases PFR for the first item as it increases (less competition from other early
items). Together they strongly control how much competition early items get with more recently experienced
items. If I really leave them unbounded, there will be fitting issues. But how far do I let them go? Is there
any principled peak?

Standard practice is apparently to pick an arbitrary upper bound and increase it if model fitting tends to
settle on that bound as the value of the parameter. Such a result suggests that the optimal configuration
may exceed the bound, justifying an increase. Setting the upper bound too high, however, can result in
exploration of an unnecessarily large parameter space, increasing fitting times.
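
The check described above can be automated. This is only a sketch: `parameters_at_bounds` is a hypothetical helper, the tolerance is an arbitrary choice, and the example fit values are made up for illustration rather than taken from any real fitting run.

```python
import numpy as np

def parameters_at_bounds(fitted, bounds, tolerance=1e-3):
    """Return (index, side) pairs for fitted values that sit at a bound."""
    flagged = []
    for index, (value, (lower, upper)) in enumerate(zip(fitted, bounds)):
        span = upper - lower
        # A value counts as "at" a bound if it lies within a small
        # fraction of the allowed range from that bound.
        if value - lower <= tolerance * span:
            flagged.append((index, "lower"))
        elif upper - value <= tolerance * span:
            flagged.append((index, "upper"))
    return flagged

lb = np.finfo(float).eps
example_bounds = [(lb, 1 - lb), (lb, 100), (lb, 10)]
example_fit = [0.42, 99.98, 3.1]  # made-up values, not real fitting results
flagged = parameters_at_bounds(example_fit, example_bounds)
```

A parameter flagged on the `"upper"` side across several fits would be a candidate for a higher ceiling.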

In [None]:
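import numpy as np

# Example primacy gradient over 40 study positions: the scale multiplies an
# exponential decay, and a baseline of 1 is added to every position.
primacy_scale = 14
primacy_decay = 2

primacy_scale * np.exp(-primacy_decay * np.arange(40)) + 1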

### Stop Probability Scale and Growth

When growth is 0, stop probability is constant across recall positions. And when scale is 0, stop
probability is zero. So I should have scale above 0 and probably capped at 1. Growth is more ambiguous.
Certainly floor it at 0. But the ceiling? Even with a scale of `lb` and growth of 2, stop happens long before
item 20. Again, we experiment with a probably-too-high ceiling of 10.

In [None]:
stop_probability_scale = lb
stop_probability_growth = 4
recall_total = np.arange(20)
stop_probability_scale * np.exp(recall_total * stop_probability_growth)
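
To put a number on how quickly stopping becomes certain, the same expression can be scanned for the first recall position where it reaches 1. This is a sketch: capping the value at 1 is an assumption added here, and `stop_probability` and `first_certain_stop` are hypothetical helper names.

```python
import numpy as np

lb = np.finfo(float).eps

def stop_probability(scale, growth, recall_total):
    """Exponentially growing stop probability, capped at 1 (the cap is assumed)."""
    return np.minimum(scale * np.exp(recall_total * growth), 1.0)

def first_certain_stop(scale, growth, max_recalls=100):
    """First recall position where stopping is certain, or None if never reached."""
    probabilities = stop_probability(scale, growth, np.arange(max_recalls))
    certain = np.flatnonzero(probabilities >= 1.0)
    return int(certain[0]) if certain.size else None

position = first_certain_stop(lb, 4)  # same scale and growth as the cell above
```

Comparing `position` across candidate growth values is one way to judge how much of the range up to the ceiling is actually usable for a given list length.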

### Choice Sensitivity
Activations get exponentiated by this number. The result makes high activations much higher than low
activations. It is also unbounded, so we'll cap it at 10 and increase it empirically, too.
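
A small sketch of the effect, assuming "exponentiated" means each activation is raised to the power of the sensitivity and then normalized into recall probabilities. `recall_probabilities` is a hypothetical helper and the activation values are arbitrary.

```python
import numpy as np

def recall_probabilities(activations, choice_sensitivity):
    """Raise activations to a power, then normalize so they sum to 1."""
    powered = np.power(activations, choice_sensitivity)
    return powered / powered.sum()

activations = np.array([0.2, 0.4, 0.8])  # arbitrary illustrative values
mild = recall_probabilities(activations, 1.0)
sharp = recall_probabilities(activations, 10.0)
```

Under this assumption, raising the sensitivity from 1 to 10 concentrates nearly all of the probability on the most active item.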