Skip to content

Conversation

steppi
Copy link
Contributor

@steppi steppi commented Dec 25, 2023

Reference issue

I don't think there's an issue for this.

What does this implement/fix?

This PR adds a private vectorized scalar_minimization bracket finder, similar to the vectorized root bracket finder from gh-18348. The intent of this function is the same as scipy.optimize.bracket. A bracket for the minima of a unimodal univariate function $f$ is a triple of points $x_l$, $x_m$, $x_r$, with $x_l < x_m < x_r$, such that

$$f(x_l) \geq f(x_m),\text{ }f(x_r) \geq f(x_m)$$

where at least one of these two inequalities is strict. Unlike scipy.optimize.bracket, this function is able to accept upper
and lower bounds on the endpoints of brackets. The algorithm was suggested to me by @mdhaber, who convinced me it is a sound approach. Quoting from the Notes section of the Docstring

Given an initial trio of points `xl`, `xm`, `xr`, the algorithm checks if
these points already give a valid bracket. If not, a new endpoint, `w` is
chosen in the "downhill" direction, `xm` becomes the new opposite endpoint,
and either `xl` or `xr` becomes the new middle point, depending on which
direction is downhill. The algorithm repeats from here.

The new endpoint `w` is chosen differently depending on whether or not a
boundary `min` or `max` has been set in the downhill direction. Without
loss of generality, suppose the downhill direction is to the right, so that
``f(xl) > f(xm) > f(xr)``. If there is no boundary to the right, then `w`
is chosen to be ``xr + phi * (xr - xm)`` where `phi` is the golden ratio;
so that step sizes increase in geometric proportion, with each step size
equal to the sum of the two previous step sizes. If there is a boundary,
`max` in this case, then `w` is chosen to be ``xr + (max - xr)/phi``, so
that step sizes decrease in geometric proportion, slowing to a stop at
`max`. This cautious approach ensures that a minima near but distinct from
the boundary isn't missed.

Given a

Additional information

This has been placed in scipy.optimize._zeros_py.py to avoid a circular import, since that is where most of the machinery used here lives. A more rightful home would be scipy.optimize._optimize.py where scipy.optimize.bracket lives. Since this is private, I don't think this is important just yet, but we should keep it in mind for the future. @mdhaber, I think I finally get the details of how this machinery works.

@steppi steppi requested a review from andyfaff as a code owner December 25, 2023 07:56
@steppi steppi added the enhancement A new feature or improvement label Dec 25, 2023
@steppi steppi modified the milestone: 1.13.0 Dec 25, 2023
@steppi steppi requested a review from mdhaber December 25, 2023 08:02
Copy link
Contributor

@mdhaber mdhaber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm flying now, so I'd like to play with it a bit when I get to my desktop, but it looks good!

This has been placed in scipy.optimize._zeros_py.py to avoid a circular import, since that is where most of the machinery used here lives.

Shall I move the machinery before we merge this?

I think I finally get the details of how this machinery works.

Cool! In that case, would you review gh-19545? It relaxes the requirement that the shape of the function output must be the shape of the input. This is much more convenient in some cases (especially _tanhsinh, in which case users want to express callables as vector-valued).

@mdhaber
Copy link
Contributor

mdhaber commented Dec 26, 2023

Here's a pretty useful smoke test. I adjusted b = a + 0.1 to reduce the number of invalid input errors.

import numpy as np
from scipy import stats
rng = np.random.default_rng(2549824598234528)

from scipy.optimize._zeros_py import _bracket_minima
from scipy.stats._distr_params import distcont
distcont = dict(distcont)

n_trials = 0
n_success = 0

for distname, params in distcont.items():

    family = getattr(stats, distname)
    dist = family(*params)

    def f(x):
        return -dist.pdf(x)

    try:
        res = _bracket_minima(f, a=dist.rvs(size=10),
                              min=dist.support()[0], max=dist.support()[1])
        print(distname, np.all(res.success))
        n_trials += 10
        n_success += np.sum(res.success)
    except Exception as e:
        print(distname, str(e))

print(f'success rate: {n_success/n_trials}')

Obviously there's no reason that all of these should achieve success, but it does perfectly aside from the cases which got an invalid input error. The three distributions which did not always see success are pareto, truncpareto, and rdist. It correctly finds the left endpoint in the Pareto cases, and it finds either the left or right endpoint for rdist. The only thing to note is that rdist has infinite PDF at the mode, but we get status -3, whereas it might be reasonable to consider the correctly signed infinity to be a minimum.

Code to investigate a particular distribution:

import numpy as np
from scipy import stats
rng = np.random.default_rng(25498245982348)

from scipy.optimize._zeros_py import _bracket_minima
from scipy.stats._distr_params import distcont

distcont = dict(distcont)
distname = 'rdist'
params = distcont[distname]
family = getattr(stats, distname)
dist = family(*params)
guess = dist.rvs(size=10, random_state=rng)

def f(x):
    return -dist.pdf(x)

res = _bracket_minima(f, a=guess,
                      min=dist.support()[0], max=dist.support()[1])

Update: When I change to b = a + 0.001, all the invalid input errors go away. The only functions that see non-success status are uniform, beta, and arcsine. Of course, these are understandable, too, and I think the function does something reasonable in all these cases.

@steppi
Copy link
Contributor Author

steppi commented Dec 28, 2023

Thanks @mdhaber. Good points all. I have a clear idea of how to proceed for everything except how to handle the arguments for left, middle and right initial points. I don't have a strong preference yet, but will think about it some more and then we can discuss.

@steppi
Copy link
Contributor Author

steppi commented Jan 5, 2024

@mdhaber. I think I've made all of the code changes asked for. I changed the signature to include the midpoint as a positional arg, and made the left and right endpoints optional and keyword only. I wrote a scheme for the defaults to ensure the initial bracket won't go beyond a bound when the relevant endpoint isn't specified. I also changed how the updates work to match bracket_root, and made factor a parameter which can be an array.

I haven't updated the tests yet. There's still a question of standardizing variable names for local variables in the functions, and for the inputs. As a proposal, I've changed min and max to xmin and xmax, because I don't want to encourage overwriting the builtin min and max functions. We know better to avoid trouble, but a user might write something like

...
max = [1.0, 2.0, 3.0]
result = bracket_minimum(f, xm, max=max, args=args)

and run into problems

I also changed a, b, c, to xl, xr, xm, since I think a, b, c are not as clear for left, right, and middle points of bracket.

@lucascolley
Copy link
Member

docs build will be fixed upon merging main

Comment on lines 3072 to 3076
work.step[~work.limited] *= work.factor[~work.limited]
work.step[work.limited] /= work.factor[work.limited]
x = np.empty_like(work.xr)
x[~work.limited] = work.xr[~work.limited] + work.step[~work.limited]
x[work.limited] = work.limit[work.limited] - work.step[work.limited]
Copy link
Contributor

@mdhaber mdhaber Jan 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is more similar to the _bracket_root logic now. It makes me realize, though, that I think we could redefine factor at the top like

work.factor[work.limited] = 1/work.factor[work.limited]

and then we don't have do fancy indexing every iteration.

work.step *= work.factor

Could do it here in both functions or in a separate PR if it sounds good.

Copy link
Contributor Author

@steppi steppi Jan 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it for bracket_minimum. I think a separate PR for the other function would be better. I had to change

    factor = np.broadcast_to(factor, shape).astype(dtype, copy=False).ravel()

to

    factor = np.broadcast_to(factor, shape).copy().astype(dtype, copy=False).ravel()

because np.broadcast_to produces a read-only view. I guess the copy=False might not be needed anymore, but I left it in.

@steppi steppi force-pushed the scalar-minimize-bracket-finder branch from 0dbd52b to d206a4d Compare January 6, 2024 23:34
@steppi
Copy link
Contributor Author

steppi commented Jan 8, 2024

OK. I've tested this thoroughly and actually uncovered a couple bugs. 1) dtypes changing to np.float64 in _bracket_minimum_iv, 2) not technically a bug, but I was not calculating the new endpoint in the same way as in _bracket_root. Also, I've checked that all cases now pass in the smoketests you wrote above. I haven't updated internal variable names to match _bracket_root, but maybe that can be left for a followup. I like using fully spelled out names,
because it makes it easier for me to read what's going when my brain starts scrambling things. Now that it seems to be working correctly, I don't mind changing to match the convention from _bracket_root though.

I like xl, xm, xr better than a, b, c, and xmin and xmax better than min and max for parameter names. It's probably good to make the two functions consistent though.

@steppi
Copy link
Contributor Author

steppi commented Feb 6, 2024

OK. I think I've successfully resolved the merge conflicts. I think the main thing to settle down now is consistent argument names for _bracket_root and _bracket_minimum.

For the former we currently have a, b, min, max, but with the convention of the later these would be xl, xr, xmin, xmax.

For the later we have xm, xl, xr, xmin, xmax. Using the convention from the former this would be c, a, b, min, max.

I prefer the convention I chose for bracket_minimum because

  • I find it easy to mix up which points along the interval a, b, and c are supposed to be. Is b the midpoint, or c? This isn't a problem if we only have _bracket_root.
  • I don't think we should encourage redefining the built-ins min and max.

@mdhaber
Copy link
Contributor

mdhaber commented Feb 7, 2024

The reason I didn't use xl etc... was that these don't have the same properties/meaning as the results, so I didn't think the attributes of the return object should be the same as the names of these inputs. Maybe xmin, xl0, xm0, xr0, xmax? I don't love the 0s either because that suggests "guess", and I'm not sure if these should be interpreted as guesses in the same way as x0 of, say, newton. But I agree that we should standardize and the current names aren't ideal.

@steppi
Copy link
Contributor Author

steppi commented Feb 7, 2024

The reason I didn't use xl etc... was that these don't have the same properties/meaning as the results, so I didn't think the attributes of the return object should be the same as the names of these inputs. Maybe xmin, xl0, xm0, xr0, xmax? I don't love the 0s either because that suggests "guess", and I'm not sure if these should be interpreted as guesses in the same way as x0 of, say, newton. But I agree that we should standardize and the current names aren't ideal.

Oh, good point. I do think of them like initial guesses, so I don't think that aspect is an issue. It kind of bugs me that I can't help reading them as "x lo", "x mo", "x ro". xl_0, xm_0, xr_0, is more clear in that regard, but it's annoying to have to type the underscore, and then would we have x_min, x_max for consistency? I can live with xl0, xm0, xr0.

@mdhaber
Copy link
Contributor

mdhaber commented Feb 7, 2024

Would prefer to avoid the underscores, and it's x0 without the underscore in basically every other interface. So let's go with xmin, xl0, xm0, xr0, and xmax, at least while they're private.

@steppi
Copy link
Contributor Author

steppi commented Feb 9, 2024

Would prefer to avoid the underscores, and it's x0 without the underscore in basically every other interface. So let's go with xmin, xl0, xm0, xr0, and xmax, at least while they're private.

Done

Copy link
Contributor

@mdhaber mdhaber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at _bracket.py and found some minor things. I'll take a look at the tests soon!

Comment on lines +422 to +430
if not np.all(factor > 1):
raise ValueError('All elements of `factor` must be greater than 1.')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seem to remember thinking of cases in which factor < 1 could be desirable for _bracket_root. Anyway, something to think about, but this is the primary use case.

Copy link
Contributor

@mdhaber mdhaber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the test match those for TestBracketRoot for the most part, and I'm not thinking of other tests that would be needed here (that aren't already). The main comments are that the vectorization test looks like it could be stronger and I'd like to know more about the magic numbers in tests from test_scalar_no_limits through test_minima_at_boundary_point.

@mdhaber
Copy link
Contributor

mdhaber commented Feb 13, 2024

OK, I think those will be my only comments. When they are resolved, I think it's good to go.

When it's done, would you do a quick PR to address #19757 (review) for _bracket_root?

@steppi
Copy link
Contributor Author

steppi commented Feb 13, 2024

Thanks @mdhaber! I batched up and committed all of your suggestions and should have time to fix the remaining things on Thursday or Friday.

@mdhaber
Copy link
Contributor

mdhaber commented Feb 13, 2024

OK. When that's done, maybe squash commits before and after the name changes so that the name change commit is separate; we'll merge three commits.

@steppi
Copy link
Contributor Author

steppi commented Feb 19, 2024

I think I've addressed everything.

  • Replaced "minima" with "minimum" in another place in the docstring, because we are sticking with singular.
  • Changed vectorization test to match that for bracket_root.
  • added comments to explain the constants in the tests for bracket_minimum
  • Make it so xl0 and xr0 are updated to be made feasible only if they are taking default values, not if user passed infeasible values.
  • Added check that result.xl, ... result.fl .. are correct in special cases test where initial bracket is valid.
  • combined xl0 and xr0 in docstring

I've also fixed a place where I accidentally changed a, b to xl0, xr0, when these were args for the function f that really should have been a and b. After you think this looks good, I'll do an interactive rebase to group it into 3 commits like you suggested.

@mdhaber
Copy link
Contributor

mdhaber commented Feb 20, 2024

Re: xl0, xr0 "Must be broadcastable with one another." They also must be broadcastable with xm0. I've not been sure how to document this sort of thing because in the end, essentially all array arguments, including those in args, must be mutually broadcastable with one another. So far, I have been mentioning the broadcasting requirement with all arguments that have already been documented, so this might be like "Must be broadcastable with one another and xm0".

@steppi
Copy link
Contributor Author

steppi commented Feb 20, 2024

Well, that attempt at git surgery was less than successful. Sorry everyone who got notified.

@steppi
Copy link
Contributor Author

steppi commented Feb 20, 2024

@mdhaber, I'm just going to close this and make a new PR with the three commits.

@steppi steppi closed this Feb 20, 2024
Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>

Only update xl0/xr0 to feasible values when not supplied

Fix plural when should be singular

Combine xl0, xr0 in docstring

Fix false positive variable renaming

Change vectorization test to match TestBracketRoot

Add comments explaining magic constants in tests

Add another check to initial bracket valid test

Mention that `xl0`, `xr0` must be broadcastable with `xm0`
@steppi
Copy link
Contributor Author

steppi commented Feb 20, 2024

Here's the backup of my branch so you can check the diff https://github.com/steppi/scipy/tree/scalar-minimize-bracket-finder-backup

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.optimize
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants