-
-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT: code cleanup in optimize/linesearch #9019
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Watesoyan. Some cleanups look good, some are unclear to me.
@@ -684,7 +683,7 @@ def scalar_search_armijo(phi, phi0, derphi0, c1=1e-4, alpha0=1, amin=0): | |||
|
|||
# Otherwise compute the minimizer of a quadratic interpolant: | |||
|
|||
alpha1 = -(derphi0) * alpha0**2 / 2.0 / (phi_a0 - phi0 - derphi0 * alpha0) | |||
alpha1 = _quadmin(0, phi0, derphi0, alpha0, phi_a0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not obvious that this is the same as the old code, with the return None
that _quadmin
can do. Can you explain this change a bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add codes to handle this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, verified that this matches:
import numpy as np
def _quadmin(a, fa, fpa, b, fb):
"""
Finds the minimizer for a quadratic polynomial that goes through
the points (a,fa), (b,fb) with derivative at a of fpa,
"""
# f(x) = B*(x-a)^2 + C*(x-a) + D
with np.errstate(divide='raise', over='raise', invalid='raise'):
try:
D = fa
C = fpa
db = b - a * 1.0
B = (fb - D - C * db) / (db * db)
xmin = a - C / (2.0 * B)
except ArithmeticError:
return None
if not np.isfinite(xmin):
return None
return xmin
for _ in range(100):
derphi0 = np.random.rand(1)
alpha0 = np.random.rand(1)
phi_a0 = np.random.rand(1)
phi0 = np.random.rand(1)
alpha1 = -(derphi0) * alpha0**2 / 2.0 / (phi_a0 - phi0 - derphi0 * alpha0)
alpha2 = _quadmin(0, phi0, derphi0, alpha0, phi_a0)
np.testing.assert_allclose(alpha1, alpha2, atol=0, rtol=5e-16)
b = b / factor | ||
|
||
alpha2 = (-b + np.sqrt(abs(b**2 - 3 * a * derphi0))) / (3.0*a) | ||
alpha2 = _cubicmin(0, phi0, derphi0, alpha0, phi_a0, alpha1, phi_a1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, doesn't look the same. can you explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the None return value actually happens, and the alpha2
value changes. So unclear if this change is correct and/or fully backwards compatible:
import numpy as np
def _cubicmin(a, fa, fpa, b, fb, c, fc):
"""
Finds the minimizer for a cubic polynomial that goes through the
points (a,fa), (b,fb), and (c,fc) with derivative at a of fpa.
If no minimizer can be found return None
"""
# f(x) = A *(x-a)^3 + B*(x-a)^2 + C*(x-a) + D
with np.errstate(divide='raise', over='raise', invalid='raise'):
try:
C = fpa
db = b - a
dc = c - a
denom = (db * dc) ** 2 * (db - dc)
d1 = np.empty((2, 2))
d1[0, 0] = dc ** 2
d1[0, 1] = -db ** 2
d1[1, 0] = -dc ** 3
d1[1, 1] = db ** 3
[A, B] = np.dot(d1, np.asarray([fb - fa - C * db,
fc - fa - C * dc]).flatten())
A /= denom
B /= denom
radical = B * B - 3 * A * C
xmin = a + (-B + np.sqrt(radical)) / (3 * A)
except ArithmeticError:
return None
try:
if not np.isfinite(xmin):
return None
except TypeError:
print(xmin)
raise
return xmin
for _ in range(100):
derphi0 = np.random.rand(1)
alpha0 = np.random.rand(1)
alpha1 = np.random.rand(1)
phi_a0 = np.random.rand(1)
phi_a1 = np.random.rand(1)
phi0 = np.random.rand(1)
factor = alpha0**2 * alpha1**2 * (alpha1-alpha0)
a = alpha0**2 * (phi_a1 - phi0 - derphi0*alpha1) - \
alpha1**2 * (phi_a0 - phi0 - derphi0*alpha0)
a = a / factor
b = -alpha0**3 * (phi_a1 - phi0 - derphi0*alpha1) + \
alpha1**3 * (phi_a0 - phi0 - derphi0*alpha0)
b = b / factor
alpha2 = (-b + np.sqrt(abs(b**2 - 3 * a * derphi0))) / (3.0*a)
alpha3 = _cubicmin(0, phi0, derphi0, alpha0, phi_a0, alpha1, phi_a1)
if alpha3 is None:
print(alpha1)
alpha3 = alpha1 / 2.0
try:
np.testing.assert_allclose(alpha3, alpha2, atol=0, rtol=5e-14)
except TypeError:
# If alpha3 is None, this could happen (added before catching that case above)
print(alpha2, alpha3)
There was an unresolved comment, there are merge conflicts, and this has been inactive for a while, so I'm tempted to close this. But @j-bowhay I see that you added this to the 1.10.0 milestone - did you plan to pick this up? I'm not sure that it's necessary. The code in question was all added in the same PR, so it might be nice if the code had been written this way originally (I'm not sure), but unless there's a substantial performance gain or major simplification, I'm not sure we need to disturb it with a refactoring. LMK your thoughts! |
@mdhaber I originally milestoned as I thought it looked pretty close just in need of some attention but with hindsight I agree lets close this. If it is later demonstrated to be of significant value we can reopen. |
simplify codes and reuse the functions, like
_quadmin
and_cubicmin