input variables scaling #95

Closed
jschueller opened this issue Oct 2, 2023 · 11 comments


jschueller commented Oct 2, 2023


emmt commented Oct 2, 2023

Scaling can be done automatically for fully bounded variables. Using a non-linear transform (like arcsine or tanh) may not be a good idea for convergence. The main reason to scale variables is to give the changes of all variables of the problem a similar magnitude and, for example, to avoid the convergence depending on the units of the variables. This is a usual trick when applying a numerical method to a problem with heterogeneous variables.
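
As a concrete illustration (a made-up toy example, not from any particular solver), dividing each variable by its typical magnitude removes the dependence on units:

```julia
# Toy objective with heterogeneous variables: x[1] is of order 1e-3 (say, a
# length in metres) and x[2] of order 1e6 (hypothetical magnitudes).
f(x) = (x[1] / 1e-3 - 2)^2 + (x[2] / 1e6 - 1.5)^2

# Dividing each variable by its typical magnitude makes a unit change in every
# scaled variable comparably significant, whatever units were chosen.
typical = [1e-3, 1e6]
f_scaled(y) = f(typical .* y)   # optimize f_scaled starting from x0 ./ typical
```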


zaikunzhang commented Oct 8, 2023

Hi @jschueller and @emmt ,

Apologies for the late reply.

First of all, I totally agree that scaling is important. Thank you for raising this point.

However, I do not think a trivial and good scaling strategy exists.

  1. Similar to what @emmt pointed out, it is generally wrong to scale using a nonlinear transformation. I am not saying that it is always wrong --- a certain transformation may work extremely well on a particular class of problems (see the example at the end of this comment), but a nonlinear transformation that does not fit the problem will likely make the problem more difficult.

  2. It is hard to find a scaling strategy that fits all problems. For example, consider the following problems.

    1.) No constraints. The starting point is $0$.
    2.) No constraints. The starting point is $10^{-3}$.
    3.) No constraints. The starting point is $10^{-16}$.
    4.) No constraints. The starting point is $10^{10}$.
    5.) The bound constraint is $-10^{10} \le x \le 10^{10}$. The starting point is $10^{10}$.
    6.) The bound constraint is $10^{10}-1 \le x \le 10^{10}+1$. The starting point is $10^{10}$.
    7.) The bound constraint is $-\infty < x \le 10^{10}+1$. The starting point is $10^{10}$.

  3. The best scaling strategy is only known to the user, who should scale the problem before passing it to the solver.
    If he/she does not do so, I suggest considering defining the scaling factor with the following strategy or something similar (see the sketch below). The factor is defined entrywise. Here, $t$ is a small value, e.g., $10^{-3}$.

    1.) The bound constraint is $l \le x \le u$:
    $(u-l)/2$, provided that this value is at least eps (otherwise, we consider that $x$ should be fixed at $(l+u)/2$ and remove this constraint).
    2.) The bound constraint is $-\infty < x \le u$:
    $\max(|x_0|, |u|)/2$, provided that this value is at least $t$.
    3.) The bound constraint is $l \le x < \infty$:
    $\max(|x_0|, |l|)/2$, provided that this value is at least $t$.
    4.) No bound constraint:
    $|x_0|/2$, provided that this value is at least $t$.

  4. There should be a flag to switch on or off the scaling. I suggest switching it off by default.

Note that the above strategy does not work well in the following case, where the problem probably needs a nonlinear transformation (taking the logarithm) before any scaling (recall point 1 above).
- The bound constraints are $10^{-10} \le x \le 10^{10}$. The starting point is $10^{10}$.
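
A minimal Julia sketch of the entrywise strategy from point 3 (the name `default_scaling` is hypothetical, and flooring every factor at $t$ is just one plausible reading of "provided that this value is at least $t$"):

```julia
# Hypothetical sketch of the entrywise scaling strategy; t is the small
# threshold from the discussion (e.g., 1e-3).
function default_scaling(x0::AbstractVector, l::AbstractVector, u::AbstractVector;
                         t::Real = 1e-3)
    s = similar(x0, Float64)
    for i in eachindex(x0)
        if isfinite(l[i]) && isfinite(u[i])       # l <= x <= u
            s[i] = (u[i] - l[i]) / 2              # if below eps, the proposal fixes x at (l+u)/2
        elseif isfinite(u[i])                     # -Inf < x <= u
            s[i] = max(abs(x0[i]), abs(u[i])) / 2
        elseif isfinite(l[i])                     # l <= x < Inf
            s[i] = max(abs(x0[i]), abs(l[i])) / 2
        else                                      # no bound constraint
            s[i] = abs(x0[i]) / 2
        end
        s[i] = max(s[i], t)                       # enforce "at least t"
    end
    return s
end
```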

Thanks.


emmt commented Oct 9, 2023

> 3. The best scaling strategy is only known to the user, who should scale the problem before passing it to the solver.
> 4. There should be a flag to switch on or off the scaling. I suggest switching it off by default.

I totally agree with these two points; this is why I would add the scaling as an optional argument (e.g., a keyword in the Julia interface) whose values (the scaling factors for each variable of the problem) are chosen by the caller.

If the optional scaling is specified as a single value, we could assume that it is the parameter $t$ in the above discussion and compute the scaling factors following what you propose, according to the values of the initial variables and the bounds (x0, l, and u).


zaikunzhang commented Oct 17, 2023

> If the optional scaling is specified as a single value, we could assume that it is the parameter $t$ in the above discussion and compute the scaling factors following what you propose, according to the values of the initial variables and the bounds (x0, l, and u).

The scaling scheme involving t is only a very rough idea. I am not sure whether it is the best scaling we can have without information from the user. However, I do not think it is a good idea to let users input t in any way. It is a parameter that is too technical and should not be exposed to the user.

The MATLAB interface does the following:

  1. Should the problem be scaled? This is indicated by an option called scale, which defaults to false.
  2. If scale is true, then scale the problem in the best way that we can figure out according to the data (e.g., using x0, lb, ub as elaborated above).

We do not accept a scaling that is input by the user --- if the user does know the scaling, then he/she should have scaled the problem before sending it to the solvers. It is not complicated or error-prone to do, and it is unnecessary for the solvers to make things more complicated (e.g., iprint, as you noted) by handling such things.

In other words, we only handle two cases:

  1. the user does not want to scale the problem;
  2. the user asks us to scale the problem and decide how to scale it ourselves.

Thanks.


emmt commented Oct 17, 2023

> We do not accept a scaling that is input by the user --- if the user does know the scaling, then he/she should have scaled the problem before sending it to the solvers. It is not complicated or error-prone to do, and it is unnecessary for the solvers to make things more complicated (e.g., iprint, as you noted) by handling such things.

OK, but then this means that, internally, Powell's algorithms are capable of dealing with non-unit scaling of variables (in particular for the iprint stuff), so why not let the user specify scaling factors if she/he has good reasons to do so and does not want to cope with the complicated task of correctly dealing with these factors in the objective function? An example that comes to my mind is model fitting with heterogeneous parameters: most users would like to express their objective function as simply as possible considering their units of choice for the variables, but they also have an idea of the typical magnitude of the variables at the solution. This information can easily be turned into scaling factors. This was my proposal.


zaikunzhang commented Oct 17, 2023

OK. I agree.

What about the following?

Define two options, scale and scaling_factor, the first one being logical/boolean, and the second being a positive vector of length n (dimension of the problem).

  1. If the user inputs scale = true or does not input it, and specifies scaling_factor, then scale the problem as

    x_before_scaling = scaling_factor * x_after_scaling + x0

    where * is entry-wise.

  2. If the user inputs scale = true but does not specify scaling_factor, then we define the scaling factor in a certain way (e.g., the way outlined above) and do the scaling as in point 1.

  3. If the user inputs scale = false but also inputs scaling_factor, then raise a warning and do not do the scaling.

If the user inputs scaling_factor, we need to verify its validity unless scale is set to false. A scaling factor is valid if each component is positive and finite. If any component is invalid, then we raise a warning and define the scaling factor for this component in our own way. All the valid components of the user-defined scaling factor will be used as is.
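
A hedged sketch of this scale/scaling_factor logic (all names hypothetical; `defaults` stands for the factors the solver would compute itself, e.g., via the `default_scaling` sketch above):

```julia
# Illustrative resolution of the proposed options; not any existing interface.
function resolve_scaling(defaults::AbstractVector;
                         scale::Union{Nothing,Bool} = nothing,
                         scaling_factor::Union{Nothing,AbstractVector} = nothing)
    if scale === false
        if scaling_factor !== nothing
            @warn "scaling_factor is ignored because scale == false"  # point 3
        end
        return nothing
    end
    if scaling_factor === nothing
        # point 2: automatic factors, but only if scaling was requested
        return scale === true ? copy(defaults) : nothing
    end
    s = copy(defaults)                        # point 1: user-supplied factors
    for i in eachindex(s)
        f = scaling_factor[i]
        if isfinite(f) && f > 0
            s[i] = f                          # valid components are used as is
        else
            @warn "invalid scaling_factor[$i] = $f; using default $(s[i])"
        end
    end
    return s
end
```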

The problem with iprint is not solvable. The best we can do is to mention in the documentation that the information printed by the Fortran backend corresponds to the scaled problem if the scaling does happen.


emmt commented Oct 17, 2023

Since, in Julia, we have access to the type of the arguments/keywords, we can combine your suggested logic in a single keyword, say scale, so that when scale = true or, better, scale = :auto (where :auto is the symbolic name auto in Julia), some automatic scaling strategy is applied (along your point 2); otherwise, scale may be set to a vector of scaling factors, or left to its default value false or :none (again a Julia symbolic name). Using symbolic names rather than true or false is more informative about what will be done regarding the scaling of the variables and leaves open the possibility of implementing different strategies identified by their names. Besides, having the scaling of the variables controlled by a single keyword is easier for users to understand.
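
For concreteness, one possible (purely illustrative) reading of this single-keyword idea, reusing the hypothetical `default_scaling` sketch from earlier in the thread:

```julia
# Illustrative only, not an existing API: dispatch on the value/type of `scale`.
function interpret_scale(scale, x0, lb, ub)
    if scale === :none || scale === false
        return nothing                          # default: no scaling
    elseif scale === :auto || scale === true
        return default_scaling(x0, lb, ub)      # some automatic strategy
    elseif scale isa AbstractVector{<:Real}
        all(s -> isfinite(s) && s > 0, scale) ||
            throw(ArgumentError("scaling factors must be finite and positive"))
        return collect(float.(scale))           # caller-chosen factors
    else
        throw(ArgumentError("unsupported value for scale: $scale"))
    end
end
```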

> If the user inputs scaling_factor, we need to verify its validity unless scale is set to false. A scaling factor is valid if each component is positive and finite. If any component is invalid, then we raise a warning and define the scaling factor for this component in our own way. All the valid components of the user-defined scaling factor will be used as is.

I agree that scaling factors must be checked for validity, but I would rather throw an exception (i.e., raise an error rather than a warning) if they are invalid, because it means that the caller has made a wrong choice and must fix it.


zaikunzhang commented Oct 17, 2023

> Since, in Julia, we have access to the type of the arguments/keywords, we can combine your suggested logic in a single keyword, say scale, so that when scale = true or, better, scale = :auto (where :auto is the symbolic name auto in Julia), some automatic scaling strategy is applied (along your point 2); otherwise, scale may be set to a vector of scaling factors, or left to its default value false or :none (again a Julia symbolic name). Using symbolic names rather than true or false is more informative about what will be done regarding the scaling of the variables and leaves open the possibility of implementing different strategies identified by their names. Besides, having the scaling of the variables controlled by a single keyword is easier for users to understand.

This is an excellent idea!

I agree that scale = :auto conveys the idea better than scale = true.

> If the user inputs scaling_factor, we need to verify its validity unless scale is set to false. A scaling factor is valid if each component is positive and finite. If any component is invalid, then we raise a warning and define the scaling factor for this component in our own way. All the valid components of the user-defined scaling factor will be used as is.

> I agree that scaling factors must be checked for validity, but I would rather throw an exception (i.e., raise an error rather than a warning) if they are invalid, because it means that the caller has made a wrong choice and must fix it.

I tend to disagree with this point. The reason is as follows.

When developing a solver (be it optimization, linear algebra, PDE, ...), we should keep an important fact in mind: the "caller" is not necessarily a human. It is very likely another piece of code that calls the solver. The invocation of the solver is probably a quite small part of a bigger project. Moreover, the invocation of the solver may be done repeatedly in a loop.

For example, consider the following pseudo-code.

```
do k = 1, 2, ..., until x_k achieves some kind of convergence
    x_k = X(x_1, ..., x_{k-1}, other data)
    obj_k = O(x_1, ..., x_k, other data)
    scale_k = S(x_1, ..., x_k, other data)
    solver(obj_k, x_k, scale_k)
end do
```

Suppose that, in theory, it is guaranteed that scale_k is always valid. In computation, scale_k may become invalid (some components become zero or small negative values) due to ill-conditioning and rounding errors, even though the "some kind of convergence" is still far from being achieved. In this situation, should we raise an error and terminate the computation, or should we raise a warning, correct the scale_k, and continue with the loop?

In my opinion, a user-friendly package should tolerate any erroneous options as long as the problem being solved is still well-defined. If an option is invalid, then raise a warning and reset it to a correct one. (Of course, exceptions exist.)

Indeed, this happens quite often for linear solvers. Consider a problem where the coefficient matrix is theoretically nonsingular. In practice, the matrix may become numerically singular due to ill-conditioning. If this happens, should we abort the computation and raise an error, or should we raise a warning and solve the problem in the best way we can? I would prefer the latter. (In this case, the problem being solved is not even well-defined in the classical sense.)

What do you think? Thanks.


emmt commented Oct 17, 2023

Well, for me, such an error indicates an erroneous use of the program, which I would not let pass through. My experience is that it's usually a bad idea to sweep this kind of mistake under the carpet, because it delays the moment when one realizes the error and correctly fixes it.

In the case of your example, a possible explicit fix of scale_k is to replace the call to solver by:

```
solver(obj_k, x_k, max(scale_k, eps))
```

assuming max operates elementwise and where eps is any suitable positive threshold. Many other strategies are possible, for instance fixing the problem only when the issue is raised (one can generally catch errors in high-level languages), as in the sketch below.
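
For instance, continuing the pseudo-code example above (obj_k, x_k, and scale_k as in that loop; that the solver throws an ArgumentError on invalid factors is just an assumption):

```julia
# Fix the factors only if the issue is actually raised: catch the validation
# error and retry with the factors clamped entrywise.
result = try
    solver(obj_k, x_k; scale = scale_k)
catch err
    err isa ArgumentError || rethrow()                 # only handle bad factors
    solver(obj_k, x_k; scale = max.(scale_k, eps()))   # clamp and retry
end
```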

We don't have to make a decision right now, especially not until we've defined a rescue strategy. I would propose to cope with this in several steps:

  1. Implement the option of manually scaling the variables. It is the caller's responsibility to provide good scaling factors; an error is raised if these factors are not finite or not strictly positive.
  2. Implement and test good automatic scaling strategies. As you pointed out, this may be a long process.
  3. Decide on more elaborate ways to deal with erroneous scaling factors.

At least implementing step 1 is a prerequisite.


zaikunzhang commented Oct 17, 2023

> such an error indicates an erroneous use of the program, which I would not let pass through.

I agree that some errors likely indicate an erroneous use of the program, which should not be allowed to pass through. This is what I meant by the exceptions when saying the following.

> In my opinion, a user-friendly package should tolerate any erroneous options as long as the problem being solved is still well-defined. If an option is invalid, then raise a warning and reset it to a correct one. (Of course, exceptions exist.)

However, I am not sure whether an invalid scale falls into this category or not. To me, if scale is defined by some code using some data, an invalid scale sounds similar to a numerically singular matrix in the case of linear systems.

> My experience is that it's usually a bad idea to sweep this kind of mistake under the carpet, because it delays the moment when one realizes the error and correctly fixes it.

I agree with this idea in general. That is why users should always be informed when some inputs are erroneous, either by a warning (in most cases) or by an error (in exceptional cases).

> We don't have to make a decision right now, especially not until we've defined a rescue strategy. I would propose to cope with this in several steps:

I agree. Anyway, the particular question of "how to deal with an invalid scale" is not that big an issue (but the question of "how to deal with invalid inputs in general" is important).

Thanks.

jschueller commented

It turns out I'm not so much interested in that anymore.
