
Improve optimisers for +R models #56

Open
roblanf opened this issue Feb 10, 2022 · 2 comments
Labels
modelfinder2 things to do before benchmarking modelfinder2

Comments

@roblanf
Collaborator

roblanf commented Feb 10, 2022

On many datasets I notice warnings like:

WARNING: Log-likelihood -7226.94 of K2P+R4 worse than K2P+R3 -7059.91

Obviously this shouldn't happen: since R3 is nested within R4, the lnL with R4 should always be at least as good as with R3. I'm guessing this is just a limitation of the current optimiser, and in many cases it seems like a fairly big one. E.g. in the example above the difference is >150 log-likelihood units.

So, I have a suggestion. When we optimise RN+1 (e.g. R4), we should do an initialisation step where we start with the ML rate parameters from RN (e.g. R3) and just add one extra rate while holding the initial N parameters constant. We can then optimise this constrained model, e.g. by sliding the new parameter from the minimum bound up to double the maximum rate from RN. My bet is that this will often give us an RN+1 model with a better likelihood. But even if it doesn't, we can then pass these RN+1 rates to the BFGS or EM optimiser to further optimise them all together.

Thoughts @bqminh and @thomaskf? This is really just a constrained EM step to start with. And maybe we already do something like this.

Either way, it seems like there's room for improvement here.

@roblanf roblanf added the modelfinder2 things to do before benchmarking modelfinder2 label Feb 10, 2022
@bqminh
Collaborator

bqminh commented May 20, 2024

This is done already, i.e. the R4 parameters are initialised from the R3 ones. In the code it's this function: RateFree::initFromCatMinusOne() in model/ratefree.cpp:

```cpp
void RateFree::initFromCatMinusOne() {
    // ...
}
```
However, this is only one way of initialisation. Happy to chat if you have 'better' suggestions. Anyway, a lot of testing will be needed...

@roblanf
Collaborator Author

roblanf commented May 20, 2024

My suggestion is not quite the same. It's that we initially hold the CatMinusOne parameters constant and optimise only the new parameter. Once that's done, we optimise all of them together.

E.g. if R2 gave: 0.1, 2.0

Then we initialise R3 with 0.1, 2.0, New

And we hold 0.1 and 2.0 constant while finding the optimum value of New (allowing it to take any value from the minimum to the maximum bound, i.e. smaller than 0.1, between 0.1 and 2.0, or larger than 2.0).

This might give e.g.

R3: New=0.08, 0.1, 2.0; 0.1, New=0.5, 2.0; 0.1, 2.0, New=3.2 etc.

It should be simple to find a better likelihood like this, because we are optimising a single parameter.

The final step is to optimise all parameters at once, using the values from the prior step as the initialisation.

Happy to test this if someone can implement it.
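To make the proposal concrete, here is a minimal sketch of the two-stage scheme on a toy objective. Everything here is hypothetical: `toyLogLik` is a synthetic quadratic surface standing in for the real tree likelihood, and `scanNewRate` / `refineAll` are illustrative names, not IQ-TREE functions. Stage 1 grid-scans the new category's rate with the RN rates held fixed; stage 2 does a crude coordinate-wise hill climb standing in for the BFGS/EM pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <vector>

// Toy stand-in for the log-likelihood surface. Its optimum is NOT at
// the R2 rates, so adding a third category can genuinely improve it.
double toyLogLik(const std::vector<double>& rates) {
    static const double target[3] = {0.08, 0.9, 3.0};  // hidden "true" rates
    double lnl = 0.0;
    for (double r : rates) {
        double best = -1e18;
        for (double t : target)  // each rate scores best near some target
            best = std::max(best, -(r - t) * (r - t));
        lnl += best;
    }
    return lnl;
}

// Stage 1: hold the RN rates fixed, grid-scan the new category's rate
// between `lo` and `hi`, and return the best-scoring candidate.
double scanNewRate(const std::vector<double>& fixedRates,
                   double lo, double hi, int steps, double* bestLnl) {
    double bestRate = lo, best = -1e18;
    for (int i = 0; i <= steps; ++i) {
        double r = lo + (hi - lo) * i / steps;
        std::vector<double> rates = fixedRates;
        rates.push_back(r);
        double lnl = toyLogLik(rates);
        if (lnl > best) { best = lnl; bestRate = r; }
    }
    if (bestLnl) *bestLnl = best;
    return bestRate;
}

// Stage 2: crude joint refinement of all rates (coordinate-wise hill
// climbing with a shrinking step), standing in for BFGS/EM.
double refineAll(std::vector<double>& rates, int iters) {
    double step = 0.1;
    double cur = toyLogLik(rates);
    for (int it = 0; it < iters; ++it) {
        for (size_t k = 0; k < rates.size(); ++k) {
            for (double d : {step, -step}) {
                std::vector<double> trial = rates;
                trial[k] += d;
                if (trial[k] <= 0.0) continue;  // rates must stay positive
                double lnl = toyLogLik(trial);
                if (lnl > cur) { cur = lnl; rates = trial; }
            }
        }
        step *= 0.7;  // anneal the step size
    }
    return cur;
}
```

Because stage 1 is a one-dimensional scan, it is cheap and hard to get wrong, and stage 2 only ever accepts improvements, so the final score can never be worse than the constrained initialisation.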
