Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix aic/bic calculation #263

Merged
merged 6 commits into from
Feb 26, 2021
Merged

fix aic/bic calculation #263

merged 6 commits into from
Feb 26, 2021

Conversation

dbrakenhoff
Copy link
Member

Short Description

  • Fix number of parameters: nparam is number of parameters used in optimization
  • use noise instead of residuals for calculation

Checklist before PR can be merged:

  • closes issue #xxxx
  • is documented
  • PEP8 compliant code
  • tests added / passed
  • passes Travis
  • Example Notebook (for new features)
  • API changes documented in Release Notes

@dbrakenhoff
Copy link
Member Author

I've looked into this again and I think the implementation of AIC/BIC calculation is currently incorrect. This might also explain why these statistics were yielding unexpected results, e.g. models with a much better fit after adding a missing variable resulting in a higher AIC than the model without this missing variable. Basically the penalty for extra parameters was outweighing the calculated increase in likelihood.

Based on this entry in the Wikipedia article about the AIC, I propose the following change:

Current implementation:

-2.0 * log((res ** 2.0).sum()) + 2.0 * nparam

Proposed implementation:

res.index.size * log((res ** 2.0).sum()) + 2.0 * nparam

Furthermore, as discussed before the AIC/BIC are calculated using the noise (uncorrelated residuals). That then leads to the question what to return for these statistics when no noisemodel is present? Do we use the residuals, or do we return NaN for the AIC/BIC? I've now opted for the former option, still calculating a value, but formally it can only be interpreted if the residuals meet the statistical criteria (no autocorrelation, homoscedastic, etc.). I think this makes sense, as the use of a noisemodel is also not necessarily a a guarantee that these criteria are met. The fit report curently sets the AIC/BIC to NaN if noise=False.

This change also leads to larger AIC/BIC values, so the fit_report is also changed slightly to accommodate the larger values.

@raoulcollenteur
Copy link
Member

I checked and this is the same as Lmfit does so I think that makes extra sense

https://github.com/lmfit/lmfit-py/blob/b7d458b548088a8a27d5ddf3c44ff8b224304172/lmfit/minimizer.py#L379

Merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants