In [19]:
# Getting things ready!
!uv pip install statsmodels pandas --quiet # noqa

# Not All Models Are Wrong
## Peter Dresslar

This week, as two foundational Masters-level Complexity Sciences courses began, each shared the same quotation as the basis of its opening discussion of model design and emergent phenomena.

> "All models are wrong. Some are useful."[^1]

This famous saying—promulgated by prominent twentieth-century statistician George Box—is one of the more widely-known commentaries on modeling discipline: so much so, that it has its own Wikipedia page.[^2] 

However, speaking at least from a contemporary perspective, one wonders how helpful it is to welcome new students and prospective practitioners to an exploration in complexity science with such a quixotic passage.

In other words, is "All models are wrong..." useful?

<p align="center">. . .</p>

Before we get ahead of ourselves, it seems appropriate to discuss the context in which the aphorism[^3] was originally published.[^4] Here it is quoted from Box's book chapter, *Robustness in the Strategy of Scientific Model Building*, published in 1979. In the following quote, the capitalization and lack of internal punctuation are both taken from the source text.

>ALL MODELS ARE WRONG BUT SOME ARE USEFUL ... Now it would be very remarkable if any system existing in the real world could be exactly represented by any simple model. However, cunningly chosen parsimonious models often do provide remarkably useful approximations. For example, the law PV = RT relating pressure P, volume V and temperature T of an "ideal" gas via a constant R is not exactly true for any real gas, but it frequently provides a useful approximation and furthermore its structure is informative since it springs from a physical view of the behavior of gas molecules. For such a model there is no need to ask the question "Is the model true?". If "truth" is to be the "whole truth" the answer must be "No". The only question of interest is "Is the model illuminating and useful?".[^5] 

Interestingly, there are actually two separate ideas hinted at in the discussion here. The first is the difficulty in replicating real-world conditions in the form of model math with absolute fidelity. The second is the idea that the value of a model is independent of such fidelity.

Each of these two ideas is developed at length in the chapter, though the latter one is tempered through rigorous discussion of methods for model "Robustification."[^6] As Box was a statistician, those Robustification methods are unsurprisingly rooted deeply in analytical methods from that field.
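Box's gas-law example can be made concrete with a few lines of arithmetic. The sketch below compares the ideal gas law with the van der Waals equation, which is our own choice of comparison and is itself only another approximation. The function names and the carbon dioxide constants are assumptions introduced for illustration; the constants are rounded textbook values.

```python
# Ideal gas law versus the van der Waals equation for one mole of CO2.
# The van der Waals form is our own choice of "more detailed" model;
# it is still an approximation, not the truth about any real gas.

R = 0.082057  # gas constant in L·atm/(mol·K)

# Approximate textbook van der Waals constants for CO2 (assumed values).
CO2_A = 3.59    # L²·atm/mol², attraction between molecules
CO2_B = 0.0427  # L/mol, volume excluded by the molecules themselves


def ideal_pressure(n, volume_l, temp_k):
    """Pressure in atm from PV = nRT."""
    return n * R * temp_k / volume_l


def van_der_waals_pressure(n, volume_l, temp_k, a=CO2_A, b=CO2_B):
    """Pressure in atm from (P + a n²/V²)(V - n b) = nRT."""
    return n * R * temp_k / (volume_l - n * b) - a * n**2 / volume_l**2


for volume in (25.0, 1.0, 0.2):  # litres, at 300 K
    p_ideal = ideal_pressure(1.0, volume, 300.0)
    p_vdw = van_der_waals_pressure(1.0, volume, 300.0)
    gap = abs(p_ideal - p_vdw) / p_ideal
    print(f"V = {volume:5.1f} L  ideal = {p_ideal:8.3f} atm  "
          f"van der Waals = {p_vdw:8.3f} atm  gap = {gap:.1%}")
```

Under these assumptions the two models nearly coincide for a dilute gas and drift apart as the gas is compressed, which is one way of seeing what Box means by a "useful approximation."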

Box would go on to utter the phrase in various forms at conferences and other fora, seemingly as sort of a "catch phrase" that caught on. In his writings and from the discussion of his life on his Wikipedia entry and other authors, it seems very clear that Box was both an iconoclast and—to leverage this author's lived experience—very much a man of his place and time. The pithiness of "all models are wrong" suited those places and times quite well. 

But places and times have changed.

Some academics have commented on the quotation over the years, with at least a few of them seeming to take a bit of issue with it.

>"Finally it does not seem helpful just to say that all models are wrong. The very word model implies simplification and idealization. The idea that complex physical, biological or sociological systems can be exactly described by a few formulae is patently absurd. The construction of idealized representations that capture important stable aspects of such systems is, however, a vital part of general scientific analysis..." [^7]

This commentary, by David Cox, appears in the published discussion of a well-cited paper by Chris Chatfield called *Model Uncertainty, Data Mining and Statistical Inference.* Intriguingly, the comment appears to mix up elements of the Box discussion, given that Box actually refers to an "idealized system" in his prose. It would seem, perhaps, that Cox did not actually have access to the Box.[^8]

Still, his point is well-taken. What else would models exist to do, except to translate the systems of the world around us into "logical devices" through which we can generate understanding and predictive ability? And if such models are providing such utility, to the degree that we are able to trust them, how could we call them wrong?

Roughly fifty years after Box's first coinage of the saying, we are unquestionably living in a new regime of thinking; or, by the estimate of some, a new regime of *un*-thinking. Readers will need little reminder of society's changing attitudes toward basic sciences and the nature of facts themselves. 

<p align="center">...</p>

Whether or not Box's quotation is useful, it is certainly wrong.

Let's demonstrate this with a model.


In [20]:
import datetime, time # noqa

however_many_years = 77
todays_year = datetime.datetime.now().year
start_year = todays_year  # or feel free to choose your own
dt = 11  # years to step

for year in range(start_year, start_year + however_many_years, dt):
    if year % 4 == 0 and (year % 100 != 0 or year % 400 == 0):
        print(f"\r{year} is a leap year     ", end="", flush=True) # no \n
    else:
        print(f"\r{year} is not a leap year", end="", flush=True) # no \n
    time.sleep(1)  #  Seconds. For human-utility purposes.

2095 is not a leap year 

This model is a flawless representation of the Gregorian calendar, a real-world, everyday system used by billions of people. We might estimate that this model has faint but non-zero usefulness.

It is not wrong.
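One way to probe that claim, as a sketch rather than a proof, is to compare the rule from the cell above with an independent check: asking Python's standard `datetime` module whether February 29 exists in a given year. The helper names `is_leap_gregorian` and `has_feb_29` are our own, and starting at 1583 (the first full year after the calendar's introduction) is our own choice of range.

```python
import datetime


def is_leap_gregorian(year: int) -> bool:
    """The same divisibility rule used in the cell above."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def has_feb_29(year: int) -> bool:
    """True if the standard library accepts February 29 of this year."""
    try:
        datetime.date(year, 2, 29)
        return True
    except ValueError:
        return False


# datetime supports years up to 9999, so check every year in that span.
years = range(1583, 10000)
mismatches = [y for y in years if is_leap_gregorian(y) != has_feb_29(y)]
print(f"Checked {len(years)} years; mismatches: {len(mismatches)}")
```

Agreement here only shows that two implementations of the calendar's rules concur. The stronger point is that the Gregorian calendar is itself defined by this rule, so the model and the system coincide.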

But what about more complex, real-world models?

<p align="center">. . . </p>

In the article, *The adaptive value of morphological, behavioural and life-history traits in reproductive female wolves*, Daniel Stahler *et al.* use a dataset of wolf behaviors gathered from over a decade of observations at Yellowstone National Park to build a comprehensive understanding of wolf family and social structures.[^9]

The article describes a model tailored to these data: a generalized linear mixed model, or GLMM. A GLMM uses a link function to relate the expected value of the response to a linear combination of fixed and random effects.

In a separate article, Bolker *et al.* helpfully point out the advantage of the approach, communicating that "(n)onnormal data such as counts or proportions often defy classical statistical procedures. Generalized linear mixed models (GLMMs) provide a more flexible approach for analyzing nonnormal data when random effects are present."[^10] 

This description of GLMMs is not only a helpful description of the model class itself, but also a useful view into the kinds of concrete challenge facing modelers, which is where we started in the first place!
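For readers who prefer notation, a generic Poisson GLMM with a log link and a single random intercept can be written as follows. This is a textbook-style sketch, not the exact specification used in either paper, and the symbols ($y_{ij}$, $x_{ij}$, $\beta_0$, $\beta_1$, $u_j$, $\sigma_u^2$) are our own labels introduced for illustration.

$$
y_{ij} \mid u_j \sim \mathrm{Poisson}(\mu_{ij}), \qquad
\log \mu_{ij} = \beta_0 + \beta_1 x_{ij} + u_j, \qquad
u_j \sim \mathcal{N}(0, \sigma_u^2)
$$

Here $\beta_0$ and $\beta_1$ are the fixed effects, $u_j$ is a random intercept shared by every observation $i$ in group $j$ (a pack or a year, say), and the logarithm is the link function that keeps the expected count $\mu_{ij}$ positive.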

Stahler and colleagues use the GLMM to relate wolf characteristics such as age, body mass, coat color, litter size, and pack size to the survival rates of wolf pups. An example of a "random" effect is the occasional outbreak of disease over the years, which is fitted for in the models. This is precisely the kind of effect that the "mixedness" of GLMM is good at handling.[^11]

Since this is a Python notebook, we can easily build a simplified GLMM (technically, a LMM) that echoes the paper.


In [21]:
# GLMM simplified into Linear Mixed Model (LMM) for easy execution in Statsmodels.
# See, for example, https://github.com/junpenglao/GLMM-in-Python/blob/master/GLMM_in_python.ipynb

import statsmodels.formula.api as smf
import pandas as pd

# Example wolf data, intended to mimic the Stahler data

data = pd.DataFrame({
    'pups_born': [5, 4, 6, 7, 3, 2, 6, 4, 5, 3],
    'body_mass': [34, 37, 42, 47, 29, 44, 39, 36, 41, 33],
    'pack_id': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D', 'E', 'F']
})

# Fit the LMM reminiscent of the Stahler GLMM
model = smf.mixedlm('pups_born ~ body_mass',
                    data=data, 
                    groups=data['pack_id'])  # note that Stahler2013 uses a Poisson GLMM; here we use the Gaussian distribution 

result = model.fit()
print(result.summary())

         Mixed Linear Model Regression Results
Model:            MixedLM Dependent Variable: pups_born
No. Observations: 10      Method:             REML     
No. Groups:       6       Scale:              1.3098   
Min. group size:  1       Log-Likelihood:     -18.1904 
Max. group size:  2       Converged:          Yes      
Mean group size:  1.7                                  
--------------------------------------------------------
           Coef.  Std.Err.    z    P>|z|  [0.025  0.975]
--------------------------------------------------------
Intercept  1.660     4.515  0.368  0.713  -7.188  10.509
body_mass  0.073     0.118  0.621  0.535  -0.158   0.305
Group Var  0.981     1.925                              



While our model (or, perhaps more aptly, our model of a model) above is extremely simplified and departs from the statistical controls presented in the paper's analysis, even this version is somewhat useful in order to conceptualize how the authors came to their conclusions. 
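As a rough aid to reading the summary table above, the `body_mass` row can be reproduced with a little arithmetic. The sketch below assumes the table reports normal-approximation (Wald) statistics, which the `z` and `P>|z|` column headers suggest, and it starts from the rounded coefficient and standard error as printed, so the results match the table only to within rounding.

```python
from statistics import NormalDist

# body_mass row from the summary above, as printed (rounded values).
coef, std_err = 0.073, 0.118

standard_normal = NormalDist()

# Wald statistic: how many standard errors the estimate sits from zero.
z = coef / std_err

# Two-sided p-value under the normal approximation.
p_value = 2 * (1 - standard_normal.cdf(abs(z)))

# 95% interval: estimate plus or minus about 1.96 standard errors.
z_crit = standard_normal.inv_cdf(0.975)
ci_low = coef - z_crit * std_err
ci_high = coef + z_crit * std_err

print(f"z = {z:.3f}, p = {p_value:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

With ten made-up observations, an interval that comfortably straddles zero is what we should expect; the toy data cannot say anything about real wolves.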

While the *Wolves* paper acknowledges the possibility of observational errors and uncertainties, it goes to great lengths in understanding, documenting, and controlling those errors.

And those conclusions seem, to the wolf-layperson, quite striking:

> (O)ur study clarifies how life history, sociality and ecological conditions interact in cooperative breeders and ranks the adaptive value of traits in promoting individual fitness in competitive and stochastic environments... In wolves, it appears that individual performance is influenced more by phenotypes than environmental conditions, and it would be valuable to know if this were true in other taxa.[^12]

According to Google Scholar, the 2013 work has been cited 130 times, with at least a few citations that confirm the original findings. [^13]

So we have a model that is definitely useful in applying ecological controls to a species of animal that needs special management to survive in anthropocenic times. It matches actually observed phenomena of the world to a degree that it can precisely measure. And it even has the power of extensive peer review behind it: peer review not being flawless, of course, but also being a system through which we human beings leverage our collective understanding to improve the verification and contextualization of new knowledge.

How could we call this model wrong?

<p align="center">. . .</p>

Many models are wrong. Or, to style it like Box might have:

<p align="center"><u>MANY MODELS ARE WRONG</u></p>

Just as a thought experiment, consider the following idea with our calendar simulation model, above. Could we change it, and still have it correctly simulate leap years? 

Surely yes. There would be dozens or hundreds of ways to implement the simulation correctly, though as we enumerate more and more correct models, the distinctions between successful implementations would probably become increasingly trivial. Then, if we imagine the very edges of the class of correct (but perhaps overly complicated or oblique) model implementations, we might imagine what lies beyond: myriad possible failing models. Models that fail to account for the leap-year exceptions at certain moduli, or that just do not add fours correctly. Of course, all these definitely-wrong models must vastly outnumber the right ones, as there should be infinitely many ways to get a model to fail. After all, we can always add another operator and another Greek letter, and, *voilà*, the model is still bad.

In fact we might expect the same arrangement to be found with all other models of mundane, universal phenomena: a small kernel of correct models, perhaps a larger cloud of nearly correct models, and then an enveloping sea of wrong answers.

Certainly, from the theoretical consideration of all the models that could ever be, most models are wrong. Nearly all possible models, we might conjecture, are wrong!
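The thought experiment can be run in miniature. The sketch below builds a small, hand-picked family of leap-year rules that share the shape of the rule in the calendar cell but vary its three divisors, then counts how many agree with Python's `calendar.isleap` over eight centuries. The divisor lists, the year range, and the helper `make_rule` are all our own assumptions, chosen only to illustrate the idea.

```python
import calendar
from itertools import product

# 1600-2400 includes century years that are and are not leap years.
YEARS = range(1600, 2401)


def make_rule(a, b, c):
    """A leap-year rule shaped like the original, with adjustable divisors."""
    return lambda year: year % a == 0 and (year % b != 0 or year % c == 0)


# Every combination of candidate divisors is one candidate "model".
candidates = list(product([2, 4, 5, 8], [50, 100, 200], [200, 400, 800]))

# Keep only the candidates that match the calendar for every year tested.
correct = [
    divisors for divisors in candidates
    if all(make_rule(*divisors)(y) == calendar.isleap(y) for y in YEARS)
]

print(f"{len(correct)} of {len(candidates)} candidate rules agree: {correct}")
```

For this particular family, two of the 36 candidates survive: the familiar (4, 100, 400) and the oblique but equivalent (4, 50, 400), since any year divisible by both 4 and 50 is divisible by 100. That is a tiny kernel of correct models, one of them needlessly odd, inside a much larger set of wrong ones.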

Grounding our discussion into the earthly domain, observers of the sciences starting prior to the Twentieth Century, and particularly through the middle years of the second half of that century, have increasingly become skeptical regarding the veracity of many different mathematical approaches to representing our world. These waves of skepticism range from criticism of the real mathematical fidelity of models to real world sources of data---of this wave, Box was certainly a strong proponent---to more general concerns about the ability of scientists to accurately gather data and model phenomena with sufficient precision, especially in complex arenas like ecology and fields of human study. Roughly mid-century in *Cybernetics*, Norbert Wiener aptly summed up these concerns with an analogy:

> [T]he human sciences are very poor testing-grounds for a new mathematical technique: as poor as the statistical mechanics of a gas would be to a being of the order of size of a molecule, to whom the fluctuations which we ignore from a larger standpoint would be precisely the matters of greatest interest.[^17]

Authors like Feyerabend and Kuhn go even further, questioning the ability of scientists in many cases to extract themselves from their own perceptual biases and, crucially, from vocational constraints and social limitations. Indeed, reproducibility is a well-known problem in many fields of research, and the strong social influences on science from the industry of science, the authors argue, lead to a higher production of lower-quality science, including models.

That these arguments do seem to have arrived in waves, spaced roughly every generation starting just before the turn of the Twentieth Century, suggests that there is a strong historiographic framework driving acceptance of model development. While that framework is beyond the ken of this essay, we can turn our attention to the close of the century to sum up the challenges facing model integrity. To do that, we have this lengthy introduction by Steven Pincus from his work, *Approximate entropy as a measure of system complexity*:

> In an effort to understand complex phenomena, investigators throughout science are considering chaos as a possible underlying model. Formulas have been developed to characterize chaotic behavior, in particular to encapsulate properties of strange attractors that represent long-term system dynamics. Recently it has become apparent that in many settings nonmathematicians are applying new "formulas" and algorithms to experimental time-series data prior to careful statistical examination... While mathematical analysis of known deterministic systems is an interesting and deep problem, blind application of algorithms is dangerous, particularly so here. Even for low-dimensional chaotic systems, a huge number of points are needed to achieve convergence in these dimension and entropy algorithms, though they are often applied with an insufficient number of points. [^18]

Pincus succinctly sums up many of the concerns of all of our prior references here, and, happily, assigns them to the problem that is generating those concerns. That is to say, Pincus specifically targets complex phenomena. 

Of course, the fact that complex phenomena are so difficult to work with in systematic representations is particularly vexing to human beings: we live and breathe chaos; as living beings we effectively *are* chaos. We are staggeringly multidimensional, perhaps infinitely so. And thus many of the models we are most interested in (ways to make our society more healthy, or less violent, or more fair) are the most difficult of all to work with. Surely *all* of the models dealing with human behavior are wrong, to some degree. There is no way they could be right.

So, complex phenomena are, generally speaking, the reason that many wrong models are wrong. Out of the Pincus quotation as a whole, we might call out the phrase "even for low-dimensional chaotic systems." It might be noted that this phrase keenly recalls the specific quotation from Box and his gas equations! The difference here being that Pincus has called out a *specific class* of phenomena, for whose models we might especially worry. And then, quite agreeably, he goes on in his article to try to fix those models, at least to some degree.
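The statistic named in the title of Pincus's article, approximate entropy, gives a feel for the kind of repair on offer. What follows is a minimal sketch of the commonly stated definition, not a reproduction of his code or his recommended settings. The function name, the default template length `m`, the tolerance `r` (an absolute value here, where practice often scales it to the data's spread), and both test series are our own assumptions.

```python
import math
import random


def approximate_entropy(series, m=2, r=0.2):
    """Approximate entropy with template length m and absolute tolerance r."""
    n = len(series)

    def phi(length):
        # All overlapping windows ("templates") of the given length.
        templates = [series[i:i + length] for i in range(n - length + 1)]
        total = 0.0
        for a in templates:
            # Count templates within r of `a` in every coordinate.
            # The self-match is included, so the count is never zero.
            close = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(close / len(templates))
        return total / len(templates)

    # How much less often patterns recur when extended by one more point.
    return phi(m) - phi(m + 1)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
regular = [0.0, 1.0] * 100                    # perfectly predictable
noisy = [rng.random() for _ in range(200)]    # uniform random noise

apen_regular = approximate_entropy(regular)
apen_noisy = approximate_entropy(noisy)
print(f"regular series: {apen_regular:.3f}, noisy series: {apen_noisy:.3f}")
```

The perfectly alternating series scores close to zero and the noisy one scores well above it. The estimate still depends on the series length and on `r`, which echoes the warning in the quotation about applying such formulas to too few points.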


Is there a way we could communicate that modeling complex systems is dauntingly challenging, while at the same time not throwing entire wings of the scientific community under the wheels of sweeping scientific budget cuts?

Volker Grimm and Steven Railsback are well-established academics working at the intersection of ecological theory and what they call "individual-based modeling." 

In their seminal textbook, *Individual-based Modeling and Ecology*, Grimm and Railsback iterate through the myriad challenges to be overcome by natural system models. They list errors in measurement, in assumptions, in effects of time and space, and even (or especially) software errors, among many others, as hurdles to be cleared along the way. While they are clear-eyed, to say the least, about the effort required to develop faithful representations of the ecological world, they take a different approach in communicating this:

> We believe strongly that both ecological theory and environmental management benefit when theory development is closely tied to applied ecology. Obviously, theories that have been tested and shown valid can contribute to better management of the systems they address... When we closely link theory to management applications, we force our theory to solve specific problems; finding theory that solves specific problems is a productive path toward finding theory that is general in the sense of solving many problems.[^14]

Grimm and Railsback provide foundational principles, in the form of their "Pattern-Oriented Modelling" approach, that are designed specifically to combat the issues hampering model-realism. Rather than dismissing simplifications as wrong, they attempt to guide practitioners to *methodological simplification*, offering formal guidelines to achieve the "parsimony" advocated for by Box.

In a later publication, Grimm and Railsback synopsize the Cox commentary to some degree, writing, "The goal is to produce ‘structurally realistic’ models that capture, in a simple yet useful way, the system’s generative mechanisms." [^15]

Here, they emphasize the value of models (at least for ecology) as potentially combinatorial, with "simple useful" models interacting with other models to solidify generalizable patterns of knowledge that can be applied more broadly, or even modeled for more selectively. In this way, even directly counterfactual simulations can be used to "bridge" to more solid theoretical grounds in other ecological (or biological, or social, etc.,) systems.

Grimm and Railsback are hardly alone in their efforts, and they have since been followed by cohorts of scientists rapidly advancing the state of the science, all the while assessing "correctness," reviewing usefulness, and communicating constructively about the advancing state of each.

<p align="center">. . .</p>

"All models are wrong but some are useful" is still evocative, pithy as ever, but perhaps no longer entirely useful in 2025. Box *wasn't* wrong[^16], and the amount of constructive academic discourse the saying generated must surely count, over the years, as useful.

Today, though, there are other, clearer ways to say the things that Box was trying to communicate, and those other ways would have the added advantage of avoiding discrediting science at a time when disinformation is, to put it mildly, problematic.

Not all models are wrong, many models are helpful, and making even better models is a great idea. Nearly as great an idea, in fact, as financial support for fundamental science.

---

### Notes

[^1] George Box. Various sources and transcriptions. ca. 1976-1979.

[^2] [Wikipedia](https://en.wikipedia.org/wiki/All_models_are_wrong).

[^3] Wikipedia calls the saying an aphorism, which seems appropriate for a statement so apparently widely used and widely poorly-sourced.

[^4] Sourcing the print origin of the quotation is complicated by the fact that Box first printed *part* of it in his earlier article, *Science and Statistics*, in 1976 (emphasis mine): 

> Parsimony ... Since **all models are wrong** the scientist cannot obtain a "correct" one by excessive elaboration. On the contrary following William of Occam he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity.

One could argue that this is the superior introduction of the concept! But, alas, it does not contain the entirety of the famous quotation.

[^5] It is perhaps worth noting that the model that Box uses in his example is itself not complete. The Ideal Gas Law itself includes an important additional factor, $n$, which hails from Avogadro's law and represents the "number of moles" of the substance being measured (see: LibreTexts, 2023). So, the quote "a physical view of the behavior of gas molecules" is somewhat undercut by their complete absence from the formula Box retells. Perhaps by providing a wrong model, Box was cleverly reinforcing his point.

[^6] Box (1979), p. 204.

[^7] Cox (1995), p. 420.

[^8] A turn of phrase that might be seen as fortunate, or unfortunate, depending upon the perspective of the reader. Given the times and the publication-related factors, it seems likely true, nonetheless.

[^9] Stahler *et al.* (2013), p. 222.

[^10] Bolker *et al.* (2009), p. 127.

[^11] Stahler *et al.* (2013), p. 227. Includes a delightfully comprehensive data outlay.

[^12] Stahler *et al.* (2013), p. 232.

[^13] Google Scholar. The confirming works, retrieved from [scite.ai](scite.ai) in 2025, are Clement *et al.* (2024) and Cassidy *et al.* (2017).

[^14] Grimm & Railsback (2005), p. 8.

[^15] Grimm & Railsback (2012), p. 302.

[^16] Except that, by his own definition, he was.

[^17] Wiener (1948).

### References

Bolker, B. M., Brooks, M. E., Clark, C. J., Geange, S. W., Poulsen, J. R., Stevens, M. H. H., & White, J. S. S. (2009). Generalized linear mixed models: a practical guide for ecology and evolution. Trends in Ecology & Evolution, 24(3), 127-135.

Box, G. E. P. (1976). Science and Statistics. Journal of the American Statistical Association, 71(356), 791-799.

Box, G. E. P. (1979). Robustness in the Strategy of Scientific Model Building. In R. L. Launer & G. N. Wilkinson (Eds.), Robustness in Statistics (pp. 201-236). Academic Press.

Cassidy, K. A., Mech, L. D., MacNulty, D. R., *et al.* (2017). Sexually dimorphic aggression indicates male gray wolves specialize in pack defense against conspecific groups. Behavioural Processes.

Chatfield, C. (1995). Model Uncertainty, Data Mining and Statistical Inference. Journal of the Royal Statistical Society: Series A (Statistics in Society), 158(3), 419-466.

Clement, M. A., Oakleaf, J. K., Heffelfinger, J. R., *et al.* (2024). An evaluation of potential inbreeding depression in wild Mexican wolves. Journal of Wildlife Management.

Cox, D. R. (1995). Discussion of "Model Uncertainty, Data Mining and Statistical Inference" by C. Chatfield. Journal of the Royal Statistical Society: Series A (Statistics in Society), 158(3), 419-466.

Feyerabend, P. (1975). Against Method: Outline of an Anarchistic Theory of Knowledge. New Left Books.

Grimm, V., & Railsback, S. F. (2005). Individual-based Modeling and Ecology. Princeton University Press.

Grimm, V., & Railsback, S. F. (2012). Pattern-oriented modelling: a 'multi-scope' for predictive systems ecology. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1586), 298-310.

Kuhn, T. S. (1962). The Structure of Scientific Revolutions. University of Chicago Press.

LibreTexts. (2023). The Ideal Gas Law. In Chemistry LibreTexts. Retrieved March 16, 2025, from [The Ideal Gas Law](https://chem.libretexts.org/Bookshelves/Physical_and_Theoretical_Chemistry_Textbook_Maps/Supplemental_Modules_(Physical_and_Theoretical_Chemistry)/Physical_Properties_of_Matter/States_of_Matter/Properties_of_Gases/Gas_Laws/The_Ideal_Gas_Law).

Stahler, D. R., MacNulty, D. R., Wayne, R. K., vonHoldt, B., & Smith, D. W. (2013). The adaptive value of morphological, behavioural and life-history traits in reproductive female wolves. Journal of Animal Ecology, 82(1), 222-234.

Wiener, N. (1948). Cybernetics: Or Control and Communication in the Animal and the Machine. MIT Press.