In this proposed Matters Arising contribution, Shah and Innig provide critical commentary on the paper "Deep learning of aftershock patterns following large earthquakes", authored by Devries et al. and published in Nature in 2018. While I think that Shah and Innig raise several valid and interesting points, I do not endorse publication of the comment-and-reply in Matters Arising. I will explain my reasoning in more detail below, but the upshot is that (1) I do not feel that the central results of the study are compromised in any way, and (2) I am not convinced that the commentary is of interest to an audience of non-specialists (that is, readers who are not machine learning practitioners).
Shah and Innig's comment (and Devries and Meade's response) centers on three main points of contention: (1) data leakage, (2) the use of learning curves, and (3) the choice of a deep learning approach in lieu of a simpler machine learning method. Point (1) concerns the partitioning of earthquakes into training and testing datasets. Ideally, these datasets should be completely independent, such that the latter constitutes a truly fair test of the trained model's performance on data it has never seen before. Shah and Innig note that some ruptures in the training dataset are nearly collocated in space and time with ruptures in the testing dataset, and thus a subset of aftershocks is shared between the two. This certainly sets up the potential for information to leak from the training to the testing dataset (violating the desired independence described above), and it would have been better if the authors had implemented grouping or pooling to safeguard against this risk. However, I find Devries and Meade's rebuttal on this point compelling, and would further posit that data leakage between nearby ruptures is a rare enough occurrence that it should not modify the main results significantly.
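For non-specialist readers, the grouping safeguard mentioned above can be made concrete. The following is a minimal sketch (not the authors' code) of a group-aware split, using scikit-learn's GroupShuffleSplit on synthetic placeholder data: every sample is tagged with the ID of its parent rupture, so nearly collocated ruptures cannot contribute aftershocks to both sides of the train/test boundary.

```python
# Sketch of a leakage-safe split, assuming synthetic stand-in data.
# Feature names, group IDs, and sizes here are hypothetical.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_samples = 1000
X = rng.normal(size=(n_samples, 12))          # stand-in for stress-change features
y = rng.integers(0, 2, size=n_samples)        # 1 = aftershock occurred in grid cell
groups = rng.integers(0, 50, size=n_samples)  # ID of the parent rupture

# Hold out entire ruptures, not individual grid cells.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups))

# No rupture appears on both sides of the split.
print(set(groups[train_idx]).isdisjoint(set(groups[test_idx])))
```

A plain random split over grid cells, by contrast, would routinely place cells from the same (or an overlapping) rupture in both sets, which is precisely the leakage pathway Shah and Innig describe.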
Shah and Innig's points (2) and (3) are related, and while they are interesting to me, they are not salient to the central focus of the paper. It is neat (and perhaps worth noting in a supplement) that the trainable parameters of the neural network, its weights and biases, can be adequately trained using a small fraction of the full dataset. Unfortunately, this insight from the proposed learning-curve scheme would likely sail over the heads of the 95% of the general Nature audience who are unfamiliar with the mechanics of neural networks and how they are trained. Likewise, most readers would have no notion of what a Random Forest is, how it differs from a deep neural network, or why it is considered simpler and more transparent. The purpose of the paper (to my understanding) was not to provide a benchmark machine learning algorithm so that future groups could apply more advanced techniques (GANs, variational autoencoders, etc.) to boost AUC performance by 5%. Instead, the paper showed that a relatively simple but purely data-driven approach could predict aftershock locations better than Coulomb stress (the metric used in most studies to date), and could also identify stress-based proxies (maximum shear stress, von Mises stress) that have physical significance and are better predictors than the classical Coulomb stress. In this way, the deep learning algorithm was used as a tool to remove our human bias toward the Coulomb stress criterion, which has been ingrained in our collective psyche by more than 20 years of published literature.
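For readers unfamiliar with the diagnostics behind points (2) and (3), the following sketch illustrates the kind of learning-curve analysis Shah and Innig advocate, run with a simple Random Forest baseline on synthetic stand-in data (the features, labels, and sizes are hypothetical, not the study's data). If the held-out score plateaus well before the full training set is used, the problem does not demand a large dataset or a high-capacity model.

```python
# Learning curve for a simple baseline (Random Forest), assuming
# synthetic placeholder data with one informative feature.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 12))
# Labels carry a learnable signal in the first feature plus noise.
y = (X[:, 0] + 0.5 * rng.normal(size=2000) > 0).astype(int)

sizes, train_scores, test_scores = learning_curve(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # 10% up to 100% of the training pool
    cv=5,
    scoring="roc_auc",
)

for n, score in zip(sizes, test_scores.mean(axis=1)):
    print(f"{n:5d} training samples -> mean held-out AUC {score:.3f}")
```

In this toy setting the curve flattens early, which is the behavior Shah and Innig report: a transparent model trained on a modest subset already captures most of the attainable AUC, so the deep network's extra capacity is not where the scientific result comes from.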
To summarize: regarding point (1), I wish the Devries et al. study had controlled for potential data leakage, but I do not believe the main results would change appreciably if it had. As for point (2), I find it interesting (though not surprising) that the neural network needs only a small fraction of the data to be adequately trained, but this is a minor point of contention relative to the key takeaways of the paper, which Shah and Innig may have missed. Point (3) follows more or less directly from (2): it is intuitive that a simpler and more transparent machine learning algorithm (like a Random Forest) would give performance comparable to a deep neural network. Again, it would have been nice for the manuscript to note that the main insights could have been derived from a different machine learning approach, but this detail is of more interest to a data science or machine learning specialist than to the general Nature audience. I think the disconnect between Shah and Innig and Devries et al. is a matter of perspective. Shah and Innig are concerned primarily with machine learning best practices, formulating the problem as a "Kaggle"-like challenge with proper benchmarking. Devries et al. are concerned primarily with using machine learning as a tool to extract insight about the natural world, not with the details of algorithm design.