aftershocks_issues

Issues with Deep Learning of Aftershocks by DeVries

This repo focuses on issues I noted in DeVries et al., "Deep learning of aftershock patterns following large earthquakes" (also available via Sci-Hub). This article has been widely used as motivation for deep learning, e.g., in the TensorFlow 2.0 release notes.

I raised concerns about target leakage and the suitability of the data science approach with both the author and Nature. Nature reviewed my concerns and decided not to act. You can view the details of this communication in the correspondence folder.

This repo demonstrates the issues I noted. It is a clone of the original analysis. To understand the issues, work through the notebook, Exploratory Analysis. To run it, you will need the data, which is available on Google Drive. You may also want to see how the original test/train splits were constructed in the DeVries processing repo.

The code uses Python 3. Before running the notebook, download the data and place it in a folder adjacent to this repo.

The notebook has four sections:

  1. Replicating the results in the paper
  2. Replicating the results in the paper, but showing scores on both the test and train sets. Puzzlingly, the scores for the test set are higher than those for the train set.
  3. Replicating similar results using only 1,500 rows of data and 2 epochs (the original paper used 4.7 million rows).
  4. One source of potential leakage in how test/train is constructed
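Point 4 can be illustrated with a toy example. The sketch below uses hypothetical variable names (it is not the paper's actual pipeline): if grid cells are split into test/train row-wise, cells from the same mainshock can land on both sides, letting the model effectively "see" each earthquake during training; a group-aware split keeps every earthquake entirely on one side.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit, train_test_split

# Hypothetical setup: each row is a spatial grid cell, and quake_id marks
# which mainshock the cell belongs to (names are illustrative only).
rng = np.random.default_rng(0)
n = 1000
quake_id = rng.integers(0, 20, size=n)   # 20 synthetic earthquakes
X = rng.normal(size=(n, 3))

# Naive row-wise split: the same earthquake appears in both train and test.
tr, te = train_test_split(np.arange(n), test_size=0.25, random_state=0)
shared_naive = set(quake_id[tr]) & set(quake_id[te])

# Group-aware split: all cells of a given earthquake stay on one side.
gss = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
tr_g, te_g = next(gss.split(X, groups=quake_id))
shared_grouped = set(quake_id[tr_g]) & set(quake_id[te_g])

print(len(shared_naive), len(shared_grouped))
```

With the naive split, many earthquakes are shared between train and test; with the grouped split, none are.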

I want to thank Lukas Innig and Shubham Cheema for their assistance, as well as all the great data scientists at DataRobot who supported me through this process.

Recently, I found papers by Arnaud Mignan and Marco Broccardo that identify issues in the aftershocks paper. See "One neuron is more informative than a deep neural network for aftershock pattern forecasting" (arXiv) and "A Deeper Look into 'Deep Learning of Aftershock Patterns Following Large Earthquakes': Illustrating First Principles in Neural Network Physical Interpretability" (Springer).
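The one-neuron observation is easy to sanity-check: ROC AUC is rank-based, and a single-input logistic unit is a monotone transform of its input, so it scores exactly the same AUC as the raw feature. A small synthetic sketch (the `stress` feature and labels here are made up, not the paper's data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Synthetic scalar feature with a positive relationship to the label.
rng = np.random.default_rng(1)
n = 2000
stress = rng.normal(size=n)
y = rng.random(n) < 1 / (1 + np.exp(-2 * stress))

# AUC of the raw feature used directly as a score.
auc_raw = roc_auc_score(y, stress)

# AUC of a one-input logistic unit (logistic regression on that feature).
clf = LogisticRegression().fit(stress.reshape(-1, 1), y)
auc_lr = roc_auc_score(y, clf.predict_proba(stress.reshape(-1, 1))[:, 1])

# Because AUC depends only on the ranking of scores, and the fitted
# sigmoid is monotone in its input, the two AUCs coincide.
print(auc_raw, auc_lr)
```

In other words, if one well-chosen scalar already ranks cells well, a deep network cannot add much on this metric.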
