# Stand Up for Best Practices

![img](https://cdn-images-1.medium.com/max/1600/1*jL9fT-oAR6Ki3HOvXpwMLQ.png)
Source: Yuriy Guts selection from Shutterstock

### **Stand Up for Best Practices: Misuse of Deep Learning in Nature’s Earthquake Aftershock Paper**

### The Dangers of Machine Learning Hype

The number of practitioners of AI, machine learning, predictive modeling,
and data science has grown enormously over the last few years. What was once a
niche field defined by its blend of knowledge is becoming a rapidly
growing profession. As the excitement around AI continues to grow, the
new wave of ML augmentation, automation, and GUI tools will lead to even
more growth in the number of people trying to build predictive models.

But here’s the rub: While it becomes easier to use the tools of
predictive modeling, predictive modeling knowledge is not yet a
widespread commodity. Errors can be counterintuitive and subtle, and
they can easily lead you to the wrong conclusions if you’re not careful.

I’m a data scientist who works with dozens of expert data science teams
for a living. In my day job, I see these teams striving to build
high-quality models. The best teams work together to review their models
to detect problems. There are many hard-to-detect ways to end up with a
problematic model (say, by allowing [target
leakage](https://www.datarobot.com/wiki/target-leakage/) into the
training data).
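
To make the idea concrete, here is a minimal sketch of target leakage on synthetic data. Everything in it (the `auc` helper, the `signal` and `leaked` features, the noise levels) is a hypothetical illustration and is not taken from any real dataset: one feature carries honest but noisy information about the label, while the other was accidentally derived from the label itself.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Honest feature: weakly related to the outcome through a noisy process.
signal = [rng.gauss(0, 1) for _ in range(400)]
labels = [1 if s + rng.gauss(0, 2) > 0 else 0 for s in signal]

# Leaked feature (assumption for illustration): computed from the label
# itself, so it would not be available at prediction time.
leaked = [y + rng.gauss(0, 0.1) for y in labels]

honest_auc = auc(labels, signal)
leaked_auc = auc(labels, leaked)
print(f"honest feature AUC: {honest_auc:.3f}")
print(f"leaked feature AUC: {leaked_auc:.3f}")
```

The leaked feature scores a near-perfect AUC while the honest one does not. A score that high on a hard problem is a reason to review the features, not a reason to celebrate.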

Identifying issues is not fun. It requires admitting that exciting
results are “too good to be true” or that the chosen methods were not the
right approach. In other words, **it’s less about the sexy data science
hype that gets headlines and more about a rigorous scientific
discipline**.

### Bad Methods Create Bad Results

Almost a year ago, I read an article in Nature that claimed
unprecedented accuracy in [predicting earthquake aftershocks by using
deep learning](https://www.nature.com/articles/s41586-018-0438-y).
As I read the article, I became deeply suspicious of the
results. **The methods simply didn’t carry many of the hallmarks of
careful predictive modeling.**

I started to dig deeper. In the meantime, this article blew up and
became [widely
recognized](https://blog.google/technology/ai/forecasting-earthquake-aftershock-locations-ai-assisted-science/)!
It was even included in the [release notes for
TensorFlow](https://medium.com/tensorflow/whats-coming-in-tensorflow-2-0-d3663832e9b8)
as an example of what deep learning could do. However, in my digging, I
found two major flaws in the paper: data leakage, which leads to
unrealistic accuracy scores, and a lack of attention to model selection
(you don’t build a 6-layer neural network when a simpler model provides
the same level of accuracy).
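
One way to put the model-selection point into practice is a rule of thumb: among candidates whose validation score is within a small tolerance of the best, prefer the least complex. The sketch below assumes such a rule. The function name, the tolerance, the complexity ranking, and the candidate scores are all made-up placeholders, not numbers from the paper.

```python
def pick_simplest_adequate(candidates, tolerance=0.01):
    """Return the least complex candidate scoring within `tolerance` of the best.

    Each candidate is a (name, complexity, validation_auc) tuple, where
    `complexity` is any number that ranks models from simple to complex.
    """
    best_score = max(score for _, _, score in candidates)
    adequate = [c for c in candidates if best_score - c[2] <= tolerance]
    return min(adequate, key=lambda c: c[1])


# Placeholder scores for illustration only.
candidates = [
    ("six_layer_network", 6, 0.850),
    ("logistic_regression", 1, 0.849),
    ("constant_baseline", 0, 0.500),
]

chosen = pick_simplest_adequate(candidates)
print(chosen[0])
```

The point of the rule is that a complex model has to earn its place by beating a simple baseline by a meaningful margin on held-out data.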

![img](https://cdn-images-1.medium.com/max/1600/1*CPPVFzHd4GXlBSI4EILWZw.png)
The testing dataset had a much higher AUC than the training set . . . this is not normal
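
Comparing training and held-out scores is cheap to automate. The sketch below is a hypothetical helper (the name `suspicious_gap` and the default margin are assumptions) that flags a held-out score beating the training score by more than a small margin. That pattern is worth investigating for leakage or a faulty split before trusting the result.

```python
def suspicious_gap(train_auc, test_auc, margin=0.02):
    """Return True when the test score exceeds the training score by more than `margin`."""
    return (test_auc - train_auc) > margin


# Placeholder scores for illustration only.
print(suspicious_gap(train_auc=0.80, test_auc=0.90))  # test far above train
print(suspicious_gap(train_auc=0.90, test_auc=0.85))  # the usual direction
```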

To my earlier point: these are subtle, **but incredibly basic**
predictive modeling errors that can invalidate the entire results of an
experiment. Data scientists are trained to recognize and avoid these
issues in their work. I assumed that the author had simply overlooked
them, so I contacted her so that she could improve
her analysis. Although we had previously communicated, she did not
respond to my email about my concerns with the paper.

### Falling On Deaf Ears

So, what was I to do? My coworkers told me to just tweet it and let it
go, but I wanted to stand up for good modeling practices. I thought
reason and best practices would prevail, so I started a 6-month process
of writing up my results and shared them with Nature.

Upon sharing my results, I received a note from Nature in January 2019
that despite serious concerns about data leakage and model selection
that invalidate their experiment, they saw no need to correct the
errors, because “**Devries et al. are concerned primarily with using
machine learning as \[a\] tool to extract insight into the natural
world, and not with details of the algorithm design**”. The authors
provided a much harsher response.

You can read the entire exchange [on my
GitHub](https://github.com/rajshah4/aftershocks_issues).

It’s not enough to say that I was disappointed. This was a major journal
(*it’s Nature!*) that bought into AI hype and published a paper despite
its flawed methods.

Then, just this week, I ran across [articles by Arnaud Mignan and Marco
Broccardo](https://link.springer.com/chapter/10.1007/978-3-030-20521-8_1)
on [shortcomings](https://arxiv.org/abs/1904.01983) that they found in
the aftershocks article. Here are two more data scientists with
expertise in earthquake analysis who also noticed flaws in the paper. I
have also placed my analysis and reproducible code [on
GitHub](https://github.com/rajshah4/aftershocks_issues).

![img](https://cdn-images-1.medium.com/max/1600/1*Op19T2cR7gG60fbQLWS5cA.png)
Go run the analysis yourself and see the issue

### Standing Up For Predictive Modeling Methods

I want to make it clear: my goal is not to villainize the authors of the
aftershocks paper. I don’t believe that they were malicious, and I think
that they would argue their goal was to just show how machine learning
could be applied to aftershocks. DeVries is an accomplished earthquake
scientist who wanted to use the latest methods for her field of study
and found exciting results from it.

But here’s the problem: their insights and results were based on
fundamentally flawed methods. It’s not enough to say, “This isn’t a
machine learning paper, it’s an earthquake paper.” **If you use
predictive modeling, then the quality of your results is determined by
the quality of your modeling.** Your work becomes data science work, and
you are on the hook for your scientific rigor.

There is a huge appetite for papers that use the latest technologies and
approaches. It becomes very difficult to push back on these papers.

**But if we allow papers or projects with fundamental issues to advance,
it hurts all of us. It undermines the field of predictive modeling.**

Please push back on bad data science. Report flawed findings to the
journals that published them. And if they don’t take action, go to
Twitter, post about it, share your results, and make noise. This type of
collective action worked to raise awareness of p-values and combat the
epidemic of p-hacking. We need good machine learning practices if we
want our field to continue to grow and maintain credibility.

**Acknowledgments:** I want to thank all the great data scientists at
[DataRobot](http://www.datarobot.com) who collaborated with and supported
me this past year, including Lukas Innig, Amanda Schierz,
Jett Oristaglio, Thomas Stearns, and Taylor Larkin.

**This article was originally posted on
[Medium](https://towardsdatascience.com/stand-up-for-best-practices-8a8433d3e0e8)
and featured on
[Reddit](https://www.reddit.com/r/MachineLearning/comments/c4ylga/d_misuse_of_deep_learning_in_nature_journals/)**