Practical Limitations of Today's Deep Learning in Healthcare

Date	Title	Speaker	Type
12/12/2020	Practical Limitations for Deep Learning in Healthcare	Andrew Ng	Workshop (ML4Health)

Introduction

There is a big gap between research and the production setting, where we can actually help patients
For example, a paper from Andrew Ng: "DL can achieve radiologist-level performance on 11 pathologies and did not achieve radiologist-level performance on 3 pathologies" -> ML sometimes does as well as Stanford-level radiologists
This is not exclusive to this area of healthcare: there has been successful (un)supervised ML research on brain aneurysms, appendicitis, liver cancer, etc. In about half of these areas, ML matches or surpasses human-level performance (diagnostic accuracy)
If so many papers by so many groups show learning algorithms that do better than doctors, why aren't these systems widely deployed in hospitals yet?
ML has a PoC-to-production gap problem: the distance between a Jupyter notebook / research paper and a system in production is really, really large.

ML model vs. ML system

Google paper: "Machine Learning: The High-Interest Credit Card of Technical Debt"
This is the gap between an ML model (something that has learned a mapping) and an ML system (an ML model in production that actually helps patients)

Practical Challenges of ML Systems

Small Data issues
Robustness & Generalization
Safety & Regulation

Small data

Machine learning works well when the dataset looks uniform (as opposed to long-tailed/skewed data)
For example, let's zoom in on the claim: "Radiologist-level performance on 11 pathologies and not on 3"
- In pathologies with about 11,000 examples, the ML algorithm did as well as the radiologist
- However, in pathologies with about 110 examples, ML does significantly worse
- An ML model can do very badly on rare classes and still achieve very good average performance -> This is not OK for deployment
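
The last point can be made concrete with a small sketch. The labels and predictions below are made up for illustration (they are not from the paper discussed in the talk), and `per_class_recall` is a hypothetical helper name. The idea is simply to report performance per class next to the average, so a weak rare class cannot hide behind a strong common one.

```python
from collections import defaultdict

def accuracy(y_true, y_pred):
    """Fraction of all examples that were predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_recall(y_true, y_pred):
    """Return {class: fraction of that class's examples predicted correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {c: correct[c] / total[c] for c in total}

# Made-up toy data: 990 cases of a common pathology, 10 of a rare one.
y_true = ["common"] * 990 + ["rare"] * 10
# Hypothetical model: every common case right, only 2 of the 10 rare cases right.
y_pred = ["common"] * 990 + ["rare"] * 2 + ["common"] * 8

overall = accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)

print(f"overall accuracy: {overall:.3f}")
for cls, recall in recalls.items():
    print(f"recall[{cls}]: {recall:.3f}")
```

In this toy setup the overall accuracy is above 99% while recall on the rare class is 20%, which is the failure mode described above.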

Humans vs ML

Easy for ML: Diagnose pneumonia given 10,000 labeled images
Hard for ML: Diagnose a rare condition from 10 images in a medical textbook explaining that condition

Time to rethink human-level performance (HLP)

Academia often aspires to beat human-level performance on benchmarks -> HLP can play a large role in establishing a baseline or in guiding error analysis
But for practical systems: if HLP is low (e.g. two doctors agree only 60% of the time), first consider whether there is a way to label more consistently -> Instead of saying "Let's beat this easy HLP!", help improve the HLP first -> This also gives you better data, which makes your ML better. If you have only a small amount of data, labeling accuracy is absolutely critical!
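
One way to check label consistency before chasing HLP is to measure how often two annotators agree on the same cases. The sketch below uses made-up labels from two hypothetical doctors. Raw agreement matches the "agree only 60% of the time" framing above; Cohen's kappa is an addition not mentioned in the talk, included because it corrects raw agreement for the agreement expected by chance.

```python
from collections import Counter

def observed_agreement(labels_a, labels_b):
    """Fraction of cases on which the two annotators gave the same label."""
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), agreement corrected for chance."""
    n = len(labels_a)
    p_o = observed_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both pick the same label independently.
    p_e = sum(counts_a[l] * counts_b[l] for l in set(labels_a) | set(labels_b)) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label for every case
    return (p_o - p_e) / (1 - p_e)

# Made-up labels for 10 cases (1 = condition present, 0 = absent).
doctor_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
doctor_b = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]

agreement = observed_agreement(doctor_a, doctor_b)
kappa = cohens_kappa(doctor_a, doctor_b)
print(f"raw agreement: {agreement:.2f}")
print(f"Cohen's kappa: {kappa:.2f}")
```

Here the doctors agree on 6 of 10 cases, but with balanced labels half of that agreement is expected by chance, so kappa is much lower than the raw figure. A low value is a signal to work on the labeling guidelines before training on these labels.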

Small Data Solutions

Data Augmentation: Set up creative automated pipelines for data augmentation
Data Synthesis
GANs
Pretraining
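
As a rough illustration of the first item, here is a minimal augmentation sketch in plain Python. It is not the pipeline from the talk: the function names, the choice of transforms (horizontal flip, brightness jitter) and the parameter values are all assumptions, and images are represented as nested lists of floats in [0, 1]. Whether a transform is clinically valid depends on the task; a flip, for example, may be wrong where left/right orientation matters.

```python
import random

def hflip(image):
    """Mirror a 2D image (list of rows) left to right."""
    return [row[::-1] for row in image]

def jitter_brightness(image, max_delta, rng):
    """Add one random offset to all pixels, clipping the result to [0, 1]."""
    delta = rng.uniform(-max_delta, max_delta)
    return [[min(1.0, max(0.0, px + delta)) for px in row] for row in image]

def augment(image, n_copies, seed=0):
    """Produce n_copies randomly flipped and brightness-jittered variants."""
    rng = random.Random(seed)  # seeded so the pipeline is reproducible
    variants = []
    for _ in range(n_copies):
        base = hflip(image) if rng.random() < 0.5 else image
        variants.append(jitter_brightness(base, 0.1, rng))
    return variants

# Made-up 2x3 "image" with pixel intensities in [0, 1].
image = [
    [0.0, 0.5, 1.0],
    [0.2, 0.4, 0.6],
]
augmented = augment(image, n_copies=4)
print(len(augmented), "augmented variants")
```

Real pipelines would typically use an image library and a richer set of transforms; the point here is only that augmentation is an automated, repeatable step applied to the training data.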

Robustness & Generalization

A problem in many industries, including healthcare
A slightly different system in a different hospital can lead to a large performance drop
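
A simple way to surface this kind of drop is to break evaluation results down by site instead of pooling them. The sketch below uses made-up records and hypothetical hospital names; `accuracy_by_site` is an illustrative helper, not something described in the talk.

```python
from collections import defaultdict

def accuracy_by_site(records):
    """records: iterable of (site, y_true, y_pred). Returns {site: accuracy}."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for site, y_true, y_pred in records:
        counts[site] += 1
        if y_true == y_pred:
            hits[site] += 1
    return {site: hits[site] / counts[site] for site in counts}

# Made-up evaluation records: the model was developed on data like hospital_A's.
records = (
    [("hospital_A", 1, 1)] * 9 + [("hospital_A", 1, 0)] * 1
    + [("hospital_B", 1, 1)] * 6 + [("hospital_B", 1, 0)] * 4
)

by_site = accuracy_by_site(records)
gap = max(by_site.values()) - min(by_site.values())

for site, acc in sorted(by_site.items()):
    print(f"{site}: accuracy {acc:.2f}")
print(f"largest gap between sites: {gap:.2f}")
```

Pooled accuracy over these records would be 75%, which hides the fact that one site is at 90% and the other at 60%.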

Safety & Regulations

The standards an ML system should be held to have not been established yet
Patient safety is paramount: how do we make sure an ML algorithm does not cause harm?
We need clear processes to audit an ML system to see if it is meeting that standard

Palliative (end of life) care

Doctors make few palliative care referrals -> because a referral feels like "giving up on a patient"
The volume of patients is just too high
Use the predicted probability of a patient dying to see who could be considered for palliative care
One of the things we learned: doctors react with "Who are you to decide my patient will die?" -> Spend enough time on change management
Explanations/visualizations helped: once the doctors saw them, they could check that the results are plausible

Full cycle of an ML Project

Scoping: Decide on a problem to solve -> Requires cross-functional brainstorming
Data: Acquire data to model
Modeling: Build/train AI models -> In AI research, we often keep the training data fixed and tune the hyperparameters of the algorithm. However, in real-world projects it is often useful to keep the algorithm fixed and change the training data and hyperparameters
Deployment: Run in production to create value -> Deploy fast, then iterate and improve

ML as a systematic engineering discipline

Some directions for the full cycle of ML:
- Data editing and data versioning tools
- Performance Auditing
- Safety Monitoring and Regulatory Compliance
- MLOps Tools for healthcare
