Date | Title | Speaker | Type |
---|---|---|---|
12/12/2020 | Practical Limitations for Deep Learning in Healthcare | Andrew Ng | Workshop (ML4Health) |
- Big gap between research and the production settings where we can actually help patients
- E.g. a paper from Andrew Ng: "DL can achieve radiologist-level performance on 11 pathologies and did not achieve radiologist-level performance on 3 pathologies" -> ML sometimes does as well as Stanford-level radiologists
- This is not exclusive to this area of healthcare: we have had successful (un)supervised ML research in the areas of brain aneurysms, appendicitis, liver cancer etc. - in about half of them we match or surpass human-level performance (diagnostic accuracy)
- If there are so many papers by so many groups showing that there are learning algorithms that do better than doctors - why aren't these systems widely deployed in hospitals yet?
- ML has a PoC-to-production gap problem: the distance between a Jupyter notebook / research paper and a system in production is very large.
- Google paper: "Machine Learning: The High Interest Credit Card of Technical Debt"
- This is the gap between an ML model (something that has learned a mapping) and an ML system (an ML model in production that actually helps patients)
- Small Data issues
- Robustness & Generalization
- Safety & Regulation
- Machine learning works well when the dataset is roughly uniform across classes (as opposed to long-tailed/skewed data)
- E.g. let's zoom in on the claim: "Radiologist-level performance on 11 pathologies and not on 3"
- In pathologies where there were about 11,000 examples, the ML algo did as well as the radiologist
- However, in pathologies where there are 110 examples, ML does significantly worse
- An ML model can do very badly on rare classes and still achieve very good average performance -> This is not OK for deployment
- Easy for ML: Diagnose pneumonia given 10,000 labeled images
- Hard for ML: Diagnose a rare condition from 10 images in a medical textbook explaining that condition
- Academia often aspires to beat human-level performance (HLP) on benchmarks -> HLP can play a large role in establishing a baseline and in performing error analysis
- But for practical systems: if HLP is low (two doctors agree only 60% of the time), consider instead whether there is a way to label more consistently -> Instead of saying "Let's beat this easy HLP!", help improve the HLP first -> This also gives you better data, which makes your ML better. If you have little data, then data labeling accuracy is absolutely critical!
- Data Augmentation: Set up creative automated pipelines for data augmentation
- Data Synthesis
- GANs
- Pretraining
- True for many industries, but also in healthcare
- A slightly different system in a different hospital can lead to a large performance drop
- The standards we should hold an ML system to have not been established yet
- Patient safety is paramount: how do we make sure an ML algorithm does not cause harm?
- Clear processes to audit an ML system to see if it is meeting that standard
- Doctors make few palliative care referrals -> because this feels like "giving up on a patient"
- Volume of patients is just too much
- Estimate the probability of patients dying to see who could be considered for palliative care
- One of the things we learned: "Who are you to decide my patient will die" -> Spend enough time on change management
- Once the doctors saw the explanation/visualization, they could see that the results are plausible
- Scoping: Decide on a problem to solve -> Requires cross-functional brain-storming
- Data: Acquire data to model
- Modeling: Build/Train AI Models -> In AI research, we often keep training data fixed and tune hyperparams in algos. However, often IRL it is useful to keep algo fixed and change the training data and hyperparams
- Deployment: Run in production to create value -> Deploy fast and iterate and improve
- Some directions for full-cycle of ML:
- Data editing and data versioning tools
- Performance Auditing
- Safety Monitoring and Regulatory Compliance
- MLOps Tools for healthcare
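
A minimal sketch of the "performance auditing" idea, tied to the point above that a model can do very badly on rare classes while its average performance looks very good. This is not from the talk: the class names, the counts, the predictions and the `min_recall` threshold of 0.8 are all made-up assumptions for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of all examples that were predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Return {label: recall}, computed separately for every label in y_true."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / n for label, n in support.items()}


def failing_classes(recalls, min_recall=0.8):
    """Labels whose recall is below the (hypothetical) deployment threshold."""
    return sorted(label for label, r in recalls.items() if r < min_recall)


# Hypothetical toy data: 1000 common cases and 10 rare cases.
y_true = ["pneumonia"] * 1000 + ["rare_condition"] * 10
# Hypothetical model output: every common case is right,
# but 8 of the 10 rare cases are labeled as the common class.
y_pred = ["pneumonia"] * 1000 + ["pneumonia"] * 8 + ["rare_condition"] * 2

overall = accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)

print(f"overall accuracy: {overall:.3f}")
for label, r in recalls.items():
    print(f"recall[{label}]: {r:.2f}")
print("below threshold:", failing_classes(recalls))
```

The overall accuracy stays above 99% while the rare class is missed most of the time, so an audit that only looks at the average would pass this model. Reporting per-class numbers (and gating on the worst class) is one simple way to surface the long-tail problem.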