Interpretable-AI readings and resources

-- In Progress --

I will be updating this regularly as I read more papers and related material over the course of my research.

Readings and Articles

Research Papers

  • Explaining Neural Networks by Decoding Layer Activations

  • InterpNET: Neural Introspection for Interpretable Deep Learning

    • And its publicly available code
  • What do Deep Networks Like to See?

  • Towards A Rigorous Science of Interpretable Machine Learning

    • (Finale Doshi-Velez et al.) Discusses the field of interpretability research and how it can be made more rigorous and well-defined. The authors first highlight the problem of defining interpretability in the first place; they don't offer a resolution, but suggest we can think of interpretability in terms of what it is used for. They claim that interpretability is used for confirming other important desiderata of ML systems, which stem from an incompleteness in the problem formalization. For example, if we want a system to be unbiased but cannot formally specify this in the reward function, or the reward we're optimising for is only a proxy for the true reward, then we can use interpretability to inspect the model and check whether it is reasoning the way we want it to.

    • The authors next discuss how interpretability methods can be evaluated, providing a taxonomy of three evaluation approaches:

      • Application-grounded: the method is evaluated by real humans in the context it will actually be used in (e.g. doctors receiving explanations for AI diagnoses).

      • Human-grounded: simpler human-subject experiments, possibly with participants who are not domain experts, using possibly simpler tasks than the method's intended purpose.

      • Functionally-grounded: no humans are involved; instead, some formal notion of interpretability is measured to evaluate the method's quality.

      Each of these evaluation approaches is appropriate in different circumstances, depending on the method and the context in which it will be used.

    • Finally, the authors propose a data-driven approach to understanding the factors that matter for interpretability: create a dataset of applications of machine learning models to tasks, then analyse it to identify important factors. They list some possible task- and method-related factors, and conclude with recommendations for researchers working on interpretability.

  • Visualizing and Understanding Convolutional Networks - also covered in the Coursera CNN course, Week 4

Books

Projects


Cool Tools
