fregu856/papers

I categorize, annotate and write comments for all research papers I read (320+ papers since 2018).

In June 2023, I wrote the blog post The How and Why of Reading 300 Papers in 5 Years (why I think it’s important to read a lot of papers + how I organize my reading + paper statistics + a list of 30 particularly interesting papers).

Categories:

[Uncertainty Estimation], [Ensembling], [Stochastic Gradient MCMC], [Variational Inference], [Out-of-Distribution Detection], [Theoretical Properties of Deep Learning], [VAEs], [Normalizing Flows], [ML for Medicine/Healthcare], [Object Detection], [3D Object Detection], [3D Multi-Object Tracking], [3D Human Pose Estimation], [Visual Tracking], [Sequence Modeling], [Reinforcement Learning], [Energy-Based Models], [Neural Processes], [Neural ODEs], [Transformers], [Implicit Neural Representations], [Distribution Shifts], [ML & Ethics], [Diffusion Models], [Graph Neural Networks], [Selective Prediction], [NLP], [Representation Learning], [Vision-Language Models], [Image Restoration], [Miscellaneous].

Papers:

Papers Read in 2023:

[23-03-14] [paper322]
Well-written and interesting paper. Figure 1 is really interesting. Their proposed method makes intuitive sense, and it seems to consistently improve the regression accuracy. 
[23-09-13] [paper321]
  • Adding Conditional Control to Text-to-Image Diffusion Models [pdf] [annotated pdf]
  • 2023-02
  • [Diffusion Models], [Vision-Language Models]
Well-written and quite interesting paper. The "sudden convergence phenomenon" in Figure 4 seems odd. The results in Figure 11 are actually very cool.
[23-08-23] [paper320]
  • Random Word Data Augmentation with CLIP for Zero-Shot Anomaly Detection [pdf] [annotated pdf]
  • BMVC 2023
  • [Vision-Language Models]
Interesting and well-written paper, I enjoyed reading it (even though I really don't like the BMVC template). The proposed method in Section 3 is clever/neat/interesting.
[23-08-23] [paper319]
  • TextIR: A Simple Framework for Text-based Editable Image Restoration [pdf] [annotated pdf]
  • 2023-02
  • [Vision-Language Models], [Image Restoration]
Quite interesting and well-written paper. The idea in Section 3.1 is interesting/neat. The results in Figure 6 - 8 are quite interesting.
[23-08-22] [paper318]
  • Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model [pdf] [annotated pdf]
  • CVPR 2022
  • [Vision-Language Models], [Object Detection]
Quite interesting and quite well-written paper. They basically improve the "Open-vocabulary Object Detection via Vision and Language Knowledge Distillation" paper by using learnable prompts. Section 4.1 gives a pretty good background.
[23-08-21] [paper317]
  • Open-vocabulary Object Detection via Vision and Language Knowledge Distillation [pdf] [annotated pdf]
  • ICLR 2022
  • [Vision-Language Models], [Object Detection]
Quite well-written and fairly interesting paper. The simple (but slow) baseline in Section 3.2 makes sense, but then I struggled to properly understand the proposed method in Section 3.3. I might lack some required background knowledge.
[23-08-21] [paper316]
  • All-In-One Image Restoration for Unknown Corruption [pdf] [annotated pdf]
  • CVPR 2022
  • [Image Restoration]
Quite well-written and fairly interesting paper. Did not take very long to read. The general idea of the "Contrastive-Based Degradation Encoder" makes sense.
[23-08-18] [paper315]
  • ProRes: Exploring Degradation-aware Visual Prompt for Universal Image Restoration [pdf] [annotated pdf]
  • 2023-06
  • [Image Restoration]
Quite interesting and well-written paper. If I understand everything correctly, they need a user to select the correct task-specific visual prompt at test-time. I.e., the user needs to specify if a given input image is an image for denoising, low-light enhancement, deraining or deblurring. This seems like a quite significant limitation to me. I would like to have a model that, after being trained on restoration tasks 1, 2, ..., N, can restore a given image without any user input, for images from all N tasks.
[23-08-17] [paper314]
  • PromptIR: Prompting for All-in-One Blind Image Restoration [pdf] [annotated pdf]
  • NeurIPS 2023
  • [Image Restoration]
Well-written and quite interesting paper. They describe their overall method well in Section 3. I was not familiar with prompt-learning, but I think they did a good job explaining it.
[23-08-17] [paper313]
  • InstructPix2Pix: Learning to Follow Image Editing Instructions [pdf] [annotated pdf]
  • CVPR 2023
  • [Diffusion Models], [Vision-Language Models]
Well-written and quite interesting paper. The method is conceptually simple and makes intuitive sense. Definitely impressive visual results (I'm especially impressed by Figure 7 and the right part of Figure 17). Figure 14 is important, interesting to see such a clear example of gender bias in the data being reflected in the model.
[23-09-23] [paper312]
  • Machine learning: Trends, Perspectives, and Prospects [pdf] [annotated pdf]
  • Science, 2015
  • [Miscellaneous]
Well-written paper. I read it for my thesis writing, as I wanted to see some basic definitions of machine learning. I quite liked it.
[23-09-19] [paper311]
  • Blinded, Randomized Trial of Sonographer versus AI Cardiac Function Assessment [pdf] [annotated pdf]
  • Nature, 2023
  • [ML for Medicine/Healthcare]
Well-written and interesting paper. It seems like I quite enjoy reading these types of papers.
[23-09-19] [paper310]
  • Self-Supervised Learning with Data Augmentations Provably Isolates Content from Style [pdf] [annotated pdf]
  • NeurIPS 2021
  • [Representation Learning]
Quite well-written and somewhat interesting paper. I struggled to properly understand quite large parts of it, probably because I lack some background knowledge. It's not clear to me what the main takeaway / main practical implication of this paper is.
[23-09-14] [paper309]
  • Artificial Intelligence-Supported Screen Reading versus Standard Double Reading in the Mammography Screening with Artificial Intelligence Trial (MASAI): A Clinical Safety Analysis of a Randomised, Controlled, Non-inferiority, Single-Blinded, Screening Accuracy Study [pdf] [unfortunately not open access, thus no annotated pdf]
  • The Lancet Oncology, 2023
  • [ML for Medicine/Healthcare]
Very very similar to the "Artificial Intelligence for Breast Cancer Detection in Screening Mammography in Sweden: A Prospective, Population-Based, Paired-Reader, Non-inferiority Study" paper, also well written and very interesting (and, it probably has the longest title of any paper I have ever read).
[23-09-14] [paper308]
  • Efficient Formal Safety Analysis of Neural Networks [pdf] [annotated pdf]
  • NeurIPS 2018
  • [Theoretical Properties of Deep Learning]
Well-written and quite interesting paper. I didn't entirely follow all details, and also find it difficult to know exactly how to interpret the results. I lack some background knowledge. I'm still not quite sure what a method like this actually could be used for in practice, how useful it actually would be for someone like me. Reading the paper made me think quite a lot though, which is a good thing.
[23-09-14] [paper307]
  • Artificial Intelligence for Breast Cancer Detection in Screening Mammography in Sweden: A Prospective, Population-Based, Paired-Reader, Non-inferiority Study [pdf] [annotated pdf]
  • The Lancet Digital Health, 2023
  • [ML for Medicine/Healthcare]
Well-written and very interesting paper. A bit different compared to the ML papers I usually read of course, but different in a good way. Definitely an impressive study with ~50 000 participants, and an ML system integrated into the standard mammography screening workflow at a hospital. The entire Discussion section is interesting.
[23-08-31] [paper306]
  • A Law of Data Separation in Deep Learning [pdf] [annotated pdf]
  • PNAS, 2023
  • [Theoretical Properties of Deep Learning]
Quite well-written and fairly interesting paper. Interesting up until the end of Section 1, but then I got a bit confused and less impressed/convinced. Difficult for me to judge how general these findings actually are, or how useful they would be in practice. 
[23-08-24] [paper305]
  • Loss Landscapes are All You Need: Neural Network Generalization Can Be Explained Without the Implicit Bias of Gradient Descent [pdf] [annotated pdf]
  • ICLR 2023
  • [Theoretical Properties of Deep Learning]
Interesting and quite well-written paper. I really liked the paper up until and including Section 4.1, but then I got less impressed. I found the experiments a bit confusing overall, and not entirely convincing. The paper structure is also a bit odd after Section 4 (why is Section 5 a separate section? Section 6 seems sort of out-of-place).
[23-08-15] [paper304]
Interesting and very well-written paper. Cool applications from biology, although I definitely don't understand them fully. Don't quite understand how they get to the loss function in eq. (7) (match the terms in (3) and (6), yes, but why should this then be minimized?). 
[23-06-14] [paper303]
  • Transport with Support: Data-Conditional Diffusion Bridges [pdf] [annotated pdf]
  • 2023-01
  • [Diffusion Models]
Well written and quite interesting paper. Not exactly what I had expected, my background knowledge was probably not sufficient to fully understand and appreciate this paper. I still enjoyed reading it though. Neat figures and examples.
[23-06-09] [paper302]
  • An Overlooked Key to Excellence in Research: A Longitudinal Cohort Study on the Association Between the Psycho-Social Work Environment and Research Performance [pdf] [annotated pdf]
  • Studies in Higher Education, 2021
  • [Miscellaneous]
Quite well written and interesting paper. I wanted to read something completely different compared to what I usually read. This paper was mentioned in a lecture I attended and seemed quite interesting. I don't regret reading it. I had never heard of "SEM analysis" before, so it's difficult for me to judge how significant the results are. I think one can quite safely conclude that a good psycho-social work environment positively impacts research performance/excellence, but it's probably difficult to say *how* big this impact actually is, and how big it is compared to various other factors. Either way, I quite enjoyed reading the paper.
[23-06-03] [paper301]
  • Building One-class Detector for Anything: Open-vocabulary Zero-shot OOD Detection Using Text-image Models [pdf] [annotated pdf]
  • 2023-05
  • [Out-of-Distribution Detection], [Vision-Language Models]
Well written and interesting paper. Section 2.1 provides a good background, and their proposed OOD scores in Section 2.2 make intuitive sense. The datasets and evaluation setup in Section 3 are described well. The experimental results definitely seem promising.
[23-06-02] [paper300]
  • Benchmarking Common Uncertainty Estimation Methods with Histopathological Images under Domain Shift and Label Noise [pdf] [annotated pdf]
  • 2023-01
  • [Uncertainty Estimation], [ML for Medicine/Healthcare]
Well written and fairly interesting paper. The setup with ID/OOD data (different clinics and scanners), as described in Section 3.1, is really neat. Solid evaluation. I was not overly surprised by the results/findings. Figure 3 is neat.
[23-06-02] [paper299]
  • Mechanism of Feature Learning in Deep Fully Connected Networks and Kernel Machines that Recursively Learn Features [pdf] [annotated pdf]
  • 2022-12
  • [Theoretical Properties of Deep Learning]
Well written and quite interesting paper. Not my main area of expertise, and I would have needed to read it again to properly understand everything. Certain things seem potentially interesting, especially Sections 2.1 and 2.2, but I struggle a bit to formulate one main takeaway.
[23-05-31] [paper298]
  • Simplified State Space Layers for Sequence Modeling [pdf] [annotated pdf]
  • ICLR 2023
  • [Sequence Modeling]
Well written and quite interesting paper (although not my main area of interest). Did not follow all details in Section 3 and 4.
[23-05-27] [paper297]
  • CARD: Classification and Regression Diffusion Models [pdf] [annotated pdf]
  • NeurIPS 2022
  • [Diffusion Models], [Uncertainty Estimation]
Quite well written and somewhat interesting paper. I focused mainly on the regression part, I found the classification part a bit confusing. For regression they just illustrate their method on 1D toy examples, without any baseline comparisons, and then evaluate on the UCI regression benchmark. Also, they don't compare with other simple models which can handle multi-modal p(y|x) distributions, e.g. GMMs, normalizing flows or EBMs.
[23-05-27] [paper296]
  • Inversion by Direct Iteration: An Alternative to Denoising Diffusion for Image Restoration [pdf] [annotated pdf]
  • 2023-03
  • [Diffusion Models]
Interesting paper. Quite a few small typos, but overall well written. The approach becomes very similar to our paper "Image Restoration with Mean-Reverting Stochastic Differential Equations". The basic idea, training a normal regression model but letting it predict iteratively, makes intuitive sense. Figure 3 is interesting, with the trade-off between perceptual and distortion metrics, that the number of steps controls this trade-off. Figure 5 is also interesting, that adding noise (epsilon > 0) is crucial for improved perceptual metrics here. However, I don't quite understand why adding noise is beneficial for super-resolution and JPEG restoration, but not for motion/defocus deblurring? Is there some fundamental difference between those tasks?
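The basic idea above (a regression model applied iteratively, blending its clean-image prediction into the current iterate) can be sketched roughly as follows. This is only my minimal reading of the InDI-style update, not the authors' code; `predict_x0` is a hypothetical stand-in for the trained restoration network, and the noise injection (epsilon > 0) discussed above is omitted:

```python
import numpy as np

def indi_restore(y, predict_x0, steps=10):
    """Iterative restoration sketch: start from the degraded input y
    (time t=1) and repeatedly blend in the model's prediction of the
    clean image, stepping t down to 0. `predict_x0(x_t, t)` is an
    assumed interface for the trained regression network."""
    x_t, t = y, 1.0
    delta = 1.0 / steps
    for _ in range(steps):
        # Move a fraction delta/t of the way toward the predicted clean image.
        x_t = (delta / t) * predict_x0(x_t, t) + (1 - delta / t) * x_t
        t -= delta
    return x_t
```

With a perfect oracle for the clean image, the iteration recovers it exactly at t=0; with a real network, more steps trade distortion for perceptual quality, as in Figure 3 of the paper.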
[23-05-25] [paper295]
Well written and interesting paper. Reading it raised a few questions though. It is not quite clear to me why the moving average technique is needed during training ("the EMA update and 'stopgrad' operator in Eq. (8) can greatly stabilize the training process", why is the training unstable without it?). Algo 1 also seems somewhat heuristic? And in Figure 4 it seems like while doing 2 steps instead of 1 step improves the sample quality significantly, doing 4 steps gives basically no additional performance gain? I was expecting to see the CD sample quality to converge towards that of the original diffusion model as the number of steps increases, but here a quite significant gap seems to remain?
[23-05-12] [paper294]
  • Collaborative Strategies for Deploying Artificial Intelligence to Complement Physician Diagnoses of Acute Respiratory Distress Syndrome [pdf] [annotated pdf]
  • npj Digital Medicine, 2023
  • [ML for Medicine/Healthcare]
Well written and quite interesting paper. A bit different (in a good way) compared to the pure ML papers I usually read. "It could communicate alerts to the respiratory therapist or nurses without significant physician oversight, only deferring to the physician in situations where the AI model has high uncertainty. This may be particularly helpful in low-resource settings, such as Intensive Care Units (ICU) without 24-hour access to critical care trained physicians", this would require that the model actually is well calibrated though (that you really can trust the model's uncertainty), and I'm not convinced that can be expected in many practical applications.
[23-05-03] [paper293]
Well-written and interesting paper. The overall approach becomes very similar to our paper "Image Restoration with Mean-Reverting Stochastic Differential Equations" (concurrent work) it seems, and I find it quite difficult to see what the main qualitative differences actually would be in practice. Would be interesting to compare the restoration performance. I didn't fully understand everything in Section 3.
[23-04-27] [paper292]
  • Assaying Out-Of-Distribution Generalization in Transfer Learning [pdf] [annotated pdf]
  • NeurIPS 2022
  • [Distribution Shifts]
Well-written and quite interesting paper. Just image classification, but a very extensive evaluation. Contains a lot of information, and definitely presents some quite interesting takeaways. Almost a bit too much information perhaps. I really liked the formatting, with the "Takeaway boxes" at the end of each subsection.
[23-04-20] [paper291]
  • A Deep Conjugate Direction Method for Iteratively Solving Linear Systems [pdf] [annotated pdf]
  • 2022-05
  • [Miscellaneous]
Quite well-written and somewhat interesting paper. I really struggled to understand everything properly, I definitely don't have the required background knowledge. I don't quite understand what data they train the network on, do they train separate networks for each example? Not clear to me how generally applicable this method actually is.
[23-04-15] [paper290]
  • A Roadmap to Fair and Trustworthy Prediction Model Validation in Healthcare [pdf] [annotated pdf]
  • 2023-04
  • [ML for Medicine/Healthcare]
A different type of paper compared to what I normally read (the title sounded interesting and I was just curious to read something a bit different). A quick read, fairly interesting. Not sure if I agree with the authors though (it might of course also just be that I don't have a sufficient background understanding). "...some works consider evaluation using external data to be stringent and highly encouraged due to the difference in population characteristics in evaluation and development settings. We propose an alternative roadmap for fair and trustworthy external validation using local data from the target population...", here I would tend to agree with the first approach, not their proposed alternative.
[23-04-15] [paper289]
  • Deep Anti-Regularized Ensembles Provide Reliable Out-of-Distribution Uncertainty Quantification [pdf] [annotated pdf]
  • 2023-04
  • [Uncertainty Estimation]
Well-written and fairly interesting paper. The idea is quite interesting and neat. I like the evaluation approach used in the regression experiments, with distribution shifts. Their results in Table 1 are a bit better than the baselines, but the absolute performance is still not very good. Not particularly impressed by the classification OOD detection experiments.
[23-04-15] [paper288]
  • SIO: Synthetic In-Distribution Data Benefits Out-of-Distribution Detection [pdf] [annotated pdf]
  • 2023-03
  • [Out-of-Distribution Detection]
Well-written and fairly interesting paper. Extremely simple idea and it seems to quite consistently improve the detection performance of various methods a bit. Another potentially useful tool. 
[23-04-05] [paper287]
  • Evaluating the Fairness of Deep Learning Uncertainty Estimates in Medical Image Analysis [pdf] [annotated pdf]
  • MIDL 2023
  • [Uncertainty Estimation], [ML for Medicine/Healthcare]
Well-written and somewhat interesting paper. The studied problem is interesting and important, but I'm not sure about the evaluation approach. "when the uncertainty threshold is reduced, thereby increasing the number of filtered uncertain predictions, the differences in the performances on the remaining confident predictions across the subgroups should be reduced", I'm not sure this is the best metric one could use. I think there are other aspects which also would be important to measure (e.g. calibration). Also, I find it difficult to interpret the results or compare methods in Figure 2 - 4.
[23-03-30] [paper286]
  • PID-GAN: A GAN Framework based on a Physics-informed Discriminator for Uncertainty Quantification with Physics [pdf] [annotated pdf]
  • KDD 2021
  • [Uncertainty Estimation]
Quite well-written and somewhat interesting paper. Compared to the "PIG-GAN" baseline, their method seems to be an improvement. However, I'm not overly convinced about the general method, it sort of seems unnecessarily complicated to me. 
[23-03-23] [paper285]
  • Resurrecting Recurrent Neural Networks for Long Sequences [pdf] [annotated pdf]
  • 2023-03
  • [Sequence Modeling]
Quite well-written and quite interesting paper. I did not really have the background knowledge necessary to properly evaluate/understand/appreciate everything. The paper is quite dense, contains a lot of detailed information. Still quite interesting though, seems to provide a number of relatively interesting insights.
[23-03-16] [paper284]
Interesting and well-written paper. A bit different from the papers I usually read, but in a good way. I enjoyed reading it and it made me think.
[23-03-11] [paper283]
  • How to Exploit Hyperspherical Embeddings for Out-of-Distribution Detection? [pdf] [annotated pdf]
  • ICLR 2023
  • [Out-of-Distribution Detection]
Very well-written and quite interesting paper. Very similar to "Out-of-Distribution Detection with Deep Nearest Neighbors", just use their proposed loss in equation (7) instead of SupCon, right? Somewhat incremental I suppose, but it's also quite neat that such a simple modification consistently improves the OOD detection performance. The analysis in Section 4.3 is also quite interesting.
[23-03-11] [paper282]
  • Out-of-Distribution Detection with Deep Nearest Neighbors [pdf] [annotated pdf]
  • ICML 2022
  • [Out-of-Distribution Detection]
Interesting and very well-written paper, I enjoyed reading it. They propose a simple extension of "SSD: A Unified Framework for Self-Supervised Outlier Detection": to use kNN distance to the train feature vectors instead of Mahalanobis distance. Very simple and intuitive, and consistently improves the results.
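The kNN score described above is simple enough to sketch in a few lines. This is my own toy rendering of the idea (L2-normalized features, negative distance to the k-th nearest training feature as the in-distribution score), not the authors' implementation, and the brute-force distance computation is only for illustration:

```python
import numpy as np

def knn_ood_score(train_feats, test_feats, k=50):
    """Negative distance to the k-th nearest training feature
    (higher score = more in-distribution). Features are
    L2-normalized first, as described in the paper."""
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    # Brute-force pairwise Euclidean distances (use an ANN library at scale).
    dists = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=-1)
    kth = np.sort(dists, axis=1)[:, k - 1]
    return -kth  # threshold this score to flag OOD inputs

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 16))   # stand-in for train feature vectors
scores = knn_ood_score(train, rng.normal(size=(10, 16)), k=10)
```

A calibration set would then be used to pick the score threshold at a desired true-positive rate.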
[23-03-11] [paper281]
  • SSD: A Unified Framework for Self-Supervised Outlier Detection [pdf] [annotated pdf]
  • ICLR 2021
  • [Out-of-Distribution Detection]
Well-written and interesting paper. The method is simple and makes intuitive sense, yet seems to perform quite well.
[23-03-10] [paper280]
  • Rethinking Out-of-distribution (OOD) Detection: Masked Image Modeling is All You Need [pdf] [annotated pdf]
  • CVPR 2023
  • [Out-of-Distribution Detection]
Quite interesting, but not overly well-written paper. I don't like the "... is all you need" title, and they focus too much on selling how their method beats SOTA (Figure 1 definitely does not illustrate the performance difference in a fair way).
[23-03-10] [paper279]
  • Out-of-Distribution Detection and Selective Generation for Conditional Language Models [pdf] [annotated pdf]
  • ICLR 2023
  • [Out-of-Distribution Detection], [Selective Prediction], [NLP]
Well-written and quite interesting paper. Doing "selective generation" generally makes sense. Their method seems like a quite intuitive extension of "A simple fix to Mahalanobis distance for improving near-OOD detection" (relative Mahalanobis distance) to the setting of language models. Also seems to perform quite well, but not super impressive performance compared to the baselines perhaps.
[23-03-09] [paper278]
  • Learning to Reject Meets OOD Detection: Are all Abstentions Created Equal? [pdf] [annotated pdf]
  • 2023-01
  • [Out-of-Distribution Detection], [Selective Prediction]
Quite well-written and fairly interesting paper. I struggled to properly follow some parts. I'm not entirely convinced by their proposed approach.
[23-03-09] [paper277]
  • Calibrated Selective Classification [pdf] [annotated pdf]
  • TMLR, 2022
  • [Uncertainty Estimation], [Selective Prediction]
Well-written and quite interesting paper. The overall aim of "we extend selective classification to focus on improving model calibration over non-rejected instances" makes a lot of sense to me. The full proposed method (Section 4.2 - 4.5) seems a bit complicated though, but the experiments and results are definitely quite interesting. 
[23-03-09] [paper276]
Very well-written paper. There are topics which I generally find a lot more interesting, but I still definitely enjoyed reading this paper.
[23-03-08] [paper275]
  • A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification [pdf] [annotated pdf]
  • ICLR 2023
  • [Out-of-Distribution Detection]
Interesting and well-written paper, I'm glad that I found it and decided to read it in detail. The appendix contains a lot of information (and I did not have time to go through everything). Overall, I really like what the authors set out to do with this paper. But in the end, I'm not entirely convinced. The AURC metric still has some issues, I think.
[23-03-08] [paper274]
  • High-Resolution Image Synthesis with Latent Diffusion Models [pdf] [annotated pdf]
  • CVPR 2022
  • [Diffusion Models]
Quite interesting and well-written paper. The method is described well in Section 3. Section 4.1 is quite interesting. The rest of the results I did not go through in much detail. Update 23-05-11: Read the paper again for our reading group, pretty much exactly the same impression this second time. The overall idea is simple and neat.
[23-03-07] [paper273]
  • Certifying Out-of-Domain Generalization for Blackbox Functions [pdf] [annotated pdf]
  • ICML 2022
  • [Distribution Shifts]
Well-written and quite interesting paper, I'm just not even close to having the background necessary to be able to properly understand/appreciate/evaluate these results. Could this be used in practice? If so, how useful would it actually be? I have basically no clue.
[23-03-07] [paper272]
  • Predicting Out-of-Distribution Error with the Projection Norm [pdf] [annotated pdf]
  • ICML 2022
  • [Distribution Shifts]
Well-written and quite interesting paper. The method is conceptually simple and makes some intuitive sense. I'm just not quite sure how/when this approach actually would be used in practice? They say in Section 6 that "Our method can potentially be extended to perform OOD detection", but I don't really see how that would be possible (since the method seems to require at least ~200 test samples)?
[23-03-07] [paper271]
  • Variational- and Metric-based Deep Latent Space for Out-of-Distribution Detection [pdf] [annotated pdf]
  • UAI 2022
  • [Out-of-Distribution Detection]
Quite well-written and somewhat interesting paper. Seems a bit ad hoc and quite incremental overall.
[23-03-07] [paper270]
  • Igeood: An Information Geometry Approach to Out-of-Distribution Detection [pdf] [annotated pdf]
  • ICLR 2022
  • [Out-of-Distribution Detection]
Quite well-written and somewhat interesting paper. The proposed method seems a bit ad hoc to me. Not overly impressive experimental results. Seems a bit incremental overall.
[23-03-03] [paper269]
  • The Tilted Variational Autoencoder: Improving Out-of-Distribution Detection [pdf] [annotated pdf]
  • ICLR 2023
  • [Out-of-Distribution Detection], [VAEs]
Quite well-written and somewhat interesting paper. I still don't fully understand the "Will-it-move test", not even after having read Appendix D. It seems a bit strange to me, and it requires access to OOD data. So, then you get the same type of problems as all "outlier exposure"-style methods (what if you don't have access to OOD data? And will the OOD detector actually generalize well to other OOD data than what it was tuned on?). Section 4.2.1 is pretty interesting though.
[23-03-02] [paper268]
  • Improving Reconstruction Autoencoder Out-of-distribution Detection with Mahalanobis Distance [pdf] [annotated pdf]
  • 2018-12
  • [Out-of-Distribution Detection]
Quite well-written and somewhat interesting paper. Short (~4 pages) and a very quick read. A simple idea that makes intuitive sense. Very basic experiments (only MNIST).
[23-03-02] [paper267]
  • Denoising Diffusion Models for Out-of-Distribution Detection [pdf] [annotated pdf]
  • CVPR Workshops 2023
  • [Out-of-Distribution Detection], [Diffusion Models]
Well-written and interesting paper, I enjoyed reading it. Very similar to "Unsupervised Out-of-Distribution Detection with Diffusion Inpainting" (reconstruction-based OOD detection using diffusion models), but using a slightly different approach. The related work is described in a really nice way, and they compare with very relevant baselines it seems. Promising performance in the experiments.
[23-03-01] [paper266]
  • Conformal Prediction Beyond Exchangeability [pdf] [annotated pdf]
  • 2022-02
  • [Uncertainty Estimation], [Distribution Shifts]
Well-written and quite interesting paper, I quite enjoyed reading it. Much longer than usual (32 pages), but didn't really take longer than usual to read (I skipped/skimmed some of the theoretical parts). Their proposed method makes intuitive sense I think, but seems like it's applicable only to problems in which some kind of prior knowledge can be used to compute weights? From the end of Section 4.3: "On the other hand, if the test point comes from a new distribution that bears no resemblance to the training data, neither our upper bound nor any other method would be able to guarantee valid coverage without further assumptions. An important open question is whether it may be possible to determine, in an adaptive way, whether coverage will likely hold for a particular data set, or whether that data set exhibits high deviations from exchangeability such that the coverage gap may be large".
[23-02-27] [paper265]
  • Robust Validation: Confident Predictions Even When Distributions Shift [pdf] [annotated pdf]
  • 2020-08
  • [Uncertainty Estimation], [Distribution Shifts]
Quite interesting and well-written paper. Longer (~19 pages) and more theoretical than what I usually read. I did not understand all details in Section 2 and 3. I also find it difficult to know how Algorithm 2 and 3 actually are implemented, I would like to see some code. Not entirely sure how useful their methods actually would be in practice, but I quite enjoyed reading the paper at least.
[23-02-24] [paper264]
  • Conformal Prediction Under Covariate Shift [pdf] [annotated pdf]
  • NeurIPS 2019
  • [Uncertainty Estimation], [Distribution Shifts]
Quite interesting paper. It contains more theoretical results than I'm used to, and some things are sort of explained in an unnecessarily complicated way. The proposed method in Section 2 makes some intuitive sense, but I also find it a bit odd. It requires access to unlabeled test inputs, and then you'd have to train a classifier to distinguish train inputs from test inputs? Is this actually a viable approach in practice? Would it work well e.g. for image data? Not clear to me. In the paper, the method is applied to a single very simple example.
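Once the likelihood-ratio weights are available (however estimated, e.g. via that train-vs-test classifier), the core construction is just a weighted quantile of the calibration nonconformity scores. A minimal sketch of that step, under my reading of the weighted-quantile definition and not the authors' code:

```python
import numpy as np

def weighted_conformal_quantile(scores, weights, w_test, alpha=0.1):
    """Weighted (1-alpha) quantile of calibration nonconformity scores.
    `weights` are likelihood ratios w(x_i) = p_test(x_i)/p_train(x_i)
    at the calibration points, `w_test` the ratio at the test point."""
    order = np.argsort(scores)
    s, w = scores[order], weights[order]
    p = np.append(w, w_test)
    p = p / p.sum()                 # normalized point masses (test point gets delta at +inf)
    cdf = np.cumsum(p[:-1])         # mass at or below each calibration score
    idx = np.searchsorted(cdf, 1 - alpha)
    if idx >= len(s):
        return np.inf               # not enough calibration mass below 1-alpha
    return s[idx]
```

With all weights equal to one this reduces to the standard split-conformal quantile, which is a useful sanity check.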
[23-02-23] [paper263]
  • Unsupervised Out-of-Distribution Detection with Diffusion Inpainting [pdf] [annotated pdf]
  • 2023-02
  • [Out-of-Distribution Detection], [Diffusion Models]
Well-written and interesting paper, I enjoyed reading it. The proposed method is conceptually very simple and makes a lot of intuitive sense. As often is the case with OOD detection papers, I find it difficult to judge how strong/impressive the experimental results actually are (the method is evaluated only on quite simple/small image classification datasets), but it seems quite promising at least.
[23-02-23] [paper262]
  • Adaptive Conformal Inference Under Distribution Shift [pdf] [annotated pdf]
  • NeurIPS 2021
  • [Uncertainty Estimation], [Distribution Shifts]
Interesting and well-written paper. The proposed method in Section 2 is quite intuitive and clearly explained. The examples in Figure 1 and 3 are quite neat. "The methods we develop are specific to cases where Y_t is revealed at each time point. However, there are many settings in which we receive the response in a delayed fashion or in large batches." - this is true, but there are also many settings in which the method would not really be applicable. In cases which it is though, I definitely think it could make sense to use this instead of standard conformal prediction.
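The online update from Section 2 is compact enough to write out. A sketch under my reading of the method (a miss at time t lowers the working miscoverage level, widening subsequent intervals), with hypothetical default values for alpha and the step size gamma:

```python
def adaptive_alpha_updates(errs, alpha=0.1, gamma=0.005):
    """Adaptive conformal inference update:
    alpha_{t+1} = alpha_t + gamma * (alpha - err_t),
    where err_t = 1 if the interval at time t failed to cover y_t,
    else 0. Returns the trajectory of working miscoverage levels."""
    alpha_t = alpha
    trajectory = []
    for err in errs:
        alpha_t = alpha_t + gamma * (alpha - err)
        trajectory.append(alpha_t)
    return trajectory
```

As noted in the quote above, this requires y_t to be revealed at each time point so that err_t can be computed.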
[23-02-23] [paper261]
  • Spectral Temporal Graph Neural Network for Multivariate Time-series Forecasting [pdf] [annotated pdf]
  • NeurIPS 2020
  • [Sequence Modeling], [Graph Neural Networks]
Quite interesting and well-written paper, not a topic that I personally find overly interesting though.
[23-02-16] [paper260]
  • Neural Networks Trained with SGD Learn Distributions of Increasing Complexity [pdf] [annotated pdf]
  • ICML 2023
  • [Theoretical Properties of Deep Learning]
Interesting paper. I would have needed a bit more time to read it though, felt like I didn't quite have enough time to properly understand everything and evaluate the significance of the findings. Might have to go back to this paper again.
[23-02-09] [paper259]
  • The Forward-Forward Algorithm: Some Preliminary Investigations [pdf] [annotated pdf]
  • 2022-12
  • [Miscellaneous]
Somewhat interesting, but quite odd paper. I was quite confused by multiple parts of it. This is probably partly because of my background, but I do also think that the paper could be more clearly structured.
[23-02-01] [paper258]
  • Everything is Connected: Graph Neural Networks [pdf] [annotated pdf]
  • Current Opinion in Structural Biology, 2023
  • [Graph Neural Networks]
Quite well-written paper. A short survey, took just ~40 min to read. Not overly interesting, but an enjoyable read. Section 4, with the connection to transformers, is quite interesting.
[23-01-27] [paper257]
  • Gradient Descent Happens in a Tiny Subspace [pdf] [annotated pdf]
  • 2018-12
  • [Theoretical Properties of Deep Learning]
Quite interesting paper. Structured in a somewhat unusual way, with some fairly interesting observations. Difficult for me to judge how significant / practically impactful these observations actually are though.
[23-01-19] [paper256]
  • Out-Of-Distribution Detection Is Not All You Need [pdf] [annotated pdf]
  • AAAI 2023
  • [Out-of-Distribution Detection]
Quite interesting and well-written paper. How they describe limitations of OOD detection makes sense to me, I have always found the way OOD detection methods are evaluated a bit strange/arbitrary. However, I am not sure that the solution proposed in this paper actually is the solution.
[23-01-10] [paper255]
  • Diffusion Models: A Comprehensive Survey of Methods and Applications [pdf] [annotated pdf]
  • 2022-09
  • [Diffusion Models]
Quite interesting and well-written paper. ~28 pages, so a longer paper than usual. Sections 1 and 2 (the first 9 pages) are interesting, they describe and show connections between the "denoising diffusion probabilistic models", "score-based generative models" and "stochastic differential equations" approaches. The remainder of the paper is quite, but not overly, interesting, so I read it in less detail.

Papers Read in 2022:

[22-12-14] [paper254]
  • Continuous Time Analysis of Momentum Methods [pdf] [annotated pdf]
  • JMLR, 2020
  • [Theoretical Properties of Deep Learning]
Quite well-written and somewhat interesting paper. Longer (~20 pages) and more theoretical paper than what I usually read, and I definitely didn't understand all the details, but still a fairly enjoyable read. More enjoyable than I expected at least.
[22-12-14] [paper253]
  • Toward a Theory of Justice for Artificial Intelligence [pdf] [annotated pdf]
  • Daedalus, 2022
  • [ML & Ethics]
Well-written and quite interesting paper. Describes the distributive justice principles of John Rawls' book "A Theory of Justice" and explores/discusses what these might imply for how "AI systems" should be regulated/deployed/etc. Doesn't really provide any overly concrete takeaways, at least not for me, but still a quite enjoyable read.
[22-12-08] [paper252]
Well-written and interesting paper. Sections 1-6 and Section 11 are very interesting. A breath of fresh air to read this in the midst of the ChatGPT hype. It contains a lot of good quotes, for example:"To ensure that we can make informed decisions about the trustworthiness and safety of the AI systems we deploy, it is advisable to keep to the fore the way those systems actually work, and thereby to avoid imputing to them capacities they lack, while making the best use of the remarkable capabilities they genuinely possess".
[22-12-06] [paper251]
Well-written and interesting paper. Provides some interesting comments/critique on utilitarianism and how engineers/scientists like myself might be inclined to find that approach attractive: "The optimizing mindset prevalent among computer scientists and economists, among other powerful actors, has led to an approach focused on maximizing the fulfilment of human preferences... But this preference-based utilitarianism is open to serious objections. This essay sketches an alternative, “humanistic” ethics for AI that is sensitive to aspects of human engagement with the ethical often missed by the dominant approach." Also: "So ethics is reduced to an exercise in prediction and optimization: which act or policy is likely to lead to the optimal fulfilment of human preferences?" And: "This incommensurability calls into question the availability of some optimizing function that determines the single option that is, all things considered, most beneficial or morally right, the quest for which has animated a lot of utilitarian thinking in ethics."
[22-12-06] [paper250]
  • Physics-Informed Neural Networks for Cardiac Activation Mapping [pdf] [annotated pdf]
  • Frontiers in Physics, 2020
  • [ML for Medicine/Healthcare]
Quite well-written and somewhat interesting paper.
[22-12-05] [paper249]
  • AI Ethics and its Pitfalls: Not Living up to its own Standards? [pdf] [annotated pdf]
  • AI and Ethics, 2022
  • [ML & Ethics]
Well-written and somewhat interesting paper. Good reminder that also the practice of ML ethics could have unintended negative consequences. Section 2.6 is quite interesting.
[22-12-02] [paper248]
Well-written and very interesting paper. I enjoyed reading it, and it made me think - which is a good thing! Contains quite a few quotes which I really liked, for example: "However, it is wrong to assume that the goal is ethical AI. Rather, the primary aim from which detailed norms can be derived should be a peaceful, sustainable, and just society. Hence, AI ethics must dare to ask the question where in an ethical society one should use AI and its inherent principle of predictive modeling and classification at all".
[22-12-01] [paper247]
  • The Ethics of AI Ethics: An Evaluation of Guidelines [pdf] [annotated pdf]
  • Minds and Machines, 2020
  • [ML & Ethics]
Well-written and interesting paper. I liked that it discussed some actual ethical theories in Section 4.2. Sections 3.2, 3.3 and 4.1 were also interesting.
[22-12-01] [paper246]
Well-written and very interesting paper. I enjoyed reading it, and it made me think - which is a good thing!
[22-12-01] [paper245]
Quite well-written and interesting paper. I did struggle to properly understand everything in Section 3 & 4, felt like I didn't quite have the necessary background knowledge. Helped a lot to go through the paper again at our reading group.
[22-11-26] [paper244]
  • You Cannot Have AI Ethics Without Ethics [pdf]
  • AI and Ethics, 2021
  • [ML & Ethics]
Well-written and quite interesting paper. Just 5 pages long, quick to read. Sort of like an opinion piece. I enjoyed reading it. Main takeaway: "Instead of trying to reinvent ethics, or adopt ethical guidelines in isolation, it is incumbent upon us to recognize the need for broadly ethical organizations. These will be the only entrants in a position to build truly ethical AI. You cannot simply have AI ethics. It requires real ethical due diligence at the organizational level—perhaps, in some cases, even industry-wide reflection".
[22-11-25] [paper243]
Well-written and interesting paper, quite straightforward to follow and understand everything. Section 6 & 7 are interesting, with the discussion about unintended consequences of recommender algorithms (how they contribute to an impaired democratic debate).
[22-11-25] [paper242]
  • The future of AI in our hands? To what extent are we as individuals morally responsible for guiding the development of AI in a desirable direction? [pdf] [annotated pdf]
  • AI and Ethics, 2022
  • [ML & Ethics]
Well-written and somewhat interesting paper. Not overly technical or difficult to read. Discusses different perspectives on who should be responsible for ensuring that the future development of "AI" actually benefits society.
[22-11-24] [paper241]
  • Collocation Based Training of Neural Ordinary Differential Equations [pdf] [annotated pdf]
  • Statistical Applications in Genetics and Molecular Biology, 2021
  • [Neural ODEs]
Quite well-written and fairly interesting paper. Not sure how much new insight it actually provided for me, but still interesting to read papers from people working in more applied fields.
[22-11-17] [paper240]
  • Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt [pdf] [annotated pdf]
  • ICML 2022
  • [Miscellaneous]
Well-written and quite interesting paper. The proposed method is explained well and makes intuitive sense overall, and seems to perform well in the intended setting.
[22-11-09] [paper239]
  • Learning Deep Representations by Mutual Information Estimation and Maximization [pdf] [annotated pdf]
  • ICLR 2019
  • [Representation Learning]
Quite interesting paper, but I struggled to properly understand everything. I might not have the necessary background knowledge. I find it difficult to formulate what my main takeaway from the paper would be, their proposed method seems quite similar to previous work? And also difficult to judge how significant/impressive their experimental results are?
[22-11-03] [paper238]
  • Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution [pdf] [annotated pdf]
  • ICLR 2022
  • [Theoretical Properties of Deep Learning]
Quite interesting and very well-written paper, I found it very easy to read and understand (to read it also took a lot less time than usual). Pretty much all the results/arguments make intuitive sense, and the proposed method (of first doing linear probing and then full fine-tuning) seems to perform well. I am not quite able to judge how significant/interesting/important these results are, but the paper was definitely an enjoyable read at least.
[22-10-26] [paper237]
  • Multi-scale Feature Learning Dynamics: Insights for Double Descent [pdf] [annotated pdf]
  • ICML 2022
  • [Theoretical Properties of Deep Learning]
Quite well-written paper. Definitely not my area of expertise, and I did not have enough time to really try and understand everything properly either. So, it is very difficult for me to judge how significant/important/interesting the analysis and experimental results actually are.
[22-10-20] [paper236]
Well-written and quite interesting paper. Not overly impressed by the experimental results, the "robustness to data contamination" problem seems a bit odd overall to me. The proposed training method is quite neat though (that it's not just a heuristic but follows from the scoring rule approach), and the flexibility offered by the hyperparameter gamma can probably be useful in practice sometimes.
[22-10-08] [paper235]
  • RankFeat: Rank-1 Feature Removal for Out-of-distribution Detection [pdf] [annotated pdf]
  • NeurIPS 2022
  • [Out-of-Distribution Detection]
Quite interesting and well-written paper. The proposed method is quite neat / conceptually simple, and seems to perform very well relative to other post-hoc OOD detection scores. I don't expect the proposed score to perform well in all settings though, but it definitely seems like a useful tool.
[22-10-06] [paper234]
  • Mechanistic Models Versus Machine Learning, a Fight Worth Fighting for the Biological Community? [pdf] [annotated pdf]
  • Biology Letters, 2018
  • [Miscellaneous]
An opinion piece, not really a technical paper. Just 3-4 pages long. Well-written and quite interesting though, I quite enjoyed reading it. What the authors write at the end "Fundamental biology should not choose between small-scale mechanistic understanding and large-scale prediction. It should embrace the complementary strengths of mechanistic modelling and machine learning approaches to provide, for example, the missing link between patient outcome prediction and the mechanistic understanding of disease progression" makes a lot of sense to me; this is my main takeaway. I also find the statement "The training of a new generation of researchers versatile in all these fields will be vital in making this breakthrough" quite interesting, this is probably true for really making progress in medical machine learning applications as well?
[22-09-22] [paper233]
  • Adversarial Examples Are Not Bugs, They Are Features [pdf] [annotated pdf]
  • NeurIPS 2019
  • [Miscellaneous]
Well-written and interesting paper, I quite enjoyed reading it. I found this quite a lot more interesting than previous papers I have read on adversarial examples. 
[22-09-15] [paper232]
  • Learning to Learn by Gradient Descent by Gradient Descent [pdf] [annotated pdf]
  • NeurIPS 2016
  • [Miscellaneous]
Quite interesting and well-written paper. Not my area of expertise, but still a relatively enjoyable read. "After each epoch (some fixed number of learning steps) we freeze the optimizer parameters..." is quite unclear though, it seems like they never specify for how many steps the optimizer is trained?
[22-09-01] [paper231]
  • On the Information Bottleneck Theory of Deep Learning [pdf] [annotated pdf]
  • ICLR 2018
  • [Theoretical Properties of Deep Learning]
Well-written and quite interesting paper. I was not particularly familiar with the previous information bottleneck papers, but everything was still fairly easy to follow. The discussion/argument on OpenReview is strange (`This “paper” attacks our work through the following flawed and misleading statements`), I honestly don't know who is correct.
[22-06-28] [paper230]
  • Aleatoric and Epistemic Uncertainty with Random Forests [pdf] [annotated pdf]
  • IDA 2020
  • [Uncertainty Estimation]
Quite well-written and somewhat interesting paper. 
[22-06-23] [paper229]
  • Linear Time Sinkhorn Divergences using Positive Features [pdf] [annotated pdf]
  • NeurIPS 2020
  • [Miscellaneous]
Fairly well-written and somewhat interesting paper. Definitely not my area of expertise, I struggled to understand some parts of the paper, and it's difficult for me to judge how important/significant/useful the presented method actually is.
[22-06-17] [paper228]
  • Greedy Bayesian Posterior Approximation with Deep Ensembles [pdf] [annotated pdf]
  • TMLR, 2022
  • [Uncertainty Estimation]
Quite well-written and fairly interesting paper. I was mainly just interested in reading one of the first ever TMLR accepted papers. Their final method in Algorithm 2 makes some intuitive sense, but I did not fully understand the theoretical arguments in Section 3.
[22-06-10] [paper227]
  • Weakly-Supervised Disentanglement Without Compromises [pdf] [annotated pdf]
  • ICML 2020
  • [Representation Learning]
Quite well-written and somewhat interesting paper. Definitely not my area of expertise (learning disentangled representations of e.g. images) and I didn't have a lot of time to read the paper, I struggled to understand big parts of the paper.
[22-06-02] [paper226]
  • Shaking the Foundations: Delusions in Sequence Models for Interaction and Control [pdf] [annotated pdf]
  • 2021-10
  • [Sequence Modeling]
Quite well-written and somewhat interesting paper. Definitely not my area of expertise (causality). I didn't understand everything properly, and it's very difficult for me to judge how interesting this paper actually is.
[22-05-23] [paper225]
  • When are Bayesian Model Probabilities Overconfident? [pdf] [annotated pdf]
  • 2020-03
  • [Miscellaneous]
Quite well-written and somewhat interesting paper. A bit different compared to the papers I usually read, this is written by people doing statistics. I did definitely not understand everything properly. Quite difficult for me to say what my main practical takeaway from the paper is.
[22-05-20] [paper224]
  • Open-Set Recognition: a Good Closed-Set Classifier is All You Need? [pdf] [annotated pdf]
  • ICLR 2022
  • [Out-of-Distribution Detection]
Well-written and quite interesting paper. Like the authors discuss, this open-set recognition problem is of course highly related to out-of-distribution detection. Their proposed benchmark (fine-grained classification datasets) is quite neat, definitely a lot more challenging than many OOD detection datasets (this could be seen as "very near OOD" I suppose).
[22-04-08] [paper223]
  • Improving Conditional Coverage via Orthogonal Quantile Regression [pdf] [annotated pdf]
  • NeurIPS 2021
  • [Uncertainty Estimation]
Well-written and somewhat interesting paper. They propose an improved quantile regression method named orthogonal QR. The method entails adding a regularization term to the quantile regression loss, encouraging the prediction interval length to be independent of the coverage identifier (intuitively, I don't quite get why this is desired). They evaluate on 9 tabular regression datasets, the same used in e.g. "Conformalized Quantile Regression". The model is just a small 3-layer neural network. Compared to standard quantile regression, their method improves something called "conditional coverage" of the prediction intervals (they want to "achieve coverage closer to the desired level evenly across all sub-populations").
[22-04-08] [paper222]
Interesting and well-written paper. I should have read this paper before reading "Efficient and Differentiable Conformal Prediction with General Function Classes". They give a pretty good introduction to both quantile regression and conformal prediction, and then propose a method that combines these two approaches. Their method is quite simple, they use conformal prediction on validation data (the "calibration set") to calibrate the prediction intervals learned by a quantile regression method? This is sort of like temperature scaling, but for prediction intervals learned by quantile regression?
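As I understand it, the calibration step can be sketched like this (my own illustrative code with made-up names, not from the paper):

```python
import numpy as np

def cqr_interval(q_lo_cal, q_hi_cal, y_cal, q_lo_test, q_hi_test, alpha=0.1):
    """Conformalized quantile regression, illustrative sketch: expand (or shrink)
    the quantile-regression interval [q_lo, q_hi] by a conformity quantile
    computed on a held-out calibration set."""
    q_lo_cal, q_hi_cal, y_cal = map(np.asarray, (q_lo_cal, q_hi_cal, y_cal))
    # conformity score: how far y falls outside (or inside) the interval
    scores = np.maximum(q_lo_cal - y_cal, y_cal - q_hi_cal)
    n = len(y_cal)
    level = min(1.0, (1 - alpha) * (n + 1) / n)
    q = np.quantile(scores, level, method="higher")
    return q_lo_test - q, q_hi_test + q
```

Note that the correction q can be negative, so an over-conservative quantile regressor gets its intervals shrunk rather than widened.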
[22-04-08] [paper221]
  • Efficient and Differentiable Conformal Prediction with General Function Classes [pdf] [annotated pdf]
  • ICLR 2022
  • [Uncertainty Estimation]
Quite interesting and well-written paper. Mainly consider regression problems (tabular datasets + next-state prediction in RL, low-dimensional inputs). I should have read at least one more basic paper on conformal prediction and/or quantile regression first, I didn't quite understand all the details.
[22-04-06] [paper220]
  • Consistent Estimators for Learning to Defer to an Expert [pdf] [annotated pdf]
  • ICML 2020
  • [Uncertainty Estimation], [ML for Medicine/Healthcare]
Somewhat interesting paper. Image and text classification. The general problem setting (that a model can either predict or defer to an expert) is interesting and the paper is well-written overall, but in the end I can't really state any specific takeaways. I didn't understand section 4 or 5 properly. I don't think I can judge the significance of their results/contributions. 
[22-04-06] [paper219]
  • Uncalibrated Models Can Improve Human-AI Collaboration [pdf] [annotated pdf]
  • NeurIPS 2022
  • [ML for Medicine/Healthcare]
Quite interesting paper. Sort of thought-provoking, an interesting perspective. I was not exactly convinced in the end though. It seems weird to me that they don't even use an ML model to provide the advice, but instead use the average response of another group of human participants. Because this means that, like they write in Section 6, the average advice accuracy is higher than the average human accuracy. So, if the advice is better than the human participants, we just want to push the human predictions towards the advice? And therefore it's beneficial to increase the confidence of the advice (and thus make it uncalibrated), because this will make more humans actually change their prediction and align it more with the advice? I might miss something here, but this sort of seems a bit trivial?
[22-04-05] [paper218]
  • Exploring Covariate and Concept Shift for Detection and Calibration of Out-of-Distribution Data [pdf] [annotated pdf]
  • 2021-11
  • [Out-of-Distribution Detection]
Quite interesting and well-written paper. Only image classification (CIFAR10/100). I didn't quite spend enough time to properly understand everything in Section 4, or to really judge how impressive their experimental results actually are. Seems potentially useful.
[22-04-02] [paper217]
  • On the Out-of-distribution Generalization of Probabilistic Image Modelling [pdf] [annotated pdf]
  • NeurIPS 2021
  • [Out-of-Distribution Detection]
Well-written and interesting paper, I enjoyed reading it. Everything is clearly explained and the proposed OOD detection score in Section 3.1 makes intuitive sense. The results in Table 4 seem quite impressive. I was mostly interested in the OOD detection aspect, so I didn't read Section 4 too carefully.
[22-04-02] [paper216]
  • A Fine-Grained Analysis on Distribution Shift [pdf] [annotated pdf]
  • ICLR 2022
  • [Distribution Shifts]
Somewhat interesting paper. They consider 6 different datasets, only classification tasks. The takeaways and practical tips in Section 4 seem potentially useful, but I also find them somewhat vague.
[22-04-01] [paper215]
  • Transformer-Based Out-of-Distribution Detection for Clinically Safe Segmentation [pdf] [annotated pdf]
  • MIDL 2022
  • [ML for Medicine/Healthcare], [Out-of-Distribution Detection], [Transformers]
Well-written and interesting paper. I was not familiar with the VQ-GAN/VAE model, so I was confused by Section 2.3 at first, but now I think that I understand most of it. Their VQ-GAN + transformer approach seems quite complex indeed, but also seems to perform well. However, they didn't really compare with any other OOD detection method. I find it somewhat difficult to tell how useful this actually could be in practice.
[22-03-31] [paper214]
Well-written and somewhat interesting paper. The "health condition score" estimation problem seems potentially interesting. They only consider problems with 1D regression targets. Their two proposed methods are clearly explained. I could probably encounter this imbalanced issue at some point, and then I'll keep this paper in mind.
[22-03-31] [paper213]
  • Hidden in Plain Sight: Subgroup Shifts Escape OOD Detection [pdf] [annotated pdf]
  • MIDL 2022
  • [ML for Medicine/Healthcare], [Out-of-Distribution Detection], [Distribution Shifts]
Quite well-written, but somewhat confusing paper. The experiment in Table 1 seems odd to me, why would we expect or even want digit-5 images to be classified as OOD when the training data actually includes a bunch of digit-5 images (the bottom row)? And for what they write in the final paragraph of Section 3 (that the accuracy is a bit lower for the hospital 3 subgroup), this wouldn't actually be a problem in practice if the model then also is more uncertain for these examples? I.e., studying model calibration across the different subgroups would be what's actually interesting? Or am I not understanding this whole subgroup shift properly? I feel quite confused.
[22-03-30] [paper212]
  • Self-Distribution Distillation: Efficient Uncertainty Estimation [pdf] [annotated pdf]
  • UAI 2022
  • [Uncertainty Estimation], [Out-of-Distribution Detection]
Quite well-written and somewhat interesting paper. Only consider image classification. Their method in Figure 1 is in a way more interesting than I first realized, it's not entirely clear to me why this would improve performance compared to just training a model with the standard cross-entropy loss, their method induces some type of beneficial regularization? I didn't quite get the method described in Section 4.1.
[22-03-29] [paper211]
  • A Benchmark with Decomposed Distribution Shifts for 360 Monocular Depth Estimation [pdf] [annotated pdf]
  • NeurIPS Workshops 2021
  • [Distribution Shifts]
Somewhat interesting paper. A short paper of just 4-5 pages. The provided dataset could be useful for comparing methods in terms of distribution shift robustness.
[22-03-28] [paper210]
  • WILDS: A Benchmark of in-the-Wild Distribution Shifts [pdf] [annotated pdf]
  • ICML 2021
  • [Distribution Shifts]
Well-written and quite interesting paper. Neat benchmark with a diverse set of quite interesting datasets.
[22-03-24] [paper209]
  • Random Synaptic Feedback Weights Support Error Backpropagation for Deep Learning [pdf] [annotated pdf]
  • Nature Communications, 2016
  • [Theoretical Properties of Deep Learning]
Definitely not my area of expertise, but still a quite interesting paper to read. The authors are interested in the question of how error propagation-based learning algorithms potentially might be utilized in the human brain. Backpropagation is one such algorithm and is highly effective, but it "involves a precise, symmetric backward connectivity pattern" (to compute the gradient update of the current layer weight matrix, the error is multiplied with the weight matrix W of the following layer), which apparently is thought to be impossible in the brain. The authors show that backpropagation can be simplified but still offer effective learning, their feedback alignment method instead makes use of "fixed, random connectivity patterns" (replace the weight matrix W with a random matrix B). Their study thus "reveals much lower architectural constraints on what is required for error propagation across layers of neurons".
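A tiny numerical sketch of the idea (my own illustrative code, not from the paper): train a two-layer net on a made-up regression task, where the backward pass uses a fixed random matrix B instead of the transposed forward weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 16, 1
W1 = rng.normal(0.0, 0.5, (n_hid, n_in))
W2 = rng.normal(0.0, 0.5, (n_out, n_hid))
B = rng.normal(0.0, 0.5, (n_hid, n_out))  # fixed random feedback weights

X = rng.normal(size=(200, n_in))
y = X @ np.array([[1.0], [-2.0], [0.5]])  # made-up linear target for the demo

def mse(W1, W2):
    return float(np.mean((np.tanh(X @ W1.T) @ W2.T - y) ** 2))

mse_before = mse(W1, W2)
lr = 0.01
for _ in range(500):
    h = np.tanh(X @ W1.T)          # forward pass
    e = h @ W2.T - y               # output error
    dW2 = e.T @ h / len(X)
    dh = (e @ B.T) * (1 - h ** 2)  # feedback alignment: B, not W2.T
    dW1 = dh.T @ X / len(X)
    W2 -= lr * dW2
    W1 -= lr * dW1
mse_after = mse(W1, W2)
```

Despite the "wrong" backward weights, the loss still drops, which is the paper's point: the forward weights gradually align with the random feedback matrix.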
[22-03-17] [paper208]
  • Comparing Elementary Cellular Automata Classifications with a Convolutional Neural Network [pdf] [annotated pdf]
  • ICAART 2021
  • [Miscellaneous]
I'm not familiar with "Cellular automata" at all, but still a somewhat interesting paper to read. I mostly understand what they're doing (previous papers have proposed different categorizations/groupings/classifications of ECAs, and in this paper they train CNNs to predict the classes assigned by these different ECA categorizations, to compare them), but I don't really know why it's interesting/useful.
[22-03-10] [paper207]
  • Structure and Distribution Metric for Quantifying the Quality of Uncertainty: Assessing Gaussian Processes, Deep Neural Nets, and Deep Neural Operators for Regression [pdf] [annotated pdf]
  • 2022-03
  • [Uncertainty Estimation]
Somewhat interesting paper, I didn't spend too much time on it. Simply using the correlation between squared error and predicted variance makes some sense, I guess? I don't quite get what their NDIP metric in Section 2.2 will actually measure though? Also, I don't understand their studied application at all.
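The basic metric is trivial to compute; a sketch (my own code, not theirs):

```python
import numpy as np

def error_variance_correlation(y_true, y_pred, pred_var):
    """Pearson correlation between squared errors and predicted variances,
    a simple sanity check of uncertainty quality (higher is better)."""
    sq_err = (np.asarray(y_true) - np.asarray(y_pred)) ** 2
    return float(np.corrcoef(sq_err, np.asarray(pred_var))[0, 1])
```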
[22-03-10] [paper206]
  • How to Measure Deep Uncertainty Estimation Performance and Which Models are Naturally Better at Providing It [pdf] [annotated pdf]
  • 2021-10
  • [Uncertainty Estimation], [Out-of-Distribution Detection]
Quite interesting and well-written paper. They only study image classification. The E-AURC metric which is described in Appendix C should be equivalent to AUSE, I think? Quite interesting that knowledge distillation seems to rather consistently have a positive effect on the uncertainty estimation metrics, and that ViT models seem to perform very well compared to a lot of other architectures. Otherwise, I find it somewhat difficult to draw any concrete conclusions.
[22-03-10] [paper205]
  • The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers [pdf] [annotated pdf]
  • ICLR 2021
  • [Theoretical Properties of Deep Learning]
Well-written and quite interesting paper. I didn't take the time to try and really understand all the details, but a quite enjoyable read. The proposed framework seems to make some intuitive sense and lead to some fairly interesting observations/insights, but it's difficult for me to judge how significant it actually is.
[22-03-08] [paper204]
  • Selective Regression Under Fairness Criteria [pdf] [annotated pdf]
  • ICML 2022
  • [Uncertainty Estimation], [Selective Prediction]
Well-written and somewhat interesting paper. Gives a pretty good introduction to the fair regression problem, Section 2 is very well-written. Quite interesting that it can be the case that while overall performance improves with decreased coverage, the performance for a minority sub-group is degraded. I didn't quite follow everything in Section 5, the methods seem a bit niche. I'm not overly impressed by the experiments either.
[22-03-08] [paper203]
  • Risk-Controlled Selective Prediction for Regression Deep Neural Network Models [pdf] [annotated pdf]
  • IJCNN 2020
  • [Uncertainty Estimation], [Selective Prediction]
Interesting and well-written paper. They take the method from "Selective Classification for Deep Neural Networks" and extend it to regression. I don't really understand the details of the lemmas/theorems, but otherwise everything is clearly explained.
[22-03-08] [paper202]
  • Second Opinion Needed: Communicating Uncertainty in Medical Artificial Intelligence [pdf] [annotated pdf]
  • npj Digital Medicine, 2021
  • [Uncertainty Estimation], [ML for Medicine/Healthcare]
Well-written and quite interesting paper. A relatively short paper of just 4 pages. They give an overview of different uncertainty estimation techniques, and provide some intuitive examples and motivation for why uncertainty estimation is important/useful within medical applications. I quite enjoyed reading the paper.
[22-03-07] [paper201]
  • Selective Classification for Deep Neural Networks [pdf] [annotated pdf]
  • NeurIPS 2017
  • [Uncertainty Estimation], [Selective Prediction]
Interesting and well-written paper, I enjoyed reading it. I don't really understand the lemma/theorem in Section 3, but everything is still clearly explained.
[22-03-05] [paper200]
  • SelectiveNet: A Deep Neural Network with an Integrated Reject Option [pdf] [annotated pdf]
  • ICML 2019
  • [Uncertainty Estimation], [Selective Prediction]
Well-written and quite interesting paper. The proposed method is quite interesting and makes some intuitive sense, but I would assume that the calibration technique in Section 5 has similar issues as temperature scaling (i.e., the calibration might still break under various data shifts)?
[22-03-04] [paper199]
  • NUQ: Nonparametric Uncertainty Quantification for Deterministic Neural Networks [pdf] [annotated pdf]
  • NeurIPS 2022
  • [Uncertainty Estimation], [Out-of-Distribution Detection]
Interesting paper. I found it difficult to understand Section 2, I wouldn't really be able to implement their proposed NUQ method. Only image classification, but their experimental evaluation is still quite extensive. And, they obtain strong performance.
[22-03-03] [paper198]
  • On the Practicality of Deterministic Epistemic Uncertainty [pdf] [annotated pdf]
  • ICML 2022
  • [Uncertainty Estimation]
Interesting and well-written paper. Their evaluation with the corrupted datasets makes sense I think. The results are interesting, the fact that ensembling/MC-dropout consistently outperforms the other methods. Another reminder of how strong of a baseline ensembling is when it comes to uncertainty estimation? Also, I think that their proposed rAULC is more or less equivalent to AUSE (area under the sparsification error curve)?
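For reference, the sparsification curve behind AUSE-style metrics (and, I believe, rAULC) can be sketched like this (my own illustrative code):

```python
import numpy as np

def sparsification_curve(errors, uncertainties):
    """Mean error among the k least-uncertain samples, for k = 1..n.
    AUSE-style metrics compare this curve against the oracle curve obtained
    by ranking on the true errors instead of the uncertainties."""
    errors = np.asarray(errors, dtype=float)
    order = np.argsort(uncertainties)  # least uncertain first
    kept = errors[order]
    return np.cumsum(kept) / np.arange(1, len(errors) + 1)
```

If the uncertainties rank the errors perfectly, the curve matches the oracle and the area between them (the sparsification error) is zero.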
[22-03-03] [paper197]
  • Transformers Can Do Bayesian Inference [pdf] [code] [annotated pdf]
  • Samuel Müller, Noah Hollmann, Sebastian Pineda Arango, Josif Grabocka, Frank Hutter
  • ICLR 2022
  • [Transformers]
Quite interesting and well-written paper. I did however find it difficult to properly understand everything, it feels like a lot of details are omitted (I wouldn't really know how to actually implement this in practice). It's difficult for me to judge how impressive the results are or how practically useful this approach actually might be, what limitations are there? Overall though, it does indeed seem quite interesting.
[22-03-02] [paper196]
  • A Deep Bayesian Neural Network for Cardiac Arrhythmia Classification with Rejection from ECG Recordings [pdf] [code] [annotated pdf]
  • Wenrui Zhang, Xinxin Di, Guodong Wei, Shijia Geng, Zhaoji Fu, Shenda Hong
  • 2022-02
  • [Uncertainty Estimation], [ML for Medicine/Healthcare]
Somewhat interesting paper. They use a softmax model with MC-dropout to compute uncertainty estimates. The evaluation is not very extensive, they mostly just check that the classification accuracy improves as they reject more and more samples based on an uncertainty threshold.
[22-02-26] [paper195]
  • Out of Distribution Data Detection Using Dropout Bayesian Neural Networks [pdf] [annotated pdf]
  • Andre T. Nguyen, Fred Lu, Gary Lopez Munoz, Edward Raff, Charles Nicholas, James Holt
  • AAAI 2022
  • [Out-of-Distribution Detection]
Quite interesting and well-written paper. It seemed quite niche at first, but I think their analysis could potentially be useful.
[22-02-26] [paper194]
  • Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks [pdf] [code] [annotated pdf]
  • Shiyu Liang, Yixuan Li, R. Srikant
  • ICLR 2018
  • [Out-of-Distribution Detection]
Quite interesting and well-written paper. Two simple modifications of the "maximum softmax score" baseline, and the performance is consistently improved. The input perturbation method is quite interesting. Intuitively, it's not entirely clear to me why it actually works.
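A minimal numpy sketch of what those two modifications boil down to, as I understand them (my own reconstruction, so treat the details and the `T=1000` default as assumptions rather than the paper's exact setup):

```python
import numpy as np

def odin_score(logits, T=1000.0):
    # Modification 1: temperature-scaled max-softmax score.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # for numerical stability
    p = np.exp(z)
    p = p / p.sum(axis=-1, keepdims=True)
    return p.max(axis=-1)

# Modification 2 (the input perturbation) needs gradients w.r.t. the input x:
#   x_tilde = x - eps * sign(-d log odin_score(f(x)) / d x)
# i.e. nudge x in the direction that increases the softmax score, and then
# threshold odin_score evaluated at x_tilde instead of at x.
```

With a large T the softmax is flattened, which is supposedly what makes the in-distribution vs OOD score gap easier to threshold.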
[22-02-25] [paper193]
  • Confidence-based Out-of-Distribution Detection: A Comparative Study and Analysis [pdf] [code] [annotated pdf]
  • Christoph Berger, Magdalini Paschali, Ben Glocker, Konstantinos Kamnitsas
  • MICCAI Workshops 2021
  • [Out-of-Distribution Detection], [ML for Medicine/Healthcare]
Interesting and well-written paper. Interesting that Mahalanobis works very well on CIFAR10 vs SVHN but not on the medical imaging dataset. I don't quite get how/why the ODIN method works, I'll probably have to read that paper.
[22-02-25] [paper192]
  • Deep Learning Through the Lens of Example Difficulty [pdf] [annotated pdf]
  • Robert John Nicholas Baldock, Hartmut Maennel, Behnam Neyshabur
  • NeurIPS 2021
  • [Theoretical Properties of Deep Learning]
Quite interesting and well-written paper. The definition of "prediction depth" in Section 2.1 makes sense, and it definitely seems reasonable that this could correlate with example difficulty / prediction confidence in some way. Sections 3 and 4, and all the figures, seem to contain a lot of info; I'd probably need to read the paper again to properly understand/appreciate everything.
[22-02-24] [paper191]
  • UncertaINR: Uncertainty Quantification of End-to-End Implicit Neural Representations for Computed Tomography [pdf] [annotated pdf]
  • Francisca Vasconcelos, Bobby He, Nalini Singh, Yee Whye Teh
  • TMLR, 2023
  • [Implicit Neural Representations], [Uncertainty Estimation], [ML for Medicine/Healthcare]
Interesting and well-written paper. I wasn't very familiar with CT image reconstruction, but they do a good job explaining everything. Interesting that MC-dropout seems important for getting well-calibrated predictions.
[22-02-21] [paper190]
  • Can You Trust Predictive Uncertainty Under Real Dataset Shifts in Digital Pathology? [pdf] [annotated pdf]
  • Jeppe Thagaard, Søren Hauberg, Bert van der Vegt, Thomas Ebstrup, Johan D. Hansen, Anders B. Dahl
  • MICCAI 2020
  • [Uncertainty Estimation], [Out-of-Distribution Detection], [ML for Medicine/Healthcare]
Quite interesting and well-written paper. They compare MC-dropout, ensembling and mixup (with a standard softmax classifier as the baseline). Nothing groundbreaking, but the studied application (classification of pathology slides for cancer) is very interesting. The FPR95 metrics for OOD detection in Table 4 are terrible for ensembling, but the classification accuracy (89.7) is also pretty much the same as for D_test_int in Table 3 (90.1)? So, it doesn't really matter that the model isn't capable of distinguishing this "OOD" data from in-distribution?
[22-02-21] [paper189]
  • Robust Uncertainty Estimates with Out-of-Distribution Pseudo-Inputs Training [pdf] [annotated pdf]
  • Pierre Segonne, Yevgen Zainchkovskyy, Søren Hauberg
  • 2022-01
  • [Uncertainty Estimation]
Somewhat interesting paper. I didn't quite understand everything, so it could be more interesting than I think. The fact that their pseudo-input generation process "relies on the availability of a differentiable density estimate of the data" seems like a big limitation? For regression, they only applied their method to very low-dimensional input data (1D toy regression and UCI benchmarks), but would this work for image-based tasks?
[22-02-19] [paper188]
  • Contrastive Training for Improved Out-of-Distribution Detection [pdf] [annotated pdf]
  • Jim Winkens, Rudy Bunel, Abhijit Guha Roy, Robert Stanforth, Vivek Natarajan, Joseph R. Ledsam, Patricia MacWilliams, Pushmeet Kohli, Alan Karthikesalingam, Simon Kohl, Taylan Cemgil, S. M. Ali Eslami, Olaf Ronneberger
  • 2020-07
  • [Out-of-Distribution Detection]
Quite interesting and very well-written paper. They take the method from the Mahalanobis paper ("A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks") (however, they fit Gaussians only to the features at the second-to-last network layer, and they don't use the input pre-processing either) and consistently improve OOD detection performance by incorporating contrastive training. Specifically, they first train the network using just the SimCLR loss for a large number of epochs, and then also add the standard classification loss. I didn't quite get why the label smoothing is necessary, but according to Table 2 it's responsible for a large portion of the performance gain.
[22-02-19] [paper187]
  • A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks [pdf] [code] [annotated pdf]
  • Kimin Lee, Kibok Lee, Honglak Lee, Jinwoo Shin
  • NeurIPS 2018
  • [Out-of-Distribution Detection]
Well-written and interesting paper. The proposed method is simple and really neat: fit class-conditional Gaussians in the feature space of a pre-trained classifier (basically just LDA on the feature vectors), and then use the Mahalanobis distance to these Gaussians as the confidence score for input x. They then also do this for the features at multiple levels of the network and combine these confidence scores into one. I don't quite get why the "input pre-processing" in Section 2.2 (adding noise to test samples) works, in Table 1 it significantly improves the performance.
[22-02-19] [paper186]
  • Noise Contrastive Priors for Functional Uncertainty [pdf] [code] [annotated pdf]
  • Danijar Hafner, Dustin Tran, Timothy Lillicrap, Alex Irpan, James Davidson
  • UAI 2019
  • [Uncertainty Estimation], [Out-of-Distribution Detection]
Quite interesting and well-written paper. Only experiments on a toy 1D regression problem, and flight delay prediction in which the input is 8D. The approach of just adding noise to the input x to get OOD samples would probably not work very well e.g. for image-based problems?
[22-02-18] [paper185]
  • Does Your Dermatology Classifier Know What It Doesn't Know? Detecting the Long-Tail of Unseen Conditions [pdf] [annotated pdf]
  • Abhijit Guha Roy, Jie Ren, Shekoofeh Azizi, Aaron Loh, Vivek Natarajan, Basil Mustafa, Nick Pawlowski, Jan Freyberg, Yuan Liu, Zach Beaver, Nam Vo, Peggy Bui, Samantha Winter, Patricia MacWilliams, Greg S. Corrado, Umesh Telang, Yun Liu, Taylan Cemgil, Alan Karthikesalingam, Balaji Lakshminarayanan, Jim Winkens
  • Medical Image Analysis, 2022
  • [Out-of-Distribution Detection], [ML for Medicine/Healthcare]
Well-written and interesting paper. Quite long, so it took a bit longer than usual to read it. Sections 1 and 2 give a great overview of OOD detection in general, and how it can be used specifically in this dermatology setting. I can definitely recommend reading Section 2 (Related work). They assume access to some outlier data during training, so their approach is similar to the "Outlier exposure" method (specifically in this dermatology setting, they say that this is a fair assumption). Their method is an improvement of the "reject bucket" (add an extra class which you assign to all outlier training data points), in their proposed method they also use fine-grained classification of the outlier skin conditions. Then they also use an ensemble of 5 models, and also a more diverse ensemble (in which they combine models trained with different representation learning techniques). This diverse ensemble obtains the best performance.
[22-02-16] [paper184]
  • Being a Bit Frequentist Improves Bayesian Neural Networks [pdf] [code] [annotated pdf]
  • Agustinus Kristiadi, Matthias Hein, Philipp Hennig
  • AISTATS 2022
  • [Uncertainty Estimation], [Out-of-Distribution Detection]
Interesting and well-written paper. The proposed method makes intuitive sense, trying to incorporate the "OOD training" method (i.e., to use some kind of OOD data during training, similar to e.g. the "Deep Anomaly Detection with Outlier Exposure" paper) into the Bayesian deep learning approach. The experimental results do seem quite promising.
[22-02-15] [paper183]
  • Mixtures of Laplace Approximations for Improved Post-Hoc Uncertainty in Deep Learning [pdf] [code] [annotated pdf]
  • Runa Eschenhagen, Erik Daxberger, Philipp Hennig, Agustinus Kristiadi
  • NeurIPS Workshops 2021
  • [Uncertainty Estimation], [Out-of-Distribution Detection]
Well-written and interesting paper. Short paper of just 3 pages, but with an extensive appendix which I definitely recommend going through. The method, training an ensemble and then applying the Laplace approximation to each network, is very simple and intuitively makes a lot of sense. I didn't realize that this would have basically the same test-time speed as ensembling (since they utilize that probit approximation), that's very neat. It also seems to consistently outperform ensembling a bit across almost all tasks and metrics.
[22-02-15] [paper182]
  • Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning [pdf] [annotated pdf]
  • Chenjia Bai, Lingxiao Wang, Zhuoran Yang, Zhi-Hong Deng, Animesh Garg, Peng Liu, Zhaoran Wang
  • ICLR 2022
  • [Uncertainty Estimation], [Reinforcement Learning]
Well-written and somewhat interesting paper. I'm not overly familiar with RL, which makes it a bit difficult for me to properly evaluate the paper's contributions. They use standard ensembles for uncertainty estimation combined with an OOD sampling regularization. I thought that the OOD sampling could be interesting, but it seems very specific to RL. I'm sure this paper is quite interesting for people doing RL, but I don't think it's overly useful for me.
[22-02-15] [paper181]
  • On the Pitfalls of Heteroscedastic Uncertainty Estimation with Probabilistic Neural Networks [pdf] [code] [annotated pdf]
  • Maximilian Seitzer, Arash Tavakoli, Dimitrije Antic, Georg Martius
  • ICLR 2022
  • [Uncertainty Estimation]
Quite interesting and very well-written paper, I enjoyed reading it. Their analysis of fitting Gaussian regression models via the NLL is quite interesting, I didn't really expect to learn something new about this. I've seen Gaussian models outperform standard regression (L2 loss) w.r.t. accuracy in some applications/datasets, and it being the other way around in others. In the first case, I've then attributed the success of the Gaussian model to the "learned loss attenuation". The analysis in this paper could perhaps explain why you get this performance boost only in certain applications. Their beta-NLL loss could probably be quite useful, seems like a convenient tool to have.
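A numpy sketch of what I take the beta-NLL loss to be (an assumption on my part; note that in the paper the `var**beta` factor is treated as a constant via a stop-gradient during backprop, which plain numpy obviously doesn't express):

```python
import numpy as np

def beta_nll(mean, var, target, beta=0.5):
    # Per-sample Gaussian NLL (up to an additive constant)...
    nll = 0.5 * (np.log(var) + (target - mean) ** 2 / var)
    # ...weighted by var**beta. beta=0 recovers the standard NLL,
    # while beta=1 roughly recovers the scaling of the L2 loss,
    # interpolating between the two training behaviours.
    return np.mean(var ** beta * nll)
```

So beta is the knob that trades off the "learned loss attenuation" of the Gaussian NLL against plain L2-style fitting.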
[22-02-15] [paper180]
  • Sample Efficient Deep Reinforcement Learning via Uncertainty Estimation [pdf] [annotated pdf]
  • Vincent Mai, Kaustubh Mani, Liam Paull
  • ICLR 2022
  • [Uncertainty Estimation], [Reinforcement Learning]
Well-written and somewhat interesting paper. I'm not overly familiar with reinforcement learning, which makes it a bit difficult for me to properly evaluate the paper's contributions, but to me it seems like fairly straightforward method modifications? To use ensembles of Gaussian models (instead of ensembles of models trained using the L2 loss) makes sense. The BIV method I didn't quite get, it seems rather ad hoc? I also don't quite get exactly how it's used in equation (10), is the ensemble of Gaussian models trained _jointly_ using this loss? I don't really know if this could be useful outside of RL.
[22-02-14] [paper179]
  • Laplace Redux -- Effortless Bayesian Deep Learning [pdf] [code] [annotated pdf]
  • Erik Daxberger, Agustinus Kristiadi, Alexander Immer, Runa Eschenhagen, Matthias Bauer, Philipp Hennig
  • NeurIPS 2021
  • [Uncertainty Estimation]
Interesting and very well-written paper, I enjoyed reading it. I still think that ensembling probably is quite difficult to beat purely in terms of uncertainty estimation quality, but this definitely seems like a useful tool in many situations. It's not clear to me if the analytical expression for regression in "4. Approximate Predictive Distribution" is applicable also if the variance is input-dependent?
[22-02-12] [paper178]
  • Benchmarking Uncertainty Quantification on Biosignal Classification Tasks under Dataset Shift [pdf] [annotated pdf]
  • Tong Xia, Jing Han, Cecilia Mascolo
  • AAAI Workshops 2022
  • [Uncertainty Estimation], [Out-of-Distribution Detection], [ML for Medicine/Healthcare]
Well-written and interesting paper. They synthetically create dataset shifts (e.g. by adding Gaussian noise to the data) of increasing intensity and study whether or not the uncertainty increases as the accuracy degrades. They compare regular softmax, temperature scaling, MC-dropout, ensembling and a simple variational inference method. Their conclusion is basically that ensembling slightly outperforms the other methods, but that no method performs overly well. I think these types of studies are really useful.
[22-02-12] [paper177]
  • Deep Evidential Regression [pdf] [code] [annotated pdf]
  • Alexander Amini, Wilko Schwarting, Ava Soleimany, Daniela Rus
  • NeurIPS 2020
  • [Uncertainty Estimation], [Out-of-Distribution Detection]
Well-written and interesting paper. This is a good paper to read before "Natural Posterior Network: Deep Bayesian Predictive Uncertainty for Exponential Family Distributions". Their proposed method seems to have similar / slightly worse performance than a small ensemble, so the only real advantage is that it's faster at test-time? This is of course very important in many applications, but not in all. The performance also seems quite sensitive to the choice of lambda in the combined loss function (Equation (10)), according to Figure S2 in the appendix?
[22-02-11] [paper176]
  • On Out-of-distribution Detection with Energy-based Models [pdf] [code] [annotated pdf]
  • Sven Elflein, Bertrand Charpentier, Daniel Zügner, Stephan Günnemann
  • ICML Workshops 2021
  • [Out-of-Distribution Detection], [Energy-Based Models]
Well-written and quite interesting paper. A short paper, just 4 pages. They don't study the method from the "Energy-based Out-of-distribution Detection" paper as I had expected, but it was still a quite interesting read. The results in Section 4.2 seem interesting, especially for experiment 3, but I'm not sure that I properly understand everything.
[22-02-10] [paper175]
  • Natural Posterior Network: Deep Bayesian Predictive Uncertainty for Exponential Family Distributions [pdf] [annotated pdf]
  • Bertrand Charpentier, Oliver Borchert, Daniel Zügner, Simon Geisler, Stephan Günnemann
  • ICLR 2022
  • [Uncertainty Estimation], [Out-of-Distribution Detection]
Interesting and well-written paper. I didn't quite understand all the details, I'll have to read a couple of related/background papers to be able to properly appreciate and evaluate the proposed method. I definitely feel like I would like to read up on this family of methods. Extensive experimental evaluation, and the results seem promising overall.
[22-02-09] [paper174]
  • Energy-based Out-of-distribution Detection [pdf] [code] [annotated pdf]
  • Weitang Liu, Xiaoyun Wang, John D. Owens, Yixuan Li
  • NeurIPS 2020
  • [Out-of-Distribution Detection], [Energy-Based Models]
Interesting and well-written paper. The proposed method is quite clearly explained and makes intuitive sense (at least if you're familiar with EBMs). Compared to using the softmax score, the performance does seem to improve consistently. Seems like fine-tuning on an "auxiliary outlier dataset" is required to get really good performance though, which you can't really assume to have access to in real-world problems, I suppose?
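The energy score itself is a one-liner on the classifier's logits; a numpy sketch (lower energy should correspond to more in-distribution inputs):

```python
import numpy as np

def energy_score(logits, T=1.0):
    # E(x; f) = -T * logsumexp(f(x) / T), computed stably by
    # factoring out the max logit before exponentiating.
    z = logits / T
    m = z.max(axis=-1)
    return -T * (m + np.log(np.exp(z - m[..., None]).sum(axis=-1)))
```

Unlike the max-softmax score, this uses the un-normalized logit magnitudes, so a confidently classified input (one large logit) gets much lower energy than a flat, uncertain one.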
[22-02-09] [paper173]
  • VOS: Learning What You Don't Know by Virtual Outlier Synthesis [pdf] [code] [annotated pdf]
  • Xuefeng Du, Zhaoning Wang, Mu Cai, Yixuan Li
  • ICLR 2022
  • [Out-of-Distribution Detection]
Interesting and quite well-written paper. I did find it somewhat difficult to understand certain parts though, they could perhaps be explained more clearly. The results seem quite impressive (they do consistently outperform all baselines), but I find it interesting that the "Gaussian noise" baseline in Table 2 performs that well? I should probably have read "Energy-based Out-of-distribution Detection" before reading this paper.

Papers Read in 2021:

[21-12-16] [paper172]
  • Efficiently Modeling Long Sequences with Structured State Spaces [pdf] [code] [annotated pdf]
  • Albert Gu, Karan Goel, Christopher Ré
  • ICLR 2022
  • [Sequence Modeling]
Very interesting and quite well-written paper. Kind of neat/fun to see state-space models being used. The experimental results seem very impressive!? I didn't fully understand everything in Section 3. I had to read Section 3.4 a couple of times to understand how the parameterization actually works in practice (you have H state-space models, one for each feature dimension, so that you can map a sequence of feature vectors to another sequence of feature vectors) (and you can then also have multiple such layers of state-space models, mapping sequence --> sequence --> sequence --> ....).
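To make the Section 3.4 structure concrete for myself, a toy numpy sketch of that parameterization (a drastic simplification of my own: a naive sequential recurrence, nothing like the actual efficient S4 computation or its special initialization):

```python
import numpy as np

def ssm_layer(u, A, B, C):
    # u: (L, H) sequence of H-dim feature vectors.
    # One independent discrete-time linear SSM per feature channel h:
    #   x_t = A[h] x_{t-1} + B[h] u_t,   y_t = C[h] . x_t
    # A: (H, N, N), B: (H, N), C: (H, N). Output is again (L, H),
    # so layers can be stacked: sequence -> sequence -> sequence.
    L, H = u.shape
    N = B.shape[1]
    x = np.zeros((H, N))
    y = np.zeros((L, H))
    for t in range(L):
        for h in range(H):
            x[h] = A[h] @ x[h] + B[h] * u[t, h]
            y[t, h] = C[h] @ x[h]
    return y
```

Since the layer maps an (L, H) sequence to an (L, H) sequence, its output can be fed straight into another such layer, which is the stacking described above.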
[21-12-09] [paper171]
  • Periodic Activation Functions Induce Stationarity [pdf] [code] [annotated pdf]
  • Lassi Meronen, Martin Trapp, Arno Solin
  • NeurIPS 2021
  • [Uncertainty Estimation], [Out-of-Distribution Detection]
Quite interesting and well-written paper. Quite a heavy read, probably need to be rather familiar with GPs to properly understand/appreciate everything. Definitely check Appendix D, it gives a better understanding of how the proposed method is applied in practice. I'm not quite sure how strong/impressive the experimental results actually are. Also seems like the method could be a bit inconvenient to implement/use?
[21-12-03] [paper170]
  • Reliable and Trustworthy Machine Learning for Health Using Dataset Shift Detection [pdf] [annotated pdf]
  • Chunjong Park, Anas Awadalla, Tadayoshi Kohno, Shwetak Patel
  • NeurIPS 2021
  • [Out-of-Distribution Detection], [ML for Medicine/Healthcare]
Interesting and very well-written paper. Gives a good overview of the field and contains a lot of seemingly useful references. The evaluation is very comprehensive. The user study is quite neat.
[21-12-02] [paper169]
  • An Information-theoretic Approach to Distribution Shifts [pdf] [code] [annotated pdf]
  • Marco Federici, Ryota Tomioka, Patrick Forré
  • NeurIPS 2021
  • [Theoretical Properties of Deep Learning]
Quite well-written paper overall that seemed interesting, but I found it very difficult to properly understand everything. Thus, I can't really tell how interesting/significant their analysis actually is.
[21-11-25] [paper168]
  • On the Importance of Gradients for Detecting Distributional Shifts in the Wild [pdf] [code] [annotated pdf]
  • Rui Huang, Andrew Geng, Yixuan Li
  • NeurIPS 2021
  • [Out-of-Distribution Detection]
Quite interesting and well-written paper. The experimental results do seem promising. However, I don't quite get why the proposed method intuitively makes sense, why is it better to only use the parameters of the final network layer?
[21-11-18] [paper167]
  • Masked Autoencoders Are Scalable Vision Learners [pdf] [annotated pdf]
  • Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick
  • CVPR 2022
  • [Representation Learning]
Interesting and well-written paper. The proposed method is simple and makes a lot of intuitive sense, which is rather satisfying. After page 4, there's mostly just detailed ablations and results.
[21-11-11] [paper166]
  • Transferring Inductive Biases through Knowledge Distillation [pdf] [code] [annotated pdf]
  • Samira Abnar, Mostafa Dehghani, Willem Zuidema
  • 2020-05
  • [Theoretical Properties of Deep Learning]
Quite well-written and somewhat interesting paper. I'm not very familiar with this area. I didn't spend too much time trying to properly evaluate the significance of the findings.
[21-10-28] [paper165]
  • Deep Classifiers with Label Noise Modeling and Distance Awareness [pdf] [annotated pdf]
  • Vincent Fortuin, Mark Collier, Florian Wenzel, James Allingham, Jeremiah Liu, Dustin Tran, Balaji Lakshminarayanan, Jesse Berent, Rodolphe Jenatton, Effrosyni Kokiopoulou
  • TMLR, 2022
  • [Uncertainty Estimation]
Quite interesting and well-written paper. I find the distance-awareness property more interesting than modelling of input/class-dependent label noise, so the proposed method (HetSNGP) is perhaps not overly interesting compared to the SNGP baseline.
[21-10-21] [paper164]
  • Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets [pdf] [code] [annotated pdf]
  • Alethea Power, Yuri Burda, Harri Edwards, Igor Babuschkin, Vedant Misra
  • ICLR Workshops 2021
  • [Theoretical Properties of Deep Learning]
Somewhat interesting paper. The phenomenon observed in Figure 1, that validation accuracy suddenly increases long after the training data has been fit almost perfectly, is quite interesting. I didn't quite understand the datasets they use (binary operation tables).
[21-10-14] [paper163]
  • Learning to Simulate Complex Physics with Graph Networks [pdf] [code] [annotated pdf]
  • Alvaro Sanchez-Gonzalez, Jonathan Godwin, Tobias Pfaff, Rex Ying, Jure Leskovec, Peter W. Battaglia
  • ICML 2020
  • [Graph Neural Networks]
Quite well-written and somewhat interesting paper. Cool application and a bunch of neat videos. This is not really my area, so I didn't spend too much time/energy trying to fully understand everything.
[21-10-12] [paper162]
  • Neural Unsigned Distance Fields for Implicit Function Learning [pdf] [code] [annotated pdf]
  • Julian Chibane, Aymen Mir, Gerard Pons-Moll
  • NeurIPS 2020
  • [Implicit Neural Representations]
Interesting and very well-written paper, I really enjoyed reading it! The paper also gives a good understanding of neural implicit representations in general.
[21-10-08] [paper161]
  • Probabilistic 3D Human Shape and Pose Estimation from Multiple Unconstrained Images in the Wild [pdf] [annotated pdf]
  • Akash Sengupta, Ignas Budvytis, Roberto Cipolla
  • CVPR 2021
  • [3D Human Pose Estimation]
Well-written and quite interesting paper. I read it mainly as background for "Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild" which is written by exactly the same authors. In this paper, they predict a single Gaussian distribution for the pose (instead of hierarchical matrix-Fisher distributions). Also, they mainly focus on the body shape. They also use silhouettes + 2D keypoint heatmaps as input (instead of edge-filters + 2D keypoint heatmaps).
[21-10-08] [paper160]
  • Synthetic Training for Accurate 3D Human Pose and Shape Estimation in the Wild [pdf] [code] [annotated pdf]
  • Akash Sengupta, Ignas Budvytis, Roberto Cipolla
  • BMVC 2020
  • [3D Human Pose Estimation]
Well-written and fairly interesting paper. I read it mainly as background for "Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild" which is written by exactly the same authors. In this paper, they just use direct regression. They also use silhouettes + 2D keypoint heatmaps as input (instead of edge-filters + 2D keypoint heatmaps).
[21-10-07] [paper159]
  • Learning Motion Priors for 4D Human Body Capture in 3D Scenes [pdf] [code] [annotated pdf]
  • Siwei Zhang, Yan Zhang, Federica Bogo, Marc Pollefeys, Siyu Tang
  • ICCV 2021
  • [3D Human Pose Estimation]
Well-written and quite interesting paper. I didn't fully understand everything though, and it feels like I probably don't know this specific setting/problem well enough to fully appreciate the paper. 
[21-10-07] [paper158]
  • Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild [pdf] [code] [annotated pdf]
  • Akash Sengupta, Ignas Budvytis, Roberto Cipolla
  • ICCV 2021
  • [3D Human Pose Estimation]
Well-written and very interesting paper, I enjoyed reading it. The hierarchical distribution prediction approach makes sense and consistently outperforms the independent baseline. Using matrix-Fisher distributions makes sense. The synthetic training framework and the input representation of edge-filters + 2D keypoint heatmaps are both interesting.
[21-10-06] [paper157]
  • SMD-Nets: Stereo Mixture Density Networks [pdf] [code] [annotated pdf]
  • Fabio Tosi, Yiyi Liao, Carolin Schmitt, Andreas Geiger
  • CVPR 2021
  • [Uncertainty Estimation]
Well-written and interesting paper. Quite easy to read and follow, the method is clearly explained and makes intuitive sense.
[21-10-04] [paper156]
  • We are More than Our Joints: Predicting how 3D Bodies Move [pdf] [code] [annotated pdf]
  • Yan Zhang, Michael J. Black, Siyu Tang
  • CVPR 2021
  • [3D Human Pose Estimation]
Well-written and fairly interesting paper. The marker-based representation, instead of using skeleton joints, makes sense. The recursive projection scheme also makes sense, but seems very slow (2.27 sec/frame)? I didn't quite get all the details for their DCT representation of the latent space.
[21-10-03] [paper155]
  • imGHUM: Implicit Generative Models of 3D Human Shape and Articulated Pose [pdf] [code] [annotated pdf]
  • Thiemo Alldieck, Hongyi Xu, Cristian Sminchisescu
  • ICCV 2021
  • [3D Human Pose Estimation], [Implicit Neural Representations]
Interesting and very well-written paper, I really enjoyed reading it. Interesting combination of implicit representations and 3D human modelling. The "inclusive human modelling" application is neat and important.
[21-10-03] [paper154]
  • DI-Fusion: Online Implicit 3D Reconstruction with Deep Priors [pdf] [code] [annotated pdf]
  • Jiahui Huang, Shi-Sheng Huang, Haoxuan Song, Shi-Min Hu
  • CVPR 2021
  • [Implicit Neural Representations]
Well-written and interesting paper, I enjoyed reading it. Neat application of implicit representations. The paper also gives a quite good overview of online 3D reconstruction in general.
[21-10-02] [paper153]
  • Contextually Plausible and Diverse 3D Human Motion Prediction [pdf] [annotated pdf]
  • Sadegh Aliakbarian, Fatemeh Sadat Saleh, Lars Petersson, Stephen Gould, Mathieu Salzmann
  • ICCV 2021
  • [3D Human Pose Estimation]
Well-written and quite interesting paper. The main idea, using a learned conditional prior p(z|c) instead of just p(z), makes sense and was shown beneficial also in "HuMoR: 3D Human Motion Model for Robust Pose Estimation". I'm however somewhat confused by their specific implementation in Section 4, doesn't seem like a standard cVAE implementation?
[21-10-01] [paper152]
  • Local Implicit Grid Representations for 3D Scenes [pdf] [code] [annotated pdf]
  • Chiyu Max Jiang, Avneesh Sud, Ameesh Makadia, Jingwei Huang, Matthias Nießner, Thomas Funkhouser
  • CVPR 2020
  • [Implicit Neural Representations]
Well-written and quite interesting paper. Interesting application, being able to reconstruct full 3D scenes from sparse point clouds. I didn't fully understand everything, as I don't have a particularly strong graphics background.
[21-09-29] [paper151]
  • Information Dropout: Learning Optimal Representations Through Noisy Computation [pdf] [annotated pdf]
  • Alessandro Achille, Stefano Soatto
  • 2016-11
  • [Representation Learning]
Well-written and somewhat interesting paper overall. I'm not overly familiar with the topics of the paper, and didn't fully understand everything. Some results and insights seem quite interesting/neat, but I'm not sure exactly what the main takeaways should be, or how significant they actually are.
[21-09-24] [paper150]
  • Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation [pdf] [code] [annotated pdf]
  • Ziniu Wan, Zhengjia Li, Maoqing Tian, Jianbo Liu, Shuai Yi, Hongsheng Li
  • ICCV 2021
  • [3D Human Pose Estimation]
Well-written and fairly interesting paper. Quite a lot of details on the attention architecture, which I personally don't find overly interesting. The experimental results are quite impressive, but I would like to see a comparison in terms of computational cost at test-time. It sounds like their method is rather slow.
[21-09-23] [paper149]
  • Physics-based Human Motion Estimation and Synthesis from Videos [pdf] [annotated pdf]
  • Kevin Xie, Tingwu Wang, Umar Iqbal, Yunrong Guo, Sanja Fidler, Florian Shkurti
  • ICCV 2021
  • [3D Human Pose Estimation]
Well-written and quite interesting paper. The general idea, refining frame-by-frame pose estimates via physical constraints, intuitively makes a lot of sense. I did however find it quite difficult to understand all the details in Section 3.
[21-09-21] [paper148]
  • Hierarchical VAEs Know What They Don't Know [pdf] [code] [annotated pdf]
  • Jakob D. Havtorn, Jes Frellsen, Søren Hauberg, Lars Maaløe
  • ICML 2021
  • [Uncertainty Estimation], [VAEs]
Very well-written and quite interesting paper, I enjoyed reading it. Everything is quite well-explained, it's relatively easy to follow. The paper provides a good overview of the out-of-distribution detection problem and current methods.
[21-09-17] [paper147]
  • Human Pose Regression with Residual Log-likelihood Estimation [pdf] [code] [annotated pdf]
  • Jiefeng Li, Siyuan Bian, Ailing Zeng, Can Wang, Bo Pang, Wentao Liu, Cewu Lu
  • ICCV 2021
  • [3D Human Pose Estimation]
Quite interesting paper, but also quite strange/confusing. I don't think the proposed method is explained particularly well, at least I found it quite difficult to properly understand what they actually are doing. In the end it seems like they are learning a global loss function that is very similar to doing probabilistic regression with a Gauss/Laplace model of p(y|x) (with learned mean and variance)? See Figure 4 in the Appendix. And while it's true that their performance is much better than for direct regression with an L2/L1 loss (see e.g. Table 1), they only compare with Gauss/Laplace probabilistic regression once (Table 7) and in that case the Laplace model is actually quite competitive?
[21-09-15] [paper146]
  • NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis [pdf] [code] [annotated pdf]
  • Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng
  • ECCV 2020
  • [Implicit Neural Representations]
Extremely well-written and interesting paper. I really enjoyed reading it, and I would recommend anyone interested in computer vision to read it as well. All parts of the proposed method are clearly explained and relatively easy to understand, including the volume rendering techniques which I was unfamiliar with.
[21-09-08] [paper145]
  • Revisiting the Calibration of Modern Neural Networks [pdf] [code] [annotated pdf]
  • Matthias Minderer, Josip Djolonga, Rob Romijnders, Frances Hubis, Xiaohua Zhai, Neil Houlsby, Dustin Tran, Mario Lucic
  • NeurIPS 2021
  • [Uncertainty Estimation]
Well-written paper. Everything is quite clearly explained and easy to understand. Quite enjoyable to read overall. Thorough experimental evaluation. Quite interesting findings.
[21-09-02] [paper144]
  • Differentiable Particle Filtering via Entropy-Regularized Optimal Transport [pdf] [code] [annotated pdf]
  • Adrien Corenflos, James Thornton, George Deligiannidis, Arnaud Doucet
  • ICML 2021
  • [Sequence Modeling]
[21-09-02] [paper143]
  • Character Controllers Using Motion VAEs [pdf] [code] [annotated pdf]
  • Hung Yu Ling, Fabio Zinno, George Cheng, Michiel van de Panne
  • SIGGRAPH 2020
  • [3D Human Pose Estimation]
[21-08-27] [paper142]
  • DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation [pdf] [code] [annotated pdf]
  • Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, Steven Lovegrove
  • CVPR 2019
  • [Implicit Neural Representations]
[21-06-19] [paper141]
  • Generating Multiple Hypotheses for 3D Human Pose Estimation with Mixture Density Network [pdf] [code] [annotated pdf]
  • Chen Li, Gim Hee Lee
  • CVPR 2019
  • [3D Human Pose Estimation]
[21-06-19] [paper140]
  • Expressive Body Capture: 3D Hands, Face, and Body from a Single Image [pdf] [code] [annotated pdf]
  • Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, Michael J. Black
  • CVPR 2019
  • [3D Human Pose Estimation]
Very well-written and quite interesting paper. Gives a good understanding of the SMPL model and the SMPLify method.
[21-06-18] [paper139]
  • Keep it SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image [pdf] [annotated pdf]
  • Federica Bogo, Angjoo Kanazawa, Christoph Lassner, Peter Gehler, Javier Romero, Michael J. Black
  • ECCV 2016
  • [3D Human Pose Estimation]
[21-06-18] [paper138]
  • Beyond Static Features for Temporally Consistent 3D Human Pose and Shape from a Video [pdf] [code] [annotated pdf]
  • Hongsuk Choi, Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee
  • CVPR 2021
  • [3D Human Pose Estimation]
[21-06-17] [paper137]
  • Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation [pdf] [code] [annotated pdf]
  • Hanbyul Joo, Natalia Neverova, Andrea Vedaldi
  • 3DV 2021
  • [3D Human Pose Estimation]
[21-06-17] [paper136]
  • Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop [pdf] [code] [annotated pdf]
  • Nikos Kolotouros, Georgios Pavlakos, Michael J. Black, Kostas Daniilidis
  • ICCV 2019
  • [3D Human Pose Estimation]
[21-06-16] [paper135]
  • A simple yet effective baseline for 3d human pose estimation [pdf] [code] [annotated pdf]
  • Julieta Martinez, Rayat Hossain, Javier Romero, James J. Little
  • ICCV 2017
  • [3D Human Pose Estimation]
[21-06-16] [paper134]
  • Estimating Egocentric 3D Human Pose in Global Space [pdf] [annotated pdf]
  • Jian Wang, Lingjie Liu, Weipeng Xu, Kripasindhu Sarkar, Christian Theobalt
  • ICCV 2021
  • [3D Human Pose Estimation]
[21-06-15] [paper133]
  • End-to-end Recovery of Human Shape and Pose [pdf] [code] [annotated pdf]
  • Angjoo Kanazawa, Michael J. Black, David W. Jacobs, Jitendra Malik
  • CVPR 2018
  • [3D Human Pose Estimation]
[21-06-14] [paper132]
  • 3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data [pdf] [annotated pdf]
  • Benjamin Biggs, Sébastien Ehrhadt, Hanbyul Joo, Benjamin Graham, Andrea Vedaldi, David Novotny
  • NeurIPS 2020
  • [3D Human Pose Estimation]
[21-06-04] [paper131]
  • HuMoR: 3D Human Motion Model for Robust Pose Estimation [pdf] [code] [annotated pdf]
  • Davis Rempe, Tolga Birdal, Aaron Hertzmann, Jimei Yang, Srinath Sridhar, Leonidas J. Guibas
  • ICCV 2021
  • [3D Human Pose Estimation]
[21-05-07] [paper130]
  • PixelTransformer: Sample Conditioned Signal Generation [pdf] [code] [annotated pdf]
  • Shubham Tulsiani, Abhinav Gupta
  • ICML 2021
  • [Neural Processes], [Transformers]
[21-04-29] [paper129]
  • Stiff Neural Ordinary Differential Equations [pdf] [annotated pdf]
  • Suyong Kim, Weiqi Ji, Sili Deng, Yingbo Ma, Christopher Rackauckas
  • 2021-03
  • [Neural ODEs]
[21-04-16] [paper128]
  • Learning Mesh-Based Simulation with Graph Networks [pdf] [code] [annotated pdf]
  • Tobias Pfaff, Meire Fortunato, Alvaro Sanchez-Gonzalez, Peter W. Battaglia
  • ICLR 2021
  • [Graph Neural Networks]
[21-04-09] [paper127]
  • Q-Learning in enormous action spaces via amortized approximate maximization [pdf] [annotated pdf]
  • Tom Van de Wiele, David Warde-Farley, Andriy Mnih, Volodymyr Mnih
  • 2020-01
  • [Reinforcement Learning]
[21-04-01] [paper126]
  • Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling [pdf] [code] [annotated pdf]
  • Gregory W. Benton, Wesley J. Maddox, Sanae Lotfi, Andrew Gordon Wilson
  • ICML 2021
  • [Uncertainty Estimation], [Ensembling]
[21-03-26] [paper125]
  • Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling [pdf] [annotated pdf]
  • Tong Che, Ruixiang Zhang, Jascha Sohl-Dickstein, Hugo Larochelle, Liam Paull, Yuan Cao, Yoshua Bengio
  • NeurIPS 2020
  • [Energy-Based Models]
[21-03-19] [paper124]
  • Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability [pdf] [annotated pdf]
  • Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar
  • ICLR 2021
  • [Theoretical Properties of Deep Learning]
[21-03-12] [paper123]
  • Unsupervised Learning of Visual Features by Contrasting Cluster Assignments [pdf] [code] [annotated pdf]
  • Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin
  • NeurIPS 2020
  • [Representation Learning]
[21-03-04] [paper122]
  • Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations [pdf] [annotated pdf]
  • Winnie Xu, Ricky T.Q. Chen, Xuechen Li, David Duvenaud
  • AISTATS 2022
  • [Neural ODEs], [Uncertainty Estimation]
[21-02-26] [paper121]
  • Neural Relational Inference for Interacting Systems [pdf] [code] [annotated pdf]
  • Thomas Kipf, Ethan Fetaya, Kuan-Chieh Wang, Max Welling, Richard Zemel
  • ICML 2018
  • [Miscellaneous]
[21-02-19] [paper120]
  • Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision [pdf] [annotated pdf]
  • Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig
  • ICML 2021
  • [Representation Learning], [Vision-Language Models]
[21-02-12] [paper119]
  • On the Origin of Implicit Regularization in Stochastic Gradient Descent [pdf] [annotated pdf]
  • Samuel L. Smith, Benoit Dherin, David G. T. Barrett, Soham De
  • ICLR 2021
  • [Theoretical Properties of Deep Learning]
[21-02-05] [paper118]
  • Meta Pseudo Labels [pdf] [code] [annotated pdf]
  • Hieu Pham, Zihang Dai, Qizhe Xie, Minh-Thang Luong, Quoc V. Le
  • CVPR 2021
  • [Miscellaneous]
[21-01-29] [paper117]
  • No MCMC for Me: Amortized Sampling for Fast and Stable Training of Energy-Based Models [pdf] [code] [annotated pdf]
  • Will Grathwohl, Jacob Kelly, Milad Hashemi, Mohammad Norouzi, Kevin Swersky, David Duvenaud
  • ICLR 2021
  • [Energy-Based Models]
[21-01-22] [paper116]
  • Getting a CLUE: A Method for Explaining Uncertainty Estimates [pdf] [annotated pdf]
  • Javier Antorán, Umang Bhatt, Tameem Adel, Adrian Weller, José Miguel Hernández-Lobato
  • ICLR 2021
  • [Uncertainty Estimation]
[21-01-15] [paper115]
  • Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention [pdf] [annotated pdf]
  • Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret
  • ICML 2020
  • [Transformers]

Papers Read in 2020:

[20-12-18] [paper114]
  • Score-Based Generative Modeling through Stochastic Differential Equations [pdf] [code] [annotated pdf]
  • Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole
  • ICLR 2021
  • [Diffusion Models]
[20-12-14] [paper113]
  • Dissecting Neural ODEs [pdf] [annotated pdf]
  • Stefano Massaroli, Michael Poli, Jinkyoo Park, Atsushi Yamashita, Hajime Asama
  • NeurIPS 2020
  • [Neural ODEs]
[20-11-27] [paper112]
  • Rethinking Attention with Performers [pdf] [annotated pdf]
  • Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller
  • ICLR 2021
  • [Transformers]
[20-11-23] [paper111]
  • Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images [pdf] [code] [annotated pdf]
  • Rewon Child
  • ICLR 2021
  • [VAEs]
[20-11-13] [paper110]
  • VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models [pdf] [annotated pdf]
  • Zhisheng Xiao, Karsten Kreis, Jan Kautz, Arash Vahdat
  • ICLR 2021
  • [Energy-Based Models], [VAEs]
[20-11-06] [paper109]
  • Approximate Inference Turns Deep Networks into Gaussian Processes [pdf] [annotated pdf]
  • Mohammad Emtiyaz Khan, Alexander Immer, Ehsan Abedi, Maciej Korzepa
  • NeurIPS 2019
  • [Theoretical Properties of Deep Learning]
[20-10-16] [paper108]
  • Implicit Gradient Regularization [pdf] [annotated pdf]
  • David G.T. Barrett, Benoit Dherin
  • ICLR 2021
  • [Theoretical Properties of Deep Learning]
Well-written and somewhat interesting paper. Quite interesting concept, makes some intuitive sense. Not sure if the experimental results were super convincing though.
[20-10-09] [paper107]
  • Satellite Conjunction Analysis and the False Confidence Theorem [pdf] [annotated pdf]
  • Michael Scott Balch, Ryan Martin, Scott Ferson
  • 2018-03
  • [Miscellaneous]
Quite well-written and somewhat interesting paper. Section 6 (Future and on-going work) is IMO the most interesting part of the paper ("We recognize the natural desire to balance the goal of preventing collisions against the goal of keeping manoeuvres at a reasonable level, and we further recognize that it may not be possible to achieve an acceptable balance between these two goals using present tracking resources"). To me, it seems like the difference between their proposed approach and the standard approach is mainly just a change in how to interpret very uncertain satellite trajectories. In the standard approach, two very uncertain trajectories are deemed NOT likely to collide (the two satellites could be basically anywhere, so what are the chances they will collide?). In their approach, they seem to instead say: "the two satellites could be basically anywhere, so they COULD collide!". They argue their approach prioritizes safety (which I guess it does, they will check more trajectories since they COULD collide), but it must also actually be useful in practice. I mean, the safest way to drive a car is to just remain stationary at all times, otherwise you risk colliding with something.
[20-09-24] [paper106]
  • Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness [pdf] [annotated pdf]
  • Jeremiah Zhe Liu, Zi Lin, Shreyas Padhy, Dustin Tran, Tania Bedrax-Weiss, Balaji Lakshminarayanan
  • NeurIPS 2020
  • [Uncertainty Estimation]
Interesting paper. Quite a heavy read (section 2 and 3). I didn't really spend enough time reading the paper to fully understand everything. The "distance awareness" concept intuitively makes a lot of sense, the example in Figure 1 is impressive, and the results on CIFAR10/100 are also encouraging. I did find section 3.1 quite confusing, Appendix A was definitely useful.
[20-09-21] [paper105]
  • Uncertainty Estimation Using a Single Deep Deterministic Neural Network [pdf] [code] [annotated pdf]
  • Joost van Amersfoort, Lewis Smith, Yee Whye Teh, Yarin Gal
  • ICML 2020
  • [Uncertainty Estimation]
Well-written and quite interesting paper. Interesting and neat idea, it definitely makes some intuitive sense. In the end though, I was not overly impressed. Once they used the more realistic setup on the CIFAR10 experiment (not using a third dataset to tune lambda), the proposed method was outperformed by ensembling (also using quite few networks). Yes, their method is more computationally efficient at test time (which is indeed very important in many applications), but it also seems quite a lot less convenient to train, involves setting a couple of important hyperparameters and so on. Interesting method and a step in the right direction though.
[20-09-11] [paper104]
  • Gated Linear Networks [pdf] [annotated pdf]
  • Joel Veness, Tor Lattimore, David Budden, Avishkar Bhoopchand, Christopher Mattern, Agnieszka Grabska-Barwinska, Eren Sezener, Jianan Wang, Peter Toth, Simon Schmitt, Marcus Hutter
  • AAAI 2021
  • [Miscellaneous]
Quite well-written and somewhat interesting paper. Interesting paper in the sense that it was quite different compared to basically all other papers I've read. The proposed method seemed odd in the beginning, but eventually I think I understood it reasonably well. Still not quite sure how useful GLNs actually would be in practice though. It seems promising for online/continual learning applications, but only toy examples were considered in the paper? I don't think I understand the method well enough to properly assess its potential impact.
[20-09-04] [paper103]
  • Denoising Diffusion Probabilistic Models [pdf] [code] [annotated pdf]
  • Jonathan Ho, Ajay Jain, Pieter Abbeel
  • NeurIPS 2020
  • [Energy-Based Models]
Quite well-written and interesting paper. I do find the connection between "diffusion probabilistic models" and denoising score matching relatively interesting. Since I was not familiar with diffusion probabilistic models, the paper was however a quite heavy read, and the established connection didn't really improve my intuition (reading Generative Modeling by Estimating Gradients of the Data Distribution gave a better understanding of score matching, I think).
[20-06-18] [paper102]
  • Joint Training of Variational Auto-Encoder and Latent Energy-Based Model [pdf] [code] [annotated pdf]
  • Tian Han, Erik Nijkamp, Linqi Zhou, Bo Pang, Song-Chun Zhu, Ying Nian Wu
  • CVPR 2020
  • [VAEs], [Energy-Based Models]
Interesting and very well-written paper. Neat and interesting idea. The paper is well-written and provides a clear and quite intuitive description of EBMs, VAEs and other related work. The comment "Learning well-formed energy landscape remains a challenging problem, and our experience suggests that the learned energy function can be sensitive to the setting of hyper-parameters and within the training algorithm." is somewhat concerning.
[20-06-12] [paper101]
  • End-to-End Object Detection with Transformers [pdf] [code] [annotated pdf]
  • Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko
  • ECCV 2020
  • [Object Detection]
Interesting and well-written paper. Interesting and quite neat idea. Impressive results on object detection and panoptic segmentation. It seems like the model requires longer training (500 vs 109 epochs?) and might be somewhat more difficult to train? Would be interesting to play around with the code. The "decoder output slot analysis" in Figure 7 is quite interesting. Would be interesting to further study what information has been captured in the object queries (which are just N vectors?) during training.
[20-06-05] [paper100]
  • Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors [pdf] [code] [annotated pdf]
  • Michael W. Dusenberry, Ghassen Jerfel, Yeming Wen, Yi-an Ma, Jasper Snoek, Katherine Heller, Balaji Lakshminarayanan, Dustin Tran
  • ICML 2020
  • [Uncertainty Estimation], [Variational Inference]
Quite well-written and interesting paper. Extension of the BatchEnsemble paper. Still a quite neat and simple idea, and performance seems to be consistently improved compared to BatchEnsemble. Not quite clear to me if the model is much more difficult to implement or train. Seems quite promising overall.
[20-05-27] [paper99]
  • BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning [pdf] [code] [video] [annotated pdf]
  • Yeming Wen, Dustin Tran, Jimmy Ba
  • ICLR 2020
  • [Uncertainty Estimation], [Ensembling]
Quite interesting and well-written paper. Neat and quite simple idea. I am however not entirely sure how easy it is to implement, it must complicate things somewhat at least? Not overly impressed by the calibration/uncertainty experiments, the proposed method is actually quite significantly outperformed by standard ensembling. The decrease in test-time computational cost is however impressive.
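A minimal sketch of the core BatchEnsemble trick as I understand it (my own illustration, not the authors' code): each member's weight matrix is the shared matrix W elementwise-multiplied with a rank-1 factor outer(r_i, s_i), which can be computed as cheap per-member scalings around one shared matrix multiply:

```python
import numpy as np

def batch_ensemble_forward(x, W, r, s):
    """Forward pass of one BatchEnsemble-style layer. Member i implicitly
    uses the weight W * outer(r[i], s[i]), but this is never materialized:
      x: (members, batch, d_in), W: (d_in, d_out),
      r: (members, d_in),        s: (members, d_out)."""
    # Scale inputs per member, apply shared weights, scale outputs per member.
    return ((x * r[:, None, :]) @ W) * s[:, None, :]
```

So the implementation overhead is mostly the extra per-member scaling vectors and the bookkeeping of replicating the batch across members.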
[20-05-10] [paper98]
  • Stable Neural Flows [pdf] [annotated pdf]
  • Stefano Massaroli, Michael Poli, Michelangelo Bin, Jinkyoo Park, Atsushi Yamashita, Hajime Asama
  • 2020-03
  • [Neural ODEs]
Somewhat well-written and interesting paper. Somewhat odd paper, I did not properly understand everything. It is not clear to me how the energy functional used here is connected to energy-based models.
[20-04-17] [paper97]
  • How Good is the Bayes Posterior in Deep Neural Networks Really? [pdf] [annotated pdf]
  • Florian Wenzel, Kevin Roth, Bastiaan S. Veeling, Jakub Świątkowski, Linh Tran, Stephan Mandt, Jasper Snoek, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin
  • ICML 2020
  • [Uncertainty Estimation], [Stochastic Gradient MCMC]
Somewhat well-written and interesting paper. Quite odd paper. They refer to the appendix a whole lot, this work is not really suited for an 8 page paper IMO. They present a bunch of hypotheses, but I do not quite know what to do with the results in the end. The paper is rather inconclusive. I found it somewhat odd that they only evaluate the methods in terms of predictive performance, that is usually not the reason why people turn to Bayesian deep learning models.
[20-04-09] [paper96]
  • Beyond temperature scaling: Obtaining well-calibrated multiclass probabilities with Dirichlet calibration [pdf] [code] [poster] [slides] [video] [annotated pdf]
  • Meelis Kull, Miquel Perello-Nieto, Markus Kängsepp, Telmo Silva Filho, Hao Song, Peter Flach
  • NeurIPS 2019
  • [Uncertainty Estimation]
Well-written and quite interesting paper. Does a good job describing different notions of calibration (Definition 1 - 3). Classwise-ECE intuitively makes sense as a reasonable metric. I did not quite follow the paragraph on interpretability (or figure 2). The experiments seem extensive and rigorously conducted. So, matrix scaling (with ODIR regularization) outperforms Dirichlet calibration?
[20-04-03] [paper95]
  • Normalizing Flows: An Introduction and Review of Current Methods [pdf] [annotated pdf]
  • Ivan Kobyzev, Simon Prince, Marcus A. Brubaker
  • TPAMI, 2021
  • [Normalizing Flows]
Quite well-written and somewhat interesting paper. The paper is probably too short for it to actually fulfill the goal of "provide context and explanation to enable a reader to become familiar with the basics". It seems to me like one would have to have a pretty good understanding of normalizing flows, and various common variants, already beforehand to actually benefit much from this paper.
[20-03-27] [paper94]
  • Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning [pdf] [code] [annotated pdf]
  • Arsenii Ashukha, Alexander Lyzhov, Dmitry Molchanov, Dmitry Vetrov
  • ICLR 2020
  • [Uncertainty Estimation], [Ensembling], [Stochastic Gradient MCMC]
Quite well-written and interesting paper. The number of compared methods is quite impressive. The paper provides further evidence for what intuitively makes A LOT of sense: "Deep ensembles dominate other methods given a fixed test-time budget. The results indicate, in particular, that exploration of different modes in the loss landscape is crucial for good predictive performance". While deep ensembles might require a larger amount of total training time, they are extremely simple to train and separate ensemble members can be trained completely in parallel. Overall then, deep ensembles is a baseline that's extremely hard to beat IMO. Not convinced that "calibrated log-likelihood" is an ideal metric that addresses the described flaws of commonly used metrics. For example, "...especially calibrated log-likelihood is highly correlated with accuracy" does not seem ideal. Also, how would you generalize it to regression?
[20-03-26] [paper93]
  • Conservative Uncertainty Estimation By Fitting Prior Networks [pdf] [annotated pdf]
  • ICLR 2020
  • [Uncertainty Estimation]
Interesting and somewhat well-written paper. I found it quite difficult to actually understand the method at first, I think the authors could have done a better job describing it. I guess that "f" should be replaced with "f_i" in equation (2)? "...the obtained uncertainties are larger than ones arrived at by Bayesian inference.", I did not quite get this though. The estimated uncertainty is conservative w.r.t. the posterior process associated with the prior process (the prior process defined by randomly initializing neural networks), but only if this prior process can be assumed to be Gaussian? So, do we actually have any guarantees? I am not sure if the proposed method actually is any less "hand-wavy" than e.g. ensembling. The experimental results seem quite promising, but I do not agree that this is "an extensive empirical comparison" (only experiments on CIFAR-10).
[20-03-09] [paper92]
  • Batch Normalization Biases Deep Residual Networks Towards Shallow Paths [pdf] [annotated pdf]
  • Soham De, Samuel L. Smith
  • NeurIPS 2020
  • [Theoretical Properties of Deep Learning]
Quite well-written and somewhat interesting paper. The fact that SkipInit enabled training of very deep networks without batchNorm is quite interesting. I don't think I fully understood absolutely everything.
[20-02-28] [paper91]
  • Bayesian Deep Learning and a Probabilistic Perspective of Generalization [pdf] [code] [annotated pdf]
  • Andrew Gordon Wilson, Pavel Izmailov
  • NeurIPS 2020
  • [Uncertainty Estimation], [Ensembling]
Quite interesting and somewhat well-written paper. While I did find the paper quite interesting, I also found it somewhat confusing overall. The authors touch upon many different concepts, and the connection between them is not always very clear. It is not quite clear what the main selling point of the paper is. Comparing ensembling with MultiSWAG does not really seem fair to me, as MultiSWAG would be 20x slower at test-time. The fact that MultiSWA (note: MultiSWA, not MultiSWAG) seems to outperform ensembling quite consistently in their experiment is however quite interesting, it is not obvious to me why that should be the case.
[20-02-21] [paper90]
  • Convolutional Conditional Neural Processes [pdf] [code] [annotated pdf]
  • Jonathan Gordon, Wessel P. Bruinsma, Andrew Y. K. Foong, James Requeima, Yann Dubois, Richard E. Turner
  • ICLR 2020
  • [Neural Processes]
Quite interesting and well-written paper. Took me a pretty long time to read this paper, it is a quite heavy/dense read. I still do not quite get when this type of model could/should be used in practice, all experiments in the paper seem at least somewhat synthetic to me.
[20-02-18] [paper89]
  • Probabilistic 3D Multi-Object Tracking for Autonomous Driving [pdf] [code] [annotated pdf]
  • Hsu-kuang Chiu, Antonio Prioletti, Jie Li, Jeannette Bohg
  • ICRA 2021
  • [3D Multi-Object Tracking]
Interesting and well-written paper. They provide more details for the Kalman filter, which I appreciate. The design choices that differ compared to AB3DMOT all make sense I think (e.g., Mahalanobis distance instead of 3D-IoU as the affinity measure in the data association), but the gain in performance in Table 1 does not seem overly significant, at least not compared to the huge gain seen when switching to the MEGVII 3D detector in AB3DMOT.
[20-02-15] [paper88]
  • A Baseline for 3D Multi-Object Tracking [pdf] [code] [annotated pdf]
  • Xinshuo Weng, Kris Kitani
  • IROS 2020
  • [3D Multi-Object Tracking]
Well-written and interesting paper. Provides a neat introduction to 3D multi-object tracking in general, especially since the proposed method is intentionally straightforward and simple. It seems like a very good starting point. It is not clear to me exactly how the update step in the Kalman filter is implemented? How did they set the covariance matrices? (I guess you could find this in the provided code though)
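For reference, a generic Kalman filter measurement update looks as follows (a textbook sketch, not the paper's actual implementation; their exact state parameterization and covariance settings would be in the provided code):

```python
import numpy as np

def kalman_update(x, P, z, H, R):
    """One Kalman filter measurement update.
    x: state mean, P: state covariance, z: measurement,
    H: measurement matrix, R: measurement noise covariance."""
    y = z - H @ x                          # innovation
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x_new = x + K @ y                      # updated state mean
    P_new = (np.eye(len(x)) - K @ H) @ P   # updated state covariance
    return x_new, P_new
```

The open question in the comment above is essentially how P, R (and the process noise in the predict step) are chosen.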
[20-02-14] [paper87]
Interesting and very well-written paper. I feel like I never quite know how significant improvements such as those in Table 2 actually are. What would you get if you instead used a more complex variational family (e.g. flow-based) and fitted that via standard KL, would the proposed method outperform also this baseline? And how would those compare in terms of computational cost?
[20-02-13] [paper86]
  • Decomposition of Uncertainty in Bayesian Deep Learning for Efficient and Risk-sensitive Learning [pdf] [annotated pdf]
  • Stefan Depeweg, José Miguel Hernández-Lobato, Finale Doshi-Velez, Steffen Udluft
  • ICML 2018
  • [Uncertainty Estimation], [Reinforcement Learning]
Well-written and quite interesting paper. Obviously similar to "Uncertainty Decomposition in Bayesian Neural Networks with Latent Variables", but contains more details, further experiments, and does a better job explaining some of the core concepts.
[20-02-08] [paper85]
  • Uncertainty Decomposition in Bayesian Neural Networks with Latent Variables [pdf] [annotated pdf]
  • Stefan Depeweg, José Miguel Hernández-Lobato, Finale Doshi-Velez, Steffen Udluft
  • 2017-06
  • [Uncertainty Estimation], [Reinforcement Learning]
Quite well-written and interesting paper. The toy problems illustrated in figure 2 and figure 3 are quite neat. I did however find it quite odd that they did not actually perform any active learning experiments here? Figure 4b is quite confusing with the "inset" for beta=0. I think it would have been better to show this entire figure somehow.
[20-01-31] [paper84]
  • Modelling heterogeneous distributions with an Uncountable Mixture of Asymmetric Laplacians [pdf] [code] [video] [annotated pdf]
  • Axel Brando, Jose A. Rodríguez-Serrano, Jordi Vitrià, Alberto Rubio
  • NeurIPS 2019
  • [Uncertainty Estimation]
Quite well-written and interesting paper. The connection to quantile regression is quite neat, but in the end, their loss in equation 6 just corresponds to a latent variable model (with a uniform distribution for the latent variable tau) trained using straightforward Monte Carlo sampling. I am definitely not impressed with the experiments. They only consider very simple problems, y is always 1D, and they only compare with self-implemented baselines. The results are IMO not overly conclusive either, the single Laplacian model is e.g. better calibrated than their proposed method in Figure 3.
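A minimal sketch of how I read their equation 6, i.e. an asymmetric Laplacian likelihood marginalized over tau ~ U(0, 1) via straightforward Monte Carlo sampling (my own illustration; `mu_of_tau` and `b_of_tau` stand in for the tau-conditioned network outputs):

```python
import math
import random

def pinball(u, tau):
    # Quantile ("pinball") loss: tau * u for u >= 0, (tau - 1) * u otherwise.
    return u * (tau - (1.0 if u < 0 else 0.0))

def umal_style_nll(y, mu_of_tau, b_of_tau, n_samples=100, seed=0):
    # Monte Carlo estimate of -log p(y), where p(y) marginalizes the
    # asymmetric Laplacian ALD(y; mu(tau), b(tau), tau) over tau ~ U(0, 1).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        tau = rng.random()
        mu, b = mu_of_tau(tau), b_of_tau(tau)
        total += (tau * (1.0 - tau) / b) * math.exp(-pinball(y - mu, tau) / b)
    return -math.log(total / n_samples)
```

This is exactly the latent variable view described above: a uniform latent tau, with the pinball loss appearing inside the asymmetric Laplacian density.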
[20-01-24] [paper83]
  • A Primal-Dual link between GANs and Autoencoders [pdf] [poster] [annotated pdf]
  • Hisham Husain, Richard Nock, Robert C. Williamson
  • NeurIPS 2019
  • [Theoretical Properties of Deep Learning]
Somewhat interesting and well-written paper. Very theoretical paper compared to what I usually read. I must admit that I did not really understand that much.
[20-01-20] [paper82]
  • A Connection Between Score Matching and Denoising Autoencoders [pdf] [annotated pdf]
  • Pascal Vincent
  • Neural Computation, 2011
  • [Energy-Based Models]
Quite well-written and interesting paper. The original paper for "denoising score matching", which it does a good job explaining. It also provides some improved understanding of score matching in general, and provides some quite interesting references for further reading.
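The denoising score matching objective itself is simple enough to sketch in a few lines (my own illustration of the idea, for a single noise level and a batch of vectors):

```python
import numpy as np

def dsm_loss(score_fn, x, sigma, rng=None):
    """Denoising score matching: perturb x with Gaussian noise and regress
    the model's score at the noisy point onto the score of the Gaussian
    noise kernel, which is -(x_noisy - x) / sigma^2."""
    if rng is None:
        rng = np.random.default_rng(0)
    noise = rng.normal(size=x.shape)
    x_noisy = x + sigma * noise
    target = -(x_noisy - x) / sigma**2
    diff = score_fn(x_noisy) - target
    return 0.5 * np.mean(np.sum(diff**2, axis=-1))
```

No samples from the model and no normalizing constant are needed, which is the appeal over plain maximum likelihood for unnormalized models.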
[20-01-17] [paper81]
  • Multiplicative Interactions and Where to Find Them [pdf] [annotated pdf]
  • Siddhant M. Jayakumar, Jacob Menick, Wojciech M. Czarnecki, Jonathan Schwarz, Jack Rae, Simon Osindero, Yee Whye Teh, Tim Harley, Razvan Pascanu
  • ICLR 2020
  • [Theoretical Properties of Deep Learning], [Sequence Modeling]
Well-written and somewhat interesting paper. I had some trouble properly understanding everything. I am however not overly impressed by the choice of experiments. The experiment in figure 2 seems somewhat biased in their favor? I think it would be a more fair comparison if the number of layers in the MLP was allowed to be increased (since this would increase its expressivity)?
[20-01-16] [paper80]
  • Estimation of Non-Normalized Statistical Models by Score Matching [pdf] [annotated pdf]
  • Aapo Hyvärinen
  • JMLR, 2005
  • [Energy-Based Models]
Interesting and very well-written paper. The original paper for score matching. Somewhat dated of course, but still interesting and very well-written. It provides a really neat introduction to score matching! I did not read section 3 super carefully, as the examples seemed quite dated.
[20-01-15] [paper79]
  • Generative Modeling by Estimating Gradients of the Data Distribution [pdf] [code] [poster] [annotated pdf]
  • Yang Song, Stefano Ermon
  • NeurIPS 2019
  • [Energy-Based Models], [Diffusion Models]
Well-written and quite interesting paper. The examples in section 3 are neat and quite pedagogical. I would probably need to read a couple of papers covering the basics of score matching, and then come back and read this paper again to fully appreciate it. Like they write, their training method could be used to train an EBM (by replacing their score network with the gradient of the energy in the EBM). This would then be just like "denoising score matching", but combining multiple noise levels in a combined objective? I suppose that their annealed Langevin approach could also be used to sample from an EBM. This does however seem very computationally expensive, as they run T=100 steps of Langevin dynamics for each of the L=10 noise levels?
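Their annealed Langevin sampling procedure can be sketched as follows (my own illustration; `score_fn` stands in for the trained noise-conditional score network, and the T=100 steps per noise level mentioned above correspond to the inner loop):

```python
import numpy as np

def annealed_langevin(score_fn, x0, sigmas, T=100, eps=2e-5, rng=None):
    """Annealed Langevin dynamics: T Langevin steps at each of the L noise
    levels in sigmas (largest first), with the step size scaled down as
    (sigma_i / sigma_L)^2 for the smaller noise levels."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = x0
    for sigma in sigmas:
        alpha = eps * (sigma / sigmas[-1]) ** 2   # step size for this level
        for _ in range(T):
            z = rng.normal(size=x.shape)
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
    return x
```

Written out like this, the T * L network evaluations per sample make the computational cost concern above quite concrete.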
[20-01-14] [paper78]
  • Noise-Contrastive Estimation: A New Estimation Principle for Unnormalized Statistical Models [pdf] [annotated pdf]
  • Michael Gutmann, Aapo Hyvärinen
  • AISTATS 2010
  • [Energy-Based Models]
Well-written and interesting paper. The original paper for Noise Contrastive Estimation (NCE). Somewhat dated of course, but still interesting and well-written. Provides a quite neat introduction to NCE. They use a VERY simple problem to compare the performance of NCE to MLE with importance sampling, contrastive divergence (CD) and score matching (and to exact MLE, which gives the reference performance but requires an analytical expression for the normalizing constant). CD has the best performance, but NCE is apparently more computationally efficient. I do not think such a simple problem says too much though. They then also apply NCE on a (by today's standards) very simple unsupervised image modeling problem. It seems to perform as expected.
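The NCE objective itself can be sketched in a few lines (my own illustration, for the case of equally many data and noise samples):

```python
import math

def nce_loss(log_model, log_noise, data, noise_samples):
    """NCE: logistic regression distinguishing data from noise samples,
    with log-odds G(x) = log p_model(x) - log p_noise(x). Here log_model
    is the unnormalized log density with the log normalizing constant
    folded in as a free (learned) parameter."""
    def log_sigmoid(g):
        return -math.log1p(math.exp(-g))
    loss = 0.0
    for x in data:
        loss -= log_sigmoid(log_model(x) - log_noise(x))       # label: data
    for x in noise_samples:
        loss -= log_sigmoid(-(log_model(x) - log_noise(x)))    # label: noise
    return loss / (len(data) + len(noise_samples))
```

Minimizing this requires only evaluating (not sampling from) the model, which is why no MCMC is needed, in contrast to CD.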
[20-01-10] [paper77]
  • Z-Forcing: Training Stochastic Recurrent Networks [pdf] [code] [annotated pdf]
  • Anirudh Goyal, Alessandro Sordoni, Marc-Alexandre Côté, Nan Rosemary Ke, Yoshua Bengio
  • NeurIPS 2017
  • [VAEs], [Sequence Modeling]
Quite interesting and well-written paper. Seems like Marco Fraccaro's thesis covers most of this paper, overall the proposed architecture is still quite similar to VRNN/SRNN both in design and performance. The auxiliary cost seems to improve performance quite consistently, but nothing revolutionary. It is not quite clear to me if the proposed architecture is more or less difficult / computationally expensive to train than SRNN.
[20-01-08] [paper76]
  • Practical Deep Learning with Bayesian Principles [pdf] [code] [annotated pdf]
  • Kazuki Osawa, Siddharth Swaroop, Anirudh Jain, Runa Eschenhagen, Richard E. Turner, Rio Yokota, Mohammad Emtiyaz Khan
  • NeurIPS 2019
  • [Uncertainty Estimation], [Variational Inference]
Interesting and quite well-written paper. To me, this mainly seems like a more practically useful alternative to Bayes by Backprop, scaling up variational inference to e.g. ResNet on ImageNet. The variational posterior approximation q is still just a diagonal Gaussian. I still do not fully understand natural-gradient variational inference. Only image classification is considered. It seems to perform roughly as well as Adam in terms of accuracy (although it is 2-5 times slower to train), while quite consistently performing better in terms of calibration (ECE). The authors also compare with MC-dropout in terms of quality of the predictive probabilities, but these results are IMO not very conclusive.
[20-01-06] [paper75]
  • Maximum Entropy Generators for Energy-Based Models [pdf] [code] [annotated pdf]
  • Rithesh Kumar, Sherjil Ozair, Anirudh Goyal, Aaron Courville, Yoshua Bengio
  • 2019-01
  • [Energy-Based Models]
Quite well-written and interesting paper. The general idea, learning an energy-based model p_theta by drawing samples from an approximating distribution (that minimizes the KL divergence w.r.t. p_theta) instead of generating approximate samples from p_theta using MCMC, is interesting and intuitively makes quite a lot of sense IMO. Since the paper was written prior to the recent work on MCMC-based learning (Learning Non-Convergent Non-Persistent Short-Run MCMC Toward Energy-Based Model, Implicit Generation and Generalization in Energy-Based Models, On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models), it is however difficult to know how well this method actually would stack up in practice.

Papers Read in 2019:

[19-12-22] [paper74]
  • Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One [pdf] [annotated pdf]
  • Will Grathwohl, Kuan-Chieh Wang, Jörn-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, Kevin Swersky
  • ICLR 2020
  • [Energy-Based Models]
Interesting and very well-written paper. I recommend actually going through the appendix as well, as it contains some interesting details. The idea to create an energy-based model for p(x) by marginalizing out y is really neat and makes a lot of sense in this classification setting (in which this corresponds to just summing the logits for all K classes). This EBM for p(x) is then trained using the MCMC-based ML learning method employed in other recent work. Simultaneously, a model for p(y|x) is also trained using the standard approach (softmax / cross entropy), thus training p(x, y) = p(y | x)*p(x). I am however not overly impressed/convinced by their experimental results. All experiments are conducted on relatively small and "toy-ish" datasets (CIFAR10, CIFAR100, SVHN etc), but they still seemed to have experienced A LOT of problems with training instability. Would be interesting to see results e.g. for semantic segmentation on Cityscapes (a more "real-world" task and dataset). Moreover, like the authors also point out themselves, training p(x) using SGLD-based sampling with L steps (they mainly use L=20 steps, but sometimes also have to restart training with L=40 to mitigate instability issues) basically makes training L times slower. I am just not sure if the empirically observed improvements are strong/significant enough to justify this computational overhead.
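The marginalization trick is simple enough to show in a few lines: summing the exponentiated logits over the K classes gives an unnormalized density over x, while the softmax over the same logits gives p(y|x). A minimal numpy illustration (made-up logits, not the paper's code):

```python
import numpy as np

# Logits f(x) from any K-way classifier (made-up values for one input x).
logits = np.array([2.0, -1.0, 0.5])

# p(y|x) is the usual softmax over the logits ...
m = logits.max()
log_Z_y = m + np.log(np.exp(logits - m).sum())  # logsumexp, computed stably
p_y_given_x = np.exp(logits - log_Z_y)

# ... while the same logits also define an unnormalized density over x:
# E(x) = -logsumexp_y f(x)[y], so that log p(x) = -E(x) - log Z.
energy_x = -log_Z_y
```

Training then combines the standard cross-entropy loss on p(y|x) with MCMC-based ML learning of the EBM defined by energy_x.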
[19-12-20] [paper73]
  • Noise Contrastive Estimation and Negative Sampling for Conditional Models: Consistency and Statistical Efficiency [pdf] [annotated pdf]
  • Zhuang Ma, Michael Collins
  • EMNLP 2018
  • [Energy-Based Models], [NLP]
Interesting and quite well-written paper. Quite theoretical paper with a bunch of proofs. Interesting to see NCE applied specifically to supervised problems (modelling p(y | x)).
[19-12-20] [paper72]
  • Flow Contrastive Estimation of Energy-Based Models [pdf] [annotated pdf]
  • Ruiqi Gao, Erik Nijkamp, Diederik P. Kingma, Zhen Xu, Andrew M. Dai, Ying Nian Wu
  • CVPR 2020
  • [Energy-Based Models], [Normalizing Flows]
Well-written and interesting paper. Provides a quite interesting comparison of EBMs and flow-based models in the introduction ("By choosing a flow model, one is making the assumption that the true data distribution is one that is in principle simple to sample from, and is computationally efficient to normalize."). Provides a pretty good introduction to Noise Contrastive Estimation (NCE). The proposed method is interesting and intuitively makes sense. The experimental results are not overly strong/decisive IMO, but that seems to be true for most papers in this area.
[19-12-19] [paper71]
  • On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models [pdf] [code] [annotated pdf]
  • Erik Nijkamp, Mitch Hill, Tian Han, Song-Chun Zhu, Ying Nian Wu
  • AAAI 2020
  • [Energy-Based Models]
Well-written and very interesting paper, a recommended read! Provides a good review and categorization of previous papers, how they differ from each other etc. Provides a solid theoretical understanding of MCMC-based ML learning of EBMs, with quite a few really interesting (and seemingly useful) insights.
[19-12-15] [paper70]
Interesting, but not overly well-written paper. Very similar to "Learning Non-Convergent Non-Persistent Short-Run MCMC Toward Energy-Based Model", but not as clearly written IMO. I personally find the experiments section somewhat unclear, but I'm also not too familiar with how generative image models usually are evaluated. It sounds like the training was quite unstable without the regularization described in section 3.3?
[19-12-14] [paper69]
  • Learning Non-Convergent Non-Persistent Short-Run MCMC Toward Energy-Based Model [pdf] [poster] [annotated pdf]
  • Erik Nijkamp, Mitch Hill, Song-Chun Zhu, Ying Nian Wu
  • NeurIPS 2019
  • [Energy-Based Models]
Well-written and interesting paper. Seeing the non-convergent, short-run MCMC as a learned generator/flow model is a really neat and interesting idea. I find figure 9 in the appendix interesting. It is somewhat difficult for me to judge how impressive the experimental results are, I do not really know how strong the baselines are or how significant the improvements are. I found section 4 difficult to follow.
[19-12-13] [paper68]
  • A Tutorial on Energy-Based Learning [pdf] [annotated pdf]
  • Yann LeCun, Sumit Chopra, Raia Hadsell, Marc Aurelio Ranzato, Fu Jie Huang
  • 2006-08
  • [Energy-Based Models]
Somewhat dated, but well-written and still quite interesting paper. A good introduction to energy-based models (EBMs).
[19-11-29] [paper67]
Interesting and very well-written paper. A recommended read, even if you just want to gain an improved understanding of state-of-the-art RL in general and the PlaNet paper ("Learning Latent Dynamics for Planning from Pixels") in particular. Very similar to PlaNet, the difference is that they here learn an actor-critic model on-top of the learned dynamics, instead of doing planning using MPC. The improvement over PlaNet, in terms of experimental results, seems significant. Since they didn't actually use the latent overshooting in the PlaNet paper, I assume they don't use it here either?
[19-11-26] [paper66]
  • Deep Latent Variable Models for Sequential Data [pdf] [annotated pdf]
  • Marco Fraccaro
  • PhD Thesis, 2018
  • [Sequence Modeling], [VAEs]
Very well-written, VERY useful. VERY good general introduction to latent variable models, amortized variational inference, VAEs etc. VERY good introduction to various deep latent variable models for sequential data: deep state-space models, VAE-RNNs, VRNNs, SRNNs etc.
[19-11-22] [paper65]
  • Learning Latent Dynamics for Planning from Pixels [pdf] [code] [blog] [annotated pdf]
  • Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson
  • ICML 2019
  • [Reinforcement Learning]
Well-written and interesting paper! Feels like a very good introduction to the entire field of model-based RL. Seems quite odd to me that they spend an entire page on "Latent overshooting", but then don't actually use it for their RSSM model? It's not entirely clear to me how this approach differs from "Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models" (PETS), apart from the fact that PETS actually has access to the state (so, they don't need to apply VAE stuff to construct a latent state representation). The provided code seems like it could be very useful. Is it easy to use? The model seems to train on just 1 GPU in just 1 day anyway, which is good.
[19-10-28] [paper64]
  • Learning nonlinear state-space models using deep autoencoders [pdf] [annotated pdf]
  • Daniele Masti, Alberto Bemporad
  • CDC 2018
  • [Sequence Modeling]
Well-written and interesting paper. Really interesting approach actually, although somewhat confusing at first read since the method seems to involve quite a few different components. I would like to try to implement this myself and apply it to some simple synthetic example, I think that would significantly improve my understanding of the method and help me better judge its potential.
[19-10-18] [paper63]
  • Improving Variational Inference with Inverse Autoregressive Flow [pdf] [code] [annotated pdf]
  • Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling
  • NeurIPS 2016
  • [Normalizing Flows]
Interesting and very well-written paper. Does a very good job introducing the general problem setup, normalizing flows, autoregressive models etc. Definitely a good introductory paper, it straightened out a few things I found confusing in Variational Inference with Normalizing Flows. The experimental results are however not very strong nor particularly extensive, IMO.
[19-10-11] [paper62]
  • Variational Inference with Normalizing Flows [pdf] [annotated pdf]
  • Danilo Jimenez Rezende, Shakir Mohamed
  • ICML 2015
  • [Normalizing Flows]
Well-written and quite interesting paper. I was initially somewhat confused by this paper, as I was expecting it to deal with variational inference for approximate Bayesian inference. Seems like a good starting point for flow-based methods, I will continue reading up on more recent/advanced techniques.
[19-10-04] [paper61]
  • Trellis Networks for Sequence Modeling [pdf] [code] [annotated pdf]
  • Shaojie Bai, J. Zico Kolter, Vladlen Koltun
  • ICLR 2019
  • [Sequence Modeling]
Well-written and quite interesting paper. Interesting model, quite neat indeed how it can be seen as a bridge between RNNs and TCNs. The fact that they share weights across all network layers intuitively seems quite odd to me, but I guess it stems from the construction based on M-truncated RNNs? It is not obvious to me why they chose to use a gated activation function based on the LSTM cell, would using a "normal" activation function (e.g. ReLU) result in a significant drop in performance?
[19-07-11] [paper60]
  • Part-A^2 Net: 3D Part-Aware and Aggregation Neural Network for Object Detection from Point Cloud [pdf] [annotated pdf]
  • Shaoshuai Shi, Zhe Wang, Xiaogang Wang, Hongsheng Li
  • TPAMI, 2020
  • [3D Object Detection]
Interesting and quite well-written paper. Same main authors as for the PointRCNN paper. The idea to use the intra-object point locations provided by the ground truth 3D bboxes as extra supervision makes a lot of sense, clever! In this paper, the bin-based losses from PointRCNN are NOT used.
[19-07-10] [paper59]
  • PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud [pdf] [code] [annotated pdf]
  • Shaoshuai Shi, Xiaogang Wang, Hongsheng Li
  • CVPR 2019
  • [3D Object Detection]
Interesting and quite well-written paper. I think I like this approach to 3DOD. Directly processing the point cloud and generating proposals by classifying each point as foreground/background makes sense, is quite simple and seems to perform well. Their bin-based regression losses seem somewhat strange to me though.
[19-07-03] [paper58]
Quite well-written and interesting paper. Multiple objects (of the same class) having the same (low-resolution) center point is apparently not very common in MS-COCO, but is that true also in real life in automotive applications? And in these cases, would only detecting one of these objects be a major issue? I do not really know, I find it somewhat difficult to even visualize cases where multiple objects would share center points. It is an interesting point that this method essentially corresponds to anchor-based one-stage detectors, but with just one shape-agnostic anchor. Perhaps having multiple anchors per location is not super important then?
[19-06-12] [paper57]
  • ATOM: Accurate Tracking by Overlap Maximization [pdf] [code] [annotated pdf]
  • Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, Michael Felsberg
  • CVPR 2019
  • [Visual Tracking]
Well-written and interesting paper. They employ the idea of IoU-Net in order to perform target estimation and thus improve tracking accuracy. Interesting that this idea seems to work well also in this case. The paper also gives a quite comprehensive introduction to visual object tracking in general, making the proposed method relatively easy to understand also for someone new to the field.
[19-06-12] [paper56]
Interesting idea that intuitively makes a lot of sense, neat to see that it actually seems to work quite well. While the predicted IoU is a measure of "localization confidence", it is not an ideal measure of localization uncertainty. Having an estimated variance each for (x, y, w, h) would provide more information.
[19-06-05] [paper55]
  • LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving [pdf] [annotated pdf]
  • Gregory P. Meyer, Ankit Laddha, Eric Kee, Carlos Vallespi-Gonzalez, Carl K. Wellington
  • CVPR 2019
  • [Uncertainty Estimation], [3D Object Detection]
Quite well-written and interesting paper. It was however quite difficult to fully grasp their proposed method. I struggled to understand some steps of their method, it is e.g. not completely clear to me why both mean shift clustering and adaptive NMS have to be performed. I find the used probabilistic model somewhat strange. They say that "our proposed method is the first to capture the uncertainty of a detection by modeling the distribution of bounding box corners", but actually they just predict a single variance value per bounding box (at least when K=1, which is the case for pedestrians and bikes)? Overall, the method seems rather complicated. It is probably not the streamlined and intuitive 3DOD architecture I have been looking for.
[19-05-29] [paper54]
  • Attention Is All You Need [pdf] [annotated pdf]
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
  • NeurIPS 2017
  • [Transformers]
Quite well-written paper. The proposed architecture was explained in a quite clear way, even for someone who is not super familiar with the field. Not too related to my particular research, but still a quite interesting paper. I also think that the proposed architecture, the Transformer, has been extensively used in subsequent state-of-the-art models (I remember seeing it mentioned in a few different papers)? This paper is thus probably a good background read for those interested in language modeling, translation etc.
[19-04-05] [paper53]
  • Stochastic Gradient Descent as Approximate Bayesian Inference [pdf] [annotated pdf]
  • Stephan Mandt, Matthew D. Hoffman, David M. Blei
  • JMLR, 2017
  • [Uncertainty Estimation], [Stochastic Gradient MCMC]
Very well-written and quite interesting paper. Good background material on SGD, SG-MCMC and so on. It is however a relatively long paper (26 pages). It makes intuitive sense that running SGD with a constant learning rate will result in a sequence of iterates which first move toward a local minimum and then "bounces around" its vicinity. And, that this "bouncing around" thus should correspond to samples from some kind of stationary distribution, which depends on the learning rate, batch size and other hyperparameters. Trying to find the hyperparameters which minimize the KL divergence between this stationary distribution and the true posterior then seems like a neat idea. I am however not quite sure how reasonable the made assumptions are in more complex real-world problems. I am thus not quite sure how useful the specific proposed methods/formulas actually are.
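The "bouncing around" behavior is easy to reproduce on a toy problem (my own illustration, not from the paper): minibatch SGD with a constant learning rate on a 1D mean-estimation loss, collecting the post-burn-in iterates as approximate samples from the stationary distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=10_000)

theta, lr, batch = 0.0, 0.05, 32
iterates = []
for step in range(5_000):
    xb = rng.choice(data, size=batch)
    grad = theta - xb.mean()   # gradient of 0.5*(theta - x)^2, batch average
    theta -= lr * grad
    if step >= 1_000:          # discard burn-in, keep "bouncing" iterates
        iterates.append(theta)

iterates = np.array(iterates)
# The iterates concentrate around the minimum (the data mean), with a
# spread set by the learning rate, batch size and gradient noise, which
# is exactly the stationary distribution the paper analyzes.
```

Tuning lr and batch so that this spread matches the true posterior is the paper's core idea.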
[19-03-29] [paper52]
  • Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling [pdf] [annotated pdf]
  • Jacob Menick, Nal Kalchbrenner
  • ICLR 2019
  • [Miscellaneous]
Quite interesting paper. I do however think that the proposed method could be more clearly explained, the paper actually left me somewhat confused (I am however not particularly familiar with this specific sub-field). For e.g. the images in Figure 5, it is not clear to me how these are actually generated? Do you take a random image from ImageNet, choose a random slice of this image and then generate the image by size- and depth-upscaling? For training, I guess that they (for each image in the dataset) choose a random image slice, condition on the previous true image slices (according to their ordering), predict/generate the next image slice and compare this with the ground truth to compute an unbiased estimator of the NLL loss. But what do they do during evaluation? I.e., how are the NLL scores in Table 1-3 computed? The experimental results do not seem overly impressive/convincing to me.
[19-03-15] [paper51]
  • A recurrent neural network without chaos [pdf] [annotated pdf]
  • Thomas Laurent, James von Brecht
  • ICLR 2017
  • [Sequence Modeling]
Quite well-written and somewhat interesting paper. I note that their LSTM implementation consistently outperformed their proposed CFN, albeit with a small margin. Would be interesting to know if this architecture has been studied further since the release of this paper, can it match LSTM performance also on more complicated tasks?
[19-03-11] [paper50]
  • Auto-Encoding Variational Bayes [pdf]
  • Diederik P Kingma, Max Welling
  • ICLR 2014
  • [VAEs]
Quite interesting paper.
[19-03-04] [paper49]
  • Coupled Variational Bayes via Optimization Embedding [pdf] [poster] [code] [annotated pdf]
  • Bo Dai, Hanjun Dai, Niao He, Weiyang Liu, Zhen Liu, Jianshu Chen, Lin Xiao, Le Song
  • NeurIPS 2018
  • [VAEs]
Somewhat well-written and interesting paper. It was however a quite heavy read. Also, I should definitely have done some more background reading on VAEs etc. (e.g., "Auto-encoding variational bayes", "Variational inference with normalizing flows", "Improved variational inference with inverse autoregressive flow") before trying to read this paper. I did not properly understand their proposed method, I found section 3 quite difficult to follow. Definitely not clear to me how one actually would implement this in practice. I am not sure how strong the experimental results actually are, it is not completely obvious to me that their proposed method actually outperforms the baselines in a significant way.
[19-03-01] [paper48]
  • Language Models are Unsupervised Multitask Learners [pdf] [blog post] [code] [annotated pdf]
  • Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever
  • 2019-02
  • [NLP]
Interesting and quite well-written paper. There are not that many technical details, one would probably have to read previous work for that. One probably needs to be somewhat familiar with NLP. Very impressive work from an infrastructure perspective. Just as context to their model with 1.5 billion parameters: a ResNet101 has 45 million parameters, which takes up 180 Mb when saved to disk. DeepLabV3 for semantic segmentation has roughly 75 million parameters. This has become a pretty hyped paper, and I agree that the work is impressive, but it still seems to me like their model is performing roughly as one would expect. It performs really well on general language modeling tasks, which is exactly what it was trained for (although it was not fine-tuned on the specific benchmark datasets), but performs rather poorly on translation and question-answering. The fact that the model has been able to learn some basic translation in this fully unsupervised setting is still quite impressive and interesting though.
[19-02-27] [paper47]
  • Predictive Uncertainty Estimation via Prior Networks [pdf] [annotated pdf]
  • Andrey Malinin, Mark Gales
  • NeurIPS 2018
  • [Uncertainty Estimation]
Interesting and very well-written paper. It would be interesting to combine this approach with approximate Bayesian modeling (e.g. ensembling). They state in the very last sentence of the paper that their approach needs to be extended also to regression. How would you actually do that? It is not immediately obvious to me. Seems like a quite interesting problem. I would have liked to see a comparison with ensembling as well and not just MC-Dropout (ensembling usually performs better in my experience). Obtaining out-of-distribution samples to train on is probably not at all trivial actually. Yes, this could in theory be any unlabeled data, but how do you know what region of the input image space is covered by your training data? Also, I guess the model could still become over-confident if fed inputs which are far from both the in-distribution and out-of-distribution samples the model has seen during training? So, you really ought to estimate epistemic uncertainty using Bayesian modeling as well?
[19-02-25] [paper46]
  • Evaluating model calibration in classification [pdf] [code] [annotated pdf]
  • Juozas Vaicenavicius, David Widmann, Carl Andersson, Fredrik Lindsten, Jacob Roll, Thomas B. Schön
  • AISTATS 2019
  • [Uncertainty Estimation]
Well-written and interesting paper. It is however a quite theoretical paper, and I personally found it difficult to follow certain sections. It also uses notation that I am not fully familiar with. This work seems important, and I will try to keep it in mind in the future. It is however still not quite clear to me what one should do in practice to evaluate and compare calibration of large models on large-scale datasets in a more rigorous way. I will probably need to read the paper again.
[19-02-22] [paper45]
  • Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks [pdf] [annotated pdf]
  • Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang
  • ICML 2019
  • [Theoretical Properties of Deep Learning]
Somewhat interesting paper that is quite theoretical. I found it to be a rather heavy read, and I did not fully understand absolutely everything. I did not quite get why they fix the weights a_i of the second layer? They use gradient descent (GD) instead of SGD, could you obtain similar results also for SGD? I think that I probably did not understand the paper well enough to really be able to judge how significant/interesting the presented results actually are. How restrictive are their assumptions? In what way / to what extent could these results be of practical use in real-world applications? The reference section seems like a pretty neat resource for previous work on characterization of NN loss landscapes etc.
[19-02-17] [paper44]
  • Visualizing the Loss Landscape of Neural Nets [pdf] [code] [annotated pdf]
  • Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein
  • NeurIPS 2018
  • [Miscellaneous]
Interesting and quite well-written paper. I think that the paper is a good introduction to methods for NN loss function visualization and previous work aiming to understand the corresponding optimization problem. They cite a number of papers which seem interesting, I will probably try and read a couple of those in the future. It would be interesting to apply their visualization method to some of my own problems, I will probably look more carefully at their code at some point. It is however not immediately obvious to me how to apply their "filter normalization" to e.g. an MLP network.
[19-02-14] [paper43]
  • A Simple Baseline for Bayesian Uncertainty in Deep Learning [pdf] [code] [annotated pdf]
  • Wesley Maddox, Timur Garipov, Pavel Izmailov, Dmitry Vetrov, Andrew Gordon Wilson
  • NeurIPS 2019
  • [Uncertainty Estimation]
Quite well-written and interesting paper. I am not quite sure how I feel about the proposed method though. It seems somewhat odd to me to first fit a Gaussian approximation to samples from the SGD trajectory and then draw new samples from this Gaussian to use for Bayesian model averaging. Why not just directly use some of those SGD samples for model averaging instead? Am I missing something here? Also, in SG-MCMC we have to (essentially) add Gaussian noise to the SGD update and decay the learning rate to obtain samples from the true posterior in the infinite limit. I am thus somewhat confused by the theoretical analysis in this paper. I would have liked to see a comparison with basic ensembling. In section C.5 they write that SWAG usually performs somewhat worse than deep ensembles, but that this is OK since SWAG is much faster to train: "Thus SWAG will be particularly valuable when training time is limited, but inference time may not be." When is this actually true? It makes intuitive sense that this method will generate parameter samples with some variance (instead of just a single point estimate) and thus also provide some kind of estimate of the model uncertainty. However, it is not really theoretically grounded in any significant way, at least not more than e.g. ensembling. The most interesting experiment for which they provide reliability diagrams is IMO CIFAR-10 --> STL-10. I note that even the best model still is quite significantly over-confident in this case. I really liked their version of reliability diagrams. Makes it easy to compare multiple methods in a single plot.
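The diagonal variant of the method (SWAG-Diag) is just "fit a Gaussian to SGD weight snapshots, then sample new weight vectors". A minimal sketch, with made-up weight snapshots standing in for an actual SGD trajectory (the full method also adds a low-rank covariance term, which I omit here):

```python
import numpy as np

def swag_diag_fit(weight_iterates):
    """Fit a diagonal Gaussian to SGD weight snapshots."""
    w = np.stack(weight_iterates)            # (num_snapshots, num_params)
    mean = w.mean(axis=0)                    # the SWA solution
    var = np.clip(w.var(axis=0), 1e-12, None)
    return mean, var

def swag_diag_sample(mean, var, num_samples, rng):
    """Draw weight samples for Bayesian model averaging at test time."""
    eps = rng.standard_normal((num_samples, mean.size))
    return mean + np.sqrt(var) * eps

rng = np.random.default_rng(0)
# Stand-in for weight snapshots collected along an SGD trajectory.
snapshots = [np.array([1.0, -2.0]) + 0.1 * rng.standard_normal(2)
             for _ in range(30)]
mean, var = swag_diag_fit(snapshots)
weights = swag_diag_sample(mean, var, num_samples=100, rng=rng)
```

Each sampled weight vector would then be loaded into the network, and the predictive distributions averaged, which is exactly where my "why not just average over the SGD snapshots directly?" question comes from.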
[19-02-13] [paper42]
  • Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning [pdf] [code] [annotated pdf]
  • Ruqi Zhang, Chunyuan Li, Jianyi Zhang, Changyou Chen, Andrew Gordon Wilson
  • ICLR 2020
  • [Uncertainty Estimation], [Stochastic Gradient MCMC]
Well-written and VERY interesting paper (I did find a few typos though). Very interesting method. I have however done some experiments using their code, and I find that samples from the same cycle produce very similar predictions. Thus I am somewhat skeptical that the method actually is significantly better than snapshot-ensembling, or just regular ensembling for that matter. The results in table 3 do seem to suggest that there is something to gain from collecting more than just one sample per cycle though, right? I need to do more experiments and investigate this further. Must admit that I struggled to understand much of section 4, I am thus not really sure how impressive their theoretical results actually are.
[19-02-12] [paper41]
  • Bayesian Dark Knowledge [pdf] [annotated pdf]
  • Anoop Korattikara, Vivek Rathod, Kevin Murphy, Max Welling
  • NeurIPS 2015
  • [Uncertainty Estimation]
Well-written and quite interesting paper. The presented idea is something that has crossed my mind a couple of times, and it is indeed an attractive concept, but I have always ended up rejecting the idea, since it seems like it should not work. Take figure 2 for the toy 1d regression problem. It seems pretty obvious to me that one should be able to distill the SGLD predictive posterior into a Gaussian with input-dependent variance, but what about x values that lie outside of the shown interval? Will the model not become over-confident in that region anyway? To me it seems like this method basically only can be used to somewhat extend the region in which the model is appropriately confident. As we move away from the training data, I still think that the model will start to become over-confident at some point? However, perhaps this is still actually useful? Since the "ground truth labels" are generated by just running our SGLD model on any input, I guess we might be able to extend this region of appropriate confidence quite significantly?
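Distilling into "a Gaussian with input-dependent variance" presumably means training the student with the heteroscedastic Gaussian NLL on teacher (SGLD) targets. A minimal sketch of that loss, with made-up student outputs (mu, log_var); predicting the log-variance is a common trick to keep the variance positive without constraints:

```python
import numpy as np

def gaussian_nll(mu, log_var, y):
    """Per-example negative log-likelihood of y under N(mu, exp(log_var))."""
    return 0.5 * (log_var + (y - mu) ** 2 / np.exp(log_var)
                  + np.log(2.0 * np.pi))

# Hypothetical student predictions vs. teacher (e.g. SGLD) targets:
mu = np.array([0.0, 1.0])
log_var = np.array([0.0, np.log(4.0)])
y = np.array([0.5, 1.0])
loss = gaussian_nll(mu, log_var, y).mean()
```

The student is rewarded for predicting a large variance wherever the teacher's samples disagree, which is exactly the mechanism that should (within the covered input region) transfer the predictive uncertainty.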
[19-02-07] [paper40]
  • Noisy Natural Gradient as Variational Inference [pdf] [video] [code] [annotated pdf]
  • Guodong Zhang, Shengyang Sun, David Duvenaud, Roger Grosse
  • ICML 2018
  • [Uncertainty Estimation], [Variational Inference]
Well-written and somewhat interesting paper. Quite a heavy read for me as I am not particularly familiar with natural gradient optimization methods or K-FAC. I get that not being restricted to just fully-factorized Gaussian variational posteriors is something that could improve performance, but is it actually practical for properly large networks? They mention previous work on extending variational methods to non-fully-factorized posteriors, but I found them quite difficult to compare. It is not clear to me whether or not the presented method actually is a clear improvement (either in terms of performance or practicality).
[19-02-06] [paper39]
  • Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks [pdf] [annotated pdf]
  • José Miguel Hernández-Lobato, Ryan P. Adams
  • ICML 2015
  • [Uncertainty Estimation]
Quite well-written and interesting paper. I did however find it somewhat difficult to fully understand the presented method. I find it difficult to compare this method (PBP, which is an Assumed Density Filtering (ADF) method) with Variational Inference (VI) using a diagonal Gaussian as q. The authors seem to argue that their method is superior because it only employs one stochastic approximation (sub-sampling the data), whereas VI employs two (in VI one also approximates an expectation using Monte Carlo samples). In that case I guess that PBP should be very similar to Deterministic Variational Inference for Robust Bayesian Neural Networks? I guess it would be quite difficult to extend this method to CNNs?
[19-02-05] [paper38]
  • Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models [pdf] [poster] [video] [code] [annotated pdf]
  • Kurtland Chua, Roberto Calandra, Rowan McAllister, Sergey Levine
  • NeurIPS 2018
  • [Uncertainty Estimation], [Ensembling], [Reinforcement Learning]
General comments on paper quality:
Well-written and very interesting paper. It applies relatively common methods for uncertainty estimation (ensemble of probabilistic NNs) to an interesting problem in RL and shows promising results.


Paper overview:
The authors present a model-based RL algorithm called Probabilistic Ensembles with Trajectory Sampling (PETS), that (at least roughly) matches the asymptotic performance of SOTA model-free algorithms on four control tasks, while requiring significantly fewer samples (model-based algorithms generally have much better sample efficiency, but worse asymptotic performance than the best model-free algorithms).

They use an ensemble of probabilistic NNs (Probabilistic Ensemble, PE) to learn a probabilistic dynamics model, p_theta(s_t+1 | s_t, a_t), where s_t is the state and a_t is the taken action at time t.

A probabilistic NN outputs the parameters of a probability distribution, in this case by outputting the mean, mu(s_t, a_t), and diagonal covariance matrix, SIGMA(s_t, a_t), of a Gaussian, enabling estimation of aleatoric (data) uncertainty. To also estimate epistemic (model) uncertainty, they train an ensemble of B probabilistic NNs.

The B ensemble models are trained on separate (but overlapping) datasets: for each ensemble model, a dataset is created by drawing N examples with replacement from the original dataset D (which also contains N examples).

The B ensemble models are then used in the trajectory sampling step, where P state particles s_t_p are propagated forward in time by iteratively sampling s_t+1_p ~ p_theta_b(s_t+1_p | s_t_p, a_t). I.e., each ensemble model outputs a distribution, and they sample particles from these B distributions. This results in P trajectory samples, s_t:t+T_p (which we hope approximate the true distribution over trajectories s_t:t+T). The authors used P=20, B=5 in all their experiments.

Based on these P state trajectory samples, MPC is finally used to compute the next action a_t.
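To check my understanding of the overall loop, here is a heavily simplified toy sketch: 1-D linear stand-in dynamics instead of learned NNs, a random ensemble member re-drawn per particle and step, and random-shooting MPC instead of the CEM optimizer the paper actually uses:

```python
import numpy as np

rng = np.random.default_rng(0)
B, P, H = 5, 20, 10          # ensemble size, particles, planning horizon

# Stand-ins for the B learned probabilistic dynamics models: each maps
# (state, action) to the mean and std of a Gaussian over the next state.
models = [(rng.normal(scale=0.1), 0.05 + 0.01 * b) for b in range(B)]

def step(b, s, a):
    """Sample s_{t+1} ~ p_theta_b(. | s_t, a_t) for toy 1-D dynamics."""
    drift, std = models[b]
    return s + a + drift + rng.normal(scale=std)

def evaluate(s0, actions):
    """Average return of an action sequence over P sampled trajectories."""
    total = 0.0
    for _ in range(P):
        s = s0
        for a in actions:
            b = rng.integers(B)          # re-draw an ensemble model per step
            s = step(b, s, a)
            total += -s ** 2             # toy reward: keep the state near 0
    return total / P

# Random-shooting MPC: sample candidate action sequences, then execute only
# the first action of the best sequence (the paper uses CEM instead).
s0 = 1.0
candidates = [rng.uniform(-1, 1, size=H) for _ in range(50)]
best = max(candidates, key=lambda acts: evaluate(s0, acts))
a_next = best[0]
```

The key point the sketch tries to capture is that both aleatoric uncertainty (sampling from each Gaussian) and epistemic uncertainty (varying which ensemble member is used) enter the trajectory distribution that MPC plans over.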


Comments:
Interesting method. Should be possible to benchmark various uncertainty estimation techniques using their setup, just like they compare probabilistic/deterministic ensembles and probabilistic networks. I found it quite interesting that a single probabilistic network (at least somewhat) outperformed a deterministic ensemble (perhaps this would change with a larger ensemble size though?).

Do you actually need to perform the bootstrap procedure when training the ensemble, or would you get the same performance by simply training all B models on the same dataset D?

I struggle somewhat to understand their method for lower/upper bounding the output variance during testing (appendix A.1). Is this actually needed? Also, I do not quite get the lines of code. Are max_logvar and min_logvar variables?
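My best guess is that the appendix describes a soft clamp of the predicted log-variance via softplus. This is a reconstruction from the text, not their actual code, and I treat max_logvar/min_logvar as fixed constants here, though they may well be trainable variables:

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def bound_log_var(log_var, max_logvar=0.5, min_logvar=-10.0):
    """Softly clamp a predicted log-variance into ~[min_logvar, max_logvar].
    Unlike a hard clip, gradients still flow outside the bounds.
    NOTE: my reconstruction of appendix A.1; the bounds are fixed
    hyperparameters here but may be learned in the actual method."""
    log_var = max_logvar - softplus(max_logvar - log_var)   # soft upper bound
    log_var = min_logvar + softplus(log_var - min_logvar)   # soft lower bound
    return log_var
```

Values far above the upper bound get mapped to roughly max_logvar, values far below the lower bound to roughly min_logvar, and values well inside the interval pass through almost unchanged.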
[19-01-28] [paper37]
  • Practical Variational Inference for Neural Networks [pdf] [annotated pdf]
  • Alex Graves
  • NeurIPS 2011
  • [Uncertainty Estimation], [Variational Inference]
Reasonably well-written and somewhat interesting paper. The paper seems quite dated compared to "Weight Uncertainty in Neural Networks". I also found it significantly more difficult to read and understand than "Weight Uncertainty in Neural Networks". One can probably skip this paper; for an introduction to variational methods applied to neural networks, it is better to read "Weight Uncertainty in Neural Networks" instead.
[19-01-27] [paper36]
  • Weight Uncertainty in Neural Networks [pdf] [annotated pdf]
  • Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, Daan Wierstra
  • ICML 2015
  • [Uncertainty Estimation], [Variational Inference]
General comments on paper quality:
Well-written and interesting paper. I am not particularly familiar with variational methods, but still found the paper quite easy to read and understand.


Comments:
Seems like a good starting point for learning about variational methods applied to neural networks. The theory is presented in a clear way. The presented method also seems fairly straightforward to implement.

They mainly reference "Keeping Neural Networks Simple by Minimizing the Description Length of the Weights" and "Practical Variational Inference for Neural Networks" as relevant previous work.

In equation (2), one would have to run the model on the data for multiple weight samples? Seems quite computationally expensive?
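As I read it, yes: equation (2) is a Monte Carlo estimate over weight samples, so each loss evaluation requires one forward pass per sample. A toy numpy sketch using the reparameterization w = mu + softplus(rho) * eps (my own illustration, with a stand-in loss instead of a real network):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_weight(mu, rho):
    """Reparameterized weight sample: w = mu + softplus(rho) * eps,
    with eps ~ N(0, I), so gradients can flow to mu and rho."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.log1p(np.exp(rho)) * eps

def mc_objective(mu, rho, neg_log_lik, M=5):
    """Monte Carlo estimate of the expected loss: the model must be run
    once per weight sample, i.e. M forward passes per estimate."""
    return float(np.mean([neg_log_lik(sample_weight(mu, rho))
                          for _ in range(M)]))

# Stand-in "data loss": just the squared norm of the sampled weights.
mu = np.zeros(3); rho = np.full(3, -5.0)   # softplus(-5) ~ 0.0067, tiny noise
val = mc_objective(mu, rho, neg_log_lik=lambda w: float(np.sum(w ** 2)), M=10)
```

This is exactly why the method gets expensive: M weight samples means M forward (and backward) passes per mini-batch, compared to one for a deterministic network.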

Using a diagonal Gaussian for the variational posterior, I wonder how much of an approximation that actually is? Is the true posterior e.g. very likely to be multi-modal?

The MNIST models are only evaluated in terms of accuracy. The regression experiment is quite neat (good to see that the uncertainty increases away from the training data), but they provide very little details. I find it difficult to draw any real conclusions from the Bandits experiment.
[19-01-26] [paper35]
  • Learning Weight Uncertainty with Stochastic Gradient MCMC for Shape Classification [pdf] [poster] [annotated pdf]
  • Chunyuan Li, Andrew Stevens, Changyou Chen, Yunchen Pu, Zhe Gan, Lawrence Carin
  • CVPR 2016
  • [Uncertainty Estimation], [Stochastic Gradient MCMC]
Quite interesting and well-written paper. Quite an easy read compared to many other SG-MCMC papers. I find it weird that they only evaluate their models in terms of accuracy. It is of course a good thing that SG-MCMC methods seem to compare favorably with optimization approaches, but I would have been more interested in an evaluation of some kind of uncertainty estimate (e.g. the sample variance). The studied applications are not overly interesting, the paper seems somewhat dated in that regard.
[19-01-25] [paper34]
  • Meta-Learning For Stochastic Gradient MCMC [pdf] [code] [slides] [annotated pdf]
  • Wenbo Gong, Yingzhen Li, José Miguel Hernández-Lobato
  • ICLR 2019
  • [Uncertainty Estimation], [Stochastic Gradient MCMC]
Fairly interesting paper.
[19-01-25] [paper33]
  • A Complete Recipe for Stochastic Gradient MCMC [pdf] [annotated pdf]
  • Yi-An Ma, Tianqi Chen, Emily B. Fox
  • NeurIPS 2015
  • [Uncertainty Estimation], [Stochastic Gradient MCMC]
General comments on paper quality:
Well-written and very interesting paper. After reading the papers on SGLD and SGHMC, this paper ties the theory together and provides a general framework for SG-MCMC.


Paper overview:
The authors present a general framework and recipe for constructing MCMC and SG-MCMC samplers based on continuous Markov processes. The framework entails specifying a stochastic differential equation (SDE) by two matrices, D(z) (positive semi-definite) and Q(z) (skew-symmetric). Here, z = (theta, r), where theta are the model parameters and r are auxiliary variables (r corresponds to the momentum variables in Hamiltonian MC).

Importantly, the presented framework is complete: all continuous Markov processes with the target distribution as their stationary distribution (i.e., all continuous Markov processes that provide samples from the target distribution) correspond to a specific choice of the matrices D(z), Q(z), and conversely, every choice of D(z), Q(z) specifies a continuous Markov process with the target distribution as its stationary distribution.
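For reference, the recipe's SDE as I noted it down (notation may differ slightly from the paper):

```latex
\mathrm{d}z = \left[-\big(D(z) + Q(z)\big)\nabla H(z) + \Gamma(z)\right]\mathrm{d}t
  + \sqrt{2 D(z)}\,\mathrm{d}W(t),
\qquad
\Gamma_i(z) = \sum_j \frac{\partial}{\partial z_j}\big(D_{ij}(z) + Q_{ij}(z)\big),
```

where exp(-H(z)) is proportional to the target joint distribution over z = (theta, r), W(t) is a Wiener process, and the correction term Gamma(z) compensates for state-dependent D(z), Q(z).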

The authors show how previous SG-MCMC methods (including SGLD, SGRLD and SGHMC) can be cast into their framework, i.e., what their corresponding D(z), Q(z) are.

They also introduce a new SG-MCMC method, named SGRHMC, by wisely choosing D(z), Q(z).

Finally, they conduct two simple experiments which seem to suggest (at least somewhat) improved performance of SGRHMC compared to previous methods (SGLD, SGRLD, SGHMC).


Comments:
How does one construct \hat{B_t}, the estimate of V(theta_t) (the noise of the stochastic gradient)?

If one (for computational reasons) only can afford evaluating, say, 10 samples to estimate various expectations, what 10 samples should one pick? The final 10 samples, or will those be heavily correlated? Pick the final sample (at time t = T) and then also the samples at time t=T-k*100 (k = 1, 2, ..., 9)? (when should one start collecting samples and with what frequency should they then be collected?)

If one were to train an ensemble of models using SG-MCMC and pick the final sample of each model, how would these samples be distributed?

If the posterior distribution is a simple bowl, like in the right part of figure 2, what will the path of samples actually look like compared to the steps taken by SGD? In figure 2, I guess that SGRHMC will eventually converge to roughly the bottom of the bowl? So if one were to only collect samples from this later stage of traversing, the samples would actually NOT be (at least approximately) distributed according to the posterior?
[19-01-24] [paper32]
  • Tutorial: Introduction to Stochastic Gradient Markov Chain Monte Carlo Methods [pdf] [annotated pdf]
  • Changyou Chen
  • 2016-08
  • [Uncertainty Estimation], [Stochastic Gradient MCMC]
Quite interesting.
[19-01-24] [paper31]
  • An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling [pdf] [code] [annotated pdf]
  • Shaojie Bai, J. Zico Kolter, Vladlen Koltun
  • 2018-04
  • [Sequence Modeling]
General comments on paper quality:
Well-written and interesting paper.


Paper overview:
"We conduct a systematic evaluation of generic convolutional and recurrent architectures for sequence modeling. The models are evaluated across a broad range of standard tasks that are commonly used to benchmark recurrent networks. Our results indicate that a simple convolutional architecture outperforms canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory. We conclude that the common association between sequence modeling and recurrent networks should be reconsidered, and convolutional networks should be regarded as a natural starting point for sequence modeling tasks."

The authors introduce a quite straightforward CNN designed for sequence modeling, named Temporal Convolutional Network (TCN). They only consider the setting where the output at time t, y_t, is predicted using only the previously observed inputs, x_0, ..., x_t. TCN thus employs causal convolution (zero-pad with kernel_size - 1 zeros at the start of the input sequence).

To achieve a long effective history size (i.e., that the prediction for y_t should be able to utilize inputs observed much earlier in the input sequence), they use residual blocks (making it possible to train deep networks; without dilation, the effective history only scales linearly with depth) and dilated convolutions.
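A minimal numpy sketch of a causal, dilated 1-D convolution, assuming left-padding with (kernel_size - 1) * dilation zeros (my own illustration, not the paper's code):

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation=1):
    """Causal 1-D convolution: output[t] depends only on x[t], x[t - d],
    x[t - 2d], ..., achieved by left-padding with (k - 1) * d zeros."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([sum(kernel[i] * xp[t + pad - i * dilation]
                         for i in range(k))
                     for t in range(len(x))])

# Identity kernel [1] leaves the sequence unchanged; kernel [0, 1] with
# dilation 2 shifts the sequence right by two steps (purely causal).
x = np.arange(5, dtype=float)
```

Stacking such layers with dilations 1, 2, 4, 8, ... is what lets the receptive field grow exponentially with depth rather than linearly.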

They compare TCN with basic LSTM, GRU and vanilla-RNN models on a variety of sequence modeling tasks (which include polyphonic music modeling, word- and character-level language modeling as well as synthetic "stress test" tasks), and find that TCN generally outperforms the other models. The authors do however note that TCN is outperformed by more specialized RNN architectures on a couple of the tasks.

They specifically study the effective history/memory size of the models using the Copy Memory task (input sequences have length 10 + T + 10; the first 10 entries are random digits in {1, ..., 8}, the last 11 entries are 9s, and all the rest are 0s. The goal is to produce an output of the same length that is 0 everywhere, except the last 10 entries which should be a copy of the first 10 input digits), and find that TCN significantly outperforms the LSTM and GRU models (which is a quite interesting result, IMO).
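My reconstruction of the task's data format (the exact placement of the 9-delimiters is my reading of the description):

```python
import numpy as np

def copy_memory_example(T, rng):
    """One copy-memory example of length 10 + T + 10: ten random digits
    in {1,...,8}, a stretch of zeros, then eleven 9s (a delimiter plus
    the 10 output slots). The target is all zeros except the final 10
    positions, which must reproduce the first 10 input digits."""
    digits = rng.integers(1, 9, size=10)
    x = np.concatenate([digits, np.zeros(T - 1, dtype=int), np.full(11, 9)])
    y = np.concatenate([np.zeros(10 + T, dtype=int), digits])
    return x, y

x, y = copy_memory_example(T=50, rng=np.random.default_rng(0))
```

Solving this requires an effective memory of at least T + 10 steps, which is why it isolates long-range memory so cleanly.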


Comments:
Interesting paper that challenges the viewpoint of RNN models being the default starting point for sequence modeling tasks. The presented TCN architecture is quite straightforward, and I do think it makes sense that CNNs might be a very competitive alternative for sequence modeling.
[19-01-23] [paper30]
  • Stochastic Gradient Hamiltonian Monte Carlo [pdf] [annotated pdf]
  • Tianqi Chen, Emily B. Fox, Carlos Guestrin
  • ICML 2014
  • [Uncertainty Estimation], [Stochastic Gradient MCMC]
Interesting paper.
[19-01-23] [paper29]
  • Bayesian Learning via Stochastic Gradient Langevin Dynamics [pdf] [annotated pdf]
  • Max Welling, Yee Whye Teh
  • ICML 2011
  • [Uncertainty Estimation], [Stochastic Gradient MCMC]
Interesting paper.
[19-01-17] [paper28]
  • How Does Batch Normalization Help Optimization? [pdf] [poster] [video] [annotated pdf]
  • Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, Aleksander Madry
  • NeurIPS 2018
  • [Theoretical Properties of Deep Learning]
General comments on paper quality:
Quite well-written and interesting paper. A recommended read if you have ever been given the explanation that batch normalization works because it reduces the internal covariate shift.


Paper overview:
The abstract summarizes the paper very well:

"Batch Normalization (BatchNorm) is a widely adopted technique that enables faster and more stable training of deep neural networks (DNNs). Despite its pervasiveness, the exact reasons for BatchNorm's effectiveness are still poorly understood. The popular belief is that this effectiveness stems from controlling the change of the layers' input distributions during training to reduce the so-called "internal covariate shift". In this work, we demonstrate that such distributional stability of layer inputs has little to do with the success of BatchNorm. Instead, we uncover a more fundamental impact of BatchNorm on the training process: it makes the optimization landscape significantly smoother. This smoothness induces a more predictive and stable behavior of the gradients, allowing for faster training."

"In this work, we have investigated the roots of BatchNorm’s effectiveness as a technique for training deep neural networks. We find that the widely believed connection between the performance of BatchNorm and the internal covariate shift is tenuous, at best. In particular, we demonstrate that existence of internal covariate shift, at least when viewed from the - generally adopted – distributional stability perspective, is not a good predictor of training performance. Also, we show that, from an optimization viewpoint, BatchNorm might not be even reducing that shift."

"Instead, we identify a key effect that BatchNorm has on the training process: it reparametrizes the underlying optimization problem to make it more stable (in the sense of loss Lipschitzness) and smooth (in the sense of “effective” β-smoothness of the loss). This implies that the gradients used in training are more predictive and well-behaved, which enables faster and more effective optimization."

"We also show that this smoothing effect is not unique to BatchNorm. In fact, several other natural normalization strategies have similar impact and result in a comparable performance gain."


Comments:
It has never been clear to me how/why batch normalization works; I even had to remove all BatchNorm layers in an architecture once to get the model to train properly. Thus, I definitely appreciate this type of investigation.

It is somewhat unclear to me how general the presented theoretical results actually are.
[19-01-09] [paper27]
  • Relaxed Softmax: Efficient Confidence Auto-Calibration for Safe Pedestrian Detection [pdf] [poster] [annotated pdf]
  • Lukas Neumann, Andrew Zisserman, Andrea Vedaldi
  • NeurIPS Workshops 2018
  • [Uncertainty Estimation]