A formal proof that decision stumps are PAC learnable

John Tristan edited this page Nov 4, 2019 · 2 revisions

Motivation

Safe machine learning

As machine learning makes its way into our lives, it is increasingly important to ensure that machine learning applications are trustworthy. There are many facets to this challenge, ranging from a better theoretical understanding of learning algorithms to interpretable models and methods for monitoring and control. We believe that formal verification can play a useful role in the development of trustworthy machine learning, both to prevent implementation errors in algorithms that are statistical in nature and notoriously hard to get right, and as a building block for accountable machine learning.

Education

Proving that decision stumps are PAC learnable is a basic result of learning theory and is often one of the first generalization bounds that students are introduced to. We have written the proof in such a way that a student who wants to understand all of its details can follow it using the proof assistant.

Mechanized mathematics

Finally, we believe and hope that proof assistants will play a role in the future of mathematics. From that perspective, our proof is a first step toward a more complete formalization of uniform convergence results (which are at the heart of learning theory) and a simple benchmark for what can be accomplished with the Lean theorem prover and its associated mathematics library.

Authors

Publications

arXiv

The proof

The proof is split between two directories. In the directory lib you will find generic lemmas about probability theory, vectors, etc. In the directory stump you will find the definition of the machine learning algorithm that learns decision stumps and the proof that it is probably approximately correct.

The development starts with the definition of the problem of learning decision stumps and is accompanied by basic measurability results and some simple properties.
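To fix intuition, here is a minimal Lean 3 style sketch of the objects involved; the identifiers (`label`, `mistake_set`) are illustrative and need not match the development's actual definitions. A decision stump over the real line is a threshold t that labels a point positively exactly when it lies at or below t, and two stumps disagree exactly on the interval between their thresholds:

```lean
import data.real.basic

-- Hypothetical sketch; names are illustrative, not the development's.
-- A stump with threshold t labels x positively iff x ≤ t.
def label (t x : ℝ) : bool := if x ≤ t then tt else ff

-- The set of points where target t and hypothesis h disagree:
-- an interval between the two thresholds. The learner's error is
-- the probability mass of this set, which is why the measurability
-- results mentioned above are needed.
def mistake_set (t h : ℝ) : set ℝ := {x | label t x ≠ label h x}
```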

The definition of the algorithm is likewise accompanied by measurability results about the learning algorithm and by lemmas establishing its properties.
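The classic stump learner can be sketched as follows; this is the textbook algorithm, written in Lean 3 style with illustrative names, and need not match the development's code. It returns the largest positively labeled sample point (defaulting to 0), so its hypothesis never exceeds the true threshold:

```lean
import data.real.basic

-- Hypothetical sketch: fold over labeled samples (point, label),
-- keeping the largest point labeled positively. The resulting
-- threshold is at most the target, so all error lies in the
-- interval between the learned and true thresholds.
def learn : list (ℝ × bool) → ℝ
| []            := 0
| ((x, b) :: s) := let r := learn s in if b = tt ∧ r ≤ x then x else r
```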

The proof also makes use of basic results on the sample complexity function associated with learning decision stumps, and of a technical result from analysis proving the existence of a specific point on the real line.
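For intuition, the textbook sample-complexity argument for thresholds runs as follows (a standard bound, not necessarily the exact constants of the development). The learned threshold errs by more than ε only if none of the m i.i.d. samples falls in the region of probability mass ε just below the target, so

$$\Pr[\mathrm{err} > \varepsilon] \le (1-\varepsilon)^m \le e^{-\varepsilon m} \le \delta \quad\text{whenever}\quad m \ge \frac{1}{\varepsilon}\ln\frac{1}{\delta}.$$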

The final proof is decomposed into two files. The first one contains all the critical lemmas that one finds in textbook proofs. The second one contains the final proof and glues everything together.