feat(MachineLearning/PACLearning): definitions #492
Open
SamuelSchlesinger wants to merge 3 commits into leanprover:main from
Conversation
Define the PAC learning model [Valiant1984] generalized to an arbitrary label type `β` and parameterized by a distribution family `𝒟` on `α × β`. The unified definition `IsPACLearnerFor` captures the realizable, agnostic [Haussler1992], and noise-tolerant [AngluinLaird1988] settings by varying `𝒟`. Accuracy and confidence parameters `ε, δ` use the subtype `Set.Ioo (0 : ℝ≥0) 1` to ensure non-vacuity.

Main contributions:
- Core: `error`, `optimalError`, `IsPACLearnerFor`, `IsRPACLearnerFor`
- Learnability: `IsPACLearnable`, `IsRPACLearnable`
- Sample complexity: `sampleComplexity`, `rsampleComplexity`
- API: `toIsRPACLearnerFor`, `mono`, `IsPACLearnable.toIsRPACLearnable`
- Binary classification: `hypothesisError`, `hypothesisError_eq_add`, `error_map_eq_hypothesisError`
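As a minimal sketch of the parameter encoding (assuming only Mathlib; nothing below is the PR's actual code), drawing `ε` and `δ` from `Set.Ioo (0 : ℝ≥0) 1` makes non-vacuity a typing fact rather than a side hypothesis:

```lean
import Mathlib

open scoped NNReal

-- `ε` drawn from the subtype of `ℝ≥0` strictly between 0 and 1:
-- membership in `Set.Ioo 0 1` is definitionally `0 < ε ∧ ε < 1`,
-- so vacuous parameters (ε = 0 or δ = 1) are ruled out by the type.
example (ε : Set.Ioo (0 : ℝ≥0) 1) : 0 < ε.val ∧ ε.val < 1 :=
  ε.property
```

Any lemma taking such an `ε` gets the strict bounds for free from `ε.property`, with no extra `0 < ε` arguments to thread through.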
- Move all declarations under `Cslib.MachineLearning.PACLearning` to keep
generic names like `error`, `optimalError` out of the parent namespace.
- Rename predicate-level `.mono` lemmas that flip implication direction
against `⊆` to `.antitone_family` / `.antitone_C`
(`IsPACLearnerFor`, `IsRPACLearnerFor`, `IsPACLearnable`, `IsRPACLearnable`).
Keep `.mono_δ` / `.mono_ε` on the predicates and `_mono_family` / `_mono_C`
on sample complexity, where the direction actually is monotone.
- Restore `Ω : Type*` on `IsRPACLearnerFor` so the randomness space is
universe-polymorphic; downstream `IsRPACLearnerFor.*` theorems thread
the universe via the `.{_, _, u}` pattern. `IsRPACLearnable` and
`rsampleComplexity` pin `u := 0` (documented).
- Wrap the `Binary` subsection in `section Binary` with a local
`variable {α : Type*} [MeasurableSpace α]`.
- Simplify `optimalError` monotonicity in `IsPACLearnerFor.antitone_C`
via `iInf_le_iInf_of_subset`.
- Add convenience variants on `IsPACLearnable` / `IsRPACLearnable` that
discharge the `∃ m, IsPACLearnerFor m …` existence hypothesis from a
learnability witness: `sampleComplexity_antitone_δ`, `_antitone_ε`,
`_mono_family`, `_mono_C`, and the RPAC analogues.
Axioms remain standard (`propext`, `Classical.choice`, `Quot.sound`).
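The `antitone_C` direction can be illustrated in isolation (with a stand-in `optErr` for the PR's `optimalError`; names here are illustrative only): enlarging the hypothesis class can only lower the best achievable error, which is exactly what `iInf_le_iInf_of_subset` provides.

```lean
import Mathlib

open scoped ENNReal

-- Illustrative stand-in for `optimalError`: the best achievable
-- error over a hypothesis class `C`.
noncomputable def optErr {H : Type*} (err : H → ℝ≥0∞) (C : Set H) : ℝ≥0∞ :=
  ⨅ h ∈ C, err h

-- Enlarging the class can only lower the infimum; this is why the
-- predicate is *antitone* in `C` under `⊆`.
example {H : Type*} (err : H → ℝ≥0∞) {C C' : Set H} (h : C ⊆ C') :
    optErr err C' ≤ optErr err C :=
  iInf_le_iInf_of_subset h
```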
…onvenience lemmas
The `IsPACLearnable.sampleComplexity_*` and `IsRPACLearnable.rsampleComplexity_*`
convenience lemmas now compose with the general `sampleComplexity_le_of_forall`
directly instead of calling the top-level `sampleComplexity_*` lemmas whose
unqualified names collide with the surrounding `IsPACLearnable.` / `IsRPACLearnable.`
namespace.
This removes six `_root_.Cslib.MachineLearning.PACLearning.*` qualifiers, lets the
convenience lemmas live inside the main `variable {α β}` section again, and drops
their now-redundant type binders.
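The collision is ordinary Lean name resolution inside a namespace; a toy illustration (hypothetical names, not the PR's declarations):

```lean
def foo : Nat := 1

namespace Bar

def foo : Nat := 2

-- Inside `namespace Bar`, an unqualified `foo` resolves to `Bar.foo`;
-- reaching the top-level declaration requires the `_root_.` qualifier.
example : foo = 2 := rfl
example : _root_.foo = 1 := rfl

end Bar
```

Composing with the general lemma instead of the shadowed top-level names is what lets the `_root_.` qualifiers be dropped.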
Zetetic-Dhruv pushed a commit to Zetetic-Dhruv/cslib that referenced this pull request (Apr 18, 2026):
Adds the classical version space abstraction (Mitchell 1982, Angluin 1980) to the PAC learning module. The version space, the subset of a concept class consistent with observed labeled data, is the structural substrate every sample complexity theorem operates on (Baby PAC, Sauer-Shelah, PAC lower bounds, NFL). This PR complements leanprover#492 by providing:

- `VersionSpace C S`: the consistent subset of `C` given sample `S`
- `versionSpace_subset`, `versionSpace_empty_sample`: sanity lemmas
- `versionSpace_antitone`: structural monotonicity (more data → smaller VS), dual to sample complexity monotonicity
- `IsConsistent A C`: predicate on learners (output always in version space)
- `IsConsistent.output_mem_conceptClass`, `output_consistent`: consistent learner properties
- `mem_versionSpace_of_realizable`: under realizable data, the target concept lies in the version space (the realizable-case bridge)
- `versionSpace_nonempty_of_realizable`: corollary

No measure theory, no new Mathlib dependencies, ~150 lines, 0 `sorry`. Together with leanprover#492 these suffice to state the Baby PAC theorem, Sauer-Shelah sample complexity, PAC lower bounds, and NFL as structural statements rather than ad-hoc computations.
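A plausible shape for the core definitions (a sketch under assumed signatures; the referenced commit's actual code may differ):

```lean
import Mathlib

variable {α β : Type*}

-- The version space: hypotheses in the concept class `C` that agree
-- with every labeled example in the sample `S`.
def VersionSpace (C : Set (α → β)) (S : List (α × β)) : Set (α → β) :=
  {h ∈ C | ∀ p ∈ S, h p.1 = p.2}

-- Sanity: the version space is a subset of the concept class.
theorem versionSpace_subset (C : Set (α → β)) (S : List (α × β)) :
    VersionSpace C S ⊆ C :=
  fun _ hh => hh.1

-- Structural monotonicity: more data can only shrink the version space.
theorem versionSpace_antitone (C : Set (α → β)) {S S' : List (α × β)}
    (hSS' : ∀ p ∈ S, p ∈ S') : VersionSpace C S' ⊆ VersionSpace C S :=
  fun _ hh => ⟨hh.1, fun p hp => hh.2 p (hSS' p hp)⟩
```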
Define the PAC learning model generalized to an arbitrary label type and parameterized by a distribution family over labeled examples. The unified definition `IsPACLearnerFor` captures the realizable, agnostic, and noise-tolerant settings. Online PAC learning is not treated here and is left for future work; since the theorems we will prove about this definition will require statements that look much like `IsPACLearnerFor` anyway, this work will be compatible with future online definitions.
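In conventional notation, the condition the unified definition packages is the familiar one (a sketch; the formal statement quantifies over the distribution family `𝒟` and uses Mathlib probability measures):

```math
\forall\, \varepsilon, \delta \in (0,1),\ \exists\, m \in \mathbb{N},\ \forall\, D \in \mathcal{D}:\quad
\Pr_{S \sim D^{m}}\!\Big[\operatorname{err}_D\big(A(S)\big) \le \inf_{h \in C} \operatorname{err}_D(h) + \varepsilon\Big] \ge 1 - \delta .
```

Restricting `𝒟` to realizable distributions makes the infimum vanish, recovering Valiant's original model; allowing arbitrary distributions gives the agnostic setting.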