
feat(MachineLearning/PACLearning): definitions #492

Open
SamuelSchlesinger wants to merge 3 commits into leanprover:main from SamuelSchlesinger:feat/pac-learning-defs

Conversation


@SamuelSchlesinger SamuelSchlesinger commented Apr 14, 2026

Define the PAC learning model generalized to an arbitrary label type and parameterized by a distribution family over labeled examples.

The unified definition IsPACLearnerFor captures the realizable, agnostic, and noise-tolerant settings. Online PAC learning is not treated here and is left for future work; since the theorems proved about this definition will require statements closely resembling IsPACLearnerFor anyway, this work should remain compatible with future online definitions.

Define the PAC learning model [Valiant1984] generalized to an arbitrary
label type `β` and parameterized by a distribution family `𝒟` on `α × β`.

The unified definition `IsPACLearnerFor` captures the realizable, agnostic
[Haussler1992], and noise-tolerant [AngluinLaird1988] settings by varying
`𝒟`. Accuracy and confidence parameters `ε, δ` use the subtype
`Set.Ioo (0 : ℝ≥0) 1` to ensure non-vacuity.
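As a minimal sketch of the parameter subtype (assuming Mathlib's `Set.Ioo` and `ℝ≥0`; this is not code from the PR), an element of the subtype carries a proof that the parameter lies strictly between 0 and 1, so the vacuous endpoints ε = 0 and δ = 1 cannot even be supplied:

```lean
import Mathlib

open scoped NNReal

-- An accuracy/confidence parameter in the open unit interval: the value
-- together with a proof `0 < ε ∧ ε < 1`, excluding the vacuous endpoints.
example (ε : Set.Ioo (0 : ℝ≥0) 1) : 0 < (ε : ℝ≥0) ∧ (ε : ℝ≥0) < 1 :=
  ε.2
```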

Main contributions:
- Core: `error`, `optimalError`, `IsPACLearnerFor`, `IsRPACLearnerFor`
- Learnability: `IsPACLearnable`, `IsRPACLearnable`
- Sample complexity: `sampleComplexity`, `rsampleComplexity`
- API: `toIsRPACLearnerFor`, `mono`, `IsPACLearnable.toIsRPACLearnable`
- Binary classification: `hypothesisError`, `hypothesisError_eq_add`,
  `error_map_eq_hypothesisError`
- Move all declarations under `Cslib.MachineLearning.PACLearning` to keep
  generic names like `error`, `optimalError` out of the parent namespace.
- Rename predicate-level `.mono` lemmas that flip implication direction
  against `⊆` to `.antitone_family` / `.antitone_C`
  (`IsPACLearnerFor`, `IsRPACLearnerFor`, `IsPACLearnable`, `IsRPACLearnable`).
  Keep `.mono_δ` / `.mono_ε` on the predicates and `_mono_family` / `_mono_C`
  on sample complexity, where the direction actually is monotone.
- Restore `Ω : Type*` on `IsRPACLearnerFor` so the randomness space is
  universe-polymorphic; downstream `IsRPACLearnerFor.*` theorems thread
  the universe via the `.{_, _, u}` pattern. `IsRPACLearnable` and
  `rsampleComplexity` pin `u := 0` (documented).
- Wrap the `Binary` subsection in `section Binary` with a local
  `variable {α : Type*} [MeasurableSpace α]`.
- Simplify `optimalError` monotonicity in `IsPACLearnerFor.antitone_C`
  via `iInf_le_iInf_of_subset`.
- Add convenience variants on `IsPACLearnable` / `IsRPACLearnable` that
  discharge the `∃ m, IsPACLearnerFor m …` existence hypothesis from a
  learnability witness: `sampleComplexity_antitone_δ`, `_antitone_ε`,
  `_mono_family`, `_mono_C`, and the RPAC analogues.

Axioms remain standard (`propext`, `Classical.choice`, `Quot.sound`).
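The direction split behind the `antitone` renaming can be illustrated by a toy Lean example (hypothetical, not the PR's code): a predicate quantifying over every distribution in a family weakens as the family shrinks under `⊆`, so it is antitone in the family rather than monotone:

```lean
import Mathlib

variable {D : Type*}

-- If a learner succeeds for every distribution in `𝒟`, it succeeds for
-- every distribution in any subfamily `𝒟' ⊆ 𝒟`: the hypothesis weakens
-- as the family shrinks, so the predicate is antitone in the family.
example (P : D → Prop) (𝒟 𝒟' : Set D) (h : 𝒟' ⊆ 𝒟)
    (hP : ∀ d ∈ 𝒟, P d) : ∀ d ∈ 𝒟', P d :=
  fun d hd => hP d (h hd)
```

Sample complexity runs the other way: enlarging the family can only demand more samples, so `_mono_family` keeps its name there.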
…onvenience lemmas

The `IsPACLearnable.sampleComplexity_*` and `IsRPACLearnable.rsampleComplexity_*`
convenience lemmas now compose directly with the general `sampleComplexity_le_of_forall`
instead of calling the top-level `sampleComplexity_*` lemmas, whose
unqualified names collide with the surrounding `IsPACLearnable.` / `IsRPACLearnable.`
namespaces.

This removes six `_root_.Cslib.MachineLearning.PACLearning.*` qualifiers, lets the
convenience lemmas live inside the main `variable {α β}` section again, and drops
their now-redundant type binders.
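The collision being avoided is ordinary Lean name resolution; a toy example (hypothetical names, not the PR's declarations):

```lean
def mono : Nat := 0          -- a top-level declaration

namespace IsPACLearnable

def mono : Nat := 1          -- same bare name inside the namespace

-- Inside the namespace, `mono` resolves to `IsPACLearnable.mono`;
-- reaching the top-level one requires the `_root_.` qualifier.
example : mono = 1 := rfl
example : _root_.mono = 0 := rfl

end IsPACLearnable
```

Composing with a lemma whose name does not clash with the enclosing namespace makes the `_root_.` qualifiers unnecessary.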
Zetetic-Dhruv pushed a commit to Zetetic-Dhruv/cslib that referenced this pull request Apr 18, 2026
Adds the classical version space abstraction (Mitchell 1982, Angluin 1980)
to the PAC learning module. The version space — the subset of a concept class
consistent with the observed labeled data — is the structural substrate every
sample complexity theorem operates on (Baby PAC, Sauer–Shelah, PAC lower
bounds, no-free-lunch). This PR complements leanprover#492 by providing:

- VersionSpace C S: the consistent subset of C given sample S
- versionSpace_subset, versionSpace_empty_sample: sanity lemmas
- versionSpace_antitone: structural monotonicity (more data → smaller VS),
  dual to sample complexity monotonicity
- IsConsistent A C: predicate on learners (output always in version space)
- IsConsistent.output_mem_conceptClass, output_consistent: consistent learner
  properties
- mem_versionSpace_of_realizable: under realizable data, the target concept
  lies in the version space (the realizable-case bridge)
- versionSpace_nonempty_of_realizable: corollary

No measure theory, no new Mathlib dependencies, ~150 lines, 0 sorry.

Together with leanprover#492 these suffice to state the Baby PAC theorem, Sauer-Shelah
sample complexity, PAC lower bounds, and NFL as structural statements rather
than ad-hoc computations.
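A minimal sketch of the core definition and the antitonicity lemma, with signatures assumed from the commit message (the actual module may differ, e.g. in how samples are represented):

```lean
import Mathlib

variable {α β : Type*}

/-- Sketch: the subset of the concept class `C` consistent with every
labeled example in the sample `S`. -/
def VersionSpace (C : Set (α → β)) (S : List (α × β)) : Set (α → β) :=
  {h ∈ C | ∀ p ∈ S, h p.1 = p.2}

-- More data can only shrink the version space: if every example of `S`
-- also appears in `T`, consistency with `T` implies consistency with `S`.
example (C : Set (α → β)) (S T : List (α × β)) (hST : ∀ p ∈ S, p ∈ T) :
    VersionSpace C T ⊆ VersionSpace C S :=
  fun h hm => ⟨hm.1, fun p hp => hm.2 p (hST p hp)⟩
```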