CS7545_Sp23_Lecture_13: VC Dimension
Recap (for more details, refer to Lecture 11:
Given a set:
Definition: The restriction of
Definition: The growth function
The growth function is bounded by
Definition: A set of
Definition: The VC-dimension
This is the largest set shattered by
Note: If the
- There exists a shattered set of size d.
- There does not exist a shattered set of size d+1.
Remark: For any finite length hypothesis set
- Closed intervals in
$\mathbb{R}$ :$VC-dim(\mathcal{H}) = 2$
For detailed discussion, refer to Lecture 11
- Axis-aligned rectangles in
$\mathbb{R}^2$ :$VC-dim(\mathcal{H}) = 4$ .
In Lecture 11, we saw that there exists a set of 4 points that can be shattered. Now, we need to show that no set of 5 points can be shattered by axis-aligned rectangles in
Let
Let
Similarly choose
In either case, the points cannot be separated if the
-
$\mathcal H = \lbrace h : \text{Halfspace in }\mathbb R^2 \rbrace$ :$VC-dim(\mathcal H) = 3$ .
Halfspaces in
-
$\mathcal H = \lbrace h : \text{Halfspace in } \mathbb R^n \rbrace$ :$VC-dim(\mathcal H) = n+1$ .
We can find a set of
Let
Let
Let
Plugging in the chosen coordinates
In the case of
The most obvious choice of such
In the next step, we use Radon's theorem to show that no set of
Theorem (Radon's): Any set
Theorem Proof: Refer Theorem 3.4 in Foundations of Machine Learning
Coming back to our problem, consider any set
By Radon's theorem, they can be partitioned into two sets
Note that if two sets of points can be separated by a hyperplane, then their convex hulls can also be separated by the same hyperplane (For more details, see Hyperplane Separation Theorem).
Therefore,
-
$\mathcal H = \lbrace \text{all convex polygons in } \mathbb R^2 \rbrace$ has$VC-dim(\mathcal H) = \infty$ .
Depending on the number of points and labels, a convenient polygon can be chosen to shatter any number of points (see fig for illustration).
-
$\mathcal H = \lbrace h(x) = \sin \alpha x: \alpha \in \mathbb R \rbrace$ has$VC-dim(\mathcal H) = \infty$ .
Depending on the distribution of the points and labels, an