# Mistake Bounded Learning

## ***Vocabulary***

none yet

# Lecture Notes #

## ***Introduction***

#### **Learner, Teacher, and Update Rule**

Imagine a program that filters spam emails and comes with an iron-clad guarantee that it will make at most 100 mistakes, even when your inbox has 30,000+ emails. A guarantee of this kind is what the mistake-bounded model of learning formalizes.

**Learner**
- Takes in samples (data points), and responds with its guess for each sample's classification.

**Teacher**
- Responds to the classification guess with whether the guess was correct or incorrect.

**Update Rule**
- When the teacher tells the learner that it made a mistake, a counter for the number of mistakes increases by one. The learner also learns from the mistake by updating its internal state.

\
We say a learner has mistake bound $t$ if, for every sequence of challenges, the learner makes at most $t$ mistakes.
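
The learner/teacher interaction can be sketched as a simple loop. This is only an illustrative sketch: the names `count_mistakes` and `ConstantLearner` are hypothetical and not from the lecture, and the teacher is modeled as a function `target` that knows the true classification.

```python
from typing import Callable, Iterable, Tuple

Bits = Tuple[int, ...]


def count_mistakes(
    predict: Callable[[Bits], int],
    update: Callable[[Bits, int], None],
    target: Callable[[Bits], int],
    challenges: Iterable[Bits],
) -> int:
    """Run the learner/teacher protocol and return the mistake count."""
    mistakes = 0
    for x in challenges:
        guess = predict(x)      # learner answers the challenge
        truth = target(x)       # teacher knows the true classification
        if guess != truth:      # teacher reports a mistake
            mistakes += 1
            update(x, truth)    # learner revises its internal state
    return mistakes


class ConstantLearner:
    """Toy learner (hypothetical): repeats the last corrected label."""

    def __init__(self) -> None:
        self.label = 0

    def predict(self, x: Bits) -> int:
        return self.label

    def update(self, x: Bits, truth: int) -> None:
        self.label = truth


toy = ConstantLearner()
always_one = lambda x: 1  # assumed target: every sample is classified 1
demo_mistakes = count_mistakes(
    toy.predict, toy.update, always_one, [(0, 1), (1, 1), (0, 0)]
)
print(demo_mistakes)
```

A learner has mistake bound $t$ exactly when `count_mistakes` can never return more than $t$, whatever target in the class and whatever sequence of challenges is supplied.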

<br>
<center>
    <img src="images/1.1.1.png" alt="Professor Notes" />
</center>
<br>

#### **Function Class Introduction**

Let's introduce a function class called $\mathcal{C}$, the class of all monotone disjunctions, that is, ORs of variables with no negations:

$$ \mathcal{C} = \{\text{Monotone Disjunctions on } n \text{ variables}\} $$

And our domain, $\mathcal{D}$, consists of bit strings of length $n$:

$$ \mathcal{D} = \{0,1\}^n $$

Some examples of functions in $\mathcal{C}$:

$$f(x) = x_1 \lor x_3 $$

$$ f(x) = x_1 \lor x_5 \lor x_7 $$
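
As a small illustration, a monotone disjunction can be stored as the set of variable indices it contains and evaluated on a bit string. The function name and the set-of-indices representation below are assumptions made for this sketch, not part of the lecture.

```python
from typing import Set, Tuple


def eval_monotone_disjunction(variables: Set[int], x: Tuple[int, ...]) -> int:
    """Evaluate the OR of the chosen variables on bit string x.

    `variables` holds 1-based indices, so {1, 3} stands for x_1 OR x_3.
    """
    return int(any(x[i - 1] == 1 for i in variables))


f = {1, 3}                                         # f(x) = x_1 OR x_3
print(eval_monotone_disjunction(f, (0, 0, 1, 0)))  # x_3 is set
print(eval_monotone_disjunction(f, (0, 1, 0, 0)))  # neither x_1 nor x_3 is set
```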

## ***Monotone Disjunctions***

#### **Mistake Bounded Model Example**

Let us fix $f \in \mathcal{C}$; the learner **does not** know $f$. The learner is trying to learn $f$: on each challenge $x$ it guesses 0 or 1, and the teacher responds "correct" if the guess equals $f(x)$ and "mistake" otherwise.

Whether or not the guess was correct, the learner learns something from each challenge.

***Question: How can we come up with a learner (algorithm) that has mistake bound $n$?***

In this case, we will start with some monotone disjunction as our initial state. Each time we make a mistake, we will update our monotone disjunction so it is consistent with what we've seen.

We will start with the learner using monotone disjunction: $x_1 \lor x_2 \lor \dots \lor x_n$ as its initial state. After each mistake, the learner will update its state to be consistent with the seen data. 

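With each mistake, we will be able to eliminate at least one literal from our monotone disjunction. The learner's disjunction always contains every variable of $f$, so a mistake can only happen when the learner guesses 1 and $f(x) = 0$. Every variable set to 1 in $x$ then cannot appear in $f$ and is removed, and at least one such variable exists because the guess was 1. There are at most $n$ literals, which implies that the number of mistakes is at most $n$.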
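
A minimal sketch of this elimination learner follows, assuming the update removes every variable that is set to 1 in a sample the teacher labels 0. The class name, the 0-based indexing, and the demo target $x_1 \lor x_3$ on $n = 4$ variables are choices made for this example only.

```python
from itertools import product
from typing import Tuple


class MonotoneDisjunctionLearner:
    """Elimination learner: starts from x_1 OR ... OR x_n (0-based indices)."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.variables = set(range(n))  # hypothesis: OR of all n variables

    def predict(self, x: Tuple[int, ...]) -> int:
        return int(any(x[i] == 1 for i in self.variables))

    def update(self, x: Tuple[int, ...], truth: int) -> None:
        # For a target in the class, a mistake means guess 1 but truth 0,
        # so every variable set to 1 in x cannot be part of the target.
        if truth == 0:
            self.variables -= {i for i in range(self.n) if x[i] == 1}


def target(x: Tuple[int, ...]) -> int:
    """Assumed demo target: x_1 OR x_3 (indices 0 and 2)."""
    return int(x[0] == 1 or x[2] == 1)


n = 4
learner = MonotoneDisjunctionLearner(n)
mistakes = 0
for x in product((0, 1), repeat=n):  # every bit string of length n
    if learner.predict(x) != target(x):
        mistakes += 1
        learner.update(x, target(x))

print(mistakes, sorted(learner.variables))
```

Each update shrinks the hypothesis by at least one variable and never removes a variable of the target, which is why the mistake count stays at or below $n$.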


<br>
<center>
    <img src="images/1.1.2.png" alt="Professor Notes" />
</center>
<br>

## ***Disjunctions***

#### **Updating to a More Interesting Function Class**

Let's update our function class $\mathcal{C}$ to be the function class of all disjunctions:

$$ \mathcal{C} = \{\text{Disjunctions}\} $$

And our domain, $\mathcal{D}$, consists of bit strings of length $n$:

$$ \mathcal{D} = \{0,1\}^n $$

We can now have negations in our disjunction. Some examples of functions in the new $\mathcal{C}$:

$$ f(x) = x_1 \lor \bar{x_2} \lor x_5 \lor \bar{x_7} $$

***Question: How can we use the algorithm for monotone disjunctions to learn disjunctions?***

We will be performing something called "feature expansion", where we take a bit string of length $n$, and map it to a new string of length $2n$:

$$ x_1, \dots, x_n \mapsto x_1, \dots, x_n, y_1, \dots, y_n $$

Here each $y_i = \bar{x_i}$; that is, $y_i$ is the negation of $x_i$.

Thus, each disjunction $f(x_1, \dots, x_n)$ can be rewritten as a monotone disjunction $f(x_1, \dots, x_n, y_1, \dots, y_n)$ over the $2n$ expanded variables that behaves the same:

$$f(x_1, \dots, x_n) = x_2 \lor \bar{x_4} \lor x_7 $$

$$f(x_1, \dots, x_n, y_1, \dots, y_n) = x_2 \lor y_4 \lor x_7 $$

<br>
<center>
    <img src="images/1.1.3.png" alt="Professor Notes" />
</center>
<br>

Running the monotone-disjunction learner on the $2n$ expanded variables gives a new algorithm for arbitrary disjunctions with mistake bound $\le 2n$.
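
The reduction can be sketched in a few lines. The helper names `expand` and `learn_disjunction` are hypothetical, and the demo target $x_2 \lor \bar{x_4} \lor x_7$ on $n = 7$ variables is borrowed from the example above; indices in the code are 0-based, with $y_i$ stored at position $n + i - 1$.

```python
from itertools import product
from typing import Callable, Iterable, Set, Tuple

Bits = Tuple[int, ...]


def expand(x: Bits) -> Bits:
    """Feature expansion: append the negation y_i of every bit x_i."""
    return tuple(x) + tuple(1 - b for b in x)


def learn_disjunction(
    target: Callable[[Bits], int], challenges: Iterable[Bits], n: int
) -> Tuple[Set[int], int]:
    """Run the monotone elimination learner on the 2n expanded variables."""
    hypothesis = set(range(2 * n))  # OR of all x_i and all y_i
    mistakes = 0
    for x in challenges:
        z = expand(x)
        guess = int(any(z[i] == 1 for i in hypothesis))
        truth = target(x)
        if guess != truth:
            mistakes += 1
            if truth == 0:
                # Expanded variables that are 1 here cannot be in the target.
                hypothesis -= {i for i in range(2 * n) if z[i] == 1}
    return hypothesis, mistakes


def demo_target(x: Bits) -> int:
    """Assumed demo target: x_2 OR (NOT x_4) OR x_7."""
    return int(x[1] == 1 or x[3] == 0 or x[6] == 1)


n = 7
hypothesis, mistakes = learn_disjunction(
    demo_target, product((0, 1), repeat=n), n
)
print(mistakes, sorted(hypothesis))
```

After seeing every bit string, the surviving indices should correspond to $x_2$, $x_7$, and $y_4$, matching the rewritten monotone form of the target.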

# Personal Notes #