# Concept Learning

Concept learning is an example of algorithmic machine learning. It is algorithmic in the sense that its methods use computer science data structures to model the learned information.

Algorithmic methods do not treat data as points in a Euclidean space; instead, they store the essential information in data structures.

## Background

Concept learning was studied in the early days of machine learning, in the 1950s and 1960s, by psychologists who investigated the human thinking process (see e.g. Bruner, J.S., Goodnow, J.J., & Austin, G.A. (1956): *A Study of Thinking*, John Wiley and Sons). Thinking in concepts comes naturally to humans.

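Concept learning forms a *conjunctive hypothesis space*: each hypothesis is a conjunction of constraints, one per attribute, where a constraint either requires a specific value, accepts any value ("?"), or accepts no value (Ø).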


<div>
<img src="attachment:image.png" width="200"/>
</div>

#### Dataset: EnjoySport

|Sky | Temp | Humid | Wind | Water | Forecast | EnjoySport |
| --- | --- | --- |--- | --- | --- | --- |
| Sunny | Warm | Normal | Strong  | Warm    | Same      | Yes |
| Sunny | Warm | High   | Strong  | Warm    | Same      | Yes |
| Rainy | Cold | High   | Strong  | Warm    | Change    | No |
| Sunny | Warm | High   | Strong  | Cool    | Change    | Yes |
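To make the conjunctive hypothesis space concrete, here is a minimal Python sketch. It assumes Mitchell's notation: a hypothesis is a tuple with one constraint per EnjoySport attribute, "?" accepts any value, a specific value accepts only itself, and the string "0" stands in for Ø. The names `matches` and `more_general_or_equal` are hypothetical helpers for illustration, not a library API.

```python
from typing import Tuple

# Hypothetical encoding (assumption of this sketch, after Mitchell's notation):
# one constraint per attribute, in the order Sky, Temp, Humid, Wind, Water, Forecast.
ANY = "?"   # accepts any value
NONE = "0"  # accepts no value (Mitchell's Ø)

Hypothesis = Tuple[str, ...]
Instance = Tuple[str, ...]


def matches(h: Hypothesis, x: Instance) -> bool:
    """h classifies x as positive iff every constraint in the conjunction holds."""
    return all(c == ANY or c == v for c, v in zip(h, x))


def more_general_or_equal(h1: Hypothesis, h2: Hypothesis) -> bool:
    """True if h1 covers every instance that h2 covers."""
    if NONE in h2:
        return True  # h2 covers no instance at all
    return all(c1 == ANY or c1 == c2 for c1, c2 in zip(h1, h2))


x1 = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")  # first EnjoySport row
h = ("Sunny", ANY, ANY, "Strong", ANY, ANY)                 # Sky = Sunny AND Wind = Strong
print(matches(h, x1))                        # True
print(more_general_or_equal((ANY,) * 6, h))  # True: <?,?,?,?,?,?> is the most general
print(more_general_or_equal(h, (ANY,) * 6))  # False
```

The "more general than or equal to" check works constraint by constraint, which is exactly why the conjunctive space is so easy to search: the hypotheses form a partial order from <Ø,...,Ø> up to <?,...,?>.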


## Finding the most specific hypothesis: The FIND-S Algorithm

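FIND-S is one of the earliest algorithmic learning algorithms. It starts from the most specific hypothesis and generalizes it only as much as needed to cover each positive training example; negative examples are ignored.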

```

Initialize h to the most specific hypothesis in H
For each positive training instance x:
    For each attribute constraint a_i in h:
        If the constraint a_i in h is satisfied by x:
            do nothing
        Else:
            replace a_i in h by the next more general constraint that is satisfied by x

```
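A runnable version of the pseudocode above, as a minimal sketch: it assumes the same tuple encoding ("?" for any value, "0" for Ø), and `find_s` and `TRAIN` are hypothetical names. For a conjunctive attribute constraint, "the next more general constraint" is Ø → the observed value → "?".

```python
from typing import List, Tuple

ANY, NONE = "?", "0"  # "?" = any value, "0" = no value (Mitchell's Ø); assumed encoding

# EnjoySport training data: ((Sky, Temp, Humid, Wind, Water, Forecast), EnjoySport)
TRAIN: List[Tuple[Tuple[str, ...], bool]] = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]


def find_s(examples: List[Tuple[Tuple[str, ...], bool]]) -> Tuple[str, ...]:
    n = len(examples[0][0])
    h = [NONE] * n  # most specific hypothesis <Ø, Ø, ..., Ø>
    for x, positive in examples:
        if not positive:
            continue  # FIND-S ignores negative examples
        for i, value in enumerate(x):
            if h[i] == NONE:
                h[i] = value  # Ø -> the observed value
            elif h[i] != ANY and h[i] != value:
                h[i] = ANY  # conflicting values -> "?"
            # otherwise the constraint is already satisfied: do nothing
    return tuple(h)


print(find_s(TRAIN))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```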

#### Example: FIND-S finds only the most specific hypothesis for EnjoySport, but other, more general, hypotheses can be found iteratively
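One way to find the more general hypotheses iteratively, sketched in Python as an illustration (this is a hypothetical breadth-first search, not the Candidate-Elimination algorithm): start from the FIND-S result and relax one specific constraint to "?" at a time, keeping only hypotheses that stay consistent with the training data. Pruning inconsistent hypotheses is safe here: generalizing never uncovers a positive example, and once a hypothesis accepts the negative example, every further generalization accepts it too. `generalize_iteratively` and `consistent` are hypothetical names.

```python
from collections import deque

ANY = "?"
# EnjoySport training data: ((Sky, Temp, Humid, Wind, Water, Forecast), EnjoySport)
TRAIN = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
S = ("Sunny", "Warm", ANY, "Strong", ANY, ANY)  # the FIND-S result


def matches(h, x):
    return all(c == ANY or c == v for c, v in zip(h, x))


def consistent(h, examples):
    """h agrees with the label of every training example."""
    return all(matches(h, x) == label for x, label in examples)


def generalize_iteratively(start, examples):
    """Breadth-first search upward from `start`: relax one specific constraint
    to '?' at a time and keep only the hypotheses that stay consistent."""
    found, queue = {start}, deque([start])
    while queue:
        h = queue.popleft()
        for i, c in enumerate(h):
            if c == ANY:
                continue
            g = h[:i] + (ANY,) + h[i + 1:]
            if g not in found and consistent(g, examples):
                found.add(g)
                queue.append(g)
    return found


for h in sorted(generalize_iteratively(S, TRAIN)):
    print(h)
```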

## Finding all consistent hypotheses: The Candidate-Elimination Algorithm

#### Example: You should find 6 consistent hypotheses; use them all to classify the following test samples

|Sky | Temp | Humid | Wind | Water | Forecast |
| -- | ---  | ---   |---   | ---   | ---     |
| Sunny | Warm | Normal | Strong | Cool    | Change  |
| Rainy | Cold | Normal | Light  | Warm    | Same    | 
| Sunny | Warm | Normal | Light  | Warm    | Same    |
| Sunny | Cold | Normal | Strong | Warm    | Same    |
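A way to check this exercise, sketched in Python under stated assumptions: instead of Candidate-Elimination it uses the simpler brute-force List-Then-Eliminate approach (enumerate every conjunctive hypothesis, keep the consistent ones), then lets the hypotheses vote on each test sample. Only values seen in the training data plus "?" are enumerated, which loses nothing: a consistent hypothesis must match every positive example, so any specific constraint must be a training value. Hypotheses containing Ø are skipped for the same reason. `version_space`, `TEST` and `VS` are hypothetical names.

```python
from itertools import product

ANY = "?"
# EnjoySport training data: ((Sky, Temp, Humid, Wind, Water, Forecast), EnjoySport)
TRAIN = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
TEST = [
    ("Sunny", "Warm", "Normal", "Strong", "Cool", "Change"),
    ("Rainy", "Cold", "Normal", "Light", "Warm", "Same"),
    ("Sunny", "Warm", "Normal", "Light", "Warm", "Same"),
    ("Sunny", "Cold", "Normal", "Strong", "Warm", "Same"),
]


def matches(h, x):
    return all(c == ANY or c == v for c, v in zip(h, x))


def version_space(examples):
    """List-Then-Eliminate: all conjunctive hypotheses consistent with the data."""
    n = len(examples[0][0])
    # Candidate constraints per attribute: every value seen in training, plus "?"
    domains = [sorted({x[i] for x, _ in examples}) + [ANY] for i in range(n)]
    return [h for h in product(*domains)
            if all(matches(h, x) == label for x, label in examples)]


VS = version_space(TRAIN)
print(len(VS))  # 6
for x in TEST:
    votes = sum(matches(h, x) for h in VS)
    print(x, f"{votes} Yes / {len(VS) - votes} No")
```

A unanimous vote gives a confident classification; a split vote shows that the training data does not yet pin down the concept for that sample.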

# Decision tree learning

A decision tree is a generalization of concept learning and a powerful machine learning method for both classification and regression.

#### Example: Another set of EnjoySport training samples

|Sky | Temp | Humid | Wind | Water | Forecast | EnjoySport |
| --- | --- | --- |--- | --- | --- | --- |
| Sunny  | Warm | Normal | Strong  | Cool    | Change      | Yes |
| Cloudy | Warm | Normal | Strong  | Cool    | Change      | Yes |
| Rainy  | Warm | Normal | Strong  | Cool    | Change      | No |

## Disjunctive hypothesis space

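Decision trees can represent disjunctive hypotheses, i.e. they work in a *disjunctive hypothesis space*. Compare this example of a disjunctive concept with the conjunctive concepts of concept learning: in the samples above, EnjoySport is Yes when Sky is Sunny *or* Cloudy, and no single conjunction of attribute constraints can express that.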

<div>
<img src="attachment:image.png" width="200"/>
</div>
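The difference can be checked by brute force. This Python sketch (an illustration with hypothetical names, reusing the "?" tuple encoding) enumerates every conjunctive hypothesis built from the values in the three samples above and finds none that is consistent, while a single disjunctive test on Sky, which a decision tree expresses as one node with a branch per Sky value, fits all three. Hypotheses containing Ø are not enumerated because they reject the positive samples anyway.

```python
from itertools import product

ANY = "?"
# The three EnjoySport samples above: ((Sky, Temp, Humid, Wind, Water, Forecast), label)
TRAIN2 = [
    (("Sunny", "Warm", "Normal", "Strong", "Cool", "Change"), True),
    (("Cloudy", "Warm", "Normal", "Strong", "Cool", "Change"), True),
    (("Rainy", "Warm", "Normal", "Strong", "Cool", "Change"), False),
]


def matches(h, x):
    return all(c == ANY or c == v for c, v in zip(h, x))


def consistent_conjunctions(examples):
    """All conjunctive hypotheses (training values or '?') consistent with the data."""
    n = len(examples[0][0])
    domains = [sorted({x[i] for x, _ in examples}) + [ANY] for i in range(n)]
    return [h for h in product(*domains)
            if all(matches(h, x) == label for x, label in examples)]


def disjunctive_rule(x):
    # Sky = Sunny OR Sky = Cloudy (hypothetical rule read off the table)
    return x[0] in ("Sunny", "Cloudy")


print(len(consistent_conjunctions(TRAIN2)))                        # 0
print(all(disjunctive_rule(x) == label for x, label in TRAIN2))    # True
```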

#### Example: PlayTennis samples

| Day | Outlook | Temperature | Humidity | Wind | PlayTennis |
| --  | --      | --          | --       | --   |   --       |
| D1 | Sunny | Hot | High | Weak | No |
| D2 | Sunny | Hot | High | Strong | No |
| D3 | Overcast | Hot | High | Weak | Yes |
| D4 | Rain | Mild | High | Weak | Yes |
| D5 | Rain | Cool | Normal | Weak | Yes |
| D6 | Rain | Cool | Normal | Strong | No |
| D7 | Overcast | Cool | Normal | Strong | Yes |
| D8 | Sunny | Mild | High | Weak | No |
| D9 | Sunny | Cool | Normal | Weak | Yes |
| D10 | Rain | Mild | Normal | Weak | Yes |
| D11 | Sunny | Mild | Normal | Strong | Yes |
| D12 | Overcast | Mild | High | Strong | Yes |
| D13 | Overcast | Hot | Normal | Weak | Yes |
| D14 | Rain | Mild | High | Strong | No |


## Decision tree training algorithm

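The key step is choosing which attribute to test at the root of the tree. Once that choice can be made, the rest is straightforward: the training samples are split according to the values of the root attribute, and the same procedure is applied recursively to each subset, producing one subtree per value.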
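The recursion can be sketched in Python on the PlayTennis samples. As an assumption (the text has not fixed a criterion at this point), the root of each subtree is chosen by information gain, the criterion ID3 uses in Mitchell (1997), Chapter 3. `id3`, `gain`, `classify` and the nested-dict tree format are hypothetical choices for this sketch.

```python
import math
from collections import Counter

ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]
ROWS = """Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No"""
DATA = [dict(zip(ATTRS + ["PlayTennis"], r.split())) for r in ROWS.splitlines()]


def entropy(rows):
    counts = Counter(r["PlayTennis"] for r in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())


def gain(rows, a):
    """Information gain of splitting `rows` on attribute `a`."""
    remainder = 0.0
    for v in {r[a] for r in rows}:
        subset = [r for r in rows if r[a] == v]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder


def id3(rows, attrs):
    labels = Counter(r["PlayTennis"] for r in rows)
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]  # leaf: pure subset or no attributes left
    best = max(attrs, key=lambda a: gain(rows, a))  # root attribute of this subtree
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest)
                   for v in sorted({r[best] for r in rows})}}


def classify(tree, x):
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[x[attr]]  # assumes x has a value seen during training
    return tree


TREE = id3(DATA, ATTRS)
print(TREE)
```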

### Finding the best attribute to the root of the tree

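One way to approach this is to ask what an *ideal* attribute would look like and, conversely, what the *worst* attribute would look like. An ideal attribute splits the samples into subsets that each contain only one class; the worst attribute leaves every subset with the same class proportions as the whole set, so testing it tells us nothing about the class.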
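Entropy turns this intuition into a number. The Python sketch below assumes information gain (parent entropy minus the size-weighted entropy of the subsets) as the measure, as in ID3 (Mitchell 1997, Chapter 3). The "ideal" and "worst" splits are hypothetical attributes; the Outlook and Wind groups are read off the PlayTennis table, with labels written as Y/N strings in day order.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, groups):
    """Parent entropy minus the size-weighted entropy of the groups a split produces."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)


labels = "NNYYYNYNYYYYYN"               # PlayTennis D1..D14: 9 Yes, 5 No

ideal = ["YYYYYYYYY", "NNNNN"]          # hypothetical attribute: every branch pure
worst = [labels]                        # hypothetical attribute: one value, separates nothing
outlook = ["NNNYY", "YYYY", "YYNYN"]    # Sunny, Overcast, Rain
wind = ["NYYYNYYY", "NNYYYN"]           # Weak, Strong

for name, groups in [("ideal", ideal), ("worst", worst), ("Outlook", outlook), ("Wind", wind)]:
    print(f"{name:8s} gain = {information_gain(labels, groups):.3f}")
```

The ideal split recovers the full entropy of the set (about 0.94 bits), the worst split gains nothing, and real attributes fall in between: Outlook (about 0.25) separates the classes far better than Wind (about 0.05).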

## References

T. Mitchell (1997): *Machine Learning*, McGraw-Hill, Chapters 2 & 3.