This notebook is part of the supplementary material for the books "Online Machine Learning - Eine praxisorientierte Einführung" (https://link.springer.com/book/9783658425043) and "Online Machine Learning - A Practical Guide with Examples in Python" (https://link.springer.com/book/9789819970063).
The contents are open source and published under the "BSD 3-Clause License".
This software is provided "as is" without warranty of any kind, either express or implied, including but not limited to implied warranties of merchantability and fitness for a particular purpose. The author or authors assume no liability for any damages or liability, whether in contract, tort, or otherwise, arising out of or in connection with the software or the use or other dealings with the software.

# Chapter 4 Initial Selection and Subsequent Update of OML Models

In [1]:
from river import datasets
from river import tree

## The dataset `ImageSegments`

* This dataset describes image segments with 18 features ($x$ values) and assigns each segment to one of seven classes:
  * brickface (brick wall),
  * sky,
  * foliage,
  * cement,
  * window,
  * path, and
  * grass.

* Instances were randomly drawn from a database of seven outdoor images.
* The images were segmented by hand to create a classification for each pixel.

In [2]:
dataset = datasets.ImageSegments()

* Attributes and class of the first instance:

In [3]:
x, y = next(iter(dataset))
(x,y)

({'region-centroid-col': 218,
  'region-centroid-row': 178,
  'short-line-density-5': 0.11111111,
  'short-line-density-2': 0.0,
  'vedge-mean': 0.8333326999999999,
  'vegde-sd': 0.54772234,
  'hedge-mean': 1.1111094,
  'hedge-sd': 0.5443307,
  'intensity-mean': 59.629630000000006,
  'rawred-mean': 52.44444300000001,
  'rawblue-mean': 75.22222,
  'rawgreen-mean': 51.22222,
  'exred-mean': -21.555555,
  'exblue-mean': 46.77778,
  'exgreen-mean': -25.222220999999998,
  'value-mean': 75.22222,
  'saturation-mean': 0.31899637,
  'hue-mean': -2.0405545},
 'path')

## Example: Hoeffding Tree for Classification (HTC)

* We create an initial HTC model that has not yet seen any data:

In [4]:
model = tree.HoeffdingTreeClassifier()

* If this untrained model is asked for class probabilities, it returns an empty dictionary:

In [5]:
model.predict_proba_one(x)

{}

* The dictionary is empty because the model has not seen any data yet.
* In particular, it does not yet know which class labels exist.

## Model Training

In [6]:
model.learn_one(x, y)
model.predict_proba_one(x)

{'path': 1.0}
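
The two outputs so far (`{}` before training, `{'path': 1.0}` after one instance) can be reproduced with a much simpler model. The sketch below is a hypothetical stand-in named `CountingClassifier`, not river's Hoeffding tree implementation: it ignores the features and only normalizes the class counts it has seen. It is meant to illustrate why an online classifier has nothing to return before the first label arrives, and why a single label yields probability 1.0.

```python
from collections import Counter


class CountingClassifier:
    """Hypothetical stand-in: predicts from normalized class counts."""

    def __init__(self):
        self.counts = Counter()

    def learn_one(self, x, y):
        # The features x are ignored; only the label is counted.
        self.counts[y] += 1

    def predict_proba_one(self, x):
        total = sum(self.counts.values())
        # No labels seen yet: there is nothing to assign probability to.
        if total == 0:
            return {}
        return {label: n / total for label, n in self.counts.items()}

    def predict_one(self, x):
        proba = self.predict_proba_one(x)
        # Return None rather than guessing when no class is known.
        return max(proba, key=proba.get) if proba else None


clf = CountingClassifier()
x = {"hue-mean": -2.0405545}  # one feature taken from the instance shown above
before = clf.predict_proba_one(x)
clf.learn_one(x, "path")
after = clf.predict_proba_one(x)
print(before, after)  # {} {'path': 1.0}
```

Code that consumes such probability dictionaries should therefore handle the empty case explicitly, as `predict_one` does in this sketch.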

## Further Training

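* Training and prediction on the first 51 instances (the loop stops once the counter exceeds 50). Each instance is predicted first and only then used for training: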

In [7]:
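i = 0
for x, y in dataset:
    # Predict first, then learn from the true label (test-then-train).
    y_pred = model.predict_one(x)
    model.learn_one(x, y)
    print(y_pred)
    i += 1
    # The check runs after the increment, so 51 instances are processed.
    if i > 50:
        break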

path
path
path
path
path
path
path
path
path
path
path
path
path
path
path
path
path
path
path
path
path
path
path
path
path
path
path
path
path
path
grass
window
foliage
brickface
grass
grass
grass
brickface
window
window
grass
brickface
grass
grass
window
window
brickface
path
window
foliage
cement
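
The loop above prints the predictions but does not compare them with the true labels. A minimal sketch of how the same test-then-train loop could also track accuracy is shown below. The helper `progressive_accuracy` and the `MajorityClass` model are hypothetical names introduced for illustration, and the toy stream is made up; none of this is part of river or of the ImageSegments data. The helper only assumes that the model offers `predict_one` and `learn_one` and that the stream yields `(x, y)` pairs.

```python
from collections import Counter
from itertools import islice


class MajorityClass:
    """Hypothetical stand-in model with the predict_one/learn_one interface."""

    def __init__(self):
        self.counts = Counter()

    def predict_one(self, x):
        # most_common(1) is empty until the first label has been learned.
        top = self.counts.most_common(1)
        return top[0][0] if top else None

    def learn_one(self, x, y):
        self.counts[y] += 1


def progressive_accuracy(model, stream, n):
    """Test-then-train on the first n instances; return the share of hits."""
    correct = 0
    seen = 0
    for x, y in islice(stream, n):
        # Predict first, so the label is not used before the model is scored.
        if model.predict_one(x) == y:
            correct += 1
        model.learn_one(x, y)
        seen += 1
    return correct / seen if seen else 0.0


# Made-up toy stream of (features, label) pairs, not the ImageSegments data.
toy_stream = [({"f": 0}, "path"), ({"f": 1}, "path"),
              ({"f": 2}, "grass"), ({"f": 3}, "path")]
acc = progressive_accuracy(MajorityClass(), toy_stream, n=4)
print(acc)  # 0.5
```

Because the helper relies only on the two method names, it should in principle also accept the notebook's `model` and `dataset` objects, for example `progressive_accuracy(model, dataset, 51)`.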


## Further Information

* More: [river: Multi-class classification](https://riverml.xyz/0.13.0/introduction/getting-started/multiclass-classification/)