# Simple training example

In [1]:
from kilroyshare.interfaces import KilroyFace
from kilroyshare.post import PostData
from kilroylib.modules import KilroyModule
from kilroylib.data import MemoryCachingDatasetFactory
from kilroylib.training import KilroyTrainer

### Face definition

In [2]:
class MyFace(KilroyFace):
    def __init__(self):
        self.i = 0
        self.posts = {}
    
    def scrap(self, limit=None):
        n = limit or 100
        self.i = n + 1  # IDs of posts made later start after the scraped ones
        for i in range(n):
            yield i, PostData(i % 3 == 0)  # every third post is True

    def post(self, data):
        post_id = self.i
        self.i += 1
        self.posts[post_id] = data
        return post_id

    def score(self, post_id):
        return int(self.posts[post_id].x) * 2 - 1  # True -> +1, False -> -1

When scraping, every third post has x equal to True, so about a third of the posts have x equal to True (34 of the default 100, since the indices 0, 3, ..., 99 are multiples of 3). After fine-tuning, the module should have learned this distribution.

When scoring, every post with x equal to True gets $+1$ and every post with x equal to False gets $-1$. After reinforcement, the module should learn to generate only posts with x equal to True.
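A minimal library-free sketch of the data and scoring above, for checking the numbers without kilroyshare. The helpers scrap_xs and score are hypothetical stand-ins that use plain booleans in place of PostData objects:

```python
# Hypothetical stand-ins: plain booleans replace kilroyshare's PostData objects.
def scrap_xs(limit=None):
    n = limit or 100
    return [i % 3 == 0 for i in range(n)]  # every third post is True


def score(x):
    return int(x) * 2 - 1  # True -> +1, False -> -1


xs = scrap_xs()
true_fraction = sum(xs) / len(xs)  # booleans sum as 0/1
print(true_fraction)               # 0.34: indices 0, 3, ..., 99 give 34 True posts
print(score(True), score(False))   # 1 -1
```

So the target for fine-tuning is exactly $0.34$ with the default limit, and only approximately $1/3$ for other limits.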

### Module definition

In [3]:
import random

class MyModule(KilroyModule):
    def __init__(self):
        self.p = 0.5  # Bernoulli distribution parameter
        self.i = 0
        self.posts = {}
    
    def generate(self):
        # sample x from a Bernoulli distribution with parameter p
        x = random.choices([True, False], weights=[self.p, 1-self.p], k=1)[0]
        post = PostData(x)
        post_id = self.i
        self.posts[post_id] = post
        self.i += 1
        return post_id, post

    def mimic(self, posts):
        # estimate new parameter from posts and step towards it
        p_est = sum(int(post.x) for post in posts) / len(posts)
        self.p = max(min(self.p + 0.001 * (p_est - self.p), 1), 0)
        return self

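    def reinforce(self, scores):
        # scores maps post_id -> score; keep one (x, score) pair per scored post
        # (a dict keyed by x would collapse all posts with the same x into one entry)
        pairs = [(self.posts[post_id].x, score) for post_id, score in scores.items()]
        if not pairs:
            return self  # nothing to learn from
        # average (x - p) * score over the posts and take a clamped step
        diff = sum((int(x) - self.p) * score for x, score in pairs) / len(pairs)
        self.p = max(min(self.p + 0.1 * diff, 1), 0)
        return self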

The module is a simple Bernoulli distribution over x with parameter p.

When mimicking, the module estimates p from the real posts (the fraction with x equal to True) and moves its own p a small step (rate $0.001$) towards that estimate.
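A minimal library-free sketch of the mimic update, using plain booleans in place of PostData; the function name mimic_step and the number of steps are hypothetical. With a fixed batch, every step shrinks the gap to the estimate by a factor of $1 - 0.001$, so the closed form $p_k = p_{\text{est}} + (1 - 0.001)^k (p_0 - p_{\text{est}})$ should match the loop:

```python
def mimic_step(p, xs, lr=0.001):
    """Hypothetical stand-alone version of MyModule.mimic's update rule."""
    p_est = sum(xs) / len(xs)  # fraction of True values (Bernoulli maximum-likelihood estimate)
    return max(min(p + lr * (p_est - p), 1.0), 0.0)  # clamped step towards p_est


# Same data as MyFace.scrap() with its default limit: True at every third index.
xs = [i % 3 == 0 for i in range(100)]

p0, steps = 0.5, 5000
p = p0
for _ in range(steps):
    p = mimic_step(p, xs)

# Closed form for a fixed batch: the gap decays geometrically.
predicted = 0.34 + (1 - 0.001) ** steps * (p0 - 0.34)
print(p, predicted)
```

The small rate makes convergence slow: after 5000 steps p is still slightly above $0.34$, which is why a fine-tuned value such as $0.34001\ldots$ implies many mimic updates.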

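When reinforcing, the module moves p by $0.1$ times the score-weighted average of $x - p$ (clipped to $[0, 1]$), so p grows when posts with x equal to True score well and posts with x equal to False score badly.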
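One way to read the reinforce rule (our interpretation, not stated in the source): if the average is taken over the $N$ scored posts with scores $s_i$, it is the REINFORCE gradient estimate for a Bernoulli policy multiplied by $p(1 - p)$, the inverse of the Bernoulli Fisher information, i.e. a natural-gradient step. With $s(\text{True}) = +1$ and $s(\text{False}) = -1$ the expected update is $2p(1 - p) \ge 0$, so p can only move towards $1$:

$$
\log P(x \mid p) = x \log p + (1 - x)\log(1 - p), \qquad
\frac{\partial}{\partial p}\log P(x \mid p) = \frac{x}{p} - \frac{1 - x}{1 - p} = \frac{x - p}{p(1 - p)}
$$

$$
\hat{g} = \frac{1}{N}\sum_{i=1}^{N} s_i \,\frac{x_i - p}{p(1 - p)}, \qquad
p(1 - p)\,\hat{g} = \frac{1}{N}\sum_{i=1}^{N} (x_i - p)\, s_i
$$

$$
\mathbb{E}\big[(x - p)\,s(x)\big] = p\,(1 - p)\,(+1) + (1 - p)\,(0 - p)\,(-1) = 2p(1 - p)
$$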

### Setup

In [4]:
face = MyFace()
module = MyModule()
dataset_factory = MemoryCachingDatasetFactory()  # the data is tiny, so caching all of it in memory is fine
trainer = KilroyTrainer(face, module, dataset_factory=dataset_factory)

In [5]:
module.p

0.5

Initial parameter value.

### Fine-tuning

In [6]:
trainer = trainer.fine_tune()

In [7]:
module.p

0.34001284216836875

After fine-tuning, p is about $0.34$, matching the fraction of posts with x equal to True in the scraped data (34 of the default 100), as expected.

### Reinforcing

In [8]:
trainer = trainer.run(steps=100, post_rate=0.0001)

In [9]:
module.p

0.9999995369167228

After reinforcing, p is essentially $1$, again as expected.
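A library-free sketch of the reinforcement loop, for checking this behaviour without kilroylib. It is not kilroylib's trainer: the helper name reinforce_step, the batch of 50 posts per step, and the fixed seed are all assumptions. Every scored post contributes a non-negative term ($1 - p$ for True, $p$ for False), so p never decreases:

```python
import random


def reinforce_step(p, pairs, lr=0.1):
    """Hypothetical helper: average (x - p) * score over posts, then take a clamped step."""
    diff = sum((int(x) - p) * s for x, s in pairs) / len(pairs)
    return max(min(p + lr * diff, 1.0), 0.0)


rng = random.Random(0)  # fixed seed so the run is reproducible
p = 0.34                # roughly where fine-tuning left the parameter
history = [p]
for _ in range(100):    # 100 reinforcement steps
    xs = [rng.random() < p for _ in range(50)]  # sample 50 posts from Bernoulli(p)
    pairs = [(x, int(x) * 2 - 1) for x in xs]   # score: True -> +1, False -> -1
    p = reinforce_step(p, pairs)
    history.append(p)

print(p)
```

Each term is at least $\min(p, 1 - p)$, so even in the worst case p passes $0.5$ within five steps and afterwards the gap to $1$ shrinks by at least $10\%$ per step; after 100 steps p is above $0.9999$ regardless of the seed.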