Inspired by Yann LeCun's paper: https://openreview.net/pdf?id=BZ5a1r-kVsf

Components of autonomous intelligence:
- configurator
- short-term memory
- cost
- world model
- perceptor
- actor

Let us implement a simple toy version of this architecture. 

The input is the data that the perceptor receives from the world and translates into an estimate of the current state of the world.

The outputs are the updated world model, cost and action set.

Cost is measured by a scalar "energy". The intrinsic cost depends purely on the current state and/or the future states predicted by the world model, e.g. hunger. We can also train a critic for task-specific costs; it predicts future values of the intrinsic cost. The ultimate goal of the agent is to minimise this cost over the long run.

Short-term memory is used by the actor to access previous states of the world when deciding which actions to take.
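
A minimal sketch of such a memory, assuming a fixed-length buffer of (state, action) pairs is enough for the toy version. The class name `ShortTermMemory` and its methods are hypothetical and not taken from the paper.

```python
from collections import deque


class ShortTermMemory:
    """Hypothetical fixed-length buffer of recent (state, action) pairs."""

    def __init__(self, maxlen=2):
        # deque drops the oldest entry automatically once maxlen is reached
        self._buffer = deque(maxlen=maxlen)

    def remember(self, state, action):
        self._buffer.append((state, action))

    def recall(self):
        # most recent entry first
        return list(reversed(self._buffer))


memory = ShortTermMemory(maxlen=2)
memory.remember((0.1, 0.2), "a0")
memory.remember((0.3, 0.4), "a1")
memory.remember((0.5, 0.6), "a2")
print(memory.recall())  # only the two most recent pairs survive
```

The actor would call `recall()` when choosing an action; the string actions here are placeholders only.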

Configurator is used to configure each component to a specific task.

Mode-1 perception-action episode. The perception module estimates the state of the
world s[0] = Enc(x). The actor directly computes an action, or a short sequence of actions, through
a policy module a[0] = A(s[0]).
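
The Mode-1 episode can be sketched as two function calls. The encoder and policy below are placeholder linear maps with random weights, used only to show the data flow; `enc`, `policy` and all the dimensions are assumptions, not something the paper specifies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: 4-dim observation, 2-dim state, 1-dim action
W_enc = rng.normal(size=(2, 4))
W_pol = rng.normal(size=(1, 2))


def enc(x):
    # perception module: s[0] = Enc(x)
    return np.tanh(W_enc @ x)


def policy(s):
    # policy module: a[0] = A(s[0]), computed directly with no planning
    return np.tanh(W_pol @ s)


x = rng.normal(size=4)  # observation received from the world
s0 = enc(x)
a0 = policy(s0)
print(s0.shape, a0.shape)
```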



Design:
- Ignore configurator for now
- Short-term memory remembers the states and actions at t-1 and t-2
- Cost is intrinsic only for now. It is a function of the current estimated state produced by the perceptor and outputs a scalar "energy": c(state[t])
- Perceptor receives a signal from the "world" and estimates the current state of the world. The "world" is uniformly distributed in two dimensions, money ~U(0,1) and power ~U(0,1), but this is unknown to the agent. The perceptor estimates the current state of the world as p(s[t]) [dim 2], where s is the scalar signal at time t.
- The world model, X, corrects the perceptor's estimate of the current state using the short-term memory AND predicts the state of the world at t+1 given the current optimal action set: X[p, a] -> (state[t], state[t+1])
- Actor computes the set of optimal actions that minimises the estimated cost using gradient-based learning. The actor feeds the actions to the world model to predict future states, which the cost module uses to estimate future energy
- The actions then affect the "world"
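
A rough sketch of the actor / world model / cost loop described in the list. The cost function, the additive dynamics and the effort penalty are all assumptions made for illustration, since the action set, the cost and the effect of actions are still undecided.

```python
import numpy as np


def cost(state):
    # assumed intrinsic cost: energy is low when money and power are both near 1
    return float(np.sum((1.0 - state) ** 2))


def world_model(state, action):
    # assumed dynamics: the action shifts the state additively
    return state + action


def actor(state, steps=100, lr=0.1, effort=0.1):
    # gradient descent on the action to minimise predicted energy plus an effort penalty
    action = np.zeros_like(state)
    for _ in range(steps):
        predicted = world_model(state, action)
        # d/da [ sum((1 - (s + a))^2) + effort * sum(a^2) ]
        grad = -2.0 * (1.0 - predicted) + 2.0 * effort * action
        action -= lr * grad
    return action


rng = np.random.default_rng(0)
state = rng.uniform(0.0, 1.0, size=2)  # perceptor's estimate of (money, power)
action = actor(state)
next_state = world_model(state, action)
print(cost(state), cost(next_state))
```

With these assumptions the actor settles on action = (1 - state) / (1 + effort), so the predicted energy after acting is lower than the current energy.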

TODO: 
- Decide the action set
- Decide the differentiable cost function
- Decide the effect of actions on the world
- Create an Estimator class to estimate cost? Candidates: an entropy-based random forest and a neural net, both built from first principles, including the cross-entropy calculation
- Get some nice charts of the learning taking place
- Can we use some golf data?

Improvements:
- Add configurator and trained critic cost for task-specific learning / actions
- Cost module takes into account potential future states from the world model
- Train using an sklearn pipeline

In [4]:
# the PyPI package "sklearn" is a deprecated placeholder; install scikit-learn directly
%pip install scikit-learn

In [3]:
import sys
print(sys.path)

['/Users/oliver.flynn/learning', '/usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python39.zip', '/usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9', '/usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/lib-dynload', '', '/usr/local/lib/python3.9/site-packages', '/usr/local/lib/python3.9/site-packages/IPython/extensions', '/Users/oliver.flynn/.ipython']


In [17]:
%pip install pandas



In [None]:
# implement self-supervised model regularised by tree distribution assumption
# as per https://web.stanford.edu/class/ee376a/files/kedar_slides.pdf

In [None]:
# implement cross entropy loss function

import numpy as np


def x_entropy_loss(theta, X, y):

    # Calculate H(p, q) for binary logistic regression.
    # Given a set of observations, X = (x_0, x_1, ...) with N rows [shape (N, d)], and labels y in {0, 1},
    # the loss is the average amount of information (in nats, since we use the natural log)
    # needed to encode labels drawn from p [true distribution] using a code built for
    # q [predicted distribution].
    # Here, q(theta) is the predicted distribution that we are
    # learning wrt weights theta. p is taken to be the "observed"/frequency distribution
    # i.e. the labels 1 or 0 assuming logistic/binary regression.
    #
    # Cross entropy is minus the expected value of log(q) under p.
    # p is unknown so we need to estimate by assuming the true dist == freq dist,
    # so we have: -1/N * sum over i of log(q(y_i | x_i)).
    # For logistic regression we model distribution q using a logistic function:
    # q(y=1 | x) = 1 / (1 + exp(-x . theta)).

    theta = np.asarray(theta, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    N = X.shape[0]

    # matrix multiplication of X and theta replaces the nested loops over rows and cols
    z = X @ theta
    q = 1.0 / (1.0 + np.exp(-z))

    # clip to avoid log(0) when a prediction saturates
    eps = 1e-12
    q = np.clip(q, eps, 1.0 - eps)

    sum_sample_x_entropy = np.sum(y * np.log(q) + (1 - y) * np.log(1 - q))

    H = -(1 / N) * sum_sample_x_entropy

    return H
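
Minimising this loss needs its gradient. For the logistic model the gradient wrt theta is X^T (q - y) / N, which gives a plain gradient-descent fit. The sketch below is self-contained; the synthetic data, the learning rate and the helper names (`sigmoid`, `log_loss`, `log_loss_grad`) are assumptions for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_loss(theta, X, y, eps=1e-12):
    # mean binary cross entropy, clipped to avoid log(0)
    q = np.clip(sigmoid(X @ theta), eps, 1.0 - eps)
    return -np.mean(y * np.log(q) + (1 - y) * np.log(1 - q))


def log_loss_grad(theta, X, y):
    # gradient of the mean cross entropy wrt theta: X^T (q - y) / N
    return X.T @ (sigmoid(X @ theta) - y) / X.shape[0]


rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])  # intercept + 2 features
true_theta = np.array([0.5, 2.0, -1.0])  # made-up weights used to generate labels
y = (rng.uniform(size=200) < sigmoid(X @ true_theta)).astype(float)

theta = np.zeros(3)
initial_loss = log_loss(theta, X, y)  # equals ln(2) because every q is 0.5
for _ in range(500):
    theta -= 0.1 * log_loss_grad(theta, X, y)
final_loss = log_loss(theta, X, y)
print(initial_loss, final_loss)
```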

In [20]:
# implement random forest from scratch using cross-entropy log loss from scratch to predict?
# https://towardsdatascience.com/decision-tree-from-scratch-in-python-46e99dfea775

from sklearn import datasets
import pandas as pd

toy_data = datasets.load_iris()
df = pd.DataFrame(data=toy_data.data, columns=toy_data.feature_names)
df.head()

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2
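
The building block of an entropy-based tree is the information gain of a candidate split. A minimal sketch, using a tiny hand-made feature/label pair rather than the real data so it runs without any downloads; the function names and the toy values are assumptions. Swapping in a column of `toy_data.data` and `toy_data.target` should work the same way.

```python
import numpy as np


def entropy(labels):
    # Shannon entropy (in bits) of the empirical class distribution
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def information_gain(feature, labels, threshold):
    # entropy reduction from splitting on feature <= threshold
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


# toy petal-length-like feature with three classes (made-up values)
feature = np.array([1.4, 1.4, 1.3, 4.7, 4.5, 6.0])
labels = np.array([0, 0, 0, 1, 1, 2])

gain = information_gain(feature, labels, threshold=2.5)
print(entropy(labels), gain)
```

A tree builder would evaluate this gain over every feature and candidate threshold and keep the best split at each node.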


In [10]:
help(datasets)


In [None]:
# implement neural net from scratch using cross-entropy log loss from scratch to predict?