# SingularityNET Simulation Tutorial

This is a simulation of the SingularityNET itself.
Its purpose is to compare solution approaches to problems in the network, as well as to make solutions out of AI services, which is the function of the SingularityNET. 
It facilitates algorithms that gain experience in tailoring solutions to individual needs, through agent based co-evolution, and helps users to test these algorithms before using them in the real world SingularityNET.
It helps AIs to build AIs.
This notebook offers an initial look at the fundamental problem of how to construct AI software solutions automatically out of AI programs submitted to the SingularityNET.

The simulation tests how different solutions affect SingularityNET values, such as delivering maximum utility to people seeking AI software at the lowest cost, giving everyone who makes high quality software a chance to sell, distributing credit where credit is due, and complexification to facilitate a singularity.
An important value of the SingularityNET is community participation in its construction. We therefore offer this simulation as competitive arena software, to attract the community to play the "Singularity" game in the spirit of democratic meritocracy.

This simulation is implemented in the [Mesa](https://github.com/projectmesa/mesa) agent based software in a style that mimics [OpenAI Gym](https://gym.openai.com/) reinforcement learning competition software, except that it's better tailored to multiple agents.
Mesa handles the network of multiple agents, their rules of interaction, and their schedule and cooperative setups.
Mesa simulation agents read and write to a network-mediated blackboard.
The blackboard displays human requests for a particular algorithm type that can pass a test (that has a gradient) on a particular dataset for a particular price range in AGI tokens.
The algorithm type, test, and dataset come from an ontology declared in a JSON configuration file, which describes pre-registered software.
In response, agents can place on the blackboard offers to construct, sell, or buy software from other agents. 
Construction can include finding appropriate types from the human-designed ontology and parameter settings in combinations that fit particular use cases. 
However, it can also include inventing new algorithms that trade an input list for an output list (in [OfferNet](https://singnet.github.io/offernet/public/offernet-documentation/index.html) fashion) that are not listed in the human-designed ontology.
Agents may also display a sign in the form of a vector of floats, and may express their preferences for agents whose sign is closest to an ideal sign.
The sign is used by the simulation to rank agents that already have matching trade plans.

The simulation has a general, flexible knowledge representation that is meant to accommodate the needs of different machine learning and reinforcement learning algorithms.
Different modular machine/reinforcement learning algorithms may compete on the same blackboard in contests to see which algorithms best fill user, developer, and AI growth needs.
Alternatively, the simulation could focus on a single algorithm, measuring its effects in isolation, in a more cooperative scenario.
The different algorithms implement agent integration schemes with their own approaches.
For example, it is up to each machine learning/reinforcement learning algorithm to assign meanings to the float vector signs that they read and display on the blackboard.
Different possible interpretations of the sign's float dimensions include reputation scores, the identity of an agent, the information needed for OfferNet trades, an emergent agent language or an emergent software ontology that augments the human-designed one.
Agents may also require that software pass certain tests on data, seen and unseen, before a transaction is accepted; they can list any number of tests and datasets.
They have the choice to indicate the type of software they are buying, selling, or constructing with the human ontology, which indicates an input and output list for them.

Buying agents (and humans) can use any level of generality of the ontology in their specification of what software they want.
Alternatively, they may offer an input list for an output list and allow the sign field to represent the ontological type.
They may pay solely through the input and output lists, without using AGI tokens as currency, in order to implement an OfferNet type of scenario. Even when AGI tokens are available, agents may not maximize them, since doing so may not be part of their utility or fitness function.
During competition, different meanings of signs could either isolate agents into solution groups having the same algorithm, or algorithm groups could learn other groups' meanings, creating a competition over the meaning of signs.

Once all offers are on the blackboard, the blackboard makes the matches, ranked by the cosine distance between signs, among those offers that have overlapping price ranges and that meet the agents' stated test threshold criteria.
The environment creates the software, and distributes funds. 
It notifies the agent of payment through an OpenAI Gym-like environment reward signal, detailing the particular float vectors that were rewarded, and giving the agent a chance to view the blackboard. 
However it is up to the particular reinforcement learning algorithm to choose what aspects of the environment's reward, observation, and of itself to include in its personal reward.
Mesa keeps track of agent and human utility satisfaction and prices, and can be made to measure SingularityNET values such as adequate exploration of unknown agent capabilities. 

This general design promotes SingularityNET values in several ways. 
The simulation allows credit to be assigned through the market price signal. 
It can serve as the baseline for comparison with other assignment of credit algorithms, one that holds all else the same.
In this simulation, agents can also gain through more OfferNet-type utility based trades (for example, use of the input list for training), or through more reputation utility based trades (for example, volunteering in order to increase one's reputation and, ultimately, one's market value in AGI tokens).
Correct assignment of credit is needed to promote quality software that satisfies the user, while at the same time assures that developers have a fair shot. 
It is expected that agents that learn to do jobs of uniquely higher quality in uniquely necessary areas will be able to charge higher prices. 
Since agents are stateful and able to gain experience from a variety of different problems, it is expected that one way for an agent to gain a higher price would be to reuse and accumulate knowledge in one area, that is, to specialize. 
Such agents would contain parameters and functions suited to particular types of problems. 
Importantly, these problems need not be expressed in the human-designed ontology: they may involve whatever agent roles are needed in practice to get the job done. 
The emergence of automated new constructions and new ontological categories is prerequisite to AIs which can construct themselves and thus supports a singularity. 

As agents learn how to buy and sell the services of other agents, particularly higher quality but cheaper agents that they have learned to recognize through sign and test, it is expected that curating agents will emerge. 
Curation agents that can both recommend software successfully and give new developers a chance to become known satisfy both user and developer. 
The "no free lunch" theorem of machine learning and optimization (the idea that no one solution fits all) makes curation agents essential to the SingularityNET.

In creating this environment we have had to invent a few things that are also useful in other contexts. 
The effort to create a gradient of Python programs for machine learning programs to construct useful solutions, through representation and distributed AI, is the first contribution. 
A second contribution is an agent economy that solves problems.
Finally, we designed a convenient way of storing the partial products of AI solutions so the same combination, whether partial or complete, never has to be computed twice, even in subsequent runs of the simulation.

All of this may seem confusing. 
Is it a simulation? 
Or does it build products? 
Well, both, as we will explain in scaffolded fashion in this notebook. 
In part one, we introduce the knowledge representation of the communicating agents on the blackboard, as well as the utilities that make combinatorial AI practical. 
In part two, we will demonstrate an initial use of the simulation with a particular reinforcement learning algorithm to construct the AI software: SISTER (Symbolic Interactionist Simulation of Trade and Emergent Roles). 

SISTER is an evolutionary algorithm that implements agent-based coevolution. 
It uses the sign areas of the blackboard communications to learn specialized, non-preprogrammed roles for the agents, needed to solve problems. 
The utility/fitness/objective function of each individual agent is measured in AGI tokens, making use of Adam Smith's invisible hand rather than group selection coevolution methods that base utility on the good of the whole.
Assignment of credit through the market process is inherent in this method. 
As her name suggests, emergent roles in SISTER are macro institutions from the micro-macro integration processes of micro-economics and micro-sociological symbolic interactionism. 
Thus SISTER models growth in AI on growth in economy and society. 

This appreciation for the life of the market does not imply that the author believes nature should be unfettered; rather, it implies that we must first understand both nature and natural illness before we can apply technology to the treatment of today's policy challenges.

## Part 1. Representation for Evolvability

This simulation uses a representation of Python that addresses problems of wastefulness in combinatorial search techniques such as evolutionary algorithms, while at the same time making it easier to evolve AI solutions composed of existing Python programs. 

Since Python is not a tree based language, it is more difficult to evolve than tree based, functional languages such as Scala, and especially those that can easily use their program representations as data, such as Scheme. 
However, since it is the most used language in data science, we want to be able to use building blocks from existing Python programs. 
Many data science programs, such as TensorFlow, are arranged in such building blocks. 
Python has a few functional programming representations, such as lambdas for unnamed functions, but it does not offer currying, so we make use of an external implementation of it.

The pickling of bound curries in this preliminary implementation does not make use of introspection or marshalling, which would be important in making curries that were mobile across machines. 
Since pickles can be malicious, it is recommended that security issues be handled before this prototype simulation is actually used in competitions. 
However, when used for a simulation on a single machine, these curried pickles are saved to disk and kept track of to ensure that nothing need be computed more than once. 
Once security issues are resolved, the curry representation itself is well suited to dividing a problem into parts across multiple machines for high performance computing, and serialization itself is well suited to checkpoint storage of computationally intense functions, as many machine learning functions are.
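The idea of saving the products of bound functions so that nothing is computed twice can be sketched as follows. This is a minimal illustration, not the project's implementation: `functools.partial` stands in for the external currying library, and `cached_call`, `scale`, and the cache key `scale_3factor` are hypothetical names. The sketch pickles only results, which sidesteps the question of pickling the functions themselves.

```python
import functools
import os
import pickle
import tempfile


def cached_call(cache_dir, name, func, *args):
    """Return (func(*args), was_cached), loading the result from disk if `name` exists."""
    path = os.path.join(cache_dir, name + ".pkl")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f), True   # cache hit: nothing is recomputed
    result = func(*args)
    with open(path, "wb") as f:
        pickle.dump(result, f)            # checkpoint the result for later runs
    return result, False


def scale(factor, values):
    """Toy stand-in for an expensive step, such as building a vector space."""
    return [factor * v for v in values]


# Binding one parameter yields a new, more specific function.
scale_by_3 = functools.partial(scale, 3)

cache_dir = tempfile.mkdtemp()
first, hit_first = cached_call(cache_dir, "scale_3factor", scale_by_3, [1, 2, 3])
second, hit_second = cached_call(cache_dir, "scale_3factor", scale_by_3, [1, 2, 3])
```

The second call reads the pickle written by the first, so the bound function runs only once. Because the cache lives on disk, the same would hold across separate runs that share `cache_dir`.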

Tree-based genetic programming, the standard in evolving computer programs, has difficulty converging in ways that don't grow too quickly.
Not being able to converge is problematic in agents that base their interactions on the equilibria of economics: for these agents, convergence represents a solution, a dynamic compromise, and a software institution "fuzzy rule". 
SISTER, for example, depends upon the compromise of convergence between agents to develop the institutions of role and role relations that are the solutions to AI integration problems. 

Trees offer the ability for different modifiers to refer to the same entity without having to rename it, a great advantage in genetic programming (GP). 
However, GP programs have difficulty in converging, because growing trees often involve displacing insertions, with small changes in genotype resulting in large changes in phenotype. 
That is why we combine genetic programming with agents that can self-organize into subroutines, so that we take advantage of referring to the same entity as trees do, but so that multiple branches of a tree (forks) are implemented with multiple agents. 
In our curried representation, a position corresponds to itself in the next generation, facilitating convergence. 
However, chromosome size can change as needed for growth, not by insertion but by using markers: in genetic terminology, a stop codon (marker) moves through introns, lengthening the chromosome while preserving position.
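A small sketch may make the role of the stop codon concrete. It assumes, purely for illustration, that any float at or above a hypothetical `STOP_THRESHOLD` acts as a stop codon; the simulation's actual encoding may differ. Genes after the first stop codon are unexpressed introns, so changing the stop codon lengthens the expressed program without shifting any position.

```python
STOP_THRESHOLD = 0.9  # assumption: floats at or above this value act as a stop codon


def expressed(chromosome):
    """Return the genes before the first stop codon; later genes are unexpressed introns."""
    genes = []
    for gene in chromosome:
        if gene >= STOP_THRESHOLD:
            break
        genes.append(gene)
    return genes


parent = [0.2, 0.5, 0.95, 0.7, 0.1, 0.97]

# Mutate only the stop codon at position 2; every other position is untouched.
child = list(parent)
child[2] = 0.3

parent_genes = expressed(parent)  # [0.2, 0.5]
child_genes = expressed(child)    # [0.2, 0.5, 0.3, 0.7, 0.1]
```

The child expresses more genes than the parent, yet the genes the two share sit at the same indices, which is the property that helps convergence.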

The hierarchical tree structure of the curries also gives gradient to machine learning programs by defining what is a distant/close change, becoming a kind of [Gray coding](https://en.wikipedia.org/wiki/Gray_code) that makes changes that are close in phenotype, close in genotype, as well as changes that are far in phenotype, far in genotype.
The representation using an ontology with markers to give gradient is a contribution of this project that allows many more functions to be represented than in standard genetic programming.

The knowledge representation of the agent communications is a vector of floats sent to the blackboard environment as a message, corresponding to an OpenAI Gym action. 
A floating point representation was chosen because of its generality: it can be used directly, or converted for use, by any machine learning algorithm. 
It also makes a more exacting search of the float parameter space easy to add at a later point. 
It uses the ontology representation, and likewise is designed for evolvability by facilitating convergence and gradient, needed by many machine learning algorithms. 
For example, the SISTER algorithm uses the float vector of agent communications as the chromosome inside of an individual [CMA-ES](https://en.wikipedia.org/wiki/CMA-ES) algorithm within each agent. 

### Configuration file
The configuration file contains parameters, the initial agent configuration of messages on the blackboard, and the ontology.  

First we show a blackboard message: a human request to buy a clusterer. 
The message shows the price range (between low and high) the user is willing to pay, the tests the clusterer must pass, and the performance threshold (we use a short data file for this example to make a quick demo, so we have turned off the threshold). 

It also shows a vector of floats as a sign to rank potential selling agents, based on the closeness of their displayed sign to this vector of floats, given that they meet all of the other conditions of the trade.
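To make the trade conditions concrete, here is a minimal sketch of how a buy plan and a sell plan might be checked for compatibility before any ranking by sign. The function names are hypothetical, and the rule that an item ending in `_stop` matches any more specific item with the same prefix is an assumption about how generality is handled. The values mirror the human's buy offer and a clusterer's sell offer from the example configuration.

```python
def prices_overlap(buy, sell):
    """True if the buyer's and seller's [low, high] ranges share at least one price."""
    return max(buy["low"], sell["low"]) <= min(buy["high"], sell["high"])


def item_matches(wanted, offered):
    """Assumption: an item ending in '_stop' matches any item sharing its prefix."""
    if wanted.endswith("_stop"):
        return offered.startswith(wanted[: -len("stop")])
    return wanted == offered


def plans_match(buy, sell):
    """Check trade direction, item compatibility, and price overlap."""
    return (
        buy["type"] == "buy"
        and sell["type"] == "sell"
        and item_matches(buy["item"], sell["item"])
        and prices_overlap(buy, sell)
    )


buy_plan = {"type": "buy", "item": "clusterer_stop", "low": 0.0, "high": 0.8}
sell_plan = {
    "type": "sell",
    "item": "clusterer_sklearn_kmeans_20clusters",
    "low": 0.7,
    "high": 0.99,
}
too_expensive = dict(sell_plan, low=0.9)  # seller's minimum exceeds the buyer's maximum

match_ok = plans_match(buy_plan, sell_plan)
match_fail = plans_match(buy_plan, too_expensive)
```

This sketch covers only the filtering step. Test thresholds and the ranking of the remaining candidates by sign are left out.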

To start, we import the configuration file and show the first entry in the blackboard:

In [6]:
import json
with open('config.json') as json_file:  
    config = json.load(json_file)
    print(json.dumps(config['blackboard'][0], indent=2))

{
  "type": "Human",
  "label": "Cluster Seeking Human",
  "sign": [
    0.45,
    0.23,
    0.94,
    0.24,
    0.68,
    0.29,
    0.95,
    0.47
  ],
  "trades": [
    {
      "type": "buy",
      "sign": [
        0.83,
        0.59,
        0.35,
        0.7,
        0.13,
        0.93,
        0.35,
        0.12
      ],
      "item": "clusterer_stop",
      "low": 0.0,
      "high": 0.8,
      "tests": [
        {
          "test": "test_clusterer_silhouette",
          "data": "data_freetext_editorial",
          "threshold": -0.99,
          "hidden": false
        }
      ]
    }
  ]
}


This offer to buy matches best with the following offer to sell, also in the blackboard:

In [2]:
print(json.dumps(config['blackboard'][1], indent=2))

{
  "type": "SISTER",
  "label": "Clusterer that purchases vector space Agent 1",
  "sign": [
    0.86,
    0.67,
    0.3,
    0.73,
    0.1,
    0.96,
    0.29,
    0.19
  ],
  "trades": [
    {
      "type": "sell",
      "sign": [
        0.45,
        0.38,
        0.96,
        0.38,
        0.64,
        0.96,
        0.74,
        0.57
      ],
      "item": "clusterer_sklearn_kmeans_20clusters",
      "low": 0.7,
      "high": 0.99,
      "tests": [
        {
          "test": "stop_clusterer_silhouette",
          "data": "data_freetext_editorial",
          "threshold": 0.4,
          "hidden": false
        }
      ]
    },
    {
      "type": "buy",
      "sign": [
        0.45,
        0.89,
        0.85,
        0.3,
        0.59,
        0.45,
        0.58,
        0.38
      ],
      "item": "vectorSpace_stop",
      "low": 0.45,
      "high": 0.99,
      "tests": [
        {
          "test": "test_stop_silhouette",
          "data": "data_freetext_editorial",
        

The agent above has parameterized the clusterer and buys a vector space, so that it can sell the finished clusterer to the human cluster seeker.
It has matching trade plans with both the human and the vector space seller, which is shown below.
The vector space seller has a longer program, as we might see in an agent with a lot of experience.
Many of the individual programs would work on their own, but the addition of more makes them work better, giving the offer gradient.

In [3]:
print(json.dumps(config['blackboard'][2], indent=2))

{
  "type": "SISTER",
  "label": "NLP pipeline vector specialist, Agent 2",
  "sign": [
    0.42,
    0.99,
    0.75,
    0.31,
    0.55,
    0.48,
    0.53,
    0.33
  ],
  "trades": [
    {
      "type": "sell",
      "sign": [
        0.45,
        0.89,
        0.85,
        0.3,
        0.59,
        0.45,
        0.58,
        0.38
      ],
      "item": "vectorSpace_gensim_doc2vec_200size_1000iterations_5minFreq",
      "low": 0.0,
      "high": 0.5,
      "tests": [
        {
          "test": "test_stop_silhouette",
          "data": "data_freetext_editorial",
          "threshold": 0.77,
          "hidden": false
        }
      ]
    },
    {
      "type": "construct",
      "sign": [
        0.45,
        0.59,
        0.45,
        0.35,
        0.64,
        0.67,
        0.28,
        0.75
      ],
      "item": "preprocessor_freetext_tag",
      "low": 0.0,
      "high": 0.4,
      "tests": [
        {
          "test": "test_clusterer_stop",
          "data": "data_fre

The simulation parameters in the config file are used to interpret a vector of floats, which is what the machine learning agents that are not stubbed put on the blackboard.
This scenario has five such agents.

In [4]:
print(json.dumps(config['parameters'], indent=2))

{
  "label": "Cluster Scenario",
  "output_path": "competing_clusterers/",
  "sign_size": 8,
  "num_trade_plans": 10,
  "item_size": 8,
  "num_tests": 5,
  "min_token_price": 1,
  "max_token_price": 100,
  "max_iterations": 10,
  "seed": 5,
  "agent_parameters": {
    "SISTER": {
      "num_chromosomes": 100,
      "num_chromosomes_kept": 50
    },
    "Human": {}
  },
  "chance_of_stop_codon": 0.1,
  "iterative_convergence threshold": 100,
  "random_agents": {
    "SISTER": 5
  }
}


As we have seen in the examples above, the blackboard knowledge representation for agent communications, a vector of floats from zero to one, is composed as follows:

sign_to_display \[float \* sign_size\]

num_trade_plans \* \(

type \[1 float: construct, buy, sell, or stop\]

item \[float \* item_size: interpretation comes from the ontology\] (uses stop codons, to generalize)

tests \[num_tests \* \(float \* item_size for the test program, float \* item_size for the data to test, 1 float for threshold, 1 float for the boolean is_hidden\)\] (uses stop codons)

low \(float: lower price, interpreted with min_token_price and max_token_price\)

high \(float: higher price, interpreted with min_token_price and max_token_price\)

sign_to_seek \[float \* sign_size\]

\)
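The layout above implies a fixed message length, which can be computed from the configuration parameters. The following sketch counts only the fields listed in the layout; the function name is hypothetical. The implementation may reserve additional floats (the translated messages later in this notebook contain extra fields such as `stophere`), so the value returned by the simulation's own `vector_size()` may differ.

```python
def message_vector_size(sign_size, num_trade_plans, item_size, num_tests):
    """Count the floats in one message, following the documented layout only."""
    # Each test: test program item + data item + threshold + is_hidden flag.
    floats_per_test = item_size + item_size + 1 + 1
    # Each trade plan: type + item + tests + low + high + sign_to_seek.
    floats_per_plan = 1 + item_size + num_tests * floats_per_test + 2 + sign_size
    # The message starts with the sign to display, followed by the trade plans.
    return sign_size + num_trade_plans * floats_per_plan


# Parameter values from the "Cluster Scenario" configuration shown above.
size = message_vector_size(sign_size=8, num_trade_plans=10, item_size=8, num_tests=5)
print(size)  # 1098
```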

The sign to display on the blackboard is a vector of floats (8 in the examples above), which is compared to the sign-to-seek field of someone seeking out the agent for an offer.
The cosine distance ranks agents according to how closely they resemble the ideal agent, and can be used in a variety of algorithms.
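As an illustration of this ranking, the sketch below orders candidate agents by the cosine distance between a buyer's sign-to-seek and each agent's displayed sign. The helper names are hypothetical, and the signs are copied from the example blackboard entries above (the human's buy block and the displayed signs of Agent 1 and Agent 2).

```python
import math


def cosine_distance(a, b):
    """1 minus the cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_sign(sign_to_seek, displayed_signs):
    """Return agent labels sorted from closest to farthest displayed sign."""
    return sorted(
        displayed_signs,
        key=lambda label: cosine_distance(sign_to_seek, displayed_signs[label]),
    )


sign_to_seek = [0.83, 0.59, 0.35, 0.7, 0.13, 0.93, 0.35, 0.12]
displayed_signs = {
    "Agent 1": [0.86, 0.67, 0.3, 0.73, 0.1, 0.96, 0.29, 0.19],
    "Agent 2": [0.42, 0.99, 0.75, 0.31, 0.55, 0.48, 0.53, 0.33],
}

ranking = rank_by_sign(sign_to_seek, displayed_signs)
```

Agent 1's displayed sign is nearly parallel to the sign the human seeks, so it ranks first.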

Each block represents a different trade offer or construction notice to put on the blackboard.
An agent can communicate up to `num_trade_plans` blocks, but the number of actual communications is controlled by the stop codon.
An agent may indicate an item to sell, buy or construct.  

The human-designed ontology includes input and output lists in \*args, \*\*kwargs Python syntax.
Input and output lists are important for determining the arity for assignment in the genetic programming [GEP (Karva)](https://www.gene-expression-programming.com/Tutorial002.asp) representation, and also for calculating the input and output for self-organized functions made by the agents in the course of their transactions.
Although it is not implemented now, a check could be made for input-output compatibility (now they just error out).
For example, the agents could conceivably invent a new clusterer that wasn't one of the listed ones, in which case they may indicate a clusterer using the human-designed general category with a stop codon immediately following, and then have the input and output list calculated from the Karva ordering.
Because the generated input and output list is used with software references, it only needs the \*args syntax part of the ontology.

The tests are optional to the agent and used in a buy block to require that software performs above a threshold before it is accepted, with or without revealing the test and data to pass (so that it can't be gamed or trained on). 
At the moment, hidden price ranges and hidden tests are not yet implemented.
In the future, a test listed in a construct or sell block will indicate that the agent's item passed that test before it was put on the board; marking such a test as hidden will serve as a note to the agent itself to pass the test before posting the item.
Agents do not need such tests in that they could rely on reputation or market price alone to incentivize themselves to quality, but such tests are good points of quality control, and necessary when dealing directly with human beings in establishing criteria for transaction validation. 

The messages shown above come from initialized, stubbed agents, and give an example of a successful set of trade plans.
There can also be randomized agents, whose trade plans are generated from a random vector of floats, which can be processed by algorithms like SISTER/CMA-ES.
We show the translation process below.

Now we load the simulation, and run it through several trades (we're not showing the output of the simulation run here, for space reasons).

In [None]:
from simulation.SnetSim import SnetSim

# Load the study and run the simulation
snetsim = SnetSim(study_path='onehumantwobots.json')
snetsim.go()

In [4]:
agent = snetsim.schedule.agents[0]
print('The number of floats in a message vector is: '+str(agent.vector_size()))

The number of floats in a message vector is: 1148


For illustration purposes, we generate a random vector of floats and send it to the method that translates it into a blackboard message.
If you re-run (`SHIFT+ENTER`) the cell many times, you will see many examples.

In [11]:
import random
floatvec = [random.uniform(0, 1) for _ in range(agent.vector_size())]

message = agent.float_vec_to_trade_plan(floatvec)
print(json.dumps(message, indent=2))
print("comes from this vector:")
print(floatvec)

[
  {
    "type": "SISTER",
    "label": "SISTER Agent 0",
    "sign": [
      0.7021771749569637,
      0.4044440785819341,
      0.6448807059916245,
      0.9652423438582776,
      0.9559936973714738,
      0.9175402970091974,
      0.2717683069004847,
      0.7345675908788633
    ],
    "trades": [
      {
        "type": "construct",
        "sign": [
          0.722711866179287,
          0.23713232575456278,
          0.6893817453395683,
          0.8201122874145773,
          0.19904756199951956,
          0.5871142252014819,
          0.5547814568281402,
          0.9593870083694316
        ],
        "item": "vectorSpace_gensim_doc2vec_100size_20iterations_stop",
        "midpoint": 20.89432741268966,
        "range": 21.832687208696864,
        "tests": [
          {
            "stophere": true,
            "test": "test_vectorSpace",
            "data": "data_freetext_short",
            "threshold": 0.6071787751187272,
            "hidden": true
          },
          {
  

## The Human-Designed Ontology

The human-designed ontology is designated in the config file, a JSON file. 
We say "human-designed" because when agents buy and sell to each other, their constructions from other programs are themselves programs. 
The official SingularityNET ontology, or API of APIs, would contain all of the software entered into the net, but this possibly smaller representation is used solely for evolution, which may read from the official ontology and only use a subset of the available programs. 

Different instances of the ontology used in evolution may be created for individual problems. This allows the functions made available to a solution to be reduced to those whose likelihood of appearing in that solution exceeds a threshold, allows those functions to be weighted by that likelihood, and allows parameter values that should be explored together to be indicated. 
All these statistical relations between functions and particular solutions can be learned from current solutions in the open source, as in the [Microsoft DeepCoder project](https://openreview.net/pdf?id=ByldLrqlx).  


A small subsection of ontology follows.
Its hierarchical structure is used to interpret the meaning of ontology items, explained below.

In [8]:
import json
with open('config.json') as json_file:  
    config = json.load(json_file)
    print(json.dumps(config['ontology']['clusterer']['nltk'], indent=2))
   # for type in config['ontology']:
        #print('Name: ' + type['name'])

{
  "_args": [
    {
      "type": "numpy.ndarray",
      "dtype": "float32"
    }
  ],
  "_weight": 0.3,
  "kmeans": {
    "_weight": 0.3,
    "_comment": "clusterer_nltk_kmeans",
    "_kwargs": {
      "n_clusters": {
        "type": "int"
      }
    },
    "_args": [
      {
        "type": "numpy.ndarray",
        "dtype": "float32"
      }
    ],
    "_return": [
      {
        "type": "numpy.ndarray",
        "dtype": "int32"
      }
    ],
    "5clusters": {
      "_args": [
        {
          "type": "numpy.ndarray",
          "dtype": "float32"
        }
      ],
      "_weight": 0.3,
      "_comment": "clusterer_nltk_kmeans_5clusters",
      "_kwarg_vals": {
        "n_clusters": 5
      },
      "_kwargs": {}
    },
    "10clusters": {
      "_args": [
        {
          "type": "numpy.ndarray",
          "dtype": "float32"
        }
      ],
      "_weight": 0.3,
      "_comment": "clusterer_nltk_kmeans_10clusters",
      "_kwarg_vals": {
        "n_clusters": 10
      

The representation in the ontology follows a consistent hierarchy from root to branch:
1. function type
2. data type or brand
3. algorithm
4. parameter1
5. parameter2
6. etc.

As consistency is important for evolvability, care is taken in creating the JSON file to list parameters in a consistent order, for example the number of clusters in clustering algorithms might all be listed first.
The siblings within a level are also listed in a meaningful order.
For example, all clustering algorithms should be sampled at 10 clusters, 30 clusters and 50 clusters.
Some sample paths from root to branch in the example ontology are:
```
test -> clusterer -> silhouette
data -> freetext -> editorial
vectorSpace -> gensim -> doc2vec -> 50size -> 200iterations -> 2minFrq
```
`50size`, `200iterations`, and `2minFrq` are three parameter values that our ontology designates should be tested together in gensim's doc2vec algorithm, a vectorSpace algorithm.
Admittedly there are many more possible combinations of parameters that can be explored than can be explicated individually in a JSON file, but these are the ones that are worth saving to disk as checkpoints.
Algorithms can still be used to zero-in on exact float values and parameter relations, but we would not save all combinations of these parameter values to disk.
Because we are dealing with curried functions, a function with one of the parameters bound is itself a function.
From root to branch we go from more general to more and more specific functions, until all values are bound and the output of the function only differs if it is stochastic.
So as we go from left to right, we go from broader to narrower possibilities.  
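The progression from general to specific functions can be illustrated with a small currying helper. This is a sketch only: `curry` here is a hand-written stand-in, not the external implementation the project uses, and `doc2vec_description` is a hypothetical placeholder that returns an ontology-style name instead of training a model.

```python
import inspect


def curry(func):
    """Turn func(a, b, c) into a function that accepts its arguments one at a time."""
    arity = len(inspect.signature(func).parameters)

    def curried(*bound):
        if len(bound) >= arity:
            return func(*bound)  # all values bound: run the function
        return lambda *more: curried(*bound, *more)

    return curried


def doc2vec_description(size, iterations, min_freq):
    """Hypothetical stand-in for building a vector space; returns an ontology-style name."""
    return f"vectorSpace_gensim_doc2vec_{size}size_{iterations}iterations_{min_freq}minFreq"


doc2vec = curry(doc2vec_description)  # most general: nothing bound
with_size = doc2vec(50)               # one level more specific
with_iterations = with_size(200)      # narrower still
fully_bound = with_iterations(2)      # every parameter bound, so the result is computed
```

Each intermediate value (`with_size`, `with_iterations`) is itself a function, which is what lets a partially specified ontology path stand for a family of programs.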

Internal variables, which do not constitute a next level of the tree, start with an underscore and inherit values from parent nodes, if they are not specified in a child node.
The internal variables at each level include those we would expect in an API: the types of the input and output, and keyword and positional arguments.
In the ontology file, the input values that are not bound are the ones expected from other agents.
If two or more inputs are not bound, then the input of two or more agents is required.
Although more than one output is allowed in Python, this prototype curried representation doesn't allow that.
The `_weight` internal variable is normalized with its siblings, and represents the probability that a sibling node occurs in a solution, given the parent node.
It can be filled in with data from techniques that use the likelihood of function use given the problem such as Microsoft's DeepCoder.
Right now, however, the likelihood of any child is the same.  
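The inheritance of internal variables can be sketched as a walk from the root to a node, with each child's underscore-prefixed values overriding those collected from its ancestors. The function name `resolve_internal` is hypothetical, and the dictionary is an abbreviated excerpt modeled on the ontology subsection shown above.

```python
def resolve_internal(ontology, path):
    """Collect underscore-prefixed variables along `path`; children override parents."""
    resolved = {}
    node = ontology
    for key in path:
        node = node[key]
        for name, value in node.items():
            if name.startswith("_"):
                resolved[name] = value  # a child's value replaces the inherited one
    return resolved


excerpt = {
    "clusterer": {
        "nltk": {
            "_args": [{"type": "numpy.ndarray", "dtype": "float32"}],
            "_weight": 0.3,
            "kmeans": {
                "_return": [{"type": "numpy.ndarray", "dtype": "int32"}],
                "_kwargs": {"n_clusters": {"type": "int"}},
                "5clusters": {"_kwarg_vals": {"n_clusters": 5}, "_kwargs": {}},
            },
        }
    }
}

leaf = resolve_internal(excerpt, ["clusterer", "nltk", "kmeans", "5clusters"])
```

Here `5clusters` inherits `_args` from `nltk` and `_return` from `kmeans`, while its own empty `_kwargs` replaces the parent's, reflecting that `n_clusters` is now bound.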

The hierarchical design of functions gives gradient to the vector of floats representation of an item (to buy, sell, or construct) in the agent communication to the blackboard.
Each float in the float vector representation of an item represents a level in the hierarchy, with the values of floats to the left determining the meanings of floats to the right.
The first float in this hierarchy represents the type of a node: whether it's a test, data, preprocessor, vectorSpace or clusterer.
In this sample ontology all node types are equally weighted, so each has a 20% chance of being picked.
For example, if the float in position 1 fell in the range for clusterers (0.8 and above), then floats greater than 0.5 in position 2 might indicate the brand nltk, since there are only two brands.
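The following sketch shows one way such a hierarchical interpretation could work. It is an assumption-laden illustration, not the simulation's decoder: it reserves the top `stop_chance` slice of each float's range for the stop codon and divides the rest among the child nodes by normalized `_weight`. The toy ontology and the function name `decode_item` are hypothetical.

```python
def decode_item(floats, ontology, stop_chance=0.1):
    """Walk the ontology tree, one float per level, until a stop codon or a leaf."""
    path = []
    node = ontology
    for value in floats:
        # Child nodes are the keys that do not start with an underscore.
        children = [key for key in node if not key.startswith("_")]
        if not children:
            break  # reached a leaf: remaining floats are ignored
        if value >= 1.0 - stop_chance:
            path.append("stop")  # assumption: the top slice encodes the stop codon
            break
        weights = [node[key].get("_weight", 1.0) for key in children]
        # Rescale the float onto the cumulative weights of the children.
        scaled = value / (1.0 - stop_chance) * sum(weights)
        cumulative = 0.0
        for child, weight in zip(children, weights):
            cumulative += weight
            if scaled < cumulative:
                break
        path.append(child)
        node = node[child]
    return "_".join(path)


toy_ontology = {
    "data": {"_weight": 1.0, "freetext": {"_weight": 1.0}},
    "clusterer": {
        "_weight": 1.0,
        "nltk": {"_weight": 0.3, "kmeans": {"_weight": 0.3}},
        "sklearn": {"_weight": 0.3, "kmeans": {"_weight": 0.3}},
    },
}

general = decode_item([0.6, 0.95], toy_ontology)        # "clusterer_stop"
specific = decode_item([0.6, 0.5, 0.1], toy_ontology)   # "clusterer_sklearn_kmeans"
```

Changing only the last float of `specific` can never change the function type chosen by the first float, which is the sense in which floats to the left determine the meaning of floats to the right.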

Because the string ends with a stop codon on the right, and more specific answers occur further to the right, a consensus about the general facts of an item may form, or converge, before a consensus about its specific facts.
This representation works with gene switches, which can be disruptive because a change to a float on the left changes the meaning of the floats to its right, but the consistency of the meanings of the levels helps to mitigate such disruptions.
For example, a 300-sized vector would mean the same thing in a vector space whether the brand was nltk or scikit-learn.

Now we are in a position to interpret agent communication on the board.
First, we look at a scenario of individual float communications on the blackboard, interpreting each float and how they construct a program.
Next we take the steps to construct a program from those same floats.

## Example interpretation of the vectors of floats used by agents to communicate on the blackboard

Our example blackboard is initialized with five offers made by three humans.
The first two humans post offers to sell data and a test, respectively.
The third human offers to buy a test and data, and asks SingularityNET to construct software: a clusterer of the "Editorial tweets", as well as to add a column with cluster distance to a dataframe.

While interpreting the vectors of floats on the blackboard, remember that, except for the signs, each float ranges from 0 to 1, and that range is divided among the possible values in proportion to their probabilities.
`0.37` is interpreted as a sell because the possible values (alleles) buy, sell, construct, and stop are evenly distributed, and `0.37` falls in the sell range, from `0.25` up to but not including `0.5`.
In the ontology, the allele probabilities are weighted and normalized.
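
The weighting and normalization can be sketched as follows. This is only a minimal illustration under assumed weights: the function name `decode_allele` and the weight dictionaries are hypothetical, and the simulation's own decoding may differ in detail.

```python
from bisect import bisect_right
from itertools import accumulate


def decode_allele(value, weights):
    """Map a float in [0, 1] to an allele using normalized cumulative weights."""
    names = list(weights)
    total = sum(weights.values())
    # Upper bound of each allele's range, e.g. [0.25, 0.5, 0.75, 1.0]
    bounds = list(accumulate(weights[name] / total for name in names))
    # Upper bounds are non-inclusive; clamp so that 1.0 maps to the last allele
    index = min(bisect_right(bounds, value), len(names) - 1)
    return names[index]


even = {"buy": 1, "sell": 1, "construct": 1, "stop": 1}
skewed = {"buy": 3, "sell": 1, "construct": 1, "stop": 3}  # hypothetical weights

print(decode_allele(0.37, even))
print(decode_allele(0.37, skewed))
```

With even weights `0.37` decodes to `sell`, as in the text. With the skewed weights the buy range widens to `0.375`, so the same float decodes to `buy`.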

So, we start interpreting the vector posted by the first human agent:
```
[0, 0, 0, 0, 0, 0, 0, 0.4, 
0.37, 0.22, 0.4, 0.17, 0.99, 0.87, 0.11, 0.27, 
0, 0, 0, 0, 0, 0, 0, 0]
```
The different floats in the vector mean:
```
0, 0, 0, 0, 0, 0, 0, 0.4 --> sign reserved for human

0.37 --> sell

0.22 --> data  
0.4 --> freetext 
0.17 --> editorial
0.99 --> stop --> points to data in the API of APIS.

0.87 --> test type not chosen 

0.11, 0.27 --> accepts between 11 and 27 agiTokens (hidden)

0, 0, 0, 0, 0, 0, 0, 0 --> sign not used by human
```
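
The field boundaries in this first message can be sketched in code. The layout below (an eight-float sign, one trade float, a four-float item ending in its stop, one test float, two price floats, and a second eight-float sign) is read off this single example and is an assumption rather than the simulation's parser; `split_simple_offer` is a hypothetical helper. Prices are scaled by 100, as in the interpretation above.

```python
SIGN_LEN = 8  # assumption: a sign is eight floats, as in the example


def split_simple_offer(vector, item_len=4):
    """Split a one-offer message into named fields (hypothetical fixed layout)."""
    own_sign, rest = vector[:SIGN_LEN], vector[SIGN_LEN:]
    price_at = 1 + item_len + 1  # skip the trade float, the item and the test float
    low, high = rest[price_at:price_at + 2]
    return {
        "own_sign": own_sign,
        "trade": rest[0],
        "item": rest[1:1 + item_len],
        "test": rest[1 + item_len],
        "price_tokens": (round(low * 100), round(high * 100)),
        "sought_sign": rest[price_at + 2:price_at + 2 + SIGN_LEN],
    }


first_human = [0, 0, 0, 0, 0, 0, 0, 0.4,
               0.37, 0.22, 0.4, 0.17, 0.99, 0.87, 0.11, 0.27,
               0, 0, 0, 0, 0, 0, 0, 0]

offer = split_simple_offer(first_human)
print(offer["trade"], offer["item"], offer["price_tokens"])
```

Longer messages, such as the third human's, chain several offers and items of varying length, so a real parser would need to read up to each stop codon rather than rely on fixed offsets.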

********************
The second human posts the message:
```
[0, 0, 0, 0, 0, 0, 0, 0.3, 
0.58, 0.18, 0.4, 0.88, 0.99, 0.59, 0.35, 0.45,
0, 0, 0, 0, 0, 0, 0, 0]
```
The different floats in the vector mean:
```
0, 0, 0, 0, 0, 0, 0, 0.3 --> sign reserved for human

0.58 --> sell

0.18 --> test
0.4 --> clusterer
0.88 --> silhouette
0.99 --> stop --> points to test in api of apis

0.59 --> test type not chosen 

0.35, 0.45 --> accepts between 35 and 45 agiTokens (hidden)

0, 0, 0, 0, 0, 0, 0, 0 --> sign not used by human
```

******************
The third human posts a longer vector, which includes three different trade offers:
```
[0, 0, 0, 0, 0, 0, 0, 0.2,
0.18, 0.22, 0.4, 0.17, 0.99, 0.24, 0.27, 0.76, 0, 0, 0, 0, 0, 0, 0, 0,
0.08, 0.18, 0.4, 0.88, 0.99, 0.78, 0.34, 0.41, 0, 0, 0, 0, 0, 0, 0, 0,
0.08, 0.65, 0.99, 0.18, 0.4, 0.88, 0.99, 0.22, 0.4, 0.17, 0.99, 0.54, 0.67, 0.93, 0.34, 0.41,
0, 0, 0, 0, 0, 0, 0, 0]
```
Vector start and first offer:
```
0, 0, 0, 0, 0, 0, 0, 0.2 --> sign reserved for human

0.18 --> buy

0.22 --> data
0.4 --> freetext
0.17 --> editorial
0.99 --> stop --> points to data in the api of apis

0.24 --> test type not chosen 

0.27, 0.76 --> accepts between 27 and 76 agiTokens

0, 0, 0, 0, 0, 0, 0, 0 --> sign not used by human
```
Second offer:
```
0.08 --> buy

0.18 --> test
0.4 --> clusterer
0.88 --> silhouette
0.99 --> stop --> points to test in api of apis

0.78 --> test type not chosen 

0.34, 0.41 --> accepts between 34 and 41 agiTokens 

0, 0, 0, 0, 0, 0, 0, 0 --> sign not used by human
```
Third offer and vector end. 
The clusterer that this human wants to buy must take freetext in a dataframe, from the column named "text", and output a vector of numbers in the field named "cluster":
```
0.08 --> buy

0.65 --> clusterer
0.99 --> stop 

0.18 --> test
0.4 --> clusterer
0.88 --> silhouette
0.99 --> stop

0.22 --> data
0.4 --> freetext
0.17 --> editorial
0.99 --> stop
0.54 --> threshold
0.67 --> hidden
0.93 --> stop

0.34, 0.41 --> accepts between 34 and 41 agiTokens

0, 0, 0, 0, 0, 0, 0, 0 --> sign not used by human
```

**********
At least two constructing agents are needed to fulfill this request, because two data streams come into the silhouette test (the vector space to cluster, and the output of the clusterer that clusters it).
Here is the communication of the first automated agent:
```
[0.68, 0.2, 0.34, 0.52, 0.31, 0.95, 0.28, 0.46,
0.08, 0.7, 0.43, 0.13, 0.80, 0.32, 0.28, 0.1, 0.4, 
0.27, 0.85, 0.03, 0.24, 0.95, 0.12, 0.37, 0.75,
0.21, 0.99, 0.76, 0.34, 0.41, 
0.33, 0.75, 0.94, 0.48, 0.06, 0.26, 0.84, 0.35, 0.93]
```
which represents:
```
0.68, 0.2, 0.34, 0.52, 0.31, 0.95, 0.28, 0.46 --> sign displayed

0.08 --> construct

0.7 --> clusterer
0.43 --> sklearn
0.13 --> kmeans
0.80 --> 20clusters
0.32 --> stop (not necessary if there are no parameters left)

0.28 --> test type not chosen 

0.1, 0.4 --> accepts between 10 and 40 agiTokens 

0.27, 0.85, 0.03, 0.24, 0.95, 0.12, 0.37, 0.75 --> sign sought

0.21 --> buy vectorSpace 
0.99 --> stop

0.76 --> test type not chosen 

0.34, 0.41 --> accepts between 34 and 41 agiTokens 

0.33, 0.75, 0.94, 0.48, 0.06, 0.26, 0.84, 0.35 --> sign sought

0.93 --> stop
```

This representation is not hard to evolve, because the only hard guesses are the vectorSpace purchase, the clusterer, and the construct, while the rest of the components have a smooth gradient.
A rich ecosystem of problems on the blackboard that include other vector spaces, besides the one ordered here, would help to make this agent even easier to evolve.
Because the clusterer agent takes two inputs (the clustering method, and the vector space to cluster), at most two more construct or buy blocks are allowed, and then there is an automatic stop.
In this case the stop evolved anyway, so that the bought item is used twice.

Transactions only go through if all pieces are present; assuming that will happen, this is the translation of the code that the human has purchased so far:

```
blackboard['test_clusterer_silhouette']

(blackboard['clusterer_sklearn_kmeans_20clusters']

(vectorSpace

)) 

(vectorSpace)
```
A vector space is needed as an input to this clusterer.

Each entry in the blackboard dictionary is the curry corresponding to its name.
Since there is a fork, another agent is needed to create the vector space.
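
The fork can be pictured with toy stand-ins for the three programs. The functions below are hypothetical placeholders, not the registered software; they only show that the same vector space feeds both the clusterer and the test.

```python
def toy_vector_space():
    # Stand-in for the vector space that the second agent will construct
    return [[0.0, 0.1], [0.9, 1.0], [0.05, 0.0]]


def toy_clusterer(vectors):
    # Stand-in clusterer: split the points on their first coordinate
    return [0 if vector[0] < 0.5 else 1 for vector in vectors]


def toy_test(vectors, labels):
    # Stand-in test: any function of the vectors and their labels giving a score
    return len(set(labels)) / len(vectors)


vectors = toy_vector_space()
labels = toy_clusterer(vectors)    # first use of the vector space
score = toy_test(vectors, labels)  # second use: the test needs it too
print(labels, score)
```

Because `vectors` appears twice, the item that the clusterer agent buys is consumed once by the clusterer and once by the test.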

Here we show one such agent, to demonstrate a minimal setup.
This is harder to evolve because it involves the data guess, the vector guess, and multiple construction guesses.
However, the multiple construction of preprocessors has gradient, because the program will still work and get a gradient answer from test.
The use of the sign field by communicating agents would make it easier still to evolve, as will be demonstrated in our agent-based coevolution section.

The vector message posted by the agent in this case is:
```
[0.63, 0.52, 0.54, 0.82, 0.91, 0.05, 0.22, 0.57,
0.08, 0.73, 0.84, 0.11, 0.88, 0.79, 0.62, 0.99, 0.28, 0.34, 0.41,
0.53, 0.25, 0.34, 0.46, 0.76, 0.22, 0.81, 0.75,
0.18, 0.40, 0.23, 0.13, 0.20, 0.28, 0.90, 0.93,
0.13, 0.43, 0.46, 0.13, 0.51, 0.28, 0.96, 0.96,
0.03, 0.39, 0.88, 0.13, 0.82, 0.28, 0.98, 0.97]
```

which is explained as:
```
0.63, 0.52, 0.54, 0.82, 0.91, 0.05, 0.22, 0.57 --> sign displayed

0.08 --> construct

0.73 --> vectorSpace
0.84 --> doc2vec
0.11 --> gensim
0.88 --> size200
0.79 --> iterations1000
0.62 --> minfreq5
0.99 --> stop 
0.28 --> test type not chosen 
0.34, 0.41 --> accepts between 34 and 41 agiTokens 

0.53, 0.25, 0.34, 0.46, 0.76, 0.22, 0.81, 0.75 --> sign sought

0.18 --> construct
0.40 --> preprocessor
0.23 --> freetext
0.13 --> emojiRemoval
0.20 ...
0.28 --> test type not chosen 
0.90 --> stop
0.93 --> stop

0.13 --> construct
0.43 --> preprocessor
0.46 --> freetext
0.13 --> lemmatization
0.51 ...
0.28 --> test type not chosen 
0.96 --> stop
0.96 --> stop

0.03 --> construct
0.39 --> preprocessor
0.88 --> freetext
0.13 --> stopwords
0.82 ...
0.28 --> test type not chosen 
0.98 --> stop
0.97 --> stop
```

This translates to a complete program:

```
ontology['test_clusterer_silhouette']

(ontology['clusterer_sklearn_kmeans_20clusters']

(vectorSpace

)) 

(vectorSpace)

vectorSpace=ontology['vectorSpace_gensim_doc2vec_size200_iterations1000_minfreq5']

(data = ontology['preprocessor_freetext_emoji_removal']

(data = ontology['preprocessor_freetext_lemmatization']

(data = ontology['preprocessor_freetext_stopword']

(data = ontology['data_freetext_editorial']

))))
```

## Creating a Python Program from an evolvable representation through Currying

We first demonstrate the Python [curry function](https://mtomassoli.wordpress.com/2012/03/18/currying-in-python/) we're using, applied to existing NLP Python programs, mentioned in the above example of an ontology and blackboard communication. 


In [13]:
# Coded by Massimiliano Tomassoli, 2012.
#
def genCur(func, unique = True, minArgs = None):
    """ Generates a 'curried' version of a function. """
    def g(*myArgs, **myKwArgs):
        def f(*args, **kwArgs):
            if args or kwArgs:                  # some more args!
                # Allocates data to assign to the next 'f'.
                newArgs = myArgs + args
                newKwArgs = dict.copy(myKwArgs)

                # If unique is True, we don't want repeated keyword arguments.
                if unique and not kwArgs.keys().isdisjoint(newKwArgs):
                    raise ValueError("Repeated kw arg while unique = True")

                # Adds/updates keyword arguments.
                newKwArgs.update(kwArgs)

                # Checks whether it's time to evaluate func.
                if minArgs is not None and minArgs <= len(newArgs) + len(newKwArgs):
                    return func(*newArgs, **newKwArgs)  # time to evaluate func
                else:
                    return g(*newArgs, **newKwArgs)     # returns a new 'f'
            else:                               # the evaluation was forced
                return func(*myArgs, **myKwArgs)
        return f
    return g

def cur(f, minArgs = None):
    return genCur(f, True, minArgs)

def curr(f, minArgs = None):
    return genCur(f, False, minArgs)

# Simple Function.
def func(a, b, c, d, e, f, g = 100):
    print(a, b, c, d, e, f, g)
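
To show the calling convention without repeating the full listing, here is a compact stand-in that follows the same pattern: arguments accumulate over successive calls, and an empty call forces evaluation. `simple_curry` and `scale` are hypothetical names for illustration only; the simulation itself uses `genCur` through `cur` and `curr`.

```python
def simple_curry(func):
    """Collect arguments over several calls; an empty call evaluates func."""
    def bind(*bound_args, **bound_kwargs):
        def step(*args, **kwargs):
            if args or kwargs:  # more arguments: keep collecting
                merged = dict(bound_kwargs)
                merged.update(kwargs)
                return bind(*(bound_args + args), **merged)
            return func(*bound_args, **bound_kwargs)  # empty call: evaluate
        return step
    return bind()


def scale(x, factor, offset):
    return x * factor + offset


# Bind the parameters first, as an agent would when it fixes a configuration
scale_by_3_plus_1 = simple_curry(scale)(factor=3)(offset=1)

# Supply the remaining input later, then force evaluation with an empty call
result = scale_by_3_plus_1(2)()
print(result)
```

The same two-stage use appears later in the notebook: parameters such as `n_clusters=20` are bound when an entry is registered, and the data is supplied when the pipeline runs.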


Next, we define all the functions needed for this NLP pipeline.
These functions will later be curried to use in the simulation.

First, we make the tests:

In [14]:
def test_clusterer_silhouette(X, Y):
    # "_args": [{  "type": "numpy.ndarray",  "dtype": "float32"},
    #           {"type": "numpy.ndarray", "dtype": "int32"}],
    # "_return": [{"type": "float"}]
    
    # we only want to test cosine metric for this example, 
    # but it could be a parameter in other cases
    
    from sklearn import metrics
    print('test_clusterer_silhouette')
    silhouette = metrics.silhouette_score(X, Y, metric = 'cosine')
    return (silhouette)

def test_clusterer_calinskiHarabaz(X, Y):
    # "_args": [{  "type": "numpy.ndarray",  "dtype": "float32"},
    #           {"type": "numpy.ndarray", "dtype": "int32"}],
    # "_return": [{"type": "float"}]
    
    from sklearn import metrics
    
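    # scikit-learn renamed this metric to calinski_harabasz_score;
    # the older spelling calinski_harabaz_score was later removed
    calinski_harabaz = metrics.calinski_harabasz_score(X, Y)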
    return (calinski_harabaz)

Then the NLP routines

In [15]:
def vectorSpace_gensim_doc2vec(X, size, iterations, minfreq):
     #   "_args": [{"type": "list","firstElement":"gensim.models.doc2vec.TaggedDocument" }],
     #   "_return": [{"type": "numpy.ndarray","dtype": "float32" }
    
    import gensim
    import numpy as np
    from sklearn.preprocessing import StandardScaler 
    
    print('vectorSpace_gensim_doc2vec')
    model = gensim.models.doc2vec.Doc2Vec(size=size, min_count=minfreq, iter=iterations, dm=0)
    model.build_vocab(X)
    model.train(X, total_examples=model.corpus_count, epochs=model.iter)
    cmtVectors = [model.infer_vector(X[i].words) for i in range(len(X))]
    cmtVectors = [inferred_vector for inferred_vector in cmtVectors 
                  if  not np.isnan(inferred_vector).any() 
                  and not np.isinf(inferred_vector).any()]
   
    X = StandardScaler().fit_transform(cmtVectors)
    return(X)
    
def preprocessor_freetext_tag(X):
    # convert a list of strings to a tagged document
    # if it is a list of a list of strings broadcast to a list of tagged documents

    #   "_args": [{"type": "list","firstElement":"string" }],
    #   "_return": [{"type": "list","gensim.models.doc2vec.TaggedDocument" }]
    
    import gensim
    print ('preprocessor_freetext_tag')
    
    tag = lambda x, y: gensim.models.doc2vec.TaggedDocument(x, [y])
    
    if type(X) is str:
        tagged = tag(X, X)
    else:
        tagged = [tag(x, y) for y, x in enumerate(X)]
    return (tagged)
    
def preprocessor_freetext_lemmatization(X):
    # converts string documents into list of tokens
    # if given a list, broadcasts
    
    #   "_args": [{"type": "list","firstElement":"string" }],
    #   "_return": [{"type": "list","firstElement":"list" }]
    
    import gensim
    
    print('preprocessor_freetext_lemmatization')
    stopfile = 'stopwords.txt'
    lemmatized = []
    with open(stopfile,'r') as f:
        stopwords = {word.lower().strip() for word in f.readlines()}
        lemma = lambda x:[b.decode('utf-8') for b in gensim.utils.lemmatize(str(x), stopwords=frozenset(stopwords))]
    
        if type(X) is str:
            lemmatized = lemma(X)
        else:
            lemmatized = [lemma(x) for x in X]
    
    return(lemmatized)
    
def preprocessor_freetext_strip (X):
    # strips addresses and emojis. 
    # if X is a string: strip
    # If X is a list: broadcast
    
    #   "_args": [{"type": "list","firstElement":"string" }],
    #   "_return": [{"type": "list","firstElement":"string" }]
    
    import re
    
    print("preprocessor_freetext_strip")
    code ='utf-8'
    strip = lambda x: re.sub(r"\s?http\S*", "", x).encode(code).decode(code)
    
    #strip = lambda  x: re.sub(r"\s?http\S*", "", x).decode(code)
    #strip = lambda  x: re.sub(r"\s?http\S*", "", x.decode(code))
    #strip = lambda  x: re.sub(r"\s?http\S*", "", x)
    
    if type(X) is str:
        decoded = strip(X)
    else:
        decoded = [strip(x) for x in X]
    return (decoded)
       
def preprocessor_freetext_shuffle (X):
    #   "_args": [{"type": "list" }],
    #   "_return": [{"type": "list" }]
    
    import random
    
    print("preprocessor_freetext_shuffle")
    random.shuffle(X)
    return (X)

Make the data

In [16]:
def data_freetext_csvColumn(path, col='text'):
    #  returns a list of documents that are strings
    #   "_return": [{"type": "list","firstElement":"string" }]
    
    import pandas as pd
    
    print('data_freetext_csvColumn_short')
    raw_data = pd.read_csv(path, encoding="ISO-8859-1")
    docList = [raw_data.loc[i,col] for i in range (len(raw_data)) if raw_data.loc[i,col]]
    return docList

def data_vector_blobs(n_samples=1500):
    from sklearn.datasets import make_blobs
    
    X, Y = make_blobs(n_samples=n_samples, random_state=8)
    return X

Make the clusterers

In [17]:
def clusterer_sklearn_kmeans(X, n_clusters):
    # we want to try different numbers of clusters, so it is a parameter
        
    # "_args": [{"type": "numpy.ndarray","dtype": "float32"} ],
    #   "_return": [{ "type": "numpy.ndarray","dtype": "int32"}

    from sklearn.cluster import MiniBatchKMeans
    
    print ('clusterer_sklearn_kmeans')
    clusterAlgSKN = MiniBatchKMeans(n_clusters).fit(X)
    clusterAlgLabelAssignmentsSKN= clusterAlgSKN.predict(X)
    return (clusterAlgLabelAssignmentsSKN)

def clusterer_sklearn_agglomerative(X, n_clusters):
    # we want to try different numbers of clusters, so it is a parameter
    
    # "_args": [{"type": "numpy.ndarray","dtype": "float32"} ],
    #   "_return": [{ "type": "numpy.ndarray","dtype": "int32"}

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering
    
    # average-linkage clustering under cosine distance
    # (newer scikit-learn releases name this argument `metric` instead of `affinity`)
    average_linkage = AgglomerativeClustering(linkage="average", 
        affinity="cosine", n_clusters=n_clusters).fit(X)
    clusterAlgLabelAssignmentsSAG= average_linkage.labels_.astype(np.int32)
    
    return (clusterAlgLabelAssignmentsSAG)

def clusterer_sklearn_affinityPropagation(X, damping, preference):
    # affinity propagation chooses the number of clusters itself,
    # so damping and preference are the parameters to try
    
    # "_args": [{"type": "numpy.ndarray","dtype": "float32"} ],
    #   "_return": [{ "type": "numpy.ndarray","dtype": "int32"}

    from sklearn.cluster import AffinityPropagation
    
    affinity_propagation = AffinityPropagation(damping=damping, preference=preference).fit(X)
    clusterAlgLabelAssignmentsSAP= affinity_propagation.predict(X)
    
    return (clusterAlgLabelAssignmentsSAP)

def clusterer_sklearn_meanShift(X, quantile):
    # mean shift chooses the number of clusters itself,
    # so the quantile used to estimate the bandwidth is the parameter to try
    # "_args": [{"type": "numpy.ndarray","dtype": "float32"} ],
    #   "_return": [{ "type": "numpy.ndarray","dtype": "int32"}

    from sklearn.cluster import MeanShift, estimate_bandwidth
    
    bandwidth = estimate_bandwidth(X, quantile=quantile)
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit(X)
    clusterAlgLabelAssignmentsSM= ms.predict(X)
        
    return (clusterAlgLabelAssignmentsSM)

def clusterer_sklearn_spectral(X, n_clusters):
    # in this case we want to try different numbers of clusters, so it is a parameter
    
    # "_args": [{"type": "numpy.ndarray","dtype": "float32"} ],
    #   "_return": [{ "type": "numpy.ndarray","dtype": "int32"}
        
    import numpy as np
    from sklearn.cluster import SpectralClustering
    
    spectral = SpectralClustering(
        n_clusters=n_clusters, eigen_solver='arpack',
        affinity="cosine")
    # returned unchanged (None) if the fit fails
    clusterAlgLabelAssignmentsSS= None
    try:
        spectral = spectral.fit(X)
    except ValueError:
        pass
    else:
        clusterAlgLabelAssignmentsSS= spectral.labels_.astype(np.int32)
    
    return (clusterAlgLabelAssignmentsSS)

def clusterer_sklearn_ward(X, n_clusters, n_neighbors):
    # in this case we want to try different numbers of clusters and of
    # neighbors in the connectivity graph, so both are parameters
    
    # "_args": [{"type": "numpy.ndarray","dtype": "float32"} ],
    #   "_return": [{ "type": "numpy.ndarray","dtype": "int32"}
        
    import numpy as np
    from sklearn.cluster import AgglomerativeClustering
    from sklearn.neighbors import kneighbors_graph
    
    connectivity = kneighbors_graph(
        X, n_neighbors=n_neighbors, include_self=False)
    # make connectivity symmetric
    connectivity = 0.5 * (connectivity + connectivity.T)
    ward = AgglomerativeClustering(n_clusters=n_clusters, linkage='ward',
                                   connectivity=connectivity).fit(X)
    clusterAlgLabelAssignmentsSW= ward.labels_.astype(np.int32)
    
    return (clusterAlgLabelAssignmentsSW)

def clusterer_sklearn_dbscan(X, eps):
    # DBSCAN chooses the number of clusters itself,
    # so the neighborhood radius eps is the parameter to try
    
    # "_args": [{"type": "numpy.ndarray","dtype": "float32"} ],
    #   "_return": [{ "type": "numpy.ndarray","dtype": "int32"}
        
    import numpy as np
    from sklearn.cluster import DBSCAN
    
    dbscan = DBSCAN(eps=eps).fit(X)
    clusterAlgLabelAssignmentsSD= dbscan.labels_.astype(np.int32)
    
    return (clusterAlgLabelAssignmentsSD)

def clusterer_sklearn_birch(X, n_clusters):
    # in this case we want to try different numbers of clusters, so it is a parameter
    
    # "_args": [{"type": "numpy.ndarray","dtype": "float32"} ],
    #   "_return": [{ "type": "numpy.ndarray","dtype": "int32"}
        
    from sklearn.cluster import Birch
    
    birch = Birch(n_clusters=n_clusters).fit(X)
    clusterAlgLabelAssignmentsSB= birch.predict(X)
        
    return (clusterAlgLabelAssignmentsSB)

def clusterer_sklearn_gaussian(X, n_clusters):
    # in this case we want to try different numbers of clusters, so it is a parameter
    
    # "_args": [{"type": "numpy.ndarray","dtype": "float32"} ],
    #   "_return": [{ "type": "numpy.ndarray","dtype": "int32"}
        
    from sklearn import mixture
    
    clusterAlgSGN = mixture.GaussianMixture(n_components=n_clusters, covariance_type='full').fit(X)
    clusterAlgLabelAssignmentsSGN= clusterAlgSGN.predict(X)
    
    return (clusterAlgLabelAssignmentsSGN)

def clusterer_nltk_kmeans(X, n_clusters):
    # in this case we want to try different numbers of clusters, so it is a parameter
    
    # "_args": [{"type": "numpy.ndarray","dtype": "float32"} ],
    #   "_return": [{ "type": "numpy.ndarray","dtype": "int32"}
        
    from nltk.cluster.kmeans import KMeansClusterer
    from nltk.cluster.util import cosine_distance
    
    clusterAlgNK = KMeansClusterer(n_clusters, distance=cosine_distance, repeats=25, avoid_empty_clusters=True)
    clusterAlgLabelAssignmentsNK = clusterAlgNK.cluster(X, assign_clusters=True)
    
    return (clusterAlgLabelAssignmentsNK)

def clusterer_nltk_agglomerative(X, n_clusters):
    # in this case we want to try different numbers of clusters, so it is a parameter
    
    # "_args": [{"type": "numpy.ndarray","dtype": "float32"} ],
    #   "_return": [{ "type": "numpy.ndarray","dtype": "int32"}
        
    from nltk.cluster.gaac import GAAClusterer
    
    clusterAlgNG = GAAClusterer(num_clusters=n_clusters, normalise=True, svd_dimensions=None)
    clusterAlgLabelAssignmentsNG = clusterAlgNG.cluster(X, assign_clusters=True)
    
    return (clusterAlgLabelAssignmentsNG)

Next, we list curried versions of the above functions in the ontology:

In [18]:
#Fill the initial function
ontology= {}
ontology['data_freetext_csvColumn'] = curr(data_freetext_csvColumn)
ontology['data_vector_blobs'] = curr(data_vector_blobs)
ontology['preprocessor_freetext_shuffle'] = curr(preprocessor_freetext_shuffle)
ontology['preprocessor_freetext_strip'] = curr(preprocessor_freetext_strip)
ontology['preprocessor_freetext_lemmatization']  = curr(preprocessor_freetext_lemmatization)
ontology['preprocessor_freetext_tag'] = curr(preprocessor_freetext_tag)
ontology['vectorSpace_gensim_doc2vec'] = curr(vectorSpace_gensim_doc2vec)
ontology['clusterer_sklearn_kmeans'] = curr(clusterer_sklearn_kmeans)
ontology['clusterer_sklearn_agglomerative'] = curr(clusterer_sklearn_agglomerative)
ontology['clusterer_sklearn_affinityPropagation'] = curr(clusterer_sklearn_affinityPropagation)
ontology['clusterer_sklearn_meanShift'] = curr(clusterer_sklearn_meanShift)
ontology['clusterer_sklearn_spectral'] = curr(clusterer_sklearn_spectral)
ontology['clusterer_sklearn_ward'] = curr(clusterer_sklearn_ward)
ontology['clusterer_sklearn_dbscan'] = curr(clusterer_sklearn_dbscan)
ontology['clusterer_sklearn_birch'] = curr(clusterer_sklearn_birch)
ontology['clusterer_sklearn_gaussian'] = curr(clusterer_sklearn_gaussian)
ontology['clusterer_nltk_agglomerative'] = curr(clusterer_nltk_agglomerative)
ontology['clusterer_nltk_kmeans'] = curr(clusterer_nltk_kmeans)
ontology['test_clusterer_silhouette'] = curr(test_clusterer_silhouette)
ontology['test_clusterer_calinskiHarabaz'] = curr(test_clusterer_calinskiHarabaz)

Now, using these programs stored in the ontology, we recreate the construction that was machine-learned by the agents in the example above.
The only difference is that, instead of the full "Editorial" text, we use a shortened version for illustration purposes.

In [19]:
# Create the constructions that would be machine learned, using a shortened dataset.
# These are just a few examples.
# The rest are in the Registry.py file

ontology['data_freetext_csvColumn_short'] = ontology['data_freetext_csvColumn'](path='data/short.csv')
ontology['clusterer_sklearn_kmeans_20clusters'] = ontology['clusterer_sklearn_kmeans'](n_clusters=20)
ontology['vectorSpace_gensim_doc2vec_size200_iterations1000_minfreq5'] = ontology['vectorSpace_gensim_doc2vec'](size=200)(iterations=1000)(minfreq=5)

Finally, we can build what the agents in the above example can do with the programs they constructed:

In [21]:
# Agent that builds the vector space, the NLP solution
a = ontology['data_freetext_csvColumn_short']()
b = ontology['preprocessor_freetext_shuffle'](a)()
c = ontology['preprocessor_freetext_strip'](b)()
d = ontology['preprocessor_freetext_lemmatization'](c)()
e = ontology['preprocessor_freetext_tag'](d)()
f = ontology['vectorSpace_gensim_doc2vec_size200_iterations1000_minfreq5'](e)()

# Agent that builds the clusterer, using the vector space it bought
g = ontology['clusterer_sklearn_kmeans_20clusters'](f)()
h = ontology['test_clusterer_silhouette'] (f)(g)()

data_freetext_csvColumn_short
preprocessor_freetext_shuffle
preprocessor_freetext_strip
preprocessor_freetext_lemmatization
preprocessor_freetext_tag
vectorSpace_gensim_doc2vec


clusterer_sklearn_kmeans
test_clusterer_silhouette


Now we show the results from different parts of a couple of agents.
First, the agent implementing the NLP solution:

In [22]:
# A vector from the vector space, as created by doc2vec
f[:1]

array([[-0.60186177,  0.85971031, -0.1437311 , -1.70166747,  0.40024022,
         0.18544679, -0.38854304, -0.49366646,  1.20071604, -0.0942759 ,
        -1.0556894 , -0.50744381,  0.36524885,  0.61563718, -1.64886513,
        -0.22921731, -1.26717072, -1.31596627,  1.1098382 , -0.6069177 ,
        -0.13373245,  0.80917209,  0.43409153,  0.16854793, -0.15181725,
         0.19128376, -0.70133634,  0.44582226,  0.23319934, -0.03907131,
         0.29975345, -0.02095504, -0.81913617, -0.02647975, -0.17945205,
         0.55750605,  0.70994787, -0.73556381,  0.75684461, -0.62925441,
         0.0677403 , -0.89871581,  0.7508793 ,  1.50891954, -0.64969293,
         1.64733174, -0.17171449,  1.17766781,  1.83468742, -0.63252826,
         0.33138572, -0.37681217, -0.34734143,  0.07133418, -0.16775489,
         1.84754129,  0.73820649,  0.28920507,  0.22847022,  0.12064567,
         0.27037829,  0.75594784, -0.54704048, -0.03762876,  0.91715142,
        -0.58148582,  0.611591  , -1.11148123,  0.2

Finally, the clusterer agent, which clusters the vectors and performs a test on the result:

In [22]:
# The clustering results
g[:5]

array([10, 19,  1,  8,  1], dtype=int32)

In [23]:
# The test score (the output of the silhouette test)
h

0.12116911496103314

## GEP representation

Once the individual Python functions are parameterized, the remaining unbound input parameters are filled with calls to other Python programs.
Once agents have finished constructing and buying programs, they have a sequential list, the input and output of which we must interpret.

For this, we use a Gene Expression Programming (GEP) representation.
Because it is a tree representation of general Python programs with different arities, it can be disruptive: swapping in a Python program of a different arity can change the meaning of all subsequent programs, and the farther away an argument list is, the more likely it is to be disrupted.
We stop reading at a stop codon, and also when the input and output types do not match.
If we decided to associate meaning with positions, we would need a consistent arity throughout, filling the blank spaces with nulls.
Although this would help convergence, it would take too much space.

An alternative is to not construct long programs but rather to trade tokens for them.
This creates good market conditions for trade and encourages specialization: agents become worth more as they are repeatedly asked to solve the same kind of problem.
From this we expect types to emerge, specifically those needed for certain applications.
These agents are expected to communicate their emergent type through the arbitrary sign.

We start with a test program with a known answer.
For a list of functions named with alphabetical letters in alphabetical order, GEP uses Karva notation to call them according to the illustrated tree (where their arities are as listed in the arity map).
That is, if root function `a` had arity two, then functions `b` and `c` would be its arguments, which are the next two values on the list.
The terminal functions are those with arity 0, which in our case return their names.
So, if each of the functions returned what was sent up in order, and terminals sent their names, then GEP representation would print the result: `pqklrso`.
We take out a portion of the simulation programs to clearly show how the representation works.
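
The reading rule just described can be sketched independently of the simulation class. This is a minimal illustration, assuming a breadth-first reading of the list; `karva_children` and `evaluate` are hypothetical helper names, while the program list and arity map are the ones used in the test program.

```python
from collections import deque


def karva_children(program, arity):
    """Assign each function its arguments by reading the list breadth-first."""
    children = {}
    pending = deque([program[0]])  # the first symbol is the root
    cursor = 1
    while pending:
        parent = pending.popleft()
        kids = program[cursor:cursor + arity[parent]]
        cursor += arity[parent]
        children[parent] = kids
        pending.extend(kids)
    return children


def evaluate(node, children):
    """Terminals return their name; other nodes pass up what their arguments return."""
    kids = children[node]
    if not kids:
        return node
    return "".join(evaluate(kid, children) for kid in kids)


program = list("abcdefghijklmnopqrs")
arity = {'a': 2, 'b': 3, 'c': 2, 'd': 2, 'e': 1, 'f': 1, 'g': 2, 'h': 1, 'i': 1, 'j': 1,
         'k': 0, 'l': 0, 'm': 1, 'n': 1, 'o': 0, 'p': 0, 'q': 0, 'r': 0, 's': 0}

children = karva_children(program, arity)
print(evaluate('a', children))
```

Running the sketch prints `pqklrso`: the terminals in the order in which a depth-first walk of the tree reaches them.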

In [24]:
from IPython.display import Image, display

display(Image(filename='karva.jpg'))

<IPython.core.display.Image object>

Here we show the functions needed to store all calculations and to retrieve them in future scenarios, instead of calculating them again:

In [25]:
def pickleThis(fn):  # define a decorator for a function "fn"
    def wrapped(self, *args, **kwargs):   # define a wrapper that will finally call "fn" with all arguments    
        cachefile = None
        if args and args[0] in self.pickles:
            pickle_name = self.pickles[args[0]]
            cachefile = self.parameters['output_path']+ 'pickles/' + pickle_name

        if cachefile and os.path.exists(cachefile):
            with open(cachefile, 'rb') as cachehandle:
                print("using pickled result from '%s'" % cachefile)
                return pickle.load(cachehandle)

        # execute the function with all arguments passed
        res = fn(self, *args, **kwargs)

        pickle_name = str(self.pickle_count) + '.p'
        self.pickle_count += 1
        cachefile = self.parameters['output_path']+ 'pickles/' + pickle_name

        # write to cache file
        with open(cachefile, 'wb') as cachehandle:
            pickle.dump(res, cachehandle)
            self.pickles[args[0]] = pickle_name

        return res

    return wrapped
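
The idea behind the decorator can also be seen in a smaller, self-contained form. The sketch below is a simplified variant and not the simulation's code: it keys files by a hash of the argument rather than by a running counter, and `two_level_cache` and `slow_lookup` are hypothetical names. A result is looked up in memory first, then on disk, and is computed only if both miss.

```python
import hashlib
import os
import pickle
import tempfile


def two_level_cache(cache_dir):
    """Decorator factory: check memory, then a pickle file, before computing."""
    memory = {}

    def decorator(fn):
        def wrapped(key):
            if key in memory:
                return memory[key]
            # A stable file name derived from the key's printed form
            name = hashlib.sha1(repr(key).encode("utf-8")).hexdigest() + ".p"
            path = os.path.join(cache_dir, name)
            if os.path.exists(path):
                with open(path, "rb") as handle:
                    result = pickle.load(handle)
            else:
                result = fn(key)
                with open(path, "wb") as handle:
                    pickle.dump(result, handle)
            memory[key] = result
            return result
        return wrapped
    return decorator


calls = []


def slow_lookup(key):
    calls.append(key)  # record every real computation
    return "result for " + key[0]


cache_dir = tempfile.mkdtemp()
tree = ("a", (("b", ()), ("c", ())))

cached = two_level_cache(cache_dir)(slow_lookup)
first = cached(tree)    # computed, then pickled
second = cached(tree)   # served from memory

fresh = two_level_cache(cache_dir)(slow_lookup)
third = fresh(tree)     # new memory layer, so served from the pickle file
print(first, len(calls))
```

Only the first call runs `slow_lookup`; the later calls return the stored result.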

In [26]:
from boltons.cacheutils import cachedmethod
from boltons.cacheutils import LRU
import os
import pickle
import json
from collections import OrderedDict

class SnetSim_test(object):
    # TODO: implement hidden tests, because those that are marked hidden now can be seen on the blackboard by all.
    # Important because they constitute a hidden testing set as in kaggle
    
    def __init__(self, config_path, registry):
        def print_itself(a):
            print(a)
        with open(config_path) as json_file:  
            config = json.load(json_file)
        #print(json.dumps(config['ontology'], indent=2))
        self.parameters = config['parameters']
        self.blackboard = config['blackboard']
        self.ontology = config['ontology']
        self.registry = registry
        
        pickle_config_path = config['parameters']['output_path']+ 'pickles/' +  'index.p'
        if pickle_config_path and os.path.exists(pickle_config_path):
            with open(pickle_config_path, 'rb') as cachehandle:
                pickle_config =  pickle.load(cachehandle)
        else:
            pickle_config = OrderedDict([("count", 0), ("pickles", OrderedDict())])
            
        self.pickle_count = pickle_config['count'] # contains the next number for the pickle file
        self.pickles = pickle_config['pickles']
        
        self.resultTuple = ()
        #self.cache = LRU(max_size = 512)
        self.cache = LRU()
        
    @cachedmethod('cache')
    @pickleThis
    def memoisePickle(self, tupleKey):
        if len(self.resultTuple):
            result = self.registry[tupleKey[0]](*self.resultTuple)
        else:
            result = self.registry[tupleKey[0]]()
        return (result)

    def callMemoisePickle(self, root):
        resultList = []
        funcList = []
        argTuple = ()
        if root in self.gepResult:
            args = self.gepResult[root]
            argTuple = tuple(args)
            for arg in args:
                tfuncTuple, tresult = self.callMemoisePickle(arg)
                resultList.append(tresult)
                funcList.append(tfuncTuple)
        carriedBack = tuple(funcList)
        funcTuple = (root, carriedBack)
        
        self.resultTuple = tuple(resultList) #You have to set a global to memoise and pickle correctly
              
        result = self.memoisePickle(funcTuple)

        return(funcTuple, result)  
    
     
    def gep(self):
        # Assign input and output functions as defined by the Karva notation.  
        # Get arity of the items and divide the levels according to that arity, then make the assignments across the levels
        # TODO: take input / output compatibility into account, skipping that which 
        
        # Example: for the following program list with the following arity, the karva notation result is the following 
        learnedProgram = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s']
        arity = {'a':2,'b':3,'c':2,'d':2,'e':1,'f':1,'g':2,'h':1,'i':1,'j':1,'k':0,'l':0,'m':1,'n':1,'o':0,'p':0,'q':0,'r':0,'s':0}
        
        # This is what comes out of the gep function: an assignment list of what functions form the parameters of the other
        # functions based on arity. It is calculated, but shown here for convenience.
        self.results = {'a':['b','c'],'b':['d','e','f'],'c':['g','h'], 'd':['i','j'],'e':['k'],
              'f':['l'],'g':['m','n'],'h':['o'],'i':['p'],'j':['q'],'m':['r'], 'n':['s']}
        
        # Divide into levels
        # Don't modify the functionList
        functionList = []
        functionList.extend(learnedProgram)
        
        levels= {1 : [functionList.pop(0)]}
        
        currentLevel = 1
        #length_next_level = 0
        maxiters = 100
        count = 0
        while functionList and count < maxiters:
            count +=1
            length_next_level = 0
            for func in  levels[currentLevel]:
                length_next_level += arity[func]
            currentLevel += 1
            levels[currentLevel]= functionList[0:length_next_level]
            functionList = functionList[length_next_level:]
            
            
        # Make assignments
        gepResult = OrderedDict()
        for level, functionList in levels.items():
            next_level = level + 1
            cursor= 0
            for func in functionList:
                next_cursor = cursor + arity[func]
                if next_level in levels:
                    gepResult[func]= levels[next_level][cursor:next_cursor]
                cursor = next_cursor
                
        return(gepResult)
    
    def performTest(self):
        score = 0
        #print ("in perform test")
        self.gepResult = self.gep() # Put the ordered Dictionary in the global so a decorated function can access
        if any(self.gepResult.values()):
            root = next(iter(self.gepResult.items()))[0]
            score = self.callMemoisePickle(root)
        return score

Now, we call gep, memoise, and pickle simultaneously.
Boltons is used for memoising, while we wrote our own decorator function for pickling.
The decorator function takes a tuple tree as its input; the tree has a one-to-one correspondence with an arrangement of programs.
In the output below, we first see the tuple tree and then the answer that we expected.
The tuple tree uniquely designates the order of functions.
Since tuples are hashable in Python, this representation for the curried functions enables us to both pickle and memoise the exact function call set, so that not only each result but also each partial result need never be computed again.
By memoise we mean that a specific amount of RAM is set aside for results, within which the most recent computations are kept, as is needed in many machine learning programs.
In every case, the pickles are also saved to disk for subsequent runs of the same scenario.
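
The idea of using a hashable tuple tree as the cache key can be illustrated with a minimal sketch. It is not the simulation's implementation (which uses the Boltons LRU and a custom pickling decorator); the `registry` entries, the `evaluate` helper, and the `MAX_ENTRIES` bound are hypothetical names chosen for this example only.

```python
from collections import OrderedDict

# Hypothetical stand-ins for registry programs; names are illustrative only.
registry = {
    "add": lambda x, y: x + y,
    "two": lambda: 2,
    "three": lambda: 3,
}

calls = []             # records which programs actually ran
cache = OrderedDict()  # tuple tree -> result, most recently used last
MAX_ENTRIES = 128      # assumed bound on the number of cached results


def evaluate(tree):
    """Evaluate a (name, (child_tree, ...)) tuple tree, caching every subtree."""
    if tree in cache:
        cache.move_to_end(tree)      # mark as recently used
        return cache[tree]
    name, children = tree
    args = [evaluate(child) for child in children]
    calls.append(name)
    result = registry[name](*args)
    cache[tree] = result
    if len(cache) > MAX_ENTRIES:
        cache.popitem(last=False)    # evict the least recently used entry
    return result


tree = ("add", (("two", ()), ("three", ())))
first = evaluate(tree)
second = evaluate(tree)   # served from the cache; no program is re-run
print(first, second, calls)
```

Because every subtree is itself a key, a later program that shares only the `("two", ())` branch would reuse that partial result as well.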

First we will look at the memoising stats, showing that the cache was hit the second time that the same program was called, so it was not recalculated but retrieved.
Then we look at the pickles made during the simulation initialization and our short run above.

In [27]:
# Create test_registry registry for this example
def return_arg(a):
    return a
res = cur(return_arg)
test_registry = {i: res(i) for i in 'pqklrso'}
test_registry['a'] = lambda b, c: b + c
test_registry['b'] = lambda d, e, f: d + e + f
test_registry['c'] = lambda b, c: b + c
test_registry['d'] = lambda b, c: b + c
test_registry['e'] = lambda b: b
test_registry['f'] = lambda b: b
test_registry['g'] = lambda b, c: b + c
test_registry['h'] = lambda b: b
test_registry['i'] = lambda b: b
test_registry['j'] = lambda b: b
test_registry['m'] = lambda b: b
test_registry['n'] = lambda b: b

In [28]:
snetsim = SnetSim_test('onehumantwobots.json', test_registry)
snetsim.performTest()

(('a',
  (('b',
    (('d', (('i', (('p', ()),)), ('j', (('q', ()),)))),
     ('e', (('k', ()),)),
     ('f', (('l', ()),)))),
   ('c',
    (('g', (('m', (('r', ()),)), ('n', (('s', ()),)))),
     ('h', (('o', ()),)))))),
 'pqklrso')

In [29]:
print((snetsim.cache.hit_count, snetsim.cache.miss_count, snetsim.cache.soft_miss_count))

(0, 19, 0)


In [30]:
snetsim.performTest()

(('a',
  (('b',
    (('d', (('i', (('p', ()),)), ('j', (('q', ()),)))),
     ('e', (('k', ()),)),
     ('f', (('l', ()),)))),
   ('c',
    (('g', (('m', (('r', ()),)), ('n', (('s', ()),)))),
     ('h', (('o', ()),)))))),
 'pqklrso')

In [31]:
print((snetsim.cache.hit_count, snetsim.cache.miss_count, snetsim.cache.soft_miss_count))

(19, 19, 0)


To show that not only the result is stored, but also important intermediates that will help speed up combinatorial optimization, we look in the pickle index that was saved to disk when ten iterations of the simulation were run above.
The pickles are named with a number and a `.p` extension.
The index maps the program order to the pickle name in the pickle directory.
When the simulation is run again, the pickles are reloaded.
We see that 10 pickles have been saved.

In [32]:
import os
import pickle

pickled = "competing_clusterers/pickles/index.p"
if os.path.exists(pickled):
    with open(pickled, 'rb') as cachehandle:
        pickle_index =  pickle.load(cachehandle)

In [33]:
pickle_index

{'count': 10,
 'pickles': {('vectorSpace_gensim_doc2vec_100size_200iterations_5minFreq',
   (('data_freetext_editorial', ()),)): '1.p',
  ('test_clusterer_silhouette',
   (('vectorSpace_gensim_doc2vec_100size_200iterations_5minFreq',
     (('data_freetext_Detector', ()),)),)): '5.p',
  ('data_freetext_Detector', ()): '3.p',
  ('data_freetext_editorial', ()): '0.p',
  ('preprocessor_freetext_tag',
   (('data_freetext_editorial', ()),)): '6.p',
  ('test_clusterer_silhouette',
   (('vectorSpace_gensim_doc2vec_100size_200iterations_5minFreq',
     (('data_freetext_editorial', ()),)),)): '2.p',
  ('vectorSpace_gensim_doc2vec_50size_200iterations_5minFreq',
   (('preprocessor_freetext_tag',
     (('data_freetext_editorial', ()),)),)): '7.p',
  ('clusterer_sklearn_affinityPropagation_10clusters',
   (('vectorSpace_gensim_doc2vec_50size_200iterations_5minFreq',
     (('preprocessor_freetext_tag',
       (('data_freetext_editorial', ()),)),)),)): '8.p',
  ('test_clusterer_silhouette',
   (('clu
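
The index layout shown above (a `count` plus a `pickles` mapping from tuple tree to file name) could be maintained roughly as follows. This is a sketch under assumptions, not the simulation's decorator: `save_result` and `load_result` are hypothetical helpers, the stored values are placeholders, and a temporary directory stands in for the real pickle directory.

```python
import os
import pickle
import tempfile


def save_result(directory, index, key, result):
    """Pickle `result` under the next numbered file and record it in the index."""
    if key in index["pickles"]:
        return index["pickles"][key]          # already stored, nothing to do
    name = "{}.p".format(index["count"])
    with open(os.path.join(directory, name), "wb") as handle:
        pickle.dump(result, handle)
    index["pickles"][key] = name
    index["count"] += 1
    # Persist the index itself so a later run can find the pickles again.
    with open(os.path.join(directory, "index.p"), "wb") as handle:
        pickle.dump(index, handle)
    return name


def load_result(directory, index, key):
    """Load the pickled result that the index associates with a tuple tree."""
    with open(os.path.join(directory, index["pickles"][key]), "rb") as handle:
        return pickle.load(handle)


workdir = tempfile.mkdtemp()                  # stand-in for the pickle directory
index = {"count": 0, "pickles": {}}
leaf = ("data_freetext_editorial", ())
inner = ("preprocessor_freetext_tag", (leaf,))
save_result(workdir, index, leaf, ["raw text"])        # placeholder payloads
save_result(workdir, index, inner, ["tagged text"])

with open(os.path.join(workdir, "index.p"), "rb") as handle:
    reloaded = pickle.load(handle)
print(reloaded["count"], reloaded["pickles"][inner])
print(load_result(workdir, reloaded, inner))
```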

## The Simulation Loop

The simulation consists of four parts: a registry of programs, from which the machine learning agents compose solutions by parameterizing and ordering them; an ontology that describes those programs from general to specific; a SnetSim agent that takes care of global functionality, such as the caches and the staged activation of the agents; and a SnetAgent that has all the routines to select partners, which the user subclasses to implement their own machine/reinforcement learning algorithms.

Users submit agents that can perform two routines: a `step` routine that puts a message on the blackboard, and a `payment_notification` routine that the user can write to keep track of trade payments.
If the user submits a machine/reinforcement learning algorithm, its messages would be chosen to optimize some quantity, such as the number of AGI tokens the agent holds.
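
The shape of such an agent can be sketched in plain Python. The class below is hypothetical and does not subclass the simulation's SnetAgent; the constructor arguments, the use of a list as the blackboard, and the `payment_notification(amount)` signature are assumptions made for illustration. The trade fields (`type`, `item`, `midpoint`, `range`) follow the log format shown later in this notebook.

```python
class SketchAgent:
    """Hypothetical agent; not the SnetAgent base class from the simulation."""

    def __init__(self, label, blackboard):
        self.label = label
        self.blackboard = blackboard   # shared list standing in for the blackboard
        self.wealth = 0.0
        self.payments = []

    def step(self):
        # Post one message: what this agent wants to buy and on what terms.
        message = {
            "label": self.label,
            "trades": [
                {"type": "buy", "item": "clusterer_stop",
                 "midpoint": 80.0, "range": 20.0},
            ],
        }
        self.blackboard.append(message)

    def payment_notification(self, amount):
        # Record a change of funds (assumed signature).
        self.payments.append(amount)
        self.wealth += amount


blackboard = []
agent = SketchAgent("Sketch Agent 0", blackboard)
agent.step()
agent.payment_notification(-12.5)   # paid for a purchase
agent.payment_notification(30.0)    # received payment for a sale
print(len(blackboard), agent.wealth)
```

A learning agent would replace the fixed message in `step` with one chosen from what it has observed, using the recorded payments as its reward signal.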

In the simulation, the agents put their messages on the blackboard in random order.
An agent's only job is to put on the blackboard the list of programs it will buy, sell and construct, as well as the terms for trade.
The agent gets responses from its message on the blackboard not immediately, but before its next message is due.
So instead of taking a step and then getting a response, as in OpenAI Gym, the SingularityNET simulation starts with a response (from the last message) and then a step.
The simulation calls the step, so that all agents have time to move before a response is received.

After all agents step, for every buy on the blackboard, the simulation ranks the offers that have overlapping prices and items of the correct categories by the cosine distance between the sign the buyer seeks and the sign the seller is displaying.
Selection is done by roulette wheel, where the farthest sign among agents that have corresponding trade plans has a zero percent chance of being chosen.
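
One way to realize this ranking and selection is sketched below. The exact weighting used by the simulation is not shown in this notebook, so the formula here (weight equals the farthest distance minus the offer's own distance, then normalized) is an assumption that merely satisfies the stated property that the farthest sign gets zero probability. The function names and example signs are hypothetical.

```python
import math
import random


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def roulette_weights(sought_sign, offered_signs):
    """Selection probabilities; the farthest offered sign gets weight zero."""
    distances = [cosine_distance(sought_sign, s) for s in offered_signs]
    farthest = max(distances)
    raw = [farthest - d for d in distances]
    total = sum(raw)
    if total == 0:
        # All offers are equally far away: fall back to a uniform choice.
        return [1.0 / len(raw)] * len(raw)
    return [w / total for w in raw]


sought = [1.0, 0.0]
offers = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]   # made-up seller signs
weights = roulette_weights(sought, offers)       # roughly [0.59, 0.41, 0.0]

rng = random.Random(0)                           # seeded for repeatability
chosen = rng.choices(range(len(offers)), weights=weights, k=1)[0]
print(weights, chosen)
```

With this weighting, a closer sign is more likely to win a trade but is never guaranteed to, which keeps some exploration in partner selection.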

After trades are made, each agent has a list of programs in a row, that will be interpreted by a call to GEP.
Tests are then run if required, and if the agent is a human, funds are distributed.
The agents are notified of change of funds at the time their particular trade is part of a solution that a human buys, but can also see all the messages on the blackboard: who won trades and money, and how well each did on every test.  

This may be a centralized market with the current settings of the program, but what the agent sees is easily modified with a Mesa network, including being able to see only its neighbors and having to pay a price for hopping to more distant neighbors, as one might expect in a blockchain network.

The logs are written in the same format as the configuration file, and include the offers that each agent makes in a buy, the similarity of the agents, which agents were chosen, their price, and their scores on each test.
We take a log from the ninth iteration of the simulation that was run earlier in this notebook.
In this run the human, the first agent, has gained many pieces of information since the last message submission, including sign similarities and probabilities of being accepted, test scores, and prices.
Examining the signs of agents with matching plans, we see that the sign the first offerer displayed gave him about a 75% chance of being selected, while the sign the third offerer displayed gave him a 25% chance of selection; the less preferred agent still won, possibly because the more preferred agent did not purchase a specialist vector space (as it had in other stochastic runs).
Scores are lower than they would be for the software in practice because we are using a small dataset here, just to demonstrate the functionality of the simulation package.

In [34]:
import json
log = 'onehumantwobots/logs/log9.000000000000021.txt'
with open(log) as json_file:  
    config = json.load(json_file)
    print(json.dumps(config, indent=2))

[
  {
    "type": "SISTER",
    "label": "Cluster Seeking Human, SISTER Agent 0",
    "distributes": true,
    "sign": [
      0.1830227068394336,
      0.6860171547991621,
      0.588006425485045,
      0.0007370082964249609,
      0.20146993683524222,
      0.6063306727046138,
      0.8771210556436853,
      0.14808878741060963
    ],
    "trades": [
      {
        "type": "buy",
        "sign": [
          0.8608412817699979,
          7.151843455604516e-05,
          0.41188068354143725,
          0.133204028017387,
          0.6722521783619393,
          0.6073852208085961,
          0.5459298766828227,
          0.718646303526079
        ],
        "item": "clusterer_stop",
        "midpoint": 82.15316795452254,
        "range": 94.1198159280011,
        "tests": [
          {
            "stophere": false,
            "test": "test_clusterer_silhouette",
            "data": "data_freetext_short",
            "threshold": 0.5,
            "hidden": false,
            "results": 

To conclude, in this first tutorial we have examined a general, flexible, and evolvable representation for the agent communication and the ontology in the simulation.
We have seen how a Python program may be generated from agent communications about buying, selling, and constructing software.
This representation could make use of statistics about the frequency of functions occurring in problem types, as in Microsoft's DeepCoder.

We have seen all the lower level details of how the simulation works, but have not yet explored the patterns that these designs create, how such a simple design can facilitate agent self-organization and growth, or how it can support an agent economy with a natural price.
In the second tutorial, [marketplace.ipynb](marketplace.ipynb), we will address a coevolutionary representation that leverages a rich heterogeneous environment to bring the evolution of Python programs within reach, and so demonstrate the reasoning behind our design decisions.