# DAS Evolution Queries - Sentences Dataset

This notebook demonstrates how to use Evolution-based queries with Hyperon MeTTa and DAS.

Evolution queries resemble regular pattern-matching queries in that they expect the same kind of input (a query) and deliver the same kind of result (an iterator over query answers). They differ in that an evolution query goes through an evolutionary algorithm before delivering its answers, whereas a regular query is simply executed in the query engine. Here's how it works.

1. The caller submits a query to the evolution agent. In addition to the query itself, the caller provides a fitness function, which scores the quality of each query answer in \[0, 1\], and a secondary query, called the "correlation query", whose role is explained below.
2. The evolution agent executes the query in the query engine and uses the first N results (N is another parameter of the evolution query request) to build a population of query answers.
3. All N query answers in the population are evaluated with the supplied fitness function.
4. The best M individuals (i.e. the M query answers with the largest fitness values) are selected to sample the next generation of the population. In practice, the selection mixes picking the best individuals outright with tournament selection, and the balance between the two is another evolution parameter.
5. To sample the next generation, the agent first uses the "correlation query" passed as an evolution parameter. For each selected query answer, elements from the answer (variable values or the rewritten links themselves) are used to customize the correlation query. The customized query is then executed in the query engine, and its results are used to update the Hebbian Network of the given context in the AttentionBroker and to stimulate some of the elements in the query answer. This stimulation also triggers activation spreading in the Hebbian Network. All stimulation and activation spreading happens only in the AttentionBroker, which keeps separate Hebbian Networks for different contexts; importance updates do not affect the atomspace itself.
6. After the importance update in the context, the main query is executed once again and the next generation of the population is sampled by getting the best N individuals (query answers) as we did initially.
7. The evolution agent repeats steps 2-6 until a stop criterion (another evolution parameter) is met. As new generations are sampled and evaluated with the fitness function, whenever the agent sees a query answer that is better (i.e. has a larger fitness value) than the last one it delivered, it immediately delivers this new best answer to the caller, regardless of which generation produced it. So, besides the stop criterion passed as an evolution parameter, the caller can also end the evolution process by aborting the query once it has an answer it considers good enough.
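
The steps above can be sketched as a toy Python loop. This is only an illustration under simplifying assumptions, not the evolution agent's implementation: the function `evolve`, its parameters, and the word-overlap importance update are all hypothetical stand-ins. In particular, the AttentionBroker and its Hebbian Network are replaced by a plain dictionary of importance values, and selection is pure elitism.

```python
from typing import Callable, Iterator, List


def evolve(
    answers: List[str],
    fitness: Callable[[str], float],
    population_size: int,   # N in the steps above
    selection_size: int,    # M in the steps above
    max_generations: int,   # stop criterion
) -> Iterator[str]:
    """Toy evolution loop: yields each new best answer as soon as it is seen.

    Assumes `answers` is a non-empty list of space-separated sentences.
    """
    # Hypothetical stand-in for the per-context importance kept by the AttentionBroker.
    importance = {answer: 0.0 for answer in answers}
    best_score = float("-inf")

    for _ in range(max_generations):
        # Steps 2 and 6: "run the query" and keep the N most important answers.
        population = sorted(answers, key=importance.get, reverse=True)[:population_size]

        # Step 3: evaluate the population, best first.
        ranked = sorted(population, key=fitness, reverse=True)

        # Step 7: deliver an answer only if it beats the last one delivered.
        if fitness(ranked[0]) > best_score:
            best_score = fitness(ranked[0])
            yield ranked[0]

        # Step 4: plain elitism (the real agent also mixes in tournament selection).
        selected = ranked[:selection_size]

        # Step 5: crude stand-in for correlation and activation spreading:
        # answers sharing words with the selected ones gain importance.
        hot_words = {word for answer in selected for word in answer.split()}
        for answer in answers:
            importance[answer] += len(hot_words & set(answer.split()))


def c_fraction(sentence: str) -> float:
    """Fraction of letters equal to 'c', ignoring spaces."""
    letters = sentence.replace(" ", "")
    return letters.count("c") / len(letters)


toy_answers = ["aab bbb", "abc bbb", "bcc abc", "ccc bcc"]
delivered = list(
    evolve(toy_answers, c_fraction, population_size=3, selection_size=1, max_generations=3)
)
print(delivered)  # ['bcc abc', 'ccc bcc']
```

In this toy run the best answer, `"ccc bcc"`, is not in the first population; it only enters once the importance update from the first generation pulls it in. Because `evolve` is a generator, the caller can stop iterating as soon as an answer is good enough, mirroring the abort option in step 7.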

## Load Sentences Dataset (Optional)

If not already loaded, use das-cli to load a sentences dataset containing 100K sentences with 10 words each (words starting with letters a-e).


In [1]:
!das-cli metta load /tmp/100K_sentences_10_words_a-e.metta

das-cli-mongodb-40021 is running on port 40021
das-cli-redis-40020 is running on port 40020
Loading metta file /tmp/100K_sentences_10_words_a-e.metta...
Connecting to Redis at 0.0.0.0:40020
Connecting to MongoDB at 0.0.0.0:40021
Done.


## Setup

Initialize the `hyperon` MeTTa environment and create a helper function that runs a MeTTa program and prints its results.


In [1]:
import hyperon

metta = hyperon.MeTTa()
def run(program='!(+ 1 2)'):
    for result in metta.run(program):
        for child in result:
            print(child)

## Import DAS Module

Import the DAS module into the MeTTa environment.


In [2]:
run('!(import! &self das)')

()


## Connect to DAS

Bind a DAS connection to the `&das` space. The first parameter specifies the client's host and port range (47000-47999); the second must be a known peer address (e.g. the Query Agent at localhost:40002).

On macOS, use `host.docker.internal` as the notebook host (e.g. `host.docker.internal:47000-47999`).


In [3]:
run('!(bind! &das (new-das! (localhost:47000-47999) (localhost:40002)))')

()
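
To switch between the Linux and macOS client hosts without retyping the expression, the bind command can be built from its parts. The helper `bind_das_command` below is a hypothetical convenience, not part of DAS or hyperon; it only formats the same `new-das!` expression used in the cell above.

```python
def bind_das_command(
    client_host: str = "localhost",
    ports: str = "47000-47999",
    peer: str = "localhost:40002",
) -> str:
    """Build the MeTTa expression that binds a DAS connection to &das."""
    return f"!(bind! &das (new-das! ({client_host}:{ports}) ({peer})))"


# macOS variant: only the client host changes.
macos_command = bind_das_command(client_host="host.docker.internal")
print(macos_command)
```

The resulting string can be passed to the `run` helper defined in the Setup section, e.g. `run(macos_command)`.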


## Check the available DAS services

This command will return all available services as (endpoint \<command\>)

In [4]:
run('!(das-services!)')

()
DAS Services (peer <command>) [5/5]:
  - 0.0.0.0:40005 <query_evolution>
  - 0.0.0.0:40004 <inference>
  - 0.0.0.0:40003 <link_creation>
  - 0.0.0.0:40006 <context>
  - 0.0.0.0:40002 <pattern_matching_query>


## Simple Query: Find Words in Sentence

Find all words contained in a specific sentence. This is a basic pattern match to verify the dataset is loaded.


In [5]:
run('!(match &das (Contains (Sentence "acc eaa eec bbb bbe ceb cee cbe bbd ebe") (Word $W)) $W)')

"bbe"
"acc"
"ceb"
"cee"
"cbe"
"bbd"
"ebe"
"eec"
"eaa"
"bbb"


## Create an Attention Broker Context

Now we create a context in the Attention Broker (via Context Broker) that will be used by our evolution algorithm.


In [6]:
run('!(das-create-context! context ((Contains $sentence1 $word1) ((0 $sentence1) ($sentence1 $word1)) ()))')

((name context) (key 5c18ef72771564b7f43c497dc507aeab))


## Define Evolution Query Parameters

- **Query definition**: Pattern to search for sentences containing the word "bbb"

In [7]:
run('''(= (query) (Contains $sentence1 (Word "bbb")))''')

- **Fitness function (ff)**: Fitness(sentence) = count(c, sentence) / length(sentence), i.e. the number of occurrences of a given character `c` divided by the sentence length. Both values are computed after spaces are removed from the sentence.

In [8]:
run('''
(= (str-length $s) (* ((py-dot "" len) $s) 1.0))
(= (count-letters $s $c) (* ((py-dot $s count) $c) 1.0))
(= (remove-spaces $s) ((py-dot $s replace) " " ""))
(= (prep-sentence $s) (remove-spaces (index-atom $s 1)))
(= 
  (ff $s $c) 
  (/ 
    (count-letters (prep-sentence $s) $c) 
    (str-length (prep-sentence $s))
  )
)
''')
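
For readers less familiar with MeTTa, the same computation can be written in plain Python. This sketch assumes that `(index-atom $s 1)` extracts the string from a `(Sentence "...")` expression, so the Python function takes that string directly; the name `fitness` is chosen here for illustration.

```python
def fitness(sentence_text: str, char: str) -> float:
    """Occurrences of `char` divided by the number of letters, ignoring spaces."""
    letters = sentence_text.replace(" ", "")
    return letters.count(char) / len(letters)


# The sentence used in the simple query above has 6 "c" letters out of 30.
print(fitness("acc eaa eec bbb bbe ceb cee cbe bbd ebe", "c"))  # 0.2
```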

- **Correlation parameters**: These parameters define how the Hebbian Network inside the AttentionBroker should be updated for each query answer of the main query above.

For each query answer, the correlation query (defined in `correlation-queries`) is used to make another query. Before it is issued, elements of the correlation query are replaced by elements of the query answer. These substitutions are determined by `correlation-replacements`, a list of pairs that map each element of the correlation query to the element of the query answer that replaces it.

Once rewritten, the correlation query is issued and `correlation-mappings` is used to update the Hebbian Links. `correlation-mappings` is a list of pairs `(source, target)`, where `source` is an element of the original query answer being correlated and `target` is an element of the correlation query's answer. Each pair indicates that the Hebbian Link `source` -> `target` should be created in the proper context in the AttentionBroker, or, if it already exists, updated by increasing its link count by 1.

In [9]:
run('''
; Template queries used to find correlated atoms in the knowledge base after the initial query.
(=
  (correlation-queries)
  (
    (Contains $placeholder1 $word1)
  )
)

; Variable substitution maps that specify which correlation query variables should be replaced 
; with actual values from the initial query answers.
(=
  (correlation-replacements)
  (
    (placeholder1 sentence1)
  )
)

; Defines which elements from initial and correlation query answers should be linked together for 
; attention allocation updates (Hebbian Network).
(=
  (correlation-mappings)
  (
    (sentence1 word1)
  )
)
''')
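
The interplay of the three correlation parameters can be illustrated with a small Python model. Everything here is a hypothetical sketch: the tiny `KB`, the helper names, and the `Counter` standing in for Hebbian Link counts are assumptions made for illustration, not the AttentionBroker's data structures.

```python
from collections import Counter

# Toy knowledge base of (Contains sentence word) links; purely illustrative.
KB = [("s1", "bbb"), ("s1", "acc"), ("s2", "bbb")]


def rewrite(query, replacements, answer):
    """Replace placeholder variables in the correlation query with answer values."""
    substitution = {"$" + placeholder: answer[source] for placeholder, source in replacements}
    return tuple(substitution.get(term, term) for term in query)


def run_contains(query):
    """Match a ("Contains", sentence, word) pattern; '$'-prefixed terms are variables."""
    _, sentence, word = query
    for kb_sentence, kb_word in KB:
        bindings, matched = {}, True
        for term, value in ((sentence, kb_sentence), (word, kb_word)):
            if term.startswith("$"):
                bindings[term[1:]] = value
            elif term != value:
                matched = False
        if matched:
            yield bindings


def correlate(answer, query, replacements, mappings, hebbian):
    """Issue the rewritten correlation query and count source -> target links."""
    for correlation_answer in run_contains(rewrite(query, replacements, answer)):
        for source, target in mappings:
            hebbian[(answer[source], correlation_answer[target])] += 1


hebbian = Counter()
correlate(
    answer={"sentence1": "s1"},                      # one answer of the main query
    query=("Contains", "$placeholder1", "$word1"),   # correlation query
    replacements=[("placeholder1", "sentence1")],    # correlation-replacements
    mappings=[("sentence1", "word1")],               # correlation-mappings
    hebbian=hebbian,
)
print(dict(hebbian))  # {('s1', 'bbb'): 1, ('s1', 'acc'): 1}
```

Here the placeholder is replaced by the sentence from the main query answer, the rewritten query returns every word of that sentence, and each (sentence, word) pair gets its link count increased by 1.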

## Check Current Evolution Parameters

Display the current DAS evolution parameters to see default settings.


In [10]:
run('!(das-get-params!)')

()
DAS Parameters:
  Networking:
    !(das-set-param! (hostname host.docker.internal))
    !(das-set-param! (port_lower 47000))
    !(das-set-param! (port_upper 47999))
    !(das-set-param! (known_peer_id localhost:40002))
  Context:
    !(das-set-param! (context context))
    !(das-set-param! (use_cache true))
    !(das-set-param! (enforce_cache_recreation false))
    !(das-set-param! (initial_rent_rate 0.25))
    !(das-set-param! (initial_spreading_rate_lowerbound 0.5))
    !(das-set-param! (initial_spreading_rate_upperbound 0.7))
  Query:
    !(das-set-param! (max_answers 100))
    !(das-set-param! (max_bundle_size 1000))
    !(das-set-param! (count_flag false))
    !(das-set-param! (attention_update_flag false))
    !(das-set-param! (unique_assignment_flag true))
    !(das-set-param! (positive_importance_flag false))
    !(das-set-param! (populate_metta_mapping true))
    !(das-set-param! (use_metta_as_query_tokens true))
  Evolution:
    !(das-set-param! (elitism_rate 0.08))
    !

## Set Maximum Generations

Configure the evolution to run for 5 generations only. Each generation refines the search based on correlation analysis.


In [11]:
run('!(das-set-param! (max_generations 5))')

()
DAS Param Updated: 'max_generations': UnsignedInt(5)


## Run Evolution Query

Execute an evolution-based query that:
1. Searches for sentences containing "bbb"
2. Analyzes the frequency of letter "c" in matching sentences
3. Evolves over 5 generations to find sentences with optimal "c" frequency
4. Uses correlation mappings to refine results across generations


In [12]:
run('''!(das-evolution! (!(query) (ff $sentence1 "c") !(correlation-queries) !(correlation-replacements) !(correlation-mappings)) $sentence1)''')

(Sentence "bbb bdb ddb aad bcb bbd dba ada aeb bbb")
(Sentence "ebb abd dbe dde ebd abb ace bbb aba ebd")
(Sentence "cdd bbb aab adb bea cbb dab eaa bbd bae")
(Sentence "eae add bad acb aeb bbb cda bdb dbd dee")
(Sentence "bbb eda ece eae eeb eae eab aca dda eed")
(Sentence "eaa ddb aec ebe bde dae ebe add bbb cda")
(Sentence "bbb cbb ecb bee dde ebc bad eeb bbe edb")
(Sentence "cde eab ddd dbb eed ebe cde bbb ebb cad")
(Sentence "cca cba aea bbb eea aee bdb aad aba ded")
(Sentence "bdc bde bdd cee bee bbb bbd bee ebd dce")
(Sentence "cbd bad bdb beb bcd bee caa aed bbb ade")
(Sentence "eba cae dad bbb edd cdb aee bab abc bcb")
(Sentence "deb aec bed bee bbb dea cec aca ddb beb")
(Sentence "cbd bda ada ccd deb dae dbd dee cea bbb")
(Sentence "dbe ebb eea ceb dcb ace aba bbb edc bed")
(Sentence "aab dee aac cdc bea dba ecb ebd bbb dea")
(Sentence "aee adc bbb abc eae aed dad acc daa bae")
(Sentence "dbe cab eab dce dba bce eae bbb ded cba")
(Sentence "bdb eea dcd ded eac eda bbb ceb bac

## Test Fitness Function

Calculate the frequency of the letter "c" in the first two and the last two sentences returned by the evolution query.


In [13]:
# First two sentences
run('!(ff (Sentence "bbb bdb ddb aad bcb bbd dba ada aeb bbb") "c")')
run('!(ff (Sentence "ebb abd dbe dde ebd abb ace bbb aba ebd") "c")')
# Last two sentences
run('!(ff (Sentence "ccb ccc cdd bbb dcc add ebe dbb dcd ecc") "c")')
run('!(ff (Sentence "dec bbe ccb dec dda cca bbb bec cca ccc") "c")')

0.03333333333333333
0.03333333333333333
0.36666666666666664
0.4
