# Tutorial: Controlled Stream Generation

We will generate words and a Lexicon with minimal feature overlap between the words. Next, we introduce the three main ways to generate random streams, which differ in how the transition probabilities (TPs) between their syllables are structured: word-structured TPs, fully random (uniform) TPs, and position-controlled TPs.
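To make the notion of a transition probability concrete, here is a minimal sketch in plain Python. It does not use `arc`; the syllables, the two example words, and the helper `transition_probabilities` are hypothetical and only illustrate how a word-structured stream yields high TPs inside words and lower TPs across word boundaries.

```python
from collections import Counter


def transition_probabilities(stream):
    """Estimate P(next syllable | current syllable) from a syllable sequence."""
    pair_counts = Counter(zip(stream, stream[1:]))
    # Count each syllable only where it has a successor (the last one has none).
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


# Two hypothetical three-syllable words (assumption, not taken from arc).
example_words = [["pa", "bi", "ku"], ["go", "la", "tu"]]
word_order = [0, 1, 0, 0, 1, 1, 0, 1]
example_stream = [syll for i in word_order for syll in example_words[i]]

tps = transition_probabilities(example_stream)
print(tps[("pa", "bi")])  # within-word transition: 1.0
print(tps[("ku", "go")])  # across a word boundary: 0.75
```

In a fully random (uniform) stream, by contrast, no transition would be systematically more likely than the others.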

First, we generate or reload the words register (see the arc types tutorial).

In [None]:
import os

from arc import load_words, load_phonemes, make_syllables, make_words

FORCE_RECOMPUTE = False

if os.path.exists("words.json") and not FORCE_RECOMPUTE:
    print("Load words...")
    words = load_words("words.json")
    print(words)
else:
    print("Load phonemes...")
    phonemes = load_phonemes()
    print(phonemes)

    print("Make syllables...")
    syllables = make_syllables(phonemes)
    print(syllables)

    print("Make words...")
    words = make_words(syllables)
    print(words)

    print("Save words ...")
    words.save("words.json")

## Lexicon

Now we generate lexica with minimal feature repetitiveness. Let's start with 4 words each.

By default, the function generates at most 5 `Lexicon`s. Here we request up to 20 via `n_lexicons` and print some info for each one that is returned.

In [None]:
from arc import make_lexicons

lexicons = make_lexicons(words, n_lexicons=20, n_words=4)

for lexicon in lexicons:
    print("Lexicon:", lexicon)
    print("Info:", lexicon.info)
    print("")

By default, Lexicons with the minimum possible cumulative overlap between the word features will be generated first, starting at zero overlap. If it is not possible to generate all the requested Lexicons with the given parameters, the allowed overlap will be increased, which will be indicated by a warning message.

This process is repeated until any of the following is true:
- the requested number of Lexicons has been generated
- the maximum allowed overlap is reached (set via `max_overlap`)
- the set of all word combinations is exhausted

If one or more Lexicons are returned, each one's `.info` field holds the cumulative overlap across all of its word pairs, as well as the maximum pairwise overlap used.
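The relaxation procedure described above can be sketched in plain Python. This is an illustration under stated assumptions, not the implementation behind `make_lexicons`: the feature sets, the overlap measure (number of shared features), and the function `make_lexicons_sketch` are all hypothetical.

```python
from itertools import combinations


def pairwise_overlap(features_a, features_b):
    """Hypothetical overlap measure: number of features two words share."""
    return len(features_a & features_b)


def make_lexicons_sketch(word_features, n_lexicons, n_words, max_overlap):
    """Collect lexicons, raising the allowed pairwise overlap only when needed."""
    words = sorted(word_features)
    found, seen = [], set()
    for allowed in range(max_overlap + 1):  # start at zero overlap
        for combo in combinations(words, n_words):
            if combo in seen:
                continue
            overlaps = [
                pairwise_overlap(word_features[a], word_features[b])
                for a, b in combinations(combo, 2)
            ]
            if max(overlaps, default=0) <= allowed:
                seen.add(combo)
                found.append({
                    "words": combo,
                    "cumulative_overlap": sum(overlaps),
                    "max_pairwise_overlap": allowed,
                })
                if len(found) == n_lexicons:  # requested number reached
                    return found
    # Overlap limit reached or all word combinations exhausted.
    return found


# Toy feature sets (assumption): "d" shares one feature with "a" and with "b".
features = {"a": {1, 2}, "b": {3, 4}, "c": {5, 6}, "d": {1, 3}}
sketch_lexicons = make_lexicons_sketch(features, n_lexicons=5, n_words=2, max_overlap=1)
for entry in sketch_lexicons:
    print(entry)
```

With this toy data, four lexicons are found at zero overlap, and the fifth is only found after the allowed overlap is raised to 1, which corresponds to the situation where the library would emit a warning.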

## Compatible Streams

We want to generate a complete set of compatible lexicons for our study, i.e. a compatible set of streams for testing statistical learning hypotheses. If `streams` is empty, try increasing the allowed maximum rhythmicity (`max_rhythmicity`).

The `.info` field of each stream holds some useful information about it: which Lexicon was used to generate it, the rhythmicity indexes achieved for each feature, and which randomization/TP-structure mode was used.

In [None]:
from arc import make_streams

streams = make_streams(lexicons, max_rhythmicity=0.1)

print("Streams Summary:", streams)
print("Info:", streams.info)
print("")

for stream in streams:
    print("Stream:", stream)
    stream_info = {k: str(v) for k, v in stream.info.items()}
    print("Info:", stream_info)
    print("")