# Tutorial 2

We will load the words and generate a Lexicon with minimal feature overlap between the words. Then, we will look at the two main ways of generating random streams: `Word`-based and `Syllable`-based stream generation.

## Lexicon

First, we load the words from tutorial 1.

In [None]:
from arc import load_words

words = load_words("words.json")
print(words)

Great, now we generate minimum-overlap lexicons with 4 words each. By default, the function generates at most 5 `Lexicon`s. Let's generate 2 and print some info.

In [None]:
from arc.generation.lexicons import make_lexicons_from_words
from arc import Lexicon
from typing import Tuple

lexicons: Tuple[Lexicon, ...] = make_lexicons_from_words(words, n_lexicons=2)

for lexicon in lexicons:
    print(lexicon, lexicon.info)

By default, Lexicons with the minimum possible cumulative overlap between the word features will be generated first, starting at zero overlap. If it is not possible to generate all the requested Lexicons with the given parameters, the allowed overlap will be increased, which will be indicated by a warning message.

This process is repeated until any of the following is true:
- the requested number of Lexicons has been generated
- the maximum allowed overlap is reached (set via `max_overlap`)
- the set of all word combinations is exhausted

If one or more Lexicons are returned, their `info` fields hold the cumulative overlap across all word pairs in the Lexicon, as well as the maximum pairwise overlap that was allowed when generating it.
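The search procedure described above can be sketched in plain Python. This is a minimal illustration and not the code used by `make_lexicons_from_words`: the toy feature sets, the function `find_lexicons`, and the info keys are all hypothetical, and the sketch assumes that the "allowed overlap" is a cap on the number of features shared by any pair of words.

```python
from itertools import combinations

# Hypothetical toy data: each word is described by a set of feature labels.
TOY_FEATURES = {
    "A": {"f1", "f2"},
    "B": {"f3", "f4"},
    "C": {"f1", "f5"},
    "D": {"f6", "f7"},
    "E": {"f3", "f6"},
}


def pair_overlap(a, b, features):
    """Number of features shared by two words."""
    return len(features[a] & features[b])


def find_lexicons(features, n_words, n_lexicons, max_overlap):
    """Collect word combinations, relaxing the allowed pairwise overlap step by step."""
    found = []
    seen = set()
    # Start at zero overlap and stop once max_overlap has been tried.
    for allowed in range(max_overlap + 1):
        for candidate in combinations(sorted(features), n_words):
            if candidate in seen:
                continue
            overlaps = [pair_overlap(a, b, features) for a, b in combinations(candidate, 2)]
            if max(overlaps, default=0) <= allowed:
                seen.add(candidate)
                info = {
                    "cumulative_overlap": sum(overlaps),
                    "max_pairwise_overlap": allowed,
                }
                found.append((candidate, info))
                # Stop as soon as the requested number has been generated.
                if len(found) == n_lexicons:
                    return found
    # Reached when max_overlap is hit or all combinations are exhausted.
    return found


for lexicon, info in find_lexicons(TOY_FEATURES, n_words=3, n_lexicons=3, max_overlap=1):
    print(lexicon, info)
```

With this toy data only two zero-overlap combinations exist, so the third one is found only after the allowed overlap is raised to 1, which mirrors the situation in which the real function emits its warning.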

## Stream

### Single Stream

The stream generation internally generates a Lexicon first, and then a Stream based on that. The following cell, however, generates a stream directly from words for convenience. The cell should execute quickly. If it doesn't, try reducing the number of words per lexicon or increasing the allowed rhythmicity index.

In [None]:
print(words.info)

In [None]:
from arc.generation.stream import make_stream_from_words

stream = make_stream_from_words(words, rand_mode="word", n_words=4, max_rhythmicity=0.1)

print("")

print(stream)

print("")

for key, val in stream.info.items():
    print(f"{key}: {str(val)}")
    print("")

As you can see, the `.info` field holds some useful information about the generated stream, namely which Lexicon was used to generate it, the rhythmicity indexes achieved for each feature, and which randomization mode was used. The randomization mode can be `syllable` or `word`. You can verify immediately that the randomization mode here is `word`, since the individual words of the Lexicon can be recognized in the stream. In contrast, syllable-level randomization means that the words are further broken down into syllables, and the syllables are shuffled across the whole lexicon, destroying word-level information.
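To make the difference between the two modes concrete, here is a small self-contained sketch using only the standard library. It is not how `make_stream_from_words` works internally (the real generator also controls rhythmicity); the toy lexicon and the helper names `word_stream`, `syllable_stream`, and `count_intact_words` are hypothetical.

```python
import random

# Hypothetical toy lexicon: four words of three syllables each.
TOY_LEXICON = [
    ("pa", "ti", "ku"),
    ("go", "la", "bu"),
    ("da", "ro", "pi"),
    ("be", "mo", "su"),
]


def word_stream(lexicon, n_repeats, rng):
    """Shuffle whole words; the syllable order inside each word is preserved."""
    stream = []
    for _ in range(n_repeats):
        order = list(lexicon)
        rng.shuffle(order)
        for word in order:
            stream.extend(word)
    return stream


def syllable_stream(lexicon, n_repeats, rng):
    """Shuffle all syllables of the lexicon, ignoring word boundaries."""
    syllables = [syllable for word in lexicon for syllable in word]
    stream = []
    for _ in range(n_repeats):
        order = list(syllables)
        rng.shuffle(order)
        stream.extend(order)
    return stream


def count_intact_words(stream, lexicon):
    """Count positions in the stream where a complete lexicon word starts."""
    word_set = set(lexicon)
    size = len(lexicon[0])
    return sum(tuple(stream[i:i + size]) in word_set for i in range(len(stream) - size + 1))


rng = random.Random(0)
by_word = word_stream(TOY_LEXICON, n_repeats=5, rng=rng)
by_syllable = syllable_stream(TOY_LEXICON, n_repeats=5, rng=rng)

print("word mode:    ", "|".join(by_word))
print("syllable mode:", "|".join(by_syllable))
print("intact words (word mode):    ", count_intact_words(by_word, TOY_LEXICON))
print("intact words (syllable mode):", count_intact_words(by_syllable, TOY_LEXICON))
```

Both streams contain exactly the same syllables the same number of times. Only the word-mode stream is guaranteed to contain every word intact in each repetition, which is what makes the two modes useful as contrasting conditions.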

Next, we will use this distinction to generate a compatible set of streams for testing statistical learning hypotheses.

### Set of Compatible Streams

If this runs quickly, we can step it up and generate a complete set of compatible streams for our study. If `streams` is empty, try increasing the allowed maximum rhythmicity.

In [None]:
from arc.generation.stream import make_compatible_streams

streams = make_compatible_streams(words, n_words=4, max_rhythmicity=0.1)

for i, stream in enumerate(streams):
    print("========= Stream Nr. ", i + 1, " =========")
    
    print("")
    
    print(stream)
    
    print("")
    
    for key, val in stream.info.items():
        print(f"{key}: {str(val)}")
        print("")
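If `streams` comes back empty, the advice above (raise the allowed maximum rhythmicity) can be automated with a small retry helper. The sketch below is generic and does not call the library: `first_nonempty` and `fake_generate` are hypothetical names, and the stand-in generator only imitates a function that returns an empty collection when the threshold is too strict.

```python
def first_nonempty(generate, thresholds):
    """Try each threshold in order and return the first one that yields a non-empty result."""
    for threshold in thresholds:
        result = generate(threshold)
        if result:
            return threshold, result
    # No threshold produced anything.
    return None, []


def fake_generate(max_rhythmicity):
    """Stand-in generator: succeeds only once the threshold is loose enough."""
    return ["stream"] if max_rhythmicity >= 0.2 else []


threshold, result = first_nonempty(fake_generate, [0.1, 0.15, 0.2, 0.3])
print(threshold, result)
```

In this notebook, `generate` could be something like `lambda r: make_compatible_streams(words, n_words=4, max_rhythmicity=r)`, assuming that an empty result is falsy. Keep the list of thresholds short and ordered from strict to loose, so that the strictest working value is the one that gets used.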