# Tutorial 2

In this tutorial, we load the words and generate a `Lexicon` with minimal feature overlap between its words. We then cover the two main ways to generate random streams: `Word`-based and `Syllable`-based stream generation.

First, we load the words from tutorial 1.

In [None]:
from arc import load_words

words = load_words("arc_Words.json")
print(words)

Great, now we generate minimum-overlap lexicons with 6 words each. By default, the generator yields at most 10 `Lexicon`s. Let's generate 20 and print some info.

In [None]:
from arc.data import make_lexicon_generator

lexicon_generator = make_lexicon_generator(words, n_words=6, max_yields=20)

for lex in lexicon_generator:
    print(lex, lex.info)
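To build some intuition for what "minimum overlap" means, here is a toy sketch in plain Python. It does not use the library and is not how `make_lexicon_generator` works internally; the words, the feature labels, and the helper names `cumulative_overlap` and `best_toy_lexicons` are all made up for illustration. The assumption is simply that overlap is counted as the number of features shared by each pair of words, summed over all pairs in a candidate lexicon.

```python
from itertools import combinations

# Toy data: each "word" maps to a set of made-up feature labels.
# These are NOT real ARC words or features.
toy_features = {
    "bado": {"plosive", "voiced"},
    "kile": {"plosive", "liquid"},
    "sunu": {"fricative", "nasal"},
    "moga": {"nasal", "voiced"},
    "tefi": {"plosive", "fricative"},
}


def cumulative_overlap(words, features):
    """Sum the number of shared features over all pairs of words (assumed measure)."""
    return sum(len(features[a] & features[b]) for a, b in combinations(words, 2))


def best_toy_lexicons(features, n_words, max_yields):
    """Return up to max_yields word tuples, lowest cumulative overlap first."""
    candidates = combinations(sorted(features), n_words)
    ranked = sorted(candidates, key=lambda ws: cumulative_overlap(ws, features))
    return ranked[:max_yields]


for lexicon in best_toy_lexicons(toy_features, n_words=3, max_yields=2):
    print(lexicon, cumulative_overlap(lexicon, toy_features))
```

Enumerating every combination only works for a handful of words; it is meant to show the idea, not to scale.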

Next, we generate a stream. The following should run quickly. If it doesn't, try reducing the number of words per lexicon (`n_words`) or increasing the allowed maximum rhythmicity index (`max_rhythmicity`).

In [None]:
from arc.data import make_stream_from_words

stream = make_stream_from_words(words, rand_mode="word", n_words=4, max_rhythmicity=0.1)
print(stream)

for key, val in stream.info.items():
    print(f"{key}: {val}")

If this runs quickly, we can step it up and generate a complete set of compatible streams for our study. If `streams` is empty, try increasing the allowed maximum rhythmicity.

In [None]:
from arc.data import make_compatible_streams

streams = make_compatible_streams(words, n_words=4, max_rhythmicity=0.1)

streams = [] if streams is None else streams

for i, stream in enumerate(streams):
    print("========= Stream Nr. ", i + 1, " =========")
    print("")
    for key, val in stream.info.items():
        print(f"{key}: {val}")
    print("lexicon cumulative overlap: ", stream.info["lexicon"].info["cumulative_overlap"])
    print("")
    print(stream)
    print("")
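If no streams come back, relaxing the rhythmicity threshold by hand gets tedious. The following is a minimal sketch of a retry loop that tries increasingly permissive thresholds until something is found. The helper `relax_until_found` and the stand-in `fake_generate` are hypothetical names introduced here, not part of the library; the stand-in only exists so the sketch runs without the words file.

```python
def relax_until_found(generate, thresholds):
    """Try each threshold in order and return (threshold, result) for the
    first non-empty result, or (None, []) if every attempt comes back empty.

    `generate` is any callable that takes a threshold and returns a list or None.
    """
    for threshold in thresholds:
        result = generate(threshold)
        if result:  # skips both None and an empty list
            return threshold, result
    return None, []


# Stand-in for the real call (hypothetical): pretends that streams only
# exist once the threshold reaches 0.2.
def fake_generate(max_rhythmicity):
    return ["stream"] if max_rhythmicity >= 0.2 else None


threshold, found = relax_until_found(fake_generate, [0.1, 0.15, 0.2, 0.3])
print(threshold, found)
```

With the objects from this tutorial, `generate` could be something like `lambda r: make_compatible_streams(words, n_words=4, max_rhythmicity=r)`, reusing the call shown above.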

A stream is a collection of syllables with repetition, i.e. a `Word` in our framework. It also comes with some extra info about its feature rhythmicity indexes, the lexicon that was used to generate it, and how the stream was generated, i.e. whether the random generation is at the syllable or at the word level (`rand_level={word, syllable}`). At the word level, the words from a lexicon are randomized, but each word is left intact. Syllable level means that the words are further broken down into syllables, and the syllables are shuffled across the whole lexicon, destroying word-level information.

In [None]:
print(streams[0].info["lexicon"])
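The difference between word-level and syllable-level randomization described above can be illustrated with a toy sketch in plain Python. This is only an assumption-laden illustration: the lexicon is made up, the function names are hypothetical, and the shuffling is naive. It ignores everything the library does to control rhythmicity, so it should not be read as the library's actual stream generation.

```python
import random

# Made-up three-syllable words; not taken from an actual ARC lexicon.
toy_lexicon = [("ba", "do", "ki"), ("le", "su", "nu"), ("mo", "ga", "te")]


def word_level_stream(lexicon, n_repeats, rng):
    """Shuffle whole words in each repetition; syllable order inside a word is kept."""
    stream = []
    for _ in range(n_repeats):
        words = list(lexicon)
        rng.shuffle(words)
        for word in words:
            stream.extend(word)
    return stream


def syllable_level_stream(lexicon, n_repeats, rng):
    """Pool all syllables of the lexicon and shuffle them in each repetition."""
    pool = [syllable for word in lexicon for syllable in word]
    stream = []
    for _ in range(n_repeats):
        syllables = list(pool)
        rng.shuffle(syllables)
        stream.extend(syllables)
    return stream


rng = random.Random(0)  # fixed seed so repeated runs give the same streams
print("word level:    ", "".join(word_level_stream(toy_lexicon, 2, rng)))
print("syllable level:", "".join(syllable_level_stream(toy_lexicon, 2, rng)))
```

In the word-level output, every consecutive group of three syllables is still one of the original words. In the syllable-level output, the same syllables occur equally often, but the word boundaries are gone.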