# Tutorial 2

In this tutorial, we load the words from tutorial 1 and generate a `Lexicon` with minimal feature overlap between its words. We then look at the two main ways to generate random streams: `Word`-based and `Syllable`-based stream generation.

First, we load the words from tutorial 1.

In [1]:
from arc import load_words

words = load_words("arc_results/arc_Words.json")
print(words)

baːsuːheː|biːhoːseː|boːsiːhøː|buːhiːseː|buːhøːsiː|byːhoːʃeː|byːhøːzuː|bøːʃoːhiː|deːhøːfoː|deːçaːmuː|... (100 elements total)


Great, now we generate minimum-overlap lexicons with 6 words each. By default, the generator yields at most 10 `Lexicon`s. Let's generate 20 and print some info.

In [2]:
from arc.data import make_lexicon_generator

lexicon_generator = make_lexicon_generator(words, n_words=6, max_yields=20)

for lex in lexicon_generator:
    print(lex, lex.info)



tyːfuːhoː|kaːnyːfiː|byːhøːzuː|bøːʃoːhiː|loːkɛːvaː|huːpaːzyː|... (6 elements total) {'cumulative_overlap': 8}
tyːfuːhoː|kaːnyːfiː|byːhøːzuː|bøːʃoːhiː|loːkɛːvaː|huːpaːzyː|... (6 elements total) {'cumulative_overlap': 8}
ʃøːbyːhuː|fiːkɛːreː|zɛːheːbøː|foːløːɡyː|hiːpuːʃɛː|hoːʃeːpaː|... (6 elements total) {'cumulative_overlap': 8}
ʃøːpuːhiː|buːhøːsiː|faːdeːhoː|loːkɛːvaː|ɡɛːlyːfuː|huːpaːzyː|... (6 elements total) {'cumulative_overlap': 8}
buːhøːsiː|faːdeːhoː|loːkɛːvaː|ɡɛːlyːfuː|ʃeːpoːhiː|huːpaːzyː|... (6 elements total) {'cumulative_overlap': 8}




tyːfuːhoː|deːhøːfoː|faːɡiːluː|fiːloːkaː|hiːʃuːpeː|huːpaːzyː|... (6 elements total) {'cumulative_overlap': 8}
ʃøːbyːhuː|fiːkɛːreː|zɛːheːbøː|foːløːɡyː|hiːpuːʃɛː|hoːʃeːpaː|... (6 elements total) {'cumulative_overlap': 8}
ʃøːbyːhuː|vaːniːkoː|fiːkɛːreː|zɛːheːbøː|hiːpuːʃɛː|hoːʃeːpaː|... (6 elements total) {'cumulative_overlap': 8}
kaːløːfyː|tyːfuːhoː|byːhøːzuː|loːkɛːvaː|fiːtoːheː|hiːpuːʃɛː|... (6 elements total) {'cumulative_overlap': 9}
tyːfuːhoː|kaːnyːfiː|byːhøːzuː|bøːʃoːhiː|loːkɛːvaː|huːpaːzyː|... (6 elements total) {'cumulative_overlap': 8}
ʃøːbyːhuː|vaːniːkoː|zɛːheːbøː|foːkɛːryː|hiːpuːʃɛː|hoːʃeːpaː|... (6 elements total) {'cumulative_overlap': 9}
ʃøːbyːhuː|faːɡiːluː|zɛːheːbøː|fuːlyːkɛː|hiːpuːʃɛː|hoːʃeːpaː|... (6 elements total) {'cumulative_overlap': 9}
tyːfuːhoː|doːheːfaː|fiːkɛːreː|foːlaːkøː|ɡɛːzøːmuː|hiːpuːʃɛː|... (6 elements total) {'cumulative_overlap': 9}
tyːfuːhoː|deːhøːfoː|faːɡiːluː|fiːloːkaː|hiːʃuːpeː|huːpaːzyː|... (6 elements total) {'cumulative_overlap': 9}
tyːfuːhoː|vaːniːkoː
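
The printed lexicons differ slightly in their cumulative overlap, so it can be handy to keep only the best ones. The following is a minimal sketch of that selection step. It does not use `arc` itself: `toy_lexicons` and `lowest_overlap` are hypothetical stand-ins that only model the `info["cumulative_overlap"]` entry seen in the output above. Note that a Python generator can only be iterated once, so the sketch first collects the items in a list.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for Lexicon objects: only the `info` dict
# shown in the output above is modelled here.
toy_lexicons = [
    SimpleNamespace(name="lex_a", info={"cumulative_overlap": 9}),
    SimpleNamespace(name="lex_b", info={"cumulative_overlap": 8}),
    SimpleNamespace(name="lex_c", info={"cumulative_overlap": 8}),
]


def lowest_overlap(lexicons):
    """Return all lexicons that share the smallest cumulative overlap."""
    lexicons = list(lexicons)  # materialise: a generator is single-pass
    if not lexicons:
        return []
    best = min(lex.info["cumulative_overlap"] for lex in lexicons)
    return [lex for lex in lexicons if lex.info["cumulative_overlap"] == best]


best_lexicons = lowest_overlap(toy_lexicons)
print([lex.name for lex in best_lexicons])
```

With real data, the same function should work on the items yielded by the lexicon generator, assuming each one exposes `info["cumulative_overlap"]` as printed above.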

Finally, we can generate a stream. The following should run quickly. If it doesn't, try reducing the number of words per lexicon or increasing the allowed rhythmicity index.

In [7]:
from arc.data import make_stream_from_words

stream = make_stream_from_words(words, rand_mode="word", n_words=4, max_rhythmicity=0.1)
print(stream)

for key, val in stream.info.items():
    print(f"{key}: {str(val)}")



hoːbeːsiːmyːtiːçaːzɛːheːbøːfoːkuːreːzɛːheːbøːmyːtiːçaːhoːbeːsiːfoːkuːreːmyːtiːçaːfoːkuːreːhoːbeːsiːzɛːheːbøːfoːkuːreːmyːtiːçaːzɛːheːbøːhoːbeːsiːzɛːheːbøːhoːbeːsiːmyːtiːçaːfoːkuːreːhoːbeːsiːfoːkuːreːzɛːheːbøːmyːtiːçaːfoːkuːreːhoːbeːsiːzɛːheːbøːmyːtiːçaːhoːbeːsiːfoːkuːreːmyːtiːçaːzɛːheːbøːfoːkuːreːzɛːheːbøːhoːbeːsiːmyːtiːçaːhoːbeːsiːmyːtiːçaːfoːkuːreːzɛːheːbøːhoːbeːsiːzɛːheːbøːfoːkuːreːmyːtiːçaːfoːkuːreːhoːbeːsiːmyːtiːçaːzɛːheːbøːmyːtiːçaːhoːbeːsiːzɛːheːbøːfoːkuːreːzɛːheːbøːhoːbeːsiːfoːkuːreːmyːtiːçaːzɛːheːbøːmyːtiːçaːfoːkuːreːhoːbeːsiːzɛːheːbøːfoːkuːreːhoːbeːsiːmyːtiːçaːzɛːheːbøːfoːkuːreːmyːtiːçaːhoːbeːsiːmyːtiːçaːfoːkuːreːzɛːheːbøːhoːbeːsiːfoːkuːreːzɛːheːbøːmyːtiːçaːhoːbeːsiːfoːkuːreːmyːtiːçaːhoːbeːsiːzɛːheːbøːmyːtiːçaːzɛːheːbøːhoːbeːsiːfoːkuːreːhoːbeːsiːzɛːheːbøːmyːtiːçaːfoːkuːreːmyːtiːçaːhoːbeːsiːfoːkuːreːzɛːheːbøːmyːtiːçaːzɛːheːbøːfoːkuːreːhoːbeːsiːmyːtiːçaːfoːkuːreːhoːbeːsiːzɛːheːbøːhoːbeːsiːfoːkuːreːmyːtiːçaːzɛːheːbøːfoːkuːreːzɛːheːbøːhoːbeːsiːmyːtiːçaːfoːkuːreːmyːtiːçaːzɛːheːbøːh
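
To build some intuition for what a word-based random stream looks like, here is a toy sketch. It is not how `arc` builds its streams: `toy_word_stream` is a hypothetical helper, the rule "never repeat a word back-to-back" is an assumption made for illustration, and rhythmicity is ignored entirely. The four words are the ones that appear in the stream printed above.

```python
import random


def toy_word_stream(words, n_repetitions, seed=0):
    """Pick words at random, never repeating a word back-to-back.

    Toy illustration only: no rhythmicity control and no guarantee that
    every word occurs equally often.
    """
    if len(set(words)) < 2:
        raise ValueError("need at least two distinct words")
    rng = random.Random(seed)  # seeded for reproducibility
    sequence = []
    for _ in range(n_repetitions * len(words)):
        # Exclude the previous word so it cannot occur twice in a row.
        candidates = [w for w in words if not sequence or w != sequence[-1]]
        sequence.append(rng.choice(candidates))
    return sequence


toy_words = ["hoːbeːsiː", "myːtiːçaː", "zɛːheːbøː", "foːkuːreː"]
toy_sequence = toy_word_stream(toy_words, n_repetitions=5)
print("".join(toy_sequence))
```

Joining the words without separators mirrors the continuous syllable stream shown in the output above.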

If this runs quickly, we can step it up and generate a complete set of compatible streams for our study.

In [None]:
from arc.data import make_compatible_streams
streams = make_compatible_streams(words, n_words=4, max_rhythmicity=0.09)

for i, stream in enumerate(streams):
    print("========= Stream Nr. ", i + 1, " =========")
    print("")
    for key, val in stream.info.items():
        print(f"{key}: {str(val)}")
    print("lexicon cumulative overlap: ", stream.info["lexicon"].info["cumulative_overlap"])
    print("")
    print(stream)
    print("")

