# Tutorial 1
You will learn the basics of saving and loading data with the core ARC types.

## Phonemes
Phonemes are the atomic unit of the ARC type system and form the basis for constructing other types like Syllables and Words.
To enjoy the full functionality of ARC, you'll need Phonemes with the phonetic feature fields filled. Luckily, ARC comes with an extensive corpus of Phonemes and phonetic features.
Let's load them and see what they look like.

In [1]:
from arc import load_default_phonemes
phonemes = load_default_phonemes()
print(phonemes)

k͡p|ɡ͡b|c|ɡ|k|q|ɖ|ɟ|ɠ|ɢ|... (5175 elements total)


The `phonemes` variable is a collection of `Phoneme` objects, more specifically an `ARC-Collection`. What you see when you print any `ARC-Collection` is a short summary of its top-level elements.
You can treat the `ARC-Collection` like most Python collection types, meaning you can access elements, iterate over it, and so on.

> Note: Internally, `ARC-Collection`s are `OrderedDict`s (with some extra convenience methods). This means you can treat them like both of the Python built-in types `dict` and `list`.

Let's see that in action.
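
The sketch below illustrates the dict-like and list-like access patterns using a plain `collections.OrderedDict` as a stand-in. It does not use ARC itself; the keys and the placeholder values are assumptions made for illustration, and an actual `ARC-Collection` may offer more convenient shortcuts than the ones shown here.

```python
from collections import OrderedDict
from itertools import islice

# Stand-in for an ARC collection: keys are phoneme symbols, values are
# placeholder dicts (hypothetical, not ARC's real feature fields).
collection = OrderedDict(
    (symbol, {"id": symbol}) for symbol in ["k", "ɡ", "q", "c"]
)

# List-like view: iterating yields the keys in insertion order.
keys_in_order = list(collection)

# Dict-like access: look up an element by its key.
by_key = collection[keys_in_order[1]]

# List-like access: take the element at position 1 in insertion order.
by_position = next(islice(collection.values(), 1, None))

# Both routes lead to the same object.
print(by_key is by_position, len(collection))
```

Keyed lookup is convenient when you know which element you want, while positional access and iteration are convenient when you only care about order.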

## Syllables
Our first composite type is the `Syllable`, consisting of a list of `Phoneme`s. Let's make a collection of syllables that follow the `cV` pattern, meaning they consist of a single-character phoneme `c` followed by a long vowel `V`.

In [2]:
from arc.data import make_feature_syllables
syllables = make_feature_syllables(phonemes, phoneme_pattern="cV")
print(syllables)

cʔː|cɥː|cɰː|cʋː|cʍː|cjː|cwː|cɹː|cɻː|cɑː|... (2108 elements total)


Since we started with an international Phoneme corpus, there may be many Syllables that we do not want to include in our further analysis. Let's filter out some of them.

We'll start by filtering based on a corpus of syllables. ARC comes with an example corpus in German, which is used whenever you call a filter without supplying a path to a custom file.

> The filter implementations are specific to the corpus, so you might want to implement your own filters. We will discuss that in a later tutorial. If you are curious, you can take a look at the `arc.filter` submodule to see how to implement a filter.
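
As a rough idea of what a corpus-based filter does, here is a minimal sketch that keeps only the candidates found in a set of attested syllables. It works on a plain `OrderedDict` rather than on ARC types, and `filter_by_allowed_set` is a hypothetical helper, not a function from `arc.filter`.

```python
from collections import OrderedDict


def filter_by_allowed_set(items, allowed):
    """Keep only entries whose key appears in `allowed` (hypothetical helper)."""
    return OrderedDict(
        (key, value) for key, value in items.items() if key in allowed
    )


# Toy stand-ins: candidate syllables and a tiny "corpus" of attested ones.
candidates = OrderedDict((s, None) for s in ["daː", "cʔː", "diː", "cɥː"])
attested = {"daː", "diː", "viː"}

filtered = filter_by_allowed_set(candidates, attested)
print("|".join(filtered))  # daː|diː
```

Returning a new collection instead of modifying the input mirrors the reassignment style used in the cells of this tutorial.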

In [3]:
from arc.filter import filter_with_corpus
syllables = filter_with_corpus(syllables)
print(syllables)

daː|diː|viː|ziː|zoː|taː|keː|haː|ɡeː|neː|... (130 elements total)


In our original publication, we filter syllables based on the p-value of a test of whether each syllable is uniformly distributed with respect to the others. We provide a filter for that:

In [4]:
from arc.filter import filter_uniform_syllables
syllables = filter_uniform_syllables(syllables)
print(syllables)

zuː|toː|moː|deː|biː|niː|kaː|buː|beː|ruː|... (77 elements total)


If you have a native (in our case German) phoneme corpus as well, you can filter the syllables based on that.

> Note that we could have done that at the phoneme level already.

In [5]:
from arc.filter import filter_common_phoneme_syllables
syllables = filter_common_phoneme_syllables(syllables)
print(syllables)

zuː|toː|moː|deː|biː|niː|kaː|buː|beː|ruː|... (76 elements total)


## Export to SSML
Once we are done choosing syllables, we can export them to Speech Synthesis Markup Language (SSML) for later reference.

In [6]:
from arc.io import export_speech_synthesiser
export_speech_synthesiser(syllables)
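
To give a feel for what SSML looks like, the sketch below wraps IPA strings in the standard SSML `<phoneme>` element using only the Python standard library. It is an assumption about a plausible output shape, not the format that `export_speech_synthesiser` actually writes, and `syllables_to_ssml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def syllables_to_ssml(ipa_syllables):
    """Wrap each IPA string in an SSML <phoneme> element (illustrative helper)."""
    speak = ET.Element("speak")
    for ipa in ipa_syllables:
        # `alphabet` and `ph` are the attributes SSML defines for <phoneme>.
        phoneme = ET.SubElement(speak, "phoneme", alphabet="ipa", ph=ipa)
        phoneme.text = ipa
    # encoding="unicode" returns a str and keeps the IPA characters as-is.
    return ET.tostring(speak, encoding="unicode")


ssml = syllables_to_ssml(["zuː", "toː"])
print(ssml)
```

A complete SSML document also declares a version and namespace on the `<speak>` root; they are omitted here to keep the fragment short.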

## Words
`Word`s are made out of `Syllable`s, just as we made syllables from phonemes before.

Since one of ARC's main features is rhythmicity control, our `make_words` function will only create words with minimal overlap of phonotactic features.

In [7]:
from arc.data import make_words
words = make_words(syllables)
print(words)

siːbøːheː|laːɡyːfuː|myːtiːçaː|toːheːfaː|boːlyːçaː|ʃuːmɛːɡyː|ʃɛːkaːmyː|ɡɛːzyːmuː|byːʃaːheː|niːpoːçaː|... (5501 elements total)


Again, we apply some filters, but this time at word level.

In [8]:
from arc.filter import filter_common_onset_words
words = filter_common_onset_words(words)
print(words)

laːɡyːfuː|myːtiːçaː|toːheːfaː|boːlyːçaː|ʃuːmɛːɡyː|ʃɛːkaːmyː|ɡɛːzyːmuː|byːʃaːheː|niːpoːçaː|myːkɛːʃaː|... (5089 elements total)


In [9]:
from arc.filter import filter_gram_stats
words = filter_gram_stats(words)
print(words)



toːheːfaː|deːfaːhiː|loːfiːkuː|fuːnyːɡiː|piːzuːhoː|luːɡaːfoː|hiːbaːsuː|luːkaːfyː|hoːtiːvaː|biːseːhoː|... (1130 elements total)


Even with all the phonotactic conditions we applied, there are still many words to choose from to build our `Lexicon`s and streams later on.

You can always get a random subsample of a collection by running:

In [10]:
words = words.sample(100)
print(words)

ʃeːhoːbøː|riːfoːkuː|huːfiːdoː|reːfoːɡiː|fuːniːkaː|ɡaːsuːmyː|laːkuːfoː|zuːmoːɡɛː|faːryːkoː|faːhoːtyː|... (100 elements total)
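
Conceptually, subsampling a keyed collection amounts to picking random keys and rebuilding the collection from them. The sketch below shows this with the standard library; `sample_collection` and its `seed` parameter are hypothetical, and whether ARC's `sample` method accepts a seed is not shown in this tutorial.

```python
import random
from collections import OrderedDict


def sample_collection(items, k, seed=None):
    """Return a new OrderedDict with k randomly chosen entries (hypothetical helper)."""
    rng = random.Random(seed)  # a private generator keeps the global state untouched
    chosen = rng.sample(list(items), k)
    return OrderedDict((key, items[key]) for key in chosen)


# Toy stand-in for a word collection.
toy_words = OrderedDict(
    (w, None) for w in ["toːheːfaː", "deːfaːhiː", "loːfiːkuː", "fuːnyːɡiː"]
)
subset = sample_collection(toy_words, 2, seed=0)
print(len(subset))  # 2
```

Fixing a seed makes the subsample reproducible, which helps when the same words have to be regenerated later.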


Finally, save the sampled words to a JSON file so that the next tutorial can load them:

In [11]:
words.save()
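
The file layout that `save` produces is not shown here. As a general illustration of storing IPA strings in JSON and reading them back, the following sketch uses only the standard library; the file name and the flat list layout are assumptions.

```python
import json
import tempfile
from pathlib import Path

toy_words = ["toːheːfaː", "deːfaːhiː"]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "words.json"  # hypothetical file name
    # ensure_ascii=False keeps the IPA characters readable in the file
    # instead of escaping them as \uXXXX sequences.
    path.write_text(json.dumps(toy_words, ensure_ascii=False), encoding="utf-8")
    restored = json.loads(path.read_text(encoding="utf-8"))

print(restored == toy_words)  # True
```

Writing and reading with an explicit `utf-8` encoding avoids platform-dependent defaults, which matters for text that is mostly non-ASCII.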

This concludes our first tutorial. You've made `Syllable`s from `Phoneme`s and `Word`s from `Syllable`s and applied filters to them. Finally, you saved the generated words to a JSON file. In the next tutorial, we will pick up where we left off and load these words to generate a `Lexicon`, a list of `Word`s with specific phonotactic requirements, and use the lexicon to generate different types of streams.