# Use your own data

In this tutorial, we discuss two ways of using your own data:

1. You have one or more lexicons you want to evaluate and generate streams with
2. You already have streams and just want to evaluate them

If you want to extend ARC, you are welcome to contribute to the [ARC Project](https://github.com/milosen/arc).


## 1. Loading/creating your custom lexicon

Let's say you have a lexicon consisting of the (pseudo-)words 'piɾuta', 'baɡoli', 'tokuda', and 'ɡuhaɪbo'.

We assume you have prepared your lexicon as a list of lists (see below), and that all syllables are of the same type. The function `to_lexicon()` accepts the syllable types we call 'cv' and 'cV'. A 'cv' syllable consists of a single-character consonant and a short vowel, e.g. 'pi'. Because it is common in the literature, 'cv' also allows diphthongs, e.g. 'haɪ'. A 'cV' syllable consists of a single-character consonant together with a long vowel, e.g. 'tuː'.

In [9]:
from arc import to_lexicon

raw_lexicon = [
  ['pi', 'ɾu', 'ta'],
  ['ba', 'ɡo', 'li'],
  ['to', 'ku', 'da'],
  ['ɡu', 'haɪ', 'bo']
]

lexicon = to_lexicon(raw_lexicon, syllable_type="cv")

print(lexicon)

print("cumulative_feature_repetitiveness:", lexicon.info["cumulative_feature_repetitiveness"])

piɾuta|baɡoli|tokuda|ɡuhaɪbo
cumulative_feature_repetitiveness: 7
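
Before handing a raw lexicon to `to_lexicon()`, it can help to check its shape in plain Python. The following is a minimal sketch and not part of `arc`: the helper name `check_raw_lexicon` is hypothetical, and the rule that every word must have the same number of syllables is an assumption made for this example, not a documented requirement of the library.

```python
def check_raw_lexicon(raw_lexicon):
    """Return the words as joined strings; raise ValueError on malformed input.

    Hypothetical helper, not part of arc. The equal-length rule below is an
    assumption for this sketch.
    """
    if not raw_lexicon:
        raise ValueError("lexicon is empty")

    n_syllables = len(raw_lexicon[0])
    words = []
    for word in raw_lexicon:
        # Every syllable must be a non-empty string such as 'pi' or 'haɪ'.
        if not all(isinstance(syll, str) and syll for syll in word):
            raise ValueError(f"syllables must be non-empty strings: {word!r}")
        # Assumed rule: all words have the same number of syllables.
        if len(word) != n_syllables:
            raise ValueError(
                f"expected {n_syllables} syllables, got {len(word)}: {word!r}"
            )
        words.append("".join(word))
    return words


raw_lexicon = [
    ["pi", "ɾu", "ta"],
    ["ba", "ɡo", "li"],
    ["to", "ku", "da"],
    ["ɡu", "haɪ", "bo"],
]

words = check_raw_lexicon(raw_lexicon)
print("|".join(words))  # piɾuta|baɡoli|tokuda|ɡuhaɪbo
```

The joined words should match the lexicon printed above, which is a quick way to spot a mistyped syllable.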


Now turn it into a stream using the `arc` functions introduced earlier.

In [14]:
from arc import make_streams

streams = make_streams([lexicon])
print(streams)

piɾutabaɡolitokudaɡuhaɪbo_random|piɾutabaɡolitokudaɡuhaɪbo_word_structured|piɾutabaɡolitokudaɡuhaɪbo_position_controlled


In [17]:
print(streams[0], end="\n\n")
print(streams[0].info['rhythmicity_indexes'])

to|ɾu|ta|haɪ|ɡu|ɡo|pi|bo|ba|li|ku|da|ɡu|li|bo|ta|to|pi|da|ba|haɪ|ku|ɡo|ɾu|ba|da|ku|ɾu|pi|to|bo|haɪ|li|ɡo|ta|ɡu|ɾu|li|da|haɪ|ɡo|ɡu|bo|pi|ta|ku|ba|to|da|ta|li|pi|ɡu|ku|to|haɪ|bo|ɡo|ba|ɾu|ɡu|to|ba|ta|da|ɡo|li|ɾu|haɪ|pi|ku|bo|li|haɪ|ɾu|ku|ta|pi|ba|bo|ɡu|da|to|ɡo|haɪ|da|li|ba|ɡu|ta|ɡo|bo|ku|pi|ɾu|to|ta|bo|da|ɾu|ɡo|to|ɡu|pi|haɪ|ba|ku|li|to|li|ta|ɾu|da|pi|ɡo|ku|haɪ|bo|ɡu|ba|pi|li|ɡu|haɪ|to|ku|da|bo|ɾu|ta|ba|ɡo|da|haɪ|ta|li|ɾu|bo|to|pi|ba|ku|ɡu|ɡo|bo|ta|pi|ɡu|li|da|to|ɡo|ba|haɪ|ɾu|ku|ba|da|pi|bo|to|ɡu|ta|ku|haɪ|li|ɡo|ɾu|ɡo|haɪ|to|ku|pi|li|ɡu|ba|bo|ɾu|da|ta|ɡu|pi|da|li|ta|to|bo|haɪ|ɡo|ku|ɾu|ba|ɾu|li|haɪ|ba|ɡo|pi|to|ta|bo|da|ku|ɡu|haɪ|ɡu|ɾu|pi|ta|da|ɡo|to|li|ku|bo|ba|ta|ba|li|to|ɾu|bo|pi|haɪ|ku|ɡo|da|ɡu|bo|ɡo|ta|ɾu|ɡu|da|ba|to|haɪ|pi|ku|li|bo|li|pi|ɡo|ɡu|ku|ta|haɪ|da|ɾu|to|ba|ɡu|to|da|bo|ku|ɾu|haɪ|ta|ɡo|li|ba|pi|ɾu|pi|to|da|bo|ɡu|haɪ|ɡo|ta|ba|li|ku|to|pi|ɡo|ɡu|ɾu|bo|ku|ba|haɪ|da|ta|li|ta|da|to|li|bo|ɾu|haɪ|ɡu|ku|pi|ba|ɡo|ku|haɪ|pi|ta|ɡo|bo|to|ɾu|ba|da|li|ɡu|ba|bo|ta|ɡu|pi|ɾu|li|da|haɪ|to|ku|ɡo|h

## 2. Reading in your stream

Again, we assume you have prepared your data as a list of syllables, as shown below.
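
If your stream is stored as one delimited string rather than a list, a small helper can produce the list first. This is a minimal sketch under the assumption that the syllables are separated by `|`, the same separator `arc` uses when printing streams. The function `parse_stream` is a hypothetical name and not part of `arc`.

```python
def parse_stream(text, sep="|"):
    """Split a delimited stream string into a list of syllables.

    Hypothetical helper, not part of arc. Surrounding whitespace is stripped
    and empty entries (e.g. from a trailing separator) are dropped.
    """
    return [syll.strip() for syll in text.split(sep) if syll.strip()]


text = "pi|ɾu|ta|ba|ɡo|li|to|ku|da|ɡu|ki|bo"
syllables = parse_stream(text)
print(syllables)
```

The resulting list has the same form as the one passed to `to_stream()` in the cell that follows.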

In [13]:
from arc import to_stream

stream = ['pi', 'ɾu', 'ta', 'ba', 'ɡo', 'li', 'to', 'ku', 'da', 'ɡu', 'ki', 'bo']

stream = to_stream(stream)

print("Stream: ", stream, end="\n\n")
print("rhythmicity indexes (PRIs)", stream.info['rhythmicity_indexes'])

Stream:  pi|ɾu|ta|ba|ɡo|li|to|ku|da|ɡu|ki|bo

rhythmicity indexes (PRIs) {'phon_1_son': 0.0, 'phon_1_back': 0.3333333333333333, 'phon_1_hi': 0.3333333333333333, 'phon_1_lab': 0.16666666666666666, 'phon_1_cor': 0.0, 'phon_1_cont': 0.0, 'phon_1_lat': 0.0, 'phon_1_nas': 0.0, 'phon_1_voi': 0.0, 'phon_2_back': 0.0, 'phon_2_hi': 0.0, 'phon_2_lo': 0.0, 'phon_2_lab': 0.0, 'phon_2_tense': 0.0, 'phon_2_long': 0.0}


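As you can see, this non-randomized stream has much worse PRIs than the randomized stream from the first section: several features are clearly non-zero, for example `phon_1_back` and `phon_1_hi` at about 0.33 and `phon_1_lab` at about 0.17.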

This concludes our third and last tutorial. We hope you feel ready to use ARC and to help us extend it.