## Getting started with `scikit-talk`

`scikit-talk` can be used to explore and analyse conversation files.

It contains three main levels of objects:

- Corpora, described by the `Corpus` class
- Conversations, described by the `Conversation` class
- Utterances, described by the `Utterance` class

The best entry point to `scikit-talk` is a parser, which loads data from a transcription file into a `scikit-talk` object.

`scikit-talk` currently has the following parsers:

- `ChaFile.parse()`, which parses `.cha` files.

Future plans include the creation of parsers for:

- .eaf files
- .TextGrid files
- .xml files
- .csv files
- .json files

Parsers return an object of the `Conversation` class.

To get started with `scikit-talk`, import the module:

In [1]:
import sktalk

To see it in action, we will need to start with a transcription file.

For example, you can download a file from the
[Griffith Corpus of Spoken Australian English](https://ca.talkbank.org/data-orig/GCSAusE/). This publicly available corpus contains transcription files in `.cha` format.

We use the `ChaFile.parse()` method to create the `Conversation` object:

In [2]:
cha01 = sktalk.ChaFile('GCSAusE_01.cha').parse()

cha01

<sktalk.corpus.conversation.Conversation at 0x113111390>

A parsed `.cha` file is a `Conversation` object. It has metadata and a collection of utterances:

In [3]:
cha01.utterances[:10]

[Utterance(utterance='0', participant='S', time=[0, 1500], begin='00:00:00.000', end='00:00:01.500', metadata=None, utterance_clean='S x150_1500x15', utterance_list=['S', 'x150_1500x15'], n_words=2, n_characters=13, time_to_next=None, dyadic=None, FTO=None),
 Utterance(utterance="mm I'm glad I saw you⇗", participant='S', time=[1500, 2775], begin='00:00:01.500', end='00:00:02.775', metadata=None, utterance_clean='S mm Im glad I saw you x151500_2775x15', utterance_list=['S', 'mm', 'Im', 'glad', 'I', 'saw', 'you', 'x151500_2775x15'], n_words=8, n_characters=31, time_to_next=None, dyadic=None, FTO=None),
 Utterance(utterance="I thought I'd lost you (0.3)", participant='S', time=[2775, 3773], begin='00:00:02.775', end='00:00:03.773', metadata=None, utterance_clean='S I thought Id lost you x152775_3773x15 x153773_4052x15', utterance_list=['S', 'I', 'thought', 'Id', 'lost', 'you', 'x152775_3773x15', 'x153773_4052x15'], n_words=8, n_characters=48, time_to_next=None, dyadic=None, FTO=None),
 Ut

In [4]:
cha01.metadata

{'source': 'GCSAusE_01.cha',
 'UTF8': '',
 'PID': '11312/t-00017232-1',
 'Languages': ['eng'],
 'Participants': {'S': {'name': 'Sarah',
   'language': 'eng',
   'corpus': 'GCSAusE',
   'age': '',
   'sex': '',
   'group': '',
   'ses': '',
   'role': 'Adult',
   'education': '',
   'custom': ''},
  'H': {'name': 'Hannah',
   'language': 'eng',
   'corpus': 'GCSAusE',
   'age': '',
   'sex': '',
   'group': '',
   'ses': '',
   'role': 'Adult',
   'education': '',
   'custom': ''}},
 'Options': 'CA',
 'Media': '01, audio'}
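
The metadata is a plain nested dictionary, so it can be queried with ordinary Python. As a minimal sketch, the snippet below copies an excerpt of the output above into a literal and maps each participant code to a name. In a real session you would use `cha01.metadata` directly; the variable names here are only illustrative.

```python
# Excerpt of the metadata printed above, copied into a literal so that the
# example runs without a .cha file. In practice: metadata = cha01.metadata
metadata = {
    'source': 'GCSAusE_01.cha',
    'Languages': ['eng'],
    'Participants': {
        'S': {'name': 'Sarah', 'language': 'eng', 'role': 'Adult'},
        'H': {'name': 'Hannah', 'language': 'eng', 'role': 'Adult'},
    },
}

# Map each participant code to the participant's name.
names = {code: info['name'] for code, info in metadata['Participants'].items()}
print(names)  # {'S': 'Sarah', 'H': 'Hannah'}
```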

We can write the conversation to a `.json` file:

In [5]:
cha01.write_json(path = "CGSAusE_01.json")

## The `Corpus` object

A `Corpus` is a collection of conversations.

A `Corpus` can be initialized from a single conversation or from a list of conversations.
It can also be initialized empty, with only metadata:

In [6]:
GCSAusE = sktalk.Corpus(name = "Griffith Corpus of Spoken Australian English",
                        url = "https://ca.talkbank.org/data-orig/GCSAusE/")

GCSAusE.metadata

{'name': 'Griffith Corpus of Spoken Australian English',
 'url': 'https://ca.talkbank.org/data-orig/GCSAusE/'}

We can add conversations to a `Corpus`:

In [7]:
GCSAusE.append(cha01)

GCSAusE.conversations

[<sktalk.corpus.conversation.Conversation at 0x113111390>]

We can turn objects of type `Conversation` and `Corpus` into dictionaries with `cha01.asdict()` and `GCSAusE.asdict()`, respectively.

A `Corpus` object can also be stored as a `.json` file:

In [8]:
GCSAusE.write_json(path = "CGSAusE.json")


Conversely, a `Corpus` stored as a `.json` file can be loaded back into a `Corpus` object:

In [9]:
GCSAusE_2 = sktalk.Corpus.from_json("CGSAusE.json")

The same can be done with a `Conversation` object stored as a `.json` file:

In [10]:
cha01_2 = sktalk.Conversation.from_json(path = "CGSAusE_01.json")

## Analyzing turn-taking dynamics

When creating a `Conversation` object, a number of calculations and transformations are performed on the `Utterance` objects within.
For example, the number of words in each utterance is calculated, and stored under `Utterance.n_words`.
You can see this for a specific utterance as follows:

In [11]:
cha01.utterances[0].n_words

2
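
Judging from the utterances printed earlier, `n_words` and `n_characters` appear to be counted over `utterance_list`, which includes the participant code and the timestamp token. The sketch below reproduces the printed values for the second utterance under that assumption. It is an inference from the output shown, not the library's own code.

```python
# utterance_list of the second utterance, copied from the output shown earlier.
utterance_list = ['S', 'mm', 'Im', 'glad', 'I', 'saw', 'you', 'x151500_2775x15']

# Assumption: every token counts as a word, including the participant code
# and the timestamp token.
n_words = len(utterance_list)

# Assumption: characters are summed per token, so spaces are not counted.
n_characters = sum(len(token) for token in utterance_list)

print(n_words, n_characters)  # 8 31
```

If this reading is right, the counts are slightly higher than the number of words actually spoken, which is worth keeping in mind when comparing utterance lengths.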

More sophisticated calculations can be performed, but do not happen automatically.
An example of this is the calculation of the Floor Transfer Offset (FTO) per utterance.
FTO is defined as the difference between the time that a turn starts, and the end of the most relevant prior turn by the other participant.
If there is overlap between these turns, the FTO is negative.
If there is a pause between these utterances, the FTO is positive.
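
The sign convention can be illustrated with a few lines of plain Python. This is only a sketch of the definition above: the function name and the timings are made up for illustration and are not part of `scikit-talk`.

```python
def floor_transfer_offset(turn_begin: int, prior_end: int) -> int:
    """Start of the current turn minus end of the relevant prior turn, in ms."""
    return turn_begin - prior_end


# Hypothetical timings in milliseconds.
print(floor_transfer_offset(turn_begin=3250, prior_end=3000))  # 250: a pause
print(floor_transfer_offset(turn_begin=2800, prior_end=3000))  # -200: overlap
print(floor_transfer_offset(turn_begin=3000, prior_end=3000))  # 0: no gap, no overlap
```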

We can calculate the FTOs of the utterances in a conversation:

In [12]:
cha01.calculate_FTO()

for utterance in cha01.utterances[:10]:
    print(f'{utterance.time} {utterance.participant} - FTO: {utterance.FTO}')

[0, 1500] S - FTO: None
[1500, 2775] S - FTO: None
[2775, 3773] S - FTO: None
[4052, 5515] H - FTO: 279
[4052, 5817] S - FTO: None
[6140, 9487] S - FTO: None
[12888, 14050] H - FTO: 3401
[14050, 17014] H - FTO: None
[17014, 18611] S - FTO: 0
[18611, 21090] H - FTO: 0


The prior turn used to calculate the FTO of an utterance U is the utterance that meets all of the following criteria:

- the relevant utterance must be by another participant
- the relevant utterance must be the most recent utterance by that participant
- the relevant utterance must have started more than a given number of milliseconds before the start of U. This defaults to 200 ms and can be changed with the `planning_buffer` argument.
- the relevant utterance must be partly or entirely within the context window, which is the 10 s (10000 ms) prior to U. The size of this window can be changed with the `window` argument.
- within the context window, there must be at most 2 speakers. This limit can be raised to 3 with the `n_participants` argument.
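
To make these criteria concrete, here is a minimal sketch that applies a literal reading of them to the first five utterances shown above. The `Turn` class and the `sketch_fto` function are hypothetical stand-ins written for this illustration. They are not the library's implementation, and `calculate_FTO` may treat some cases differently (for example, consecutive turns by the same speaker).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    """Hypothetical stand-in for an utterance: a speaker and begin/end in ms."""
    participant: str
    begin: int
    end: int


def sketch_fto(turns: List[Turn], index: int, window: int = 10000,
               planning_buffer: int = 200,
               n_participants: int = 2) -> Optional[int]:
    """Literal reading of the criteria for turns[index]; None if nothing qualifies."""
    u = turns[index]
    # Prior turns that lie partly or entirely within the context window.
    in_window = [t for t in turns[:index]
                 if t.end >= u.begin - window and t.begin <= u.begin]
    # Assumption: the speaker of U counts towards the speaker limit.
    speakers = {t.participant for t in in_window} | {u.participant}
    if len(speakers) > n_participants:
        return None
    # Only turns by another participant can be relevant.
    others = [t for t in in_window if t.participant != u.participant]
    if not others:
        return None
    # Take the most recent of those turns.
    prior = max(others, key=lambda t: t.begin)
    # It must have started more than planning_buffer ms before U.
    if u.begin - prior.begin <= planning_buffer:
        return None
    return u.begin - prior.end


# Timings of the first five utterances in the conversation above.
turns = [Turn('S', 0, 1500), Turn('S', 1500, 2775), Turn('S', 2775, 3773),
         Turn('H', 4052, 5515), Turn('S', 4052, 5817)]
print([sketch_fto(turns, i) for i in range(len(turns))])
# [None, None, None, 279, None]
```

The last turn gets no FTO in this sketch because the only turn by the other participant starts at the same moment, which is within the 200 ms planning buffer.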