## Getting started with scikit-talk

Scikit-talk can be used to explore and analyse conversation files.

It contains three main levels of objects:
- Corpora, described by the `Corpus` class
- Conversations, described by the `Conversation` class
- Utterances, described by the `Utterance` class

The best entry point to scikit-talk is a parser, which loads data from a file into a scikit-talk object.

Scikit-talk currently has the following parsers:

- `ChaFile.parse()`, which parses .cha files.

Future plans include the creation of parsers for:
- .eaf files
- .TextGrid files
- .xml files
- .csv files

Parsers return an object of the `Conversation` class. Let's see it in action:

In [10]:
import sktalk

In [11]:
parsed_cha = sktalk.ChaFile('../data/02.cha').parse()

parsed_cha

<sktalk.corpus.conversation.Conversation at 0x1094b7820>

A parsed .cha file is a `Conversation` object. It has metadata and a collection of utterances:

In [8]:
parsed_cha.utterances[:10]

[Utterance(utterance='&=noise after all the planning and thinking about today (0.8) I realised that I forgot to bring your Conan book in aga:in→ (.)', participant='A', time=(0, 2856), begin='00:00:00.000', end='00:00:02.856', metadata=None),
 Utterance(utterance='°you son of a bitch°', participant='T', time=(6817, 7860), begin='00:00:06.817', end='00:00:07.860', metadata=None),
 Utterance(utterance='∙hhh hh (0.8)', participant='A', time=(7860, 9125), begin='00:00:07.860', end='00:00:09.125', metadata=None),
 Utterance(utterance='you ∆SON OF A BITCH∆ (1.0)', participant='T', time=(9493, 11118), begin='00:00:09.493', end='00:00:11.118', metadata=None),
 Utterance(utterance='that is pretty poor form', participant='A', time=(12116, 13463), begin='00:00:12.116', end='00:00:13.463', metadata=None),
 Utterance(utterance="°that's alright° (0.4)", participant='T', time=(13463, 14148), begin='00:00:13.463', end='00:00:14.148', metadata=None),
 Utterance(utterance="he:y look→ look at the Myspace 
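
Each utterance in the output above carries its timing twice: as a `time` tuple and as `begin`/`end` strings. Judging from that output, the tuple appears to hold milliseconds (for example, `begin='00:00:06.817'` is paired with `6817`). The sketch below uses only the standard library to convert such a timestamp string to milliseconds and to compute an utterance duration. `timestamp_to_ms` is a hypothetical helper written for illustration; it is not part of scikit-talk.

```python
def timestamp_to_ms(timestamp: str) -> int:
    """Convert an 'HH:MM:SS.mmm' string to whole milliseconds.

    Assumes exactly three digits after the decimal point, as in the output above.
    """
    hours, minutes, seconds = timestamp.split(":")
    whole_seconds, millis = seconds.split(".")
    total_seconds = int(hours) * 3600 + int(minutes) * 60 + int(whole_seconds)
    return total_seconds * 1000 + int(millis)


# Values copied from the second utterance in the output above.
begin, end = "00:00:06.817", "00:00:07.860"
duration_ms = timestamp_to_ms(end) - timestamp_to_ms(begin)
print(duration_ms)  # 1043
```

With a parsed conversation, the same duration should be obtainable directly from the `time` tuple as `time[1] - time[0]`, assuming the tuple is `(begin, end)` in milliseconds as the output suggests.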

In [9]:
parsed_cha.metadata

{'source': '../data/02.cha',
 'UTF8': '',
 'PID': '11312/t-00017233-1',
 'Languages': ['eng'],
 'Participants': {'A': {'name': 'Adult',
   'language': 'eng',
   'corpus': 'GCSAusE',
   'age': '',
   'sex': '',
   'group': '',
   'ses': '',
   'role': 'Adult',
   'education': '',
   'custom': ''},
  'T': {'name': 'Adult',
   'language': 'eng',
   'corpus': 'GCSAusE',
   'age': '',
   'sex': '',
   'group': '',
   'ses': '',
   'role': 'Adult',
   'education': '',
   'custom': ''}},
 'Options': 'CA',
 'Media': '02, audio'}
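
Because the metadata is a plain nested dictionary, it can be explored with ordinary Python. The sketch below works on a trimmed, hand-copied version of the dictionary shown above, so it runs without scikit-talk; with a real parsed file, `parsed_cha.metadata` would take the place of the literal.

```python
# A trimmed copy of the metadata dictionary shown above.
metadata = {
    "source": "../data/02.cha",
    "Languages": ["eng"],
    "Participants": {
        "A": {"name": "Adult", "language": "eng", "corpus": "GCSAusE", "role": "Adult"},
        "T": {"name": "Adult", "language": "eng", "corpus": "GCSAusE", "role": "Adult"},
    },
    "Options": "CA",
    "Media": "02, audio",
}

# Participant codes are the keys of the 'Participants' dictionary;
# they match the `participant` field seen on each utterance.
participant_codes = sorted(metadata["Participants"])

# Collect the distinct corpora the participants are attributed to.
corpora = {info["corpus"] for info in metadata["Participants"].values()}

print(participant_codes)
print(corpora)
```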

We can write the conversation to a JSON file:

In [None]:
parsed_cha.write_json(name="testjson", directory=".")
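
The exact layout of the file that `write_json` produces is not shown here. As a rough illustration of the idea only, the sketch below writes a conversation-like dictionary to JSON with the standard library and reads it back. The `conversation` dictionary, its keys, and the `testjson.json` file name are assumptions made for this example, not scikit-talk's actual output format.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for a conversation: metadata plus a list of utterances.
conversation = {
    "metadata": {"source": "../data/02.cha", "Languages": ["eng"]},
    "utterances": [
        {
            "utterance": "that is pretty poor form",
            "participant": "A",
            "time": [12116, 13463],
        },
    ],
}

with tempfile.TemporaryDirectory() as directory:
    path = Path(directory) / "testjson.json"
    # ensure_ascii=False keeps transcription symbols (such as arrows) readable in the file.
    path.write_text(
        json.dumps(conversation, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    restored = json.loads(path.read_text(encoding="utf-8"))

print(restored == conversation)
```

Note that JSON has no tuple type: a Python tuple such as `(12116, 13463)` is written as an array and comes back as a list, which is why the stand-in uses a list for `time`.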

## The `Corpus` object

A `Corpus` is a collection of conversations.

A Corpus can be initialized from a single conversation, or a list of conversations.
It can also be initialized as an empty object, with metadata.

In [3]:
democorpus = sktalk.Corpus(author="demo")
democorpus2 = sktalk.Corpus(conversations=[parsed_cha])


print(democorpus.metadata)
print(democorpus2)

{'author': 'demo'}
<sktalk.corpus.corpus.Corpus object at 0x10f8579d0>
