### Punctuation
- Utterance terminators: period (.), question mark (?), or exclamation point (!)
- Trailing Off +...
- Interruption +/.
```
*EXP: what did you do +/.
*CHI: mommy .
*EXP: +, with your spoon .
```
- Self Interruption +//.


- Retracing Without Correction [/] (also in fluency codes, Appendix 7)

The material being retraced is enclosed in angle brackets. In a retracing without correction, the material in angle brackets is necessarily the same as the material immediately following the [/] symbol.
```
*CHI: <I wanted> [/] I wanted to invite Margie .
```
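
The rule above (the retraced material equals what follows `[/]`) can be checked mechanically. The sketch below is only an illustration: the name `repeats_exactly` is made up here, and it assumes that pauses such as `(.)` and fillers such as `&-um` may sit between the marker and the repetition.

```python
import re

# Retraced material is either <several words> or one bare word, followed by [/].
RETRACE = re.compile(r"(?:<([^<>]+)>|([\w']+)) \[/\] ")
# Pauses like (.) and fillers like &-um that may precede the repetition.
LEADING_NOISE = re.compile(r"^(?:(?:\(\.+\)|&-\w+)\s+)+")

def repeats_exactly(utterance):
    """Return True if every [/] retrace is followed by the same words.

    Returns False when the utterance contains no [/] retrace at all.
    """
    found = False
    for m in RETRACE.finditer(utterance):
        found = True
        retraced = m.group(1) or m.group(2)
        following = LEADING_NOISE.sub("", utterance[m.end():])
        if not following.startswith(retraced):
            return False
    return found

print(repeats_exactly("*CHI: <I wanted> [/] I wanted to invite Margie ."))
print(repeats_exactly("*CHI: <I wanted> [/] I thought so ."))
```

The second call returns `False` because the speaker changed the wording, which would call for `[//]` rather than `[/]`.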


In [47]:
import re

def find_retracing_markers(text):
    # Define patterns for different types of retracing markers
    retracing_with_brackets_pattern = r'(<[^>]+> \[//\])|(<[^>]+> \[/\])'
    retracing_single_word_pattern = r"([\w']+ \[/\])"  # [\w']+ keeps contractions such as it's whole
    retracing_with_fillers_pattern = r'(<[^>]+> \[/\] \(.*?\) &-[a-z]+ \[.*?\])'

    # Finding all occurrences of each pattern
    retracing_with_brackets_matches = re.findall(retracing_with_brackets_pattern, text)
    retracing_single_word_matches = re.findall(retracing_single_word_pattern, text)
    retracing_with_fillers_matches = re.findall(retracing_with_fillers_pattern, text)

    # Extract only the matched group that is not empty (re.findall with multiple groups can return tuples with empty strings)
    retracing_with_brackets_matches = [match[0] if match[0] else match[1] for match in retracing_with_brackets_matches]

    # Display the results
    print('Retracing with brackets:')
    for match in retracing_with_brackets_matches:
        print(match)
    # print('\nSingle word retracing:')
    # for match in retracing_single_word_matches:
    #     print(match)
    # print('\nRetracing with fillers:')
    # for match in retracing_with_fillers_matches:
    #     print(match)

# Example transcript text
transcript_text = """
*CHI: <I wanted> [/] I wanted to invite Margie .
*CHI: it's [/] (.) &-um (.) it's [/] it's (.) a &-um (.) dog .
*CHI: apple [/] apple is good.
*CHI: <I wanted> [//] &-uh I thought I wanted to invite Margie .
*CHI: <the fish is> [//] the [/] the fish are swimming .
*CHI: <it was> [//] it is a sunny day. 
"""

# Find retracing markers in the transcript
find_retracing_markers(transcript_text)

Retracing with brackets:
<I wanted> [/]
<I wanted> [//]
<the fish is> [//]
<it was> [//]


- Retracing with Correction [//]


In [50]:
import re

def find_bracketed_phrases(text):
    # Define the pattern to match phrases inside <>
    bracketed_pattern = r'<[^>]+>'

    # Find all occurrences of the pattern
    matches = re.findall(bracketed_pattern, text)

    # Display the results
    print('Bracketed phrases:')
    for match in matches:
        print(match)

# Example transcript text
transcript_text = """
*CHI: <I wanted> [//] &-uh I thought I wanted to invite Margie .
*CHI: <the fish is> [//] the [/] the fish are swimming .
*CHI: <it was> [//] it is a sunny day. 
"""

# Find bracketed phrases in the transcript
find_bracketed_phrases(transcript_text)

Bracketed phrases:
<I wanted>
<the fish is>
<it was>


all symbols
```
.
?
!
+...
+/.
+//.
[/]
[//]
```
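
The utterance-final markers in this section can be told apart by looking at the end of the main line. This is a minimal sketch, not part of any CHAT tool: `classify_terminator` and the label strings are names chosen here for illustration.

```python
# Check the multi-character markers before the bare period,
# because every one of them also ends in a period.
TERMINATORS = [
    ("+...", "trailing off"),
    ("+//.", "self-interruption"),
    ("+/.", "interruption"),
    (".", "period"),
    ("?", "question"),
    ("!", "exclamation"),
]

def classify_terminator(utterance):
    """Return the label of the marker that ends the utterance, or None."""
    stripped = utterance.rstrip()
    for marker, label in TERMINATORS:
        if stripped.endswith(marker):
            return label
    return None

for line in ["*EXP: what did you do +/.", "*CHI: mommy .", "*CHI: I wanted +..."]:
    print(line, "->", classify_terminator(line))
```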


### Overlaps
- Interposed word &*


In [36]:
import re

def find_interposed_words(text):
    # Define the pattern for interposed words
    interposed_pattern = r'&\*[A-Z]{3}:[a-zA-Z-]+'  # the hyphen keeps forms like uh-huh whole

    # Find all occurrences of the pattern
    matches = re.findall(interposed_pattern, text)

    # Display the results
    print('Interposed words:')
    for match in matches:
        print(match)

# Example transcript text
transcript_text = """
*PAR: it was really difficult &*INV:mhm when all of that was happening.
*CHI: I think &*MOT:yeah that it's a good idea.
*EXP: This is interesting &*PAR:uh-huh but could you explain more?
"""

# Find interposed words in the transcript
find_interposed_words(transcript_text)

Interposed words:
&*INV:mhm
&*MOT:yeah
&*PAR:uh-huh


- Lazy overlap +<
- Overlap follows [>]
- Overlap precedes [<]


In [45]:
import re

def find_overlap_markers(text):
    # Define the patterns for overlap markers with content restricted to letters
    overlap_follows_pattern = r'<[a-zA-Z ]+> \[>\]'
    overlap_precedes_pattern = r'<[a-zA-Z ]+> \[<\]'

    # Find all occurrences of each pattern
    overlap_follows_matches = re.findall(overlap_follows_pattern, text)
    overlap_precedes_matches = re.findall(overlap_precedes_pattern, text)

    # Display the results
    print('Overlap follows markers:')
    for match in overlap_follows_matches:
        print(match)
    
    print('\nOverlap precedes markers:')
    for match in overlap_precedes_matches:
        print(match)

# Example transcript text
transcript_text = """
*INV: how did you communicate <with her> [>] ?
*PAR: <I just kept talking> [<] .
*CHI: <it was very> [>] interesting.
*EXP: <I thought> [<] it was too.
"""

# Find overlap markers in the transcript
find_overlap_markers(transcript_text)


Overlap follows markers:
<with her> [>]
<it was very> [>]

Overlap precedes markers:
<I just kept talking> [<]
<I thought> [<]


all symbols
```
+<
```
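
The lazy overlap marker `+<` has no code cell of its own in this section. As a rough sketch, assuming the marker sits at the start of the main line right after a three-letter speaker code, the utterances that carry it can be collected like this. The function name and the sample transcript are made up for illustration.

```python
import re

# "+<" directly after the speaker code marks an utterance that overlaps the previous one.
LAZY_OVERLAP = re.compile(r"^\*([A-Z]{3}):[ \t]+\+<[ \t]+(.*)$", re.MULTILINE)

def find_lazy_overlaps(text):
    """Return (speaker, utterance) pairs for lines that start with +<."""
    return [(m.group(1), m.group(2)) for m in LAZY_OVERLAP.finditer(text)]

# Hypothetical transcript, not taken from a real corpus
transcript_text = """
*CHI: I want that one .
*MOT: +< which one ?
*CHI: the red one .
"""

print(find_lazy_overlaps(transcript_text))
```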


### FLUENCY codes
- Unfilled pauses

```
(.)
(..)
(...)
```
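
A small sketch for tallying the three unfilled pause symbols above. It assumes pauses are written with one to three dots only; `count_unfilled_pauses` is a name chosen here.

```python
import re
from collections import Counter

# One to three dots inside parentheses: (.), (..), (...)
PAUSE = re.compile(r"\(\.{1,3}\)")

def count_unfilled_pauses(text):
    """Count how often each unfilled pause symbol occurs in the text."""
    return Counter(PAUSE.findall(text))

line = "*CHI: it's [/] (.) &-um (.) it's [/] it's (.) a &-um (.) dog ."
print(count_unfilled_pauses(line))
```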
- Filled pauses


In [26]:
import re

def find_fillers(text):
    # Define the pattern for fillers marked with &-
    fillers_pattern = r'&-[a-zA-Z_]+'

    # Find all occurrences of the pattern
    matches = re.findall(fillers_pattern, text)

    # Display the results
    print('Fillers:')
    for match in matches:
        print(match)

# Example transcript text
transcript_text = """
*CHI: I was like &-um going to the store &-you_know to buy some &-stuff.
*INV: So, &-like, what do you think about that?
*PAR: Well, &-you_know, it's kind of &-um complicated.
"""

# Find fillers in the transcript
find_fillers(transcript_text)

Fillers:
&-um
&-you_know
&-stuff
&-like
&-you_know
&-um


- Quotation on Next Line +”/.
- Quotation Precedes +”


In [49]:
import re

def find_quotation_markers(text):
    # Define the pattern for "Quotation Precedes" and single quoted words
    quotation_precedes_pattern = r'\+\”[^.]*'
    single_quoted_words_pattern = r'\b\w+@q\b'

    # Find all occurrences of each pattern
    quotation_precedes_matches = re.findall(quotation_precedes_pattern, text)
    single_quoted_words_matches = re.findall(single_quoted_words_pattern, text)

    # Display the results
    # print('Quotation Precedes markers:')
    # for match in quotation_precedes_matches:
    #     print(match)
    
    print('\nSingle quoted words:')
    for match in single_quoted_words_matches:
        print(match)

# Example transcript text
transcript_text = """
*CHI: +” please give me all of your honey .
*CHI: the little bear said +”.
*CHI: and the boy said shh@q .
"""

# Find quotation markers in the transcript
find_quotation_markers(transcript_text)


Single quoted words:
shh@q


- Multiple words that should hang together


In [28]:
import re

def find_frozen_phrases(text):
    # Define the pattern for frozen phrases linked with underscores
    frozen_phrases_pattern = r'\b\w+(?:_\w+)+\b'

    # Find all occurrences of the pattern
    matches = re.findall(frozen_phrases_pattern, text)

    # Display the results
    print('Frozen phrases:')
    for match in matches:
        print(match)

# Example transcript text
transcript_text = """
*CHI: I think you_know that patty_cake is a fun game.
*INV: What do you think about Nan_Bernstein_Ratner's work?
*PAR: Well, Mister_Spock was a character in Star Trek.
*CHI: We went to the merry_go_round yesterday.
"""

# Find frozen phrases in the transcript
find_frozen_phrases(transcript_text)

Frozen phrases:
you_know
patty_cake
Nan_Bernstein_Ratner
Mister_Spock
merry_go_round


- Other Coding Conventions: ERRORS!


In [29]:
import re

def find_errors_and_replacements(text):
    # Define the patterns for errors and target replacements
    error_pattern = r'\[\*\]'
    replacement_pattern = r'\[: [^\]]+\]'

    # Find all occurrences of each pattern
    error_matches = re.findall(error_pattern, text)
    replacement_matches = re.findall(replacement_pattern, text)

    # Display the results
    # print('Errors:')
    # for match in error_matches:
    #     print(match)
    
    print('\nTarget Replacements:')
    for match in replacement_matches:
        print(match)

# Example transcript text
transcript_text = """
*CHI: he had two mouses [: mice] [*] .
*CHI: two cookie [*].
*CHI: to [*] home.
*CHI: he going to the store.
"""

# Find errors and replacements in the transcript
find_errors_and_replacements(transcript_text)


Target Replacements:
[: mice]
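
It is often more useful to keep the produced form together with its target. The sketch below pairs the single word before `[: ...]` with the replacement text. It assumes the replacement applies to exactly one preceding word, which will not hold when a multi-word stretch is replaced; the function name is illustrative.

```python
import re

# The word directly before "[: target]" is taken as the form the speaker produced.
REPLACEMENT = re.compile(r"(\S+) \[: ([^\]]+)\]")

def pair_errors_with_targets(text):
    """Return (produced_form, target_form) pairs."""
    return REPLACEMENT.findall(text)

print(pair_errors_with_targets("*CHI: he had two mouses [: mice] [*] ."))
```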


- Missing words: 0


In [30]:
import re

def find_omitted_words(text):
    # Define the pattern for omitted words marked with the zero symbol
    omitted_word_pattern = r'\b0\w+\b'

    # Find all occurrences of the pattern
    matches = re.findall(omitted_word_pattern, text)

    # Display the results
    print('Omitted words:')
    for match in matches:
        print(match)

# Example transcript text
transcript_text = """
*CHI: 0does he like it?
*PAR: 0mod he like it?
*CHI: 0is she going?
*PAR: 0has he done it?
"""

# Find omitted words in the transcript
find_omitted_words(transcript_text)

Omitted words:
0does
0mod
0is
0has


- Phonological Fragments &+fr


In [31]:
import re

def find_word_fragments(text):
    # Define the pattern for word fragments marked with &+
    word_fragment_pattern = r'&\+\w+'

    # Find all occurrences of the pattern
    matches = re.findall(word_fragment_pattern, text)

    # Display the results
    print('Word fragments:')
    for match in matches:
        print(match)

# Example transcript text
transcript_text = """
*CHI: he had a &+fr friend.
*CHI: I really wanted to &+vi visit the zoo.
*CHI: she is &+un unbelievable.
"""

# Find word fragments in the transcript
find_word_fragments(transcript_text)

Word fragments:
&+fr
&+vi
&+un


- Unintelligible words xxx
- Pauses (.)
- Babbling and Jargon (from kids or patients)


In [32]:
import re

def find_annotations(text):
    # Define the pattern to match annotations [=! ...] following yyy or xxx
    annotation_pattern = r'\b(?:yyy|xxx)\b \[=! [^\]]+\]'

    # Find all occurrences of the pattern
    matches = re.findall(annotation_pattern, text)

    # Extract and display the annotations
    print('Annotations:')
    for match in matches:
        # Extract the [=! ...] part
        annotation = re.search(r'\[=! [^\]]+\]', match).group()
        print(annotation)

# Example transcript text
transcript_text = """
*CHI: yyy [=! dada] .
*CHI: xxx [=! vocalizes/laughs/whines, etc] .
*PAR: I probably got xxx and things like that .
"""

# Find annotations in the transcript
find_annotations(transcript_text)

Annotations:
[=! dada]
[=! vocalizes/laughs/whines, etc]


- Tables for Other Communicative Behaviors and Words

- Excluding utterances from analysis

all symbols
```
+”/.
+”
[*]
xxx
yyy
(.)
[+ exc]
```
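
For the `[+ exc]` code, a plain substring test is enough to separate excluded utterances from the rest. This is a sketch under the assumption that each utterance occupies one line; the helper name and the sample lines are invented for illustration.

```python
def split_excluded(lines):
    """Split utterance lines into (kept, excluded) by the [+ exc] code."""
    kept, excluded = [], []
    for line in lines:
        if "[+ exc]" in line:
            excluded.append(line)
        else:
            kept.append(line)
    return kept, excluded

# Hypothetical lines, not taken from a real corpus
lines = [
    "*CHI: the dog is running .",
    "*CHI: xxx . [+ exc]",
    "*PAR: what happened next ?",
]
kept, excluded = split_excluded(lines)
print(kept)
print(excluded)
```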

Shortened Words

In [59]:
import re

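def find_shortened_words(text):
    # Assumed definition (the original body of this cell is missing): match
    # whitespace-delimited words that contain exactly one parenthesized group
    # of letters, e.g. (a)bout, lib(r)ary, sec(ond)
    shortened_pattern = r'(?<!\S)\w*\(\w+\)\w*(?!\S)'
    return re.findall(shortened_pattern, text)

# Example text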
transcript_text = """
(a)bout don('t) (h)is (re)frigerator
an(d) (e)nough (h)isself (re)member
(a)n(d) (e)spress(o) -in(g) sec(ond)
(a)fraid (e)spresso nothin(g) s(up)pose
(a)gain (es)presso (i)n (th)e
(a)nother (ex)cept (in)stead (th)em
(a)round (ex)cuse Jag(uar) (th)emselves
ave(nue) (ex)cused lib(r)ary (th)ere
(a)way (e)xcuse Mass(achusetts) (th)ese
(be)cause (e)xcused micro(phone) (th)ey
(be)fore (h)e (pa)jamas (to)gether
(be)hind (h)er (o)k (to)mato
b(e)long (h)ere o(v)er (to)morrow
b(e)longs (h)erself (po)tato (to)night
Cad(illac) doc(tor) (h)im (h)imself prob(ab)ly (re)corder (un)til
wan(t)
"""

# Run the function to find the shortened words and print the result
shortened_words = find_shortened_words(transcript_text)
print(shortened_words)


['(a)bout', '(h)is', '(re)frigerator', 'an(d)', '(e)nough', '(h)isself', '(re)member', 'sec(ond)', '(a)fraid', '(e)spresso', 'nothin(g)', 's(up)pose', '(a)gain', '(es)presso', '(i)n', '(th)e', '(a)nother', '(ex)cept', '(in)stead', '(th)em', '(a)round', '(ex)cuse', 'Jag(uar)', '(th)emselves', 'ave(nue)', '(ex)cused', 'lib(r)ary', '(th)ere', '(a)way', '(e)xcuse', 'Mass(achusetts)', '(th)ese', '(be)cause', '(e)xcused', 'micro(phone)', '(th)ey', '(be)fore', '(h)e', '(pa)jamas', '(to)gether', '(be)hind', '(h)er', '(o)k', '(to)mato', 'b(e)long', '(h)ere', 'o(v)er', '(to)morrow', 'b(e)longs', '(h)erself', '(po)tato', '(to)night', 'Cad(illac)', 'doc(tor)', '(h)im', '(h)imself', 'prob(ab)ly', '(re)corder', '(un)til', 'wan(t)']
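
In these forms the parentheses enclose the sounds the speaker left out, so two readings can be derived from each word: the full target word and the form actually spoken. The helper names below are illustrative and not part of any CHAT tool.

```python
import re

def full_form(word):
    """Drop the parentheses and keep their content: (be)cause -> because."""
    return re.sub(r"[()]", "", word)

def spoken_form(word):
    """Drop the parentheses together with their content: (be)cause -> cause."""
    return re.sub(r"\([^)]*\)", "", word)

for word in ["(be)cause", "prob(ab)ly", "nothin(g)"]:
    print(word, full_form(word), spoken_form(word))
```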


~~## Simple Events (not required [could replace to same words])~~


In [33]:
import re

def find_nonverbal_activities(text):
    # Define the pattern for non-verbal activities marked with &=
    # The optional ":detail" part is a non-capturing group so that
    # re.findall returns the whole marker instead of only the group
    nonverbal_pattern = r'&=\w+(?::\w+)?'

    # Find all occurrences of the pattern
    matches = re.findall(nonverbal_pattern, text)

    # Display the results
    print('Non-verbal activities:')
    for match in matches:
        print(match)

# Example transcript text
transcript_text = """
*CHI: he had a &+fr friend.
*CHI: I really wanted to &+vi visit the zoo.
*CHI: she is &+un unbelievable.
*CHI: &=moans loudly.
*CHI: &=laughs and &=coughs.
*CHI: makes a noise &=imit:plane.
*CHI: gestures with frustration &=ges:frustration.
*CHI: moves the doll &=moves:doll.
*CHI: shows pictures &=shows:pictures.
*CHI: points to the picture &=points:picture.
*CHI: opens mouth &=opens:mouth.
"""

# Find non-verbal activities in the transcript
find_nonverbal_activities(transcript_text)

Non-verbal activities:
&=moans
&=laughs
&=coughs
&=imit:plane
&=ges:frustration
&=moves:doll
&=shows:pictures
&=points:picture
&=opens:mouth


A shortcut: match every code written in square brackets

In [62]:
import re

def find_bracketed_content(text):
    # Regular expression to match any content inside square brackets (non-greedy)
    bracketed_pattern = r'\[.*?\]'
    
    # Find all matches in the text
    matches = re.findall(bracketed_pattern, text)
    
    return matches

# Example text
text = """
now (.) there (i)s a girl there (.) a girl elephant with a net that (i)s going to [: gonna] grab it.

and now the elephant fell and (.) <stub> [//] <made> [//] gots [: has got] a boo_boo on her knee.
"""

# Call the function and print the results
bracketed_content = find_bracketed_content(text)
print(bracketed_content)


['[: gonna]', '[//]', '[//]', '[: has got]']


All symbols gathered from the sections above
```
.
?
!
+...
+/.
+//.
[/]
[//]
+<
+”/.
+”
[*]
xxx
yyy
(.)
[+ exc]
```

In [51]:
import re

def find_retracing_markers(text):
    retracing_with_brackets_pattern = r'(<[^>]+> \[//\])|(<[^>]+> \[/\])'
    retracing_single_word_pattern = r"([\w']+ \[/\])"  # [\w']+ keeps contractions such as it's whole
    retracing_with_fillers_pattern = r'(<[^>]+> \[/\] \(.*?\) &-[a-z]+ \[.*?\])'

    retracing_with_brackets_matches = re.findall(retracing_with_brackets_pattern, text)
    retracing_single_word_matches = re.findall(retracing_single_word_pattern, text)
    retracing_with_fillers_matches = re.findall(retracing_with_fillers_pattern, text)

    retracing_with_brackets_matches = [match[0] if match[0] else match[1] for match in retracing_with_brackets_matches]

    return retracing_with_brackets_matches + retracing_single_word_matches + retracing_with_fillers_matches

def find_interposed_words(text):
    interposed_pattern = r'&\*[A-Z]{3}:[a-zA-Z-]+'  # the hyphen keeps forms like uh-huh whole
    matches = re.findall(interposed_pattern, text)
    return matches

def find_overlap_markers(text):
    overlap_follows_pattern = r'<[a-zA-Z ]+> \[>\]'
    overlap_precedes_pattern = r'<[a-zA-Z ]+> \[<\]'
    overlap_follows_matches = re.findall(overlap_follows_pattern, text)
    overlap_precedes_matches = re.findall(overlap_precedes_pattern, text)
    return overlap_follows_matches + overlap_precedes_matches

def find_fillers(text):
    fillers_pattern = r'&-[a-zA-Z_]+'
    matches = re.findall(fillers_pattern, text)
    return matches

def find_quotation_markers(text):
    quotation_precedes_pattern = r'\+\”[^.]*'
    single_quoted_words_pattern = r'\b\w+@q\b'
    quotation_precedes_matches = re.findall(quotation_precedes_pattern, text)
    single_quoted_words_matches = re.findall(single_quoted_words_pattern, text)
    return quotation_precedes_matches + single_quoted_words_matches

def find_frozen_phrases(text):
    frozen_phrases_pattern = r'\b\w+(?:_\w+)+\b'
    matches = re.findall(frozen_phrases_pattern, text)
    return matches

def find_errors_and_replacements(text):
    error_pattern = r'\[\*\]'
    replacement_pattern = r'\[: [^\]]+\]'
    error_matches = re.findall(error_pattern, text)
    replacement_matches = re.findall(replacement_pattern, text)
    return error_matches + replacement_matches

def find_omitted_words(text):
    omitted_word_pattern = r'\b0\w+\b'
    matches = re.findall(omitted_word_pattern, text)
    return matches

def find_word_fragments(text):
    word_fragment_pattern = r'&\+\w+'
    matches = re.findall(word_fragment_pattern, text)
    return matches

def find_annotations(text):
    annotation_pattern = r'\b(?:yyy|xxx)\b \[=! [^\]]+\]'
    matches = re.findall(annotation_pattern, text)
    annotations = [re.search(r'\[=! [^\]]+\]', match).group() for match in matches]
    return annotations

def find_nonverbal_activities(text):
    # Non-capturing group so that re.findall returns the whole marker
    nonverbal_pattern = r'&=\w+(?::\w+)?'
    matches = re.findall(nonverbal_pattern, text)
    return matches

def collect_all_matches(text):
    all_matches = []
    all_matches += find_retracing_markers(text)
    all_matches += find_interposed_words(text)
    all_matches += find_overlap_markers(text)
    all_matches += find_fillers(text)
    all_matches += find_quotation_markers(text)
    all_matches += find_frozen_phrases(text)
    all_matches += find_errors_and_replacements(text)
    all_matches += find_omitted_words(text)
    all_matches += find_word_fragments(text)
    all_matches += find_annotations(text)
    all_matches += find_nonverbal_activities(text)

    # Remove duplicates and empty values, then sort for a stable order
    all_matches = sorted(set(filter(None, all_matches)))
    return all_matches

# Example transcript text
transcript_text = """
*CHI: <I wanted> [/] I wanted to invite Margie .
*CHI: it's [/] (.) &-um (.) it's [/] it's (.) a &-um (.) dog .
*CHI: apple [/] apple is good.
*CHI: <I wanted> [//] &-uh I thought I wanted to invite Margie .
*CHI: <the fish is> [//] the [/] the fish are swimming .
*CHI: <it was> [//] it is a sunny day.
*INV: how did you communicate <with her> [>] ?
*PAR: <I just kept talking> [<] .
*CHI: <it was very> [>] interesting.
*EXP: <I thought> [<] it was too.
*PAR: it was really difficult &*INV:mhm when all of that was happening.
*CHI: I think &*MOT:yeah that it's a good idea.
*EXP: This is interesting &*PAR:uh-huh but could you explain more?
*CHI: I was like &-um going to the store &-you_know to buy some &-stuff.
*INV: So, &-like, what do you think about that?
*PAR: Well, &-you_know, it's kind of &-um complicated.
*CHI: +” please give me all of your honey .
*CHI: the little bear said +”.
*CHI: and the boy said shh@q .
*CHI: I think you_know that patty_cake is a fun game.
*INV: What do you think about Nan_Bernstein_Ratner's work?
*PAR: Well, Mister_Spock was a character in Star Trek.
*CHI: We went to the merry_go_round yesterday.
*CHI: he had two mouses [: mice] [*] .
*CHI: two cookie [*].
*CHI: to [*] home.
*CHI: he going to the store.
*CHI: 0does he like it?
*PAR: 0mod he like it?
*CHI: 0is she going?
*PAR: 0has he done it?
*CHI: he had a &+fr friend.
*CHI: I really wanted to &+vi visit the zoo.
*CHI: she is &+un unbelievable.
*CHI: yyy [=! dada] .
*CHI: xxx [=! vocalizes/laughs/whines, etc] .
*PAR: I probably got xxx and things like that .
*CHI: &=moans loudly.
*CHI: &=laughs and &=coughs.
*CHI: makes a noise &=imit:plane.
*CHI: gestures with frustration &=ges:frustration.
*CHI: moves the doll &=moves:doll.
*CHI: shows pictures &=shows:pictures.
*CHI: points to the picture &=points:picture.
*CHI: opens mouth &=opens:mouth.
"""

# Run the function to collect all matches and print the result
all_matches = collect_all_matches(transcript_text)
print(all_matches)


['&*INV:mhm', '&*MOT:yeah', '&*PAR:uh-huh', '&+fr', '&+un', '&+vi', '&-like', '&-stuff', '&-uh', '&-um', '&-you_know', '&=coughs', '&=ges:frustration', '&=imit:plane', '&=laughs', '&=moans', '&=moves:doll', '&=opens:mouth', '&=points:picture', '&=shows:pictures', '+”', '+” please give me all of your honey ', '0does', '0has', '0is', '0mod', '<I just kept talking> [<]', '<I thought> [<]', '<I wanted> [//]', '<I wanted> [/]', '<it was very> [>]', '<it was> [//]', '<the fish is> [//]', '<with her> [>]', 'Mister_Spock', 'Nan_Bernstein_Ratner', '[*]', '[: mice]', '[=! dada]', '[=! vocalizes/laughs/whines, etc]', 'apple [/]', "it's [/]", 'merry_go_round', 'patty_cake', 'shh@q', 'the [/]', 'you_know']
