# PyConverse basic usage

Import the necessary classes from `pyconverse`.

_**Note: the first time you install pyconverse and run these imports, it downloads a few transformers and sentence-transformers models in the background, so the first import might take a few minutes.**_

In [2]:
import pandas as pd
from pprint import pprint

from pyconverse import Callyzer, SpeakerStats
from pyconverse import SemanticTextSegmentation, ZeroShotTopicFinder, TranscriptSummarization



## Load sample dataset

_**Note: to load your own transcript dataset, for example from aws-transcribe, google-cloud, azure or any other service, convert your transcripts into a pandas dataframe. When you initialize the `Callyzer` class, point it to the columns that hold the speaker, utterance, start time and end time of each utterance.**_
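The conversion described in the note could look roughly like the sketch below. The input layout (`speaker_label`, `text`, `start`, `end`) is a hypothetical example and not the actual output format of any particular transcription service; only the four output column names come from this notebook.

```python
# Hypothetical raw output from a transcription service (field names are assumptions).
raw_segments = [
    {"speaker_label": "INV", "text": " tell me what you know about the Pilgrims. ",
     "start": "16.161", "end": "20.777"},
    {"speaker_label": "CHI", "text": "he wan an explorer", "start": "0.0", "end": "1.033"},
]


def to_rows(segments):
    """Flatten service output into one record per utterance, ordered by start time."""
    rows = []
    for seg in segments:
        rows.append({
            "speaker": seg["speaker_label"],
            "utterance": seg["text"].strip(),
            "start_time": float(seg["start"]),
            "end_time": float(seg["end"]),
        })
    return sorted(rows, key=lambda r: r["start_time"])


rows = to_rows(raw_segments)
print(rows[0])
```

Passing `rows` to `pd.DataFrame(rows)` should then give a dataframe with the `speaker`, `utterance`, `start_time` and `end_time` columns used in the rest of this notebook.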

In [3]:
transcript_df = pd.read_csv("sample_transcript_data.csv"); transcript_df.head() #read sample data

Unnamed: 0,speaker,utterance,start_time,end_time
0,CHI,he wan an explorer,0.0,1.033
1,CHI,and he discovered America,1.033,3.4
2,CHI,and he went for the king and queen of Spain.,3.4,7.983
3,INV,tell me what you know about the Pilgrims.,16.161,20.777
4,CHI,well they moved from England and came to Ameri...,20.777,28.711


## Analyse the Call transcript

Initialise the core call analysis class `Callyzer` with your dataset represented as a pandas dataframe and point towards utterance, speaker, start-time & end-time columns in it.

In [5]:
transcript_analysis = Callyzer(data=transcript_df, utterance="utterance", speaker="speaker", starttime="start_time", endtime="end_time")

Compute and access various attributes of the call as follows:

## Find Interruptions and periods of silence in a call.

In [6]:
interruptions = transcript_analysis.get_interruption() #interruption periods in a call
silence = transcript_analysis.get_silence() #periods of silence in a call

print("1. INTERRUPTIONS:\n")
pprint(interruptions)

print("\n2. PERIODS OF SILENCE:\n")
pprint(silence)

1. INTERRUPTIONS:

{'CHI': {'count': 1,
         'metadata': [{'end_time': 1223.032,
                       'index': 200,
                       'start_time': 1222.365}]},
 'INV': {'count': 1,
         'metadata': [{'end_time': 574.931,
                       'index': 102,
                       'start_time': 574.198}]},
 'total_interruption': 2}

2. PERIODS OF SILENCE:

{'CHI': {'count': 3,
         'metadata': [{'end_time': 189.865, 'index': 34, 'start_time': 188.765},
                      {'end_time': 279.215, 'index': 48, 'start_time': 275.082},
                      {'end_time': 3190.575,
                       'index': 617,
                       'start_time': 3182.483}]},
 'INV': {'count': 6,
         'metadata': [{'end_time': 20.777, 'index': 3, 'start_time': 16.161},
                      {'end_time': 964.2, 'index': 166, 'start_time': 963.0},
                      {'end_time': 1512.706,
                       'index': 253,
                       'start_time': 1509.049},
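This notebook does not show how `pyconverse` derives these periods internally. As a rough intuition only, one plausible approach compares each utterance's start time with the previous utterance's end time. The function name, the `min_silence` threshold and the sample rows below are assumptions made for illustration, not the library's logic.

```python
def find_gaps_and_overlaps(rows, min_silence=1.0):
    """Flag long pauses before an utterance and overlaps between different speakers."""
    silences, interruptions = [], []
    for i in range(1, len(rows)):
        prev, cur = rows[i - 1], rows[i]
        gap = cur["start_time"] - prev["end_time"]
        if gap >= min_silence:
            # The current speaker waited at least `min_silence` seconds.
            silences.append({"index": i, "speaker": cur["speaker"], "gap": round(gap, 3)})
        elif gap < 0 and cur["speaker"] != prev["speaker"]:
            # The current speaker started before the other speaker had finished.
            interruptions.append({"index": i, "speaker": cur["speaker"], "overlap": round(-gap, 3)})
    return silences, interruptions


# Made-up rows: the third start time is chosen to overlap with the second utterance.
example_rows = [
    {"speaker": "CHI", "start_time": 3.4, "end_time": 7.983},
    {"speaker": "INV", "start_time": 16.161, "end_time": 20.777},
    {"speaker": "CHI", "start_time": 20.5, "end_time": 28.711},
]
silences, interruptions = find_gaps_and_overlaps(example_rows)
print(silences)
print(interruptions)
```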
    

## Find the Backchannel utterances in a call transcript


Backchannels can be verbal, non-verbal (visual) or both. Vocalisations like 'hmm' or 'uh-huh', gestures such as head nods or head shakes, and a combination of verbal and non-verbal responses are common examples of backchannels. `pyconverse` identifies verbal backchannels using two different methods: 

1. default: via a dictionary of commonly used backchannel keywords - fast, slightly lower accuracy.
2. nlp: via sentence similarity with sentence-transformers - slow, higher accuracy.
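A keyword-based check like the first method can be sketched in a few lines. The keyword set and the normalisation below are illustrative assumptions; the dictionary that `pyconverse` actually ships is not shown in this notebook.

```python
# Illustrative keyword set only, not the dictionary used by pyconverse.
BACKCHANNEL_KEYWORDS = {"okay", "ok", "yeah", "no", "mhm", "hmm", "uh-huh"}


def is_backchannel(utterance):
    """Return True when the whole utterance, minus edge punctuation, is a known keyword."""
    normalised = utterance.lower().strip(" .,!?")
    return normalised in BACKCHANNEL_KEYWORDS


for text in ["okay", "mhm.", "okay great", "what else."]:
    print(text, "->", is_backchannel(text))
```

An exact-match lookup of this kind is fast but misses variants such as "okay great", which is the sort of case a similarity-based method can pick up.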

_**Note: the sentence-similarity implementation of backchannel identification is heavily inspired by Facebook's [Unsupervised Topic Segmentation of Meetings with BERT Embeddings](https://arxiv.org/abs/2106.12978) paper.**_

This works by taking common backchannel phrases such as "okay", "thats it" and "ummhhh" as backchannel samples, max-pooling their representations, and then computing the sentence similarity between the result and every utterance in the transcript.
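The pooling-then-similarity idea can be illustrated without downloading any model. In the sketch below, character-bigram counts are a toy stand-in for sentence-transformers embeddings, and the element-wise max-pool over the samples is one possible reading of the description above; none of the function names come from `pyconverse`.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: character-bigram counts."""
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def max_pool(vectors):
    """Element-wise maximum over several sparse vectors."""
    pooled = Counter()
    for vec in vectors:
        for key, value in vec.items():
            pooled[key] = max(pooled[key], value)
    return pooled


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


samples = ["okay", "thats it", "ummhhh"]  # backchannel samples quoted in the text
prototype = max_pool([embed(s) for s in samples])


def backchannel_score(utterance):
    """Similarity between an utterance and the pooled backchannel prototype."""
    return cosine(embed(utterance), prototype)


for text in ["okay", "he discovered America"]:
    print(text, "->", round(backchannel_score(text), 3))
```

With real sentence embeddings, an utterance would be tagged as a backchannel when its score exceeds some threshold; the threshold value is a tuning choice and is not specified here.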

In [7]:
backchannels_via_keywords = transcript_analysis.tag_backchannel().query("is_backchannel == True") #identify backchannel utterances via keywords
backchannels_via_transformers = transcript_analysis.tag_backchannel(type='nlp').query("is_backchannel == True") #identify backchannel utterances with sentence-transformers

In [8]:
backchannels_via_keywords

Unnamed: 0,speaker,utterance,start_time,end_time,is_backchannel
28,INV,okay,156.133,156.866,True
52,INV,okay,284.348,284.914,True
65,INV,okay,342.564,343.03,True
83,INV,okay,450.4,451.066,True
105,INV,okay,585.98,586.63,True
120,INV,okay,701.13,701.53,True
128,INV,okay,741.783,742.149,True
135,CHI,no,770.545,771.411,True
136,CHI,yeah,771.411,771.895,True
158,INV,okay,916.974,917.574,True


In [9]:
backchannels_via_transformers

Unnamed: 0,speaker,utterance,start_time,end_time,is_backchannel
17,INV,okay great,79.315,81.382,True
21,CHI,well he,103.298,114.700,True
28,INV,okay,156.133,156.866,True
37,INV,okay good,196.855,198.232,True
52,INV,okay,284.348,284.914,True
...,...,...,...,...,...
667,INV,that's it,3423.566,3424.170,True
681,INV,mhm.,3460.900,3461.353,True
685,INV,okay,3485.700,3486.594,True
696,INV,okay,3532.419,3533.789,True


Backchannel detection with keywords returned **39 utterances**, whereas backchannel detection with sentence-transformers returned **68 utterances**!

## Find the utterances which are interrogative questions

In [10]:
questions = transcript_analysis.tag_questions().query("is_question == True") #identify utterances which are questions
questions

Unnamed: 0,speaker,utterance,start_time,end_time,is_backchannel,is_question
84,INV,what else.,451.066,451.916,False,True
167,INV,what else can you tell me.,964.2,966.666,False,True
446,INV,what do you know about ballet.,2382.482,2384.188,False,True
474,INV,what do you know about the Blues.,2539.778,2541.891,False,True
572,INV,what do you know about the Columbine shooting.,2990.69,2993.14,False,True
578,INV,what do you know about the Oklahoma_City bombing.,3018.327,3021.09,False,True
585,INV,what do you know about Princess_Diana.,3038.958,3040.908,False,True
616,INV,what do you know about John_F_Kennedy_Junior.,3172.962,3175.517,False,True
663,INV,what do you know about the Million_Man_March.,3393.043,3395.748,False,True
668,INV,what do you know about Moesha.,3424.17,3425.586,False,True


## Identify the emotions of the utterances

Note: this might take some time as it uses a MiniLM language model.

In [11]:
transcript_analysis_ = Callyzer(transcript_df.tail(), utterance="utterance", speaker="speaker", starttime="start_time", endtime="end_time")

emotions = transcript_analysis_.tag_emotion(); emotions[["speaker", "utterance", "emotion"]]
#if no emotion is identified, it returns 'not found'.

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.data['emotion'] = classes


Unnamed: 0,speaker,utterance,emotion
696,INV,okay,not found
697,INV,that's everything,Impressed
698,INV,took a long time huh,not found
699,INV,did you get tired,not found
700,INV,yeah.,not found


## Identify if a given utterance is empathetic or not

In [12]:
empathy = transcript_analysis_.tag_empathy(); empathy[["speaker", "utterance", "is_empathy"]]
#if no empathy is identified, it classifies the sentence as 'non_empathetic', if identified it returns 'empathetic'.

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.data['is_empathy'] = classes


Unnamed: 0,speaker,utterance,is_empathy
696,INV,okay,Neutral
697,INV,that's everything,not found
698,INV,took a long time huh,not found
699,INV,did you get tired,not found
700,INV,yeah.,not found


## Collapse utterances into Turn level text chunks:

In [13]:
# convert the data at speaker level to turn level
df = transcript_analysis.convert_at_turn()

print(f"1. Original Utterance count: {transcript_df.shape[0]}\n2. After collapsing the utterance to turn level: {df.shape[0]}")

1. Original Utterance count: 701
2. After collapsing the utterance to turn level: 194


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cacher_needs_updating = self._check_is_chained_assignment_possible()
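Collapsing to turn level usually means merging consecutive utterances by the same speaker into a single chunk. The sketch below shows that idea with `itertools.groupby` on plain dictionaries; it is an assumption about what a turn is, not the implementation of `convert_at_turn`.

```python
from itertools import groupby


def collapse_to_turns(rows):
    """Merge runs of consecutive utterances by the same speaker into one turn."""
    turns = []
    for speaker, group in groupby(rows, key=lambda r: r["speaker"]):
        group = list(group)
        turns.append({
            "speaker": speaker,
            "utterance": " ".join(r["utterance"] for r in group),
            "start_time": group[0]["start_time"],   # turn starts with its first utterance
            "end_time": group[-1]["end_time"],      # and ends with its last one
        })
    return turns


# First five rows of the sample transcript shown earlier.
sample_rows = [
    {"speaker": "CHI", "utterance": "he wan an explorer", "start_time": 0.0, "end_time": 1.033},
    {"speaker": "CHI", "utterance": "and he discovered America", "start_time": 1.033, "end_time": 3.4},
    {"speaker": "CHI", "utterance": "and he went for the king and queen of Spain.", "start_time": 3.4, "end_time": 7.983},
    {"speaker": "INV", "utterance": "tell me what you know about the Pilgrims.", "start_time": 16.161, "end_time": 20.777},
    {"speaker": "CHI", "utterance": "well they moved from England and came to Ameri...", "start_time": 20.777, "end_time": 28.711},
]
turns = collapse_to_turns(sample_rows)
print(len(sample_rows), "utterances ->", len(turns), "turns")
```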


## Identify the overall psychological correlates of the speakers

In [14]:
ss = SpeakerStats(df, speaker='speaker')
pprint(ss.get_stats())

{'CHI': ['Focused on the past', 'Verbal fluency, cognitive complexity'],
 'INV': ['Use of concrete nouns, interest in objects/things',
         'Education, concern with precision']}


## Call segmentation

Let's segment the call into larger chunks of text using semantic sentence similarity and the TextTiling algorithm.
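The underlying idea, starting a new segment where neighbouring text stops being similar, can be sketched as follows. This is a simplified threshold-based variant that uses bag-of-words cosine similarity; it is neither the full TextTiling depth-score procedure nor the `SemanticTextSegmentation` implementation, and the `threshold` value and example turns are assumptions.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words vector, a toy stand-in for a semantic sentence embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def segment(turns, threshold=0.1):
    """Start a new segment wherever adjacent turns share too little vocabulary."""
    if not turns:
        return []
    segments = [[turns[0]]]
    for prev, cur in zip(turns, turns[1:]):
        if cosine(bow(prev), bow(cur)) < threshold:
            segments.append([cur])      # similarity dropped: open a new segment
        else:
            segments[-1].append(cur)    # still on topic: extend the current one
    return [" ".join(s) for s in segments]


# Made-up turns for illustration.
example_turns = [
    "pilgrims moved from england",
    "pilgrims came to plymouth rock",
    "george washington was first president",
    "washington was a general",
]
for chunk in segment(example_turns):
    print(chunk)
```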

In [15]:
sts = SemanticTextSegmentation(df)
segments = sts.get_segments()

for segment in segments[0:4]:
    pprint(segment)
    print("-"*50)

('he wan an explorer and he discovered America and he went for the king and '
 'queen of Spain. tell me what you know about the Pilgrims. well they moved '
 'from England and came to America to Plymouth_Rock and it was hard for them '
 'at first when they came and a_lot_of them died and stuff and the Indians '
 'they met the Indians and they came along and helped them and that was how '
 'the first Thanksgiving started. can you tell me anything else about the '
 "Pilgrims. they moved from England because of the king he wasn't treating "
 "them right and they couldn't have what they they had to worship what he "
 "wanted to worship and do what he wanted to do and they didn't like that so "
 'they moved.')
--------------------------------------------------
(' can you tell me anything else okay great tell me everything you know about '
 "George_Washington. he was the first president he was well he I'm trying to "
 'well he fought in the Civil_War he was a general in the Civil_War and '
 "

## ZeroShot topic identification

Identify the topics being discussed in a call via zero-shot topic inference at the utterance or segment level (works best on segments).

In [17]:
zst = ZeroShotTopicFinder()

In [18]:
for text in segments[0:2]:
    print(f"Text: {text}\n")
    print(f"Topics: {zst.find_topic(text)}\n")
    print("-"*50)

Text: he wan an explorer and he discovered America and he went for the king and queen of Spain. tell me what you know about the Pilgrims. well they moved from England and came to America to Plymouth_Rock and it was hard for them at first when they came and a_lot_of them died and stuff and the Indians they met the Indians and they came along and helped them and that was how the first Thanksgiving started. can you tell me anything else about the Pilgrims. they moved from England because of the king he wasn't treating them right and they couldn't have what they they had to worship what he wanted to worship and do what he wanted to do and they didn't like that so they moved.

Topics: ['Settler', 'Wayfarer']

--------------------------------------------------
Text:  can you tell me anything else okay great tell me everything you know about George_Washington. he was the first president he was well he I'm trying to well he fought in the Civil_War he was a general in the Civil_War and chopped 

## Transcript Summarization

Summarize the transcript (here, only its first 30 utterances).

In [19]:
sample = transcript_df.iloc[:30]
ts = TranscriptSummarization(sample)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


In [20]:
print(ts.get_summary())

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
100%|██████████| 2/2 [00:15<00:00,  7.85s/it]

CHI tells INV about the Pilgrims who came to America from England and settled at Plymouth Rock. They met the Indians who helped them. George Washington discovered America and went for the king and queen of Spain.He was the first president. He fought in the Civil War as a general. He chopped down his father's cherry tree as a boy.



