<img src="./images/Blob_logo.png" width=600 height=600/>

# Table of Contents

1. [Introduction](#Introduction)
2. [Methodology](#Methodology)
3. [Implementation](#Implementation)
4. [Deployment](#Deployment)
5. [References](#References)

# Introduction

As part of our pipeline, we analyze users' conversations to identify social areas of development and provide personalized feedback.

<img src="./images/content_tracker.png" height=800 width=800 />

Conversation analysis contains two main components:
1. **Facial Emotion Recognition**
2. **Speech Analysis**

In this notebook, we tackle the *Speech Analysis* component.

<img src="./images/analysis_tracker.png" height=500 width=500 />

# Methodology

The service will be implemented as follows:

<img src="./images/diagram.png" height=600 width=600 />


## Social Areas of Development

Through our research, we were able to determine the following areas of development:

1. Lack of empathy and understanding towards others
2. Use of inappropriate or offensive language
3. Interrupting or talking over others
4. Failure to listen actively or to show interest in the conversation
5. Failure to establish rapport or build a connection with the other person
6. Talking excessively and dominating the conversation
7. Bringing up sensitive or controversial topics without regard for the other person's feelings
8. Engaging in negative or hostile behavior, such as insults or verbal attacks
9. Dismissing or belittling the other person's perspective or experiences
10. Failing to communicate clearly or effectively, resulting in misunderstandings or confusion

For the purposes of this service, however, we only look for five of these areas:

<img src="./images/development_topics.png" height=800 width=800 />

| Issue | Description | Importance | How to fix |
| --- | --- | --- | --- |
| Lack of empathy and understanding towards others | Difficulty in understanding and relating to other people's feelings, perspectives, and experiences. | Important as it can lead to misunderstandings, conflicts, and strained relationships. | Practicing active listening, perspective-taking, and showing genuine interest in others can help increase empathy and understanding towards others. Engaging in cultural diversity training can also help broaden one's perspectives and increase cultural competence. |
| Use of inappropriate or offensive language | Using language that can be hurtful, offensive, or derogatory towards others. | Important as it can create a hostile or uncomfortable environment, hurt others' feelings, and damage relationships. | Practicing awareness of language use and avoiding using offensive or derogatory language towards others. Apologizing and making amends if offensive language is used. Engaging in cultural sensitivity training can also help broaden one's perspectives and increase cultural competence. |
| Bringing up sensitive or controversial topics without regard for the other person's feelings | Initiating conversations or bringing up topics that may be sensitive or controversial without considering how the other person may feel about it. | Important as it can lead to discomfort, hurt feelings, or even conflict. | Practicing awareness of the other person's feelings and perspective before initiating conversations on sensitive or controversial topics. Asking for permission or checking in with the other person before bringing up such topics. Using a respectful and non-judgmental tone when discussing sensitive or controversial topics. |
| Engaging in negative or hostile behavior, such as insults or verbal attacks | Using aggressive, hostile, or negative language towards others. | Important as it can create a hostile or uncomfortable environment, hurt others' feelings, and damage relationships. | Practicing awareness of one's behavior and language use towards others. Avoiding insults or verbal attacks towards others. Using a respectful and non-judgmental tone when communicating with others, even in times of conflict or disagreement. |
| Dismissing or belittling the other person's perspective or experiences | Dismissing or belittling the other person's feelings, perspectives, or experiences. | Important as it can create a hostile or uncomfortable environment, hurt others' feelings, and damage relationships. | Practicing active listening and showing genuine interest in others' perspectives and experiences. Avoiding judgment or dismissive language when communicating with others. Using a respectful and non-judgmental tone when discussing differences in perspectives or experiences. |

# Implementation

## Transcription Service

In [37]:
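import os

import requests

# Path to the local recording to transcribe.
filename = r"C:\Users\Anton\Desktop\Recording.m4a"

# Read the key from the environment rather than hard-coding it in the notebook.
# ASSEMBLYAI_API_KEY is an assumed variable name.
api_key = os.environ["ASSEMBLYAI_API_KEY"]

def read_file(filename, chunk_size=5242880):
    """Yield the file in chunks (5 MiB by default) so it is streamed, not loaded whole."""
    with open(filename, 'rb') as _file:
        while True:
            data = _file.read(chunk_size)
            if not data:
                break
            yield data

headers = {'authorization': api_key}
response = requests.post('https://api.assemblyai.com/v2/upload',
                         headers=headers,
                         data=read_file(filename))
response.raise_for_status()  # fail early on an HTTP error

print(response.json())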

{'upload_url': 'https://cdn.assemblyai.com/upload/ae8cf144-5150-483f-b13a-314c684743a8'}


In [38]:
url = response.json()['upload_url']

Next, we queue the uploaded audio for transcription.

In [39]:
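import os

endpoint = "https://api.assemblyai.com/v2/transcript"

# Request body; named `payload` so it does not shadow the `json` module name.
payload = {
  "audio_url": url,
  "speaker_labels": True  # ask for utterances labeled by speaker
}

headers = {
  "Authorization": os.environ["ASSEMBLYAI_API_KEY"],  # assumed variable name
  "Content-Type": "application/json"
}

response = requests.post(endpoint, json=payload, headers=headers)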

In [40]:
response.json()

{'id': '6r1jywzwrj-3807-4e83-bd23-b61bc89db514',
 'language_model': 'assemblyai_default',
 'acoustic_model': 'assemblyai_default',
 'language_code': 'en_us',
 'status': 'queued',
 'audio_url': 'https://cdn.assemblyai.com/upload/ae8cf144-5150-483f-b13a-314c684743a8',
 'text': None,
 'words': None,
 'utterances': None,
 'confidence': None,
 'audio_duration': None,
 'punctuate': True,
 'format_text': True,
 'dual_channel': None,
 'webhook_url': None,
 'webhook_status_code': None,
 'webhook_auth': False,
 'webhook_auth_header_name': None,
 'speed_boost': False,
 'auto_highlights_result': None,
 'auto_highlights': False,
 'audio_start_from': None,
 'audio_end_at': None,
 'word_boost': [],
 'boost_param': None,
 'filter_profanity': False,
 'redact_pii': False,
 'redact_pii_audio': False,
 'redact_pii_audio_quality': None,
 'redact_pii_policies': None,
 'redact_pii_sub': None,
 'speaker_labels': True,
 'content_safety': False,
 'iab_categories': False,
 'content_safety_labels': {},
 'iab_cate

In [41]:
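# Named `transcript_id` so it does not shadow the built-in `id`.
transcript_id = response.json()['id']

Let's retrieve the result.

In [44]:
import os

endpoint = f"https://api.assemblyai.com/v2/transcript/{transcript_id}"
headers = {
    "authorization": os.environ["ASSEMBLYAI_API_KEY"],  # assumed variable name
}
response = requests.get(endpoint, headers=headers)
print(response.json())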

{'id': '6r1jywzwrj-3807-4e83-bd23-b61bc89db514', 'language_model': 'assemblyai_default', 'acoustic_model': 'assemblyai_default', 'language_code': 'en_us', 'status': 'completed', 'audio_url': 'https://cdn.assemblyai.com/upload/ae8cf144-5150-483f-b13a-314c684743a8', 'text': "Hey, man. How's it going? All doing good.", 'words': [{'text': 'Hey,', 'start': 1610, 'end': 1774, 'confidence': 0.9992, 'speaker': 'A'}, {'text': 'man.', 'start': 1812, 'end': 2062, 'confidence': 0.5873, 'speaker': 'A'}, {'text': "How's", 'start': 2116, 'end': 2346, 'confidence': 0.93009, 'speaker': 'A'}, {'text': 'it', 'start': 2378, 'end': 2478, 'confidence': 0.99975, 'speaker': 'A'}, {'text': 'going?', 'start': 2484, 'end': 3134, 'confidence': 0.8719, 'speaker': 'A'}, {'text': 'All', 'start': 3332, 'end': 3694, 'confidence': 0.99361, 'speaker': 'A'}, {'text': 'doing', 'start': 3732, 'end': 3982, 'confidence': 0.99865, 'speaker': 'A'}, {'text': 'good.', 'start': 4036, 'end': 4140, 'confidence': 0.99971, 'speaker':
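The first response above reports `'status': 'queued'` and the later one `'status': 'completed'`, so the result has to be requested more than once before the transcript is ready. The following is a minimal polling sketch, not the notebook's actual code. The helper name `wait_for_transcript`, the `fetch_status` callable, and the interval and timeout values are all hypothetical; it also assumes that a failed job reports `'status': 'error'` with an `'error'` message.

```python
import time


def wait_for_transcript(fetch_status, poll_seconds=3.0, timeout_seconds=300.0,
                        sleep=time.sleep):
    """Call fetch_status() until the transcript is completed, failed, or timed out.

    fetch_status is any zero-argument callable returning the transcript JSON
    as a dict, e.g. lambda: requests.get(endpoint, headers=headers).json().
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "completed":
            return result
        if status == "error":  # assumed failure status
            raise RuntimeError(result.get("error", "transcription failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"transcript still '{status}' after {timeout_seconds}s")
        sleep(poll_seconds)  # wait before asking again


# Offline demonstration with canned responses instead of real HTTP calls.
canned = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "text": "Hey, man. How's it going? All doing good."},
])
done = wait_for_transcript(lambda: next(canned), poll_seconds=0, sleep=lambda s: None)
print(done["text"])
```

Passing the request as a callable keeps the loop independent of the HTTP client, which also makes it easy to exercise without network access.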

In [48]:
utterances = response.json()['utterances']
utterances  # display the utterances, one entry per speaker turn

[{'confidence': 0.9225575000000001,
  'end': 4140,
  'speaker': 'A',
  'start': 1610,
  'text': "Hey, man. How's it going? All doing good.",
  'words': [{'text': 'Hey,',
    'start': 1610,
    'end': 1774,
    'confidence': 0.9992,
    'speaker': 'A'},
   {'text': 'man.',
    'start': 1812,
    'end': 2062,
    'confidence': 0.5873,
    'speaker': 'A'},
   {'text': "How's",
    'start': 2116,
    'end': 2346,
    'confidence': 0.93009,
    'speaker': 'A'},
   {'text': 'it',
    'start': 2378,
    'end': 2478,
    'confidence': 1.0,
    'speaker': 'A'},
   {'text': 'going?',
    'start': 2484,
    'end': 3134,
    'confidence': 0.8719,
    'speaker': 'A'},
   {'text': 'All',
    'start': 3332,
    'end': 3694,
    'confidence': 0.99361,
    'speaker': 'A'},
   {'text': 'doing',
    'start': 3732,
    'end': 3982,
    'confidence': 0.99865,
    'speaker': 'A'},
   {'text': 'good.',
    'start': 4036,
    'end': 4140,
    'confidence': 0.99971,
    'speaker': 'A'}]}]
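Each utterance carries a `speaker` label and its `text`, which is the information a speaker-labeled transcript needs. As a rough sketch of how the utterances could be turned into such a transcript, the hypothetical helper `format_utterances` below writes one `Speaker X: ...` line per utterance. The line format is an assumption, not something the transcription service prescribes.

```python
def format_utterances(utterances):
    """Render utterances as 'Speaker X: text' lines, one line per utterance."""
    lines = []
    for utt in utterances:
        lines.append(f"Speaker {utt['speaker']}: {utt['text'].strip()}")
    return "\n".join(lines)


# Sample shaped like the output above (word-level details omitted).
sample = [
    {"speaker": "A", "start": 1610, "end": 4140,
     "text": "Hey, man. How's it going? All doing good."},
]
print(format_utterances(sample))
```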

## Sentiment Analysis

In [5]:
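import os

import openai

# Read the key from the environment rather than hard-coding it in the notebook.
# OPENAI_API_KEY is an assumed variable name.
openai.api_key = os.environ["OPENAI_API_KEY"]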

prompt = """You are a communication and social expert. Your job is to identify issues in people's conversations. You will be given a text extract and you need to identify if one of these issues were present in the conversation:

- Lack of empathy and understanding towards others
- Use of inappropriate or offensive language
- Bringing up sensitive or controversial topics without regard for the other person's feelings
- Engaging in negative or hostile behavior, such as insults or verbal attacks
- Dismissing or belittling the other person's perspective or experiences

Note that each one could have more than one issue. Return ONLY a vector with 1 if the issue is present, and 0 otherwise. Don't name the issue, just use a vector where the dimensions follow the order of the issues as i gave them to you. Don't return anything other than this vector. Don't explain anything. Don't include any text other than the vector."""



conversations = ["Hey man, how are you doing? All's good? Yeah man, all is good. And you? All's good too.  So I hear you graduated as a computer engineer from LAU. What are you doing now? I'm actually working at CompanyX Oh, what's that? Never heard of it...  It's a small development firm. Were you not able to secure a job at Google? that's funny. I actually work at Google. It is what it is man. Great for you."]

labels = []

for conv in conversations:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": prompt},  # task instructions
            {"role": "user", "content": conv},      # conversation to classify
        ],
    )

    # Keep only the text of the model's reply.
    labels.append(response['choices'][0]['message']['content'])

labels

['[0, 0, 1, 0, 0, 1]']
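The model's reply is a plain string, and the recorded output above has six entries even though the prompt lists five issues, so it seems worth validating the vector before using it. The sketch below is one possible check, not part of the original service. The helper `parse_issue_vector` and the `ISSUES` constant are hypothetical names, and it assumes the reply is a JSON-style list of 0s and 1s.

```python
import json

# The five issues, in the same order as in the prompt.
ISSUES = [
    "Lack of empathy and understanding towards others",
    "Use of inappropriate or offensive language",
    "Bringing up sensitive or controversial topics without regard for the other person's feelings",
    "Engaging in negative or hostile behavior, such as insults or verbal attacks",
    "Dismissing or belittling the other person's perspective or experiences",
]


def parse_issue_vector(raw, issues=ISSUES):
    """Return the names of the flagged issues, or raise ValueError on a malformed reply."""
    try:
        vector = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not a JSON list: {raw!r}") from exc
    if not isinstance(vector, list) or len(vector) != len(issues):
        raise ValueError(f"expected {len(issues)} flags, got: {raw!r}")
    if any(flag not in (0, 1) for flag in vector):
        raise ValueError(f"every flag must equal 0 or 1, got: {raw!r}")
    return [issue for issue, flag in zip(issues, vector) if flag == 1]


print(parse_issue_vector("[0, 0, 1, 0, 0]"))

try:
    parse_issue_vector("[0, 0, 1, 0, 0, 1]")  # the six-entry reply recorded above
except ValueError as exc:
    print("rejected:", exc)
```

Rejecting a malformed reply outright, rather than truncating it, avoids attributing a flag to the wrong issue.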

### Speakers


In [12]:
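import os

import openai

# Read the key from the environment rather than hard-coding it in the notebook.
# OPENAI_API_KEY is an assumed variable name.
openai.api_key = os.environ["OPENAI_API_KEY"]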

prompt = """You are a communication and social expert. You will be given a text extract that contains a conversation between two speakers. Your job is to identify each speaker's communication weaknesses. The weaknesses can be one of the following:

- Lack of empathy and understanding towards others
- Use of inappropriate or offensive language
- Bringing up sensitive or controversial topics without regard for the other person's feelings
- Engaging in negative or hostile behavior, such as insults or verbal attacks
- Dismissing or belittling the other person's perspective or experiences

The text you will be given will be labeled according to speaker. You need to return exactly one vector for each speaker that takes into account all the things said by only this speaker.

The vector will have a 1 if the issue is present, and 0 otherwise. Don't name the issue, just use a vector where the dimensions follow the order of the issues as i gave them to you. Note that each one could have more than one issue. 

Make sure not to confuse the speakers. Pay very close attention to who is exhibiting the issue and put the flag in that speaker's vector. Don't put it in the others'. If B exhibited an issue towards A (for instance, if B was insensitive to A), then the 1 would appear in B's vector. It would not appear in A's. 

Make sure your output follows the following format: '{Speaker A: [x, x, x, x, x], Speaker B: [x, x, x, x, x]}'

"""



conversations = ["""

Person A: "I'm really struggling with this project. I feel like I'm not making any progress."
Person B: "Well, have you tried working harder? Maybe you just need to put in more effort."

Person A: "I have been working really hard, but I just can't seem to get the results I want."
Person B: "That's because you're not smart enough. Some people just don't have what it takes."

Person A: "That's not fair. I'm doing the best I can with the resources I have."
Person B: "Well, maybe you should have planned better or asked for help earlier. It's not my problem if you can't handle the pressure."

Person A: "I really need some support right now. I feel like I'm drowning in this project and no one is listening to me."
Person B: "I understand that you're struggling, but I have my own problems to deal with. I don't have time to listen to your complaints."

Person A: "I'm sorry if I'm being a burden. I just really need some help right now."
Person B: "Well, if you can't handle the pressure, maybe this isn't the right job for you."

"""]

labels = []

for conv in conversations:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": prompt},  # task instructions
            {"role": "user", "content": conv},      # speaker-labeled conversation
        ],
    )

    # Keep only the text of the model's reply.
    labels.append(response['choices'][0]['message']['content'])

labels

['{Speaker A: [0, 0, 0, 0, 0], Speaker B: [1, 1, 1, 1, 1]}']
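The reply format requested in the prompt, `{Speaker A: [...], Speaker B: [...]}`, is not valid JSON because the keys are unquoted, so `json.loads` would reject it. A small regular expression can pull out the per-speaker vectors instead. This is a minimal sketch under that assumption. The helper `parse_speaker_vectors` and its `n_issues` parameter are hypothetical, and a reply that drifts from the requested format raises an error rather than being guessed at.

```python
import re

# Matches e.g. "Speaker A: [0, 1, 0, 0, 1]" and captures the label and the list body.
SPEAKER_VECTOR = re.compile(r"Speaker\s+(\w+)\s*:\s*\[([^\]]*)\]")


def parse_speaker_vectors(raw, n_issues=5):
    """Map each speaker label to its list of 0/1 flags; raise ValueError if malformed."""
    result = {}
    for speaker, body in SPEAKER_VECTOR.findall(raw):
        flags = [int(part) for part in body.split(",")]  # int() tolerates spaces
        if len(flags) != n_issues or any(flag not in (0, 1) for flag in flags):
            raise ValueError(f"bad vector for speaker {speaker}: [{body}]")
        result[speaker] = flags
    if not result:
        raise ValueError(f"no speaker vectors found in: {raw!r}")
    return result


reply = "{Speaker A: [0, 0, 0, 0, 0], Speaker B: [1, 1, 1, 1, 1]}"
print(parse_speaker_vectors(reply))
```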

# Deployment

For deployment, refer to the *src* folder of the notebook.

# References

https://www.assemblyai.com/docs/walkthroughs#getting-the-transcription-result  
https://www.assemblyai.com/blog/the-top-free-speech-to-text-apis-and-open-source-engines/  
Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, and Shinji Watanabe. End-to-End Speech Recognition: A Survey.