## Package installation

In [21]:
#pip install openai-whisper

In [57]:
#pip install jiwer

In [None]:
#pip install evaluate whisper-normalizer
# tempfile ships with the Python standard library; no installation needed

## Package import and key declaration

In [94]:
import pandas as pd
import openai
import IPython
import tempfile
import re
import string

OPENAI_API_KEY=""
WHISPER_VERSION="whisper-1"
OPENAI_CHATGPT_VERSION="gpt-4-turbo-preview"

In [17]:
openai_client = openai.OpenAI(api_key=OPENAI_API_KEY)


## Speech to text

1) Pick one voice sample

In [5]:
path="REPONSES Q1/user_amld/answer_ts2024-03-22_083150.8049190000_audio.mp3"

2) Listen to it with the audio player

In [25]:
IPython.display.Audio(path)

## 1 - Create a function called "convert(path_to_speech)"

**Step by step, the function should:**

a. Open the audio file with `open()` (in binary mode)

b. Create an OpenAI client

c. Transcribe the audio with OpenAI Whisper

d. Return the transcript, to be stored in `Prediction`


In [3]:
def convert(path_to_speech):
    return
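One possible way to complete `convert` is sketched below. It is a minimal sketch, not the workshop's reference solution: it calls the `openai` v1 client's `audio.transcriptions.create` endpoint, and the optional `client`, `model` and `language` parameters are our own additions (the `client` parameter makes the function testable without network access). When no client is passed, `openai.OpenAI()` reads the key from the `OPENAI_API_KEY` environment variable.

```python
def convert(path_to_speech, client=None, model="whisper-1", language="fr"):
    """Transcribe an audio file with OpenAI Whisper and return the text."""
    if client is None:
        # Create an OpenAI client; the key is read from OPENAI_API_KEY
        import openai
        client = openai.OpenAI()
    # Open the audio in binary mode: the API expects a file object
    with open(path_to_speech, "rb") as audio_file:
        # Transcribe with Whisper; `language` (ISO-639-1) is an optional hint
        transcript = client.audio.transcriptions.create(
            model=model,
            file=audio_file,
            language=language,
        )
    # Return the plain-text transcript
    return transcript.text
```

With the key kept in the notebook, a call could look like `Prediction = convert(path, client=openai.OpenAI(api_key=OPENAI_API_KEY))`.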

## 2 - Benchmark whisper

a. We’ll use a running example for each of the metric definitions. 
Here, we have a ground truth or reference text sequence:

In [30]:
Reference = "Retraites Populaires m'a accompagné dans le choix de mon troisième pilier dès le début et a su être patient et à l'écoute dans toute la démarche."

And a predicted sequence from the speech recognition system that we’re trying to assess:



In [31]:
Prediction

"Le traite populaire m'a accompagné dans le choix de mon troisième pilier dès le début et j'en suis été patient et à l'écoute pour toute la démarche."

We can see that the prediction is pretty close, but some words are not quite right. We’ll evaluate this prediction against the reference with the most common speech recognition metrics and see what numbers we get for each.

## 3 - Word Error Rate

### Word Error Rate

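The word error rate (WER) metric is the ‘de facto’ metric for speech recognition. It counts substitutions (S), insertions (I) and deletions (D) at the word level, meaning errors are annotated word by word, and divides their sum by the number of words in the reference (N):

$$\mathrm{WER} = \frac{S + I + D}{N}$$

Take our example: one minimal alignment of the prediction with the 26-word reference gives 6 substitutions (*Retraites* → *Le*, *Populaires* → *traite*, *a* → *j'en*, *su* → *suis*, *être* → *été*, *dans* → *pour*) and 1 insertion (*populaire*), so WER = 7/26 ≈ 0.269.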

In [38]:
from evaluate import load

wer_metric = load("wer")

In [39]:
wer = wer_metric.compute(references=[Reference], 
                         predictions=[Prediction])
print(wer)

0.2692307692307692


0.27, or 26.9%, as expected! We now know what’s going on under the hood of this WER calculation.
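To make that computation concrete, here is a minimal from-scratch sketch. `word_error_rate` is our own helper, not part of `evaluate`: it finds the minimum number of word substitutions, insertions and deletions with a Levenshtein dynamic program and divides by the reference length. It splits on whitespace and applies no normalisation, so casing and punctuation differences count as errors, as in the `evaluate` result above.

```python
def word_error_rate(reference: str, prediction: str) -> float:
    """WER = (S + I + D) / N, using a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = prediction.split()
    if not ref:
        raise ValueError("The reference must contain at least one word")
    # dist[j] = edit distance between the ref words seen so far and hyp[:j]
    dist = list(range(len(hyp) + 1))  # row 0: j insertions
    for i in range(1, len(ref) + 1):
        # prev_diag holds the previous row's value at column j - 1
        prev_diag, dist[0] = dist[0], i  # column 0: i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            prev_diag, dist[j] = dist[j], min(
                dist[j] + 1,       # deletion of ref[i - 1]
                dist[j - 1] + 1,   # insertion of hyp[j - 1]
                prev_diag + cost,  # substitution (or match when cost == 0)
            )
    return dist[-1] / len(ref)
```

`word_error_rate(Reference, Prediction)` should reproduce the 0.269 printed above. The single rolling row keeps memory proportional to the prediction length instead of storing the full matrix.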


### Word Accuracy

We can flip the WER around to give a metric where higher is better. Rather than measuring the word error rate, we can measure the word accuracy (WAcc) of our system:

$$\mathrm{WAcc} = 1 - \mathrm{WER}$$

The WAcc is also measured at the word level; it is simply the WER reformulated as an accuracy metric rather than an error metric.

The WAcc is rarely quoted in the speech literature: we think of our system predictions in terms of word errors, and so prefer error-rate metrics that map directly onto these error-type annotations.
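Applied to the running example: the WER of 0.2692 printed earlier is exactly 7/26 (7 word errors over 26 reference words), so the word accuracy is:

```latex
\mathrm{WAcc} = 1 - \mathrm{WER} = 1 - \frac{7}{26} = \frac{19}{26} \approx 0.731
```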



## 4 - Normalizing

The Whisper transcription is orthographic and thus ready to go: it is formatted as we’d expect for a meeting transcription or dictation script, with both punctuation and casing. By contrast, a model that outputs unformatted text, such as Wav2Vec2, would need additional post-processing to restore punctuation and casing before its predictions could be used in downstream applications.

There is a happy medium between normalising and not normalising: we can train our systems on orthographic transcriptions, and then normalise the predictions and targets before computing the WER. This way, we train our systems to predict fully formatted text, but also benefit from the WER improvements we get by normalising the transcriptions.

The Whisper model was released with a normaliser that effectively handles the normalisation of casing, punctuation and number formatting among others. Let’s apply the normaliser to the Whisper transcriptions to demonstrate how we can normalise them:

**Step by step:**

a. Import the normalizer object

b. Normalize the prediction

c. Compute the WER with `wer_metric`

d. Repeat with the normalized reference, compare both WER values and analyze the difference

In [6]:
from whisper_normalizer.basic import BasicTextNormalizer
prediction = "He tells us that at this festive season of the year, with Christmas and roast beef looming before us, similarly is drawn from eating and its results occur most readily to the mind."

Once normalised, the prediction should be fully lowercased with all punctuation removed. Let’s now define the reference transcription and then compute the normalised WER between the reference and the prediction:

In [7]:
reference = "HE TELLS US THAT AT THIS FESTIVE SEASON OF THE YEAR WITH CHRISTMAS AND ROAST BEEF LOOMING BEFORE US SIMILES DRAWN FROM EATING AND ITS RESULTS OCCUR MOST READILY TO THE MIND"
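The normalised-WER computation itself is left as an exercise. To illustrate what the normaliser does, here is a simplified, dependency-free approximation of `BasicTextNormalizer`'s core steps (lowercasing, turning punctuation and symbols into spaces, collapsing whitespace). `basic_normalize` is our own hypothetical helper, not the library code; in the notebook you would use `BasicTextNormalizer()` itself.

```python
import re
import unicodedata


def basic_normalize(text: str) -> str:
    """Simplified approximation of Whisper's BasicTextNormalizer."""
    text = text.lower()
    # Replace every mark (M), symbol (S) or punctuation (P) character with a space
    text = "".join(
        " " if unicodedata.category(ch)[0] in "MSP" else ch
        for ch in unicodedata.normalize("NFKC", text)
    )
    # Collapse runs of whitespace and trim both ends
    return re.sub(r"\s+", " ", text).strip()
```

In the notebook, the comparison then boils down to `wer_metric.compute(references=[normalizer(reference)], predictions=[normalizer(prediction)])` with `normalizer = BasicTextNormalizer()`, versus the same call on the raw strings. Note that this sketch also splits French elisions: *m'a* becomes the two tokens *m* and *a*.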

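6.25%: that’s about what we’d expect for the Whisper base model on the LibriSpeech validation set. After normalisation, the only errors left are one substitution (*similes* → *similarly*) and one insertion (*is*) over 32 reference words, i.e. 2/32 = 6.25%. As we see here, we’ve predicted an orthographic transcription but still benefited from the WER boost obtained by normalising the reference and prediction before computing the WER.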

The choice of how you normalise the transcriptions is ultimately down to your needs. We recommend training on orthographic text and evaluating on normalised text to get the best of both worlds.

## 5 - TTS

In [55]:
question = "Si vous deviez conclure une nouvelle assurance vie (3e pilier), auprès de quelle entreprise le feriez-vous ? Citez le nom de l’entreprise et expliquez pourquoi."

1 - Create a function `convert_openai` that takes as input the text of the question and the language code.

2 - The function returns the path to a temporary audio file that can be displayed and played.

3 - Choose the most user-friendly voice.

https://platform.openai.com/docs/guides/text-to-speech

In [59]:
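openai_client = openai.OpenAI(api_key=OPENAI_API_KEY)

def convert_openai(text, language_code="fr-FR", voice="nova"):
    # OpenAI TTS has no language parameter: it speaks the language of the input
    # text, so language_code is kept for the interface but not sent to the API.
    # The endpoint returns MP3 by default, hence the ".mp3" suffix.
    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as temp_file:
        temp_file_path = temp_file.name
    # "nova" is only one candidate for the most user-friendly voice; compare it
    # with alloy, echo, fable, onyx and shimmer before choosing.
    with openai_client.audio.speech.with_streaming_response.create(
        model="tts-1",
        voice=voice,
        input=text,
    ) as response:
        response.stream_to_file(temp_file_path)

    return temp_file_path

q_path = convert_openai(question, language_code='fr-FR')
IPython.display.Audio(q_path)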



In [12]:
question

'Si vous deviez conclure une nouvelle assurance vie (3e pilier), auprès de quelle entreprise le feriez-vous ? Citez le nom de l’entreprise et expliquez pourquoi.'

## 6 - CONTEXT-GENERATED QUESTIONS

1 - Propose a prompt to get in-depth insights about life insurance in Switzerland.

2 - Use the most suitable prompting techniques (see the slides).

In [9]:
Prompt=""

In [10]:
questions = {"question_text": "Si vous deviez conclure une nouvelle assurance vie (3e pilier), auprès de quelle entreprise le feriez-vous ? Citez le nom de l’entreprise et expliquez pourquoi.",
             "question_generation": {"context":Prompt}}

In [108]:
questions

{'question_text': 'Si vous deviez conclure une nouvelle assurance vie (3e pilier), auprès de quelle entreprise le feriez-vous ? Citez le nom de l’entreprise et expliquez pourquoi.',
 'question_generation': {'context': "Tu es un spécialiste dans les études de marchés. Tu cherches à comprendre ce que les Vaudois pensent de l'entreprise Retraites Populaires. Le 3ème pilier est une solution d’épargne privée qui permet de compléter les revenus prévus par les 1er et 2e piliers, tout en vous faisant bénéficier d’avantages fiscaux. Tu effectues maintenant une interview avec un Vaudois pour mieux comprendre ce qu'il pense des assurance vie. La personne parle et ses paroles sont retranscrites en texte, c'est ce texte dont tu disposes. Tu effectues maintenant une interview avec un Vaudois pour mieux comprendre ce qu'il pense des assurance vie. Pose lui des questions en rapport avec ses réponses pour bien comprendre ce que représente pour lui une assurance vie (3e pilier) et ce qui le rend désirab

In [14]:
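def get_context(generated_count, context, history):
    # Only one "{}" placeholder: history[0]['content'] is ignored here, but the
    # first question can still reach the model through `history` below.
    context_full = "{}".format(context, history[0]['content'])

    # The interviewer instructions go in the system message
    context = [
        {"role": "system", "content": context_full},
        # {"role": "user", "content": context_full}
    ]

    # Keep only the last 2 * generated_count + 1 utterances
    history = history[-2*(generated_count+1)+1:]

    for utterance in history:
        context.append(utterance)

    print(f'[PING-PONG][#GENERATED] {generated_count}')
    print(f'[PING-PONG][CONTEXT] {context}')
    return context

def _generate(context):
    # Ask the chat model for the next message given the conversation so far
    response = openai_client.chat.completions.create(
        model=OPENAI_CHATGPT_VERSION,
        messages=context)
    question = response.choices[0].message.content
    return question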


generated_count = 1
context = questions["question_generation"]["context"]
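`get_context` keeps only the tail of the conversation with `history[-2*(generated_count+1)+1:]`, i.e. the last `2 * generated_count + 1` utterances. The API-free sketch below uses a hypothetical helper, `kept_utterances`, and a made-up three-turn history of question/answer labels to show which turns survive for a few values of `generated_count`.

```python
def kept_utterances(history, generated_count):
    """Same slice as get_context: the last 2 * generated_count + 1 utterances."""
    return history[-2 * (generated_count + 1) + 1:]


# Hypothetical interview with three question/answer turns: Q1, A1, ..., Q3, A3
demo_history = []
for turn in range(1, 4):
    demo_history.append(f"Q{turn}")
    demo_history.append(f"A{turn}")

for n in range(1, 4):
    print(n, kept_utterances(demo_history, n))
```

For `n = 1` it keeps `['A2', 'Q3', 'A3']`. Because the window length is odd, once the history is longer than the window the kept context starts on an answer rather than on the question that prompted it; an even window such as `history[-2 * generated_count:]` would keep whole question/answer pairs, if that matters for your prompt.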

1) Insert the question and the correct transcript into the history to generate the "smart interview".

In [15]:
question = "Si vous deviez conclure une nouvelle assurance vie (3e pilier), auprès de quelle entreprise le feriez-vous ? Citez le nom de l’entreprise et expliquez pourquoi."
reference = "Retraites Populaires m'a accompagné dans le choix de mon troisième pilier dès le début et a su être patient et à l'écoute dans toute la démarche."

In [16]:
history = [{"role": "system", "content": question},
           {"role": "user", "content": reference}]

question_text = _generate(get_context(generated_count,
                                      context,
                                      history))
question_text

[PING-PONG][#GENERATED] 1
[PING-PONG][CONTEXT] [{'role': 'system', 'content': "Tu es un spécialiste dans les études de marchés. Tu cherches à comprendre ce que les Vaudois pensent de l'entreprise Retraites Populaires. Le 3ème pilier est une solution d’épargne privée qui permet de compléter les revenus prévus par les 1er et 2e piliers, tout en vous faisant bénéficier d’avantages fiscaux. Tu effectues maintenant une interview avec un Vaudois pour mieux comprendre ce qu'il pense des assurance vie. La personne parle et ses paroles sont retranscrites en texte, c'est ce texte dont tu disposes. Tu effectues maintenant une interview avec un Vaudois pour mieux comprendre ce qu'il pense des assurance vie. Pose lui des questions en rapport avec ses réponses pour bien comprendre ce que représente pour lui une assurance vie (3e pilier) et ce qui le rend désirable. Montre que tu as compris sa réponse et pose lui une question pour lui demander plus de détails, de profondeur et de précision sur l’entr


## 7 - ENTITY EXTRACTION

In [85]:
def get_clean_name(input_string):
    """
    Process the input string by:
    - Removing punctuation and digits
    - Keeping only letters
    - Converting to lowercase except for the first letter

    :param input_string: str
    :return: str
    """
    # Remove punctuation and digits; re.escape keeps every punctuation
    # character literal inside the character class
    cleaned_string = re.sub(f"[{re.escape(string.punctuation)}0-9]", "", input_string)
    
    # Convert to lowercase and capitalize the first letter
    processed_string = cleaned_string.strip().lower().capitalize()
    
    return processed_string
    
class WrongExtractionAnswerFormat(Exception):
    def __init__(self, answer):
        base_message = ("Could not parse Entity Extraction Answer from API: {}. "
                        "Should be : <status> <entity>")
        self.message = base_message.format(answer)
        super().__init__(self.message)

def parse_entity_with_status_ok(raw_entity, entity_kind):
    if(entity_kind==LIST_ENTITY_NAME):
        entity_list = raw_entity.strip().split(",")
        entity_list = [get_clean_name(item) for item in entity_list]
        entity = entity_list
    else:
        entity = get_clean_name(raw_entity)
    return entity


In [86]:
def parse_entity_extraction(answer, entity_kind):
    status = answer[""]
    if(status == ""):
        raw_entity = answer[""] # there should be a space but since we strip afterwards it is ok
        entity = parse_entity_with_status_ok(raw_entity, entity_kind)
        return True, entity
    elif(status == ""):
        return False, ""
    raise WrongExtractionAnswerFormat(answer)
    
SINGLE_ENTITY_NAME = "SINGLE"
LIST_ENTITY_NAME = "LIST"
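The status tokens in `parse_entity_extraction` are left blank for you to fill in. As an illustration only, here is a parser for a hypothetical convention in which the model answers `OK <entity>` when it finds an entity and `KO` when it does not; these token names are our assumption, not the workshop's. The sketch returns the raw entity, which you would then pass to `parse_entity_with_status_ok`, and raises `ValueError` where the notebook raises `WrongExtractionAnswerFormat`.

```python
def parse_status_and_entity(answer, ok_token="OK", ko_token="KO"):
    """Split a '<status> <entity>' answer into (found, raw_entity).

    ok_token / ko_token are hypothetical status markers chosen for this sketch.
    """
    # Split once on the first space: the status comes first, the rest is the entity
    status, _, raw_entity = answer.strip().partition(" ")
    if status == ok_token:
        return True, raw_entity.strip()
    if status == ko_token:
        return False, ""
    raise ValueError(f"Could not parse entity extraction answer: {answer!r}")
```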

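Define the extraction prompt template. It is filled with `str.format` using the four values defined below (`entity_name`, `entity_description`, `complete_answer_format` and `entity_name` again), so that the model extracts the name of the insurance company the user cites.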

In [None]:
EXTRACTION_FIRST_USER_CONTENT = () 
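A minimal sketch of a possible template follows. The wording is hypothetical (not the workshop's prompt); what matters is that it contains exactly four `{}` placeholders, filled in the same order as the `.format(...)` call used later: `entity_name`, `entity_description`, `complete_answer_format`, `entity_name`.

```python
# Hypothetical wording: four positional placeholders, filled in the order
# entity_name, entity_description, complete_answer_format, entity_name.
EXTRACTION_FIRST_USER_CONTENT = (
    "You analyse answers from an interview about life insurance (3rd pillar).\n"
    "Extract the {} from the user's message: {}.\n"
    "Reply strictly in this format: {}\n"
    "Return only the {}, without any explanation."
)

example_prompt = EXTRACTION_FIRST_USER_CONTENT.format(
    "entity",
    "Assurance que l'utilisateur cite",
    "<entity>",
    "entity",
)
print(example_prompt)
```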

In [89]:
Answer = Reference

In [None]:
entity_description = "Assurance que l'utilisateur cite"
entity_kind = "SINGLE"
fallback_entity = "Retraites Populaires"
entity_name = "entity"
answer_format = "<entity>"
complete_answer_format = f"{answer_format}"

In [130]:
prompt_full = EXTRACTION_FIRST_USER_CONTENT.format(entity_name,
                                                   entity_description,
                                                   complete_answer_format,
                                                   entity_name)

context = [{"role": "system", "content": prompt_full},
           {"role": "user", "content": Answer}]

answer = _generate(context)

is_successful, entity  = parse_entity_extraction(answer, entity_kind)

In [103]:
entity

'Retraites populaires'

## 8 - FULL AI VOICE AGENT PROTOTYPE

In [None]:
!streamlit run interviewer.py

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8503
  Network URL: http://192.168.1.28:8503

  For better performance, install the Watchdog module:

  $ xcode-select --install
  $ pip install watchdog

`from langchain_community.callbacks import get_openai_callback`.

To install langchain-community run `pip install -U langchain-community`.

`from langchain_community.chat_models import ChatOpenAI`.

To install langchain-community run `pip install -U langchain-community`.

## 9 - SENTIMENT ANALYSIS

1) Create your own sentiment extraction pipeline based on what you have learned and on LLM architectures.
2) LangChain is very useful for this.

In [14]:
from langchain_openai import ChatOpenAI  # pip install langchain-openai

class Sentiment():
    def __init__(self, client_name, model="gpt-3.5-turbo"):
        self.model = model
        # Folder holding one respondent's answers, e.g. "user_amld"
        self.client_name = client_name
        self.client_path = f"REPONSES Q1/{self.client_name}"
        self.language = 'fr-FR'
        self.conversation = []
        self.user_id = {}
        self.themes = ["Price",
                       "Quality"]
        self.survey_theme = "insurance"
        # One extraction instruction per theme, sent to the LLM
        self.themes_desc = {"Price": f"A few words, extracted from multiple parts of the response and its context, giving the human perception of the price of {self.survey_theme}. For example, for 'What do you think about regional products? The price is fair because it rewards local farms, but it can sometimes be expensive', extract ['fair price', 'sometimes expensive']. If nothing in the conversation relates to the price, return an empty list. Don't create a fake list. Create the list in the same language as the conversation.",
                            "Quality": f"A few words, extracted from multiple parts of the response and its context, giving the human perception of the {self.survey_theme} quality. For example, 'There's a noticeable difference in taste and texture compared to mass-produced items' could lead to the extraction ['high quality', 'noticeable difference in taste and texture']. If nothing in the conversation relates to the quality, return an empty list. Create the list in the same language as the conversation."}
        self.extracted_themes = []
        # temperature=0 for deterministic extractions; the key is the notebook's OPENAI_API_KEY
        self.llm = ChatOpenAI(temperature=0, model=self.model, api_key=OPENAI_API_KEY)
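The `Sentiment` class above only sets up the themes and the model. As a sketch of the next step (our own design, not the workshop's pipeline), the hypothetical helpers below build one prompt that asks for every theme at once and parse a JSON reply defensively, so that a malformed answer yields empty lists instead of an exception. The JSON answer format and the `{"role": ..., "content": ...}` shape of conversation entries are assumptions.

```python
import json


def build_theme_prompt(conversation, themes_desc):
    """Build one prompt asking for every theme at once, as a JSON object."""
    # Example of the expected reply, e.g. {"Price": ["..."], "Quality": ["..."]}
    example = json.dumps({theme: ["..."] for theme in themes_desc})
    lines = ["Read the interview below and extract, for each theme, a list of short phrases."]
    for theme, description in themes_desc.items():
        lines.append(f"- {theme}: {description}")
    lines.append(f"Answer only with a JSON object shaped like {example}.")
    lines.append("Interview:")
    # Conversation entries are assumed to be {"role": ..., "content": ...} dicts
    lines.extend(f"{u['role']}: {u['content']}" for u in conversation)
    return "\n".join(lines)


def parse_theme_answer(raw_answer, themes):
    """Parse the model's JSON reply; missing or malformed themes become []."""
    try:
        data = json.loads(raw_answer)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    result = {}
    for theme in themes:
        value = data.get(theme, [])
        # Keep only list answers, coercing each item to a string
        result[theme] = [str(item) for item in value] if isinstance(value, list) else []
    return result
```

With an instance `s` of the class above, something like `s.llm.invoke(build_theme_prompt(s.conversation, s.themes_desc)).content` would give the raw reply to pass to `parse_theme_answer(..., s.themes)`.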
