# Exploring Azure Text Analytics for Chapter Titles

## Setting up Speech SDK and Speech Recognizer

In [1]:
import os
import time
from dotenv import load_dotenv
import azure.cognitiveservices.speech as speechsdk

In [2]:
# Set up subscription info for the Speech Service
load_dotenv() # Load environment variables such as Speech SDK API keys
AZURE_SPEECH_KEY = os.getenv("SPEECHSDK_API_KEY")
AZURE_SERVICE_REGION = os.getenv("SPEECHSDK_REGION")

In [48]:
import re
import textgrid

def extract_text_from_textgrid(file_path, tier_name=None):
    # Load the TextGrid file
    tg = textgrid.TextGrid.fromFile(file_path)
    
    # Find the relevant tier (if tier_name is provided)
    if tier_name:
        tier = tg.getFirst(tier_name)
    else:
        # If no specific tier is mentioned, extract from the first tier
        tier = tg[0]

    # Extract the intervals with text and concatenate them
    extracted_text = []
    for interval in tier:
        if interval.mark.strip():  # Only consider non-empty intervals
            extracted_text.append(interval.mark)
    
    # Return all the concatenated text
    return " ".join(extracted_text)

def remove_intents(text):
    """
    Remove intents marked by <UNSURE>, <UNIN/>, etc. from the text.
    """
    # Regular expression pattern to match any text within angle brackets including the brackets
    pattern = r'<[^>]*>'
    # Use re.sub to replace matches with an empty string
    cleaned_text = re.sub(pattern, '', text)
    # Optionally, you can remove extra spaces left after removing tags
    cleaned_text = re.sub(r'\s+', ' ', cleaned_text).strip()
    return cleaned_text
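
As a quick sanity check of the tag-stripping step, the sketch below applies the same two regular expressions to a made-up annotated sentence. The sample sentence and the helper name `strip_annotation_tags` are illustrative assumptions and are not taken from the dataset.

```python
import re

def strip_annotation_tags(text):
    """Same logic as remove_intents above: drop <...> tags, then collapse whitespace."""
    without_tags = re.sub(r'<[^>]*>', '', text)
    return re.sub(r'\s+', ' ', without_tags).strip()

# Hypothetical annotated sentence, for illustration only
sample = "I <UNSURE>think</UNSURE> it started <UNIN/> three days ago"
print(strip_annotation_tags(sample))
```

Note that only the tags themselves are removed; any words enclosed between an opening and a closing tag stay in the text.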

In [49]:
# Initialize Speech Service
def initialize_speech_service(audio_file_path):
    speech_config = speechsdk.SpeechConfig(subscription=AZURE_SPEECH_KEY, region=AZURE_SERVICE_REGION)
    audio_config = speechsdk.audio.AudioConfig(filename=audio_file_path)
    return speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# Perform transcription on the audio file
def transcribe_continuous_audio_file(recognizer):
    recognized_speech = []

    # Set a variable to manage the state of transcription
    done = False

    def handle_recognized(evt):
        print(f"Recognized: {evt.result.text}")
        recognized_speech.append(evt.result.text)
    
    def handle_canceled(evt):
        nonlocal done
        # For canceled events the reason and error text are on evt.cancellation_details;
        # evt.result.reason is a ResultReason and never equals CancellationReason.Error
        details = evt.cancellation_details
        print(f"Recognition canceled: {details.reason}")
        if details.reason == speechsdk.CancellationReason.Error:
            print(f"Error details: {details.error_details}")
        done = True

    def handle_session_stopped(evt):
        # The session also ends when the whole audio file has been processed
        nonlocal done
        done = True

    # Attach handlers for recognized results, cancellations, and the end of the session
    recognizer.recognized.connect(handle_recognized)
    recognizer.canceled.connect(handle_canceled)
    recognizer.session_stopped.connect(handle_session_stopped)

    # Start continuous recognition; results arrive asynchronously through the handlers above
    recognizer.start_continuous_recognition()
    print("Transcribing...")

    # Poll until a handler sets done = True, then stop recognition from this thread
    try:
        while not done:
            time.sleep(0.5)
    except KeyboardInterrupt:
        print("Transcription stopped by user.")
    finally:
        recognizer.stop_continuous_recognition()

    return " ".join(recognized_speech)  # Return all the transcriptions concatenated
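
The wait-for-completion pattern used here (callbacks collect results while the main thread blocks until a "done" signal) can also be written with `threading.Event` instead of a polled flag. The sketch below shows that variant with a stand-in recognizer so it runs without Azure credentials; `collect_until_done` and `fake_recognizer` are hypothetical names for illustration, not part of the Speech SDK.

```python
import threading

def collect_until_done(start, timeout=None):
    """Call start(on_result, on_done) and block until on_done fires or the timeout expires."""
    results = []
    finished = threading.Event()

    def on_result(text):
        # Skip empty results so the joined transcript has no stray spaces
        if text:
            results.append(text)

    def on_done():
        finished.set()

    start(on_result, on_done)
    finished.wait(timeout)
    return " ".join(results)

def fake_recognizer(on_result, on_done):
    """Stand-in for a recognizer: emits chunks from a background thread, then signals done."""
    def worker():
        for chunk in ["Hello.", "", "Good morning."]:
            on_result(chunk)
        on_done()
    threading.Thread(target=worker).start()

print(collect_until_done(fake_recognizer, timeout=5))
```

An event avoids the half-second polling delay, at the cost of a slightly less obvious control flow than the `done` flag.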

In [None]:
# recognizer = initialize_speech_service('../data/primock/day1_consultation01_doctor.wav') 
# hypothesis_text = transcribe_continuous_audio_file(recognizer)

In [5]:
hypothesis_text = (
    "Hello. Hi. Yeah. OK. Hello. Good morning. How can I help you this morning? Yeah, I'm sorry to hear that. And when you say diarrhoea, what do you mean by diarrhea? Do you mean you're going to the toilet more often or are your stools more loose? OK. And how many times a day are you going, let's say over the last couple of days? 6-7 times a day and you mentioned this mainly water tree. Have you noticed any other things like blood in your stools? OK. And you mentioned you've had some pain in your tummy as well. Whereabouts is the pain exactly? One side. And what side is that? That's right. OK. And can you describe the pain to me? OK. And there's a pain. Is that is it there all the time or does it come and go? Does the pain move anywhere else because on between your back? OK, fine. And you mentioned you've been feeling quite weak and shaky as well. What do you mean by shaky? Do you mean you've been having, have you been feeling feverish, for example? Measure your temperature then. OK OK. Any other symptoms like sweating or night? No. And any vomiting at all? You stop vomiting again and would you vomit? I know it's not nice thing to talk about but was it just normal food colour? Yeah. And there was no blood in your vomits, is that right? No. OK. And any other symptoms at all? So you mentioned the tummy pain, you mentioned diarrhoea, you mentioned the vomiting. Anything else that comes to mind? OK. OK, so you're drinking fluids. What kind of foods have you managed to eat? Anything? OK, fine. And so we started three days ago. The symptoms, are you aware of any triggers which may have caused the symptoms to kick on? So for example, things like take away foods or eating out or being around other people who would similar symptoms? Thank you. Do you remember where you ate? OK anyone else unwell with similar symptoms? OK, OK, fine. Right. And in terms of your your overall health, are you normally fitting well or? OK. And did you asthma well controlled? MMM. "
    "And you don't have any other tummy problem? Bowel problems I should be aware of, no. OK. And apart from the inhalers, do you take any other medications? OK, fine. And in terms of just your your day-to-day life, you said it's been affecting your life. In what way has it been affecting your life? OK. Yeah. And have you are you currently working at the moment? Would work. OK. Have you been going into work the last three days? Have you been at home? OK, that must be difficult for you then. Right. And you said, you mentioned you live with a wife and two children, is that right? All right, just a couple of other questions we need to ask. Do you smoke at all? And do you drink much more of alcohol? OK, so. Normally at this stage I'd like to examine you if that's OK but but haven't listened to your stories. I think just to recap, for the last three days you've been having loose dual diarrhea, a bit of tummy pain mainly on the left hand side and vomiting and feeling quite weak and lethargic. You mentioned you're having this Chinese take away as well three days ago and I wonder whether that might be the cause of your problems. It seems like you may have something. Called gastroenteritis. Which essentially just the tummy bug or infection of your of your tummy. Mainly caused by viruses, but they can be a possibility of bacteria causing your symptoms. At this stage, what what we'd recommend is just what we say conservative management. So I don't think you need anything like antibiotics is really just making sure you're well hydrated, so drinking plenty of fluids. There are things like diolite you can get from the pharmacy which it's it helps replenish some of your minerals and vitamins. And if you are having vomiting and diarrhea, I would soon recommend that first, you know, first couple of days. If you are feeling feverish and weak, taking some paracetamol, 2 tablets up to four times a day for the first few days can also help. "
    "I'll certainly advise you to take some type of work. Actually I know you're quite keen to work. I would say next to two to three days as clears from your system. Just take some time off and rest. You know, if your symptoms haven't got better, you know, in in three to four days, I'd like to come back and see you again. Because if it is ongoing, then we have to wonder whether there's something else causing symptoms and we may need to do further tests like taking a sample of your store so we can test that, etcetera, etcetera. How's that sound? Do you have any questions for me? OK. Is it, is the treatment plan clear? Great. Well, I wish you all the best. Thank you. Bye bye."
)

In [6]:
hypothesis_text

"Hello. Hi. Yeah. OK. Hello. Good morning. How can I help you this morning? Yeah, I'm sorry to hear that. And when you say diarrhoea, what do you mean by diarrhea? Do you mean you're going to the toilet more often or are your stools more loose? OK. And how many times a day are you going, let's say over the last couple of days? 6-7 times a day and you mentioned this mainly water tree. Have you noticed any other things like blood in your stools? OK. And you mentioned you've had some pain in your tummy as well. Whereabouts is the pain exactly? One side. And what side is that? That's right. OK. And can you describe the pain to me? OK. And there's a pain. Is that is it there all the time or does it come and go? Does the pain move anywhere else because on between your back? OK, fine. And you mentioned you've been feeling quite weak and shaky as well. What do you mean by shaky? Do you mean you've been having, have you been feeling feverish, for example? Measure your temperature then. OK OK. A

## Set up Azure OpenAI

In [78]:
# Set up OpenAI API key and endpoint
openai_api_key = 'OPENAI_API_KEY'  # Replace with your API key
openai_api_base = "OPENAI_API_ENDPOINT"  # Replace with your resource's URL
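
Since the notebook already calls `load_dotenv()` for the Speech keys, the Azure OpenAI key and endpoint could be read from the environment in the same way rather than pasted into the cell. The helper below is a minimal sketch of that idea; the function name `require_env` and the variable names in the comments are assumptions, so substitute whatever names your `.env` file defines.

```python
import os

def require_env(name):
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value

# Demonstration with a throwaway variable so the example runs anywhere
os.environ["EXAMPLE_SETTING"] = "demo-value"
print(require_env("EXAMPLE_SETTING"))

# In the notebook this might look like (hypothetical variable names):
# openai_api_key = require_env("AZURE_OPENAI_API_KEY")
# openai_api_base = require_env("AZURE_OPENAI_ENDPOINT")
```

Failing early with a clear message is easier to debug than an authentication error raised later by the client.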

In [8]:
from openai import AzureOpenAI

# Set up Azure OpenAI client
client = AzureOpenAI(
    api_version="2024-02-01", # Make sure to use the correct API version
    api_key=openai_api_key,
    azure_endpoint=openai_api_base,
)

### Generate a Chapter Title by Passing Text to GPT-4o mini

GPT-4o mini is used to generate the chapter title because it is a low-cost GPT model and is sufficient for this task.

In [9]:
# Function to generate chapter titles using Azure OpenAI
def generate_chapter_title_nb(text):
    # The prompt asks for a brief summary suitable as a chapter title
    messages = [
        {
            "role": "system",
            "content": "You are an assistant that generates concise chapter titles based on text."
        },
        {
            "role": "user",
            "content": f"Generate a concise chapter title for the following text:\n\n{text}. Don't include a heading such as 'Chapter Title:'."
        }
    ]

    # Request to the Azure OpenAI API using chat completions
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # Replace with your Azure GPT model deployment name, e.g., 'gpt-35-turbo'
        messages=messages,
        max_tokens=20,  # Limit tokens for shorter, concise titles
        temperature=0.7,  # Adjust to control creativity level
        n=1,  # Number of responses
    )

    # Extract and return the generated title
    title = completion.choices[0].message.content.strip()
    return title

# Generate a chapter title based on the transcribed text
chapter_title = generate_chapter_title_nb(hypothesis_text)
print(f"Generated Chapter Title: {chapter_title}")

Generated Chapter Title: Consultation on Gastroenteritis Symptoms
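
The prompt asks the model not to prepend a label such as 'Chapter Title:', but a model may not always comply. A small post-processing step can guard against that. The sketch below is one possible cleanup, with `clean_title` being a hypothetical helper rather than part of the notebook or of `chaptertitle.py`.

```python
import re

def clean_title(raw_title):
    """Strip a leading 'Chapter Title:' or 'Title:' label and surrounding quotes, if present."""
    title = raw_title.strip()
    # Remove an optional label at the start, case-insensitively
    title = re.sub(r'^(chapter\s+)?title\s*:\s*', '', title, flags=re.IGNORECASE)
    # Remove quote characters wrapped around the whole title
    return title.strip().strip('"\'').strip()

print(clean_title('Chapter Title: "Consultation on Gastroenteritis Symptoms"'))
print(clean_title("Consultation on Gastroenteritis Symptoms"))
```

Both calls print the bare title. One caveat of this simple approach is that a title legitimately ending in an apostrophe would lose it.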


## Using Function from chaptertitle.py

In [10]:
import sys

sys.path.append(os.path.abspath('..'))
from chaptertitle import generate_chapter_title

In [11]:
document_text = """
In this chapter, we discuss the importance of data privacy in modern digital systems. 
We explore various encryption techniques and their roles in securing user data.
"""

In [12]:
chapter_title_fn_test = generate_chapter_title(hypothesis_text, openai_api_key, openai_api_base)
print(f"Test Generate Chapter Title Function: {chapter_title_fn_test}")

Test Generate Chapter Title Function: Gastroenteritis: Diagnosis and Management


In [13]:
chapter_title_fn_test2 = generate_chapter_title(document_text, openai_api_key, openai_api_base)
print(f"Test Generate Chapter Title Function: {chapter_title_fn_test2}")

Test Generate Chapter Title Function: Securing User Data: The Role of Encryption in Privacy


### Generate multiple headings based on context and insert into transcript

In [14]:
sample_meeting = (
    "Good morning, everyone! Thank you for joining this meeting. Today, we’ll discuss the latest advancements in technology that could impact our product line and explore how we can integrate these innovations into our strategy. Morning, Alice! I’m looking forward to hearing everyone’s thoughts. Me too! I think there’s a lot we can leverage in our upcoming campaigns. Absolutely, let’s make sure we cover how these advancements align with our overall goals. To kick things off, I want to highlight a few trends I’ve seen in the industry. First, there’s been significant progress in artificial intelligence and machine learning, particularly in predictive analytics. This could help us better understand customer behavior. That’s interesting, Alice. I’ve been reading about AI models that can analyze user data in real-time. Implementing this could enhance our product features considerably. For marketing, this could mean more personalized campaigns. Imagine being able to tailor offers based on user behavior in real time. While the potential is exciting, we should also address the challenges of integration. What do you think, Bob? One major concern is ensuring that our existing infrastructure can support these new technologies. We might need to upgrade our servers or invest in cloud solutions. That’s a valid point. We also need to consider the learning curve for our team. Training will be essential for effective implementation. Speaking of training, I’ve done some market research. Customers are increasingly looking for products that leverage AI. They want to see tangible benefits. That’s encouraging to hear. We need to position ourselves as a leader in adopting these technologies. Let’s explore what our competitors are doing as well. To wrap up, I suggest we form a small task force to dive deeper into these technologies and assess feasibility. Bob, can you lead that group? Sure! I’ll gather a few more team members and set up a brainstorming session. "
    "I’d like to be involved, especially to discuss how we can market these advancements effectively. Great! Let’s aim to reconvene next week with our findings. Thank you all for your insights today. I’m excited about the possibilities ahead. Let’s keep the momentum going! Thanks, everyone! Looking forward to the next steps. Me too! Have a great day! Bye, everyone!"
)

In [32]:
# Function to generate chapter titles using Azure OpenAI
def generate_multiple_chapter_titles_nb(text):
    # The prompt asks for chapter titles based on the provided transcript
    messages = [
        {
            "role": "system",
            "content": "You are an assistant that generates concise chapter titles for transcripts. Your task is to analyze the text and create relevant titles based on context changes."

        },
        {
            "role": "user",
            "content": (
                f"Generate concise chapter titles for the following text:\n\n{text}. Each title should reflect a significant change in context. Pick out important titles at your discretion. "
                "After generating the titles, update them into the full transcript, with every single word, at the appropriate locations. Only show the full transcript with titles. "
                "Here is a sample transcript from a project meeting, the format should be something like this:\n\n"
                
                "1. Opening and Introductions\n"
                "Hi, everyone. Thanks for joining the meeting today. Let's go around and introduce ourselves quickly. I'm John, leading the project.\n\n"
                
                "2. Project Overview\n"
                "As you know, we're working on the new app update, so I just want to give a quick overview of where we are. The design phase is complete, and the dev team has started coding.\n\n"
                
                "3. Current Challenges\n"
                "We've hit a couple of issues with the backend integration. Sarah, could you update us on the server-side challenges?\n\n"
                
                "4. Team Member Updates\n"
                "Now let's hear from the rest of the team. Mark, can you give us an update on the frontend progress?\n\n"
                
                "5. Future Steps and Deadlines\n"
                "We need to finalize the testing timeline. Everyone should aim to complete their sections by the end of next week. Any blockers we should address?\n\n"
                
                "6. Closing Remarks\n"
                "Great, thanks for all the updates. Let's regroup next Friday and ensure we're on track for the final release. Have a great weekend!"
            )
        }

    ]

    # Request to the Azure OpenAI API using chat completions
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # Replace with your Azure GPT model deployment name
        messages=messages,
        #max_tokens=200,  # Limit tokens for a concise response
        temperature=0.7,  # Adjust to control creativity level
        n=1,  # Number of responses
    )

    # Extract and return the generated titles
    titles = completion.choices[0].message.content.strip()
    return titles


In [33]:
# Generate a formatted transcript with chapter titles based on the transcribed text
formatted_text = generate_multiple_chapter_titles_nb(hypothesis_text)
print(formatted_text)

1. Initial Inquiry  
Hello. Hi. Yeah. OK. Hello. Good morning. How can I help you this morning?

2. Symptom Assessment  
Yeah, I'm sorry to hear that. And when you say diarrhoea, what do you mean by diarrhea? Do you mean you're going to the toilet more often or are your stools more loose? OK. And how many times a day are you going, let's say over the last couple of days?

3. Pain and Additional Symptoms  
6-7 times a day and you mentioned this mainly watery. Have you noticed any other things like blood in your stools? OK. And you mentioned you've had some pain in your tummy as well. Whereabouts is the pain exactly? One side. And what side is that? That's right. OK. And can you describe the pain to me?

4. Further Symptom Exploration  
OK. And there's a pain. Is that is it there all the time or does it come and go? Does the pain move anywhere else, like between your back? OK, fine. And you mentioned you've been feeling quite weak and shaky as well. What do you mean by shaky? Do you mean
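
The prompt asks for the full transcript "with every single word", yet the output above reads "mainly watery" where the input had "mainly water tree", so the model may silently edit the text. A simple check is to strip the title lines from the response and compare the remaining words with the original. The sketch below assumes title lines look like `1. Some Title`, as in the sample format given in the prompt; `split_titles_and_body` and `transcript_preserved` are hypothetical helper names.

```python
import re

# Assumption: a title line starts with a number, a period, and whitespace ("1. Some Title")
TITLE_LINE = re.compile(r'^\d+\.\s+\S')

def split_titles_and_body(formatted):
    """Separate numbered title lines from transcript lines in the model output."""
    titles, body_lines = [], []
    for line in formatted.splitlines():
        line = line.strip()
        if not line:
            continue
        (titles if TITLE_LINE.match(line) else body_lines).append(line)
    return titles, " ".join(body_lines)

def transcript_preserved(original, formatted):
    """True if the output, minus title lines, has exactly the same words as the original."""
    _, body = split_titles_and_body(formatted)
    return original.split() == body.split()

# Toy data for illustration only
original = "Hello there. How are you? Fine, thanks."
formatted = "1. Greeting  \nHello there. How are you?\n\n2. Reply  \nFine, thanks."
print(split_titles_and_body(formatted)[0])
print(transcript_preserved(original, formatted))
print(transcript_preserved(original, formatted.replace("Fine", "Good")))
```

A transcript sentence that itself begins with a number and a period would be misread as a title by this heuristic, so the pattern may need tightening for real data.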

In [21]:
# Generate a formatted transcript with chapter titles based on the transcribed text
formatted_text2 = generate_multiple_chapter_titles_nb(sample_meeting)
print(formatted_text2)

1. Opening and Introductions  
Good morning, everyone! Thank you for joining this meeting. Today, we’ll discuss the latest advancements in technology that could impact our product line and explore how we can integrate these innovations into our strategy. Morning, Alice! I’m looking forward to hearing everyone’s thoughts. Me too! I think there’s a lot we can leverage in our upcoming campaigns.

2. Industry Trends in Technology  
Absolutely, let’s make sure we cover how these advancements align with our overall goals. To kick things off, I want to highlight a few trends I’ve seen in the industry. First, there’s been significant progress in artificial intelligence and machine learning, particularly in predictive analytics. This could help us better understand customer behavior.

3. AI and Real-Time Data Analysis  
That’s interesting, Alice. I’ve been reading about AI models that can analyze user data in real-time. Implementing this could enhance our product features considerably. For mark