<a href="https://colab.research.google.com/github/kudostore/Audio_Analyzer/blob/main/Audio_Analyzer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Call Quality Analyzer**
---

**Summary**

This notebook downloads a YouTube video of a sales call, extracts its audio, and transcribes it with Whisper. It then analyzes the transcription: it calculates each speaker's talk-time ratio, counts the questions asked, measures the longest monologue, and classifies the overall sentiment of the call.

---
**Approach**

My approach is to download the YouTube video with yt-dlp, extract the audio with MoviePy, and transcribe it with OpenAI's Whisper model. Whisper returns timestamped segments but does not identify speakers, so speakers are inferred from phrases specific to this call (for example, "My name is Lauren"). From the attributed segments I calculate each speaker's talk-time ratio and longest monologue. I also count the questions asked and assess the overall sentiment of the call with NLTK's VADER analyzer. Based on these results, I generate an actionable insight.
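Both talk time and monologue length depend on how Whisper's segments are attributed to speakers. The sketch below shows one way to keep the two metrics consistent. It labels each segment once, carrying the last identified speaker forward until another marker phrase appears, and then computes both metrics from the same labels. The marker phrases come from the notebook's heuristic. The helper names (`label_segments`, `talk_time_ratio`, `longest_monologues`), the dict layout, and the `"Unknown"` default are illustrative assumptions. The segment dicts use only the `start`, `end`, and `text` keys that Whisper's `result["segments"]` provides.

```python
from typing import Dict, List, Optional, Tuple

# Marker phrases taken from the notebook's heuristic; a segment containing one of
# them is credited to that speaker, and the label carries over to later segments
# until another marker appears. The dict layout itself is an illustrative choice.
SPEAKER_MARKERS: Dict[str, Tuple[str, ...]] = {
    "Lauren": ("My name is Lauren", "Thanks, John", "All right. Well, here are a few reasons"),
    "John": ("My name is John Smith", "Yeah, let's go ahead", "Can we wait just a second"),
}


def label_segments(segments: List[dict],
                   markers: Dict[str, Tuple[str, ...]] = SPEAKER_MARKERS) -> List[Tuple[str, float]]:
    """Return one (speaker, duration) pair per Whisper-style segment."""
    labeled: List[Tuple[str, float]] = []
    speaker = "Unknown"  # used until the first marker phrase is seen
    for seg in segments:
        text = seg["text"].strip()
        for name, phrases in markers.items():
            if any(phrase in text for phrase in phrases):
                speaker = name
                break
        labeled.append((speaker, seg["end"] - seg["start"]))
    return labeled


def talk_time_ratio(labeled: List[Tuple[str, float]]) -> Dict[str, float]:
    """Percentage of total segment time attributed to each speaker."""
    totals: Dict[str, float] = {}
    for speaker, duration in labeled:
        totals[speaker] = totals.get(speaker, 0.0) + duration
    grand_total = sum(totals.values())
    return {s: (d / grand_total * 100 if grand_total > 0 else 0.0) for s, d in totals.items()}


def longest_monologues(labeled: List[Tuple[str, float]]) -> Dict[str, float]:
    """Longest run of consecutive same-speaker segments, in seconds, per speaker."""
    longest: Dict[str, float] = {}
    run_speaker: Optional[str] = None
    run_length = 0.0
    for speaker, duration in labeled:
        if speaker == run_speaker:
            run_length += duration  # same speaker keeps talking: extend the run
        else:
            if run_speaker is not None:  # close the previous speaker's run
                longest[run_speaker] = max(longest.get(run_speaker, 0.0), run_length)
            run_speaker, run_length = speaker, duration
    if run_speaker is not None:  # close the final run
        longest[run_speaker] = max(longest.get(run_speaker, 0.0), run_length)
    return longest


# Toy segments shaped like Whisper's result["segments"] (start, end, text only)
segments = [
    {"start": 0.0, "end": 4.0, "text": " Thank you for calling. My name is Lauren."},
    {"start": 4.0, "end": 6.0, "text": " My name is John Smith."},
    {"start": 6.0, "end": 9.0, "text": " I was calling about a map update."},
    {"start": 9.0, "end": 10.0, "text": " Thanks, John."},
]
labeled = label_segments(segments)
print(talk_time_ratio(labeled))     # {'Lauren': 50.0, 'John': 50.0}
print(longest_monologues(labeled))  # {'Lauren': 4.0, 'John': 5.0}
```

Labeling once and reusing the labels means a change to the heuristic (or a switch to a real diarization tool) affects both metrics in the same way.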

---
**Tasks:**

  Create a system that takes a sales call recording and returns:
  1. Talk-time ratio (what % each person spoke)
  2. Number of questions asked
  3. Longest monologue duration
  4. Call sentiment (positive/negative/neutral)
  5. One actionable insight
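Two of these metrics are easy to get subtly wrong:

- Checking whether a segment *ends* with "?" undercounts questions when Whisper places several of them in one segment.
- VADER's compound score for a whole transcript is a normalized sum over all of its words, so on a long call it can approach ±1.

The sketch below counts question marks instead. As one alternative to scoring the transcript in a single call, it also averages per-sentence compound scores before applying the ±0.05 cut-offs used in the notebook. The helper names and the naive regex sentence splitter are assumptions, not part of the notebook. Both metrics still depend on how Whisper punctuates the transcript.

```python
import re
from typing import Callable, Iterable, List


def count_questions(segment_texts: Iterable[str]) -> int:
    """Count question marks across segments, so a segment with two questions counts twice."""
    return sum(text.count("?") for text in segment_texts)


def split_sentences(text: str) -> List[str]:
    """Naive split after '.', '!' or '?' followed by whitespace (abbreviations will mis-split)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def classify_compound(score: float, threshold: float = 0.05) -> str:
    """Label a compound score in [-1, 1] with the same ±0.05 cut-offs as the notebook."""
    if score >= threshold:
        return "Positive"
    if score <= -threshold:
        return "Negative"
    return "Neutral"


def call_sentiment(text: str, compound: Callable[[str], float]) -> str:
    """Average the per-sentence compound scores, then label the mean."""
    sentences = split_sentences(text)
    if not sentences:
        return "Neutral"
    mean_score = sum(compound(s) for s in sentences) / len(sentences)
    return classify_compound(mean_score)


print(count_questions(["How can I help you?", "I did. Do you need the customer number? Yes, please."]))  # 2
```

With NLTK, you could pass `lambda s: sia.polarity_scores(s)["compound"]` as `compound`, where `sia = SentimentIntensityAnalyzer()`. Taking the scorer as a parameter keeps the helper testable without the VADER lexicon.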

In [3]:
# Install necessary libraries
!pip install yt-dlp moviepy openai-whisper nltk

import nltk
try:
    # Check whether the VADER lexicon (used for sentiment analysis) is already downloaded
    nltk.data.find('sentiment/vader_lexicon.zip')
except LookupError:
    nltk.download('vader_lexicon')

# --- Video Download and Audio Extraction ---

# Define the YouTube video URL
video_url = "https://www.youtube.com/watch?v=4ostqJD3Psc"

# Download the video using yt-dlp. By default it saves to the current working directory
# (/content in Colab) as "<title> [<id>].<ext>"
!yt-dlp "{video_url}"

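try:
    from moviepy import VideoFileClip          # moviepy >= 2.0 (moviepy.editor was removed)
except ImportError:
    from moviepy.editor import VideoFileClip   # moviepy 1.x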
import os

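# Locate the downloaded video by its ID: yt-dlp names it "<title> [<id>].<ext>", and the
# extension depends on the formats it merged (.webm in this run), so it is not hard-coded
video_id = "4ostqJD3Psc"
download_dir = '/content'
downloaded_files = sorted(
    f for f in (os.listdir(download_dir) if os.path.isdir(download_dir) else [])
    if f"[{video_id}]" in f and not f.endswith(('.part', '.ytdl'))
)
video_path = os.path.join(download_dir, downloaded_files[0]) if downloaded_files else f"{download_dir}/Sales Call example 1 [{video_id}].webm"
audio_path = '/content/extracted_audio.wav' # Define the output path for the extracted audio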

# Check if the video file exists before attempting to extract audio
if not os.path.exists(video_path):
    print(f"Error: Video file not found at {video_path}. Please ensure the video downloaded correctly.")
else:
    # Extract audio from the video using moviepy
    print(f"Extracting audio from: {video_path}")
    try:
        video = VideoFileClip(video_path)
        audio = video.audio
        audio.write_audiofile(audio_path)
        video.close()  # release the file readers opened by moviepy
        print(f"Audio extracted and saved to: {audio_path}")

        # --- Transcription ---

        import whisper

        # Load the Whisper model (from the openai-whisper package)
        # 'base' is a good balance between speed and accuracy;
        # 'tiny' is faster but less accurate
        model = whisper.load_model("base")

        # Transcribe the audio file written above
        audio_path_for_transcription = audio_path

        # Check if the audio file exists before attempting transcription
        if not os.path.exists(audio_path_for_transcription):
            print(f"Error: Audio file not found at {audio_path_for_transcription} for transcription.")
        else:
            try:
                # Transcribe the audio
                result = model.transcribe(audio_path_for_transcription)
                transcription = result["text"]

                print("\n--- Transcription Results ---")
                print("Transcription:")
                print(transcription)
                print("-----------------------------")

                # --- Tasks ---

                # 1. Analyze talk-time ratio
                print("\n--- Analysis: Talk Time Ratio ---")
                speaker_talk_time = {}
                total_duration = 0.0
                current_speaker = "Unknown"  # used until the first speaker is identified

                if 'segments' in result: # Ensure segments are available
                    for segment in result['segments']:
                        duration = segment['end'] - segment['start']
                        total_duration += duration
                        text = segment['text'].strip()

                        # Simple speaker inference: Whisper does not label speakers, so hard-coded
                        # phrases from this call mark who is talking, and segments without a marker
                        # are credited to the most recently identified speaker
                        if "My name is Lauren" in text or "Thanks, John" in text or "All right. Well, here are a few reasons" in text:
                            current_speaker = "Lauren"
                        elif "My name is John Smith" in text or "Yeah, let's go ahead" in text or "Can we wait just a second" in text:
                            current_speaker = "John"

                        speaker_talk_time[current_speaker] = speaker_talk_time.get(current_speaker, 0.0) + duration


                    # Calculate talk time percentage
                    talk_time_percentage = {}
                    for speaker, duration in speaker_talk_time.items():
                        talk_time_percentage[speaker] = (duration / total_duration) * 100 if total_duration > 0 else 0

                    print("Talk Time Percentage by Speaker:")
                    for speaker, percentage in talk_time_percentage.items():
                        print(f"   {speaker}: {percentage:.2f}%")

                    print(f"Total Call Duration (sum of segment durations): {total_duration:.2f} seconds")
                else:
                    print("Segments not available in transcription result for talk time analysis.")


                # 2. Count questions asked
                print("\n--- Analysis: Question Count ---")
                question_count = 0
                if 'segments' in result: # Ensure segments are available
                    for segment in result["segments"]:
                        # Count question marks rather than segments ending in "?", so a segment
                        # containing several questions is not undercounted (relies on Whisper's punctuation)
                        question_count += segment['text'].count('?')
                    # Print the total count of questions
                    print(f"Total number of questions asked (question marks in transcript): {question_count}")
                else:
                    print("Segments not available in transcription result to count questions.")


                # 3. Calculate longest monologue duration
                print("\n--- Analysis: Longest Monologue ---")
                longest_monologue = {}
                monologue_speaker = None          # speaker of the monologue currently being measured
                current_monologue_duration = 0.0
                speaker = "Unknown"               # carried over between segments, as in the talk-time step

                if 'segments' in result: # Ensure segments are available
                    # Iterate through the segments
                    for segment in result['segments']:
                        duration = segment['end'] - segment['start']
                        text = segment['text'].strip()

                        # Infer the speaker with the same heuristic as the talk-time step: a marker
                        # phrase switches speakers, otherwise the previous speaker is kept
                        if "My name is Lauren" in text or "Thanks, John" in text or "All right. Well, here are a few reasons" in text:
                            speaker = "Lauren"
                        elif "My name is John Smith" in text or "Yeah, let's go ahead" in text or "Can we wait just a second" in text:
                            speaker = "John"

                        if speaker == monologue_speaker:
                            # Same speaker as the previous segment: extend the current monologue
                            current_monologue_duration += duration
                        else:
                            # Speaker changed: record the finished monologue, then start a new one
                            if monologue_speaker is not None:
                                longest_monologue[monologue_speaker] = max(longest_monologue.get(monologue_speaker, 0.0), current_monologue_duration)
                            monologue_speaker = speaker
                            current_monologue_duration = duration

                    # After the loop, record the final monologue
                    if monologue_speaker is not None:
                        longest_monologue[monologue_speaker] = max(longest_monologue.get(monologue_speaker, 0.0), current_monologue_duration)

                    # Print the longest monologue durations
                    print("Longest Monologue Duration by Speaker:")
                    for speaker, duration in longest_monologue.items():
                        print(f"   {speaker}: {duration:.2f} seconds")
                else:
                    print("Segments not available in transcription result to calculate longest monologue.")


                # 4. Determine call sentiment
                print("\n--- Analysis: Call Sentiment ---")
                from nltk.sentiment import SentimentIntensityAnalyzer

                # Initialize the VADER sentiment intensity analyzer
                analyzer = SentimentIntensityAnalyzer()

                # Analyze the main transcription text
                if 'transcription' in locals() and transcription: # Check if transcription variable exists and is not empty
                    sentiment_scores = analyzer.polarity_scores(transcription)

                    # Print the sentiment analysis results
                    print("Sentiment Analysis Results (VADER):")
                    print(f"   Overall Transcription: {transcription[:200]}...") # Print a snippet of the text
                    print(f"   Scores: {sentiment_scores}")

                    # Interpret the compound score
                    compound_score = sentiment_scores['compound']
                    if compound_score >= 0.05:
                        sentiment_class = "Positive"
                    elif compound_score <= -0.05:
                        sentiment_class = "Negative"
                    else:
                        sentiment_class = "Neutral"

                    print(f"   Overall Sentiment: {sentiment_class}")
                else:
                    print("Transcription not available to perform sentiment analysis.")

                # 5. Generate one actionable insight
                print("\n--- Actionable Insight ---")
                # Check if necessary variables from previous analysis steps exist
                if 'talk_time_percentage' in locals() and 'question_count' in locals() and 'longest_monologue' in locals() and 'sentiment_scores' in locals():
                    # Use this run's figures instead of hard-coded numbers from an earlier run.
                    # Lauren is the representative in this call; her figures depend on the
                    # keyword-based speaker inference above, so treat them as approximate.
                    rep_share = talk_time_percentage.get("Lauren", 0.0)
                    rep_longest = longest_monologue.get("Lauren", 0.0)
                    actionable_insight = f"""
Actionable Insight: The sales representative (Lauren) accounts for {rep_share:.2f}% of talk time, and her longest monologue lasts {rep_longest:.2f} seconds. A low share and short monologues would indicate a potential lack of control over the conversation or insufficient information delivery by the representative.

Recommendation: In that case, train the sales representative on leading customer conversations, presenting product information effectively, and managing talk time. This should make the interaction more balanced and potentially more impactful while maintaining a positive sentiment.
"""
                else:
                    print("Analysis results not fully available to generate actionable insight.")


                # 6. Present the analysis results
                print("\n--- Summary of Call Analysis Results ---")
                # Check if necessary variables are available before presenting
                if 'talk_time_percentage' in locals() and 'question_count' in locals() and 'longest_monologue' in locals() and 'sentiment_scores' in locals() and 'actionable_insight' in locals():
                    print("1. Talk Time Percentage by Speaker:")
                    for speaker, percentage in talk_time_percentage.items():
                        print(f"   {speaker}: {percentage:.2f}%")

                    print("\n2. Total Number of Questions Asked:")
                    print(f"   {question_count}")

                    print("\n3. Longest Monologue Duration by Speaker:")
                    for speaker, duration in longest_monologue.items():
                        print(f"   {speaker}: {duration:.2f} seconds")

                    print("\n4. Call Sentiment Analysis (VADER):")
                    print(f"   Scores: {sentiment_scores}")
                    print(f"   Overall Sentiment: {sentiment_class}")  # label computed in step 4

                    print("\n5. Actionable Insight:")
                    print(actionable_insight)

                else:
                    print("Analysis results not fully available to present summary.")

            except Exception as e:
                print(f"An error occurred during transcription or analysis: {e}")

    except Exception as e:
        print(f"An error occurred during audio extraction: {e}")

[youtube] Extracting URL: https://www.youtube.com/watch?v=4ostqJD3Psc
[youtube] 4ostqJD3Psc: Downloading webpage
[youtube] 4ostqJD3Psc: Downloading tv simply player API JSON
[youtube] 4ostqJD3Psc: Downloading tv client config
[youtube] 4ostqJD3Psc: Downloading tv player API JSON
[info] 4ostqJD3Psc: Downloading 1 format(s): 397+251
[download] Sales Call example 1 [4ostqJD3Psc].webm has already been downloaded

Extracting audio from: /content/Sales Call example 1 [4ostqJD3Psc].webm
MoviePy - Writing audio in /content/extracted_audio.wav

MoviePy - Done.
Audio extracted and saved to: /content/extracted_audio.wav

--- Transcription Results ---
Transcription:
 Thank you for calling Nissan. My name is Lauren. Can I have your name? My name is John Smith. Thank you, John. How can I help you? I was just calling about to see how much it would cost to update the map in my car. I'd be happy to help you with that today. Did you receive a mail from us? I did. Do you need the customer number? Yes, please. Okay. It's 15243. Thank you. And the year making model of your vehicle? Yeah, I have a 2009 Nissan Altima. Oh, nice car. Yeah. Thank you. We really enjoy it. Okay. I think I found your profile here. Can I have you verify your address and phone number, please? Yes. It's 1255 North Research Way. That's an ORM Utah 84097. And my phone number is A01-431-1000. Thanks, John. I located your information. The newest version we have available for your vehicle is version 7.7, which was released in March of 2012. The price of the new map is $
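Task 5 asks for one actionable insight. A minimal sketch of deriving its wording from whatever metrics a run produces is a small rule-based function. Several parts of it are hypothetical choices, not the notebook's method:

- the function name;
- the default `rep="Lauren"`;
- the rule itself, which compares the representative's talk-time share with the most talkative other speaker's and picks one of two coaching messages.

The first message paraphrases the notebook's recommendation. Any speaker not equal to `rep`, including `"Unknown"`, counts as "other".

```python
from typing import Dict


def actionable_insight(talk_time_pct: Dict[str, float],
                       longest_monologue_s: Dict[str, float],
                       question_count: int,
                       sentiment: str,
                       rep: str = "Lauren") -> str:
    """Turn the call metrics into one sentence of facts plus one coaching suggestion.

    The rule below is a hypothetical example: it only compares the representative's
    talk-time share with that of the most talkative other speaker.
    """
    rep_share = talk_time_pct.get(rep, 0.0)
    other_share = max((p for s, p in talk_time_pct.items() if s != rep), default=0.0)
    rep_longest = longest_monologue_s.get(rep, 0.0)

    facts = (f"{rep} spoke {rep_share:.1f}% of the time (longest monologue {rep_longest:.0f} s); "
             f"{question_count} questions were asked; overall sentiment was {sentiment.lower()}.")
    if rep_share < other_share:
        advice = "Coach the representative on leading the conversation and presenting product information."
    else:
        advice = "Coach the representative to leave more room for the customer to speak."
    return f"{facts} {advice}"


print(actionable_insight({"Lauren": 20.0, "John": 80.0}, {"Lauren": 12.0, "John": 40.0}, 5, "Positive"))
```

With the notebook's variables, the call would be `actionable_insight(talk_time_percentage, longest_monologue, question_count, sentiment_class)`.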