<div style="  background: linear-gradient(145deg, #0f172a, #1e293b);  border: 4px solid transparent;  border-radius: 14px;  padding: 18px 22px;  margin: 12px 0;  font-size: 26px;  font-weight: 600;  color: #f8fafc;  box-shadow: 0 6px 14px rgba(0,0,0,0.25);  background-clip: padding-box;  position: relative;">  <div style="    position: absolute;    inset: 0;    padding: 4px;    border-radius: 14px;    background: linear-gradient(90deg, #06b6d4, #3b82f6, #8b5cf6);    -webkit-mask:       linear-gradient(#fff 0 0) content-box,       linear-gradient(#fff 0 0);    -webkit-mask-composite: xor;    mask-composite: exclude;    pointer-events: none;  "></div>    <b>Creating Transcription Helper Functions & Spoken Language Processing</b>    <br/>  <span style="color:#9ca3af; font-size: 18px; font-weight: 400;">(Spoken Language Processing in Python)</span></div>

## Table of Contents

1. [Exploring Audio Files](#section-1)
2. [Preparing for the Proof of Concept](#section-2)
3. [Creating Helper Functions](#section-3)
4. [Sentiment Analysis on Spoken Language](#section-4)
5. [Named Entity Recognition (NER) with spaCy](#section-5)
6. [Classifying Transcribed Speech with Sklearn](#section-6)
7. [Conclusion](#section-7)

***

<a id="section-1"></a>
<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 1. Exploring Audio Files</span><br>

Before diving into complex processing, the first step in any audio pipeline is inspecting the data source. We utilize Python's built-in `os` module to navigate the file system and identify the audio files available for processing.

### Inspecting the Directory
The following code lists the contents of the directory containing the audio files.



In [None]:
# Import os module
import os

# Check the folder of audio files
# Assuming the files are stored in a directory named "acme_audio_files"
audio_files = os.listdir("acme_audio_files")

print(audio_files)



**Expected Output:**


```text
['call_1.mp3', 'call_2.mp3', 'call_3.mp3', 'call_4.mp3']
```



<div style="background: #e0f2fe; border-left: 16px solid #0284c7; padding: 14px 18px; border-radius: 8px; font-size: 18px; color: #075985;"> 💡 <b>Tip:</b> Note that the files are in <code>.mp3</code> format. Most speech recognition libraries prefer or require <code>.wav</code> format, which we will address in the next section. </div>
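
Since a folder may hold more than audio, it can help to filter the listing by extension before processing. A minimal sketch, assuming a hypothetical helper `list_audio_files()` and an illustrative set of extensions (neither is part of the original notebook):

```python
import os

# Hypothetical helper (not part of the original notebook): list only the
# files in a folder whose extension marks them as audio.
AUDIO_EXTENSIONS = (".mp3", ".wav", ".flac")  # assumed set; adjust as needed

def list_audio_files(folder, extensions=AUDIO_EXTENSIONS):
    """Return sorted full paths of files in `folder` with an audio extension."""
    paths = []
    for name in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, name)
        # str.endswith accepts a tuple; lower() makes the match case-insensitive
        if os.path.isfile(full_path) and name.lower().endswith(extensions):
            paths.append(full_path)
    return paths
```

Calling `list_audio_files("acme_audio_files")` would return full paths, which can be passed straight to functions that open the files.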

***

<a id="section-2"></a>
<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 2. Preparing for the Proof of Concept</span><br>

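To perform speech recognition, we first convert our compressed audio (`.mp3`) to uncompressed `.wav`, a format `speech_recognition` can read. Conversion does not restore detail lost to MP3 compression; it only changes how the audio is stored. We use `pydub` (which relies on ffmpeg or libav to decode MP3) for audio manipulation and `speech_recognition` for the actual transcription.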

### Conversion and Transcription Workflow
1.  **Import**: Load the MP3 file using `AudioSegment`.
2.  **Export**: Save the file as a WAV.
3.  **Recognize**: Use the Google Web Speech API via the `speech_recognition` library to transcribe the audio.



In [None]:
import speech_recognition as sr
from pydub import AudioSegment

# 1. Import call 1 and convert to .wav
# Load the mp3 file
call_1 = AudioSegment.from_file("acme_audio_files/call_1.mp3")

# Export as wav
call_1.export("acme_audio_files/call_1.wav", format="wav")

# 2. Transcribe call 1
recognizer = sr.Recognizer()

# Load the newly created wav file into the recognizer
call_1_file = sr.AudioFile("acme_audio_files/call_1.wav")

with call_1_file as source:
    # Record the audio data from the opened source
    call_1_audio = recognizer.record(source)

# Recognize the speech using Google's API
text = recognizer.recognize_google(call_1_audio)
print(text)



***

<a id="section-3"></a>
<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 3. Creating Helper Functions</span><br>

To make our workflow scalable, we will encapsulate the logic into reusable functions. We will create three specific functions:

1.  `convert_to_wav()`: Handles file format conversion.
2.  `show_pydub_stats()`: Displays technical audio attributes.
3.  `transcribe_audio()`: Handles the transcription process.

### Function 1: File Format Conversion
This function takes a filename, converts it to `.wav`, and saves it.



In [None]:
import os
from pydub import AudioSegment

def convert_to_wav(filename):
    """Takes an audio file of non .wav format and converts to .wav"""
    
    # Import audio file
    audio = AudioSegment.from_file(filename)
    
    # Create new filename by swapping the extension for .wav
    # (os.path.splitext splits only at the final dot, so dotted paths are safe)
    new_filename = os.path.splitext(filename)[0] + ".wav"
    
    # Export file as .wav
    print(f"Converting {filename} to {new_filename}...")
    audio.export(new_filename, format="wav")

# Usage Example
convert_to_wav("acme_audio_files/call_1.mp3")



### Function 2: Audio Statistics
This function inspects the `.wav` file and prints metadata like channels, sample rate, and duration.



In [None]:
def show_pydub_stats(filename):
    """Prints different audio attributes related to an audio file."""
    
    # Create AudioSegment instance
    audio_segment = AudioSegment.from_file(filename)
    
    # Print attributes
    print(f"Channels: {audio_segment.channels}")
    print(f"Sample width: {audio_segment.sample_width}")
    print(f"Frame rate (sample rate): {audio_segment.frame_rate}")
    print(f"Frame width: {audio_segment.frame_width}")
    print(f"Length (ms): {len(audio_segment)}")
    print(f"Frame count: {audio_segment.frame_count()}")

# Usage Example
show_pydub_stats("acme_audio_files/call_1.wav")
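
These attributes are related: frame width equals channels times sample width, and duration follows from frame count divided by frame rate. The sketch here checks both with the standard-library `wave` module on a synthetic one-second file, so it runs without `pydub`. The file name `tone_check.wav` and the helper `wav_stats()` are made up for illustration.

```python
import math
import os
import struct
import tempfile
import wave

def wav_stats(path):
    """Read basic attributes of a PCM .wav file with the stdlib wave module."""
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_width = wav_file.getsampwidth()   # bytes per sample
        frame_rate = wav_file.getframerate()     # frames per second
        frame_count = wav_file.getnframes()
    return {
        "channels": channels,
        "sample_width": sample_width,
        "frame_rate": frame_rate,
        "frame_count": frame_count,
        # One frame holds one sample for every channel
        "frame_width": channels * sample_width,
        "length_ms": round(1000 * frame_count / frame_rate),
    }

# Write one second of a 440 Hz mono tone (16-bit, 16 kHz) to a temporary file
path = os.path.join(tempfile.mkdtemp(), "tone_check.wav")
with wave.open(path, "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)
    wav_file.setframerate(16000)
    samples = (int(10000 * math.sin(2 * math.pi * 440 * n / 16000))
               for n in range(16000))
    wav_file.writeframes(b"".join(struct.pack("<h", s) for s in samples))

print(wav_stats(path))
```

The same relationships should hold for the values `show_pydub_stats()` prints, since `pydub` reports length in milliseconds and frame width in bytes.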



### Function 3: Transcription
This function wraps the `speech_recognition` logic to return text from a `.wav` file.



In [None]:
import speech_recognition as sr

def transcribe_audio(filename):
    """Takes a .wav format audio file and transcribes it to text."""
    
    # Setup a recognizer instance
    recognizer = sr.Recognizer()
    
    # Import the audio file and convert to audio data
    audio_file = sr.AudioFile(filename)
    
    with audio_file as source:
        audio_data = recognizer.record(source)
        
    # Return the transcribed text
    return recognizer.recognize_google(audio_data)

# Usage Example
transcription = transcribe_audio("acme_audio_files/call_1.wav")
print(transcription)
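
Transcription can fail, for instance when the audio is unintelligible or the API is unreachable; `speech_recognition` signals these cases with `sr.UnknownValueError` and `sr.RequestError`. A hedged sketch of a wrapper that returns `None` instead of raising follows. `transcribe_safely()` is a hypothetical helper, and the transcriber and exception types are passed in so the sketch runs without audio files or network access.

```python
def transcribe_safely(filename, transcribe_fn, expected_errors):
    """Call transcribe_fn(filename); return None if an expected error occurs.

    Hypothetical helper. expected_errors is a tuple of exception classes,
    e.g. (sr.UnknownValueError, sr.RequestError) for speech_recognition.
    """
    try:
        return transcribe_fn(filename)
    except expected_errors as error:
        print(f"Could not transcribe {filename}: {error!r}")
        return None

# Stand-in transcriber for demonstration only: fails on "noise.wav"
def fake_transcribe(filename):
    if filename == "noise.wav":
        raise ValueError("unintelligible audio")
    return "hello from " + filename

print(transcribe_safely("call_1.wav", fake_transcribe, (ValueError,)))
print(transcribe_safely("noise.wav", fake_transcribe, (ValueError,)))
```

With the notebook's own helper, the call would look like `transcribe_safely("acme_audio_files/call_1.wav", transcribe_audio, (sr.UnknownValueError, sr.RequestError))`.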



***

<a id="section-4"></a>
<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 4. Sentiment Analysis on Spoken Language</span><br>

Once we have text, we can analyze the sentiment (positive, negative, or neutral) of the conversation. We use the **NLTK** library and the **VADER** lexicon.

### Installation and Setup
First, ensure the necessary NLTK data is downloaded.



In [None]:
# Install NLTK if not already installed
# !pip install nltk

import nltk

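# Download required NLTK packages
# Note: recent NLTK releases load the sentence tokenizer from "punkt_tab";
# if sent_tokenize() raises a LookupError, also run nltk.download("punkt_tab")
nltk.download("punkt")
nltk.download("vader_lexicon")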



### Using VADER for Sentiment Analysis
We use the `SentimentIntensityAnalyzer` to get polarity scores.



In [None]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Create sentiment analysis instance
sid = SentimentIntensityAnalyzer()

# Test sentiment analysis on negative text
print(sid.polarity_scores("This customer service is terrible."))



**Output:**


```text
{'neg': 0.437, 'neu': 0.563, 'pos': 0.0, 'compound': -0.4767}
```
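
The `compound` value is a normalized score between -1 and 1. A common convention, suggested by VADER's authors, treats scores of 0.05 or above as positive and -0.05 or below as negative. A minimal sketch of that rule follows; `label_from_compound()` and its `threshold` parameter are illustrative names, not part of NLTK.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to 'positive', 'negative', or 'neutral'."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Scores copied from the output shown for "This customer service is terrible."
scores = {'neg': 0.437, 'neu': 0.563, 'pos': 0.0, 'compound': -0.4767}
print(label_from_compound(scores["compound"]))  # negative
```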



### Analyzing Transcribed Text
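We can apply the same analyzer to transcribed speech. Here we transcribe `call_3_channel_2.wav` with the `transcribe_audio()` helper and score the result.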



In [None]:
# Assuming 'call_3_channel_2.wav' exists and was transcribed
call_3_channel_2_text = transcribe_audio("acme_audio_files/call_3_channel_2.wav")

print(f"Text: {call_3_channel_2_text}")

# Sentiment analysis on the full text
print(sid.polarity_scores(call_3_channel_2_text))



### Sentence-by-Sentence Analysis
Scoring the whole transcript at once can hide strong emotion in individual sentences, because a single set of scores summarizes the entire text. It is often better to tokenize the text into sentences and analyze them individually.



In [None]:
from nltk.tokenize import sent_tokenize

call_3_paid_api_text = "Okay. Yeah. Hi, Diane. This is paid on this call and obviously the status of my orders at three weeks ago, and that service is terrible. Is this any better? Yes..."

# Find sentiment on each sentence
for sentence in sent_tokenize(call_3_paid_api_text):
    print(sentence)
    print(sid.polarity_scores(sentence))
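
Once each sentence has a score, the results can be summarized, for example by comparing the mean compound score with the single most negative sentence. A small sketch under stated assumptions: `summarize_sentence_scores()` is a hypothetical helper, and the example scores are made up for illustration rather than produced by VADER.

```python
def summarize_sentence_scores(scored_sentences):
    """Return the mean compound score and the most negative sentence.

    scored_sentences: list of (sentence, compound) pairs.
    """
    if not scored_sentences:
        raise ValueError("need at least one scored sentence")
    compounds = [compound for _sentence, compound in scored_sentences]
    mean_compound = sum(compounds) / len(compounds)
    # min() with a key picks the pair with the lowest compound score
    most_negative = min(scored_sentences, key=lambda pair: pair[1])
    return mean_compound, most_negative

# Made-up scores for illustration; real values come from sid.polarity_scores()
example = [
    ("Hi, Diane.", 0.0),
    ("That service is terrible.", -0.5),
    ("Is this any better?", 0.4),
]
mean_compound, most_negative = summarize_sentence_scores(example)
print(f"Mean compound: {mean_compound:.3f}")
print(f"Most negative: {most_negative}")
```

In the notebook, the input pairs could be built with `[(s, sid.polarity_scores(s)["compound"]) for s in sent_tokenize(text)]`. In this toy input the mean sits near zero even though one sentence is clearly negative, which is the averaging effect described earlier.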



***

<a id="section-5"></a>
<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 5. Named Entity Recognition (NER) with spaCy</span><br>

Named Entity Recognition (NER) helps us identify real-world objects like people, companies, dates, and locations in the text. We use the **spaCy** library for this.

### Installation and Setup



In [None]:
# Install spaCy
# !pip install spacy
# !python -m spacy download en_core_web_sm

import spacy

# Load spaCy language model
nlp = spacy.load("en_core_web_sm")



### Basic Tokenization and NER
We process a text string to create a spaCy `Doc` object, which automatically tokenizes the text and identifies entities.



In [None]:
# Create a spaCy doc
doc = nlp("I'd like to talk about a smartphone I ordered on July 31st from your Sydney store, my order number is 40939440. I spoke to Georgia about it last week.")

# Show different tokens and positions
print("--- Tokens ---")
for token in doc:
    print(token.text, token.idx)

# Show sentences
print("\n--- Sentences ---")
for sentence in doc.sents:
    print(sentence)



### Built-in Named Entities
spaCy comes with several built-in entity types:

| Entity Type | Description |
| :--- | :--- |
| **PERSON** | People, including fictional. |
| **ORG** | Companies, agencies, institutions, etc. |
| **GPE** | Countries, cities, states. |
| **PRODUCT** | Objects, vehicles, foods, etc. (Not services). |
| **DATE** | Absolute or relative dates or periods. |
| **TIME** | Times smaller than a day. |
| **MONEY** | Monetary values, including unit. |
| **CARDINAL** | Numerals that do not fall under another type. |



In [None]:
# Find named entities in doc
print("\n--- Entities ---")
for entity in doc.ents:
    print(entity.text, entity.label_)
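
For downstream use it is often handier to group entities by type than to print them one by one. A minimal sketch, assuming a hypothetical helper `group_entities()` that accepts `(text, label)` pairs; the sample pairs are illustrative, since the labels a model assigns depend on the model and its version.

```python
from collections import defaultdict

def group_entities(entity_pairs):
    """Group (text, label) pairs into {label: [texts]}, preserving order."""
    grouped = defaultdict(list)
    for text, label in entity_pairs:
        grouped[label].append(text)
    return dict(grouped)

# Illustrative pairs; actual labels depend on the model and its version
pairs = [("July 31st", "DATE"), ("Sydney", "GPE"), ("last week", "DATE")]
print(group_entities(pairs))
```

With a spaCy `Doc`, the call would be `group_entities((ent.text, ent.label_) for ent in doc.ents)`.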



### Creating Custom Named Entities
Sometimes the default model misses specific terms (like "smartphone" as a PRODUCT). We can add custom rules to the pipeline using `EntityRuler`.



In [None]:
# Check current pipeline
print("Original Pipeline:", nlp.pipeline)

# Pattern to add: label "smartphone" as a "PRODUCT"
patterns = [{"label": "PRODUCT", "pattern": "smartphone"}]

# Add the ruler before the default 'ner' component so its matches take priority
if spacy.__version__.startswith("2."):
    # spaCy v2: build the component object, then add it to the pipeline
    from spacy.pipeline import EntityRuler
    ruler = EntityRuler(nlp)
    nlp.add_pipe(ruler, before="ner")
else:
    # spaCy v3+: add the component by its registered name;
    # add_pipe() returns the new EntityRuler instance
    ruler = nlp.add_pipe("entity_ruler", before="ner")

# Register the pattern with the ruler
ruler.add_patterns(patterns)

# Check updated pipeline
print("Updated Pipeline:", nlp.pipeline)

# Test new entity rule
doc = nlp("I'd like to talk about a smartphone I ordered on July 31st...")
for entity in doc.ents:
    print(entity.text, entity.label_)



***

<a id="section-6"></a>
<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 6. Classifying Transcribed Speech with Sklearn</span><br>

In this final section, we build a machine learning pipeline to classify phone calls into categories (e.g., "post_purchase" vs "pre_purchase") based on their transcribed text.

### Data Preparation
We assume we have two folders: `post_purchase` and `pre_purchase`, containing audio files.



In [None]:
import os

# 1. Inspect audio folder
post_purchase_audio = os.listdir("post_purchase")
print(post_purchase_audio[:5])

# 2. Convert all non-wav files to wav (using our helper function)
for file in post_purchase_audio:
    if not file.endswith(".wav"):
        # Pass the full path so the file is found from the working directory
        convert_to_wav(os.path.join("post_purchase", file))

# 3. Function to transcribe all files in a folder
def create_text_list(folder):
    """Transcribes every .wav file in a folder and returns a list of texts."""
    text_list = []
    # Loop through the folder's files in a stable order
    for file in sorted(os.listdir(folder)):
        # Check for .wav extension
        if file.endswith(".wav"):
            # Transcribe audio using the helper from Section 3
            text = transcribe_audio(os.path.join(folder, file))
            text_list.append(text)
    return text_list

# Generate lists (mock data standing in for the output of create_text_list())
post_purchase_text = [
    "hey man I just bought a product from you guys",
    "these clothes I just bought are too small",
    "I recently got these pair of shoes but they're too big",
    "I bought a pair of pants and they are wrong color"
]

pre_purchase_text = [
    "hey I was wondering if you know where my new phone is",
    "do you have this in stock",
    "how much does this cost"
]



### Organizing Data with Pandas
We combine the text lists into a single DataFrame with labels.



In [None]:
import pandas as pd

# Create post purchase dataframe
post_purchase_df = pd.DataFrame({"label": "post_purchase", "text": post_purchase_text})

# Create pre purchase dataframe
pre_purchase_df = pd.DataFrame({"label": "pre_purchase", "text": pre_purchase_text})

# Combine pre purchase and post purchase (ignore_index avoids duplicate row labels)
df = pd.concat([post_purchase_df, pre_purchase_df], ignore_index=True)

# View the combined dataframe
print(df.head())



### Building the Text Classifier
We use a **Naive Bayes** classifier within a Scikit-Learn **Pipeline**.



In [None]:
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    df["text"],
    df["label"],
    test_size=0.3,
    random_state=42
)

# Create text classifier pipeline
text_classifier = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("classifier", MultinomialNB())
])

# Fit the classifier pipeline on the training data
text_classifier.fit(X_train, y_train)
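
To make the first pipeline step concrete: the vectorizer turns each transcript into word counts, and the tf-idf step then down-weights words that appear in many transcripts. The sketch here imitates only the counting step for a single text, using `CountVectorizer`'s default lowercasing and token pattern (runs of two or more word characters). It is a simplified stand-in, not scikit-learn's implementation, and `bag_of_words()` is a hypothetical name.

```python
import re
from collections import Counter

# CountVectorizer's default token pattern: runs of two or more word characters
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def bag_of_words(text):
    """Count lowercase tokens in `text`, ignoring single-character tokens."""
    return Counter(TOKEN_PATTERN.findall(text.lower()))

print(bag_of_words("I bought a pair of pants and they are wrong color"))
```

Single-character tokens such as "I" and "a" are dropped by the pattern, which is why they never become features under the default settings.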



### Evaluating the Model
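Finally, we test the accuracy of our model. With only seven mock examples, the 30% test split holds three samples, so treat the result as a smoke test of the pipeline rather than a performance estimate.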



In [None]:
# Make predictions and compare them to test labels
predictions = text_classifier.predict(X_test)

# Calculate accuracy
accuracy = 100 * np.mean(predictions == y_test)
print(f"The model is {accuracy:.2f}% accurate.")
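
Overall accuracy can hide weak performance on one class, especially with a small or imbalanced test set. A minimal standard-library sketch of per-label accuracy follows; `per_label_accuracy()` is a hypothetical helper, and the labels are invented for illustration rather than taken from the model's output.

```python
from collections import Counter

def per_label_accuracy(true_labels, predicted_labels):
    """Return {label: fraction of that label's samples predicted correctly}."""
    totals = Counter(true_labels)
    correct = Counter(
        true for true, predicted in zip(true_labels, predicted_labels)
        if true == predicted
    )
    return {label: correct[label] / totals[label] for label in totals}

# Hypothetical labels for illustration, not results from the model
y_true = ["post_purchase", "post_purchase", "pre_purchase"]
y_pred = ["post_purchase", "post_purchase", "post_purchase"]
print(per_label_accuracy(y_true, y_pred))
```

In this toy case two of three predictions are correct overall, yet every `pre_purchase` call is misclassified. With the notebook's variables the call would be `per_label_accuracy(list(y_test), list(predictions))`.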



***

<a id="section-7"></a>
<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 7. Conclusion</span><br>

### Summary of Achievements
In this notebook, we built an end-to-end spoken language processing pipeline:
1.  **Audio Conversion**: We used `pydub` to convert compressed `.mp3` files into `.wav` format suitable for analysis.
2.  **Transcription**: We utilized `speech_recognition` and the Google Web Speech API to convert audio soundwaves into text.
3.  **Audio Manipulation**: We created helper functions to automate conversion and inspect audio statistics.
4.  **NLP Pipeline**: We applied `NLTK` for sentiment analysis, `spaCy` for Named Entity Recognition (including custom entities), and `scikit-learn` to classify the intent of the calls.

### Next Steps
To further advance your skills:
*   **Project**: Apply these techniques to your own dataset of audio files.
*   **Live Audio**: Explore the `speech_recognition.Microphone()` class to transcribe audio in real-time.
*   **Advanced Models**: Experiment with different classifiers (e.g., Support Vector Machines) or deep learning models for improved accuracy.



In [None]:
# One last transcription to celebrate!
# one_last_transcription = transcribe_audio("congratulations.wav")
print("Congratulations on finishing the Spoken Language Processing with Python course!")
print("You should be proud.")
print("Now get out there and recognize some speech!")
