# Processing text transcribed from spoken language
>  In this chapter, you'll put everything you've learned together by building a speech processing proof of concept project for a technology company, Acme Studios. You'll start by transcribing audio snippets of customer support phone calls to text. Then you'll perform sentiment analysis using NLTK, named entity recognition using spaCy and text classification using scikit-learn on the transcribed text.

- toc: true 
- badges: true
- comments: true
- author: Lucas Nunes
- categories: [Datacamp]
- image: images/datacamp/___

> Note: This is a summary of the chapter 4 exercises from the DataCamp course "Spoken Language Processing in Python". <br>[Github repo](https://github.com/lnunesAI/Datacamp/) / [Course link](https://www.datacamp.com/tracks/machine-learning-scientist-with-python)

In [4]:
%%capture
!pip install pydub
!pip install SpeechRecognition
!python -m spacy download en_core_web_sm
#restart the colab runtime

In [4]:
from pydub import AudioSegment
import speech_recognition as sr

## Creating transcription helper functions

### Converting audio to the right format

<div class=""><p>Acme Studios have asked you to do a proof of concept to find out more about their audio files.</p>
<p>After exploring them briefly, you find there are a few calls, but they're in the wrong file format for transcription.</p>
<p>As you'll be interacting with many audio files, you decide to begin by creating some helper functions.</p>
<p>The first one, <code>convert_to_wav(filename)</code>, takes a file path and uses <code>PyDub</code> to convert it from a non-wav format to <code>.wav</code> format.</p>
<p>Once it's built, we'll use the function to convert <a href="https://assets.datacamp.com/production/repositories/4637/datasets/83ef1650407e911a0f52f491068e3082661db743/ex4_call_1_stereo_mp3.mp3" target="_blank" rel="noopener noreferrer">Acme's first call</a>, <code>call_1.mp3</code>, from <code>.mp3</code> format to <code>.wav</code>.</p>
<p><code>PyDub</code>'s <code>AudioSegment</code> class has already been imported. Remember, to work with non-wav files, you'll need <a href="http://www.ffmpeg.org/" target="_blank" rel="noopener noreferrer"><code>ffmpeg</code></a>.</p></div>

In [None]:
%%capture
!wget https://github.com/lnunesAI/Datacamp/raw/main/3-skill-tracks/spoken-language-processing-in-python/data/ex4_call_1_stereo_mp3.mp3

Instructions
<ul>
<li>Import the <code>filename</code> parameter using <code>AudioSegment</code>'s <code>from_file()</code>.</li>
<li>Set the export format to <code>"wav"</code>.</li>
<li>Pass the target audio file, <code>call_1.mp3</code>, to the function.</li>
</ul>

In [None]:
import os

# Create function to convert audio file to wav
def convert_to_wav(filename):
  """Takes an audio file of non .wav format and converts to .wav"""
  # Import audio file
  audio = AudioSegment.from_file(filename)

  # Create new filename by swapping only the final extension
  # (split(".")[0] would truncate paths that contain other dots)
  new_filename = os.path.splitext(filename)[0] + ".wav"

  # Export file as .wav
  print(f"Converting {filename} to {new_filename}...")
  audio.export(new_filename, format="wav")
 
# Test the function
convert_to_wav("ex4_call_1_stereo_mp3.mp3") #call_1.mp3

Converting ex4_call_1_stereo_mp3.mp3 to ex4_call_1_stereo_mp3.wav...


**Now to convert any audio file to .wav format, you can pass the filename to convert_to_wav(). Creating functions like this at the start of your projects saves plenty of coding later on.**
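A note on deriving the output filename: splitting on `.` and keeping the first piece truncates any path that contains other dots, such as a relative folder like `./data` or a versioned name like `call.v2.mp3`. `os.path.splitext()` strips only the final extension. The sketch below compares the two; `naive_wav_name` and `wav_name` are hypothetical helpers for illustration only.

```python
import os

def naive_wav_name(filename):
    # Keeps everything before the FIRST dot, as filename.split(".")[0] does
    return filename.split(".")[0] + ".wav"

def wav_name(filename):
    # Replaces only the final extension
    return os.path.splitext(filename)[0] + ".wav"

for path in ["call_1.mp3", "./data/call_1.mp3", "call.v2.mp3"]:
    print(path, "->", naive_wav_name(path), "vs", wav_name(path))
```

The `./data/call_1.mp3` case shows how badly the naive version can fail: the first piece before the first dot is an empty string, so it returns just `.wav`.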

### Finding PyDub stats

<div class=""><p>You decide it'll be helpful to know the audio attributes of any given file easily. This will be especially helpful for finding out how many channels an audio file has or if the frame rate is adequate for transcription.</p>
<p>In this exercise, we'll create <code>show_pydub_stats()</code> which takes a filename of an audio file as input. It then imports the audio as a <code>PyDub</code> <code>AudioSegment</code> instance and prints attributes such as number of channels, length and more.</p>
<p>It then returns the <code>AudioSegment</code> instance so it can be used later on.</p>
<p>We'll use our function on the <a href="https://assets.datacamp.com/production/repositories/4637/datasets/43c5aff8c419d07f8cef70fdf40e4657b78b70be/ex4_call_1_stereo_formatted.wav" target="_blank" rel="noopener noreferrer">newly converted .wav file</a>, <code>call_1.wav</code>.</p>
<p><code>AudioSegment</code> has already been imported from <code>PyDub</code>.</p></div>

In [None]:
%%capture
!wget https://github.com/lnunesAI/Datacamp/raw/main/3-skill-tracks/spoken-language-processing-in-python/data/ex4_call_1_stereo_formatted.wav

Instructions
<ul>
<li>Create an <code>AudioSegment</code> instance called <code>audio_segment</code> by importing the <code>filename</code> parameter.</li>
<li>Print the number of channels using the <code>channels</code> attribute.</li>
<li>Return the <code>audio_segment</code> variable.</li>
<li>Test the function on <code>"call_1.wav"</code>.</li>
</ul>

In [None]:
def show_pydub_stats(filename):
  """Returns different audio attributes related to an audio file."""
  # Create AudioSegment instance
  audio_segment = AudioSegment.from_file(filename)
  
  # Print audio attributes and return AudioSegment instance
  print(f"Channels: {audio_segment.channels}")
  print(f"Sample width: {audio_segment.sample_width}")
  print(f"Frame rate (sample rate): {audio_segment.frame_rate}")
  print(f"Frame width: {audio_segment.frame_width}")
  print(f"Length (ms): {len(audio_segment)}")
  return audio_segment

# Try the function
call_1_audio_segment = show_pydub_stats("ex4_call_1_stereo_formatted.wav")

Channels: 2
Sample width: 2
Frame rate (sample rate): 32000
Frame width: 4
Length (ms): 54888


In [None]:
call_1_audio_segment

**Now you'll be able to find the PyDub attribute parameters of any audio file in one line! It seems call_1.wav has two channels, so they could potentially be split using PyDub's split_to_mono() and transcribed separately.**
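The attributes printed above are related: a frame holds one sample per channel, so frame width = channels × sample width (2 × 2 = 4 bytes for `call_1.wav`), and the length in milliseconds is the number of frames divided by the frame rate, times 1000. As a dependency-free cross-check, Python's standard-library `wave` module exposes the same header fields. This is a minimal sketch assuming an uncompressed PCM `.wav`; the hypothetical `make_silent_wav` helper builds a silent file in memory so the example runs without downloading anything, and `wav_stats` is likewise a hypothetical name.

```python
import io
import wave

def make_silent_wav(seconds, channels=2, sample_width=2, frame_rate=32000):
    """Hypothetical demo helper: build a silent PCM .wav file in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(frame_rate)
        n_frames = int(seconds * frame_rate)
        # All-zero bytes are silence for signed 16-bit samples
        wav.writeframes(b"\x00" * (n_frames * channels * sample_width))
    return buffer.getvalue()

def wav_stats(wav_bytes):
    """Read basic header fields of a .wav file with the stdlib wave module."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()   # bytes per sample
        frame_rate = wav.getframerate()     # frames per second
        n_frames = wav.getnframes()
    return {
        "channels": channels,
        "sample_width": sample_width,
        "frame_rate": frame_rate,
        "frame_width": channels * sample_width,  # one sample per channel per frame
        "length_ms": round(n_frames / frame_rate * 1000),
    }

print(wav_stats(make_silent_wav(0.5)))
```

With PyDub you'd keep using `show_pydub_stats()`; the point here is only to make the arithmetic between the fields explicit.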

### Transcribing audio with one line

<div class=""><p>Alright, now you've got functions to convert audio files and find out their attributes, it's time to build one to transcribe them.</p>
<p>In this exercise, you'll build <code>transcribe_audio()</code> which takes a <code>filename</code> as input, imports the <code>filename</code> using <code>speech_recognition</code>'s <code>AudioFile</code> class and then transcribes it using <code>recognize_google()</code>.</p>
<p>You've seen these functions before but now we'll put them together so they're accessible in a function.</p>
<p>To test it out, we'll transcribe <a href="https://assets.datacamp.com/production/repositories/4637/datasets/43c5aff8c419d07f8cef70fdf40e4657b78b70be/ex4_call_1_stereo_formatted.wav" target="_blank" rel="noopener noreferrer">Acme's first call</a>, <code>"call_1.wav"</code>.</p>
<p><code>speech_recognition</code> has been imported as <code>sr</code>.</p></div>

In [None]:
def transcribe_audio(filename):
  """Takes a .wav format audio file and transcribes it to text."""
  # Setup a recognizer instance
  recognizer = sr.Recognizer()
  
  # Import the audio file and convert to audio data
  audio_file = sr.AudioFile(filename)
  with audio_file as source:
    audio_data = recognizer.record(source)
  
  # Return the transcribed text
  return recognizer.recognize_google(audio_data)

# Test the function
print(transcribe_audio("ex4_call_1_stereo_mp3.wav"))

hello welcome to Acme Studios support lawn my name is Daniel how can I best help you hi Daniel this is John


`    hello welcome to Acme studio support line my name is Daniel how can I best help you hey Daniel this is John I've recently bought a smart from you guys 3 weeks ago and I'm already having issues with it I know that's not good to hear John let's let's get your cell number and then we can we can set up a way to fix it for you one number for 17 varies how long do you reckon this is going to try our best to get the steel number will start up this support case I'm just really really really really I've been trying to contact past three 4 days now and I've been put on hold more than an hour and a half so I'm not really happy I kind of wanna get this issue 6 is f***** possible`

**You'll notice the recognizer didn't adequately transcribe the words 'fast as' on the last line, starring them out as a potential expletive. This is a reminder that speech recognition still isn't perfect. But you've now got a function which can transcribe the audio of a .wav file with one line of code. Helper functions like transcribe_audio() take a bit of effort to set up, but once you've got them, they save time and prevent errors later on.**
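To put a number on "isn't perfect", a common metric is word error rate (WER): the minimum number of word substitutions, insertions and deletions needed to turn a transcript into a reference transcript, divided by the number of reference words. WER isn't part of the course; the sketch below is a plain dynamic-programming implementation, and `word_error_rate` is a hypothetical helper. It compares words case-sensitively, so you may want to lowercase both strings first.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # previous[j]: edits to turn the reference words seen so far into hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A WER of 0 means the transcript matches the reference exactly; one dropped word out of six gives 1/6.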

### Using the helper functions you've built

<div class=""><p>Okay, now we've got some helper functions ready to go, it's time to put them to use!</p>
<p>You'll first use <code>convert_to_wav()</code> to convert Acme's <a href="https://assets.datacamp.com/production/repositories/4637/datasets/56f523fb855eaecc14a87c5619ec5e6e7c4490bc/ex4_call_1_stereo_formatted_mp3.mp3" target="_blank" rel="noopener noreferrer"><code>call_1.mp3</code></a> to <code>.wav</code> format and save it as <code>call_1.wav</code>.</p>
<p>Using <code>show_pydub_stats()</code> you find <code>call_1.wav</code> has 2 channels so you decide to split them using <code>PyDub</code>'s <code>split_to_mono()</code>. Acme tells you the <a href="https://assets.datacamp.com/production/repositories/4637/datasets/03ace2e9b866aaa554c465d6698500aaf48599dc/ex4_call_1_channel_2_split.wav" target="_blank" rel="noopener noreferrer">customer channel</a> is likely channel 2. So you export channel 2 using <code>PyDub</code>'s <code>.export()</code>.</p>
<p>Finally, you'll use <code>transcribe_audio()</code> to transcribe channel 2 only.</p></div>
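Under the hood, a stereo PCM stream stores its samples interleaved (channel 1, channel 2, channel 1, channel 2, ...), so splitting to mono amounts to taking every other sample. The sketch below shows that idea on raw 16-bit PCM (sample width 2, as in `call_1.wav`) using the standard-library `array` module. It is an illustration, not PyDub's code, and `split_stereo_16bit` is a hypothetical helper; with PyDub you simply call `split_to_mono()` on the `AudioSegment`.

```python
from array import array

def split_stereo_16bit(pcm_bytes):
    """Split interleaved 16-bit stereo PCM into (channel_1, channel_2) byte strings."""
    samples = array("h", pcm_bytes)   # signed 16-bit samples: c1, c2, c1, c2, ...
    channel_1 = samples[0::2]         # even positions belong to the first channel
    channel_2 = samples[1::2]         # odd positions belong to the second channel
    return channel_1.tobytes(), channel_2.tobytes()

# Three stereo frames: (100, -100), (200, -200), (300, -300)
stereo = array("h", [100, -100, 200, -200, 300, -300]).tobytes()
left, right = split_stereo_16bit(stereo)
print(list(array("h", left)), list(array("h", right)))
```

Each mono result has half the bytes of the stereo input, which matches the frame width dropping from 4 to 2 bytes.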

In [None]:
%%capture
!wget https://github.com/lnunesAI/Datacamp/raw/main/3-skill-tracks/spoken-language-processing-in-python/data/ex4_call_1_channel_2_split.wav

Instructions 1/3
<li>Convert the <code>.mp3</code> version of <code>call_1</code> to <code>.wav</code> and then check the stats of the <code>.wav</code> version.</li>

In [None]:
# Check the stats of the new file
call_1 = show_pydub_stats("ex4_call_1_stereo_formatted.wav")

Channels: 2
Sample width: 2
Frame rate (sample rate): 32000
Frame width: 4
Length (ms): 54888


Instructions 2/3
<li>Split <code>call_1</code> to mono and then export the second channel in <code>.wav</code> format.</li>

In [None]:
# Split call_1 to mono
call_1_split = call_1.split_to_mono()

# Export channel 2 (the customer channel)
call_1_split[1].export("call_1_channel_2.wav",
                       format="wav")

<_io.BufferedRandom name='call_1_channel_2.wav'>

In [None]:
call_1_split[0]

In [None]:
call_1_split[1]

Instructions 3/3
<li>Transcribe the audio of call 1's channel 2.</li>

In [None]:
def transcribe_audio2(filename):
  """Mocked transcription: returns a fixed transcript of call 1, channel 2 to avoid hitting the API usage limit."""
  return "hey Daniel this is John I've recently bought a smartphone from you guys 3 weeks ago and I'm already having issues with it once they can we grab my Siri number it is for 1757 and very displease how long do you reckon this is going to take a pee on hold for about an hour now right I'm just just really really really really just weasel this product I've been trying to contact supports the past past three 4 days now and have been put on hold for more than an hour and a half so I'm not really happy I kind of wanna get this issue fixed as fast as possible"

In [None]:
# Transcribe the single channel
print(transcribe_audio2("call_1_channel_2.wav"))
print(transcribe_audio("call_1_channel_2.wav"))

hey Daniel this is John I've recently bought a smartphone from you guys 3 weeks ago and I'm already having issues with it once they can we grab my Siri number it is for 1757 and very displease how long do you reckon this is going to take a pee on hold for about an hour now right I'm just just really really really really just weasel this product I've been trying to contact supports the past past three 4 days now and have been put on hold for more than an hour and a half so I'm not really happy I kind of wanna get this issue fixed as fast as possible
play Daniel this is John I've recently bought a small fire from 1757 I'm very displeased how long do you reckon this is going to take on me on hold for about an hour now we're on I'm just I just really really really really just ways of this talk time I've been trying to contact the pool at Cintas cost three four days now and I've been coronavirus and then an awesome not really happy I kinda want to get this issue fixed as possible


**Thanks to the helper functions we implemented earlier, you converted an audio file, checked its stats, split it into separate channels and transcribed it, all within a few lines of code! Well done. Now that we've got some ways to turn our audio files into text, let's use some natural language processing to find out more.**

## Sentiment analysis on spoken language text

### Analyzing sentiment of a phone call

<div class=""><p>Once you've transcribed the text from an audio file, it's possible to perform natural language processing on the text.</p>
<p>In this exercise, we'll use <code>NLTK</code>'s VADER (Valence Aware Dictionary and sEntiment Reasoner) to analyze the sentiment of the transcribed text of <a href="https://assets.datacamp.com/production/repositories/4637/datasets/82c77dc404e914eb08ce2a54a10603ef027711b8/ex4_call_2_stereo_native.wav" target="_blank" rel="noopener noreferrer"><code>call_2.wav</code></a>.</p>
<p>To transcribe the text, we'll use the <code>transcribe_audio()</code> function we created earlier.</p>
<p>Once we have the text, we'll use <code>NLTK</code>'s <code>SentimentIntensityAnalyzer()</code> class to obtain a sentiment polarity score.</p>
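<p><code>.polarity_scores(text)</code> returns a value for pos (positive), neu (neutral), neg (negative) and compound. Pos, neu and neg are the proportions of the text that fall into each category, while compound is a normalized overall score between -1 (most negative) and +1 (most positive). The higher it is, the more positive the text; the lower it is, the more negative.</p></div>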
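For intuition about where the compound score comes from: in NLTK's VADER implementation, the valences of the words in the text (after VADER's rule-based adjustments for things like negation and intensifiers) are summed, and that unbounded sum is squashed into the range -1 to +1 with x / √(x² + α), where α defaults to 15. Below is a minimal re-implementation of just that normalization step, not of the full analyzer.

```python
import math

def normalize(score, alpha=15):
    """Squash an unbounded sum of word valences into the open range (-1, 1)."""
    return score / math.sqrt(score * score + alpha)

for raw in [-4, -1, 0, 1, 4, 20]:
    print(raw, round(normalize(raw), 4))
```

Because of the square root, the curve flattens quickly: a raw sum of 1 maps to 0.25, while a sum of 4 already maps to about 0.72, so long, consistently positive texts like call 2 end up with compound scores close to 1.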

In [None]:
%%capture 
!wget https://github.com/lnunesAI/Datacamp/raw/main/3-skill-tracks/spoken-language-processing-in-python/data/ex4_call_2_stereo_native.wav

import nltk
nltk.download('vader_lexicon')

In [None]:
def transcribe_audio_2(filename):
  """
  Takes an audio filename as string and transcribes the text within it.
  Note: For this exercise, the function has been mocked to prevent hitting the API usage limit.
  """
  if type(filename) == str:
    return "hello my name is Daniel thank you for calling acne Studios how can I best help you a little bit more but I'm corner of Edward and Elizabeth according to Google according to the match but would you be able to help me in some way because I think I'm actually walk straight past your shop yeah sure thing or thank you so it's good to hear you're enjoying it let me find out where the nearest store is for you"
  else:
    raise TypeError("filename should be of type string, like: 'call_2.wav'")
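Because every `recognize_google()` call goes over the network and counts towards the usage limit (the reason this function is mocked), one option in a real project is to memoize transcriptions per filename, so re-running a cell doesn't re-send the same audio. Below is a minimal sketch using `functools.lru_cache`; `cached` and `fake_transcribe` are hypothetical names, and in practice you'd wrap the real function, e.g. `transcribe_audio_cached = cached(transcribe_audio)`.

```python
import functools

def cached(transcribe):
    """Wrap a transcription function so each filename is transcribed at most once."""
    @functools.lru_cache(maxsize=None)
    def wrapper(filename):
        return transcribe(filename)
    return wrapper

# Stand-in transcriber that records how often it is really called
calls = []
def fake_transcribe(filename):
    calls.append(filename)
    return f"transcript of {filename}"

transcribe_once = cached(fake_transcribe)
transcribe_once("call_2.wav")
transcribe_once("call_2.wav")
print(len(calls))
```

The cache lives in memory only and is keyed on the filename string, so it won't notice if a file changes on disk, and it's lost when the runtime restarts.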

Instructions
<ul>
<li>Instantiate an instance of <code>SentimentIntensityAnalyzer()</code> and save it to the variable <code>sid</code>.</li>
<li>Transcribe the target call and save it to <code>call_2_text</code>.</li>
<li>Print the <code>polarity_scores()</code> of <code>call_2_text</code>.</li>
</ul>

In [None]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Create SentimentIntensityAnalyzer instance
sid = SentimentIntensityAnalyzer()

# Let's try it on one of our phone calls
call_2_text_2 = transcribe_audio_2("ex4_call_2_stereo_native.wav") #call_2.wav
call_2_text = transcribe_audio("ex4_call_2_stereo_native.wav") #call_2.wav

# Display text and sentiment polarity scores
print(call_2_text_2)
print(sid.polarity_scores(call_2_text_2))
print("____")
print(call_2_text)
print(sid.polarity_scores(call_2_text))

hello my name is Daniel thank you for calling acne Studios how can I best help you a little bit more but I'm corner of Edward and Elizabeth according to Google according to the match but would you be able to help me in some way because I think I'm actually walk straight past your shop yeah sure thing or thank you so it's good to hear you're enjoying it let me find out where the nearest store is for you
{'neg': 0.0, 'neu': 0.694, 'pos': 0.306, 'compound': 0.9817}
____
hello my name is Daniel thank you for calling Acme Studios how can I best help you from you guys and extremely happy with it I'll just go to East not an issue but I'm just going to learn a little bit more about the message prank I had but I thought you on the corner of Edward and Elizabeth according to Google according to the maps but some would you be able to help me in some way because I think I actually walked straight past your shop yeah sure thing well thank you Sally that's good to hear you're enjoying it let me let 

**Reading back the transcribed text and listening to the phone call, a compound score close to 1 (more positive) makes sense since the customer states they're very happy and enjoying their device.**

### Sentiment analysis on formatted text

<div class=""><p>In this exercise, you'll calculate the sentiment on the customer channel of <a href="https://assets.datacamp.com/production/repositories/4637/datasets/82c77dc404e914eb08ce2a54a10603ef027711b8/ex4_call_2_stereo_native.wav" target="_blank" rel="noopener noreferrer"><code>call_2.wav</code></a>.</p>
<p>You've split the customer channel and saved it to <a href="https://assets.datacamp.com/production/repositories/4637/datasets/bc1fa0595fda765634de7b09864a26566b5f11db/ex4_call_2_channel_2_formatted.wav" target="_blank" rel="noopener noreferrer"><code>call_2_channel_2.wav</code></a>.</p>
<p>But from your experience with sentiment analysis, you know sentiment can change from sentence to sentence.</p>
<p>To calculate it sentence by sentence, you split the text into sentences using <code>NLTK</code>'s <code>sent_tokenize()</code> function.</p>
<p>But <code>transcribe_audio()</code> doesn't return punctuated sentences. To try sentiment analysis with sentences, you've used a paid API service to get <code>call_2_channel_2_paid_api_text</code>, which is punctuated into sentences.</p></div>
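When a free transcription comes back without punctuation, `sent_tokenize()` has nothing to split on and returns the whole text as a single "sentence". One rough workaround, not from the course, is to score fixed-size word windows instead. The window size of 20 words below is an arbitrary assumption, and `word_windows` is a hypothetical helper.

```python
def word_windows(text, size=20):
    """Split unpunctuated text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# In the notebook you could then score each chunk, e.g.:
# for chunk in word_windows(call_2_channel_2_text):
#     print(sid.polarity_scores(chunk))
print(word_windows("oh hi Daniel my name is Sally", size=3))
```

Windows cut across sentence boundaries, so treat per-window scores as a coarse signal rather than true sentence-level sentiment.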

In [None]:
%%capture
!wget https://github.com/lnunesAI/Datacamp/raw/main/3-skill-tracks/spoken-language-processing-in-python/data/ex4_call_2_channel_2_formatted.wav

In [None]:
call_2_channel_2_paid_api_text = "Hello and welcome to acme studios. My name's Daniel. How can I best help you? Hi Diane. This is paid on this call up to see the status of my, I'm proctor mortars at three weeks ago, and then service is terrible. Okay, Peter, sorry to hear about that. Hey, Peter, before we go on, do you mind just, uh, is there something going on with your microphone? I can't quite hear you. Is this any better? Yeah, that's much better. And sorry, what was, what was it that you said when you first first started speaking?  So I ordered a product from you guys three weeks ago and, uh, it's, it's currently on July 1st and I haven't received a provocative, again, three weeks to a full four weeks down line. This service is terrible. Okay. Well, what's your order id? I'll, uh, I'll start looking into that for you. Six, nine, eight, seven five. Okay. Thank you."

In [None]:
# Check the stats of the new file
call_2_channel_2_audio = show_pydub_stats("ex4_call_2_channel_2_formatted.wav")

In [None]:
def transcribe_audio_2(filename):
  """
  Takes an audio filename as string and transcribes the text within it.
  Note: For this exercise, the function has been mocked to prevent hitting the API usage limit.
  """
  if type(filename) == str:
    call_2_channel_2_text = "oh hi Daniel my name is Sally I recently purchased a smartphone from you guys and extremely happy with it I've just gotta issue not an issue but I've just got to learn a little bit more about the message bank on I have Google the location but I'm I'm finding it hard I thought you were on the corner of Edward and Elizabeth according to Google according to the match but would you be able to help me in some way because I think I've actually walk straight past your shop"
    return call_2_channel_2_text
  else:
    raise TypeError("filename should be of type string, like: 'call_2_channel_2.wav'")

Instructions 1/3
<li>Transcribe the audio of <code>call_2_channel_2.wav</code> and find the sentiment scores.</li>

In [None]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Create SentimentIntensityAnalyzer instance
sid = SentimentIntensityAnalyzer()

# Transcribe customer channel of call 2
call_2_channel_2_text_2 = transcribe_audio_2("ex4_call_2_channel_2_formatted.wav") #call_2_channel_2.wav
call_2_channel_2_text = transcribe_audio("ex4_call_2_channel_2_formatted.wav") #call_2_channel_2.wav

# Display text and sentiment polarity scores
print(call_2_channel_2_text_2)
print(sid.polarity_scores(call_2_channel_2_text_2))
print("____")
print(call_2_channel_2_text)
print(sid.polarity_scores(call_2_channel_2_text))

oh hi Daniel my name is Sally I recently purchased a smartphone from you guys and extremely happy with it I've just gotta issue not an issue but I've just got to learn a little bit more about the message bank on I have Google the location but I'm I'm finding it hard I thought you were on the corner of Edward and Elizabeth according to Google according to the match but would you be able to help me in some way because I think I've actually walk straight past your shop
{'neg': 0.017, 'neu': 0.891, 'pos': 0.091, 'compound': 0.778}
____
all hi Daniel my name is belly I've recently purchased a smartphone from you guys and extremely happy with it I'll just go to East Normandy Street but I'm just going to learn a little bit more about the message prank I have Google like Haitian but I'm finding it had I thought you on the corner of Edward and Elizabeth according to Google according to the maps but some would you be able to help me in some way because I think I actually walked straight past you

Instructions 2/3
<li>Split <code>call_2_channel_2_text</code> into sentences and find the sentiment score of each sentence.</li>

In [None]:
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [None]:
# Import sent_tokenize from nltk
from nltk import sent_tokenize

# Split call 2 channel 2 into sentences and score each
for sentence in sent_tokenize(call_2_channel_2_text):
    print(sentence)
    print(sid.polarity_scores(sentence))

all hi Daniel my name is belly I've recently purchased a smartphone from you guys and extremely happy with it I'll just go to East Normandy Street but I'm just going to learn a little bit more about the message prank I have Google like Haitian but I'm finding it had I thought you on the corner of Edward and Elizabeth according to Google according to the maps but some would you be able to help me in some way because I think I actually walked straight past yourself
{'neg': 0.0, 'neu': 0.872, 'pos': 0.128, 'compound': 0.8921}


Instructions 3/3
<li>Split <code>call_2_channel_2_paid_api_text</code> into sentences and score the sentiment of each.</li>

In [None]:
# Split channel 2 paid text into sentences and score each
for sentence in sent_tokenize(call_2_channel_2_paid_api_text):
    print(sentence)
    print(sid.polarity_scores(sentence))

Hello and welcome to acme studios.
{'neg': 0.0, 'neu': 0.625, 'pos': 0.375, 'compound': 0.4588}
My name's Daniel.
{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
How can I best help you?
{'neg': 0.0, 'neu': 0.303, 'pos': 0.697, 'compound': 0.7845}
Hi Diane.
{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
This is paid on this call up to see the status of my, I'm proctor mortars at three weeks ago, and then service is terrible.
{'neg': 0.114, 'neu': 0.886, 'pos': 0.0, 'compound': -0.4767}
Okay, Peter, sorry to hear about that.
{'neg': 0.159, 'neu': 0.61, 'pos': 0.232, 'compound': 0.1531}
Hey, Peter, before we go on, do you mind just, uh, is there something going on with your microphone?
{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
I can't quite hear you.
{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
Is this any better?
{'neg': 0.0, 'neu': 0.508, 'pos': 0.492, 'compound': 0.4404}
Yeah, that's much better.
{'neg': 0.0, 'neu': 0.282, 'pos': 0.718, 'compound': 0.

**You can see how the sentiment differs from sentence to sentence in the call 2 channel 2 paid API text. An extension could be to dig deeper into each of the sentences which have the lowest scores.**
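As a starting point for that extension, here is a minimal sketch that pulls out the lowest-scoring sentences with `heapq.nsmallest`. The `(sentence, compound)` pairs are copied from the output above; in the notebook you could build them with `[(s, sid.polarity_scores(s)["compound"]) for s in sent_tokenize(call_2_channel_2_paid_api_text)]`. `lowest_scoring` is a hypothetical helper.

```python
import heapq

# (sentence, compound) pairs copied from the output above
scored = [
    ("Hello and welcome to acme studios.", 0.4588),
    ("My name's Daniel.", 0.0),
    ("How can I best help you?", 0.7845),
    ("Hi Diane.", 0.0),
    ("This is paid on this call up to see the status of my, I'm proctor mortars "
     "at three weeks ago, and then service is terrible.", -0.4767),
    ("Okay, Peter, sorry to hear about that.", 0.1531),
    ("Is this any better?", 0.4404),
]

def lowest_scoring(pairs, n=3):
    """Return the n (sentence, compound) pairs with the lowest compound score."""
    return heapq.nsmallest(n, pairs, key=lambda pair: pair[1])

for sentence, compound in lowest_scoring(scored, n=2):
    print(compound, sentence)
```

`heapq.nsmallest` behaves like `sorted(...)[:n]`, so ties (several sentences score exactly 0.0) keep their original order.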

## Named entity recognition on transcribed text

### Named entity recognition in spaCy

<div class=""><p>Named entities are real-world objects which have names, such as cities, people, dates or times. We can use <code>spaCy</code> to find named entities in our transcribed text.</p>
<p>In this exercise, you'll transcribe <a href="https://assets.datacamp.com/production/repositories/4637/datasets/2e039462d95117677db6ddfe24377d9cadcdf730/ex4_call_4_channel_2_formatted.wav" target="_blank" rel="noopener noreferrer"><code>call_4_channel_2.wav</code></a> using <code>transcribe_audio()</code> and then use <code>spaCy</code>'s language model, <code>en_core_web_sm</code> to convert the transcribed text to a <code>spaCy</code> doc.</p>
<p>Transforming text to a <code>spaCy</code> doc allows us to leverage <code>spaCy</code>'s built-in features for analyzing text, such as iterating over its tokens (single words) and reading their <code>.text</code>, <code>.sents</code> for sentences and <code>.ents</code> for named entities.</p></div>

In [None]:
%%capture
!wget https://github.com/lnunesAI/Datacamp/raw/main/3-skill-tracks/spoken-language-processing-in-python/data/ex4_call_4_channel_2_formatted.wav

In [3]:
call_4_channel_2_text = "oh hello Daniel my name is Ann and I've recently just purchased are a smartphone from you and I'm very happy with the product ID like to order another one for my friend who lives in Sydney and have it delivered I'm pretty sure it's model 315 I can check that for you and I'll give you my details arm if you would like to take my details and I I will also give you the address thank you excellent"

In [1]:
def transcribe_audio(filename):
  """Takes a .wav format audio file and transcribes it to text."""
  # Setup a recognizer instance
  recognizer = sr.Recognizer()
  
  # Import the audio file and convert to audio data
  audio_file = sr.AudioFile(filename)
  with audio_file as source:
    audio_data = recognizer.record(source)
  
  # Return the transcribed text
  return recognizer.recognize_google(audio_data)

In [15]:
def transcribe_audio_2(filename):
  """
  Takes an audio filename as string and transcribes the text within it.
  Note: For this exercise, the function has been mocked to prevent hitting the API usage limit.
  """
  if type(filename) == str:
    call_4_channel_2_text = "oh hello Daniel my name is Ann and I've recently just purchased are a smartphone from you and I'm very happy with the product ID like to order another one for my friend who lives in Sydney and have it delivered I'm pretty sure it's model 315 I can check that for you and I'll give you my details arm if you would like to take my details and I I will also give you the address thank you excellent"
    return call_4_channel_2_text
  else:
    raise TypeError("filename should be of type string, like: 'call_4_channel_2.wav'")

Instructions 1/4
<li>Create a <code>spaCy</code> <code>doc</code> by passing the transcribed call 4 channel 2 text to <code>nlp()</code> and then check its type.</li>

In [16]:
import spacy

# Transcribe call 4 channel 2
call_4_channel_2_text = transcribe_audio("ex4_call_4_channel_2_formatted.wav") #call_4_channel_2.wav
call_4_channel_2_text_2 = transcribe_audio_2("ex4_call_4_channel_2_formatted.wav")

# Create a spaCy language model instance
nlp = spacy.load("en_core_web_sm")

# Create a spaCy doc with call 4 channel 2 text
doc = nlp(call_4_channel_2_text)
doc_2 = nlp(call_4_channel_2_text_2)

# Check the type of doc
print(type(doc))

<class 'spacy.tokens.doc.Doc'>


In [17]:
doc_2.text

"oh hello Daniel my name is Ann and I've recently just purchased are a smartphone from you and I'm very happy with the product ID like to order another one for my friend who lives in Sydney and have it delivered I'm pretty sure it's model 315 I can check that for you and I'll give you my details arm if you would like to take my details and I I will also give you the address thank you excellent"

In [18]:
doc.text

"Hawaii Daniel my name is Anne and I've recently just purchased a smart front buying from you and I'm very happy with the product I'd like to order another one for my friend believes in Sydney and have it delivered I'm pretty sure it's model 315 I can check that for you and I'll give you more details if you would like to take my details and I I will also give you the address thank you excellent"

Instructions 2/4
<li>Create a <code>spaCy</code> <code>doc</code> with <code>call_4_channel_2_text</code> then print all the token text in it using the <code>.text</code> attribute.</li>

In [23]:
# Show tokens in doc
for token in doc:
    print(token.text, token.idx)

Hawaii 0
Daniel 7
my 14
name 17
is 22
Anne 25
and 30
I 34
've 35
recently 39
just 48
purchased 53
a 63
smart 65
front 71
buying 77
from 84
you 89
and 93
I 97
'm 98
very 101
happy 106
with 112
the 117
product 121
I 129
'd 130
like 133
to 138
order 141
another 147
one 155
for 159
my 163
friend 166
believes 173
in 182
Sydney 185
and 192
have 196
it 201
delivered 204
I 214
'm 215
pretty 218
sure 225
it 230
's 232
model 235
315 241
I 245
can 247
check 251
that 257
for 262
you 266
and 270
I 274
'll 275
give 279
you 284
more 288
details 293
if 301
you 304
would 308
like 314
to 319
take 322
my 327
details 330
and 338
I 342
I 344
will 346
also 351
give 356
you 361
the 365
address 369
thank 377
you 383
excellent 387
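The number printed next to each token is `token.idx`, the character offset of the token's first character in the original text, so slicing the text at that offset recovers the token. Note that `I've` is split into two tokens, `I` at 34 and `'ve` at 35, with no whitespace between them. A stdlib check using values from the output above:

```python
# Start of doc.text and a few (token.text, token.idx) pairs from the output above
text = "Hawaii Daniel my name is Anne and I've recently just purchased"
tokens = [("Hawaii", 0), ("Daniel", 7), ("Anne", 25), ("I", 34), ("'ve", 35), ("recently", 39)]

for token_text, idx in tokens:
    # Slice len(token_text) characters starting at the token's offset
    print(repr(text[idx:idx + len(token_text)]), "==", repr(token_text))
```

These offsets are what let you map an entity or token back to its exact position in the transcript.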


Instructions 3/4
<li>Load the <code>"en_core_web_sm"</code> language model and then print the sentences in the <code>doc</code> using the <code>.sents</code> attribute.</li>

In [27]:
# Show sentences in doc
for sentence in doc_2.sents:
    print(sentence)

oh hello Daniel my name is Ann
and I've recently just purchased
are a smartphone from you
and I'm very happy with the product ID like to order another one for my friend who lives in Sydney and have it delivered
I'm pretty sure it's model 315
I can check that for you
and I'll give you my details arm if you would like to take my details
and I I will also give you the address
thank you excellent


In [24]:
# Show sentences in doc
for sentence in doc.sents:
    print(sentence)

Hawaii Daniel my name is Anne
and I've recently just purchased a smart front buying from you
and I'm very happy with the product I'd like to order another one for my friend believes in Sydney and
have it delivered I'm pretty sure it's model 315
I can check that for you
and I'll give you more details if you would like to take my details
and I I will also give you the address
thank you excellent


Instructions 4/4
<li>Access the entities in the doc using <code>.ents</code> and then print the text of each.</li>

In [26]:
# Show named entities and their labels
for entity in doc_2.ents:
    print(entity.text, entity.label_)

Ann PERSON
Sydney GPE
315 CARDINAL


In [25]:
# Show named entities and their labels
for entity in doc.ents:
    print(entity.text, entity.label_)

Hawaii Daniel ORG
Anne PERSON
Sydney GPE
315 CARDINAL


**You've now seen some of spaCy's helpful functions for analyzing text. spaCy's built-in named entities are great to start with but sometimes you'll want to use your own.**

### Creating a custom named entity in spaCy

<div class=""><p>If <code>spaCy</code>'s built-in named entities aren't enough, you can make your own using <code>spaCy</code>'s <code>EntityRuler()</code> class.</p>
<p><code>EntityRuler()</code> allows you to create your own entities to add to a <code>spaCy</code> pipeline.</p>
<p>You start by creating an instance of <code>EntityRuler()</code> and passing it the current pipeline, <code>nlp</code>.</p>
<p>You can then call <code>add_patterns()</code> on the instance and pass it a list of dictionaries, each holding a <code>"label"</code> for the entity and the text <code>"pattern"</code> you'd like to tag with it.</p>
<p>Once you've set up a pattern, you can add the ruler to the <code>nlp</code> pipeline using <code>add_pipe()</code>.</p>
<p>Since Acme is a technology company, you decide to tag the pattern <code>"smartphone"</code> with the <code>"PRODUCT"</code> entity tag.</p>
<p><code>spaCy</code> has been imported and a <code>doc</code> already exists containing the transcribed text from <a href="" target="_blank" rel="noopener noreferrer"><code>call_4_channel_2.wav</code></a>.</p></div>

In [31]:
# Create a spaCy language model instance
nlp = spacy.load("en_core_web_sm")

Instructions
<ul>
<li>Import <code>EntityRuler</code> from <code>spacy.pipeline</code>.</li>
<li>Add <code>"smartphone"</code> as the value for the <code>"pattern"</code> key.</li>
<li>Add the <code>EntityRuler()</code> instance, <code>ruler</code>, to the <code>nlp</code> pipeline.</li>
<li>Print the entity attributes contained in <code>doc</code>.</li>
</ul>

In [32]:
# Import EntityRuler class
from spacy.pipeline import EntityRuler

# Create EntityRuler instance
ruler = EntityRuler(nlp)

# Define pattern for new entity
ruler.add_patterns([{"label": "PRODUCT", "pattern": "smartphone"}])

# Update existing pipeline
nlp.add_pipe(ruler, before="ner")

# Re-process the transcript with the updated pipeline
doc = nlp(call_4_channel_2_text_2)

# Test new entity
for entity in doc.ents:
  print(entity.text, entity.label_)

Ann PERSON
smartphone PRODUCT
Sydney GPE
315 CARDINAL


**With custom entities like this, you can start to get even more information out of your transcribed text. Depending on the problem you're working with, you may want to combine a few different patterns together.**
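
The cell above uses the spaCy v2 API. In spaCy v3, `add_pipe()` takes the registered component name (`"entity_ruler"`) instead of a component object, and it returns the new ruler so you can add patterns to it. Below is a minimal sketch, assuming spaCy v3 is installed. It uses a blank English pipeline so no model download is needed. With `en_core_web_sm` you would write `nlp.add_pipe("entity_ruler", before="ner")`. The sketch also combines two patterns: a phrase pattern for `"smartphone"` and a token pattern for the two-word spelling, which is the kind of variation transcribed speech tends to produce.

```python
import spacy

# A blank English pipeline is enough for rule-based matching (no trained model needed)
nlp = spacy.blank("en")

# spaCy v3: add the built-in ruler by its registered name; add_pipe returns the component
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    # Phrase pattern: matches the exact token text "smartphone"
    {"label": "PRODUCT", "pattern": "smartphone"},
    # Token pattern: matches "smart phone" written as two words, in any letter case
    {"label": "PRODUCT", "pattern": [{"LOWER": "smart"}, {"LOWER": "phone"}]},
])

doc = nlp("I bought a smartphone and my friend wants a Smart Phone too")
products = [(ent.text, ent.label_) for ent in doc.ents]
print(products)
```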

## Classifying transcribed speech with Sklearn

### Preparing audio files for text classification

<div class=""><p>Acme are very impressed with your work so far, so they've sent over two more folders of audio files.</p>
<p>One folder is called <code>pre_purchase</code> and contains audio snippets from customers who are pre-purchase, like <a href="https://assets.datacamp.com/production/repositories/4637/datasets/2acd3f72cd3753f200fae1479d7c06f2ea70cf7d/pre-purchase-audio-25.wav" target="_blank" rel="noopener noreferrer"><code>pre_purchase_audio_25.mp3</code></a>. </p>
<p>And the other is called <code>post_purchase</code> and contains audio snippets from customers who have made a purchase (post-purchase), like <a href="https://assets.datacamp.com/production/repositories/4637/datasets/30c755abc91782decd347c0b7c3b2c9ab86751a0/post-purchase-audio-27.wav" target="_blank" rel="noopener noreferrer"><code>post_purchase_audio_27.mp3</code></a>.</p>
<p>Upon inspecting the files, you find there are about 50 in each, all in <code>.mp3</code> format.</p>
<p>Acme want to know if you can build a classifier to classify future calls. You tell them you sure can.</p>
<p>So in this exercise, you'll go through each folder and convert the audio files to <code>.wav</code> format using <code>convert_to_wav()</code> so you can transcribe them.</p></div>

Instructions
<ul>
<li>Convert the files in <code>pre_purchase</code> to <code>.wav</code> using <code>convert_to_wav()</code>.</li>
<li>Convert the files in <code>post_purchase</code> to <code>.wav</code> using <code>convert_to_wav()</code>.</li>
</ul>

In [None]:
# Convert post purchase
for file in post_purchase:
    print(f"Converting {file} to .wav...")
    convert_to_wav(file)

# Convert pre purchase
for file in pre_purchase:
    print(f"Converting {file} to .wav...")
    convert_to_wav(file)

### Transcribing phone call excerpts

<div class=""><p>In this exercise, we'll use <code>transcribe_audio()</code> to transcribe the audio files we converted to <code>.wav</code> format into text.</p>
<p>Since there are many of them, and there could be more, we'll build a function, <code>create_text_list()</code>, which takes a list of audio filenames as input and transcribes each file in turn.</p>
<p><code>create_text_list()</code> uses the <code>transcribe_audio()</code> function we created earlier and returns a list of strings containing the transcribed text from each audio file.</p>
<p><code>pre_purchase_wav_files</code> and <code>post_purchase_wav_files</code> are lists of audio snippet filenames.</p></div>

Instructions 1/2
<ul>
<li>Use <code>transcribe_audio()</code> to transcribe the current <code>file</code> to text and add it to the text list.</li>
<li>Return the text list.</li>
</ul>

In [None]:
def create_text_list(folder):
  """Transcribe every .wav file in `folder` (a list of filenames) and return the texts."""
  # Create empty list
  text_list = []
  
  # Go through each file
  for file in folder:
    # Make sure the file is .wav
    if file.endswith(".wav"):
      print(f"Transcribing file: {file}...")
      
      # Transcribe audio and append text to list
      text_list.append(transcribe_audio(file))
  return text_list

Instructions 2/2
<ul>
<li>Use <code>create_text_list()</code> to transcribe all post and pre purchase audio snippets.</li>
<li>Check the first transcription of the post purchase text list.</li>
</ul>

In [None]:
# Transcribe post and pre purchase text
post_purchase_text = create_text_list(post_purchase_wav_files)
pre_purchase_text = create_text_list(pre_purchase_wav_files)

# Inspect the first transcription of post purchase
print(post_purchase_text[0])
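
`create_text_list()` stops at the first file that raises an exception. That means one unintelligible snippet can abort a run over roughly 100 files. Below is a minimal sketch of a more forgiving variant. The name `create_text_list_safe` and its parameters are hypothetical. It takes the transcriber as an argument, so it can be tried without audio. If `transcribe_audio()` wraps SpeechRecognition's `recognize_google()` (an assumption, since its body isn't shown in this section), you could pass `errors=(sr.UnknownValueError, sr.RequestError)`.

```python
def create_text_list_safe(files, transcribe, errors=(Exception,)):
    """Transcribe every .wav file in `files`, recording failures instead of stopping.

    `transcribe` is any function that takes a filename and returns text
    (e.g. the notebook's transcribe_audio). `errors` is a tuple of the
    exception types to catch and record.
    """
    texts, failures = [], []
    for file in files:
        # Skip anything that isn't a .wav file, as create_text_list() does
        if not file.endswith(".wav"):
            continue
        try:
            texts.append(transcribe(file))
        except errors as error:
            # Keep going; remember which file failed and why
            failures.append((file, error))
    return texts, failures

# Usage with a stand-in transcriber that fails on one file
def fake_transcribe(file):
    if file == "bad.wav":
        raise ValueError("could not understand audio")
    return f"text of {file}"

texts, failures = create_text_list_safe(["a.wav", "bad.wav", "notes.txt"], fake_transcribe)
print(texts)     # ['text of a.wav']
print(failures)
```

Failed files are dropped from `texts`, so the result no longer lines up one-to-one with the input filenames. Check `failures` before building the DataFrame.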

**We've now got two lists of transcribed audio snippets we can use to start building a text classifier. Let's organize our text data a little bit with a dataframe.**

### Organizing transcribed phone call data

<div class=""><p>We're almost ready to build a text classifier. But right now, all of our transcribed text data is in two lists, <code>pre_purchase_text</code> and <code>post_purchase_text</code>.</p>
<p>To organize it better for building a text classifier as well as for future use, we'll put it together into a pandas DataFrame.</p>
<p>To start, we'll import <code>pandas</code> as <code>pd</code>, then create a post-purchase DataFrame, <code>post_purchase_df</code>, using <code>pd.DataFrame()</code>.</p>
<p>We'll pass <code>pd.DataFrame()</code> a dictionary containing a <code>"label"</code> key with a value of <code>"post_purchase"</code> and a <code>"text"</code> key with a value of our <code>post_purchase_text</code> list.</p>
<p>We'll do the same for <code>pre_purchase_df</code> except with <code>pre_purchase_text</code>.</p>
<p>To have all the data in one place, we'll use <code>pd.concat()</code> and pass it the pre and post purchase DataFrames.</p></div>

In [35]:
post_purchase_text = ['hey man I just bought a product from you guys and I think is amazing but I leave a little help setting it up',
 'these clothes I just bought from you guys too small is there any way I can change the size',
 "I recently got these pair of shoes but they're too big can I change the size",
 "I bought a pair of pants from you guys but they're way too small",
 "I bought a pair of pants and they're the wrong colour is there any chance I can change that",
 "hey mate how you doing I'm just calling in regards the product that god it's faulty and doesn't work",
 "just wondering if there's any tutorials on how to set up my device I just received",
 "hey I'm just not happy with the product that you guys send me there any chance I can swap it out for another one",
 'I bought a pair of pants from you guys and they are just a bit too long do you guys do Hemi',
 'is there anybody that can help me set up this product or any how to use',
 "hey mate I just bought a product from you guys and I'm just unhappy with the pop the product can I return it",
 "just received the product from you guys and it didn't meet my expectations can I please get a refund",
 "what's the process I have to go through to send my product back for a swap",
 "hey mate how are you doing just wanting to know if there's any support I can get on this device how to set it up",
 "what's your refund policy on items that I've purchased from you guys",
 "hey how we doing I just put a cat from you guys and it's just the Wrong Colours is there any chance I can change that",
 "call me on to talk about a package I got yesterday it's I got it but I need to do I need some help with setting it up",
 "I got my order yesterday and the order number is 1863 3845 I'm just calling up to to check some more details on that",
 'I would have a couple of things from you guys the other day and two it two of them two of them and great and I love them but the other one is is not the right thing',
 "yeah hello I'm just wondering if I can speak to someone about an order I received yesterday",
 'wrong package delivered',
 "hey I ordered something yesterday and it arrived it arrived this morning but it seems like there's a few a few extra things in there that I didn't really order is there someone that I can talk to you to fix this up",
 "hey I bought something from your website the other day and it arrived but it's it's not the thing that I ordered is there someone I can talk to her to fix this up",
 "hello someone from your team delivered my package today but it's it's got a problem with it",
 "my shipment arrived this afternoon but it's wrong size is there anyone I can talk to you to change it",
 'I just bought a item from you guys and ID want to know if I can swap it for a different colour',
 "hey I received my order but it's the wrong size can I get a refund please",
 "hey my order arrived today but it's it's there's a it's I don't think it's the one that I ordered I check the receipt and it doesnt match what what a right",
 "hey I'm calling up to to see if I can talk to someone to help with her a shipment that I received yesterday",
 "I just received this device and I'd love some supported to be able to set it up",
 "I just bought a product from you guys and I wouldn't want to know if I can send it back to get a colour change",
 "I purchase something from your online store yesterday but the receipt didn't come through can can I get another receipt emailed please",
 'the product arrived and there was a few things in the box but two of them the wrong is there someone I can talk to about fixing up my order',
 "I'm just happy with the colour that I got from you guys so is there any chance I can change it for a different one",
 "a couple of days ago I got a message saying that my package have been delivered it wasn't delivered that day but it still hasn't arrived there someone I can talk to about my order",
 "my shipment arrived yesterday but it's not the right thing is there someone I can talk to you to fix it up",
 "my shipment arrived yesterday but it's not the right thing is there someone I can talk to you to fix it up",
 "my package was supposed to be delivered yesterday but it it didn't arrive is there someone I can talk to about my order",
 "my package was supposed to be delivered yesterday but it it didn't arrive is there someone I can talk to about my order",
 "I bought a hat from you guys and it's just too big is there anyway I can get it down size and what's your policies on that",
 'calling in regards to the order I just got would love some support',
 "my order a 64321 arrived this morning but it's something wrong with it is there someone I can talk to to fix it",
 "yeah hello someone this morning delivered a package but I think it's I think it's not the right one that I ordered is there someone I can talk to you too to change it",
 "on the box that you sent me yesterday arrived but it's damaged the someone I can talk to her about replacement",
 "I've just bought a product can you guys and I want to know what your return keys and Caesar",
 "my order a 64321 arrived this morning but it's something wrong with it is there someone I can talk to to fix it",
 "hey my name is Daniel I received my shipment yesterday but it's wrong can I change it",
 "all the things I received the my order yesterday would damaged I'm not sure what happened to delivery is there someone that can give me a hand",
 'the shipment I received is wrong',
 "yeah hey I need I need some help with her with an order that I ordered the other day it it came and it wasn't it wasn't correct",
 "yeah hello someone this morning delivered a package but I think it's I think it's not the right one that I ordered is there someone I can talk to you too to change it",
 'the shipment I received is wrong',
 "yeah hello I'm just wondering if I can speak to someone about an order I received yesterday",
 "my shipment arrived this afternoon but it's wrong size is there anyone I can talk to you to change it",
 "all the things I received the my order yesterday would damaged I'm not sure what happened to delivery is there someone that can give me a hand",
 'hey mate the must have been a problem with the shipping because the product I just received from you is damaged',
 "hey mate how you doing just calling in regards to the phone I just purchased from you guys faulty not working and now he's damaged on the way here"]


pre_purchase_text = ['yeah hi John just calling in regards to a recent order I just placed I found a cheaper product online and I was wondering if I could cancel that',
 "I was looking online it says that you're only size is available a large and small I was wondering if you'll have any mediums in soon",
 'hi I was just wondering if you have the extra large tea and blue',
 'yeah hey Steve just calling in regards to a recent order I just placed I was wondering if I could cancel that order',
 'hi I just ordered a new phone and I was just wondering if I could cancel out order and organise a refund',
 'hi I just ordered a new t-shirt and I was wondering if I could cancel an order and organise a refund',
 'accidentally made some errors and order I recently just placed I was wondering if you could help me',
 "I just placed an order online and I was just wondering when I'll get my confirmation email",
 "hey mate I just finished paying for my order and I was just wondering when I'm going to get that email to confirm it",
 'hey I was wondering if you know where my new phone is that I just recently ordered',
 'do you currently offer any new promotions at the moment',
 "hi I just pre-ordered the nudity and this is my order number but doctor I was just wondering if you know where abouts it isn't shipment",
 'your hi Jacob looking to make an order but just have a few questions regarding some products that you have online',
 'hi I just recently placed an order with your company I was just wondering if you know the status of my shipment',
 "Archie thank god I'm free been on hold for the last 30 minutes yeah got a couple of complaints made about this order I just posted",
 "hi just calling in regards to my order on November the 3rd I was just wondering when that's going to leave your office",
 "just looking to get some more information on the current promotions you're offering right now before I place my order",
 "hi I recently ordered a new phone and I'm just wondering where I could find my reference number for the delivery",
 'hey mate just looking to make some alterations to my order I just placed',
 'hey just looking to place this order but I see that you have a promotion still running can you give me some more details behind this promotion',
 "hi I placed an order a couple days ago and I was just wondering why my tracking number isn't working",
 'hi I just realised I ordered the wrong computer I was wondering if I could just cancel that and organise a refund',
 "yeah I just placed an all this you guys and I was wondering if I could change a few things before it's shift out",
 "how's it going after I just placed an order with you guys and I accidentally sent it to the wrong address can you please help me change this",
 "hey Polly just looking to place an order but before I proceed I'm just wondering if this offer still stands",
 'yeah hi Tommy I just placed an order with you guys but I use the wrong payment processing method I was wondering if I could change that',
 'hi Michael just looking to enquire about a few things before I placed an order I was wondering if you could help me',
 'hi I saw your new phone on your website I was wondering if you have any setup tips for',
 "I just ordered the new remote control car off you website I was just didn't see how many horsepower it has can you tell me",
 'hi just about to order these shoes online I was just wondering if you have any different sizes in store',
 'I just placed an order and I was wondering if I could change my shipping time from standard business days to rush if possible',
 'hey I just ordered the new phone and I was wondering if I could get airpods put into that order just before you guys send it',
 'hi Jacob I just placed an order with you guys but I found the same product online it and other store for a cheaper price I was wondering if you could price match it or could I cancel this order',
 'it says here you have the iPhone x l and X I was wondering if you still stock the iPhone 10',
 'hey I was just looking online at your shoes and I was wondering if you have this brand in Pink',
 'I just placed an order I was wondering how long shipping time would be expected to be',
 "hey mate just have a few questions regarding the recent order I just posted it shows that it's coming from overseas however when I looked at the Australian soccer shop online it says that there's current stock in store for the Australian store",
 'hi I just ordered some shoes and I was just wondering if I could cancel that order and make a refund',
 'hey I just ordered the blue and yellow shoes off your website and I was wondering if I could cancel that order and organise a refund',
 'hey so I just placed an order with your company and I was just wondering where I can find my reference number',
 'hey I was just wondering about the sizing on your shirts it says us as how does that relate to AUD',
 "hi Tony I just placed an order I'm currently having a few problems I was wondering if you could help me",
 'yeah hi David I just placed an order online and I was wondering if I could make an alteration to that order before you send it off',
 'hi I was just looking at finding a new phone I was wondering if you could recommend anything to me',
 'I I just ordered the green and blue shoes off your website and I was wondering if I could add a shirt to my order before you send it']

Instructions
<ul>
<li>Create <code>post_purchase_df</code> using the <code>post_purchase_text</code> list.</li>
<li>Create <code>pre_purchase_df</code> using the <code>pre_purchase_text</code> list.</li>
<li>Combine the two DataFrames using <code>pd.concat()</code>.</li>
</ul>

In [37]:
import pandas as pd

# Make dataframes with the text
post_purchase_df = pd.DataFrame({"label": "post_purchase",
                                 "text": post_purchase_text})
pre_purchase_df = pd.DataFrame({"label": "pre_purchase",
                                "text": pre_purchase_text})

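# Combine DataFrames (ignore_index gives the rows a fresh 0..n-1 index instead of repeating each frame's own index)
df = pd.concat([post_purchase_df, pre_purchase_df], ignore_index=True)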

# Print the combined DataFrame
df.head()

|   | label | text |
|---|-------|------|
| 0 | post_purchase | hey man I just bought a product from you guys ... |
| 1 | post_purchase | these clothes I just bought from you guys too ... |
| 2 | post_purchase | I recently got these pair of shoes but they're... |
| 3 | post_purchase | I bought a pair of pants from you guys but the... |
| 4 | post_purchase | I bought a pair of pants and they're the wrong... |


### Create a spoken language text classifier

<div class=""><p>Now that you've transcribed some customer call audio data, we'll build a model to classify whether the text from each customer call is <code>pre_purchase</code> or <code>post_purchase</code>.</p>
<p>We've got 45 examples of <code>pre_purchase</code> calls and 57 examples of <code>post_purchase</code> calls.</p>
<p>The data the model will train on is stored in <code>train_df</code> and the data the model will predict on is stored in <code>test_df</code>.</p>
<p>Try printing the <code>.head()</code> of each of these to the console.</p>
<p>We'll build an <code>sklearn pipeline</code> using <code>CountVectorizer()</code> and <code>TfidfTransformer()</code> to convert our text samples to numbers and then use a <code>MultinomialNB()</code> classifier to learn what category each sample belongs to.</p>
<p>This model works well on our small example, but for larger amounts of text you may want to consider something more sophisticated.</p></div>

In [51]:
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split

In [64]:
train_df, test_df = train_test_split(df, test_size=0.20)
train_df.shape, test_df.shape

((81, 2), (21, 2))
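
The `train_test_split` call above uses neither a fixed seed nor stratification, so each run gives a different split and a different accuracy. Below is a sketch of a reproducible, stratified split. `split_calls` and the toy DataFrame are hypothetical stand-ins that only mimic the real class sizes (57 post-purchase, 45 pre-purchase). The text column is placeholder.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def split_calls(df, test_size=0.2, seed=42):
    """Split a labelled DataFrame so both parts keep the label proportions of `df`."""
    return train_test_split(
        df,
        test_size=test_size,
        stratify=df["label"],  # keep the pre/post purchase ratio in both splits
        random_state=seed,     # fixed seed so the split (and the accuracy) is reproducible
    )

# Toy stand-in with the same class sizes as the real data; the text is placeholder only
toy_df = pd.DataFrame({
    "label": ["post_purchase"] * 57 + ["pre_purchase"] * 45,
    "text": [f"call {i}" for i in range(102)],
})
train_df, test_df = split_calls(toy_df)
print(train_df.shape, test_df.shape)
print(test_df["label"].value_counts())
```

With 102 rows and `test_size=0.2`, this still yields 81 training and 21 test rows. The difference is that the test set now keeps roughly the same post/pre ratio as the full data.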

Instructions 1/2
<ul>
<li>Create <code>text_classifier</code> using <code>CountVectorizer()</code>, <code>TfidfTransformer()</code>, and <code>MultinomialNB()</code>.</li>
<li>Fit <code>text_classifier</code> on <code>train_df.text</code> and <code>train_df.label</code>.</li>
</ul>

In [65]:
# Build the text_classifier as an sklearn pipeline
text_classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('classifier', MultinomialNB()),
])

# Fit the classifier pipeline on the training data
text_classifier.fit(train_df.text, train_df.label)

Pipeline(memory=None,
         steps=[('vectorizer',
                 CountVectorizer(analyzer='word', binary=False,
                                 decode_error='strict',
                                 dtype=<class 'numpy.int64'>, encoding='utf-8',
                                 input='content', lowercase=True, max_df=1.0,
                                 max_features=None, min_df=1,
                                 ngram_range=(1, 1), preprocessor=None,
                                 stop_words=None, strip_accents=None,
                                 token_pattern='(?u)\\b\\w\\w+\\b',
                                 tokenizer=None, vocabulary=None)),
                ('tfidf',
                 TfidfTransformer(norm='l2', smooth_idf=True,
                                  sublinear_tf=False, use_idf=True)),
                ('classifier',
                 MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True))],
         verbose=False)
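
scikit-learn documents `TfidfVectorizer` as equivalent to `CountVectorizer` followed by `TfidfTransformer`. Below is a quick check on three of the transcribed calls, followed by the shorter two-step pipeline this allows. The name `compact_classifier` is hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

texts = [
    "I just placed an order online and I was just wondering when I'll get my confirmation email",
    "the shipment I received is wrong",
    "do you currently offer any new promotions at the moment",
]

# Two steps: raw token counts, then TF-IDF weighting of those counts
two_step = TfidfTransformer().fit_transform(CountVectorizer().fit_transform(texts))

# One step: TfidfVectorizer does both, with the same default settings
one_step = TfidfVectorizer().fit_transform(texts)

same = np.allclose(two_step.toarray(), one_step.toarray())
print(same)  # True

# The same classifier as text_classifier, written with one fewer step
compact_classifier = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("classifier", MultinomialNB()),
])
```

Keeping the separate `CountVectorizer` step is still useful if you want to inspect the raw counts, for example through `text_classifier.named_steps["vectorizer"]`.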

Instructions 2/2
<ul>
<li>Create <code>predicted</code> by calling <code>predict()</code> on <code>text_classifier</code> and passing it the text column of <code>test_df</code>.</li>
<li>Evaluate the model by seeing how <code>predicted</code> compares to the <code>test_df.label</code>.</li>
</ul>

In [66]:
# Evaluate the MultinomialNB model
predicted = text_classifier.predict(test_df.text)
accuracy = 100 * np.mean(predicted == test_df.label)
print(f'The model is {accuracy}% accurate')

The model is 95.23809523809523% accurate
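
Accuracy alone hides which class the errors fall in. scikit-learn's `accuracy_score`, `confusion_matrix` and `classification_report` give the same accuracy plus a per-class breakdown. The labels below are toy values for illustration. In the notebook you would pass `test_df.label` and `predicted`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Toy labels for illustration only
y_true = ["post_purchase", "post_purchase", "pre_purchase", "pre_purchase"]
y_pred = ["post_purchase", "pre_purchase", "pre_purchase", "pre_purchase"]

# accuracy_score is the same fraction as the notebook's np.mean(predicted == labels)
accuracy = accuracy_score(y_true, y_pred)
print(accuracy, np.mean(np.array(y_true) == np.array(y_pred)))  # 0.75 0.75

labels = ["post_purchase", "pre_purchase"]
# Rows are true labels, columns are predicted labels, in the order given by `labels`
matrix = confusion_matrix(y_true, y_pred, labels=labels)
print(matrix)
print(classification_report(y_true, y_pred, labels=labels))
```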


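**The model correctly classified 20 of the 21 test examples (about 95%). Because `train_test_split` was called without a fixed `random_state`, a different split will give a somewhat different score, and with only 21 test examples each mistake moves accuracy by almost 5 points. For larger datasets, our pipeline is a good baseline, but you might want to look into something like a language model. Now you can start capturing speech, converting it to text and classifying it into different categories.**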