<h2>Using SAS DLPy to Convert Speech to Text</h2>

This example uses SAS DLPy to convert speech (audio files) to text. 

The example first configures the environment. It then creates a `Speech` object and loads the pre-trained acoustic and language models. Finally, it applies the models to the audio and outputs the predicted text.

The example assumes that you're using the pre-trained acoustic and language models published for SAS Viya 3.4, which you can download from here: https://support.sas.com/documentation/prod-p/vdmml/zip/speech_19w21.zip.

**Note: The current models only support audio files in WAV format with 8-bit, 16-bit, 24-bit or 32-bit sampling encoding.**
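Before transcribing, it can help to confirm that a file meets the format requirement above. The following is a minimal sketch that uses Python's standard `wave` module, which reads uncompressed PCM WAV files only. The function name `is_supported_wav` is hypothetical and is not part of DLPy.

```python
import wave

# Sample widths in bytes that correspond to 8-, 16-, 24- and 32-bit encoding.
SUPPORTED_SAMPLE_WIDTHS = {1, 2, 3, 4}


def is_supported_wav(path):
    """Return True if `path` is a readable WAV file with a supported bit depth."""
    try:
        with wave.open(path, "rb") as wav_file:
            return wav_file.getsampwidth() in SUPPORTED_SAMPLE_WIDTHS
    except (wave.Error, EOFError, OSError):
        # Not a WAV file the standard library can parse, or not readable at all.
        return False
```

For example, `is_supported_wav("sample_1.wav")` returns `True` only if the file can be opened as a PCM WAV file with one of the listed bit depths.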

<h3>Configure the Environment<a name="configureIt"></a></h3>

Begin by importing `SWAT`. SWAT (specifically python-swat) is a Python interface to SAS CAS. For more information about starting a CAS session with the SWAT package, see https://sassoftware.github.io/python-swat/getting-started.html.

In [1]:
from swat import CAS

Next, import `dlpy.speech`.

In [2]:
import dlpy.speech

After configuring your environment and loading required libraries, connect to your CAS server. You will need a host name and port number for this step.

In [3]:
# cashost and casport are placeholders for your CAS server's host name and port number.
s = CAS(cashost, casport)
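The connection step does not say where the host name and port number come from. One possible approach, sketched below, is to read them from environment variables so that they are not hard-coded in the notebook. The helper `read_cas_settings` and the variable names `CASHOST` and `CASPORT` are assumptions made for this sketch.

```python
import os


def read_cas_settings(environ=os.environ):
    """Read the CAS host name and port number from environment variables.

    CASHOST and CASPORT are variable names chosen for this sketch.
    """
    host = environ.get("CASHOST")
    port = environ.get("CASPORT")
    if not host or not port:
        raise ValueError("Set both CASHOST and CASPORT before connecting.")
    return host, int(port)


# Usage, assuming the variables are set and `swat` is installed:
#   from swat import CAS
#   cashost, casport = read_cas_settings()
#   s = CAS(cashost, casport)
```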

<h3>Create a Speech Object</h3>

First, create a `Speech` object.

We assume that the audio you want to transcribe is stored on your local disk, so the audio file must be copied to a location that both the Python client and the CAS server can access. In addition, if the audio is relatively long (longer than 10 seconds), it may need to be split into multiple segments, which are then saved to that location.
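To make the segmentation idea concrete, here is a minimal sketch that splits a WAV file into consecutive pieces of at most 10 seconds each using the standard `wave` module. It is a plain fixed-length split for illustration only. DLPy's own segmentation may choose its cut points differently, and the function name `split_wav` is hypothetical.

```python
import wave


def split_wav(src_path, dst_prefix, max_seconds=10):
    """Split a WAV file into consecutive pieces of at most `max_seconds` each.

    Returns the list of paths written. This is a fixed-length split for
    illustration; it is not DLPy's own segmentation routine.
    """
    written = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_per_piece = int(src.getframerate() * max_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_piece)
            if not frames:
                break
            dst_path = "{}_{}.wav".format(dst_prefix, index)
            with wave.open(dst_path, "wb") as dst:
                # Copy channel count, sample width and frame rate from the source.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(dst_path)
            index += 1
    return written
```

With this sketch, a 25-second recording yields three pieces of 10, 10, and 5 seconds.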

The `data_path` parameter specifies the absolute server-side path of the folder that stores the temporary audio files (the copied audio and any segments). These files are removed after they are processed successfully, and a note is printed each time temporary files are created or removed on your disk. The `local_path` parameter specifies the path of the **SAME FOLDER** as seen from the client. Note that if the Python client and the CAS server run on different operating systems (Windows/Linux), the server-side and client-side paths to this folder will differ.

<p style="background-color:#FE9595;border:thick solid #000000;text-align:center">
<b>IMPORTANT NOTE</b><br />
    For this release, data_path and local_path must point to the <b>same</b> folder! The server must have <b>read</b> access to this folder and the client must have <b>read and write</b> access to this folder! This is because, in this release, segmentation is done locally in Python on the client side, so the client creates the temporary files.<br /> <br />
    In future releases, segmentation will be built into CAS and this limitation will not exist.
</p>
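The client-side half of the access requirement above can be checked before creating the `Speech` object. The sketch below uses only the standard library, and the helper name `check_client_folder` is hypothetical. It cannot verify that the CAS server has read access, which must be confirmed on the server itself.

```python
import os


def check_client_folder(local_path):
    """Return a list of problems that would stop the client from using the folder."""
    problems = []
    if not os.path.isdir(local_path):
        problems.append("not an existing directory")
        return problems
    if not os.access(local_path, os.R_OK):
        problems.append("client cannot read the folder")
    if not os.access(local_path, os.W_OK):
        problems.append("client cannot write to the folder")
    return problems
```

An empty list means that the client can read from and write to the folder.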

In [4]:
spch2txt = dlpy.speech.Speech(s,
                              data_path="/path/to/shared/folder/on/server",
                              local_path="\\\\path\\to\\shared\\folder\\on\\client",
                              acoustic_model_path="/path/to/acoustic_model_cpu.sashdat",
                              language_model_path="/path/to/language_model.sashdat")

NOTE: Model table is loaded successfully!
NOTE: Model is renamed to "asr" according to the model name in the table.
NOTE: acoustic_model_cpu_weights.sashdat is used as model weigths.
NOTE: Model weights attached successfully!
NOTE: acoustic_model_cpu_weights_attr.sashdat is used as weigths attribute.
NOTE: Model attributes attached successfully!


You can load our pre-trained models at the initialization step, as shown above. Alternatively, you can load the models later by calling the `load_acoustic_model` and `load_language_model` methods of the `Speech` object. If an acoustic or language model is already loaded, the method replaces the old model with the new one.
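Deferred loading could look like the sketch below. The wrapper `load_models` is hypothetical, and it assumes that each loader method takes the server-side model path as its only argument. Check the DLPy documentation for the exact signatures in your version.

```python
def load_models(speech, acoustic_model_path=None, language_model_path=None):
    """Load or replace models on an existing Speech object.

    Assumes each loader takes the server-side model path as its only argument.
    """
    if acoustic_model_path is not None:
        speech.load_acoustic_model(acoustic_model_path)
    if language_model_path is not None:
        speech.load_language_model(language_model_path)
    return speech
```

For example, `load_models(spch2txt, language_model_path="/path/to/language_model.sashdat")` would replace only the language model and leave the acoustic model untouched.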

<h3>Convert Speech To Text</h3>

Now, you can use the `transcribe()` method of the `Speech` object to convert speech to text.

In [5]:
import IPython.display as ipd
ipd.Audio("sample_1.wav")

In [6]:
spch2txt.transcribe("sample_1.wav")

Note: 1 temporary audio files are created.
NOTE: Due to data distribution, miniBatchSize has been limited to 1.
Note: all temporary files are removed.


'NEW FEDERAL RULES REQUIRE THRIFTS TO GRADUALLY RAISE THAT LEVEL DEPENDING ON THE PROFITABILITY OF THE INDUSTRY'

In [7]:
ipd.Audio("sample_2.wav")

In [8]:
spch2txt.transcribe("sample_2.wav")

Note: 4 temporary audio files are created.
NOTE: Due to data distribution, miniBatchSize has been limited to 1.
Note: all temporary files are removed.


'THERE WAS INFINITE SKEPTICISM AROUND HIM ON THE SUBJECT AND WHILE OTHER INVENTORS WERE ALSO GIVING THE SUBJECT BERTHA THE PUBLIC TOOK IT FOR GRANTED THAT ANYTHING SO UTTERLY INTANGIBLE AS ELECTRICITY THAT COULD NOT BE SEEN OR WEIGHED AND ONLY GAVE SECONDARY EVIDENCE OF ITSELF AT THE EXACT POINTED USE COULD NOT BE BROUGHT TO ACCURATE REGISTRATION'
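The two calls above transcribe one file at a time. To process several files in one pass, a small loop is enough. The helper `transcribe_all` below is a hypothetical convenience wrapper, not part of DLPy. It relies only on the `transcribe(path)` call shown in this example.

```python
def transcribe_all(speech, audio_paths):
    """Transcribe several audio files and return {path: text}.

    `speech` is any object with a `transcribe(path)` method, such as the
    `Speech` object created earlier.
    """
    results = {}
    for path in audio_paths:
        results[path] = speech.transcribe(path)
    return results
```

For example, `transcribe_all(spch2txt, ["sample_1.wav", "sample_2.wav"])` would return a dictionary that maps each file name to its transcript.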