<h2>Using SAS DLPy to Convert Speech to Text</h2>

This example uses SAS DLPy to convert speech to text. 

The example begins by configuring the environment. It then creates a SpeechToText object, loads the pretrained acoustic model and language model, and applies the models to convert speech to text.

The example assumes that you are using the pretrained acoustic and language models published for SAS Viya 3.4, which you can download from https://support.sas.com/documentation/prod-p/vdmml/zip/speech_19w21.zip. The example also assumes that the input audio files are in WAV format, encoded with 8-bit, 16-bit, 24-bit, or 32-bit samples.
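Because the example expects WAV input at one of these bit depths, it can help to check a file before sending it to the server. The following is a minimal sketch that uses only Python's standard `wave` module. The helper names `wav_bit_depth` and `is_supported_wav` are hypothetical and are not part of DLPy. The `wave` module reads only PCM-encoded WAV data, so this check rejects other encodings (for example, floating-point samples) regardless of whether the speech API would accept them.

```python
import wave

# Bit depths that this example assumes for its input audio.
SUPPORTED_BIT_DEPTHS = {8, 16, 24, 32}


def wav_bit_depth(source):
    """Return the number of bits per sample of a PCM WAV file.

    `source` can be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav_file:
        # getsampwidth() reports the sample width in bytes.
        return wav_file.getsampwidth() * 8


def is_supported_wav(source):
    """Return True if `source` is a readable PCM WAV file with a supported bit depth."""
    try:
        return wav_bit_depth(source) in SUPPORTED_BIT_DEPTHS
    except (wave.Error, EOFError):
        # Not a WAV file, truncated, or not PCM-encoded.
        return False
```

For example, `is_supported_wav("sample_1.wav")` could be called on the client before the file is transcribed.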

<h3>Configure the Environment<a name="configureIt"></a></h3>

Begin by importing `SWAT`. SWAT is a Python interface to SAS CAS. For more information about starting a CAS session with the SWAT package, see https://sassoftware.github.io/python-swat/getting-started.html.

In [1]:
from swat import *

Next, import the `speech` module from DLPy.

In [2]:
from dlpy import speech

After configuring your environment and loading required libraries, connect to your CAS server. You will need a host name and port number for this step.

In [3]:
# Set cashost and casport to the host name and port number of your CAS server.
s = CAS(cashost, casport)

In [4]:
s.table.addcaslib(datasource = {"srcType": "path"},
                  name = "caslib_speech",
                  path = "/dept/cas/xixche/data",
                  subdirectories = True)

NOTE: 'caslib_speech' is now the active caslib.
NOTE: Cloud Analytic Services added the caslib 'caslib_speech'.


Unnamed: 0,Name,Type,Description,Path,Definition,Subdirs,Local,Active,Personal,Hidden,Transient
0,caslib_speech,PATH,,/dept/cas/xixche/data/,,1.0,1.0,1.0,0.0,0.0,0.0


The output above shows that the caslib `caslib_speech` has been created. The folder listed in the `Path` column contains the downloaded acoustic model and language model files. Adding the caslib gives the CAS session access to those files on the server.

<h3>Create a SpeechToText Object</h3>

This step creates a `speech.SpeechToText` object with the current CAS connection. 

When an input audio file is long, it might need to be segmented into several shorter pieces first. The segmented audio files are saved temporarily on disk and removed once the API finishes running. The location that stores these audio files must be accessible to both the CAS server and the Python client. 

Therefore, two paths are used to specify where the files are stored: `data_path_after_caslib` and `local_path`. Both point to the same location. The value of `data_path_after_caslib` follows the path conventions of the CAS server operating system and must be relative to the caslib specified by `data_caslib`, while `local_path` follows the path conventions of the Python client operating system and can be either relative or absolute.  
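To make the segment-then-clean-up idea concrete, here is a minimal sketch that uses only Python's standard library. It is not the DLPy implementation. The function names `split_wav` and `with_temporary_segments`, the 30-second default, and the segment file naming are all assumptions made for illustration.

```python
import os
import shutil
import tempfile
import wave


def split_wav(input_path, output_dir, segment_seconds=30.0):
    """Split a PCM WAV file into consecutive segments of at most `segment_seconds`.

    Returns the list of segment file paths in playback order.
    """
    segment_paths = []
    with wave.open(input_path, "rb") as source:
        params = source.getparams()
        frames_per_segment = max(1, int(segment_seconds * source.getframerate()))
        index = 0
        while True:
            frames = source.readframes(frames_per_segment)
            if not frames:
                break
            segment_path = os.path.join(output_dir, f"segment_{index:04d}.wav")
            with wave.open(segment_path, "wb") as segment:
                # Copy channels, sample width, and frame rate from the source;
                # the frame count in the header is corrected when the file closes.
                segment.setparams(params)
                segment.writeframes(frames)
            segment_paths.append(segment_path)
            index += 1
    return segment_paths


def with_temporary_segments(input_path, process_segment, segment_seconds=30.0):
    """Segment a WAV file into a temporary folder, process each piece, then clean up."""
    work_dir = tempfile.mkdtemp(prefix="segments_")
    try:
        paths = split_wav(input_path, work_dir, segment_seconds)
        return [process_segment(path) for path in paths]
    finally:
        # Remove the temporary segments whether or not processing succeeded.
        shutil.rmtree(work_dir, ignore_errors=True)
```

In this sketch the temporary folder lives on the client only. In the setting described above, the equivalent folder would have to be one that both the CAS server and the Python client can reach.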

In [5]:
spch2txt = speech.SpeechToText(s, 
                               data_path_after_caslib             = "tmp/", 
                               local_path                         = "\\\\sashq\\root\\dept\\cas\\xixche\\data\\tmp", 
                               data_caslib                        = "caslib_speech", 
                               acoustic_model_path                = "acoustic_model_cpu.sashdat",
                               acoustic_model_caslib              = "caslib_speech",
                               acoustic_model_weights_path        = "acoustic_model_cpu_weights.sashdat",
                               acoustic_model_weights_caslib      = "caslib_speech",
                               acoustic_model_weights_attr_path   = "acoustic_model_cpu_weights_attr.sashdat",
                               acoustic_model_weights_attr_caslib = "caslib_speech", 
                               language_model_path                = "language_model.sashdat",
                               language_model_caslib              = "caslib_speech")

You can load the pretrained models at initialization, as shown above. Alternatively, you can create the SpeechToText object first and then load the models one at a time by using the `load_acoustic_model` and `load_language_model` methods in `dlpy.speech`. If an acoustic or language model is already loaded, the method replaces the old model with the new one.

<h3>Convert Speech to Text</h3>

Now you can use the `transcribe` method of the `SpeechToText` object to convert speech to text.

In [6]:
spch2txt.transcribe("sample_1.wav")

'NEW FEDERAL RULES REQUIRE THRIFTS TO GRADUALLY RAISE THAT LEVEL DEPENDING ON THE PROFITABILITY OF THE INDUSTRY'

In [7]:
spch2txt.transcribe("sample_2.wav")

'THEY CAN BE HILARIOUSLY FUNNY AS WELL AS HIGHLY EFFECTIVE TOOLS FOR SATIRE'
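When there are several files to transcribe, the calls above can be wrapped in a small loop. The following is a sketch only: `transcribe_all` is a hypothetical helper, not part of DLPy, and it assumes nothing about `transcribe` beyond what the cells above show (it takes a file name and returns a string). Collecting failures per file instead of stopping at the first one is a design choice made for this illustration.

```python
def transcribe_all(transcribe, filenames):
    """Apply a transcribe callable to each file name.

    Returns two dictionaries keyed by file name: the transcripts that
    succeeded and the error messages for the files that failed.
    """
    transcripts = {}
    errors = {}
    for name in filenames:
        try:
            transcripts[name] = transcribe(name)
        except Exception as exc:
            # Record the failure and continue with the remaining files.
            errors[name] = str(exc)
    return transcripts, errors


# Hypothetical usage with the object created earlier:
# transcripts, errors = transcribe_all(spch2txt.transcribe,
#                                      ["sample_1.wav", "sample_2.wav"])
```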