---

# **Importing Libraries**

- This section imports all the libraries used to perform the Speech-to-Text operations.

---

In [1]:
import os
import time
from google.cloud import speech

In [2]:
# Authenticating with Google Cloud using a service account key file
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'client_secret.json'
speech_client = speech.SpeechClient()
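
If the key file path is wrong, the failure only surfaces when the client is created or first used. The sketch below is one possible way to fail earlier; `set_credentials` is a hypothetical helper, not part of `google-cloud-speech`, and it only checks that the file exists, not that the key is valid.

```python
import os


def set_credentials(key_path):
    """Point GOOGLE_APPLICATION_CREDENTIALS at key_path.

    Hypothetical helper: raises FileNotFoundError if the file is missing,
    but does not validate the contents of the key file.
    """
    if not os.path.isfile(key_path):
        raise FileNotFoundError(f"Service account key file not found: {key_path}")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.abspath(key_path)
    return os.environ["GOOGLE_APPLICATION_CREDENTIALS"]
```

Calling `set_credentials('client_secret.json')` before `speech.SpeechClient()` would replace the direct `os.environ` assignment.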

---

# **Transcribing a Local Media File**

- Make sure that the file is smaller than 10 MB and shorter than 1 minute.

- The code cell below (In [3]) transcribes a 222 KB audio file that is 12 seconds long.

---
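
The size and length limits above can be checked before a request is sent. This is a minimal sketch: the constants and both helper names are assumptions made for illustration, the thresholds simply mirror the limits stated above, and the audio duration must be supplied by the caller because the standard library cannot read it from an MP3 file.

```python
import os

# Thresholds mirror the limits stated in the text; treat them as assumptions.
MAX_SYNC_BYTES = 10 * 1024 * 1024  # 10 MB
MAX_SYNC_SECONDS = 60              # 1 minute


def fits_sync_limits(size_bytes, duration_seconds):
    """Return True if the audio is small and short enough for recognize()."""
    return size_bytes < MAX_SYNC_BYTES and duration_seconds < MAX_SYNC_SECONDS


def file_fits_sync_limits(path, duration_seconds):
    """Same check, reading the size from a file on disk."""
    return fits_sync_limits(os.path.getsize(path), duration_seconds)


# The two files used in this notebook: 222 KB / 12 s and 28.9 MB / 48 min.
print(fits_sync_limits(222 * 1024, 12))                    # True
print(fits_sync_limits(int(28.9 * 1024 * 1024), 48 * 60))  # False
```

A file that fails this check is a candidate for the Cloud Storage route instead.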

In [3]:
start_time = time.time()
# Locating media file
media_file_mp3 = 'audios/local audio.mp3'

# Encoding media file to RecognitionAudio
with open(file=media_file_mp3, mode='rb') as mp3_file:
    byte_data_mp3 = mp3_file.read()
mp3_audio = speech.RecognitionAudio(content=byte_data_mp3)

# Configuring the recognition request for the media file
config_mp3 = speech.RecognitionConfig(sample_rate_hertz=48000, 
                                      enable_automatic_punctuation=True, 
                                      language_code='en-US', 
                                      model='default')

# Transcribing the media file
trans_mp3 = speech_client.recognize(audio=mp3_audio, 
                                    config=config_mp3)
stop_time = time.time()
print('Total Execution time', stop_time - start_time)
print(trans_mp3)

Total Execution time 4.986997127532959
results {
  alternatives {
    transcript: "Hi, my name is Mukesh and I am an associate in the domain of science."
    confidence: 0.8742868900299072
  }
}
total_billed_time {
  seconds: 15
}
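
Printing the whole response is useful for inspection, but usually only the transcript text and its confidence are needed. The sketch below assumes each result exposes an `alternatives` list whose items have `transcript` and `confidence` attributes, as in the output above. `collect_transcripts` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins with made-up values so that the example runs without calling the API.

```python
from types import SimpleNamespace


def collect_transcripts(results):
    """Return (transcript, confidence) pairs for the top alternative of each result.

    Results with no alternatives or with an empty transcript are skipped.
    """
    pairs = []
    for result in results:
        if not result.alternatives:
            continue
        best = result.alternatives[0]
        text = best.transcript.strip()
        if text:
            pairs.append((text, best.confidence))
    return pairs


# Stand-in objects that mimic the shape of response.results (hypothetical data).
fake_results = [
    SimpleNamespace(alternatives=[SimpleNamespace(transcript="", confidence=0.0)]),
    SimpleNamespace(alternatives=[SimpleNamespace(transcript=" Hello there.", confidence=0.9)]),
    SimpleNamespace(alternatives=[]),
]

pairs = collect_transcripts(fake_results)
print(pairs)  # [('Hello there.', 0.9)]
```

With a real response, the call would be `collect_transcripts(trans_mp3.results)`.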



---

# **Transcribing a Cloud Storage Media File**

- If the file is larger than 10 MB or longer than 1 minute, upload it to Google Cloud Storage and transcribe it from there.

- The code cell below (In [4]) transcribes a 28.9 MB audio file that is 48 minutes long.

---

In [4]:
start_time = time.time()
# Locating gs media file
media_uri = 'gs://speech-to-text-files-collection/Big Audio.mp3'
long_media_mp3 = speech.RecognitionAudio(uri=media_uri)

# Configuring the recognition request for the gs media file
config_mp3_enhanced = speech.RecognitionConfig(sample_rate_hertz=48000, 
                                               enable_automatic_punctuation=True, 
                                               language_code='en-US', 
                                               use_enhanced=True, 
                                               model='video')

# Transcribing the gs media file
operation = speech_client.long_running_recognize(config=config_mp3_enhanced, 
                                                 audio=long_media_mp3)

# Wait for the operation to finish and fetch the transcription response
response = operation.result()
stop_time = time.time()
print('Total Execution time', stop_time - start_time)

# Display the results
for result in response.results:
    print(result.alternatives[0].transcript)
    print(result.alternatives[0].confidence)

Total Execution time 416.6010513305664

0.0
How are you? How's everyone doing?
0.8538639545440674

0.0

0.0

0.0

0.0

0.0

0.0
 So let me quickly just keep my phone on deep silence and then we can get started. Yeah, so I'm suggesting same to everyone. Keep your phones on deep silence.
0.9128383994102478

0.0
 Unless you expect an emergency call.
0.91007399559021
 Great, so welcome back to data science and we have almost neared, our end, not the end of Osment, the end of at least the term. One, two, three, we started long back by introducing data science. And then we moved on to basics of python. We talked about statistics and in-depth. We then talked about numpy Wanda's matplotlib. Seabone various types of visualization.
0.8247197866439819
 And ultimately, we talked about what is the very basic psyche, behind exploratory data, analysis. And how do we construct a story out of data in yesterday's session in last session? We finally started implementing our ideas and we start with the ve