
# Translation project: use of Google Cloud Speech API Client Library for Python
In this notebook, we demonstrate how to transcribe an audio recording. Before using the Speech API, we recommend reading the basics guide:
https://cloud.google.com/speech/docs/basics

The guide begins with:
> This document is a guide to the basics of using the Google Cloud Speech API. This conceptual guide covers the types of requests you can make to the Speech API, how to construct those requests, and how to handle their responses. 

The Speech API has three main methods to perform speech recognition: 
- **Asynchronous Recognition** initiates a Long Running Operation, which you can poll periodically for recognition results. Use asynchronous requests for audio data of any duration up to 180 minutes.
- **Synchronous Recognition** requests are limited to audio data of 1 minute or less in duration.
- **Streaming Recognition** requests are designed for real-time recognition, such as capturing live audio from a microphone. Streaming recognition provides interim results while audio is being captured, so that results can appear, for example, while a user is still speaking.
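
The duration limits above suggest a simple rule for choosing a method. The following is a minimal sketch and not part of the Speech API: the function name `choose_recognition_method` and its return labels are hypothetical, and the thresholds are the ones quoted in this notebook, which may differ from the current documentation.

```python
# Limits quoted in the list above (assumed here; check the current docs).
SYNC_LIMIT_SECONDS = 60          # synchronous: 1 minute or less
ASYNC_LIMIT_SECONDS = 180 * 60   # asynchronous: up to 180 minutes


def choose_recognition_method(duration_seconds, live=False):
    """Return 'streaming', 'synchronous', or 'asynchronous' for a clip."""
    if live:
        # Live capture (e.g. a microphone) calls for streaming recognition.
        return "streaming"
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    if duration_seconds <= SYNC_LIMIT_SECONDS:
        return "synchronous"
    if duration_seconds <= ASYNC_LIMIT_SECONDS:
        return "asynchronous"
    raise ValueError("audio is longer than the asynchronous limit")


print(choose_recognition_method(20))       # a 20-second segment
print(choose_recognition_method(68 * 60))  # a recording of 1 hour 8 minutes
```

Under these assumptions a short clip can go through a synchronous request, while a recording of more than an hour would need the asynchronous method.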

## Importing FLAC files to cloud storage

The recording we will transcribe was saved in a `.flac` file. FLAC stands for Free Lossless Audio Codec and, as the name indicates, it is an open-source, lossless audio codec. From the original recording, which is one hour and 8 minutes long, I selected the segment from 11:16 to 11:36 because it was among the cleanest with respect to background noise. Only a single (francophone) voice is heard in this segment.

Note: the Google Speech API only supports mono recordings, and a sample rate of 16000 Hz is optimal.
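
Before uploading a clip, it can help to confirm that it is mono and sampled at 16000 Hz. The sketch below reads those values from the STREAMINFO block at the start of a FLAC stream using only the standard library. The function names `read_flac_streaminfo` and `check_for_speech_api` are hypothetical helpers, and the code assumes the file starts directly with the `fLaC` marker (no prepended tags).

```python
def read_flac_streaminfo(data):
    """Parse basic audio properties from the first 42 bytes of a FLAC stream."""
    if len(data) < 42 or data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing 'fLaC' marker)")
    if data[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # Bytes 18..25 pack: sample rate (20 bits), channels - 1 (3 bits),
    # bits per sample - 1 (5 bits), total samples (36 bits).
    packed = int.from_bytes(data[18:26], "big")
    sample_rate = packed >> 44
    total_samples = packed & ((1 << 36) - 1)
    return {
        "sample_rate_hertz": sample_rate,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        # total_samples is 0 when the encoder did not record the length.
        "duration_seconds": (total_samples / sample_rate
                             if sample_rate and total_samples else None),
    }


def check_for_speech_api(info):
    """Return warnings based on the mono / 16000 Hz note above."""
    problems = []
    if info["channels"] != 1:
        problems.append("audio is not mono")
    if info["sample_rate_hertz"] != 16000:
        problems.append("sample rate is not 16000 Hz")
    return problems
```

With a local copy of the clip (the path here is an assumption), usage would look like `info = read_flac_streaminfo(open('demo_clip.flac', 'rb').read(42))` followed by `check_for_speech_api(info)`, where an empty list means both conditions hold.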

### Importing the data 

To see the list of all your buckets, use the Datalab cell magic below. Outside a Datalab environment the magic is not registered, which produces the error shown.

In [1]:
%%gcs list

UsageError: Cell magic `%%gcs` not found.


Another way to find the bucket name is to store the project name in a variable, below called `project_name`. We use this variable in the code that follows so that you can run this notebook in a new instance of Datalab without having to retype names.
The Datalab APIs are provided in the `google.datalab` Python library.

In [0]:
from google.datalab import Context
project_name = Context.default().project_id
print(project_name)

testing-data-lab


The Cloud Storage functionality is contained within the `google.datalab.storage` module. Here the bucket has the same name as the project, so we pass `project_name` to create a reference to the bucket, then iterate over the objects in `project_bucket` to display its contents.

In [0]:
import google.datalab.storage as storage
project_bucket = storage.Bucket(project_name)
for obj in project_bucket.objects():
    print(obj.key)

datalab-backups/us-central1-a/avaya/content/daily-20170815184107
datalab-backups/us-central1-a/avaya/content/daily-20170818191007
datalab-backups/us-central1-a/avaya/content/daily-20170823163046
datalab-backups/us-central1-a/avaya/content/daily-20170901221818
datalab-backups/us-central1-a/avaya/content/hourly-20170815184107
datalab-backups/us-central1-a/avaya/content/hourly-20170818191007
datalab-backups/us-central1-a/avaya/content/hourly-20170823163046
datalab-backups/us-central1-a/avaya/content/hourly-20170901221818
datalab-backups/us-central1-a/avaya/content/weekly-20170815184107
datalab-backups/us-central1-a/avaya/content/weekly-20170818191007
datalab-backups/us-central1-a/avaya/content/weekly-20170823163046
datalab-backups/us-central1-a/avaya/content/weekly-20170901221818
datalab-backups/us-central1-a/cloud/content/daily-20170824134323
datalab-backups/us-central1-a/cloud/content/hourly-20170824134323
datalab-backups/us-central1-a/cloud/content/weekly-20170824134323
original_flac_d
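
Outside Datalab, a similar listing can be produced with the `google-cloud-storage` client library. This is a sketch under assumptions rather than the method used in this notebook: the helper name `list_object_names` is hypothetical, and it relies on `Client.list_blobs` and on default credentials being configured. The optional `client` argument exists only so that a stand-in object can be supplied.

```python
def list_object_names(bucket_name, prefix=None, client=None):
    """Return the names of the objects in a bucket, optionally under a prefix."""
    if client is None:
        # Imported lazily so the function also works with a stand-in client.
        from google.cloud import storage
        client = storage.Client()
    # Each blob exposes its object name (the Datalab `key`) as `.name`.
    return [blob.name for blob in client.list_blobs(bucket_name, prefix=prefix)]
```

Passing a prefix such as `'original_flac_data/'` would skip the backup entries and list only the uploaded audio.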

You can use the project name to construct the path to the FLAC file, which was uploaded through the Storage section of the Cloud Console.

In [0]:
demo_flac_file = 'gs://' + project_name + '/original_flac_data/demo_clip.flac'
print('Flac file: ' + demo_flac_file)

Flac file: gs://testing-data-lab/original_flac_data/demo_clip.flac
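
String concatenation works for a single file. For several files, small helpers that build and validate `gs://` URIs may be less error-prone. The names `build_gcs_uri` and `split_gcs_uri` below are hypothetical and use only the standard library.

```python
GCS_PREFIX = "gs://"


def build_gcs_uri(bucket_name, object_name):
    """Join a bucket name and an object name into a gs:// URI."""
    if not bucket_name or "/" in bucket_name:
        raise ValueError("bucket_name must be non-empty and contain no '/'")
    return "{}{}/{}".format(GCS_PREFIX, bucket_name, object_name.lstrip("/"))


def split_gcs_uri(gcs_uri):
    """Split a gs:// URI into (bucket_name, object_name)."""
    if not gcs_uri.startswith(GCS_PREFIX):
        raise ValueError("expected a URI starting with gs://")
    bucket_name, _, object_name = gcs_uri[len(GCS_PREFIX):].partition("/")
    if not bucket_name or not object_name:
        raise ValueError("URI must contain both a bucket and an object name")
    return bucket_name, object_name


print(build_gcs_uri("testing-data-lab", "original_flac_data/demo_clip.flac"))
```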


## Transcription demonstration
The code in the next cell is from the Google Cloud Speech API Python samples:

https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/speech/cloud-client/transcribe.py

In [0]:
# [START def_transcribe_gcs]
def transcribe_gcs(gcs_uri):
    """Transcribes the audio file specified by the gcs_uri."""
    # The enums and types modules are only available in
    # google-cloud-speech releases before 2.0.
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types
    client = speech.SpeechClient()

    # [START migration_audio_config_gcs]
    audio = types.RecognitionAudio(uri=gcs_uri)
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=16000,
        language_code='fr-FR')
    # [END migration_audio_config_gcs]

    response = client.recognize(config, audio)
    # Print the first alternative of all the consecutive results.
    for result in response.results:
        print('Transcript: {}'.format(result.alternatives[0].transcript))
# [END def_transcribe_gcs]


In [0]:
transcribe_gcs(demo_flac_file)

Transcript: observer cette réalité est le devenir de cette réalité pour tirer chaque fois des leçons vue de définir une politique d'enseignement une politique d'intervention
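
`transcribe_gcs` prints the transcripts, which makes them hard to reuse. The sketch below returns them instead, together with the confidence score that each alternative carries. The helper names `best_alternatives` and `join_transcripts` are hypothetical, and the sample response is a stand-in built with `SimpleNamespace`, not real API output.

```python
from types import SimpleNamespace


def best_alternatives(response):
    """Return (transcript, confidence) for the top alternative of each result."""
    pairs = []
    for result in response.results:
        if len(result.alternatives) == 0:
            continue  # skip results that carry no alternatives
        top = result.alternatives[0]
        pairs.append((top.transcript, top.confidence))
    return pairs


def join_transcripts(response):
    """Concatenate the top transcripts of consecutive results into one string."""
    return " ".join(text.strip() for text, _ in best_alternatives(response))


# Stand-in response with the same attribute layout as a recognize() response.
fake_response = SimpleNamespace(results=[
    SimpleNamespace(alternatives=[
        SimpleNamespace(transcript="observer cette réalité", confidence=0.9)]),
    SimpleNamespace(alternatives=[
        SimpleNamespace(transcript=" une politique d'intervention", confidence=0.8)]),
])
print(join_transcripts(fake_response))
```

To use it with the function above, `transcribe_gcs` would need to return `response` instead of only printing it.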


The demo file was 15 seconds of a French speaker in a car. Here is the audio clip that was transcribed above:


https://storage.googleapis.com/testing-data-lab/original_flac_data/demo_clip.flac
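
The link above follows the `https://storage.googleapis.com/<bucket>/<object>` pattern, so it can be derived from the `gs://` URI used for transcription. This is a small sketch with a hypothetical helper name, `gcs_uri_to_public_url`. The resulting URL is only reachable if the object has been made publicly readable.

```python
from urllib.parse import quote


def gcs_uri_to_public_url(gcs_uri):
    """Map gs://bucket/object to its storage.googleapis.com URL."""
    prefix = "gs://"
    if not gcs_uri.startswith(prefix):
        raise ValueError("expected a URI starting with gs://")
    bucket_name, _, object_name = gcs_uri[len(prefix):].partition("/")
    if not bucket_name or not object_name:
        raise ValueError("URI must contain both a bucket and an object name")
    # quote() percent-encodes special characters but keeps '/' separators.
    return "https://storage.googleapis.com/{}/{}".format(
        bucket_name, quote(object_name))


print(gcs_uri_to_public_url(
    "gs://testing-data-lab/original_flac_data/demo_clip.flac"))
```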