The idea for this tutorial came from a few students who were dealing with the process of translating speech to text. With today's advances in Machine Learning, these services are more of a commodity than a cutting-edge technology. Every major provider (AWS, IBM Watson, Azure, etc.) offers speech-to-text services.

In our experience, Google's service is probably the one that works best. It also includes a **free trial**, which makes it ideal for quick tests.

### The objective 👊

The objective will be to transcribe pieces of audio using [Google Speech to Text](https://cloud.google.com/speech-to-text/). To do that, I'll show you how to set up the Google Cloud account, create a project, enable the Speech to Text service, and create the buckets on [Google Cloud Storage](https://cloud.google.com/storage/) to upload the audio files.

### The problem 🤦

Setting up Google's account and configuring the project is tedious. The way you have to enable services, generate credentials, etc. is not very intuitive.

This tutorial is built so it can be executed from any platform (even your own laptop). Google's credentials (the authentication service) are available automatically if you're working within Google's compute environments, but to run it locally you must take a couple of extra steps.

### The end result ✨

Here's the audio that we'll use for the demo. It's the first 30 seconds of [Jacob Kaplan-Moss' Keynote from Pycon 2015](https://www.youtube.com/watch?v=hIJdFxYlEKE).

```python
from IPython.display import Audio

Audio("jacob-keynote.flac")
```

The resulting transcription is:

```python
with open("result.txt") as fp:
    print(fp.read())
```

```
Before I start at the very quick announcement, if you bought anything at the
wonderful pyladies auction last night, you need to pick it up at registration
during the break right now. I've already picked my tie up. Thank you.  So hi.
Good morning. I'm I'm Jacob Kaplan Moss. I'm one of the contributors to Django
and I'm the director of security at Heroku and when I say thank you. I really
mean it is hard for you to run.
```


As you can see, the results are very good. Even proper nouns (such as Heroku or Jacob Kaplan Moss) are picked up by the service.

### The code 🐍

If you already have an account with Speech to Text enabled, and you have your credentials, you can skip directly to the notebook containing the actual code: `Google Speech to Text Demo.ipynb`. This notebook is a tutorial on how to set up the account and project.

### The caveats 🛠

There are a number of things that ALWAYS confuse our students, so I'll name them first. **PLEASE read this section:**

#### 1. Audio files must be encoded using FLAC!

Audio files must be `FLAC`-encoded (with a `.flac` extension). Multiple encodings are supported, but `FLAC` is a safe bet. To check the supported encodings, read [Google's official docs](https://cloud.google.com/speech-to-text/docs/encoding).
If you have mp3 files, use any of these services: [1](https://audio.online-convert.com/convert-to-flac), [2](https://videoconverter.wondershare.com/convert-mp3/convert-mp3-to-flac-mac.html), [3](https://www.zamzar.com/convert/mp3-to-flac/), [4](https://www.onlineconverter.com/mp3-to-flac), [5](https://www.media.io/convert/mp3-to-flac.html).
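Renaming an `.mp3` to `.flac` does not make it FLAC. A real FLAC file starts with the 4-byte marker `fLaC`, followed by a mandatory STREAMINFO metadata block that stores the sample rate, channel count, bits per sample and total number of samples. The sketch below reads those fields using only the standard library (based on the FLAC format specification; `FlacInfo`, `parse_flac_header` and `read_flac_info` are hypothetical names, not part of any Google tool). The computed duration also tells you whether a file exceeds the short limit of the "sync" model discussed below.

```python
import struct
from dataclasses import dataclass


@dataclass
class FlacInfo:
    sample_rate: int
    channels: int
    bits_per_sample: int
    total_samples: int  # 0 means "unknown" in the FLAC format

    @property
    def duration_seconds(self) -> float:
        return self.total_samples / self.sample_rate if self.sample_rate else 0.0


def parse_flac_header(data: bytes) -> FlacInfo:
    """Parse the 'fLaC' marker and the STREAMINFO block that must follow it."""
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing 'fLaC' marker)")
    if len(data) < 42:  # 4-byte marker + 4-byte block header + 34-byte STREAMINFO
        raise ValueError("data too short to contain STREAMINFO")
    if (data[4] & 0x7F) != 0:  # low 7 bits of the block header hold the block type
        raise ValueError("first metadata block is not STREAMINFO")
    # STREAMINFO bytes 10..17 pack, big-endian: sample rate (20 bits),
    # channels - 1 (3 bits), bits per sample - 1 (5 bits), total samples (36 bits).
    (packed,) = struct.unpack(">Q", data[18:26])
    return FlacInfo(
        sample_rate=packed >> 44,
        channels=((packed >> 41) & 0x7) + 1,
        bits_per_sample=((packed >> 36) & 0x1F) + 1,
        total_samples=packed & ((1 << 36) - 1),
    )


def read_flac_info(path: str) -> FlacInfo:
    # Only the first 42 bytes are needed to reach the end of STREAMINFO.
    with open(path, "rb") as fp:
        return parse_flac_header(fp.read(42))
```

For example, `read_flac_info("jacob-keynote.flac")` raises `ValueError` if the file is not really FLAC; otherwise you can inspect `.sample_rate`, `.channels` and `.duration_seconds` before uploading.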

Also, check [`ffmpeg`](https://ffmpeg.org/) which is an open source, command line tool to manage audio and video. The `ffmpeg` command to do mp3 to flac is: `$ ffmpeg -i my-file.mp3 my-file.flac` (`ffmpeg` understands the extensions).
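If you'd rather run the conversion from the notebook, a minimal sketch is to call the same `ffmpeg` command through `subprocess`. It assumes `ffmpeg` is installed and on your `PATH`; `ffmpeg_to_flac_command` and `convert_to_flac` are hypothetical helper names. The only addition to the command above is `-n`, an `ffmpeg` option that makes it exit instead of waiting for an "overwrite?" answer when the `.flac` file already exists.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def ffmpeg_to_flac_command(src: str) -> list[str]:
    """Build `ffmpeg -n -i my-file.mp3 my-file.flac`; ffmpeg picks the encoder from the .flac extension."""
    dst = str(Path(src).with_suffix(".flac"))
    return ["ffmpeg", "-n", "-i", src, dst]


def convert_to_flac(src: str) -> str:
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg is not installed or not on PATH")
    cmd = ffmpeg_to_flac_command(src)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if ffmpeg fails
    return cmd[-1]  # path of the generated .flac file
```

Usage: `convert_to_flac("my-file.mp3")` writes `my-file.flac` next to the original file and returns its path.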

#### 2. You have to enable billing

Even though Google offers a free trial ($300 in credits at the time of this writing), you still need to enter a credit card.

#### 3. Audio files must be first uploaded to Google Storage

Technically speaking, Google supports both "sync" and "async" models. The sync model transcribes a file that you have on your local computer, but it only works for roughly 1 minute of audio, which isn't very useful. So we're going to default to "async". Async works by uploading the audio files to Google Cloud Storage first, and then pointing Speech to Text to those files.
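The async flow can be sketched with the `google-cloud-speech` Python client. This is an illustration, not necessarily the code in the companion notebook: it assumes version 2.x of the library (`pip install google-cloud-speech`), credentials that are already configured, and an English (`en-US`) recording. `gcs_uri`, `join_transcripts` and `transcribe_gcs` are hypothetical helper names.

```python
from __future__ import annotations


def gcs_uri(bucket: str, blob_name: str) -> str:
    """Build the gs:// URI that points Speech to Text at a file in Cloud Storage."""
    return f"gs://{bucket}/{blob_name.lstrip('/')}"


def join_transcripts(results) -> str:
    """Keep the top alternative of each result, in order, skipping empty results."""
    return " ".join(r.alternatives[0].transcript.strip() for r in results if r.alternatives)


def transcribe_gcs(uri: str, language_code: str = "en-US", timeout: int = 600) -> str:
    # Imported here so the helpers above work even without the library installed.
    from google.cloud import speech  # assumes google-cloud-speech 2.x

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
        language_code=language_code,
    )
    audio = speech.RecognitionAudio(uri=uri)
    # "async": starts a long-running operation instead of a single blocking request
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=timeout)  # wait for the job to finish
    return join_transcripts(response.results)
```

Usage: `text = transcribe_gcs(gcs_uri("your-bucket-name", "jacob-keynote.flac"))`, after which you could write `text` to a file such as `result.txt`.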

---

## ✨ Visual Step by Step Guide ✨

The required steps to have everything working are:
* Creating an account
* Setting up the project
* Enabling services
* Generating credentials

I've gone through these steps from scratch and taken screenshots to aid in the process. Let's get started!

### 1. Create an account

I'm assuming you don't have an account; if you already have one, you can skip this part. First, visit the [Google Speech to Text page](https://cloud.google.com/speech-to-text/) and click on "Try for Free":

![2](https://user-images.githubusercontent.com/872296/59147340-732c5980-89d0-11e9-9f9e-cc8890a1629a.png)

You can either "Sign in" or "Create account". If you already have an account, you can sign in; the rest of this guide assumes you're creating a new one.

![3](https://user-images.githubusercontent.com/872296/59147339-732c5980-89d0-11e9-91c9-ef5296f1a2e8.png)

Select your country and accept terms:

![4](https://user-images.githubusercontent.com/872296/59147338-732c5980-89d0-11e9-9e64-e367351d11fc.png)

Fill in your user info and set up your payment method:

![5](https://user-images.githubusercontent.com/872296/59147337-732c5980-89d0-11e9-88cc-2b44a314ff91.png)

Remember that you're part of the free trial, which usually includes $300 in credits, more than enough for our tests.

### 2. Set up your project

Google will create a project automatically once you create your account. We're going to use that default project. You can create a new one if you want. You can see the currently active project at the top bar:

![6](https://user-images.githubusercontent.com/872296/59147336-7293c300-89d0-11e9-84a9-6c6f91e5a4df.png)

Now it's time to _"enable services"_. This is a little bit confusing: every new project you create is "empty" and it has no access to any "services". You have to enable the services that you want to use one by one. Let's start with Speech to Text.

In the search bar at the top, search for "speech to text" (just typing `speech` will do):

![7](https://user-images.githubusercontent.com/872296/59147335-7293c300-89d0-11e9-8b19-c1347229961a.png)

You'll land on the Speech to Text service page. Click on "Enable":

![8](https://user-images.githubusercontent.com/872296/59147334-7293c300-89d0-11e9-8b28-bd46edbd0901.png)
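If you prefer the command line, the same step can be done with the Cloud SDK's `gcloud services enable` command. The sketch below wraps it in Python so it can run from a notebook. It assumes `gcloud` is installed and authenticated (`gcloud auth login`); `my-project-123` is a placeholder project ID, and `enable_services_command` / `enable_services` are hypothetical helper names.

```python
from __future__ import annotations

import shutil
import subprocess


def enable_services_command(project_id: str, services=("speech.googleapis.com",)) -> list[str]:
    """Build `gcloud services enable <services...> --project=<id>`."""
    return ["gcloud", "services", "enable", *services, f"--project={project_id}"]


def enable_services(project_id: str) -> None:
    if shutil.which("gcloud") is None:
        raise RuntimeError("the Google Cloud SDK (gcloud) is not on PATH")
    subprocess.run(enable_services_command(project_id), check=True)
```

Usage: `enable_services("my-project-123")`, replacing the placeholder with the project ID shown in the top bar of the console.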

Now let's "enable" Cloud Storage. In the search bar, type "Storage" and click on it:

![9](https://user-images.githubusercontent.com/872296/59147333-7293c300-89d0-11e9-8dee-a5fb10b39c52.png)

Cloud Storage is already "enabled", but we must create _buckets_ to upload our audio files. Click on "Create Bucket":

![10](https://user-images.githubusercontent.com/872296/59147332-71fb2c80-89d0-11e9-9560-112e977513bf.png)

Fill in all the information. The bucket name is important because we're going to use it later. Bucket names must be **globally** unique, so it might take a few attempts to find one that's free. Also, select a region close to where you're located to decrease latency. Some regions are more expensive than others, so pick the one that's most convenient for you:

![11](https://user-images.githubusercontent.com/872296/59147331-71fb2c80-89d0-11e9-8a51-d46600e59759.png)
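Because a rejected name costs a round trip, it can help to pre-check the name locally. The sketch below encodes a **simplified** subset of Cloud Storage's naming rules (3 to 63 characters of lowercase letters, digits, `-` and `_`, starting and ending with a letter or digit, no `goog` prefix, no `google`); it deliberately ignores dotted names. The optional `create_bucket` helper assumes the `google-cloud-storage` library and configured credentials; `us-central1` is only a placeholder region, and both function names are hypothetical.

```python
import re

# Simplified rule set: dot-free bucket names only.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    return (
        _BUCKET_NAME.fullmatch(name) is not None
        and not name.startswith("goog")
        and "google" not in name
    )


def create_bucket(name: str, location: str = "us-central1"):
    if not is_valid_bucket_name(name):
        raise ValueError(f"invalid bucket name: {name!r}")
    from google.cloud import storage  # assumes google-cloud-storage is installed

    client = storage.Client()
    # Fails with a 409 Conflict error if someone, anywhere, already owns this name.
    return client.create_bucket(name, location=location)
```

A local check cannot tell you whether the name is taken, since uniqueness is global, so the console (or the `create_bucket` call) remains the final word.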

It's time to upload the audio files. If you don't have an audio file at hand, or you want to try the audio file used in this tutorial, right-click on it in the left pane and download it:

![Screenshot at 10-05-37](https://user-images.githubusercontent.com/872296/59147671-fcde2600-89d4-11e9-825b-c71753b45522.png)

Upload your files. **REMEMBER** that your files MUST be uploaded as `.flac`. Other formats are also supported; check [the official documentation](https://cloud.google.com/speech-to-text/docs/encoding) for details.

![12](https://user-images.githubusercontent.com/872296/59147330-71fb2c80-89d0-11e9-8d4b-bebc40e493a3.png)
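The upload can also be scripted. This is a minimal sketch assuming the `google-cloud-storage` library and configured credentials; `flac_blob_name` and `upload_flac` are hypothetical helpers. It refuses anything without a `.flac` extension, following the rule above, and stores the file under its base name.

```python
from pathlib import Path


def flac_blob_name(path: str) -> str:
    p = Path(path)
    if p.suffix.lower() != ".flac":
        raise ValueError(f"{p.name} is not a .flac file; convert it first")
    return p.name  # store the file at the root of the bucket


def upload_flac(bucket_name: str, path: str) -> str:
    from google.cloud import storage  # assumes google-cloud-storage is installed

    blob_name = flac_blob_name(path)
    bucket = storage.Client().bucket(bucket_name)  # no API call until the upload
    bucket.blob(blob_name).upload_from_filename(path)
    return f"gs://{bucket_name}/{blob_name}"  # the URI Speech to Text will read
```

Usage: `upload_flac("your-bucket-name", "jacob-keynote.flac")` returns the `gs://` URI of the uploaded file.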

Once your file is uploaded, you should see it in your Storage dashboard:

![13](https://user-images.githubusercontent.com/872296/59147329-71fb2c80-89d0-11e9-949a-84f88236a016.png)
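To confirm the same thing from code, and to collect the URIs you'll pass to Speech to Text, you could list the bucket's FLAC files. A sketch assuming `google-cloud-storage` and configured credentials; `flac_uris` and `list_flac_uris` are hypothetical names.

```python
from __future__ import annotations


def flac_uris(bucket_name: str, blob_names) -> list[str]:
    """Turn blob names into gs:// URIs, keeping only .flac files."""
    return [f"gs://{bucket_name}/{n}" for n in blob_names if n.lower().endswith(".flac")]


def list_flac_uris(bucket_name: str) -> list[str]:
    from google.cloud import storage  # assumes google-cloud-storage is installed

    client = storage.Client()
    names = (blob.name for blob in client.list_blobs(bucket_name))
    return flac_uris(bucket_name, names)
```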

### 3. Generate Credentials

This is a VERY important step. It's also one of those things that, if it goes wrong, the whole process will fail.

First, expand the left navigation bar (click the hamburger icon in the top-left corner), expand **APIs & Services** and click on **Credentials**:

![14](https://user-images.githubusercontent.com/872296/59147906-d8377d80-89d7-11e9-8566-4e64e8cc872c.png)

Then click on _"Create Credentials"_ and make sure you select **Service account key**:

![15](https://user-images.githubusercontent.com/872296/59147327-71629600-89d0-11e9-9c9b-a9b8e16f3351.png)

Now click on "New service account":

![16](https://user-images.githubusercontent.com/872296/59147326-71629600-89d0-11e9-973a-eb2f3fcda2aa.png)

Give it a name, and **choose the role**. This is VERY important. Make sure you choose the Project Owner role. This will give your credentials access to all the services:

![17](https://user-images.githubusercontent.com/872296/59147325-71629600-89d0-11e9-84a0-f3355e93c4d9.png)

Once you pick the roles, it should look like this:

![18](https://user-images.githubusercontent.com/872296/59147324-71629600-89d0-11e9-9099-56f2688f8b6a.png)

Now **download** your credentials JSON file. **Keep it safe!**

![19](https://user-images.githubusercontent.com/872296/59147323-70c9ff80-89d0-11e9-9893-0410bec5527f.png)
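Since a wrong file is a common reason the whole process fails, a quick sanity check on the downloaded JSON can save time. Service account key files contain, among others, the fields `type` (set to `"service_account"`), `project_id`, `private_key` and `client_email`. The sketch below (`check_service_account_key` is a hypothetical name) only checks that those fields are present; it does not prove the key is still valid on Google's side.

```python
import json

REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path: str) -> str:
    """Return the service account email if the file looks like a key; raise otherwise."""
    with open(path, encoding="utf-8") as fp:
        info = json.load(fp)
    missing = [k for k in REQUIRED_FIELDS if k not in info]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    if info["type"] != "service_account":
        raise ValueError(f"expected a service account key, got type={info['type']!r}")
    return info["client_email"]
```

Because the file contains a private key, keep it out of version control (for example, list it in `.gitignore`).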

Your credentials have now been added to your account:

![20](https://user-images.githubusercontent.com/872296/59147322-70c9ff80-89d0-11e9-883e-1e569869df27.png)
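For reference, the console steps in this section roughly correspond to three `gcloud` commands: create the service account, grant it the Project Owner role (`roles/owner`), and create a JSON key. The sketch below only builds and runs those commands; it assumes an authenticated Cloud SDK, and `my-project`, `speech-demo` and `key.json` are placeholders. Service account emails follow the `NAME@PROJECT_ID.iam.gserviceaccount.com` pattern.

```python
from __future__ import annotations

import subprocess


def service_account_commands(project_id: str, name: str, key_path: str) -> list[list[str]]:
    email = f"{name}@{project_id}.iam.gserviceaccount.com"
    return [
        # 1. create the service account
        ["gcloud", "iam", "service-accounts", "create", name, f"--project={project_id}"],
        # 2. grant it the Project Owner role, as chosen in the console above
        ["gcloud", "projects", "add-iam-policy-binding", project_id,
         f"--member=serviceAccount:{email}", "--role=roles/owner"],
        # 3. create and download a JSON key for it
        ["gcloud", "iam", "service-accounts", "keys", "create", key_path,
         f"--iam-account={email}", f"--project={project_id}"],
    ]


def run_all(commands: list[list[str]]) -> None:
    for cmd in commands:
        subprocess.run(cmd, check=True)  # stop at the first failing command
```

Usage: `run_all(service_account_commands("my-project", "speech-demo", "key.json"))`.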

### 4. Replace your credentials file in this project

Now it's time to replace the credentials file with your own. I've named it `google-demo-speech-to-text.json`, and you can find it in the left pane. Right-click on it and open it with the Editor:

![Screenshot at 10-30-46](https://user-images.githubusercontent.com/872296/59147961-804d4680-89d8-11e9-91c2-bc6c0a48e13a.png)

Replace the entire file with the contents of the one you downloaded (your credentials):


![Screenshot at 10-34-11](https://user-images.githubusercontent.com/872296/59148012-0ec1c800-89d9-11e9-9556-fd769016bb7f.png)
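Outside Google's compute environments, one common way for the Google client libraries to find this file is the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, which Application Default Credentials reads. The demo notebook may load the file differently; this is a sketch of that one approach, and `use_credentials` is a hypothetical helper. Set the variable before creating any Speech or Storage client.

```python
import os
from pathlib import Path

CREDENTIALS_FILE = "google-demo-speech-to-text.json"


def use_credentials(path: str = CREDENTIALS_FILE) -> str:
    """Point Application Default Credentials at a service account key file."""
    p = Path(path).resolve()
    if not p.is_file():
        raise FileNotFoundError(p)
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(p)
    return str(p)
```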

### 5. Jump to code!

The setup should be ready! You can switch to the other notebook (in the left pane) to follow the code.