rename build folder #70 and add some docs
MarkEdmondson1234 committed Apr 19, 2020
1 parent a318edc commit 916510e
Showing 7 changed files with 419 additions and 107 deletions.
2 changes: 1 addition & 1 deletion .Rbuildignore
@@ -9,5 +9,5 @@
^\.httr-oauth$
^cran-comments\.md$
^\.Renviron$
^build$
^cloud_build$
^CRAN-RELEASE$
File renamed without changes.
File renamed without changes.
27 changes: 22 additions & 5 deletions vignettes/speech.Rmd
@@ -1,17 +1,17 @@
---
title: "Google Cloud Speech API"
title: "Google Cloud Speech-to-Text API"
author: "Mark Edmondson"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Google Cloud Speech API}
%\VignetteIndexEntry{Google Cloud Speech-to-Text API}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

The Google Cloud Speech API enables you to convert audio to text by applying neural network models in an easy to use API. The API recognizes over 80 languages and variants, to support your global user base. You can transcribe the text of users dictating to an application’s microphone or enable command-and-control through voice among many other use cases.
The Google Cloud Speech-to-Text API enables you to convert audio to text by applying neural network models in an easy to use API. The API recognizes over 80 languages and variants, to support your global user base. You can transcribe the text of users dictating to an application’s microphone or enable command-and-control through voice among many other use cases.

Read more [on the Google Cloud Speech Website](https://cloud.google.com/speech/)
Read more [on the Google Cloud Speech-to-Text Website](https://cloud.google.com/speech/)

The Cloud Speech API provides audio transcription. It's accessible via the `gl_speech` function.

@@ -47,7 +47,7 @@ return$timings
# etc...
```

### Demo for Google Cloud Speech API
### Demo for Google Cloud Speech-to-Text API


A test audio file is installed with the package, which reads:
@@ -96,6 +96,23 @@ result$timings
#5 0.900s 1s Dream
```

## Custom configurations

You can also send in other arguments that help shape the output, such as speaker diarization (labelling different speakers). To use such custom configurations, create a [`RecognitionConfig`](https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig) object. This can be done via R lists, which are converted to JSON via `library(jsonlite)`; an example is shown below:

```r
## Use a custom configuration
my_config <- list(encoding = "LINEAR16",
                  diarizationConfig = list(
                    enableSpeakerDiarization = TRUE,
                    minSpeakerCount = 2,
                    maxSpeakerCount = 3
                  ))

# languageCode is required, so will be added if not in your custom config
gl_speech(my_audio, languageCode = "en-US", customConfig = my_config)
```
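
Since the custom configuration is just a nested R list, you can preview the JSON it will be converted to before sending it. This is an illustrative sketch using `jsonlite` (which, as noted above, handles the conversion); the exact JSON the package sends may differ slightly:

```r
library(jsonlite)

# Preview the JSON representation of the custom configuration
toJSON(my_config, auto_unbox = TRUE, pretty = TRUE)
```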

## Asynchronous calls

For speech files greater than 60 seconds, or if you don't want your results straight away, set `asynch = TRUE` in the call to the API.
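
As a rough sketch of waiting for an asynchronous job, assuming that `gl_speech_op()` returns the operation object while the job is still running and the finished result once it completes (this polling loop is illustrative, not package code):

```r
async <- gl_speech(test_audio, asynch = TRUE)

# Poll until the job is no longer pending
# (assumes an unfinished job keeps returning a "gl_speech_op" object)
result <- gl_speech_op(async)
while(inherits(result, "gl_speech_op")){
  Sys.sleep(10)
  result <- gl_speech_op(async)
}

result$timings
```
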
42 changes: 28 additions & 14 deletions vignettes/speech.html
@@ -12,9 +12,9 @@

<meta name="author" content="Mark Edmondson" />

<meta name="date" content="2020-04-16" />
<meta name="date" content="2020-04-19" />

<title>Google Cloud Speech API</title>
<title>Google Cloud Speech-to-Text API</title>



@@ -299,14 +299,14 @@



<h1 class="title toc-ignore">Google Cloud Speech API</h1>
<h1 class="title toc-ignore">Google Cloud Speech-to-Text API</h1>
<h4 class="author">Mark Edmondson</h4>
<h4 class="date">2020-04-16</h4>
<h4 class="date">2020-04-19</h4>



<p>The Google Cloud Speech API enables you to convert audio to text by applying neural network models in an easy to use API. The API recognizes over 80 languages and variants, to support your global user base. You can transcribe the text of users dictating to an application’s microphone or enable command-and-control through voice among many other use cases.</p>
<p>Read more <a href="https://cloud.google.com/speech/">on the Google Cloud Speech Website</a></p>
<p>The Google Cloud Speech-to-Text API enables you to convert audio to text by applying neural network models in an easy to use API. The API recognizes over 80 languages and variants, to support your global user base. You can transcribe the text of users dictating to an application’s microphone or enable command-and-control through voice among many other use cases.</p>
<p>Read more <a href="https://cloud.google.com/speech/">on the Google Cloud Speech-to-Text Website</a></p>
<p>The Cloud Speech API provides audio transcription. It’s accessible via the <code>gl_speech</code> function.</p>
<p>Arguments include:</p>
<ul>
@@ -337,8 +337,8 @@ <h3>Returned structure</h3>
<a class="sourceLine" id="cb1-14" data-line-number="14"><span class="co">#4 0.700s 1.200s to</span></a>
<a class="sourceLine" id="cb1-15" data-line-number="15"><span class="co"># etc...</span></a></code></pre></div>
</div>
<div id="demo-for-google-cloud-speech-api" class="section level3">
<h3>Demo for Google Cloud Speech API</h3>
<div id="demo-for-google-cloud-speech-to-text-api" class="section level3">
<h3>Demo for Google Cloud Speech-to-Text API</h3>
<p>A test audio file is installed with the package which reads:</p>
<blockquote>
<p>“To administer medicine to animals is frequently a very difficult matter, and yet sometimes it’s necessary to do so”</p>
@@ -378,16 +378,30 @@ <h3>Word transcripts</h3>
<a class="sourceLine" id="cb3-12" data-line-number="12"><span class="co">#4 0.700s 0.900s A</span></a>
<a class="sourceLine" id="cb3-13" data-line-number="13"><span class="co">#5 0.900s 1s Dream</span></a></code></pre></div>
</div>
<div id="custom-configurations" class="section level2">
<h2>Custom configurations</h2>
<p>You can also send in other arguments which can help shape the output, such as speaker diarization (labelling different speakers) - to use such custom configurations create a <a href="https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig"><code>RecognitionConfig</code></a> object. This can be done via R lists which are converted to JSON via <code>library(jsonlite)</code> and an example is shown below:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb4-1" data-line-number="1"><span class="co">## Use a custom configuration</span></a>
<a class="sourceLine" id="cb4-2" data-line-number="2">my_config &lt;-<span class="st"> </span><span class="kw">list</span>(<span class="dt">encoding =</span> <span class="st">&quot;LINEAR16&quot;</span>,</a>
<a class="sourceLine" id="cb4-3" data-line-number="3"> <span class="dt">diarizationConfig =</span> <span class="kw">list</span>(</a>
<a class="sourceLine" id="cb4-4" data-line-number="4"> <span class="dt">enableSpeakerDiarization =</span> <span class="ot">TRUE</span>,</a>
<a class="sourceLine" id="cb4-5" data-line-number="5"> <span class="dt">minSpeakerCount =</span> <span class="dv">2</span>,</a>
<a class="sourceLine" id="cb4-6" data-line-number="6"> <span class="dt">maxSpeakCount =</span> <span class="dv">3</span></a>
<a class="sourceLine" id="cb4-7" data-line-number="7"> ))</a>
<a class="sourceLine" id="cb4-8" data-line-number="8"></a>
<a class="sourceLine" id="cb4-9" data-line-number="9"><span class="co"># languageCode is required, so will be added if not in your custom config</span></a>
<a class="sourceLine" id="cb4-10" data-line-number="10"><span class="kw">gl_speech</span>(my_audio, <span class="dt">languageCode =</span> <span class="st">&quot;en-US&quot;</span>, <span class="dt">customConfig =</span> my_config)</a></code></pre></div>
</div>
<div id="asynchronous-calls" class="section level2">
<h2>Asynchronous calls</h2>
<p>For speech files greater than 60 seconds, or if you don’t want your results straight away, set <code>asynch = TRUE</code> in the call to the API.</p>
<p>This will return an object of class <code>&quot;gl_speech_op&quot;</code>, which should be used within the <code>gl_speech_op()</code> function to check the status of the task. If the task is finished, it will return an object of the same form as the non-asynchronous case.</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb4-1" data-line-number="1">async &lt;-<span class="st"> </span><span class="kw">gl_speech</span>(test_audio, <span class="dt">asynch =</span> <span class="ot">TRUE</span>)</a>
<a class="sourceLine" id="cb4-2" data-line-number="2">async</a>
<a class="sourceLine" id="cb4-3" data-line-number="3"><span class="co">## Send to gl_speech_op() for status</span></a>
<a class="sourceLine" id="cb4-4" data-line-number="4"><span class="co">## 4625920921526393240</span></a>
<a class="sourceLine" id="cb4-5" data-line-number="5"></a>
<a class="sourceLine" id="cb4-6" data-line-number="6">result &lt;-<span class="st"> </span><span class="kw">gl_speech_op</span>(async)</a></code></pre></div>
<div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb5-1" data-line-number="1">async &lt;-<span class="st"> </span><span class="kw">gl_speech</span>(test_audio, <span class="dt">asynch =</span> <span class="ot">TRUE</span>)</a>
<a class="sourceLine" id="cb5-2" data-line-number="2">async</a>
<a class="sourceLine" id="cb5-3" data-line-number="3"><span class="co">## Send to gl_speech_op() for status</span></a>
<a class="sourceLine" id="cb5-4" data-line-number="4"><span class="co">## 4625920921526393240</span></a>
<a class="sourceLine" id="cb5-5" data-line-number="5"></a>
<a class="sourceLine" id="cb5-6" data-line-number="6">result &lt;-<span class="st"> </span><span class="kw">gl_speech_op</span>(async)</a></code></pre></div>
</div>


28 changes: 28 additions & 0 deletions vignettes/text-to-speech.Rmd
@@ -80,6 +80,34 @@ gl_talk("Would you like a cup of tea?", gender = "FEMALE", languageCode = "en-GB"

Some languages are not yet supported, such as Danish. The API will return an error in those cases.
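
If you are looping over many languages you may want to catch that error rather than stop the script. A minimal sketch (the Danish language code `da-DK` is used purely as an illustration):

```r
# Catch the API error returned for unsupported languages
tryCatch(
  gl_talk("Hej verden", languageCode = "da-DK"),
  error = function(e) message("Not supported yet: ", conditionMessage(e))
)
```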

## Support for SSML

Support is also included for Speech Synthesis Markup Language (SSML). More details on using SSML to insert pauses, sounds and breaks in your audio can be found at `https://cloud.google.com/text-to-speech/docs/ssml`.

To use it, send in your SSML markup around the text you want spoken and set `inputType = "ssml"`:

```r
# using SSML
gl_talk('<speak>The <say-as interpret-as=\"characters\">SSML</say-as>
standard <break time=\"1s\"/>is defined by the
<sub alias=\"World Wide Web Consortium\">W3C</sub>.</speak>',
inputType = "ssml")
```

## Effect Profiles

You can output audio files that are optimised for playing on various devices.

To use audio profiles, supply a character vector of the available audio profiles listed at `https://cloud.google.com/text-to-speech/docs/audio-profiles`; the profiles are applied in the order given.

For instance, `effectsProfileIds = "wearable-class-device"` will optimise output for smart watches, while `effectsProfileIds = c("wearable-class-device", "telephony-class-application")` will apply sound filters optimised for smart watches and then for telephonic devices.

```r
# using effects profiles
gl_talk("This sounds great on headphones",
effectsProfileIds = "headphone-class-device")
```
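
As mentioned above, several profiles can be applied in order by passing a character vector, for example the smart-watch filter followed by the telephony filter (the spoken text here is just an illustration):

```r
# Apply two effects profiles in the order given:
# smart watch first, then telephony
gl_talk("Your call is important to us",
        effectsProfileIds = c("wearable-class-device",
                              "telephony-class-application"))
```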

## Browser Speech player

Creating and clicking on the audio file to play it can be a bit of a drag, so you also have a function that will play the audio file for you, launching via the browser. This can be piped via the tidyverse's `%>%`.
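
A short sketch of that piped usage, assuming the package's `gl_talk_player()` helper is the player function referred to above:

```r
library(magrittr)

# gl_talk_player() is assumed to be the browser-player helper
gl_talk("This audio plays in your browser") %>%
  gl_talk_player()
```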

