rename build folder #70 and add some docs
MarkEdmondson1234 committed Apr 19, 2020
1 parent a318edc commit 916510e
Showing 7 changed files with 419 additions and 107 deletions.
2 changes: 1 addition & 1 deletion .Rbuildignore
@@ -9,5 +9,5 @@
^\.httr-oauth$
^cran-comments\.md$
^\.Renviron$
^build$
^cloud_build$
^CRAN-RELEASE$
File renamed without changes.
File renamed without changes.
27 changes: 22 additions & 5 deletions vignettes/speech.Rmd
@@ -1,17 +1,17 @@
---
title: "Google Cloud Speech API"
title: "Google Cloud Speech-to-Text API"
author: "Mark Edmondson"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Google Cloud Speech API}
%\VignetteIndexEntry{Google Cloud Speech-to-Text API}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

The Google Cloud Speech API enables you to convert audio to text by applying neural network models in an easy to use API. The API recognizes over 80 languages and variants, to support your global user base. You can transcribe the text of users dictating to an application’s microphone or enable command-and-control through voice among many other use cases.
The Google Cloud Speech-to-Text API enables you to convert audio to text by applying neural network models in an easy to use API. The API recognizes over 80 languages and variants, to support your global user base. You can transcribe the text of users dictating to an application’s microphone or enable command-and-control through voice among many other use cases.

Read more [on the Google Cloud Speech Website](https://cloud.google.com/speech/)
Read more [on the Google Cloud Speech-to-Text Website](https://cloud.google.com/speech/)

The Cloud Speech API provides audio transcription. It's accessible via the `gl_speech` function.

@@ -47,7 +47,7 @@ return$timings
# etc...
```

### Demo for Google Cloud Speech API
### Demo for Google Cloud Speech-to-Text API


A test audio file is installed with the package, which reads:
@@ -96,6 +96,23 @@ result$timings
#5 0.900s 1s Dream
```

## Custom configurations

You can also send in other arguments that help shape the output, such as speaker diarization (labelling different speakers). To use such custom configurations, create a [`RecognitionConfig`](https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig) object. This can be done via R lists, which are converted to JSON via `library(jsonlite)`; an example is shown below:

```r
## Use a custom configuration
my_config <- list(encoding = "LINEAR16",
                  diarizationConfig = list(
                    enableSpeakerDiarization = TRUE,
                    minSpeakerCount = 2,
                    maxSpeakerCount = 3
                  ))

# languageCode is required, so will be added if not in your custom config
gl_speech(my_audio, languageCode = "en-US", customConfig = my_config)
```
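
Since the custom configuration is just a nested R list, you can preview the JSON it will be converted to before sending it. This is an illustrative sketch using `jsonlite` (which, as noted above, handles the conversion); the exact JSON the package sends may differ slightly:

```r
library(jsonlite)

# Preview the JSON representation of the custom configuration
toJSON(my_config, auto_unbox = TRUE, pretty = TRUE)
```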

## Asynchronous calls

For speech files greater than 60 seconds, or if you don't want your results straight away, set `asynch = TRUE` in the call to the API.
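
As a rough sketch of waiting for an asynchronous job, assuming that `gl_speech_op()` returns the operation object while the job is still running and the finished result once it completes (this polling loop is illustrative, not package code):

```r
async <- gl_speech(test_audio, asynch = TRUE)

# Poll until the job is no longer pending
# (assumes an unfinished job keeps returning a "gl_speech_op" object)
result <- gl_speech_op(async)
while(inherits(result, "gl_speech_op")){
  Sys.sleep(10)
  result <- gl_speech_op(async)
}

result$timings
```
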
42 changes: 28 additions & 14 deletions vignettes/speech.html
@@ -12,9 +12,9 @@

<meta name="author" content="Mark Edmondson" />

<meta name="date" content="2020-04-16" />
<meta name="date" content="2020-04-19" />

<title>Google Cloud Speech API</title>
<title>Google Cloud Speech-to-Text API</title>



@@ -299,14 +299,14 @@



<h1 class="title toc-ignore">Google Cloud Speech API</h1>
<h1 class="title toc-ignore">Google Cloud Speech-to-Text API</h1>
<h4 class="author">Mark Edmondson</h4>
<h4 class="date">2020-04-16</h4>
<h4 class="date">2020-04-19</h4>



<p>The Google Cloud Speech API enables you to convert audio to text by applying neural network models in an easy to use API. The API recognizes over 80 languages and variants, to support your global user base. You can transcribe the text of users dictating to an application’s microphone or enable command-and-control through voice among many other use cases.</p>
<p>Read more <a href="https://cloud.google.com/speech/">on the Google Cloud Speech Website</a></p>
<p>The Google Cloud Speech-to-Text API enables you to convert audio to text by applying neural network models in an easy to use API. The API recognizes over 80 languages and variants, to support your global user base. You can transcribe the text of users dictating to an application’s microphone or enable command-and-control through voice among many other use cases.</p>
<p>Read more <a href="https://cloud.google.com/speech/">on the Google Cloud Speech-to-Text Website</a></p>
<p>The Cloud Speech API provides audio transcription. It’s accessible via the <code>gl_speech</code> function.</p>
<p>Arguments include:</p>
<ul>
@@ -337,8 +337,8 @@ <h3>Returned structure</h3>
<a class="sourceLine" id="cb1-14" data-line-number="14"><span class="co">#4 0.700s 1.200s to</span></a>
<a class="sourceLine" id="cb1-15" data-line-number="15"><span class="co"># etc...</span></a></code></pre></div>
</div>
<div id="demo-for-google-cloud-speech-api" class="section level3">
<h3>Demo for Google Cloud Speech API</h3>
<div id="demo-for-google-cloud-speech-to-text-api" class="section level3">
<h3>Demo for Google Cloud Speech-to-Text API</h3>
<p>A test audio file is installed with the package which reads:</p>
<blockquote>
<p>“To administer medicine to animals is frequently a very difficult matter, and yet sometimes it’s necessary to do so”</p>
@@ -378,16 +378,30 @@ <h3>Word transcripts</h3>
<a class="sourceLine" id="cb3-12" data-line-number="12"><span class="co">#4 0.700s 0.900s A</span></a>
<a class="sourceLine" id="cb3-13" data-line-number="13"><span class="co">#5 0.900s 1s Dream</span></a></code></pre></div>
</div>
<div id="custom-configurations" class="section level2">
<h2>Custom configurations</h2>
<p>You can also send in other arguments which can help shape the output, such as speaker diarization (labelling different speakers) - to use such custom configurations create a <a href="https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig"><code>RecognitionConfig</code></a> object. This can be done via R lists which are converted to JSON via <code>library(jsonlite)</code> and an example is shown below:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb4-1" data-line-number="1"><span class="co">## Use a custom configuration</span></a>
<a class="sourceLine" id="cb4-2" data-line-number="2">my_config &lt;-<span class="st"> </span><span class="kw">list</span>(<span class="dt">encoding =</span> <span class="st">&quot;LINEAR16&quot;</span>,</a>
<a class="sourceLine" id="cb4-3" data-line-number="3"> <span class="dt">diarizationConfig =</span> <span class="kw">list</span>(</a>
<a class="sourceLine" id="cb4-4" data-line-number="4"> <span class="dt">enableSpeakerDiarization =</span> <span class="ot">TRUE</span>,</a>
<a class="sourceLine" id="cb4-5" data-line-number="5"> <span class="dt">minSpeakerCount =</span> <span class="dv">2</span>,</a>
<a class="sourceLine" id="cb4-6" data-line-number="6"> <span class="dt">maxSpeakCount =</span> <span class="dv">3</span></a>
<a class="sourceLine" id="cb4-7" data-line-number="7"> ))</a>
<a class="sourceLine" id="cb4-8" data-line-number="8"></a>
<a class="sourceLine" id="cb4-9" data-line-number="9"><span class="co"># languageCode is required, so will be added if not in your custom config</span></a>
<a class="sourceLine" id="cb4-10" data-line-number="10"><span class="kw">gl_speech</span>(my_audio, <span class="dt">languageCode =</span> <span class="st">&quot;en-US&quot;</span>, <span class="dt">customConfig =</span> my_config)</a></code></pre></div>
</div>
<div id="asynchronous-calls" class="section level2">
<h2>Asynchronous calls</h2>
<p>For speech files greater than 60 seconds, or if you don’t want your results straight away, set <code>asynch = TRUE</code> in the call to the API.</p>
<p>This will return an object of class <code>&quot;gl_speech_op&quot;</code>, which should be used within the <code>gl_speech_op()</code> function to check the status of the task. If the task is finished, it will return an object of the same form as the non-asynchronous case.</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb4-1" data-line-number="1">async &lt;-<span class="st"> </span><span class="kw">gl_speech</span>(test_audio, <span class="dt">asynch =</span> <span class="ot">TRUE</span>)</a>
<a class="sourceLine" id="cb4-2" data-line-number="2">async</a>
<a class="sourceLine" id="cb4-3" data-line-number="3"><span class="co">## Send to gl_speech_op() for status</span></a>
<a class="sourceLine" id="cb4-4" data-line-number="4"><span class="co">## 4625920921526393240</span></a>
<a class="sourceLine" id="cb4-5" data-line-number="5"></a>
<a class="sourceLine" id="cb4-6" data-line-number="6">result &lt;-<span class="st"> </span><span class="kw">gl_speech_op</span>(async)</a></code></pre></div>
<div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb5-1" data-line-number="1">async &lt;-<span class="st"> </span><span class="kw">gl_speech</span>(test_audio, <span class="dt">asynch =</span> <span class="ot">TRUE</span>)</a>
<a class="sourceLine" id="cb5-2" data-line-number="2">async</a>
<a class="sourceLine" id="cb5-3" data-line-number="3"><span class="co">## Send to gl_speech_op() for status</span></a>
<a class="sourceLine" id="cb5-4" data-line-number="4"><span class="co">## 4625920921526393240</span></a>
<a class="sourceLine" id="cb5-5" data-line-number="5"></a>
<a class="sourceLine" id="cb5-6" data-line-number="6">result &lt;-<span class="st"> </span><span class="kw">gl_speech_op</span>(async)</a></code></pre></div>
</div>


28 changes: 28 additions & 0 deletions vignettes/text-to-speech.Rmd
@@ -80,6 +80,34 @@ gl_talk("Would you like a cup of tea?", gender = "FEMALE", languageCode = "en-GB"

Some languages are not yet supported, such as Danish. The API will return an error in those cases.
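
If you are looping over many languages you may want to catch that error rather than stop the script. A minimal sketch (the Danish language code `da-DK` is used purely as an illustration):

```r
# Catch the API error returned for unsupported languages
tryCatch(
  gl_talk("Hej verden", languageCode = "da-DK"),
  error = function(e) message("Not supported yet: ", conditionMessage(e))
)
```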

## Support for SSML

Support is also included for Speech Synthesis Markup Language (SSML). More details on using SSML to insert pauses, sounds and breaks in your audio can be found at `https://cloud.google.com/text-to-speech/docs/ssml`.

To use it, send in your SSML markup around the text you want spoken and set `inputType = "ssml"`:

```r
# using SSML
gl_talk('<speak>The <say-as interpret-as=\"characters\">SSML</say-as>
standard <break time=\"1s\"/>is defined by the
<sub alias=\"World Wide Web Consortium\">W3C</sub>.</speak>',
inputType = "ssml")
```

## Effect Profiles

You can output audio files that are optimised for playing on various devices.

To use audio profiles, supply a character vector of the available audio profiles listed at `https://cloud.google.com/text-to-speech/docs/audio-profiles`; the profiles are applied in the order given.

For instance, `effectsProfileIds = "wearable-class-device"` will optimise output for smart watches, while `effectsProfileIds = c("wearable-class-device", "telephony-class-application")` will apply sound filters optimised for smart watches and then for telephonic devices.

```r
# using effects profiles
gl_talk("This sounds great on headphones",
effectsProfileIds = "headphone-class-device")
```
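
As mentioned above, several profiles can be applied in order by passing a character vector, for example the smart-watch filter followed by the telephony filter (the spoken text here is just an illustration):

```r
# Apply two effects profiles in the order given:
# smart watch first, then telephony
gl_talk("Your call is important to us",
        effectsProfileIds = c("wearable-class-device",
                              "telephony-class-application"))
```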

## Browser Speech player

Creating and clicking on the audio file to play it can be a bit of a drag, so you also have a function that will play the audio file for you, launching via the browser. This can be piped via the tidyverse's `%>%`.
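
A short sketch of that piped usage, assuming the package's `gl_talk_player()` helper is the player function referred to above:

```r
library(magrittr)

# gl_talk_player() is assumed to be the browser-player helper
gl_talk("This audio plays in your browser") %>%
  gl_talk_player()
```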

