Create worker code from module instead of from string #563

jeshua-clipchamp · 2022-08-31T03:44:22Z

What's the problem?

In Timeout.ts a worker is loaded from a hard-coded string within the file. This doesn't play nicely with the Content-Security-Policy, where if you allow creating workers from strings you have to let anyone create a worker from a string (as far as I could tell there is no way to limit it to specific libraries, worker code with specific hashes, from files with specific hashes etc.). This is a pretty big security hole for something which is relatively unimportant (my understanding of this worker is to provide accurate timing if the user changes tabs, is that correct?).

We are currently working around this by patching the source code on checkout to just use the standard setTimeout:

diff --git a/distrib/es2015/src/common.speech/ServiceRecognizerBase.js b/distrib/es2015/src/common.speech/ServiceRecognizerBase.js
index 2fba3f3f26e134e720c78e9a692ff84c4e0d3898..1865fcc0aaf20768d85877b3dae50901b3a394ec 100644
--- a/distrib/es2015/src/common.speech/ServiceRecognizerBase.js
+++ b/distrib/es2015/src/common.speech/ServiceRecognizerBase.js
@@ -557,7 +557,7 @@ export class ServiceRecognizerBase {
     }
     delay(delayMs) {
         return new Promise((resolve, reject) => {
-            this.privSetTimeout(resolve, delayMs);
+            setTimeout(resolve, delayMs);
         });
     }
     writeBufferToConsole(buffer) {

Possible solutions

Move the worker code into a file which can be compiled/bundled along with everything else (then worker-src 'self' should work)
Add an option to not use the worker (so we don't have to patch things)
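
The second option could look roughly like the sketch below. The names makeDelay, useWorker and workerTimeout are hypothetical and are not part of the Speech SDK; the sketch only assumes that the worker-backed timer has the same (callback, delayMs) shape as setTimeout.

```javascript
// Hypothetical sketch: none of these names come from the Speech SDK.
// workerTimeout stands in for a worker-backed timer with a setTimeout-like signature.
function makeDelay(options = {}) {
  const { useWorker = true, workerTimeout } = options;

  // Use the worker-backed timer only when it is both enabled and supplied;
  // otherwise fall back to the standard setTimeout, which may be throttled
  // when the tab is in the background or the browser is minimized.
  const schedule =
    useWorker && typeof workerTimeout === "function"
      ? workerTimeout
      : (callback, delayMs) => setTimeout(callback, delayMs);

  return (delayMs) => new Promise((resolve) => schedule(resolve, delayMs));
}
```

With useWorker set to false no worker is ever created, so a strict worker-src policy would not be affected; the cost is the less accurate timing described above.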


glharper · 2022-09-05T16:26:41Z

@jeshua-clipchamp, thank you for using the Speech SDK and for writing up this issue. This admittedly obtuse method of setting timeouts is a workaround that allows recognition to continue when the browser is minimized. (See #74 for more information.)

Regressing this behavior is not tenable, obviously, and I can't make any promises yet, but adding an option not to use the worker seems doable, as long as the browser-minimized regression is not something you care about when that option is set. I'll update this issue when I get a chance to investigate further.

glharper · 2022-09-08T18:27:48Z

@jeshua-clipchamp Having investigated this issue, I'm curious why the worker in PCMRecorder (created from a string) is acceptable, but the timeout worker is not.

I'm also curious about the issue you're describing with Content-Security-Policy, as you can specify the allowed URL of any workers using the worker-src directive, described here.

Would you mind elaborating a bit on why worker-src doesn't work for your use case?

glharper · 2022-09-16T13:51:16Z

Closing this for lack of response to the follow-up questions; may reopen on response.

jeshua-clipchamp · 2022-09-28T01:49:52Z

Whoops, sorry about the lack of response. GitHub decided not to notify me about your replies 😅

For your first question, I assume that PCMRecorder is OK because we don't use the record functionality (we are pulling out audio from an existing video file and sending that directly rather than captioning as it is spoken). As long as it never gets executed, I guess CSP is totally fine 😄

For your second point, worker-src lets you specify a domain where worker code can be loaded from, but because these files are loaded from a string it can't figure out that the code for the string came from the same source file (I guess they account for the case where you manually fetch the code and then create a worker via string rather than creating it from a URL). From my testing the only way to get it loading is to allow any blob through (by specifying worker-src blob:), but there is no way to specify exactly where the blob is allowed to come from.
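
The limitation described above can be illustrated with a deliberately simplified model of worker-src matching. This is not the browser's actual CSP algorithm; the function name isWorkerUrlAllowed and the example origin are assumptions made for illustration only.

```javascript
// Simplified model of CSP worker-src matching, for illustration only.
// Real browsers implement many more rules (wildcards, paths, ports, redirects).
function isWorkerUrlAllowed(workerUrl, workerSrcSources, pageOrigin) {
  // Scheme including the trailing colon, e.g. "blob:" or "https:"
  const scheme = workerUrl.slice(0, workerUrl.indexOf(":") + 1);

  return workerSrcSources.some((source) => {
    if (source === "'self'") {
      // 'self' matches network URLs on the page's own origin, not blob: URLs
      return workerUrl.startsWith(pageOrigin + "/");
    }
    if (source.endsWith(":")) {
      // A scheme source such as blob: matches every URL with that scheme
      return source === scheme;
    }
    // Otherwise treat the source as an exact origin, e.g. https://cdn.example
    return workerUrl.startsWith(source + "/");
  });
}

const pageOrigin = "https://app.example"; // hypothetical origin

// A worker shipped as a file is covered by 'self'.
isWorkerUrlAllowed("https://app.example/timer-worker.js", ["'self'"], pageOrigin);
// A worker created from a string (blob URL) is not covered by 'self'.
isWorkerUrlAllowed("blob:https://app.example/1b2c", ["'self'"], pageOrigin);
// Adding blob: lets it load, but also allows a blob worker created by any script on the page.
isWorkerUrlAllowed("blob:https://app.example/1b2c", ["'self'", "blob:"], pageOrigin);
```

In this model the only way to admit the string-based worker is the blanket blob: source, which carries no information about which script produced the blob.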

Having the worker code as a separate file and loading it from that should be enough to fix this (because then we could use worker-src 'self' and it should work). I'm not sure how to do that with your package/deployment though 🙂

glharper · 2022-09-29T23:42:13Z

@jeshua-clipchamp When looking at this before, using a data URL seemed to be a way forward that works with our current build processes. See this branch's Timeout.ts for an example of loading a base64 string as a worker. Let me know your thoughts.
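
For readers without access to that branch, loading a base64 string as a worker might look roughly like the following. This is a minimal sketch and not the SDK's actual Timeout.ts; the helper name toWorkerDataUrl, the MIME type and the worker body are all assumptions.

```javascript
// Hypothetical helper, not part of the Speech SDK.
// Encodes worker source code as a base64 data: URL.
function toWorkerDataUrl(workerSource) {
  // btoa only accepts Latin-1 characters, so this sketch assumes ASCII-only worker code.
  return "data:text/javascript;base64," + btoa(workerSource);
}

// Example worker body (assumed): replies to each message after the requested delay.
const timerWorkerSource =
  "onmessage = (e) => setTimeout(() => postMessage(e.data.id), e.data.delayMs);";

const timerWorkerUrl = toWorkerDataUrl(timerWorkerSource);

// In a browser, the worker would then be created with:
//   const worker = new Worker(timerWorkerUrl);
```

The worker is still created from a string, only via a data: URL rather than a blob: URL, which is the concern raised in the replies that follow.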

jeshua-clipchamp · 2022-10-04T00:27:58Z

I suspect that would have the same kind of problem (it's still loading a worker from a string after all). Another option might be to allow users to supply the worker themselves, then we could do some shenanigans to build the worker from a file or something?

glharper · 2022-10-04T11:47:12Z

"I suspect that would have the same kind of problem (it's still loading a worker from a string after all)."

@jeshua-clipchamp From Using Web Workers:
"To specify a content security policy for the worker, set a Content-Security-Policy response header for the request which delivered the worker script itself.

The exception to this is if the worker script's origin is a globally unique identifier (for example, if its URL has a scheme of data or blob). In this case, the worker does inherit the CSP of the document or worker that created it."

jeshua-clipchamp · 2022-10-06T03:31:11Z

Right, but this means you have to include blob: as a scheme for loading workers, and there is no way to restrict where the blob comes from (or what the SHA of its contents is). If we allow workers to be made from blobs, everything would be able to make workers from blobs, rather than what we have now, where workers can only be created from self.

glharper · 2022-10-06T15:55:23Z

@jeshua-clipchamp In the upcoming release, the code will be included as a data URL, as shown in the merged PR above.

jeshua-clipchamp · 2022-10-09T23:22:14Z

Hmmm I'll give that a go when it comes out, thanks for the fix :) I'm interested as to whether you can only allow data URLs from specific domains with CSP but I guess we'll see :D

glharper · 2022-10-17T14:48:39Z

@jeshua-clipchamp JS Speech SDK v1.24 has been released, with this fix included. Thanks again for submitting this issue!

ishowta · 2023-02-13T03:25:15Z

I'm not sure about security, but is there any difference between blob: and data: URLs other than the encoding method?

me4502 · 2023-06-14T00:45:57Z

From what I can tell the provided fix might not resolve the original concern, as the data: scheme in CSP does not allow any further specificity (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy/Sources#sources). This means by enabling it we'd be allowing any data URLs to be used as the source for a worker, which is still a fairly large security nightmare.

Would it be possible to please get an option to disable this worker in high-security environments, with the caveat that accurate timing information will be lost when the tab is minimised? For our specific use case we're already patching this out and just using a standard browser setTimeout, so it would not be a regression for us, and I imagine there are others who would take this trade-off over the added security implications.

glharper · 2023-06-14T11:53:34Z

@me4502 Thanks for looking at and thinking about this issue. Can the "hash-algorithm"-"base64-value" source not be applied to this worker?

me4502 · 2023-06-19T00:39:17Z

After testing that, it does appear to work. Thanks for pointing that one out, I was just looking at the data: scheme.

My only concern with this is that when the library updates it's an extra step to ensure that the base64'd hash is updated alongside it, but that's not too major an issue.

me4502 · 2023-06-19T05:48:21Z

Actually, we've found that while the hash I set up works on macOS (the OS I initially tested on), the value appears to differ on Windows. I'm unsure whether this is encoding-related, but given that it differs here, there's a possibility it will differ further based on other factors. Because of this, it doesn't seem to be a viable fix.

glharper self-assigned this Sep 5, 2022

glharper added the "question" (Further information is requested) and "pending close" (Ready for closure pending follow-up or prolonged inactivity) labels Sep 8, 2022

glharper closed this as completed Sep 16, 2022

glharper reopened this Sep 29, 2022

glharper mentioned this issue Sep 29, 2022

Create audio worklet blob on-demand #574

Merged

glharper removed the "pending close" (Ready for closure pending follow-up or prolonged inactivity) label Sep 29, 2022

glharper mentioned this issue Sep 29, 2022

use base64 encoded data URL for worker load #577

Merged

glharper removed the "question" (Further information is requested) label Oct 5, 2022

glharper closed this as completed Oct 17, 2022
