Access Google Cloud Storage via NIO #6775

Merged · fm3 merged 35 commits from google-cloud into master on Feb 7, 2023

Conversation

@fm3 fm3 commented Jan 23, 2023

  • Adds support for remote datasets hosted on Google Cloud Storage gs:// with optional GoogleServiceAccountCredentials
  • Refactor NIO usage: do not obtain file systems via NIO's lookup by URL scheme, but handle this explicitly. This means a lot less magic and more direct, typesafe passing of the different kinds of credentials to the FileSystems. The META-INF service files are also no longer needed 🎉
  • Note: the paths passed to FileSystem.getPath are no longer full URIs but only paths inside the bucket scope. The GCS file system requires this, and the others support it as well. These paths have to be converted back to URIs when they are passed to MagLocator.
  • Slightly refactor the handling of credentials, rename some fields.
  • Use asynchronous caching for file system creation to avoid duplicate creation due to parallel requests.
  • Try to gunzip data returned by the NIO file system (I was surprised to occasionally receive gzipped data from a GCS bucket). Sketches of the cached file-system creation and the gunzip check follow after this list.
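
A minimal sketch of the explicit, cached file-system creation described above (not the PR's actual code; names such as RemoteCredential, FileSystemKey, and FileSystemCache are illustrative assumptions):

    import java.nio.file.FileSystem
    import scala.collection.concurrent.TrieMap
    import scala.concurrent.{ExecutionContext, Future}

    import com.google.cloud.storage.contrib.nio.CloudStorageFileSystem

    // Typed credentials instead of an untyped lookup keyed only by the URL scheme.
    sealed trait RemoteCredential
    final case class GoogleServiceAccountCredential(keyJson: String) extends RemoteCredential
    final case class S3AccessKeyCredential(keyId: String, secret: String) extends RemoteCredential

    final case class FileSystemKey(scheme: String, bucket: String, credential: Option[RemoteCredential])

    class FileSystemCache(implicit ec: ExecutionContext) {
      // The Future itself is cached, so parallel requests for the same key attach to
      // one in-flight creation instead of each creating their own FileSystem.
      private val cache = TrieMap.empty[FileSystemKey, Future[FileSystem]]

      def getOrCreate(key: FileSystemKey): Future[FileSystem] =
        cache.getOrElseUpdate(key, Future(create(key)))

      // Explicit dispatch by scheme: no ServiceLoader/META-INF lookup needed.
      private def create(key: FileSystemKey): FileSystem = key.scheme match {
        case "gs" =>
          // Anonymous access; a credentialed variant would build StorageOptions from
          // the service-account key JSON and pass them along (assumed here).
          CloudStorageFileSystem.forBucket(key.bucket)
        case other =>
          throw new IllegalArgumentException(s"Unsupported scheme: $other")
      }
    }

    // Paths are relative to the bucket, e.g. fileSystem.getPath("datasets/color/1"),
    // and are converted back to full gs://bucket/... URIs where a MagLocator needs them.

And a sketch of the opportunistic gunzip check, assuming the raw chunk bytes are available as an array:

    import java.io.ByteArrayInputStream
    import java.util.zip.GZIPInputStream

    // Decompress only if the payload starts with the gzip magic bytes 0x1f 0x8b.
    def tryGunzip(bytes: Array[Byte]): Array[Byte] =
      if (bytes.length >= 2 && bytes(0) == 0x1f.toByte && bytes(1) == 0x8b.toByte)
        new GZIPInputStream(new ByteArrayInputStream(bytes)).readAllBytes()
      else bytes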

URL of deployed dev instance (used for testing):

TODO

  • access GCS data anonymously
  • access GCS data with service account credential json
  • integrate GCS file system into managed file systems creation
  • unify file system handling
  • clean up credential naming, enums
  • re-check that s3 and https credentials and URI styles still work (rework so that paths never contain the URL scheme and host/bucket?)
  • add route to create GCS credential
  • adapt front-end for uploading service account credential json
  • remove fast-start changes to application.conf

Steps to test:

  • The front-end is not ready yet, but this feature can already be tested by pasting a gs:// Zarr URI and ignoring the front-end's warnings.
  • Adding a credential is also possible by pasting the service-account key JSON into whichever secret field is shown (and entering an arbitrary non-empty string into the username/key-ID field on the left); an example of the key JSON's shape follows after this list.
  • Also test that s3 and https datasets can still be explored and viewed, with and without credentials.
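
For reference, the service-account key JSON pasted into the secret field typically has the following shape (values elided and project/account names invented here; see https://cloud.google.com/iam/docs/creating-managing-service-account-keys for how to create one):

    {
      "type": "service_account",
      "project_id": "my-project",
      "private_key_id": "…",
      "private_key": "-----BEGIN PRIVATE KEY-----\n…\n-----END PRIVATE KEY-----\n",
      "client_email": "my-service-account@my-project.iam.gserviceaccount.com",
      "client_id": "…",
      "auth_uri": "https://accounts.google.com/o/oauth2/auth",
      "token_uri": "https://oauth2.googleapis.com/token"
    }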

Issues:


@fm3 fm3 self-assigned this Jan 23, 2023
@@ -311,7 +311,6 @@ function AddZarrLayer({
(userInput.indexOf("https://") !== 0 && userInput.indexOf("s3://") === 0)
) {
setSelectedProtocol(userInput.indexOf("https://") === 0 ? "https" : "s3");
setShowCredentialsFields(userInput.indexOf("s3://") === 0);
fm3 (Member, Author):

Note that all supported schemes now support their form of credentials, so this is no longer needed.

@fm3 fm3 marked this pull request as ready for review February 1, 2023 10:28
@fm3 fm3 requested a review from frcroth February 1, 2023 10:30
fm3 commented Feb 1, 2023

@frcroth I’d appreciate it if you could already have a look at the backend changes even though the frontend part is not yet complete. @philippotto agreed to adapt the front-end in the coming days.

@philippotto Great, thanks! Yes, the json is sent correctly :) Compare https://scm.slack.com/archives/C5AKLAV0B/p1675339672375929?thread_ts=1674548652.080749&cid=C5AKLAV0B for access to example data with credentials.

Perfect 👍 Exploration works, but the data fetch requests don't work for me. The front-end waits forever since the requests never finish (until they presumably time out). The console says:

java.lang.NoClassDefFoundError: Could not initialize class org.blosc.IBloscDll
        at com.scalableminds.webknossos.datastore.datareaders.BloscCompressor.cbufferSizes(Compressor.scala:254)
        at com.scalableminds.webknossos.datastore.datareaders.BloscCompressor.uncompress(Compressor.scala:240)
        at com.scalableminds.webknossos.datastore.datareaders.ChunkReader.$anonfun$readBytes$2(ChunkReader.scala:38)
        at scala.Option.map(Option.scala:230)
        at com.scalableminds.webknossos.datastore.datareaders.ChunkReader.$anonfun$readBytes$1(ChunkReader.scala:35)
        at scala.util.Using$Manager.scala$util$Using$Manager$$manage(Using.scala:171)
        at scala.util.Using$Manager$.$anonfun$apply$2(Using.scala:225)
        at scala.util.Try$.apply(Try.scala:213)
        at scala.util.Using$Manager$.apply(Using.scala:225)
        at com.scalableminds.webknossos.datastore.datareaders.ChunkReader.readBytes(ChunkReader.scala:34)
        at com.scalableminds.webknossos.datastore.datareaders.ChunkReader.read(ChunkReader.scala:31)
        at com.scalableminds.webknossos.datastore.datareaders.DatasetArray.$anonfun$getSourceChunkDataWithCache$1(DatasetArray.scala:96)
        at akka.http.caching.LfuCache$.$anonfun$toJavaMappingFunction$2(LfuCache.scala:97)
        at scala.compat.java8.functionConverterImpls.AsJavaBiFunction.apply(FunctionConverters.scala:41)
        at com.github.benmanes.caffeine.cache.LocalAsyncCache.lambda$get$2(LocalAsyncCache.java:92)
        at com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2413)
        at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
        at com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2411)
        at com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2394)
        at com.github.benmanes.caffeine.cache.LocalAsyncCache.get(LocalAsyncCache.java:91)
        at com.github.benmanes.caffeine.cache.LocalAsyncCache.get(LocalAsyncCache.java:82)
        at akka.http.caching.LfuCache.getOrLoad(LfuCache.scala:126)
        at com.scalableminds.webknossos.datastore.datareaders.DatasetArray.getSourceChunkDataWithCache(DatasetArray.scala:96)
        at com.scalableminds.webknossos.datastore.datareaders.DatasetArray.$anonfun$readAsFortranOrder$4(DatasetArray.scala:77)
        at com.scalableminds.util.tools.Fox$.runNext$3(Fox.scala:131)
        at com.scalableminds.util.tools.Fox$.serialCombined(Fox.scala:137)
        at com.scalableminds.webknossos.datastore.datareaders.DatasetArray.readAsFortranOrder(DatasetArray.scala:75)
        at com.scalableminds.webknossos.datastore.datareaders.DatasetArray.readBytes(DatasetArray.scala:54)
        at com.scalableminds.webknossos.datastore.datareaders.DatasetArray.readBytesXYZ(DatasetArray.scala:46)
        at com.scalableminds.webknossos.datastore.dataformats.zarr.ZarrCubeHandle.cutOutBucket(ZarrBucketProvider.scala:23)
        at com.scalableminds.webknossos.datastore.dataformats.BucketProvider.$anonfun$load$2(BucketProvider.scala:23)
        at com.scalableminds.webknossos.datastore.storage.DataCubeCache.$anonfun$withCache$5(DataCubeCache.scala:76)
        at com.scalableminds.util.tools.Fox.$anonfun$flatMap$1(Fox.scala:259)
        at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
        at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:41)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
        at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)

Is there something wrong with my setup? I did a ./clean after checking out this branch.

  • The label of the dropzone says "(Optional)", which seems a bit redundant, since there is already a radio selection above between anonymous access and access with credentials. Maybe the label could be removed (or remove the radio selection and make the auth fields optional for all cases?).

Done 👍

  • If I press reset after a dataset has been recovered, the state of which protocol was selected seems to be lost. The gs:// URI is still in the field, but the auth fields show the basic-auth case. Not super important, since the reset button will probably be used rarely (shrug).

Good point. I adapted the reset button to also reset the original input field.

fm3 commented Feb 3, 2023

Thanks!

The error reads to me as if blosc is not installed. That is a data (de)compression library. Does loading other blosc-compressed zarr datasets work for you? Could you try apt install libblosc1? Compare https://github.com/scalableminds/webknossos/blob/master/DEV_INSTALL.md

@fm3 fm3 requested a review from frcroth February 6, 2023 09:01

@daniel-wer daniel-wer left a comment


Frontend code almost LGTM. I'll try to test and will report back.

Comment on lines 37 to 38
const jsonString = await readFileAsText(file);
return JSON.parse(jsonString);
Should this be guarded somehow? The page shouldn't crash if a file with the wrong format was uploaded.

I tested this case. Currently, the page doesn't crash, but also nothing happens and no error is shown.

Should give a proper error msg now :)

if (credentials) {
return exploreRemoteDataset([datasourceUrl], {
username: "",
pass: JSON.stringify(credentials),
I tried to find out which type credentials has, but it's not strictly specified in wk. The only thing I found was Record<string, any> in NeuroglancerDatasetConfig. Is there a more specific definition of the credential type? And is stringifying it and passing all of it as the password here correct?

Ah, I found the link to the documentation which shows what the credentials file looks like (https://cloud.google.com/iam/docs/creating-managing-service-account-keys?hl=de#creating). In that case, there doesn't need to be a more specific type, and it also makes sense to pass all of it as pass here, so never mind :)

philippotto and others added 2 commits February 6, 2023 11:51
Co-authored-by: Daniel <daniel.werner@scalableminds.com>
daniel-wer commented Feb 6, 2023

Works very nicely 👍

Notes from testing:

  • The "Add Remote Zarr / N5 Dataset" page should list which storage types are supported: https, s3, gcs afaik. There is a validation error if a non-supported url is pasted, but it would be nice to find out, before.
    • There is a typo on the page: "segmentattion" with double t

Not from this PR, but I really like the error log if the import doesn't work! 🥇

@philippotto

[x] The "Add Remote Zarr / N5 Dataset" page should list which storage types are supported: https, s3, gcs afaik. There is a validation error if a non-supported url is pasted, but it would be nice to find out, before.

  • There is a typo on the page: "segmentattion" with double t

Done :)

@fm3 fm3 enabled auto-merge (squash) February 7, 2023 08:59
@fm3 fm3 merged commit a185e65 into master Feb 7, 2023
@fm3 fm3 deleted the google-cloud branch February 7, 2023 09:18
hotzenklotz added a commit that referenced this pull request Feb 7, 2023
…_editable_text_style

* 'master' of github.com:scalableminds/webknossos:
  Fix error message when trying to join an orga you are already in (#6824)
  Access Google Cloud Storage via NIO (#6775)
hotzenklotz added a commit that referenced this pull request Feb 7, 2023
…a_owner

* 'master' of github.com:scalableminds/webknossos:
  Fix error message when trying to join an orga you are already in (#6824)
  Access Google Cloud Storage via NIO (#6775)

Successfully merging this pull request may close these issues.

Support Zarr/N5 streaming from Google Cloud Storage
4 participants