UKB storage to upload processed bulk data #11

Closed
sina-mansour opened this issue Nov 24, 2021 · 3 comments
Labels
Awaiting discussion This issue is under discussion.

Comments

@sina-mansour
Owner

Following on the suggestion by @Lestropie (this commit):

RS: As per discussion, need to find out how much data can be uploaded per subject to UKB
(and indeed what volume of data could potentially be hosted elsewhere). Any temporaries that
are not to be later hosted anywhere are better off being stored on a RAM file system.
My typical approach here is to load all input data into a scratch directory that I can force
to be in /tmp/, store all intermediate files and final outputs there, and only upon script
completion do I then write the desired derivatives to the location requested by the user.
I then only retain the scratch directory if the user explicitly requests that it be retained.
Your structure here checks for the pre-existence of calculated files, which is useful when you
are testing perturbations to the script, but for final deployment this ability is not as high
a priority.
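
The scratch-directory pattern described in the quote could look roughly like the sketch below. This is only an illustration, not the code used in this repository: `run_with_scratch`, its parameters, and the `process` callback are hypothetical names. Note that `/tmp` is only RAM-backed on systems where it is mounted as tmpfs, so `scratch_root` is left configurable.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_scratch(inputs, output_dir, process, keep_scratch=False, scratch_root=None):
    """Run `process` inside a scratch directory and export only its derivatives.

    `process(local_inputs, scratch)` is a hypothetical callback that writes
    whatever intermediates it likes into `scratch` and returns the paths of
    the files that should be kept.
    """
    output_dir = Path(output_dir)
    # scratch_root=None uses the system temp dir; pass "/tmp" to force it there.
    scratch = Path(tempfile.mkdtemp(prefix="scratch_", dir=scratch_root))
    try:
        # Load all input data into the scratch directory first.
        local_inputs = []
        for src in inputs:
            dst = scratch / Path(src).name
            shutil.copy2(src, dst)
            local_inputs.append(dst)

        # Intermediates and final outputs are all written inside scratch.
        derivatives = process(local_inputs, scratch)

        # Only on successful completion are derivatives written for the user.
        output_dir.mkdir(parents=True, exist_ok=True)
        for derivative in derivatives:
            shutil.copy2(derivative, output_dir / Path(derivative).name)
    finally:
        # Retain scratch only if explicitly requested.
        if not keep_scratch:
            shutil.rmtree(scratch, ignore_errors=True)
    return scratch
```

With this layout a failed run leaves nothing behind in the output location, and `keep_scratch=True` covers the debugging case where the intermediates need to be inspected.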

@sina-mansour
Owner Author

We're currently storing all intermediary files on the scratch file system. The following processed data are stored/will be stored to be shared with the public:

  • Native atlases:
    • Surface atlases registered to the native space are available as NIfTI volumetric hard parcellations and will be released as such
    • These atlases include 20 from Schaefer et al. as well as the HCP MMP1.0 atlas
  • Functional time-series:
    • We have provided the resting-state time-series for all of the atlases (we decided to provide the time-series rather than a correlation connectivity measure, as this increases the possible use cases)
    • We have also provided a global signal time-series for studies aiming to apply global signal regression
  • Structural connectivity:
    • We will provide high-resolution endpoints in native/MNI for all mapped streamlines (only the ends of tractograms, to reduce size)
    • We will also provide a wide range of connectivity measures (streamline count, FBC, density, length, etc.) mapped to different atlases.

We'll need to ensure that we can somehow upload three sets of bulk data for every individual back to the UKB storage:

  1. atlases
  2. functional time-series
  3. structural connectivity measures
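
To have concrete numbers when asking about limits, the per-subject volume of each of the three sets could be tallied with something like the sketch below. The directory layout (`<root>/<subject>/<category>/...`) and the category folder names are assumptions made for this example, not the actual layout of our outputs.

```python
from pathlib import Path

# Hypothetical folder names for the three sets of bulk data.
CATEGORIES = ("atlases", "functional_timeseries", "structural_connectivity")


def directory_size(path):
    """Total size in bytes of all regular files under `path`."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())


def per_subject_sizes(root):
    """Map each subject directory under `root` to its per-category size in bytes."""
    report = {}
    for subject_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        sizes = {}
        for category in CATEGORIES:
            category_dir = subject_dir / category
            # Missing categories count as zero bytes.
            sizes[category] = directory_size(category_dir) if category_dir.is_dir() else 0
        report[subject_dir.name] = sizes
    return report
```

Summing the values for each subject, and taking the maximum across subjects, would give the per-subject figure to quote in the enquiry.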

@caioseguin would you be able to ask UK Biobank whether they will accept that, and whether there are any limits we need to adhere to?

@sina-mansour added the "Awaiting discussion" label Nov 24, 2021
@caioseguin
Collaborator

I will ask them and get back to you.

@sina-mansour
Owner Author

This issue has been left dormant for a while. The latest update is that we were able to return the results to UK Biobank over a secure SFTP connection (using MediaFlux).

UKB has informed us that the resource should be made available in a new release (planned for November 2023).
