Running pipeline offline in trusted research environment #523

Open
chrisodhams opened this issue Feb 29, 2024 · 1 comment

Comments

@chrisodhams

Hi,

Following up on a previous issue: #319

I am attempting to run the pipeline in a secure research environment where outbound internet access is prohibited for patient confidentiality. "The RE is a secure and controlled environment. This means that there is no internet access from the RE and data contained within the RE cannot be exported in its raw form. This policy is to protect the privacy [of] our participants who have generously donated their genomes and clinical history for research. It is your responsibility to comply with the terms of use."

Working in air-gapped environments like this has become commonplace.

The pipeline seems to require outbound internet access at certain stages, for example:

AberrantSplicing_pipeline_Counting_01_1_countRNA_splitReads_samplewise_R
Load packages
Loading required package: rtracklayer
Thu Feb 29 16:30:09 2024: Count split reads for sample: HG00096
Error in download.file(url, destfile, quiet = TRUE) : 
  cannot open URL 'http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz'
Calls: countSplitReads ... fetch_table_from_UCSC_database -> fetch_table_from_url
Execution halted

In this case it is simply downloading a static file, 'chromInfo.txt.gz'. If so, could this file (and its equivalents for other assemblies) be shipped as part of the standard resources, with the code pointing to the local copy?
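A minimal sketch of what "pointing the code at a local file" could look like, written in Python for illustration only (the pipeline step itself is R, and this is not its actual code). It assumes the file has been staged in advance at a hypothetical path such as `resources/hg19/chromInfo.txt.gz`, and that it follows the UCSC chromInfo layout: tab-separated, no header, chromosome name in column 1 and length in bp in column 2. `load_chrom_sizes` is a hypothetical helper, not part of the pipeline or of rtracklayer.

```python
import csv
import gzip
from pathlib import Path


def load_chrom_sizes(path):
    """Read a locally staged UCSC chromInfo.txt.gz into {chrom: length_bp}.

    Hypothetical helper. Assumes the UCSC chromInfo layout: tab-separated,
    no header, column 1 = chromosome name, column 2 = length in bp.
    Any extra columns (e.g. fileName) are ignored.
    """
    path = Path(path)
    if not path.is_file():
        # Fail fast rather than falling back to a network download,
        # which would fail anyway inside an offline environment.
        raise FileNotFoundError(f"chromInfo file not found: {path}")
    sizes = {}
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row:
                continue  # skip blank lines
            sizes[row[0]] = int(row[1])
    return sizes
```

For example, `load_chrom_sizes("resources/hg19/chromInfo.txt.gz")["chr1"]` would return the chr1 length from the staged copy, with no network access. Raising on a missing file is a deliberate choice here: a silent fallback to `download.file` would only reproduce the error above.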

Is outbound internet access required anywhere else? If so, this is unfortunately a blocker at Genomics England.

Many thanks,

Chris

@vyepez88
Collaborator

vyepez88 commented Apr 8, 2024

Hi Chris, yes, we're aware of that problem. It should only happen if the data are unstranded, though.
That is the only step in the whole pipeline that requires an internet connection.
We're trying to come up with a solution.
