Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove copyTo call for iGenomes README #896

Closed
drpatelh opened this issue Nov 8, 2022 · 4 comments
Closed

Remove copyTo call for iGenomes README #896

drpatelh opened this issue Nov 8, 2022 · 4 comments
Labels
bug Something isn't working
Milestone

Comments

@drpatelh
Copy link
Member

drpatelh commented Nov 8, 2022

Description of the bug

We need to remove / refactor this call because it raises an error when accessing buckets across regions with AWS

// Save AWS IGenomes file containing annotation version
def anno_readme = params.genomes[ params.genome ]?.readme
if (anno_readme && file(anno_readme).exists()) {
file("${params.outdir}/genome/").mkdirs()
file(anno_readme).copyTo("${params.outdir}/genome/")
}

image

Maybe we document this instead and provide a command that can be used to fetch the README if required via the aws cli

$ aws s3 cp --no-sign-request s3://ngi-igenomes/igenomes/Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt .
download: s3://ngi-igenomes/igenomes/Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt to ./README.txt

~ cat README.txt
The contents of the annotation directories were downloaded from Ensembl on: July 17, 2015.

Gene annotation files were downloaded from Ensembl release 75. SmallRNA annotation files were downloaded from miRBase release 21.

Command used and terminal output

No response

Relevant files

No response

System information

No response

@robsyme
Copy link
Contributor

robsyme commented Nov 13, 2022

I mid-term solution might be to improve copyTo to detect bucket regions and in cases where regions don't match, download and then re-upload files.

The copyTo method calls down to FileHelper::copyPath and then (on S3, at least) to S3FileSystemProvider::copy.

To the copy method linked above, I was hoping that we could add to something like

AmazonS3Client client = s3Source.getFileSystem().getClient();

String sourceBucketLocation = client.getBucketLocation(s3Source.getBucket());
String targetBucketLocation = client.getBucketLocation(s3Target.getBucket());

if(sourceBucketLocation != targetBucketLocation) {
  Path staged = <some temporary destination in work/stage>
  download(source, staged)
  upload(staged, target)
} else {
  <the other tests already in the copy method>
}

My only hesitation is that the AWS documentation about the GetBucketLocation method suggests that to use the API, "you must be the bucket owner". I'm unsure why knowing the bucket location should be a secrete available only to the bucket owner though.

We can verify this restriction with some testing of iGenomes data though.

@drpatelh
Copy link
Member Author

Not sure it is worth all of this hassle to get a single file from an AWS bucket? Just means we can keep things simple and less prone to breakage in the pipeline trying to accommodate downloading the file. Docs is definitely the way ahead and we can extend on the commands I posted in the first comment and add directly to the website in the section below:
https://github.com/nf-core/nf-co.re/blob/master/markdown/usage/reference_genomes.md#illumina-igenomes

This would allow us to have a central location that we can point to without needing to add to every pipeline.

@drpatelh drpatelh added this to the 3.10 milestone Dec 12, 2022
drpatelh added a commit to drpatelh/nf-core-rnaseq that referenced this issue Dec 18, 2022
@drpatelh
Copy link
Member Author

Fixed in #908

@drpatelh
Copy link
Member Author

Docs added to main website here and available below:
https://nf-co.re/docs/usage/reference_genomes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants