Update blacklist files #250

Closed
JoseEspinosa wants to merge 2 commits from the update_blacklst branch

JoseEspinosa
Member

Update the blacklists now that new versions are available, as discussed in #228.
The previous versions of the blacklists have been moved to the ./assets/blacklists/v1.0 folder.
The files in ./assets/blacklists/v2.0 were obtained from the ENCODE Blacklist GitHub repo.
The file in ./assets/blacklists/v3.0 was downloaded from: https://sites.google.com/site/anshulkundaje/projects/blacklists
The new structure of the blacklist folder is:

.
├── v1.0
│   ├── GRCh37-blacklist.bed
│   ├── GRCm38-blacklist.bed
│   ├── hg19-blacklist.bed
│   ├── hg38-blacklist.bed
│   └── mm10-blacklist.bed
├── v2.0
│   ├── ce10-blacklist.v2.bed
│   ├── dm6-blacklist.v2.bed
│   ├── hg19-blacklist.v2.bed
│   ├── hg38-blacklist.v2.bed
│   └── mm10-blacklist.v2.bed
└── v3.0
    └── GRCh38_unified_blacklist.v3.bed

Closes #228

PR checklist

  • This comment contains a description of changes (with reason)
  • If you've fixed a bug or added code that should be tested, add tests!
  • If necessary, also make a PR on the nf-core/chipseq branch on the nf-core/test-datasets repo
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Make sure your code lints (nf-core lint .).
  • Documentation in docs is updated
  • CHANGELOG.md is updated
  • README.md is updated

Member

@drpatelh drpatelh left a comment

Silly question, but did you make sure the chromosome names match between the assembly genome FASTA and the BED files? The blacklist files are generally provided with UCSC chromosome names, whereas we use Ensembl in most places.
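
One way to run that check is sketched below. It compares the sequence names in a FASTA file with the chromosome names in column 1 of a BED file and reports the BED names that the FASTA lacks. The file names and contents are hypothetical stand-ins created inside the script so that it runs on its own; they are not the pipeline's real assets.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical example inputs (assumptions, not the real genome or blacklist).
workdir="$(mktemp -d)"
printf '>1 dna:chromosome\nACGT\n>2 dna:chromosome\nACGT\n' > "$workdir/genome.fa"
printf 'chr1\t100\t200\nchr2\t300\t400\n' > "$workdir/blacklist.bed"

# Sequence names in the FASTA: first word of each header line, without '>'.
grep '^>' "$workdir/genome.fa" | sed 's/^>//' | cut -d ' ' -f 1 | sort -u > "$workdir/fasta_names.txt"

# Chromosome names used in the BED file (column 1).
cut -f 1 "$workdir/blacklist.bed" | sort -u > "$workdir/bed_names.txt"

# comm -13 keeps only the lines unique to the second file:
# names used in the BED file that do not exist in the FASTA.
missing="$(comm -13 "$workdir/fasta_names.txt" "$workdir/bed_names.txt")"

if [ -n "$missing" ]; then
    echo "Chromosome names in the BED file but not in the FASTA:"
    echo "$missing"
fi
```

With these sample inputs every BED name is reported, because the FASTA uses Ensembl-style names (1, 2) and the BED file uses UCSC-style names (chr1, chr2).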

@JoseEspinosa
Member Author

No, I didn't 😭
Is replacing the chromosome identifiers the only thing that needs doing, or is there something else?
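
For the identifier replacement itself, a minimal sketch might look like the following. It assumes that only the main chromosomes appear in the file, that stripping the `chr` prefix is enough for them, and that `chrM` maps to `MT`. Unplaced or alternate contigs would need an explicit mapping table, which is not covered here. The input file is a hypothetical example created by the script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical UCSC-style BED file (an assumption for illustration).
workdir="$(mktemp -d)"
printf 'chr1\t100\t200\nchrX\t300\t400\nchrM\t1\t50\n' > "$workdir/ucsc.bed"

# Rewrite column 1 from UCSC-style to Ensembl-style names.
awk 'BEGIN { FS = OFS = "\t" }
     {
         if ($1 == "chrM") { $1 = "MT" }   # mitochondrial genome has a different name
         else { sub(/^chr/, "", $1) }      # chr1 -> 1, chrX -> X
         print
     }' "$workdir/ucsc.bed" > "$workdir/ensembl.bed"

cat "$workdir/ensembl.bed"
```

The thread does not settle whether anything beyond renaming is required, so the converted file would still be worth checking against the genome FASTA.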

@drpatelh
Member

drpatelh commented Mar 2, 2022

Commands documenting how we downloaded the blacklists:

# Assumed to be run from the root of the pipeline repository
mkdir -p assets/blacklists/v1.0/
cd assets/blacklists/v1.0/
wget -L https://www.encodeproject.org/files/ENCFF001TDO/@@download/ENCFF001TDO.bed.gz && gunzip ENCFF001TDO.bed.gz && mv ENCFF001TDO.bed hg19-blacklist.v1.bed

cd ..
mkdir -p v2.0
cd v2.0
wget -L https://raw.githubusercontent.com/Boyle-Lab/Blacklist/master/lists/ce10-blacklist.v2.bed.gz && gunzip ce10-blacklist.v2.bed.gz
wget -L https://raw.githubusercontent.com/Boyle-Lab/Blacklist/master/lists/ce11-blacklist.v2.bed.gz && gunzip ce11-blacklist.v2.bed.gz
wget -L https://raw.githubusercontent.com/Boyle-Lab/Blacklist/master/lists/dm3-blacklist.v2.bed.gz && gunzip dm3-blacklist.v2.bed.gz
wget -L https://raw.githubusercontent.com/Boyle-Lab/Blacklist/master/lists/dm6-blacklist.v2.bed.gz && gunzip dm6-blacklist.v2.bed.gz
wget -L https://raw.githubusercontent.com/Boyle-Lab/Blacklist/master/lists/hg19-blacklist.v2.bed.gz && gunzip hg19-blacklist.v2.bed.gz
wget -L https://raw.githubusercontent.com/Boyle-Lab/Blacklist/master/lists/hg38-blacklist.v2.bed.gz && gunzip hg38-blacklist.v2.bed.gz
wget -L https://raw.githubusercontent.com/Boyle-Lab/Blacklist/master/lists/mm10-blacklist.v2.bed.gz && gunzip mm10-blacklist.v2.bed.gz

cd ..
mkdir -p v3.0
cd v3.0
wget -L https://www.encodeproject.org/files/ENCFF356LFX/@@download/ENCFF356LFX.bed.gz && gunzip ENCFF356LFX.bed.gz && mv ENCFF356LFX.bed hg38-blacklist.v3.bed

TODO

  • v1.0 - hg19-blacklist.v1.bed -> GRCh37-blacklist.v1.bed (rename original version for GRCh37)
  • v2.0 - mm10-blacklist.v2.bed -> GRCm38-blacklist.v2.bed
  • v3.0 - hg38-blacklist.v3.bed -> GRCh38-blacklist.v3.bed
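
The TODO items above could be scripted roughly as follows. This sketch reads each arrow as a plain rename; if the intent is to keep both the UCSC-named and the GRC-named file, `cp` would replace `mv`. It works on empty placeholder files in a temporary directory (an assumption for illustration), not on the real assets folder.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder directory tree standing in for assets/blacklists (hypothetical).
workdir="$(mktemp -d)"
mkdir -p "$workdir/v1.0" "$workdir/v2.0" "$workdir/v3.0"
touch "$workdir/v1.0/hg19-blacklist.v1.bed" \
      "$workdir/v2.0/mm10-blacklist.v2.bed" \
      "$workdir/v3.0/hg38-blacklist.v3.bed"

# Each line is "old path, new path", taken from the TODO list.
while read -r old new; do
    mv "$workdir/$old" "$workdir/$new"
done <<'EOF'
v1.0/hg19-blacklist.v1.bed v1.0/GRCh37-blacklist.v1.bed
v2.0/mm10-blacklist.v2.bed v2.0/GRCm38-blacklist.v2.bed
v3.0/hg38-blacklist.v3.bed v3.0/GRCh38-blacklist.v3.bed
EOF

ls "$workdir"/v*/
```

Renaming alone does not change the chromosome names inside the files, so the UCSC versus Ensembl question raised earlier in the thread still applies to the renamed files.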

@drpatelh drpatelh mentioned this pull request Mar 3, 2022
@JoseEspinosa
Member Author

JoseEspinosa commented Mar 5, 2022

Closed in favor of #255 and #257

@JoseEspinosa JoseEspinosa deleted the update_blacklst branch April 20, 2022 09:49

2 participants