Inclusion of mm10 JASPAR prediction track into LOLA JASPAR #28

Open
holgerbrandl opened this issue Oct 16, 2018 · 3 comments

holgerbrandl commented Oct 16, 2018

Would it be possible to integrate the recently published JASPAR binding predictions for mm10 into LOLA JASPAR?
See http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2018/mm10/ for the data, http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2018/ for the method details, and http://jaspar.genereg.net/genome-tracks/ for a general overview of JASPAR's new track feature.

Since it's already a region dataset (a BED file), converting it into the LOLA DB format may be straightforward.

@holgerbrandl
Author

Any news? Either a positive or a negative answer would be helpful to me.

Owner

nsheff commented Oct 24, 2018

I am looking into it now.

Owner

nsheff commented Oct 26, 2018

Hi @holgerbrandl,
I've looked into it; there are 9 billion regions in that file. I'm not sure LOLA will be able to handle the file in its entirety. Do you have any ideas for reducing the number of regions?

If you want, you can give it a try. This code will split the file into individual BED files, one for each factor:

wget http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2018/mm10/JASPAR2018_mm10_all_chr.bed.gz
mkdir -p mm10/jaspar2018/regions
time zcat JASPAR2018_mm10_all_chr.bed.gz | sed 's/[:.()]/_/g' | sed 's/__/_/g' | awk '{print $0 > ("mm10/jaspar2018/regions/" $4 ".bed")}'

LOLA should be able to load them (see http://databio.org/regiondb).
I'm trying this now, but I think it's probably going to be problematic if you don't have a lot of memory...
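
Given the size of the file, it may be worth trying the split on a small sample first. The following is only a sketch under stated assumptions: `split_bed_by_factor` is a hypothetical helper name, the input is assumed to be tab-separated BED with the factor name in column 4, and, unlike the one-liner above, it sanitises only the name column rather than the whole line.

```shell
# Hypothetical helper (not part of LOLA or JASPAR): read BED records on stdin
# and write one file per factor name (column 4) into the given directory.
split_bed_by_factor() {
  outdir="$1"
  mkdir -p "$outdir"
  awk -v dir="$outdir" 'BEGIN { FS = OFS = "\t" }
  {
    name = $4
    gsub(/[:.()]/, "_", name)   # same character class as the sed step above
    gsub(/__+/, "_", name)      # collapse runs of underscores into one
    $4 = name                   # keep the name column consistent with the file name
    # Parenthesise the target so the concatenation is unambiguous.
    # Note: awk implementations other than gawk may hit an open-file limit
    # when there are many distinct factors.
    print $0 > (dir "/" name ".bed")
  }'
}

# Example trial on the first million lines (assumes the file from the wget step):
# zcat JASPAR2018_mm10_all_chr.bed.gz | head -n 1000000 | split_bed_by_factor mm10/jaspar2018/regions
```

Counting the lines in the resulting files (for example with `wc -l mm10/jaspar2018/regions/*.bed`) should give a rough idea of how the regions are distributed across factors before committing to the full run.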
