Hi @holgerbrandl,
I've looked into it; there are 9 billion regions in that file. I'm not sure LOLA will be able to do that in its entirety. Do you have any ideas for reducing that?
If you want, you can give it a try. This code splits the file into one BED file per factor:
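wget http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2018/mm10/JASPAR2018_mm10_all_chr.bed.gz
mkdir -p mm10/jaspar2018/regions
# In every line, replace ':', '.', '(' and ')' with '_', then each '__' with '_';
# awk then appends the line to a file named after column 4 (the factor).
time zcat JASPAR2018_mm10_all_chr.bed.gz | sed -e 's/[:.()]/_/g' -e 's/__/_/g' | awk '{print > ("mm10/jaspar2018/regions/" $4 ".bed")}'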
LOLA should be able to load them (see here: http://databio.org/regiondb).
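As far as I understand the region database layout described at the link above, a collection folder can also carry an optional `index.txt` that annotates each file in `regions/`. The following is only a sketch of how such a file might be generated: `write_index` is a hypothetical helper, the tab-separated format and the `filename` and `description` column names are my assumptions and should be checked against the LOLA documentation, and the description text is a placeholder.

```shell
# Hypothetical helper: write a minimal index.txt for one collection folder.
# Assumes the BED files live in "$1/regions", as produced by the split above.
# Column names (filename, description) are an assumption; verify them
# against the LOLA region database documentation before relying on this.
write_index() {
    collection_dir=$1
    index_file="$collection_dir/index.txt"

    # Header line, tab-separated.
    printf 'filename\tdescription\n' > "$index_file"

    for bed in "$collection_dir"/regions/*.bed; do
        [ -e "$bed" ] || continue          # the glob matched nothing
        file=$(basename "$bed")            # e.g. CTCF.bed
        factor=$(basename "$bed" .bed)     # e.g. CTCF
        printf '%s\tJASPAR 2018 predicted binding sites for %s\n' \
            "$file" "$factor" >> "$index_file"
    done
}

# Usage sketch:
# write_index mm10/jaspar2018
```

Keeping this as a function makes it easy to try on a small test folder before pointing it at the full collection.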
I'm trying this now. But I think if you don't have a lot of memory, that's probably going to be problematic...
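One possible way to reduce the number of regions before splitting would be to drop low-scoring sites. This is only a sketch under assumptions: `filter_by_score` is a hypothetical helper, it assumes column 5 of the BED file holds a numeric per-site score where higher means more confident, and the threshold in the usage line is an arbitrary illustration, not a recommended cutoff.

```shell
# Hypothetical helper: read BED lines on stdin and keep only those whose
# score in column 5 is at least the threshold given as the first argument.
# Assumes column 5 is numeric and that higher scores are better.
filter_by_score() {
    awk -v min="$1" '($5 + 0) >= (min + 0)'
}

# Usage sketch (400 is an arbitrary example threshold):
# zcat JASPAR2018_mm10_all_chr.bed.gz | filter_by_score 400 | gzip > filtered.bed.gz
```

Filtering on the stream keeps memory use flat, since nothing is loaded into memory beyond the current line.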
Would it be possible to integrate the recently published JASPAR binding predictions for mm10 into the LOLA JASPAR database?
See http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2018/mm10/ for the data, http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2018/ for the method details, and http://jaspar.genereg.net/genome-tracks/ for a general overview of the new track feature of JASPAR.
Since it's already a region dataset (a BED file), converting it into the LOLA database format may be straightforward.