Add in missing cellType/antibody entries for encodeTFBSmm10 #25

Closed
oneillkza opened this issue Nov 15, 2017 · 6 comments

@oneillkza

Hi there

I noticed that a handful of the entries for encodeTFBSmm10 have NA in the cellType and antibody annotations. Fortunately, this data seems to be available here, and I've enclosed a code snippet that adds the missing entries to lola.db$regionAnno to make things easier for you. (Right now I'm using the below code as a workaround for myself.)

Thanks for making and maintaining a very useful tool!

library(LOLA)

lola.db <- loadRegionDB('LOLA/LOLACore/mm10') # change to wherever LOLACore is downloaded

# Fix missing cell/antibody entries in LOLA

# ENCODE metadata table
encode.meta <- read.table('https://raw.githubusercontent.com/theaidenlab/juicebox/master/src/juicebox/encode/encode.mm9.txt',
                          sep='\t',
                          header=TRUE,
                          stringsAsFactors=FALSE)

# Reduce each path to a bare file name: strip the directory and the .gz suffix
encode.meta$path <- sub('.*\\/', '', encode.meta$path)
encode.meta$path <- sub('\\.gz$', '', encode.meta$path)

rownames(encode.meta) <- encode.meta$path

# Rows of the encodeTFBSmm10 collection with no cellType annotation
lola.missing <- which(lola.db$regionAnno$collection=='encodeTFBSmm10'&is.na(lola.db$regionAnno$cellType))

missing.files <- lola.db$regionAnno$filename[lola.missing]

# Look up the metadata by file name and fill in the missing columns
lola.db$regionAnno[lola.missing, c('cellType', 'antibody')] <- 
    encode.meta[missing.files, c('cell', 'antibody')]
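
One caveat with the row-name lookup above: indexing a data frame by a row name that does not exist returns a row of NAs, so any file missing from the metadata table would silently stay unannotated. The sketch below shows a quick check on toy data. The file names and values are made up for illustration and are not taken from LOLACore or the ENCODE table.

```r
# Toy stand-ins for the real objects; all values below are made up.
encode.meta <- data.frame(
  path     = c("a.narrowPeak", "b.narrowPeak"),
  cell     = c("CH12", "MEL"),
  antibody = c("CTCF", "GATA1"),
  stringsAsFactors = FALSE
)
rownames(encode.meta) <- encode.meta$path

# "c.narrowPeak" has no row in the toy metadata table
missing.files <- c("a.narrowPeak", "c.narrowPeak")

# List file names with no metadata row before assigning anything
unmatched <- setdiff(missing.files, rownames(encode.meta))

# The lookup itself yields an all-NA row for the unmatched name
filled <- encode.meta[missing.files, c("cell", "antibody")]

print(unmatched)
print(is.na(filled$cell))
```

If `unmatched` is non-empty, those entries would need another metadata source.
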
@oneillkza
Author

Oh interesting -- digging a little further, it seems like all of the entries with NAs for cellType and antibody are "RepPeaks" type, meaning they are individual replicates. However, their merged/consensus data ("Peaks") are also present in LOLACore.

For most analyses, one probably wouldn't want to be analysing both the individual replicates and their merged data as though they were independent. It might actually be better to drop these from LOLA?
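
As a rough sketch of what dropping them could look like on the user side, the helper below removes entries of one collection that have no cellType. `dropUnannotated` is a hypothetical name, not a LOLA function. It assumes the rows of `regionAnno` and the elements of `regionGRL` are in the same order, and it uses a plain list as a stand-in for the GRangesList so the example runs without Bioconductor. The toy file names are invented.

```r
# Hypothetical helper (not part of LOLA): drop entries of one collection
# that lack a cellType annotation.
# Assumes regionAnno rows and regionGRL elements are in the same order.
dropUnannotated <- function(regionDB, collection) {
  anno <- regionDB$regionAnno
  keep <- !(anno$collection %in% collection & is.na(anno$cellType))
  regionDB$regionAnno <- anno[keep, ]
  regionDB$regionGRL  <- regionDB$regionGRL[keep]
  regionDB
}

# Toy region database; file names and values are made up.
toyDB <- list(
  regionAnno = data.frame(
    filename   = c("x_Peaks.narrowPeak", "x_RepPeaks_1.narrowPeak", "y.bed"),
    collection = c("encodeTFBSmm10", "encodeTFBSmm10", "other"),
    cellType   = c("CH12", NA, NA),
    stringsAsFactors = FALSE
  ),
  regionGRL = list("peaks", "rep1", "y")  # stand-in for a GRangesList
)

cleaned <- dropUnannotated(toyDB, "encodeTFBSmm10")
print(cleaned$regionAnno$filename)
```

Entries from other collections are left alone even when their cellType is NA, since the filter is restricted to the named collection.
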

@nsheff
Owner

nsheff commented Nov 15, 2017

Probably right. But would you rather keep the "RepPeaks" or the individual replicates?

@oneillkza
Author

I'd rather keep the consensus than the replicates (i.e. the Peaks rather than the RepPeaks). I've been noticing quite a lot of deviation between replicates, which is presumably why they did them in the first place.

@nsheff
Owner

nsheff commented Apr 12, 2018

Thanks for reporting this @oneillkza -- I found 45 extra files in there that shouldn't have been there. They had already been excluded from the annotation, but because they were left in the folder, they were still being read (a feature of LOLA, really...). Anyway, I've taken them out now and will update the public core databases soon. Thanks!
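
For anyone maintaining their own collections, a check along these lines can flag files that sit in the folder without an annotation row. This is only a sketch under stated assumptions: `findUnindexedFiles` is a hypothetical helper, not a LOLA function, and it assumes each collection directory holds a tab-delimited `index.txt` with a `filename` column and a `regions` subfolder containing the region files.

```r
# Hypothetical helper (not part of LOLA). Assumes <collectionDir>/index.txt is
# tab-delimited with a "filename" column, and that region files live in
# <collectionDir>/regions.
findUnindexedFiles <- function(collectionDir) {
  index  <- read.delim(file.path(collectionDir, "index.txt"),
                       stringsAsFactors = FALSE)
  onDisk <- list.files(file.path(collectionDir, "regions"))
  setdiff(onDisk, index$filename)
}

# Build a toy collection in a temporary directory; file names are made up.
toyDir <- file.path(tempdir(), "toyCollection")
dir.create(file.path(toyDir, "regions"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(toyDir, "regions", c("kept.bed", "stray.bed"))))
write.table(data.frame(filename = "kept.bed"),
            file.path(toyDir, "index.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)

extra <- findUnindexedFiles(toyDir)
print(extra)
```

Files returned by the helper are present on disk but absent from the index, which is the situation described above.
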

nsheff closed this as completed Apr 12, 2018

@nsheff
Owner

nsheff commented Apr 12, 2018

New version is now deployed here: http://cloud.databio.org/regiondb/

@oneillkza
Author

Awesome! Thanks!
