You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
WhenIuseBioPerltoparseGenBankrecordsthereisawaytotargetcoding sequences (CDS) andretrievethespliced sequence (withoutintrons).Iwouldliketoswitchtorestezforthistask, butIcan't seem to find a way to do this?See for example GenBank record X55026.1 where the COI gene range is 38501..63007 but the CDS is join(38501..38669,41209..41297,42536..42576,43837..43868, 46369..46451,47959..48055,49512..49633,51885..51978, 53386..53396,54384..54470,55738..55797,56925..56927, 58328..58357,59363..59519,60813..60862,62103..62120, 62525..63007)Thanks
The text was updated successfully, but these errors were encountered:
Unfortunately there's no readily available method to retrieve the spliced regions. The package does contain record extraction tools which can be coupled to do what you want.
# for example, download single whole sequence using rentrezrecord<-rentrez::entrez_fetch(db='nuccore', id='X55026', rettype='gb')
# to check, run# cat(record)# eqv. to restez::gb_record_get() if restez database existed# manually pull out spliced sequence# 1/ look up exonsfeatures<-restez::gb_extract(record, 'features')
are_exons<- vapply(features, function(x) tolower(x[['type']]) =='exon', logical(1))
exons<-features[are_exons]
length(exons)
# 2/ get locationsexon_locations<- vapply(exons, '[[', character(1), 'location')
exon_locations<- strsplit(x=exon_locations, split='\\.\\.')
# 3/ assemble sequencesequence<-restez::gb_extract(record, 'sequence')
sequence<- strsplit(x=sequence, split='')[[1]]
exon_sequences<- unlist(lapply(X=exon_locations, function(x) {
sequence[x[[1]]:x[[2]]]
}))
assembled_sequence<- paste0(exon_sequences, collapse='')
nchar(assembled_sequence)
^This code is untested plus I'm not 100% sure it does exactly what you want.
Also, if you do come up with a good function(s), I welcome pull-requests!
Session Info
The text was updated successfully, but these errors were encountered: