New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Efficient way to compare 2 sets of multiple sequences #34
Comments
@seb-mueller Great question. You can probably reuse Here is a simple code sketch - I haven't validated its correctness but you can easily verify if it works as expected: crossSetSim <- function(
set1, set2,
type = "local",
submat = "BLOSUM62",
gap.opening = 10,
gap.extension = 4) {
combinations <- expand.grid(seq_along(set1), seq_along(set2))
results <- foreach(
i = seq_len(nrow(combinations)),
.combine = c,
.packages = c("Biostrings")
) %dopar% {
idx1 <- combinations[i, 1]
idx2 <- combinations[i, 2]
protr:::.seqPairSim(
c(idx1, idx2),
c(set1, set2),
type,
submat,
gap.opening,
gap.extension
)
}
matrix(
results,
nrow = length(set1),
ncol = length(set2),
byrow = TRUE
)
} Example: library(protr)
library(foreach)
library(doParallel)
doParallel::registerDoParallel(2)
s1 <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
s2 <- readFASTA(system.file("protseq/P08218.fasta", package = "protr"))[[1]]
s3 <- readFASTA(system.file("protseq/P10323.fasta", package = "protr"))[[1]]
s4 <- readFASTA(system.file("protseq/P20160.fasta", package = "protr"))[[1]]
s5 <- readFASTA(system.file("protseq/Q9NZP8.fasta", package = "protr"))[[1]]
set.seed(42)
plist1 <- as.list(c(s1, s2, s3, s4, s5)[sample(1:5, 10, replace = TRUE)])
plist2 <- as.list(c(s1, s2, s3, s4, s5)[sample(1:5, 20, replace = TRUE)])
psimmat <- crossSetSim(plist1, plist2) |
Hi @nanxstats , thanks so much for you quick response and the great suggestion. The index of set2 needs to be offset by the length of set1 to make it work in the combined vector. I'll have to run some more test, but it should be ok like this. |
@seb-mueller Perfect. I agree that this will be a very useful feature. Please send a PR when you are ready. (Also, feel free to add yourself as |
implemented pairwise protlist function (fixes #34)
This feature has been shipped with protr 1.7-0, now on CRAN: https://cran.r-project.org/package=protr |
Really nice package, I've one question though.
Is there are parallel version of to compare to sets of sequences?
I know there is
twoSeqSim
to compare 2 sequences andparSeqSim
which takes one set of sequences and compares them pairwise. However if I have a 2 sets, say set A with seq1 and seq2, as well as set B with seq3 and seq4. I want to compare each member of set A with each member of set B, but don't want to compare within the 2 sets.Is there an efficient (also using parallel execution) way to do so without having to loop using
twoSeqSim
?The text was updated successfully, but these errors were encountered: