New alleles can be forward/reverse duplicates of a single allele #8

Closed
mariemoinet opened this issue Mar 18, 2019 · 1 comment

Comments

@mariemoinet

Hi Ignacio,
First of all, thanks for this package! It's really helpful! (bioinformatics rookie here...)

I don't know if you're aware that the new-allele outputs are sometimes just the forward and reverse versions of the same allele. As a result, the MLSTar outputs are not fully usable as they are when new alleles are identified, since additional work is needed to confirm that each allele is unique. When there are many new alleles, as in my case (>300), it's not straightforward to "pair" them. I had to check and align them in Geneious to do so.
So an extra step that orients the sequences would probably be needed.
I'd also like to take this opportunity to make a couple of suggestions:

  • An extra function that outputs a list of unique alleles (ready to be submitted to PubMLST) would be great.
  • Another function that lists all new STs in a form that is easy to submit to PubMLST would also be useful.

Thanks
Marie

The code of a newbie like me is probably far from useful for you, but just in case it might help:
```r
# Load the libraries used for FASTA file manipulation
library(Biostrings)
library(DECIPHER)
library(seqinr)

# Assumption: work_dir was not defined in the original snippet; here it is
# taken to be the folder that is searched for FASTA files.
work_dir <- "/media/sf_Marie/MLST"
output_dir <- file.path(work_dir, "mygen_new_alleles")
dir.create(output_dir, showWarnings = FALSE)

# Find the FASTA files written by doMLST
seqs <- list.files(path = work_dir, pattern = "\\.fasta$", recursive = TRUE, full.names = TRUE)
nseqs <- grep("/MyGenomesMLST/results", seqs, value = TRUE) # or whatever name was given as fdir in doMLST

for (f in nseqs) {
  # Read the FASTA file as a DNAStringSet
  tmp <- readDNAStringSet(filepath = f, format = "fasta")
  tmp <- OrientNucleotides(tmp) # put all sequences in the same orientation
  tmp <- unique(tmp)            # drop exact duplicates once oriented
  locus <- sub("\\.fasta$", "", basename(f))
  allele_names <- paste0(locus, "NEW", seq_along(tmp))
  write.fasta(sequences = as.list(as.character(tmp)), # as.list is necessary for some programs
              names = as.list(allele_names),
              file.out = file.path(output_dir, paste0(locus, "_new_alleles.fas")))
}
```
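For readers who want to see the forward/reverse duplication problem in isolation, here is a minimal base-R sketch that needs no Bioconductor packages. It is not the fix committed to MLSTar, and it is not how `OrientNucleotides` works: it simply maps each sequence to a "canonical" orientation (whichever of the sequence and its reverse complement sorts first) and keeps one allele per canonical form. The function names `reverse_complement`, `canonical_form` and `unique_alleles` and the toy sequences are made up for illustration, and the sketch assumes plain A/C/G/T (or N) characters rather than the full IUPAC ambiguity alphabet.

```r
# Reverse-complement a single DNA string using base R only.
# Assumption: only A/C/G/T (any case) need complementing; other
# characters such as N are left unchanged.
reverse_complement <- function(s) {
  comp <- chartr("ACGTacgt", "TGCAtgca", s)
  paste(rev(strsplit(comp, "", fixed = TRUE)[[1]]), collapse = "")
}

# Canonical form: whichever of the sequence and its reverse complement
# sorts first, so both orientations of an allele map to the same key.
canonical_form <- function(s) {
  s <- toupper(s)
  rc <- reverse_complement(s)
  if (rc < s) rc else s
}

# Keep the first allele seen for each canonical form.
unique_alleles <- function(seqs) {
  keys <- vapply(seqs, canonical_form, character(1), USE.NAMES = FALSE)
  seqs[!duplicated(keys)]
}

# Toy data (hypothetical): a2 is the reverse complement of a1.
alleles <- c(a1 = "ATGCCGTTA", a2 = "TAACGGCAT", a3 = "ATGCCGTTC")
deduplicated <- unique_alleles(alleles)
print(deduplicated)
```

This only catches exact reverse-complement duplicates. Sequences that differ in length or contain ambiguity codes would still need an alignment-based orientation step such as the one in the script above.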

@mariemoinet mariemoinet changed the title New alleles can be forward/reverse duplicates of a single duplicate New alleles can be forward/reverse duplicates of a single allele Mar 18, 2019
iferres added a commit that referenced this issue Mar 18, 2019
iferres added a commit that referenced this issue Mar 18, 2019
@iferres
Owner

iferres commented Mar 18, 2019

Hi @mariemoinet, thanks for reporting this (quite terrible) bug!
I think I fixed the problem. Please let me know if it works properly.

Thanks also for your suggestions. I think at least some of them are already addressed, since you can ask MLSTar to write the new alleles it finds to a file. I know the format may not be optimal for submitting directly to PubMLST right now. I will consider better integration with the database in future releases.

I'll close this issue for now, but please open it again if you notice it doesn't work as expected.
