New alleles can be forward/reverse duplicates of a single allele #8

mariemoinet · 2019-03-18T05:11:01Z

Hi Ignacio,
First of all, thanks for this package! It's really helpful! (bioinformatics rookie here...)

I don't know if you're aware, but the new-allele outputs are sometimes just the forward and reverse versions of the same allele. As a result, the MLSTar outputs are not fully usable on their own when new alleles are identified, because additional work is needed to confirm that each allele is unique. When there are numerous new alleles, as in my case (>300), it's not straightforward to "pair" those alleles; I had to check and align them in Geneious to do so.
So an extra step that orients the sequences is probably needed.
I'd also like to take this opportunity to make two suggestions:

- An extra function that outputs a list of unique alleles (ready to be submitted to PubMLST) would be great.
- Another function that lists all new STs in a format that is easy to submit to PubMLST would also be useful.
Thanks
Marie

The code of a newbie like me is probably far from useful for you, but just in case it might help:
```r
# Load libraries used for FASTA file manipulation
library(Biostrings)
library(DECIPHER)
library(seqinr)

# work_dir was not defined in the original snippet; assumed here to be the folder searched below
work_dir <- "/media/sf_Marie/MLST"
output_folder <- "mygen_new_alleles"
dir.create(file.path(work_dir, output_folder), showWarnings = FALSE)

# List the FASTA files written by doMLST
seqs <- list.files(path = work_dir, pattern = "\\.fasta$", recursive = TRUE, full.names = TRUE)
nseqs <- grep(pattern = "/MyGenomesMLST/results", seqs, value = TRUE) # or whatever name was given as fdir in doMLST

for (f in nseqs) {
  locus <- sub("\\.fasta$", "", basename(f))
  tmp <- readDNAStringSet(filepath = f, format = "fasta") # read the FASTA file as a DNAStringSet
  tmp <- OrientNucleotides(tmp) # put all sequences in the same orientation
  tmp <- unique(tmp)            # drop duplicates that differed only in orientation
  allele_names <- paste0(locus, "NEW", seq_along(tmp))
  write.fasta(sequences = as.list(paste(tmp)), names = as.list(allele_names),
              file.out = file.path(work_dir, output_folder, paste0(locus, "_new_alleles.fas"))) # as.list is necessary for some programs
}
```
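For the narrower case where two new alleles are exact reverse complements of each other, the pairing can also be sketched in base R without any Bioconductor packages. This is only an illustration and not how MLSTar or DECIPHER handle it: the helper names `rev_comp` and `canonical` and the toy sequences are hypothetical, and the approach only catches exact reverse-complement duplicates over the plain A/C/G/T alphabet. Each sequence is mapped to whichever of its two orientations sorts first, and duplicates are then dropped on that canonical form.

```r
# Reverse complement of each element of a character vector (A/C/G/T only)
rev_comp <- function(s) {
  comp <- chartr("ACGTacgt", "TGCAtgca", s)
  vapply(strsplit(comp, "", fixed = TRUE),
         function(ch) paste(rev(ch), collapse = ""),
         character(1), USE.NAMES = FALSE)
}

# Canonical form: the alphabetically smaller of the sequence and its reverse complement
canonical <- function(s) {
  s <- toupper(unname(s))
  rc <- rev_comp(s)
  ifelse(rc < s, rc, s)
}

# Toy alleles (made up): a2 is the reverse complement of a1, a3 is a distinct allele
alleles <- c(a1 = "ATGCCGTA", a2 = "TACGGCAT", a3 = "ATGCCGTT")

# Keep the first allele seen for each canonical form
unique_alleles <- alleles[!duplicated(canonical(alleles))]
print(names(unique_alleles))
```

Because the comparison is on exact strings, alleles that differ in length or by even a single base are treated as distinct, which is the behaviour wanted when deciding whether an allele is really new.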

iferres · 2019-03-18T17:45:08Z

Hi @mariemoinet, thanks for reporting this (quite terrible) bug!
I think I fixed the problem. Please let me know if it works properly.

Thanks also for your suggestions. I think at least some of them are already addressed, since you can ask MLSTar to write the new alleles it finds to a file. I know the format may not be optimal for submitting directly to PubMLST yet. I will consider better integration with the database in future releases.

I'll close this issue for now, but please open it again if you notice it doesn't work as expected.

mariemoinet changed the title ~~New alleles can be forward/reverse duplicates of a single duplicate~~ New alleles can be forward/reverse duplicates of a single allele Mar 18, 2019

iferres added a commit that referenced this issue Mar 18, 2019

fix bug #8

1cad3a9

iferres added a commit that referenced this issue Mar 18, 2019

Merge pull request #9 from iferres/develop

5de49bd

This PR fixes bug #8

iferres closed this as completed Mar 18, 2019

JFsanchezherrero mentioned this issue Mar 25, 2019

Downloading cgMLST profiles #11

Closed
