-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
maf file output correct? #1
Comments
Thanks for reporting this issue. I'll check the Maf output. |
Big thanks, Also, I really like GSAlign as it's so fast. Is there a more detailed explanation for some of the parameters? I compared the -sen mode with default but got a shorter alignment and lower ASI with -sen. |
Thank you for saying that. I will keep improving GSAlign to make it more useful. I'll also put more explanation of the parameters in the help message. Thank you again. |
Hi I think I figured it out. It's the ---- that are added in the alignment. When I checked the maf I found for the first line: When I then checked the length of the alignment, it was 1654992 characters long. Of those 127 were ----. If you subtract that -> 1654865. So this would explain why mafToPsl says out of range and why when I make a chain file from the maf downstream tools also give errors with that chain file. I assume this is because of gapped alignments? Do you have to count the length of the alignment with --? |
It is wired. Gaps are not counted when I output the alignment length. I'll check the codes again. Thank you! |
I did some tests and confirmed that the gaps are not counted. Could you please give me your sequences? I'd like to use your sequences to check the output. Thank you. |
Correct, they are not counted. I thought that would be the issue given the error with mafToPsl. Okay, I will share the sequences through google drive. |
Question: If you use any of your maf files that you generate with GSAlign you can convert it to PSL with mafToPsl from uscs tools? I am trying the convert the maf to a chain file and add GenomeAlignmentTools to improve the chain file. Since mafToPsl failed I used maf convert which did convert the GSAlign maf file to chain but if I use that chain with any tool from GenomeAlignmentTools I get errors. Sorry I get to the upload today but could you also test your maf files and if you can convert them to chain format? |
I'll see if I could provide output with PSL format directly in GSAlign. Before that, I'll check my maf output using your dataset. Thank you! |
@kingralph80 I run some tests on your sequences with default parameters and your parameters, and then I use mafToPsl (from kentUtils) to convert maf into psl format, there is no any warnings or errors. The ucsc-maftopsl could not be run in my server, could you please download kentUtils and run that mafToPsl with the maf file. Thank you! |
You are right, no error message but the file size is 0 bytes. Could you tell me which kentUtils are you referring too? https://github.com/ENCODE-DCC/kentUtils is what I used Also changed to default parameters and my maf file is 200218210 with Alignment#=25114 (total alignment length=98983799) ANI=98.41% |
It's wired. The PSL file I got was exactly the same number of alignments as GSAlign outputed. We downloaded the same kentUtils. And my command was mafToPsl CML332.chr10 B73.chr10 output.maf output.psl. I found the order of two reference names does not matter. They both produced the same result. I was wonder how is the coverage rate you expected? |
Hi. so thanks for resolving this. In the end, the error originally was that the genomes we used have the same chromosome names. All the maize once we compare use chr1 to chr10. The tools we used to compare GSAlign to like cactus, renamed them in the maf using our Input in the command line. GSAlign does not seem to not have such an input parameter. For the test, we then renamed the chroms ourselves a typo in the chromosome name resulted in the 0 byte psl output. |
Would it be possible to add an optional input parameter for a suffix for target and query so that if needed the same chr labels can be changed in the output maf? This way the user does not need to rename them all? In our case, its ~30 genomes but others may have many more. |
Hi, I've updated GSAlign to 1.0.14. It adds ref and qry as prefix for reference and query chromosomes, respectively in the MSA/ALN file. I'll add two more parameters to let user input their own prefix/suffix for the two genomes in the future version. |
Argh, when I use the full genome not just chr10 we are back to the original error. MafToPsl says: I will post the line that causes the error (103592) plus the line prior in the next message but I think mafToPSL is correct. the line that causes the issue starts with: If you add 304926865 (Start) + 658 (Alignment) = 304927523, this cannot be if the whole source length is 304927522. Its off by one bp. |
The full alignment that causes the error in the maf file: a score=645 |
Hi,
MAF is 1-base annotation, therefore the last position should be start_pos
+ aln_length - 1. That is 304926865 (Start) + 658 (Alignment) -1 =
304927522.
12345678
agctagct
The above is a sequence of length 8, and the start is 1, length is 8, so
the last position is 1 + 8 - 1 = 8.
I think MafToPsl is wrong about the positioning.
Hsin-Nan
…On Wed, Apr 22, 2020 at 3:44 AM kingralph80 ***@***.***> wrote:
Argh, when I use the full genome not just chr10 we are back to the
original error. MafToPsl says:
"Coordinates out of range line 103592 of B73v5.CML322.50.200.maf"
I will post the line that causes the error (103592) plus the line prior in
the next message but I think mafToPSL is correct. the line that causes the
issue starts with:
s qry.chr1 304926865 658 + 304927522
If you add 304926865 (Start) + 658 (Alignment) = 304927523, this cannot be
if the whole source length is 304927522. Its off by one bp.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#1 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACOKJKORQQC3JCTWELH2MGLRNXZTVANCNFSM4MC54KEA>
.
|
Ok I will post this at the kent utilities. |
Hi, under and under From these lines, I would assume maf is 0-based not 1-based,or ? I double-checked and a 1- based coordinate is not mentioned in the definition of maf ( https://genome.ucsc.edu/FAQ/FAQformat.html#format5 ). HOWEVER, there is a second maf format (Mutation Annotation Format instead of Multiple Alignment format). Are there different definitions of the MAF format? |
Sorry for the back and forth. I really just want to be able to use the GSAlign output further. |
Hi, thanks for the information. I've updated GSAlign. Now, the MAF is zero-based. Thank you!
|
Big thanks for supporting the tool. I will try the new version right away. |
I check. Same error although the output looks different: The line that caused the error: Not sure I see the error. Does - strand means the start is the stop position? |
The authors of mafToPSL at UCSC confirmed that maf MAF is a 0-based. In case of this error the MAF definition is: start -- The start of the aligning region in the source sequence. This is a zero-based number. If the strand field is "-" then this is the start relative to the reverse-complemented source sequence (see Coordinate Transforms). So the way I understand this should the alignment not read: 180451555 - 45989 = Start position and - strand indicates its reverse complement? |
Thanks for the information. And yes, the "-" indicates the reverse-complemented strand. |
Dear Hsinnan, your latest commit solved this reported issue. Thanks a lot for fixing and support your great tool. |
Hi,
I am using GSAlign for the alignment of two diverse maize genomes with parameters:
-dp -ind 100 -clr 50 -alen 200 -t 48
-gp /usr/bin/gnuplot \
GSAlign runs through without errors, however, when I try to convert the MAF file to PSL by kent tool mafToPsl I always get:
Coordinates out of range line 311372 of B73v5_vs_Oh43_ch1_10.maf
Cheers
The text was updated successfully, but these errors were encountered: