Your vcf2tsv has a really annoying bug!!! #206
Comments
@deniseduma I was unable to reproduce the "bug" you reported. Here's an example of the test I ran. There were two lines in the output, one for each alt allele.
|
@deniseduma I find the tone of your writing to be aggressive and disrespectful. I believe GitHub follows the "Open Code of Conduct" (see http://todogroup.org/opencodeofconduct/), which I would argue you are not following here. |
@deniseduma Now back to your issue. Did you copy the file between Windows or Mac and a Unix system by any chance? Handling different line endings is a common source of problems. |
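For anyone wanting to check the line-ending theory on their own files, here is a minimal C++ sketch that counts each terminator style in a buffer. The names `LineEndingCounts` and `countLineEndings` are made up for this illustration and are not part of vcflib.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type (not from vcflib): tallies of each terminator style.
struct LineEndingCounts {
    std::size_t lf = 0;    // "\n"   (Unix, modern macOS)
    std::size_t crlf = 0;  // "\r\n" (Windows)
    std::size_t cr = 0;    // lone "\r" (classic Mac OS)
};

// Scan the buffer once and classify every line terminator found.
LineEndingCounts countLineEndings(const std::string& data) {
    LineEndingCounts counts;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (data[i] == '\r') {
            if (i + 1 < data.size() && data[i + 1] == '\n') {
                ++counts.crlf;
                ++i;  // skip the '\n' that belongs to this pair
            } else {
                ++counts.cr;
            }
        } else if (data[i] == '\n') {
            ++counts.lf;
        }
    }
    return counts;
}
```

If a file read into a `std::string` reports nonzero `crlf` or `cr` counts, it was probably written or converted on a system with different line-ending conventions.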
I'm using a Mac so most likely you are not handling the newline character correctly on the Mac. I'm sure I'm not wrong because I've tested this many times and compared your output with that of another tool which does not split records by alleles and the two outputs have exactly the same number of lines. I'm sorry if you were offended by my tone but I find your tool which apparently you published to be quite immature and bug-ridden. For instance, there was another bug when I tried to install vcflib on my Mac which took me forever to figure out! In my opinion, the installation of the software at least should work smoothly. |
Could you please provide an example input file that demonstrates this behaviour. If you're using a mac and 'hexdump' is also installed, the output of the last few lines of |
Hi,
Here is an example input file, test.vcf, that has the problem and the
corresponding output test.tsv
The command I ran is
vcf2tsv test.vcf >test.tsv
I'm using MacOS 10.12
…On Mon, Jul 31, 2017 at 9:35 AM, David Eccles (gringer) < ***@***.***> wrote:
Could you please provide an example input file that demonstrates this
behaviour. If you're using a mac and 'hexdump' is also installed, the
output of the first few lines of hexdump -C <output_file> would also be
informative.
--
Denise
|
|
@deniseduma there does not seem to be a I would try running |
No, I'm not getting github emails about this, and also can't see any attachments on the issue page. |
That's because I've attached them to my email, I didn't realize that you
cannot make use of email attachments!
--
Denise
|
Here are the input and output files, I've changed their extension to .txt because Github won't let me upload as .vcf and .tsv |
It looks like you uploaded the same file. After I downloaded them, these files were identical. Edit: I think what I did wrong with the download was to just replace the file name, rather than the file number. github doesn't seem to care what the file names are set to. For example, here they are "renamed" to the correct extension: https://github.com/vcflib/vcflib/files/1186892/test.tsv In any case, I do see that there is no line break or carriage return character in the output:
|
I've re-downloaded test_vcf.txt and test_tsv.txt that I've uploaded via the Github interface on my computer and they are different, the first is the .vcf file and the second is the resulting .tsv file, I'm not sure what files you find identical. |
I confirm there are two different files. |
Here's the C++ file for vcf2tsv, if you can see anything that would help the developers fix your issue that would be very helpful. https://github.com/vcflib/vcflib/blob/master/src/vcf2tsv.cpp |
I thought that the way this works is the users point out the issues and the developers fix them, not that the users both point out the issues and fix them by themselves! Besides, I'm not familiar with C++ so I cannot help. |
As discussed in issue vcflib#206, it seems that for some inputs loadInfoSS() writes multiple entries to the output stringstream without appending a newline. With that fixed, the special-case handling of the newline in main() can be removed, as far as I can see. Signed-off-by: Kai Blin <kblin@biosustain.dtu.dk>
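To illustrate the kind of fix described in that commit message, here is a minimal sketch of writing one row per alternate allele into a stringstream, with each row ending in its own newline. It is not the actual vcflib code: `rowsPerAllele` and its parameters are hypothetical, and the real loadInfoSS() handles many more fields.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical illustration (not vcflib's loadInfoSS): build one
// tab-separated row per alternate allele. The newline is appended inside
// the loop, so no two per-allele rows can end up on the same output line.
std::string rowsPerAllele(const std::string& chrom, long pos,
                          const std::string& ref,
                          const std::vector<std::string>& alts) {
    std::ostringstream out;
    for (const std::string& alt : alts) {
        out << chrom << '\t' << pos << '\t' << ref << '\t' << alt << '\n';
    }
    return out.str();
}
```

Terminating each row where it is written means the caller does not need any special-case newline handling of its own, which appears to be the simplification the commit message refers to.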
This is my first contribution to vcflib and I don't write C++ code. I'm not even using vcflib directly, though I guess some tools I use might. It took me about 5 minutes to find and fix this (see the pull request referenced above); maybe this is useful for you. I would like to point out that you are mistaken in your impression of how "this" works. As a maintainer of multiple open source tools myself, here is my take: People spend time writing software, and share it with other people in the hope that it will be useful. At no point in this interaction is a user entitled to bug-free software (check out the license, it says Free software empowers users because it places their copy under their own control, so they can go and fix problems themselves. If you want someone to yell at for software that doesn't work, buy a support contract. |
What exactly makes you think that I have the time required to fix bugs in free software?? Have you thought that my job descriptions might entail many other responsibilities and after having spent the weekend trying to figure out 1. why the software installation fails and 2. why vcf2tsv doesn't output what it's supposed to output, I cannot afford to spend a workday on the same issue??? I haven't yelled at anybody although I can tell I'm very frustrated and if a simple functionality like this turns out to have bugs, I don't want to think what other more complex features that vcflib is supposed to offer might look like! I was planning to use vcflib for my work but at this point, I don't think it's a safe decision anymore! |
But if you don't like this code, why don't you just write your own? |
I appreciate that text communication is difficult, but this is how I read this: Your initial comment on this bug contains 5 exclamation marks and one dot, not counting the pasted output example. The subject contains three exclamation marks. That looks like yelling to me. I understand that you might be frustrated because you wanted to do something that looks easy, and turned out to be much more time consuming than initially thought. But that doesn't entitle you to unload the frustration on other people. What exactly makes you think that the people who initially wrote the software they provide to you for free have the time to fix bugs? Have you thought that their job description might entail many other responsibilities? I have no idea of the constraints on your time, but I'm willing to go out on a limb to say that it'll be faster to fix a bug or two in an existing implementation than to write a new one from scratch. But you're in the best position to decide this for yourself. |
Moreover, at least have the good grace to say "hey thanks!" to the developer who - despite being unconnected to the project or your work in any way - took time out of their working day to help you (and others) out. @kblin Thank you. |
@kblin thanks for the fix!
|
In the future, if I find my work blocked by an elementary mistake in free software, I'll make sure to use better (free) software out there, but thanks for your unnecessary advice! |
I had no idea that Donald Trump was on Github and processing VCFs.
…--t
|
I can't believe that this is happening. I've heard about events like this but hadn't seen one until now. @deniseduma are you helping the science community to be better by filing an issue? Good on ya! I feel that @tseemann and @kblin in particular are doing a great job at facilitating this issue, but everyone else is also adding a great deal to this project. Great job, everybody! Open source is extremely important on many fronts, and GitHub is a great place for it. |
@deniseduma @kblin's fix has been pulled into the master branch. I'm sorry for your frustrating experience. Just for some context, I introduced the bug while doing a code cleanup, where I fixed several other bugs. I tested the code before committing and releasing it, even for multi-allelic states. Thank you for reporting the bug. --Zev |
Thank you for fixing it and letting me know! |
You claim that your vcf2tsv outputs one record per allele rather than one output per SNP but you are actually messing up the output by not separating the per-allele records by a new-line character!!!! This is a really elementary mistake and really annoying because the output records are messed up and, contrary to expectations, they equal the number of input records! Could you please fix this bug and test your vcf2tsv code?
Moreover, you are randomly changing the order of the input INFO fields in the output tsv file whereas it would be preferable to keep it the same.
Here is an example concatenated output:
1 17380465 rs138979875 G A 0 . . . . . . RCV00013 2258.2 2 Hereditary_cancer-predisposing_syndrome MedGen:SNOMED_CT C0027672:699346009 NC_000001.10:g.1 7380465G>T 1 single 0 . . . . . . SDHB:6390 . . . . . . . . . . . . . . . . . . . . 138979875 17380465 . . 1 . 0 . . . . SNV . 0x050060000a05040002100100 1 . 1341 17380465 rs138979875 G T 0 . . . . . . RCV000132258.2 2 Hereditary_cancer-predisposing_s yndrome MedGen:SNOMED_CT C0027672:699346009 NC_000001.10:g.17380465G>T 1 single 0 . . . . . . SDHB:6390 . . . . . . . . . . . . . . . . . . . . 138979875 17380465 . . 1 . 0 . . . . SNV . 0x050060000a050400021001 00 1 . 134