Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

htslib 1.9 aborts in bcf_update_format invoked from cyvcf2 autotests #813

Closed
smoe opened this issue Jan 10, 2019 · 5 comments · Fixed by #856
Closed

htslib 1.9 aborts in bcf_update_format invoked from cyvcf2 autotests #813

smoe opened this issue Jan 10, 2019 · 5 comments · Fixed by #856
Assignees

Comments

@smoe
Copy link

smoe commented Jan 10, 2019

Hello,
The htslib shipping with cyvcf2 was substituted with the official release 1.9. The problem reproduces with both Python 2 and Python 3 (3.7).

/home/moeller/git/med-team/cyvcf2/cyvcf2/.pybuild/cpython3_3.7_cyvcf2/build/cyvcf2/tests/test_reader.py:570: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  v.set_format("PS", np.array([8.555, 11.111], dtype=np.float64))
/home/moeller/git/med-team/cyvcf2/cyvcf2/.pybuild/cpython3_3.7_cyvcf2/build/cyvcf2/tests/test_reader.py:573: FutureWarning: Conversion of the second argument of issubdtype from `int` to `np.signedinteger` is deprecated. In future, it will be treated as `np.int64 == np.dtype(int).type`.
  v.set_format("PS", np.array([9998.555, 99911.111], dtype=np.float32))
/home/moeller/git/med-team/cyvcf2/cyvcf2/.pybuild/cpython3_3.7_cyvcf2/build/cyvcf2/tests/test_reader.py:573: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  v.set_format("PS", np.array([9998.555, 99911.111], dtype=np.float32))
ok
cyvcf2.tests.test_reader.test_set_format_int ... [W::vcf_parse] Contig '7' is not defined in the header. (Quick workaround: index the file with tabix.)
python3.7: vcf.c:3496: bcf_update_format: Assertion `!fmt->p_free' failed.
Aborted (core dumped)
E: pybuild pybuild:338: test: plugin distutils failed with: exit code=134: cd /home/moeller/git/med-team/cyvcf2/cyvcf2/.pybuild/cpython3_3.7_cyvcf2/build; python3.7 -m nose -v 
dh_auto_test: pybuild --test --test-nose -i python{version} -p 3.7 returned exit code 13

First reported on brentp/cyvcf2#109

@daviesrob
Copy link
Member

Original problem now fixed, but we might want to look into making bcf_update_info() able to expand its buffer more than once. Although it's a niche requirement as most programs are likely to only make one change at most...

@kevinjacobs-progenity
Copy link

@daviesrob: Ouch. Apologies if the following rant is not intended for you... hoping not to shoot the messenger.

BUT here are the docs for bcf_update_info:

    /*
     *  bcf_update_info_*() - functions for updating INFO fields
     *  @hdr:       the BCF header
     *  @line:      VCF line to be edited
     *  @key:       the INFO tag to be updated
     *  @values:    pointer to the array of values. Pass NULL to remove the tag.
     *  @n:         number of values in the array. When set to 0, the INFO tag is removed
     *
     *  The @string in bcf_update_info_flag() is optional, @n indicates whether
     *  the flag is set or removed.
     *
     *  Returns 0 on success or negative value on error.
     */

Would you be so kind as to point out where it says that it can be called only once with a resize? How are we even to know if a resize is necessary?

And thanks for classifying all object bindings to htslib (like pysam) as niche.

More constructively, can you help us understand what needs to be done to properly document which other functions in htslib are similarly limited to a single resize? And help us to understand what needs to be done to generalize the functionality? I and others are happy to step up to do the work, but we're not bloody mind readers.

@daviesrob
Copy link
Member

I wrote neither the function nor its documentation.

The assert statement in question can be triggered by calling bcf_update_info() more than once on the same record and with the same key. I described this as "niche" because I expect most programs set the value they want for the given key the first time and don't try to modify it repeatedly.

This of course has nothing to do with what language or object bindings you happen to be using. It fact cyvcf2 (and possibly also pysam) avoids this problem if built with an embedded htslib by disabling asserts, which (as far as I can tell) results in htslib leaking memory instead of calling abort().

I think this should be reasonably easy to fix, so we should be able to get it to work as advertised for people who are indecisive about what value they want to store.

I should add that this limitation is also present in bcf_update_format() which includes a very similar bit of code.

@kevinjacobs-progenity
Copy link

I apologize for the grumpy tone of my initial response and appreciate your more constructive reply.

Is the solution as simple as transforming the asserts into calls to free those extra pointers? I admit I've not yet looked at the specific code, but will dedicate some time to doing so in the coming days.

@daviesrob
Copy link
Member

As far as I can tell, yes. The pointer would need to be freed in a similar way to when tags are removed here. In fact thinking about it, that should act as a work-around - call bcf_update_info() with the key and n = 0 to remove it, and then put it back with the new value.

Any fix would need additions to the test harness to ensure that it gets exercised.

jmarshall added a commit to jmarshall/htslib that referenced this issue Apr 16, 2019
Rather than asserting that no value has previously been set, free any
previously set value array before unpacking the new one.  Fixes samtools#813.

(This free() is all that's left after duplicating the "Mark the tag
for removal, free existing memory if necessary" code from earlier in
each function, and removing assignments that are unnecessary due to
the immediately following bcf_unpack_info/fmt_core1() call.)

Add test cases that update to small, large, and finally middle-sized
values, so that both reallocation and reuse are exercised.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants