Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix possible double frees in bcf_hdr_add_hrec() error handling #1637

Merged
merged 3 commits into from
Jul 13, 2023

Conversation

daviesrob
Copy link
Member

Spotted while looking at the VCF parser. bcf_hdr_add_hrec() should neither call bcf_hrec_destroy(hrec) nor store any pointers to hrec when it returns -1, otherwise double frees or stale pointer dereferences may result.

Remove bcf_hrec_destroy(hrec) call that was incorrectly made when handling a hash table insert failure, and move hdr->hrec reallocation so that all possible failures occur before hrec is added into the header.

Add bcf_hdr_add_hrec() documentation, including a warning that the caller should not touch hrec after a successful return.

It looks like the bug appeared in 3fe2a59. You'd have to be very unlucky to actually trigger it as it requires a particular allocation to fail, but it's best to ensure that the problem can't occur.

bcf_hdr_add_hrec() should neither call bcf_hrec_destroy(hrec)
nor store any pointers to hrec when it returns -1, otherwise
double frees or stale pointer dereferences may result.

Remove bcf_hrec_destroy(hrec) call that was incorrectly made
when handling a hash table insert failure, and move hdr->hrec
reallocation so that all possible failures occur before hrec is
added into the header.

Add bcf_hdr_add_hrec() documentation, including a warning that the
caller should not touch hrec after a successful return.
@jkbonfield
Copy link
Contributor

Definitely an improvement, but I'm wondering now about the bcf_hdr_register_hrec call at the start of the function. This takes a copy of hrec and adds it to the header dictionaries even when it subsequently fails later on.

I can't find a bcf_hdr_unregister_hrec to remove it again if it fails. What is the consequence of having an extra dict item which isn't referred to by the header? Even if it's ignored, it does mean there is an extra pointer to it cached. Although maybe that's not an issue if the header freeing doesn't free via the hdr->dict pointers but instead frees via get_hdr_aux() function.

@jkbonfield
Copy link
Contributor

Ah wait, I'm wrong. hdr->dict is get_hdr_aux(), but for a specific element. :/

static inline bcf_hdr_aux_t *get_hdr_aux(const bcf_hdr_t *hdr)
{
    return (bcf_hdr_aux_t *)hdr->dict[0];
}

That [0] ought to be BCF_DT_ID so it's not obscure, given that bcf_hdr_register_hrec uses hdr->dict[BCF_DT_CTG] meaning I had to go hunting for the defines to work out if it was manipulating the same dictionary. I'm also unsure why "aux" when it's "id"? There's some confusing terminology in this code.

Anyway, later on there is also use of BCF_DT_IT in bcf_hdr_register_hrec:

    vdict_t *d = (vdict_t*)hdr->dict[BCF_DT_ID];
    k = kh_get(vdict, d, str);
    if ( k != kh_end(d) )
    {
        // already present
        free(str);
        if ( kh_val(d, k).hrec[info&0xf] ) return 0;
        kh_val(d, k).info[info&0xf] = info;
        kh_val(d, k).hrec[info&0xf] = hrec;
        if ( idx==-1 ) {
            if (hrec_add_idx(hrec, kh_val(d, k).id) < 0) {
                return -1;
            }
        }
        return 1;
    }
    k = kh_put(vdict, d, str, &ret);

So this function has clearly taken a copy of hrec, irrespective of whether the function call fails, meaning the comment you added is incorrect. The user should probably just never touch hrec again after calling our API, and maybe we should make it free hrec itself given it mostly will already do this, even after an error.

@daviesrob
Copy link
Member Author

Yes, leaving a dangling pointer in bcf_hdr_t::dict[].hrec[] probably isn't good. I don't think it's a problem for cleaning up though - bcf_hdr_destroy() removes hrecs via bcf_hdr_t::hrec[]. bcf_hdr_remove() either iterates through bcf_hdr_t::hrec[] or checks that the pointer it gets from the dictionary is in there although the latter is a bit dodgy as it might throw an assertion (or do nothing if assertions were disabled).

So I think the main risk of a dangling pointer in bcf_hdr_t::dict[].hrec[] would come from failing to check that bcf_hdr_add_hrec() worked, managing to obtain a copy via bcf_hdr_get_hrec() and then using it. Apart maybe from the issue with bcf_hdr_remove(), nothing should try to delete an hrec that way.

Borrow a bit of code from bcf_hdr_remove() that removes hrec
pointers from the hdr->dict[] dictionaries, turn it into
bcf_hdr_unregister_hrec() and use it to clean up the dictionary
should bcf_hdr_add_hrec() fail.

There's actually only one place in bcf_hdr_add_hrec() where this
needs to be called.  In all other paths returning an error, either
the hrec type is not one that needs to be cleaned up, or hrec
will not have been added to the dictionary.

bcf_hdr_remove() is updated to call the new function when it's
removing all lines of a given type.  The code handling lines
with a specific key is unchanged as is that case it already has
the key to look up in the dictionary and so doesn't need to
hunt for it in the header record.
As bcf_hdr_remove() deletes hrec structs it needs to ensure that
the pointers to them in the bcf_hdr_aux_t::gen dictionary are
removed as well, otherwise callers to bcf_hdr_get_hrec() may get
a stale pointer to the deleted item.

Unfortunately bcf_hdr_remove_from_hdict() needs to allocate
some memory to find the items in the dictionary.  If that fails
it falls back to a search through the dictionary values to
find the item, so we can be sure that it will always succeed.

Enhance tests to ensure bcf_hdr_get_hrec() returns NULL for
removed records.
@daviesrob
Copy link
Member Author

I've added a commit that should prevent dangling pointers in bcf_hdr_t::dict[].hrec[].

While working on that, I also noticed that bcf_hdr_remove() wasn't removing items from the bcf_hdr_aux_t::gen dictionary. This is possibly a worse bug, as trying to retrieve the deleted item with bcf_hdr_get_hrec() would return the stale pointer. A second commit fixes that problem, and adds some tests to ensure removal works correctly.

@jkbonfield
Copy link
Contributor

Scary stuff, but looks good. I checked it doesn't trigger issues with bcftools too.

@jkbonfield jkbonfield merged commit 10f1516 into samtools:develop Jul 13, 2023
8 checks passed
@daviesrob daviesrob deleted the hrec_add branch July 13, 2023 11:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants