Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up Genotype JEXL Filtering #532

Merged
merged 5 commits into from Apr 14, 2016

Conversation

Projects
None yet
4 participants
@bbimber
Copy link
Contributor

bbimber commented Mar 21, 2016

I am hoping someone more familiar with JEXL in htsjdk could do a quick review of what i'm proposing to see if this looks reasonable. As noted in your code, genotype-level JEXL filtering is supported, but not implemented very cleanly. I tried to improve that and also make it more consistent w/ the context available for VariantContext.

Here's a related GATK forum thread:

http://gatkforums.broadinstitute.org/gatk/discussion/comment/28654#Comment_28654

@vdauwera

This comment has been minimized.

Copy link
Contributor

vdauwera commented Mar 22, 2016

@yfarjoun Think you could have a look at this? Seems like it would improve JEXL capabilities in GATK.

@bbimber

This comment has been minimized.

Copy link
Contributor Author

bbimber commented Mar 22, 2016

Hello,

The GATK folks suggested I include an example of how this alters filtering. Here are some key points:

  1. This is general code cleanup. The existing code has a lot of comments about how this codepath needed improvement.

  2. When filtering at the site level, the VariantContext object was in scope. However, when filtering at the Genotype level, neither vc nor genotype are in scope. That's a big limitation. This patch enabled genotype filters like:

-G_filter "isHet && g.hasAd() && g.getAD().1 < 5"

  1. It fixes what I think was a bug in the original implementation, which omits AD + PL from the JEXL map. This is because they were not getting included in Genotype.getExtendedAttributes().

I'm happy to incorporate suggestions on alternate approaches.

@lindenb

This comment has been minimized.

Copy link
Contributor

lindenb commented Mar 22, 2016

@bbimber for you information: there is now a javascript-based engine in picard filtervcf FilterVcf . It's, IMHO, much more powerful than JEXL . See

https://github.com/samtools/htsjdk/blob/master/testdata/htsjdk/variant/variantFilter02.js

for an example.

@vdauwera

This comment has been minimized.

Copy link
Contributor

vdauwera commented Mar 23, 2016

I agree that the javascript filters are very cool but we still need this PR which cleans up existing code and extends functionality, so that we can still offer straightforward one-line querying via JEXL.

@bbimber

This comment has been minimized.

Copy link
Contributor Author

bbimber commented Mar 29, 2016

hi guys - sorry to bug on this, but is there anything else you need from me? is there a process through which hts-jdk submissions are reviewed? as noted by @vdauwera, even if there are newer filter options, JEXL is still used in GATK. thanks.

@vdauwera

This comment has been minimized.

Copy link
Contributor

vdauwera commented Mar 29, 2016

@bbimber I don't think we need anything directly from you right now; I need to find someone to do a technical review of your PR. Since it mostly benefits GATK it will have to be someone from our team, but my first choice of reviewer is on vacation. I'll try to catch someone's attention. Sorry for the delay!

@droazen droazen assigned yfarjoun and unassigned vdauwera Mar 29, 2016

attrs.put("g", new AttributeGetter() { public Object get(Genotype g) { return g; }});
attrs.put(VCFConstants.GENOTYPE_KEY, new AttributeGetter() { public Object get(Genotype g) { return g.getGenotypeString(); }});

attrs.put("isHom", new AttributeGetter() { public Object get(Genotype g) { return g.isHom() ? "1" : "0"; }});

This comment has been minimized.

Copy link
@yfarjoun

yfarjoun Apr 5, 2016

Contributor

you can use java8 magic to make this a lot cleaner (intelliJ gives nice recommendations, and while you are at it, could you do the same for the VariantJEXLContext?)

This comment has been minimized.

Copy link
@bbimber

bbimber Apr 5, 2016

Author Contributor

do you mean lambdas? can i introduce those if you're using java6?

This comment has been minimized.

Copy link
@yfarjoun

yfarjoun Apr 5, 2016

Contributor

htsjdk is on java 8.

This comment has been minimized.

Copy link
@bbimber

bbimber Apr 8, 2016

Author Contributor

FWIW, the checked in project IPR file has the languageLevel as 1.6. If you're officially on 1.8, it might be nice to update that file (not in this PR).

This comment has been minimized.

Copy link
@yfarjoun

yfarjoun Apr 8, 2016

Contributor

you are correct, though the README.md states that we are on JAVA8...

On Fri, Apr 8, 2016 at 1:32 PM, bbimber notifications@github.com wrote:

In src/java/htsjdk/variant/variantcontext/GenotypeJEXLContext.java
#532 (comment):

  • */
    +public class GenotypeJEXLContext extends VariantJEXLContext {
  • private Genotype g;
  • private interface AttributeGetter {
  •    public Object get(Genotype g);
    
  • }
  • private static Map<String, AttributeGetter> attrs = new HashMap<String, AttributeGetter>();
  • static {
  •    attrs.put("g", new AttributeGetter() { public Object get(Genotype g) { return g; }});
    
  •    attrs.put(VCFConstants.GENOTYPE_KEY, new AttributeGetter() { public Object get(Genotype g) { return g.getGenotypeString(); }});
    
  •    attrs.put("isHom", new AttributeGetter() { public Object get(Genotype g) { return g.isHom() ? "1" : "0"; }});
    

FWIW, the checked in project IPR file has the languageLevel as 1.6. If
you're officially on 1.8, it might be nice to update that file (not in this
PR).


You are receiving this because you were assigned.
Reply to this email directly or view it on GitHub
https://github.com/samtools/htsjdk/pull/532/files/e8e473449289b9b27f478e72a75a626244eaa363#r59061165

jContext = new MapContext(infoMap);
}
else {
jContext = new MapContext(new HashMap<String, Object>());

This comment has been minimized.

Copy link
@yfarjoun

yfarjoun Apr 5, 2016

Contributor

Use Collections.emptyMap

}
// QUESTION: the original code could have created a new VariantJEXLContext with a null vc.
// i dont think this is right, though i have no idea if vc could actually be null
if ( vc != null ) {

This comment has been minimized.

Copy link
@yfarjoun

yfarjoun Apr 5, 2016

Contributor

I'd prefer if you kept the nesting lower by inverting the if statement and returning Collections.emptyMap() if vc==null.

attrs.put("isAvailable", new AttributeGetter() { public Object get(Genotype g) { return g.isAvailable() ? "1" : "0"; }});
attrs.put("isPassFT", new AttributeGetter() { public Object get(Genotype g) { return g.isFiltered() ? "0" : "1"; }});
attrs.put(VCFConstants.GENOTYPE_FILTER_KEY, new AttributeGetter() { public Object get(Genotype g) { return g.isFiltered()? g.getFilters() : "PASS"; }});
attrs.put(VCFConstants.GENOTYPE_QUALITY_KEY, new AttributeGetter() { public Object get(Genotype g) { return g.getGQ(); }});

This comment has been minimized.

Copy link
@yfarjoun

yfarjoun Apr 5, 2016

Contributor

what about DP? and what about the extended attributes?

This comment has been minimized.

Copy link
@bbimber

bbimber Apr 5, 2016

Author Contributor

these would be captured by GenotypeJEXLContext.get(), correct? DP could be added to the static map, though I'm not sure how one supports extended (variable) attributes there.

I was assuming the primary point of the map was to support non-standard keys (i.e. isHom, isHomRef). Otherwise, we defer back to Genotype.hasAnyAttribute(). This is comparable to what exists in VariantJEXLContext.get().

This comment has been minimized.

Copy link
@yfarjoun

yfarjoun Apr 5, 2016

Contributor

I just mentioned it since it was in the original code and so I feel that it would change the API.

This comment has been minimized.

Copy link
@bbimber

bbimber Apr 5, 2016

Author Contributor

I believe the original code in JEXLMap actually had bugs here. I think you're referring to lines 109-113 in the previous version:

                if ( g.hasDP() )
                    infoMap.put(VCFConstants.DEPTH_KEY, g.getDP());
                for ( Entry<String, Object> e : g.getExtendedAttributes().entrySet() ) {
                    if ( e.getValue() != null && !e.getValue().equals(VCFConstants.MISSING_VALUE_v4) )
                        infoMap.put(e.getKey(), e.getValue());

I'm not seeing how we put anything in the static map that is dependent on the Genotype itself, which I why I'm having GenotypeJEXLContext supply these in get().

Also, the original code special-cased some genotype fields and then deferred to Genotype.getExtendedAttributes() for the rest. The problem is that FastGenotype and JEXLMap were not consistent about which fields they special case. This mostly means that there was no way to ask for AD.

The implementation I'm proposing will change behavior, but that's on purpose. Any value in the Genotype should be available to JEXL. I assume that was the original intent anyway.

return attrs.get(name).get(g);
} else if ( g.hasAnyAttribute(name) ) {
return g.getAnyAttribute(name);
} else if ( g.getFilters().contains(name) ) {

This comment has been minimized.

Copy link
@yfarjoun

yfarjoun Apr 5, 2016

Contributor

This is a surprising behavior...is it possible to add a function that takes a filtername and responds with 1 or 0 depending on whether the filter is applied to the genotype?

This comment has been minimized.

Copy link
@bbimber

bbimber Apr 5, 2016

Author Contributor

I tend to agree, but it's exactly what VariantJEXLContext does. With all my changes, I was trying to keep GenotypeJEXLContext behavior consistent with existing behavior in VariantJEXLContext. Thoughts on consistency?

This comment has been minimized.

Copy link
@yfarjoun

yfarjoun Apr 5, 2016

Contributor

hmmm OK. keep as before.

if ( e.getValue() != null && !e.getValue().equals(VCFConstants.MISSING_VALUE_v4) )
infoMap.put(e.getKey(), e.getValue());
}
// QUESTION: the original code could have created a new VariantJEXLContext with a null vc.

This comment has been minimized.

Copy link
@yfarjoun

yfarjoun Apr 5, 2016

Contributor

this is fine. remove comment.

@yfarjoun

This comment has been minimized.

Copy link
Contributor

yfarjoun commented Apr 5, 2016

back to you @bbimber

@yfarjoun yfarjoun removed their assignment Apr 5, 2016

public Object get(Genotype g);
}

private static Map<String, AttributeGetter> attrs = new HashMap<String, AttributeGetter>();

This comment has been minimized.

Copy link
@yfarjoun

yfarjoun Apr 5, 2016

Contributor

no reason to skimp on verbosity here, use attributes. Could you also change the name of the attributes in VariantJEXLContext from "x" to "attributes"?

@bbimber

This comment has been minimized.

Copy link
Contributor Author

bbimber commented Apr 5, 2016

I believe the last commit addresses all the comments

attrs.put(VCFConstants.GENOTYPE_FILTER_KEY, new AttributeGetter() { public Object get(Genotype g) { return g.isFiltered()? g.getFilters() : "PASS"; }});
attrs.put(VCFConstants.GENOTYPE_QUALITY_KEY, new AttributeGetter() { public Object get(Genotype g) { return g.getGQ(); }});
attributes.put("g", (Genotype g) -> g);
attributes.put(VCFConstants.GENOTYPE_KEY, Genotype::getGenotypeString);

This comment has been minimized.

Copy link
@yfarjoun

yfarjoun Apr 6, 2016

Contributor

is the (** Genotype** g) -> g needed? will not g->g also work?

Also, g is now an overloaded variable. Could you rename the member one to be genotype?

(both comments have clear parallels in the VC part.)

@yfarjoun

This comment has been minimized.

Copy link
Contributor

yfarjoun commented Apr 6, 2016

Thanks @bbimber I think this is great. It would be even better if you could add some tests that showcase usages beyond the very simple tests I see in VariantJEXLContextUnitTest....otherwise I made some minor stylistic comments.

Thanks again.

@bbimber

This comment has been minimized.

Copy link
Contributor Author

bbimber commented Apr 8, 2016

now with unit test

@bbimber

This comment has been minimized.

Copy link
Contributor Author

bbimber commented Apr 13, 2016

Hey - just wanted to check back on this. So far as i can see the previous comments have been addressed, and I expanded the unit test to exercise genotype filtering.

@yfarjoun yfarjoun merged commit 4d0f516 into samtools:master Apr 14, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@yfarjoun

This comment has been minimized.

Copy link
Contributor

yfarjoun commented Apr 14, 2016

thanks!

@bbimber bbimber deleted the bbimber:genotypeFilter branch Apr 14, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.