Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return 0 for patients with no neoantigens #52

Merged
merged 2 commits into from Jun 15, 2016
Merged

Return 0 for patients with no neoantigens #52

merged 2 commits into from Jun 15, 2016

Conversation

arahuja
Copy link
Contributor

@arahuja arahuja commented Jun 15, 2016

Removed the check of IDs since this give NaN for patients without neoantigens

@arahuja arahuja changed the title Return 0 for patients with neoantigens Return 0 for patients with no neoantigens Jun 15, 2016
@coveralls
Copy link

Coverage Status

Coverage increased (+0.05%) to 57.87% when pulling 218e077 on neoantigen-nan into 7aed533 on master.

@coveralls
Copy link

coveralls commented Jun 15, 2016

Coverage Status

Coverage decreased (-0.8%) to 57.031% when pulling 218e077 on neoantigen-nan into 7aed533 on master.

@tavinathanson
Copy link
Member

That was intended, and downstream the code checks for null and if null prints that patient X was skipped; which seems important when e.g. cohort = data.init_cohort(only_patients_with_bams=False) (sorry to reference a private repo here).

At a high level, it seems useful/necessary to maintain a distinction between 0 and not available? Where is the NaN a problem for you?

@arahuja
Copy link
Contributor Author

arahuja commented Jun 15, 2016

@tavinathanson in this case the correct answer is 0 though and not NaN (filtering a dataframe for an ID when there are 0 rows that match should be 0)

Compared to the other cases where we are checking if there is an entry corresponding to the ID and then using len. Here there are actually 0 neoantigens so there are 0 rows in the dataframe (and the ID is also not in the dataframe). Alternatively, we can add a separate check that we ran the neoantigens part for this ID.

Also, I haven't found this useful for the mutation counts either, since if a patient is missing VCF files, we return an empty VariantCollection. Which means the check for the ID passes, and we (falsely) return 0 as opposed to NaN

@tavinathanson
Copy link
Member

Ah, I see. I guess we could look at what mutations are present to decide between 0 and NaN?

Re the empty VariantCollection: great point. It used to filter them out properly, so that's a bug.

@tavinathanson
Copy link
Member

Merge away per offline discussion:

@tavinathanson: still not clear to me where you're running into issues with nan?
@arahuja: When there are no neoantigens for example, plot_benefit etc throw out some of the patients
@tavinathanson: suppose you can merge it in and i can follow it up with a PR that brings back nan in a non buggy way, sound reasonable?
@arahuja: Sure - what were you thinking though?
@tavinathanson: using load_variants to distinguish 0 from not available; and for not available, we should throw em out (not fully formed; does that make sense?)
@arahuja: yea but it could be variants were available but hla wasn't
@tavinathanson: can we rely on whether or not the neoantigen cache folder for that patient exists?
@arahuja: not sure when it’s created (i.e. whether it’s created regardless and then just populated if anything is missing)
@tavinathanson: i think not created regardless but have to check

@tavinathanson tavinathanson mentioned this pull request Jun 15, 2016
@arahuja arahuja merged commit f8a14b9 into master Jun 15, 2016
@arahuja arahuja deleted the neoantigen-nan branch June 15, 2016 21:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants