Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hetio v0.2.3 breaks integration code #10

Closed
veleritas opened this issue Mar 8, 2017 · 6 comments
Closed

Hetio v0.2.3 breaks integration code #10

veleritas opened this issue Mar 8, 2017 · 6 comments
Assignees

Comments

@veleritas
Copy link
Contributor

veleritas commented Mar 8, 2017

Hi Daniel,

Just wanted to note that the new assertions you added into v0.2.3 about whether a node or edge have already been integrated into the network breaks earlier code in the integrate repository.

Looking at the blame for hetnet.py shows that those assertion lines were added only 21 days ago with commit d32ba2a, and are therefore very recent (I wasn't having these problems before).

For now I've just commented out the cells which add edges which already exist in the network, but I just wanted you to know that dhimmel/integrate/integrate.ipynb won't run directly anymore with the latest hetio release.

@dhimmel
Copy link
Member

dhimmel commented Mar 8, 2017

Eek. Okay I'll look into this. Sorry!

@veleritas
Copy link
Contributor Author

The easy way might be to modify hetio so that if the node or edge already exists in the network, then the code just skips adding that node/edge again. This would probably be easier than modifying all of the integrate code to use try/catch statements to deal with all of the assertion errors.

@dhimmel
Copy link
Member

dhimmel commented Mar 9, 2017

@veleritas as you suspected, d32ba2a adds a check for duplicate edge additions. However, duplicate nodes additions previously threw an error. I was surprised when I looked through the code and saw that duplicate edges were allowed to be added.

I definitely think attempting to add a duplicate edge should throw an error... allowing additions of duplicates edges will almost certainly lead to a host of other bugs and confusing behaviors down the road.

The easy way might be to modify hetio so that if the node or edge already exists in the network, then the code just skips adding that node/edge again.

I'd rather implement these checks or error handing in integrate.ipynb. I'll look into where duplicate edges are being added in integrate.ipynb and attempt to resolve these issues. Remember the Zen:

Errors should never pass silently.

Hopefully, I can identify a solution that:

  1. makes integrate.ipynb work with hetio>=0.2.3
  2. reproduces Hetionet v1.0 exactly

I'm thinking that in the past when a duplicate edge was added, it would essentially overwrite the existing edge. So I'm crossing my fingers that drop_duplicates in pandas using keep='last' can save us here. Will let you know.

@dhimmel dhimmel self-assigned this Mar 9, 2017
dhimmel added a commit to dhimmel/drugcentral that referenced this issue Mar 9, 2017
dhimmel added a commit to dhimmel/integrate that referenced this issue Mar 9, 2017
See hetio/hetnetpy#10

Hetnet output different. Not yet tracked, pending investigation.
@dhimmel
Copy link
Member

dhimmel commented Mar 9, 2017

Here's the situation we're dealing with.

The following issues exposed by d32ba2a cause changes to the hetnet:

  1. 2925 Gene-interacts-Gene edges are self-loops. This makes sense (for example homodimers)
  2. Self to self gene pairs in output data dhimmel/erc#2: 2 ERC gene pairs were self-loops (this does not make sense, should be removed)

I believe, these changes will affect the hetnet compared to Hetionet v1.0. This is because previously the self-loops of bidirectional edges would be considered inverted and therefore not get exported. d32ba2a added support for self loops.

The following issues exposed by d32ba2a do not cause changes to the hetnet:

  1. Duplicate gene pairs in output data dhimmel/erc#1: one ERC gene pair was duplicated. This does not make sense, Resolved by dropping the duplicate gene pair with keep='last'.
  2. dhimmel/drugcentral@0e085dd: duplicated Compound-includes-PharmacologicClass edges. Resolved by removing duplicates upstream in dhimmel/drugcentral.

These issues became apparent due to the duplicate edge checking added in d32ba2a. I currently believe these fixes will not affect the hetnet compared to Hetionet v1.0.

I'm still investigating the issue.

@dhimmel
Copy link
Member

dhimmel commented Mar 9, 2017

Effect on Project Rephetio

d32ba2a enabled bidirectional self-loops in hetio. Here I'll assess the downstream effect on Project Rephetio of allowing self-loops in integrate.ipynb. 2925 new GiG and 2 new GcG edges would be added. When computing features, the path counts should stay the same as we exclude paths with duplicate nodes. However, degree-weighted path counts could change, since some nodes will now have a different degree.

Therefore I think it's best to filter self-loops in integrate.ipynb so we keep the old behavior. In future versions of Hetionet, we can remove the self-interaction filter.

@veleritas I implement a fix in dhimmel/integrate#12. If you adopt the changes where you're tagged, you should be able to use hetio v0.2.3 without any downstream effects. I recommend the upgrade since the old hetio behaviors were essentially bugs. Let me know if this makes sense.

@dhimmel
Copy link
Member

dhimmel commented Apr 4, 2018

I think there's nothing outstanding on this issue, so I'll close. Happy to reopen if it's still active.

@dhimmel dhimmel closed this as completed Apr 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants