Sometimes doubling results #163

Closed
cbizon opened this issue Apr 25, 2021 · 8 comments

@cbizon
Contributor

cbizon commented Apr 25, 2021

Query:

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "id": "NCBIGENE:1017",
                    "category": "biolink:Gene"
                },
                "n1": {
                    "category": "biolink:Pathway"
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1"
                }
            }
        }
    }
}

See ranking-agent/aragorn#10

This query returns 116 results from BTE (always? mostly?). From Strider it sometimes returns 232 (2 × 116). The doubling is semi-reproducible when I call Strider twice in rapid succession, but I have also seen it on a first call. Possibly a BTE issue instead?

@patrickkwang
Contributor

Very difficult to reproduce and test, in part due to BTE rate limits. We also have not observed this with any other queries.

@cbizon
Contributor Author

cbizon commented Jul 22, 2021

Here's the same thing happening reproducibly with COHD:

query = {
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "name": "drug-induced liver injury",
                    "ids": [
                        "SNOMEDCT:197358007"
                    ]
                },
                "n1": {
                    "categories": [
                        "biolink:DiseaseOrPhenotypicFeature"
                    ],
                    "name": "Disease Or Phenotypic Feature"
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": [
                        "biolink:correlated_with"
                    ]
                }
            }
        }
    }
}

A direct COHD call for this one-hop query returns 460 results in a second or two. In Strider it runs for 10 minutes (!) and returns 880 results. This turns out to be a doubling: COHD returns 440 distinct n1 nodes, but some of them are repeated 2 or 3 times (with different kedges).
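One way to check for this kind of repetition is to count how often each n1 identifier is bound across the results. The sketch below is illustrative only: it assumes a TRAPI-style message whose `results` entries carry `node_bindings` keyed by query node ID, and the helper name `count_node_bindings` and the toy identifiers (`EX:A`, `EX:B`, `k1`...) are made up for the example, not taken from COHD or Strider.

```python
from collections import Counter


def count_node_bindings(message, qnode_id="n1"):
    """Count how many results bind each knowledge-graph node to `qnode_id`.

    Assumes a TRAPI-style layout:
    message["results"][i]["node_bindings"][qnode_id] -> [{"id": ...}, ...]
    """
    counts = Counter()
    for result in message.get("results", []):
        for binding in result.get("node_bindings", {}).get(qnode_id, []):
            counts[binding["id"]] += 1
    return counts


# Hypothetical toy message: EX:A is returned twice, each time with a
# different knowledge-graph edge, mirroring the repetition described above.
toy_message = {
    "results": [
        {
            "node_bindings": {"n0": [{"id": "SNOMEDCT:197358007"}], "n1": [{"id": "EX:A"}]},
            "edge_bindings": {"e0": [{"id": "k1"}]},
        },
        {
            "node_bindings": {"n0": [{"id": "SNOMEDCT:197358007"}], "n1": [{"id": "EX:A"}]},
            "edge_bindings": {"e0": [{"id": "k2"}]},
        },
        {
            "node_bindings": {"n0": [{"id": "SNOMEDCT:197358007"}], "n1": [{"id": "EX:B"}]},
            "edge_bindings": {"e0": [{"id": "k3"}]},
        },
    ]
}

counts = count_node_bindings(toy_message)
total_results = sum(counts.values())   # number of n1 bindings across all results
distinct_nodes = len(counts)           # number of distinct n1 identifiers
repeated = {node: n for node, n in counts.items() if n > 1}
```

Comparing `total_results` against `distinct_nodes` separates "more results than distinct nodes" from a true doubling of the whole result set.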

@cbizon
Contributor Author

cbizon commented Jul 22, 2021

@mmersmann

Decisions:

  1. Symmetric predicates should be answered regardless of the direction in which we ask.
  2. Inverses: we should only send the canonical predicate direction, which means canonicalizing query graphs.

Action: @patrickkwang to create a branch to address.
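A minimal sketch of what canonicalizing a query graph could look like is given here for illustration. It is not Strider's implementation. The `INVERSE_TO_CANONICAL` table is a hand-written, hypothetical stand-in for a lookup against the Biolink Model, and the function name `canonicalize_qgraph` is invented for this example. Edges whose predicates are all non-canonical are flipped (subject and object swapped, predicates replaced); symmetric predicates and edges without predicates are left untouched.

```python
import copy

# Hypothetical lookup: non-canonical predicate -> canonical inverse.
# A real implementation would derive this from the Biolink Model.
INVERSE_TO_CANONICAL = {
    "biolink:treated_by": "biolink:treats",
}


def canonicalize_qgraph(qgraph):
    """Return a copy of `qgraph` in which every edge uses canonical predicates.

    An edge whose predicates are all non-canonical is reversed: its subject
    and object are swapped and each predicate is replaced by its inverse.
    """
    canonical = copy.deepcopy(qgraph)
    for edge in canonical["edges"].values():
        predicates = edge.get("predicates") or []
        inverted = [p for p in predicates if p in INVERSE_TO_CANONICAL]
        if not inverted:
            continue  # already canonical, symmetric, or unspecified
        if len(inverted) != len(predicates):
            # Mixed directions would need the edge to be split; out of scope here.
            raise ValueError("edge mixes canonical and non-canonical predicates")
        edge["predicates"] = [INVERSE_TO_CANONICAL[p] for p in predicates]
        edge["subject"], edge["object"] = edge["object"], edge["subject"]
    return canonical


example = {
    "nodes": {"n0": {}, "n1": {}},
    "edges": {
        "e0": {"subject": "n0", "object": "n1", "predicates": ["biolink:treated_by"]},
        "e1": {"subject": "n0", "object": "n1", "predicates": ["biolink:correlated_with"]},
    },
}
canonical_example = canonicalize_qgraph(example)
```

In this sketch `e0` comes back as `n1 -[biolink:treats]-> n0`, while the symmetric `e1` is unchanged, so each relationship is requested in only one direction.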

@mmersmann

Recheck, given new Strider release, to see if this can be closed.

@vgardner-renci

@cbizon @patrickkwang Can this issue be closed?

@cbizon
Contributor Author

cbizon commented Aug 25, 2021

I think so, but it's hard to tell because Strider is currently returning 0 results for this query: COHD has moved to TRAPI 1.2 and is therefore no longer in the kp-registry.

@patrickkwang
Contributor

Appears fixed on master.
