Lots and lots of duplication #267

cbizon · 2021-09-02T20:04:31Z

Standup query:

query = {
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": [
                        "NCBIGene:6656"
                    ],
                    "categories": [
                        "biolink:Gene"
                    ]
                },
                "n1": {
                    "categories": [
                        "biolink:BiologicalProcessOrActivity"
                    ]
                },
                "n2": {
                    "ids": [
                        "NCBIGene:6657"
                    ],
                    "categories": [
                        "biolink:Gene"
                    ]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": [
                        "biolink:related_to"
                    ]
                },
                "e1": {
                    "subject": "n2",
                    "object": "n1",
                    "predicates": [
                        "biolink:related_to"
                    ]
                }
            }
        }
    }
}

The query asks which processes two genes have in common. We return 1174 results, but only 62 are unique.

Oddly, each result occurs exactly 1, 5, 10, 25, 50, 75, 100, or 150 times.

The most common result (150 copies) is this:

{
 "edge_bindings": {
  "e0": [
   {
    "id": "NCBIGene:6656-biolink:participates_in-GO:0006355"
   }
  ],
  "e1": [
   {
    "id": "NCBIGene:6657-biolink:participates_in-GO:0006355"
   }
  ]
 },
 "node_bindings": {
  "n0": [
   {
    "id": "NCBIGene:6656"
   }
  ],
  "n1": [
   {
    "id": "GO:0006355"
   }
  ],
  "n2": [
   {
    "id": "NCBIGene:6657"
   }
  ]
 },
 "score": null
}

I suspect KP funkiness, but even so, I think we should deduplicate these. It's possible that some deduplication could happen during the process, allowing some speedup as well.
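As a rough illustration of what removing the duplicates could look like, here is a minimal sketch. It is not the project's actual merging code: the helper names (`result_key`, `unique_results`, `copy_counts`) are hypothetical, and it assumes two results count as duplicates when they bind the same IDs to the same query-graph nodes and edges, ignoring `score` and any other fields.

```python
from collections import Counter


def result_key(result):
    """Build a hashable key from a result's node and edge bindings.

    Assumption: results are duplicates when their bindings match;
    score and any other attributes are ignored.
    """
    def canon(bindings):
        # Sort by query-graph key and by bound ID so ordering does not matter.
        return tuple(
            (qid, tuple(sorted(b["id"] for b in bound)))
            for qid, bound in sorted(bindings.items())
        )

    return canon(result["node_bindings"]), canon(result["edge_bindings"])


def unique_results(results):
    """Drop duplicate results, keeping the first copy of each."""
    seen = set()
    unique = []
    for result in results:
        key = result_key(result)
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique


def copy_counts(results):
    """Count how many results share each binding key."""
    return Counter(result_key(r) for r in results)


# The most common result from this issue, repeated to mimic the duplication.
example = {
    "edge_bindings": {
        "e0": [{"id": "NCBIGene:6656-biolink:participates_in-GO:0006355"}],
        "e1": [{"id": "NCBIGene:6657-biolink:participates_in-GO:0006355"}],
    },
    "node_bindings": {
        "n0": [{"id": "NCBIGene:6656"}],
        "n1": [{"id": "GO:0006355"}],
        "n2": [{"id": "NCBIGene:6657"}],
    },
    "score": None,
}
results = [dict(example) for _ in range(150)]
print(len(unique_results(results)))  # 1
```

Tracking the `seen` set while results are being collected, rather than as a final pass, is one way the deduplication could happen earlier in the process. Whether that yields a real speedup would need measuring.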


cbizon · 2021-09-02T20:05:25Z

This also causes trouble for AC, which is trying to count things.

cbizon · 2021-09-08T23:37:34Z

There are also repeated values in D.1.

uhbrar · 2021-10-05T14:56:49Z

This may be fixed by the smarter message merging Alon has been implementing. Once that's in, I'll check back to see whether it addresses the problem.

richakanwar13 · 2021-12-03T16:21:05Z

Can be closed when #276 is closed.

cbizon added the Priority: Medium, standup (Issue related to a Translator standup), and Type: Enhancement labels Sep 2, 2021

cbizon mentioned this issue Sep 2, 2021

Gene (NCBIGene:6656) - related_to - BiologicalProcessOrActivity - related_to - Gene (NCBIGene:6657) NCATSTranslator/testing#111

Open

cbizon added the DecemberDemo label Sep 8, 2021

cbizon mentioned this issue Sep 8, 2021

Aragorn repeating results on D.1 query NCATSTranslator/minihackathons#227

Open

patrickkwang added Type: Bug and removed Type: Enhancement labels Sep 9, 2021

patrickkwang assigned uhbrar Sep 9, 2021

patrickkwang added Priority: High and removed Priority: Medium labels Sep 9, 2021

patrickkwang mentioned this issue Oct 5, 2021

No results for one hop with both nodes specified (Duplicates) #76

Closed

richakanwar13 added the Status: On Hold label Feb 25, 2022

uhbrar closed this as completed Sep 22, 2022
