
EPC Corrections #86

Closed
richakanwar13 opened this issue Feb 25, 2022 · 10 comments

@richakanwar13

Need to ensure this is working as expected.

@kennethmorton
Collaborator

Many of our edges flowing through automat to strider and then to Aragorn ultimately end up looking like this.

{
     "attributes": [
            {
                "attribute_source": null,
                "attribute_type_id": "biolink:original_knowledge_source",
                "attributes": null,
                "description": null,
                "original_attribute_name": null,
                "value": "infores:drugcentral",
                "value_type_id": null,
                "value_url": null
            },
            {
                "attribute_source": null,
                "attribute_type_id": "biolink:aggregator_knowledge_source",
                "attributes": null,
                "description": null,
                "original_attribute_name": null,
                "value": "infores:aragorn",
                "value_type_id": null,
                "value_url": null
            },
            {
                "attribute_source": null,
                "attribute_type_id": "biolink:aggregator_knowledge_source",
                "attributes": null,
                "description": null,
                "original_attribute_name": null,
                "value": "infores:automat-robokop",
                "value_type_id": null,
                "value_url": null
            }
    ]
}

Here is an attribute from BTE

{
          "attribute_source": null,
          "attribute_type_id": "biolink:aggregator_knowledge_source",
          "attributes": null,
          "description": null,
          "original_attribute_name": null,
          "value": ["infores:translator-biothings-explorer"],
          "value_type_id": "biolink:InformationResource",
          "value_url": null
}

Here is one from COHD

{
          "attribute_source": "infores:cohd",
          "attribute_type_id": "biolink:original_knowledge_source",
          "attributes": null,
          "description": null,
          "original_attribute_name": null,
          "value": "infores:cohd",
          "value_type_id": "biolink:InformationResource",
          "value_url": "http://cohd.io/api/query"
}

I believe that we should be

  • setting attribute_source for original knowledge sources in automat
  • setting value_type_id to biolink:InformationResource pretty much everywhere it is currently null
  • optionally setting value_url, description, original_attribute_name, but I could pass on this for brevity
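A minimal Python sketch of the first two points, assuming the edge attributes have already been parsed from TRAPI JSON into plain dicts. `fill_epc_defaults` is a hypothetical helper. What attribute_source should be for original knowledge sources is left to the caller, since that is the open question raised in the next comment.

```python
from typing import Any, Optional

# The two provenance (EPC) attribute types used in the examples above.
EPC_TYPES = {
    "biolink:original_knowledge_source",
    "biolink:aggregator_knowledge_source",
}


def fill_epc_defaults(
    attributes: list[dict[str, Any]],
    original_source: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Return copies of `attributes` with null EPC fields filled in.

    * value_type_id becomes "biolink:InformationResource" when it is null.
    * attribute_source on original_knowledge_source attributes becomes
      `original_source` when it is null and a value was supplied.
    Non-EPC attributes and fields that are already set are left alone.
    """
    filled = []
    for attr in attributes:
        attr = dict(attr)  # shallow copy so the caller's dicts are not mutated
        type_id = attr.get("attribute_type_id")
        if type_id in EPC_TYPES:
            if attr.get("value_type_id") is None:
                attr["value_type_id"] = "biolink:InformationResource"
            if (
                type_id == "biolink:original_knowledge_source"
                and original_source is not None
                and attr.get("attribute_source") is None
            ):
                attr["attribute_source"] = original_source
        filled.append(attr)
    return filled
```

Returning copies rather than editing in place keeps the helper safe to call on a message that is also being passed along unchanged elsewhere.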

@cbizon
Contributor

cbizon commented Mar 1, 2022

What should attribute_source be for original knowledge sources in automat? "infores:automat" or something else?

@kennethmorton
Collaborator

kennethmorton commented Mar 1, 2022

Technically only attribute_type_id is required, but I feel like we should be adding more as appropriate. According to this, attribute_source should be:

The source of the core assertion made by the key-value pair of an attribute object. Use a CURIE or namespace designator for this resource where possible.

I think for most things where the value is an infores, it is typically the same infores. At least that's what COHD does. In reality I think the intention is to create a linked list of who called whom: "I heard it from here". Under this logic, COHD is doing the right thing as an original knowledge source. It's the aggregators that I think need to change things.

Modifying the edge from earlier, I think it should be

{
     "attributes": [
            {
                "attribute_source": "infores:drugcentral",
                "attribute_type_id": "biolink:original_knowledge_source",
                "attributes": null,
                "description": null,
                "original_attribute_name": null,
                "value": "infores:drugcentral",
                "value_type_id": "biolink:InformationResource",
                "value_url": null
            },
            {
                "attribute_source": "infores:automat-robokop",
                "attribute_type_id": "biolink:aggregator_knowledge_source",
                "attributes": null,
                "description": null,
                "original_attribute_name": null,
                "value": "infores:aragorn",
                "value_type_id": "biolink:InformationResource",
                "value_url": null
            },
            {
                "attribute_source": "infores:drugcentral",
                "attribute_type_id": "biolink:aggregator_knowledge_source",
                "attributes": null,
                "description": null,
                "original_attribute_name": null,
                "value": "infores:automat-robokop",
                "value_type_id": "biolink:InformationResource",
                "value_url": null
            }
    ]
}

This maintains the call stack. Aragorn called Robokop who had info from Drug Central. The order didn't have meaning, which is good because things were out of order anyway.

I have never actually seen this in the wild, but it feels right.

edit: fixed link.

@mbrush

mbrush commented Mar 2, 2022

Hi guys. The documentation describing the TRAPI Standard for Representing Source Retrieval Provenance should answer your questions. Worth a read top to bottom, but example 1B in the Data Examples section is most pertinent. And below this example, you'll see the following relevant comment: "Note that the attribute_source fields indicate the Information Resource that made the key-value assertion about source provenance that is carried in a given Attribute object (here, an assertion that a particular resource was an original or aggregator source of the knowledge expressed in the Edge)."

So, bringing this back to your example: I understand the retrieval path for the Edge to be DrugCentral --> AutomatRobokop --> ARAGORN, so the TRAPI message for this edge will include a separate Attribute object for each of these three Information Resources.

  • The Attribute holding DrugCentral will use an 'original_knowledge_source' attribute type (assuming you know this was the original source of the knowledge - if it aggregated it from somewhere else, but you don't know or want to include this original source, use 'primary_knowledge_source'). The attribute_source field in this Attribute will capture who said that DrugCentral was the original knowledge source - which I think is AutomatRobokop.

  • The Attribute holding AutomatRobokop will use an 'aggregator_knowledge_source' attribute type. And the attribute_source will hold who said that AutomatRobokop was an aggregator source - if AutomatRobokop adds this Attribute to the data before passing it on to ARAGORN, then they would be the source. If ARAGORN adds this Attribute to the data when they receive it, then they would be the source. (Apologies, I forget what was decided here)

  • Finally, the Attribute holding ARAGORN will also use an 'aggregator_knowledge_source' attribute type. And the attribute_source will hold who asserted that ARAGORN was an aggregator source. Again, I am not sure when this Attribute gets added - if ARAGORN does it themselves, then they are the source.

This is how I see things working. Does it make sense? I can easily add an updated version of the data example below (once I understand what the policy is on who adds the Attributes declaring translator aggregator sources)
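The three Attribute objects described in the bullets above could be built roughly as follows. This is a sketch, not an agreed implementation: it assumes each aggregator adds the Attribute naming itself (the point left open in the second and third bullets), and `epc_attribute` and `first_hop_type` are hypothetical helpers, not part of any TRAPI library.

```python
from typing import Any


def epc_attribute(type_id: str, value: str, asserted_by: str) -> dict[str, Any]:
    """Build one retrieval-provenance Attribute (hypothetical helper)."""
    return {
        "attribute_type_id": type_id,
        "value": value,
        "value_type_id": "biolink:InformationResource",
        # The resource that claimed "value played this role for the edge".
        "attribute_source": asserted_by,
    }


def first_hop_type(known_original: bool) -> str:
    """original_knowledge_source if the first hop is known to be the origin
    of the knowledge, otherwise primary_knowledge_source."""
    if known_original:
        return "biolink:original_knowledge_source"
    return "biolink:primary_knowledge_source"


# DrugCentral --> AutomatRobokop --> ARAGORN, assuming each aggregator adds
# the Attribute naming itself before passing the edge on.
attributes = [
    # AutomatRobokop says DrugCentral is the original source.
    epc_attribute(first_hop_type(True), "infores:drugcentral",
                  "infores:automat-robokop"),
    # AutomatRobokop says it is an aggregator before sending to ARAGORN.
    epc_attribute("biolink:aggregator_knowledge_source",
                  "infores:automat-robokop", "infores:automat-robokop"),
    # ARAGORN says it is an aggregator for anyone pulling from it.
    epc_attribute("biolink:aggregator_knowledge_source",
                  "infores:aragorn", "infores:aragorn"),
]
```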

@kennethmorton
Collaborator

kennethmorton commented Mar 2, 2022

Thanks @mbrush, this document is super helpful! I think I had my linked list backwards. attribute_source does NOT mean "I heard it from here" it means "I was asked by".

Translating the prose above into JSON.

{
     "attributes": [
            {
                "attribute_source": "infores:automat-robokop",
                "attribute_type_id": "biolink:original_knowledge_source",
                "attributes": null,
                "description": null,
                "original_attribute_name": null,
                "value": "infores:drugcentral",
                "value_type_id": "biolink:InformationResource",
                "value_url": null
            },
            {
                "attribute_source": "infores:aragorn",
                "attribute_type_id": "biolink:aggregator_knowledge_source",
                "attributes": null,
                "description": null,
                "original_attribute_name": null,
                "value": "infores:aragorn",
                "value_type_id": "biolink:InformationResource",
                "value_url": null
            },
            {
                "attribute_source": "infores:aragorn",
                "attribute_type_id": "biolink:aggregator_knowledge_source",
                "attributes": null,
                "description": null,
                "original_attribute_name": null,
                "value": "infores:automat-robokop",
                "value_type_id": "biolink:InformationResource",
                "value_url": null
            }
    ]
}

This is true because, you are correct, Aragorn adds its own EPC to the message, and the user called Aragorn. This leads me to a follow up question.

How should automat-robokop have responded to Aragorn? I think it might be like this.

{
     "attributes": [
            {
                "attribute_source": "infores:automat-robokop",
                "attribute_type_id": "biolink:original_knowledge_source",
                "attributes": null,
                "description": null,
                "original_attribute_name": null,
                "value": "infores:drugcentral",
                "value_type_id": "biolink:InformationResource",
                "value_url": null
            },
            {
                "attribute_source": "infores:automat-robokop",
                "attribute_type_id": "biolink:aggregator_knowledge_source",
                "attributes": null,
                "description": null,
                "original_attribute_name": null,
                "value": "infores:automat-robokop",
                "value_type_id": "biolink:InformationResource",
                "value_url": null
            }
    ]
}

Is it then Aragorn's job to change the second item to insert itself as the attribute_source to assert that Aragorn called automat-robokop? It should do this in addition to adding its own entry in the list where both attribute_source and value are infores:aragorn?

Thanks for the help!

edit: spelling

@mbrush

mbrush commented Mar 2, 2022

Ideally Aragorn would not have to change anything in the retrieval provenance when it gets data from Automat. It should just add one more attribute adding itself to this chain. Since Automat is the one that adds an attribute indicating itself as an aggregator source before sending messages to other systems, it should de facto be the 'source' for this Attribute. There should be no need to change this when the message gets into Aragorn.
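A sketch of that append-only behavior, assuming attributes are plain dicts; `add_self_as_aggregator` is a hypothetical name. The attributes received from upstream are copied verbatim and only the receiving system's own Attribute is added.

```python
import copy
from typing import Any


def add_self_as_aggregator(
    attributes: list[dict[str, Any]], me: str
) -> list[dict[str, Any]]:
    """Append one aggregator Attribute for `me`; never rewrite upstream entries."""
    chain = copy.deepcopy(attributes)  # leave the received message untouched
    chain.append({
        "attribute_type_id": "biolink:aggregator_knowledge_source",
        "value": me,
        "value_type_id": "biolink:InformationResource",
        "attribute_source": me,  # the sender asserts its own aggregator role
    })
    return chain


# What Automat sends to ARAGORN (both attributes asserted by Automat).
from_automat = [
    {"attribute_type_id": "biolink:original_knowledge_source",
     "value": "infores:drugcentral",
     "value_type_id": "biolink:InformationResource",
     "attribute_source": "infores:automat-robokop"},
    {"attribute_type_id": "biolink:aggregator_knowledge_source",
     "value": "infores:automat-robokop",
     "value_type_id": "biolink:InformationResource",
     "attribute_source": "infores:automat-robokop"},
]
outgoing = add_self_as_aggregator(from_automat, "infores:aragorn")
```

Because each system only ever appends, no receiver needs write access to, or knowledge of, the conventions upstream systems used.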

Re:

attribute_source does NOT mean "I heard it from here" it means "I was asked by".

. . . this is not currently correct (although it would be useful if it were because it lets you order the retrieval path). Generally, attribute_source captures the agent who made the claim captured in the Attribute, as expressed in its core key,value pair (attribute_type_id, value).

If the attribute holds a publication supporting an Edge, attribute_source captures the agent/system that initially said "this publication supports the statement expressed in the edge".

If the attribute holds a confidence score for an Edge, attribute_source captures the agent/system that said "this score reflects our confidence that the statement expressed in the edge is true".

In the case of source retrieval provenance - when an attribute holds an Info Resource from which the knowledge expressed in an edge was retrieved at some point, attribute_source captures the agent/system that initially said "this Info Resource was a source for the statement expressed in this Edge". Since our convention is for the system sending a message to add an Attribute declaring themselves as an aggregating resource before passing it along, I think it follows that the sending aggregator would be the attribute_source (not the receiving/requesting system).

That said, if we wanted to make the attribute_source field more useful for allowing us to order the retrieval path, I would be fine with defining conventions that make this possible. But I don't think we want a scenario where requesting systems have to overwrite values of data passed to them.

@cbizon curious about your take on all this?

@cbizon
Contributor

cbizon commented Mar 4, 2022

My current understanding:

attribute_source is whoever added that attribute blob to the trapi chain.

Automat should have responded to aragorn with what @kennethmorton said, because it added both of these attributes:

 {
     "attributes": [
            {
                "attribute_source": "infores:automat-robokop",
                "attribute_type_id": "biolink:original_knowledge_source",
                "attributes": null,
                "description": null,
                "original_attribute_name": null,
                "value": "infores:drugcentral",
                "value_type_id": "biolink:InformationResource",
                "value_url": null
            },
            {
                "attribute_source": "infores:automat-robokop",
                "attribute_type_id": "biolink:aggregator_knowledge_source",
                "attributes": null,
                "description": null,
                "original_attribute_name": null,
                "value": "infores:automat-robokop",
                "value_type_id": "biolink:InformationResource",
                "value_url": null
            }
    ]
}

Then, when aragorn passed this information back to the ARS it just adds to the chain (not changing any entries) an attribute like

{
                "attribute_source": "infores:aragorn",
                "attribute_type_id": "biolink:aggregator_knowledge_source",
                "attributes": null,
                "description": null,
                "original_attribute_name": null,
                "value": "infores:aragorn",
                "value_type_id": "biolink:InformationResource",
                "value_url": null
            }

Saying "Aragorn is telling you that Aragorn passed this edge back"

For cases where value is a translator tool, you would expect value == attribute_source (I can't think of an example where that's not true).

But for cases where value is not a translator tool, then some translator tool (the source) will be making a statement about where the data came from (the value).
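The first rule above (value is a Translator tool, so value == attribute_source) can be turned into a simple consistency check. A sketch, assuming attributes are plain dicts and that some list of Translator-tool infores CURIEs is available; the hard-coded `TRANSLATOR_TOOLS` set and the name `self_assertion_problems` are hypothetical. The second rule is deliberately not enforced, since in the DGIdb example later in this thread a non-Translator resource (DGIdb) is the attribute_source.

```python
from typing import Any

# Hypothetical registry of Translator tools; a real check would need an
# authoritative list of infores CURIEs rather than this hard-coded set.
TRANSLATOR_TOOLS = {
    "infores:aragorn",
    "infores:automat-robokop",
    "infores:translator-biothings-explorer",
}

EPC_TYPES = {
    "biolink:original_knowledge_source",
    "biolink:primary_knowledge_source",
    "biolink:aggregator_knowledge_source",
}


def self_assertion_problems(attributes: list[dict[str, Any]]) -> list[str]:
    """Flag EPC attributes whose value is a Translator tool but whose
    attribute_source is someone else (expected: value == attribute_source)."""
    problems = []
    for attr in attributes:
        if attr.get("attribute_type_id") not in EPC_TYPES:
            continue
        value = attr.get("value")
        # BTE sends the value as a one-element list; accept either shape.
        values = value if isinstance(value, list) else [value]
        for v in values:
            if v in TRANSLATOR_TOOLS and attr.get("attribute_source") != v:
                problems.append(
                    f"{v} is a Translator tool but attribute_source is "
                    f"{attr.get('attribute_source')}"
                )
    return problems
```

Run against the first proposed edge in this thread (where infores:aragorn is asserted by infores:automat-robokop), this would flag two attributes; the BTE attribute with a null attribute_source would be flagged as well.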

@kennethmorton
Collaborator

@cbizon After Matt's explanation, this is my understanding as well. It just leaves a little to be desired. Since the EPC group is exploring future modifications, we should take this approach for now and await further direction.

@mbrush

mbrush commented Mar 4, 2022

This sounds like the right approach to me for now. I agree that the current model is not ideal . . . esp if we want to be able to assemble an ordered retrieval chain. We were hamstrung by the need to stuff this into one level of attribute objects in the TRAPI schema. But there is renewed interest in supporting more expressivity when it comes to source retrieval chains – so the modeling may get refactored in the near future.

re:

"But for cases where value is not a translator tool, then some translator tool (the source) will be making a statement about where the data came from (the value)"

I'll reiterate what I think you are saying here with an example. Consider a scenario where the Automat KP pulls knowledge from DGIdb and codifies it as an Edge in their graph, then sends on to Aragorn. But DGIdb as an aggregator pulled this knowledge from Chembl. In this scenario, we would have the following Attribute objects:

            {
                "attribute_type_id": "biolink:original_knowledge_source",
                "value": "infores:chembl",
                "value_type_id": "biolink:InformationResource",
                "attribute_source": "infores:dgidb"   # b/c the info that chembl is the original source came from dgidb
            },
            {
                "attribute_type_id": "biolink:aggregator_knowledge_source",
                "value": "infores:dgidb",
                "value_type_id": "biolink:InformationResource",
                "attribute_source": "infores:automat-dgidb"   # b/c automat is the one characterizing dgidb as their source, and saying they are an aggregator
            },
            {
                "attribute_type_id": "biolink:aggregator_knowledge_source",
                "value": "infores:automat-dgidb",
                "value_type_id": "biolink:InformationResource",
                "attribute_source": "infores:automat-dgidb"   # b/c automat is also saying that they themselves are an aggregator before passing the info to aragorn
            },
            {
                "attribute_type_id": "biolink:aggregator_knowledge_source",
                "value": "infores:aragorn",
                "value_type_id": "biolink:InformationResource",
                "attribute_source": "infores:aragorn"   # b/c aragorn is saying that they are an aggregator as well, for anyone pulling this info from them
            }

Again, not ideal, but where we are at now.
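The pattern in this example can be generated from a known, ordered retrieval path. A sketch under stated assumptions: `build_epc_chain` and the `TRANSLATOR_TOOLS` set are hypothetical, and the rule "a Translator tool asserts its own role; any other resource is characterized by the next hop" is inferred from the DGIdb and DrugCentral examples in this thread rather than taken from the TRAPI documentation.

```python
from typing import Any

# Hypothetical: which infores CURIEs belong to Translator tools.
TRANSLATOR_TOOLS = {"infores:automat-dgidb", "infores:automat-robokop", "infores:aragorn"}


def build_epc_chain(path: list[str]) -> list[dict[str, Any]]:
    """Build EPC attributes for a known retrieval path.

    path[0] is the original knowledge source; path[-1] is the last system
    to send the edge. A Translator tool asserts its own role; any other
    resource is characterized by the next hop in the path.
    """
    if not path or path[-1] not in TRANSLATOR_TOOLS:
        raise ValueError("the last hop must be a Translator tool")
    chain = []
    for i, resource in enumerate(path):
        type_id = ("biolink:original_knowledge_source" if i == 0
                   else "biolink:aggregator_knowledge_source")
        # Safe: a non-Translator resource is never last (checked above).
        source = resource if resource in TRANSLATOR_TOOLS else path[i + 1]
        chain.append({
            "attribute_type_id": type_id,
            "value": resource,
            "value_type_id": "biolink:InformationResource",
            "attribute_source": source,
        })
    return chain


dgidb_chain = build_epc_chain(
    ["infores:chembl", "infores:dgidb", "infores:automat-dgidb", "infores:aragorn"]
)
```

Going the other way is the hard part: the flat attribute list does not record order, so the path cannot in general be recovered from the attributes alone.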

@kennethmorton
Collaborator

I believe this is covered from a Strider/Aragorn perspective by this PR. Automat/Plater will also need an update.

@mbrush thanks for all of your help. Closing for now.


6 participants