New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retrospectively assign group attribution for existing GO-CAMs #458

Closed
RLovering opened this Issue Jul 5, 2017 · 29 comments

Comments

@RLovering

RLovering commented Jul 5, 2017

Hi
I have raised this issue a couple of times, obviously I should have created a ticket so that I can be kept in the loop about progress in this area.

Currently there is no incentive for my team at UCL to use Noctua to create annotation because the annotations we create will not appear in AmiGO or QuickGO as attributed to one of the UCL teams, therefore I will not be able to create a report for my grant funding bodies on the annotations created and demonstrate that I have met my funded targets.

In addition, It is important that any SynGO Noctua annotations that we make that are exported to the GOC annotation files are attributed to SynGO-UCL as SynGO VU curators want to be able to distinguish their 'expert created' annotations from those UCL is creating.

In addition, Tony has pointed out that currently he is unable to pick up the Noctua annotations. Consequently the GO annotations displayed by QuickGO, NCBI Gene, Ensembl, UniProt and the files incorporated into many functional analysis tools are out of sync with the data displayed in AmiGO.

Best

Ruth

@cmungall @tonysawfordebi @BarbaraCzub @thomaspd

@cmungall

This comment has been minimized.

Show comment
Hide comment
@cmungall

cmungall Jul 5, 2017

Member

Obviously this is not working as intended.

Let's break this down. The first thing to check is the model itself, let's see if your work is attributed to the relevant groups in the model - can you post a link to a model?

Member

cmungall commented Jul 5, 2017

Obviously this is not working as intended.

Let's break this down. The first thing to check is the model itself, let's see if your work is attributed to the relevant groups in the model - can you post a link to a model?

@cmungall

This comment has been minimized.

Show comment
Hide comment
@cmungall

cmungall Jul 5, 2017

Member
  • retrospective: we need a way to retrospectively go back to annotations done by people in your group and add the provided_by assertions
  • prospective: we need to make sure moving forward people know how attribution works, and that they need to select a group for it not to be attributed to GO_Noctua. We will go over this on an annotation call.
  • Translation from models to GPAD. We will also go over this and ensure it works according to people's expectations. There are some known issues to be discussed - e.g. what if >1 group is assigned to a piece of evidence (see #502)
Member

cmungall commented Jul 5, 2017

  • retrospective: we need a way to retrospectively go back to annotations done by people in your group and add the provided_by assertions
  • prospective: we need to make sure moving forward people know how attribution works, and that they need to select a group for it not to be attributed to GO_Noctua. We will go over this on an annotation call.
  • Translation from models to GPAD. We will also go over this and ensure it works according to people's expectations. There are some known issues to be discussed - e.g. what if >1 group is assigned to a piece of evidence (see #502)
@BarbaraCzub

This comment has been minimized.

Show comment
Hide comment
@BarbaraCzub

BarbaraCzub Jul 5, 2017

Hello @cmungall

You're welcome to use the model entitled 'SynGO no. 1a SLC1A1_PMID:7914198' as an example:
http://noctua.berkeleybop.org/editor/graph/gomodel:586fc17a00001440

It falls under SynGO-UCL.

Thanks,
Barbara

BarbaraCzub commented Jul 5, 2017

Hello @cmungall

You're welcome to use the model entitled 'SynGO no. 1a SLC1A1_PMID:7914198' as an example:
http://noctua.berkeleybop.org/editor/graph/gomodel:586fc17a00001440

It falls under SynGO-UCL.

Thanks,
Barbara

@kltm kltm added the enhancement label Jul 5, 2017

@kltm kltm added this to the wishlist milestone Jul 5, 2017

@cmungall

This comment has been minimized.

Show comment
Hide comment
@cmungall

cmungall Jul 5, 2017

Member

Thanks!

OK, so in this model there is no providedBy at the level of the evidence assertions. That's as expected as you didn't know about the groups ability when making these in March. Now don't worry we're not going to make you go back and retrospectively manually tag every evidence instance with your project, we'll figure out a solution here.

One idea we kicked around today was this one:
geneontology/minerva#119
In this case, you already have a model-level providedBy:

image

so the idea would be that 'orphan' evidences would inherit from the model level (prospectively, you would try and make sure you wear the correct 'hat' whenever making new assertions). However, we need to discuss this further.

Thanks for your patience!

Member

cmungall commented Jul 5, 2017

Thanks!

OK, so in this model there is no providedBy at the level of the evidence assertions. That's as expected as you didn't know about the groups ability when making these in March. Now don't worry we're not going to make you go back and retrospectively manually tag every evidence instance with your project, we'll figure out a solution here.

One idea we kicked around today was this one:
geneontology/minerva#119
In this case, you already have a model-level providedBy:

image

so the idea would be that 'orphan' evidences would inherit from the model level (prospectively, you would try and make sure you wear the correct 'hat' whenever making new assertions). However, we need to discuss this further.

Thanks for your patience!

@BarbaraCzub

This comment has been minimized.

Show comment
Hide comment
@BarbaraCzub

BarbaraCzub Jul 6, 2017

I think a model-level providedBy could be a suitable solution.

BarbaraCzub commented Jul 6, 2017

I think a model-level providedBy could be a suitable solution.

@kltm

This comment has been minimized.

Show comment
Hide comment
@kltm

kltm Nov 9, 2017

Member

For looking at the scope of the providedBy issues, here are some numbers:

Of the 595 non-SYNGO models...

  • has 0 providedBy: 470
    • has 0 ORCIDs: 51
    • has 1 ORCID: 73
    • has more than 1 ORCID: 346
  • has at least 1 providedBy: 125
    • has 0 ORCIDs: 1
    • has 1 ORCID: 89
    • has more than 1 ORCID: 35

(Also, we have 98 models that include some kind of GOC xref for the user; I believe we should map those forward: geneontology/go-site#453 (comment))

Given that this problem is unlikely to be manually tractable, I would suggest the following approach once the metadata is updated, but would like feedback and agreement from all parties:

  • All GOC xrefs are replaced with the full user URIs.
  • For all models that do not have a model-level providedBy
    • For every contributor statement that does not already have an analogous providedBy statement we add one to the user's default group.

The middle step there is for hoping that groups that are using the groups to their full effect (to track different funding sources, etc.) can give us a list of those at the model level so we can add those with a quick script.

This is more a @cmungall or @balhoff question, but I'm not actually sure of the state of our use of model-level providedBys--I can see they are used by SYNGO as a default, but not much elsewhere. I think that unless there will be more model-level defaults used in the future it might simplify things to just make sure that those are pushed down into the model as much as possible (as the current meaning is more like a "touch" than a default and we will have cases where there is a mix).

For those who want to look at getting these numbers out of the files on the command line:

grep -c provided 0*.ttl 5*.ttl  | grep "\:0" | cut -d ':' -f 1 | xargs -d '\n' bash -c 'for filename; do grep -H orcid "$filename" | sort | uniq; done;' | cut -d ':' -f 1 | sort | uniq -c | sort -nr | grep "1 " | wc
Member

kltm commented Nov 9, 2017

For looking at the scope of the providedBy issues, here are some numbers:

Of the 595 non-SYNGO models...

  • has 0 providedBy: 470
    • has 0 ORCIDs: 51
    • has 1 ORCID: 73
    • has more than 1 ORCID: 346
  • has at least 1 providedBy: 125
    • has 0 ORCIDs: 1
    • has 1 ORCID: 89
    • has more than 1 ORCID: 35

(Also, we have 98 models that include some kind of GOC xref for the user; I believe we should map those forward: geneontology/go-site#453 (comment))

Given that this problem is unlikely to be manually tractable, I would suggest the following approach once the metadata is updated, but would like feedback and agreement from all parties:

  • All GOC xrefs are replaced with the full user URIs.
  • For all models that do not have a model-level providedBy
    • For every contributor statement that does not already have an analogous providedBy statement we add one to the user's default group.

The middle step there is for hoping that groups that are using the groups to their full effect (to track different funding sources, etc.) can give us a list of those at the model level so we can add those with a quick script.

This is more a @cmungall or @balhoff question, but I'm not actually sure of the state of our use of model-level providedBys--I can see they are used by SYNGO as a default, but not much elsewhere. I think that unless there will be more model-level defaults used in the future it might simplify things to just make sure that those are pushed down into the model as much as possible (as the current meaning is more like a "touch" than a default and we will have cases where there is a mix).

For those who want to look at getting these numbers out of the files on the command line:

grep -c provided 0*.ttl 5*.ttl  | grep "\:0" | cut -d ':' -f 1 | xargs -d '\n' bash -c 'for filename; do grep -H orcid "$filename" | sort | uniq; done;' | cut -d ':' -f 1 | sort | uniq -c | sort -nr | grep "1 " | wc

@vanaukenk vanaukenk added this to TODO in Noctua Release 1.0 Nov 16, 2017

@vanaukenk vanaukenk moved this from TODO to In Progress in Noctua Release 1.0 Nov 16, 2017

@vanaukenk vanaukenk removed this from In Progress in Noctua Release 1.0 Nov 16, 2017

@kltm

This comment has been minimized.

Show comment
Hide comment
@kltm

kltm Nov 22, 2017

Member

Moving forward:

  • script for
    • if noctua perms, check at least one group
    • if noctua perms, check has ORCID

For now, make report public; later, make sanity check.
Once completed, can migrate all GOC:xzys forward.

All people in users.yaml who are Noctua/GO-CAM editors need to understand that the top group is the default one. If we cannot make this clear, we should add a new required field: group-default.

Once the above migration is done, create a spreadsheet has:

  • col1 model name
  • col2 model ID
  • col3 contributor names/ids
  • col4: for users to add their preferred providedBy for all activity in a model that is different than their current default.

(We may tag @dougli1sqrd to help out with that spreadsheet generation.)

Once that is filled in, it will be back to the software group to figure out the implementation strategy.

Member

kltm commented Nov 22, 2017

Moving forward:

  • script for
    • if noctua perms, check at least one group
    • if noctua perms, check has ORCID

For now, make report public; later, make sanity check.
Once completed, can migrate all GOC:xzys forward.

All people in users.yaml who are Noctua/GO-CAM editors need to understand that the top group is the default one. If we cannot make this clear, we should add a new required field: group-default.

Once the above migration is done, create a spreadsheet has:

  • col1 model name
  • col2 model ID
  • col3 contributor names/ids
  • col4: for users to add their preferred providedBy for all activity in a model that is different than their current default.

(We may tag @dougli1sqrd to help out with that spreadsheet generation.)

Once that is filled in, it will be back to the software group to figure out the implementation strategy.

@kltm

This comment has been minimized.

Show comment
Hide comment
@kltm

kltm Nov 23, 2017

Member

@vanaukenk looking at the output of the scripts, we have 177 violations. The script will run in a public place tonight; I'll try and add the link soon.

Member

kltm commented Nov 23, 2017

@vanaukenk looking at the output of the scripts, we have 177 violations. The script will run in a public place tonight; I'll try and add the link soon.

@vanaukenk

This comment has been minimized.

Show comment
Hide comment
@vanaukenk

vanaukenk Nov 27, 2017

@kltm -thanks. If you have the link for the script output, I can add it to the agenda for the annotation call tomorrow:
http://wiki.geneontology.org/index.php/Annotation_Conf._Call_2017-11-28#users.yaml_file

vanaukenk commented Nov 27, 2017

@kltm -thanks. If you have the link for the script output, I can add it to the agenda for the annotation call tomorrow:
http://wiki.geneontology.org/index.php/Annotation_Conf._Call_2017-11-28#users.yaml_file

@kltm

This comment has been minimized.

Show comment
Hide comment
@kltm

kltm Nov 27, 2017

Member

@vanaukenk
This should now update nightly, assuming the pipeline is successful:
http://skyhook.berkeleybop.org/snapshot/metadata/users-and-groups-report.txt

Member

kltm commented Nov 27, 2017

@vanaukenk
This should now update nightly, assuming the pipeline is successful:
http://skyhook.berkeleybop.org/snapshot/metadata/users-and-groups-report.txt

@vanaukenk

This comment has been minimized.

Show comment
Hide comment
@vanaukenk

vanaukenk Nov 28, 2017

Okay, thanks. Right now I'm getting this when I click on the link above:

Not Found

The requested URL /snapshot/metadata/users-and-groups-report.txt was not found on this server.

vanaukenk commented Nov 28, 2017

Okay, thanks. Right now I'm getting this when I click on the link above:

Not Found

The requested URL /snapshot/metadata/users-and-groups-report.txt was not found on this server.

@kltm

This comment has been minimized.

Show comment
Hide comment
@kltm

kltm Nov 28, 2017

Member

Clicking on the above works for me (now). It is possible that you looked at the snapshot while the data products were rebuilding.

Member

kltm commented Nov 28, 2017

Clicking on the above works for me (now). It is possible that you looked at the snapshot while the data products were rebuilding.

@vanaukenk

This comment has been minimized.

Show comment
Hide comment
@vanaukenk

vanaukenk Nov 28, 2017

Okay, thanks @kltm I can see the report now, and will add the link to the 2017-11-28 annotation call minutes on the wiki.

vanaukenk commented Nov 28, 2017

Okay, thanks @kltm I can see the report now, and will add the link to the 2017-11-28 annotation call minutes on the wiki.

@kltm

This comment has been minimized.

Show comment
Hide comment
@kltm

kltm Nov 28, 2017

Member

Please keep in mind that the report is generated as part of the new build and a new attempt is made every night. This means that 1) if the load fails the report may not be there (which is a good thing) and 2) there will times every day during regeneration when the report is not yet there.

Member

kltm commented Nov 28, 2017

Please keep in mind that the report is generated as part of the new build and a new attempt is made every night. This means that 1) if the load fails the report may not be there (which is a good thing) and 2) there will times every day during regeneration when the report is not yet there.

@cmungall cmungall changed the title from Attribution with GO-CAM exports to GOC annotation files to Retrospectively assign group attribution for existing GOCAMs Dec 19, 2017

@cmungall

This comment has been minimized.

Show comment
Hide comment
@cmungall

cmungall Dec 19, 2017

Member

Some additional notes for my benefit

to run the report, from g-site:

python3 ./scripts/sanity-check-users-and-groups.py --users metadata/users.yaml --groups metadata/groups.yaml

looks like we have a lot of people missing groups. Many may no longer be active so it would be good to indicate these

Member

cmungall commented Dec 19, 2017

Some additional notes for my benefit

to run the report, from g-site:

python3 ./scripts/sanity-check-users-and-groups.py --users metadata/users.yaml --groups metadata/groups.yaml

looks like we have a lot of people missing groups. Many may no longer be active so it would be good to indicate these

@balhoff

This comment has been minimized.

Show comment
Hide comment
@balhoff

balhoff Feb 20, 2018

Member

@cmungall @kltm there are also group IRIs in Noctua models for which there is no entry in groups.yaml. For example http://purl.obolibrary.org/go/groups/UniProt in 59dc728000000205. I'm not sure how these have gotten in.

Member

balhoff commented Feb 20, 2018

@cmungall @kltm there are also group IRIs in Noctua models for which there is no entry in groups.yaml. For example http://purl.obolibrary.org/go/groups/UniProt in 59dc728000000205. I'm not sure how these have gotten in.

@kltm kltm changed the title from Retrospectively assign group attribution for existing GOCAMs to Retrospectively assign group attribution for existing GO-CAMs Feb 26, 2018

@kltm

This comment has been minimized.

Show comment
Hide comment
@kltm

kltm Feb 28, 2018

Member

Okay, these are getting in by two of our workbenches.

#459 (comment)

Essentially,

                        reqs.use_groups([
                            // WARNING: We're minting money that we
                            // might not honor here.
                            'http://purl.obolibrary.org/go/groups/' + assby
                        ]);

This also relates to #539

@cmungall

Member

kltm commented Feb 28, 2018

Okay, these are getting in by two of our workbenches.

#459 (comment)

Essentially,

                        reqs.use_groups([
                            // WARNING: We're minting money that we
                            // might not honor here.
                            'http://purl.obolibrary.org/go/groups/' + assby
                        ]);

This also relates to #539

@cmungall

@kltm

This comment has been minimized.

Show comment
Hide comment
@kltm

kltm Mar 14, 2018

Member

@cmungall
From #458 (comment)
I would like to add a check such that if there are noctua edit rights then a group must be assigned.
I would implement this as a first check for progress.

Member

kltm commented Mar 14, 2018

@cmungall
From #458 (comment)
I would like to add a check such that if there are noctua edit rights then a group must be assigned.
I would implement this as a first check for progress.

@balhoff

This comment has been minimized.

Show comment
Hide comment
@balhoff

balhoff Mar 14, 2018

Member

List of production models without model-level providedBy: http://yasgui.org/short/HJSFxyvFM

I tried a similar query that dug into providedBy on particular statements and got the same number of models, so I think it's fine to use this simpler query.

61 models—maybe not too many to just look at individually?

Member

balhoff commented Mar 14, 2018

List of production models without model-level providedBy: http://yasgui.org/short/HJSFxyvFM

I tried a similar query that dug into providedBy on particular statements and got the same number of models, so I think it's fine to use this simpler query.

61 models—maybe not too many to just look at individually?

@balhoff

This comment has been minimized.

Show comment
Hide comment
@balhoff

balhoff Mar 14, 2018

Member

Once users.ttl and groups.ttl are added to the triplestore, I could make a query that provides an example of what an automatically proposed group for each would be.

Member

balhoff commented Mar 14, 2018

Once users.ttl and groups.ttl are added to the triplestore, I could make a query that provides an example of what an automatically proposed group for each would be.

@kltm

This comment has been minimized.

Show comment
Hide comment
@kltm

kltm Mar 14, 2018

Member

A bit of a slog, but not impossible. If we make a spreadsheet or something that curators can get and and update, I have a tool that we can use to just inject those into the models.

Just to confirm, currently, if there is no contributes/provided by annotation "locally" present, the exporter grabs it from the model level annotations?

Member

kltm commented Mar 14, 2018

A bit of a slog, but not impossible. If we make a spreadsheet or something that curators can get and and update, I have a tool that we can use to just inject those into the models.

Just to confirm, currently, if there is no contributes/provided by annotation "locally" present, the exporter grabs it from the model level annotations?

@balhoff

This comment has been minimized.

Show comment
Hide comment
@balhoff

balhoff Mar 14, 2018

Member

Just to confirm, currently, if there is no contributes/provided by annotation "locally" present, the exporter grabs it from the model level annotations?

@kltm yes that is true.

Member

balhoff commented Mar 14, 2018

Just to confirm, currently, if there is no contributes/provided by annotation "locally" present, the exporter grabs it from the model level annotations?

@kltm yes that is true.

@balhoff

This comment has been minimized.

Show comment
Hide comment
@balhoff

balhoff Mar 14, 2018

Member

This query adds contributor ID and name: http://yasgui.org/short/Hy0bwkwtG

It only creates 4 additional rows, so nearly all of these models have only one contributor. It's a very short list of folks; I think this will be easy to sort out.

Member

balhoff commented Mar 14, 2018

This query adds contributor ID and name: http://yasgui.org/short/Hy0bwkwtG

It only creates 4 additional rows, so nearly all of these models have only one contributor. It's a very short list of folks; I think this will be easy to sort out.

@balhoff

This comment has been minimized.

Show comment
Hide comment
@balhoff

balhoff Mar 14, 2018

Member

Query for groups used in production models: http://yasgui.org/short/SJvdsyvFz

Member

balhoff commented Mar 14, 2018

Query for groups used in production models: http://yasgui.org/short/SJvdsyvFz

@cmungall

This comment has been minimized.

Show comment
Hide comment
@cmungall

cmungall Mar 14, 2018

Member

@balhoff @dougli1sqrd - when was that triplestore last updated? Will we be able to sparql the source store (or something closely synced to it) soon?

Member

cmungall commented Mar 14, 2018

@balhoff @dougli1sqrd - when was that triplestore last updated? Will we be able to sparql the source store (or something closely synced to it) soon?

@cmungall

This comment has been minimized.

Show comment
Hide comment
@cmungall

cmungall Mar 14, 2018

Member

I would like to add a check such that if there are noctua edit rights then a group must be assigned.

Yes, let's do this.

Member

cmungall commented Mar 14, 2018

I would like to add a check such that if there are noctua edit rights then a group must be assigned.

Yes, let's do this.

@kltm

This comment has been minimized.

Show comment
Hide comment
@kltm

kltm Mar 14, 2018

Member

@cmungall I updated the triplestores earlier today.

Member

kltm commented Mar 14, 2018

@cmungall I updated the triplestores earlier today.

@kltm

This comment has been minimized.

Show comment
Hide comment
@kltm

kltm Mar 22, 2018

Member

As a step in this process (and discussion today), @dougli1sqrd will be looking at fixing up the users.yaml and groups.yaml. As part of this change, any Noctua editors who do not have an ORCID as their uri will temporarily lose permissions to edit in Noctua until they are re-added. I think most editors at this point should be clear, but if anybody wants to double check the current report can be found here:
http://snapshot.geneontology.org/reports/users-and-groups-report.txt
The end goal of @dougli1sqrd 's work will be that report is empty (and we've added it to our travis checks).

Member

kltm commented Mar 22, 2018

As a step in this process (and discussion today), @dougli1sqrd will be looking at fixing up the users.yaml and groups.yaml. As part of this change, any Noctua editors who do not have an ORCID as their uri will temporarily lose permissions to edit in Noctua until they are re-added. I think most editors at this point should be clear, but if anybody wants to double check the current report can be found here:
http://snapshot.geneontology.org/reports/users-and-groups-report.txt
The end goal of @dougli1sqrd 's work will be that report is empty (and we've added it to our travis checks).

@cmungall

This comment has been minimized.

Show comment
Hide comment
@cmungall

cmungall May 7, 2018

Member

covered by geneontology/noctua-models#84, and done

Member

cmungall commented May 7, 2018

covered by geneontology/noctua-models#84, and done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment