Move PI profile images to official metadata schema #19

Closed
jeffbaumes opened this issue Apr 1, 2021 · 24 comments

@jeffbaumes

This is currently in the portal repo. Should we consider something like gravatar? Unsure if ORCID provides public access to a profile image but that might be ideal.

@dehays
Contributor

dehays commented Apr 5, 2021

I don't see anywhere to add an image to a profile in ORCiD, so I'd guess we can't get researcher images from there. Gravatar might work in cases where someone uses WordPress or something else that caused them to add an avatar to Gravatar.

The PI profiles for the initial FICUS studies were very manually provided. I think the best the schema can do here is provide an optional attribute for an image URL. Thinking of the case where 10K+ studies are imported from NCBI - probably nothing there to set a profile image on import.

@jeffbaumes
Author

Agree, a URL would work fine and may be what we want to stick with at least near-term. Submitters could use the URL of their gravatar if they are savvy enough to do so I suppose. The important thing is to get it into the schema and not hard-coded in the client.

@jeffbaumes
Author

@jbeezley could you get a PR together to add an image URL to the PI schema? We could just point current PI URLs to our server for now, which is maybe not ideal but at least the PI info would be upstream. I expect the current mapping of PI to image URL could also be placed in the PR and used in the scripts that pull together that JSON from our current dataset.

@wdduncan could you point us to the code that creates PI JSON?

@wdduncan
Contributor

wdduncan commented Apr 6, 2021

@jbeezley I am pulling the principal investigator's name from the contact table in GOLD. To add this info to my ETL, I think I might have to add another table to the ETL ingest. Might be easier to have a call about how best to do it ...

@jbeezley

jbeezley commented Apr 6, 2021

We can't use a URL from our server because I store the data as a binary blob in the database. It also depends on the uuid of the principal investigator row. Perhaps we could base64 encode the data and put it in the provided study json? For reference, the images used are stored in https://github.com/microbiomedata/nmdc-server/tree/master/nmdc_server/ingest/pis.
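The base64 idea could look roughly like the sketch below. It is only an illustration: the function name `to_data_uri` and the default MIME type are assumptions, not anything that exists in nmdc-server.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URI that could be embedded in study JSON.

    Hypothetical helper; the name and the default MIME type are assumptions.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Tiny stand-in payload instead of a real image file.
print(to_data_uri(b"abc", "image/png"))  # data:image/png;base64,YWJj
```

Base64 output is about a third larger than the raw bytes, so embedding images this way makes each study document noticeably bigger.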

@jbeezley

jbeezley commented Apr 6, 2021

> I don't see anywhere to include an image to a profile in ORCiD so I'd guess we can't get researcher images from there. Gravatar - maybe in cases where someone uses Wordpress or something else that caused them to add an avatar to Gravatar.

Gravatar would be a good alternative, but we don't necessarily get email addresses from ORCiD records. I checked some of our existing PIs and they don't make their email public.

@jeffbaumes
Author

jeffbaumes commented Apr 6, 2021

Would it be reasonable to support both external URLs and image content by allowing either a URL or a data URI? As long as it is validated as one or the other, we could safely pass this through to the img src attribute and it would work in either case.

@dwinston
Collaborator

dwinston commented Apr 6, 2021

@jeffbaumes I'd rather there be only one (external URL) versus two modes. It should be no more difficult to supply a URL for a profile image than for a data object.

On the topic of offering use of a Gravatar image, I think that can be convenient, but it should be opt-in somehow. Some PIs may have Gravatar images from a decade ago that they have forgotten about and would prefer not to use.

@jeffbaumes
Author

You could think of my suggestion as actually only one thing: provide a URI to an image. It just happens there are multiple types of URI we could support fairly easily. They could be handled identically start to finish, with no extra logic anywhere to support them other than a more complex regex for validation, so I'm not seeing much downside.

I'd be ok with URL, and savvy PIs could find and use their gravatar URL so in that sense it would be opt-in. The main extra-work-for-us for URL-only is that we would need to decide where to host the current PI images. They need to be at static, stable URLs.
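To make the "more complex regex" concrete, here is a rough sketch of validating a value as either an http(s) URL or a base64 image data URI. The patterns, the accepted image types, and the name `is_valid_image_uri` are all assumptions for illustration, and the checks are deliberately strict rather than complete.

```python
import re

# Hypothetical patterns; neither comes from the schema or the portal code.
HTTP_URL = re.compile(r"https?://[^\s/$.?#]\S*", re.IGNORECASE)
IMAGE_DATA_URI = re.compile(r"data:image/(png|jpeg|gif);base64,[A-Za-z0-9+/]+={0,2}")


def is_valid_image_uri(value: str) -> bool:
    """Return True if value is an http(s) URL or a base64-encoded image data URI."""
    return bool(HTTP_URL.fullmatch(value) or IMAGE_DATA_URI.fullmatch(value))


print(is_valid_image_uri("https://example.org/pi.jpg"))   # True
print(is_valid_image_uri("data:image/png;base64,YWJj"))   # True
print(is_valid_image_uri("javascript:alert(1)"))          # False
```

Allowing only these two forms keeps other schemes, and non-image data URIs, out of the img src attribute.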

@wdduncan
Contributor

wdduncan commented Apr 7, 2021

All these ideas are fine with me, but where in the ETL do we insert the URL/URI? I can do it on my end by simply having a file that gives the URI for each investigator in the contact table. However, this won't work for investigators that are not registered in the GOLD database.
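A minimal sketch of that lookup-file approach, assuming the file has been loaded into a dict keyed by investigator name. The record shape, the `image_url` key, and the example URL are hypothetical and do not reflect the actual GOLD ETL.

```python
# Hypothetical curated mapping, e.g. loaded from a hand-maintained file.
PI_IMAGE_URLS = {
    "Virginia Rich": "https://example.org/images/virginia_rich.jpg",  # made-up URL
}


def attach_image_url(contact: dict, image_urls: dict) -> dict:
    """Return a copy of a contact record with an image URL added when one is known."""
    enriched = dict(contact)
    url = image_urls.get(contact.get("name"))
    if url is not None:
        enriched["image_url"] = url
    return enriched


print(attach_image_url({"name": "Virginia Rich"}, PI_IMAGE_URLS))
print(attach_image_url({"name": "Unlisted Investigator"}, PI_IMAGE_URLS))
```

Investigators missing from the mapping pass through unchanged.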

@jbeezley

jbeezley commented Apr 7, 2021

We can make it optional and on the portal show a placeholder image. A lot of these questions on what to make required (#310) depend on features needed for the portal and what we are willing to leave blank.
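The placeholder fallback could be as small as the sketch below. The placeholder path and the `image_url` key are assumptions, and the portal's real rendering code may look quite different.

```python
PLACEHOLDER_IMAGE = "/static/placeholder-avatar.png"  # hypothetical path


def profile_image_src(person: dict, placeholder: str = PLACEHOLDER_IMAGE) -> str:
    """Pick the image source to display, falling back when no URL was provided."""
    url = person.get("image_url")
    # Treat a missing key, None, and an empty string the same way.
    return url if url else placeholder


print(profile_image_src({"image_url": "https://example.org/pi.jpg"}))
print(profile_image_src({}))
```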

@dehays
Contributor

dehays commented Apr 9, 2021

Moving to nmdc-schema to add optional pi_image_url slot

For the studies we currently have (12 FICUS) - will need to put image files on NERSC and manually set them in metadata - there is no standard source for these images so I don't see a way the GOLD ETL can set these

@wdduncan Question on implementation - an image_url on the person entity seems correct to me. Then study would refer to the PI (a person) image_url attribute. I think this is what you are doing with the principal_investigator_name on study. Does this make sense?

@dehays dehays transferred this issue from microbiomedata/nmdc-metadata Apr 9, 2021
@wdduncan
Contributor

wdduncan commented Apr 9, 2021

@dehays yes, that is what I was thinking.

@dwinston
Collaborator

dwinston commented Apr 9, 2021

So the plan is to

  1. eliminate core/person_value class
  2. give core/person class an orcid slot
  3. give core/person class an image_url slot
  4. remove nmdc/study class' principal_investigator_name slot
  5. add nmdc/study class principal_investigator slot with range core/person

Is that right?
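For readers who think in code rather than schema terms, the five-step plan could be pictured with Python dataclasses as below. This is only an analogy for the proposed shape: the real schema is not defined this way, and the `name` field on `Person` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    """Stand-in for the core/person class after steps 2 and 3."""
    name: str                        # assumed field, not stated in the plan
    orcid: Optional[str] = None      # step 2: new orcid slot
    image_url: Optional[str] = None  # step 3: new image_url slot


@dataclass
class Study:
    """Stand-in for the nmdc/study class after steps 4 and 5."""
    id: str
    name: str
    principal_investigator: Person   # replaces principal_investigator_name


study = Study(
    id="gold:Gs0112340",
    name="Thawing permafrost microbial communities from the Arctic",
    principal_investigator=Person(name="Virginia Rich"),
)
print(study.principal_investigator.image_url)  # None until a URL is curated
```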

@jeffbaumes
Author

@wdduncan I added you as an assignee since I don't think @jbeezley can make the actual query changes himself. Please correct this or delegate if I'm off here.

@ssarrafan
Collaborator

Adding comments from email exchange for reference:

Agree with your comments here David (and yours Kjiersten). I commented in parallel on GitHub, but the gist is that we need to resolve which fields we can and can't expect to require. Many "required" things for the portal could be made optional if needed.

#41

On Wed, Apr 28, 2021 at 1:31 PM Kjiersten Fagnan kmfagnan@lbl.gov wrote:
I support the approach David laid out for what fields are required vs optional. I can add the following comments to the ticket, but we seem to be getting into this via email.

Could we create some default values for the portal to populate the page - avatar, URL, DOI and scientific objectives would be harder if not impossible.

In the future, when contributors are providing data to NMDC, could we also collect - photo, website URL, etc as part of the submission process - or perhaps give the PI the ability to add this themselves? This depends on having some level of access controls (different roles in the data portal than exist right now). Maybe this is part of working directly with the PIs to get their help on the study/data landing pages?

Kjiersten

On Wed, Apr 28, 2021 at 10:20 AM David Hays dehays@lbl.gov wrote:
Bill and Emiley said:

@emiley Eloe-Fadrosh any idea of who to follow up on with to get access to the data needed?

@emiley Eloe-Fadrosh any idea of who to follow up on with to get access to the PI images needed?

David should be able to address #41, seems like a GOLD database dump issue?

For #19, as was indicated in the github ticket, these were all manually collected by me. Not sure the best solution here, but this could tie into the more general discussion of the study pages (and some items from #41 like scientific objectives that are not part of the GOLD db dump). Not everything can be fully automated.... just my two cents.


On #41, I believe the fields that Bill is referring to are the ones that are NOT available from GOLD; i.e. those listed in microbiomedata/nmdc-metadata#301 and #19 such as PI web site, PI image, scientific objective, publication DOIs. Basically, the ones that Emiley collected manually and provided to Kitware.

For these, Bill could add these to the schema as non required attributes. He has no way of making the GOLD ETL populate these because they do not exist in GOLD.

Jon states that the portal UI depends on these fields - but I believe the portal UI will need to treat them as optional fields as well because normally they will not be available. If we add 10K studies tomorrow from GOLD or NCBI - we will not be waiting for Emiley to collect values for these fields before they can be displayed in the UI. The portal UI needs to be flexible enough to handle cases where these values are not available.

I also believe it should not be the responsibility of search portal development to merge additional metadata for studies to extend what was made available for ingest. So that implies the need for a curate/annotate procedure that is available between GOLD or NCBI ETL and search portal ingest. And in the case of images, in addition to a way to edit the study JSON docs to add PI image URLs, there is also the need to add and manage the image files at a location associated with the metadata URL.

So for #41, there are a number of fields that are always available (We will always have a PI for a study.) that can be made required in the schema. But for those for which there is no available source except manual curation, at best these could be optional fields in the schema. Make sense?

-David

On Tue, Apr 27, 2021 at 1:51 PM Emiley Eloe-Fadrosh eaeloefadrosh@lbl.gov wrote:
For these two:

@emiley Eloe-Fadrosh any idea of who to follow up on with to get access to the data needed?

@emiley Eloe-Fadrosh any idea of who to follow up on with to get access to the PI images needed?

David should be able to address #41, seems like a GOLD database dump issue?

For #19, as was indicated in the github ticket, these were all manually collected by me. Not sure the best solution here, but this could tie into the more general discussion of the study pages (and some items from #41 like scientific objectives that are not part of the GOLD db dump). Not everything can be fully automated.... just my two cents.

@wdduncan
Contributor

Please move to May sprint.

@wdduncan wdduncan removed this from To do in NMDC April 2021 Sprint Apr 29, 2021
@wdduncan wdduncan added this to To do in NMDC May 2021 Sprint via automation Apr 29, 2021
@ssarrafan ssarrafan modified the milestones: Sprint 1, Sprint 2 May 3, 2021
@wdduncan wdduncan added the LARGE 7-10 days label May 5, 2021
@dwinston
Collaborator

dwinston commented May 5, 2021

@jeffbaumes is this subsumed by / a component of #41?

@emiley

emiley commented May 5, 2021 via email

@dwinston
Collaborator

dwinston commented May 5, 2021

sorry @emiley, thank you for alerting us! We meant @emileyfadrosh. You can unsubscribe yourself.

ScreenFlow.mp4

@jeffbaumes
Author

> @jeffbaumes is this subsumed by / a component of #41?

@dwinston This issue has a slight additional complication attached (we need to host the profile images elsewhere and link to them by URL in the schema) so I feel it could be deserving of its own issue. But I'm also ok rolling it into #41.

@ssarrafan
Collaborator

Based on the meeting today with @dehays, @emileyfadrosh, @dwinston, @wdduncan, and @jbeezley, @wdduncan will add image URL on the person object to the schema. The images can be stored on an object store at NERSC.

@wdduncan wdduncan removed this from To do in NMDC May 2021 Sprint May 29, 2021
@wdduncan wdduncan added this to To do in NMDC June 2021 Sprint via automation May 29, 2021
@ssarrafan
Collaborator

Removing Jon from assignee.

@wdduncan
Contributor

I've added a profile image url slot (see PR #68).

This is to be used with study objects like so:

{
    "id": "gold:Gs0112340",
    "name": "Thawing permafrost microbial communities from the Arctic, studying carbon transformations",
    "description": "....",
    "principal_investigator": {
        "has_raw_value": "Virginia Rich",
        "profile image url": "http://....."  <--- new slot
    }
}

NB: the property principal_investigator_name is now named principal_investigator.
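For anyone holding older study JSON, a conversion along these lines might help with the rename. It is a sketch under assumptions: the helper name `migrate_study` is made up, the serialized key is guessed to be `profile_image_url` (the example above writes the slot name with spaces), and the old value is assumed to be either a plain string or an object with `has_raw_value`.

```python
from typing import Optional

IMAGE_KEY = "profile_image_url"  # assumed JSON key for the new slot


def migrate_study(study: dict, image_url: Optional[str] = None) -> dict:
    """Rename principal_investigator_name to principal_investigator on a copy."""
    migrated = {k: v for k, v in study.items() if k != "principal_investigator_name"}
    old = study.get("principal_investigator_name")
    if old is not None and "principal_investigator" not in migrated:
        # Accept either an object or a bare name string (assumption).
        pi = dict(old) if isinstance(old, dict) else {"has_raw_value": old}
        if image_url:
            pi[IMAGE_KEY] = image_url
        migrated["principal_investigator"] = pi
    return migrated


old_study = {
    "id": "gold:Gs0112340",
    "principal_investigator_name": {"has_raw_value": "Virginia Rich"},
}
print(migrate_study(old_study, "https://example.org/virginia_rich.jpg"))
```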

Closing this ticket. But it can be reopened if needed.

NMDC June 2021 Sprint automation moved this from To do to Done Jun 17, 2021
turbomam pushed a commit that referenced this issue Feb 21, 2024