Move PI profile images to official metadata schema #19

jeffbaumes · 2021-04-01T13:56:53Z

This is currently in the portal repo. Should we consider something like gravatar? Unsure if ORCID provides public access to a profile image but that might be ideal.

dehays · 2021-04-05T17:12:10Z

I don't see anywhere to add an image to a profile in ORCiD so I'd guess we can't get researcher images from there. Gravatar - maybe in cases where someone uses Wordpress or something else that caused them to add an avatar to Gravatar.

The PI profiles for the initial FICUS studies were very manually provided. I think the best the schema can do here is provide an optional attribute for an image URL. Thinking of the case where 10K+ studies are imported from NCBI - probably nothing there to set a profile image on import.

jeffbaumes · 2021-04-05T21:26:12Z

Agree, a URL would work fine and may be what we want to stick with at least near-term. Submitters could use the URL of their gravatar if they are savvy enough to do so I suppose. The important thing is to get it into the schema and not hard-coded in the client.

jeffbaumes · 2021-04-05T21:30:54Z

@jbeezley could you get a PR together to add an image URL to the PI schema? We could just point current PI URLs to our server for now, which is maybe not ideal but at least the PI info would be upstream. I expect the current mapping of PI to image URL could also be placed in the PR and used in the scripts that pull together that JSON from our current dataset.

@wdduncan could you point us to the code that creates PI JSON?

wdduncan · 2021-04-06T13:31:30Z

@jbeezley I am pulling the principal investigator's name from the contact table in GOLD. To add this info to my ETL, I think I might have to add another table to the ETL ingest. It might be easier to have a call about how best to do it ...

jbeezley · 2021-04-06T14:41:11Z

We can't use a URL from our server because I store the data as a binary blob in the database. It also depends on the uuid of the principal investigator row. Perhaps we could base64 encode the data and put it in the provided study json? For reference, the images used are stored in https://github.com/microbiomedata/nmdc-server/tree/master/nmdc_server/ingest/pis.
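A minimal sketch of the base64 idea, assuming the stored blob is available as raw bytes and its MIME type is known. The function name `to_data_uri` and the sample bytes are hypothetical and are not taken from nmdc-server.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URI (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# A few bytes stand in for a real image blob read from the database.
blob = b"\x89PNG\r\n\x1a\n"
uri = to_data_uri(blob)
```

The resulting string could be placed directly in the study JSON, at the cost of roughly a third more bytes than the binary image.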

jbeezley · 2021-04-06T15:50:10Z

> I don't see anywhere to add an image to a profile in ORCiD so I'd guess we can't get researcher images from there. Gravatar - maybe in cases where someone uses Wordpress or something else that caused them to add an avatar to Gravatar.

Gravatar would be a good alternative, but we don't necessarily get email addresses from ORCiD records. I checked some of our existing PIs and they don't make their email public.

jeffbaumes · 2021-04-06T20:32:05Z

Would it be reasonable to support both external URLs and image content by allowing either a URL or a data URI? As long as it is validated as one or the other, we could safely pass this through to the img src attribute and it would work in either case.
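One way the "validated as one or the other" check could look, as a sketch only. The accepted schemes (http/https) and the restriction to base64-encoded `image/*` data URIs are assumptions, and `is_valid_image_uri` is a hypothetical name.

```python
import re
from urllib.parse import urlparse

# Assumed policy: only base64-encoded image/* data URIs are accepted.
_DATA_URI_RE = re.compile(
    r"^data:image/[a-zA-Z0-9.+-]+;base64,[A-Za-z0-9+/]+={0,2}$"
)


def is_valid_image_uri(value: str) -> bool:
    """Accept either an http(s) URL or a base64 image data URI."""
    if _DATA_URI_RE.match(value):
        return True
    parsed = urlparse(value)
    # Require an explicit web scheme and a host, which rejects
    # things like "javascript:..." or bare relative paths.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Either form that passes this check could then be handed to the img src attribute unchanged.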

dwinston · 2021-04-06T21:30:53Z

@jeffbaumes I'd rather there be only one (external URL) versus two modes. It should be no more difficult to supply a URL for a profile image than for a data object.

On the topic of offering use of a Gravatar image, I think that can be convenient, but it should be opt-in somehow. Some PIs may have gravatar images from a decade ago that they have forgotten about and would prefer not to use.

jeffbaumes · 2021-04-07T12:45:50Z

You could think of my suggestion as actually only one thing: provide a URI to an image. It just happens there are multiple types of URI we could support fairly easily. They could be handled identically start to finish, with no extra logic anywhere to support them other than a more complex regex for validation, so I'm not seeing much downside.

I'd be ok with URL, and savvy PIs could find and use their gravatar URL so in that sense it would be opt-in. The main extra-work-for-us for URL-only is that we would need to decide where to host the current PI images. They need to be at static, stable URLs.
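For reference, a sketch of how a PI could derive their own gravatar URL, assuming Gravatar's long-standing scheme of hashing the trimmed, lowercased email address with MD5. The helper name and the default size are illustrative choices.

```python
import hashlib


def gravatar_url(email: str, size: int = 200) -> str:
    """Build a Gravatar image URL from an email address (illustrative)."""
    normalized = email.strip().lower()
    digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    # d=404 asks Gravatar to return a 404 instead of a default avatar
    # when no image is registered for this address.
    return f"https://www.gravatar.com/avatar/{digest}?s={size}&d=404"
```

Because the PI computes and submits the URL themselves, this stays opt-in: nothing is looked up on their behalf.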

wdduncan · 2021-04-07T14:42:25Z

All these ideas are fine with me, but where in the ETL do we insert the URL/URI? I can do it on my end by simply having a file that gives the URI for each investigator in the contact table. However, this won't work for investigators that are not registered in the GOLD database.

jbeezley · 2021-04-07T17:02:52Z

We can make it optional and on the portal show a placeholder image. A lot of these questions on what to make required (#310) depend on features needed for the portal and what we are willing to leave blank.
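A small sketch of the optional-with-placeholder behavior on the portal side. The key name `image_url` and the placeholder path are assumptions for illustration, not the portal's actual values.

```python
from typing import Optional

# Hypothetical path to a generic avatar shipped with the portal.
PLACEHOLDER_IMAGE_URL = "/static/pi-placeholder.png"


def pi_image_src(person: dict, key: str = "image_url") -> str:
    """Return the PI's image URL, or the placeholder when it is absent or empty."""
    url: Optional[str] = person.get(key)
    return url if url else PLACEHOLDER_IMAGE_URL
```

With this shape, studies imported in bulk without any image information still render a complete page.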

dehays · 2021-04-09T19:18:07Z

Moving to nmdc-schema to add optional pi_image_url slot

For the studies we currently have (12 FICUS) - will need to put image files on NERSC and manually set them in metadata - there is no standard source for these images so I don't see a way the GOLD ETL can set these

@wdduncan Question on implementation - an image_url on the person entity seems correct to me. Then study would refer to the PI (a person) image_url attribute. I think this is what you are doing with the principal_investigator_name on study. Does this make sense?

wdduncan · 2021-04-09T19:55:30Z

@dehays yes, that is what I was thinking.

dwinston · 2021-04-09T20:34:14Z

So the plan is to

eliminate core/person_value class
give core/person class an orcid slot
give core/person class an image_url slot
remove nmdc/study class' principal_investigator_name slot
add nmdc/study class principal_investigator slot with range core/person

Is that right?
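Roughly, the plan above could be pictured as the following shapes. This is a Python illustration only, not the schema definition itself, and the `name` field and the use of plain strings for every slot are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    """Stand-in for core/person with the two proposed slots."""
    name: str                        # assumed field, not named in the plan
    orcid: Optional[str] = None      # proposed orcid slot
    image_url: Optional[str] = None  # proposed image_url slot


@dataclass
class Study:
    """Stand-in for nmdc/study after the change."""
    id: str
    principal_investigator: Person   # replaces principal_investigator_name


study = Study(
    id="gold:Gs0112340",
    principal_investigator=Person(name="Virginia Rich"),
)
```

Both new slots default to empty, so a person record without an ORCID or an image remains valid.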

jeffbaumes · 2021-04-12T18:50:44Z

@wdduncan I added you as an assignee since I don't think @jbeezley can make the actual query changes himself. Please correct this or delegate if I'm off here.

ssarrafan · 2021-04-28T17:40:39Z

Adding comments from email exchange for reference:

Agree with your comments here David (and yours Kjiersten). I commented in parallel on GitHub, but the gist is that we need to resolve which fields we can and can't expect to require. Many "required" things for the portal could be made optional if needed.

#41

On Wed, Apr 28, 2021 at 1:31 PM Kjiersten Fagnan kmfagnan@lbl.gov wrote:
I support the approach David laid out for what fields are required vs optional. I can add the following comments to the ticket, but we seem to be getting into this via email.

Could we create some default values for the portal to populate the page - avatar, URL, DOI and scientific objectives would be harder if not impossible.

In the future, when contributors are providing data to NMDC, could we also collect - photo, website URL, etc as part of the submission process - or perhaps give the PI the ability to add this themselves? This depends on having some level of access controls (different roles in the data portal than exist right now). Maybe this is part of working directly with the PIs to get their help on the study/data landing pages?

Kjiersten

On Wed, Apr 28, 2021 at 10:20 AM David Hays dehays@lbl.gov wrote:
Bill and Emiley said:

Make explicit list of fields required for portal to function, make missing entries invalidate the data #41: I do not have access to the data for some of the fields that Jeff is requesting. I can give an estimate until we track the data.

@emiley Eloe-Fadrosh any idea of who to follow up with to get access to the data needed?

Move PI profile images to official metadata schema #19: Again, I do not have access to images of PIs.

@emiley Eloe-Fadrosh any idea of who to follow up with to get access to the PI images needed?

David should be able to address #41, seems like a GOLD database dump issue?

For #19, as was indicated in the github ticket, these were all manually collected by me. Not sure the best solution here, but this could tie into the more general discussion of the study pages (and some items from #41 like scientific objectives that are not part of the GOLD db dump). Not everything can be fully automated.... just my two cents.

On #41, I believe the fields that Bill is referring to are the ones that are NOT available from GOLD; i.e. those listed in microbiomedata/nmdc-metadata#301 and #19 such as PI web site, PI image, scientific objective, publication DOIs. Basically, the ones that Emiley collected manually and provided to Kitware.

For these, Bill could add these to the schema as non required attributes. He has no way of making the GOLD ETL populate these because they do not exist in GOLD.

Jon states that the portal UI depends on these fields - but I believe the portal UI will need to treat them as optional fields as well because normally they will not be available. If we add 10K studies tomorrow from GOLD or NCBI - we will not be waiting for Emiley to collect values for these fields before they can be displayed in the UI. The portal UI needs to be flexible enough to handle cases where these values are not available.

I also believe it should not be the responsibility of search portal development to merge additional metadata for studies to extend what was made available for ingest. So that implies the need for a curate/annotate procedure that is available between GOLD or NCBI ETL and search portal ingest. And in the case of images - in addition to a way to edit the study json docs to add PI image URLs, there is also the need to add and manage the image files at a location associated with the metadata URL.

So for #41, there are a number of fields that are always available (We will always have a PI for a study.) that can be made required in the schema. But for those for which there is no available source except manual curation, at best these could be optional fields in the schema. Make sense?

-David

On Tue, Apr 27, 2021 at 1:51 PM Emiley Eloe-Fadrosh eaeloefadrosh@lbl.gov wrote:
For these two:

Make explicit list of fields required for portal to function, make missing entries invalidate the data #41: I do not have access to the data for some of the fields that Jeff is requesting. I can give an estimate until we track the data.

@emiley Eloe-Fadrosh any idea of who to follow up with to get access to the data needed?

Move PI profile images to official metadata schema #19: Again, I do not have access to images of PIs.

@emiley Eloe-Fadrosh any idea of who to follow up with to get access to the PI images needed?

David should be able to address #41, seems like a GOLD database dump issue?

For #19, as was indicated in the github ticket, these were all manually collected by me. Not sure the best solution here, but this could tie into the more general discussion of the study pages (and some items from #41 like scientific objectives that are not part of the GOLD db dump). Not everything can be fully automated.... just my two cents.

wdduncan · 2021-04-29T16:26:47Z

Please move to May sprint.

dwinston · 2021-05-05T22:00:33Z

@jeffbaumes is this subsumed by / a component of #41?

emiley · 2021-05-05T22:04:02Z

FYI - I think I’m possibly on this thread in error. Perhaps someone inadvertently @-mentioned me rather than the correct recipient.

dwinston · 2021-05-05T22:51:31Z

sorry @emiley, thank you for alerting us! We meant @emileyfadrosh. You can unsubscribe yourself.

jeffbaumes · 2021-05-11T13:58:15Z

> @jeffbaumes is this subsumed by / a component of #41?

@dwinston This issue has a slight additional complication attached (we need to host the profile images elsewhere and link to them by URL in the schema) so I feel it could be deserving of its own issue. But I'm also ok rolling it into #41.

ssarrafan · 2021-05-18T23:10:04Z

Based on the meeting today with @dehays, @emileyfadrosh, @dwinston, @wdduncan, and @jbeezley, @wdduncan will add image URL on the person object to the schema. The images can be stored on an object store at NERSC.

ssarrafan · 2021-06-02T23:28:47Z

Removing Jon from assignee.

wdduncan · 2021-06-17T21:07:47Z

I've added a profile image url slot (see PR #68).

This is to be used with study objects like so (the last entry under principal_investigator is the new slot):

{
    "id": "gold:Gs0112340",
    "name": "Thawing permafrost microbial communities from the Arctic, studying carbon transformations",
    "description": "....",
    "principal_investigator": {
        "has_raw_value": "Virginia Rich",
        "profile image url": "http://....."
    }
}

NB: the property principal_investigator_name is now named principal_investigator.
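For existing study documents that still use the old property name, a rename could be sketched as below. This is an assumption about how one might migrate, not part of PR #68, and `rename_pi_slot` is a hypothetical helper that leaves the value itself untouched.

```python
def rename_pi_slot(study: dict) -> dict:
    """Return a copy of the study with the old PI property renamed."""
    old_key = "principal_investigator_name"
    new_key = "principal_investigator"
    updated = dict(study)  # shallow copy; the input is not modified
    # Only rename when the old key exists and the new one is not already set.
    if old_key in updated and new_key not in updated:
        updated[new_key] = updated.pop(old_key)
    return updated


old_doc = {
    "id": "gold:Gs0112340",
    "principal_investigator_name": {"has_raw_value": "Virginia Rich"},
}
new_doc = rename_pi_slot(old_doc)
```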

Closing this ticket. But it can be reopened if needed.

jeffbaumes assigned jbeezley Apr 5, 2021

dehays transferred this issue from microbiomedata/nmdc-metadata Apr 9, 2021

jeffbaumes assigned wdduncan Apr 12, 2021

ssarrafan added this to the Sprint 1 milestone Apr 12, 2021

ssarrafan mentioned this issue Apr 28, 2021

Make explicit list of fields required for portal to function, make missing entries invalidate the data #41

Closed

wdduncan removed this from To do in NMDC April 2021 Sprint Apr 29, 2021

wdduncan added this to To do in NMDC May 2021 Sprint via automation Apr 29, 2021

ssarrafan modified the milestones: Sprint 1, Sprint 2 May 3, 2021

wdduncan added the LARGE 7-10 days label May 5, 2021

wdduncan removed this from To do in NMDC May 2021 Sprint May 29, 2021

wdduncan added this to To do in NMDC June 2021 Sprint via automation May 29, 2021

ssarrafan unassigned jbeezley Jun 2, 2021

ssarrafan modified the milestones: Sprint 2, Sprint 3 Jun 4, 2021

dwinston mentioned this issue Jun 15, 2021

add person/profile_image #68

Merged

wdduncan closed this as completed Jun 17, 2021

NMDC June 2021 Sprint automation moved this from To do to Done Jun 17, 2021

turbomam pushed a commit that referenced this issue Feb 21, 2024

Merge pull request #19 from microbiomedata/1446-instrument-modeling

28aceee

1446 instrument modeling
