# ORCID People

Most USGS staff who are publishing authors, data creators, or other contributors to published works now have ORCID identifiers as a matter of policy. Much more than a convenient, globally unique, and persistent identifier, the ORCID system and its evolving schema give us a way to get at a wealth of additional details and linkages on people. In our metadata harvesting process, we regularly identify ORCIDs of interest from across various systems, queue them for processing, and then retrieve ORCID details into a cache. Content negotiation against orcid.org is fairly reliable, but we still encounter a number of error conditions that are worth handling in a pre-processing step, and we occasionally need to re-process the information into our graph or into other forms. Caching the ORCID data for the identities we care about is therefore a reasonable practice.
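To make the retrieve-and-cache step concrete, here is a minimal sketch rather than the actual harvesting code. The function names, the one-JSON-file-per-ORCID cache layout, and the `application/ld+json` media type used for content negotiation are all assumptions for illustration. Malformed identifiers and failed requests are treated as a simple "no record" result so that they can be queued for another attempt later.

```python
import json
import re
import urllib.request
from pathlib import Path

ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def is_valid_orcid(orcid):
    """Check the layout and the ISO 7064 MOD 11-2 check character."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    check = (12 - total % 11) % 11
    return digits[-1] == ("X" if check == 10 else str(check))


def fetch_orcid_jsonld(orcid):
    """Request a record from orcid.org via content negotiation.

    The JSON-LD media type here is an assumption, not a confirmed detail
    of the original pipeline.
    """
    request = urllib.request.Request(
        f"https://orcid.org/{orcid}",
        headers={"Accept": "application/ld+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def cache_orcid_record(orcid, cache_dir, fetch=fetch_orcid_jsonld):
    """Return the cached record, fetching only on a cache miss.

    Returns None for a malformed identifier or a failed request.
    """
    if not is_valid_orcid(orcid):
        return None
    cache_file = Path(cache_dir) / f"{orcid}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    try:
        record = fetch(orcid)
    except (OSError, ValueError):
        # Network/HTTP errors are OSError subclasses; bad JSON is a ValueError.
        return None
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(record), encoding="utf-8")
    return record
```

Passing the fetch function as a parameter keeps the caching logic separate from the network call, so the cache can be rebuilt into the graph or exercised offline without touching orcid.org.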

We split the process up a bit here, first pulling in anything new or updated in terms of basic identifying information. In many cases, we will already have encountered a person and included their ORCID identifier in their properties.

# Note
I need to come back to this one and break out entity creation from relationship creation.
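As a starting point for that split, here is a rough sketch using pandas. The function name `split_graphable` and the choice of which columns describe an entity versus a relationship are assumptions based on the columns in the graphable ORCID file, not a settled design.

```python
import pandas as pd

# Columns that describe the organization or work itself (assumed grouping).
ENTITY_COLUMNS = [
    "entity_type", "name", "alternate_name",
    "grid_id", "url", "doi", "ringgold_id",
]

# Columns that describe the link from a person to that entity (assumed grouping).
RELATIONSHIP_COLUMNS = [
    "orcid", "rel_type", "entity_type", "name", "doi",
    "date_qualifier", "reference",
]


def split_graphable(df):
    """Return (entities, relationships) as two de-duplicated DataFrames."""
    entities = df[ENTITY_COLUMNS].drop_duplicates().reset_index(drop=True)
    relationships = df[RELATIONSHIP_COLUMNS].drop_duplicates().reset_index(drop=True)
    return entities, relationships
```

With the two tables separated, each entity would be merged once, and the relationship pass would only need to match existing nodes.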

In [1]:
import isaid_helpers
import pandas as pd

In [2]:
pd.read_csv(isaid_helpers.f_graphable_orcid).head()

| Unnamed: 0 | orcid | date_qualifier | reference | name | alternate_name | entity_type | rel_type | grid_id | url | doi | ringgold_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0000-0001-9225-9594 | 2021-04-14T05:06:29.691739 | https://orcid.org/0000-0001-9225-9594 | United States Geological Survey, Grand Canyon ... | Grand Canyon Monitoring & Research Center | Organization | AFFILIATED_WITH | grid.2865.9 | | | |
| 1 | 0000-0003-1800-0183 | 2021-05-30T18:33:22.687156 | https://orcid.org/0000-0003-1800-0183 | Adaptive introgression of the beta-globin clus... | | CreativeWork | AUTHOR_OF | | https://doi.org/10.1038/s41437-021-00437-6 | 10.1038/s41437-021-00437-6 | |
| 2 | 0000-0003-1800-0183 | 2021-05-30T18:33:22.687156 | https://orcid.org/0000-0003-1800-0183 | Implications of Historical and Contemporary Pr... | | CreativeWork | AUTHOR_OF | | https://doi.org/10.3390/d13030103 | 10.3390/d13030103 | |
| 3 | 0000-0003-1800-0183 | 2021-05-30T18:33:22.687156 | https://orcid.org/0000-0003-1800-0183 | Mitochondrial genome diversity and population ... | | CreativeWork | AUTHOR_OF | | https://doi.org/10.1007/s00300-020-02703-5 | 10.1007/s00300-020-02703-5 | |
| 4 | 0000-0003-1800-0183 | 2021-05-30T18:33:22.687156 | https://orcid.org/0000-0003-1800-0183 | Lousy grouse: Comparing evolutionary patterns ... | | CreativeWork | AUTHOR_OF | | https://doi.org/10.1002/ece3.6545 | 10.1002/ece3.6545 | |
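The preview shows that the file mixes Organization and CreativeWork rows, and that the `doi` column can be empty. As a rough sketch (the function name `summarize_graphable` is hypothetical, and the column names are taken from the preview), a quick tally shows how many rows fall into each of the groups that are loaded separately: organizations, works with a DOI, and works without one.

```python
import pandas as pd


def summarize_graphable(df):
    """Count rows per load group: organizations, works with/without a DOI."""
    is_work = df["entity_type"] == "CreativeWork"
    has_doi = df["doi"].notna()
    return {
        "organizations": int((df["entity_type"] == "Organization").sum()),
        "works_with_doi": int((is_work & has_doi).sum()),
        "works_without_doi": int((is_work & ~has_doi).sum()),
    }
```

Calling `summarize_graphable(pd.read_csv(isaid_helpers.f_graphable_orcid))` would return a small dictionary of counts, assuming the file loads as it does in the cell above.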


In [3]:
%%time
with isaid_helpers.graph_driver.session(database=isaid_helpers.graphdb) as session:
    session.run("""
        LOAD CSV WITH HEADERS FROM '%(source_path)s/%(source_file)s' AS row
        WITH row
            MATCH (p:Person {orcid: row.orcid})
        
        WITH p, row
            WHERE row.entity_type = "Organization"
                MERGE (o:Organization {name: row.name})
                ON CREATE
                    SET o.alternate_name = row.alternate_name,
                    o.grid_id = row.grid_id,
                    o.url = row.url,
                    o.doi = row.doi,
                    o.ringgold_id = row.ringgold_id
                MERGE (p)-[rel:AFFILIATED_WITH]->(o)
                    SET rel.date_qualifier = row.date_qualifier,
                    rel.reference = row.reference
    """ % {
        "source_path": isaid_helpers.local_cache_path,
        "source_file": isaid_helpers.f_graphable_orcid
    })

CPU times: user 2.26 ms, sys: 3.35 ms, total: 5.62 ms
Wall time: 1min 16s


In [4]:
%%time
with isaid_helpers.graph_driver.session(database=isaid_helpers.graphdb) as session:
    session.run("""
        LOAD CSV WITH HEADERS FROM '%(source_path)s/%(source_file)s' AS row
        WITH row
            MATCH (p:Person {orcid: row.orcid})
        
        WITH p, row
            WHERE row.entity_type = "CreativeWork" AND row.doi IS NOT NULL
                MERGE (w:CreativeWork {doi: row.doi})
                ON CREATE
                    SET w.url = row.url,
                    w.name = row.name,
                    w.source = "ORCID"
                ON MATCH
                    SET w.url = row.url,
                    w.name = row.name

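        // Each relationship type gets its own conditional FOREACH. Chaining two
        // WITH ... WHERE filters would drop every FUNDER_OF row at the first one.
        WITH p, w, row
            FOREACH (_ IN CASE WHEN row.rel_type = "AUTHOR_OF" THEN [1] ELSE [] END |
                MERGE (p)-[rel:AUTHOR_OF]->(w)
                    SET rel.date_qualifier = row.date_qualifier,
                    rel.reference = row.reference
            )
            FOREACH (_ IN CASE WHEN row.rel_type = "FUNDER_OF" THEN [1] ELSE [] END |
                MERGE (p)-[rel:FUNDER_OF]->(w)
                    SET rel.date_qualifier = row.date_qualifier,
                    rel.reference = row.reference
            )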
    """ % {
        "source_path": isaid_helpers.local_cache_path,
        "source_file": isaid_helpers.f_graphable_orcid
    })

CPU times: user 5.81 ms, sys: 3.54 ms, total: 9.35 ms
Wall time: 5min 40s


In [5]:
%%time
with isaid_helpers.graph_driver.session(database=isaid_helpers.graphdb) as session:
    session.run("""
        LOAD CSV WITH HEADERS FROM '%(source_path)s/%(source_file)s' AS row
        WITH row
            MATCH (p:Person {orcid: row.orcid})
        
        WITH p, row
            WHERE row.entity_type = "CreativeWork" AND row.doi IS NULL
                MERGE (w:CreativeWork {name: row.name})
                ON CREATE
                    SET w.url = row.url,
                    w.source = "ORCID"

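        // Each relationship type gets its own conditional FOREACH. Chaining two
        // WITH ... WHERE filters would drop every FUNDER_OF row at the first one.
        WITH p, w, row
            FOREACH (_ IN CASE WHEN row.rel_type = "AUTHOR_OF" THEN [1] ELSE [] END |
                MERGE (p)-[rel:AUTHOR_OF]->(w)
                    SET rel.date_qualifier = row.date_qualifier,
                    rel.reference = row.reference
            )
            FOREACH (_ IN CASE WHEN row.rel_type = "FUNDER_OF" THEN [1] ELSE [] END |
                MERGE (p)-[rel:FUNDER_OF]->(w)
                    SET rel.date_qualifier = row.date_qualifier,
                    rel.reference = row.reference
            )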
    """ % {
        "source_path": isaid_helpers.local_cache_path,
        "source_file": isaid_helpers.f_graphable_orcid
    })

CPU times: user 6.61 ms, sys: 4.37 ms, total: 11 ms
Wall time: 6min 55s
