In this notebook, I'm addressing the problem I raised in a GitHub issue: how to keep things straight between a shared Zotero library and the processing that a) establishes linkages to the xDD Digital Library and b) extracts data/information from the associated xDD documents. I reworked the original process, which operated against an export from the Zotero library, into one that works against the Zotero API via the pyzotero package.

I'm breaking this up into a two-part exploration. This first part simply establishes a linkage, based on DOI, to a corresponding xDD document identifier and records it by putting a link attachment back onto the Zotero library item. That establishes a base of operations for subsequent data extraction processes, keeping everything on established Zotero library items where continued curation can occur via web or desktop clients.

To run this code, you will need to set the ZOTERO_API_KEY environment variable in your Python environment. That key is tied to a user account that has read/write permissions to the WLCI Library Group (identified by the group ID parameter).
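A missing key otherwise surfaces as a bare KeyError when the client is created. The following is a minimal sketch of an up-front check; the `get_zotero_api_key` helper is hypothetical and not part of this notebook or of pyzotero.

```python
import os


def get_zotero_api_key(env=os.environ):
    """Return the Zotero API key, or raise a readable error if it is not set.

    Hypothetical helper: `env` is any mapping, defaulting to os.environ.
    """
    key = env.get("ZOTERO_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "Set the ZOTERO_API_KEY environment variable before running this notebook."
        )
    return key
```

The returned value could then be passed as the third argument to `zotero.Zotero` in place of the direct `os.environ` lookup.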

This same idea should apply to any other Zotero library where we want to run the same overall process of linking a library reference to an xDD digital NLP representation. In the second part of this experiment, I will pick up from the established "xDD Document Link" identifier, pull back a list of associated species scientific names, and put those into a child attachment of the xDD Document Link attachment. This begins to lay out structured annotation, retrieved via algorithms, directly in the Zotero library items, adding value to those items and using the Zotero library as a living cloud-based repository with both human- and software-generated value.

In [1]:
from pyzotero import zotero
import os
import requests
from IPython.display import display

In [2]:
wlci_library_group_id = "2341914"

In [3]:
wlci_lib = zotero.Zotero(wlci_library_group_id, "group", os.environ["ZOTERO_API_KEY"])

In [None]:
wlci_lib_items = wlci_lib.everything(wlci_lib.top())

In [None]:
print(len(wlci_lib_items))

In [None]:
set([i["data"]["itemType"] for i in wlci_lib_items])

In our current work with the xDD library, we are only working on those articles for which we were able to establish a DOI in our Zotero library. DOIs either came from the original article metadata imported into the library and are in the DOI property, or they were added as a note and will show up in the extra property for now. (Need to revisit how we manage this in the library.) To work these over, I assemble a list of Zotero IDs and DOIs.

In [None]:
# DOIs from the DOI field, then from the extra field (skipping OCLC notes); .get() tolerates items that lack either field
lookup_doi_list = [(i["data"]["key"], i["data"]["DOI"]) for i in wlci_lib_items if i["data"].get("DOI")]
lookup_doi_list.extend([(i["data"]["key"], i["data"]["extra"]) for i in wlci_lib_items if "DOI" not in i["data"] and i["data"].get("extra") and i["data"]["extra"].split(":")[0] != "OCLC"])
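The cell above passes the whole extra field along as the DOI, which only works when that field holds nothing else. A more defensive approach would pull a DOI-shaped string out of whatever text is there. This is a minimal sketch under assumptions: the `extract_doi` helper and the regular expression are illustrative (the pattern covers common modern DOIs, not every registered form), and how DOIs are actually written into the extra field in this library has not been checked.

```python
import re

# Assumed pattern: "10.", a 4-9 digit registrant code, "/", then a non-space suffix
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def extract_doi(item_data):
    """Return a DOI from the DOI field, else from the extra field, else None."""
    doi = (item_data.get("DOI") or "").strip()
    if doi:
        return doi
    match = DOI_PATTERN.search(item_data.get("extra") or "")
    # Trim punctuation that commonly trails a DOI in free text
    return match.group(0).rstrip(".,;") if match else None


print(extract_doi({"key": "AAAA1111", "extra": "DOI: 10.1234/example.5"}))
```

With a helper like this, the OCLC check becomes unnecessary, since an OCLC note contains nothing that matches the pattern.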

We will eventually pull these functions out into our Python package for this work. I tweaked what Daniel started here, taking a somewhat different approach to the xdd_api consultation process. I also added a helper function to assemble the necessary information into the Zotero template for the attachment.

I had to fork the pyzotero package and create a [branch](https://github.com/skybristol/pyzotero/tree/attachment-template-type) with an adjustment to the item_template function, which was failing on the "attachment" template type and needed an additional parameter against the Zotero REST API. I'll post this back to the project as a pull request for consideration.

In [5]:
def xdd_api(route, params):
    """Query an xDD API route and return the list of matching records.

    See https://geodeepdive.org/api for more detail.

    Parameters
    ----------
    route : str
        An available xDD API route, e.g. 'articles'.
    params : str
        Query string of parameter=value pairs separated by &.

    Returns the 'data' list from a successful response, otherwise None.
    """
    base_url = 'https://geodeepdive.org/api'
    search = base_url + '/' + route + '?' + str(params)
    r = requests.get(search)
    # Treat any non-200 response, or a body without 'success', as no result
    if r.status_code != 200:
        return None
    json_r = r.json()
    if 'success' not in json_r:
        return None
    return json_r['success']['data']

    
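def xdd_link_attachment(parent_id, xdd_record, template):
    """Fill a Zotero link attachment template with the xDD document link."""
    template["parentItem"] = parent_id
    # This title is also what later cells use to detect an existing link
    template["title"] = "xDD Document Link"
    template["url"] = f"https://geodeepdive.org/api/articles?docid={xdd_record['_gddid']}"
    template["accessDate"] = "CURRENT_TIMESTAMP"
    template["note"] = "Link to xDD document established through search algorithm"
    # The Zotero API expects tags as a list of {"tag": ...} objects
    template["tags"] = [{"tag": "xdd_doc_link"}]
    return template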
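Because xdd_api takes a ready-made query string, any reserved characters in a DOI (such as `/`, `;`, or `<`) reach the URL as given. A minimal sketch of building that string from keyword arguments with the standard library follows; `build_xdd_query` is a hypothetical helper, and the DOI shown is a made-up value.

```python
from urllib.parse import urlencode


def build_xdd_query(**params):
    """Return a percent-encoded query string from keyword arguments.

    Hypothetical helper; the output can be passed as the params
    argument of xdd_api.
    """
    return urlencode(params)


# Made-up DOI containing characters that need encoding
query = build_xdd_query(max=1, doi="10.1000/a b")
print(query)
```

The design choice here is to leave xdd_api's signature alone and do the encoding at the call site, so existing calls keep working.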

Here I run through the list of DOIs and Zotero IDs to check xDD for a link. This could be rearranged in a variety of ways and will need to be explored further for production use. I do check to make sure we don't already have a link of the appropriate "type" (based on assigning a particular title to the attachment).

In [None]:
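from urllib.parse import quote

for zotero_key, doi in lookup_doi_list:
    print(zotero_key, doi)
    # Percent-encode the DOI so reserved characters survive in the query string
    xdd_data = xdd_api(
        'articles',
        'max=1&doi=' + quote(str(doi), safe='')
    )

    # Skip DOIs with no xDD match (None or an empty result list)
    if xdd_data:
        child_items = wlci_lib.children(zotero_key)
        # Child notes have no title, so use .get() when looking for an existing link
        current_xdd_doc_attachment = next((i for i in child_items if i["data"].get("title") == "xDD Document Link"), None)

        if current_xdd_doc_attachment is None:
            xdd_doc_attachment = xdd_link_attachment(zotero_key, xdd_data[0], wlci_lib.item_template("attachment", linkmode="linked_url"))
            create_response = wlci_lib.create_items([xdd_doc_attachment])
            # Show the full response only when nothing was created
            if not create_response["successful"]:
                display(create_response)
        else:
            display(current_xdd_doc_attachment)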
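The check for an existing link can be pulled into a small function and exercised without touching the library. This is a minimal sketch, assuming child items are dictionaries shaped like those pyzotero returns (a "data" dictionary that may or may not include a "title"); `find_xdd_link` and the sample records are hypothetical.

```python
def find_xdd_link(child_items, title="xDD Document Link"):
    """Return the first child item with the given title, or None."""
    return next(
        (c for c in child_items if c.get("data", {}).get("title") == title),
        None,
    )


# Made-up child items: a note (no title) and an existing link attachment
children = [
    {"key": "AAAA1111", "data": {"itemType": "note", "note": "<p>reading notes</p>"}},
    {"key": "BBBB2222", "data": {"itemType": "attachment", "title": "xDD Document Link"}},
]
existing = find_xdd_link(children)
```

Matching on the title keeps the check simple, but it depends on nobody renaming the attachment in the Zotero client; matching on the xdd_doc_link tag might be a sturdier alternative for production use.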
