<a href="https://colab.research.google.com/github/elibtronic/Borealis_Helper_Notebooks/blob/main/Check_Associated_Publication.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Associated Publication Check

This notebook walks through checking the datasets in a Dataverse (DV) site and builds a list showing which datasets have no **associated publication**. It currently works with [Borealis](https://borealisdata.ca) from Scholars Portal and has been adjusted to run on Google Colab.

1. You will need an [API Key](https://borealisdata.ca/guides/en/latest/api/auth.html)

1. Paste the key into the form below and adjust the other values

1. Under the _Runtime_ menu, click _Run all_



In [77]:
# @title Dataverse Details Needed {"vertical-output":true}
BASE_URL = "https://borealisdata.ca" # @param {"type":"string","placeholder":"https://borealisdata.ca"}
DATAVERSE_NAME = "brock" # @param {"type":"string","placeholder":"brock"}
API_KEY = "change me" # @param {"type":"string","placeholder":"change me"}
#DOI_FROM_DATE = "2024-01-01" # @param {"type":"date"}


!pip install pyDataverse
!pip install jmespath
from pyDataverse.api import NativeApi
import json
import jmespath
import pandas as pd
from IPython.display import HTML
api = NativeApi(BASE_URL,API_KEY)

#Exploring the API
#(save this for later)
#dv_item['data']
#item = jmespath.search("data.latestVersion.metadataBlocks.citation.fields[?typeName=='publication']",dv_item)
#resp = api.get_dataverse(DATAVERSE_NAME)
#resp.json()
#dv_item['data']['persistentUrl']


print("\n\nReady to Proceed")




Ready to Proceed


In [78]:
def find_related_info(doi_to_fetch):
  '''
  Attempts to find the title and the related publication citation using DOI supplied
  Returns two strings
  - item_title
  - cite
  '''
  doi_to_fetch = "doi:"+doi_to_fetch
  #Defaults, so both values exist even if the lookup fails part way
  item_title = "No Title Found"
  cite = "No Citation Found"
  #This is very brittle
  try:
    dv_item = api.get_dataset(doi_to_fetch).json()
    title_field = jmespath.search("data.latestVersion.metadataBlocks.citation.fields[?typeName=='title']",dv_item)
    item_title = title_field[0]['value']
    pub_field = jmespath.search("data.latestVersion.metadataBlocks.citation.fields[?typeName=='publication']",dv_item)
    cite = pub_field[0]['value'][0]['publicationCitation']['value']
  except Exception:
    #A failed request or a missing field leaves the defaults in place
    pass

  return(item_title,cite)
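
To make the lookup above easier to follow, here is a minimal offline sketch of the same extraction using plain dictionary access instead of jmespath. The `sample_record` dictionary is a hand-made stand-in that mirrors only the keys the function reads (it is not real API output), and `extract_title_and_citation` is a hypothetical helper name, not part of pyDataverse.

```python
# Hypothetical, trimmed-down record that mirrors only the keys read above.
sample_record = {
    "data": {
        "latestVersion": {
            "metadataBlocks": {
                "citation": {
                    "fields": [
                        {"typeName": "title", "value": "Example dataset"},
                        {
                            "typeName": "publication",
                            "value": [
                                {"publicationCitation": {"value": "Example, A. (2024). An example article."}}
                            ],
                        },
                    ]
                }
            }
        }
    }
}


def extract_title_and_citation(record):
    """Return (title, citation), falling back to placeholders when a field is missing."""
    title = "No Title Found"
    cite = "No Citation Found"
    version = record.get("data", {}).get("latestVersion", {})
    fields = version.get("metadataBlocks", {}).get("citation", {}).get("fields", [])
    for field in fields:
        if field.get("typeName") == "title":
            title = field["value"]
        elif field.get("typeName") == "publication":
            # The publication field holds a list of entries; an entry may lack a citation.
            for entry in field["value"]:
                if "publicationCitation" in entry:
                    cite = entry["publicationCitation"]["value"]
                    break
    return title, cite


print(extract_title_and_citation(sample_record))
```

Because every level is read with a fallback, a record with no publication entry simply yields the "No Citation Found" placeholder rather than raising an error.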


In [79]:
#Build a list of just items from the Dataverse
resp = api.get_dataverse_contents(DATAVERSE_NAME)
item_list = resp.json()['data']
final_list = []

#Take items off the front of the list one at a time, rather than removing
#from the list while looping over it (which skips entries)
while len(item_list):
  item = item_list.pop(0)
  if item['type'] != 'dataverse':
    final_list.append(item)
  else:
    #Sub-dataverse: queue up its contents to be checked as well
    resp2 = api.get_dataverse_contents(item['id'])
    item_list.extend(resp2.json()['data'])

doi_info_list = []

for item in final_list:
  doi = item['persistentUrl'].split("//")[1].split("doi.org/")[1]
  title, related = find_related_info(doi)
  doi_info_list.append([item['persistentUrl'], title, related])

#final dataframe creation
dataset_check_df = pd.DataFrame(doi_info_list, columns=['URL','TITLE', 'Related Publication'])

print("Done making the check !")

Done making the check !
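
The cell above walks through nested dataverses to collect every dataset. As a rough illustration of that traversal, the sketch below runs the same idea against an in-memory stand-in for the API. `FAKE_CONTENTS`, `fake_get_contents` and `collect_datasets` are all hypothetical names invented for this example, and the data is made up.

```python
from collections import deque

# Hypothetical contents keyed by dataverse id; not real Borealis data.
FAKE_CONTENTS = {
    "root": [
        {"type": "dataset", "id": 1},
        {"type": "dataverse", "id": "child"},
    ],
    "child": [
        {"type": "dataset", "id": 2},
        {"type": "dataverse", "id": "grandchild"},
    ],
    "grandchild": [{"type": "dataset", "id": 3}],
}


def fake_get_contents(dataverse_id):
    """Stand-in for api.get_dataverse_contents(...).json()['data']."""
    return FAKE_CONTENTS[dataverse_id]


def collect_datasets(root_id, get_contents):
    """Walk nested dataverses breadth-first and return every non-dataverse item."""
    queue = deque(get_contents(root_id))
    found = []
    while queue:
        item = queue.popleft()
        if item["type"] == "dataverse":
            # Sub-dataverse: add its contents to the end of the queue.
            queue.extend(get_contents(item["id"]))
        else:
            found.append(item)
    return found


print([d["id"] for d in collect_datasets("root", fake_get_contents)])
```

Passing the fetch function in as an argument keeps the traversal testable without a network connection or an API key.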


In [80]:
#If you prefer to work with a dataframe uncomment below
#dataset_check_df

In [81]:
#Show the list as nicely rendered HTML (comment out the line below to skip it)
HTML(dataset_check_df.to_html(render_links=True, escape=False))

| | URL | TITLE | Related Publication |
|---|---|---|---|
| 0 | https://doi.org/10.5683/SP2/HUAFRE | Indigenous civil society and ICTs in Latin America | Lupien, P., G. Chiriboga and S. Machaca. 2021. Indigenous Movements, ICTs and the State in Latin America. Journal of Information Technology & Politics. Forthcoming. |
| 1 | https://doi.org/10.5683/SP3/PPTY0N | Carbon Disclosure Project Climate Change Data | No Citation Found |
| 2 | https://doi.org/10.5683/SP3/NXSL3Q | Phenotypic Rates of Change Evolutionary and Ecological Database | Sanderson S, Beausoleil MO, O’Dea RE, Wood ZT, Correa C, Frankel V, Gorné LD, Haines GE, Kinnison MT, Oke KB, Pelletier F, Pérez-Jvostov F, Reyes-Corral WD, Ritchot Y, Sorbara F, Gotanda KM, Hendry AP. 2021. The pace of modern life, revisited. Molecular Ecology 31: 1028–1043. |
| 3 | https://doi.org/10.5683/SP3/VBVFJT | Investigating Postsynaptic Effects of a Drosophila Neuropeptide on Muscle Contraction - Supplemental Data | No Citation Found |
| 4 | https://doi.org/10.5683/SP3/X6BWUQ | Dataset to accompany manuscript: Seasonal variation of behavioural thermoregulation in a fossorial salamander (Ambystoma maculatum) | No Citation Found |
| 5 | https://doi.org/10.5683/SP3/MQPFHT | Data set to accompany manuscript "Seasonal plasticity in the thermal sensitivity of metabolism but not water loss in a fossorial ectotherm" | No Citation Found |
| 6 | https://doi.org/10.5683/SP3/Y6RFTJ | Replication Data for: Costs and benefits of maternal nest choice: tradeoffs between brood survival and thermal stress in bees | Ecology, accepted for publication |
| 7 | https://doi.org/10.5683/SP3/PHWSVM | "Put the fucking salary in the job ad!" Twitter Post Data | Ribaric, T. (2022). “Put the fucking salary in the job ad!” In S. Acadia, Libraries as Dysfunctional Organizations and Workplaces (1st ed., pp. 167–192). Routledge. https://doi.org/10.4324/9781003159155-8 |
| 8 | https://doi.org/10.5683/SP3/WVWNBV | Saving Brains Father Involvement Data | No Citation Found |
| 9 | https://doi.org/10.5683/SP3/LJEMOC | A systematic review and meta-analysis of the effects of probiotics on bone outcomes in rodent models | No Citation Found |
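
The table lists every dataset, with and without a citation. If only the datasets lacking an associated publication are of interest, one possible follow-up is to filter on the "No Citation Found" marker. The sketch below does this with the standard library on made-up rows in the same `[URL, TITLE, Related Publication]` shape as `doi_info_list`. The example URLs and titles are placeholders, and `without_publication` is a hypothetical helper name.

```python
import csv
import io

NO_CITATION = "No Citation Found"  # same marker string the notebook uses

# Hypothetical rows in the same [URL, TITLE, Related Publication] shape as doi_info_list.
rows = [
    ["https://doi.org/10.0000/EXAMPLE1", "First example dataset", "Example, A. (2024). An example article."],
    ["https://doi.org/10.0000/EXAMPLE2", "Second example dataset", NO_CITATION],
    ["https://doi.org/10.0000/EXAMPLE3", "Third example dataset", NO_CITATION],
]


def without_publication(rows):
    """Keep only the rows whose third column is the 'no citation' marker."""
    return [row for row in rows if row[2] == NO_CITATION]


missing = without_publication(rows)

# Write the filtered rows as CSV text (an in-memory buffer stands in for a file here).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["URL", "TITLE", "Related Publication"])
writer.writerows(missing)
print(buffer.getvalue())
```

With the dataframe built earlier, a comparable filter would likely be `dataset_check_df[dataset_check_df['Related Publication'] == "No Citation Found"]`.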


In [82]:
#Run this cell to download as CSV
dataset_check_df.to_csv('dois_to_check.csv',index=False)


#Uncomment this block if running locally
#from IPython.display import FileLink, display
#FileLink('dois_to_check.csv')

#Colab download (comment this block out if running locally)
from google.colab import files
files.download('dois_to_check.csv')
