## Overview

Notes for the DSS data tools presentation on 9 November 2023 by Holger and Wolfgang.

ColStudies – Wolfgang

+ <https://github.com/moeltgen/ColStudies>
+ <https://zenodo.org/records/8360299> — DOI [10.5281/zenodo.8360298](https://doi.org/10.5281/zenodo.8360298)

ESS linking – Holger

+ <https://github.com/hdigital/ess-linking/tree/v0.1>
+ <https://zenodo.org/records/8421233>


## Explore DOI APIs

In [19]:
import requests

doi_url = "https://doi.org/10.5281/zenodo.8360298"
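
The same DOI can be written as a resolver URL (as in `doi_url`), with a `doi:` prefix, or as a bare name such as `10.5281/zenodo.8360298`. Below is a minimal normalization sketch using only the standard library. `normalize_doi` is a hypothetical helper. It lowercases the result because DOI names are case-insensitive (the citation below echoes `ZENODO` in upper case).

```python
import re

# Resolver URL or "doi:" prefixes that are often pasted together with the DOI
DOI_PREFIX_RE = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def normalize_doi(value: str) -> str:
    """Return a bare, lower-case DOI such as '10.5281/zenodo.8360298'."""
    doi = DOI_PREFIX_RE.sub("", value.strip())
    if not doi.startswith("10."):  # DOI names start with the "10." directory indicator
        raise ValueError(f"Not a DOI: {value!r}")
    return doi.lower()


print(normalize_doi("https://doi.org/10.5281/ZENODO.8360298"))  # 10.5281/zenodo.8360298
```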

### Formatted text citation

In [20]:
headers = {"accept": "text/x-bibliography"}
r = requests.get(doi_url, headers=headers)

r.text

'Zenk-Möltgen, W. (2023). <i>ColStudies: A web-frontend to Colectica API to register DOIs with da|ra</i> (Version v0.3) [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.8360298'
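
The formatted citation is returned with HTML markup, such as `<i>` around the title. As the ESS linking citation further down shows, it can also contain HTML entities like `&amp;`. The sketch below turns such a citation into plain text with the standard library. It assumes only simple inline tags, and `citation_to_plain_text` is a hypothetical helper.

```python
import html
import re

# Matches simple inline tags such as <i> and </i>; assumes no ">" inside attributes
TAG_RE = re.compile(r"<[^>]+>")


def citation_to_plain_text(citation: str) -> str:
    """Strip inline HTML tags first, then decode entities like &amp;."""
    without_tags = TAG_RE.sub("", citation)
    return html.unescape(without_tags).strip()


example = "Bederke, P., &amp; Döring, H. (2023). <i>Harmonizing and linking party information</i>."
print(citation_to_plain_text(example))
# Bederke, P., & Döring, H. (2023). Harmonizing and linking party information.
```

Tags are removed before entities are decoded, so an escaped `&lt;` in the text survives as a literal `<` instead of being stripped.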

### Schema.org in JSON-LD

In [21]:
headers = {"accept": "application/vnd.schemaorg.ld+json"}
r = requests.get(doi_url, headers=headers)

r.json()

{'@context': 'http://schema.org',
 '@type': 'SoftwareSourceCode',
 '@id': 'https://doi.org/10.5281/zenodo.8360298',
 'identifier': {'@type': 'PropertyValue',
  'propertyID': 'URL',
  'value': 'https://zenodo.org/record/8360299'},
 'url': 'https://zenodo.org/record/8360298',
 'name': 'ColStudies: A web-frontend to Colectica API to register DOIs with da|ra',
 'author': {'name': 'Wolfgang Zenk-Möltgen',
  'givenName': 'Wolfgang',
  'familyName': 'Zenk-Möltgen',
  'affiliation': {'@type': 'Organization', 'name': 'GESIS'},
  '@type': 'Person'},
 'description': 'Basic ColStudies application, connect to Colectica repository, view studies, register DOIs with da|ra.',
 'license': ['https://opensource.org/licenses/MIT',
  'info:eu-repo/semantics/openAccess'],
 'version': 'v0.3',
 'keywords': 'DDI, da|ra',
 'datePublished': '2023-09-19',
 '@reverse': {'isBasedOn': {'@type': 'ScholarlyArticle',
   'identifier': {'@type': 'PropertyValue',
    'propertyID': 'URL',
    'value': 'https://github.com/mo
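
In this single-author record, `author` in the schema.org JSON-LD is one object. For records with several authors it is typically a list, which is an assumption worth checking against other DOIs. The sketch below handles both shapes. It runs on a trimmed copy of the fields shown above, and `author_names` is a hypothetical helper.

```python
from typing import Any


def author_names(record: dict[str, Any]) -> list[str]:
    """Return author names whether `author` is a single object or a list."""
    authors = record.get("author", [])
    if isinstance(authors, dict):  # single author
        authors = [authors]
    return [a.get("name", "") for a in authors]


# Trimmed copy of the JSON-LD response shown above
record = {
    "@type": "SoftwareSourceCode",
    "name": "ColStudies: A web-frontend to Colectica API to register DOIs with da|ra",
    "author": {"name": "Wolfgang Zenk-Möltgen", "@type": "Person"},
    "version": "v0.3",
    "datePublished": "2023-09-19",
}

print(author_names(record))  # ['Wolfgang Zenk-Möltgen']
```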

## Crosscite DOI APIs

<https://citation.crosscite.org/docs.html#sec-4>

> "Currently three DOI registration agencies have implemented content negotiation for their DOIs: Crossref, DataCite and mEDRA. They support a number of metadata content types, some of which are common to the three RAs."

- Formatted text citation // `text/x-bibliography`
- BibTeX // `application/x-bibtex`
- Citeproc JSON // `application/vnd.citationstyles.csl+json`
- Schema.org in JSON-LD // `application/vnd.schemaorg.ld+json` (only DataCite)
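
For formatted text citations, the Crosscite documentation linked above also describes `style` and `locale` parameters that can be appended to the `text/x-bibliography` media type. The sketch below builds such a header. `bibliography_accept_header` is a hypothetical helper, and the style and locale values are only examples: check supported values against the Crosscite docs.

```python
from typing import Optional


def bibliography_accept_header(style: Optional[str] = None,
                               locale: Optional[str] = None) -> dict[str, str]:
    """Build an Accept header for a formatted text citation.

    `style` is a citation style name such as "apa" and `locale` a language tag
    such as "en-US"; both parameters are optional.
    """
    parts = ["text/x-bibliography"]
    if style:
        parts.append(f"style={style}")
    if locale:
        parts.append(f"locale={locale}")
    return {"accept": "; ".join(parts)}


print(bibliography_accept_header(style="apa", locale="de-DE"))
# {'accept': 'text/x-bibliography; style=apa; locale=de-DE'}
```

Pass the result as the `headers` argument of `requests.get(doi_url, headers=...)`, as in the cells of this notebook.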


In [22]:
import time
import requests

CONTENT_TYPES = {
    "bibliography": "text/x-bibliography",
    "bibtex": "application/x-bibtex",
    "json-csl": "application/vnd.citationstyles.csl+json",
    "json-ld": "application/vnd.schemaorg.ld+json",
}


def get_doi_data(doi, content_type="bibliography"):
    """Request metadata for a bare DOI in one of the CONTENT_TYPES formats."""

    url = f"https://doi.org/{doi}"
    # unknown content types fall back to a formatted text citation
    accept = CONTENT_TYPES.get(content_type, CONTENT_TYPES["bibliography"])
    headers = {"accept": accept}

    return requests.get(url, headers=headers)


doi = "10.5281/zenodo.8421232"  # bare DOI; get_doi_data() adds the resolver URL

for key in CONTENT_TYPES:
    time.sleep(2)  # pause between requests to stay below the rate limit
    doi_request = get_doi_data(doi, key)
    print(f"## DOI content type: {key}\n\n{doi_request.text}\n\n")

## DOI content type: bibliography

Bederke, P., &amp; Döring, H. (2023). <i>Harmonizing and linking party information: The ESS as an example of complex data linking</i> (Version v0.1) [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.8421232


## DOI content type: bibtex

@misc{https://doi.org/10.5281/zenodo.8421232,
  doi = {10.5281/ZENODO.8421232},
  url = {https://zenodo.org/record/8421232},
  author = {Bederke, Paul and Döring, Holger},
  keywords = {comparative politics, data management, survey data, voting behavior, expert surveys, validation},
  title = {Harmonizing and linking party information: The ESS as an example of complex data linking},
  publisher = {Zenodo},
  year = {2023},
  copyright = {Open Access}
}



## DOI content type: json-csl

{
  "type": "book",
  "id": "https://doi.org/10.5281/zenodo.8421232",
  "categories": [
    "comparative politics",
    "data management",
    "survey data",
    "voting behavior",
    "expert surveys",
    "validation"
  ],
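
The CSL-JSON response is plain JSON, so in the loop above `doi_request.json()` would return it as a dict. The sketch below pulls out names and keywords with the standard `json` module. Because the output above is cut off, the `author` entries are an assumption: they follow the CSL-JSON `family`/`given` convention and use the authors listed in the BibTeX record.

```python
import json

# Trimmed, illustrative CSL-JSON; the "author" entries are assumed (output above is truncated)
csl_text = """
{
  "type": "book",
  "id": "https://doi.org/10.5281/zenodo.8421232",
  "categories": ["comparative politics", "data management"],
  "author": [
    {"family": "Bederke", "given": "Paul"},
    {"family": "Döring", "given": "Holger"}
  ]
}
"""

csl = json.loads(csl_text)

# "Family, Given" form, as used in the BibTeX output
authors = [f"{a['family']}, {a['given']}" for a in csl.get("author", [])]
keywords = csl.get("categories", [])

print(authors)   # ['Bederke, Paul', 'Döring, Holger']
print(keywords)  # ['comparative politics', 'data management']
```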
 

## Zenodo API

<https://developers.zenodo.org/#rest-api>

### API key

A [Zenodo API key](https://zenodo.org/account/settings/applications/tokens/new/) is required for authenticated REST API requests, such as listing your own depositions.

Read the API key with [python-decouple](https://github.com/HBNetwork/python-decouple#how-to-use-python-decouple-with-jupyter) from a local `.env` file that is not committed to the Git repository.

In [23]:
from decouple import Config, RepositoryEnv

# .env file kept out of Git, containing a line ZENODO_API_KEY=<token>
config = Config(RepositoryEnv("/workspaces/dss-presentation_data-tools/.env"))

ACCESS_TOKEN = config("ZENODO_API_KEY")
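
Passing the token as the `access_token` query parameter puts it into the request URL, where it can end up in logs or shell history. Zenodo's REST API also accepts the token in an `Authorization: Bearer` header. The standard-library sketch below only prepares such a request and does not send it. `zenodo_request` and the placeholder token are hypothetical.

```python
import urllib.request

DEPOSITIONS_URL = "https://zenodo.org/api/deposit/depositions"


def zenodo_request(url: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request authenticated with a bearer token."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )


req = zenodo_request(DEPOSITIONS_URL, "placeholder-token")
print(req.get_header("Authorization"))  # Bearer placeholder-token
```

To send it, use `urllib.request.urlopen(req)`. With `requests`, the same header goes into `headers={"Authorization": f"Bearer {ACCESS_TOKEN}"}` instead of `params`.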

### Depositions

In [24]:
# request with an invalid token to show the error response
r = requests.get(
    "https://zenodo.org/api/deposit/depositions",
    params={"access_token": "no Zenodo access token provided"},
)
r.status_code
r.json()

{'status': 403, 'message': 'Permission denied.'}
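
As the 403 above shows, Zenodo reports errors as a JSON object with `status` and `message`. The sketch below checks the status code before treating the body as data. `ZenodoError` and `check_response` are hypothetical names. The function takes the status code and the parsed JSON, so it works with any HTTP client.

```python
from typing import Any


class ZenodoError(RuntimeError):
    """Raised when the Zenodo API returns a non-2xx status."""


def check_response(status_code: int, payload: Any) -> Any:
    """Return the payload for 2xx responses, otherwise raise with Zenodo's message."""
    if 200 <= status_code < 300:
        return payload
    message = payload.get("message", "") if isinstance(payload, dict) else ""
    raise ZenodoError(f"HTTP {status_code}: {message}")


try:
    check_response(403, {"status": 403, "message": "Permission denied."})
except ZenodoError as err:
    print(err)  # HTTP 403: Permission denied.
```

With `requests`, call it as `check_response(r.status_code, r.json())`. Alternatively, `r.raise_for_status()` raises on 4xx/5xx, but without Zenodo's message.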

In [25]:
r = requests.get(
    "https://zenodo.org/api/deposit/depositions", params={"access_token": ACCESS_TOKEN}
)
r.status_code
r.json()

[{'created': '2023-10-09T11:48:42.068068+00:00',
  'modified': '2023-10-09T14:27:00.101118+00:00',
  'id': 8421233,
  'conceptrecid': '8421232',
  'doi': '10.5281/zenodo.8421233',
  'conceptdoi': '10.5281/zenodo.8421232',
  'doi_url': 'https://doi.org/10.5281/zenodo.8421233',
  'metadata': {'title': 'Harmonizing and linking party information: The ESS as an example of complex data linking',
   'doi': '10.5281/zenodo.8421233',
   'publication_date': '2023-10-09',
   'description': '<p>Combining party information from multiple sources is a work-intensive challenge for quantitative studies of political representation. Differences in the definition of political parties and difficult data structures can make linking party information across datasets challenging. The European Social Survey (ESS) is an example of a prominent data source in political science research whose party information is particularly difficult to work with. Here, we demonstrate how Party Facts, an online infrastructure fo
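
Each deposition carries two DOIs. The version DOI (`doi`) is built from the record `id`. The concept DOI (`conceptdoi`) is built from `conceptrecid`; on Zenodo it stands for all versions and resolves to the latest one. The ESS linking DOI used in the content negotiation loop, `10.5281/zenodo.8421232`, is the concept DOI. The sketch below derives a version DOI from a record URL such as the ones in the overview. It assumes the `10.5281/zenodo.<id>` pattern seen in this output holds, and `doi_from_record_url` is a hypothetical helper.

```python
import re

ZENODO_DOI_PREFIX = "10.5281/zenodo."  # pattern seen in the deposition output above


def doi_from_record_url(url: str) -> str:
    """Derive the version DOI from a Zenodo URL ending in /records/<id> or /record/<id>."""
    match = re.search(r"/records?/(\d+)/?$", url)
    if match is None:
        raise ValueError(f"No record ID in {url!r}")
    return ZENODO_DOI_PREFIX + match.group(1)


# Fields copied from the deposition shown above
deposition = {"id": 8421233, "conceptrecid": "8421232",
              "doi": "10.5281/zenodo.8421233", "conceptdoi": "10.5281/zenodo.8421232"}

print(doi_from_record_url("https://zenodo.org/records/8421233"))  # 10.5281/zenodo.8421233
```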

### Records

In [26]:
records_api_url = "https://zenodo.org/api/records"
# search query on creator affiliations (Elasticsearch query string syntax)
search_query = 'creators.affiliation:("GESIS")'
params = {"q": search_query, "access_token": ACCESS_TOKEN}

r = requests.get(records_api_url, params=params)

gesis_records = r.json()
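
A single search request returns only the first page of results. The sketch below collects all hits by incrementing the `page` parameter (the Zenodo records endpoint accepts `page` and `size`) and stops when a page comes back short. `fetch_page` is a hypothetical callable standing in for `requests.get(records_api_url, params=...).json()`, so the paging logic can be tested without network access. `max_pages` is an assumed safety cap.

```python
from typing import Any, Callable, Iterator

Page = dict[str, Any]


def iter_hits(fetch_page: Callable[[dict[str, Any]], Page], query: str,
              size: int = 25, max_pages: int = 100) -> Iterator[dict[str, Any]]:
    """Yield records from a Zenodo-style search response, one page at a time."""
    for page in range(1, max_pages + 1):
        data = fetch_page({"q": query, "size": size, "page": page})
        hits = data["hits"]["hits"]
        yield from hits
        if len(hits) < size:  # short or empty page: no more results
            break


# Fake fetcher serving 5 records in pages, for offline testing
def fake_fetch(params: dict[str, Any]) -> Page:
    records = [{"doi": f"10.5281/zenodo.{i}"} for i in range(5)]
    start = (params["page"] - 1) * params["size"]
    return {"hits": {"hits": records[start:start + params["size"]], "total": len(records)}}


dois = [r["doi"] for r in iter_hits(fake_fetch, 'creators.affiliation:("GESIS")', size=2)]
print(len(dois))  # 5
```

Against the live API, `fetch_page` could be `lambda p: requests.get(records_api_url, params={**p, "access_token": ACCESS_TOKEN}).json()`.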

In [27]:
gesis_records.keys()

gesis_records["hits"]

{'hits': [{'created': '2022-02-03T15:26:38.380712+00:00',
   'modified': '2022-03-04T10:58:01.633724+00:00',
   'id': 5914219,
   'conceptrecid': '5914218',
   'doi': '10.5281/zenodo.5914219',
   'conceptdoi': '10.5281/zenodo.5914218',
   'doi_url': 'https://doi.org/10.5281/zenodo.5914219',
   'metadata': {'title': 'KonsortSWD PID Registrator',
    'doi': '10.5281/zenodo.5914219',
    'publication_date': '2022-01-31',
    'description': '<p>The purpose of this software is to enable registering any object with an existing PID handle server based on the ePIC API.</p>',
    'access_right': 'open',
    'creators': [{'name': 'Zhang, Yudong', 'affiliation': 'GESIS'},
     {'name': 'Baran, Erdal', 'affiliation': 'GESIS'},
     {'name': 'Zloch, Matthäus', 'affiliation': 'GESIS'},
     {'name': 'Mühlbauer, Alexander', 'affiliation': 'GESIS'},
     {'name': 'Klas, Claus-Peter', 'affiliation': 'GESIS'},
     {'name': 'Mutschke, Peter', 'affiliation': 'GESIS'}],
    'contributors': [{'name': 'Klas

In [28]:
for record in gesis_records["hits"]["hits"]:
    print(f"\n\n## {record['doi']}")
    for creator in record["metadata"]["creators"]:
        if creator.get("affiliation") and "GESIS" in creator["affiliation"]:
            print(creator["name"])



## 10.5281/zenodo.5914219
Zhang, Yudong
Baran, Erdal
Zloch, Matthäus
Mühlbauer, Alexander
Klas, Claus-Peter
Mutschke, Peter


## 10.5281/zenodo.259554
Hopt, Oliver
Klas, Claus-Peter
Mühlbauer, Alexander
Zenk-Möltgen, Wolfgang


## 10.5281/zenodo.6630263
Klas, Claus-Peter
Hopt, Oliver
Krämer, Thomas
Nugraha, Sigit


## 10.5281/zenodo.7220636
Lipinsky, Anke
Schredl, Claudia
Baumann, Horst
Lomazzi, Vera
Freund, Frederike


## 10.5281/zenodo.7024958
Geisler, Helena
Löther, Andrea
Steinweg, Nina


## 10.5281/zenodo.7023258
Geisler, Helena
Löther, Andrea
Steinweg, Nina


## 10.5281/zenodo.259565
Müller, Stefan
Zenk-Möltgen, Wolfgang
Schweers, Stefan


## 10.5281/zenodo.4621051
Borschewski, Kerrin
Akdeniz, Esra
Piesch, Sophia


## 10.5281/zenodo.1118382
MahmoudHashemi, Azadeh
Mühlbauer, Alexander
Zenk-Möltgen, Wolfgang


## 10.5281/zenodo.5180976
Veronika Keck
Dorothée Behr
Brita Dorer


## 10.5281/zenodo.1118391
Klas, Claus-Peter
Hopt, Oliver
Mühlbauer, Alexander


## 10.5281/zenodo.714330

In [29]:
gesis_authors = []

for record in gesis_records["hits"]["hits"]:
    for creator in record["metadata"]["creators"]:
        if creator.get("affiliation") and "GESIS" in creator["affiliation"]:
            gesis_authors.append(
                {
                    "record": record["doi"],
                    "creator": creator["name"],
                    "type": record["metadata"]["resource_type"]["type"],
                }
            )
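
The affiliation filter used in both loops can be factored into a pure helper that tolerates missing or empty affiliations and matches case-insensitively. This makes it reusable and testable on the parsed JSON. `affiliated_creators` is a hypothetical name. The sample record is trimmed from the first search hit above, plus one invented creator without an affiliation.

```python
from typing import Any


def affiliated_creators(record: dict[str, Any], affiliation: str = "GESIS") -> list[str]:
    """Return names of creators whose affiliation contains `affiliation` (case-insensitive)."""
    needle = affiliation.casefold()
    return [
        creator["name"]
        for creator in record.get("metadata", {}).get("creators", [])
        if needle in (creator.get("affiliation") or "").casefold()
    ]


record = {
    "doi": "10.5281/zenodo.5914219",
    "metadata": {"creators": [
        {"name": "Zhang, Yudong", "affiliation": "GESIS"},
        {"name": "Baran, Erdal", "affiliation": "GESIS"},
        {"name": "Example, Person"},  # hypothetical creator without affiliation
    ]},
}

print(affiliated_creators(record))  # ['Zhang, Yudong', 'Baran, Erdal']
```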

In [30]:
gesis_authors

[{'record': '10.5281/zenodo.5914219',
  'creator': 'Zhang, Yudong',
  'type': 'software'},
 {'record': '10.5281/zenodo.5914219',
  'creator': 'Baran, Erdal',
  'type': 'software'},
 {'record': '10.5281/zenodo.5914219',
  'creator': 'Zloch, Matthäus',
  'type': 'software'},
 {'record': '10.5281/zenodo.5914219',
  'creator': 'Mühlbauer, Alexander',
  'type': 'software'},
 {'record': '10.5281/zenodo.5914219',
  'creator': 'Klas, Claus-Peter',
  'type': 'software'},
 {'record': '10.5281/zenodo.5914219',
  'creator': 'Mutschke, Peter',
  'type': 'software'},
 {'record': '10.5281/zenodo.259554',
  'creator': 'Hopt, Oliver',
  'type': 'presentation'},
 {'record': '10.5281/zenodo.259554',
  'creator': 'Klas, Claus-Peter',
  'type': 'presentation'},
 {'record': '10.5281/zenodo.259554',
  'creator': 'Mühlbauer, Alexander',
  'type': 'presentation'},
 {'record': '10.5281/zenodo.259554',
  'creator': 'Zenk-Möltgen, Wolfgang',
  'type': 'presentation'},
 {'record': '10.5281/zenodo.6630263',
  'crea

In [31]:
import pandas as pd

df = pd.DataFrame(gesis_authors)

print(df)

                    record                 creator          type
0   10.5281/zenodo.5914219           Zhang, Yudong      software
1   10.5281/zenodo.5914219            Baran, Erdal      software
2   10.5281/zenodo.5914219         Zloch, Matthäus      software
3   10.5281/zenodo.5914219    Mühlbauer, Alexander      software
4   10.5281/zenodo.5914219       Klas, Claus-Peter      software
..                     ...                     ...           ...
70  10.5281/zenodo.1118393  Zenk-Möltgen, Wolfgang  presentation
71  10.5281/zenodo.1134531         Winters, Kristi        poster
72  10.5281/zenodo.1134531      Friedrichs, Martin        poster
73  10.5281/zenodo.7149418            Philipp Mayr  presentation
74  10.5281/zenodo.7149418           Tobias Backes  presentation

[75 rows x 3 columns]


In [32]:
(
    df.groupby("creator")
    .size()
    .reset_index(name="Count")
    .sort_values(by="Count", ascending=False)
    .head(10)
)

|  | creator | Count |
|---:|---|---:|
| 19 | Klas, Claus-Peter | 6 |
| 45 | Zenk-Möltgen, Wolfgang | 5 |
| 15 | Hopt, Oliver | 5 |
| 29 | Mühlbauer, Alexander | 4 |
| 0 | Akdeniz, Esra | 3 |
| 14 | Geisler, Helena | 2 |
| 43 | Weller, Katrin | 2 |
| 41 | Veronika Keck | 2 |
| 39 | Steinweg, Nina | 2 |
| 20 | Krämer, Thomas | 2 |
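
These counts treat each distinct name string as one person, but the records mix name formats. Most use `Family, Given`, while some, such as `Veronika Keck` and `Philipp Mayr`, use `Given Family`. The sketch below normalizes names before counting them with `collections.Counter`. It assumes that the last whitespace-separated token of a comma-free name is the family name, which breaks for multi-word family names, so the result is a rough check rather than an authoritative tally. `normalize_name` is hypothetical.

```python
from collections import Counter


def normalize_name(name: str) -> str:
    """Convert 'Given Family' to 'Family, Given'; leave 'Family, Given' unchanged.

    Assumes the last token is the family name when there is no comma.
    """
    name = " ".join(name.split())  # collapse repeated whitespace
    if "," in name or " " not in name:
        return name
    given, family = name.rsplit(" ", 1)
    return f"{family}, {given}"


# A few creator names as they appear in the output above
creators = ["Veronika Keck", "Klas, Claus-Peter", "Klas, Claus-Peter", "Philipp Mayr"]

counts = Counter(normalize_name(c) for c in creators)
print(counts.most_common(2))  # [('Klas, Claus-Peter', 2), ('Keck, Veronika', 1)]
```

With the DataFrame, the same function can be applied with `df["creator"].map(normalize_name)` before grouping.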


In [33]:
import altair as alt

# records per GESIS creator, bars sorted by count and colored by resource type
alt.Chart(df).mark_bar().encode(x="count():Q", y=alt.Y("creator:N", sort="-x"), color="type:N")