# Converting a Metadata File to VZG JSON

This notebook converts an XML file containing the metadata of an [article](https://dx.doi.org/10.1140/epjc/s10052-019-7027-6) into a VZG JSON file.

The source file is in [JATS](https://jats.nlm.nih.gov/archiving/tag-library/1.1d1/) format.

The extracted data is then mapped to the [VZG JSON schema](https://github.com/gbv/articleformat).

## Simple Example

The source file is used to show, by way of example, how the conversion works in principle.

The result is a record that meets the minimum requirements of the VZG JSON schema.
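To see which JATS elements the cell below reads, the following sketch applies the same kind of path expressions to a small, invented sample record. It is only an illustration: the sample values are made up, `xml.etree.ElementTree` from the standard library stands in for `lxml`, and the `I1_TO_I2` dictionary is a hypothetical stand-in for `iso639.i1toi2`.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predefined; ElementTree exposes xml:lang under this key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample record; only the element names follow the notebook's XPath expressions.
SAMPLE = """<article>
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Example Journal</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="publisher-id">s00000-000-0000-0</article-id>
      <title-group>
        <article-title xml:lang="en">An example title</article-title>
      </title-group>
      <pub-date date-type="epub"><year>2019</year></pub-date>
      <abstract xml:lang="en"><title>Abstract</title><p>First paragraph.</p></abstract>
    </article-meta>
  </front>
</article>"""

# Hypothetical stand-in for the two-letter to three-letter language code lookup.
I1_TO_I2 = {"en": "eng"}


def minimal_record(xml_text):
    """Build a dictionary with the five top-level fields used in this notebook."""
    root = ET.fromstring(xml_text)
    meta = root.find(".//article-meta")
    title = meta.find("title-group/article-title")
    abstract = meta.find("abstract")

    # Abstract text: optional title first, then every paragraph
    parts = [abstract.findtext("title")] + [p.text for p in abstract.findall("p")]

    return {
        "title": "".join(title.itertext()),
        "lang_code": [I1_TO_I2[title.get(XML_LANG)]],
        "abstracts": [{"lang_code": I1_TO_I2[abstract.get(XML_LANG)],
                       "text": "\n\n".join(part for part in parts if part)}],
        "primary_id": {"type": "SPRINGER",
                       "id": meta.findtext("article-id[@pub-id-type='publisher-id']")},
        "journal": {"title": root.findtext(".//journal-title").strip(),
                    "year": meta.findtext("pub-date[@date-type='epub']/year")},
    }


record = minimal_record(SAMPLE)
print(record)
```

The sketch does not validate the result; in the notebook that step is done with `jsonschema.validate` against `JSON_SCHEMA`.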

In [4]:
import json
import logging
from pathlib import Path
from pprint import pprint

import jsonschema
from IPython.display import Markdown
from lxml import etree
from vzg.jconv.gapi import JSON_SCHEMA, NAMESPACES
from vzg.jconv.langcode import ISO_639

# Only show CRITICAL log messages so the cell output stays readable
logging.basicConfig(level=logging.CRITICAL)

xmldir = Path(".").absolute()
xmlpath = xmldir / "article.xml"
iso639 = ISO_639()

def simple_example():
    if not xmlpath.exists():
        print("Datei existiert nicht. {}".format(xmlpath.name))
        return None

    print("{} wird bearbeitet\n".format(xmlpath.name))
    
    # Skeleton of the record; every field is filled in below
    article = {"abstracts": [{"text": ""}],
               "journal": {},
               "lang_code": [],
               "primary_id": {},
               "title": ""}
    
    # Read the XML file
    with open(xmlpath, 'rb') as fh:
        dom = etree.parse(fh)

    # Helper functions
    def xpath(expression):
        return dom.xpath(expression, namespaces=NAMESPACES)
    
    def totext(node):
        return etree.tostring(node, encoding="utf-8", method="text").decode()
    
    # Article title
    titlenode = xpath("//article-meta/title-group/article-title")[0]
    article['title'] = totext(titlenode)

    # Abstract: the xml:lang value is mapped to the three-letter code used in the record
    attributes = xpath("//article-meta/abstract/@xml:lang")
    article["abstracts"][0]['lang_code'] = iso639.i1toi2[attributes[0]]

    sections = xpath("//article-meta/abstract")
    
    atext = []

    for secnode in sections:
        nodes = secnode.xpath("title")
        if len(nodes) > 0:
            atext.append(totext(nodes[0]))

        paras = [totext(para) for para in secnode.xpath("p")]
        atext += paras

    atext = [para for para in atext if isinstance(para, str)]
    article["abstracts"][0]['text'] = "\n\n".join(atext)
    
    # Primary identifier
    pdict = {"type": "SPRINGER", "id": ""}
    pdict['id'] = xpath("""//article-meta/article-id[@pub-id-type="publisher-id"]/text()""")[0]
    article["primary_id"] = pdict
    
    # Journal
    jdict = {"title": "", "year": ""}
    
    jdict["title"] = xpath("//journal-meta/journal-title-group/journal-title/text()")[0].strip()
    
    pubnode = xpath("""//article-meta/pub-date[@date-type="epub"]""")[0]
    jdict["year"] = pubnode.xpath("year/text()")[0]
    
    article["journal"] = jdict
    
    # Language code
    attributes = xpath("//article-meta/title-group/article-title/@xml:lang")
    article['lang_code'] = [iso639.i1toi2[attributes[0]]]
    
    return article

# Start the conversion
article = simple_example()

# Check that the generated data conforms to the VZG article format
jsonschema.validate(instance=article, schema=JSON_SCHEMA)

# Show the generated Python dictionary
pprint(article)

# Save the converted data as a JSON file
with open("article-simple.json", "w") as jfh:
    json.dump(article, jfh)

mdtext = """### {}

Zeitschrift: {} ({})
""".format(article['title'], article["journal"]["title"], article["journal"]["year"])

if len(article['abstracts']) > 0:
    abstract = article['abstracts'][0]['text']
    mdtext += """\n{}""".format(abstract)

# Use the converted data to display a formatted view
Markdown(mdtext)

article.xml wird bearbeitet

{'abstracts': [{'lang_code': 'eng',
                'text': 'Abstract\n'
                        '\n'
                        'This paper presents measurements of '
                        'W±Z\\documentclass[12pt]{minimal}\n'
                        '\t\t\t\t\\usepackage{amsmath}\n'
                        '\t\t\t\t\\usepackage{wasysym}\n'
                        '\t\t\t\t\\usepackage{amsfonts}\n'
                        '\t\t\t\t\\usepackage{amssymb}\n'
                        '\t\t\t\t\\usepackage{amsbsy}\n'
                        '\t\t\t\t\\usepackage{mathrsfs}\n'
                        '\t\t\t\t\\usepackage{upgreek}\n'
                        '\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n'
                        '\t\t\t\t\\begin{document}$$W^{\\pm '
                        '}Z$$\\end{document} production cross sections in pp '
                        'collisions at a centre-of-mass energy of 13\xa0'
                        'TeV\\documentclass[12pt]{

### Measurement of W±Z\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym}
				\usepackage{amsfonts}
				\usepackage{amssymb}
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document}$$W^{\pm }Z$$\end{document} production cross sections and gauge boson polarisation in pp collisions at s=13TeV\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym}
				\usepackage{amsfonts}
				\usepackage{amssymb}
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document}$$\sqrt{s} = 13~\text {TeV}$$\end{document} with the ATLAS detector

Zeitschrift: The European Physical Journal C (2019)

Abstract

This paper presents measurements of W±Z\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym}
				\usepackage{amsfonts}
				\usepackage{amssymb}
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document}$$W^{\pm }Z$$\end{document} production cross sections in pp collisions at a centre-of-mass energy of 13 TeV\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym}
				\usepackage{amsfonts}
				\usepackage{amssymb}
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document}$$\text {TeV}$$\end{document}. The data were collected in 2015 and 2016 by the ATLAS experiment at the Large Hadron Collider, and correspond to an integrated luminosity of 36.1fb-1\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym}
				\usepackage{amsfonts}
				\usepackage{amssymb}
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document}$$36.1~\hbox {fb}^{-1}$$\end{document}. The W±Z\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym}
				\usepackage{amsfonts}
				\usepackage{amssymb}
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document}$$W^{\pm }Z$$\end{document} candidate events are reconstructed using leptonic decay modes of the gauge bosons into electrons and muons. The measured inclusive cross section in the detector fiducial region for a single leptonic decay mode is σW±Z→ℓ′νℓℓfid.=63.7±1.0(stat.)±2.3(syst.)±1.4(lumi.)\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym}
				\usepackage{amsfonts}
				\usepackage{amssymb}
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document}$$\sigma _{W^\pm Z \rightarrow \ell ^{'} \nu \ell \ell }^{\text {fid.}} = 63.7 \, \pm ~1.0~\text {(stat.)} \, \pm ~2.3~\text {(syst.)} \, \pm ~1.4~\text {(lumi.)}$$\end{document} fb, reproduced by the next-to-next-to-leading-order Standard Model prediction of 61.5-1.3+1.4\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym}
				\usepackage{amsfonts}
				\usepackage{amssymb}
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document}$$61.5^{+1.4}_{-1.3}$$\end{document} fb. Cross sections for W+Z\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym}
				\usepackage{amsfonts}
				\usepackage{amssymb}
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document}$$W^+Z$$\end{document} and W-Z\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym}
				\usepackage{amsfonts}
				\usepackage{amssymb}
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document}$$W^-Z$$\end{document} production and their ratio are presented as well as differential cross sections for several kinematic observables. An analysis of angular distributions of leptons from decays of W and Z bosons is performed for the first time in pair-produced events in hadronic collisions, and integrated helicity fractions in the detector fiducial region are measured for the W and Z bosons separately. Of particular interest, the longitudinal helicity fraction of pair-produced vector bosons is also measured.
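The rendered text above still carries a complete LaTeX preamble for every formula, preceded by a plain-text copy of the formula. One way to avoid this is to skip the MathML alternative and unwrap the `tex-math` content while collecting the text. The sketch below assumes that each formula is encoded as `inline-formula/alternatives` with an `mml:math` and a `tex-math` child; the sample paragraph and the helper names are hypothetical, the standard library's `ElementTree` stands in for `lxml`, and this is not necessarily how `vzg.jconv` handles it.

```python
import re
import xml.etree.ElementTree as ET

MML_MATH = "{http://www.w3.org/1998/Math/MathML}math"

# Matches the LaTeX wrapper and captures what sits between \begin and \end{document}
WRAPPER = re.compile(
    r"\\documentclass.*?\\begin\{document\}(.*?)\\end\{document\}", re.DOTALL
)


def strip_wrapper(tex):
    """Reduce a wrapped formula to its body, e.g. '$$W^{\\pm }Z$$'."""
    # A function is used as replacement so backslashes in the body are kept literally
    return WRAPPER.sub(lambda match: match.group(1).strip(), tex)


def text_without_mathml(node):
    """Collect the text of a node, skipping MathML and unwrapping tex-math."""
    if node.tag == MML_MATH:
        return ""  # the tail text after the element is added by the caller
    parts = [node.text or ""]
    for child in node:
        parts.append(text_without_mathml(child))
        parts.append(child.tail or "")
    text = "".join(parts)
    if node.tag == "tex-math":
        text = strip_wrapper(text)
    return text


# Invented paragraph in the assumed structure
SAMPLE = (
    '<p xmlns:mml="http://www.w3.org/1998/Math/MathML">measurements of '
    "<inline-formula><alternatives>"
    "<mml:math><mml:msup><mml:mi>W</mml:mi><mml:mo>±</mml:mo></mml:msup>"
    "<mml:mi>Z</mml:mi></mml:math>"
    "<tex-math>\\documentclass[12pt]{minimal}\n"
    "\\usepackage{amsmath}\n"
    "\\begin{document}$$W^{\\pm }Z$$\\end{document}</tex-math>"
    "</alternatives></inline-formula> production</p>"
)

cleaned = text_without_mathml(ET.fromstring(SAMPLE))
print(cleaned)  # measurements of $$W^{\pm }Z$$ production
```

Working on the tree rather than on the extracted string matters here: once the text has been flattened, the plain-text copy of the formula can no longer be told apart from the surrounding sentence.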

## Full Conversion of a Springer Article

This example uses the converter from the vzg.jconv module.

In [5]:
from vzg.jconv.converter.jats import JatsArticle, JatsConverter

def springer_article():
    if not xmlpath.exists():
        print("Datei existiert nicht. {}".format(xmlpath.name))
        return None

    print("{} wird bearbeitet\n".format(xmlpath.name))
    
    jconv = JatsConverter(xmlpath, validate=True)
    jconv.run()
    
    anum = len(jconv.articles)
    msg = f"Die Datei beinhaltet {anum} Artikel.\n"
    print(msg)
    
    # Return the first converted article, or None if the file contained none
    if anum > 0:
        return jconv.articles[0]
    
    return None

# Start the conversion with validation enabled
article = springer_article()

mdtext = ""

if isinstance(article, JatsArticle):
    
    # Show the generated Python dictionary
    pprint(article.jdict)
    
    # Save the converted data as a JSON file
    with open("article-complete.json", "w") as jfh:
        json.dump(article.jdict, jfh)
        
    mdtext = """### {}

Zeitschrift: {} ({})
""".format(article.jdict['title'].replace("$$", "$"), 
           article.jdict["journal"]["title"], 
           article.jdict["journal"]["year"])
    
    if len(article.jdict['abstracts']) > 0:
        abstract = article.jdict['abstracts'][0]['text']
        mdtext += """\n{}""".format(abstract.replace("$$", "$"))

# Use the converted data to display a formatted view
Markdown(mdtext)

article.xml wird bearbeitet

Die Datei beinhaltet 1 Artikel. 

{'abstracts': [{'lang_code': 'eng',
                'text': 'Abstract\n'
                        '\n'
                        'This paper presents measurements of $$W^{\\pm }Z$$ '
                        'production cross sections in pp collisions at a '
                        'centre-of-mass energy of 13\xa0$$\\text {TeV}$$. The '
                        'data were collected in 2015 and 2016 by the ATLAS '
                        'experiment at the Large Hadron Collider, and '
                        'correspond to an integrated luminosity of '
                        '$$36.1~\\hbox {fb}^{-1}$$. The $$W^{\\pm }Z$$ '
                        'candidate events are reconstructed using leptonic '
                        'decay modes of the gauge bosons into electrons and '
                        'muons. The measured inclusive cross section in the '
                        'detector fiducial region for a single leptonic decay 

### Measurement of $W^{\pm }Z$ production cross sections and gauge boson polarisation in pp collisions at $\sqrt{s} = 13~\text {TeV}$ with the ATLAS detector

Zeitschrift: The European Physical Journal C (2019)

Abstract

This paper presents measurements of $W^{\pm }Z$ production cross sections in pp collisions at a centre-of-mass energy of 13 $\text {TeV}$. The data were collected in 2015 and 2016 by the ATLAS experiment at the Large Hadron Collider, and correspond to an integrated luminosity of $36.1~\hbox {fb}^{-1}$. The $W^{\pm }Z$ candidate events are reconstructed using leptonic decay modes of the gauge bosons into electrons and muons. The measured inclusive cross section in the detector fiducial region for a single leptonic decay mode is $\sigma _{W^\pm Z \rightarrow \ell ^{'} \nu \ell \ell }^{\text {fid.}} = 63.7 \, \pm ~1.0~\text {(stat.)} \, \pm ~2.3~\text {(syst.)} \, \pm ~1.4~\text {(lumi.)}$ fb, reproduced by the next-to-next-to-leading-order Standard Model prediction of $61.5^{+1.4}_{-1.3}$ fb. Cross sections for $W^+Z$ and $W^-Z$ production and their ratio are presented as well as differential cross sections for several kinematic observables. An analysis of angular distributions of leptons from decays of W and Z bosons is performed for the first time in pair-produced events in hadronic collisions, and integrated helicity fractions in the detector fiducial region are measured for the W and Z bosons separately. Of particular interest, the longitudinal helicity fraction of pair-produced vector bosons is also measured.
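Both cells write their result with `json.dump` using its defaults, which escape every non-ASCII character such as `±`. The short sketch below, with a made-up record, shows the difference that `ensure_ascii=False` makes; both variants decode to the same data, so the choice only affects how readable the file is.

```python
import json

# Made-up record containing a non-ASCII character
record = {"title": "Measurement of W±Z production"}

escaped = json.dumps(record)                       # default: ASCII-only output
readable = json.dumps(record, ensure_ascii=False)  # keeps the character as-is

print(escaped)
print(readable)

# Both strings decode to the same dictionary
assert json.loads(escaped) == json.loads(readable) == record
```

When writing the readable variant to disk, the file should be opened with an explicit `encoding="utf-8"`.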