# SchemaOrg Reader

This notebook shows how to read metadata from Schema.org, using a URL with embedded Schema.org JSON-LD.

[Download Notebook](https://github.com/front-matter/commonmeta-py/blob/main/docs/readers/schema_org_reader.ipynb)

In [2]:
from commonmeta import Metadata

# Fetch metadata from a URL pointing to a landing page for a scholarly resource
string = 'https://doi.pangaea.de/10.1594/PANGAEA.836178'
metadata = Metadata(string)

# Check that metadata was fetched successfully
print(metadata.state)


## Inspect the metadata


Landing pages can embed Schema.org metadata in their HTML using the [JSON-LD](https://json-ld.org/) format, inside a `<script>` tag whose `type` attribute is set to `application/ld+json`. The embedded metadata are converted into the internal commonmeta format, which includes the following fields:

* `pid`: the persistent identifier of the resource
* `titles`: the title(s) of the resource
* `creators`: the creator(s)/author(s) of the resource
* `publisher`: the publisher of the resource
* `publication_year`: the publication year of the resource
* `types`: the type of the resource, as defined by [Schema.org](https://schema.org/), e.g. `ScholarlyArticle`, `Dataset`, `SoftwareSourceCode`, `ImageObject`, `VideoObject`, `Event`, `CreativeWork`, `Collection`, `DataCatalog`, `Report`, `Thesis`, `Service`, or `Review`. The type is also mapped to the corresponding types in other metadata formats to facilitate metadata conversion.
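
To make the `<script type="application/ld+json">` embedding described at the start of this section concrete, here is a minimal sketch that pulls JSON-LD out of an HTML string using only the Python standard library. It is not how commonmeta implements its reader; the `JsonLdExtractor` class and the sample HTML are hypothetical and exist only for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Hypothetical helper: collect the JSON found in
    <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        # Start buffering when a JSON-LD script tag opens
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Parse the buffered text once the script tag closes
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.documents.append(json.loads("".join(self._buffer)))


# Made-up landing page, not taken from a real repository
html = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "name": "Example dataset", "datePublished": "2014-01-01"}
</script>
</head><body></body></html>
"""

parser = JsonLdExtractor()
parser.feed(html)
parser.close()

record = parser.documents[0]
print(record["@type"], record["name"])  # prints: Dataset Example dataset
```

A real page may contain several JSON-LD blocks or malformed JSON, so production code would need error handling that this sketch leaves out.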

In [None]:
print(metadata.pid)
print(metadata.titles)
print(metadata.creators)
print(metadata.publisher)
print(metadata.publication_year)
print(metadata.types)

https://doi.org/10.1594/pangaea.836178
[{'title': 'Hydrological and meteorological investigations in a lake near Kangerlussuaq, west Greenland'}]
[{'nameType': 'Personal', 'givenName': 'Emma', 'familyName': 'Johansson'}, {'nameType': 'Personal', 'givenName': 'Sten', 'familyName': 'Berglund'}, {'nameType': 'Personal', 'givenName': 'Tobias', 'familyName': 'Lindborg'}, {'nameType': 'Personal', 'givenName': 'Johannes', 'familyName': 'Petrone'}, {'nameType': 'Personal', 'givenName': 'Dirk', 'familyName': 'van As'}, {'nameType': 'Personal', 'givenName': 'Lars-Göran', 'familyName': 'Gustafsson'}, {'nameType': 'Personal', 'givenName': 'Jens-Ove', 'familyName': 'Näslund'}, {'nameType': 'Personal', 'givenName': 'Hjalmar', 'familyName': 'Laudon'}]
PANGAEA
2014
{'resourceTypeGeneral': 'Dataset', 'schemaOrg': 'Dataset', 'citeproc': 'dataset', 'bibtex': 'misc', 'ris': 'DATA'}


## Enhance the metadata with HTML meta tags

The metadata are enhanced with information from the following HTML meta tags: `citation_doi`, `citation_author`, `citation_title`, `citation_publisher`, `citation_publication_date`, `citation_keywords`, `citation_language`, `citation_issn`. These tags are recommended by [Google Scholar](https://scholar.google.com/intl/en/scholar/inclusion.html#indexing) and are widely used by publishers and repositories. The example below reads a page that embeds HTML meta tags but does not use Schema.org:

In [None]:
url = 'https://verfassungsblog.de/einburgerung-und-ausburgerung'
metadata = Metadata(url)

print(metadata.doi)
print(metadata.titles)
print(metadata.creators)
print(metadata.publisher)
print(metadata.publication_year)
print(metadata.types)

10.17176/20221210-001644-0
[{'title': 'Einbürgerung und Ausbürgerung: Warum die Staatsangehörigkeitsrechtsreform nicht ohne Ausbürgerungsrechtsreform funktioniert'}]
[{'nameType': 'Personal', 'givenName': 'Maria Martha', 'familyName': 'Gerdes'}]
Verfassungsblog
2022
{'resourceTypeGeneral': 'Preprint', 'schemaOrg': 'Article', 'citeproc': 'article-newspaper', 'bibtex': 'article', 'ris': 'GEN'}
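
For readers curious what the `citation_*` tags look like in a page, the following sketch collects them from an HTML string with the Python standard library. It is an illustration under stated assumptions, not commonmeta's own code: the `CitationMetaExtractor` class and all sample values (DOI, names, title) are invented.

```python
from html.parser import HTMLParser


class CitationMetaExtractor(HTMLParser):
    """Hypothetical helper: gather <meta name="citation_*"> tags into a dict."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        # Tags such as citation_author can repeat, so store a list per name
        if name.startswith("citation_") and content is not None:
            self.tags.setdefault(name, []).append(content)


# Made-up page head, not taken from a real site
html = """
<html><head>
<meta name="citation_doi" content="10.1234/example">
<meta name="citation_title" content="An example post">
<meta name="citation_author" content="Jane Doe">
<meta name="citation_author" content="John Roe">
<meta name="citation_publication_date" content="2022/12/09">
<meta name="viewport" content="width=device-width">
</head><body></body></html>
"""

extractor = CitationMetaExtractor()
extractor.feed(html)
extractor.close()

print(extractor.tags["citation_doi"])     # prints: ['10.1234/example']
print(extractor.tags["citation_author"])  # prints: ['Jane Doe', 'John Roe']
```

Storing every value as a list keeps repeated tags such as `citation_author` in document order, which matters for author lists.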
