<a href="https://colab.research.google.com/github/schemaorg/schemaorg/blob/main/scripts/Schema_org_Dashboard.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook is part of the Schema.org project codebase at https://github.com/schemaorg/schemaorg and is licensed under the same terms.


The purpose of this notebook is to show how to work programmatically with schema.org's definitions. 

See also https://colab.research.google.com/drive/1GVQaP5t8G-NRLAmEvVSp8k5MnsrfttDP for another approach to this.

# SPARQL

How to query schema.org schemas using SPARQL

In [None]:
# run this once per session to bring in a required library

!pip --quiet install sparqlwrapper | grep -v 'already satisfied'

from SPARQLWrapper import SPARQLWrapper, JSON
import pandas as pd
import io
import requests

In [None]:


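# Properties whose domain includes both Organization (or a subtype) and Person (or a subtype).
# The rdfs: prefix is declared explicitly so the query does not depend on endpoint defaults.
q1 = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?prop ?type1 ?type2 WHERE {
  ?type1 rdfs:subClassOf* <https://schema.org/Organization> .
  ?type2 rdfs:subClassOf* <https://schema.org/Person> .
  ?prop <https://schema.org/domainIncludes> ?type1 .
  ?prop <https://schema.org/domainIncludes> ?type2 .
}"""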

pd.set_option('display.max_colwidth', None)

# data
wd_endpoint = 'https://query.wikidata.org/sparql'
sdo_endpoint = "https://dydra.com/danbri/schema-org-v11/sparql"

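# utility function: run a SPARQL query and return the result bindings as a DataFrame
def df_from_query(querystring=q1, endpoint=sdo_endpoint):
  """Send querystring to endpoint and flatten the JSON bindings into a DataFrame."""
  sparql = SPARQLWrapper(endpoint)
  sparql.setQuery(querystring)
  sparql.setReturnFormat(JSON)
  results = sparql.query().convert()
  # each binding becomes one row, with columns such as 'prop.type' and 'prop.value'
  return pd.json_normalize(results['results']['bindings'])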
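The query above hard-codes `Organization` and `Person`. As a minimal sketch, the same pattern can be generated for any pair of types. The helper name `shared_properties_query` and the `RDFS`/`SDO` constants are illustrative assumptions, not part of the schema.org codebase, and the `PREFIX` line is added so the query does not rely on endpoint defaults.

```python
# Hypothetical helper (not from the schema.org repo): builds the same
# "shared domain" query as q1 for an arbitrary pair of schema.org types.
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SDO = "https://schema.org/"


def shared_properties_query(type_a: str, type_b: str) -> str:
    """Return a SPARQL query for properties whose domain includes
    both type_a and type_b (or any of their subtypes)."""
    return f"""PREFIX rdfs: <{RDFS}>
SELECT DISTINCT ?prop ?type1 ?type2 WHERE {{
  ?type1 rdfs:subClassOf* <{SDO}{type_a}> .
  ?type2 rdfs:subClassOf* <{SDO}{type_b}> .
  ?prop <{SDO}domainIncludes> ?type1 .
  ?prop <{SDO}domainIncludes> ?type2 .
}}"""


query = shared_properties_query("Place", "Event")
print(query)
```

The resulting string could then be passed as `querystring` to `df_from_query`, assuming the endpoint is reachable.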

In [None]:

x = df_from_query(q1)
x

,prop.type,prop.value,type1.type,type1.value,type2.type,type2.value
0,uri,https://schema.org/email,uri,https://schema.org/Organization,uri,https://schema.org/Person
1,uri,https://schema.org/faxNumber,uri,https://schema.org/Organization,uri,https://schema.org/Person
2,uri,https://schema.org/award,uri,https://schema.org/Organization,uri,https://schema.org/Person
3,uri,https://schema.org/telephone,uri,https://schema.org/Organization,uri,https://schema.org/Person
4,uri,https://schema.org/memberOf,uri,https://schema.org/Organization,uri,https://schema.org/Person
5,uri,https://schema.org/sponsor,uri,https://schema.org/Organization,uri,https://schema.org/Person
6,uri,https://schema.org/knowsAbout,uri,https://schema.org/Organization,uri,https://schema.org/Person
7,uri,https://schema.org/gender,uri,https://schema.org/SportsTeam,uri,https://schema.org/Person
8,uri,https://schema.org/vatID,uri,https://schema.org/Organization,uri,https://schema.org/Person
9,uri,https://schema.org/brand,uri,https://schema.org/Organization,uri,https://schema.org/Person
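The table above carries a `.type` and a `.value` column for every variable, and every value repeats the `https://schema.org/` prefix. The sketch below shows one way to reduce such bindings to short term names using only the standard library. The `sample` dictionary is hand-written in the standard SPARQL JSON results shape, with two rows copied from the output above, and `compact_rows` is a hypothetical helper name.

```python
# Hand-written sample in the SPARQL JSON results shape; the two rows
# mirror entries from the table above. Not fetched from the endpoint.
sample = {
    "head": {"vars": ["prop", "type1", "type2"]},
    "results": {"bindings": [
        {"prop": {"type": "uri", "value": "https://schema.org/email"},
         "type1": {"type": "uri", "value": "https://schema.org/Organization"},
         "type2": {"type": "uri", "value": "https://schema.org/Person"}},
        {"prop": {"type": "uri", "value": "https://schema.org/gender"},
         "type1": {"type": "uri", "value": "https://schema.org/SportsTeam"},
         "type2": {"type": "uri", "value": "https://schema.org/Person"}},
    ]},
}


def compact_rows(results, prefix="https://schema.org/"):
    """Keep only each binding's value and strip the given URI prefix."""
    rows = []
    for binding in results["results"]["bindings"]:
        row = {}
        for var, cell in binding.items():
            value = cell["value"]
            # drop the namespace only when the value actually starts with it
            row[var] = value[len(prefix):] if value.startswith(prefix) else value
        rows.append(row)
    return rows


rows = compact_rows(sample)
for row in rows:
    print(row)
```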


# Examples

How to access schema.org examples

In [13]:
# First we clone the entire schema.org repo, then we collect up the examples from .txt files:

!git clone https://github.com/schemaorg/schemaorg
!ls

Cloning into 'schemaorg'...
remote: Enumerating objects: 148, done.
remote: Counting objects: 100% (148/148), done.
remote: Compressing objects: 100% (126/126), done.
remote: Total 23493 (delta 92), reused 50 (delta 21), pack-reused 23345
Receiving objects: 100% (23493/23493), 96.33 MiB | 26.83 MiB/s, done.
Resolving deltas: 100% (16713/16713), done.
Checking out files: 100% (1788/1788), done.
CONTRIBUTING.md  LICENSE		  SchemaExamples  SOFTWARE_README.md
data		 MASTER_BRANCH_RENAME.md  schemaorg	  templates
devserv.py	 README.md		  SchemaTerms	  tests
docs		 RELEASING.md		  script	  util
gcloud		 requirements.txt	  scripts	  versions.json


In [None]:
!find . -name \*example\*.txt -exec ls {} \;

./schemaorg/SchemaExamples/example-code/examples.txt
./schemaorg/data/sdo-bus-stop-examples.txt
./schemaorg/data/sdo-trip-examples.txt
./schemaorg/data/sdo-police-station-examples.txt
./schemaorg/data/sdo-airport-examples.txt
./schemaorg/data/sdo-train-station-examples.txt
./schemaorg/data/sdo-videogame-examples.txt
./schemaorg/data/sdo-book-series-examples.txt
./schemaorg/data/sdo-automobile-examples.txt
./schemaorg/data/sdo-invoice-examples.txt
./schemaorg/data/sdo-creativework-examples.txt
./schemaorg/data/sdo-itemlist-examples.txt
./schemaorg/data/sdo-offeredby-examples.txt
./schemaorg/data/sdo-digital-document-examples.txt
./schemaorg/data/examples.txt
./schemaorg/data/sdo-hotels-examples.txt
./schemaorg/data/ext/pending/issue-2490-examples.txt
./schemaorg/data/ext/pending/issue-1670-examples.txt
./schemaorg/data/ext/pending/issue-2192-examples.txt
./schemaorg/data/ext/pending/issue-894-examples.txt
./schemaorg/data/ext/pending/issue-2396-examples.txt
./schemaorg/data/ext/pending/
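The same search can be done from Python rather than the shell, which makes the file list available to later cells. This is a minimal sketch using `pathlib`; the function name `find_example_files` is an illustrative assumption.

```python
from pathlib import Path


def find_example_files(root="."):
    """Return the sorted paths of all files under root matching *example*.txt."""
    return sorted(p for p in Path(root).rglob("*example*.txt") if p.is_file())


# List matches below the current directory, like the find command above.
for path in find_example_files("."):
    print(path)
```

Unlike the shell command, the result is sorted, so the listing order is stable between runs.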

In [16]:
import sys
import os

for path in [os.getcwd(), "./SchemaExamples", "/content/schemaorg/SchemaExamples"]:
  sys.path.insert(1, path)  # pick up libs from the shipped lib directory


from schemaexamples import SchemaExamples, Example


SchemaExamples.loadExamplesFiles("default")

# print the key of every loaded example
for ex in SchemaExamples.allExamples(sort=True):
  print(ex.getKey())


/content/schemaorg
SchemaExamples.loadExamplesFiles() loading from default files found in globs: ['data/*examples.txt', 'data/ext/*/*examples.txt']
eg-0001
eg-0015
eg-0003
eg-0004
eg-0005
eg-0006
eg-0007
eg-0008
eg-0461
eg-0010
eg-0011
eg-0462
eg-0013
eg-0009
eg-0016
eg-0017
eg-0018
eg-0019
eg-0020
eg-0021
eg-0022
eg-0023
eg-0024
eg-0025
eg-0026
eg-0027
eg-0028
eg-0029
eg-0030
eg-0031
eg-0032
eg-0033
eg-0034
eg-0035
eg-0036
eg-0037
eg-0038
eg-0039
eg-0040
eg-0041
eg-0042
eg-0043
eg-0044
eg-0045
eg-0046
eg-0047
eg-0048
eg-0049
eg-0050
eg-0051
eg-0052
eg-0053
eg-0054
eg-0055
eg-0056
eg-0057
eg-0058
eg-0059
eg-0060
eg-0061
eg-0062
eg-0063
eg-0064
eg-0065
eg-0066
eg-0067
eg-0068
eg-0069
eg-0070
eg-0071
eg-0072
eg-0073
eg-0074
eg-0075
eg-0076
eg-0077
eg-0078
eg-0079
eg-0080
eg-0081
eg-0082
eg-0083
eg-0084
eg-0085
eg-0086
eg-0087
eg-0088
eg-0089
eg-0090
eg-0091
eg-0092
eg-0093
eg-0094
eg-0095
eg-0096
eg-0097
eg-0098
eg-0099
eg-0100
eg-0101
eg-0102
eg-0103
eg-0104
eg-0105
eg-0106
eg-0107
eg-0

TODOs:
 * Can we load all the examples into a multi-graph SPARQL store (in rdflib, not a remote endpoint), putting them into 'core' and 'pending' named graphs or similar?
 * Then load the triples from the latest webschemas release, https://webschemas.org/version/latest/schemaorg-current-https.jsonld, into a named graph.
 * Find triples in the 'core' examples that are not in the vocabulary (then do the same for 'pending').
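As a rough starting point for the last TODO, the sketch below checks term names rather than triples, and uses plain Python instead of rdflib named graphs. It walks a JSON-LD-like structure, collects the property names and `@type` values it uses, and reports any that are missing from a vocabulary set. The example document, the tiny `vocabulary` set, and the helper name `terms_used` are all made up for illustration. A real check would build the vocabulary from the schema.org definitions.

```python
import json


def terms_used(node, found=None):
    """Recursively collect property names and @type values from JSON-LD-like data."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@context":
                continue  # context declarations are not vocabulary usage
            if key == "@type":
                found.update(value if isinstance(value, list) else [value])
            elif not key.startswith("@"):
                found.add(key)
            terms_used(value, found)
    elif isinstance(node, list):
        for item in node:
            terms_used(item, found)
    return found


# Made-up example document; "madeUpProperty" is deliberately not a real term.
example = json.loads("""{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Doe",
  "memberOf": {"@type": "Organization", "name": "Example Org", "madeUpProperty": "x"}
}""")

# Tiny stand-in for the full set of schema.org term names (assumption).
vocabulary = {"Person", "Organization", "name", "memberOf"}

unknown = terms_used(example) - vocabulary
print(sorted(unknown))  # ['madeUpProperty']
```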
