In [1]:
# for use in tutorial and development; do not include this `sys.path` change in production:
import sys ; sys.path.insert(0, "../")

# Building a graph in RDF using `rdflib`

First we'll build a Graph object in [`rdflib`](https://rdflib.readthedocs.io/) to which we can add nodes and relations:

In [2]:
import rdflib

g = rdflib.Graph()

In RDF, a graph is constructed from [*triples*](https://www.w3.org/TR/n-triples/), each of which represents an RDF *statement* that has three components:

  * *subject*: the entity being annotated
  * *predicate*: a relation between the subject and the object
  * *object*: another entity or a literal value

We'll represent the **anytime crepes** recipe by making programmatic calls to `rdflib`, starting with a URL constructed from the recipe `id` as an initial node.
We'll show this as our first subject `s` to be annotated using RDF statements.

In [3]:
uri = "https://www.food.com/recipe/327593"
s = rdflib.URIRef(uri)
s

rdflib.term.URIRef('https://www.food.com/recipe/327593')

When working with knowledge graphs (KGs), it's important to use [*persistent identifiers*](https://www.openaire.eu/what-is-a-persistent-identifier), which are both *unique* and *persistent*: in other words, the opposite of [*link rot*](https://youtu.be/EEtMFq7lAKQ).

We could have used other ways to identify that node, such as a unique name.
Even so, if we think of this recipe as a resource online, then its URL is both *unique* and *persistent* as long as the "food.com" website is available. 

Next we'll use [`rdf:type`](https://www.w3.org/TR/rdf-schema/#ch_type) as the predicate `p` to describe the subject as an instance of `wtm:Recipe`:

In [4]:
from rdflib.namespace import RDF

p = RDF.type
p

rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type')

The predicate above came from a vocabulary that is predefined in `rdflib`, but now we'll need to reference other vocabularies.
To bind and access the namespaces for those vocabularies, we'll use the [`NamespaceManager`](https://rdflib.readthedocs.io/en/stable/namespaces_and_bindings.html) of the graph, assigned here to the `nm` variable:

In [5]:
nm = g.namespace_manager

By convention we use a *prefix* as a convenient way to abbreviate each namespace.
For example, in the `rdf:type` predicate above the `rdf:` prefix is an abbreviation for the full `http://www.w3.org/1999/02/22-rdf-syntax-ns#` URL of the RDF namespace.
See the <http://prefix.cc/> online resource to look up the common usages for prefixes.

Next we'll define the `wtm` prefix for the "What to Make Base Ontology" at <http://purl.org/heals/food/>:

In [6]:
uri = "http://purl.org/heals/food/"
ns_wtm = rdflib.Namespace(uri)

prefix = "wtm"
nm.bind(prefix, ns_wtm)

Now we can use this `wtm:` namespace to reference the object `o` as the `wtm:Recipe` entity:

In [7]:
o = ns_wtm.Recipe
o

rdflib.term.URIRef('http://purl.org/heals/food/Recipe')

Note how that object resolves to the URL <http://purl.org/heals/food/Recipe> – which is a link to the vocabulary's RDF description.

Finally, we'll add the tuple `(s, p, o,)` to the graph:

In [8]:
g.add((s, p, o,))
g

<Graph identifier=Na79ce66b747d43168f34368039d72c2a (<class 'rdflib.graph.Graph'>)>

Now let's add the remaining metadata for the **anytime crepes** recipe.
The required cooking time of "8 minutes" can be represented as a predicate `wtm:hasCookTime` and the literal `8` which we'll define as an [`xsd:integer`](https://rdflib.readthedocs.io/en/stable/rdf_terms.html) value:

In [9]:
p = ns_wtm.hasCookTime
p

rdflib.term.URIRef('http://purl.org/heals/food/hasCookTime')

In [10]:
from rdflib.namespace import XSD

o = rdflib.Literal("8", datatype=XSD.integer)
o

rdflib.term.Literal('8', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer'))

In [11]:
g.add((s, p, o,))

<Graph identifier=Na79ce66b747d43168f34368039d72c2a (<class 'rdflib.graph.Graph'>)>

Now let's add the three ingredients `["eggs", "milk", "whole wheat flour"]` based on the vocabulary <http://purl.org/heals/ingredient/> of food ingredients:

In [12]:
p = ns_wtm.hasIngredient
p

rdflib.term.URIRef('http://purl.org/heals/food/hasIngredient')

The WhatToMake Individuals Ontology contains the individual recipes and food items (`ind`) used in those recipes that are available in the system. These individuals are defined using classes and properties that are defined in the WhatToMake Base Ontology.

In [13]:
uri = "http://purl.org/heals/ingredient/"
ns_ind = rdflib.Namespace(uri)

prefix = "ind"
nm.bind(prefix, ns_ind)

In [14]:
o = ns_ind.ChickenEgg
o

rdflib.term.URIRef('http://purl.org/heals/ingredient/ChickenEgg')

In [15]:
g.add((s, p, o,))

<Graph identifier=Na79ce66b747d43168f34368039d72c2a (<class 'rdflib.graph.Graph'>)>

In [16]:
g.add((s, p, ns_ind.CowMilk,))
g.add((s, p, ns_ind.WholeWheatFlour,))

<Graph identifier=Na79ce66b747d43168f34368039d72c2a (<class 'rdflib.graph.Graph'>)>

To confirm what we've built so far, we can iterate through each of the `(s, p, o,)` statements in the graph:

In [17]:
for s, p, o in g:
    print(s, p, o)

https://www.food.com/recipe/327593 http://purl.org/heals/food/hasIngredient http://purl.org/heals/ingredient/CowMilk
https://www.food.com/recipe/327593 http://purl.org/heals/food/hasCookTime 8
https://www.food.com/recipe/327593 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://purl.org/heals/food/Recipe
https://www.food.com/recipe/327593 http://purl.org/heals/food/hasIngredient http://purl.org/heals/ingredient/WholeWheatFlour
https://www.food.com/recipe/327593 http://purl.org/heals/food/hasIngredient http://purl.org/heals/ingredient/ChickenEgg


## Serialization as "Turtle" statements

First let's show how to serialize the graph as `ttl` or [*turtle*](https://www.w3.org/TR/turtle/) format.
With `rdflib` 6.0 or later, `serialize()` returns a Python string when no `encoding` argument is given; earlier versions returned a byte array, which had to be converted into a string using a Unicode [*codec*](https://docs.python.org/3/library/codecs.html):

In [18]:
s = g.serialize(format="ttl")
print(s)

@prefix ind: <http://purl.org/heals/ingredient/> .
@prefix wtm: <http://purl.org/heals/food/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://www.food.com/recipe/327593> a wtm:Recipe ;
    wtm:hasCookTime 8 ;
    wtm:hasIngredient ind:ChickenEgg,
        ind:CowMilk,
        ind:WholeWheatFlour .




Similarly, we can serialize the graph as RDF statements to a file `tmp.ttl` in the local directory:

In [19]:
g.serialize(destination="tmp.ttl", format="ttl", encoding="utf-8") ;

Try taking a look at the `tmp.ttl` file.
Is it the same as the serialization shown above?

## Serialization as JSON-LD

Next, let's serialize the graph in [JSON-LD](https://json-ld.org/) format, stored in the `tmp.jsonld` file in the local directory:

In [20]:
data = g.serialize(
    format="json-ld",
    indent=2,
    encoding="utf-8",
    )

with open("tmp.jsonld", "wb") as f:
    f.write(data)

Try taking a look at the `tmp.jsonld` file.
Each entity, relation, and literal value has a full URL known as an *IRI* (Internationalized Resource Identifier) which [identifies a resource](https://www.w3.org/TR/json-ld11/#iris) used to define it.

We can make these JSON-LD files a bit more succinct by adding a `context` that defines prefixes for each of the vocabularies used:

In [21]:
context = {
    "@language": "en",
    "wtm": "http://purl.org/heals/food/",
    "ind": "http://purl.org/heals/ingredient/",
    }

In [22]:
context

{'@language': 'en',
 'wtm': 'http://purl.org/heals/food/',
 'ind': 'http://purl.org/heals/ingredient/'}

Now we'll serialize again as JSON-LD, this time using the context:

In [23]:
data = g.serialize(
    format="json-ld",
    context=context,
    indent=2,
    encoding="utf-8",
    )

with open("tmp.jsonld", "wb") as f:
    f.write(data)

Open the `tmp.ttl` and `tmp.jsonld` files and compare them.
Notice how the `ttl` file is easier for people to read, while the `json-ld` file has all of the metadata explicitly linked and is easier for machines to read, even simply as a JSON file without using any semantic technologies.

---

## Exercises

**Exercise 1:**

By using `ns_ind.AllPurposeFlour` to represent `"flour"` as another possible ingredient, how would you extend the graph to represent the *German Egg Pancakes* <https://www.food.com/recipe/406738> recipe?

**Exercise 2:**

The `wtm:hasCookTime` predicate uses an `xsd:integer` literal to represent cooking time in minutes.
That may be confusing to someone who is not familiar with this dataset.
Instead, represent the cooking time using an [`xsd:duration`](http://www.datypic.com/sc/xsd/t-xsd_duration.html) literal.