# LENABI: Prototype of new metadata API for serlo.org

## How metadata is stored

We describe our learning resources with the specification [Allgemeines Metadatenprofil für Bildungsressourcen](https://github.com/dini-ag-kim/amb). It uses [JSON-LD](https://json-ld.org/) as its format and [schema.org](https://schema.org/) as its vocabulary. For example, the metadata for the article [Addition](https://de.serlo.org/mathe/1495/addition) is:

```json
{
  "@context": [
    "https://w3id.org/kim/lrmi-profile/draft/context.jsonld",
    {
      "@language": "de"
    }
  ],
  "id": "https://serlo.org/1495",
  "identifier": {
    "type": "PropertyValue",
    "propertyID": "UUID",
    "value": 1495
  },
  "type": [
    "LearningResource",
    "Article"
  ],
  "learningResourceType": "Article",
  "name": "Addition",
  "description": "Addition, auch Plusrechnen genannt, gehört zu den Grundrechenarten der Mathematik. Lerne, was ein Summand ist. ⇒ Hier lernst du, dass das Assoziativgesetz und Kommutativgesetz gelten.  ⇒ Veranschaulichung durch Merktabellen und Zahlengeraden. Für den Anfang kannst du auch schriftlich addieren! Viele Übungsaufgaben sind verfügbar.✓ Lernen mit Serlo!",
  "dateCreated": "2014-03-01T20:36:44+00:00",
  "dateModified": "2021-03-08T20:51:17+00:00",
  "license": {
    "id": "https://creativecommons.org/licenses/by-sa/4.0/deed.de"
  },
  "version": "https://serlo.org/197588"
}
```

Each property is a [schema.org](http://schema.org) property. For example, `identifier` is the same as the property https://schema.org/identifier. Each property therefore has a clear definition, which should help when metadata of many learning resources are used together.
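Because the metadata is plain JSON, it can be read with any JSON library. The following is a minimal sketch, not part of the API: it parses a shortened copy of the "Addition" record shown above and picks out a few fields. The helper name `summarize` and the choice of fields are assumptions made for illustration.

```python
import json

# Shortened copy of the "Addition" record shown above.
record_json = """
{
  "id": "https://serlo.org/1495",
  "identifier": {"type": "PropertyValue", "propertyID": "UUID", "value": 1495},
  "type": ["LearningResource", "Article"],
  "name": "Addition",
  "license": {"id": "https://creativecommons.org/licenses/by-sa/4.0/deed.de"}
}
"""

record = json.loads(record_json)


def summarize(record):
    """Return a few commonly needed fields of one metadata record (hypothetical helper)."""
    return {
        "url": record["id"],
        "uuid": record["identifier"]["value"],
        "title": record["name"],
        "types": record["type"],
        "license": record["license"]["id"],
    }


summary = summarize(record)
print(summary["title"], summary["url"])
```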

**Further reading:**

* See https://dini-ag-kim.github.io/amb/draft/schemas/schema.json for a detailed description of the specification we use.
* The section [Basic concepts](https://www.w3.org/TR/json-ld/#basic-concepts) of the JSON-LD specification gives a good explanation of how JSON-LD extends JSON so that properties get a clear definition.

## Accessing the metadata

The metadata can be accessed via our GraphQL endpoint at https://api.serlo-staging.dev/graphql (serlo-staging.dev is our testing environment). The query namespace `metadata` has two properties, `publisher` and `entities`. `publisher` returns the metadata about Serlo Education e.V. (i.e. the publisher), and `entities` returns the metadata about our learning resources. The latter property supports pagination via the arguments `first` and `after`, filtering by language via `instance`, and filtering by modification date via `modifiedAfter`:

```graphql
extend type Query {
  metadata: MetadataQueryNamespace!
}

type MetadataQueryNamespace {
  # Returns metadata about Serlo Education e.V.
  publisher: JSONObject!
  
  # Returns metadata about learning resources at serlo.org
  entities(
    # Number of metadata objects which shall be returned (default is 100)
    first: Int
    
    # Cursor to the metadata object after which the metadata shall be returned
    after: String
    
    # Filter for the subdomain / language
    instance: Instance
    
    # Filter to return only those learning resources which have been modified
    # after this date (format YYYY-MM-DDTHH:MM:SS+00:00)
    modifiedAfter: String
  ): EntitiesMetadataConnection!
}

type EntitiesMetadataConnection {
  # Array of metadata for learning resources
  nodes: [JSONObject!]!
  
  # Information whether there are more resources to query
  pageInfo: HasNextPageInfo!
}

type HasNextPageInfo {
  # If true then more learning resources can be queried
  hasNextPage: Boolean!
  
  # Cursor which needs to be passed to `after` in order to fetch more learning resources
  endCursor: String
}
```
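The schema comment above gives the expected format of `modifiedAfter` as `YYYY-MM-DDTHH:MM:SS+00:00`. As a minimal sketch, such a string can be produced from a Python `datetime`. The helper name `to_modified_after` is hypothetical, and treating naive datetimes as UTC is an assumption of this sketch, not something the API prescribes.

```python
from datetime import datetime, timezone


def to_modified_after(moment):
    """Format a datetime as YYYY-MM-DDTHH:MM:SS+00:00 (hypothetical helper)."""
    if moment.tzinfo is None:
        # Assumption: naive datetimes are meant as UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    # Convert to UTC and drop fractions of a second.
    return moment.astimezone(timezone.utc).isoformat(timespec="seconds")


# GraphQL variables matching the arguments of `entities`
variables = {
    "first": 100,
    "after": None,
    "instance": "de",
    "modifiedAfter": to_modified_after(datetime(2021, 1, 1)),
}
print(variables["modifiedAfter"])
```

A dictionary like `variables` can be sent in the `variables` field of the request body alongside the query string.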

**Further reading:**

* Use [https://api.serlo-staging.dev/___graphql](https://api.serlo-staging.dev/___graphql) to test your GraphQL queries. There you can also find in-depth documentation by clicking the `DOCS` button on the right side.
* See https://graphql.org/learn/serving-over-http/#post-request for a description of how to make GraphQL requests.
* https://graphql.org/learn/ gives a good introduction to GraphQL.

## Examples

### Helper functions for displaying the results

```python
import json

from IPython.display import display, Markdown


def display_json(value, title="The result"):
    """Render a value as an indented JSON code block under a heading."""
    json_formatted = json.dumps(value, indent=2)

    display_markdown(f"### {title}")
    display_markdown(f"```json\n{json_formatted}\n```")


def display_len(list_object, explanation="elements were fetched"):
    """Report how many elements a list contains."""
    display_markdown(f"**Result:** {len(list_object)} {explanation}")


def display_markdown(text):
    """Render a Markdown string in the notebook output."""
    display(Markdown(text))
```

### Fetching metadata about Serlo Education e.V. (i.e. the publisher)

```python
import requests

# Send the GraphQL query as a JSON POST request to the staging endpoint
response = requests.post(
    "https://api.serlo-staging.dev/graphql",
    headers={
        "Content-Type": "application/json",
    },
    json={
        "query": """
            query {
                metadata {
                    publisher
                }
            }
        """
    },
)

display_json(response.json())
```

### The result

```json
{
  "data": {
    "metadata": {
      "publisher": {
        "@context": [
          "https://w3id.org/kim/lrmi-profile/draft/context.jsonld",
          {
            "@language": "de"
          }
        ],
        "id": "https://serlo.org/",
        "type": [
          "EducationalOrganization",
          "NGO"
        ],
        "name": "Serlo Education e.V.",
        "url": "https://de.serlo.org/",
        "description": "Serlo.org bietet einfache Erkl\u00e4rungen, Kurse, Lernvideos, \u00dcbungen und Musterl\u00f6sungen mit denen Sch\u00fcler*innen und Studierende nach ihrem eigenen Bedarf und in ihrem eigenen Tempo lernen k\u00f6nnen. Die Lernplattform ist komplett kostenlos und werbefrei.",
        "image": "https://assets.serlo.org/5ce4082185f5d_5df93b32a2e2cb8a0363e2e2ab3ce4f79d444d11.jpg",
        "logo": "https://de.serlo.org/_assets/img/serlo-logo.svg",
        "address": {
          "type": "PostalAddress",
          "streetAddress": "Daiserstra\u00dfe 15 (RGB)",
          "postalCode": "81371",
          "addressLocality": "M\u00fcnchen",
          "addressRegion": "Bayern",
          "addressCountry": "Germany"
        },
        "email": "de@serlo.org"
      }
    }
  }
}
```

### Fetching the first page of metadata for entities 

```python
import requests

# `first: 2` limits the response to the first two metadata objects
response = requests.post(
    "https://api.serlo-staging.dev/graphql",
    headers={
        "Content-Type": "application/json",
    },
    json={
        "query": """
            query {
                metadata {
                    entities(first: 2) {
                        nodes
                    }
                }
            }
        """
    },
)

display_json(response.json())
```

### The result

```json
{
  "data": {
    "metadata": {
      "entities": {
        "nodes": [
          {
            "@context": [
              "https://w3id.org/kim/lrmi-profile/draft/context.jsonld",
              {
                "@language": "de"
              }
            ],
            "id": "https://serlo.org/1495",
            "identifier": {
              "type": "PropertyValue",
              "propertyID": "UUID",
              "value": 1495
            },
            "type": [
              "LearningResource",
              "Article"
            ],
            "learningResourceType": "Article",
            "name": "Addition",
            "description": "Addition, auch Plusrechnen genannt, geh\u00f6rt zu den Grundrechenarten der Mathematik. Lerne, was ein Summand ist. \u21d2 Hier lernst du, dass das Assoziativgesetz und Kommutativgesetz gelten.  \u21d2 Veranschaulichung durch Merktabellen und Zahlengeraden. F\u00fcr den Anfang kannst du auch schriftlich addieren! Viele \u00dcbungsaufgaben sind verf\u00fcgbar.\u2713 Lernen mit Serlo!",
            "dateCreated": "2014-03-01T20:36:44+00:00",
            "dateModified": "2021-03-08T20:51:17+00:00",
            "license": {
              "id": "https://creativecommons.org/licenses/by-sa/4.0/deed.de"
            },
            "version": "https://serlo.org/197588"
          },
          {
            "@context": [
              "https://w3id.org/kim/lrmi-profile/draft/context.jsonld",
              {
                "@language": "de"
              }
            ],
            "id": "https://serlo.org/1497",
            "identifier": {
              "type": "PropertyValue",
              "propertyID": "UUID",
              "value": 1497
            },
            "type": [
              "LearningResource",
              "Article"
            ],
            "learningResourceType": "Article",
            "name": "Kleinstes gemeinsames Vielfaches",
            "description": "",
            "dateCreated": "2014-03-01T20:36:51+00:00",
            "dateModified": "2021-09-06T11:11:40+00:00",
            "license": {
              "id": "https://creativecommons.org/licenses/by-sa/4.0/deed.de"
            },
            "version": "https://serlo.org/224107"
          }
        ]
      }
    }
  }
}
```

### Fetching the first and the second page

```python
import requests


def fetch_entities(first, after=None):
    """Fetch one page of entity metadata, starting after the cursor `after`."""
    response = requests.post(
        "https://api.serlo-staging.dev/graphql",
        headers={
            "Content-Type": "application/json",
        },
        json={
            "query": """
                query($first: Int, $after: String) {
                    metadata {
                        entities(first: $first, after: $after) {
                            nodes
                            pageInfo {
                                hasNextPage
                                endCursor
                            }
                        }
                    }
                }
            """,
            "variables": {"first": first, "after": after},
        },
    )

    return response.json()


# Fetching the first page with the first two metadata elements
first_result = fetch_entities(first=2)
first_result_page_info = first_result["data"]["metadata"]["entities"]["pageInfo"]

display_json(first_result_page_info, title="The PageInfo object of the first request")

# Fetching the second page with the next two elements
second_result = fetch_entities(first=2, after=first_result_page_info["endCursor"])

display_json(second_result, title="Metadata of the second page")
```

### The PageInfo object of the first request

```json
{
  "hasNextPage": true,
  "endCursor": "MTQ5Nw=="
}
```

### Metadata of the second page

```json
{
  "data": {
    "metadata": {
      "entities": {
        "nodes": [
          {
            "@context": [
              "https://w3id.org/kim/lrmi-profile/draft/context.jsonld",
              {
                "@language": "de"
              }
            ],
            "id": "https://serlo.org/1499",
            "identifier": {
              "type": "PropertyValue",
              "propertyID": "UUID",
              "value": 1499
            },
            "type": [
              "LearningResource",
              "Article"
            ],
            "learningResourceType": "Article",
            "name": "Binomische Formeln",
            "description": "Binomische Formeln einfach erkl\u00e4rt. Verwendung der binomischen Formel zum Aufl\u00f6sen von Klammern  und Faktorisieren.  Mit vielen Beispielen und \u00dcbungen! Erfahre mehr zu leichten Beweisen der binomischen Formel mithilfe des Quadrats. \u21d2 Ein Kochrezept zur allgemeinen Vorhergehensweise. Video\u2713",
            "dateCreated": "2014-03-01T20:37:01+00:00",
            "dateModified": "2021-09-06T11:41:11+00:00",
            "license": {
              "id": "https://creativecommons.org/licenses/by-sa/4.0/deed.de"
            },
            "version": "https://serlo.org/224109"
          },
          {
            "@context": [
              "https://w3id.org/kim/lrmi-profile/draft/context.jsonld",
              {
                "@language": "de"
              }
            ],
            "id": "https://serlo.org/1501",
            "identifier": {
              "type": "PropertyValue",
              "propertyID": "UUID",
              "value": 1501
            },
            "type": [
              "LearningResource",
              "Article"
            ],
            "learningResourceType": "Article",
            "name": "Ergebnismenge",
            "description": "",
            "dateCreated": "2014-03-01T20:37:01+00:00",
            "dateModified": "2021-09-08T10:24:28+00:00",
            "license": {
              "id": "https://creativecommons.org/licenses/by-sa/4.0/deed.de"
            },
            "version": "https://serlo.org/224215"
          }
        ],
        "pageInfo": {
          "hasNextPage": true,
          "endCursor": "MTUwMQ=="
        }
      }
    }
  }
}
```
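A side note on the cursors in the two outputs above: they are base64 strings, and decoding them can help while debugging. The sketch below uses a hypothetical helper `decode_cursor`. In this sample the two cursors decode to `1497` and `1501`, which match the ids of the last entity on each page. Client code should still treat cursors as opaque values and pass them back to `after` unchanged, since nothing in the schema guarantees this encoding.

```python
import base64


def decode_cursor(cursor):
    """Decode a base64 cursor string, for debugging only (hypothetical helper)."""
    return base64.b64decode(cursor).decode("utf-8")


first_page_cursor = decode_cursor("MTQ5Nw==")   # endCursor of the first request
second_page_cursor = decode_cursor("MTUwMQ==")  # endCursor of the second request

print(first_page_cursor, second_page_cursor)
```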

### Fetching metadata of all learning resources

```python
import requests


def fetch_all_entities(first=500):
    """Fetch the metadata of all learning resources, `first` objects per request."""
    result = []
    end_cursor = None

    while True:
        current_page = fetch_entities(first, after=end_cursor)["data"]["metadata"]["entities"]

        result += current_page["nodes"]

        # Continue with the next page until the API reports that none is left
        if current_page["pageInfo"]["hasNextPage"]:
            end_cursor = current_page["pageInfo"]["endCursor"]
        else:
            break

    return result


def fetch_entities(first, after=None):
    """Fetch one page of entity metadata, starting after the cursor `after`."""
    response = requests.post(
        "https://api.serlo-staging.dev/graphql",
        headers={
            "Content-Type": "application/json",
        },
        json={
            "query": """
                query($first: Int, $after: String) {
                    metadata {
                        entities(first: $first, after: $after) {
                            nodes
                            pageInfo {
                                hasNextPage
                                endCursor
                            }
                        }
                    }
                }
            """,
            "variables": {"first": first, "after": after},
        },
    )

    return response.json()


all_entities = fetch_all_entities()

display_len(all_entities)
```

**Result:** 8110 elements were fetched
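The loop above keeps every record in one list and assumes that each request succeeds. As a hedged alternative, the sketch below wraps the same pagination logic in a generator and stops with an exception when the response carries a top-level `errors` field, which is where GraphQL servers report failures. The names `iter_entities` and `fake_fetch_page` are hypothetical; `fake_fetch_page` only stands in for `fetch_entities` so that the example runs without network access.

```python
def iter_entities(fetch_page, first=500):
    """Yield metadata objects page by page.

    `fetch_page(first, after)` must return the parsed JSON body of one request.
    """
    after = None
    while True:
        body = fetch_page(first, after)
        if body.get("errors"):
            raise RuntimeError(f"GraphQL request failed: {body['errors']}")
        connection = body["data"]["metadata"]["entities"]
        yield from connection["nodes"]
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            return
        after = page_info["endCursor"]


def fake_fetch_page(first, after=None):
    """Hypothetical stand-in for fetch_entities that serves five fake records."""
    records = [{"id": f"https://serlo.org/{n}"} for n in range(1, 6)]
    start = 0 if after is None else int(after)
    end = start + first
    return {
        "data": {"metadata": {"entities": {
            "nodes": records[start:end],
            "pageInfo": {"hasNextPage": end < len(records), "endCursor": str(end)},
        }}}
    }


ids = [entity["id"] for entity in iter_entities(fake_fetch_page, first=2)]
print(len(ids))
```

With the real API, `iter_entities(fetch_entities)` would play the role of `fetch_all_entities()` while letting the caller process records as they arrive.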

### Fetching all metadata with filters

```python
import requests


def fetch_all_entities(first=500, instance=None, modifiedAfter=None):
    """Fetch the metadata of all learning resources that match the filters."""
    result = []
    end_cursor = None

    while True:
        current_page = fetch_entities(
            first, after=end_cursor, instance=instance, modifiedAfter=modifiedAfter
        )
        current_page = current_page["data"]["metadata"]["entities"]

        result += current_page["nodes"]

        # Continue with the next page until the API reports that none is left
        if current_page["pageInfo"]["hasNextPage"]:
            end_cursor = current_page["pageInfo"]["endCursor"]
        else:
            break

    return result


def fetch_entities(first, after=None, instance=None, modifiedAfter=None):
    """Fetch one page of entity metadata with optional filters."""
    response = requests.post(
        "https://api.serlo-staging.dev/graphql",
        headers={
            "Content-Type": "application/json",
        },
        json={
            "query": """
                query($first: Int, $after: String, $instance: Instance, $modifiedAfter: String) {
                    metadata {
                        entities(first: $first, after: $after, instance: $instance, modifiedAfter: $modifiedAfter) {
                            nodes
                            pageInfo {
                                hasNextPage
                                endCursor
                            }
                        }
                    }
                }
            """,
            "variables": {
                "first": first,
                "after": after,
                "instance": instance,
                "modifiedAfter": modifiedAfter,
            },
        },
    )

    return response.json()


# == Fetch elements by language / subdomain ==
# Here metadata are fetched from de.serlo.org
german_entities = fetch_all_entities(instance="de")

display_len(german_entities, explanation="entities fetched from de.serlo.org")

# == Fetch elements modified since the start of 2021 ==
# Format for modifiedAfter is YYYY-MM-DDTHH:MM:SSZ
entities2021 = fetch_all_entities(modifiedAfter="2021-01-01T00:00:00Z")

display_len(entities2021, explanation="entities fetched which are modified in 2021")
```

**Result:** 7327 entities fetched from de.serlo.org

**Result:** 2828 entities fetched which are modified in 2021
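The `modifiedAfter` filter lends itself to incremental updates: remember the newest `dateModified` value seen so far and pass it as `modifiedAfter` in the next run. The following is only a sketch of that idea. The helper `latest_modification` is hypothetical, the sample records are shortened copies of records shown earlier, and whether the API treats the boundary timestamp as inclusive or exclusive is not stated here, so a client may want to deduplicate by `id`.

```python
from datetime import datetime


def latest_modification(entities):
    """Return the newest `dateModified` value among the records, or None."""
    timestamps = [datetime.fromisoformat(e["dateModified"]) for e in entities]
    return max(timestamps).isoformat() if timestamps else None


# Shortened copies of two records from the first page shown earlier
sample = [
    {"id": "https://serlo.org/1495", "dateModified": "2021-03-08T20:51:17+00:00"},
    {"id": "https://serlo.org/1497", "dateModified": "2021-09-06T11:11:40+00:00"},
]

next_modified_after = latest_modification(sample)
print(next_modified_after)
```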