# 📘 KG course SPARQL notebook

A notebook to run SPARQL queries for the KG course at UM DACS.

1. Update the `g.parse()` calls in the first cell to import your RDF files.
2. In the same folder as the notebook, create files with your SPARQL queries (e.g. `q1.rq`) and execute them with `run_query(g, 'q1.rq')`.

Use the `.rq` file extension to get SPARQL syntax highlighting.
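
Since `run_query` only prints a short message when a query file cannot be found, it can be handy to check up front which files are still missing. The following is a minimal sketch, not part of the course material: the helper name `missing_query_files` and the `q1.rq` to `q10.rq` naming are assumptions for illustration.

```python
from pathlib import Path

def missing_query_files(names, folder="."):
    """Return the names from `names` that are not regular files in `folder`."""
    return [name for name in names if not (Path(folder) / name).is_file()]

# Hypothetical naming scheme: q1.rq ... q10.rq next to the notebook
expected = [f"q{i}.rq" for i in range(1, 11)]
print(missing_query_files(expected))
```

The printed list depends on which files exist in the current working directory.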

In [1]:
import sys
!{sys.executable} -m pip install pandas oxrdflib Pygments

import pandas as pd
from IPython.display import display, HTML
from pygments import highlight
from pygments.lexers import SparqlLexer
from pygments.formatters import HtmlFormatter
from rdflib import Graph

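def run_query(graph, query_path):
    """Run the SPARQL query stored in `query_path` against `graph` and display the results."""
    try:
        with open(query_path, 'r', encoding='utf-8') as file:
            query = file.read()
    except OSError:
        # Only I/O problems (e.g. a missing or unreadable file) are reported here
        print(f"No file for {query_path}")
        return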
    results = graph.query(query)
    # Display the SPARQL query
    formatted_query = highlight(query, SparqlLexer(), HtmlFormatter(style='solarized-dark', full=True, nobackground=True))
    display(HTML(formatted_query))
    # Convert results to a Pandas DataFrame (one row per solution, one column per variable)
    res_list = [[str(item) for item in row] for row in results]
    df = pd.DataFrame(res_list, columns=[str(var) for var in results.vars]) if res_list else pd.DataFrame()
    # Display the DataFrame as a table in Jupyter Notebook
    display(HTML(df.to_html()))

g = Graph(store="Oxigraph")

# TODO: modify/add paths to your RDF files
g.parse("food_kg.ttl")

print(f"Working with {len(g)} triples")




Failed to convert Literal lexical form to value. Datatype=http://www.w3.org/2001/XMLSchema#duration, Converter=<function parse_xsd_duration at 0x000002D85D6B19E0>
Traceback (most recent call last):
  File "c:\Users\annamaria.grillhosl\.pyenv\pyenv-win\versions\3.12.0\Lib\site-packages\rdflib\term.py", line 2163, in _castLexicalToPython
    return conv_func(lexical)  # type: ignore[arg-type]
           ^^^^^^^^^^^^^^^^^^
  File "c:\Users\annamaria.grillhosl\.pyenv\pyenv-win\versions\3.12.0\Lib\site-packages\rdflib\xsd_datetime.py", line 433, in parse_xsd_duration
    raise ValueError("Unable to parse duration string " + dur_string)
ValueError: Unable to parse duration string nan

Working with 1801 triples
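
The warning above reports that at least one `xsd:duration` literal has the lexical form `nan`, which cannot be parsed. As a rough illustration of how such values could be flagged before loading, here is a minimal sketch using only the standard library. The regular expression is a simplified reading of the `xsd:duration` lexical form, and every sample value except `nan` is made up for the example.

```python
import re

# Simplified pattern for xsd:duration, e.g. "PT30M" or "P1DT2H".
# The lookaheads reject a bare "P" and a trailing "T" with no time part.
DURATION_RE = re.compile(
    r"-?P(?=[0-9T])(?:[0-9]+Y)?(?:[0-9]+M)?(?:[0-9]+D)?"
    r"(?:T(?=[0-9])(?:[0-9]+H)?(?:[0-9]+M)?(?:[0-9]+(?:\.[0-9]+)?S)?)?"
)

def is_valid_duration(lexical):
    """Return True if `lexical` matches the simplified duration pattern."""
    return DURATION_RE.fullmatch(lexical) is not None

# Hypothetical sample values; only "nan" comes from the warning above
samples = ["PT30M", "P1DT2H", "nan", "PT", "P"]
invalid = [s for s in samples if not is_valid_duration(s)]
print(invalid)  # → ['nan', 'PT', 'P']
```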


1. Identify one type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [3]:
run_query(g, 'check_ingredients_in_instruction.rq')

| | recipe | ingredient | cleanedIngredient | instruction |
|---|---|---|---|---|
| 0 | http://kg-course/food-nutrition/recipe/88096 | 2 onions | onions | Add the onions and tomatoes, squeezing them with your hands as you put them into the pot to break them up. |
| 1 | http://kg-course/food-nutrition/recipe/88096 | 2 brown sugar | brown sugar | Stir in the lemon juice, brown sugar, and tomato paste. |
| 2 | http://kg-course/food-nutrition/recipe/88096 | 1/2 - 3/4 tomato paste | tomato paste | Stir in the lemon juice, brown sugar, and tomato paste. |
| 3 | http://kg-course/food-nutrition/recipe/88096 | 1 cabbage | cabbage | Stir in the cabbage, salt, and pepper and cover and cook for 1 hour. |
| 4 | http://kg-course/food-nutrition/recipe/74837 | 3 milk | milk | Stir the dry ingredients and milk mixture into the egg mixture as quickly as possible; stir in vanilla and lemon extract if using. |
| 5 | http://kg-course/food-nutrition/recipe/74837 | 3 milk | milk | Sift the flour, baking powder and salt together; in a saucepan heat the milk and melt the butter. |
| 6 | http://kg-course/food-nutrition/recipe/74837 | 3 milk | milk | Filling: In a double boiler over medium-low heat, cook sugar, flour, salt and milk for 15 minutes, stirring constantly until thickened and bubbly. |
| 7 | http://kg-course/food-nutrition/recipe/74837 | 2 butter | butter | Sift the flour, baking powder and salt together; in a saucepan heat the milk and melt the butter. |
| 8 | http://kg-course/food-nutrition/recipe/74837 | 2 butter | butter | Frosting: Bring the sugar, cornstarch, chocolate, salt and water to a boil; cook, stirring, 3-5 minutes or until thickened enough to coat a spoon thickly; remove from heat, add butter and vanilla and stir until combined. |
| 9 | http://kg-course/food-nutrition/recipe/74837 | 1/4 vanilla | vanilla | Temper the egg with the cooked mixture and pour back into the pan; cook another 3-5 minutes until thickened; remove from heat, stir in vanilla or orange extract and cool completely with plastic wrap on custard surface; spread on bottom cake layer, add top layer. |


2. Identify a second type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [13]:
run_query(g, 'check_for_existing_type.rq')

| | entity |
|---|---|
| 0 | http://kg-course/food-nutrition/recipe/49/nutrition |
| 1 | http://kg-course/food-nutrition/recipe/49/nutrition |
| 2 | http://kg-course/food-nutrition/recipe/49/nutrition |
| 3 | http://kg-course/food-nutrition/recipe/49/nutrition |
| 4 | http://kg-course/food-nutrition/recipe/49/nutrition |
| 5 | http://kg-course/food-nutrition/recipe/49/nutrition |
| 6 | http://kg-course/food-nutrition/recipe/49/nutrition |
| 7 | http://kg-course/food-nutrition/recipe/49/nutrition |
| 8 | http://kg-course/food-nutrition/recipe/49/nutrition |
| 9 | http://kg-course/food-nutrition/recipe/48/nutrition |


3. Identify a third type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [4]:
#run_query(g, 'check_unique_ingredients.rq')
run_query(g, 'check_unique_keyword.rq')

4. Identify a fourth type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [6]:
run_query(g, 'q4.rq')

No file for q4.rq


5. Identify a fifth type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [7]:
run_query(g, 'q5.rq')

No file for q5.rq


6. Identify a sixth type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [8]:
run_query(g, 'q6.rq')

No file for q6.rq


7. Identify a seventh type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [9]:
run_query(g, 'q7.rq')

No file for q7.rq


8. Identify an eighth type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [10]:
run_query(g, 'q8.rq')

No file for q8.rq


9. Identify a ninth type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [11]:
run_query(g, 'q9.rq')

No file for q9.rq


10. Identify a final type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [12]:
run_query(g, 'q10.rq')

No file for q10.rq
