# 📘 KG course SPARQL notebook

A notebook to run SPARQL queries for the KG course at UM DACS.

1. Update the `g.parse()` calls in the first cell to import your RDF files.
2. In the same folder as the notebook, create files containing your SPARQL queries (e.g. `q1.rq`) and execute them with `run_query(g, 'q1.rq')`.

Use the `.rq` file extension to get SPARQL syntax highlighting.
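
As a small illustration of step 2, the sketch below writes a generic query to a file from Python. The file name `q_example.rq` is hypothetical, and the query only lists ten arbitrary triples, so it assumes nothing about the vocabulary in your data. In practice you would usually create the file in an editor instead.

```python
from pathlib import Path

# Example query: list ten arbitrary triples. It makes no assumptions about
# the vocabulary used in your RDF files.
QUERY = """\
SELECT ?s ?p ?o
WHERE {
  ?s ?p ?o .
}
LIMIT 10
"""

# Save the query next to the notebook so that run_query can find it.
# "q_example.rq" is a hypothetical name chosen for this sketch.
query_file = Path("q_example.rq")
query_file.write_text(QUERY, encoding="utf-8")
print(f"Wrote {query_file}")
```

Once `g` and `run_query` are defined, the file can be executed with `run_query(g, 'q_example.rq')`.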

In [17]:
import sys
!{sys.executable} -m pip install pandas oxrdflib Pygments

import pandas as pd
from IPython.display import display, HTML
from pygments import highlight
from pygments.lexers import SparqlLexer
from pygments.formatters import HtmlFormatter
from rdflib import Graph

def run_query(graph, query_path):
    """Run the SPARQL query stored in `query_path` and display the query and its results."""
    try:
        with open(query_path, 'r', encoding='utf-8') as file:
            query = file.read()
    except OSError:
        # Missing or unreadable file: report it and skip the query
        print(f"No file for {query_path}")
        return
    results = graph.query(query)
    # Display the SPARQL query with syntax highlighting
    formatted_query = highlight(query, SparqlLexer(), HtmlFormatter(style='solarized-dark', full=True, nobackground=True))
    display(HTML(formatted_query))
    # Convert the results to a pandas DataFrame, one row per solution
    res_list = [[str(item) for item in row] for row in results]
    df = pd.DataFrame(res_list, columns=[str(var) for var in results.vars]) if res_list else pd.DataFrame()
    # Display the DataFrame as a table in the notebook
    display(HTML(df.to_html()))

g = Graph(store="Oxigraph")

# TODO: modify/add paths to your RDF files
g.parse("food_kg.ttl")

print(f"Working with {len(g)} triples")


Failed to convert Literal lexical form to value. Datatype=http://www.w3.org/2001/XMLSchema#duration, Converter=<function parse_xsd_duration at 0x00000204FA6F4790>
Traceback (most recent call last):
  File "C:\Users\Johan\PycharmProjects\BMKG-Assignemnt-2\.venv\lib\site-packages\rdflib\term.py", line 2163, in _castLexicalToPython
    return conv_func(lexical)  # type: ignore[arg-type]
  File "C:\Users\Johan\PycharmProjects\BMKG-Assignemnt-2\.venv\lib\site-packages\rdflib\xsd_datetime.py", line 433, in parse_xsd_duration
    raise ValueError("Unable to parse duration string " + dur_string)
ValueError: Unable to parse duration string nan

Working with 1801 triples
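
The warnings in the output above are raised because some literals typed as `xsd:duration` have the lexical form `nan`, which rdflib cannot convert to a Python value. The sketch below shows one way to test a lexical form in plain Python before it ends up in the graph. The regular expression is a simplified reading of the `xsd:duration` syntax and the helper name `is_valid_duration` is hypothetical; it is not what rdflib uses internally.

```python
import re

# Simplified xsd:duration pattern: optional sign, 'P', then date parts and an
# optional time part introduced by 'T'. The lookaheads reject a bare "P" and a
# "T" that is not followed by any time component.
DURATION_RE = re.compile(
    r"-?P(?!$)(\d+Y)?(\d+M)?(\d+D)?"
    r"(T(?=\d)(\d+H)?(\d+M)?(\d+(\.\d+)?S)?)?"
)

def is_valid_duration(lexical: str) -> bool:
    """Return True if `lexical` looks like a valid xsd:duration string."""
    return DURATION_RE.fullmatch(lexical) is not None

for value in ["PT1M", "P0D", "PT2H", "nan"]:
    print(value, is_valid_duration(value))
```

A check like this could be applied while generating the RDF file, so that missing values are left out instead of being written as `nan`.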


1. Identify one type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [18]:
run_query(g, 'checkNANcookTime.rq')

Failed to convert Literal lexical form to value. Datatype=http://www.w3.org/2001/XMLSchema#duration, Converter=<function parse_xsd_duration at 0x00000204FA6F4790>
Traceback (most recent call last):
  File "C:\Users\Johan\PycharmProjects\BMKG-Assignemnt-2\.venv\lib\site-packages\rdflib\term.py", line 2163, in _castLexicalToPython
    return conv_func(lexical)  # type: ignore[arg-type]
  File "C:\Users\Johan\PycharmProjects\BMKG-Assignemnt-2\.venv\lib\site-packages\rdflib\xsd_datetime.py", line 433, in parse_xsd_duration
    raise ValueError("Unable to parse duration string " + dur_string)
ValueError: Unable to parse duration string nan

|   | subject | cookTime |
|---|---------|----------|
| 0 | http://kg-course/food-nutrition/recipe/48 | |
| 1 | http://kg-course/food-nutrition/recipe/46 | |
| 2 | http://kg-course/food-nutrition/recipe/337283 | |
| 3 | http://kg-course/food-nutrition/recipe/280584 | |
| 4 | http://kg-course/food-nutrition/recipe/162371 | |
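
The contents of `checkNANcookTime.rq` are not shown, so the following is only a guess at what such a check could look like. It assumes that cook times are attached with `schema:cookTime` in the `https://schema.org/` namespace and that the bad values have the lexical form `nan`, as the warning message suggests. The file name `cooktime_nan_sketch.rq` is hypothetical.

```python
from pathlib import Path

# Hypothetical query: find subjects whose cook time literal is the string "nan".
# The property schema:cookTime is an assumption about the data model.
QUERY = """\
PREFIX schema: <https://schema.org/>

SELECT ?subject ?cookTime
WHERE {
  ?subject schema:cookTime ?cookTime .
  FILTER (LCASE(STR(?cookTime)) = "nan")
}
"""

# Written under a separate name so the original query file is not overwritten.
sketch_file = Path("cooktime_nan_sketch.rq")
sketch_file.write_text(QUERY, encoding="utf-8")
```

Comparing on `STR(?cookTime)` avoids relying on the literal being parsed as a duration, which is exactly the step that fails here.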


2. Identify a second type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [19]:
run_query(g, 'check_for_singular_instructions.rq')

|   | recipe | singleInstruction |
|---|--------|-------------------|
| 0 | http://kg-course/food-nutrition/recipe/336136 | "Put everything in a large pot and bring to a boil. Simmer till vegetables are to your liking." |
| 1 | http://kg-course/food-nutrition/recipe/337283 | "Use potato masher, mash lemon slices and sugar in deep bowl, until slices release their juice and sugar begins to dissolve. Stir in water and lemon juice until sugar completely dissolves. Strain out lemon slices and chill or pour over ice before serving." |
| 2 | http://kg-course/food-nutrition/recipe/48678 | "Simmer until cabbage is tender and enjoy." |
| 3 | http://kg-course/food-nutrition/recipe/280584 | "Cook meat and onion in a pan. Drain and add to tomatoes and chopped cabbage. Cook rice and add to the pot. Add bay leaf and salt and pepper to taste. Cook till heated through." |
| 4 | http://kg-course/food-nutrition/recipe/241886 | "Combine ingredients and cook until all veggies are soft." |


3. Identify a third type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [20]:
run_query(g,"check_for_objectproperty.rq")

|   | property |
|---|----------|
| 0 | https://schema.org/recipeIngredient |


4. Identify a fourth type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [21]:
run_query(g, 'inconsistent_recipe_categories.rq')

|   | foodName | distinctCategories | categories |
|---|----------|--------------------|------------|
| 0 | A Jad - Cucumber Pickle | 1 | Vegetable |
| 1 | best blackbottom bie | 1 | Pie |
| 2 | Boston Cream Pie | 3 | Dessert, Pie, < 60 Mins |
| 3 | Best Blackbottom Pie | 1 | Pie |
| 4 | Biryani | 2 | Indian, Chicken Breast |


5. Identify a fifth type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [22]:
run_query(g, 'checkNegativeValues.rq')

|   | recipe | property | value |
|---|--------|----------|-------|
| 0 | http://kg-course/food-nutrition/recipe/45 | https://schema.org/proteinContent | -4.2 |
| 1 | http://kg-course/food-nutrition/recipe/41 | https://schema.org/fiberContent | -0.2 |
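
For reference, a check of this kind might be written as below. This is a sketch and not the contents of `checkNegativeValues.rq`, which are not shown. It assumes that each recipe links to a nutrition node through `schema:nutrition` and that the nutrient values are stored as numeric literals; the file name `negative_values_sketch.rq` is hypothetical.

```python
from pathlib import Path

# Hypothetical query: report every numeric value below zero on a nutrition node.
# The schema:nutrition link between recipe and nutrition node is an assumption.
QUERY = """\
PREFIX schema: <https://schema.org/>

SELECT ?recipe ?property ?value
WHERE {
  ?recipe schema:nutrition ?nutrition .
  ?nutrition ?property ?value .
  FILTER (isNumeric(?value) && ?value < 0)
}
"""

# Written under a separate name so the original query file is not overwritten.
sketch_file = Path("negative_values_sketch.rq")
sketch_file.write_text(QUERY, encoding="utf-8")
```

If the values were stored as plain strings, `isNumeric` would return false for them and the filter would need an explicit cast such as `xsd:decimal(?value)`.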


6. Identify a sixth type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [23]:
run_query(g, 'check_for_durations.rq')

|   | recipe | name | prepTime | cookTime |
|---|--------|------|----------|----------|
| 0 | http://kg-course/food-nutrition/recipe/241266 | Boston Cream Pie | PT1M | PT1M |
| 1 | http://kg-course/food-nutrition/recipe/148683 | Cabbage Soup | P0D | PT2H |
| 2 | http://kg-course/food-nutrition/recipe/41 | Carina's Tofu-Vegetable Kebabs | P1D | PT20M |
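
The rows above contain durations such as `PT1M`, `P0D` and `P1D`, which look implausible for preparing or cooking a dish. The exact rule used in `check_for_durations.rq` is not shown, so the sketch below only illustrates the idea in plain Python: convert a simple duration string to a `timedelta` and flag values outside a plausible range. The helper names and both thresholds (2 minutes and 12 hours) are assumptions made for this example.

```python
import re
from datetime import timedelta

# Handles only the day/hour/minute/second parts; years and months are left
# out because their length is not fixed.
SIMPLE_DURATION_RE = re.compile(
    r"P(?:(\d+)D)?(?:T(?=\d)(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?"
)

def to_timedelta(lexical: str) -> timedelta:
    """Convert a simple xsd:duration string such as 'PT20M' to a timedelta."""
    match = SIMPLE_DURATION_RE.fullmatch(lexical)
    if match is None or not any(match.groups()):
        raise ValueError(f"Unsupported duration: {lexical!r}")
    days, hours, minutes, seconds = (int(part or 0) for part in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)

def is_suspicious(lexical, shortest=timedelta(minutes=2), longest=timedelta(hours=12)):
    """Flag durations shorter than `shortest` or longer than `longest`."""
    duration = to_timedelta(lexical)
    return duration < shortest or duration > longest

for value in ["PT1M", "P0D", "P1D", "PT20M", "PT2H"]:
    print(value, is_suspicious(value))
```

Values that are not durations at all, such as `nan`, raise a `ValueError` here and would need to be handled by a separate check.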


7. Identify a seventh type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [24]:
run_query(g, 'q7.rq')

No file for q7.rq
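
The message above means that `q7.rq` could not be opened. A quick way to catch this before running all cells is to list the query files that are missing. The helper below is a minimal sketch; the name `missing_query_files` is hypothetical and the file names passed to it are just examples taken from this notebook.

```python
from pathlib import Path

def missing_query_files(names, folder="."):
    """Return the query file names that do not exist in `folder`."""
    return [name for name in names if not (Path(folder) / name).is_file()]

# Example call with two of the file names used in this notebook.
print(missing_query_files(["q7.rq", "checkNegativeValues.rq"]))
```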


8. Identify an eighth type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [25]:
run_query(g, 'check_for_wrong_category_value.rq')

|   | recipe | category |
|---|--------|----------|
| 0 | http://kg-course/food-nutrition/recipe/520961 | < 60 Mins |
| 1 | http://kg-course/food-nutrition/recipe/416077 | < 60 Mins |
| 2 | http://kg-course/food-nutrition/recipe/213018 | < 30 Mins |
| 3 | http://kg-course/food-nutrition/recipe/148683 | < 4 Hours |
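
The flagged categories above all describe a time bucket rather than a kind of dish. How `check_for_wrong_category_value.rq` detects them is not shown; one plausible approach is a pattern match, illustrated here in plain Python. The pattern and the helper name `is_time_bucket` are assumptions for this sketch, and a similar pattern could be used in a SPARQL `REGEX` filter.

```python
import re

# Matches labels such as "< 60 Mins" or "< 4 Hours", which describe a time
# bucket rather than a kind of dish.
TIME_BUCKET_RE = re.compile(r"<\s*\d+\s*(Mins?|Hours?)", re.IGNORECASE)

def is_time_bucket(category: str) -> bool:
    """Return True if `category` looks like a time bucket label."""
    return TIME_BUCKET_RE.fullmatch(category.strip()) is not None

for label in ["< 60 Mins", "< 4 Hours", "Dessert", "Pie"]:
    print(label, is_time_bucket(label))
```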


9. Identify a ninth type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [26]:
run_query(g, 'check_for_empty_reference.rq')

|   | nutrition |
|---|-----------|
| 0 | http://example.org/nonexistent/Nutrition |
| 1 | cholesterol |


10. Identify a tenth type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [27]:
run_query(g, 'checkInconsistentNutritionValues.rq')

|   | nutritionNode | property | valueCount |
|---|---------------|----------|------------|
| 0 | http://kg-course/food-nutrition/recipe/42/nutrition | https://schema.org/fiberContent | 4 |
| 1 | http://kg-course/food-nutrition/recipe/42/nutrition | https://schema.org/proteinContent | 4 |
| 2 | http://kg-course/food-nutrition/recipe/42/nutrition | https://schema.org/carbohydrateContent | 4 |
| 3 | http://kg-course/food-nutrition/recipe/42/nutrition | https://schema.org/sugarContent | 4 |
| 4 | http://kg-course/food-nutrition/recipe/42/nutrition | https://schema.org/sodiumContent | 4 |
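
A count per node and property, as in the output above, is typically produced with `GROUP BY` and `HAVING`. The sketch below shows one way to write such a query; it is not the contents of `checkInconsistentNutritionValues.rq`. It assumes that nutrition nodes are reached through `schema:nutrition`, and the file name `multiple_values_sketch.rq` is hypothetical.

```python
from pathlib import Path

# Hypothetical query: list nutrition properties that carry more than one
# distinct value on the same node. The schema:nutrition link is an assumption.
QUERY = """\
PREFIX schema: <https://schema.org/>

SELECT ?nutritionNode ?property (COUNT(DISTINCT ?value) AS ?valueCount)
WHERE {
  ?recipe schema:nutrition ?nutritionNode .
  ?nutritionNode ?property ?value .
}
GROUP BY ?nutritionNode ?property
HAVING (COUNT(DISTINCT ?value) > 1)
"""

# Written under a separate name so the original query file is not overwritten.
sketch_file = Path("multiple_values_sketch.rq")
sketch_file.write_text(QUERY, encoding="utf-8")
```

Counting distinct values means that a value repeated verbatim is not reported, only properties with genuinely conflicting values.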


11. Identify an eleventh type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [28]:
run_query(g, 'check_for_distinct_datatypes.rq')

|   | nutrition | datatype |
|---|-----------|----------|
| 0 | http://kg-course/food-nutrition/recipe/47/nutrition | http://kg-course/food-nutrition/recipe/47/nutrition |
| 1 | http://kg-course/food-nutrition/recipe/42/nutrition | http://kg-course/food-nutrition/recipe/42/nutrition |
| 2 | http://kg-course/food-nutrition/recipe/45/nutrition | http://kg-course/food-nutrition/recipe/45/nutrition |
| 3 | http://kg-course/food-nutrition/recipe/44/nutrition | http://kg-course/food-nutrition/recipe/44/nutrition |
| 4 | cholesterol | http://www.w3.org/2001/XMLSchema#string |


12. Identify a twelfth type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [29]:
run_query(g, 'wrong_recipe_type.rq')

|   | recipe | type |
|---|--------|------|
| 0 | http://kg-course/food-nutrition/recipe/38 | https://schema.org/Product |


13. Identify a thirteenth type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [30]:
run_query(g, 'complete_recipe.rq')

|   | recipe |
|---|--------|
| 0 | http://kg-course/food-nutrition/recipe/98664 |
| 1 | http://kg-course/food-nutrition/recipe/88095/ |
| 2 | http://kg-course/food-nutrition/recipe/57879 |
| 3 | http://kg-course/food-nutrition/recipe/55724 |
| 4 | http://kg-course/food-nutrition/recipe/510419 |


14. Identify a fourteenth type of quality check different than above, write and run SPARQL to implement the check and return the violating entities.

In [31]:
run_query(g, "check_for_existing_type.rq")

|   | entity |
|---|--------|
| 0 | http://kg-course/food-nutrition/recipe/49/nutrition |
| 1 | http://kg-course/food-nutrition/recipe/49/nutrition |
| 2 | http://kg-course/food-nutrition/recipe/49/nutrition |
| 3 | http://kg-course/food-nutrition/recipe/49/nutrition |
| 4 | http://kg-course/food-nutrition/recipe/49/nutrition |
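
The same IRI is listed five times above, which may be because the query matches once per triple of the entity. Assuming the check looks for entities without an `rdf:type` (the contents of `check_for_existing_type.rq` are not shown), the sketch below uses `SELECT DISTINCT` so that each entity is reported once. The file name `untyped_entities_sketch.rq` is hypothetical.

```python
from pathlib import Path

# Hypothetical query: report each IRI subject that has no rdf:type.
# DISTINCT collapses the one-row-per-triple duplicates into a single row.
QUERY = """\
SELECT DISTINCT ?entity
WHERE {
  ?entity ?p ?o .
  FILTER (isIRI(?entity))
  FILTER NOT EXISTS { ?entity a ?type }
}
"""

# Written under a separate name so the original query file is not overwritten.
sketch_file = Path("untyped_entities_sketch.rq")
sketch_file.write_text(QUERY, encoding="utf-8")
```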
