# SPARQL queries

This notebook demonstrates how SPARQL queries can be composed programmatically, with (almost) no knowledge of SPARQL. For this purpose, we will use an existing dataset.

In [5]:
from atomrdf import KnowledgeGraph

In [6]:
#kg = KnowledgeGraph.unarchive('dataset.tar.gz')
kg = KnowledgeGraph.unarchive('dataset', compress=False)

In [7]:
kg.n_samples

22

The dataset contains 22 samples.

SPARQL queries can, of course, be run directly through atomRDF. Here is an example:

In [8]:
query = """
PREFIX cmso: <http://purls.helmholtz-metadaten.de/cmso/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?sample ?symbol ?number
WHERE {
    ?sample cmso:hasMaterial ?material .
    ?material cmso:hasStructure ?structure .
    ?structure cmso:hasSpaceGroupSymbol ?symbol .
    ?sample cmso:hasNumberOfAtoms ?number .
    FILTER (?number="4"^^xsd:integer)
}"""

The above query finds the space group symbol of all structures that have four atoms.

In [9]:
kg.query(query)

Unnamed: 0,sample,symbol,number
0,sample:10ffd2cc-9e92-4f04-896d-d6c0fdb9e55f,Pm-3m,4
1,sample:1f6b1b0f-446a-4ad8-877e-d2e6176797df,Fm-3m,4
2,sample:286c3974-962b-4333-a2bb-d164ae645454,Fm-3m,4
3,sample:67be61c7-f9c7-4d46-a61d-5350fd0ee246,Fm-3m,4
4,sample:721b7447-8363-4e65-9515-9da2581d7124,Fm-3m,4
5,sample:8fc8e47b-acee-40f8-bcbf-fc298cc31f05,Fm-3m,4
6,sample:9f0f48d1-5ebf-4f7a-b241-5e7aa273f5a0,Fm-3m,4
7,sample:a3cf6d97-c922-4c4d-8517-e784df83b71e,Fm-3m,4
8,sample:ab2bea57-39ea-49ea-ad3f-c1c40b013154,Fm-3m,4
9,sample:aef7472e-7577-4256-8422-6ba77a954ce1,Fm-3m,4


The results are given in the form of a Pandas DataFrame.

This query can also be performed programmatically, which looks like this:

In [18]:
kg.terms.cmso.hasNumberOfAtoms.domain

['cmso:AtomicScaleSample',
 'cmso:ComputationalSample',
 'cmso:MacroscaleSample',
 'cmso:MesoscaleSample',
 'cmso:MicroscaleSample',
 'cmso:NanoscaleSample']

In [19]:
kg.query_sample([kg.terms.cmso.hasSpaceGroupSymbol, kg.terms.cmso.hasNumberOfAtoms==4])

Unnamed: 0,AtomicScaleSample,hasSpaceGroupSymbolvalue,hasNumberOfAtomsvalue
0,sample:10ffd2cc-9e92-4f04-896d-d6c0fdb9e55f,Pm-3m,4
1,sample:286c3974-962b-4333-a2bb-d164ae645454,Fm-3m,4
2,sample:8fc8e47b-acee-40f8-bcbf-fc298cc31f05,Fm-3m,4
3,sample:9f0f48d1-5ebf-4f7a-b241-5e7aa273f5a0,Fm-3m,4
4,sample:e54c0e91-52ec-4c47-8ba3-43979a1ebe2e,Fm-3m,4
5,sample:1f6b1b0f-446a-4ad8-877e-d2e6176797df,Fm-3m,4
6,sample:67be61c7-f9c7-4d46-a61d-5350fd0ee246,Fm-3m,4
7,sample:721b7447-8363-4e65-9515-9da2581d7124,Fm-3m,4
8,sample:a3cf6d97-c922-4c4d-8517-e784df83b71e,Fm-3m,4
9,sample:ab2bea57-39ea-49ea-ad3f-c1c40b013154,Fm-3m,4


Or, more explicitly (`query_sample` is just a shortcut for `auto_query`):

In [20]:
kg.auto_query(kg.terms.cmso.AtomicScaleSample, [kg.terms.cmso.hasSpaceGroupSymbol, kg.terms.cmso.hasNumberOfAtoms==4])

Unnamed: 0,AtomicScaleSample,hasSpaceGroupSymbolvalue,hasNumberOfAtomsvalue
0,sample:10ffd2cc-9e92-4f04-896d-d6c0fdb9e55f,Pm-3m,4
1,sample:286c3974-962b-4333-a2bb-d164ae645454,Fm-3m,4
2,sample:8fc8e47b-acee-40f8-bcbf-fc298cc31f05,Fm-3m,4
3,sample:9f0f48d1-5ebf-4f7a-b241-5e7aa273f5a0,Fm-3m,4
4,sample:e54c0e91-52ec-4c47-8ba3-43979a1ebe2e,Fm-3m,4
5,sample:1f6b1b0f-446a-4ad8-877e-d2e6176797df,Fm-3m,4
6,sample:67be61c7-f9c7-4d46-a61d-5350fd0ee246,Fm-3m,4
7,sample:721b7447-8363-4e65-9515-9da2581d7124,Fm-3m,4
8,sample:a3cf6d97-c922-4c4d-8517-e784df83b71e,Fm-3m,4
9,sample:ab2bea57-39ea-49ea-ad3f-c1c40b013154,Fm-3m,4


We now discuss how such a query is built programmatically. The function needs a source and one or more destinations. A destination can have conditions attached to it, for example on the number of atoms. The first step is to find the right terms. For this, we can use the tab completion feature.

In [21]:
kg.terms

cmso, qudt, pldo, podo, asmo, ns, calculation, ldo, prov, rdf, rdfs

Those are all the ontologies whose terms we use. One can go a level deeper:

In [22]:
kg.terms.cmso

SimulationCell, UnitCell, LatticeAngle, SimulationCellAngle, Angle, AtomicScaleSample, AtomicForce, AtomicPosition, AtomicVelocity, CoordinationNumber, Occupancy, AtomAttribute, Basis, MacroscaleSample, MesoscaleSample, MicroscaleSample, NanoscaleSample, ComputationalSample, CalculatedProperty, CrystallineMaterial, CrystalDefect, Atom, Molecule, ChemicalSpecies, ChemicalElement, LatticeParameter, SimulationCellLength, Length, AmorphousMaterial, Material, CrystalStructure, SpaceGroup, Microstructure, Structure, LatticeVector, SimulationCellVector, Vector, ChemicalComposition, hasAngle, hasAttribute, hasBasis, hasCalculatedProperty, isCalculatedPropertyOf, hasDefect, isDefectOf, hasElement, hasLatticeParameter, hasLength, hasMaterial, isMaterialOf, hasSimulationCell, hasSpaceGroup, hasSpecies, hasStructure, hasUnit, hasUnitCell, hasVector, hasAltName, hasName, hasAngle_alpha, hasAngle_beta, hasAngle_gamma, hasAtomicPercent, hasBravaisLattice, hasChemicalSymbol, hasSymbol, hasComponent_x,

And further select terms from there.

In [23]:
kg.terms.cmso.AtomicScaleSample

cmso:AtomicScaleSample
Atomic scale sample is a computational sample in the atomic length scale.
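Once a source and a destination term are chosen, a query builder has to connect them through intermediate terms. The sketch below shows one plausible way to do this, a breadth-first search over a toy graph. It is not atomRDF's implementation. The node labels and edges are copied from the variables and predicates of the hand-written query earlier in this notebook; `EDGES`, `find_path`, and `to_triple_patterns` are hypothetical names introduced only for illustration.

```python
from collections import deque

# Toy graph (assumption): edges taken from the hand-written SPARQL query above.
# Each entry is (subject label, predicate, object label).
EDGES = [
    ("sample", "cmso:hasMaterial", "material"),
    ("material", "cmso:hasStructure", "structure"),
    ("structure", "cmso:hasSpaceGroupSymbol", "symbol"),
    ("sample", "cmso:hasNumberOfAtoms", "number"),
]


def find_path(source, destination, edges=EDGES):
    """Breadth-first search; returns the edges along a shortest path, or None."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == destination:
            return path
        for s, p, o in edges:
            if s == node and o not in seen:
                seen.add(o)
                queue.append((o, path + [(s, p, o)]))
    return None


def to_triple_patterns(path):
    """Turn each edge of a path into one SPARQL triple pattern."""
    return [f"?{s} {p} ?{o} ." for s, p, o in path]


patterns = to_triple_patterns(find_path("sample", "symbol"))
for line in patterns:
    print(line)
```

The three printed patterns are the same chain that appears in the `WHERE` block of the hand-written query, which is the part a programmatic builder spares the user from writing.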

Constraints can be applied through basic comparison operators.

## Basic comparison operations

The basic comparison operators <, >, <=, >=, and == are supported.

These operations are useful for adding conditions to the SPARQL query. When one of them is applied to a term, the resulting condition is stored in the term's condition string. No other changes are needed.

In [None]:
kg.terms.cmso.hasElementRatio==1.0
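
To make the idea of a stored condition concrete, here is a minimal sketch of how Python's comparison operators can be overloaded to record a condition as text instead of evaluating it. `ToyTerm` and `sparql_literal` are hypothetical and do not reproduce atomRDF's internals. The `?<name>value` variable naming mirrors the result columns shown earlier, and the typed-literal form mirrors the `FILTER` line of the hand-written query; the mapping of Python floats to `xsd:float` is an assumption.

```python
def sparql_literal(value):
    """Render a Python int, float, or other value as a typed SPARQL literal."""
    if isinstance(value, int):
        return f'"{value}"^^xsd:integer'
    if isinstance(value, float):
        return f'"{value}"^^xsd:float'  # assumption: floats map to xsd:float
    return f'"{value}"^^xsd:string'


class ToyTerm:
    """Hypothetical stand-in for an ontology term (not atomRDF's class)."""

    def __init__(self, name):
        self.name = name
        self.variable = f"?{name}value"
        self.condition = None

    def _store(self, operator, value):
        # Record the comparison as a string and return the term itself,
        # so the term can be passed straight into a list of destinations.
        self.condition = f"{self.variable}{operator}{sparql_literal(value)}"
        return self

    def __eq__(self, value):
        return self._store("=", value)

    def __lt__(self, value):
        return self._store("<", value)

    def __le__(self, value):
        return self._store("<=", value)

    def __gt__(self, value):
        return self._store(">", value)

    def __ge__(self, value):
        return self._store(">=", value)

    def filter_clause(self):
        return f"FILTER ({self.condition})"


four_atoms = ToyTerm("hasNumberOfAtoms") == 4
print(four_atoms.filter_clause())
# FILTER (?hasNumberOfAtomsvalue="4"^^xsd:integer)
```

Because each operator returns the term rather than a boolean, an expression such as `term == 4` can be used directly as an argument, which is the pattern the `query_sample` call above relies on.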

## Logical operators

The logical operators currently supported are & and |. When applied, these operators combine the conditions of two terms.

In [None]:
(kg.terms.cmso.hasChemicalSymbol=='Al') & (kg.terms.cmso.hasElementRatio==1.0)

In [None]:
(kg.terms.cmso.hasChemicalSymbol=='Al') | (kg.terms.cmso.hasElementRatio==1.0)
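
As a rough illustration of what combining two conditions could look like in the generated SPARQL, the sketch below joins two condition strings with `&&` and `||`, the SPARQL operators for logical AND and OR. `Condition` is a hypothetical helper, not part of atomRDF, and the variable names and literal datatypes are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """Hypothetical wrapper around a SPARQL boolean expression."""

    expression: str

    def __and__(self, other):
        # Python's & maps to SPARQL's logical AND.
        return Condition(f"({self.expression} && {other.expression})")

    def __or__(self, other):
        # Python's | maps to SPARQL's logical OR.
        return Condition(f"({self.expression} || {other.expression})")


is_al = Condition('?hasChemicalSymbolvalue="Al"^^xsd:string')
is_pure = Condition('?hasElementRatiovalue="1.0"^^xsd:float')

print(f"FILTER {(is_al & is_pure).expression}")
print(f"FILTER {(is_al | is_pure).expression}")
```

Note that in Python `&` and `|` bind more tightly than `==`, which is why each comparison in the cells above is wrapped in its own parentheses.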

## @ operator

The final operator is `@`. It resolves terms that can be reached through multiple paths, for example rdfs:label, which many different entities can have.

To specify the label of the InputParameter, write:

In [None]:
kg.terms.rdfs.label@kg.terms.asmo.hasInputParameter

Conditions can also be applied on top:

In [None]:
kg.terms.rdfs.label@kg.terms.asmo.hasInputParameter=='label_string'
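
The sketch below shows one way the `@` operator can be implemented in Python, through `__matmul__`, so that a term remembers which parent it belongs to. `ToyTerm` is hypothetical and is not atomRDF's class. The `Parent_namevalue` naming follows the column names that appear in this notebook's query results, such as `InputParameter_labelvalue`.

```python
class ToyTerm:
    """Hypothetical term that can be bound to a parent with the @ operator."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.condition = None

    def __matmul__(self, parent):
        # term @ parent returns a copy of the term bound to that parent,
        # leaving the original, unbound term untouched.
        return ToyTerm(self.name, parent=parent)

    @property
    def variable(self):
        prefix = f"{self.parent.name}_" if self.parent is not None else ""
        return f"?{prefix}{self.name}value"

    def __eq__(self, value):
        # Store the condition as text and return the term itself.
        self.condition = f'{self.variable}="{value}"'
        return self


label = ToyTerm("label")
input_parameter = ToyTerm("InputParameter")

# @ binds more tightly than ==, so this reads as (label @ input_parameter) == ...
resolved = label @ input_parameter == "temperature"
print(resolved.variable)   # ?InputParameter_labelvalue
print(resolved.condition)  # ?InputParameter_labelvalue="temperature"
```

Since `@` has higher precedence than `==` in Python, no extra parentheses are needed when a condition is applied to a resolved term.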

That summarises all the possible options. Now we put these building blocks together to formulate some more complex queries.

__All samples that have been used for an energy calculation__

In [26]:
kg.auto_query(kg.terms.cmso.AtomicScaleSample, kg.terms.asmo.EnergyCalculation)

Unnamed: 0,AtomicScaleSample,EnergyCalculation
0,sample:e54c0e91-52ec-4c47-8ba3-43979a1ebe2e,activity:f61a2139-2dae-4aab-954e-73d34d7bc042
1,sample:721b7447-8363-4e65-9515-9da2581d7124,activity:0848b931-d647-41c7-a6dc-8150989e36c7
2,sample:ab2bea57-39ea-49ea-ad3f-c1c40b013154,activity:8a680cb2-c7f1-4747-95b0-a4ce71fab87f
3,sample:b1f52dc6-5c92-428f-8f7a-78794fd0544c,activity:2e461195-15a4-45ba-b369-5a2429ded084
4,sample:d015cfca-e047-40bc-baab-423e87fa2618,activity:1e081e86-73fd-45e5-8341-cab787b9ff0c
5,sample:fb01a7f2-8984-442b-a32e-15321c4fa99b,activity:923e1808-efdf-4a6a-a5de-9e0a64cb198c


__Which of those had an input parameter called temperature?__

In [27]:
kg.auto_query(kg.terms.cmso.AtomicScaleSample, [kg.terms.asmo.EnergyCalculation,
                                                kg.terms.rdfs.label@kg.terms.asmo.InputParameter=='temperature'])

Unnamed: 0,AtomicScaleSample,EnergyCalculation,InputParameter_labelvalue
0,sample:e54c0e91-52ec-4c47-8ba3-43979a1ebe2e,activity:f61a2139-2dae-4aab-954e-73d34d7bc042,temperature


And its value:

In [28]:
kg.auto_query(kg.terms.cmso.AtomicScaleSample, [kg.terms.asmo.EnergyCalculation,
                                                kg.terms.rdfs.label@kg.terms.asmo.InputParameter=='temperature',
                                                kg.terms.asmo.hasValue@kg.terms.asmo.InputParameter])

Unnamed: 0,AtomicScaleSample,EnergyCalculation,InputParameter_labelvalue,InputParameter_hasValuevalue
0,sample:e54c0e91-52ec-4c47-8ba3-43979a1ebe2e,activity:f61a2139-2dae-4aab-954e-73d34d7bc042,temperature,500.0


**What are the composition and space group of these structures?**

In [30]:
ss = kg.auto_query(
    kg.terms.cmso.AtomicScaleSample,
    [
        kg.terms.rdfs.label@kg.terms.asmo.InputParameter=='temperature',
        kg.terms.asmo.hasValue@kg.terms.asmo.InputParameter,
        kg.terms.cmso.hasSpaceGroupSymbol,
        kg.terms.cmso.hasChemicalSymbol,
        kg.terms.cmso.hasElementRatio,
    ],
    return_query=True,
)

In [31]:
print(ss)

PREFIX cmso: <http://purls.helmholtz-metadaten.de/cmso/>
PREFIX qudt: <http://qudt.org/schema/qudt/>
PREFIX pldo: <http://purls.helmholtz-metadaten.de/pldo/>
PREFIX podo: <http://purls.helmholtz-metadaten.de/podo/>
PREFIX asmo: <http://purls.helmholtz-metadaten.de/asmo/>
PREFIX ns: <http://www.w3.org/ns/>
PREFIX calculation: <https://w3id.org/mdo/calculation/>
PREFIX ldo: <http://purls.helmholtz-metadaten.de/cdos/ldo/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?AtomicScaleSample ?InputParameter_labelvalue ?InputParameter_hasValuevalue ?hasSpaceGroupSymbolvalue ?hasChemicalSymbolvalue ?hasElementRatiovalue
WHERE {
    ?AtomicScaleSample prov:wasGeneratedBy ?asmo_EnergyCalculation .
    ?asmo_EnergyCalculation asmo:hasInputParameter ?InputParameter .
    ?InputParameter rdfs:label ?InputParameter_labelvalue .
    ?InputParameter asmo:hasValue ?InputParameter_hasV