# How to query the PAP knowledge base: a Jupyter notebook guide

For the code in this guide to work, the services that make up PAP must be running. Please refer to the [README](https://github.com/mobr-ai/PolkadotAnalytics/blob/main/README.md) file to learn how to run these services.

The KBM class provides an abstraction layer on top of our triplestore's SPARQL engine. A triplestore is a database that stores and manages semantic data in the form of subject-predicate-object triples. Together, the triplestore, the PAP endpoints, and the POnto ontology form the basis of the PAP knowledge base. SPARQL is the W3C standard language for querying triplestores.
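To make the subject-predicate-object model concrete, here is a purely illustrative sketch of a tiny in-memory triple store in plain Python. It is not how PAP's triplestore works internally; the `TripleStore` class, its `match` method, and the sample `ponto:` strings are hypothetical stand-ins. The point is that a triple pattern with unbound positions (the role SPARQL variables play) selects every matching triple.

```python
from typing import Optional

Triple = tuple[str, str, str]


class TripleStore:
    """Hypothetical toy store holding (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[Triple]:
        """Return matching triples, sorted; None acts like a SPARQL variable."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TripleStore()
store.add("ponto:Acala", "rdf:type", "ponto:Parachain")
store.add("ponto:Moonbeam", "rdf:type", "ponto:Parachain")
store.add("ponto:a1", "rdf:type", "ponto:Account")

# Roughly analogous to: SELECT ?s WHERE { ?s a ponto:Parachain }
# (in SPARQL, `a` is shorthand for rdf:type)
parachains = [s for s, _, _ in store.match(p="rdf:type", o="ponto:Parachain")]
print(parachains)  # ['ponto:Acala', 'ponto:Moonbeam']
```

A real triplestore adds indexing, persistence, and a full query language on top of this basic idea.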

The first step in querying PAP's knowledge base is to import the KBM (Knowledge Base Manager) class:

```python
from pap.datalayer.kbm import KBM
```

Then we can run a query using the method:

```python
KBM.run_sparql(sparql_spec:str, term:str)
```

In a SPARQL query, the SELECT clause usually lists the variables you are interested in, or `*` to select every variable bound in the query pattern. The second parameter of `KBM.run_sparql`, `term`, names the variable of interest, and the results are retrieved for that variable.
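This guide does not describe how `run_sparql` extracts the requested term from the engine's response. As an illustration only, the sketch below assumes the triplestore answers in the standard SPARQL 1.1 Query Results JSON format and collects the value bound to one variable in each solution, shortening IRIs to their local names, as the outputs in this guide suggest. `extract_term` and the `http://example.org/ponto#` namespace are hypothetical.

```python
import json


def extract_term(results_json: str, term: str) -> list[str]:
    """Hypothetical helper: values bound to `term` in a SPARQL 1.1 JSON result."""
    results = json.loads(results_json)
    values = []
    for solution in results["results"]["bindings"]:
        if term not in solution:  # the variable is unbound in this solution
            continue
        node = solution[term]
        value = node["value"]
        if node["type"] == "uri":
            # Shorten an IRI to its local name, e.g. ...ponto#Acala -> Acala
            value = value.rsplit("#", 1)[-1].rsplit("/", 1)[-1]
        values.append(value)
    return values


# A made-up response in the SPARQL 1.1 Query Results JSON format
sample = json.dumps({
    "head": {"vars": ["s"]},
    "results": {"bindings": [
        {"s": {"type": "uri", "value": "http://example.org/ponto#Acala"}},
        {"s": {"type": "uri", "value": "http://example.org/ponto#Moonbeam"}},
    ]},
})

print(extract_term(sample, "s"))  # ['Acala', 'Moonbeam']
```

Literal values, such as the result of a `COUNT`, are returned unchanged as strings.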

The code below executes a query that retrieves all Parachains described in the knowledge base. The query is specified in the `sparql_spec` variable. Every query must declare the prefixes it uses. The `KBM.sparql_prefix` variable holds the POnto prefixes so we can reuse them in our queries, and other prefixes can be added as needed. Details about the SPARQL query language can be found [here](https://www.w3.org/TR/sparql11-query/).

Note that the result contains only a few Parachains, which were created solely to illustrate the process. The appropriate representation will be available only after PAP stage 2 is complete. The primary goal of stage 2 is to structure and develop processing workflows that gather raw data from the Polkadot ecosystem. The knowledge base will then dynamically maintain a formal representation of the extracted data, aligned with the POnto ontology.

> Tip: use the code snippets below to test your own queries

```python
import sys

# Add the parent directory to the module search path so `pap` can be imported
sys.path.append("../")

from pap.datalayer.kbm import KBM

# Select every individual typed as ponto:Parachain
sparql_spec = KBM.sparql_prefix + """
SELECT ?s
WHERE {
    ?s a ponto:Parachain
}
"""

r = KBM.run_sparql(sparql_spec, "s")
print(r)
```

```
['Astar', 'Statemine', 'Acala', 'Collectives', 'Moonbeam', 'Statemint', 'Phala']
```
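`KBM.sparql_prefix` can be combined with extra prefixes when a query needs vocabularies it does not cover. The sketch below shows one way to do that by rendering a dictionary of prefix labels and namespace IRIs as SPARQL `PREFIX` lines. `make_prefix_block` is a hypothetical helper, `base_prefix` merely stands in for `KBM.sparql_prefix`, and `http://example.org/ponto#` is a placeholder rather than POnto's actual namespace; the `rdfs` and FOAF IRIs are the standard ones.

```python
def make_prefix_block(prefixes: dict[str, str]) -> str:
    """Hypothetical helper: render {label: namespace IRI} as SPARQL PREFIX lines."""
    return "".join(f"PREFIX {label}: <{iri}>\n" for label, iri in prefixes.items())


# Stand-in for KBM.sparql_prefix; the ponto IRI is a placeholder, not POnto's real one
base_prefix = make_prefix_block({
    "ponto": "http://example.org/ponto#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
})

# An extra prefix needed by one particular query (FOAF is only an example)
extra_prefix = make_prefix_block({"foaf": "http://xmlns.com/foaf/0.1/"})

sparql_spec = base_prefix + extra_prefix + """
SELECT ?s
WHERE {
    ?s a ponto:Parachain
}
"""
print(sparql_spec)
```

In the notebook, you would prepend `KBM.sparql_prefix + extra_prefix` to the query text instead of `base_prefix + extra_prefix`, adding only prefixes that `KBM.sparql_prefix` does not already declare.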


This type of query can be specified for any class represented in the knowledge base. The code snippet below gets a list of all the POnto classes present in the KB.

```python
# List every POnto class: anything declared as owl:Class or having a superclass
sparql_spec = KBM.sparql_prefix + """
SELECT DISTINCT ?s
WHERE {
    { ?s a owl:Class . }
    UNION
    { ?s rdfs:subClassOf ?class . }
}
"""

r = KBM.run_sparql(sparql_spec, "s")
print(r)
```

```
['Extrinsic', 'Transaction', 'StakeHolder', 'OnlineMessage', 'Interoperability', 'Metadata', 'AccountHolder', 'BrowserWallet', 'Oracle', 'Teleport', 'SDK', 'Preimage', 'SlotLeader', 'MobileWallet', 'Wallet', 'Component', 'Collective', 'ConsensusMechanism', 'FinalityGadget', 'ProofOfStake', 'LedgerArchitecture', 'DataFeed', 'Inherent', 'Commission', 'RelayChain', 'DistributedLedger', 'Equivocation', 'StakePool', 'Treasury', 'NativeToken', 'FungibleToken', 'TechnicalCommittee', 'SCALE', 'Governance', 'Origin', 'NextSession', 'BLS', 'LedgerRecord', 'Node', 'ProofOfWork', 'Attestation', 'Parachain', 'KSM', 'PolkadotArchitecture', 'Block', 'Track', 'Oversubscribed', 'Curator', 'Bridge', 'Sender', 'ActiveNomination', 'Blockchain', 'SoftFork', 'VoteDelegation', 'Stake', 'Architecture', 'FullNode', 'LightNode', 'Session', 'ReGenesis', 'TVL', 'Authority', 'CommunityQueue', 'Liveness', 'InactiveNomination', 'Validator', 'Motion', 'ProxyAccount', 'AvailabilityCores', 'NodeAccount', 'Account', 'Co
```
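Since the Parachain query works for any class, it can be wrapped in a small function. The sketch below is not part of PAP: `instances_query` is a hypothetical helper that interpolates a class name into the same SELECT pattern, with a crude `str.isidentifier()` guard so arbitrary text cannot be spliced into the query. It only builds the query string.

```python
def instances_query(class_name: str, prefix: str = "") -> str:
    """Hypothetical helper: SELECT every individual typed as ponto:<class_name>."""
    if not class_name.isidentifier():  # crude guard against query injection
        raise ValueError(f"unexpected class name: {class_name!r}")
    return prefix + f"""
SELECT ?s
WHERE {{
    ?s a ponto:{class_name}
}}
"""


print(instances_query("Validator"))
```

In the notebook, `KBM.run_sparql(instances_query("Validator", KBM.sparql_prefix), "s")` would list the individuals of `ponto:Validator`, one of the classes in the list above.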

In the POnto ontology, each class has its definition stored as an `rdfs:comment`. Any POnto class can replace `ponto:XCMChannel` in the query below to answer "what is" questions.

```python
# Fetch the definition (rdfs:comment) of the XCMChannel class
sparql_spec = KBM.sparql_prefix + """
SELECT DISTINCT ?def
WHERE {
    ponto:XCMChannel rdfs:comment ?def
}
"""

r = KBM.run_sparql(sparql_spec, "def")
print(r)
```

```
['Represents a communication pathway between two or more Parachains or between a Parachain and the Relay Chain. It allows for the exchange of messages, instructions, and assets across different chains, facilitating inter-chain communication.']
```
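The "what is" pattern can likewise be wrapped in a function. In this sketch, `what_is` and `fake_run_sparql` are hypothetical. The query runner is passed in as a parameter so the helper can be exercised offline with a stand-in; the code assumes only the `run_sparql(sparql_spec, term)` call shape shown earlier in this guide.

```python
from typing import Callable

# Same call shape as KBM.run_sparql(sparql_spec, term)
RunSparql = Callable[[str, str], list[str]]


def what_is(class_name: str, run_sparql: RunSparql, prefix: str = "") -> list[str]:
    """Hypothetical helper: return the rdfs:comment of ponto:<class_name>."""
    if not class_name.isidentifier():  # crude guard against query injection
        raise ValueError(f"unexpected class name: {class_name!r}")
    sparql_spec = prefix + f"""
SELECT DISTINCT ?def
WHERE {{
    ponto:{class_name} rdfs:comment ?def
}}
"""
    return run_sparql(sparql_spec, "def")


# Offline stand-in for KBM.run_sparql that records what it was asked to run
calls: list[tuple[str, str]] = []


def fake_run_sparql(sparql_spec: str, term: str) -> list[str]:
    calls.append((sparql_spec, term))
    return ["<definition returned by the knowledge base>"]


print(what_is("XCMChannel", fake_run_sparql))
print(calls[0][0])
```

In the notebook, `what_is("XCMChannel", KBM.run_sparql, KBM.sparql_prefix)` would run the same query as the cell above.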


Now let's create some individuals to illustrate how expressive queries can be. Note that such individuals will be fetched automatically from the Polkadot ecosystem once PAP stage 2 is completed.

In this example, the triples describe three accounts, a1, a2, and a3, which hold 6.1, 4.9, and 5.2 DOT, respectively. In addition, account a3 holds 5.1 KSM.

In the code snippet below, the triples are stored in the `triples` variable, which is passed to the `KBM.inject_triples` method to inject them into the knowledge base.

```python
# Three example accounts; a3 holds both a DOT and a KSM token
triples = """
ponto:a1 a ponto:Account ;
    ponto:hasToken ponto:dot1 .

ponto:a2 a ponto:Account ;
    ponto:hasToken ponto:dot2 .

ponto:a3 a ponto:Account ;
    ponto:hasToken ponto:dot3 ;
    ponto:hasToken ponto:ksm1 .

ponto:dot1 a ponto:DOT ;
    ponto:hasBalance "6.1"^^xsd:decimal .

ponto:dot2 a ponto:DOT ;
    ponto:hasBalance "4.9"^^xsd:decimal .

ponto:dot3 a ponto:DOT ;
    ponto:hasBalance "5.2"^^xsd:decimal .

ponto:ksm1 a ponto:KSM ;
    ponto:hasBalance "5.1"^^xsd:decimal .
"""

# Add the triples to the knowledge base
KBM.inject_triples(triples)
```
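Writing Turtle by hand gets tedious as the number of individuals grows. One option is to generate it from ordinary Python data. `account_triples` is a hypothetical helper, not part of PAP; it emits the same kinds of statements as the cell above (accounts, `ponto:hasToken` links, and `xsd:decimal` balances) and uses `Decimal` to keep balances as exact decimal values. Like the cell above, it does not declare the `ponto:` and `xsd:` prefixes, on the assumption that they are resolved when the triples are injected.

```python
from decimal import Decimal

Holding = tuple[str, str, Decimal]  # (token id, token class, balance)


def account_triples(accounts: dict[str, list[Holding]]) -> str:
    """Hypothetical helper: render accounts and token holdings as Turtle."""
    lines = []
    for account, holdings in accounts.items():
        lines.append(f"ponto:{account} a ponto:Account .")
        for token_id, token_class, balance in holdings:
            lines.append(f"ponto:{account} ponto:hasToken ponto:{token_id} .")
            lines.append(f"ponto:{token_id} a ponto:{token_class} ;")
            lines.append(f'    ponto:hasBalance "{balance}"^^xsd:decimal .')
    return "\n".join(lines) + "\n"


accounts = {
    "a1": [("dot1", "DOT", Decimal("6.1"))],
    "a2": [("dot2", "DOT", Decimal("4.9"))],
    "a3": [("dot3", "DOT", Decimal("5.2")), ("ksm1", "KSM", Decimal("5.1"))],
}
generated = account_triples(accounts)
print(generated)
```

The resulting string could then be passed to `KBM.inject_triples(generated)`.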

Now let's use these triples in a more interesting query. The code snippet below specifies a SPARQL query that counts how many accounts hold both DOT and KSM tokens.

Again, note that the result is based on the triples we just injected into the triplestore and is for illustrative purposes only. The appropriate representation will be available only after PAP stage 2 is complete.

```python
# Count distinct accounts holding at least one DOT token and at least one KSM token
sparql_spec = KBM.sparql_prefix + """
SELECT (COUNT(DISTINCT ?account) AS ?numOfAccounts)
WHERE {
    ?account ponto:hasToken ?tokenD .
    ?tokenD a ponto:DOT .
    ?account ponto:hasToken ?tokenK .
    ?tokenK a ponto:KSM .
}
"""

r = KBM.run_sparql(sparql_spec, "numOfAccounts")
print(r)
```

```
['1']
```
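To see why the answer is 1, the same join can be reproduced in plain Python over a copy of the injected example data, typed by hand here rather than read from the knowledge base. Both pairs of triple patterns share `?account`, so the query keeps only accounts that appear among the DOT holders and the KSM holders alike, which amounts to a set intersection; `COUNT(DISTINCT ?account)` is the size of that set. The names `has_token`, `token_class`, and `accounts_holding` are illustrative only.

```python
# Hand-written mirror of the example triples (not read from the KB)
has_token = {
    ("a1", "dot1"),
    ("a2", "dot2"),
    ("a3", "dot3"),
    ("a3", "ksm1"),
}
token_class = {"dot1": "DOT", "dot2": "DOT", "dot3": "DOT", "ksm1": "KSM"}


def accounts_holding(cls: str) -> set[str]:
    """Accounts linked via hasToken to at least one token of class `cls`."""
    return {acc for acc, tok in has_token if token_class[tok] == cls}


# The shared ?account variable turns the two token patterns into an intersection
both = accounts_holding("DOT") & accounts_holding("KSM")
print(sorted(both), len(both))  # ['a3'] 1
```

The `DISTINCT` keyword matters once an account holds several tokens of the same class: an account with two DOT tokens and one KSM token would yield two solutions to the pattern, and a plain `COUNT(?account)` would count it twice. Also note that `run_sparql` returned the count as the string `'1'` inside a list, so `int(r[0])` gives a number.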
