# Using HDT hops in Python
## 1. Install the modified pyHDT library

`git clone https://github.com/webdata/pyHDT.git`

`cd pyHDT`

`./install.sh`

## 2. Load the HDT document

In [59]:
from hdt import HDTDocument
from enum import Enum

# Load an HDT file. Missing indexes are generated automatically.
document = HDTDocument("test.hdt")

### Quick test: search and print the first 10 triples in the dataset

In [61]:
# Fetch all triples that match { ?s ?p ?o }.
# Use empty strings ("") to indicate variables.
(triples, cardinality) = document.search_triples("", "", "")

print("cardinality of { ?s ?p ?o }: %i" % cardinality)

# Search also supports limit and offset
(triples, cardinality) = document.search_triples("", "", "", limit=10)
print("\nPrinting the first 10 triples:")
for triple in triples:
    print(triple)



#document.string_to_id("http://example.org/uri3",TripleComponentRole.SUBJECT)

cardinality of { ?s ?p ?o }: 15

Printing the first 10 triples:
('http://example.org/uri3', 'http://example.org/predicate3', 'http://example.org/uri4')
('http://example.org/uri3', 'http://example.org/predicate3', 'http://example.org/uri5')
('http://example.org/uri4', 'http://example.org/predicate4', 'http://example.org/uri5')
('http://example.org/uri5', 'http://example.org/predicate1', 'http://example.org/uri5')
('http://example.org/uri5', 'http://example.org/predicate2', 'http://example.org/uri5')
('http://example.org/uri6', 'http://example.org/predicate4', 'http://example.org/uri5')
('http://example.org/uri1', 'http://example.org/predicate1', '"literal1"')
('http://example.org/uri1', 'http://example.org/predicate1', '"literalA"')
('http://example.org/uri1', 'http://example.org/predicate1', '"literalB"')
('http://example.org/uri1', 'http://example.org/predicate1', '"literalC"')
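Since the search supports limit and offset, a large dataset can be read one page at a time instead of all at once. The following is a minimal sketch, not part of the library: the helper name `iter_triple_pages` is hypothetical, and it assumes that `offset` is accepted as a keyword argument in the same way as `limit`.

```python
def iter_triple_pages(document, page_size=10):
    """Yield lists of triples matching { ?s ?p ?o }, page_size at a time.

    `document` is expected to be an HDTDocument. Assumption: search_triples
    accepts `limit` and `offset` as keyword arguments.
    """
    offset = 0
    while True:
        triples, _cardinality = document.search_triples(
            "", "", "", limit=page_size, offset=offset
        )
        page = list(triples)
        if not page:
            # No more results: stop paging.
            break
        yield page
        offset += len(page)


# Hypothetical usage with the document loaded above:
# for page in iter_triple_pages(document, page_size=10):
#     for triple in page:
#         print(triple)
```

Advancing the offset by the number of triples actually returned keeps the loop correct even when the last page is shorter than `page_size`.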


## 3. Configure the hop functionality
The function `configure_hops` sets the main parameters, which can then be reused across multiple hop computations.

The arguments are as follows:
1. Number of hops
2. List of predicates `["predicate1","predicate2"]`: keep only those hops whose predicate is in the given list
3. List of prefixes `["prefix1","prefix2"]`: keep only those hops to terms that start with a given prefix
4. Continuous mapping (`True|False`): use `True` for a novel continuous dictionary (object IDs follow the subject IDs) or `False` for the default HDT dictionary. `True` is recommended


In [63]:
document.configure_hops(2, ["http://example.org/predicate1", "http://example.org/predicate2"], "http://example.org", True)
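Because the configuration is reused, one call to `configure_hops` can serve several hop computations (`compute_hops` is described in the next section). The sketch below illustrates that pattern; the helper `hops_for_seed_sets` is a hypothetical name, not part of the library, and it assumes `configure_hops` has already been called on the document.

```python
def hops_for_seed_sets(document, seed_sets):
    """Run compute_hops once per seed list, reusing the current configuration.

    Assumption: document.configure_hops(...) was called beforehand, so every
    computation below shares the same hop count, predicates and prefix filter.
    Returns a dict mapping each seed list (as a tuple) to its result.
    """
    return {
        tuple(seeds): document.compute_hops(list(seeds))
        for seeds in seed_sets
    }


# Hypothetical usage with the document configured above:
# results = hops_for_seed_sets(document, [[1], [5], [1, 5]])
# print(results[(1,)])
```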


## 4. Run the hops

The function `compute_hops` computes the hops starting from a given set of entities.

Its argument is a list of entity IDs, `[id_entity_1,id_entity_2]`.

The result is a tuple with three components:
1. The entity IDs of the result, `[result_ID_X,result_ID_Y]`, in the order used by the predicate matrices, i.e. index 0 --> result_ID_X, index 1 --> result_ID_Y, etc.
2. The list of predicates in the result set, `[predicate_ID_V,predicate_ID_W]`, in the same order as the matrices that follow.
3. The predicate matrices, one per predicate, each given as a list of (subject,object) pairs, i.e. `[[(ID_S,ID_O),...],[(ID_S,ID_O),...]]`

In [64]:
document.compute_hops([1])

([5, 1, 12], [2], [[(0, 1), (0, 2)]])
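To make the result easier to read, the three components can be combined back into ID triples. The sketch below assumes, following the description above, that each (subject,object) pair holds positions into the entity list and that the i-th matrix belongs to the i-th predicate. The helper name `decode_hops` is hypothetical.

```python
def decode_hops(result):
    """Turn a compute_hops result into (subject_id, predicate_id, object_id) triples.

    Assumption: pairs in each predicate matrix are indexes into the entity
    list, and matrices are in the same order as the predicate list.
    """
    entity_ids, predicate_ids, matrices = result
    triples = []
    for predicate_id, pairs in zip(predicate_ids, matrices):
        for subject_pos, object_pos in pairs:
            triples.append(
                (entity_ids[subject_pos], predicate_id, entity_ids[object_pos])
            )
    return triples


# The result shown above, copied as a literal so the sketch runs on its own.
result = ([5, 1, 12], [2], [[(0, 1), (0, 2)]])
print(decode_hops(result))  # [(5, 2, 1), (5, 2, 12)]
```

Under that reading, the example result says that entity 5 is linked through predicate 2 to entities 1 and 12.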