## A quick tutorial on `prank`

#### First step
First, we need to import the text data and tag it. Thanks to `spaCy`, this is quick and simple.

In [1]:
from prank.object import Docs
mydocs = Docs('./data/toy.txt')
mydocs.initialize()

1it [00:00,  1.88it/s]


*The `initialize` method accepts a `preload` keyword that controls how much text is loaded. This is useful for very large data, such as `wiki` with more than 6GB of text.*

#### Second step
Next, we define a way to search for new tuples and patterns in the raw text data. Here we follow the [PRDualRank](https://dl.acm.org/doi/10.1145/1935826.1935933) work.

In [2]:
from prank.search import PRDualRankSearch
searcher = PRDualRankSearch(mydocs)

Then we can start searching (bootstrapping) for new tuples and patterns, based on the given seed tuples. Here the seed tuples come from the relation `multiply two`.

In [15]:
from prank.world import *
from prank.object import Tuple, Pattern

seeds = set([
    Tuple("1", "2", seed=True), Tuple('2', '4', seed=True), 
    Tuple("3", '6', seed=True), Tuple('4', '8', seed=True)
])
# Note: Tuple and Pattern automatically record every instance we create.
# Use Tuple.tuples() or Pattern.patterns() to access them.

propagate_time = 20
for _ in range(propagate_time):
    searcher.fromTuple2Pattern(Tuple.tuples())
    searcher.fromPattern2Tuple(Pattern.patterns(), Tuple.tuples())
print(ystr(f"Found {Pattern.pattern_num()} patterns"))
print(ystr(f"Found {Tuple.tuple_num()} tuples"))
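
The loop above alternates between two directions: from known tuples to the patterns (contexts) they occur in, and from those patterns back to new tuples. The following is a toy sketch of that idea on three hand-written sentences. It does not use `prank` or `spaCy`; the corpus, the regex-based matching, and the helper names `tuples_to_patterns` and `patterns_to_tuples` are assumptions made purely for illustration.

```python
import re

# Hypothetical three-sentence corpus, made up for this illustration.
corpus = [
    "2 is twice 1",
    "4 is twice 2",
    "10 is twice 5",
]

def tuples_to_patterns(tuples, sentences):
    """Collect the contexts in which a known (x, y) pair occurs as 'y <context> x'."""
    patterns = set()
    for x, y in tuples:
        for sentence in sentences:
            match = re.fullmatch(rf"{re.escape(y)} (.+) {re.escape(x)}", sentence)
            if match:
                patterns.add(match.group(1))
    return patterns

def patterns_to_tuples(patterns, sentences):
    """Apply every context to every sentence and collect the number pairs it matches."""
    found = set()
    for context in patterns:
        regex = re.compile(rf"(\d+) {re.escape(context)} (\d+)")
        for sentence in sentences:
            match = regex.fullmatch(sentence)
            if match:
                # The sentence reads 'y <context> x', so swap the groups to get (x, y).
                found.add((match.group(2), match.group(1)))
    return found

seed_tuples = {("1", "2"), ("2", "4")}
patterns = tuples_to_patterns(seed_tuples, corpus)
tuples = seed_tuples | patterns_to_tuples(patterns, corpus)
print(patterns)
print(sorted(tuples))
```

In this toy run the two seeds yield a single context, and applying that context to the corpus brings back one pair that was not among the seeds. Repeating the two steps is what the `for` loop in the cell above does with the real searcher.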

Let's take a look at those tuples and patterns.

In [16]:
print(gstr("Pattern examples:"), Pattern.patterns())
print(gstr("Tuple examples:"), Tuple.tuples()[10:15])

So far, we have collected a number of tuples and patterns that may be relevant to the relation we are interested in. The next step is to rank them.

#### Third step
Infer the precision and recall of tuples and patterns, respectively

In [17]:
from prank.inference import PRDualRank
inferor = PRDualRank()

tuples = Tuple.remainTopK(20)

relation = {
    tup : tup.relationship for tup in tuples
}

results = inferor.infer(
    Tuple.tuples(),
    Pattern.patterns(),
    relation,
    seed_tuples=list(seeds),
    max_iter=10
)

In [19]:
list(results)

['tuple precision', 'pattern precision', 'tuple recall', 'pattern recall']
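
As a rough intuition for the precision part of these results, scores can be passed along the graph that links each pattern to the tuples it extracts: seed tuples count as correct, a pattern is as precise as the tuples it extracts, and a new tuple is as precise as the patterns that extract it. The sketch below shows a single, unweighted round of this idea on a hypothetical graph. It is not the `PRDualRank` implementation in `prank` (which is run for several iterations via `max_iter`), it ignores recall, and the graph and the helper name `one_round_precision` are assumptions.

```python
# Hypothetical bipartite graph: which pattern extracts which tuples.
extracts = {
    "p_double": [("1", "2"), ("3", "6"), ("5", "10")],
    "p_noise": [("1", "2"), ("7", "9")],
}
seeds = {("1", "2"), ("3", "6")}

def one_round_precision(extracts, seeds):
    # Pattern precision: share of the pattern's tuples that are known seeds.
    pattern_prec = {
        pattern: sum(t in seeds for t in tuples) / len(tuples)
        for pattern, tuples in extracts.items()
    }
    # Invert the graph so every tuple knows the patterns that extract it.
    extracted_by = {}
    for pattern, tuples in extracts.items():
        for tup in tuples:
            extracted_by.setdefault(tup, []).append(pattern)
    # Tuple precision: 1 for seeds, otherwise the mean precision of its patterns.
    tuple_prec = {
        tup: 1.0 if tup in seeds
        else sum(pattern_prec[p] for p in patterns) / len(patterns)
        for tup, patterns in extracted_by.items()
    }
    return tuple_prec, pattern_prec

tuple_prec, pattern_prec = one_round_precision(extracts, seeds)
for tup, score in sorted(tuple_prec.items(), key=lambda kv: kv[1]):
    print(tup, round(score, 3))
```

Here the non-seed tuple extracted by the pattern that mostly finds seeds ends up with a higher score than the tuple extracted only by the noisier pattern.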

#### The last step
Use the precision and recall to rank the tuples and the patterns, respectively.
Here we use the F1 score to balance both metrics.

In [18]:
from prank.rank import f1_score_rank
top_t, top_p = f1_score_rank(results, inferor)
print(gstr("Top-8 tuples:"), top_t[-8:])
print(gstr("Top-2 patterns:"), top_p[-2:])
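
For reference, the F1 score is the harmonic mean of precision and recall, `2 * P * R / (P + R)`. The sketch below ranks a few items by it. The scores are made up for illustration and are not output of the run above, and the ascending sort is an assumption chosen so that the best items sit at the end of the list, as the `[-8:]` and `[-2:]` slices in the cell suggest; how `f1_score_rank` orders its output internally is not confirmed here.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical scores, not output of the tutorial run.
precision = {"(24, 48)": 0.9, "(15, 30)": 0.8, "(2, 3)": 0.3}
recall = {"(24, 48)": 0.5, "(15, 30)": 0.6, "(2, 3)": 0.1}

scores = {item: f1(precision[item], recall[item]) for item in precision}
ranked = sorted(scores, key=scores.get)  # ascending: best items come last
print(ranked[-2:])
```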

Finally, we can look at the results.
They are intuitively reasonable: starting from the seed tuples, the inference finds similar tuples such as `(24, 48)` and `(15,30)`.
However, the top tuples still contain wrong answers, which is easy to explain.
Our seed tuples come from the relation `multiply two`, but some of them also satisfy other relations:

* `(2,4)` also satisfies the `square` relation
* `(1,2)` also satisfies the `plus one` relation

This ambiguity is one of the difficulties of tuple extraction.
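
The ambiguity can be checked directly on the four seed pairs. The small sketch below tests each seed against three candidate relations; the `relations` dictionary is written for this illustration and is not part of `prank`.

```python
# Candidate relations that a seed pair (x, y) might express.
relations = {
    "multiply two": lambda x, y: y == 2 * x,
    "square": lambda x, y: y == x * x,
    "plus one": lambda x, y: y == x + 1,
}
seeds = [(1, 2), (2, 4), (3, 6), (4, 8)]

# For every seed, list the relations it is consistent with.
matches = {
    pair: [name for name, holds in relations.items() if holds(*pair)]
    for pair in seeds
}
for pair, names in matches.items():
    print(pair, names)

# Relations that hold for all seeds at once.
shared = [
    name for name, holds in relations.items()
    if all(holds(x, y) for x, y in seeds)
]
print("consistent with every seed:", shared)
```

Taken together, the seeds single out one relation, but an individual seed can still co-occur with patterns that express a different one, and those patterns then pull in unrelated tuples.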