# Examples

Connect to IPFS and instantiate the configured indices for searching.
This may take a while, depending on your IPFS performance.

In [None]:
from libstc_geck.client import StcGeck
geck = StcGeck(
    ipfs_http_base_url='http://127.0.0.1:8080',
    index_names=('stc',),
    timeout=300,
)
await geck.start()

GECK wraps the Python client for Summa. The underlying server can be either an external stand-alone server or an embedded one; those details are hidden behind the `SummaClient` interface.

In [None]:
summa_client = geck.get_summa_client()

The match query below returns the top 5 documents that contain `additive manufacturing` in their title, abstract, or content.

In [None]:
import json

search_response = await summa_client.search([{
    "index_alias": "stc",
    "query": {
        "match": {
            "value": "additive manufacturing",
            "query_parser_config": {"default_fields": ["abstract", "title", "content"]}
        }
    },
    "collectors": [{"top_docs": {"limit": 5}}],
    "is_fieldnorms_scoring_enabled": False,
}])
for scored_document in search_response.collector_outputs[0].documents.scored_documents:
    document = json.loads(scored_document.document)
    print('DOI:', document['doi'])
    print('Title:', document['title'])
    print('Abstract:', document.get('abstract'))
    print('Links:', document.get('links'))
    print('-----')
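The request passed to `search` above is a plain dictionary, so it can be assembled by a small helper when the same shape is reused with different terms. The following is a minimal sketch; `build_match_request` is a hypothetical name introduced here for illustration and is not part of `libstc_geck` or Summa. It only reproduces the keys used in the cell above.

```python
def build_match_request(value, fields, limit=5, index_alias="stc"):
    """Build one search request dict in the same shape as the cell above.

    Hypothetical helper for illustration only; not a library function.
    """
    return {
        "index_alias": index_alias,
        "query": {
            "match": {
                "value": value,
                # Fields searched when the query text names no field explicitly
                "query_parser_config": {"default_fields": list(fields)},
            }
        },
        # Return only the `limit` highest-scored documents
        "collectors": [{"top_docs": {"limit": limit}}],
        "is_fieldnorms_scoring_enabled": False,
    }


request = build_match_request(
    "additive manufacturing", ["abstract", "title", "content"]
)
```

The resulting dictionary can then be sent the same way as before, for example with `await summa_client.search([request])`.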

Let's download the PDFs. The `links` field of a document contains the IPFS hashes (CIDs) of the article files. So first we check that a link is present, and then we download the file.

In [None]:
from urllib.parse import quote

for scored_document in search_response.collector_outputs[0].documents.scored_documents:
    document = json.loads(scored_document.document)
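    links = document.get('links')
    if not links:
        # Skip documents with a missing or empty `links` field
        continue
    link = links[0]
    # Download first, so that a failed download does not leave an empty file behind
    pdf_file = await geck.download(link['cid'])
    # A DOI contains '/', so percent-encode it to obtain a valid file name
    with open(quote(document['doi'], safe='') + '.' + link['extension'], 'wb') as f:
        f.write(pdf_file)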
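The file name in the cell above is built with `quote(..., safe='')` because a DOI contains `/`, which cannot appear inside a file name. The sketch below shows that step in isolation, using only the standard library; `doi_to_filename` is a hypothetical helper name, and the DOI is the one used in the term-search example further down.

```python
from urllib.parse import quote, unquote


def doi_to_filename(doi, extension):
    """Percent-encode a DOI so it can be used as a file name (illustrative helper)."""
    # safe='' makes quote() encode '/' as well, which it leaves intact by default
    return quote(doi, safe='') + '.' + extension


name = doi_to_filename('10.1109/healthcom54947.2022.9982758', 'pdf')
print(name)  # 10.1109%2Fhealthcom54947.2022.9982758.pdf

# The original DOI can be recovered from the file name
recovered = unquote(name.rsplit('.', 1)[0])
print(recovered)  # 10.1109/healthcom54947.2022.9982758
```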

Below are several more examples of search queries. Further documentation on the Summa query DSL can be found at https://izihawa.github.io/summa/core/query-dsl/

In [None]:
# Term search in science collection
await summa_client.search([{
    "index_alias": "stc",
    "query": {"term": {"field": "doi", "value": "10.1109/healthcom54947.2022.9982758"}},
    "collectors": [{"top_docs": {"limit": 1}}],
    "is_fieldnorms_scoring_enabled": False,
}])

In [None]:
# Boolean query combining two match subqueries; a count collector also returns the number of hits
await summa_client.search([{
    "index_alias": "stc",
    "query": {"boolean": {"subqueries": [{
        "occur": "should",
        "query": {
            "match": {
                "value": "hemoglobin",
                "query_parser_config": {"default_fields": ["title"]},
            },
        },
    }, {
        "occur": "should",
        "query": {
            "match": {
                "value": "fetal",
                "query_parser_config": {"default_fields": ["title"]},
            },
        },
    }]}},
    "collectors": [{"top_docs": {"limit": 5}}, {"count": {}}],
    "is_fieldnorms_scoring_enabled": False,
}])
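The boolean query above repeats the same `should` subquery for each term, so the `query` part can be generated from a list of terms. This is a minimal sketch under that assumption; `build_should_query` is a hypothetical name, not a Summa or `libstc_geck` function, and it emits only the keys already shown in the cell above.

```python
def build_should_query(terms, field):
    """Build a boolean query with one `should` match subquery per term.

    Hypothetical helper for illustration only.
    """
    return {
        "boolean": {
            "subqueries": [
                {
                    "occur": "should",
                    "query": {
                        "match": {
                            "value": term,
                            "query_parser_config": {"default_fields": [field]},
                        },
                    },
                }
                for term in terms
            ]
        }
    }


query = build_should_query(["hemoglobin", "fetal"], "title")
```

The result can be placed under the `"query"` key of a request, next to the same `"index_alias"` and `"collectors"` entries used above.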