# Examples

Connect to IPFS and instantiate the configured indices for searching.
This may take a while, depending on your IPFS performance.

In [1]:
from stc_geck.client import StcGeck
geck = StcGeck(
    ipfs_http_base_url='http://127.0.0.1:8080',
    timeout=300,
)
await geck.start()

GECK wraps the Python client for Summa. The client can talk to either an external stand-alone server or an embedded one; the details are hidden behind the `SummaClient` interface.

In [2]:
summa_client = geck.get_summa_client()

A match search returns the top 5 documents that match `additive manufacturing` in their title, abstract, or content.

In [3]:
import json

search_response = await summa_client.search({
    "index_alias": "nexus_science",
    "query": {
        "match": {
            "value": "additive manufacturing",
            "query_parser_config": {"default_fields": ["abstract", "title", "content"]}
        }
    },
    "collectors": [{"top_docs": {"limit": 5}}],
    "is_fieldnorms_scoring_enabled": False,
})
for scored_document in search_response.collector_outputs[0].documents.scored_documents:
    document = json.loads(scored_document.document)
    print('ID:', document['id'])
    print('DOI:', document['id']['dois'])
    print('Title:', document['title'])
    print('Abstract:', document.get('abstract'))
    print('Links:', document.get('links'))
    print('-----')

ID: {'dois': ['10.21203/rs.3.rs-212116/v1']}
DOI: ['10.21203/rs.3.rs-212116/v1']
Title: Research on Gradient Additive Remanufacturing of Ultra-large Hot Forging Die Based on Automatic Wire Arc Additive Manufacturing Technology
Abstract: <abstract><header>Abstract</header>
<p>In this paper, an automatic WAAM technology are proposed to realize the gradient additive remanufacturing of ultra-large hot forging dies. Firstly, a vertical additive manufacturing strategy and a normal additive manufacturing strategy are proposed to meet different additive manufacturing demands. Secondly, the basic principle of layering design of ultra-large hot forging dies is developed, and the wear resistance of Ni-based, Co-based and Fe-based alloys at room temperature and high temperature is analyzed. The Co-based alloy has the best high temperature wear resistance, which can be used on the surface of the hot forging die to strengthen the die. In order to control the forming quality of additive manufacturing

Let's download the PDFs. The `links` field of a document contains the IPFS hashes (CIDs) of the article files, so we first check that a link is present and then download the file.

In [4]:
from urllib.parse import quote

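for scored_document in search_response.collector_outputs[0].documents.scored_documents:
    document = json.loads(scored_document.document)
    # Skip documents that have no attached files (missing or empty `links`)
    if not document.get('links'):
        continue
    link = document['links'][0]
    # Download first so that a failed download does not leave an empty file behind
    pdf_file = await geck.download(link['cid'])
    # DOIs contain slashes, so percent-encode them to get a valid file name
    file_name = quote(document['id']['dois'][0], safe='') + '.' + link['extension']
    with open(file_name, 'wb') as f:
        f.write(pdf_file)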
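
The file name in the cell above is built by percent-encoding the DOI. Below is a minimal sketch of that step in isolation, using only the standard library. The helper name `doi_to_file_name` is hypothetical and not part of GECK.

```python
from urllib.parse import quote, unquote


def doi_to_file_name(doi: str, extension: str) -> str:
    """Build a file name from a DOI (hypothetical helper, not part of GECK)."""
    # safe='' makes quote() escape '/' as well, so the DOI cannot be
    # interpreted as a directory path
    return quote(doi, safe='') + '.' + extension


file_name = doi_to_file_name('10.21203/rs.3.rs-212116/v1', 'pdf')
print(file_name)

# The encoding is reversible, so the DOI can be recovered from the file name
print(unquote(file_name.rsplit('.', 1)[0]))
```

Because the encoding is reversible, downloaded files can later be mapped back to their DOIs without keeping a separate lookup table.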

Below are several more examples of search queries. More documentation on querying Summa can be found at https://izihawa.github.io/summa/core/query-dsl/

In [5]:
# Term search in science collection
search_response = await summa_client.search({
    "index_alias": "nexus_science",
    "query": {"term": {"field": "id.dois", "value": "10.1109/healthcom54947.2022.9982758"}},
    "collectors": [{"top_docs": {"limit": 1}}],
    "is_fieldnorms_scoring_enabled": False,
})
for found_document in search_response.collector_outputs[0].documents.scored_documents:
    document = json.loads(found_document.document)
    print(document['title'])
    print('-----')

Long COVID Diary - Design and Development of a Support Application for People with Long COVID
-----
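
The requests in this notebook share the same envelope: an index alias, a query, a list of collectors, and the scoring flag. As a sketch, a small helper can assemble that dictionary so that only the query changes between calls. `build_search_request` is a hypothetical name, not part of GECK or Summa; it only builds a plain dictionary with the same keys the cells in this notebook use.

```python
def build_search_request(query, limit=5, index_alias='nexus_science', with_count=False):
    """Assemble a search request dict (hypothetical helper, not a GECK or Summa API)."""
    collectors = [{'top_docs': {'limit': limit}}]
    if with_count:
        # Optional second collector, as used by the count example in this notebook
        collectors.append({'count': {}})
    return {
        'index_alias': index_alias,
        'query': query,
        'collectors': collectors,
        'is_fieldnorms_scoring_enabled': False,
    }


# Rebuild the term query from the cell above
request = build_search_request(
    {'term': {'field': 'id.dois', 'value': '10.1109/healthcom54947.2022.9982758'}},
    limit=1,
)
print(request)
```

The resulting dictionary can be passed to `summa_client.search(request)` in the same way as the literal dictionaries.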


In [6]:
# Boolean query with two `should` subqueries; the second collector also counts the matching documents
search_response = await summa_client.search({
    "index_alias": "nexus_science",
    "query": {"boolean": {"subqueries": [{
        "occur": "should",
        "query": {
            "match": {
                "value": "hemoglobin",
                "query_parser_config": {"default_fields": ["title"]},
            },
        },
    }, {
        "occur": "should",
        "query": {
            "match": {
                "value": "fetal",
                "query_parser_config": {"default_fields": ["title"]},
            },
        },
    }]}},
    "collectors": [{"top_docs": {"limit": 5}}, {"count": {}}],
    "is_fieldnorms_scoring_enabled": False,
})
for found_document in search_response.collector_outputs[0].documents.scored_documents:
    document = json.loads(found_document.document)
    print(document['title'])
    print('-----')

Fetal hemoglobin-containing cells have the same mean corpuscular hemoglobin as cells without fetal hemoglobin: a reciprocal relationship between gamma- and beta-globin gene expression in normal subjects and in those with high fetal hemoglobin production
-----
Fetal hemoglobin-containing cells have the same mean corpuscular hemoglobin as cells without fetal hemoglobin: a reciprocal relationship between gamma- and beta-globin gene expression in normal subjects and in those with high fetal hemoglobin production
-----
Fetal hemoglobin production in cultures of primitive and mature human erythroid progenitors: differentiation affects the quantity of fetal hemoglobin produced per fetal-hemoglobin-containing cell
-----
Fetal hemoglobin production in cultures of primitive and mature human erythroid progenitors: differentiation affects the quantity of fetal hemoglobin produced per fetal-hemoglobin-containing cell
-----
Fetal hemoglobin (HbF) synthesis in baboons, Papio cynocephalus. Analysis of