# Examples

The cell below connects to IPFS and instantiates the configured indices for searching.
How long this takes depends on the performance of your IPFS node.

In [1]:
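from stc_geck.client import StcGeck

# 8080 is the default gateway port of a local IPFS (Kubo) node
geck = StcGeck(
    ipfs_http_base_url='http://127.0.0.1:8080',
    timeout=300,
)
# Top-level `await` works in Jupyter; in a plain script, call this inside an async function
await geck.start()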

GECK wraps the Python client for Summa. The underlying Summa server can be either an external stand-alone server or an embedded one; these details are hidden behind the `SummaClient` interface.

In [2]:
summa_client = geck.get_summa_client()

The match query below returns the top 5 documents that contain `additive manufacturing` in their title, abstract, or content.

In [3]:
from stc_geck.advices import format_document


documents = await summa_client.search_documents({
    "index_alias": "nexus_science",
    "query": {
        "match": {
            "value": "additive manufacturing",
            "query_parser_config": {"default_fields": ["abstract", "title", "content"]}
        }
    },
    "collectors": [{"top_docs": {"limit": 5}}],
    "is_fieldnorms_scoring_enabled": False,
})
for document in documents:
    print(format_document(document) + '\n')

Title: Research on Gradient Additive Remanufacturing of Ultra-large Hot Forging Die Based on Automatic Wire Arc Additive Manufacturing Technology
Authors: [{'family': 'Hong', 'given': 'Xiaoying', 'sequence': 'first'}, {'family': 'Xiao', 'given': 'Guiqian', 'sequence': 'additional'}, {'family': 'Zhang', 'given': 'Yancheng', 'sequence': 'additional'}, {'family': 'Zhou', 'given': 'Jie', 'sequence': 'additional'}]
ID: {'dois': ['10.21203/rs.3.rs-212116/v1']}
Links: [{'cid': 'bafyb4idhyrmmv3y35kioq4crkwq2qgev22cgj2vz5td5r2hixermzdgyiu', 'extension': 'pdf', 'filesize': 2641562, 'md5': 'ff5707679c428842ea48c8699963a8a0', 'type': 'primary'}]
Abstract: <abstract><header>Abstract</header>
<p>In this paper, an automatic WAAM technology are proposed to realize the gradient additive remanufacturing of ultra-large hot forging dies. Firstly, a vertical ad

Title: Additive Manufacturing Technologies: Rapid Prototyping to Direct Digital Manufacturing
Authors: [{'name': 'Ian Gibson, David W. Rosen, Bren
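
The request in the cell above is a plain dictionary, so it can be assembled by a small helper when only the search phrase and the limit vary. The sketch below is an illustration and not part of `stc_geck`: `build_match_request` is a hypothetical name, and its keys simply mirror the request shown above.

```python
def build_match_request(value, fields=("abstract", "title", "content"),
                        index_alias="nexus_science", limit=5):
    """Build a match-query request shaped like the one above (hypothetical helper)."""
    return {
        "index_alias": index_alias,
        "query": {
            "match": {
                "value": value,
                "query_parser_config": {"default_fields": list(fields)},
            }
        },
        "collectors": [{"top_docs": {"limit": limit}}],
        "is_fieldnorms_scoring_enabled": False,
    }


request = build_match_request("additive manufacturing")
```

The resulting dictionary can be passed to `summa_client.search_documents(request)` in the same way as the hand-written one.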

Let's download the PDFs. The helper method `download_document` stores the files in the current directory.

In [4]:
for document in documents:
    await geck.download_document(document)
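
Each entry in a document's `Links` field carries an IPFS content identifier (`cid`), so a file can also be addressed directly through an HTTP gateway at `/ipfs/<cid>`. The sketch below builds such URLs. `gateway_urls` is a hypothetical helper, and it assumes the parsed document is a dictionary with a `links` list shaped like the output printed earlier; it does not reproduce what `download_document` does internally.

```python
def gateway_urls(document, base_url="http://127.0.0.1:8080"):
    """Return one gateway URL per link that has a CID (hypothetical helper)."""
    urls = []
    for link in document.get("links", []):
        cid = link.get("cid")
        if cid:
            urls.append(f"{base_url.rstrip('/')}/ipfs/{cid}")
    return urls


# Sample data copied from the first search result above;
# the `links` key name is an assumption about the parsed document.
sample = {
    "links": [{
        "cid": "bafyb4idhyrmmv3y35kioq4crkwq2qgev22cgj2vz5td5r2hixermzdgyiu",
        "extension": "pdf",
        "type": "primary",
    }]
}
urls = gateway_urls(sample)
```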

Below are several more examples of search queries. The Summa query DSL is documented at https://izihawa.github.io/summa/core/query-dsl/

In [5]:
# Term search in the science collection: exact match on a DOI
documents = await summa_client.search_documents({
    "index_alias": "nexus_science",
    "query": {"term": {"field": "id.dois", "value": "10.1109/healthcom54947.2022.9982758"}},
    "collectors": [{"top_docs": {"limit": 1}}],
    "is_fieldnorms_scoring_enabled": False,
})
for document in documents:
    print(format_document(document) + '\n')

Title: Long COVID Diary - Design and Development of a Support Application for People with Long COVID
Authors: [{'family': 'Hausberger', 'given': 'Andreas', 'sequence': 'first'}, {'family': 'Baranyi', 'given': 'Rene', 'sequence': 'additional'}, {'family': 'Winkler', 'given': 'Sylvia', 'sequence': 'additional'}, {'family': 'Tappeiner', 'given': 'Barbara', 'sequence': 'additional'}, {'family': 'Grechenig', 'given': 'Thomas', 'sequence': 'additional'}]
ID: {'dois': ['10.1109/healthcom54947.2022.9982758']}
Links: [{'cid': 'bafyb4ic2ztwiiwp2i5tfen7kc3hcg6df7qmyjevnd76n6if52fm3mzjruq', 'extension': 'pdf', 'type': 'primary'}]
Abstract: Due to the COVID pandemic more and more people suffer from mid-to long-term problems associated with it. COVID can also cause a wide range of health issues over a longer period of time, which has been
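
Term queries can in principle be nested inside a boolean query, in the same way as the match subqueries in the next example. As a sketch, the hypothetical helper below builds a request that looks up several DOIs at once. It only reuses keys that appear in this notebook, and its behavior should be checked against the Summa query DSL documentation before relying on it.

```python
def build_dois_request(dois, index_alias="nexus_science"):
    """Build a boolean query with one `should` term subquery per DOI (hypothetical helper)."""
    subqueries = [
        {"occur": "should", "query": {"term": {"field": "id.dois", "value": doi}}}
        for doi in dois
    ]
    return {
        "index_alias": index_alias,
        "query": {"boolean": {"subqueries": subqueries}},
        # Ask for at most one hit per requested DOI
        "collectors": [{"top_docs": {"limit": len(subqueries)}}],
        "is_fieldnorms_scoring_enabled": False,
    }


# Both DOIs are taken from the outputs shown in this notebook
request = build_dois_request([
    "10.1109/healthcom54947.2022.9982758",
    "10.21203/rs.3.rs-212116/v1",
])
```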



In [6]:
# Boolean query combining two match subqueries; the second collector also counts the results
documents = await summa_client.search_documents({
    "index_alias": "nexus_science",
    "query": {"boolean": {"subqueries": [{
        "occur": "should",
        "query": {
            "match": {
                "value": "hemoglobin",
                "query_parser_config": {"default_fields": ["title"]},
            },
        },
    }, {
        "occur": "should",
        "query": {
            "match": {
                "value": "fetal",
                "query_parser_config": {"default_fields": ["title"]},
            },
        },
    }]}},
    "collectors": [{"top_docs": {"limit": 5}}, {"count": {}}],
    "is_fieldnorms_scoring_enabled": False,
})
for document in documents:
    print(format_document(document) + '\n')

Title: Fetal hemoglobin-containing cells have the same mean corpuscular hemoglobin as cells without fetal hemoglobin: a reciprocal relationship between gamma- and beta-globin gene expression in normal subjects and in those with high fetal hemoglobin production
Authors: [{'family': 'Dover', 'given': 'GJ'}, {'family': 'Boyer', 'given': 'SH'}]
ID: {'dois': ['10.1182/blood.v69.4.1109.1109']}
Abstract: Abstract
We have developed methodology that allows comparison of the mean corpuscular hemoglobin (MCH) of fetal hemoglobin (HbF)-containing red cells (F cells) with the MCH of non-F cells from the sam

Title: Fetal hemoglobin-containing cells have the same mean corpuscular hemoglobin as cells without fetal hemoglobin: a reciprocal relationship between gamma- and beta-globin gene expression in normal subjects and in those with high fetal hemoglobin production
Authors: [{'family': 'Dover', 'given': 'GJ'}, {'family': 'Boyer', 'given': 'SH'}]
ID: {'dois': ['10.1182/blood.v69.4.1109.bloodjournal69