# Advanced Querying Techniques in Elasticsearch for bioRxiv Data Analysis

Elasticsearch, known for its speed and scalability, is a practical tool for data scientists and researchers working with large datasets such as bioRxiv. bioRxiv hosts a large corpus of preprints in the life sciences, which makes it a rich source of data for analysis. This guide walks through querying techniques for extracting insights from bioRxiv data with Elasticsearch, from basic keyword searches to filters and aggregations.

### Establishing a Secure Connection to Elasticsearch


Connect to your Elasticsearch cluster securely, so that traffic between the client and the cluster is encrypted with TLS and every request is authenticated. The cell below connects over HTTPS, verifies the server against the cluster's CA certificate, and authenticates with basic auth:

In [1]:
%%time

import ssl
import json
from elasticsearch import Elasticsearch

# Path to the CA certificate
ca_cert_path = '/workspace/repos/osl/rxiv-restapi/containers/esconfig/certs/http_ca.crt'
# Create an SSL context
ssl_context = ssl.create_default_context(cafile=ca_cert_path)
# Create a connection to Elasticsearch with authentication and SSL context
es = Elasticsearch(
    ["https://es:9200"],
    basic_auth=("elastic", "worksfine"),
    ssl_context=ssl_context
)

CPU times: user 145 ms, sys: 12.1 ms, total: 157 ms
Wall time: 190 ms
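The cell above stores the password directly in the notebook. A minimal sketch that instead reads the connection settings from environment variables is shown below; the variable names `ES_HOST`, `ES_USER`, `ES_PASSWORD` and `ES_CA_CERT` are hypothetical choices for this example, and the defaults mirror the cell above. It passes the CA file through the client's `ca_certs` option rather than building an `ssl_context` by hand.

```python
import os
from typing import Any, Dict

def es_connection_settings() -> Dict[str, Any]:
    """Build Elasticsearch client keyword arguments from environment variables.

    ES_HOST, ES_USER, ES_PASSWORD and ES_CA_CERT are hypothetical variable
    names chosen for this sketch; the defaults mirror the connection cell.
    """
    password = os.environ.get("ES_PASSWORD")
    if not password:
        # Fail early rather than falling back to a password stored in code
        raise RuntimeError("Set the ES_PASSWORD environment variable first")
    return {
        "hosts": [os.environ.get("ES_HOST", "https://es:9200")],
        "basic_auth": (os.environ.get("ES_USER", "elastic"), password),
        # CA certificate used to verify the cluster's TLS certificate
        "ca_certs": os.environ.get(
            "ES_CA_CERT",
            "/workspace/repos/osl/rxiv-restapi/containers/esconfig/certs/http_ca.crt",
        ),
    }
```

With the variables exported in the shell that starts Jupyter, the client could then be created with `es = Elasticsearch(**es_connection_settings())`.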


### Index JSON Data into Elasticsearch
Before searching, the data must be indexed in Elasticsearch. The function below reads a JSON file and indexes its contents one document at a time. It assumes the file holds a JSON array of objects, each representing one document to be indexed.

In [2]:

def index_json_data(es: Elasticsearch, file_path: str, index_name: str) -> None:
    """
    Reads data from a JSON file and indexes it into Elasticsearch.

    Parameters:
    es (Elasticsearch): An Elasticsearch client instance.
    file_path (str): The path to the JSON file.
    index_name (str): The name of the Elasticsearch index where data will be stored.
    """
    # Load JSON data from the file
    with open(file_path, 'r', encoding='utf-8') as file:
        data = json.load(file)

    # `data` is expected to be a list of documents (dicts)
    for doc in data:
        # One indexing request per document: simple, but slow for large files
        es.index(index=index_name, document=doc)


### Running the Indexing Step

In [3]:
%%time

# index_json_data is defined in the previous cell
# Path to the JSON file containing data to be indexed
file_path = '/workspace/repos/osl/rxiv-restapi/docs/notebooks/data/biorxiv_2022-01-01_2024-01-11.json'
# Name of the Elasticsearch index
index_name = 'biorxiv'
# Index data from the specified JSON file into Elasticsearch
index_json_data(es, file_path, index_name)

CPU times: user 3min 30s, sys: 19.1 s, total: 3min 49s
Wall time: 27min 31s
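Sending one request per document, as `index_json_data` does, is likely a large part of why this cell takes about 27 minutes of wall time. A common alternative is the Bulk API. The sketch below only builds the bulk actions, so it can be tested without a cluster; using `doi` plus `version` as the document `_id` is an assumption made here so that re-running the load overwrites records instead of duplicating them.

```python
from typing import Any, Dict, Iterable, Iterator

def iter_bulk_actions(docs: Iterable[Dict[str, Any]], index_name: str) -> Iterator[Dict[str, Any]]:
    """Yield one bulk 'index' action per document."""
    for doc in docs:
        action: Dict[str, Any] = {"_index": index_name, "_source": doc}
        # Assumption: doi + version identifies one record; a stable _id makes
        # re-indexing idempotent instead of creating duplicate documents
        if doc.get("doi") and doc.get("version"):
            action["_id"] = f"{doc['doi']}v{doc['version']}"
        yield action
```

In the notebook, the actions could be sent with `helpers.bulk(es, iter_bulk_actions(docs, index_name), chunk_size=1000)` from `elasticsearch.helpers`, where `docs` is the list loaded with `json.load`; `helpers.bulk` groups the actions into bulk requests of `chunk_size` documents each.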


## Querying bioRxiv Data


### Create a Search Function

**After indexing the data, we can create a function to perform searches using the Elasticsearch client.**

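**Note**: By default, a search returns the top 10 matching hits. To page through a larger result set, use the search API's `from` and `size` parameters: `from` is the number of hits to skip (default 0) and `size` is the maximum number of hits to return. Together, these two parameters define one page of results. By default, `from + size` cannot exceed the index's `index.max_result_window` setting (10,000); for deeper pagination, Elasticsearch recommends the `search_after` parameter, optionally combined with a point in time (PIT).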
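The page arithmetic used by `search_data` below can be checked on its own. This hypothetical helper converts a 1-based page number into `from`/`size` values and rejects pages that would exceed the default 10,000-hit window, assuming `index.max_result_window` has not been changed on the index.

```python
from typing import Tuple

def page_to_from_size(page: int, page_size: int, max_result_window: int = 10_000) -> Tuple[int, int]:
    """Convert a 1-based page number into Elasticsearch `from`/`size` values."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must both be >= 1")
    from_ = (page - 1) * page_size
    # from + size may not exceed index.max_result_window (10,000 by default)
    if from_ + page_size > max_result_window:
        raise ValueError(
            f"from + size = {from_ + page_size} exceeds max_result_window "
            f"({max_result_window}); use search_after for deeper pages"
        )
    return from_, page_size
```

For example, the third page of six results starts at hit 12: `page_to_from_size(3, 6)` returns `(12, 6)`.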

In [26]:
from elasticsearch import Elasticsearch
from typing import List, Dict, Any

def search_data(es: Elasticsearch, index_name: str, query: Dict[str, Any], page: int, page_size: int) -> List[Dict[str, Any]]:
    """
    Performs a search query in an Elasticsearch index and returns document hits.

    Parameters:
    - es (Elasticsearch): An Elasticsearch client instance.
    - index_name (str): The name of the Elasticsearch index to search in.
    - query (Dict[str, Any]): The search query in Elasticsearch Query DSL format.
    - page (int): The page number (starting from 1).
    - page_size (int): The number of results to return per page.

    Returns:
    - List[Dict[str, Any]]: A list of documents from the search results, each represented as a dictionary.
    """
    from_param = (page - 1) * page_size
    # Copy the query so the caller's dict is not modified, then add
    # 'from' and 'size' at the root of the request body
    body = dict(query)
    body['from'] = from_param
    body['size'] = page_size

    try:
        response = es.search(index=index_name, body=body)
        documents = [hit['_source'] for hit in response['hits']['hits']]
        return documents
    except Exception as e:
        print(f"Search failed: {e}")
        return []

# search_data can now be called with any Query DSL body.
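For result sets beyond the `from`/`size` window, `search_after` resumes from the sort values of the last hit on the previous page. The sketch below takes the search call as a function argument so the paging loop can be tested without a cluster. In the notebook, `search_fn` could be `lambda body: es.search(index=index_name, body=body)["hits"]["hits"]`; the sort order must be unique per document, so a tiebreaker such as `[{"date": "asc"}, {"doi.keyword": "asc"}, {"version.keyword": "asc"}]` is assumed here (the `.keyword` sub-fields come from dynamic mapping). Opening a point in time would additionally keep the view consistent while paging.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Hit = Dict[str, Any]

def iter_all_hits(
    search_fn: Callable[[Dict[str, Any]], List[Hit]],
    query: Dict[str, Any],
    sort: List[Dict[str, Any]],
    page_size: int = 1000,
) -> Iterator[Hit]:
    """Yield every hit for `query`, paging with search_after instead of from/size."""
    search_after: Optional[List[Any]] = None
    while True:
        body: Dict[str, Any] = {"query": query, "sort": sort, "size": page_size}
        if search_after is not None:
            body["search_after"] = search_after
        hits = search_fn(body)
        if not hits:
            return
        yield from hits
        if len(hits) < page_size:
            return  # a short page means no hits are left
        # Each hit of a sorted search carries its sort values; resume after the last one
        search_after = hits[-1]["sort"]
```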


### Keyword Searches: The Basics

The examples in this section show how Elasticsearch's Query DSL (Domain Specific Language) retrieves documents by different criteria: full-text matches, exact values, date ranges, and combinations of these. Each query can be passed to the `search_data` function defined above, along with the `index_name` variable set during indexing.

### Match Query


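Keyword searches are the foundation of data retrieval in Elasticsearch. A `match` query analyzes the search text and finds documents whose field contains the resulting terms. For instance, to find bioRxiv papers whose abstract mentions CRISPR: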


In [27]:
%%time

query = {
    "query": {
        "match": {
            "abstract": "CRISPR"
        }
    }
}

# Example usage: Fetch the first page of results, assuming 2 results per page
# See the scroll api for a more efficient way to request large data sets
page_number = 1
page_size = 2
results = search_data(es, index_name, query, page_number, page_size)

print("Results: ",len(results), "\n")

for result in results:
    print(result)

print("-" * 25)

Results:  2 

{'doi': '10.1101/2024.01.05.574328', 'title': 'CRISPR-repressed toxin-antitoxin provides population-level immunity against diverse anti-CRISPR elements', 'authors': 'Li, M.; Shu, X.; Wang, R.; Li, Z.; Xue, Q.; Liu, C.; Cheng, F.; Zhao, H.; Wang, J.; Liu, J.; Hu, C.; Li, J.; Ouyang, S.', 'author_corresponding': 'Ming Li', 'author_corresponding_institution': 'Institute of Microbiology, CAS', 'date': '2024-01-05', 'version': '1', 'license': 'cc_no', 'category': 'Microbiology', 'jatsxml': 'https://www.biorxiv.org/content/early/2024/01/05/2024.01.05.574328.source.xml', 'abstract': 'Prokaryotic CRISPR-Cas systems are highly vulnerable to phage-encoded anti-CRISPR (Acr) factors. How CRISPR-Cas systems protect themselves remains unclear. Here, we uncovered a broad-spectrum anti-anti-CRISPR strategy involving a phage-derived toxic protein. Transcription of this toxin is normally reppressed by the CRISPR-Cas effector, but is activated to halt cell division when the effector is inhi
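With a multi-word query, `match` ORs the analyzed terms by default, so any single term is enough for a document to match. Setting the `operator` parameter to `"and"` requires every term. The helper name and the query text "base editing" below are only illustrative.

```python
from typing import Any, Dict

def match_every_term(field: str, text: str) -> Dict[str, Any]:
    """Build a match query that requires every analyzed term of `text` to appear."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    # The default operator is "or": one matching term is enough
                    "operator": "and",
                }
            }
        }
    }

query_and = match_every_term("abstract", "base editing")
```

`query_and` can be passed to `search_data` like the query above.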

### Term Query
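Retrieve documents where the `license` field exactly matches the specified value. A `term` query does not analyze its input. Assuming the index uses dynamic mapping (as the `category.keyword` field in the aggregation section suggests), `license` is an analyzed `text` field, and this query works only because the standard analyzer keeps `cc_by_nc_nd` as a single lowercase token. For reliable exact matching, query the `license.keyword` sub-field instead.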


In [60]:
%%time

query = {
    "query": {
        "term": {
            "license": {
                "value": "cc_by_nc_nd"
            }
        }
    }
}

# Example usage: Fetch the third page of results, assuming 6 results per page
# See the scroll api for a more efficient way to request large data sets
page_number = 3
page_size = 6
results = search_data(es, index_name, query, page_number, page_size)

print("Results: ",len(results), "\n")

for result in results:
    print(result)

print("-" * 25)

Results:  6 

{'doi': '10.1101/2022.01.10.475769', 'title': 'Efficient inactivation of African swine fever virus by a highly complexed iodine combined with compound organic acids', 'authors': 'Qi, M.; Pan, L.; Gao, Y.; Li, M.; Wang, Y.; Li, L.-F.; Ji, C.; Sun, Y.; Qiu, H.-J.', 'author_corresponding': 'Hua-Ji  Qiu', 'author_corresponding_institution': 'Harbin Veterinary Research Institute', 'date': '2022-01-11', 'version': '1', 'license': 'cc_by_nc_nd', 'category': 'Microbiology', 'jatsxml': 'https://www.biorxiv.org/content/early/2022/01/11/2022.01.10.475769.source.xml', 'abstract': 'African swine fever (ASF) is a highly contagious disease with high morbidity and mortality caused by African swine fever virus (ASFV). Cleaning and disinfection remain one of the most effective biosecurity measures to prevent and control the spread of ASFV. In this study, we evaluated the inactivation effects of highly complexed iodine (HPCI) combined with compound organic acids (COAs) against ASFV under di
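To match any of several exact values, a `terms` query on the keyword sub-field is a natural extension. This sketch assumes dynamic mapping created `license.keyword`, and that `cc_by` is another license value present in the data; both are assumptions, not facts checked against this index.

```python
from typing import Any, Dict, List

def license_query(licenses: List[str]) -> Dict[str, Any]:
    """Match documents whose license is exactly one of `licenses`."""
    # Assumes dynamic mapping, which adds an unanalyzed `license.keyword` sub-field
    return {"query": {"terms": {"license.keyword": list(licenses)}}}

query_licenses = license_query(["cc_by", "cc_by_nc_nd"])
```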

### Range Query
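Find documents whose `date` falls within a specific range. In this dataset, `date` is the posting date of the indexed version, so a preprint first posted in January 2022 can appear in a December 2022 range when a later version was posted then, as in the output below.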


In [31]:
%%time

query = {
    "query": {
        "range": {
            "date": {
                "gte": "2022-12-01",
                "lte": "2022-12-31"
            }
        }
    }
}

# Example usage: Fetch the second page of results, assuming 2 results per page
# See the scroll api for a more efficient way to request large data sets
page_number = 2
page_size = 2
results = search_data(es, index_name, query, page_number, page_size)

print("Results: ",len(results), "\n")

for result in results:
    print(result)

print("-" * 25)

Results:  2 

{'doi': '10.1101/2022.01.14.476310', 'title': 'Non-target effects of ten essential oils on the egg parasitoid Trichogramma evanescens', 'authors': 'Van Oudenhove, L.; Cazier, A.; Fillaud, M.; Lavoir, A.-V.; Fatnassi, H.; Perez, G.; Calcagno, V.', 'author_corresponding': 'Louise  Van Oudenhove', 'author_corresponding_institution': "Institut Sophia Agrobiotech, INRAE, Universite Cote d'Azur, CNRS, France.", 'date': '2022-12-26', 'version': '5', 'license': 'cc_by_nc_nd', 'category': 'Ecology', 'jatsxml': 'https://www.biorxiv.org/content/early/2022/12/26/2022.01.14.476310.source.xml', 'abstract': 'Essential oils (EOs) are increasingly used as biopesticides due to their insecticidal potential. This study addresses their non-target effects on a biological control agent: the egg parasitoid Trichogramma evanescens. In particular, we tested whether EOs affected parasitoid fitness either directly, by decreasing pre-imaginal survival, or indirectly, by disrupting parasitoids orienta
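Hard-coding the last day of a month is easy to get wrong (February, leap years). A small hypothetical helper can compute it with the standard library's `calendar.monthrange`, which returns the weekday of the first day and the number of days in the month:

```python
import calendar
from typing import Any, Dict

def month_range_query(year: int, month: int, field: str = "date") -> Dict[str, Any]:
    """Range query covering one calendar month, both ends inclusive."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    return {
        "query": {
            "range": {
                field: {
                    "gte": f"{year:04d}-{month:02d}-01",
                    "lte": f"{year:04d}-{month:02d}-{last_day:02d}",
                    "format": "yyyy-MM-dd",
                }
            }
        }
    }
```

`month_range_query(2022, 12)` reproduces the December 2022 bounds used in the cell above.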


### Bool Query
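Combine multiple search criteria. For example, find documents by a specific author, in a specific category, and under a specific license. Note that `match` ORs the analyzed terms by default, so the `category` clause below also matches categories such as Cell Biology that contain the word "biology"; a `term` query on `category.keyword` is stricter.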


In [32]:
%%time

query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"authors": "Colman-Lerner"}},
                {"match": {"category": "Systems Biology"}},
                {"term": {"license": "cc_by_nc_nd"}}
            ]
        }
    }        
}

# Example usage: Fetch the first page of results, assuming 1 result per page
# See the scroll api for a more efficient way to request large data sets
page_number = 1
page_size = 1
results = search_data(es, index_name, query, page_number, page_size)

print("Results: ",len(results), "\n")

for result in results:
    print(result)

print("-" * 25)

Results:  1 

{'doi': '10.1101/2022.10.06.511167', 'title': 'High selectivity of frequency induced transcriptional responses', 'authors': 'Givre, A.; Colman-Lerner, A.; Ponce-Dawson, S.', 'author_corresponding': 'Silvina Ponce-Dawson', 'author_corresponding_institution': 'School of Natural and Exact Sciences, University of Buenos Aires', 'date': '2022-10-07', 'version': '1', 'license': 'cc_by_nc_nd', 'category': 'Systems Biology', 'jatsxml': 'https://www.biorxiv.org/content/early/2022/10/07/2022.10.06.511167.source.xml', 'abstract': 'Cells continuously interact with their environment, detect its changes and generate responses accordingly. This requires interpreting the variations and, in many occasions, producing changes in gene expression. In this paper we use information theory and a simple transcription model to analyze the extent to which the resulting gene expression is able to identify and assess the intensity of extracellular stimuli when they are encoded in the amplitude, durat
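A variant of the same search keeps the full-text author match in `must` and moves the exact constraints into `filter`, where they do not affect scoring. This sketch assumes the `category.keyword` and `license.keyword` sub-fields created by dynamic mapping, and uses `match_phrase` so that both parts of the hyphenated surname must appear next to each other.

```python
query_filtered = {
    "query": {
        "bool": {
            # Scored full-text clause: the authors field is free text
            "must": [{"match_phrase": {"authors": "Colman-Lerner"}}],
            # Exact, unscored constraints on keyword sub-fields (assumed mapping)
            "filter": [
                {"term": {"category.keyword": "Systems Biology"}},
                {"term": {"license.keyword": "cc_by_nc_nd"}},
            ],
        }
    }
}
```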

### Match Phrase Query
Use `match_phrase` to find documents whose title contains an exact sequence of terms, in order.


In [34]:
%%time

query = {
    "query": {
        "match_phrase": {
            "title": "Carm1 regulates the speed of"
        }
    }
}

# Example usage: Fetch the first page of results, assuming 6 results per page
# See the scroll api for a more efficient way to request large data sets
page_number = 1
page_size = 6
results = search_data(es, index_name, query, page_number, page_size)

print("Results: ",len(results), "\n")

for result in results:
    print(result)

print("-" * 25)

Results:  6 

{'doi': '10.1101/2022.10.03.510647', 'title': 'Carm1 regulates the speed of C/EBPa-induced transdifferentiation by a cofactor stealing mechanism', 'authors': 'Garcia, G. T.; Kowenz-Leutz, E.; Tian, T. V.; Klonizakis, A.; Lerner, J.; De Andres-Aguayo, L.; Berenguer, C.; Carmona, M. P.; Casadesus, M. V.; Bulteau, R.; Francesconi, M.; Leutz, A.; Zaret, K. S.; Zaret, K. S.; Peiro, S.', 'author_corresponding': 'Achim Leutz', 'author_corresponding_institution': 'MDC, Berlin', 'date': '2022-10-04', 'version': '1', 'license': 'cc_by_nc_nd', 'category': 'Cell Biology', 'jatsxml': 'https://www.biorxiv.org/content/early/2022/10/04/2022.10.03.510647.source.xml', 'abstract': 'Cell fate decisions are driven by lineage-restricted transcription factors but how they are regulated is incompletely understood. The C/EBP-induced B cell to macrophage transdifferentiation (BMT) is a powerful system to address this question. Here we describe that C/EBP with a single arginine mutation (C/EBPR35A)
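`match_phrase` also accepts a `slop` parameter that tolerates terms being a few positions apart. With `slop` set to 1, the shortened phrase below should also match a title like "Carm1 regulates the speed of ...", because one extra word sits between "regulates" and "speed" (assuming the default standard analyzer, which keeps stop words). The helper name is illustrative.

```python
from typing import Any, Dict

def phrase_query(field: str, phrase: str, slop: int = 0) -> Dict[str, Any]:
    """match_phrase query; `slop` is how many positional moves are tolerated."""
    return {"query": {"match_phrase": {field: {"query": phrase, "slop": slop}}}}

query_loose = phrase_query("title", "Carm1 regulates speed", slop=1)
```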


### Multi-Match Query
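Search for the same text across multiple fields. By default, `multi_match` uses the `best_fields` type, which scores each document by its best-matching field.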


In [37]:
%%time

query = {
    "query": {
        "multi_match": {
            "query": "transcription",
            "fields": ["title", "abstract"]
        }
    }
}

# Example usage: Fetch the first page of results, assuming 3 results per page
# See the scroll api for a more efficient way to request large data sets
page_number = 1
page_size = 3
results = search_data(es, index_name, query, page_number, page_size)

print("Results: ",len(results), "\n")

for result in results:
    print(result)

print("-" * 25)

Results:  3 

{'doi': '10.1101/2021.04.02.438179', 'title': 'Mechanism of transcription modulation by the transcription-repair coupling factor', 'authors': 'Paudel, B. P.; Xu, Z.-Q.; Jergic, S.; Oakley, A. J.; Sharma, N.; Brown, S. H.; Bouwer, J. C.; Lewis, P.; Dixon, N. E.; Van Oijen, A. M.; Ghodke, H.', 'author_corresponding': 'Harshad  Ghodke', 'author_corresponding_institution': 'University of Wollongong', 'date': '2022-02-22', 'version': '3', 'license': 'cc_no', 'category': 'Biophysics', 'jatsxml': 'https://www.biorxiv.org/content/early/2022/02/22/2021.04.02.438179.source.xml', 'abstract': 'Elongation by RNA polymerase is dynamically modulated by accessory factors. The transcription-repair coupling factor (TRCF) recognizes distressed RNAPs and either rescues transcription or initiates transcription termination. Precisely how TRCFs choose to execute either outcome remains unclear. With Escherichia coli as a model, we used single-molecule assays to study dynamic modulation of elonga
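Fields in `multi_match` can be boosted with the `field^N` syntax. The sketch below weights title matches twice as heavily as abstract matches; the factor 2 is an arbitrary illustration, and `type` is spelled out only to make the default explicit.

```python
query_boosted = {
    "query": {
        "multi_match": {
            "query": "transcription",
            # ^2 multiplies the score contribution of title matches by 2
            "fields": ["title^2", "abstract"],
            # best_fields is the default: score by the best-matching field
            "type": "best_fields",
        }
    }
}
```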

### Advanced Filtering: Beyond Keywords

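Filtering refines a search, for example by restricting results to a date range or to particular authors. Clauses in the `filter` section of a `bool` query run in filter context: they only decide whether a document matches, do not affect its relevance score, and can be cached by Elasticsearch. The example below scores documents on a title match and filters them to the years 2022 and 2023.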


In [38]:
%%time

query = {
    "query": {
        "bool": {
            "must": {
                "match": {"title": "Molecular Biology"}
            },
            "filter": {
                "range": {
                    # 'date' holds the posting date in this index
                    "date": {
                        "gte": "2022-01-01",
                        "lte": "2023-12-31"
                    }
                }
            }
        }
    }
}

# Example usage: Fetch the first page of results, assuming 5 results per page
# See the scroll api for a more efficient way to request large data sets
page_number = 1
page_size = 5
results = search_data(es, index_name, query, page_number, page_size)

print("Results: ",len(results), "\n")

for result in results:
    print(result)

print("-" * 25)

Results:  5 

{'doi': '10.1101/2023.05.24.542151', 'title': 'POMBOX: a fission yeast toolkit for molecular and synthetic biology', 'authors': 'Hebra, T.; Smrckova, H.; Elkatmis, B.; Prevorovsky, M.; Pluskal, T.', 'author_corresponding': 'Tomas Pluskal', 'author_corresponding_institution': 'Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Praha, Czech Republic', 'date': '2023-05-24', 'version': '1', 'license': 'cc_by_nc_nd', 'category': 'Synthetic Biology', 'jatsxml': 'https://www.biorxiv.org/content/early/2023/05/24/2023.05.24.542151.source.xml', 'abstract': 'Schizosaccharomyces pombe is a popular model organism in molecular biology and cell physiology. With its ease of genetic manipulation and growth, supported by in-depth functional annotation in the PomBase database and genome-wide metabolic models, S. pombe is an attractive option for synthetic biology applications. However, S. pombe currently lacks modular tools for generating genetic circuits with
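A `must_not` clause can exclude documents on top of the filter; like `filter`, it runs in filter context and does not contribute to scoring. The sketch below reuses the query above and drops one category, assuming the `category.keyword` sub-field from dynamic mapping; excluding Synthetic Biology is only an example.

```python
query_excluding = {
    "query": {
        "bool": {
            "must": {"match": {"title": "Molecular Biology"}},
            "filter": {"range": {"date": {"gte": "2022-01-01", "lte": "2023-12-31"}}},
            # Exclude one category entirely; must_not does not affect scoring
            "must_not": {"term": {"category.keyword": "Synthetic Biology"}},
        }
    }
}
```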

### Aggregation Query


Aggregations are pivotal for summarizing data, enabling the analysis of trends across thousands of documents. For example, an aggregation query to count publications by category:

In [42]:
def search_data_with_aggregation(es: Elasticsearch, index_name: str, query: Dict[str, Any], agg_name: str = "categories") -> List[Dict[str, Any]]:
    """
    Runs an aggregation query and returns the buckets of the named aggregation.

    Parameters:
    - es (Elasticsearch): An Elasticsearch client instance.
    - index_name (str): The name of the index to query.
    - query (Dict[str, Any]): A request body containing an "aggs" section.
    - agg_name (str): The aggregation whose buckets are returned.
    """
    try:
        response = es.search(index=index_name, body=query)
        if 'aggregations' in response:
            # Safely access the buckets of the requested aggregation
            return response['aggregations'].get(agg_name, {}).get('buckets', [])
        print("No aggregations found in the response.")
        return []
    except Exception as e:
        print(f"Search failed: {e}")
        return []

In [43]:
query = {
    "size": 0,  # Root level of the body: return aggregation results only, no hits
    "query": {
        "match_all": {}  # Assuming you want to aggregate over all documents, use match_all
    },
    "aggs": {
        "categories": {
            "terms": {
                "field": "category.keyword",  # Keyword sub-field from dynamic mapping; text fields can't be aggregated by default
                "size": 10  # Adjust the size as needed
            }
        }
    }
}


#### Aggregate data, such as counting documents by category.

`search_data_with_aggregation` checks whether the response contains an `aggregations` key and, if so, returns the buckets of the `categories` terms aggregation. Each bucket has a `key` (the category name) and a `doc_count`.

In [44]:
%%time

# Assuming 'es', 'index_name', and 'query' are properly defined
results = search_data_with_aggregation(es, index_name, query)

# Display the aggregation results
if results:
    for category in results:
        print(f"Category: {category['key']}, Count: {category['doc_count']}")
else:
    print("No results found.")

Category: Neuroscience, Count: 60261
Category: Microbiology, Count: 27936
Category: Bioinformatics, Count: 27171
Category: Cell Biology, Count: 18837
Category: Biophysics, Count: 15609
Category: Evolutionary Biology, Count: 15576
Category: Biochemistry, Count: 13371
Category: Immunology, Count: 13251
Category: Cancer Biology, Count: 13236
Category: Ecology, Count: 13203
CPU times: user 5.66 ms, sys: 315 µs, total: 5.98 ms
Wall time: 24.7 ms
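The same pattern works for trends over time. This sketch counts documents per month with a `date_histogram` aggregation on the `date` field and parses the buckets into `(month, count)` pairs; the aggregation name `per_month` is an arbitrary choice. The parser is kept separate so it can be tested on a hand-written response.

```python
from typing import Any, Dict, List, Tuple

monthly_query = {
    "size": 0,
    "aggs": {
        "per_month": {
            "date_histogram": {"field": "date", "calendar_interval": "month", "format": "yyyy-MM"}
        }
    },
}

def monthly_counts(response: Dict[str, Any], agg_name: str = "per_month") -> List[Tuple[str, int]]:
    """Turn date_histogram buckets into (formatted month, document count) pairs."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    return [(b["key_as_string"], b["doc_count"]) for b in buckets]
```

In the notebook this could be run as `monthly_counts(es.search(index=index_name, body=monthly_query).body)`, where `.body` gives the plain response dict.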


---

### Generating and Executing an Elasticsearch Query

#### Create a function that generates an Elasticsearch query from search terms combined with AND/OR logic and a date range:

In [58]:
from typing import List, Union, Dict, Any

def generate_es_queries(logic_operators: List[Union[str, List[str]]], start_date: str, end_date: str, abstract_field: str = "abstract", date_field: str = "date") -> Dict[str, Any]:
    """
    Generates an Elasticsearch query based on logic operators and a date range.

    Parameters:
    - logic_operators (List[Union[str, List[str]]]): A list of strings and/or lists representing the logic operators.
      Nested lists represent OR conditions within AND conditions.
    - start_date (str): The start date in 'YYYY-MM-DD' format.
    - end_date (str): The end date in 'YYYY-MM-DD' format.
    - abstract_field (str): The document field to search for abstract text.
    - date_field (str): The document field that contains the date.

    Returns:
    - Dict[str, Any]: An Elasticsearch query in DSL format.
    """
    must_conditions = []  # To store AND conditions
    for operator in logic_operators:
        if isinstance(operator, list):  # Handle OR conditions
            should_conditions = [{"match": {abstract_field: term}} for term in operator]
            must_conditions.append({
                "bool": {"should": should_conditions, "minimum_should_match": 1}
            })
        else:  # Handle AND conditions
            must_conditions.append({"match": {abstract_field: operator}})
    
    # Add date range filter
    must_conditions.append({
        "range": {
            date_field: {  # Use the provided date field name
                "gte": start_date,
                "lte": end_date,
                "format": "yyyy-MM-dd"
            }
        }
    })
    
    # Construct the final query
    es_query = {
        "query": {
            "bool": {
                "must": must_conditions
            }
        }
    }
    
    return es_query

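The run below only uses AND conditions. Following the function's rules, a nested list becomes an OR group, so `["vaccine", ["COVID-19", "SARS-CoV-2"]]` means "vaccine AND (COVID-19 OR SARS-CoV-2)". The dict below writes out by hand the query that `generate_es_queries` should return for those terms and the dataset's date range; the term "SARS-CoV-2" is only an example.

```python
expected_query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"abstract": "vaccine"}},
                {
                    # A nested list becomes an OR group: at least one term must match
                    "bool": {
                        "should": [
                            {"match": {"abstract": "COVID-19"}},
                            {"match": {"abstract": "SARS-CoV-2"}},
                        ],
                        "minimum_should_match": 1,
                    }
                },
                {
                    "range": {
                        "date": {
                            "gte": "2022-01-01",
                            "lte": "2024-01-11",
                            "format": "yyyy-MM-dd",
                        }
                    }
                },
            ]
        }
    }
}
```

In the notebook, `assert generate_es_queries(["vaccine", ["COVID-19", "SARS-CoV-2"]], "2022-01-01", "2024-01-11") == expected_query` is a quick check of the function's OR handling.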

#### Build a query with `generate_es_queries`, print it, and run it

The cell below builds a query from a list of terms (all combined with AND) and a date range, pretty-prints the generated query, runs it with pagination, and displays the results.


In [59]:
%%time

logic_operators = ['COVID-19', 'coronavirus', 'vaccine']

start_date = '2020-01-01'
end_date = '2024-12-31'
es_query = generate_es_queries(logic_operators, start_date, end_date)

# Pretty print the Elasticsearch query object
pretty_es_query = json.dumps(es_query, indent=4)
print(pretty_es_query)

page_number = 1
page_size = 5

results = search_data(es, index_name, es_query, page_number, page_size)

print("Results: ",len(results), "\n")

for result in results:
    print(result)

print("-" * 25)

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "abstract": "COVID-19"
                    }
                },
                {
                    "match": {
                        "abstract": "coronavirus"
                    }
                },
                {
                    "match": {
                        "abstract": "vaccine"
                    }
                },
                {
                    "range": {
                        "date": {
                            "gte": "2020-01-01",
                            "lte": "2024-12-31",
                            "format": "yyyy-MM-dd"
                        }
                    }
                }
            ]
        }
    }
}
Results:  5 

{'doi': '10.1101/2023.05.24.541850', 'title': 'Cross-Protection Induced by Highly Conserved Human B, CD4+, and CD8+ T Cell Epitopes-Based Coronavirus Vaccine Against Severe Infection, Di

---

### The bioRxiv dataset was downloaded with the medrxivr library

The JSON file used here is a list of dictionaries downloaded with medrxivr for the date range 2022-01-01 to 2024-01-11, and contains 000 papers. Each record has the following structure:

```json
[
  {
    "doi": "10.1101/043794",
    "title": "Modeling methyl-sensitive transcription factor motifs with an expanded epigenetic alphabet",
    "authors": "Viner, C.; Ishak, C. A.; Johnson, J.; Walker, N. J.; Shi, H.; Sjöberg-Herrera, M. K.; Shen, S. Y.; Lardo, S. M.; Adams, D. J.; Ferguson-Smith, A. C.; De Carvalho, D. D.; Hainer, S. J.; Bailey, T. L.; Hoffman, M. M.",
    "author_corresponding": "Michael M. Hoffman",
    "author_corresponding_institution": "Princess Margaret Cancer Centre, Toronto, ON, Canada",
    "date": "2022-07-29",
    "version": "2",
    "license": "cc_by_nc_nd",
    "category": "Bioinformatics",
    "jatsxml": "https://www.biorxiv.org/content/early/2022/07/29/043794.source.xml",
    "abstract": "Transcription factors bind DNA in specific sequence contexts. In addition to distinguishing one nucleobase from another, some transcription factors can distinguish between unmodified and modified bases. Current models of transcription factor binding tend not take DNA modifications into account, while the recent few that do often have limitations. This makes a comprehensive and accurate profiling of transcription factor affinities difficult.\n\nHere, we developed methods to identify transcription factor binding sites in modified DNA. Our models expand the standard A/C/G/T DNA alphabet to include cytosine modifications. We developed Cytomod to create modified genomic sequences and enhanced the Multiple EM for Motif Elicitation (MEME) Suite by adding the capacity to handle custom alphabets. We adapted the well-established position weight matrix (PWM) model of transcription factor binding affinity to this expanded DNA alphabet.\n\nUsing these methods, we identified modification-sensitive transcription factor binding motifs. We confirmed established binding preferences, such as the preference of ZFP57 and C/EBP{beta} for methylated motifs and the preference of c-Myc for unmethylated E-box motifs. Using known binding preferences to tune model parameters, we discovered novel modified motifs for a wide array of transcription factors. Finally, we validated predicted binding preferences of OCT4 using cleavage under targets and release using nuclease (CUT&RUN) experiments across conventional, methylation-, and hydroxymethylation-enriched sequences. Our approach readily extends to other DNA modifications. As more genome-wide single-base resolution modification data becomes available, we expect that our method will yield insights into altered transcription factor binding affinities across many different modifications.",
    "published": "NA",
    "node": 2,
    "link_page": "https://www.biorxiv.org/content/10.1101/043794v2?versioned=TRUE",
    "link_pdf": "https://www.biorxiv.org/content/10.1101/043794v2.full.pdf"
  }
]
```
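Before indexing, a quick summary of the loaded records can confirm the size and date range of the file. This hypothetical helper counts records and distinct DOIs separately, since a DOI may appear more than once (for example, once per posted version, as the `version` field suggests), and relies on ISO `YYYY-MM-DD` strings sorting chronologically.

```python
from collections import Counter
from typing import Any, Dict, List

def summarize_records(records: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Basic counts for a medrxivr-style list of preprint records."""
    # ISO 'YYYY-MM-DD' strings sort chronologically as plain strings
    dates = sorted(r["date"] for r in records if r.get("date"))
    return {
        "records": len(records),
        # A DOI may appear once per posted version, so count DOIs separately
        "unique_dois": len({r["doi"] for r in records if r.get("doi")}),
        "first_date": dates[0] if dates else None,
        "last_date": dates[-1] if dates else None,
        "top_categories": Counter(r.get("category") for r in records).most_common(3),
    }
```

In the notebook: `with open(file_path, encoding="utf-8") as f: print(summarize_records(json.load(f)))`.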

---

### Final Note: The Impact of Advanced Querying on Research


Advanced querying techniques in Elasticsearch let researchers navigate and analyze the bioRxiv corpus in depth. Keyword searches, exact-value and range queries, boolean combinations, and aggregations together give a detailed view of the life sciences preprint landscape, helping researchers find relevant work faster and spot trends across thousands of papers.



---