# Metadata module from the SPARC Python Client


This section introduces metadata search services which query an [Elasticsearch](https://www.elastic.co) endpoint.  This endpoint contains the same metadata that powers search on the SPARC portal.  There are two basic ways to retrieve information on SPARC datasets:

1. Retrieve a full list of datasets and associated metadata
2. Submit a valid Elasticsearch query to retrieve selected dataset information

Query results are returned as JSON in Elasticsearch format. Information on specific fields and content can be found in the [K-Core API Handbook](https://fdilab.gitbook.io/api-handbook/sparc-metadata-elasticsearch/sparc-dataset-elasticsearch-json-data-model).

To get started, you need an API key, which you then add to the client configuration. Instructions for registering and receiving an API key are available as a [walk-through tutorial](https://fdilab.gitbook.io/api-handbook/sparc-k-core-api-overview/getting-started-with-sparc-apis).

### Prerequisites:
- Python version >=3.10, as required by the [sparc.client](https://pypi.org/project/sparc.client/) package.
- An API key from SciCrunch.org. Instructions for registering and receiving an API key are available as a [walk-through tutorial](https://fdilab.gitbook.io/api-handbook/sparc-k-core-api-overview/getting-started-with-sparc-apis).
- Have a look at the [NIH SPARC Python Client documentation](https://github.com/nih-sparc/sparc.client/blob/main/docs/tutorial.ipynb) as a preliminary step to learn how to set up the configuration.

## Installation

The easiest way to obtain the SPARC Python Client library (sparc.client) is to install the latest available version from PyPI:

In [None]:
!pip install sparc.client

## Setup Config File to Access Metadata Services

Add your API key to the config file located at config/config.ini.

Edit the file to include the key you registered for, following the instructions above:

scicrunch_api_key=YOUR_API_KEY_HERE
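Before instantiating the client, it can help to confirm that the key was actually saved. The following is a minimal sketch and not part of sparc.client: `find_api_key` is a hypothetical helper, and it assumes the config file is a standard INI file with section headers that Python's `configparser` can read. The demo writes a throwaway file with a made-up `[demo]` section and a dummy key, so it does not touch your real configuration.

```python
import configparser
import tempfile
from pathlib import Path

# Placeholder text shown in this tutorial; a real key must replace it.
PLACEHOLDER = "YOUR_API_KEY_HERE"


def find_api_key(config_path):
    """Return the first real scicrunch_api_key in the file, or None.

    Hypothetical helper: assumes an INI file with section headers.
    """
    # Disable interpolation so a '%' inside a key is not misinterpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_path)
    for section in parser.sections():
        key = parser[section].get("scicrunch_api_key", fallback="").strip()
        if key and key != PLACEHOLDER:
            return key
    return None


# Demo on a temporary file (the section name and key are made up).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "config.ini"
    demo_path.write_text("[demo]\nscicrunch_api_key=abc123\n")
    demo_key = find_api_key(demo_path)

print("API key found:", demo_key is not None)
```

To check your own setup, call `find_api_key` with the path to your config file; a result of `None` suggests the placeholder is still in place or the key is missing.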

## Getting Started and Listing Datasets

In [10]:
# import the necessary packages
import json

from sparc.client import SparcClient

Now we can instantiate the client to get started:

In [11]:
# instantiate sparc.client with the config file that has your API key
client = SparcClient(connect=False, config_file='../config/config.ini')

The following example retrieves a list of datasets and associated metadata. Getting a list of datasets will also allow you to view the JSON structure and content of the metadata.


In [12]:
# Call function to list all published datasets
response = client.metadata.list_datasets()

response

{'took': 7,
 'timed_out': False,
 '_shards': {'total': 2, 'successful': 2, 'skipped': 0, 'failed': 0},
 'hits': {'total': 267,
  'max_score': 1.0,
  'hits': [{'_index': 'scr_017041-sparc_algolia_pr-ks-2022jul26',
    '_type': 'ks',
    '_id': '35',
    '_score': 1.0,
    '_ignored': ['pennsieve.versionPublishedAt.timestamp',
     'pennsieve.revisedAt.timestamp',
     'dates.updated.timestamp',
     'pennsieve.firstPublishedAt.timestamp',
     'dates.created.timestamp',
     'pennsieve.createdAt.timestamp',
     'pennsieve.updatedAt.timestamp'],
    '_source': {'item': {'types': [{'name': 'dataset', 'type': 'category'}],
      'contentTypes': [{'curie': 'ilx:0381348', 'name': 'product'}],
      'names': [{'nameType': 'Complete Data Set',
        'name': 'Brainstem Neuron Recording 2019'}],
      'statistics': {'files': {'count': '164'},
       'directory': {'count': '43'},
       'subjects': {'count': '13'},
       'bytes': {'count': '2208039789512'}},
      'keywords': [{'keyword': 'sw
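The nested structure in the output above can be flattened into a short summary per dataset. The sketch below is one possible approach, not an official helper: `summarize_hits` is a hypothetical function, and `sample_response` is a hand-trimmed copy of the first hit shown above so the example runs without an API key. It assumes each hit follows the `_source` → `item` → `names` / `statistics` layout seen in that output.

```python
# Trimmed copy of the first hit from the output above (values as shown there).
sample_response = {
    "hits": {
        "total": 267,
        "hits": [
            {
                "_id": "35",
                "_source": {
                    "item": {
                        "names": [
                            {
                                "nameType": "Complete Data Set",
                                "name": "Brainstem Neuron Recording 2019",
                            }
                        ],
                        "statistics": {"files": {"count": "164"}},
                    }
                },
            }
        ],
    }
}


def summarize_hits(response):
    """Return one small dict per hit: dataset id, name, and file count."""
    rows = []
    for hit in response.get("hits", {}).get("hits", []):
        item = hit.get("_source", {}).get("item", {})
        # Fall back to an empty entry when a record has no names.
        names = item.get("names") or [{}]
        file_count = item.get("statistics", {}).get("files", {}).get("count")
        rows.append(
            {
                "id": hit.get("_id"),
                "name": names[0].get("name"),
                # Counts arrive as strings in the output above.
                "files": int(file_count) if file_count is not None else None,
            }
        )
    return rows


rows = summarize_hits(sample_response)
for row in rows:
    print(row["id"], row["name"], row["files"])
```

With a live client, the same function could be applied to the `response` returned by `client.metadata.list_datasets()`, provided the records follow the layout assumed here.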

## Search Datasets via a POST-Based Query

This example is a basic search for datasets by dataset identifier. It retrieves SPARC datasets 10, 11, and 12, which are used in other sections of this tutorial. The query is created as a JSON body that is submitted via POST. The dataset identifiers used in the example below refer to:

1. [Dataset with ID 10:](https://sparc.science/datasets/10?type=dataset) Spatial distribution and morphometric characterization of vagal afferents associated with the myenteric plexus of the rat stomach
2. [Dataset with ID 11:](https://sparc.science/datasets/11?type=dataset) Spatial distribution and morphometric characterization of vagal afferents (intramuscular arrays (IMAs)) within the longitudinal and circular muscle layers of the rat stomach
3. [Dataset with ID 12:](https://sparc.science/datasets/12?type=dataset) Spatial distribution and morphometric characterization of vagal efferents associated with the myenteric plexus of the rat stomach

In [14]:
# Set query as a JSON query string
body = '{"query": {"terms": {"_id": ["10", "11", "12"]}}}'
body_json = json.loads(body)

# Get results from Elasticsearch
response = client.metadata.search_datasets(body_json)

# Check number of search hits available
number_of_records = response['hits']['total']
print(f'Number of records returned: {number_of_records}')

# Can now process individual records as needed

Number of records returned: 3
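When the dataset identifiers come from a Python list, the query body can be built as a dictionary instead of a hand-written JSON string. This is a small sketch of that idea: `build_id_query` is a hypothetical helper, not part of sparc.client, and it produces the same `terms` query on `_id` used in the example above.

```python
import json


def build_id_query(dataset_ids):
    """Build an Elasticsearch terms query that matches the given dataset IDs."""
    # The example above passes IDs as strings, so convert integers here.
    return {"query": {"terms": {"_id": [str(i) for i in dataset_ids]}}}


query = build_id_query([10, 11, 12])

# Print the JSON that would be submitted in the POST body.
print(json.dumps(query))
```

The resulting dictionary can be passed to `client.metadata.search_datasets` in place of `body_json`.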


You are now able to submit queries to the Elasticsearch endpoint. Refer to the [Elasticsearch](https://www.elastic.co) documentation to build queries for your specific use case.