# Browsing projects
***
This **tutorial** shows an example of how to browse among the different entries (projects) in the **[MDDB](https://irb.mddbr.eu) database**. Each project contains metadata such as names of the authors, parameters of the simulation, etc.

Although the **examples** are presented **step by step** with associated information, we strongly recommend that you first spend some time reading documentation about **Molecular Dynamics (MD)** and the generated **trajectories and metadata** to get familiar with the terms used, especially if you are new to the field.

This workflow is based on the MDDB database **[REST API](https://irb.mddbr.eu/api/rest/docs/)**.

***
**Version:** 1.0 (May 2025)
***
**Contributors:**  Adam Hospital, Daniel Beltrán, Aurélien Luciani, Genís Bayarri, Josep Lluís Gelpí, Modesto Orozco (IRB-Barcelona, Spain)
***
**Contact:** [daniel.beltran@irbbarcelona.org](mailto:daniel.beltran@irbbarcelona.org)
***

#### Import required libraries

In [1]:
import json
# Import the request submodule explicitly, since it is used as urllib.request.urlopen below
import urllib.request
from math import ceil

#### Set some constants

In [2]:
API_BASE_URL = "http://irb.mddbr.eu/api/rest/current"

#### Set a function to query the REST API

In [3]:
# Set a function to call the API and return the parsed JSON response
def query_api(url: str) -> dict:
    # White spaces are not allowed in a URL
    # Replace them with the corresponding percent-encoded character
    # Note that no other special character is encoded here
    parsed_url = url.replace(" ", "%20")
    with urllib.request.urlopen(parsed_url) as response:
        return json.loads(response.read().decode("utf-8"))

## The projects endpoint

To browse projects programmatically through the REST API we use the **'projects'** endpoint. This means we call the API URL with '/projects' appended at the end. In this first block we call the projects endpoint with no additional parameters.

The expected **response** includes 2 fields: filteredCount and projects. The filtered count is the number of projects in the database that match the query. In this case it equals the total number of projects in the database, since there are no query parameters and thus no filter. The projects list contains data from the projects that match the query. Not all matching projects are included in the response. This is explained in the Pagination section.

Note that you can also do the query and see the response in [your own browser](http://irb.mddbr.eu/api/rest/current/projects).

In [4]:
# Set the URL for the projects endpoint
projects_url = API_BASE_URL + '/projects'
print('We query the API at ' + projects_url)

# Query the API
response = query_api(projects_url)
print(f'We found {response["filteredCount"]} projects')

We query the API at http://irb.mddbr.eu/api/rest/current/projects
We found 4146 projects
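
To make the difference between the two response fields concrete, here is a minimal sketch that compares the number of matched projects with the number actually returned. The helper name `summarize_projects_response` and the example response (including its made-up accessions) are hypothetical and only mimic the two fields described above; no API call is made.

```python
def summarize_projects_response(response: dict) -> dict:
    """Compare how many projects matched the query with how many were returned."""
    matched = response["filteredCount"]
    returned = len(response["projects"])
    return {"matched": matched, "returned": returned, "truncated": returned < matched}

# Hypothetical response shaped like the one described above (made-up values)
example_response = {
    "filteredCount": 25,
    "projects": [{"accession": f"EXAMPLE{i:03d}"} for i in range(10)],
}

summary = summarize_projects_response(example_response)
print(summary)  # {'matched': 25, 'returned': 10, 'truncated': True}
```

With the real API, passing the `response` obtained above to this helper should report a truncated result, since only part of the matched projects is returned in a single call.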


## Pagination

When the number of matched projects exceeds the **limit**, only that many projects are returned. The limit is 10 by default, but it can be changed using the 'limit' parameter. However, there is a **hard limit of 100** to avoid overloading the API memory. Requesting more than 100 projects returns only 100.
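
The arithmetic behind these limits can be sketched as follows. The helper names `effective_limit` and `page_count` are hypothetical, and the sketch assumes only what is stated above: a default limit of 10 and a hard limit of 100. How the API treats zero or negative limits is not covered here.

```python
from math import ceil

DEFAULT_LIMIT = 10   # default number of projects per response, as stated above
HARD_LIMIT = 100     # maximum number of projects per response, as stated above

def effective_limit(requested_limit=None) -> int:
    """Return the limit we expect the API to apply for a requested limit."""
    if requested_limit is None:
        return DEFAULT_LIMIT
    return min(requested_limit, HARD_LIMIT)

def page_count(n_projects: int, requested_limit=None) -> int:
    """Return the number of pages needed to retrieve all matched projects."""
    return ceil(n_projects / effective_limit(requested_limit))

print(page_count(4146))       # 415 pages with the default limit of 10
print(page_count(4146, 100))  # 42 pages with the maximum limit of 100
print(page_count(4146, 500))  # 42 pages, since 500 is capped to 100
```

Using the maximum limit keeps the number of requests as low as possible.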

To obtain the data of all projects **it is necessary to paginate**. This means we run several queries and get the response in pieces. To do so we must also use the 'page' parameter. For this example we use no query parameters, as in the previous block, so the query matches all projects in the database. Our aim is to mine the accession values of all projects in the database.

In [5]:
# Set a list to store all the mined accession values
accessions = []
# Get the number of projects from the previous response
n_projects = response['filteredCount']
# Set the limit of projects per page
limit = 100
# Calculate the expected number of pages
pages = ceil(n_projects / limit)
# Iterate over pages
# Note that page numbers are 1-based, NOT 0-based
for page in range(1, pages + 1):
    print(f'Requesting page {page}/{pages}', end='\r')
    # Set the URL for the projects endpoint
    # This time include both limit and page parameters
    paginated_url = f'{projects_url}?limit={limit}&page={page}'
    # Query the API
    response = query_api(paginated_url)
    # Mine target data
    projects = response['projects']
    project_accessions = [ project['accession'] for project in projects ]
    accessions += project_accessions
    
print(f'We have mined {len(accessions)} accessions')

We have mined 4146 accessions
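
The same loop can be wrapped in a reusable generator. This is a minimal sketch rather than part of the API: the name `iter_projects` is hypothetical, and the function that performs the request is passed in as `fetch` so that `query_api` (or a stand-in for testing) can be used. It assumes that every response carries the 'filteredCount' and 'projects' fields described earlier.

```python
from typing import Callable, Iterator

def iter_projects(fetch: Callable[[str], dict], base_url: str, limit: int = 100) -> Iterator[dict]:
    """Yield every project matched by a query, requesting one page at a time."""
    page = 1  # pages are 1-based
    while True:
        response = fetch(f"{base_url}?limit={limit}&page={page}")
        projects = response["projects"]
        # Guard against an empty page so the loop can never run forever
        if not projects:
            break
        yield from projects
        # Stop once the pages requested so far cover all matched projects
        if page * limit >= response["filteredCount"]:
            break
        page += 1

# Usage with the function defined earlier in this tutorial:
# accessions = [project["accession"] for project in iter_projects(query_api, projects_url)]
```

Reading 'filteredCount' from each response avoids the extra initial query that was used above only to count the projects.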


***
## Simple queries

The simple query is done with the **'search'** query argument. The API will search for all projects which contain the requested word(s) in the following fields:
   - Project name
   - Project description
   - Author names
   - Group names
   
In the following example we query for projects including the term 'IRB Barcelona'. Note that the number of projects found is lower than the total number of projects in the database.
***

In [6]:
# Set the URL for the projects endpoint including a query parameter
projects_query_url = projects_url + '?search=IRB Barcelona'
print(f'We query the API at "{projects_query_url}"')

# Query the API
response = query_api(projects_query_url)
print(f'We found {response["filteredCount"]} projects')

We query the API at "http://irb.mddbr.eu/api/rest/current/projects?search=IRB Barcelona"
We found 2426 projects


In the following tutorial we will learn what data can be found in a project and how to mine it.

# If you have any questions, please do not hesitate to ask

Contact: daniel.beltran@irbbarcelona.org