# Exercises for the IMPC API Online Tutorial
Before starting the exercises, run the cell below to import the necessary functions.

To complete the exercises, use the provided function templates. The exercises build on each other step by step.

You can check your answers against the solutions on the tutorial page.

In [None]:
# Import the IMPC API functions used in the exercises.
from impc_api import solr_request, batch_solr_request

# Exercise 1: Getting Familiar with the Core
We will be working with the `genotype-phenotype` core. To familiarise yourself with the data, request all documents from this core using the `solr_request` function and the `q` parameter.
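The `params` dictionary holds standard Solr query parameters. As a rough illustration of how such parameters travel in an HTTP request, the sketch below builds a Solr search URL for a core using only the standard library. The base URL and the `/select` path are assumptions about the IMPC Solr service rather than something stated in this tutorial, and no request is actually sent. The example reuses the query from Exercise 8 so that this exercise is still yours to solve.

```python
from urllib.parse import urlencode

# Assumed base URL of the IMPC Solr service (not taken from this tutorial).
SOLR_BASE_URL = "https://www.ebi.ac.uk/mi/impc/solr"


def build_select_url(core: str, params: dict) -> str:
    """Return a Solr /select URL for `core` with URL-encoded query parameters."""
    return f"{SOLR_BASE_URL}/{core}/select?{urlencode(params)}"


# The query from Exercise 8, expressed as a URL.
url = build_select_url(
    "genotype-phenotype",
    {"q": "marker_symbol:Xrcc5", "rows": 3},
)
print(url)
```

Note how the colon in `marker_symbol:Xrcc5` is percent-encoded as `%3A`; libraries that wrap Solr handle this encoding for you.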

In [None]:
# Modify the function below to do the exercise.
num_found, df = solr_request(
    ...
)

# Exercise 2: Requesting Three Documents
Let's try using the `rows` parameter. Request three documents from the `genotype-phenotype` core. You can modify the query from the previous exercise.
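`rows` caps how many documents a single request returns. Standard Solr also accepts a `start` offset, so a large result can be fetched page by page. The hypothetical helper below computes the `(start, rows)` pairs needed to cover `num_found` documents. It only illustrates the general idea and is not taken from `impc_api`.

```python
def page_offsets(num_found: int, rows: int) -> list[tuple[int, int]]:
    """Return (start, rows) pairs that together cover num_found documents.

    Hypothetical helper for illustration; the last page may hold fewer than
    `rows` documents, so its size is trimmed to what remains.
    """
    if rows <= 0:
        raise ValueError("rows must be positive")
    return [
        (start, min(rows, num_found - start))
        for start in range(0, num_found, rows)
    ]


# Eight documents fetched three at a time need three requests.
print(page_offsets(8, 3))  # [(0, 3), (3, 3), (6, 2)]
```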

In [None]:
# Modify the function below to do the exercise.
num_found, df = solr_request(
    core='genotype-phenotype',
    ...
)

# Exercise 3: Selecting Specific Fields
As you can see, there are many fields. To focus on the ones we need, request only the following:

* marker_symbol
* marker_accession_id
* parameter_name
* parameter_stable_id
* p_value
* zygosity

Modify the query from Exercise 2 to request only the fields listed above. Here they are as a single comma-separated string: `marker_symbol,marker_accession_id,parameter_name,parameter_stable_id,p_value,zygosity`
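If you prefer to keep the fields in a Python list, you can build the comma-separated string with `str.join`, which avoids typos when the list changes. This is only a convenience; the resulting string is identical to the one given above.

```python
# Fields requested in this exercise, one per list entry.
fields = [
    "marker_symbol",
    "marker_accession_id",
    "parameter_name",
    "parameter_stable_id",
    "p_value",
    "zygosity",
]

# Join them into the single comma-separated string shown in the text.
field_string = ",".join(fields)
print(field_string)
```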

In [None]:
# Modify the function below to do the exercise.
num_found, df = solr_request(
    core='genotype-phenotype',
    params={
        'q': '*:*',
        'rows': 3,
        ...
    }
)

# Exercise 4: Filtering by Single Field
Let's now focus on a particular gene. In this example we will use *Dclk1*. Modify the query from Exercise 3 so that only documents for this gene are returned.
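Solr clauses take the form `field:value`, as in the `marker_symbol:Xrcc5` query in Exercise 8. Values containing spaces or Solr special characters are safest wrapped in double quotes. The hypothetical helper below builds such a clause, escaping backslashes and double quotes inside the value. It is a sketch of the query syntax, not part of `impc_api`; the second call uses an arbitrary example value containing a space.

```python
def field_query(field: str, value: str) -> str:
    """Return a Solr clause that searches `field` for `value` as a quoted phrase."""
    # Escape backslashes first, then double quotes, so the phrase stays well-formed.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


print(field_query("marker_symbol", "Xrcc5"))  # marker_symbol:"Xrcc5"
print(field_query("parameter_name", "Body weight"))  # parameter_name:"Body weight"
```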

In [None]:
# Modify the function below to do the exercise.
num_found, df = solr_request(
    core='genotype-phenotype',
    params={
        'q': '*:*',
        'rows': 3,
        'fl': 'marker_symbol,marker_accession_id,parameter_name,parameter_stable_id,p_value,zygosity'
    }
)

# Exercise 5: Changing P-Value Threshold
Let's apply a stricter p-value threshold so that only results with a p-value below 1e-4 are returned. Modify the query from Exercise 3.

**Note:** Field names in Solr may be spelled differently from the everyday term. For example, `p_value` is the name of the field in Solr, whereas "p-value" is the term used in practice.
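Numeric thresholds such as p-value cut-offs are written in Solr as range queries. Square brackets make an endpoint inclusive, curly braces make it exclusive, the two can be mixed, and `*` leaves an end open (so `[* TO *]`, as used in Exercise 7, matches any document that has a value for the field). The hypothetical helper below assembles such a clause; the examples use `effect_size` so that the p-value exercise is still yours to solve.

```python
from typing import Optional, Union

Number = Union[int, float]


def range_query(
    field: str,
    low: Optional[Number] = None,
    high: Optional[Number] = None,
    include_low: bool = True,
    include_high: bool = True,
) -> str:
    """Return a Solr range clause such as 'field:[low TO high}'.

    A bound of None becomes '*', i.e. that end of the range is open.
    """
    left = "[" if include_low else "{"
    right = "]" if include_high else "}"
    low_str = "*" if low is None else str(low)
    high_str = "*" if high is None else str(high)
    return f"{field}:{left}{low_str} TO {high_str}{right}"


# Effect sizes of at least 1 (inclusive lower bound, open upper bound).
print(range_query("effect_size", low=1))  # effect_size:[1 TO *]
# Effect sizes strictly below 0.5 (exclusive upper bound).
print(range_query("effect_size", high=0.5, include_high=False))  # effect_size:[* TO 0.5}
```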

In [None]:
# Modify the function below to do the exercise.
num_found, df = solr_request(
    core='genotype-phenotype',
    params={
        'q': '*:*',
        'rows': 3,
        'fl': 'marker_symbol,marker_accession_id,parameter_name,parameter_stable_id,p_value,zygosity'
    }
)

# Exercise 6: Applying Multiple Filters
Let's combine the filters from Exercise 4 and Exercise 5: restrict `marker_symbol` to *Dclk1* and apply the stricter p-value threshold (less than 1e-4).
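In Solr, several conditions can be combined in one query string with the boolean operator `AND`. The hypothetical helper below joins clauses this way, wrapping each in parentheses so that any `OR` inside a clause keeps its meaning. The example clauses are illustrative and deliberately differ from the ones this exercise asks for.

```python
def all_of(*clauses: str) -> str:
    """Join Solr clauses so that a document must match every one of them."""
    if not clauses:
        # With no conditions, fall back to the match-all query.
        return "*:*"
    return " AND ".join(f"({clause})" for clause in clauses)


query = all_of("marker_symbol:Xrcc5", "effect_size:[1 TO *]")
print(query)  # (marker_symbol:Xrcc5) AND (effect_size:[1 TO *])
```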

In [None]:
# Modify the function below to do the exercise.
num_found, df = solr_request(
    core='genotype-phenotype',
    params={
        'q': '*:*',
        'rows': 3,
        'fl': 'marker_symbol,marker_accession_id,parameter_name,parameter_stable_id,p_value,zygosity'
    }
)

# Exercise 7: Explore Null Values
Run the query below and answer the following questions: how many fields (columns) will the resulting dataset contain? Why?

In [None]:
num_found, df = solr_request(
    core='statistical-result',
    params={
        'q': 'NOT mp_term_name:[* TO *]',
        'fl': 'marker_symbol,effect_size,p_value,mp_term_name',
        'rows': 3
    }
)

# Exercise 8: Download the Data
Below is an example query using the `solr_request` function. Download the same data in JSON format using the `batch_solr_request` function.

In [None]:
# Example query.
num_found, df = solr_request(
    core='genotype-phenotype',
    params={
        'q': 'marker_symbol:Xrcc5',
        'fl': 'marker_symbol,marker_accession_id,parameter_name,parameter_stable_id,p_value,zygosity',
        'rows': 3
    }
)

# Exercise 9: Facet Request
Run the cell below and look at the output. How many zygosity categories are there?
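By default, Solr's JSON response reports the facet counts for a field as one flat list that alternates values and counts. If you inspect a raw Solr response, such a list can be turned into a dictionary as sketched below. The values and counts here are made up for illustration (they are not real IMPC counts), and `impc_api` may already present facets in a different form.

```python
def facet_pairs_to_dict(flat: list) -> dict:
    """Convert a flat facet list [value1, count1, value2, count2, ...] to {value: count}."""
    values = flat[0::2]  # every even position holds a facet value
    counts = flat[1::2]  # every odd position holds its count
    return dict(zip(values, counts))


# Made-up facet output for illustration only.
example_facets = ["category_a", 120, "category_b", 45, "category_c", 3]
facet_counts = facet_pairs_to_dict(example_facets)
print(facet_counts)  # {'category_a': 120, 'category_b': 45, 'category_c': 3}
print(len(facet_counts))  # 3
```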

In [None]:
num_found, df = solr_request(
    core='genotype-phenotype',
    params={
        "q": "*:*",
        "rows": 0,
        "facet": "on",
        "facet.field": "zygosity",
        "facet.limit": 15,
        "facet.mincount": 1,
    }
)

# Exercise 10: Iterate Over Genes
Below is a list of genes. Run the cell below and observe the result.
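Conceptually, requesting data for a list of genes is like asking Solr for documents whose `marker_symbol` matches any gene in the list. The hypothetical helper below writes that as a single boolean `OR` clause. It only illustrates the Solr syntax; we do not claim that `batch_solr_request` builds its requests this way.

```python
def any_of(field: str, values: list[str]) -> str:
    """Return a Solr clause matching documents whose `field` matches any of `values`."""
    return f"{field}:({' OR '.join(values)})"


genes = ["Prkdc", "Xrcc5", "Xrcc4", "Wrn"]
print(any_of("marker_symbol", genes))
# marker_symbol:(Prkdc OR Xrcc5 OR Xrcc4 OR Wrn)
```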

In [None]:
# Write genes to the Python list.
genes = ['Prkdc', 'Xrcc5', 'Xrcc4', 'Wrn']

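# Request data for every gene in the list.
df = batch_solr_request(
    core='genotype-phenotype',
    params={
        'q': '*:*',
        'fl': 'marker_symbol,mp_term_name,p_value',
        'field_list': genes,            # values to search for
        'field_type': 'marker_symbol'   # field the values are matched against
    },
    download=False  # return a DataFrame instead of writing a file
)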
)

display(df)