### 2-3b Finding files via user interface

#### Learning Objectives
Workshop attendees will learn how to use the GA4GH Data Connect Service.

What will participants do as part of the exercise?

 - Understand how the schema provided by Data Connect can be used to generate a user interface specific to the data source.


#### Icons in this Guide

 🖐 A hands-on section where you will code something or interact with the server
 
### Query files
This example uses the same query as in notebook 2-3a but provides a user interface to filter the query results without the user having to write an SQL query.

#### Step 1: Set up a Data Connect Client  

🖐 Run the next two cells to set up Data Connect and run a query to populate a dataframe. We won't display the dataframe yet.

In [1]:
from fasp.search import DataConnectClient
searchClient = DataConnectClient('https://publisher-data.publisher.dnastack.com/data-connect/')

In [2]:
query_all = '''SELECT f.sample_name, drs_id bam_drs_id, acc, population, mapped, sequencing_type
FROM collections.public_datasets.onek_genomes_ssd_drs s 
join collections.public_datasets.onek_genomes_sra_drs_files f on f.sample_name = s.su_submitter_id 
where filetype = 'bam'   
 '''
int_df = searchClient.run_query(query_all, returnType='dataframe')
print("Query complete. Continue with next step.")

Retrieving the query
____Page1_______________
____Page2_______________
____Page3_______________
____Page4_______________
____Page5_______________
____Page6_______________
____Page7_______________
____Page8_______________
Query complete. Continue with next step.


### Set up an interactive widget to filter the query results
This uses the information from the table schemas to create an interactive user interface for filtering the data.

The details of the next cell are not important unless you are a Python programmer.

In [3]:
import ipywidgets as widgets
from ipywidgets import interact

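def getColValues(info, columns):
    """Return {column: [(title, const), ...]} for the enumerated columns named in `columns`."""
    enumVals = {}
    for column in columns:
        # An enumerated column lists its allowed values under 'oneOf' in the table schema
        var = info['data_model']['properties'][column]
        # Build (label, value) pairs, the form an ipywidgets dropdown accepts as options
        enumVals[column] = [(value['title'], value['const']) for value in var['oneOf']]
    return enumVals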
    
info1 = searchClient.list_table_info('thousand_genomes.onek_genomes.ssd_drs').schema
enumCols1 = getColValues(info1, ['population'])
info2 = searchClient.list_table_info('thousand_genomes.onek_genomes.sra_drs_files').schema
enumCols2 = getColValues(info2, ['sequencing_type','mapped'])
#print(enumCols1)
#print(enumCols2)

def filter_onek(
                 population=enumCols1['population'],
                 sequencing_type=enumCols2['sequencing_type'],
                 mapped=enumCols2['mapped']
                ):
    
    selected_df = int_df.loc[ (int_df['population'] == population) 
                            & (int_df['sequencing_type'] == sequencing_type)
                            & (int_df['mapped'] == mapped)]
    #drs_ids = selected_df['bam_drs_id'].tolist()
    return selected_df
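
To make the schema lookup above concrete, here is a minimal standalone sketch of the structure that `getColValues` reads. The `sample_schema` dictionary and the `dropdown_options` helper are hypothetical, written only to mirror the `data_model` / `properties` / `oneOf` path used in the cell; the real schema returned by the service will contain more columns and values.

```python
# Hypothetical schema fragment, shaped like the structure getColValues reads.
# The entries are illustrative and not copied from the live service.
sample_schema = {
    "data_model": {
        "properties": {
            "population": {
                "oneOf": [
                    {"const": "ACB", "title": "African Caribbean in Barbados"},
                    {"const": "GBR", "title": "British in England and Scotland"},
                ]
            },
            # A free-text column has no 'oneOf' list, so it gets no dropdown.
            "sample_name": {"type": "string"},
        }
    }
}


def dropdown_options(schema, column):
    """Return (label, value) pairs for one enumerated column (hypothetical helper)."""
    prop = schema["data_model"]["properties"][column]
    return [(entry["title"], entry["const"]) for entry in prop["oneOf"]]


options = dropdown_options(sample_schema, "population")
print(options)
```

Each pair holds the human-readable label shown in the dropdown and the coded value that is compared against the dataframe column.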

### Try out the user interface
🖐 Run the following cell to display the interface, then make different selections from the dropdown menus to filter the data shown in the dataframe.

In [4]:
drs_list = interact(filter_onek,  
                 population=enumCols1['population'],
                 sequencing_type=enumCols2['sequencing_type'],
                 mapped=enumCols2['mapped']
                )

interactive(children=(Dropdown(description='population', options=(('African Caribbean in Barbados', 'ACB'), ('…

#### Key point:
This illustrates the value of a machine-readable schema that describes the data. Such a schema:
* Allows a user to understand a new data source
* Allows a developer to create a user interface for those users from the information the schema provides.
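
As one possible illustration of the second point, a developer could let the schema decide which columns get a dropdown instead of hard-coding the column names. The sketch below assumes the same `data_model` / `properties` / `oneOf` layout used earlier in this notebook; `enumerated_columns` and `example_schema` are hypothetical names, and the values are made up for the example.

```python
def enumerated_columns(schema):
    """Map every column that declares a 'oneOf' list to its (label, value) pairs."""
    found = {}
    for name, prop in schema["data_model"]["properties"].items():
        choices = prop.get("oneOf")
        if choices:
            # Fall back to the coded value when an entry has no 'title'.
            found[name] = [(c.get("title", c["const"]), c["const"]) for c in choices]
    return found


# Hypothetical schema: two enumerated columns and one free-text column.
example_schema = {
    "data_model": {
        "properties": {
            "sample_name": {"type": "string"},
            "mapped": {
                "oneOf": [
                    {"const": "mapped", "title": "Mapped"},
                    {"const": "unmapped", "title": "Unmapped"},
                ]
            },
            "sequencing_type": {"oneOf": [{"const": "exome"}]},
        }
    }
}

widgets_needed = enumerated_columns(example_schema)
for column, options in widgets_needed.items():
    print(column, options)
```

With this approach, a column added to the schema with its own `oneOf` list would be picked up without any change to the interface code.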