# BTE Integration Possibilities and Suggestions

By Paul Gaudin

## 1. Functionality to allow 2+ inputs, intermediates, and outputs

In my use case notebooks, I often need predictions across many input nodes, intermediate node types, and output types. A single function could handle this. It would take a list of input objects, a list of output types, and a list of intermediate types, run a query for every combination, and concatenate the results into one DataFrame.

Here is what the function could look like:

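```python
import pandas as pd
from biothings_explorer.user_query_dispatcher import FindConnection


def predict_many(input_object_list, output_type_list, intermediate_node_list):
    """Query BTE for every (input, output type, intermediate type) combination
    and concatenate all non-empty result tables into one DataFrame."""
    df_list = []
    for input_object in input_object_list:
        for output_type in output_type_list:
            for inter in intermediate_node_list:
                print("Intermediate node type running:", inter)
                try:
                    fc = FindConnection(input_obj=input_object,
                                        output_obj=output_type,
                                        intermediate_nodes=[inter])
                    fc.connect(verbose=False)
                    df = fc.display_table_view()
                    # keep only queries that returned at least one row
                    if df is not None and df.shape[0] > 0:
                        df_list.append(df)
                except Exception as e:
                    print("FAILED:", output_type, "via", inter, "-", e)
    if df_list:
        # ignore_index=True renumbers rows 0..n-1 across all result tables
        return pd.concat(df_list, ignore_index=True)
    # no query produced any results
    return None
```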
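The three nested loops can also be written with `itertools.product`, which makes the input × output × intermediate combinations explicit. The sketch below is a hypothetical, library-agnostic variant and not part of BTE. `query_fn` stands in for whatever runs a single BTE query (for example, a wrapper around `FindConnection`). Results are collected as lists of row dicts rather than DataFrames, so the pattern can be tested without network access. Failed combinations are returned instead of only printing "FAILED".

```python
from itertools import product
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]
Combo = Tuple[Any, str, str]


def predict_many_sketch(
    inputs: Iterable[Any],
    output_types: Iterable[str],
    intermediate_types: Iterable[str],
    query_fn: Callable[[Any, str, str], List[Row]],
) -> Tuple[List[Row], List[Tuple[Combo, str]]]:
    """Run query_fn on every (input, output type, intermediate type) combination.

    query_fn is a hypothetical stand-in for one BTE query; it should return
    a list of result rows (dicts) for a single combination.
    Returns (all_rows, failures); each failure pairs the combination with
    its error message.
    """
    all_rows: List[Row] = []
    failures: List[Tuple[Combo, str]] = []
    # product() yields combinations in the same order as three nested loops
    # (inputs outermost, intermediate types innermost)
    for combo in product(inputs, output_types, intermediate_types):
        try:
            rows = query_fn(*combo)
        except Exception as exc:  # record the failure and keep going
            failures.append((combo, str(exc)))
            continue
        all_rows.extend(rows)  # an empty result adds nothing
    return all_rows, failures
```

Because the rows are plain dicts, `pd.DataFrame(all_rows)` turns them into a single table afterwards. Returning the failures makes it easy to rerun only the combinations that broke.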

## 2. Get Edges Out Count from Genes

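I use this metric in the use cases as a rough estimate of how specific, or how well researched, a gene is. Genes with more outgoing edges tend to be better studied. The function below computes the count for a list of gene symbols by summing the direct connections from each gene to every node type: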

```python
# Relies on `ht` (a biothings_explorer Hint instance) and FindConnection,
# both created in the next cell.
def get_gene_edges_out_count(gene_list, node_type_list):
    """Count the direct (one-hop) edges from each gene to every node type.

    node_type_list is the list of all node types to count edges to.
    Returns {gene_symbol: count}, or 'Unknown' for symbols not found.
    """
    # tracks the total number of connections from each gene to any node type
    connection_dict = {}
    for gene_symbol in gene_list:
        # use the Hint result whose SYMBOL matches exactly (case-insensitive)
        gene = None
        for hit in ht.query(gene_symbol).get('Gene', []):
            if (hit.get('SYMBOL') or '').lower() == gene_symbol.lower():
                gene = hit
                break
        if gene is None:
            print(gene_symbol + ' could not be found')
            connection_dict[gene_symbol] = 'Unknown'
            continue
        count = 0
        for node_type in node_type_list:
            try:
                # intermediate_nodes=None: only look at direct connections
                fc = FindConnection(input_obj=gene, output_obj=node_type,
                                    intermediate_nodes=None)
                fc.connect(verbose=False)
                df = fc.display_table_view()
                count += df.shape[0]
            except Exception as e:
                print("gene " + gene_symbol + " for output type "
                      + node_type + " failed: " + str(e))
        connection_dict[gene_symbol] = count
    return connection_dict
```


```python
from biothings_explorer.user_query_dispatcher import FindConnection
from biothings_explorer.hint import Hint

ht = Hint()
node_type_list = ['Gene', 'SequenceVariant', 'ChemicalSubstance', 'Disease',
                  'MolecularActivity', 'BiologicalProcess', 'CellularComponent',
                  'Pathway', 'AnatomicalEntity', 'PhenotypicFeature']

# ASLD5 is included on purpose: as far as I know, it is not a real gene symbol
example_gene_counts = get_gene_edges_out_count(['F5', 'F3', 'XPA', 'ASLD5'], node_type_list)
example_gene_counts
```

Output:

```
ASLD5 could not be found

{'F5': 783, 'F3': 1903, 'XPA': 531, 'ASLD5': 'Unknown'}
```
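The metric is only meaningful for genes that were found, so a natural next step is to rank the numeric counts and set the `'Unknown'` entries aside. The helper below is a hypothetical sketch, and the name `rank_genes_by_edges` is not part of BTE. It sorts in ascending order, so genes with the fewest outgoing edges come first. Following the reasoning above, these are likely the more specific or less-studied genes.

```python
from typing import Dict, List, Tuple, Union

Count = Union[int, str]


def rank_genes_by_edges(
    counts: Dict[str, Count],
) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Split a gene -> edge-count dict into ranked genes and unknown genes.

    Non-integer values (such as 'Unknown') are treated as "not found".
    Returns (ranked, unknown), with ranked sorted by count, ascending.
    """
    ranked = [(g, c) for g, c in counts.items() if isinstance(c, int)]
    unknown = [g for g, c in counts.items() if not isinstance(c, int)]
    ranked.sort(key=lambda pair: pair[1])  # fewest edges first
    return ranked, unknown


ranked, unknown = rank_genes_by_edges(
    {'F5': 783, 'F3': 1903, 'XPA': 531, 'ASLD5': 'Unknown'}
)
print(ranked)   # [('XPA', 531), ('F5', 783), ('F3', 1903)]
print(unknown)  # ['ASLD5']
```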

### 2.1 Idea: Alternative to the gene count function

There are only about 20,000 protein-coding genes in humans. Because of that, I imagine it would be feasible to precompute the "edges out" count for every gene, expose the counts as a new "API", and update them periodically. This would save people (or at least me) the time currently spent recomputing these counts whenever they are needed.
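One way to prototype this idea locally is a small cache that recomputes counts only once they are older than a chosen age. The sketch assumes the gene-to-count table is stored as a JSON file. Everything in it is hypothetical: the file layout, the `load_or_refresh` name, the `compute_fn` argument, and the 30-day default. In practice, `compute_fn` could wrap `get_gene_edges_out_count`, and a shared service would replace the local file.

```python
import json
import os
import time
from typing import Callable, Dict, List, Union

Count = Union[int, str]
THIRTY_DAYS = 30 * 24 * 3600  # hypothetical refresh period, in seconds


def load_or_refresh(
    path: str,
    genes: List[str],
    compute_fn: Callable[[List[str]], Dict[str, Count]],
    max_age: float = THIRTY_DAYS,
    now: Callable[[], float] = time.time,
) -> Dict[str, Count]:
    """Return edge counts for `genes`, reusing cached values while fresh.

    The cache file holds {"updated": <unix time>, "counts": {gene: count}}.
    If the file is missing or older than max_age, it is discarded and the
    requested genes are recomputed; otherwise only genes absent from the
    cache are computed and merged in.
    """
    cache = {"updated": 0.0, "counts": {}}
    if os.path.exists(path):
        with open(path) as fh:
            cache = json.load(fh)

    if now() - cache["updated"] > max_age:
        cache = {"updated": now(), "counts": {}}  # stale: start over

    missing = [g for g in genes if g not in cache["counts"]]
    if missing:
        cache["counts"].update(compute_fn(missing))
        with open(path, "w") as fh:
            json.dump(cache, fh)

    # genes that compute_fn did not return are reported as 'Unknown'
    return {g: cache["counts"].get(g, "Unknown") for g in genes}
```

The whole table expires at once rather than per gene, which matches the idea of a periodic bulk update. Passing `now` as a parameter only makes the expiry logic easy to test.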

## 3. Querying for Disease Symptoms