## Processing a subset of COVID-19 models from BioModels.

The BioModels database contains 28 published COVID-19 models, each available in a structured format (SBML). We first process a subset of these 28 models and compute pairwise similarity scores between them with and without grounding, i.e. with and without the ontology identifiers that annotate each model concept.

The six selected reproducible simulation studies targeting COVID-19 are listed [here](https://www.ebi.ac.uk/biomodels/covid-19).

In [None]:
from mira.sources import biomodels
from mira.metamodel.comparison import *
from mira.metamodel.template_model import *
from mira.metamodel.templates import *

from itertools import combinations
from copy import deepcopy
from tabulate import tabulate

import pandas as pd
import matplotlib.pyplot as plt

# Point MIRA at the domain knowledge graph (DKG) REST service, then load the
# refinement closure used to check whether one concept is an ontological child of another
%env MIRA_REST_URL=http://34.230.33.149:8771
rc = get_dkg_refinement_closure()

env: MIRA_REST_URL=http://34.230.33.149:8771


In [None]:
SUBSET_MODEL_LIST = ["BIOMD0000000955", "BIOMD0000000956", "BIOMD0000000957",
                     "BIOMD0000000958", "BIOMD0000000960", "BIOMD0000000962"]

# Download each model from BioModels and convert it into a MIRA template model
tm_covid_subset_grounding_list = [biomodels.get_template_model(covid_model)
                                  for covid_model in SUBSET_MODEL_LIST]

# Map each model's list index to a short name (the first word of its annotated name)
model_id_subset_name_mapping = {idx: tm.annotations.name.split(' ')[0]
                                for idx, tm in enumerate(tm_covid_subset_grounding_list)}

## We convert the selected COVID-19 models into MIRA template models, then compute pairwise similarity scores between each pair of grounded models.

Similarity scores between template models are calculated by comparing the nodes of the two template models. If two nodes are equal, we add 1 to the score; if a node of the first template model is a refinement of a node of the second template model, we add 0.5. The similarity score is obtained by dividing the total by the number of nodes in the larger template model.
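A toy sketch of this scoring rule follows. It is not MIRA's implementation: nodes are plain strings, the refinement relation is a hypothetical dictionary (`TOY_PARENTS`), and it assumes each node of the first model is scored at most once, against its best match in the second model.

```python
from typing import Callable, Hashable, Sequence


def similarity_score(nodes_a: Sequence[Hashable],
                     nodes_b: Sequence[Hashable],
                     is_refinement: Callable[[Hashable, Hashable], bool]) -> float:
    """Toy node-based similarity: 1 per equal node, 0.5 per refining node."""
    larger = max(len(nodes_a), len(nodes_b))
    if larger == 0:
        return 0.0
    total = 0.0
    for a in nodes_a:
        if any(a == b for b in nodes_b):
            total += 1.0  # an equal node exists in the second model
        elif any(is_refinement(a, b) for b in nodes_b):
            total += 0.5  # this node refines some node of the second model
    # Normalize by the number of nodes in the larger model
    return total / larger


# Hypothetical refinement relation: "I_symptomatic" refines "I"
TOY_PARENTS = {"I_symptomatic": {"I"}}


def toy_is_refinement(child: Hashable, parent: Hashable) -> bool:
    return parent in TOY_PARENTS.get(child, set())


model_a = ["S", "I_symptomatic", "R"]
model_b = ["S", "I", "R", "D"]
print(similarity_score(model_a, model_b, toy_is_refinement))  # 0.625
```

Here `S` and `R` match exactly (1 each), `I_symptomatic` refines `I` (0.5), and the total of 2.5 is divided by 4, the number of nodes in the larger model.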


In [None]:
# Compare the grounded models; rc.is_ontological_child decides whether one concept refines another
tm_covid_subset_comparison = TemplateModelComparison(tm_covid_subset_grounding_list, refinement_func=rc.is_ontological_child)
grounded_scores = tm_covid_subset_comparison.model_comparison.get_similarity_scores()

# Tabulate the pairwise scores, labelling each model by its short name
grounded_scores_df_list = [{'Model1': model_id_subset_name_mapping[d['models'][0]],
                            'Model2': model_id_subset_name_mapping[d['models'][1]],
                            'Similarity Score': d['score']} for d in grounded_scores]
df_grounded = pd.DataFrame(grounded_scores_df_list)
df_grounded

## We visualize the difference between the third (BIOMD0000000957) and sixth (BIOMD0000000962) grounded COVID-19 models.

In [None]:
print(tm_covid_subset_comparison.template_models[2].annotations.name)
print(tm_covid_subset_comparison.template_models[5].annotations.name)

In [None]:
TemplateModelDelta.for_jupyter(tm_covid_subset_comparison.template_models[2],
                               tm_covid_subset_comparison.template_models[5],
                               rc.is_ontological_child, args="-Grankdir=TB")

## We then remove groundings from each of the template models.

After removing groundings from each template model, we expect pairwise similarity scores to decrease, because equalities and refinements that depend on shared ontology identifiers disappear. When groundings are present, a concept node grounded to the same identifier in two template models is classified as equal between them. Once the groundings are removed, that shared concept can no longer be matched through its identifier, so it may no longer be classified as equal between the two models.
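The effect can be illustrated with a toy comparison. `ToyConcept` and `toy_equal` below are hypothetical and deliberately simplified: two concepts count as equal if they share an identifier, and otherwise only if their names match. The `ido:0000511` grounding is used purely for illustration; MIRA's own equality checks are more involved.

```python
from dataclasses import dataclass, field


@dataclass
class ToyConcept:
    """Hypothetical stand-in for a model concept: a name plus optional groundings."""
    name: str
    identifiers: dict = field(default_factory=dict)


def toy_equal(a: ToyConcept, b: ToyConcept) -> bool:
    # Simplifying assumption: grounded concepts are equal when they share an
    # identifier; without groundings we can only fall back to comparing names.
    if a.identifiers and b.identifiers:
        return bool(set(a.identifiers.items()) & set(b.identifiers.items()))
    return a.name == b.name


# Two models name the same compartment differently but ground it identically
infected_1 = ToyConcept("Infected", {"ido": "0000511"})
infected_2 = ToyConcept("I", {"ido": "0000511"})
print(toy_equal(infected_1, infected_2))  # True

# After stripping groundings, only the (different) names remain
infected_1.identifiers = {}
infected_2.identifiers = {}
print(toy_equal(infected_1, infected_2))  # False
```

With groundings, the two differently named compartments match; once the identifiers are cleared, the name-based fallback fails and the shared concept is no longer recognized.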

In [None]:
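# Build an ungrounded copy of each template model by clearing the identifiers
# (ontology groundings) and context of every concept; the originals are left untouched
tm_covid_subset_no_grounding_list = []
for tm in tm_covid_subset_grounding_list:
    copied_tm = deepcopy(tm)
    for template in copied_tm.templates:
        for concept in template.get_concepts():
            concept.identifiers = {}
            concept.context = {}
    tm_covid_subset_no_grounding_list.append(copied_tm)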

## Compute pairwise similarity scores between each pair of ungrounded models

We compare grounded and ungrounded model similarity scores.

In [None]:
tm_covid_subset_comparison_copy = TemplateModelComparison(tm_covid_subset_no_grounding_list, refinement_func=rc.is_ontological_child)
ungrounded_scores = tm_covid_subset_comparison_copy.model_comparison.get_similarity_scores()

# Both comparisons use model lists in the same order, so their scores are assumed to line up pair by pair
list_of_both_subset = []
for grounded_score, ungrounded_score in zip(grounded_scores, ungrounded_scores):
    list_of_both_subset.append({'Model1': model_id_subset_name_mapping[grounded_score['models'][0]],
                                'Model2': model_id_subset_name_mapping[grounded_score['models'][1]],
                                'Similarity Score with Grounding': grounded_score['score'],
                                'Similarity Score without Grounding': ungrounded_score['score']})

no_ground_df = pd.DataFrame(list_of_both_subset)
no_ground_df

## Visualize the difference between the two selected models with groundings now removed.

In [None]:
TemplateModelDelta.for_jupyter(tm_covid_subset_comparison_copy.template_models[2],
                               tm_covid_subset_comparison_copy.template_models[5],
                               rc.is_ontological_child, args="-Grankdir=TB")

In [None]:
# Entry 11 of the score lists is taken to be the pair of models at indices 2 and 5 (BIOMD0000000957 and BIOMD0000000962)
print(f"The similarity score between models 2 and 5 is {grounded_scores[11]['score']} when grounded and {list_of_both_subset[11]['Similarity Score without Grounding']} when ungrounded.")
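The hard-coded index 11 above relies on the order in which `get_similarity_scores` lists model pairs. Assuming that order matches `itertools.combinations` over the model indices (an assumption this notebook does not verify), the position of any pair can be computed instead of counted by hand:

```python
from itertools import combinations

n_models = 6  # size of SUBSET_MODEL_LIST
# Unordered index pairs in the order itertools.combinations produces them
pairs = list(combinations(range(n_models), 2))
print(pairs.index((2, 5)))  # 11
print(len(pairs))           # 15
```

If the library ever enumerates pairs differently, a lookup on the `'models'` field of each score entry would be the safer choice.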

In [None]:
lower_count = 0 
higher_count = 0
same_count = 0
for model_comparison in list_of_both_subset:
    if model_comparison['Similarity Score without Grounding'] < model_comparison['Similarity Score with Grounding']:
        lower_count += 1
    elif model_comparison['Similarity Score without Grounding'] > model_comparison['Similarity Score with Grounding']:
        higher_count += 1
    elif model_comparison['Similarity Score without Grounding'] == model_comparison['Similarity Score with Grounding']:
        same_count += 1
        
print(f"Out of {len(list_of_both_subset)} pairs of models, {lower_count} pairs of models have reduced similarity scores with \
groundings removed. {higher_count} pairs of models have increased similarity scores without groundings. {same_count} pairs of models \
have the same similarity scores with groundings removed.")

## We then compute pairwise similarity scores between 24 of the 26 listed COVID-19 models in the BioModels database. 

The full list of COVID-19 models in the BioModels database can be found [here](https://www.ebi.ac.uk/biomodels/search?query=submitter_keywords:COVID-19&domain=biomodels).
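Because each unordered pair of models is compared once, n models give n(n - 1)/2 pairwise comparisons. A quick check with Python's `math.comb` shows how the number of comparisons grows from the six-model subset to the 24-model set, assuming one score is returned per unordered pair:

```python
from math import comb

# Each unordered pair of models is compared once: n * (n - 1) / 2 comparisons
for n_models in (6, 24):
    print(f"{n_models} models -> {comb(n_models, 2)} pairwise comparisons")
# 6 models -> 15 pairwise comparisons
# 24 models -> 276 pairwise comparisons
```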

In [None]:
ALL_MODEL_LIST = ["BIOMD0000000955", "BIOMD0000000956", "BIOMD0000000957", "BIOMD0000000958",
                  "BIOMD0000000960", "BIOMD0000000962", "BIOMD0000000963", "BIOMD0000000964",
                  "BIOMD0000000969", "BIOMD0000000970", "BIOMD0000000971", "BIOMD0000000972",
                  "BIOMD0000000974", "BIOMD0000000976", "BIOMD0000000977", "BIOMD0000000978",
                  "BIOMD0000000979", "BIOMD0000000980", "BIOMD0000000981", "BIOMD0000000982",
                  "BIOMD0000000983", "BIOMD0000000984", "BIOMD0000000988", "BIOMD0000000991"]


In [None]:
# Download and convert all selected models, as for the subset above
tm_covid_all_grounding_list = [biomodels.get_template_model(covid_model)
                               for covid_model in ALL_MODEL_LIST]

model_id_all_name_mapping = {idx: tm.annotations.name.split(' ')[0]
                             for idx, tm in enumerate(tm_covid_all_grounding_list)}

In [None]:
tm_covid_all_comparison = TemplateModelComparison(tm_covid_all_grounding_list,refinement_func=rc.is_ontological_child)
all_grounded_scores = tm_covid_all_comparison.model_comparison.get_similarity_scores()

In [None]:
tm_covid_all_no_grounding_list = [] 
for tm in tm_covid_all_grounding_list:
    copied_tm = deepcopy(tm)
    for template in copied_tm.templates:
        for concept in template.get_concepts():
            concept.identifiers = {}
            concept.context = {}
    tm_covid_all_no_grounding_list.append(copied_tm)
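The loop above repeats the one used for the six-model subset. One way to avoid the duplication is a small helper; `strip_groundings` is a hypothetical name, and the sketch relies only on the attributes the notebook already uses (`templates`, `get_concepts()`, `identifiers`, `context`).

```python
from copy import deepcopy


def strip_groundings(template_model):
    """Return a deep copy of a template model with all concept groundings removed.

    Hypothetical helper: it only touches the attributes used in this notebook
    (templates, get_concepts(), identifiers and context); the input is not modified.
    """
    copied_tm = deepcopy(template_model)
    for template in copied_tm.templates:
        for concept in template.get_concepts():
            concept.identifiers = {}
            concept.context = {}
    return copied_tm


# Usage with the notebook's variables:
# tm_covid_all_no_grounding_list = [strip_groundings(tm) for tm in tm_covid_all_grounding_list]
```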

In [None]:
tm_covid_all_comparison_copy = TemplateModelComparison(tm_covid_all_no_grounding_list,refinement_func=rc.is_ontological_child)
all_ungrounded_scores = tm_covid_all_comparison_copy.model_comparison.get_similarity_scores()

list_of_both_all = []
for grounded_score,ungrounded_score in zip(all_grounded_scores,all_ungrounded_scores):
    list_of_both_all.append({'Model1':model_id_all_name_mapping[grounded_score['models'][0]],
                         'Model2':model_id_all_name_mapping[grounded_score['models'][1]],
                         'Similarity Score with Grounding':grounded_score['score'],
                        'Similarity Score without Grounding':ungrounded_score['score']})

In [None]:
after_sim_lower_count = 0
after_sim_higher_count = 0
after_sim_same = 0 
for model_comparison in list_of_both_all:
    if model_comparison['Similarity Score without Grounding'] < model_comparison['Similarity Score with Grounding']:
        after_sim_lower_count += 1
    elif model_comparison['Similarity Score without Grounding'] > model_comparison['Similarity Score with Grounding']:
        after_sim_higher_count += 1
    elif model_comparison['Similarity Score without Grounding'] == model_comparison['Similarity Score with Grounding']:
        after_sim_same += 1

In [None]:
print(f"Removing groundings led to a decrease in pairwise similarity scores for {after_sim_lower_count} or {round(100 * after_sim_lower_count / len(list_of_both_all), 2)}% of the {len(list_of_both_all)} pairwise model comparisons.")

In [None]:
print(f"Removing groundings led to an increase in pairwise similarity scores for {after_sim_higher_count} or {round(100 * after_sim_higher_count / len(list_of_both_all), 2)}% of the {len(list_of_both_all)} pairwise model comparisons.")

In [None]:
print(f"Removing groundings led to no change in pairwise similarity scores for {after_sim_same} or {round(100 * after_sim_same / len(list_of_both_all), 2)}% of the {len(list_of_both_all)} pairwise model comparisons.")
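The three counts and percentages can also be computed in a single pass. The sketch below assumes the same record layout as `list_of_both_all` (dicts with `'Similarity Score with Grounding'` and `'Similarity Score without Grounding'` keys); `tally_changes` is a hypothetical helper and the scores are made up for illustration.

```python
from collections import Counter


def tally_changes(comparisons):
    """Count how removing groundings changed each pairwise score (expects a non-empty list)."""
    def direction(c):
        delta = c['Similarity Score without Grounding'] - c['Similarity Score with Grounding']
        return 'decrease' if delta < 0 else 'increase' if delta > 0 else 'no change'

    counts = Counter(direction(c) for c in comparisons)
    total = len(comparisons)
    # Percentages are scaled by 100 so they match a "%" label
    return {label: (counts[label], round(100 * counts[label] / total, 2))
            for label in ('decrease', 'increase', 'no change')}


# Made-up scores, for illustration only
toy = [
    {'Similarity Score with Grounding': 0.5, 'Similarity Score without Grounding': 0.25},
    {'Similarity Score with Grounding': 0.4, 'Similarity Score without Grounding': 0.4},
    {'Similarity Score with Grounding': 0.3, 'Similarity Score without Grounding': 0.2},
    {'Similarity Score with Grounding': 0.1, 'Similarity Score without Grounding': 0.2},
]
print(tally_changes(toy))
# {'decrease': (2, 50.0), 'increase': (1, 25.0), 'no change': (1, 25.0)}
```

With the notebook's data, `tally_changes(list_of_both_all)` would return each count together with its percentage.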

In [None]:
diffs = []
for model_comparison in list_of_both_all:
    diffs.append(model_comparison['Similarity Score with Grounding'] - model_comparison['Similarity Score without Grounding'])

## We then plot the difference in similarity scores before and after groundings are removed.

In [None]:
plt.boxplot(diffs, showmeans=True)
plt.ylabel('Model comparison score difference\n (with DKG - without DKG)')
plt.xlabel('COVID-19 epidemiology model pairs')
plt.xticks([])
plt.plot([-1, 2], [0, 0], 'k--')
plt.xlim([0.5, 1.5])

In [None]:
plt.boxplot([abs(d) for d in diffs], showmeans=True)
plt.ylabel('Model comparison score difference\n abs(with DKG - without DKG)')
plt.xlabel('COVID-19 epidemiology model pairs')
plt.xticks([])
plt.xlim([0.5, 1.5])
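The boxplots summarize the distribution of `diffs` graphically; a numeric summary can be printed alongside them. The sketch below uses NumPy percentiles for the quartiles, which may differ slightly from the quartiles a given matplotlib version draws, and `summarize_diffs` is a hypothetical helper name.

```python
import numpy as np


def summarize_diffs(diffs):
    """Numeric summary of score differences (with grounding minus without grounding)."""
    values = np.asarray(diffs, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return {
        'mean': float(values.mean()),
        'median': float(median),
        'q1': float(q1),
        'q3': float(q3),
        'mean_abs': float(np.abs(values).mean()),
        # Fraction of pairs whose score dropped once groundings were removed
        'share_positive': float((values > 0).mean()),
    }


# Made-up differences, for illustration only; with the notebook's data use summarize_diffs(diffs)
print(summarize_diffs([-0.1, 0.0, 0.1, 0.2, 0.3]))
```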