# Overview

This notebook is offered as a solution to the question 'What has been published about medical care?'

The approach uses ScispaCy word vectors to embed titles and abstracts and BERT (via sentence-transformers) to embed individual sentences, then uses cosine similarity to find papers that match the topic. Autogenerated summaries are included for highly relevant papers.
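To make the scoring idea concrete, here is a minimal sketch of averaging word vectors (skipping stop words, roughly as the `get_doc_vec` helper below does) and comparing documents to a query with cosine similarity. The three-dimensional vectors and the stop-word set are invented for illustration; the notebook itself uses ScispaCy's word vectors and scikit-learn's `cosine_similarity`.

```python
import math

# Hypothetical toy word vectors; real ScispaCy vectors have hundreds of dimensions.
WORD_VECS = {
    "ventilator": [0.9, 0.1, 0.0],
    "shortage": [0.7, 0.3, 0.1],
    "icu": [0.8, 0.2, 0.1],
    "protein": [0.0, 0.2, 0.9],
    "folding": [0.1, 0.1, 0.8],
}
STOP_WORDS = {"the", "of", "in", "a"}  # hypothetical stop-word list


def doc_vector(tokens):
    """Average the vectors of known, non-stop-word tokens (zero vector if none)."""
    dim = len(next(iter(WORD_VECS.values())))
    kept = [WORD_VECS[t] for t in tokens if t not in STOP_WORDS and t in WORD_VECS]
    if not kept:
        return [0.0] * dim
    return [sum(col) / len(kept) for col in zip(*kept)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


query = doc_vector("the ventilator shortage".split())
titles = {
    "icu ventilator shortage": doc_vector("icu ventilator shortage".split()),
    "protein folding": doc_vector("protein folding".split()),
}
scores = {title: cosine(vec, query) for title, vec in titles.items()}
print(max(scores, key=scores.get))  # → icu ventilator shortage
```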

We are given an overall topic, a paragraph explaining the topic, and a list of specific subtopics to answer. The dataset currently contains ~47K papers and includes the full text as well as author and reference information.

# Details of Approach

1. Read in the metadata file. Filter out duplicates and papers that are missing a title or abstract.
2. Use ScispaCy to create an embedding for each title by averaging the word embeddings ScispaCy provides.
3. Repeat step 2 for the abstracts.
4. Repeat step 2 for the topic description.
5. Use cosine similarity to score how similar each title and abstract is to the topic description.
6. Split the data into COVID-19 and non-COVID-19 papers, and keep the 1000 most relevant papers from each group.
7. Load the body text for the 2000 selected papers.
8. Some papers are missing the body text. Attempt to use the DOI URL to scrape the text from the web.
9. Use BERT and the sentence-transformers package to make embeddings for every sentence in the abstract and body text.

**Then for each subtopic**

10. Use keywords from the subtopic to filter out irrelevant papers. Keywords can be autogenerated with ScispaCy or user-specified.
11. Compute the cosine similarity between each body-text sentence and the subtopic sentence.
12. Keep papers that have a large number of sentences with high similarity scores.
13. For each kept paper, look at the results section and pull out the two sentences most relevant to the subtopic. If the results section is missing, use the abstract instead.
14. Display the results.
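A stripped-down version of steps 10 to 12 is sketched below: keep only papers whose text contains a keyword, keep sentences scoring above a fraction of the best remaining sentence score, then rank papers by how many sentences survive. The sentence scores are made up rather than computed with BERT, and `rank_papers` is a hypothetical helper, not a function from the notebook; the 0.85 factor mirrors `SUBMATCH_THRESHOLD`.

```python
import re
from collections import Counter

SUBMATCH_THRESHOLD = 0.85  # same relative cut-off the notebook uses


def rank_papers(paper_texts, sentence_scores, keywords):
    """Rank papers by their number of highly relevant sentences.

    paper_texts: {title: lower-cased full text}
    sentence_scores: list of (title, sentence, similarity score)
    keywords: lower-case keywords; a paper must contain at least one
    """
    pattern = re.compile("|".join(re.escape(k) for k in keywords))
    allowed = {t for t, text in paper_texts.items() if pattern.search(text)}

    # Only sentences from keyword-matching papers are considered.
    candidates = [(t, s, score) for t, s, score in sentence_scores if t in allowed]
    if not candidates:
        return []

    # The cut-off is relative to the best remaining sentence, as in get_matching_papers.
    best = max(score for _, _, score in candidates)
    counts = Counter(t for t, _, score in candidates if score > SUBMATCH_THRESHOLD * best)
    return counts.most_common()


paper_texts = {
    "ECMO in COVID-19": "we report ecmo outcomes in 12 patients",
    "Ventilation study": "mechanical ventilation and ecmo use in icu",
    "Protein study": "protein structure of the spike",
}
sentence_scores = [
    ("ECMO in COVID-19", "ecmo survival was reported", 0.90),
    ("ECMO in COVID-19", "ecmo was started early", 0.80),
    ("ECMO in COVID-19", "outcomes were favourable", 0.40),
    ("Ventilation study", "ecmo use was rare", 0.82),
    ("Protein study", "the spike binds ace2", 0.95),
]
print(rank_papers(paper_texts, sentence_scores, ["ecmo"]))
# → [('ECMO in COVID-19', 2), ('Ventilation study', 1)]
```

Because the keyword filter runs before the maximum is taken, the off-topic 0.95 sentence does not raise the bar for the ECMO papers.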

# Acknowledgements
### Thanks to Kaggle user xhlulu for their work on loading and processing the JSON files
### Thanks to Dr. Levine at Accenture for his work on mining missing text data from websites. Sadly, he forgot his Kaggle username!

In [None]:
!pip install scispacy
!pip install -U sentence-transformers
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.4/en_ner_bc5cdr_md-0.2.4.tar.gz

In [None]:
import sys

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
import json
from glob import glob
import gc

import scispacy
import spacy
import en_ner_bc5cdr_md
from bs4 import BeautifulSoup
from sklearn.metrics.pairwise import cosine_similarity

from wordcloud import WordCloud

pd.set_option('display.max_columns', 100)
pd.set_option('display.max_colwidth', 200)
from IPython.display import display

In [None]:
#load the scispacy model relevant to diseases
nlp = spacy.load('en_ner_bc5cdr_md')

#load bert sentence transformer
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('bert-base-nli-mean-tokens')

In [None]:
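#number of top-scoring papers to keep from each group (COVID-19 / non-COVID-19) for the general topic
MATCH_LIMIT = 1000

#a sentence counts as relevant to a subtopic if its score exceeds this fraction of the best sentence score
SUBMATCH_THRESHOLD = 0.85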

# Helper Functions

In [None]:
def json_reader(file):
    #read one CORD-19 json file and return a one-row dataframe with its paper_id and formatted body text
    #based off xhlulu's work at https://www.kaggle.com/xhlulu/cord-19-eda-parse-json-and-generate-clean-csv
    with open(file) as f:
        j = json.load(f)
        
    #format the body text so the sections are clear, but it's easy to view the whole thing
    body_text = '\n\n'.join([x['section'] + '\n\n' + x['text'] for n,x in enumerate(j['body_text'])])

    df = pd.DataFrame(index=[0], data={'body_text':body_text, 
                                            'paper_id': j['paper_id']})
    
    return df


def parse_folder(data_folder):
    filelist = glob('/kaggle/input/CORD-19-research-challenge/{0}/{0}/*'.format(data_folder))
    filelist.sort()
    print('{} has {} files'.format(data_folder, len(filelist)))

    df_ls=[]
    for n,file in enumerate(filelist):
        if n%1000==0:
            print(n,file[-46:])
        df = json_reader(file)
        df_ls.append(df)
    return pd.concat(df_ls)


def load_meta():
    meta = pd.read_csv('/kaggle/input/CORD-19-research-challenge/metadata.csv')
    meta.rename(columns={'sha':'paper_id'}, inplace=True)
    return meta


#go through each of the four folders of json files and put everything into one dataframe
#takes around 3-4min to complete
def combine_datasets():
    df_ls = []
    for folder in ['comm_use_subset', 'noncomm_use_subset', 'custom_license', 'biorxiv_medrxiv']:
        t = parse_folder(folder)
        df_ls.append(t)
    df = pd.concat(df_ls)
    
    meta = load_meta()
    df = meta.merge(df, on='paper_id', how='left')
    return df


def get_doc_vec(tokens):
    #combine word embeddings from a document into a single document vector
    #filter out any stop words like 'the', single characters, punctuation, and numbers
    w_all = np.zeros(tokens[0].vector.shape)
    n=0
    for w in tokens:
        if (not w.is_stop) and (len(w)>1) and (not w.is_punct) and (not w.is_digit):
            w_all += w.vector
            n+=1
    return (w_all / n) if n>0 else np.zeros(tokens[0].vector.shape)


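def process_all_docs(col, id_col):
    #compute a ScispaCy document vector for every non-empty entry of column `col`
    #note: reads the global dataframe `df`
    vecs = {}
    for n,row in df.iterrows():
        if n%5000==0:
            print(n)
        text = row[col]
        if not isinstance(text, str):
            print(text)  #report non-string values (e.g. NaN) and skip them
            continue
        if len(text) > 0:
            vecs[row[id_col]] = get_doc_vec(nlp(text))
    return vecs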


def get_matching_papers(df, q_str, sent_df, keyword_list = []):
    q_nlp = nlp(q_str)
    
    #use nouns and objects from the question to find keywords and phrases
    if not keyword_list:
        #print('keywords not provided, created some from subjects and objects of question')
        noun_ls = []
        for noun in q_nlp.noun_chunks:
            if ('obj' in noun.root.dep_ or 'subj' in noun.root.dep_ or noun.root.dep_ == 'appos') and len(str(noun.root)) > 1 and noun.root.is_stop == False:
                noun_ls.append(str(noun.root).lower())

        #also use any entities found in the text. Don't take the root of these.
        #lower-case them, since df['text'] is lower-cased
        noun_ls += [x.text.lower() for x in q_nlp.ents]
        keyword_list = list(set(noun_ls))
        #print('auto keywords are: {}'.format(keyword_list))

    #escape the keywords so characters such as '(' are matched literally
    key_condition = df['text'].str.contains('|'.join(re.escape(k) for k in keyword_list))

    #get similarity to all available sentences and make a dataframe with the sentences
    sent_sims = cosine_similarity(sent_df['vecs'].tolist(), model.encode([q_str], show_progress_bar=False))
    sent_df['score'] = sent_sims.max(axis=1)
    
    #filter out sentences that don't belong to papers which include the keywords
    sent_df = sent_df[sent_df['title'].isin(df[key_condition]['title'])]
    
    #filter out sentences with a low match score
    sent_df = sent_df[sent_df['score'] > SUBMATCH_THRESHOLD * sent_df['score'].max()]

    #sort by papers with a high number of relevant sentences and return the results
    return sent_df.groupby('title').agg({'sent':'count', 'score':'mean'}).sort_values(by='sent', ascending=False).reset_index().rename(columns={'sent':'relevant sentences'})


def get_text(url,abstract=False,body=True,bib=False):
    '''Returns the full text of a paper, given the source html, provided the paper is in the rough format of the Wiley Online Library
    Ex: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24357
    abstract = True will return abstract as part of the text
    body = True will return the body of the paper as part of the text
    bib = True will return the bibliography as part of the text

    Author: Aaron Levine
    '''
    text = ''
    try:
        html_text = !wget -qO- --timeout=60 $url
        soup = BeautifulSoup('\n'.join(html_text),'html.parser')
        if abstract:
            abstract_txt = '\n'.join([x.text for x in soup.find('div',{'class':'abstract-group'}).findChildren(recursive=False)])
            text += abstract_txt
        if body:
            body_txt = '\n'.join([x.text for x in soup.find('section',{'class':'article-section article-section__full'}).findChildren(recursive=False) if True not in [tag.has_attr('data-bib-id') for tag in x.find_all()]])
            text += body_txt
        if bib:
            bib_txt = '\n'.join([x.text for x in soup.find('section',{'class':'article-section article-section__full'}).findChildren(recursive=False) if True in [tag.has_attr('data-bib-id') for tag in x.find_all()]])
            text += bib_txt
    except Exception:
        print('failed to load paper')

    if len(text) > 0:
        print('found the paper!')
    return text


def get_summary(df, q, title):
    if df[df['title'] == title]['body_text'].isna().values.item():
        text = df[df['title'] == title]['abstract'].values.item() #abstract might still have a results section, some parsers are weird
    else:
        text = df[df['title'] == title]['body_text'].values.item()

    if isinstance(text,float):
        print(text, title)
    sent_ls = []
    summary_text = ''
    if '\n\nResults\n\n' in text:
        #keep the text that follows each 'Results' heading
        results = text.split('\n\nResults\n\n')[1:]

        #the last chunk runs on into the following sections, so keep only its first paragraph
        results[-1] = results[-1].split('\n\n')[0]
        results = ' '.join(results)

        #get the sentences
        sents = nlp(results).sents
        sent_ls = []
        for sent in sents:
            row = sent_df[(sent_df['title'] == title) & (sent_df['sent'].str.contains(sent.text, case=False, regex=False))]
            sent_ls.append(row)
    else: #no results, try just using the abstract
        results = df[df['title'] == title]['abstract'].values.item()

        #get the sentences
        sents = nlp(results).sents
        sent_ls = []
        for sent in sents:
            row = sent_df[(sent_df['title'] == title) & (sent_df['sent'].str.contains(sent.text, case=False, regex=False))]
            sent_ls.append(row)

    if len(sent_ls) > 0:
        result_df = pd.concat(sent_ls)
        result_df.drop_duplicates(subset='sent', inplace=True)
        sent_sims = cosine_similarity(result_df['vecs'].tolist(), model.encode([q], show_progress_bar=False))
        result_df['score'] = sent_sims.max(axis=1)
        summary_text = ' '.join(result_df.sort_values(by='score', ascending=False).head(2)['sent'].tolist())
    return summary_text


def get_results(q, keyword_list=None):
    color_indexer = {'1':9, '2':9}
    text_indexer = {'1':0, '2':0}
    cm = sns.light_palette("green", as_cmap=True)
    cm2 = sns.light_palette("blue", as_cmap=True)
    cm_list = [cm(i*0.1) for i in range(0,10)]
    cm2_list = [cm2(i*0.1) for i in range(0,10)]
    
    title_match = get_matching_papers(match_df, q, sent_df, keyword_list=keyword_list)[['title', 'relevant sentences']].head(10)
    title_nocov_match = get_matching_papers(match_nocov_df, q, sent_df, keyword_list=keyword_list)[['title', 'relevant sentences']].head(10)
    
    title_match['is_covid'] = True
    title_nocov_match['is_covid'] = False
    title_match = title_match.merge(match_df[['title','publish_time']], on='title', how='left')
    title_nocov_match = title_nocov_match.merge(match_nocov_df[['title','publish_time']], on='title', how='left')
    title_match['publish_time'] = title_match['publish_time'].str[:4].astype(float)
    title_nocov_match['publish_time'] = title_nocov_match['publish_time'].str[:4].astype(float)
    
    all_match = pd.concat([title_match,title_nocov_match]).reset_index(drop=True).sort_values(by='relevant sentences', ascending=False)
    all_match['summary'] = all_match.apply(lambda x: get_summary(match_df, q, x['title']) if x['is_covid'] else get_summary(match_nocov_df, q, x['title']), axis=1)
    all_match['display_score'] = all_match['relevant sentences'] + all_match['is_covid'] - (2020 - all_match['publish_time'])
    all_match = all_match.sort_values(by='display_score', ascending=False)
    display(all_match[['is_covid','relevant sentences', 'publish_time', 'title', 'summary']].style.hide_index()
            .apply(color_rows, cm=cm_list, cm2=cm2_list, d_idx=color_indexer,axis=1, subset=['is_covid', 'relevant sentences', 'publish_time'])
            .apply(color_text, d_idx=text_indexer, axis=1, subset=['is_covid', 'relevant sentences', 'publish_time']))
    
    
def color_rows(s, cm, cm2, d_idx):
    #takes in a row from a dataframe and applies necessary colors for display
    if s['is_covid']:
        style = ['background-color: {}'.format(cmap_to_hex(cm[d_idx['1']]))]*s.shape[0]
        d_idx['1'] -= 1
        return style
    else:
        style = ['background-color: {}'.format(cmap_to_hex(cm2[d_idx['2']]))]*s.shape[0]
        d_idx['2'] -= 1
        return style


def color_text(s, d_idx):
    #white text on the first five (darkest) rows of each group, black on the rest
    if s['is_covid']:
        if d_idx['1'] < 5:
            style = ['color: white']*s.shape[0]
        else:
            style = ['color: black']*s.shape[0]
        d_idx['1'] += 1
        return style
    else:
        if d_idx['2'] < 5:
            style = ['color: white']*s.shape[0]
        else:
            style = ['color: black']*s.shape[0]
        d_idx['2'] += 1
        return style
    
    
def cmap_to_hex(rgb_color):
    #convert an (r, g, b[, a]) tuple of floats in [0, 1] to a '#rrggbb' string
    r, g, b = [int(x*255) for x in rgb_color[:3]]
    return '#{:02x}{:02x}{:02x}'.format(r, g, b)

# Data Loading and Cleaning
#### Only the metadata needs to be loaded at first; relevant papers are found from the abstract and title
#### Handle duplicate papers

In [None]:
df = load_meta()
df.shape

In [None]:
#drop duplicate titles and any publications missing an abstract or title
df.drop_duplicates(['title'], inplace=True)
df.dropna(subset=['abstract','title'], inplace=True)

#want an identifier for every row, but paper_id is missing from a lot of them
#cord_uid might work as well, but that was added after I started
df = df.reset_index()
df.rename(columns={'index':'uid'},inplace=True)

df.shape

# Topic Definition

First, use the topic description to find relevant papers. An alternative would be to load the body text and compute sentence embeddings for all papers, but that drastically increases runtime and memory consumption.
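This coarse first pass amounts to a top-k selection run separately on the COVID-19 and non-COVID-19 papers, like the two `head(MATCH_LIMIT)` calls in the code. A minimal sketch with plain tuples and `heapq.nlargest`, assuming each paper already has a mean score and a COVID-19 flag (the records and the `top_papers_per_group` helper are invented for illustration):

```python
import heapq

MATCH_LIMIT = 2  # the notebook keeps 1000 per group; 2 keeps the toy example readable


def top_papers_per_group(papers, limit=MATCH_LIMIT):
    """Split papers by COVID-19 flag and keep the `limit` best-scoring ones in each group.

    papers: list of (uid, is_covid, mean_score) tuples.
    Returns (covid_uids, non_covid_uids), each ordered by descending score.
    """
    def best(group):
        top = heapq.nlargest(limit, group, key=lambda p: p[2])
        return [uid for uid, _, _ in top]

    covid = [p for p in papers if p[1]]
    non_covid = [p for p in papers if not p[1]]
    return best(covid), best(non_covid)


papers = [
    (1, True, 0.91), (2, True, 0.75), (3, True, 0.88),
    (4, False, 0.95), (5, False, 0.60), (6, False, 0.70),
]
print(top_papers_per_group(papers))  # → ([1, 3], [4, 6])
```

Only these survivors go on to the expensive BERT sentence-embedding step.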

For the overall similarity score, take the mean of the title and abstract similarity scores. This promotes papers that focus on the topic. Using the max instead of the mean is another option, but it lets through papers that only briefly mention the topic without directly addressing it.
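A small numeric illustration of that choice, using invented scores: a paper with a well-matching title but a mostly off-topic abstract wins under max and loses under mean.

```python
from statistics import mean

# Hypothetical (title_score, abstract_score) pairs for two papers.
scores = {
    "focused paper": (0.80, 0.78),    # title and abstract both on topic
    "passing mention": (0.86, 0.40),  # on-topic title, abstract mostly about something else
}

by_max = sorted(scores, key=lambda t: max(scores[t]), reverse=True)
by_mean = sorted(scores, key=lambda t: mean(scores[t]), reverse=True)

print(by_max[0])   # → passing mention
print(by_mean[0])  # → focused paper
```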

*"What has been published about medical care? What has been published concerning surge capacity and nursing homes? What has been published concerning efforts to inform allocation of scarce resources? What do we know about personal protective equipment? What has been published concerning alternative methods to advise on disease management? What has been published concerning processes of care? What do we know about the clinical characterization and management of the virus?"*

In [None]:
#flag papers whose title or abstract mentions COVID-19 or SARS-CoV-2 (case-insensitive)
covid_pattern = 'covid-19|sars-cov-2'
covid_selection = (df['abstract'].str.contains(covid_pattern, case=False)) | (df['title'].str.contains(covid_pattern, case=False))

In [None]:
q_str = 'What has been published about medical care? What has been published concerning surge capacity and nursing homes? What has been published concerning efforts to inform allocation of scarce resources? What do we know about personal protective equipment? What has been published concerning alternative methods to advise on disease management? What has been published concerning processes of care? What do we know about the clinical characterization and management of the virus?'
q_vec = [get_doc_vec(nlp(q_str))]

In [None]:
# Find Relevant Papers
# Define title and abstract document vectors by averaging the word vectors, after filtering stop words. 
# Next, calculate a similarity score of each document vector to the topic shown above. 
# Then average the title and abstract similarity scores. 
# Another option would be to combine the title and abstract, and compute a single document vector. 
# However, I like keeping them separate as it gives the title more weight in the final score

abstract_vectors = process_all_docs('abstract', 'uid')
title_vectors = process_all_docs('title', 'uid')

#keep ids and vectors paired; skip all-zero vectors (no usable words) so every id stays aligned with its score
abstract_items = [(k, v) for k, v in abstract_vectors.items() if not np.all(v == 0)]
title_items = [(k, v) for k, v in title_vectors.items() if not np.all(v == 0)]

abstract_sims = cosine_similarity([v for _, v in abstract_items], q_vec)
title_sims = cosine_similarity([v for _, v in title_items], q_vec)

sim_df = pd.concat([pd.Series(abstract_sims[:,0], index=[k for k, _ in abstract_items]),
                    pd.Series(title_sims[:,0], index=[k for k, _ in title_items])], axis=1).reset_index().rename(columns={'index':'uid'})
sim_df.rename(columns={0:'abstract_score', 1:'title_score'},inplace=True)
sim_df['mean_score'] = sim_df[['abstract_score', 'title_score']].mean(axis=1)

#merge the scores into the dataframe
df = df.merge(sim_df,on='uid',how='left')

#find papers with a high match score to the topic. 
match_df = df[covid_selection].sort_values(by='mean_score', ascending=False).head(MATCH_LIMIT)
match_nocov_df = df[covid_selection==False].sort_values(by='mean_score', ascending=False).head(MATCH_LIMIT)
#match_df = df[((df['mean_score'] > df['mean_score'].max()*MATCH_THRESHOLD))]

# Load Body Text
#### Use BERT sentence embeddings to find more precise matches to specific questions

In [None]:
# Load Body Text
# First load the body text for all matched papers. 
# Next, compute a vector for each sentence in each paper's abstract and body_text.
#Then define a similarity score for each sentence to a given question. Use SUBMATCH_THRESHOLD to define which sentences are highly relevant.
# Also parse each question for a set of keywords. Filter out any papers that don't contain any of these words

df_ls = []
for data_folder in ['comm_use_subset', 'noncomm_use_subset', 'custom_license', 'biorxiv_medrxiv']:
    filelist = glob('/kaggle/input/CORD-19-research-challenge/{0}/{0}/pdf_json/*'.format(data_folder))
    filelist.sort()
    print('{} has {} files'.format(data_folder, len(filelist)))

    for n,file in enumerate(filelist):
        if n%1000==0:
            print(n,file[-46:])
        t = json_reader(file)
        t = t[(t['paper_id'].isin(match_df['paper_id'])) | (t['paper_id'].isin(match_nocov_df['paper_id']))]
        df_ls.append(t[['paper_id','body_text']])
df_body = pd.concat(df_ls)
del df_ls
gc.collect()
match_df = match_df.merge(df_body, on='paper_id', how='left')
match_nocov_df = match_nocov_df.merge(df_body, on='paper_id', how='left')

# Missing Body Text
#### Some articles missing the body text can be filled in. The `get_text` helper above mines the paper's URL for its text. It doesn't work for every page format, but it does fill in some of the gaps.
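`get_text` relies on BeautifulSoup and the Wiley page layout, where the full text sits in a `<section>` with the `article-section__full` class. To show the underlying idea without network access, here is a standard-library sketch that collects the text inside `<section>` elements whose `class` attribute contains a target class. The HTML snippet and the `SectionTextExtractor` class are illustrative assumptions, and the sketch does not reproduce the notebook's bibliography filtering on `data-bib-id`.

```python
from html.parser import HTMLParser


class SectionTextExtractor(HTMLParser):
    """Collect the text inside <section> elements carrying a given CSS class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0  # > 0 while inside a matching <section> (handles nested sections)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "section":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.target_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "section" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # keep non-blank text nodes only while inside the target section
        if self.depth and data.strip():
            self.chunks.append(data.strip())


html = """
<html><body>
  <div class="abstract-group"><p>Abstract text.</p></div>
  <section class="article-section article-section__full">
    <h2>Results</h2><p>ECMO was used in 3 patients.</p>
  </section>
</body></html>
"""
parser = SectionTextExtractor("article-section__full")
parser.feed(html)
print(parser.chunks)  # → ['Results', 'ECMO was used in 3 patients.']
```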


In [None]:
print('Missing Text Before: {}'.format(match_df['body_text'].fillna('').apply(lambda x: len(x)==0).sum() + match_nocov_df['body_text'].fillna('').apply(lambda x: len(x)==0).sum()))

In [None]:
#fill in missing body text by scraping each paper's url; rows that still fail are left empty
missing = match_df['body_text'].fillna('').str.len() == 0
match_df.loc[missing, 'body_text'] = match_df.loc[missing, 'url'].apply(get_text)

missing_nocov = match_nocov_df['body_text'].fillna('').str.len() == 0
match_nocov_df.loc[missing_nocov, 'body_text'] = match_nocov_df.loc[missing_nocov, 'url'].apply(get_text)

In [None]:
print('Missing Text After: {}'.format(match_df['body_text'].fillna('').apply(lambda x: len(x)==0).sum() + match_nocov_df['body_text'].fillna('').apply(lambda x: len(x)==0).sum()))

# Make Sentence Vectors

In [None]:
#make all text lower case to help with matching keywords
#add a space so the last abstract sentence doesn't run into the body text
match_df['text'] = (match_df['abstract'] + ' ' + match_df['body_text'].fillna('')).str.lower()
match_nocov_df['text'] = (match_nocov_df['abstract'] + ' ' + match_nocov_df['body_text'].fillna('')).str.lower()

#make sentence vectors from the abstract and body text
#use spacy to split documents into sentences, then embed each sentence with BERT
sent_data = {'sent':[], 'title':[], 'uid':[]}
for frame in [match_df, match_nocov_df]:
    for n,row in frame.iterrows():
        if n%100==0:
            print(n)
        for s in nlp(row['text']).sents:
            if len(s) > 0:
                sent_data['sent'].append(str(s))
                sent_data['title'].append(row['title'])
                sent_data['uid'].append(row['uid'])

print(len(sent_data['sent']))
sent_df = pd.DataFrame(sent_data)
print('making sentence embeddings with BERT')
#wrap in list() so each row holds one embedding, whether encode returns a list or a 2-D array
sent_df['vecs'] = list(model.encode(sent_df['sent'].tolist()))

# Save or Load Data Here

In [None]:
# match_df.to_csv('match_df.csv')
# match_nocov_df.to_csv('match_nocov_df.csv')
# sent_df.to_pickle('sent_vecs_bert_all.pkl')

#add kernel output to data tab, then load the match_df file
match_df = pd.read_csv('/kaggle/input/base-analysis-for-medical-care/match_df.csv')
sent_df = pd.read_pickle('/kaggle/input/base-analysis-for-medical-care/sent_vecs_bert_all.pkl')
match_nocov_df = pd.read_csv('/kaggle/input/base-analysis-for-medical-care/match_nocov_df.csv')

# Results

## Resources to support skilled nursing facilities and long term care facilities

In [None]:
q = 'Resources to support skilled nursing facilities and long term care facilities'
kw = ['nurse', 'long term care', 'facilities']
get_results(q,kw)

## Mobilization of surge medical staff to address shortages in overwhelmed communities

In [None]:
q = 'Mobilization of surge medical staff to address shortages in overwhelmed communities'
kw = ['mobilization', 'staff', 'overwhelmed']
get_results(q,kw)

## Age-adjusted mortality data for Acute Respiratory Distress Syndrome (ARDS) with/without other organ failure – particularly for viral etiologies

In [None]:
q = 'Age-adjusted mortality data for Acute Respiratory Distress Syndrome (ARDS) with/without other organ failure – particularly for viral etiologies'
kw = ['ards', 'acute respiratory distress syndrome']
get_results(q,kw)

## Extracorporeal membrane oxygenation (ECMO) outcomes data of COVID-19 patients

In [None]:
q = 'Extracorporeal membrane oxygenation (ECMO) outcomes data of COVID-19 patients'
kw = ['ecmo']
get_results(q,kw)

## Outcomes data for COVID-19 after mechanical ventilation adjusted for age

In [None]:
q = 'Outcomes data for COVID-19 after mechanical ventilation adjusted for age'
kw = ['mechanical ventilation']
get_results(q,kw)

## Knowledge of the frequency, manifestations, and course of extrapulmonary manifestations of COVID-19, including, but not limited to, possible cardiomyopathy and cardiac arrest

In [None]:
q = 'Knowledge of the frequency, manifestations, and course of extrapulmonary manifestations of COVID-19, including, but not limited to, possible cardiomyopathy and cardiac arrest'
kw = ['cardiac arrest', 'cardiomyopathy', 'extrapulmonary']
get_results(q,kw)

## Application of regulatory standards (e.g., EUA, CLIA) and ability to adapt care to crisis standards of care level

In [None]:
q = 'Application of regulatory standards (e.g., EUA, CLIA) and ability to adapt care to crisis standards of care level'
kw = ['clia', 'care level', 'eua', 'care']
get_results(q,kw)

## Approaches for encouraging and facilitating the production of elastomeric respirators, which can save thousands of N95 masks

In [None]:
q = 'Approaches for encouraging and facilitating the production of elastomeric respirators, which can save thousands of N95 masks'
kw = ['elastomeric respirator']
get_results(q,kw)

## Best telemedicine practices, barriers and facilitators, and specific actions to remove/expand them within and across state boundaries

In [None]:
q = 'Best telemedicine practices, barriers and facilitators, and specific actions to remove/expand them within and across state boundaries'
kw = ['telemedicine']
get_results(q,kw)

## Guidance on the simple things people can do at home to take care of sick people and manage disease

In [None]:
q = 'Guidance on the simple things people can do at home to take care of sick people and manage disease'
kw = ['home']
get_results(q,kw)

## Oral medications that might potentially work

In [None]:
q = 'Oral medications that might potentially work'
kw = ['oral']
get_results(q,kw)

## Use of AI in real-time health care delivery to evaluate interventions, risk factors, and outcomes in a way that could not be done manually

In [None]:
q = 'Use of AI in real-time health care delivery to evaluate interventions, risk factors, and outcomes in a way that could not be done manually'
kw = ['artificial intelligence']
get_results(q,kw)

## Best practices and critical challenges and innovative solutions and technologies in hospital flow and organization, workforce protection, workforce allocation, community-based support resources, payment, and supply chain management to enhance capacity, efficiency, and outcomes

In [None]:
q = 'Best practices and critical challenges and innovative solutions and technologies in hospital flow and organization, workforce protection, workforce allocation, community-based support resources, payment, and supply chain management to enhance capacity, efficiency, and outcomes'
kw = [] #autogenerate
get_results(q,kw)

## Efforts to define the natural history of disease to inform clinical care, public health interventions, infection prevention control, transmission, and clinical trials

In [None]:
q = 'Efforts to define the natural history of disease to inform clinical care, public health interventions, infection prevention control, transmission, and clinical trials'
kw = ['history']
get_results(q,kw)

## Efforts to develop a core clinical outcome set to maximize usability of data across a range of trials

In [None]:
q = 'Efforts to develop a core clinical outcome set to maximize usability of data across a range of trials'
kw = ['outcome', 'trials']
get_results(q,kw)

## Efforts to determine adjunctive and supportive interventions that can improve the clinical outcomes of infected patients (e.g. steroids, high flow oxygen)

In [None]:
q = 'Efforts to determine adjunctive and supportive interventions that can improve the clinical outcomes of infected patients (e.g. steroids, high flow oxygen)'
kw = ['interventions']
get_results(q,kw)