# CORD 19 Task 2: Population Studies

![](http://cdn.pixabay.com/photo/2020/03/17/22/29/coronavirus-4942077_1280.jpg)

The CORD-19 challenge is an excellent proving ground for modern natural language processing. The challenge: to build machine-generated summaries of the COVID-19 research literature to assist further research. The task is atypical of Kaggle competitions: there are no training examples (other than a handful of example summaries) against which to train a supervised model, and no holdout test set against which to score submissions. Competitors must therefore develop unsupervised methods to generate the target output and define their own means of evaluating it. This setting is typical of the broader challenges that NLP practitioners face "in the wild": choosing a latent space that can represent the seemingly limitless variability of language data while fitting the task at hand, and designing an output space that meets the task specification.

## Contents

1. [Approach](#Approach)
2. [Results](#Results)
3. [Pros](#Pros)
4. [Cons](#Cons)
5. [Next Steps](#Next-Steps)
6. [Notes on Study Design](#Notes-on-Study-Design)
7. [Example - Combatting Resource Failures](#Example---Combatting-Resource-Failures)

# Approach

This notebook describes a solution capable of generating the summary tables in the format specified for the task. The approach is divided into search and summary stages, with a separate model / tool for each stage:

### Search Tool:

The search tool identifies papers that answer a given research question in two steps:

1. **Regex search**: Articles of interest are identified by a simple regular expression keyword search of the article abstracts
2. **Sentence-level TandA QA**: Abstracts identified in step one are divided into sentences, and these sentences are scored based on the likelihood that they answer a generalised research question. Logits generated by the [Amazon Alexa Team's RoBERTa-based TandA model](https://github.com/alexa/wqa_tanda) are used to define these scores. Papers whose abstracts contain a sentence scoring above a chosen threshold are deemed to answer the research question.
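
The second step can be sketched roughly as follows. This is an illustration only, not the tool's implementation: `split_sentences`, `papers_answering` and `toy_score` are hypothetical names, the naive regex splitter stands in for a proper sentence tokenizer, and `toy_score` is a placeholder for the TandA model's logit.

```python
import re
from typing import Callable, Dict, List, Tuple


def split_sentences(text: str) -> List[str]:
    # Naive splitter for illustration; a real pipeline would use a proper tokenizer.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def papers_answering(
    abstracts: Dict[str, str],
    score_fn: Callable[[str], float],
    threshold: float,
) -> List[Tuple[str, str, float]]:
    """Keep papers whose best-scoring sentence reaches the threshold."""
    hits = []
    for uid, abstract in abstracts.items():
        scored = [(score_fn(s), s) for s in split_sentences(abstract)]
        if not scored:
            continue
        best_score, best_sentence = max(scored)
        if best_score >= threshold:
            hits.append((uid, best_sentence, best_score))
    # Highest-scoring papers first
    return sorted(hits, key=lambda h: h[2], reverse=True)


def toy_score(sentence: str) -> float:
    # Placeholder for a QA model logit: counts cue words, shifted so that
    # sentences without any cue word score below zero.
    cues = ("shortage", "supply")
    return float(sum(sentence.lower().count(c) for c in cues)) - 1.0


abstracts = {
    "paper_a": "Masks were scarce. A supply shortage hit hospitals. We propose reuse.",
    "paper_b": "We sequence the viral genome. Results are reported.",
}
hits = papers_answering(abstracts, toy_score, threshold=0.5)
print(hits)
```

Swapping `toy_score` for a function that wraps a real question-answering model leaves the thresholding logic unchanged.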

### Summary Tool:

The full text of each article identified in the search stage is analysed for the following information:

* **Study Design**: regular expressions highlight explicit mentions of a specific study design in the text, and other features that may relate to a given study design, for example statistical analysis terms (e.g. "p<", "ci") or the mention of modelling methods (e.g. "regression", "SVM"). These are used to determine the most likely study design.
* **Measure of Evidence**: [Spacy](https://spacy.io/)'s POS and NER tools are used to highlight word tokens relating to quantities of study subjects (e.g. "patients", "participants", "samples") or study locations mentioned in the text. Tokens identified in this process are cross-referenced with their overall frequency in the text; the quantities of tokens deemed significant are returned as measures of evidence (e.g. "100 patients", "1000 participants").
* **Challenge**: Similar to the second stage of the search process, a TandA question-answering model is used to score the likelihood that each sentence in the abstract text answers the question **"what is the problem / issue / challenge?"** (this can be customised by the user) - the highest scoring sentence is used as the **challenge** feature
* **Solution**: Again, sentences from the full text of the paper are scored based on the likelihood they answer a question, which defaults to **"what action should be taken?"** - the highest scoring sentence is used as the **solution** feature.
* **Addressed Population**: regular expressions are used to identify keywords relating to groups (e.g. "workers", "patients", "groups") - POS tagging rules are then used to parse descriptions from the word tokens immediately following or preceding these keywords to identify population descriptions (e.g. "**mental health** workers...", "patients **with diabetes**"). The most common of the group descriptions are identified as addressed populations.
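
To make the **Measure of Evidence** idea concrete, here is a minimal sketch that swaps spaCy's POS / NER tagging for a plain regular expression, so it is a simplification rather than the tool's method. `SUBJECT_NOUNS`, `subject_quantities` and the `min_mentions` frequency cut-off are hypothetical names and choices introduced for illustration.

```python
import re
from collections import Counter

# Illustrative list of nouns that denote study subjects (an assumption)
SUBJECT_NOUNS = ("patients", "participants", "samples", "cases")
_NOUNS = "|".join(SUBJECT_NOUNS)
QUANTITY_PATTERN = re.compile(r"\b(\d+(?:,\d{3})*)\s+(" + _NOUNS + r")\b")
NOUN_PATTERN = re.compile(r"\b(?:" + _NOUNS + r")\b")


def subject_quantities(text, min_mentions=2):
    """Return 'number noun' phrases whose noun occurs often enough in the text."""
    lowered = text.lower()
    noun_counts = Counter(NOUN_PATTERN.findall(lowered))
    results = []
    for number, noun in QUANTITY_PATTERN.findall(lowered):
        phrase = f"{number} {noun}"
        # Cross-reference with overall frequency: rarely mentioned nouns are dropped
        if noun_counts[noun] >= min_mentions and phrase not in results:
            results.append(phrase)
    return results


text = "We enrolled 100 patients. Of these patients, 12 samples were lost."
print(subject_quantities(text))
```

A tagger-based version can tell "100 patients" apart from, say, a year followed by a noun, which a bare regex cannot do reliably.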

# Results

The results of the search / summarization process can be seen in the csv files in the output of this notebook. All were prepared within the Kaggle notebook environment. An example of the search process is featured below; it produces the summary table for the research question "What are recommendations for combating/overcoming resource failures?", with comments on the output for that task and a more detailed explanation of the search and summary results. Scripts used to build the summary tables for the other research questions can be found in the following notebooks:

* [Modes of communicating with target high-risk populations (elderly, health care workers)](https://www.kaggle.com/dustyturner/modes-of-communicating-with-target-communities)
* [Management of patients who are underhoused or otherwise lower socioeconomic status](https://www.kaggle.com/dustyturner/management-of-patients-who-are-underhoused)
* [What are ways to create hospital infrastructure to prevent nosocomial outbreaks and protect uninfected patients?](https://www.kaggle.com/dustyturner/hospital-infrastructure-to-prevent-outbreaks)
* [Methods to control the spread in communities, barriers to compliance](https://www.kaggle.com/dustyturner/methods-to-control-spread-in-communities)

This notebook and the above can all be forked and run to reproduce the output results. The author's [tanda_search_qa_tool](https://github.com/samrelins/tanda_search_qa_tool) repository contains the code for the search and summary tools as well as various helper functions for preprocessing of the metadata and document texts.

**Please Note:** The results provided are an attempt to trade off specificity and accuracy of the summaries against quantity, and to demonstrate the results of this method unaltered and uncurated. With a greater number of queries and less specific search criteria, many more potentially useful studies could be identified and included with these results.


# Pros

* The approach works!: It is capable of parsing the entire CORD dataset to highlight the information required in the summary tables
* Simple methodology: each stage of the task is based on a process that can be easily understood and explained to lay-people
* Flexible: the different stages of the search and summary process allow for considerable flexibility, meaning results can be tailored to a wide range of research questions, beyond that of the current task specification
* Visible: the tools are capable of producing visual cues to assist in refining the search process and explaining the results
* Potential for fine-tuning: the TandA model is designed to be easily adapted to new tasks with minimal labelled data. The simplicity of the sentence-level QA task offers the potential to produce labelled training data to fine-tune the model so that it better recognises features of the covid-19 literature

# Cons

* Computationally intensive: the question answering elements of the search and summary process require individual sentences to be analysed by a RoBERTa base model and token-level POS and NER tagging for each paper in the summary tables. As a result, the search process is slow and would not be suitable as an on-demand search-engine based solution like a number of the task 1 submissions
* Reliant on CORD dataset: the summary tool is unable to produce results for papers that are not featured in the json parses provided in the competition data. As such, a significant portion of promising results from the search process cannot be summarised and are not included in the results. Inaccuracies in the parsing can also cause some unusual features in the output tables.
* Reliant on user input: producing summaries is still heavily reliant on user input to specify keywords, search questions that identify relevant publications, and parameters to define features in the summary data. In general, the model can't be thought of as independently generating the summaries


# Next Steps

As highlighted above, the TandA model was conceived as an answer to training question answering models in situations where task-specific training data is scarce. As such, this approach has potential for significant improvement by producing labelled question / answer sentence pair examples from the covid-19 literature. Given that sentence-level QA is a binary classification task, it is relatively easy to procedurally generate training examples that can be used to fine-tune the model. This approach offers potentially large improvements to the quality of the search results.
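
As a rough sketch of what such procedurally generated examples might look like, the snippet below turns one question and one sentence-split abstract into binary-labelled pairs. The `QAPair` structure and `build_pairs` helper are hypothetical, and it assumes that someone has already marked which sentence indices answer the question.

```python
from typing import List, NamedTuple, Set


class QAPair(NamedTuple):
    question: str
    sentence: str
    label: int  # 1 = sentence answers the question, 0 = it does not


def build_pairs(question: str, sentences: List[str], answer_indices: Set[int]) -> List[QAPair]:
    """Label every sentence of an abstract against a single question."""
    return [
        QAPair(question, sentence, int(i in answer_indices))
        for i, sentence in enumerate(sentences)
    ]


sentences = [
    "Ventilators were in short supply.",
    "Hospitals should pool equipment regionally.",
    "Further work is needed.",
]
pairs = build_pairs("what action should be taken", sentences, answer_indices={1})
for pair in pairs:
    print(pair.label, pair.sentence)
```

Every unmarked sentence becomes a negative example, so a single annotated abstract yields several training pairs.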

Whilst developing this solution, attempts were made to use the summary examples provided in the competition data as training data, aiming to develop classification models to identify the study design and important features from publications. The small number of examples and inconsistencies in the labelling prevented this from being a successful approach. As greater quantities of quality paper summaries are produced, this may become a viable means of parsing information from the papers that is grounded more in language understanding and is less reliant on keyword tagging.

# Notes on Study Design

Significant time was spent approaching the problem of identifying study design; so much so that it merits special attention. 

The author is not an epidemiologist and has no significant scientific research experience, and so is not well versed in the specifics of medical research design. Given this, the example summaries and descriptions of the study designs were relied upon heavily when designing this part of the summary tool. This caused considerable confusion, given inconsistencies between the assigned labels, the descriptions, and in several cases the paper authors' own descriptions of their study design! Also, the descriptions did not seem to fit research areas other than medicine / epidemiology - good examples were primary studies in biochemistry or molecular biology.

In an effort to simplify matters the following decisions were taken:

* General Rule: the more specific study designs - **systematic review and meta-analysis, prospective / retrospective observational studies, cross sectional studies, and case series** -  were only assigned to a paper if the authors explicitly stated their study design
* Expert Review: The description of the "expert review" study type states "provides quantitative secondary data" - this was contrary to the majority of such studies in the summary tables that included no such quantitative data. Moreover, the descriptions of the study designs did not allow for studies that were not a structured literature review, but did not include any numerical analysis. On this basis, a decision was taken to follow the examples in the summary tables and label non-numerate studies without a structured literature review component as "expert review"
* Analytical Studies: a large number of papers in the corpus were clearly quantitative primary studies, with an experimental method, quantitative results and analysis, but did not fit any of the descriptions of quantitative studies. Given the authors did not specify any of the prescribed study designs, it did not seem appropriate to "force a square peg into a round hole" so to speak. As such, studies with these features were labelled "analytical study"

It should be stressed that these decisions were taken based on intuition and can easily be adjusted with more specific feedback and guidance.

# Example - Combatting Resource Failures

The following is a quick walkthrough of the usage and features of the search and summarizer tools, answering the research question **What are recommendations for combating/overcoming resource failures?**

In [1]:
!wget https://github.com/samrelins/tanda_search_qa_tool/archive/master.zip
!unzip master.zip
!wget https://wqa-public.s3.amazonaws.com/tanda-aaai-2020/models/tanda_roberta_base_asnq.tar
%mkdir tanda_roberta_base_asnq
!tar xvf tanda_roberta_base_asnq.tar
%mv /kaggle/working/models /kaggle/working/tanda_roberta_base_asnq/
%mv /kaggle/working/tanda_search_qa_tool-master/* /kaggle/working/
%rm -rf tanda_roberta_base_asnq.tar master.zip

--2020-07-22 14:10:09--  https://github.com/samrelins/tanda_search_qa_tool/archive/master.zip
Resolving github.com (github.com)... 140.82.114.3
Connecting to github.com (github.com)|140.82.114.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/samrelins/tanda_search_qa_tool/zip/master [following]
--2020-07-22 14:10:10--  https://codeload.github.com/samrelins/tanda_search_qa_tool/zip/master
Resolving codeload.github.com (codeload.github.com)... 140.82.114.10
Connecting to codeload.github.com (codeload.github.com)|140.82.114.10|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘master.zip’

master.zip              [  <=>               ] 563.96K  1.34MB/s    in 0.4s    

2020-07-22 14:10:10 (1.34 MB/s) - ‘master.zip’ saved [577497]

Archive:  master.zip
dc98e4112de54c5283cda1e1716a7ecb73b7f170
   creating: tanda_search_qa_tool-master/
  inflating: t

In [2]:
import numpy as np
import pandas as pd 
from cord_search_qa_tool import CordSearchQATool
from cord_result_summarizer import CordResultSummarizer
from summarizer_helpers import *
from prep_metadata import add_missing_abstracts
from IPython.display import display, HTML

pd.set_option('display.max_colwidth', None)

First, the metadata is pre-processed. Papers without abstracts are identified and the first 1500 or more characters from the full text of the paper (if available) are used as an abstract.

In [3]:
data_dir = "/kaggle/input/CORD-19-research-challenge/"
meta = add_missing_abstracts(data_dir)

  if (await self.run_code(code, result,  async_=asy)):


86260 entries related to covid-19
34325 without an abstract
9173 of these have json text data
Creating abstracts from json text data for these entries...


  for entry in meta[covid_related & no_abstract & has_json_files].itertuples():


8811 abstracts added from json text
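
The abstract-filling step above can be approximated as follows. This is a sketch under assumptions: `abstract_from_body` is a hypothetical helper, not the repository's `add_missing_abstracts`; it assumes the CORD-19 JSON layout in which `body_text` is a list of paragraph dicts with a `text` field; and it assumes that "1500 or more characters" means whole paragraphs are accumulated until the limit is passed.

```python
def abstract_from_body(paper_json, min_chars=1500):
    """Build a stand-in abstract from the opening paragraphs of a parsed paper."""
    parts, length = [], 0
    for paragraph in paper_json.get("body_text", []):
        text = paragraph.get("text", "").strip()
        if not text:
            continue
        parts.append(text)
        length += len(text)
        # Stop once at least min_chars characters have been collected
        if length >= min_chars:
            break
    return " ".join(parts)


paper = {
    "body_text": [
        {"text": "a" * 1000},
        {"text": "b" * 1000},
        {"text": "c" * 1000},
    ]
}
pseudo_abstract = abstract_from_body(paper)
print(len(pseudo_abstract))
```

Keeping whole paragraphs avoids cutting a sentence in half, which matters later when the text is split into sentences for scoring.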


Next, the search tool is initialised with the pre-processed metadata and the location of the directory containing the pre-trained RoBERTa-TandA model. The tool's init process:

1. Filters the metadata to identify papers that relate to covid-19 using a regex search to identify covid-related keywords (e.g. "covid-19", "sars-cov-2", "wuhan coronavirus"). The abstract text and cord_uid of these papers are then stored as key-value pairs in a dictionary which is set as the search tool's `.texts` attribute
2. Initialises the PyTorch RoBERTa model, loading the tokenizer and model into memory and moving the model to the GPU if required.

In [4]:
searchtool = CordSearchQATool(covid_meta = meta,
                              qa_model_dir = "tanda_roberta_base_asnq")

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  meta.drop_duplicates(subset=["title"], inplace=True)


Building Search tool
Initialising QA Model
QA Model Loaded and waiting for questions
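
The first init step (filtering to covid-related papers and building the `.texts` dictionary) might look something like the sketch below. It is not the tool's code: `COVID_PATTERN` and `build_texts` are hypothetical, plain dicts stand in for rows of the metadata dataframe, and it assumes that both title and abstract are searched and that `cord_uid` is the dictionary key.

```python
import re

# Illustrative pattern covering the example keywords from the text
COVID_PATTERN = re.compile(
    r"covid[\s-]?19|sars[\s-]?cov[\s-]?2|wuhan coronavirus", re.IGNORECASE
)


def build_texts(rows):
    """Map cord_uid -> abstract for rows that mention a covid-related keyword."""
    texts = {}
    for row in rows:
        abstract = row.get("abstract") or ""
        haystack = f"{row.get('title') or ''} {abstract}"
        # Rows without an abstract cannot be searched, so they are skipped
        if abstract and COVID_PATTERN.search(haystack):
            texts[row["cord_uid"]] = abstract
    return texts


rows = [
    {"cord_uid": "a1", "title": "COVID-19 in care homes", "abstract": "We review outbreaks."},
    {"cord_uid": "b2", "title": "Influenza vaccines", "abstract": "Seasonal flu."},
    {"cord_uid": "c3", "title": "SARS-CoV-2 spike", "abstract": None},
]
texts = build_texts(rows)
print(texts)
```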


The search tool is then ready to use. The aim of the search process can be summarised by two broad strategies:

1. Narrow the search space to relevant papers by searching for keywords
2. Identify papers that answer a specific research question by highlighting sentences that are likely to contain a relevant answer

Beginning with the keyword search, the aim is to find all the papers in the corpus that may relate to the research question; in this example, these are the papers relating to resource shortages. This is done using the search tool's `.search` method. The user defines a list of regex terms that are passed to the `.search` method's `containing` and / or `not_containing` arguments. The `containing_threshold` argument can also be used to set a minimum number of keywords an abstract should contain. The ids of the abstracts that meet these criteria are then stored as a list in the search tool's `.search_results` attribute under the `search_name` provided (or a default name if none is provided). The regex search uses a basic logical `OR` criterion, so any abstract containing a sufficient number of any of the keywords (individually or in combination) will be returned.
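
The thresholded `OR` logic described above can be sketched as below. `keyword_search` is a hypothetical stand-in for the `.search` method, and it makes two assumptions: matching is case-insensitive, and the threshold is compared against the total number of keyword occurrences (so one keyword appearing twice also satisfies a threshold of 2).

```python
import re


def keyword_search(texts, containing=(), not_containing=(), containing_threshold=1):
    """Return ids of texts with enough keyword hits and no excluded terms."""
    include = [re.compile(p, re.IGNORECASE) for p in containing]
    exclude = [re.compile(p, re.IGNORECASE) for p in not_containing]
    results = []
    for uid, text in texts.items():
        if any(p.search(text) for p in exclude):
            continue
        # Logical OR: hits from any keyword count towards the threshold
        n_hits = sum(len(p.findall(text)) for p in include)
        if not include or n_hits >= containing_threshold:
            results.append(uid)
    return results


texts = {
    "p1": "A shortage of masks and scarce resources.",
    "p2": "Supply chains held up.",
    "p3": "Shortage after shortage.",
}
found = keyword_search(
    texts,
    containing=["shortage", "resources", r"supply\W", "scarce"],
    containing_threshold=2,
)
print(found)
```

If the threshold should instead count distinct keywords, replacing `len(p.findall(text))` with `bool(p.search(text))` gives that behaviour.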

The search tool is also able to provide a html output of the search results with the `return_html_search_results` method, which is helpful in visualising the progress of any searches:

In [5]:
# search for keywords relating to resource shortages
searchtool.search(search_name="shortage",
                  # raw string so that \W reaches the regex engine unchanged
                  containing=["shortage", "resources", r"supply\W",
                              "availab", "scarce"],
                  containing_threshold=2)

# display the first 3 results of the "shortage" search above
display(HTML(
    searchtool.return_html_search_results(search_name="shortage",
                                          n_results=3)
))

Search ## shortage ## created:
Searching 62853 texts with search parameters:
	containing: ['shortage', 'resources', 'supply\\W', 'availab', 'scarce']
above a threshold of 2
1093 search results returned and stored in shortage
0 results do not have an abstract


__________________________________________________________

Once satisfied that the search contains studies that may relate to the research question, papers that contain answers relevant to the research question can then be identified. 

The `return_answers` method uses the RoBERTa TandA model to output logit scores, indicating the likelihood that each sentence of an abstract answers a given question  - the higher the score, the more likely the model "thinks" the sentence contains an answer to the question posed. The user defines the search (i.e. the above "shortage" search) from which the abstracts are taken. The method takes the following arguments:

* **question**: the question all sentences are to be scored against
* **search_name**: the name of a search (like that above), from which the abstracts are taken and scored. If `None` every abstract will be used (not advisable as this will be a huge set of sentences over which to search)
* **min_score**: the minimum score a sentence must have to be returned in the output. If `None` the method will return a score for every sentence.
* **max_length**: the maximum length of a sentence in tokens. This is used by the RoBERTa model to define the fixed length of the model inputs and defaults to 128.

The output is a list of tuples with the format `(cord_uid, sentence index, sentence text, logit score)`, ordered by descending score. For example, take the following: 

In [6]:
# collect answers to the question specified
eg_answers = searchtool.return_answers(
    search_name="shortage", 
    question="preventing or stopping shortages or supply failures",
    min_score=-2)
eg_answers[0]

Checking 1093 search results for answers to preventing or stopping shortages or supply failures
[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Inputs converted to BERT InputExamples
InputExamples converted to InputFeatures
InputFeatures converted to TensorDataset
TensorDataset converted to torch DataLoader
Ranking 12149 possible answers from 1093 texts:


100%|██████████| 122/122 [00:46<00:00,  2.61it/s]


('0afdww8c',
 5,
 'the policy and regulatory changes implemented at the federal and state levels can be categorized into the following four classes: 1) preventing virus transmission, which includes policies relating to visitation restrictions, personal protective equipment (ppe) guidance, and testing requirements; 2) expanding facilities’ capacities, which includes both the expansion of physical space for isolation purposes and the expansion of workforce to combat covid-19; 3) relaxing administrative requirements, which includes measures enacted to shift the attention of caretakers and administrators from administrative requirements to residents’ care; and 4) reporting covid-19 data, which includes the reporting of cases and deaths to residents, families, and administrative bodies (such as state health departments).',
 2.889717)

The above is the highest scoring sentence to the question **"preventing or stopping shortages or supply failures"** from the abstracts identified in the **"shortage"** search. The output shows the cord_uid of the paper, the order in which that sentence appears in the abstract, the sentence text and finally the score. 
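
Because several sentences from the same abstract can appear in the output, it is often handy to collapse the list to the best sentence per paper. A minimal sketch, assuming only the tuple format described above; `best_answer_per_paper` and the sample tuples are made up for illustration.

```python
def best_answer_per_paper(answers):
    """Collapse (cord_uid, index, sentence, score) tuples to the top hit per paper."""
    best = {}
    for cord_uid, idx, sentence, score in answers:
        if cord_uid not in best or score > best[cord_uid][2]:
            best[cord_uid] = (idx, sentence, score)
    return best


# Made-up tuples in the same format as the return_answers output
sample_answers = [
    ("uid_one", 5, "policy changes expanded capacity.", 2.9),
    ("uid_two", 0, "stockpiles were released.", 1.2),
    ("uid_one", 1, "facilities lacked resources.", 0.4),
]
best = best_answer_per_paper(sample_answers)
for uid, (idx, sentence, score) in best.items():
    print(uid, idx, round(score, 2), sentence)
```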

To make visualising the QA search process easier, the tool also has a `return_html_answers` method. This works in exactly the same way as the `return_answers` method, taking the same arguments and outputting the same list of tuples. An additional HTML output is provided alongside the tuples, featuring the title and full abstract of the highest scoring papers (the number of which can be set with the `top_n` argument), with sentences above a certain score highlighted (specified by the `highlight_score` argument):

In [7]:
# same output as above, plus html to visualise the top 5 answers
answers_1, html_answers = searchtool.return_html_answers(
    search_name="shortage", 
    question="preventing or stopping shortages or supply failures",
    min_score=-2,
    top_n = 5,
    highlight_score=-2)

display(HTML(html_answers))

Checking 1093 search results for answers to preventing or stopping shortages or supply failures
[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Inputs converted to BERT InputExamples
InputExamples converted to InputFeatures
InputFeatures converted to TensorDataset
TensorDataset converted to torch DataLoader
Ranking 12149 possible answers from 1093 texts:


100%|██████████| 122/122 [00:45<00:00,  2.66it/s]


______________________________________________
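
The highlighting idea behind the HTML output can be illustrated with the standard library alone. This is a sketch, not the markup that `return_html_answers` actually produces: `highlight_abstract` is a hypothetical helper and the use of `<mark>` tags is an assumption.

```python
from html import escape


def highlight_abstract(title, scored_sentences, highlight_score):
    """Render an abstract as HTML, marking sentences at or above highlight_score."""
    parts = []
    for sentence, score in scored_sentences:
        safe = escape(sentence)  # avoid injecting raw text into the page
        if score >= highlight_score:
            parts.append(f'<mark title="score {score:.2f}">{safe}</mark>')
        else:
            parts.append(safe)
    return f"<h4>{escape(title)}</h4><p>{' '.join(parts)}</p>"


html_snippet = highlight_abstract(
    "Masks <PPE>",
    [("Supply failed.", 1.5), ("We met & talked.", -3.0)],
    highlight_score=-2,
)
print(html_snippet)
```

In a notebook the returned string can be passed to `display(HTML(...))` in the same way as the tool's own output.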

Analysis of the output shows that some of the papers relate to "hearing loss" without any mention of resourcing issues, which is likely the model confusing "loss" with the concept of "shortage" in the question. Such papers can be removed from the search results using the `refine_search` method. This takes an existing search and applies a new set of criteria to the results of that search (rather than to the whole corpus in the metadata). The arguments of the `refine_search` method are the same as those of the `search` method. The following removes the papers relating to hearing loss from the original results of the "shortage" search:

In [8]:
searchtool.refine_search(search_name="shortage",
                         not_containing=["hearing loss", "loss of hearing"])

Refining search results from ## shortage ##
Searching 1093 with search parameters:
not containing:['hearing loss', 'loss of hearing']
1091 refined results returned and stored in shortage
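
Conceptually, a refinement only filters the ids already stored for a search and records the extra criteria. The sketch below illustrates that idea with a hypothetical `SearchHistory` class; the tool's own `SearchResult` object may well be structured differently.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SearchHistory:
    ids: List[str]
    steps: List[str] = field(default_factory=list)

    def refine(self, texts: Dict[str, str], not_containing: List[str]) -> None:
        """Drop ids whose text matches any excluded pattern and log the step."""
        exclude = [re.compile(p, re.IGNORECASE) for p in not_containing]
        self.ids = [
            uid for uid in self.ids
            if not any(p.search(texts[uid]) for p in exclude)
        ]
        self.steps.append(f"not containing: {not_containing}")


texts = {
    "p1": "ppe shortage in hospitals",
    "p2": "sudden hearing loss and scarce data",
}
history = SearchHistory(ids=["p1", "p2"], steps=["containing: ['shortage', 'scarce']"])
history.refine(texts, ["hearing loss", "loss of hearing"])
print(history.ids, history.steps)
```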


Note that, by printing the `search_results` attribute of the search tool, a description of each search and subsequent refined search can be seen:

In [9]:
searchtool.search_results

{'shortage': ## SearchResult shortage ##
 2 Searches:
 Original Search:
 	containing: ['shortage', 'resources', 'supply\\W', 'availab', 'scarce']
 Refined Search 1:
 	not containing: ['hearing loss', 'loss of hearing']}

And with the refined search results, the `return_answers` method can again be used to retrieve relevant papers:

In [10]:
# run the search again with the excluded papers
answers_1 = searchtool.return_answers(
    search_name="shortage", 
    question="preventing or stopping shortages or supply failures",
    min_score=-2)

Checking 1091 search results for answers to preventing or stopping shortages or supply failures
[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Inputs converted to BERT InputExamples
InputExamples converted to InputFeatures
InputFeatures converted to TensorDataset
TensorDataset converted to torch DataLoader
Ranking 12125 possible answers from 1091 texts:


100%|██████████| 122/122 [00:45<00:00,  2.66it/s]


*The full script relating to the covid search tool can be found in the [cord_search_qa_tool.py](https://github.com/samrelins/tanda_search_qa_tool/blob/master/cord_search_qa_tool.py) and [tanda_search_qa_tool.py](https://github.com/samrelins/tanda_search_qa_tool/blob/master/cord_search_qa_tool.py) modules in the task [github repo](https://github.com/samrelins/tanda_search_qa_tool)*

Once a list of satisfactory answers has been achieved, the second stage of the process can begin. A summarizer tool takes the unique ids from each of the papers highlighted in the search process, and automatically summarises them to produce a dataframe in the format of the competition specification. First, a list of cord_uids for the answers identified above is required:

In [11]:
# return a list of unique ids (in first-seen order) for each paper containing an answer / answers
cord_uids_1 = list(dict.fromkeys(cord_uid for cord_uid, *_ in answers_1))

Then the summariser tool can be initialised using these ids as the `cord_uids` argument. The tool also requires the metadata dataframe (the `meta` argument), the directory containing the RoBERTa-TandA model (the `tanda_dir` argument), and the location of the CORD-19 dataset (the `data_dir` argument) to access the full text of each paper:

In [12]:
summarizer_1 = CordResultSummarizer(cord_uids=cord_uids_1,
                                    meta=meta,
                                    data_dir=data_dir,
                                    tanda_dir="tanda_roberta_base_asnq")

Building result summarizer
Initialising QA Model
QA Model Loaded. Ready to build summary tables


The summary tool can then output a summary table of these papers by calling the `summary_table` method which takes three arguments:

* **challenge_question**: the question used by the summariser to identify the **challenge** feature for each text. The sentence from each abstract that scores the highest against this question is used. This defaults to "what is the problem issue challenge"
* **solution_question**: similar to the **challenge_question** this is the question used to identify the solution feature. This defaults to "what action should be taken".
* **n_hits**: This is the number of times a particular group must be mentioned in the text to be used in the **addressed_population** feature. This defaults to 2.
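
The role of `n_hits` can be illustrated with a simplified population extractor. This is only a sketch: the tool uses POS tagging rules, whereas the version below uses a regex over the one or two words preceding a group keyword, and `GROUP_PATTERN`, `STOPWORDS` and `addressed_populations` are hypothetical names.

```python
import re
from collections import Counter

# Up to two words immediately before a group keyword (illustrative keyword list)
GROUP_PATTERN = re.compile(r"\b((?:[a-z]+ )?[a-z]+) (workers|patients|providers)\b")
STOPWORDS = {"the", "of", "for", "and", "to", "with", "in", "a", "an", "by", "all", "these"}


def addressed_populations(text, n_hits=2):
    """Return group descriptions mentioned at least n_hits times, most common first."""
    counts = Counter()
    for match in GROUP_PATTERN.finditer(text.lower()):
        words = match.group(1).split()
        if words[-1] in STOPWORDS:
            continue  # e.g. "for patients" carries no description
        if words[0] in STOPWORDS:
            words = words[1:]  # e.g. "the elderly patients" -> "elderly patients"
        counts[" ".join(words + [match.group(2)])] += 1
    return [phrase for phrase, n in counts.most_common() if n >= n_hits]


text = (
    "Mental health workers reported stress. "
    "Support for mental health workers was limited. "
    "Older patients were isolated."
)
print(addressed_populations(text))
```

Raising `n_hits` trades recall for precision: one-off mentions such as "older patients" above are dropped, while repeatedly mentioned groups survive.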

The following is the summary output for the "shortage" results:

In [13]:
summary_table_1 = summarizer_1.summary_table(
    challenge_question="what is the problem issue challenge",  # both values are the defaults and can be omitted
    solution_question="what action should be taken"
)

display_features = ["study", "addressed_population", "strength_of_evidence", 
                    "study_type", "challenge", "solution", "journal"]
summary_table_1[display_features].head(10)


Building summary table from 50 papers
Finding challenges from paper text
[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Ranking 517 possible answers from 50 texts:


100%|██████████| 6/6 [00:01<00:00,  3.05it/s]

Finding solutions from paper text





[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Ranking 6771 possible answers from 50 texts:


100%|██████████| 68/68 [00:25<00:00,  2.66it/s]


Building table entries



100%|██████████| 50/50 [00:36<00:00,  1.37it/s]


Unnamed: 0,study,addressed_population,strength_of_evidence,study_type,challenge,solution,journal
0,"Long-Term Care, Residential Facilities, and COVID-19: An Overview of Federal and State Policy Responses",Healthcare Workers,"data: 80,",expert review,the high morbidity and mortality at these facilities has been attributed to a combination of a particularly vulnerable population and a lack of resources to mitigate the risk.,"the policy and regulatory changes implemented at the federal and state levels can be categorized into the following four classes: 1) preventing virus transmission, which includes policies relating to visitation restrictions, personal protective equipment (ppe) guidance, and testing requirements; 2) expanding facilities’ capacities, which includes both the expansion of physical space for isolation purposes and the expansion of workforce to combat covid-19; 3) relaxing administrative requirements, which includes measures enacted to shift the attention of caretakers and administrators from administrative requirements to residents’ care; and 4) reporting covid-19 data, which includes the reporting of cases and deaths to residents, families, and administrative bodies (such as state health departments).",J Am Med Dir Assoc
1,Double-Edged Spike—Are SARS-CoV-2 Serologic Tests Safe Right Now?,General Population,,expert review,sars-cov-2 is a highly contagious and acute severe respiratory pathogen that has produced an enormous strain on healthcare resources.,"in the united states and many other countries, specific social behavior restrictions have been enacted to moderate the impact of rapid propagation of this contagion (ie, “flatten the curve”).",Lab Med
2,Double-Edged Spike: Are SARS-CoV-2 Serologic Tests Safe Right Now?,General Population,,expert review,sars-cov-2 is a highly contagious and acute severe respiratory pathogen that has produced an enormous strain on health care resources.,"in the united states and many other countries, specific social behavior restrictions have been enacted to moderate the impact of rapid propagation of this contagion (ie, “flatten the curve”).",Am J Clin Pathol
3,Navigating the COVID-19 Pandemic: Lessons From Global Surgery,family members,Locations: Us,expert review,"lack of adequate testing, small reserves of ventilators and global supply chain disruptions, among other causes, have led to shortages affecting care for critically ill patients – most notably human resources, ventilators, and personal protective equipment (ppe).2 this has transformed hospitals in hic to a “resource variable environment” with uncertainty of the supplies, intensive care unit (icu) beds, and staff available at any given time.although this challenging environment is novel for many providers in hic, these constraints are commonplace for providers in low- and middle-income countries (lmic).","however, it is essential that social media be used responsibly, and that precautions are taken to prevent the spread of misinformation.",Ann Surg
4,Challenges and solutions for addressing critical shortage of supply chain for personal and protective equipment (PPE) arising from Coronavirus disease (COVID19) pandemic – Case study from the Republic of Ireland,healthcare workers,"Locations: Ireland, Ffrs, China",simulation,"coronavirus (covid-19) is highly infectious agent that causes fatal respiratory illnesses, which is of great global public health concern.","surface, or contact surface, disinfection or sterilization of ppe will suffice, as coronavirus does not penetrate materials.",Sci Total Environ
5,AGS Position Statement: Resource Allocation Strategies and Age‐Related Considerations in the COVID‐19 Era and Beyond,"older adults, health care providers",,expert review,"concurrently, concerns about potential shortages of healthcare professionals and health supplies to address these needs have focused attention on how resources are ultimately allocated and used.","it is intended to inform stakeholders including hospitals, health systems, and policymakers about ethical considerations to consider when developing strategies for allocating scarce resources during an emergency involving older adults.",J Am Geriatr Soc
6,"Viable supply chain model: integrating agility, resilience and sustainability perspectives—lessons from and thinking beyond the COVID-19 pandemic",General Population,,simulation,viability is the ability of a supply chain (sc) to maintain itself and survive in a changing environment through a redesign of structures and replanning of performance with long-term impacts.,"the principal ideas of the vsc model are multiple structural sc designs for supply–demand matching and, most importantly, establishment and control of adaptive mechanisms for transitions between the structural designs.",Ann Oper Res
7,Ultra-fast one-step RT-PCR protocol for the detection of SARS-CoV-2,General Population,,expert review,"the currently used methods for diagnostics are time consuming and also hindered by the limited availability of reagents and reaction costs, thus presenting a bottle neck for prevention of covid-19 spread.","since vaccines and therapies are still not available for the population, prevention becomes desperately needed.",
8,Mayo Clinic Strategies for COVID-19 Elements of an Effective Incident Command Center,General Population,,expert review,"any limitation of critical care components-physicians, nurses, respiratory therapists, intensive care unit (icu) beds, ventilator circuits, or pharmaceuticals-potentially limits the institution's capacity to care for a surge of patients.",pandemic planning\n\npandemic planning involves preparing health care systems to care for a prolonged surge of patients.,Mayo Clin Proc
9,Epidemic investigations within an arm’s reach – role of google maps during an epidemic outbreak,General Population,"locations: 4, cases: 95, participants: 6, Locations: Israel",analytical study,epidemiologic inquiry requires resources and time which may not be available or reduced when the outbreak is excessive.,"in these cases, possible options are confiscation of the information under the law upon review by an ethics committee or direct acquisition of data from google depending on the laws and regulations of the country.",Health Technol (Berl)


The summary process takes place in the following stages:

* **study design**: Regular expression keywords are used to identify sections of the text where the authors state their study design. If such a statement is detected, the study design is assigned accordingly. If not, the text is searched for phrases commonly found in simulation studies that can confidently be used to assign that study type. Remaining studies are queried for keywords relating to quantitative and experimental study designs; these are classified as "analytical studies" to differentiate them from non-primary, literature-review-type studies. Any studies still unclassified are labelled "editorial" if they are addressed "to the editor", or "expert review" otherwise.

* **challenge / solution**: These sections use the same QA model as the search tool. The default questions **"what is the problem / issue / challenge?"** and **"what action should be taken?"** (minus the punctuation) are used to rank sentences from the paper text, and the highest-scoring answers are reported in the respective columns as a high-level summary. The challenge question is applied only to the abstract, since abstracts generally give a better summary of the issues a paper addresses; solutions, however, are more often found in the full text. The tool also accepts custom challenge / solution questions, although the author has found that the default questions produce the best results in most cases.

* **addressed population**: The tool starts by identifying generic population terms such as "people", "patients", "staff" and "workers" via regular expression search. The [spaCy](https://spacy.io/) POS tags of the words immediately preceding and following these terms are then examined for grammatical constructs that describe groups, for example "critically ill patients" or "people experiencing economic hardship". Descriptions that occur in the text at least a user-defined number of times are then returned as the addressed populations.

* **measure of evidence**: This also makes use of spaCy POS tags, along with NER tagging. The named entities in the text are iterated through to identify cardinal numbers that relate to evidential quantities, such as "respondents", "patients" or "studies". These are then compared with the overall mentions of these terms within the text to ensure relevance, and reported as measures of evidence. Care is taken to eliminate numbers that form part of the referencing or structure of the text and are not themselves quantities.
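
The study design cascade from the first bullet above can be sketched roughly as follows. This is a minimal illustration, not the code from the repo: the function name `classify_study_design` and every keyword pattern are hypothetical stand-ins, and only the order of the checks and the output labels are taken from the description and tables in this notebook.

```python
import re

# Hypothetical keyword patterns, for illustration only; the patterns
# actually used by the tool are defined in the competition repo.
EXPLICIT_DESIGNS = {
    "systematic review and metaanalysis": r"systematic review|meta-?analysis",
    "retrospective observational study": r"retrospective (?:observational|cohort) study",
}
SIMULATION = r"\b(?:simulat\w+|monte carlo)\b"
ANALYTICAL = r"\b(?:regression|confidence interval|odds ratio)\b|\bp\s*[<=]\s*0?\.\d+"
EDITORIAL = r"\bto the editor\b"


def classify_study_design(text):
    """Assign a study type by working through progressively weaker signals."""
    text = text.lower()
    # 1. An explicit statement of the design takes priority.
    for label, pattern in EXPLICIT_DESIGNS.items():
        if re.search(pattern, text):
            return label
    # 2. Phrases that are characteristic of simulation studies.
    if re.search(SIMULATION, text):
        return "simulation"
    # 3. Quantitative / experimental vocabulary marks primary research.
    if re.search(ANALYTICAL, text):
        return "analytical study"
    # 4. Everything else is non-primary literature.
    if re.search(EDITORIAL, text):
        return "editorial"
    return "expert review"
```

Ordering the checks from most to least specific means a paper that both states its design and mentions, say, a regression is labelled by the explicit statement rather than by the weaker keyword signal.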


The remaining features can easily be parsed from the metadata provided in the task dataset. All of the code and logic for the above can be found in the [cord_result_summarizer.py](https://github.com/samrelins/tanda_search_qa_tool/blob/master/cord_result_summarizer.py) and [summarizer_helpers.py](https://github.com/samrelins/tanda_search_qa_tool/blob/master/summarizer_helpers.py) files in the [competition repo](https://github.com/samrelins/tanda_search_qa_tool). 
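
For readers who prefer not to open the repo, the measure-of-evidence step can be approximated without any dependencies. The sketch below deliberately swaps spaCy's NER for a plain regular expression, so it is a simplification rather than the tool's implementation. The names `measures_of_evidence`, `EVIDENCE_TERMS` and `min_mentions` are hypothetical, and treating "relevance" as a minimum number of mentions of the term is an assumption.

```python
import re
from collections import Counter

# Hypothetical list of evidential nouns, for illustration only.
EVIDENCE_TERMS = ("patients", "respondents", "studies", "participants", "cases")
TERMS = "|".join(EVIDENCE_TERMS)

# A number, optionally one modifier word, then an evidential noun.
# Bracketed citations such as "[3]" never match because the number
# must be followed by whitespace.
COUNT_PATTERN = re.compile(r"\b(\d+)\s+(?:[a-z-]+\s+)?(" + TERMS + r")\b")
MENTION_PATTERN = re.compile(r"\b(" + TERMS + r")\b")


def measures_of_evidence(text, min_mentions=2):
    """Return {term: count} for quantities attached to frequently mentioned terms."""
    text = text.lower()
    mentions = Counter(MENTION_PATTERN.findall(text))
    found = {}
    for number, term in COUNT_PATTERN.findall(text):
        # Assumption: a term is "relevant" if the text mentions it often enough.
        if mentions[term] >= min_mentions and term not in found:
            found[term] = int(number)
    return found
```

A regular expression like this will still pick up false positives (a year followed by "patients", for example), which is one reason the tool itself relies on NER and POS tags instead.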

The TandA model is very sensitive to the wording of the questions it is given. The best approach to search is therefore generally to ask a number of related questions, iteratively enlarging the pool of candidate answers. This is done in the following cells, which return answers to further questions relating to resource failures and build up the final output table. Note that none of the submitted outputs is exhaustive; the same process has the potential to identify and summarise many more useful studies.
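
The bookkeeping behind this iterative search is simple: each new question should only contribute papers that earlier questions have not already returned. The sketch below shows that pattern on toy data. The helper `new_uids` is hypothetical, and the only thing assumed about the search tool's answers is that each one is a tuple whose first element is the `cord_uid`.

```python
def new_uids(answers, seen):
    """Return cord_uids from `answers` not yet in `seen`, in ranked order.

    `seen` is updated in place so that later questions skip these papers.
    """
    fresh = []
    for cord_uid, *_ in answers:
        if cord_uid not in seen:
            seen.add(cord_uid)
            fresh.append(cord_uid)
    return fresh


# Toy answers standing in for the search tool's output (hypothetical values).
answers_by_question = {
    "question a": [("uid1", 1.2, "..."), ("uid2", 0.3, "..."), ("uid1", 0.1, "...")],
    "question b": [("uid2", 0.9, "..."), ("uid3", 0.4, "...")],
}

seen = set()
batches = {q: new_uids(a, seen) for q, a in answers_by_question.items()}
```

Keeping a single `seen` set avoids rebuilding a `previous_results` list for every question, and set membership tests stay fast as the number of searches grows.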

(Unhide the outputs below to see the results from each subsequent search and summary)

In [14]:
answers_2, html_answers = searchtool.return_html_answers(
    search_name="shortage", 
    question="reducing demand or use of scarce low resources",
    min_score=-2,
    highlight_score=-2)

display(HTML(html_answers))

Checking 1091 search results for answers to reducing demand or use of scarce low resources
[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Inputs converted to BERT InputExamples
InputExamples converted to InputFeatures
InputFeatures converted to TensorDataset
TensorDataset converted to torch DataLoader
Ranking 12125 possible answers from 1091 texts:


100%|██████████| 122/122 [00:45<00:00,  2.66it/s]


In [15]:
# Keep only papers that were not already returned by the first search
cord_uids_2 = []
for cord_uid, *_ in answers_2:
    if cord_uid not in cord_uids_2 and cord_uid not in cord_uids_1:
        cord_uids_2.append(cord_uid)

# Build the summary table for the newly found papers
summarizer_2 = CordResultSummarizer(cord_uids=cord_uids_2,
                                    meta=meta,
                                    data_dir=data_dir,
                                    tanda_dir="tanda_roberta_base_asnq")

summary_table_2 = summarizer_2.summary_table()

summary_table_2[display_features].head(10)

Building result summarizer
Initialising QA Model
QA Model Loaded. Ready to build summary tables

Building summary table from 47 papers
Finding challenges from paper text
[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Ranking 451 possible answers from 47 texts:


100%|██████████| 5/5 [00:01<00:00,  2.90it/s]


Finding solutions from paper text
[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Ranking 3985 possible answers from 47 texts:


100%|██████████| 40/40 [00:15<00:00,  2.65it/s]


Building table entries



100%|██████████| 47/47 [00:21<00:00,  2.23it/s]


Unnamed: 0,study,addressed_population,strength_of_evidence,study_type,challenge,solution,journal
0,The Hidden Victims of COVID-19 Pandemic: Congenital Heart Disease Patients,chd patients,Locations: Egypt,expert review,"limited resources such as hospital beds, ventilators, and blood products have resulted in difficult decisions regarding timing of chd surgery.","it is paramount for each program to reduce overall exposure by scheduling providers in on/off rotations, maintaining adequate ppe, surveillance via widespread testing of asymptomatic health care providers and strategies for remote telehealth (10).",JACC Case Rep
1,COVID-19 and the RAAS—a potential role for angiotensin II?,General Population,Locations: China,expert review,the severe acute respiratory syndrome coronavirus 2 (sars-cov-2) and its associated coronavirus disease 2019 (covid-19) have wreaked havoc on healthcare systems globally.,"should it be considered earlier in the course of disease, perhaps as a first-line vasopressor?",Crit Care
2,COVID-19 pandemic: a stress test for interventional radiology,General Population,,expert review,"is currently struggling due to a lack of beds, material and human resources to meet demand and to cope with this global threat.","if the procedure cannot be delayed for a patient who is actually affected by covid-19, such as emergencies involving life-threatening conditions or severe symptoms or progressive disease, treatment should be undertaken using a full deployment of the personal protective equipment and following the recommendations on protecting, cleaning and disinfecting the facility (13, 14) .",Diagn Interv Imaging
3,COVID-19: Should We Test Everyone?,infected patients,"Locations: China, Wuhan",analytical study,"on the other hand, they are challenged by the patients' frustrations and anxieties, stemming from the concerns of not being tested for covid-19 for not meeting the definition of pui (person under investigation).",infected people must be isolated to control the virus spread; potentially infected individuals should be quarantined to minimize the possibility of infecting healthy people; and vulnerable people such as the elderly and patients with chronic health issues need to be secluded to prevent infection.,
4,"If Not Now, When? the Role of Geriatric Leadership as Covid-19 Brings the World to Its Knees","older people, health care workers",,expert review,"the need to treat a vast number of patients is overwhelming, resources are scarce, and difficult ethical decisions have to be made.","the strategies include closing off the facility by restricting visitors, the use of personal protective equipment, the active screening of residents and staff, the implementation of social distancing and isolation of suspected cases, and the early identification, and treatment of severe illness.",Front Med (Lausanne)
5,Principles of ethics and critical communication during the COVID-19 pandemic(),"patients with cancer, healthcare professionals",,expert review,• scarce resources should be allocated to maximize benefit without unfairly affecting any group.,"in the extreme, gynecologic oncologists may be asked to communicate that potentially life-sustaining resources (e.g., ventilators) are not available within crisis standards of care.",Gynecol Oncol
6,Optimization of Resources and Modifications in Acute Ischemic Stroke Care in Response to the Global COVID-19 Pandemic,stroke patients,,expert review,"a steadily rising number of patients requiring intensive care, a large proportion from racial and ethnic minorities, demands creative solutions to provide high-quality care while ensuring healthcare worker safety in the face of limited resources.","15, 16 in light of this data and the increased risk of healthcare worker exposure associated with performing a ct scan in a covid+ patient, we recommend that pharmacologic prophylaxis to prevent deep vein thrombosis and antiplatelet therapy for secondary stroke prevention can be initiated 24 hours following administration of iv rtpa in the absence of a ct scan in neurologically stable patients.",J Stroke Cerebrovasc Dis
7,A rapid review of evidence and recommendations from the SIOPE radiation oncology working group to help mitigate for reduced paediatric radiotherapy capacity during the COVID-19 pandemic or other crises(),middle - income countries,"fractions: 33, Locations: Germinoma, Optimal",analytical study,objective: to derive evidence-based recommendations for the optimal utilisation of resources during unexpected shortage of radiotherapy capacity.,")•consider active surveillance (for who grade i-ii primary central nervous system low-grade gliomas and craniopharyngiomas after initial biopsy or debulking surgery).•consider isoeffective hypofractionated radiotherapy schedules (which also reduce overall treatment time), changing dose per fraction from 1.6-1.8 gy to above 2.0 gy, for selected poor prognosis patients where radiotherapy cannot be safely deferred, (neuroblastoma, rhabdomyosarcoma, ewing sarcoma and high-grade or diffuse midline gliomas.)",Radiother Oncol
8,Strategies for Liver Transplantation during the SARS CoV‐2 Outbreak Preliminary Experience from a Single Center in France,sars - cov-2 infected patients,"recipients: 9, Locations: France",retrospective observational study,liver transplantation during the ongoing sars‐cov‐2 pandemic is challenging given the urgent need to reallocate resources to other areas of patient care.,"in addition, to further prioritize lt activity, major elective interventions (e.g major hepatectomy, esophagectomy) in frail patients with potential long icu stays were reduced according to the national guidelines.",Am J Transplant
9,Ethics guidelines on COVID-19 triage—an emerging international consensus,The Elderly,,expert review,"this can lead to a shortage of ventilators and intensive care resources, resulting in limited medical care and death [2].","in recognition of the moral stress that taking these decisions may bring on, all guidelines call for psychosocial support for health professionals.",Crit Care


In [16]:
answers_3, html_answers = searchtool.return_html_answers(
    search_name="shortage", 
    question="allocating or prioritising care or scarce resources",
    min_score=-2,
    highlight_score=-2)

display(HTML(html_answers))

Checking 1091 search results for answers to allocating or prioritising care or scarce resources
[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Inputs converted to BERT InputExamples
InputExamples converted to InputFeatures
InputFeatures converted to TensorDataset
TensorDataset converted to torch DataLoader
Ranking 12125 possible answers from 1091 texts:


100%|██████████| 122/122 [00:45<00:00,  2.66it/s]


In [17]:
# Keep only papers that were not already returned by the earlier searches
cord_uids_3 = []
previous_results = cord_uids_1 + cord_uids_2
for cord_uid, *_ in answers_3:
    if cord_uid not in cord_uids_3 and cord_uid not in previous_results:
        cord_uids_3.append(cord_uid)

# Build the summary table for the newly found papers
summarizer_3 = CordResultSummarizer(cord_uids=cord_uids_3,
                                    meta=meta,
                                    data_dir=data_dir,
                                    tanda_dir="tanda_roberta_base_asnq")

summary_table_3 = summarizer_3.summary_table()

summary_table_3[display_features].head(10)

Building result summarizer
Initialising QA Model
QA Model Loaded. Ready to build summary tables

Building summary table from 29 papers
Finding challenges from paper text
[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Ranking 345 possible answers from 29 texts:


100%|██████████| 4/4 [00:01<00:00,  3.01it/s]

Finding solutions from paper text





[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Ranking 3435 possible answers from 29 texts:


100%|██████████| 35/35 [00:13<00:00,  2.69it/s]


Building table entries



100%|██████████| 29/29 [00:18<00:00,  1.56it/s]


Unnamed: 0,study,addressed_population,strength_of_evidence,study_type,challenge,solution,journal
0,Personalized Risk–Benefit Ratio Adaptation of Breast Cancer Care at the Epicenter of COVID‐19 Outbreak,patients with cancer,Locations: Italy,expert review,"the pandemic spread has challenged the national health system, requiring reallocation of most of the available health care resources to treat covid‐19‐positive patients, generating a competition with other health care needs, including cancer.","if the screening is positive, an immediate test for covid‐19 or quarantine is indicated, as needed.",Oncologist
1,"Personalized Predictive Models for Symptomatic COVID-19 Patients Using Basic Preconditions: Hospitalizations, Mortality, and the Need for an ICU or Ventilator",ICU Patients,Locations: Mexico,simulation,"objective: to develop personalized models that predict the following events: (1) hospitalization, (2) mortality, (3) need for icu, and (4) need for a ventilator.","the main idea of the svm is to maximize the margin between the data and the chosen hyperplane, where the margin is defined as the distance of the closest data point in a class to the margin.",medRxiv
2,Paucity and disparity of publicly available sex-disaggregated data for the COVID-19 epidemic hamper evidence-based decision-making,"infected men, other countries, women among confirmed cases, age groups","countries: 9, cases: 6, deaths: 63, Locations: Spain, Italy, Us, Denmark, China, Belgium, Sweden, Australia, Norway, Canada, France, Switzerland, Portugal, Germany, Netherlands, Austria, Usa, Washington, California, Illinois, Brazil, Uk",analytical study,"beyond suboptimal sex disaggregation, our analysis found a paucity of usable raw data sets and a generalized lack of standardization of captured data, making comparisons difficult.","inclusion of women in nih-sponsored clinical trials has been required since 1993 and, since 2015, nih policies requires consideration of the concept of sex as a biological variable (sabv) in the design, analysis, and reporting of studies 15 .",
3,ASE Statement on the Reintroduction of Echocardiography Services During the COVID-19 Pandemic,Healthcare Workers,,expert review,"as the pandemic unfolded, many non-urgent echo studies were deferred in an attempt to reduce coronavirus transmission among patients and healthcare workers, conserve personal protective equipment (ppe), and prepare for a potential surge of covid-19 patients.","waiting area\n\n• communication with patients about readiness for exam prior to arrival in reception/waiting area (e.g., text messaging, phone call).",J Am Soc Echocardiogr
4,Risk Factors Associated with Disease Severity and Length of Hospital Stay in COVID-19 Patients,General Population,"patients: 99,",retrospective observational study,"given the wide clinical spectrum of covid-19, a key challenge faced by frontline clinical staff is prioritisation of stretched resources.","in our study, the log-rank test suggested glucocorticoids use led to a prolonged length of hospital stay in covid-19 patients, which discourages its use.",J Infect
5,Addressing the shortage of personal protective equipment during the COVID-19 pandemic in India-A public health perspective,General Population,Locations: India,expert review,"previously, ppe was commonly used in the hospital environment, is now a scarce and precious commodity in many locations when it is needed most to care for highly infectious patients [1].it is even more difficult to get ppe when common people get started to use/stock ppe in fear of infectious disease contamination without following national guidelines, which is an added insult to the injury of health system.","legislative steps like mandatory social distancing, curfew, can help the crisis period by flattening the epidemic curve [10].",AIMS Public Health
6,Evolution of Plastic Surgery Provision due to COVID-19 – The Role of the ‘Pandemic Pack’,General Population,,expert review,"2 the combination of unfamiliar environments, lack of accessible equipment, requirement to reduce time spent with patients and adherence to social distancing has resulted in the need to provide a more mobile and flexible service.in order to support our mobile service, we have found that, as in other disaster situations where specialised bags have been deployed, 3 using a simple bag containing essential equipment and consumables has revolutionised our ability to work at the point of referral and avoid unnecessary trips to theatre.","the bag is easily cleaned with 1,000 ppm available chlorine (in accordance with public health england guidance) after each patient exposure.",J Plast Reconstr Aesthet Surg
7,Reallocating Ventilators During the COVID-19 Pandemic: Is it Ethical?,health care professionals,,expert review,"many physicians are already facing a profound ethical dilemma: how to allocate these resources during shortages1, 2, 3, 4; with some hospitals, states and countries even having to establish policies on which groups of patients to prioritize in providing lifesaving treatment during the covid-19 crisis.5, 6, 7, 8\nbioethicists, thought leaders, and think tanks have formulated frameworks to provide guidance for physicians on how to allocate scarce resources during crises9, 10, 11, 12, 13.","in essence, this recommendation, which has been advocated by some bioethicists, is asking health care professionals to, metaphorically speaking, push the fat man over the bridge in order to save the five lives.",Surgery
8,Commentary: Vulnerability and Resilience Demonstrated: Cardiac Surgeons During COVID-19,Healthcare Workers,,expert review,"the authors point out, in a pandemic, it is not who you are, but what you can do, and when should you do it?",the authors begin with stratifying cardiac surgery patients by acuity and reduced surgeries to less than 10% normal case volumes.,J Thorac Cardiovasc Surg
9,EU wine policy in the framework of the CAP: post-2020 challenges,non - eu countries,"measures: 20, Locations: Italy, Vineyards, France, Spain, Germany, Portugal",expert review,"the eu common agricultural policy (cap), and with it the eu wine policy, is experiencing a reform process, started in 2018, in order to address ambitious environmental and social objectives, in conjunction with the goal of a competitive agricultural sector.",the 9 specific objectives of the future cap are the following:\nensure a fair income to farmers;increase competitiveness;rebalance the power in the food chain;climate change action;environmental care;preserve landscapes and biodiversity;support generational renewal;vibrant rural areas;protect food quality and health.,Agric Econ


In [18]:
answers_4, html_answers = searchtool.return_html_answers(
    search_name="shortage", 
    question="managing intensive care bed shortage",
    min_score=-2,
    highlight_score=-2)

display(HTML(html_answers))

Checking 1091 search results for answers to managing intensive care bed shortage
[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Inputs converted to BERT InputExamples
InputExamples converted to InputFeatures
InputFeatures converted to TensorDataset
TensorDataset converted to torch DataLoader
Ranking 12125 possible answers from 1091 texts:


100%|██████████| 122/122 [00:45<00:00,  2.66it/s]


In [19]:
# Keep only papers that were not already returned by the earlier searches
cord_uids_4 = []
previous_results = cord_uids_1 + cord_uids_2 + cord_uids_3
for cord_uid, *_ in answers_4:
    if cord_uid not in cord_uids_4 and cord_uid not in previous_results:
        cord_uids_4.append(cord_uid)

# Build the summary table for the newly found papers
summarizer_4 = CordResultSummarizer(cord_uids=cord_uids_4,
                                    meta=meta,
                                    data_dir=data_dir,
                                    tanda_dir="tanda_roberta_base_asnq")

summary_table_4 = summarizer_4.summary_table()

summary_table_4[display_features].head(10)

Building result summarizer
Initialising QA Model
QA Model Loaded. Ready to build summary tables

Building summary table from 65 papers
Finding challenges from paper text
[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Ranking 736 possible answers from 65 texts:


100%|██████████| 8/8 [00:02<00:00,  2.85it/s]

Finding solutions from paper text





[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Ranking 6959 possible answers from 65 texts:


100%|██████████| 70/70 [00:26<00:00,  2.66it/s]


Building table entries



100%|██████████| 65/65 [00:35<00:00,  1.81it/s]


Unnamed: 0,study,addressed_population,strength_of_evidence,study_type,challenge,solution,journal
0,Demand for hospitalization services for COVID-19 patients in Brazil,General Population,"resources: 191, cases: 100, Locations: Brazil, China",simulation,"as the number of cases grows in the country, there is a concern that the health system may become overwhelmed, resulting in shortages of hospital beds, intensive care unit beds, and mechanical ventilators.","the response must be immediate, and 184 it will demand a concerted effort from society.",
1,Philanthropy and Humanity in the Face of a Pandemic – A letter to the editor on “World Health Organization declares global emergency: A review of the 2019 novel coronavirus (COVID-19)” (Int J Surg 2020; 76:71-6),General Population,Locations: China,expert review,"since the very beginning of the covid 19 pandemic, the health care industry has been forced to confront an invisible enemy -the shortage of personal protected equipment (ppe).",another innovative approach was to enlist local garment shop and volunteers to start producing gowns from suitable material.,Int J Surg
2,What does the COVID-19 pandemic teach us about global value chains? The case of medical supplies,u.s . healthcare workers,"masks: 20, Locations: China, Germany, Canada, Malaysia, Mexico, Ireland, Singapore",expert review,the covid-19 pandemic has caused a dramatic shortage in the medical supplies needed to treat the virus due to a massive surge in demand as the disease circled the globe during the first half of 2020.,"apparently spurred by fox news host tucker carlson’s segment on 3m that criticized the company for allegedly putting consumers in other countries before healthcare workers and local governments in the u.s. (derensis, 2020), president trump issued an executive order on april 2 that invoked the defense production act (dpa) of 1950 to require 3m to cease its export of n95 masks.",J Int Bus Policy
3,MADVent: A low‐cost ventilator for patients with COVID‐19,General Population,,simulation,"there is an unmet need for rapidly deployable, emergency‐use ventilators with sufficient functionality to manage covid‐19 patients with severe acute respiratory distress syndrome.",a low-pressure situation was simulated by disconnecting the endotracheal tube to trigger an alarm which results in the system immediately stopping.,Med Devices Sens
4,Use of subcutaneous tocilizumab to prepare intravenous solutions for COVID-19 emergency shortage: Comparative analytical study of physicochemical quality attributes,General Population,"samples: 500, Locations: Tcz, Spain, Germany",systematic review and metaanalysis,"covid-19, a disease caused by the novel coronavirus sars-cov-2, has produced a serious emergency for global public health, placing enormous stress on national health systems in many countries.","the virus binds to alveolar epithelial cells, thus activating the innate immune system and adaptive immune system, resulting in the release of a large number of cytokines, including il-6 [6].",J Pharm Anal
5,Practical Considerations When Performing Neurodiagnostic Studies on Patients with COVID-19 and Other Highly Virulent Diseases,staff exposure,,expert review,"the coronavirus disease 2019, sars-cov-2 (the cause of covid-19), has led to a worldwide shortage of personal protective equipment (ppe) and an increased stress on hospital resources, which has resulted in a spike in the anxiety of the frontline healthcare workers.",this protocol is to guide the neurodiagnostic service line with the steps/actions needed to complete a procedure on a patient who is presumptive positive (i.e.,Neurodiagn J
6,Bacillus Calmette Guérin (BCG) vaccination use in the fight against COVID-19 – what’s old is new again?,healthcare workers,,analytical study,this treatment has been affected in recent years by global shortages of the agent.,\n\nbacillus calmette guérin (bcg) is a vaccine derived from the live attenuated strain of mycobacterium bovis and used widely as a vaccination against tuberculosis in high-risk regions.,Future oncology
7,The pandemic under siege: A view from the Gaza Strip,General Population,"Locations: Gaza, Israel",expert review,"additionally, the limited movement of goods because of the siege has led to an acute shortage of medical supplies and equipment that are essential for combating a pandemic.","following an outbreak of hostilities between the two factions, and as a way of preempting a fatah-led coup, hamas assumed full military control of gaza in june 2007 (sayigh, 2007).",World Dev
8,COVID‐19: How Can Rural Community Pharmacies Respond to the Outbreak?,"rural communities, primary care providers",,expert review,"rural patients also face unique challenges such as extended travel time to an acute care facility, hazardous terrain, and the lack of reliable or public transportation.","legislative action should be taken immediately to ensure pharmacists are recognized as health care providers, as the difference between success and failure in responding to the current covid‐19 pandemic will depend equally on what is done now, and what will be done in the near future.",J Rural Health
9,Alternative Qualitative Fit Testing Method for N95 Equivalent Respirators in the Setting of Resource Scarcity at the George Washington University,General Population,Locations: Milwaukee,expert review,the 2019 novel coronavirus (covid-19) has caused an acute shortage of personal protective equipment (ppe) globally as well as shortage in the ability to test ppe such as respirator fit testing.,the sensitivity screening involves placing the subject under a hood without a respirator and aerosolizing one of the four qualitative substances.,


In [20]:
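# Clear the working directory before collecting the output files
%rm -rf *

# Copy the CSV files for the other task questions into the working directory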
%cp ../input/hospital-infrastructure-to-prevent-outbreaks/what_are_ways_to_create_hospital_infrastructure_to_prevent_nosocomial_outbreaks.csv /kaggle/working
%cp ../input/management-of-patients-who-are-underhoused/management_of_patients_who_are_underhoused_or_otherwise_lower_social_economic_status.csv /kaggle/working
%cp ../input/methods-to-control-spread-in-communities/methods_to_control_the_spread_in_communities.csv /kaggle/working
%cp ../input/modes-of-communicating-with-target-communities/modes_of_communicating_with_target_high_risk_populations.csv /kaggle/working

In [21]:
# Combine the four summary tables and write the final output for this question
summary_table = pd.concat([summary_table_1, 
                           summary_table_2,
                           summary_table_3,
                           summary_table_4])
summary_table.to_csv("what_are_recommendations_for_combating_overcoming_resource_failures.csv")