# Pairwise construction of PubMed search strings

## About this Notebook Template

### What is this?

This notebook is a template that can be used to generate an **all-pairs** list of search strategies from two lists of terms. It can be used to create searches for MeSH term/subheading combinations and to combine keywords as phrases, proximity searches, or simple AND combinations.

### Why?

Generating a pairwise list of search strategies is useful in two situations: when you need to combine a set of MeSH terms with a set of subheadings, and when you need to convert a truncation/wildcard search in PubMed to a proximity search, especially if you do not have access to databases with different proximity search functionality. Because proximity searching in PubMed is not yet compatible with truncation and wildcards, every variation of a search term must be fully spelled out, which can involve a great deal of typing. This notebook does the hard work for you: list the terms you wish to combine below, and the notebook generates every possible pairing as a database-ready search string. The output can be pasted directly into PubMed or into Excel for search documentation.

### Where did this idea come from?

The pairwise technique is inspired by the [all-pairs testing methodology](https://en.wikipedia.org/wiki/All-pairs_testing) from the field of software quality assurance.

## Imports and Constants

In [183]:
from IPython.display import display, HTML

pubmed_search_url = "https://pubmed.ncbi.nlm.nih.gov/?term="

## Construct MeSH search strings


### Enter Term Lists

Create two files using the [`%%writefile` Jupyter magic command](https://ipython.readthedocs.io/en/stable/interactive/magics.html#cellmagic-writefile): one containing the list of MeSH terms and one containing the list of subheadings that you want to combine pairwise.

Place one term per row.

Note: The script does not check whether a particular heading/subheading pairing is allowed in MeSH.
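
If you want to flag pairings that may not be allowed, one option is to compare each pair against a lookup of permitted subheadings. The following is a minimal sketch, not part of the original notebook: `allowed_subheadings` and `split_by_allowed` are hypothetical names, and the lookup entries are placeholders rather than real MeSH data. You would need to populate the lookup yourself from the MeSH documentation.

```python
# Hypothetical lookup: main heading -> subheadings permitted with it.
# These entries are placeholders, not real MeSH data.
allowed_subheadings = {
    "heading one": {"subheading one", "subheading three"},
    "heading two": {"subheading one"},
}


def split_by_allowed(mesh_terms, subheadings, allowed):
    """Split all heading/subheading pairs into permitted and unverified lists."""
    permitted, unverified = [], []
    for mesh_term in mesh_terms:
        for subheading in subheadings:
            pair = (mesh_term, subheading)
            # Headings missing from the lookup are treated as unverified.
            if subheading in allowed.get(mesh_term, set()):
                permitted.append(pair)
            else:
                unverified.append(pair)
    return permitted, unverified


permitted, unverified = split_by_allowed(
    ["heading one", "heading two"],
    ["subheading one", "subheading three"],
    allowed_subheadings,
)
print("permitted:", permitted)
print("unverified:", unverified)
```

Pairs in the unverified list are not necessarily invalid; they are simply absent from the lookup and worth checking by hand.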

#### Enter MeSH Main Terms

In [184]:
%%writefile mesh-terms.txt
heading one
heading two
heading three
heading four

Overwriting mesh-terms.txt


#### Enter MeSH Subheading Terms

In [185]:
%%writefile subheadings.txt
subheading one
subheading two
subheading three

Overwriting subheadings.txt


### Construct MeSH Main Heading/Subheading Search String

Read the contents of each file into a list.

In [186]:
with open("mesh-terms.txt") as f:
    mesh_terms = f.read().splitlines()

with open("subheadings.txt") as f:
    subheadings = f.read().splitlines()

Create a list of pairwise combinations of MeSH terms and Subheadings with appropriate PubMed syntax.

In [187]:
mesh_searches = [
    mesh_term + "/" + subheading + "[mh]"
    for mesh_term in mesh_terms
    for subheading in subheadings
]

for mesh_search in mesh_searches:
    print(mesh_search)

heading one/subheading one[mh]
heading one/subheading two[mh]
heading one/subheading three[mh]
heading two/subheading one[mh]
heading two/subheading two[mh]
heading two/subheading three[mh]
heading three/subheading one[mh]
heading three/subheading two[mh]
heading three/subheading three[mh]
heading four/subheading one[mh]
heading four/subheading two[mh]
heading four/subheading three[mh]


Concatenate the search strings into a single search strategy with " OR ".

This string can be copied and pasted directly into PubMed's search box.

In [188]:
mesh_search_string = " OR ".join(mesh_searches)

print(mesh_search_string)

heading one/subheading one[mh] OR heading one/subheading two[mh] OR heading one/subheading three[mh] OR heading two/subheading one[mh] OR heading two/subheading two[mh] OR heading two/subheading three[mh] OR heading three/subheading one[mh] OR heading three/subheading two[mh] OR heading three/subheading three[mh] OR heading four/subheading one[mh] OR heading four/subheading two[mh] OR heading four/subheading three[mh]


### Link to launch search

In [189]:
display(
    HTML(
        f'<a href="{pubmed_search_url + mesh_search_string.replace(" ", "+")}">Search PubMed with MeSH heading/subheading search string</a>'
    )
)
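
Replacing spaces with `+` leaves characters such as double quotes, slashes, and brackets unencoded, and a double quote inside an `href="..."` attribute ends the attribute early. The following is a minimal sketch of a more defensive link builder, assuming the standard library's `urllib.parse.quote_plus` is acceptable for this purpose. `pubmed_link` is a hypothetical helper name, not part of the original notebook.

```python
from urllib.parse import quote_plus


def pubmed_link(search_string, label, base_url="https://pubmed.ncbi.nlm.nih.gov/?term="):
    """Return an HTML anchor whose href carries the percent-encoded search string."""
    # quote_plus turns spaces into "+" and percent-encodes quotes, slashes, and brackets.
    return f'<a href="{base_url}{quote_plus(search_string)}">{label}</a>'


example = pubmed_link(
    'heading one/subheading one[mh] OR "one four"[tiab:~2]',
    "Search PubMed",
)
print(example)
```

In a notebook, the returned string can be passed to `display(HTML(...))` in the same way as the cells in this template.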

## Construct keyword proximity search strings

### Enter Term Lists

Create two files, one for each list of concepts you want to combine in a proximity search.

#### Enter term list for first topic

In [190]:
%%writefile topic1-keywords-proximity.txt
one
ones
two
twos
three
threes

Overwriting topic1-keywords-proximity.txt


#### Enter term list for second topic

In [191]:
%%writefile topic2-keywords-proximity.txt
four
fours
five
fives
six
sixes

Overwriting topic2-keywords-proximity.txt


#### Specify search field

In [192]:
proximity_field = "tiab"

#### Specify proximity distance

In [193]:
proximity_distance = 2
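
It can help to check these two settings before generating dozens of search strings. The sketch below assumes that PubMed proximity searching is limited to the Title (`ti`), Title/Abstract (`tiab`), and Affiliation (`ad`) fields; please confirm this against the current PubMed User Guide. `PROXIMITY_FIELDS` and `check_proximity_settings` are hypothetical names introduced for illustration.

```python
# Field tags assumed to support proximity searching (verify against the PubMed User Guide).
PROXIMITY_FIELDS = {"ti", "tiab", "ad"}


def check_proximity_settings(field, distance):
    """Raise ValueError if the field tag or distance looks unusable for a proximity search."""
    if field.lower() not in PROXIMITY_FIELDS:
        raise ValueError(f"field {field!r} is not one of {sorted(PROXIMITY_FIELDS)}")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(distance, bool) or not isinstance(distance, int) or distance < 0:
        raise ValueError(f"distance must be a non-negative integer, got {distance!r}")
    return field.lower(), distance


print(check_proximity_settings("tiab", 2))
```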

### Construct keyword proximity search string

Read the contents of each file into a list.

In [220]:
with open("topic1-keywords-proximity.txt") as f:
    topic1_terms = f.read().splitlines()

with open("topic2-keywords-proximity.txt") as f:
    topic2_terms = f.read().splitlines()
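
Reading with `splitlines()` keeps blank lines, trailing spaces, and repeated terms, each of which would end up in the generated search strings. The following is a minimal sketch of a more forgiving reader. `read_terms` is a hypothetical helper, and the demonstration uses a throwaway file rather than the term files above.

```python
import tempfile
from pathlib import Path


def read_terms(path):
    """Read one term per line, dropping blank lines, stray whitespace, and repeats."""
    terms = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        term = line.strip()
        # Keep the first occurrence of each term and preserve the original order.
        if term and term not in terms:
            terms.append(term)
    return terms


# Demonstration with a throwaway file rather than the notebook's own term files.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo-terms.txt"
    demo_file.write_text("one\nones \n\none\ntwo\n", encoding="utf-8")
    demo_terms = read_terms(demo_file)

print(demo_terms)
```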

Pair every term in the first list with every term in the second list, formatted in PubMed proximity syntax.

In [216]:
keyword_proximity_searches_pm = [
    f'"{topic1_term} {topic2_term}"[{proximity_field}:~{proximity_distance}]'
    for topic1_term in topic1_terms
    for topic2_term in topic2_terms
]
for keyword_search in keyword_proximity_searches_pm:
    print(keyword_search)

"one four"[tiab:~2]
"one fours"[tiab:~2]
"one five"[tiab:~2]
"one fives"[tiab:~2]
"one six"[tiab:~2]
"one sixes"[tiab:~2]
"ones four"[tiab:~2]
"ones fours"[tiab:~2]
"ones five"[tiab:~2]
"ones fives"[tiab:~2]
"ones six"[tiab:~2]
"ones sixes"[tiab:~2]
"two four"[tiab:~2]
"two fours"[tiab:~2]
"two five"[tiab:~2]
"two fives"[tiab:~2]
"two six"[tiab:~2]
"two sixes"[tiab:~2]
"twos four"[tiab:~2]
"twos fours"[tiab:~2]
"twos five"[tiab:~2]
"twos fives"[tiab:~2]
"twos six"[tiab:~2]
"twos sixes"[tiab:~2]
"three four"[tiab:~2]
"three fours"[tiab:~2]
"three five"[tiab:~2]
"three fives"[tiab:~2]
"three six"[tiab:~2]
"three sixes"[tiab:~2]
"threes four"[tiab:~2]
"threes fours"[tiab:~2]
"threes five"[tiab:~2]
"threes fives"[tiab:~2]
"threes six"[tiab:~2]
"threes sixes"[tiab:~2]


In [217]:
keyword_proximity_search_string = " OR ".join(keyword_proximity_searches_pm)

print(keyword_proximity_search_string)

"one four"[tiab:~2] OR "one fours"[tiab:~2] OR "one five"[tiab:~2] OR "one fives"[tiab:~2] OR "one six"[tiab:~2] OR "one sixes"[tiab:~2] OR "ones four"[tiab:~2] OR "ones fours"[tiab:~2] OR "ones five"[tiab:~2] OR "ones fives"[tiab:~2] OR "ones six"[tiab:~2] OR "ones sixes"[tiab:~2] OR "two four"[tiab:~2] OR "two fours"[tiab:~2] OR "two five"[tiab:~2] OR "two fives"[tiab:~2] OR "two six"[tiab:~2] OR "two sixes"[tiab:~2] OR "twos four"[tiab:~2] OR "twos fours"[tiab:~2] OR "twos five"[tiab:~2] OR "twos fives"[tiab:~2] OR "twos six"[tiab:~2] OR "twos sixes"[tiab:~2] OR "three four"[tiab:~2] OR "three fours"[tiab:~2] OR "three five"[tiab:~2] OR "three fives"[tiab:~2] OR "three six"[tiab:~2] OR "three sixes"[tiab:~2] OR "threes four"[tiab:~2] OR "threes fours"[tiab:~2] OR "threes five"[tiab:~2] OR "threes fives"[tiab:~2] OR "threes six"[tiab:~2] OR "threes sixes"[tiab:~2]


### Link to launch search

In [218]:
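from urllib.parse import quote_plus

# Percent-encode the query: the raw string contains double quotes,
# which would otherwise end the href attribute early.
proximity_search_link = pubmed_search_url + quote_plus(keyword_proximity_search_string)

display(
    HTML(
        f'<a href="{proximity_search_link}">Search PubMed with keyword proximity search string</a>'
    )
)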

### Proximity search for Excel

The output from this cell can be used to create line-by-line search documentation in an Excel spreadsheet.

Excel strips the double quotes from PubMed-formatted strings when they are pasted in, so this version adds extra quotes that survive the paste.
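
As an alternative to escaping the quotes by hand, the standard library's `csv` module doubles embedded quotes automatically when it writes a field. The following is a minimal sketch under that assumption. It writes to an in-memory buffer for demonstration, and the two sample searches are illustrative stand-ins for the generated list.

```python
import csv
import io

# Illustrative sample; in the notebook this would be the generated PubMed-format list.
sample_searches = ['"one four"[tiab:~2]', '"one fours"[tiab:~2]']

buffer = io.StringIO()
writer = csv.writer(buffer)
for line_number, search in enumerate(sample_searches, start=1):
    # Each row holds a line number and the search string; csv escapes the quotes.
    writer.writerow([line_number, search])

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the original PubMed-format strings.
round_trip = [row[1] for row in csv.reader(io.StringIO(csv_text))]
print(round_trip == sample_searches)
```

To produce a file that Excel can open, the same writer can be pointed at a file opened with `newline=""` instead of the in-memory buffer.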

In [219]:
keyword_proximity_searches_xls = [
    # Use proximity_field rather than a hard-coded tag so both outputs stay in sync
    f'"""{topic1_term} {topic2_term}""[{proximity_field}:~{proximity_distance}]'
    for topic1_term in topic1_terms
    for topic2_term in topic2_terms
]

for keyword_search in keyword_proximity_searches_xls:
    print(keyword_search)

"""one four""[tiab:~2]
"""one fours""[tiab:~2]
"""one five""[tiab:~2]
"""one fives""[tiab:~2]
"""one six""[tiab:~2]
"""one sixes""[tiab:~2]
"""ones four""[tiab:~2]
"""ones fours""[tiab:~2]
"""ones five""[tiab:~2]
"""ones fives""[tiab:~2]
"""ones six""[tiab:~2]
"""ones sixes""[tiab:~2]
"""two four""[tiab:~2]
"""two fours""[tiab:~2]
"""two five""[tiab:~2]
"""two fives""[tiab:~2]
"""two six""[tiab:~2]
"""two sixes""[tiab:~2]
"""twos four""[tiab:~2]
"""twos fours""[tiab:~2]
"""twos five""[tiab:~2]
"""twos fives""[tiab:~2]
"""twos six""[tiab:~2]
"""twos sixes""[tiab:~2]
"""three four""[tiab:~2]
"""three fours""[tiab:~2]
"""three five""[tiab:~2]
"""three fives""[tiab:~2]
"""three six""[tiab:~2]
"""three sixes""[tiab:~2]
"""threes four""[tiab:~2]
"""threes fours""[tiab:~2]
"""threes five""[tiab:~2]
"""threes fives""[tiab:~2]
"""threes six""[tiab:~2]
"""threes sixes""[tiab:~2]


## Construct keyword intersection search strings (Boolean AND)

### Enter Term Lists

#### Enter term list for first topic

In [199]:
%%writefile topic1-keywords-intersection.txt
first*
second*
third*

Overwriting topic1-keywords-intersection.txt


#### Enter term list for second topic

In [200]:
%%writefile topic2-keywords-intersection.txt
fourth*
fifth*
sixth*

Overwriting topic2-keywords-intersection.txt


In [201]:
with open("topic1-keywords-intersection.txt") as f:
    topic1_terms = f.read().splitlines()

with open("topic2-keywords-intersection.txt") as f:
    topic2_terms = f.read().splitlines()

In [202]:
keyword_intersection_searches = [
    "(" + topic1_term + " AND " + topic2_term + ")"
    for topic1_term in topic1_terms
    for topic2_term in topic2_terms
]

for keyword_search in keyword_intersection_searches:
    print(keyword_search)

(first* AND fourth*)
(first* AND fifth*)
(first* AND sixth*)
(second* AND fourth*)
(second* AND fifth*)
(second* AND sixth*)
(third* AND fourth*)
(third* AND fifth*)
(third* AND sixth*)
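
The MeSH, proximity, and intersection sections all follow the same pattern: take every pairing of two lists and format it with a fixed template. The following is a minimal sketch of that shared pattern using `itertools.product`. `pairwise_searches` and its `{a}`/`{b}` placeholders are hypothetical and introduced here only for illustration.

```python
from itertools import product


def pairwise_searches(terms_a, terms_b, template):
    """Format every (a, b) pairing with a template that uses {a} and {b}."""
    # product() iterates the first list in the outer loop, like the nested comprehensions.
    return [template.format(a=a, b=b) for a, b in product(terms_a, terms_b)]


mesh = pairwise_searches(["heading one"], ["subheading one", "subheading three"], "{a}/{b}[mh]")
proximity = pairwise_searches(["one", "ones"], ["four"], '"{a} {b}"[tiab:~2]')
intersection = pairwise_searches(["first*"], ["fourth*", "fifth*"], "({a} AND {b})")

print(mesh)
print(proximity)
print(intersection)
```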


In [203]:
keyword_intersection_search_string = " OR ".join(keyword_intersection_searches)

print(keyword_intersection_search_string)

(first* AND fourth*) OR (first* AND fifth*) OR (first* AND sixth*) OR (second* AND fourth*) OR (second* AND fifth*) OR (second* AND sixth*) OR (third* AND fourth*) OR (third* AND fifth*) OR (third* AND sixth*)
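
By the distributive law of Boolean logic, OR-ing every `(a AND b)` pair is equivalent to AND-ing two OR groups, which gives a much shorter string when only the combined result matters. The pairwise form remains useful when each pairing needs its own line in search documentation. The following is a minimal sketch; `compact_intersection` is a hypothetical helper name.

```python
def compact_intersection(terms_a, terms_b):
    """Build (a1 OR a2 ...) AND (b1 OR b2 ...), equivalent to OR-ing every (a AND b) pair."""
    return "(" + " OR ".join(terms_a) + ") AND (" + " OR ".join(terms_b) + ")"


compact = compact_intersection(
    ["first*", "second*", "third*"],
    ["fourth*", "fifth*", "sixth*"],
)
print(compact)
```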


### Link to launch search

In [204]:
display(
    HTML(
        f'<a href="{pubmed_search_url + keyword_intersection_search_string.replace(" ", "+")}">Search PubMed with keyword intersection search string</a>'
    )
)