# 0.0 Scratch
## Published Topics in Global Health
### Author: Isabelle Feldhaus
#### Last updated: 25 July 2023
#### Previously updated: 16 December 2022

The objective of this analysis is to identify the most published topics in the field of global health looking at the titles published in a pre-defined set of journals from 2010 to 2022. Titles and metadata will be extracted from PubMed.

Potential sub-analyses: open-access vs. restricted access, setting/countries, universities/author institutions, eventually to extend to journal publication times. 

In this notebook, I gather the metadata of the articles in selected journals.

In [1]:
# Import libraries and functions
import pubmed_fetcher

In [12]:
# List of journals to search 
journals = ["Lancet Glob Health", "Health Policy Plan"]

In [13]:
import entrezpy.esearch.esearcher
for i in journals:
    e = entrezpy.esearch.esearcher.Esearcher('esearcher', '')
    a = es.inquire('db':'pubmed','term':journals[i]+ '[journal]+AND+' + dateRange + '[pdat]', 'retmax': 110000, 'rettype': 'uilist')
    print(a.get_result().uids)

SyntaxError: invalid syntax (<ipython-input-13-04ee046a64f1>, line 4)

In [14]:
# Working example
import entrezpy.esearch.esearcher

e = entrezpy.esearch.esearcher.Esearcher('entrezpy', 'ifeldhaus@g.harvard.edu')
a = e.inquire({'db':'pubmed','term':'lancet glob health[journal] AND 2010:2022[pdat]', 'rettype': 'uilist'})
print(a.result.count, a.result.uids)

3021 ['36565706', '36565705', '36565704', '36565703', '36565702', '36563699', '36528032', '36528031', '36525983', '36493797', '36480932', '36480931', '36463917', '36462511', '36455593', '36442498', '36435182', '36435181', '36427517', '36423645', '36403587', '36400090', '36400089', '36400088', '36400087', '36400086', '36400085', '36400084', '36400083', '36400082', '36400081', '36400080', '36400079', '36400078', '36400077', '36400076', '36400075', '36400074', '36400073', '36400072', '36400071', '36400070', '36372077', '36370714', '36332655', '36327997', '36309034', '36309033', '36306809', '36272437', '36240832', '36240831', '36240830', '36240829', '36240828', '36240827', '36240826', '36240825', '36240824', '36240823', '36240822', '36240821', '36240820', '36240819', '36240818', '36240817', '36240816', '36240815', '36240814', '36240813', '36240812', '36240811', '36240810', '36240809', '36240808', '36240807', '36240806', '36209761', '36183737', '36179736', '36179735', '36179734', '36162427'

In [15]:
e = entrezpy.esearch.esearcher.Esearcher('entrezpy', 'ifeldhaus@g.harvard.edu')
results = []
for journal in journals:
    a = e.inquire({'db':'pubmed','term': journal+'[journal] AND 2010:2022[pdat]', 'rettype': 'uilist'})
    results.append([journal, a.result.count, a.result.uids])
#     print(a.result.count, a.result.uids)

RuntimeError: threads can only be started once

In [7]:
journals[0]

'Lancet Glob Health'

In [14]:
for i in journals:
    tt = i + 'journal'
    print(i)

Lancet Glob Health
Health Policy Plan


**Continued 19 July 2023**

The below functions were generated by Bard using the following prompt: "I would like to create a dataset of the metadata of articles in a specific list of academic journals between 2010 and 2023, especially the article keywords. Write python code to extract metadata from a list of journals using the entrezpy python library."

Note: These don't work, but may provide a nice structure. 

In [None]:
import entrezpy

def get_metadata(journal_name):
  """Retrieves the metadata for all articles in the specified journal.

  Args:
    journal_name: The name of the journal to retrieve metadata for.

  Returns:
    A list of dictionaries, each containing the metadata for a single article.
  """

  entrez_client = entrezpy.Entrez()
  search_results = entrez_client.search(db="pubmed", term="journal:{}".format(journal_name))
  metadata_list = []
  for article in search_results["PubmedArticle"]:
    metadata = {
        "article_id": article["PMID"],
        "title": article["Title"],
        "authors": article["AuthorList"],
        "journal": article["Journal"],
        "publication_date": article["PubDate"],
    }
    metadata_list.append(metadata)
  return metadata_list

if __name__ == "__main__":
  metadata_list = get_metadata("Lancet Glob Health")
  print(metadata_list)

In [None]:
import entrezpy

def get_metadata(journal_list, start_year, end_year):
  """Retrieves the metadata for all articles in the specified journals between the specified years.

  Args:
    journal_list: A list of journal names.
    start_year: The start year of the publication range.
    end_year: The end year of the publication range.

  Returns:
    A list of dictionaries, each containing the metadata for a single article.
  """

  entrez_client = entrezpy.Entrez()
  search_results = entrez_client.search(db="pubmed", term="journal:{}".format(",".join(journal_list)))
  metadata_list = []
  for article in search_results["PubmedArticle"]:
    if article["PubDate"][0:4] >= start_year and article["PubDate"][0:4] <= end_year:
      metadata = {
          "article_id": article["PMID"],
          "title": article["Title"],
          "authors": article["AuthorList"],
          "journal": article["Journal"],
          "publication_date": article["PubDate"],
          "keywords": article["Keywords"],
      }
      metadata_list.append(metadata)
  return metadata_list

if __name__ == "__main__":
  journal_list = ["Lancet Glob Health", "Nature Medicine", "Science"]
  start_year = 2010
  end_year = 2023
  metadata_list = get_metadata(journal_list, start_year, end_year)
  print(metadata_list)

In [None]:
import entrezpy

def get_metadata(journal_list, start_year, end_year):
  """Retrieves the metadata for all articles in the specified journals.

  Args:
    journal_list: A list of journal names to retrieve metadata for.
    start_year: The start year of the articles to retrieve metadata for.
    end_year: The end year of the articles to retrieve metadata for.

  Returns:
    A list of dictionaries, each containing the metadata for a single article.
  """

  entrez_client = entrezpy.Entrez()
  metadata_list = []
  for journal in journal_list:
    search_results = entrez_client.search(db="pubmed", term="journal:{}".format(journal),
                                            mindate=start_year,
                                            maxdate=end_year)
    for article in search_results["PubmedArticle"]:
      metadata = {
          "article_id": article["PMID"],
          "title": article["Title"],
          "authors": article["AuthorList"],
          "journal": article["Journal"],
          "publication_date": article["PubDate"],
          "keywords": article["KeywordList"],
      }
      metadata_list.append(metadata)
  return metadata_list

if __name__ == "__main__":
  journal_list = ["Lancet Glob Health", "Nature Medicine", "Science"]
  start_year = 2010
  end_year = 2023
  metadata_list = get_metadata(journal_list, start_year, end_year)
  print(metadata_list)

Back to normal programming: 

Idea above being to get all of the relevant UIDs and then get the metadata for those UIDs. Ideally, we could just pull the relevant metadata directly based on search parameters (i.e., journal name, date range).

Following the tutorial and building a Conduit pipeline / replicating pubmed_fetcher.py:

In [17]:
# Import libraries
import os
import sys
import json
import xml.etree.ElementTree
import entrezpy
import entrezpy.conduit
import entrezpy.base.result
import entrezpy.base.analyzer

In [36]:
class PubmedRecord:
    """Simple data class to store individual Pubmed records. Individual authors will 
    be stored as dict('lname':last_name, 'fname': first_name) in authors. Citations 
    as string elements in the list citations. """

    def __init__(self):
        self.pmid = None
        self.title = None
        self.abstract = None
        self.keywords = None
        self.authors = []
        self.journal = None
        self.references = []
        
class PubmedResult(entrezpy.base.result.EutilsResult):
    """Derive class entrezpy.base.result.EutilsResult to store Pubmed queries.
    Individual Pubmed records are implemented in :class:`PubmedRecord` and stored in 
    :ivar:`pubmed_records`.

    :param response: inspected response from :class:`PubmedAnalyzer`
    :param request: the request for the current response
    :ivar dict pubmed_records: storing PubmedRecord instances"""

    def __init__(self, response, request):
        super().__init__(request.eutil, request.query_id, request.db)
        self.pubmed_records = {}

    def size(self):
        """Implement virtual method :meth:`entrezpy.base.result.EutilsResult.size`
        returning the number of stored data records."""
        return len(self.pubmed_records)

    def isEmpty(self):
        """Implement virtual method :meth:`entrezpy.base.result.EutilsResult.isEmpty`
        to query if any records have been stored at all."""
        if not self.pubmed_records:
            return True
        return False

    def get_link_parameter(self, reqnum=0):
        """Implement virtual method 
        :meth:`entrezpy.base.result.EutilsResult.get_link_parameter`.
        Fetching a pubmed record has no intrinsic elink capabilities and therefore
        should inform users about this."""
        print("{} has no elink capability".format(self))
        return {}

    def dump(self):
        """Implement virtual method :meth:`entrezpy.base.result.EutilsResult.dump`.

        :return: instance attributes
        :rtype: dict
        """
        return {self:{'dump':{'pubmed_records':[x for x in self.pubmed_records],
                              'query_id': self.query_id, 'db':self.db,
                              'eutil':self.function}}}

    def add_pubmed_record(self, pubmed_record):
        """The only non-virtual and therefore PubmedResult-specific method to handle
        adding new data records"""
        self.pubmed_records[pubmed_record.pmid] = pubmed_record

class PubmedAnalyzer(entrezpy.base.analyzer.EutilsAnalyzer):
    """Derived class of :class:`entrezpy.base.analyzer.EutilsAnalyzer` to analyze and 
    parse PubMed responses and requests."""

    def __init__(self):
        super().__init__()

    def init_result(self, response, request):
        """Implemented virtual method :meth:`entrezpy.base.analyzer.init_result`.
        This method initiate a result instance when analyzing the first response"""
        if self.result is None:
          self.result = PubmedResult(response, request)

    def analyze_error(self, response, request):
        """Implement virtual method :meth:`entrezpy.base.analyzer.analyze_error`. 
        Since we expect XML errors, just print the error to STDOUT for 
        logging/debugging."""
        print(json.dumps({__name__:{'Response': {'dump' : request.dump(),
                                                 'error' : response.getvalue()}}}))

    def analyze_result(self, response, request):
        """Implement virtual method :meth:`entrezpy.base.analyzer.analyze_result`.
        Parse PubMed  XML line by line to extract authors and citations.
        xml.etree.ElementTree.iterparse
        (https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.iterparse)
        reads the XML file incrementally. Each  <PubmedArticle> is cleared after processing.

        ..note::  Adjust this method to include more/different tags to extract.
                  Remember to adjust :class:`.PubmedRecord` as well."""
        self.init_result(response, request)
        isAuthorList = False
        isAuthor = False
        isKeywordsList = False
        isKeywords = False
        isRefList = False
        isRef = False
        isArticle = False
        isJournal = False
        medrec = None
        for event, elem in xml.etree.ElementTree.iterparse(response, events=["start", "end"]):
            if event == 'start':
                if elem.tag == 'PubmedArticle':
                    medrec = PubmedRecord()
                if elem.tag == 'AuthorList':
                    isAuthorList = True
                if isAuthorList and elem.tag == 'Author':
                    isAuthor = True
                    medrec.authors.append({'fname': None, 'lname': None})
                if elem.tag == 'ReferenceList':
                    isRefList = True
                if isRefList and elem.tag == 'Reference':
                    isRef = True
                if elem.tag == 'KeywordsList':
                    isKeywordsList = True
                if isKeywordsList and elem.tag == 'Keywords':
                    isKeywords = True
                if elem.tag == 'Article':
                    isArticle = True
                if elem.tag == 'Journal':
                    isJournal = True
            else: 
                if elem.tag == 'PubmedArticle':
                    self.result.add_pubmed_record(medrec)
                    elem.clear()
                if elem.tag == 'AuthorList':
                    isAuthorList = False
                if isAuthorList and elem.tag == 'Author':
                    isAuthor = False
                if elem.tag == 'ReferenceList':
                    isRefList = False
                if elem.tag == 'Reference':
                    isRef = False
                if elem.tag == 'Article':
                    isArticle = False
                if elem.tag == 'Journal':
                    isJournal = False
                if isJournal and elem.tag == 'ISOAbbreviation':
                    medrec.journal = elem.text.strip()
                if elem.tag == 'PMID':
                    medrec.pmid = elem.text.strip()
                if isAuthor and elem.tag == 'LastName':
                    medrec.authors[-1]['lname'] = elem.text.strip()
                if isAuthor and elem.tag == 'ForeName':
                    medrec.authors[-1]['fname'] = elem.text.strip()
                if isRef and elem.tag == 'Citation':
                    medrec.references.append(elem.text.strip())
                if isArticle and elem.tag == 'AbstractText':
                    if not medrec.abstract:
                        medrec.abstract = elem.text.strip()
                    else:
                        medrec.abstract += elem.text.strip()
                if isArticle and elem.tag == 'ArticleTitle':
                    medrec.title = elem.text.strip()

def main():
    c = entrezpy.conduit.Conduit(sys.argv[1])
    fetch_pubmed = c.new_pipeline()
    fetch_pubmed.add_fetch({'db':'pubmed', 'id':[sys.argv[2].split(',')],
                          'retmode':'xml'}, analyzer=PubmedAnalyzer())
    
    a = c.run(fetch_pubmed)

    #print(a)
    # Testing PubmedResult
    #print("DUMP: {}".format(a.get_result().dump()))
    #print("SIZE: {}".format(a.get_result().size()))
    #print("LINK: {}".format(a.get_result().get_link_parameter()))

    res = a.get_result()
    print("PMID","Title","Abstract","Authors", "Journal", "RefCount", "References", sep='=')
    for i in res.pubmed_records:
        print("{}={}={}={}={}={}={}".format(res.pubmed_records[i].pmid, 
                                            res.pubmed_records[i].title, 
                                            res.pubmed_records[i].abstract,
                                            ';'.join(str(x['lname']+","+x['fname'].replace(' ', '')) for x in res.pubmed_records[i].authors),
                                            res.pubmed_records[i].journal,
                                            len(res.pubmed_records[i].references),
                                            ';'.join(x for x in res.pubmed_records[i].references)))
        return 0

if __name__ == '__main__':
    main()

PMID=Title=Abstract=Authors=Journal=RefCount=References
7=Maturation of the adrenal medulla--IV. Effects of morphine.=None=Anderson,TR;Slotkin,TA=Biochem Pharmacol=80=Br J Gen Pract. 1999 Oct;49(447):823-8;J Gastroenterol Hepatol. 2000 Oct;15(10):1093-9;Ann Intern Med. 2001 Jul 3;135(1):68-9;J Viral Hepat. 2001 Sep;8(5):358-66;BJOG. 2002 Mar;109(3):227-35;Drug Saf. 2002;25(5):323-44;Am J Med. 2002 Oct 15;113(6):506-15;J Altern Complement Med. 2003 Feb;9(1):161-8;Psychosomatics. 2003 Jul-Aug;44(4):271-82;Aliment Pharmacol Ther. 2003 Sep 1;18(5):451-71;Dig Dis Sci. 2003 Oct;48(10):1925-8;Cochrane Database Syst Rev. 2004;(1):CD002286;Mycoses. 2004 Apr;47(3-4):87-92;Planta Med. 2004 Apr;70(4):293-8;J Herb Pharmacother. 2004;4(1):49-67;J Herb Pharmacother. 2003;3(2):69-90;J Herb Pharmacother. 2003;3(1):121-33;J Herb Pharmacother. 2002;2(3):49-72;J Herb Pharmacother. 2002;2(1):71-85;J Herb Pharmacother. 2004;4(2):63-78;Expert Opin Pharmacother. 2004 Dec;5(12):2485-501;Drug Saf. 2005;28(4):31

Quickly trying out ChatGPT-3:

In [38]:
from Bio import Entrez

def fetch_publication_info(email, journal_list):
    """
    Fetch publication information from specified academic journals on PubMed.

    Args:
        email (str): Your email address for accessing PubMed.
        journal_list (list): List of academic journals to search.

    Returns:
        dict: A dictionary containing the fetched publication information.
            Keys are PubMed IDs, and values are dictionaries with the following keys:
            - 'title': Title of the article.
            - 'abstract': Abstract of the article.
            - 'authors': List of authors.
            - 'keywords': List of keywords.
            - 'journal': Journal name.
            - 'references': List of references.
    """

    # Set your email address (required by PubMed)
    Entrez.email = email

    # Initialize a dictionary to store publication information
    publication_info = {}

    for journal in journal_list:
        # Build the query to search for articles from the specified journal
        query = f'"{journal}"[Journal]'

        # Search for articles using the query
        handle = Entrez.esearch(db='pubmed', term=query)
        record = Entrez.read(handle)
        handle.close()

        if 'IdList' not in record or not record['IdList']:
            print(f"No articles found for journal: {journal}")
            continue

        # Fetch detailed information for each article
        for pmid in record['IdList']:
            handle = Entrez.efetch(db='pubmed', id=pmid, retmode='xml')
            article = Entrez.read(handle)[0]
            handle.close()

            # Extract relevant information from the article record
            pubmed_id = article['MedlineCitation']['PMID']
            title = article['MedlineCitation']['Article']['ArticleTitle']
            abstract = article['MedlineCitation']['Article'].get('Abstract', {}).get('AbstractText', '')
            authors = [author['LastName'] + ' ' + author['Initials'] for author in article['MedlineCitation']['Article']['AuthorList']]
            keywords = article['MedlineCitation']['KeywordList']
            journal_name = article['MedlineCitation']['Article']['Journal']['Title']
            references = article['PubmedData'].get('ReferenceList', [])

            # Add the information to the publication_info dictionary
            publication_info[pubmed_id] = {
                'title': title,
                'abstract': abstract,
                'authors': authors,
                'keywords': keywords,
                'journal': journal_name,
                'references': references
            }

    return publication_info

# # Example usage:
# email_address = 'your_email@example.com'
# journals = ['Nature', 'Science', 'Cell']
# pub_info = fetch_publication_info(email_address, journals)

# # Print the fetched information for each article
# for pmid, info in pub_info.items():
#     print(f"PubMed ID: {pmid}")
#     print(f"Title: {info['title']}")
#     print(f"Abstract: {info['abstract']}")
#     print(f"Authors: {', '.join(info['authors'])}")
#     print(f"Keywords: {', '.join(info['keywords'])}")
#     print(f"Journal: {info['journal']}")
#     print(f"References: {', '.join(info['references'])}")
#     print("\n")

In [44]:
# Adding a line to handle when function can't find any articles...
from Bio import Entrez

def fetch_publication_info(email, journal_list):
    """
    Fetch publication information from specified academic journals on PubMed.

    Args:
        email (str): Your email address for accessing PubMed.
        journal_list (list): List of academic journals to search.

    Returns:
        dict: A dictionary containing the fetched publication information.
            Keys are PubMed IDs, and values are dictionaries with the following keys:
            - 'title': Title of the article.
            - 'abstract': Abstract of the article.
            - 'authors': List of authors.
            - 'keywords': List of keywords.
            - 'journal': Journal name.
            - 'references': List of references.
    """

    # Set your email address (required by PubMed)
    Entrez.email = email

    # Initialize a dictionary to store publication information
    publication_info = {}

    for journal in journal_list:
        # Build the query to search for articles from the specified journal
        query = f'"{journal}"[Journal]'

        # Search for articles using the query
        handle = Entrez.esearch(db='pubmed', term=query)
        record = Entrez.read(handle)
        handle.close()

        if 'IdList' not in record or not record['IdList']:
            print(f"No articles found for journal: {journal}")
            continue

        # Fetch detailed information for each article
        for pmid in record['IdList']:
            handle = Entrez.efetch(db='pubmed', id=pmid, retmode='xml')
            article = Entrez.read(handle)[0]
            handle.close()

            # Extract relevant information from the article record
            pubmed_id = article['MedlineCitation']['PMID']
            title = article['MedlineCitation']['Article']['ArticleTitle']
            abstract = article['MedlineCitation']['Article'].get('Abstract', {}).get('AbstractText', '')
            authors = [author['LastName'] + ' ' + author['Initials'] for author in article['MedlineCitation']['Article']['AuthorList']]
            keywords = article['MedlineCitation']['KeywordList']
            journal_name = article['MedlineCitation']['Article']['Journal']['Title']
            references = article['PubmedData'].get('ReferenceList', [])

            # Add the information to the publication_info dictionary
            publication_info[pubmed_id] = {
                'title': title,
                'abstract': abstract,
                'authors': authors,
                'keywords': keywords,
                'journal': journal_name,
                'references': references
            }

    return publication_info

# # Example usage:
# email_address = 'your_email@example.com'
# journals = ['Lancet Glob Health', 'Health Policy Plan']
# pub_info = fetch_publication_info(email_address, journals)

# # Print the fetched information for each article
# for pmid, info in pub_info.items():
#     print(f"PubMed ID: {pmid}")
#     print(f"Title: {info['title']}")
#     print(f"Abstract: {info['abstract']}")
#     print(f"Authors: {', '.join(info['authors'])}")
#     print(f"Keywords: {', '.join(info['keywords'])}")
#     print(f"Journal: {info['journal']}")
#     print(f"References: {', '.join(info['references'])}")
#     print("\n")

In [45]:
fetch_publication_info('ifeld03@gmail.com', ['Lancet Glob Health', 'Health Policy Plan'])

KeyError: 0

In [47]:
from Bio import Entrez

def fetch_publication_info(email, journal_list):
    """
    Fetch publication information from specified academic journals on PubMed.

    Args:
        email (str): Your email address for accessing PubMed.
        journal_list (list): List of academic journals to search.

    Returns:
        dict: A dictionary containing the fetched publication information.
            Keys are PubMed IDs, and values are dictionaries with the following keys:
            - 'title': Title of the article.
            - 'abstract': Abstract of the article.
            - 'authors': List of authors.
            - 'keywords': List of keywords.
            - 'journal': Journal name.
            - 'references': List of references.
    """

    # Set your email address (required by PubMed)
    Entrez.email = email

    # Initialize a dictionary to store publication information
    publication_info = {}

    for journal in journal_list:
        # Build the query to search for articles from the specified journal
        query = f'"{journal}"[Journal]'

        # Search for articles using the query and fetch IDs
        handle = Entrez.esearch(db='pubmed', term=query)
        record = Entrez.read(handle)
        handle.close()

        if 'IdList' not in record or not record['IdList']:
            print(f"No articles found for journal: {journal}")
            continue

        # Fetch detailed information for each article
        for pmid in record['IdList']:
            handle = Entrez.efetch(db='pubmed', id=pmid, rettype='medline', retmode='text')
            article_text = handle.read()
            handle.close()

            # Parse the article text to extract relevant information
            article_info = {}
            for line in article_text.splitlines():
                line = line.strip()
                if line.startswith('PMID-'):
                    article_info['pubmed_id'] = line.replace('PMID-', '')
                elif line.startswith('TI  - '):
                    article_info['title'] = line.replace('TI  - ', '')
                elif line.startswith('AB  - '):
                    article_info['abstract'] = line.replace('AB  - ', '')
                elif line.startswith('AU  - '):
                    article_info.setdefault('authors', []).append(line.replace('AU  - ', ''))
                elif line.startswith('KW  - '):
                    article_info.setdefault('keywords', []).append(line.replace('KW  - ', ''))
                elif line.startswith('JT  - '):
                    article_info['journal'] = line.replace('JT  - ', '')
                elif line.startswith('RN  - '):
                    article_info.setdefault('references', []).append(line.replace('RN  - ', ''))

            # Add the information to the publication_info dictionary
            publication_info[pmid] = article_info

    return publication_info

# Example usage:
email_address = 'ifeld03@gmail.com'
journals = ['Lancet Glob Health', 'Health Policy Plan']
pub_info = fetch_publication_info(email_address, journals)

# Print the fetched information for each article
for pmid, info in pub_info.items():
    print(f"PubMed ID: {pmid}")
    print(f"Title: {info.get('title', '')}")
    print(f"Abstract: {info.get('abstract', '')}")
    print(f"Authors: {', '.join(info.get('authors', []))}")
    print(f"Keywords: {', '.join(info.get('keywords', []))}")
    print(f"Journal: {info.get('journal', '')}")
    print(f"References: {', '.join(info.get('references', []))}")
    print("\n")

PubMed ID: 37474237
Title: Is this pill an antibiotic or a painkiller? Improving the identification of oral
Abstract: In this Viewpoint, we discuss how the identification of oral antibiotics and
Authors: Monnier AA, Do NTT, Asante KP, Afari-Asiedu S, Khan WA, Munguambe K, Sevene E, Tran TK, Nguyen CTK, Punpuing S, Gomez-Olive FX, van Doorn HR, Caillet C, Newton PN, Ariana P, Wertheim HFL
Keywords: 
Journal: The Lancet. Global health
References: 


PubMed ID: 37474236
Title: The values and risks of an Intergovernmental Panel for One Health to strengthen
Abstract: The COVID-19 pandemic has shown the need for better global governance of pandemic
Authors: Hobeika A, Stauffer MHT, Dub T, van Bortel W, Beniston M, Bukachi S, Burci GL, Crump L, Markotter W, Sepe LP, Placella E, Roche B, Thiongane O, Wang Z, Guerin F, van Kleef E
Keywords: 
Journal: The Lancet. Global health
References: 


PubMed ID: 37474235
Title: Effects of an urban cable car intervention on physical activity: the TrUST
Abs

In [48]:
from Bio import Entrez

def fetch_publication_info(email, journal_list, start_date=None, end_date=None):
    """
    Fetch publication information from specified academic journals on PubMed within a date range.

    Args:
        email (str): Your email address for accessing PubMed.
        journal_list (list): List of academic journals to search.
        start_date (str): Start date in the format 'YYYY/MM/DD' to filter articles.
        end_date (str): End date in the format 'YYYY/MM/DD' to filter articles.

    Returns:
        dict: A dictionary containing the fetched publication information.
            Keys are PubMed IDs, and values are dictionaries with the following keys:
            - 'title': Title of the article.
            - 'abstract': Abstract of the article.
            - 'authors': List of authors.
            - 'keywords': List of keywords.
            - 'journal': Journal name.
            - 'publication_date': Publication date of the article.
            - 'mesh_terms': List of MeSH terms.
            - 'references': List of references.
    """

    # Set your email address (required by PubMed)
    Entrez.email = email

    # Initialize a dictionary to store publication information
    publication_info = {}

    for journal in journal_list:
        # Build the query to search for articles from the specified journal
        query = f'"{journal}"[Journal]'

        # If date range is specified, add it to the query
        if start_date and end_date:
            query += f' AND ("{start_date}"[Date - Publication] : "{end_date}"[Date - Publication])'

        # Search for articles using the query and fetch IDs
        handle = Entrez.esearch(db='pubmed', term=query)
        record = Entrez.read(handle)
        handle.close()

        if 'IdList' not in record or not record['IdList']:
            print(f"No articles found for journal: {journal}")
            continue

        # Fetch detailed information for each article
        for pmid in record['IdList']:
            handle = Entrez.efetch(db='pubmed', id=pmid, rettype='medline', retmode='text')
            article_text = handle.read()
            handle.close()

            # Parse the article text to extract relevant information
            article_info = {}
            for line in article_text.splitlines():
                line = line.strip()
                if line.startswith('PMID-'):
                    article_info['pubmed_id'] = line.replace('PMID-', '')
                elif line.startswith('TI  - '):
                    article_info['title'] = line.replace('TI  - ', '')
                elif line.startswith('AB  - '):
                    article_info['abstract'] = line.replace('AB  - ', '')
                elif line.startswith('AU  - '):
                    article_info.setdefault('authors', []).append(line.replace('AU  - ', ''))
                elif line.startswith('KW  - '):
                    article_info.setdefault('keywords', []).append(line.replace('KW  - ', ''))
                elif line.startswith('JT  - '):
                    article_info['journal'] = line.replace('JT  - ', '')
                elif line.startswith('DP  - '):
                    article_info['publication_date'] = line.replace('DP  - ', '')
                elif line.startswith('MH  - '):
                    article_info.setdefault('mesh_terms', []).append(line.replace('MH  - ', ''))
                elif line.startswith('RN  - '):
                    article_info.setdefault('references', []).append(line.replace('RN  - ', ''))

            # Add the information to the publication_info dictionary
            publication_info[pmid] = article_info

    return publication_info

# Example usage:
email_address = 'ifeld03@gmail.com'
journals = ['Lancet Glob Health', 'Health Policy Plan']
start_date = '2023/01/01'
end_date = '2023/07/21'
pub_info = fetch_publication_info(email_address, journals, start_date, end_date)

# Print the fetched information for each article
for pmid, info in pub_info.items():
    print(f"PubMed ID: {pmid}")
    print(f"Title: {info.get('title', '')}")
    print(f"Abstract: {info.get('abstract', '')}")
    print(f"Authors: {', '.join(info.get('authors', []))}")
    print(f"Keywords: {', '.join(info.get('keywords', []))}")
    print(f"Journal: {info.get('journal', '')}")
    print(f"Publication Date: {info.get('publication_date', '')}")
    print(f"MeSH Terms: {', '.join(info.get('mesh_terms', []))}")
    print(f"References: {', '.join(info.get('references', []))}")
    print("\n")

PubMed ID: 37429304
Title: Universal health coverage is a matter of equity, rights, and justice.
Abstract: 
Authors: Cuevas Barron G, Koonin J, Akselrod S, Fogstad H, Karema C, Ditiu L, Dain K, Joshi N
Keywords: 
Journal: The Lancet. Global health
Publication Date: 2023 Jul 7
MeSH Terms: 
References: 


PubMed ID: 37390834
Title: Implications for assessing the association between maternal anaemia and
Abstract: 
Authors: Khan MN
Keywords: 
Journal: The Lancet. Global health
Publication Date: 2023 Aug
MeSH Terms: 
References: 


PubMed ID: 37390833
Title: Maternal anaemia and the risk of postpartum haemorrhage: a cohort analysis of
Abstract: BACKGROUND: Worldwide, more than half a billion women of reproductive age are
Authors: 
Keywords: 
Journal: The Lancet. Global health
Publication Date: 2023 Aug
MeSH Terms: 
References: 


PubMed ID: 37349046
Title: Bending the HIV epidemic curve: can prevention cascades show us how?
Abstract: 
Authors: Ferrand RA, Kranzer K
Keywords: 
Journal: The L

In [51]:
len(pub_info)

40

In [52]:
email_address = 'ifeld03@gmail.com'
journals = ['Lancet Glob Health', 'Health Policy Plan']
start_date = '2022/01/01'
end_date = '2023/07/21'
pub_info = fetch_publication_info(email_address, journals, start_date, end_date)

In [53]:
len(pub_info)

40

This function results in 40 search results as a maximum. The function below attempts to extend the `retmax` parameter of `Entrez.esearch()`.

In [56]:
from Bio import Entrez

def fetch_publication_info(email, journal_list, start_date=None, end_date=None, max_results=None):
    """
    Fetch publication information from specified academic journals on PubMed within a date range.

    Args:
        email (str): Your email address for accessing PubMed.
        journal_list (list): List of academic journals to search.
        start_date (str): Start date in the format 'YYYY/MM/DD' to filter articles.
        end_date (str): End date in the format 'YYYY/MM/DD' to filter articles.
        max_results (int): Maximum number of search results to fetch.

    Returns:
        dict: A dictionary containing the fetched publication information.
            Keys are PubMed IDs, and values are dictionaries with the following keys:
            - 'title': Title of the article.
            - 'abstract': Abstract of the article.
            - 'authors': List of authors.
            - 'keywords': List of keywords.
            - 'journal': Journal name.
            - 'publication_date': Publication date of the article.
            - 'mesh_terms': List of MeSH terms.
            - 'references': List of references.
    """

    # Set your email address (required by PubMed)
    Entrez.email = email

    # Initialize a dictionary to store publication information
    publication_info = {}

    for journal in journal_list:
        # Build the query to search for articles from the specified journal
        query = f'"{journal}"[Journal]'

        # If date range is specified, add it to the query
        if start_date and end_date:
            query += f' AND ("{start_date}"[Date - Publication] : "{end_date}"[Date - Publication])'

        # Search for articles using the query and fetch IDs
        handle = Entrez.esearch(db='pubmed', term=query, retmax=max_results)
        record = Entrez.read(handle)
        handle.close()

        if 'IdList' not in record or not record['IdList']:
            print(f"No articles found for journal: {journal}")
            continue

        # Fetch detailed information for each article
        for pmid in record['IdList']:
            handle = Entrez.efetch(db='pubmed', id=pmid, rettype='medline', retmode='text')
            article_text = handle.read()
            handle.close()

            # Parse the article text to extract relevant information
            article_info = {}
            for line in article_text.splitlines():
                line = line.strip()
                if line.startswith('PMID-'):
                    article_info['pubmed_id'] = line.replace('PMID-', '')
                elif line.startswith('TI  - '):
                    article_info['title'] = line.replace('TI  - ', '')
                elif line.startswith('AB  - '):
                    article_info['abstract'] = line.replace('AB  - ', '')
                elif line.startswith('AU  - '):
                    article_info.setdefault('authors', []).append(line.replace('AU  - ', ''))
                elif line.startswith('KW  - '):
                    article_info.setdefault('keywords', []).append(line.replace('KW  - ', ''))
                elif line.startswith('JT  - '):
                    article_info['journal'] = line.replace('JT  - ', '')
                elif line.startswith('DP  - '):
                    article_info['publication_date'] = line.replace('DP  - ', '')
                elif line.startswith('MH  - '):
                    article_info.setdefault('mesh_terms', []).append(line.replace('MH  - ', ''))
                elif line.startswith('RN  - '):
                    article_info.setdefault('references', []).append(line.replace('RN  - ', ''))

            # Add the information to the publication_info dictionary
            publication_info[pmid] = article_info

    return publication_info

# Example usage:
email_address = 'ifeld03@gmail.com'
journals = ['Lancet Glob Health']
start_date = '2023/01/01'
end_date = '2023/07/21'
max_results = 300 ## max results appears to be 250 articles at a time
pub_info = fetch_publication_info(email_address, journals, start_date, end_date, max_results)

# Print the fetched information for each article
for pmid, info in pub_info.items():
    print(f"PubMed ID: {pmid}")
    print(f"Title: {info.get('title', '')}")
    print(f"Abstract: {info.get('abstract', '')}")
    print(f"Authors: {', '.join(info.get('authors', []))}")
    print(f"Keywords: {', '.join(info.get('keywords', []))}")
    print(f"Journal: {info.get('journal', '')}")
    print(f"Publication Date: {info.get('publication_date', '')}")
    print(f"MeSH Terms: {', '.join(info.get('mesh_terms', []))}")
    print(f"References: {', '.join(info.get('references', []))}")
    print("\n")

PubMed ID: 37429304
Title: Universal health coverage is a matter of equity, rights, and justice.
Abstract: 
Authors: Cuevas Barron G, Koonin J, Akselrod S, Fogstad H, Karema C, Ditiu L, Dain K, Joshi N
Keywords: 
Journal: The Lancet. Global health
Publication Date: 2023 Jul 7
MeSH Terms: 
References: 


PubMed ID: 37390834
Title: Implications for assessing the association between maternal anaemia and
Abstract: 
Authors: Khan MN
Keywords: 
Journal: The Lancet. Global health
Publication Date: 2023 Aug
MeSH Terms: 
References: 


PubMed ID: 37390833
Title: Maternal anaemia and the risk of postpartum haemorrhage: a cohort analysis of
Abstract: BACKGROUND: Worldwide, more than half a billion women of reproductive age are
Authors: 
Keywords: 
Journal: The Lancet. Global health
Publication Date: 2023 Aug
MeSH Terms: 
References: 


PubMed ID: 37349046
Title: Bending the HIV epidemic curve: can prevention cascades show us how?
Abstract: 
Authors: Ferrand RA, Kranzer K
Keywords: 
Journal: The L

Keywords: 
Journal: The Lancet. Global health
Publication Date: 2023 Feb
MeSH Terms: Humans, *COVID-19 Vaccines, Vaccination Hesitancy, *COVID-19/epidemiology/prevention & control, Vaccination Refusal, Self Report, Vaccination
References: 0 (COVID-19 Vaccines)


PubMed ID: 36563699
Title: Cost-effectiveness of voluntary medical male circumcision for HIV prevention
Abstract: BACKGROUND: Voluntary medical male circumcision (VMMC) has been a recommended HIV
Authors: Bansi-Matharu L, Mudimu E, Martin-Hughes R, Hamilton M, Johnson L, Ten Brink D, Stover J, Meyer-Rath G, Kelly SL, Jamieson L, Cambiano V, Jahn A, Cowan FM, Mangenah C, Mavhu W, Chidarikire T, Toledo C, Revill P, Sundaram M, Hatzold K, Yansaneh A, Apollo T, Kalua T, Mugurungi O, Kiggundu V, Zhang S, Nyirenda R, Phillips A, Kripke K, Bershteyn A
Keywords: 
Journal: The Lancet. Global health
Publication Date: 2023 Feb
MeSH Terms: Humans, Male, Cost-Benefit Analysis, *HIV Infections/epidemiology/prevention & control, *Circumcision

In [57]:
len(pub_info)

250

In [58]:
email_address = 'ifeld03@gmail.com'
journals = ['Health Policy Plan']
start_date = '2023/01/01'
end_date = '2023/07/21'
max_results = 300
pub_info = fetch_publication_info(email_address, journals, start_date, end_date, max_results)

In [59]:
len(pub_info)

83

Information to add: Affiliation (AD), Language (LA), Place of Publication (PL), Publication History Status (PHST), Corporate Author (CN).

Retrieve search results in batches to overcome the limitation of 250 results per query:

In [65]:
from Bio import Entrez

def fetch_publication_info(email, journal_list, start_date=None, end_date=None, max_results=None):
    """
    Fetch publication information from specified academic journals on PubMed within a date range.

    Args:
        email (str): Your email address for accessing PubMed.
        journal_list (list): List of academic journals to search.
        start_date (str): Start date in the format 'YYYY/MM/DD' to filter articles.
        end_date (str): End date in the format 'YYYY/MM/DD' to filter articles.
        max_results (int): Maximum number of search results to fetch.

    Returns:
        dict: A dictionary containing the fetched publication information.
            Keys are PubMed IDs, and values are dictionaries with the following keys:
            - 'title': Title of the article.
            - 'abstract': Abstract of the article.
            - 'authors': List of authors.
            - 'keywords': List of keywords.
            - 'journal': Journal name.
            - 'publication_date': Publication date of the article.
            - 'mesh_terms': List of MeSH terms.
            - 'references': List of references.
    """

    # Set your email address (required by PubMed)
    Entrez.email = email

    # Initialize a dictionary to store publication information
    publication_info = {}

    for journal in journal_list:
        # Build the query to search for articles from the specified journal
        query = f'"{journal}"[Journal]'

        # If date range is specified, add it to the query
        if start_date and end_date:
            query += f' AND ("{start_date}"[Date - Publication] : "{end_date}"[Date - Publication])'

        # Fetch IDs of all articles matching the query (up to the max_results)
        handle = Entrez.esearch(db='pubmed', term=query, retmax=max_results)
        record = Entrez.read(handle)
        handle.close()

        if 'IdList' not in record or not record['IdList']:
            print(f"No articles found for journal: {journal}")
            continue

        # Fetch detailed information for each article using Entrez.efetch() with retstart and retmax
        id_list = record['IdList']
        batch_size = 200  # Fetch up to 200 records per batch

        for start in range(0, len(id_list), batch_size):
            batch_ids = id_list[start : start + batch_size]
            id_string = ','.join(batch_ids)

            handle = Entrez.efetch(db='pubmed', id=id_string, rettype='medline', retmode='text',
                                   retstart=start, retmax=batch_size)
            article_text = handle.read()
            handle.close()

            # Parse the article text to extract relevant information
            for article_text in article_text.split('\n\n'):
                article_info = {}
                for line in article_text.splitlines():
                    line = line.strip()
                    if line.startswith('PMID-'):
                        article_info['pubmed_id'] = line.replace('PMID-', '')
                    elif line.startswith('TI  - '):
                        article_info['title'] = line.replace('TI  - ', '')
                    elif line.startswith('AB  - '):
                        article_info['abstract'] = line.replace('AB  - ', '')
                    elif line.startswith('AU  - '):
                        article_info.setdefault('authors', []).append(line.replace('AU  - ', ''))
                    elif line.startswith('KW  - '):
                        article_info.setdefault('keywords', []).append(line.replace('KW  - ', ''))
                    elif line.startswith('JT  - '):
                        article_info['journal'] = line.replace('JT  - ', '')
                    elif line.startswith('DP  - '):
                        article_info['publication_date'] = line.replace('DP  - ', '')
                    elif line.startswith('MH  - '):
                        article_info.setdefault('mesh_terms', []).append(line.replace('MH  - ', ''))
                    elif line.startswith('RN  - '):
                        article_info.setdefault('references', []).append(line.replace('RN  - ', ''))

                # Add the information to the publication_info dictionary
                publication_info[article_info['pubmed_id']] = article_info

    return publication_info

# Example usage:
email_address = 'ifeld03@gmail.com'
journals = ['Lancet Glob Health']
start_date = '2023/01/01'
end_date = '2023/07/21'
max_results = 1000  # Fetch up to 1000 search results per journal

pub_info = fetch_publication_info(email_address, journals, start_date, end_date, max_results)

# # Print the fetched information for each article
# for pmid, info in pub_info.items():
#     print(f"PubMed ID: {pmid}")
#     print(f"Title: {info.get('title', '')}")
#     print(f"Abstract: {info.get('abstract', '')}")
#     print(f"Authors: {', '.join(info.get('authors', []))}")
#     print(f"Keywords: {', '.join(info.get('keywords', []))}")
#     print(f"Journal: {info.get('journal', '')}")
#     print(f"Publication Date: {info.get('publication_date', '')}")
#     print(f"MeSH Terms: {', '.join(info.get('mesh_terms', []))}")
#     print(f"References: {', '.join(info.get('references', []))}")
#     print("\n")

HTTPError: HTTP Error 400: Bad Request

In [69]:
from Bio import Entrez

def fetch_publication_info(email, journal_list, start_date=None, end_date=None, max_results=None):
    """
    Fetch publication information from specified academic journals on PubMed within a date range.

    Args:
        email (str): Your email address for accessing PubMed.
        journal_list (list): List of academic journals to search.
        start_date (str): Start date in the format 'YYYY/MM/DD' to filter articles.
        end_date (str): End date in the format 'YYYY/MM/DD' to filter articles.
        max_results (int): Maximum number of search results to fetch.

    Returns:
        dict: A dictionary containing the fetched publication information.
            Keys are PubMed IDs, and values are dictionaries with the following keys:
            - 'title': Title of the article.
            - 'abstract': Abstract of the article.
            - 'authors': List of authors.
            - 'keywords': List of keywords.
            - 'journal': Journal name.
            - 'publication_date': Publication date of the article.
            - 'mesh_terms': List of MeSH terms.
            - 'references': List of references.
    """

    # Set your email address (required by PubMed)
    Entrez.email = email

    # Initialize a dictionary to store publication information
    publication_info = {}

    for journal in journal_list:
        # Build the query to search for articles from the specified journal
        query = f'"{journal}"[Journal]'

        # If date range is specified, add it to the query
        if start_date and end_date:
            query += f' AND ("{start_date}"[Date - Publication] : "{end_date}"[Date - Publication])'

        # Fetch IDs of all articles matching the query (up to the max_results)
        handle = Entrez.esearch(db='pubmed', term=query, retmax=max_results, usehistory='y')
        record = Entrez.read(handle)
        handle.close()

        if 'IdList' not in record or not record['IdList']:
            print(f"No articles found for journal: {journal}")
            continue

        # Fetch detailed information for each article using WebEnv and query_key
        webenv = record['WebEnv']
        query_key = record['QueryKey']
        retmax = 20  # Fetch up to 500 records per batch

        for start in range(0, max_results, retmax):
            batch_start = start + 1  # PubMed uses 1-based indexing
            batch_end = min(start + retmax, max_results)
            handle = Entrez.efetch(db='pubmed', rettype='medline', retmode='text',
                                   retstart=start, retmax=retmax, webenv=webenv, query_key=query_key)
            article_text = handle.read()
            handle.close()

            # Parse the article text to extract relevant information
            for article_text in article_text.split('\n\n'):
                article_info = {}
                for line in article_text.splitlines():
                    line = line.strip()
                    if line.startswith('PMID-'):
                        article_info['pubmed_id'] = line.replace('PMID-', '')
                    elif line.startswith('TI  - '):
                        article_info['title'] = line.replace('TI  - ', '')
                    elif line.startswith('AB  - '):
                        article_info['abstract'] = line.replace('AB  - ', '')
                    elif line.startswith('AU  - '):
                        article_info.setdefault('authors', []).append(line.replace('AU  - ', ''))
                    elif line.startswith('KW  - '):
                        article_info.setdefault('keywords', []).append(line.replace('KW  - ', ''))
                    elif line.startswith('JT  - '):
                        article_info['journal'] = line.replace('JT  - ', '')
                    elif line.startswith('DP  - '):
                        article_info['publication_date'] = line.replace('DP  - ', '')
                    elif line.startswith('MH  - '):
                        article_info.setdefault('mesh_terms', []).append(line.replace('MH  - ', ''))
                    elif line.startswith('RN  - '):
                        article_info.setdefault('references', []).append(line.replace('RN  - ', ''))

                # Add the information to the publication_info dictionary
                publication_info[article_info['pubmed_id']] = article_info

    return publication_info

# Example usage:
email_address = 'ifeld03@gmail.com'
journals = ['Lancet Glob Health']
start_date = '2023/01/01'
end_date = '2023/07/21'
max_results = 1000  # Fetch up to 1000 search results per journal

pub_info = fetch_publication_info(email_address, journals, start_date, end_date, max_results)

# # Print the fetched information for each article
# for pmid, info in pub_info.items():
#     print(f"PubMed ID: {pmid}")
#     print(f"Title: {info.get('title', '')}")
#     print(f"Abstract: {info.get('abstract', '')}")
#     print(f"Authors: {', '.join(info.get('authors', []))}")
#     print(f"Keywords: {', '.join(info.get('keywords', []))}")
#     print(f"Journal: {info.get('journal', '')}")
#     print(f"Publication Date: {info.get('publication_date', '')}")
#     print(f"MeSH Terms: {', '.join(info.get('mesh_terms', []))}")
#     print(f"References: {', '.join(info.get('references', []))}")
#     print("\n")

HTTPError: HTTP Error 400: Bad Request

In [70]:
from Bio import Entrez
import time

def fetch_publication_info(email, journal_list, start_date=None, end_date=None, max_results=None):
    """
    Fetch publication information from specified academic journals on PubMed within a date range.

    Args:
        email (str): Your email address for accessing PubMed.
        journal_list (list): List of academic journals to search.
        start_date (str): Start date in the format 'YYYY/MM/DD' to filter articles.
        end_date (str): End date in the format 'YYYY/MM/DD' to filter articles.
        max_results (int): Maximum number of search results to fetch.

    Returns:
        dict: A dictionary containing the fetched publication information.
            Keys are PubMed IDs, and values are dictionaries with the following keys:
            - 'title': Title of the article.
            - 'abstract': Abstract of the article.
            - 'authors': List of authors.
            - 'keywords': List of keywords.
            - 'journal': Journal name.
            - 'publication_date': Publication date of the article.
            - 'mesh_terms': List of MeSH terms.
            - 'references': List of references.
    """

    # Set your email address (required by PubMed)
    Entrez.email = email

    # Initialize a dictionary to store publication information
    publication_info = {}

    for journal in journal_list:
        # Build the query to search for articles from the specified journal
        query = f'"{journal}"[Journal]'

        # If date range is specified, add it to the query
        if start_date and end_date:
            query += f' AND ("{start_date}"[Date - Publication] : "{end_date}"[Date - Publication])'

        # Fetch IDs of all articles matching the query (up to the max_results)
        handle = Entrez.esearch(db='pubmed', term=query, retmax=max_results, usehistory='y')
        record = Entrez.read(handle)
        handle.close()

        if 'IdList' not in record or not record['IdList']:
            print(f"No articles found for journal: {journal}")
            continue

        # Fetch detailed information for each article using WebEnv and query_key
        webenv = record['WebEnv']
        query_key = record['QueryKey']
        retmax = 200  # Fetch up to 200 records per batch

        for start in range(0, max_results, retmax):
            batch_start = start + 1  # PubMed uses 1-based indexing
            batch_end = min(start + retmax, max_results)
            attempt = 1

            while attempt <= 3:
                try:
                    handle = Entrez.efetch(db='pubmed', rettype='medline', retmode='text',
                                           retstart=start, retmax=retmax, webenv=webenv, query_key=query_key)
                    article_text = handle.read()
                    handle.close()
                    break
                except Exception as e:
                    print(f"Error fetching batch {batch_start}-{batch_end}. Attempt {attempt} of 3. Error: {e}")
                    attempt += 1
                    time.sleep(15)

            # Parse the article text to extract relevant information
            for article_text in article_text.split('\n\n'):
                article_info = {}
                for line in article_text.splitlines():
                    line = line.strip()
                    if line.startswith('PMID-'):
                        article_info['pubmed_id'] = line.replace('PMID-', '')
                    elif line.startswith('TI  - '):
                        article_info['title'] = line.replace('TI  - ', '')
                    elif line.startswith('AB  - '):
                        article_info['abstract'] = line.replace('AB  - ', '')
                    elif line.startswith('AU  - '):
                        article_info.setdefault('authors', []).append(line.replace('AU  - ', ''))
                    elif line.startswith('KW  - '):
                        article_info.setdefault('keywords', []).append(line.replace('KW  - ', ''))
                    elif line.startswith('JT  - '):
                        article_info['journal'] = line.replace('JT  - ', '')
                    elif line.startswith('DP  - '):
                        article_info['publication_date'] = line.replace('DP  - ', '')
                    elif line.startswith('MH  - '):
                        article_info.setdefault('mesh_terms', []).append(line.replace('MH  - ', ''))
                    elif line.startswith('RN  - '):
                        article_info.setdefault('references', []).append(line.replace('RN  - ', ''))

                # Add the information to the publication_info dictionary
                publication_info[article_info['pubmed_id']] = article_info

    return publication_info

# Example usage:
email_address = 'ifeld03@gmail.com'
journals = ['Lancet Glob Health']
start_date = '2023/01/01'
end_date = '2023/07/21'
max_results = 1000  # Fetch up to 1000 search results per journal

pub_info = fetch_publication_info(email_address, journals, start_date, end_date, max_results)

# # Print the fetched information for each article
# for pmid, info in pub_info.items():
#     print(f"PubMed ID: {pmid}")
#     print(f"Title: {info.get('title', '')}")
#     print(f"Abstract: {info.get('abstract', '')}")
#     print(f"Authors: {', '.join(info.get('authors', []))}")
#     print(f"Keywords: {', '.join(info.get('keywords', []))}")
#     print(f"Journal: {info.get('journal', '')}")
#     print(f"Publication Date: {info.get('publication_date', '')}")
#     print(f"MeSH Terms: {', '.join(info.get('mesh_terms', []))}")
#     print(f"References: {', '.join(info.get('references', []))}")
#     print("\n")

Error fetching batch 401-600. Attempt 1 of 3. Error: HTTP Error 400: Bad Request
Error fetching batch 401-600. Attempt 2 of 3. Error: HTTP Error 400: Bad Request
Error fetching batch 401-600. Attempt 3 of 3. Error: HTTP Error 400: Bad Request
Error fetching batch 601-800. Attempt 1 of 3. Error: HTTP Error 400: Bad Request
Error fetching batch 601-800. Attempt 2 of 3. Error: HTTP Error 400: Bad Request
Error fetching batch 601-800. Attempt 3 of 3. Error: HTTP Error 400: Bad Request
Error fetching batch 801-1000. Attempt 1 of 3. Error: HTTP Error 400: Bad Request
Error fetching batch 801-1000. Attempt 2 of 3. Error: HTTP Error 400: Bad Request
Error fetching batch 801-1000. Attempt 3 of 3. Error: HTTP Error 400: Bad Request


In [71]:
len(pub_info)

250

Results are only storing the last 250 articles. This and the previous error - where it continues to try to fetch articles until it reaches the maximum 1000 articles is reached, regardless of the actual (fewer) number of articles found in search results. The following function should resolve these issues. 

I also refine the data included in the results. Keywords and references fields are removed since PubMed doesn't store this in its database. The final set of information for each article should include: 
* PubMed ID
* Title
* Abstract
* Authors
* Corporate Author(s) 
* Affiliation
* Journal
* Publication Date
* Place of Publication
* Publication History Status
* MeSH Terms
* Language

In [75]:
from Bio import Entrez
import time

def fetch_publication_info(email, journal_list, start_date=None, end_date=None, max_results=None):
    """
    Fetch publication information from specified academic journals on PubMed within a date range.

    Args:
        email (str): Your email address for accessing PubMed.
        journal_list (list): List of academic journals to search.
        start_date (str): Start date in the format 'YYYY/MM/DD' to filter articles.
        end_date (str): End date in the format 'YYYY/MM/DD' to filter articles.
        max_results (int): Maximum number of search results to fetch.

    Returns:
        dict: A dictionary containing the fetched publication information.
            Keys are PubMed IDs, and values are dictionaries with the following keys:
            - 'pubmed_id': PubMed ID.
            - 'title': Title of the article.
            - 'abstract': Abstract of the article.
            - 'authors': List of authors.
            - 'corporate_authors': List of corporate authors.
            - 'affiliation': Affiliation of the authors.
            - 'journal': Journal name.
            - 'publication_date': Publication date of the article.
            - 'place_of_publication': Place of publication.
            - 'publication_history_status': Publication history status.
            - 'mesh_terms': List of MeSH terms.
            - 'language': Language of the article.
    """

    # Set your email address (required by PubMed)
    Entrez.email = email

    # Initialize a dictionary to store publication information
    publication_info = {}

    for journal in journal_list:
        # Build the query to search for articles from the specified journal
        query = f'"{journal}"[Journal]'

        # If date range is specified, add it to the query
        if start_date and end_date:
            query += f' AND ("{start_date}"[Date - Publication] : "{end_date}"[Date - Publication])'

        # Fetch IDs of all articles matching the query (up to the max_results)
        handle = Entrez.esearch(db='pubmed', term=query, retmax=max_results, usehistory='y')
        record = Entrez.read(handle)
        handle.close()

        if 'IdList' not in record or not record['IdList']:
            print(f"No articles found for journal: {journal}")
            continue

        total_results = int(record['Count'])
        print(f"Total articles found for journal '{journal}': {total_results}")

        # Fetch detailed information for each article using WebEnv and query_key
        webenv = record['WebEnv']
        query_key = record['QueryKey']
        retmax = 500  # Fetch up to 500 records per batch

        for start in range(0, total_results, retmax):
            batch_start = start + 1  # PubMed uses 1-based indexing
            batch_end = min(start + retmax, total_results)
            attempt = 1

            while attempt <= 3:
                try:
                    handle = Entrez.efetch(db='pubmed', rettype='medline', retmode='text',
                                           retstart=start, retmax=retmax, webenv=webenv, query_key=query_key)
                    article_text = handle.read()
                    handle.close()
                    break
                except Exception as e:
                    print(f"Error fetching batch {batch_start}-{batch_end}. Attempt {attempt} of 3. Error: {e}")
                    attempt += 1
                    time.sleep(15)

            # Parse the article text to extract relevant information
            for article_text in article_text.split('\n\n'):
                article_info = {}
                for line in article_text.splitlines():
                    line = line.strip()
                    if line.startswith('PMID-'):
                        article_info['pubmed_id'] = line.replace('PMID-', '')
                    elif line.startswith('TI  - '):
                        article_info['title'] = line.replace('TI  - ', '')
                    elif line.startswith('AB  - '):
                        article_info['abstract'] = line.replace('AB  - ', '')
                    elif line.startswith('AU  - '):
                        article_info.setdefault('authors', []).append(line.replace('AU  - ', ''))
                    elif line.startswith('CN  - '):
                        article_info.setdefault('corporate_authors', []).append(line.replace('CN  - ', ''))
                    elif line.startswith('AD  - '):
                        article_info['affiliation'] = line.replace('AD  - ', '')
                    elif line.startswith('JT  - '):
                        article_info['journal'] = line.replace('JT  - ', '')
                    elif line.startswith('DP  - '):
                        article_info['publication_date'] = line.replace('DP  - ', '')
                    elif line.startswith('PL  - '):
                        article_info['place_of_publication'] = line.replace('PL  - ', '')
                    elif line.startswith('PHST - '):
                        article_info['publication_history_status'] = line.replace('PHST - ', '')
                    elif line.startswith('MH  - '):
                        article_info.setdefault('mesh_terms', []).append(line.replace('MH  - ', ''))
                    elif line.startswith('LA  - '):
                        article_info['language'] = line.replace('LA  - ', '')

                # Update the publication_info dictionary with the information from each batch
                publication_info.update({article_info['pubmed_id']: article_info})

            # Break the loop if all articles have been fetched
            if start + retmax >= total_results:
                break

    return publication_info

# Example usage:
email_address = 'your_email@example.com'
journals = ['Lancet Glob Health', 'Health Policy Plan']
start_date = '2023/01/01'
end_date = '2023/07/21'
max_results = 1000  # Fetch up to 1000 search results per journal

pub_info = fetch_publication_info(email_address, journals, start_date, end_date, max_results)

# Print the fetched information for each article
for pmid, info in pub_info.items():
    print(f"PubMed ID: {pmid}")
    print(f"Title: {info.get('title', '')}")
    print(f"Abstract: {info.get('abstract', '')}")
    print(f"Authors: {', '.join(info.get('authors', []))}")
    print(f"Corporate Author(s): {', '.join(info.get('corporate_authors', []))}")
    print(f"Affiliation: {info.get('affiliation', '')}")
    print(f"Journal: {info.get('journal', '')}")
    print(f"Publication Date: {info.get('publication_date', '')}")
    print(f"Place of Publication: {info.get('place_of_publication', '')}")
    print(f"Publication History Status: {info.get('publication_history_status', '')}")
    print(f"MeSH Terms: {', '.join(info.get('mesh_terms', []))}")
    print(f"Language: {info.get('language', '')}")
    print("\n")

Total articles found for journal 'Lancet Glob Health': 251
Total articles found for journal 'Health Policy Plan': 83
PubMed ID:  37482074
Title: Correction to Lancet Glob Health 2023; 11: e1075-85.
Abstract: 
Authors: 
Corporate Author(s): 
Affiliation: 
Journal: The Lancet. Global health
Publication Date: 2023 Jul 20
Place of Publication: England
Publication History Status: 
MeSH Terms: 
Language: eng


PubMed ID:  37429304
Title: Universal health coverage is a matter of equity, rights, and justice.
Abstract: 
Authors: Cuevas Barron G, Koonin J, Akselrod S, Fogstad H, Karema C, Ditiu L, Dain K, Joshi N
Corporate Author(s): 
Affiliation: UHC2030, Geneva 1211, Switzerland. Electronic address: info@uhc2030.org.
Journal: The Lancet. Global health
Publication Date: 2023 Jul 7
Place of Publication: England
Publication History Status: 
MeSH Terms: 
Language: eng


PubMed ID:  37390834
Title: Implications for assessing the association between maternal anaemia and
Abstract: 
Authors: Khan MN
C

Abstract: BACKGROUND: Interactions between genes and early-life exposures during
Authors: Kumaran K, Birken C, Baillargeon JP, Dennis CL, Fraser WD, Huang H, Fan J, Lye S, Matthews SG, Norris SA
Corporate Author(s): 
Affiliation: University of the Witwatersrand, Johannesburg, South Africa.
Journal: The Lancet. Global health
Publication Date: 2023 Mar
Place of Publication: England
Publication History Status: 
MeSH Terms: Child, Infant, Pregnancy, Child, Preschool, Female, Humans, *Pediatric Obesity/epidemiology/prevention & control, Adiposity, Overweight/epidemiology/prevention & control, Life Change Events, Canada/epidemiology, China/epidemiology, South Africa
Language: eng


PubMed ID:  36866471
Title: Effects of a school-based lifestyle intervention on ideal cardiovascular health
Abstract: BACKGROUND: The prevalence of ideal cardiovascular health among Chinese children
Authors: Guo P, Zhou Y, Zhu Y
Corporate Author(s): 
Affiliation: Department of Maternal and Child Health, School of 

Publication Date: 2023 May 30
Place of Publication: England
Publication History Status: 
MeSH Terms: 
Language: eng


PubMed ID:  37243741
Title: Covid 19 corruption in the public health sector - Emerging evidence from
Abstract: This paper explores the Covid-19-related corruption in Bangladesh. Specifically,
Authors: Hossain M, Rahaman M, Rahman J
Corporate Author(s): 
Affiliation: College of Business and Public Management, Wenzhou-Kean University, China.
Journal: Health policy and planning
Publication Date: 2023 May 27
Place of Publication: England
Publication History Status: 
MeSH Terms: 
Language: eng


PubMed ID:  37217184
Title: Civil society priorities for global health: concepts and measurement.
Abstract: The global health agenda-a high stakes process in which problems are defined and
Authors: Smith SL
Corporate Author(s): 
Affiliation: Virginia Tech, 900 N. Glebe Rd., Arlington, VA 22203-1822, USA.
Journal: Health policy and planning
Publication Date: 2023 Jun 16
Place of Publi

Fixing issues with storage of publication history status and full abstracts. Issue of full abstracts is more likely that print truncates the string.

In [76]:
from Bio import Entrez
import time

def fetch_publication_info(email, journal_list, start_date=None, end_date=None, max_results=None):
    """
    Fetch publication information from specified academic journals on PubMed within a date range.

    Args:
        email (str): Your email address for accessing PubMed.
        journal_list (list): List of academic journals to search.
        start_date (str): Start date in the format 'YYYY/MM/DD' to filter articles.
        end_date (str): End date in the format 'YYYY/MM/DD' to filter articles.
        max_results (int): Maximum number of search results to fetch.

    Returns:
        dict: A dictionary containing the fetched publication information.
            Keys are PubMed IDs, and values are dictionaries with the following keys:
            - 'pubmed_id': PubMed ID.
            - 'title': Title of the article.
            - 'abstract': Abstract of the article.
            - 'authors': List of authors.
            - 'corporate_authors': List of corporate authors.
            - 'affiliation': Affiliation of the authors.
            - 'journal': Journal name.
            - 'publication_date': Publication date of the article.
            - 'place_of_publication': Place of publication.
            - 'publication_history_status': Publication history status.
            - 'mesh_terms': List of MeSH terms.
            - 'language': Language of the article.
    """

    # Set your email address (required by PubMed)
    Entrez.email = email

    # Initialize a dictionary to store publication information
    publication_info = {}

    for journal in journal_list:
        # Build the query to search for articles from the specified journal
        query = f'"{journal}"[Journal]'

        # If date range is specified, add it to the query
        if start_date and end_date:
            query += f' AND ("{start_date}"[Date - Publication] : "{end_date}"[Date - Publication])'

        # Fetch IDs of all articles matching the query (up to the max_results)
        handle = Entrez.esearch(db='pubmed', term=query, retmax=max_results, usehistory='y')
        record = Entrez.read(handle)
        handle.close()

        if 'IdList' not in record or not record['IdList']:
            print(f"No articles found for journal: {journal}")
            continue

        total_results = int(record['Count'])
        print(f"Total articles found for journal '{journal}': {total_results}")

        # Fetch detailed information for each article using WebEnv and query_key
        webenv = record['WebEnv']
        query_key = record['QueryKey']
        retmax = 500  # Fetch up to 500 records per batch

        for start in range(0, total_results, retmax):
            batch_start = start + 1  # PubMed uses 1-based indexing
            batch_end = min(start + retmax, total_results)
            attempt = 1

            while attempt <= 3:
                try:
                    handle = Entrez.efetch(db='pubmed', rettype='medline', retmode='text',
                                           retstart=start, retmax=retmax, webenv=webenv, query_key=query_key)
                    article_text = handle.read()
                    handle.close()
                    break
                except Exception as e:
                    print(f"Error fetching batch {batch_start}-{batch_end}. Attempt {attempt} of 3. Error: {e}")
                    attempt += 1
                    time.sleep(15)

            # Parse the article text to extract relevant information
            for article_text in article_text.split('\n\n'):
                article_info = {}
                for line in article_text.splitlines():
                    line = line.strip()
                    if line.startswith('PMID-'):
                        article_info['pubmed_id'] = line.replace('PMID-', '')
                    elif line.startswith('TI  - '):
                        article_info['title'] = line.replace('TI  - ', '')
                    elif line.startswith('AB  - '):
                        article_info['abstract'] = line.replace('AB  - ', '')
                        for abstract_line in article_text.splitlines()[1:]:
                            if abstract_line.startswith('     '):
                                article_info['abstract'] += ' ' + abstract_line.strip()
                            else:
                                break
                    elif line.startswith('AU  - '):
                        article_info.setdefault('authors', []).append(line.replace('AU  - ', ''))
                    elif line.startswith('CN  - '):
                        article_info.setdefault('corporate_authors', []).append(line.replace('CN  - ', ''))
                    elif line.startswith('AD  - '):
                        article_info['affiliation'] = line.replace('AD  - ', '')
                    elif line.startswith('JT  - '):
                        article_info['journal'] = line.replace('JT  - ', '')
                    elif line.startswith('DP  - '):
                        article_info['publication_date'] = line.replace('DP  - ', '')
                    elif line.startswith('PL  - '):
                        article_info['place_of_publication'] = line.replace('PL  - ', '')
                    elif line.startswith('PHST - '):
                        article_info['publication_history_status'] = line.replace('PHST - ', '')
                    elif line.startswith('MH  - '):
                        article_info.setdefault('mesh_terms', []).append(line.replace('MH  - ', ''))
                    elif line.startswith('LA  - '):
                        article_info['language'] = line.replace('LA  - ', '')

                # Update the publication_info dictionary with the information from each batch
                publication_info.update({article_info['pubmed_id']: article_info})

            # Break the loop if all articles have been fetched
            if start + retmax >= total_results:
                break

    return publication_info

# Example usage:
email_address = 'your_email@example.com'
journals = ['Lancet Glob Health', 'Health Policy Plan']
start_date = '2023/01/01'
end_date = '2023/07/21'
max_results = 1000  # Fetch up to 1000 search results per journal

pub_info = fetch_publication_info(email_address, journals, start_date, end_date, max_results)

# Print the fetched information for each article
for pmid, info in pub_info.items():
    print(f"PubMed ID: {pmid}")
    print(f"Title: {info.get('title', '')}")
    print(f"Abstract: {info.get('abstract', '')}")
    print(f"Authors: {', '.join(info.get('authors', []))}")
    print(f"Corporate Author(s): {', '.join(info.get('corporate_authors', []))}")
    print(f"Affiliation: {info.get('affiliation', '')}")
    print(f"Journal: {info.get('journal', '')}")
    print(f"Publication Date: {info.get('publication_date', '')}")
    print(f"Place of Publication: {info.get('place_of_publication', '')}")
    print(f"Publication History Status: {info.get('publication_history_status', '')}")
    print(f"MeSH Terms: {', '.join(info.get('mesh_terms', []))}")
    print(f"Language: {info.get('language', '')}")
    print("\n")

Total articles found for journal 'Lancet Glob Health': 251
Total articles found for journal 'Health Policy Plan': 83
PubMed ID:  37482074
Title: Correction to Lancet Glob Health 2023; 11: e1075-85.
Abstract: 
Authors: 
Corporate Author(s): 
Affiliation: 
Journal: The Lancet. Global health
Publication Date: 2023 Jul 20
Place of Publication: England
Publication History Status: 
MeSH Terms: 
Language: eng


PubMed ID:  37429304
Title: Universal health coverage is a matter of equity, rights, and justice.
Abstract: 
Authors: Cuevas Barron G, Koonin J, Akselrod S, Fogstad H, Karema C, Ditiu L, Dain K, Joshi N
Corporate Author(s): 
Affiliation: UHC2030, Geneva 1211, Switzerland. Electronic address: info@uhc2030.org.
Journal: The Lancet. Global health
Publication Date: 2023 Jul 7
Place of Publication: England
Publication History Status: 
MeSH Terms: 
Language: eng


PubMed ID:  37390834
Title: Implications for assessing the association between maternal anaemia and
Abstract: 
Authors: Khan MN
C

Publication Date: 2023 Mar
Place of Publication: England
Publication History Status: 
MeSH Terms: Pregnancy, Female, Child, Infant, Newborn, Humans, *Premature Birth/prevention & control, Prospective Studies, Child Health, *Perinatal Death, Cost-Benefit Analysis, Women's Health, Aspirin/therapeutic use
Language: eng


PubMed ID:  36796986
Title: Secondary prevention with a structured semi-interactive stroke prevention package
Abstract: BACKGROUND: There is a high burden of stroke, including recurrent stroke, in
Authors: 
Corporate Author(s): SPRINT INDIA trial collaborators
Affiliation: 
Journal: The Lancet. Global health
Publication Date: 2023 Mar
Place of Publication: England
Publication History Status: 
MeSH Terms: Adult, Humans, Secondary Prevention, Treatment Outcome, *Stroke/prevention & control, Educational Status, India/epidemiology
Language: eng


PubMed ID:  36796985
Title: Genomic epidemiology of SARS-CoV-2 infections in The Gambia: an analysis of
Abstract: BACKGROUND: COVID

Journal: Health policy and planning
Publication Date: 2023 Mar 16
Place of Publication: England
Publication History Status: 
MeSH Terms: Adult, Humans, Cross-Sectional Studies, India, *Insurance, Health, Surveys and Questionnaires, *National Health Programs
Language: eng


PubMed ID:  36477517
Title: Tracking development assistance for mental health: time for better data.
Abstract: 
Authors: Iemmi V
Corporate Author(s): 
Affiliation: Department of Health Policy, London School of Economics and Political Science,
Journal: Health policy and planning
Publication Date: 2023 Apr 11
Place of Publication: England
Publication History Status: 
MeSH Terms: Humans, *Mental Health, *Health Expenditures, Developing Countries
Language: eng


PubMed ID:  36477200
Title: Catastrophic health care expenditure and impoverishment in Bhutan.
Abstract: Monitoring financial hardship due to out-of-pocket spending on health care is a
Authors: Sharma J, Pavlova M, Groot W
Corporate Author(s): 
Affiliation: Depar

In [77]:
len(pub_info)

334

Fixing Publication History Status. 

In [80]:
# First iteration
from Bio import Entrez
import time

def fetch_publication_info(email, journal_list, start_date=None, end_date=None, max_results=None):
    """
    Fetch publication information from specified academic journals on PubMed within a date range.

    Args:
        email (str): Your email address for accessing PubMed.
        journal_list (list): List of academic journals to search.
        start_date (str): Start date in the format 'YYYY/MM/DD' to filter articles.
        end_date (str): End date in the format 'YYYY/MM/DD' to filter articles.
        max_results (int): Maximum number of search results to fetch.

    Returns:
        dict: A dictionary containing the fetched publication information.
            Keys are PubMed IDs, and values are dictionaries with the following keys:
            - 'pubmed_id': PubMed ID.
            - 'title': Title of the article.
            - 'abstract': Abstract of the article.
            - 'authors': List of authors.
            - 'corporate_authors': List of corporate authors.
            - 'affiliation': Affiliation of the authors.
            - 'journal': Journal name.
            - 'publication_date': Publication date of the article.
            - 'place_of_publication': Place of publication.
            - 'publication_history_status': Publication history status.
            - 'mesh_terms': List of MeSH terms.
            - 'language': Language of the article.
    """

    # Set your email address (required by PubMed)
    Entrez.email = email

    # Initialize a dictionary to store publication information
    publication_info = {}

    for journal in journal_list:
        # Build the query to search for articles from the specified journal
        query = f'"{journal}"[Journal]'

        # If date range is specified, add it to the query
        if start_date and end_date:
            query += f' AND ("{start_date}"[Date - Publication] : "{end_date}"[Date - Publication])'

        # Fetch IDs of all articles matching the query (up to the max_results)
        handle = Entrez.esearch(db='pubmed', term=query, retmax=max_results, usehistory='y')
        record = Entrez.read(handle)
        handle.close()

        if 'IdList' not in record or not record['IdList']:
            print(f"No articles found for journal: {journal}")
            continue

        total_results = int(record['Count'])
        print(f"Total articles found for journal '{journal}': {total_results}")

        # Fetch detailed information for each article using WebEnv and query_key
        webenv = record['WebEnv']
        query_key = record['QueryKey']
        retmax = 500  # Fetch up to 500 records per batch

        for start in range(0, total_results, retmax):
            batch_start = start + 1  # PubMed uses 1-based indexing
            batch_end = min(start + retmax, total_results)
            attempt = 1

            while attempt <= 3:
                try:
                    handle = Entrez.efetch(db='pubmed', rettype='medline', retmode='text',
                                           retstart=start, retmax=retmax, webenv=webenv, query_key=query_key)
                    article_text = handle.read()
                    handle.close()
                    break
                except Exception as e:
                    print(f"Error fetching batch {batch_start}-{batch_end}. Attempt {attempt} of 3. Error: {e}")
                    attempt += 1
                    time.sleep(15)

            # Parse the article text to extract relevant information
            for article_text in article_text.split('\n\n'):
                article_info = {}
                phst_lines = []
                abstract_lines = []
                for line in article_text.splitlines():
                    line = line.strip()
                    if line.startswith('PMID-'):
                        article_info['pubmed_id'] = line.replace('PMID-', '')
                    elif line.startswith('TI  - '):
                        article_info['title'] = line.replace('TI  - ', '')
                    elif line.startswith('AB  - '):
                        abstract_lines.append(line.replace('AB  - ', ''))
                    elif line.startswith('AU  - '):
                        article_info.setdefault('authors', []).append(line.replace('AU  - ', ''))
                    elif line.startswith('CN  - '):
                        article_info.setdefault('corporate_authors', []).append(line.replace('CN  - ', ''))
                    elif line.startswith('AD  - '):
                        article_info['affiliation'] = line.replace('AD  - ', '')
                    elif line.startswith('JT  - '):
                        article_info['journal'] = line.replace('JT  - ', '')
                    elif line.startswith('DP  - '):
                        article_info['publication_date'] = line.replace('DP  - ', '')
                    elif line.startswith('PL  - '):
                        article_info['place_of_publication'] = line.replace('PL  - ', '')
                    elif line.startswith('PHST - '):
                        phst_lines.append(line.replace('PHST - ', ''))
                    elif line.startswith('MH  - '):
                        article_info.setdefault('mesh_terms', []).append(line.replace('MH  - ', ''))
                    elif line.startswith('LA  - '):
                        article_info['language'] = line.replace('LA  - ', '')

                # Update the publication_history_status with concatenated PHST lines
                if phst_lines:
                    article_info['publication_history_status'] = ' '.join(phst_lines)

                # Update the abstract with concatenated abstract lines
                if abstract_lines:
                    article_info['abstract'] = ' '.join(abstract_lines)

                # Update the publication_info dictionary with the information from each batch
                publication_info.update({article_info['pubmed_id']: article_info})

            # Break the loop if all articles have been fetched
            if start + retmax >= total_results:
                break

    return publication_info

# Example usage:
email_address = 'your_email@example.com'
journals = ['Lancet Glob Health', 'Health Policy Plan']
start_date = '2023/01/01'
end_date = '2023/07/21'
max_results = 1000  # Fetch up to 1000 search results per journal

pub_info = fetch_publication_info(email_address, journals, start_date, end_date, max_results)

# Print the fetched information for each article
for pmid, info in pub_info.items():
    print(f"PubMed ID: {pmid}")
    print(f"Title: {info.get('title', '')}")
    print(f"Abstract: {info.get('abstract', '')}")
    print(f"Authors: {', '.join(info.get('authors', []))}")
    print(f"Corporate Author(s): {', '.join(info.get('corporate_authors', []))}")
    print(f"Affiliation: {info.get('affiliation', '')}")
    print(f"Journal: {info.get('journal', '')}")
    print(f"Publication Date: {info.get('publication_date', '')}")
    print(f"Place of Publication: {info.get('place_of_publication', '')}")
    print(f"Publication History Status: {info.get('publication_history_status', '')}")
    print(f"MeSH Terms: {', '.join(info.get('mesh_terms', []))}")
    print(f"Language: {info.get('language', '')}")
    print("\n")

Total articles found for journal 'Lancet Glob Health': 251
Total articles found for journal 'Health Policy Plan': 83
PubMed ID:  37482074
Title: Correction to Lancet Glob Health 2023; 11: e1075-85.
Abstract: 
Authors: 
Corporate Author(s): 
Affiliation: 
Journal: The Lancet. Global health
Publication Date: 2023 Jul 20
Place of Publication: England
Publication History Status: 
MeSH Terms: 
Language: eng


PubMed ID:  37429304
Title: Universal health coverage is a matter of equity, rights, and justice.
Abstract: 
Authors: Cuevas Barron G, Koonin J, Akselrod S, Fogstad H, Karema C, Ditiu L, Dain K, Joshi N
Corporate Author(s): 
Affiliation: UHC2030, Geneva 1211, Switzerland. Electronic address: info@uhc2030.org.
Journal: The Lancet. Global health
Publication Date: 2023 Jul 7
Place of Publication: England
Publication History Status: 
MeSH Terms: 
Language: eng


PubMed ID:  37390834
Title: Implications for assessing the association between maternal anaemia and
Abstract: 
Authors: Khan MN
C

Place of Publication: England
Publication History Status: 
MeSH Terms: Humans, *Vaccination Hesitancy, *Measles/epidemiology/prevention & control, Measles Vaccine, Vaccination
Language: eng


PubMed ID:  36925156
Title: Measuring vulnerability to childhood cancer treatment delays in low-income and
Abstract: 
Authors: Geel J, Eyal K
Corporate Author(s): 
Affiliation: Southern Africa Labour and Development Research Unit, School of Economics,
Journal: The Lancet. Global health
Publication Date: 2023 Apr
Place of Publication: England
Publication History Status: 
MeSH Terms: Humans, Child, *Developing Countries, Time-to-Treatment, *Neoplasms/therapy, Poverty, Income
Language: eng


PubMed ID:  36913958
Title: Inequitable access to aid after the devastating earthquake in Syria.
Abstract: 
Authors: Alkhalil M, Ekzayez A, Rayes D, Abbara A
Corporate Author(s): 
Affiliation: Syria Public Health Network, London, UK; Department of Infection, Imperial
Journal: The Lancet. Global health
Publication

PubMed ID:  36427517
Title: What sounds like Aedes, acts like Aedes, but is not Aedes? Lessons from dengue
Abstract: Aedes mosquitoes are responsible for transmission of dengue, chikungunya, Zika,
Authors: Allan R, Budge S, Sauskojus H
Corporate Author(s): 
Affiliation: The MENTOR Initiative, Haywards Heath, UK.
Journal: The Lancet. Global health
Publication Date: 2023 Jan
Place of Publication: England
Publication History Status: 
MeSH Terms: Animals, Humans, *Aedes, *Anopheles, *Dengue Virus, Mosquito Vectors, *Zika Virus Infection, *Zika Virus, *Malaria/prevention & control, Nigeria
Language: eng


PubMed ID:  36423645
Title: Correction to Lancet Glob Health 2022; 10: e1764-73.
Abstract: 
Authors: 
Corporate Author(s): 
Affiliation: 
Journal: The Lancet. Global health
Publication Date: 2023 Feb
Place of Publication: England
Publication History Status: 
MeSH Terms: 
Language: eng


PubMed ID:  36403587
Title: Correction to Lancet Glob Health 2022; 10: e1855-59.
Abstract: 
Authors: 
Co

In [85]:
# Second iteration - a simpler implementation - but still doesn't work for publication history status (fixed below - an issue of spacing between text and dash in the string.)
from Bio import Entrez
import time

def fetch_publication_info(email, journal_list, start_date=None, end_date=None, max_results=None):
    """
    Fetch publication information from specified academic journals on PubMed within a date range.

    Args:
        email (str): Your email address for accessing PubMed.
        journal_list (list): List of academic journals to search.
        start_date (str): Start date in the format 'YYYY/MM/DD' to filter articles.
        end_date (str): End date in the format 'YYYY/MM/DD' to filter articles.
        max_results (int): Maximum number of search results to fetch.

    Returns:
        dict: A dictionary containing the fetched publication information.
            Keys are PubMed IDs, and values are dictionaries with the following keys:
            - 'pubmed_id': PubMed ID.
            - 'title': Title of the article.
            - 'abstract': Abstract of the article.
            - 'authors': List of authors.
            - 'corporate_authors': List of corporate authors.
            - 'affiliation': Affiliation of the authors.
            - 'journal': Journal name.
            - 'publication_date': Publication date of the article.
            - 'place_of_publication': Place of publication.
            - 'publication_history_status': Publication history status.
            - 'mesh_terms': List of MeSH terms.
            - 'language': Language of the article.
    """

    # Set your email address (required by PubMed)
    Entrez.email = email

    # Initialize a dictionary to store publication information
    publication_info = {}

    for journal in journal_list:
        # Build the query to search for articles from the specified journal
        query = f'"{journal}"[Journal]'

        # If date range is specified, add it to the query
        if start_date and end_date:
            query += f' AND ("{start_date}"[Date - Publication] : "{end_date}"[Date - Publication])'

        # Fetch IDs of all articles matching the query (up to the max_results)
        handle = Entrez.esearch(db='pubmed', term=query, retmax=max_results, usehistory='y')
        record = Entrez.read(handle)
        handle.close()

        if 'IdList' not in record or not record['IdList']:
            print(f"No articles found for journal: {journal}")
            continue

        total_results = int(record['Count'])
        print(f"Total articles found for journal '{journal}': {total_results}")

        # Fetch detailed information for each article using WebEnv and query_key
        webenv = record['WebEnv']
        query_key = record['QueryKey']
        retmax = 500  # Fetch up to 500 records per batch

        for start in range(0, total_results, retmax):
            batch_start = start + 1  # PubMed uses 1-based indexing
            batch_end = min(start + retmax, total_results)
            attempt = 1

            while attempt <= 3:
                try:
                    handle = Entrez.efetch(db='pubmed', rettype='medline', retmode='text',
                                           retstart=start, retmax=retmax, webenv=webenv, query_key=query_key)
                    article_text = handle.read()
                    handle.close()
                    break
                except Exception as e:
                    print(f"Error fetching batch {batch_start}-{batch_end}. Attempt {attempt} of 3. Error: {e}")
                    attempt += 1
                    time.sleep(15)

            # Parse the article text to extract relevant information
            for article_text in article_text.split('\n\n'):
                article_info = {}
                for line in article_text.splitlines():
                    line = line.strip()
                    if line.startswith('PMID-'):
                        article_info['pubmed_id'] = line.replace('PMID-', '')
                    elif line.startswith('TI  - '):
                        article_info['title'] = line.replace('TI  - ', '')
                    elif line.startswith('AB  - '):
                        article_info['abstract'] = line.replace('AB  - ', '')
                        for abstract_line in article_text.splitlines()[1:]:
                            if abstract_line.startswith('     '):
                                article_info['abstract'] += ' ' + abstract_line.strip()
                            else:
                                break
                    elif line.startswith('AU  - '):
                        article_info.setdefault('authors', []).append(line.replace('AU  - ', ''))
                    elif line.startswith('CN  - '):
                        article_info.setdefault('corporate_authors', []).append(line.replace('CN  - ', ''))
                    elif line.startswith('AD  - '):
                        article_info['affiliation'] = line.replace('AD  - ', '')
                    elif line.startswith('JT  - '):
                        article_info['journal'] = line.replace('JT  - ', '')
                    elif line.startswith('DP  - '):
                        article_info['publication_date'] = line.replace('DP  - ', '')
                    elif line.startswith('PL  - '):
                        article_info['place_of_publication'] = line.replace('PL  - ', '')
                    elif line.startswith('PHST- '):
                        article_info.setdefault('publication_history_status', []).append(line.replace('PHST- ', ''))
                    elif line.startswith('MH  - '):
                        article_info.setdefault('mesh_terms', []).append(line.replace('MH  - ', ''))
                    elif line.startswith('LA  - '):
                        article_info['language'] = line.replace('LA  - ', '')

                # Update the publication_info dictionary with the information from each batch
                publication_info.update({article_info['pubmed_id']: article_info})

            # Break the loop if all articles have been fetched
            if start + retmax >= total_results:
                break

    return publication_info

# Example usage:
email_address = 'your_email@example.com'
journals = ['Lancet Glob Health', 'Health Policy Plan']
start_date = '2023/01/01'
end_date = '2023/07/21'
max_results = 1000  # Fetch up to 1000 search results per journal

pub_info = fetch_publication_info(email_address, journals, start_date, end_date, max_results)

# Print the fetched information for each article
for pmid, info in pub_info.items():
    print(f"PubMed ID: {pmid}")
    print(f"Title: {info.get('title', '')}")
    print(f"Abstract: {info.get('abstract', '')}")
    print(f"Authors: {', '.join(info.get('authors', []))}")
    print(f"Corporate Author(s): {', '.join(info.get('corporate_authors', []))}")
    print(f"Affiliation: {info.get('affiliation', '')}")
    print(f"Journal: {info.get('journal', '')}")
    print(f"Publication Date: {info.get('publication_date', '')}")
    print(f"Place of Publication: {info.get('place_of_publication', '')}")
    print(f"Publication History Status: {', '.join(info.get('publication_history_status', []))}")
    print(f"MeSH Terms: {', '.join(info.get('mesh_terms', []))}")
    print(f"Language: {info.get('language', '')}")
    print("\n")

Total articles found for journal 'Lancet Glob Health': 251
Total articles found for journal 'Health Policy Plan': 83
PubMed ID:  37482074
Title: Correction to Lancet Glob Health 2023; 11: e1075-85.
Abstract: 
Authors: 
Corporate Author(s): 
Affiliation: 
Journal: The Lancet. Global health
Publication Date: 2023 Jul 20
Place of Publication: England
Publication History Status: 2023/07/24 00:41 [medline], 2023/07/24 00:41 [pubmed], 2023/07/23 18:53 [entrez]
MeSH Terms: 
Language: eng


PubMed ID:  37429304
Title: Universal health coverage is a matter of equity, rights, and justice.
Abstract: 
Authors: Cuevas Barron G, Koonin J, Akselrod S, Fogstad H, Karema C, Ditiu L, Dain K, Joshi N
Corporate Author(s): 
Affiliation: UHC2030, Geneva 1211, Switzerland. Electronic address: info@uhc2030.org.
Journal: The Lancet. Global health
Publication Date: 2023 Jul 7
Place of Publication: England
Publication History Status: 2023/06/29 00:00 [received], 2023/06/30 00:00 [accepted], 2023/07/11 01:07 [med

Title: Effects of a school-based lifestyle intervention on ideal cardiovascular health
Abstract: BACKGROUND: The prevalence of ideal cardiovascular health among Chinese children
Authors: Guo P, Zhou Y, Zhu Y
Corporate Author(s): 
Affiliation: Department of Maternal and Child Health, School of Public Health, Sun Yat-sen
Journal: The Lancet. Global health
Publication Date: 2023 Mar
Place of Publication: England
Publication History Status: 2023/03/03 03:14 [entrez], 2023/03/04 06:00 [pubmed], 2023/03/07 06:00 [medline]
MeSH Terms: Adolescent, Child, Female, Humans, Male, Cholesterol, East Asian People, *Life Style, Schools, *Health Promotion, *Cardiovascular Diseases/prevention & control, Health Behavior
Language: eng


PubMed ID:  36866470
Title: Effectiveness and co-benefits of a telephone-based intervention in reducing
Abstract: BACKGROUND: Evidence of effective early childhood obesity prevention is scarce
Authors: Wen LM, Taki S, Xu H, Phongsavan P, Rissel C, Hayes A, Baur LA
Corporat

Journal: Health policy and planning
Publication Date: 2023 May 27
Place of Publication: England
Publication History Status: 2022/03/20 00:00 [received], 2023/05/11 00:00 [revised], 2023/05/24 00:00 [accepted], 2023/05/27 19:14 [medline], 2023/05/27 19:14 [pubmed], 2023/05/27 14:59 [entrez]
MeSH Terms: 
Language: eng


PubMed ID:  37217184
Title: Civil society priorities for global health: concepts and measurement.
Abstract: The global health agenda-a high stakes process in which problems are defined and
Authors: Smith SL
Corporate Author(s): 
Affiliation: Virginia Tech, 900 N. Glebe Rd., Arlington, VA 22203-1822, USA.
Journal: Health policy and planning
Publication Date: 2023 Jun 16
Place of Publication: England
Publication History Status: 2022/09/15 00:00 [received], 2023/04/30 00:00 [revised], 2023/05/15 00:00 [accepted], 2023/06/19 13:08 [medline], 2023/05/23 01:06 [pubmed], 2023/05/22 19:43 [entrez]
MeSH Terms: Humans, *HIV Infections/epidemiology, Global Health, Pandemics, *COVID-

In [84]:
from Bio import Entrez
import time

def fetch_publication_info(email, journal_list, start_date=None, end_date=None, max_results=None):
    """
    Fetch publication information from specified academic journals on PubMed within a date range.

    Args:
        email (str): Your email address for accessing PubMed.
        journal_list (list): List of academic journals to search.
        start_date (str): Start date in the format 'YYYY/MM/DD' to filter articles.
        end_date (str): End date in the format 'YYYY/MM/DD' to filter articles.
        max_results (int): Maximum number of search results to fetch.

    Returns:
        dict: A dictionary containing the fetched publication information.
            Keys are PubMed IDs, and values are dictionaries with the following keys:
            - 'pubmed_id': PubMed ID.
            - 'title': Title of the article.
            - 'abstract': Abstract of the article.
            - 'authors': List of authors.
            - 'corporate_authors': List of corporate authors.
            - 'affiliation': Affiliation of the authors.
            - 'journal': Journal name.
            - 'publication_date': Publication date of the article.
            - 'place_of_publication': Place of publication.
            - 'publication_history_status': Publication history status.
            - 'mesh_terms': List of MeSH terms.
            - 'language': Language of the article.
    """

    # Set your email address (required by PubMed)
    Entrez.email = email

    # Initialize a dictionary to store publication information
    publication_info = {}

    for journal in journal_list:
        # Build the query to search for articles from the specified journal
        query = f'"{journal}"[Journal]'

        # If date range is specified, add it to the query
        if start_date and end_date:
            query += f' AND ("{start_date}"[Date - Publication] : "{end_date}"[Date - Publication])'

        # Fetch IDs of all articles matching the query (up to the max_results)
        handle = Entrez.esearch(db='pubmed', term=query, retmax=max_results, usehistory='y')
        record = Entrez.read(handle)
        handle.close()

        if 'IdList' not in record or not record['IdList']:
            print(f"No articles found for journal: {journal}")
            continue

        total_results = int(record['Count'])
        print(f"Total articles found for journal '{journal}': {total_results}")

        # Fetch detailed information for each article using WebEnv and query_key
        webenv = record['WebEnv']
        query_key = record['QueryKey']
        retmax = 500  # Fetch up to 500 records per batch

        for start in range(0, total_results, retmax):
            batch_start = start + 1  # PubMed uses 1-based indexing
            batch_end = min(start + retmax, total_results)
            attempt = 1

            while attempt <= 3:
                try:
                    handle = Entrez.efetch(db='pubmed', rettype='medline', retmode='text',
                                           retstart=start, retmax=retmax, webenv=webenv, query_key=query_key)
                    article_text = handle.read()
                    handle.close()
                    break
                except Exception as e:
                    print(f"Error fetching batch {batch_start}-{batch_end}. Attempt {attempt} of 3. Error: {e}")
                    attempt += 1
                    time.sleep(15)

            # Parse the article text to extract relevant information
            for article_text in article_text.split('\n\n'):
                article_info = {}
                publication_history_status = ""
                for line in article_text.splitlines():
                    line = line.strip()
                    if line.startswith('PMID-'):
                        article_info['pubmed_id'] = line.replace('PMID-', '')
                    elif line.startswith('TI  - '):
                        article_info['title'] = line.replace('TI  - ', '')
                    elif line.startswith('AB  - '):
                        article_info['abstract'] = line.replace('AB  - ', '')
                        for abstract_line in article_text.splitlines()[1:]:
                            if abstract_line.startswith('     '):
                                article_info['abstract'] += ' ' + abstract_line.strip()
                            else:
                                break
                    elif line.startswith('AU  - '):
                        article_info.setdefault('authors', []).append(line.replace('AU  - ', ''))
                    elif line.startswith('CN  - '):
                        article_info.setdefault('corporate_authors', []).append(line.replace('CN  - ', ''))
                    elif line.startswith('AD  - '):
                        article_info['affiliation'] = line.replace('AD  - ', '')
                    elif line.startswith('JT  - '):
                        article_info['journal'] = line.replace('JT  - ', '')
                    elif line.startswith('DP  - '):
                        article_info['publication_date'] = line.replace('DP  - ', '')
                    elif line.startswith('PL  - '):
                        article_info['place_of_publication'] = line.replace('PL  - ', '')
                    elif line.startswith('PHST- '):
                        publication_history_status += line.replace('PHST- ', '') + " "
                    elif line.startswith('MH  - '):
                        article_info.setdefault('mesh_terms', []).append(line.replace('MH  - ', ''))
                    elif line.startswith('LA  - '):
                        article_info['language'] = line.replace('LA  - ', '')

                # Update the publication_info dictionary with the information from each batch
                article_info['publication_history_status'] = publication_history_status.strip()
                publication_info.update({article_info['pubmed_id']: article_info})

            # Break the loop if all articles have been fetched
            if start + retmax >= total_results:
                break

    return publication_info

# Example usage:
email_address = 'your_email@example.com'
journals = ['Lancet Glob Health', 'Health Policy Plan']
start_date = '2023/01/01'
end_date = '2023/07/21'
max_results = 1000  # Fetch up to 1000 search results per journal

pub_info = fetch_publication_info(email_address, journals, start_date, end_date, max_results)

# Print the fetched information for each article
for pmid, info in pub_info.items():
    print(f"PubMed ID: {pmid}")
    print(f"Title: {info.get('title', '')}")
    print(f"Abstract: {info.get('abstract', '')}")
    print(f"Authors: {', '.join(info.get('authors', []))}")
    print(f"Corporate Author(s): {', '.join(info.get('corporate_authors', []))}")
    print(f"Affiliation: {info.get('affiliation', '')}")
    print(f"Journal: {info.get('journal', '')}")
    print(f"Publication Date: {info.get('publication_date', '')}")
    print(f"Place of Publication: {info.get('place_of_publication', '')}")
    print(f"Publication History Status: {info.get('publication_history_status', '')}")
    print(f"MeSH Terms: {', '.join(info.get('mesh_terms', []))}")
    print(f"Language: {info.get('language', '')}")
    print("\n")

Total articles found for journal 'Lancet Glob Health': 251
Total articles found for journal 'Health Policy Plan': 83
PubMed ID:  37482074
Title: Correction to Lancet Glob Health 2023; 11: e1075-85.
Abstract: 
Authors: 
Corporate Author(s): 
Affiliation: 
Journal: The Lancet. Global health
Publication Date: 2023 Jul 20
Place of Publication: England
Publication History Status: 2023/07/24 00:41 [medline] 2023/07/24 00:41 [pubmed] 2023/07/23 18:53 [entrez]
MeSH Terms: 
Language: eng


PubMed ID:  37429304
Title: Universal health coverage is a matter of equity, rights, and justice.
Abstract: 
Authors: Cuevas Barron G, Koonin J, Akselrod S, Fogstad H, Karema C, Ditiu L, Dain K, Joshi N
Corporate Author(s): 
Affiliation: UHC2030, Geneva 1211, Switzerland. Electronic address: info@uhc2030.org.
Journal: The Lancet. Global health
Publication Date: 2023 Jul 7
Place of Publication: England
Publication History Status: 2023/06/29 00:00 [received] 2023/06/30 00:00 [accepted] 2023/07/11 01:07 [medline

Publication History Status: 2023/03/03 03:14 [entrez] 2023/03/04 06:00 [pubmed] 2023/03/07 06:00 [medline]
MeSH Terms: Adolescent, Child, Female, Male, Humans, *Pediatric Obesity/epidemiology, Prospective Studies, Parents, Healthy Lifestyle, China/epidemiology
Language: eng


PubMed ID:  36866482
Title: Parental lifestyle patterns around pregnancy and risk of childhood obesity in
Abstract: BACKGROUND: A high prevalence of excess weight in children younger than 5 years
Authors: Lecorguille M, Schipper M, O'Donnell A, Aubert AM, Tafflet M, Gassama M, Douglass A, Hebert JR, Kelleher C, Charles MA, Phillips CM, Gaillard R, Lioret S, Heude B
Corporate Author(s): 
Affiliation: Center for Research in Epidemiology and Statistics, National Institute of Health
Journal: The Lancet. Global health
Publication Date: 2023 Mar
Place of Publication: England
Publication History Status: 2023/03/03 03:14 [entrez] 2023/03/04 06:00 [pubmed] 2023/03/07 06:00 [medline]
MeSH Terms: Child, Female, Pregnancy, Hu

Title: The genesis of the PM-JAY health insurance scheme in India: technical
Abstract: Many countries are using health insurance to advance progress towards universal
Authors: Srivastava S, Bertone MP, Parmar D, Walsh C, De Allegri M
Corporate Author(s): 
Affiliation: Heidelberg Institute of Global Health, Medical Faculty and University Hospital,
Journal: Health policy and planning
Publication Date: 2023 Jul 3
Place of Publication: England
Publication History Status: 2022/09/12 00:00 [received] 2023/04/18 00:00 [revised] 2023/06/30 00:00 [accepted] 2023/07/12 13:07 [medline] 2023/07/12 13:07 [pubmed] 2023/07/12 12:03 [entrez]
MeSH Terms: 
Language: eng


PubMed ID:  37421152
Title: Correction to: Civil society priorities for global health: concepts
Abstract: 
Authors: 
Corporate Author(s): 
Affiliation: 
Journal: Health policy and planning
Publication Date: 2023 Jul 8
Place of Publication: England
Publication History Status: 2023/06/26 00:00 [received] 2023/07/08 10:42 [medline] 2023/0

In [86]:
# Final version

# Second iteration - a simpler implementation - but still doesn't work for publication history status (fixed below - an issue of spacing between text and dash in the string.)
from Bio import Entrez
import time

def fetch_publication_info(email, journal_list, start_date=None, end_date=None, max_results=None):
    """
    Fetch publication information from specified academic journals on PubMed within a date range.

    Args:
        email (str): Your email address for accessing PubMed.
        journal_list (list): List of academic journals to search.
        start_date (str): Start date in the format 'YYYY/MM/DD' to filter articles.
        end_date (str): End date in the format 'YYYY/MM/DD' to filter articles.
        max_results (int): Maximum number of search results to fetch.

    Returns:
        dict: A dictionary containing the fetched publication information.
            Keys are PubMed IDs, and values are dictionaries with the following keys:
            - 'pubmed_id': PubMed ID.
            - 'title': Title of the article.
            - 'abstract': Abstract of the article.
            - 'authors': List of authors.
            - 'corporate_authors': List of corporate authors.
            - 'affiliation': Affiliation of the authors.
            - 'journal': Journal name.
            - 'publication_date': Publication date of the article.
            - 'place_of_publication': Place of publication.
            - 'publication_history_status': Publication history status.
            - 'mesh_terms': List of MeSH terms.
            - 'language': Language of the article.
    """

    # Set your email address (required by PubMed)
    Entrez.email = email

    # Initialize a dictionary to store publication information
    publication_info = {}

    for journal in journal_list:
        # Build the query to search for articles from the specified journal
        query = f'"{journal}"[Journal]'

        # If date range is specified, add it to the query
        if start_date and end_date:
            query += f' AND ("{start_date}"[Date - Publication] : "{end_date}"[Date - Publication])'

        # Fetch IDs of all articles matching the query (up to the max_results)
        handle = Entrez.esearch(db='pubmed', term=query, retmax=max_results, usehistory='y')
        record = Entrez.read(handle)
        handle.close()

        if 'IdList' not in record or not record['IdList']:
            print(f"No articles found for journal: {journal}")
            continue

        total_results = int(record['Count'])
        print(f"Total articles found for journal '{journal}': {total_results}")

        # Fetch detailed information for each article using WebEnv and query_key
        webenv = record['WebEnv']
        query_key = record['QueryKey']
        retmax = 500  # Fetch up to 500 records per batch

        for start in range(0, total_results, retmax):
            batch_start = start + 1  # PubMed uses 1-based indexing
            batch_end = min(start + retmax, total_results)
            attempt = 1

            while attempt <= 3:
                try:
                    handle = Entrez.efetch(db='pubmed', rettype='medline', retmode='text',
                                           retstart=start, retmax=retmax, webenv=webenv, query_key=query_key)
                    article_text = handle.read()
                    handle.close()
                    break
                except Exception as e:
                    print(f"Error fetching batch {batch_start}-{batch_end}. Attempt {attempt} of 3. Error: {e}")
                    attempt += 1
                    time.sleep(15)

            # Parse the article text to extract relevant information
            for article_text in article_text.split('\n\n'):
                article_info = {}
                for line in article_text.splitlines():
                    line = line.strip()
                    if line.startswith('PMID-'):
                        article_info['pubmed_id'] = line.replace('PMID-', '')
                    elif line.startswith('TI  - '):
                        article_info['title'] = line.replace('TI  - ', '')
                    elif line.startswith('AB  - '):
                        article_info['abstract'] = line.replace('AB  - ', '')
                        for abstract_line in article_text.splitlines()[1:]:
                            if abstract_line.startswith('     '):
                                article_info['abstract'] += ' ' + abstract_line.strip()
                            else:
                                break
                    elif line.startswith('AU  - '):
                        article_info.setdefault('authors', []).append(line.replace('AU  - ', ''))
                    elif line.startswith('CN  - '):
                        article_info.setdefault('corporate_authors', []).append(line.replace('CN  - ', ''))
                    elif line.startswith('AD  - '):
                        article_info['affiliation'] = line.replace('AD  - ', '')
                    elif line.startswith('JT  - '):
                        article_info['journal'] = line.replace('JT  - ', '')
                    elif line.startswith('DP  - '):
                        article_info['publication_date'] = line.replace('DP  - ', '')
                    elif line.startswith('PL  - '):
                        article_info['place_of_publication'] = line.replace('PL  - ', '')
                    elif line.startswith('PHST- '):
                        article_info.setdefault('publication_history_status', []).append(line.replace('PHST- ', ''))
                    elif line.startswith('MH  - '):
                        article_info.setdefault('mesh_terms', []).append(line.replace('MH  - ', ''))
                    elif line.startswith('LA  - '):
                        article_info['language'] = line.replace('LA  - ', '')

                # Update the publication_info dictionary with the information from each batch
                publication_info.update({article_info['pubmed_id']: article_info})

            # Break the loop if all articles have been fetched
            if start + retmax >= total_results:
                break

    return publication_info

# Example usage:
email_address = 'your_email@example.com'
journals = ['Lancet Glob Health', 'Health Policy Plan']
start_date = '2023/01/01'
end_date = '2023/07/21'
max_results = 1000  # Fetch up to 1000 search results per journal

pub_info = fetch_publication_info(email_address, journals, start_date, end_date, max_results)

# Print the fetched information for each article
for pmid, info in pub_info.items():
    print(f"PubMed ID: {pmid}")
    print(f"Title: {info.get('title', '')}")
    print(f"Abstract: {info.get('abstract', '')}")
    print(f"Authors: {', '.join(info.get('authors', []))}")
    print(f"Corporate Author(s): {', '.join(info.get('corporate_authors', []))}")
    print(f"Affiliation: {info.get('affiliation', '')}")
    print(f"Journal: {info.get('journal', '')}")
    print(f"Publication Date: {info.get('publication_date', '')}")
    print(f"Place of Publication: {info.get('place_of_publication', '')}")
    print(f"Publication History Status: {', '.join(info.get('publication_history_status', []))}")
    print(f"MeSH Terms: {', '.join(info.get('mesh_terms', []))}")
    print(f"Language: {info.get('language', '')}")
    print("\n")

Total articles found for journal 'Lancet Glob Health': 251
Total articles found for journal 'Health Policy Plan': 83
PubMed ID:  37482074
Title: Correction to Lancet Glob Health 2023; 11: e1075-85.
Abstract: 
Authors: 
Corporate Author(s): 
Affiliation: 
Journal: The Lancet. Global health
Publication Date: 2023 Jul 20
Place of Publication: England
Publication History Status: 2023/07/24 00:41 [medline], 2023/07/24 00:41 [pubmed], 2023/07/23 18:53 [entrez]
MeSH Terms: 
Language: eng


PubMed ID:  37429304
Title: Universal health coverage is a matter of equity, rights, and justice.
Abstract: 
Authors: Cuevas Barron G, Koonin J, Akselrod S, Fogstad H, Karema C, Ditiu L, Dain K, Joshi N
Corporate Author(s): 
Affiliation: UHC2030, Geneva 1211, Switzerland. Electronic address: info@uhc2030.org.
Journal: The Lancet. Global health
Publication Date: 2023 Jul 7
Place of Publication: England
Publication History Status: 2023/06/29 00:00 [received], 2023/06/30 00:00 [accepted], 2023/07/11 01:07 [med

Journal: The Lancet. Global health
Publication Date: 2023 Mar
Place of Publication: England
Publication History Status: 2022/09/11 00:00 [received], 2022/11/27 00:00 [revised], 2022/12/02 00:00 [accepted], 2023/02/16 20:59 [entrez], 2023/02/17 06:00 [pubmed], 2023/02/22 06:00 [medline]
MeSH Terms: Adult, Humans, Secondary Prevention, Treatment Outcome, *Stroke/prevention & control, Educational Status, India/epidemiology
Language: eng


PubMed ID:  36796985
Title: Genomic epidemiology of SARS-CoV-2 infections in The Gambia: an analysis of
Abstract: BACKGROUND: COVID-19, caused by SARS-CoV-2, is one of the deadliest pandemics of
Authors: Kanteh A, Jallow HS, Manneh J, Sanyang B, Kujabi MA, Ndure SL, Jarju S, Sey AP, Damilare K D, Bah Y, Sambou S, Jarju G, Manjang B, Jagne A, Bittaye SO, Bittaye M, Forrest K, Tiruneh DA, Samateh AL, Jagne S, Hue S, Mohammed N, Amambua-Ngwa A, Kampmann B, D'Alessandro U, de Silva TI, Roca A, Sesay AK
Corporate Author(s): 
Affiliation: Medical Research Coun

Abstract: The Chinese healthcare system faces a dilemma between its hospital-centric
Authors: Hu H, Wang R, Li H, Han S, Shen P, Lin H, Guan X, Shi L
Corporate Author(s): 
Affiliation: International Research Center for Medicinal Administration, Peking University
Journal: Health policy and planning
Publication Date: 2023 May 17
Place of Publication: England
Publication History Status: 2022/07/25 00:00 [received], 2023/02/20 00:00 [revised], 2023/03/08 00:00 [accepted], 2023/05/19 06:42 [medline], 2023/03/12 06:00 [pubmed], 2023/03/11 10:42 [entrez]
MeSH Terms: Adult, Female, Humans, Interrupted Time Series Analysis, Cross-Sectional Studies, *Physicians, Primary Care, Aging, Policy
Language: eng


PubMed ID:  36798965
Title: Understanding medical corruption in China: a mixed-methods study.
Abstract: Medical corruption is a significant obstacle to achieving health-related
Authors: Fu H, Lai Y, Li Y, Zhu Y, Yip W
Corporate Author(s): 
Affiliation: Department of Global Health and Population

In [91]:
# Example usage:
email = email_address
journals = ['Lancet Glob Health', 'Health Policy Plan']
start_date = '2023/01/01'
end_date = '2023/07/21'
max_results = 1000  # Fetch up to 1000 search results per journal

# Set your email address (required by PubMed)
Entrez.email = email

# Initialize a dictionary to store publication information
publication_info = {}

journal = journal_list[0]
# Build the query to search for articles from the specified journal
query = f'"{journal}"[Journal]'

# If date range is specified, add it to the query
if start_date and end_date:
    query += f' AND ("{start_date}"[Date - Publication] : "{end_date}"[Date - Publication])'

In [92]:
# Fetch IDs of all articles matching the query (up to the max_results)
handle = Entrez.esearch(db='pubmed', term=query, retmax=max_results, usehistory='y')
record = Entrez.read(handle)
handle.close()


In [94]:
from pprint import pprint
pprint(record)

{'Count': '251',
 'IdList': ['37482074', '37429304', '37390834', '37390833', '37349046', '37349045', '37349044', '37349043', '37349042', '37349041', '37349040', '37349039', '37349038', '37349037', '37349036', '37349035', '37349034', '37349033', '37349032', '37349031', '37349030', '37349029', '37349028', '37349027', '37349026', '37349025', '37329894', '37321242', '37276879', '37276878', '37271163', '37271162', '37244269', '37236212', '37209703', '37209702', '37207683', '37207682', '37202030', '37202029', '37202028', '37202027', '37202026', '37202025', '37202024', '37202023', '37202022', '37202021', '37202020', '37202019', '37202018', '37202017', '37202016', '37202015', '37202014', '37202013', '37202012', '37202011', '37202010', '37202009', '37202008', '37202007', '37202006', '37202005', '37201545', '37201544', '37167984', '37167983', '37146627', '37119831', '37116530', '37061316', '37061315', '37061314', '37061313', '37061312', '37061311', '37061310', '37061309', '37061308', '37061307',

In [96]:
total_results = int(record['Count'])
print(f"Total articles found for journal '{journal}': {total_results}")

# Fetch detailed information for each article using WebEnv and query_key
webenv = record['WebEnv']
query_key = record['QueryKey']
retmax = 500  # Fetch up to 500 records per batch

Total articles found for journal 'Lancet Glob Health': 251


In [97]:
handle = Entrez.efetch(db='pubmed', rettype='medline', retmode='text',
                       retstart=0, retmax=retmax, webenv=webenv, query_key=query_key)
article_text = handle.read()
handle.close()

In [99]:
print(article_text)


PMID- 37482074
OWN - NLM
STAT- Publisher
LR  - 20230723
IS  - 2214-109X (Electronic)
IS  - 2214-109X (Linking)
DP  - 2023 Jul 20
TI  - Correction to Lancet Glob Health 2023; 11: e1075-85.
LID - S2214-109X(23)00361-3 [pii]
LID - 10.1016/S2214-109X(23)00361-3 [doi]
LA  - eng
PT  - Published Erratum
DEP - 20230720
PL  - England
TA  - Lancet Glob Health
JT  - The Lancet. Global health
JID - 101613665
SB  - IM
EFR - Lancet Glob Health. 2023 Jul;11(7):e1075-e1085. PMID: 37349034
EDAT- 2023/07/24 00:41
MHDA- 2023/07/24 00:41
CRDT- 2023/07/23 18:53
PHST- 2023/07/24 00:41 [medline]
PHST- 2023/07/24 00:41 [pubmed]
PHST- 2023/07/23 18:53 [entrez]
AID - S2214-109X(23)00361-3 [pii]
AID - 10.1016/S2214-109X(23)00361-3 [doi]
PST - aheadofprint
SO  - Lancet Glob Health. 2023 Jul 20:S2214-109X(23)00361-3. doi: 
      10.1016/S2214-109X(23)00361-3.

PMID- 37429304
OWN - NLM
STAT- Publisher
LR  - 20230710
IS  - 2214-109X (Electronic)
IS  - 2214-109X (Linking)
DP  - 2023 Jul 7
TI  - Universal health cove

In [None]:
for start in range(0, total_results, retmax): # This gives a sequence. For example, if total_results is 500 and retmax is 100, the range will be [0, 100, 200, 300, 400].
    batch_start = start + 1  # PubMed uses 1-based indexing
    batch_end = min(start + retmax, total_results)
    attempt = 1

    while attempt <= 3:
        try:
            handle = Entrez.efetch(db='pubmed', rettype='medline', retmode='text',
                                   retstart=start, retmax=retmax, webenv=webenv, query_key=query_key)
            article_text = handle.read()
            handle.close()
            break
        except Exception as e:
            print(f"Error fetching batch {batch_start}-{batch_end}. Attempt {attempt} of 3. Error: {e}")
            attempt += 1
            time.sleep(15)

    # Parse the article text to extract relevant information
    for article_text in article_text.split('\n\n'):
        article_info = {}
        for line in article_text.splitlines():
            line = line.strip()
            if line.startswith('PMID-'):
                article_info['pubmed_id'] = line.replace('PMID-', '')
            elif line.startswith('TI  - '):
                article_info['title'] = line.replace('TI  - ', '')
            elif line.startswith('AB  - '):
                article_info['abstract'] = line.replace('AB  - ', '')
                for abstract_line in article_text.splitlines()[1:]:
                    if abstract_line.startswith('     '):
                        article_info['abstract'] += ' ' + abstract_line.strip()
                    else:
                        break
            elif line.startswith('AU  - '):
                article_info.setdefault('authors', []).append(line.replace('AU  - ', ''))
            elif line.startswith('CN  - '):
                article_info.setdefault('corporate_authors', []).append(line.replace('CN  - ', ''))
            elif line.startswith('AD  - '):
                article_info['affiliation'] = line.replace('AD  - ', '')
            elif line.startswith('JT  - '):
                article_info['journal'] = line.replace('JT  - ', '')
            elif line.startswith('DP  - '):
                article_info['publication_date'] = line.replace('DP  - ', '')
            elif line.startswith('PL  - '):
                article_info['place_of_publication'] = line.replace('PL  - ', '')
            elif line.startswith('PHST- '):
                article_info.setdefault('publication_history_status', []).append(line.replace('PHST- ', ''))
            elif line.startswith('MH  - '):
                article_info.setdefault('mesh_terms', []).append(line.replace('MH  - ', ''))
            elif line.startswith('LA  - '):
                article_info['language'] = line.replace('LA  - ', '')

        # Update the publication_info dictionary with the information from each batch
        publication_info.update({article_info['pubmed_id']: article_info})

    # Break the loop if all articles have been fetched
    if start + retmax >= total_results:
        break


In [87]:
len(pub_info)

334

In [88]:
pub_info.items()



In [104]:
from Bio import Entrez
import time

def fetch_publication_info(email, journal_list, start_date=None, end_date=None, max_results=None):
    """
    Fetch publication information from specified academic journals on PubMed within a date range.

    Args:
        email (str): Your email address for accessing PubMed.
        journal_list (list): List of academic journals to search.
        start_date (str): Start date in the format 'YYYY/MM/DD' to filter articles.
        end_date (str): End date in the format 'YYYY/MM/DD' to filter articles.
        max_results (int): Maximum number of search results to fetch.

    Returns:
        dict: A dictionary containing the fetched publication information.
            Keys are PubMed IDs, and values are dictionaries with the following keys:
            - 'pubmed_id': PubMed ID.
            - 'title': Title of the article.
            - 'abstract': Abstract of the article.
            - 'authors': List of authors.
            - 'corporate_authors': List of corporate authors.
            - 'affiliation': Affiliation of the authors.
            - 'journal': Journal name.
            - 'publication_date': Publication date of the article.
            - 'place_of_publication': Place of publication.
            - 'publication_history_status': Publication history status.
            - 'mesh_terms': List of MeSH terms.
            - 'language': Language of the article.
    """

    # Set your email address (required by PubMed)
    Entrez.email = email

    # Initialize a dictionary to store publication information
    publication_info = {}

    for journal in journal_list:
        # Build the query to search for articles from the specified journal
        query = f'"{journal}"[Journal]'

        # If date range is specified, add it to the query
        if start_date and end_date:
            query += f' AND ("{start_date}"[Date - Publication] : "{end_date}"[Date - Publication])'

        # Fetch IDs of all articles matching the query (up to the max_results)
        handle = Entrez.esearch(db='pubmed', term=query, retmax=max_results, usehistory='y')
        record = Entrez.read(handle)
        handle.close()

        if 'IdList' not in record or not record['IdList']:
            print(f"No articles found for journal: {journal}")
            continue

        total_results = int(record['Count'])
        print(f"Total articles found for journal '{journal}': {total_results}")

        # Fetch detailed information for each article using WebEnv and query_key
        webenv = record['WebEnv']
        query_key = record['QueryKey']
        retmax = 500  # Fetch up to 500 records per batch

        for start in range(0, total_results, retmax):
            batch_start = start + 1  # PubMed uses 1-based indexing
            batch_end = min(start + retmax, total_results)
            attempt = 1

            while attempt <= 3:
                try:
                    handle = Entrez.efetch(db='pubmed', rettype='medline', retmode='text',
                                           retstart=start, retmax=retmax, webenv=webenv, query_key=query_key)
                    article_text = handle.read()
                    handle.close()
                    break
                except Exception as e:
                    print(f"Error fetching batch {batch_start}-{batch_end}. Attempt {attempt} of 3. Error: {e}")
                    attempt += 1
                    time.sleep(15)

            # Parse the article text to extract relevant information
            articles = article_text.split('\n\nPMID')[1:]  # Separate individual articles in the batch

            for article in articles:
                article_info = {}
                lines = article.splitlines()

                # Extract the PubMed ID (PMID) from the article metadata
                pubmed_id = lines[0].replace('PMID-', '').strip()
                article_info['pubmed_id'] = pubmed_id

                # Continue parsing the rest of the article text
                for line in lines:
                    if line.startswith('TI  - '):
                        article_info['title'] = line.replace('TI  - ', '')
                    elif line.startswith('AB  - '):
                        article_info['abstract'] = line.replace('AB  - ', '')
                    elif line.startswith('PG  - '):  # Page tag marks the end of the title
                        break
                    elif line.startswith('CI  - '):  # Copyright tag marks the end of the abstract
                        break

                # Continue parsing the rest of the article text
                for line in lines:
                    if line.startswith('AU  - '):
                        article_info.setdefault('authors', []).append(line.replace('AU  - ', ''))
                    elif line.startswith('CN  - '):
                        article_info.setdefault('corporate_authors', []).append(line.replace('CN  - ', ''))
                    elif line.startswith('AD  - '):
                        article_info['affiliation'] = line.replace('AD  - ', '')
                    elif line.startswith('JT  - '):
                        article_info['journal'] = line.replace('JT  - ', '')
                    elif line.startswith('DP  - '):
                        article_info['publication_date'] = line.replace('DP  - ', '')
                    elif line.startswith('PL  - '):
                        article_info['place_of_publication'] = line.replace('PL  - ', '')
                    elif line.startswith('PHST- '):
                        article_info.setdefault('publication_history_status', []).append(line.replace('PHST- ', ''))
                    elif line.startswith('MH  - '):
                        article_info.setdefault('mesh_terms', []).append(line.replace('MH  - ', ''))
                    elif line.startswith('LA  - '):
                        article_info['language'] = line.replace('LA  - ', '')

                # Update the publication_info dictionary with the information from each batch
                publication_info.update({article_info['pubmed_id']: article_info})

            # Break the loop if all articles have been fetched
            if start + retmax >= total_results:
                break

    return publication_info

# Example usage:
email_address = 'your_email@example.com'
journals = ['Lancet Glob Health', 'Health Policy Plan']
start_date = '2023/01/01'
end_date = '2023/07/21'
max_results = 1000  # Fetch up to 1000 search results per journal

pub_info = fetch_publication_info(email_address, journals, start_date, end_date, max_results)

# Print the fetched information for each article
for pmid, info in pub_info.items():
    print(f"PubMed ID: {pmid}")
    print(f"Title: {info.get('title', '')}")
    print(f"Abstract: {info.get('abstract', '')}")
    print(f"Authors: {', '.join(info.get('authors', []))}")
    print(f"Corporate Author(s): {', '.join(info.get('corporate_authors', []))}")
    print(f"Affiliation: {info.get('affiliation', '')}")
    print(f"Journal: {info.get('journal', '')}")
    print(f"Publication Date: {info.get('publication_date', '')}")
    print(f"Place of Publication: {info.get('place_of_publication', '')}")
    print(f"Publication History Status: {', '.join(info.get('publication_history_status', []))}")
    print(f"MeSH Terms: {', '.join(info.get('mesh_terms', []))}")
    print(f"Language: {info.get('language', '')}")
    print("\n")

Total articles found for journal 'Lancet Glob Health': 252
Total articles found for journal 'Health Policy Plan': 83
PubMed ID: - 37482074
Title: Correction to Lancet Glob Health 2023; 11: e1075-85.
Abstract: 
Authors: 
Corporate Author(s): 
Affiliation: 
Journal: The Lancet. Global health
Publication Date: 2023 Jul 20
Place of Publication: England
Publication History Status: 2023/07/24 00:41 [medline], 2023/07/24 00:41 [pubmed], 2023/07/23 18:53 [entrez]
MeSH Terms: 
Language: eng


PubMed ID: - 37429304
Title: Universal health coverage is a matter of equity, rights, and justice.
Abstract: 
Authors: Cuevas Barron G, Koonin J, Akselrod S, Fogstad H, Karema C, Ditiu L, Dain K, Joshi N
Corporate Author(s): 
Affiliation: UHC2030, Geneva 1211, Switzerland. Electronic address: info@uhc2030.org.
Journal: The Lancet. Global health
Publication Date: 2023 Jul 7
Place of Publication: England
Publication History Status: 2023/06/29 00:00 [received], 2023/06/30 00:00 [accepted], 2023/07/11 01:07 [m

Publication History Status: 2023/03/03 03:14 [entrez], 2023/03/04 06:00 [pubmed], 2023/03/07 06:00 [medline]
MeSH Terms: Child, Adolescent, Humans, Female, *Overweight/epidemiology/prevention & control, *Pediatric Obesity/epidemiology/prevention & control, Developing Countries, Breast Feeding, China/epidemiology
Language: eng


PubMed ID: - 36866474
Title: The voices of children on movement behaviours: implications for promoting 
Abstract: 
Authors: Kariippanon KE, Aguilar-Farias N, El Hamdouchi A, Hongyan G, Lubree H, Okely AD, Tremblay MS, Draper CE
Corporate Author(s): 
Affiliation: SAMRC Developmental Pathways for Health Research Unit, University of the 
Journal: The Lancet. Global health
Publication Date: 2023 Mar
Place of Publication: England
Publication History Status: 2023/03/03 03:14 [entrez], 2023/03/04 06:00 [pubmed], 2023/03/07 06:00 [medline]
MeSH Terms: Child, Child, Preschool, Humans, Female, Male, *Pediatric Obesity/prevention & control, Pilot Projects, Sedentary Behavi

Title: Strengthening Routine Data Reporting in Private Hospitals in Lagos, Nigeria.
Abstract: The availability of routine health information is critical for effective health 
Authors: Ohiri K, Makinde O, Ogundeji Y, Mobisson N, Oludipe M, Ohiri K
Corporate Author(s): 
Affiliation: Health Strategy and Delivery Foundation, Abuja, Nigeria.
Journal: Health policy and planning
Publication Date: 2023 Jun 3
Place of Publication: England
Publication History Status: 2022/02/06 00:00 [received], 2023/03/07 00:00 [revised], 2023/06/02 00:00 [accepted], 2023/06/06 19:12 [medline], 2023/06/06 19:12 [pubmed], 2023/06/06 17:45 [entrez]
MeSH Terms: 
Language: eng


PubMed ID: - 37256762
Title: Resilience of front-line facilities during COVID-19: evidence from 
Abstract: Responsive primary health-care facilities are the foundation of resilient health 
Authors: Peters MA, Ahmed T, Azais V, Amor Fernandez P, Baral P, Drouard S, Neill R, Bachir K, Bassounda P, Dube Q, Flora S, Montufar E, Nzelu C, Tassemb

Using the Medline parser for XML files:

In [111]:
from Bio import Entrez
from Bio import Medline

def fetch_publication_info(email, journal_list, start_date=None, end_date=None, max_results=None):
    """
    Fetch publication information from specified academic journals on PubMed within a date range.

    Args:
        email (str): Your email address for accessing PubMed.
        journal_list (list): List of academic journals to search.
        start_date (str): Start date in the format 'YYYY/MM/DD' to filter articles.
        end_date (str): End date in the format 'YYYY/MM/DD' to filter articles.
        max_results (int): Maximum number of search results to fetch.

    Returns:
        dict: A dictionary containing the fetched publication information.
            Keys are PubMed IDs, and values are dictionaries with the following keys:
            - 'pubmed_id': PubMed ID.
            - 'title': Title of the article.
            - 'abstract': Abstract of the article.
            - 'authors': List of authors.
            - 'corporate_authors': List of corporate authors.
            - 'affiliation': Affiliation of the authors.
            - 'journal': Journal name.
            - 'publication_date': Publication date of the article.
            - 'place_of_publication': Place of publication.
            - 'publication_history_status': Publication history status.
            - 'mesh_terms': List of MeSH terms.
            - 'language': Language of the article.
    """

    # Set your email address (required by PubMed)
    Entrez.email = email

    # Initialize a dictionary to store publication information
    publication_info = {}

    for journal in journal_list:
        # Build the query to search for articles from the specified journal
        query = f'"{journal}"[Journal]'

        # If date range is specified, add it to the query
        if start_date and end_date:
            query += f' AND ("{start_date}"[Date - Publication] : "{end_date}"[Date - Publication])'

        # Fetch IDs of all articles matching the query (up to the max_results)
        handle = Entrez.esearch(db='pubmed', term=query, retmax=max_results, usehistory='y')
        record = Entrez.read(handle)
        handle.close()

        if 'IdList' not in record or not record['IdList']:
            print(f"No articles found for journal: {journal}")
            continue

        total_results = int(record['Count'])
        print(f"Total articles found for journal '{journal}': {total_results}")

        # Fetch the PubMed records for the search results
        id_list = record['IdList']
        handle = Entrez.efetch(db='pubmed', id=id_list, rettype='medline', retmode='text')
        
        # Parse the Medline records and extract the relevant information
        records = Medline.parse(handle)
        for medline_record in records:
            article_info = {}
            pubmed_id = medline_record.get('PMID', '')
            article_info['pubmed_id'] = pubmed_id

            # Extract article information from the Medline record
            article_info['title'] = medline_record.get('TI', '')
            article_info['abstract'] = medline_record.get('AB', '')

            authors = medline_record.get('AU', [])
            article_info['authors'] = authors

            corporate_authors = medline_record.get('CN', [])
            article_info['corporate_authors'] = corporate_authors

            affiliation = medline_record.get('AD', [])
            article_info['affiliation'] = affiliation

            article_info['journal'] = medline_record.get('JT', '')

            publication_date = medline_record.get('DP', '')
            article_info['publication_date'] = publication_date

            article_info['place_of_publication'] = medline_record.get('IS', '')

            publication_history_status = medline_record.get('PHST', [])
            article_info['publication_history_status'] = publication_history_status

            mesh_terms = medline_record.get('MH', [])
            article_info['mesh_terms'] = mesh_terms

            article_info['language'] = medline_record.get('LA', '')

            # Update the publication_info dictionary with the information for the current article
            publication_info.update({pubmed_id: article_info})

    return publication_info

In [112]:
email_address = 'your_email@example.com'
journals = ['Lancet Glob Health', 'Health Policy Plan']
start_date = '2023/01/01'
end_date = '2023/07/21'
max_results = 1000  # Fetch up to 1000 search results per journal

pub_info = fetch_publication_info(email_address, journals, start_date, end_date, max_results)

# Print the fetched information for each article
for pmid, info in pub_info.items():
    print(f"PubMed ID: {pmid}")
    print(f"Title: {info.get('title', '')}")
    print(f"Abstract: {info.get('abstract', '')}")
    print(f"Authors: {', '.join(info.get('authors', []))}")
    print(f"Corporate Author(s): {', '.join(info.get('corporate_authors', []))}")
    print(f"Affiliation: {', '.join(info.get('affiliation', []))}")
    print(f"Journal: {info.get('journal', '')}")
    print(f"Publication Date: {info.get('publication_date', '')}")
    print(f"Place of Publication: {info.get('place_of_publication', '')}")
    print(f"Publication History Status: {', '.join(info.get('publication_history_status', []))}")
    print(f"MeSH Terms: {', '.join(info.get('mesh_terms', []))}")
    print(f"Language: {info.get('language', '')}")
    print("\n")

Total articles found for journal 'Lancet Glob Health': 252
Total articles found for journal 'Health Policy Plan': 83
PubMed ID: 37487517
Title: Defending the right to health during Sudan's civil war.
Abstract: 
Authors: Mohammed FEA, Viva MIF, Awadalla WAG, Elmahi OKO, Wainstock D, Patil P
Corporate Author(s): 
Affiliation: Faculty of Medicine, University of Khartoum, Khartoum, Sudan., NOVA Medical School, 1169-056 Lisbon, Portugal. Electronic address: inem.viva@gmail.com., Faculty of Medicine, Elrazi University, Khartoum, Sudan., Faculty of Medicine, Ibn Sina University, Khartoum, Sudan., Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Brazil., London School of Hygiene & Tropical Medicine, London, UK.
Journal: The Lancet. Global health
Publication Date: 2023 Jul 21
Place of Publication: 2214-109X (Electronic) 2214-109X (Linking)
Publication History Status: 2023/07/25 01:09 [medline], 2023/07/25 01:09 [pubmed], 2023/06/07 00:00 [received], 2023/06/16 00:00 [accepted],

Corporate Author(s): 
Affiliation: Karolinska Institutet, Stockholm, Sweden. Electronic address: yuxia.wei@ki.se., Karolinska Institutet, Stockholm, Sweden; Sun Yat-sen University, Guangzhou, China., Karolinska Institutet, Stockholm, Sweden.
Journal: The Lancet. Global health
Publication Date: 2023 Mar
Place of Publication: 2214-109X (Electronic) 2214-109X (Linking)
Publication History Status: 2023/03/03 03:14 [entrez], 2023/03/04 06:00 [pubmed], 2023/03/07 06:00 [medline]
MeSH Terms: Adult, Child, Humans, Adiposity/genetics, Correlation of Data, *Diabetes Mellitus, Type 2, Genome-Wide Association Study, *Insulins, *Latent Autoimmune Diabetes in Adults, *Pediatric Obesity/epidemiology/genetics, Mendelian Randomization Analysis
Language: ['eng']


PubMed ID: 36863385
Title: Afghan women are essential to humanitarian NGO work.
Abstract: 
Authors: Essar MY, Raufi N, Head MG, Nemat A, Bahez A, Blanchet K, Shah J
Corporate Author(s): 
Affiliation: Department of Global Health, McMaster Unive

Place of Publication: 1460-2237 (Electronic) 0268-1080 (Print) 0268-1080 (Linking)
Publication History Status: 2022/06/06 00:00 [received], 2023/03/30 00:00 [revised], 2023/05/09 00:00 [accepted], 2023/06/19 13:08 [medline], 2023/05/10 12:42 [pubmed], 2023/05/10 09:23 [entrez]
MeSH Terms: Humans, *Transients and Migrants, Pakistan, Qatar, Ecosystem, Policy Making
Language: ['eng']


PubMed ID: 37148361
Title: 'All my co-workers are good people, but...': collaboration dynamics between frontline workers in rural Uttar Pradesh, India.
Abstract: Multisectoral collaboration has been identified as a critical component in a wide variety of health and development initiatives. For India's Integrated Child Development Services (ICDS) scheme, which serves >100 million people annually across more than one million villages, a key point of multisectoral collaboration-or 'convergence', as it is often called in India-is between the three frontline worker cadres jointly responsible for delivering essen