This notebook is a clean version of the topics review playground. It runs summarization either locally through a direct Gemini call or
through a Google Cloud Function. The input is a JSON file exported from a PubTrends request (paper or keyword search).

This was a prototype for integration with the hackathon project.

In [190]:
from IPython.display import display, HTML

In [1]:
import os
import gzip

In [2]:
import pandas as pd
import numpy as np

import requests
import json
import re
from urllib.parse import urlencode

# Summarization service prototype

In [3]:
# file_path = "/Users/romeo/work/pubtrends_related/hackathon/json_examples/pubmed-human-aging-most-cited-1000.json.gz"
# file_path = "/Users/romeo/work/pubtrends_related/hackathon/json_examples/pubmed-drug-resistance-in-cancer.json.gz"
# file_path = "/Users/romeo/work/pubtrends_related/hackathon/json_examples/pubmed-fibrosis-from-mechanisms-to-medicines.json.gz"
file_path = "/Users/romeo/work/pubtrends_related/hackathon/json_examples/pubmed-hallmarks-of-aging-an-expanding-universe.json.gz"

## Load & Prepare data & Get Result From Lambda

In [4]:
from pysrc.papers.data import AnalysisData
from pysrc.papers.plot.plotter import get_topics_description

In [5]:
# Open and load the JSON file
# with open(file_path, 'r', encoding='utf-8') as file:
with gzip.open(file_path, 'rt') as file:
    json_data = json.load(file)

    data = AnalysisData.from_json(json_data)


The default value will be changed to `edges="edges" in NetworkX 3.6.


  nx.node_link_graph(data, edges="links") to preserve current behavior, or
  nx.node_link_graph(data, edges="edges") for forward compatibility.


In [6]:
def filter_by_connectivity(df, graph, percentile=75, preferred_count=None):
    # Step 1: Compute connectivity (without modifying df)
    connectivity = df['id'].apply(lambda pid: len(list(graph.neighbors(pid))))

    # Step 2: Compute the connectivity threshold
    if preferred_count is None:
        # No preferred count given => limit by percentile only
        threshold = np.percentile(connectivity, percentile)
    elif len(df) <= preferred_count:
        # If few elements => TAKE all
        threshold = np.nanmin(connectivity)
    else:
        # If need to limit => limit by percentile, but keep at least preferred_count papers
        threshold = min(np.percentile(connectivity, percentile), sorted(connectivity, reverse=True)[preferred_count - 1])

    # Step 3: Get mask for nodes above threshold
    above_threshold_mask = connectivity >= threshold

    # print(f"len(df) = {len(df)}, threshold = {threshold}, {percentile}-th : {np.percentile(connectivity, percentile)}")
    # print(sorted(connectivity, reverse=True))
    # print("---")


    # Step 4: Apply the mask
    filtered_df = df[above_threshold_mask].copy()
    filtered_df['connections'] = connectivity[above_threshold_mask].values

    # Step 5: If preferred_count is specified, take top N by connections
    if (preferred_count is not None) and len(filtered_df) > preferred_count:
        filtered_df = filtered_df.sort_values('connections', ascending=False).head(preferred_count)

    return filtered_df
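
The threshold rule in `filter_by_connectivity` can be checked on plain numbers. The following is a minimal sketch, not part of the original pipeline: `linear_percentile` and `connectivity_threshold` are hypothetical helpers, and the percentile is re-implemented with linear interpolation (assumed to match the default behaviour of `np.percentile`) so the example runs without third-party packages.

```python
import math

def linear_percentile(values, percentile):
    """Percentile with linear interpolation between the closest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * percentile / 100
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)

def connectivity_threshold(degrees, percentile, preferred_count):
    """Hypothetical helper mirroring Step 2 of filter_by_connectivity."""
    if len(degrees) <= preferred_count:
        return min(degrees)  # few papers: keep all of them
    nth_largest = sorted(degrees, reverse=True)[preferred_count - 1]
    # Take the looser of the two cut-offs so at least preferred_count papers pass
    return min(linear_percentile(degrees, percentile), nth_largest)

degrees = list(range(1, 11))  # toy node degrees 1..10
threshold_a = connectivity_threshold(degrees, percentile=75, preferred_count=5)
threshold_b = connectivity_threshold(degrees, percentile=75, preferred_count=2)
kept_a = [d for d in degrees if d >= threshold_a]
kept_b = [d for d in degrees if d >= threshold_b]
print(threshold_a, kept_a)
print(threshold_b, kept_b)
```

With `preferred_count=2` three papers pass the threshold, which is why Step 5 afterwards trims the result to the top N by connections.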

In [194]:
def prepare_abstracts_for_topic(data, topic_name, *, preferred_count_per_topic, connectivity_percentile_thr):
    filtered_df = data.df[data.df.comp == topic_name]

    highly_connected_df = filter_by_connectivity(
        filtered_df,
        data.papers_graph,
        percentile=connectivity_percentile_thr,
        preferred_count=preferred_count_per_topic
    )
    abstract_entries = highly_connected_df[['id', 'abstract']].to_dict(orient='records')

    topic_data = {
        'abstracts': abstract_entries
        #'abstracts': abstract_entries[0:2] # XXX: Playground
    }

    print(f"Highly connected papers: {len(filtered_df)} -> {len(highly_connected_df)}")

    return topic_data


In [196]:
import re
import html

def convert_to_html(text):
    # Step 1: Escape HTML special characters
    text = html.escape(text)

    # Step 2: Replace PMID references with links
    text = re.sub(
        r'PMID=(\d+)',
        r'<a href="https://pubmed.ncbi.nlm.nih.gov/\1" target="_blank">PMID: \1</a>',
        text
    )

    # Step 3: Convert double newlines or newlines to <p> blocks
    paragraphs = re.split(r'\n\s*\n|\n', text)
    html_paragraphs = [f"<p>{para.strip()}</p>" for para in paragraphs if para.strip()]

    return "\n".join(html_paragraphs)
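
The order of the steps in `convert_to_html` matters: escaping happens before the PMID links are inserted, so the generated `<a>` tags are not escaped themselves. A minimal sketch of that ordering follows; `link_pmids` and the sample sentence are illustrative assumptions, not code from the pipeline.

```python
import html
import re

PMID_LINK = r'<a href="https://pubmed.ncbi.nlm.nih.gov/\1" target="_blank">PMID: \1</a>'

def link_pmids(text):
    """Escape first, then linkify, so the generated <a> tags survive."""
    escaped = html.escape(text)
    return re.sub(r'PMID=(\d+)', PMID_LINK, escaped)

# Hypothetical sample containing characters that need escaping
sample = "TERT loss & senescence <in vivo> (PMID=38475941)"
linked = link_pmids(sample)
print(linked)
```

Reversing the two steps would escape the anchor markup as well and show raw tags in the rendered table.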


In [223]:
from IPython.display import HTML, display

def _render_topic_html(topic_name, keyword_based_title, summary):
    return [
        "<tr>",
        "<td style='padding: 1em; border-bottom: 1px solid #ccc; text-align: left;'>",
        f"<h3 style='margin: 0 0 0.5em 0; font-weight: normal;'>Topic {topic_name}</h3>",
        f"<p style='margin: 0 0 0.5em 0;'><strong>{keyword_based_title}</strong></p>",
        "<hr style='border: none; border-top: 1px solid #ccc;'/>",
        convert_to_html(summary),
        "<hr style='border: none; border-top: 1px dashed #ccc;'/>",
        "</td>",
        "</tr>",
    ]

def _render_topic_plain(topic_name, keyword_based_title, summary):
    return [
        f"================ Topic {topic_name} =========================================",
        f"[TITLE]",
        keyword_based_title,
        f"--------------------------------",
        summary,
        f"--------------------------------\n",
    ]

def summarize_topics(
        *,
        data,
        topic_description_words=10,
        force_topic_name=None,
        as_html: bool = True
):
    preferred_count_per_topic = 50
    connectivity_percentile_thr = 50

    pubmed_cluster_names = sorted(data.df.comp.unique())

    topics_keywords = get_topics_description(
        data.df,
        data.corpus, data.corpus_tokens, data.corpus_counts,
        n_words=topic_description_words
    )

    output = []
    if as_html:
        output.append('<table style="width:100%; border-collapse: collapse;">')

    for topic_name in pubmed_cluster_names:
        if (force_topic_name is not None) and (topic_name != force_topic_name):
            continue

        print(f"Processing topic {topic_name}")

        topic_data = prepare_abstracts_for_topic(
            data, topic_name,
            connectivity_percentile_thr=connectivity_percentile_thr,
            preferred_count_per_topic=preferred_count_per_topic
        )

        summary = prompt_summarize_abstracts(topic_data)

        if summary:
            topic_summary_data = {
                "summary": summary,
                "topics_keywords": [k for k, v in topics_keywords[topic_name]],
            }
            keyword_based_title = prompt_assign_title_to_summary(topic_summary_data)
        else:
            keyword_based_title = ""

        # Render per topic
        if as_html:
            output.extend(_render_topic_html(int(topic_name)+1, keyword_based_title, summary))
        else:
            output.extend(_render_topic_plain(int(topic_name)+1, keyword_based_title, summary))

    if as_html:
        output.append("</table>")

    return "\n".join(output)

## Prompt System Instructions

In [159]:
SYSTEM_INSTRUCTIONS_LLM_PROMPT_SUMMARY_TITLE = """
You are a research bot, tasked with helping scientific researchers to assign a title to a scientific text about some topic using the text and keywords submitted to you. Your job is to summarize the scientific topic text into one sentence using the suggested keywords.

Be sure to:
* make a representative and short title for the given text
* focus on the main points of the text
* keep it condensed and to the point
* NEVER output more than one sentence
* do not hallucinate
<EXCEPTION>
by NO MEANS output more than one sentence. Do not highlight the title as a bold string. E.g. not like below:
**Virally Infected Diseases and Vaccine Strategies in Virology**

This research discusses respiratory infections, including COVID-19, and HPV-associated diseases. The emergence of SARS-CoV-2 led to a global pandemic, with vaccines showing efficacy. Community-acquired pneumonia and other respiratory viruses are also discussed. HPV infection is
</EXCEPTION>
"""

In [146]:
SYSTEM_INSTRUCTIONS_LLM_PROMPT_SUMMARIZE_ABSTRACT = """
You are a research bot, tasked with helping scientific researchers to explore scientific papers quicker. Your job is to summarize the text submitted to you.

Be sure to:
* do not add title to the summary
* NEVER make summary longer than 400 words
* tell it as a story where each sentence is ALWAYS logically connected with the previous one
* by NO MEANS put short sentences that are not logically connected
* focus on the main points of the text
* keep it condensed and to the point
* the submitted text is a set of blocks, each block marked with its own ID: the line in the block starting with the PMID= prefix
* if a sentence of the summary is based on information from one or more blocks, specify at the end of this sentence the block IDs used to generate it
* Organize block IDs as a comma-separated list in round brackets, e.g. (PMID=0000001, PMID=11111111, PMID=777777)
* do not hallucinate, please never hallucinate


<EXCEPTION>
by NO MEANS put short sentences like below:

**Comprehensive Analysis of Aging Mechanisms and Potential Interventions**

Aging leads to various cellular and molecular changes, contributing to age-related diseases and functional decline. Several studies explore these mechanisms and potential interventions.

One study found that Procyanidin C1 (PCC1), a compound with senolytic and senomorphic properties, counteracts aging-related changes in the hematopoietic and immune system by improving physiological parameters, increasing B cells and hematopoietic stem cells, suppressing senescence markers, and restoring immune homeostasis. PMID=40316527

Skeletal muscle deterioration, a hallmark of aging, involves reduced SIRT5 expression, leading to cellular senescence and inflammation; SIRT5 desuccinylates TBK1, suppressing inflammation and improving muscle function, suggesting the SIRT5-TBK1 pathway as a target for combating age-related muscle degeneration. PMID=40087407

Endothelial cell senescence, resulting from telomerase inactivation, induces transcriptional changes indicative of senescence and tissue hypoxia, compromising the blood-brain barrier and reducing muscle endurance, indicating that Tert loss causes EC senescence through a telomere length-independent mechanism undermining mitochondrial function. PMID=38475941

Blood-borne factors like osteocalcin (OCN) are crucial for maintaining neuronal synaptic plasticity, and OCN's effects are mediated by a primary cilium (PC) protein-autophagy axis; during aging, autophagy and PC core proteins are reduced, and restoring their levels improves cognitive impairments, suggesting the PC-autophagy axis as a gateway for communication between blood-borne factors and neurons. PMID=39984747

Mitochondrial dysfunction is a key aging determinant, and defects in mitochondrial protein and organelle quality control have been linked to various age-related diseases. PMID=37731280

Hyperactivation of mTORC1 signaling with aging contributes to cardiac dysfunction by dysregulating proteostasis, as shown in a 4EBP1 KO mouse model mimicking a hyperactive mTORC1/4EBP1/eIF4E axis. PMID=39379739

Dietary protein, particularly branched-chain amino acids (BCAAs), influences healthy aging; BCAA restriction protects against metabolic consequences of high protein diets and has tissue-specific effects on cellular senescence. PMID=39868338

Macroautophagy decreases with age, but mitophagy, the selective autophagic degradation of mitochondria, may increase or remain unchanged; pharmacological induction of mitophagy attenuates inflammation and ameliorates neurological function, pointing to mitophagy induction as a strategy to decrease age-associated inflammation. PMID=38280852

PGC-1, a mitochondrial regulator, is repressed with aging in the brain and is integral in coordinating metabolism and growth signaling, placing it centrally in a growth and metabolism network relevant to brain aging. PMID=40021651

Apigenin, a bioactive plant compound, may protect against age-related cognitive dysfunction by suppressing neuro-inflammatory processes driven by glial cells. PMID=38007051

Inhibition of mitochondrial malate dehydrogenase (MDH2) delays the aging process through metabolic-epigenetic regulation, identifying MDH2 as a potential therapeutic target for anti-aging drug development. PMID=39962087

The SATB protein DVE-1 influences lifespan independent of its canonical mitoUPR function, suggesting broader functions in modulating longevity and defending against stress. PMID=39423131

TFEB deficiency in the proximal tubules causes metabolic disorders and mitochondrial dysfunction, shedding light on the mechanisms of APOA4 amyloidosis pathogenesis and providing a therapeutic strategy for CKD-related metabolic disorders. PMID=39699959

β-hydroxybutyrate (HB), a ketone body, regulates protein solubility, selectively targeting pathological proteins like amyloid-β, suggesting a metabolically regulated mechanism of proteostasis relevant to aging and Alzheimer's disease. PMID=39626664

LRP5 promotes lower-body fat distribution and enhances insulin sensitivity, independent of its bone-related functions, and its activation may prevent age-related fat redistribution and metabolic disorders. PMID=40000740

Aging promotes STAT1 β-hydroxybutyrylation, attenuating IFN-I-mediated antiviral defense activity, and fructose can improve IFN-I antiviral defense activity by orchestrating STAT1 O-GlcNAc and β-hydroxybutyrylation modifications. PMID=39979583

HIRA and PML are essential for SASP expression, activating SASP through a CCF-cGAS-STING-TBK1-NF-κB pathway. PMID=39178863

TMEM242 depletion impairs ATP synthase, elevates ROS, upregulates sirt6 and nrf2, and increases f9a transcripts, potentially leading to bleeding tendencies. PMID=39856164

A disease-causing mutation of ABCA6 is identified for FPD, and ABCA6 is correlated with PD occurrence and subsequent OA progression, serving as a potential target in chondrogenesis and OA treatment by orchestrated intracellular cholesterol efflux and delayed cellular senescence. PMID=39823538

Endogenous DNA damage promotes hallmarks of age-related retinal degeneration, as shown in Ercc1-/- mice, which model a human progeroid syndrome. PMID=39604117

ACSS2 promotes the acetylation of PAICS, limiting purine metabolism and exacerbating cytoplasmic chromatin fragment accumulation and SASP, identifying ACSS2 as a potential senomorphic target to prevent senescence-associated diseases. PMID=40021646

SPP1 activates ITG5/1 to inhibit mitophagy, accelerates NPs degeneration, and induces calcification, leading to intervertebral disc degeneration (IVDD) and calcification. PMID=39721032

STXBP5 overexpression accelerates senescence, while STXBP5 deletion suppresses progerin expression, delaying senility, and decreasing the expression of senescence-related factors. PMID=39379476

Compartment-targeted FlucDM sensors pinpoint a diverse modulation of subcellular proteostasis by aging regulators. PMID=39383859

IGF-1 signaling plays a crucial role in preserving a youthful cerebromicrovascular endothelial phenotype and maintaining the integrity of the BBB. PMID=38082450

Aged hippocampal mitochondria exhibit impaired bioenergetic function, increased ROS production, deregulation of calcium homeostasis, and decreased mitochondrial biogenesis. PMID=36982549

TBK1-ATAD3A-Pink1 axis drives cellular senescence, suggesting a potential mitochondrial target for anti-aging therapy. PMID=39520088

Aging causes widespread reduction of proteins enriched in basic amino acids that is independent of mRNA regulation, and aberrant translation pausing leads to reduced ribosome availability resulting in proteome remodeling. PMID=38260253

Cysteine oxidation of muscle proteins impairs muscle power and strength, walking speed, and cardiopulmonary fitness with aging. PMID=38332629

CUL2FEM1B senses ROS produced by complex III of the electron transport chain (ETC), helping cells adjust their ETC to changing environments. PMID=39642856

HSF-1 mediates lifespan extension through mitochondrial network adaptations that occur in response to down-tuning of components associated with organellar protein degradation pathways. PMID=39532882

YBR238C oppositely affect mitochondria and aging, modulating mitochondrial function, demonstrating a feedback loop between TORC1 and mitochondria (the TORC1-MItochondria-TORC1 (TOMITO) signaling process) that regulates cellular aging processes. PMID=38713053

Mitochondrial metabolic modulation contributes to the longevity of daf-2 mutants, highlighting the crucial role of mitochondria in aging. PMID=40136535

MAVS safeguards mitochondrial homeostasis and antagonizes human stem cell senescence. PMID=37521327

TMEM135 is crucial for regulating mitochondria, peroxisomes, and lipids, emphasizing the importance of a balanced TMEM135 function for the health of the retina and other tissues. PMID=38576540

Blocking neddylation increased cellular hallmarks of aging and led to an increase in Tau aggregation and phosphorylation in neurons carrying the APPswe/swe mutation, indicating that cellular aging can reveal late-onset disease phenotypes. PMID=38917806

Mitochondrial DNA turnover in rat skeletal muscle decreases with age, contributing to losses of mitochondrial genomic integrity and potentially playing a role in skeletal muscle dysfunction. PMID=39312152

Suppression of NF-κB in cardiomyocytes leads to pronounced cardiac remodeling, dysfunction, and cellular damage associated with the aging process, influencing both cellular senescence and molecular damage pathways. PMID=39857807

MicroRNAs and neuropeptide-like proteins can form molecular regulatory networks involving downstream molecules to regulate lifespan, and such regulatory effects vary on environmental conditions. PMID=39323014

Pharmacological elevation of CISD2 expression at a late-life stage using hesperetin treatment is a feasible approach to effectively mitigating both intrinsic and extrinsic skin aging. PMID=38263133

Oxidative protein folding in the ER promotes cell aging, providing a potential target for aging and aging-related disease intervention. PMID=37306027

Mitochondrial morphology changes during aging, and C. elegans serve as a robust model for rapidly measuring mitochondrial

* Do not put the same PMID more than once in one comma-separated list, like below:

Yin-Chen-Hao-Tang (YCHT) exhibited anti-fibrotic effects on the liver, potentially by suppressing oxidative stress and lipid peroxidation, restoring levels of metabolites like unsaturated fatty acids and lysophosphatidylcholines (PMID=26805802). Sophora flavescens (Kushen) itself showed anti-fibrosis activity, with studies identifying potential active compounds and targets through integrated network pharmacology and biomedical analysis (PMID=27754507). Fufang Biejia Ruangan Pill (FFBJ), an approved anti-fibrosis drug, contains various components including organic acids, terpenoids, flavonoids, phenylpropanoids, and alkaloids, with studies identifying absorbed components in vivo to understand its material basis (PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, 
PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854, PMID=26724854,

</EXCEPTION>
"""

In [147]:
# SYSTEM_INSTRUCTIONS_LLM_PROMPT_SUMMARIZE_ABSTRACT = """
# You are a research bot, tasked with helping scientific researchers to explore scientific papers quicker.
#
# You are provided with a structured text where each paragraph marked with their "pmid" identifiers in a json format:
# [
#   {
#     "pmid": "37481675",
#     "paragraph": "<Here is the paragraph text>"
#   },...
# ]
#
# Your job is to summarize text as if it is some scientific text about some topic.
#
# If sentence from summary is based on on information from one or more "paragraph", specify at the end of this sentence "pmid"  of paragraphs used to generate the sentence. Organize sentence "pmid" ids as coma-separated list in round brackets, e.g  (PMID=0000001, PMID=11111111, PMID=777777)
#
# Be sure to:
# * do not add title to the summary
# * NEVER make summary text longer than 400 words if we count only words from sentences
# * tell as a story where each sentence are ALWAYS logically connected with previous
# * put by NO MEANS short sentences that are not logically connected
# * focus on the main points of the text
# * keep it condense and to the point
# * do not hallucinate, please never hallucinate
# """


In [148]:
# SYSTEM_INSTRUCTIONS_LLM_PROMPT_SUMMARIZE_ABSTRACT = """
# You are a research bot, tasked with helping scientific researchers to explore scientific papers quicker.
#
# You are provided with abstracts with their "pmid" identifiers in a json format:
# [
#   {
#     "pmid": "37481675",
#     "abstract": "<Here is the abstract of the first paper>"
#   },...
# ]
#
# Your job is to summarize all submitted abstracts as if it is one text about some topic.
#
# Your output format is a json file with a given structure:
# {
#     "summary": [
#         "paragraph": [
#             {
#                 "sentence": "<sentence from text>",
#                 "pmid_references": [
#                     "00001", "11111", "222222
#                 ]
#             }, ...
#         ], ...
#     ]
# }
# Summary could contain from one or more paragraphs. Each paragraph has one or more sentence.
# If sentence from summary is based on on information from one or more "abstract", you need to add corresponding "pmid" to "pmid_references" list for this "sentence".
#
# You task is to output every sentence from summary in a specified output format. Keeping valid output format is very important for my research.
#
# Be sure to:
# * make a representative and short title for the summary
# * NEVER make summary text longer than 400 words if we count only words from sentences
# * tell as a story where each sentence are ALWAYS logically connected with previous
# * put by NO MEANS short sentences that are not logically connected
# * focus on the main points of the text
# * keep it condense and to the point
# * do not hallucinate, please never hallucinate
# """

## Summarization wrappers for Local Execution

The `summarize_topics(..)` function works through this API. This implementation executes the prompts directly from the local machine, without Google Cloud infrastructure.

In [227]:
from google import genai
from google.genai import types
import base64

In [228]:
VERTEX_CLIENT_PROJECT = os.getenv('VERTEX_CLIENT_PROJECT', 'default_project')

In [236]:
def prompt_summarize_abstracts(topic_data):
    if 'abstracts' in topic_data:
        abstract_entries = topic_data['abstracts']

        validated_entries = []
        for entry in abstract_entries:
            assert 'id' in entry
            assert 'abstract' in entry


            pmid = entry['id']
            abstract = entry['abstract']

            # TODO: validate id & abstract size & type, truncated if too long

            validated_entries.append(dict(pmid=pmid, abstract=abstract))

        assert len(abstract_entries) == len(validated_entries)

        return do_llm_prompt_summarize_abstracts(validated_entries)


    # XXX: error
    return ""
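
The `TODO` in `prompt_summarize_abstracts` asks for validation of the ID and the abstract. One possible shape for it is sketched below, under stated assumptions: `validate_entry` and `MAX_ABSTRACT_CHARS` are hypothetical names, the 4000-character limit is an arbitrary placeholder, and IDs are assumed to be numeric PubMed IDs.

```python
MAX_ABSTRACT_CHARS = 4000  # hypothetical limit, not taken from the notebook

def validate_entry(entry, max_chars=MAX_ABSTRACT_CHARS):
    """Return a cleaned dict(pmid=..., abstract=...) or None if the entry is unusable."""
    pmid = str(entry.get('id', '')).strip()
    abstract = entry.get('abstract')
    # Assumption: a usable entry has a numeric ID and a non-empty text abstract
    if not pmid.isdigit() or not isinstance(abstract, str) or not abstract.strip():
        return None
    # Truncate overly long abstracts instead of rejecting them
    return dict(pmid=pmid, abstract=abstract.strip()[:max_chars])

ok = validate_entry({'id': 38475941, 'abstract': '  Endothelial cell senescence...  '})
bad = validate_entry({'id': '1', 'abstract': None})
cut = validate_entry({'id': '1', 'abstract': 'x' * 10}, max_chars=4)
print(ok, bad, cut)
```

Returning `None` for unusable entries would let the caller skip them rather than fail on an `assert`; whether skipping is acceptable depends on how the summaries are used.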

In [237]:
def prompt_assign_title_to_summary(topic_summary_data):
    if ('summary' in topic_summary_data) and ('topics_keywords' in topic_summary_data):
        #print(request_data['topics_keywords'])

        return do_llm_prompt_assign_title_to_summary(topic_summary_data['summary'], topic_summary_data['topics_keywords'])

    # XXX: error
    return ""

## Local LLM Prompts Functions

These functions can be used directly in the Google Cloud Function implementation.

In [153]:
def do_llm_prompt_assign_title_to_summary(summary, topics_keywords):
    args = dict(
        model="gemini-2.0-flash-lite-001",
        temperature = 0.01, top_p = 0.95, top_k=None, max_output_tokens = 64,
    )

    topic_desc = ",".join(topics_keywords)
    llm_response = llm_generate(
        project=VERTEX_CLIENT_PROJECT,
        system_instructions=SYSTEM_INSTRUCTIONS_LLM_PROMPT_SUMMARY_TITLE,
        text_to_summarize=f"KEYWORDS: {topic_desc}\nTEXT:\n{summary}",
        **args
    )

    #print(llm_response)
    print("[DONE] Response len", len(llm_response))

    return llm_response

In [154]:
def do_llm_prompt_summarize_abstracts(entries):
    args = dict(
        model="gemini-2.5-flash-preview-04-17", thinking_budget=1000,
        # model="gemini-2.0-flash-001",
        # model="gemini-2.0-flash-lite-001",
        temperature=0.1, top_p=0.75, top_k=30, max_output_tokens=8192,
    )

    # entries = entries[0:5] # TODO: FIX

    # text_to_summarize=json.dumps(entries, ensure_ascii=False, indent=2)
    text_to_summarize = "\n".join(f"\nPMID={e['pmid']}\n{e['abstract']}" for e in entries)

    llm_response = llm_generate(
        project=VERTEX_CLIENT_PROJECT,
        system_instructions=SYSTEM_INSTRUCTIONS_LLM_PROMPT_SUMMARIZE_ABSTRACT,
        text_to_summarize=text_to_summarize,
        **args
    )

    #print(llm_response)
    print("[DONE] Response len", len(llm_response))

    return llm_response
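
The system prompt asks the model not to repeat a PMID inside one citation list, but the failure case quoted there shows it can still happen. A post-processing guard is one way to catch it; the sketch below is an assumption about how that could look (`dedupe_pmid_lists` is a hypothetical helper, not part of the notebook) and only touches lists written as `(PMID=..., PMID=...)`.

```python
import re

# A parenthesised, comma-separated list of PMID=<digits> items (trailing comma tolerated)
PMID_LIST_RE = re.compile(r'\(\s*(PMID=\d+(?:\s*,\s*PMID=\d+)*)\s*,?\s*\)')

def dedupe_pmid_lists(summary):
    """Drop repeated PMIDs inside each citation list, keeping first-seen order."""
    def _dedupe(match):
        pmids = re.findall(r'PMID=\d+', match.group(1))
        unique = list(dict.fromkeys(pmids))  # dict preserves insertion order
        return "(" + ", ".join(unique) + ")"
    return PMID_LIST_RE.sub(_dedupe, summary)

raw = "FFBJ contains alkaloids (PMID=26724854, PMID=26724854, PMID=26805802, PMID=26724854)."
cleaned = dedupe_pmid_lists(raw)
print(cleaned)
```

A list that the model never closes with a bracket, as in the truncated example in the prompt, is left untouched by this pattern and would need separate handling.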

In [155]:
def llm_generate(
        *,
        project,
        system_instructions,
        text_to_summarize,
        model, temperature, top_p, top_k, max_output_tokens,
        thinking_budget=None,
        max_retries=5
):
    # print(text_to_summarize)
    # if True:
    #     return ""

    client = genai.Client(
        vertexai=True,
        project=project,
        location="us-central1",
    )

    msg1_text1 = types.Part.from_text(text=text_to_summarize)
    si_text1 = system_instructions

    contents = [types.Content(role="user", parts=[msg1_text1])]

    optional_args = {}
    if thinking_budget is not None:
        optional_args['thinking_config'] = types.ThinkingConfig(thinking_budget=thinking_budget)

    generate_content_config = types.GenerateContentConfig(
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        seed=0,
        max_output_tokens=max_output_tokens,
        response_modalities=["TEXT"],
        safety_settings=[types.SafetySetting(
            category="HARM_CATEGORY_HATE_SPEECH",
            threshold="OFF"
        ), types.SafetySetting(
            category="HARM_CATEGORY_DANGEROUS_CONTENT",
            threshold="OFF"
        ), types.SafetySetting(
            category="HARM_CATEGORY_SEXUALLY_EXPLICIT",
            threshold="OFF"
        ), types.SafetySetting(
            category="HARM_CATEGORY_HARASSMENT",
            threshold="OFF"
        )],
        system_instruction=[types.Part.from_text(text=si_text1)],
        **optional_args
    )

    retries_cnt = 0
    while retries_cnt < max_retries:
        chunks = []
        retries_cnt += 1

        for chunk in client.models.generate_content_stream(
                model=model,
                contents=contents,
                config=generate_content_config,
        ):
            assert chunk is not None
            if chunk.text is None:
                # chunk.text is None when something went wrong (XXX: known bug in 2.5) => redo the request
                chunks = None
                break

            # if debug:
            #     print("CHUNK: [", chunk.text, end="]\n")

            chunks.append(chunk.text)

        # join chunks:
        if chunks is None:
            print(f"Retry #{retries_cnt}...")
            continue

        return "".join(chunks)

    raise Exception("Cannot summarize text. Maximum number of retries reached")
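
`llm_generate` retries immediately after a failed stream. If the failures turn out to be transient on the service side, waiting between attempts may help. The sketch below shows exponential backoff as a swapped-in technique, not what the notebook does: `retry_with_backoff` is a hypothetical helper, and the `sleep` parameter is injectable so the example runs without actually waiting.

```python
import time

def retry_with_backoff(attempt, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call attempt() until it returns a non-None value, waiting longer each time."""
    for retry in range(max_retries):
        result = attempt()
        if result is not None:
            return result
        if retry < max_retries - 1:
            sleep(base_delay * 2 ** retry)  # delays grow as 1s, 2s, 4s, ...
    raise RuntimeError("Maximum number of retries reached")

# Usage with a fake attempt that fails twice before succeeding
outcomes = iter([None, None, "summary text"])
delays = []
result = retry_with_backoff(lambda: next(outcomes), sleep=delays.append)
print(result, delays)
```

In `llm_generate`, `attempt` would wrap one pass over the streamed chunks and return `None` when a chunk without text is seen.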


## Run

In [240]:
as_html=False

result = summarize_topics(
    data=data, topic_description_words=10, #config.topic_description_words,
    force_topic_name=1,
    as_html=as_html
)
if as_html:
    display(HTML(result))
else:
    print(result)

Processing topic 1
Highly connected papers: 152 -> 50
[DONE] Response len 8784
[DONE] Response len 181
[TITLE]
Mitochondrial dysfunction and metabolic regulation of proteins are key factors in the aging process, impacting muscle and neuronal function through genetic and cellular mechanisms.

--------------------------------
Aging is a complex process marked by progressive functional decline across various systems, driven by hallmarks such as cellular senescence, mitochondrial dysfunction, and impaired proteostasis (PMID=37731280, PMID=36982549, PMID=39383859). Cellular senescence, characterized by stable proliferation arrest and an inflammatory senescence-associated secretory phenotype (SASP), contributes significantly to age-related pathologies (PMID=39178863, PMID=40021646). The SASP is activated through pathways involving cytoplasmic chromatin fragments, cGAS-STING, TBK1, and NF-κB, with HIRA and PML playing essential roles (PMID=39178863). ACSS2 exacerbates SASP by limiting purine 

# Run Prompts via Google Cloud Functions

## Summarization wrappers for Google Cloud Functions Execution

The `summarize_topics(..)` function works through this API. This implementation prepares the data and calls endpoints in Google Cloud.

In [231]:
GOOGLE_MODEL_SUMMARIZE_TOPIC_ENDPOINT = os.getenv(
    "GOOGLE_MODEL_SUMMARIZE_TOPIC_ENDPOINT",
    "https://put.your.url.endpoint.here"
)

def prompt_summarize_abstracts(topic_data):
    response = requests.post(
        f"{GOOGLE_MODEL_SUMMARIZE_TOPIC_ENDPOINT}",
        json=topic_data, # XXX Pass Python object here, not dump
        # json=json.dumps(topic_data, ensure_ascii=False, indent=2),
        headers={"Content-Type": "application/json"}
    )

    # Handle response
    if response.status_code == 200:
        data = response.json()

        if "summary" in data:
            return data["summary"]

        print(f"❌ Error: No summary in response")
    else:
        print(f"❌ Error: {response.status_code}")

    return ""


In [234]:
GOOGLE_MODEL_SUMMARIZE_TOPIC_TITLE_ENDPOINT = os.getenv(
    "GOOGLE_MODEL_SUMMARIZE_TOPIC_TITLE_ENDPOINT",
    "https://put.your.url.endpoint.here"
)

def prompt_assign_title_to_summary(topic_summary_data):
    """POST a topic summary to the title endpoint; return the title, or "" on failure."""
    response = requests.post(
        GOOGLE_MODEL_SUMMARIZE_TOPIC_TITLE_ENDPOINT,
        json=topic_summary_data,
        headers={"Content-Type": "application/json"}
    )

    # Handle response
    if response.status_code == 200:
        data = response.json()

        if "title" in data:
            return data["title"]

        print("❌ Error: No title in response")
    else:
        print(f"❌ Error: {response.status_code}")

    return ""


## Run

In [226]:
as_html = True

result = summarize_topics(
    data=data,
    topic_description_words=10,  # config.topic_description_words
    # force_topic_name=1,
    as_html=as_html
)
if as_html:
    display(HTML(result))
else:
    print(result)

Processing topic 0
Highly connected papers: 164 -> 50
Processing topic 1
Highly connected papers: 152 -> 50
Processing topic 2
Highly connected papers: 139 -> 50
Processing topic 3
Highly connected papers: 104 -> 50
Processing topic 4
Highly connected papers: 101 -> 50
Processing topic 5
Highly connected papers: 92 -> 50
Processing topic 6
Highly connected papers: 58 -> 50
Processing topic 7
Highly connected papers: 57 -> 50
Processing topic 8
Highly connected papers: 55 -> 50
Processing topic 9
Highly connected papers: 36 -> 36


0
"Topic 1 This study investigates how various biological and lifestyle factors associate with the risk of accelerated aging and mortality in adults. Aging is a complex process characterized by a gradual decline in functional capacity and increased susceptibility to diseases, and various methods are being developed to measure biological age beyond chronological age to better understand individual trajectories (PMID: 38581608, PMID: 38058300). Biomarkers such as proteomic profiles, epigenetic clocks, physiological indicators, and telomere length are utilized to estimate biological age and predict risks for age-related diseases, multimorbidity, and mortality (PMID: 39117878, PMID: 38130910, PMID: 39633416). Longitudinal studies mapping the blood proteome have identified aging-related proteins associated with clinical traits and chronic diseases, leading to the development of proteomic healthy aging scores capable of predicting cardiometabolic disease incidence (PMID: 39805987). Epigenetic clocks, particularly GrimAge and GrimAge2, have demonstrated strong associations with all-cause and cause-specific mortality risk, often showing superior predictive performance compared to earlier clocks like HorvathAge and HannumAge (PMID: 40301953, PMID: 39633416). Physiological age, derived from clinical indicators, also reveals disparities in aging trajectories influenced by factors like sex and education, with higher education potentially providing a greater midlife buffer against physiological aging for women (PMID: 39830243, PMID: 40156883). Growth differentiation factor 15 (GDF-15) is another significant biomarker, increasing with age and correlating with epigenetic aging clocks, impaired glycemic control, inflammation, and physical decline (PMID: 39644331). 
Beyond biological markers, multidomain frameworks integrating objective indicators like hearing loss, tooth loss, and falls with subjective aging perceptions are associated with increased risks of all-cause and cause-specific premature mortality (PMID: 39863635). Social factors, including disadvantage, discrimination, loss of loved ones, and social deficits, are linked to accelerated biological aging and increased risk of age-related diseases, potentially mediated by inflammatory pathways (PMID: 40087516, PMID: 39132086, PMID: 39073817, PMID: 39973988). Lifestyle factors significantly influence biological aging, with physical activity generally associated with lower age acceleration and longevity, while sedentary behavior, particularly screen time, is linked to accelerated epigenetic aging (PMID: 39230733, PMID: 39794269, PMID: 39821867). Dietary factors also play a role, as higher intake of methyl donor nutrients is negatively associated with phenotypic age acceleration, while sugar-sweetened beverage intake, especially in the evening, is positively correlated with it, potentially mediated by obesity (PMID: 40221525, PMID: 39780125). Insulin resistance, indicated by the Triglyceride-glucose index, is positively associated with biological age and accelerated aging risk (PMID: 40022176). Environmental exposures like volatile organic compounds are also linked to biological aging, with daily behaviors potentially influencing susceptibility (PMID: 40264054). Interventions such as enhancing cardiorespiratory fitness, time-restricted eating, and visual arts-mediated cognitive activation therapy show potential in slowing biological aging or mitigating its effects on health outcomes like chronic respiratory diseases, inflammation, oxidative stress, and dementia (PMID: 39391738, PMID: 39861451, PMID: 38524114)."
"Topic 2 Mitochondrial dysfunction and metabolic dysregulation are key factors in aging, impacting muscle and neuronal function, and are regulated by genetic and protein pathways in mouse models. Aging is a complex process marked by cellular and molecular changes like cellular senescence, mitochondrial dysfunction, and impaired proteostasis, which contribute to age-related diseases and functional decline (PMID: 40316527, PMID: 37731280, PMID: 39379739, PMID: 39383859). Cellular senescence, characterized by a stable proliferation arrest and an inflammatory senescence-associated secretory phenotype (SASP), is a key driver of aging (PMID: 39178863, PMID: 40021646). SASP expression is influenced by factors like cytoplasmic chromatin fragments activating the cGAS/STING pathway, and proteins like HIRA and PML are essential for this process (PMID: 39178863, PMID: 40021646). ACSS2 promotes SASP by acetylating PAICS, limiting purine metabolism and exacerbating cytoplasmic chromatin fragment accumulation, suggesting ACSS2 as a senomorphic target (PMID: 40021646). The TBK1-ATAD3A-Pink1 axis also drives cellular senescence, presenting a potential mitochondrial target for anti-aging therapy (PMID: 39520088). Mitochondrial dysfunction is a central hallmark of aging, linked to various age-related diseases and functional decline in tissues like the brain, skeletal muscle, and heart (PMID: 37731280, PMID: 36982549, PMID: 39187977, PMID: 39699959). Aged hippocampal mitochondria show impaired bioenergetic function, increased ROS, calcium deregulation, and decreased biogenesis and mitophagy (PMID: 36982549). Mitochondrial DNA turnover decreases with age in skeletal muscle, contributing to genomic integrity loss and dysfunction (PMID: 39312152). Defects in mitochondrial protein and organelle quality control are linked to age-related diseases (PMID: 37731280). 
The mTORC1 signaling pathway, often hyperactivated with aging, contributes to cardiac dysfunction by dysregulating proteostasis (PMID: 39379739). Loss of proteostasis is also a hallmark of aging and Alzheimer's disease, and the ketone body β-hydroxybutyrate (HB) regulates protein solubility, selectively targeting pathological proteins like amyloid-β (PMID: 39626664). Aging causes a widespread reduction of proteins enriched in basic amino acids, independent of mRNA regulation, due to aberrant translation pausing and reduced ribosome availability (PMID: 38260253). Oxidative protein folding in the ER, producing H2O2, promotes cell aging, identifying PDI as a potential target (PMID: 37306027). Endogenous DNA damage promotes hallmarks of age-related retinal degeneration, as seen in Ercc1-/- mice (PMID: 39604117). Age-related skeletal muscle deterioration involves reduced SIRT5 expression, which normally desuccinylates TBK1 to suppress inflammation, suggesting the SIRT5-TBK1 pathway as a target (PMID: 40087407). Endothelial cell senescence, caused by Tert loss, induces transcriptional changes indicative of senescence and tissue hypoxia, compromising the blood-brain barrier and reducing muscle endurance, partly through a telomere length-independent mechanism undermining mitochondrial function (PMID: 38475941). Blood-borne factors like osteocalcin (OCN) maintain neuronal synaptic plasticity via a primary cilium (PC) protein-autophagy axis, which declines with aging, and restoring this axis improves cognitive impairments (PMID: 39984747). Dietary components like branched-chain amino acids (BCAAs) influence healthy aging, with restriction protecting against metabolic consequences and having tissue-specific effects on senescence (PMID: 39868338). 
Pharmacological interventions show promise, such as Procyanidin C1 (PCC1) counteracting aging in the hematopoietic and immune system by improving physiological parameters, increasing B cells and HSCs, suppressing senescence markers, and restoring immune homeostasis (PMID: 40316527). Apigenin may protect against age-related cognitive dysfunction by suppressing neuro-inflammatory processes driven by glial cells (PMID: 38007051). Pharmacological induction of mitophagy with urolithin A attenuates inflammation and ameliorates neurological function in old mice (PMID: 38280852). Inhibition of mitochondrial malate dehydrogenase (MDH2) delays aging through metabolic-epigenetic regulation, identifying MDH2 as a potential therapeutic target and Glibenclamide as a lead compound (PMID: 39962087). Blocking neddylation increased cellular hallmarks of aging and led to increased Tau aggregation and phosphorylation in neurons, indicating that cellular aging can reveal late-onset disease phenotypes (PMID: 38917806). Suppression of NF-κB in cardiomyocytes leads to pronounced cardiac remodeling, dysfunction, and cellular damage associated with aging, influencing both cellular senescence and molecular damage pathways (PMID: 39857807). Pharmacological elevation of CISD2 expression at a late-life stage using hesperetin treatment is a feasible approach to effectively mitigating both intrinsic and extrinsic skin aging (PMID: 38263133). Inhibition of SGLT2 enhances clearance of senescent cells, ameliorating age-associated phenotypic changes and extending lifespan in premature aging mice (PMID: 38816549). Muscle-specific overexpression of TFEB, enhancing lysosomal and mitochondrial function, reduces neuroinflammation and tau pathology, preserving cognition in aged mice (PMID: 37952157). Betaine transcriptionally represses Mss51 via Yy1, improving age-related mitochondrial respiration in skeletal muscle and delaying muscle loss (PMID: 39187977). 
The uncharacterized gene CG11837, a putative ortholog of human DIMT1, regulates insect lifespan and protects human cells from cellular senescence (PMID: 38834883). Hypercapnia interferes with satellite cell activation, autophagy flux, and myogenesis, and systemic rapamycin administration improves these outcomes (PMID: 39589836)."
"Topic 3 The gut microbiotas and metabolites are effective in anti-aging interventions, potentially extending lifespans. Aging is a primary risk factor for most chronic disorders, characterized by cellular senescence and the development of a pro-inflammatory senescence-associated secretory phenotype (SASP), which significantly contributes to organismal aging and age-related diseases (PMID: 37475161, PMID: 38502584, PMID: 39273040, PMID: 39604391). This age-associated inflammation, termed inflammaging, is multifactorial and involves various pathways, including inflammasome activation, which can manifest at both transcriptional and post-translational levels depending on the tissue (PMID: 39256821). Mitochondrial dysfunction is also a key determinant of aging, impacting cellular fitness and contributing to age-related decline (PMID: 39273040, PMID: 39872286, PMID: 39599637, PMID: 38613037, PMID: 39857414). Furthermore, aging involves metabolic changes, such as altered arginine metabolism in red blood cells, which serves as a biomarker for aging and is linked to cellular damage (PMID: 39478346, PMID: 39604391). The gut microbiome undergoes considerable changes with age, influencing host metabolism and aging processes through complex interactions, and these changes can be sex and mitochondrial-haplotype specific (PMID: 37582366, PMID: 40140706, PMID: 39560153, PMID: 40015964, PMID: 38364095, PMID: 39628383, PMID: 40074999). Age-related changes are heterogeneous across different cell types and tissues, with some populations being more transcriptionally dynamic than others, and can involve DNA damage, epigenetic alterations, and changes in stem cell populations (PMID: 37706427, PMID: 38502584, PMID: 39385256, PMID: 38604248, PMID: 39604391). Vascular aging, driven by factors like oxidative stress and inflammation, promotes atherosclerosis, while the accumulation of senescent cells in vascular tissues is considered a primary cause (PMID: 38502584). 
Similarly, senescent hepatocytes accumulate in metabolic dysfunction-associated steatotic liver disease (MASLD) and contribute to disease progression by losing key functions and releasing detrimental factors (PMID: 40155379). Targeting senescent cells with senotherapeutics, which include senolytics that eliminate senescent cells and senomorphics that suppress the SASP, is an emerging strategy to delay aging and alleviate age-related pathologies (PMID: 37475161, PMID: 40155379, PMID: 39730824, PMID: 40220121, PMID: 39463954). Natural compounds and dietary interventions show promise as senotherapeutics or modulators of aging pathways; for instance, Rutin acts as a senomorphic by dampening SASP expression, while curcumin prolongs lifespan by inhibiting TORC1 and enhancing ATP levels (PMID: 37475161, PMID: 39273040). Other compounds like Ginkgetin inhibit the cGAS-STING pathway to alleviate inflammation and senescence, and specific bacterial derivatives or dietary components like cinnamaldehyde and fish collagen oligopeptides demonstrate anti-aging effects by modulating pathways like mTORC1, autophagy, oxidative stress, and inflammation (PMID: 39558862, PMID: 39560153, PMID: 39760475, PMID: 38613037). Time-restricted feeding can also preserve some aspects of organ function that decline with aging, such as retinal function (PMID: 38964431). Furthermore, interventions targeting protein damage, like immunotherapy against isoDGR-modified proteins, or modulating specific metabolic pathways, such as branched-chain amino acid metabolism via gut microbiota, show potential for mitigating age-related decline (PMID: 37971164, PMID: 39628383). Artificial intelligence platforms are being developed to identify potential geroprotectors and provide insights into aging mechanisms (PMID: 39627462)."
"Topic 4 Cellular senescence, driven by inflammation and regulated by various signaling pathways, is a key factor in aging, with potential treatments targeting senescent cells and their effects. Aging involves progressive deterioration across multiple biological systems, characterized by molecular dysregulation and functional decline (PMID: 37481675, PMID: 38107570). Cellular senescence, a state of permanent cell cycle arrest, is a key hallmark of aging, contributing to tissue dysfunction and age-related diseases (PMID: 37697347, PMID: 39739833, PMID: 38107570). Senescent cells often exhibit a senescence-associated secretory phenotype (SASP), releasing pro-inflammatory factors and other molecules that can impact surrounding tissues and promote chronic inflammation (PMID: 38839869, PMID: 39012326, PMID: 39617789). For instance, senescent lung endothelial cells weaken cell-cell junctions and promote neutrophil migration, contributing to inflammation (PMID: 39815038), while senescent microglia contribute to neuroinflammation in the brain (PMID: 39404415). The SASP is regulated by various pathways, including ASK1-p38 signaling (PMID: 38839869), NF-κB activation (PMID: 37697347), and involves extensive actin cytoskeleton remodeling linked to cell enlargement (PMID: 39962062). DNA damage and impaired DNA repair capacity are also observed in senescent cells, potentially making them ""ticking bombs"" (PMID: 39739833, PMID: 39772747). Maintaining protein homeostasis (proteostasis) is crucial, as its decline with age contributes to neurodegenerative disorders (PMID: 39753948). Mitochondrial dysfunction is another significant factor in aging, affecting bioenergetic function and contributing to cellular senescence and inflammation (PMID: 39554099, PMID: 38627524, PMID: 39868046, PMID: 38157106, PMID: 38397399). Impaired mitochondrial function can lead to increased reactive oxygen species (ROS) and DNA damage, triggering senescence (PMID: 38689095, PMID: 38217101). 
Several interventions and compounds show promise in mitigating aging effects. Systems-level analysis reveals that lifespan-extending interventions like calorie restriction and rapamycin often tighten the regulation of biological modules, particularly in metabolism and stress response, and reduce inflammation (PMID: 37481675, PMID: 37698783). Calorie restriction benefits may be partly mediated by metabolites like lithocholic acid (LCA), which activates AMPK (PMID: 39695227). Young blood components, specifically small extracellular vesicles (sEVs), can counteract aging by stimulating PGC-1 expression and enhancing mitochondrial function (PMID: 38627524, PMID: 40082997). The peptide DADLE improved motor/cognitive function and extended lifespan in aged mice, reducing inflammation and increasing anti-aging markers (PMID: 39312071). Mitochondrial-targeted peptide Elamipretide improved functional declines in aging mice, associated with pro-longevity shifts in gene expression related to metabolism and reduced inflammation, although not always reflected in molecular age markers (PMID: 39554099). Ketone bodies like beta-hydroxybutyrate (BHB), induced by ketogenic diets, can improve brain function, memory, and synaptic plasticity in aged mice, potentially by attenuating microglial inflammatory responses and promoting autophagy (PMID: 38843842, PMID: 39697898, PMID: 38106160, PMID: 39126207, PMID: 39821043). The gut microbiome also influences aging; an aged microbiome can accelerate vascular aging and metabolic impairment, which can be improved by acetate supplementation or a high-fiber diet (PMID: 39897133, PMID: 38157106). Targeting senescent cells or their effects is a therapeutic strategy; senolytics like TBB can selectively eliminate senescent cells by disrupting heterochromatin remodeling (PMID: 38816370), while inhibiting the YAP-TEAD pathway with verteporfin can eliminate senescent cells by hindering ER activity required for SASP (PMID: 37667102). 
CAR-T cells targeting NKG2D ligands on senescent cells show potential for selective elimination (PMID: 38704364). Compounds like Procyanidin A1 (PC A1) from peanut skin extract alleviate senescence by reducing ROS and inducing autophagy (PMID: 40227314). Bisphosphonates, used for bone loss, also show extra-skeletal anti-aging effects, potentially by influencing senescence, autophagy, and inflammation pathways (PMID: 40196558). Other potential targets include Cysltr1 for retinal aging (PMID: 39891615), LEF1 in immune cell senescence (PMID: 37961030), and the PI3K-AKT-FOXO1 pathway in endometrial aging (PMID: 39695355). Oxidative protein folding in the ER also promotes cell aging (PMID: 38771153). A novel hypothesis suggests tissue autodigestion by leaking digestive enzymes from the gut contributes to aging (PMID: 39418235). PML nuclear bodies and their associated proteins also play a significant role in cellular aging processes (PMID: 39768166)."
"Topic 5 Researchers in the field of geroscience are using science and tools to publicly study aging, identifying challenges and potential medicine through ketone trials. Aging is a complex process influenced by various factors, and researchers are developing methods to measure biological age, which often deviates from chronological age and better reflects health and disease risk (PMID: 39392224, PMID: 40159527, PMID: 39369207). Epigenetic clocks, particularly those based on DNA methylation, are widely used biomarkers that predict chronological age, health outcomes, and mortality, with newer generations incorporating health and lifestyle factors (PMID: 39411517, PMID: 39888134, PMID: 38441802, PMID: 38482631). These clocks can be developed for different tissues like blood, buccal swabs, and saliva, and their outputs, while useful indicators, should not be conflated with whole-body biological aging (PMID: 39411517, PMID: 39888134, PMID: 38037563, PMID: 39392224). Studies show that lifestyle factors such as diet, physical activity, sleep, obesity, and tobacco use are significantly associated with accelerated biological aging as measured by these clocks, particularly those trained on morbidity and mortality (PMID: 39073811, PMID: 39369207, PMID: 40096467). Beyond epigenetics, biological age can be assessed using multi-dimensional approaches combining various biomarkers like cognitive measures, body composition, and metabolomic profiles, which can provide a more comprehensive view of age-related decline and resilience (PMID: 40159527, PMID: 39278973, PMID: 38206765). Accelerated biological aging has been observed in specific diseases like chronic liver disease, type 2 diabetes, breast cancer, and heart failure, suggesting that targeting aging mechanisms could be a strategy for preventing or treating age-related conditions (PMID: 39026032, PMID: 37987887, PMID: 40175686, PMID: 40297496). 
Key molecular mechanisms underlying aging include cellular senescence, inflammation, mitochondrial dysfunction, and epigenetic alterations, and researchers are exploring these as potential therapeutic targets (PMID: 37148367, PMID: 39399066, PMID: 39839443, PMID: 39742485). Databases and computational tools are being developed to integrate multi-omics data, identify aging-related genes and pathways, and facilitate the discovery and validation of anti-aging interventions and biomarkers across species (PMID: 37870433, PMID: 37850649, PMID: 39695922, PMID: 39356317, PMID: 38470883). Animal models like nonhuman primates, mice, C. elegans, and Drosophila are crucial for understanding aging mechanisms and screening potential longevity interventions (PMID: 39585646, PMID: 37432607, PMID: 39346601). Overall, research is advancing the understanding of aging as a malleable process, identifying biomarkers and mechanisms that can be targeted to promote healthy longevity and reduce the burden of age-related diseases (PMID: 40297496, PMID: 38289789)."
"Topic 6 Age-related changes in the genome, transcriptome, and chromatin, including telomere shortening, contribute to somatic decline and influence lifespan across various species. Aging is a complex process characterized by transcriptomic and epigenetic changes across multiple tissues and organs, including the brain, contributing to functional decline and increased mortality risk (PMID: 37118429, PMID: 37822253, PMID: 39843737, PMID: 39242518, PMID: 37986960, PMID: 38471806, PMID: 39463924, PMID: 37986616, PMID: 40033047, PMID: 40287511, PMID: 37828039, PMID: 37465120). These age-associated changes manifest as hallmarks like genomic instability, telomere attrition, epigenetic alterations, loss of proteostasis, mitochondrial dysfunction, cellular senescence, stem cell exhaustion, altered intercellular communication, and inflammation (PMID: 3659939, PMID: 37986616). Transcriptomic analysis reveals age-related gene expression changes that are often tissue-specific but also include evolutionarily conserved pathways, with differential expression of protein-coding genes and lncRNAs being significant in aging (PMID: 37118429, PMID: 37269831, PMID: 39896546, PMID: 40033047, PMID: 39975269, PMID: 40287511, PMID: 37828039). Epigenetic modifications, such as changes in DNA methylation and chromatin accessibility, are key drivers of aging, influencing gene expression and cellular identity, with specific regions like AP-1 binding motifs becoming more accessible in elderly cells while TEAD binding motifs are enriched in younger cells (PMID: 39843737, PMID: 39934111, PMID: 40133272, PMID: 39242518, PMID: 38062919, PMID: 37465120). 
Cellular senescence, marked by irreversible cell cycle arrest and the accumulation of senescent cells, is a significant contributor to aging and age-related diseases, with markers like senescence-associated beta-galactosidase showing tissue-specific onset and intensity differences between long-lived and short-lived species (PMID: 40082363, PMID: 38568207, PMID: 39872064, PMID: 37935676, PMID: 37987889). Mitochondrial dysfunction, characterized by impaired bioenergetics, increased ROS, and altered dynamics, is another central hallmark of aging, linked to various age-related diseases and influenced by factors like mitochondrial DNA turnover and somatic mutations (PMID: 39188058, PMID: 38361161, PMID: 36982549, PMID: 39520088, PMID: 39312152, PMID: 38955799). Telomere shortening, exacerbated by oxidative stress, induces telomere damage and cellular senescence, involving structural changes like single-stranded breaks and R-loop formation (PMID: 39448517, PMID: 40219969, PMID: 37987889). Loss of proteostasis, including increased translational errors like stop-codon readthrough, contributes to age-related decline in tissues like muscle and brain (PMID: 40021653). Transposable elements, such as LINE-1 and Alu, can become active with age, contributing to genomic instability and inflammation, with cGAS playing a role in maintaining heterochromatin on these elements (PMID: 39314493, PMID: 39416083, PMID: 38318533). Interventions like heterochronic parabiosis have shown potential to reverse aspects of aging, improving physiological parameters, extending lifespan and healthspan, and inducing lasting epigenetic and transcriptomic remodeling towards a younger state (PMID: 37118429, PMID: 37500973). Exercise training also shifts epigenetic and transcriptomic patterns in human skeletal muscle towards a younger profile, maintaining genes related to muscle structure, metabolism, and mitochondrial function (PMID: 37128843). 
Chemical cocktails have been identified that can restore youthful transcript profiles and reverse transcriptomic age in human cells without altering cellular identity (PMID: 37437248). Microbiome transplants in fruit flies did not improve longevity and sometimes were detrimental, suggesting that for this model, the presence of a microbiome might not be beneficial for lifespan (PMID: 39835966). Sex differences are observed in aging, affecting memory impairment, transcriptomic and chromatin landscapes, and the impact of interventions like caloric restriction, highlighting the need for sex-specific studies (PMID: 39896546, PMID: 38318533, PMID: 40192444, PMID: 39975269). The study of aging across different species, including short-lived models like the African turquoise killifish and planarians, provides insights into conserved and unique mechanisms of lifespan regulation and potential rejuvenation strategies (PMID: 37269831, PMID: 38568207, PMID: 39302208, PMID: 37828039, PMID: 37837621, PMID: 40181188)."
"Topic 7 Single-cell analysis identifies age-related gene expression changes in the tumor microenvironment, offering prognostic insights for LUAD and potential immune modulation strategies. Studies exploring the aging transcriptome often focus on genes that change with age, but investigating age-invariant genes, which remain unchanged, can also provide valuable insights and serve as reference genes in expression studies (PMID: 38645168, PMID: 39873648). A systematic investigation across the lifespan in mouse tissues identified pan-tissue and tissue-specific age-invariant genes that are stable and validated, noting that these genes tend to have shorter transcripts and are enriched for CpG islands (PMID: 38645168, PMID: 39873648). While hallmarks of aging typically involve changes in cellular maintenance, some genes associated with these hallmarks resist expression fluctuations with age (PMID: 38645168, PMID: 39873648). Traditional reference genes are often not universally appropriate for aging studies across all tissues, highlighting the need for tissue-specific and pan-tissue alternatives identified in such studies (PMID: 38645168, PMID: 39873648). Aging is a complex process characterized by physiological decline and increased susceptibility to diseases, including various cancers, osteoporosis, sarcopenia, diabetic nephropathy, and stroke (PMID: 38274286, PMID: 38134103, PMID: 39022075, PMID: 39762477, PMID: 38978778, PMID: 37404805, PMID: 37382646). Cellular senescence, an irreversible cell cycle arrest, is a fundamental hallmark of aging implicated in the pathogenesis and progression of age-related diseases and cancers (PMID: 39820552, PMID: 40279419, PMID: 38798365, PMID: 39723224, PMID: 37752791, PMID: 37382646). 
Studies utilizing transcriptomic, proteomic, and epigenetic data, often at single-cell resolution, combined with machine learning techniques, are identifying aging-related biomarkers and constructing prognostic models for diseases like hepatocellular carcinoma, lung adenocarcinoma, breast cancer, gastric cancer, and thyroid carcinoma (PMID: 39353907, PMID: 40216977, PMID: 39820552, PMID: 39762477, PMID: 40205433, PMID: 40279419, PMID: 40111533, PMID: 38151291, PMID: 38714870, PMID: 37600707, PMID: 40076502, PMID: 39757226, PMID: 39723224, PMID: 38134103, PMID: 38978778, PMID: 37404805, PMID: 39769366, PMID: 39022075, PMID: 40256325, PMID: 40299459). These analyses reveal age-related heterogeneity in tumor cells and the microenvironment, identify key genes and pathways involved in disease progression, and suggest potential therapeutic targets and strategies, including those related to immune modulation and senotherapy (PMID: 40216977, PMID: 39578560, PMID: 40211000, PMID: 38714870, PMID: 37600707, PMID: 40076502, PMID: 39757226, PMID: 39022075, PMID: 39896625, PMID: 37996875, PMID: 40256325, PMID: 40299459). Biomarkers based on DNA methylation patterns or urinary proteomic profiles are also being developed to predict biological age and disease risk, offering potential for personalized healthcare (PMID: 38274286, PMID: 39210148, PMID: 39122459)."
"Topic 8 Extracellular vesicles from macrophages influence bone marrow mesenchymal stem cell senescence and differential capacity, impacting bone aging. Aging is a process accompanied by functional decline in tissues and organs, often linked to the reduced regenerative capacity of stem cells like bone marrow mesenchymal stem cells (BMSCs), which impairs bone tissue regeneration and contributes to skeletal disorders such as senile osteoporosis (PMID: 39876006, PMID: 39543818, PMID: 39123275, PMID: 39377219, PMID: 39939879). Mesenchymal stem cell (MSC) aging is also influenced by anomalies in the extracellular microenvironment, including matrix stiffness, which can trigger aging and decreased differentiation capacity (PMID: 39353686). Cellular senescence, characterized by irreversible growth arrest and the secretion of a senescence-associated secretory phenotype (SASP), plays a critical role in aging and age-related diseases (PMID: 39039843, PMID: 39370688, PMID: 38115574). For instance, macrophage senescence drives diabetic vascular aging by propagating SASP, exacerbating vascular dysfunction (PMID: 39834861). Vascular smooth muscle cell (VSMC) senescence also contributes to arterial wall inflammation and remodeling in conditions like Takayasu's arteritis (PMID: 38816066). Endothelial cell senescence, promoted by age-related inflammation, further contributes to vascular aging (PMID: 38358087). In intervertebral disc degeneration (IVDD), oxidative stress in the microenvironment leads to the senescence of nucleus pulposus-derived mesenchymal stem cells (NPMSCs) (PMID: 39920738). Macrophage-derived small extracellular vesicles (M-sEVs) can induce NP cell senescence by delivering specific microRNAs (PMID: 39990291). Bone marrow inflammaging, a low-grade chronic inflammation, induces bone marrow aging, with activation of the monocyte/macrophage lineage identified as a key event (PMID: 39868348). 
Senescent cells, including fibroblasts and osteocytes, accumulate in aged tissues and contribute to dysfunction, such as in skin photoaging and bone loss (PMID: 37066877, PMID: 39370688, PMID: 40202158). Mitochondrial dysfunction is intricately linked to cellular aging and senescence in various cell types, including MSCs and tendon stem/progenitor cells (TSPCs) (PMID: 39353686, PMID: 40298797, PMID: 36917311, PMID: 39238005). Various molecular mechanisms regulate stem cell aging and senescence, including lncRNAs like NEAT1, microRNAs such as miR-203-3p and miR-31-5p, and proteins like NAMPT, ADAM19, USP26, and CDC42, often involving pathways related to mitochondrial function, autophagy, or Wnt signaling (PMID: 39876006, PMID: 39896347, PMID: 39543818, PMID: 39123275, PMID: 39377219, PMID: 39927476, PMID: 39159429, PMID: 38578073). Targeting these mechanisms or using therapeutic strategies like transplantation of young hematopoietic stem cells, pharmacological modulation of pathways like STING-autophagy or AMPK/mTOR, or delivery of stem cell-derived extracellular vesicles shows promise for mitigating age-related decline and diseases (PMID: 39743633, PMID: 39834861, PMID: 36917311, PMID: 39237990, PMID: 40230357, PMID: 39939879, PMID: 39182069, PMID: 37367945)."
"Topic 9 This editorial discusses the ground of evidence-based aging research, including cellular and molecular changes, and potential therapeutic strategies. Aging is a complex process characterized by various cellular and molecular changes that contribute to functional decline and increased susceptibility to age-related diseases (PMID: 38532727, PMID: 39747653). Key hallmarks of aging include the accumulation of DNA damage, telomere shortening, and epigenetic alterations, which collectively impair cellular function and stability (PMID: 38167657, PMID: 39525813, PMID: 38455512). Mitochondrial dysfunction, marked by impaired bioenergetics and increased reactive oxygen species production, is another central feature of aging, contributing to cellular damage and energy deficits (PMID: 39805986). A decline in proteostasis, the ability to maintain protein quality and balance, also occurs with age, leading to the accumulation of misfolded or damaged proteins (PMID: 38235402). Cellular senescence, where cells enter a state of stable growth arrest and secrete pro-inflammatory factors, is a significant contributor to tissue dysfunction and chronic inflammation, often referred to as inflammaging (PMID: 38532727, PMID: 39747654). This chronic inflammation is further exacerbated by age-related changes in immune cell function, known as immunosenescence (PMID: 39747653, PMID: 37600859). The gut microbiome also undergoes alterations with aging, impacting overall health and potentially influencing longevity (PMID: 39633118). Furthermore, aging is associated with stem cell exhaustion, reducing the body's capacity for tissue repair and regeneration (PMID: 37962806). Changes in the extracellular matrix and shifts in cellular metabolism also contribute to the aging phenotype (PMID: 37682890, PMID: 39196123). Nutrient sensing pathways, such as mTOR, play a crucial role in regulating these aging processes (PMID: 40108369). 
These widespread changes manifest as age-related decline in specific tissues and systems, including the brain, vascular system, skin, muscle (sarcopenia), and bone (osteoporosis) (PMID: 40199981, PMID: 39982667, PMID: 37419982, PMID: 40216657, PMID: 38688939). Lifestyle interventions like exercise and dietary modifications, such as calorie restriction, have shown promise in mitigating some aspects of age-related decline (PMID: 39621687, PMID: 37161069). Additionally, targeting specific aging mechanisms, such as using senolytic drugs to clear senescent cells, represents a potential therapeutic strategy for age-related conditions (PMID: 39164495)."
"Topic 10 eccDNA, copper, and lactate metabolism, along with factors like sleep duration, parity, and collagen, are implicated in aging processes and potential interventions. Aging is a complex process involving various cellular and molecular changes across different tissues and organs, with studies utilizing resources like HuTAge to integrate cross-tissue human aging data and atlases detailing changes in mitochondrial function and extrachromosomal circular DNA (eccDNA) across mouse tissues (PMID: 40248357, PMID: 39704492, PMID: 39984484). Extrachromosomal circular DNA, known to be involved in cancer, has also been characterized in the aging mouse brain, although its total number does not increase with age, unlike other mutations (PMID: 38538648, PMID: 39984484). Cellular senescence, a hallmark of aging, contributes to brain aging and is implicated in SARS-CoV-2-induced neuropathology, where senolytics have shown therapeutic benefits by blocking viral replication and preventing senescence in neuronal populations (PMID: 37957361, PMID: 39187546). Beyond the brain, cellular senescence is also relevant in pancreatic cancer, where circHIF-1 has been identified as a factor inhibiting senescence and promoting tumor growth (PMID: 39825374). Metabolic alterations are closely linked to aging, with plasma metabolomics identifying numerous metabolites associated with age, longevity, and mortality, highlighting the role of essential fatty acids and suggesting nutrition as a potential intervention target (PMID: 39504246, PMID: 39683368). Specific metabolic pathways, such as methionine metabolism, are dysregulated with age, and dietary methionine restriction in late life has shown benefits in neuromuscular function, metabolic health, and frailty in mice, although it did not significantly impact epigenetic aging clocks in mice or humans (PMID: 40238871). 
Nutrient metabolism, specifically glucose and glutamine processing, is altered with age in tendons, potentially driving age-related degeneration (PMID: 39763790). Copper levels consistently increase with age, particularly in plasma, potentially due to deregulated nutrient sensing leading to increased intestinal absorption, suggesting a copper-centric view on aging and nutrient sensing (PMID: 39839250). The gut and oral microbiota are increasingly recognized for their influence on aging and age-related diseases, including brain aging and neurodegeneration (PMID: 38399774, PMID: 38993126). Mendelian randomization studies suggest a causal association between certain gut bacteria like Streptococcus and biological age acceleration (PMID: 38399774). In sows, increased parity, linked to aging, correlates with decreased gut microbiota diversity, an increase in pathogenic bacteria, and a decrease in beneficial bacteria, suggesting inflammatory aging contributes to declining reproductive performance (PMID: 38200843). Circadian rhythm disruptions, such as light-dark shifts, accelerate intestinal aging and increase colon carcinogenesis, partly by altering the intestinal barrier and microbiota (PMID: 39811661). Tools like microbeMASST are being developed to better identify microbial metabolites and link them to their producers, enhancing our understanding of microbial roles in health and aging (PMID: 38316926). Epigenetic clocks, based on DNA methylation patterns, are strong predictors of aging, and Mendelian randomization studies indicate that iron overload causally accelerates these clocks (PMID: 37805541). While telomere length is another marker of biological aging, studies on lithium use duration did not find associations with telomere length or other aging markers like frailty or metabolomic age delta (PMID: 38539016). 
However, SARS-CoV-2 infection has been associated with decreased leukocyte telomere length, although a direct link to cognitive impairment post-COVID-19 was not found in one study (PMID: 39457609). Specific interventions and factors show promise in mitigating age-related decline; lifelong voluntary aerobic exercise preserves physiological function and vascular endothelial function in mice, correlating with changes in plasma metabolites like spermidine and increased autophagy in skeletal muscle (PMID: 38265578). Long-term consumption of Tamogi-take mushrooms attenuated age-related decline in cardiac and vascular endothelial function and improved exercise tolerance in mice, potentially through antioxidant effects (PMID: 39779757). Supplementation with essential amino acids ameliorated age-associated sleep loss and fragmentation in Drosophila, suggesting a dietary intervention for age-related sleep problems (PMID: 39696747). In the brain, metallothioneins, particularly MT1 and MT3 expressed in astrocytes, appear to be a protective mechanism during aging, highly expressed in centenarians (PMID: 38769809). Neuronal lactate metabolism also has sex-specific cognitive effects and may become detrimental to learning and memory with aging (PMID: 39055955). Furthermore, deer bone collagen peptides have shown potential to improve skin aging problems in mice by enhancing hydration, antioxidant capacity, and regulating collagen synthesis and degradation (PMID: 38892482). The study of long-lived organisms like the orange roughy fish is also providing insights into potential genomic mechanisms of extreme longevity, such as those related to genomic instability, autophagy, and intercellular communication (PMID: 39187546)."


In [225]:
as_html = True  # render the summary as HTML instead of plain text

result = summarize_topics(
    data=data,
    topic_description_words=10,  # alternatively: config.topic_description_words
    force_topic_name=0,
    as_html=as_html,
)
if as_html:
    display(HTML(result))
else:
    print(result)

Processing topic 0
Highly connected papers: 164 -> 50


0
"Topic 1 A study associates frailty and mortality risk in adults with biological aging, highlighting various biomarkers and lifestyle factors that accelerate or decelerate the process. Aging is a complex process characterized by a gradual decline in functional capacity and increased disease susceptibility, with significant individual variations that necessitate measures of biological age beyond chronological age (PMID: 38581608, PMID: 38130910). Various biomarkers and indices have been developed to quantify biological aging, including proteomic age clocks, epigenetic clocks based on DNA methylation, and physiological age indices derived from clinical indicators (PMID: 39117878, PMID: 38130910, PMID: 38581608, PMID: 39792740, PMID: 37723992). Proteomic biomarkers in blood can identify aging-related proteins associated with clinical traits and chronic diseases, and a proteomic healthy aging score can predict cardiometabolic diseases, potentially influenced by gut microbiota (PMID: 39805987). A proteomic age clock based on plasma proteins has been shown to predict chronological age, incidence of major chronic diseases, multimorbidity, mortality, and age-related functional status across diverse populations (PMID: 39117878). Growth differentiation factor 15 (GDF-15) is another significant biomarker of aging, correlating with epigenetic aging markers, impaired glycemic control, inflammation, and physical decline, suggesting its potential for clinical use (PMID: 39644331). DNA methylation-based epigenetic clocks, such as GrimAge and PhenoAge, are reliable tools for predicting biological age acceleration and mortality risk, often outperforming earlier generations of clocks (PMID: 40301953, PMID: 39583644, PMID: 39687863). Longer DNA methylation-based telomere length (DNAmTL) is associated with reduced cardiovascular disease risk and long-term mortality, showing superior predictive performance compared to traditional telomere length measures (PMID: 39633416). 
Physiological age, derived from clinical indicators, can reveal disparities in health trajectories, with education and sex interacting to influence these trajectories over the life course (PMID: 39830243, PMID: 40156883). Objective and subjective aging indicators, such as hearing loss, tooth loss, falls, and subjective aging perception, can be combined to assess premature mortality risk, with cumulative effects being particularly elevated among younger individuals, those with unhealthy lifestyles, and lower socioeconomic status (PMID: 39863635). Age-related functional impairments like visual or hearing impairment, physical frailty, and cognitive impairment increase exponentially with age, forming a hazard network associated with mortality risk (PMID: 38058300). Cellular senescence, a hallmark of aging, contributes to tissue dysfunction through the senescence-associated secretory phenotype (SASP), and specific SASP proteins are associated with age and clinical traits like inflammation and physical function (PMID: 37982669). Factors influencing biological aging include genetics and environmental exposures (exposome), with the exposome potentially explaining a greater proportion of variation in the incidence of certain age-related diseases compared to polygenic risk (PMID: 39972219). Social factors like disadvantage, discrimination, and food insecurity are associated with accelerated biological aging, potentially mediated by inflammatory pathways and epigenetic alterations (PMID: 40087516, PMID: 39132086, PMID: 38723585, PMID: 39973988). Lifestyle factors also play a crucial role, as physical inactivity and sedentary behavior are linked to accelerated epigenetic aging, while moderate physical activity shows protective effects on longevity and age acceleration, partly mediated by lipids (PMID: 39794269, PMID: 39230773, PMID: 39821867). 
Dietary factors, such as higher intake of methyl donor nutrients and carotenoids, are negatively associated with phenotypic age acceleration, suggesting potential nutritional strategies for healthy longevity (PMID: 40221525, PMID: 39819329). Conversely, sugar-sweetened beverage intake, especially at night, is positively correlated with phenotypic age acceleration, potentially mediated by obesity (PMID: 39780125). Insulin resistance, indicated by the Triglyceride-glucose index, is significantly associated with increased biological age and a higher risk of accelerated aging (PMID: 40022176). Chronic respiratory diseases are linked to systemic inflammation and phenotypic age acceleration, with mutual mediating effects suggesting joint assessment for risk identification (PMID: 39825391). Age-related changes in body composition, like decreased lean mass and altered fat mass distribution, vary by sex and contribute to conditions like sarcopenia (PMID: 39028455, PMID: 39001569). Abdominal obesity, measured by the weight-adjusted waist index, is inversely associated with the anti-aging protein α-Klotho, an association partially mediated by systemic immune inflammation, particularly in older individuals (PMID: 39905076). Exposure to volatile organic compounds is also associated with biological aging, and daily behaviors like smoking, drinking, and physical activity may influence susceptibility (PMID: 40264054). Biological aging metrics can inform risk stratification for cognitive decline and dementia, with faster pace of aging being a risk factor for preclinical cognitive decline (PMID: 39583644, PMID: 3989933). Schizophrenia is accompanied by accelerated biological aging by midlife, which may explain the increased risk for various age-related physical diseases (PMID: 37924924). Interventions like enhancing cardiorespiratory fitness may slow biological aging, particularly in individuals with chronic airflow limitation (PMID: 39391738). 
Visual arts-mediated cognitive activation therapy has shown potential in mitigating cellular aging by elongating telomere length, especially in men, alongside cognitive and functional improvements in patients with neurocognitive disorders (PMID: 38524114). While senolytic treatments like Dasatinib and Quercetin initially showed increases in epigenetic age acceleration, the addition of Fisetin appeared to mitigate this effect, suggesting complex interactions and the need for further research (PMID: 38393697). Frailty, characterized by multisystem dysfunction, may be predicted by allostatic load, a physiological status associated with prolonged stress, potentially allowing for earlier intervention (PMID: 39674921). Circulating serotonin has been identified as a potential novel predictor for overall morbidity, alongside other metabolites and proteins linked to organ-specific morbidity (PMID: 39496618). The phase angle from bioelectrical impedance analysis can serve as a predictor for sarcopenic obesity, with specific cutoff values identified for males and females (PMID: 39792740)."
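
The generated summaries cite papers inline as `(PMID: 12345678, PMID: ...)`. A minimal sketch of a sanity check on those citations is shown here, assuming that format holds throughout and that the summary is available as plain text. The helper names `extract_pmids` and `find_unknown_pmids` are hypothetical and not part of the notebook or of PubTrends; `known_ids` stands for whatever collection of paper ids was loaded from the input JSON.

```python
import re

# Assumption: every citation in the summary looks like "PMID: <digits>".
PMID_PATTERN = re.compile(r"PMID:\s*(\d+)")


def extract_pmids(summary_text):
    """Return the unique PMIDs cited in a summary, in order of first appearance."""
    seen = []
    for pmid in PMID_PATTERN.findall(summary_text):
        if pmid not in seen:
            seen.append(pmid)
    return seen


def find_unknown_pmids(summary_text, known_ids):
    """Return cited PMIDs that are absent from the known paper ids.

    Ids are compared as strings, so known_ids may hold ints or strings.
    """
    known = {str(paper_id) for paper_id in known_ids}
    return [pmid for pmid in extract_pmids(summary_text) if pmid not in known]
```

Running `find_unknown_pmids` on a summary against the ids of the loaded papers would flag citations that do not match any input paper, for example a truncated or invented identifier.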
