In [None]:
import langextract as lx
import textwrap

prompt = textwrap.dedent("""\
    Extract the relevant entities from the following text. Use exact text for extractions. Do not paraphrase or overlap entities.
    """)

sample_text = textwrap.dedent("""\
    In 2020, the WHO launched its first Global Strategy to Accelerate the Elimination of Cervical Cancer, outlining an ambitious set of targets for countries to achieve over the next decade. At the same time, new tools, technologies, and strategies are in the pipeline that may improve screening performance, expand the reach of prophylactic vaccines, and prevent the acquisition, persistence and progression of oncogenic HPV. Detailed mechanistic modeling can help identify the combinations of current and future strategies to combat cervical cancer. Open-source modeling tools are needed to shift the capacity for such evaluations in-country. Here, we introduce the Human papillomavirus simulator (HPVsim), a new open-source software package for creating flexible agent-based models parameterized with country-specific vital dynamics, structured sexual networks, and co-transmitting HPV genotypes. HPVsim includes a novel methodology for modeling cervical disease progression, designed to be readily adaptable to new forms of screening. The software itself is implemented in Python, has built-in tools for simulating commonly-used interventions, includes a comprehensive set of tests and documentation, and runs quickly (seconds to minutes) on a laptop. Performance is greatly enhanced by HPVsim’s multi-scale modeling functionality. HPVsim is open source under the MIT License and available via both the Python Package Index (via pip install) and GitHub (hpvsim.org).""")

examples = [
    lx.data.ExampleData(
        text=sample_text,
        extractions=[
            lx.data.Extraction(
                extraction_class="disease", extraction_text="cervical cancer"
            ),
            lx.data.Extraction(
                extraction_class="model", extraction_text="Human papillomavirus simulator (HPVsim)"
            ),
            lx.data.Extraction(
                extraction_class="intervention", extraction_text="prophylactic vaccines"
            ),
        ],
    )
]
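
Because the prompt asks for exact text, it can help to confirm that every example extraction appears verbatim in the example text before calling the model. The following is a minimal sketch, not part of langextract: `find_non_verbatim` is a hypothetical helper, and it works on plain `(class, text)` tuples with a shortened stand-in text so that it runs without the library installed.

```python
def find_non_verbatim(text, extractions):
    """Return the (class, text) pairs whose text is not a substring of `text`."""
    return [(cls, span) for cls, span in extractions if span not in text]


# Shortened stand-in for sample_text, for illustration only.
demo_text = (
    "new strategies may expand the reach of prophylactic vaccines "
    "against cervical cancer."
)
demo_extractions = [
    ("disease", "cervical cancer"),
    ("intervention", "prophylactic vaccines"),
    ("model", "HPV simulator"),  # deliberately absent from demo_text
]

missing = find_non_verbatim(demo_text, demo_extractions)
print(missing)
```

With the real objects, the same check could be run as `find_non_verbatim(examples[0].text, [(e.extraction_class, e.extraction_text) for e in examples[0].extractions])`; an empty list would mean every example span is present in the text.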

In [None]:
input_text = textwrap.dedent("""\
    Background: Recent declines in HIV incidence among adolescent girls and young women (AGYW) in Africa are often attributed to the expansion of biomedical interventions such as antiretroviral therapy and voluntary medical male circumcision. However, changes in sexual behaviour may also play a critical role. Understanding the relative contributions of these factors is essential for developing strategies to sustain and further reduce HIV transmission.

    Methods: We conducted a mathematical modelling study of data from the Rakai Community Cohort Study (RCCS), an open, population-based cohort of 15- to 49-year-olds in 30 communities in Rakai, Uganda, to investigate the biomedical and behavioural drivers of HIV incidence decline in AGYW (15-24 years of age). We estimated changes in the HIV incidence rate between 2000-2019 using retrospective cohort data to validate our modelled incidence estimates. We ran modelled counterfactual scenarios to quantify the independent and combined effects (cumulative infections averted and difference in incidence rates) of antiretroviral therapy (ART), voluntary medical male circumcision (VMMC), and delays in age of first sex (AFS) over historical (between 2000-2020) and projected (between 2000-2050) time horizons.

    Findings: Incidence in women 15-24 years of age declined by 83% between 2000-2019 (from 1.72 per 100 person-years in 2000 to 0.30 per 100 person-years in 2019), the largest reduction in incidence of all age groups of women. Increasing AFS over the last two decades (by 3 years in women and 2 years in men) was the largest contributor to incidence decline in women 15-19 years of age, averting 17% of cumulative infections between 2000-2020 and 37% between 2000-2050. Incidence in 15-19-year-old women was 69% lower in 2020 and 75% lower in 2050 compared to counterfactual scenarios without changes in AFS. ART scale-up contributed the most to incidence declines among women 20-24 years of age, averting 13% of infections between 2000-2020 and 43% of infections between 2000-2050. VMMC averted < 5% of infections in 15-24-year-olds to-date, with larger reductions in incidence between 2000-2050 in both 15-19 year-olds (13% reduction in cumulative infections) and 20-24 year-olds (22% of cumulative infections). ART, VMMC, and increasing AFS acted additively to reduce HIV incidence in AGYW, with little redundancy when combined.

    Interpretation: Our results provide strong support for maintaining both the protective changes in sexual behaviours and effective biomedical interventions to sustain continued reductions in HIV incidence among AGYW.
    """)

result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
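    model_id="gemma3:12b",  # Automatically selects Ollama provider
    model_url="http://localhost:11434",  # Default local Ollama endpoint
    fence_output=False,  # Expect raw output, not wrapped in a fenced code block
    use_schema_constraints=False,  # Do not request schema-constrained output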
)

In [None]:
for extraction in result.extractions:
    print(f"{extraction.extraction_class}: {extraction.extraction_text} | Attributes: {extraction.attributes}")
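
The flat printout above can be hard to scan when the model returns many entities. As a rough sketch, the extractions could be grouped by class with duplicates dropped. `group_by_class` is a hypothetical helper, and the `SimpleNamespace` objects below are made-up stand-ins for `result.extractions` (not actual model output), assuming only the `extraction_class` and `extraction_text` attributes used in the loop above.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_class(extractions):
    """Map each extraction class to its unique texts, in first-seen order."""
    grouped = defaultdict(list)
    for ext in extractions:
        texts = grouped[ext.extraction_class]
        if ext.extraction_text not in texts:
            texts.append(ext.extraction_text)
    return dict(grouped)


# Illustrative stand-ins only; real values come from result.extractions.
demo = [
    SimpleNamespace(extraction_class="intervention",
                    extraction_text="antiretroviral therapy"),
    SimpleNamespace(extraction_class="intervention",
                    extraction_text="voluntary medical male circumcision"),
    SimpleNamespace(extraction_class="disease", extraction_text="HIV"),
    SimpleNamespace(extraction_class="intervention",
                    extraction_text="antiretroviral therapy"),  # duplicate
]

grouped = group_by_class(demo)
for cls, texts in grouped.items():
    print(f"{cls}: {texts}")
```

In the notebook itself, `group_by_class(result.extractions)` would be the equivalent call.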