<a href="https://colab.research.google.com/github/Extralit/papers-ocr-benchmarks/blob/main/text_ocr_doc_structure.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Explore doc OCR methods for text and header structure extraction

Self-contained, Google Colab-ready.

- Upload a PDF and extract markdown


In [1]:
# Install the OCR / extraction libraries compared in this notebook
!uv pip install --quiet "marker-pdf[full]" docling
!uv pip install -q "mineru[all]"
!uv pip install -q "PyMuPDF>=1.23.0" "pandas>=1.5.0"
!uv pip install -q pymupdf4llm pdf4llm llama_index
!uv pip install -q "matplotlib>=3.5.0" "seaborn>=0.11.0" "textdistance>=4.6.0"

In [19]:
!rm -rf /content/papers-ocr-benchmarks .git/ pdfs/
!git init .
!git remote add origin https://github.com/Extralit/papers-ocr-benchmarks.git
!git pull origin main

# Install the python package in `scripts/`
!uv pip install -e .

hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint:
hint: 	git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint: 	git branch -m <name>
Initialized empty Git repository in /content/.git/
remote: Enumerating objects: 117, done.
remote: Counting objects: 100% (117/117), done.
remote: Compressing objects: 100% (88/88), done.
remote: Total 117 (delta 51), reused 68 (delta 26), pack-reused 0 (from 0)
Receiving objects: 100% (117/117), 15.25 MiB | 16.31 MiB/s, done.
Resolving deltas: 100% (51/51), done.
From https://github.com/Extralit/papers-ocr-benchmarks
 * branch            main       -> FETCH_HEAD
 * [new branch]      main       -> 

In [2]:
from IPython.display import HTML, display, JSON, Markdown
from pprint import pprint

In [23]:
# To import functions from the scripts, first move each script's top-level code
# into a main() function guarded by `if __name__ == "__main__":`. Otherwise the
# import below runs the entire file instead of just loading the module.

# from scripts.ocr_benchmark_gpu_optimized import calculate_text_metrics
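
The guard described in the comment above can be sketched as follows. This is a hypothetical module layout: the body of `calculate_text_metrics` here is an illustrative stand-in (simple word overlap), and the real function in `scripts/ocr_benchmark_gpu_optimized.py` may take different arguments and compute different metrics.

```python
# Hypothetical layout for a script that should be importable without side effects.
# The metric below is a stand-in, not the repository's actual implementation.

def calculate_text_metrics(reference: str, candidate: str) -> dict:
    """Illustrative stand-in: word-level overlap between two texts."""
    ref_words = set(reference.split())
    cand_words = set(candidate.split())
    overlap = len(ref_words & cand_words)
    return {
        "precision": overlap / len(cand_words) if cand_words else 0.0,
        "recall": overlap / len(ref_words) if ref_words else 0.0,
    }


def main() -> None:
    # Everything that used to run at import time moves in here.
    print(calculate_text_metrics("a b c", "a b d"))


if __name__ == "__main__":
    # True for `python script.py`, False for `import script`.
    main()
```

With this layout, importing the module only defines the functions, and the benchmark itself runs only when the file is executed as a script.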

In [4]:
# from google.colab import files
# uploaded = files.upload()
# file_path = next(iter(uploaded))

file_path = "/content/pdfs/Allossogbe_et_al_2017_Mal_J.pdf"
# file_path = "/content/pdfs/Mbogo_et_al_1996_Med_Vet_Ento.pdf"

## MinerU

In [None]:
%%time
# Use the PDF selected in `file_path` above
!mineru -p {file_path} -o /content/mineru_output/

2025-07-08 20:03:55.332 | INFO     | mineru.backend.pipeline.pipeline_analyze:doc_analyze:124 - Batch 1/1: 11 pages/11 pages
2025-07-08 20:03:55.334 | INFO     | mineru.backend.pipeline.pipeline_analyze:batch_image_analyze:187 - gpu_memory: 15 GB, batch_ratio: 8
2025-07-08 20:03:55.334 | INFO     | mineru.backend.pipeline.model_init:__init__:137 - DocAnalysis init, this may take some times......
2025-07-08 20:04:09.188 | INFO     | mineru.backend.pipeline.model_init:__init__:182 - DocAnalysis init done!
2025-07-08 20:04:09.189 | INFO     | mineru.backend.pipeline.pipeline_analyze:custom_model_init:64 - model init cost: 13.854581832885742
Layout Predict: 100% 11/11 [00:02<00:00,  4.03it/s]
MFD Predict: 100% 11/11 [00:04<00:00,  2.2

## Docling

In [7]:
%%time
from docling.document_converter import DocumentConverter
converter = DocumentConverter()
result = converter.convert(file_path)

HfHubHTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/ds4sd/docling-models/revision/v2.2.0 (Request ID: Root=1-686d9ca9-75d18db76d819b064743c2d2;f93df949-b762-4052-bc1e-259e4e8a5168)

Invalid credentials in Authorization header

In [8]:
Markdown(result.document.export_to_markdown())

## PyMuPDF4LLM

Read https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/api.html#pymupdf4llm-api

## Improving Header Detection in PyMuPDF4LLM for Academic Papers

PyMuPDF4LLM extracts structured text and headers from PDFs quickly, but it can miss headers and render them as bold text instead of markdown headers. This is common in academic papers with complex or inconsistent formatting. The options below improve header detection so the markdown output keeps the section structure.

### Why Headers Are Missed

- **Default detection relies on font size**: By default, PyMuPDF4LLM maps larger font sizes to headers. If a document uses bolding or other styles (not size) for headers, these may be missed and rendered as regular or bold text[1][2].
- **False negatives**: Headers with the same font size as body text but with different styling (bold, all-caps) are not detected as headers automatically.

### Solutions for Accurate Header Structure

#### 1. **Customize Header Detection Logic**

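You can pass a custom callable as the `hdr_info` parameter of `to_markdown()` to define what counts as a header. The callable receives a text span dictionary plus a `page` keyword argument, and returns either an empty string or a markdown header prefix (up to six `#` characters followed by a space). It can use any span property (e.g., font name, flags, size, color) to identify headers.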

**Example: Custom Header Detection Using Font Properties**

```python
import pymupdf4llm

def custom_header_logic(span, page=None):
    # Example: treat bold, all-caps lines as headers
    text = span["text"].strip()
    # Bold shows up as bit 4 (value 16) of the span flags or in the font name
    is_bold = bool(span.get("flags", 0) & 16) or "bold" in span.get("font", "").lower()
    is_all_caps = text.isupper()
    font_size = span["size"]

    if is_bold and is_all_caps and len(text) > 3:
        return "# "
    elif is_bold and font_size > 12:
        return "## "
    else:
        return ""

# `doc` is an open pymupdf.Document
md_text = pymupdf4llm.to_markdown(doc, hdr_info=custom_header_logic)
```
- Adjust the logic to match your document’s header style (e.g., check for underline, color, or position)[1].

#### 2. **Leverage Table of Contents (TOC) for Header Mapping**

If your PDF has a TOC/bookmarks, use them to map headers accurately:

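```python
toc = doc.get_toc()  # entries are [level, title, 1-based page number]

def toc_header_logic(span, page=None):
    if page is None:
        return ""
    text = span["text"].strip()
    for lvl, title, toc_page in toc:
        # TOC titles may carry trailing whitespace, so strip before comparing
        title = title.strip()
        if toc_page == page.number + 1 and title and text.startswith(title):
            return "#" * min(lvl, 6) + " "  # markdown allows at most 6 levels
    return ""

md_text = pymupdf4llm.to_markdown(doc, hdr_info=toc_header_logic)
```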
- This approach reuses the document’s own hierarchy, so it is only as reliable as the match between TOC titles and the extracted span text[1].
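
TOC titles and extracted span text often differ in small ways. In this notebook's `doc.get_toc()` output, titles carry trailing spaces (`'Abstract '`), trailing colons (`'Background: '`) and non-breaking spaces (`\xa0`). A minimal sketch of more tolerant matching, assuming exact equality after normalization is acceptable; the helper names `normalize_title`, `match_toc_level` and `make_toc_hdr_info` are made up for illustration and are not part of PyMuPDF4LLM:

```python
import re
import unicodedata


def normalize_title(text: str) -> str:
    """Fold NBSP and hyphen variants, collapse whitespace, lowercase, drop a trailing colon."""
    text = unicodedata.normalize("NFKC", text)            # \xa0 becomes a plain space
    text = re.sub(r"[\u2010\u2011\u2012\u2013]", "-", text)  # hyphen/dash variants
    text = re.sub(r"\s+", " ", text).strip().lower()
    return text.rstrip(":").strip()


def match_toc_level(span_text: str, toc: list, page_number: int):
    """Return the level of the TOC entry on this 1-based page whose title equals span_text."""
    needle = normalize_title(span_text)
    if not needle:
        return None
    for level, title, toc_page in toc:
        if toc_page == page_number and normalize_title(title) == needle:
            return level
    return None


def make_toc_hdr_info(toc: list):
    """Build a callable with the (span, page=None) shape expected by `hdr_info`."""
    def hdr_info(span, page=None):
        if page is None:
            return ""
        level = match_toc_level(span["text"], toc, page.number + 1)
        return "#" * min(level, 6) + " " if level else ""
    return hdr_info
```

Exact matching is the conservative choice: it avoids promoting body text that merely starts with a section name, at the cost of missing titles that are split across several lines in the PDF.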

#### 3. **Limit or Expand Header Levels**

You can set the maximum number of header levels with `max_levels` in the `IdentifyHeaders` class. This helps if the default mapping is too shallow or deep:

```python
from pymupdf4llm import IdentifyHeaders

my_headers = IdentifyHeaders(doc, max_levels=4)
md_text = pymupdf4llm.to_markdown(doc, hdr_info=my_headers)
```
- Adjust `max_levels` to fit your needs[1].

#### 4. **Post-Process Markdown for Missed Headers**

If some headers are still missed (e.g., bolded lines without size change), post-process the markdown to promote bold lines to headers using regex or markdown libraries.
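
A minimal regex sketch of this post-processing step, assuming that a line consisting only of `**bold text**` is a missed header. The `skip` list is illustrative (it uses the journal labels that appear in this notebook's output) and the fixed `level` is an assumption; neither comes from PyMuPDF4LLM.

```python
import re

# A whole line that is nothing but **bold text** (2 to 80 characters, no asterisks inside)
BOLD_LINE = re.compile(r"^\*\*(?P<title>[^*\n]{2,80}?)\*\*\s*$")


def promote_bold_lines(markdown: str, level: int = 2,
                       skip=("RESEARCH", "Open Access")) -> str:
    """Turn standalone bold lines into markdown headers of the given level."""
    out = []
    for line in markdown.splitlines():
        match = BOLD_LINE.match(line)
        if match and match.group("title").strip() not in skip:
            out.append("#" * level + " " + match.group("title").strip())
        else:
            out.append(line)  # inline bold such as "**Background:** text" is left alone
    return "\n".join(out)
```

Because every promoted line gets the same level, this works best as a fallback after one of the `hdr_info` strategies above, not as a replacement for them.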

### Best Practices

- **Inspect font properties** in your PDF using `page.get_text("dict")` to see how headers are styled.
- **Combine multiple cues**: font size, weight, all-caps, position, and TOC.
- **Test your logic** on several papers, as academic PDFs often differ in formatting.
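
To inspect font properties in practice, the nested dictionary returned by `page.get_text("dict")` can be summarized by style. The sketch below works on that dictionary shape (blocks, then lines, then spans) and counts characters per font, size and bold flag; the function name `summarize_span_styles` is made up for this example.

```python
from collections import Counter


def summarize_span_styles(page_dict: dict) -> Counter:
    """Count characters per (font, rounded size, is_bold) in a `get_text("dict")` result."""
    counts = Counter()
    for block in page_dict.get("blocks", []):
        for line in block.get("lines", []):  # image blocks have no "lines" key
            for span in line.get("spans", []):
                is_bold = bool(span.get("flags", 0) & 16)  # bit 4 of the span flags marks bold
                key = (span.get("font", ""), round(span.get("size", 0), 1), is_bold)
                counts[key] += len(span.get("text", ""))
    return counts


# Usage with an open document (not executed here):
# summarize_span_styles(doc[0].get_text("dict")).most_common(5)
```

The most common style is usually the body text. Styles that are rarer and bold, or larger than the body size, are the header candidates to target in a custom `hdr_info` function.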

### Summary Table: Header Detection Strategies

| Approach                            | When to Use                       | Example Code Ref. |
|-------------------------------------|-----------------------------------|-------------------|
| Custom font/style logic             | Headers are bold/caps, not larger | See above         |
| TOC-based mapping                   | PDF has bookmarks/TOC             | See above         |
| Adjust header levels (`max_levels`) | Too many/few header levels        | See above         |
| Post-processing markdown            | Some headers still missed         | Use regex tools   |

Customizing the header detection logic and using document-specific cues can recover headers that the default font-size heuristic misses when extracting academic papers with PyMuPDF4LLM[1][2][3].

[1] https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/api.html
[2] https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/
[3] https://artifex.com/blog/building-a-multimodal-llm-application-with-pymupdf4llm
[4] https://buildmedia.readthedocs.org/media/pdf/pymupdf/latest/pymupdf.pdf
[5] https://github.com/pymupdf/RAG/issues/90
[6] https://pymupdf.readthedocs.io/en/latest/xml-class.html
[7] https://stackoverflow.com/questions/937808/how-to-extract-data-from-a-pdf-file-while-keeping-track-of-its-structure
[8] https://endevsols.com/the-pdf-extraction-revolution-why-pymupdf4llm-is-the-ultimate-game-changer/
[9] https://pypi.org/project/pymupdf4llm/
[10] https://ai.gopubby.com/the-pdf-extraction-revolution-why-pymupdf4llm-is-your-new-best-friend-and-llamaparse-is-crying-e57882dee7f8
[11] https://pymupdf.readthedocs.io/en/latest/tutorial.html
[12] https://pymupdf.readthedocs.io/en/latest/changes.html
[13] https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/integrations/document_loaders/pymupdf4llm.ipynb
[14] https://sites.google.com/view/raybellwaves/courses/introduction-to-data-science-and-machine-learning
[15] https://www.reddit.com/r/LangChain/comments/1e7cntq/whats_the_best_python_library_for_extracting_text/
[16] https://artifex.com/blog/rag-llm-and-pdf-conversion-to-markdown-text-with-pymupdf
[17] https://arxiv.org/html/2409.05137v1

In [9]:
import pymupdf
import pymupdf4llm
from pymupdf4llm import IdentifyHeaders

doc = pymupdf.open(file_path)

doc.get_toc()

[[1,
  'WHO cone bio-assays of\xa0classical and\xa0new-generation long-lasting insecticidal nets call for\xa0innovative insecticides targeting the knock-down resistance mechanism in\xa0Benin',
  1],
 [2, 'Abstract ', 1],
 [3, 'Background: ', 1],
 [3, 'Methods: ', 1],
 [3, 'Results: ', 1],
 [3, 'Conclusion: ', 1],
 [2, 'Background', 2],
 [2, 'Methods', 2],
 [3, 'Study design', 2],
 [3, 'Study sites', 3],
 [4, 'Malanville', 3],
 [4, 'Tanguieta', 3],
 [4, 'Abomey-Calavi', 3],
 [4, 'Cotonou', 3],
 [4, 'Porto-Novo', 3],
 [4, 'Parakou', 3],
 [4, 'Zangnanado', 3],
 [3, 'Larvae collection', 3],
 [3, 'Highlighting resistance mechanisms', 3],
 [3, 'Mosquito nets', 3],
 [3, 'Cone test', 4],
 [3, 'Data analysis', 4],
 [2, 'Results', 4],
 [3, 'Characteristics of\xa0the studied mosquito populations', 4],
 [3, 'Knock-down (KD) and\xa0mortality of\xa0laboratory strains', 4],
 [3, 'Inhibition of\xa0mortality conferred by\xa0the kdr resistance gene', 4],
 [3,
  'Knock-down (Kd) effect and\xa0mortality i

In [10]:
# We can extract the paper title here
doc.metadata

{'format': 'PDF 1.7',
 'title': 'WHO cone bio-assays of classical and new-generation long-lasting insecticidal nets call for innovative insecticides targeting the knock-down resistance mechanism in Benin',
 'author': 'Marius Allossogbe',
 'subject': 'Malaria Journal, doi:10.1186/s12936-017-1727-x',
 'keywords': 'LLINs,Bio-efficacy,Piperonyl butoxide,Resistant mosquitoes',
 'creator': 'ocrmypdf 16.1.2 / Tesseract OCRhOCR 5.3.4',
 'producer': 'pikepdf 8.13.0',
 'creationDate': "D:20170214204129+05'30'",
 'modDate': "D:20240423063541+00'00'",
 'trapped': '',
 'encryption': None}

In [14]:
%%time

# header parse method 1: Custom logic
def custom_header_logic(span, page=None):
    # Example: treat bold, all-caps lines as headers
    text = span["text"].strip()
    is_bold = "Bold" in span.get("font", "")
    is_all_caps = text.isupper()
    font_size = span["size"]

    if is_bold and is_all_caps and len(text) > 3:
        return "# "
    elif is_bold and font_size > 12:
        return "## "
    else:
        return ""

# header parse method 2: font-size-based detection, limited to 4 levels
my_headers = IdentifyHeaders(doc, max_levels=4)

# header parse method 3: map spans to TOC (bookmark) entries
toc = doc.get_toc()
def toc_header_logic(span, page=None):
    toc_items = [t for t in toc if t[-1] == page.number + 1]
    for lvl, title, _ in toc_items:
        if span["text"].strip().startswith(title): # Improve string matching here
            return "#" * lvl + " "
    return ""

md_text = pymupdf4llm.to_markdown(
    doc,
    hdr_info=toc_header_logic,
    # hdr_info=my_headers,
    # page_chunks="toc_items",
    # table_strategy=None
)

CPU times: user 8.69 s, sys: 30.6 ms, total: 8.72 s
Wall time: 8.77 s


In [15]:
print(md_text)

Allossogbe et al. Malar J (2017) 16:77
DOI 10.1186/s12936-017-1727-x


**RESEARCH**



Malaria Journal


**Open Access**



WHO cone bio‑assays of classical
and new‑generation long‑lasting insecticidal
nets call for innovative insecticides targeting
the knock‑down resistance mechanism in Benin


Marius Allossogbe [1,2*], Virgile Gnanguenon [1,2], Boulais Yovogan [1,2], Bruno Akinro [1], Rodrigue Anagonou [1,2],
Fiacre Agossa [1,2], André Houtoukpe [3], Germain Gil Padonou [1,2] and Martin Akogbeto [1,2]


**Abstract**

**Background:** To increase the effectiveness of insecticide-treated nets (ITN) in areas of high resistance, new longlasting insecticidal nets (LLINs) called new-generation nets have been developed. These nets are treated with the
piperonyl butoxide (PBO) synergist which inhibit the action of detoxification enzymes. The effectiveness of the
new-generation nets has been proven in some studies, but their specific effect on mosquitoes carrying detoxification enzymes and tho

#### pymupdf4llm.LlamaMarkdownReader

`LlamaMarkdownReader` is convenient because it returns LlamaIndex document objects that can be fed directly into the vector DB. However, it chunks by page number, whereas we need the documents chunked by section header.
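
One way to get header-based chunks is to split the markdown string (for example `md_text` from `to_markdown()`) at header lines. The sketch below is a plain-Python assumption of how that could look, not a feature of `LlamaMarkdownReader`; `split_by_headers` is a made-up name, and it does not handle `#` lines inside fenced code blocks.

```python
import re

HEADER = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_by_headers(markdown: str) -> list:
    """Split markdown into sections, one per header line."""
    sections = []
    current = {"level": 0, "title": "", "lines": []}  # text before the first header

    def flush(section):
        text = "\n".join(section["lines"]).strip()
        if section["title"] or text:
            sections.append({"level": section["level"],
                             "title": section["title"],
                             "text": text})

    for line in markdown.splitlines():
        match = HEADER.match(line)
        if match:
            flush(current)
            # Strip bold markers so "## **Results**" yields the title "Results"
            title = match.group(2).strip("* ")
            current = {"level": len(match.group(1)), "title": title, "lines": []}
        else:
            current["lines"].append(line)
    flush(current)
    return sections
```

Each resulting section could then be wrapped in a LlamaIndex document, with the title and level stored as metadata, before indexing.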

In [None]:
%%time
llama_reader = pymupdf4llm.LlamaMarkdownReader()
llama_docs = llama_reader.load_data(file_path)

In [None]:
for p in llama_docs:
    display(p.metadata)
    print(p.text)

## Marker

In [None]:
from marker.converters.pdf import PdfConverter
from marker.config.parser import ConfigParser

from marker.models import create_model_dict
from marker.schema import BlockTypes

artifact_dict = create_model_dict()
artifact_dict

{'layout_model': <surya.layout.LayoutPredictor at 0x7f91c1e6ce90>,
 'recognition_model': <surya.recognition.RecognitionPredictor at 0x7f91d5c9e550>,
 'table_rec_model': <surya.table_rec.TableRecPredictor at 0x7f91ca3e67d0>,
 'detection_model': <surya.detection.DetectionPredictor at 0x7f93d2cb5ad0>,
 'ocr_error_model': <surya.ocr_error.OCRErrorPredictor at 0x7f91bded8b10>}

In [None]:
%%time
# Step 2: Conversion options. `OCR_ENGINE: None` was meant to skip OCR,
# but the log below still shows a text-recognition pass.
cli_opts = {
    "OCR_ENGINE": None,
    "processors": None,
    "output_format": 'markdown'
}
config_parser = ConfigParser(cli_opts)
config = config_parser.generate_config_dict()

# Step 3: Create the converter (default processors; no custom TOC header processor yet)
converter = PdfConverter(
    config=config,
    artifact_dict=artifact_dict,
    renderer=config_parser.get_renderer(),
)

# Step 4: Run conversion
result = converter(file_path)
md, meta, imgs = result.markdown, result.metadata, result.images

Recognizing layout: 100%|██████████| 1/1 [00:05<00:00,  5.19s/it]
Running OCR Error Detection: 100%|██████████| 1/1 [00:00<00:00, 35.45it/s]
Detecting bboxes: 100%|██████████| 1/1 [00:03<00:00,  3.47s/it]
Recognizing Text: 100%|██████████| 607/607 [03:03<00:00,  3.31it/s]
Detecting bboxes: 100%|██████████| 1/1 [00:00<00:00,  1.28it/s]
Recognizing Text: 100%|██████████| 292/292 [01:02<00:00,  4.65it/s]
Recognizing tables: 100%|██████████| 1/1 [00:08<00:00,  8.70s/it]


CPU times: user 4min 35s, sys: 13.2 s, total: 4min 48s
Wall time: 4min 47s


In [None]:
Markdown(md)

# The impact of permethrin-impregnated bednets on malaria vectors of the Kenyan coast

C. N. M. MBOGO,<sup>1,2</sup> N. M. BAYA,<sup>1</sup> A. V. O. OFULLA.<sup>2</sup> J. I. GITHURE<sup>2</sup> and R. W. SNOW<sup>1,3</sup>

<sup>1</sup> Clinical Research Centre and <sup>2</sup> Biomedical Sciences Research Centre, Kenya Medical Research Institute (KEMRI), Kenya, and <sup>3</sup> Nuffield Department of Clinical Medicine, Oxford University, John Radcliffe Hospital, Oxford, U.K.

> **Abstract.** The effects of introducing permethrin-impregnated bednets on local populations of the malaria vector mosquitoes Anopheles funestus and the An.gambiae complex was monitored during a randomized controlled trial at Kilifi on the Kenyan coast. Pyrethrum spray collections inside 762 households were conducted between May 1994 and April 1995 after the introduction of bednets in half of the study area. All-night human bait collections were performed in two zones (one control and one intervention) for two nights each month during the same period. PCR identifications of An gambiae sensulato showed that proportions of sibling species were An gambiae sensu  $stricto > An.merus > An. arabiensis.$

> Indoor-resting densities of An.gambiae s.l. and the proportion of engorged females decreased significantly in intervention zones as compared to control zones. However, the human blood index and *Plasmodium falciparum* sporozoite rate remained unaffected. Also vector parous rates were unaltered by the intervention, implying that survival rates of malaria vectors were not affected. The human-biting density of An.gambiae s.l., the predominant vector, was consistently higher in the intervention zone compared to the control zone, but showed 8% reduction compared to pre-intervention biting rates - versus 94% increase in the control zone.

> Bioassay, susceptibility and high-performance liquid chromatography results all indicated that the permethrin content applied to the nets was sufficient to maintain high mortality of susceptible vectors throughout the trial. Increased rates of early outdoorbiting, as opposed to indoor-biting later during the night, were behavioural or vector composition changes associated with this intervention, which would require further monitoring during control programmes employing insecticide-treated bednets.

> **Key words.** Anopheles arabiensis, An.funestus, An.gambiae, An.merus, malaria, permethrin-impregnated bednets, mosquito nets, Kenya Coast.

## Introduction

Bednets (mosquito nets) are traditionally used to ward off mosquitoes and have been advocated as a means of personal protection against malaria vectors in Africa (W.H.O., 1986). However, torn or incorrectly tucked nets provide little additional protection and mosquitoes are adept at feeding through nets on exposed limbs (Port & Boreham, 1982). For these reasons the application of a residual insecticide (of low mammalian toxicity) to bednets was suggested in the late 1970s as a means of reinstating the effectiveness of torn or incorrectly used nets as a

Correspondence: Dr Charles N. M. Mbogo, Kilifi Research Unit, KEMRI, P.O. Box 230, Kilifi, Kenya.

man-vector barrier (Curtis et al., 1990). Synthetic pyrethroids such as permethrin and deltamethrin, which have high insecticidal and excito-repellant properties, are most suitable for the treatment of bednets and have been adopted in several countries as part of national malaria control activities (Curtis et al., 1990).

Malaria is the single largest cause of death among children living in tropical Africa (World Bank, 1993). Across this continent, the rates of malaria transmission and endemicity levels vary widely. The impact of insecticide-treated bednets (ITBN) upon the vector population's ability to transmit, and hence the degree of personal protection, depend largely upon the intensity of transmission in any given area. Despite encouraging effects of ITBN in reducing both morbidity and mortality among Gambian children by over 60% (Snow et al., 1988; Alonso et al.,

1991), those results apply to an area with low rates of sporozoite challenge with extremely seasonal vector activity (Lindsay et al., 1993). The limitations of recommendations based on one transmission setting prompted the W.H.O. to replicate ITBN trials in several other areas of Africa where transmission characteristics are very different to those of The Gambia.

Accordingly, in July 1993, ITBN were introduced as part of a randomized controlled trial, conducted in coastal Kenya, to examine their role in reducing childhood mortality and severe malaria morbidity (Nevill et al., 1996). This paper reports the entomological context in which the Kenyan trial was conducted and the impact of ITBNs on malaria vectors in the coastal area of Kenya.

## **Materials and Methods**

Study area. The study area is located in Kilifi district, 60 km north of Mombasa on the Kenyan coast, extending 30 km inland and 40 km along the Indian Ocean coast north of Kilifi town. The area was designated in 1989 for intensive entomological (Mbogo et al., 1993b, 1995), demographic (Snow et al., 1994) and epidemiologic studies (Snow et al., 1993) of malaria. The principal vectors of malaria are the Anopheles gambiae Giles complex with a minor role played by Anopheles funestus Giles. These two vectors yield on average ten sporozoite inoculations per person per year. Among the people inhabiting this geographical area, annual rates of *P.falciparum* challenge range from less than one to sixty per person (Mbogo et al., 1995). Despite these annual rates of *P.falciparum* inoculation being lower than in most parts of tropical Africa, it has been estimated that at least one in fifteen children will develop severe life-threatening malaria before their fifth birthday (Snow et al., 1993).

The study population comprises approximately 60,000 inhabitants living mainly in traditional style houses (walls of sticks and mud) with a coconut thatch roof. Unscreened windows, holes in the walls and large open eaves provide easy access for mosquitoes. Homesteads are scattered and separated from one another by open farmland. Maize is the staple crop cultivated for home consumption; cashews and coconuts are grown as cash crops. During the 1989 national Kenyan census the study area was divided into seventy-two enumeration zones, of which thirtyeight were randomly allocated to receive ITBN.

Green polyester 100 denier mosquito nets (SiamDutch, Thailand) were issued to be used over all beds within the intervention zones and impregnated with 25% permethrin (cis:trans 40:60) emulsifiable concentrate (Imperator, ICI, U.K.) to achieve a target dose of 0.5 g of permethrin per  $m<sup>2</sup>$  of netting. Nets were re-impregnated every 6 months to coincide with the two main rainy seasons: in May, the beginning of the long rains; and October, towards onset of the short rains. In the intervention area, people were asked not to wash their nets until immediately before the next re-impregnation. The remaining thirty-four zones served as the contemporaneous non-intervention control area where bednet ownership was less than 6% (Snow et al., 1992).

The study area was mapped using a hand-held satellite navigational system (Trimble Navigation Europe, U.K.) and computerized using MapInfo<sup>R</sup> software (Troy Ltd, U.S.A.).

Entomological surveillance. One homestead from each zone was randomly sampled for mosquitoes by Pyrethrum spray-catch (PSC) each month (May 1994 to April 1995); no homestead was sampled more than once. Nine zones (five intervention and four control) were excluded from the sampling frame because they formed part of ongoing entomological studies since 1989. Houses were visited in the morning (07.00–11.30 hours) and occupants asked to tie their bednets up away from the bed. White sheets were laid on the floors and the rooms sprayed with pyrethrum aerosol. All mosquitoes knocked down were collected into labelled petri dishes lined with moist cotton wool and taken to the laboratory at Kilifi for further investigation.

Pre-intervention all-night catches of human-biting mosquitoes were undertaken once a week at four sentinel households per zone (five intervention and four control), between May 1992 and April 1993. Post-intervention human-bait collections were performed at one control and one intervention zone (drawn from the nine pre-intervention zones and excluded from PSC catches) for two nights each month between May 1994 and April 1995. Pairs of experienced catchers recruited from the study area were positioned either indoors or outdoors at each site and collections made from 18.00 until 07.00 hours. Catchers rotated in shifts and used aspirators and torches to catch mosquitoes which landed on exposed limbs. Each hourly catch was placed into a prelabelled polystyrene container and taken to the laboratory at Kilifi for assessment.

Laboratory procedures. Mosquito species were identified morphologically and scored as unfed, blood-fed or gravid. A proportion of An.gambiae s.l. females collected by both PSC and all-night biting catches were identified to sibling species by the method of polymerase chain reaction, PCR (Paskewitz & Collins, 1990). Primers used were specific for An.gambiae s.s., An.arabiensis and An.merus, members of the An.gambiae complex found at the Kenyan coast (Mosha & Petrarca, 1983). Samples of An.gambiae s.l. collected on human bait were dissected for parity determination as described by Detinova (1962). Mosquitoes collected by PSC were prepared for sporozoite enzyme-linked immunosorbent assay (ELISA) testing using monoclonal antibodies to detect circumsporozoite proteins of P.falciparum (Wirtz et al., 1987). Tests were assessed visually for positivity (Beier & Koros, 1991). Bloodmeals were identified by direct ELISA using anti-host (IgG) conjugates against human, cow and goat (Beier et al., 1988).

Bioassay, bioavailability and susceptibility tests. Nets were randomly selected from intervention households, between 1 and 17 months after they were issued. These nets were visually inspected and coded as either clean or dirty, and for the number of re-impregnations each net had received. Bioassay cones (W.H.O., 1975) were attached to the nets by means of elastic bands whilst the nets were hung upright in the laboratory. Two cones were used, one placed at the top of the net and the other on the lower portion toward the floor. Twenty laboratory-colonized female An. gambiae s.s. were introduced to each cone and exposed to the netting for 3 min before they were removed to paper cups. Delayed mortality was recorded after the mosquitoes had been left in the paper cups for 24 h with adequate sugar water in an ambient temperature of 25°C and a relative humidity of 72%. Four repeats per net were performed. Identical procedures were followed for untreated nets to serve as controls. Mortality was corrected for control mortality where the latter exceeded 20% of exposed mosquitoes.

Sample swatches of netting fabric were collected for highperformance liquid chromatography (HPLC) immediately after nets were impregnated for the first time and 11 months later, after two re-impregnations. HPLC assays were conducted at the Centers for Disease Control, Atlanta, U.S.A., to determine the concentration of the active cis isomer of permethrin per m<sup>2</sup> of netting.

Susceptibility of wild-caught female mosquitoes, collected from an area adjacent to the study area, was determined in February 1995 using the W.H.O. (1981) test kit and procedure. Unfed An. gambiae s.l. females ( $n = 415$ ) were exposed to 0.25% permethrin test paper for 1 h. Delayed mortality was measured 24 h post-exposure to the permethrin or control papers, and corrected if control mortality exceeded 20%.

Statistical analysis. The mean number of mosquitoes per house was calculated (from PSC data) for each of the zones sampled over the 12 months of surveillance (Table 1). The annual means of the thirty-three intervention zones were compared with the annual means of the thirty control zones using a Mann-Whitney U test (given their non-normal distribution). Human blood indices, sporozoite rates, parity and man-biting rates were analysed post-intervention using a Chi-square test, or controlling for pre-intervention rates using a Mantel Haenzel Chi-square test.

## **Results**

A total of 762 houses were sampled by PSC between May 1994 and April 1995. Of the 362 houses sampled within the nonintervention (control) area, 31.5% (114) yielded at least one An.gambiae s.l. or An.funestus, compared to only 11.3% (45/

400) of the intervention houses sampled during the same period  $(\chi^2 = 45.9, v = 1, P < 0.001)$ . As described previously (Mbogo et al., 1995), large between-zone variation in vector abundance occurs within this relatively small geographical area (Table 1). Comparing the ranks of the mean zonal densities of either An.gambiae s.l. or An.funestus per house indicates, before intervention, a significant difference of indoor-resting mosquito densities between intervention and non-intervention areas (Mann-Whitney U test,  $P < 0.0001$ ). Overall, post-intervention, there was a nine-fold reduction of the indoor-resting densities of both An.gambiae s.l. and An.funestus associated with ITBN use. Fig. 1 shows that the typical peaks of An.gambiae s.l. density during the long rains (May-August) and the short rains (November-December) were virtually eliminated in areas where ITBN were used.

Composition of the An. gambiae complex differed significantly between non-intervention (control) and ITBN intervention areas (Table 2). Proportions of An.arabiensis, An.gambiae s.s. and An.merus were 7%, 49% and 44% respectively among seventytwo specimens identified by PCR from the intervention area. compared with 11%, 83% and 6% of these three sibling species, respectively, among 165 specimens identified from the nonintervention area. PSC densities of all three species were significantly different both between treatment areas and between species within areas (Table 2). Per house sampled, the intervention area had 2.9-fold more An.merus but 4-fold less An.arabiensis and 4.3-fold less An.gambiae s.s. than the non-intervention (control) area.

In houses with impregnated bednets, significantly fewer An.gambiae s.l. were found to be blood-fed and their humanblood index was lower than in control houses, although this difference was not statistically significant (Table 3). There were no significant differences between areas in the proportion of An.gambiae s.l. with detectable P.falciparum CS protein and

![](_page_2_Figure_10.jpeg)

Fig. 1. Monthly abundance (May 1994 to April 1995) of indoor-resting An. gambiae s.l. females among households where ITBN were used (closed line) and households in a non-intervention (control) zone without bednets (dotted line).

| Zone, houses, An.gambiae s.l. | An.funestus | Anopheles density $(B + C)/A$ | Non-intervention (control): Zone | Control: houses | Control: An.gambiae s.l. | Control: An.funestus | Control: Anopheles density $(B + C)/A$ |
|---|---|---|---|---|---|---|---|
| | | | | | | | 2.83 |
| | 00 | 1.33 | | | | | |
| | | 0.17 | | | 29 | | 2.42 |
| | | 0.17 | 885 | | | 000 | |
| | | 0.00 | | | | | |
| | | 0.08 | | | | | |
| | | 0.17 | | | | | |
| | | 0.08 | | | | | |
| | | 0.25 | | | | | |
| | | 0.00 | | | 4 | | |
| | | 0.17<br>0.17 | | | | | |
| | | 0.08 | | | | | |
| | | 0.08 | | | | | |
| | | 0.42 | | | | | |
| | | 0.17 | | | | | |
| | | 0.25<br>0.50 | | | 3973 | | |
| | | 0.08 | | | | | |
| | | 0.08 | | | | | |
| | | 0.00 | | | | | |
| | | 0.08 | | | | | |
| | | 0.42 | | | 370 | | |
| | | 0.17 | | | | | |
| | | 0.50 | | | | | |
| | | 0.08 | | | | | |
| | | 0.00 | | | | | |
| | | 1.17 | | | 22522 | | |
| | | 0.15 | | | | | |
| | | 0.00 | | | | | |
| $\mathbf{r}$<br>77                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |                                 | 0.08                    |                             |                                       |                         |               |                                        |
| $\overline{13}$<br>79                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |                                 | 0.15                    |                             |                                       |                         |               |                                        |
| 82<br>400<br>Total                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | 4                               | 0.22                    |                             | 362                                   | 647                     | 83            | $2.02*$                                |

|                           | Area                          |                        | Difference |          | Total |
|---------------------------|-------------------------------|------------------------|------------|----------|-------|
|                           | Non-intervention<br>(control) | Intervention<br>(ITBN) | $\chi^2$   | P        |       |
| No. houses sampled by PSC | 362                           | 400                    |            |          |       |
| Species                   |                               |                        |            |          |       |
| arabiensis                | 18 (11)                       | 5 (7)                  | 7          | < 0.01   | 23    |
| gambiae                   | 137 (83)                      | 35 (49)                | 60         | < 0.0001 | 172   |
| merus                     | 10 (6)                        | 32 (44)                | 12         | < 0.001  | 42    |
| Total An.gambiae s.l.     | 165 (100)                     | 72 (100)               | 48         | < 0.0001 |       |

Overall comparison across species: $\chi^2 = 167$, $v = 2$, $P < 0.0001$.

Table 2. Numbers (%) of each sibling species of the Anopheles gambiae complex collected in each area by pyrethrum spray collections indoors.

the proportion of parous female An.gambiae s.l. collected on human bait (Table 3).

Table 3 also gives the mean numbers of An. gambiae s.l. females/person/night landing on human bait, both indoors and outdoors, during periods of 12 months before and after intervention in both the ITBN area (185 man-nights pre- and 64 man-nights post-intervention) and the non-intervention area (182 man-nights pre- and 70 man-nights post-intervention), sampling consistently from the same two houses in each area. After introduction of ITBN, a greater proportion of An.gambiae s.l. were caught biting outdoors (30.3%) in the intervention area, compared to the non-intervention area (23.2%): Mantel-Haenszel allowing for differences pre-intervention: $\chi^2 = 26.0$, $P < 0.0001$. Fig. 2 suggests that there was a tendency toward earlier biting activity inside houses where ITBN were in use compared to the biting cycle in control houses: 12% of bites occurred before 22.00 hours in houses with ITBN, compared to only 7% in control houses.

Calculating the product of the monthly man-biting rates and sporozoite rates shown in Table 3 (determined from all-night human bait catches in only two zones) suggests that the average annual sporozoite inoculation rate per person was not significantly reduced by the use of ITBN.

In an attempt to examine the possible influence on mosquitoes

Table 3. An.gambiae s.l. collections from control (non-intervention) and ITBN (intervention) areas, pre-intervention (May 1992 to April 1993) and post-intervention (May 1994 to April 1995). Proportions bloodfed, parous and sporozoite positive from PSC samples; man-biting rates and sporozoite inoculation rates from human bait catches.

|                                                 | Non-intervention<br>(control) | Intervention<br>(ITBN)     | Difference |         |
|-------------------------------------------------|-------------------------------|----------------------------|------------|---------|
|                                                 | (n = 362)                     | (n = 400)                  | $\chi^2$   | P       |
| PSC surveys                                     |                               |                            |            |         |
| Blood-fed (%)                                   | 55.5% (359/647)               | 28.0% (23/82)              | 20.9       | < 0.001 |
| Human Blood Index (%)                           | 86.4% (242/280)               | 80.0% (16/20)              | 0.09       | > 0.75  |
| Sporozoite rate (%) (csp positive)              | 5.0% (32/647)                 | 4.9% (4/82)                | 0.01       | > 0.9   |
| Human-bait surveys                              |                               |                            |            |         |
| Parity (%)                                      |                               |                            |            |         |
| Before                                          | 65.5% (129/197)               | 63.2% (427/676)            | 0.11       | < 0.75  |
| After                                           | 63.4% (78/123)                | 54.1% (242/447)            | 1.50       | < 0.25  |
| Difference                                      | $\chi^2 = 0.05$, P < 0.75     | $\chi^2 = 3.60$, P < 0.05  |            |         |
| Man-biting rate* per man per night (n)          |                               |                            |            |         |
| Indoors                                         |                               |                            |            |         |
| Before                                          | 5.3 (182)                     | 93.4 (185)                 |            |         |
| After                                           | 3.7 (70)                      | 23.5 (64)                  |            |         |
| Difference                                      | $-1.6$                        | $-69.9$                    |            |         |
| Outdoors                                        |                               |                            |            |         |
| Before                                          | 0.15 (182)                    | 1.14 (185)                 |            |         |
| After                                           | 0.94 (70)                     | 10.20 (64)                 |            |         |
| Difference                                      | $+0.79$                       | $+9.06$                    |            |         |
| Annual sporozoite inoculation rate <sup>†</sup> |                               |                            |            |         |
| Before                                          | 18.0                          | 59.6                       |            |         |
| After                                           | 35.0                          | 54.1                       |            |         |
| Difference                                      | $+94\%$                       | $-9.2\%$                   |            |         |

\* Sampling done in the same rooms before and after intervention, two houses per zone (n = number of man-nights collection).

<sup>†</sup> Calculated as a sum of the products of the monthly indoor man-biting rates and monthly sporozoite rates.

![](_page_5_Figure_1.jpeg)

Fig. 2. Biting cycle (hourly percentage on human bait) of An. gambiae s.l. in houses where ITBN were used (light bars) and in a non-intervention zone where bednets were not used (dark bars).

(deleterious or beneficial effects) entering houses in the control (non-intervention) area close to the intervention area, we used longitude and latitude to establish precise distances from intervention zones of thirty-five houses in control zones 57, 58 and 59, selected because they generated the largest numbers of mosquitoes. Distances were classified as more or less than 400 m from the intervention area. Among eleven households within 400 m, 9% had at least one mosquito, significantly less than 46% of twenty-four households at a distance of more than 400 m from the intervention area ($\chi^2 = 4.5$, $v = 1$, $P < 0.05$).

Details of user compliance of the intervention will be presented elsewhere (Some et al., in prep.). Potency of the permethrin applied to bednets remained high throughout the trial. HPLC analyses of the nets indicated that, following the first treatment, the average concentration of cis-permethrin was 0.43 g/m<sup>2</sup> (95% CL 0.34-0.52) on thirty-six nets. Eleven months later, following re-impregnation after 6 months, the average concentration on thirty-six nets was 1.36 g/m<sup>2</sup> (95% CL 1.21-1.51).

Table 4. An.gambiae s.l. mortality within 24 h following exposure for 3 min to permethrin-impregnated bednets retrieved from intervention villages in May 1994 and March 1995.

| Months in use<br>(no. of impregnations) | Percentage mortality<br>(no. of nets tested) |
|-----------------------------------------|----------------------------------------------|
| 0 (1)                                   | 80.8 (4)                                     |
| 4-8 (2)                                 | 78.3 (13)                                    |
| 10 (3)                                  | 96.0 (9)                                     |
| 17 (4)                                  | 99.8 (5)                                     |

Permethrin susceptibility tests of local wild-caught An.gambiae s.l., using the diagnostic dosage of 1 h exposure to 0.25% permethrin in W.H.O. (1975) test kits, gave a mortality rate of 94.5% when tests were undertaken 21 months after the trial began.

Bioassay tests with An.gambiae s.l. (3 min exposure, 24 h mortality) on various nets that had been used for up to 8 months following their initial impregnation in July 1993 gave greater than 78% kill (Table 4). Nets re-impregnated in April and November 1994 showed an increased killing capacity of 96-99.8%. Interestingly, nets which were found to be dirty with cooking soot had higher killing effects (94.1%) than nets which were clean (84.5%).

#### **Discussion**

Our results demonstrate that permethrin-impregnated bednets exert a major impact upon the indoor-resting abundance of the principal vectors of *P.falciparum* malaria in coastal villages of Kenya. Indoor-resting densities of An.gambiae s.l. and An. funestus were 9 times lower in houses where ITBN were in use, compared to households where no nets were used. This had the additional effect of eliminating the typical seasonal peaks in vector density usually seen in this part of Kenya (Fig. 1), despite evidence that more of the vector species were biting outdoors (Table 3). These findings are consistent with other studies of synthetic pyrethroid treated bednets or curtains in Africa (Lines et al., 1987; Majori et al., 1987; Lindsay et al., 1989, 1993; Magesa et al., 1991; Robert & Carnevale, 1991; Beach et al., 1993). The precise effect in each of these areas is difficult to compare, given the inherent differences in the sampling procedures used within each study. We opted not to use light traps (Lines et al., 1991; Mbogo et al., 1993a) in our estimation of vector abundance, because they tend to be less efficient in areas of low vector abundance (such as our study area) and have been shown to over-estimate parity rates in this area of Kenya (Petrarca et al., 1991). Furthermore, we required a simple and rapid means of monitoring endophilic mosquitoes over a wide geographical area, so as to truly reflect the impact of ITBN within our entire study population. Intensive entomological surveillance limited to a few sites - as suggested by the W.H.O. (1991) - can yield unrepresentative results in areas where marked overdispersion of vectors is common. However, it could be argued that reductions of indoor-resting densities - as determined by PSC - may simply reflect increased excito-repellency of the insecticides and not a reduction in the numbers of vectors coming to feed. Indeed, studies with exit traps in The Gambia have shown an increased rate of exophily due to ITBNs indoors (Snow et al., 1987; Miller et al., 1991). In addition, however, there is clear evidence that houses with pyrethroid-treated fabrics tend to significantly deter entry of vectors into the house (Lines et al., 1987; Lindsay et al., 1991). Further evidence from our study that man-vector contact was reduced is shown by the very highly significantly reduced proportion of An. gambiae s.l. found bloodfed in the early-morning PSC samples (Table 3). Human bait catches, however, revealed no significant reduction in the number of sporozoite inoculations an unprotected individual is likely to receive per year when living in a household where ITBN were used, compared to living in a house where no nets were in use.
Whereas the sporozoite inoculation rate increased by 94% in the non-intervention area, for unaccountable (probably climatic) reasons between pre- and post-intervention years, it decreased by 8.3% in the ITBN intervention area, a significant reduction.

Interestingly, our study did not demonstrate a significant reduction in the actual sporozoite rate or parity (an index of longevity) among vectors sampled from ITBN intervention zones compared to non-intervention (control) zones. Similar results were obtained in The Gambia, where bednets were also impregnated with permethrin 0.5 g/m<sup>2</sup>, and this has been interpreted as a probable lack of any so-called 'mass effect' upon the vector population (Lindsay et al., 1993; Thomson et al., 1995). Mass effects would be difficult to prove in most field study designs, because the intervention could affect mosquito abundance in the untreated (control) as well as treated (ITBN) areas, as shown by the overall reductions compared to pre-intervention data in both The Gambia (Lindsay et al., 1993) and Burkina Faso (Robert & Carnevale, 1991). Hence Lines et al. (1987) and Lindsay et al. (1991) argued that, although individuals appear to be protected by ITBN against the bites of vector mosquitoes, there is no evidence that this increases the biting rate on unprotected neighbours. Under fortuitous circumstances, there may be some reduction of biting on people without ITBN if they are sufficiently closely associated with ITBN users to be afforded some protection. We have tried to assess this 'community protection' by studying three control communities in close proximity to intervention communities, comparing vector abundance by distance from the nearest houses where ITBN were widely employed. This analysis indicated that, within the non-intervention area, fewer houses closest to the intervention area had any malaria vectors compared to those further away.

The dipping procedures used for bednet impregnation during this trial provided adequate target treatment concentration of 0.5 g/m<sup>2</sup> (over 76% of all netting samples tested had excess of this figure), giving bioassay mortalities in excess of 80% throughout the study, increasing to almost 100% following multiple re-impregnations at half-yearly intervals (Table 4).

Perhaps the greatest concern raised by this study is the observation that a significant proportion of malaria vectors appeared to bite earlier in the evening in houses where ITBN were used, with a greater tendency toward exophagy rather than the typical endophagy of most anthropophilic An. gambiae s.l. Furthermore, there was an apparent shift in sibling species composition of the An. gambiae complex following the introduction of ITBN. Both An.merus and An.arabiensis have slightly different biting cycles to An. gambiae s.s. (Iyengar, 1962; White, 1974; Mosha & Petrarca, 1983). Earlier biting is associated with use of permethrin-treated bednets in Papua New Guinea (Charlwood & Graves, 1987). As the biting cycle change occurred immediately after installation of ITBNs in our study, in conjunction with the lack of evidence for a mass-killing effect, we conclude that the earlier biting reflects either an immediate intraspecific behavioural effect or a change in vector species proportions within the An.gambiae complex, and was not the result of selection for evolved behavioural resistance. Among our Kenyan study population, people usually 'go to bed' at 21.00-22.00 hours (unpublished data) and most children retire earlier, so their customs limit the opportunities for vectors to bite them, especially when they sleep under bednets. If ITBN are increasingly to be employed against malaria in tropical Africa, their effects on mosquito behaviour and insecticide susceptibility (cf. Vulule et al., 1996) should be monitored.

#### **Acknowledgments**

This study was supported by funds from the UNDP/World Bank/W.H.O. Special Programme for Research and Training in Tropical Diseases; The Wellcome Trust; The International Development and Research Centre of the Canadian International Development Agency, and by the Kenya Medical Research Institute. We are grateful for the assistance of all scientific and technical staff at the Kilifi Research Unit, particularly Ms Laura New, Dr Chris Nevill, Dr Kevin Marsh, Dr N. M. Peshu, Mr Barnes Kitsao, David Ireri and Reuben K. Peshu. We thank Dr Bill Hawley of KEMRI/CDC, Nairobi, for identification of mosquitoes by PCR, Dr Robert Wirtz for providing monoclonal antibodies (through a grant from the World Health Organization), Dr Jim Todd of CDC, Atlanta, for conducting the HPLC assays, and Dr Jo Lines for useful comments on the manuscript. Dr Bob Snow is a Senior Wellcome Trust Fellow in Basic Biomedical Sciences. This paper is published with the permission of the Director of the Kenya Medical Research Institute.

# **References**

- Alonso, P.L., Lindsay, S.W., Armstrong, J.R.M., Conteh, M., Hill, A.G., David, P.H., Fegan, G., de Francisco, A., Hall, A.J., Shenton, F.C., Cham, K. & Greenwood, B.M. (1991) The effect of insecticide-treated bednets on mortality of Gambian children. Lancet, 337, 1499-1502.

- Beach, R.F., Ruebush, T.K., Sexton, J.D., Bright, P.L., Hightower, A.N., Breman, J.G., Mount, D.L. & Oloo, A.J. (1993) Effectiveness of permethrin impregnated bed nets and curtains for malaria control in a holoendemic area of Western Kenya. American Journal of Tropical Medicine and Hygiene, 49, 290-300.
- Beier, J.C. & Koros, J.K. (1991) Visual assessment of sporozoite and blood meal ELISA samples in malaria field studies. Journal of Medical Entomology, 28, 805-808.
- Beier, J.C., Perkins, P.V., Wirtz, R.A., Koros, J., Diggs, D., Gargan, T.P. & Koech, D.K. (1988) Blood meal identification by direct enzyme-linked immunosorbent assay (ELISA), tested on Anopheles (Diptera: Culicidae) in Kenya. Journal of Medical Entomology, 25, 9-16.
- Charlwood, J.D. & Graves, P.M. (1987) The effect of permethrin-impregnated bednets on a population of Anopheles farauti in coastal Papua New Guinea. Medical and Veterinary Entomology, 1, 319-327.
- Curtis, C.F., Lines, J.D., Carnevale, P. & Robert, V. (1990) Impregnated bednets and curtains against malaria mosquitoes. Appropriate Methods of Vector Control (ed. by C. F. Curtis), pp. 5-46. CRC Press, Boca Raton, Florida.
- Detinova, T.S. (1962) Age-grading methods in Diptera of medical importance with special reference to some vectors of malaria. World Health Organization Monograph Series, 47, 1-216.
- Iyengar, R. (1962) The bionomics of salt water Anopheles gambiae in East Africa. Bulletin of World Health Organization, 27, 223-229.
- Lindsay, S.W., Snow, R.W., Broomfield, G.I., Janneh, M.S., Wirtz, R.A. & Greenwood, B.M. (1989) Impact of permethrin treated bednets on malaria transmission by the Anopheles gambiae complex in The Gambia. Medical and Veterinary Entomology, 3, 263-271.
- Lindsay, S.W., Adiamah, J.H., Miller, J.E. & Armstrong, J.R.M. (1991) Pyrethroid-treated bednet effects on mosquitoes of the Anopheles gambiae complex in The Gambia. Medical and Veterinary Entomology, 5, 477-483.
- Lindsay, S.W., Alonso, P.L., Armstrong Schellenberg, J.R.M., Hemingway, J., Adiamah, J.H., Shenton, F.C., Jawara, M. & Greenwood, B.M. (1993) A malaria control trial using insecticide-treated bed nets and targeted chemoprophylaxis in a rural area of The Gambia, West Africa. 7. Impact of permethrin-impregnated bed nets on malaria vectors. Transactions of the Royal Society of Tropical Medicine and Hygiene, 87, 45-52.
- Lines, J.D., Myamba, J. & Curtis, C.F. (1987) Experimental hut trials of permethrin-impregnated mosquito nets and curtains against malaria vectors in Tanzania. Medical and Veterinary Entomology, 1, 37-51.
- Lines, J.D., Curtis, C.F., Wilkes, T.J. & Njunwa, K.J. (1991) Monitoring human-biting mosquitoes (Diptera: Culicidae) in Tanzania with light-traps hung beside mosquito nets. Bulletin of Entomological Research, 81, 77-84.
- Majori, G., Sabatinelli, G. & Coluzzi, M. (1987) Efficacy of permethrin-impregnated curtains for malaria vector control. Medical and Veterinary Entomology, 1, 185-192.
- Magesa, S.M., Wilkes, T.J., Mnzava, A.E.P., Njunwa, K.J., Myamba, J., Kivuyo, M.D.P., Hill, N., Lines, J.D. & Curtis, C.F. (1991) Trial of pyrethroid impregnated bednets in an area of Tanzania holoendemic for malaria. Part 2. Effects on the malaria vector population. Acta Tropica, 49, 97-108.
- Mbogo, C.N., Glass, G.E., Forster, D., Kabiru, E.W., Githure, J.I., Ouma, J.H. & Beier, J.C. (1993a) Evaluation of light traps for sampling anopheline mosquitoes in Kilifi, Kenya. Journal of the American Mosquito Control Association, 9, 141-144.
- Mbogo, C.N.M., Snow, R.W., Kabiru, E.W., Ouma, J.H., Githure, J.I., Marsh, K. & Beier, J.C. (1993b) Low-level Plasmodium falciparum transmission and the incidence of severe malaria infections on the Kenyan coast. American Journal of Tropical Medicine and Hygiene, 49, 245-253.

- Mbogo, C.N.M., Snow, R.W., Khamala, C.P.M., Kabiru, E.W., Ouma, J.H., Githure, J.I., Marsh, K. & Beier, J.C. (1995) Relationship between Plasmodium falciparum transmission by vector populations and the incidence of severe disease at nine sites on the Kenyan coast. American Journal of Tropical Medicine and Hygiene, 52, 201-206.
- Miller, J.E., Lindsay, S.W. & Armstrong, J.R.M. (1991) Experimental hut trials of bednets impregnated with synthetic pyrethroid and organophosphate insecticides for mosquito control in The Gambia. Medical and Veterinary Entomology, 5, 465-476.
- Mosha, F.W. & Petrarca, V. (1983) Ecological studies of Anopheles gambiae complex species on the Kenyan Coast. Transactions of the Royal Society of Tropical Medicine and Hygiene, 77, 344-345.
- Nevill, C.G., Some, E.S., Mungala, V.O., New, L., Marsh, K., Lengeler, C. & Snow, R.W. (1996) Insecticide-treated bednets reduce mortality and severe morbidity from malaria among children on the Kenyan coast. Tropical Medicine and International Health, 1, 139-146.
- Paskewitz, S.M. & Collins, F.H. (1990) Use of the polymerase chain reaction to identify mosquito species of the Anopheles gambiae complex. Medical and Veterinary Entomology, 4, 367-373.
- Petrarca, V., Beier, J.C., Onyango, F., Koros, J., Asiago, C., Koech, D.K. & Roberts, C.R. (1991) Species composition of the An.gambiae complex (Diptera: Culicidae) at two sites in Western Kenya. Journal of Medical Entomology, 28, 307-313.
- Port, G.R. & Boreham, P.F.L. (1982) The effect of bed nets on feeding by Anopheles gambiae Giles (Diptera: Culicidae). Bulletin of Entomological Research, 72, 483-488.
- Robert, V. & Carnevale, P. (1991) Influence of deltamethrin treatment of bed nets on malaria transmission in the Kou valley, Burkina Faso. Bulletin of the World Health Organization, 69, 735-740.
- Snow, R.W., Juwara, M. & Curtis, C.F. (1987) Observations on Anopheles gambiae Giles s.l. during a trial of permethrin treated bed nets in The Gambia. Bulletin of Entomological Research, 77, 279-286.
- Snow, R.W., Lindsay, S.W., Hayes, R.J. & Greenwood, B.M. (1988) Permethrin-treated bednets (mosquito nets) prevent malaria in Gambian children. Transactions of the Royal Society of Tropical Medicine and Hygiene, 82, 838-842.
- Snow, R.W., Peshu, N., Forster, D., Mwenesi, H.M. & Marsh, K. (1992) The role of shops in the prevention of malaria on the coast of Kenya. Transactions of the Royal Society of Tropical Medicine and Hygiene, 86, 237-239.
- Snow, R.W., Armstrong, J.R.M., Forster, D., Winstanley, P.A., Mwangi, I., Waruiru, C., Warn, P., Newbold, C. & Marsh, K. (1993) Periodicity and time space clustering of severe childhood malaria on the Kenyan coast. Transactions of the Royal Society of Tropical Medicine and Hygiene, 87, 386-390.
- Snow, R.W., Mung'ala, V.O., Forster, D. & Marsh, K. (1994) The role of the district hospital in child survival at the Kenyan Coast. African Journal of Health Sciences, 1, 71-75.
- Thomson, M.C., Adiamah, J.H., Connor, S.J., Jawara, M., Bennett, S., D'Allessandro, U., Quinones, M., Langerock, P. & Greenwood, B.M. (1995) Entomological evaluation of the Gambia National Impregnated bednet Programme. Annals of Tropical Medicine and Parasitology, 89, 229-241.
- Vulule, J.M., Beach, R.F., Atieli, F.K., Mount, D.L., Roberts, J.M. & Mwangi, R.W. (1996) Long-term use of permethrin-impregnated nets does not increase Anopheles gambiae permethrin tolerance. Medical and Veterinary Entomology, 10, 71-79.
- Wirtz, R.A., Zavala, F., Charoenvit, Y., Campbell, G.H., Burkot, T.R., Schnieder, I., Esser, K.M., Beaudoin, R.L. & Andre, R.G. (1987) Comparative testing of Plasmodium falciparum circumsporozoite antibody. Bulletin of the World Health Organization, 65, 39-45.
- White, G.B. (1974) Anopheles gambiae complex and disease transmission in Africa. Transactions of the Royal Society of Tropical Medicine and Hygiene, 77, 344-345.

- World Bank (1993) World Development Report: Investing in Health. Oxford University Press, New York.
- W.H.O. (1975) Manual on Practical Entomology in Malaria, Part II. Methods and Techniques. World Health Organization, Geneva.
- W.H.O. (1981) Instructions for determining the susceptibility or resistance of adult mosquitoes to organochloride, organophosphates and carbamate insecticides: diagnostic test. Unpublished document, VBC/81.806/WHO, World Health Organization, Geneva.

- W.H.O. (1986) Expert Committee on Malaria, 18th Report. Technical Report Series, 737. World Health Organization, Geneva.
- W.H.O. (1991) Guidelines for the development of protocols for studies to evaluate the impact of insecticide-treated bed-nets on mortality. WHO/TDR Unpublished document, World Health Organization, Geneva.

Accepted 19 February 1996

In [None]:
display(JSON(meta, expanded=False))

<IPython.core.display.JSON object>

### Marker Blocks Filtering and Parsing

In [None]:
# 3. Run Marker to extract JSON structure
import os
output_dir = 'marker_output'
os.makedirs(output_dir, exist_ok=True)
json_out = os.path.join(output_dir, os.path.splitext(os.path.basename(pdf_path))[0] + '_structure.json')

!marker_single "{pdf_path}" --output_format json --output_dir "{output_dir}"
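
The `json_out` path built above is only a guess at the final filename, and where the JSON actually lands may depend on the installed Marker version. As a minimal sketch, assuming only that a `.json` file ends up somewhere under `output_dir`, the hypothetical helper `find_json_outputs` (not part of Marker) searches the output directory by filename stem:

```python
from pathlib import Path
from typing import List


def find_json_outputs(output_dir: str, stem: str) -> List[Path]:
    """Return JSON files anywhere under `output_dir` whose name starts with `stem`.

    Hypothetical helper: it makes no assumption about how the OCR tool
    lays out its output folder, it simply walks the directory tree.
    """
    candidates = [
        path
        for path in Path(output_dir).rglob("*.json")
        if path.name.startswith(stem)
    ]
    # Sort by path so the result is deterministic across runs
    return sorted(candidates)
```

For example, `find_json_outputs(output_dir, os.path.splitext(os.path.basename(pdf_path))[0])` lists the candidates; if `json_out` does not exist, one of them can be assigned to it before loading.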

In [None]:
# 4. Load the Marker JSON
import json
with open(json_out, 'r') as f:
    marker_json = json.load(f)
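
Before flattening, it can help to see which block types the loaded structure contains. The sketch below assumes the JSON is a tree of dicts that carry `block_type` and an optional `children` list (the same two keys the flattening code in this notebook reads); `count_block_types` is a hypothetical helper name.

```python
from collections import Counter
from typing import Any, Dict


def count_block_types(node: Dict[str, Any]) -> Counter:
    """Count `block_type` values across a nested block tree (iterative walk)."""
    counts: Counter = Counter()
    stack = [node]
    while stack:
        current = stack.pop()
        block_type = current.get("block_type")
        if block_type:
            counts[block_type] += 1
        # `children` may be missing, None, or an empty list
        stack.extend(current.get("children") or [])
    return counts
```

Calling `count_block_types(marker_json).most_common()` gives a quick overview of how many headers, tables, and text blocks were extracted, which is useful when choosing the types to filter out.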

In [None]:
# 5. Data models and flattening utilities (Python port of the TypeScript version)
import html
import re
from typing import List, Dict, Any

class SimplifiedBlock:
    """Flat, serializable view of a single Marker block."""
    def __init__(self, type: str, content: str, page: int, bbox: list):
        self.type = type
        self.content = content
        self.page = page  # 1-based page number
        self.bbox = bbox

    def as_dict(self):
        return {
            'type': self.type,
            'content': self.content,
            'page': self.page,
            'bbox': self.bbox,
        }

def decode_html_entities(text: str) -> str:
    return html.unescape(text)

def flatten_marker_json(blocks: List[Dict[str, Any]], page_number: int = 0) -> List[SimplifiedBlock]:
    """Flatten a nested block tree; `page_number` is the 0-based fallback page index."""
    flat_blocks = []
    for block in blocks:
        # The page index is read from the third '/'-separated field of the block id,
        # falling back to the page inherited from the parent when there is no id
        page_index = int(block['id'].split('/')[2]) if 'id' in block else page_number

        # Skip Page blocks but process their children
        if block.get('block_type') == 'Page':
            flat_blocks.extend(flatten_marker_json(block.get('children') or [], page_index))
            continue

        # Process current block
        content = ''
        if isinstance(block.get('images'), dict) and block['images']:
            # Use the first image payload as the block content
            content = next(iter(block['images'].values()))
        elif block.get('block_type') == 'Table':
            # Keep table HTML intact so the cell structure is preserved
            content = (block.get('html') or '').strip()
        elif block.get('html'):
            # Strip tags from all other blocks
            content = re.sub(r'<[^>]*>', ' ', block['html']).strip()
        content = decode_html_entities(content)

        flat_blocks.append(SimplifiedBlock(
            type=block.get('block_type', ''),
            content=content,
            page=page_index + 1,
            bbox=block.get('bbox', [0, 0, 0, 0]),
        ))

        # Recursively process children (except for Page blocks), passing the 0-based
        # index so that children without an id are not shifted by an extra page
        if block.get('children'):
            flat_blocks.extend(flatten_marker_json(block['children'], page_index))
    return flat_blocks

def filter_and_flatten_marker_json(blocks: List[Dict[str, Any]], page_number: int = 0) -> List[SimplifiedBlock]:
    unfiltered = flatten_marker_json(blocks, page_number)
    remove_types = {
        'TableCell', 'TableGroup', 'FigureGroup', 'ListGroup', 'Reference',
        'PageFooter', 'PageHeader', 'Footnote'
    }
    return [b for b in unfiltered if b.type not in remove_types and b.content]

In [None]:
# 6. Flatten and filter the Marker output
flat_blocks = filter_and_flatten_marker_json(marker_json.get('children', []))

In [None]:
# 7. Explore block types and content
import pandas as pd

df = pd.DataFrame([b.as_dict() for b in flat_blocks])
print('Block types found:', df['type'].unique())
df.head(20)  # Show first 20 blocks
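
Since the goal of this notebook is header structure, one way to explore the flattened blocks is to group body blocks under the most recent `SectionHeader`. The sketch below is only an illustration under assumptions: `build_outline` is a hypothetical helper, the records are made up, and it operates on plain dicts shaped like the output of `SimplifiedBlock.as_dict()`.

```python
from typing import Any, Dict, List


def build_outline(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Group block records under the most recent SectionHeader (hypothetical helper)."""
    sections: List[Dict[str, Any]] = []
    # Blocks that appear before any header are collected as front matter
    current: Dict[str, Any] = {'header': '(front matter)', 'page': None, 'blocks': []}
    for rec in records:
        if rec['type'] == 'SectionHeader':
            # Close the running section before starting a new one
            if current['blocks'] or current['page'] is not None:
                sections.append(current)
            current = {'header': rec['content'], 'page': rec['page'], 'blocks': []}
        else:
            current['blocks'].append(rec)
    if current['blocks'] or current['page'] is not None:
        sections.append(current)
    return sections


# Made-up records in the shape produced by SimplifiedBlock.as_dict()
outline_records = [
    {'type': 'Text', 'content': 'Preamble', 'page': 1, 'bbox': [0, 0, 0, 0]},
    {'type': 'SectionHeader', 'content': 'Introduction', 'page': 1, 'bbox': [0, 0, 0, 0]},
    {'type': 'Text', 'content': 'First paragraph.', 'page': 1, 'bbox': [0, 0, 0, 0]},
    {'type': 'SectionHeader', 'content': 'Methods', 'page': 2, 'bbox': [0, 0, 0, 0]},
    {'type': 'Text', 'content': 'Second paragraph.', 'page': 2, 'bbox': [0, 0, 0, 0]},
]

outline = build_outline(outline_records)
for section in outline:
    print(section['page'], section['header'], len(section['blocks']))
```

On the notebook's data, `build_outline([b.as_dict() for b in flat_blocks])` would apply the same grouping. Header nesting levels are not recovered here; every header opens a new top-level section.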

In [None]:
# 8. Simple metadata extraction (title, authors, abstract)
def extract_metadata(blocks: List[SimplifiedBlock]):
    title = next((b.content for b in blocks if b.type.lower() in {'title', 'main_title'}), '')
    authors = next((b.content for b in blocks if 'author' in b.type.lower()), '')
    abstract = next((b.content for b in blocks if 'abstract' in b.type.lower()), '')
    return {'title': title, 'authors': authors, 'abstract': abstract}

metadata = extract_metadata(flat_blocks)
print('Extracted Metadata:', metadata)
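
The lookups above depend on block types whose names contain `title`, `author`, or `abstract`. If the extracted block types are purely layout-oriented (for example `SectionHeader` and `Text`), all three fields come back empty. A possible fallback, sketched here as an assumption rather than a tested method, is to use position and wording instead: `guess_title_and_abstract` is a hypothetical helper and the records are made up.

```python
from typing import Any, Dict, List


def guess_title_and_abstract(records: List[Dict[str, Any]]) -> Dict[str, str]:
    """Heuristic fallback (hypothetical).

    title    = first SectionHeader on page 1
    abstract = first Text block that follows a header reading 'Abstract',
               or that itself starts with the word 'Abstract'
    """
    title = ''
    abstract = ''
    after_abstract_header = False
    for rec in records:
        text = rec['content'].strip()
        if rec['type'] == 'SectionHeader':
            if not title and rec['page'] == 1:
                title = text
            after_abstract_header = text.lower().rstrip('.:') == 'abstract'
        elif rec['type'] == 'Text' and not abstract:
            if after_abstract_header or text.lower().startswith('abstract'):
                abstract = text
            after_abstract_header = False
    return {'title': title, 'abstract': abstract}


# Made-up records in the shape produced by SimplifiedBlock.as_dict()
metadata_records = [
    {'type': 'SectionHeader', 'content': 'A made-up title', 'page': 1},
    {'type': 'Text', 'content': 'A. Author, B. Author', 'page': 1},
    {'type': 'SectionHeader', 'content': 'Abstract', 'page': 1},
    {'type': 'Text', 'content': 'We measured something.', 'page': 1},
    {'type': 'SectionHeader', 'content': 'Introduction', 'page': 2},
]

guess = guess_title_and_abstract(metadata_records)
print(guess)
```

Authors are deliberately left out: without a dedicated block type there is no reliable positional cue for them, so guessing would likely produce noise.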

In [None]:
# 9. Find and display all tables and figures (with extensibility for custom processing)
import base64
from IPython.display import HTML, Image, display

tables = [b for b in flat_blocks if b.type == 'Table']
figures = [b for b in flat_blocks if b.type in {'Figure', 'Picture'}]

print(f'Found {len(tables)} tables and {len(figures)} figures.')

# Example: Show first table's HTML (for further processing)
if tables:
    print('First table HTML:')
    display(HTML(tables[0].content))

# Example: Show first figure as image (if base64-encoded)
def show_base64_image(b64str):
    """Decode a base64 string and render it inline; report failures instead of raising."""
    try:
        display(Image(data=base64.b64decode(b64str)))
    except Exception as e:
        print('Could not display image:', e)

if figures:
    print('First figure (if image):')
    show_base64_image(figures[0].content)

In [None]:
# 10. (Optional) Extensible: Add your own logic to process tables/figures, e.g., send table HTML to a model, extract captions, etc.
# (No LLM-based summarization or captioning included)
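
As one example of such custom processing, table HTML can be turned into a list of rows with the standard library alone. This is a minimal sketch under assumptions: `TableRowParser` and `table_html_to_rows` are hypothetical names, the sample HTML is made up, and `rowspan`/`colspan` and nested tables are ignored.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableRowParser(HTMLParser):
    """Collect the text of <td>/<th> cells row by row (ignores rowspan/colspan)."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None and self._row is not None:
            # Join text fragments and collapse runs of whitespace
            self._row.append(' '.join(''.join(self._cell).split()))
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_html_to_rows(table_html: str) -> List[List[str]]:
    parser = TableRowParser()
    parser.feed(table_html)
    parser.close()
    return parser.rows


# Made-up table for illustration
sample_html = (
    '<table><tr><th>Site</th><th>Count</th></tr>'
    '<tr><td>A</td><td>12</td></tr>'
    '<tr><td>B &amp; C</td><td>7</td></tr></table>'
)
rows = table_html_to_rows(sample_html)
print(rows)
```

When the first row is a header, `pd.DataFrame(rows[1:], columns=rows[0])` gives a DataFrame for further analysis; on the notebook's data the input would be `tables[0].content`.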

In [None]:
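# 11. Save flattened blocks for further analysis
df.to_json('flattened_blocks.json', orient='records', indent=2)

# Trigger a browser download in Colab; elsewhere the file simply stays in the working directory
try:
    from google.colab import files
    files.download('flattened_blocks.json')
except ImportError:
    print('Saved flattened_blocks.json (not running in Colab, skipping download).')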

# Eval OCR accuracy