In [19]:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline(task="text2text-generation", model="google-t5/t5-base")

Device set to use cuda:0


In [1]:
import re
import unicodedata

def normalize_text(text):
    """Normalize Unicode and collapse whitespace in `text`."""
    # 1. Normalize unicode characters first (e.g. ligatures, full-width
    #    characters, non-breaking spaces). NFKC can itself introduce spaces,
    #    so it runs before the whitespace steps.
    text = unicodedata.normalize("NFKC", text)

    # 2. Replace multiple spaces or newlines with a single space
    text = re.sub(r'\s+', ' ', text)

    # 3. Strip leading/trailing whitespace
    return text.strip()
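
A quick check of what the normalization does on small inputs may help before applying it to the full article. The snippet below is a minimal sketch: it restates the same three steps compactly so that it runs on its own, and the sample strings are made up for illustration.

```python
import re
import unicodedata

def normalize_text(text):
    # Compact restatement of the helper above: NFKC, collapse whitespace, strip
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()

# Made-up samples: mixed whitespace, full-width digits, and an "fi" ligature
samples = [
    "Malaria  cases:\n\n\t371\u00a0 ",
    "chikungunya \uff16\uff11",
    "\ufb01ve-fold",
]
cleaned = [normalize_text(s) for s in samples]
for before, after in zip(samples, cleaned):
    print(repr(before), "->", repr(after))
```

Note that NFKC folds compatibility characters such as ligatures and full-width forms, but it leaves curly quotes unchanged; those would need an explicit replacement step if straight quotes are required.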


In [21]:
print(pipe)


<transformers.pipelines.text2text_generation.Text2TextGenerationPipeline object at 0x7d7270380c50>


In [2]:
context = """Capital fever: Malaria cases at a 6-year high in Delhi
The national capital [Delhi] right now is witnessing one of its worst post-monsoon seasons. Malaria and chikungunya cases are at a 6-year high, and dengue continues to haunt. According to the Municipal Corporation of Delhi [MCD], as of September 29 [2025], there were 371 malaria cases, 759 dengue cases, and 61 chikungunya cases -- a sharp rise from 237 malaria and 42 chikungunya cases reported in the same period last year [2024].

In fact, malaria cases have increased more than 5-fold since 2021, when only 66 cases were recorded by September. Dengue numbers, while lower than the 2701 cases reported in 2023, remain a seasonal concern for health officials.

The MCD has meanwhile intensified its mosquito-control drive this year [2025]. Between January 1 and August 23, officials inspected over one lakh [100 000] houses that were found positive for breeding.

Nearly 99 000 legal notices and over 18 000 prosecutions were launched against owners of properties where mosquito breeding was found, violating civic health and sanitation bylaws. Around 8.79 lakh [879 000] houses were sprayed, and officials carried out 4588 community drives, collecting fines worth Rs 13.89 lakh [INR 1 389 000; USD 15 650]. To curb breeding, fish were released into 304 water bodies, up from 209 last year [2024].

Officials say dengue numbers remain relatively low, but malaria and chikungunya cases are rising. Dengue infections, which usually spike between September and November, are currently below last year's [2024] level, officials added.

[Byline: Ankita Tiwari]

--
Communicated by:
ProMED-SoAs

Moderator Comments
As per the media report above, Delhi is witnessing an increase in vector-borne diseases, with 371 malaria cases (highest in 6 years in the period up to 29 September 2025), 61 chikungunya cases, and 759 dengue cases reported so far in 2025. Vector control through measures such as source reduction, environmental management, and personal protection is the key to reducing the mosquito population and transmission of vector-borne diseases, including dengue, chikungunya, and malaria. Integrated vector management is essential to control the vector-borne diseases.

The Manual on Integrated Vector Management in India (2022) can be seen at https://ncvbdc.mohfw.gov.in/Doc/Guidelines/Manual-Integrated-Vector-Management-2022.pdf.

Malaria control strategy in India can be accessed at https://ncvbdc.mohfw.gov.in/index4.php?lang=1&level=0&linkid=421&lid=3707.

Chikungunya is a debilitating, but nonfatal, viral illness that is transmitted by Aedes aegypti mosquitoes. National guidelines for clinical case management of chikungunya in India can be seen at https://ncvbdc.mohfw.gov.in/WriteReadData/l892s/77728737401531912419.pdf.

Dengue is a mosquito-borne viral fever. National guidelines for clinical management of dengue fever in India can be seen at https://ncvbdc.mohfw.gov.in/WriteReadData/l892s/Dengue-National-Guidelines-2014.pdf.

Delhi is officially known as the National Capital Territory (NCT) of Delhi. A city and a union territory of India, it contains New Delhi, which is the capital of India (https://en.wikipedia.org/wiki/Delhi)."""

In [3]:
norm_context = normalize_text(context)

In [5]:
from transformers import AutoTokenizer, LongformerForQuestionAnswering
import torch

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("allenai/longformer-large-4096-finetuned-triviaqa")
model = LongformerForQuestionAnswering.from_pretrained("allenai/longformer-large-4096-finetuned-triviaqa")

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

question, text = "How many NEW chikungunya cases have been reported?", norm_context

# Tokenize
encoding = tokenizer(question, text, return_tensors="pt", truncation=True, max_length=4096)

# Move inputs to same device as model
input_ids = encoding["input_ids"].to(device)
attention_mask = encoding["attention_mask"].to(device)

# Forward pass
with torch.no_grad():  # optional: disables gradient tracking for inference
    outputs = model(input_ids, attention_mask=attention_mask)

# Get start and end logits
start_logits = outputs.start_logits
end_logits = outputs.end_logits

# Decode the answer span directly from the input ids
start_idx = torch.argmax(start_logits).item()
end_idx = torch.argmax(end_logits).item()
answer = tokenizer.decode(input_ids[0, start_idx : end_idx + 1])

print("Answer:", answer)


Some weights of the model checkpoint at allenai/longformer-large-4096-finetuned-triviaqa were not used when initializing LongformerForQuestionAnswering: ['longformer.pooler.dense.bias', 'longformer.pooler.dense.weight']
- This IS expected if you are initializing LongformerForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LongformerForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Answer:  61
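
The cell above picks the start and end positions with two independent argmax calls. That works here, but if the best end position falls before the best start position, the slice is empty and no answer is printed. A common alternative is to score start/end pairs jointly under the constraint start <= end. The following is a minimal sketch on plain Python lists; `best_span`, `max_answer_len`, and the toy logits are hypothetical and not part of the notebook or of the Transformers API.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end, score) maximizing start_logits[s] + end_logits[e]
    subject to s <= e < s + max_answer_len, or None for empty input."""
    best = None
    for s, s_score in enumerate(start_logits):
        last = min(s + max_answer_len, len(end_logits))
        for e in range(s, last):
            score = s_score + end_logits[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best

# Toy logits (made up): independent argmax gives start=3, end=1, an empty span
toy_start = [0.1, 0.2, 0.0, 2.5, 0.3]
toy_end = [0.0, 3.0, 0.1, 2.0, 0.4]

naive = (toy_start.index(max(toy_start)), toy_end.index(max(toy_end)))
joint = best_span(toy_start, toy_end)
print("independent argmax:", naive)
print("joint search:", joint[:2])
```

With real model outputs, the logits would first be converted to lists (for example with `start_logits[0].tolist()`), and the search is usually restricted to the top few candidates per side to keep it cheap on 4096-token inputs.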


In [None]:
from transformers import AutoTokenizer, T5ForQuestionAnswering
import torch

tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
model = T5ForQuestionAnswering.from_pretrained("google-t5/t5-small")

question, text = "How many NEW dengue cases have been reported?", norm_context

inputs = tokenizer(question, text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

answer_start_index = outputs.start_logits.argmax()
answer_end_index = outputs.end_logits.argmax()

predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
tokenizer.decode(predict_answer_tokens, skip_special_tokens=True)

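# Placeholder target span: the indices below appear to be carried over from the
# Transformers docs example ("nice puppet") and do not mark an answer in norm_context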
target_start_index = torch.tensor([14])
target_end_index = torch.tensor([15])

outputs = model(**inputs, start_positions=target_start_index, end_positions=target_end_index)
loss = outputs.loss
round(loss.item(), 2)

Some weights of T5ForQuestionAnswering were not initialized from the model checkpoint at google-t5/t5-small and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Token indices sequence length is longer than the specified maximum sequence length for this model (882 > 512). Running this sequence through the model will result in indexing errors


7.03
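
The tokenizer warning above reports 882 tokens against a nominal maximum of 512. One common way to handle such inputs is to split the token sequence into overlapping windows and run the model on each window. The sketch below shows only the window arithmetic on a plain list; `sliding_windows`, `max_len`, and `overlap` are hypothetical names, and the question tokens that would have to be prepended to every window are ignored for simplicity.

```python
def sliding_windows(token_ids, max_len=512, overlap=128):
    """Return (start, end) index pairs covering token_ids with windows of at
    most max_len tokens, where consecutive windows share `overlap` tokens."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap
    windows = []
    start = 0
    while True:
        end = min(start + max_len, len(token_ids))
        windows.append((start, end))
        if end == len(token_ids):
            break
        start += step
    return windows

# Dummy ids standing in for the 882 tokens reported in the warning
dummy_ids = list(range(882))
spans = sliding_windows(dummy_ids)
print(spans)
```

The overlap reduces the chance that an answer is cut in half at a window boundary. The Transformers tokenizers offer `stride` and `return_overflowing_tokens` arguments that serve the same purpose and also account for the question tokens.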

In [None]:
print(pipe(sample, max_length=2000, min_length=1500, do_sample=False))

In [23]:
preds

[{'generated_text': 'False'}]