# 05.2 - Classification of Scientific Papers Using Open Hugging Face Models

This notebook explores how open LLMs, such as Mistral, Llama, Gemma, and Specter, can be used to classify scientific papers based on their full text or abstracts. Specifically, these models are used to detect papers that discuss infectious disease modeling and then to identify which modeling techniques those papers use.

To improve classification accuracy, multiple models will be evaluated and employed.
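One common way to combine several classifiers is a simple majority vote over their per-paper labels. The sketch below is only one possible approach and is not taken from the notebook. It assumes each model yields a boolean label per paper. The `majority_vote` helper and the model names are hypothetical.

```python
from collections import Counter


def majority_vote(labels: list[bool]) -> bool:
    """Combine per-model yes/no labels for one paper by simple majority.

    Ties resolve to False (not a disease-modeling paper), a conservative
    choice that favors precision over recall.
    """
    if not labels:
        raise ValueError("need at least one model label")
    counts = Counter(labels)
    return counts[True] > counts[False]


# Hypothetical labels from three models for the same four papers.
per_model = {
    "model_a": [True, False, True, False],
    "model_b": [True, True, False, False],
    "model_c": [False, True, True, False],
}

# zip(*...) regroups the labels so each tuple holds every model's vote for one paper.
combined = [majority_vote(list(votes)) for votes in zip(*per_model.values())]
print(combined)  # [True, True, True, False]
```

With an odd number of models, ties cannot occur. With an even number, the tie-breaking rule becomes a real design decision.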

Load the training dataset, which includes a column indicating whether each paper was manually classified as a disease modeling paper.

In [None]:
from genscai.data import load_classification_training_data

df = load_classification_training_data()
df.head(5)

Create a model client for classifying papers. Three client types are available: AisuiteClient, OllamaClient, and HuggingFaceClient. AisuiteClient works with cloud model providers (e.g., OpenAI) as well as with models hosted locally by Ollama. OllamaClient works only with models hosted locally by Ollama. HuggingFaceClient uses the Hugging Face Transformers library to run models locally.

For local models, Ollama is preferred when device memory is limited, since Ollama-hosted models are typically 4-bit quantized. For finer control over quantization and model parameters, Hugging Face Transformers models are preferred.
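A back-of-the-envelope estimate shows why quantization matters on memory-limited devices. The sketch below counts model weights only, for an 8B-parameter model such as the DeepSeek R1 8B model used in this notebook. Activations, the KV cache, and quantization metadata add to the real footprint, so treat these numbers as lower bounds. The `weight_memory_gb` helper is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory (GB, 1e9 bytes) needed just to hold the model weights.

    Ignores activations, the KV cache, and per-block quantization scales,
    so actual usage is somewhat higher.
    """
    return num_params * bits_per_param / 8 / 1e9


params_8b = 8e9
for label, bits in (("bf16", 16), ("8-bit", 8), ("4-bit", 4)):
    print(f"{label:>5}: {weight_memory_gb(params_8b, bits):.1f} GB")
```

Going from 16-bit to 4-bit weights cuts the weight memory by roughly a factor of four, from about 16 GB to about 4 GB for an 8B model.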

In [None]:
from genscai.models import HuggingFaceClient, MODEL_KWARGS

client = HuggingFaceClient(HuggingFaceClient.MODEL_DEEPSEEK_R1_8B, MODEL_KWARGS)

# The following calls only work with HuggingFaceClient, since the model is loaded locally.
client.print_model_info()
client.print_device_map()

Classify each of the papers using the following model parameters (e.g. temperature) and prompt.

For local, non-reasoning models (e.g., Llama, Gemma, Phi), we want a low temperature, since the classification should be deterministic. These models also need only a single output token for the answer. Reasoning models such as DeepSeek R1 need different settings. They use a higher temperature (DeepSeek's own usage notes suggest 0.5 to 0.7 to avoid repetitive or incoherent output) and a much larger output-token budget, because the response includes the model's reasoning before the final answer.
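Applying temperature to a softmax over token logits shows its effect. The sketch below uses made-up logits for two candidate answer tokens. Real generation works over the full vocabulary, and the library applies temperature internally during sampling. The logit values and the `softmax_with_temperature` helper are hypothetical.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for the answer tokens "yes" and "no".
logits = [2.0, 1.0]

for t in (0.01, 0.7, 1.0):
    p_yes, p_no = softmax_with_temperature(logits, t)
    print(f"T={t:<4}  P(yes)={p_yes:.3f}  P(no)={p_no:.3f}")
```

At T=0.01 almost all probability mass lands on the top token, so a single-token classification becomes effectively deterministic. At T=0.7 the runner-up keeps a meaningful share, which leaves room for varied reasoning paths.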

In [None]:
import genscai.classification as gc

# Raise temperature (default 0.01) and max_new_tokens (default 1) so reasoning models have room to generate their full response
generate_kwargs = gc.CLASSIFICATION_GENERATE_KWARGS.copy()
generate_kwargs.update(
    {
        "max_new_tokens": 1024,
        "temperature": 0.70,
    }
)

df = gc.classify_papers(
    client, gc.CLASSIFICATION_TASK_PROMPT_TEMPLATE + gc.CLASSIFICATION_OUTPUT_PROMPT_TEMPLATE, generate_kwargs, df
)

df.head(5)
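A reasoning model's response contains its chain of thought before the final answer, so the label must be extracted from the text. The sketch below assumes the model wraps its reasoning in `<think>...</think>` tags, as DeepSeek R1 does, and that the prompt asks for a one-word yes/no answer. The `extract_label` helper is hypothetical. This notebook does not show how `gc.classify_papers` actually parses responses.

```python
import re
from typing import Optional

# Matches the reasoning section, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def extract_label(response: str) -> Optional[bool]:
    """Return True/False for the yes/no answer after the reasoning, or None if unclear."""
    # Drop the reasoning so that words like "yes" inside it are ignored.
    answer = THINK_BLOCK.sub("", response).strip().lower()
    # Expect the remaining text to start with the answer word.
    match = re.match(r"(yes|no)\b", answer)
    if match is None:
        return None
    return match.group(1) == "yes"


sample = "<think>The abstract describes an SEIR model, so yes, this fits.</think>\nYes"
print(extract_label(sample))
```

If `max_new_tokens` is too small, the closing `</think>` tag may never be generated. The helper then returns None, which is one reason to give reasoning models a generous token budget.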

Compute the precision, recall, and accuracy of the classifications against the manual labels.

In [None]:
df, metrics = gc.test_paper_classifications(df)
metrics
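The three metrics reduce to simple counts over the confusion matrix:

- precision = TP / (TP + FP)
- recall = TP / (TP + FN)
- accuracy = (TP + TN) / N

The sketch below implements that arithmetic as a standalone function with hypothetical labels. The `classification_metrics` helper is not part of `genscai`, and it makes no assumptions about the columns or outputs of `gc.test_paper_classifications`.

```python
def classification_metrics(actual: list[bool], predicted: list[bool]) -> dict[str, float]:
    """Compute precision, recall, and accuracy for binary labels."""
    if len(actual) != len(predicted):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)              # correctly flagged papers
    fp = sum((not a) and p for a, p in pairs)        # flagged but not modeling papers
    fn = sum(a and (not p) for a, p in pairs)        # modeling papers that were missed
    tn = sum((not a) and (not p) for a, p in pairs)  # correctly left unflagged
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


# Hypothetical manual labels vs. model predictions for five papers.
actual = [True, True, False, False, True]
predicted = [True, False, False, True, True]
print(classification_metrics(actual, predicted))
```

Precision answers "of the papers flagged, how many are really about disease modeling?" Recall answers "of the real disease modeling papers, how many were found?" Accuracy alone can look good on an imbalanced dataset even when recall is poor.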